SciTech

Password data promise future applications

Credit: Claire Gianakas/Editor-in-Chief

Passwords are among the Internet’s most ubiquitous features, and in one form or another they have been used throughout history to authenticate the identities of people trying to access privileged information. But as password-cracking techniques grow more sophisticated, it’s becoming increasingly difficult to authenticate a user with just a simple password.

To tackle this mounting challenge, cyber security researchers have begun studying more rigorous human authentication methods. Such research is often very difficult to conduct, however, because researchers need access to a robust body of data to reach empirically significant conclusions.

This data, which often includes private information like passwords and user IDs, is safeguarded so closely that these experiments are often over before they begin. However, a team of researchers from Carnegie Mellon University and Stanford University has created a breakthrough method that will allow researchers to safely share this privileged information to further the study of cyber security protocols.

The research team consisted of Anupam Datta, an associate professor in Carnegie Mellon’s Department of Computer Science, Jeremiah Blocki, a postdoctoral researcher at Microsoft Research, and Joseph Bonneau, a postdoctoral researcher at Stanford University.

Recently, this project has gained the confidence of Yahoo!, a large-scale search engine that boasts around 800 million active users each month across the globe. Yahoo! has such faith in this method that it has agreed to share password statistics from 70 million of its users. This tremendous data set will allow researchers throughout the field of cyber security to analyze real-world information and construct a more accurate picture of passwords in their natural habitat, and of the ways in which they might be better protected or constructed.

“This is the first time a major company has released frequency information on user passwords,” said Datta in a university press release. “It’s the kind of information that legitimate researchers can use to assess the impact of a security breach and to make informed decisions about password defenses. This is extremely valuable, so we hope other organizations will follow Yahoo’s lead.”

The algorithm works, essentially, by “distorting numbers in the data set so the list is ‘differentially private,’” according to the university press release.

Differential privacy is a term from cryptography and data privacy which essentially means that the accuracy of queries, or searches, in a database is maximized, while the chances of identifying individual records from that database are minimized. In terms of the Yahoo! data set, this means that the information can be used with a high degree of accuracy, but finding out whether or not a specific user is included in a certain group of data will be nearly impossible.
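To make the idea of distorting numbers more concrete, here is a minimal Python sketch of one textbook approach, the Laplace mechanism, applied to a list of password counts. This is not the researchers' algorithm, which the press release does not describe in detail. The function names, the `epsilon` privacy parameter, and the assumption that any one user changes a single count by at most one are all illustrative choices.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two exponential samples, each with mean `scale`,
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(counts, epsilon, rng=None):
    """Return a distorted copy of `counts` (hypothetical helper).

    Assumption: adding or removing one user changes one count by at
    most 1, so the noise scale is 1 / epsilon. A smaller epsilon means
    more noise and stronger privacy.
    """
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    # Rounding, clamping at zero, and re-sorting only post-process the
    # noisy values; they do not look at the original counts again.
    noisy = [max(0, round(c + laplace_noise(scale, rng))) for c in counts]
    return sorted(noisy, reverse=True)

if __name__ == "__main__":
    original = [1200, 850, 430, 12, 3, 1]
    print(noisy_counts(original, epsilon=0.5))
```

The large counts barely move in relative terms, so aggregate statistics stay accurate, while the small counts that could point to an individual user are blurred the most.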

The researchers protect the Yahoo! data set by distorting the numbers in this way, such that no individual user is truly at risk. Another safeguard is that the data set does not contain actual passwords, but frequency lists, which count the number of times each password was chosen by a certain group of users.
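As a rough illustration of what a frequency list is, the Python sketch below counts how often each password appears and then throws the passwords themselves away, keeping only the counts. The sample passwords and the `frequency_list` name are invented for this example and do not come from the Yahoo! data.

```python
from collections import Counter

def frequency_list(passwords):
    """Return password counts, most common first, without the passwords."""
    counts = Counter(passwords)
    return sorted(counts.values(), reverse=True)

if __name__ == "__main__":
    sample = ["123456", "password", "123456", "letmein", "123456", "password"]
    print(frequency_list(sample))  # [3, 2, 1]
```

The output says that one password was chosen three times, another twice, and a third once, but not which passwords they were.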

“I have already started [using] the Yahoo! data to model and analyze a rational offline password attacker using game theory,” wrote Blocki in an email interview with The Tartan.

There are many types of password-cracking programs out there, and by running real data through constructed examples of these programs, researchers can determine how to better protect actual users. With access to a data sample this large, there are tremendous opportunities for relevant experiments whose results can be extrapolated to the real world.
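One simple quantity that can be read off a frequency list is the share of accounts an attacker would break by trying only the most popular passwords. The Python sketch below computes it. It is a simplified illustration, not Blocki's game-theoretic model, and the `guessing_success` name and sample counts are hypothetical.

```python
def guessing_success(freqs, guesses):
    """Fraction of accounts cracked by trying the `guesses` most common passwords."""
    ordered = sorted(freqs, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    # A negative guess budget is treated as zero guesses.
    return sum(ordered[:max(0, guesses)]) / total

if __name__ == "__main__":
    counts = [3, 2, 1]  # hypothetical frequency list for six accounts
    print(guessing_success(counts, 1))  # 0.5
```

Here a single guess per account, the most common password, would break half of the six accounts.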

Blocki notes that he is “excited to see what other security researchers are able to [do] with the frequency data now that it is available publicly,” and, given the new avenues this secure sharing method opens up, there are a variety of possibilities.