Author’s note

This post was recovered from my old blog via the Wayback Machine and has been edited for grammar. I wrote it in 2012 in response to a data breach at Stratfor. Data from 860K users was leaked online, and Stratfor had not been following the “encrypt everything” mantra. They had protected user passwords, however, and I quickly recognized those as MD5 hashes. Knowing MD5 was relatively insecure, I built a quick script to test their scheme. They’d at least “salted” user passwords (i.e., added data to the original password before hashing it). But a salt isn’t a secret key, and a predictable one is really just a form of security-through-obscurity: once you know how the salt is built, it’s much easier to recover the password.

I wanted to see if I could find the salt, and was able to pretty quickly. This post describes how I did it.

What’s especially sad to me after re-reading this is that I say “breaches happen almost every day” - and that statement is still true today, 10 years later (and likely will be 10 years from now as well).

@jbnunn

25-Sept 2022


Extracting a salt from an MD5 Hash - Apr 5 2012

In December of 2011, members of the activist group Anonymous released a slew of private data (over 860,000 records) stolen from the think tank Stratfor. I don’t condone the theft. I do, however, condone the attention it brings to a firm that prides itself on being both intelligent and secure, for two reasons: 1) it shows the public that no data is entirely secure, and 2) it points out these insecurities in the hope that firms like this become more intelligent and more secure with our data.

I’ve looked through the list to see whether my own information was compromised. It was not, but I can’t say the same for almost a million other people. The leak contains mostly inconsequential information, but it does include a hashed password (along with the email address and username) for each person. After a cursory run-through of several thousand random hashed passwords, I was not able to crack any using the method I published a few years back.

Salting

These passwords are at least salted (salting is the process of adding extra characters to a password before hashing it, to make it more difficult to crack). If your password was “submarine” and the site hashed it with MD5 (which many websites use to store passwords), it would be stored as “a9bdfa76aa6d76f7bde66e470cf98553.” To make your data more secure, a programmer might salt your password with another word, like “kangaroo,” by adding it to your password before hashing and storing it. So, instead of storing the MD5 hash of “submarine,” which might be easy for a hacker to guess if they accessed the user database, the site stores the hash of “submarinekangaroo,” which would be much harder for someone to guess. A smarter salt would be something random, like “tH7rWslwj6”, so that brute-force attacks that try dictionary words as salts would be rendered mostly useless. Try it yourself if you want: if you’re on a Mac, go into Terminal and type

md5 -s 'whatever-you-want'

then hit Enter. What you’ll see is the hashed value of your string of text. Now try adding some characters to it (your own salt) and see how the result changes. It’s important to realize that there’s no “unhash” method, per se. There’s no such thing as typing

unmd5 -s 'a9bdfa76aa6d76f7bde66e470cf98553'

and getting “submarine” in response. But if you go to Google and search for “a9bdfa76aa6d76f7bde66e470cf98553”, you’ll find plenty of posts telling you the answer is “submarine.” Salt submarine with your own new word (md5 -s 'submarineastroturf'), then search for the hash that produces: chances are, your search will come up empty. That’s the importance of a salt.
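If you’d rather experiment from Ruby (the language I use later in this post), the standard library’s Digest::MD5 produces the same hex digest as md5 -s. This is a minimal sketch: the helper names md5_hex and salted_md5 are my own, and the random salt comes from SecureRandom. A site using a random salt would have to store it alongside the hash, since it needs the salt again at login.

```ruby
require "digest"
require "securerandom"

# Ruby equivalent of `md5 -s 'text'`: the hex MD5 digest of a string.
def md5_hex(text)
  Digest::MD5.hexdigest(text)
end

# Salting as described above: append extra characters before hashing.
def salted_md5(password, salt)
  md5_hex(password + salt)
end

plain  = md5_hex("submarine")
fixed  = salted_md5("submarine", "kangaroo")                    # dictionary-word salt
random = salted_md5("submarine", SecureRandom.alphanumeric(10)) # random 10-character salt

puts plain
puts fixed
puts random
```

Run it twice and the first two lines stay the same while the third changes, because a new random salt is generated each time.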

How does my website know my password then?

In most cases, it doesn’t. The site keeps the hashed version of your password, but it has no way of knowing what the password actually is in plain text. To check whether the password you enter when you log in matches what’s stored in the database, the site hashes what you typed and compares the result to what’s on file. So if your hashed password was stored as “8833f74b9da9cf81d33f6c6a79ac9985” and you entered “telescope” as your password, a program quickly converts your plain-text password to “8833f74b9da9cf81d33f6c6a79ac9985” and compares it to what’s stored. In this case, there’s a match, and you’re granted access to your account. If the site had salted your password before storing it by adding the word “pineapple” to the beginning, your stored password would instead be “0cf7664d30e8a72b6b423148578ddfba” (again, you can confirm by typing md5 -s 'pineappletelescope' in your terminal).

So, when you enter “telescope” into the login box, the website first adds “pineapple” to the front of your password, then hashes the result to compare with what’s stored in the database. You can see not only the importance of salting, but also of knowing exactly what the salt is. Without it (without knowing “pineapple,” in this example), the site couldn’t reproduce the stored hash from the password you entered.
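The login check described above fits in a few lines of Ruby. This is a sketch that assumes the site stores each user’s salt next to the salted hash; StoredUser, register, and valid_login? are hypothetical names, not anyone’s real code.

```ruby
require "digest"

# Hypothetical stored record: the salt and the salted hash, never the
# plain-text password.
StoredUser = Struct.new(:salt, :password_hash)

# Salt goes in front of the password, as in the "pineapple" example.
def hash_with_prefix_salt(salt, password)
  Digest::MD5.hexdigest(salt + password)
end

def register(password, salt)
  StoredUser.new(salt, hash_with_prefix_salt(salt, password))
end

# At login: re-hash what the user typed with the same salt and compare.
def valid_login?(user, attempt)
  hash_with_prefix_salt(user.salt, attempt) == user.password_hash
end

user = register("telescope", "pineapple")
puts valid_login?(user, "telescope")   # => true
puts valid_login?(user, "telescopes")  # => false
# Right password, but without the salt the stored hash can't be reproduced.
puts valid_login?(StoredUser.new("", user.password_hash), "telescope") # => false
```

The last line is the point of the previous paragraph: even the correct password fails if the salt is unknown.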

Looking for patterns

So, we can assume that Stratfor is at least smart enough to salt their passwords. The question is, can we take 860K+ salted, hashed passwords and find any patterns or similarities in them? From that, could we count how often each hash appears, assume the most frequent hashes belong to the most common passwords, and try to derive the algorithm that produces the salt? Can we get lucky and hope that Stratfor salted their passwords with either the username or email address of each user? Or did they use the same salt for every user? I would assume they wouldn’t use an email address, especially since a user can change their email address, so we’ll take that one out of the mix. I will, however, try the username as a salt, as that is typically something a user isn’t allowed to change.

The First Clue – No Duplicate Hashes

To begin, I sorted the 860,160 hashed passwords alphabetically and, interestingly (at least in the few thousand I quickly scanned), found no duplicates.

What does this mean?

It strongly suggests that a different salt is being used for each person.

Why?

Because in a list of 860,160 passwords, the chances of none being the same are infinitesimally small. Let’s say two people used the phrase “opensesame” as their password. The hash of this is:

e6078b9b1aac915d11b9fd59791030bf

Now let’s say that Stratfor salted every password with the phrase “fishbowl123” by appending it to the end of the user’s password before storing it. So, “opensesame” becomes “opensesamefishbowl123”, which is hashed as

8feb9db2775f81e3b152803bb9704fad

So, if even 2 of the 860,160 people had the password “opensesame”, we should see the hash “8feb9db2775f81e3b152803bb9704fad” show up at least twice. But there are no duplicates, which indicates that the same salt isn’t being used for everyone. This sample is far too large for no two people to share a password (any password). Since we learned above that the site must know the salt in order to check your password, we’ll assume that Stratfor derived each salt from something unique to the user.
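The duplicate check itself is simple to express in Ruby: count each hash and keep the ones that appear more than once. This sketch uses Array#tally (Ruby 2.7 or later) on a tiny synthetic list rather than the leaked data, and the per-user salt is a hypothetical sequential ID.

```ruby
require "digest"

# Count how often each hash appears and keep only the repeats.
def duplicate_hashes(hashes)
  hashes.tally.select { |_hash, count| count > 1 }
end

passwords = %w[opensesame letmein opensesame]

# Same salt for everyone: identical passwords give identical hashes.
shared_salt = passwords.map { |pw| Digest::MD5.hexdigest(pw + "fishbowl123") }

# A different salt per user (a hypothetical ID prepended): the repeat disappears.
per_user = passwords.each_with_index.map do |pw, i|
  Digest::MD5.hexdigest("#{i + 1}#{pw}")
end

p duplicate_hashes(shared_salt).values # => [2]
p duplicate_hashes(per_user)           # => {}
```

An empty result on 860K real hashes is what pointed toward a per-user salt.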

The User Record

The user records for the Stratfor file include information like

  • name,
  • Stratfor ID,
  • user ID,
  • user email address,
  • timezone,
  • picture,
  • signature,
  • theme,
  • last login date,
  • account creation date,

and a few trivial ones. We know that the salt most likely comes from one of these fields, and we know the salt needs to be unique to each user, so we can start eliminating some of them. The dates are interesting, but there is a good chance that plenty of users share a login date or account creation date, even down to the hour or minute, so we can’t assume those are unique. There will also be plenty of duplicate timezones, so that field can be eliminated as well. The theme (which I assume was some sort of color or account theme for each user) falls under the “duplicate” category too, but it also falls under another: fields whose value could change. For the salted password to keep working, the salt must always stay the same. The user’s email address, name, picture, and signature can all change too, so we’ll eliminate those from our list of possible salts.

That leaves us with 2 good options:

  • user ID
  • Stratfor ID

Because we know that the salt is unique to each user, we have a good starting point for our attack, using the two options above as our primary salt tests. We also know that Stratfor isn’t using a single secret string (something they’ve locked away in some file) as the salt for every user, because if they did, users who share a password would share a hash, and we’d see duplicates. We have none.
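The uniqueness half of this elimination can be checked mechanically. This is a sketch over a few made-up records; the field names are hypothetical stand-ins for the Stratfor columns. It can’t test whether a field is changeable (a unique email address would pass this filter), so that part remains a judgment call, as described above.

```ruby
# Keep only the columns whose values never repeat across records,
# since a per-user salt has to differ for every user.
def unique_columns(records)
  records.first.keys.select do |column|
    values = records.map { |record| record[column] }
    values.uniq.size == values.size
  end
end

# Made-up records with hypothetical field names.
records = [
  { user_id: 20, stratfor_id: 23087, timezone: "America/Chicago", created_on: "2011-03-02" },
  { user_id: 21, stratfor_id: 23088, timezone: "America/Chicago", created_on: "2011-03-02" },
  { user_id: 22, stratfor_id: 23090, timezone: "Europe/London",   created_on: "2011-03-04" }
]

p unique_columns(records) # => [:user_id, :stratfor_id]
```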

We have candidates for our salt. Now what?

To do all the password crunching and text analysis, I’ll be using my new friend, Ruby on Rails. Rails makes it really easy to spin up a quick database, load data into it, and do text manipulation. The first step is to clean up the list and load it into a database table. I took the entire Stratfor file, removed the extraneous columns, and imported the user records into a database.

Next, I created a model for attempts. The attempts rest on the premise that at least one of the 860K users will have one of the “10 most common passwords” (which, incidentally, were taken from the leak of 32 MILLION passwords from RockYou.com’s compromised systems).

The 10 passwords we’ll start with are:

  • 123456
  • 12345
  • 123456789
  • password
  • iloveyou
  • princess
  • 1234567
  • 12345678
  • abc123
  • monkey

What we’ll do is take each of the 10 passwords, add the user ID to the beginning and test it, then add the user ID to the end and test it, and then do the same with the Stratfor ID. For example, let’s say the user’s password hash is “3d50169ccfe06ecf1bdf4c63fb199bd9”, their user ID is “20,” and their Stratfor ID is “23087.”

I’ll take our first password, “123456,” prepend “20” to it to get “20123456,” then get the hash (md5 -s "20123456"):

11720f3fa65c0fe57212ba6f12af1af1

No match. So now I’ll try “123456” and append “20” to it to get “12345620,” then get the hash (md5 -s "12345620"):

594111f029cbea462f70398257ac0e7f

No match. Now I’ll try the same two tests with their Stratfor ID. Still no match? Then I’ll move on to the next of our Top 10 passwords, “12345,” and continue the test. For each password in our list, we have to try 4 different combinations. That’s 40 combinations for our 10 passwords, tried across 860,160 rows, which means over 34 million tries (40 × 860,160 = 34,406,400).
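Here is a plain-Ruby sketch of that four-way test, run in memory rather than through Rails models as I did originally. LeakedUser, SCHEMES, and find_salt_scheme are hypothetical names, and the two users are synthetic: the first is built so the Stratfor-ID-prefix scheme matches. The last line reproduces the 40 × 860,160 count.

```ruby
require "digest"

# The ten common passwords listed above.
COMMON_PASSWORDS = %w[123456 12345 123456789 password iloveyou
                      princess 1234567 12345678 abc123 monkey].freeze

# Hypothetical shape of one leaked row.
LeakedUser = Struct.new(:user_id, :stratfor_id, :password_hash)

# The four guesses per password: each ID, prepended and appended.
SCHEMES = {
  "user_id + password"     => ->(user, pw) { "#{user.user_id}#{pw}" },
  "password + user_id"     => ->(user, pw) { "#{pw}#{user.user_id}" },
  "stratfor_id + password" => ->(user, pw) { "#{user.stratfor_id}#{pw}" },
  "password + stratfor_id" => ->(user, pw) { "#{pw}#{user.stratfor_id}" }
}.freeze

# Hash every (scheme, password) guess for every user and count which
# scheme reproduces a stored hash.
def find_salt_scheme(users, passwords = COMMON_PASSWORDS)
  hits = Hash.new(0)
  users.each do |user|
    SCHEMES.each do |name, build|
      passwords.each do |pw|
        guess = Digest::MD5.hexdigest(build.call(user, pw))
        hits[name] += 1 if guess == user.password_hash
      end
    end
  end
  hits
end

# Synthetic users: the first is salted with its Stratfor ID in front.
users = [
  LeakedUser.new(20, 23087, Digest::MD5.hexdigest("23087123456")),
  LeakedUser.new(21, 23088, Digest::MD5.hexdigest("23088tH7rWslwj6"))
]

puts find_salt_scheme(users).max_by { |_name, count| count }.first
# => stratfor_id + password
puts SCHEMES.size * COMMON_PASSWORDS.size * 860_160 # => 34406400
```

Against real data, whichever scheme collects matches tells you how the salt is built.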

If none of these work, the odds of the salt being based on one of our test columns seem slim, at which point we might consider that the hash is built from more than one column (for example, prepending the Stratfor ID to the password and appending the user ID to the end). If that’s the case, the number of brute-force attempts multiplies with every column combination we have to try, which is bad news for this exercise but better news for those whose data is at risk.
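To see how the search widens, here’s a hypothetical sketch of the two-column case: one column prepended and a different one appended. With c candidate columns there are c × (c − 1) ordered prefix/suffix pairs, each of which has to be tried for every password and every user, on top of the single-column guesses. The column names and values are illustrative.

```ruby
# Every guess with one column in front of the password and a different
# column behind it, e.g. "20" + "monkey" + "23087".
def sandwich_guesses(user, password, columns)
  columns.permutation(2).map do |prefix, suffix|
    "#{user[prefix]}#{password}#{user[suffix]}"
  end
end

user = { user_id: 20, stratfor_id: 23087, created_on: "01-25-2012" }
guesses = sandwich_guesses(user, "monkey", user.keys)
puts guesses.size  # => 6
puts guesses.first # => 20monkey23087
```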

The Results

Armed with my list of 10 common passwords and the Stratfor hashes, I put Ruby to the test. Less than 20 minutes later (even running on an underpowered MacBook Air), the experiment was a success, and the results are stunning:

Of the 860,160 user accounts in the Stratfor file, 986 users had one of the ten common passwords. The salt, as it turns out, is the Stratfor ID, prepended to the user’s password. So, if your password happened to be “monkey,” and your Stratfor ID was “187519,” your stored password is the MD5 hash of “187519monkey.” (Incidentally, 14 of the 860,160 people had the password “monkey.” The most common, sadly, were “123456” (483 occurrences) and “password” (285 occurrences).)

Is this bad?

Yes. Someone nefarious who knows the salt column could run each user’s hash through a dictionary attack, and there is no doubt that the 986 figure would greatly increase, giving the attacker access to thousands of accounts.
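Once the scheme is known, the attack no longer has to guess the salt at all; it just hashes each wordlist entry with the user’s Stratfor ID in front and compares. A sketch with synthetic users and a tiny, hypothetical wordlist; a real attacker would use a list with millions of entries, and because MD5 is fast, that is still cheap.

```ruby
require "digest"

# With the scheme known (Stratfor ID prepended to the password), each word
# in a wordlist is hashed once per user and compared to that user's hash.
def crack_with_wordlist(users, wordlist)
  users.each_with_object({}) do |user, cracked|
    word = wordlist.find do |candidate|
      Digest::MD5.hexdigest("#{user[:stratfor_id]}#{candidate}") == user[:password_hash]
    end
    cracked[user[:stratfor_id]] = word if word
  end
end

# Synthetic users: one weak password, one random one.
users = [
  { stratfor_id: 187519, password_hash: Digest::MD5.hexdigest("187519monkey") },
  { stratfor_id: 23087,  password_hash: Digest::MD5.hexdigest("23087tH7rWslwj6") }
]

wordlist = %w[123456 password iloveyou monkey] # hypothetical, tiny
cracked = crack_with_wordlist(users, wordlist)
puts cracked[187519]     # => monkey
puts cracked.key?(23087) # => false
```

The second user survives only because a random password isn’t in any wordlist.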

It also means that it takes only a handful of people with bad passwords (in principle, just one) to expose a salt. If no one among the 860K users had used one of those top 10 passwords, there’s a good chance I would’ve moved on to another method, having found no matches.

What does this mean for Stratfor, and companies like them? You must do a better job of protecting our data. Salting is a good step toward protecting data, but if you don’t use it right, it’s only a minor stumbling block for someone with relatively little skill. Perhaps salting with data from multiple columns, or column data in reverse (maybe the username backwards), or a column on each end of the password (maybe the username and the account creation date, as in “usernamemonkey01-25-2012”) would be better. The insecurity of our personal data is troublesome, and breaches happen almost every day. I can only hope this will help those who keep our data become more responsible in protecting it.
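For concreteness, here is the last suggestion above as a Ruby sketch: the username in front of the password and the account creation date (formatted MM-DD-YYYY) behind it. The method name is hypothetical. The recipe still lives in code the site can read, so it’s only as strong as that code staying private, which is the same weakness this post exploited.

```ruby
require "date"
require "digest"

# Username prepended, account-creation date (MM-DD-YYYY) appended.
def composite_salted_input(username, password, created_on)
  "#{username}#{password}#{created_on.strftime('%m-%d-%Y')}"
end

input = composite_salted_input("username", "monkey", Date.new(2012, 1, 25))
puts input                        # => usernamemonkey01-25-2012
puts Digest::MD5.hexdigest(input) # the value that would be stored
```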

- @jbnunn