Linda!

I generate random passwords of 24 characters, chosen from a wide variety of characters, when registering on sites. I then use a password manager to remember each password, because of course it would be impossible to remember these.
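As a rough illustration of what such a generator can look like, here is a minimal sketch. The character pool and the function name `random_password` are my own assumptions, not a description of any particular password manager.

```python
import secrets
import string

# Assumed character pool: ASCII letters, digits and punctuation.
POOL = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 24) -> str:
    """Return `length` characters drawn independently from POOL."""
    # secrets.choice uses the operating system's secure random source.
    return "".join(secrets.choice(POOL) for _ in range(length))


print(random_password())
```

The `secrets` module is used instead of `random` because it is intended for security-sensitive values.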

This allows me to browse with a little more security, and it also has the intended side effect of discovering insecure sites. The passwords I choose are by design relatively hard to guess, hence relatively secure. A site that won't allow my relatively secure passwords, but only weaker ones, is by definition insecure.

Jump to the bottom for the two scary signs I promised, or read along for an easy-to-follow explanation of how passwords should be stored, why, and why those signs are absolutely horrific in terms of security.

How do passwords work?

A password works because it's a "word" that is hard for an attacker to guess. There are three main ways to attack a password login:

guess the password by intuition
attempt all possible passwords until the correct one is found
steal the passwords stored in the system they are meant to protect

The first two ways are similar to cracking a safe, and have the same countermeasures. You should never use an easy-to-guess number as your safe combination; similarly, using a random password makes it hard to guess intuitively. Safes with longer combinations are safer, and equally a long password drawn from a wide variety of possible characters is safer.
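To put numbers on "longer and more varied is safer", here is a small sketch that counts how many candidates an attacker must try in the worst case. The three scenarios and the figure of 94 printable ASCII characters are illustrative assumptions.

```python
import math


def combinations(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` symbols."""
    return alphabet_size ** length


def bits(alphabet_size: int, length: int) -> float:
    """The same quantity expressed in bits (log2 of the count)."""
    return length * math.log2(alphabet_size)


scenarios = [
    ("4-digit safe combination", 10, 4),
    ("7 symbols from a-z and 0-9", 36, 7),
    ("24 printable ASCII symbols", 94, 24),
]
for label, size, length in scenarios:
    print(f"{label}: about {bits(size, length):.0f} bits")
```

Each extra symbol multiplies the number of candidates by the alphabet size, which is why both length and variety matter.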

What about the third avenue of attack, stealing the passwords from the site itself? It sounds much harder than the other two options, but there is an added incentive for hackers: stealing the stored passwords from a site compromises all the accounts at the same time, whereas guessing or trying every combination only breaks one password at a time.

Sites will therefore try to protect the passwords. There's no doubt that some form of the password needs to be stored, otherwise how could they check whether your login is correct? Not all strategies to secure these secrets are equally safe, but fortunately there are some standard, safe ways.

Let's see how a system can store a password. They will need to have a list of user names, and for each they need to store the corresponding password.

unencrypted passwords are bad

Plain text: a system could store passwords in plain text and this would basically give all the user names and passwords to a hacker that stole this list. This is bad.

What we can do is use encryption. Encryption is a technique that makes it hard to read a "secret" without knowing a "key". There are many ways to encrypt a secret, and here is an example. Feel free to skip if it's too technical (but maybe instead try to do it, it's fun!)

Suppose we want to encrypt the password hunter2. A possible way to do so is to move "forwards or backwards" in the alphabet in a predefined way. This is the key, and let's pretend this key is "7 forward, 1 forward, 12 forward, 9 forward, 12 forward, 5 forward, 15 forward", or in short "7-1-12-9-12-5-15".

We can create the "encrypted" version of the password by following the instructions.

We write the password and pair it with the key (if the key is longer, just use the parts we need, if the password is longer, just repeat the key)

In order to count the steps forwards we can use this "extended alphabet", which also contains digits: abcdefghijklmnopqrstuvwxyz0123456789

so we start with h, go forwards 7 letters and find o.

then u, go forwards one and get v, n plus 12 is z, t plus 9 is 2 and so on. This way we can write an encrypted word, letter by letter, in the table we prepared

once we get to 2 we run into a problem — there are not enough letters in the alphabet! No problem, we start back from the left. Therefore, to perform 2 plus 15 we count 1 and 3, 2 and 4, 3 and 5, …, 7 and 9, 8 and a, …, 15 and h.

Finally we have a result!

This operation of "sum and wrap" is generally called a "modular sum" or a "sum modulo 36" and I'll use the circled plus "⊕" symbol for it, because it's like a sum, but different.
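The steps above can be sketched in a few lines of Python. The names `shift` and `encrypt` are my own, and the key is the "7-1-12-9-12-5-15" from the example.

```python
from itertools import cycle

# The 36-symbol extended alphabet: letters first, then digits.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def shift(symbol: str, steps: int) -> str:
    """Move `steps` places forward, wrapping around (a sum modulo 36)."""
    return ALPHABET[(ALPHABET.index(symbol) + steps) % len(ALPHABET)]


def encrypt(password: str, key: list[int]) -> str:
    """Pair each symbol with a key step, repeating the key if it is too short."""
    return "".join(shift(s, k) for s, k in zip(password, cycle(key)))


KEY = [7, 1, 12, 9, 12, 5, 15]
print(encrypt("hunter2", KEY))  # -> ovz2qwh
```

The `% len(ALPHABET)` is the "start back from the left" rule: 2 plus 15 runs off the end of the alphabet and lands on h.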

To decrypt we can do the same operation, but follow the instructions in reverse.
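"In reverse" here means moving backwards by the same number of steps. A minimal sketch, assuming the toy scheme described above and using my own function names:

```python
from itertools import cycle

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def unshift(symbol: str, steps: int) -> str:
    """Move `steps` places backward, wrapping around to the right-hand end."""
    return ALPHABET[(ALPHABET.index(symbol) - steps) % len(ALPHABET)]


def decrypt(ciphertext: str, key: list[int]) -> str:
    """Undo the toy encryption by walking each symbol back by its key step."""
    return "".join(unshift(s, k) for s, k in zip(ciphertext, cycle(key)))


print(decrypt("ovz2qwh", [7, 1, 12, 9, 12, 5, 15]))  # -> hunter2
```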

This is an example of weak encryption, and it is just a toy: there are much safer ways to encrypt data.

The storage will now look like

passwords encrypted with a shared key are bad

Single password encryption: A system could store the passwords safely encrypted with a master password, and this would make it hard for the attacker... unless they also steal the master password to read them!

A way around this attack is to store each password encrypted with "itself". Here's how it could be done with our toy encryption scheme.

In order to do so a key must be derived from the password. Looking at the previous key "7-1-12-9-12-5-15" and extended alphabet "abcdefghijklmnopqrstuvwxyz0123456789" we notice two facts.

Firstly, an instruction to move forwards by 36 leaves us where we started, a move by 37 has the same result as a move by one, etc. This is because we "wrap around" and basically limit the possible "moves" to 36 different displacements.

Secondly, since there are obviously 36 letters in our extended alphabet, each type of "move" can be mapped to a letter!

Therefore, by looking up our "key" in the extended alphabet we can express it differently: 7 forwards corresponds to a "g", 1 to an "a", etc.

Our key in the example above can be represented as the word galileo.

Of course, that's because I chose the instructions in a special way! But this shows that any word can be "mapped" to a key by looking up each letter's position in the alphabet.

So: to derive a key from "hunter2" we can look up each letter's ordinal position in the alphabet: "h" is the 8th letter, "u" is the 21st, … and we get the key "8-21-14-20-5-18-29".
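The mapping between words and keys can be sketched as follows. The helper names `derive_key` and `key_to_word` are hypothetical, chosen only for this illustration.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def derive_key(word: str) -> list[int]:
    """Map each symbol to its 1-based position in the extended alphabet."""
    return [ALPHABET.index(symbol) + 1 for symbol in word]


def key_to_word(key: list[int]) -> str:
    """Map each move back to a symbol; moves that differ by 36 are equivalent."""
    return "".join(ALPHABET[(steps - 1) % len(ALPHABET)] for steps in key)


print(derive_key("hunter2"))                     # -> [8, 21, 14, 20, 5, 18, 29]
print(key_to_word([7, 1, 12, 9, 12, 5, 15]))     # -> galileo
print(key_to_word([43, 37, 12, 9, 12, 5, 15]))   # 43 and 37 wrap to 7 and 1 -> galileo
```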

Finally, we can encrypt "hunter2" with a key derived from itself:

The storage will look like this

passwords encrypted with themselves as the key

Encrypt password with itself: They could store the password safely encrypted using the password as a key.

This is interesting because you can recover a password only if you know it already. This is enough to test whether the password a user enters is correct, but it makes it hard to steal! However, there are some issues with this approach: firstly it's possible to guess the password length from the encrypted version and secondly two users with the same password would still have the same encrypted password.
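A minimal sketch of this login check, and of the two issues just mentioned, under the assumptions of the toy scheme. The user table and the names `encrypt_with_itself` and `check_login` are made up for illustration.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def encrypt_with_itself(password: str) -> str:
    """Shift every symbol forward by its own 1-based alphabet position."""
    encrypted = []
    for symbol in password:
        index = ALPHABET.index(symbol)
        steps = index + 1  # the key derived from the symbol itself
        encrypted.append(ALPHABET[(index + steps) % len(ALPHABET)])
    return "".join(encrypted)


# Hypothetical user table: only the encrypted form is stored.
passwords = {"alice": "hunter2", "bob": "hunter2", "carol": "abc"}
stored = {user: encrypt_with_itself(pw) for user, pw in passwords.items()}


def check_login(user: str, entered: str) -> bool:
    """Encrypt what the user typed and compare it with the stored value."""
    return stored[user] == encrypt_with_itself(entered)


print(check_login("alice", "hunter2"))   # -> True
print(check_login("alice", "hunter3"))   # -> False
print(len(stored["carol"]))              # -> 3, the length of "abc" leaks
print(stored["alice"] == stored["bob"])  # -> True, equal passwords are visible
```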

There is one important thing that we learnt so far: there is no need to store any password, there are ways of verifying a password without storing it in a recoverable way.

In general this is known as password hashing, and it turns out we can create a hash from an encrypted password in a simple way. The advantage is that the hash always has the same length, no matter how long the password is.

There are many ways of doing this, but this is particularly simple and easy to explain.

We take the password "hunter2" and encrypt it with itself as above. We get "pf1dj9u", which by construction has the same length as hunter2 and thus "leaks" that information to an attacker.

We want to get a fixed-length "signature" or hash of, say, 4 letters. In order to do so, we split pf1dj9u into groups of four letters. In the last group, we might have fewer than four letters, in which case we repeat pf1d... until the group is composed of four letters as well.

If we have 1 group, we're done. In this case there are 2 groups and we'll write the second below the first

Using the rules we discovered in encryption we can transform the second block into a key: j corresponds to a 10, 9 to a 36, etc. And we can encrypt the first block with this key

zfmt is the signature of hunter2. To verify a user entered password we hash it and compare it to zfmt. Cool, uh?
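The whole procedure can be sketched as below. This is my own mechanical reading of the written rules (encrypt with itself, split into blocks of four, pad the last block from the start of the encrypted text, fold each block into the previous result), so individual characters may not match every hand-worked value above. The sketch therefore checks properties rather than specific strings.

```python
from itertools import cycle, islice

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
BLOCK = 4  # length of the signature


def encrypt(text: str, key_word: str) -> str:
    """Shift each symbol of `text` by the 1-based position of the paired key symbol."""
    return "".join(
        ALPHABET[(ALPHABET.index(s) + ALPHABET.index(k) + 1) % len(ALPHABET)]
        for s, k in zip(text, cycle(key_word))
    )


def toy_hash(password: str) -> str:
    """Return a 4-symbol signature of `password` (toy scheme, not secure)."""
    if not password:
        raise ValueError("password must not be empty")
    encrypted = encrypt(password, password)
    blocks = [encrypted[i:i + BLOCK] for i in range(0, len(encrypted), BLOCK)]
    # Pad the last block with symbols taken from the start of the encrypted text.
    missing = BLOCK - len(blocks[-1])
    blocks[-1] += "".join(islice(cycle(encrypted), missing))
    # Fold: use each following block as the key for the running result.
    signature = blocks[0]
    for block in blocks[1:]:
        signature = encrypt(signature, block)
    return signature


def verify(entered: str, stored_signature: str) -> bool:
    """Hash what the user typed and compare it with the stored signature."""
    return toy_hash(entered) == stored_signature


stored = toy_hash("hunter2")
print(len(stored))                  # -> 4
print(verify("hunter2", stored))    # -> True
print(verify("incorrect", stored))  # -> False
```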

Let's see what happens with a wrong password: incorrect. The corresponding key is 9-14-3-15-18-18-5-3-20. We encrypt one with the other:

Then proceed to hash. Let's split it into three 4 letter blocks (the third needs to be padded 3 times)

These are the first 2 blocks:

And this is the third, padded with 3 letters from the first:

So the hash is yihf, which is not the same as zfmt and the user is not verified. However, repeating the process with the correct hunter2 will verify the user.

The user data now looks like

hashed passwords still leak too much information

Hashed passwords: These do not leak password length, but equal passwords have equal hashes

Now, this solves the problem of "leaking" the password length, but not the problem that two equal passwords will get the same hash. Evidently if someone knows that the password "hunter2" is stored as "zfmt", then they can log in as any user who uses the same password. This may sound unlikely but it's actually a common scenario, unfortunately.

More subtly, we can take the first few million common passwords, hash them and create a "decoding table" which assigns a working password to a few million different hashes! This is not a theoretical threat: they are called rainbow tables and can be downloaded from security-oriented sites or generated.
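Here is a minimal sketch of such a decoding table. It is a plain precomputed dictionary, whereas real rainbow tables use a more compact chain-based structure; SHA-256 stands in for the toy hash, and the password list and stolen data are invented for illustration.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """SHA-256 as a stand-in for any unsalted hash (an assumption of this sketch)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Illustrative list; a real attacker would use millions of entries.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "hunter2", "letmein"]

# Built once, reusable against every site that uses the same unsalted hash.
decoding_table = {unsalted_hash(p): p for p in COMMON_PASSWORDS}

# Hypothetical stolen user data: user name -> stored hash.
stolen = {
    "alice": unsalted_hash("hunter2"),
    "bob": unsalted_hash("Zr8#tq0Lw2"),
}

cracked = {
    user: decoding_table[stored]
    for user, stored in stolen.items()
    if stored in decoding_table
}
print(cracked)  # -> {'alice': 'hunter2'}
```

The random password survives only because it is not in the list; the common one falls with a single dictionary lookup.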

How can we make each password unique, and harder to crack? This mechanism is called "salting" the hash and it is quite simple.

Suppose that we generate a random number each time a user sets her password. If the number is random enough, and big enough, we can append it to the password and make it much stronger!

For example: if the user's password is "hunter2", we can generate a random 3 digit number, called "salt", like 238, and hash "hunter2238". This will have a different hash from "hunter2" (in fact, it is mc24). For a different user who also uses "hunter2" we will generate a different random 3 digit number, like 980, and store the hash for "hunter2980", which is v02i.

This of course leaves us with a problem: how do we verify a user giving a password of "hunter2" if the hash is different? The simple answer is that we store the salt along with the hash.
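A minimal sketch of storing and checking a salted hash. SHA-256 stands in for the toy hash, the 3-digit salt mirrors the example and is far too small for real use, and the names `register` and `verify` are my own.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: str) -> str:
    """Hash the password with the salt appended, as in "hunter2238"."""
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


def register(password: str) -> tuple[str, str]:
    """Return the (salt, hash) pair to store for a new user."""
    salt = f"{secrets.randbelow(1000):03d}"  # random 3-digit salt, toy-sized
    return salt, salted_hash(password, salt)


def verify(entered: str, record: tuple[str, str]) -> bool:
    """Re-hash what the user typed with the stored salt and compare."""
    salt, stored = record
    return salted_hash(entered, salt) == stored


alice = register("hunter2")
print(verify("hunter2", alice))  # -> True
print(verify("hunter3", alice))  # -> False
# The same password with two different salts gives unrelated hashes.
print(salted_hash("hunter2", "238") == salted_hash("hunter2", "980"))  # -> False
```

The salt is not a secret: its job is to make every stored hash unique, so a precomputed table built for one salt is useless for another.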

The user data now looks much more random, and no obvious facts about the passwords are leaked if someone steals the data.

salted hashed passwords are safer

Hashed and salted passwords: the current standard way to store passwords.

This scheme, with a few modifications such as using proper hashing algorithms and many, many passes of hashing instead of the single pass I've shown, is the current state of the art in password storage.
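For comparison, here is a sketch of what "a proper algorithm with many passes" can look like using Python's standard library. PBKDF2 is one option among several, and the iteration count and salt size are assumptions to be tuned, not recommendations taken from this post.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune to your hardware and current guidance


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage, using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # random 16-byte salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))    # -> True
print(verify_password("incorrect", salt, digest))  # -> False
```

The many iterations make each guess expensive for an attacker, while a single legitimate login stays fast enough.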

The two things that should always make you scared

Invalid characters in the password

Are you sure you are encrypting this at all?

The more variety of characters we can choose from, the harder the password is to guess. Since modern, valid hashing algorithms always return a number, what possible reason could there be to stop people from using strange characters?

Normally the reason is that the software or site you are dealing with is not hashing your password! If all the software has to do is hash the password, there is absolutely no difference in the output if you use strange, accented characters or punctuation in your password.

Passwords which are too long

Limiting password length to 20 chars? Why you no hash?

Besides the obvious idiocy of limiting password length, this is a clear indication that the site is not hashing your password. If they did, then password length would be irrelevant as hashes have, by construction, a fixed size.
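Both points can be checked with a short sketch: a hash accepts any characters and any length, and its output always has the same size. SHA-256 is used purely as an example of a real hash function, and the sample passwords are invented.

```python
import hashlib


def digest(password: str) -> str:
    """Hash the UTF-8 bytes of the password; the result is always 64 hex characters."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


samples = [
    "hunter2",
    "pässwörd ¡con! ñ & 日本語",  # accents, punctuation, non-Latin scripts
    "x" * 1000,                   # far longer than any 20-character limit
]
for sample in samples:
    print(len(sample), len(digest(sample)))  # the second number is always 64
```

Whatever goes in, what gets stored has the same shape, so a site that hashes has no storage reason to restrict characters or length.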

I am the Chief R&D at BaxEnergy, developer, hacker, blogger, conference lecturer. Bio: ex Stack Overflow core, ex Toptal core.

