September 26, 2015 by Marco Cecconi
When registering on sites, I generate random passwords of 24 characters chosen from a wide variety of characters. I then use a password manager to remember them because, of course, it would be impossible to remember these.
This allows me to browse with a little more security, and it also has the intended side effect of uncovering insecure sites. The passwords I choose are, by design, hard to guess and hence relatively secure. A site that won't accept my relatively secure passwords, but only weaker ones, is by definition insecure.
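If you want to try this at home, here is a minimal Python sketch of such a generator. It assumes that "a wide variety" means ASCII letters, digits and punctuation (94 symbols), and it uses the standard library's secrets module because, unlike random, it is designed for security-sensitive randomness. It is an illustration, not any particular password manager's generator.

```python
import secrets
import string

# Assumption: "a wide variety" = ASCII letters, digits and punctuation (94 symbols).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24, alphabet: str = ALPHABET) -> str:
    """Return a random password, choosing each character independently.

    secrets.choice draws from a cryptographically secure random source.
    """
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```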
Jump to the bottom for the two scary signs I promised, or read along for an easy-to-follow explanation of how passwords should be stored, why, and why those signs are absolutely horrific in terms of security.
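A password works because it is a "word" that is hard for an attacker to guess. There are three main ways to attack a password login: guessing the password, attempting one candidate after another until one works, and stealing the password from the site itself.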
The first two ways are similar to cracking a safe, and have the same countermeasures. You should never use an easy-to-guess number as your safe combination; similarly, a random password is hard to guess intuitively. Safes with longer combinations are safer, and in the same way, long passwords drawn from a wide variety of possible characters are safer.
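To put numbers on "longer and more varied is safer": if every character is picked at random from an alphabet of A symbols, a password of length L is one of A^L possibilities, which is L·log2(A) bits. A small sketch, assuming uniformly random characters (human-chosen passwords have far less entropy than this):

```python
import math


def search_space_bits(alphabet_size: int, length: int) -> float:
    """Bits needed to enumerate every password of `length` symbols
    drawn uniformly at random from `alphabet_size` symbols."""
    # alphabet_size ** length possible passwords; take log2 of that count
    return length * math.log2(alphabet_size)


# An 8-digit safe combination versus a 24-character password from 94 symbols.
print(f"8 digits:       {search_space_bits(10, 8):.1f} bits")
print(f"24 of 94 chars: {search_space_bits(94, 24):.1f} bits")
```

Under that assumption, 24 characters from 94 symbols give about 157 bits, against roughly 27 bits for an 8-digit combination.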
What about the third avenue of attack, stealing the password from the site itself? It sounds much harder than the other two options, but hackers have an added incentive to steal passwords from a site: doing so compromises all the accounts at the same time, whereas guessing and attempting only crack one password at a time.
Sites will therefore try to protect the passwords. There's no doubt that some form of the password needs to be stored; otherwise, how could the site check that your login is correct? On the other hand, not all strategies to secure these secrets are equally safe, but fortunately there are some standard, safe ways.
Let's see how a system can store a password. It needs a list of user names and, for each of them, the corresponding password.
Plain text: a system could store passwords in plain text, but this would hand all the user names and passwords to any hacker who stole this list. This is bad.
What we can do instead is use encryption. Encryption is a technique that makes it hard to read a "secret" without knowing a "key". There are many ways to encrypt a secret, and here is an example. Feel free to skip it if it's too technical (but maybe try it instead, it's fun!)
Suppose we want to encrypt the password hunter2. A possible way to do so is to move "forwards or backwards" in the alphabet in a predefined way. This is the key; let's pretend it is "7 forwards, 1 forward, 12 forwards, 9 forwards, 12 forwards, 5 forwards, 15 forwards", or in short "7-1-12-9-12-5-15". We can create the "encrypted" version of the password by following the instructions.
We write the password and pair it with the key (if the key is longer, we just use the parts we need; if the password is longer, we just repeat the key).
To count the steps forwards, we can use this "extended alphabet", which also contains the digits: abcdefghijklmnopqrstuvwxyz0123456789
So we start with h, go forwards 7 letters and find o. Then comes u: go forwards one and get v; n plus 12 is z; t plus 9 is 2; and so on. This way we can write the encrypted word, letter by letter, in the table we prepared. Once we get to 2, we run into a problem: there are not enough letters in the alphabet! No problem, we start back from the left. Therefore, to perform 2 plus 15 we count 1 and 3, 2 and 4, 3 and 5, …, 7 and 9, 8 and a, …, 15 and h. Finally we have a result: hunter2 becomes ovz2qwh.
This operation of "sum and wrap" is generally called a "modular sum" or a "sum modulo 36" and I'll use the circled plus "⊕" symbol for it, because it's like a sum, but different.
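For readers who prefer code to pencil and paper, here is a sketch of the toy scheme in Python. The names (ALPHABET, oplus, encrypt) are my own, and, like the example, it only handles lowercase letters and digits; any other character makes str.index raise ValueError.

```python
# The extended alphabet: 26 letters followed by 10 digits.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
N = len(ALPHABET)  # 36


def oplus(letter: str, steps: int) -> str:
    """Move `steps` positions forwards, wrapping around: the sum modulo 36."""
    return ALPHABET[(ALPHABET.index(letter) + steps) % N]


def encrypt(password: str, key: list[int]) -> str:
    """Pair each letter with a step of the key, repeating the key if it is shorter."""
    return "".join(oplus(ch, key[i % len(key)]) for i, ch in enumerate(password))


print(encrypt("hunter2", [7, 1, 12, 9, 12, 5, 15]))  # ovz2qwh
```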
To decrypt, we do the same operation but follow the instructions in reverse, moving backwards instead of forwards.
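In a Python sketch of the toy scheme (illustrative names again), decryption just subtracts the steps instead of adding them. Python's % operator returns a value between 0 and 35 even for negative numbers, which takes care of wrapping around to the right.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
N = len(ALPHABET)  # 36


def encrypt(password: str, key: list[int]) -> str:
    # Move each letter forwards by its key step (key repeated as needed).
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + key[i % len(key)]) % N]
        for i, ch in enumerate(password)
    )


def decrypt(secret: str, key: list[int]) -> str:
    # Same instructions walked backwards; % N wraps negative positions around.
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - key[i % len(key)]) % N]
        for i, ch in enumerate(secret)
    )


key = [7, 1, 12, 9, 12, 5, 15]
print(decrypt("ovz2qwh", key))  # hunter2
```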
This is an example of bad encryption, and it is just a toy: there are much safer ways to encrypt data.
The storage will now look like
Single password encryption: A system could store the passwords safely encrypted with a master password, and this would make it hard for the attacker... unless they also steal the master password to read them!
A way around this attack is to store each password encrypted with "itself". Here's how it could be done with our toy encryption scheme.
In order to do so, a key must be derived from the password. Looking at the previous key "7-1-12-9-12-5-15" and the extended alphabet "abcdefghijklmnopqrstuvwxyz0123456789", we notice two facts. Firstly, an instruction to move forwards by 36 leaves us where we started, a move by 37 has the same result as a move by one, and so on. This is because we "wrap around", which limits the possible moves to 36 different displacements.
Secondly, since there are exactly 36 letters in our extended alphabet, each type of move can be mapped to a letter!
Therefore, by looking up our key in the extended alphabet we can express it differently: 7 forwards corresponds to "g", 1 to "a", and so on. Our key in the example above can be represented as the word galileo. Of course, that's because I chose the instructions in a special way! But this shows that any word can be mapped to a key by looking up each letter's position in the alphabet.
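The mapping works in both directions. In this sketch (function names are illustrative) a letter maps to its 1-based position in the extended alphabet, and a step maps back to a letter with wrap-around, so a move of 37 becomes "a" and a move of 36 becomes "9".

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def word_to_key(word: str) -> list[int]:
    """Each letter's 1-based position in the extended alphabet: a=1, ..., 9=36."""
    return [ALPHABET.index(ch) + 1 for ch in word]


def key_to_word(key: list[int]) -> str:
    """Inverse mapping; (steps - 1) % 36 wraps 37 back to "a" and 36 to "9"."""
    return "".join(ALPHABET[(steps - 1) % len(ALPHABET)] for steps in key)


print(word_to_key("galileo"))                 # [7, 1, 12, 9, 12, 5, 15]
print(key_to_word([7, 1, 12, 9, 12, 5, 15]))  # galileo
```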
So, to derive a key from "hunter2", we can look up each letter's ordinal position in the alphabet: "h" is the 8th letter, "u" is the 21st, …, and we get the key "8-21-14-20-5-18-29". Finally, we can encrypt "hunter2" with a key derived from itself.
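Deriving the key and encrypting with it can be chained into one function. This sketch repeats the illustrative helpers of the toy scheme so that it runs on its own; encrypt_with_itself is a hypothetical name.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
N = len(ALPHABET)  # 36


def word_to_key(word: str) -> list[int]:
    # 1-based position of each letter in the extended alphabet.
    return [ALPHABET.index(ch) + 1 for ch in word]


def encrypt(password: str, key: list[int]) -> str:
    # Move each letter forwards by its key step, wrapping modulo 36.
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + key[i % len(key)]) % N]
        for i, ch in enumerate(password)
    )


def encrypt_with_itself(password: str) -> str:
    """Encrypt the password using a key derived from the password itself."""
    return encrypt(password, word_to_key(password))


print(word_to_key("hunter2"))  # [8, 21, 14, 20, 5, 18, 29]
print(encrypt_with_itself("hunter2"))
```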
The storage will look like this
Encrypt password with itself: They could store the password safely encrypted using the password as a key.
This is interesting because you can recover a password only if you already know it. That is enough to test whether the password a user enters is correct, but it makes the password hard to steal! However, there are some issues with this approach: firstly, the password length can be read off the encrypted version, and secondly, two users with the same password still have the same encrypted password.
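Checking a login attempt means re-encrypting what the user typed and comparing it with the stored value; nothing is ever decrypted. The sketch below (hypothetical users and helper names) also shows both weaknesses. It uses a shortcut: a letter at position i (counting from 0) moves forwards by its own ordinal i + 1, so it lands on 2i + 1, modulo 36.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
N = len(ALPHABET)  # 36


def encrypt_with_itself(password: str) -> str:
    # Each letter moves forwards by its 1-based position: i + (i + 1) = 2i + 1.
    return "".join(ALPHABET[(2 * ALPHABET.index(ch) + 1) % N] for ch in password)


def verify(stored: str, attempt: str) -> bool:
    """Re-encrypt the attempt and compare; no decryption needed."""
    return encrypt_with_itself(attempt) == stored


stored_alice = encrypt_with_itself("hunter2")
stored_bob = encrypt_with_itself("hunter2")

print(verify(stored_alice, "hunter2"))    # True
print(verify(stored_alice, "incorrect"))  # False
print(len(stored_alice))                  # 7: the password length leaks
print(stored_alice == stored_bob)         # True: equal passwords look equal
```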
We have learnt one important thing so far: there is no need to store the password itself, because there are ways of verifying a password without storing it in a recoverable form.
In general, this is known as password hashing, and it turns out we can build a hash from the encrypted password in a simple way. The advantage is that the hash always has the same length, no matter how long the password is.
There are many ways of doing this, but this is particularly simple and easy to explain.
We take the password "hunter2" and encrypt it with itself as above. We get "pf1dj9v", which by construction has the same length as hunter2 and thus "leaks" that information to an attacker. We want to get a fixed-length "signature", or hash, of, say, 4 letters. In order to do so, we split pf1dj9v into groups of four letters. The last group might have fewer than four letters, in which case we pad it by repeating pf1d... from the start until it is four letters long as well. If we have 1 group, we're done. In this case there are 2 groups, pf1d and j9vp, and we'll write the second below the first.
Using the rules we discovered in encryption, we can transform the second block into a key: j corresponds to 10, 9 to 36, v to 22 and p to 16. We can then encrypt the first block with this key:
zfnt is the signature of hunter2. To verify a password entered by a user, we hash it and compare the result to zfnt. Cool, huh?
Let's see what happens with a wrong password: incorrect. The corresponding key is 9-14-3-15-18-18-5-3-20. We encrypt one with the other, getting r1f399jfd, and then proceed to hash. Let's split it into three 4-letter blocks (the third needs to be padded 3 times).
These are the first 2 blocks, r1f3 and 99jf, which combine into r1p9.
And this is the third, padded with 3 letters from the first: dr1f.
Combining r1p9 with it, the hash is vjhf, which is not the same as zfnt, and the user is not verified. However, repeating the process with the correct hunter2 will verify the user.
The user data now looks like
Hashed passwords: These do not leak password length, but equal passwords have equal hashes
Now, this solves the problem of "leaking" the password length, but not the problem that two equal passwords get the same hash. Evidently, if someone knows that the password "hunter2" is stored as "zfnt", then they can log in as any user who uses the same password. This may sound unlikely, but unfortunately it's actually a common scenario.
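Here is the whole toy hash as a Python sketch. The steps described above leave one detail open, namely how to combine more than two blocks; this sketch assumes that the running result is encrypted with each following block in turn (because this toy ⊕ simply adds positions, the order does not change the result anyway). It also assumes that a password shorter than four letters is padded by repeating itself. The function names are illustrative, and this is a toy: do not use it for anything real.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
N = len(ALPHABET)  # 36
BLOCK = 4          # hash length used in the worked example


def encrypt_with_itself(password: str) -> str:
    # Each letter moves forwards by its own 1-based position: i + (i + 1).
    return "".join(ALPHABET[(2 * ALPHABET.index(ch) + 1) % N] for ch in password)


def combine(block: str, key_block: str) -> str:
    # Encrypt `block` with the key spelled by `key_block` (a=1, ..., 9=36).
    return "".join(
        ALPHABET[(ALPHABET.index(c) + ALPHABET.index(k) + 1) % N]
        for c, k in zip(block, key_block)
    )


def toy_hash(password: str) -> str:
    if not password:
        raise ValueError("password must not be empty")
    enc = encrypt_with_itself(password)
    # Pad the last group with letters repeated from the start.
    enc += (enc * BLOCK)[: -len(enc) % BLOCK]
    blocks = [enc[i : i + BLOCK] for i in range(0, len(enc), BLOCK)]
    # Fold: encrypt the running result with each following block as the key.
    result = blocks[0]
    for block in blocks[1:]:
        result = combine(result, block)
    return result


def verify(attempt: str, stored_hash: str) -> bool:
    return toy_hash(attempt) == stored_hash


stored = toy_hash("hunter2")
print(len(stored))                              # 4
print(verify("hunter2", stored))                # True
print(verify("incorrect", stored))              # False
print(len(toy_hash("amuchlongerpassword123")))  # 4, however long the password
```

Two users who both pick hunter2 still end up with identical stored hashes, which is exactly the weakness described above.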
More subtly, an attacker can take the few million most common passwords, hash them and create a "decoding table" that maps a few million different hashes back to a working password! This is not a theoretical threat: such precomputed tables, the best known being rainbow tables, can be downloaded from security-oriented sites or generated.
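A sketch of such a decoding table, swapping the standard library's SHA-256 in for the toy hash. The password list and the stolen users are made up for illustration, and a plain dictionary is the simplest form of this attack: real rainbow tables store chains of hashes, trading extra computation at lookup time for much less storage.

```python
import hashlib


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stand-in for "the few million most common passwords".
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein", "hunter2"]

# Built once, offline: hash -> a password that produces it.
decoding_table = {sha256_hex(p): p for p in COMMON_PASSWORDS}

# Later: a stolen list of unsalted hashes (hypothetical users).
stolen = {
    "alice": sha256_hex("hunter2"),
    "bob": sha256_hex("correct horse battery staple"),
}

for user, digest in stolen.items():
    print(user, "->", decoding_table.get(digest, "<not in table>"))
```

Alice's common password is recovered with a single dictionary lookup; Bob's unusual one is not in the table.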
How can we make each stored hash unique, and harder to crack? The mechanism is called "salting" the hash, and it is quite simple.
Suppose that we generate a random number each time a user sets her password. If the number is random enough, and big enough, we can append it to the password before hashing, so that the resulting hash is unique and no precomputed table will contain it!
For example, if the user's password is "hunter2", we can generate a random 3-digit number, called a "salt", like 238, and hash "hunter2238". This will have a different hash from "hunter2" (in fact, it is nd35). For a different user who also uses "hunter2", we will generate a different random 3-digit number, like 980, and store the hash of "hunter2980", which is xx3j.
This, of course, leaves us with a problem: how do we verify a user giving a password of "hunter2" if the hash is different? The simple answer is that we store the salt along with the hash.
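The salted version of the toy scheme, sketched in Python. The toy hash is repeated in compact form so the snippet runs on its own; store, verify and new_salt are illustrative names, and the 3-digit salt only mirrors the example (a real salt would be much longer, for instance 16 random bytes).

```python
import secrets

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
N = len(ALPHABET)  # 36
BLOCK = 4


def toy_hash(text: str) -> str:
    # Same toy hash: self-encrypt, pad from the start, fold the 4-letter blocks.
    enc = "".join(ALPHABET[(2 * ALPHABET.index(ch) + 1) % N] for ch in text)
    enc += (enc * BLOCK)[: -len(enc) % BLOCK]
    result = enc[:BLOCK]
    for i in range(BLOCK, len(enc), BLOCK):
        result = "".join(
            ALPHABET[(ALPHABET.index(c) + ALPHABET.index(k) + 1) % N]
            for c, k in zip(result, enc[i : i + BLOCK])
        )
    return result


def new_salt() -> str:
    # Random 3-digit salt as in the example; far too small for real use.
    return f"{secrets.randbelow(1000):03d}"


def store(password: str) -> tuple[str, str]:
    """Return (salt, hash) to save in the user's record."""
    salt = new_salt()
    return salt, toy_hash(password + salt)


def verify(attempt: str, salt: str, stored_hash: str) -> bool:
    # Append the stored salt again so the same hash is recomputed.
    return toy_hash(attempt + salt) == stored_hash


alice = store("hunter2")
bob = store("hunter2")
print(alice, bob)  # same password, usually different salts and hashes
print(verify("hunter2", *alice), verify("incorrect", *alice))
```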
The user data now looks much more random, and no obvious facts about the passwords are leaked if someone steals the data.
Hashed and salted passwords: the current standard way to store passwords.
With a few modifications, such as using a proper hashing algorithm and many, many passes of hashing instead of the single pass I've shown, this is the current state of the art in password storage.
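For comparison, here is a sketch of "proper hashing with many passes" using PBKDF2-HMAC-SHA256 from Python's standard library; bcrypt, scrypt and Argon2 are other common choices. The iteration count is an illustrative assumption (pick it from current guidance and your hardware), and the function names are hypothetical. The salt, iteration count and digest are all stored with the user.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor: many thousands of passes instead of one.
ITERATIONS = 600_000


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest) to store in the user's record."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Constant-time comparison, so timing does not reveal how many bytes matched.
    return hmac.compare_digest(candidate, digest)


salt, n, stored = hash_password("hunter2")
print(verify_password("hunter2", salt, n, stored))    # True
print(verify_password("incorrect", salt, n, stored))  # False
```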
The more variety of characters we can choose from, the harder the password is to guess. Since modern, sound hashing algorithms accept any input and always return a fixed-size value, what possible reason could there be to stop people from using strange characters?
Normally, the reason is that the software or site you are dealing with is not hashing your password! If all the software has to do is hash the password, it makes absolutely no difference to the shape of the output whether you use strange accented characters or punctuation in your password.
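A quick way to convince yourself: any text, accented or not, can be encoded to bytes and hashed, and the output always has the same shape. This sketch uses bare SHA-256 only to show that shape; the sample passwords are made up, and a real site should use a slow, salted password hash instead.

```python
import hashlib

# Hypothetical passwords with accents, a non-Latin script and punctuation.
passwords = ["hunter2", "Ünïcödé-pässwörd!", "пароль", "'; DROP TABLE users; --"]

for p in passwords:
    # UTF-8 can encode any Unicode character; the hash only ever sees bytes.
    digest = hashlib.sha256(p.encode("utf-8")).hexdigest()
    print(f"{len(digest)} hex chars: {digest[:16]}...")
```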
Besides the obvious idiocy of limiting password length, this is a clear indication that the site is not hashing your password. If it did, password length would be irrelevant, as hashes have, by construction, a fixed size.
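The same holds for length. In this sketch, using SHA-256 only as an example of a fixed-size hash, a one-character password and a 29,000-character one produce digests of exactly the same size, so there is nothing that needs to "fit" into a database column.

```python
import hashlib

short = "a"
very_long = "correct horse battery staple " * 1000  # 29,000 characters

for p in (short, very_long):
    digest = hashlib.sha256(p.encode("utf-8")).digest()
    # The digest size is fixed by the algorithm: 32 bytes for SHA-256.
    print(f"{len(p):>6} characters -> {len(digest)} bytes")
```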
I am the Chief R&D at BaxEnergy, developer, hacker, blogger, conference lecturer. Bio: ex Stack Overflow core, ex Toptal core.