Ashley Sheridan​.co.uk

Checking Password Strength with Regular Expressions

Randall Munroe of XKCD explains wonderfully how through 20 years of effort, we've successfully trained everyone to use passwords that are hard for humans to remember, but easy for computers to guess. As humans, we see a password that seems entirely nonsensical and assume it to be as difficult to crack as it is to remember. Despite this logical fallacy, developers worldwide still insist on enforcing rules that just push us humans to write the passwords down on sticky notes and in text files on our desktops!

Given that these password rules are not going away anytime soon, there's a need to validate that passwords meet them. Typically these tests are performed as separate, individual checks in code. What I wondered was whether all of these rules could be bundled into a single regular expression. It's worth noting that just because it's possible doesn't mean it's recommended, for reasons I'll go into further down.

The Rules

The most typical rules for password strength look something like this:

  1. minimum of 8 characters
  2. maximum of 100 characters
  3. contains at least 3 of the following 4 types of character
    1. uppercase letters
    2. lowercase letters
    3. numbers
    4. a "special" character, e.g. punctuation, monetary symbols, etc

Turning those specific character rules into mini regular expressions gives you this list:

  1. [A-Z]
  2. [a-z]
  3. \d
  4. [,\.\?!:;\$£#]

Now that we have each segment of what we want to match, we need to work out how many different combinations of the 4 character requirements include at least 3. Labelling the four character types a to d, in the order listed above, the combinations are:

  • a, b, c
  • a, b, d
  • a, c, d
  • b, c, d

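Note that the order of the items within a combination doesn't matter, which is why there are only 4 possible combinations. A password that contains all 4 types is already covered, because it satisfies every one of them.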
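The count of four is easy to confirm with a few lines of code. This is only a sketch in Python (my choice of language for illustration), with the labels a to d standing in for the four character types in the bullets above:

```python
from itertools import combinations

# The four character rules, labelled a to d to match the bullets above.
rules = ["a", "b", "c", "d"]

# Every way of picking 3 of the 4 rules, ignoring order.
combos = list(combinations(rules, 3))

for combo in combos:
    print(", ".join(combo))

print(len(combos), "combinations")
```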

Building the Expression

So how do we put all of this together to create our full regular expression? The key to all of this is positive lookaheads. What these do is generate a match if something further along in the string matches. For example, the following will match any string that contains at least a single uppercase letter:

^(?=.*[A-Z]).+$

The ?= is the positive lookahead. A lookahead is zero-width: it checks what follows without consuming any characters. Because it sits right after the ^, it succeeds at the start of the string, but only if there is a capital letter somewhere further along. The .+ then matches the whole string.
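To see this behaviour outside of a regex tester, here is a small sketch using Python's built-in re module. The variable name has_upper is just something I've made up for the example:

```python
import re

# Matches any non-empty, single-line string that contains an uppercase letter.
has_upper = re.compile(r"^(?=.*[A-Z]).+$")

print(bool(has_upper.match("hello World")))  # True
print(bool(has_upper.match("hello world")))  # False

# The lookahead consumes nothing, so the match is the whole string,
# not just the part up to the capital letter.
print(has_upper.match("hello World").group(0))  # hello World
```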

To match the first combination (a, b, c), we need to chain together the individual expressions we created:

^
(?=.*[A-Z])
(?=.*[a-z])
(?=.*\d)
.+$

I've added new lines here for readability only, and you would remove them for the final regex; the syntax has always favoured a write-only methodology! If you try this out (a great online regex testing tool can be found at regex101.com) you'll see it matches a string of any length (at least 3, as we're trying to match 3 specific characters) as long as it contains uppercase and lowercase letters and a number.
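If the expression is going to live in code rather than a tester, some regex engines let you keep the line breaks. As a sketch, Python's re.VERBOSE flag ignores whitespace inside the pattern and allows comments (the name abc is mine, after the combination being matched):

```python
import re

abc = re.compile(
    r"""
    ^
    (?=.*[A-Z])   # at least one uppercase letter
    (?=.*[a-z])   # at least one lowercase letter
    (?=.*\d)      # at least one digit
    .+$
    """,
    re.VERBOSE,
)

print(bool(abc.match("Ab1")))     # True
print(bool(abc.match("abc123")))  # False: no uppercase letter
```

Bear in mind that in verbose mode an unescaped # outside a character class starts a comment, so it needs care in patterns that match that character.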

The next step is to add the next combination: uppercase and lowercase letters plus a symbol. Again, I'll break it down into separate lines, with indentation to aid reading, but when you put it all together again remove all newlines and indents.

^
(
  (
    (?=.*[A-Z])
    (?=.*[a-z])
    (?=.*\d)
  )
  |
  (
    (?=.*[A-Z])
    (?=.*[a-z])
    (?=.*[,\.\?!:;\$£#])
  )
)
.+$

We can repeat this pattern for each of the combinations we worked out earlier. After that, there's only one thing left to add from our list of rules: the length of the password. We determined it should be between 8 and 100 characters long. This is easily achieved by changing the last line, which currently matches any number of characters, so that it matches only between 8 and 100 of them:

^
(
  (
    (?=.*[A-Z])
    (?=.*[a-z])
    (?=.*\d)
  )
  |
  (
    (?=.*[A-Z])
    (?=.*[a-z])
    (?=.*[,\.\?!:;\$£#])
  )
  |
  (
    (?=.*[A-Z])
    (?=.*\d)
    (?=.*[,\.\?!:;\$£#])
  )
  |
  (
    (?=.*[a-z])
    (?=.*\d)
    (?=.*[,\.\?!:;\$£#])
  )
)
.{8,100}$

When it's fully built, with all extra whitespace removed, it looks like this:

^(((?=.*[A-Z])(?=.*[a-z])(?=.*\d))|((?=.*[A-Z])(?=.*[a-z])(?=.*[,\.\?!:;\$£#]))|((?=.*[A-Z])(?=.*\d)(?=.*[,\.\?!:;\$£#]))|((?=.*[a-z])(?=.*\d)(?=.*[,\.\?!:;\$£#]))).{8,100}$
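Rather than typing that out by hand, the same string can be assembled from the four character classes. The following is a sketch only, in Python; the function name build_password_regex and its parameters are hypothetical names of mine, not part of any library:

```python
import re
from itertools import combinations

# The four character-class fragments from "The Rules", in order a to d.
CLASSES = [r"[A-Z]", r"[a-z]", r"\d", r"[,\.\?!:;\$£#]"]


def build_password_regex(min_len=8, max_len=100):
    """Assemble the pattern from every 3-of-4 combination of classes."""
    groups = []
    for combo in combinations(CLASSES, 3):
        # One positive lookahead per required character type.
        lookaheads = "".join(f"(?=.*{cls})" for cls in combo)
        groups.append(f"({lookaheads})")
    # Alternate the groups, then apply the length limits.
    return "^(" + "|".join(groups) + ").{" + f"{min_len},{max_len}" + "}$"


pattern = build_password_regex()
password_re = re.compile(pattern)

for candidate in ["Passw0rd", "password1", "P4ss!", "pass,word1"]:
    print(candidate, bool(password_re.match(candidate)))
```

Of the four sample passwords, "Passw0rd" and "pass,word1" pass, "password1" fails because it only has two character types, and "P4ss!" fails because it is too short.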

Additional Rules

So we have our full regular expression, and it works perfectly, but a new change request comes in. For very specific and totally not ridiculous reasons, the passwords cannot contain an asperand (@) symbol. Can the regex be modified to take this into account? Of course it can! The trick is to add a negative lookahead to our expression. The lookahead itself looks like this:

(?!.*@)

Just place it after the start of string matcher in the original expression:

^
(?!.*@)
(
  (
    (?=.*[A-Z])
    (?=.*[a-z])
    (?=.*\d)
  )
  |
  (
    (?=.*[A-Z])
    (?=.*[a-z])
    (?=.*[,\.\?!:;\$£#])
  )
  |
  (
    (?=.*[A-Z])
    (?=.*\d)
    (?=.*[,\.\?!:;\$£#])
  )
  |
  (
    (?=.*[a-z])
    (?=.*\d)
    (?=.*[,\.\?!:;\$£#])
  )
)
.{8,100}$
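A quick way to check that the negative lookahead does its job is to run the collapsed expression against a few passwords. This sketch uses Python's re module, with the pattern split across adjacent string literals purely so it fits on the page:

```python
import re

password_re = re.compile(
    r"^(?!.*@)"  # reject the string if an @ appears anywhere in it
    r"(((?=.*[A-Z])(?=.*[a-z])(?=.*\d))"
    r"|((?=.*[A-Z])(?=.*[a-z])(?=.*[,\.\?!:;\$£#]))"
    r"|((?=.*[A-Z])(?=.*\d)(?=.*[,\.\?!:;\$£#]))"
    r"|((?=.*[a-z])(?=.*\d)(?=.*[,\.\?!:;\$£#])))"
    r".{8,100}$"
)

print(bool(password_re.match("Passw0rd")))   # True
print(bool(password_re.match("P@ssw0rd")))   # False: contains @
print(bool(password_re.match("Passw0rd@")))  # False: @ is rejected anywhere
```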

Why You Should Ignore All This And Not Use It

It's perfectly possible to start a fire with a stick, but it's a lot of effort and quite possibly the slowest method available to you. In the same way, this method works but is completely unsuitable for determining password strength.

Regular expressions are not speedy beasts when they become overly complex. The regex101 tester reports that this performs nearly 100 steps against the password "This Password Should Pass Our Strength Test 12345". That number increases as the number of characters you're checking for increases and as the length of the password increases. The time for checking one string is negligible, but if you're checking thousands, it may matter.

The most compelling reason to avoid this method, though, is the maintainability of such code. Regular expressions are known to be incredibly difficult to read, let alone modify, and the syntax doesn't even lend itself terribly well to writing in the first place. If you had to hand this off to another developer, they would be fully justified in ripping it out and rewriting the logic with something that's readable.
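For comparison, here is roughly what that readable rewrite might look like. It's a sketch in Python under my own assumptions: the function name is_acceptable and its parameters are invented for the example, and it isn't a character-for-character equivalent of the regex (it handles newlines and non-ASCII digits slightly differently, for instance):

```python
# The same "special" characters as the regex character class.
SPECIAL_CHARS = set(",.?!:;$£#")


def is_acceptable(password, min_len=8, max_len=100, required_types=3):
    """Apply the same rules as the regex, one readable check at a time."""
    if not min_len <= len(password) <= max_len:
        return False
    if "@" in password:
        return False

    type_checks = [
        any("A" <= ch <= "Z" for ch in password),     # uppercase letters
        any("a" <= ch <= "z" for ch in password),     # lowercase letters
        any(ch.isdigit() for ch in password),         # numbers
        any(ch in SPECIAL_CHARS for ch in password),  # special characters
    ]
    # Each True counts as 1, so this is the number of types present.
    return sum(type_checks) >= required_types


print(is_acceptable("Passw0rd"))   # True
print(is_acceptable("password1"))  # False: only two character types
print(is_acceptable("P@ssw0rd1"))  # False: contains @
```

Changing a rule here means editing one obvious line, rather than picking through nested lookaheads.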

Still, the concept is interesting, and shows the power of regular expressions. But definitely, definitely, don't use them for something like this!
