There's one programming puzzle I solved many years ago but I still remember it because of the unusual solution. Problem: You have a text encrypted with a simple monoalphabetic substitution cipher. You don't have the key but you have a dictionary. Every word in a text can be found in this dictionary. Find the original text. It can be solved by dynamically generating regular expressions.
Full problem text can be found here: https://www.hackerrank.com/challenges/basic-cryptanalysis/problem
What can we see by looking at this encrypted word?
rjayojfr
It may look like "nothing" but actually it provides us a lot of information:
- this is an 8-letter word
- second letter is different from the first one
- third letter is different from the first two
- …
- sixth letter is same as the second one
- seventh letter is different from all previous ones
- eighth letter is same as first one
Not so many words match these rules so we can loop over the dictionary looking through them. And we already have a nice tool for checking if the string matches some rules, it's regular expressions.
All these rules can be automatically turned into a regular expression like this:
^(.)((?!\1).)((?!\1)(?!\2).)((?!\1)(?!\2)(?!\3).)((?!\1)(?!\2)(?!\3)(?!\4).)\2((?!\1)(?!\2)(?!\3)(?!\4)(?!\5).)\1$
It's long but simple. (.)
means a subpattern with any char, \1
means the same char as in first subpattern ((?!\1).)
means a subpattern with any char except the one that was if first subpattern, ((?!\1)(?!\2).)
means a subpattern with any char except the ones that was if first and seconds subpatterns and so on.
If suddenly we already know that r
decrypts as "d" and f
decrypts as "a", we can also incorporate this information into our regular expression:
^d(.)((?!\1).)((?!\1)(?!\2).)((?!\1)(?!\2)(?!\3).)\1ad$
BTW the word I meant is download
.
With this in mind it's easy to find an algorithm:
- Take first encrypted word.
- Generate a regular expression using this word and known part of the key (it's empty one the first run).
- Find the words in dictionary matching the regular expression, call them decrypted words.
- For each decrypted word:
- Update result.
- Update the key with new chars in decrypted word.
- Repeat from step 1 with the next word of encrypted text.
That's it. Here's the solution in PHP:
<?php
$_fp = fopen("php://stdin", "r");
$input = trim(fgets($_fp));
$dictionary = trim(file_get_contents('./dictionary.lst'));
$variants = [
[[], ''],
];
$outputWords = [];
foreach (explode(' ', $input) as $cipheredWord) {
$nextVariants = [];
foreach ($variants as [$key, $outputText]) {
$pattern = '';
$indexes = [];
$index = 1;
$nots = '';
foreach (str_split($cipheredWord) as $letter) {
if (isset($key[$letter])) {
// If letter is in the key - add decyphered letter to a pattern
$pattern .= $key[$letter];
} elseif (isset($indexes[$letter])) {
// If letter already found in this word - add a back reference
$pattern .= '\\' . $indexes[$letter];
} else {
// If letter appears first time in this word - add a new reference with negative lookaheads to previous references
$pattern .= '(' . $nots . '.)';
$indexes[$letter] = $index;
$nots .= '(?!\\' . $index . ')';
$index++;
}
}
$pattern = '/^' . $pattern . '$/mi';
if (preg_match_all($pattern, $dictionary, $match, PREG_SET_ORDER)) {
foreach ($match as $set) {
$newKeyPart = array_map(function ($index) use ($set) { return $set[$index]; }, $indexes);
$nextVariants[] = [$key + $newKeyPart, $outputText . ' ' . $set[0]];
}
}
}
$variants = $nextVariants;
}
if (!$variants) {
die('no possible solutions');
}
echo strtolower(trim($variants[0][1]));
What was the strangest usage of regular expressions in your case?