Algorithms

Solution for Hackerrank Problem: Basic Cryptanalysis

There's one programming puzzle I solved many years ago but I still remember it because of the unusual solution. Problem: You have a text encrypted with a simple monoalphabetic substitution cipher. You don't have the key but you have a dictionary. Every word in a text can be found in this dictionary. Find the original text. It can be solved by dynamically generating regular expressions.

Full problem text can be found here: https://www.hackerrank.com/challenges/basic-cryptanalysis/problem

What can we see by looking at this encrypted word?

rjayojfr

It may look like "nothing" but actually it provides us a lot of information:

  • this is an 8-letter word
  • second letter is different from the first one
  • third letter is different from the first two
  • sixth letter is same as the second one
  • seventh letter is different from all previous ones
  • eighth letter is same as first one

Not so many words match these rules so we can loop over the dictionary looking through them. And we already have a nice tool for checking if the string matches some rules, it's regular expressions.

All these rules can be automatically turned into a regular expression like this:

^(.)((?!\1).)((?!\1)(?!\2).)((?!\1)(?!\2)(?!\3).)((?!\1)(?!\2)(?!\3)(?!\4).)\2((?!\1)(?!\2)(?!\3)(?!\4)(?!\5).)\1$

It's long but simple. (.) means a subpattern with any char, \1 means the same char as in first subpattern ((?!\1).) means a subpattern with any char except the one that was if first subpattern, ((?!\1)(?!\2).) means a subpattern with any char except the ones that was if first and seconds subpatterns and so on.

If suddenly we already know that r decrypts as "d" and f decrypts as "a", we can also incorporate this information into our regular expression:

^d(.)((?!\1).)((?!\1)(?!\2).)((?!\1)(?!\2)(?!\3).)\1ad$

BTW the word I meant is download.

With this in mind it's easy to find an algorithm:

  1. Take first encrypted word.
  2. Generate a regular expression using this word and known part of the key (it's empty one the first run).
  3. Find the words in dictionary matching the regular expression, call them decrypted words.
  4. For each decrypted word:
    1. Update result.
    2. Update the key with new chars in decrypted word.
    3. Repeat from step 1 with the next word of encrypted text.

That's it. Here's the solution in PHP:

<?php

$_fp = fopen("php://stdin", "r");
$input = trim(fgets($_fp));
$dictionary = trim(file_get_contents('./dictionary.lst'));

$variants = [
    [[], ''],
];

$outputWords = [];
foreach (explode(' ', $input) as $cipheredWord) {
    $nextVariants = [];

    foreach ($variants as [$key, $outputText]) {
        $pattern = '';

        $indexes = [];
        $index = 1;
        $nots = '';
        foreach (str_split($cipheredWord) as $letter) {
            if (isset($key[$letter])) {
                // If letter is in the key - add decyphered letter to a pattern
                $pattern .= $key[$letter];

            } elseif (isset($indexes[$letter])) {
                // If letter already found in this word - add a back reference
                $pattern .= '\\' . $indexes[$letter];

            } else {
                // If letter appears first time in this word - add a new reference with negative lookaheads to previous references
                $pattern .= '(' . $nots . '.)';
                $indexes[$letter] = $index;
                $nots .= '(?!\\' . $index . ')';
                $index++;
            }
        }

        $pattern = '/^' . $pattern . '$/mi';
        if (preg_match_all($pattern, $dictionary, $match, PREG_SET_ORDER)) {
            foreach ($match as $set) {
                $newKeyPart = array_map(function ($index) use ($set) { return $set[$index]; }, $indexes);
                $nextVariants[] = [$key + $newKeyPart, $outputText . ' ' . $set[0]];
            }
        }
    }

    $variants = $nextVariants;
}

if (!$variants) {
    die('no possible solutions');
}

echo strtolower(trim($variants[0][1]));

What was the strangest usage of regular expressions in your case?


Tags: , , ,

Comments