Pattern search (a second method for analyzing substitution ciphers)

The analysis method described here is used when analyzing monoalphabetic ciphers. It is based on the most frequent words of the language and works only for ciphertexts of standard English or German text in which the spaces between words have been retained (see dialog Text Options).

This section describes the analysis for English plaintext. The analysis works in exactly the same way for German plaintext, although the baseline figures differ somewhat because of the statistical characteristics of the German language. To keep the description concrete, only the analysis steps based on the statistical frequencies for English plaintext are described.

The analysis for English plaintext is based on the fact that a word taken at random from a standard English text has a probability of roughly 0.5 of being contained in the list of the 135 most common English words.
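The 0.5 figure is a coverage ratio: the share of word occurrences in a text that belong to the word list. The following sketch shows how such a ratio can be measured. The list `COMMON_WORDS` is a hypothetical miniature stand-in; the actual list of the 135 most common English words is not reproduced here.

```python
# Hypothetical miniature word list; the real list of the 135 most common
# English words is not reproduced here.
COMMON_WORDS = {"the", "of", "and", "to", "a", "in", "is", "it", "that", "was"}


def coverage(text: str, common_words: set[str]) -> float:
    """Return the fraction of word occurrences in `text` found in `common_words`."""
    # Split on whitespace, strip surrounding punctuation, compare case-insensitively.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in common_words)
    return hits / len(words)


sample = "It was the best of times, it was the worst of times."
print(coverage(sample, COMMON_WORDS))  # 8 of 12 words -> 0.666...
```

On a long standard English text and the full 135-word list, this ratio would be expected to come out near 0.5.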

The first step is to break the text to be analysed down into individual words. Each word is checked to see whether the list of the 135 most common words contains a word with the same pattern.

"Pattern" here refers to the arrangement of repeated letters within a word, for example "letters 1 and 3 are the same and all the others are different". Thus the words "cat", "big", "the" and "are" all follow the same pattern (each consists of three different letters), whereas the words "ally" and "fall" follow different patterns (in "ally" letters 2 and 3 are the same, in "fall" letters 3 and 4).
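A word pattern can be encoded by numbering each letter in order of its first appearance. The sketch below uses that encoding; the function name `word_pattern` and the tuple representation are illustrative choices, not necessarily how the program stores patterns internally.

```python
def word_pattern(word: str) -> tuple[int, ...]:
    """Number each letter by the order of its first appearance, e.g. "ally" -> (0, 1, 1, 2)."""
    first_seen: dict[str, int] = {}
    # setdefault assigns the next free number only to letters not seen before.
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word.lower())


print(word_pattern("cat"), word_pattern("the"))    # (0, 1, 2) (0, 1, 2)
print(word_pattern("ally"), word_pattern("fall"))  # (0, 1, 1, 2) (0, 1, 2, 2)
```

Because a monoalphabetic substitution maps equal letters to equal letters and different letters to different letters, a ciphertext word always has the same pattern as its plaintext word.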

Words that have no counterpart with the same pattern in the list are discarded. The remaining words are grouped by pattern into sets, so that all words in a set follow the same pattern. Each of these sets is then assigned the set of list words that follow the same pattern.
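The grouping step can be sketched as a dictionary keyed by pattern, pairing the ciphertext words of each pattern with the list words of the same pattern. The function `group_by_pattern` and the small word list in the example are hypothetical illustrations.

```python
from collections import defaultdict


def word_pattern(word: str) -> tuple[int, ...]:
    """Number each letter by the order of its first appearance."""
    first_seen: dict[str, int] = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word.lower())


def group_by_pattern(cipher_words, common_words):
    """Return {pattern: (cipher words, list words)} for every pattern that
    occurs in the ciphertext and has at least one counterpart in the list."""
    by_pattern = defaultdict(set)
    for w in common_words:
        by_pattern[word_pattern(w)].add(w)
    groups = defaultdict(set)
    for cw in cipher_words:
        p = word_pattern(cw)
        if p in by_pattern:  # words without a counterpart are discarded
            groups[p].add(cw)
    return {p: (sorted(groups[p]), sorted(by_pattern[p])) for p in groups}


print(group_by_pattern(["XQZ", "ABBC", "XQZ"], {"the", "and", "all", "that"}))
# {(0, 1, 2): (['XQZ'], ['and', 'the'])}
```

Here "ABBC" is dropped because no list word has the pattern (0, 1, 1, 2).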

Each pair consisting of a ciphertext word and a list word from two sets assigned to each other now yields a partial substitution: the letters of the ciphertext word are mapped to the corresponding letters of the list word. In this context, "partial" means that a replacement does not have to be found for every letter; for some letters, no meaningful assignment may be obtained at all.
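A minimal sketch of how one word pair yields a partial substitution, assuming the substitution is stored as a dictionary from ciphertext letters to plaintext letters. The name `partial_substitution` is hypothetical. The check for inconsistent pairs never fires for words of equal pattern, but it keeps the function safe for arbitrary input.

```python
from typing import Optional


def partial_substitution(cipher_word: str, plain_word: str) -> Optional[dict[str, str]]:
    """Letter mapping (cipher -> plain) implied by aligning the two words,
    or None if the pair cannot come from a monoalphabetic substitution."""
    if len(cipher_word) != len(plain_word):
        return None
    mapping: dict[str, str] = {}
    inverse: dict[str, str] = {}
    for c, p in zip(cipher_word.upper(), plain_word.upper()):
        # A cipher letter must always map to the same plain letter, and vice versa.
        if mapping.setdefault(c, p) != p or inverse.setdefault(p, c) != c:
            return None
    return mapping


print(partial_substitution("XQZ", "the"))  # {'X': 'T', 'Q': 'H', 'Z': 'E'}
print(partial_substitution("XQX", "the"))  # None (X cannot be both T and E)
```

The result covers only the letters occurring in the word, which is why it is partial.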

A depth-first search through a search tree is then used to find the (partial) substitution that combines the largest number of individual letter substitutions without any collisions (i.e. conflicts).

To keep the search effort from growing exponentially, branches in which collisions occur near the top of the tree are not investigated further.
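The search can be sketched as follows, under these assumptions: each tree level corresponds to one ciphertext word, each branch either maps that word to one of its candidate list words or leaves it unassigned, and the size of a substitution is the number of letter pairs it contains. A branch is abandoned as soon as a collision occurs, which is the pruning described above. The names `merge` and `best_substitution` are hypothetical.

```python
from typing import Optional


def merge(mapping: dict[str, str], cipher_word: str, plain_word: str) -> Optional[dict[str, str]]:
    """Extend `mapping` with the letter pairs of a word pair (assumed to have
    equal patterns, hence equal length); return None on a collision."""
    new = dict(mapping)
    inverse = {p: c for c, p in new.items()}
    for c, p in zip(cipher_word.upper(), plain_word.upper()):
        if new.setdefault(c, p) != p or inverse.setdefault(p, c) != c:
            return None
    return new


def best_substitution(assignments: list[tuple[str, list[str]]]) -> dict[str, str]:
    """Depth-first search over (cipher word, candidate list words) pairs."""
    best: dict[str, str] = {}

    def dfs(i: int, mapping: dict[str, str]) -> None:
        nonlocal best
        if len(mapping) > len(best):
            best = mapping
        if i == len(assignments):
            return
        cipher_word, candidates = assignments[i]
        for plain_word in candidates:
            extended = merge(mapping, cipher_word, plain_word)
            if extended is not None:  # a collision prunes this whole branch
                dfs(i + 1, extended)
        dfs(i + 1, mapping)  # alternatively leave this cipher word unassigned

    dfs(0, {})
    return best


print(best_substitution([("XQZ", ["the", "and"]), ("MNO", ["the", "and"])]))
# {'X': 'T', 'Q': 'H', 'Z': 'E', 'M': 'A', 'N': 'N', 'O': 'D'}
```

Mapping both words to "the" would send two cipher letters to T, so that branch is cut immediately.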

Letters which could not be assigned are represented in the resulting text by lower case letters. Letters for which a substitute was found are shown in upper case.
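The output convention can be sketched as below. It assumes that an unassigned letter is shown as the ciphertext letter itself in lower case and that spaces and punctuation are passed through unchanged; the function name `render` is hypothetical.

```python
def render(ciphertext: str, mapping: dict[str, str]) -> str:
    """Upper case: substituted plaintext letter; lower case: unassigned cipher letter."""
    out = []
    for ch in ciphertext:
        if ch.upper() in mapping:
            out.append(mapping[ch.upper()].upper())
        elif ch.isalpha():
            out.append(ch.lower())
        else:
            out.append(ch)  # spaces and punctuation are kept as they are
    return "".join(out)


print(render("XQZ MNO KXQ", {"X": "T", "Q": "H", "Z": "E"}))  # THE mno kTH
```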

Remark:

This search method does not perform very well, and it requires spaces to be retained as word separators in the ciphertext.