Crypto-Buster Background

When I was in college, in the late '50s, I remember the daily student newspaper carried Cryptoquote puzzles. And I remember seeing other students working on them... during lectures (liberal arts majors?). Some twenty years later, I became 'addicted' myself. Being a programmer at the time (Fortran IV on an IBM mainframe), I couldn't help thinking that it would be neat to get some automated help in solving these things... and even neater to develop the application myself!

In the early '80s, when I got my first personal computer (IBM PC-1), I needed a worthwhile application as a vehicle for learning Basic. Since 'learning machine' technology was the new thing in software development at the time, I decided to try developing a cryptoquote solver that would 'learn', from solutions completed with manual assistance, to solve them automatically. I've been working on it off and on ever since, through several major iterations (in Basic, Pascal and C), and I am nowhere close to my original goal!


The current incarnation, Crypto-Buster, is only a 'solution assist tool' rather than an 'automatic solver'. But it contains the most promising 'solution aid' concepts and techniques that I originally developed in my 'solver' attempts... and leaves out a couple that didn't "pan out" too well.

Solution Vocabulary Database

Not surprisingly, many of the types of words and phrases found in cryptoquotes recur quite frequently, and many have special properties. Rather than blindly using a dictionary of all English words as a solution database, Crypto-Buster uses these "special" words. They are saved (or 'learned') as they are found while solving actual cryptoquotes.

The 'learned' solution word 'vocabulary' consists primarily of words that have one or more of the following special properties:

Pattern Code - A pattern code is generated for each Solution Word: each unique letter in the word is assigned an integer, starting at one and increasing sequentially in order of first appearance, and a repeated letter reuses the number already assigned to it. All non-alphabetic characters are assigned zero. Presently, a word can contain at most nine unique letters (one digit per letter), which restricts which words can be coded and saved/'learned'; for example, the word governmental contains 10 unique letters. Also, because all non-alphabetic characters are coded as '0', it's impossible to distinguish between different special characters; for example, gov't. is coded [123040]. Generally, words that contain no repeated letters, i.e., words with a pattern code like [123456], are not saved. The exception is short words, up to about three or four letters. Finally, solution words are limited to 13 total characters.
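The coding rule above is compact enough to sketch directly. This is a minimal Python sketch, not Crypto-Buster's own code: the function names pattern_code and has_repeated_letter, and returning the code as a digit string, are assumptions; the nine-unique-letter and 13-character limits are taken from the description.

```python
from typing import Optional

MAX_UNIQUE_LETTERS = 9   # one decimal digit per unique letter
MAX_WORD_LENGTH = 13     # total characters allowed in a solution word


def pattern_code(word: str) -> Optional[str]:
    """Return the pattern code of `word` as a digit string.

    Letters are numbered 1, 2, 3, ... in order of first appearance
    (case-insensitive), a repeated letter reuses its number, and every
    non-letter becomes 0. Returns None if the word exceeds the limits.
    """
    if len(word) > MAX_WORD_LENGTH:
        return None
    codes: dict[str, int] = {}
    digits = []
    for ch in word.lower():
        if not ch.isalpha():
            digits.append("0")
            continue
        if ch not in codes:
            if len(codes) == MAX_UNIQUE_LETTERS:
                return None  # a tenth unique letter has no single-digit code
            codes[ch] = len(codes) + 1
        digits.append(str(codes[ch]))
    return "".join(digits)


def has_repeated_letter(code: str) -> bool:
    """True if some nonzero digit (i.e. some letter) occurs more than once."""
    letters = [d for d in code if d != "0"]
    return len(letters) != len(set(letters))


print(pattern_code("gov't."))        # 123040
print(pattern_code("governmental"))  # None (10 unique letters)
```

has_repeated_letter captures the "generally not saved" test; the short-word exception is left out because its exact cutoff is approximate.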

Word Records - Solution words are stored in records keyed by their pattern code. When a solution word is found, if its key already exists, the word is saved in the same record with the other word(s), in alphabetical order. Otherwise, a new word record is created with the new key. The word records are stored in numerically ascending key order, and the database is simply a flat text file. For example, a segment of the database might contain:

Word Categories - Since cryptoquotes consist of the body of a quote followed by the author's name (attribution) or source, it is useful to categorize solution words. Three categories (sub-databases) are presently used: General (Generic) words, found in the body of the cryptoquote, and First Names and Last Names, found in the attribution. Each category uses word records as described above.
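The records and categories could be modeled roughly as below. This is a hedged sketch, not the actual cbWordLib.cpf format: the category labels, the one-line-per-record text layout ("[code] word word ..."), and the Vocabulary class with its learn and to_text methods are all hypothetical. It includes its own compact pattern-code helper so it runs on its own.

```python
from collections import defaultdict
from typing import Optional

CATEGORIES = ("generic", "first", "last")  # hypothetical category labels


def pattern_code(word: str) -> Optional[str]:
    """Pattern code; None if over 13 characters or 9 unique letters."""
    if len(word) > 13:
        return None
    codes: dict[str, int] = {}
    out = []
    for ch in word.lower():
        if not ch.isalpha():
            out.append("0")
            continue
        if ch not in codes:
            if len(codes) == 9:
                return None
            codes[ch] = len(codes) + 1
        out.append(str(codes[ch]))
    return "".join(out)


class Vocabulary:
    def __init__(self) -> None:
        # category -> pattern code (key) -> alphabetized list of solution words
        self.records = {c: defaultdict(list) for c in CATEGORIES}

    def learn(self, category: str, word: str) -> bool:
        """Add a solved word; False if it can't be coded or is already known."""
        key = pattern_code(word)
        if key is None:
            return False
        words = self.records[category][key]  # creates a new record if needed
        word = word.lower()
        if word in words:
            return False
        words.append(word)
        words.sort()  # words within a record stay in alphabetical order
        return True

    def to_text(self, category: str) -> str:
        """Flat-text dump of one category, keys in ascending numeric order."""
        recs = self.records[category]
        return "\n".join(f"[{k}] " + " ".join(recs[k])
                         for k in sorted(recs, key=int))


vocab = Vocabulary()
for w in ["that", "didn't", "people", "tent"]:
    vocab.learn("generic", w)
print(vocab.to_text("generic"))
# [1231] tent that
# [121304] didn't
# [123142] people
```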

To view the current vocabulary database, open cbWordLib.cpf or use Crypto-Buster's DB Tools option, View DB.

Solutions using Word Patterns

One of the best solution methods I've found is to focus on those cipher words that contain repeated letters or letter sequences, punctuation characters, single letters, etc. A pattern code can be generated for cipher words too, for example: FQFZ'X [121304], HQXHWQ [123142] and KWTZJQK [1234561].

So, if you encounter a crypto word such as FQFZ'X, with the pattern [121304], you have a definite solution (didn't), because it is the only word with this pattern in the Crypto-Buster vocabulary (database), at least until another (different) word with the same pattern code is discovered. So far, it's the only word I've personally found for the code [121304].
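Because a cipher word and its solution share the same pattern code, finding a definite solution is just a lookup on the cipher word's code. A minimal sketch, assuming a small in-memory vocabulary grouped by code; the word list and the names build_vocabulary and definite_solution are illustrative, not Crypto-Buster's.

```python
from typing import Optional


def pattern_code(word: str) -> str:
    """Pattern code (the 13-character limit is omitted for brevity)."""
    codes: dict[str, int] = {}
    out = []
    for ch in word.lower():
        if ch.isalpha():
            if ch not in codes:
                if len(codes) == 9:
                    raise ValueError(f"{word!r} has more than nine unique letters")
                codes[ch] = len(codes) + 1
            out.append(str(codes[ch]))
        else:
            out.append("0")
    return "".join(out)


def build_vocabulary(words: list[str]) -> dict[str, list[str]]:
    """Group known solution words by pattern code."""
    vocab: dict[str, list[str]] = {}
    for w in words:
        vocab.setdefault(pattern_code(w), []).append(w)
    return vocab


def definite_solution(cipher_word: str, vocab: dict[str, list[str]]) -> Optional[str]:
    """Return the solution if exactly one known word shares the cipher word's code."""
    candidates = vocab.get(pattern_code(cipher_word), [])
    return candidates[0] if len(candidates) == 1 else None


vocab = build_vocabulary(["didn't", "people", "that", "tent"])  # illustrative only
print(definite_solution("FQFZ'X", vocab))  # didn't
print(definite_solution("GVWG", vocab))    # None: [1231] fits both "that" and "tent"
```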

Single Word Solution Search

The mechanism for looking up candidate solutions is pretty straightforward. When a coded crypto word is entered, the program generates its pattern code and then finds and reports, for each word category, all words with that key that also have no ciphered letter matching the corresponding solution letter (in a cryptoquote, a letter never stands for itself). It also reports if no candidates are found, or if the pattern (key) does not exist. In the former case you know you've found another new solution word for this particular pattern code and category. In the latter case, however, you know you've discovered a brand new solution word with a pattern code of its own. These are usually good finds because they are definite solutions, until you find another word with the same pattern code.
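The lookup and its three possible outcomes can be sketched for a single category as below; Crypto-Buster runs it for each category. The Outcome names, the search function, and the one-record example vocabulary are hypothetical.

```python
from enum import Enum


def pattern_code(word: str) -> str:
    """Pattern code (the 13-character limit is omitted for brevity)."""
    codes: dict[str, int] = {}
    out = []
    for ch in word.lower():
        if ch.isalpha():
            if ch not in codes:
                if len(codes) == 9:
                    raise ValueError(f"{word!r} has more than nine unique letters")
                codes[ch] = len(codes) + 1
            out.append(str(codes[ch]))
        else:
            out.append("0")
    return "".join(out)


class Outcome(Enum):
    CANDIDATES = "candidate solutions found"
    NO_CANDIDATES = "key exists, but no stored word fits"  # new word, known key
    NO_KEY = "pattern (key) not in the database"           # brand-new pattern


def search(cipher_word: str, vocab: dict[str, list[str]]) -> tuple[Outcome, list[str]]:
    """Single-word search within one category's vocabulary."""
    key = pattern_code(cipher_word)
    if key not in vocab:
        return Outcome.NO_KEY, []
    cipher = cipher_word.lower()
    fits = [
        w for w in vocab[key]
        # reject a word if any coded letter equals its own solution letter
        if all(not c.isalpha() or c != p for c, p in zip(cipher, w.lower()))
    ]
    return (Outcome.CANDIDATES if fits else Outcome.NO_CANDIDATES), fits


generic = {"1231": ["that", "tent"]}  # illustrative one-record category
print(search("XHAX", generic))  # only "tent" survives: "that" would put h under H
```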

The search mechanism also accepts a filter to narrow down the search: you simply give it any known, or trial, solution letter(s) for any given coded letter(s). The search then selects only those candidate solution words that match the specified letter assignments, that don't use those solution letters for any of the other, unspecified coded letters, and that don't match any of the pre-assigned solution letters either. I think this says it right!
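One reading of the filter rules, sketched in Python: a candidate must show the given solution letter wherever its coded letter appears, and at every other position it must neither equal the coded letter itself nor reuse a solution letter that is already assigned. Treating "pre-assigned" letters as the same set of supplied assignments is an assumption, as is the name filtered_search.

```python
def filtered_search(cipher_word: str, candidates: list[str],
                    assigned: dict[str, str]) -> list[str]:
    """Keep candidates consistent with `assigned` (coded letter -> solution letter).

    `candidates` are assumed to share the cipher word's pattern code, so they
    have the same length and the same non-letter positions.
    """
    cipher = cipher_word.lower()
    assigned = {c.lower(): p.lower() for c, p in assigned.items()}
    used = set(assigned.values())  # solution letters already taken
    kept = []
    for word in candidates:
        ok = True
        for c, p in zip(cipher, word.lower()):
            if not c.isalpha():
                continue
            if c in assigned:
                ok = p == assigned[c]          # must match the given assignment
            else:
                ok = p != c and p not in used  # no self-match, no reused letter
            if not ok:
                break
        if ok:
            kept.append(word)
    return kept


words = ["that", "tent"]  # illustrative candidates for the key [1231]
print(filtered_search("GVWG", words, {"G": "t"}))            # ['that', 'tent']
print(filtered_search("GVWG", words, {"G": "t", "V": "h"}))  # ['that']
```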

Word Phrase (multi-word) Solution Search

Often a particular cryptoquote will contain few or no patterned words (those with repeated letters). The single-word search above is then of little value, since words with all unique letters are not generally saved in the database. However, two or more words (sometimes in succession) often share one or more letters, even letter sequences. An example might be: QWZXGF ZKB GVW QWZPG. And the phrase search might yield: 'beauty and the beast' (among other possible solutions).

The mechanism for completing this search is quite complicated and can be time-consuming. It starts with an independent single-word search for each phrase word. From these sets of candidate solutions it then, essentially, builds trial solution phrases from all combinations of the candidate words. So, if there are three words in the phrase and each word has five candidate solutions, the search must evaluate 125 (5x5x5) trial phrases.
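The exhaustive version of this idea fits in a few lines: enumerate every combination of per-word candidates and keep those that imply a consistent substitution (one-to-one, with no letter standing for itself). This is a brute-force sketch for illustration; the candidate lists are made up, and consistent and brute_force_phrase are hypothetical names.

```python
from itertools import product
from math import prod


def consistent(cipher_words, trial_words) -> bool:
    """True if the trial phrase implies a one-to-one, no-self-match substitution."""
    forward: dict[str, str] = {}   # coded letter -> solution letter
    backward: dict[str, str] = {}  # solution letter -> coded letter
    for cw, tw in zip(cipher_words, trial_words):
        if len(cw) != len(tw):
            return False
        for c, p in zip(cw.lower(), tw.lower()):
            if c.isalpha() != p.isalpha():
                return False
            if not c.isalpha():
                continue
            if c == p:
                return False  # a letter never stands for itself
            if forward.setdefault(c, p) != p or backward.setdefault(p, c) != c:
                return False  # conflicts with an earlier pairing in the phrase
    return True


def brute_force_phrase(cipher_words, candidate_lists):
    """Evaluate every combination of per-word candidates."""
    return [list(trial) for trial in product(*candidate_lists)
            if consistent(cipher_words, trial)]


cipher = ["QWZXGF", "ZKB", "GVW", "QWZPG"]
candidates = [["beauty", "ground"], ["and", "but", "the"],
              ["the", "and", "she"], ["beast", "feast", "worst"]]  # illustrative
print(prod(len(c) for c in candidates))        # 54 trial phrases (2x3x3x3)
print(brute_force_phrase(cipher, candidates))  # [['beauty', 'and', 'the', 'beast']]
```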

In the trial phrase solution search, once a candidate solution for a particular phrase word is assigned, its solution letter assignments become filters, for any common letters, in all of the subsequent phrase word solution searches. It's an iterative process that's kind of hard to describe in words - and was a bit of a challenge to code!

As with the single-word search, it is also possible to include initial filter letter assignments in the phrase search. Unlike the single-word search, however, this search is constrained to using either only Generic words or only Names as solutions; for Names, First names are used for all cipher words except the last one, which uses only Last names.

Phrase Search Limitations - There are presently many limitations with the phrase search:
