Crypto-Buster Background
When I was in college, in the late 50's, I remember the daily student newspaper had
Cryptoquote puzzles in it. And I remember seeing other students working on them...
during lectures (liberal arts majors?). Some twenty years later, I became 'addicted' myself.
Being a programmer at the time (Fortran IV on an IBM mainframe), I couldn't help thinking
that it would be neat to get some automated help in solving these things... and even neater
to develop the application myself!
In the early 80's when I got my first personal computer (IBM PC-1) I needed a worthwhile
application as a vehicle to learn Basic.
Since 'learning machine' technology was the new thing in software development at the
time, I decided to start trying to develop a cryptoquote solver that would, based on
manually-assisted completed solutions, 'learn' to solve them automatically. I've been
working on it off-and-on ever since, through several major iterations (in Basic, Pascal & C),
and I am nowhere close to my original goal!
Crypto-Buster
The current incarnation, Crypto-Buster, amounts to being only a 'solution assist tool'
rather than trying to be an 'automatic solver'. But it contains the most promising 'solution
aid' concepts and techniques that I originally developed in my 'solver' attempts... and
lacks a couple that didn't "pan out" too well.
Solution Vocabulary Database
Not surprisingly, many of the words and phrases found in cryptoquotes recur quite
frequently. Many have some special properties.
Rather than blindly using a dictionary of all English words as a solution database,
Crypto-Buster uses these "special" words. They are saved (or 'learned') as they are found
while solving actual cryptoquotes.
The 'learned' solution word 'vocabulary' consists primarily of those words that have one
or more of the following special properties:
- commonly occur in cryptoquotes and, often, repeatedly within a given cryptoquote ( the, man, world ).
- have distinguishing letter/punctuation properties, or patterns, such as:
  - length ( I, a, is )
  - special characters ( i'm, it's, i'll, Dr. )
  - various forms of repeated letters or letter sequences ( people, remember, successfulness ).
- can be classified as a surname, given name or title from its use in the cryptoquote attribution.
Pattern Code -
A pattern code is generated for Solution Words, where each unique letter in the word is assigned
a unique integer number starting at one and increasing sequentially. Repeated letters are assigned
the same respective code. All non-alphabetic characters are assigned the number zero.
Presently, a limit of nine unique letters (the codes use the single digits 1 through 9)
restricts which words can be coded and saved/'learned'.
For example, the word governmental contains 10 unique letters, so it cannot be coded.
Also, because every non-alphabetic character is coded as a '0', it's impossible to
distinguish between different special characters. For example, in gov't. [123040] the
apostrophe and the period both code as 0.
Generally, words that contain no repeated letters, i.e., words with a pattern code like [123456],
are not saved. The exception here is short words - up to three or so letters.
Also, solution words are further limited to 13 total characters.
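The coding rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Crypto-Buster's actual code; the function name pattern_code and its two limit parameters are made up for the example.

```python
def pattern_code(word, max_unique=9, max_length=13):
    """Return the pattern code for a word, or None if it exceeds a limit.

    Hypothetical helper: the name and parameters are assumptions.
    """
    if len(word) > max_length:
        return None                 # longer than 13 total characters
    codes = {}                      # letter -> digit given at first appearance
    digits = []
    for ch in word.lower():
        if not ch.isalpha():
            digits.append("0")      # every non-alphabetic character codes as 0
            continue
        if ch not in codes:
            if len(codes) == max_unique:
                return None         # a tenth unique letter has no single digit
            codes[ch] = str(len(codes) + 1)
        digits.append(codes[ch])
    return "".join(digits)


print(pattern_code("didn't"))        # 121304
print(pattern_code("gov't."))        # 123040
print(pattern_code("governmental"))  # None (10 unique letters)
```

Because the code depends only on where letters repeat, a cipher word and its solution always get the same code.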
Word Records -
When a solution word is found, if its key already exists the word is saved in the same record
with the other word(s), in alphabetical order. Otherwise, a new word record is created with the
new key. The word records are stored in numerically ascending key order. The database is simply
a flat text file.
For example, a segment of the database might contain:
- [121304] didn't
- [123142] aerate, people, proper
- [1234561] america, chronic, ..., example, ...trumpet
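As a rough sketch of the record-keeping just described, the following Python keeps records sorted by key and the words within each record in alphabetical order. The in-memory layout (a list of key/word-list pairs), the helper names, and the one-line text format are assumptions for illustration; the real file layout is not specified here beyond the sample segment.

```python
import bisect


def order(key):
    # Numeric order; length breaks ties between codes such as "0123" and "123".
    return (int(key), len(key))


def add_word(records, key, word):
    """Add word under key. records is a list of (key, [words]) kept sorted.

    Returns True if the word was added, False if it was already saved.
    """
    positions = [order(k) for k, _ in records]
    i = bisect.bisect_left(positions, order(key))
    if i < len(records) and records[i][0] == key:
        words = records[i][1]
        j = bisect.bisect_left(words, word)
        if j < len(words) and words[j] == word:
            return False
        words.insert(j, word)                 # keep alphabetical order
    else:
        records.insert(i, (key, [word]))      # new record, ascending key order
    return True


def format_record(key, words):
    # Assumed text form, modelled on the sample segment above.
    return "[%s] %s" % (key, ", ".join(words))


records = [("121304", ["didn't"]), ("123142", ["aerate", "proper"])]
add_word(records, "123142", "people")   # joins an existing record
add_word(records, "1231", "that")       # creates a new record, sorted first
for key, words in records:
    print(format_record(key, words))
```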
Word Categories -
Since cryptoquotes consist of the body of a quote followed by the author's name (attribution)
or source, it is useful to categorize solution words. Three categories (sub-databases) are
presently used: General or Generic words - found in the body of the cryptoquote, and First Names
and Last Names - found in the attribution. Word records, as described above, are used for each
category.
To view the current vocabulary database, click: cbWordLib.cpf or
use Crypto-Buster's DB Tools option - View DB.
Solutions using Word Patterns
One of the best solution methods I've found is to focus on those cipher words that contain
repeated letters or letter sequences, punctuation characters, single letters, etc.
A pattern code can be generated for cipher words also. For example: FQFZ'X[121304],
HQXHWQ[123142] and KWTZJQK[1234561].
So, if you encounter a crypto word such as FQFZ'X, with the pattern [121304], then you
have a definite solution (didn't), because there is only one word with this pattern in the
Crypto-Buster vocabulary (database). That holds, of course, only until another (different)
word is discovered that has the same pattern code.
Presently, it's the only word I've personally found for the code [121304].
Single Word Solution Search
The mechanism for looking up candidate solutions is pretty straightforward. When a coded
crypto word is entered, the program generates its pattern code and then finds and reports,
for each word category, all words with that key in which no solution letter is the same
as the cipher letter it replaces. It also reports if no candidates are
found, or if the pattern (key) does not exist. In the former case you know you've found
another new solution word for this particular pattern code and category.
In the latter case, however, you know you've discovered a single brand new solution
word. These are usually good finds because they are definite solutions - until you
find another word with its pattern code.
The search mechanism also allows supplying a filter to narrow down the search. Here you
simply give it any known, or trial, solution letter(s) for any given coded letter(s). The
search then selects only those solution word candidates that match the specified letter
assignments, that have no cipher letter standing for itself among the other (unspecified)
letters, and that do not use any of the pre-assigned solution letters for those other
letters. I think this says it right!
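Here is one way the single-word search with a filter could look in Python. It is a sketch of my reading of the rules above, not the program's real code: the names pattern_code, letters_fit and search, the dictionary-based vocabulary, and the requirement that punctuation match exactly are all assumptions, and the nine-letter and 13-character limits are left out for brevity.

```python
def pattern_code(word):
    # Compact pattern coder: digits for letters, 0 for anything else.
    codes = {}
    digits = []
    for ch in word.lower():
        if ch.isalpha():
            digits.append(str(codes.setdefault(ch, len(codes) + 1)))
        else:
            digits.append("0")
    return "".join(digits)


def letters_fit(cipher, word, known, taken):
    for c, p in zip(cipher, word):
        if not c.isalpha():
            if c != p:
                return False      # assumption: punctuation is not enciphered
        elif c == p:
            return False          # a cipher letter never stands for itself
        elif c in known:
            if known[c] != p:
                return False      # must agree with the filter
        elif p in taken:
            return False          # solution letter already pre-assigned
    return True


def search(cipher_word, vocabulary, known=None):
    """vocabulary maps pattern code -> list of words (one word category).

    known is the optional filter: cipher letter -> solution letter.
    """
    known = {c.lower(): p.lower() for c, p in (known or {}).items()}
    taken = set(known.values())
    cipher = cipher_word.lower()
    words = vocabulary.get(pattern_code(cipher), [])
    return [w for w in words if letters_fit(cipher, w.lower(), known, taken)]


vocab = {"121304": ["didn't"], "123142": ["aerate", "people", "proper"]}
print(search("HQXHWQ", vocab))                        # ['aerate', 'people', 'proper']
print(search("HQXHWQ", vocab, {"Q": "e"}))            # ['aerate', 'people']
print(search("HQXHWQ", vocab, {"H": "p", "Q": "e"}))  # ['people']
```

An empty result does not say whether the key was missing or merely had no fitting words; telling those two cases apart, as the program does, would take one extra check on the vocabulary.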
Word Phrase (multi-word) Solution Search
Often a particular cryptoquote will not contain any, or many, patterned words (those
with repeated letters). Then the above single-word search capability is of little value,
since words with all unique letters are not generally saved in the database. However, it
often happens that two or more words (sometimes in succession) share one or more
letters, even letter sequences, in common. An example might be: QWZXGF ZKB GVW QWZPG.
The phrase search might yield: 'beauty and the beast' (among other possible solutions).
The mechanism for completing this search is quite complicated and can be time consuming.
It starts by first doing a single-word search for each phrase word, independently.
From this set of possible candidate solutions for each phrase word it then, essentially,
builds trial solution phrases consisting of all combinations of candidate solution words
for each phrase word. So, if there are three words in the phrase and each word has five
individual candidate solutions, then the total number of trial phrase evaluations required
would be 125 (5x5x5).
In the trial phrase solution search, once a candidate solution for a particular phrase word
is assigned, its solution letter assignments become filters, for any common letters, in all
of the subsequent phrase word solution searches. It's an iterative process that's kind
of hard to describe in words - and was a bit of a challenge to code!
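Since the process is hard to put into words, a small Python sketch may help. It is written recursively here, which is just one way to express the idea and not necessarily how Crypto-Buster does it. The function names and the hand-made candidate lists are assumptions for illustration; in the real program the candidates would come from the single-word search.

```python
def extend(assignment, cipher_word, candidate):
    """Return assignment plus cipher_word -> candidate, or None on a conflict."""
    if len(cipher_word) != len(candidate):
        return None
    new = dict(assignment)                  # cipher letter -> solution letter
    used = {p: c for c, p in new.items()}   # solution letter -> cipher letter
    for c, p in zip(cipher_word.lower(), candidate.lower()):
        if not c.isalpha():
            continue
        # Both directions must agree: one cipher letter, one solution letter.
        if new.get(c, p) != p or used.get(p, c) != c:
            return None
        new[c] = p
        used[p] = c
    return new


def phrase_search(cipher_words, candidates, assignment=None):
    """Yield every list of words that decodes the whole phrase consistently.

    candidates[i] holds the single-word candidates for cipher_words[i].
    assignment is an optional starting filter (lowercase cipher -> solution).
    """
    assignment = assignment or {}
    if not cipher_words:
        yield []
        return
    for cand in candidates[0]:
        extended = extend(assignment, cipher_words[0], cand)
        if extended is None:
            continue          # this branch is dropped before going deeper
        for rest in phrase_search(cipher_words[1:], candidates[1:], extended):
            yield [cand] + rest


cipher = ["QWZXGF", "ZKB", "GVW", "QWZPG"]
candidates = [                      # hand-made lists, for illustration only
    ["beauty", "wealth"],
    ["and", "are", "but"],
    ["the", "two"],
    ["beast", "heart", "death"],
]
for phrase in phrase_search(cipher, candidates):
    print(" ".join(phrase))         # beauty and the beast
```

With these lists there are 36 possible combinations (2x3x2x3), but a conflicting word ends its branch at once, so most of them are never built in full.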
As with the single word search, it is also possible to initially include filter letter
assignments in the phrase search. However, unlike the single word search, this search
is constrained to looking at either only Generic words for solutions or only Names -
where First names are used for all cipher words except the last one, which uses only Last
names.
Phrase Search Limitations - There are presently many limitations with the phrase search:
- If the solution to any one word in the phrase fails, the entire solution fails - it
needs a partial solution capability (some day!).
- The search can "time-out" (abort) due to a 2 minute server transaction time limit.
- Some phrases may produce several hundred solutions - making it hard to find the one
or two (or none) valid ones in the reported results.
- The phrase search can find solutions consisting only of either Generic words or
Names - where First names are used for all words except the last, which uses only Last names.
But I may change this to use both First and Last Names for all cipher words.
- Phrase words are solved for in their input order. Sometimes a different order results
in faster searches. I'm thinking about this problem.