A somewhat odd attempt to answer the question:
(Please note no monkeys were harmed during the course of this experiment.)
ShakespeareMonkey@Home is a distributed application that simulates a population of evolving monkeys typing lines from The Complete Works of William Shakespeare (Each client represents one monkey family typing). The family connects to the “Zoo” (a mySQL database containing Shakespeare’s complete works), selects a play to work on, and receives a line from that play to type. Once the line is completed, the family reports its work to the zoo and waits to receive a new line.
CURRENT MONKEY STATUS: Unfortunately the server that housed the database is no longer operational. As soon as I can get the server application back up and running in a new home, the client will be available again.
Shakespeare Monkey Probability
For our purposes, we won’t use an actual monkey or a typewriter. Instead, we will employ a digital monkey simulation — for now, let’s call our monkey George.
George types at a reduced keyboard, containing only 32 characters:
1 space bar
5 punctuation marks
On George’s keyboard, the space bar is slightly larger than the other keys and George’s fingers tend to hit it more often. The probability of George striking any given key is as follows:
Each Letter and Punctuation Mark 1 in 34 The space bar 3 in 34
Consider the phrase:
To be, or not to be: that is the question:
The phrase is 42 characters long — 9 space bars, and 33 other characters. If George starts typing, the chance he’ll get the first character right is 1 in 34. Since the probability he’ll get the second character right is also 1 in 34, he has a 1 in 34*34 chance of getting landing on both the first two characters. It follows that the probablility that George will type the full Shakespearan stanza is:
((1 / 34) ^ 33) * ((3 / 34) ^ 9)
or ~ 1 in 1.0666531575100571968353546589146 * 10^60
That exact # is a:
1 in 1,066,653,157,510,057,196,835,354,658,
chance of George getting it right.
Our digital monkey, however, is a pretty fast typer — George can do about 20 lines per second. How long would it take, therefore, for George to have a 99% probability of eventually getting it right? To compute this number, we’ll first need to consider how many lines George would have to type to have a 1% probability of getting it wrong every single time.
Assume N = that long # listed above (the probability of getting it right on one attempt).
The probablity of George getting it wrong on one try is therefore:
(1 – 1/N)
We want George to have a 1 out of 100 chance of getting it wrong over X # of tries, therefore:
(1 – 1/N) ^ X = 1/100 which is equivalent to:
((N-1)/N) ^ X = 1/100
Solving for x, we take the natural log of each side:
ln((N-1)/N) ^ X) = ln(1/100)
X * ln((N-1)/N) = ln(1/100)
X = ln(100) / (ln(N/(N-1))
As N goes to infinity, ln(N/(N-1)) = ~1/N, therefore:
X = ~ N * ln(100)
Since George types 20 lines per second, we can say that he’ll type 630,720,000 lines per year. The number of years it will take him is:
(N / 630,720,000) * ln(100) = 7.8 * 10^51 years
Ok, ok, so that’s a really long time. But what if, say, we have a hundred trillion monkeys all typing at once? How long would it take to have a 99% probability of one of the monkeys getting it right? A few calculations will lead us to the conclusion that this will take about
7.8 * 10^37 years
The Collaborative Monkey
Consider this, however. What if all the monkeys are connected together via a massive network, and 1 monkey tracked all of the monkey’s activity in a database? And instead of typing randomly, each monkey would type in sequence:
Each character represents a number from 1 to 32 (i.e. a 5-bit number). If each character can be represented by a 5-bit number, than the entire string can be represented by a 210-bit number. So consider if the total (2^210) was divided into 2^194 “blocks” of numbers. Each monkey connects to ServerMonkey and retrieves a random 194-bit number outlining which block to check. The monkey is then required to count through all the possibilities inside their block — starting at (2^16)*block# and finishing at ((2^16)*(block#+1))-1 (note block #’s start at 0). Each monkey would be able to complete a block in ~55 minutes. How long would it take for a 99% probability of “To be or not to be. . .” with the 100 trillion collaborative monkeys going all at once (never typing the same line twice)? Sadly, the answer is still a massively large number — 3.9 * 10^35 years. I should point out, however, that we have saved ourselves about 7.761 * 10^37 years. Not so bad, eh?
The Genetically Enhanced Monkey
Even though millions of monkeys working together on a network has improved our result on a massive scale, we still haven’t solved the problem of turning a monkey into shakespeare in an amount of time that any human could experience in a given lifetime.
What if we bred digital monkeys with DNA signatures — and these DNA signatures determine the pattern by which the monkeys type 42 characters? A monkey’s “fitness” would be the number of characters that match “to be or not to be. . .” in his or her DNA. Let’s start with a population of 100 monkeys and give the high fitness monkeys a proportionately higher chance of being selected for mating. At birth, each monkey child’s DNA is a combination of two selected parent monkeys. (In addition, each individual character in the child’s DNA string also has a small probability of “mutation” — i.e. of becoming a random character, rather than one from the parent.)
After running the simulation only once, the genetically enhanced monkeys were able to type the line of Shakespeare in 30 seconds, after only 198 generations of monkeys. In a real-world setting, if a new generation of live monkeys could be birthed every 10 years, we’d only have to wait 1,980 years for monkey + typewriter = shakespeare.
We have established that the monkey’s DNA must determine that individual monkey’s pattern of typing. Let’s assume we want our monkeys to type “To be or not to be: That is the question:” There are 32 characters on the keyboard, and 42 characters in the desired string. How can we express the monkey’s DNA as binary information?
Each key on the monkey’s typewriter corresponds to a 32-bit number.
KEY # 1 = ‘a’ or ’00000′
KEY # 2 = ‘b’ or ’00001′
KEY # 3 = ‘c’ or ’00010′
etc. etc. etc. up until
KEY # 32 = SPACE or ’11111′
Let’s take the genome:
We can divide it in to 5-bit segments:
’01011 00111 10101 01001 01010 10101 11100 00101′
This translates to decimal:
Which translates to a typing sequence of key #’s:
And which translates to the phrase:
So now, we have a monkey somewhat like Jack Torrence in The Shining, only instead of “All work and no play make Jack a dull boy,” we have a monkey that types pages and pages of “lguiju;flguiju;flguiju;flguiju;flguiju;f. . . ”
For us to get “To be or not to be: That is the question:” out of the monkey, we’ll need a longer genome — a 210-bit character one.
16, 7, 23, 0, 5, 0, 12, 7, 11, 4, 8, 0, 10, 24, 11, 8, 3, 27, 11, 28, 22, 16, 16, 5, 11, 18, 14, 9, 17, 27, 5, 15, 21, 9, 17, 27, 20, 28, 31, 6, 1, 17
Suppose we have a group of 5 monkeys, with the following DNA sequences (expressed as a character string for simplification).
1. “,:drxkaitxsdp: f?ynjasqosupchujk,, rizhg:r”
2. “cklavicmemt :bbkyyn fjgd?y?k?dl’goyponko’a”
3. “lt;wv hmstx,cxy ‘,mo;vwwpbqkuibuhq;f’q'ow”
4. “zqayapqfiszwrh ?isye: ghg:mllowx’zc’uyadly”
5. “ielcm,sc dnzm,ngeuviu:lu;zo ojxwpizvk;adqe”
Each monkey’s fitness is the # of characters in their DNA that match “To be, or not to be: That is the question:”
1. Fitness = 0
2. Fitness = 1
3. Fitness = 2
4. Fitness = 0
5. Fitness = 2
Total Fitness = 5
1. % of Total Fitness = 0%
2. % of Total Fitness = 20%
3. % of Total Fitness = 40%
4. % of Total Fitness = 0%
5. % of Total Fitness = 40%
Next, we will select parent monkeys for breeding. Each monkey’s chance at being selected is equal to their % of Total Fitness. In this case, it’s as if we spun a roulette wheel with these possible outcomes: 2,3,3,5,5 (since Monkeys 1 and 3 scored a zero,
they are not eligible for reproduction). Let’s assume we randomly select monkeys #3 and #5 to be our first parents.
Next we pick a random point in the chromosone string for crossover. For a true genetic algorithm, we would revisit our 210-bit representation of the genome, but for our purposes, we’ll just pick a # between 2 and 41 and deal with the character string. Let’s say we pick 18. For our new monkey, we take the first 18 characters of Monkey #3 DNA:
and the last 24 characters of Monkey #5 DNA:
Our new DNA is:
“lt;wv hmstx,cxy ‘viu:lu;zo ojxwpizvk;adqe”
We’re not done, however. Let’s assume there is a 2% chance of genetic mutation on any given chromosone in the newly born monkey’s DNA and chromosone #8 changes to an “e”. Our final DNA:
“lt;wv emstx,cxy ‘viu:lu;zo ojxwpizvk;adqe”
Well, we haven’t gotten “to be or not to be. . .” yet, but we do have a newly born monkey with a fitness of 2. If we increase the monkey population size from 5 to 100, and iterate the process over several hundred generations, we will most likely find a monkey with a DNA string of “to be, or not to be: that is the question:”.
Now, what we need is a zoo of monkey families, all working on different parts of Hamlet at the same time — a distributed network of evolutionary typing monkeys. . .