HR Geeks

Hampton Roads Geek community

757 Monkeys, Typewriters, and Shakespeare — Project GorillaSpeare

Filed under: IRC - cool ideas - humor - lulz

I am sure many of you have heard of the thought experiment relating monkeys, typewriters, and Shakespeare to the concept of entropy. Monkeys, typewriters, Shakespeare, you say!? How much cooler can things get? Well, this creative thought experiment goes as follows:

“The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a particular chosen text, such as the complete works of William Shakespeare” [Source: http://en.wikipedia.org/wiki/Infinite_monkey_theorem].
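To put a rough number on “almost surely”, here is a small sketch. It assumes a hypothetical 26-key typewriter where every keystroke is independent and equally likely, and it uses the six-letter word "hamlet" as a stand-in for the complete works. None of those specifics come from the quote; they are only there to make the arithmetic concrete.

```python
import math

def chance_of_typing(text_length, keys=26):
    """Probability that one run of `text_length` random keystrokes matches a fixed text."""
    return (1.0 / keys) ** text_length

def chance_within(attempts, text_length, keys=26):
    """Probability that at least one of `attempts` independent runs matches."""
    p = chance_of_typing(text_length, keys)
    # 1 - (1 - p)^attempts, written with log1p/expm1 so it stays accurate for tiny p
    return -math.expm1(attempts * math.log1p(-p))

target = "hamlet"  # stand-in text, an assumption for illustration
print("one attempt:", chance_of_typing(len(target)))
for attempts in (10**6, 10**9, 10**12):
    print(attempts, "attempts:", chance_within(attempts, len(target)))
```

The chance of a single run matching is tiny, but the chance of at least one match creeps toward 1 as the number of attempts grows, which is the whole point of giving the monkey infinite time.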

I am not going to go into the history of that thought experiment, or much more; the wiki link above should do it justice. So what do monkeys and typewriters have to do with the 757ers? Well, I’ll let you take a look for yourself, as I should not impose any bias:

That’s right: nerds, computers, and text generation. So I had an idea: if there is a potential for monkeys to produce a work as marvelous as Shakespeare, surely my fellow Homo sapiens should be able to generate something of equivalent brilliance. Thus, the birth of Project GorillaSpeare. The idea was to gather a log of #proto on the 757 IRC server, and eventually compare the log to Hamlet.

Thanks to Project Gutenberg, I obtained a plain-text copy of Shakespeare’s Hamlet. From it I stripped the lines that indicate who is to say what (yep, Hamlet is written as a play), the newlines, and some of the stage directions of a form similar to: [Ham. exits]. Once that was parsed, I wrote some code that walked through Hamlet one character at a time and, for each character, found the first instance of it in the IRC log file. Also captured was the user who typed that character (spaces included). The processing job ended when the IRC log ran out.

Now I must say, my parsing job was not perfect, nor can I credit the findings as being anything of scientific worth. But enough with the wordy foreplay, and on to the results:

  • Parsed Hamlet Text: 164642 characters
  • Parsed IRC Log: 32365 characters, from January 11, 2008 to April 5, 2008 (the log was gathered only while I was logged in).
  • We banged out about 19.657% of Hamlet
  • About every 5.087 characters we plopped out 1 character of Hamlet.
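
A quick check of the two ratios above, assuming both are simply quotients of the two character counts listed (the post does not show the calculation, so that is a guess). The variable names are made up for illustration.

```python
hamlet_chars = 164642  # "Parsed Hamlet Text" count from the list above
irc_chars = 32365      # "Parsed IRC Log" count from the list above

percent = irc_chars / hamlet_chars * 100
ratio = hamlet_chars / irc_chars

# Print a few extra digits so the effect of rounding vs. truncating is visible
print(f"percent: {percent:.5f}")
print(f"ratio:   {ratio:.5f}")
```

Both quotients land on the quoted figures to within the last decimal place; whether that last digit matches exactly depends on whether the value is rounded or truncated.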
Index  Handle    Hamlet Character Matches
1      telmnstr  2140
2      count     1027
3      enferex   549
4      remad     379
5      sean      294
6      derez     284
7      skhisma   198
8      chad      196
9      zotobot   193
10     Fister    144

The rest of the results can be obtained here.
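
For the curious, the matching pass described earlier could be sketched roughly like this. It is not the original GorillaSpeare code. It assumes the log is available as (handle, message) pairs, that matching is case-sensitive, and that the search for each Hamlet character resumes right after the previous match. The name `match_hamlet` and the sample data are hypothetical.

```python
from collections import Counter

def match_hamlet(hamlet_text, irc_log):
    """Greedy in-order match of Hamlet characters against an IRC log.

    irc_log is an iterable of (handle, message) pairs (an assumed format).
    Returns (number of Hamlet characters matched, per-handle credit).
    """
    # Flatten the log into (handle, character) pairs in the order they were typed.
    typed = ((handle, ch) for handle, message in irc_log for ch in message)
    credit = Counter()
    matched = 0
    for wanted in hamlet_text:
        # The generator remembers its position, so each search resumes
        # where the previous one stopped.
        for handle, ch in typed:
            if ch == wanted:
                credit[handle] += 1
                matched += 1
                break
        else:
            break  # the IRC log ran out, so the job ends
    return matched, credit

# Tiny made-up log, just to show the call
sample_log = [("telmnstr", "got it"), ("count", "boo beer")]
matched, credit = match_hamlet("to be", sample_log)
print(matched, dict(credit))
```

Sorting `credit.most_common()` would give a per-handle ranking in the same shape as the table above.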

So what does this “study” tell us about our entropy? Well, for one, I would think that a 1/5 ratio of Hamlet to Nerds is pretty efficient, but that’s my opinion. The results do not tell us too much; I just figured it would be interesting to see how efficient the IRC room is at generating a novel without setting out to do so. Granted, we are not communicating a novel per se; rather, what our blabberings have generated is still somewhat ordered, even when compared against a text we were not trying to produce. In the thought experiment, the monkeys are typing pseudo-randomly. The next phase (GorillaSpeare 2.0) is to compare our writings to the monkeys’ and measure what I assume was the original point of the monkeys: a fairly good quality of pseudo-randomness. My conclusion is that monkeys, our brethren, are awesome, and we as Homo sapiens are no higher. If we were asked to bang on some keyboards without a premise, I’m sure we could do just as good a job.

-Matt (enferex)

Who has the better satellite view?

Filed under: cool ideas - links - website

I recently was linked to Flash Earth. This site allows you to switch between satellite map views with a click of the mouse. Compare Google, Yahoo!, Microsoft VE (Virtual Earth), Ask.com, OpenLayers, and NASA Terra.

Flash Earth Screen Shot 01

Images are presented via an all-Flash interface, and the speed at which you can switch between services, and at which the overlays change, is quite amazing.

I thought Google had really good images of Norfolk until I switched over to Microsoft VE. Here is an example of the Norfolk Southern coal yard and train depot. (Left: Microsoft VE, Right: Google)

Flash Earth Screen Shot Microsoft VE | Flash Earth Screen Shot Google

Thanks Erin.

Newegg, CAPTCHA, browsers = reCAPTCHA

Filed under: cool ideas

Something curious I noticed today: log in to Newegg using Firefox and you are forced to use a CAPTCHA; use IE and it’s not there. I’m going with IE here because I couldn’t figure out if it was showing me a “Z” or an “N”, and neither would work!

Newegg acct login in Firefox | Newegg acct login in Internet Explorer

Now while I am generally a fan of CAPTCHA, I am an even bigger fan of using the computer for good; enter reCAPTCHA. Essentially, a Carnegie Mellon team led by Luis von Ahn noticed that, when books are digitized, the scanning process turns up words that OCR software cannot convert to text. “Each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.”

Sample reCAPTCHA OCR scan

Here’s how it works:

“Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.”
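
The scheme in that quote can be sketched in a few lines. This is only an illustration of the idea, not reCAPTCHA’s actual code or API. The function names, the case-insensitive comparison, and the agreement thresholds (`min_votes`, `min_share`) are all assumptions made up for the example.

```python
from collections import Counter

def check_submission(control_answer, typed_control, typed_unknown, votes):
    """Pass the user only if they got the known (control) word right.

    When they pass, their reading of the unknown word is recorded as a vote.
    """
    if typed_control.strip().lower() != control_answer.lower():
        return False
    votes[typed_unknown.strip().lower()] += 1
    return True

def consensus(votes, min_votes=3, min_share=0.75):
    """Return the agreed reading of the unknown word, or None if not settled yet."""
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None

# Made-up example: the control word is "overlooks", the unknown scan reads "morning"
votes = Counter()
check_submission("overlooks", "overlooks", "morning", votes)   # passes, vote counted
check_submission("overlooks", "overlocks", "mourning", votes)  # fails, vote ignored
check_submission("overlooks", "Overlooks", "morning", votes)
check_submission("overlooks", "overlooks", "morning", votes)
print(consensus(votes))
```

The key design point is that a guess for the unknown word only counts when the same user proved themselves on the known word, and even then it takes agreement from several people before the reading is trusted.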

By reCAPTCHA’s estimates, about 60 million CAPTCHAs are solved by humans every day, equating to more than 150,000 hours of work each day worldwide. That’s a lot of potentially useful human effort that can serve the general good by channeling the time spent solving CAPTCHAs online into “reading” books.
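
As a back-of-the-envelope check, the two estimates above imply a per-CAPTCHA solving time. The snippet below just divides one figure by the other; the variable names are made up, and since the hours figure is quoted as “more than”, the result is a lower bound.

```python
captchas_per_day = 60_000_000  # estimate quoted above
hours_per_day = 150_000        # estimate quoted above ("more than")

seconds_each = hours_per_day * 3600 / captchas_per_day
print(f"at least about {seconds_each:.1f} seconds per CAPTCHA")
```

In other words, the estimates only hang together if each CAPTCHA takes somewhere around nine seconds or more to solve.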

 

 
