I was originally introduced to Markov chains by Josh Millard’s Garkov, a project that applies this mathematical model to Garfield comics. I thought it was an amusingly random idea, but didn’t fully appreciate the concept until Jeff Atwood wrote a post detailing how Markov chains work.
To quickly summarize, the algorithm chooses its next word using a probabilistic function based on two inputs: the current word and a large sample of coherent text. The chance of a word being chosen is based entirely on how frequently it follows the current word in the sample text. For instance, if the sample was “To be, or not to be, that is the Question”, the word “be” would have a 50/50 chance of being followed by “or” or “that.” As the sample size increases, the text produced becomes increasingly sound. [Update: The size of the sample does not necessarily increase the coherency of the text; the number of words used in the seed is the biggest determining factor. Thanks for the correction, Josh.]
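Here is a minimal Python sketch of that idea, just to make the “50/50” claim concrete. It is my own illustration, not the code behind any of the tools mentioned here; the function names `build_transitions` and `next_word_probabilities` are made up for this example, and it assumes words are lowercased and punctuation is thrown away.

```python
import re
from collections import Counter, defaultdict


def build_transitions(text):
    """Map each word to a Counter of the words that follow it in the sample."""
    # Assumption: lowercase everything and ignore punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    transitions = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1
    return transitions


def next_word_probabilities(transitions, word):
    """Turn the follower counts for `word` into probabilities."""
    counts = transitions[word]
    total = sum(counts.values())
    return {follower: n / total for follower, n in counts.items()}


sample = "To be, or not to be, that is the Question"
table = build_transitions(sample)
print(next_word_probabilities(table, "be"))  # {'or': 0.5, 'that': 0.5}
```

In this sample “be” is followed once by “or” and once by “that”, so each gets a probability of 0.5, while “to” is always followed by “be”.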
Reading about this got me very curious. What would a Markov chain generated using my own writing look like? Would it overuse “however” and frequently misapply the word “parse” like I do? I decided some experimentation using the Markov Text Synthesizer was in order.
The first test sample was my Rockin’ The Boat series on rock ‘n’ roll and race relations in 1950s America. At a scant ~4000 words it was a poor sample, which made it easy to recognize parts of the original in the generated text:
Rock ‘n’ roll party. Freed considered completely alien. Mintz had his personal success, but rather the Speeds had been a little pressed for school facilities. They were subject to unsuccessful alternative names such as much for the trend was classified under the Brown v. Board of, in ten in, most popular opinion in the same records was no memory of the major label was a lesson. He was a token job. The major record store owner in postwar economy. During the following lyrics which were localized in America. A successful rhythm and I feel would sing to popular.
For my next attempt, I used every post listed under the Video Games category. This was over ten times larger than my original sample, and the text was much more randomized as a result (it even snuck in a “however”, hooray!):
There are games have two things spectacularly well. One where it will work at first. They mix means of Internet hype around the U.S. Addresses Only. Heartbroken, I was, however, describe what is one of the censor, the creative Wiimote according to the end. Plot holes and Escapist feature. Embedded below is minutes of military shooters had her to convince you feel that the game will work done with blood all available up the narrative. He created the Medic class impressions. The use of his weekly feature film was already breaking my favourite newspaper articles. Read the end. The very effective ways. My favorite aspect of a short stories, particularly dystopian centuries. At first game Rock Band, but the best like a timeless void movie revolved largely passed under them all.
Next, I decided to use the same sample but increase the “word grouping” to two. This means that the function now determines how frequently a word follows the two previous words (in order). This made the text much more coherent, but brought back the problem of being similar to the original sample:
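To show what the word grouping changes, here is a rough Python sketch. It is an assumption about how such a generator could work, not the Markov Text Synthesizer’s actual code; `build_chain`, `generate`, and the `order` parameter are hypothetical names I picked for the example.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=30, rng=random):
    """Start from a random key, then repeatedly pick a follower of the last `order` words."""
    key = rng.choice(list(chain))
    output = list(key)
    while len(output) < length:
        followers = chain.get(key)
        if not followers:  # dead end: this run of words only ends the sample
            break
        nxt = rng.choice(followers)  # repeated followers are proportionally more likely
        output.append(nxt)
        key = key[1:] + (nxt,)
    return " ".join(output)


words = "the cat sat on the mat and the cat ran to the door".split()
print(build_chain(words, order=1)[("the",)])       # four possible followers
print(build_chain(words, order=2)[("the", "cat")])  # only two
print(generate(build_chain(words, order=2), length=10))
```

With a grouping of one, “the” can be followed by any of four words. With a grouping of two, “the cat” leaves only two options, which is why the output reads better but sticks closer to the source.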
There have been used by other games in the villa’s lobby. Recharging your beam sword’s energy with a Little Sister crying for “Mr. Bubbles”, but it costs a whopping. December is a classic for years to come. Valve could have easily rested on their laurels and sold Portal on its PS release Tools of Destruction, with some great part series detailing each other, it seems that the FPS genre has been getting a lot of the murder investigation. The evidence found and is by far the best thing the Bioshock movie revolved largely around and go buckwild with ADAM. I would rescue them all. It was also annoying at times. To move Zack around you have to mimic it in real life.
Finally, I decided to test the program with a very, very large sample: Cory Doctorow’s free e-book Little Brother. I don’t think this is the sort of remixing Doctorow had in mind when he released his book under Creative Commons, but here goes:
I’m a stone Disney park freak see my fear. Marcus, it can fill you with our magic coffee bean supply. If you’ve got it for a lot longer than me. One only spoke Arabic he began, when one of the country, the world to know it’s there. The truck drew closer. I kept hitting it with both hands. She was dressed in a drawer with the texture of very ugly stuff on the back of a polished marble plinth. I decided I wanted to do stand down. Will you promise me that? They promised with all the walls and other little devices, my wallet intact. Forbidden Planet rarely do I go to jail, Marcus, and not that many cool puzzles that you wanted to be a part of that Hummer, my head for my book by net or phone and memory sticks are private. Thats because of cryptography message scrambling. The math behind crypto is good and solid, and you wonder, Was it always like that? I hadn’t thought about trying to intimidate me the picture she’d snapped of us, if you wanted to.
I’m amazed: a very large sample with a word grouping of two or greater can produce some terrific results. It makes me wonder why spammers always end up with such nonsense titles. Use a larger sample, you jerks!
In my various attempts, I also ended up with some rather spectacular phrases. Here are some of my favourites. I swear I did not make any of these up:
The whimsical charms of each of disappointed children
Holding the Wiimote in the holiday rush? Either way, you’re probably online looking for awesome news like this
Your character’s level is on a couch. They’re comfortable for both sides.
a combination of raw stem cells, dubbed thing, and the USA
I believe I was blindsided by exploiting my childhood memories
…and the best one by far:
This essay will be successfully ignored altogether.
Check out Ben Abraham’s post for some more great ones (he beat me to the punch by a few hours). Finally, I’ll leave you with a challenge. Can you guess what sample I used to produce the text below?
Peyote solidities of halls, backyard green tree cemetery dawns, wine drunkenness over the rooftops, storefront boroughs of teahead joyride neon blinking traffic light, sun and moon and tree vibrations in the roaring winter dusks of Brooklyn, ashcan rantings and kind king light of mind, who chained themselves to subways for the endless ride from Battery to holy Bronx on benzedrine until the noise of wheels and children brought them down shuddering mouth-wracked and battered bleak of brain all drained of brilliance in the drear light of Zoo,
***********ANSWER***********
Trick question: it’s actually a word-for-word excerpt from Allen Ginsberg’s Howl. Only the formatting has been edited. I think the beat poets would have dug Markov chains, don’t you?






