Hacker News
A Markov text generator that learns from the Twitter streaming API (twitter.com/tweetovermind)
42 points by pclark on May 23, 2010 | 15 comments


Here is a layman's introduction to Markov text generation:

http://www.in-vacua.com/markov_text.html

With some code:

http://uswaretech.com/blog/2009/06/pseudo-random-text-markov...
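The linked articles explain the idea in prose. For a rough feel of it, here is a minimal word-level sketch in Python. It assumes plain whitespace tokenization and a fixed prefix length (`order`). The names `build_chain` and `generate` are hypothetical and are not taken from either article or from the bot's code. The chain maps each run of `order` words to every word that followed it in the training text. Generation starts from a random prefix and repeatedly picks one of the recorded followers.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        # Duplicates are kept on purpose, so frequent followers get picked more often.
        chain[prefix].append(words[i + order])
    return chain

def generate(chain, length=20, seed=None):
    """Walk the chain from a random prefix until `length` words or a dead end."""
    rng = random.Random(seed)
    prefix = rng.choice(list(chain))
    order = len(prefix)
    output = list(prefix)
    while len(output) < length:
        followers = chain.get(tuple(output[-order:]))
        if not followers:  # dead end: this prefix never had a follower in the corpus
            break
        output.append(rng.choice(followers))
    return " ".join(output)

# Usage: a tiny corpus with order=1 (each word predicts the next).
chain = build_chain("the cat sat on the mat and the cat ate the rat", order=1)
print(generate(chain, length=8, seed=1))
```

A larger `order` gives more grammatical output that copies the source more literally. A smaller one gives more novel output that reads more like gibberish.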


Thank god for this... I had no idea what Markov was!


Wow, I built a Markov text generator years ago, and made a PHP bot to crawl web pages to gather text for it, but that was before Twitter even existed. One suggestion might be to add a dictionary of English words and acronyms so that you can weed out nonsense words and other languages. It would probably have a big effect on the accuracy.

I wonder if the code is available anywhere? I'd love to experiment with it myself and see if I can improve on it some.

Edit: I found the code: http://github.com/OEP/markov
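One way to act on the dictionary suggestion above is to score each incoming tweet by the share of its words that appear in a word list, and to train only on tweets above a cutoff. This is a minimal sketch. `ENGLISH_WORDS` and `ACRONYMS` are tiny hypothetical stand-ins for a real word list, and the 0.8 threshold is arbitrary. It is not how the linked OEP/markov code works.

```python
import re

# Hypothetical stand-ins: a real filter would load a full English word list from a file.
ENGLISH_WORDS = {"i", "love", "this", "song", "so", "much", "the", "is", "great"}
ACRONYMS = {"lol", "omg", "brb"}

def english_ratio(tweet, vocabulary):
    """Fraction of alphabetic tokens in `tweet` that appear in `vocabulary`."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)

def keep_for_training(tweet, threshold=0.8):
    """Train only on tweets made up mostly of known English words or acronyms."""
    return english_ratio(tweet, ENGLISH_WORDS | ACRONYMS) >= threshold

# Usage
print(keep_for_training("I love this song so much lol"))  # mostly known words
print(keep_for_training("Ich liebe dieses Lied"))          # other language, filtered out
```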


This is a rather fun idea, I like it a lot :) So far I'm using a mix of PHP, C, and more PHP and have just set the bugger live here: https://twitter.com/markov_chains


Wow, the quality of yours looks much better (especially considering it only started learning a few hours ago). The OP's is a bunch of meaningless gibberish.


Thanks! To be honest, I'm really quite scared by how often its tweets kinda-sorta make sense ;)


Oh gosh. You just programmed a 13 year old.


I altered the engine to stop rejecting @ and # characters, and now it looks even more like a proper teenager ;)
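The commenter doesn't say how the engine rejected these characters. Assuming the filter works at the token level, the change amounts to something like this hypothetical `tokenize` switch. With the switch on, @mentions and #hashtags survive into the chain, so they also show up in the generated tweets.

```python
def tokenize(tweet, allow_mentions_and_tags=False):
    """Split a tweet into words, dropping @mentions and #hashtags unless allowed."""
    tokens = tweet.split()
    if allow_mentions_and_tags:
        return tokens
    return [t for t in tokens if not t.startswith(("@", "#"))]

# Usage
print(tokenize("@bob this is #awesome"))
print(tokenize("@bob this is #awesome", allow_mentions_and_tags=True))
```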


This is delightful. What I'd really like to see now is one that could learn from people's reactions to it: pay attention to which of its tweets were retweeted or favorited and try to generate more like those.
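A rough sketch of that feedback idea assumes the bot can look up retweet and favorite counts for its own past tweets and retrain on them. How it fetches those counts is left out. Each word-to-word transition is weighted by a hypothetical score of `1 + retweets + favorites`, so phrasings that drew reactions get sampled more often. The function names are illustrative only.

```python
import random
from collections import defaultdict

def build_weighted_chain(tweets):
    """tweets: iterable of (text, retweets, favorites). Popular tweets count for more."""
    chain = defaultdict(lambda: defaultdict(int))
    for text, retweets, favorites in tweets:
        weight = 1 + retweets + favorites  # hypothetical scoring rule
        words = text.split()
        for current, nxt in zip(words, words[1:]):
            chain[current][nxt] += weight
    return chain

def next_word(chain, word, rng=random):
    """Pick a follower of `word` in proportion to its accumulated weight, or None at a dead end."""
    followers = chain.get(word)
    if not followers:
        return None
    candidates = list(followers)
    weights = [followers[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Usage: "good night" was retweeted and favorited, so "night" now follows "good" ten times as often.
chain = build_weighted_chain([("good morning", 0, 0), ("good night", 5, 4)])
print(next_word(chain, "good", random.Random(0)))
```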


There was a similar thread on reddit 2 days ago: http://www.reddit.com/r/programming/comments/c6o1t/i_created...

I pointed out a bot that I run there: http://twitter.com/twatterhose

Here's the code: http://github.com/avar/bot-twatterhose

And the Markov engine powering it: http://hailo.github.com/


A friend of mine runs a similar bot on Twitter, seeded with Twitter messages it sees:

http://twitter.com/x11r5

...on Identica, seeded with Identica messages it sees:

http://identi.ca/x11r5

...and on the web, seeded by content from an IRC channel:

http://www.x11r5.com/

As the about page says, "X11R5 is an insane geek on Identica, X11R5 is an insane 12-year-old on Twitter."

http://www.x11r5.com/wtf?


I built something like this for Sunlight Labs' Apps For America contest last summer. It feeds US patent application abstracts to a Markov processor to generate random invention descriptions. http://eurekaapp.com/

It's pretty slow (tail call optimization in Ruby would be nice), but what it spits out tends to be pretty funny.
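The comment doesn't say how eurekaapp's generator is structured. Assuming it walks the chain with one recursive call per word, that tail-recursive walk can always be rewritten as a plain loop without changing its output. Here is a sketch in Python, which also lacks tail call optimization. `generate_recursive` and `generate_iterative` are hypothetical names, and `chain` maps each word to a list of its followers.

```python
import random

def generate_recursive(chain, word, remaining, rng, out=None):
    """Tail-recursive walk: one step per call, so long outputs hit the recursion limit."""
    out = [word] if out is None else out
    followers = chain.get(word)
    if remaining == 0 or not followers:
        return out
    nxt = rng.choice(followers)
    out.append(nxt)
    return generate_recursive(chain, nxt, remaining - 1, rng, out)

def generate_iterative(chain, word, remaining, rng):
    """The same walk with the tail call turned into a loop: constant stack depth."""
    out = [word]
    while remaining > 0 and chain.get(word):
        word = rng.choice(chain[word])
        out.append(word)
        remaining -= 1
    return out

# Usage: both versions consume the RNG in the same order, so the outputs match.
chain = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(generate_recursive(chain, "a", 6, random.Random(7)))
print(generate_iterative(chain, "a", 6, random.Random(7)))
```

The loop version avoids the per-call stack frame, which is what tail call optimization would do automatically.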


This is a reference to a classic hack, Mark V. Shaney, where a Markov-chain generator was used to post to Usenet. (http://en.wikipedia.org/wiki/Mark_V_Shaney)

Bonus tie-ins to HN topics:

Rob Pike was one of the perpetrators (an author of the Go language)

It was featured in Martin Gardner's Mathematical Games column in Scientific American

(yeah, I wrote one too, after reading the article. I think it's catnip to hackers, like the Game of Life)


I'm sorry, but I really don't see the benefit of having made this thing: the tweets it puts out are useless!

Are we just playing around with how to program Markov text generation? Because who cares; they figured that out ages ago.


hmm. I can't tell the difference between that and any other twitter feed. Mindless gibberish brain-hand-grenades that take 45 minutes to decipher and turn out to be about a television series I've never seen.



