Wow, I built a Markov text generator years ago, and made a PHP bot to crawl web pages to gather text for it, but that was before Twitter even existed. One suggestion might be to add a dictionary of English words and acronyms so that you can weed out nonsense words and other languages. It would probably have a big effect on the accuracy.
I wonder if the code is available anywhere? I'd love to experiment with it myself and see if I can improve on it some.
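For what it's worth, here is a rough sketch of the dictionary idea in Python. It is not the OP's code (I haven't seen it). The word list, the 0.8 threshold, and the function names are all made up for illustration; a real bot would load a proper dictionary file.

```python
import re
from collections import defaultdict

# Hypothetical tiny word list; a real bot would load a full dictionary file.
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat", "dog", "lol", "omg"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def mostly_known(tokens, known, threshold=0.8):
    """Return True if at least `threshold` of the tokens are in the dictionary."""
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in known)
    return hits / len(tokens) >= threshold

def build_chain(texts, known):
    """Build a first-order word chain from texts that pass the dictionary check."""
    chain = defaultdict(list)
    for text in texts:
        tokens = tokenize(text)
        if not mostly_known(tokens, known):
            continue  # skip text that is mostly nonsense or another language
        for current, following in zip(tokens, tokens[1:]):
            chain[current].append(following)
    return chain
```

Filtering whole texts rather than single words keeps the surviving sentences intact, so the chain never learns a transition across a gap where a word was dropped.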
This is a rather fun idea, I like it a lot :) So far I'm using a mix of PHP, C, and more PHP and have just set the bugger live here: https://twitter.com/markov_chains
Wow, the quality of yours looks much better (especially considering it only started learning a few hours ago). The OP's is a bunch of meaningless gibberish.
This is delightful. What I'd really like to see now is one that could learn from people's reactions to it: pay attention to which of its tweets were retweeted or favorited and try to generate more like those.
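One way this could work, as a sketch only: keep a weight per transition instead of a plain count, and bump the weights of every transition used in a tweet that got reactions. The `bonus` factor and all the function names here are assumptions for illustration, and fetching the retweet and favorite counts is left out entirely.

```python
import random
from collections import defaultdict

def make_chain():
    """Nested mapping: word -> next word -> weight."""
    return defaultdict(lambda: defaultdict(float))

def learn(chain, words, weight=1.0):
    """Add `weight` to every transition found in a sequence of words."""
    for current, following in zip(words, words[1:]):
        chain[current][following] += weight

def reinforce(chain, tweet_words, retweets, favorites, bonus=0.5):
    """Boost the transitions used in a tweet in proportion to its reactions."""
    learn(chain, tweet_words, weight=bonus * (retweets + favorites))

def next_word(chain, current, rng=random):
    """Pick a follower of `current` with probability proportional to its weight."""
    followers = chain.get(current)
    if not followers:
        return None  # dead end: nothing has ever followed this word
    words = list(followers)
    weights = [followers[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]
```

Because popular tweets only raise weights, the bot drifts toward phrasing that got reactions without ever losing the transitions it learned from the original corpus.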
I built something like this for Sunlight Labs' Apps For America contest last summer. It feeds US patent application abstracts to a Markov processor to generate random invention descriptions. http://eurekaapp.com/
It's pretty slow (tail call optimization in Ruby would be nice), but what it spits out tends to be pretty funny.
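On the tail call point: if the generator recurses once per word, a plain loop does the same walk without growing the stack. A minimal Python sketch of that shape, not the actual Eureka code; the function names and the 30-word cap are assumptions.

```python
import random
from collections import defaultdict

def build_chain(abstracts):
    """Map each word to the list of words seen right after it."""
    chain = defaultdict(list)
    for abstract in abstracts:
        words = abstract.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, start, max_words=30, rng=random):
    """Walk the chain with a loop, stopping at a dead end or the word limit."""
    words = [start]
    while len(words) < max_words:
        followers = chain.get(words[-1])
        if not followers:
            break  # no word ever followed this one
        words.append(rng.choice(followers))
    return " ".join(words)
```

Something like `generate(build_chain(abstracts), "A")` would then produce one invention description, where `abstracts` is whatever list of abstract strings you have on hand.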
Hmm. I can't tell the difference between that and any other Twitter feed. Mindless gibberish brain-hand-grenades that take 45 minutes to decipher and turn out to be about a television series I've never seen.
http://www.in-vacua.com/markov_text.html
With some code:
http://uswaretech.com/blog/2009/06/pseudo-random-text-markov...