In Word Clouds from Adjusted Counts I introduced the idea of accentuated word clouds and mentioned the possibility of breaking down a collection of tweets by geographic origin and contrasting the word counts to uncover geographic patterns. I've now done something similar with a large collection of tweets sent from Toronto, London, or San Francisco. The collection is a 1% sample of all the public tweets sent within 50 miles of the respective city centers during July 2009.
The three blocks of words show the words that are used frequently, and proportionally more often, in tweets sent from the respective cities. Apart from the city names, some prominent words are:
The prominence of 'pumper' for Toronto puzzled me a bit, so I looked into the data more closely. There is a series of Twitter accounts similar to ToFireE that pump out an alert for every emergency fire unit dispatched in the city. The alerts include the reason for dispatch, location information, and the vehicle, which is often named pumper-nnn, where nnn is some number.
Another interesting thing you can pick out from the clouds is that San Francisco tweets contain a lot more hashtags than tweets from London or Toronto. The hashtags that seem largest are: #science, #gaming, #loss, #prop8, #discount, #ffs, #weight, #wine, #sfgiants. It might be interesting to examine more carefully the proportion of tweets that contain hashtags and whether it is changing over time.
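As a rough sketch of how that proportion could be measured, the Python below counts, for each group, the fraction of tweets containing at least one hashtag. The function name `hashtag_share`, the `(group, text)` input format, and the regular expression used to recognise a hashtag are all assumptions made for illustration; this is not the code behind the clouds above, and the sample tweets are made up.

```python
import re
from collections import defaultdict

# Assumed definition of a hashtag: '#' followed by word characters,
# and not glued to a preceding word character (so "a#b" is ignored).
HASHTAG = re.compile(r"(?<!\w)#\w+")

def hashtag_share(tweets):
    """Return {group: fraction of tweets with at least one hashtag}.

    `tweets` is an iterable of (group, text) pairs. The group can be a
    city name, or a (city, week) tuple to follow the share over time.
    """
    totals = defaultdict(int)
    tagged = defaultdict(int)
    for group, text in tweets:
        totals[group] += 1
        if HASHTAG.search(text):
            tagged[group] += 1
    return {group: tagged[group] / totals[group] for group in totals}

# Made-up sample data, only to show the call.
sample = [
    ("San Francisco", "Go #sfgiants!"),
    ("San Francisco", "Tasting #wine today #discount"),
    ("Toronto", "Stuck in traffic again"),
    ("Toronto", "Great show tonight #toronto"),
    ("London", "Rain, as usual"),
]
shares = hashtag_share(sample)
```

Grouping by a `(city, week)` tuple instead of the city alone would give one share per city per week, which is one simple way to check whether hashtag use is changing over time.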