At the end of the previous post, Tweet Stream Similarity, I suggested using a network graph to visualize the similarity relationships between the Twitter accounts. Here is such a graph for the same small set of accounts I looked at before:
It nicely shows the small group of technology-related accounts (techcrunch, timoreilly, cshirky), the entertainment link between britneyspears and mariahcarey, and the fact that the nfl account is not closely related to any of the others. Interestingly, Twitter's CEO, ev, is connected to both the technology group and the entertainment group.
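The similarity metric from the previous post isn't restated here, so the sketch below assumes a cosine similarity over per-account word counts and a hypothetical `threshold` above which two accounts get an edge. It stores the graph as adjacency sets, which a plotting tool could then draw. The word counts are invented toy data, not the real tweets.

```python
from collections import Counter
from itertools import combinations
import math


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (an assumed metric)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_graph(word_counts: dict, threshold: float) -> dict:
    """Adjacency sets: link two accounts when their similarity exceeds threshold."""
    graph = {account: set() for account in word_counts}
    for x, y in combinations(word_counts, 2):
        if cosine_similarity(word_counts[x], word_counts[y]) > threshold:
            graph[x].add(y)
            graph[y].add(x)
    return graph


# Invented toy vocabularies, for illustration only.
word_counts = {
    "techcrunch": Counter("startup funding startup app launch".split()),
    "timoreilly": Counter("startup app open web launch".split()),
    "nfl": Counter("touchdown game season kickoff".split()),
}
graph = similarity_graph(word_counts, threshold=0.3)
print(graph)  # techcrunch and timoreilly are linked; nfl has no edges
```

The threshold controls how dense the picture gets: set it too low and every account connects to every other, too high and the groups fall apart into isolated nodes.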
The link between mariahcarey and the nba surprised me a bit, so I looked into the details. Among the shared words that caused the link are 'basket' (an Easter basket for mariah, a basketball basket for the nba) and 'shoot' (a photo shoot for mariah, shooting the ball for the nba). Clearly my metric confuses different senses of the same word. The two accounts also share many other words, such as friends, guys, baby, twitter, vegas, and everybody. I'm currently using the latest 200 tweets for each user in the analysis; using more tweets might give better results.
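A minimal sketch of listing the shared vocabulary between two accounts, assuming a simple lowercase regex tokenizer and a hand-picked stopword set (both hypothetical, not necessarily what the analysis uses). The tweets are invented examples. Because the comparison is on bare word strings, 'basket' and 'shoot' match no matter which sense is meant:

```python
import re


def vocabulary(tweets: list) -> set:
    """Set of lowercased words across a list of tweets (assumed tokenizer)."""
    words = set()
    for tweet in tweets:
        words.update(re.findall(r"[a-z']+", tweet.lower()))
    return words


def shared_vocabulary(tweets_a: list, tweets_b: list, stopwords=frozenset()) -> list:
    """Sorted words that appear in both accounts' tweets, minus stopwords."""
    return sorted((vocabulary(tweets_a) & vocabulary(tweets_b)) - stopwords)


# Invented tweets: same words, different senses.
mariah = ["Filling the Easter basket for the kids!", "Great photo shoot today"]
nba = ["What a basket at the buzzer", "Can he shoot from deep?"]
stopwords = frozenset({"the", "a", "at", "for"})

print(shared_vocabulary(mariah, nba, stopwords))  # ['basket', 'shoot']
```

Nothing in a word-level comparison like this can tell an Easter basket from a basketball basket; separating the senses would need context beyond the individual word.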