Twitter Venn

By: Jeff Clark Date: Wed, 17 Dec 2008

Venn diagrams can be used to illustrate the amount of overlap between various sets of items. In the projects section of Neoformix I have just published an application I call Twitter Venn. It lets you investigate how words are used, alone and together, within the messages of all the people using Twitter.

Basically, you type in either two or three terms separated by commas, click 'Search', and get something like this:

In this example, the large circle on the left contains a great many small red circles, which represent messages (tweets) that contain the word 'chocolate' but not 'milk'. The large circle on the right has blue circles representing messages that contain 'milk' but not 'chocolate'. The intersecting area has purple circles indicating how many tweets contain both terms. The number of smaller circles is meant to show how frequently those words, or combinations of words, are used on Twitter. The bottom right area has a small table showing an estimate of the number of tweets/day for each combination.
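To make the three regions concrete, here is a minimal sketch of how tweets could be sorted into them. It assumes whole-word, case-insensitive matching and a simple regular-expression tokenizer; the function names and the sample tweets are hypothetical and are not taken from the Twitter Venn code.

```python
import re
from collections import Counter
from typing import Optional


def words(text: str) -> set:
    """Lowercased word tokens in a tweet (a simplifying tokenization assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def region(tweet: str, left: str, right: str) -> Optional[str]:
    """Name the Venn region a tweet falls in, or None if it matches neither term."""
    w = words(tweet)
    has_left, has_right = left in w, right in w
    if has_left and has_right:
        return "both"              # purple intersection
    if has_left:
        return left + " only"      # red circle, excluding the overlap
    if has_right:
        return right + " only"     # blue circle, excluding the overlap
    return None


# Hypothetical sample tweets, for illustration only.
sample = [
    "Drinking chocolate milk before bed",
    "I need chocolate right now",
    "Out of milk again",
    "Soy milk in my coffee",
    "Good morning everyone",
]
regions = (region(t, "chocolate", "milk") for t in sample)
counts = Counter(r for r in regions if r is not None)
print(counts)
```

Note that the regions are exclusive: a tweet containing both words is counted only in the intersection, never in either outer region.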

You can click on one of the regions to see a word cloud of the most commonly used words in the corresponding messages. The selected region has a slight gray background. In this example, the purple intersecting region is selected and the word cloud shows that the words 'drinking', 'soy', and 'need' were commonly used in the tweets that contained both 'chocolate' and 'milk'.
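A word cloud like this is essentially a word-frequency count over the tweets in the selected region. The sketch below assumes the search terms themselves are left out (they would otherwise dominate every cloud) along with a small stopword list; both the stopword list and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real word cloud would use a much larger one.
STOPWORDS = {"a", "an", "and", "i", "is", "of", "the", "to", "some", "with"}


def cloud_words(tweets, search_terms, top_n=3):
    """Return the top_n most frequent words, ignoring stopwords and the search terms."""
    excluded = STOPWORDS | {t.lower() for t in search_terms}
    counts = Counter()
    for tweet in tweets:
        for word in re.findall(r"[a-z0-9']+", tweet.lower()):
            if word not in excluded:
                counts[word] += 1
    return counts.most_common(top_n)


# Hypothetical tweets from the 'chocolate' + 'milk' intersection.
both = [
    "Drinking soy chocolate milk",
    "I need chocolate milk",
    "Need some soy chocolate milk, drinking it now",
]
print(cloud_words(both, ["chocolate", "milk"]))
```

In the drawn cloud, each word's font size would then be scaled by its count.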

The bottom of the application shows tweets matching the selected region. These change every few seconds unless you hover your mouse over the rectangle, which pauses on the current tweet (for you slow readers!). If you click in this region, a browser window opens showing you the original tweet.

If you enter three terms in the search box you get a diagram with three intersecting circles:

This Venn diagram shows that, when I did this analysis, the word 'hot' was used more than 'chocolate', which was in turn used more frequently than 'milk'. It shows even more clearly that the combinations 'hot+chocolate' and 'chocolate+milk' were much more common than 'hot+milk'.
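With three terms there are seven regions: three single-term areas, three pairwise overlaps, and the centre where all three circles meet. One way to sketch this, assuming the same whole-word matching as before, is to key each tweet by the set of terms it contains; the helper names and sample tweets are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def matched_terms(tweet, terms):
    """The subset of terms that appear as whole words in the tweet."""
    words = set(re.findall(r"[a-z0-9']+", tweet.lower()))
    return frozenset(t for t in terms if t in words)


def region_counts(tweets, terms):
    """Count tweets per region, keyed by the exact set of terms each tweet contains."""
    counts = Counter()
    for tweet in tweets:
        key = matched_terms(tweet, terms)
        if key:  # tweets matching none of the terms lie outside every circle
            counts[key] += 1
    return counts


def all_regions(terms):
    """Every non-empty combination of terms: 7 regions when there are 3 terms."""
    return [frozenset(c) for r in range(1, len(terms) + 1)
            for c in combinations(terms, r)]


terms = ["hot", "chocolate", "milk"]
# Hypothetical sample tweets, for illustration only.
sample = [
    "Hot chocolate please",
    "Hot milk before bed",
    "Too hot today",
    "Chocolate milk for lunch",
    "Hot chocolate made with milk",
    "Nothing relevant here",
]
counts = region_counts(sample, terms)
for key in all_regions(terms):
    label = "+".join(t for t in terms if t in key)
    print(label, counts[key])
```

Keying on the exact matched set keeps the regions disjoint, so 'hot+chocolate' here means tweets with both words but without 'milk'.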

Note that you can use multiple words as a term, or even a phrase within double quotes. So 'christmas party' will match messages that contain both words anywhere within them, but '"christmas party"' will only match if the words appear together in that precise order. This third example shows the difference. Every message containing the phrase also matches the version where the two words may be separated, which is why the blue set is empty: all the matching messages are in the intersecting purple region.
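The two matching rules can be sketched as follows, together with splitting the search box contents on commas. This is an illustration under assumptions, not how Twitter Search actually tokenizes: it assumes whole-word, case-insensitive matching and that commas never appear inside a term, and all function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercased word tokens (a simplifying tokenization assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def parse_query(query):
    """Split the search box contents on commas into two or three terms."""
    terms = [t.strip() for t in query.split(",") if t.strip()]
    if not 2 <= len(terms) <= 3:
        raise ValueError("expected two or three comma-separated terms")
    return terms


def term_matches(term, tweet):
    """Quoted term: words must appear consecutively, in order.
    Unquoted term: every word must appear somewhere, in any order."""
    term = term.strip()
    words = tokens(tweet)
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        phrase = tokens(term[1:-1])
        n = len(phrase)
        return n > 0 and any(words[i:i + n] == phrase
                             for i in range(len(words) - n + 1))
    required = tokens(term)
    return bool(required) and set(required) <= set(words)


terms = parse_query('christmas party, "christmas party"')
for tweet in ["Party planning for Christmas", "Office christmas party tonight"]:
    print(tweet, [term_matches(t, tweet) for t in terms])
```

Any tweet that satisfies the quoted rule necessarily satisfies the unquoted one, which is exactly why the phrase-only region comes out empty.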

You can also use the special operators 'from:TwitterID' and 'to:TwitterID' to match messages from, or directed to, a particular person. The example below also shows a very small purple circle in the red square. This indicates that there are tweets in this intersecting region, but not enough to warrant a complete symbol.
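One plausible way to produce such a partial symbol is to let each full-size circle stand for a fixed number of tweets/day and draw any leftover as a smaller circle. The sketch below assumes that rule, and it scales the leftover circle's area rather than its radius, so a circle holding half a symbol's worth covers half the area. The function and its parameters are hypothetical.

```python
import math


def symbols(count, per_symbol, full_radius):
    """Radii of the circles to draw for a region: full-size circles for each
    complete block of per_symbol tweets, plus one scaled circle for the rest."""
    full, remainder = divmod(count, per_symbol)
    radii = [full_radius] * int(full)
    if remainder > 0:
        # Area is proportional to radius squared, so scale the radius by sqrt.
        radii.append(full_radius * math.sqrt(remainder / per_symbol))
    return radii


print(symbols(250, 100, 4.0))  # two full circles and one partial circle
print(symbols(10, 100, 4.0))   # only a very small circle
```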

The 'Show URL' button will open another browser window with the URL parameterized to show the current search. This makes it easy to repeat a given analysis over time, to link to a particular result, or to show someone something interesting: just send them the URL with your parameters. Here is an example: http://www.neoformix.com/Projects/TwitterVenn/view.php?q=bank,auto,bailout
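The example URL passes the comma-separated terms in a single `q` parameter. A sketch of building and reading such a URL with Python's standard library follows; percent-encoding spaces and double quotes in phrase terms is an assumption about what the viewer accepts, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

BASE_URL = "http://www.neoformix.com/Projects/TwitterVenn/view.php"


def share_url(terms):
    """Join the terms with commas and percent-encode everything except the commas."""
    q = ",".join(t.strip() for t in terms)
    return BASE_URL + "?q=" + quote(q, safe=",")


def terms_from_url(url):
    """Recover the list of terms from a shared URL."""
    q = parse_qs(urlsplit(url).query)["q"][0]
    return q.split(",")


print(share_url(["bank", "auto", "bailout"]))
print(share_url(['"christmas party"', "gift"]))
```

Leaving commas unencoded keeps simple searches like the one above readable, while spaces and quotes in phrases become `%20` and `%22`.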

The tweets/day figures are estimated from the latest results and can fluctuate quickly, especially for commonly used terms. So searching for 'coffee' will give you a much higher tweets/day rate in the morning than late at night. Rates for uncommon terms change much more slowly.
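As a rough illustration of why this happens, suppose the rate is extrapolated from the timestamps of the most recent matching tweets: the number of gaps between them divided by the time they span. That method is an assumption, not a description of Twitter Venn's actual code, and the function name is hypothetical. For a popular term the latest results cover only a few minutes, so the estimate tracks the current hour; for a rare term they may cover days, which smooths the rate out.

```python
from datetime import datetime, timedelta


def tweets_per_day(timestamps):
    """Extrapolate a daily rate from the timestamps of recent matching tweets.
    Returns None when there is not enough information to estimate a rate."""
    if len(timestamps) < 2:
        return None
    ts = sorted(timestamps)
    span = (ts[-1] - ts[0]).total_seconds()
    if span <= 0:
        return None
    # n timestamps bound n - 1 gaps; scale that rate up to a full day.
    return (len(ts) - 1) * 86400 / span


base = datetime(2008, 12, 17, 9, 0)
busy = [base + timedelta(minutes=i) for i in range(11)]  # one tweet per minute
quiet = [base + timedelta(hours=i) for i in range(11)]   # one tweet per hour
print(tweets_per_day(busy), tweets_per_day(quiet))
```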

Thanks to Processing for the development tools and to Twitter Search for the data.

Have fun, and give Twitter Venn a try!
