Shaped Word Cloud: Canada

By: Jeff Clark    Date: Wed, 01 Jul 2009

Happy Canada Day! This is a Shaped Word Cloud created from the text of approximately 168,000 tweets containing the word 'canada'. The tweets were gathered over an 11-month period from July 31, 2008 to June 30, 2009.

Basically, the larger the word, the more frequently it appears in the text. Stop words were discarded. I also adjusted the size based on the relative frequency of the word in the canada dataset versus a baseline dataset containing tweets about india and china. A word like 'country' or 'travel' is used about as often for canada as for india and china and so will be de-emphasized. Words like 'hockey', 'canadian', 'snow' and place names within canada will appear bigger. Because of the baseline content, the result will not properly reflect any strong associations between canada and india or canada and china. As usual you can click on a word to see the current twitter search results.
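
A minimal Python sketch of this kind of baseline adjustment follows. The exact weighting used for the cloud is not given above, so the ratio of smoothed relative frequencies, the function name word_weights, and the smoothing constant are all assumptions for illustration.

```python
from collections import Counter

def word_weights(target_words, baseline_words, smoothing=1.0):
    """Scale each word's count by how over-represented it is versus the baseline.

    Hypothetical weighting: count * (smoothed target rate / smoothed baseline rate).
    """
    target = Counter(target_words)
    baseline = Counter(baseline_words)
    target_total = sum(target.values())
    baseline_total = sum(baseline.values())
    vocab = len(set(target) | set(baseline))
    weights = {}
    for word, count in target.items():
        # Add-one style smoothing so words missing from the baseline do not divide by zero.
        target_rate = (count + smoothing) / (target_total + smoothing * vocab)
        baseline_rate = (baseline[word] + smoothing) / (baseline_total + smoothing * vocab)
        weights[word] = count * (target_rate / baseline_rate)
    return weights
```

With this weighting a word that is equally common in both datasets keeps its raw count, while a word that is rare in the baseline is boosted.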

Word Search: Canada Map

By: Jeff Clark    Date: Tue, 30 Jun 2009

Here is another Shaped Word Search in honour of Canada Day tomorrow, July 1st. This one is in the shape of a map of Canada and uses Provinces, Territories, and cities in the word list. Click on the image or here for the PDF version.

Feel free to print this in any newspaper or magazine. I only ask that you keep the reference to http://neoformix.com and that you send me an email letting me know.

Click on the image to download a hi-res PDF version suitable for printing

Word Search: Maple Leaf

By: Jeff Clark    Date: Tue, 30 Jun 2009

In honour of Canada Day tomorrow, July 1st, I have created a Shaped Word Search with a maple leaf design and words I associate with Canada. I improved my tool slightly to sort the words in alphabetical order so it is more convenient to look them up. Thanks to Joe S. for the suggestion. Click on the image or here for the PDF version.

Feel free to print this in any newspaper or magazine. I only ask that you keep the reference to http://neoformix.com and that you send me an email letting me know.

Click on the image to download a hi-res PDF version suitable for printing

Word Portrait: Michael Jackson

By: Jeff Clark    Date: Sat, 27 Jun 2009

Here is a Word Portrait of Michael Jackson created from the titles of many of his top songs.

Click on the image to see a larger version

Twitter Venn: Celebrity Deaths

By: Jeff Clark    Date: Fri, 26 Jun 2009

Here is a Venn Diagram made with Twitter Venn that shows the relative frequency of tweets made about the recent deaths of three celebrities - Michael Jackson, Farrah Fawcett, and Ed McMahon. This analysis was done around 7am EST today and the absolute numbers for tweets/day will certainly increase as more people in the US come online. I expect the proportions among the various combination regions to stay roughly the same.

A couple of points of interest:

  • Celebrity interest ranked by number of tweets is Michael > Farrah > Ed with ratios 62:6:1
  • Ed was mentioned together with both Michael and Farrah more often than he was by himself

To explore the data using the interactive application click on the image below or this link: Twitter Venn for #michaeljackson, #farrahfawcett, and #edmcmahon.

Twitter Employee Clusters

By: Jeff Clark    Date: Thu, 25 Jun 2009

Here is a different view of the relationships between the Twitter employee accounts first presented in this post. I measured the similarity between all the twitter employee accounts based on the overlap in vocabulary used in their last 200 tweets. A clustering algorithm was then used to group them together based on the pairwise similarity scores. The algorithm was tuned to limit clusters to have a maximum of 8 members.
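
As a rough illustration of clustering with a size cap, here is a Python sketch. The actual clustering algorithm is not named above; the greedy average-link merging, the function name cluster_with_cap, and the min_similarity parameter are assumptions.

```python
from itertools import combinations

def cluster_with_cap(items, similarity, max_size=8, min_similarity=0.0):
    """Greedy average-link merging that never lets a cluster exceed max_size.

    similarity is a caller-supplied function returning a pairwise score.
    """
    clusters = [[item] for item in items]

    def average_similarity(a, b):
        return sum(similarity(x, y) for x in a for y in b) / (len(a) * len(b))

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Skip merges that would break the size limit.
            if len(clusters[i]) + len(clusters[j]) > max_size:
                continue
            score = average_similarity(clusters[i], clusters[j])
            if score > min_similarity and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
```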

The image below was created from the cluster members data, the similarity between clusters, and the similarity within each cluster. To minimize line clutter I am only drawing a connection if it is one of the top 2 strongest for either end node. The clustering and layout code is based on what I used for the Toronto Twitter Community project but has been recently enhanced to support some new client work.

Here is the PDF version of the Twitter Employee Clusters.

Shaped Word Search - Twitter

By: Jeff Clark    Date: Mon, 22 Jun 2009

Here is another example of a Shaped Word Search. This one uses a Twitter Bird as the image and a list of words related to twitter. I also experimented a bit with adding distractors in order to make the puzzle more difficult. There are a couple of partial matches for each word mixed into the letter matrix. Click on the image or here for the PDF version.

Click on the image to download a hi-res PDF version suitable for printing

A Shaped Word Search - Malta

By: Jeff Clark    Date: Sun, 21 Jun 2009

I celebrated Father's Day this weekend with my wife's parents. While there, I spent a frustrating and unsuccessful 15 minutes looking for one of the few remaining words in a giant word search my father-in-law was working on. We found out later by checking online that there was an error and the word wasn't even present in the puzzle!

Much more enjoyable was the hour or so we spent doing a virtual tour of Malta using Google Earth. My father-in-law was born there and we had great fun zooming in with the aerial views finding the house where he lived, the church where he was baptized, etc. We were also able to easily see wonderful pictures of the many famous churches and natural features like the Blue Grotto. It's a beautiful and fascinating place and I'd love to visit sometime.

Well, the ideas of Malta, word search puzzles, and the usual mishmash from my coding projects mixed together in my brain while I was sleeping and I woke up early realizing I could easily write a tool to create 'Shaped Word Search Puzzles'. Basically, I can take a template image and a list of words and automatically construct a word search puzzle shaped and coloured to match the image.
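
The core idea can be sketched in a few lines of Python. This is not the actual tool: the text mask (with '#' marking cells inside the shape), the function name build_puzzle, and the random placement strategy are assumptions, and colouring from the template image is left out.

```python
import random
import string

# Eight directions: horizontal, vertical and diagonal, forwards and backwards.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (-1, 1), (0, -1), (-1, 0), (-1, -1), (1, -1)]

def build_puzzle(mask, words, seed=0, attempts=500):
    """mask: list of strings where '#' marks a cell inside the shape (at least one needed)."""
    rng = random.Random(seed)
    cells = sorted((r, c) for r, row in enumerate(mask)
                   for c, ch in enumerate(row) if ch == "#")
    inside = set(cells)
    grid = {}
    placed = []
    # Place long words first, since they have the fewest valid positions.
    for word in sorted((w.upper() for w in words), key=len, reverse=True):
        for _ in range(attempts):
            r, c = rng.choice(cells)
            dr, dc = rng.choice(DIRECTIONS)
            path = [(r + i * dr, c + i * dc) for i in range(len(word))]
            # Every cell must be inside the shape and either empty or already the same letter.
            fits = all(p in inside and grid.get(p, ch) == ch
                       for p, ch in zip(path, word))
            if fits:
                grid.update(zip(path, word))
                placed.append(word)
                break
    # Fill the remaining cells of the shape with random letters.
    for cell in cells:
        grid.setdefault(cell, rng.choice(string.ascii_uppercase))
    rows = ["".join(grid.get((r, c), " ") for c in range(len(row)))
            for r, row in enumerate(mask)]
    return rows, placed
```

Words that cannot be placed within the attempt limit are simply left out of the returned list, so a caller can check which ones made it into the puzzle.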

The first example is below and uses a Maltese Cross with a list of words related to Malta. Most of the words are place names but there are a few other things mixed in as well. For example, Pastizzi are one of my favourite Maltese foods.

Click on the image to download a hi-res PDF version suitable for printing

IranElection Tweets Phrase Net

By: Jeff Clark    Date: Sat, 20 Jun 2009

I have uploaded the set of tweets I used to create the Iran Election Word Cloud to the wonderful Many Eyes and created a Phrase Net visualization for the data. This image below shows the net for the pattern word1 and word2. So, for example, the arrow connecting 'police' to 'riot' means there were lots of instances of the phrase 'police and riot'.
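
For readers who want to reproduce the underlying counts, the 'word1 and word2' pattern can be approximated with a short Python sketch. This is not how Many Eyes implements Phrase Net; the function name phrase_net_edges and the simple tokenizer are assumptions.

```python
import re
from collections import Counter

def phrase_net_edges(texts, connector="and"):
    """Count (word1, word2) pairs that appear as 'word1 <connector> word2'."""
    edges = Counter()
    for text in texts:
        # Crude tokenizer: punctuation between words is ignored.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(1, len(tokens) - 1):
            if tokens[i] == connector:
                edges[(tokens[i - 1], tokens[i + 1])] += 1
    return edges
```

The pairs with the highest counts would become the thickest arrows in the net.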

Static image of the phrase net for #IranElection Tweet Data (see below for interactive version)

See below for the interactive version.



Iran Election Tweet Narrative II

By: Jeff Clark    Date: Sat, 20 Jun 2009

I have updated my Tweet Narrative about the Iran election. This one uses 141,000 tweets from the time period June 14-20th, 2009. I have also improved the algorithm that selects the characteristic tweets. The changes are difficult to describe succinctly but did reduce the number of tweets that started with 'RT'. This helps meet my primary goal of constructing a readable summary of the content. For this analysis I also only counted the first 10 tweets from any particular account which helps prevent the tweets from a few individual accounts from dominating the results.

Date | Characteristic Tweet
Jun 14
20:12 gmt
WTF! They're bringing tanks on the streets in Tehran #iranelection *
Jun 15
00:51 gmt
@Change_for_Iran 5:17am people outside are burning Saderat bank building or as it seems from this far #iranelection *
Jun 15
07:13 gmt
@IranNewsNow HUGE NEWS!!!! CNN reports that GRAND AYATOLLA SANAI has issued FATWA to resist govt that steals #IranElection *
Jun 15
10:24 gmt
Iran supreme leader orders probe of vote fraud #iranelection *
Jun 15
18:43 gmt
BEST FILTER SHEKAN: www.julo.free4r.com/prox.html #IranElection *
Jun 15
21:26 gmt
Please postpone maintenance! #nomaintenance #iranelection *
Jun 16
01:52 gmt
Twitter Reschedules Maintenance Around #IranElection Controversy *
Jun 16
05:09 gmt
Iran has blocked "#iranelection" Use #Tehran or #Iranians *
Jun 16
11:18 gmt
#iranelection cyberwar guide for beginners *
Jun 16
16:32 gmt
unconfmd major incident at Azadi - shooting - fires - ppl running #Iranelection *
Jun 16
22:28 gmt
pls everyone change your location on tweeter to IRAN inc timezone GMT+3.30 hrs - #Iranelection - cont.... *
Jun 17
03:44 gmt
NYT publishing sensitive names of Iranians on Twitter. Get them to stop! #NYTfail #iranelection *
Jun 17
05:52 gmt
BLOCK @serv_ SPREADING MISINFOMATIONS #iranelection *
Jun 17
09:29 gmt
Tehran march TODAY 5pm - 7Tir Sq - Meydan 7 Tir - silent - sea of green - #Iranelection *
Jun 17
15:17 gmt
Show support for #iranelection add green overlay to your Twitter avatar with 1-click - http://helpiranelection.com/ *
Jun 17
18:56 gmt
news - Mousavi & Khatami have delivered joint letter to Ministry of Justice demanding release of protestors - #Iranelection *
Jun 18
02:15 gmt
"Change does not roll in on the wheels of inevitability, but comes through continuous struggle." -Dr.Martin Luther King #iranelection *
Jun 18
05:00 gmt
DOA Remix (Death of the Ayatollahs). Theme song for #IranElection www.myspace.com/revolutionofthemindhiphop *
Jun 18
11:00 gmt
Today - Sea of Green - Imam Khomeine Sq - 4pm - Tehran - All wear BLACK - we pray together - #Iranelection *
Jun 18
14:46 gmt
MOUSAVI - 25% inflation means IGNORANCE - THIEVING - CORRUPTION - where is the wealth of my nation? #Iranelection RT *
Jun 18
21:28 gmt
RT @andylevy BREAKING: Faulty #iranelection results attributed to Clerical errors. *
Jun 18
23:15 gmt
confirmed - Saeed Rajaie's (a prominent Iranian wartime martyr) wife has been arrested while praying in Qom - #Iranelection *
Jun 19
04:29 gmt
[Mashable] Facebook Releases Persian Translation for #IranElection Crisis http://tinyurl.com/kuzmc4 *
Jun 19
09:31 gmt
#iranelection Khamenei: (summery) (( correction )) Crowed yell: Death to england *
Jun 19
13:21 gmt
situation in Iran is now CRITICAL - nation is heartbroken - suppression is iminent - #Iranelection *
Jun 19
21:06 gmt
Mousavi's offices are trashed, Mousavi's staff in police custody, Mousavi is missing. #iranelection #gr88 #clarification *
Jun 19
23:22 gmt
#IranElection Must watch video & read transcript at the same time. Chills Pls RT after you watch. http://bit.ly/10qe5H *
Jun 20
06:44 gmt
whenwill we all stand together ascitizens of thewrld and demandour elected officials tohelp? one day wecould be in that crowd #iranelection *
Jun 20
08:28 gmt
Google Earth to update satellite images of Tehran #Iranelection http://twitition.com/csfeo *
Jun 20
13:26 gmt
Unconfirmed: Bomb Blast in Khomeini's shrine #iranelection *

Iran Election Word Cloud

By: Jeff Clark    Date: Thu, 18 Jun 2009

This is a Shaped Word Cloud created from the text of approximately 84,000 tweets containing the term #iranelection. The larger the word the more frequently it appears in the text. As usual you can click on a word to see the current twitter search results.

Feel free to follow JeffClark on Twitter to get more updates on my work.

Iran Election Tweet Narrative

By: Jeff Clark    Date: Tue, 16 Jun 2009

The world is watching with great interest the demonstrations in Iran related to the recent election. The twittersphere is filled with discussion of the event and, of course, much of it is redundant. I have built a Tweet Narrative based on a collection of ~60,000 tweets containing the tag #IranElection. Basically, I divided the tweets into 30 groups based on the time they were published and then statistically selected the single tweet most representative of each time slot.
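
The selection step can be sketched in Python as follows. The statistic actually used is not described above, so both the equal-sized time-ordered groups and the scoring rule (the average number of tweets in the group that share each of a tweet's words) are assumptions, as is the function name characteristic_tweets.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9#@']+", text.lower())

def characteristic_tweets(tweets, groups=30):
    """tweets: list of (timestamp, text). Returns one (timestamp, text) per group."""
    ordered = sorted(tweets, key=lambda tweet: tweet[0])
    groups = min(groups, len(ordered))
    picks = []
    for g in range(groups):
        # Split the time-ordered tweets into near-equal chunks.
        chunk = ordered[g * len(ordered) // groups:(g + 1) * len(ordered) // groups]
        # For each word, count how many tweets in the chunk contain it.
        doc_freq = Counter(word for _, text in chunk for word in set(tokenize(text)))

        def score(tweet):
            words = set(tokenize(tweet[1]))
            return sum(doc_freq[w] for w in words) / len(words) if words else 0.0

        picks.append(max(chunk, key=score))
    return picks
```

A score like this favours short tweets made of widely shared words, so a real implementation would likely need extra weighting.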

Date | Characteristic Tweet
Jun 14
20:52 gmt
RT @StopAhmadi WTF! They're bringing tanks on the streets in Tehran #iranelection *
Jun 14
22:49 gmt
We people of iran want peace! #CNNfail #iranelection *
Jun 15
00:09 gmt
RT @persiankiwi students being killed in tehran uni dorm in amirabad right now. this must stop. #Iranelection *
Jun 15
00:50 gmt
Follow @Change_for_Iran 5:17am people outside are burning Saderat bank building or as it seems from this far #iranelection *
Jun 15
02:49 gmt
RT @parinaz AhmadiN revoked all permits of foreign media & has instructed them to stop reporting or they will face jail time. #IRANelection *
Jun 15
05:00 gmt
Will you wear green tomorrow to support freedom in Iran? #iranelection #greenscream *
Jun 15
05:38 gmt
RT @greenscreamiran: World to wear green tomorrow for freedom in Iran. RT please. #IranElection #greenscream *
Jun 15
07:11 gmt
RT @IranNewsNow: HUGE NEWS!!!! CNN reports that GRAND AYATOLLA SANAI has issued FATWA to resist govt that steals #IranElection RT THIS *
Jun 15
08:42 gmt
RT @persiankiwi March is NOT CANCELLED today. Mousavi is in danger of being killed. #Iranelection *
Jun 15
11:25 gmt
RT @persiankiwi: March Started: ADVICE - carry photos of imam khomeini. they cannot shoot at us with these. #Iranelection *
Jun 15
11:54 gmt
RT @persiankiwi for later we need proxy address to upload film. we have no upload possibility now, can anyone help? #Iranelection *
Jun 15
13:27 gmt
RT @persiankiwi: Valli Asr st closed to traffic - tens of thousands marching - unbelievable sight. #Iranelection *
Jun 15
15:54 gmt
RT @herrcafe RT @phelo Telegraph reports of Iranian Interior Ministry leak that Ahmedinajad came in thir #IranElection - http://bit.ly/GGUy2 *
Jun 15
17:15 gmt
RT @persiankiwi: streets very dangerous now. groups of militia on motorbikes searching for protesters. #Iranelection *
Jun 15
18:36 gmt
RT @stephenfry Functioning Iran proxies 218.128.112.18:8080 218.206.94.132:808 218.253.65.99:808 219.50.16.70:8080 #IranElection *
Jun 15
20:00 gmt
RT @persiankiwi Gohardasht in Karaj - confirmed - people in street batles with militia - #Iranelection *
Jun 15
21:57 gmt
RT @IranRiggedElect: Please postpone Twitter maintenance #IranElection @twitter @ev @bs @ded @ej @lg @nk @rk @vl @al3x @stop #nomaintenance *
Jun 15
23:23 gmt
RT @nttajohn maintenance is postponed, twitter will be posting press release soon #nomaintenance #iranelection *
Jun 16
00:34 gmt
RT IRAN: we are moving location - seperating - situation in Tehran is tense - cant explain #Iranelection *
Jun 16
03:01 gmt
RT From Iran: CONF: #IRANELECTION tag/string is not filtered in #iran. Plz KEEP USING IT! #iran9 *
Jun 16
03:59 gmt
People in Iran, use https://twitter.com/ instead of http://twitter.com/ to avoid hashtag filtering #Iran9 #IranElection #tehran #iranians *
Jun 16
06:48 gmt
RT from inside Iran: rumour spreading Tehran - Army Generals have met in secret - Army considering position #Iranelection #iran9 *
Jun 16
07:16 gmt
RT @stephenfry @arashamel Pls get this out to your followers. #iranelection has been blocked in Iran. Switch to #Iranians , #Tehran, #Iran9 *
Jun 16
08:29 gmt
RT @stephenfry RT: pls get this out to your followers. #iranelection has been blocked in Iran. Switch to #Iranians , #Tehran, and #Iran9 ... *
Jun 16
10:33 gmt
RT @persiankiwi only official march today is valli asr. others may be a trap - avoid others - #Iranelection #gr88 *
Jun 16
12:51 gmt
#iranelection Iran has banned all foreign journalists from reporting on the sts. *
Jun 16
13:50 gmt
RT @twistedchick: RT URGENT: Army forces entering Tehran. Barricade streets where protests are on. Now. #iranelection #gr88 *
Jun 16
15:05 gmt
RUMOUR: the former prince of #Iran, Reza Pahlavi has announced returning to #Tehran in 36h. #IranElection #GR88 *
Jun 16
16:32 gmt
RT [redacted]: unconfmd major incident at Azadi - shooting - fires - ppl running #Iranelection #gr88 *
Jun 16
19:38 gmt
RT @PCMag: The U.S. State Department asked Twitter to delay downtime to help with #IranElection. *

Twitter StreamGraph Update II

By: Jeff Clark    Date: Mon, 15 Jun 2009

I have posted a small update to the Twitter StreamGraphs application to make it more useful. Previously it used Twitter Search to get results for simple queries of the type 'from:twitterid'. Twitter Search currently only gives results going back about 14 days - it used to be much longer. For most people who don't tweet frequently this resulted in a poor quality streamgraph because there weren't many results to work with.

I'm now using the standard Twitter API to retrieve the tweets for any simple user query and it will graph up to a maximum of 1000 tweets regardless of how far back they go. The difference is shown below for Clay Shirky. The second image shows the new improved results which, for him, go back almost a year. The graph is much richer than the first one, which is based only on tweets from the last two weeks.

Previous results limited to approximately 14 days due to Twitter Search limitation
 
New results for simple queries of the type from:twitterid

Chinese Ideogram for Flower

By: Jeff Clark    Date: Sun, 14 Jun 2009

Here is another design made with the flower images from Wikimedia Commons. It's the Chinese ideogram for 'flower' rendered with flowers.

Others in this series: FlowerTank, FlowerCycle, and John Lennon Flower Portrait.

Venn: Iran, Iraq, Afghanistan

By: Jeff Clark    Date: Sun, 14 Jun 2009

Here is the result of a Twitter Venn query for Iran, Iraq, and Afghanistan. The recent controversial elections in Iran have obviously grabbed a lot of attention in the Twittersphere. It's interesting that the number of tweets mentioning both Iran and Iraq is roughly the same as the number mentioning both Afghanistan and Iraq even though tweets about Iran are so dominant.

Click on the image to see the current Twitter venn diagram for these three terms.

Celebrity Twitter Accounts

By: Jeff Clark    Date: Sun, 14 Jun 2009

I recently made some improvements in my graph display code for a client and have used it to create a new graph showing the vocabulary relationships between many celebrities on Twitter. The post More Twitter Account Graphs explains a little about what the similarity is based on.

The central people in this set appear to be RyanSeacrest, PaulaAbdul, and TheEllenShow. The similarity score between Ryan and Paula is 19.8% and the top words connecting them together are: 'radio', 'game', 'guys', 'adam', 'movie', 'coast', 'studio', and their respective Twitter IDs.

Another interesting grouping is BarackObama, schwarzenegger, and timoreilly. The similarity score between Obama and Schwarzenegger is 16.7% with the top connecting words being 'health', 'care', 'video', 'president', 'address', 'vote', and 'event'.

I included jtimberlake in the analysis as well but he was removed from the final graph because he wasn't connected strongly enough with anybody else. His closest match was only 4.5% and was with Oprah.

Beetles

By: Jeff Clark    Date: Thu, 11 Jun 2009

After my previous John Lennon Flower Portrait I had the Beatles on my brain and stumbled across a lovely set of photographs of beetles on COLOURlovers. I have tried creating an image of The Beatles using beetles but haven't yet come up with a decent design. Instead I made this beetle outline image from 24 different species. I have seen a lovely physical display of beetles arranged in this manner but I'm not sure where it was. It may have been at the Royal Ontario Museum.

Click image to see larger version

John Lennon Flower Portrait

By: Jeff Clark    Date: Tue, 09 Jun 2009

Here is a flower portrait of John Lennon created from the image on the page 100 Portraits of Iconic People of all time. The flower images are from Wikimedia Commons.

John Lennon Word Portrait

By: Jeff Clark    Date: Tue, 09 Jun 2009

It has been a while since I've created a Word Portrait. Here is one of John Lennon created from the image on the page 100 Portraits of Iconic People of all time.

Here are links to Word Portraits of Obama and Einstein.

Cairo Speech Word Graph

By: Jeff Clark    Date: Thu, 04 Jun 2009

Here is another way to look at Obama's speech in Cairo calling for A New Beginning with Muslims. It uses a standard node link graph to show which words were used near each other in the text. There are virtual springs connecting words that are used frequently together and forces pushing apart nodes so they don't overlap too much. The nodes in orange have been fixed to a certain location and the other nodes move based on the springs and forces until a stable configuration is reached. This allows us to stretch out the graph and easily see where terms lie along a spectrum between 2 or more words of interest.

This first view shows that there was more discussion of 'peace' than 'war' and that words like 'palestinian', 'israel', and 'god' were highly associated with 'peace' relative to the other highlighted words.

Click image to see a larger version

This second view below is of the same graph but with different words pegged in place. The terms 'nuclear' 'weapons' and 'united' 'states' are both closer to 'iran' than the other countries. Similarly, 'women' 'denied' 'equal' is more associated with 'afghanistan'.

Click image to see a larger version

An obvious way to improve these would be to use word stemming to combine different forms of the same word. For example, 'muslim' and 'muslims' would use one node, as would 'peaceful' and 'peace'. This would reduce the number of nodes and probably more clearly expose any relationships.

The code to construct these was written with Processing and makes use of the excellent Traer Physics library.

Obama Cairo Speech StreamGraph

By: Jeff Clark    Date: Thu, 04 Jun 2009

Obama just delivered a speech in Cairo calling for A New Beginning with Muslims. Here is a StreamGraph prepared from the text. It does a reasonable job of illustrating which major themes were covered at the various points in the speech.

Click image to see a larger version

Google Squared

By: Jeff Clark    Date: Wed, 03 Jun 2009

datavisualization.ch has a quick review of a new Google offering called Google Squared. It allows you to see the results of a query organized in a table. One of the suggested queries is 'dog breeds' which seemed to work pretty well. The next one I tried was 'mammals' and it seemed OK as well until I looked more closely at the images shown for 'jaguar' and 'wolverine'...

Twitter Employee Account Similarity

By: Jeff Clark    Date: Tue, 02 Jun 2009

Dave Winer recently investigated Who do the people of Twitter follow?. He looked at which twitter accounts were followed by the most employees of Twitter and was curious about how that might be related to the accounts suggested to new Twitter users when they sign up.

His idea sparked one of my own - what are the relationships between Twitter employees themselves with respect to similarity of the vocabulary used in their tweets? Here is the graph created using the same layout technique described in my recent post Twitter Account Graphs.

As a whole, the group of twitter employees seem to be well connected based on this vocabulary similarity metric. There are a few people floating around on their own - thuske, akshay_abd, jeremy, lukester, and em33. There is also a doublet separated from the others - keerthi and mikelimondba. They both only have about 40 tweets so this link is more tenuous than the others which are based on the latest 200 tweets. The bottom right shows a fairly cohesive subgroup connected to most of the rest thru ej or perhaps mzsanford/abdur. Co-founder biz seems to be a more central figure by this measure than CEO ev.

WeFollow Twitter Directory

By: Jeff Clark    Date: Mon, 01 Jun 2009

WeFollow has quickly become one of the primary directories of Twitter users. The site lets people assign up to 3 tags to their own account in order to describe their interests. People visiting WeFollow can then see for each tag the list of matching accounts sorted by number of followers.

When you categorize yourself on WeFollow, it sends out a tweet to all your followers having the form: 'Just added myself to the http://wefollow.com twitter directory under: #tag1, #tag2, #tag3'. This automatic viral message has helped WeFollow spread across the twittersphere. Some people have complained that they see too many of these and call them spam. Personally, I find it interesting to see how the people I'm following classify themselves.
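
Because these registration tweets follow a fixed form, the tags can be pulled out with a simple pattern. The Python sketch below is an illustration only; the regular expression and the function names extract_tags and tag_counts are assumptions based on the message form quoted above.

```python
import re
from collections import Counter

# Matches the tail of a registration tweet and captures the tag list.
TAG_TWEET = re.compile(r"wefollow\.com twitter directory under:\s*(.+)$", re.IGNORECASE)

def extract_tags(tweet):
    """Return the lower-cased tags from a WeFollow registration tweet, or [] if it is not one."""
    match = TAG_TWEET.search(tweet)
    if not match:
        return []
    return [tag.lower() for tag in re.findall(r"#(\w+)", match.group(1))]

def tag_counts(tweets):
    """Count how many times each tag is self-assigned across a collection of tweets."""
    return Counter(tag for tweet in tweets for tag in extract_tags(tweet))
```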

These automatic registration messages can be tracked using Twitter Search and reveal lots of information about WeFollow that isn't publicly available on their own site. I have analyzed the set of WeFollow registration tweets for the two month period Mar 28 - May 28, 2009. There were 144,506 tweets matching my search pattern in this time frame, or roughly 2400 new people added to the directory per day. Here is the graph over time:

The peak during this time frame occurred at the end of March and was about 6000. The time period for the analysis was shortly after the WeFollow launch which likely accounts for the rough gradual decline shown. It would be nice to see the data for the launch date but unfortunately limitations in Twitter Search prevent me from accessing this data. There appears to be a new peak showing up at the end of May and there are two obvious troughs around April 10th and 22nd. I've checked other data streams I'm monitoring and they don't show troughs or 'holes' during these two dates so it looks pretty likely that there was a problem with WeFollow infrastructure during those periods rather than it being a data collection problem.

The main page of WeFollow shows the 'top tags' but bases this on the number of followers of the people using those tags rather than the tag count itself. Which tags are actually used most often? An analysis of our sample gives this graph:

The top three tags by follower count on the WeFollow site are Celebrity, TV, and Entrepreneur. When ranking instead by the number of people who actually self-assign these tags, the rankings become 12 for Celebrity, 44 for TV, and 3 for Entrepreneur. This shows quite clearly that the average account tagged Celebrity or TV has more followers than, say, those tagged with Blogger.

The WeFollow registration tweets also show which tags are used together. I've constructed a couple of different types of graphics to illustrate the tag similarity relationships. This first one is a Clustered Word Cloud and shows colored groups of tags that are frequently used together. The big blue group in the middle seems to contain many of the most frequently used tags and doesn't appear particularly cohesive. Many of the others do, at least subjectively, seem to make sense. Here are a couple of example clusters from the image: (church, conservative, christian, pastor, tcot), (publishing, poetry, books, writing, poet).

This last image was created using the same layout technique as my recent Twitter Account Graphs. Basically, the tag nodes are positioned near others that they are 'similar to' in the sense that they are often used together.

Click on this to see the larger version

North Korean Flag Word Cloud

By: Jeff Clark    Date: Thu, 28 May 2009

The world is watching carefully the things happening in North Korea and there are lots of tweets discussing the issue. I have created a Shaped Word Cloud using 4000 tweets from the last few days and using the North Korean flag as a template. As usual you can click on a word to see the current twitter search results.

More Twitter Account Graphs

By: Jeff Clark    Date: Thu, 28 May 2009

Here is another graph showing a larger set of twitter accounts and their relationships based on a measure of shared vocabulary. The middle left cluster contains many Twitter accounts who discuss web technology including Twitter itself. I'm familiar with many of these accounts and know that the ones around my own icon ( JeffClark ) discuss data visualization (eagereyes, flowingdata, datavis, infosthetics). At the bottom right is a cluster of accounts that I follow which are focused on computational art (blprnt, flight404, toxi, mariuswatz, golan, reas, natzke). The group at the very top contains accounts with an interest in music or entertainment.

To create this graph I'm connecting nodes with a virtual spring if their similarity was greater than 9%. The stronger the similarity the shorter the spring. There are also long springs connecting extremely dissimilar nodes to push them apart but these are not shown. I've tried to avoid the usual tangled mess by not connecting nodes of medium similarity and also by only connecting two nodes if the link is one of the three strongest for either node.
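
The link-selection rule described here can be sketched in Python. The 9% threshold and the top-three rule come from the text; the data layout (a dict keyed by frozenset pairs) and the function name select_edges are assumptions, and the physics layout itself is not shown.

```python
def select_edges(similarity, threshold=0.09, top_k=3):
    """similarity: dict mapping frozenset({a, b}) -> score. Returns the set of pairs to draw."""
    neighbours = {}
    for pair, score in similarity.items():
        a, b = tuple(pair)
        neighbours.setdefault(a, []).append((score, pair))
        neighbours.setdefault(b, []).append((score, pair))
    strongest = set()
    for links in neighbours.values():
        # Keep each node's top_k strongest links; a link qualifies if either end keeps it.
        links.sort(key=lambda link: link[0], reverse=True)
        strongest.update(pair for _, pair in links[:top_k])
    return {pair for pair in strongest if similarity[pair] > threshold}
```

Each kept pair would then get a spring whose rest length shrinks as the similarity grows.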

Tweet Stream Similarity Graph

By: Jeff Clark    Date: Thu, 28 May 2009

At the end of the previous post, Tweet Stream Similarity, I suggested using a network graph to visualize the similarity relationships between the twitter accounts. Here is such a graph for the same small set of accounts I looked at before:

It nicely shows the small group of technology-related accounts (techcrunch, timoreilly, cshirky), the (britneyspears, mariahcarey) entertainment link, and the fact that the nfl account is not closely related to these others. It's interesting that the twitter ceo, ev, is connected to both the technology group and the entertainment group.

The mariahcarey link to the nba surprised me a bit and I looked into the details. Some of the shared vocabulary that caused the link includes 'basket' (as in Easter basket for mariah, and basketball basket for the nba), and 'shoot' (as in photo shoot for mariah and shoot the ball for the nba). It's obvious my metric will confuse different senses of the same word. There are many other shared words between these two accounts like friends, guys, baby, twitter, vegas, and everybody. I'm currently using the latest 200 tweets for each user in the analysis. Using more tweets might give better results.

Tweet Stream Similarity

By: Jeff Clark    Date: Sat, 23 May 2009

In my recent Twitter Spam post I showed two Twitter accounts that had an almost identical set of tweets. Being able to detect this situation automatically might have obvious benefit in detecting invalid accounts that should be disabled. We can do this by calculating a text similarity measure between the set of tweets coming from the two accounts. A high degree of similarity (say > 80%) is suggestive of automated duplication. This, coupled with some other likely indicators of spam (lots of links to commercial websites, high rate of updates, very low followers/following ratio, lots of followers showing spam-like behaviour) should be good enough for Twitter to find lots of spam accounts automatically.

A tweet stream similarity metric has some other potential uses as well. Given a set of accounts, we could group them into clusters based on similarity of tweet content. Or we could help a twitter user find new people to follow that seem to have shared interests based on tweet content.

There are lots of different functions that can be used to calculate text similarity. The current one I have designed is based on word frequency and excludes standard stop words (the, of, and...), ignores URLs, ignores some words extremely common in tweets (RT, via), and discounts some other words found often in tweets (like, good, day, thanks...). This metric is fairly crude and can be refined over time. It completely ignores word order, for example, and does not consider the semantics of the text at all. I'm hoping it is useful for detecting similarities at a broad topical level.
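
A minimal Python sketch of a metric along these lines is shown below. The filtering steps follow the description above, but the use of cosine similarity, the abbreviated stop word list, and the 0.5 discount weights are assumptions, not the actual function.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "it", "for", "on"}  # abbreviated list
IGNORED = {"rt", "via"}
DISCOUNTED = {"like": 0.5, "good": 0.5, "day": 0.5, "thanks": 0.5}  # hypothetical weights
URL = re.compile(r"https?://\S+")

def word_vector(tweets):
    """Build a weighted word-frequency vector for one account's tweets."""
    counts = Counter()
    for tweet in tweets:
        text = URL.sub(" ", tweet.lower())  # drop URLs before tokenizing
        for word in re.findall(r"[a-z']+", text):
            if word in STOP_WORDS or word in IGNORED:
                continue
            counts[word] += DISCOUNTED.get(word, 1.0)
    return counts

def similarity(tweets_a, tweets_b):
    """Cosine similarity between the two word vectors, from 0.0 to 1.0."""
    a, b = word_vector(tweets_a), word_vector(tweets_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Like the metric described, this ignores word order entirely and treats different senses of a word as the same term.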

I have used my metric to calculate the tweet stream similarity between all pairs of 9 fairly well known twitter personalities. I used the last 200 tweets from each account for the analysis with the exception of britneyspears who only has 144 at this time. The lowest similarity score was 2.8% for ev (the twitter ceo) vs nfl (news about the National Football League). The highest was 20.3% and was between cshirky (Clay Shirky - American writer, consultant and teacher on the social and economic effects of Internet technologies) and timoreilly (Tim O'Reilly - founder and CEO of O'Reilly media). The highest score for THE_REAL_SHAQ ( Shaquille O'Neal ) was with the nba twitter account. The highest score for MariahCarey was with britneyspears. The metric seems to be doing a reasonable job. Here is the complete list:

  1. Sim(cshirky, timoreilly) = 20.0%
  2. Sim(cshirky, techcrunch) = 16.6%
  3. Sim(timoreilly, techcrunch) = 15.8%
  4. Sim(timoreilly, ev) = 14.2%
  5. Sim(cshirky, ev) = 13.3%
  6. Sim(MariahCarey, britneyspears) = 12.9%
  7. Sim(THE_REAL_SHAQ, nba) = 11.8%
  8. Sim(MariahCarey, ev) = 11.6%
  9. Sim(ev, techcrunch) = 10.9%
  10. Sim(MariahCarey, nba) = 10.8%
  11. Sim(cshirky, MariahCarey) = 10.7%
  12. Sim(MariahCarey, timoreilly) = 9.6%
  13. Sim(ev, britneyspears) = 9.2%
  14. Sim(timoreilly, nba) = 9.1%
  15. Sim(cshirky, nba) = 9.1%
  16. Sim(THE_REAL_SHAQ, ev) = 9.0%
  17. Sim(ev, nba) = 9.0%
  18. Sim(THE_REAL_SHAQ, MariahCarey) = 8.2%
  19. Sim(britneyspears, techcrunch) = 8.1%
  20. Sim(nba, britneyspears) = 7.8%
  21. Sim(MariahCarey, techcrunch) = 7.7%
  22. Sim(cshirky, britneyspears) = 7.5%
  23. Sim(cshirky, THE_REAL_SHAQ) = 7.5%
  24. Sim(timoreilly, britneyspears) = 7.2%
  25. Sim(THE_REAL_SHAQ, timoreilly) = 6.5%
  26. Sim(THE_REAL_SHAQ, britneyspears) = 6.4%
  27. Sim(nba, techcrunch) = 6.4%
  28. Sim(nba, nfl) = 4.5%
  29. Sim(THE_REAL_SHAQ, techcrunch) = 3.9%
  30. Sim(timoreilly, nfl) = 3.9%
  31. Sim(nfl, techcrunch) = 3.7%
  32. Sim(MariahCarey, nfl) = 3.6%
  33. Sim(cshirky, nfl) = 3.6%
  34. Sim(THE_REAL_SHAQ, nfl) = 3.4%
  35. Sim(nfl, britneyspears) = 3.2%
  36. Sim(ev, nfl) = 2.8%
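A ranked list like the one above can be produced by scoring every unordered pair of accounts and sorting; 9 accounts give 36 pairs. The sketch below assumes the tweets are held in a dictionary keyed by account name. `toy_similarity` is a placeholder (Jaccard overlap of word sets), not the metric described in this post, and the sample accounts are made up.

```python
from itertools import combinations


def toy_similarity(tweets_a, tweets_b):
    """Placeholder score: Jaccard overlap of word sets (NOT the post's metric)."""
    words_a = {w for t in tweets_a for w in t.lower().split()}
    words_b = {w for t in tweets_b for w in t.lower().split()}
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def rank_pairs(streams, sim=toy_similarity):
    """Score every unordered pair of accounts, highest similarity first."""
    scores = [
        (sim(streams[a], streams[b]), a, b)
        for a, b in combinations(sorted(streams), 2)
    ]
    return sorted(scores, reverse=True)


# Hypothetical sample data, three accounts instead of nine.
streams = {
    "alice": ["hockey tonight", "snow again"],
    "bob": ["hockey playoffs tonight"],
    "carol": ["budget vote today"],
}
for score, a, b in rank_pairs(streams):
    print(f"Sim({a}, {b}) = {score:.1%}")
```

Any function that takes two lists of tweets and returns a number can be passed as `sim`, so the placeholder can be swapped for a better metric without touching the ranking code.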

An obvious next step is to use a better way to visualize this information. I'm thinking of using a network layout with nodes positioned closely and connected for high similarity scores and positioned far apart for low similarity scores. I'm hoping that it would illustrate nicely any structure within the group.

American Idol Tweet Narrative

By: Jeff Clark    Date: Thu, 21 May 2009

I have taken the collection of tweets I gathered for the American Idol StreamGraph and run them through my tool for creating a Characteristic Tweets Summary to produce the following output. My initial attempt included some obvious spam tweets so I had to refine my technique a little bit. Basically, a twitter spammer who repeated the same text over and over was highly likely to have one of their tweets selected as the 'characteristic tweet' for the time period containing the spam. The refinement was to only analyze one tweet per user per time period.
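The refinement can be sketched as a simple filter. This is only an illustration: it assumes tweets are available as (user, timestamp, text) tuples, that a time period is a calendar day, and that the first tweet seen for each user in a period is the one kept. None of these details are specified above.

```python
from datetime import datetime


def one_tweet_per_user_per_period(tweets):
    """Keep the first tweet seen for each (user, day) pair.

    tweets: iterable of (user, timestamp, text) tuples (assumed layout).
    """
    seen = set()
    kept = []
    for user, timestamp, text in tweets:
        key = (user, timestamp.date())  # the period is a calendar day here
        if key in seen:
            continue  # this user already contributed a tweet for this period
        seen.add(key)
        kept.append((user, timestamp, text))
    return kept


# Hypothetical sample: one account repeating itself, one ordinary user.
sample = [
    ("spammer", datetime(2009, 5, 8, 10, 0), "cheap idol tickets"),
    ("spammer", datetime(2009, 5, 8, 11, 0), "cheap idol tickets"),
    ("fan", datetime(2009, 5, 8, 12, 0), "come on kris"),
    ("spammer", datetime(2009, 5, 9, 9, 0), "cheap idol tickets"),
]
filtered = one_tweet_per_user_per_period(sample)
```

With this filter a repeated message counts at most once per user per day, so it can no longer dominate the word frequencies for that period.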

In the output table I also de-emphasized the twitter account for each tweet since they are statistically selected to be representative of an aggregate. The trailing '*' is a link to the original tweet which, of course, shows the proper attribution.

Date          Characteristic Tweet
May 03, 2009  American Idol winner David Cook's brother dies of cancer. *
May 04, 2009  'American Idol' star David Cook's brother Adam dies of brain cancer! *
May 05, 2009  getting ready to watch american idol. *
May 06, 2009  Headed home for american idol *
May 07, 2009  very mad because Allison Iraheta got off American Idol *
May 08, 2009  tickets for the american idol tour go on sale saturday @ 10!!!!!!!!! *
May 09, 2009  Just got tickets to the American Idol tour!!!! *
May 10, 2009  Tickets for the American Idol 2009 Summer tour on Sale|Tour Dates ... http://tinyurl.com/rdmcyl *
May 11, 2009  Can't wait to see American Idol!!!! *
May 12, 2009  getting ready to watch American Idol *
May 13, 2009  American Idol i'm waiting for who is going home tonight !!!! *
May 14, 2009  @jordanknight who cares about american idol...you're my american idol! *
May 15, 2009  RT @kingsthings: who do you want to win American Idol? *
May 16, 2009  What is the difference between the American Idol and Eurovision? *
May 17, 2009  Clouds on horizon for "American Idol" juggernaut? (Reuters) http://ow.ly/7q1O *
May 18, 2009  britney to perform on American Idol finale? *
May 19, 2009  getting ready to watch american idol. come on,kris! *
May 20, 2009  American Idol finale!!!! come on kris!!! even though adam has it, i really want you to win!!!! *
May 21, 2009  Kris won the american idol *

Fish Tank

By: Jeff Clark    Date: Thu, 21 May 2009

Sorry - I couldn't resist. The fish images are Reef Fish of the Commonwealth of the Northern Mariana Islands and the tank outline comes from the free font Tanks-WW2.

American Idol StreamGraph

By: Jeff Clark    Date: Thu, 21 May 2009

Here is a Twitter StreamGraph created from the query "American Idol" OR #idol in the date range of May 3-21, 2009. I had to use a custom version of my tool that used tweet data harvested in a different manner from the online version which is limited to viewing the last 1000 tweets only. Given such a popular topic, 1000 tweets only goes back a few minutes and is uninteresting.

A couple of observations:

  • Note the large spikes for 'David', 'Cook', and 'brother' around May 3rd. This occurred because the contestant David Cook's brother had just passed away from cancer.
  • The eventual winner (Kris), was mentioned less often than the other finalist (Adam) for most of the time span.

It would be interesting to see the graph for a longer time period but Twitter Search is currently only returning data that goes back around 21 days.

Some Text Art

By: Jeff Clark    Date: Tue, 19 May 2009

I recently stumbled across a collection of text art creations at The Gawno Magazine. Those of you who have enjoyed my Einstein Word Portrait or other designs created from text might find it interesting. A few sample designs are shown below. See the larger versions including references to the original art at Micrography: Text Art and Typography

FlowerTank

By: Jeff Clark    Date: Fri, 15 May 2009

Here is another simple flower design. The flower images are again from Wikimedia Commons and the tank outline comes from the free font Tanks-WW2.

FlowerCycle

By: Jeff Clark    Date: Fri, 15 May 2009

Spring is the time for flowers and ... motorcycles. Why not combine the two together ? The flower images are from Wikimedia Commons and the motorcycle design from The Gerd Arntz Web Archive.

Twitter Spam Update

By: Jeff Clark    Date: Fri, 15 May 2009

Yesterday I described how I stumbled across a set of twitter accounts obviously being used for spam. I also mused that it shouldn't be that hard to detect them algorithmically. Well, I happened to check them today and found that Twitter reports they have been 'suspended due to strange activity' ! The accounts had existed for quite some time since they had sent out over 1,000 updates and had a substantial number of followers. I suspect Twitter detected them automatically and shut them down as part of a regular process, but it does seem a bit of a coincidence that they were shut down so shortly after I wrote about them...

Twitter Spam

By: Jeff Clark    Date: Thu, 14 May 2009

I was looking at the Twitter StreamGraph for 'Star Trek' a little while ago and noticed an unusual pattern. There was a peak caused by many users sending the exact same tweet which contained a long list of trending hashtags that were otherwise unrelated - #googlefail, #whyitweet, #hubble, #star trek, #gmail etc. The tweet actually does vary slightly in that a different ow.ly URL is used but they all lead to the same place. It's obvious twitter spam carefully constructed to catch the attention of people following the trending terms.

Here are snapshots of two of the accounts showing that their last 6 tweets were identical. They have different numbers of followers with the one account acquiring an impressive 924 - more than I have. Presumably there is a large set of spam accounts and many follow each other. Other characteristics that seem to suggest spam besides the redundancy are no evidence of @replies and the fact that every single tweet seems to mention a product name and include a link.

I suspect it wouldn't be too hard to detect these algorithmically.

More Twitter Venn Examples

By: Jeff Clark    Date: Thu, 14 May 2009

Here are a couple of Venn diagrams created with Twitter Venn for some topics in the news. The first shows H1N1 vs 'swine flu' and it clearly shows that the less technical name is occurring much more frequently in tweets and also that there is a fair amount of overlap. The second example compares 'star wars' with 'star trek' and has a very similar appearance to the first. I'm surprised that, with the launch of the new Star Trek movie, it doesn't dominate references to Star Wars even more, although it does have about a 5-6x advantage right now. It may be because there was some discussion recently on twitter about the many plot parallels between the new Star Trek movie and the original Star Wars. Notice in the word cloud for tweets containing both terms the high visibility of 'rips' and 'off'.

Click on the images to see the current diagrams inside the interactive tool.

Unilever Logo Reconstructed

By: Jeff Clark    Date: Wed, 13 May 2009

A wonderful example of a composite logo is that of Unilever, one of the world's largest consumer goods companies. There are 25 small icons put together to form a large 'U'. Here is a description of the various icons and what they represent.

Just for fun I've taken the individual icons and rearranged them in a few different ways. Below, see Unilever Man, Woman, and Baby.

The outline icons came from AIGA Signs and Symbols.

Happy Mother's Day

By: Jeff Clark    Date: Sun, 10 May 2009

Happy Mother's Day to all the moms out there ! Here are a couple of simple designs to celebrate. The first was created with my recent custom tool for filling space with images and the second was made using the old Word Hearts application. You can use it to create a customized version with your own words and colors.


 

illo Art

By: Jeff Clark    Date: Fri, 08 May 2009

I've been thinking lately about composite images that are built from smaller sub-images as in my recent Butterfly Falcon and Butterfly Plane. While wandering in the store yesterday I saw some notepads with some interesting composite image cover designs. I've found the designers online at illo Art. A couple of their designs are shown in small form below.

Integra Magazine Cover

By: Jeff Clark    Date: Wed, 06 May 2009

Integra-Magazine is a biannual popular journal on Integrative Tourism and Development, published by respect, an Austrian-based NGO. I recently gave permission for them to use my World Peace image on the cover of their next issue, which has the theme of Peace/Tourism and Conflict. It came off the press recently and the cover image is shown below. The site is in both English and German.

Butterfly Falcon

By: Jeff Clark    Date: Tue, 05 May 2009

This one uses a falcon silhouette with the same butterfly image components. Source images are Falcon Silhouette, Butterfly set 1, and Butterfly set 2.

It was generated with custom code written in Processing.

Butterfly Plane

By: Jeff Clark    Date: Tue, 05 May 2009

Here is another experiment with images reconstituted from sub-images. It was generated with custom code written in Processing. Source images are Plane Silhouette, Butterfly set 1, and Butterfly set 2. This image might make a nice poster.

Spider Man

By: Jeff Clark    Date: Mon, 04 May 2009

This is a different kind of spider man. The image was generated with custom code written in Processing that is a variation on the code used for my Word Portraits. I was inspired by Quasimondo (Mario Klingemann) as mentioned in my last post to experiment with more complex constituent images and image rotations. Source images are Man Silhouette and Spider.

Image Foam Technique

By: Jeff Clark    Date: Sun, 03 May 2009

The excellent computational artist known as Quasimondo (Mario Klingemann) has posted an interesting set of images to Flickr that he created with an algorithm he calls 'image foam'. The technique has some similarities to what I do to generate some of my images - World Peace and Einstein, for example. The base concept is to fill 2D space using component images (or letters) without any overlap. Quasimondo has used more complex and colourful constituent images and placed them in a more varied and interesting manner than I have. Smaller versions of a few of his images are shown below - click on them to see his originals.

Chrysler Tweet Summary

By: Jeff Clark    Date: Sat, 02 May 2009

Here is another example of the Characteristic Tweets idea. The troubles of GM and Chrysler have been in the news for some time now and have been widely discussed in the twittersphere. I have a personal connection to Chrysler having grown up in Windsor, Ontario where they are a major part of the economy and I still have family members who work there.

I have analyzed six months of tweets containing 'chrysler' for the time period Nov 1, 2008 until Apr 30, 2009 - around 66,000 in all. Rather than finding a characteristic message for every day I have split the set into 20 equal-time periods and found the most representative for each period. It tells the sad story fairly well I think. Let's hope if I repeat the exercise in another six months that it has a happier ending.
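Splitting a date range into equal-time periods is straightforward to sketch. The function below is an illustration only: the name `split_into_periods`, the (timestamp, text) tuple layout, and the choice of May 1 as an exclusive end point are assumptions. With 20 periods over this range, each one covers about nine days.

```python
from datetime import datetime


def split_into_periods(tweets, start, end, n_periods=20):
    """Group (timestamp, text) tuples into n_periods equal-length time buckets.

    Tweets outside the half-open range [start, end) are dropped.
    """
    span = (end - start).total_seconds()
    buckets = [[] for _ in range(n_periods)]
    for timestamp, text in tweets:
        if not (start <= timestamp < end):
            continue
        index = int((timestamp - start).total_seconds() / span * n_periods)
        # min() guards against floating-point rounding at the upper edge
        buckets[min(index, n_periods - 1)].append((timestamp, text))
    return buckets


# Hypothetical sample covering Nov 1, 2008 through Apr 30, 2009.
start, end = datetime(2008, 11, 1), datetime(2009, 5, 1)
sample = [
    (datetime(2008, 11, 2), "early tweet"),
    (datetime(2009, 4, 30, 12, 0), "late tweet"),
    (datetime(2009, 6, 1), "outside the range"),
]
buckets = split_into_periods(sample, start, end)
```

Equal-time buckets will hold very different numbers of tweets when activity spikes around a news event; splitting into equal-count buckets would be an alternative, but it is not what is described here.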

Date          Characteristic Tweet
Nov 02, 2008  odanielpavon: No big sellers in sight to save troubled Chrysler (AP): AP - In crises past, Chrysler has somehow managed to stamp out a b..
Nov 13, 2008  reutersbiz: Goldman suspends GM rating, Chrysler urges aid: DETROIT (Reuters) - Goldman Sachs suspended its rating.. http://tinyurl.com/6pwcgo
Nov 21, 2008  mayankchandak: Chrysler's Web Edition vehicle package: includes WiFi, iPod touch and a Dell Mini 9: Chrysler has been toying with in-car ..
Dec 05, 2008  odanielpavon: Senators grill auto CEOs, eye GM-Chrysler deal (Reuters): Reuters - The chief executives of General Motors Corp and Chrysl..
Dec 12, 2008  michaelreuter: Chapeau! US Senate "No bailout for GM, Ford, Chrysler"
Dec 17, 2008  nishachittal: Is chrysler really closing all its plants for a month??
Jan 03, 2009  odanielpavon: Chrysler gets $4 billion U.S. government loan (Reuters): Reuters - Chrysler LLC on Friday received an initial $4 billion emergency loa..
Jan 05, 2009  studentoflife: Chrysler U.S. December sales drop 53%
Jan 20, 2009  karlturnbull: fiat to buy 35% stake in chrysler
Jan 26, 2009  magneda2: Reuters: Chrysler urges dealers to order cars, cut costs: NEW ORLEANS (Reuters) - Chrysler LLC on Sunday.. http://tinyurl.com/b46y2x
Feb 05, 2009  dugg: GM, Chrysler offer to buyout nearly 100% of hourly workers: General Motors is offering buyouts to virtually all .. http://tinyurl.com/c2sw64
Feb 14, 2009  googlenewsbiz: GM, UAW talks break off Chrysler talks stall - Reuters:
Feb 18, 2009  latimesnews: GM, Chrysler seek billions more in federal loans: General Motors asks for $9.1 billion to $16.6 billion and says.. http://tr.im/grbq
Feb 26, 2009  feedsontap: Acquisition Chrysler company_beingacquired Fiat SpA company_acquirer
Mar 12, 2009  MobileAuto: Chrysler threatens Canada pull out - The Associated Press
Mar 20, 2009  wopularall: Fiat says it won't assume Chrysler debt in deal http://ff.im/-1CV1s
Apr 01, 2009  alexanderwatson: Obama: Bankruptcy only option for GM and Chrysler.
Apr 08, 2009  toledonews: Chrysler rolls out new Jeep Grand Cherokee after government scolding http://ff.im/-20515
Apr 15, 2009  borgellaj: Fiat CEO warns Chrysler unions: cut costs or we walk
Apr 30, 2009  SecurityCanada: Chrysler will file for Chapter 11 bankruptcy

Characteristic Tweets

By: Jeff Clark    Date: Fri, 01 May 2009

There are huge numbers of Twitter status messages being created every day. I've been tracking tweets containing the word 'obama' for more than 250 days now and on average there are more than 10,000 tweets/day. There is so much data that it can be overwhelming to try and extract useful information. The nature of the twitter platform means that any useful information for a particular topic is highly fragmented. There is also a large amount of redundant information especially since so many tweets are actually 'retweets'.

Can we construct something approaching a narrative from all the bits ? Can we eliminate much of the redundancy ? I've started to tackle this problem with the following approach:

  1. Gather a collection of tweets for a topic of interest
  2. Eliminate non-English tweets
  3. Partition the tweets into separate bunches by date and time
  4. Analyze the word frequency in the bunches and determine, for each bunch, what the characteristic words are. These are the words that occur relatively more frequently in that bunch compared to the complete set
  5. Use the word relative frequency for each bunch to find a 'characteristic tweet' for each bunch. Roughly speaking, this is the tweet in that bunch which contains the highest proportion of words that are characteristic of the bunch
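Steps 4 and 5 can be sketched roughly as follows. The exact scoring formula is not given above, so this version makes an assumption: each word gets the ratio of its relative frequency in the bunch to its relative frequency in the complete set, and a tweet's score is the mean ratio of its words. The names `tokenize` and `characteristic_tweets` and the sample data are hypothetical, and stop-word removal is left out to keep the example short.

```python
from collections import Counter


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,!?:;\"'()").lower() for w in text.split())
    return [w for w in words if w]


def characteristic_tweets(bunches):
    """bunches: dict mapping a period label to a list of tweet texts."""
    overall = Counter(
        w for tweets in bunches.values() for t in tweets for w in tokenize(t)
    )
    total = sum(overall.values())
    result = {}
    for label, tweets in bunches.items():
        local = Counter(w for t in tweets for w in tokenize(t))
        local_total = sum(local.values())
        # Ratio > 1: the word is relatively more frequent in this bunch
        # than in the complete set.
        ratio = {
            w: (count / local_total) / (overall[w] / total)
            for w, count in local.items()
        }

        def score(tweet):
            words = tokenize(tweet)
            return sum(ratio[w] for w in words) / len(words) if words else 0.0

        result[label] = max(tweets, key=score) if tweets else None
    return result


# Hypothetical two-day sample.
bunches = {
    "day 1": ["watching the inauguration", "inauguration speech was great", "lunch time"],
    "day 2": ["stimulus bill signed", "lunch time", "great lunch"],
}
picked = characteristic_tweets(bunches)
```

Words that appear in only one bunch get the highest ratio, so a plain mean like this can favour short tweets made entirely of rare words. A minimum tweet length or a cap on the ratio are possible refinements.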

As an example I have analyzed a sample of tweets taken from Obama's first 100 days in office. The table below shows the characteristic tweet for each day. I used every 25th tweet containing 'obama' in the time period and discarded non-English tweets. This left me approximately 75,000 tweets for the analysis. It seems to work fairly well. You can read through them and get a pretty good summary of the various Obama-related events that have recently occurred.

Days 1-50:

Date          Characteristic Tweet
Jan 20, 2009  charlesta: watching Barack Obama's inauguration on TV
Jan 21, 2009  francis_gt: watching obama's inauguration speech
Jan 22, 2009  GeorgeReese: Obama retakes the oath of office tonight :)
Jan 23, 2009  Hops11: Obama overturned global gag rule! YES!
Jan 24, 2009  PoliticsFix: Obama reverses abortion-funds policy - http://is.gd/h1VQ - WFIE-TV
Jan 25, 2009  odanielpavon: Some global adversaries ready to give Obama chance (AP): AP - In his inaugural address, President Barack Obama signaled conciliation t..
Jan 26, 2009  dustytrice: Breaking: Obama will direct EPA to move swiftly to grant 14 states the right to set strict auto emission standards on Mon (via @Populista)
Jan 27, 2009  nyycarl07: @ricksanchezcnn Obama's Al-Arabiya interview/Mitchell Mideast visit...mending fences with the Arabic world..meaningful dialog..long overdue!
Jan 28, 2009  YahooNews: Obama open to compromise on $825B stimulus bill (AP)
Jan 29, 2009  keramurphy: Obama signed the Lilly Ledbetter Equal Pay Bill. Love it.
Jan 30, 2009  binikadwa: Even Obama's rooting for the Steelers
Jan 31, 2009  bigkumadog: Obama's half brother arrested on charge of drug possession: NAIROBI, Kenya - George Obama, the half brother of U.. http://tinyurl.com/dzazy8
Feb 01, 2009  wbaustin: Obama Takes Jab at Chief of Staff at Alfalfa Club Dinner: President pokes fun at his volatile chief of staff Rah.. http://tinyurl.com/cbhkrd
Feb 02, 2009  caerickson: Rooney just thanked Obama for supporting the Steelers!
Feb 03, 2009  Headline_News: Daschle withdraws as HHS nominee: Former Sen. Tom Daschle has asked President Obama to withdraw his nomination f.. http://tinyurl.com/d66eaj
Feb 04, 2009  idigg: Obama To Cap Executive Pay At $500K For Bailout Recipients http://tinyurl.com/abqpq2
Feb 05, 2009  gregspradlin: Reading about Fairey and AP......AP alleges copyright infringement of Obama image .. http://tinyurl.com/czxvat
Feb 06, 2009  nelking: @joshcagan Headline: "Senate Struggles on Stimulus in Nighttime Session" Related news: Obama adds Dr. Ruth to Economic Advisory Board
Feb 07, 2009  latimesnational: Artist of famed Obama poster arrested in Boston: Police in Boston say the artist famous for his "Hope" posters o.. http://bit.ly/FPN6
Feb 08, 2009  inaug: #Inauguration Lompoc man has front row seat at Obama inauguration - Lompoc Record: Lompoc man has f.. http://tin.. http://tinyurl.com/bm ...
Feb 09, 2009  ElkhartTruth: Obama: "We've got the best workers right here in Elkhart." #obamaelkhart
Feb 10, 2009  jclayiv: watching the obama press conference
Feb 11, 2009  fwstylewatch: breaking... michelle obama's march vogue cover finally unveiled!
Feb 12, 2009  Love_The_Oscars: Obama praises Lincoln's legacy at Ford's Theatre
Feb 13, 2009  Politisite: Republican Senator Judd Gregg withdraws as Obama's Commerce Pick over conflict on stimulus #tcot
Feb 14, 2009  NewsOnTwitter: MSNBC - Obama: Stimulus bill is 'major milestone': President Barack Obama, savoring his first major victo.. http://tinyurl.com/cvv6gc
Feb 15, 2009  lemonhed77: news update Air Force One is one 'spiffy ride,' Obama says: It's longer than a hockey rink, has two f.. http://tinyurl.com/b8wky4
Feb 16, 2009  imacsweb: Obama decides on task force to oversee auto industry reform rather than appoint "car czar" http://tinyurl.com/cv66z3
Feb 17, 2009  keyc: Pres. Obama Signs Stimulus Bill in Denver | http://keyc.tv
Feb 18, 2009  timesnews: Obama to unveil mortgage foreclosure plan http://www.timesoftheinternet.com/47845.html
Feb 19, 2009  caniba: Obama goes to Ottawa, ON, Canada and what do the Internets call it? #Obamawa -- I don't say this enough but... I love you Internets.
Feb 20, 2009  ThomasGalvin: thinks its funny that Obama is lecturing mayors to "spend wisely"
Feb 21, 2009  roadkillrefugee: Obama's Weekly Video Address: Quickest & Broadest Tax Cut EVAH! http://tinyurl.com/dxdg7b
Feb 22, 2009  IvorKellock: Obama aims to halve deficit by 2013 http://ff.im/-1aRkZ
Feb 23, 2009  AccordionGuy: Sasha Obama Keeps Seeing Creepy Bush Twins While Riding Tricycle Through White House: http://is.gd/kyi1
Feb 24, 2009  sumbonet: NewsOnTwitter: BBC NEWS - Japan PM visits Obama White House: Japan's Prime Minister Taro Aso will be the first... http://ff.im/-1bNY1
Feb 25, 2009  amyz5: For those who missed my post speech commentweet last night: Obama is to Jindal as Dylan is to the Jonas Brothers. #nsotu
Feb 26, 2009  neilkelty: Disappointed in President Obama's budget.
Feb 27, 2009  profchandler: RT: @NewsHour: At 11:45 Obama will address Marines at Camp Lejeune.expected to announce withdrawal of U.S. combat forces from Iraq Aug 2010
Feb 28, 2009  headlinenews: AP: Obama moved toward commanders in Iraq decision: WASHINGTON (AP) -- President Barack Obama leaned heavily .. http://tinyurl.com/chavyl
Mar 01, 2009  ReddingNews: Data on Obama's Helicopter Breached Via P2P?: Tiversa, headquartered in Cranberry Township, Pa., reportedly disc.. http://tinyurl.com/cd28gf
Mar 02, 2009  thebodybreaks: Obama nominates Gov. Sebelius for health post: Kansas Gov. Kathleen Sebelius, President Obama's nominee to head .. http://tinyurl.com/d989ha
Mar 03, 2009  atifunaldi: Sources: Obama to shelve species rule
Mar 04, 2009  TechGlance: Obama taps Julius Genachowski to head the FCC http://tr.im/h10T
Mar 05, 2009  leeharveydent: Watching CNN: Obama's Rx for health care reform.
Mar 06, 2009  news_by_robots: Obama to Lift Ban on Funding for Embryonic Stem Cell Research @Washington_Post
Mar 07, 2009  caketeagirl: Pleased about Obama's decision to reverse Bush's limits on stem cell research
Mar 08, 2009  ftantillo: "The Rock" Obama on SNL = awesome
Mar 09, 2009  Atticus_James: yay obama and stem cell research!
Mar 10, 2009  HootieMcBoob: Go Obama on the stem cell research! WOOOT! :D

Days 51-100:


(More...)

TED Shaped Word Cloud

By: Jeff Clark    Date: Tue, 28 Apr 2009

Brain Pickings just built a typographic visualization using Wordle based on the title text from the various TED talks. If you don't know about TED already then be sure to check it out. They provide 'riveting talks by remarkable people, free to the world' and it's some of my favourite content on the web.

Brain Pickings used the title text from this spreadsheet to generate their cloud. I've taken both the title and summary text to produce my own shaped word cloud based on their logo. Click on a word to see the related talks, pick one and then watch it !

NAS Remarks StreamGraph

By: Jeff Clark    Date: Tue, 28 Apr 2009

Yesterday President Obama delivered an address to the National Academy of Sciences. I am a strong believer in the critical importance of science and technology as a means of improving the average quality of life in our world and it was refreshing to hear from a president who believes the same. Here are a few snippets:

At such a difficult moment, there are those who say we cannot afford to invest in science. That support for research is somehow a luxury at a moment defined by necessities. I fundamentally disagree. Science is more essential for our prosperity, our security, our health, our environment, and our quality of life than it has ever been.
we are restoring science to its rightful place ... Under my administration, the days of science taking a back seat to ideology are over. Our progress as a nation – and our values as a nation – are rooted in free and open inquiry. To undermine scientific integrity is to undermine our democracy.

The streamgraph below was created from the complete text of the speech. Click on it to see a high resolution PDF version.

Lexical Analysis of Debates

By: Jeff Clark    Date: Mon, 27 Apr 2009

I did a fair number of posts last year that analyzed various texts related to the US election. A number of different techniques were used including StreamGraphs , Speech Contrast Diagrams, an interactive transcript visualizer, and, of course, word clouds. I introduced Martin Krzywinski in my last post as the creator of Circos. Martin has also done some excellent work in the area of lexical analysis and visualization of text in the post Lexical Analysis of 2008 US Presidential and Vice-Presidential Debates — who's the Windbag?

Here is a portion of one of his graphics that illustrates thematic profiles for Obama and McCain during a debate. It has some conceptual similarity to my interactive transcript visualizer.

These word clouds below were created by Martin and use different colours to show the words spoken uniquely by Obama in green, uniquely by McCain in blue, and by both men in white. The first one shows nouns and the second is limited to adjectives. I think the idea of limiting the cloud to a particular part of speech is a fruitful one to explore.
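The three-way colouring described above comes down to set operations on each speaker's vocabulary. The sketch below is only an illustration with made-up word lists; it is not Martin's code. Restricting the words to nouns or adjectives would need a part-of-speech tagger, which is left out here.

```python
def classify_words(words_a, words_b):
    """Split two speakers' vocabularies into unique and shared words."""
    a, b = set(words_a), set(words_b)
    return {"only_a": a - b, "only_b": b - a, "both": a & b}


# Colours follow the description above: green, blue, and white.
COLOURS = {"only_a": "green", "only_b": "blue", "both": "white"}

# Hypothetical word lists, not taken from the debate transcripts.
groups = classify_words(
    ["economy", "health", "change", "taxes"],
    ["economy", "taxes", "friends", "spending"],
)
word_colours = {
    word: COLOURS[group] for group, words in groups.items() for word in words
}
```

Each word ends up in exactly one of the three groups, so the colour lookup is unambiguous.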

In the same document Martin also formulates and calculates an interesting 'windbag index' that is a composite of measures of repetition in various aspects of speech.

Circos

By: Jeff Clark    Date: Mon, 27 Apr 2009

FlowingData recently had an interesting guest post about an alternative way of visualizing tabular data. It was by Martin Krzywinski and featured his visualization tool called Circos. Circos can produce a wide variety of information-rich, radial-based diagrams.

Some of the comments on FlowingData were quite negative and inspired a follow-on post by Nathan titled Narrow-minded Data Visualization. His post and the many related comments are interesting reading for those who care about data visualization and the tension between traditional/novel , expert/amateur, and cautious/exuberant approaches.

Some of these diagrams are very information-dense and might be a challenge to decode for those without much experience in interpreting them but I believe they are likely a powerful technique in the right situation. I suspect that no matter what your feelings are on the utility you will find it stimulating to examine a few example diagrams created with Circos.

Earth Day Twitter Map

By: Jeff Clark    Date: Thu, 23 Apr 2009

I was too busy yesterday to create this for Earth Day so here it is one day late. Besides, shouldn't every day be earth day? Around 3500 tweets containing the text 'Earth day' were analyzed and the shaped word cloud below was created based on the frequency of the other words used. Click on a word to see the latest matching tweets. I used the same globe image as in My World Has Room For Wildlife and World Peace. The image was made with NASA World Wind.

Twitter Spectrum in Print

By: Jeff Clark    Date: Mon, 20 Apr 2009

A few months back I was contacted by someone at McKinsey & Company for permission to include a graphic in a publication they were producing. They used my tool Twitter Spectrum to create an image illustrating the words associated with the terms 'collaboration' and 'individualism' in the latest tweets on Twitter. This was used in a section of a printed book called What Matters - Ten questions that will shape our future. The book was distributed to leading business executives and world leaders at the World Economic Forum annual meeting in Davos, Switzerland at the end of January. I was very pleased to be associated, even in such a small way, with such a prestigious undertaking.

The online version of the content does not include the image but the scanned image is shown below. It shows that 'collaboration' was used more frequently than 'individualism' in tweets. Dominant terms related to collaboration are: blogging, power, world, strategy, socialtext, and tomorrow. Terms related to individualism include rugged, hyper, sovereignty, obama, and american.

The image above was generated in Nov 2008. Just for fun I have created the current spectrum to see how it compares. It looks quite different and is much more balanced. Note that McKinsey manually recreated the image they used in order to get the colours they wanted.

Celebrities on Twitter

By: Jeff Clark    Date: Sun, 19 Apr 2009

There has been a lot of attention on Twitter this week to three celebrity-related topics. Early in the week there was a lot of discussion about Susan Boyle, the contestant on Britain's Got Talent. In the middle of the week there was Ashton Kutcher becoming the first Twitter user to have more than 1,000,000 followers. Finally, on Friday, Oprah joined Twitter and featured it on her show.

I've used Twitter Venn to compare the current rate at which these three people are being referred to in the TwitterSphere. Susan Boyle is slightly behind Oprah right now and far ahead of Ashton Kutcher. These results reflect the current zeitgeist and could be quite different tomorrow. It's also interesting to note the high frequency of the hashtag #herebeforeoprah within oprah references. Click on the image or this link to see what it's like right now.

Public Commitment

By: Jeff Clark    Date: Sat, 18 Apr 2009

We are 108 days into the year and Neoformix has had 28 posts to date. This works out to about 1 post every 4 days. I'm making a public commitment right now to try and post more often with a target of averaging 1 post/day for the rest of the year. I will continue to highlight my own work but you can expect to see more posts about other data visualization related material on the web.

Thank you to everyone for your continued support. Feel free to recommend ideas for new content through email or Twitter !

Mesh 2009 Word Map

By: Jeff Clark    Date: Thu, 09 Apr 2009

The Mesh Web Conference just finished in my hometown of Toronto. I didn't attend but it looked like it would have been an interesting experience. I built another shaped word cloud based on tweets containing the text 'mesh09' sent over the last few days. The larger the word, the more frequently it was used. Click on any word to see the related tweets in Twitter Search. It seems to illustrate the primary topics and speakers reasonably well.

ZZZZZ

By: Jeff Clark    Date: Wed, 08 Apr 2009


(More...)

Web 2.0 Expo Word Map

By: Jeff Clark    Date: Sat, 04 Apr 2009

The image map below was constructed from the most popular content words in tweets about the Web 2.0 Expo taking place in San Francisco. The larger the word, the more frequently it was used. Click on any word to see the related tweets in Twitter Search.