I was looking for pictures of the new Apple iPad and stumbled across this image of Apple Form Factor Evolution. It has lots of images of Apple products on a nice simple white background and was perfect fodder for the Image Foam Technique, so I made this version of the Apple logo from the product sub-images.
Last night President Obama delivered the State of the Union Address. The Shaped Word Cloud below was created from the text.
In a recent post I showed the Top 20 Individual Data Visualizations Mentioned on Twitter and remarked that many of the most frequently mentioned Twitter links were to collections of visualizations. Shown below is a meta list of the top collection-type data visualization or infographic links.
Here are the top product type links in the field according to Twitter data between March 24 and Dec 31, 2009.
and finally:
Michael Deal has published an interesting collection of graphics in his Charting the Beatles project. This first snippet below shows the beginnings of a graph illustrating authorship and collaboration in songwriting throughout their song collection. The full graphic clearly shows the trend towards less collaboration over time in songwriting, the increasing contribution from George, and increasing contribution by outside contributors.
This second image is from a chart showing references in Beatles songs to earlier songs. There are full images and several other interesting graphics on his site.
For many people Twitter has become the best place for discovering the latest and most interesting work in a variety of fields. In my Twitter client I keep a search column open that is constantly updated with the latest tweets pertaining to data visualization or infographics, and I see lots of beautiful content flow by. I've been collecting these tweets for quite a while and thought it would be interesting to analyze them and see which visualizations were shared through Twitter most often.
Many of the top links in the domain were articles containing collections of visualizations chosen to be the 'Top NNN' by some panel of experts. For example, the most shared link was 50 Great Examples of Data Visualization by Web Designer Depot. I will have another post in the near future that lists the most popular of these types of links as well as separate lists for products/frameworks and news/analysis. For this list I chose to focus instead on references to individual data visualizations or infographics.
Here are the top 20 ordered by popularity. Click on either the link or image to go to the original article.










Note that the link made popular on Twitter for #9 Death and Taxes was actually a link to an image on imageshack and I have used instead a link to the original source of the material.
The tweets for this entire analysis were collected from March 24, 2009 until December 31, 2009. Only the first link to a specific item from each Twitter ID was counted so that one person did not unfairly impact the results by tweeting frequently about the same thing.
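The counting rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the analysis: the `(user_id, link)` tuple format, the function name `count_unique_shares`, and the sample data are all assumptions made for the example.

```python
from collections import Counter

def count_unique_shares(tweets):
    """Count each link at most once per Twitter ID.

    `tweets` is an iterable of (user_id, link) pairs. This input
    format is an assumption made for the sketch.
    """
    seen = set()        # (user_id, link) pairs already counted
    counts = Counter()  # link -> number of distinct users who shared it
    for user_id, link in tweets:
        if (user_id, link) in seen:
            continue    # repeat mention by the same user: ignore it
        seen.add((user_id, link))
        counts[link] += 1
    return counts

# Toy data: "alice" tweets the same link twice but is counted once.
tweets = [
    ("alice", "http://example.com/a"),
    ("alice", "http://example.com/a"),
    ("bob", "http://example.com/a"),
    ("bob", "http://example.com/b"),
]
ranking = count_unique_shares(tweets).most_common()
```

Sorting the resulting counts with `most_common()` gives the popularity ordering.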
Items 11-20 are listed below.
Here is a Shaped Word Cloud for tweets containing 'android' from 2009. I removed the tokens 'android' and '#android' from the analysis. You can click on the words to jump to Twitter Search and see the matching tweets. It's pretty clear that android is a 'google' 'phone' and is related to 'iphone' and 'htc'.
I've taken another look at the set of tweets from 2009 that contain 'Obama'. This time I started by focusing on the most popular hashtags that were used. This graph shows the top 10 hashtags, their distribution over the course of 2009, and the total references to them. The top hashtag by far was #tcot which stands for 'Top Conservatives on Twitter'.
How do tweets that contain #tcot differ from those that don't have it? What words seem especially associated with the tag? What topics do people using the tag seem to be focusing on?
I've done an analysis on the word frequency inside tweets containing the tag versus tweets without it. The chart below shows the words that are used much more frequently in the #tcot tweets compared to the baseline. Words on the left like 'CARE' and 'BUSH' are used at a rate of around 100-120% of the baseline rate. Words on the right like 'BHO' (shorthand for Barack Hussein Obama) and 'RASMUSSEN' are used at around 500% of the baseline rate - or, in other words, they occur around five times as often in #tcot tweets as they do in non-#tcot tweets.
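As a rough illustration of this kind of comparison, here is a minimal Python sketch. It assumes that a word's "rate" is the fraction of tweets containing it at least once and that words are split on whitespace; the original analysis may have defined both differently. The function names and the toy tweets are made up for the example.

```python
from collections import Counter

def tweet_counts(tweets):
    """Number of tweets in which each (upper-cased) word appears."""
    counts = Counter()
    for text in tweets:
        counts.update(set(text.upper().split()))
    return counts

def relative_usage(tagged, baseline):
    """Usage rate in `tagged` as a percentage of the rate in `baseline`.

    Words that never occur in the baseline are skipped because their
    ratio is undefined.
    """
    t, b = tweet_counts(tagged), tweet_counts(baseline)
    return {
        word: 100 * t[word] * len(baseline) / (b[word] * len(tagged))
        for word in t
        if word in b
    }

# Toy data, not taken from the real tweet collection.
tagged = ["bho speech #tcot", "bho poll #tcot"]
baseline = ["obama speech today", "bho speech", "obama poll",
            "obama news", "obama visit"]
usage = relative_usage(tagged, baseline)
```

In this toy data 'BHO' appears in every tagged tweet but in only one of five baseline tweets, so its value comes out at 500, matching the "five times as often" reading of the chart.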
The chart is an interesting collection of terms and is an attempt at distilling what the people who use the tag #tcot are saying in relation to Obama. Some notable words in the set are 'DANGEROUS', 'SOCIALIZED', 'EXPOSE', 'RADICALS', 'ARROGANT', 'MARXIST', 'COMMUNIST', 'CLIMATEGATE'.
I collected all the public tweets containing 'Obama' during 2009. There were over 5 million recorded during the course of the year. I've done some analysis on a sample containing every 20th tweet. This first graph simply shows the distribution over the course of the year of the number of times the name 'Obama' was used. The curve has a big peak during the inauguration, a few smaller ones in February and March and is then remarkably level for the rest of the year.
This set of graphs shows other words that were used frequently in the tweets about Obama and that had distributions with a high concentration near specific dates during the year. When ordered by the peak date for each graph they give an interesting graphical narrative of Obama-related events during 2009.
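The ordering step can be sketched as follows in Python. This is an assumption-laden illustration only: it takes the daily counts per word as already computed, uses placeholder words and made-up numbers, and does not attempt the earlier step of selecting words whose usage is concentrated near specific dates.

```python
from datetime import date

def peak_date(daily_counts):
    """Return the date on which a word was used most often."""
    return max(daily_counts, key=daily_counts.get)

def order_by_peak(series):
    """Sort words by the date of their highest daily count."""
    return sorted(series, key=lambda word: peak_date(series[word]))

# Placeholder words and counts, purely for illustration.
series = {
    "word_a": {date(2009, 6, 1): 3, date(2009, 6, 2): 40},
    "word_b": {date(2009, 1, 20): 90, date(2009, 1, 21): 12},
    "word_c": {date(2009, 3, 5): 25, date(2009, 3, 6): 7},
}
narrative = order_by_peak(series)
```

Reading the graphs in the order given by `narrative` walks through the peaks chronologically.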






It's been snowing where I live for the last month or so and I've been playing around with generating a dove image from snowflake constituents. This first image is constructed from smaller snowflakes built using the Text Snowflake Creator based on the words PEACE, LOVE, and TRUTH. The dove image is from Wikimedia Commons.
This second version uses the three Unicode snowflake characters in the font Arial Unicode MS. I've also applied a small variation in color.
Thank you everybody for your interest in Neoformix over the past year. I wish you all a Wonderful and Happy 2010!
These are the 20 most popular posts published on Neoformix during 2009 ordered by their popularity. There are a large number of popular posts based on the Shaped Word Cloud concept and a few more on the related Image Foam Technique.




















Note that many of the most popular parts of Neoformix visited during the past year were for projects published prior to 2009 and include Twitter StreamGraphs, Twitter Venn, Big Small, and Word Hearts.
One year ago today I launched Twitter Venn. Those of you who have not used it before or have forgotten about it might want to check it out. The image below is an example of what it produces.
I'm very pleased to announce that an image from my Twitter StreamGraphs tool was chosen as the cover for the current issue of ACM Crossroads - the Student Journal of the Association for Computing Machinery. There is also a small writeup inside about the image. It depicts the streamgraph for the phrase 'data visualization' and suits the issue well since it is dedicated to the Social Web. The entire issue is available online.
Thanks to Chris Harrison, the editor-in-chief, for inviting me to contribute the image and to Senior Editor Jill Duffy for sending me some copies of the issue.
Fifty-six papers in forty-five countries published a front page article today calling for action at the climate summit in Copenhagen. I've taken the text of the article and created a couple of images. The first is a Clustered Word Cloud which shows the more prominent words from the article grouped into clusters based on whether they were used together.
This second image takes the word clusters and arranges them in a starburst type pattern. The visual form was influenced by the Word Associations work by Chris Harrison. It's a little more interesting to look at and makes the groupings more obvious but has the drawback that the words are smaller than in the first format.
Last night Obama outlined the new policy in Afghanistan in a speech at West Point entitled The Way Forward in Afghanistan and Pakistan. Like many people, I have mixed feelings towards a larger military effort in the region. I have tried to represent that ambivalence with an animated word cloud based on the speech that transitions from one symbol to another.
This was created with custom code written in Processing. The two images came from here and here.
The organization Wikileaks recently published a data set of pager intercepts from the 9/11 tragedy. As described on their website:
Text pagers are usually carried by persons operating in an official capacity. Messages in the archive range from Pentagon, FBI, FEMA and New York Police Department exchanges, to computers reporting faults at investment banks inside the World Trade Center
The archive is a completely objective record of the defining moment of our time. We hope that its entrance into the historical record will lead to a nuanced understanding of how this event led to death, opportunism and war.
I have taken this data and done an analysis for 100 phrases selected to summarize the events of that horrible day. I have focused on the time period from 8am until 8pm, September 11th, 2001.
This video below shows a Phrase Burst Visualization of the data. The larger the text the more frequently it was used during the 12 hour period. Text appears bright during the times of high usage and fades away otherwise. The color hues are cosmetic. This phrase burst visualization is basically a word cloud where the brightness of the words varies according to how prominent the words were during specific periods of time. You can drag the playhead for the video around to examine specific times.
Pager Data from 9/11 - Phrase Cloud Visualization from Jeff Clark on Vimeo.
Perhaps a more useful view of the data is provided by this set of timeline graphs. They are ordered by the time of the highest peak for the phrase and in this arrangement provide a narrative of the events.




Video, graphing, and analysis done with custom code created with Processing.
I believe that the recent Swine Flu pandemic has been dramatically overplayed in the media. This morning I came across the image below on dataviz.tumblr.com that shows the number of deaths in the last 300 days from various causes including Swine Flu. There are a lot of things done really well here - the most important of which is that the deaths due to swine flu are put in a proper context.
Unfortunately the choice of using a solid red bar for emphasis beside the bar graph for Swine Flu deaths confuses the message because at first glance the bar can be interpreted as an extension of the bar graph itself. The first impression (and for some viewers the only impression) is that the deaths due to swine flu are exceptionally high - the very myth that the graphic is trying to dispel.
I have made a small intervention to the graphic that I believe makes the message less likely to be confused. The bar has been replaced with a text label and three arrows that can't be confused with an extension of the graph itself but still draw attention to the relatively small number of deaths for Swine Flu.
Unfortunately there is no reference on dataviz.tumblr.com to either the source of the original graphic or the data depicted. If anyone knows then send me a note and I'll add proper attribution here.
In a recent post I defined the idea of Twitter ListMates as IDs that are frequently grouped together on the same twitter lists. The listmates for some starting ID give an interesting perspective on how that ID is perceived by others and are in some sense similar to it.
If the starting 'seed' ID is highly characteristic of some particular domain then the highest ranking listmates will also be characteristic of that domain. As a concrete example, let's start from infosthetics, the twitter account for one of the central websites in the area of data visualization. The top ranking listmates are: flowingdata, datavis, and infobeautiful which are all very important voices in the domain.
If we start with all four of these IDs, find the lists they are on, and see who else appears on the same lists most often, we can get an excellent quality list of Twitter IDs for the field of data visualization. By starting with a small set of IDs rather than just one we introduce less bias into the result. Another technique that can be used to improve quality is to only use Twitter lists whose name matches the domain as well - for example, include the members of a list called 'datavis' but not of one called 'friends' when determining the listmates.
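Here is a minimal Python sketch of the multi-seed variant with list-name filtering. It is not the code behind the Top100in lists: the dictionary input format, the substring match on list names, the choice to count each qualifying list once regardless of how many seeds it contains, and all the account names are assumptions made for illustration.

```python
from collections import Counter

def domain_listmates(seeds, lists, keywords):
    """Score accounts that share on-topic lists with any seed ID.

    `lists` maps a list name to the set of its member IDs (assumed
    format). A list qualifies only if its name contains one of the
    `keywords` and it includes at least one seed.
    """
    seeds = set(seeds)
    scores = Counter()
    for name, members in lists.items():
        on_topic = any(k in name.lower() for k in keywords)
        if on_topic and seeds & members:
            scores.update(members - seeds)  # seeds are not scored
    return scores

# Hypothetical lists and accounts.
lists = {
    "datavis": {"seed_1", "seed_2", "account_a", "account_b"},
    "data-visualization": {"seed_1", "account_a"},
    "friends": {"seed_2", "account_c"},          # off-topic name
    "datavis-tools": {"account_d", "account_e"}, # contains no seed
}
scores = domain_listmates(["seed_1", "seed_2"], lists,
                          ["datavis", "visualization"])
```

Here `account_c` is dropped because it only shares the off-topic 'friends' list with a seed, which is the effect the name filter is meant to have.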
I have used this technique to define a number of twitter lists for various domains and saved them under the twitter ID Top100in. The lists defined so far are:
These meta-lists seem to be filled with interesting accounts for the various topics although the datavis one does have a few IDs that are more focused on digital art and design rather than visualization in particular. Feel free to follow them!
I have updated Twitter StreamGraphs to support the new twitter lists. You just enter a list in the standard format in the text box to see the graph for the latest 1000 tweets from all members of the list. The standard format looks like this: @scobleizer/web-innovators.
In Twitter ListMates I introduced a name for the idea of people who are often grouped together on Twitter lists. The idea has value because listmates have been grouped together by multiple people who independently decided that those accounts are similar in some sense. Doing this type of analysis starting from my account, JeffClark, helped me find new people to follow.
I have repeated the process for four other accounts to try to confirm that this technique is indeed useful. The results are shown below.
| For Robert Scoble (scobleizer) we get: | For Shaquille O'Neal (THE_REAL_SHAQ) we get: |
| For John Mayer (johncmayer) we get: | And for Alex Payne (al3x), an engineer at Twitter: |
Again, it seems to give good results: Scoble is grouped with other influential people in the field of technology; Shaq with a mixture of athletes and other celebrities; John Mayer with musicians and celebrities; and Alex with a mixture of developers, other Twitter employees, and people influential in technology.
In the recent post called Twitter List Profile Clouds I explored how the Twitter list names to which a person has been added can reveal how they are perceived across the twittersphere. Another interesting idea is that when somebody adds an account to a list they are implicitly defining a relation between that account and every other account on the same list. They are essentially making a declaration that all the members of the list share some characteristic. The name of the list usually offers a clue about how all the list members are related.
So, for example, the fact that datavis and flowingdata both appear on a list together means that somebody thinks they are similar in some sense. And if the list is called 'datavisualization' then that reveals how the list creator thinks they are similar.
I think of two accounts that appear on a list together as 'listmates'. It seems a reasonable name for the concept and follows the pattern of schoolmates, roommates, teammates etc. If you take all the Twitter Lists that an account is listed on and find all the members of those lists you can define a set of users related to the starting account. Keep track of how many times they appear in total and you also get a numeric score for how similar they are.
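The scoring just described can be sketched in Python as below. This is a simplified illustration under stated assumptions: the list memberships are taken as already fetched into a dictionary (no Twitter API calls are shown), and the function name and account names are hypothetical.

```python
from collections import Counter

def listmates(seed, lists):
    """Rank accounts by how many lists they share with `seed`.

    `lists` maps a list name to the set of its member IDs, which is
    an assumed input format for this sketch.
    """
    scores = Counter()
    for members in lists.values():
        if seed in members:
            # Every other member of this list is a listmate of the seed.
            scores.update(m for m in members if m != seed)
    return scores.most_common()

# Hypothetical memberships.
lists = {
    "list_1": {"seed_account", "account_a", "account_b"},
    "list_2": {"seed_account", "account_a"},
    "list_3": {"account_b", "account_c"},  # seed is not a member
}
top = listmates("seed_account", lists)
```

The count attached to each account is the numeric similarity score: the number of lists it shares with the starting account.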
I tried out the idea using my own account, JeffClark, as a starting point. Here are my top 25 Twitter Listmates:
The list is a who's who of people I respect and admire in the field of data visualization and I'm very pleased that others have grouped us together. I believe this technique has promise for finding interesting new accounts to follow.