Document Cloud Comparison

By: Jeff Clark    Date: Mon, 04 Feb 2008

Word Association Clouds seem to be an interesting way to navigate within a document and get a sense of the concepts it discusses. I've also been playing around with linking two of them together to explore the similarities and differences between two documents.

The image below shows an example using the State of the Union (SOTU) addresses for 2007 and 2008. Each cloud shows the words related to the focus word in one document, just as in the single Word Association Cloud. The only difference is that colour marks words that are unique to one document or the other: words in blue on the left appear only in the 2007 SOTU, and those in red on the right appear only in the 2008 SOTU. As before, you can click on a word to bring it into focus, or type in the edit box at the top to change it. In this case the clouds are linked, so they always show the same focus word for both documents.

Document Cloud Comparison (static image)

Here we show the words associated with 'energy' in both transcripts. The word 'supply' is the most strongly associated with 'energy' in the 2007 address, and its blue colour shows that it isn't used at all in the 2008 address. You can also easily see that 'wind', 'solar', 'electric' and 'vehicles' were all used in relation to 'energy' in 2007 but were not mentioned at all in 2008. In 2008 the most strongly associated term is 'security'. It does appear in 2007, but it is less prominent there in relation to 'energy'.
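The post doesn't say how the association strength is computed, so here is a minimal sketch of the comparison idea under stated assumptions: association is approximated by counting words that occur within a fixed window of tokens around the focus word, a tiny hypothetical stop-word list is dropped, and a word gets its document's colour only when it never appears anywhere in the other document (matching the 'supply' vs. 'security' reading above). The names `associations`, `compare_clouds` and `STOP_WORDS` are illustrative only, not taken from the actual application.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real tool would use a much larger one.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "must", "of", "the", "to", "we", "will", "with"}


def tokenize(text):
    """Lower-case word tokens with stop words removed (deliberately simple)."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def associations(text, focus, window=2):
    """Count words within `window` tokens of each occurrence of `focus`.

    Assumption: windowed co-occurrence stands in for whatever association
    measure the original Word Association Cloud uses.
    """
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != focus:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] != focus:
                counts[tokens[j]] += 1
    return counts


def compare_clouds(text_a, text_b, focus, window=2):
    """Return two lists of (word, score, colour), strongest association first.

    A word is coloured ('blue' for the first document, 'red' for the second)
    only if it never occurs in the other document; otherwise it is 'grey'.
    """
    vocab_a, vocab_b = set(tokenize(text_a)), set(tokenize(text_b))
    cloud_a = [(w, n, "grey" if w in vocab_b else "blue")
               for w, n in associations(text_a, focus, window).most_common()]
    cloud_b = [(w, n, "grey" if w in vocab_a else "red")
               for w, n in associations(text_b, focus, window).most_common()]
    return cloud_a, cloud_b
```

For example, with two made-up snippets (not the real SOTU text), `compare_clouds("We must expand the energy supply with wind and solar power. Energy security matters.", "Energy security is vital. We will improve energy security at home.", "energy")` colours 'supply' and 'wind' blue, leaves 'security' grey in both clouds, and ranks 'security' first in the second cloud. Linking the two clouds then amounts to calling the same function again whenever the focus word changes.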

It's much more interesting to explore it yourself. Click on the image or 'more' to give it a try.
