Lexical Analysis of Debates

By: Jeff Clark    Date: Mon, 27 Apr 2009

I did a fair number of posts last year that analyzed various texts related to the US election. I used a number of different techniques, including StreamGraphs, Speech Contrast Diagrams, an interactive transcript visualizer, and, of course, word clouds. I introduced Martin Krzywinski in my last post as the creator of Circos. Martin has also done some excellent work on lexical analysis and text visualization in his post "Lexical Analysis of 2008 US Presidential and Vice-Presidential Debates: Who's the Windbag?"

Here is a portion of one of his graphics that illustrates thematic profiles for Obama and McCain during a debate. It has some conceptual similarity to my interactive transcript visualizer.

The word clouds below were created by Martin and use colour to show the words spoken only by Obama in green, only by McCain in blue, and by both men in white. The first one shows nouns and the second is limited to adjectives. I think the idea of limiting the cloud to a particular part of speech is a fruitful one to explore.
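The three-way colouring described above comes down to set operations on each speaker's vocabulary. Here is a minimal sketch in Python, not Martin's actual code: the function names, the simple regex tokenizer, and the group labels are all assumptions made for illustration, and it skips the part-of-speech filtering, which would need a tagger.

```python
import re

# Colours taken from the description above; the group names are hypothetical.
COLOURS = {"only_a": "green", "only_b": "blue", "both": "white"}

def tokenize(text):
    """Return the set of lowercase words in text (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def classify_words(text_a, text_b):
    """Split the combined vocabulary into words unique to each speaker and shared words."""
    words_a = tokenize(text_a)
    words_b = tokenize(text_b)
    return {
        "only_a": words_a - words_b,  # spoken only by speaker A
        "only_b": words_b - words_a,  # spoken only by speaker B
        "both": words_a & words_b,    # spoken by both
    }

def colour_for(word, groups):
    """Look up the display colour for a word, or None if neither speaker used it."""
    for group, words in groups.items():
        if word in words:
            return COLOURS[group]
    return None
```

Calling classify_words on the two speakers' transcripts and then colour_for on each word gives the colour a cloud renderer would use for that word.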

In the same document Martin also formulates and calculates an interesting 'windbag index' that is a composite of measures of repetition in various aspects of speech.
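The formula for the windbag index is not reproduced here, so the following is not Martin's calculation. It is only a rough sketch of what a single repetition measure might look like: the share of words in a text that repeat an earlier word. The function name and the tokenizer are assumptions for illustration, and a real composite index would combine several such measures.

```python
import re

def repetition_score(text):
    """Hypothetical repetition measure: 1 - (distinct words / total words).

    Returns 0.0 when no word is repeated and approaches 1.0 as the text
    repeats the same few words over and over.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0  # an empty text has no repetition to measure
    return 1.0 - len(set(words)) / len(words)
```

Scores from a measure like this are only comparable between texts of similar length, since longer texts naturally reuse more words.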


