The graphic below shows a Clustered Word Cloud for the world news headlines from 2008. As in my last post, the data comes from the Toronto Star, so it reflects a Canadian perspective. Several groups of keywords bear this out, including the second largest (in red), which shows that there was a lot of coverage of Canadian soldiers killed or injured in southern Afghanistan. The largest cluster by far (light blue) shows that the US presidential campaign received a lot of coverage. The automated clustering did produce one unusual grouping: 'Korea' with 'Carolina', 'primary', and 'victory'. These words were linked through frequent use of 'North' and 'South', as in 'North Korea' and 'North Carolina'.
By grouping related words, this technique does a much better job of summarizing the most-covered international events than the Streamgraph representation. However, to do so it gives up any attempt to show the distribution of events over time. Perhaps some combination of the two ideas would be fruitful.
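The 'Korea' and 'Carolina' grouping mentioned above can be illustrated with a small sketch. The headlines here are invented, and counting word co-occurrence within a headline is only my assumption about how such a link might arise; it is not necessarily the method used to produce the graphic. The function names are hypothetical too.

```python
from collections import Counter
from itertools import combinations

# Hypothetical headlines, invented purely for illustration.
headlines = [
    "north korea halts nuclear talks",
    "north carolina primary victory declared",
    "south korea protests beef imports",
    "south carolina primary victory celebrated",
]

def cooccurrence(lines):
    """Count how often each unordered word pair appears in the same headline."""
    counts = Counter()
    for line in lines:
        words = sorted(set(line.split()))  # sorted so each pair has one canonical order
        counts.update(combinations(words, 2))
    return counts

def shared_neighbours(counts, a, b):
    """Return the words that co-occur with both a and b."""
    def neighbours(word):
        return {x if y == word else y for x, y in counts if word in (x, y)}
    return neighbours(a) & neighbours(b)

pairs = cooccurrence(headlines)
bridge = shared_neighbours(pairs, "korea", "carolina")
print(sorted(bridge))
```

In this toy data 'korea' and 'carolina' never share a headline, yet both sit next to 'north' and 'south', which is the kind of indirect link that could pull them into one cluster.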