In a previous entry I introduced a simple topic hierarchy and used it to characterize weblog posts by measuring which topics were most prevalent in a given document. This can be considered a dimension reduction technique that attempts to capture important aspects of a document with just a few numerical quantities. In data visualization it is often useful to use graphical objects whose elements (e.g. position, shape, size, colour, orientation) are bound to a given set of numerical quantities. These are usually called glyphs. Some examples are Chernoff Faces and Star Plots.
I recently began investigating a tool called Processing, which is an open source programming language and environment for people who want to program images, animation, and sound. I came across an interesting example that generated reasonably attractive quasi-realistic images of flowers. I've enhanced this so that I can create flower-like images whose characteristics are driven by features derived from arbitrary text. Most of these features are related to which high-level topics are evident in the text, so I call the generated images Topic Flowers.
Here are a few examples with the text used to generate them.
|A University of Tasmania PhD fine art student, King is also artist-in-residence at the university's school of medicine. And it is there, more than in an artist's studio as such, that she creates her best work in glass vials. King grows the membrane over marble-sized glass forms and then incorporates it into her sculptures. She wants her work to confront viewers and provoke thought and debate. "Contemporary art is perfectly placed in an influential position to promote biotechnology," King said. (Link)|
|I love art.|
|The science of today is the technology of tomorrow. - Edward Teller|
In my scheme each high-level topic is always represented by the same colour. The colours for the three highest scoring topics are used in the flower. The second example above makes it obvious that Art is represented by red and the third example shows that Science and Technology are blue and cyan. In my next post I will include a little application that will allow you to generate your own flowers based on whatever input text you want to use. I'll also explain in more detail how to interpret the generated images.
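To make the colour scheme concrete, here is a minimal sketch of how the three highest scoring topics might be turned into flower colours. It is not the actual Processing code behind the images. The `flower_colours` function, the `TOPIC_COLOURS` table, the RGB values, the extra topic names, the example scores, and the rule of skipping zero-scoring topics are all assumptions made for illustration. The only part taken from the post is that Art is red and that Science and Technology use blue and cyan (assumed here to be in that order).

```python
# Hypothetical fixed topic-to-colour table (RGB tuples).
# Art/Science/Technology follow the post; the other entries are invented.
TOPIC_COLOURS = {
    "Art": (255, 0, 0),           # red
    "Science": (0, 0, 255),       # blue (order assumed)
    "Technology": (0, 255, 255),  # cyan (order assumed)
    "Sports": (0, 255, 0),        # assumed topic and colour
    "Business": (255, 165, 0),    # assumed topic and colour
}


def flower_colours(topic_scores, n=3):
    """Return (topic, colour) pairs for the n highest scoring topics.

    Topics with a score of zero are skipped (an assumption), so a very
    short text can yield fewer than n colours.
    """
    ranked = sorted(topic_scores.items(), key=lambda kv: kv[1], reverse=True)
    top = [topic for topic, score in ranked[:n] if score > 0]
    return [(topic, TOPIC_COLOURS[topic]) for topic in top]


# Made-up scores for a text that is mostly about science and technology.
scores = {"Art": 0.1, "Science": 0.5, "Technology": 0.4,
          "Sports": 0.0, "Business": 0.0}
print(flower_colours(scores))
```

Because the table is fixed, a topic always gets the same colour no matter which text is scored, which is what lets a viewer read the dominant topics straight from a flower.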