I have built another little digital humanities project based on the text of the 62 stories in Grimm's Fairy Tales. This one is called Grimm's Story Metrics and presents an interactive matrix of stories together with various metrics calculated from their text. You can click on a column to sort by that data, click again to reverse the direction, and click on a story name to open it in another window. The image below shows the stories sorted by the 'Royalty' metric which indicates, as you would expect, how many references there are to words related to the topic of royalty. Click on the image to go to the interactive tool.
Hovering over any of the bars shows details about that particular measurement. Most of the metrics, like 'Royalty', are based on topics and the details shown are the words characteristic of that topic used in the story. So, for example, the details for 'Royalty' in the 'Frog-Prince' are princess, prince, king, kingdom which are listed in frequency order. These topical metrics are normalized based on total words in the story so longer stories have no scoring advantage.
The 'Lexical Diversity' is a ratio of the number of unique words in the story to the total words. These stories are fairly short and you can observe a rough inverse relationship between 'Story Length' and 'Lexical Diversity'. 'Clever Hans' is an outlier in this relationship. If you examine the text for this story you'll see that there is a great deal of repitition.