I have been having fun recently exploring how the use of words in tweets varies over the time of day (#1, #2, #3, #4, and #5). A minor change to the code I use to analyze the tweet text lets me look instead at how word use varies over the course of a week. The dataset contains over a million tweets sent from Toronto during June and July 2009, so we have roughly 8 weeks of data. I've binned the data into 2-hour segments by day of the week, which gives 84 bins per week.
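To make the binning concrete, here is a minimal sketch of one way to do it. It is not the code behind the charts in this post: the names `week_bin` and `word_profiles` are made up for illustration, timestamps are assumed to already be in Toronto local time, and dividing by the number of tweets in each bin is my own assumption about how to make busy and quiet hours comparable.

```python
from collections import defaultdict
from datetime import datetime

BINS_PER_DAY = 12            # 24 hours split into 2-hour segments
N_BINS = 7 * BINS_PER_DAY    # 84 bins per week


def week_bin(ts: datetime) -> int:
    """Map a local timestamp to a bin index 0..83 (Monday 00:00-02:00 is bin 0)."""
    return ts.weekday() * BINS_PER_DAY + ts.hour // 2


def word_profiles(tweets):
    """Build a weekly profile for every word.

    tweets: iterable of (datetime, text) pairs (hypothetical input format).
    Returns {word: list of 84 values}, where each value is the fraction of
    tweets in that bin that contain the word at least once.
    """
    counts = defaultdict(lambda: [0] * N_BINS)
    totals = [0] * N_BINS
    for ts, text in tweets:
        b = week_bin(ts)
        totals[b] += 1
        # Count each word once per tweet so repeats do not inflate the curve.
        for word in set(text.lower().split()):
            counts[word][b] += 1
    return {
        word: [c / t if t else 0.0 for c, t in zip(row, totals)]
        for word, row in counts.items()
    }
```

Splitting on whitespace is a crude tokenizer, but it keeps hashtags such as '#followfriday' intact as single terms.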
As the charts below show, many of the time series have an obvious daily pattern that barely changes from one day of the week to the next. Note that each day-of-week label is positioned at noon of that day.
Other words show strong peaks on certain days of the week. The terms 'tgif' (Thank God It's Friday), '#followfriday', and 'mondays' appear in the expected locations. But why is 'father' localized to Sunday? And 'michael' to Thursday?
Let's check out the terms whose curves have a similar shape to these words. For 'father' we get:
From these temporally related terms, I suspect the tight association between 'father' and Sunday comes from Father's Day, which fell on Sunday, June 21st this year and so was within the range of data used for this analysis.
Similarly, for 'michael' we get the graphs below, and the explanation is easy to see: Michael Jackson died on a Thursday.
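For anyone wondering how "similarly shaped" might be measured, one simple option is the Pearson correlation between two weekly profiles. The sketch below is only an illustration under that assumption, not necessarily the measure used for the graphs in this post. The function names and the `profiles` dictionary (word mapped to a list of per-bin values) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def most_similar(target, profiles, k=10):
    """Return the k words whose profiles correlate best with the target's.

    profiles: {word: list of per-bin values}, all lists the same length.
    Returns a list of (word, correlation) pairs, best match first.
    """
    ref = profiles[target]
    scored = [
        (pearson(ref, profile), word)
        for word, profile in profiles.items()
        if word != target
    ]
    scored.sort(reverse=True)
    return [(word, r) for r, word in scored[:k]]
```

Correlation ignores overall scale, so a rare word and a common word can still match if their peaks line up, which is the behaviour wanted when looking for terms that spike on the same day.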
Here are a few terms that seem relatively high on weekends:
Overall, the technique seems to work well for analyzing day-of-week patterns. As is often the case, much of what gets revealed seems obvious in retrospect. I suspect, however, that this type of analysis could discover non-obvious patterns as well.