We all know that Google has digitized millions of books. Less well known is the Google Books Ngram Viewer, "an online phrase-usage graphing tool" (according to Wikipedia).
Essentially, the Ngram Viewer shows how often a word or phrase occurs, expressed as a percentage, across a vast database of digitized books in English and other languages. These percentages are charted over a range of years, allowing us to identify trends in English-language usage.
Don't expect high percentages for any given word or phrase. The definite article "the" tops out at 6%, more than 3 percentage points ahead of its indefinite cousins "a" and "an". As one would expect, most words and phrases occur much, much less frequently; in a database of this size, figures around 0.00005% are normal. Because the percentages are so small, comparing the results for two or more search terms is likely to tell us more than reading any single result on its own.
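To make the idea of a percentage occurrence concrete, here is a minimal sketch in Python. It assumes, as we understand Google's description of the tool, that a phrase's yearly count is divided by the total number of phrases of the same length (here, two-word phrases) in that year's books. All counts below are made-up toy numbers, not real Google Books data, and the function names are hypothetical. The sketch also shows why comparing terms helps: two tiny percentages become a readable ratio.

```python
# Toy illustration of an Ngram-style percentage and a two-term comparison.
# ASSUMPTION: every number here is invented for illustration only.

# Hypothetical yearly counts of two two-word phrases.
santa_claus = {1900: 5, 1950: 20, 2000: 60}
christmas_tree = {1900: 10, 1950: 10, 2000: 20}

# Hypothetical total number of two-word phrases in the corpus for each year.
# (Assumed normalization: divide by all n-grams of the same length.)
bigram_totals = {1900: 10_000_000, 1950: 20_000_000, 2000: 40_000_000}


def yearly_percentages(counts, totals):
    """Convert raw yearly counts into percentages of all same-length phrases."""
    return {year: 100.0 * count / totals[year] for year, count in counts.items()}


def ratio(a, b):
    """How many times more frequent series a is than series b, year by year."""
    return {year: a[year] / b[year] for year in a if year in b and b[year] > 0}


santa_pct = yearly_percentages(santa_claus, bigram_totals)
tree_pct = yearly_percentages(christmas_tree, bigram_totals)

# Each percentage on its own is tiny (around 0.00005% to 0.00015%) ...
for year in sorted(santa_pct):
    print(f"{year}: Santa Claus {santa_pct[year]:.5f}%")

# ... but the ratio between the two terms is easy to read.
for year, r in sorted(ratio(santa_pct, tree_pct).items()):
    print(f"{year}: {r:.1f}x")  # prints 1900: 0.5x, 1950: 2.0x, 2000: 3.0x
```

Because both phrases share the same yearly denominator, the ratio of their percentages equals the ratio of their raw counts, which is one reason side-by-side comparisons are easier to interpret than isolated values.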
Interpreting the results, however, can be fraught with challenges, many of which are discussed in the book Uncharted: Big Data as a Lens on Human Culture.
Nevertheless, it is both interesting and fun to compare the results for terms and phrases assembled around a theme. Let's try Santa Claus, Scrooge, Coca Cola, Christmas Tree, and Grinch, and see what happens: