Posted on Saturday 25 December 2010
I started this blog after a couple of failed blogging attempts. I had noticed that I kept sending emails to my friends with links and clips in them, so I decided to turn that habit into a blog. My emails seemed to fall into six categories: art, science, technology, politics, literature and kids, and I made posts in those categories. Over time, I needed to add a few more. Lately, I have been coming across bits about language. There are times when I think that language is a poor form of communication, but it is the best that we have.
Google has a project under way to scan books for research purposes.
Together with over 40 university libraries, the internet titan has thus far scanned over 15 million books, creating a massive electronic library that represents 12% of all the books ever published. All the while, a team from Harvard University, led by Jean-Baptiste Michel and Erez Lieberman Aiden, have been analysing the flood of data. Their first report is available today. Although it barely scratches the surface, it’s already a tantalising glimpse into the power of the Google Books corpus. It’s a record of human culture, spanning six centuries and seven languages. It shows vocabularies expanding and grammar evolving. It contains stories about our adoption of technology, our quest for fame, and our battle for equality. And it hides the traces of tragedy, including traces of political suppression, records of past plagues, and a fading connection with our own history.



