darkoshi
In 500 Billion Words, New Window on Culture

Google has made a mammoth database culled from nearly 5.2 million digitized books available to the public for free downloads and online searches, opening a new landscape of possibilities for research and education in the humanities.

The digital storehouse, which comprises words and short phrases as well as a year-by-year count of how often they appear, represents the first time a data set of this magnitude and searching tools are at the disposal of Ph.D.’s, middle school students and anyone else who likes to spend time in front of a small screen. It consists of the 500 billion words contained in books published between 1500 and 2008 in English, French, Spanish, German, Chinese and Russian.

The intended audience is scholarly, but a simple online tool allows anyone with a computer to plug in a string of up to five words and see a graph that charts the phrase’s use over time.
...
Google says the culturomics project raises no copyright issue because the books themselves, or even sections of them, cannot be read.

So far, Google has scanned more than 11 percent of the entire corpus of published books, about two trillion words.