“Big data” is the phrase du jour. The Harvard Business Review has declared “data scientist” the sexiest job of the twenty-first century. “Digital humanists” have abandoned the archives for computer labs. What the hell is going on? What are all of these things? Well. (That’s a bit of self-indulgent irony.)
In Uncharted: Big Data as a Lens on Human Culture, wunderkinds Erez Aiden and Jean-Baptiste Michel demonstrate for readers the uses of big data beyond marketing and social networks. The minds behind Google’s Ngram Viewer approach human history through the texts of millions of books. Cultural output was ingested, digitized, quantified, and analyzed; the result was the most massive experiment of its kind to that time, and Aiden and Michel use their work here to illuminate the ways in which big data provides collaborative opportunities for scholars in the sciences and humanities.
The notion behind big data, at least with regard to its applications in the humanities, is that it provides scholars a potential “long view” of cultural change. Aiden and Michel begin by looking into when Americans began saying “the United States is” rather than “the United States are.” Traditional scholarship held that this change occurred in the immediate aftermath of the Civil War. According to Aiden and Michel’s analysis, though, the change took place later, with the former superseding the latter only in the 1880s. (Note that both phrases have always been and continue to be used.) Here, then, is 200 years of cultural evidence applied to a (minor) historical question.
Much of the first half of Uncharted is exposition: Aiden and Michel must acclimate readers to the methods and concepts of big data and the digital humanities, which takes time. Thankfully, they do it well: The authors have a knack for explaining potentially fraught processes with clear language and apt metaphors, and rarely will lay readers find themselves lost in the thickets of the science. Readers with a background in information or technology will appreciate Aiden and Michel’s story: the way they conceived their project and their ultimate success in gaining access to the data they needed, which was held by Google.
Aiden and Michel devote the second half of the book to particular questions (or experiments, if you will) that serve as examples of big data’s potential scholarly applications. Consider, for instance, whether or not it is possible to “kill” an idea. Common wisdom tells us that it can’t be done, and the data to which Aiden and Michel had access provided them fertile ground for testing this hypothesis. With that in mind, the investigators looked into censorship in 1930s Germany, particularly at the Nazis’ efforts to blot out of the historical record mentions of such “deviant” artists as Marc Chagall. With hindsight, we know that the Nazis ultimately failed. But how successful were they during the 1930s and ’40s? Big data reveals that they were remarkably successful: As Chagall’s fame rose in France, his name was effectively removed from the public record in Germany. (All of this is based on data culled from books, remember.) In other words, ideas can be silenced, at least for a time.
The obvious rejoinder to Uncharted is that its research is based on flawed data sets: Where are the newspapers? Where are the letters? Aiden and Michel are aware that their research was informed by an incomplete historical record. Indeed, Google has made impressive progress digitizing books, but the rest of humanity’s vast historical output has received only local and sporadic attention. The authors’ intent, then, is not to provide definitive answers to the questions they pose, but to highlight the methods big data makes available to researchers, and to advocate for more comprehensive digitization efforts.
Those readers with some knowledge of historiography might, as an addendum to the above critique, complain that, for most of human history, there is little or no data available regarding the life of the average person. Indeed, the written word reflects a type of bias: It is the expression of a culture that has developed the knowledge and technology necessary to express and record its thoughts, an activity that was usually done by people in the top tier of society, by means of either education or wealth. A monk was not a king, perhaps, but was elite in terms of his education, and his output reflects that privilege. In other words, serfs didn’t write books, and the record to which we have access is skewed in favor of a particular class, although that bias decreases over time. Again, though, the digital humanities aren’t the solution to all such problems; they’re a new avenue by which to approach scholarly investigation.
Uncharted is a breezily written introduction to the ways in which big data provides opportunities for collaboration between the sciences and humanities. Readers worried that they might not follow the book should not feel intimidated; the text is accessible and peppered with clever turns of phrase, and Aiden and Michel are skilled storytellers. Recommended for readers with an interest in technology and the information sciences.