Tinker, Tailor, Builder, Maker: Digital Humanists in the World of Big Data

Link to my prezi: http://prezi.com/chuttfitawvw/tinker-tailor-builder-maker-digital-humanists-and-the-world-of-big-data/

Who’s In and Who’s Out of …What Exactly?

There are a lot of questions that tend to make an appearance (like whack-a-moles) in discussions related to Digital Humanities. One of the more contentious questions is who are digital humanists? What are their qualifications? How do people go about getting “into” the digital humanities field?

Any discussion of who is “in” and who is “out” in digital humanities raises a whole host of other questions and concerns, including issues of skill-sets, qualifications, and, interestingly enough, what the digital humanities are in the first place. Digital humanities, as a burgeoning field, a concept, a profession, a term, is very much in flux.  This ongoing debate makes for some interesting and exciting discussion, which is reflected in the readings I am presenting.

Big Data: Go Big or Go Home

To set the stage, I’d like to start with our friend Lev Manovich and his article “Trending: The Promises and the Challenges of Big Social Data.” In this piece, Manovich discusses the role big data is playing in the digital humanities, and the impact big data is having on those who engage in the digital humanities in their work.

What is “big data”?  According to Manovich, big data is, in a nutshell, massive amounts of data.  In 2008 Wired magazine noted that we’re entering a new “Petabyte” age of big data, where our ability to handle massive data sets is growing and changing.

The NEH Office of Digital Humanities has recognized this shift and has issued a Digging into Data Challenge. The NEH notes that “As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these everyday materials.”

Manovich considers many projects that tackle big data in various ways and forms. Manovich’s own Software Studies Initiative is a forum where software is considered as a subject of academic inquiry. Tools like Many Eyes and Tableau offer free data analysis and visualization. The List of Data Repositories at the Digging into Data project shows just how many organizations are making large amounts of data available for study.

Digital humanists now have the opportunity to handle and explore large data sets. But this opportunity is also something of a burden. As digital humanists start working with things like billions of tweets or Flickr photos, a number of theoretical and practical issues arise.

Surface Data vs. Deep Data

Manovich identifies a few major issues that surround the use of big data in the digital humanities. The first issue is the difference between “deep data” about a few people and “surface data” about many people. Traditionally, deep data has been the realm of the humanities, while surface data has been the domain of social scientists using quantitative methods. Existing between these two data arenas was the land of statistics and sampling.

According to Manovich, big data is changing the data landscape, which raises the question of whether we can still distinguish between deep and surface data. Manovich agrees that the type and amount of data we can now access are changing how research is done. But he also raises a few objections to the idea that big data has wiped out any distinctions between deep data and surface data.

His first objection is that only social media companies have access to large data sets. The public APIs that researchers can access do not contain all the user data that a company has. The data a researcher gets might be massive, but it lacks depth.

Second, Manovich discusses issues surrounding the authenticity of digital footprints, and considers cases of self-censoring and governmental censorship online.

Aside from these more practical objections, Manovich also notes that different types of researchers access different kinds of data and, therefore, ask different questions in their research. Digital humanists will inevitably have a different approach to data, big, deep, or surface, than a computer science researcher.

So the question for digital humanists becomes one of relating technique to theory. Manovich asks “What can be discovered and understood with computer analysis of social and cultural data?” Computers can gather information, but they are not very good at analyzing data. Just look at Watson’s performance on Jeopardy, where he (it?) had trouble with more nuanced phrases and reasoning.

For digital humanists, the ideal may be to have computers parse large amounts of data and humans interpret the data.

Dealing with Big Data

But are digital humanists up to the task of dealing with big data?  Manovich concedes that some technical know-how is needed to really do a lot with tools.  According to him there are three types of people: those who create data (everyone), those who can collect data, and those who can analyze it. The gap for many digital humanists lies in having the means to collect data. Whether it’s an issue of funding, a lack of public tools, or a lack of knowledge in using tools, many digital humanists face obstacles in actually accessing and parsing big data.

Surface data might be on its way to becoming the new deep data for digital humanists, but there are many practical obstacles to overcome before that theory is a reality.

Computer Education 101

Various types of data, and the necessity of using tools to deal with big data, are now part of the digital humanities landscape. But, as Manovich noted, there is a knowledge gap among many digital humanists in terms of computer savvy.  Do digital humanists need remedial digital education?

Into the breach step James J. Lu, computer science and mathematics professor at Emory, and George H.L. Fletcher, engineering and computer science professor at Washington State. In a nutshell, these two men argue that we need to teach computer science and computational thinking early and often.  Sounds good, right?

Yes and no. This article is a wonderful example of why humanists are quite important in the world. The problem with this article is that the engineer and computer scientist authors do not seem to know much about primary school education (and they appear to have used only the Internet as a resource for grade school curriculum ideas).

At any rate, they argue that we should focus on teaching kids “computational thinking” rather than programming. In their view, no one should learn programming until they are older and in high school or college. Children should learn the theory and concepts behind computers before beginning to program.

Would this type of education be useful at all? A potential counterargument comes from UT’s own Computer Science Department. Their outreach activities to elementary schools are all quite hands-on and practical and teach basic programming concepts as opposed to straight theory. Likewise, a course in Computer Science for Non-Majors features a mix of practical concepts, programming basics, and information on how computers actually work.

The real lesson from this article seems to be that humanists (such as educators) should work with computer scientists to improve school curriculum and turn out a generation of tech-savvy humanists who can parse big data with the best of them.

Digital Humanities Throw-down, 2011

Educating children about computers (a noble aspiration) is an area of interest, but what of the current, adult digital humanists who likely never learned programming in grade school?  Is programming even necessary for digital humanists to learn?

According to Stephen Ramsay, the answer is yes.  At the 2011 MLA conference, Ramsay set off something of a firestorm when he insisted that digital humanists need to know how to code. Ramsay said that “If you are not making anything, you are not … a digital humanist.”

Ramsay took to his own blog in the aftermath to clarify his position. While he expanded on it, he never really took back his initial assertion. To Ramsay, making is an essential and intrinsic part of the digital humanities. Making, and the process of doing so, “yields insights that are difficult to acquire otherwise.”

In clarifying his stance, Ramsay concedes that digital humanists do not necessarily need to know how to code per se. But, if you plan to put digital humanist on your business card, you have to be involved in building something. Ramsay argues that all aspects of the digital humanities (data mining, XML encoding, GIS, tool design, etc.) involve building.

Bricoleurs and Collaborators

Alan Liu commented on Ramsay’s post with a thoughtful counter-suggestion (argument is a bit strong for what Liu did).

Liu notes that many digital humanists are bricoleurs of code; they borrow and patch things together. In fact, digital humanists have a lot in common with engineers. Liu says that structural engineers do calculations and make drawings, but a whole host of other roles are needed in order to actually build a structure.

The idea that only makers/builders are “in” the digital humanities clubhouse leaves no room for collaboration, which is at the heart of digital humanities.

Liu concludes by saying that we should recognize a multiplicity of building roles in the digital humanities. In this respect it seems that the digital humanities should be no different from the more traditionally defined humanities in that there are many different types of humanists doing many different types of work.

Defining the Digital Humanities

Mark Sample’s post “The digital humanities is not about building, it’s about sharing,” also responds to Ramsay’s rather contentious remarks with a thoughtful assessment of what the digital humanities are and who digital humanists are.

In the digital humanities field, there’s a definite tension between “do vs. think.” But Sample argues that these tensions and debates are little more than a “distracting sideshow to the true power of the digital humanities, which has nothing to do with production of either tools or research. The heart of the digital humanities is not the production of knowledge; it’s the reproduction of knowledge.”

Appropriately enough for a digital humanist, Sample thinks that he really nailed his belief down in 140 characters or fewer on Twitter: “DH shouldn’t only be about the production of knowledge. It’s about challenging the ways that knowledge is represented and shared.”

Now that we are back in the humanistic happy place, where sharing is caring, Sample turns his piece to a discussion of how digital humanities can pioneer ways to share knowledge. Sample is particularly energized by the new head of the MLA’s Office of Scholarly Communication, Kathleen Fitzpatrick, who is a founder of Media Commons. He hopes that Fitzpatrick will be an advocate for positive change in the academy. Some of Sample’s ideas for the future include scholars starting their own small academic presses, the abolition of blind review, and the acceptance of digital projects as tenure-worthy.

In closing, Sample considers the digital humanists, the makers and creators and builders and bricoleurs, as more than just those who create or use tools. To Sample, a digital humanist is someone who has “the opportunity to distribute knowledge more fairly, and in greater forms.”

Ultimately, digital humanists are not just making tools and techniques and theories; they are “making” a landscape where various kinds of knowledge are widely available.

Discussion Questions

At the very end of his article, Manovich raises an issue about privacy. He writes, “would you trust academic researchers to have all your communication and behavior data automatically captured?” Well, would you?

Do you agree with Ramsay’s assertions that digital humanists need to know how to code, or at the very least be “building” something?

Do you agree with Sample about the overarching goal of the digital humanities, to reproduce and share knowledge?  What are some other ways digital humanists can accomplish this?

A Note on my Prezi Presentation

Is prezi presentation redundant? At any rate, I designed my presentation (using the word design rather loosely) to resemble a web of circles and lines. Unoriginal, yes, but I felt that a web framework best expressed the interlocking questions and issues that I explored in my presentation. Considerations of who digital humanists are, what they do, and what digital humanities is in the first place are all linked together. I also hoped that my design would help to demonstrate the slightly messy and very complicated nature of these questions and issues in the digital humanities. Additionally, I chose to spread my presentation out in terms of space, to represent the wide-ranging nature of the questions posed throughout the presentation. In terms of colors, I opted to use splashes of primary colors (in the CMYK model, salient to both paper printing and computers) to represent the essential nature of these questions and issues to the field as a whole.
