[box type="warning"] I haven't written many English articles in the past few years, so bear with me until I'm back to my old self. [/box]
This is another article about the Herrenhausen Digital Humanities Conference. There is one talk I can't get out of my head – partly because I don't want to. Lev Manovich's approach of taking messy data to find patterns was inspiring. He said we shouldn't concentrate on metadata, and that we should leave our expectations behind to get a clear view of our object of study. With that, he addressed something that bugs me about the (digital) humanities. Our usual approach is to look at something, phrase a hypothesis, and then verify it. The actual research is concentrated in step three.
This is the right approach for a lot of things, and I won't question basic academic methods here. But we're limiting what we can find by what we are looking for. We are expecting something when we test an often very narrow hypothesis. We aren't observing cultural phenomena – mostly because we couldn't. The empirical work needed to study global phenomena is staggering, and we would only learn about what we asked for. Again. But with huge amounts of data produced every day by a lot of people, it gets easier. For me, the most important thing is that nobody produced this information for science. It's as real as it gets. Now we can take a huge chunk of it, look at it, and see what it tells us.
There is a catch, though. All the data we get is shaped and filtered. When we speak of big data, it's mostly Unicode-based text written in English. I know that's a little exaggerated, but you catch my drift. Most researchers like to have well-structured, clean metadata, which is great for a lot of things. I still remember how the temperature in the conference room dropped when Manovich told us to forget metadata and use messy data. In my opinion, that's a great idea, but not everyone agreed. I like that he uses pictures, because they aren't language- or culture-specific – at least less so than written text.
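To make the metadata-free idea concrete, here is a minimal sketch of what such an exploration could look like. Everything in it is an assumption for illustration: it is not Manovich's actual pipeline, and it uses randomly generated grayscale "images" as stand-in data. The point is only that each image is reduced to a pixel statistic and grouped by it, with no captions, tags, or dates involved anywhere.

```python
import random

random.seed(42)

def mean_brightness(pixels):
    """Average luminance of a list of (r, g, b) tuples, scale 0..255."""
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels) / len(pixels)

# Stand-in data: 100 fake "images", each 256 random grayscale pixels.
images = [[(random.randrange(256),) * 3 for _ in range(256)] for _ in range(100)]

# Group the images into dark / mid / bright purely by their pixel content --
# no metadata of any kind is consulted.
bins = {"dark": [], "mid": [], "bright": []}
for i, img in enumerate(images):
    b = mean_brightness(img)
    key = "dark" if b < 85 else "mid" if b < 170 else "bright"
    bins[key].append(i)

for key, members in bins.items():
    print(key, len(members))
```

With real photos you would swap the random pixels for decoded image data, but the shape of the exploration stays the same: look first, and only afterwards ask what the emerging groups might mean.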
But still, even his data isn't as messy as it could be. Every time we look at specific things, we choose to do it – even if it's a huge amount of things. We always expect something. How do we circumvent that? How can we look at something with the most open mind? How do we empty the cup, as Manovich asked?
@ChaosPhoenx great formulation: "If your cup of knowledge is full, you can't learn anything new. You've got to empty it"
— manovich (@manovich) December 6, 2013
I don't know either. Though if I could, I would try a few things. One is to split the roles of researcher and observer. In a project like Manovich's Instagram project, the researchers could choose the data set people look at, but the actual observations could be made by a second group. You could even compare these to observations made by the researchers themselves. This would probably lead to meta-patterns resulting from the cultural and educational background of the observers, which would either be interesting or destroy the approach completely. But as one of the attendees noticed, Manovich's patterns change drastically when you're colorblind. It gets messy, yes. But I don't think that's bad per se; it enables you to see new things.
The second approach is complicated, and I don't know if it's more promising. You can use two million Instagram photos, which is already a huge dataset to begin with. But it's still pretty narrow compared to what we produce every day: just photos, uploaded to one specific service. What would happen if you didn't aim for big data, but for the biggest data? Noisy, messy data? Maybe a snapshot of the whole internet over a week, using big data as the humanities' shotgun.
— V Mayer-Schönberger (@Viktor_MS) December 5, 2013
It would still be a narrow, biased approach, but the sheer amount of data might blow our minds wide open. We could just look at it and see if it tells us something. Then we could still do things like compare culture-specific reactions to certain events and see how they impact media generation. This kind of observation would be more like walking through the city with open eyes and ears, noticing everything. Yes, it's still a metadata-filtered approach, but we would combine different types of media coming from all places. The biggest flaw is that we might find patterns that aren't actually there. It might get so messy that we're just seeing noise.
If anyone is testing new approaches or doing something like Manovich, I'm in.