Conceptualizing Humanities Research as Data

The Metropolitan Museum of Art, New York

When you work in humanities fields such as art history, history, cultural heritage, or literature, to name just a few, it can be hard to see how typical research processes that rely on historical text-based or visual sources can be viewed as data, or why they should be.


To illustrate how, we first need to define some key terms.


In this module we will define data in two ways:

1. as informational parts used as a basis for reasoning, discussion, and critical thought.

2. as information that can be transmitted and processed.

Each definition informs the other. To think of humanities research as data by these definitions, the research has to be broken into parts. The process of classifying information so that it can be used critically (1) and analyzed or visualized (2) is also the process of applying an ontology.



The ontology is applied at the beginning of any research to keep the project’s scope and goal focused. In practice, this means defining the topic of the research and the elements for analysis. The structure below draws on Dr. Ashley Sanders’ work in the digital humanities.


First, define the unit of analysis, or the broad type of subject being studied.

Next, define the population, a specific group within the unit of analysis.

Then determine each case or instance making up the population that will be analyzed.

Last, define the finer variables, or attributes, that determine what is pulled from each case or instance. This level is the most recognizable as research, and it determines what can be analyzed, processed, and visualized from the research.


Within each variable are class definitions, or the possible representations of that variable. Each one should be unique and should represent the source as closely as possible. Staying this close to the sources can produce different descriptions that mean the same thing, or names that have changed over time. As your research progresses, class definitions can be cleaned and simplified, or expanded into new attributes.
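Cleaning class definitions can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the variable ("color"), the raw values, and the lookup table are all invented for the example, and the names raw_colors, CLEANED, and clean_class are hypothetical.

```python
from collections import Counter

# Hypothetical raw class definitions for a "color" variable,
# written down exactly as they appeared in different sources.
raw_colors = ["cobalt blue", "Cobalt Blue", "blue (cobalt)", "turquoise", "Turquoise "]

# Hypothetical lookup that maps each variant to one cleaned class definition.
CLEANED = {
    "cobalt blue": "cobalt blue",
    "blue (cobalt)": "cobalt blue",
    "turquoise": "turquoise",
}

def clean_class(raw: str) -> str:
    """Return the cleaned class definition, or the tidied raw value if unmapped."""
    key = raw.strip().lower()
    return CLEANED.get(key, key)

# Keep the raw value next to the cleaned one so the source wording is not lost.
pairs = [(raw, clean_class(raw)) for raw in raw_colors]

# Count how many cases fall under each cleaned class definition.
counts = Counter(cleaned for _, cleaned in pairs)
print(counts)
```

Keeping the raw wording beside the cleaned value is a deliberate choice here: the cleaned column supports counting and visualization, while the raw column preserves how each source actually described the object.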

To better visualize these terms, imagine you want to research the development of motifs within Iznik pottery between the 15th and 17th centuries.

To structure the data generated from sources, the unit of analysis is first defined as ceramic pottery.

The population is Iznik pottery from the 15th, 16th, and 17th centuries.

Each case or instance would be a specific source, such as the dish depicting two birds among flowering plants from the Metropolitan Museum of Art.

Variables and attributes to support the final project would be:

Type of work

Date of production

Place of origin

Body

Glaze

Color

Motif
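One way to picture how the four levels nest is the Python sketch below. It is only an illustration under stated assumptions: the class names (Case, Population, UnitOfAnalysis) and the snake_case variable names are invented here, and only the values that can be read from the dish's title are filled in. Everything else is left empty rather than guessed.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """One source, described by variables (variable name -> recorded value)."""
    label: str
    variables: dict = field(default_factory=dict)

@dataclass
class Population:
    """A specific group within the unit of analysis."""
    description: str
    cases: list = field(default_factory=list)

@dataclass
class UnitOfAnalysis:
    """The broad type of subject being studied."""
    name: str
    populations: list = field(default_factory=list)

# The seven variables from the list above, as hypothetical field names.
VARIABLES = [
    "type_of_work", "date_of_production", "place_of_origin",
    "body", "glaze", "color", "motif",
]

dish = Case(
    label="Dish depicting two birds among flowering plants (Metropolitan Museum of Art)",
    # Every variable starts as None until it is recorded from the source.
    variables=dict.fromkeys(VARIABLES),
)
dish.variables["type_of_work"] = "dish"                  # read from the title
dish.variables["motif"] = ["birds", "flowering plants"]  # read from the title

iznik = Population("Iznik pottery, 15th to 17th century", cases=[dish])
pottery = UnitOfAnalysis("ceramic pottery", populations=[iznik])
```

Leaving unrecorded variables as None, instead of omitting them, keeps every case the same shape and makes it visible which attributes still need to be researched.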


Class definitions for "place of origin" would be Anatolia or Iznik. As sources were examined, however, “place of origin” became more complex, reflecting a history of excavation and misattribution. Better variables became “location found”, “place of production”, and “attributed location”.
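Expanding one variable into several can be sketched as a small transformation on a record. The function name split_place_of_origin and the field names are hypothetical, and the sketch makes one loud assumption: the old value is carried over as the attributed location, because it has not yet been verified as either a find spot or a place of production. A real project might decide this differently, source by source.

```python
def split_place_of_origin(record: dict) -> dict:
    """Replace 'place_of_origin' with three narrower variables.

    Assumption: the old value is kept as the attributed location, since it
    has not yet been verified as a find spot or a place of production.
    """
    # Copy every variable except the one being replaced.
    updated = {k: v for k, v in record.items() if k != "place_of_origin"}
    updated["location_found"] = None       # unknown until checked against sources
    updated["place_of_production"] = None  # unknown until checked against sources
    updated["attributed_location"] = record.get("place_of_origin")
    return updated

# Illustrative record using one of the class definitions named above.
record = {"type_of_work": "dish", "place_of_origin": "Iznik"}
refined = split_place_of_origin(record)
print(refined)
```

The original record is left unchanged, so the earlier, coarser description remains available for comparison.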

As you take a closer look at the basic example provided, some issues and criticisms may arise.

Thinking of humanities research in this way often meets the assumption that the nuanced and complex subjects and relationships being represented must be simplified and generalized to fit into tabular, structured categories meant for single values. Adapting practices that were formulated for scientific and mathematical processes is difficult, and there is a lot of work to be done. As you move through this module, however, you will see that there are customized tools and methods for viewing humanities research in this way, as well as benefits to doing so.

To read more about humanist data visualization, see Johanna Drucker’s article on “Humanities Approaches to Visual Display.”

It is important to remember that applying an ontology always adds a lens, a bias, and a perspective to the research being gathered. Individual fields will sometimes have their own standards for these classifications and descriptions to ensure best practices, collaboration, and accessibility, but this does not mean those standards should go unquestioned or unacknowledged.




So why do this at all?

Practices for compiling, constructing, and classifying sources for research are more common, and have more established models, in the social sciences and natural sciences.

Establishing and sharing these practices in the humanities allows for transparency and deeper critical thought about how research is conducted. It brings hidden biases and miscalculations into view and invites further evaluation of historical description.

It also makes it easier to visualize gaps and missing information, to provide more support for hypotheses and so prevent overgeneralizations, and to build collaborative, reusable historical datasets across disciplines.
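As a rough illustration of how structured records make gaps visible, the Python sketch below counts the missing values for each variable. The three records and the function name missing_by_variable are invented for the example and do not describe real objects.

```python
# Hypothetical records; None marks a value not yet found in the sources.
records = [
    {"type_of_work": "dish", "glaze": None, "motif": "birds"},
    {"type_of_work": "tile", "glaze": None, "motif": None},
    {"type_of_work": "jug", "glaze": "transparent", "motif": "tulips"},
]

def missing_by_variable(rows):
    """Count, for each variable, how many records have no value recorded."""
    counts = {}
    for row in rows:
        for variable, value in row.items():
            counts.setdefault(variable, 0)  # list every variable, even if complete
            if value is None:
                counts[variable] += 1
    return counts

gaps = missing_by_variable(records)
for variable, n in gaps.items():
    print(f"{variable}: {n} of {len(records)} missing")
```

A summary like this shows at a glance which variables are too thinly recorded to support a claim about the whole population.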


Let's Review!

Take 5 minutes to think of definitions using this ontology pertaining to your own field of study or interests.

From your own studies, what is a unit of analysis?

A population?

A case or instance?

Variables or attributes, and class definitions?