Structuring Your Data

Before you begin to preserve your data, you must make certain the data can be preserved.

Creating a lasting structure for your research data is key.

When creating a structure for your data to live in, first consider the types of data you expect to store and how that data will ultimately be used. What kind of data am I gathering? For example, am I only gathering text-based information, or do I need to store images as well? Will others need to access this data? If so, will the data be accessible and comprehensible to them? What kinds of questions might they ask of the data? The answers to these questions will help determine whether your data is best suited to a spreadsheet or a relational database.

Now that you’ve considered spreadsheets vs. databases to store your data, here are some of the software options you might consider:

When structuring data within a spreadsheet or database, it is important both to keep the data consistent (Data Hygiene) and to use open-source file formats so that, when exported for preservation, the data remains retrievable for the longest possible period. You will learn more about this in the next section, your Personal Digital Preservation Plan.
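As one illustration of exporting data to an open format, the sketch below copies a table out of a small SQLite database into CSV, a plain-text format that most tools can read. It uses only Python's standard library. The table name, column names, and sample rows are hypothetical and stand in for your own data, and the function name export_table_to_csv is invented for this example.

```python
import csv
import io
import sqlite3


def export_table_to_csv(connection, table, out_file):
    """Write every row of `table`, preceded by a header row, to `out_file`."""
    # Table names cannot be passed as SQL parameters, so only call this
    # function with a table name that you control.
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out_file)
    # cursor.description holds one entry per column; the name comes first.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical example data: a tiny in-memory database of sources.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE sources (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)
connection.executemany(
    "INSERT INTO sources (title, year) VALUES (?, ?)",
    [("Letter to the Editor", 1921), ("Town Council Minutes", 1921)],
)

# Export into an in-memory text buffer so the sketch leaves no files behind.
buffer = io.StringIO()
export_table_to_csv(connection, "sources", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a real file instead of a buffer, you could pass a file opened with open("sources.csv", "w", newline="", encoding="utf-8"), where the file name is again only an example.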


When you are working with your data set, validating and cleaning the data are important steps in verifying it before you publish it. Tools like OpenRefine allow you to compare your data and clean up spelling inconsistencies, fix date format inconsistencies, remove whitespace, combine or remove duplicates, split cells into multiple columns, and more. When working with images, Tropy is a helpful organizational tool that will assist in sorting and adding metadata to photographs of primary sources so that you can track your research within the broader context of your project. Tools of this kind are necessary when working with spreadsheets, whereas relational databases have many comparable checks built in.
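To make these cleaning steps concrete, here is a minimal sketch in Python of three of them: removing stray whitespace, bringing dates into one format, and dropping duplicates. It is not how OpenRefine works internally, only an illustration of the idea. The column names ("title" and "date"), the sample rows, and the list of recognized date layouts are all assumptions made for this example. In particular, the sketch assumes that slash-separated dates are written day first.

```python
import re
from datetime import datetime

# Date layouts this sketch knows how to read (an assumption; extend the
# list to match your own data). Slash dates are assumed to be day first.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalize_whitespace(value):
    """Trim both ends and collapse internal runs of whitespace to one space."""
    return re.sub(r"\s+", " ", value).strip()


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or the value unchanged if unrecognized."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value


def clean_rows(rows):
    """Clean every row, then drop rows that are exact duplicates once cleaned."""
    seen = set()
    cleaned = []
    for row in rows:
        tidy = {key: normalize_whitespace(text) for key, text in row.items()}
        tidy["date"] = normalize_date(tidy["date"])
        fingerprint = tuple(sorted(tidy.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(tidy)
    return cleaned


# Hypothetical sample rows; the first two describe the same source.
rows = [
    {"title": "  Letter to  the Editor ", "date": "04/03/1921"},
    {"title": "Letter to the Editor", "date": "1921-03-04"},
    {"title": "Town Council Minutes", "date": "05.03.1921"},
]
cleaned = clean_rows(rows)
for row in cleaned:
    print(row)
```

Note that this sketch only removes rows that match exactly after cleaning. Spelling variants such as "Editor" and "Edtior" would need fuzzy matching of the kind that dedicated tools provide.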


When thinking about the long-term preservation of your data, consider exporting or converting it into open-source formats if it does not already exist in such a format. Also consider what software others might need to open your files, and whether that software is likely to still be available in five years. Below is a table of commonly encountered file formats in humanities research and whether they are proprietary or open-source:

Let's Review!

Is my data better suited for a spreadsheet or a database? Why?

Am I storing my data in open-source formats? If not, do I have a plan to export or convert my files?

Has my data been validated and cleaned, or do I have a plan to do so on a regular basis during my research?