Tutorial — How to Turn a Spreadsheet into the Contents List of an EAD-Encoded Finding Aid

There are a lot of reasons to turn a spreadsheet into a finding aid. Maybe your donor sent you an inventory of a new accession in Excel, and you want to modify and re-use that data. Maybe you’ve determined that Excel has data entry features that Archivists’ Toolkit doesn’t have, and you want to take advantage of those. Maybe it’s the good in-between step between a legacy Word Document and structured data.

At Tamiment, we have an MS word inventory for the National Guardian’s photographic prints that has never been a finding aid. And looking at the inventory, we noticed that there were a few things to clean up — none of the prints had date information, the boxes numbering was confusing (A and B boxes), and the same folder title was often repeated as many as ten times instead of adding an extent statement to indicate that there are ten folders. There had also been a project many years ago to label archival folders with information about these prints, but the prints were never re-foldered.

This became a good back-burner project for our awesome graduate student, Giana. And since I get a lot of questions about how to transform this kind of data, let’s go through all of the steps together. My documents are all on GitHub, so you should feel free to use or steal or modify what I’ve made available (which, indeed, I used and stole and modified from others).

First, I copied the text from MS Word to Excel (actually, Google Spreadsheets), used filters to delete blank rows, and created a new column for box numbers, which I pulled down so that there is a box number on each line. I then added column headers for Date Start, Date End, New Box Number (I knew that boxes might swell because of re-foldering, and I wanted to get rid of 14A, etc. and make sure that all box numbers are whole numbers), Folder Number, Extent, and Notes (just in case).

Then, Giana did the real work. She re-foldered prints, verified folder names, copied date information, deleted repeating rows and added extent information, and made sure that everything was more-or-less right. Her result looked like this:

National Guardian -- Spreadsheet

National Guardian — Spreadsheet

This looked pretty good, although there were one or two things to fix — we deleted empty folders from the spreadsheet and threw away the physical folder, too. I also looked through any of her notes, and did some clean-up. Finally, we saw that there were some missing materials that our former director has pulled from the file and never returned — I removed them from the finding aid, but I’ll add a note about them to the scope and contents note at the end. Hopefully, they’ll turn up someday. I also went through and did general proofreading — although Giana had already done a really good first pass of this.

The last thing I did with the spreadsheet was update the column headings to map to EAD fields.

Next, I import this spreadsheet into Oxygen. Did you know that you can do this? It’s here:

Importing Excel into Oxygen

Importing Excel into Oxygen

I set my import preferences so that the column headers are used as tag names, and I can elect to not include some columns.

Settings

Now that I have my data in xml, I use an xslt to turn this into an EAD container list.

If you’re following along at home, the first thing you want to do is look at the xslt and get comfy with it. It’s basically working like a good, old-fashioned mail merge in Word/Excel — it’s asking whether elements are present, then giving commands to pull in that element if it is. I’ve gone through and added the text “folders” after the extent statement, for instance, and I’ve modified unitdate to conform with my column names.

XSLT

Next, I set up my transformation in Oxygen (the little wrench icon). I give the transformation a name, and tell it that I want it to apply this file to whatever xml I’m working on. I use the Saxon-PE engine, and in the output tab I tell it to open the transformed file in the editor.

Transformation

Since my XSLT is kind of dumb (it doesn’t deal with one folder vs several folders, or a situation where there might not be an end date) I have to do some mass find-and-replaces. But for this project, it’s easier doing this clean-up work than writing the logic into the XSLT.

find-replace

Finally, I paste the file list into the exported EAD from the toolkit that has collection-level information, and re-import. Voila!

AT Record

Advertisements

10 thoughts on “Tutorial — How to Turn a Spreadsheet into the Contents List of an EAD-Encoded Finding Aid

  1. Thanks, Jackie! None of this is cutting-edge, but it’s useful to know.

  2. Great post. You might also take a look at Steady, a web application built by Jason Ronallo at NC State that converts .csv files to EAD.

    http://steady2.herokuapp.com/

    I use it all the time for loading massive container lists into AT. There are some limitations, but it fits 90% of use cases

    • Thanks so much for this, Noah! This totally transformed a couple of my legacy descriptive projects– great tool!

      • Hi Carrie,
        I wonder if you’ve used Steady to import into ArchivesSpace? I’ve been able to successfully do this, but if I transfer the files to another resource record, I can’t add any additional files.

        Best,

        Beth

  3. A bit more clunky, but Northwest Digital Archives (NWDA) has instructions using the mail merge function in Excel.

    https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&cad=rja&uact=8&ved=0CDgQFjAC&url=https%3A%2F%2Fwww.orbiscascade.org%2Ffile_viewer.php%3Fid%3D1479&ei=cqUgU6PaFuf20gHSmIDYDw&usg=AFQjCNEqnXnGFJuN1Uuz4ziCcMpxSGOOvw&sig2=Van6rTUKIv2zkf_VrvvWmw&bvm=bv.62788935,d.dmQ

    Goes pretty fast once it’s set up. I’m sure people would be happy to share the template and related documents for those interested.

  4. Pingback: Converting spreadsheets to word documents: A walkthrough | Documenting the Humanities

  5. Pingback: Chaos —> Order | Exporting, Editing, and Importing EAD in Archivists’ Toolkit: A Checklist

  6. Pingback: Chaos —> Order | Sorting, Adding, and Integrating Container Lists

  7. Pingback: Tutorial — How to Turn a Spreadsheet into the Contents List of an EAD-Encoded Finding Aid | Library Workflow Exchange

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s