There are a lot of reasons to turn a spreadsheet into a finding aid. Maybe your donor sent you an inventory of a new accession in Excel, and you want to modify and re-use that data. Maybe you’ve determined that Excel has data entry features that Archivists’ Toolkit doesn’t have, and you want to take advantage of those. Maybe it’s the good in-between step between a legacy Word Document and structured data.
At Tamiment, we have an MS word inventory for the National Guardian’s photographic prints that has never been a finding aid. And looking at the inventory, we noticed that there were a few things to clean up — none of the prints had date information, the boxes numbering was confusing (A and B boxes), and the same folder title was often repeated as many as ten times instead of adding an extent statement to indicate that there are ten folders. There had also been a project many years ago to label archival folders with information about these prints, but the prints were never re-foldered.
This became a good back-burner project for our awesome graduate student, Giana. And since I get a lot of questions about how to transform this kind of data, let’s go through all of the steps together. My documents are all on GitHub, so you should feel free to use or steal or modify what I’ve made available (which, indeed, I used and stole and modified from others).
First, I copied the text from MS Word to Excel (actually, Google Spreadsheets), used filters to delete blank rows, and created a new column for box numbers, which I pulled down so that there is a box number on each line. I then added column headers for Date Start, Date End, New Box Number (I knew that boxes might swell because of re-foldering, and I wanted to get rid of 14A, etc. and make sure that all box numbers are whole numbers), Folder Number, Extent, and Notes (just in case).
Then, Giana did the real work. She re-foldered prints, verified folder names, copied date information, deleted repeating rows and added extent information, and made sure that everything was more-or-less right. Her result looked like this:
This looked pretty good, although there were one or two things to fix — we deleted empty folders from the spreadsheet and threw away the physical folder, too. I also looked through any of her notes, and did some clean-up. Finally, we saw that there were some missing materials that our former director has pulled from the file and never returned — I removed them from the finding aid, but I’ll add a note about them to the scope and contents note at the end. Hopefully, they’ll turn up someday. I also went through and did general proofreading — although Giana had already done a really good first pass of this.
The last thing I did with the spreadsheet was update the column headings to map to EAD fields.
Next, I import this spreadsheet into Oxygen. Did you know that you can do this? It’s here:
I set my import preferences so that the column headers are used as tag names, and I can elect to not include some columns.
Now that I have my data in xml, I use an xslt to turn this into an EAD container list.
If you’re following along at home, the first thing you want to do is look at the xslt and get comfy with it. It’s basically working like a good, old-fashioned mail merge in Word/Excel — it’s asking whether elements are present, then giving commands to pull in that element if it is. I’ve gone through and added the text “folders” after the extent statement, for instance, and I’ve modified unitdate to conform with my column names.
Next, I set up my transformation in Oxygen (the little wrench icon). I give the transformation a name, and tell it that I want it to apply this file to whatever xml I’m working on. I use the Saxon-PE engine, and in the output tab I tell it to open the transformed file in the editor.
Since my XSLT is kind of dumb (it doesn’t deal with one folder vs several folders, or a situation where there might not be an end date) I have to do some mass find-and-replaces. But for this project, it’s easier doing this clean-up work than writing the logic into the XSLT.
Finally, I paste the file list into the exported EAD from the toolkit that has collection-level information, and re-import. Voila!