We recently received a large (220 record carton) addition to one of our collections of publishers’ records. The files were in good order in well labeled folders so it wasn’t a matter of having to do any complicated arrangement and description work with the records, and we definitely weren’t refoldering, just compiling a box list and integrating that box list into the existing finding aid.
One of our students went through the containers and created an inventory, but then we hit what is a pretty typical challenge with these sorts of accessions, and with additions to existing collections more generally. I had a box list, but the boxes themselves weren’t received in any particular order so while obviously the records had been stored alphabetically, the list I received (in a semi-structured Microsoft Word document) reflected that on the box level, but not as a whole. So I wanted to take that box list and alphabetize the entire list by file name (which, in this case since these are editorial files for a publisher, was also by the last name of the author). Complicating this was the fact that there were nested files within the document and I wanted to keep those files grouped with their parent files and only alphabetize by the parent component. Also, I wanted to be able to integrate this list with the existing EAD container list– ideally to insert the new files into the container list in proper alphabetical sequence.
Because part of my data (the existing EAD container list) was only available to me in xml, I first converted the Microsoft Word container list into EAD as well so that all of the data I was working with was at least in the same format. I then pasted all of the code together into one document.
Then I created a new open refine project by opening my xml document in Open Refine– when importing xml into OR it asks the user to click on the first element to load, so I skipped my dsc tag since I was working on the component level and selected my first <c> tag and then “create project” to get something that looks like this:
But then I need to scroll right, because I am dealing with hierarchical description with nested sub-components and I want to alphabetize by that top level hierarchy– what is represented here as the c-did-unittitle
To sort the list alphabetically a this point you need to click on the arrow at the top of that column, sort by text a-z
Voila, the file list is now sorted alphabetically, with the nested subcomponents grouped together as part of the record for the parent component.
Up at very top go to “sort” and preserve the reordering permanently so that when you export it is still alphabetized (you can also do a custom export that will allow you to only export certain columns and to reorder as necessary). Awesome, so now I have a spreadsheet that has taken all of my data, integrated it, and alphabetized it.
Now, however, I need to get it back into xml so that I can insert it into my container list. We’ve covered this ground before– Mo has a stellar tutorial for this spreadsheet to EAD thing, and another intrepid Chaos–>Order commenter (this blog is one of the few places where you actually should read the comments) brought to my attention this fantastic site, or you can export what you have into an Excel sheet and import it directly into Oxygen and massage from there.
So I converted my spreadsheet back into EAD for upload to our finding aids site. This solution didn’t end up being perfect–in all of the format conversions I lost some nesting of components that I had to go back and deal with manually. There was also some sloppy data entry that I didn’t catch before I started working through the data– but these are things I can solve either through better management or a bit more experimenting with conversion processes. Even though this wasn’t perfect, it was a fast solution for dealing with a very common and often troublesome scenario.