Sniffing out problems — bad barcode information

Like many large repositories, we barcode all of our containers and keep them offsite. This is a pretty straightforward and on the whole error-resistant process, where an archivist uses a plug-in to Archivists’ Toolkit to select a container and then wands in a barcode to associate it with that container. That barcode then gets stored in an item record in our ILS, in the off-site storage system’s database, and on a barcode sheet. It’s also serialized as part of the EAD. When a patron wants a box, a series of systems and people  (including Aeon, our ILS, and our off-site storage workers) work behind the scenes to bring it here within two business days. It’s pretty magical, and it’s how we can provide a high quality of service to our patrons.

However, there are a few less fool-proof methods by which a barcode could be introduced to a database. A barcode could be entered as part of rapid data entry. Since container information is entered as part of each component, it’s possible to accidentally mistype the container number but wand the same barcode. It’s also possible for barcodes to be entered as part of EAD import (although they don’t make it easy) or as a SQL update.

We also have an extra legacy problem — a bad update (I don’t know if it was SQL or XSL) assigned barcodes based on container number, but didn’t check for container type. So, reel 1 was assigned the same barcode as box 1.

The diagnosis is pretty straightforward (although I attempted it a zillion much-harder ways first). As always, many many thanks to my colleague Mark, the brains behind this operation, who actually wrote this SQL query after seeing me flail about.

This query basically asks — okay. Give me a list of all of your distinct barcodes and container numbers. Do you have any barcodes that are associated with more than one container number? If so, that’s a huge problem! Report back that barcode!

Barcode 987654321 should only be associated with box 1 — if it’s associated with both box 1 AND box 2, that’s a real problem. It means either that somehow the barcode was duplicated (which we go to GREAT PAINS to prevent happening) or that there was a data entry error.

You can run this query against your own AT database to see if there were any barcodes assigned to more than one box.

select barcode,
group_concat(distinct coalesce(container1NumericIndicator, container1AlphaNumIndicator)) from ArchDescriptionInstances where barcode <> ''
group by barcode
having count(barcode) > 1
and count(distinct coalesce(container1NumericIndicator, container1AlphaNumIndicator)) > 1

Maybe you have a healthier relationship with the small stuff than I do, but this is the kind of thing that maintains a constant low-level stress buzz in the back of my head. I find it reassuring that we can run these kinds of reports/checks to know for sure that collection control is working the way it should.

Oh, and by the way, the diagnosis query for my reel/box problem is here:

SELECT distinct barcode, container1Type FROM
) as tt
group by tt.barcode
having count(tt.barcode) > 1

I hope that someone out there finds this useful!

Sorting, Adding, and Integrating Container Lists

We recently received a large (220 record carton) addition to one of our collections of publishers’ records. The files were in good order in well labeled folders so it wasn’t a matter of having to do any complicated arrangement and description work with the records, and we definitely weren’t refoldering, just compiling a box list and integrating that box list into the existing finding aid.

One of our students went through the containers and created an inventory, but then we hit what is a pretty typical challenge with these sorts of accessions, and with additions to existing collections more generally. I had a box list, but the boxes themselves weren’t received in any particular order so while obviously the records had been stored alphabetically, the list I received (in a semi-structured Microsoft Word document) reflected that on the box level, but not as a whole. So I wanted to take that box list and alphabetize the entire list by file name (which, in this case since these are editorial files for a publisher, was also by the last name of the author). Complicating this was the fact that there were nested files within the document and I wanted to keep those files grouped with their parent files and only alphabetize by the parent component. Also, I wanted to be able to integrate this list with the existing EAD container list– ideally to insert the new files into the container list in proper alphabetical sequence.

Because part of my data (the existing EAD container list) was only available to me in xml, I first converted the Microsoft Word container list into EAD as well so that all of the data I was working with was at least in the same format. I then pasted all of the code together into one document.

Then I created a new open refine project by opening my xml document in Open Refine– when importing xml into OR it asks the user to click on the first element to load, so I skipped my dsc tag since I was working on the component level and selected my first <c> tag and then “create project” to get something that looks like this:



But then I need to scroll right, because I am dealing with hierarchical description with nested sub-components and I want to alphabetize by that top level hierarchy– what is represented here as the c-did-unittitle


To sort the list alphabetically a this point you need to click on the arrow at the top of that column, sort by text a-z

Voila, the file list is now sorted alphabetically, with the nested subcomponents grouped together as part of the record for the parent component.

Up at very top go to “sort” and preserve the reordering permanently so that when you export it is still alphabetized (you can also do a custom export that will allow you to only export certain columns and to reorder as necessary). Awesome, so now I have a spreadsheet that has taken all of my data, integrated it, and alphabetized it.

Now, however, I need to get it back into xml so that I can insert it into my container list. We’ve covered this ground before– Mo has a stellar tutorial for this spreadsheet to EAD thing, and another intrepid Chaos–>Order commenter (this blog is one of the few places where you actually should read the comments) brought to my attention this fantastic site, or you can export what you have into an Excel sheet and import it directly into Oxygen and massage from there.

So I converted my spreadsheet back into EAD for upload to our finding aids site.  This solution didn’t end up being perfect–in all of the format conversions I lost some nesting of components that I had to go back and deal with manually. There was also some sloppy data entry that I didn’t catch before I started working through the data– but these are things I can solve either through better management or a bit more experimenting with conversion processes. Even though this wasn’t perfect, it was a fast solution for dealing with a very common and often troublesome scenario.