Getting into the guts of AT

It’s the thing that we keep saying — in order to deal with our masses of stuff better, we need better ways of understanding what we have. A lot of my questions aren’t just about what’s in our finding aids — they’re about the relationships between archival materials and other archival management functions — accessioning, digital object management, location management, container management. For instance, the following questions have come up in the past or could easily come up in the future:

  • Which collections are constituted of accessions that came in before 1980?
  • Which collections have digital objects associated with them? What are the URIs of those digital objects?
  • I have a barcode for a box. Can you tell me the materials that are supposed to be in that box? What collection is this from?
  • We haven’t used this location listed in the location table since 2005! Are there any boxes associated with that location? What are they?

In order to answer these questions, I need to write reports that join different tables in Archivists’ Toolkit together. And this is a little bit tricky, because in their own way, components in the AT database are hierarchical (just like in an EAD-encoded finding aid). If I have an instance (a container with a barcode), and I want to know which collection it belongs to, I don’t have a direct relationship in the database. Instead, an instance is associated with a component. That component is associated with its parent component. It may have a lot of ancestor components before the most high-level component is associated with the collection-level information in the resource table.

These relationships are made in sql through what are called “joins”. And joining a table on itself (in some cases several times, recursively) is a huge friggin pain in the neck. So, after mucking around for a little while, the solution was to just ask someone smarter than me how he would handle this.

This is where my colleague Steelsen comes in — Steelsen introduced the idea of writing a stored procedure that would look for the top-most component instead of having to do this through joins. And then he wrote them for me, because he is a mensch of the first order. His procedures are here, and available to anyone who might find them useful. They have seriously revolutionized the way that I’ve been able to do reporting and solve problems.

For instance, something that folks have been begging for is a barcode look-up tool — they have a barcode, and they want to know which collection it belongs to, what its call number is, which location it’s assigned to, and which components are associated with that box. So here’s what I wrote (the user indicates the barcode in the where statement):

use schema;
SELECT 
 CONCAT(r.resourceIdentifier1,
 ' ',
 LPAD(r.resourceIdentifier2, 4, '00')) 'Collection',
 r.title 'Collection Title',
 series.subdivisionIdentifier 'Series/Accession Number',
 series.title 'Series Title',
 rc.title 'Component Title',
 rc.dateExpression 'Component Date',
 adi.container1Type 'Container Type',
 adi.container1NumericIndicator BoxNum,
 adi.container1AlphaNumIndicator BoxAlpha,
 adi.container2NumericIndicator FolderNum,
 adi.container2AlphaNumIndicator FolderAlpha,
 adi.archDescriptionInstancesId InstanceID,
 adi.barcode Barcode,
 adi.userDefinedString1 'Voyager Info',
 loc.LocationId,
 loc.building,
 loc.room,
 loc.coordinate1NumericIndicator ShelfNum,
 loc.coordinate1AlphaNumIndicator ShelfAlpha
FROM
 ArchDescriptionInstances adi
 INNER JOIN
 ResourcesComponents rc ON adi.resourceComponentId = rc.resourceComponentId
 INNER JOIN
 LocationsTable loc ON adi.locationID = loc.locationID
 INNER JOIN
 Resources r ON r.resourceId = GETRESOURCEFROMCOMPONENT(rc.resourceComponentId)
 LEFT OUTER JOIN
 ResourcesComponents series ON GETTOPCOMPONENT(rc.resourceComponentId) = series.resourceComponentID
WHERE
 adi.barcode = 39002042658774;

Here I use two of Steelsen’s procedures. In GETRESOURCEFROMCOMPONENT, I go up the tree of a component to find out what resource it belongs to and join that to the resource. I use GETTOPCOMONENT to help figure out what series a component belongs to (this assumes that the top-most component is a series, but that’s usually a safe bet for us).

I’m a sql n00b, and this isn’t the most efficient query I’ve ever run, but I’m really happy with the results, which can be viewed in a spreadsheet here.

By changing the where statement, I can find out all kinds of associated information about a location, a collection, a box, whatever. I can find out if barcodes have been assigned to components with different box numbers; I can find out if components with the same barcode have been assigned to more than one location. This set of procedures has really been a godsend to help me know more about the problems I’m fixing. So many thanks to Steelsen. I hope others find them useful too.

Advertisements

Round-tripping EAD, updated — how to not lose those barcodes

Several weeks ago, Bonnie published a hugely-useful checklist of things to think about when trying to round-trip data from Archivists’ Toolkit to EAD and back to Archivists’ Toolkit to help minimize data loss.

Don’t lose those barcodes!

In a comment, my colleague Mark gave a very sensible suggestion for how to not lose barcodes during that trip. When you export an EAD from AT, the container information looks something like this:

<c id="ref21" level="file">
    <did>
        <unittitle>Notecards</unittitle>
        <container id="cid1063028" type="Box" label="Mixed Materials (39002042652603)">7</container>
        <unitdate normal="1930/1940" type="inclusive">circa 1935-1950</unitdate>
    </did>
</c>

What I see here is that two separate fields in the AT database have been jammed together for EAD export. “Mixed materials” lives in the ArchDescriptionInstances table in AT as instanceType. 39002042652603 lives in the ArchDescriptionInstances table as barcode. During export, these are brought together as container @label.

So, the trick is to get them back into the fields in tables where they belong. By default, AT imports container @label to ArchDescriptionInstances.instanceType. You want to get the barcode information into barcode. Here’s how.

I sometimes have bad luck getting AT to re-import if container @label=”Mixed Materials (39002042652603)” — I do a big find/replace to make that look like @label=”39002042652603″

Then, I import into AT.

Getting the barcodes where they belong is pretty easy, but it has to happen in SQL in the back-end database. If you haven’t written SQL before, you might want to run this by a database administrator, just to have another set of eyes on it. You DEFINITELY want to do this in the test or dev version of your database first.

Step 1: copy over barcodes from instanceType to barcode (this assumes 14-digit barcodes)

UPDATE ArchDescriptionInstances 
SET barcode = instanceType
WHERE instanceType REGEXP '^[0-9]{14}$';

Step 2: replace barcodes in instanceType with “Mixed materials” (this assumes that you want everything to be “Mixed Materials”. I have a rant in the works about how inappropriate it is to have this label on containers anyway, but I’ll save that for another day).

UPDATE ArchDescriptionInstances 
SET instanceType = 'Mixed materials'
WHERE instanceType REGEXP '^[0-9]{14}$';

Congratulations! You haven’t lost your barcodes! You can now delete the previous resource record and enjoy your new-and-improved one.

But that’s not all…

Extra super-secret tips for folks keeping Voyager item information in Archivists’ Toolkit.

So, where I work, this wouldn’t be enough. We have a whole other set of data in the ArchDescriptionInstances table that isn’t serialized in EAD. Container type, location code, Voyager bibid, container restriction and a Boolean of whether this information has been exported to Voyager are kept in the ArchDescriptionInstances table in user defined fields. We then have a background program that sends this information to Voyager, where holdings/item records are created.

This means that if I round-trip EAD, this item information that exists is disassociated with the descriptive information. This doesn’t work.

Note: If you’re also using the plug-in that Yale developed to keep Voyager holdings information in AT, could you please drop me a line? We’re starting to think about migration paths to ArchivesSpace.

But I had a brainwave — if I round-trip EAD with barcodes, I now have a key. The same information associated with a barcode which is now blank (what’s the Voyager bib/holdings ID? was it exported?), is present in a different record with the same barcode from when the information was entered before, before the EAD was round-tripped.

This means that no one has to re-enter the same information or re-export to Voyager. Check out my SQL update below — same warnings and caveats apply.

 

start transaction;

UPDATE schema.ArchDescriptionInstances AS ArchDesc

JOIN (
SELECT
barcode,
userDefinedString1,
userDefinedString2,
locationId,
userDefinedBoolean1,
userDefinedBoolean2
FROM schema.ArchDescriptionInstances
WHERE userDefinedString1 = "{{{bibid}}}"
) AS newdata ON newdata.barcode=ArchDesc.barcode
SET
ArchDesc.userDefinedString1 = newdata.UserDefinedString1,
ArchDesc.userDefinedString2 = newdata.UserDefinedString2,
ArchDesc.locationId = newdata.locationId,
ArchDesc.userDefinedBoolean1 = newdata.userDefinedBoolean1,
ArchDesc.userDefinedBoolean2 = newdata.userDefinedBoolean2

WHERE
ArchDesc.userDefinedString1 = ""
AND ArchDesc.userDefinedString2 = "";

commit;