I’m here to talk about boxes. Get excited.
I’ve been spending a LOT of time lately thinking about containers — fixing them, modelling them, figuring out what they are and aren’t supposed to do. And I’ve basically come to the conclusion that as a whole, we spend too much time futzing with containers because we haven’t spent enough time figuring out what they’re for and what they do.
For instance, I wrote a blog post a couple of months ago about work we’re doing to remediate stuff that should not but is happening with containers — barcodes being assigned to two different containers, two different container types with the same barcode/identifier information, etc. Considering the scale of our collections, the scale of these problems is mercifully slight, but these are the kinds of problems that turn into a crisis if a patron is expecting to find material in the box she ordered and the material simply isn’t there.
I’m also working with my colleagues here at Yale and our ArchivesSpace development vendor Hudson Molonglo to add functionality to ArchivesSpace so that it’s easier to work with containers as containers. I wrote a blog post about it on our ArchivesSpace blog. In short, we want to make it much easier to do stuff like assigning locations, assigning barcodes, indicating that container information has been exported to our ILS, etc. In order to do this, we need to know exactly how we want containers to relate to archival description and how they relate to each other.
As I’ve been doing this thinking about specific container issues, I’ve had some thoughts about containers in general. Here they are, in no particular order.
What are container numbers doing for us?
A container number is just a human-readable barcode, right? Something to uniquely identify a container? In other words, speaking in terms of the data model, isn’t this data that says something different but means the same thing? And is this possibly a point of vulnerability? At the end of the day, isn’t a container number something that we train users to care about when really they want the content they’ve identified? And we have a much better system for getting barcodes to uniquely identify something than we do with box numbers?
In the days that humans were putting box numbers on a call slip and another human was reading that and using that information to interpret shelf location, it made sense to ask the patron to be explicit about which containers were associated with the actual thing that they want to see. But I think that we’ve been too good at training them (and training ourselves) to think in terms of box numbers (and, internally, locations) instead of creating systems that do all of that on the back end. Information about containers should be uniform, unadorned, reliable, and interact seamlessly with data systems. Boxes should be stored wherever is best for their size and climate, and that should be tracked in a locations database that interacts with the requesting database. And the actual information should be associated seamlessly with containers.
This means that instead of writing down a call number and box number and reading a note about how materials of this type are stored on-site and materials of another type are stored off-site, let’s take a lot of human error out of this. Let’s let them just click on what they want to see. Then, the system says “a-ha! There are so many connections in my database! This record is in box 58704728702861, which is stored in C-29 Row 11, Bay 2, Shelf 2. I’ll send this to the queue that prints a call slip so a page can get that right away!” And instead of storing box numbers and folder numbers in the person’s “shopping cart” of what she’s seen, let’s store unique identifiers for the archival description, so that if that same record get’s re-housed into box 28704728702844 and moved to a different location, the patron doesn’t have to update her citation in any scholarly work she produces. Even if the collection gets re-processed, we could make sure that identifiers for stuff that’s truly the same persists.
Also, don’t tell me that box numbers do a good job of giving cues about order and scale. There are waaaaaayyyyy better ways of doing that than making people infer relationships based on how much material fits into 0.42 linear feet.
We have the concepts. Our practice needs to catch up, and our tools do too.
Darn it, Archivists’ Toolkit, you do some dumb things with containers
Archival management systems are, obviously, a huge step up from managing this kind of information in disparate documents and databases. But I think that we’re still a few years away from our systems meeting their potential. And I really think that folks who do deep thinking about archival description and standards development need to insert themselves into these conversations.
Here’s my favorite example. You know that thing where you’re doing description in AT and you want to associate a container with the records that you just described in a component? You know how it asks you what kind of an instance you want to create? That is not a thing. This is just part of the AT data model — there’s nothing like this in DACS, nothing like it in EAD. Actual archival standards are smart enough to not say very much about boxes because they’re boxes and who cares? When it exports to EAD, it serializes as @label. LABEL. The pinnacle of semantic nothingness!
This is not a thing.
Like, WHY? I can see that this could be the moment where AT is asking you “oh, hey, do you want to associate this with a physical container in a physical place or do you want to associate it with a digital object on teh interwebz?” but there’s probably a better way of doing this.
My problem with this is that it has resulted in A LOT of descriptive malpractice. Practitioners who aren’t familiar with how this serializes in EAD think that they’re describing the content (“oh yes! I’ve done the equivalent of assigning a form/genre term and declaring in a meaningful way that these are maps!”) when really they’ve put a label on the container. The container is not the stuff! If you want to describe the stuff, you do that somewhere else!
Oh my gosh, my exclamation point count is pretty high right now. I’ll see if I can pull myself together and soldier on.
Maybe we should be more explicit about container relationships.
Now, pop quiz, if you have something that is in the physical collection and has also been microfilmed, how do you indicate that?
In Archivists’ Toolkit, there’s nothing clear about this. You can associate more than one instance with an archival description, but you can also describe levels of containers that (ostensibly) describe the same stuff, but happen to be a numbered item within a folder, within a box.
Anything can happen here.
So this means that in the scenario I mentioned above, it often happens that someone will put the reel number into container 3, making the database think that the reel is a child of the box.
But even if all of the data entry happens properly, EAD import into Archivists’ Toolkit will take any three <container> tags and instead of making them siblings, brings the three together into parent-child instance relationship like you see above. This helps maintain relationships between boxes and folders, but is a nightmare if you have a reel in there.
EAD has a way of representing these relationships, but the AT EAD export doesn’t really even do that properly.
<c id="ref10" level="file">
<container id="cid342284" type="Box" label="Mixed Materials (39002038050457)">1</container>
<container parent="cid342284" type="Folder">2</container>
<c id="ref11" level="file">
<container id="cid342283" type="Box" label="Mixed Materials (39002038050457)">1</container>
<container parent="cid342283" type="Folder">3</container>
Here, we see that these box 1’s are the same — they have the same barcode (btw, see previous posts for help working out what to do with this crazy export and barcodes). But the container id makes it seem like these are two different things — they have two different container id’s and their folders refer two two different parents.
What we really want to say is “This box 1 is the same as the other box 1’s. It’s not the same as reel 22. Folder 2 is inside of box 1, and so is folder 3.” Once we get our systems to represent all of this, we can do much better automation, better reporting, and have a much more reliable sense of where our stuff is.
So if we want to be able to work with our containers as they actually are, we need to represent those properly in our technology. What should we be thinking about in our descriptive practice now that we’ve de-centered the box?
“Box” is not a level of description.
In ISAD(G) (explicitly) and DACS (implicitly), archivists are required to explain the level at which they’re describing aggregations of records. There isn’t a vocabulary for this, but traditionally, these levels include “collection”, “record group”, “series”, “file” and “item.” Note that “box” is not on this list or any other reasonable person’s list. I know everyone means well, and I would never discourage someone from processing materials in aggregate, but the term “box-level processing” is like nails on a chalkboard to me. As a concept, it should not be a thing. Now, series-level processing? Consider me on board! File-group processing? Awesome, sounds good! Do you want to break those file groups out into discrete groups of records that are often surrounded by a folder and hopefully are associated with distinctive terms, like proper nouns? Sure, if you think it will help and you don’t have anything better to do.
A box is usually just an accident of administravia. I truly believe that archivists’ value is our ability to discern and describe aggregations of records — that box is not a meaningful aggregation, and describing it as such gives a false impression of the importance of one linear foot of material. I’d really love to see a push toward better series-level or file-group-level description, and less file-level mapping, especially for organizations’ records. Often, unless someone is doing a known item search, there’s nothing distinct enough about individual files as evidence (and remember, this is why we do processing — to provide access to and explain records that give evidence of the past) to justify sub-dividing them. I also think that this could help us think past unnecessary sorting and related housekeeping — our job isn’t to make order from chaos*, it’s to explain records and their context of creation of use. If records were created chaotically and kept in a chaotic way, are we really illuminating anything by prescribing artificial order?
This kind of thinking will be increasingly important when our records aren’t tied to physical containers.
In conclusion, let’s leave the robot work to the robots.
If I never had to translate a call number to a shelf location again, it would be too soon (actually, we don’t do that at MSSA, but still). Let’s stop making our patrons care about boxes, and let’s start making our technology work for us.
* This blog’s title, Chaos –> Order, is not about bringing order to a chaotic past — it’s about bringing order to our repositories and to our work habits. In other words, get that beam out of your own eye, sucka, before you get your alphabetization on.