Processing Levels: The Hows and Whys

It’s no surprise to anyone who has been reading this blog that I am a firm believer in building a processing program that relies heavily on minimal processing techniques, including developing a program that applies different levels of processing to collections, or parts of collections. Describing our collections is one of the most important things that we as archivists do, and also one of the most time-consuming and expensive. We want to make sure that our time and intellectual capital are being well spent, and I firmly believe that the thoughtful, intentional application of processing levels is a really important way to ensure that. This leads to more accessible collections, encourages collection-level thinking and analysis, and frees up archivists’ time to work on big, strategic projects.

Standards like DACS encourage this kind of collection-level thinking and support different levels of arrangement and description, but there’s not a lot of advice out there about how and when to apply each of these levels (though the University of California’s Efficient Processing Guidelines, as always, does a great job of this). How do we decide if a collection is a good candidate to describe at the collection level versus the file level? And what principles guide these decisions? Here I’ll share some thoughts on the principles I’ve used to build a levels-based processing program, and then some criteria I use when making decisions about an individual collection.

Thinking Programmatically

Put Your Thinking Cap On:

Start by analyzing the records (their content and context) at a high level. What does the collection as a whole tell us? What are the major pieces of the collection and how do they relate to each other? Why does that matter? How can we best expose all of this to researchers? I’m not gonna lie: this is hard. This is way harder than making a container list. However, it brings a lot more value to the table. Archivists are trained to understand the ways that records are created, and to assess their potential value as evidence, information, and/or symbols. By doing this higher-level intellectual work at the outset, we can often create very robust and meaningful description that exposes how the parts of the collection come together as a whole and how they function, without doing a significant amount of work in the collection itself.

Define Terms and Build Consensus:

Be clear about what you mean by a level of processing. It is critical that all stakeholders (archivists, curators, research services staff, donors) are on the same page about what it means for a collection to be arranged and described at a certain level. This means defining and documenting these terms and circulating them widely throughout the organization. It also means being clear about the amount of time and money required to arrange and describe collections to different levels.

It’s also very important to involve institutional stakeholders in your decision-making process. Assessing stakeholder needs and managing expectations from the outset helps to ensure that processing projects go smoothly and end with happy stakeholders. In my institution this generally means that the archivists work with curators to make sure that we understand the value and needs of the collection, that they understand what I mean by a certain level of description, and that I can clearly communicate how more detailed processing of one collection impacts the time and resources available to address other projects that individual has a stake in.

Always Look For the Golden Minimum:

I always approach assigning processing levels by determining what the goals for a collection are (set in conjunction with stakeholders!) and what path provides the lowest set of barriers to getting there. Greene and Meissner call this sweet spot, where you meet your stated needs with the lowest investment of time and resources, the golden minimum, and this should be the goal of all processing projects.

Processing is Iterative:

This is huge for me. I go back and tweak description ALL THE TIME. Sometimes I’ve straight up misjudged what amount of description was necessary to facilitate discovery; sometimes research interests and needs change and the old levels of arrangement or description don’t cut it anymore. Your needs change and evolve, researchers’ needs change and evolve over time, the institutional context changes, and sometimes you realize that something, for whatever reason, just isn’t working. You always have the option to go back into a collection and do more. You never, however, have the ability to recapture the time that you spent on a collection up front, so be thoughtful about how you apply that time to best meet the needs of the institution, the researchers, you and your colleagues, and the collection itself.

Arrangement and Description are Not the Same Thing:

And they don’t need to happen at the same level as each other, or at the same level across all parts of a collection. A collection can be arranged at the series level and described at the file level. Or it can be completely unarranged but described at the series level. By separating these two aspects of processing, we gain more flexibility in how we approach collections and make them available, and we can be more efficient and effective in managing each individual collection and serving our users.

Discovery and Access are Key:

At the end of the day, the main things to keep in mind when determining the most appropriate processing level are discovery and access. The main goal of any processing project is to give users enough information in our description to both identify the material they are most interested in, and to be able to put their hands on it. How much description is necessary to find relevant material in a collection? What do you need to know to successfully retrieve that relevant box?

Making Decisions at the Collection Level

Now that we know why we’re doing this, and what principles are guiding the application of processing levels, here are some criteria that I use to determine what the most appropriate levels of arrangement and description for a collection are:

  • Research Value and Use: If a collection has a high research value and you anticipate heavy and sustained use, it may well be worthwhile to invest additional time and resources into it at the outset. This is especially true if the collection is not well ordered or is difficult to access.
  • Institutional Priorities: While I tend to default toward more minimal processing most of the time, there are plenty of internal factors that may encourage me to apply more detailed levels of processing. Whether a collection is a flagship in an area where we are trying to build collections, whether material from it is going to be featured in an exhibition, how much staff time needs to be devoted to other projects, how administrators allocate resources: all of these may affect processing decisions.
  • Restrictions: If a collection has significant access or use restrictions, or if there is a high likelihood that there are materials in the collection that would fall under legal protections such as FERPA or HIPAA (especially if these items are scattered throughout the collection) you will need to arrange the collection at a more granular level to ensure that you are doing your due diligence to meet your legal obligations.
  • Material Type and Original (Dis)Order: The nature of a collection and the state in which a repository receives it will also, to some extent, determine the amount of archival intervention it needs to be usable. If a collection arrives foldered but entirely without a discernible order, it may require at least series-level sorting to enable a researcher to navigate the collection and locate material of interest. Likewise, a collection that arrives unfoldered or without any organization will require more granular processing in order to provide meaningful access. If the material is fairly uniform, a collection-level description will probably suffice. In general, the greater the diversity of the collection, the more description is required to render it intelligible.
  • Size: I try not to make too many blanket decisions based solely on the size of a collection, but it can be a factor in determining processing levels. A collection that is only one box will not need a tremendous amount of description beyond the collection level because a researcher will only need to request one box to see the entirety of material available—tons of additional description is not going to aid in locating material. Conversely, a collection where one series spans hundreds of boxes will need additional file level description so that a user can isolate and access the part of that series that he or she needs.

These are some of the things that I take into consideration in my role as a manager at an academic special collections library. Other types of repositories and institutional contexts may well have other needs and different criteria. Feel free to add or expand in the comments!

Access Restrictions that Promote Access

Access restrictions, if done well, are tools for ensuring that as much information as possible is made available as broadly as possible while still respecting and adhering to individual privacy, corporate confidentiality, legal requirements, cultural sensitivities, and agreements. In order to promote access, rather than present unnecessary barriers to it, restrictions on the availability of archival materials for research should follow these principles:

  • They should be as broad as necessary to be practicable, but no broader. Where this point falls will vary between restriction types, collections and repositories, but as archivists we should champion increasing access whenever we can.
  • They should be clear, as concise as possible, and avoid jargon of any type. A typical user should be able to understand the access restrictions. Not sure if your restrictions pass this test? Why not ask a user? This isn’t just a usability issue; it’s an equal access issue.
  • They should spell out exceptions and make the implicit explicit. Publishing information about any exceptions, appeals, and alternatives helps ensure that all users have equal access to that information, and that learning about them does not require additional inquiry or personal interaction with an archivist acting as gatekeeper.
  • They should acknowledge the role of professional judgment and enable appeal. In support of professional transparency and accountability, we need to explain restrictions well enough that researchers can understand both their basis and application, and challenge either element if they have good cause to believe our judgment is in error.

DACS gives some good guidance on what to include in an access restriction. In keeping with and expanding on that, a specific practice that I find helpful is to pay attention to the Five Ws and one H of access restrictions: who, what, where, when, why and how. Most access restrictions will not address all of these, but asking whether or not each applies can be useful when drafting restrictions.
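One way to put the Five Ws and one H to work is as a drafting checklist. The sketch below is a minimal Python illustration, not a DACS requirement: the `AccessRestriction` class, the `unanswered` helper, and the sample restriction are all hypothetical. A blank field is only a prompt to ask whether that element applies, since most restrictions will not address all six.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class AccessRestriction:
    """Hypothetical drafting record; one field per W (and the H)."""
    who: Optional[str] = None    # who may (or may not) see the material
    what: Optional[str] = None   # which material is restricted
    where: Optional[str] = None  # where it may be consulted
    when: Optional[str] = None   # when the restriction lifts
    why: Optional[str] = None    # the basis for the restriction
    how: Optional[str] = None    # how to request an exception or appeal

def unanswered(restriction: AccessRestriction) -> list:
    """List the elements the draft has not addressed yet, in checklist order."""
    return [f.name for f in fields(restriction) if not getattr(restriction, f.name)]

# An invented draft restriction, for illustration only.
draft = AccessRestriction(
    what="Student records in Series 3",
    why="FERPA",
    when="Closed until 2040",
)
print(unanswered(draft))  # ['who', 'where', 'how']
```

Keeping the check as a list of open questions, rather than a pass/fail test, matches the idea that asking whether each element applies is the useful step.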

On Containers

I’m here to talk about boxes. Get excited.

I’ve been spending a LOT of time lately thinking about containers — fixing them, modelling them, figuring out what they are and aren’t supposed to do. And I’ve basically come to the conclusion that as a whole, we spend too much time futzing with containers because we haven’t spent enough time figuring out what they’re for and what they do.

For instance, a couple of months ago I wrote a blog post about work we’re doing to remediate problems with containers that shouldn’t be happening but are: barcodes being assigned to two different containers, two different container types with the same barcode/identifier information, etc. Considering the scale of our collections, the scale of these problems is mercifully slight, but these are the kinds of problems that turn into a crisis if a patron expects to find material in the box she ordered and the material simply isn’t there.
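The first kind of problem (one barcode claimed by more than one container) is easy to surface once container data is exported as rows. A minimal Python sketch, assuming a hypothetical export of (type, indicator, barcode) rows from a single collection; the barcodes are invented. The same row appearing twice is not a conflict, but a box and a reel sharing a barcode is.

```python
from collections import defaultdict

# Hypothetical container rows from one collection: (container type, indicator, barcode).
containers = [
    ("Box", "1", "39002000000001"),
    ("Box", "1", "39002000000001"),    # same box recorded twice: not a conflict
    ("Box", "2", "39002000000002"),
    ("Reel", "22", "39002000000002"),  # a different container reusing a barcode
]

def barcode_conflicts(records):
    """Return barcodes claimed by more than one distinct (type, indicator) pair."""
    claims = defaultdict(set)
    for ctype, indicator, barcode in records:
        claims[barcode].add((ctype, indicator))
    # Sort each conflict's claimants so the report is stable between runs.
    return {bc: sorted(pairs) for bc, pairs in claims.items() if len(pairs) > 1}

print(barcode_conflicts(containers))
# {'39002000000002': [('Box', '2'), ('Reel', '22')]}
```

The inverse check (one box carrying two different barcodes) works the same way with the key and value swapped.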

I’m also working with my colleagues here at Yale and our ArchivesSpace development vendor Hudson Molonglo to add functionality to ArchivesSpace so that it’s easier to work with containers as containers. I wrote a blog post about it on our ArchivesSpace blog. In short, we want to make it much easier to do stuff like assigning locations, assigning barcodes, indicating that container information has been exported to our ILS, etc. In order to do this, we need to know exactly how we want containers to relate to archival description and how they relate to each other.

As I’ve been doing this thinking about specific container issues, I’ve had some thoughts about containers in general. Here they are, in no particular order.

What are container numbers doing for us?

A container number is just a human-readable barcode, right? Something to uniquely identify a container? In other words, speaking in terms of the data model, isn’t this data that says something different but means the same thing? And is this possibly a point of vulnerability? At the end of the day, isn’t a container number something that we train users to care about when really they want the content they’ve identified? And we have a much better system for getting barcodes to uniquely identify something than we do with box numbers?

In the days when humans were putting box numbers on a call slip and another human was reading that slip and using that information to work out the shelf location, it made sense to ask the patron to be explicit about which containers were associated with the actual thing that they want to see. But I think that we’ve been too good at training them (and training ourselves) to think in terms of box numbers (and, internally, locations) instead of creating systems that do all of that on the back end. Information about containers should be uniform, unadorned, reliable, and interact seamlessly with data systems. Boxes should be stored wherever is best for their size and climate, and that should be tracked in a locations database that interacts with the requesting database. And the actual information should be associated seamlessly with containers.

This means that instead of making patrons write down a call number and box number and read a note about how materials of one type are stored on-site and materials of another type are stored off-site, we can take a lot of human error out of the process. Let’s let them just click on what they want to see. Then, the system says “a-ha! There are so many connections in my database! This record is in box 58704728702861, which is stored in C-29 Row 11, Bay 2, Shelf 2. I’ll send this to the queue that prints a call slip so a page can get that right away!” And instead of storing box numbers and folder numbers in the patron’s “shopping cart” of what she’s seen, let’s store unique identifiers for the archival description, so that if that same record gets re-housed into box 28704728702844 and moved to a different location, the patron doesn’t have to update her citation in any scholarly work she produces. Even if the collection gets re-processed, we could make sure that identifiers for stuff that’s truly the same persist.
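The lookup chain described above (description, then container, then location) can be sketched with two tables. This is a minimal Python illustration under simplifying assumptions: the dictionaries stand in for the archival management system and the locations database, each description sits in exactly one box, and the function names and the new off-site location are hypothetical. The box barcodes and the first location are the ones from the example above.

```python
# Hypothetical lookup tables standing in for the description and locations databases.
description_to_container = {"ref10": "58704728702861"}  # persistent description id -> box barcode
container_to_location = {"58704728702861": "C-29 Row 11, Bay 2, Shelf 2"}

def page_request(description_id: str) -> str:
    """Resolve a request for described material to a box and a shelf location."""
    barcode = description_to_container[description_id]
    location = container_to_location[barcode]
    return f"Pull {barcode} from {location} for {description_id}"

def rehouse(description_id: str, new_barcode: str, new_location: str) -> None:
    """Move material to a new box; the description id (and any citation to it) stays the same."""
    description_to_container[description_id] = new_barcode
    container_to_location[new_barcode] = new_location

cart = ["ref10"]  # the patron's "shopping cart" holds description ids, not box numbers
print(page_request(cart[0]))  # Pull 58704728702861 from C-29 Row 11, Bay 2, Shelf 2 for ref10
rehouse("ref10", "28704728702844", "Off-site, Shelf 14")
print(page_request(cart[0]))  # Pull 28704728702844 from Off-site, Shelf 14 for ref10
```

Nothing in the cart changes when the box does, which is the point: the patron (and her citation) only ever holds the description identifier. A real system would also need to handle description spread across several containers.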

Also, don’t tell me that box numbers do a good job of giving cues about order and scale. There are waaaaaayyyyy better ways of doing that than making people infer relationships based on how much material fits into 0.42 linear feet.

We have the concepts. Our practice needs to catch up, and our tools do too.

Darn it, Archivists’ Toolkit, you do some dumb things with containers

Archival management systems are, obviously, a huge step up from managing this kind of information in disparate documents and databases. But I think that we’re still a few years away from our systems meeting their potential. And I really think that folks who do deep thinking about archival description and standards development need to insert themselves into these conversations.

Here’s my favorite example. You know that thing where you’re doing description in AT and you want to associate a container with the records that you just described in a component? You know how it asks you what kind of an instance you want to create? That is not a thing. This is just part of the AT data model — there’s nothing like this in DACS, nothing like it in EAD. Actual archival standards are smart enough to not say very much about boxes because they’re boxes and who cares? When it exports to EAD, it serializes as @label. LABEL. The pinnacle of semantic nothingness!

This is not a thing.

Like, WHY? I can see that this could be the moment where AT is asking you “oh, hey, do you want to associate this with a physical container in a physical place or do you want to associate it with a digital object on teh interwebz?” but there’s probably a better way of doing this.

My problem with this is that it has resulted in A LOT of descriptive malpractice. Practitioners who aren’t familiar with how this serializes in EAD think that they’re describing the content (“oh yes! I’ve done the equivalent of assigning a form/genre term and declaring in a meaningful way that these are maps!”) when really they’ve put a label on the container. The container is not the stuff! If you want to describe the stuff, you do that somewhere else!

Oh my gosh, my exclamation point count is pretty high right now. I’ll see if I can pull myself together and soldier on.

Maybe we should be more explicit about container relationships.

Now, pop quiz: if you have something that is in the physical collection and has also been microfilmed, how do you indicate that?

In Archivists’ Toolkit, there’s nothing clear about this. You can associate more than one instance with an archival description, but you can also describe levels of containers that (ostensibly) describe the same stuff, but happen to be a numbered item within a folder, within a box.

Anything can happen here.

So this means that in the scenario I mentioned above, it often happens that someone will put the reel number into container 3, making the database think that the reel is a child of the box.

But even if all of the data entry happens properly, EAD import into Archivists’ Toolkit will take any three <container> tags and, instead of making them siblings, bring the three together into a parent-child instance relationship like the one you see above. This helps maintain relationships between boxes and folders, but it is a nightmare if you have a reel in there.

EAD has a way of representing these relationships, but the AT EAD export doesn’t really even do that properly.

 <c id="ref10" level="file">
   <did>
     <unittitle>Potter, Hannah</unittitle>
     <unitdate normal="1851/1851">1851</unitdate>
     <container id="cid342284" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342284" type="Folder">2</container>
   </did>
 </c>

 <c id="ref11" level="file">
   <did>
     <unittitle>Potter, Horace</unittitle>
     <unitdate normal="1824/1824">1824</unitdate>
     <container id="cid342283" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342283" type="Folder">3</container>
   </did>
 </c>

Here, we see that these box 1s are the same: they have the same barcode (btw, see previous posts for help working out what to do with this crazy export and barcodes). But the container ids make it seem like these are two different things: they have two different container ids, and their folders refer to two different parents.
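Until the export itself improves, one workaround is to recover box identity from the barcode tucked into @label. A minimal sketch using Python's standard-library `xml.etree.ElementTree`, assuming the barcode is always the trailing parenthetical in @label, as in the export above. The `<dsc>` wrapper is added only so the two components parse as one document, `<unitdate>` is omitted for brevity, and the helper name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two components from the export above, abridged and wrapped in <dsc> so they parse together.
EAD_SNIPPET = """<dsc>
 <c id="ref10" level="file"><did>
   <unittitle>Potter, Hannah</unittitle>
   <container id="cid342284" type="Box" label="Mixed Materials (39002038050457)">1</container>
   <container parent="cid342284" type="Folder">2</container>
 </did></c>
 <c id="ref11" level="file"><did>
   <unittitle>Potter, Horace</unittitle>
   <container id="cid342283" type="Box" label="Mixed Materials (39002038050457)">1</container>
   <container parent="cid342283" type="Folder">3</container>
 </did></c>
</dsc>"""

# Assumption: the barcode is the digits in the final parenthetical of @label.
BARCODE_IN_LABEL = re.compile(r"\((\d+)\)\s*$")

def container_ids_by_barcode(xml_text: str) -> dict:
    """Group container ids that carry the same barcode in @label."""
    groups = defaultdict(list)
    for container in ET.fromstring(xml_text).iter("container"):
        match = BARCODE_IN_LABEL.search(container.get("label", ""))
        # Folders have no @label or @id here, so only the boxes are collected.
        if match and container.get("id"):
            groups[match.group(1)].append(container.get("id"))
    return dict(groups)

print(container_ids_by_barcode(EAD_SNIPPET))
# {'39002038050457': ['cid342284', 'cid342283']}
```

Any barcode that maps to more than one id is a box the export has split into several; the folders can then be re-pointed at a single parent.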

What we really want to say is “This box 1 is the same as the other box 1’s. It’s not the same as reel 22. Folder 2 is inside of box 1, and so is folder 3.” Once we get our systems to represent all of this, we can do much better automation, better reporting, and have a much more reliable sense of where our stuff is.
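Those statements can be modelled by keeping one record per physical container, expressing nesting only through a parent link, and letting a component hold several instances as siblings. A minimal Python sketch; the `Container` and `Component` classes and the `top_container` helper are hypothetical and are not the Archivists’ Toolkit or ArchivesSpace data model, and attaching reel 22 to Hannah Potter’s file is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # object identity, not field values, decides "the same container"
class Container:
    ctype: str                            # e.g. "Box", "Folder", "Reel"
    indicator: str                        # the human-readable number
    barcode: Optional[str] = None         # folders usually have none
    parent: Optional["Container"] = None  # the container this one sits inside

@dataclass
class Component:
    ref_id: str
    title: str
    instances: List[Container] = field(default_factory=list)  # siblings, never nested

def top_container(c: Container) -> Container:
    """Walk up the parent chain to the outermost container (what a page retrieves)."""
    while c.parent is not None:
        c = c.parent
    return c

box1 = Container("Box", "1", barcode="39002038050457")  # one record for the one box
folder2 = Container("Folder", "2", parent=box1)
folder3 = Container("Folder", "3", parent=box1)
reel22 = Container("Reel", "22")                        # hypothetical microfilm reel

hannah = Component("ref10", "Potter, Hannah", [folder2, reel22])
horace = Component("ref11", "Potter, Horace", [folder3])

print([(top.ctype, top.indicator) for top in map(top_container, hannah.instances)])
# [('Box', '1'), ('Reel', '22')]
print(top_container(folder2) is top_container(folder3))  # True: both folders are in box 1
```

Because the reel is a sibling instance rather than a child of a folder, it can never be mistaken for something stored inside box 1.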

So if we want to be able to work with our containers as they actually are, we need to represent those properly in our technology. What should we be thinking about in our descriptive practice now that we’ve de-centered the box?

“Box” is not a level of description.

In ISAD(G) (explicitly) and DACS (implicitly), archivists are required to explain the level at which they’re describing aggregations of records. There isn’t a vocabulary for this, but traditionally, these levels include “collection”, “record group”, “series”, “file” and “item.” Note that “box” is not on this list or any other reasonable person’s list. I know everyone means well, and I would never discourage someone from processing materials in aggregate, but the term “box-level processing” is like nails on a chalkboard to me. As a concept, it should not be a thing. Now, series-level processing? Consider me on board! File-group processing? Awesome, sounds good! Do you want to break those file groups out into discrete groups of records that are often surrounded by a folder and hopefully are associated with distinctive terms, like proper nouns? Sure, if you think it will help and you don’t have anything better to do.

A box is usually just an accident of administrivia. I truly believe that archivists’ value is our ability to discern and describe aggregations of records; that box is not a meaningful aggregation, and describing it as such gives a false impression of the importance of one linear foot of material. I’d really love to see a push toward better series-level or file-group-level description, and less file-level mapping, especially for organizations’ records. Often, unless someone is doing a known-item search, there’s nothing distinct enough about individual files as evidence (and remember, this is why we do processing: to provide access to and explain records that give evidence of the past) to justify sub-dividing them. I also think that this could help us think past unnecessary sorting and related housekeeping. Our job isn’t to make order from chaos*, it’s to explain records and their context of creation and use. If records were created chaotically and kept in a chaotic way, are we really illuminating anything by prescribing artificial order?

This kind of thinking will be increasingly important when our records aren’t tied to physical containers.

In conclusion, let’s leave the robot work to the robots.

If I never had to translate a call number to a shelf location again, it would be too soon (actually, we don’t do that at MSSA, but still). Let’s stop making our patrons care about boxes, and let’s start making our technology work for us.


* This blog’s title, Chaos –> Order, is not about bringing order to a chaotic past — it’s about bringing order to our repositories and to our work habits. In other words, get that beam out of your own eye, sucka, before you get your alphabetization on.