Processing Levels: The Hows and Whys

It’s no surprise to anyone who has been reading this blog that I am a firm believer in building a processing program that relies heavily on minimal processing techniques, including developing a program that applies different levels of processing to collections, or parts of collections. Describing our collections is one of the most important things that we as archivists do, and also one of the most time-consuming and expensive. We want to make sure that our time and intellectual capital are being well spent, and I firmly believe that the thoughtful, intentional application of processing levels is a really important way to ensure that. This leads to more accessible collections, encourages collection-level thinking and analysis, and opens up archivists’ time to work on big, strategic projects.

Standards like DACS encourage this kind of collection-level thinking and support different levels of arrangement and description, but there’s not a lot of advice out there about how and when to apply each of these levels (though the University of California’s Efficient Processing Guidelines, as always, does a great job of this). How do we decide if a collection is a good candidate to describe at the collection level versus the file level? And what principles guide these decisions? Here I’ll give you some thoughts on the principles I’ve used to build a levels-based processing program, and then some criteria I use when making decisions about an individual collection.

Thinking Programmatically

Put Your Thinking Cap On:

Start by analyzing the records (their content and context) at a high level. What does the collection as a whole tell us? What are the major pieces of the collection and how do they relate to each other? Why does that matter? How can we best expose all of this to researchers? I’m not gonna lie: this is hard. This is way harder than making a container list. However, it brings a lot more value to the table. Archivists are trained to understand the ways that records are created, and to assess their potential value as evidence, information, and/or as symbols. Often by doing this higher level intellectual work at the outset we can create very robust and meaningful description that exposes how the parts of the collection come together and how they function, without doing a significant amount of work in the collection.

Define Terms and Build Consensus:

Be clear about what you mean by a level of processing. It is critical that all stakeholders (archivists, curators, research services staff, donors) are on the same page about what it means for a collection to be arranged and described at a certain level. This means defining and documenting these terms and circulating them widely throughout the organization. It also means being clear about the amount of both time and money required to arrange and describe collections to different levels.

It’s also very important to involve institutional stakeholders in your decision-making process. Assessing stakeholder needs and managing expectations from the outset helps to ensure that processing projects go smoothly and end with happy stakeholders. In my institution this generally means that the archivists work with curators to make sure that we understand the value and needs of the collection, that they understand what I mean by a certain level of description, and that I can clearly communicate how more detailed processing of one collection impacts the time and resources available to address other projects that individual has a stake in.

Always Look For the Golden Minimum:

I always approach assigning processing levels by determining what the goals for a collection are (determined in conjunction with stakeholders!) and what path provides the lowest set of barriers to getting there. Greene and Meissner call this sweet spot, where you meet your stated needs with the lowest investment of time and resources, the golden minimum, and this should be the goal of all processing projects.

Processing is Iterative:

This is huge for me. I go back and tweak description ALL THE TIME. Sometimes I’ve straight up misjudged what amount of description was necessary to facilitate discovery; sometimes research interests and needs change and the old levels of arrangement or description don’t cut it anymore. Your needs change and evolve, the researchers’ needs change and evolve over time, the institutional context changes, and sometimes you realize that something, for whatever reason, just isn’t working. You always have the option to go back into a collection and do more. You never, however, have the ability to recapture the time that you spent on a collection up front, so be thoughtful about how you apply that time to best meet the needs of the institution, the researchers, you and your colleagues, and the collection itself.

Arrangement and Description are Not the Same Thing:

They don’t need to happen at the same level as each other, nor at the same level across all parts of a collection. A collection can be arranged at the series level and described at the file level. Or completely unarranged but described at the series level. By breaking apart these two aspects of processing we have more flexibility in how we approach collections and make them available, and we can be more efficient and effective in managing each individual collection and serving our users.
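One way to keep the two decisions separate in practice is to record them as separate fields. The sketch below is only an illustration: the `Level` scale, the `ProcessingPlan` class, and the example series names are all made up for this post, not taken from any standard or system.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical granularity scale, from least to most detailed."""
    NONE = 0
    COLLECTION = 1
    SERIES = 2
    FILE = 3
    ITEM = 4


@dataclass
class ProcessingPlan:
    """One plan per collection, or per part of a collection."""
    unit: str
    arrangement: Level   # how far the material is physically/intellectually ordered
    description: Level   # how far down the finding aid describes it


def summarize(plan: ProcessingPlan) -> str:
    """Render a plan as a sentence, treating NONE as 'not done'."""
    arranged = ("unarranged" if plan.arrangement is Level.NONE
                else f"arranged at the {plan.arrangement.name.lower()} level")
    described = ("undescribed" if plan.description is Level.NONE
                 else f"described at the {plan.description.name.lower()} level")
    return f"{plan.unit}: {arranged}, {described}"


# Two made-up parts of one collection, mirroring the two cases in the text.
plans = [
    ProcessingPlan("Correspondence", arrangement=Level.SERIES, description=Level.FILE),
    ProcessingPlan("Subject files", arrangement=Level.NONE, description=Level.SERIES),
]

for plan in plans:
    print(summarize(plan))
```

Because the two fields are independent, nothing forces a part of a collection to be arranged as finely as it is described, or the other way around.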

Discovery and Access are Key:

At the end of the day, the main things to keep in mind when determining the most appropriate processing level are discovery and access. The main goal of any processing project is to give users enough information in our description to both identify the material they are most interested in, and to be able to put their hands on it. How much description is necessary to find relevant material in a collection? What do you need to know to successfully retrieve that relevant box?

Making Decisions at the Collection Level

Now that we know why we’re doing this, and what principles are guiding the application of processing levels, here are some criteria that I use to determine what the most appropriate levels of arrangement and description for a collection are:

  • Research Value and Use: If a collection has a high research value and you anticipate heavy and sustained use, it may well be worthwhile to invest additional time and resources into it at the outset. This is especially true if the collection is not well ordered or is difficult to access.
  • Institutional Priorities: While I tend to default towards more minimal processing most of the time, there are plenty of internal factors that may encourage me to apply more detailed levels of processing. Whether a collection is a flagship in an area where we are trying to build collections, whether material from it is going to be featured in an exhibition, how much staff time needs to be devoted to other projects, how administrators allocate resources: all of these may affect processing decisions.
  • Restrictions: If a collection has significant access or use restrictions, or if there is a high likelihood that there are materials in the collection that would fall under legal protections such as FERPA or HIPAA (especially if these items are scattered throughout the collection) you will need to arrange the collection at a more granular level to ensure that you are doing your due diligence to meet your legal obligations.
  • Material Type and Original (Dis)Order:   The nature of a collection and the state in which a repository receives it will also, to some extent, determine the amount of archival intervention that it needs to be usable. If a collection arrives foldered, but entirely without a discernable order, it may require at least a series level sorting to enable a researcher to navigate the collection and locate material of interest. This also means that a collection that arrives unfoldered or without any organization will require more granular processing in order to be able to provide meaningful access. If the material is pretty uniform a collection level description will probably suffice. In general, the greater the diversity of the collection the more description is required to render the collection intelligible.
  • Size: I try not to make too many blanket decisions based solely on the size of a collection, but it can be a factor in determining processing levels. A collection that is only one box will not need a tremendous amount of description beyond the collection level because a researcher will only need to request one box to see the entirety of material available—tons of additional description is not going to aid in locating material. Conversely, a collection where one series spans hundreds of boxes will need additional file level description so that a user can isolate and access the part of that series that he or she needs.

These are some of the things that I take into consideration in my role as a manager at an academic special collection library. Other types of repositories and institutional contexts may well have other needs and different criteria. Feel free to add or expand in the comments!


The Value of Archival Description, Considered

This is a talk that I gave at the Radcliffe Workshop on Technology and Archival Processing on April 3, 2014. I hope you enjoy what I had to say. I think it dovetails nicely with the work the four of us do on this blog.


I’m very happy to be here today. As Ellen mentioned, my name is Maureen Callahan. I currently work at the Tamiment Library at New York University in a technical services role. In our context, which I know isn’t unique, almost all of our arrangement and description work is done by very new professionals or pre-professionals. This means that most of my job is teaching, coaching and supervising – and making sure that all of the workers I supervise have the infrastructure, support, and knowledge they need to meet our obligations to users and donors.

Because I work with pre-professionals, I think it’s important to be deliberate and take the time to explain the values behind archival description – what our obligations are, how to make our work transparent, what’s valuable and what isn’t, how we should be thinking about how we spend our time, and how to look at the finding aid that we’ve created from a researcher’s point of view.


When the organizers asked me to present, they included a few questions, questions that have been weighing on my mind too.

In their initial email, Ellen Shea and Mary O’Connell Murphy asked, “Is the product of a finding aid worthy of the time required to make them considering emerging technologies? Where do you think research guides might be headed in the future? How do you think they must change in order to improve access to archival collections and meet today’s user’s needs?”

Most provocatively, they asked, “What do researchers really want from finding aids? Do they want them at all?”

And I think that the answer is no. And maybe. And yes.


At its core, I think that this question gets at what is and isn’t valuable about what archivists do, and what might be good for us to pay more attention to.

So, what do finding aids do? Why do we create them?

OK, so we can start by looking at finding aids as a way to address the practical problem of giving potential researchers access to unique or rare material that can only be found in a single location, behind a locked door in closed stacks. Until you come here and show us your ID and solemnly swear that you’re going to follow our rules, the finding aid is all you get. This is the deal. So, to answer the question of whether researchers want finding aids – no. They don’t. They want the records. But they get the guide first.


And many parts of a finding aid – the parts that we spend so much of our time creating – take this imperfect surrogate role. Many finding aids are built on the model of looking at a body of records, dividing it into groupings (either physically or intellectually, usually both), and then faithfully representing files in that grouping to a mind-numbing level of meticulous detail. I’m going to call this model a map.


And what this slide, which is based on an analysis of the finding aids at the Tamiment Library, will show you is that yes, this work is getting done. We have plenty of information about what the materials tell us about their titles and dates and how much we have of it. But this slide only tells us information about finding aids that have been created. I also know that backlogs are still a problem at a lot of repositories, especially mine. This “mapping” model of tedious representation, starting at the beginning and going to the end, means that often the end never comes. We have plenty of collections that aren’t represented at all. Is this serving our users? Does this meet our donors’ expectations? Can’t we find a better way?


I’m looking forward to hearing from speakers today and tomorrow who will talk about how we can get machines to do some of this mapping for us. Because, as far as I’m concerned, good riddance. I don’t think that archivists are just secretaries for dead people, and I welcome as much automation as we can get for this kind of direct representation of what the records tell us about themselves.


Indeed, it’s already happening. At my institution, we’re just starting to work through the process of accessioning electronic records, and I can already see how tools like Forensic Toolkit help us to get electronic records to describe themselves.

After all, electronic records are records. Digital archives are archives. This is our present, future, and poorly-served past. And in the case of electronic records, we have ways of transcending the problem of our collections being singly, uniquely sited, requiring a mapping of what’s inside.

But some collections are, indeed, unique and sited. Before going on, I want to be pragmatic about the idea of scanning everything that isn’t born-digital, that does require a certain degree of mapping. I think we should be scanning a lot, I think we should be scanning much more than we are, but I don’t think that we necessarily should be scanning everything. I think we should scan what the people want. The city archives of Amsterdam, which has the most complete and sophisticated scanning operations that I’ve encountered, has committed to providing researchers with what they want not by scanning everything (they estimate that it would take 406 years to do so for all 739 million pages in their holdings, even in an extremely robust production environment) but by scanning what the users want to see. After all, what if you want to see the 739 millionth scan? And in order to figure out what the people want, we need some minimal level of mapping. Not every file, not in crazy, tedious detail, but some indication of what’s in a collection.
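To get a feel for what that estimate implies, here is the back-of-the-envelope arithmetic. The two headline figures come from the estimate quoted above; the number of working days per year is my own assumption, included only to turn an annual rate into a daily one.

```python
# Figures quoted in the talk.
TOTAL_PAGES = 739_000_000
YEARS_TO_SCAN_ALL = 406

# Assumption for illustration only: scanning runs about 250 days a year.
WORKING_DAYS_PER_YEAR = 250

# Implied throughput if the 406-year estimate holds.
pages_per_year = TOTAL_PAGES / YEARS_TO_SCAN_ALL
pages_per_day = pages_per_year / WORKING_DAYS_PER_YEAR

print(f"Implied pages per year: {pages_per_year:,.0f}")
print(f"Implied pages per working day: {pages_per_day:,.0f}")
```

In other words, the estimate already assumes roughly 1.8 million pages scanned every year, which is why scanning on demand, guided by a minimal map, is the realistic option.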

So, we’ve dispensed with much of the map. Didn’t that feel good? What else is a finding aid? What else does the archivist do? What else do our researchers need from us?


At the next level of abstraction, a really good finding aid is a guide. In this painting by Eugene Delacroix, we see Virgil leading Dante across the river Styx. I don’t want to take this metaphor too far, but I do think that there’s a role for the archivist to help researchers understand our materials by explaining the collections, pointing out pitfalls and rich veins of content, rather than just representing titles on folders.

I can see, in some contexts, that it makes sense for an archivist to spend quality time really understanding the records and explaining this understanding so that each researcher doesn’t have to wade through it every time. When I teach description, I urge workers to evaluate rather than represent records. For instance, does a correspondence series include long, juicy, hand-written letters wherein the writer pours his heart out? Or are they dictated carbon copies based on forms? A title of “Letter from John Doe to Jane Smith” doesn’t tell us this, but an archivist’s scope and content note can. It takes a lot of time to type “Correspondence” and the date a zillion times. Wouldn’t researchers prefer an aggregate description and date range with a nice, full note about what kinds of correspondence with what kinds of information they can expect therein? This is a choice to guide rather than map.

So here, we’re representing information about the collection that a researcher would need to spend a lot of time to discover on his own. And by the way, I’m not claiming a breakthrough. Seasoned archivists do this all the time. It’s also what Greene and Meissner were talking about in their 2005 article – our value is in our focus on the aggregate and the judgment required to make sense of records, rather than just representing them.

So to answer the original question, I would say that maybe, yes, maybe, researchers do want these kinds of finding aids where some of the sensemaking has already been done for them. The scale of archives is large, and it may indeed be inefficient to expect researchers to browse scanned document after scanned document to get a good understanding of what this all means together.

But there’s an even higher level of abstraction central to our role as archivists that should be included in our finding aids, which I rarely see documented comprehensively or well. This is the information about a collection that no amount of time with a collection will reveal to a researcher – it has to do with the archivists’ interventions into a collection, the collection’s custodial history, and the contexts of the records’ creation.


This last bit – getting to understand who created records, why they were created, and what they provide evidence of – really gets to the nature of research. These are the questions that historians and journalists and lawyers and all of the communities that use our collections ask – they don’t just see artifacts, they see evidence that can help them make a principled argument about what happened in the past. They want to know about reliability, authenticity, chain of custody, gaps, absences and silences.

This is the core work of archivists. This is what we talked about over and over again when I was in graduate school, and what has been drilled into me as the true value we, as archivists, add to the research process. We occupy a position of responsibility, of commitment to transparency and access. Researchers expect us to tell them this information, and we do a terrible job of doing so.

The above slide is based on the same corpus of finding aids at the Tamiment Library. While we did a great job of documenting what we saw before us, we did an abysmal job of explaining who gave us the collection and under what circumstances, how we changed the collection when we processed it, and what choices we made about what stays in the collection and what’s removed. And from what I can tell, it’s pretty consistent with the kinds of meta-analysis done by Dean and Wisser, and also by Bron, Proffitt, and Washburn in their recent articles analyzing EAD tag usage.
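A count like the one behind this slide can be sketched in a few lines. This is not the script used for the Tamiment analysis; it is a minimal illustration that assumes finding aids encoded in EAD 2002, where `acqinfo`, `custodhist`, `processinfo`, and `appraisal` are the elements for acquisition, custodial history, processing, and appraisal notes. The two sample documents are invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# EAD 2002 element names for the "story behind the story" notes.
CONTEXT_ELEMENTS = ("acqinfo", "custodhist", "processinfo", "appraisal")


def local_name(tag):
    """Drop an XML namespace prefix such as '{urn:...}acqinfo'."""
    return tag.rsplit("}", 1)[-1]


def elements_present(ead_xml):
    """Return the context elements that exist AND contain some text."""
    root = ET.fromstring(ead_xml)
    found = set()
    for el in root.iter():
        if not isinstance(el.tag, str):   # skip comments, if the parser kept any
            continue
        name = local_name(el.tag)
        if name in CONTEXT_ELEMENTS and "".join(el.itertext()).strip():
            found.add(name)
    return found


def usage_counts(finding_aids):
    """Count how many finding aids use each context element."""
    counts = Counter()
    for xml_text in finding_aids:
        counts.update(elements_present(xml_text))
    return counts


# Invented sample data: one finding aid with real notes, one with an empty tag.
SAMPLE_A = """<ead><archdesc level="collection">
  <did><unittitle>Example Papers</unittitle></did>
  <acqinfo><p>Donated by the creator.</p></acqinfo>
  <processinfo><p>Rehoused; original order kept.</p></processinfo>
</archdesc></ead>"""
SAMPLE_B = """<ead><archdesc level="collection">
  <did><unittitle>Example Records</unittitle></did>
  <custodhist><p></p></custodhist>
</archdesc></ead>"""

counts = usage_counts([SAMPLE_A, SAMPLE_B])
for name in CONTEXT_ELEMENTS:
    print(f"{name}: {counts[name]} of 2 finding aids")
```

Counting only elements that actually contain text matters here, since an empty tag left over from a template says nothing to a researcher.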


Like I say, communicating this in the finding aid is some of the most important work we do, and we do a pretty bad job of it. I have no reason to believe that my library is unique in this.

Because I also know, when I go to describe records, especially legacy collections that have sat unprocessed for a long time, I often have to do this by guessing. I’m like an archaeologist who tries to figure out the life of these documents before they came to me based on the traces left behind. It’s what I most want to explain, and what I often have the least evidence of.

This is an area where curators — collectors — whatever you call them — can intervene, where the best of breed are invaluable. After all, we’re not doing archaeology and working with the remains of long-dead civilizations. Creators, heirs or successors are usually around — they’re the ones who packed the boxes and dropped off the materials. Let’s make sure that we sit them down and talk with them then. Let’s make sure we’re getting all of the good stuff. Let’s make sure we really understand the nature of the records before we ask the processing archivist — usually a person fairly low in the organizational hierarchy, often a new professional, and almost always the person with the least access to the creator — to labor at reconstruction when just asking the creator might reveal all.

I have one short anecdote from my own repository to help illustrate this problem. In 1992, the Tamiment Library acquired the records of the Church League of America from Liberty University in Lynchburg, Virginia. The Church League of America was a group created in the 1930s to oppose left-wing and social gospel influences in Christian thought and organizations through research and advocacy. The first iteration of the finding aid for this collection could be described as a messy map – a complicated rendering of the folder titles found in this extensive collection, without much explanation of what it all means and how it came.

Two years ago, before I came to Tamiment, my colleagues did a re-processing project. In doing so, they realized that these records had a rich history and diverse creators — far richer than what the finding aid had indicated. It turns out that the collection is an amalgamation of many creators’ work, including the files of the Wackenhut Corporation, which started as a private investigations firm and moved on to be a government contractor for private prisons. The organization maintained files on four million suspected dissidents, including files originally created by Karl Baarslag, a former HUAC staff member, and only donated them to the Church League of America in 1975 as a way of side-stepping the Fair Credit Reporting Act.

Until re-processing happened, researchers had an incomplete picture of the relationship between private commerce and non-profit organizations that converged to become the lobbying arm of the anti-Communist religious right.

So back to our original question. Do researchers want finding aids qua finding aids? No, maybe, yes. They want the stuff, not descriptions of the stuff. They might want some help navigating the stuff. And they absolutely want all the help that they can get with uncovering the story behind the story.


Before I turn this over to Trevor, I want to add a brief coda about how we should be thinking of finding aids as discovery tools as long as we decide to have them.

Let’s start with a reality check – how are finding aids used? What do we know about information-seeking behavior around archival resources?

The first and most important thing that we know is that discovery happens through search engines. It is true that some sophisticated researchers know what kinds of records are held at what repositories – that the Tamiment Library holds records of labor and the radical left, or that Salman Rushdie’s papers are at Emory.  But I think that we can all agree that “just knowing” isn’t a good strategy to make sure that researchers discover our materials!


This was the understanding that we started with at Princeton (my previous job) when we decided to revise our finding aids portal. Previously, our finding aids looked like a lot of other finding aids – very, very long, often monograph-length webpages that gave a map – and the better ones (there were many better ones there) would be a good guide as well.

Basically, we decided to surrender to Google. We hoped that by busting apart the finding aid into the components that archivists create (collections, series, files and items), and letting Google index it all, our users would be able to come directly to the content that they want to find.


This is the dream. A researcher searches Google for George Kennan’s Long Telegram, and we can give her exactly what she’s looking for, in the context of the rest of the papers. We also wanted the finding aid to be actionable – a researcher can ask a question about the material, request to see it in the reading room, and, if it has been scanned, look at images directly in the context of the finding aid.


In this case, you can see a report on Jack Ruby from Allen Dulles’s Warren Commission files.

While we’re putting so much effort into making our finding aids into structured data, let’s make our finding aids function as data. Let’s make it so that we can sort, filter, compare, comment and annotate. Why do we take our EAD, which we’ve painstakingly marked up, and render it in finding aids as flat HTML?
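As a small illustration of what "functioning as data" could mean, the sketch below flattens the components of an EAD file into plain records that can be filtered and sorted. It assumes EAD 2002 markup (`c`/`c01` through `c12` components, a `level` attribute, and a `normal` attribute on `unitdate`); the sample finding aid and the `components` helper are invented for this example and are not how any particular portal works.

```python
import xml.etree.ElementTree as ET

# Invented sample: one series containing two files.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle>
             <unitdate normal="1950/1960">1950-1960</unitdate></did>
        <c02 level="file">
          <did><unittitle>Letters received</unittitle>
               <unitdate normal="1955/1957">1955-1957</unitdate></did>
        </c02>
        <c02 level="file">
          <did><unittitle>Letters sent</unittitle>
               <unitdate normal="1951/1953">1951-1953</unitdate></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# EAD 2002 allows unnumbered <c> and numbered <c01>..<c12> components.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def local(tag):
    """Element name without any namespace prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def child(el, name):
    """First direct child with the given local name, or None."""
    for c in el:
        if local(c.tag) == name:
            return c
    return None


def components(ead_xml):
    """Flatten every described component into a dict."""
    rows = []
    for el in ET.fromstring(ead_xml).iter():
        if local(el.tag) not in COMPONENT_TAGS:
            continue
        did = child(el, "did")
        if did is None:
            continue
        title = child(did, "unittitle")
        date = child(did, "unitdate")
        normal = date.get("normal", "") if date is not None else ""
        rows.append({
            "level": el.get("level", ""),
            "title": "".join(title.itertext()).strip() if title is not None else "",
            "start": normal.split("/")[0],   # first year of the normalized range
        })
    return rows


rows = components(SAMPLE_EAD)
# Filter to file-level components, then sort by start date.
files_by_date = sorted((r for r in rows if r["level"] == "file"),
                       key=lambda r: r["start"])
for r in files_by_date:
    print(r["start"], r["title"])
```

Once the description is a list of records rather than a rendered page, sorting by date, filtering by level, or attaching comments to a single component are all ordinary operations.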



Let’s work together to take the next step, to think critically about the metadata we’re creating, and then make sure that it’s readable by the machines that present it to our users.