Processing Levels: The Hows and Whys

It’s no surprise to anyone who has been reading this blog that I am a firm believer in building a processing program that relies heavily on minimal processing techniques, including applying different levels of processing to collections, or parts of collections. Describing our collections is one of the most important things that we as archivists do, and also one of the most time-consuming and expensive. We want to make sure that our time and intellectual capital are being well spent, and I firmly believe that the thoughtful, intentional application of processing levels is a really important way to ensure that. This leads to more accessible collections, encourages collection-level thinking and analysis, and opens up archivists’ time to work on big, strategic projects.

Standards like DACS encourage this kind of collection-level thinking and support different levels of arrangement and description, but there’s not a lot of advice out there about how and when to apply each of these levels (though the University of California’s Efficient Processing Guidelines, as always, does a great job of this). How do we decide if a collection is a good candidate to describe at the collection level versus the file level? And what principles guide these decisions? Here I’ll give you some thoughts on the principles I’ve used to build a levels-based processing program, and then some criteria I use when making decisions about an individual collection.

Thinking Programmatically

Put Your Thinking Cap On:

Start by analyzing the records (their content and context) at a high level. What does the collection as a whole tell us? What are the major pieces of the collection and how do they relate to each other? Why does that matter? How can we best expose all of this to researchers? I’m not gonna lie: this is hard. This is way harder than making a container list. However, it brings a lot more value to the table. Archivists are trained to understand the ways that records are created, and to assess their potential value as evidence, information, and/or as symbols. Often by doing this higher level intellectual work at the outset we can create very robust and meaningful description that exposes how the parts of the whole of the collection come together and how they function without doing a significant amount of work in the collection.

Define Terms and Build Consensus:

Be clear about what you mean by a level of processing. It is critical that all stakeholders (archivists, curators, research services staff, donors) are on the same page about what it means for a collection to be arranged and described at a certain level. This means defining and documenting these terms and circulating them widely throughout the organization. It also means being clear about the amount of both time and money required to arrange and describe collections to different levels.

It’s also very important to involve institutional stakeholders in your decision making process. Assessing stakeholder needs and managing expectations from the outset helps to ensure that processing projects go smoothly and end with happy stakeholders. In my institution this generally means that the archivists work with curators to make sure that we understand the value and needs of the collection, that they understand what we mean by a certain level of description, and that we can clearly communicate how more detailed processing of one collection affects the time and resources available to address other projects those curators have a stake in.

Always Look For the Golden Minimum:

I always approach assigning processing levels by determining what the goals for a collection are (determined in conjunction with stakeholders!) and what path provides the lowest set of barriers to getting there. Greene and Meissner call this sweet spot, where you meet your stated needs with the lowest investment of time and resources, the golden minimum, and this should be the goal of all processing projects.

Processing is Iterative:

This is huge for me. I go back and tweak description ALL THE TIME. Sometimes I’ve straight up misjudged what amount of description was necessary to facilitate discovery; sometimes research interests and needs change and the old levels of arrangement or description don’t cut it anymore. Your needs change and evolve, the researchers’ needs change and evolve over time, the institutional context changes, and sometimes you realize that something, for whatever reason, just isn’t working. You always have the option to go back into a collection and do more. You never, however, have the ability to recapture the time that you spent on a collection up front, so be thoughtful about how you apply that time to best meet the needs of the institution, the researchers, you and your colleagues, and the collection itself.

Arrangement and Description are Not the Same Thing:

And don’t need to happen at the same level nor happen at the same level across all parts of a collection. A collection can be arranged at the series level and described at the file level. Or completely unarranged but described at the series level. By breaking apart these two aspects of processing we have more flexibility in how we approach and make available collections, and we can be more efficient and effective in managing each individual collection and serving our users.

Discovery and Access are Key:

At the end of the day, the main things to keep in mind when determining the most appropriate processing level are discovery and access. The main goal of any processing project is to give users enough information in our description to both identify the material they are most interested in, and to be able to put their hands on it. How much description is necessary to find relevant material in a collection? What do you need to know to successfully retrieve that relevant box?

Making Decisions at the Collection Level

Now that we know why we’re doing this, and what principles are guiding the application of processing levels, here are some criteria that I use to determine what the most appropriate levels of arrangement and description for a collection are:

  • Research Value and Use: If a collection has a high research value and you anticipate heavy and sustained use, it may well be worthwhile to invest additional time and resources into it at the outset. This is especially true if the collection is not well ordered or is difficult to access.
  • Institutional Priorities: While I tend to default towards more minimal processing most of the time, there are plenty of internal factors that may encourage me to apply more detailed levels of processing. Whether a collection is a flagship in an area where we are trying to build collections, whether material from it is going to be featured in an exhibition, how much staff time needs to be devoted to other projects, how administrators allocate resources: all of these may affect processing decisions.
  • Restrictions: If a collection has significant access or use restrictions, or if there is a high likelihood that there are materials in the collection that would fall under legal protections such as FERPA or HIPAA (especially if these items are scattered throughout the collection) you will need to arrange the collection at a more granular level to ensure that you are doing your due diligence to meet your legal obligations.
  • Material Type and Original (Dis)Order:   The nature of a collection and the state in which a repository receives it will also, to some extent, determine the amount of archival intervention that it needs to be usable. If a collection arrives foldered, but entirely without a discernable order, it may require at least a series level sorting to enable a researcher to navigate the collection and locate material of interest. This also means that a collection that arrives unfoldered or without any organization will require more granular processing in order to be able to provide meaningful access. If the material is pretty uniform a collection level description will probably suffice. In general, the greater the diversity of the collection the more description is required to render the collection intelligible.
  • Size: I try not to make too many blanket decisions based solely on the size of a collection, but it can be a factor in determining processing levels. A collection that is only one box will not need a tremendous amount of description beyond the collection level because a researcher will only need to request one box to see the entirety of material available—tons of additional description is not going to aid in locating material. Conversely, a collection where one series spans hundreds of boxes will need additional file level description so that a user can isolate and access the part of that series that he or she needs.

These are some of the things that I take into consideration in my role as a manager at an academic special collection library. Other types of repositories and institutional contexts may well have other needs and different criteria. Feel free to add or expand in the comments!

Extensible Processing: Who is Involved and Who Cares?

So earlier in this series Maureen looked at the chapters dealing with why repositories should implement an extensible processing program and Meghan looked at the chapters that talk about the hows of implementation. I am focusing here on who is involved in implementing and maintaining an extensible processing program. My review focuses on Chapters 6-8, sections that in one way or another assess the ways that an extensible processing program plays well with others, from the professional community and its systems (through the rigorous application of standards based description), with repository staff and administration (through effective management of staff and advocating to management and administrators), and with users (through seeing online digitized content as an end goal of the processing process).

One really important aspect of this book is that it makes a very serious case that while archival collections may all be unique, the ways that we approach them are not. The fundamentals of our work stay the same as does the end goal of quickly and effectively serving our user communities.  Extensible processing techniques are carried out in similar ways at the collection level and the repository level, and they are supported and guided by widely accepted professional standards. While some detractors of baseline processing and other extensible processing techniques claim that these approaches are incompatible with standardized archival practice, Dan moves point by point through the most relevant sections of DACS explaining why the careful adherence to standardized description, far from being incompatible with minimal processing, in fact undergirds the entire enterprise of an extensible processing program. Archival descriptive standards are specifically designed to be flexible and to accommodate a range of levels of description and local practices. If they work right, and we do our jobs, they provide a way for the entire professional community to participate in and guide the principles behind individual processing programs at individual repositories.

So this sort of processing program is firmly based in broad professional standards, but on a more localized level there are any number of people that are involved in arrangement and description work. Chapter 8 focuses in on the repository level, and addresses how to lead, administer, and manage an extensible processing program, with a major focus on project planning and management. This section highlights one of the real strengths of the book: its concrete, realistic, and implementable advice. Santamaria walks the reader through various decision making processes, discusses criteria for priority setting, lays out specific elements of a processing plan, discusses resource allocation and personnel decisions, and explains how and why to adhere to firm timelines. This chapter is an excellent road map for a manager interested in taking the principles throughout the book and making them a reality. The specific suggestions are supplemented by a series of appendices that provide examples of processing plans and other forms of documentation to assist archivists in codifying their practice and moving towards an extensible processing model. This is a chapter I will be coming back to and reviewing when I need to manage new projects, create buy-in from staff, and advocate for extensible processing procedures to my management and administration.

The final people affected by our arrangement and description decisions are, of course, our users. Chapter 7, Digitization and Facilitating Access to Content, investigates user expectations around digital delivery of archival content (and our remarkable failure to meet them). Dan not only calls for digitization to be an integrated aspect of archival processing work (rather than a separate program) but frames this argument, usefully and importantly, as an ethical consideration of equitable access to collection resources. He states that

Just as with processing, if our goal is to provide ‘open and equitable access’ to collections material, archivists need to use all the tools at our disposal to allow researchers of all backgrounds access, not just those who can afford to travel to our repositories and visit during the work week. [1]

He then goes on to suggest models for broad digitization and concrete suggestions for how repositories can work digitization into workflows, work with vendors, and manage privacy and copyright issues, but, for me, the heart of the chapter is the same message that is the heart of the book and of this processing model as a whole: the insistence on equitable access.

These three chapters clearly articulate that the adherence to standards, the focus on end-user access, and the high levels of planning and management acumen that go into an extensible processing program serve to reiterate to the archival community that minimal processing is not lazy, sloppy processing. Dan reminds us, in what I think is one of the most important lines in the book, that

In an efficient extensible processing program the intellectual work of arranging material into broad groupings takes the place of neatly ordering items in folders and folders in series [2]

As archivists we add value to collections by applying our knowledge of how people and organizations work and how to think critically about the records that they create in that process. As a community we need to use our real professional skills to assess the records that our repositories hold. Quickly and competently assessing the nature of records is difficult, skilled, high level work; refoldering is not. We need to focus our professional skills and our repositories’ resources where it counts and where it is most likely to provide value to our various communities of stakeholders.

[1] Santamaria, Daniel A. Extensible Processing for Archives and Special Collections: Reducing Processing Backlogs. Chicago: ALA Neal-Schuman, 2015, 85.

[2] Ibid., 72.

Sorting, Adding, and Integrating Container Lists

We recently received a large (220 record carton) addition to one of our collections of publishers’ records. The files were in good order in well labeled folders so it wasn’t a matter of having to do any complicated arrangement and description work with the records, and we definitely weren’t refoldering, just compiling a box list and integrating that box list into the existing finding aid.

One of our students went through the containers and created an inventory, but then we hit what is a pretty typical challenge with these sorts of accessions, and with additions to existing collections more generally. I had a box list, but the boxes themselves weren’t received in any particular order so while obviously the records had been stored alphabetically, the list I received (in a semi-structured Microsoft Word document) reflected that on the box level, but not as a whole. So I wanted to take that box list and alphabetize the entire list by file name (which, in this case since these are editorial files for a publisher, was also by the last name of the author). Complicating this was the fact that there were nested files within the document and I wanted to keep those files grouped with their parent files and only alphabetize by the parent component. Also, I wanted to be able to integrate this list with the existing EAD container list– ideally to insert the new files into the container list in proper alphabetical sequence.

Because part of my data (the existing EAD container list) was only available to me in xml, I first converted the Microsoft Word container list into EAD as well so that all of the data I was working with was at least in the same format. I then pasted all of the code together into one document.

Then I created a new OpenRefine project by opening my xml document in OpenRefine. When importing xml, OpenRefine asks the user to click on the first element to load, so I skipped my dsc tag since I was working on the component level, selected my first <c> tag, and then clicked “create project” to get something that looks like this:



But then I need to scroll right, because I am dealing with hierarchical description with nested sub-components and I want to alphabetize by that top level hierarchy– what is represented here as the c-did-unittitle


To sort the list alphabetically at this point, click on the arrow at the top of that column and choose to sort by text, a-z.

Voila, the file list is now sorted alphabetically, with the nested subcomponents grouped together as part of the record for the parent component.

Up at the very top go to “sort” and preserve the reordering permanently so that when you export it is still alphabetized (you can also do a custom export that will allow you to only export certain columns and to reorder as necessary). Awesome, so now I have a spreadsheet that has taken all of my data, integrated it, and alphabetized it.
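For anyone who would rather script this step than click through it, the same idea (alphabetize the top-level components by their own title, and let nested subcomponents travel with their parents) can be sketched in a few lines of Python. This is not what I actually ran; it is a minimal illustration that assumes unnumbered <c> tags, no XML namespace on the EAD, and made-up sample titles.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a <dsc> whose top-level <c> components are out of order.
# One component has a nested child that must stay with its parent.
SAMPLE = """<dsc>
  <c level="file"><did><unittitle>Smith, Jane</unittitle></did></c>
  <c level="file"><did><unittitle>Adams, Ruth</unittitle></did>
    <c level="file"><did><unittitle>Correspondence</unittitle></did></c>
  </c>
  <c level="file"><did><unittitle>Lopez, Ana</unittitle></did></c>
</dsc>"""


def sort_components(dsc):
    """Reorder the direct <c> children of dsc by their own unittitle."""
    components = [child for child in dsc if child.tag == "c"]

    def sort_key(component):
        # "did/unittitle" only matches the component's own title,
        # never the titles of nested subcomponents.
        return (component.findtext("did/unittitle") or "").strip().casefold()

    components.sort(key=sort_key)
    # Detach every top-level component, then re-attach in sorted order.
    for component in components:
        dsc.remove(component)
    for component in components:
        dsc.append(component)
    return dsc


dsc = sort_components(ET.fromstring(SAMPLE))
titles = [c.findtext("did/unittitle") for c in dsc.findall("c")]
print(titles)
```

Because each parent element is moved as a whole, the nesting never has to be flattened into spreadsheet columns and rebuilt afterwards.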

Now, however, I need to get it back into xml so that I can insert it into my container list. We’ve covered this ground before– Mo has a stellar tutorial for this spreadsheet to EAD thing, and another intrepid Chaos–>Order commenter (this blog is one of the few places where you actually should read the comments) brought to my attention this fantastic site, or you can export what you have into an Excel sheet and import it directly into Oxygen and massage from there.

So I converted my spreadsheet back into EAD for upload to our finding aids site.  This solution didn’t end up being perfect–in all of the format conversions I lost some nesting of components that I had to go back and deal with manually. There was also some sloppy data entry that I didn’t catch before I started working through the data– but these are things I can solve either through better management or a bit more experimenting with conversion processes. Even though this wasn’t perfect, it was a fast solution for dealing with a very common and often troublesome scenario.
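Since the nesting is what got lost in my conversions, here is a rough sketch of one way to carry it through a spreadsheet-to-EAD step in Python. The column names (title, box, parent) are hypothetical, not the ones from my actual export, and the sketch assumes each parent row appears before its children and that only one level of nesting is needed.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export: one row per file; "parent" is blank for top-level files.
ROWS = """title,box,parent
"Adams, Ruth",12,
Correspondence,12,"Adams, Ruth"
"Lopez, Ana",3,
"""


def rows_to_dsc(csv_text):
    """Build a <dsc> of <c> components, nesting rows under their named parent."""
    dsc = ET.Element("dsc")
    top_level = {}  # top-level title -> its <c> element
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rows with a blank (or unrecognized) parent attach directly to <dsc>.
        parent = top_level.get(row["parent"], dsc)
        component = ET.SubElement(parent, "c", level="file")
        did = ET.SubElement(component, "did")
        ET.SubElement(did, "unittitle").text = row["title"]
        ET.SubElement(did, "container", type="box").text = row["box"]
        if not row["parent"]:
            top_level[row["title"]] = component
    return dsc


new_dsc = rows_to_dsc(ROWS)
print(ET.tostring(new_dsc, encoding="unicode"))
```

A child whose parent title is misspelled would quietly land at the top level here, which is exactly the kind of sloppy data entry worth checking for before converting.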

Advocacy and the Middle Manager

So we’ve been talking a lot about professional and workplace issues that disproportionately affect women the last week or so.  Meghan’s last two posts in particular are pretty stark reminders that there are repository-specific, institutional, professional, and federal policy levels that work against equitable workplaces.  Meaning it is time for those of us in any kind of management roles to be asking ourselves what our responsibilities as supervisors are and how we can address some of these issues for ourselves and for our staffs.    

I supervise a unit that includes about 7 full-time professional archivists (give or take depending on the ebb and flow of project gigs) in a large special collections library housed within a massive university library system.  Large, structured, hierarchical organizations are designed to make most workers invisible to upper management.  The sort of middle-management type role that I fill in my organization functions, both intentionally and unintentionally, to depersonalize a lot of the work that goes on in my unit and to create buffers between types of staff in the library.  In many ways this is a good thing– I don’t need an Associate University Librarian weighing in on something like how to structure external links in finding aids– we can handle that, thank you very much.  

The flipside of that is that the more human needs of staff are generally not seen by the policy makers at an institution.  Obviously most of us, even those of us who are in management positions, are not the deciders on HR policy.  But that doesn’t mean that we can’t instigate any change in our own offices or institutions.  I’ve always approached my job as a manager with the understanding that I am an advocate first and foremost.  I get people what they need so they can do their job right and try to take away any barriers that prevent getting the job done.  This advocacy role is especially key when thinking about work-life balance and workplace accommodations.  And since us low and middle management types are, as direct supervisors, the ones most likely to be aware of issues and concerns that impact our staffs we are the ones who have the responsibility to make some noise and, hopefully, make some changes.

But what can we do?  What are some concrete steps we can take?

  • Be an advocate!  Take employee concerns to managers, directors, and HR officers and make it really clear that you support your staff and demand that an institution make reasonable efforts to be supportive as well.
  • Be sensitive.  If someone who reports to you is coming to you with a medical, family, or other personal issue looking for some kind of an accommodation that person is likely in a position where he or she is already feeling vulnerable.  Respect that this person may not be comfortable or able to go to higher level decision makers.  Also be aware that someone may need to share news about health or family issues with a supervisor before they are really ready to make that news public.  Show some discretion.
  • Think about options and alternatives and try to be flexible in order to meet employees’ needs.  Is there anything mission-centric that an employee can do from home one day a week? Are there ways to adjust schedules, negotiate lunch breaks, or otherwise figure out ways to work with schedule requirements?  Do you have staff around who can assist with heavy lifting if an employee is unable to manage that for whatever reason, or find other alternatives?
  • Be aware that sometimes there are outside factors that will hamstring you.  You probably have less flexibility and fewer options with hourly support staff than with professional employees.  Your institution may not have bereavement days and may force an employee to use vacation time.  You can’t change everything. It sucks, but it’s true that there are things you can’t do, but recognize what those are and focus on what you can change.
  • Ask.  Our institution recently changed the way that we accumulate personal days, resulting in a staff member having bought plane tickets for a vacation she no longer had the PTO to cover; HR worked with her.  When our management team was unwilling to adjust a staff member’s security level to allow her to enter the library early (to leave early to care for a family member) we located office space outside the security perimeter so she had more flexibility.  It never hurts to ask for what you want or what your staff needs.
  • Listen.  People have a sense of what an ideal solution would look like most of the time; respect that, hear them out, and try to work with it.  Don’t be a pushover or give in to suggestions or demands that would negatively impact other parts of your work or harm other staff members, but be open.

Most of all remember that with slight power comes great responsibility.

Lean where, now?

At and around the most recent Society of American Archivists annual meeting I heard and participated in a lot of great (and not so great), official and unofficial conversations about being a woman in the archival field and being a female professional in general. Between the discussions about the availability of childcare and lactation rooms that happened in advance of the meeting, the Women Archivists Roundtable meeting that focused explicitly on helping women self-promote even in uncomfortable situations, and the Lean In Panel that prompted so much discussion, feminism became a sort of subtheme of the conference for me.

As it turns out, it is a pretty good moment to be talking about the intersection of women’s workforce issues, technology, and professional advancement. Whether it’s Yahoo’s Marissa Mayer’s two weeks of maternity leave or Sheryl Sandberg’s advice to young women to “lean in,” strong, professional women talking about workplace issues have been in the national news headlines with some frequency as of late.  These women and their colleagues, commentators, and critics have sparked a healthy national debate and plenty of heated water cooler conversations.

I am thrilled that these conversations are happening. We all need to be willing to share our experiences, encourage our colleagues, and call out bad practice when we see it. I am also particularly gratified that these conversations are highlighting highly successful women with serious skills in realms often thought of as guys’ domains such as the tech world. What I am less excited about is how quickly we have accepted Sandberg’s “Lean In” model as the logical and appropriate framework for that debate. I cannot believe that encouraging young professionals to make some serious sacrifices in the areas of family, relationships, friendships, personal time, and leisure of all sorts and instead to go into professional overdrive all the time is the path to more equitable workplaces or happier, healthier professionals. Not all women have the ability to follow that path, and I seriously doubt that most of us want to.

I seriously question a model that asks us (all of us, women, men, black, white, religious, queer, ALL OF US) to figure out how to succeed in a system that doesn’t take our needs as workers and as people into consideration. Perhaps even more importantly, I find it very dangerous to posit a critique of structural, systemic inequities and then suggest solutions that are about individual action not systemic change. Sure, learning how to lean in will help many of us to assert ourselves more often, to negotiate a bit harder, and to promote our work more stridently, but honing those skills is not going to alter the wage gap, change parental leave policies, or eliminate all of the invidious, subtle forms of discrimination that we face in the workplace.

We don’t need to lean in as much as we need to change what it is we are leaning in to.

Baby Steps in Using OpenRefine to Clean Up Collection Data

As I mentioned in my last post, my latest collection management project is making sure that we have collection level records for everything in the repository, which I am doing through creating accession records in Archivists’ Toolkit (I chose accession records rather than resource records based on a set of legacy decisions about how the institution uses AT; if I were starting from scratch I probably would do that differently). The piece of the puzzle that I’ve been working on recently is integrating the records of one of our sub-collections, the Bakhmeteff Archive of Russian and East European Culture, into AT.

The manuscript unit took over accessioning materials for this collection in about 2007, so collections that have been acquired in the 7 or 8 years since do have an AT record, as do about 60 older collections that were added as part of a separate project. So, my first step was to figure out which Bakhmeteff collections already had a collection level record and which ones did not. Since there was not one comprehensive collection list, this involved checking our AT records against a whole slew of other data sources* to see what was already in the database and which collections still needed collection descriptions in Archivists’ Toolkit.

The next issue was to figure out the best way to identify duplicate records. In looking at all of the source data it became clear very quickly that the way the title was expressed across all of the data sources I was working with varied wildly: sometimes expressed as “John Smith Papers”, sometimes “Smith, John Papers”, and, in the case of many of our older catalog records, just “Papers” with John Smith living in the 100 field and not reappearing in the 245. Some sources used diacritical marks and some didn’t (always thorny, but with several hundred collections in Russian a potential dealbreaker). Therefore I chose to use the collection number rather than title. The one issue with that is that I was using AT accession records, not resource records, so the collection number was expressed as part of the title field (I know, I know) and had to be broken out into its own column, but not a huge deal. Once I had that as a master document I could combine this spreadsheet and my other data sources and then use OpenRefine to facet my spreadsheet by ID number to identify (and eliminate) any collection that shows up both in AT and in one of my data sources. I then had a comprehensive list of collections not represented in AT so that I knew which collections needed to be added. It’s not a perfect solution, but it is a down and dirty way to identify where I have work to do so that I am not having a student manually check every collection against different data sources to identify what needs a better record. It also let me combine data from all sources to come up with a new master collection list to work from. Plus it was a good, baby-steps introduction to using OpenRefine.
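The logic behind that faceting step is simple enough to sketch in Python for anyone who wants to see it spelled out. I did this in OpenRefine, not in code, so treat the following as an illustration only: the “MS#” identifier pattern, the sample titles, and the function names are all hypothetical stand-ins.

```python
import re

# Hypothetical identifier pattern; real collection numbers may look different.
ID_PATTERN = re.compile(r"MS#\s*(\d+)")


def collection_id(title):
    """Pull the collection number out of a title field, or None if absent."""
    match = ID_PATTERN.search(title)
    return int(match.group(1)) if match else None


def missing_from_at(at_titles, other_ids):
    """Return collection numbers seen in other sources but not in AT."""
    in_at = {collection_id(title) for title in at_titles} - {None}
    return sorted(set(other_ids) - in_at)


# Made-up AT accession titles with the number embedded in the title field,
# and made-up collection numbers gathered from the other data sources.
at_titles = ["John Smith Papers (MS#0012)", "Smith, John Papers MS# 45", "Papers"]
other_sources = [12, 45, 77, 103, 77]

print(missing_from_at(at_titles, other_sources))
```

Matching on the number sidesteps every variation in how the title was written, diacritics included, which is the whole reason for not matching on title.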


*Since information was coming from so many sources, and because I didn’t trust the completeness of any of them, I was checking our AT holdings against accession registers, a collection number spreadsheet, our stack directory, and a list of collections that I asked one of our fantastic systems librarians to generate for me by querying our catalog for any record that was cataloged with a bibliographic level of c: collection in the Leader and had a location code that tied it back to our library.

The Legacy Landscape

When thinking about legacy data I have a whole lot of “legacy” to contend with; the Rare Book & Manuscript Library was officially established in 1930, though our collections date back to the founding of the University in 1754. There are 12 different units  within the library, and, for a long time, collection management was practiced on the “unit” rather than department level with different curatorial units keeping their own collection lists and donor files. This all leaves me with a LOT of non-standardized data to wrangle.

This legacy data is scattered over various spreadsheets, shelf lists, catalog records, accession registers, and accessions databases (one homegrown, the other Archivists’ Toolkit). This means that, as much information as we have about our collections, it is all dispersed and I’ve never really been able to answer questions like “how big is your collection?”, “how much of that is processed?”, or “how many manuscript collections does the RBML hold?” I know that’s not super unusual, but it can be frustrating, especially because we do know… sort of… I mean, let me get together a bunch of spreadsheets … and then count lines…. and most of our collections are represented on at least one of those… well, except the ones that aren’t…or are there multiple times…and the binders, don’t forget the binders! I think a lot of you know what I mean. We HAVE collection data, we just don’t have it stored or structured in a way that can be used or queried particularly effectively, and what we do have is in no way standardized.

Getting intellectual control over the collection has been an ongoing project of mine since I started my position, but it has ramped up as we have been starting to think more seriously about making a transition to ArchivesSpace. If we are going to make a significant commitment to using a new collection management system the first step needs to be to have the ability to populate that system with reliable, complete, and standardized information about our collections. This has led me to spend a significant chunk of my time the last year either updating our AT accession records to make sure that they are compliant with DACS’s requirements for single-level minimal records, or adding collection information from other sources into AT so that we have collection-level information about every collection in one place. I chose to go with minimal rather than optimal records since we are not using AT to generate finding aids or to hold the bulk of our descriptive metadata or any component metadata (more on that later!). My goal here is to get baseline intellectual control over our collection, so I am keeping the records lean. I am, however, adding both processing status and physical location in addition to the DACS required elements so that I can more easily determine the size of our backlog and set processing priorities, and so that my colleagues and I get used to looking in one central place for all key collection information. More about some of the strategies I’m using to make all of this happen in upcoming posts!