Rehousing is Not Processing

This post has been stewing since last July, but it builds nicely on our Extensible Processing book reviews and Maureen’s post on containers.

In supervising processing projects I’ve come across some wacky decisions made over the years. While I’ve started some projects from the beginning, a large portion of the projects I’ve supervised were already in progress, started under someone else (or multiple others).

One recurring issue I’ve noticed is how often people say processing when they mean something else, or when they mean a very small part of processing. Rehousing is a common culprit, but other activities fall into this too. Here are two examples of “processing projects” that mostly aren’t processing at all:

1. Collection was processed in the 1970s. A paper finding aid exists with notes, series descriptions, and a folder (or item) level container list. Materials are in acidic boxes and acidic folders, but labeled. Actions taken during the project: Rehoused all materials into acid-free folders and acid-free boxes, including creating additional, smaller folders. Changed the series arrangement and physically rearranged materials. An Excel container list was created by retyping (retyping!) the content of the paper finding aid and adding in the newly created folders for conversion to EAD.

Many times I’ve come across similar ongoing processing projects justified on the grounds that the materials needed better housing. Often, other tasks got tacked on, such as redoing the series outline (even if there were no additional materials to add and no evidence that the current series outline had issues).

2. A preliminary inventory in Word exists at the folder level using creator-created titles for a large volume of organizational records. A finding aid existed with summary notes and linked out to a PDF of the preliminary inventory. Actions taken during the project:

  • Collection was rehoused into acid-free folders, staples removed, some preservation photocopying done, oversize materials removed and rehoused separately (separation sheets completed)
  • Materials were reviewed at the item level and marked as restricted. Some redaction might have happened. Sometimes the restricted materials were removed to a new folder with the same title and marked as restricted (using separation sheets in the original folder). Sometimes the restricted materials were left in place and the whole folder was labeled restricted.
  • An Excel container list was created by retyping (retyping!) the exact information on the folder (aka the exact information already in the preliminary Word list) as materials were re-foldered. Largely, the creator titles were kept, with some additions. Dates for folders were added or edited. The Excel list will be converted to EAD.
  • Folders were physically grouped by letter of the alphabet based on the folder title. Ex: All the folders starting with “A” are physically together in “A” boxes, but not in actual alphabetical order yet. (Currently, those folders are being arranged in alphabetical order in acid-free boxes. Look for an update on how long/how expensive just this one phase takes!)

Both of these examples were large projects that occurred over many years (often with pauses due to turnover and lack of resources). Looking back, what value did we add? The collections are in more stable housing than before, and in one case we know more about restricted material. But otherwise, what have we gained for our users that we didn’t already have?

Essentially, these were called processing projects but are really rehousing and restriction review projects. They were not projects to create access to materials or to bring intellectual or physical order to the materials. After all, both collections already had a documented intellectual and physical order that should have been described in our finding aid notes (at whatever level).

What we should do instead:

  • Put resources towards creating access to materials over rehousing materials.
  • Develop a baseline housing standard that you can live with. It might be that all materials are in acid-free boxes. Or maybe it’s just that your boxes aren’t falling apart.
  • Get over the idea that all collections need to be physically arranged and rehoused during processing (or re-processing). Rehousing a collection into acid-free folders and/or acid-free boxes is not the main goal of processing. The task does not create access to collections or describe the materials. It’s housekeeping. It’s not necessary to include in a processing project.
  • Specifically state what rehousing tasks will occur in the processing plan and at what level. Justify spending processing resources on this. Don’t include it just because you’re used to including this task during processing.
  • Prioritize, at a repository level, materials that risk severe damage or information loss due to their current housing, weighed by importance. Develop a specific budget/set of resources for this type of work. Tap into the resources of your preservation/conservation department when available.

When facing resistance to excluding rehousing from a processing project, numbers are your friend. “Do we want to rehouse this collection that’s already pretty stable, or do we want to take those resources and create access to more collections?” is often too abstract for people. Attaching actual costs to rehousing work (labor AND supplies) can help push people who are resistant or nervous about dropping rehousing to focus on activities that create access. Treating rehousing work as separate from processing can also help decouple the idea that your intellectual and physical order must always match.


Put a strategic plan on it!

People who know me will know I love strategic planning. Or, more accurately, I love good strategic planning and how a strategic plan can assist you in many other activities.

Given that our library’s strategic plan is a few years old and our dean is retiring in the spring, the functional areas of SCUA didn’t want to wait for the whole library process to move forward. Luckily, there’s no rule that says you can’t have a strategic document for levels below the top or below the division/department.

While we didn’t go through a full-blown strategic planning process, we had run many brainstorming, visioning, and planning activities over the last year and a half. Many of the projects in our document were already approved (officially or unofficially) and represented in individual and unit work plans.

Why did we need a plan then? When planning projects or allocating resources we seemed to encounter a few challenges. The biggest (to me) was a lack of understanding about:

  • The difference between strategic work that moves a program forward and the prioritization of regular, ongoing work/projects
    • ex: processing the so and so papers may be a high priority on the list of collections to process, but this does not necessarily make that specific processing project a strategic priority
  • How the work of the different functional areas within SCUA directly relates to one another and supports the work of the entire department, and how each unit/function can participate in meeting shared goals

We determined three strategic directions across our work:

  1. Optimize the user experience
  2. Increase access to collections
  3. Expand knowledge of our collections to new audiences

Check out the full Strategic Directions for SCUA Functional Areas 2014-2017.

Here’s how I’m hoping to use our strategic directions document:

  • Raise awareness about what we do, why we do it, and its value within SCUA and the Libraries
  • Assist in developing annual work plans, deciding how we spend our time, and evaluating our progress
  • Prioritize pop-up/new projects. Is it really a project that will move us forward? Does it have to happen right now? Can we approach it differently than before? What do we STOP doing from our strategic directions or regular/ongoing work to accommodate it?
  • Use it as a tool for updating specific policies, procedures, and workflows, highlighting how these changes support the activities and goals outlined in the strategic directions
  • Advocate for resources at various levels within the library. Our AUL has already said this document will be extremely helpful as the libraries start to discuss priorities for fiscal and human resources for FY16.

Also, a hat tip to UCLA’s Library Special Collections strategic plan! We liked their presentation/formatting, so we borrowed it for ours. Don’t reinvent the wheel!


Clean Up: Date Instructions for Accession Records

In an effort to be transparent (and highlight the volume of work) I’m attempting to document all of our cleanup instructions and decisions. For each ArchivesSpace field we’re including actions and questions as we explore our data. Some of the questions focus on how ArchivesSpace works or implements fields while others focus on our specific data, policies, or procedures. Over time, the questions turn into actions and we are continually updating our documentation as we work through these issues.

Below is what we have so far for dates of materials in accession records. We still have lots of questions (and will no doubt develop more) so feel free to provide any feedback or suggestions.

Actions

Dates from the Beast fields unitdateinclusive and unitdatebulk will parse into date 1 and date 2 fields, each including (a small example follows the list):

  • date_1_begin
  • date_1_end
  • date_1_type (required by AS)
  • date_1_expression (required by AS if no normalized dates)
  • date_1_label (required by AS)
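
For instance, here’s how a single parsed date might land in those template columns (a made-up illustration in Python; the date 2 fields follow the same pattern for bulk dates):

```python
# One illustrative accession date destined for the CSV template,
# using the column names listed above (values are hypothetical)
row = {
    "date_1_begin": "1857",
    "date_1_end": "1919",
    "date_1_type": "inclusive",                    # required by AS
    "date_1_expression": "1857-1919 and undated",  # required by AS if no normalized dates
    "date_1_label": "creation",                    # required by AS; our default
}
```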

Adhere to required date formats and definitions for ArchivesSpace fields

  • Control begin and end fields for date format: YYYY, YYYY-MM, or YYYY-MM-DD (see the sketch after this list)
  • Single dates do not need a date end
  • Control date expression based on local convention [revising current local format]
  • Split date ranges using “-” as delimiter and designate as “inclusive” or “bulk” in date type, based on what column they came from. Use date 1 fields for inclusive and date 2 fields for bulk
  • For values with commas, such as “1950, 1950”, parse to two single dates using the date 1 fields and date 2 fields
  • Label defaults to “creation” unless otherwise specified
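
To make the range splitting and format control concrete, here’s a minimal sketch of how it might be scripted, assuming Python and simple year-only ranges (an illustration, not our actual workflow):

```python
import re

# ArchivesSpace-normalized dates: YYYY, YYYY-MM, or YYYY-MM-DD
NORMALIZED = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")
# A year-to-year range split on "-"; month/day ranges would need more care
YEAR_RANGE = re.compile(r"^(\d{4})\s*-\s*(\d{4})$")

def parse_date(value, range_type):
    """Return (begin, end, date_type) for one Beast date value.

    range_type is 'inclusive' or 'bulk' based on the source column.
    """
    value = value.strip()
    match = YEAR_RANGE.match(value)
    if match:
        return match.group(1), match.group(2), range_type
    if NORMALIZED.match(value):
        return value, None, "single"  # single dates do not need a date end
    raise ValueError(f"needs hand cleanup: {value!r}")

# parse_date("1937-1942", "inclusive") -> ("1937", "1942", "inclusive")
# parse_date("1909", "inclusive")      -> ("1909", None, "single")
```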

For dates that include “undated”

  • Keep “undated” as part of whole statement in the date expression field.
  • Parse remaining dates as normal and remove “undated” from begin and end
  • ex: “1857-1919 and undated” remains as the date expression, 1857 goes to date_begin, 1919 goes to date_end, type is “inclusive” (sketched below)
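
A rough sketch of that “undated” handling (Python again; the helper name and the returned dictionary are made up for illustration):

```python
import re

def parse_with_undated(value):
    """Keep the whole statement as the date expression, but strip
    'undated' before computing normalized begin/end values."""
    expression = value.strip()
    cleaned = re.sub(r"(?:,|;|\band\b)?\s*\bundated\b", "", expression).strip(" ,;")
    match = re.match(r"^(\d{4})(?:\s*-\s*(\d{4}))?$", cleaned)
    if not match:
        raise ValueError(f"needs hand cleanup: {value!r}")
    begin, end = match.group(1), match.group(2)
    return {
        "expression": expression,
        "begin": begin,
        "end": end,  # None when the remaining date is single
        "type": "inclusive" if end else "single",
    }

# parse_with_undated("1857-1919 and undated") ->
# {'expression': '1857-1919 and undated', 'begin': '1857', 'end': '1919', 'type': 'inclusive'}
```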

Certainty

  • Assume that all collection dates in accession records are best estimates [Policy decision]
  • Remove all forms of “circa” from accession dates (see the sketch after this list)
  • Remove question marks and variations
  • Remove square brackets and variations indicating guessed dates
  • Remove “undated” if it is the only value
  • Remove “unknown” if it is the only value
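
A hedged sketch of that stripping (the patterns cover the variations we’ve listed, but this is illustrative rather than a production script):

```python
import re

# Certainty markers to strip, per the policy that accession dates
# are best estimates anyway (patterns are illustrative, not exhaustive)
CERTAINTY_PATTERNS = [
    re.compile(r"\b(?:circa|ca\.?|c\.)\s*", re.IGNORECASE),  # all forms of "circa"
    re.compile(r"\(\?\)|\[\?\]|\?"),                         # question marks and variations
    re.compile(r"\(approx\.?\)", re.IGNORECASE),             # (approx.) annotations
    re.compile(r"[\[\]]"),                                   # square brackets around guesses
]

def strip_certainty(value):
    for pattern in CERTAINTY_PATTERNS:
        value = pattern.sub("", value)
    value = value.strip()
    # drop "undated"/"unknown" only when they are the entire value
    if value.lower() in {"undated", "unknown"}:
        return ""
    return value

# strip_certainty("1636 (approx.)") -> "1636"
# strip_certainty("1920[?]")        -> "1920"
# strip_certainty("unknown")        -> ""
```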

For dates listed as decades

  • Control decades to year spans. Use the first and last year of the decade.
  • 1940s-1950s becomes 1940-1959 in date expression, begin 1940, and end 1959 (sketched below)
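
One way the decade expansion might look in code (illustrative only; it assumes well-formed decade values and leaves everything else for hand review):

```python
import re

def expand_decade_span(value):
    """'1940s-1950s' -> ('1940-1959', '1940', '1959');
    a lone decade like '1920s' -> ('1920-1929', '1920', '1929')."""
    match = re.match(r"^(\d{3})0s(?:\s*-\s*(\d{3})0s)?$", value.strip())
    if not match:
        raise ValueError(f"not a simple decade value: {value!r}")
    begin = match.group(1) + "0"
    end = (match.group(2) or match.group(1)) + "9"
    return f"{begin}-{end}", begin, end
```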

If bulk dates are the exact same as inclusive, delete bulk dates

Questions

What date cutoffs do we use for partial decades?

  • ex: Late 1990s, mid 1970s, early 1930s
  • late = 1995-1999
  • mid = ???
  • early = 1930-1935

If bulk dates exist for single items, when to delete or not?

  • Will delete if same
  • Should we keep if there is a difference?
  • If there is a difference, what is the year cutoff? 1? 5? 10? etc.

Are single dates with “undated” really single?

  • ex “1941 and undated”

Can we have “bulk” dates that are “single”?

  • ex: 1989, type as “bulk” in ArchivesSpace?

For date expression, can we agree on the preferred date formats?

  • Start with guidelines in processing manual
  • Update and make suggestions for changes
  • Solicit comments/feedback
  • Finalize decisions
  • ex: 1880s, 18th-20th Century, Oct. 1976, Sept-9-1980, May 5, 1990

What if accession is a single item with a date range, but the abstract gives single date?

  • ex: Edwin Warfield accessions, range 1875-1912, abstract for accession says 1889 for a single item. Assuming that this date range was for all Warfield papers? Ignore and take date from abstract?

What do we do if we have more dates than fields?

  • ex: single dates of 1966, 1967, 1969 or 1930, 1950, and 2002 would parse to three single date fields
  • Version 1.0.7.1 currently only imports date 1 and date 2 in the CSV accession template
  • When do we want to turn single dates into a range instead? How many years in between? Based on size of materials? Never and develop procedure for adding dates beyond second into ArchivesSpace record post import?


In the next post we’ll go through some of the specific ways we are executing the clean up actions.


Date Formats

Now that we’ve eliminated most of our duplicate bulk dates, let’s take a look at the plethora of date formats in our accession records. Does your repository have local definitions for how to record collection dates? My guess is most would have something somewhere, even if not a formal written policy. We have a small section of our processing manual detailing preferred date formats and unacceptable date formats. It is supposed to apply to collection dates in accession records and in finding aids. Do people follow it? Sometimes. Usually not. Or they follow some of it, but not all of it. We also have dates created before the processing manual existed. The samples below are just from a portion of our accession records, so we might have additional formats yet to be discovered, but you’ll get the idea.

Our date fields could contain three elements: year only; month and year; or month, day, and year. The type might be a single date, multiple single dates, a range, multiple ranges, or a combination of these (although that isn’t specified). For dates in accession records I have already gone ahead and removed any variation of the word “circa”. There’s also a healthy amount of “unknown” and “undated” speckled throughout.

Element, type: sample values from unitdateinclusive (Beast field)

  • Year, single: 1909 | [1922] | 1636 (approx.) | 1920[?] | 1940? | 1946, undated | 1957(?) | 1999?
  • Year, multiple single: 1913, 1963 | 1945 or 1946 | 1953, 1961, 1969, 1994 | 1954, 1956, 1966-1967, 1971 | 1958, 1960, 1962 | 1966, 1967, 1969 | 1967, 1968, 1969 | 1969, 1970 | 1995, 2000, undated
  • Year, range: 1910-1950 | 1920s | 1920s-1930s? | 1921-1981 and undated | 1940’s-2006 | 1980’s-1990’s | 2000-2001 (FY 2001) | Early 1970s | late 1980s-early 1990s | undated, 1970s-2002
  • Year, multiple range: 1920s, 1969-1975 | 1932-1934, 1950s | 1937-1942; 1947-1950
  • Year, single and range: 1928; 1938-1962 and undated | 1938, 1950-1951 | 1950s-1960s, 1988 | 2008 [1901-2002]
  • Month Year, single: November 1962 | April 2001?
  • Month Year, range: January 1977- November 1981 | May2005-January 2007 | Otober 1920-Marh 1921
  • Month, Day, Year, single: 11/9/1911 | June 14, 1924 | Marh 8, 2006 | Otober 26, 1963
  • Month, Day, Year, multiple single: 12/19/2005; 4/4/2006 | January 5, 2000,  July 12, 2000 | 9/19 & 9/20/2007
  • Month, Day, Year, range: 10/24-10/26/2008 | January 30, 2011-February 2, 2011 | Marh 22-24, 2001 | Otober 13, 1987-Deember 7, 1987

Here’s a summary of the issues:

  • Punctuation is not standard. Multiple dates may be separated with a period, comma, semi-colon, ampersand, or the word “and”.
  • We used a variety of methods to convey we were unsure of the date, such as ?, (?), [ ], [?], (approx.) in addition to all the circa variations. I’m guessing there are other dates we weren’t sure of, but we didn’t specify that.
  • Spacing isn’t consistent. Sometimes there are no spaces around punctuation, other times one, two, or more spaces.
  • Spelling. Sometimes we just couldn’t spell October or March (apparently the most popular offenders). A cleanup sketch for these mechanical issues follows this list.
  • Formats are all over the place, even comparing the same element and type. Ex: March 22-24, 2001 compared to March 22, 2001-March 24, 2001.
  • Use of decades was a common practice.
  • Providing single dates instead of ranges. Do we really need to say “1966, 1967, 1969” instead of “1966-1969” if we’re only missing 1968?
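
Here’s an illustrative Python pass at the mechanical issues above (separators, spacing, spelling). The function and its rules are assumptions for demonstration; the judgment calls, like decades and singles vs. ranges, stay manual:

```python
import re

# The recurring misspellings from the list above
MONTH_FIXES = {"Otober": "October", "Marh": "March", "Deember": "December"}

def tidy(value):
    """One mechanical pass: spelling, separators, and spacing.
    Certainty markers and decades are handled by separate steps."""
    for bad, good in MONTH_FIXES.items():
        value = value.replace(bad, good)
    value = re.sub(r"\s*(?:;|&|\band\b)\s*", ", ", value)  # standardize separators to commas
    value = re.sub(r"\s*-\s*", "-", value)                 # tighten range hyphens
    value = re.sub(r"\s{2,}", " ", value)                  # collapse runs of spaces
    return value.strip()

# tidy("Otober 13, 1987-Deember 7, 1987") -> "October 13, 1987-December 7, 1987"
# tidy("1937-1942; 1947-1950")            -> "1937-1942, 1947-1950"
```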

Next post we’ll talk about the instructions and rules we’re developing for cleaning this up and how we go about executing those decisions.

Clean Up: Inclusive and Bulk Dates Comparison

Let’s start with a more straightforward cleanup issue identified during our accession record mapping. In this example, we’ll use a set of accession records (only a portion of our total) we have exported from the Beast into an Excel spreadsheet and focus on two fields: <unitdateinclusive> and <unitdatebulk>. We’ll map these to date fields in ArchivesSpace, but before we get to that let’s examine the data.

This spreadsheet contains 3361 accession records. 2685 rows have an inclusive date and 1908 rows include a bulk date. By sorting the spreadsheet by date and spot checking the data, we’ve come up with a list of inconsistent date formatting issues. One of the most pervasive habits was to always fill out both the inclusive and bulk dates, even if the values for each were the exact same. (For now, ignore the other date formatting issues in these examples.)

[Screenshot: same inclusive and bulk dates]

Supplying this information twice isn’t necessary for our users and could be confusing to some (plus it is extra work for us!). DACS 2.4.10 suggests providing a bulk date when dates differ significantly from the inclusive dates, so we want to keep the bulk dates that differ from our inclusive dates while removing the duplicate values.

We could compare these by hand (done that before!) or use a formula in Excel to do the work for us:

=IF(A2=B2, "same", B2)

This formula asks if the value in <unitdateinclusive> equals the value in <unitdatebulk>. If they are equal, it returns the value “same”; if they are different, it returns the value of <unitdatebulk>.

After dragging down the formula for the entire sheet, I then copy the results of this new column to another one, utilizing the “paste values” feature to carry over the content, and not the formula, as the cell value.

[Screenshot: pasting values from the formula column]

I could have put nothing instead of “same” in my new column if the values were equal, but I wanted to know how many times these dates were equal. Sorting by my newbulkdate column, I know that:

  • 777 rows only contained an inclusive date.
    • The formula as I have it would have returned a “0” here, because I didn’t tell it how to handle a blank cell in <unitdatebulk>.
    • An easy fix is to go back to my <unitdatebulk> column, find all the blank cells, and replace them with “empty.” Empty will carry forward with the formula. (I’m sure there is a way to handle this within the formula if anyone wants to jump in.)
  • 567 rows were identified as having a different value in bulk date so these dates were retained.
  • 1341 rows had an identical value in inclusive and bulk dates and were not kept.

I can now do a global replace on the newbulkdate column to replace “same” and “empty” with nothing. I then remove my original <unitdatebulk> column and my bulkformula column.

[Screenshot: only different bulk dates remain]

Of course, this method only worked on cells where the characters were exactly the same. There will be bulk dates that are the same as the inclusive dates that this didn’t catch, such as values with different spacing or punctuation (ex: c.2007 v. c. 2007); the sketch below takes a pass at those.
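
For anyone who would rather script this step than drag formulas, here’s a rough Python equivalent (assuming the spreadsheet has been saved as accessions.csv with the Beast field names as column headers; the file names are made up). It sidesteps the blank-cell workaround and also catches duplicates that differ only in spacing or case, like the c.2007 example:

```python
import csv

def normalize(value):
    """Loosen the match so 'c.2007' and 'c. 2007' count as duplicates."""
    return "".join(value.lower().split())

with open("accessions.csv", newline="") as src, \
     open("accessions-deduped.csv", "w", newline="") as dest:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        inclusive = row.get("unitdateinclusive") or ""
        bulk = row.get("unitdatebulk") or ""
        # Blank bulk cells pass through untouched (no "empty" placeholder
        # needed); duplicate bulk dates are cleared; different ones are kept.
        if bulk and normalize(bulk) == normalize(inclusive):
            row["unitdatebulk"] = ""
        writer.writerow(row)
```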

In other posts we’ll look at more date cleanup questions and issues.

History and Politics

I want to step away from the ArchivesSpace migration and take a moment to summarize some of the legacy data at Special Collections and University Archives. Carrie did a great job of painting the legacy landscape at Columbia. Our situation is similar in many ways. One of her points couldn’t be more on point:

“We HAVE collection data, we just don’t have it stored or structured in a way that can be used or queried particularly effectively, and what we do have is in no way standardized.”

Until 2011, UMD maintained separate departments (and sometimes units within those departments) that were responsible for all the work pertaining to their collections. Curatorial units created and maintained data about their collections in their own ways, sometimes in ways the same as or similar to other units, but often not. Collections data lives in paper accession and control files, spreadsheets, Word documents, Access and FileMaker databases (for single collections or for similar types of materials), catalog records, finding aids, in someone’s head, etc. These files live on the server in different locations and generally without consistent file names. I’ll also throw in that we acquired the AFL-CIO records last fall, which come with thirty-plus years of collection data, including data from an archives management system.

In the summer of 2011 the following departments and units merged into one department:

  • Archives and Manuscripts department
    • Literature unit
    • Historical manuscripts unit
    • University Archives unit
  • Marylandia, Rare Books, and National Trust Library department
    • Marylandia and Rare Books unit
    • National Trust for Historic Preservation Library unit
  • Library of American Broadcasting department
  • National Public Broadcasting Archives department

Along with this move came the creation of “functional areas” that would manage specific common functions consistently across the new department. The Access Group became responsible for managing arrangement and description and associated functions for the entire department. Until I was hired in February 2013 there was not a person solely devoted to planning and managing this work, only multiple people on the access team who had other main responsibilities outside of the team. The creation of my position is enabling SCUA to analyze our technical services operations, update our practices, and manage functions consistently.

It’s interesting to note that, at least currently, there are three other special collections units in the Libraries (Gordon D. Prange Collection, Special Collections in the Performing Arts, and the International Piano Archives at Maryland) that operate outside of Special Collections and University Archives. SCUA provides some services to some of these units (the Beast database) and shares some policies/procedures (ex: processing manual) with some of them.

Accession record mapping

Here’s what we have done and where we are with mapping our Beast accessions information to ArchivesSpace accession records:

  1. Play around in ArchivesSpace. Create new accession records with example data. See how these records interact with agent and subject records. Explore how spawning accession records into resource records works. Figure out how event records work (new to Aspace). And so forth.
  2. Analyze the Accession CSV Map and Accession CSV Template available on the ArchivesSpace Data Import and Export Map page. Ask questions about the mapping, as the map contains a few errors (see below).
  3. Begin matching fields from the Beast database to the ArchivesSpace accession fields. Start parsing out our fields to their destination fields in the CSV template, including doing some data cleanup by hand. This step helped to start identifying data cleanup issues.
  4. Do some test imports of the populated CSV template. A good reminder that ArchivesSpace is still very much in active development. (ex: importing of agents had been turned off in version 1.0.4 because in previous versions the importer was duplicating agent entries. Sadly, I did not realize this and spent a good 30 minutes trying to make my import take the names!)
  5. Adjust and continue refining the map as you get into the details of your legacy data.

I added information about our database fields to the CSV map and am sharing it here:

Accession_CSV_map_umdscua02242014

A few things to note:

  • There are five fields which are included in the CSV map, but do not currently import. I highlighted them in red with a note.
    • We do not really have this data in our current records, so no worries for now.
  • There are ten fields which the map states are collection management records in Aspace, but they are really event records. I highlighted these fields in yellow with a note that they create event records.
    • On first glance, collection management makes sense, as it describes the activity these fields perform. I’ll confess I spent more time than I should have looking for them in that section of Aspace before someone alerted me that they are event records, which, even if connected to a particular accession, you don’t see when viewing that accession record (that will change in an upcoming release).
  • Our data is messy. So messy. Information from the archdescid:abstract field might show up in 23 Aspace accession record fields. 23!
    • To be fair, this represents 11 different types of information as fields are more narrow in Aspace. (ex: A date range in the abstract doesn’t just move to a “date” field, but will move to 5-6 date fields based on the content.)
    • 11 is still a lot of different types of information crammed into one field.

Next we will start diving into the questions our data is raising.