Ethical Internships: Mentoring the Leaders We Need

I gave this talk last Friday to the Arizona Archives Association annual symposium — many thanks to that group for their excellent ideas and discussion, and for their strong sense of mission and values.


I wanted to start by explaining how excited I am to be here with you, and what it means to me to be an archivist speaking to a room of Arizona archivists. I grew up in Arizona, in Maricopa County, in an area called Ahwatukee, a neighborhood on the south side of South Mountain that the original white landowners misnamed after a Crow phrase meaning “land in the next valley.” Obviously the Crow people never lived anywhere near Arizona. The Crow are a northern plains tribe who lived in Wyoming and were forcibly moved to Montana. And so it is especially strange to me that the area was given a Crow name when we consider that Ahwatukee is bounded to the south by the Gila River Indian Community.

Crow (Apsaroke) Indians of Montana — “Holds the Enemy” by Edward Curtis. Library of Congress Prints and Photographs Division

What does it tell us about the regard that Dr. and Mrs. Ames, the landowners who named the area, had for their American Indian neighbors that they used the language of a group far enough away to be largely irrelevant to their lives instead of that of their immediate neighbors? I have to assume that they were caught up in popular romantic notions of American Indians, possibly best represented in the photographs of Edward Curtis, who aestheticized and fictionalized American Indians at precisely the moment when it was clear that there would be no more Indian wars and that the United States government’s program of forced removal had met its intended ends.

This founding vignette resonates with me, because I see reverberations of it in my experience growing up in Ahwatukee. My middle school was named for the Akimel O’odham, the Pima people, who reside in Arizona, and our school was decked out in bright turquoise and copper and vaguely pan-Indian pictographs. This was all done with a sharp lack of specificity; it gave the impression that American Indian culture is a stylistic flourish instead of a tradition, culture and worldview. Looking at it now, this divide between seeing American Indians as a people and seeing them as a trace on the now white-occupied land is especially cruel when you consider the persistent inequities that American Indians in Arizona encounter today. Indeed, during the last census there were only 738 American Indian-identified people living in Ahwatukee, which has the wealthiest and one of the whitest school districts in Arizona. I was surrounded by empty gestures to Indians but had no real contact with first Arizonans in my life. The land was treated as empty of the traces and traditions of the people who had lived there, a tabula rasa onto which developers could build tract houses.

And so, growing up, I made the mistake that I think is pretty common among some Arizonans of assuming that there’s no history to be found here. I was participating in an act of mass forgetting.

Getting into the guts of AT

It’s the thing that we keep saying — in order to deal with our masses of stuff better, we need better ways of understanding what we have. A lot of my questions aren’t just about what’s in our finding aids — they’re about the relationships between archival materials and other archival management functions — accessioning, digital object management, location management, container management. For instance, the following questions have come up in the past or could easily come up in the future:

  • Which collections are constituted of accessions that came in before 1980?
  • Which collections have digital objects associated with them? What are the URIs of those digital objects?
  • I have a barcode for a box. Can you tell me the materials that are supposed to be in that box? What collection is this from?
  • We haven’t used this location listed in the location table since 2005! Are there any boxes associated with that location? What are they?

In order to answer these questions, I need to write reports that join different tables in Archivists’ Toolkit together. And this is a little bit tricky, because in their own way, components in the AT database are hierarchical (just like in an EAD-encoded finding aid). If I have an instance (a container with a barcode), and I want to know which collection it belongs to, I don’t have a direct relationship in the database. Instead, an instance is associated with a component. That component is associated with its parent component. It may have a lot of ancestor components before the highest-level component is associated with the collection-level information in the resource table.

These relationships are made in SQL through what are called “joins”. And joining a table on itself (in some cases several times, recursively) is a huge friggin pain in the neck. So, after mucking around for a little while, the solution was to just ask someone smarter than me how he would handle this.

This is where my colleague Steelsen comes in — Steelsen introduced the idea of writing a stored procedure that would look for the top-most component instead of having to do this through joins. And then he wrote them for me, because he is a mensch of the first order. His procedures are here, and available to anyone who might find them useful. They have seriously revolutionized the way that I’ve been able to do reporting and solve problems.
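Just to make the idea concrete, here’s a rough sketch of what a function like GETRESOURCEFROMCOMPONENT might look like under the hood. This is not Steelsen’s actual code, and I’m guessing at a couple of column names (parentResourceComponentId, and resourceId on the components table), so check them against your own AT schema before trying anything like this:

-- Sketch only: walk up the component tree until there is no parent,
-- then return the resourceId recorded on the top-most component.
DELIMITER //
CREATE FUNCTION GETRESOURCEFROMCOMPONENT(componentId BIGINT)
RETURNS BIGINT
READS SQL DATA
BEGIN
    DECLARE currentId BIGINT DEFAULT componentId;
    DECLARE parentId BIGINT;
    DECLARE resource BIGINT;

    walk: LOOP
        SELECT parentResourceComponentId, resourceId
          INTO parentId, resource
          FROM ResourcesComponents
         WHERE resourceComponentId = currentId;

        IF parentId IS NULL THEN
            -- No parent left: this is the top component, which knows its resource.
            RETURN resource;
        END IF;

        SET currentId = parentId;
    END LOOP walk;
END //
DELIMITER ;

GETTOPCOMPONENT works on the same principle, except that it returns the last resourceComponentId it saw on the way up instead of the resourceId.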

For instance, something that folks have been begging for is a barcode look-up tool — they have a barcode, and they want to know which collection it belongs to, what its call number is, which location it’s assigned to, and which components are associated with that box. So here’s what I wrote (the user indicates the barcode in the where statement):

use schema;
SELECT 
 CONCAT(r.resourceIdentifier1,
 ' ',
 LPAD(r.resourceIdentifier2, 4, '00')) 'Collection',
 r.title 'Collection Title',
 series.subdivisionIdentifier 'Series/Accession Number',
 series.title 'Series Title',
 rc.title 'Component Title',
 rc.dateExpression 'Component Date',
 adi.container1Type 'Container Type',
 adi.container1NumericIndicator BoxNum,
 adi.container1AlphaNumIndicator BoxAlpha,
 adi.container2NumericIndicator FolderNum,
 adi.container2AlphaNumIndicator FolderAlpha,
 adi.archDescriptionInstancesId InstanceID,
 adi.barcode Barcode,
 adi.userDefinedString1 'Voyager Info',
 loc.LocationId,
 loc.building,
 loc.room,
 loc.coordinate1NumericIndicator ShelfNum,
 loc.coordinate1AlphaNumIndicator ShelfAlpha
FROM
 ArchDescriptionInstances adi
 INNER JOIN
 ResourcesComponents rc ON adi.resourceComponentId = rc.resourceComponentId
 INNER JOIN
 LocationsTable loc ON adi.locationID = loc.locationID
 INNER JOIN
 Resources r ON r.resourceId = GETRESOURCEFROMCOMPONENT(rc.resourceComponentId)
 LEFT OUTER JOIN
 ResourcesComponents series ON GETTOPCOMPONENT(rc.resourceComponentId) = series.resourceComponentID
WHERE
 adi.barcode = 39002042658774;

Here I use two of Steelsen’s procedures. In GETRESOURCEFROMCOMPONENT, I go up the tree of a component to find out what resource it belongs to and join that to the resource. I use GETTOPCOMPONENT to help figure out what series a component belongs to (this assumes that the top-most component is a series, but that’s usually a safe bet for us).

I’m a sql n00b, and this isn’t the most efficient query I’ve ever run, but I’m really happy with the results, which can be viewed in a spreadsheet here.

By changing the where statement, I can find out all kinds of associated information about a location, a collection, a box, whatever. I can find out if barcodes have been assigned to components with different box numbers; I can find out if components with the same barcode have been assigned to more than one location. This set of procedures has really been a godsend to help me know more about the problems I’m fixing. So many thanks to Steelsen. I hope others find them useful too.
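To give a flavor of what those checks look like, here’s one of them — a sketch that flags any barcode assigned to more than one location, using the same ArchDescriptionInstances table as above:

SELECT
 adi.barcode,
 COUNT(DISTINCT adi.locationId) LocationCount
FROM
 ArchDescriptionInstances adi
WHERE
 adi.barcode IS NOT NULL AND adi.barcode <> ''
GROUP BY adi.barcode
HAVING COUNT(DISTINCT adi.locationId) > 1;

Swap locationId for container1NumericIndicator and you have the barcode-assigned-to-different-box-numbers check instead.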

Another Quick One — Locations where Accessions have been Assigned

If you assign accessions to locations, but move them around to a final home after processing, it may be helpful to see where your accessions were assigned and when the record was last touched. This query will help you do a little clean-up:

SELECT
Accessions.title,
Accessions.accessionNumber1,
Accessions.accessionNumber2,
Accessions.accessionNumber3,
Accessions.lastUpdated,
LocationsTable.building,
LocationsTable.room,
LocationsTable.coordinate1AlphaNumIndicator Shelf,
AccessionsLocations.accessionId,
AccessionsLocations.locationId
FROM
yale_dev.AccessionsLocations
INNER JOIN
Accessions ON Accessions.accessionId = AccessionsLocations.accessionId
INNER JOIN
LocationsTable ON AccessionsLocations.locationId = LocationsTable.locationId;

Here’s an example of some output. We may check, for instance, accessions from before 2015 to make sure that the accession location is still relevant. I hope this is useful to someone else!

Title                                | Accn       | Date last modified  | loc       | accnID | locID
Yale Guidance Nursery yearly reports | 2010 A 085 | 2010-06-22 12:18:55 | SML XXX X | 7077   | 1933
Margenau, Henry, papers              | 2010 M 053 | 2010-11-16 15:51:37 | SML XXX X | 7078   | 1940

Quick Query — Finding Locations where Nothing is Assigned in Archivists’ Toolkit

I just wrote a quick query that returns records in the locations table in Archivists’ Toolkit that don’t have any instances or accessions assigned to them. This sounds like a pretty common thing that folks want to see — here it is:

SELECT
*
FROM
LocationsTable loc
WHERE
loc.locationId BETWEEN 0 AND 10000
AND loc.locationId NOT IN (SELECT
locationId
FROM
ArchDescriptionInstances containers
WHERE
containers.locationId BETWEEN 0 AND 10000)
AND loc.locationId NOT IN (SELECT
locationId
FROM
AccessionsLocations accession
WHERE
accession.locationId BETWEEN 0 AND 10000);
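If the NOT IN subqueries ever get slow on a big instances table, the same report can be written as an anti-join. This is just a sketch of an alternative formulation, using the same tables:

SELECT
 loc.*
FROM
 LocationsTable loc
 LEFT OUTER JOIN
 ArchDescriptionInstances adi ON adi.locationId = loc.locationId
 LEFT OUTER JOIN
 AccessionsLocations al ON al.locationId = loc.locationId
WHERE
 adi.locationId IS NULL
 AND al.locationId IS NULL;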

Making DACS Dates

Manipulating date strings (which is the data type we usually have in archival description), particularly when you have a lot of legacy data, is a pain. I was working with a friend to update some legacy data in her finding aid, and it occurred to me that there isn’t a lot of direct guidance out there about how to manipulate dates with various tools. So, here’s a run-down of some of my methods — please feel free to add your own in the comments.

Why does this matter?

I’ll be honest, in a lot of situations, date formats don’t matter at all. I’ve said it before and I’ll say it again — we put a whole lot of effort into creating structured data, considering that most of us just flatten it into HTML and put it up as a webpage. However, there is a brighter tomorrow. With structured data, you can make far better interfaces, and there are really nice examples of places that let you do stuff with date data.

In the Princeton finding aids site, you can sort by title, date, or container. This means that in a series like this, in the George F. Kennan papers, where the archivist (or possibly creator) filed by title, this isn’t the only way to look through materials.

George Kennan finding aid: the order of materials as they are presented.

George Kennan finding aid: the order of materials, sorted by me (the user) by date ascending.

Letting users sort by title or date means a few things — we can stop wasting time with alpha or chron arrangement and spend more of our energies on the true value that archivists add to description — context, meaning, transparency — without worrying that there’s too much for the researcher to sort through. It also means that we don’t have to presume that a researcher’s primary discovery vector is either time or title — we can let her choose for herself. Finally, and most importantly, we can let original arrangement schemes and organic order (the true intellectual basis of arrangement) reign supreme.

The other reason why date formats are important is because our content standard tells us they are. Now, I personally think that it’s actually far more important to associate an ISO-compliant date with a descriptive component, which can then be rendered any way you want, but since until recently our tools didn’t support that very well, I think that the DACS format of YYYY Month D brings us a step closer to easier date clean-up and extracting ISO compliant dates from date expressions.

Excel

Excel, odi et amo (I hate it and I love it). Excel offers a GUI for programming-ish functions, but I find as I do more and more advanced stuff that I get frustrated by not knowing what exactly is happening with the underlying data. Dates are particularly frustrating, since Excel stores dates as a serial number starting with January 1, 1900. As an archivist who has PLENTY OF DATES from before then, this can lead to rage. There are a few ways to deal with this — if your dates are all 20th or 21st century, congratulations! You don’t have a problem. There are ways to get Excel to change the way it assigns serial numbers to allow for negative numbers, which lets you do the normal sorting and date re-formatting. Or, you can store everything as text and move each part of the date string to its own column to manipulate it.

So, an example of a clean-up project:

Excel Dates -- untouched

In this data, we have a bit to clean up. When I start a clean-up project, I usually start with a pencil-and-paper list of all of the steps that I need to go through before I change anything. This way, I see if I need to do research about how to do a step, and I can also see if there are dependencies in the data that may require me to sequence these steps in a particular way. When you’re first learning, it’s easy to jump right in without planning, but trust me — every time I’ve been burned by automation it’s because I didn’t plan. In a live data environment, you should always know what the computer’s going to do before you run a command, even if that command is just a formula in Excel. The flip side of this is, of course, that as long as you have good back-ups, you should feel free to experiment and try new things. Just make sure you make the effort to figure out what actually happened when your experimenting suddenly produces the results you want to see.

So, here’s my list of steps to perform on this data.

  • Check my encoding, which in this case just means which data is in which columns. Do you see the row where some of the date data is in the title column? It’s in row 4. I would probably survey the data and see how prevalent this kind of problem is. If it’s just a handful of errors, I’ll move the data over by hand. If there’s more, I’ll figure out a script/formula to automate this.
  • Check for unwanted characters. In this case, get rid of brackets. In case you haven’t heard, brackets are not a meaningful way of indicating uncertainty to researchers. There is a certainty attribute on <unitdate> for that, which can then be rendered in your institution’s EAD -> HTML stylesheet. However, my problem with brackets is more fundamental — in archival description, the date element is just a transcription of what we see on the record. We don’t actually know that this date represents anything. So in reality, these are all guesses to varying degrees of certainty, with the aim of giving the researcher some clue to time.
  • Fix the date format. DACS dates are YYYY Month D. (e.g., 2015 March 6)
  • Create an ISO date to serialize as a @normal attribute on <unitdate> (see the sketch just after this list)
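To make the bracket and normalization points concrete, here’s roughly what a cleaned-up date could end up looking like in the EAD (a sketch, not from the actual finding aid, and the values are just for illustration):

 <unitdate certainty="approximate" normal="1851-03-06">1851 March 6</unitdate>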

Let’s skip the obvious clean-up tasks and go straight to formatting dates. If everything is after 1900 (and if everything is a three-part date), this is really straightforward.

First, create a new column. Use the DATEVALUE formula to tell Excel to regard your date string as a date value — if your date string is in B2, your formula in C2 should be:

=DATEVALUE(B2)

Double-click on the bottom right corner of the cell to have that formula apply to the whole column.

Now that Excel knows that this is a date, you simply need to give it the format you want to see, in this case, yyyy mmmm d.

Choosing a custom date format.

This works great for three-part dates after 1900. If that’s not your situation, there are a few things you can do. One of my favorite methods is to filter the date list to each of the different date types and apply the custom date format to each of these (trying to apply a custom date format to a date that doesn’t fit the type will result in really confusing and bad results). Another option is to split the date into three different columns, treat each like text, and then bring them back together in the order you want with the CONCATENATE formula. Play around — Excel doesn’t make it easy, but there are lots of options.
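If you go the split-into-columns route, the re-assembly is just one more formula. Assuming (and these cell references are hypothetical) that year, month, and day ended up as text in C2, D2, and E2, something like this puts them back in DACS order:

=CONCATENATE(C2, " ", D2, " ", E2)

Double-click the corner of the cell to fill the column, then paste the results back in as values so they stop depending on the helper columns.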

OpenRefine

If you do a lot of data manipulation, I would definitely encourage you to stop torturing yourself and learn OpenRefine. I use it every day. OpenRefine uses something called GREL (Google Refine Expression Language — I wonder if they’ll be changing that to OREL now that this isn’t under the Google umbrella?), which is trickier to learn than Excel formulas but a lot more powerful and more in alignment with other programming languages. In fact, I should say that you only need to learn GREL for the fancy stuff — a lot of OpenRefine’s magic can be done through the GUI.

So, looking through this data set, I would do a lot of the same steps. One option is to just use the commands Edit Cells -> Common Transformations -> To Date, but unfortunately, most of these strings aren’t written in a way that OpenRefine can recognize as dates.

The best path forward is probably to split this date string apart and put it back together. You could split by whitespace and turn them into three columns, but since some dates are just a year, or a year and a month, you wouldn’t necessarily have each of the three parts of the date in the columns where you want them.

So, I’m going to tell OpenRefine what a year looks like and ask it to put the year in its own column.

This formula pulls the year from the date string and puts it into its own column.

In this formula, I’m partitioning the string by a four-digit number and then taking that part of the partition for my new column. In the case of the year, the formula is:

value.partition(/\d{4}/)[1]

For a month it’s:

value.partition(/[A-Za-z]+/)[1]

And for the day it’s:

value.partition(/\d{1,2}/)[1]

There may be a more elegant way of partitioning this all as one step, but I don’t yet know how!

Then, once you have each of these parts of the date in their own columns, they should look like this:

Each part of the date element is in its own column.

The final step is to put the pieces back together in the order you want them. You can do this by clicking on the Year column, and selecting “Create column based on this column.” Then, use GREL to put everything in the order that you want to see it.

The plus signs signify that everything should be smushed together — pay attention to the syntax of calling the value of columns.

The formula for this is:

value + " " + cells["Month"].value + " " + cells["Day"].value

And voila, you’ve turned your non-DACS date into a DACS-formatted date. You can use similar steps to make a column that creates an ISO-formatted date, too, although you’ll first have to convert months into two-digit numbers.
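For what it’s worth, once the months are two-digit numbers (say, in a column called MonthNum; that column name is mine, not anything OpenRefine gives you), the ISO version of a three-part date is one more GREL expression, again built from the Year column:

value + "-" + cells["MonthNum"].value + "-" + if(cells["Day"].value.length() == 1, "0" + cells["Day"].value, cells["Day"].value)

The if() just zero-pads single-digit days; year-only and year-and-month dates would need their own handling.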

Finally, SQL

The two methods above require ETL — extract, transform, load. That is, you’re going to get data out of the database (or transform it into a tabbed sheet from xml) and then get it back into the database or the EAD (and then the database). There is a better way if you’re using Archivists’ Toolkit or ArchivesSpace, and it involves doing SQL updates. I’m going to punt on this for now, because I know that this will be a huge part of my future once we get into ArchivesSpace (I’ll also be creating normalized dates, which is data that Archivists’ Toolkit can’t store properly but ASpace can). So, stay tuned!

dwn w/ abbrs

Maybe you’ve heard — library and archives content standards are NOT DOWN with abbreviations these days. This is part of an effort to recognize that we no longer have to fit our descriptions onto catalog cards, and that the less confusing jargon we present to our patrons, the better!

In my repository, we made an official switch from abbreviated months to spelled-out months. However, there was still an enormous corpus of legacy abbreviations in Archivists’ Toolkit.

This is the kind of problem that power tools are great at solving. Check out my SQL, posted here, for a big, fat find-and-replace that goes through the date expression field in AT and makes everything better. It looks for month abbreviations, common misspellings, variations of the term “circa” and variations on “undated.” I have my eye on brackets, too, but it might be too hairy to deal with them right now. Please feel free to use this in your own repository.
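If you just want the shape of the thing, it boils down to a pile of UPDATE statements along these lines (a sketch of the pattern rather than the posted script itself; back up your database and check the table and column names against your own AT instance before running anything like this):

-- One statement per abbreviation; the real script covers every month,
-- 'circa' variants, 'undated' variants, and common misspellings.
UPDATE ResourcesComponents
SET dateExpression = REPLACE(dateExpression, 'Jan.', 'January')
WHERE dateExpression LIKE '%Jan.%';

UPDATE ResourcesComponents
SET dateExpression = REPLACE(dateExpression, 'Sept.', 'September')
WHERE dateExpression LIKE '%Sept.%';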

Also, stay tuned for part two, where I dispose of contemptible manuscript tradition abbreviations!

Happy New Year!

As we finish our first week back at work in 2015, we thought it might be nice to reflect on what we accomplished in 2014 and what our resolutions are for this year.

Looking Back

Carrie

As I type this I am sitting in a living room piled high with boxes and strewn with bubble wrap and packing tape.  I finished my six and a half year run at Columbia on Friday and will be starting a new position at Emory University’s Manuscript, Archives, & Rare Book Library at the beginning of next month.

This past year was full of professional changes.  I got a new director, moved offices, our library annexed another unit which landed under my supervision, and our University Librarian retired at the end of the year.  Amidst all of this, though, my team and I managed to hit some pretty major milestones in the middle of the chaos and change-related-anxiety.  We completed a comprehensive collection survey that resulted in DACS compliant collection level records for all of our holdings, we published our 1000th EAD finding aid, and kept up with the 3000 plus feet of accessions that came through our doors.

Cassie

Last year I spent a lot of time learning how to work with data more effectively (in part thanks to this blog!). I used OpenRefine and regular expressions to clean up accessions data. Did lots of ArchivesSpace planning, mapping, and draft policy work. Supervised an awesome field study. Participated in our Aeon implementation. Began rolling out changes to how we create metadata for archival collections and workflows for re-purposing the data. I also focused more than I ever have before on advocating for myself and the functions I oversee. This included a host of activities, including charting strategic directions, but mainly comprised lots of small conversations with colleagues and administrators about the importance of our work and the necessity of making programmatic changes. I also did a ton of UMD committee work. Oh, and got married! That was pretty happy and exciting.

Maureen

2014 was my sixth year working as a professional archivist, and continued my streak (which has finally ended, I swear) of being a serial short-timer. Through June of last year, I worked with a devoted team of archives warriors at the Tamiment Library and Robert F. Wagner Labor Archives. There, we were committed to digging ourselves out of the hole of un-described resources, poor collection control, and an inconsistent research experience. Hence, my need for this blog and coterie of smart problem solvers. I also gave a talk at the Radcliffe Workshop on Technology and Archival Processing in April, which was an archives nerd’s dream — a chance to daydream, argue, and pontificate with archivists way smarter than I am.

In June I came to Yale — a vibrant, smart, driven environment where I work with people who have seen and done it all. And I got to do a lot of fun work where I learned more about technology, data, and archival description to solve problems. And I wrote a loooot of blog posts about how to get data in and out of systems.

Meghan

It kind of feels like I did nothing this past year, other than have a baby and then learn how to live like a person who has a baby. 2014 was exhausting and wonderful. I still feel like I have a lot of tricks to learn about parenting; for example, how to get things done when there is a tiny person crawling around my floor looking for things to eat.

Revisiting my Outlook calendar reminds me that even with maternity leave, I had some exciting professional opportunities. I proposed, chaired, and spoke at a panel on acquisition, arrangement, and access for sexually explicit materials at the RBMS Conference in Las Vegas, and also presented a poster on HistoryPin at the SAA Conference in Washington, D.C. Duke’s Technical Services department continues to grow, so I served on a number of search committees, and chaired two of them. I continue to collaborate with colleagues to develop policies and guidelines for a wide range of issues, including archival housing, restrictions, description, and ingest. And we are *this close* to implementing ArchivesSpace, which is exciting.

Looking Forward

Carrie

I have so much to look forward to this year! I’m looking forward to learning a new city, to my first foray into the somewhat dubious joys of homeownership, and to being within easy walking distance of Jeni’s ice cream shop. And that’s all before I even think about my professional life. My new position oversees not only archival processing, but also cataloging and description of MARBL’s print collections, so I will be spending a lot of time learning about rare book cataloging and thinking hard about how to streamline resource description across all formats.

Changing jobs is energizing and disruptive in the best possible way, so my goal for the year is to settle in well and to learn as much as possible — from my new colleagues, from my old friends, and from experts and interested parties across the profession.

Cassie

I am super excited to be starting at the Orbis Cascade Alliance as a Program Manager in February. I’ll be heading up the new Collaborative Workforce Program covering the areas of shared human resources, workflow, policy, documentation, and training. The Alliance just completed migrating all 37 member institutions to a shared ILS. This is big stuff and a fantastic foundation to analyze areas for collaborative work.

While I can’t speak to specific goals yet, I know I will be spending a lot of time listening and learning. Implementing and refining a model for shared collaborative work is a big challenge, but has huge potential on so many fronts. I’m looking forward to learning from so many experts in areas of librarianship outside of my experiences/background. I’m also thrilled to be heading back to the PNW and hoping to bring a little balance back to life with time in the mountains and at the beach.

Maureen

I have a short list of professional resolutions this year. Projects, tasks, and a constant stream of email have a way of overshadowing what’s really important — I’ll count on my fellow bloggers to remind me of these priorities!

  • All ArchivesSpace, all the time. Check out the ArchivesSpace @ Yale blog for more information about this process.
  • I want to create opportunities for myself for meaningful direct interaction with researchers so that their points of view can help inform the decisions we make in the repository. This may mean that I take more time at the reference desk, do more teaching in classes, or find ways to reach out and understand how I can be of better service.
  • I want to develop an understanding of what the potential is for archival data in a linked data environment. I want to develop a vision of how we can best deploy this potential for our researchers.
  • I have colleagues here at Yale who are true experts at collection development — I want to learn more about practices, tips, tricks, pitfalls, and lessons learned.

Meghan

I have a few concrete professional goals for the coming year:

  • I want to embrace ArchivesSpace and learn to use it like an expert.
  • I will finish my SPLC guide — the print cataloging is finished, so as soon as I get a chance I will get back to this project.
  • I have requested a regular desk shift so that I can stay more connected to the researchers using the collections we work so hard to describe.
  • I am working more closely with our curators and collectors on acquisitions and accessioning, including more travel.
  • My library is finishing a years-long renovation process, so this summer I will be involved with move-related projects (and celebrations). Hopefully there will be lots of cake for me in 2015.

 

On Containers

I’m here  to talk about boxes. Get excited.

I’ve been spending a LOT of time lately thinking about containers — fixing them, modelling them, figuring out what they are and aren’t supposed to do. And I’ve basically come to the conclusion that as a whole, we spend too much time futzing with containers because we haven’t spent enough time figuring out what they’re for and what they do.

For instance, I wrote a blog post a couple of months ago about work we’re doing to remediate stuff that shouldn’t be happening with containers but is — barcodes being assigned to two different containers, two different container types with the same barcode/identifier information, etc. Considering the scale of our collections, the scale of these problems is mercifully slight, but these are the kinds of problems that turn into a crisis if a patron is expecting to find material in the box she ordered and the material simply isn’t there.

I’m also working with my colleagues here at Yale and our ArchivesSpace development vendor Hudson Molonglo to add functionality to ArchivesSpace so that it’s easier to work with containers as containers. I wrote a blog post about it on our ArchivesSpace blog. In short, we want to make it much easier to do stuff like assigning locations, assigning barcodes, indicating that container information has been exported to our ILS, etc. In order to do this, we need to know exactly how we want containers to relate to archival description and how they relate to each other.

As I’ve been doing this thinking about specific container issues, I’ve had some thoughts about containers in general. Here they are, in no particular order.

What are container numbers doing for us?

A container number is just a human-readable barcode, right? Something to uniquely identify a container? In other words, speaking in terms of the data model, isn’t this data that says something different but means the same thing? And is this possibly a point of vulnerability? At the end of the day, isn’t a container number something that we train users to care about when really they want the content they’ve identified? And don’t we have a much better system for getting barcodes to uniquely identify something than we do with box numbers?

In the days that humans were putting box numbers on a call slip and another human was reading that and using that information to interpret shelf location, it made sense to ask the patron to be explicit about which containers were associated with the actual thing that they want to see. But I think that we’ve been too good at training them (and training ourselves) to think in terms of box numbers (and, internally, locations) instead of creating systems that do all of that on the back end. Information about containers should be uniform, unadorned, reliable, and interact seamlessly with data systems. Boxes should be stored wherever is best for their size and climate, and that should be tracked in a locations database that interacts with the requesting database. And the actual information should be associated seamlessly with containers.

This means that instead of writing down a call number and box number and reading a note about how materials of this type are stored on-site and materials of another type are stored off-site, let’s take a lot of human error out of this. Let’s let them just click on what they want to see. Then, the system says “a-ha! There are so many connections in my database! This record is in box 58704728702861, which is stored in C-29 Row 11, Bay 2, Shelf 2. I’ll send this to the queue that prints a call slip so a page can get that right away!” And instead of storing box numbers and folder numbers in the person’s “shopping cart” of what she’s seen, let’s store unique identifiers for the archival description, so that if that same record gets re-housed into box 28704728702844 and moved to a different location, the patron doesn’t have to update her citation in any scholarly work she produces. Even if the collection gets re-processed, we could make sure that identifiers for stuff that’s truly the same persist.

Also, don’t tell me that box numbers do a good job of giving cues about order and scale. There are waaaaaayyyyy better ways of doing that than making people infer relationships based on how much material fits into 0.42 linear feet.

We have the concepts. Our practice needs to catch up, and our tools do too.

Darn it, Archivists’ Toolkit, you do some dumb things with containers

Archival management systems are, obviously, a huge step up from managing this kind of information in disparate documents and databases. But I think that we’re still a few years away from our systems meeting their potential. And I really think that folks who do deep thinking about archival description and standards development need to insert themselves into these conversations.

Here’s my favorite example. You know that thing where you’re doing description in AT and you want to associate a container with the records that you just described in a component? You know how it asks you what kind of an instance you want to create? That is not a thing. This is just part of the AT data model — there’s nothing like this in DACS, nothing like it in EAD. Actual archival standards are smart enough to not say very much about boxes because they’re boxes and who cares? When it exports to EAD, it serializes as @label. LABEL. The pinnacle of semantic nothingness!

This is not a thing.

Like, WHY? I can see that this could be the moment where AT is asking you “oh, hey, do you want to associate this with a physical container in a physical place or do you want to associate it with a digital object on teh interwebz?” but there’s probably a better way of doing this.

My problem with this is that it has resulted in A LOT of descriptive malpractice. Practitioners who aren’t familiar with how this serializes in EAD think that they’re describing the content (“oh yes! I’ve done the equivalent of assigning a form/genre term and declaring in a meaningful way that these are maps!”) when really they’ve put a label on the container. The container is not the stuff! If you want to describe the stuff, you do that somewhere else!

Oh my gosh, my exclamation point count is pretty high right now. I’ll see if I can pull myself together and soldier on.

Maybe we should be more explicit about container relationships.

Now, pop quiz, if you have something that is in the physical collection and has also been microfilmed, how do you indicate that?

In Archivists’ Toolkit, there’s nothing clear about this. You can associate more than one instance with an archival description, but you can also describe levels of containers that (ostensibly) describe the same stuff, but happen to be a numbered item within a folder, within a box.

Anything can happen here.

So this means that in the scenario I mentioned above, it often happens that someone will put the reel number into container 3, making the database think that the reel is a child of the box.

But even if all of the data entry happens properly, EAD import into Archivists’ Toolkit will take any three <container> tags and, instead of making them siblings, bring the three together into a parent-child instance relationship like you see above. This helps maintain relationships between boxes and folders, but is a nightmare if you have a reel in there.

EAD has a way of representing these relationships, but the AT EAD export doesn’t really even do that properly.

 <c id="ref10" level="file">
   <did>
     <unittitle>Potter, Hannah</unittitle>
     <unitdate normal="1851/1851">1851</unitdate>
     <container id="cid342284" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342284" type="Folder">2</container>
   </did>
 </c>

 <c id="ref11" level="file">
   <did>
     <unittitle>Potter, Horace</unittitle>
     <unitdate normal="1824/1824">1824</unitdate>
     <container id="cid342283" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342283" type="Folder">3</container>
   </did>
 </c>

Here, we see that these box 1’s are the same — they have the same barcode (btw, see previous posts for help working out what to do with this crazy export and barcodes). But the container id makes it seem like these are two different things — they have two different container ids, and their folders refer to two different parents.

What we really want to say is “This box 1 is the same as the other box 1’s. It’s not the same as reel 22. Folder 2 is inside of box 1, and so is folder 3.” Once we get our systems to represent all of this, we can do much better automation, better reporting, and have a much more reliable sense of where our stuff is.
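For what it’s worth, EAD 2002 can already express most of this. Here’s a sketch (emphatically not what AT exports) in which the box is declared once and both folders point at the same container id:

 <c id="ref10" level="file">
   <did>
     <unittitle>Potter, Hannah</unittitle>
     <unitdate normal="1851/1851">1851</unitdate>
     <container id="cid342284" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342284" type="Folder">2</container>
   </did>
 </c>

 <c id="ref11" level="file">
   <did>
     <unittitle>Potter, Horace</unittitle>
     <unitdate normal="1824/1824">1824</unitdate>
     <container parent="cid342284" type="Folder">3</container>
   </did>
 </c>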

So if we want to be able to work with our containers as they actually are, we need to represent those properly in our technology. What should we be thinking about in our descriptive practice now that we’ve de-centered the box?

“Box” is not a level of description.

In ISAD(G) (explicitly) and DACS (implicitly), archivists are required to explain the level at which they’re describing aggregations of records. There isn’t a vocabulary for this, but traditionally, these levels include “collection”, “record group”, “series”, “file” and “item.” Note that “box” is not on this list or any other reasonable person’s list. I know everyone means well, and I would never discourage someone from processing materials in aggregate, but the term “box-level processing” is like nails on a chalkboard to me. As a concept, it should not be a thing. Now, series-level processing? Consider me on board! File-group processing? Awesome, sounds good! Do you want to break those file groups out into discrete groups of records that are often surrounded by a folder and hopefully are associated with distinctive terms, like proper nouns? Sure, if you think it will help and you don’t have anything better to do.

A box is usually just an accident of administrivia. I truly believe that archivists’ value is our ability to discern and describe aggregations of records — that box is not a meaningful aggregation, and describing it as such gives a false impression of the importance of one linear foot of material. I’d really love to see a push toward better series-level or file-group-level description, and less file-level mapping, especially for organizations’ records. Often, unless someone is doing a known item search, there’s nothing distinct enough about individual files as evidence (and remember, this is why we do processing — to provide access to and explain records that give evidence of the past) to justify sub-dividing them. I also think that this could help us think past unnecessary sorting and related housekeeping — our job isn’t to make order from chaos*, it’s to explain records and their context of creation and use. If records were created chaotically and kept in a chaotic way, are we really illuminating anything by prescribing artificial order?

This kind of thinking will be increasingly important when our records aren’t tied to physical containers.

In conclusion, let’s leave the robot work to the robots.

If I never had to translate a call number to a shelf location again, it would be too soon (actually, we don’t do that at MSSA, but still). Let’s stop making our patrons care about boxes, and let’s start making our technology work for us.


* This blog’s title, Chaos –> Order, is not about bringing order to a chaotic past — it’s about bringing order to our repositories and to our work habits. In other words, get that beam out of your own eye, sucka, before you get your alphabetization on.

 

Book Review: Extensible Processing. Case Studies and Conclusion

And we’ve come to the end. For me, the most fun part of this book is the case studies at the end. Here, everything that Dan had been talking about in previous chapters comes together and we see the concrete ways that extensible processing principles help solve big problems (huge problems, really — repositories in disarray huge, processing 2,500 feet in two years huge, giving access to huge volumes of records without violating HIPAA huge).

Instead of going through each case study, I thought I would pull out some winning strategies that helped archivists move mountains. But first the roll-call of devoted archivists taking smart approaches to their projects (I’ve tried to link to relevant materials online — really, though, read the case studies in Dan’s book).

So, what worked really well? What made it possible for these archivists to do such amazing remediation and program-building work?

  • Focus, deadlines, and scoping a project properly are the winning combination to finish a project. Giving a project a finite timeline forces participants to articulate our central values. Don’t let yourself become consumed by unimportant details.
  • Change your repository today to avoid the backlog of tomorrow — start with accessioning. A lot of what’s done as processing in these projects amounts to retrospective accessioning (getting intellectual and physical control, understanding groupings of materials by creator/function, hunting for any agreements with donors that may impact access, use, or permission to dispose of materials), but with important information lost to time. Dan’s chapter on accessioning and Audra Eagle Yun’s case study on building an accessioning program make such a strong case that you’ll never know more about these materials than at the moment they come through the door, so that’s the time to determine and meet a baseline level of control.
  • Re-use existing description — wherever you may find it — whenever possible. Creators know much more about their records than the rest of us ever will — Adriane’s case study made a great case for finding, recording, and re-using high-level description to help stay at a high-level understanding of the records. This means that you need to get comfortable with description as data, so that you can make it good and put it where it belongs. Maybe some posts on this blog can help you think through that!
  • If you’re in a position of responsibility over people, processes or systems, be smart about how you spend your time. Create a ranked list of the biggest things that you could do to improve access to the records you collect. Maybe that’s working with IT to make sure that the time-consuming, nagging process that your staff has to work around gets fixed. Maybe that means filling some training gaps. Maybe this means that you stop processing on a single collection and organize a survey of your entire holdings. Maybe it’s making sure you have a better database for tracking locations. If you ever find yourself saying, “I’m just too busy to think it through,” you’re already in the danger zone — you’re implicitly admitting that the way work is being done now is probably not the best way. Put two hours on your calendar to do some deep thinking, read these case studies, consult with your colleagues, and make sure that work is being done the way that works best for everyone.
  • Principles are sacred, procedures are not. You’re here to provide authentic, reliable evidence of the past through the records people leave behind which others can access during the course of research. Make sure that every procedure in your repository exists in service to that goal. Maybe this means that instead of doing item-level review for restrictions, you figure out that it makes more sense from an access and resources perspective to do review on demand. Maybe this means that you allocate staff that used to do arrangement to doing digitization.

Like we’ve said all week, this is a great book — practical, principled, helpful, approachable, and rooted in the language and values of archivists. Anyone seeking a way to improve her own practice will find valuable advice.

Book Review: Extensible Processing. Why Extensible Processing is Essential

This week, our core group of editors will review Extensible Processing for Archives and Special Collections: Reducing Processing Backlogs by Daniel A. Santamaria.


Many successful archival repositories have, for a very long time, operated in ways to make sure that their practices scale to their collections sizes, staffing resources, and user needs. But it seems that it’s only been in the last ten years, since the publication of Mark Greene and Dennis Meissner’s “More Product, Less Process: Revamping Traditional Archival Processing” and the associated cascade of conference presentations, case studies, and affiliated articles, that processing procedures as a whole have moved toward something that we can talk about, think critically about, and ultimately re-examine the purpose of.

This book provides the first comprehensive framework that I’ve seen about how to run a repository based on extensible processing principles — principles that are firmly rooted in deeply-held archival values and the logical extension of Greene and Meissner’s argument that every procedure in a library needs to be held to the scrutiny of materials’ availability for use. And, since this blog is largely about repository-wide projects (and shifting our thinking toward taking care of everything in our care instead of thinking about processing project after processing project), it seems like an excellent fit for our interests and audience.

Chapter one starts with a sobering analysis of the backlog problem. In short, backlogs are growing, staffing is flat, collecting continues, the records we collect as evidence of our creators’ lives and work are more voluminous than ever, and few of us are doing anything differently to help address the fact that patrons can’t see our collections. He pulls what I found to be a shocking statistic — according to an OCLC research survey of special collections libraries in late 2010, internet-accessible finding aids only exist for 44% of collections [1], despite the fact that it seemed like one couldn’t throw a rock at a conference between 2005-2010 without hitting someone having a discussion about Greene and Meissner’s article.

So, there’s obviously a problem. Despite MPLP’s very good advice that we need to be willing to look at our work differently if we want to overcome the problem of scale, it’s simply not happening in too many repositories. And here, I think, is where this book makes an important intervention in the archival literature.  Santamaria provides reasoned, step-by-step advice toward building a program where patrons are better served, donors’ expectations are met, and staff aren’t constantly trying to climb out from a hole of tasks yet to be performed with no relief in sight.

Given the choice, it’s a lot more professionally satisfying to work in a place that doesn’t accept the inevitability of backlogs. I worked for Dan at Princeton from the beginning of 2011 through 2013. If you’re wondering what it’s like to work at a place with a true philosophy of access first, and where one examines, each time, what processing means for that collection (and in the context of the other work that needs to be done) and why you’re doing it that way — well, it’s a lot of fun. I had come in at a particularly exciting time — because of the smart decisions that Dan and other archivists at Mudd had made in years previous, the backlog was dead. We were able to work on projects (like the Princeton Finding Aids site), that relied on creative engagement with our description, our materials, and our users. I believe that this kind of project was only possible because Dan had already built a culture of intellectual engagement with our work, where each member of the team understood our mission and the purposes of archival description.

For anyone overwhelmed by her repository, things can be different. But relief can only come if you’re willing to take a hard look at why you do what you do. More than that, you might have to spend more time managing and planning (and less time treading water, hoping that change will come externally). Chapter two provides six principles for an extensible processing program.

  1. Create a baseline level of access to all collections material
  2. Create standardized, structured description
  3. Manage archival materials in the aggregate
  4. “Do no harm”: limit physical handling and processing
  5. Iterate: conduct further processing in a systematic but flexible way
  6. Manage processing holistically

I believe that what separates professional archivists from interested enthusiasts is a commitment to managing our time in ways that are best for researchers and collections. This book makes a compelling case for a deliberate approach, which requires that archivists make prudent decisions and hard choices every day.

Throughout this book… emphasis is placed on decision-making, prioritization, and adherence to archival principles and standards — concepts that apply to archivists at many levels and in every kind of organization. [2]

I’m convinced that we all have the capability to approach our work this way — but that 44% number doesn’t lie. We need to treat the problem of backlogs like the crisis it is. I look forward to Meghan’s review tomorrow, which will cover chapters 3-5 and discuss concrete steps any archivist can take to effectively manage processing and kill the backlog.


[1]  Santamaria 2, quoting Dooley, Jackie and Katherine Luce. “Taking Our Pulse: The OCLC Research Survey of Special Collections and Archives.” OCLC Research, 2010. It’s interesting that according to the survey, 74% of collections would have online finding aids if analog copies were converted and made available online.

[2] Santamaria, Daniel A. Extensible Processing for Archives and Special Collections: Reducing Processing Backlogs. Chicago: ALA Neal-Schuman, 2015, X