Ethical Internships: Mentoring the Leaders We Need

I gave this talk last Friday to the Arizona Archives Association annual symposium — many thanks to that group for their excellent ideas and discussion, and for their strong sense of mission and values.

I wanted to start by explaining how excited I am to be here with you, and what it means to me to be an archivist speaking to a room of Arizona archivists. I grew up in Arizona, in Maricopa County in an area called Ahwatukee, which is a neighborhood on the south side of South Mountain, misnamed by the original white landowners for the Crow phrase for “land in the next valley.” Obviously the Crow people never lived anywhere near Arizona. The Crow are a northern plains tribe who lived in Wyoming and were forcibly moved to Montana. And so it is especially strange to me that the area was given a Crow name when we consider that Ahwatukee is bounded to the south by the Gila River Indian Community.

Crow (Apsaroke) Indians of Montana — “Holds the Enemy” by Edward Curtis. Library of Congress Prints and Photographs Division

What does it tell us about the regard that Dr. and Mrs. Ames, the landowners who named the area, had for their American Indian neighbors that they used the language of a group far enough away to be largely irrelevant to their lives instead of the language of their immediate neighbors? I have to assume that they were caught up in popular romantic notions of American Indians, possibly best represented in the photographs of Edward Curtis, who aestheticized and fictionalized American Indians at precisely the moment when it was clear that there would be no more Indian wars and that the United States government’s program of forced removal had successfully met its intended ends.

This founding vignette resonates with me, because I see reverberations of it in my experience growing up in Ahwatukee. My middle school was named for the Akimel O’odham, the Pima people, who reside in Arizona, and our school donned bright turquoise and copper, vaguely pan-Indian pictographs. This was all done with a sharp lack of specificity; it gave the impression that American Indian culture is a stylistic flourish instead of a tradition, culture and worldview. Looking at it now, this divide between seeing American Indians as a people and seeing them as a trace on the now white-occupied land is especially cruel when you consider the persistent inequities that American Indians in Arizona encounter today. Indeed, during the last census there were only 738 American Indian-identified people living in Ahwatukee, which has the wealthiest and one of the whitest school districts in Arizona. I was surrounded by empty gestures to Indians but had no real contact with first Arizonans in my life. The land was empty of traces and traditions of people who had lived there, considered a tabula rasa onto which developers could build tract houses.

And so, growing up, I made the mistake that I think is pretty common among some Arizonans of assuming that there’s no history to be found here. I was participating in an act of mass forgetting.

Getting into the guts of AT

It’s the thing that we keep saying — in order to deal with our masses of stuff better, we need better ways of understanding what we have. A lot of my questions aren’t just about what’s in our finding aids — they’re about the relationships between archival materials and other archival management functions — accessioning, digital object management, location management, container management. For instance, the following questions have come up in the past or could easily come up in the future:

  • Which collections are constituted of accessions that came in before 1980?
  • Which collections have digital objects associated with them? What are the URIs of those digital objects?
  • I have a barcode for a box. Can you tell me the materials that are supposed to be in that box? What collection is this from?
  • We haven’t used this location listed in the location table since 2005! Are there any boxes associated with that location? What are they?

In order to answer these questions, I need to write reports that join different tables in Archivists’ Toolkit together. And this is a little bit tricky, because in their own way, components in the AT database are hierarchical (just like in an EAD-encoded finding aid). If I have an instance (a container with a barcode), and I want to know which collection it belongs to, I don’t have a direct relationship in the database. Instead, an instance is associated with a component. That component is associated with its parent component. It may have a lot of ancestor components before the most high-level component is associated with the collection-level information in the resource table.

These relationships are made in sql through what are called “joins”. And joining a table on itself (in some cases several times, recursively) is a huge friggin pain in the neck. So, after mucking around for a little while, the solution was to just ask someone smarter than me how he would handle this.

This is where my colleague Steelsen comes in — Steelsen introduced the idea of writing a stored procedure that would look for the top-most component instead of having to do this through joins. And then he wrote them for me, because he is a mensch of the first order. His procedures are here, and available to anyone who might find them useful. They have seriously revolutionized the way that I’ve been able to do reporting and solve problems.
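The idea of climbing from a component to its top-most ancestor can be sketched outside the database. What follows is a minimal Python illustration of that idea, not Steelsen's procedures: the dictionary layout and the names `parent_of`, `resource_of`, `top_component`, and `resource_for` are all hypothetical, and the real AT tables carry many more fields.

```python
# Hypothetical toy data: each component id maps to its parent component id,
# or to None when the component sits directly under a resource.
parent_of = {
    10: None,   # a series, attached straight to the resource
    11: 10,     # a subseries under the series
    12: 11,     # a file under the subseries
    20: None,   # a second series in another collection
}

# Hypothetical: only top-level components record which resource they belong to.
resource_of = {10: "MS 1", 20: "MS 2"}


def top_component(component_id, parents):
    """Follow parent links upward until a component has no parent."""
    seen = set()
    current = component_id
    while parents[current] is not None:
        if current in seen:  # guard against a corrupted, circular hierarchy
            raise ValueError(f"cycle detected at component {current}")
        seen.add(current)
        current = parents[current]
    return current


def resource_for(component_id, parents, resources):
    """Find the resource by first climbing to the top-most component."""
    return resources[top_component(component_id, parents)]


print(top_component(12, parent_of))              # top-most ancestor of the file
print(resource_for(12, parent_of, resource_of))  # the collection it belongs to
```

A loop like this replaces an unknown number of self-joins with a single walk up the tree, which is the appeal of wrapping the logic in one reusable routine.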

For instance, something that folks have been begging for is a barcode look-up tool — they have a barcode, and they want to know which collection it belongs to, what its call number is, which location it’s assigned to, and which components are associated with that box. So here’s what I wrote (the user indicates the barcode in the where statement):

use schema;
-- The join types below are an assumption; switch to LEFT JOIN to keep instances that have no location.
SELECT
 CONCAT(r.resourceIdentifier1,
 ' ',
 LPAD(r.resourceIdentifier2, 4, '00')) 'Collection',
 r.title 'Collection Title',
 series.subdivisionIdentifier 'Series/Accession Number',
 series.title 'Series Title',
 rc.title 'Component Title',
 rc.dateExpression 'Component Date',
 adi.container1Type 'Container Type',
 adi.container1NumericIndicator BoxNum,
 adi.container1AlphaNumIndicator BoxAlpha,
 adi.container2NumericIndicator FolderNum,
 adi.container2AlphaNumIndicator FolderAlpha,
 adi.archDescriptionInstancesId InstanceID,
 adi.barcode Barcode,
 adi.userDefinedString1 'Voyager Info',
 loc.coordinate1NumericIndicator ShelfNum,
 loc.coordinate1AlphaNumIndicator ShelfAlpha
FROM
 ArchDescriptionInstances adi
 JOIN ResourcesComponents rc ON adi.resourceComponentId = rc.resourceComponentId
 JOIN LocationsTable loc ON adi.locationID = loc.locationID
 JOIN Resources r ON r.resourceId = GETRESOURCEFROMCOMPONENT(rc.resourceComponentId)
 JOIN ResourcesComponents series ON GETTOPCOMPONENT(rc.resourceComponentId) = series.resourceComponentID
WHERE
 adi.barcode = 39002042658774;

Here I use two of Steelsen’s procedures. In GETRESOURCEFROMCOMPONENT, I go up the tree of a component to find out what resource it belongs to and join that to the resource. I use GETTOPCOMPONENT to help figure out what series a component belongs to (this assumes that the top-most component is a series, but that’s usually a safe bet for us).

I’m a sql n00b, and this isn’t the most efficient query I’ve ever run, but I’m really happy with the results, which can be viewed in a spreadsheet here.

By changing the where statement, I can find out all kinds of associated information about a location, a collection, a box, whatever. I can find out if barcodes have been assigned to components with different box numbers; I can find out if components with the same barcode have been assigned to more than one location. This set of procedures has really been a godsend to help me know more about the problems I’m fixing. So many thanks to Steelsen. I hope others find them useful too.
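The two consistency checks mentioned above can also be run on exported report rows. Here is a minimal Python sketch, assuming the report has been reduced to (barcode, box number, location id) tuples; the sample rows and the helper name `conflicts` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows shaped like three columns of the barcode report:
# (barcode, box number, location id). The values are invented.
rows = [
    ("B001", "1", 100),
    ("B001", "1", 100),
    ("B002", "2", 101),
    ("B002", "2", 205),   # same barcode, second location
    ("B003", "3", 102),
    ("B003", "4", 102),   # same barcode, different box number
]


def conflicts(rows, column):
    """Return barcodes whose rows disagree in the given column index."""
    values = defaultdict(set)
    for row in rows:
        values[row[0]].add(row[column])
    return {barcode: sorted(found) for barcode, found in values.items() if len(found) > 1}


print(conflicts(rows, 1))  # barcodes attached to more than one box number
print(conflicts(rows, 2))  # barcodes assigned to more than one location
```

The same grouping could be done in SQL with GROUP BY and HAVING COUNT(DISTINCT ...) > 1; the Python version is just easier to poke at with a spreadsheet export.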

Another Quick One — Locations where Accessions have been Assigned

If you assign accessions to locations, but move them around to a final home after processing, it may be helpful to see where your accessions were assigned and when the record was last touched. This query will help you do a little clean-up:

SELECT LocationsTable.coordinate1AlphaNumIndicator Shelf  -- add the Accessions columns you want to see
FROM AccessionsLocations
JOIN Accessions ON Accessions.accessionId = AccessionsLocations.accessionId
JOIN LocationsTable ON AccessionsLocations.locationId = LocationsTable.locationId;

Here’s an example of some output. We may check, for instance, accessions from before 2015 to make sure that the accession location is still relevant. I hope this is useful to someone else!

Title                                 Accn        Date last modified   loc        accnID  locID
Yale Guidance Nursery yearly reports  2010 A 085  2010-06-22 12:18:55  SML XXX X  7077    1933
Margenau, Henry, papers               2010 M 053  2010-11-16 15:51:37  SML XXX X  7078    1940

Quick Query: Finding Locations where Nothing is Assigned in Archivists’ Toolkit

I just wrote a quick query to give records in the locations table in Archivists’ Toolkit that don’t have instances assigned to them. This sounds like a pretty common thing that folks want to see — here it is:

SELECT loc.*  -- selected columns are an assumption
FROM LocationsTable loc
WHERE loc.locationId BETWEEN 0 AND 10000
    AND loc.locationId NOT IN (SELECT containers.locationId
        FROM ArchDescriptionInstances containers
        WHERE containers.locationId BETWEEN 0 AND 10000)
    AND loc.locationId NOT IN (SELECT accession.locationId
        FROM AccessionsLocations accession
        WHERE accession.locationId BETWEEN 0 AND 10000);
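One thing worth knowing about this pattern: in standard SQL, NOT IN returns no rows at all if the subquery yields a NULL. The BETWEEN filters inside the subqueries happen to exclude NULL location ids, which may be part of why the query works (that is my guess, not something stated above). The sketch below demonstrates the effect with Python's built-in sqlite3 module rather than the database behind AT; the three tables are stripped-down, hypothetical stand-ins with a single column each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical stand-ins for the AT tables.
    CREATE TABLE LocationsTable (locationId INTEGER);
    CREATE TABLE ArchDescriptionInstances (locationId INTEGER);
    CREATE TABLE AccessionsLocations (locationId INTEGER);
    INSERT INTO LocationsTable VALUES (1), (2), (3), (4);
    INSERT INTO ArchDescriptionInstances VALUES (1), (NULL);
    INSERT INTO AccessionsLocations VALUES (2);
    """
)

# Without a filter, the NULL instance location poisons the NOT IN test.
naive = """
    SELECT loc.locationId FROM LocationsTable loc
    WHERE loc.locationId NOT IN (SELECT locationId FROM ArchDescriptionInstances)
      AND loc.locationId NOT IN (SELECT locationId FROM AccessionsLocations)
"""

# With the BETWEEN filter, NULLs never reach the NOT IN comparison.
filtered = """
    SELECT loc.locationId FROM LocationsTable loc
    WHERE loc.locationId NOT IN (SELECT locationId FROM ArchDescriptionInstances
                                 WHERE locationId BETWEEN 0 AND 10000)
      AND loc.locationId NOT IN (SELECT locationId FROM AccessionsLocations
                                 WHERE locationId BETWEEN 0 AND 10000)
    ORDER BY loc.locationId
"""

unused_naive = [row[0] for row in conn.execute(naive)]
unused_filtered = [row[0] for row in conn.execute(filtered)]
print(unused_naive)     # empty: the NULL hides every unused location
print(unused_filtered)  # locations with no instance and no accession
```

If your location ids run past 10000, the upper bound needs raising; a NOT EXISTS subquery is another common way to sidestep the NULL problem.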

Making DACS Dates

Manipulating date strings (which is the data type we usually have in archival description), particularly when you have a lot of legacy data, is a pain. I was working with a friend to update some legacy data in her finding aid, and it occurred to me that there isn’t a lot of direct guidance out there about how to manipulate dates with various tools. So, here’s a run-down of some of my methods — please feel free to add your own in the comments.

Why does this matter?

I’ll be honest, in a lot of situations, date formats don’t matter at all. I’ve said it before and I’ll say it again — we put a whole lot of effort into creating structured data, considering that most of us just flatten it into HTML and put it up as a webpage. However, there is a brighter tomorrow. With structured data, you can make far better interfaces, and there are really nice examples of places that let you do stuff with date data.

In the Princeton finding aids site, you can sort by title, date, or container. This means that in a series like this, in the George F. Kennan papers, where the archivist (or possibly creator) filed by title, this isn’t the only way to look through materials.

The order of materials as they are presented

The order of materials, sorted by me (the user) by date ascending.

Letting users sort by title or date means a few things — we can stop wasting time with alpha or chron arrangement and spend more of our energies on the true value that archivists add to description — context, meaning, transparency — without worrying that there’s too much for the researcher to sort through. It also means that we don’t have to presume that a researcher’s primary discovery vector is either time or title — we can let her choose for herself. Finally, and most importantly, we can let original arrangement schemes and organic order (the true intellectual basis of arrangement) reign supreme.

The other reason why date formats are important is because our content standard tells us they are. Now, I personally think that it’s actually far more important to associate an ISO-compliant date with a descriptive component, which can then be rendered any way you want, but since until recently our tools didn’t support that very well, I think that the DACS format of YYYY Month D brings us a step closer to easier date clean-up and extracting ISO compliant dates from date expressions.


Excel, odi et amo. Excel offers a GUI for programming-ish functions, but I find as I do more and more advanced stuff that I get frustrated by not knowing what exactly is happening with the underlying data. Dates are particularly frustrating, since Excel stores dates as a serial number starting with January 1, 1900. As an archivist who has PLENTY OF DATES from before then, this can lead to rage. There are a few ways to deal with this. If your dates are all 20th or 21st century, congratulations! You don’t have a problem. There are ways to get Excel to change the ways it assigns serial numbers, to allow for negative numbers, which lets you do the normal sorting and date re-formatting. Or, you can store everything as text and move each part of the date string to its own column to manipulate it.
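To make the serial-number behavior concrete, here is a small Python sketch of the arithmetic, assuming Excel's default 1900 date system. The function names are my own, and the sketch only mimics the numbering; it does not talk to Excel.

```python
from datetime import date, timedelta

# In the 1900 date system, serial 1 is 1900-01-01. Excel also counts a
# nonexistent 1900-02-29, so for serials from 61 upward the arithmetic
# works out to an offset from 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Convert a 1900-system serial (61 or greater) to a calendar date."""
    if serial < 61:
        raise ValueError("serials below 61 are affected by the 1900 leap-year quirk")
    return EXCEL_EPOCH + timedelta(days=serial)


def date_to_serial(value):
    """Convert a date on or after 1900-03-01 to a serial number."""
    serial = (value - EXCEL_EPOCH).days
    if serial < 61:
        raise ValueError("no reliable positive serial for dates before 1900-03-01")
    return serial


print(serial_to_date(42005))             # 2015-01-01
print(date_to_serial(date(2015, 3, 6)))  # 42069
```

Asking for the serial of a 19th-century date raises an error here, which mirrors the underlying problem: there is no positive serial number for Excel to store.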

So, an example of a clean-up project:

Excel Dates -- untouched

In this data, we have a bit to clean up. When I start a clean-up project, I usually start with a pencil-and-paper list of all of the steps that I need to go through before I change anything. This way, I see if I need to do research about how to do a step, and I can also see if there are dependencies in the data that may require me to sequence these steps in a particular way. When you’re first learning, it’s easy to jump right in without planning, but trust me: every time I’ve been burned by automation it’s because I didn’t plan. In a live data environment, you should always know what the computer’s going to do before you run a command, even if that command is just a formula in Excel. The flip side of this is, of course, that as long as you have good back-ups, you should feel free to experiment and try new things. Just make sure you make the effort to figure out what actually happened when your experimenting suddenly produces the results you want to see.

So, here’s my list of steps to perform on this data.

  • Check my encoding, which in this case just means which data is in which columns. Do you see the row where some of the date data is in the title column? It’s in row 4. I would probably survey the data and see how prevalent this kind of problem is. If it’s just a handful of errors, I’ll move the data over by hand. If there’s more, I’ll figure out a script/formula to automate this.
  • Check for unwanted characters. In this case, get rid of brackets. In case you haven’t heard, brackets are not a meaningful way of indicating uncertainty to researchers. There is a certainty attribute on <unitdate> for that, which can then be rendered in your institution’s EAD -> HTML stylesheet. However, my problem with brackets is more fundamental — in archival description, the date element is just a transcription of what we see on the record. We don’t actually know that this date represents anything. So in reality, these are all guesses to varying degrees of certainty, with the aim of giving the researcher some clue to time.
  • Fix the date format. DACS dates are YYYY Month D. (e.g., 2015 March 6)
  • Create an ISO date to serialize as an @normal attribute with <unitdate>
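The bracket step in the list above is easy to automate. This is a minimal Python sketch rather than anything from my actual clean-up: the function name `strip_brackets` is hypothetical, and it returns a flag alongside the cleaned text so that the uncertainty could be carried into a certainty attribute instead of being thrown away.

```python
import re


def strip_brackets(date_text):
    """Remove square brackets and report whether any were present."""
    had_brackets = bool(re.search(r"[\[\]]", date_text))
    cleaned = re.sub(r"[\[\]]", "", date_text)
    # Collapse any doubled spaces left behind by the removal.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, had_brackets


for raw in ["[1923 March 6]", "circa [1950]", "1987 June"]:
    print(strip_brackets(raw))
```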

Let’s skip the obvious clean-up tasks and go straight to formatting dates. If everything is after 1900 (and if everything is a three-part date), this is really straightforward.

First, create a new column. Use the DATEVALUE formula to tell Excel to regard your date string as a date value. If your date string is in B2, your formula in C2 should be:

=DATEVALUE(B2)

Double-click on the bottom right corner of the cell to have that formula apply to the whole column.

Now that Excel knows that this is a date, you simply need to give it the format you want to see, in this case, yyyy mmmm d.

Choosing a custom date format.

This works great for three-part dates after 1900. If that’s not your situation, there are a few things you can do. One of my favorite methods is to filter the date list to each of the different date types and apply the custom date format to each of these (trying to apply a custom date format to a date that doesn’t fit the type will result in really confusing and bad results). Another option is to split the date into three different columns, treat each like text, and then bring them back together in the order you want with the CONCATENATE formula. Play around — Excel doesn’t make it easy, but there are lots of options.


If you do a lot of data manipulation, I would definitely encourage you to stop torturing yourself and learn OpenRefine. I use it every day. OpenRefine uses something called GREL (Google Refine Expression Language — I wonder if they’ll be changing that to OREL now that this isn’t under the Google umbrella?), which is trickier to learn than Excel formulas but a lot more powerful and more in alignment with other programming languages. In fact, I should say that you only need to learn GREL for the fancy stuff — a lot of OpenRefine’s magic can be done through the GUI.

So, looking through this data set, I would do a lot of the same steps. One option is to just use the commands Edit Cells -> Common Transformations -> To Date, but unfortunately, most of these strings aren’t written in a way that OpenRefine understands them as dates.

The best path forward is probably to split this date string apart and put it back together. You could split by whitespace and turn them into three columns, but since some dates are just a year, or a year and a month, you wouldn’t necessarily have each of the three parts of the date in the columns where you want them.

So, I’m going to tell OpenRefine what a year looks like and ask it to put the year in its own column.

This formula pulls the year from the date string and puts it into its own column.

In this formula, I’m partitioning the string by a four-digit number and then taking that part of the partition for my new column. In the case of the year, the formula is:


For a month it’s:


And for the day it’s:


There may be a more elegant way of partitioning this all as one step, but I don’t yet know how!
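For comparison, the same split can be sketched in Python in a single pass. This is not GREL and not the formulas used in the screenshots: it is an illustrative analog that assumes month names appear in English, either spelled out or abbreviated to three or more letters, and the name `split_date` is my own.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def split_date(text):
    """Pull year, month, and day out of a date string; missing parts are ''."""
    year = re.search(r"\b\d{4}\b", text)     # a standalone four-digit number
    day = re.search(r"\b\d{1,2}\b", text)    # a standalone one- or two-digit number
    month = ""
    for word in re.findall(r"[A-Za-z]+", text):
        # Accept full month names or abbreviations of three or more letters.
        hits = [m for m in MONTHS if len(word) >= 3 and m.lower().startswith(word.lower())]
        if hits:
            month = hits[0]
            break
    return (year.group() if year else "", month, day.group() if day else "")


for raw in ["March 6, 1923", "6 Mar 1923", "June 1987", "1923"]:
    print(split_date(raw))
```

Because each part is found by what it looks like rather than where it sits, year-only and year-month strings still land in the right columns.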

Then, once you have each of these parts of the date in their own columns, they should look like this:

Each part of the date element is in its own column.

The final step is to put the pieces back together in the order you want them. You can do this by clicking on the Year column, and selecting “Create column based on this column.” Then, use GREL to put everything in the order that you want to see it.

The plus signs signify that everything should be smushed together — pay attention to the syntax of calling the value of columns.

The formula for this is:

value + " " + cells["Month"].value + " " + cells["Day"].value

And voila, you’ve turned your non-DACS date into a DACS-formatted date. You can use similar steps to make a column that creates an ISO-formatted date, too, although you’ll first have to convert months into two-digit numbers.
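The reassembly step, including the month-to-number conversion needed for an ISO date, looks like this as a Python sketch. It assumes the three parts are already separated and that months are spelled out in English; `dacs_date` and `iso_date` are hypothetical helper names.

```python
MONTH_NUMBERS = {name: f"{number:02d}" for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}


def dacs_date(year, month="", day=""):
    """Join the non-empty parts in DACS order: YYYY Month D."""
    return " ".join(part for part in (year, month, day) if part)


def iso_date(year, month="", day=""):
    """Build YYYY, YYYY-MM, or YYYY-MM-DD from the same parts."""
    parts = [year]
    if month:
        parts.append(MONTH_NUMBERS[month])
        if day:  # a day only makes sense when the month is known
            parts.append(f"{int(day):02d}")
    return "-".join(parts)


print(dacs_date("2015", "March", "6"))  # 2015 March 6
print(iso_date("2015", "March", "6"))   # 2015-03-06
print(iso_date("1987", "June"))         # 1987-06
```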

Finally, SQL

The two methods above require ETL — extract, transform, load. That is, you’re going to get data out of the database (or transform it into a tabbed sheet from xml) and then get it back into the database or the EAD (and then the database). There is a better way if you’re using Archivists’ Toolkit or ArchivesSpace, and it involves doing SQL updates. I’m going to punt on this for now, because I know that this will be a huge part of my future once we get into ArchivesSpace (I’ll also be creating normalized dates, which is data that Archivists’ Toolkit can’t store properly but ASpace can). So, stay tuned!

dwn w/ abbrs

Maybe you’ve heard — library and archives content standards are NOT DOWN with abbreviations these days. This is part of an effort to recognize that we no longer have to fit our descriptions onto catalog cards, and that the less confusing jargon we present to our patrons, the better!

In my repository, we made an official switch from abbreviated months to spelled-out months. However, there was still an enormous corpus of legacy abbreviations in Archivists’ Toolkit.

This is the kind of problem that power tools are great at solving. Check out my sql, posted here, for a big, fat find-and-replace that goes through the date expression field in AT and makes everything better. It looks for month abbreviations, common misspellings, variations of the term “circa” and variations on “undated.” I have my eye on brackets, too, but it might be too hairy to deal with them right now. Please feel free to use this in your own repository.
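If you want to test this kind of find-and-replace on a handful of strings before touching the database, the logic can be sketched in Python. This is not the posted SQL: the pattern list below is a hypothetical sample covering month abbreviations, a couple of "circa" variants, and "n.d.", and your own legacy data will likely need more entries.

```python
import re

# Hypothetical replacement table; the patterns in the posted SQL may differ.
REPLACEMENTS = [
    (r"\bJan\b\.?", "January"),
    (r"\bFeb\b\.?", "February"),
    (r"\bMar\b\.?", "March"),
    (r"\bApr\b\.?", "April"),
    (r"\bJun\b\.?", "June"),
    (r"\bJul\b\.?", "July"),
    (r"\bAug\b\.?", "August"),
    (r"\bSept?\b\.?", "September"),
    (r"\bOct\b\.?", "October"),
    (r"\bNov\b\.?", "November"),
    (r"\bDec\b\.?", "December"),
    (r"\b(?:ca|c)\.", "circa"),
    (r"\bn\.\s?d\.", "undated"),
]


def expand_abbreviations(date_expression):
    """Apply each replacement in order, ignoring case."""
    for pattern, replacement in REPLACEMENTS:
        date_expression = re.sub(pattern, replacement, date_expression, flags=re.IGNORECASE)
    return date_expression


for raw in ["ca. Jan. 1950", "Sept. 3, 1921-Dec. 1922", "n.d.", "1950 March"]:
    print(expand_abbreviations(raw))
```

The word-boundary anchors keep already spelled-out months from being mangled, and the month patterns run before the "c." pattern so that "Dec." is expanded before anything looks for a stray "c.".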

Also, stay tuned for part two, where I dispose of contemptible manuscript tradition abbreviations!

Happy New Year!

As we finish our first week back at work in 2015, we thought it might be nice to reflect on what we accomplished in 2014 and what our resolutions are for this year.

Looking Back


As I type this I am sitting in a living room piled high with boxes and strewn with bubble wrap and packing tape.  I finished my six and a half year run at Columbia on Friday and will be starting a new position at Emory University’s Manuscript, Archives, & Rare Book Library at the beginning of next month.

This past year was full of professional changes.  I got a new director, moved offices, our library annexed another unit which landed under my supervision, and our University Librarian retired at the end of the year.  Amidst all of this, though, my team and I managed to hit some pretty major milestones in the middle of the chaos and change-related-anxiety.  We completed a comprehensive collection survey that resulted in DACS compliant collection level records for all of our holdings, we published our 1000th EAD finding aid, and kept up with the 3000 plus feet of accessions that came through our doors.


Last year I spent a lot of time learning how to work with data more effectively (in part thanks to this blog!). I used OpenRefine and regular expressions to clean up accessions data. Did lots of ArchivesSpace planning, mapping, and draft policy work. Supervised an awesome field study. Participated in our Aeon implementation. Began rolling out changes to how we create metadata for archival collections and workflows for re-purposing the data. I also focused more than I ever have before on advocating for myself and the functions I oversee. This included a host of activities, including charting strategic directions, but mainly comprised lots of small conversations with colleagues and administrators about the importance of our work and the necessity to make programmatic changes. I also did a ton of UMD committee work. Oh, and got married! That was pretty happy and exciting.


2014 was my sixth year working as a professional archivist, and continued my streak (which has finally ended, I swear) of being a serial short-timer. Through June of last year, I worked with a devoted team of archives warriors at the Tamiment Library and Robert F. Wagner Labor Archives. There, we were committed to digging ourselves out of the hole of un-described resources, poor collection control, and an inconsistent research experience. Hence, my need for this blog and coterie of smart problem solvers. I also gave a talk at the Radcliffe Workshop on Technology and Archival Processing in April, which was an archives nerd’s dream — a chance to daydream, argue, and pontificate with archivists way smarter than I am.

In June I came to Yale — a vibrant, smart, driven environment where I work with people who have seen and done it all. And I got to do a lot of fun work where I learned more about technology, data, and archival description to solve problems. And I wrote a loooot of blog posts about how to get data in and out of systems.


It kind of feels like I did nothing this past year, other than have a baby and then learn how to live like a person who has a baby. 2014 was exhausting and wonderful. I still feel like I have a lot of tricks to learn about parenting; for example, how to get things done when there is a tiny person crawling around my floor looking for things to eat.

Revisiting my Outlook calendar reminds me that even with maternity leave, I had some exciting professional opportunities. I proposed, chaired, and spoke at a panel on acquisition, arrangement, and access for sexually explicit materials at the RBMS Conference in Las Vegas, and also presented a poster on HistoryPin at the SAA Conference in Washington, D.C. Duke’s Technical Services department continues to grow, so I served on a number of search committees, and chaired two of them. I continue to collaborate with colleagues to develop policies and guidelines for a wide range of issues, including archival housing, restrictions, description, and ingest. And we are *this close* to implementing ArchivesSpace, which is exciting.

Looking Forward


I have so much to look forward to this year! I’m looking forward to learning a new city, to my first foray into the somewhat dubious joys of homeownership, and to being within easy walking distance of Jeni’s ice cream shop. And that’s all before I even think about my professional life. My new position oversees not only archival processing, but also cataloging and description of MARBL’s print collections, so I will be spending a lot of time learning about rare book cataloging and thinking hard about how to streamline resource description across all formats.

Changing jobs is energizing and disruptive in the best possible way, so my goal for the year is to settle in well and to learn as much as possible from my new colleagues, from my old friends, and from experts and interested parties across the profession.


I am super excited to be starting at the Orbis Cascade Alliance as a Program Manager in February. I’ll be heading up the new Collaborative Workforce Program covering the areas of shared human resources, workflow, policy, documentation, and training. The Alliance just completed migrating all 37 member institutions to a shared ILS. This is big stuff and a fantastic foundation to analyze areas for collaborative work.

While I can’t speak to specific goals yet, I know I will be spending a lot of time listening and learning. Implementing and refining a model for shared collaborative work is a big challenge, but has huge potential on so many fronts. I’m looking forward to learning from so many experts in areas of librarianship outside of my experiences/background. I’m also thrilled to be heading back to the PNW and hoping to bring a little balance back to life with time in the mountains and at the beach.


I have a short list of professional resolutions this year. Projects, tasks, and a constant stream of email have a way of overshadowing what’s really important, so I’ll count on my fellow bloggers to remind me of these priorities!

  • All ArchivesSpace, all the time. Check out the ArchivesSpace @ Yale blog for more information about this process.
  • I want to create opportunities for myself for meaningful direct interaction with researchers so that their points of view can help inform the decisions we make in the repository. This may mean that I take more time at the reference desk, do more teaching in classes, or find ways to reach out and understand how I can be of better service.
  • I want to develop an understanding of what the potential is for archival data in a linked data environment. I want to develop a vision of how we can best deploy this potential for our researchers.
  • I have colleagues here at Yale who are true experts at collection development — I want to learn more about practices, tips, tricks, pitfalls, and lessons learned.


I have a few concrete professional goals for the coming year:

  • I want to embrace ArchivesSpace and learn to use it like an expert.
  • I will finish my SPLC guide — the print cataloging is finished, so as soon as I get a chance I will get back to this project.
  • I have requested a regular desk shift so that I can stay more connected to the researchers using the collections we work so hard to describe.
  • I am working more closely with our curators and collectors on acquisitions and accessioning, including more travel.
  • My library is finishing a years-long renovation process, so this summer I will be involved with move-related projects (and celebrations). Hopefully there will be lots of cake for me in 2015.