Put a strategic plan on it!

People who know me will know I love strategic planning. Or, more accurately, I love good strategic planning and how a strategic plan can assist you in many other activities.

Given that our library’s strategic plan is a few years old and our dean is retiring in the spring, the functional areas of SCUA didn’t want to wait for the whole library process to move forward. Luckily, there’s no rule that says you can’t have a strategic document for levels below the top or division/department.

While we didn’t go through a full blown strategic planning process, we had run many brainstorming, visioning, and planning activities over the last year and a half. Many of the projects in our document were already approved (officially or unofficially) and represented in individual and unit work plans.

Why did we need a plan then? When planning projects or allocating resources we seemed to encounter a few challenges. The biggest (to me) were a lack of understanding about:

  • The difference between work that is strategic to move a program forward v. the prioritization of regular ongoing work/projects
    • ex: processing the so and so papers may be a high priority on the list of collections to process, but this does not necessarily make that specific processing project a strategic priority
  • How the work of different functional areas within SCUA directly relate to one another, supports the work of the entire department, and how each unit/function can participate in meeting shared goals.

We determined three strategic directions across our work:

  1. Optimize the user experience
  2. Increase access to collections
  3. Expand knowledge of our collections to new audiences

Check out the full Strategic Directions for SCUA Functional Areas 2014-2017.

Here’s how I’m hoping to use our strategic directions document:

  • Raising awareness about what we do, why we do it, and its value within SCUA and the Libraries
  • Assist in developing annual work plans, how we spend our time, and evaluating our progress
  • Prioritization of pop up/new projects. Is it really a project that will move us forward? Does it have to happen right now? Can we approach it differently than before? What do we STOP doing from our strategic directions or regular/ongoing work to accommodate it?
  • Use as a tool for updating specific policies, procedures, and workflows highlighting how these changes support the activities and goals outlined in the strategic directions.
  • Advocating for resources at various levels within the library. Our AUL has already said this document will be extremely helpful as the libraries start to discuss priorities for fiscal and human resources for FY16.

Also, a hat tip to UCLA’s Library Special Collections strategic plan! We liked their presentation/formatting, so borrowed that for ours. Don’t reinvent the wheel!

 

A very brief guide to deleting records with the ArchivesSpace API, from a total tyro

If you’ve ever used cURL before, you don’t need this.

Also, the videos and documentation that Hudson Molonglo put together are really stellar and recommended to anyone starting with this.

This guide is a true project-pad of my notes of how I did this. It might also be useful for those of us who never had formal training with scripting, but are in charge of the archival data in our repositories and appreciate power tools. Obviously, the problem with power tools is that you can cut your arm off. Use this carefully. Use in test/dev. Ask someone to check your work if you’re doing something truly crazy.

Here’s what I did

This came up for me because I had done a failed test migration (we think there’s a weird timestamp problem in the accessions table) and I wanted to delete the repository and all records in the repository in ASpace before trying again. As far as I can tell, there isn’t a great way to delete thousands of records in the user interface. So, the API seemed the way to go.

I figured this out by watching the video and reading the documentation on GitHub, and then doing a little extra googling around to learn more about curl options.

If you’re using a Mac, just fire up the terminal and get on with your life. I use a Windows PC at work, so I use Cygwn as a Unix emulator. The internet gave me good advice about how to add curl.exe.

Note: you won’t be able to do any of this unless you have admin access.

Let’s start with “Hello, World!”

$ curl 'http://test-aspace.yourenvironment.org:port/'

In this example, the url before the colon should be your ASpace instance (use test/dev!) and “port” should be your port. The response you get should basically just tell you that yes, you have communicated with this server.

Connect to the server

$ curl -F password='your password' 'http://test-aspace.yourenvironment.org:port/users/admin/login'

Here, you’re logging on as admin. The server will respond with a session token — go ahead and copy the token response and make it a variable, so you don’t have to keep track of it.

export TOKEN=cc0984b7bfa0718bd5c831b419cb8353c7545edb63b62319a69cdd29ea5775fa

Delete the records

Here, you definitely want to check the API documentation on GitHub. Basically, this tells you how to format the URI and the command to use. For instance, below, I wanted to delete an entire repository. I found out, though, that I couldn’t delete the repository if it had records that belonged to it. Since agents and subjects exist in ASpace without belonging to a repository, and since accessions and digital records hadn’t successfully migrated, I only needed to delete resource records.

$ curl -H "X-ArchivesSpace-Session: $TOKEN" -X "DELETE" 'http://test-aspace.yourenvironment.org:port/repositories/3/resources/[278-1693]'

So, I passed something to the header that gave my token ID, then I sent a command to delete some records. But which ones?

Let’s parse this URI. The first part is my ASpace test server, the port is my port.

The next thing to understand is that each repository, resource, accession, agent, whatever, has a numeric ID. URIs are formatted according to the record type and the ID. So, I go to repositories/3, because the resources I want to delete are in a particular repository, and that repository has the numeric ID of “3”. In order to find this out, you can look in the ASpace interface, or you can send a call to yoururl/repositories, which will give you a json response with id (and other) information about all of the repositories on your server.

After that, I tell curl which resource records I want to delete. There’s probably a better way, but I figured this out by sorting resources by date created, both ascending and descending, to find out what the first and last IDs are. I’d imagine, though, that if I didn’t want to look that up and I just asked for

'http://test-aspace.yourenvironment.org:port/repositories/3/resources/[1-2000]'

I would probably be okay, because it’s only deleting resource records in repository 3 and I want to get rid of all of those anyway. I’d get an error for resources that don’t exist in that repository, but it wouldn’t break anything. I had wondered if there are wildcards for curl, so that I could get ANY number after resources, but (according to some brief googling) it doesn’t look like there are.

What does this all mean?

Uh, I don’t know? I mean, the API is obviously very powerful and amazing, and I’m glad I didn’t have to figure out a way to delete those records in the interface. But I’m really just starting to dip my toe into the potential of this. I’m sure you can look forward to more updates.

Backlog Control — Known Unknowns

As part of a continuing attempt to understand our holdings, I’ve been writing a series of reports against our EAD. Of course, what’s in a finding aid only accounts for the stuff that someone bothered to record in the first place. To tackle undescribed collections, we’ve also been doing a shelfread project to get an understanding of what’s on our shelves.

Today, we take an “accessioning as processing” approach to accruals — we describe what’s there at the appropriate level of description at the time of accessioning, and we include a lot of the background information about how it came to us, what it all means, etc., to help make sense of it. This helps us avoid building a backlog.

In the past, however, there was a mysterious (to me) understanding of the nature of processed/unprocessed materials. We have many, many series of materials in collections (usually accruals) that may even have file-level inventories but are described as “unprocessed.” They don’t include essential information about immediate source of acquisition, creators, or what about these materials makes them hang together. I’m frankly not sure what my predecessors were waiting for — they did all the work of creating lots of description without doing any real explanation!

So, my boss wanted a sense of these known knowns — parts of collections that we need to at least give a better series title, or somehow take out of the limbo of “unprocessed”. She wanted to know how many series there were, which collections these series belong to, and how many boxes of stuff we’re talking about. It would also be great to know linear footage, but this is frankly unknowable from the data we have.

So, I wrote an xQuery. You can find it here. The xQuery looks for any series or subseries that has the string “unprocessed” in its title. From there, it reports out the distinct values of containers. The result looks something like this:

Screen Shot 2014-05-06 at 10.06.10 PM

Perhaps you see the problem. Originally, I thought I just wanted to get a count of the distinct containers. My xpath for the variable that would give me box info (called footage here) originally looked like this:

$unprocessedfootage := count(distinct-values($series//ead:container[@type eq ‘Box’]))

The idea here was that it would take a series, get a census of the different boxes in that series, and count ’em up. But this gave me bad data. In the case of :

<containertype=“Box”>10-17</container>

I would have “10-17” be considered one distinct value in the count, when really it represents 8 boxes. The report as I first had it was severely undercounting boxes.

If I want to get a count of the distinct containers, I have to deal with ranges like 10-17. I started by importing this into OpenRefine and separated the multi-valued cells in the “unprocessed” column so that each number or range was in its own cell/row.

Then, I did some googling and came across this StackOverflow answer that explained how to enumerate the values in a range in Excel (this will give me 10, 11, 12, 13, 14, 15, 16 and 17 from 10-17). I exported from OpenRefine and brought the document into Excel, separated the ranges into two columns, and did a quick if/then statement to repeat single values in the second column. From there, I just ran the VBA code that was provided. I brought the document BACK into Refine and separated multi-valued cells again, and found out that we have 908 distinct boxes of “unprocessed” materials in 67 collections.

Now, happily, we know exactly how big of a mess our described “unprocessed” materials are, and we’re in a much better position to make good sense of them.

Update — 2014 May 7

@ostephens on twitter very helpfully pointed out that the dumb VBA step can easily be avoided by doing the work in OpenRefine.

He was completely right and was kind enough to give me the recipe

After multi-valued cells were each in their own row/cell, I separated by “-” so that the beginning and end of each range was in its own column. Then, I created a new column based on the “First” column and did the following:

Screen Shot 2014-05-07 at 10.29.10 AM

On error, it copies the value from the original column so that my “enum” column is everything I need. Once I had the values enumerated, I split multi-value cells again and ended up with a much more beautiful process.

You can follow my steps by importing my OpenRefine project here.