Happy New Year!

As we finish our first week back at work in 2015, we thought it might be nice to reflect on what we accomplished in 2014 and what our resolutions are for this year.

Looking Back

Carrie

As I type this I am sitting in a living room piled high with boxes and strewn with bubble wrap and packing tape.  I finished my six and a half year run at Columbia on Friday and will be starting a new position at Emory University’s Manuscript, Archives, & Rare Book Library at the beginning of next month.

This past year was full of professional changes.  I got a new director, moved offices, our library annexed another unit that landed under my supervision, and our University Librarian retired at the end of the year.  Amidst all of the chaos and change-related anxiety, though, my team and I managed to hit some pretty major milestones.  We completed a comprehensive collection survey that resulted in DACS-compliant collection-level records for all of our holdings, published our 1000th EAD finding aid, and kept up with the 3000-plus feet of accessions that came through our doors.

Cassie

Last year I spent a lot of time learning how to work with data more effectively (in part thanks to this blog!). I used OpenRefine and regular expressions to clean up accessions data. Did lots of ArchivesSpace planning, mapping, and draft policy work. Supervised an awesome field study. Participated in our Aeon implementation. Began rolling out changes to how we create metadata for archival collections and workflows for re-purposing the data. I also focused more than I ever have before on advocating for myself and the functions I oversee. This included a host of activities, including charting strategic directions, but mainly comprised lots of small conversations with colleagues and administrators about the importance of our work and the necessity of making programmatic changes. I also did a ton of UMD committee work. Oh, and got married! That was pretty happy and exciting.

Maureen

2014 was my sixth year working as a professional archivist, and it continued my streak (which has finally ended, I swear) of being a serial short-timer. Through June of last year, I worked with a devoted team of archives warriors at the Tamiment Library and Robert F. Wagner Labor Archives. There, we were committed to digging ourselves out of the hole of un-described resources, poor collection control, and an inconsistent research experience. Hence my need for this blog and its coterie of smart problem solvers. I also gave a talk at the Radcliffe Workshop on Technology and Archival Processing in April, which was an archives nerd’s dream — a chance to daydream, argue, and pontificate with archivists way smarter than I am.

In June I came to Yale — a vibrant, smart, driven environment where I work with people who have seen and done it all. And I got to do a lot of fun work where I learned more about technology, data, and archival description to solve problems. And I wrote a loooot of blog posts about how to get data in and out of systems.

Meghan

It kind of feels like I did nothing this past year, other than have a baby and then learn how to live like a person who has a baby. 2014 was exhausting and wonderful. I still feel like I have a lot of tricks to learn about parenting; for example, how to get things done when there is a tiny person crawling around my floor looking for things to eat.

Revisiting my Outlook calendar reminds me that even with maternity leave, I had some exciting professional opportunities. I proposed, chaired, and spoke at a panel on acquisition, arrangement, and access for sexually explicit materials at the RBMS Conference in Las Vegas, and also presented a poster on HistoryPin at the SAA Conference in Washington, D.C. Duke’s Technical Services department continues to grow, so I served on a number of search committees, and chaired two of them. I continue to collaborate with colleagues to develop policies and guidelines for a wide range of issues, including archival housing, restrictions, description, and ingest. And we are *this close* to implementing ArchivesSpace, which is exciting.

Looking Forward

Carrie

I have so much to look forward to this year!  I’m looking forward to learning a new city, to my first foray into the somewhat dubious joys of homeownership, and to being within easy walking distance of Jeni’s ice cream shop.  And that’s all before I even think about my professional life.  My new position oversees not only archival processing, but also cataloging and description of MARBL’s print collections, so I will be spending a lot of time learning about rare book cataloging and thinking hard about how to streamline resource description across all formats.

Changing jobs is energizing and disruptive in the best possible way, so my goal for the year is to settle in well and to learn as much as possible — from my new colleagues, from my old friends, and from experts and interested parties across the profession.

Cassie

I am super excited to be starting at the Orbis Cascade Alliance as a Program Manager in February. I’ll be heading up the new Collaborative Workforce Program covering the areas of shared human resources, workflow, policy, documentation, and training. The Alliance just completed migrating all 37 member institutions to a shared ILS. This is big stuff and a fantastic foundation for analyzing areas of collaborative work.

While I can’t speak to specific goals yet, I know I will be spending a lot of time listening and learning. Implementing and refining a model for shared collaborative work is a big challenge, but has huge potential on so many fronts. I’m looking forward to learning from so many experts in areas of librarianship outside of my experiences/background. I’m also thrilled to be heading back to the PNW and hoping to bring a little balance back to life with time in the mountains and at the beach.

Maureen

I have a short list of professional resolutions this year. Projects, tasks, and a constant stream of email have a way of overshadowing what’s really important — I’ll count on my fellow bloggers to remind me of these priorities!

  • All ArchivesSpace, all the time. Check out the ArchivesSpace @ Yale blog for more information about this process.
  • I want to create opportunities for myself for meaningful direct interaction with researchers so that their points of view can help inform the decisions we make in the repository. This may mean that I take more time at the reference desk, do more teaching in classes, or find ways to reach out and understand how I can be of better service.
  • I want to develop an understanding of what the potential is for archival data in a linked data environment. I want to develop a vision of how we can best deploy this potential for our researchers.
  • I have colleagues here at Yale who are true experts at collection development — I want to learn more about practices, tips, tricks, pitfalls, and lessons learned.

Meghan

I have a few concrete professional goals for the coming year:

  • I want to embrace ArchivesSpace and learn to use it like an expert.
  • I will finish my SPLC guide — the print cataloging is finished, so as soon as I get a chance I will get back to this project.
  • I have requested a regular desk shift so that I can stay more connected to the researchers using the collections we work so hard to describe.
  • I am working more closely with our curators and collectors on acquisitions and accessioning, including more travel.
  • My library is finishing a years-long renovation process, so this summer I will be involved with move-related projects (and celebrations). Hopefully there will be lots of cake for me in 2015.

 

On Containers

I’m here to talk about boxes. Get excited.

I’ve been spending a LOT of time lately thinking about containers — fixing them, modelling them, figuring out what they are and aren’t supposed to do. And I’ve basically come to the conclusion that as a whole, we spend too much time futzing with containers because we haven’t spent enough time figuring out what they’re for and what they do.

For instance, I wrote a blog post a couple of months ago about work we’re doing to remediate things that should not be happening with containers but are — barcodes assigned to two different containers, two different container types sharing the same barcode/identifier information, etc. Considering the scale of our collections, the scale of these problems is mercifully slight, but these are the kinds of problems that turn into a crisis when a patron expects to find material in the box she ordered and the material simply isn’t there.

I’m also working with my colleagues here at Yale and our ArchivesSpace development vendor Hudson Molonglo to add functionality to ArchivesSpace so that it’s easier to work with containers as containers. I wrote a blog post about it on our ArchivesSpace blog. In short, we want to make it much easier to do stuff like assigning locations, assigning barcodes, indicating that container information has been exported to our ILS, etc. In order to do this, we need to know exactly how we want containers to relate to archival description and how they relate to each other.

As I’ve been doing this thinking about specific container issues, I’ve had some thoughts about containers in general. Here they are, in no particular order.

What are container numbers doing for us?

A container number is just a human-readable barcode, right? Something to uniquely identify a container? In other words, speaking in terms of the data model, isn’t this data that says something different but means the same thing? And isn’t that a point of vulnerability? At the end of the day, isn’t a container number something that we train users to care about when really they want the content they’ve identified? And don’t barcodes already give us a much better system for uniquely identifying containers than box numbers do?

In the days that humans were putting box numbers on a call slip and another human was reading that and using that information to interpret shelf location, it made sense to ask the patron to be explicit about which containers were associated with the actual thing that they want to see. But I think that we’ve been too good at training them (and training ourselves) to think in terms of box numbers (and, internally, locations) instead of creating systems that do all of that on the back end. Information about containers should be uniform, unadorned, reliable, and interact seamlessly with data systems. Boxes should be stored wherever is best for their size and climate, and that should be tracked in a locations database that interacts with the requesting database. And the actual information should be associated seamlessly with containers.

This means that instead of making the patron write down a call number and box number and read a note about how materials of this type are stored on-site and materials of another type are stored off-site, let’s take a lot of human error out of this. Let’s let them just click on what they want to see. Then, the system says “a-ha! There are so many connections in my database! This record is in box 58704728702861, which is stored in C-29 Row 11, Bay 2, Shelf 2. I’ll send this to the queue that prints a call slip so a page can get that right away!” And instead of storing box numbers and folder numbers in the person’s “shopping cart” of what she’s seen, let’s store unique identifiers for the archival description, so that if that same record gets re-housed into box 28704728702844 and moved to a different location, the patron doesn’t have to update her citation in any scholarly work she produces. Even if the collection gets re-processed, we could make sure that identifiers for stuff that’s truly the same persist.
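To make the idea concrete, here’s a minimal sketch of that back-end lookup in Python. The table contents and the `call_slip` function are hypothetical stand-ins — in production, the container and location data would live in the ILS and the location-management database, keyed by the same barcode.

```python
# Hypothetical lookup tables. In a real system these would be the ILS and
# the locations database; the patron never sees any of this.
containers = {
    "ref10": "39002038050457",  # archival description ID -> container barcode
}
locations = {
    "39002038050457": "C-29, Row 11, Bay 2, Shelf 2",  # barcode -> shelf location
}

def call_slip(description_id):
    """Resolve a patron's click to a retrievable location.

    The patron supplies only the ID of the description they want to see;
    box numbers and shelving logic stay entirely on the back end.
    """
    barcode = containers[description_id]
    return f"Retrieve container {barcode} from {locations[barcode]}"

print(call_slip("ref10"))
```

If the box is later re-housed or moved, only the two lookup tables change — the patron’s citation (the description ID) stays stable.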

Also, don’t tell me that box numbers do a good job of giving cues about order and scale. There are waaaaaayyyyy better ways of doing that than making people infer relationships based on how much material fits into 0.42 linear feet.

We have the concepts. Our practice needs to catch up, and our tools do too.

Darn it, Archivists’ Toolkit, you do some dumb things with containers

Archival management systems are, obviously, a huge step up from managing this kind of information in disparate documents and databases. But I think that we’re still a few years away from our systems meeting their potential. And I really think that folks who do deep thinking about archival description and standards development need to insert themselves into these conversations.

Here’s my favorite example. You know that thing where you’re doing description in AT and you want to associate a container with the records that you just described in a component? You know how it asks you what kind of an instance you want to create? That is not a thing. It’s just part of the AT data model — there’s nothing like it in DACS, nothing like it in EAD. Actual archival standards are smart enough not to say very much about boxes, because they’re boxes and who cares? When the instance type exports to EAD, it serializes as @label. LABEL. The pinnacle of semantic nothingness!

This is not a thing.


Like, WHY? I can see that this could be the moment where AT is asking you “oh, hey, do you want to associate this with a physical container in a physical place or do you want to associate it with a digital object on teh interwebz?” but there’s probably a better way of doing this.

My problem with this is that it has resulted in A LOT of descriptive malpractice. Practitioners who aren’t familiar with how this serializes in EAD think that they’re describing the content (“oh yes! I’ve done the equivalent of assigning a form/genre term and declaring in a meaningful way that these are maps!”) when really they’ve put a label on the container. The container is not the stuff! If you want to describe the stuff, you do that somewhere else!

Oh my gosh, my exclamation point count is pretty high right now. I’ll see if I can pull myself together and soldier on.

Maybe we should be more explicit about container relationships.

Now, pop quiz, if you have something that is in the physical collection and has also been microfilmed, how do you indicate that?

In Archivists’ Toolkit, there’s nothing clear about this. You can associate more than one instance with an archival description, but you can also describe levels of containers that (ostensibly) describe the same stuff, but happen to be a numbered item within a folder, within a box.

Anything can happen here.


So this means that in the scenario I mentioned above, it often happens that someone will put the reel number into the container 3 field, making the database think that the reel is a child of the box.

But even if all of the data entry happens properly, EAD import into Archivists’ Toolkit will take any three <container> tags and, instead of making them siblings, bring the three together into a parent-child instance relationship like you see above. This helps maintain relationships between boxes and folders, but it’s a nightmare if you have a reel in there.

EAD has a way of representing these relationships, but the AT EAD export doesn’t really even do that properly.

 <c id="ref10" level="file">
   <did>
     <unittitle>Potter, Hannah</unittitle>
     <unitdate normal="1851/1851">1851</unitdate>
     <container id="cid342284" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342284" type="Folder">2</container>
   </did>
 </c>

 <c id="ref11" level="file">
   <did>
     <unittitle>Potter, Horace</unittitle>
     <unitdate normal="1824/1824">1824</unitdate>
     <container id="cid342283" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342283" type="Folder">3</container>
   </did>
 </c>

Here, we see that these box 1’s are the same — they have the same barcode (by the way, see previous posts for help working out what to do with this crazy export and barcodes). But the container id makes it seem like they are two different things — they have two different container ids, and their folders refer to two different parents.

What we really want to say is “This box 1 is the same as the other box 1’s. It’s not the same as reel 22. Folder 2 is inside of box 1, and so is folder 3.” Once we get our systems to represent all of this, we can do much better automation, better reporting, and have a much more reliable sense of where our stuff is.
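One way to assert “this box 1 is the same as the other box 1” is to group containers by barcode rather than by AT’s per-component container ids. This is a sketch, assuming the containers from the EAD sample above have already been parsed into tuples; the tuple layout is my own, not anything AT produces.

```python
from collections import defaultdict

# Containers parsed from the EAD sample above:
# (AT container id, type, indicator, barcode)
containers = [
    ("cid342284", "Box", "1", "39002038050457"),
    ("cid342283", "Box", "1", "39002038050457"),
]

# Group by barcode: every row in a group should describe the *same*
# physical box, even though AT minted a separate id for each component.
by_barcode = defaultdict(set)
for cid, ctype, indicator, barcode in containers:
    by_barcode[barcode].add((ctype, indicator))

for barcode, boxes in by_barcode.items():
    if len(boxes) > 1:
        # One barcode, multiple type/number pairs: a real problem.
        print(f"Barcode {barcode} maps to multiple containers: {boxes}")
    else:
        print(f"Barcode {barcode} is one physical container: {boxes}")
```

Here the two `cid` values collapse into a single physical box — which is exactly the relationship the EAD export fails to state.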

So if we want to be able to work with our containers as they actually are, we need to represent those properly in our technology. What should we be thinking about in our descriptive practice now that we’ve de-centered the box?

“Box” is not a level of description.

In ISAD(G) (explicitly) and DACS (implicitly), archivists are required to explain the level at which they’re describing aggregations of records. There isn’t a vocabulary for this, but traditionally, these levels include “collection”, “record group”, “series”, “file” and “item.” Note that “box” is not on this list or any other reasonable person’s list. I know everyone means well, and I would never discourage someone from processing materials in aggregate, but the term “box-level processing” is like nails on a chalkboard to me. As a concept, it should not be a thing. Now, series-level processing? Consider me on board! File-group processing? Awesome, sounds good! Do you want to break those file groups out into discrete groups of records that are often surrounded by a folder and hopefully are associated with distinctive terms, like proper nouns? Sure, if you think it will help and you don’t have anything better to do.

A box is usually just an accident of administrivia. I truly believe that archivists’ value is our ability to discern and describe aggregations of records — a box is not a meaningful aggregation, and describing it as such gives a false impression of the importance of one linear foot of material. I’d really love to see a push toward better series-level or file-group-level description, and less file-level mapping, especially for organizations’ records. Often, unless someone is doing a known-item search, there’s nothing distinct enough about individual files as evidence (and remember, this is why we do processing — to provide access to and explain records that give evidence of the past) to justify sub-dividing them. I also think that this could help us think past unnecessary sorting and related housekeeping — our job isn’t to make order from chaos*, it’s to explain records and their context of creation and use. If records were created chaotically and kept in a chaotic way, are we really illuminating anything by prescribing artificial order?

This kind of thinking will be increasingly important when our records aren’t tied to physical containers.

In conclusion, let’s leave the robot work to the robots.

If I never had to translate a call number to a shelf location again, it would be too soon (actually, we don’t do that at MSSA, but still). Let’s stop making our patrons care about boxes, and let’s start making our technology work for us.


* This blog’s title, Chaos –> Order, is not about bringing order to a chaotic past — it’s about bringing order to our repositories and to our work habits. In other words, get that beam out of your own eye, sucka, before you get your alphabetization on.

 

All ArchivesSpace, All the Time

It’s been quiet around here lately — look out for a handful of book review blog posts next week.

Until then, if you need a quick fix of archival order, I would encourage you to check out the ArchivesSpace @ Yale implementation blog here, and its companion site with associated documentation, here. I think that all four of us have ArchivesSpace on our agendas in one way or another, so you’ll probably be seeing more ArchivesSpace planning blog posts on this site, too.

Sniffing out problems — bad barcode information

Like many large repositories, we barcode all of our containers and keep them offsite. This is a pretty straightforward and on the whole error-resistant process, where an archivist uses a plug-in to Archivists’ Toolkit to select a container and then wands in a barcode to associate it with that container. That barcode then gets stored in an item record in our ILS, in the off-site storage system’s database, and on a barcode sheet. It’s also serialized as part of the EAD. When a patron wants a box, a series of systems and people  (including Aeon, our ILS, and our off-site storage workers) work behind the scenes to bring it here within two business days. It’s pretty magical, and it’s how we can provide a high quality of service to our patrons.

However, there are a few less fool-proof methods by which a barcode could be introduced to a database. A barcode could be entered as part of rapid data entry. Since container information is entered as part of each component, it’s possible to accidentally mistype the container number but wand the same barcode. It’s also possible for barcodes to be entered as part of EAD import (although they don’t make it easy) or as a SQL update.

We also have an extra legacy problem — a bad update (I don’t know if it was SQL or XSL) assigned barcodes based on container number, but didn’t check for container type. So, reel 1 was assigned the same barcode as box 1.

The diagnosis is pretty straightforward (although I attempted it a zillion much-harder ways first). As always, many many thanks to my colleague Mark, the brains behind this operation, who actually wrote this SQL query after seeing me flail about.

This query basically asks — okay. Give me a list of all of your distinct barcodes and container numbers. Do you have any barcodes that are associated with more than one container number? If so, that’s a huge problem! Report back that barcode!

Barcode 987654321 should only be associated with box 1 — if it’s associated with both box 1 AND box 2, that’s a real problem. It means either that somehow the barcode was duplicated (which we go to GREAT PAINS to prevent happening) or that there was a data entry error.

You can run this query against your own AT database to see if there were any barcodes assigned to more than one box.

select barcode,
  group_concat(distinct coalesce(container1NumericIndicator, container1AlphaNumIndicator))
from ArchDescriptionInstances
where barcode <> ''
group by barcode
having count(barcode) > 1
  and count(distinct coalesce(container1NumericIndicator, container1AlphaNumIndicator)) > 1

Maybe you have a healthier relationship with the small stuff than I do, but this is the kind of thing that maintains a constant low-level stress buzz in the back of my head. I find it reassuring that we can run these kinds of reports/checks to know for sure that collection control is working the way it should.

Oh, and by the way, the diagnosis query for my reel/box problem is here:

select barcode from
(
  select distinct barcode, container1Type
  from ArchDescriptionInstances
) as tt
group by tt.barcode
having count(tt.barcode) > 1
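If you’d rather sanity-check the logic outside of SQL, the same two checks can be run in a few lines of Python against rows exported from ArchDescriptionInstances. The sample rows and barcodes below are made up for illustration.

```python
from collections import defaultdict

# Rows exported from ArchDescriptionInstances: (barcode, type, indicator).
# The barcodes here are fabricated examples.
rows = [
    ("31234567890001", "Box", "1"),
    ("31234567890001", "Reel", "1"),  # legacy problem: reel shares the box's barcode
    ("31234567890002", "Box", "2"),
]

types_by_barcode = defaultdict(set)
indicators_by_barcode = defaultdict(set)
for barcode, ctype, indicator in rows:
    types_by_barcode[barcode].add(ctype)
    indicators_by_barcode[barcode].add(indicator)

# Equivalent of the first query: one barcode, more than one container number.
dupe_indicators = [b for b, inds in indicators_by_barcode.items() if len(inds) > 1]
# Equivalent of the second query: one barcode shared across container types.
dupe_types = [b for b, types in types_by_barcode.items() if len(types) > 1]

print(dupe_indicators)  # []
print(dupe_types)       # ['31234567890001']
```

In this sample data there are no barcodes spanning multiple box numbers, but the reel/box legacy problem shows up immediately in the second check.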

I hope that someone out there finds this useful!

Put a strategic plan on it!

People who know me will know I love strategic planning. Or, more accurately, I love good strategic planning and how a strategic plan can assist you in many other activities.

Given that our library’s strategic plan is a few years old and our dean is retiring in the spring, the functional areas of SCUA didn’t want to wait for the whole-library process to move forward. Luckily, there’s no rule that says you can’t have a strategic document for levels below the top or the division/department.

While we didn’t go through a full blown strategic planning process, we had run many brainstorming, visioning, and planning activities over the last year and a half. Many of the projects in our document were already approved (officially or unofficially) and represented in individual and unit work plans.

Why did we need a plan then? When planning projects or allocating resources we seemed to encounter a few challenges. The biggest (to me) were a lack of understanding about:

  • The difference between work that is strategic to move a program forward v. the prioritization of regular ongoing work/projects
    • ex: processing the so and so papers may be a high priority on the list of collections to process, but this does not necessarily make that specific processing project a strategic priority
  • How the work of different functional areas within SCUA directly relates to one another and supports the work of the entire department, and how each unit/function can participate in meeting shared goals.

We determined three strategic directions across our work:

  1. Optimize the user experience
  2. Increase access to collections
  3. Expand knowledge of our collections to new audiences

Check out the full Strategic Directions for SCUA Functional Areas 2014-2017.

Here’s how I’m hoping to use our strategic directions document:

  • Raising awareness about what we do, why we do it, and its value within SCUA and the Libraries
  • Assist in developing annual work plans, how we spend our time, and evaluating our progress
  • Prioritization of pop up/new projects. Is it really a project that will move us forward? Does it have to happen right now? Can we approach it differently than before? What do we STOP doing from our strategic directions or regular/ongoing work to accommodate it?
  • Use as a tool for updating specific policies, procedures, and workflows highlighting how these changes support the activities and goals outlined in the strategic directions.
  • Advocating for resources at various levels within the library. Our AUL has already said this document will be extremely helpful as the libraries start to discuss priorities for fiscal and human resources for FY16.

Also, a hat tip to UCLA’s Library Special Collections strategic plan! We liked their presentation/formatting, so borrowed that for ours. Don’t reinvent the wheel!

 

When will restricted materials become available?

One morning recently, our records services archivist sent me an email. He was wondering if there was a way I could report to him on which materials in our university archives have restrictions that have passed. After all, this data is buried in access restriction notes all over finding aids — it would be very difficult to find this information by doing a search on our finding aids portal or in Archivists’ Toolkit.

This is exactly the kind of project that I love to do — it’s the intersection of archival functions, improved user experience, and metadata power tools.

In ArchivesSpace, restrictions have controlled date fields — this kind of report would be very easy in that environment! Unfortunately, AT and EAD only have a place for this information as free text in notes.

Time for an xquery!

xquery version "3.0";
 
declare namespace ead="urn:isbn:1-931666-22-9";
declare namespace xlink = "http://www.w3.org/1999/xlink";
declare namespace functx = "http://www.functx.com";

<restrictions>
{
 for $ead in ead:ead
 let $doc := base-uri($ead)
 return
 <document uri="{$doc}">
 {
 for $accessrestrict in $ead//ead:dsc//ead:accessrestrict/ead:p[matches(.,'(19|20)[0-9]{2}')]
 let $series := $accessrestrict/ancestor::ead:c[@level = 'series' or @level = 'accession' or @level = 'accn']//ead:unitid
 let $dateseg := fn:substring-after($accessrestrict,'until')
 for $x in $series
 return
 
 <lookhere location="{$x}">
 {$accessrestrict}
 <date>{$dateseg}</date>
 </lookhere>
 }
 </document>
}
</restrictions>

And now for the walk-through.

Working together, we determined that any end dates will be below the <dsc>. So this report asks for any access restriction note below the dsc that includes a date in the twentieth or twenty-first century.

The report tells me which series that access restriction note is part of and which file it belongs to. I also pull out any text after the word “until”, because I see that common practice is to say “These materials will be restricted until XXXX.”

From there, I was able to put this data into an excel spreadsheet, do a bit of clean-up there, and give my colleague a sorted list of when particular series in collections are slated to be open.
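Some of that spreadsheet clean-up can be scripted, too. Here’s a small sketch of pulling the restriction end year out of the free-text notes with a regular expression — the note wordings below are invented examples of the “restricted until XXXX” convention described above.

```python
import re

# Invented examples of access restriction note text.
notes = [
    "These materials will be restricted until 2029.",
    "Closed until January 1, 2042, per donor agreement.",
    "Open for research.",  # no embedded date
]

def restriction_end_year(note):
    """Return the first 20th/21st-century year appearing after 'until', or None."""
    match = re.search(r"until\b.*?((?:19|20)\d{2})", note)
    return int(match.group(1)) if match else None

print([restriction_end_year(n) for n in notes])  # [2029, 2042, None]
```

The `(19|20)[0-9]{2}` pattern is the same one the XQuery uses to find candidate notes in the first place, so the two steps stay consistent.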

SAA 2014 Sessions of Interest

Here are a few sessions (not comprehensive!) related to the content of this blog at SAA this week:

Wednesday, August 13
3:30pm – 5:00pm

Carrie: Friday, August 15 • 2:45pm – 3:45pm; SESSION 503 – How Are We Doing? Improving Access Through Assessment

Maureen: Friday, August 15 • 2:45pm – 3:45pm; SESSION 501 – Taken for Granted: How Term Positions Affect New Professionals and the Repositories That Employ Them

Meghan: Thursday, August 14 • 3:00pm – 3:30pm and Friday, August 15 • 4:00pm – 4:30pm; P05 PROFESSIONAL POSTER – Mapping Duke History with Historypin

Steve: Thursday, August 14 • 5:30pm – 7:30pm; Graduate Student Poster Presentations: ArchivesSpace and the Opportunity for Institutional Change

A very brief guide to deleting records with the ArchivesSpace API, from a total tyro

If you’ve ever used cURL before, you don’t need this.

Also, the videos and documentation that Hudson Molonglo put together are really stellar and recommended to anyone starting with this.

This guide is a true scratch pad of my notes on how I did this. It might also be useful for those of us who never had formal training with scripting, but are in charge of the archival data in our repositories and appreciate power tools. Obviously, the problem with power tools is that you can cut your arm off. Use this carefully. Use it in test/dev. Ask someone to check your work if you’re doing something truly crazy.

Here’s what I did

This came up for me because I had done a failed test migration (we think there’s a weird timestamp problem in the accessions table) and I wanted to delete the repository and all records in the repository in ASpace before trying again. As far as I can tell, there isn’t a great way to delete thousands of records in the user interface. So, the API seemed the way to go.

I figured this out by watching the video and reading the documentation on GitHub, and then doing a little extra googling around to learn more about curl options.

If you’re using a Mac, just fire up the terminal and get on with your life. I use a Windows PC at work, so I use Cygwin as a Unix emulator. The internet gave me good advice about how to add curl.exe.

Note: you won’t be able to do any of this unless you have admin access.

Let’s start with “Hello, World!”

$ curl 'http://test-aspace.yourenvironment.org:port/'

In this example, the URL before the colon should be your ASpace instance (use test/dev!) and “port” should be your port. The response you get should basically just confirm that yes, you have communicated with this server.

Connect to the server

$ curl -F password='your password' 'http://test-aspace.yourenvironment.org:port/users/admin/login'

Here, you’re logging on as admin. The server will respond with a session token — go ahead and copy the token response and make it a variable, so you don’t have to keep track of it.

export TOKEN=cc0984b7bfa0718bd5c831b419cb8353c7545edb63b62319a69cdd29ea5775fa

Delete the records

Here, you definitely want to check the API documentation on GitHub. Basically, this tells you how to format the URI and the command to use. For instance, below, I wanted to delete an entire repository. I found out, though, that I couldn’t delete the repository if it had records that belonged to it. Since agents and subjects exist in ASpace without belonging to a repository, and since accessions and digital records hadn’t successfully migrated, I only needed to delete resource records.

$ curl -H "X-ArchivesSpace-Session: $TOKEN" -X "DELETE" 'http://test-aspace.yourenvironment.org:port/repositories/3/resources/[278-1693]'

So, I passed something to the header that gave my token ID, then I sent a command to delete some records. But which ones?

Let’s parse this URI. The first part is my ASpace test server, the port is my port.

The next thing to understand is that each repository, resource, accession, agent, whatever, has a numeric ID. URIs are formatted according to the record type and the ID. So, I go to repositories/3, because the resources I want to delete are in a particular repository, and that repository has the numeric ID of “3”. To find this out, you can look in the ASpace interface, or you can send a call to yoururl/repositories, which will give you a JSON response with id (and other) information about all of the repositories on your server.
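If you want to script that lookup, the response is easy to parse. Here’s a minimal sketch, assuming the /repositories call returns a JSON array whose objects carry a “uri” field like “/repositories/3” — the sample data below is made up for illustration, so check your own server’s output:

```python
import json

# A trimmed, hypothetical example of the JSON a GET to /repositories
# might return -- the real response has many more fields per repository.
sample_response = '''
[
  {"uri": "/repositories/2", "repo_code": "SCUA", "name": "Special Collections"},
  {"uri": "/repositories/3", "repo_code": "UARC", "name": "University Archives"}
]
'''

# Pull the numeric ID off the end of each repository URI.
repos = json.loads(sample_response)
ids = [int(r["uri"].rsplit("/", 1)[-1]) for r in repos]
print(ids)  # [2, 3]
```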

After that, I tell curl which resource records I want to delete. There’s probably a better way, but I figured this out by sorting resources by date created, both ascending and descending, to find out what the first and last IDs are. I’d imagine, though, that if I didn’t want to look that up and I just asked for

'http://test-aspace.yourenvironment.org:port/repositories/3/resources/[1-2000]'

I would probably be okay, because it’s only deleting resource records in repository 3 and I want to get rid of all of those anyway. I’d get an error for resources that don’t exist in that repository, but it wouldn’t break anything. I had wondered if there are wildcards for curl, so that I could get ANY number after resources, but (according to some brief googling) it doesn’t look like there are.
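Since the bracketed range is just shorthand for one request per ID, you can preview exactly what curl is going to hit. A quick sketch (the host and port are placeholders) that expands the [278-1693] range the same way curl’s URL globbing does:

```python
# Expand the [278-1693] range into individual resource URIs, the same
# way curl's URL globbing does. Host and port are placeholders.
base = "http://test-aspace.yourenvironment.org:8089/repositories/3/resources"

first_id, last_id = 278, 1693
urls = ["{}/{}".format(base, n) for n in range(first_id, last_id + 1)]

print(len(urls))   # 1416 individual DELETE targets
print(urls[0])
print(urls[-1])
```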

What does this all mean?

Uh, I don’t know? I mean, the API is obviously very powerful and amazing, and I’m glad I didn’t have to figure out a way to delete those records in the interface. But I’m really just starting to dip my toe into the potential of this. I’m sure you can look forward to more updates.

Clean up: Instructions for accession date in accession records

Every accession record requires an accession date; ArchivesSpace requires one to save a record. Our processing manual instructs us to include the date the materials arrived at UMD, but it’s buried in the Beast documentation rather than stated as a policy/requirement. About 70% of the time there is a decent accession date in the Beast. We actually ended up checking our paper files and adding accession dates too. We weren’t going to do this originally, but since we were checking the paper files for some other issues anyway, we went ahead and added an accession date when we found one.

Actions:

Every accession requires an accession date in YYYY-MM-DD format.

We have two date fields related to accession records. The “acqdate” field from the “source” table SHOULD specify the accession date of the materials. We also have the “dateoffirstentry” field from the “archdesc” table, which is the date the form in the Beast was entered. Usually, this date is later than the acqdate, and pretty regularly it is much, much later.

If the date in “acqdate” appears to fall in the same fiscal year as the accession number, use this date.

When “acqdate” and “dateoffirstentry” both have similar dates, choose the earliest date, which will usually be in “acqdate”.

  • ex: “dateoffirstentry” is 6/18/2003 and “acqdate” is 06/11/2003. Use acqdate.

If there is no “acqdate”, check “dateoffirstentry”. If this date appears to fall in the same fiscal year, use it. If this field was obviously entered much later, do not use this date.

  • ex:  “dateoffirstentry” is 1/19/2003, no “acqdate”, accession number is 2003-19. Use dateoffirstentry.
  • ex: “dateoffirstentry” is 1/19/2003, no “acqdate”, accession number is 92-86. Do not use dateoffirstentry.
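The fiscal-year comparison in the rules above can be sketched as a small helper. This assumes UMD’s fiscal year runs July 1–June 30 and is named for its ending year (so August 2002 falls in FY2003), and that the accession number starts with a four-digit fiscal year; two-digit prefixes like 92-86 would need to be expanded first:

```python
from datetime import date

def fiscal_year(d):
    """Fiscal year of a date, assuming a July 1 - June 30 fiscal year
    named for its ending year (so August 2002 falls in FY2003)."""
    return d.year + 1 if d.month >= 7 else d.year

def matches_accession_year(d, accession_number):
    """True if the date falls in the fiscal year implied by the accession
    number's prefix. Assumes a four-digit prefix like '2003-19'."""
    return fiscal_year(d) == int(accession_number.split("-")[0])

# From the examples above: 1/19/2003 falls in FY2003, so it matches 2003-19.
print(matches_accession_year(date(2003, 1, 19), "2003-19"))  # True
print(matches_accession_year(date(2003, 8, 1), "2003-19"))   # False (FY2004)
```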

If you have a partial date:

  • Year and Month (August 2002)
    • Use 01 as the day value: 2002-08-01
    • (used first day of month to avoid different number of days in a month)
  • Season (Fall 2005 or Spring 2006)
    • Choose the end of the fiscal year or calendar year
    • Summer and Fall go to December 31, 2005
    • Winter and Spring go to June 30, 2006
  • Year only (2002, 1985)
    • Using the accession number, try to determine the fiscal year the materials came in.
    • If you can determine the fiscal year, pick either the end of the fiscal year (June 30) or the end of the calendar year (December 31). This preserves the fact that the accession came in during that fiscal year, while also attempting to record the calendar year the materials came in.
    • If the year matches the first part of the accession number, materials came in during the second half of the fiscal year, but the first half of the calendar year: January-June XXXX
      • Use June 30, XXXX
      • ex: 2002 with accession number 2002-172 is FY2002
        • FY2002 includes Jan-June 2002, use June 30, 2002
      • (used end of month to account for whole period)
    • If the accession number is one higher than the year, materials came in during the first half of the fiscal year, but the second half of the calendar year: July-December XXXX
      • Use December 31, XXXX
      • ex: 2002 with accession number 2003-267 is FY2003
        • FY2003 includes July-Dec 2002, use December 31, 2002
      • (used end of month to account for whole period)
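Taken together, the partial-date rules above can be sketched as one normalization helper. This is an illustration of the rules rather than production code; the year-only branch compares the date’s year to the fiscal year taken from the accession number prefix, and the fallback value anticipates the no-date rule below:

```python
from datetime import date

DEFAULT = date(1856, 1, 1)  # fallback when no date can be determined

def normalize_partial_date(year, month=None, season=None, accession_fy=None):
    # Year and month: use the first day of the month.
    if month is not None:
        return date(year, month, 1)
    # Season: Summer/Fall -> end of calendar year; Winter/Spring -> end of fiscal year.
    if season is not None:
        if season.lower() in ("summer", "fall"):
            return date(year, 12, 31)
        return date(year, 6, 30)
    # Year only: place the date at the end of the half-year implied by the fiscal year.
    if accession_fy is not None:
        if year == accession_fy:        # Jan-June, second half of the fiscal year
            return date(year, 6, 30)
        if accession_fy == year + 1:    # July-Dec, first half of the next fiscal year
            return date(year, 12, 31)
    return DEFAULT

print(normalize_partial_date(2002, month=8))            # 2002-08-01
print(normalize_partial_date(2005, season="Fall"))      # 2005-12-31
print(normalize_partial_date(2002, accession_fy=2002))  # 2002-06-30
print(normalize_partial_date(2002, accession_fy=2003))  # 2002-12-31
```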

If there is no date:

  • Use default date of 1856-01-01
    • Selected as the year of the charter of the Maryland Agricultural College.
    • Debated using the year of the accession number and selecting dummy months and days, but decided not to do this as it would negatively impact calendar year reporting.

What date to pick if there are multiple dates:

  • Unfortunately, a very common practice was to add newly arriving materials to a recent accession instead of documenting each discrete group as its own accession. There is no way to tell which materials arrived on which date, so we can’t split this into two accession records.
    • Examples include:
      • September 11, 1990 and September 28, 1990
      • 7/27/2006 and 8/20/2006
    • Use the earliest date for the accession date.
    • In a general note state “Portions of this accession also arrived on [dates].”
  • Other times collections were not documented at the point of accession, so dates are generalized such as: circa 1990s.
    • Use default accession date of 1856-01-01
    • In a general note state “Accession date of materials may be [dates].”

Clean up: Instructions for accession identifier in accession records

Every accession record requires a unique accession number per local requirements. ArchivesSpace requires a value in the accession identifier field to save a record, which will greatly help in enforcing compliance for this field.

For the most part, our collections have accession numbers, though sometimes weird things happened. Most numbers look like this: 2014-18 or 92-113. There are a few variations with unit abbreviations as well as additional numbers or formatting.

Actions:

Format accession numbers according to the following:

  • accession_number_1 = fiscal year (2014 or 72)
  • accession_number_2 = sequential number of the accession within the year
    • Add leading zeros to second part of number. In OpenRefine: “000”[0,3-length(value)] + value
  • accession_number_3 = Letter code for collecting units
    • We are developing a letter code so we can quickly identify that an accession belongs to a particular collecting unit. We used an EAD tag for this in the Beast, which won’t fly anymore.
  • accession_number_4 = use for additional memorabilia or photograph numbers (more on that in another post, it’s a headache)
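For the zero-padding in accession_number_2, here’s a quick equivalent of that GREL expression (the function names are just for illustration):

```python
def pad_sequence(value):
    """Zero-pad the sequential part to three digits, mirroring the GREL
    expression "000"[0,3-length(value)] + value from OpenRefine."""
    return value.zfill(3)

def split_accession(raw):
    """Split a raw number like '2014-18' into the first two parts
    described above (unit letter codes are handled separately)."""
    year, seq = raw.split("-", 1)
    return year, pad_sequence(seq)

print(split_accession("2014-18"))   # ('2014', '018')
print(split_accession("92-113"))    # ('92', '113')
```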

If more than one accession number is listed for a record:

  • Keep the oldest number for the record
  • Create a new record for each additional number, using the same title information
  • Include other information only if you know it is the same for each accession, ex: all the same donor
  • Add “described in Accession [acc#]” to the general information note. This will signal that the only description we have is contained in the earlier record. (Obviously, this is lousy for documenting accessions, but it’s the best we can do!)

If there is no accession number:

  • Consult the Excel spreadsheet where accession numbers are assigned. Use the acqdate and dateoffirstentry fields to narrow down to a fiscal year and see if you can match the information up with an accession number.
  • Consult paper accession files (set ordered by accession name) to see if you can match something up.
  • Check the physical boxes to see if an accession number is on them (most of our boxes only have a title on them with no identifying number, a personal pet peeve.)
  • If nothing can be found anywhere, give accession a new number.
    • Add “retroactively assigned” to the general information note so we’ll know this was a backlog accession.
    • Write the accession number on box 1 of the materials.