How I learned to stop worrying and love the API

At University of Maryland, we are migrating from a local MS Access database appropriately named the Beast. We chose to begin our migration project with our accessions data. To get this data into ArchivesSpace we decided to use the csv importer since it seemed to be the easiest format to crosswalk our data to, and honestly, the only option for us at the time.

minions

Okay. Let me catch my breath.

For us, it seemed that the lowest barrier for getting our accession data into ArchivesSpace was to use the csv importer. Since we could get our data out of the Beast in a spreadsheet format, this made the most sense at the time. (Oh, if we had only known.)

Our data was messier than we thought, so getting our data reconciled to the importer template had its fair share of hiccups. The clean-up is not the moral of this story, although a bit of summary may be useful: some of the issues were our own doing, such as missing accession numbers that required going back to the control files, and just missing data in general. Our other major issue was understanding the importer and the template. The documentation contained some competing messages regarding the list of columns, importance (or unimportance) ofcolumn order, as well as unanticipated changes to the system that were not always reflected in the csv importer and template We did finally manage to get a decent chunk of our data cleaned and in the template after almost a year of cleaning and restructuring thousands of records.

AND THEN. Just when we thought we had it all figured out, ArchivesSpace moved processing/processing status from collection management to events. Unfortunately, at the current time there is not a way to import event information via the CSV importer. So we were stuck. We had already invested a lot of time in cleaning up our accessions data and now had a pretty important piece of data that we could no longer ingest in that same manner.

In comes the ArchivesSpace API to save the day!!

[In hindsight, I wish we had just used the API for accessions in the first place, but when this process began we were all just wee babes and had nary a clue how to really use the API and really thought the csv importer was the only option for us. Oh how far we’ve come!]

So, we revised our process to:

  1. Clean accessions in excel/open refine
  2. Keep the processing data we would need to create the event record in a separate sheet to keep the data together
  3. Import accessions (minus the processing event info) using csv importer
  4. After successful import, have a bright-eyed student worker (thanks Emily!) do the thankless task (sorry Emily!) of recording the ID of each accession (which the API will need to associate the processing event with the correct accession) into that separate sheet mentioned in step 2
  5. Using the spreadsheet from step 4 as the source, create a spreadsheet that includes the accession id and associated processing status with the rest of the information required for importing events (Getting to know the various ArchivesSpace data schemas is important). To make life easier, you may want to just name the columns according to the schema elements to which they will map.openrefine_event
  6. Since the API wants this to be in a JSON file, I then upload this spreadsheet file into OpenRefine (see screenshot above). This gives me a chance to double check data, but most importantly, makes it REALLY easy for me to create the JSON file (I am not a programmer).
  7. Once I am happy with my data in OpenRefine, I go to export, templating, then I put in the custom template (see below) I’ve created to match data schemas (listed in step 5). Since some is boilerplate, I didn’t need to include it in the spreadsheet.

OR_template

Here’s the template I developed based on the schemas for event, date, and linked records:


{"jsonmodel_type":{{jsonize(cells["jsonmodelType"].value)}},"event_type":{{jsonize(cells["event_type"].value)}},"external_ids":[],"external_documents":[],"linked_agents":[{"role":"executing_program","ref":"/agents/software/1"}],"linked_records":[{"role":"source","ref":"/repositories/2/accessions/{{jsonize(cells["linked_records"].value)}}"}],"repository":{"ref":"/repositories/2"},"date":{"label":{{jsonize(cells["label"].value)}},"date_type":{{jsonize(cells["date_type"].value)}},"expression":{{jsonize(cells["date"].value)}},"jsonmodel_type":"date"}}

Then export! Make sure to save the file with a memorable filename.

I then open the file in a text editor (for me, TextWrangler does the trick) and I have to do two things: make sure all whitespaces have been removed (using find and replace), and make sure there is one json per line. (regex find and replace of \r). However, you should be able to create the template in such a way as to do this.

Then, I put together a little bash script that tells curl to take the json file that was just created, read it line by line and POST each line via the API.

#!/bin/bash

url="http://test-aspace.yourenvironment.org:port/repositories/[repo#]/event"
for line in $(cat your_events.json);
do
echo `curl -H "X-ArchivesSpace-Session: $TOKEN" -d "$line" $url`;
done

Now, I just transfer need to transfer both the bash script and the json file from my local files to the archivespace server. (using the command  scp <filename> <location>. If you’re like me, you may have needed to ask a sysadmin how to do this in the first place).

Make sure you have logged in, and exported the Session ID as a $TOKEN. (I won’t walk you through that whole process of logging in, since Maureen outlines it so well here, as does the the Bentley here.)

Now, from the command line, all you need to do is:

bash curl_json.sh

And there you go. You should see lines streaming by telling you that events have been created.

If you don’t…or if the messages you see are of error and not success, fear not. See if the message makes sense (often it will be an issue with a hard-to-catch format error in the json file, like a missing semi-colon, or an extra ‘/’ (I speak from experience). These are not always easy to suss out at first, and trust me, I spent a lot of time with trial and error to figure out what I was doing wrong (I am not a programmer, and still very, very new at this).

Figuring out how to get our processing event data into ArchivesSpace after hitting a major roadblock with the csv importer still feels like a great accomplishment. We were initially worried that we were going to have to either a) go without the data, or b) enter it manually. So to find a solution that doesn’t require too much manual work was satisfying, professionally speaking (did I mention I’m not a programmer and had never really dealt with APIs before?).

So to all of you out there working in ArchivesSpace, or in anything else, and you feel like you keep hitting a wall that’s a bit higher than what you’ve climbed before, keep at it! You’ll be amazed at what you can do.

Records Management for Discards

Maybe this is a familiar problem for some other archivists. You have a collection that you’ve just finished processing — maybe it’s a new acquisition, or maybe it’s been sitting around for awhile — and you have some boxes of weeded papers leftover, waiting to be discarded. But for some reason — a reason usually falling outside of your job purview — you are not able to discard them. Maybe the gift agreement insists that all discards be returned to the donor, and you can’t track down the donor without inviting another accession, and you just don’t have time or space for that right now. Maybe your library is about to renovate and move, and your curators are preoccupied with trying to install 10 exhibitions simultaneously. Maybe the acquisition was a high-value gift, for which the donor took a generous tax deduction, and your library is legally obligated to keep all parts of the gift for at least three years. Maybe your donor has vanished, the gift agreement is non-existent, or the discards are actually supposed to go to another institution and that institution isn’t ready to pay for them. The reasons don’t matter, really. You have boxes of archival material and you need to track them, but they aren’t a part of your archival collection any more. How do you manage these materials until the glorious day when you are actually able to discard them?

We’ve struggled with this at Duke for a long time, but it became a more pressing issue during our recent renovation and relocation. Boxes of discards couldn’t just sit in the stacks in a corner anymore; we had to send them to offsite storage, which meant they needed to be barcoded and tracked through our online catalog. We ended up attaching them to the collection record, which was not ideal. Because the rest of the collection was processed and available, we could not suppress the discard items from the public view of the catalog. (Discards Box 1 is not a pretty thing for our patrons to see.) Plus, it was too easy to attach them to the collection and then forget about the boxes, since they were out of sight in offsite storage. There was no easy way to regularly collect all the discard items for curators to review from across all our collections. It was messy and hard to use, and the items were never going to actually be discarded! This was no good.

I ended up making a Discards 2015 Collection, which is suppressed in the catalog and therefore not discoverable by patrons. All materials identified for discard in 2015 will be attached to this record. I also made an internal resource record in Archivists’ Toolkit (soon to be migrated to ArchivesSpace) that has a series for each collection with discards we are tracking for the year. It is linked to the AT accession records, if possible. In the resource record’s series descriptions, I record the details about the discards: what is being discarded, who processed it, who reviewed it, why we haven’t been able to discard it immediately, and when we expect to be able to discard the material (if known). The Discard Collection’s boxes are numbered, barcoded, and sent to offsite storage completely separated from their original collection — as it should be. No co-mingling, physically or intellectually! Plus, all our discards are tracked together, so from now on, I can remind our curators and other relevant parties at regular intervals about the boxes sitting offsite that need to be returned, shredded, sold, or whatever.

I’d love to hear other approaches to discards — this is a new strategy for us, so maybe I’ve missed something obvious that your institution has already solved. Let me know in the comments. Happy weeding, everyone!

On Containers

I’m here  to talk about boxes. Get excited.

I’ve been spending a LOT of time lately thinking about containers — fixing them, modelling them, figuring out what they are and aren’t supposed to do. And I’ve basically come to the conclusion that as a whole, we spend too much time futzing with containers because we haven’t spent enough time figuring out what they’re for and what they do.

For instance, I wrote a blog post a couple of months ago about work we’re doing to remediate stuff that should not but is happening with containers — barcodes being assigned to two different containers, two different container types with the same barcode/identifier information, etc. Considering the scale of our collections, the scale of these problems is mercifully slight, but these are the kinds of problems that turn into a crisis if a patron is expecting to find material in the box she ordered and the material simply isn’t there.

I’m also working with my colleagues here at Yale and our ArchivesSpace development vendor Hudson Molonglo to add functionality to ArchivesSpace so that it’s easier to work with containers as containers. I wrote a blog post about it on our ArchivesSpace blog. In short, we want to make it much easier to do stuff like assigning locations, assigning barcodes, indicating that container information has been exported to our ILS, etc. In order to do this, we need to know exactly how we want containers to relate to archival description and how they relate to each other.

As I’ve been doing this thinking about specific container issues, I’ve had some thoughts about containers in general. Here they are, in no particular order.

What are container numbers doing for us?

A container number is just a human-readable barcode, right? Something to uniquely identify a container? In other words, speaking in terms of the data model, isn’t this data that says something different but means the same thing? And is this possibly a point of vulnerability? At the end of the day, isn’t a container number  something that we train users to care about when really they want the content they’ve identified? And we have a much better system for getting barcodes to uniquely identify something than we do with box numbers?

In the days that humans were putting box numbers on a call slip and another human was reading that and using that information to interpret shelf location, it made sense to ask the patron to be explicit about which containers were associated with the actual thing that they want to see. But I think that we’ve been too good at training them (and training ourselves) to think in terms of box numbers (and, internally, locations) instead of creating systems that do all of that on the back end. Information about containers should be uniform, unadorned, reliable, and interact seamlessly with data systems. Boxes should be stored wherever is best for their size and climate, and that should be tracked in a locations database that interacts with the requesting database. And the actual information should be associated seamlessly with containers.

This means that instead of writing down a call number and box number and reading a note about how materials of this type are stored on-site and materials of another type are stored off-site, let’s take a lot of human error out of this. Let’s let them just click on what they want to see. Then, the system says “a-ha! There are so many connections in my database! This record is in box 58704728702861, which is stored in C-29 Row 11, Bay 2, Shelf 2. I’ll send this to the queue that prints a call slip so a page can get that right away!” And instead of storing box numbers and folder numbers in the person’s “shopping cart” of what she’s seen, let’s store unique identifiers for the archival description, so that if that same record get’s re-housed into box 28704728702844 and moved to a different location, the patron doesn’t have to update her citation in any scholarly work she produces. Even if the collection gets re-processed, we could make sure that identifiers for stuff that’s truly the same persists.

Also, don’t tell me that box numbers do a good job of giving cues about order and scale. There are waaaaaayyyyy better ways of doing that than making people infer relationships based on how much material fits into 0.42 linear feet.

We have the concepts. Our practice needs to catch up, and our tools do too.

Darn it, Archivists’ Toolkit, you do some dumb things with containers

Archival management systems are, obviously, a huge step up from managing this kind of information in disparate documents and databases. But I think that we’re still a few years away from our systems meeting their potential. And I really think that folks who do deep thinking about archival description and standards development need to insert themselves into these conversations.

Here’s my favorite example. You know that thing where you’re doing description in AT and you want to associate a container with the records that you just described in a component? You know how it asks you what kind of an instance you want to create? That is not a thing. This is just part of the AT data model — there’s nothing like this in DACS, nothing like it in EAD. Actual archival standards are smart enough to not say very much about boxes because they’re boxes and who cares? When it exports to EAD, it serializes as @label. LABEL. The pinnacle of semantic nothingness!

This is not a thing.

This is not a thing.

Like, WHY? I can see that this could be the moment where AT is asking you “oh, hey, do you want to associate this with a physical container in a physical place or do you want to associate it with a digital object on teh interwebz?” but there’s probably a better way of doing this.

My problem with this is that it has resulted in A LOT of descriptive malpractice. Practitioners who aren’t familiar with how this serializes in EAD think that they’re describing the content (“oh yes! I’ve done the equivalent of assigning a form/genre term and declaring in a meaningful way that these are maps!”) when really they’ve put a label on the container. The container is not the stuff! If you want to describe the stuff, you do that somewhere else!

Oh my gosh, my exclamation point count is pretty high right now. I’ll see if I can pull myself together and soldier on.

Maybe we should be more explicit about container relationships.

Now, pop quiz, if you have something that is in the physical collection and has also been microfilmed, how do you indicate that?

In Archivists’ Toolkit, there’s nothing clear about this. You can associate more than one instance with an archival description, but you can also describe levels of containers that (ostensibly) describe the same stuff, but happen to be a numbered item within a folder, within a box.

Anything can happen here.

Anything can happen here.

So this means that in the scenario I mentioned above, it often happens that someone will put the reel number into container 3, making the database think that the reel is a child of the box.

But even if all of the data entry happens properly, EAD import into Archivists’ Toolkit will take any three <container> tags and instead of making them siblings, brings the three together into parent-child instance relationship like you see above. This helps maintain relationships between boxes and folders, but is a nightmare if you have a reel in there.

EAD has a way of representing these relationships, but the AT EAD export doesn’t really even do that properly.

 <c id="ref10" level="file">
   <did>
     <unittitle>Potter, Hannah</unittitle>
     <unitdate normal="1851/1851">1851</unitdate>
     <container id="cid342284" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342284" type="Folder">2</container>
   </did>
 </c>

 <c id="ref11" level="file">
   <did>
     <unittitle>Potter, Horace</unittitle>
     <unitdate normal="1824/1824">1824</unitdate>
     <container id="cid342283" type="Box" label="Mixed Materials (39002038050457)">1</container>
     <container parent="cid342283" type="Folder">3</container>
   </did>
 </c>

Here, we see that these box 1’s are the same — they have the same barcode (btw, see previous posts for help working out what to do with this crazy export and barcodes). But the container id makes it seem like these are two different things — they have two different container id’s and their folders refer two two different parents.

What we really want to say is “This box 1 is the same as the other box 1’s. It’s not the same as reel 22. Folder 2 is inside of box 1, and so is folder 3.” Once we get our systems to represent all of this, we can do much better automation, better reporting, and have a much more reliable sense of where our stuff is.

So if we want to be able to work with our containers as they actually are, we need to represent those properly in our technology. What should we be thinking about in our descriptive practice now that we’ve de-centered the box?

“Box” is not a level of description.

In ISAD(G) (explicitly) and DACS (implicitly), archivists are required to explain the level at which they’re describing aggregations of records. There isn’t a vocabulary for this, but traditionally, these levels include “collection”, “record group”, “series”, “file” and “item.” Note that “box” is not on this list or any other reasonable person’s list. I know everyone means well, and I would never discourage someone from processing materials in aggregate, but the term “box-level processing” is like nails on a chalkboard to me. As a concept, it should not be a thing. Now, series-level processing? Consider me on board! File-group processing? Awesome, sounds good! Do you want to break those file groups out into discrete groups of records that are often surrounded by a folder and hopefully are associated with distinctive terms, like proper nouns? Sure, if you think it will help and you don’t have anything better to do.

A box is usually just an accident of administravia. I truly believe that archivists’ value is our ability to discern and describe aggregations of records — that box is not a meaningful aggregation, and describing it as such gives a false impression of the importance of one linear foot of material. I’d really love to see a push toward better series-level or file-group-level description, and less file-level mapping, especially for organizations’ records. Often, unless someone is doing a known item search, there’s nothing distinct enough about individual files as evidence (and remember, this is why we do processing — to provide access to and explain records that give evidence of the past) to justify sub-dividing them. I also think that this could help us think past unnecessary sorting and related housekeeping — our job isn’t to make order from chaos*, it’s to explain records and their context of creation of use. If records were created chaotically and kept in a chaotic way, are we really illuminating anything by prescribing artificial order?

This kind of thinking will be increasingly important when our records aren’t tied to physical containers.

In conclusion, let’s leave the robot work to the robots.

If I never had to translate a call number to a shelf location again, it would be too soon (actually, we don’t do that at MSSA, but still). Let’s stop making our patrons care about boxes, and let’s start making our technology work for us.


* This blog’s title, Chaos –> Order, is not about bringing order to a chaotic past — it’s about bringing order to our repositories and to our work habits. In other words, get that beam out of your own eye, sucka, before you get your alphabetization on.

 

All ArchivesSpace, All the Time

It’s been quiet around here lately — look out for a handful of book review blog posts next week.

Until then, if you need a quick fix of archival order, I would encourage you to check out the ArchivesSpace @ Yale implementation blog here, and its companion site with associated documentation, here. I think that all four of us have ArchivesSpace on our agendas in one way or another, so you’ll probably be seeing more ArchivesSpace planning blog posts on this site, too.

Converting Preliminary Inventories to Tables with Macros: Folder Titles and Numbers

In my last post, I talked about how we have used macros at Maryland to start converting preliminary inventories into more useful formats to prepare for eventual importation to ArchivesSpace and create EAD finding aids. Using another macro, I have replaced folder title rows containing “(2f)” or “(3f)” with multiple rows. Using these two macros together, each single line in our tables will represent a single folder. To complete the process, I used a formula that quickly creates folder numbers. Folder numbers will allow you to sort the spreadsheet data while preserving the physical order and location of each folder.

In a Word document, the 2f or 3f shorthand might help preserve space in an inventory or finding aid, or it might seem more logical or aesthetically pleasing. But in a table, we want an accurate representation of each unit (the folder). In the Athletic Media Relations example that I discussed in my last post, there were 67 rows that contained either 2f, 3f,or 4f:

Untitled-2

Copying these lines one by one would have taken an excessive amount of time. Instead of copying a folder title over and over again, I automated the task by using this macro:

Sub two_f()
' Sub and End Sub are commands that say the macro will start and end
'Do is a command that marks the beginning of a loop
Do
Columns("B:B").EntireColumn.Select
'Search the B column for 2f instances
Selection.Find(What:=", (2f)", After:=ActiveCell, LookIn:=xlFormulas, _
LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False).Activate
'Create a new line and copy down
ActiveCell.Offset(1, 0).Rows("1:1").EntireRow.Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Selection.FillDown
'replace the “2f” from the line you copied and line you pasted
ActiveCell.Offset(-1, 1).Range("A1:A2").Select
Selection.Replace What:=", (2f)", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Loop Until Cells.findnext is nothing
Range("A1").Select
MsgBox "Conversion complete!"
End Sub

This macro can be adjusted for instances of “(3f)”, “(4f)”, etc. The following code shows the appropriate adjustments:

Sub three_f()
' Sub and End Sub are commands that say the macro will start and end
'Do is a command that marks the beginning of a loop
Do
Columns("B:B").EntireColumn.Select
'Search the B column for 3f instances
Selection.Find(What:=", (3f)", After:=ActiveCell, LookIn:=xlFormulas, _
LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False).Activate
'Create two new lines and copy down each time
ActiveCell.Offset(1, 0).Rows("1:1").EntireRow.Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Selection.FillDown
ActiveCell.Offset(1, 0).Rows("1:1").EntireRow.Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Selection.FillDown
'replace the “3f” from the line you copied and two lines you pasted
ActiveCell.Offset(-2, 1).Range("A1:A3").Select
Selection.Replace What:=", (3f)", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Loop Until Cells.findnext is nothing
Range("A1").Select
MsgBox "Conversion complete!"
End Sub

The code could also be adjusted accordingly to accommodate instances of 4f, 5f, etc. The macros will make the following adjustments to our example:

Untitled-3

Within a few seconds, the macro created about 200 lines in this Athletic Media Relations spreadsheet.

As with any macro that you might copy and paste into your VBA Project Window, save frequently and make sure this fits your needs exactly to prevent any unintended results. For example, the sample code above searches for and replaces “(3f)”, but your inventories might not use parentheses.

Now that each line in this table represents a single folder in this collection, we should create a folder number to preserve the order inside each box.

Formulas have been used elsewhere on this blog. Like macros, formulas are extremely powerful for data cleanup and this one will hopefully give you some inspiration. As Cassie has written earlier, the IF formula can be very useful for testing a condition of your choice for a column of data. It can also be used for creating new data. In this case, I want to create a folder number.

This formula uses the same If/Then/Else logic as Cassie’s previous example. The =IF command requires three statements separated by commas. The first (“If”) is the true or false condition you are testing. The second (“Then”) is the result you want if the condition has been met; the third (“Else”) is the result that you want if the condition was not met.

In this formula, we want to see if a folder is in the same box as the previous folder. If it is, we want to continue a sequence of folder numbers (1 to 2, 2 to 3, etc.). If this folder is not in the same box, we want to reset the sequence of folder numbers and start over with the number one (“1”).

In the Excel IF formula, this logical expression looks like this:

=IF(A2=A1,B1+1,1)

The one problem here is that this formula will not work for your very first folder, since it has no previous line to refer back to. To resolve this, just put the number “1” as your first folder number and put the formula in the cell for the second folder number.

Untitled-4

After you hit enter, fill down and don’t forget to copy the cells, and Paste Special your values!

As with macros, help with formulas is easy to find through Google, especially on sites such as MrExcel.com or StackOverflow.

Converting Preliminary Inventories to Tables with Macros: Moving Box Numbers

At University of Maryland we have lots of folder- (or item-) level inventories for processed, partially processed, or unprocessed collections. Preliminary inventories were mostly created using Microsoft Word, which is meant to create written works on printed pages, not the tabular data that we need to operate between systems. Complicating matters further, these files were each structured quite differently. My challenge this summer was to bring some of these Word documents into a standardized table format. Getting this data into Excel is a first step before we can use it in an EAD finding aid/ArchivesSpace. This has required quite a bit of cleanup and conversion. By using macros and formulas, we can make the conversion process a little easier.

I started the conversion process using many of the same regular expressions that Bonnie has described here on the blog. I especially looked out for tabs and line breaks, which you can locate in Word by searching for ^p or ^t (as well as ^13 and ^9, if you are using wildcards). To convert the inventory into a table, I had to make sure that each folder title was separated by one line break, with no line breaks in between. This would allow me to copy and paste the contents of the Word document into Excel and create a clean table.

Cleaning up a Word document with regular expressions was only part of the conversion process. I was still left with a number of redundant tasks moving around standardized data in Excel. First among them was to eliminate lines reporting a box number and move that number into a separate column. This is where I started introducing macros into our conversion work.

A Microsoft Office macro is essentially a process that automatically executes a task or series of tasks. You can run one by clicking an option on a list or even pressing a custom-made hotkey. Microsoft Office macros use Visual Basic for Applications (VBA) programming language. VBA is intended to be a simple, forgiving language, but it still takes some time to learn. I will point out a few simple steps you can take to get started with macros and even how to customize each code.

The simplest way to create a macro is to record it. Microsoft Word and Excel can “record” your actions by translating your activity in the program into a written code. You can then save these actions and re-create them later by “playing” the macro.

To record a macro, go to the View tab, click on the Macros list, and click “Record Macro.” When you are finished recording, go back to the Macros list and click “Stop Recording.” To replay the macro (re-run the same operations you recorded) or edit the source code, go to the Macro list and click View Macro, then “Run” or “Edit”.

In some instances, you may already have the code for a macro but do not have it stored into your copy of Microsoft Excel. To insert the pre-written code into Excel, type Alt+F11 to open the VBA Project Window, then Insert -> Module. Paste your code into the new window and press the Play button.

The example that I will use is from one of our Athletic Media Relations accessions. If I were to move the words “Box 5” manually, I would perform a series of steps: search for “Box 5”, copy the cell contents, delete the row, find an adjacent empty cell to paste the value “5”, and fill “5” down. I wanted to turn this:

Excel Sample Before Macro Conversion

into this:

Untitled-2

This particular inventory only had thirteen boxes, and could be converted in a minute or two manually, but that could take a very long time if you have a preliminary inventory with hundreds of boxes. So far I have applied the macro to about thirty inventories comprising several hundred boxes and saved hours of work. I used the following code to automate the process:

Sub Macro1()
' Sub and End Sub are commands that say the macro will start and end
'Create a new A column for the box number
Columns("A:A").Select
Selection.Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
'Take the value Box 1 from Cell B1 to Cell A1
ActiveCell.Offset(0, 1).Range("A1").Select
Selection.Cut
ActiveCell.Offset(0, -1).Range("A1").Select
ActiveSheet.Paste
'Fill the box number down
Selection.Copy
Range(Selection, Selection.End(xlDown)).Select
ActiveSheet.Paste
'Delete the row that initially had Box 1 in it
Rows(1).Delete
Columns("B:B").Select
'Do is a command that marks the beginning of a loop
Do
'Selection.Find searches within the selection (the B column) for a cell with a box number
Selection.Find(What:="box *", After:=ActiveCell, LookIn:=xlFormulas, _
LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False).Activate
ActiveCell.Select
Selection.Cut
ActiveCell.Offset(0, -1).Select
ActiveSheet.Paste
'Take out the word box in the A column so it is just a number
Columns(1).Cells.Replace What:="Box ", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Selection.Copy
Range(Selection, Selection.End(xlDown)).Select
ActiveSheet.Paste
ActiveCell.Rows("1:1").EntireRow.Select
Application.CutCopyMode = False
Selection.Delete Shift:=xlUp
Columns("B:B").Select
'This statement causes the commands to repeat until there are no more cells with the word Box in it
Loop Until Cells.FindNext Is Nothing
'These last five lines clean up the leftover box number from the last fill down
Selection.End(xlDown).Select
ActiveCell.Offset(1, -1).Select
Range(Selection, Selection.End(xlDown)).Select
Selection.ClearContents
Range("A1").Select
'End Sub is a required command that ends the macro
End Sub

Before you press “play,” a few words of warning. First, make sure you save a backup copy. You cannot undo the changes that are made through a macro! Any existing VBA code may need minor changes and tweaks to work with a new document, so be careful to review your code and make sure it fits your needs. Also, check to see if the macro will produce any unintended results when it runs.

The above sample macro identifies the “Box x” field by searching for the word “box” followed by a space and more characters. So make sure that there are no folder titles with the word “box” in it! (For example, in the sample screenshots above, “Box Scores” would need to be changed to read “Box-Scores”.)

For additional macro and VBA tips and workarounds, search Google for using “Excel”, “macro” and “VBA” as search terms. MrExcel.com and StackOverflow are especially useful sites.

In the next blog post, I will introduce a few other Excel macros and functions we have used at Maryland to continue this conversion process.

When will restricted materials become available?

One morning recently, our records services archivist sent me an email. He was wondering if there was a way I could report to him on which materials in our university archives have restrictions that have passed. After all, this data is buried in access restriction notes all over finding aids — it would be very difficult to find this information by doing a search on our finding aids portal or in Archivists’ Toolkit.

This is exactly the kind of project that I love to do — it’s the intersection of archival functions, improved user experience, and metadata power tools.

In ArchivesSpace, restrictions have controlled date fields. This kind of report would be very easy in that kind of environment! Unfortunately, AT and EAD only has a place for this information as free text in notes.

Time for an xquery!

xquery version "3.0";
 
declare namespace ead="urn:isbn:1-931666-22-9";
declare namespace xlink = "http://www.w3.org/1999/xlink";
declare namespace functx = "http://www.functx.com";

<restrictions>
{
 for $ead in ead:ead
 let $doc := base-uri($ead)
 return
 <document uri="{$doc}">
 {
 for $accessrestrict in $ead//ead:dsc//ead:accessrestrict/ead:p[matches(.,'(19|20)[0-9]{2}')]
 let $series := $accessrestrict/ancestor::ead:c[@level = 'series' or @level = 'accession' or @level = 'accn']//ead:unitid
 let $dateseg := fn:substring-after($accessrestrict,'until')
 for $x in $series
 return
 
 <lookhere location="{$x}">
 {$accessrestrict}
 <date>{$dateseg}</date>
 </lookhere>
 }
 </document>
}
</restrictions>

And now for the walk-through.

Working together, we determined that any end dates will be below the <dsc>. So this report asks for any access restriction note below the dsc that includes a date in the twentieth or twenty-first century.

The report tells me what series that access restriction note is a part of and which file it’s a part of. I also pull out any text after the word “until”, because I see that common practice is to say “These materials will be restricted until XXXX.”

From there, I was able to put this data into an excel spreadsheet, do a bit of clean-up there, and give my colleague a sorted list of when particular series in collections are slated to be open.