Computational Thinking and Archives

Yesterday, I was part of a panel with two brilliant colleagues (and a really great moderator) about computational thinking and problem solving in a library context. The room was packed, and I found the discussion rejuvenating — I’ll do my best to capture some lightning in a bottle.

What am I talking about?

“Computational thinking” is a term to describe a mode of problem solving that we all engage in.

What is computational thinking? It’s not necessarily about learning how to program, but rather how to think strategically so you can solve a problem, reduce or eliminate tediously repetitive tasks, improve accuracy and increase efficiency. Drawing on computer science strategies such as finding the logical structure of a task, modeling data in a more accessible form, and figuring out how to apply iteration and algorithms to break a task into pieces that might be automated, computational thinking is a mindset that anyone can learn and apply.

Every time you do a find and replace operation, you’re using computational thinking. If you’ve ever wanted to make find and replace more powerful or more exact, and you think you might know the rules for doing so, you’re well on your way to using computational methods.
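One small illustration (mine, not from the panel): a plain find can’t say “exactly four digits,” but Word’s wildcard search can find every four-digit year as a whole word:

```
Find what:    <[0-9]{4}>
Replace with: (^&)
```

Here <, >, [0-9], and {4} are Word wildcard syntax for “a whole word made of exactly four digits,” and ^& in the replacement stands in for whatever text was matched. Knowing rules like these is already computational thinking.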

How did I get here?

All three of us talked about not having formal backgrounds in computer science — Arcadia explicitly considers herself a “humanities person,” but fell in love with methods and tools she had developed to help do work better and more efficiently. Mark mentioned that he has an educational background in astronomy, but that for most of the computational work we do, we really only need pretty basic arithmetic. My master’s degree is in information science, but I did my undergraduate work in history (with particular interest in intellectual history) and I’ve never taken a computer science course in my life. I even weaseled out of the required information theory class in grad school so that I could take a doctoral seminar in the history department.

I didn’t talk about this during the panel, but I definitely have spent a lot of my life with a healthy distrust of “automation” and “efficiency” for their own sake. I poured beaucoup haterade during grad school. I felt sensitized to the obvious gender politics between the “computer” people who would go on to work at Amazon and Microsoft and the “people” people who would go on to work in non-profit libraries and archives. We were paying the same tuition, but we were very obviously not getting equitable access to resources. And I didn’t feel like the economists and computer scientists on the faculty had the same vocabulary I did about critical theory or social justice. I was there to be an archivist because I believed (and believe) that taking a cold, hard look at the past as it really was does a lot to dismantle fantasies of teleology and the inherent naturalness of patriarchy and white supremacy and nationalism and disdain for the poor and the inevitability of late-industrial capitalist society. Societies have been radically different than what they are now, which means that nothing is inevitable. I want to be an archivist because I believe that access to records can result in accountability to actors in the present and can provide leverage for the future.

I’m here to tell you that no one in my workflow analysis class was interested in talking about that. In fact, I sometimes think that technologists’ Utopian scientism — their dogged devotion to technical improvement — makes them susceptible to a Whiggish view of history and less observant of the lived constraints and injustices of others. When you spend your days “developing,” it’s not always easy to see that the world hasn’t gotten better on the whole for everyone. Now imagine a school founded on this worldview. I found this repellent, almost at a visceral level, entirely at odds with my education until then. Looking back, I wish that I had fled less into the embrace of archives, history and anthropology courses, and tried to get enough leverage to take a guerrilla approach to my education. There are gaps in my technical education that I wish I had taken the opportunity to fill.

Then I started working. And, as I mentioned on my panel, I really hate doing boring work. I hate feeling like a robot. I hate putting bad data into a FileMaker database that’s not accessible on the web and not based on schemata. I hate the waste that comes from data that can’t be re-used. I hate the waste of doing something by hand that isn’t creative or analytic or synthetic.

In archives, we work in bulk, we work with records formed by computers, and discovery happens in a networked environment. It became clear very quickly that if I wanted to be good at my job, if I wanted to deal with information explosion, and I didn’t want to do boring crap anymore, that I should get nimble around a computer.

So for me, at least, finding my way around a computer was necessary so that I could make room for the intellectual work of being an archivist. And I found, to my delight, that applying computational solutions is a creative, intellectual exercise in its own right.

To wrap up, here is the rest of my fortune cookie wisdom about computational thinking and library / archives work:

  • Be uncomfortable with boring work. Let that discomfort point you toward better solutions.
  • Know why you’re doing a task in the first place. This will help you design a better method, and may even help you ditch the task altogether.
  • At first, it may take a lot more time to do something with a script than to do it by hand. But once you’ve done it with a script, you’ve actually learned something. If you do it by hand, you’re stuck in the same learning mode that you were before.
  • Agitate for learning time.
  • If you’re a manager, encourage not just that work gets done, but that it gets done the best way using the best tools for the job.
  • It’s going to be hard to learn this stuff. It’s especially going to be hard if you’re a member of a group that people don’t see as typical technologists.
  • Haters gonna hate, especially on the internet. Find your supportive group. (It’s why we have this blog!)
  • Learn how to make good back-ups. You may kill your database/computer/whatever a few times. If you have a back-up, it’s no big deal.
  • You’re here for your users. Spend your time on stuff that benefits them, and let the robots do the boring stuff.

Put a strategic plan on it!

People who know me will know I love strategic planning. Or, more accurately, I love good strategic planning and how a strategic plan can assist you in many other activities.

Given that our library’s strategic plan is a few years old and our dean is retiring in the spring, the functional areas of SCUA didn’t want to wait for the whole-library process to move forward. Luckily, there’s no rule that says you can’t have a strategic document for levels below the top, such as a division or department.

While we didn’t go through a full-blown strategic planning process, we had run many brainstorming, visioning, and planning activities over the last year and a half. Many of the projects in our document were already approved (officially or unofficially) and represented in individual and unit work plans.

Why did we need a plan then? When planning projects or allocating resources we seemed to encounter a few challenges. The biggest (to me) were a lack of understanding about:

  • The difference between work that is strategic and moves a program forward vs. the prioritization of regular, ongoing work/projects
    • ex: processing the so and so papers may be a high priority on the list of collections to process, but this does not necessarily make that specific processing project a strategic priority
  • How the work of the different functional areas within SCUA directly relates to one another, supports the work of the entire department, and how each unit/function can participate in meeting shared goals

We determined three strategic directions across our work:

  1. Optimize the user experience
  2. Increase access to collections
  3. Expand knowledge of our collections to new audiences

Check out the full Strategic Directions for SCUA Functional Areas 2014-2017.

Here’s how I’m hoping to use our strategic directions document:

  • Raising awareness about what we do, why we do it, and its value within SCUA and the Libraries
  • Assisting in developing annual work plans, deciding how we spend our time, and evaluating our progress
  • Prioritizing pop-up/new projects. Is it really a project that will move us forward? Does it have to happen right now? Can we approach it differently than before? What do we STOP doing from our strategic directions or regular/ongoing work to accommodate it?
  • Using it as a tool for updating specific policies, procedures, and workflows, highlighting how these changes support the activities and goals outlined in the strategic directions
  • Advocating for resources at various levels within the library. Our AUL has already said this document will be extremely helpful as the libraries start to discuss priorities for fiscal and human resources for FY16.

Also, a hat tip to UCLA’s Library Special Collections strategic plan! We liked their presentation/formatting, so borrowed that for ours. Don’t reinvent the wheel!


Converting Preliminary Inventories to Tables with Macros: Folder Titles and Numbers

In my last post, I talked about how we have used macros at Maryland to start converting preliminary inventories into more useful formats, preparing for an eventual import into ArchivesSpace and the creation of EAD finding aids. Using another macro, I have replaced folder title rows containing “(2f)” or “(3f)” with multiple rows. Used together, these two macros make each line in our tables represent a single folder. To complete the process, I used a formula that quickly creates folder numbers. Folder numbers allow you to sort the spreadsheet data while preserving the physical order and location of each folder.

In a Word document, the 2f or 3f shorthand might help preserve space in an inventory or finding aid, or it might seem more logical or aesthetically pleasing. But in a table, we want an accurate representation of each unit (the folder). In the Athletic Media Relations example that I discussed in my last post, there were 67 rows that contained either 2f, 3f, or 4f:


Copying these lines one by one would have taken an excessive amount of time. Instead of copying a folder title over and over again, I automated the task by using this macro:

Sub two_f()
'Sub and End Sub are commands that say where the macro will start and end
'Do is a command that marks the beginning of a loop
Do
'Search the B column for 2f instances
Selection.Find(What:=", (2f)", After:=ActiveCell, LookIn:=xlFormulas, _
LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False).Activate
'Create a new line below and copy the folder title down into it
ActiveCell.EntireRow.Copy
ActiveCell.Offset(1, 0).EntireRow.Insert Shift:=xlDown
Application.CutCopyMode = False
'Remove the ", (2f)" from the line you copied and the line you pasted
Range(ActiveCell, ActiveCell.Offset(1, 0)).Replace What:=", (2f)", Replacement:="", _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False
'Repeat until there are no more 2f instances left
Loop Until Cells.FindNext Is Nothing
MsgBox "Conversion complete!"
End Sub

This macro can be adjusted for instances of “(3f)”, “(4f)”, etc. The following code shows the appropriate adjustments:

Sub three_f()
'Sub and End Sub are commands that say where the macro will start and end
'Do is a command that marks the beginning of a loop
Do
'Search the B column for 3f instances
Selection.Find(What:=", (3f)", After:=ActiveCell, LookIn:=xlFormulas, _
LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False).Activate
'Create two new lines below and copy the folder title down each time
ActiveCell.EntireRow.Copy
ActiveCell.Offset(1, 0).EntireRow.Insert Shift:=xlDown
ActiveCell.EntireRow.Copy
ActiveCell.Offset(1, 0).EntireRow.Insert Shift:=xlDown
Application.CutCopyMode = False
'Remove the ", (3f)" from the line you copied and the two lines you pasted
Range(ActiveCell, ActiveCell.Offset(2, 0)).Replace What:=", (3f)", Replacement:="", _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False
'Repeat until there are no more 3f instances left
Loop Until Cells.FindNext Is Nothing
MsgBox "Conversion complete!"
End Sub

The code could also be adjusted accordingly to accommodate instances of 4f, 5f, etc. The macros will make the following adjustments to our example:


Within a few seconds, the macro created about 200 lines in this Athletic Media Relations spreadsheet.

As with any macro that you might copy and paste into your VBA Project Window, save frequently and make sure this fits your needs exactly to prevent any unintended results. For example, the sample code above searches for and replaces “(3f)”, but your inventories might not use parentheses.

Now that each line in this table represents a single folder in this collection, we should create a folder number to preserve the order inside each box.

Formulas have been used elsewhere on this blog. Like macros, formulas are extremely powerful for data cleanup and this one will hopefully give you some inspiration. As Cassie has written earlier, the IF formula can be very useful for testing a condition of your choice for a column of data. It can also be used for creating new data. In this case, I want to create a folder number.

This formula uses the same If/Then/Else logic as Cassie’s previous example. The IF function requires three arguments separated by commas. The first (“If”) is the true-or-false condition you are testing. The second (“Then”) is the result you want if the condition is met; the third (“Else”) is the result you want if the condition is not met.

In this formula, we want to see if a folder is in the same box as the previous folder. If it is, we want to continue a sequence of folder numbers (1 to 2, 2 to 3, etc.). If this folder is not in the same box, we want to reset the sequence of folder numbers and start over with the number one (“1”).

In the Excel IF formula, this logical expression looks like this:


The one problem here is that this formula will not work for your very first folder, since it has no previous line to refer back to. To resolve this, just put the number “1” as your first folder number and put the formula in the cell for the second folder number.
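As a sketch (assuming the box numbers sit in column A and the folder numbers in column B, with the first folder on row 2), the formula entered in cell B3 would look something like this:

```
=IF(A3=A2, B2+1, 1)
```

If the box in A3 is the same as the box above it, continue the sequence by adding one to the previous folder number in B2; otherwise, reset the folder number to 1.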


After you hit Enter, fill the formula down. Don’t forget to copy the cells and Paste Special the values!

As with macros, help with formulas is easy to find through Google, especially on sites such as StackOverflow.

Converting Preliminary Inventories to Tables with Macros: Moving Box Numbers

At University of Maryland we have lots of folder- (or item-) level inventories for processed, partially processed, or unprocessed collections. Preliminary inventories were mostly created using Microsoft Word, which is meant to create written works on printed pages, not the tabular data that we need to move between systems. Complicating matters further, these files were each structured quite differently. My challenge this summer was to bring some of these Word documents into a standardized table format. Getting this data into Excel is a first step before we can use it in an EAD finding aid/ArchivesSpace. This has required quite a bit of cleanup and conversion. By using macros and formulas, we can make the conversion process a little easier.

I started the conversion process using many of the same regular expressions that Bonnie has described here on the blog. I especially looked out for tabs and line breaks, which you can locate in Word by searching for ^p or ^t (as well as ^13 and ^9, if you are using wildcards). To convert the inventory into a table, I had to make sure that each folder title was separated by one line break, with no line breaks in between. This would allow me to copy and paste the contents of the Word document into Excel and create a clean table.

Cleaning up a Word document with regular expressions was only part of the conversion process. I was still left with a number of redundant tasks moving around standardized data in Excel. First among them was to eliminate lines reporting a box number and move that number into a separate column. This is where I started introducing macros into our conversion work.

A Microsoft Office macro is essentially a process that automatically executes a task or series of tasks. You can run one by clicking an option on a list or even by pressing a custom-made hotkey. Microsoft Office macros use the Visual Basic for Applications (VBA) programming language. VBA is intended to be a simple, forgiving language, but it still takes some time to learn. I will point out a few simple steps you can take to get started with macros and even how to customize the code.

The simplest way to create a macro is to record it. Microsoft Word and Excel can “record” your actions by translating your activity in the program into a written code. You can then save these actions and re-create them later by “playing” the macro.

To record a macro, go to the View tab, click on the Macros list, and click “Record Macro.” When you are finished recording, go back to the Macros list and click “Stop Recording.” To replay the macro (re-run the same operations you recorded) or edit the source code, go to the Macro list and click View Macro, then “Run” or “Edit”.

In some instances, you may already have the code for a macro but do not have it stored in your copy of Microsoft Excel. To insert the pre-written code into Excel, press Alt+F11 to open the VBA Project Window, then Insert -> Module. Paste your code into the new window and press the Play button.

The example that I will use is from one of our Athletic Media Relations accessions. If I were to move the words “Box 5” manually, I would perform a series of steps: search for “Box 5”, copy the cell contents, delete the row, find an adjacent empty cell to paste the value “5”, and fill “5” down. I wanted to turn this:

Excel Sample Before Macro Conversion

into this:


This particular inventory only had thirteen boxes, and could be converted in a minute or two manually, but that could take a very long time if you have a preliminary inventory with hundreds of boxes. So far I have applied the macro to about thirty inventories comprising several hundred boxes and saved hours of work. I used the following code to automate the process:

Sub Macro1()
'Sub and End Sub are commands that say where the macro will start and end
Dim found As Range
Dim lastRow As Long
'Create a new A column for the box numbers
Columns(1).Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
'Do is a command that marks the beginning of a loop
Do
'Find searches the B column for the next cell with a box number (e.g. "Box 5"),
'starting from the top of the sheet
Set found = Columns(2).Find(What:="box *", After:=Cells(Rows.Count, 2), _
LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, MatchCase:=False)
'Stop when there are no more cells with a box number left
If found Is Nothing Then Exit Do
'Find the last row of the inventory
lastRow = Cells(Rows.Count, 2).End(xlUp).Row
'Fill the box number down the A column, taking out the word "Box " so it is just a number
'(later boxes will overwrite the fill from earlier boxes)
Range(Cells(found.Row, 1), Cells(lastRow, 1)).Value = _
Replace(found.Value, "Box ", "", 1, -1, vbTextCompare)
'Delete the row that initially had the box number in it
found.EntireRow.Delete Shift:=xlUp
Loop
MsgBox "Conversion complete!"
End Sub

Before you press “play,” a few words of warning. First, make sure you save a backup copy. You cannot undo the changes that are made through a macro! Any existing VBA code may need minor changes and tweaks to work with a new document, so be careful to review your code and make sure it fits your needs. Also, check to see if the macro will produce any unintended results when it runs.

The above sample macro identifies the “Box x” field by searching for the word “box” followed by a space and more characters. So make sure that there are no folder titles with the word “box” in it! (For example, in the sample screenshots above, “Box Scores” would need to be changed to read “Box-Scores”.)

For additional macro and VBA tips and workarounds, search Google using “Excel”, “macro” and “VBA” as search terms. Sites such as StackOverflow are especially useful.

In the next blog post, I will introduce a few other Excel macros and functions we have used at Maryland to continue this conversion process.

Round-tripping EAD, updated — how to not lose those barcodes

Several weeks ago, Bonnie published a hugely useful checklist of things to think about when trying to round-trip data from Archivists’ Toolkit to EAD and back to Archivists’ Toolkit to help minimize data loss.

Don’t lose those barcodes!

In a comment, my colleague Mark gave a very sensible suggestion for how to not lose barcodes during that trip. When you export an EAD from AT, the container information looks something like this:

<c id="ref21" level="file">
        <container id="cid1063028" type="Box" label="Mixed Materials (39002042652603)">7</container>
        <unitdate normal="1930/1940" type="inclusive">circa 1935-1950</unitdate>

What I see here is that two separate fields in the AT database have been jammed together for EAD export. “Mixed materials” lives in the ArchDescriptionInstances table in AT as instanceType. 39002042652603 lives in the ArchDescriptionInstances table as barcode. During export, these are brought together as container @label.

So, the trick is to get them back into the fields in tables where they belong. By default, AT imports container @label to ArchDescriptionInstances.instanceType. You want to get the barcode information into barcode. Here’s how.

I sometimes have bad luck getting AT to re-import if container @label="Mixed Materials (39002042652603)", so I do a big find/replace to make that look like @label="39002042652603".
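The exact pattern depends on your editor’s regex flavor, but a sketch of that find/replace (assuming all of your barcodes are 14 digits) might look like this:

```
Find:    label="Mixed Materials \(([0-9]{14})\)"
Replace: label="$1"
```

The backreference may be written $1 or \1 depending on the tool, and as always, test on a copy first.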

Then, I import into AT.

Getting the barcodes where they belong is pretty easy, but it has to happen in SQL in the back-end database. If you haven’t written SQL before, you might want to run this by a database administrator, just to have another set of eyes on it. You DEFINITELY want to do this in the test or dev version of your database first.

Step 1: copy over barcodes from instanceType to barcode (this assumes 14-digit barcodes)

UPDATE ArchDescriptionInstances 
SET barcode = instanceType
WHERE instanceType REGEXP '^[0-9]{14}$';

Step 2: replace barcodes in instanceType with “Mixed materials” (this assumes that you want everything to be “Mixed Materials”. I have a rant in the works about how inappropriate it is to have this label on containers anyway, but I’ll save that for another day).

UPDATE ArchDescriptionInstances 
SET instanceType = 'Mixed materials'
WHERE instanceType REGEXP '^[0-9]{14}$';

Congratulations! You haven’t lost your barcodes! You can now delete the previous resource record and enjoy your new-and-improved one.

But that’s not all…

Extra super-secret tips for folks keeping Voyager item information in Archivists’ Toolkit.

So, where I work, this wouldn’t be enough. We have a whole other set of data in the ArchDescriptionInstances table that isn’t serialized in EAD. Container type, location code, Voyager bibid, container restriction and a Boolean of whether this information has been exported to Voyager are kept in the ArchDescriptionInstances table in user defined fields. We then have a background program that sends this information to Voyager, where holdings/item records are created.

This means that if I round-trip EAD, the item information that already exists becomes disassociated from the descriptive information. This doesn’t work.

Note: If you’re also using the plug-in that Yale developed to keep Voyager holdings information in AT, could you please drop me a line? We’re starting to think about migration paths to ArchivesSpace.

But I had a brainwave: if I round-trip EAD with barcodes, I now have a key. The fields that are now blank (what’s the Voyager bib/holdings ID? was it exported?) are present in another record with the same barcode, left over from when the information was entered before the EAD was round-tripped.

This means that no one has to re-enter the same information or re-export to Voyager. Check out my SQL update below — same warnings and caveats apply.


start transaction;

UPDATE schema.ArchDescriptionInstances AS ArchDesc
INNER JOIN (
    SELECT barcode, userDefinedString1, userDefinedString2, locationId,
        userDefinedBoolean1, userDefinedBoolean2
    FROM schema.ArchDescriptionInstances
    WHERE userDefinedString1 = "{{{bibid}}}"
) AS newdata ON newdata.barcode = ArchDesc.barcode
SET ArchDesc.userDefinedString1 = newdata.userDefinedString1,
    ArchDesc.userDefinedString2 = newdata.userDefinedString2,
    ArchDesc.locationId = newdata.locationId,
    ArchDesc.userDefinedBoolean1 = newdata.userDefinedBoolean1,
    ArchDesc.userDefinedBoolean2 = newdata.userDefinedBoolean2
WHERE ArchDesc.userDefinedString1 = ""
    AND ArchDesc.userDefinedString2 = "";

-- check your results, then:
commit;


When will restricted materials become available?

One morning recently, our records services archivist sent me an email. He was wondering if there was a way I could report to him on which materials in our university archives have restrictions that have passed. After all, this data is buried in access restriction notes all over finding aids — it would be very difficult to find this information by doing a search on our finding aids portal or in Archivists’ Toolkit.

This is exactly the kind of project that I love to do — it’s the intersection of archival functions, improved user experience, and metadata power tools.

In ArchivesSpace, restrictions have controlled date fields. This kind of report would be very easy in that kind of environment! Unfortunately, AT and EAD only have a place for this information as free text in notes.

Time for an xquery!

xquery version "3.0";
declare namespace ead = "urn:isbn:1-931666-22-9";
declare namespace xlink = "http://www.w3.org/1999/xlink";

for $ead in ead:ead
let $doc := base-uri($ead)
return
  <document uri="{$doc}">
  {
    for $accessrestrict in $ead//ead:dsc//ead:accessrestrict/ead:p[matches(.,'(19|20)[0-9]{2}')]
    let $series := $accessrestrict/ancestor::ead:c[@level = 'series' or @level = 'accession' or @level = 'accn']//ead:unitid
    let $dateseg := fn:substring-after($accessrestrict, 'until')
    for $x in $series
    return
      <lookhere location="{$x}">{$dateseg}</lookhere>
  }
  </document>

And now for the walk-through.

Working together, we determined that any end dates will be below the <dsc>. So this report asks for any access restriction note below the dsc that includes a date in the twentieth or twenty-first century.

The report tells me what series that access restriction note is a part of and which file it’s a part of. I also pull out any text after the word “until”, because I see that common practice is to say “These materials will be restricted until XXXX.”

From there, I was able to put this data into an Excel spreadsheet, do a bit of clean-up there, and give my colleague a sorted list of when particular series in collections are slated to be open.

Wrap Up – Women in Archives

We are so thankful for everyone who has read and shared our posts. Thanks to all who commented publicly on the blog, social media platforms, and personal messages. Thanks to those discussing these topics off line with your colleagues, friends, and family.

What have we learned over the past two weeks writing about women in archives?

  • We weren’t the only ones asking trusted colleagues “Has this happened to you?”, “Do I really come across as mean?”, “Did you notice there are no female finalists for this high level or technical position?”
  • There are still enormous differences in how men and women are treated in the workplace at large and in the archives and library professions. We only skimmed the surface of recent studies in a variety of fields highlighting a few of these differences.
  • The personal is political. Our deeply personal experiences intimately connect to a larger social and political structure. Everything we’ve discussed has an impact on how we do our jobs. (We didn’t even get to discussing women and technology, being viewed as experts, etc…)
  • Social networks and sharing are important. As Maureen said in our kick off post, one reason for this blog in the first place was to share our work to help each other. We need to help each other in other ways too. Whether that’s providing advice on how to handle an inappropriate situation in the workplace or encouraging someone to apply to that next step up job we know they would rock (even if they worry they aren’t ready/qualified yet).
  • These issues are hard to write about. (Hence a wrap up post five days later than planned.) We probably should have kept track of the hours it took.
  • There is so much more to talk about. We brainstormed many more potential posts. We bet you have ideas too.

So, what’s next? What do you think? What ways will you be advocating for women in the profession or your institution? How can we continue the conversation?