At the University of Maryland we have lots of folder- (or item-) level inventories for processed, partially processed, or unprocessed collections. Preliminary inventories were mostly created using Microsoft Word, which is meant to create written works on printed pages, not the tabular data that we need to move between systems. Complicating matters further, these files were each structured quite differently. My challenge this summer was to bring some of these Word documents into a standardized table format. Getting this data into Excel is a first step before we can use it in an EAD finding aid or ArchivesSpace. This has required quite a bit of cleanup and conversion. By using macros and formulas, we can make the conversion process a little easier.
I started the conversion process using many of the same regular expressions that Bonnie has described here on the blog. I especially looked out for tabs and line breaks, which you can locate in Word by searching for ^p or ^t (or ^13 and ^9, if you are using wildcards). To convert the inventory into a table, I had to make sure that each folder title was separated from the next by exactly one line break, with no line breaks inside a title. This would allow me to copy and paste the contents of the Word document into Excel and create a clean table, with one folder title per row.
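As a rough sketch of that cleanup step, the Word macro below collapses any run of two or more paragraph marks into a single one, using the same ^13 wildcard code mentioned above. The macro name CollapseBlankLines is hypothetical, and the sketch assumes the inventory is the active document and that you are working on a copy. It does not touch manual line breaks inside a title, which still need to be checked by hand.

```vba
Sub CollapseBlankLines()
    ' Hypothetical helper: run in Word, on a copy of the inventory.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' ^13{2,} means "two or more paragraph marks in a row" in wildcard mode
        .Text = "^13{2,}"
        ' Replace each run with a single paragraph mark
        .Replacement.Text = "^p"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

After it runs, each remaining paragraph should paste into its own Excel row.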
Cleaning up a Word document with regular expressions was only part of the conversion process. I was still left with a number of repetitive tasks moving standardized data around in Excel. First among them was to eliminate lines reporting a box number and move that number into a separate column. This is where I started introducing macros into our conversion work.
A Microsoft Office macro is essentially a process that automatically executes a task or series of tasks. You can run one by clicking an option on a list or even pressing a custom-made hotkey. Microsoft Office macros use the Visual Basic for Applications (VBA) programming language. VBA is intended to be a simple, forgiving language, but it still takes some time to learn. I will point out a few simple steps you can take to get started with macros and even how to customize the code.
The simplest way to create a macro is to record it. Microsoft Word and Excel can “record” your actions by translating your activity in the program into written code. You can then save these actions and re-create them later by “playing” the macro.
To record a macro, go to the View tab, click on the Macros list, and click “Record Macro.” When you are finished recording, go back to the Macros list and click “Stop Recording.” To replay the macro (re-run the same operations you recorded) or edit the source code, go to the Macros list and click “View Macros,” then “Run” or “Edit.”
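To give a sense of what the recorder writes, here is roughly what you would see under “Edit” after recording a single action in Excel: selecting the A column and inserting a new column to its left. The name Macro1 is the recorder's default, and the exact output may differ slightly between Excel versions, so treat this as an illustration rather than a guaranteed transcript.

```vba
Sub Macro1()
'
' Macro1 Macro
'
    ' The recorder writes one Select line for what you clicked...
    Columns("A:A").Select
    ' ...and one line for the action you performed on that selection
    Selection.Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
End Sub
```

Recorded code tends to lean on Select and Selection like this, because it mirrors your clicks step by step.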
In some instances, you may already have the code for a macro but do not have it stored in your copy of Microsoft Excel. To insert the pre-written code into Excel, press Alt+F11 to open the VBA Project Window, then choose Insert -> Module. Paste your code into the new window and press the Play button (the green triangle on the toolbar, or F5).
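If you want to confirm that pasting and running works before trying anything that changes your data, a harmless test macro like the one below can help. The name CheckMacroSetup is hypothetical; the macro only displays a message and does not edit the workbook.

```vba
Sub CheckMacroSetup()
    ' Hypothetical test macro: shows which sheet a macro would act on.
    ' It makes no changes to the workbook.
    MsgBox "Macros are working. The active sheet is: " & ActiveSheet.Name
End Sub
```

If the message box names the sheet that holds your inventory, macros you paste into the same module will act on that sheet.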
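The example that I will use is from one of our Athletic Media Relations accessions. If I were to move the words “Box 5” manually, I would perform a series of steps: search for “Box 5”, copy the cell contents, delete the row, find an adjacent empty cell in which to paste the value “5”, and fill “5” down. I wanted to turn an inventory in which each “Box x” line sits on its own row above its folders into a table with the box number in its own column beside every folder title.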
This particular inventory only had thirteen boxes, and could be converted in a minute or two manually, but that could take a very long time if you have a preliminary inventory with hundreds of boxes. So far I have applied the macro to about thirty inventories comprising several hundred boxes and saved hours of work. I used the following code to automate the process:
Sub MoveBoxNumbers()
' Sub and End Sub are commands that say the macro will start and end
'This sketch assumes the inventory was pasted into the A column, with each "Box x" line on its own row above its folders
Dim lastRow As Long, r As Long, boxValue As String
'Create a new A column for the box number
Columns(1).Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
'Find the last used row in the B column, which now holds the inventory
lastRow = Cells(Rows.Count, 2).End(xlUp).Row
r = 1
'Do is a command that marks the beginning of a loop
Do While r <= lastRow
    'Check whether the cell in the B column begins with the word box and a space
    If LCase(Cells(r, 2).Value) Like "box *" Then
        'Take the value (for example, Box 1) and remember it
        boxValue = Cells(r, 2).Value
        'Delete the row that initially had the box number in it
        Rows(r).Delete
        lastRow = lastRow - 1
    Else
        'Fill the box number down next to each folder title
        Cells(r, 1).Value = boxValue
        r = r + 1
    End If
'Loop repeats the commands until the last row has been checked
Loop
'Take out the word box in the A column so it is just a number
Columns(1).Replace What:="Box ", Replacement:="", LookAt:=xlPart, _
    SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
    ReplaceFormat:=False
'End Sub is a required command that ends the macro
End Sub
Before you press “play,” a few words of warning. First, make sure you save a backup copy. You cannot undo the changes that are made through a macro! Any existing VBA code may need minor changes and tweaks to work with a new document, so be careful to review your code and make sure it fits your needs. Also, check to see if the macro will produce any unintended results when it runs.
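One way to make the backup step hard to forget is to let a macro do it. The sketch below saves a timestamped copy of the active workbook next to the original using the workbook's SaveCopyAs method. The name SaveBackupCopy and the “backup-” file prefix are my own choices, and the sketch assumes the workbook has already been saved to an ordinary local or network folder.

```vba
Sub SaveBackupCopy()
    ' Hypothetical helper: saves a timestamped copy beside the original file.
    Dim backupPath As String
    ' A workbook that has never been saved has no folder to save into
    If ActiveWorkbook.Path = "" Then
        MsgBox "Save the workbook once before making a backup."
        Exit Sub
    End If
    ' Build a name such as backup-20240131-141500-inventory.xlsx
    backupPath = ActiveWorkbook.Path & Application.PathSeparator & _
        "backup-" & Format(Now, "yyyymmdd-hhnnss") & "-" & ActiveWorkbook.Name
    ' SaveCopyAs writes the copy without changing the open workbook
    ActiveWorkbook.SaveCopyAs backupPath
    MsgBox "Backup saved to " & backupPath
End Sub
```

Run it first, then run the conversion macro; if the result is not what you expected, reopen the backup.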
The above sample macro identifies the “Box x” field by searching for the word “box” followed by a space and more characters. So make sure that there are no folder titles with the word “box” in them! (For example, in the sample screenshots above, “Box Scores” would need to be changed to read “Box-Scores”.)
For additional macro and VBA tips and workarounds, search Google using “Excel”, “macro” and “VBA” as search terms. MrExcel.com and Stack Overflow are especially useful sites.
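A quick way to check for such titles before converting is to list every cell that contains “box” followed by a space and review the list by eye. The sketch below does that without changing anything. The name ListBoxLookalikes is hypothetical, and it assumes the pasted inventory is still in the A column, that is, that the conversion macro has not been run yet.

```vba
Sub ListBoxLookalikes()
    ' Hypothetical helper: read-only check, run before the conversion macro.
    Dim lastRow As Long, r As Long, found As Long
    lastRow = Cells(Rows.Count, 1).End(xlUp).Row
    For r = 1 To lastRow
        ' Like with * on both sides matches "box " anywhere in the cell
        If LCase(Cells(r, 1).Value) Like "*box *" Then
            ' Results appear in the Immediate window (Ctrl+G in the VBA editor)
            Debug.Print "Row " & r & ": " & Cells(r, 1).Value
            found = found + 1
        End If
    Next r
    Debug.Print found & " cell(s) contain the word box followed by a space."
End Sub
```

Genuine box lines will show up in the list too; anything else, such as “Box Scores”, is a title to rename before converting.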
In the next blog post, I will introduce a few other Excel macros and functions we have used at Maryland to continue this conversion process.