Converting Preliminary Inventories to Tables with Macros: Folder Titles and Numbers

In my last post, I talked about how we have used macros at Maryland to start converting preliminary inventories into more useful formats to prepare for eventual importation to ArchivesSpace and create EAD finding aids. Using another macro, I have replaced folder title rows containing “(2f)” or “(3f)” with multiple rows. Using these two macros together, each single line in our tables will represent a single folder. To complete the process, I used a formula that quickly creates folder numbers. Folder numbers will allow you to sort the spreadsheet data while preserving the physical order and location of each folder.

In a Word document, the 2f or 3f shorthand might help preserve space in an inventory or finding aid, or it might seem more logical or aesthetically pleasing. But in a table, we want an accurate representation of each unit (the folder). In the Athletic Media Relations example that I discussed in my last post, there were 67 rows that contained either 2f, 3f,or 4f:

Untitled-2

Copying these lines one by one would have taken an excessive amount of time. Instead of copying a folder title over and over again, I automated the task by using this macro:

Sub two_f()
' Sub and End Sub are commands that say the macro will start and end
'Do is a command that marks the beginning of a loop
Do
Columns("B:B").EntireColumn.Select
'Search the B column for 2f instances
Selection.Find(What:=", (2f)", After:=ActiveCell, LookIn:=xlFormulas, _
LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False).Activate
'Create a new line and copy down
ActiveCell.Offset(1, 0).Rows("1:1").EntireRow.Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Selection.FillDown
'replace the “2f” from the line you copied and line you pasted
ActiveCell.Offset(-1, 1).Range("A1:A2").Select
Selection.Replace What:=", (2f)", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Loop Until Cells.findnext is nothing
Range("A1").Select
MsgBox "Conversion complete!"
End Sub

This macro can be adjusted for instances of “(3f)”, “(4f)”, etc. The following code shows the appropriate adjustments:

Sub three_f()
' Sub and End Sub are commands that say the macro will start and end
'Do is a command that marks the beginning of a loop
Do
Columns("B:B").EntireColumn.Select
'Search the B column for 3f instances
Selection.Find(What:=", (3f)", After:=ActiveCell, LookIn:=xlFormulas, _
LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False).Activate
'Create two new lines and copy down each time
ActiveCell.Offset(1, 0).Rows("1:1").EntireRow.Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Selection.FillDown
ActiveCell.Offset(1, 0).Rows("1:1").EntireRow.Select
Selection.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Selection.FillDown
'replace the “3f” from the line you copied and two lines you pasted
ActiveCell.Offset(-2, 1).Range("A1:A3").Select
Selection.Replace What:=", (3f)", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Loop Until Cells.findnext is nothing
Range("A1").Select
MsgBox "Conversion complete!"
End Sub

The code could also be adjusted accordingly to accommodate instances of 4f, 5f, etc. The macros will make the following adjustments to our example:

Untitled-3

Within a few seconds, the macro created about 200 lines in this Athletic Media Relations spreadsheet.

As with any macro that you might copy and paste into your VBA Project Window, save frequently and make sure this fits your needs exactly to prevent any unintended results. For example, the sample code above searches for and replaces “(3f)”, but your inventories might not use parentheses.

Now that each line in this table represents a single folder in this collection, we should create a folder number to preserve the order inside each box.

Formulas have been used elsewhere on this blog. Like macros, formulas are extremely powerful for data cleanup and this one will hopefully give you some inspiration. As Cassie has written earlier, the IF formula can be very useful for testing a condition of your choice for a column of data. It can also be used for creating new data. In this case, I want to create a folder number.

This formula uses the same If/Then/Else logic as Cassie’s previous example. The =IF command requires three statements separated by commas. The first (“If”) is the true or false condition you are testing. The second (“Then”) is the result you want if the condition has been met; the third (“Else”) is the result that you want if the condition was not met.

In this formula, we want to see if a folder is in the same box as the previous folder. If it is, we want to continue a sequence of folder numbers (1 to 2, 2 to 3, etc.). If this folder is not in the same box, we want to reset the sequence of folder numbers and start over with the number one (“1”).

In the Excel IF formula, this logical expression looks like this:

=IF(A2=A1,B1+1,1)

The one problem here is that this formula will not work for your very first folder, since it has no previous line to refer back to. To resolve this, just put the number “1” as your first folder number and put the formula in the cell for the second folder number.

Untitled-4

After you hit enter, fill down and don’t forget to copy the cells, and Paste Special your values!

As with macros, help with formulas is easy to find through Google, especially on sites such as MrExcel.com or StackOverflow.

Advertisements

Converting Preliminary Inventories to Tables with Macros: Moving Box Numbers

At University of Maryland we have lots of folder- (or item-) level inventories for processed, partially processed, or unprocessed collections. Preliminary inventories were mostly created using Microsoft Word, which is meant to create written works on printed pages, not the tabular data that we need to operate between systems. Complicating matters further, these files were each structured quite differently. My challenge this summer was to bring some of these Word documents into a standardized table format. Getting this data into Excel is a first step before we can use it in an EAD finding aid/ArchivesSpace. This has required quite a bit of cleanup and conversion. By using macros and formulas, we can make the conversion process a little easier.

I started the conversion process using many of the same regular expressions that Bonnie has described here on the blog. I especially looked out for tabs and line breaks, which you can locate in Word by searching for ^p or ^t (as well as ^13 and ^9, if you are using wildcards). To convert the inventory into a table, I had to make sure that each folder title was separated by one line break, with no line breaks in between. This would allow me to copy and paste the contents of the Word document into Excel and create a clean table.

Cleaning up a Word document with regular expressions was only part of the conversion process. I was still left with a number of redundant tasks moving around standardized data in Excel. First among them was to eliminate lines reporting a box number and move that number into a separate column. This is where I started introducing macros into our conversion work.

A Microsoft Office macro is essentially a process that automatically executes a task or series of tasks. You can run one by clicking an option on a list or even pressing a custom-made hotkey. Microsoft Office macros use Visual Basic for Applications (VBA) programming language. VBA is intended to be a simple, forgiving language, but it still takes some time to learn. I will point out a few simple steps you can take to get started with macros and even how to customize each code.

The simplest way to create a macro is to record it. Microsoft Word and Excel can “record” your actions by translating your activity in the program into a written code. You can then save these actions and re-create them later by “playing” the macro.

To record a macro, go to the View tab, click on the Macros list, and click “Record Macro.” When you are finished recording, go back to the Macros list and click “Stop Recording.” To replay the macro (re-run the same operations you recorded) or edit the source code, go to the Macro list and click View Macro, then “Run” or “Edit”.

In some instances, you may already have the code for a macro but do not have it stored into your copy of Microsoft Excel. To insert the pre-written code into Excel, type Alt+F11 to open the VBA Project Window, then Insert -> Module. Paste your code into the new window and press the Play button.

The example that I will use is from one of our Athletic Media Relations accessions. If I were to move the words “Box 5” manually, I would perform a series of steps: search for “Box 5”, copy the cell contents, delete the row, find an adjacent empty cell to paste the value “5”, and fill “5” down. I wanted to turn this:

Excel Sample Before Macro Conversion

into this:

Untitled-2

This particular inventory only had thirteen boxes, and could be converted in a minute or two manually, but that could take a very long time if you have a preliminary inventory with hundreds of boxes. So far I have applied the macro to about thirty inventories comprising several hundred boxes and saved hours of work. I used the following code to automate the process:

Sub Macro1()
' Sub and End Sub are commands that say the macro will start and end
'Create a new A column for the box number
Columns("A:A").Select
Selection.Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
'Take the value Box 1 from Cell B1 to Cell A1
ActiveCell.Offset(0, 1).Range("A1").Select
Selection.Cut
ActiveCell.Offset(0, -1).Range("A1").Select
ActiveSheet.Paste
'Fill the box number down
Selection.Copy
Range(Selection, Selection.End(xlDown)).Select
ActiveSheet.Paste
'Delete the row that initially had Box 1 in it
Rows(1).Delete
Columns("B:B").Select
'Do is a command that marks the beginning of a loop
Do
'Selection.Find searches within the selection (the B column) for a cell with a box number
Selection.Find(What:="box *", After:=ActiveCell, LookIn:=xlFormulas, _
LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False).Activate
ActiveCell.Select
Selection.Cut
ActiveCell.Offset(0, -1).Select
ActiveSheet.Paste
'Take out the word box in the A column so it is just a number
Columns(1).Cells.Replace What:="Box ", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Selection.Copy
Range(Selection, Selection.End(xlDown)).Select
ActiveSheet.Paste
ActiveCell.Rows("1:1").EntireRow.Select
Application.CutCopyMode = False
Selection.Delete Shift:=xlUp
Columns("B:B").Select
'This statement causes the commands to repeat until there are no more cells with the word Box in it
Loop Until Cells.FindNext Is Nothing
'These last five lines clean up the leftover box number from the last fill down
Selection.End(xlDown).Select
ActiveCell.Offset(1, -1).Select
Range(Selection, Selection.End(xlDown)).Select
Selection.ClearContents
Range("A1").Select
'End Sub is a required command that ends the macro
End Sub

Before you press “play,” a few words of warning. First, make sure you save a backup copy. You cannot undo the changes that are made through a macro! Any existing VBA code may need minor changes and tweaks to work with a new document, so be careful to review your code and make sure it fits your needs. Also, check to see if the macro will produce any unintended results when it runs.

The above sample macro identifies the “Box x” field by searching for the word “box” followed by a space and more characters. So make sure that there are no folder titles with the word “box” in it! (For example, in the sample screenshots above, “Box Scores” would need to be changed to read “Box-Scores”.)

For additional macro and VBA tips and workarounds, search Google for using “Excel”, “macro” and “VBA” as search terms. MrExcel.com and StackOverflow are especially useful sites.

In the next blog post, I will introduce a few other Excel macros and functions we have used at Maryland to continue this conversion process.

Case Study: Clean Data, Cool Project

SPLCblogEvery now and then I get to work on a project from the very beginning, meaning that instead of cleaning up legacy data, I get to define and collect the data from scratch. Such was the case with one of Duke’s recent acquisitions, the records of the Southern Poverty Law Center Intelligence Project. Beginning in the 1970s, SPLC collected publications and ephemera from a wide range of right-wing and left-wing extremist groups. The Intelligence Project included groups monitored by SPLC for militia-like or Ku Klux Klan-like activities. There are also many organizations represented in the collection that are not considered “hate groups”– they simply made it onto SPLC’s radar and therefore into the Project’s records. The collection arrived at Duke in good condition, but very disorganized. Issues of various serial titles were spread across 90 record cartons with no apparent rhyme or reason. Inserted throughout were pamphlets, fliers, and correspondence further documenting the organizations and individuals monitored by SPLC.

What do you do when an archival collection arrives and consists mostly of printed materials and serials? In the past, Duke did one of two things: either pull out the books/serials and catalog them separately, or leave them in the archival collection and list them in the finding aid, sort of like a bibliography within a box list. This project was a great opportunity to try out something new. In consultation with our rare book and serials catalogers, we developed a hybrid plan to handle SPLC. Since we had to do an intensive sort of the collection anyway, I used that chance to pull out the serials and house each title separately. They are now being cataloged individually by our serials cataloger, which will get them into OCLC and therefore more publicly available than they would ever be if just buried in a list in the finding aid. She is also creating authority records for the various organizations and individuals represented in the collection, allowing us to build connections across the various groups as they merged and split over time. While she catalogs the serials, I have been archivally processing the non-serial pieces of the collection, tracking materials by organization and describing them in an AT finding aid. When all of the serials are cataloged, I will update the finding aid to include links to each title, so that although the printed materials have been physically separated from their archival cousins, the entire original collection will be searchable and integrated intellectually within the context of the SPLC Collection.

To further ensure that the SPLC serials did not lose their original provenance, we developed a template that our cataloger is applying to each record to keep the titles intellectually united with their original collection. All of the serials being cataloged are receiving 541 and 561 fields identifying them as part of the SPLC Collection within the Rubenstein Special Collections Library. We are also adding 710s for the Southern Poverty Law Center, and an 856 that includes a link to the SPLC collection guide. (Duke inserts all its finding aid links in the 856 field, but we rarely do this for non-manuscript catalog records.) The result is a catalog record for each serial that makes it blatantly obvious that the title was acquired through the SPLC Collection, and that there are other titles also present within the collection, should researchers care to check out the links. But, cataloging the serials this way also allows the researcher to find materials without necessarily searching for “SPLC.”

Screenshot 2014-04-07 at 8.20.22 PM

An example of one of the SPLC serials: The Crusader, a KKK publication.

Along with hammering out our various print and manuscript workflows to better meet the needs of this collection, we also saw it as an opportunity to create and collect data that would allow us to easily extract information from all the discrete catalog records we are creating. We are being as consistent as possible with controlled vocabularies. Our serials cataloger is adding various 7xxs to track each publisher using RBMS or LOC relator codes. LOC geographic headings are being added as 752s. We are also trying to be consistent in applying genre terms in the 655 field using the RBMS gathering term “Political Works.”

Screenshot 2014-04-07 at 9.10.28 PM

A view of the MARC fields from The Crusader’s catalog record.

Equally important, we are replicating this sort of data collection in the archival description of the non-serial portions of the SPLC Collection. When we finally reunite the serials with the finding aid, the same sort of geographic, subject, and publisher data will allow us to match up all of the fields and create relationships between an organization’s random fliers and its various newsletters.

Furthermore, my colleagues and I have dreams of going beyond a basic finding aid to create some sort of portal that will capitalize on our clean data to offer researchers a new way to access this collection. SPLC’s own website has a neat map of the various hate groups it has identified in the United States, but we would like to build something that specifically addresses the organizations and topics represented in this particular collection–after all, the Intelligence Project collected materials from all sorts of groups. We’re thinking about using something like Google Fusion Tables or some other online tool that can both map and sort the groups and their various agendas, but also connect back to the catalog records and collection guide so that researchers can quickly get to the original sources too.

I’ll have more to report on this cool project — and what we end up doing with our clean data — as it continues to progress over the next few months. Already, our serials cataloger has created 55 new OCLC records for various serial titles, and has replaced or enhanced another 140. She’s about halfway done with the cataloging part of the project. With so many of these groups being obscure, secretive, or short-lived, we believe that creating such thorough catalog records is worth our time and energy. Not only will it make the titles widely discoverable in OCLC, but hopefully it will build connections for patrons across the diverse organizations represented within this collection.