Case Study: Clean Data, Cool Project

Every now and then I get to work on a project from the very beginning, meaning that instead of cleaning up legacy data, I get to define and collect the data from scratch. Such was the case with one of Duke’s recent acquisitions, the records of the Southern Poverty Law Center Intelligence Project. Beginning in the 1970s, SPLC collected publications and ephemera from a wide range of right-wing and left-wing extremist groups. The Intelligence Project included groups monitored by SPLC for militia-like or Ku Klux Klan-like activities. There are also many organizations represented in the collection that are not considered “hate groups”; they simply made it onto SPLC’s radar and therefore into the Project’s records. The collection arrived at Duke in good condition, but very disorganized. Issues of various serial titles were spread across 90 record cartons with no apparent rhyme or reason. Inserted throughout were pamphlets, fliers, and correspondence further documenting the organizations and individuals monitored by SPLC.

What do you do when an archival collection arrives and consists mostly of printed materials and serials? In the past, Duke did one of two things: either pull out the books/serials and catalog them separately, or leave them in the archival collection and list them in the finding aid, sort of like a bibliography within a box list. This project was a great opportunity to try out something new. In consultation with our rare book and serials catalogers, we developed a hybrid plan to handle SPLC. Since we had to do an intensive sort of the collection anyway, I used that chance to pull out the serials and house each title separately. They are now being cataloged individually by our serials cataloger, which will get them into OCLC and therefore make them more publicly available than they would ever be if just buried in a list in the finding aid. She is also creating authority records for the various organizations and individuals represented in the collection, allowing us to build connections across the various groups as they merged and split over time. While she catalogs the serials, I have been archivally processing the non-serial pieces of the collection, tracking materials by organization and describing them in an Archivists’ Toolkit (AT) finding aid. When all of the serials are cataloged, I will update the finding aid to include links to each title, so that although the printed materials have been physically separated from their archival cousins, the entire original collection will be searchable and integrated intellectually within the context of the SPLC Collection.

To further ensure that the SPLC serials did not lose their original provenance, we developed a template that our cataloger is applying to each record to keep the titles intellectually united with their original collection. All of the serials being cataloged are receiving 541 and 561 fields identifying them as part of the SPLC Collection within the Rubenstein Special Collections Library. We are also adding 710s for the Southern Poverty Law Center, and an 856 that includes a link to the SPLC collection guide. (Duke inserts all its finding aid links in the 856 field, but we rarely do this for non-manuscript catalog records.) The result is a catalog record for each serial that makes it blatantly obvious that the title was acquired through the SPLC Collection, and that there are other titles also present within the collection, should researchers care to check out the links. But cataloging the serials this way also allows researchers to find materials without necessarily searching for “SPLC.”

An example of one of the SPLC serials: The Crusader, a KKK publication.
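
To make the template idea concrete, here is a minimal Python sketch of applying the same set of provenance fields (541, 561, 710, 856) to every serial record. It is an illustration only, not our cataloger's actual workflow: the record structure, the `apply_template` helper, the field wording, and the collection guide URL are all placeholders invented for this example.

```python
from copy import deepcopy

# Placeholder URL; the real collection guide link is not reproduced here.
COLLECTION_GUIDE_URL = "https://example.org/splc-collection-guide"

# Hypothetical template: (MARC tag, [(subfield code, value), ...]).
# The wording of each value is illustrative, not the text used in production.
PROVENANCE_TEMPLATE = [
    ("541", [("a", "Southern Poverty Law Center")]),
    ("561", [("a", "Forms part of the SPLC Collection (placeholder wording).")]),
    ("710", [("a", "Southern Poverty Law Center.")]),
    ("856", [("u", COLLECTION_GUIDE_URL), ("z", "Collection guide")]),
]


def apply_template(record, template=PROVENANCE_TEMPLATE):
    """Return a copy of `record` with any missing template fields appended.

    The original record is left untouched, and applying the template twice
    does not create duplicate fields.
    """
    updated = deepcopy(record)
    existing = {(tag, tuple(subs)) for tag, subs in updated["fields"]}
    for tag, subs in template:
        if (tag, tuple(subs)) not in existing:
            updated["fields"].append((tag, list(subs)))
    return updated


if __name__ == "__main__":
    crusader = {"fields": [("245", [("a", "The Crusader")])]}
    for tag, subs in apply_template(crusader)["fields"]:
        print(tag, subs)
```

Keeping the template in one place means every record receives identical provenance fields, which is what later makes the records easy to pull back together.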

Along with hammering out our various print and manuscript workflows to better meet the needs of this collection, we also saw it as an opportunity to create and collect data that would allow us to easily extract information from all the discrete catalog records we are creating. We are being as consistent as possible with controlled vocabularies. Our serials cataloger is adding various 7xxs to track each publisher using RBMS or LOC relator codes. LOC geographic headings are being added as 752s. We are also trying to be consistent in applying genre terms in the 655 field using the RBMS gathering term “Political Works.”

A view of the MARC fields from The Crusader’s catalog record.
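
Consistent fields are what make extraction easy. As a rough sketch, assuming the catalog records have been exported into a simple Python structure (the structure, the `titles_by_place` helper, and the sample titles and places are all made up for illustration), grouping titles by their 752 geographic heading could look like this:

```python
from collections import defaultdict


def fields_with_tag(record, tag):
    """Return the subfield lists of every field in `record` with this tag."""
    return [subs for field_tag, subs in record["fields"] if field_tag == tag]


def titles_by_place(records):
    """Group record titles by the place named in each 752 field."""
    index = defaultdict(list)
    for record in records:
        for subs in fields_with_tag(record, "752"):
            # Join the subfields in order, e.g. country -- state -- city.
            place = " -- ".join(value for _code, value in subs)
            index[place].append(record["title"])
    return dict(index)


if __name__ == "__main__":
    # Invented sample data, not real records from the collection.
    sample = [
        {"title": "Example Newsletter A",
         "fields": [("752", [("a", "United States"), ("b", "Alabama"), ("d", "Montgomery")])]},
        {"title": "Example Newsletter B",
         "fields": [("752", [("a", "United States"), ("b", "Alabama"), ("d", "Montgomery")])]},
        {"title": "Example Bulletin C",
         "fields": [("752", [("a", "United States"), ("b", "Idaho")])]},
    ]
    for place, titles in titles_by_place(sample).items():
        print(place, "->", titles)
```

The same pattern would work for the 655 genre terms or the 7xx publisher entries; it only pays off if the headings are entered consistently in the first place.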

Equally important, we are replicating this sort of data collection in the archival description of the non-serial portions of the SPLC Collection. When we finally reunite the serials with the finding aid, the same sort of geographic, subject, and publisher data will allow us to match up all of the fields and create relationships between an organization’s random fliers and its various newsletters.
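
One way to picture that reunion step is a simple match on organization name. The sketch below is a hypothetical illustration, not a description of our tools: the `normalize` and `link_by_organization` helpers, the dictionary keys, and the sample names are assumptions made for the example.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    return " ".join(name.casefold().split()).rstrip(".,;")


def link_by_organization(serials, components):
    """Pair serial titles with archival components sharing an organization.

    Only organizations that appear in both lists are returned.
    """
    by_org = defaultdict(lambda: {"serials": [], "archival": []})
    for serial in serials:
        by_org[normalize(serial["organization"])]["serials"].append(serial["title"])
    for component in components:
        by_org[normalize(component["organization"])]["archival"].append(component["title"])
    return {org: group for org, group in by_org.items()
            if group["serials"] and group["archival"]}


if __name__ == "__main__":
    # Invented sample data for illustration only.
    serials = [{"organization": "Example Group.", "title": "Example Newsletter"}]
    components = [
        {"organization": "example  group", "title": "Fliers, undated"},
        {"organization": "Other Group", "title": "Correspondence"},
    ]
    print(link_by_organization(serials, components))
```

In practice the authority records should make this kind of fuzzy normalization mostly unnecessary, since both the catalog records and the finding aid would use the same established form of each name.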

Furthermore, my colleagues and I have dreams of going beyond a basic finding aid to create some sort of portal that will capitalize on our clean data to offer researchers a new way to access this collection. SPLC’s own website has a neat map of the various hate groups it has identified in the United States, but we would like to build something that specifically addresses the organizations and topics represented in this particular collection. After all, the Intelligence Project collected materials from all sorts of groups. We’re thinking about using something like Google Fusion Tables or some other online tool that can both map and sort the groups and their various agendas, but also connect back to the catalog records and collection guide so that researchers can quickly get to the original sources too.

I’ll have more to report on this cool project — and what we end up doing with our clean data — as it continues to progress over the next few months. Already, our serials cataloger has created 55 new OCLC records for various serial titles, and has replaced or enhanced another 140. She’s about halfway done with the cataloging part of the project. With so many of these groups being obscure, secretive, or short-lived, we believe that creating such thorough catalog records is worth our time and energy. Not only will it make the titles widely discoverable in OCLC, but hopefully it will build connections for patrons across the diverse organizations represented within this collection.


3 thoughts on “Case Study: Clean Data, Cool Project”

  1. Out of curiosity, do you have a rough sense for the amount of time that the 55 new serials records and editing of the 140 has taken?

  2. Hi Cassie: I’m not sure off the top of my head. Each record varies drastically depending on how much editing is needed. I know our cataloger is keeping stats so I’ll be sure to include that in my next post about the project.
