Why database structure matters

I could have titled this post “Why Accessioning Information Matters” too. You may remember that my first post about the Beast included a lovely map of the fields and tables.

I didn’t talk in great depth about all the associated issues, but one of the biggest challenges migrating data out has been the archdescid table. In some respects, it’s easy as most of the fields make sense, have a clear field to map to in ArchivesSpace, and we have ways to clean up the data when it’s in the wrong spot. However, one of the hardest quirks is that the archdescid table includes both accessions and resources (aka collection) information in the same table and fields.

Archdesc table from Beast Database on the wall. First brainstorming of accession/resource records.

Staff access the information through two different front end forms in Access. The “accessions form” and the “finding aid form” pull various fields to display. Not all fields are available in both forms, but all the information is stored in the same place.

Staff view of Beast to enter accessions or finding aid information.

This means that in the process of creating a finding aid, accessions information is OVERWRITTEN. (Part of this is also due to the fact that we haven’t used separate “collection numbers”, but instead use the earliest accession number to identify a group of materials. This took away a necessary step of creating a new record for a finding aid.) Sometimes overwriting information is fine because maybe we had sketchy dates and now we have better ones. BUT, sometimes this means we are overwriting very specific accessions information with aggregate information about a larger set of materials meant for a finding aid.

For example, maybe we get the first installment of Sally Smith’s papers of 2 linear feet in 2002 and create an accession record to document that donation. By 2010, we’ve gotten three more accretions and decided to process all four accessions together and create a finding aid. Our finding aid is built from the 2002 record in the Beast.

It’s now 2014 and we just got another accretion. We create the 2014 accession record and decide to add the materials right away to the processed collection. Because of how the archdescid table is set up, every time we added information to the Sally Smith papers finding aid, we altered the accession record for the 2002 accession, as the data in that record serves as our finding aid data too. By the time we’re done adding in our 2014 materials, the 2002 accession record says 25 linear feet, the volume of all five accessions together. We have now lost the specific accessions information for the oldest accession for this collection (unless it happens to live on paper buried in control files).
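To make the overwriting concrete, here is a minimal sketch of the Sally Smith scenario. It is only an illustration: the dictionary stands in for the shared table, and the field names and the accession number "2002-001" are made up for the example, not the Beast's actual columns or identifiers.

```python
# Hypothetical stand-in for the shared table: one row per earliest accession
# number. Field names and the number "2002-001" are illustrative assumptions.
table = {
    "2002-001": {"title": "Sally Smith papers", "extent_linear_feet": 2.0},
}


def update_finding_aid(table, earliest_accession_number, total_extent):
    """Editing the finding aid writes to the same row the accession form reads."""
    table[earliest_accession_number]["extent_linear_feet"] = total_extent


def read_accession(table, accession_number):
    """The accession form has no separate copy; it reads the shared row."""
    return table[accession_number]


# After the 2014 accretion, the finding aid describes all five accessions.
update_finding_aid(table, "2002-001", 25.0)

accession_2002 = read_accession(table, "2002-001")
print(accession_2002["extent_linear_feet"])  # the original 2 linear feet is gone
```

Because both forms point at one row, there is nothing left to recover the original 2 linear feet from, which is why we have to fall back on paper control files.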

So the question is, how do we retroactively create accession records from a possibly aggregate finding aid?

We have two major categories of these accessions:

  • Records where we think the finding aid only represents one accession.
    • These range in size from a few items to almost 100 linear feet. We’re pretty confident that the information for smaller collections equates to accessions, but can’t be sure and who knows what the real correlation to size is.
  • Records where we know there are additional accretions.
    • Accretions may or may not be represented in the finding aid. Sometimes we tracked this, sometimes we didn’t. If we tracked it, we did it in different ways. Sometimes we know that some of the accretions were added and others weren’t. Or that some were and are unsure of the others. There’s lots of variation here.

We spent a good chunk of time analyzing these records trying to determine what information was reliable for accessions. Here’s what we think is mostly reliable:

  • Title
  • Accession number
  • Accession date (There are few to no accession dates supplied for these records, so most will inherit the default date)
  • Dates of materials
  • Access and use restrictions
  • Donor name and contact information

Information that’s more of a crapshoot:

  • Extents
  • Contents description (will populate from a combo of our abstract and scope notes, includes lots of bio info, going to be messy and not ideal)

Solution for now:

Import these accession records with the information available knowing that for some (most?) the information doesn’t accurately document the accession.

Add a general note to explain the data. Something like “This accession record was created from an aggregate finding aid and may not represent accurate accessions data. Accession specific data may be available in control files.”

Continue to refine data as accessions/resources are worked on through our normal processing/prioritization workflows.
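As a rough sketch of what that import step could look like, the function below turns one legacy row into an accession record and attaches the explanatory note. Everything here is an assumption for illustration: the field names are hypothetical (they are neither the Beast's columns nor ArchivesSpace's schema), and the default date is a placeholder, since the real default isn't stated in this post.

```python
# Note text taken from the proposal above.
AGGREGATE_NOTE = (
    "This accession record was created from an aggregate finding aid and may "
    "not represent accurate accessions data. Accession specific data may be "
    "available in control files."
)

# Placeholder only; the actual default accession date is an assumption here.
DEFAULT_ACCESSION_DATE = "1900-01-01"


def build_accession(row, default_date=DEFAULT_ACCESSION_DATE):
    """Build an accession record from one legacy row (hypothetical field names)."""
    # Contents description is a combination of the abstract and scope note.
    parts = (row.get("abstract"), row.get("scope_note"))
    contents = " ".join(part for part in parts if part)
    return {
        # Mostly reliable fields, carried over as they are.
        "title": row["title"],
        "accession_number": row["accession_number"],
        "accession_date": row.get("accession_date") or default_date,
        "dates_of_materials": row.get("dates_of_materials"),
        "restrictions": row.get("restrictions"),
        "donor": row.get("donor"),
        # Less reliable fields: imported anyway, flagged by the general note.
        "extent": row.get("extent"),
        "content_description": contents,
        "general_note": AGGREGATE_NOTE,
    }


example = build_accession({
    "title": "Sally Smith papers",
    "accession_number": "2002-001",  # made-up number for the example
    "extent": "25 linear feet",
    "abstract": "Papers of Sally Smith.",
    "scope_note": "Correspondence and photographs.",
})
print(example["accession_date"], "|", example["general_note"])
```

Keeping the note in a single constant means every record created this way carries identical wording, so the records are easy to find later when they come up for cleanup.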

What would you do? Have others dealt with a similar problem?
