I mentioned in an earlier post that, in anticipation of our three big archival systems projects (migration to ArchivesSpace from Archivists’ Toolkit, implementation of Aeon, and re-design of our finding aids portal), we’re taking a cold, hard look at our archival data. After all, both Aeon and the finding aids portal will be looking directly at the EAD to perform functions: both use XSLT to display, manipulate, and transform the data.
So, there are some basic things we want to know. Will our data be good enough for Aeon to be able to turn EAD into a call slip (or add it to the proper processing queue, or know which reading room to send the call slip to)? Are our dates present and machine readable in such a way that the interface would be able to sort contents lists by date? And, while we’re at it, do our finding aids meet professional and local standards?
Let’s take a look at our criteria.
For a descriptive record to meet DACS single-level minimum standards, it must include:
- Reference Code Element (2.1) — <unitid>
- Name and Location of Repository Element (2.2) — <repository>
- Title Element (2.3) — <unittitle>
- Date Element (2.4) — <unitdate>
- Extent Element (2.5) — <extent>
- Name of Creator(s) Element (2.6) (if known) — <origination>
- Scope and Content Element (3.1) — <scopecontent>
- Conditions Governing Access Element (4.1) — <accessrestrict>
- Languages and Scripts of the Material Element (4.5) — <langmaterial> (I decided to be generous and allow <langmaterial>/<language @langcode>, although I would prefer that there be content in the note)
For a descriptive record to meet DACS optimum standards, it must also include:
- Administrative/Biographical History Element (2.7) — <bioghist>
- Access points — <controlaccess>
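The two DACS lists above translate fairly directly into a presence check on collection-level EAD tags. What follows is only a minimal sketch in Python, not the audit we actually ran: the function names, the sample record, and the call number in it are made up for illustration. It assumes that "present" means the tag appears inside <archdesc> but outside <dsc> with some text in it, and it applies the generous <langmaterial> rule described above. It cannot judge the "if known" condition on <origination>, so a missing creator is simply reported.

```python
import xml.etree.ElementTree as ET

# Tag names are taken from the lists above; everything else is illustrative.
SINGLE_LEVEL = ["unitid", "repository", "unittitle", "unitdate", "extent",
                "origination", "scopecontent", "accessrestrict", "langmaterial"]
OPTIMUM_EXTRA = ["bioghist", "controlaccess"]


def local_name(tag):
    """Drop any '{namespace}' prefix so the check works with or without one."""
    return tag.rsplit("}", 1)[-1]


def has_content(elem):
    """True if the element contains any non-whitespace text at any depth."""
    return any(text.strip() for text in elem.itertext())


def collection_level_tags(root):
    """Names of non-empty tags inside <archdesc> but outside <dsc>."""
    archdesc = next((e for e in root.iter() if local_name(e.tag) == "archdesc"), None)
    found = set()
    if archdesc is None:
        return found
    stack = list(archdesc)
    while stack:
        elem = stack.pop()
        name = local_name(elem.tag)
        if name == "dsc":
            continue  # the contents list is not collection-level description
        ok = has_content(elem)
        if name == "langmaterial" and not ok:
            # Generous rule: accept <language langcode="..."> with no note text.
            ok = any(local_name(c.tag) == "language" and c.get("langcode") for c in elem)
        if ok:
            found.add(name)
        stack.extend(elem)
    return found


def missing(found, required):
    """Required tags that were not found, in the order they are listed."""
    return [tag for tag in required if tag not in found]


# A made-up record: no creator, no access note, and a <bioghist> only inside <dsc>.
SAMPLE = """<ead><archdesc level="collection"><did>
<unitid>TAM 001</unitid>
<repository><corpname>Example Repository</corpname></repository>
<unittitle>Example Papers</unittitle>
<unitdate normal="1900/1950">1900-1950</unitdate>
<physdesc><extent>2 linear feet</extent></physdesc>
<langmaterial><language langcode="eng"/></langmaterial>
</did>
<scopecontent><p>Letters.</p></scopecontent>
<dsc><c01 level="series"><did><unittitle>Series I</unittitle></did>
<bioghist><p>Series-level history.</p></bioghist></c01></dsc>
</archdesc></ead>"""

found = collection_level_tags(ET.fromstring(SAMPLE))
print("Fails single-level on:", missing(found, SINGLE_LEVEL))
print("Fails optimum on:", missing(found, SINGLE_LEVEL + OPTIMUM_EXTRA))
```

Skipping <dsc> matters here: a <bioghist> or <scopecontent> attached to a series should not count toward the collection-level requirement.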
At Tamiment, we’ve determined that the following elements must be included in a finding aid to meet local standards:
- Physical location note — <physloc>
- Restrictions on use note — <userestrict>
- Immediate source of acquisition note — <acqinfo>
- Appraisal note — <appraisal>
- Abstract — <abstract>
- Arrangement note — <arrangement>
- Processing information note — <processinfo>
Our local standards also require that every series or subseries have a scope and content note, every component have a title, date and container, and every date be normalized.
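The component-level rules lend themselves to a walk over the contents list. The sketch below is one possible reading of them, with assumptions worth stating: components are taken to be <c> or <c01> through <c12>, a date counts as normalized when its <unitdate> carries a `normal` attribute, and series are recognized by their `level` attribute. The function name and the sample ids are hypothetical. It applies "every component" literally, so a series with no container of its own is flagged; a real audit might well exempt those.

```python
import re
import xml.etree.ElementTree as ET

# Matches <c> and the numbered components <c01> through <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def local_name(tag):
    """Drop any '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def audit_components(root):
    """Return (component id, problem) pairs in document order."""
    problems = []
    for comp in root.iter():
        if not COMPONENT.match(local_name(comp.tag)):
            continue
        label = comp.get("id", "(no id)")
        # Only the component's own <did> counts, not those of its children.
        did = next((c for c in comp if local_name(c.tag) == "did"), None)
        did_elems = list(did.iter()) if did is not None else []
        present = {local_name(e.tag) for e in did_elems}
        for required in ("unittitle", "unitdate", "container"):
            if required not in present:
                problems.append((label, f"missing {required}"))
        if comp.get("level") in ("series", "subseries"):
            if not any(local_name(c.tag) == "scopecontent" for c in comp):
                problems.append((label, "missing scopecontent"))
        for elem in did_elems:
            if local_name(elem.tag) == "unitdate" and not elem.get("normal"):
                problems.append((label, "unitdate lacks normal attribute"))
    return problems


# Made-up contents list: a series with no note or container, a file with a free-text date.
SAMPLE_DSC = """<dsc>
<c01 level="series" id="s1"><did><unittitle>Correspondence</unittitle>
<unitdate normal="1920/1930">1920-1930</unitdate></did>
<c02 level="file" id="f1"><did><unittitle>Letters</unittitle>
<unitdate>circa 1925</unitdate><container type="box">1</container></did></c02>
</c01>
</dsc>"""

for label, problem in audit_components(ET.fromstring(SAMPLE_DSC)):
    print(label, problem)
```

Reporting one line per problem, rather than a pass/fail per finding aid, makes it easier to see which kind of remediation a collection needs.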
I’ll talk about our reasons for these local standards in subsequent blog posts.
Finally, we’ve started thinking about which data elements must be present for us to be able to use the Aeon circulation system effectively. To print a call slip, a component in a finding aid needs the following information. Useful (but not required) fields are italicized:
- Reference code element / call number — <unitid>. We have to know what collection the patron is requesting.
- Repository note — <repository>. This should be a controlled string, so that the stylesheet knows which queue to send the call slip to. It may also be possible to do post-processing to add an attribute to this tag or a different tag, so that the string can vary but the attribute would be consistent enough for a computer to understand. In any case, we need SOME piece of controlled data telling us which reading room to visit to pull this material.
- Container information — <container>. Every paged container should have a unique combination of call number and box number. There’s no good way to check this computationally — we’ve all seen crazy systems of double numbering, numbering each series, etc.
- Collection title — <unittitle>. This is the title of the collection, which is useful for paging boxes.
- Physical location note — <physloc>. This isn’t strictly necessary, but it is very useful to know whether boxes are onsite or offsite.
- Access restrictions — <accessrestrict>. This is an operational requirement. By having the access restriction note, the page can see right away whether it’s okay to pull this box.
- Fancy-pants scripting piece to add location information…. This would require a lot of data standardization (and probably data gathering, in some cases), but it would be great to have the location on the repository-eyes-only side of the call slip.
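To make the list above concrete, here is a rough sketch of how the fields might be pulled together for a call slip. Aeon itself works from XSLT, so this Python is only an illustration of the data dependencies, not of how Aeon does it. The `QUEUES` table, the queue name, the function names, and the sample record are all hypothetical. The lookup shows why the repository string has to be controlled: anything not in the table leaves the queue unknown.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a controlled repository string to a paging queue.
QUEUES = {"example reading room": "queue-a"}


def local_name(tag):
    """Drop any '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def text_of(elem):
    """All text inside an element with whitespace collapsed; '' if missing."""
    return " ".join("".join(elem.itertext()).split()) if elem is not None else ""


def find_first(parent, name):
    """First descendant with this tag name, never descending into <dsc>."""
    for child in parent:
        child_name = local_name(child.tag)
        if child_name == "dsc":
            continue
        if child_name == name:
            return child
        hit = find_first(child, name)
        if hit is not None:
            return hit
    return None


def call_slips(root):
    """One dict per non-empty <container>, carrying collection-level fields."""
    archdesc = root if local_name(root.tag) == "archdesc" else find_first(root, "archdesc")
    if archdesc is None:
        return []
    repository = text_of(find_first(archdesc, "repository"))
    base = {
        "call_number": text_of(find_first(archdesc, "unitid")),
        "collection_title": text_of(find_first(archdesc, "unittitle")),
        "queue": QUEUES.get(repository.lower()),  # None means we cannot route it
        "access": text_of(find_first(archdesc, "accessrestrict")),
        "location": text_of(find_first(archdesc, "physloc")),
    }
    slips = []
    for elem in archdesc.iter():
        if local_name(elem.tag) == "container" and text_of(elem):
            box = f"{elem.get('type', 'container')} {text_of(elem)}"
            slips.append(dict(base, container=box))
    return slips


# Made-up finding aid with a single boxed file.
SAMPLE_EAD = """<ead><archdesc level="collection"><did>
<unitid>TAM 999</unitid><unittitle>Example Union Records</unittitle>
<repository><corpname>Example Reading Room</corpname></repository></did>
<accessrestrict><p>Open to researchers.</p></accessrestrict>
<dsc><c01 level="file"><did><unittitle>Minutes</unittitle>
<container type="box">3</container></did></c01></dsc>
</archdesc></ead>"""

for slip in call_slips(ET.fromstring(SAMPLE_EAD)):
    print(slip)
```

The sketch emits one slip per <container> element and makes no attempt to confirm that the call number and box number pair is unique, for the reason given in the list above.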
So, how are we doing?
Frankly, I was pleasantly surprised. As you can see from the chart on the right, out of 1217 finding aids from that harvest, about two-thirds meet DACS single-level and optimum requirements. The reasons for failure vary: many are missing creator information, notes about the conditions governing access, and information about the language of material. Happily, information about the historical context of the collection and the presence of access points is fairly common.
We also see that the vast majority of our finding aids will meet the requirements for Aeon compliance. The problem of components without containers is a big one, but is something that we’ve obviously dealt with using paper call slips, and will have to be a remediation priority. Once this is addressed, we still have the outstanding issue of how to consistently tell the computer where a finding aid is coming from. Once we decide how we want that data to look, we’ll be able to fix it programmatically.
Our most distressing number is about local compliance, and the biggest offenders are physical location, immediate source of acquisition, and appraisal information. This reflects an overall trend in our repository of being careless with administrative information: we have very little information about when and how collections came to us and what interventions archivists made.
The requirement that appraisal information be included is extremely recent — unfortunately, this is the kind of information that is difficult to recover if not recorded at the time of processing. Hopefully, some information about appraisal may be included in processing information and separated materials notes.
For anyone interested in how our data breaks down, a chart is below.