Figuring Out What Has Been Done: Double Arrangement Statements

What was the problem?

During the transformation from EAD 1.0 to EAD 2002, table of contents information was migrated into an arrangement note that sat alongside the arrangement note that already existed. As a clean-up task, folks were going through and paring these down to just one collection-level arrangement statement. We want to know how much of this is left to do.

How did I figure this out?

Luckily, this one is pretty easy. I’m just identifying documents that have more than one arrangement tag under the same parent element. (I could have specifically asked for more than one arrangement tag in the archdesc, but it doesn’t really matter: if I found dupe arrangement notes elsewhere, I would want to know about them.) The XQuery is here, and it basically identifies where my files are,

declare variable $COLLECTION as document-node()* := collection("file:///C:/Users/cmc279/Desktop/ATExport/?recurse=yes;select=*.xml");

and then asks if there’s a second arrangement element

for $i in $COLLECTION//ead:ead
let $arrangement := exists($i//ead:arrangement[2]),

(That ead:arrangement[2] is basically saying “show me every arrangement note that is the second one under its parent,” and exists() is just a way of getting a true/false answer about whether something is there. So here I’m asking, “Is it true or false that there’s a second arrangement note?”)

Since I want to know which finding aid it’s in, I say

$doc := base-uri($i)

And then I write a return statement to give myself a report with those variables:

return
<doc>
<uri>{$doc}</uri>
<arrangement>{$arrangement}</arrangement>
</doc>
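
Put together, the pieces above might look something like the sketch below. Two things in it are assumptions rather than part of the snippets above: the namespace declaration (urn:isbn:1-931666-22-9 is the EAD 2002 schema namespace, which only applies if your exports are schema-based rather than DTD-based) and the report wrapper element, a made-up name that just gives the output a single root.

```xquery
(: Assumption: the files use the EAD 2002 schema namespace :)
declare namespace ead = "urn:isbn:1-931666-22-9";

(: Where the exported finding aids live (path taken from the post) :)
declare variable $COLLECTION as document-node()* :=
  collection("file:///C:/Users/cmc279/Desktop/ATExport/?recurse=yes;select=*.xml");

(: "report" is a hypothetical wrapper name so the output has one root element :)
<report>{
  for $i in $COLLECTION//ead:ead
  (: true if any parent element holds a second arrangement child :)
  let $arrangement := exists($i//ead:arrangement[2]),
      $doc := base-uri($i)
  return
    <doc>
      <uri>{$doc}</uri>
      <arrangement>{$arrangement}</arrangement>
    </doc>
}</report>
```

Each finding aid gets one doc entry, with its URI and a true/false value for whether a second arrangement note was found.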

What did I find out?

The report from this XQuery looked like this:

[Screenshot: reportArrangement]

(At first you’ll get the red bar of doom, because the output is a string of doc elements with no single root. Just enclose everything in a root element to fix that.) From here, I just wanted to know which finding aids had arrangement = true. To get a sense of how many, you can use the XPath 2.0 bar at the top of oXygen.

count(//arrangement[contains(., "true")])

This is, very simply, counting how many of these arrangement elements (remember, the arrangement variable asked for whether a second arrangement note exists) came back true. In our case, ONLY THREE! The people who did this clean-up work were clearly very thorough.

If there were significantly more, I might export this to Excel to get a table of which EADs came back true. But since it’s only three, I used XPath again and wrote:

//arrangement[contains(., "true")]

At the bottom of the screen, it tells me which those are, and I can click on each to see the finding aid URI.

[Screenshot: doubleArrangementResults]
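
If the list were longer, clicking through each result would get tedious. One possible shortcut, assuming the report has the doc/uri/arrangement structure shown in the return statement and has been wrapped in a root element, is to ask for the URIs directly:

```xquery
(: Run against the saved report: returns only the uri element
   of each finding aid whose arrangement value is "true" :)
//doc[arrangement = "true"]/uri
```

This is only a sketch of the same idea as the contains() expression above. It matches the exact value "true" instead of any text containing it, which works here because exists() only ever produces true or false.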

Conclusions

This was a pretty easy one to work out — fortunately, there were only three problem files in the end. Wouldn’t you much rather write a report to find those three problems than read through each finding aid (and we have more than 3,000!)?

Where are we with goals?

  1. Which finding aids from this project have been updated in Archivists’ Toolkit but have not yet been published to our finding aid portal?
  2. During the transformation from 1.0 to 2002, the text inside of mixed content was stripped (bioghist/blockquote, scopecontent/blockquote, scopecontent/emph, etc.). How much of this has been fixed and what remains?
  3. Container information is sometimes… off. Folders will be numbered 1-n across all boxes — instead of Box 1, Folders 1-20; Box 2, Folders 1-15, etc., we have Box 1, Folders 1-20; Box 2, Folders 21-35.
  4. Because of changes from 1.0 to 2002, it was common to have duplicate arrangement information in 1.0 (once as a table of contents, once as narrative information). During the transformation, this resulted in two arrangement statements.  We now know that only three finding aids have duplicate arrangement statements!
  5. The content of <title> was stripped in all cases. Where were <title> elements in 1.0 and has all the work been done to add them back to 2002?
  6. See/See Also references were (strangely) moved to parent components instead of where they belong. Is there a way of discovering the extent to which this problem endures?
  7. Notes were duplicated and moved to parent components. Again, is there a way of discovering the extent to which this problem endures?

 
