Building a Case: Semantic URLs for Finding Aids

I’ve been working on the Beast project with Cassie as my final field study before graduation (sniff). My task has been to look at the resource records, analyze the EAD being produced by the Beast and how that will translate to ArchivesSpace, and identify the data cleanup steps needed to get us there. It’s been a fantastic learning experience, and I’m sad to be leaving now that the semester is ending.

One of the things I’ve spent a lot of time in graduate school thinking about is usability and user experience, and although it’s outside of the scope of my project, there’s something that’s been bugging me…

The Problem (as I see it)

When you look at a finding aid generated through ArchivesUM, the page URL looks something like this:

The ability to see what database actions are performed when a finding aid is called is of absolutely no use to the person viewing the page. The XML file title inserted in the middle is also useless, because it is related to none of the identifiers a researcher would use to access the collection.

By way of contrast, the URL for this post looks something like this:

WordPress, like many other sites, creates semantic URLs for each of the pages it generates. Each URL clearly identifies the source of the page, the date it was originally posted, and some human-readable form of the title, which can be altered by the author of the post.

Like ArchivesUM, WordPress has provided a unique, static identifier for the information contained therein. Unlike its ArchivesUM counterpart, the WordPress URL gives the viewer several important pieces of information, such as the date of publication and whether the page is a sub-topic of the site. This has an impact on findability, both on the website itself and when the page appears on a Google search results page. Users can quickly determine whether they find the source trustworthy, how new it is, and whether it references the topic they are interested in.

Semantic URLs also clearly distinguish one subpage of a website from another. This can have an impact on search engine optimization, as pages with similar long and indecipherable URLs may not be crawled.

Possible Solution

Let’s take a look at what some other repositories are doing with their URLs.  I’ve been using Princeton and Duke as touchstones throughout my project because they’re clearly thinking progressively about a lot of things, including URLs:



In both cases, the last part of the URL is inserted using the <eadid> tag. Princeton uses its collection number, while Duke uses a shortened version of the collection title. Both are clean and easy to read. Users like it when they can easily understand a URL; it helps build confidence and trust in the website. It is arguable how practical these URLs are to the average user, who is not going to memorize them, but the consistent structure may be useful for archives staff. The same cannot be said for the current ArchivesUM URL standard.

I think the ArchivesSpace transition gives us the chance to adopt a new URL standard that looks something like this: <eadid>

As with the Duke example above, this would use the <eadid> tag to provide some version of the title of the collection. It could contain either a shortened title or the full title of the collection, e.g.:

Admittedly, that could get a bit long for a semantic URL, but it would avoid confusion with similar records. This will be something to work out on the policy level.
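To make the length trade-off concrete, here is a minimal sketch of how a collection title might be turned into a short, readable <eadid> value. Everything in it is an assumption for illustration: the function name title_to_eadid, the 60-character limit, the lowercase-and-hyphens rule, and the sample title are all hypothetical, and none of this is an ArchivesSpace feature or an existing local policy.

```python
import re
import unicodedata


def title_to_eadid(title: str, max_length: int = 60) -> str:
    """Turn a collection title into a lowercase, hyphen-separated slug.

    Hypothetical helper: the formatting rules and the default length
    limit are assumptions, not an established standard.
    """
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    if len(slug) <= max_length:
        return slug
    cut = slug[:max_length]
    # If the cut landed in the middle of a word, back up to the last hyphen.
    if slug[max_length] != "-":
        cut = cut.rsplit("-", 1)[0]
    return cut.strip("-")


# Made-up title, used only to show the shape of the result.
print(title_to_eadid("Jane Q. Example Papers, 1901-1950"))
print(title_to_eadid("Jane Q. Example Papers, 1901-1950", max_length=20))
```

Truncating at a word boundary keeps the short form readable, but it is also exactly where two similar titles can collapse into the same value, which is why the length limit is a policy decision rather than a purely technical one.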

The Pros of This Approach:

-Quickly conveys information about the repository and collection name

-Short and human-readable, which is an advantage for search engines

-Provides level of trust to user (which is admittedly hard to quantify)

-Can be done using the EAD ID field in ArchivesSpace

-Easy for reference archivists and researchers to identify collection by URL

-Removes “sausage making” database calls currently in URL, which reduces researcher confusion

-Can remain static even if database structure is changed

-If the finding aid is cross-linked on another page, users will have an idea where they’re going before they click on the link.

And the Cons:

-Can be confusing if collections have similar titles

-Need to set very clear rules for formatting, or automate it when converting from ArchivesSpace to finding aid

-Length is a concern: the URL should be short but still convey the information

-Requires policy changes for both the contents of <eadid> and the use of unique identifiers

The negatives to adopting a semantic URL approach are primarily in the implementation and can be mitigated by proper planning and clear policies. The positives boil down to the fact that for a relatively small amount of effort, we can have a huge impact on user experience as well as search engine optimization.
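As one illustration of what "clear policies" could look like in practice, the sketch below audits a batch of <eadid> values for the two implementation risks listed among the cons: values that break the formatting rules and values shared by more than one collection. The pattern, the 60-character limit, the function name audit_eadids, and the sample records are all assumptions made up for this example, not a description of any existing workflow.

```python
import re
from collections import defaultdict

# Assumed house rule: lowercase letters and digits, separated by single hyphens.
EADID_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_LENGTH = 60  # assumed limit, to be set by policy


def audit_eadids(eadids_by_collection: dict[str, str]) -> dict[str, list[str]]:
    """Report collection numbers whose eadid is malformed or not unique."""
    malformed = []
    seen = defaultdict(list)
    for number, eadid in eadids_by_collection.items():
        # Flag anything too long or not matching the assumed pattern.
        if len(eadid) > MAX_LENGTH or not EADID_PATTERN.fullmatch(eadid):
            malformed.append(number)
        seen[eadid].append(number)
    # Every collection that shares its eadid with another one is a duplicate.
    duplicate = [n for numbers in seen.values() if len(numbers) > 1 for n in numbers]
    return {"malformed": malformed, "duplicate": duplicate}


# Made-up records keyed by a hypothetical collection number.
sample = {
    "0001": "smith-family-papers",
    "0002": "smith-family-papers",
    "0003": "Jones Papers",
}
print(audit_eadids(sample))
```

A check like this could run before records are published, so that similar titles are caught and disambiguated by a person rather than discovered later through a broken or misleading link.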

We are implementing ArchivesSpace in part to better serve our researchers into the future. It’s important, then, to consider everything about our EAD content, our finding aids, and our websites from the user’s perspective. The URL is the first thing a researcher will encounter, so why not start there?


2 thoughts on “Building a Case: Semantic URLs for Finding Aids”

  1. Just got around to reading this. Great post and thanks for the Duke shout out. To clarify, the last segment of the Duke finding aid URL (\africanamericansmisc\ in your example) is really just the filename of the EAD file we post to the webserver. The filename just happens to be the same as the EADID (hopefully). We’ve only recently begun numbering collections (amazing but true), so these filenames have traditionally served as the only unique ID for collections. As you might imagine, having a text string as a collection ID has been a bit of a problem. Who knew we were really being progressive with our semantic URLs?

    If you’re interested in other ways to include semantic markup in the HTML versions of your finding aids, check out this blog post by Sean Aery here at Duke:

