Date Formats

Now that we’ve eliminated most of our duplicate bulk dates, let’s take a look at the plethora of date formats in our accession records. Does your repository have local definitions for how to record collection dates? My guess is most would have something somewhere, even if not a formal written policy. We have a small section of our processing manual detailing preferred date formats and unacceptable date formats. It is supposed to apply to collection dates in accession records and in finding aids. Do people follow it? Sometimes. Usually not. Or they follow some of it, but not all of it. We also have dates created before the processing manual existed. The samples below are just from a portion of our accession records, so we might have additional formats yet to be discovered, but you’ll get the idea.

Our date fields could contain three elements: year only, month and year, or month, day, and year. The type might be a single date, multiple single dates, range, multiple ranges, or a combination of these (although that isn’t specified). For dates in accession records I have already gone ahead and removed any variation of the word “circa”. There’s also a healthy amount of “unknown” and “undated” speckled throughout.
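For readers who want to try the “circa” removal themselves, here is a minimal Python sketch of one way it could be done. It is not necessarily the approach used for these records. The list of spellings (circa, ca, c, with or without a period) and the function name strip_circa are assumptions made for illustration. The word-boundary guard is the important part: it keeps the pattern from touching the letter c inside month names such as October or March.

```python
import re

# Assumed spellings of "circa"; the variants actually present in a dataset may differ.
# \b on both sides means a bare "c" only matches when it stands alone as a word.
CIRCA = re.compile(r"\b(?:circa|ca|c)\b\.?\s*", re.IGNORECASE)


def strip_circa(value):
    """Remove circa-style markers and report whether any were found."""
    cleaned, count = CIRCA.subn("", value)
    return cleaned.strip(), count > 0


for raw in ["circa 1920", "ca. 1940-1950", "c. 1909", "October 26, 1963"]:
    print(raw, "->", strip_circa(raw))
```

Returning a flag alongside the cleaned string means the fact that a date was marked approximate is not lost when the word itself is removed.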

Element, type: unitdateinclusive (Beast field)

Year, single:
    1909
    [1922]
    1636 (approx.)
    1920[?]
    1940?
    1946, undated
    1957(?)
    1999?

Year, multiple single:
    1913, 1963
    1945 or 1946
    1953, 1961, 1969, 1994
    1954, 1956, 1966-1967, 1971
    1958, 1960, 1962
    1966, 1967, 1969
    1967, 1968, 1969
    1969, 1970
    1995, 2000, undated

Year, range:
    1910-1950
    1920s
    1920s-1930s?
    1921-1981 and undated
    1940’s-2006
    1980’s-1990’s
    2000-2001 (FY 2001)
    Early 1970s
    late 1980s-early 1990s
    undated, 1970s-2002

Year, multiple range:
    1920s, 1969-1975
    1932-1934, 1950s
    1937-1942; 1947-1950

Year, single and range:
    1928; 1938-1962 and undated
    1938, 1950-1951
    1950s-1960s, 1988
    2008 [1901-2002]

Month Year, single:
    November 1962
    April 2001?

Month Year, range:
    January 1977- November 1981
    May2005-January 2007
    Otober 1920-Marh 1921

Month, Day, Year, single:
    11/9/1911
    June 14, 1924
    Marh 8, 2006
    Otober 26, 1963

Month, Day, Year, multiple single:
    12/19/2005; 4/4/2006
    January 5, 2000,  July 12, 2000
    9/19 & 9/20/2007

Month, Day, Year, range:
    10/24-10/26/2008
    January 30, 2011-February 2, 2011
    Marh 22-24, 2001
    Otober 13, 1987-Deember 7, 1987
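
One way to discover how many formats are hiding in a column like this is to reduce every value to its “shape” and count the shapes. The Python sketch below is an illustration only, not the process used for these records: the function name date_shape is hypothetical, and the sample list is a handful of values copied from the samples above. Digits become 9 and each run of letters becomes a single A, so a misspelled month has the same shape as a correctly spelled one.

```python
import re
from collections import Counter


def date_shape(value):
    """Reduce a date string to its shape: each digit -> 9, each run of letters -> A."""
    shape = re.sub(r"[0-9]", "9", value)
    shape = re.sub(r"[A-Za-z]+", "A", shape)
    return shape


# A few values taken from the samples above.
samples = [
    "1909", "[1922]", "1940?", "1999?", "1910-1950", "1920s",
    "June 14, 1924", "Marh 8, 2006", "11/9/1911", "Otober 1920-Marh 1921",
]

counts = Counter(date_shape(s) for s in samples)
for shape, n in counts.most_common():
    print(n, shape)
```

Sorting the shapes by frequency gives a quick picture of which formats are common enough to deserve a cleanup rule and which are one-offs to fix by hand.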

Here’s a summary of the issues:

  • Punctuation is not standard. Multiple dates may be separated with a period, comma, semi-colon, ampersand, or the word “and”.
  • We used a variety of methods to convey we were unsure of the date, such as ?, (?), [ ], [?], (approx.) in addition to all the circa variations. I’m guessing there are other dates we weren’t sure of, but we didn’t specify that.
  • Spacing isn’t consistent. Sometimes there are no spaces around punctuation, other times one, two, or more spaces.
  • Spelling. Sometimes we just couldn’t spell October or March (the most popular offenders, apparently).
  • Formats are all over the place, even comparing the same element and type. Ex: March 22-24, 2001 compared to March 22, 2001-March 24, 2001.
  • Use of decades was a common practice.
  • Providing single dates instead of ranges. Do we really need to say “1966, 1967, 1969” instead of “1966-1969” if we’re only missing 1968?
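
Several of the issues above lend themselves to small, mechanical fixes. The Python sketch below shows one possible shape for them, under stated assumptions rather than as the rules actually adopted for these records. All three function names are hypothetical. The max_gap and cutoff values are guesses that would need tuning against real data. Treating square brackets around a whole value as “uncertain” is an interpretation of the markers listed above.

```python
import difflib
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Uncertainty markers seen in the samples: (?), [?], ? and (approx.)
UNCERTAIN = re.compile(r"\(\?\)|\[\?\]|\?|\(approx\.\)")


def tidy(value):
    """Strip uncertainty markers and normalize spacing; return (cleaned, uncertain)."""
    uncertain = bool(UNCERTAIN.search(value))
    cleaned = UNCERTAIN.sub("", value)
    # Brackets around the whole value, e.g. [1922], are treated as a guess (assumption).
    whole = re.fullmatch(r"\s*\[([^\[\]]+)\]\s*", cleaned)
    if whole:
        uncertain = True
        cleaned = whole.group(1)
    cleaned = re.sub(r"\s+", " ", cleaned)              # collapse runs of spaces
    cleaned = re.sub(r"\s*-\s*", "-", cleaned)          # no spaces around hyphens
    cleaned = re.sub(r"\s+([,;])", r"\1", cleaned)      # no space before , or ;
    cleaned = re.sub(r"([,;])(?=\S)", r"\1 ", cleaned)  # one space after , or ;
    return cleaned.strip(), uncertain


def collapse_years(value, max_gap=2):
    """Turn a list of single years into one range if no gap exceeds max_gap years."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) < 2 or not all(re.fullmatch(r"\d{4}", p) for p in parts):
        return value  # not a plain list of years; leave it alone
    years = sorted(int(p) for p in parts)
    if any(later - earlier > max_gap for earlier, later in zip(years, years[1:])):
        return value  # gaps too wide to call this a range
    return f"{years[0]}-{years[-1]}"


def fix_month(word, cutoff=0.8):
    """Return the closest correctly spelled month name, or the word unchanged."""
    match = difflib.get_close_matches(word, MONTHS, n=1, cutoff=cutoff)
    return match[0] if match else word


print(tidy("1920[?]"))                      # ('1920', True)
print(tidy("January 1977- November 1981"))  # ('January 1977-November 1981', False)
print(collapse_years("1966, 1967, 1969"))   # 1966-1969
print(collapse_years("1913, 1963"))         # 1913, 1963
print(fix_month("Otober"))                  # October
```

Keeping the uncertainty flag separate from the cleaned string leaves room to record it in a dedicated field later instead of throwing it away. Whether a list such as “1966, 1967, 1969” should really become a range is a policy decision, which is why the gap threshold is a parameter and not hard-coded.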

Next post we’ll talk about the instructions and rules we’re developing for cleaning this up and how we go about executing those decisions.

6 thoughts on “Date Formats”

  1. Can I just say, I’m loving your blog posts (and your honesty about less-than-perfect legacy data!) We are in the process of planning our own retroconversion project so it is great to hear about your experiences. Thanks!

    • Thanks, Jenny! It’s been liberating to talk about these issues and so helpful for me too to hear what others are doing. Let us know anytime if you ever want to post about your retroconversion project.

  2. When you remove circa from dates, are you somehow retaining the information that the date is approximate/uncertain?

  3. Great question. You have a few options here. ArchivesSpace has a field for “date certainty” where you can specify if a date is “approximate”, “inferred”, or “questionable”. If you want to retain this information you could remove it from the date field and select one of the options under date certainty for that record. (We will probably do a version of this for our resource records, either at the collection or component level.) I’ll have an upcoming post where we talk about how to parse dates to ArchivesSpace fields.

    For our accession records, we are going to rely on institutional policy and assume that collection dates are all estimated dates. Most of the time these records were created without examining records so the dates are best guesses anyway. Based on our non-standardized practice I also know there are tons of dates that are estimated, but no one took the time to write in “circa”. We also don’t know which accessions have dates that are certain and verified. We will either decide to include “approximate” for our accession records with dates or just leave the certainty field blank relying on local practice that these dates are the best known estimate.
