In an effort to be transparent (and to highlight the volume of work), I’m attempting to document all of our cleanup instructions and decisions. For each ArchivesSpace field we’re including actions and questions as we explore our data. Some of the questions focus on how ArchivesSpace works or implements fields, while others focus on our specific data, policies, or procedures. Over time the questions turn into actions, and we continually update our documentation as we work through these issues.
Below is what we have so far for dates of materials in accession records. We still have lots of questions (and will no doubt develop more) so feel free to provide any feedback or suggestions.
Dates from the Beast fields unitdateinclusive and unitdatebulk will parse into date 1 and date 2 fields, each including:
- date_1_type (required by AS)
- date_1_expression (required by AS if no normalized dates)
- date_1_label (required by AS)
Adhere to required date formats and definitions for ArchivesSpace fields
- Control begin and end fields for date format: YYYY, YYYY-MM, or YYYY-MM-DD
- Single dates do not need a date end
- Control date expression based on local convention [revising current local format]
- Split date ranges using “-” as delimiter and designate as “inclusive” or “bulk” in date type, based on what column they came from. Use date 1 fields for inclusive and date 2 fields for bulk
- For values with commas (ex: “1950, 1950”), parse to two single dates using date 1 fields and date 2 fields
- Label defaults to “creation” unless otherwise specified
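The format and splitting rules above could be sketched in Python roughly as follows. This is only an illustration, not our actual cleanup procedure: the function names (is_valid_as_date, split_date_value) and the dictionary keys are made up for the example, and it assumes the incoming value contains only plain years separated by a hyphen or a comma.

```python
import re

# Shapes accepted in begin/end fields: YYYY, YYYY-MM, or YYYY-MM-DD.
# This checks the shape only, not whether the month or day is real.
AS_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def is_valid_as_date(value):
    """Return True if value looks like YYYY, YYYY-MM, or YYYY-MM-DD."""
    return AS_DATE.fullmatch(value) is not None


def split_date_value(value, source_column):
    """Split a year-only value into a list of date dicts.

    source_column is "unitdateinclusive" or "unitdatebulk" and decides
    the type given to ranges. The dict keys are hypothetical.
    """
    range_type = "inclusive" if source_column == "unitdateinclusive" else "bulk"
    value = value.strip()
    if "," in value:
        # Comma-separated years become single dates (date 1, date 2, ...)
        years = [part.strip() for part in value.split(",")]
        return [
            {"type": "single", "begin": year, "end": None, "label": "creation"}
            for year in years
        ]
    if "-" in value:
        # A hyphen marks a range; split once into begin and end
        begin, end = [part.strip() for part in value.split("-", 1)]
        return [{"type": range_type, "begin": begin, "end": end, "label": "creation"}]
    # Anything else is treated as a single date with no end
    return [{"type": "single", "begin": value, "end": None, "label": "creation"}]
```

For example, split_date_value("1857-1919", "unitdateinclusive") gives one inclusive range, while a comma-separated value gives one single date per year.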
For dates that include “undated”
- Keep “undated” as part of whole statement in the date expression field.
- Parse remaining dates as normal and remove “undated” from begin and end
- ex: “1857-1919 and undated” remains as the date expression, 1857 goes to date_begin, 1919 goes to date_end, type is “inclusive”
- Assume that all collection dates in accession records are best estimates [Policy decision]
- remove all forms of “circa” from accession dates
- remove question marks and variations
- remove square brackets and variations indicating guessed dates
- remove “undated” if it is the only value
- remove “unknown” if it is the only value
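A rough Python sketch of the “undated” and best-estimate rules above is shown here. The function name clean_for_parsing is hypothetical, and the list of “circa” variants it recognizes (circa, ca, c, with or without a period) is an assumption; our real data may contain other forms.

```python
import re


def clean_for_parsing(raw):
    """Return (expression, parseable) for a raw accession date string.

    expression is what would go in the date expression field;
    parseable is the same text with "undated" removed, ready to be
    split into begin and end.
    """
    text = raw.strip()
    # "undated" or "unknown" as the only value: remove entirely
    if text.lower() in ("undated", "unknown"):
        return "", ""
    # Drop circa variants that sit directly in front of a digit
    text = re.sub(r"\b(circa|ca|c)\.?\s*(?=\d)", "", text, flags=re.IGNORECASE)
    # Drop question marks and square brackets
    text = re.sub(r"[?\[\]]", "", text).strip()
    expression = text
    # Remove "undated" (and a joining "and" or comma) before parsing
    parseable = re.sub(
        r"\s*(,|\band\b)?\s*\bundated\b", "", text, flags=re.IGNORECASE
    ).strip()
    return expression, parseable
```

With this sketch, “1857-1919 and undated” keeps its full wording as the expression, while only “1857-1919” is passed on for begin and end.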
For dates listed as decades
- Control decades to year spans. Use the first and last year of decade.
- 1940s-1950s becomes 1940-1959 in date expression, begin 1940, and end 1959
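The decade rule could be sketched like this in Python. The function name decade_span is hypothetical, and the sketch assumes decades are always written as four digits ending in 0 followed by “s” (it does not try to decide whether “1900s” means a decade or a century).

```python
import re

# Matches decades such as "1940s"; captures the first three digits
DECADE = re.compile(r"(\d{3})0s")


def decade_span(value):
    """Convert "1940s" or "1940s-1950s" to (begin, end) years.

    Returns None when the value contains no decade.
    """
    decades = DECADE.findall(value)
    if not decades:
        return None
    begin = int(decades[0] + "0")  # first year of the first decade
    end = int(decades[-1] + "9")   # last year of the last decade
    return begin, end


def decade_expression(value):
    """Build the controlled date expression, e.g. "1940-1959"."""
    span = decade_span(value)
    if span is None:
        return value
    return f"{span[0]}-{span[1]}"
```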
If bulk dates are the exact same as inclusive, delete bulk dates
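As a small sketch of that check in Python: the function name drop_redundant_bulk is made up, and the column names date_1_begin, date_1_end, date_2_begin, and date_2_end are assumed here for illustration, with one accession row held as a dictionary.

```python
def drop_redundant_bulk(record):
    """Return a copy of record without date 2 fields when the bulk
    dates are exactly the same as the inclusive dates."""
    inclusive = (record.get("date_1_begin"), record.get("date_1_end"))
    bulk = (record.get("date_2_begin"), record.get("date_2_end"))
    if record.get("date_2_type") == "bulk" and bulk == inclusive:
        # Drop every date 2 column, keep everything else
        return {k: v for k, v in record.items() if not k.startswith("date_2_")}
    return dict(record)
```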
What date cutoffs do we use for partial decades?
- ex: Late 1990s, mid 1970s, early 1930s
- late = 1995-1999
- mid = ???
- early = 1930-1935
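One way to keep these cutoffs easy to change while we decide is a small lookup table, sketched below in Python. The names PARTIAL_DECADE_OFFSETS and partial_decade_span are hypothetical. The “early” and “late” offsets simply mirror the examples above, and “mid” is deliberately left unset because we have not decided it yet.

```python
import re

# Offsets from the first year of the decade as (first, last).
# "mid" is None because that cutoff is still an open question.
PARTIAL_DECADE_OFFSETS = {
    "early": (0, 5),
    "mid": None,
    "late": (5, 9),
}


def partial_decade_span(value):
    """Convert "late 1990s" to (1995, 1999); return None when the
    value is not a partial decade or its cutoff is undecided."""
    match = re.fullmatch(
        r"(early|mid|late)\s+(\d{3})0s", value.strip(), flags=re.IGNORECASE
    )
    if match is None:
        return None
    offsets = PARTIAL_DECADE_OFFSETS[match.group(1).lower()]
    if offsets is None:
        return None
    decade_start = int(match.group(2) + "0")
    return decade_start + offsets[0], decade_start + offsets[1]
```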
If bulk dates exist for single items, when to delete or not?
- Will delete if same
- Should we keep if there is a difference?
- If there is a difference, what is the year cutoff? 1? 5? 10?
Are single dates with “undated” really single?
- ex: “1941 and undated”
Can we have “bulk” dates that are “single”?
- ex: 1989, type as “bulk” in ArchivesSpace?
For date expression, can we agree on the preferred date formats?
- Start with guidelines in processing manual
- Update and make suggestions for changes
- Solicit comments/feedback
- Finalize decisions
- ex: 1880s, 18th-20th Century, Oct. 1976, Sept-9-1980, May 5, 1990
What if an accession is a single item with a date range, but the abstract gives a single date?
- ex: Edwin Warfield accessions, range 1875-1912, abstract for accession says 1889 for a single item. Should we assume that this date range was for all Warfield papers, ignore it, and take the date from the abstract?
What do we do if we have more dates than fields?
- ex: single dates of 1966, 1967, 1969 or 1930, 1950, and 2002 would parse to three single date fields
- Version 126.96.36.199 currently only imports date 1 and date 2 in CSV accession template
- When do we want to turn single dates into a range instead? How many years in between? Based on size of materials? Or never, and instead develop a procedure for adding dates beyond the second to the ArchivesSpace record post import?
In the next post we’ll go through some of the specific ways we are executing the cleanup actions.