Research and Development

Rationale

SNAC's research and development work is demonstrating the feasibility of separating the description of people from the description of the historical records that were created by them and that document their lives and work. This separation serves two complementary objectives: improving the economy and effectiveness of archival description and providing researchers with a novel tool that integrates access to distributed historical records and reveals the social context within which the records were created.

Data Sources

The descriptions that constitute the bulk of the source data for the work are in three forms:

  • Detailed guides or finding aids that are encoded using the international standard Encoded Archival Description (EAD).
  • Summary descriptions of archival collections that use the library standard MARC21.
  • Original descriptions of people (traditionally described as authority records and represented in a wide variety of non-standard formats).

Currently, SNAC is working with nearly 190,000 finding aids contributed by scores of repositories, 2.2 million MARC21 descriptions contributed by OCLC WorldCat, and approximately 500,000 original descriptions of people contributed by the British Library, the U.S. National Archives and Records Administration, the Smithsonian Institution Archives, and others.

Data Processing Methods

Processing of the source data takes place in three steps.

  • Data is extracted from the source descriptions and assembled into descriptions of individuals, families, and organizations using the international standard Encoded Archival Context-Corporate Bodies, Persons, and Families (EAC-CPF).
  • The resulting EAC-CPF descriptions are matched and combined with one another, and then matched against more than 25 million Virtual International Authority File (VIAF) records. Data from matching VIAF records supplements the data in the EAC-CPF descriptions.
  • Finally, the resulting EAC-CPF descriptions are added to the foundation of a public research tool that serves both as an integrated pathway into distributed historical resources and as a biographical-historical resource.
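The first two steps can be sketched in a few lines of Python. This is a minimal illustration only, not SNAC's actual implementation: the EntityRecord class, the normalize match key, the shape of the input dictionaries, and the viaf_index lookup table are all hypothetical stand-ins, and real EAC-CPF descriptions are XML documents matched with far more careful logic than exact comparison of normalized names. The sample names and identifiers are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityRecord:
    """Hypothetical, simplified stand-in for an EAC-CPF description."""
    name: str
    entity_type: str  # "person", "family", or "corporateBody"
    sources: list = field(default_factory=list)  # IDs of source descriptions
    viaf_id: Optional[str] = None


def normalize(name: str) -> str:
    """Crude match key (an assumption): lowercase, no punctuation, single spaces."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def extract(source_descriptions):
    """Step 1: assemble one entity record per name found in each source."""
    for src in source_descriptions:
        for name, entity_type in src["names"]:
            yield EntityRecord(name, entity_type, [src["id"]])


def match_and_combine(records, viaf_index):
    """Step 2: merge records that share a match key, then look each up in VIAF."""
    groups = defaultdict(list)
    for rec in records:
        groups[(normalize(rec.name), rec.entity_type)].append(rec)

    merged = []
    for (key, entity_type), group in groups.items():
        combined = EntityRecord(group[0].name, entity_type)
        for rec in group:
            combined.sources.extend(rec.sources)
        # viaf_index is a plain dict standing in for the VIAF records.
        combined.viaf_id = viaf_index.get(key)
        merged.append(combined)
    return merged


# Placeholder input: one finding aid and one MARC21 summary description.
sources = [
    {"id": "ead-001", "names": [("Doe, Jane", "person")]},
    {"id": "marc-001", "names": [("DOE, Jane.", "person"),
                                 ("Example Society", "corporateBody")]},
]
viaf_index = {"doe jane": "viaf-placeholder-1"}

merged = match_and_combine(extract(sources), viaf_index)
for rec in merged:
    print(rec.name, rec.entity_type, rec.sources, rec.viaf_id)
```

In this sketch the two spellings of the same person collapse into a single record that points back to both source descriptions, which is the property that lets the public tool serve as an integrated pathway into distributed records.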

Each step in the processing is performed by one of the three SNAC collaborators. The first step, extracting/assembling, is done at the Institute for Advanced Technology in the Humanities, University of Virginia. The second step, matching/combining, is performed at the School of Information, University of California, Berkeley. The third and last step, developing the public research tool, is being carried out at the California Digital Library, University of California Office of the President.

The U.S. National Endowment for the Humanities (2010-2012) and the Andrew W. Mellon Foundation (2012-2015) have funded the R&D. The website for the first phase of the R&D is maintained here.