You digitized it, but now what? Exploring computational methods for extracting biodiversity data from historical collections


Abstract/Contents

Abstract
Climate change is driving rapid changes in our biosphere on local and global scales. Our capacity to understand these shifts relies entirely upon two critical things: long-term observations, and an ability to discover and access them. Species occurrence data, which record the presence of a species at a particular place on a specified date, are foundational to understanding biodiversity and tracking changes due to the effects of climate change. Open knowledge bases that gather species occurrence records enable researchers to assess spatio-temporal changes in biodiversity, but historical observations recorded only on paper are often missing from them. Libraries at several academic marine research stations on the West Coast of North America hold large physical collections of undergraduate student reports. These reports include field observations of species occurrences and populations recorded over a span of nine decades. Each library collection is important within its local context, but taken collectively these papers represent an extremely valuable corpus for conducting biodiversity research. Even after digitization, however, observational data in these papers are still “hidden” in the text. Reading and extracting those data by hand is an effort we cannot realistically undertake. In this presentation, I will describe a collaborative project in which we explore the potential of natural language processing, machine learning, and data visualization to identify and verify species occurrences in unpublished student research papers. I will review how we identify relevant entities in the texts, link them to taxonomic authorities, and create derivative datasets. The final goal of the project is to serve the species occurrence metadata to relevant aggregators, e.g., the Global Biodiversity Information Facility.
The overarching message of this talk will be how we can take advantage of computational methods to amplify the work of information professionals in surfacing historical biodiversity data.
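
As a rough illustration of the three steps named in the abstract (identify entities, link them to a taxonomic authority, create a derivative dataset), the following is a minimal sketch only, not the project's actual method. It assumes a simple regular expression in place of the natural language processing and machine learning models the talk describes, a small in-memory dictionary (the hypothetical AUTHORITY) in place of a real taxonomic service, made-up taxon identifiers, and Darwin Core-style field names for the output records. The function name extract_occurrences and its parameters are likewise hypothetical.

```python
import re

# Hypothetical stand-in for a taxonomic authority. A real project would
# query an external service; these identifiers are invented placeholders.
AUTHORITY = {
    "Pisaster ochraceus": "urn:example:taxon:1",
    "Mytilus californianus": "urn:example:taxon:2",
}

# Candidate Latin binomials: a capitalized genus followed by a lowercase
# epithet of at least three letters. This over-matches ordinary prose on
# purpose; the authority lookup below filters out the false positives.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def extract_occurrences(text, locality, event_date):
    """Return one record per candidate name found in the authority."""
    records = []
    for match in BINOMIAL.finditer(text):
        name = f"{match.group(1)} {match.group(2)}"
        taxon_id = AUTHORITY.get(name)
        if taxon_id is None:
            continue  # not a recognized taxon, so skip it
        records.append(
            {
                # Darwin Core-style field names (an assumption here)
                "scientificName": name,
                "taxonID": taxon_id,
                "locality": locality,
                "eventDate": event_date,
            }
        )
    return records


if __name__ == "__main__":
    sample = (
        "We counted 14 Pisaster ochraceus among Mytilus californianus beds. "
        "Tide pools were cold."
    )
    for record in extract_occurrences(sample, "Hypothetical Cove", "1975-05-12"):
        print(record)
```

In this sketch the place and date are passed in by hand; in a real report they would also have to be extracted from the text and verified, which is part of what makes manual extraction impractical at scale.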

Description

Type of resource text
Date modified February 15, 2022; December 5, 2022
Publication date February 9, 2022

Creators/Contributors

Author Whitmire, Amanda https://orcid.org/0000-0003-2429-8879 (unverified)

Subjects

Subject Biodiversity
Subject Text data mining
Subject Natural language processing
Genre Text
Genre Presentation recording
Genre Presentation slides
Genre Speaker notes


Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution 4.0 International license (CC BY).

Preferred citation

Whitmire, A. (2022). You digitized it, but now what? Exploring computational methods for extracting biodiversity data from historical collections. Stanford Digital Repository. Presented at the International Ocean Data Conference 2022. Available at https://purl.stanford.edu/mp178ym9045

Collection

Stanford Libraries staff presentations, publications, and research


