You digitized it, but now what? Exploring computational methods for extracting biodiversity data from historical collections

Whitmire, Amanda

doi:10.25740/mp178ym9045

Abstract/Contents

Abstract: Climate change is driving rapid changes in our biosphere on local and global scales. Our capacity to understand these shifts relies entirely upon two critical things: long-term observations, and an ability to discover and access them. Species occurrence data, which record the presence of a species at a particular place on a specified date, are foundational to understanding biodiversity and tracking changes due to the effects of climate change. Open knowledge bases that gather species occurrence records enable researchers to assess spatio-temporal changes in biodiversity, but historical observations recorded only on paper are often missing. Libraries at several academic marine research stations on the West Coast of North America hold large physical collections of undergraduate student reports. These reports include field observations of species occurrences and populations recorded over a span of nine decades. Each library collection is important within its local context, but taken collectively these papers represent an extremely valuable corpus for conducting biodiversity research. Even after digitization, however, observational data in these papers are still “hidden” in the text. Reading and extracting those data by hand is an effort we cannot realistically undertake. In this presentation, I will describe a collaborative project in which we explore the potential of natural language processing, machine learning, and data visualization to identify and verify species occurrences in unpublished student research papers. I will review how we approach identifying relevant entities in the texts, link them to taxonomic authorities, and create derivative datasets. The final goal of the project is to serve the species occurrence metadata to relevant aggregators, e.g., the Global Biodiversity Information Facility.
The overarching message of this talk will be how we can take advantage of computational methods to amplify the work of information professionals in surfacing historical biodiversity data.
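
As a rough illustration of the identify-and-link step the abstract describes, the sketch below finds candidate Latin binomials in digitized text with a regular expression and keeps only those present in a lookup table. It is a minimal sketch under stated assumptions, not the project's pipeline: the abstract does not name its tools, and the `AUTHORITY` table, its placeholder identifiers, the `extract_occurrences` function, the sample sentence, the locality, and the date are all hypothetical. The output keys borrow Darwin Core term names (`scientificName`, `taxonID`, `locality`, `eventDate`) only to suggest what a derivative dataset might look like.

```python
import re

# Hypothetical stand-in for a taxonomic authority lookup.
# The identifiers are placeholders, not real authority IDs.
AUTHORITY = {
    "Pisaster ochraceus": "example:0001",
    "Mytilus californianus": "example:0002",
}

# Candidate Latin binomial: a capitalized genus followed by a lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def extract_occurrences(text, locality, event_date):
    """Return one record per distinct name in `text` that the lookup table knows."""
    records = []
    seen = set()
    for match in BINOMIAL.finditer(text):
        name = f"{match.group(1)} {match.group(2)}"
        # Ordinary phrases such as "We counted" also match the pattern,
        # so the lookup table acts as the verification step.
        if name in AUTHORITY and name not in seen:
            seen.add(name)
            records.append({
                "scientificName": name,
                "taxonID": AUTHORITY[name],
                "locality": locality,
                "eventDate": event_date,
            })
    return records


# Hypothetical sentence of the kind a student report might contain.
sample = ("We counted Pisaster ochraceus and Mytilus californianus in the "
          "mid-intertidal zone. These animals were abundant.")
records = extract_occurrences(sample, "hypothetical field site", "1975-05-01")
for record in records:
    print(record["scientificName"], record["taxonID"])
```

A dictionary lookup is used here only because it is easy to read; a real workflow would more plausibly combine a trained entity recognizer with a query against a taxonomic name service, and would need human review of ambiguous or outdated names.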

Description

Type of resource	text
Date modified	February 15, 2022; December 5, 2022
Publication date	February 9, 2022

Creators/Contributors

Author	Whitmire, Amanda	https://orcid.org/0000-0003-2429-8879 (unverified)

Subjects

Subject	Biodiversity
Subject	Text data mining
Subject	Natural language processing
Genre	Text
Genre	Presentation recording
Genre	Presentation slides
Genre	Speaker notes

Bibliographic information

Related item	YouTube link from the IOC IODE
DOI	https://doi.org/10.25740/mp178ym9045
Location	https://purl.stanford.edu/mp178ym9045

Access conditions

Use and reproduction: User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License: This work is licensed under a Creative Commons Attribution 4.0 International license (CC BY).

Preferred citation

Preferred citation: Whitmire, A. (2022). You digitized it, but now what? Exploring computational methods for extracting biodiversity data from historical collections. Presented at the International Ocean Data Conference 2022. Stanford Digital Repository. Available at https://purl.stanford.edu/mp178ym9045

Collection

Stanford Libraries staff presentations, publications, and research

Contact information

Contact: thalassa@stanford.edu
