Crowdsourcing ontology verification

Mortensen, Jonathan M; Stanford University, Program in Biomedical Informatics.

Abstract/Contents

Abstract: Biomedicine and healthcare rely heavily on ontologies, and ontology development and use in the domain are increasing rapidly. However, as the scale and complexity of these ontologies increase, so too do errors and engineering challenges. Both automated and manual methods exist for ontology quality assurance and for identifying ontology errors. However, these methods do not readily scale as ontology size increases, and they do not necessarily identify the most salient errors. Recently, crowdsourcing has enabled solutions to complex problems that computers alone cannot solve. Crowdsourcing presents an opportunity to develop methods for ontology quality assurance that overcome the current limitations of scalability and applicability. Toward that end, the work described in this dissertation has the following aims: (1) to examine the effect of ontology errors in biomedical applications, (2) to develop and test a scalable framework for ontology verification via crowdsourcing that overcomes the limitations of current ontology quality assurance methods, (3) to apply this framework to ontologies in use, and (4) to evaluate the methodology and its effect in the context of the biomedical domain. In preliminary studies, I found that crowd workers perform best when answering questions about concrete (not abstract) ontology concepts, when presented with a simply stated natural-language representation of an ontology axiom, and when provided with textual definitions of ontology concepts. After completing these early studies, I refined the crowd-based methodology and applied it to biomedical ontologies in use. On SNOMED CT, the crowd identified 39 errors in a set of 200 expert-verified relationships. The crowd was indistinguishable from any single expert by inter-rater agreement, and it performed on par with any single expert when compared against the consensus standard developed by five subject-matter experts, with a mean AUC of 0.83.
On the Gene Ontology (GO), a different set of subject-matter experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an AUC ranging from 0.44 to 0.73, depending on the method's configuration. However, when the crowd verified relationships that experts considered easy and that had useful definitions, it performed reasonably well. The crowd's performance in verifying SNOMED CT and GO suggests that the crowd can indeed assist with ontology engineering tasks. Rather than serving as a complete replacement, the crowd can serve as an assistant, helping experts with ontology verification by completing the easy tasks and allowing experts to focus on the difficult ones.
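As a rough illustration of the kind of evaluation the abstract reports (an AUC of crowd judgments against an expert consensus standard), the following is a minimal Python sketch under assumptions that the record does not state: each relationship receives several binary worker votes, the crowd's score for a relationship is the fraction of workers who flag it as incorrect, and AUC is computed with the rank-based (Mann-Whitney) formulation, counting ties as 0.5. The function names `crowd_error_score` and `auc` and the toy vote data are hypothetical; this is not the dissertation's implementation.

```python
from itertools import product


def crowd_error_score(votes):
    """Fraction of workers who judged the relationship to be incorrect (True = incorrect)."""
    if not votes:
        raise ValueError("each relationship needs at least one vote")
    return sum(votes) / len(votes)


def auc(scores, labels):
    """Rank-based AUC: P(score of an expert-flagged error > score of a non-error), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one error and one non-error relationship")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: five worker votes per relationship, plus an expert consensus label.
votes = {
    "r1": [True, True, False, True, True],     # crowd score 0.8
    "r2": [True, True, False, False, False],   # crowd score 0.4
    "r3": [True, False, False, False, False],  # crowd score 0.2
    "r4": [False, False, False, False, False], # crowd score 0.0
}
expert_error = {"r1": True, "r2": False, "r3": True, "r4": False}

ids = sorted(votes)
scores = [crowd_error_score(votes[r]) for r in ids]
labels = [expert_error[r] for r in ids]
print(auc(scores, labels))  # prints 0.75
```

Because the AUC is computed from the crowd's graded scores rather than a single thresholded vote, it rewards the crowd for ranking expert-identified errors above correct relationships, which is one plausible reading of how a crowd could be compared with individual experts.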

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2015
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Mortensen, Jonathan M
Associated with	Stanford University, Program in Biomedical Informatics.
Primary advisor	Musen, Mark A
Thesis advisor	Musen, Mark A
Thesis advisor	Khatri, Purvesh
Thesis advisor	Noy, Natalya F
Advisor	Khatri, Purvesh
Advisor	Noy, Natalya F

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Jonathan M. Mortensen.
Note	Submitted to the Program in Biomedical Informatics.
Thesis	Thesis (Ph.D.)--Stanford University, 2015.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).