Crowdsourcing ontology verification

Abstract/Contents

Abstract
Biomedicine and healthcare rely heavily on ontologies, and ontology development and use are increasing rapidly in the domain. However, as the scale and complexity of these ontologies increase, so too do errors and engineering challenges. Both automated and manual methods exist to provide ontology quality assurance and identify ontology errors, but these methods do not readily scale as ontology size increases, and they do not necessarily identify the most salient errors. Recently, crowdsourcing has enabled solutions to complex problems that computers alone cannot solve, and it presents an opportunity to develop methods for ontology quality assurance that overcome the current limitations of scalability and applicability. Toward that end, the work described in this dissertation has the following aims: (1) to examine the effect of ontology errors in biomedical applications, (2) to develop and test a scalable framework for ontology verification via crowdsourcing that overcomes the limitations of current ontology quality assurance methods, (3) to apply this framework to ontologies in use, and (4) to evaluate the methodology and its effect in the context of the biomedical domain. In preliminary studies, I found that crowd workers perform best when answering questions about concrete (not abstract) ontology concepts, when presented with a simply stated natural-language representation of an ontology axiom, and when provided textual definitions of ontology concepts. After completing these early studies, I refined and applied the crowd-based methodology to biomedical ontologies in use. On SNOMED CT, the crowd identified 39 errors in a set of 200 expert-verified relationships; by inter-rater agreement, it was indistinguishable from any single expert, and, measured against the consensus standard that five subject-matter experts developed, it performed on par with any single expert, with a mean AUC of 0.83. On the Gene Ontology (GO), a different set of subject-matter experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an AUC ranging from 0.44 to 0.73 depending on the method's configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, it performed reasonably well. The crowd's performance in verifying SNOMED CT and GO suggests that the crowd can indeed assist with ontology engineering tasks: rather than replacing experts outright, the crowd can serve as an assistant, completing the easy verification tasks and freeing experts to focus on the difficult ones.
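
As a concrete illustration of the evaluation described above, the sketch below shows how crowd verification votes might be scored against an expert consensus standard. This is a minimal sketch, not the dissertation's actual pipeline: the vote data are hypothetical, aggregation by the fraction of affirmative votes is one simple choice among many, and scikit-learn's roc_auc_score is assumed to be available.

```python
# Illustrative sketch only -- not the dissertation's actual pipeline.
# Assumes hypothetical data: per-relationship crowd votes (1 = the
# stated ontology relationship is judged correct) and a binary
# expert consensus label for each relationship.
from sklearn.metrics import roc_auc_score

# Hypothetical crowd responses: relationship id -> list of worker votes.
crowd_votes = {
    "rel-1": [1, 1, 1, 0, 1],  # mostly verified as correct
    "rel-2": [0, 0, 1, 0, 0],  # mostly flagged as an error
    "rel-3": [1, 0, 1, 1, 0],
}

# Hypothetical expert consensus: 1 = correct relationship, 0 = error.
expert_consensus = {"rel-1": 1, "rel-2": 0, "rel-3": 1}

rel_ids = sorted(crowd_votes)
# Aggregate each relationship's votes into a score: the fraction of
# workers who judged the relationship correct.
crowd_scores = [sum(crowd_votes[r]) / len(crowd_votes[r]) for r in rel_ids]
labels = [expert_consensus[r] for r in rel_ids]

# AUC measures how well the crowd's scores rank correct relationships
# above erroneous ones (1.0 = perfect ranking, 0.5 = chance).
print("AUC:", roc_auc_score(labels, crowd_scores))
```

Scoring by the fraction of affirmative votes lets AUC measure how well the crowd ranks correct relationships above erroneous ones, independent of any fixed decision threshold.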

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Mortensen, Jonathan M
Associated with Stanford University, Program in Biomedical Informatics.
Primary advisor Musen, Mark A
Thesis advisor Khatri, Purvesh
Thesis advisor Noy, Natalya F

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Jonathan M. Mortensen.
Note Submitted to the Program in Biomedical Informatics.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Jonathan McClarren Mortensen
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
