Semantic-based information extraction of biomedical definitions

Abstract/Contents

Abstract
It is well known that the volume of biomedical literature is growing exponentially and that scientists are being overwhelmed when they sift through the scope and diversity of this unstructured knowledge to find relevant information. Prior work on addressing this problem has focused on methods to search for relevant publications and to identify relevant parts of publications. There has been much less research on methods that assist in extracting knowledge from biomedical literature. To tackle this challenge, I present a novel method to support the acquisition of structured knowledge from unstructured text. I have applied my method to the challenge of identifying rule-based definitions of disease phenotypes. Because background knowledge of complex and diverse medical conditions is critical to undertaking information extraction, I have developed a semantic-based approach. Specifically, I use existing background knowledge to incorporate domain-relevant semantics, such as semantic similarity and rules, into a method for finding publications, and the passages within them, that contain knowledge about phenotype definitions, and for identifying the rule or rule format that correctly encodes a phenotype. I have evaluated my method in the autism phenotyping domain and found that incorporating structured domain knowledge into information extraction provides better accuracy and higher relevance of results than alternative term-based approaches. My method can help scientists rapidly identify and formalize the complex domain knowledge that is emerging in published research findings. It is also widely applicable to other information extraction challenges where there is a need to accurately extract computer-interpretable definitions, constraints, and policies from text.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2012
Issuance monographic
Language English

Creators/Contributors

Associated with Hassanpour Ghady, Saeed
Associated with Stanford University, Department of Electrical Engineering
Primary advisor Das, Amar K. (Amar Kumar)
Primary advisor Garcia-Molina, Hector
Thesis advisor Das, Amar K. (Amar Kumar)
Thesis advisor Garcia-Molina, Hector
Thesis advisor Musen, Mark A
Thesis advisor Napel, Sandy
Advisor Musen, Mark A
Advisor Napel, Sandy

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Saeed Hassanpour.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2012.
Location electronic resource

Access conditions

Copyright
© 2012 by Saeed Hassanpour Ghady
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
