Information retrieval across multiple information sources using a knowledge based methodology
- Recent years have seen tremendous growth in research and development in science and technology, along with an emphasis on obtaining Intellectual Property (IP) protection for one's innovations. Information pertaining to IP for science and technology is siloed across many diverse sources and consists of laws, regulations, patents, court litigations, scientific publications, and more. Although a great deal of legal and scientific information is now available online, its scattered distribution, combined with its enormous size and complexity, makes any attempt to gather relevant IP-related information on a specific technology a daunting task. In this thesis, we develop a knowledge-based software framework to facilitate retrieval of patents and related information across multiple diverse and uncoordinated information sources in the US patent system. The document corpus covers issued US patents, court litigations, scientific publications, and patent file wrappers in the biomedical technology domain. A document repository is populated with these documents in XML format. Parsers are developed to automatically download documents from the information sources; the parsers also extract metadata and textual content from the downloaded documents and populate the XML repository. A text index is built over the repository using Apache Lucene to facilitate search and retrieval of documents. Based on the document repository, the underlying methodology to search across multiple information sources in the patent system is discussed. The methodology is divided into two major parts. First, we develop a knowledge-based query expansion methodology to tackle inconsistencies in domain terminology across the documents. Relevant knowledge is retrieved from external sources such as domain ontologies.
Second, since our goal is to retrieve a collection of relevant documents across multiple sources, we develop a patent system ontology to provide interoperability between the different types of documents and to facilitate information integration. We discuss the Information Retrieval (IR) framework, which combines the knowledge-based query expansion methodology with the patent system ontology to provide a multi-domain search methodology. A visualization tool based on term co-occurrence is developed that can be used to browse the document repository through the class hierarchies of domain ontologies. The knowledge-based query expansion methodology is evaluated through formal measures such as precision and recall, with a simple term-based search as the baseline for comparison. Results reported in related work are also used for comparison. A series of questions commonly asked during patent prior art searches and infringement analysis is generated to evaluate the patent system ontology. A summary of the results and analysis is provided.
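The evaluation idea in the abstract (knowledge-based query expansion compared against a simple term-based baseline using precision and recall) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy ontology, the four-document corpus, the relevance judgments, and the function names are hypothetical and are not taken from the thesis, which builds its index with Apache Lucene and draws on real domain ontologies.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: each term maps to synonyms or related labels.
TOY_ONTOLOGY: Dict[str, List[str]] = {
    "erythropoietin": ["epo", "epoetin"],
    "anemia": ["anaemia"],
}


def expand_query(terms: List[str], ontology: Dict[str, List[str]]) -> List[str]:
    """Add ontology synonyms to the query terms, keeping order and dropping duplicates."""
    expanded: List[str] = []
    for term in terms:
        key = term.lower()
        for candidate in [key] + ontology.get(key, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def search(query_terms: List[str], corpus: Dict[str, str]) -> Set[str]:
    """Return ids of documents that contain at least one query term (simple term match)."""
    wanted = set(query_terms)
    return {
        doc_id
        for doc_id, text in corpus.items()
        if wanted & set(text.lower().split())
    }


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 when undefined)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mixed corpus: a patent, two publications, and a court case.
CORPUS: Dict[str, str] = {
    "patent-1": "recombinant erythropoietin production in mammalian cells",
    "paper-1": "epo dosing in patients with chronic kidney disease",
    "case-1": "dispute over epoetin manufacturing claims",
    "paper-2": "text index tuning notes",
}
# Hypothetical relevance judgments for the query "erythropoietin".
RELEVANT: Set[str] = {"patent-1", "paper-1", "case-1"}

baseline = search(["erythropoietin"], CORPUS)
expanded = search(expand_query(["erythropoietin"], TOY_ONTOLOGY), CORPUS)

print("baseline:", precision_recall(baseline, RELEVANT))
print("expanded:", precision_recall(expanded, RELEVANT))
```

In this toy setting the baseline finds only the document that uses the exact query term, while the expanded query also reaches documents that use synonyms, so recall rises without a loss of precision. Real corpora are noisier, and expansion can just as easily lower precision, which is why both measures are reported together.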
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Taduri, Siddharth S
|Stanford University, Civil & Environmental Engineering Department
|Law, K. H. (Kincho H.)
|Statement of responsibility
|Submitted to the Department of Civil and Environmental Engineering.
|Thesis (Engineering)--Stanford University, 2012.
- © 2012 by Siddharth S Taduri
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).