Creating hardware component knowledge bases from PDF datasheets
Abstract/Contents
- Abstract
- Hardware component databases are vital resources in designing electronics. These databases store information about hardware components that allows designers to find the components they need. However, creating detailed hardware databases requires hundreds of thousands of hours of manual data entry. As a result, existing databases are often proprietary and incomplete, and may contain sporadic human data entry errors. Knowledge base construction (KBC) systems help automate the process of creating and populating structured databases and have been applied effectively to many domains. KBC techniques reduce dependence on human input, making it faster, easier, and cheaper to build these databases. This dissertation presents a machine-learning-based approach for creating hardware component databases directly from manufacturers' published component datasheets. Extracting data directly from datasheets is challenging for three reasons. First, the data is relational in nature; accurate interpretation relies on non-local context. Second, datasheets are filled with technical jargon. Third, datasheets are PDFs, a format that decouples visual locality from locality within the document. These challenges illuminate why human input is required, but human input is error-prone, time-consuming, and expensive. Instead of relying solely on human input, this dissertation combines a rich data model, weak supervision, data augmentation, and multi-task learning to offer a more automated alternative. When utilized effectively, these machine-learning techniques create large knowledge bases cheaply and in just days. This dissertation consists of three parts. First, it presents Fonduer, a novel knowledge base construction system for richly formatted data based on a multimodal data model and weak supervision.
It motivates Fonduer by studying the challenging properties of richly formatted data like the PDF datasheets electronics manufacturers use to publish component specifications. These insights lead to developing the building blocks necessary to enable automated information extraction from hardware datasheets. Fonduer is validated across various domains beyond hardware datasheets by creating large knowledge bases in days. Second, this dissertation shows how Fonduer can be used to build hardware component knowledge bases in practice. The multimodal information that Fonduer captures provides signals utilized in training data generation and the augmentation of deep learning models for multi-task learning. An evaluation of this approach on datasheets of three types of components achieves an average quality of 0.77 F1, a quality comparable to existing human-curated knowledge bases. Third, this dissertation demonstrates the utility of Fonduer with end-to-end applications and empirical results applied to real-world use cases. Two end-to-end applications, the enhancement of product catalogs with thumbnail images and the analysis of electrical characteristics, demonstrate that hardware component knowledge bases created in days make hardware component selection easier. Together, these results show three things. First, it is possible to automate the generation of hardware component knowledge bases. Second, these generated knowledge bases can be of higher quality than existing human-curated knowledge bases. Finally, these higher-quality knowledge bases open the door to innovative applications and tools for designing electronics.
Description
| Type of resource | text |
|---|---|
| Form | electronic resource; remote; computer; online resource |
| Extent | 1 online resource. |
| Place | California |
| Place | [Stanford, California] |
| Publisher | [Stanford University] |
| Copyright date | ©2021 |
| Publication date | 2021 |
| Issuance | monographic |
| Language | English |
Creators/Contributors
| Author | Hsiao, Luke Wen-syong |
|---|---|
| Degree supervisor | Levis, Philip |
| Degree supervisor | Winstein, Keith |
| Thesis advisor | Levis, Philip |
| Thesis advisor | Winstein, Keith |
| Thesis advisor | Ré, Christopher |
| Degree committee member | Ré, Christopher |
| Associated with | Stanford University, Department of Electrical Engineering |
Subjects
| Genre | Theses |
|---|---|
| Genre | Text |
Bibliographic information
| Statement of responsibility | Luke Hsiao. |
|---|---|
| Note | Submitted to the Department of Electrical Engineering. |
| Thesis | Thesis (Ph.D.), Stanford University, 2021. |
| Location | https://purl.stanford.edu/sf776wm9525 |
Access conditions
- Copyright
- © 2021 by Luke Wen-syong Hsiao
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).