Multi-camera vision for smart environments

Abstract/Contents

Abstract
Technology is blending into every part of our lives. Rather than having computers intrude on the environment, people prefer them to recede into the background and help automatically, at the right time, on the right task, and in the right way. A smart environment system senses the environment and the people in it, makes deductions, and takes actions when necessary. Sensors are the "eyes" of smart environments; in this dissertation we consider vision sensors. Image and video data contain rich information, yet they are also challenging to interpret. From obtaining raw image data from the camera sensors to achieving the application's goal in a smart environment, we need to consider three main components of the system: the hardware platform, vision analysis of the image data, and high-level reasoning pertaining to the end application. A generic vision problem, e.g., human pose estimation, can be approached under different assumptions. In our approach, the system constraints and the application's high-level objective are considered and often define boundary conditions in the design of vision-based algorithms. A multi-camera setup requires distributed processing at each camera node and information sharing through the camera network; therefore, the computation capacity of the camera nodes and the communication bandwidth define two hard constraints for algorithm design. We first introduce a smart camera architecture with its specific local computation power and wireless communication constraints. We then examine how the problem of human pose detection can be formulated for implementation on this smart camera, and describe the steps taken to achieve real-time operation for an avatar-based gaming application. We then present a human activity analysis technique based on multi-camera fusion. The method defines a hierarchy of coarse- and fine-level activity classes and employs different visual features to detect the pose or activity in each class.
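As a sketch of the coarse-to-fine idea described above (the class names, features, and thresholds below are hypothetical illustrations, not the dissertation's actual design), a hierarchical classifier might first assign a coarse activity class from cheap global features, then dispatch to a fine-grained classifier that uses features specific to that class:

```python
# Hypothetical coarse-to-fine activity hierarchy (illustrative only).
# The coarse class is decided from inexpensive global features; each
# coarse class then applies its own fine classifier and feature set.

def coarse_class(features):
    """Pick a coarse activity class from global silhouette features."""
    # e.g. a tall bounding box suggests an upright posture
    if features["bbox_aspect"] > 1.5:
        return "upright"
    return "horizontal"

FINE_CLASSIFIERS = {
    # Each coarse class owns a fine-level classifier over its own features.
    "upright": lambda f: "walking" if f["motion_energy"] > 0.5 else "standing",
    "horizontal": lambda f: "lying",
}

def classify_activity(features):
    """Return (coarse class, fine activity) for one feature vector."""
    c = coarse_class(features)
    return c, FINE_CLASSIFIERS[c](features)

print(classify_activity({"bbox_aspect": 2.0, "motion_energy": 0.8}))
```

The design choice this illustrates is that expensive, class-specific features only need to be computed once the coarse decision has narrowed the candidate set.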
The camera fusion methods studied in this dissertation include decision fusion and feature fusion. We show the results of experiments in three testbed environments and analyze the performance of the different fusion methods. Although computer vision already involves sophisticated techniques for interpreting image data, it constitutes only the sensing part of the smart environment system, and further application-related high-level reasoning is needed. Modeling such high-level reasoning depends on the specific nature of the problem. We present a case study in this dissertation to demonstrate how the vision processing and high-level reasoning modules can be interfaced to make semantic inferences from observations. The case study aims to recognize objects in a smart home based on observed user interactions. We exploit the relationship between objects and user activities to infer the objects when related activities are observed, and apply a Markov logic network (MLN) to model these relationships. An MLN enables intuitive modeling of relationships while also offering the power of graphical models. We show different ways of constructing the knowledge base in the MLN and provide experimental results.
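To convey the flavor of MLN-style reasoning from activities to objects (the rules, weights, and vocabulary below are invented for this example, not taken from the dissertation), one can score each candidate object label by summing the weights of the weighted rules its grounding satisfies and normalize the exponentiated scores, which is exact inference in a log-linear model over a single unknown object variable:

```python
import math

# Hypothetical weighted rules of the form "activity a implies object o".
# In a Markov logic network these would be first-order formulas; here we
# ground them by hand for a single unknown object variable.
RULES = [
    ("drinking", "cup",    2.0),
    ("drinking", "bottle", 1.5),
    ("reading",  "book",   2.5),
]
OBJECTS = ["cup", "bottle", "book"]

def object_posterior(observed_activities):
    """P(object = o | activities) ∝ exp(sum of weights of satisfied rules)."""
    scores = {}
    for o in OBJECTS:
        w = sum(weight for (a, obj, weight) in RULES
                if obj == o and a in observed_activities)
        scores[o] = math.exp(w)
    z = sum(scores.values())
    return {o: s / z for o, s in scores.items()}

post = object_posterior({"drinking"})
# "cup" receives the highest posterior because its rule carries the largest weight
```

Because rule weights are soft, an object that violates a rule is merely made less probable rather than ruled out, which is the property that makes MLNs robust to noisy activity observations.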

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2011
Issuance monographic
Language English

Creators/Contributors

Associated with Wu, Chen, 1936-
Associated with Stanford University, Department of Electrical Engineering
Primary advisor Aghajan, Hamid K
Primary advisor Van Roy, Benjamin
Thesis advisor Li, Fei Fei, 1976-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Chen Wu.
Note Submitted to the Department of Electrical Engineering.
Thesis Ph.D. Stanford University 2011
Location electronic resource

Access conditions

Copyright
© 2011 by Chen Wu
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
