Deep learning on point clouds for 3D scene understanding

Placeholder Show Content


Point cloud is a commonly used geometric data type with many applications in computer vision, computer graphics and robotics. The availability of inexpensive 3D sensors has made point cloud data widely available and the current interest in self-driving vehicles has highlighted the importance of reliable and efficient point cloud processing. Due to its irregular format, however, current convolutional deep learning methods cannot be directly used with point clouds. Most researchers transform such data to regular 3D voxel grids or collections of images, which renders data unnecessarily voluminous and causes quantization and other issues. In this thesis, we present novel types of neural networks (PointNet and PointNet++) that directly consume point clouds, in ways that respect the permutation invariance of points in the input. Our network provides a unified architecture for applications ranging from object classification and part segmentation to semantic scene parsing, while being efficient and robust against various input perturbations and data corruption. We provide a theoretical analysis of our approach, showing that our network can approximate any set function that is continuous, and explain its robustness. In PointNet++, we further exploit local contexts in point clouds, investigate the challenge of non-uniform sampling density in common 3D scans, and design new layers that learn to adapt to varying sampling densities. The proposed architectures have opened doors to new 3D-centric approaches to scene understanding. We show how we can adapt and apply PointNets to two important perception problems in robotics: 3D object detection and 3D scene flow estimation. In 3D object detection, we propose a new frustum-based detection framework that achieves 3D instance segmentation and 3D amodal box estimation in point clouds. Our model, called Frustum PointNets, benefits from accurate geometry provided by 3D points and is able to canonicalize the learning problem by applying both non-parametric and data-driven geometric transformations on the inputs. Evaluated on large-scale indoor and outdoor datasets, our real-time detector significantly advances state of the art. In scene flow estimation, we propose a new deep network called FlowNet3D that learns to recover 3D motion flow from two frames of point clouds. Compared with previous work that focuses on 2D representations and optimizes for optical flow, our model directly optimizes 3D scene flow and shows great advantages in evaluations on real LiDAR scans. As point clouds are prevalent, our architectures are not restricted to the above two applications or even 3D scene understanding. This thesis concludes with a discussion on other potential application domains and directions for future research.


Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2018; ©2018
Publication date 2018; 2018
Issuance monographic
Language English


Author Qi, Ruizhongtai
Degree supervisor Guibas, Leonidas J
Thesis advisor Guibas, Leonidas J
Thesis advisor Girod, Bernd
Thesis advisor Savarese, Silvio
Degree committee member Girod, Bernd
Degree committee member Savarese, Silvio
Associated with Stanford University, Department of Electrical Engineering.


Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Ruizhongtai Qi.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis Ph.D. Stanford University 2018.
Location electronic resource

Access conditions

© 2018 by Ruizhongtai Qi
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

Loading usage metrics...