Browsing by Author "Li, Xiaolong"
- 3D Deep Learning for Object-Centric Geometric Perception
  Li, Xiaolong (Virginia Tech, 2022-06-30)
  Object-centric geometric perception aims at extracting the geometric attributes of 3D objects. These attributes include the shape, pose, and motion of the target objects, which enable fine-grained object-level understanding for various tasks in graphics, computer vision, and robotics. With the growth of 3D geometry data and 3D deep learning methods, it becomes increasingly feasible to perform such tasks directly on 3D input data. Among different 3D representations, a 3D point cloud is a simple, common, and memory-efficient representation that can be obtained directly from multi-view images, depth scans, or LiDAR range images. Several challenges arise in object-centric geometric perception, such as achieving a fine-grained geometric understanding of common articulated objects with multiple rigid parts, learning disentangled shape and pose representations with fewer labels, and handling dynamic, sequential geometric input in an end-to-end fashion. Here we identify and address these challenges from a 3D deep learning perspective by designing effective and generalizable 3D representations, architectures, and pipelines. We propose the first deep pose estimation method for common articulated objects, built on a novel hierarchical invariant representation. To push the boundary of 6D pose estimation for common rigid objects, we design a simple yet effective self-supervised framework that handles unlabeled partial segmented scans. We further contribute a novel 4D convolutional neural network called PointMotionNet, which learns spatio-temporal features from 3D point cloud sequences. Together, these works advance object-centric geometric perception from a unique 3D deep learning perspective.
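  To make the point cloud representation discussed in the abstract concrete, below is a minimal sketch, assuming PyTorch, of a generic PointNet-style encoder: a shared per-point MLP followed by symmetric max pooling, which consumes a raw point cloud directly and is invariant to point ordering. This is a standard illustrative baseline, not any of the architectures proposed in the dissertation; the layer sizes are arbitrary assumptions.

  ```python
  # Illustrative sketch only: a generic PointNet-style encoder, NOT the
  # dissertation's architectures. Shows how a raw (N, 3) point cloud can
  # be consumed directly by a deep network.
  import torch
  import torch.nn as nn

  class PointEncoder(nn.Module):
      """Shared per-point MLP followed by a symmetric (max) pooling,
      giving a permutation-invariant global feature for the cloud."""
      def __init__(self, out_dim: int = 256):
          super().__init__()
          self.mlp = nn.Sequential(
              nn.Linear(3, 64), nn.ReLU(),
              nn.Linear(64, 128), nn.ReLU(),
              nn.Linear(128, out_dim),
          )

      def forward(self, points: torch.Tensor) -> torch.Tensor:
          # points: (B, N, 3) -> per-point features (B, N, out_dim)
          feats = self.mlp(points)
          # Max over the point dimension: invariant to point ordering.
          return feats.max(dim=1).values  # (B, out_dim)

  cloud = torch.randn(2, 1024, 3)      # two clouds of 1024 points each
  global_feat = PointEncoder()(cloud)  # (2, 256)
  ```

  The max pooling is what makes a point cloud "simple" to learn from: because the aggregation is order-independent, no voxelization or meshing of the raw points is required.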
- PointMotionNet: Point-Wise Motion Learning for Large-Scale LiDAR Point Clouds Sequences
  Wang, Jun; Li, Xiaolong; Sullivan, Alan; Abbott, A. Lynn; Chen, Siheng (IEEE, 2022-06)
  We propose a point-based spatiotemporal pyramid architecture, called PointMotionNet, to learn motion information from a sequence of large-scale 3D LiDAR point clouds. A core component of PointMotionNet is a novel technique for point-based spatiotemporal convolution, which finds point correspondences across time by leveraging a time-invariant spatial neighboring space, and extracts spatiotemporal features. To validate PointMotionNet, we consider two motion-related tasks: point-based motion prediction and multisweep semantic segmentation. For each task, we design an end-to-end system in which PointMotionNet is the core module that learns motion information. We conduct extensive experiments and show that i) for point-based motion prediction, PointMotionNet achieves less than 0.5 m mean squared error on the Argoverse dataset, a significant improvement over existing methods; and ii) for multisweep semantic segmentation, PointMotionNet with a pretrained segmentation backbone outperforms the previous state of the art by over 3.3% mIoU on the SemanticKITTI dataset with 25 classes, including 6 moving-object classes.
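  The abstract does not give implementation details, but the core idea it names, gathering neighbors for each point at time t from the previous frame inside a fixed ("time-invariant") spatial radius and aggregating their features, can be sketched roughly. The PyTorch sketch below is a hedged approximation of that idea: the class name, shapes, radius, and MLP are all illustrative assumptions, not the published PointMotionNet code.

  ```python
  # Hedged sketch of point-based spatiotemporal gathering: for each
  # query point at time t, neighbors are taken from frame t-1 inside a
  # fixed spatial radius, and their features (plus relative offsets,
  # which carry local motion cues) are max-pooled. Illustrative only.
  import torch
  import torch.nn as nn

  class SpatioTemporalGather(nn.Module):
      def __init__(self, in_dim: int = 32, out_dim: int = 64, radius: float = 1.0):
          super().__init__()
          self.radius = radius  # same spatial ball at every time step
          self.mlp = nn.Sequential(nn.Linear(in_dim + 3, out_dim), nn.ReLU())

      def forward(self, xyz_t, feat_prev, xyz_prev):
          # xyz_t: (N, 3) points at time t; xyz_prev: (M, 3) and
          # feat_prev: (M, in_dim) from frame t-1.
          dist = torch.cdist(xyz_t, xyz_prev)        # (N, M) pairwise distances
          mask = dist <= self.radius                 # neighbors within the radius
          # Relative offsets let the MLP reason about local displacement.
          offsets = xyz_prev.unsqueeze(0) - xyz_t.unsqueeze(1)  # (N, M, 3)
          h = self.mlp(torch.cat(
              [feat_prev.unsqueeze(0).expand(xyz_t.size(0), -1, -1), offsets],
              dim=-1))                               # (N, M, out_dim)
          # Exclude out-of-radius points, then max-pool over neighbors.
          h = h.masked_fill(~mask.unsqueeze(-1), float('-inf'))
          pooled = h.max(dim=1).values               # (N, out_dim)
          # Points with no neighbor inside the radius get zeros.
          return torch.where(torch.isinf(pooled), torch.zeros_like(pooled), pooled)

  xyz_t, xyz_prev = torch.randn(100, 3), torch.randn(120, 3)
  feat_prev = torch.randn(120, 32)
  out = SpatioTemporalGather()(xyz_t, feat_prev, xyz_prev)  # (100, 64)
  ```

  Because the neighborhood is defined purely by spatial distance, the same gathering rule applies at every time step, which is one plausible reading of the "time-invariant spatial neighboring space" the abstract describes.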