Resolving 3D Human Pose Ambiguities with 3D Scene Constraints
M. Hassan, V. Choutas, D. Tzionas and M. J. Black
International Conference on Computer Vision (ICCV) 2019, Seoul, Korea
Abstract
To understand and analyze human behavior, we need to capture humans moving in, and interacting with, the world. Most existing methods perform 3D human pose estimation without explicitly considering the scene. We observe, however, that the world constrains the body and vice versa. To motivate this, we show that current 3D human pose estimation methods produce results that are not consistent with the 3D scene. Our key contribution is to exploit static 3D scene structure to better estimate human pose from monocular images. The method enforces Proximal Relationships with Object eXclusion and is called PROX. To test this, we collect a new dataset composed of 12 different 3D scenes and RGB sequences of 20 subjects moving in and interacting with the scenes. We represent human pose using the 3D human body model SMPL-X and extend SMPLify-X to estimate body pose using scene constraints. We make use of the 3D scene information by formulating two main constraints. The interpenetration constraint penalizes intersection between the body model and the surrounding 3D scene. The contact constraint encourages specific parts of the body to be in contact with scene surfaces if they are close enough in distance and orientation. For quantitative evaluation, we capture a separate dataset with 180 RGB frames in which the ground-truth body pose is estimated using a motion-capture system. We show quantitatively that introducing scene constraints significantly reduces 3D joint error and vertex error. Our code and data are available for research at https://prox.is.tue.mpg.de.
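To make the two scene terms concrete, below is a minimal, hypothetical Python/NumPy sketch of an interpenetration penalty and a contact penalty built on a precomputed signed distance field (SDF) of the static scene. This is only an illustration of the idea, not the PROX implementation: the SDF lookup, the contact threshold, and all function names are assumptions.

import numpy as np

def query_scene_sdf(points, sdf_grid, grid_min, grid_max):
    # Nearest-neighbor lookup in a precomputed scene SDF (trilinear
    # interpolation omitted for brevity). Negative values mean a point
    # lies inside scene geometry.
    res = np.array(sdf_grid.shape)
    idx = (points - grid_min) / (grid_max - grid_min) * (res - 1)
    idx = np.clip(np.round(idx).astype(int), 0, res - 1)
    return sdf_grid[idx[:, 0], idx[:, 1], idx[:, 2]]

def interpenetration_penalty(body_vertices, sdf_grid, grid_min, grid_max):
    # Penalize body vertices that end up inside the scene (negative SDF).
    d = query_scene_sdf(body_vertices, sdf_grid, grid_min, grid_max)
    return np.sum(np.minimum(d, 0.0) ** 2)

def contact_penalty(contact_vertices, sdf_grid, grid_min, grid_max, thresh=0.05):
    # Encourage candidate contact vertices that are already within `thresh`
    # meters of a scene surface to move onto it.
    d = query_scene_sdf(contact_vertices, sdf_grid, grid_min, grid_max)
    near = np.abs(d) < thresh
    return np.sum(np.abs(d[near]))

In an optimization such as the extended SMPLify-X objective described above, penalties of this kind would be added to the image-based terms with suitable weights.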
Downloads
Please register and accept the License agreement on this website in order to get access to the PROX dataset.
The downloads section provides the two PROX datasets:
- Quantitative PROX dataset: 180 static RGB-D frames with Ground Truth. The dataset captures static RGB-D frames of 1 subject in 1 scene and is described in Section 4.2 of the PROX paper.
- Qualitative PROX dataset: 100K RGB-D frames with pseudo Ground Truth. The dataset captures dynamic RGB-D sequences of 20 subjects in 12 scenes and is described in Section 4.1.2 of the PROX paper.
Each PROX dataset includes:
- Preview Videos of Recordings
- RGB-D Recordings
- 3D Scene Scans
- Camera Calibration
- Camera-to-World Transformations
- SMPL-X fittings (see the loading sketch after this list)
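As a rough illustration of how these pieces could fit together, the sketch below loads one SMPL-X fitting, poses the body with the smplx Python package, and maps the vertices into the coordinate frame of the 3D scene scan using the camera-to-world transformation. The file paths, the pickle keys, and the assumption that the transform is stored as a single 4x4 matrix are illustrative guesses about the released layout, not documented behavior; please consult the dataset documentation for the actual structure.

import json
import pickle

import numpy as np
import torch
import smplx  # https://github.com/vchoutas/smplx

# Placeholder paths; the directory layout is an assumption.
fitting_pkl = "path/to/smplx_fitting.pkl"
cam2world_json = "path/to/cam2world.json"
model_dir = "path/to/smplx_models"

with open(fitting_pkl, "rb") as f:
    # Assumed keys: 'betas', 'global_orient', 'body_pose', 'transl'.
    params = pickle.load(f)

def to_tensor(x):
    return torch.tensor(np.asarray(x), dtype=torch.float32).reshape(1, -1)

model = smplx.create(model_dir, model_type="smplx", gender="neutral")
output = model(betas=to_tensor(params["betas"]),
               global_orient=to_tensor(params["global_orient"]),
               body_pose=to_tensor(params["body_pose"]),
               transl=to_tensor(params["transl"]))
verts_cam = output.vertices.detach().numpy()[0]  # body vertices in camera coordinates

# Map the body into the scene scan's coordinate frame
# (assuming the file holds one 4x4 homogeneous matrix).
with open(cam2world_json) as f:
    cam2world = np.array(json.load(f), dtype=np.float64)
verts_world = verts_cam @ cam2world[:3, :3].T + cam2world[:3, 3]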
Referencing PROX
@inproceedings{PROX:2019,
  title = {Resolving {3D} Human Pose Ambiguities with {3D} Scene Constraints},
  author = {Hassan, Mohamed and Choutas, Vasileios and Tzionas, Dimitrios and Black, Michael J.},
  booktitle = {International Conference on Computer Vision},
  pages = {2282--2292},
  month = oct,
  year = {2019},
  url = {https://prox.is.tue.mpg.de},
  month_numeric = {10}
}
Contact
For questions, please contact prox@tue.mpg.de.
For commercial licensing, please contact ps-licensing@tue.mpg.de.