Frankenhorse: Automatic Completion of Articulating Objects from Image-based Reconstruction

Abstract

Reconstructing scene geometry and scene semantics are important problems in vision, and the two are increasingly tackled together. The state of the art in Structure from Motion and Multi-View Stereo (SfM+MVS) can already create accurate, dense reconstructions of scenes, and systems such as CMPMVS [2] are freely available and produce impressive results automatically. However, when their assumptions break down or there is insufficient data, noise, extraneous geometry and holes appear in the reconstruction. We propose to address these problems by introducing prior knowledge, focusing on the difficult class of articulating objects, such as people and animals. Prior modelling of these classes is difficult because of the articulation and the large intra-class variation.

We propose an automatic method for completion which relies neither on a prior model of the deformation nor on training data captured under controlled conditions. Instead, given a set of far-from-perfect reconstructions, we simultaneously complete each one using the well-reconstructed parts of the others. This is enabled by the data-driven piecewise-rigid 3D model alignment method of Chang and Zwicker [1], which estimates local coordinate frames on the meshes and proposes correspondences by matching local descriptors. Each correspondence determines a rigid alignment, and these alignments serve as labels in a graph labelling problem whose solution is a piecewise-rigid alignment that brings the meshes into correspondence while penalising the stretching of edges.

Our main contributions are as follows. We present a novel, fully automatic method for the completion of noisy, real SfM+MVS reconstructions which (1) exploits a set of noisy reconstructions of objects of the class, rather than relying on a large, clean training set that is expensive to collect, (2) handles the articulation structure of the class, allowing larger holes to be filled with greater accuracy than a generic smoothness prior, and (3) is exemplar-based, preserving details that may be smoothed out by related learning-based approaches.

Our method takes as input sets of images of scenes, each containing an object of a specific class. Each input image set initially yields an incomplete and cluttered reconstruction of the whole scene; the output is a completed model of the object, created using the other reconstructions. The method consists of a pipeline of several stages, visualised in Figure 1. In the first stage, each scene is reconstructed using an SfM+MVS pipeline [2]. In the second stage, we segment the object from the scene by combining object detections in the images. In the third stage, we align each of the segmented object reconstructions to the others using the piecewise-rigid alignment described above.
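The second stage, segmenting the object by combining per-image detections, can be pictured as a simple voting scheme: project every reconstructed 3D point into each image and keep the points that fall inside a detection box in a sufficient fraction of the views. The Python sketch below is illustrative only; the point format, the 3x4 projection-matrix camera model and the min_support threshold are assumptions, not the paper's exact procedure.

import numpy as np

def segment_by_detections(points, cameras, boxes, min_support=0.5):
    """Keep 3D points that project inside a 2D detection box in enough views.

    points  : (N, 3) array of reconstructed 3D points.
    cameras : list of 3x4 projection matrices, one per image.
    boxes   : list of detection boxes (x0, y0, x1, y1), or None when the
              detector found nothing in that image.
    Returns a boolean mask over the N points (illustrative sketch only).
    """
    points_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous
    votes = np.zeros(len(points))
    views = 0
    for P, box in zip(cameras, boxes):
        if box is None:
            continue
        views += 1
        proj = points_h @ P.T                # (N, 3) homogeneous image coords
        uv = proj[:, :2] / proj[:, 2:3]      # perspective divide
        x0, y0, x1, y1 = box
        inside = ((uv[:, 0] >= x0) & (uv[:, 0] <= x1) &
                  (uv[:, 1] >= y0) & (uv[:, 1] <= y1) &
                  (proj[:, 2] > 0))          # in front of the camera
        votes += inside
    return votes >= min_support * max(views, 1)

The same voting idea applies if the segmentation operates on mesh faces rather than points.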
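The third stage, casting piecewise-rigid alignment as graph labelling in the manner of Chang and Zwicker [1], can be summarised by an energy over per-vertex labels: each label is a candidate rigid transform proposed by a descriptor correspondence, the data term measures how close each transformed vertex lies to the target reconstruction, and the pairwise term penalises stretching of mesh edges whose endpoints take different transforms. The sketch below only evaluates such an energy (the weight lam and the nearest-neighbour data term are assumptions); the original method minimises an energy of this kind with discrete optimisation, which is not reproduced here.

import numpy as np
from scipy.spatial import cKDTree

def apply_rigid(T, x):
    """Apply a 4x4 rigid transform to one or more 3D points."""
    return x @ T[:3, :3].T + T[:3, 3]

def labelling_energy(labels, verts, edges, transforms, target_pts, lam=1.0):
    """E(L) = sum_i data(i, L_i) + lam * sum_{(i,j)} stretch(i, j, L_i, L_j)."""
    tree = cKDTree(target_pts)
    # Data term: squared distance from each transformed source vertex to the
    # nearest point of the target reconstruction.
    moved = np.array([apply_rigid(transforms[l], verts[i])
                      for i, l in enumerate(labels)])
    d, _ = tree.query(moved)
    data = np.sum(d ** 2)
    # Pairwise term: how much each mesh edge is stretched when its endpoints
    # move under (possibly different) rigid transforms.
    smooth = 0.0
    for i, j in edges:
        rest = np.linalg.norm(verts[i] - verts[j])
        now = np.linalg.norm(moved[i] - moved[j])
        smooth += (now - rest) ** 2
    return data + lam * smooth

A full solver would search over label assignments, for example with move-making optimisation; evaluating the energy as above is enough to see the roles of the data and edge-stretching terms.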
