[1912.06126v1] Deep Structured Implicit Functions
We show that this DSIF representation improves both reconstruction accuracy and generalization behavior over previous work: its F-score results are better than the state-of-the-art [21] by 10.3 points for 3D autoencoding of test models from trained classes and by 17.8 points for unseen classes.

Abstract The goal of this project is to learn a 3D shape representation that enables accurate surface reconstruction, compact storage, efficient computation, consistency for similar shapes, generalization across diverse shape categories, and inference from depth camera observations. Towards this end, we introduce Deep Structured Implicit Functions (DSIF), a 3D shape representation that decomposes space into a structured set of local deep implicit functions. We provide networks that infer the space decomposition and local deep implicit functions from a 3D mesh or posed depth image. In experiments, we find that it provides 10.3 points higher surface reconstruction accuracy (F-score) than the state-of-the-art (OccNet), while requiring fewer than 1% of the network parameters. Experiments on posed depth image completion and generalization to unseen classes show 15.8 and 17.8 point improvements over the state-of-the-art, while producing a structured 3D representation for each input with consistency across diverse shape collections.
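The composition described above, local deep implicit functions modulated by Gaussian regions of support and summed over shape elements, can be sketched roughly as follows. This is a minimal NumPy sketch: the isotropic Gaussian, the fixed random toy decoder, and all parameter values are illustrative stand-ins for the paper's anisotropic Gaussians and learned TinyOccNet decoder.

```python
import numpy as np

def gaussian_support(x, center, radius):
    # Local Gaussian region of support g(x, theta_i). The paper uses
    # scaled, rotated anisotropic Gaussians; this isotropic form is a
    # simplification for illustration.
    d2 = np.sum((x - center) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * radius ** 2))

def dsif(x, thetas, zs, decoder):
    """Evaluate DSIF(x) = sum_i g(x, theta_i) * f_i(x, z_i)."""
    total = np.zeros(len(x))
    for theta, z in zip(thetas, zs):
        center, radius = theta
        g = gaussian_support(x, center, radius)
        f = decoder(x, z)  # stand-in for the TinyOccNet f_i(x, z_i)
        total += g * f
    return total

# Toy decoder: a fixed random two-layer MLP standing in for TinyOccNet;
# it conditions on a scalar latent z (the paper uses a latent vector z_i).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16,))
def toy_decoder(x, z):
    h = np.tanh(np.concatenate([x, np.full((len(x), 1), z)], axis=1) @ W1)
    return h @ W2

# Two shape elements; five query points. Inside/outside classification
# would threshold the summed value DSIF(x).
thetas = [(np.zeros(3), 0.5), (np.ones(3), 0.5)]
zs = [0.1, -0.2]
queries = rng.normal(size=(5, 3))
values = dsif(queries, thetas, zs, toy_decoder)
print(values.shape)  # (5,)
```

Because each element's contribution decays with distance from its Gaussian center, each TinyOccNet only has to model the surface locally, which is what allows the per-element decoders to be so small.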
Figure 1. Network architecture. Our system takes one or more posed depth images as input and outputs a DSIF function that can be used to classify inside/outside for any query point x. It starts with a SIF encoder that extracts a set of overlapping shape elements, each defined by a local Gaussian region of support parameterized by θi. It then extracts sample points/normals from the depth images and passes them through a PointNet encoder for each shape element to produce a latent vector zi. A TinyOccNet decodes each zi into an implicit function fi(x, zi), which is combined with the local Gaussian function g(x, θi) and summed with the other shape elements to produce the output function DSIF(x). (Related Work)

Figure 3. Representation efficiency. F-score vs. model complexity. Curves show varying M for constant N; other methods are marked as points. Top: F-score vs. number of decoder parameters. The N = 32, M = 32 configuration (large dot) reaches >90% F-score with <1% of the parameters of OccNet, and is used as the benchmark configuration in this paper. Bottom: F-score vs. shape vector dimension (|Θ| + |Z| for DSIF). DSIF achieves reconstruction accuracy similar to OccNet at the same dimensionality, and can use additional dimensions to further improve accuracy. (Experimental Evaluation)

Figure 4. Representation consistency. Example shape decompositions produced by our model trained multi-class on 3D-R2N2. Shape elements are depicted by their support ellipsoids and colored consistently by index. Note that the shape element shown in brown represents the right-front leg of the chairs, tables, desks, and sofas, as well as the front-right wheel of the cars. (Experimental Evaluation)

Figure 6. Human body modeling. Surface reconstructions and decompositions for 4 random SMPL [5] human meshes from the SURREAL [37] dataset. For each triple, from left to right: SMPL mesh, our reconstruction, our shape decomposition. These results demonstrate unsupervised correspondence between people in different poses as well as accurate reconstruction of organic shapes. (Experimental Evaluation)

Figure 2. Autoencoder examples. F-scores for the test set (8746 shapes) are shown ordered by the DSIF F-score, with examples marked at their positions on the curve. Our reconstructions (blue curve) are the most accurate for 93% of shapes (exact scores shown faded). The scores of OccNet and SIF follow roughly the same curve as DSIF (rolling means shown bold), indicating that shapes are similarly difficult for all methods. Solid shapes such as the rifle are relatively easy to represent, while shapes with irregular, thin structures such as the lamp are more difficult. (Experimental Evaluation)

Figure 5. Generalization examples. Example shape reconstructions for the piano, printer, and camera classes, which did not appear in the training data. F-scores are plotted below, ordered by DSIF score, as in Figure 2. Our method (blue curve) achieves the best accuracy on 91% of the novel shapes. (Experimental Evaluation)

Figure 7. Depth completion examples. Visualizations of surfaces predicted from posed depth images (depicted by green points). Our method provides better detail in both the observed and unobserved parts of the shape. (3D Completion from a Single Depth Image)

Figure 8. Surface reconstruction from partial human scans. (Reconstruction of Partial Human Body Scans)
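The F-score used in the figures above is the standard point-to-point surface reconstruction metric: sample points from the predicted and ground-truth surfaces, measure the fraction of each set falling within a distance threshold of the other, and take the harmonic mean. A minimal sketch follows; the brute-force nearest-neighbor search, the sampling density, and the threshold value τ are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def f_score(pred_pts, gt_pts, tau):
    # Pairwise distances between the two sampled point sets
    # (brute force; fine for small clouds).
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    precision = np.mean(d.min(axis=1) <= tau)  # predicted points near GT
    recall = np.mean(d.min(axis=0) <= tau)     # GT points near prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pts = np.random.default_rng(1).uniform(size=(100, 3))
print(f_score(pts, pts, tau=0.01))  # identical clouds -> 1.0
```

Because it balances precision and recall, the F-score penalizes both spurious surface (low precision) and missing surface (low recall), which is why thin structures like the lamp in Figure 2 score poorly for all methods.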