In this post, I review an interesting paper, ‘Learning Semantic Deformation Flows with 3D Convolutional Networks’ by Yumer and Mitra.
- Overview: A CNN is trained to take a voxelized representation of a given shape (e.g., a car) and a semantic deformation intention (e.g., make more sporty) as inputs, and to generate a deformation flow as the output. The deformation flow is applied to the input voxel shape via a Free Form Deformation (FFD) module to produce the desired shape.
- Baseline: Directly regress the output voxel shape instead of computing a deformation flow.
- Result: Learning the deformation flow yields roughly 70% less error than direct volumetric generation.
- Deformation Flow Computation: FFD embeds a shape in a lattice space and enables the embedded shape to be deformed using the FFD lattice vertices, which act as control points in the local volumetric deformation coordinate system. The FFD lattice vertices are defined at the voxel centers of the last layer of the CNN, since the prediction is per voxel.
- Ablations: A U-Net-style architecture with skip connections between the encoder and decoder performs better than the same network trained without skip connections.
- Input types: The networks are trained and tested on three data types: meshes, point sets, and depth scans.
- Datasets: ShapeNet and SemEd, covering cars, shoes, chairs, and airplanes.
- A separate network is trained for each object class.
- Comparison Methods: Yumer et al. (2015), semantic shape editing using deformation handles.
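
To make the deformation-flow idea in the notes above concrete, here is a minimal sketch of how a per-voxel flow could be applied to a point set. It is not the paper's implementation: the function names (`make_flow`, `sample_flow`, `deform_points`) are made up for illustration, and trilinear interpolation between voxel centers is assumed as a simple stand-in for the FFD basis (classic FFD uses Bernstein polynomials over the lattice). Points are given in voxel-index coordinates, so voxel center (i, j, k) sits at position (i, j, k).

```python
from math import floor


def make_flow(n, vec=(0.0, 0.0, 0.0)):
    """Build an n x n x n grid holding one displacement vector per voxel center."""
    return [[[tuple(vec) for _ in range(n)] for _ in range(n)] for _ in range(n)]


def sample_flow(flow, p):
    """Trilinearly interpolate the flow at point p (assumed interpolation scheme).

    The voxel centers act as the lattice control points; p is clamped to the
    lattice so points outside the grid take the nearest boundary value.
    Requires a grid of at least 2 voxels per axis.
    """
    n = len(flow)
    base, frac = [], []
    for c in p:
        c = min(max(c, 0.0), n - 1.0)      # clamp to the lattice extent
        i = min(int(floor(c)), n - 2)      # lower corner of the enclosing cell
        base.append(i)
        frac.append(c - i)                 # position inside the cell, in [0, 1]

    out = [0.0, 0.0, 0.0]
    for dx in (0, 1):
        wx = frac[0] if dx else 1.0 - frac[0]
        for dy in (0, 1):
            wy = frac[1] if dy else 1.0 - frac[1]
            for dz in (0, 1):
                wz = frac[2] if dz else 1.0 - frac[2]
                w = wx * wy * wz           # weight of this cell corner
                v = flow[base[0] + dx][base[1] + dy][base[2] + dz]
                for a in range(3):
                    out[a] += w * v[a]
    return tuple(out)


def deform_points(points, flow):
    """Move each point by the flow interpolated at its location."""
    deformed = []
    for p in points:
        d = sample_flow(flow, p)
        deformed.append(tuple(p[a] + d[a] for a in range(3)))
    return deformed


# Toy example: a 2x2x2 lattice where only one control point is displaced.
flow = make_flow(2)
flow[1][0][0] = (2.0, 0.0, 0.0)
moved = deform_points([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)], flow)
```

In the toy example, the point halfway between the two control points receives half of the displacement, which is the local, smooth behavior that makes a flow field easier to learn than regressing the full output volume directly.
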
Yumer, M.E., Mitra, N.J.: Learning semantic deformation flows with 3D convolutional networks. In: European Conference on Computer Vision (ECCV). Springer, Cham (2016)
Yumer, M.E., Chaudhuri, S., Hodgins, J.K., Kara, L.B.: Semantic shape editing using deformation handles. ACM Transactions on Graphics (TOG) 34(4), 86 (2015)