The process of registering each video sequence both spatially and temporally is a prerequisite for studying facial dynamics across individuals. In this task we aim to extract a relevant facial representation, namely face geometry information, from raw images acquired by standard video cameras. To accomplish this, four main research topics will be addressed: 1- Face geometry modeling using Active Appearance Models (AAM); 2- Improving AAM fitting w.r.t. occlusion and light variability; 3- Accurate 3D facial reconstruction from multiple views using the AAM; 4- Quasi-dense 3D facial reconstruction.

In this sub-task we explore the use of AAMs and several proposed improvements in order to create a robust and accurate face geometry model. AAMs are generative, nonlinear, parametric models of shape and texture, commonly used to model faces. These adaptive template-matching methods capture the variability of shape and texture present in a representative training set and are able to fully describe both the shape (2D control point locations) and the texture (appearance) of the trained faces, as well as of unseen faces. Several enhancements to the traditional AAM have been proposed, the most relevant being fitting formulations based on Inverse Compositional Image Alignment. Using a true analytic gradient (as in the Simultaneous Inverse Compositional or the Project-Out formulations) rather than a pre-computed numerical estimate leads to more efficient and accurate fitting. Another variant, the combined 2D+3D AAM, was also proposed: a 3D AAM (built with structure from motion) constrains a 2D AAM, allowing the 3D shape of a face to be recovered from a single image and yielding a more stable model that is less prone to local minima. We will explore the use of the Simultaneous Inverse Compositional and Project-Out formulations, as well as the 2D+3D AAM, to build a sparse 3D model of the face.
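To make the role of the Project-Out formulation concrete, the following is a minimal numpy sketch of one parameter-update step, assuming the steepest-descent images and the orthonormal appearance basis have already been computed offline; the function and variable names are illustrative, not the project's implementation.

import numpy as np

def project_out_update(template, A, warped_image, sd_images):
    """One Gauss-Newton step of a project-out inverse-compositional fit.
    template     : (N,)  mean appearance in the model frame
    A            : (N,m) orthonormal appearance basis (columns)
    warped_image : (N,)  target image warped into the model frame with the
                         current shape parameters
    sd_images    : (N,n) steepest-descent images, grad(template) * dW/dp
    Returns the shape-parameter increment delta_p of length n."""
    # Project the steepest-descent images out of the appearance subspace,
    # so appearance variation does not have to be solved for explicitly.
    sd_po = sd_images - A @ (A.T @ sd_images)
    # Gauss-Newton Hessian (in practice precomputed once, outside the loop).
    H = sd_po.T @ sd_po
    # Residual between the warped target and the mean appearance.
    error = warped_image - template
    return np.linalg.solve(H, sd_po.T @ error)

# Toy usage with synthetic data; a real fit needs a trained shape/appearance
# model and a piecewise-affine warp to produce warped_image and sd_images.
rng = np.random.default_rng(0)
N, m, n = 1000, 5, 8
template = rng.standard_normal(N)
A, _ = np.linalg.qr(rng.standard_normal((N, m)))
sd_images = rng.standard_normal((N, n))
warped_image = template + 0.01 * rng.standard_normal(N)
delta_p = project_out_update(template, A, warped_image, sd_images)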
In this sub-task we will explore several approaches to improving the morphable face model with respect to partial occlusion and light variation. One of the major limitations of Active Appearance Models is their lack of robustness to partial occlusion: a small amount of occlusion is sufficient to make the AAM search diverge. Standard AAM fitting minimizes the L2 norm (SSD) between the model and the (warped) target image; replacing it with an appropriate robust norm has been shown to make the AAM more robust to missing data and outliers. This approach, together with illumination-invariant texture models, will also be subject to further research.
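As an illustration of how a robust norm can replace the SSD, the sketch below weights each pixel's residual with a Huber-type influence function, in the spirit of iteratively reweighted least squares; the Huber choice and all names are assumptions for illustration, not the specific robust norm the project will adopt.

import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber influence weights: 1 inside the inlier band, k/|r| outside.
    Large residuals (e.g. occluded pixels) are down-weighted."""
    a = np.abs(residuals)
    w = np.ones_like(a)
    outliers = a > k
    w[outliers] = k / a[outliers]
    return w

def robust_update(template, warped_image, sd_images, k=1.345):
    """One robust Gauss-Newton step: the SSD of the standard AAM fit is
    replaced by a weighted SSD, so occluded pixels contribute less."""
    error = warped_image - template
    w = huber_weights(error, k)
    sd_w = sd_images * w[:, None]        # weight each pixel's Jacobian row
    H = sd_w.T @ sd_images               # weighted Gauss-Newton Hessian
    return np.linalg.solve(H, sd_w.T @ error)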
For the purpose of our research, a 3D facial reconstruction is required. Instead of using dense 3D range models or dense stereo-based reconstruction, we propose to fit a single 2D+3D AAM to multiple images captured simultaneously by at least three uncalibrated cameras positioned arbitrarily in front of the individual's face. Multiple-view AAM fitting provides an accurate sparse 3D facial reconstruction that is less prone to self-occlusion.
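The structure of such a multi-view fit can be sketched as a single cost that sums, over the views, an appearance term and a 2D/3D consistency term, with one 3D shape shared by all views and a separate camera per view. The code below is only a sketch of that cost under simplifying assumptions (scaled-orthographic cameras, precomputed appearance residuals); it is not the project's exact formulation.

import numpy as np

def multiview_cost(appearance_errors, shapes_2d, shape_3d, cameras, K=1.0):
    """Combined cost for fitting one 2D+3D AAM to several synchronized views.
    appearance_errors : list of (N_v,) appearance residuals, one per view
    shapes_2d         : list of (L,2) 2D AAM landmark shapes, one per view
    shape_3d          : (L,3) single 3D shape shared by all views
    cameras           : list of (2,4) scaled-orthographic projection matrices
    K                 : weight of the 2D/3D consistency constraint"""
    cost = 0.0
    homog = np.hstack([shape_3d, np.ones((shape_3d.shape[0], 1))])  # (L,4)
    for err, s2d, P in zip(appearance_errors, shapes_2d, cameras):
        cost += np.sum(err ** 2)                    # appearance term in this view
        projected = homog @ P.T                     # 3D shape projected into this view
        cost += K * np.sum((s2d - projected) ** 2)  # 2D/3D consistency term
    return cost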
In order to robustly characterize facial dynamics using the sparse reconstruction obtained with the 2D+3D AAM, the discretization of the face must be fine enough to accurately capture the individual's facial dynamics. To accomplish this, and since the facial reconstruction is based on an AAM, it is necessary to increase the density of the reconstruction by raising the number of vertices through densification approaches.
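One simple densification strategy, shown below purely as an illustrative placeholder for the approaches to be investigated, is midpoint subdivision of the sparse AAM mesh: each triangle is split into four by inserting new vertices at edge midpoints (texture and correspondence interpolation are omitted here).

import numpy as np

def subdivide(vertices, triangles):
    """One midpoint-subdivision pass over a triangle mesh.
    vertices  : (V,3) 3D vertex positions
    triangles : (T,3) integer vertex indices
    Returns the densified vertices and the 4*T new triangles."""
    verts = [tuple(v) for v in vertices]
    midpoint_index = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            verts.append(tuple((vertices[i] + vertices[j]) / 2.0))
            midpoint_index[key] = len(verts) - 1
        return midpoint_index[key]

    new_tris = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_tris += [[a, ab, ca], [ab, b, bc], [ca, bc, c], [ab, bc, ca]]
    return np.array(verts), np.array(new_tris)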
Summarizing, this task aims at quasi-dense 3D facial reconstruction from multiple-view images using a single 2D+3D AAM. It will also devote further investigation to robust solutions for AAM fitting under occlusion and to illumination-invariant texture models.