Introduction

Registering images of a deforming surface is important for tasks such as video augmentation by texture editing, deformation capture and non-rigid Structure-from-Motion. This is a difficult problem since the appearance of imaged surfaces varies due to several phenomena such as camera pose, surface deformation, lighting and motion blur. Recovering a 3D surface, its deformations and the camera pose from a monocular video sequence is intrinsically ill-posed. While prior information can be used to disambiguate the problem, see e.g. (29,75,145), it is common to avoid a full 3D model by using image-based deformation models, e.g. (16,24,49,111). The Thin-Plate Spline (TPS) warp is one possible deformation model, proposed in a seminal paper by (24), which has been shown to model a wide variety of image deformations effectively in different contexts. Recent work shows that the TPS warp can be estimated not only with the traditional landmark-based method, but also with direct methods, i.e. by minimizing the intensity discrepancy between registered images (16,111). Other non-rigid warps include Radial Basis Functions (with e.g. multiquadrics (113) or Wendland's functions (72) as kernel function) and Free-Form Deformations (162).

The Gauss-Newton algorithm with additive update of the parameters is usually used to carry out the minimization. Its main drawback is that the Hessian matrix must be recomputed and inverted at each iteration. More efficient solutions based on compositional updating of the parameters have been proposed by (9); they may lead to a constant Hessian matrix. However, most non-rigid warps do not form groups, which prevents the use of compositional algorithms, since these require one to compose and possibly invert the warps. Despite several attempts to relax the groupwise assumption through various approximations (158,75,123), there is no simple solution in the literature.
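To make the cost of the additive update concrete, here is a minimal sketch of Gauss-Newton on a toy 1D registration problem. Everything in it is an assumption made for illustration: the warp is a pure translation `x + u`, the signals are analytic Gaussians (so no interpolation is needed), and the function names are hypothetical. It is not the algorithm of any cited paper; it only shows that the Jacobian, and hence the approximate Hessian, depends on the current parameters and is rebuilt at every iteration.

```python
import numpy as np

def gauss_newton_shift(template, image, d_image, x, u0=0.0, n_iter=20, tol=1e-10):
    """Estimate a 1D shift u minimizing sum_x (image(x + u) - template(x))^2.

    Toy additive Gauss-Newton: `template`, `image` and `d_image` (the
    derivative of `image`) are hypothetical callables used for illustration.
    """
    u = float(u0)
    for _ in range(n_iter):
        r = image(x + u) - template(x)       # intensity residuals at the current u
        J = d_image(x + u).reshape(-1, 1)    # Jacobian w.r.t. u: depends on u
        H = J.T @ J                          # Gauss-Newton Hessian approximation,
                                             # recomputed at every iteration
        delta = -np.linalg.solve(H, J.T @ r)
        u += float(delta[0])                 # additive update of the parameter
        if abs(delta[0]) < tol:
            break
    return u

# Toy data: the "image" is the template shifted by 0.3.
x = np.linspace(-3.0, 3.0, 61)
template = lambda s: np.exp(-s**2)
image = lambda s: np.exp(-(s - 0.3)**2)
d_image = lambda s: -2.0 * (s - 0.3) * np.exp(-(s - 0.3)**2)

u_hat = gauss_newton_shift(template, image, d_image, x)
```

In a compositional scheme, by contrast, the Jacobian can be evaluated at a fixed reference configuration, which is what allows the Hessian to stay constant when the warps can be composed and inverted.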

This paper extends an earlier conference version (76). With respect to the literature, it makes several contributions:

We experimentally show that Feature-Driven algorithms are clearly more efficient without loss of accuracy compared to previous state-of-the-art methods. The combination of the Feature-Driven framework with Learning-based local registration outperforms other algorithms for most experimental setups.

Roadmap.

Previous work is reviewed in section A.2. In particular, previous attempts at extending compositional algorithms to non-groupwise warps are presented in section A.2.2. The Feature-Driven framework and the associated operations are explained in section A.3. Registration with the Feature-Driven Inverse Compositional and the Feature-Driven Learning-based algorithms is described in section A.4. The Feature-Driven parametrizations of the TPS and FFD warps are detailed in section A.5. Experimental results on simulated and real data are reported in section A.6. Conclusions and further work are discussed in section A.7. Details on the piecewise linear relationship we used for the Learning-based local registration step are given in appendix A.7.

Notation.

Scalars are in italics ($x$), vectors in bold ($\mathbf{v}$), matrices in sans-serif ($\mathsf{M}$) and sets (or collections) in fraktur ($\mathfrak{C}$). Vectors are always considered as column vectors. The inverse of a matrix $\mathsf{M}$ is written $\mathsf{M}^{-1}$, the pseudo-inverse $\mathsf{M}^\dagger$ and the transpose $\mathsf{M}^\mathsf{T}$. The symbol $\mathbb{R}$ denotes the set of real numbers. The identity matrix of size $n$ is denoted $\mathsf{I}_n$. The notations $\mathbf{0}_{m \times n}$ and $\mathbf{1}_{m \times n}$ correspond to the matrices of size $m \times n$ filled with zeros and ones respectively. The operator that vectorizes a matrix is denoted $\boldsymbol{\nu}$, i.e. $\boldsymbol{\nu}(\mathsf{M}) = \left( \mathbf{m}_1^\mathsf{T}\; \ldots \; \mathbf{m}_l^\mathsf{T}\right)^\mathsf{T}$ where the vectors $\left\{\mathbf{m}_i\right\}_{i=1}^l$ are the columns of $\mathsf{M}$. Conversely, the operator $\zeta_p$ builds a matrix in $\mathbb{R}^{q \times p}$ from a vector in $\mathbb{R}^{pq}$, i.e. $\zeta_p(\mathbf{v}) = \left( \mathbf{v}_1 \; \ldots \; \mathbf{v}_p \right) \in \mathbb{R}^{q \times p}$ where $\mathbf{v} = \left( \mathbf{v}_1^\mathsf{T}\; \ldots \; \mathbf{v}_p^\mathsf{T}\right)^\mathsf{T}\in \mathbb{R}^{pq}$ and each block $\mathbf{v}_j$ lies in $\mathbb{R}^q$. The notation $\zeta$ is used to abbreviate $\zeta_2$. We denote by $\textrm{rms}(\mathbf{v})$ the Root Mean Square (RMS) of the $m$-vector $\mathbf{v}$, i.e. $\textrm{rms}(\mathbf{v})=\sqrt{\frac{1}{m} \sum_{i=1}^{m} v_i^2} \propto \left\Vert \mathbf{v} \right\Vert$, with $\Vert \cdot \Vert$ the two-norm.
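As a concrete reading of these operators, the following NumPy sketch implements the column-stacking vectorization, its inverse and the RMS. The function names `vec`, `zeta` and `rms` are hypothetical and chosen only for this illustration; column-major (`order="F"`) reshaping is used because the definitions above stack columns.

```python
import numpy as np

def vec(M):
    """nu(M): stack the columns of M into a single vector."""
    return np.asarray(M).reshape(-1, order="F")

def zeta(v, p=2):
    """zeta_p(v): build the q x p matrix whose columns are the p
    consecutive blocks of length q of the vector v (default p = 2)."""
    v = np.asarray(v)
    if v.size % p != 0:
        raise ValueError("length of v must be a multiple of p")
    q = v.size // p
    return v.reshape((q, p), order="F")

def rms(v):
    """Root mean square of the entries of v."""
    v = np.asarray(v, dtype=float)
    return np.sqrt(np.mean(v**2))

M = np.array([[1, 2],
              [3, 4],
              [5, 6]])
v = vec(M)          # columns of M stacked: 1, 3, 5, 2, 4, 6
M_back = zeta(v)    # recovers the 3 x 2 matrix M
```

With these definitions, `rms(v)` equals `np.linalg.norm(v) / np.sqrt(v.size)`, which is the proportionality to the two-norm stated above.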

Images are considered as $\mathbb{R}^2 \rightarrow \mathbb{R}$ functions and are denoted using calligraphic fonts ($\mathcal{A}$). If $\mathfrak{C}$ is a collection of pixels then ${\boldsymbol{\xi}}_\mathfrak{C}(\mathcal{I})$ is the vector in which the values of $\mathcal{I}$ are stacked for all the pixels indicated in $\mathfrak{C}$. More precisely, if $\mathfrak{C} = \left\{ \mathbf{q}_i \right\}_{i=1}^{\vert\mathfrak{C}\vert}$ then ${\boldsymbol{\xi}}_\mathfrak{C}(\mathcal{I}) = \left( \mathcal{I}(\mathbf{q}_1) \; \ldots \; \mathcal{I}(\mathbf{q}_{\vert\mathfrak{C}\vert}) \right)^\mathsf{T}\in \mathbb{R}^{\vert\mathfrak{C}\vert}$ where $\vert\mathfrak{C}\vert$ is the cardinality of the set $\mathfrak{C}$.
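The stacking operator can be illustrated on a discrete image. The sketch below is only one possible reading: it assumes the image is stored as a 2D NumPy array indexed as `image[row, column]` and that the pixels of the collection are given as integer `(x, y)` pairs, neither of which is specified in the text. The name `xi` is hypothetical.

```python
import numpy as np

def xi(image, pixels):
    """Stack the values of `image` at the given pixels into one vector.

    Assumptions for this illustration: `image` is a 2D array indexed as
    image[row, col]; `pixels` is a sequence of integer (x, y) pairs,
    with x the column index and y the row index.
    """
    pixels = np.asarray(pixels, dtype=int)
    cols, rows = pixels[:, 0], pixels[:, 1]
    return image[rows, cols].astype(float)

image = np.arange(12).reshape(3, 4)     # toy 3 x 4 image with values 0..11
pixels = [(0, 0), (3, 1), (1, 2)]       # a collection of three pixels
values = xi(image, pixels)              # one entry per pixel, in the same order
```

The output has one entry per pixel of the collection, so its length is the cardinality of the collection.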

The images to be registered are written $\mathcal{I}_i$ with $i=1,\dots,n$. The texture image, e.g. the region of interest in the first image, is denoted $\mathcal{I}_0$. The set of pixels of interest, i.e. the subset of pixels of the image $\mathcal{I}_0$ actually used to estimate a warp, is denoted $\mathfrak{R}$. A generic parametric warp is written $\mathcal{W}$. It depends on a parameter vector $\mathbf{u}_i$ for image $\mathcal{I}_i$ and maps a point $\mathbf{q}_0$ from the texture image to the corresponding point $\mathbf{q}_i$ in the $i$-th image: $\mathbf{q}_i = \mathcal{W}(\mathbf{q}_0;\mathbf{u}_i)$. The notation $\mathcal{W}(\mathbf{q} ; \cdot)$ designates the warp as a function of its parameters, i.e. an $\mathbb{R}^l \rightarrow \mathbb{R}^2$ function where $l$ is the size of the parameter vector, instead of as a function of the pixels.
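The two ways of reading a warp, as a function of the point or as a function of the parameters, can be sketched with a simple stand-in. The affine warp below is purely hypothetical (it is not the TPS or FFD warp studied later) and the ordering of its six parameters is an arbitrary choice made for this example.

```python
import numpy as np

def affine_warp(q, u):
    """Hypothetical warp W(q; u) = A q + t, with l = 6 parameters
    u = (a11, a21, a12, a22, t1, t2). For illustration only."""
    u = np.asarray(u, dtype=float)
    A = u[:4].reshape((2, 2), order="F")   # 2 x 2 linear part, filled column by column
    t = u[4:]                              # translation
    return A @ np.asarray(q, dtype=float) + t

def warp_of_parameters(q):
    """Return W(q; .): the warp seen as a function of u for a fixed point q."""
    return lambda u: affine_warp(q, u)

q0 = np.array([2.0, 3.0])
u_identity = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
u_shift = [1.0, 0.0, 0.0, 1.0, 5.0, -1.0]

q_same = affine_warp(q0, u_identity)    # point left unchanged
W_q0 = warp_of_parameters(q0)           # maps a parameter vector to a 2D point
q_moved = W_q0(u_shift)                 # q0 translated by (5, -1)
```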


Contributions to Parametric Image Registration and 3D Surface Reconstruction (Ph.D. dissertation, November 2010) - Florent Brunet
Webpage generated on July 2011