Introduction
Digital images and videos are nowadays ubiquitous.
This stems from the rapid spread of cheap sensors such as webcams, digital cameras, camcorders, and smartphones.
A natural consequence of this omnipresence is a need for sophisticated algorithms to manipulate a massive amount of data.
This is one of the reasons why Computer Vision has become a major research topic over the past few decades.
In a nutshell, the ultimate goal of Computer Vision would be to make computers able to understand the world in which they `live'.
Here, the word `computer' must be taken in a broad sense since computing chips are now not only in classical computers but in many other devices such as smartphones, cars, etc.
The verb `to understand' must also be considered in a really broad sense.
Despite all the efforts made by scientists over the past decades, we are still far from having intelligent computers.
More realistically, it would probably be more reasonable to talk about `automatic extraction of information' from images instead of `understanding the world'.
The mechanical being does not exist yet but Computer Vision is nonetheless of broad interest with useful applications in domains such as multimedia, metrology, medical imaging, robotics, etc.
Computer Vision has been present in the professional context for decades now, particularly in the field of medical imaging.
However, the situation has been evolving quite rapidly since the beginning of this millennium.
Indeed, Computer Vision is now involved in mass products such as mobile phones, cars, and game consoles.
This has opened brand new perspectives and developments for research in Computer Vision.
This thesis deals with specific problems in Computer Vision.
More precisely, we mainly consider the following three topics: surface fitting, image registration, and 3D reconstruction in deformable environments.
In this thesis, we always consider parametric approaches.
This is a characteristic common to all our contributions.
Before giving general points about the specific problems treated here, let us have a quick look at what a parametric approach is.
Most fundamentally, a parametric approach relies on what we call the `magic triplet'.
This magic triplet is made of the following three elements:
- Parametric models. A parametric model is a family of functions that can be described by a finite set of parameters. The combination of a parametric model with a set of parameters allows one to model a phenomenon (such as a surface that fits a cloud of points or the deformation between two images). Many parametric models may be used in Computer Vision, ranging from very specific models (representing, for instance, an affine transformation) to models of general use (such as the well-known B-splines, which allow one to model complex transformations such as the image deformation function). The choice of a `good' parametric model, i.e. a model that can represent the phenomenon under study, is extremely important. In this thesis, we use many different existing parametric models and also propose new ones.
- Parameter estimation. Parameter estimation is a central part of parametric approaches. It consists in finding an appropriate set of parameters that, combined with a fixed parametric model, correctly `explains' a data set. This is achieved by modelling the problem under study. The modelling step is a mathematical formulation of the problem that takes into account various elements such as the nature of the data, the type of measurement noise, the presence of erroneous data, the prior knowledge one has about the solution, etc. It usually results in a `score function' (also known as criterion, cost function, loss function, or residual error). The minimization (or, sometimes, maximization) of this criterion is expected to give the right result.
- Hyperparameter estimation. Hyperparameter estimation is another important part of any classical parametric approach. What we call hyperparameters in this document are the additional parameters that typically arise in the optimization problems resulting from the modelling step. The number of control points of a B-spline or the strength given to a regularization prior are examples of hyperparameters. For reasons that will become clear throughout this manuscript, hyperparameters cannot be estimated in the same way as natural parameters. Automatically determining good hyperparameters is a challenging problem that is often neglected. Using appropriate hyperparameters is nonetheless crucial in order to get satisfactory results with a parametric approach. Part of our work is dedicated to this point.
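To make the three elements of the triplet concrete, here is a minimal sketch in Python (NumPy). It is an illustration under assumptions that are not taken from this thesis: the parametric model is a 1D polynomial, the cost function is a ridge-regularized least-squares criterion, and the names `design_matrix`, `estimate_parameters`, and `data_error` are hypothetical. It also hints at one commonly cited reason why hyperparameters call for a separate treatment: the data error measured on the fitted points never decreases when the regularization weight grows, so minimizing that error would always select no regularization at all.

```python
import numpy as np

def design_matrix(x, degree):
    """Parametric model (assumed): f(x; p) = sum_k p[k] * x**k."""
    return np.vander(x, degree + 1, increasing=True)

def estimate_parameters(x, y, degree, lam):
    """Parameter estimation: minimize ||A p - y||^2 + lam * ||p||^2."""
    A = design_matrix(x, degree)
    return np.linalg.solve(A.T @ A + lam * np.eye(degree + 1), A.T @ y)

def data_error(p, x, y):
    """Mean squared residual of the fitted model on the points (x, y)."""
    return float(np.mean((design_matrix(x, len(p) - 1) @ p - y) ** 2))

# Synthetic noisy data (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.size)

# Hyperparameters: the polynomial degree and the regularization weight lam.
degree = 5
lams = [0.0, 1e-3, 1e-1, 10.0]
errors = [data_error(estimate_parameters(x, y, degree, lam), x, y)
          for lam in lams]

# The error on the fitted data is smallest for lam = 0 and then grows.
for lam, err in zip(lams, errors):
    print(f"lam = {lam:g}: data error = {err:.5f}")
```

In this sketch the natural parameters are the polynomial coefficients, obtained by minimizing the cost function, whereas `degree` and `lam` have to be chosen by some other criterion.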
As stated previously, the work presented in this thesis focuses on various specific topics in Computer Vision.
We now give a brief overview of these topics.
Of course, these elements will be further detailed in the corresponding chapters of this thesis.
- Range surface fitting is the problem of finding an analytic expression of a smooth parametric surface that approximates a set of 3D data points. In this document, we consider `range data'. This type of data is also known as 2.5D. It may be viewed as a set of 2D locations, each of which is associated with an altitude (or height, or depth). Range data is now of broad interest because some devices allow one to get such data quite easily: Time-of-Flight cameras, laser range scanners, etc. The main challenges encountered in such problems are to cope with noise, large amounts of data, and discontinuities.
- Image registration is the problem of determining the transformation between two (or more) images of the same scene. Various types of transformations may be considered, such as photometric or geometric ones. In this document, we are mainly interested in geometric transformations. Besides, we focus on deformable environments. This means that the position (or the shape) of the objects may vary between the images to register. This implies that complex parametric models must be used to model the deformations. Consequently, the parameter estimation step also becomes quite difficult in general.
- 3D reconstruction of deformable surfaces. The last problem of Computer Vision addressed in this document is the reconstruction of a deforming 3D surface from a monocular video. Using the motion cue only makes it a fundamentally ill-posed problem since there exists an infinite number of 3D shapes that have the same reprojection in an image. We propose to overcome this problem by considering that the deformable surface is inextensible and that a reference shape is available for a template image. Although these assumptions are common, the way we enforce the underlying constraints is new: we model the reconstructed surface as a smooth surface (based on tensor-product B-splines) and impose that the surface be everywhere a local isometry.
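The ill-posedness mentioned in the last item can be checked numerically. The sketch below is only an illustration under assumptions of ours: a pinhole camera with unit focal length, three arbitrary 3D points, and the hypothetical helper names `project` and `edge_lengths`. It shows that moving each point along its own sight line leaves the image unchanged, while the distances between points (a crude stand-in for lengths measured on the surface) are not preserved, which is the kind of ambiguity an inextensibility assumption helps to remove.

```python
import numpy as np

def project(points):
    """Pinhole projection with unit focal length: (X, Y, Z) -> (X/Z, Y/Z)."""
    points = np.asarray(points, dtype=float)
    return points[:, :2] / points[:, 2:3]

def edge_lengths(points):
    """Euclidean distances between consecutive points."""
    return np.linalg.norm(np.diff(points, axis=0), axis=1)

# Three 3D points in front of the camera (Z > 0), chosen arbitrarily.
shape_a = np.array([[0.1, 0.2, 2.0],
                    [-0.3, 0.1, 2.5],
                    [0.4, -0.2, 3.0]])

# Slide every point along its own sight line by an arbitrary positive factor.
depth_scales = np.array([[1.5], [0.7], [2.0]])
shape_b = shape_a * depth_scales

same_image = np.allclose(project(shape_a), project(shape_b))
lengths_preserved = np.allclose(edge_lengths(shape_a), edge_lengths(shape_b))

print("same reprojection:", same_image)
print("same edge lengths:", lengths_preserved)
```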
Various contributions related to the three main topics of the previous section are given in this thesis.
We now give a brief overview of our contributions.
For range surface fitting, we propose a new method that automatically tunes the hyperparameters in a computationally efficient way. The results we get with our approach are similar to those obtained with state-of-the-art approaches such as Cross-Validation.
We also propose an algorithm to fit a surface to data presenting heteroskedastic noise, i.e. noise with a variance which is not constant over the whole dataset.
Moreover, our algorithm is extremely fast, which is valuable for processing data coming from devices such as Time-of-Flight cameras.
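As a reminder of what heteroskedastic noise changes in practice, the following sketch uses the textbook weighted least-squares idea, in which each residual is divided by the standard deviation of its noise. This is not the algorithm proposed in this thesis. The straight-line model, the noise profile `sigma` (assumed known), and the function name `weighted_fit` are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: the noise standard deviation grows with x (assumed known).
x = np.linspace(0.0, 1.0, 200)
sigma = 0.02 + 0.5 * x
y = 1.0 + 2.0 * x + sigma * rng.standard_normal(x.size)

# Parametric model (assumed): z = p[0] + p[1] * x.
A = np.column_stack([np.ones_like(x), x])

def weighted_fit(A, y, sigma):
    """Weighted least squares: scale each row by 1 / sigma, then solve."""
    w = 1.0 / sigma
    p, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
    return p

# Ordinary least squares treats all points as equally reliable.
p_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
# Weighted least squares trusts the low-noise points more.
p_wls = weighted_fit(A, y, sigma)

print("ordinary least squares:", p_ols)
print("weighted least squares:", p_wls)
```

With a constant `sigma`, the weighted fit reduces to the ordinary one, which is a convenient sanity check.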
We propose a new parametric warp relying on NURBS (Non-Uniform Rational B-Splines).
We show that this model is particularly well-suited to perspective imaging conditions whereas classical Free-Form Deformations relying on B-splines are better suited to affine imaging conditions.
Practical elements about the estimation of the parameters of our NURBS-warps are also given in detail.
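The link between rational models and perspective can be illustrated in one dimension. The sketch below is not the NURBS-warp of this thesis. It uses a single-segment rational Bézier curve of degree 1, which is a special case of NURBS, and a 1D projective map as a stand-in for a perspective transformation. The coefficients `a`, `b`, `c`, `d` and the function names are hypothetical. With well-chosen weights, the rational curve reproduces the projective map exactly, whereas the same control points with unit weights (a purely polynomial curve) do not.

```python
import numpy as np

def projective_map(u, a, b, c, d):
    """1D analogue of a perspective transformation: (a*u + b) / (c*u + d)."""
    return (a * u + b) / (c * u + d)

def rational_linear_bezier(u, p0, p1, w0, w1):
    """Degree-1 rational Bezier curve on [0, 1] with weights w0 and w1."""
    numerator = (1.0 - u) * w0 * p0 + u * w1 * p1
    denominator = (1.0 - u) * w0 + u * w1
    return numerator / denominator

a, b, c, d = 2.0, 0.5, 0.8, 1.0   # arbitrary coefficients, c*u + d > 0 on [0, 1]
u = np.linspace(0.0, 1.0, 11)
target = projective_map(u, a, b, c, d)

# Control points are the end values; weights are the end denominators.
p0, p1 = b / d, (a + b) / (c + d)
rational = rational_linear_bezier(u, p0, p1, w0=d, w1=c + d)
polynomial = rational_linear_bezier(u, p0, p1, w0=1.0, w1=1.0)

max_err_rational = float(np.max(np.abs(rational - target)))
max_err_polynomial = float(np.max(np.abs(polynomial - target)))

print("max error, rational curve:  ", max_err_rational)
print("max error, polynomial curve:", max_err_polynomial)
```

A polynomial spline with more control points could approximate the projective map more closely, but the division carried by the weights is what makes an exact representation possible.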
For image registration, we propose a new modelling of the problem that allows one to discard the thorny problems caused by the so-called `region-of-interest'. This new approach relies on the simple, yet successful, idea that the parts of the scene seen in one image but out of the field of view in the other images may be considered as outliers (like the outliers caused, for instance, by occlusions or specularities).
We also provide a new principle for automatically tuning the hyperparameters.
This new principle relies on the well-known paradigm of dividing the data into a training set and a test set.
We adapt this principle to the specific context of feature-based image registration by considering features as the training set and pixel colours as the test set.
This approach has the advantage of using all the available image information (both features and colours).
In a sense, this new approach may be considered as a way of combining the feature-based approach (used for the estimation of the natural parameters of the warp) and the direct approach (used to automatically set the hyperparameters).
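The principle described above can be sketched on a toy 1D problem. Everything in this example is an assumption made for illustration and does not reproduce the criterion developed in this thesis: the `image' is an analytic 1D signal, the warp is a polynomial displacement with a ridge penalty of weight `lam`, and the names `fit_warp` and `photometric_error` are hypothetical. The natural parameters are estimated from a few feature correspondences (training set), and the hyperparameter `lam` is selected by the discrepancy between pixel values (test set).

```python
import numpy as np

rng = np.random.default_rng(2)

def true_warp(u):
    """Ground-truth 1D deformation, used only to generate the toy data."""
    return u + 0.05 * np.sin(2.0 * np.pi * u)

def target_image(q):
    """Toy 1D 'image', given analytically so it can be sampled anywhere."""
    return np.sin(9.0 * q) + 0.5 * np.cos(23.0 * q)

# Test set: pixel values of the source image, I_src(u) = I_tgt(warp(u)).
pixels = np.linspace(0.0, 1.0, 200)
source_values = target_image(true_warp(pixels))

# Training set: a few noisy feature correspondences u -> q.
feat_u = np.sort(rng.uniform(0.0, 1.0, 12))
feat_q = true_warp(feat_u) + 0.01 * rng.standard_normal(feat_u.size)

DEGREE = 7

def basis(u):
    return np.vander(u, DEGREE + 1, increasing=True)

def fit_warp(lam):
    """Ridge-regularized fit of the displacement q - u on the features."""
    A = basis(feat_u)
    return np.linalg.solve(A.T @ A + lam * np.eye(DEGREE + 1),
                           A.T @ (feat_q - feat_u))

def photometric_error(coeffs):
    """Mean squared difference of pixel values under the estimated warp."""
    warped = pixels + basis(pixels) @ coeffs
    return float(np.mean((target_image(warped) - source_values) ** 2))

candidates = [1e-8, 1e-6, 1e-4, 1e-2, 1.0]
scores = {lam: photometric_error(fit_warp(lam)) for lam in candidates}
best_lam = min(scores, key=scores.get)

print("selected regularization weight:", best_lam)
```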
Finally, we propose a new algorithm that reconstructs inextensible surfaces from a monocular video.
Two approaches are proposed.
The first one reconstructs a sparse surface, i.e. a cloud of three-dimensional points.
We manage to implement this first approach as a second-order cone program.
The second one reconstructs a smooth and parametric surface such as a tensor-product B-spline.
The founding idea is that the function (the parametric model) representing the reconstructed 3D surface must be everywhere a local isometry.
This idea is implemented as a new term which is included in a cost function that also bundles a data term and a regularization term.
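One common way to express the local isometry condition is to require that the 3x2 Jacobian J of the surface function satisfies J^T J = I at every point. The sketch below evaluates such a penalty numerically. It is an assumption-laden illustration, not the cost function of this thesis: derivatives are taken by finite differences instead of differentiating a B-spline, and the surfaces `bent_sheet` and `stretched_sheet` as well as the function names are hypothetical.

```python
import numpy as np

def jacobian(surface, u, v, h=1e-5):
    """3x2 Jacobian of surface(u, v) by central finite differences."""
    du = (surface(u + h, v) - surface(u - h, v)) / (2.0 * h)
    dv = (surface(u, v + h) - surface(u, v - h)) / (2.0 * h)
    return np.column_stack([du, dv])

def isometry_penalty(surface, samples):
    """Mean squared Frobenius norm of J^T J - I over the sample points."""
    total = 0.0
    for u, v in samples:
        J = jacobian(surface, u, v)
        total += np.sum((J.T @ J - np.eye(2)) ** 2)
    return total / len(samples)

def bent_sheet(u, v, radius=0.5):
    """Flat sheet rolled onto a cylinder: bending without stretching."""
    return np.array([radius * np.sin(u / radius), v,
                     radius * np.cos(u / radius)])

def stretched_sheet(u, v):
    """Flat sheet stretched by a factor 2 along u: not an isometry."""
    return np.array([2.0 * u, v, 0.0])

samples = [(u, v) for u in np.linspace(0.0, 1.0, 5)
           for v in np.linspace(0.0, 1.0, 5)]

penalty_bent = isometry_penalty(bent_sheet, samples)
penalty_stretched = isometry_penalty(stretched_sheet, samples)

print("penalty, bent sheet:     ", penalty_bent)
print("penalty, stretched sheet:", penalty_stretched)
```

The bent sheet yields a penalty that is zero up to numerical error, while the stretched sheet is clearly penalized. A term of this kind can be added to a data term and a regularization term in a single cost function.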
Chapter 2 presents the general tools used in this thesis.
In particular, we give all the conventions and notation, the basic elements on optimization, and the most usual parametric models of functions.
Chapter 3 also deals with basic elements, but ones that are more closely related to the central topics of this thesis, i.e. the basics of parameter and hyperparameter estimation.
While these first two chapters only address general points, the other chapters are specific to our work.
In particular, each one of the last chapters includes at least one of our contributions.
Chapter 4 is dedicated to the problem of fitting a parametric smooth surface to range data.
General points about range data are given in this chapter.
In particular, some explanations on the standard devices that allow one to acquire range data are given.
We also present our two contributions which are related to the problem of surface fitting with range data.
Chapter 5 is concerned with a fundamental problem in Computer Vision: image registration.
A review of the classical approaches to image registration is given.
Then, two contributions related to parameter estimation in image registration are given in detail.
Chapter 6 is also dedicated to image registration.
However, we have decided not to merge Chapter 5 and Chapter 6.
Indeed, Chapter 5 is about specific techniques that, given a parametric model, allow one to compute the parameters (and the hyperparameters).
Chapter 6 is different from Chapter 5 in the sense that we propose a new parametric model instead of giving methods to estimate the parameters.
Chapter 7 deals with the reconstruction of deformable and inextensible surfaces from a monocular video.
This chapter comes last since the tools and methods it proposes may be built upon the contributions brought in the previous chapters.
For instance, the first step in the method we propose to reconstruct a 3D surface is to register successive images of the video.
We conclude and comment on our work and its possible future evolutions in Chapter 8.
Contributions to Parametric Image Registration and 3D Surface Reconstruction (Ph.D. dissertation, November 2010) - Florent Brunet