
Range Surface Fitting with Heteroskedastic Noise

In this section, we present some further contributions to the reconstruction of range surfaces. In the previous sections, we considered arbitrary range data in the sense that the locations of the data points did not follow any particular pattern. In this section, we consider data sets with points arranged on a regular grid. More precisely, we consider data sets in the form of depth maps produced by TOF cameras. This type of data is particularly challenging for several reasons. First, the amount of data is large: with the equipment we used, there were typically around 20,000 points per depth map. Second, TOF data usually contains a significant amount of noise. Moreover, this noise is heteroskedastic, which means that the variance of the noise is not identical for all the pixels of the depth map. Third, since TOF cameras can capture data from arbitrary parts of the environment, the resulting depth maps are likely to contain discontinuities. The contributions proposed in this section aim at coping with these different aspects of TOF data.

This section is organized as follows. First, we specialize the general approach to range surface fitting with B-splines to the particular case of data arranged on a regular grid. In this first part, we consider only homoskedastic noise. In addition to presenting the general way of handling grid data, we make two contributions: a new regularization term that approximates the bending energy while being compatible with the grid approach, and a new manner of selecting the regularization parameter. Next, we present our multi-step approach for handling heteroskedastic noise and discontinuities. Finally, the efficiency of our approach is illustrated with a set of experiments.


Fitting a B-spline on Mesh Data

Properties of the Kronecker Product

As will be shown later, fitting a B-spline on mesh data leads to formulas that rely heavily on the Kronecker product. We therefore list here some useful properties of the Kronecker product. The following identities are taken from (61):

$\displaystyle \mathbf{I}_p \otimes \mathbf{I}_q$ $\displaystyle = \mathbf{I}_{pq}$ (4.74)
$\displaystyle (\mathsf{A} \otimes \mathsf{B})^\mathsf{T}$ $\displaystyle = \mathsf{A}^\mathsf{T}\otimes \mathsf{B}^\mathsf{T}$ (4.75)
$\displaystyle (\mathsf{A} \otimes \mathsf{B})^{-1}$ $\displaystyle = \mathsf{A}^{-1}\otimes \mathsf{B}^{-1}$ (4.76)
$\displaystyle (\mathsf{A} \mathsf{B}) \otimes (\mathsf{C} \mathsf{D})$ $\displaystyle = (\mathsf{A} \otimes \mathsf{C})(\mathsf{B} \otimes \mathsf{D})$ (4.77)
$\displaystyle (\mathsf{A}+ \mathsf{B}) \otimes (\mathsf{C} + \mathsf{D})$ $\displaystyle = \mathsf{A}\otimes \mathsf{C} + \mathsf{B} \otimes \mathsf{C} + \mathsf{A} \otimes \mathsf{D} + \mathsf{B} \otimes \mathsf{D}$ (4.78)
$\displaystyle \mathrm{vect}(\mathsf{A}_{p\times s} \mathsf{X}_{s \times t})$ $\displaystyle = (\mathbf{I}_t \otimes \mathsf{A}) \mathrm{vect}(\mathsf{X})$ (4.79)
$\displaystyle \mathrm{vect}(\mathsf{X}_{s\times t} \mathsf{B}_{t \times q})$ $\displaystyle = (\mathsf{B}^\mathsf{T}\otimes \mathbf{I}_s) \mathrm{vect}(\mathsf{X})$ (4.80)
$\displaystyle \mathrm{vect}(\mathsf{A}_{p\times s} \mathsf{X}_{s \times t} \mathsf{B}_{t \times q})$ $\displaystyle = (\mathsf{B}^\mathsf{T}\otimes \mathsf{A}) \mathrm{vect}(\mathsf{X})$ (4.81)
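The vectorization identities above are easy to verify numerically. The short sketch below (illustrative dimensions, numpy as the linear algebra backend) checks identities (4.77) and (4.81); note that the $\mathrm{vect}$ operator of the text corresponds to column-major flattening.

```python
import numpy as np

# Column-stacking vectorization, matching the vect(.) operator of the text.
def vect(M):
    return M.flatten(order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # p x s
X = rng.standard_normal((3, 5))   # s x t
B = rng.standard_normal((5, 2))   # t x q

# Identity (4.81): vect(A X B) = (B^T kron A) vect(X)
lhs = vect(A @ X @ B)
rhs = np.kron(B.T, A) @ vect(X)
assert np.allclose(lhs, rhs)

# Identity (4.77): (A B) kron (C D) = (A kron C)(B kron D)
C = rng.standard_normal((2, 6))
D = rng.standard_normal((6, 3))
assert np.allclose(np.kron(A @ X, C @ D), np.kron(A, C) @ np.kron(X, D))
```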

Here are additional properties that will be useful in the derivations of the principles for fitting a B-spline to mesh data:

$\displaystyle \mathrm{vect}(\mathsf{A}_{m \times n}) = \mathrm{vect}(\mathsf{B}_{m \times n})$ $\displaystyle \Leftrightarrow \mathsf{A}_{m \times n} = \mathsf{B}_{m \times n}$ (4.82)
$\displaystyle \mathrm{trace}(\mathsf{A}_m \otimes \mathsf{B}_n)$ $\displaystyle = \mathrm{trace}(\mathsf{A}_m)\mathrm{trace}(\mathsf{B}_n)$ (4.83)
$\displaystyle \mathrm{vect}(\mathsf{A})-\mathrm{vect}(\mathsf{B})$ $\displaystyle = \mathrm{vect}(\mathsf{A} - \mathsf{B})$ (4.84)
$\displaystyle \left \Vert\mathrm{vect}(\mathsf{A})\right \Vert^2$ $\displaystyle = \left \Vert \mathsf{A}\right \Vert _\mathcal{F}^2$ (4.85)
$\displaystyle \left \Vert \mathsf{A} \right \Vert _\mathcal{F}^2$ $\displaystyle = \mathrm{trace}(\mathsf{A}^\mathsf{T}\mathsf{A})$ (4.86)
$\displaystyle \mathrm{trace}(\mathbf{I}_n - \mathsf{A}_{n\times n})$ $\displaystyle = n - \mathrm{trace}(\mathsf{A})$ (4.87)
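These trace and norm identities can likewise be checked with a few lines of numpy (a quick sanity sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3))

# (4.83): trace(A kron B) = trace(A) trace(B)
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))

# (4.85)-(4.86): ||vect(A)||^2 = ||A||_F^2 = trace(A^T A)
v = A.flatten(order="F")
assert np.isclose(v @ v, np.linalg.norm(A, "fro") ** 2)
assert np.isclose(v @ v, np.trace(A.T @ A))

# (4.87): trace(I_n - A) = n - trace(A)
n = 4
assert np.isclose(np.trace(np.eye(n) - A), n - np.trace(A))
```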

Mesh Data

In this section, we consider that the data set is made of range data points organized as a regular grid of size $ m \times n$. This type of data naturally arises with TOF cameras since this device produces depth maps. This particular data structure allows the whole data set to be written in the following way:

$\displaystyle \left \{ \mathbf{q}_{i,j} = (x_i,y_j) \leftrightarrow z_{i,j} \:\vert\:i \in \llbracket 1,m\rrbracket , j \in \llbracket 1,n\rrbracket \right \}.$ (4.88)

The depths $ z_{i,j}$ are grouped in a matrix $ \mathsf{Z} \in \mathbb{R}^{m \times n}$.

Fitting a B-spline to Mesh Data

The general principle for fitting a B-spline to mesh range data is the same as the one used in the previous sections. It is still formulated as a minimization problem with a cost function composed of two terms. However, for the needs of this section, this optimization problem is formulated in a slightly different way:

$\displaystyle \min_{\mathsf{P}} \mathcal{D}(\mathsf{P}) + \mathcal{R}_\lambda(\mathsf{P}),$ (4.89)

where $ \mathcal{D}$ and $ \mathcal{R}_\lambda$ are respectively the data term and the regularization term. Here, we consider that the regularization parameter $ \lambda$ is part of the regularization term itself.

Data term.

The data term is again defined as the mean squared residual:

$\displaystyle \mathcal{D}(\mathsf{P}) = \frac{1}{mn} \left \Vert \mathsf{N} \mathrm{vect}(\mathsf{P}) - \mathrm{vect}(\mathsf{Z}) \right \Vert^2.$ (4.90)

The fact that the data are arranged on a grid is exploited in the collocation matrix  $ \mathsf{N}$. Indeed, since we use uniform cubic B-splines, this matrix is initially defined as:

$\displaystyle \mathsf{N} = \begin{pmatrix} N_{-3}(x_1)N_{-3}(y_1) & \ldots & N_{g-1}(x_1)N_{h-1}(y_1) \\ \vdots & & \vdots \\ N_{-3}(x_m)N_{-3}(y_n) & \ldots & N_{g-1}(x_m)N_{h-1}(y_n) \end{pmatrix}.$ (4.91)

From equation (4.91), it is easy to see that the collocation matrix  $ \mathsf{N}$ can be written as the Kronecker product of two matrices:

$\displaystyle \mathsf{N} = \mathsf{E} \otimes \mathsf{F},$ (4.92)

where:

$\displaystyle \mathsf{E} = \begin{pmatrix}N_{-3}(x_1) & \ldots & N_{g-1}(x_1)  \vdots & & \vdots  N_{-3}(x_m) & \ldots & N_{g-1}(x_m)  \end{pmatrix},$ (4.93)

and

$\displaystyle \mathsf{F} = \begin{pmatrix}N_{-3}(y_1) & \ldots & N_{h-1}(y_1)  \vdots & & \vdots  N_{-3}(y_n) & \ldots & N_{h-1}(y_n)  \end{pmatrix}.$ (4.94)

Consequently, the data term can be written as:

$\displaystyle \mathcal{D}(\mathsf{P}) = \frac{1}{mn} \left \Vert (\mathsf{E} \otimes \mathsf{F}) \mathrm{vect}(\mathsf{P}) - \mathrm{vect}(\mathsf{Z}) \right \Vert^2.$ (4.95)
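The practical benefit of the factorization $\mathsf{N} = \mathsf{E} \otimes \mathsf{F}$ is that the residual of the data term can be evaluated with two small matrix products instead of forming the full $(mn) \times$ (number of control points) collocation matrix. A minimal sketch (illustrative dimensions; the depth matrix is stored so that its column-major vectorization matches $(\mathsf{E} \otimes \mathsf{F})\,\mathrm{vect}(\mathsf{P})$):

```python
import numpy as np

def vect(M):
    return M.flatten(order="F")

rng = np.random.default_rng(2)
m, n = 12, 10          # grid size
a, b = 6, 5            # number of basis functions in x and y (illustrative)
E = rng.standard_normal((m, a))   # 1-D collocation matrix in x
F = rng.standard_normal((n, b))   # 1-D collocation matrix in y
P = rng.standard_normal((b, a))   # control points
Z = rng.standard_normal((n, m))   # depths, arranged to match the vect convention

# (E kron F) vect(P) = vect(F P E^T): the residual can be computed without
# ever building the (mn) x (ab) Kronecker matrix.
r_kron = np.kron(E, F) @ vect(P) - vect(Z)
r_grid = vect(F @ P @ E.T - Z)
assert np.allclose(r_kron, r_grid)
assert np.isclose(r_kron @ r_kron,
                  np.linalg.norm(F @ P @ E.T - Z, "fro") ** 2)
```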

Regularization term.

In order to obtain the best computational efficiency, we propose a new regularization term. It is a variation of the classical bending energy used so far; we introduce it because the classical bending energy cannot be expressed as a tensor product. We define it as:

\begin{multline}
\mathcal{R}_\lambda(\mathsf{P}) = \lambda \int_{k_0^y}^{k_h^y} \int_{k_0^x}^{k_g^x} \left( \frac{\partial^2 f}{\partial x^2}(x,y ; \mathsf{P}) \right)^2 + \left( \frac{\partial^2 f}{\partial y^2}(x,y ; \mathsf{P}) \right)^2 \mathrm dx \, \mathrm dy \\
+ \lambda^2 \int_{k_0^y}^{k_h^y} \int_{k_0^x}^{k_g^x} \left( \frac{\partial^4 f}{\partial x^2 \partial y^2}(x,y ; \mathsf{P}) \right)^2 \mathrm dx \, \mathrm dy.
\end{multline}

The main advantage of this regularization term is that, like the data term, it can be written using the Kronecker product. Indeed, we have:

$\displaystyle \mathcal{R}_\lambda(\mathsf{P}) = \left \Vert \mathsf{R}_\lambda \mathrm{vect}(\mathsf{P}) \right \Vert^2, \quad \textrm{with } \mathsf{R}_\lambda = \begin{bmatrix} \sqrt{\lambda}(\mathsf{S}_x \otimes \mathsf{F}) \\ \sqrt{\lambda}(\mathsf{E} \otimes \mathsf{S}_y) \\ \lambda(\mathsf{S}_x \otimes \mathsf{S}_y) \end{bmatrix},$ (4.96)

where $ \mathsf{S}_x$ and $ \mathsf{S}_y$ are the square roots of the monodimensional bending matrices along the $ x$ and $ y$ directions respectively.

Solution to the problem.

The main advantage of having a data term and a regularization term that can be written using the Kronecker product of matrices is that it leads to efficient computations. Indeed, given the expressions of the data term and the regularization term, the solution of problem (4.89) is given by:

$\displaystyle \mathsf{P}^\star = \left ( \mathsf{F}^\mathsf{T}\mathsf{F} + \lambda \mathsf{B}_y \right )^{-1} \mathsf{F}^\mathsf{T} \mathsf{Z} \mathsf{E} \left ( \mathsf{E}^\mathsf{T}\mathsf{E} + \lambda \mathsf{B}_x \right )^{-1},$ (4.97)

where $ \mathsf{B}_x = \mathsf{S}_x^\mathsf{T}\mathsf{S}_x$ and $ \mathsf{B}_y = \mathsf{S}_y^\mathsf{T}\mathsf{S}_y$. The solution given by equation (4.97) is derived with the following reasoning:

  $\displaystyle \min_{\mathsf{P}} \left \Vert \begin{bmatrix}\mathsf{E} \otimes \mathsf{F} \\ \sqrt{\lambda} (\mathsf{S}_x \otimes \mathsf{F}) \\ \sqrt{\lambda} (\mathsf{E} \otimes \mathsf{S}_y) \\ \lambda (\mathsf{S}_x \otimes \mathsf{S}_y) \end{bmatrix} \mathrm{vect}(\mathsf{P}) - \begin{bmatrix}\mathrm{vect}(\mathsf{Z}) \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{0} \end{bmatrix}\right \Vert^2$
$\displaystyle \Leftrightarrow \quad$ $\displaystyle \big\{ (\mathsf{E}^\mathsf{T}\otimes \mathsf{F}^\mathsf{T})(\mathsf{E} \otimes \mathsf{F}) + \lambda (\mathsf{S}_x^\mathsf{T}\otimes \mathsf{F}^\mathsf{T})(\mathsf{S}_x \otimes \mathsf{F}) + \lambda (\mathsf{E}^\mathsf{T}\otimes \mathsf{S}_y^\mathsf{T})(\mathsf{E} \otimes \mathsf{S}_y)$
  $\displaystyle \qquad + \lambda^2 (\mathsf{S}_x^\mathsf{T}\otimes \mathsf{S}_y^\mathsf{T})(\mathsf{S}_x \otimes \mathsf{S}_y) \big\} \mathrm{vect}(\mathsf{P}) = (\mathsf{E}^\mathsf{T}\otimes \mathsf{F}^\mathsf{T}) \mathrm{vect}(\mathsf{Z})$
$\displaystyle \Leftrightarrow \quad$ $\displaystyle \big( \mathsf{E}^\mathsf{T}\mathsf{E} \otimes \mathsf{F}^\mathsf{T}\mathsf{F} + \lambda \, \mathsf{S}_x^\mathsf{T}\mathsf{S}_x \otimes \mathsf{F}^\mathsf{T}\mathsf{F} + \lambda \, \mathsf{E}^\mathsf{T}\mathsf{E} \otimes \mathsf{S}_y^\mathsf{T}\mathsf{S}_y$
  $\displaystyle \qquad + \lambda^2 \, \mathsf{S}_x^\mathsf{T}\mathsf{S}_x \otimes \mathsf{S}_y^\mathsf{T}\mathsf{S}_y \big) \mathrm{vect}(\mathsf{P}) = (\mathsf{E}^\mathsf{T}\otimes \mathsf{F}^\mathsf{T}) \mathrm{vect}(\mathsf{Z})$
$\displaystyle \Leftrightarrow \quad$ $\displaystyle \mathsf{F}^\mathsf{T}\mathsf{F} \mathsf{P} \mathsf{E}^\mathsf{T}\mathsf{E} + \lambda \mathsf{F}^\mathsf{T}\mathsf{F} \mathsf{P} \mathsf{S}_x^\mathsf{T}\mathsf{S}_x + \lambda \mathsf{S}_y^\mathsf{T}\mathsf{S}_y \mathsf{P} \mathsf{E}^\mathsf{T}\mathsf{E} + \lambda^2 \mathsf{S}_y^\mathsf{T}\mathsf{S}_y \mathsf{P} \mathsf{S}_x^\mathsf{T}\mathsf{S}_x = \mathsf{F}^\mathsf{T}\mathsf{Z}\mathsf{E}$
$\displaystyle \Leftrightarrow \quad$ $\displaystyle \left ( \mathsf{F}^\mathsf{T}\mathsf{F} \mathsf{P} + \lambda \mathsf{B}_y \mathsf{P} \right ) \left ( \mathsf{E}^\mathsf{T}\mathsf{E} + \lambda \mathsf{B}_x \right ) = \mathsf{F}^\mathsf{T}\mathsf{Z} \mathsf{E}$
$\displaystyle \Leftrightarrow \quad$ $\displaystyle \left ( \mathsf{F}^\mathsf{T}\mathsf{F} + \lambda \mathsf{B}_y\right ) \mathsf{P} \left ( \mathsf{E}^\mathsf{T}\mathsf{E} + \lambda \mathsf{B}_x \right ) = \mathsf{F}^\mathsf{T}\mathsf{Z} \mathsf{E}$
$\displaystyle \Leftrightarrow \quad$ $\displaystyle \mathsf{P} = \left ( \mathsf{F}^\mathsf{T}\mathsf{F} + \lambda \mathsf{B}_y \right )^{-1} \mathsf{F}^\mathsf{T}\mathsf{Z}\mathsf{E} \left ( \mathsf{E}^\mathsf{T}\mathsf{E} + \lambda \mathsf{B}_x \right )^{-1}$


Choice of the Regularization Parameter

At this point, we have an efficient approach to fit a surface to range data (with homoskedastic noise) for a given regularization parameter. In this section, we propose a new approach to determine the regularization parameter automatically and quickly. The contribution we propose here is not entirely new: it is an adaptation to our problem of a well-established principle, Morozov's discrepancy principle (130). In this section, we explain this criterion in the context of range surface fitting. Extended details on Morozov's principle may be found in (141,182,80,168,177).

Again, we emphasize that, for now, we consider a noise with the same distribution over the whole dataset. This stems from the fact that fitting a surface to range data with homoskedastic noise is the basic building block of the whole algorithm proposed in this section. The case of heteroskedastic noise will be handled in section 4.3.2. Let us consider that the dataset is a noisy sample of a reference function $ \underline{f}$, i.e. $ z_{i,j} = \underline{f}(j,i) + e_{i,j}$ where the $ e_{i,j}$ are random variables that represent the noise. Let us further assume that the noise has standard deviation $ \sigma$. For a given dataset  $ \mathsf{Z}$, let $ \mathcal{S}$ be the set of all the B-splines satisfying equation (4.97) for all possible regularization parameters $ \lambda$, i.e.:

$\displaystyle \mathcal{S} = \{ f(\textrm{\raisebox{1pt}{\tiny$\bullet$}}; \mathsf{P}_\lambda) \:\vert\:\lambda \in \mathbb{R}^+ \}.$ (4.98)

The surface fitting problem is equivalent to finding the function $ f \in \mathcal{S}$ such that:

$\displaystyle f = \arg \min_{s \in \mathcal{S}} \Vert s - \underline{f} \Vert,$ (4.99)

where $ \Vert \textrm{\raisebox{1pt}{\tiny $\bullet$}}\Vert$ is some function norm. The discrepancy principle states that $ f$ should be the function whose residual standard deviation ( $ \mathrm{std}(\mathsf{S} - \mathsf{Z})$ with $ s_{i,j} = f(j,i)$) is as close as possible to the true one ($ \sigma$). Since $ \underline{f}$ is not necessarily a member of $ \mathcal{S}$, strict equality between $ \sigma$ and $ \mathrm{std}(\mathsf{S} - \mathsf{Z})$ is not always possible. The regularization parameter is thus chosen by minimizing the following problem:

$\displaystyle \min_{\lambda \in \mathbb{R}_+} \left ( \mathrm{std}(\mathsf{S}^\star(\lambda) - \mathsf{Z}) - \sigma \right )^2,$ (4.100)

with $ s_{i,j}(\lambda) = s_\lambda(j,i)$ where $ s_\lambda$ is the B-spline obtained by solving equation (4.97) for a given value of $ \lambda$. Since the quantity $ \mathrm{std}(\mathsf{S}^\star - \mathsf{Z})$ increases with $ \lambda$, the criterion optimized in equation (4.100) has a single global minimum. Problem (4.100) may be optimized using, for instance, the golden section search algorithm described in section 2.2.2.12.
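Golden section search only requires the criterion to be unimodal on the search interval. A generic sketch (the quadratic `crit` below is a toy stand-in for the discrepancy criterion, not the actual fitting residual):

```python
import math

def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden section search."""
    invphi = (math.sqrt(5) - 1) / 2
    x1 = hi - invphi * (hi - lo)
    x2 = lo + invphi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:          # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - invphi * (hi - lo)
            f1 = f(x1)
        else:                # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + invphi * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)

# Toy stand-in for (std(S*(lambda) - Z) - sigma)^2: minimized where sqrt(lam) = sigma.
sigma = 0.5
crit = lambda lam: (math.sqrt(lam) - sigma) ** 2
lam_star = golden_section(crit, 0.0, 4.0)
assert abs(lam_star - sigma**2) < 1e-3
```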

Of course, the ground truth standard deviation of the errors is generally unknown in practice. Instead, we propose a simple but efficient approach to estimate the standard deviation of the noise directly from the dataset. It relies on a local linear approximation of the data. Let $ \mathcal{P}_{i,j}$ be the function representing the plane obtained by linear regression from the dataset $ \{ (v,u) \leftrightarrow z_{u,v} \:\vert\:u \in \llbracket i-1, i+1 \rrbracket \textrm{ and } v \in \llbracket j-1,j+1\rrbracket \}$. The approximate standard deviation $ \sigma_e$ over the whole dataset $ \mathsf{Z}$ is given by:

$\displaystyle \sigma_e^2 = \frac{1}{(m-2)(n-2)} \sum_{i=2}^{m-1} \sum_{j=2}^{n-1} (\mathcal{P}_{i,j}(j,i) - z_{i,j})^2.$ (4.101)
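A possible implementation of this estimator fits a plane to each interior 3×3 window by least squares and accumulates the squared residual at the window centre (a sketch of equation (4.101), with a sanity check on an exactly planar depth map):

```python
import numpy as np

def estimate_noise_std(Z):
    """Estimate the noise std via local 3x3 plane regressions (eq. 4.101)."""
    m, n = Z.shape
    # Design matrix of the 3x3 window: columns (u, v, 1) relative to the centre.
    u, v = np.meshgrid([-1, 0, 1], [-1, 0, 1], indexing="ij")
    A = np.column_stack([u.ravel(), v.ravel(), np.ones(9)])
    total = 0.0
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            z = Z[i - 1:i + 2, j - 1:j + 2].ravel()
            coef, *_ = np.linalg.lstsq(A, z, rcond=None)
            total += (coef[2] - Z[i, j]) ** 2   # fitted value at the centre is the offset
    return np.sqrt(total / ((m - 2) * (n - 2)))

# Sanity check: an exact plane yields a (near-)zero estimated noise level.
ii, jj = np.meshgrid(np.arange(8), np.arange(9), indexing="ij")
plane = 0.3 * ii - 0.7 * jj + 2.0
assert estimate_noise_std(plane) < 1e-8
```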

The accuracy of this standard deviation estimator for range data will be tested in the experimental part of this section (see section 4.3.3.1). Note that this estimator is clearly more accurate with data that represent a smooth surface than with data that contain discontinuities.

The FGA Algorithm

The combination of equation (4.97) and the discrepancy principle (including the standard deviation estimator) is called the Fast Grid Approach (FGA) algorithm. Note that this algorithm is designed for datasets with homoskedastic noise. It will be used as the basic building block of our more general approach to handle heteroskedastic data.


Our Approach to Handle Heteroskedastic Noise and Discontinuities

In section 4.3.1, we presented the FGA method, which allows one to quickly fit a B-spline to range data organized as a regular mesh with homoskedastic noise. In this section, we present our approach to fitting a B-spline to range data with heteroskedastic noise and discontinuities. The approach we propose is a multi-step one. The different steps of the algorithm are summarized below and detailed thereafter.

Segmentation.
The dataset is segmented in order to, first, separate the different levels of noise and, second, get rid of the discontinuities.
Bounding boxes.
The data of each segment is embedded in a rectangular domain (the bounding box) so that the FGA algorithm can be used.
Local depth maps.
A local depth map is created for each segment by filling the corresponding bounding box.
Local fittings.
A surface is fitted to each one of the local depth maps using the FGA algorithm.
Merging.
A single global surface is finally constructed from the local fittings using the fact that the control points of a B-spline have only a localized influence.
Figure 4.25 illustrates the steps of our approach.

Figure 4.25: Illustration of our multistep algorithm to fit a B-spline to range data with discontinuities and heteroskedastic noise. (a) The scene viewed with a standard sensor (CMOS sensor). (b) Depth map acquired with a TOF camera (PMDTec 19k). (c) Depth map segmentation. (d) Bounding box for the segment  $ \mathcal{C}^{(14)}$. (e) Local depth map obtained by extrapolating the data from  $ \mathcal{C}^{(14)}$. (f) Local fitting. (g) Depth map sampled from the global surface obtained by merging all the local fittings. Compared to the initial depth map (b), the amount of noise has been reduced while preserving the edges of the objects.

Segmentation

The first step of our approach consists in segmenting the initial dataset according to the depth of the points. Using this criterion is motivated by the fact that the amount of noise depends mainly on the distance between the sensor and the scene. Besides, this approach makes the discontinuities coincide with the borders of the segments, which is desirable since it is easier to fit a surface to a dataset without discontinuities.

The segmentation is achieved with the $ \alpha$-expansion algorithm of (198,26). The implementation details are given below. The result of the segmentation step is a set of $ r$ labels  $ E = \llbracket 1,r \rrbracket $ and a collection of $ r$ segments  $ \mathcal{C}^{(t)}$ for  $ t \in \llbracket 1,r \rrbracket $. The segment  $ \mathcal{C}^{(t)}$ is the set of pixels  $ (i,j) \in \llbracket 1,m \rrbracket \times \llbracket 1,n \rrbracket $ labelled with the value $ t$. Note that each segment is guaranteed to be 8-connected. We denote  $ \mathsf{C}^{(t)}$ the matrix such that  $ c_{i,j}^{(t)} = 1$ if $ (i,j) \in \mathcal{C}^{(t)}$ and 0 otherwise.

Segmentation details.

Let $ \mathcal{J} = \llbracket 1,m \rrbracket \times \llbracket 1,n \rrbracket $ be the set of pixels of the depth map  $ \mathsf{Z}$. The segmentation consists in classifying each pixel of  $ \mathcal{J}$ in one of the $ r$ segments  $ \{\mathcal{C}^{(t)}\}_{t \in \llbracket 1,r \rrbracket }$. Besides, we want the segmentation to be spatially consistent and the segments to be sets of 8-connected pixels. To do so, we use the $ \alpha$-expansion algorithm proposed in (198,26). With this approach, the segmentation is the solution of an optimization problem:

$\displaystyle \min_{\{ \mathcal{C}^{(1)}, \ldots, \mathcal{C}^{(s)}\}} E_\mathrm{data} + E_\mathrm{reg},$ (4.102)

where $ s$ is an integer such that $ s \leq r$. $ E_\mathrm{data}$ is the data term, i.e. a term that measures the quality of allocating a certain label to certain pixels. $ E_\mathrm{reg}$ is a term promoting the spatial consistency of the segmentation.

We choose as data term the negative log-likelihood that a depth $ z_{i,j}$ belongs to the segment  $ \mathcal{C}^{(t)}$. If we assume a normal distribution with parameters  $ \mathbold{\theta}^{(t)} = (\mu^{(t)}, \sigma^{(t)})$ for the noise in the $ t$th segment, $ E_\mathrm{data}$ is given by:

$\displaystyle E_\mathrm{data}=\sum_{t \in \llbracket 1,s \rrbracket } \sum_{(i,j) \in \mathcal{J}} D^{(t)}(i,j),$ (4.103)
with$\displaystyle \quad D^{(t)}(i,j) = -\log\big(L((i,j)\in \mathcal{C}^{(t)} \vert \mathbold{\theta}^{(t)})\big),$ (4.104)
and$\displaystyle \quad L((i,j)\in\mathcal{C}^{(t)} \vert \mathbold{\theta}^{(t)} ) \propto \exp \left(- \tfrac{(z_{i,j}-\mu^{(t)})^2}{2(\sigma^{(t)})^2}\right).$ (4.105)

The parameters  $ \{\mathbold{\theta}^{(t)}\}_{t\in \llbracket 1,s \rrbracket }$ are computed using the k-means algorithm on the depth map  $ \mathsf{Z}$. The number $ s$ is determined manually. The regularization term  $ E_\mathrm{reg}$ is defined using the Potts model ( $ \mathcal{N}(\mathbf{p})$ denotes the 8-neighbourhood of the pixel  $ \mathbf{p}$):

$\displaystyle E_\mathrm{reg} = \sum_{\mathbf{p}\in\mathcal{J}} \sum_{\mathbf{q}\in\mathcal{N}(\mathbf{p})} V(\mathbf{p}, \mathbf{q})$ (4.106)

with

$\displaystyle V(\mathbf{p}, \mathbf{q}) = \begin{cases} 0 & \text{if $\mathbf{p} \in \mathcal{C}^{(t)}$ and $\mathbf{q} \in \mathcal{C}^{(t)}$} \\ c & \text{if $\mathbf{p} \in \mathcal{C}^{(t)}$ and $\mathbf{q} \in \mathcal{C}^{(t')}$, $t \neq t'$}. \end{cases}$    

The scalar  $ c \in \mathbb{R}_+$ determines the weight given to the spatial consistency: we use $ c = 25$. The final step of the segmentation consists in taking the $ r$ connected components of the $ s$ segments  $ \{\mathcal{C}^{(t)}\}_{t \in \llbracket 1,s \rrbracket }$.
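The $\alpha$-expansion optimization itself requires a graph-cut solver, but the data term $D^{(t)}(i,j)$ is straightforward to compute once the per-segment Gaussian parameters are available. A hedged sketch (the `params` values are hypothetical stand-ins for the k-means output; the argmin labelling is only the initialization, not the final spatially regularized segmentation):

```python
import numpy as np

# Hypothetical per-segment parameters (mu, sigma), e.g. from k-means on the depths.
params = [(1.0, 0.1), (2.5, 0.3), (5.0, 0.8)]

def data_costs(Z, params):
    """Per-pixel negative log-likelihood D^(t)(i,j) for each candidate label t.
    Additive constants of the Gaussian are dropped (they do not change the argmin)."""
    Z = np.asarray(Z, dtype=float)
    return np.stack([(Z - mu) ** 2 / (2 * sig ** 2) for mu, sig in params])

Z = np.array([[1.05, 2.4],
              [4.9,  1.0]])
D = data_costs(Z, params)          # shape: (labels, m, n)
labels = D.argmin(axis=0)          # initial labelling before alpha-expansion
assert labels.tolist() == [[0, 1], [2, 0]]
```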

Bounding Boxes

Usually, the data contained in the segments  $ \left \{\mathcal{C}^{(t)}\right \}_{t=1}^r$ do not form a complete rectangle. However, the data must be arranged in a full grid so that the FGA algorithm can be used. To do so, we define for each segment  $ \mathcal{C}^{(t)}$ a bounding box  $ \mathcal{B}^{(t)}$ which is, intuitively, a rectangular subdomain of  $ [k^x_{-k},k^x_{g+k+1}] \times [k^y_{-l},k^y_{h+l+1}]$ of minimal size that includes the data contained in  $ \mathcal{C}^{(t)}$. In other words, $ \mathcal{B}^{(t)}$ is the smallest rectangular domain  $ \llbracket \beta^{(t)}, \delta^{(t)} \rrbracket \times \llbracket \gamma^{(t)}, \eta^{(t)} \rrbracket $ (with $ \beta^{(t)},\delta^{(t)} \in M$ and $ \gamma^{(t)},\eta^{(t)} \in N$) that contains the non-rectangular domain  $ \mathcal{I}^{(t)}$ defined as follows:

$\displaystyle \mathcal{I}^{(t)} = \bigcup_{ \substack{ (i,j) \text{ s.t. } \exists (u,v) \in \mathcal{C}^{(t)} \\ \text{with } (u,v) \in \mathrm{infl}(p_{i,j}) } }{\mathrm{infl}(p_{i,j})} \qquad \text{for $t \in \llbracket 1,r\rrbracket$},$ (4.107)

where $ \mathrm{infl}(p_{i,j})$ is the domain of influence of the weight $ p_{i,j}$, i.e. :

$\displaystyle \mathrm{infl}(p_{i,j}) = [k_j^x,k_{j+k+1}^x] \times [k_i^y,k_{i+l+1}^y].$ (4.108)


Local Depth Maps

Except in some exceptional cases, the data contained in the segment  $ \mathcal{C}^{(t)}$ do not completely fill the bounding box  $ \mathcal{B}^{(t)}$. Having a dataset organized as a complete grid is, however, a requirement of the FGA algorithm. We thus define for each segment  $ \mathcal{C}^{(t)}$ a local depth map, denoted  $ \mathsf{Z}^{(t)}$, which is a rectangular completion of the data contained in  $ \mathcal{C}^{(t)}$. For all  $ t \in \llbracket 1,r \rrbracket $, the local depth map  $ \mathsf{Z}^{(t)} \in \mathbb{R}^{m \times n}$ is defined as:

$\displaystyle z^{(t)}_{i,j} = \begin{cases} z_{i,j} & \text{if $(i,j)\in \mathcal{C}^{(t)}$} \\ 0 & \text{if $(i,j) \notin \mathcal{B}^{(t)}$} \\ \hat{z}^{(t)}_{i,j} & \text{otherwise.} \end{cases}$ (4.109)

The values  $ \hat{z}^{(t)}_{i,j}$ are computed by recursively propagating the depths located on the border of the segment  $ \mathcal{C}^{(t)}$. The details of this operation are given in algorithm 6. Note that the local depth maps  $ \mathsf{Z}^{(t)}$ all have the same size as the initial depth map  $ \mathsf{Z}$ but are non-zero only over  $ \mathcal{B}^{(t)}$. This makes the merging step easier.


[Algorithm 6: Computation of the local depth maps. The unknown depths $ \hat{z}^{(t)}_{i,j}$ are obtained by recursively propagating the depths located on the border of $ \mathcal{C}^{(t)}$ until the bounding box $ \mathcal{B}^{(t)}$ is filled.]
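A minimal stand-in for this propagation (not the exact algorithm 6, whose details are lost in this rendering: here unknown cells repeatedly take the average of their already-known 8-neighbours until the box is filled):

```python
import numpy as np

def fill_local_depth_map(Z, mask):
    """Fill unknown cells (mask == False) by iteratively propagating known
    depths to 8-connected neighbours, averaging the known neighbours."""
    Zf = np.where(mask, Z, 0.0).astype(float)
    known = mask.copy()
    m, n = Z.shape
    while not known.all():
        new_known = known.copy()
        Zn = Zf.copy()
        for i in range(m):
            for j in range(n):
                if known[i, j]:
                    continue
                neigh = [(i + di, j + dj)
                         for di in (-1, 0, 1) for dj in (-1, 0, 1)
                         if (di or dj) and 0 <= i + di < m and 0 <= j + dj < n
                         and known[i + di, j + dj]]
                if neigh:
                    Zn[i, j] = np.mean([Zf[u, v] for u, v in neigh])
                    new_known[i, j] = True
        known, Zf = new_known, Zn
    return Zf

Z = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
mask = Z > 0
filled = fill_local_depth_map(Z, mask)
assert (filled > 0).all()                   # the whole box is filled
assert np.allclose(filled[mask], Z[mask])   # known depths are untouched
```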

Local Fittings

The fourth step of our approach consists in fitting a surface to each one of the $ r$ depth maps  $ \mathsf{Z}^{(t)}$. This is achieved using the FGA algorithm of section 4.3.1. The use of the FGA algorithm is possible because all the requirements are fulfilled: the local depth maps are complete rectangular grids and, moreover, the noise is homogeneous inside each one of the segments. We use the following parameters for the FGA algorithm:

$\displaystyle \bar M^{(t)}$ $\displaystyle = M \cap [ \beta^{(t)},\delta^{(t)} ],$ $\displaystyle \bar A^{(t)}$ $\displaystyle = A \cap [ \beta^{(t)},\delta^{(t)} ],$    
$\displaystyle \bar L^{(t)}$ $\displaystyle = L \cap [ \gamma^{(t)},\eta^{(t)} ],$ $\displaystyle \bar B^{(t)}$ $\displaystyle = B \cap [ \gamma^{(t)},\eta^{(t)} ].$    

The result of this step is a collection of $ r$ matrices of control points  $ \mathsf{P}^{(t)} \in \mathbb{R}^{(g+k+1)\times(h+l+1)}$. Note that  $ p_{i,j}^{(t)} = 0$ if there is no pixel  $ (u,v) \in \mathcal{B}^{(t)}$ such that $ (u,v) \in \mathrm{infl}(p_{i,j})$. The regularization parameter for the $ t$th depth map, denoted  $ \alpha^{(t)}$, is computed using the data contained in  $ \mathcal{C}^{(t)}$ only, not all the data from  $ \mathsf{Z}^{(t)}$.

Merging

The last step of our approach consists in merging all the local fittings in order to get a surface that fits the whole depth map  $ \mathsf{Z}$. To do so, we build a B-spline by taking the control points computed during the previous step. For  $ t \in \llbracket 1,r \rrbracket $, let  $ \mathsf{Q}^{(t)} \in \mathbb{R}^{(g+k+1) \times (h+l+1)}$ be the matrix that indicates the control points of `interest' for the segment  $ \mathcal{C}^{(t)}$, i.e. the ones that have an influence region intersecting with the data contained in  $ \mathcal{C}^{(t)}$:

$\displaystyle q^{(t)}_{i,j} = \begin{cases} 1 & \text{if $\exists (u,v)\in \mathcal{C}^{(t)}$ s.t. $(u,v) \in \mathrm{infl}(p_{i,j})$} \\ 0 & \text{otherwise.} \end{cases}$    

The set of interesting control points is then reduced so that there are no conflicts between the control points coming from different local fittings. This is achieved by eroding the binary matrix  $ \mathsf{Q}^{(t)}$:

$\displaystyle \hat{\mathsf{Q}}^{(t)} = \mathsf{Q}^{(t)} \ominus \mathsf{S}_2,$ (4.110)

where $ \mathsf{S}_2$ is the following structuring element:

$\displaystyle \mathsf{S}_2 = \begin{pmatrix}1 & 1 & 1 & 1 & 0  1 & 1 & 1 & 1 & 0  1 & 1 & 1 & 1 & 0  1 & 1 & 1 & 1 & 0  0 & 0 & 0 & 0 & 0 \end{pmatrix}.$ (4.111)
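Binary erosion by this (asymmetric) structuring element keeps a control point only if every nonzero offset of $\mathsf{S}_2$, centred on it, lands on another control point of interest. A self-contained sketch (a plain-numpy erosion rather than a library call, to make the convention explicit):

```python
import numpy as np

def erode(Q, S):
    """Binary erosion of Q by structuring element S (centre = middle cell of S)."""
    m, n = Q.shape
    a, b = S.shape
    ci, cj = a // 2, b // 2
    out = np.zeros_like(Q)
    for i in range(m):
        for j in range(n):
            ok = True
            for u in range(a):
                for v in range(b):
                    if S[u, v]:
                        ii, jj = i + u - ci, j + v - cj
                        if not (0 <= ii < m and 0 <= jj < n and Q[ii, jj]):
                            ok = False
            out[i, j] = ok
    return out

S2 = np.array([[1, 1, 1, 1, 0],
               [1, 1, 1, 1, 0],
               [1, 1, 1, 1, 0],
               [1, 1, 1, 1, 0],
               [0, 0, 0, 0, 0]])

Q = np.ones((8, 8), dtype=int)
Qe = erode(Q, S2)
# With S2, cell (i,j) survives only if rows i-2..i+1 and cols j-2..j+1 lie
# inside the grid, i.e. i and j in 2..6 on an 8x8 all-ones grid.
assert Qe.sum() == 25
```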

Finally, the surface that fits the whole depth map  $ \mathsf{Z}$ is the B-spline having for control points the matrix  $ \mathsf{P}^\star$:

$\displaystyle \mathsf{P}^\star = \sum_{t=1}^r \bar{\mathsf{P}}^{(t)},$ (4.112)

$\displaystyle \textrm{with }\bar p^{(t)}_{i,j} = \begin{cases}p^{(t)}_{i,j} & \text{if $\hat{q}^{(t)}_{i,j} =1 $}  0 & \text{otherwise.} \end{cases}$ (4.113)


Experiments


Standard Deviation Estimator

In this experiment, we assess the precision of the standard deviation estimator described in section 4.3.1.4. To do so, we randomly generate surfaces as linear combinations of elementary functions (sine waves, exponentials, polynomials). Depth maps are then obtained by sampling these surfaces on a grid of size $ 120 \times 160$. Additive normally-distributed noise with standard deviation $ \sigma$ is added to the depth maps. The estimated standard deviation $ \sigma_e$ is finally compared to the expected value $ \sigma$ by computing the relative error between them. The estimator is tested for several levels of noise ranging from $ 1\%$ to $ 21\%$ of the maximal data amplitude. The estimation is repeated for 100 surfaces at each noise level. We obtain a relative error that is consistently smaller than $ 6\%$.

Grid Approach

This experiment shows the interest of exploiting the tensor properties of the B-spline model instead of the direct approach when fitting a surface to data arranged on an orthogonal grid. Figure 4.26 shows the computation times required by the two approaches as a function of the number of data points (with slightly fewer data points than control points). It is obvious from this illustration that the grid approach is much faster than the direct approach. The grid approach is in fact compulsory if we want to fit surfaces in a reasonable amount of time on large datasets. Moreover, the memory required by the direct approach is much larger than for the grid approach. On our test machine, the computations were intractable for the standard approach with more than 1500 data points.

Figure 4.26: Illustration of the performance gain obtained when using a grid approach instead of the standard one to fit surfaces on range data.
Image speedup

Qualitative Results

Figure 4.27 shows some representative results of our surface fitting algorithm. These results are compared to the ones obtained with a global regularization scheme. Since the discrepancy principle is only usable on data with homogeneous noise, we used the generalized cross validation principle (193) to compute the regularization parameter with the global regularization scheme. It is clear from figure 4.27 that our approach applies the proper amount of regularization while preserving the sharp edges present in the observed scene. In contrast, a global regularization scheme leads to an overly smoothed surface that does not reflect the scene geometry well.

Figure 4.27: First row: a view of the scene taken with a CMOS sensor. Second row: raw depth maps. Notice the high level of noise present in the background. Third row: surface fitted using our algorithm. Last row: surface fitted using a global approach. With this approach, we can see that the object boundaries are over-smoothed compared to the results obtained with our algorithm.

Quantitative Results

In this experiment, we compare the quality of the surfaces fitted using our algorithm described in section 4.3.2 with the ones obtained using a global regularization scheme. This is achieved by comparing the fitted surface with a ground truth surface. To do so, we used a set of 214 range images publicly available at (44). These images were acquired using an Odetics LADAR camera and, as a consequence, the amount of noise is quite limited. These datasets can thus be used as ground truth surfaces. Let  $ \bar{\mathsf{Z}}$ be a range image acquired with the LADAR camera. The different algorithms are tested using datasets obtained by adding a normally-distributed noise to the LADAR range images with a standard deviation proportional to the depth. Let  $ \mathsf{Z}$ be the noisy dataset:

$\displaystyle z_{i,j} = \bar{z}_{i,j} + e_{i,j},$ (4.114)

where $ e_{i,j}$ is a random variable drawn from a normal distribution  $ \mathcal{N}(0,\sigma_{i,j})$ with $ \sigma_{i,j} \propto \bar{z}_{i,j}$. Let $ f_o$ and $ f_g$ be the surfaces fitted to the range image  $ \mathsf{Z}$ with our algorithm and a global regularization scheme respectively. Let  $ \hat{\mathsf{Z}}^o$ and  $ \hat{\mathsf{Z}}^g$ be the depth maps predicted with $ f_o$ and $ f_g$ respectively. The quality of the fitted surfaces is assessed by comparing the predicted depth maps  $ \hat{\mathsf{Z}}^o$ and  $ \hat{\mathsf{Z}}^g$ with the ground truth one  $ \bar{\mathsf{Z}}$. The results, averaged over the 214 range images of the dataset, are reported in figure 4.28 for different levels of noise. It shows that our approach is, on average, $ 20\%$ better than the one using a global regularization scheme, regardless of the noise intensity.

Figure 4.28: Discrepancy between ground truth range images and the ones predicted with the fitted surface using our algorithm (dark grey) and a global regularization scheme (light grey). The indicated level of noise corresponds to the standard deviation of a normally-distributed noise for the deepest point and relative to the total depth of the scene. The surfaces fitted with our approach are better than the ones fitted using a global regularization scheme.
Image quantitative


Contributions to Parametric Image Registration and 3D Surface Reconstruction (Ph.D. dissertation, November 2010) - Florent Brunet