
Notation, First Definitions

Although most of our notation complies with the international standard ISO 31-11 (183), we find it appropriate to detail the notation most commonly used in this manuscript. We also give some basic definitions concerning, for instance, standard operators and matrices.

Basic Notations

Scalars are denoted by italic Roman lowercase letters (e.g. $ x$) or, occasionally, by italic Greek lowercase letters (e.g. $ \alpha$). Vectors are written using bold fonts (e.g. $ \mathbf{p}$) and are considered as column vectors. Sans-serif fonts are used for matrices (e.g. $ \mathsf{M}$). The elements of a vector are denoted using the same letter as the vector but in an italic Roman variant; the same remark holds for matrices. Full details on the notation and tools for vectors and matrices will be given in section 2.1.4.

Functions

A function consists of an input space $ I$, an output space $ O$, and a mapping between the elements of these spaces. A function is said to be monovariate when $ \dim(I) = 1$ and multivariate when $ \dim(I) > 1$. In the same way, a function is said to be scalar-valued when $ \dim(O)=1$ and vector-valued when $ \dim(O) > 1$. We use lowercase italic letters for scalar-valued functions (e.g. $ f$) and calligraphic fonts for vector-valued functions (e.g. $ \mathcal{W}$). From time to time, other typographic conventions are used to denote functions, depending on the context. A function $ \mathcal{F}$ is declared using the notation $ \mathcal{F} : I \rightarrow O$. The mapping between an element $ \mathbf{x} \in I$ and its image $ \mathcal{F}(\mathbf{x}) \in O$ is denoted $ \mathbf{x} \mapsto \mathcal{F}(\mathbf{x})$. The vector $ \mathbf{x}$ is called the free variable (and it can be replaced by any other notation). The complete definition of a function is written:

\begin{displaymath}\begin{array}{rcl}
\mathcal{F} : I & \longrightarrow & O \\
\mathbf{x} & \longmapsto & \mathcal{F}(\mathbf{x}).
\end{array}\end{displaymath} (2.1)
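
For instance, with this convention, the function that maps a vector of $ \mathbb{R}^2$ to the sum of its squared coefficients is written:

$\displaystyle \begin{array}{rcl} f : \mathbb{R}^2 & \longrightarrow & \mathbb{R} \\ \mathbf{x} & \longmapsto & x_1^2 + x_2^2. \end{array}$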

Differential operators.

Let  $ f : \mathbb{R}^m \rightarrow \mathbb{R}$ be a scalar-valued function and let  $ \mathbf{x} \in \mathbb{R}^m$ be the free variable. The partial derivative of $ f$ with respect to $ x_i$ is denoted by  $ \frac{\partial f}{\partial x_i}$. The gradient of $ f$ evaluated at  $ \mathbf{p}$ is denoted  $ \boldsymbol{\nabla}_{\mathbf{p}} f(\mathbf{p})$. It is considered as a column vector:

$\displaystyle \boldsymbol{\nabla}_{\mathbf{p}} f(\mathbf{p}) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(\mathbf{p}) \\ \vdots \\ \frac{\partial f}{\partial x_m}(\mathbf{p}) \end{pmatrix}.$ (2.2)

In practice, the dependency on  $ \mathbf{p}$ is omitted when it is obvious from the context. Consequently, $ \boldsymbol{\nabla}_{\mathbf{p}} f(\mathbf{p})$ is often shortened to  $ \boldsymbol{\nabla}f(\mathbf{p})$ or even  $ \boldsymbol{\nabla}f$.
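
For instance, for the bivariate function $ f(\mathbf{x}) = x_1^2 + 3x_2$, we have:

$\displaystyle \boldsymbol{\nabla}f = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 2x_1 \\ 3 \end{pmatrix}.$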

For a vector-valued function  $ \mathcal {F} : \mathbb{R}^m \rightarrow \mathbb{R}^n$, the counterpart of the gradient is the Jacobian matrix, denoted  $ \boldsymbol{\mathsf{J}}_\mathcal {F}$. If the components of  $ \mathcal {F}$ are denoted  $ \{f_i\}_{i=1}^n$ then the Jacobian matrix is defined as:

$\displaystyle \boldsymbol{\mathsf{J}}_\mathcal{F}(\mathbf{p}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(\mathbf{p}) & \cdots & \frac{\partial f_1}{\partial x_m}(\mathbf{p}) \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1}(\mathbf{p}) & \cdots & \frac{\partial f_n}{\partial x_m}(\mathbf{p}) \end{pmatrix} \in \mathbb{R}^{n \times m}.$ (2.3)

As for the gradient, the point where the Jacobian matrix is evaluated is omitted when it is clear from the context.
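
Note that the $ i$th row of the Jacobian matrix is the transposed gradient of the component $ f_i$. For instance, for the function $ \mathcal{F} : \mathbb{R}^2 \rightarrow \mathbb{R}^2$ with components $ f_1(\mathbf{x}) = x_1 x_2$ and $ f_2(\mathbf{x}) = x_1 + x_2$, we have:

$\displaystyle \boldsymbol{\mathsf{J}}_\mathcal{F} = \begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix}.$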

The Hessian matrix of a scalar-valued function $ f : \mathbb{R}^m \rightarrow \mathbb{R}$ is the matrix of its second-order partial derivatives. This matrix is denoted $ \boldsymbol{\mathsf{H}}_f$ and it is defined by:

$\displaystyle \boldsymbol{\mathsf{H}}_f(\mathbf{p}) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2}(\mathbf{p}) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_m}(\mathbf{p}) \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_m \partial x_1}(\mathbf{p}) & \cdots & \frac{\partial^2 f}{\partial x_m^2}(\mathbf{p}) \end{pmatrix} \in \mathbb{R}^{m \times m}.$ (2.4)

As for the gradient and the Jacobian matrix, the notation $ \boldsymbol{\mathsf{H}}_f$ is used in place of $ \boldsymbol{\mathsf{H}}_f(\mathbf{p})$ when the vector $ \mathbf{p}$ is obvious from the context.
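
For instance, the Hessian matrix of the bivariate function $ f(\mathbf{x}) = x_1^2 x_2$ is:

$\displaystyle \boldsymbol{\mathsf{H}}_f = \begin{pmatrix} 2x_2 & 2x_1 \\ 2x_1 & 0 \end{pmatrix}.$

It is symmetric, as is always the case when the second-order partial derivatives of $ f$ are continuous (Schwarz's theorem).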

Sets and Collections

Sets are usually written using upper-case letters (e.g. $ S$). The usual sets of numbers are denoted using the blackboard font: $ \mathbb{N}$ for the natural numbers, $ \mathbb{Z}$ for the integers, $ \mathbb{Q}$ for the rational numbers, $ \mathbb{R}$ for the real numbers, and $ \mathbb{C}$ for the complex numbers. The explicit definition of a set is denoted using curly brackets (e.g. $ A = \{1,2,\ldots,10\} = \{i \:\vert\:i=1,\ldots,10\} = \{i\}_{i=1}^{10}$). The vertical bar in a set definition stands for the expression `such that' (often abbreviated `s.t.'). Following the Anglo-Saxon convention, we consider that $ \mathbb{N}= \{ 1, 2, \ldots \}$ and $ \mathbb{N}^* = \{0, 1, 2, \ldots \}$, while $ \mathbb{R}$ is the set of all the real numbers and $ \mathbb{R}^* = \mathbb{R}\smallsetminus \{0\}$. The set of all the positive (respectively negative) real numbers is denoted $ \mathbb{R}_+$ (respectively $ \mathbb{R}_-$). The Cartesian product of two sets is designated using the $ \times$ symbol, i.e. for two sets $ A$ and $ B$, we have $ A \times B = \left\{(a,b) \:\vert\:a \in A \textrm{ and } b \in B \right\}$. The notation $ A^n$ represents the $ n$-fold Cartesian product of $ A$ with itself. The symbols used for the intersection, the union, and the set difference are respectively $ \cap$, $ \cup$, and $ \smallsetminus$.

Real intervals are denoted using brackets: $ [a,b]$ is the set of all the real numbers $ x$ such that  $ a \leq x \leq b$. The scalars $ a$ and $ b$ are called the endpoints of the interval. We use outwards-pointing brackets to indicate the exclusion of an endpoint: for instance, $ ]a,b] = \{ x \in \mathbb{R}\:\vert\:a < x \leq b \}$.

Integer intervals (also known as discrete intervals) are denoted using either the double bracket notation or the `three dots' notation. For instance, the integer interval  $ \{-1, 0, 1, 2\}$ may be written $ \llbracket -1,2 \rrbracket $ or  $ \{-1, \ldots, 2\}$.

Collections, i.e. groupings of heterogeneous or `complicated' elements such as point correspondences, are denoted using fraktur fonts (e.g. $ \mathfrak{D}$).


Matrices and Vectors

Matrices are denoted using sans-serif fonts (e.g. $ \mathsf{M}$). Although a vector is a special matrix, we use bold symbols for vectors (e.g. $ \mathbf{p}$ or $ \mathbold{\beta}$). By default, vectors are considered as column vectors. The set of all the matrices defined over $ \mathbb{R}$ and of size $ m \times n$ is denoted $ \mathbb{R}^{m \times n}$. The transpose, the inverse, and the pseudo-inverse of a matrix $ \mathsf{A}$ are respectively denoted $ \mathsf{A}^\mathsf{T}$, $ \mathsf{A}^{-1}$, and $ \mathsf{A}^\dagger$. For a matrix $ \mathsf{A}$ with full column rank, the pseudo-inverse is given by $ \mathsf{A}^\dagger = \left( \mathsf{A}^\mathsf{T}\mathsf{A}\right)^{-1}\mathsf{A}^\mathsf{T}$ (see section 2.2.2.6). The coefficient located at the intersection of the $ i$th row and the $ j$th column of the matrix $ \mathsf{A}$ is denoted $ a_{i,j}$. The coefficients of a vector are denoted using the same letter with the bold removed. For instance, the $ i$th coefficient of the vector $ \mathbold{\beta}$ is written $ \beta_i$.
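
As an illustration of the pseudo-inverse formula, consider the following matrix with full column rank:

$\displaystyle \mathsf{A} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \quad \Rightarrow \quad \mathsf{A}^\dagger = \left( \mathsf{A}^\mathsf{T}\mathsf{A}\right)^{-1}\mathsf{A}^\mathsf{T} = \frac{1}{3} \begin{pmatrix} 2 & -1 & 1 \\ -1 & 2 & 1 \end{pmatrix}.$

One can check that $ \mathsf{A}^\dagger \mathsf{A} = \mathbf{I}_2$ while $ \mathsf{A} \mathsf{A}^\dagger \neq \mathbf{I}_3$: the pseudo-inverse is only a left inverse of $ \mathsf{A}$.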

Parentheses and brackets.

We use either parentheses or square brackets when giving the explicit form of a matrix. Parentheses are used when the elements are scalars, e.g.:

$\displaystyle \mathsf{A} = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \end{pmatrix}.$ (2.5)

The bracket notation is used when the matrix is defined with `blocks', i.e. juxtaposition of matrices, vectors, and scalars. For instance:

$\displaystyle \mathsf{B} = \begin{bmatrix} \mathsf{A} & 2 \mathsf{A} \\ 3 \mathsf{A} & 4 \mathsf{A} \end{bmatrix}.$ (2.6)

Common matrices.

The identity matrix of size  $ n \times n$ is denoted  $ \mathbf{I}_n$:

$\displaystyle \mathbf{I}_n = \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \ldots & 0 & 1 \end{pmatrix} \in \mathbb{R}^{n \times n}.$ (2.7)

The matrix of size  $ m \times n$ filled with zeros is denoted  $ \mathbf{0}_{m \times n}$. The subscripts in the notation  $ \mathbf{I}_n$ and  $ \mathbf{0}_{m \times n}$ are often omitted when the size can be easily deduced from the context.

Common operators.

The operator $ \mathrm{vect}$ is used for the column-wise vectorization of a matrix. For instance, if  $ \mathsf{A} \in \mathbb{R}^{m \times n}$:

$\displaystyle \mathrm{vect}(\mathsf{A}) = \begin{bmatrix} \mathbf{a}_1 \\ \vdots \\ \mathbf{a}_n \end{bmatrix} \qquad \textrm{with} \qquad \mathsf{A} = \begin{bmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_n \end{bmatrix}.$ (2.8)
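
For instance, with a matrix of size $ 2 \times 2$:

$\displaystyle \mathrm{vect}\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix}^\mathsf{T}.$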

The operator $ \mathrm{diag}$ deals with diagonal matrices. Its effect is similar to that of the diag function in Matlab. Applied to a vector $ \mathbf{d} \in \mathbb{R}^n$, it builds a matrix $ \mathsf{D} \in \mathbb{R}^{n \times n}$ such that:

$\displaystyle \mathsf{D} = \begin{pmatrix} d_1 & 0 & \ldots & 0 \\ 0 & d_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \ldots & 0 & d_n \end{pmatrix}.$ (2.9)

Conversely, when applied to a square matrix  $ \mathsf{M} \in \mathbb{R}^{n \times n}$, the operator $ \mathrm{diag}$ builds a vector that contains the diagonal coefficients of  $ \mathsf{M}$:

$\displaystyle \mathrm{diag}(\mathsf{M}) = \begin{pmatrix} m_{1,1} & \ldots & m_{n,n} \end{pmatrix}^\mathsf{T}.$ (2.10)
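
For instance, applying $ \mathrm{diag}$ to a vector and then to the resulting matrix gives:

$\displaystyle \mathrm{diag}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \qquad \textrm{and} \qquad \mathrm{diag}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.$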

The Hadamard product.

The Hadamard product of two matrices, also known as the element-wise product, is denoted with the $ \odot$ symbol. The Hadamard product of the matrices  $ \mathsf{A}$ and  $ \mathsf{B}$ is the matrix  $ \mathsf{C} = \mathsf{A} \odot \mathsf{B}$ such that  $ c_{i,j} \stackrel{\mathrm{def}}{=} a_{i,j} b_{i,j}$. The matrices  $ \mathsf{A}$, $ \mathsf{B}$, and $ \mathsf{C}$ all have the same size.
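
For instance, with two matrices of size $ 2 \times 2$:

$\displaystyle \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \odot \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 5 & 12 \\ 21 & 32 \end{pmatrix}.$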

The Kronecker product.

The Kronecker product, denoted with the symbol $ \otimes$, is a binary operation on two matrices of arbitrary sizes. Let $ \mathsf{A} \in \mathbb{R}^{m_a \times n_a}$ and $ \mathsf{B} \in \mathbb{R}^{m_b \times n_b}$ be two matrices. The Kronecker product of $ \mathsf{A}$ and $ \mathsf{B}$ is the matrix of size $ m_a m_b \times n_a n_b$ defined as follows:

$\displaystyle \mathsf{A} \otimes \mathsf{B} = \begin{bmatrix} a_{1,1} \mathsf{B} & \cdots & a_{1,n_a} \mathsf{B} \\ \vdots & & \vdots \\ a_{m_a,1} \mathsf{B} & \cdots & a_{m_a,n_a} \mathsf{B} \end{bmatrix}.$ (2.11)
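
For instance, the Kronecker product of a matrix of size $ 2 \times 2$ and a matrix of size $ 1 \times 2$ is a matrix of size $ 2 \times 4$:

$\displaystyle \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \otimes \begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 2 \\ 0 & 3 & 0 & 4 \end{pmatrix}.$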

Vector norms.

The $ p$-norm of a vector  $ \mathbf{v} \in \mathbb{R}^n$ is denoted  $ \Vert \mathbf{v} \Vert_p$. It is defined for $ p \geq 1$ by:

$\displaystyle \Vert \mathbf{v} \Vert_p \stackrel{\mathrm{def}}{=} \left( \sum_{i=1}^n \vert v_i \vert^p \right)^\frac{1}{p}.$ (2.12)

Note that the 1-norm is also known as the taxicab norm or the Manhattan norm. The 2-norm corresponds to the Euclidean norm. In this case, we prefer the notation $ \Vert \mathbf{v} \Vert$ to the notation $ \Vert \mathbf{v} \Vert_2$:

$\displaystyle \Vert \mathbf{v} \Vert \stackrel{\mathrm{def}}{=} \sqrt{\sum_{i=1}^n v_i^2}.$ (2.13)

The maximum norm, also known as the infinity norm or uniform norm, is denoted  $ \Vert \mathbf{v} \Vert_\infty$. It is defined as:

$\displaystyle \Vert \mathbf{v} \Vert_\infty \stackrel{\mathrm{def}}{=} \max_{i \in \llbracket 1,n \rrbracket } \vert v_i \vert.$ (2.14)

Note that the maximum norm is the limit of the $ p$-norm as $ p \rightarrow \infty$.
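
For instance, with the vector $ \mathbf{v} = \begin{pmatrix} 3 & -4 \end{pmatrix}^\mathsf{T}$:

$\displaystyle \Vert \mathbf{v} \Vert_1 = \vert 3 \vert + \vert -4 \vert = 7, \qquad \Vert \mathbf{v} \Vert = \sqrt{3^2 + (-4)^2} = 5, \qquad \Vert \mathbf{v} \Vert_\infty = \max(\vert 3 \vert, \vert -4 \vert) = 4.$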

The Frobenius norm.

The Frobenius norm of a matrix  $ \mathsf{A} \in \mathbb{R}^{m \times n}$ is denoted  $ \Vert \mathsf{A} \Vert_\mathcal{F}$. It is defined as:

$\displaystyle \Vert \mathsf{A} \Vert_\mathcal{F}\stackrel{\mathrm{def}}{=} \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{i,j}^2}.$ (2.15)

The Frobenius norm of a matrix is related to the Euclidean norm of a vector in the sense that they are both defined as the square root of the sum of the squared coefficients. In fact, we have the following equality:

$\displaystyle \Vert \mathsf{A} \Vert_\mathcal{F}= \Vert \mathrm{vect}(\mathsf{A}) \Vert.$ (2.16)
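
For instance, with the matrix of size $ 2 \times 2$ used to illustrate the $ \mathrm{vect}$ operator:

$\displaystyle \left\Vert \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \right\Vert_\mathcal{F} = \sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{30} = \left\Vert \begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix}^\mathsf{T} \right\Vert.$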

Other Common Notation

Logic.

The symbol $ \forall$ means `for all' and the symbol $ \exists$ means `there exists'.

Limits.

The symbol $ \rightarrow$ means `tends to' (e.g. $ x \rightarrow \infty$ means that $ x$ tends to infinity).

Others.

Additional notation will be introduced where needed throughout the manuscript.

