Let’s continue our work on image alignment. We shall cover further details on image warping.

For what comes next, we’ll work a bit in Python. Import the following packages :

import cv2
import numpy as np
from matplotlib import pyplot as plt

So far, we saw how to :

  • detect features (Harris corner detector, Laplacian of Gaussians for blobs, difference of Gaussians for fast approximation of the LOG)
  • the properties of the ideal feature
  • the properties of the feature descriptor
  • matching of local features
  • feature distance metrics

We’ll cover into further details image alignment.


I. Image Warping

There is a geometric relationship between these 2 images :


An image warping is a change of domain of an image : \(g(x) = f(h(x))\). This might include translation, rotation or aspect change. These changes are said to be global parametric warping: \(p' = T(p)\), since the transformation can easily be described by few parameters and is the same for every input point.


To build the transformed image, we usually apply an inverse-warping :

  • for every pixel \(x'\) in \(g(x')\) :
  • compute the source location \(x = \hat{h}(x')\)
  • resample \(f(x)\) at location \(x\) and copy to \(g(x')\)

This allows \(\hat{h}(x')\) to be defined for all pixels in \(g(x')\).

1. Linear transformations

We’ll now cover the different types of linear transformations that we can apply to an image using inverse-warping.

a. Uniform Scaling

Scaling by factor \(s\) :

\[S = \begin{pmatrix} s & 0 \\ 0 & s \end{pmatrix}\]


b. Rotation

Rotation by angle \(\theta\) :

\[R = \begin{pmatrix} cos \theta & - sin \theta \\ sin \theta & cos \theta \end{pmatrix}\]


c. 2D Mirror about the Y-axis

\[T = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\]

d. 2D Miror accross line \(y = x\)

\[T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\]

e. All 2D linear transformations

In summary, the linear transforms we can apply are :

  • scale
  • rotation
  • shear
  • mirror
\[\begin{pmatrix} x' \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}\]

The transformation should respect the following properties :

  • origin maps origin
  • lines map to lines
  • parallel lines remain parallel
  • ratios are preserved
  • closed under composition

2. Translation

The trick is to add one more coordinate to build homogenous image coordinates.


3. Affine transformation

An affine transformation is any transformation that combines linear transformations and translations. For example :

\[\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ w \end{pmatrix}\]

In affine transformations, the origin does not always have to map the origin.

4. Homography

The Homography transform is also called projective transformation or planar perspective map.

\[H = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{pmatrix}\]

Homographic transformations simply respect the following properties :

  • lines map to lines
  • closed under composition

The different transformations can be summarized this way :


With homographies, points at infinity become finite vanishing points.


II. Computing transformations

Given a set of matches between images \(A\) and \(B\), we must find the transform \(T\) that best agrees with the matches.


1. Translation

The displacement of match \(i\) is \((x_i' - x_i, y_i' - y_i)\) where \(x_i' = x_i + x_t\) and \(y_i' = y_i + y_t\). We want therefore to solve :

\[(x_t, y_t) = ( \frac {1} {n} \sum_i x_i' - x_i, \frac {1} {n} \sum_i y_i' - y_i )\]


We face an overdetermined system of equations, which can be solved by least squares.

\[r_{x_i}(x_t) = x_i + x_t - x_i'\] \[r_{y_i}(y_t) = y_i + y_t - y_i'\]

The goal is to minimize the sum of squared residuals :

\[C(x_t, y_t) = \sum_i (r_{x_i}(x_t)^2 + r_{y_i}(y_t)^2)\]

We can rewrite the problem in matrix form :


And the solution heads : \(t = (A^T A)^{-1} Ab\)

2. Affine transformation

\[\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ w \end{pmatrix}\]

We can write the residuals as :

\[r_{x_i}(a,b,c,d,e,f) = (ax_i + by_i + c) - x_i'\] \[r_{y_i}(a,b,c,d,e,f) = (dy_i + ey_i + f) - y_i'\]

And rewrite the cost function as :

\[C(a,b,c,d,e,f) = \sum_i ( r_{x_i}(a,b,c,d,e,f)^2 + r_{y_i}(a,b,c,d,e,f)^2 )\]

Which can be rewritten in matrix form as :


Let’s develop a more general formulation. We have \(x' = f(x,p)\) a parametric transformation. The Jacobian of the transformation \(f\) with respect to the motion parameters \(p\) determines the relationship between the amount of motion \(\Delta x = x' - x\) and the unknown parameters \(\Delta x = x' - x = J(x)p\) . We note that \(J = \frac {\delta f} {\delta p}\).


The sum of squared residuals is then :

\[E_{LLS} = \sum_i {\mid \mid J(x_i)p - \Delta x_i \mid \mid_2 }^2\]

The solution yields :

\(Ap = b\) where :

  • we define \(A = \sum_i J^T(x_i)J(x_i)\) the Hessian
  • and \(b = \sum_i J^T(x_i) \Delta x_i\)

Up to now, we considered only a perfect matching accuracy. It’s however only rarely the case. We can weight the least squares problem :

\(E_{WLS} = \sum_i {\Sigma_i}^2 \mid \mid r_i \mid \mid\) . If the \(\delta_i\) are fixed, the solution to apply is : \(p = ( \sum^T A^T A \Sigma)^-1 \sigma^T Ab\) with \(\Sigma\) a matix containing for each observation the noise level.

**Conclusion **: I hope this article on image alignment, transformations, and warping was helpful. Don’t hesitate to drop a comment if you have any question.