### Homography

In the pinhole model, points in space are projected onto the image plane (of the idealized camera sensor) by a linear transformation, provided we use homogeneous coordinates. We need the 3x4 camera matrix C, which is obtained by combining the camera intrinsics and extrinsics. The projection formula is

$\left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{cccc} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & C_{34} \end{array} \right) \left( \begin{array}{c} X \\ Y \\ Z \\ 1 \end{array} \right)$

To get the inhomogeneous coordinates $u, v$ on the image plane, we simply divide by $\tilde{w}$

$u = \frac{\tilde{u}}{\tilde{w}}, v = \frac{\tilde{v}}{\tilde{w}}$
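To make the projection concrete, here is a minimal NumPy sketch. The camera values (focal length, principal point, translation) and the helper name `project` are made up for illustration and are not part of the original post.

```python
import numpy as np

def project(C, point_3d):
    """Project a 3D point with a 3x4 camera matrix C and return (u, v)."""
    # Homogeneous world point (X, Y, Z, 1)
    X = np.append(np.asarray(point_3d, dtype=float), 1.0)
    u_t, v_t, w_t = C @ X
    # Divide by the third homogeneous coordinate
    return u_t / w_t, v_t / w_t

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
# Hypothetical extrinsics: no rotation, translation of 2 units along Z
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [2.0]])])
C = K @ Rt  # 3x4 camera matrix

u, v = project(C, [0.5, -0.25, 2.0])
print(u, v)
```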

We can use this fact to rewrite the projection as

$\left( \begin{array}{c} \lambda\tilde{u} \\ \lambda\tilde{v} \\ \lambda\tilde{w} \end{array} \right) = \left( \begin{array}{cccc} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & 1 \end{array} \right) \left( \begin{array}{c} X \\ Y \\ Z \\ 1 \end{array} \right)$

Here we chose $\lambda = 1/C_{34}$ (assuming $C_{34} \neq 0$), so that the last entry of the camera matrix becomes 1. All the other entries are rescaled by the same factor, and we keep calling them $C_{ij}$ to keep the notation light. Notice that we end up with the same $u, v$, because $\lambda$ cancels out:

$u = \frac{\lambda\tilde{u}}{\lambda\tilde{w}}, v = \frac{\lambda\tilde{v}}{\lambda\tilde{w}}$
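A quick numerical check of this scale invariance, as a sketch with a hypothetical camera matrix (the numbers and the helper `dehomogenize` are assumptions chosen for illustration):

```python
import numpy as np

# Hypothetical 3x4 camera matrix with C[2, 3] != 0
C = np.array([[800.0, 0.0, 320.0, 640.0],
              [0.0, 800.0, 240.0, 480.0],
              [0.0, 0.0, 1.0, 2.0]])

lam = 1.0 / C[2, 3]   # choose lambda so that the last entry becomes 1
C_scaled = lam * C

X = np.array([0.5, -0.25, 2.0, 1.0])  # homogeneous world point

def dehomogenize(p):
    """Divide the first two coordinates by the third one."""
    return p[:2] / p[2]

uv = dehomogenize(C @ X)
uv_scaled = dehomogenize(C_scaled @ X)
print(uv, uv_scaled)  # both pairs should agree
```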

Now consider a plane in world space, and imagine we align the reference system so that Z=0 for all points on this plane.

For all the points lying on the plane, we can use a simplified version of the projection. The third column of the camera matrix becomes irrelevant (it is multiplied by Z=0), so only the first, second and fourth columns survive and we can write

$\left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{ccc} C_{11} & C_{12} & C_{14} \\ C_{21} & C_{22} & C_{24} \\ C_{31} & C_{32} & 1 \end{array} \right) \left( \begin{array}{c} X \\ Y \\ 1 \end{array} \right)$

The projection from a given plane in world space to the image plane of the camera is called a homography and can be expressed by a 3x3 matrix with 8 degrees of freedom (the last entry can always be set to one, due to scale invariance).
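As a sanity check, the sketch below drops the third column of a hypothetical camera matrix and verifies that a point with Z=0 projects to the same pixel either way. The matrix values are invented for the example.

```python
import numpy as np

# Hypothetical 3x4 camera matrix
C = np.array([[800.0, 0.0, 320.0, 640.0],
              [0.0, 800.0, 240.0, 480.0],
              [0.0, 0.0, 1.0, 2.0]])

# Keep columns 1, 2 and 4: the third one is always multiplied by Z = 0
H = C[:, [0, 1, 3]]
H = H / H[2, 2]  # normalize so that the last entry is 1

X, Y = 0.5, -0.25
full = C @ np.array([X, Y, 0.0, 1.0])   # full projection with Z = 0
planar = H @ np.array([X, Y, 1.0])      # simplified 3x3 projection

uv_full = full[:2] / full[2]
uv_planar = planar[:2] / planar[2]
print(uv_full, uv_planar)
```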

Now imagine a second camera, looking at the points on the same designated plane.

We could do something like the following: from the image plane of the first camera we could apply the inverse projection to get the points on the world plane, then project them onto the image plane of the second camera by using its own camera matrix. In other words, we could combine the two transformations and obtain a single homography H which maps co-planar points (in world space) from one image to the other. Mind that this change of perspective works only for points on that same plane: for points off the plane, the mapping between the two images depends on each point's depth, so in general a single 3x3 matrix is not enough.

The mapping equation for the homography will be

$\left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{ccc} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & 1 \end{array} \right) \left( \begin{array}{c} u' \\ v' \\ 1 \end{array} \right)$
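The composition described above can be sketched as follows. `H_a` and `H_b` are hypothetical plane-to-image homographies for the two cameras, and `apply_h` is a helper introduced only for this example.

```python
import numpy as np

def apply_h(H, pt):
    """Apply a 3x3 homography to a 2D point and dehomogenize."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

# Hypothetical plane-to-image homographies of the two cameras
H_a = np.array([[400.0, 0.0, 320.0],
                [0.0, 400.0, 240.0],
                [0.0, 0.0, 1.0]])
H_b = np.array([[350.0, 20.0, 300.0],
                [-10.0, 360.0, 250.0],
                [0.05, 0.02, 1.0]])

# Image b -> world plane -> image a, combined into a single matrix
H = H_a @ np.linalg.inv(H_b)
H = H / H[2, 2]  # fix the scale so that the last entry is 1

plane_pt = (0.5, -0.25)          # a point on the world plane (X, Y)
pt_a = apply_h(H_a, plane_pt)    # where camera a sees it
pt_b = apply_h(H_b, plane_pt)    # where camera b sees it
pt_a_from_b = apply_h(H, pt_b)   # mapped directly from image b to image a
print(pt_a, pt_a_from_b)
```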

### Homography calculation

Even if H is the result of combining two camera matrices, you don’t need to know them to calculate H. There are 8 degrees of freedom, so you need just 8 equations for the 8 unknowns. Each pair of corresponding points gives two equations, so 4 pairs of corresponding points in the two image planes are enough.

How do we obtain those equations? We have $\mathbf{x_2} \sim H\mathbf{x_1}$ for points on a plane in world space. Written explicitly:

$\left( \begin{array}{c} \tilde{x}_2 \\ \tilde{y}_2 \\ \tilde{z}_2 \end{array} \right) = \left( \begin{array}{ccc} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & 1 \end{array} \right) \left( \begin{array}{c} x_1 \\ y_1 \\ 1 \end{array} \right)$

Taking into account the fact that the inhomogeneous coordinates are

$x_2=\frac{\tilde{x}_2}{\tilde{z}_2}, y_2=\frac{\tilde{y}_2}{\tilde{z}_2}$

then we can write

$x_2 = \frac{H_{11} x_1 + H_{12}y_1 + H_{13}}{H_{31}x_1 + H_{32}y_1 + 1}$ $y_2 = \frac{H_{21} x_1 + H_{22}y_1 + H_{23}}{H_{31}x_1 + H_{32}y_1 + 1}$

Rearranging the above we get $\begin{array}{c} x_2(H_{31}x_1 + H_{32}y_1 + 1) = H_{11}x_1 + H_{12}y_1 + H_{13} \\ y_2(H_{31}x_1 + H_{32}y_1 + 1) = H_{21}x_1 + H_{22}y_1 + H_{23} \end{array}$

And then $\begin{array}{c} x_1x_2H_{31} + y_1x_2H_{32} + x_2 = H_{11}x_1 + H_{12}y_1 + H_{13} \\ x_1y_2H_{31} + y_1y_2H_{32} + y_2 = H_{21}x_1 + H_{22}y_1 + H_{23} \end{array}$

And then $\begin{array}{c} -x_1x_2H_{31} - y_1x_2H_{32} + x_1H_{11} + y_1H_{12} + H_{13} = x_2 \\ -x_1y_2H_{31} - y_1y_2H_{32} + x_1H_{21} + y_1H_{22} + H_{23} = y_2 \end{array}$

Now, if we define the following three vectors $\mathbf{a_x} = \left( \begin{array}{c} x_1 \\ y_1 \\ 1 \\ 0 \\ 0 \\ 0 \\ -x_1 x_2 \\ -y_1 x_2 \end{array} \right) , \mathbf{a_y} = \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ x_1 \\ y_1 \\ 1 \\ -x_1 y_2 \\ -y_1 y_2 \end{array} \right) , \mathbf{h} = \left( \begin{array}{c} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \end{array} \right)$

Then the above equations become simply $\begin{array}{c} \mathbf{a_x}^T\mathbf{h} = x_2 \\ \mathbf{a_y}^T\mathbf{h} = y_2 \end{array}$
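In code, the two vectors for one pair of points could be built like this. It is a small sketch: the function name `rows_for_pair` and the sample coordinates are assumptions made for the example.

```python
import numpy as np

def rows_for_pair(p1, p2):
    """Return (a_x, a_y) for one correspondence p1 = (x1, y1) -> p2 = (x2, y2)."""
    x1, y1 = p1
    x2, y2 = p2
    a_x = np.array([x1, y1, 1.0, 0.0, 0.0, 0.0, -x1 * x2, -y1 * x2])
    a_y = np.array([0.0, 0.0, 0.0, x1, y1, 1.0, -x1 * y2, -y1 * y2])
    return a_x, a_y

# Hypothetical correspondence: (2, 3) in the first image, (5, 7) in the second
a_x, a_y = rows_for_pair((2.0, 3.0), (5.0, 7.0))
print(a_x)
print(a_y)
```

Each correspondence contributes one `a_x` row and one `a_y` row to the system, with `x2` and `y2` as the matching right-hand sides.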

If we stack at least 8 equations, using at least 4 pairs of corresponding points (indexed here by a superscript $(i)$, with $i = 1, \dots, N$), then we can form the following linear system of equations

$\left( \begin{array}{c} \mathbf{a_x}^{(1)T} \\ \mathbf{a_y}^{(1)T} \\ \vdots \\ \mathbf{a_x}^{(N)T} \\ \mathbf{a_y}^{(N)T} \end{array} \right) \mathbf{h} = \left( \begin{array}{c} x_2^{(1)} \\ y_2^{(1)} \\ \vdots \\ x_2^{(N)} \\ y_2^{(N)} \end{array} \right)$

Well, we are basically done, as we can use tf.matrix_solve_ls (available as tf.linalg.lstsq in TensorFlow 2) to solve the linear system in the least-squares sense.
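For readers without TensorFlow at hand, here is a minimal sketch of the same idea using NumPy's `np.linalg.lstsq` instead of the TensorFlow solver. The function name `find_homography` and the test points are assumptions for illustration, and this is not the code from the original notebook.

```python
import numpy as np

def find_homography(src, dst):
    """Estimate H (with H[2, 2] = 1) such that dst ~ H @ src, from N >= 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or len(src) < 4:
        raise ValueError("need two (N, 2) arrays with N >= 4")
    rows, rhs = [], []
    for (x1, y1), (x2, y2) in zip(src, dst):
        rows.append([x1, y1, 1.0, 0.0, 0.0, 0.0, -x1 * x2, -y1 * x2])  # a_x
        rows.append([0.0, 0.0, 0.0, x1, y1, 1.0, -x1 * y2, -y1 * y2])  # a_y
        rhs.extend([x2, y2])
    A = np.array(rows)
    b = np.array(rhs)
    # Least-squares solution of A h = b (exact when N = 4 and A is invertible)
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical ground truth, used only to generate matching points
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
proj = (H_true @ np.column_stack([src, np.ones(len(src))]).T).T
dst = proj[:, :2] / proj[:, 2:3]

H_est = find_homography(src, dst)
print(np.round(H_est, 4))
```

With exactly 4 pairs in general position (no three points on a line) the system is square and the solution is exact; with more pairs, the least-squares fit averages out noise in the point coordinates.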

I also created a Jupyter notebook on GitHub as an example.

That’s all folks.