minimalia blog

Homography for tensorflow

9 min read · tagged neural network, tensorflow, homography, computer vision



In the pinhole model, if we care to use homogeneous coordinates, the points in space are projected on the image plane (of the idealized camera sensor) using a simple linear transformation. We would need the 3x4 camera matrix C, which is obtained by putting the camera intrinsic and extrinsic together. The formula for projection will be

(u~v~w~)=(C11C12C13C14C21C22C23C24C31C32C33C34)(XYZ1)\left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{cccc} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & C_{34} \end{array} \right) \left( \begin{array}{c} X \\ Y \\ Z \\ 1 \end{array} \right)

To get the non-homogeneous coordinates u,vu, v on the image plane, we rescale everything by w~\tilde{w}

u=u~w~,v=v~w~u = \frac{\tilde{u}}{\tilde{w}}, v = \frac{\tilde{v}}{\tilde{w}}

We can use this fact to rewrite the projection as

(λu~λv~λw~)=λ(C11C12C13C14C21C22C23C24C31C32C331)(XYZ1)\left( \begin{array}{c} \lambda\tilde{u} \\ \lambda\tilde{v} \\ \lambda\tilde{w} \end{array} \right) = \lambda\left( \begin{array}{cccc} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & 1 \end{array} \right) \left( \begin{array}{c} X \\ Y \\ Z \\ 1 \end{array} \right)

Here we chose λ\lambda to have the last parameter of the camera matrix as 1. Notice we will end up again with the same u,vu, v (λ\lambda cancels out) as you can see

u=λu~λw~,v=λv~λw~u = \frac{\lambda\tilde{u}}{\lambda\tilde{w}}, v = \frac{\lambda\tilde{v}}{\lambda\tilde{w}}

Now consider a plane in the world space, and imagine we aligned the reference system in a way Z=0 for all points on this plane.

For all the points lying on the plane, we can use a simplified version of the projection. The third column of the camera matrix will be irrelevant (multiplied by Z=0 in fact). So we can simplify and write

(u~v~w~)=(C11C12C13C21C22C23C31C321)(XY1)\left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{ccc} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & 1 \end{array} \right) \left( \begin{array}{c} X \\ Y \\ 1 \end{array} \right)

The projection for a given plane in the world space to the image plane of the camera is called homography and can be expressed by a 3x3 matrix, with 8 degrees of freedom (the last parameter can always be one, due to scale invariance, so it is fixed).

Now imagine a second camera, looking at the points on the same designated plane

We could do something like the following:from the first camera’s image plane, we could apply an inverse projection to get the points on the world plane, then project them on the second camera image plane by using its camera matrix. We could, in other words, combine the two transformations and obtain a single homography H, which will project co-planar points (in the world space) from one image to the other. Mind that this change of perspective will work only for points on that same plane, as the general case would need a 4x4 matrix M.

The mapping equation for the homography will be

(u~v~w~)=(H11H12H13H21H22H23H31H321)(uv1)\left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{ccc} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & 1 \end{array} \right) \left( \begin{array}{c} u' \\ v' \\ 1 \end{array} \right)

Homography calculation

Even if H is the result of combining two camera matrices, you don’t need to know them to calculate H. There are 8 degrees of freedom, so you would need just 8 equations for the 8 unknowns. Those equations could be written by using 4 couples of corresponding points in the given images planes.

How do we obtain those equations? So, we have x1 Hx2\mathbb{x_1} \sim\ H\mathbb{x_2} for points on a plane in world space. Written explicitly:

(x~2y~2z~2)=(H11H12H13H21H22H23H31H321)(x1y11)\left( \begin{array}{c} \tilde{x}_2 \\ \tilde{y}_2 \\ \tilde{z}_2 \end{array} \right) = \left( \begin{array}{ccc} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & 1 \end{array} \right) \left( \begin{array}{c} x_1 \\ y_1 \\ 1 \end{array} \right)

Taking into account the fact that non-homogeneous coordinates are

x2=x~2z~2,y2=y~2z~2x_2=\frac{\tilde{x}_2}{\tilde{z}_2}, y_2=\frac{\tilde{y}_2}{\tilde{z}_2}

then we can write

x2=H11x1+H12y1+H13H31x1+H32y1+1x_2 = \frac{H_{11} x_1 + H_{12}y_1 + H_{13}}{H_{31}x_1 + H_{32}y_1 + 1} y2=H21x1+H22y1+H23H31x1+H32y1+1y_2 = \frac{H_{21} x_1 + H_{22}y_1 + H_{23}}{H_{31}x_1 + H_{32}y_1 + 1}

Rearranging the above, we get

x2(H31x1+H32y1+1)=H11x1+H12y1+H13y2(H31x1+H32y1+1)=H21x1+H22y1+H23\begin{array}{c} x_2(H_{31}x_1 + H_{32}y_1 + 1) = H_{11}x_1 + H_{12}y_1 + H_{13} \\ y_2(H_{31}x_1 + H_{32}y_1 + 1) = H_{21}x_1 + H_{22}y_1 + H_{23} \end{array}

And then

x1x2H31+y1x2H32+x2=H11x1+H12y1+H13x1y2H31+y1y2H32+y2=H21x1+H22y1+H23\begin{array}{c} x_1x_2H_{31} + y_1x_2H_{32} + x_2 = H_{11}x_1 + H_{12}y_1 + H_{13} \\ x_1y_2H_{31} + y_1y_2H_{32} + y_2 = H_{21}x_1 + H_{22}y_1 + H_{23} \end{array}

And then

x1x2H31y1x2H32+x1H11+y1H12+H13=x2x1y2H31y1y2H32+x1H21+y1H22+H23=y2\begin{array}{c} -x_1x_2H_{31} - y_1x_2H_{32} + x_1H_{11} + y_1H_{12} + H_{13} = x_2 \\ -x_1y_2H_{31} - y_1y_2H_{32} + x1H_{21} + y_1H_{22} + H_{23} = y_2 \end{array}

Now, if we define the following three vectors

ax=(x1y11000x1x2y1x2),ay=(000x1y11x1y2y1y2),h=(H11H12H13H21H22H23H31H32)\mathbf{a_x} = \left( \begin{array}{c} x_1 \\ y_1 \\ 1 \\ 0 \\ 0 \\ 0 \\ -x_1 x_2 \\ -y_1 x_2 \end{array} \right) , \mathbf{a_y} = \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ x_1 \\ y_1 \\ 1 \\ -x_1 y_2 \\ -y_1 y_2 \end{array} \right) , \mathbf{h} = \left( \begin{array}{c} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \end{array} \right)

Then the above equations become simply

axTh=x2ayTh=y2\begin{array}{c} \mathbf{a_x}^T\mathbf{h} = x_2 \\ \mathbf{a_y}^T\mathbf{h} = y_2 \end{array}

If we stack at least 8 equations, using 4 couples of corresponding points, then we can form the following linear system of equation

(ax1Tay1TaxNTayNT)h=(x2y2xNyN)\left( \begin{array}{c} \mathbf{a_{x_1}}^T \\ \mathbf{a_{y_1}}^T \\ \vdots \\ \mathbf{a_{x_N}}^T \\ \mathbf{a_{y_N}}^T \end{array} \right) \mathbf{h} = \left( \begin{array}{c} x_2 \\ y_2 \\ \vdots \\ x_N \\ y_N \end{array} \right)

Well, we are basically done, as we can use tf.matrix_solve_ls to solve the linear system. Hereafter the code

def ax(p, q):   
    return [ p[0], p[1], -1, 0, 0, 0, -p[0] * q[0], -p[1] * q[0] ]
def ay(p, q):
    return [ 0, 0, 0, p[0], p[1], 1, -p[0] * q[1], -p[1] * q[1] ]

def homography(x1s, x2s):

    p = []
    # we build matrix A by using only 4 point correspondence. The linear
    # system is solved with the least square method, so here 
    # we could even pass more correspondence 
    p.append(ax(x1s[0], x2s[0]))
    p.append(ay(x1s[0], x2s[0]))

    p.append(ax(x1s[1], x2s[1]))
    p.append(ay(x1s[1], x2s[1]))   
    p.append(ax(x1s[2], x2s[2]))
    p.append(ay(x1s[2], x2s[2]))
    p.append(ax(x1s[3], x2s[3]))
    p.append(ay(x1s[3], x2s[3]))
    # A is 8x8
    A = tf.stack(p, axis=0)  
    m = [[x2s[0][0], x2s[0][1], x2s[1][0], x2s[1][1], x2s[2][0], x2s[2][1], x2s[3][0], x2s[3][1]]]

    # P is 8x1
    P = tf.transpose(tf.stack(m, axis=0))  
    # here we solve the linear system
    # we transpose the result for convenience
    return tf.transpose(tf.matrix_solve_ls(A, P, fast=True))

I also created a jupyter notebook on github as an example.

That’s all folks.

Stefano software developer

powered by @gatsby