Camera Calibration Matrix how to? - language-agnostic

With this toolbox I was performing calibration of my camera.
However the toolbox outputs results in matrix form, and being a noob I don't really understand mathy stuff.
The matrix is in the following form ([R | T] stacked on a bottom row of 0 0 0 1), where R is a 3x3 rotation matrix and T is a translation vector.
And these are the results I got from the toolbox. It outputs values in pixels.
-0.980755 -0.136184 -0.139905 217.653207
0.148552 -0.055504 -0.987346 995.948880
0.126695 -0.989128 0.074666 371.963957
0.000000 0.000000 0.000000 1.000000
Using this data, can I work out how much my camera is rotated and how far it is from the calibration object?

The distance part is easy. The translation from the origin is given by the first three numbers in the rightmost column, which represent the translation in the x, y, and z directions respectively. In your example, the camera's position is p = (px, py, pz) = (217.653207, 995.948880, 371.963957). You can take the Euclidean distance between the camera's location and the location of the calibration object (cx, cy, cz), i.e. sqrt((px-cx)^2 + (py-cy)^2 + (pz-cz)^2).
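As a tiny numpy sketch of that distance computation (the calibration object's location here is a made-up placeholder, since the question doesn't give one):

import numpy as np

p = np.array([217.653207, 995.948880, 371.963957])  # translation column from the matrix
c = np.array([0.0, 0.0, 0.0])                        # calibration object, assumed at the origin for illustration
distance = np.linalg.norm(p - c)                     # sqrt((px-cx)^2 + (py-cy)^2 + (pz-cz)^2)
print(distance)                                      # in the same units as the translation column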
The more difficult part regards the rotation, which is captured in the upper-left 3x3 elements of the matrix. Without knowing exactly how the toolbox arrived at it, you're somewhat out of luck; that is, it's not easy to convert it back to Euler angles, if that's what you want. However, you can transform those elements into a quaternion rotation, which gives you the unique unit axis and angle that rotate the camera to that orientation. The specifics of the computation are provided here. Once you have the quaternion, you can easily apply it to the vectors n = (0, 0, 1), up = (0, 1, 0) and right = (1, 0, 0) to get the normal (the direction the camera is pointing), up, and right vectors. The right vector is only useful if you are interested in slewing the camera left or right from its current position.
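For illustration, here is one common branch-based rotation-matrix-to-quaternion conversion as a small Python/numpy sketch (not the toolbox's own code), applied to the 3x3 block from the question:

import numpy as np

def rotation_matrix_to_quaternion(R):
    """Convert a 3x3 rotation matrix to a unit quaternion (w, x, y, z).
    Standard branch-based formulation; assumes R is a proper rotation matrix."""
    t = np.trace(R)
    if t > 0:
        s = 0.5 / np.sqrt(t + 1.0)
        w = 0.25 / s
        x = (R[2, 1] - R[1, 2]) * s
        y = (R[0, 2] - R[2, 0]) * s
        z = (R[1, 0] - R[0, 1]) * s
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2])
        w = (R[2, 1] - R[1, 2]) / s
        x = 0.25 * s
        y = (R[0, 1] + R[1, 0]) / s
        z = (R[0, 2] + R[2, 0]) / s
    elif R[1, 1] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2])
        w = (R[0, 2] - R[2, 0]) / s
        x = (R[0, 1] + R[1, 0]) / s
        y = 0.25 * s
        z = (R[1, 2] + R[2, 1]) / s
    else:
        s = 2.0 * np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1])
        w = (R[1, 0] - R[0, 1]) / s
        x = (R[0, 2] + R[2, 0]) / s
        y = (R[1, 2] + R[2, 1]) / s
        z = 0.25 * s
    return np.array([w, x, y, z])

R = np.array([[-0.980755, -0.136184, -0.139905],
              [ 0.148552, -0.055504, -0.987346],
              [ 0.126695, -0.989128,  0.074666]])
q = rotation_matrix_to_quaternion(R)
angle = 2 * np.arccos(np.clip(q[0], -1, 1))   # rotation angle in radians
axis = q[1:] / np.linalg.norm(q[1:])          # unit rotation axis
print(np.degrees(angle), axis)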

I'm guessing the code uses the 'standard' formulation; if so, you will find more details in the OpenCV library docs or their book.

Related

How to use spatial transformer to crop the image in pytorch?

The paper on the spatial transformer network claims that it can be used to crop an image.
Given the crop region (top_left, bottom_right) = (x1, y1, x2, y2), how do I interpret the region as a transformation matrix and crop the image in PyTorch?
Here is an introduction to the spatial transformer network in Torch (http://torch.ch/blog/2015/09/07/spatial_transformers.html); the introduction visualizes the bounding box the transformer looks at. How can we determine the bounding box given the transformation matrix?
[Edit]
I just found out the answer to the first question [given the crop region, find out a transformation matrix]
The image in the original post already provides a good answer, but it might be useful to provide some code.
Importantly, this method should retain gradients correctly. In my case I have a batch of y,x values that represent the center of the crop position (in the range [-1, 1]). As for the values a and b, which are the x and y scale factors of the transformation, I used 0.5 for each in combination with a smaller output size (half the width and height) to retain the original scale, i.e. to crop. You can use 1 to have no scale change, but then there would be no cropping.
import torch
import torch.nn.functional as F

def crop_to_affine_matrix(t, a=0.5, b=0.5):
    """Turns (N,2) translate values into (N,2,3) affine transformation matrices.
    a and b are the x and y scale factors (0.5 here, i.e. crop to half size)."""
    t = t.reshape(-1, 1, 2, 1).flip(2)     # flip x,y order to y,x
    t = F.pad(t, (2, 0, 0, 0)).squeeze(1)  # (N,2,3), translations end up in the last column
    t[:, 0, 0] = a
    t[:, 1, 1] = b
    return t

t = torch.zeros(5, 2)        # center crop positions for batch size 5
outsize = (5, 3, 112, 112)   # (N, C, H_out, W_out): half of a 224x224 input to keep the scale
grid = F.affine_grid(crop_to_affine_matrix(t), outsize)
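To actually extract the crops, that grid can be fed to F.grid_sample. This follow-up sketch is not part of the original answer; it continues from the snippet above (reusing t, outsize and crop_to_affine_matrix) with a made-up image batch:

images = torch.rand(5, 3, 224, 224, requires_grad=True)   # hypothetical 224x224 input batch
theta = crop_to_affine_matrix(t)                           # helper and t from the snippet above
grid = F.affine_grid(theta, outsize, align_corners=False)
crops = F.grid_sample(images, grid, align_corners=False)   # -> (5, 3, 112, 112), differentiable
print(crops.shape)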

Loss function for Bounding Box Regression using CNN

I am trying to understand Loss functions for Bounding Box Regression in CNNs. Currently I use Lasagne and Theano, which makes writing loss expressions very easy. Many sources propose different methods and I am asking myself which one is usually used in practice.
The bounding box coordinates are represented as normalized coordinates in the order [left, top, right, bottom] (using T.matrix('targets', dtype=theano.config.floatX)).
I have tried the following functions so far; however all of them have their drawbacks.
Intersection over Union
I was advised to use the Intersection over Union measure to identify how well the two bounding boxes align and overlap. However, a problem occurs when the boxes don't overlap: the intersection is then 0, and the whole quotient turns 0 regardless of how far apart the bounding boxes are. I implemented it as:
import theano.tensor as T

def get_area(A):
    return (A[:, 2] - A[:, 0]) * (A[:, 1] - A[:, 3])

def get_intersection(A, B):
    return (T.minimum(A[:, 2], B[:, 2]) - T.maximum(A[:, 0], B[:, 0])) \
         * (T.minimum(A[:, 1], B[:, 1]) - T.maximum(A[:, 3], B[:, 3]))

def bbox_overlap_loss(A, B):
    """Computes the bounding box overlap loss using
    Intersection over Union."""
    intersection = get_intersection(A, B)
    union = get_area(A) + get_area(B) - intersection
    # Turn into a loss
    l = 1.0 - intersection / union
    return l.mean()
Squared Diameter Difference
To create an error measure for non-overlapping bounding boxes, I tried to compute the squared difference of the bounding box diameters. It seems to work, but I am almost sure that there is a much better way to do this. I implemented it as:
def squared_diameter_loss(A, B):
    # Represents the squared distance from the real diameter
    # in normalized pixel coordinates
    l = (abs(A[:, 0:2] - B[:, 0:2]) + abs(A[:, 2:4] - B[:, 2:4])) ** 2
    return l.mean()
Euclidean Loss
The simplest function would be the Euclidean loss, which computes the squared differences of the bounding box parameters (Lasagne's squared_error). However, this doesn't take the overlap of the bounding boxes into account at all, only the differences of the parameters left, right, top, bottom. I implemented it as:
import lasagne.objectives

def euclidean_loss(A, B):
    l = lasagne.objectives.squared_error(A, B)
    return l.mean()
Could someone guide me on which would be the best loss function for bounding box regression for this use case, or point out if I am doing something wrong here? Which loss function is usually used in practice?
Speaking from personal implementation experience, I had much better results training a CNN using IoU as the loss function as opposed to Euclidean (MSE or L2) loss. I have not used the squared diameter difference loss. In general, a loss function that explicitly represents the goodness of your outputs for the task you hope to accomplish is probably best.
With regard to the IoU having a value of zero, you can introduce an additional term in the formulation so that the loss gracefully trends towards 0 as the boxes approach, perhaps based on the normalized distance between bbox centers. This might have the additional effect of helping to center bounding boxes relative to the ground truth.
This response is mostly conceptual but I'd be happy to supply code examples if desired.
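To make that concrete, here is a rough sketch in the question's Theano setup of combining the IoU loss with a normalized center-distance penalty; the weighting factor alpha and the exact form of the penalty are assumptions for illustration, not an established formulation:

import theano.tensor as T

def iou_plus_center_loss(A, B, alpha=1.0):
    """1 - IoU plus a center-distance penalty, so that non-overlapping
    boxes still produce a useful gradient.
    A, B are (N, 4) tensors of [left, top, right, bottom]."""
    # Same intersection/union expressions as in the question
    inter = (T.minimum(A[:, 2], B[:, 2]) - T.maximum(A[:, 0], B[:, 0])) \
          * (T.minimum(A[:, 1], B[:, 1]) - T.maximum(A[:, 3], B[:, 3]))
    area_A = (A[:, 2] - A[:, 0]) * (A[:, 1] - A[:, 3])
    area_B = (B[:, 2] - B[:, 0]) * (B[:, 1] - B[:, 3])
    union = area_A + area_B - inter
    iou_loss = 1.0 - inter / union
    # Penalty: Euclidean distance between box centers (coordinates are already normalized)
    centers_A = (A[:, 0:2] + A[:, 2:4]) / 2.0
    centers_B = (B[:, 0:2] + B[:, 2:4]) / 2.0
    center_dist = T.sqrt(T.sum((centers_A - centers_B) ** 2, axis=1))
    return (iou_loss + alpha * center_dist).mean()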

Stage3D, AGAL - vertices' and textures' coordinate systems

I've been trying to work with more complicated shaders, and have run into issues with the coordinate systems used by the vertex shader and texture sampler. In short: they don't seem to make any sense, and when trying to test them I end up getting inconsistent results. To make matters worse, the internet has little in the way of documentation, and most of the information I've found seems to expect me to know how this works already. I was hoping someone could clarify the following:
1. The vertex shaders pass an (x, y, z) representing a location on the render target. What are acceptable values for x, y, and z?
2. How do x and y correspond to the width and height of the back buffer (assuming that it's the render target)?
3. How do x and y correspond to the width and height of an output texture (assuming that it's the render target)?
4. When x=0 and y=0, where does the vertex sit, location-wise?
5. The texture samplers sample a texture at a (u, v) coordinate. What are acceptable values for u and v?
6. How do u and v correspond with the width and height of the texture being sampled?
7. How do AGAL's wrap, clamp, and repeat flags alter sampling, and what is the default behavior when one isn't given?
8. When sampling at u=0 and v=0, which pixel is returned, location-wise?
EDIT:
From my tests, I believe the answers are:
1. Unsure
2. -1 is left/bottom, 1 is right/top
3. Unsure
4. At the center of the output
5. Unsure
6. 0 is left/bottom, 1 is right/top
7. Unsure
8. The far bottom-left of the texture
1. You normally use a coordinate system of your own and then multiply the position of each vertex by the MVP (model-view-projection) matrix to get NDC coordinates, which are what the vertex shader outputs to the GPU. There is a nice article explaining all that for Stage3D. (See the numpy sketch after this list for what the projection and perspective divide boil down to.)
2. Correct. And z is in the range [0, 1].
3. Rendering to a render target is the same as rendering to the backbuffer: you output NDC from your vertex shader, so the real size of the texture is irrelevant.
4. Yup, center of the screen.
5. Normally it's [0, 1], but you can use values outside that range, and then the output depends on the texture wrap mode (like repeat or clamp) set on the sampler.
6. (0, 0) is left/top, (1, 1) is right/bottom.
7. The default is repeat. These modes decide what you get when you sample with a coordinate outside the [0, 1] range. With repeat, [1.5, 1.5] results in [0.5, 0.5], while [1.0, 1.0] is the result if the mode is set to clamp.
8. The top-left pixel of the texture.
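Purely to illustrate point 1, here is a minimal numpy sketch (not AGAL/ActionScript; the matrices and clip planes are assumed placeholder values) of what the MVP multiplication and the GPU's perspective divide boil down to:

import numpy as np

n, f = 1.0, 100.0                     # assumed near / far clip planes
proj = np.array([                     # 90-degree FOV, square aspect, maps z into [0, 1]
    [1, 0, 0,           0],
    [0, 1, 0,           0],
    [0, 0, f / (n - f), n * f / (n - f)],
    [0, 0, -1,          0],
])
view = np.eye(4)                      # world -> camera (identity for this sketch)
model = np.eye(4)                     # model -> world

v = np.array([0.25, -0.5, -2.0, 1.0]) # homogeneous model-space vertex
clip = proj @ view @ model @ v        # what the vertex shader outputs ("op" in AGAL)
ndc = clip / clip[3]                  # perspective divide done by the GPU
print(ndc)                            # x, y land in [-1, 1]; z in [0, 1]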

Transforming data along (x,y) coordinate axis?

I have data like this
x-axis data values
-20.49, -12.23, -9.99, -1.00, 0, 1.12, 2.23, 3.45, 4.56, 8.99, 20.99, 30.23
y-axis data values
10,20,20,40,50,60........
I would like to transform the above data into an xy coordinate system.
Please have a look at the image.
For example:
along the x-axis the (min, max) data values are (-20.49, 30.23),
along the y-axis the (min, max) data values are (10, 60).
Now if I want to plot the data point (-20.49, 10) in the image,
the X coordinate is going to be 200,
and the Y coordinate is going to be 220.
In this way I want to plot all the data so that it fits within the rectangle.
Hope this gives all the details.
Thanks
This is more of a math question, not related to any particular programming language. And speaking of ActionScript 3, its Y axis goes from top to bottom, not from bottom to top. Anyway: if you have two points on an axis that you want to map to screen coordinates of your choice, record the lesser data value as xmin, the greater data value as xmax, and the corresponding screen coordinates as xleft and xright. Then, when you need the screen coordinate for a given x, you calculate xcoord as:
xcoord = xleft + (x - xmin)*(xright - xleft)/(xmax - xmin);
A similar approach will get you the correct values for the Y axis (just remember the flipped direction if screen Y grows downward).
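For illustration, here is a small Python sketch of that mapping applied to the question's numbers; the right and top edges of the plot rectangle are made-up values, since only the point (-20.49, 10) -> (200, 220) is given:

def to_screen(value, vmin, vmax, cmin, cmax):
    """Linearly map a data value from [vmin, vmax] onto screen coordinates [cmin, cmax]."""
    return cmin + (value - vmin) * (cmax - cmin) / (vmax - vmin)

# Data ranges from the question
xmin, xmax = -20.49, 30.23
ymin, ymax = 10, 60

# Hypothetical plot rectangle: xleft=200 and ybottom=220 follow from the question's
# example point; xright and ytop are assumed values for illustration only.
xleft, xright = 200, 600
ytop, ybottom = 20, 220

x = to_screen(-20.49, xmin, xmax, xleft, xright)   # -> 200.0
# Screen Y grows downward, so map ymin to the bottom edge and ymax to the top edge.
y = to_screen(10, ymin, ymax, ybottom, ytop)       # -> 220.0
print(x, y)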

HTML5 Canvas - Wireframe Sphere in 2d

I'm looking to draw a 3D wireframe sphere on a 2D canvas. I'm not a math ninja by any means, so I'm wondering if anyone knows a simple way to draw one in canvas using lineTo/arc connections and a bit of math.
I would appreciate any assistance.
Something like this: http://en.wikipedia.org/wiki/File:Sphere_wireframe_10deg_6r.svg
I'm hoping this is a simple equation, but if you know that it isn't (i.e. drawing it would be a lot of code), I would appreciate knowing that as well, as I may need to reconsider what I wanna do.
The easiest option for you would probably be to view the source of the SVG file (here) and recreate those paths using canvas commands.
If you want an actual 3d sphere, projected onto 2d space, I'd suggest using a library like Three.js
You can also look at some of the math I've done here: swarms
The _3d and Matrix modules should be all that you need.
This time SO didn't help me, so I've helped myself, and here it is: a pure HTML5 + JavaScript configurable rendering of a wireframe sphere.
I started from this excellent post and then went on. Basically I collected some vertex generation code from Qt3D and adapted it to JS.
I'm not 100% sure the rotation functions are correct, but you are welcome to contribute back in case you find errors.
To make it clearer, I've distinguished Z positions and drawn white on the front and gray on the back.
Here's the result (16 rings x 32 slices) and related jsFiddle link
Enjoy
This is an old thread, but I had the same question and could not find any existing satisfying answer. That is, an answer other than "use WebGL" or "use Three.js". Lo and behold, I am the bearer of great news: it is actually possible to render such a sphere using exclusively Canvas2D's ellipse function, giving us:
a straightforward implementation (~130 lines everything included)
no need for computing vertices and edges
sexy smooth edges
You can find a demo on JSBin, for posterity, with a bunch of options.
The key is to notice that the "wireframe" we're trying to draw is solely composed of circles, and every circle rotated in 3d space will get projected to the camera as an ellipse. The question, then, is: how to find the ellipse corresponding to the projection of the rotated circle?
As we are only interested in circles that lay on the surface of the sphere, we can characterize each of them by the (inter)section of a plane and the (unit) sphere. Therefore, each circle can be described by a normal vector and an offset -1 < o < 1.
Then it's not too difficult to compute and draw the ellipse resulting from the projection of the circle:
function draw_section(n, o = 0) {
  let {x, y, z} = project(_p, n)  // project normal on camera
  let a = atan2(y, x)             // angle of projected normal -> angle of ellipse
  let ry = sqrt(1 - o * o)        // radius of section -> y-radius of ellipse
  let rx = ry * abs(z)            // x-radius of ellipse
  let W = sqrt(x * x + y * y)
  let sa = acos(clamp(-1, 1, o * (1 / W - W) / rx || 0))  // ellipse start angle
  let sb = z > 0 ? 2 * PI - sa : -sa                      // ellipse end angle
  ctx.beginPath()
  ctx.ellipse(x * o * RADIUS, y * o * RADIUS, rx * RADIUS, ry * RADIUS, a, sa, sb, z <= 0)
  ctx.stroke()
}
The disks from your example image can be obtained by:
rotating a plane around the z axis
shifting a plane along the z axis
function draw_arcs() {
  for (let i = 10; i--;) {
    let a = i / 10 * Math.PI
    draw_section(vec.set(_n, cos(a), sin(a), 0))
  }
  for (let i = 9; i--;) {
    let a = (i + 1) / 10 * Math.PI
    draw_section(Z, cos(a))
  }
}
A nice benefit of this method is that you can do this "shifting a plane along the Z axis" for all axes, resulting in a lovely wireframe that would be hard to reproduce if computing vertices and edges by hand:
The only change was the following:
function draw_arcs() {
  for (let i = 9; i--;) {
    let a = (i + 1) / 10 * Math.PI
    draw_section(Z, cos(a))
    draw_section(X, cos(a))
    draw_section(Y, cos(a))
  }
}
The function draw_section above was carefully crafted so that it only draws the camera-facing arc of a given section, which means we get occlusion-culling for free.
(and my dirty trick to render the back of the sphere with a different color is to run draw_arcs again after flipping the canvas)
It's also possible to use 2 radial gradients to have some fake depth shading like in your image:
Sadly browsers seem to struggle a lot when drawing paths with gradients.
There are but two drawbacks I see right now:
performance may vary between browsers. It's likely some optimization could be done, like merging successive calls to .stroke() into one. Frankly, I'm quite surprised by how slow using ellipse seems to be at times.
it's a parallel projection rather than a perspective one. If we were to add perspective, projected circles would still appear as ellipses, but the calculation of the ellipse would be a tad more involved. I haven't done it yet, I expect it to be possible, might update my answer if I succeed.
Look at this one: http://jsfiddle.net/aJMBp/
You should just draw a lot of these lines to create a complete sphere. This is a good starting point; give me 5 minutes and I'll see if I can improve it to draw a sphere.
Getting better:
http://jsfiddle.net/aJMBp/1/
Ok, that's definitely out of my capacity. However, another little improvement here: http://jsfiddle.net/aJMBp/2/