Bounding Box labeling for VOC data in SSD - deep-learning

I used SSD for CNN training and tested with the VOC2007 and VOC2012 datasets.
I do not quite understand the normalized bounding box values in the VOC2007 dataset.
Take the VOC2007 file Annotations/000002.xml, for example. It contains the bounding box
<size>
<width>335</width>
<height>500</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object><name>train</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>139</xmin>
<ymin>200</ymin>
<xmax>207</xmax>
<ymax>301</ymax>
</bndbox></object>
Those values are fine for drawing a bounding box on the image.
But when I look at the normalized data in the labels/000002.txt file:
18 0.516417910448 0.501 0.202985074627 0.202
How are the values 0.516417910448 0.501 0.202985074627 0.202 related to 139 200 207 301?
I used 335 and 500 to normalize, but I don't get these values.

I had a similar problem in the past; my solution was the following.
Say you have (from your Pascal VOC XML file, Annotations/000002.xml):
width = 335
height = 500
xmin = 139
xmax = 207
ymin = 200
ymax = 301
To normalize the bounding box coords, you can do:
def norm_box(width, height, xmin, xmax, ymin, ymax):
    x = (xmin + xmax) / 2. * 1. / width
    w = (xmax - xmin) * 1. / width
    y = (ymin + ymax) / 2. * 1. / height
    h = (ymax - ymin) * 1. / height
    return (x, y, w, h)

box_norm = norm_box(width, height, xmin, xmax, ymin, ymax)
print(box_norm)
The output is:
(0.5164179104477612, 0.501, 0.20298507462686566, 0.202)
which matches the normalized data in the labels/000002.txt file:
18 0.516417910448 0.501 0.202985074627 0.202
To denormalize it, you can do:
def denorm_box(width, height, x, y, w, h):
    xmax = int((x * width) + (w * width) / 2.0)
    xmin = int((x * width) - (w * width) / 2.0)
    ymax = int((y * height) + (h * height) / 2.0)
    ymin = int((y * height) - (h * height) / 2.0)
    return (xmin, xmax, ymin, ymax)

(x, y, w, h) = box_norm
box = denorm_box(width, height, x, y, w, h)
print(box)
The output is:
(139, 207, 200, 301)
You can see a demo here:
https://notebooks.azure.com/andrewssobral/libraries/utils/html/bbnorm.ipynb
Hope this helps you.

Gradient is always zero in autograd.grad()

I implemented a custom loss function, but its gradient is always zero and I don't understand why.
The code for the objective function:
def objective(p, output):
    x, y = p
    a = minA
    b = minB
    r = 0.1
    XA = 1/2 - 1/2 * torch.tanh(100*((x - a[0])**2 + (y - a[1])**2 - (r + 0.02)**2))
    XB = 1/2 - 1/2 * torch.tanh(100*((x - b[0])**2 + (y - b[1])**2 - (r + 0.02)**2))
    q = (1-XA)*((1-XB)* output + (XB))
    output_grad, _ = torch.autograd.grad(q, (x, y))
    output_grad.requires_grad_()
    q = output_grad**2
    return q
And the code for training the model (which is a simple, fully connected NN):
model = NN(input_size)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
for e in range(epochs):
    for configuration in total:
        print("Train for configuration", configuration)
        # Training pass
        optimizer.zero_grad()
        # output is q~
        output = model(configuration)
        # loss is the objective function we defined
        loss = objective(configuration, output.item())
        loss.backward()
        optimizer.step()
I really think the problem is in the output_grad, _ = torch.autograd.grad(q, (x,y)).
(During the training, "configuration" is a point sampled from a distribution, identified by the coordinates x and y.)
Thanks!!
Here I provide the code on a google colab session:
Google colab
Tanh is a bounded function and converges quite quickly to 1. Your XA and XB points are defined as
XA = 1/2 - 1/2 * torch.tanh(100*(z1 + z2 - z0))
XB = 1/2 - 1/2 * torch.tanh(100*(z3 + z4 - z0))
Since z1 + z2 - z0 and z3 + z4 - z0 are rather close to 1, you will end up with an input close to 100. This means the tanh outputs 1, so XA and XB end up being zero. You might not want this coefficient of 100 if you want non-zero outputs.
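As an illustration, here is a minimal PyTorch sketch (with hypothetical values standing in for minA and r) showing how the scaled argument saturates tanh and kills the gradient:

import torch

# minimal sketch; the point and centre values below are made up for illustration
x = torch.tensor(0.9, requires_grad=True)
y = torch.tensor(0.9, requires_grad=True)
a = (0.0, 0.0)   # stands in for minA
r = 0.1

arg = 100 * ((x - a[0])**2 + (y - a[1])**2 - (r + 0.02)**2)
XA = 0.5 - 0.5 * torch.tanh(arg)

print(arg.item())        # ~160, far inside tanh's saturated region
print(XA.item())         # ~0.0
XA.backward()
print(x.grad, y.grad)    # both ~0: the gradient has vanished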

Convert Yolo output to a real-world coordinate system

We have detected objects in UAV data using Yolo v5 and obtained bounding box coordinates (x1, y1, x2, y2) in pixels, relative to the origin of the satellite data. The data looks like this and is returned as a tab-delimited text file.
[ 7953 11025 7978 11052]
[16777 10928 16817 10970]
[15670 10591 15685 10607]
The results are accompanied by a PNG and the PGW (world file) reads like this:
0.1617903116883119
0
0
-0.1617903116883119
655854.20159515587147325
2716038.70000312989577651
How can the bounding boxes be converted into the real-world projection EPSG:4328 so that they are usable in GIS? Any hints towards a Python script are much appreciated.
Go into Detect.py and set gn = 1; that will give you the un-normalized coordinates.
I wrote this short function to convert the Yolo detections to real-world polygons. The yolo detections.txt needs to be read without the []; a sketch for reading it that way follows the code below.
from shapely.geometry import Polygon
import geopandas as gpd

# function to return a polygon in world coordinates
def bbox(x1, y1, x2, y2):
    # world file content
    # Line 1: A: x-component of the pixel width (x-scale)
    xscale = 0.1617903116883119
    # Line 2: D: y-component of the pixel width (y-skew)
    yskew = 0
    # Line 3: B: x-component of the pixel height (x-skew)
    xskew = 0
    # Line 4: E: y-component of the pixel height (y-scale), typically negative
    yscale = -0.1617903116883119
    # Line 5: C: x-coordinate of the center of the original image's upper left pixel transformed to the map
    xpos = 655854.20159515587147325
    # Line 6: F: y-coordinate of the center of the original image's upper left pixel transformed to the map
    ypos = 2716038.70000312989577651
    X_proj = xpos + (xscale * x1) + (xskew * y1)
    Y_proj = ypos + (yscale * y1) + (yskew * x1)
    X1_proj = xpos + (xscale * x2) + (xskew * y2)
    Y1_proj = ypos + (yscale * y2) + (yskew * x2)
    return Polygon([[X_proj, Y_proj],
                    [X1_proj, Y_proj],
                    [X1_proj, Y1_proj],
                    [X_proj, Y1_proj]])

outGDF = gpd.GeoDataFrame(geometry=dataset.apply(lambda g: bbox(int(g[0]), int(g[1]), int(g[2]), int(g[3])), axis=1), crs={'init': 'epsg:32638'})
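If it helps, here is one way to strip the brackets and feed the rows to the function above. This is only a sketch, assuming the detections sit in a file named detections.txt formatted exactly like the sample shown and that pandas/geopandas are installed:

import re
import pandas as pd
import geopandas as gpd

rows = []
with open('detections.txt') as f:          # hypothetical file name
    for line in f:
        nums = re.findall(r'-?\d+', line)  # drops the [ and ] around each row
        if len(nums) == 4:
            rows.append([int(n) for n in nums])

dataset = pd.DataFrame(rows, columns=['x1', 'y1', 'x2', 'y2'])
outGDF = gpd.GeoDataFrame(
    geometry=dataset.apply(lambda g: bbox(g.x1, g.y1, g.x2, g.y2), axis=1),
    crs='EPSG:32638')
outGDF.to_file('detections.geojson', driver='GeoJSON')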

Find point of intersection between two arcs

I'm creating a game for kids. It involves creating a triangle from 3 lines. My approach is to draw two arcs (semicircles) from the two end points of a base line, but I couldn't figure out how to find the point of intersection of those two arcs. I've searched for it but only found the point of intersection between two straight lines. Is there a method to find this point of intersection?
Assume the centers of the circles are (x1, y1) and (x2, y2), and the radii are R1 and R2. Let the ends of the base be A and B and the target point be T. We know that AT = R1 and BT = R2. IMHO the simplest trick to find T is to notice that the difference of the squares of the distances is a known constant (R1^2 - R2^2), and it is easy to see that the line that contains the points meeting this condition is actually a straight line perpendicular to the base. The circle equations are:
(x - x1)^2 + (y-y1)^2 = R1^2
(x - x2)^2 + (y-y2)^2 = R2^2
If we subtract one from another we'll get:
(x2 - x1)(2*x - x1 - x2) + (y2 - y1)(2*y - y1 - y2) = R1^2 - R2^2
Let x0 = (x1 + x2)/2 and y0 = (y1 + y2)/2 be the coordinates of the midpoint of the base. Let also the length of the base be L and its projections dx = x2 - x1 and dy = y2 - y1 (i.e. L^2 = dx^2 + dy^2). And let Q = R1^2 - R2^2. So we can see that
2 * (dx * (x-x0) + dy*(y-y0)) = Q
So the line for all (x,y) pairs with R1^2 - R2^2 = Q = const is a straight line orthogonal to the base (because the coefficients are exactly dx and dy).
Let's find the point C on the base that is the intersection with that line. It is easy: it splits the base so that the difference of the squares of the lengths is Q. It is easy to find out that it is the point at a distance of L/2 + Q/(2*L) from A and L/2 - Q/(2*L) from B. So now we can find that
TC^2 = R1^2 - (L/2 + Q/(2*L))^2
Substituting back Q and simplifying a bit we can find that
TC^2 = (2*L^2*R1^2 + 2*L^2*R2^2 + 2*R1^2*R2^2 - L^4 - R1^4 - R2^4) / (4*L^2)
So let's
a = (R1^2 - R2^2)/(2*L)
b = sqrt(2*L^2*R1^2 + 2*L^2*R2^2 + 2*R1^2*R2^2 - L^4 - R1^4 - R2^4) / (2*L)
Note that the formula for b can also be written in a different form:
b = sqrt[(R1+R2+L)*(-R1+R2+L)*(R1-R2+L)*(R1+R2-L)] / (2*L)
which looks quite similar to Heron's formula. And this is not a surprise, because b is effectively the length of the height from T to the base AB in the triangle ABT, so its length is 2*S/L where S is the area of the triangle. And the triangle ABT obviously has sides of lengths L, R1 and R2.
To find the target T we need to move a along the base and b in a perpendicular direction. So coordinates of T calculated from the middle of the segment are:
Xt = x0 + a * dx/L ∓ b * dy/L
Yt = y0 + a * dy/L ± b * dx/L
Here the upper and lower signs give the two solutions, one on either side of the base line; note that the perpendicular offset uses opposite signs for the two coordinates so that it is actually orthogonal to the base.
Special case: if R1 = R2 = R, then a = 0 and b = sqrt(R^2 - (L/2)^2), which makes obvious sense: T lies on the segment bisector at a distance of sqrt(R^2 - (L/2)^2) from the middle of the segment.
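A small Python sketch of the formulas above (the function name and the sample call are just for illustration):

from math import sqrt

def arc_intersections(x1, y1, x2, y2, R1, R2):
    # returns the two candidate points T, one on each side of the base AB
    dx, dy = x2 - x1, y2 - y1
    L = sqrt(dx*dx + dy*dy)
    x0, y0 = (x1 + x2) / 2, (y1 + y2) / 2
    a = (R1*R1 - R2*R2) / (2*L)
    under = (R1+R2+L) * (-R1+R2+L) * (R1-R2+L) * (R1+R2-L)
    if under < 0:
        return []                      # the circles do not intersect
    b = sqrt(under) / (2*L)
    # move a along the base and b perpendicular to it
    return [(x0 + a*dx/L - b*dy/L, y0 + a*dy/L + b*dx/L),
            (x0 + a*dx/L + b*dy/L, y0 + a*dy/L - b*dx/L)]

print(arc_intersections(0, 0, 4, 0, 3, 3))   # [(2.0, 2.236...), (2.0, -2.236...)]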
Hope this helps.
While you have not stated it clearly, I assume that you have points with coordinates (A.X, A.Y) and (B.X, B.Y), lengths of the two sides LenA and LenB, and need to find the coordinates of point C.
So you can make equation system exploiting circle equation:
(C.X - A.X)^2 + (C.Y - A.Y)^2 = LenA^2
(C.X - B.X)^2 + (C.Y - B.Y)^2 = LenB^2
and solve it for unknowns C.X, C.Y.
Note that it is worthwhile to subtract A's coordinates from all the others, build and solve the simpler system (the first equation becomes C'.X^2 + C'.Y^2 = LenA^2), then add A's coordinates back.
So I actually needed this to design a hopper to lift grapes during the wine harvest. I tried to work it out myself but the algebra is horrible, so I had a look on the web; in the end I did it myself but introduced some intermediate variables (which I calculate in Excel; this should also work for the OP since the goal was a calculated solution). In fairness this is really much the same as the previous solutions, but hopefully a little clearer.
Problem:
What are the coordinates of a point P(Xp,Yp) distance Lq from point Q(Xq,Yq) and distance Lr from point R(Xr,Yr)?
Let us first map the problem onto a new coordinate system where Q is the origin, thus Q' = (0,0); let (x,y) = P' = (Xp-Xq, Yp-Yq) and let (a,b) = R' = (Xr-Xq, Yr-Yq).
We may now write:
x^2 + y^2 = Lq^2            (1)
(x-a)^2 + (y-b)^2 = Lr^2    (2)
Expanding (2):
x^2 - 2ax + a^2 + y^2 - 2by + b^2 = Lr^2
Subtracting (1) and rearranging:
2by = -2ax + a^2 + b^2 - Lr^2 + Lq^2
For convenience, let c = a^2 + b^2 - Lr^2 + Lq^2 (these are all known constants, so c may be easily computed), thus we obtain:
y = -ax/b + c/(2b)
Substituting into (1) we obtain:
x^2 + (-ax/b + c/(2b))^2 = Lq^2
Multiply the entire equation by b^2 and gather terms:
(a^2 + b^2) x^2 - ac x + c^2/4 - Lq^2 b^2 = 0
Let A = (a^2 + b^2), B = -ac, and C = c^2/4 - Lq^2 b^2
Use the general solution for a quadratic
x = (-B ± SQRT(B^2 - 4AC)) / (2A)
Substitute back into (1) to get:
y = SQRT(Lq^2 - x^2)
(This avoids computational difficulties where b = 0.)
Map back to original coordinate system
P = (x+Xq, y + Yq)
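A short Python transcription of the derivation above (a sketch; the function name and the sample call are made up, and only the positive y branch is returned, as in the text):

from math import sqrt

def third_point(Xq, Yq, Xr, Yr, Lq, Lr):
    # shift so Q is the origin
    a, b = Xr - Xq, Yr - Yq
    c = a*a + b*b - Lr*Lr + Lq*Lq
    A = a*a + b*b
    B = -a*c
    C = c*c/4 - Lq*Lq*b*b
    disc = B*B - 4*A*C
    if disc < 0:
        return None                     # the circles do not meet
    x = (-B + sqrt(disc)) / (2*A)       # one root of the quadratic; the other gives the second solution
    y = sqrt(max(Lq*Lq - x*x, 0.0))     # positive branch, as in the derivation above
    # map back to the original coordinate system
    return (x + Xq, y + Yq)

print(third_point(0, 0, 4, 0, 3, 3))    # (2.0, 2.236...)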
Hope this helps; sorry about the formatting, I had this all looking pretty in Word but lost it.

Visualizing features and activations in CNN using Keras example

I'm following the Keras blog post code to visualize the features learned and the activations at different layers. The code randomly generates a gray image of dimension (1, 3, img_width, img_height) and visualizes it. Here it is:
from __future__ import print_function
from scipy.misc import imsave
import numpy as np
import time
from keras.applications import vgg16
from keras import backend as K
# dimensions of the generated pictures for each filter.
img_width = 128
img_height = 128
# the name of the layer we want to visualize
# (see model definition at keras/applications/vgg16.py)
layer_name = 'block5_conv1'
# util function to convert a tensor into a valid image
def deprocess_image(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # convert to RGB array
    x *= 255
    if K.image_data_format() == 'channels_first':
        x = x.transpose((1, 2, 0))
    x = np.clip(x, 0, 255).astype('uint8')
    return x
# build the VGG16 network with ImageNet weights
model = vgg16.VGG16(weights='imagenet', include_top=False)
print('Model loaded.')
model.summary()
# this is the placeholder for the input images
input_img = model.input
# get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers[1:]])
def normalize(x):
    # utility function to normalize a tensor by its L2 norm
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)
kept_filters = []
for filter_index in range(0, 200):
    # we only scan through the first 200 filters,
    # but there are actually 512 of them
    print('Processing filter %d' % filter_index)
    start_time = time.time()

    # we build a loss function that maximizes the activation
    # of the nth filter of the layer considered
    layer_output = layer_dict[layer_name].output
    if K.image_data_format() == 'channels_first':
        loss = K.mean(layer_output[:, filter_index, :, :])
    else:
        loss = K.mean(layer_output[:, :, :, filter_index])

    # we compute the gradient of the input picture wrt this loss
    grads = K.gradients(loss, input_img)[0]

    # normalization trick: we normalize the gradient
    grads = normalize(grads)

    # this function returns the loss and grads given the input picture
    iterate = K.function([input_img], [loss, grads])

    # step size for gradient ascent
    step = 1.

    # we start from a gray image with some random noise
    if K.image_data_format() == 'channels_first':
        input_img_data = np.random.random((1, 3, img_width, img_height))
    else:
        input_img_data = np.random.random((1, img_width, img_height, 3))
    input_img_data = (input_img_data - 0.5) * 20 + 128

    # we run gradient ascent for 20 steps
    for i in range(20):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step

        print('Current loss value:', loss_value)
        if loss_value <= 0.:
            # some filters get stuck to 0, we can skip them
            break

    # decode the resulting input image
    if loss_value > 0:
        img = deprocess_image(input_img_data[0])
        kept_filters.append((img, loss_value))
    end_time = time.time()
    print('Filter %d processed in %ds' % (filter_index, end_time - start_time))
# we will stich the best 64 filters on a 8 x 8 grid.
n = 8
# the filters that have the highest loss are assumed to be better-looking.
# we will only keep the top 64 filters.
kept_filters.sort(key=lambda x: x[1], reverse=True)
kept_filters = kept_filters[:n * n]
# build a black picture with enough space for
# our 8 x 8 filters of size 128 x 128, with a 5px margin in between
margin = 5
width = n * img_width + (n - 1) * margin
height = n * img_height + (n - 1) * margin
stitched_filters = np.zeros((width, height, 3))
# fill the picture with our saved filters
for i in range(n):
    for j in range(n):
        img, loss = kept_filters[i * n + j]
        stitched_filters[(img_width + margin) * i: (img_width + margin) * i + img_width,
                         (img_height + margin) * j: (img_height + margin) * j + img_height, :] = img
# save the result to disk
imsave('stitched_filters_%dx%d.png' % (n, n), stitched_filters)
Could you please let me know how I can modify these statements in the code:
input_img_data = np.random.random((1, img_width, img_height, 3))
input_img_data = (input_img_data - 0.5) * 20 + 128
so that I can insert my own data and visualize the features learned and the activations? My image is an RGB image of dimensions 150 x 150. Thanks for your assistance.
If you want to process a single image:
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
img = load_img('data/XXXX.jpg') # this is a PIL image
x = img_to_array(img)
x = x.reshape((1,) + x.shape)
If you want to process a batch:
from keras.preprocessing.image import ImageDataGenerator

data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=90.,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.2)
image_datagen = ImageDataGenerator(**data_gen_args)
image_generator = image_datagen.flow_from_directory(
    'data/images',
    class_mode=None,
    seed=seed)
See the documentation: https://keras.io/preprocessing/image/#imagedatagenerator
Update
# we start from a gray image with some random noise
if K.image_data_format() == 'channels_first':
    img = load_img('images/1/1.png')  # this is a PIL image
    x = img_to_array(img)
    x = x.reshape((1,) + x.shape)
else:
    # input_img_data = np.random.random((1, img_width, img_height, 3))
    img = load_img('images/1/1.png')  # this is a PIL image
    x = img_to_array(img)
    x = x.reshape((1,) + x.shape)
input_img_data = x
input_img_data = (input_img_data - 0.5) * 20 + 128
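Since your own image is 150 x 150, also make sure img_width and img_height at the top of the script match it; load_img can resize on load via target_size. A minimal sketch (the file path here is hypothetical):

from keras.preprocessing.image import load_img, img_to_array

img_width, img_height = 150, 150              # match your image size
img = load_img('images/my_image.png',         # hypothetical path
               target_size=(img_height, img_width))
x = img_to_array(img)                          # shape (150, 150, 3), values 0-255
input_img_data = x.reshape((1,) + x.shape)     # add the batch dimension

Note that the (input_img_data - 0.5) * 20 + 128 line was written for random values in [0, 1), so you may want to skip it when starting from a real image.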

How can I find the smallest difference between two angles around a point?

Given a 2D circle with 2 angles in the range -PI -> PI around a coordinate, what is the value of the smallest angle between them?
Taking into account that the difference between PI and -PI is not 2 PI but zero.
An Example:
Imagine a circle with 2 lines coming out from the center. There are 2 angles between those lines: the angle they make on the inside, aka the smaller angle, and the angle they make on the outside, aka the bigger angle.
Both angles added together make a full circle. Given that each angle can fit within a certain range, what is the smaller angle's value, taking the rollover into account?
This gives a signed angle for any angles:
a = targetA - sourceA
a = (a + 180) % 360 - 180
Beware: in many languages (such as C, C++, C#, and JavaScript) the modulo operation returns a value with the same sign as the dividend. This requires a custom mod function like so:
mod = (a, n) -> a - floor(a/n) * n
Or so:
mod = (a, n) -> (a % n + n) % n
If angles are within [-180, 180] this also works:
a = targetA - sourceA
a += (a>180) ? -360 : (a<-180) ? 360 : 0
In a more verbose way:
a = targetA - sourceA
a -= 360 if a > 180
a += 360 if a < -180
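For example, in Python (where % already follows the sign of the divisor, so no custom mod is needed):

def signed_delta_deg(sourceA, targetA):
    a = targetA - sourceA
    return (a + 180) % 360 - 180

print(signed_delta_deg(10, 350))   # -20
print(signed_delta_deg(350, 10))   # 20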
x is the target angle. y is the source or starting angle:
atan2(sin(x-y), cos(x-y))
It returns the signed delta angle. Note that depending on your API the order of the parameters for the atan2() function might be different.
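In Python, for example:

import math

def signed_delta(x, y):
    # signed smallest difference, result in (-pi, pi]
    return math.atan2(math.sin(x - y), math.cos(x - y))

print(signed_delta(math.pi, -math.pi))   # ~0.0
print(signed_delta(0.1, 2 * math.pi))    # ~0.1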
If your two angles are x and y, then one of the angles between them is abs(x - y). The other angle is (2 * PI) - abs(x - y). So the value of the smallest of the 2 angles is:
min((2 * PI) - abs(x - y), abs(x - y))
This gives you the absolute value of the angle, and it assumes the inputs are normalized (ie: within the range [0, 2π)).
If you want to preserve the sign (ie: direction) of the angle and also accept angles outside the range [0, 2π) you can generalize the above. Here's Python code for the generalized version:
import math

PI = math.pi
TAU = 2*PI

def smallestSignedAngleBetween(x, y):
    a = (x - y) % TAU
    b = (y - x) % TAU
    return -a if a < b else b
Note that the % operator does not behave the same in all languages, particularly when negative values are involved, so if porting some sign adjustments may be necessary.
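For instance, Python's % follows the sign of the divisor, while math.fmod (like C's fmod) follows the sign of the dividend, which is exactly the caveat above:

import math

print((-1.0) % (2 * math.pi))        # ~5.283 (sign of the divisor)
print(math.fmod(-1.0, 2 * math.pi))  # -1.0   (sign of the dividend, as in C)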
Efficient code in C++ that works for any angle, in both radians and degrees, is:
inline double getAbsoluteDiff2Angles(const double x, const double y, const double c)
{
    // c can be PI (for radians) or 180.0 (for degrees)
    return c - fabs(fmod(fabs(x - y), 2*c) - c);
}
See it working here:
https://www.desmos.com/calculator/sbgxyfchjr
For signed angle:
return fmod(fabs(x - y) + c, 2*c) - c;
In programming languages where the mod of a negative number is positive, the inner abs can be eliminated.
I rise to the challenge of providing the signed answer:
def f(x, y):
    import math
    return min(y-x, y-x+2*math.pi, y-x-2*math.pi, key=abs)
For UnityEngine users, the easy way is just to use Mathf.DeltaAngle.
Arithmetical (as opposed to algorithmic) solution:
angle = Pi - abs(abs(a1 - a2) - Pi);
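A quick Python check of that formula (it assumes |a1 - a2| <= 2*Pi, e.g. inputs already in [-Pi, Pi]):

from math import pi

def smallest_unsigned_angle(a1, a2):
    return pi - abs(abs(a1 - a2) - pi)

print(smallest_unsigned_angle(pi, -pi))          # 0.0
print(smallest_unsigned_angle(3*pi/4, -3*pi/4))  # ~1.5708 (pi/2)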
I absolutely love Peter B's answer above, but if you need a dead simple approach that produces the same results, here it is:
function absAngle(a) {
  // this yields correct counter-clock-wise numbers, like 350deg for -370
  return (360 + (a % 360)) % 360;
}

function angleDelta(a, b) {
  // https://gamedev.stackexchange.com/a/4472
  let delta = Math.abs(absAngle(a) - absAngle(b));
  let sign = absAngle(a) > absAngle(b) || delta >= 180 ? -1 : 1;
  return (180 - Math.abs(delta - 180)) * sign;
}

// sample output
for (let angle = -370; angle <= 370; angle += 20) {
  let testAngle = 10;
  console.log(testAngle, "->", angle, "=", angleDelta(testAngle, angle));
}
One thing to note is that I deliberately flipped the sign: counter-clockwise deltas are negative, and clockwise ones are positive
There is no need to compute trigonometric functions. The simple code in C language is:
#include <math.h>
#define PIV2 (M_PI + M_PI)
#define C360 360.0000000000000000000

double difangrad(double x, double y)
{
    double arg;
    arg = fmod(y - x, PIV2);
    if (arg < 0) arg = arg + PIV2;
    if (arg > M_PI) arg = arg - PIV2;
    return (-arg);
}

double difangdeg(double x, double y)
{
    double arg;
    arg = fmod(y - x, C360);
    if (arg < 0) arg = arg + C360;
    if (arg > 180) arg = arg - C360;
    return (-arg);
}
To get dif = a - b in radians:
dif = difangrad(a, b);
To get dif = a - b in degrees:
dif = difangdeg(a, b);
difangdeg(180.000000 , -180.000000) = 0.000000
difangdeg(-180.000000 , 180.000000) = -0.000000
difangdeg(359.000000 , 1.000000) = -2.000000
difangdeg(1.000000 , 359.000000) = 2.000000
No sin, no cos, no tan,.... only geometry!!!!
A simple method, which I use in C++ is:
double deltaOrientation = angle1 - angle2;
double delta = remainder(deltaOrientation, 2*M_PI);
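A Python equivalent of that remainder trick (math.remainder is available from Python 3.7 onwards; this is just a sketch):

import math

def delta(angle1, angle2):
    # IEEE remainder maps the difference into [-pi, pi]
    return math.remainder(angle1 - angle2, 2 * math.pi)

print(delta(math.radians(350), math.radians(10)))   # ~-0.349 rad, i.e. -20 degrees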