Problem with pre-annotations (YOLOv5) in Label Studio

I am trying to use annotations from my YOLOv5 model in Label Studio. I was able to bring the annotations into Label Studio and display them. The problem is the inaccuracy of the boxes displayed by Label Studio (I have images generated by YOLO showing how the predictions should look). The only transformation I applied to the x, y, width, height values obtained from YOLO is multiplying by 100.
Predictions from YOLO: [screenshot]
Label Studio: [screenshot]

Please use label-studio-converter for this: https://github.com/heartexlabs/label-studio-converter/pull/46
Most likely you calculated the coordinates incorrectly; the correct code is here: https://github.com/heartexlabs/label-studio-converter/blob/master/label_studio_converter/imports/yolo.py#L85

only transformation I did to the x, y, width, height data obtained from yolo is multiplying by 100
Building on Max's answer, you can use the following formulas (from here) to convert YOLO coordinates to Label Studio-compatible coordinates:
x_lbstdio = (x - w/2) * 100
y_lbstdio = (y - h/2) * 100
w_lbstdio = w * 100
h_lbstdio = h * 100
Note that the only change is subtracting half of the width and half of the height from x and y respectively before multiplying by 100.
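A minimal sketch of that conversion in Python (the function name is mine, and it assumes the YOLO values are already normalized to the 0-1 range, matching the percentage-based x/y/width/height fields Label Studio expects):

def yolo_to_labelstudio(x_center, y_center, w, h):
    """Convert a normalized YOLO box (center x/y, width, height in 0-1)
    to Label Studio percentages (top-left x/y, width, height in 0-100)."""
    return {
        "x": (x_center - w / 2) * 100,  # left edge as a percentage
        "y": (y_center - h / 2) * 100,  # top edge as a percentage
        "width": w * 100,
        "height": h * 100,
    }

# Example: a box centered at (0.5, 0.5) covering 20% x 10% of the image
print(yolo_to_labelstudio(0.5, 0.5, 0.2, 0.1))
# {'x': 40.0, 'y': 45.0, 'width': 20.0, 'height': 10.0}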

Related

How Can I Convert Dataset Annotations To Fixed (YOLOv5) Format Without Hand Encoding

So I am working on this awesome object detection project, where the primary task is to identify brand logos. After doing some research I found this dataset available for
brand logos. For more about the dataset: here
DATASET:
This dataset has 2 versions:
FlickrLogos32
FlickrLogos47(recommended for brand detection)
As the names suggest, 32 and 47 are the numbers of classes offered by this dataset. The documentation itself mentions that the 47 version is correctly annotated and recommended for object detection & recognition; in my project I have used the 47 version.
Model:
I am using YOLOv5 for object detection. The
reason for using YOLOv5 and not previous versions is that it is well documented, with a couple of tutorials and Jupyter notebooks available.
Problem:
For the YOLOv5 object detection model, the object label should be annotated as
<x_center> <y_center> <width> <height>, corresponding to the bounding box (below image),
whereas the dataset annotations are given in the form
<x1> <y1> <x2> <y2>, where <x1>,<y1> is the upper left corner of the bounding box
and <x2>,<y2> is the lower right corner of the bounding box.
How can I transform the <x1>,<y1>,<x2>,<y2> corner points of the bounding box into the native YOLO
annotation format, i.e. <center_x>,<center_y>,<width>,<height>,
without manually going over the images one by one and drawing rectangle boxes with Roboflow?
Also, the labels are annotated in pixels, so we have to normalize them to (0,1).
Dataset Insights:
Each dataset example has an image (.png) and, as its label, a ground-truth file (.txt) (see below image).
The '.mask' file is just a binary mask of the object present in the image.
So a data example looks like:
Image:
gt_data.txt:
Mask:
In general, to calculate the center it should be xmin + (width/2) and ymin + (height/2). So I think you have the /2 in the wrong part of the equation.
Also note that a YOLO annotation will look like this:
0.642859 0.079219 0.148063 0.148062
The coordinates are relative to the size of the photo, in the range 0-1. To normalize the coordinates, divide the x dimensions by the photo width and the y dimensions by the photo height.
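A minimal sketch of that conversion (the function name and argument order are my own; it assumes the corner coordinates are given in pixels, as in the ground-truth files described above):

def corners_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert pixel corners (x1, y1) = upper left, (x2, y2) = lower right
    to normalized YOLO format: x_center, y_center, width, height in [0, 1]."""
    box_w = x2 - x1
    box_h = y2 - y1
    x_center = x1 + box_w / 2
    y_center = y1 + box_h / 2
    # Normalize x values by the image width and y values by the image height
    return (x_center / img_w, y_center / img_h, box_w / img_w, box_h / img_h)

# Example: a 100x50 px box with its upper-left corner at (200, 100) in a 640x480 image
print(corners_to_yolo(200, 100, 300, 150, 640, 480))
# (0.390625, 0.2604..., 0.15625, 0.1041...)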

How to use spatial transformer to crop the image in pytorch?

The spatial transformer network paper claims that it can be used to crop an image.
Given the crop region (top_left, bottom_right) = (x1, y1, x2, y2), how do I interpret the region as a transformation matrix and crop the image in PyTorch?
Here is an introduction to the spatial transformer network in Torch (http://torch.ch/blog/2015/09/07/spatial_transformers.html); it visualizes the bounding box the transformer looks at. How can we determine the bounding box given the transformation matrix?
[Edit]
I just found the answer to the first question [given the crop region, find the transformation matrix].
The image in the original post already provides a good answer, but it might be useful to provide some code.
Importantly, this method should retain gradients correctly. In my case I have a batch of y,x values that represent the center of the crop position (in the range [-1,1]). As for the values a and b, which are scale x and y values for the transformation, in my case I used 0.5 for each in combination with a smaller output size (half in width and height) to retain the original scale, i.e. to crop. You can use 1 to have no scale changes, but then there would be no cropping.
import torch
import torch.nn.functional as F

def crop_to_affine_matrix(t, a, b):
    'Turns (N,2) translate values into an (N,2,3) affine transformation matrix; a and b are the x and y scale factors'
    t = t.reshape(-1, 1, 2, 1).flip(2)     # flip x,y order to y,x
    t = F.pad(t, (2, 0, 0, 0)).squeeze(1)  # pad to (N,2,3); translation ends up in the last column
    t[:, 0, 0] = a
    t[:, 1, 1] = b
    return t

t = torch.zeros(5, 2)      # center crop positions for batch size 5
outsize = (5, 3, 64, 64)   # desired output size (N, C, H, W); example values for a half-size crop
grid = F.affine_grid(crop_to_affine_matrix(t, 0.5, 0.5), outsize)
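To actually perform the crop, the grid is passed to F.grid_sample together with a batch of images; a brief usage sketch under the same assumptions (the random input tensor is just an example):

images = torch.randn(5, 3, 128, 128)   # example input batch (N, C, H, W)
crops = F.grid_sample(images, grid)    # (5, 3, 64, 64) differentiable crops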

How to crop features outside an image region using pytorch?

We can use ROI-Pool/ROI-Align to crop the sub-features inside an image region (which is a rectangle).
I was wondering how to crop features outside this region.
In other words, how to set the values (of a feature map) inside a rectangular region to zero, while the values outside the region remain unchanged.
I'm not sure that this idea of ROI align is quite correct. ROI pool and align are used to take a number of differently sized regions of interest identified in the original input space (i.e. pixel-space) and output a set of same-sized feature crops from the features calculated by (nominally) the convolutional network.
As perhaps a simple answer to your question, though, you simply need to create a mask tensor of ones with the same dimensions as your feature maps, set the values within the ROIs to zero in this mask, and then multiply the mask by the feature maps. This will suppress all values within the ROIs. Creating this mask should be fairly simple. I did it with a for-loop to avoid thinking, but there are likely more efficient ways as well.
import torch

# feature_maps: batch_size x num_feature_maps x width x height
batch_size, num_feature_maps, width, height = feature_maps.shape
mask = torch.ones(width, height)
for ROI in ROIs:  # assuming each ROI is [xmin ymin xmax ymax]
    mask[ROI[0]:ROI[2], ROI[1]:ROI[3]] = 0
mask = mask.unsqueeze(0).unsqueeze(0)                   # 1 x 1 x width x height
mask = mask.repeat(batch_size, num_feature_maps, 1, 1)  # batch_size x num_feature_maps x width x height
output = torch.mul(mask, feature_maps)

What does the training data look like in a YOLO model

I'm trying to figure out how to create a YOLOv1 model from scratch, but I can't figure out what the training data should look like. I suspect the training labels (ground truth) look like a matrix of shape (7, 7, 5*2 + 10), where
7x7 stands for the prediction grid
5 is the object location plus confidence (always equal to 1); x, y - known box center; w, h - box width and height
*2 is because there should be a horizontal and a vertical box for each cell
10 is the one-hot encoding of the class present at this position
What I don't understand is:
whether to set confidence == 1 for both the horizontal and the vertical bounding box?
whether x and y should be coordinates in the original (resized for the input) image?
...or maybe I'm completely off with my whole understanding. Does somebody have experience with YOLO?
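For what it's worth, a minimal sketch of building a target tensor with the layout the question suspects (S=7 grid, B=2 boxes per cell, C=10 classes). The choices made here, writing the same ground-truth box into both box slots of its cell and storing x, y as offsets relative to the cell, are assumptions for illustration, not a definitive answer:

import torch

def build_yolo_v1_target(boxes, labels, S=7, B=2, C=10):
    """boxes: list of (x_center, y_center, w, h) normalized to [0, 1];
    labels: list of class indices. Returns an (S, S, B*5 + C) target tensor."""
    target = torch.zeros(S, S, B * 5 + C)
    for (x, y, w, h), cls in zip(boxes, labels):
        col = min(int(x * S), S - 1)                # grid cell responsible for this box
        row = min(int(y * S), S - 1)
        x_cell, y_cell = x * S - col, y * S - row   # center offset within the cell
        for b in range(B):                          # assumption: fill both box slots identically
            target[row, col, b * 5:b * 5 + 5] = torch.tensor([x_cell, y_cell, w, h, 1.0])
        target[row, col, B * 5 + cls] = 1.0         # one-hot class encoding
    return target

# Example: one object centered slightly right of the image center, class 3
target = build_yolo_v1_target([(0.6, 0.5, 0.2, 0.3)], [3])
print(target.shape)  # torch.Size([7, 7, 20])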

Google Maps pixel height by latitude

In Google Maps, the closer one gets to the pole, the more stretched out the map gets, so each pixel of the map represents less movement (asymptotically approaching 0 at the north pole).
I'm looking for a formula connecting the width of a pixel in degrees to the latitude (i.e. the real-world distance represented by a pixel on the map). I have some data points here for zoom level 12 (IIRC):
Lat Width
0 0.703107352
4.214943141 0.701522096
11.86735091 0.688949038
21.28937436 0.656590105
30.14512718 0.60989762
35.46066995 0.574739011
39.90973623 0.541457085
41.5085773 0.528679228
44.08758503 0.507194173
47.04018214 0.481321842
48.45835188 0.468430215
51.17934298 0.442887842
63.23362741 0.318394373
72.81607372 0.208953319
80.05804956 0.122131316
90 0
The reason for doing this is that I want to input lat/lng pairs and work out exactly which pixel they would be located at with respect to 0,0.
I might be wrong, but are you sure those points are the pixel height? They seem to follow a cosine, which would be the pixel width, not the height.
After a little trigonometry, the pixel height adjusts to the formula:
[formula image]
where R is the earth radius, phi is the latitude and h is the height of a pixel at the equator.
This formula does not fit your points; that's why I asked whether it was the width instead.
Anyway, if you want so much precision that you cannot use the approximation in the previous answer, you should also consider that R varies with the latitude, and even with that I don't think you'll get the exact result.
Update:
Then the formula would be a cosine. If you want to take the variable radius of the earth into account, the formula would be:
[formula image]
where R is the radius of the earth and d(0) is your pixel width at the equator. You may use this formula for R, assuming the earth to be an ellipsoid:
[formula image]
with a = 6378.1 (equator) and b = 6356.8 (poles)
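Since Google Maps uses the Web Mercator projection, here is a minimal sketch of the standard lat/lng-to-pixel conversion, which covers the original goal of locating lat/lng pairs on the map. Pixel coordinates are measured from the top-left corner of the world map; the 256-pixel base tile size is the usual Google Maps convention, and coordinates relative to (0, 0) lat/lng are just a subtraction of that point's pixel position:

import math

def latlng_to_pixel(lat, lng, zoom):
    """Convert latitude/longitude in degrees to global pixel coordinates
    at the given zoom level, using the standard Web Mercator projection."""
    map_size = 256 * (2 ** zoom)          # world map is 256 * 2^zoom pixels wide
    x = (lng + 180.0) / 360.0 * map_size
    siny = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * map_size
    return x, y

# Example: the equator / prime meridian lands at the center of the 512 px map at zoom 1
print(latlng_to_pixel(0.0, 0.0, 1))   # (256.0, 256.0)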
While I am not sure what "height of a pixel" means, the plot of data (shown below) seems to fit the equation
y = a + bx + cx^2 + dx^3 where y = height, x = latitude
with coefficients
a = 7.0240278979641990E-01
b = 3.7784208874521786E-04
c = -1.2602864112736206E-04
d = 3.8304225582846095E-07
The general approach to find the equation is to first plot the data, then hypothesize the type of function, and then do a regression to find the coefficients.