Why does Tesseract OSD give only 0, 90, 270, etc.? Why not arbitrary skew angles?

When I try to identify the skew angle using Tesseract's OSD feature (via pytesseract's image_to_osd()), I always get a result of 0, 90, or 270. However, my images have arbitrary skews like 40 degrees, 55 degrees, etc. Doesn't Tesseract support skew angles other than 0, 90, and 270?
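For reference, a minimal sketch of the call in question (assuming pytesseract with a local Tesseract install; the file name is a placeholder). Note that OSD reports page orientation only in 90-degree steps, not arbitrary skew:

import pytesseract
from PIL import Image

osd = pytesseract.image_to_osd(Image.open("scan.png"))  # "scan.png" is a made-up path
print(osd)  # e.g. "Orientation in degrees: 270", "Rotate: 90", ...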

Related

How Can I Convert Dataset Annotations To Fixed (YOLOv5) Format Without Hand Encoding

So I am working on an object detection project where the first task is to identify brand logos. After doing some research, I found this dataset available for brand logos (for more about the dataset: here).
Dataset:
This dataset has 2 versions:
FlickrLogos32
FlickrLogos47 (recommended for brand detection)
As the names suggest, 32 and 47 are the number of classes offered by each version. The documentation itself mentions that the 47 version is correctly annotated and recommended for object detection & recognition, so in my project I have used the 47 version.
Model:
I am using YOLOv5 for object detection. The reason for using YOLOv5 rather than previous versions is that it is well documented, with a couple of tutorials and Jupyter notebooks available.
Problem:
For the YOLOv5 object detection model, the object labels should be annotated as
<x_center> <y_center> <width> <height>, corresponding to the bounding box (see the image below),
whereas the dataset annotations are given in the form
<x1> <y1> <x2> <y2>, where <x1>,<y1> is the upper-left corner of the bounding box
and <x2>,<y2> is the lower-right corner of the bounding box.
How can I transform <x1>,<y1>,<x2>,<y2> (the corner points of the bounding box) into the native YOLO
annotation format, i.e. <center_x>,<center_y>,<width>,<height>,
without manually going over the images one by one and drawing rectangle boxes with Roboflow?
Also, the labels are given in pixels, so we have to normalize them to (0,1).
Dataset Insights:
Each dataset example has an image (.png) and, as its label, a ground truth file (.txt) (see the image below).
The '.mask' file is just a binary mask of the object present in the image.
So a data example looks like:
Image:
gt_data.txt:
Mask:
In general, to calculate the center it should be xmin + (width / 2) and ymin + (height / 2), so I think you have the /2 in the wrong part of the equation.
Also note that a YOLO annotation will look like this:
0.642859 0.079219 0.148063 0.148062
The coordinates are relative to the size of the photo, in the range 0-1. To normalize the coordinates, divide the x values by the photo width and the y values by the photo height.
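A small sketch of that conversion in plain Python (the function and argument names are my own, not from the dataset tooling):

def corners_to_yolo(x1, y1, x2, y2, img_w, img_h):
    'Convert pixel corner coordinates to normalized YOLO (cx, cy, w, h).'
    w = x2 - x1
    h = y2 - y1
    cx = x1 + w / 2.0
    cy = y1 + h / 2.0
    # normalize x values by the image width and y values by the image height
    return cx / img_w, cy / img_h, w / img_w, h / img_h

# e.g. a 100x50 box with its upper-left corner at (200, 120) in a 640x480 image
print(corners_to_yolo(200, 120, 300, 170, 640, 480))

(In YOLOv5 label files, each of these lines is preceded by the class index.)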

How to use spatial transformer to crop the image in pytorch?

The paper on the spatial transformer network claims that it can be used to crop the image.
Given the crop region (top_left, bottom_right) = (x1, y1, x2, y2), how do I interpret the region as a transformation matrix and crop the image in PyTorch?
Here is an introduction to the spatial transformer network in Torch (http://torch.ch/blog/2015/09/07/spatial_transformers.html). In it, the bounding box that the transformer looks at is visualized. How can we determine the bounding box given the transformation matrix?
[Edit]
I just found the answer to the first question [given the crop region, find the transformation matrix].
The image in the original post already provides a good answer, but it might be useful to provide some code.
Importantly, this method should retain gradients correctly. In my case I have a batch of y,x values that represent the center of the crop position (in the range [-1,1]). As for the values a and b, which are the x and y scale values of the transformation, in my case I used 0.5 for each, in combination with a smaller output size (half the width and height), to retain the original scale, i.e. to crop. You can use 1 to have no scale change, but then there would be no cropping.
import torch
import torch.nn.functional as F

def crop_to_affine_matrix(t, a=0.5, b=0.5):
    'Turns (N,2) y,x translate values into (N,2,3) affine transformation matrices'
    t = t.reshape(-1, 1, 2, 1).flip(2)     # flip y,x order to x,y
    t = F.pad(t, (2, 0, 0, 0)).squeeze(1)  # pad zeros in front -> (N,2,3), translation in the last column
    t[:, 0, 0] = a                         # x scale
    t[:, 1, 1] = b                         # y scale
    return t

t = torch.zeros(5, 2)                       # y,x center crop positions for batch size 5
outsize = [5, 3, 112, 112]                  # (N, C, H_out, W_out), e.g. half of a 224x224 input
grid = F.affine_grid(crop_to_affine_matrix(t), outsize)
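The grid by itself does not crop anything; the actual sampling is a follow-up grid_sample call. A minimal usage sketch (the image tensor and sizes here are made up for illustration):

images = torch.rand(5, 3, 224, 224)               # hypothetical input batch
theta = crop_to_affine_matrix(torch.zeros(5, 2))  # center crops at half scale
grid = F.affine_grid(theta, [5, 3, 112, 112])     # half-size output grid
crops = F.grid_sample(images, grid)               # (5, 3, 112, 112) center crops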

understanding deconv layer math

I need help. Trying to understand how the math of a deconv layer works. Let's talk about this layer:
layer {
  name: "decon"
  type: "Deconvolution"
  bottom: "conv2"
  top: "decon"
  convolution_param {
    num_output: 1
    kernel_size: 4
    stride: 2
    pad: 1
  }
}
So basically this layer is supposed to "upscale" an image by a factor of 2. If I look at the learned weights, I see e.g. this:
-0.0629104823 -0.1560362280 -0.1512266700 -0.0636162385
-0.0635886043 +0.2607241870 +0.2634004350 -0.0603787377
-0.0718072355 +0.3858278100 +0.3168329000 -0.0817491412
-0.0811873227 -0.0312164668 -0.0321144797 -0.0388795212
So far, so good. Now I'm trying to understand how to apply these weights to actually achieve the upscaling effect. I need to do this in my own code because I want to use simple pixel shaders.
Looking at the Caffe code, "DeconvolutionLayer::Forward_cpu" internally calls "backward_cpu_gemm", which does "gemm", followed by "col2im". My understanding of how all this works is this: gemm takes the input image, and multiplies each pixel with each of the 16 weights listed above. So basically gemm produces 16 output "images". Then col2im sums up these 16 "images" to produce the final output image. But due to the stride of 2, it stretches the 16 gemm images over the output image in such a way that each output pixel is only comprised of 4 gemm pixels. Does that sound correct to you so far?
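For reference, here is a minimal single-channel sketch (plain NumPy; the helper name is made up) of that scatter-and-sum view, matching the kernel_size 4 / stride 2 / pad 1 layer above:

import numpy as np

def deconv2d(x, w, stride=2, pad=1):
    'Transposed convolution: scatter each input pixel times the full kernel, then sum the overlaps.'
    kh, kw = w.shape
    H, W = x.shape
    full = np.zeros((stride * (H - 1) + kh, stride * (W - 1) + kw))
    for i in range(H):
        for j in range(W):
            full[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * w
    # Caffe-style output size: stride*(H-1) + kernel_size - 2*pad
    return full[pad:full.shape[0] - pad, pad:full.shape[1] - pad]

x = np.arange(16, dtype=float).reshape(4, 4)  # low-res feature map
w = np.random.rand(4, 4)                      # learned 4x4 deconv kernel
print(deconv2d(x, w).shape)                   # (8, 8): upscaled by a factor of 2

Because of the stride of 2, each output pixel only ever receives contributions from a fixed subset of the 16 kernel entries, which is the weight grouping described below.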
My understanding is that each output pixel is calculated from the nearest 4 low-res pixels, using 4 weights from the 4x4 deconv weight matrix. If you look at the following image:
https://i.stack.imgur.com/X6iXE.png
Each output pixel uses either the yellow, pink, grey or white weights, but not the other weights. Do I understand that correctly? If so, I have a huge understanding problem, because in order for this whole concept to work correctly, e.g. the yellow weights should add up to the same sum as the pink weights etc. But they do not! As a result my pixel shader produces images where 1 out of 4 pixels is darker than the others, or every other line is darker, or things like that (depending on which trained model I'm using). Obviously, when running the model through Caffe, no such artifacts occur. So I must have a misunderstanding somewhere. But I can't find it... :-(
P.S: Just to complete the information: There's a conv layer in front of the deconv layer with "num_output" of e.g. 64. So the deconv layer actually has e.g. 64 4x4 weights, plus one bias, of course.
After a lot of debugging I found that my understanding of the deconv layer was perfectly alright. I fixed all the artifacts by simply dividing the bias floats by 255.0. That's necessary because pixel shaders run in the 0-1 range, while the Caffe bias constants seem to be targeted at 0-255 pixel values.
Everything working great now.
I still don't understand why the 4 weight pairs don't sum up to the same value and how that can possibly work. But what do I know. It does work, after all. I suppose some things will always be a mystery to me.

Random bezier blob?

I have some ActionScript3 code I'm using to create liquid-like "droplets", and when they're first generated they look like a curved square (that's as close as I can get them to being a circle). I've tried and failed a lot here but my goal is to make these droplets look more organic and free-form, as if you were looking closely at rain drops on your windshield before they start dripping.
Here's what I have:
var size:int = (100 - asset.width) / 4,
droplet:Shape = new Shape();
droplet.graphics.beginFill(0xCC0000);
droplet.graphics.moveTo(size / 2, 0);
droplet.graphics.curveTo(size, 0, size, size / 2);
droplet.graphics.curveTo(size, size, size / 2, size);
droplet.graphics.curveTo(0, size, 0, size / 2);
droplet.graphics.curveTo(0, 0, size / 2, 0);
// Apply some bevel filters and such...
Which yields a droplet shaped like this:
When I try adding some randomness to the size or the integers, or adding more curves in the code above, I end up getting jagged points and some line overlap/inversion.
I'm really hoping someone who is good at math or bezier logic can see something obvious that I need to do to make my consistently rounded-corner square achieve shape randomness similar to this:
First off, you can get actual circle-looking circles with beziers by using 0.55228 * size rather than half the size (in relation to bezier curves, this constant is sometimes called kappa). That only applies if you're using four segments, and that's where the other hint comes in: the more points you have, the more you can make your shape "creep", so you might actually want more segments. In that case it becomes easier to simply generate a number of points on a circle (fairly straightforward using good old sine and cosine functions and a regularly spaced angle) and then build the multi-segment Catmull-Rom curve through those points instead.
Catmull-Rom curves and Bezier curves are actually different representations of the same curvatures, so you can pretty much trivially convert from one to the other, as explained at http://pomax.github.io/bezierinfo/#catmullconv (the last item in that section gives the conversion if you don't care about the maths). You can then introduce as much random travel as you want (make the upper points a little stickier and "jerk" them down when they get too far from the bottom points to get that sticky rain look).
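A rough sketch of that approach (written in Python for brevity, with made-up names; the maths ports directly to AS3, and each resulting segment can be drawn as a cubic curve):

import math, random

def blob_segments(cx, cy, radius, n=8, jitter=0.25):
    'Jittered points on a circle, then a closed Catmull-Rom spline converted to cubic Bezier segments.'
    pts = []
    for i in range(n):
        a = 2 * math.pi * i / n
        r = radius * (1 + random.uniform(-jitter, jitter))
        pts.append((cx + r * math.cos(a), cy + r * math.sin(a)))
    segs = []
    for i in range(n):
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[(i + 1) % n], pts[(i + 2) % n]
        # uniform Catmull-Rom -> cubic Bezier: c1 = p1 + (p2 - p0)/6, c2 = p2 - (p3 - p1)/6
        c1 = (p1[0] + (p2[0] - p0[0]) / 6.0, p1[1] + (p2[1] - p0[1]) / 6.0)
        c2 = (p2[0] - (p3[0] - p1[0]) / 6.0, p2[1] - (p3[1] - p1[1]) / 6.0)
        segs.append((p1, c1, c2, p2))
    return segs  # moveTo(segs[0][0]), then draw each (c1, c2, p2) as a cubic curve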

Drawing a line: is there a limit on thickness in Graphics.lineStyle()?

I'm developing a simple graphical editor for my Flash-based app. The editor supports scaling over a wide range (maximum scale 16.0, minimum scale 0.001, default scale 0.2), so it's quite possible for a user to draw a line with thickness 0.1 or 300.0. It looks like the line thickness in Graphics.lineStyle() has an upper bound: as I found out from the livedocs, the maximum value is 255, so any thickness greater than 255.0 is drawn as a line of thickness 255.0. Here are my questions:
Right now I'm drawing lines with the drawPath() or lineTo() methods. A natural workaround when the thickness is greater than 255.0 is to draw a rectangle instead of the segment, plus two circles at the ends of the segment (instead of lineTo()). Or even to draw two thin segments and two half-circles and fill the interior. Maybe there's a more elegant/quick solution?
Another question: if the thickness of the line is big but less than 255.0 (e.g. 100.0), which is faster, drawing a line with lineTo() or drawing two thin segments and two half-circles and filling the interior?
And finally, maybe someone knows a good article/book where I can read about what's inside all the methods of the flash.display.Graphics class (or even a non-Flash-specific article/book on graphics)?
Any thoughts are appreciated. Thank you in advance!
I agree with f-a that putting the line in a container would probably be better and more efficient than drawing a rectangle and extra circles.
I don't think that the math would be too difficult to work out. For efficiency you should probably only do this if the line style is going to be over 255.
To set up the display object to hold your line, I would start by halving the width of your line (the length can stay the same). Then create a new sprite and draw the line in the sprite at half size (e.g. if you wanted 300, just draw it at 150). It is simplest to start at (0,0) and draw the segment straight, so that all of your transformations can be applied to the new sprite.
From here you can just double the scaleY of the sprite to get the desired line weight. It should keep the same length and the ends should also be rounded correctly.
Hope this helped out!
A cool resource for working with the graphics class is Flash and Math. This site has several cool effects and working examples and source code.
http://www.flashandmath.com/