How to convert image of handwriting into pen coordinates?

I have an image with binary values (black and white) at each pixel. I want to convert this into an ordered list of pen coordinates (X,Y) which traces the path of the pen. I want to do this so that I can use an API which only takes pen coordinates as input.
Is there any library or straightforward way to do this? Thanks!

This question https://graphicdesign.stackexchange.com/questions/25165/how-can-i-convert-a-jpg-signature-into-strokes describes using a vector graphics converter to do this. It suggests first converting the pixels to binary values, and then using the autotrace tool:
autotrace -centerline -color-count 2 -output-file output.svg -output-format SVG input.png
I'm not sure what the best way to get SVG files into (X,Y) coordinates is, but you can do a rough job by parsing the XML directly.
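For example, here is a quick sketch of that idea in Python. It assumes the third-party svg.path package and that autotrace emits plain <path> elements; the file name and sample count are made up, so treat it as a starting point rather than a finished converter.
import xml.etree.ElementTree as ET
from svg.path import parse_path  # pip install svg.path

def svg_to_strokes(svg_file, samples_per_path=100):
    # Each <path> element becomes one stroke: (X, Y) points sampled at
    # evenly spaced parameter values along the path (not arc length).
    strokes = []
    for elem in ET.parse(svg_file).iter():
        if elem.tag.endswith('path'):  # tolerate the SVG namespace prefix
            path = parse_path(elem.attrib['d'])
            pts = [path.point(i / samples_per_path)
                   for i in range(samples_per_path + 1)]
            strokes.append([(p.real, p.imag) for p in pts])  # complex -> (X, Y)
    return strokes

print(svg_to_strokes('output.svg'))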
Another approach could be to look through the submissions in the Kaggle competition for this same problem: https://www.kaggle.com/c/icdar2013-stroke-recovery-from-offline-data . I haven't looked into these myself, though I imagine they'd result in better performance.

Plotting lineArcs with turf.js that don't match up with their surrounding geodesic strings

Background
We are supplied with some AIXM data (an XML-based superset of GML) which describes polygon areas on a map as a mix of GeodesicStrings (a list of coordinates) and ArcByCentrePoints (a centre-point coordinate with a radius, a start bearing and an end bearing). We take this data and convert it into a simple list of coordinates that we then display using a Google Maps polyline.
Problem
When we plot a shape with an arc, the start and end points of the arc usually don't match up with the end point of the preceding line and the start point of the subsequent line. It looks as if the radial distance is out by an amount which doesn't appear to be proportional to the radius. See screenshot: interestingly the smaller arc at the top seems fine but the larger arc is inset.
We're pretty sure the data is correct because it looks fine when we use a third party tool to visualise it, so we're doing something wrong.
Implementation
We are using the turf.js library to convert the arc description into a set of points using their lineArc function. Internally this utilises their destination function which "uses the Haversine formula to account for global curvature". We combine these generated points, in the correct sequence, with the points taken directly from the preceding and subsequent GeodesicString elements to give us our final polygon.
Data
Input: Fragment of AIXM (GML) describing polygon
Output: Resulting list of points
Help!
I'm aware this question is light on code but I hope I've described the problem adequately and that some kind person with more GIS knowledge than me (>0) might be able to point me in the right direction. Thanks :)
I've given a couple of presentations on debugging and one of the things I say is that you should keep an open mind and shouldn't get too fixated on a possible cause of a bug because you can waste a lot of time tracking down a false lead.
Sadly in this case I didn't take my own advice. I was so obsessed with the idea that the problem arose from a complex cause, such as issues with the implementation of the Haversine formula, that I overlooked the far simpler answer. My code was taking a string representation of the radius, including the units (e.g. nautical miles or meters), and converting it into kilometres. Sadly, I was using parseInt rather than parseFloat as part of this and so was instantly losing precision. It was as simple as that - a schoolboy error.
Big thanks to Stefano Borghi, a maintainer of Turf JS, for all his help with this and for helping me see the wood for the trees.

How to merge two shapes from KML files and get coordinates of overlapped area?

There are two KML/KMZ files. As an example, one of them has the coordinates of a black square and the other has the coordinates of a green square. How can I get the coordinates of the red square (the overlapped area)? Ideally this could be scripted or generated by a program.
If this can be achieved, then to summarise: the goal is to analyse and merge two KML/KMZ files of boundaries and create smaller shapes in one KML/KMZ.
Many thanks
You'll probably want to use software that can perform basic GIS analysis functions. A good free option is QGIS: load your KML files, then go to the Vector menu and find the Intersect tool.
If you need command-line/script/programmatic options, you can look into GDAL and its OGR Intersection method, as sketched below.
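For example, a rough sketch with GDAL's Python bindings (the file names and the one-polygon-per-file assumption are mine, and your GDAL build needs KML support):
from osgeo import ogr  # GDAL's Python bindings

def first_geometry(kml_path):
    # Grab the first feature's geometry; clone it so it outlives the datasource.
    ds = ogr.Open(kml_path)
    feature = ds.GetLayer(0).GetNextFeature()
    return feature.GetGeometryRef().Clone()

black = first_geometry('black_square.kml')
green = first_geometry('green_square.kml')
overlap = black.Intersection(green)  # the red square (overlapped area)
print(overlap.ExportToWkt())         # its coordinates, as WKT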

How do I add a caffe slice layer to crop images?

For example, instead of inferring a batch of 64 28x28 images and adding the 64 results together, why can't I add a layer to the network that crops these 64 images out of a 224x224 input image? It seems this would be more elegant and faster.
[gif of the object under different lighting]
How do you do this? I find it odd that I can't find slice examples like this; I guess I must be using the wrong terms or asking the question wrong.
I tried the Slice layer, but it keeps wanting to slice the 8-bit gray data, for example creating four 224x224 2-bit images.
Any ideas?
By the way my application is really cool! I am doing unsupervised grouping of 3D objects using many different lighting angles. This eliminates manual labeling of classes!
https://github.com/GemHunt/lighting-augmentation
Thanks Much! Paul Krush
This was answered for now on the Caffe Google Group:
https://groups.google.com/forum/#!topic/caffe-users/_uii8kTMOdM
One or more of these answers will work for me:
1.) Python pre- and post-processing is better
2.) Check out windowing
3.) Try the Reshape layer, (1,1,224,224) -> (1,64,28,28); but see the caveat sketched below
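A word of caution on option 3: a plain reshape keeps row-major memory order, so (1,1,224,224) -> (1,64,28,28) gives you 64 strips of 3.5 consecutive image rows each, not an 8x8 grid of 28x28 crops. A small NumPy illustration (my own sketch, not Caffe code):
import numpy as np

img = np.arange(224 * 224).reshape(224, 224)
strips = img.reshape(64, 28, 28)      # what a plain Reshape layer produces
tiles = (img.reshape(8, 28, 8, 28)    # split rows and columns into blocks
            .transpose(0, 2, 1, 3)    # gather the two block indices first
            .reshape(64, 28, 28))     # true 8x8 grid of 28x28 tiles
print(np.array_equal(strips[0], img[:28, :28]))  # False: not a spatial crop
print(np.array_equal(tiles[0], img[:28, :28]))   # True: top-left tile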

How to draw a path partially in HTML5's canvas?

Let's say I have a curved path made using a series of bezierCurveTo() calls. I'd like to make it appear progressively in an animation, by increasing the percentage of it that is drawn frame after frame. The problem is that I cannot find a standard way to draw only part of a canvas path - would someone know of a good way (or even a tricky way) to achieve this?
Sure...and Simon Porritt did all the hard math for us!
jsBezier is a small lib with a function: pointAlongCurveFrom(curve, location, distance) that will let you incrementally plot each point along your Bezier curve.
jsBezier is available on GitHub: https://github.com/sporritt/jsBezier
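If you want to see what's going on under the hood, the core math is just evaluating the cubic Bezier polynomial at increasing parameter values. This is a sketch of the standard formula (not jsBezier's actual code; the control points are made up); note that the parameter t is not proportional to arc length, which is exactly why jsBezier offers distance-based stepping:
def cubic_bezier(p0, p1, p2, p3, t):
    # Standard cubic Bezier polynomial, evaluated at parameter t in [0, 1].
    u = 1.0 - t
    x = u**3 * p0[0] + 3*u**2*t * p1[0] + 3*u*t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3*u**2*t * p1[1] + 3*u*t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

# Drawing the first 40% of the curve: plot points for t from 0 to 0.4.
curve = [cubic_bezier((0, 0), (50, 100), (150, 100), (200, 0), t / 100)
         for t in range(41)]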
Just found a small library that does exactly that: https://github.com/camoconnell/lazy-line-painter
It relies on the Raphael lib (http://raphaeljs.com/), and the two put together do not make too big a payload.

Document image processing

I'm working on an application for processing document images (mainly invoices); basically, I'd like to convert certain regions of interest into an XML structure and then classify the document based on that data. Currently I am using ImageJ for analyzing the document image and Asprise/tesseract for OCR.
Now I am looking for something to make development easier. Specifically, I am looking for something to automatically deskew a document image and analyze the document structure (e.g. converting an image into a quadtree structure for easier processing). Although I prefer Java and ImageJ, I am interested in any libraries/code/papers regardless of the programming language they're written in.
While the system I am working on should process data as automatically as possible, the user should oversee the results and, if necessary, correct the classification suggested by the system. Therefore I am interested in using machine learning techniques to achieve more reliable results. When similar documents are processed, e.g. invoices of a specific company, their structure is usually the same. When the user has previously corrected data of documents from a company, these corrections should be taken into account in the future. I have only limited knowledge of machine learning techniques and would like to know how I could realize my idea.
The following prototype in Mathematica finds the coordinates of blocks of text and performs OCR within each block. You may need to adapt the parameter values to fit the dimensions of your actual images. I do not address the machine learning part of the question; perhaps you would not even need it for this application.
Import the picture, create a binary mask for the printed parts, and enlarge these parts using a horizontal closing (dilation followed by erosion).
Query for each blob's orientation, cluster the orientations, and determine the overall rotation by averaging the orientations of the largest cluster.
Use the previous angle to straighten the image. At this time OCR is possible, but you would lose the spatial information for the blocks of text, which will make the post-processing much more difficult than it needs to be. Instead, find blobs of text by horizontal closing.
For each connected component, query for the bounding box position and the centroid position. Use the bounding box positions to extract the corresponding image patch and perform OCR on the patch.
At this point, you have a list of strings and their spatial positions. That's not XML yet, but it sounds like a good starting point to be tailored straightforwardly to your needs.
This is the code. Again, the parameters (structuring elements) of the morphological functions may need to change based on the scale of your actual images; also, if the invoice is too tilted, you may need to roughly "rotate" the structuring elements in order to still achieve good "un-skewing."
(* import the image and convert it to grayscale *)
img = ColorConvert[Import@"http://www.team-bhp.com/forum/attachments/test-drives-initial-ownership-reports/490952d1296308008-laura-tsi-initial-ownership-experience-img023.jpg", "Grayscale"];
(* binary mask of the printed parts, enlarged by a horizontal closing *)
b = ColorNegate@Binarize[img];
mask = Closing[b, BoxMatrix[{2, 20}]]
(* cluster the blob orientations; the mean of the largest cluster is the skew angle *)
orientations = ComponentMeasurements[mask, "Orientation"];
angles = FindClusters@orientations[[All, 2]]
\[Theta] = Mean[angles[[1]]]
(* straighten the image; OCR over the whole page is already possible here *)
straight = ColorNegate@Binarize[ImageRotate[img, \[Pi] - \[Theta], Background -> 1]]
TextRecognize[straight]
(* find the blocks of text and OCR each block separately *)
boxes = Closing[straight, BoxMatrix[{1, 20}]]
comp = MorphologicalComponents[boxes];
measurements = ComponentMeasurements[{comp, straight}, {"BoundingBox", "Centroid"}];
texts = TextRecognize@ImageTrim[straight, #] & /@ measurements[[All, 2, 1]];
(* pair each centroid with its recognized string, dropping empty results *)
Cases[Thread[measurements[[All, 2, 2]] -> texts], (_ -> t_) /; StringLength[t] > 0] // TableForm
The paper we use for skew angle detection is "Skew detection and text line position determination in digitized documents" by Gatos et al. The only limitation of this paper is that it can only detect skew between -5 and +5 degrees. Beyond that, we need something to slap the user with a message! :)
In your case, where there are primarily invoice scans, you may beautifully use "Multiresolution Analysis in Extraction of Reference Lines from Documents with Gray Level Background" by Tag et al.
We wrote the code in MATLAB; if you need help, let me know!
I worked on a similar project once, and being a long-time user of OpenCV I ended up using it once again. OpenCV is a popular cross-platform computer-vision library that offers programming interfaces for C and C++.
I found an interesting blog that had a post on how to detect the skew angle of text using OpenCV, and then another on how to deskew.
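The general recipe usually looks something like this in Python (my sketch of the common minAreaRect approach, not the blog's exact code; the file names and the angle convention are assumptions to verify against your OpenCV version):
import cv2
import numpy as np

def deskew(gray):
    # Ink as white foreground: invert, then let Otsu pick the threshold.
    thresh = cv2.threshold(cv2.bitwise_not(gray), 0, 255,
                           cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # The minimum-area rectangle around all ink pixels yields the skew angle.
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # Older OpenCV versions report angles in [-90, 0); normalize to the
    # rotation that straightens the text.
    angle = -(90 + angle) if angle < -45 else -angle
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

img = cv2.imread('invoice.png', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('deskewed.png', deskew(img))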
To retrieve the text of the document and be able to pass a smaller image to tesseract, I suggest taking a look at the bounding box technique.
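For instance (again a sketch under assumptions of mine: pytesseract as the Tesseract wrapper, the OpenCV 4 findContours signature, and a kernel size tuned by eye):
import cv2
import numpy as np
import pytesseract  # pip install pytesseract

img = cv2.imread('deskewed.png', cv2.IMREAD_GRAYSCALE)
thresh = cv2.threshold(cv2.bitwise_not(img), 0, 255,
                       cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# A wide horizontal kernel glues the characters of each line into one blob.
merged = cv2.dilate(thresh, np.ones((3, 31), np.uint8))
contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    text = pytesseract.image_to_string(img[y:y + h, x:x + w])
    if text.strip():
        print((x, y, w, h), text.strip())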
I don't know if the image acquisition procedure is your responsibility, but if it is, you might want to take a look at how to do camera calibration with OpenCV to fix the distortion in the image caused by some camera lenses.