How to extract text from custom blocks in an image using the Google Vision API (OCR)?

When we run Google Vision's DOCUMENT_TEXT_DETECTION on an image, it decides what the blocks in the image are and which text belongs to each block.
Here I want to get the text for blocks that I define myself (I already have a model that identifies the different blocks in an image).
In short, I want the text within the blocks defined by me, not the blocks defined by Google Vision.
How can I achieve this?

For now, I have decided to filter the symbols by each block's vertices. It would be better if there were a way to find the intersecting symbols directly, but for now I am going to loop through every symbol.
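The filtering idea can be sketched roughly as follows. This is only an illustration under assumptions: the symbols are represented as plain `(text, vertices)` tuples that you would first collect from the response (pages, blocks, paragraphs, words, symbols), a custom block is an axis-aligned rectangle, and a symbol counts as inside a block when the center of its bounding box falls inside it. The names `center` and `text_in_block` are made up for this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def center(vertices: List[Point]) -> Point:
    """Return the average of a symbol's bounding-box vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def text_in_block(symbols, block: Rect) -> str:
    """Join, in input order, the symbols whose center lies inside the block."""
    x_min, y_min, x_max, y_max = block
    kept = []
    for text, vertices in symbols:
        cx, cy = center(vertices)
        if x_min <= cx <= x_max and y_min <= cy <= y_max:
            kept.append(text)
    return "".join(kept)


# Hypothetical symbols, standing in for what the OCR response would contain.
symbols = [
    ("A", [(10, 10), (20, 10), (20, 30), (10, 30)]),
    ("B", [(22, 10), (32, 10), (32, 30), (22, 30)]),
    ("C", [(200, 10), (210, 10), (210, 30), (200, 30)]),
]
my_block = (0, 0, 100, 50)  # a block produced by your own model (assumed)
print(text_in_block(symbols, my_block))  # AB
```

Spaces and line breaks are not reconstructed here; they would have to be restored separately from whatever break information the response provides.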

I found a better way to do this. First I stack the blocks vertically into a single image, and between every pair of blocks I insert a textual separator, so that after every block there is a line of known text. This merged image is then the input for the Google Vision API. In the response we get the full text for the input, including the separator text we placed between the blocks, so we can split the full text on that separator and obtain the text block by block.
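The splitting step might look like the sketch below. The separator string `XXSEPXX` and the function name `split_blocks` are placeholders chosen for the example, and the sample text stands in for the full text returned by the OCR call. It assumes the OCR reads the separator back exactly, which is worth verifying on real images.

```python
SEPARATOR = "XXSEPXX"  # hypothetical marker rendered as a text line after each block


def split_blocks(full_text: str, separator: str = SEPARATOR) -> list:
    """Split the OCR full text into one string per block."""
    parts = full_text.split(separator)
    # A separator follows every block, so the final piece is only trailing whitespace.
    if parts and not parts[-1].strip():
        parts = parts[:-1]
    # Empty blocks are kept so that indices still line up with the input blocks.
    return [part.strip() for part in parts]


ocr_text = "Invoice 42\nXXSEPXX\nTotal: 10 EUR\nXXSEPXX\n"
print(split_blocks(ocr_text))  # ['Invoice 42', 'Total: 10 EUR']
```

A separator that is unlikely to appear in the blocks themselves, and that the OCR engine recognises reliably, keeps the split unambiguous.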

Related

How can a scanned page be divided into words like the reCaptcha project?

I would like to digitize a book in a similar way to the reCaptcha project. Is there already a system for inputting an image and then outputting little images cropped around words? Any ideas on how to do this?
You should look into the Tesseract OCR project on which reCaptcha was probably based. It has the capability to output the coordinates of recognized words. Then you crop the page to those coords and you are done.
If you just want to split the image in multiple images one word each you could try to find the word bounding boxes and then take those co-ordinates for the splitting. This can be done by taking histograms/projections of the document in horizontal direction and then for each line in vertical direction. An example algorithm with some pictures describing the idea can be found in this paper: "Document Page Decomposition by the Bounding-Box Projection Technique" (http://haralick.org/conferences/71281119.pdf). You could implement this in OpenCV.
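A minimal sketch of the projection idea, in plain Python rather than OpenCV so that it has no dependencies. It assumes the page is already binarized into rows of 0/1 values (1 = ink); the function names and the `min_word_gap` parameter are illustrative and not taken from the paper. Rows are projected first to find text lines, then the columns of each line are projected to find words.

```python
def runs(profile, min_gap=1):
    """Return (start, end) spans of non-zero entries, end exclusive.

    Spans separated by fewer than min_gap zero entries are merged.
    """
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
        elif start is not None:
            spans.append([start, i])
            start = None
    if start is not None:
        spans.append([start, len(profile)])
    merged = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1][1] = e  # gap too small: same word
        else:
            merged.append([s, e])
    return [tuple(span) for span in merged]


def word_boxes(image, min_word_gap=2):
    """Return (left, top, right, bottom) boxes, right and bottom exclusive."""
    boxes = []
    row_profile = [sum(row) for row in image]  # ink per row
    for top, bottom in runs(row_profile):  # each run is a text line
        band = image[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # ink per column
        for left, right in runs(col_profile, min_word_gap):
            boxes.append((left, top, right, bottom))
    return boxes


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
print(word_boxes(page))  # [(0, 1, 3, 3), (5, 1, 7, 3), (1, 4, 4, 5)]
```

On real scans the profiles are noisy, so a small threshold instead of "greater than zero" and a gap size tuned to the font are likely needed.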
Alternatively, you can use Tesseract as mentioned by beppe9000. Perhaps this helps: Getting the bounding box of the recognized words using python-tesseract
But then you get the whole complexity of training OCR even though you only want the bounding boxes.

computer vision to separate characters from image

I'm trying to separate text from a background that looks very similar to the text itself.
Any idea on how to extract HDP 250?
The simple way would be to create a database of what each letter should look like, then use template matching to find the most probable letter. You will have to impose some restrictions on image orientation, scale and illumination; they cannot vary too much. Also learn what empty space should look like, and only accept a letter when its probability of being present is higher than that of empty space.

OCR match frame's position to field in credit card

I am developing an OCR to read credit cards.
After scanning the image I get a list of words with their positions.
Any tips/suggestions about the best approach to detect which words correspond to each field of credit card (number, date, name)?
For example:
position = 96.00 491.00
text = CARDHOLDER
Thanks in advance
Your first problem is that most OCRs are not optimised for small amounts of text that take up most of the "page" (or card image, in your case) in spatially separated chunks. They expect lines, or pages of text from a scanned book or a newspaper. So straight away they're not likely to do that well at analysing the image.
Because the font is fairly uniform they'll likely recognise the characters well, but the layout will confuse the page segmentation algorithm and so the text you get out might not be in the right order. For example, the "1234" of the card number and the smaller "1234" below it constitute a single column of text, likewise the second two sets of four numbers and the expiration date.
For specialized cases where you know the layout in advance you really want to develop your own page segmentation algorithm to break up the image into zones, e.g. card number, card holder name, start and expiration dates. This shouldn't be too hard because I think the location of these components is standardised on credit cards. Assuming good preprocessing and binarization you could basically do a horizontal histogram and split the image at the troughs.
Then extract each zone as a separate image containing just one line of text and feed it to the OCR.
Alternatively (the quick and dirty approach):
Instruct the OCR that what you want to recognise consists of a single column (i.e. prevent it from trying to figure out the page layout itself). You can do this with Tesseract using the -psm (page segmentation mode) parameter set to, probably, 6 (but try and see what gives you the best results)
Make Tesseract output hOCR format, which you can set in the configfile. hOCR format includes the bounding boxes of the lines that get output relative to the whole image.
Write an algorithm that compares the bounding boxes in the hOCR to where you know each card component should be (looking for some percentage of overlap; it won't match exactly, for obvious reasons).
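The overlap comparison from the last step could be sketched like this. The zone coordinates, the 50% threshold and the function names are assumptions made up for the example; the boxes are taken to be `(x0, y0, x1, y1)` tuples already parsed from the hOCR `bbox` values.

```python
def overlap_fraction(box, zone):
    """Fraction of the box's area that lies inside the zone."""
    width = min(box[2], zone[2]) - max(box[0], zone[0])
    height = min(box[3], zone[3]) - max(box[1], zone[1])
    if width <= 0 or height <= 0:
        return 0.0  # no intersection
    area = (box[2] - box[0]) * (box[3] - box[1])
    return (width * height) / area if area > 0 else 0.0


def assign_field(box, zones, min_overlap=0.5):
    """Return the name of the best-overlapping zone, or None if below the threshold."""
    best_name, best = None, 0.0
    for name, zone in zones.items():
        fraction = overlap_fraction(box, zone)
        if fraction > best:
            best_name, best = name, fraction
    return best_name if best >= min_overlap else None


# Hypothetical card layout in pixels; real values depend on your image size.
ZONES = {
    "number": (40, 250, 600, 320),
    "expiry": (300, 340, 450, 390),
    "name": (40, 400, 450, 460),
}

print(assign_field((50, 255, 590, 315), ZONES))  # number
print(assign_field((90, 405, 300, 450), ZONES))  # name
```

Measuring overlap relative to the recognised box, rather than the zone, tolerates zones that are drawn generously around where each field is expected.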
In addition to the good tips provided by Mikesname, you can greatly improve the recognition result regardless of which OCR engine you use if you use image processing to convert the image to bitonal (pure black and white), such as the attached copy of your image.

html5 canvas draw text with multiple fonts

How can I fill text in the canvas with multiple fonts?
I want to be able to draw something like this on the canvas:
This is an example of what I want to do
this is another example of what I want to do
I know that I can split the string and first fill the normal text, then the bold text, and then the rest of the text. But I want to be able to drag and drop the text, so I can't do it that way.
Sorry, you're out of luck. There's no easy way out here.
You have to call fillText at least three times if you want text with a bolded word in the center.
There may be a time in the future where you are allowed to draw arbitrary html, the spec mentions this is a real possibility, but that won't be for some time. To quote the spec:
Note: A future version of the 2D context API may provide a way to render fragments of documents, rendered using CSS, straight to the canvas. This would be provided in preference to a dedicated way of doing multiline layout.
From the end of this section.
You can of course still drag and drop, you just have to have a list of elements and their locations that make up a "node". Much more complicated objects have been done in the canvas no doubt.

How to implement text selecting?

My question is not language based or OS based. I guess every system offers some sort of TextOut(text, x, y) method. I am looking for some guidelines or articles on how I should implement selection of the text that has been output. I could not find any info about this.
The only thing which comes to my mind is like this:
When the user clicks some point on the text canvas, I know the coordinates of this point. I need to calculate where exactly it falls in my text buffer. So I traverse the buffer from the beginning and apply to each character (or block of text) its style, if it has any. Once the style is applied, I know the size of that letter, and I add its width and height to the previously calculated X,Y coordinates. In this way I traverse the buffer until the calculated position reaches the point that the user clicked. Once I am within some offset of that point, I have the starting point for the selection.
This is the basic idea. I don't know if this is good; I would like to know how this is really done, for example in Firefox. I know I could browse the sources, and if I have no choice I'll do it. But first I am trying to find some article about it...
Selecting text is inherently specific to the control which is containing it and the means it stores that text.
A very simple (though arguably inefficient) approach is to run the text flow algorithm you are using when the user clicks on a point, and to stop the algorithm when you have reached whatever is closest to that point. More advanced controls might cache the text layout to make selections or drawing their content more efficient. Depending on how much you value CPU time or memory there are ways to use caches and special cases to make this “hit test” cheaper.
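For a single line of text, the "run the layout until you reach the click" idea could look like the following sketch. The `char_width` callback is a stand-in for whatever text-extent function your platform provides, and the widths used here are invented for the example.

```python
def hit_test(text, x, char_width):
    """Return the caret index closest to horizontal position x on one line."""
    pos = 0.0
    for index, ch in enumerate(text):
        width = char_width(ch)
        # A click in the left half of a glyph puts the caret before it.
        if x < pos + width / 2:
            return index
        pos += width
    return len(text)  # click is past the end of the line


def demo_width(ch):
    """Hypothetical metrics: narrow glyphs are 4 units wide, the rest 8."""
    return 4 if ch in "il" else 8


print(hit_test("hill", 9, demo_width))    # 1
print(hit_test("hill", 11, demo_width))   # 2
print(hit_test("hill", 100, demo_width))  # 4
```

Caching the running positions per line would avoid re-measuring every character on each click, at the cost of invalidating the cache whenever the text or style changes.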
If you can make any assertions (only one font in the control, so every line has the same height) then it is possible to make these tests cheaper by indexing the font layout by lines and then doing simple arithmetic to find out which line was clicked on. If your text control is also using monospace fonts (every character occupies the same width as well as height) then you are in even more luck, as you can jump straight to the character information via a lookup table and two simple divisions.
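With a monospace font the lookup reduces to the two divisions mentioned above. A small sketch, assuming the text is stored as a non-empty list of lines and that `char_w` and `line_h` are the fixed cell dimensions in pixels (all names are illustrative):

```python
def monospace_hit_test(lines, x, y, char_w, line_h):
    """Map a click at pixel (x, y) to a (row, column) position in the text.

    Assumes lines is non-empty and every character cell is char_w by line_h.
    """
    row = int(y // line_h)
    row = min(max(row, 0), len(lines) - 1)  # clamp to existing lines
    col = int(x // char_w)
    col = min(max(col, 0), len(lines[row]))  # clamp to the end of that line
    return row, col


lines = ["hello", "world!"]
print(monospace_hit_test(lines, 20, 20, char_w=8, line_h=16))  # (1, 2)
print(monospace_hit_test(lines, 500, 5, char_w=8, line_h=16))  # (0, 5)
```

The clamping handles clicks outside the text area, which a real control would also need to deal with.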
Keep in mind that writing a text control from scratch is obscenely difficult. For best practice, you should keep the content of the document separate from the display information. The reason is that the text itself will need to be edited quite often, so data structures such as Ropes or Gap Buffers may be employed on the data side to provide faster insertion around the caret. Every time text is edited it must also be rendered, which involves taking that data and running it through some kind of formatting / flow algorithms to determine how it needs to be displayed to the user. Both of these sides require a lot of algorithms that may be annoying to get right.
Unfortunately using the native TextOut functions will not help you. You will need to use methods which give you the text extents for individual characters, and more advanced (multiline for example) controls often must do their own rendering of characters using this information. Functions like TextOut are not built to deal with blinking insertion carets for example, or performing incremental updates on text layouts. While some TextOut style functions may support word wrap and alignment for you, they also require re-rendering the entire string which becomes more undesirable in proportion to the amount of text you need to work with in your control.
You are thinking at a much lower level than necessary (not an insult; you think you need to do much more work than you actually do). Most (if not all) languages with GUI support will also have some form of selectionRange that gives you either the string that was selected or the start and stop indices in the string.
With a modern language, you should never have to calculate pixels and character widths.
For text selection in Javascript, see this question: Understanding what goes on with textarea selection with JavaScript