OCR algorithm: distinguish between a textual image and an object image

I am writing a program that extracts the contents from the logos of different websites. I am using OCR to extract the text from each logo, but I want to optimize the program and apply OCR only to those logos that actually contain text. How can I determine whether a logo contains text? Is there any method?

This is a case where we need to know whether an image has text in it at all; that is a detection problem, distinct from OCR itself.
The algorithm considered the best to date is the Stroke Width Transform. It was designed by Epshtein et al. at Microsoft in 2010, and it does not use any machine learning.
You can get more details from this paper: Detecting Text in Natural Scenes with Stroke Width Transform.
Or watch a video about this.
There is an implementation of this algorithm here.
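The SWT itself takes some effort to implement. If all you need is a cheap pre-filter before running OCR, here is a minimal sketch of a text-presence heuristic. It is not the Stroke Width Transform: it uses OpenCV's MSER region detector instead, and the aspect-ratio and count thresholds are guesses you would tune on your own logo set.

```python
import cv2

def likely_contains_text(image_path, min_char_regions=5):
    """Rough pre-filter: count small, character-like MSER regions."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    char_like = 0
    for pts in regions:
        x, y, w, h = cv2.boundingRect(pts.reshape(-1, 1, 2))
        # characters tend to be smallish blobs with moderate aspect ratios
        if 0.1 < w / float(h) < 3.0 and 5 < h < gray.shape[0] // 2:
            char_like += 1
    return char_like >= min_char_regions

# only run the expensive OCR step when the heuristic fires:
# if likely_contains_text("logo.png"):
#     text = run_ocr("logo.png")
```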

Related

How can a scanned page be divided into words like the reCaptcha project?

I would like to digitize a book in a similar way to the reCaptcha project. Is there already a system for inputting an image and then outputting little images cropped around words? Any ideas on how to do this?
You should look into the Tesseract OCR project, on which reCaptcha was probably based. It has the capability to output the coordinates of recognized words. Then you crop the page to those coordinates and you are done.
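As a concrete illustration, here is a minimal sketch using the python-tesseract wrapper (pytesseract): image_to_data returns each recognized word together with its bounding box, which you can crop directly. The file names are placeholders.

```python
import pytesseract
from pytesseract import Output
from PIL import Image

page = Image.open("scanned_page.png")
data = pytesseract.image_to_data(page, output_type=Output.DICT)

for i, word in enumerate(data["text"]):
    if word.strip():  # skip the empty entries Tesseract emits for blocks/lines
        left, top = data["left"][i], data["top"][i]
        right = left + data["width"][i]
        bottom = top + data["height"][i]
        page.crop((left, top, right, bottom)).save(f"word_{i:04d}.png")
```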
If you just want to split the image into multiple images, one word each, you could try to find the word bounding boxes and then use those coordinates for the splitting. This can be done by taking histograms/projections of the document in the horizontal direction and then, for each line, in the vertical direction. An example algorithm with some pictures describing the idea can be found in this paper: "Document Page Decomposition by the Bounding-Box Projection Technique" (http://haralick.org/conferences/71281119.pdf). You could implement this in OpenCV.
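A minimal sketch of the projection idea, assuming a reasonably clean, deskewed scan; the min_gap value that separates inter-word gaps from inter-character gaps is a guess to tune per document.

```python
import cv2

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
# binarize so ink becomes 1 and background 0 (Otsu picks the threshold)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
ink = binary // 255

def spans(profile, min_gap=1):
    """(start, end) intervals of non-zero runs, merging gaps < min_gap."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    merged = []
    for s, e in out:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

# horizontal projection gives the text lines, then a vertical projection
# within each line gives the word boxes
for top, bottom in spans(ink.sum(axis=1)):
    for left, right in spans(ink[top:bottom].sum(axis=0), min_gap=8):
        word = img[top:bottom, left:right]  # one cropped word image
```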
Alternatively, you can use Tesseract as mentioned by beppe9000. Perhaps this helps: Getting the bounding box of the recognized words using python-tesseract
But then you take on the whole complexity of OCR, including training, even though you only want the bounding boxes.

Could you explain SVG tags in a website?

I found a website that I like, and I downloaded the HTML source code.
I understand all of the HTML source except the SVG code on the homepage. Can you explain what all the numbers mean?
You can just explain the meaning of the numbers in the SVG tags. Thank you :)
This is the HTML source: http://1drv.ms/1FPR9Iw
Scalable Vector Graphics (SVG) is the description of an image as an application of the Extensible Markup Language (XML). Any program, such as a web browser, that recognizes XML can display the image using the information provided in the SVG format. The scalable part of the term emphasizes that, unlike raster graphics, vector graphic images can easily be made scalable (an image specified in raster graphics is a fixed-size bitmap). Thus, the SVG format enables the viewing of an image on a computer display of any size and resolution, whether a tiny LCD screen in a cell phone or a large CRT display in a workstation. In addition to ease of size reduction and enlargement, SVG allows text within images to be recognized as such, so that the text can be located by a search engine and easily translated into other languages.
Being a vector graphics format, SVG is mostly useful for vector-type diagrams like:
Two-dimensional graphs in an X,Y coordinate system.
Column charts, pie charts etc.
Scalable icons and logos for web, tablet and mobile apps and webapps.
Architecture and design diagrams
etc.
So the numbers are coordinates for your shapes.
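For illustration, here is a minimal hand-written SVG fragment (not taken from your downloaded page); every number in it is a coordinate or a length in the image's own coordinate system:

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <!-- rect: x,y place the top-left corner; width,height give its size -->
  <rect x="10" y="10" width="80" height="40" fill="blue" />
  <!-- circle: cx,cy are the centre coordinates; r is the radius -->
  <circle cx="150" cy="50" r="30" fill="red" />
  <!-- path: M moves the pen to (10,90), L draws a line to (190,90) -->
  <path d="M 10 90 L 190 90" stroke="black" />
</svg>
```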
To know more about it, how it is useful, and how to use it, please follow the links below:
You can read more about it here and here.
SVG (Scalable Vector Graphics) defines graphics in XML format. You can read more about it here: SVG at W3Schools.

OCR: match a frame's position to a field on a credit card

I am developing an OCR to detect credit cards.
After scanning the image I get a list of words with their positions.
Any tips/suggestions about the best approach to detect which words correspond to each field of the credit card (number, date, name)?
For example:
position = 96.00 491.00
text = CARDHOLDER
Thanks in advance
Your first problem is that most OCRs are not optimised for small amounts of text that take up most of the "page" (or card image, in your case) in spatially separated chunks. They expect lines, or pages of text from a scanned book or a newspaper. So straight away they're not likely to do that well at analysing the image.
Because the font is fairly uniform they'll likely recognise the characters well, but the layout will confuse the page segmentation algorithm and so the text you get out might not be in the right order. For example, the "1234" of the card number and the smaller "1234" below it constitute a single column of text, likewise the second two sets of four numbers and the expiration date.
For specialized cases where you know the layout in advance, you really want to develop your own page segmentation algorithm to break up the image into zones, e.g. card number, cardholder name, start and expiration dates. This shouldn't be too hard, because I think the locations of these components are standardised on credit cards. Assuming good preprocessing and binarization, you could basically do a horizontal histogram and split the image at the troughs.
Then extract each zone as a separate image containing just one line of text and feed it to the OCR.
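A minimal sketch of that zone-splitting step, assuming the card image is already cropped, deskewed and binarized well; the trough detection here is the simplest possible (rows with no ink at all), and the file names are placeholders.

```python
import cv2

card = cv2.imread("card.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(card, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# horizontal histogram: amount of ink in each row; empty rows are the troughs
row_ink = (binary // 255).sum(axis=1)

zones, start = [], None
for y, v in enumerate(row_ink):
    if v > 0 and start is None:
        start = y
    elif v == 0 and start is not None:
        zones.append((start, y))
        start = None
if start is not None:
    zones.append((start, len(row_ink)))

# each zone should hold a single line (number, dates, cardholder name);
# save them and feed each one to the OCR separately
for i, (top, bottom) in enumerate(zones):
    cv2.imwrite(f"zone_{i}.png", card[top:bottom, :])
```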
Alternatively (the quick and dirty approach):
Instruct the OCR that what you want to recognise consists of a single column (i.e. prevent it from trying to figure out the page layout itself). You can do this with Tesseract using the -psm (page segmentation mode) parameter set to, probably, 6 (but try and see what gives you the best results).
Make Tesseract output hOCR format, which you can set in the config file. hOCR format includes the bounding boxes of the lines that get output, relative to the whole image.
Write an algorithm that compares the bounding boxes in the hOCR to where you know each card component should be (looking for some percentage of overlap; it won't match exactly, for obvious reasons).
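A sketch of the last two steps, using the pytesseract wrapper and BeautifulSoup to parse the hOCR output. The ZONES coordinates are made-up placeholders you would measure once from a reference card, and the 0.5 overlap fraction is a starting point, not a known-good value. (Recent Tesseract versions spell the flag --psm; version 3 used -psm.)

```python
import pytesseract
from PIL import Image
from bs4 import BeautifulSoup

# expected layout zones (x0, y0, x1, y1) -- illustrative values only
ZONES = {
    "number": (40, 200, 600, 260),
    "expiry": (300, 280, 460, 320),
    "name":   (40, 340, 420, 390),
}

def overlap(a, b):
    """Fraction of box a that is covered by box b."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    return ix * iy / float((a[2] - a[0]) * (a[3] - a[1]))

hocr = pytesseract.image_to_pdf_or_hocr(
    Image.open("card.png"), extension="hocr", config="--psm 6")
soup = BeautifulSoup(hocr, "html.parser")

for line in soup.find_all(class_="ocr_line"):
    # hOCR stores the box in the title attribute: "bbox x0 y0 x1 y1; ..."
    box = [int(v) for v in line["title"].split(";")[0].split()[1:5]]
    for field, zone in ZONES.items():
        if overlap(box, zone) > 0.5:  # "some percentage of overlap"
            print(field, line.get_text(strip=True))
```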
In addition to the good tips provided by Mikesname, you can greatly improve the recognition result, regardless of which OCR engine you use, if you use image processing to convert the image to bitonal (pure black and white), as in the attached copy of your image.
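For the bitonal conversion, a minimal OpenCV sketch: adaptive thresholding copes with the uneven lighting typical of card photos, and plain Otsu thresholding is worth comparing against it. The block size and offset are starting points to tune, not magic numbers.

```python
import cv2

img = cv2.imread("card.png", cv2.IMREAD_GRAYSCALE)
# threshold each pixel against a Gaussian-weighted local mean (31x31 block,
# minus an offset of 15), which tolerates shadows and gradients
bitonal = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 31, 15)
cv2.imwrite("card_bitonal.png", bitonal)
```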

How to remove graphics from a scanned document before passing it to Tesseract for OCRing?

I'm working on an OCR project, but I don't know how to remove the graphics from the scanned document image before passing it to Tesseract.
Some scanned documents from which I want to remove the graphics are below:
http://www.mediafire.com/view/hvmpty2z3cw3vao/IMG_0087.JPG
http://www.mediafire.com/view/1sgy5s2aaj2o8y3/IMG_0086.JPG
Any advice is much appreciated. Many thanks.
As the text areas are usually sparse and do not connect to each other, you may consider running Sobel edge detection on the original image and detecting the biggest connected area, with some threshold, as the graphic area.
Meanwhile, as the graphic is a rectangular area, another way is to use a Hough transform to detect the straight lines that form its four sides. If you go this way, it's recommended that you downscale the image first to reduce the computational complexity.
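Both suggestions come down to finding the graphic as one large blob. Here is a minimal sketch of that idea; rather than Sobel or Hough it uses a morphological closing plus a connected-component area filter, which is a simpler variant, and the kernel size and 2% area threshold are guesses to tune on your scans.

```python
import cv2
import numpy as np

img = cv2.imread("scan.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# close small gaps so a photo or drawing fuses into one big blob,
# while individual words stay as small separate components
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, np.ones((15, 15), np.uint8))

n, labels, stats, _ = cv2.connectedComponentsWithStats(closed)
page_area = img.shape[0] * img.shape[1]

clean = img.copy()
for i in range(1, n):  # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 0.02 * page_area:  # anything this big is a graphic, not text
        clean[y:y + h, x:x + w] = 255  # white the region out
cv2.imwrite("text_only.jpg", clean)
```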
You can start by detecting text areas using an algorithm available in AForge.Net. See HorizontalRunLengthSmoothing and VerticalRunLengthSmoothing. The algorithm is not very complicated, and you can easily implement it using your favorite image processing library. The only constraint is to know approximately the size of the characters in your images.
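Run-length smoothing is easy to approximate in OpenCV: filling horizontal (or vertical) white gaps shorter than a threshold is exactly a morphological closing with a 1xN (or Nx1) kernel. A minimal sketch, where the approximate character size is the one constant you must estimate, as the answer notes:

```python
import cv2
import numpy as np

img = cv2.imread("scan.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

char_size = 20  # approximate character size in pixels -- estimate this!

# fill gaps shorter than ~2 characters, horizontally and vertically
horiz = cv2.morphologyEx(binary, cv2.MORPH_CLOSE,
                         np.ones((1, 2 * char_size), np.uint8))
vert = cv2.morphologyEx(binary, cv2.MORPH_CLOSE,
                        np.ones((2 * char_size, 1), np.uint8))
# the classic run-length smoothing combines both directions with a logical AND
smoothed = cv2.bitwise_and(horiz, vert)

# each surviving blob is a candidate text area; draw them for inspection
n, labels, stats, _ = cv2.connectedComponentsWithStats(smoothed)
for i in range(1, n):
    x, y, w, h, area = stats[i]
    cv2.rectangle(img, (x, y), (x + w, y + h), 0, 2)
cv2.imwrite("text_areas.jpg", img)
```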

HTML5 canvas: draw text with multiple fonts

How can I fill text in the canvas with multiple fonts?
I want to be able to fill the canvas with something like this:
This is an example of what I want to do
this is another example of what I want to do
I know that I can split the string and fill the normal text first, then the bold text, and then the rest of the text. But I also want to be able to drag and drop the text, so I can't do it that way.
Sorry, you're out of luck. There's no easy way out here.
You have to call fillText at least three times if you want text with a bolded word in the center.
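The pattern is: draw a run, measure its width, advance the x position, switch fonts, repeat. To stay consistent with the other sketches in this collection it is shown here in Python with Pillow; on the canvas, ctx.measureText(run).width and ctx.fillText(run, x, y) play the roles of textlength and draw.text below, and the font file paths are assumptions for your system.

```python
from PIL import Image, ImageDraw, ImageFont

img = Image.new("RGB", (420, 60), "white")
draw = ImageDraw.Draw(img)
regular = ImageFont.truetype("DejaVuSans.ttf", 20)       # assumed font path
bold = ImageFont.truetype("DejaVuSans-Bold.ttf", 20)     # assumed font path

runs = [("This is an ", regular), ("example", bold), (" of what I want", regular)]

x, y = 10, 20
for text, font in runs:
    draw.text((x, y), text, font=font, fill="black")
    x += draw.textlength(text, font=font)  # measure the run, then advance
img.save("mixed_fonts.png")
```

Keeping the per-run x offsets around is also what makes drag and drop workable: move the whole group by one delta and redraw each run at its stored offset.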
There may be a time in the future when you are allowed to draw arbitrary HTML; the spec mentions this is a real possibility, but that won't be for some time. To quote the spec:
Note: A future version of the 2D context API may provide a way to render fragments of documents, rendered using CSS, straight to the canvas. This would be provided in preference to a dedicated way of doing multiline layout.
From the end of this section.
You can of course still drag and drop; you just have to keep a list of the elements and their locations that make up a "node". Much more complicated objects have no doubt been built on the canvas.