How to perform OCR on a selected part of an image

I'm a student working on a computer vision project for the automation of the underwriting process.
Below is an example of an old Sri Lankan National Identity Card. I want to extract the identity card number from the image. When I perform OCR using pytesseract, it cannot extract the number properly, but when I crop just the number region and perform OCR, it identifies the number successfully. This is not just a single use case; the process should be automated: when a user enters their identification number, my program should extract the number from the identification document and cross-check it against what the user has entered. I'm stuck at the point where the extraction happens.
I got a suggestion to create and train a custom object detection model with YOLO, use it to identify the part where the NIC number is located, crop it, and then perform OCR. I wonder whether there is an easier way of doing this. I just need to extract the identification number from the image.
A sample image of an old NIC is provided for reference.

"I got a suggestion to create and train a custom object detection model with yolo"
Using YOLO just to detect where the ID number is located is overkill.
Here are some suggestions you can try:
Detect the face using the OpenCV face detector.
Detect the thick orange line using the OpenCV threshold function on one of the color channels.
Then crop the area above the detected face/line and run OCR on it for the ID number; a rough sketch of this idea is shown below.
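As a minimal sketch of that idea (assuming the NIC number sits above the face; the Haar cascade, crop margins, threshold settings, and character whitelist are placeholder choices to tune on real scans):

```python
import cv2
import pytesseract

# Rough sketch: find the face, then OCR the band above it where the NIC
# number usually sits. Margins, threshold and whitelist are assumptions.
img = cv2.imread("nic.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = faces[0]
    # Take a wide band above the detected face to cover the number field.
    top = max(0, y - int(1.5 * h))
    left = max(0, x - w)
    right = min(img.shape[1], x + 2 * w)
    roi = gray[top:y, left:right]

    # Binarise and OCR with a config restricted to NIC-style characters.
    _, thresh = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(
        thresh, config="--psm 7 -c tessedit_char_whitelist=0123456789VvXx")
    print(text.strip())
```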

Related

Is there a reality capture parameter to request the desired number of vertices?

In the previous Reality Capture system, users could set a parameter that determined the resolution of the output models. I want to end up with models of about 100-150K vertices. Is there a setting somewhere in the Forge API that allows me to ask the modeler to keep the number of generated vertices within some bounds?
Vertex/triangle decimation is usually what could be called a "subjective" task, which also explains why there are so many optimization algorithms in the wild.
You would need and expect one type of optimization for "organic" models, and a totally different one for an architectural building.
The Reality Capture API provides you only with raw hi-res results, avoiding "opinionated" optimizations. This should be considered just one step in an automation pipeline.
Another step would be, upon receiving the results, to automatically optimize the resulting mesh based on the set of settings you need.
One of those steps could be Design Automation for 3ds Max, where you feed in a model and, using the ProOptimizer modifier within 3ds Max, output the mesh with the needed level of detail. A sample of this step can be found here: https://forge-showroom.autodesk.io/post/prooptimizer.
There are also numerous open-source solutions that should help you cover this post-processing step.
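As one example of such an open-source step, Open3D's quadric decimation can bring a high-resolution mesh down to a target budget (the file names and target triangle count below are placeholders to adjust for your pipeline):

```python
import open3d as o3d

# Sketch: post-process the raw Reality Capture output with an open-source
# decimator. File names and the target budget are placeholders.
mesh = o3d.io.read_triangle_mesh("reality_capture_output.obj")

# Aiming for roughly 100-150K vertices; with about two triangles per vertex,
# ~250K triangles is a reasonable starting target to iterate from.
simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=250_000)
simplified.remove_unreferenced_vertices()

print(len(simplified.vertices), "vertices,", len(simplified.triangles), "triangles")
o3d.io.write_triangle_mesh("decimated.obj", simplified)
```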

Best practices to fine-tune a model?

I have a few questions regarding the fine-tuning process.
I'm building an app that is able to recognize data from the following documents:
ID Card
Driving license
Passport
Receipts
All of them use different fonts (especially receipts), and it is hard to match exactly the same font, so I will have to train the model on a lot of similar fonts.
So my questions are:
Should I train a separate model for each document type for better performance and accuracy, or is it fine to train a single eng model on a bunch of fonts similar to the ones used on these types of documents?
How many pages of training data should I generate per font? By default, I think tesstrain.sh generates around 4k pages.
Are there any suggestions on how I can generate training data that is as close as possible to the real input data?
How many iterations should be used?
For example, if I'm using some font that has a high error rate and I want to target a 98%-99% accuracy rate.
Also, maybe some of you have experience working with these types of documents and know some common fonts that are used for them?
I know that the MRZ in passports and ID cards uses the OCR-B font, but what about the rest of the document?
Thanks in advance!
Ans 1
You can train a single model to achieve this, but if you want to detect different languages then I think you will need different models.
Ans 2
If you are looking for datasets, have a look at the MNIST PNG Dataset, which has digits as well as letters rendered from various computer fonts. Here is a link to some starter code for using the dataset, implemented in PyTorch.
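If you go that route, loading a folder-per-class PNG dataset in PyTorch is typically a few lines with torchvision (the directory layout and image size below are assumptions):

```python
import torch
from torchvision import datasets, transforms

# Sketch: load a folder-per-class PNG dataset (one folder per character class).
# The path and image size are assumptions about how the dataset is laid out.
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("mnist_png/training", transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape, labels[:10])
```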
Ans 3
You can use Optuna to find the best set of hyperparameters for your model; see using-optuna-to-optimize-pytorch-hyperparameters for a walkthrough.
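The core Optuna loop is short; in this sketch, run_training is a hypothetical stand-in for your actual training code, and the parameter names and ranges are just examples:

```python
import optuna

def run_training(learning_rate, batch_size):
    # Hypothetical stand-in for your real training run: it should train the
    # OCR model with these settings and return a validation accuracy in [0, 1].
    return 0.5  # dummy value so the sketch runs end to end

def objective(trial):
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    return run_training(learning_rate, batch_size)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```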
Have a look at these
PAN-Card-OCR
document-details-parsing-using-ocr
They are trying to achieve a similar task.
Hope this answers your question!
I would train a classifier on the 4 different types to classify an ID, license, passport, or receipt, basically so you know that a passport is a passport vs. a driver's license etc. Then I would have 4 more models that are used for reading each specific type (passport, driver's license, ID, and receipt). It should be noted that if you are working with multiple languages, this will likely mean making 4 models for each specific language, so if you have L languages you may need 4*L models.
Likely a lot. I don't think the font is really the issue. Maybe what you should do is try to define some templates for things like a driver's license and then generate data based on that template?
This is the least of your problems; just test for it.
Assuming you are referring to an ML model that will be used to perform OCR with computer vision, I'd recommend the following:
Set up your taxonomy as required by your application.
This means categorizing the expected font sets per type of scanned document (png, jpg, tiff, etc.) to include in the appropriate dataset. Select the fonts closest to the ones being used, as well as the type of information you need to gather (digits only, alphabetic characters).
Perform data cleanup on your dataset and make sure you have homogeneous data for the OCR functionality. For example, all document images should be PNGs with max dimensions of 46x46 to get an appropriate training model. Note that higher-resolution images at a smaller scale mean higher accuracy.
Cater for handwriting as well, in case you have damaged or barely visible characters. This can improve character conversion in cases where the fonts on paper are not clearly visible or are worn out.
If you are using the Keras module with TF on the MNIST-provided datasets, set up a cancellation rule that stops ML model training when you reach 98%-99% accuracy, for more control in case you expect the fonts in your images to be error-prone (as stated above). This helps avoid a higher margin of error when you have bad images in your training dataset. For a dataset of 1000+ images, a good setting would be a TF Dense layer of 256 units and 5 epochs; a sketch of such a rule is shown below.
A sample training dataset can be found here.
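A minimal sketch of that cancellation rule in Keras, mirroring the Dense(256) / 5-epoch figures above (the 99% cut-off and the MNIST data here are just for illustration):

```python
import tensorflow as tf

# Stop training once training accuracy reaches the target, as suggested above.
class StopAtAccuracy(tf.keras.callbacks.Callback):
    def __init__(self, target=0.99):
        super().__init__()
        self.target = target

    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get("accuracy", 0.0) >= self.target:
            self.model.stop_training = True

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, callbacks=[StopAtAccuracy(0.99)])
```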
If you just need to do some automation in your application, or do data entry that requires OCR conversion from images, a good open-source option is to gather the information automatically via the PSImaging module (PowerShell), use the degrees of confidence retrieved (from PNG), and run them against your current datasets to improve your character-match accuracy.
You can find the relevant link here

How to recognize and count objects with Firebase / ML Kit

I'd like to recognize and count objects in a picture, e.g. count the number of houses in a picture of a neighbourhood. What's the best way to do this with ML Kit?
Do I need to use the Object Detection API? Or is it possible to get multiple "house" tags using a straightforward image labeler?
The ML Kit Object Detection API (note that it is now offered as a standalone SDK) can count objects in an image / video stream, but it is limited to the 5 largest objects. Also, you should evaluate whether the object detection works for your use case: it is a very general localizer and works for most objects, but when objects are close together or overlapping it may not distinguish between them.
If you need to detect more than 5 objects, I would recommend looking at using TensorFlow Lite directly with some of the pre-trained models available on TF Hub, or training one yourself using AutoML Vision Edge if the general models don't fit your use case.
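If you take the TensorFlow Lite route, a detection model exported from TF Hub can be run and its detections counted in a few lines; the model file name, input size, output tensor order, and score threshold below are assumptions that depend on the exact model you pick:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Sketch: count detections from a TFLite SSD-style detector. File name,
# output tensor order and threshold are assumptions for a typical TF Hub
# detection model and may differ for yours.
interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

height, width = input_details[0]["shape"][1:3]
image = Image.open("neighbourhood.jpg").convert("RGB").resize((width, height))
input_data = np.expand_dims(np.array(image, dtype=np.uint8), axis=0)

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# Typical SSD output order: boxes, classes, scores, num_detections.
scores = interpreter.get_tensor(output_details[2]["index"])[0]
count = int(np.sum(scores > 0.5))
print("objects above threshold:", count)
```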
Fwiw, Image Labeling assigns labels that describe the scene of an image; it does not count the number of objects, so you would typically just get a single "house" label.

Best way to identify a person without using facial recognition (deep learning)

I have a CCTV video in which I want to identify a person. I tried both facial recognition and object tracking, but both failed to produce high accuracy since the quality of the frames isn't great and the face sometimes disappears from the frame.
I have simplified the problem as much as I can and am now thinking about training YOLOv3 on the person and doing object tracking, or training ResNet50 as a classification problem.
I have also looked into re-identification, but I'm not sure whether it will work in this use case.
So the problem is now simplified to: given an image of people and objects in a hostile environment, how do you find and identify a specific person?
thanks
It seems that deep learning is precisely the tool to use for identifying a specific person. And without facial recognition that seems impossible, unless the person wears the same clothes every time and that's your criterion for "specific person".
Consider using Face-API.js: you provide several photos of the specific person and you can then detect whether they appear in a particular image.
If you are still open to using video as input rather than a single frame, you can look into person identification through gait.
One example of a deep learning implementation would be:
https://github.com/marian-margeta/gait-recognition

How to convert a number image to display a student's data

This system should record the attendance of a student by using OCR on an image captured from an Android device. The student's details will be displayed and recorded after the OCR processes the characters (for example, a number identifying the student) in the captured image.
The steps: capture a number image using any device running Android --> OCR recognizes the numbers in the image --> the numbers are used to display the student's data and record his or her attendance for the day.
What you're asking about is called OCR, and it isn't as simple as you might think; there is at least one open-source effort to do it.
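One common open-source option is Tesseract via pytesseract; a minimal sketch for pulling the digits out of a captured photo might look like this (the image path, preprocessing, and the way the number is looked up afterwards are placeholders):

```python
import cv2
import pytesseract

# Sketch: read the captured photo, binarise it, and ask Tesseract for digits only.
img = cv2.imread("student_id_photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

student_number = pytesseract.image_to_string(
    thresh, config="--psm 7 -c tessedit_char_whitelist=0123456789").strip()

# The recognised number can then be looked up (e.g. in a database) to show
# the student's record and mark attendance for the day.
print("Recognised student number:", student_number)
```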