This system should record a student's attendance by running OCR on an image captured from an Android device. After the OCR processes the characters in the captured image (for example, a number identifying the student), the student's details are displayed and the attendance is recorded.
The steps: capture an image of the number with any Android device --> OCR recognizes the numbers in the image --> the numbers are used to display the student's data and record his or her attendance for the day.
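A minimal sketch of that pipeline, assuming pytesseract for the recognition step; the roster, the character whitelist, and the attendance store are placeholders:

```python
from datetime import date

import pytesseract
from PIL import Image

# Hypothetical roster; in the real system this would be a database lookup.
STUDENTS = {"2023001": "A. Perera", "2023002": "B. Silva"}
attendance = []

def record_attendance(image_path):
    # OCR the captured image, keeping only digits.
    text = pytesseract.image_to_string(
        Image.open(image_path),
        config="--psm 7 -c tessedit_char_whitelist=0123456789")
    student_id = text.strip()
    if student_id in STUDENTS:
        attendance.append((student_id, STUDENTS[student_id], date.today()))
        return STUDENTS[student_id]   # display this student's details
    return None                       # number not recognized or not enrolled
```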
What you’re asking about is called OCR, and it isn’t as simple as you might think. There is one open-source effort to do it.
Related
I'm a student working on a computer vision project for the automation of the underwriting process.
The following is an example of an old Sri Lankan National Identity Card. I want to extract the identity card number from the image. When I perform OCR using pytesseract, it cannot extract the number properly, but when I crop just the number and perform OCR, it successfully identifies it. This is not just a single use case; the process should be automated: when a user enters his identification number, my program should be able to extract the number from the identification document and cross-check it with what the user has entered. I'm stuck at the point where the extraction happens.
I got a suggestion to create and train a custom object detection model with YOLO, use it to identify the part where the NIC number is located, crop it, and then perform OCR. I wonder whether there is an easier way of doing this. I just need to extract the identification number from the image.
A sample image of an old NIC is provided for your reference (sample nic).
"I got a suggestion to create and train a custom object detection model with yolo"
Using YOLO to detect where the ID number is located is overkill.
Some suggestions you can try (a rough sketch follows this list):
Detect the face using the OpenCV face detector.
Detect the thick orange line using the OpenCV threshold function on one of the color channels.
Then crop the area above the detected face/line to get the ID number.
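A rough OpenCV/pytesseract sketch of those suggestions; the file name, the crop geometry, and the character whitelist are assumptions to adapt to the actual NIC layout:

```python
import cv2
import pytesseract

img = cv2.imread("old_nic_sample.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect the face with OpenCV's bundled Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = faces[0]
    # Assume the NIC number sits in the band above the detected face.
    roi = gray[max(0, y - h):y, :]
    # Restrict Tesseract to the characters expected in an old NIC number.
    text = pytesseract.image_to_string(
        roi, config="--psm 7 -c tessedit_char_whitelist=0123456789VX")
    print(text.strip())
```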
I have a few questions regarding the fine-tuning process.
I'm building an app that is able to recognize data from the following documents:
ID Card
Driving license
Passport
Receipts
All of them use different fonts (especially receipts); it is hard to match exactly the same fonts, and I will have to train the model on a lot of similar fonts.
So my questions are:
Should I train a separate model for each of the document types for better performance and accuracy, or is it fine to train a single eng model on a bunch of fonts that are similar to the fonts being used on these types of documents?
How many pages of training data should I generate per font? By default, I think tesstrain.sh generates around 4k pages.
Do you have any suggestions on how I can generate training data that is as close as possible to the real input data?
How many iterations should be used?
For example, if I'm using some font that has a high error rate and I want to target 98% - 99% accuracy rate.
Also, maybe some of you have had experience working with these types of documents and know some common fonts that are used for them?
I know that the MRZ in passports and ID cards uses the OCR-B font, but what about the rest of the document?
Thanks in advance!
Ans 1
You can train a single model to achieve this, but if you want to detect different languages then I think you will need different models.
Ans 2
If you are looking for some datasets, have a look at this Mnist Png Dataset, which has digits as well as alphabetic characters from various computer-based fonts. Here is a link to some starter code that uses the dataset, implemented in PyTorch.
Ans 3
You can use Optuna to find the best set of hyperparameters for your model; see using-optuna-to-optimize-pytorch-hyperparameters.
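A short illustrative Optuna sketch; the two tuning knobs and the placeholder error value stand in for whatever your training loop actually exposes:

```python
import optuna

def objective(trial):
    # Hypothetical hyperparameters; replace with the ones your trainer accepts.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    # error_rate = train_and_evaluate(lr, batch_size)  # your training loop here
    error_rate = (lr - 1e-3) ** 2 + batch_size * 1e-6  # placeholder value
    return error_rate

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```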
Have a look at these
PAN-Card-OCR
document-details-parsing-using-ocr
They are trying to achieve a similar task.
Hope this answers your question!
I would train a classifier on the 4 different types to classify an ID, license, passport, or receipt, basically so you know that a passport is a passport vs. a driver's license etc. Then I would have 4 more models that are used for translating each specific type (passport, driver's license, ID, and receipts). It should be noted that if you are working with multiple languages, this will likely mean making 4 models for each specific language, meaning that if you have L languages you may need 4*L models for translating those.
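A bare-bones sketch of that routing design; the type names, the classifier, and the per-type extractors are placeholders:

```python
DOC_TYPES = ["id_card", "driving_license", "passport", "receipt"]

def process_document(image, classify, extractors):
    """classify(image) returns one of DOC_TYPES; extractors maps type -> model."""
    doc_type = classify(image)                     # step 1: which document is it?
    return doc_type, extractors[doc_type](image)   # step 2: type-specific extraction
```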
Likely a lot. I don't think the font is really an issue, though. Maybe what you should do is try to define some templates for things like driver's licenses and then generate data based on those templates?
This is the least of your problems, just test for this.
Assuming you are referring to an ML data model that might be used to perform OCR using computer vision, I'd recommend the following:
Setup your taxonomy as required by your application requirements.
This means categorizing the expected font sets per type of scanned document (png, jpg, tiff, etc.) to include in the appropriate dataset. Select the fonts closest to the ones being used, as well as the type of information you need to gather (digits only, alphabetic characters).
Perform data cleanup on your dataset and make sure you have homogeneous data for the OCR functionality. For example, all document images should be of PNG type, with maximum dimensions of 46x46, to have an appropriate training model. Note that higher-resolution images at a smaller scale mean higher accuracy.
Cater for handwriting as well, if you have damaged or non-visible font images. This might improve character conversion in cases where the fonts on paper are not clearly visible or are worn out.
In case you are using the Keras module with TF on the provided MNIST datasets, set up a cancellation rule that stops ML model training when you reach 98%-99% accuracy, for more control in case you expect the fonts in your images to be error-prone (as stated above). This helps avoid a higher margin of error when you have bad images in your training dataset. For a dataset of 1000+ images, a good setting would be a TF Dense layer of 256 and 5 epochs (a minimal callback sketch follows below).
A sample training dataset can be found here.
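A minimal Keras sketch of the cancellation rule described above (stop once training accuracy hits the target), assuming the MNIST dataset and the Dense(256)/5-epoch setting mentioned in the answer:

```python
import tensorflow as tf

class StopAtAccuracy(tf.keras.callbacks.Callback):
    """Stop training once the training accuracy reaches the target."""
    def __init__(self, target=0.98):
        super().__init__()
        self.target = target

    def on_epoch_end(self, epoch, logs=None):
        if (logs or {}).get("accuracy", 0.0) >= self.target:
            self.model.stop_training = True

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, callbacks=[StopAtAccuracy(0.98)])
```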
If you just need to do some automation in your application, or data entry that requires OCR conversion from images, a good open-source approach is to gather the information automatically via the PSImaging module (PowerShell), use the degrees of confidence retrieved (from the PNG), and run them against your current datasets to improve your character-match accuracy.
You can find the relevant link here
I am trying to write a PC application (Windows, .NET) that identifies students on the basis of some card equipped with RFID identification to build lecture attendance registers. Currently I have a Stronglink SL040A RFID reader (http://www.stronglink-rfid.com/en/rfid-modules/sl040.html), which operates as a HID and sends the data as a series of keystrokes.
The system works perfectly with older cards like the Mifare 1K Classic (even with PayPass credit cards). The new student cards (and identity cards) issued by the Hungarian authorities, however, contain Mifare PlusX 4K chips, which seem to send a new key every time the card is used. I have tried experimenting with the settings the reader's configuration tool offers, but to no avail. I can make the 1K Classic cards send a much longer key by changing the end-block parameter, but the PlusX 4K keeps sending the shorter, and painfully inconsistent, keys.
I am a physicist without a deeper understanding of these chips or of RFID authentication in general; I am just trying to get a job done that seemed easy at the beginning. I have no intention of cracking or abusing these cards in any way; I am just trying to find some block of data on the card that stays consistent on each use, does not require complicated authentication protocols, and is unique between different cards.
Is this possible, or is it against the philosophy of these chips? If it is possible, will I have to buy a new reader, or can I make the one I have do what I need?
Your thoughts are much appreciated.
From the MiFare PlusX 4K datasheet:
Section 8.2:
There are three different versions of the PICC. The UID is programmed into a locked part of the NV-memory reserved for the manufacturer:
• unique 7-byte serial number
• unique 4-byte serial number
• non-unique 4-byte serial number
Due to security and system requirements, these bytes are write-protected after being programmed by the PICC manufacturer at production.
...
During personalization, the PICC can be configured to support Random ID in security level 3. The user can configure whether Random ID or fixed UID shall be used. According to ISO/IEC 14443-3, the first anticollision loop (see Ref. 5) returns the Random Number Tag 08h, the 3-byte Random Number and the BCC, if Random ID is used. The retrieval of the UID in this case can be done using the Virtual Card Support Last command, see Ref. 3, or by reading out block 0.
From what you have described, it appears that the cards are running in Security Level 3, and unfortunately, the backwards-compatible part of the card only exists at lower security levels. The mentioned command of Virtual Card Support Last is also only available after level 3 authentication.
I'm afraid what you want to do appears to be impossible unless you can use the ISO/IEC 14443-4 protocol layer, which I think would let you authenticate at level 3. The relevant details appear to be in section 8.7 and involve AES authentication.
I want to extract particular data from my image or PDF. For example, I have an invoice in a scanned document and I just want to extract the invoice number. I have already used Tesseract OCR, Apache Tika OCR, and Aspose OCR, so please suggest steps to get the particular data. Thank you in advance.
You can extract specific contents from a portion of the image by using custom recognition blocks. Please note, the above-mentioned solution is useful in scenarios where the documents/images follow a similar structure, that is, the contents to be scanned are always in the same location in each image.
Furthermore you can perform OCR operation on a PDF file using Aspose.OCR in combination with Aspose.Pdf. Visit the link Performing OCR on PDF Documents for details.
I work with Aspose as Developer evangelist.
Have you looked at using ABBYY FlexiCapture? That function is one of the primary aspects of what it does. In using products like FlexiCapture the issue becomes whether your document is of fixed or semi-structured design. For documents like invoices the answer is almost always semi-structured because the information moves around on the page. Also, there are usually many different layouts of invoices. ABBYY solved that challenge through their FlexiCapture for Invoice product.
As an alternative, if you just need to extract something like an invoice number from a region, there are ways in lower-priced products like ABBYY Recognition Server in which you could use what they call an area template, or you could extract all of the OCR text and develop an application that applies a regular expression to locate the field value adjacent to the field label. Problems can arise when the field label and the field value do not fall in proximity to each other in the OCR result text. This can happen when, after the line break of the field label ("invoice no" or "invoice #"), another value immediately follows in the first position of the next line. Then the OCR text could become something like "Invoice No. Bob's Bargain Barn 66422." The regex could look for the value immediately following the search phrase "Invoice No." and then produce a result for the adjacent text "Bob's Bargain Barn." Worse, oftentimes the label text and invoice number will be within a table, complicating matters, as some OCR engines would ignore it altogether (not Recognition Server, though). It is for these reasons we researched FlexiCapture: it eliminated the fancy coding required for data extraction. It is expensive but worth it.
Disclosure, we are an ABBYY Partner.
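For the lower-cost regex route mentioned above, a toy sketch; the label variants and the sample OCR text are made up for illustration:

```python
import re

ocr_text = "Date 01/02/2020\nInvoice No. 66422\nTotal 125.00"

# Look for a value adjacent to a few common "invoice number" label variants.
match = re.search(r"Invoice\s*(?:No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
                  ocr_text, flags=re.IGNORECASE)
if match:
    print("Invoice number:", match.group(1))
else:
    print("No confident match; fall back to manual review or a template tool.")
```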
Our motor pool wants to scan drivers’ licenses and have the data imported into our custom system. We're looking for something that will allow us to programmatically get the data from the scanner (including the picture) and let us insert it into our application. I was wondering if anyone has had experience with this type of system and could recommend one or tell us which ones to avoid. Our application is written in PowerBuilder and uses a DB2 database.
Try solutions by idScan.net (www.idScan.net)
There is an SDK that allows driver's license parsing for all US states and Canadian provinces. You can also purchase hardware such as the E-Seek M250 ID scanner, which reads both 2D barcodes and magnetic stripes (software is included).
Good luck!
We support something similar in our records management software. Our application is designed to work with a wedge reader, since they are the easiest to get up and running (no special drivers needed). When a card is swiped, the reader sends keystrokes to the OS for each character that is encoded on the magnetic stripe, with a simulated Enter keypress between each track (an AAMVA-compliant license has 3 data tracks).
It's slightly annoying because it behaves exactly as if someone were typing out the data by hand, so there is no easy way to tell when you have all the data (you could just wait to get 3 lines of information, but then it's difficult to detect invalid cards, such as when someone tries to swipe a student ID card that might have fewer than 3 tracks encoded; in this case, the application hangs forever waiting for the non-existent third track to arrive). To deal with this, we use a "fail-fast" approach: each time we get an Enter keypress, we immediately process the current line, keeping a record of which track we are expecting at that point (1, 2, or 3). If the current track cannot be processed (for example, a different start character appears on the track than what is documented for an AAMVA-format driver's license), we assume the user must have swiped something other than a driver's license.
I'm not sure if the reader we use supports reading image data or not. It can be programmed to return a subset of the data on the card, but we just use the factory default setting, which appears to return only the first three data tracks (and actually I believe image data is encoded in the 2D barcode found on some licenses, not on the magnetic stripe, but I could be wrong).
For more on the AAMVA track format that is used on driver's license magstripes, see Annex F in the current standard.
The basic approach we use is as follows (a simplified sketch appears after the list):
Display a modal dialog that has a hidden textbox, which is given focus. The dialog box simply tells the user to swipe the card through the reader.
The user swipes the card, and the reader starts sending keydown events to the hidden textbox.
The keydown event handler for the textbox watches for Enter keypresses. When one is detected, we grab the last line currently stored in the textbox, and pass it to a track parser that attempts to parse the track according to the AAMVA format.
If this "fail-fast" parsing step fails for the current track, we change the dialog's status message to a message telling the user the card could not be read. At this point, the textbox will still receive additional keydown events, but it's OK because subsequent tracks have a high enough chance of also failing that the user will still see the error message whenever the reader stops sending data.
If the parsing is successful, we increment a counter that tells the parser what track it should process next.
If the current track count is greater than 3, we know we've processed 3 tracks. At this point we parse the 3 tracks (which have already split most of the fields up but everything is still stored as strings at this point) into a more usable DriversLicense object, which does additional checks on the track data, and makes it more consumable from our application (converting the DOB field from a string into a real Date object, parsing out the subfields in the AAMVA Name field into first name, middle name, last name, name suffix, etc.). If this second parsing phase fails, we tell the user to reswipe the card. If it succeeds, we close the dialog and pass the DriversLicense object to our main application for further processing.
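A simplified, language-agnostic sketch of that fail-fast flow (shown here in Python); the start sentinels and the second parsing phase are illustrative assumptions, not the authoritative AAMVA values:

```python
# Assumed start sentinels per track; check Annex F of the AAMVA standard.
EXPECTED_START = {1: "%", 2: ";", 3: "%"}

class SwipeError(Exception):
    pass

def parse_track(track_no, raw):
    """Fail fast: reject the swipe as soon as one track looks wrong."""
    if not raw.startswith(EXPECTED_START[track_no]):
        raise SwipeError(f"track {track_no} does not look like an AAMVA track")
    return raw  # real code would split the track into its fields here

def handle_swipe(lines):
    """lines: the Enter-terminated strings sent by the wedge reader."""
    tracks = {}
    for track_no, raw in enumerate(lines, start=1):
        tracks[track_no] = parse_track(track_no, raw)
        if track_no == 3:
            return build_license(tracks)   # second parsing phase
    raise SwipeError("reader stopped before sending three tracks")

def build_license(tracks):
    # Placeholder for the DOB/name-field conversion step described above.
    return {"raw_tracks": tracks}
```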
If your scanner is TWAIN-compliant, you will be able to manage it from your app through an ActiveX control you can buy on the net, like this one. You'll be able to manage your basic scan parameters (quality, color, single/multi-page scan, output format, etc.), start the scan from your app, save the result as a file, and transfer that file wherever needed. We have been using it with VB code for the last 2 years. It works.
Maybe you want to use a magnetic stripe reader to get the driver's license info from the card. As I remember, most driver's licenses just have the data in plain text on those stripes, so it is relatively straightforward programming-wise.
MagStripe readers are also cheap nowadays.
You can try something from this list: http://www.adams1.com/plugins.html
I have not used them myself, though.
I wrote a parser in C#, and while it's "ok" it's still far from perfect.
I can't seem to find it now, but a Wikipedia entry used to exist that had the patterns to look for (trust me, parsing this yourself is a pain without any help).
Be aware that different states have different laws about what you can and can't use government-issued IDs for. Texas has one.
We use a Dell card reader, and it inputs the data exactly as though it were being typed on a keyboard, followed by the Enter key. This made programming /very/ easy, because you just send focus to the text box and wait for Enter. The main character that breaks the data into chunks is the caret '^'. Split on that and you'll have your basic chunks.
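For illustration only, a tiny example of splitting on the caret; the track string below is made up and not a real license:

```python
# Made-up track-1 string; real tracks follow the AAMVA layout.
track1 = "%TXAUSTIN^DOE$JOHN$Q^123 MAIN ST^?"
state_city, name, address = track1.strip("%?").split("^")[:3]
print(state_city, name, address)
```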
You can also use the InfoScan SDK. You can find it at www.scan-monitor.com. The system allows you to use any scanner and does not require you to purchase a specific one.