I'm using pytesseract to recognize a code in contract images, and the code sits in the same location in each image. The problem is that I currently have to run recognition on the whole image, which I'm sure takes more time than recognizing a specified zone. Is there a way to recognize the code in a specified zone without scanning the whole image?
Thank you
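For what it's worth, since the code always sits in the same spot, you can crop that zone out with Pillow before calling pytesseract, so Tesseract only ever sees the small region. A minimal sketch, assuming Pillow is installed; the file name and box coordinates are placeholders you would measure on your own contracts:

```python
from PIL import Image
import pytesseract

# Hypothetical bounding box of the code zone: (left, upper, right, lower).
CODE_BOX = (100, 50, 400, 120)

def read_code(image_path):
    img = Image.open(image_path)
    zone = img.crop(CODE_BOX)              # OCR only the cropped zone
    return pytesseract.image_to_string(zone)

print(read_code("contract.png"))
```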
I have been researching and studying various HTML code from similar questions, but wanted to try something a little different. (And please forgive my silly-sounding questions; I am still quite new to this kind of thing, so I'm still experimenting with a lot.)
1) I want to know if it's possible to write code in HTML that will display an image only at a certain time. For example, the image would appear only at midnight local time, wherever someone is browsing from.
2) If this is possible, how do I first display one image on a website, then at a specified time have that image replaced by a different one, and then, after let's say one minute, have everything revert back to the original image?
3) How do I "store" the image I want displayed at a certain time of day? That is, how do I arrange for the new image to be shown without it being discoverable in the page source?
4) Lastly, is it possible to prevent the use of proxy servers or other means that could potentially manipulate the time?
Thank you much!
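One thing worth noting: HTML alone has no notion of time, so anything like this needs JavaScript or, better, server-side logic. Deciding on the server also keeps the special image out of the page source (question 3) and uses the server's clock rather than the client's (question 4). A minimal server-side sketch, using Flask purely as an illustration (not something from the question); the route, file names, and one-minute window are assumptions:

```python
# Minimal sketch: the server, not the page, decides which image to send,
# so the "secret" image never appears in the page source ahead of time.
from datetime import datetime
from flask import Flask, send_file

app = Flask(__name__)

@app.route("/daily-image")
def daily_image():
    now = datetime.now()
    # Between 00:00 and 00:01 serve the special image, otherwise the default.
    if now.hour == 0 and now.minute == 0:
        return send_file("special.png", mimetype="image/png")
    return send_file("default.png", mimetype="image/png")
```

The page would then just reference `<img src="/daily-image">` and re-request it periodically. Showing the image at each visitor's local midnight would require the client to report its timezone, which reintroduces the manipulation problem from question 4.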
I am posting this question by request of a Firebase engineer.
I am using the Camera2 API in conjunction with Firebase-mlkit vision. I am using both barcode and on-platform OCR. The things I am trying to decode are mostly labels on equipment. In testing the application I have found that trying to scan the entire camera image produces mixed results. The main problem is that the field of view is too wide.
If there are multiple bar codes in view, firebase returns multiple results. You can sort of work around this by looking at the coordinates and picking the one closest to the center.
When scanning text, it's more or less the same, except that you get multiple Blocks, often incomplete (you'll get a couple of letters here and there).
You can't just narrow the camera mode, though - for this type of scanning, the user benefits from the "wide" camera view for alignment. The ideal situation would be if you have a camera image (let's say for the sake of argument it's 1920x1080) but only a subset of the image is given to firebase-ml. You can imagine a camera view that has a guide box on the screen, and you orient and zoom the item you want to scan within that box.
You can select what kind of image comes from the Camera2 API, but firebase-ml spits out warnings if you choose anything other than YUV_420_888. The problem is that there's no great way in the Android API to deal with YUV images unless you do it yourself. That's what I ultimately ended up doing - I solved my problem by writing a Renderscript that takes an input YUV, converts it to RGBA, crops it, then applies any rotation if necessary. The result of this is a Bitmap, which I then feed into either the FirebaseVisionBarcodeDetectorOptions or the FirebaseVisionTextRecognizer.
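For readers who want to see what that conversion involves, here is a conceptual sketch in Python/NumPy of roughly what the Renderscript does. This is illustration only: the real thing runs on-device as Renderscript, and a real camera2 buffer carries row/pixel strides that this sketch ignores.

```python
import numpy as np

def yuv420_to_rgb_crop(y, u, v, box):
    """y: (H, W); u, v: (H//2, W//2); box: (left, top, right, bottom).

    Upsample the half-resolution chroma planes, convert YUV -> RGB using
    standard BT.601 coefficients, then crop to the guide box.
    """
    u_full = u.repeat(2, axis=0).repeat(2, axis=1).astype(np.float32) - 128.0
    v_full = v.repeat(2, axis=0).repeat(2, axis=1).astype(np.float32) - 128.0
    y_f = y.astype(np.float32)

    r = y_f + 1.402 * v_full
    g = y_f - 0.344136 * u_full - 0.714136 * v_full
    b = y_f + 1.772 * u_full

    rgb = np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)
    left, top, right, bottom = box
    return rgb[top:bottom, left:right]   # crop before recognition
```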
Note that the Bitmap itself causes mlkit runtime warnings, urging me to use the YUV format instead. This is possible, but difficult. You would have to read the byte arrays and stride information from the original camera2 YUV image and create your own. The object that comes from camera2 is unfortunately a package-protected class, so you can't subclass it or create your own instance - you'd essentially have to start from scratch. (I'm sure there's a reason Google made this class package-protected, but it's extremely annoying that they did.)
The steps I outlined above all work, but with format warnings from mlkit. What makes it even better is the performance gain - the barcode scanner operating on an 800x300 image takes a tiny fraction of the time it takes on the full-size image!
It occurs to me that none of this would be necessary if firebase paid attention to cropRect. According to the Image API, cropRect defines what portion of the image is valid. That property seems to be mutable, meaning you can get an Image and change its cropRect after the fact. That sounds perfect. I thought that I could get an Image off of the ImageReader, set cropRect to a subset of that image, and pass it to Firebase and that Firebase would ignore anything outside of cropRect.
This does not seem to be the case. Firebase seems to ignore cropRect. In my opinion, firebase should either support cropRect, or the documentation should explicitly state that it ignores it.
My request to the firebase-mlkit team is:
Define the behavior I should expect with regard to cropRect, and document it more explicitly
Explain at least a little about how images are processed by these recognizers. Why is it so insistent that YUV_420_888 be used? Maybe only the Y channel is used in decoding? Doesn't the recognizer have to convert to RGBA internally? If so, why does it get angry at me when I feed in Bitmaps?
Make these recognizers either pay attention to cropRect, or state that they don't and provide another way to tell them to work on a subset of the image, so that I can get the performance (reliability and speed) that one would expect from running ML correlation/transformation/whatever on a smaller image.
--Chris
I'm working on a project that retrieves content from a given image, compares it with other images in a repository, and lists the matching images.
What is the right approach, so that the search won't eventually slow down?
What I was planning to do as a first level of filtering was to use an image-querying (CBIR) technique to retrieve images matching the pattern of the given image.
Then do OCR to extract the image content and check for a match.
Please let me know if there is any better approach for this.
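One cheap first-level filter in that spirit is a perceptual hash: fingerprint every repository image once, shortlist candidates by Hamming distance, and run OCR only on the shortlist. A minimal average-hash sketch, assuming Pillow; the 8x8 hash size and distance threshold are arbitrary illustrative choices, not tuned values:

```python
from PIL import Image

def average_hash(path, size=8):
    """Tiny perceptual fingerprint: downscale, grayscale, threshold at mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def shortlist(query_path, repo_paths, max_distance=10):
    """First-level CBIR filter; OCR comparison runs only on the survivors."""
    q = average_hash(query_path)
    return [p for p in repo_paths if hamming(q, average_hash(p)) <= max_distance]
```

Storing the repository hashes in the DB, rather than recomputing them per query, is what keeps the search from slowing down as the repository grows.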
Steps done
Software used
1. Tesseract OCR
2. ImageMagick - for image cleaning
3. Textcleaner script
Found the image orientation using ImageMagick.
The convert tool can read the orientation from EXIF data, but that turned out not to be useful here.
Instead, the image was rotated 90 degrees three times, and the OCR output of each rotation was compared with the others to find the correct orientation (the rotation yielding the maximum number of words wins); a sketch of this check follows.
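A sketch of that rotate-and-compare check, assuming Pillow and pytesseract (word count is used as the "maximum number of words wins" heuristic described above):

```python
from PIL import Image
import pytesseract

def best_orientation(path):
    """Rotate 0/90/180/270 and keep the rotation that OCRs the most words."""
    img = Image.open(path)
    best_angle, best_words = 0, -1
    for angle in (0, 90, 180, 270):
        text = pytesseract.image_to_string(img.rotate(angle, expand=True))
        words = len(text.split())
        if words > best_words:
            best_angle, best_words = angle, words
    return best_angle
```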
OCRed the image to get the text and applied filtering to extract the bill number, date, and amount.
On success, the details are stored in the DB for future searches.
On failure:
Created 10 different variants of the image with different filters (grayscale and sharpening applied).
OCRed all the variants and extracted the required data from everything obtained (a sketch of this retry step follows).
The saved data is used by the future search feature to eliminate duplication.
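A sketch of that retry step, again assuming Pillow and pytesseract; the particular filters and the regexes for bill number, date, and amount are illustrative stand-ins, not the ones used in the actual project:

```python
import re
from PIL import Image, ImageEnhance, ImageFilter
import pytesseract

# Hypothetical patterns; real bills would need tuned regexes.
BILL_NO = re.compile(r"Bill\s*No[:.]?\s*(\w+)", re.IGNORECASE)
DATE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")
AMOUNT = re.compile(r"\b(\d+\.\d{2})\b")

def variants(img):
    """Yield filtered variants of the image for OCR retries."""
    gray = img.convert("L")                    # grayscale mode
    yield gray
    yield gray.filter(ImageFilter.SHARPEN)     # sharpening
    for factor in (1.5, 2.0, 2.5):
        yield ImageEnhance.Contrast(gray).enhance(factor)

def extract_fields(path):
    """Try each variant until bill number, date, and amount all match."""
    for img in variants(Image.open(path)):
        text = pytesseract.image_to_string(img)
        bill = BILL_NO.search(text)
        date = DATE.search(text)
        amount = AMOUNT.search(text)
        if bill and date and amount:
            return bill.group(1), date.group(1), amount.group(1)
    return None  # every variant failed; flag for manual review
```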
SikuliX (Sikuli Script) has Region.text(), which returns the text value from the image on screen using Tesseract OCR.
Is there something similar in the Sikuli Java API?
I need to verify some text on screen and am trying to decide which of the two APIs to use. Thanks in advance for your help!
No - only after setting up Tesseract OCR will you be able to read/validate text. If you are not able to download the OCR directly during installation, try its offline version and copy it to your local machine.
Depending on what you wish to accomplish by recognizing "text", Sikuli can still recognize images of text. Text images are treated the same as any other displayed images. Sikuli on its own can't interpret the text in an image; however, if you know what text you expect to see, and have an image of it for comparison, you can still validate whether or not it appears. Keep in mind that font and resolution changes will likely cause unreliable results.
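For comparison, in a SikuliX script (which uses Python/Jython syntax) the two approaches look roughly like this; the region coordinates, expected string, and reference image are placeholders:

```python
# SikuliX script sketch (Jython syntax). Region.text() only works once
# Tesseract OCR is set up, as noted above.
region = Region(100, 200, 300, 50)

# Approach 1: OCR the region and compare strings.
if "Expected label" in region.text():
    print("found via OCR")

# Approach 2: pure image comparison - no OCR needed, but sensitive
# to font and resolution changes.
if region.exists("expected_label.png"):
    print("found via image match")
```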
By linked images, I mean having an image in a directory and giving the path to the image in order to set an object's display or background. I've noticed that since using this method, my page is taking a considerable amount of time to load. So if this is the issue, what would be the alternative?
Thanks
I suggest using the Firebug addon for Firefox.
It breaks the loading time down and explains how long each thing takes to load.
But from your question I would say no, it shouldn't add any extra loading time.
(I assume you mean loading an image from /index/pictures compared to /index/)
EDIT: Looking at your comments, you say "more detailed image"... does this mean a larger file size, and if so, how large?
You can answer your question yourself by using Firebug; we can't without asking a lot more questions, since you have left out so much needed information. :/
Regardless of the location of the file, your browser still has to obtain the file. The key word there is browser.
The data (regardless of type) is downloaded to the client (browser).
Regarding image size: try reducing it; there are plenty of programs out there (I've found VSO Image Resizer rather useful in the past).
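If you'd rather script the shrinking than use a GUI tool, here is a minimal sketch with Python and Pillow (my tool choice, not something from the question); the target width and JPEG quality are arbitrary starting points:

```python
from PIL import Image

def shrink(src, dst, max_width=1024, quality=80):
    """Downscale and recompress an image to cut page weight."""
    img = Image.open(src)
    if img.width > max_width:
        ratio = max_width / img.width
        img = img.resize((max_width, int(img.height * ratio)))
    img.save(dst, quality=quality, optimize=True)

shrink("detailed.jpg", "detailed_small.jpg")
```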
Remember, slow-loading webpages affect SEO!