How to deal with TIFF files with incompatible ICC color profiles

I'm writing a TIFFImageReader plugin for Java (ImageIO), but the question is generic:
What is the "correct" way to deal with a TIFF file that contains an ICC color profile that does not match the image data (i.e. the image data is RGB, but the TIFF metadata contains a CMYK ICC profile)?
I recently had a user report this problem. The quick and easy fix is to just issue a warning and ignore the color profile. However, the TIFF 6.0 specification doesn't mention ICC profiles at all, or how to deal with them. ICC Specification ICC.1:2010-12, annex B, does describe how to correctly embed color profiles, but it says nothing about how the profile must relate to the image data (other than being in the same IFD).

There is no "correct" way to deal with this as a general problem.
The color profile tells you how to connect RGB numbers, which by themselves are just abstract values, to a device-independent space like CIE XYZ or CIELAB so you can correctly interpret them as colors and display them in a consistent way.
With an incorrect profile (or without a profile, for that matter) you can't know how to interpret the RGB numbers. It's a little like getting a weight measurement without units or with incorrect units. If I have a recipe that calls for 5 units of water, and you don't know whether that should be 5 grams or 5 cups, you are kind of stuck. You can either guess or tell the user their information doesn't make sense.
In many cases you can make an educated guess based on domain-specific information. For example, we can often correctly guess that images on the web without a profile will likely be sRGB data, but that's just a guess. A lot like assuming a recipe for cookies that calls for 1 flour probably means 1 cup of flour.
It's very unusual for an RGB image to carry a CMYK profile — it's really hard to understand how that even happens. If I had to deal with this, I would throw an error with a message about the problem. If color accuracy is not critical, you could also strip the profile and let whatever downstream process needs to display the image deal with the untagged image, or guess something reasonable like sRGB with the understanding that it is just a guess.
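In an ImageIO reader, one pragmatic option is to sanity-check the embedded profile against the decoded raster before applying it. Here is a minimal sketch; the helper class, its name, and the sRGB fallback are my own illustration, not anything mandated by the TIFF or ICC specs:

    import java.awt.color.ColorSpace;
    import java.awt.color.ICC_Profile;

    // Hypothetical helper: a profile "matches" if its color space type implies
    // the same number of components as the raster actually has.
    final class ProfileSanityCheck {
        static boolean profileMatchesData(ICC_Profile profile, int numBands) {
            int expected;
            switch (profile.getColorSpaceType()) {
                case ColorSpace.TYPE_GRAY: expected = 1; break;
                case ColorSpace.TYPE_RGB:  expected = 3; break;
                case ColorSpace.TYPE_CMYK: expected = 4; break;
                default:                   expected = -1;   // unknown: treat as mismatch
            }
            return expected == numBands;
        }
    }

    // Usage sketch inside a reader: warn and fall back to a guess instead of
    // applying the bogus profile.
    // if (!ProfileSanityCheck.profileMatchesData(profile, raster.getNumBands())) {
    //     processWarningOccurred("Embedded ICC profile does not match image data; ignoring it");
    //     ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);  // a guess, as noted above
    // }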

Related

Producing many different hashes of a jpg file with minimal change to picture

My goal is to write a program (e.g. in Python or C++) which takes a JPG file as input (e.g. tux.jpg) and makes tiny changes to it, such that it outputs many different images (maybe a thousand or even more), but in a way that all these images, while having different hashes, look almost the same visually, i.e. the changes should have as little impact on the original image as possible.
I first thought of playing around with the JPG header, but that might not be enough to create the many thousands of different pictures I want.
As a naive approach, I thought of flipping a random bit in the file, but that can produce a visibly undesirable result, especially in small pictures (e.g. a dark pixel in the white space of the tux picture). Ideally, I would like to change a random pixel to a "neighboring" color, so that the two resulting pictures have almost no visual difference.
For this purpose, I read the JPG codec example, but I find it very confusing and hard to understand. Can someone explain what my program should look for as it parses the file in binary format, and how to change a random pixel to a "neighboring" color?
You can change the comment part of the file by playing with the file header. A simple way to do that is to use a ready-made open source program that allows you to put in a comment of your choice, for example HLLO repeated 8 times. That gives you 256 bits (32 bytes) to play with. You can then determine where the HLLO pattern is located in the file using a hex editor, load the data in memory, and start changing these 32 bytes, calculating the hash each time, to get a collision (a hash that matches).
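For what it's worth, here is a rough Java sketch of that byte-twiddling idea, applied to the question's actual goal of producing many visually identical files with different hashes (rather than hunting for a collision). It assumes the HLLO-times-8 placeholder comment has already been written into a hypothetical tux.jpg:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;

    public class JpegVariants {
        public static void main(String[] args) throws Exception {
            byte[] original = Files.readAllBytes(Path.of("tux.jpg"));   // hypothetical input file
            byte[] pattern = "HLLOHLLOHLLOHLLOHLLOHLLOHLLOHLLO".getBytes(StandardCharsets.US_ASCII);
            int offset = indexOf(original, pattern);                    // locate the 32-byte comment
            if (offset < 0) throw new IllegalStateException("placeholder comment not found");

            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < 1000; i++) {
                byte[] variant = original.clone();
                // Overwrite only the comment bytes: the pixel data is untouched, so every
                // variant decodes to the same picture but hashes differently.
                byte[] counter = String.format("%032d", i).getBytes(StandardCharsets.US_ASCII);
                System.arraycopy(counter, 0, variant, offset, counter.length);
                Files.write(Path.of("tux_" + i + ".jpg"), variant);
                byte[] digest = sha256.digest(variant);                 // distinct for each variant
            }
        }

        private static int indexOf(byte[] haystack, byte[] needle) {
            outer:
            for (int i = 0; i <= haystack.length - needle.length; i++) {
                for (int j = 0; j < needle.length; j++) {
                    if (haystack[i + j] != needle[j]) continue outer;
                }
                return i;
            }
            return -1;
        }
    }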
By the time you find a collision, the universe will have ended.
Although it is in theory doable, it is practically impossible to crack SHA-256 in a reasonable amount of time; if it were, standard encryption protocols would be broken and hackers would be enjoying their time.

Is there a "law of diminishing returns" for converting images to Base64 as opposed to simply using the images themselves?

Say I have some icons on my site (32x32, 64x64, 128x128, etc.), then converting them to Base64 makes sense, right?
Now say I have some high resolution images that are 2MB, 3MB, 4MB, and greater that I am using on my site. Does it make sense to convert these larger images to Base64 or is it "smarter" to simply keep using them as .jpg/.png/.gif/etc.?
If there is such a "law", "rule of thumb", etc. what is the line in the sand?
EDIT: While this post was marked as a duplicate, the linked "original" is from over 10 years ago; browser technology, computers, and the web itself have changed significantly since then. What would be the answer for today's technology?
The answer to the question is: yes, and it depends.
If we rephrase the question to: Does the law of diminishing returns apply to using base64 for embedding images in a page?
a) Yes, the law applies
b) It depends on: image count and size, and your setup (i.e. HTTP version (HTTP/2?), connection type, etc.)
The reason is that more images require more connections, which imply more handshakes, unless you are using keep-alive connections or HTTP/2 streaming. If the images are bigger and require more computing to convert from base64 back to binary (plus decompression), then the bandwidth savings come at a CPU expense.
In general, if you have lots of small images (icons, for example), you could embed them as base64 (a quick sketch of that follows the list below). But in that case you also have the following options:
a) Image Atlas: Converting all small images to a single image (one load) and showing only the portion that you need through the page.
b) Converting to alternative formats, such as fonts or SVG, and again rendering what you need. Example: Open Iconic.
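If you do go the base64 route for small icons, the mechanics are just a data URI. A minimal sketch (icon-32.png is a hypothetical file name):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Base64;

    public class DataUriDemo {
        public static void main(String[] args) throws Exception {
            byte[] bytes = Files.readAllBytes(Path.of("icon-32.png"));   // hypothetical 32x32 icon
            String dataUri = "data:image/png;base64," + Base64.getEncoder().encodeToString(bytes);
            // Base64 inflates the payload by roughly one third, which is why inlining only
            // pays off when saving the extra request outweighs the size overhead.
            System.out.println(dataUri.substring(0, Math.min(dataUri.length(), 80)) + "...");
        }
    }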

FirebaseVisionImage / ML Toolkit cropRect() support

I am posting this question by request of a Firebase engineer.
I am using the Camera2 API in conjunction with Firebase-mlkit vision. I am using both barcode and on-platform OCR. The things I am trying to decode are mostly labels on equipment. In testing the application I have found that trying to scan the entire camera image produces mixed results. The main problem is that the field of view is too wide.
If there are multiple bar codes in view, firebase returns multiple results. You can sort of work around this by looking at the coordinates and picking the one closest to the center.
When scanning text, it's more or less the same, except that you get multiple Blocks, many of them incomplete (you'll get a couple of letters here and there).
You can't just narrow the camera mode, though - for this type of scanning, the user benefits from the "wide" camera view for alignment. The ideal situation would be if you have a camera image (let's say for the sake of argument it's 1920x1080) but only a subset of the image is given to firebase-ml. You can imagine a camera view that has a guide box on the screen, and you orient and zoom the item you want to scan within that box.
You can select what kind of image comes from the Camera2 API, but firebase-ml spits out warnings if you choose anything other than YUV_420_888. The problem is that there's no great way in the Android API to deal with YUV images unless you do it yourself. That's what I ultimately ended up doing - I solved my problem by writing a RenderScript that takes an input YUV, converts it to RGBA, crops it, then applies any rotation if necessary. The result is a Bitmap, which I then feed into either the FirebaseVisionBarcodeDetector or the FirebaseVisionTextRecognizer.
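For illustration, the crop-then-detect step looks roughly like this (simplified: it assumes the YUV-to-RGBA conversion has already produced a full-frame Bitmap, and that the on-screen guide box maps to left/top/width/height in image coordinates):

    import android.graphics.Bitmap;
    import com.google.firebase.ml.vision.FirebaseVision;
    import com.google.firebase.ml.vision.barcode.FirebaseVisionBarcodeDetector;
    import com.google.firebase.ml.vision.common.FirebaseVisionImage;

    void scanGuideBox(Bitmap fullFrame, int left, int top, int width, int height) {
        // Crop to the guide box so the detector never sees anything outside it.
        Bitmap cropped = Bitmap.createBitmap(fullFrame, left, top, width, height);
        FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(cropped);

        FirebaseVisionBarcodeDetector detector =
                FirebaseVision.getInstance().getVisionBarcodeDetector();
        detector.detectInImage(image)
                .addOnSuccessListener(barcodes -> {
                    // Only codes inside the guide box can show up here, which avoids the
                    // "pick the result closest to the center" workaround.
                })
                .addOnFailureListener(e -> { /* handle the error */ });
    }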
Note that the Bitmap itself causes ML Kit runtime warnings, urging me to use the YUV format instead. This is possible, but difficult. You would have to read the byte array and stride information from the original camera2 YUV image and create your own. The object that comes from camera2 is unfortunately a package-protected class, so you can't subclass it or create your own instance - you'd essentially have to start from scratch. (I'm sure there's a reason Google made this class package-protected, but it's extremely annoying that they did.)
The steps I outlined above all work, but with format warnings from ML Kit. What makes it even better is the performance gain - the barcode scanner operating on an 800x300 image takes a tiny fraction of the time it takes on the full-size image!
It occurs to me that none of this would be necessary if firebase paid attention to cropRect. According to the Image API, cropRect defines what portion of the image is valid. That property seems to be mutable, meaning you can get an Image and change its cropRect after the fact. That sounds perfect. I thought that I could get an Image off of the ImageReader, set cropRect to a subset of that image, and pass it to Firebase and that Firebase would ignore anything outside of cropRect.
This does not seem to be the case. Firebase seems to ignore cropRect. In my opinion, firebase should either support cropRect, or the documentation should explicitly state that it ignores it.
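For reference, the attempted approach looks roughly like this (guideBox and rotation are placeholders); in my testing the detectors still process the full frame:

    import android.graphics.Rect;
    import android.media.Image;
    import android.media.ImageReader;
    import com.google.firebase.ml.vision.common.FirebaseVisionImage;

    void attemptCropRect(ImageReader reader, Rect guideBox, int rotation) {
        Image image = reader.acquireLatestImage();
        if (image == null) return;
        image.setCropRect(guideBox);   // mutable, per the Image API docs
        FirebaseVisionImage visionImage = FirebaseVisionImage.fromMediaImage(image, rotation);
        // ... hand visionImage to the barcode/text detector; results still cover the whole frame
        image.close();
    }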
My request to the firebase-mlkit team is:
Define the behavior I should expect with regard to cropRect, and document it more explicitly
Explain at least a little about how images are processed by these recognizers. Why is it so insistent that YUV_420_888 be used? Maybe only the Y channel is used in decoding? Doesn't the recognizer have to convert to RGBA internally? If so, why does it get angry at me when I feed in Bitmaps?
Make these recognizers either pay attention to cropRect, or state that they don't and provide another way to tell them to work on a subset of the image, so that I can get the performance (reliability and speed) one would expect from running ML on a smaller image.
--Chris

what is COLOR_FormatYUV420Flexible?

I wish to encode video/avc on my Android encoder. The encoder (Samsung S5) publishes COLOR_FormatYUV420Flexible as one of its supported formats. Yay!
But I don't quite understand what it is and how I can use it. The docs say:
Flexible 12 bits per pixel, subsampled YUV color format with 8-bit chroma and luma components.
Chroma planes are subsampled by 2 both horizontally and vertically. Use this format with Image. This format corresponds to YUV_420_888, and can represent the COLOR_FormatYUV411Planar, COLOR_FormatYUV411PackedPlanar, COLOR_FormatYUV420Planar, COLOR_FormatYUV420PackedPlanar, COLOR_FormatYUV420SemiPlanar and COLOR_FormatYUV420PackedSemiPlanar formats
This seems to suggest that I can use this constant with just about any kind of YUV data: planar, semi-planar, packed, etc. That seems unlikely: how would the encoder know how to interpret the data unless I specify exactly where the U/V values are?
Is there any metadata that I need to provide in addition to this constant? Does it just work?
Almost, but not quite.
This constant can be used with almost any form of YUV (planar, semi-planar, packed, and so on). But the catch is that it's not you who chooses the layout for the encoder to support - it's the other way around. The encoder chooses the surface layout and describes it via the flexible description, and you need to support whichever layout it happens to be.
In practice, when using this, you don't call getInputBuffers() or getInputBuffer(int index); you call getInputImage(int index), which returns an Image containing pointers to the start of the three planes along with their row and pixel strides.
Note - when calling queueInputBuffer afterwards, you have to supply a size parameter, which can be tricky to figure out - see https://stackoverflow.com/a/35403738/3115956 for more details on that.
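As a rough sketch of that flow (codec is an already configured and started MediaCodec using COLOR_FormatYUV420Flexible; the constant gray fill stands in for copying real pixel data while honoring the reported strides):

    import android.media.Image;
    import android.media.MediaCodec;
    import java.nio.ByteBuffer;

    void feedFrame(MediaCodec codec, long presentationTimeUs) {
        int index = codec.dequeueInputBuffer(10_000);          // wait up to 10 ms for a free buffer
        if (index < 0) return;

        Image image = codec.getInputImage(index);              // layout chosen by the encoder
        Image.Plane[] planes = image.getPlanes();              // [0] = Y, [1] = U, [2] = V
        for (Image.Plane plane : planes) {
            // A real implementation would copy source pixels row by row, stepping by
            // plane.getRowStride() and plane.getPixelStride(); here we just fill gray.
            ByteBuffer buf = plane.getBuffer();
            while (buf.hasRemaining()) {
                buf.put((byte) 128);
            }
        }

        // The size argument is awkward for Image-based input; see the linked answer above.
        int size = image.getWidth() * image.getHeight() * 3 / 2;
        codec.queueInputBuffer(index, 0, size, presentationTimeUs, 0);
    }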

achieve better recognition results via training tesseract

I have a question regarding achieving better recognition results with Tesseract. I am using Tesseract to recognize serial numbers. The serial numbers use only one font, contain only the characters A-Z and 0-9, and occur in different sizes and lengths.
At the moment I am able to recognize about 40% of the serial number images correctly. The images are taken with a mobile phone camera, so the image quality isn't the best.
Special problem characters are 8/B, 5/6. Since I am recognizing only serial numbers, I am not using any dictionary improvements and every character is recognized independently.
My question is: Does anyone have experience with achieving better recognition results by training Tesseract? How many images would be needed to get good results?
For training Tesseract, should I use serial numbers that were printed and then photographed, or should I use the original digital serial numbers without printing and photographing them?
Maybe somebody already has experience in this area.
Regarding training Tesseract: I have already trained Tesseract with some images. For that, I printed all characters in different sizes, then photographed and labeled them correctly. (Example training photo of the character 5.)
Is this a good/bad training example? Since I only want to recognize single characters without any dependency, I thought I wouldn't have to use words for training.
So far I have only trained with 3 of these images for the characters B, 8, 6, and 5, which doesn't result in better recognition compared with the original English (eng) Tesseract database.
best regards,
Christoph
I am currently working on a Sikuli application that uses Tesseract to read text (strings and numbers) from screenshots. I found that the best way to achieve accuracy was to process the screenshot before performing OCR on it. However, most of the text I am reading is green text on a black background, which made preprocessing my preferred solution. I used the Scalr library's resize method on the BufferedImage to increase the size of the image:
BufferedImage bufImg = Scalr.resize(...)
which instantly yielded more accurate results with black text on a gray background. I then used the BufferedImage types BufferedImage.TYPE_BYTE_GRAY and BufferedImage.TYPE_BYTE_BINARY when creating a new BufferedImage, to convert the image to grayscale and black/white, respectively.
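Roughly, the whole preprocessing step looks like this (a sketch assuming the imgscalr library, org.imgscalr.Scalr, and an already-captured screenshot):

    import java.awt.Graphics2D;
    import java.awt.image.BufferedImage;
    import org.imgscalr.Scalr;

    // Resize + grayscale, as described above.
    static BufferedImage preprocess(BufferedImage screenshot) {
        // 1. Upscale: bigger glyphs give Tesseract more pixels to work with.
        BufferedImage resized = Scalr.resize(screenshot, Scalr.Method.QUALITY,
                                             screenshot.getWidth() * 2);

        // 2. Redraw into a grayscale image (use TYPE_BYTE_BINARY instead for pure black/white).
        BufferedImage gray = new BufferedImage(resized.getWidth(), resized.getHeight(),
                                               BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        g.drawImage(resized, 0, 0, null);
        g.dispose();
        return gray;
    }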
Following these steps brought Tesseract's accuracy from about 30% to around 85% when dealing with green text on a black background, and to very close to 100% when dealing with normal black text on a white background (occasionally letters within a word are mistaken for numbers, e.g. hel10).
I hope this helps!