Tesseract doesn't detect small labels (OCR)

I've installed Tesseract in my Linux environment.
It works when I run something like
# tesseract myPic.jpg /output
But my picture has some small labels and Tesseract doesn't see them.
Is there an option to set a pitch or something like that?
Example of text labels:
With this pic, tesseract doesn't recognize any value...
But with this pic:
I have the following output:
J8
J7A-J7B P7 \
2
40 50 0 180 190
200
P1 P2 7
110 110
\ l
For example, in this case, the 90 (top left) is not detected by Tesseract...
I think it's just an option I need to set or something like that, no?
Thanks

In order to get accurate results from Tesseract (or any OCR engine), you need to follow some guidelines, as described in my answer to this post:
Junk results when using Tesseract OCR and tess-two
Here is the gist of it:
Use a high-resolution image (resample if needed); 300 DPI is the minimum
Make sure there are no shadows or bends in the image
If there is any skew, fix the image in code prior to OCR
Use a dictionary to help get good results
Adjust the text size (12 pt font is ideal)
Binarize the image and use image processing algorithms to remove noise
It is also recommended to spend some time training the OCR engine to get better results, as described in this link: Training Tesseract
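To make the "binarize" guideline concrete, here is a minimal pure-Python sketch of global thresholding. It uses Otsu's method as one common way to pick the threshold; that choice is an assumption for illustration, not necessarily what Tesseract or any particular SDK does internally. The function names and the sample pixel values are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg = 0   # number of pixels at or below the candidate threshold
    sum_bg = 0      # sum of their grey levels
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


def binarize(pixels, threshold):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [255 if p > threshold else 0 for p in pixels]


# Hypothetical greyscale samples: dark text pixels and a light background.
sample = [10, 12, 11, 200, 210, 205]
threshold = otsu_threshold(sample)
binary = binarize(sample, threshold)
print(threshold, binary)
```

On a real image you would flatten the greyscale pixel data into such a list (or use an image library's built-in thresholding) before passing the result to the OCR engine.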
I took the two images that you shared and ran some image processing on them using the LEADTOOLS SDK (disclaimer: I am an employee of this company). With the processed images I got better results than you were getting, but since the original images aren't the greatest, it still was not 100%. Here is the code I used to try to fix the images:
// initialize the codecs class
using (RasterCodecs codecs = new RasterCodecs())
{
    // load the file
    using (RasterImage img = codecs.Load(filename))
    {
        // run the image processing sequence, starting by resizing the image to 300 DPI
        double newWidth = (img.Width / (double)img.XResolution) * 300;
        double newHeight = (img.Height / (double)img.YResolution) * 300;
        SizeCommand sizeCommand = new SizeCommand((int)newWidth, (int)newHeight, RasterSizeFlags.Resample);
        sizeCommand.Run(img);
        // binarize the image
        AutoBinarizeCommand autoBinarize = new AutoBinarizeCommand();
        autoBinarize.Run(img);
        // change it to 1 BPP
        ColorResolutionCommand colorResolution = new ColorResolutionCommand();
        colorResolution.BitsPerPixel = 1;
        colorResolution.Run(img);
        // save the image as PNG
        codecs.Save(img, outputFile, RasterImageFormat.Png, 0);
    }
}
Here are the output images from this process:
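The resizing step in the code above is plain arithmetic: convert the pixel size to inches using the stored resolution, then multiply by the target DPI. A small Python sketch of the same calculation follows; the function name and sample numbers are hypothetical, and it rounds where the C# code truncates with an (int) cast.

```python
def size_at_dpi(width_px, height_px, x_dpi, y_dpi, target_dpi=300):
    """Return the pixel size an image needs to reach target_dpi.

    width_px / x_dpi is the physical width in inches; multiplying by
    target_dpi gives the pixel width at the new resolution.
    """
    if x_dpi <= 0 or y_dpi <= 0:
        # Some files carry no usable resolution metadata.
        raise ValueError("image resolution must be positive")
    new_width = round(width_px / x_dpi * target_dpi)
    new_height = round(height_px / y_dpi * target_dpi)
    return new_width, new_height


# Hypothetical 800x600 screenshot stored at 72 DPI.
print(size_at_dpi(800, 600, 72, 72))
```

An image that is already at 300 DPI comes back with its size unchanged, so the step is safe to run unconditionally.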

OCR digit recognition on an LCD display

I'm trying to read the digits from this LCD display:
LCD Display
I used pytesseract with this code:
import cv2
import pytesseract

img = cv2.imread('1.png')
img = get_grayscale(img)  # helper not shown in the question
img = cv2.bitwise_not(img)  # invert the image
custom_config = r'--psm 7 --oem 3 -c tessedit_char_whitelist=0123456789'
print(pytesseract.image_to_string(img, lang='eng', config=custom_config))
But I didn't get a good result.
I also tried passing the cropped image:
Cropped LCD Display
I get a better result, but not good enough.
I also tried to install other OCR modules (like calamari-ocr and easyocr), but I always get a different error while trying to install them.
What can I try?

Can I control the output image quality/size with Puppeteer export to PDF?

When using Puppeteer to print a page as PDF, Puppeteer may convert images in that page to a different format.
For example, printing a JPEG image will result in a PDF with (roughly) the same size as the image. That means Puppeteer is using the same exact JPEG image in the generated PDF. Same happens with other formats like PNG and SVG (the output size matches the size of the original images).
However, printing a WebP image will result in a PDF with a much bigger size (10x more than expected). This seems to be because Puppeteer is converting the WebP image into a JPEG/PNG image before generating the PDF.
I am guessing this is because WebP is not supported (maybe not even by the PDF standard and that may be the reason Puppeteer converts the WebP image in the first place).
Is there a way to control this image conversion? In particular, is it possible to set the target format (ideally JPEG) and quality (ideally < 100) to try to maintain the output size of the PDF in the same range as the input WebP image size?
It may help to look, at two levels, at what happens to images when they are saved in a PDF. Bear in mind this is a basic demo, not a real-world case, meant only to explain the considerations.
Upper left, we have 5x5 pixels: screen rendering applies blurring so the image does not look "sharp", whereas upper right, a PDF viewer tries to maintain vector sharpness.
So what about different formats? GIF, TIF and PNG (middle line) are lossless and behave in a roughly similar fashion. All should maintain colour pixel fidelity in a PDF.
However (lower line), JPEG is lousy at maintaining colour fidelity because it spreads the colours between adjoining pixels, which is "perfect" for fuzzy text or photographs but not much good for PDF colours.
OK, moving on: your focus is the input to the PDF, so what do those images look like when stored?
Each may be written in many ways, but let's focus on the most versatile, PNG.
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj
2 0 obj<</Type/Pages/Count 1/Kids[3 0 R]>>endobj
3 0 obj<</Type/Page/MediaBox[0 0 3.75 3.75]/Rotate 0/Resources<</XObject<</Img3 6 0 R>>>>/Contents 5 0 R/Parent 2 0 R>>endobj
5 0 obj<</Length 34>>
stream
q
3.75 0 0 3.75 0 0 cm
/Img3 Do
Q
endstream
endobj
6 0 obj<</Length 75/Type/XObject/Subtype/Image/Width 5/Height 5/BitsPerComponent 8/SMask 7 0 R/ColorSpace/DeviceRGB>>
stream
ÿ ÿÿÿ   ÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿ###ÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿ ÿÿÿ
endstream
endobj
7 0 obj
<</Length 5/Type/XObject/Subtype/Image/Width 5/Height 5/BitsPerComponent 1/ColorSpace/DeviceGray>>
stream
ÿÿÿÿÿ
endstream
endobj
xref
0 7
0000000000 65536 f
0000000016 00000 n
0000000062 00000 n
0000000114 00000 n
0000000334 00000 n
0000000472 00000 n
0000000555 00000 n
trailer
<</Size 7/Info<</Producer(Me)>>/Root 1 0 R>>
startxref
684
%%EOF
Again, for illustration, this is a non-typical stream, as it shows the bitmap uncompressed. Note that the main image is defined by
6 0 obj<</Length 75/Type/XObject/Subtype/Image/Width 5/Height 5/BitsPerComponent 8/SMask 7 0 R/ColorSpace/DeviceRGB>>
So, of interest: it is 5 pixels wide by 5 pixels high, with NO hint of how many inches that is. It is just 8 bits R, 8 bits G and 8 bits B (again, only 3 colour components); the alpha is in a separate image (the SMask, object 7). So 3x5x5=75 bytes is the uncompressed storage. We can compress it in many ways, such as "Flate" (similar to what is used in a zip file),
which will convert the stream from lots of ÿs into a more compact form.
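The 75 above, and the /Length 5 of the 1-bit mask in object 7, both follow from the same rule: each image row is packed into whole bytes. A short Python sketch of that arithmetic (the function name is hypothetical):

```python
def image_stream_length(width, height, components, bits_per_component):
    """Uncompressed size in bytes of a PDF image stream.

    Each row is padded up to a whole number of bytes, so a 5-pixel-wide
    1-bit row still occupies one full byte.
    """
    bits_per_row = width * components * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8  # round up to whole bytes
    return bytes_per_row * height


rgb_length = image_stream_length(5, 5, components=3, bits_per_component=8)
mask_length = image_stream_length(5, 5, components=1, bits_per_component=1)
print(rgb_length, mask_length)
```

These match the /Length 75 and /Length 5 entries in objects 6 and 7 of the sample file.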
Again, there are many encodings, so if we wish to keep the PDF as text for editing in a text editor, we can first hex-encode the compressed stream:
/Length 72
/Filter [ /ASCIIHexDecode /FlateDecode ]
>>
stream
789cfbcfc0f0ffff7f8605601244a002b0888383038c892a091582e86500009c
663a28>
endstream
Well, that was not much compression: down from 75 to 72!
Let's use something better by not using plain text.
6 0 obj
<</Length 36/Type/XObject/Subtype/Image/Width 5/Height 5/BitsPerComponent 8/SMask 7 0 R/ColorSpace/DeviceRGB/Filter/FlateDecode>>
stream
xœû¯ ðÿÿ…`D °ˆƒƒŒ‰*©€P¤   ÞF;è
endstream
OK, much better: we halved the storage from 72 down to 36. Good, it's small, compact and well formed.
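The halving is no accident: hex text spends two characters per compressed byte (plus the ">" end marker), so the binary Flate stream is about half the size of the hex-encoded one. A minimal Python sketch with the standard zlib module, using a hypothetical all-white 5x5 image rather than the exact pixels above, so the byte counts will differ from 72 and 36:

```python
import binascii
import zlib

WIDTH, HEIGHT = 5, 5

# Hypothetical pixel data: 25 white RGB pixels, 3 bytes each.
raw = bytes([255, 255, 255] * (WIDTH * HEIGHT))

# /Filter /FlateDecode: the stream holds the zlib-compressed bytes.
flate = zlib.compress(raw, 9)

# /Filter [/ASCIIHexDecode /FlateDecode]: the same bytes written as
# hex text, terminated by ">" so the file stays editable as plain text.
hex_text = binascii.hexlify(flate) + b">"

print("raw:", len(raw))
print("flate:", len(flate))
print("hex + flate:", len(hex_text))
```

Real images compress far less predictably than this uniform sample, but the two-to-one ratio between the hex and binary forms of the same compressed data always holds.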
So, what about keeping the JPEG structure? Ahhhh! Maintaining its lousy nature needs 730 bytes:
<</Filter/DCTDecode/Type/XObject/Subtype/Image/BitsPerComponent 8/Width 5/Height 5/ColorSpace/DeviceRGB/Length 730>>
stream
ÿØÿà JFIF ` ` ÿÛ C
ÿÛ C
ÿÀ " ÿÄ
ÿÄ µ } !1AQa "q2‘¡#B±ÁRÑð$3br‚
%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyzƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚáâãäåæçèéêñòóôõö÷øùúÿÄ
ÿÄ µ w !1AQ aq"2B‘¡±Á #3RðbrÑ
$4á%ñ&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz‚ƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚâãäåæçèéêòóôõö÷øùúÿÚ ? ëÿ j¯Ú[Á_ >|×õ¯„šgŽ¡ñ–™ý¥mg¨.˜ƒJ
§é€Æ„鮬
´`©þ¬ ㌢Šõiæ¸ì,ëáèUq„*ÖŒRÑ%³I}Ëüõ<ìFŽ*£¯QËšVnÓšWi_E$—ÉÿÙ
endstream
endobj
So this test piece is not real world, but it may help in deciding on the best storage method for different inputs.
My preference is to use PNG where possible for charts and document text, and to use JPEG only when essential, for photos or fuzzy OCR.
For your sample, JPEG is necessary, but even with quality set high, reducing the size from maximal can cause collateral damage.
However, it's not very noticeable unless you zoom in closer than the point where it looks blobby anyway; here at 4x zoom:
Source 58-59 KB
Slightly reduced 50-51 KB

Convert Autodesk Viewer Units to Inches

I am using the viewer with the Edit2D library and am trying to convert the length between two x and y points into real measurements.
For example, after a shape is drawn using the polygon tool, I want to get the length of the first edge.
In the event handler shown below, I get the drawn shape, take its first two points, and compute the distance between them. The result seems to be in Autodesk units or something. Is there an easy way to convert the units to feet or inches?
I have found
Edit2DExtension.defaultContext.unitHandler.fromDisplayUnits()
as well as
Edit2DExtension.defaultContext.unitHandler.toDisplayUnits()
and also
Autodesk.Viewing.Private.convertUnits().
I've tried all three, but am unsure how to use them and haven't found any good results with them yet.
There may be a way to do it through Edit2d but I haven't found a way yet and there is next to no documentation I can find on this library.
beforeEdit2DAction(event) {
    console.log('After Shape has been drawn -> ', event);
    let shape = event.action.shape;
    let pointA = shape._loops[0][0]; // Value: {x: 21.393766403198242, y: 20.934386880096092}
    let pointB = shape._loops[0][1]; // Value: {x: 25.082155227661133, y: 20.934386880096092}
    // Distance between 2 points (Assuming Autodesk units)
    let length = Autodesk.Edit2D.Math2D.distance2D(pointA, pointB); // 3.6883888244628906
    // Need to convert to real world units (preferably ft or inches)
}
The real length is 29.5 FEET
Any ideas, or comments are welcome! Thanks
Edit: Trying Petr's suggestion here's what it returned:
That's an interesting question. The "unit handler" keeps track of two types of units:
layer units (Edit2DExtension.defaultContext.unitHandler.config.layerUnits, can be inch for example)
display units (Edit2DExtension.defaultContext.unitHandler.config.displayUnits)
These two properties control how the actual lengths and areas are displayed. For example, the unit handler's toDisplayUnits method is implemented like so:
toDisplayUnits(fromUnits, value) {
this.updateConfig();
return Autodesk.Viewing.Private.convertUnits(fromUnits, this.config.displayUnits, this.config.scaleFactor, value);
}
With that, configuring fromUnits and displayUnits (and scale) properly should give you the real measurements you need.
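If the right scale is not known up front, one way to sanity-check it is to calibrate against an edge whose real length is known. The Python sketch below only redoes the arithmetic with the numbers from the question (two points and a real length of 29.5 feet); it does not call the viewer API, and the function names are hypothetical.

```python
import math


def distance_2d(a, b):
    """Euclidean distance between two points given as {'x', 'y'} dicts."""
    return math.hypot(b["x"] - a["x"], b["y"] - a["y"])


def calibration_factor(known_real_length, measured_units):
    """Real-world length represented by one drawing unit."""
    return known_real_length / measured_units


# Values copied from the question.
point_a = {"x": 21.393766403198242, "y": 20.934386880096092}
point_b = {"x": 25.082155227661133, "y": 20.934386880096092}

length_units = distance_2d(point_a, point_b)
feet_per_unit = calibration_factor(29.5, length_units)
print(round(length_units, 4), round(feet_per_unit, 3))
```

With these numbers the factor comes out very close to 8 feet per unit. If the layer units are inches, that would correspond to a scale factor of roughly 96, but that is an inference from this one measurement, not something the viewer reports.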

Cut ultrasound signal between specific values using Octave

I have an ultrasound wave (graph axes: Volt vs microsecond) and need to cut the signal/wave between two specific values to further analyze this clipping. My idea is to cut the signal at the 0.2 V level (y-axis). The wave is sine shaped, as shown in the figure, with the desired cutoff points in red.
In my current code, I'm cutting the signal between 1900 to 4000 ms (x-axis) (Aa = A(1900:4000);) and then I want to make the aforementioned clipping and proceed with the code.
Does anyone know how I could do this y-axis clipping?
Thanks!! :)
clear
clf
pkg load signal
for k=1:2
  w=1
  filename=strcat("PCB 2.1 (",sprintf("%01d",k),").mat")
  load(filename)
  Lthisrun=length(A);
  Pico(k,1:Lthisrun)=A;
  Aa = A(1900:4000);
  Ah= abs(hilbert(Aa));
  step=100;
  hold on
  i=1;
  Ac=0;
  for index=1:step:3601
    Ac(i+1)=Ac(i)+Ah(i);
    i=i+1
    r(k)=trapz(Ac)
  end
end
OK, you want to look only at values 'above the noise' in your data or, in this case, 'clip out' everything below 0.2 V. The easiest way to do this is with logical indexing: you take an array and create a sub-array, eliminating everything that doesn't meet a certain logical condition. See this example:
f = @(x) sin(x)./x;
x = [-100:.1:100];
y = f(x);
plot(x,y);
figure;
x_trim = x(y>0.2);
y_trim = y(y>0.2);
plot(x_trim, y_trim);
From your question it looks like you want to do the clipping after applying the horizontal windowing from 1900-4000. (You say that is in milliseconds, but your image shows the pulse occurring much sooner than 1900 ms.) In any case, something like
Ab = Aa(Aa > 0.2);
will create another array Ab that will only contain the portions of Aa with values above 0.2. You may need to do something similar (see the example) for the horizontal axis if your x-data is not just the element index.
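For readers more at home in Python, the same mask-and-keep idea can be sketched with plain lists. The key point is that the x and y values are filtered with the same condition so they stay paired. The sinc helper is hypothetical and defines the value at x = 0 as 1 (in the Octave example, 0/0 gives NaN there and the comparison simply drops that point).

```python
import math


def sinc(x):
    """sin(x)/x, with the limit value 1.0 at x == 0."""
    return 1.0 if x == 0 else math.sin(x) / x


# Same range as the Octave example: -100 to 100 in steps of 0.1.
xs = [i / 10 for i in range(-1000, 1001)]
ys = [sinc(x) for x in xs]

# Keep only the samples whose y value exceeds the threshold, and keep
# the matching x values so the two lists stay aligned.
THRESHOLD = 0.2
kept = [(x, y) for x, y in zip(xs, ys) if y > THRESHOLD]
x_trim = [x for x, _ in kept]
y_trim = [y for _, y in kept]

print(len(xs), len(x_trim))
```

Note that this keeps every sample above the threshold, wherever it occurs, so the result is not necessarily one contiguous stretch of the signal.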

more minimaler cubism.js horizon chart from json example

Following up on a previous question... I've got my minimal horizon chart example much more minimaler than before ( minimal cubism.js horizon chart example (TypeError: callback is not a function) )
<body>
<div class="mag"></div>
<script type="text/javascript">
var myContext = cubism.context();
var myMetr = myContext.metric(function(start, stop, step, callback) {
d3.json("../json/600s.json.php?t0=" + start/1000 + "&t1=" + stop/1000 + "&ss=" + step/1000, function(er, dt) {
if (!dt) return callback(new Error("unable to load data, or has NaNs"));
callback(null, dt.val);
});
});
var myHoriz = myContext.horizon()
.metric(myMetr);
d3.select(".mag")
.call(myHoriz);
</script>
</body>
The d3.json() bit calls a server side .php that I've written that returns a .json version of my measurements. The .php takes the start, stop, step (which cubism's context.metric() uses) as the t0, t1, and ss items in its http query string and sends back a .json file. The divides by 1000 are because I made my .php expect parameters in s, not ms. And the dt.val is because the actual array of my measurements is in the "val" member of the json output, e.g.
{
"other":"unused members...",
"n":5,
"val":[
22292.078125,
22292.03515625,
22292.005859375,
22292.02734375,
22292.021484375
]
}
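To make that request/response contract explicit, here is a small Python sketch of the two halves: turning millisecond start/stop/step values into the t0, t1 and ss query parameters in seconds, and pulling the "val" array out of a response shaped like the sample above. The function names are hypothetical, and integer division is an assumption that the millisecond values are whole seconds.

```python
import json
from urllib.parse import urlencode


def build_query(start_ms, stop_ms, step_ms):
    """Build the t0/t1/ss query string, converting ms to seconds."""
    params = {
        "t0": start_ms // 1000,
        "t1": stop_ms // 1000,
        "ss": step_ms // 1000,
    }
    return urlencode(params)


def extract_values(response_text):
    """Return the measurement array from the 'val' member of the JSON."""
    payload = json.loads(response_text)
    return payload["val"]


# Response body copied from the sample in the question.
sample_response = """
{
  "other": "unused members...",
  "n": 5,
  "val": [22292.078125, 22292.03515625, 22292.005859375,
          22292.02734375, 22292.021484375]
}
"""

query = build_query(1000, 61000, 10000)
values = extract_values(sample_response)
print(query, len(values))
```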
The problem is, now that I've got it pared down to (I think) the bare minimum, AND I actually understand all of it instead of just pasting from other examples and hoping for the best (in which scenario, most things I try to change just break things instead of improving them), I need to start adding parameters and functions back to make it visually more useful.
Two problems first of all are, this measurement hovers all day around 22,300, and only varies +/- 10 maybe all day, so the graph is just a solid green rectangle, AND the label just says constantly "22k".
I've fixed the label with .format(d3.format(".3f")) (versus the default .2s which uses SI metric prefixes, thus the "22k" above).
What I can't figure out is how to use either axis, scale, extent, or what, so that this only shows a range of numbers that are relevant to the viewer. I don't actually care about the positive-green and negative-blue and darkening colours aspects of the horizon chart. I just used it as proof-of-concept to get the constantly-shifting window of measurements from my .json data source, but the part I really need to keep is the serverDelay, step, size, and such features of cubism.js that intelligently grab the initial window of data, and incrementally grab more via the .json requests.
So how do I keep the cubism bits I need, but usefully change my all-22300s graph to show the important +/- 10 units?
update re Scott Cameron's suggestion of horizon.extent([22315, 22320])... yes I had tried that and it had zero effect. Other things I've changed so far from "minimal" above...
var myHoriz = myContext.horizon()
.metric(myMetr)
.format(d3.format(".2f"))
.height(100)
.title("base1 (m): ")
.colors(["#08519c", "#006d2c"])
// .extent([22315, 22320]) // no effect with or without this line
;
I was able to improve the graph by using metric.subtract inserting it above the myHoriz line like so: (but it made the numerical label useless now):
var myMetr2 = myMetr.subtract(22315);
var myHoriz = myContext.horizon()
.metric(myMetr2)
.format...(continue as above)
All the examples seem so concise and expressive and work fine verbatim but so many of the tweaks I try to make to them seem to backfire, I'm not sure why that is. And similarly when I refer to the API wiki... maybe 4 out of 5 things I use from the API work immediately, but then I always seem to hit one that seems to have no effect, or breaks the chart completely. I'm not sure I've wrapped my head around how so many of the parameters being passed around are actually functions, for one thing.
Next hurdles after this scale/extent question, will be getting the horizontal time axis back (after having chopped it out to make things more minimal and easier to understand), and switching this from an area-looking graph to more of a line graph.
Anyway, all direction and suggestion appreciated.
Here's the one with the better vertical scale, but now the numerical label isn't what I want:
Have you tried horizon.extent? It lets you specify the [min, max] value for the horizon chart. By default, a linear scale will be created to map values within the extent to the pixels within the chart's height (specified with horizon.height, or defaulting to 30 pixels).
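As a rough illustration of why the extent matters for data like this, here is a Python sketch of a plain linear mapping from an extent to pixels. It is not cubism's implementation (values outside the extent and the horizon banding are not modelled), and the function name is hypothetical.

```python
def make_linear_scale(extent, height=30):
    """Return a function mapping values in [min, max] onto [0, height]."""
    low, high = extent

    def scale(value):
        return (value - low) / (high - low) * height

    return scale


# A tight extent spreads the +/- few units over the full chart height.
tight = make_linear_scale((22315, 22320), height=30)
print(tight(22315), tight(22317.5), tight(22320))

# A hypothetical extent starting at 0 squeezes the same variation
# into a tiny fraction of a pixel, which looks like a solid block.
wide = make_linear_scale((0, 22320), height=30)
print(wide(22320) - wide(22315))
```

With the tight extent the 5-unit range covers all 30 pixels; with the wide one it covers less than a hundredth of a pixel.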