Bounding boxes around characters for tesseract 4.0.0-beta.1 - ocr

I am trying to do number plate recognition using tesseract 4.0.0-beta.1. The tesseract documentation says to create box files in the form . I tried the "makebox" function, but it does not detect every character properly. I then read somewhere that this function is meant for version 3.x.
I later tried the "wordstrbox" function, but the box file created this way is empty. Can someone tell me how to create box files for tesseract 4.0.0-beta.1?

Use pytesseract.image_to_data()
import pytesseract
import cv2
from pytesseract import Output
img = cv2.imread('image.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (text,x,y,w,h) = (d['text'][i],d['left'][i],d['top'][i],d['width'][i],d['height'][i])
    cv2.rectangle(img, (x,y), (x+w,y+h) , (0,255,0), 2)
cv2.imshow('img',img)
cv2.waitKey(0)
Among the data returned by pytesseract.image_to_data():
left is the distance from the upper-left corner of the bounding box to the left border of the image.
top is the distance from the upper-left corner of the bounding box to the top border of the image.
width and height are the width and height of the bounding box.
conf is the model's confidence for the prediction for the word within that bounding box. If conf is -1, the corresponding bounding box contains a block of text rather than just a single word.
The bounding boxes returned by pytesseract.image_to_boxes() enclose letters so I believe pytesseract.image_to_data() is what you're looking for.
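To make the meaning of conf concrete, here is a minimal sketch that keeps only word-level entries from a dictionary shaped like the Output.DICT result above. The helper name word_boxes, the min_conf parameter and the sample values are made up for illustration; conf is converted with float() on the assumption that it may arrive as a string or a number depending on the pytesseract version.

```python
def word_boxes(data, min_conf=0.0):
    """Return (text, left, top, width, height) for word-level entries only."""
    boxes = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # assumption: conf may be str, int or float
        # conf == -1 marks block/paragraph/line entries; empty text carries no word
        if conf < min_conf or not text.strip():
            continue
        boxes.append((text, data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i]))
    return boxes


# Hypothetical data in the same layout as image_to_data(..., output_type=Output.DICT)
sample = {
    "text": ["", "KA01", "AB1234"],
    "conf": ["-1", "91", "42"],
    "left": [0, 12, 80],
    "top": [0, 10, 10],
    "width": [200, 60, 90],
    "height": [50, 30, 30],
}

print(word_boxes(sample))               # both words, block entry skipped
print(word_boxes(sample, min_conf=60))  # only the higher-confidence word
```

The returned tuples can be passed to cv2.rectangle in the same way as in the loop above.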

I've found AlfyFaisy's answer very helpful and just wanted to share the code to view the bounding boxes of single characters. The differences regard the keys of the dictionary that is output by the image_to_boxes method:
import pytesseract
import cv2
from pytesseract import Output
img = cv2.imread('image.png')
height = img.shape[0]
width = img.shape[1]
d = pytesseract.image_to_boxes(img, output_type=Output.DICT)
n_boxes = len(d['char'])
for i in range(n_boxes):
    (text,x1,y2,x2,y1) = (d['char'][i],d['left'][i],d['top'][i],d['right'][i],d['bottom'][i])
    cv2.rectangle(img, (x1,height-y1), (x2,height-y2) , (0,255,0), 2)
cv2.imshow('img',img)
cv2.waitKey(0)
At least on my machine (Python 3.6.8, cv2 4.1.0) the cv2 method is waitKey(0) with a capital K.
This is the output I got:
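The height-y arithmetic in the snippet above is needed because image_to_boxes reports coordinates with the origin at the bottom-left of the image, while OpenCV draws with the origin at the top-left. A small sketch of that conversion on its own, with a hypothetical helper name box_to_cv_rect:

```python
def box_to_cv_rect(left, bottom, right, top, img_height):
    """Convert a bottom-left-origin box to two top-left-origin corner points."""
    top_left = (left, img_height - top)          # flip the y axis
    bottom_right = (right, img_height - bottom)
    return top_left, bottom_right


# A box 40 px tall that sits 20 px above the bottom edge of a 100 px high image
print(box_to_cv_rect(10, 20, 50, 60, 100))
```

The two points it returns can be passed directly as the corner arguments of cv2.rectangle.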

Related

Selenium, using find_element but end up with half the website

I finished the linked tutorial and tried to modify it to get something else from a different website. I am trying to get the margin table of HHI, but the website is coded in such a strange way that I am quite confused.
I found the child element that has the text with xpath://a[@name="HHI"]. Its parent is <font size="2"></font>, which contains the text I want, but there are a lot of tags named exactly <font size="2"></font>, so I can't just use xpath://font[@size="2"].
Attempting to use the full xpath prints out half of the website content.
the full xpath:
/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[3]/td/pre/font/table/tbody/tr/td[2]/pre/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font
Is there any way to select that particular font tag and print the text?
website:
https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm
Tutorial
https://www.youtube.com/watch?v=PXMJ6FS7llk&t=8740s&ab_channel=freeCodeCamp.org
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import pandas as pd
# prepare it to automate
from datetime import datetime
import os
import sys
import csv
application_path = os.path.dirname(sys.executable) # export the result to the same file as the executable
now = datetime.now() # for modify the export name with a date
month_day_year = now.strftime("%m%d%Y") # MMDDYYYY
website = "https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm"
path = "C:/Users/User/PycharmProjects/Automate with Python – Full Course for Beginners/venv/Scripts/chromedriver.exe"
# headless-mode
options = Options()
options.headless = True
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service, options=options)
driver.get(website)
containers = driver.find_element(by="xpath", value='') # or find_elements
hhi = containers.text # if using find_elements, = containers[0].text
print(hhi)
Update:
Thank you to Conal Tuohy, I learned a few new tricks in XPath. The website is written in such a strange way that even with the XPath that locates the exact font tag, the result still prints all the text in every following tag.
I tried making a list of the different products with .split("Back to Top"), then slicing out the first item and using .split("\n"). I will keep calling .split() on the lists within the list until the data fits neatly into a dataframe with strike prices as the index and maturity dates as the columns.
Probably not the most efficient way, but it works for now.
product = "HHI"
containers = driver.find_element(by="xpath", value=f'//font[a/@name="{product}"]')
hhi = containers.text.split("Back to Top")
# print(hhi)
hhi1 = hhi[0].split("\n")
df = pd.DataFrame(hhi1)
# print(df)
df.to_csv(f"{product}_{month_day_year}.csv")
You're right that HTML is just awful! But if you're after the text of the table, it seems to me you ought to select the text node that follows the b element that follows the a[@name="HHI"]; something like this:
//a[@name="HHI"]/following-sibling::b/following-sibling::text()[1]
EDIT
Of course that XPath won't work in Selenium because it identifies a text node rather than an element. So your best option is to return the font element that directly contains the //a[@name="HHI"], which will include some cruft (the Back to Top link, etc.) but which will at least contain the tabular data you want:
//a[@name="HHI"]/parent::font
i.e. "the parent font element of the a element whose name attribute equals HHI"
or equivalently:
//font[a/@name="HHI"]
i.e. "the font element which has, among its child a elements, one whose name attribute equals HHI"
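The parent-selection idea can be tried without a browser. The sketch below uses the standard library's xml.etree.ElementTree on a tiny made-up fragment (the real page is far messier and may not parse as XML). ElementTree supports only a subset of XPath, so the parent step is written as ".." rather than parent::font, and the helper name product_rows is hypothetical. It also applies the "Back to Top" split from the question's update.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page structure described above
SAMPLE = """
<body>
  <font size="2"><a name="HSI"/><b>HSI</b>
row-a 1 2
<a href="#top">Back to Top</a></font>
  <font size="2"><a name="HHI"/><b>HHI</b>
row-b 3 4
row-c 5 6
<a href="#top">Back to Top</a></font>
</body>
"""


def product_rows(markup, product):
    """Return the non-empty text lines of the font element holding <a name=product>."""
    root = ET.fromstring(markup.strip())
    # ".." steps from the matching <a> up to its parent element
    font = root.find(f".//a[@name='{product}']/..")
    if font is None:
        return []
    text = "".join(font.itertext())           # all descendant text, like Selenium's .text
    table_text = text.split("Back to Top")[0]  # drop the trailing cruft
    return [line.strip() for line in table_text.splitlines() if line.strip()]


print(product_rows(SAMPLE, "HHI"))
```

With Selenium itself, either of the two font expressions above can be passed to find_element unchanged, since the browser's XPath engine supports the full syntax.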

How to determine the difference in two images for a particular land use type

I am working on 2 images: image-1 is an xarray DataArray and image-2 is a raster .tif file. I want to overlay the two to see which land use types (image-2) fall within a particular value in the xarray data (image-1). Below is my code:
import netCDF4 as nc
import xarray as xr
import rasterio
import rioxarray
#import the dataset
era_5 = (r'F:\2ND_ARTICLE_II\ERA-5\ERA-5_All_Nigin.nc')
era_5 = xr.open_dataset(era_5)
era_5 = era_5['tp']
#import the tiff
lulc1 = rioxarray.open_rasterio(r'F:\2ND_ARTICLE_II\LULC\lulc_clp_Nig.tif', masked=True)
Now my question is how to determine the image difference that corresponds to a particular land use type between the two images.

Is there an attribute 'fit-to-page' in add_picture() using python docx

I have added a picture to a document using python-docx. It looks good as long as it's small, but if the picture is too big it moves to the next page or only half of it is displayed. How can I make my picture 'fit-to-page'? I don't want to hard-code constants like Inches(5.5) or something.
p1 = doc.add_paragraph(' ')
pic = doc.add_picture(os.path.join(base_path, fi),
                      width=Inches(5.0))
para = doc.paragraphs[-1]
It's possible to get the text width, which is the page width minus the left and right margins, and pass this value to the width argument of add_picture().
An example of a function to get the text width is:
def get_text_width(document):
    """
    Returns the text width in mm.
    """
    section = document.sections[0]
    return (section.page_width - section.left_margin - section.right_margin) / 36000
You can then call the function when adding a new picture:
r.add_picture(image, width=Mm(get_text_width(doc)))
If you need to add pictures in different sections of a document, it's necessary to improve the function to address this.
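One possible way to handle several sections is to compute the width per section instead of always reading the first one. The sketch below is only an illustration of the arithmetic: text_width_mm and text_widths_mm are hypothetical helpers, and SimpleNamespace objects stand in for python-docx sections (which expose page_width, left_margin and right_margin in EMU), so it runs without a document.

```python
from types import SimpleNamespace

EMU_PER_MM = 36000  # 1 mm = 36000 EMU


def text_width_mm(section):
    """Usable text width of one section, in millimetres."""
    return (section.page_width - section.left_margin - section.right_margin) / EMU_PER_MM


def text_widths_mm(sections):
    """Text width for every section, in document order."""
    return [text_width_mm(s) for s in sections]


# Stand-ins for sections: A4 portrait and A4 landscape, both with 1 inch margins
a4_portrait = SimpleNamespace(page_width=7560000, left_margin=914400, right_margin=914400)
a4_landscape = SimpleNamespace(page_width=10692000, left_margin=914400, right_margin=914400)

print(text_widths_mm([a4_portrait, a4_landscape]))
```

In a real document the list passed in would be document.sections, and the value for the section that receives the picture would be wrapped in Mm() as in the call above.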
References:
How to change page size to A4 in python-docx
https://www.trichview.com/help/units_of_measurement.html#:~:text=English%20metric%20unit%20(EMU)%20is,%2C%201%20mm%20%3D%2036000%20EMU

Interpretation of yolov5 output

I am making a face mask detection project and I trained my model using ultralytics/yolov5. I saved the trained model as an onnx file; you can find the model file here: model.onnx. Now I want to use this model.onnx with opencv to detect face masks in real time. The input image size during training was 320*320. You can visualize this model using netron.
I have written this code to capture the image using webcam and pass it to model.onnx to predict my bounding boxes. The code is as follows:
import json
import numpy as np
import onnxruntime

def predict(img):
    session = onnxruntime.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    img = img.reshape((1,3,320,320))
    data = json.dumps({'data':img.tolist()})
    data = np.array(json.loads(data)['data']).astype('float32')
    result = session.run([output_name],{input_name:data})
    result = np.array(result)
    print(result.shape)
The output of result.shape is (1, 1, 3, 40, 40, 85)
Can anyone help me interpret this shape and show how I can use this result array to get my class, bounding box and confidence?
I've never worked with a pure yolov5 model, but here's the output format for yolov5s. It looks like it should be similar.
output tensor structure (yolov5s):
output_tensor[a, b, c, d]
a -> image index (If your input is a batch of images, this tells you which image's output you're looking at. If your input is just one image, leave this as 0.)
b -> index of image in batch
c -> information about bounding box
0, 1 -> x and y coordinate of bounding box center
2, 3 -> width and height of bounding box
4 -> bounding box confidence
5 - 84 -> single class confidences
d -> index of proposed bounding boxes
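As a rough sketch of how one candidate in that layout could be turned into a detection: the helper decode_row and the 0.25 threshold are made up for illustration, and the code assumes the row is already decoded (centre, size, box confidence, then class confidences as plain numbers). A raw export such as the (1, 1, 3, 40, 40, 85) tensor in the question may need further decoding before it is in this form.

```python
def decode_row(row, conf_threshold=0.25):
    """Turn [cx, cy, w, h, box_conf, class_conf...] into a detection, or None."""
    cx, cy, w, h, box_conf = row[:5]
    class_scores = row[5:]
    # Pick the most likely class and combine it with the box confidence
    class_id = max(range(len(class_scores)), key=lambda i: class_scores[i])
    score = box_conf * class_scores[class_id]
    if score < conf_threshold:
        return None
    # Convert centre/size to corner coordinates
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2
    return (x1, y1, x2, y2, class_id, score)


# Hypothetical candidate with two classes
print(decode_row([100, 80, 40, 20, 0.9, 0.1, 0.8]))
```

In practice the surviving boxes are usually passed through non-maximum suppression to remove overlapping duplicates.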

HTML Dec Code image in Tkinter label — either text or image is doubled

I'd like to add a picture to some of my tkinter labels, and I found a page with many of them (there are, of course, many similar pages), including some that I want.
But I'm having a strange behavior with this.
The code
import tkinter as tk
from tkinter import ttk
import html
root = tk.Tk()
root.geometry("200x100")
s = html.unescape('&#127937') # chequered flag
text = "some text"
label_text = "{}{}".format(text, s)
my_label = ttk.Label(root, text=label_text)
my_label.pack()
t = chr(9917)
another = "football ball"
another_text = "{}{}".format(t, another)
another_label = ttk.Label(root, text=another_text)
another_label.pack()
root.mainloop()
produces the following window:
On the other hand, if I replace label_text = "{}{}".format(text, s) with label_text = "{}{}".format(s, text) the flag appears twice instead (once before "some text" and another after).
Apparently this only happens with html images.
For example, with the second label, I have the expected behavior.
Is there something I'm doing wrong here, or should I just avoid these images in tkinter?
I wouldn't avoid them, but I wouldn't recommend them either. Tkinter probably expects regular images and is probably not used to emojis, so my recommendation is to use regular images instead of emojis.
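One checkable difference between the two characters in the question is that the chequered flag (code point 127937) lies outside the Basic Multilingual Plane, while the football (9917) lies inside it. Whether that is what triggers the doubled rendering is an assumption, not something confirmed in this thread, but a small check like the hypothetical outside_bmp helper below shows which characters fall into that group.

```python
import html


def outside_bmp(text):
    """Return the characters of text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


flag = html.unescape('&#127937;')  # chequered flag, U+1F3C1
ball = chr(9917)                   # football, U+26BD

print(outside_bmp("some text" + flag))      # contains the flag
print(outside_bmp(ball + "football ball"))  # empty list
```

If only the characters reported by this check misbehave on a given machine, swapping them for image files, as suggested above, is one way around it.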