Hello, I am looking for guidance. I have been using pytesseract to do OCR, but I can't get it to recognise a series of equal signs placed together in an image. Any guidance on how to address this issue? I tested the image with AWS Rekognition and Google Vision, with the same results. I tried selecting an ROI with OpenCV and focusing the OCR on that, and yet it still came out empty, i.e. no characters were recognised. I appreciate any guidance.
Thank you.
Your text seems to be difficult to extract. Try working on the full image when extracting text with Tesseract.
I made one attempt at a solution, but as you can see, the bounding boxes for the characters are not what was expected.
This is the code:
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
originalImage = cv2.imread('a.png')
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
# Binarise with inversion, then thicken the strokes with a 3x3 dilation
(thresh, blackAndWhiteImageOriginal) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)
blackAndWhiteImage = cv2.dilate(blackAndWhiteImageOriginal, np.ones((3, 3), np.uint8))
# --psm 7: treat the image as a single text line; the whitelist allows only '='
ocr_output_details = pytesseract.image_to_data(blackAndWhiteImage, output_type=pytesseract.Output.DICT, config="--psm 7 -c tessedit_char_whitelist==")
# Draw every box Tesseract reported
rgbImage = cv2.cvtColor(blackAndWhiteImage, cv2.COLOR_GRAY2RGB)
for i in range(len(ocr_output_details['level'])):
    (x, y, w, h) = (ocr_output_details['left'][i], ocr_output_details['top'][i], ocr_output_details['width'][i], ocr_output_details['height'][i])
    cv2.rectangle(rgbImage, (x, y), (x + w, y + h), (0, 0, 255), 2)
print('Text: ', ocr_output_details['text'])
cv2.imshow('Boxes', rgbImage)
cv2.waitKey(0)
cv2.destroyAllWindows()
And the result:
Result 1
Using another, more appropriate full image with the expected character size, I can extract the equal signs perfectly with Tesseract.
This is the code:
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
originalImage = cv2.imread('b.jpg')
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
(thresh, blackAndWhiteImageOriginal) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY)
blackAndWhiteImage = cv2.erode(blackAndWhiteImageOriginal, np.ones((3, 3), np.uint8))  # (not used below)
img = originalImage
img_copy = img.copy()
gray = cv2.cvtColor(img_copy, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)
# --psm 6: treat the image as a single uniform block of text; the whitelist allows only '='
results = pytesseract.image_to_data(thresh, config="-c tessedit_char_whitelist== --psm 6")
text = []
# By default image_to_data returns TSV text; skip the header row.
# Only rows that carry recognised text have all 12 fields (left, top, width, height are fields 6-9).
for b in map(str.split, results.splitlines()[1:]):
    if len(b) == 12:
        x, y, w, h = map(int, b[6:10])
        cv2.rectangle(originalImage, (x, y), (x + w, y + h), (255, 0, 0), 2)
        cv2.putText(originalImage, b[11], (x, y + h + 15), cv2.FONT_HERSHEY_COMPLEX, 0.6, 0)
        text.append(b[11])
print('Text: ', text)
cv2.imshow("Result", originalImage)
cv2.waitKey(0)
And the result:
Result 2
You can try to improve the results by following the Tesseract documentation page "Improving the quality of the output".
The important things to do are:
1. Use white for the background and black for the characters.
2. Select the desired Tesseract page segmentation mode (psm). In the previous examples I used psm 6 and psm 7, which treat the image as a single uniform block of text and as a single text line, respectively.
3. Use the tessedit_char_whitelist config to specify only the characters you are searching for.
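The three tips above can be combined into a small preprocessing helper. This is a minimal sketch, assuming a grayscale NumPy image in which the background covers more pixels than the characters; the names to_black_on_white and tesseract_config are hypothetical, not part of pytesseract.

```python
import numpy as np

def to_black_on_white(gray: np.ndarray, thresh: int = 127) -> np.ndarray:
    """Binarise a grayscale image and make sure the text is black on white.

    Assumption: the background occupies more pixels than the text, so the
    majority colour after thresholding is taken to be the background.
    """
    binary = np.where(gray > thresh, 255, 0).astype(np.uint8)
    # If most pixels are black, the background is dark: invert the image.
    if np.count_nonzero(binary == 0) > binary.size / 2:
        binary = 255 - binary
    return binary

def tesseract_config(psm: int, whitelist: str) -> str:
    """Build a Tesseract config string that restricts output to `whitelist`."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"
```

Usage would be along the lines of pytesseract.image_to_string(to_black_on_white(gray), config=tesseract_config(7, "=")), where gray is the grayscale image loaded with OpenCV.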
I am trying to recreate the following graph in plotnine. I have been given a function by a colleague, and I'm not interested in rewriting it. I want to take sm and plot it with plotnine instead of matplotlib. I plot lots of DataFrames with plotnine, but I'm not sure how to use it in this case. I have tried to figure it out on my own and keep getting lost. I hope that someone more experienced can point out something simple I am overlooking.
import matplotlib.pyplot as plt
def getSuccess(y, x):
    return (y * (-x)) * .5 + .5
steps = 100
stepSize = 1/steps
sm = []
for y in range(steps*2+1):
    sm.append([getSuccess((y-steps)*stepSize, (x-steps)*stepSize) for x in range(steps*2+1)])
plt.imshow(sm)
plt.ylim(-1, 1)
plt.colorbar()
plt.yticks([0,steps,steps*2],[str(y) for y in [-1.0,0.0,1.0]])
plt.xticks([0,steps,steps*2],[str(x) for x in [-1.0,0.0,1.0]])
plt.show()
You could try geom_raster.
I have taken your synthetic data sm and converted it to a DataFrame, since plotnine needs one.
import pandas as pd
import numpy as np
from plotnine import *
df = pd.DataFrame(sm).melt()
df.rename(columns={'variable':'x','value':'density'}, inplace=True)
df.insert(1,'y',df.index % 201)
p = (ggplot(df, aes('x', 'y'))
     + geom_raster(aes(fill='density'), interpolate=True)
     + labs(x=None, y=None)
     + scale_x_continuous(expand=(0, 0), breaks=[0, 100, 200], labels=[-1, 0, 1])
     + scale_y_continuous(expand=(0, 0), breaks=[0, 100, 200], labels=[-1, 0, 1])
     + theme_matplotlib()
     + theme(
         text=element_text(family="Calibri", size=9),
         legend_title=element_blank(),
         axis_ticks=element_blank(),
         legend_key_height=29.6,
         legend_key_width=6,
     )
)
p.save(filename='C:\\Users\\BRB\\geom_raster.png', height=10, width=10, units = 'cm', dpi=400)
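As a variation on the melt-based conversion above, the long-format DataFrame can also be built directly with NumPy so that x and y already hold the real coordinates from -1 to 1, which removes the need for custom breaks and labels. This is a minimal sketch, assuming sm is a rectangular list of rows (rows = y, columns = x) spanning [-1, 1] evenly, as the question's loop produces; grid_to_long is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def grid_to_long(sm, lo=-1.0, hi=1.0):
    """Convert a 2-D grid (rows = y, columns = x) into a long DataFrame.

    Assumes the grid spans [lo, hi] evenly on both axes.
    """
    arr = np.asarray(sm, dtype=float)
    ny, nx = arr.shape
    # indexing="ij" gives arrays shaped like `arr`: ys varies down rows, xs across columns
    ys, xs = np.meshgrid(np.linspace(lo, hi, ny), np.linspace(lo, hi, nx), indexing="ij")
    return pd.DataFrame({"x": xs.ravel(), "y": ys.ravel(), "density": arr.ravel()})
```

It can then be plotted with ggplot(grid_to_long(sm), aes('x', 'y')) + geom_raster(aes(fill='density')). As in the melt version, row 0 of sm ends up at the bottom of the plot (y = -1).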
The result is:
I would like to have the entries from the following vehicle registration document automatically written to a text file.
However, the text recognition is very difficult. I have tried opening the image in different configurations. I have also tested different colour levels of the vehicle registration document. However, none of my attempts yielded a usable result.
Does anyone have an idea how the text could be recognised properly?
This is the image I tried to OCR:
The code I used is shown below:
import cv2
import numpy as np
import pytesseract
import matplotlib.pyplot as plt
from PIL import Image
import regex
pytesseract.pytesseract.tesseract_cmd=r'C:\Program Files\Tesseract-OCR\tesseract.exe'
img = cv2.imread("Fahrzeugscheinsplit1.jpg")
result = pytesseract.image_to_string(img)
print(result)
My output is shown here:
|
08.05.2006)'| 8566) ADVOOOO1X
ne r pear
a BORD 7 aoe \
‘BWY i
QUBB1 Repieee ay a f
TRAC |
| = say, |
is Mondeo ath }
FO! s 1
Fz.2.Pers, +b. 8 Spl. .
Kombilimousine
vo) EURO 4
«| BURO 4 ) Re !
» Diesel ES
ll 0002. WW 0d62. l2198 |
First, you should know the image-processing techniques recommended for Tesseract. Following the official documentation, you can apply simple thresholding.
If you apply simple thresholding, the result will be:
I think we should also center the image for more accurate recognition. We can center it by adding borders:
The image is now ready for text extraction. If we process it and keep only the results with confidence > 30:
Nearly all the text in the input image is detected. We can also print the detected texts:
Detected Text: 08.05.2006
Detected Text: 8566!
Detected Text: M1
Detected Text: AC
Detected Text: 8
Detected Text: 6
Detected Text: FORD
Detected Text: BWY
Detected Text: SFHAP7
Detected Text: Mondeo
Detected Text: FORD
Detected Text: (D)
Detected Text: Pz.z.Pers.bef.b.
Detected Text: 8
Detected Text: Spl.
Detected Text: Kombilimousine
Detected Text: EURO
Detected Text: 4
Detected Text: EURO
Detected Text: 4
Detected Text: Diesel
Detected Text: 0002
Detected Text: 0462
Detected Text: 2198
Using simple thresholding, we found nearly all the values correctly. For the missing parts, you can experiment with the parameters: decrease the confidence level, increase the threshold value, or use other thresholding methods such as adaptive thresholding or inRange thresholding.
Code:
from cv2 import imread, cvtColor, COLOR_BGR2GRAY as GRAY
from cv2 import imshow, waitKey, rectangle, threshold, THRESH_BINARY as BINARY
from cv2 import copyMakeBorder as addBorder, BORDER_CONSTANT as CONSTANT
from pytesseract import image_to_data, Output
bgr = imread("UXvS7.jpg")
gray = cvtColor(bgr, GRAY)
# Add a 50 px white border on every side to "center" the text
border = addBorder(gray, 50, 50, 50, 50, CONSTANT, value=255)
thresh = threshold(border, 150, 255, BINARY)[1]
data = image_to_data(thresh, output_type=Output.DICT)
for i in range(0, len(data["text"])):
    # float() accepts int, float or numeric-string confidences; non-word rows report -1
    confidence = float(data["conf"][i])
    if confidence > 30:
        x = data["left"][i]
        y = data["top"][i]
        w = data["width"][i]
        h = data["height"][i]
        text = data["text"][i]
        print(f"Detected Text: {text}")
        # thresh is single-channel, so only the first colour value (0, black) is used
        rectangle(thresh, (x, y), (x + w, y + h), (0, 255, 0), 2)
imshow("", thresh)
waitKey(0)
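The confidence filter in the loop above can be factored into a small function, which makes it easy to try different thresholds while tuning. This is a minimal sketch; confident_words is a hypothetical name, and data is assumed to be the dictionary returned by image_to_data(..., output_type=Output.DICT). Depending on the pytesseract/Tesseract version, conf values may arrive as ints, floats or numeric strings, so they are converted with float().

```python
def confident_words(data, min_conf=30.0):
    """Return (text, (x, y, w, h)) for every word above min_conf.

    Rows with empty text (blocks, paragraphs, lines) are skipped; Tesseract
    also gives those rows conf = -1.
    """
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])
        if conf > min_conf and text.strip():
            box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
            words.append((text, box))
    return words
```

Lowering min_conf is then a one-argument change when some values are missing.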
I want to create a GUI that automatically cleans the data in a CSV file once it is selected and plots a histogram with the fitted PDF superimposed. I have posted a basic Python program that generates the required graph, but I am unable to convert it into an interface. I think only "Open file" and "Plot" buttons would be enough. I want to retrieve data from column 'N' (index 13) only, skipping the top 4 rows.
I come from a metallurgy background and am trying my hand at this field.
Any help would be greatly appreciated.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
raw_data = pd.read_csv("D:/Project/Python/NDC/Outlier_ND/800016_DAT.csv",skiprows=4,header=None)
clean = pd.DataFrame(raw_data)
data1 = clean.iloc[:, [13]]
Q1 = data1.quantile(0.25)
Q3 = data1.quantile(0.75)
IQR = Q3 - Q1
data_IQR = data1[~((data1 < (Q1 - 1.5 * IQR)) |(data1 > (Q3 + 1.5 * IQR))).any(axis=1)]
data_IQR.shape
print(data1.shape)
print(data_IQR.shape)
headerList = ['Actual_MR']
data_IQR.to_csv(r'D:\Project\Python\NDC\Outlier_ND\800016_DAT_IQR.csv', header=headerList, index=False)
data = pd.read_csv("D:/Project/Python/NDC/Outlier_ND/800016_DAT_IQR.csv")
mean, sd = norm.fit(data['Actual_MR'])  # fit a 1-D Series, not the whole DataFrame
plt.hist(data['Actual_MR'], bins=25, density=True, alpha=0.6, facecolor='#2ab0ff', edgecolor='#169acf', linewidth=0.5)
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mean, sd)
plt.plot(x, p, 'red', linewidth=2)
title = " Graph \n mean: {:.2f} and SD: {:.2f}".format(mean, sd)
plt.title(title)
plt.xlabel('MR')
plt.ylabel('Pr')
plt.show()
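To plug this into a GUI, it helps to wrap the cleaning step in a function that takes the selected file path (or any file-like object) and skips the round trip through the intermediate _IQR.csv file. This is a minimal sketch, assuming the same layout as in the question (values in column index 13, four rows to skip); load_and_clean is a hypothetical name. It applies the same 1.5 × IQR fences via Series.between; unlike the original filter, rows with missing values are dropped as well.

```python
import pandas as pd

def load_and_clean(source, column=13, skiprows=4, k=1.5):
    """Read one CSV column and drop outliers outside [Q1 - k*IQR, Q3 + k*IQR]."""
    raw = pd.read_csv(source, skiprows=skiprows, header=None)
    values = raw.iloc[:, column]
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    # between() is inclusive on both ends, matching ~((v < low) | (v > high))
    mask = values.between(q1 - k * iqr, q3 + k * iqr)
    return values[mask].rename("Actual_MR").reset_index(drop=True)
```

In a GUI, the path chosen with the "Open file" button would be passed straight to this function.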
The following code demonstrates how to use PySimpleGUI with matplotlib; see the comments in the script for details.
import math, random
from pathlib import Path
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
import PySimpleGUI as sg
# 1. Define the class as the interface between matplotlib and PySimpleGUI
class Canvas(FigureCanvasTkAgg):
    """
    Create a canvas for matplotlib pyplot under tkinter/PySimpleGUI canvas
    """
    def __init__(self, figure=None, master=None):
        super().__init__(figure=figure, master=master)
        self.canvas = self.get_tk_widget()
        self.canvas.pack(side='top', fill='both', expand=1)
# 2. Create the PySimpleGUI window: a fixed-size Frame with a Canvas that expands in both x and y.
font = ("Courier New", 11)
sg.theme("DarkBlue3")
sg.set_options(font=font)
layout = [
    [sg.Input(expand_x=True, key='Path'),
     sg.FileBrowse(file_types=(("ALL CSV Files", "*.csv"), ("ALL Files", "*.*"))),
     sg.Button('Plot')],
    [sg.Frame("", [[sg.Canvas(background_color='green', expand_x=True, expand_y=True, key='Canvas')]], size=(640, 480))],
    [sg.Push(), sg.Button('Exit')]
]
window = sg.Window('Matplotlib', layout, finalize=True)
# 3. Create a matplotlib canvas under sg.Canvas or sg.Graph
fig = Figure(figsize=(5, 4), dpi=100)
ax = fig.add_subplot()
canvas = Canvas(fig, window['Canvas'].Widget)
# 4. Initial setup of the figure
ax.set_title("Sensor Data")
ax.set_xlabel("X axis")
ax.set_ylabel("Y axis")
ax.set_xlim(0, 1079)
ax.set_ylim(-1.1, 1.1)
ax.grid()
canvas.draw()  # update the GUI canvas
# 5. PySimpleGUI event loop
while True:
    event, values = window.read()
    if event in (sg.WINDOW_CLOSED, 'Exit'):
        break
    elif event == 'Plot':
        # Uncomment to skip plotting when no valid file has been selected:
        # path = values['Path']
        # if not Path(path).is_file():
        #     continue
        # 6. Get data from path and plot from here
        ax.cla()  # clear the axes first if required
        ax.set_title("Sensor Data")
        ax.set_xlabel("X axis")
        ax.set_ylabel("Y axis")
        ax.grid()
        theta = random.randint(0, 359)
        x = [degree for degree in range(1080)]
        y = [math.sin((degree+theta)/180*math.pi) for degree in range(1080)]
        ax.plot(x, y)
        canvas.draw()  # update the GUI canvas
# 7. Close window to exit
window.close()
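The sine-wave demo in step 6 can be replaced by the question's histogram with its fitted normal PDF. This is a minimal sketch; fit_normal and draw_hist_pdf are hypothetical helpers. The mean and SD are computed with NumPy as the population SD (ddof=0), which is what scipy.stats.norm.fit returns, and the PDF is evaluated over the data range rather than over plt.xlim() as in the question. draw_hist_pdf only calls standard matplotlib Axes methods, so it should work with the ax created in step 3.

```python
import numpy as np

def fit_normal(values, n_points=100):
    """Maximum-likelihood normal fit plus a PDF curve over the data range."""
    v = np.asarray(values, dtype=float)
    mean, sd = v.mean(), v.std()  # population SD, same as norm.fit
    if sd == 0:
        raise ValueError("all values are identical; a normal PDF cannot be fitted")
    x = np.linspace(v.min(), v.max(), n_points)
    pdf = np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return mean, sd, x, pdf

def draw_hist_pdf(ax, values, bins=25):
    """Draw a density histogram with the fitted normal PDF on a matplotlib Axes."""
    mean, sd, x, pdf = fit_normal(values)
    ax.cla()
    ax.hist(values, bins=bins, density=True, alpha=0.6,
            facecolor='#2ab0ff', edgecolor='#169acf', linewidth=0.5)
    ax.plot(x, pdf, 'red', linewidth=2)
    ax.set_title(f"Graph\nmean: {mean:.2f} and SD: {sd:.2f}")
    ax.set_xlabel('MR')
    ax.set_ylabel('Pr')
    return mean, sd
```

In the 'Plot' branch, after loading and cleaning the file at values['Path'] into a 1-D array or Series, call draw_hist_pdf(ax, cleaned) followed by canvas.draw().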
This is my very first attempt at using Python. I normally use .NET, but to identify shapes in documents I have turned to Python and OpenCV for image processing.
I am using OpenCV TemplateMatching (cv2.matchTemplate) to discover Regions of Interest (ROI) in my documents.
This works well. The template matches the ROIs, and rectangles are placed to identify the matches.
The ROIs in my images contain text which I also need to OCR and extract. I am trying to do this with Tesseract, but based on my results, I think I am approaching it the wrong way.
My process is this:
1. Run cv2.matchTemplate
2. Loop through the matched ROIs
3. Add the rectangle info to the image
4. Pass the rectangle info to Tesseract
5. Add the text returned from Tesseract to the image
6. Write the final image
In the image below, you can see the matched regions (which are fine), but the text in each ROI doesn't match the text returned by Tesseract (bottom right of each ROI).
Could someone please take a look and advise where I am going wrong?
import cv2
import numpy as np
import pytesseract
import imutils
img_rgb = cv2.imread('images/pd2.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('images/matchMe.png', 0)
w, h = template.shape[::-1]
res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.45
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    roi = img_rgb[pt, (pt[0] + w, pt[1] + h)]
    config = "-l eng --oem 1 --psm 7"
    text = pytesseract.image_to_string(roi, config=config)
    print(text)
    cv2.putText(img_rgb, text, (pt[0] + w, pt[1] + h),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
cv2.imwrite('images/results.png', img_rgb)
There were two issues in your code:
1. You were modifying the image (drawing the rectangle) before running OCR.
2. The roi was not constructed properly.
import cv2
import numpy as np
import pytesseract
img_rgb = cv2.imread('tess.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('matchMe.png', 0)
w, h = template.shape[::-1]
res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.45
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    # pt is (x, y); NumPy indexes rows (y) first, then columns (x)
    roi = img_rgb[pt[1]:pt[1] + h, pt[0]:pt[0] + w]
    config = "-l eng --oem 1 --psm 7"
    text = pytesseract.image_to_string(roi, config=config)
    print(text)
    # Draw only after OCR has run on the crop
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    cv2.putText(img_rgb, text, (pt[0] + w, pt[1] + h),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
cv2.imwrite('results.png', img_rgb)
You may still need to feed Tesseract a properly filtered image to get any meaningful recognition. Hope this helps.
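The key fix is the slicing order, which can be isolated in a helper. This is a minimal sketch, assuming pt is the (x, y) tuple produced by zip(*loc[::-1]); crop_roi is a hypothetical name. Because np.where with a 0.45 threshold usually returns several overlapping matches, rectangles drawn in earlier loop iterations can still end up inside later crops, so the sketch suggests cropping from an unmodified copy of the image taken before the loop, and it returns a copy of the crop.

```python
import numpy as np

def crop_roi(img, pt, w, h):
    """Crop a w x h region whose top-left corner is pt = (x, y).

    NumPy images are indexed [row, column], i.e. [y, x], so the y slice comes
    first. Slices that run past the image edge are truncated by NumPy.
    """
    x, y = pt
    return img[y:y + h, x:x + w].copy()
```

Usage: take clean = img_rgb.copy() before the loop, then use roi = crop_roi(clean, pt, w, h) inside it, so later drawing on img_rgb never leaks into the OCR input.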
When I train and then test my model using Caffe's command-line interface, I get e.g. 98.65%, whereas when I write code myself (given below) to calculate the accuracy of the same pre-trained model using caffe.Net, I get e.g. 98.1%.
Everything is straightforward and I have no idea what is causing the issue.
I also tried using caffe.Classifier and its predict method, and got yet another, lower accuracy (98.20%).
Here is the snippet of code I wrote:
import sys
import caffe
import numpy as np
import lmdb
import argparse
from collections import defaultdict
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import itertools
from sklearn.metrics import roc_curve, auc
import random
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--proto', help='path to the network prototxt file(deploy)', type=str, required=True)
    parser.add_argument('--model', help='path to your caffemodel file', type=str, required=True)
    parser.add_argument('--mean', help='path to the mean file(.binaryproto)', type=str, required=True)
    #group = parser.add_mutually_exclusive_group(required=True)
    parser.add_argument('--db_type', help='lmdb or leveldb', type=str, required=True)
    parser.add_argument('--db_path', help='path to your lmdb/leveldb dataset', type=str, required=True)
    args = parser.parse_args()
    predicted_lables = []
    true_labels = []
    misclassified = []
    class_names = ['unsafe', 'safe']
    count = 0
    correct = 0
    batch = []
    plabe_ls = []
    batch_size = 50
    cropx = 224
    cropy = 224
    i = 0
    multi_crop = False
    use_caffe_classifier = True
    caffe.set_mode_gpu()
    # Extract mean from the mean image file
    mean_blobproto_new = caffe.proto.caffe_pb2.BlobProto()
    f = open(args.mean, 'rb')
    mean_blobproto_new.ParseFromString(f.read())
    mean_image = caffe.io.blobproto_to_array(mean_blobproto_new)
    f.close()
    net = caffe.Classifier(args.proto, args.model,
                           mean=mean_image[0].mean(1).mean(1),
                           image_dims=(224, 224))
    net1 = caffe.Net(args.proto, args.model, caffe.TEST)
    net1.blobs['data'].reshape(batch_size, 3, 224, 224)
    data_blob_shape = net1.blobs['data'].data.shape
    #check and see if its lmdb or leveldb
    if(args.db_type.lower() == 'lmdb'):
        lmdb_env = lmdb.open(args.db_path)
        lmdb_txn = lmdb_env.begin()
        lmdb_cursor = lmdb_txn.cursor()
        for key, value in lmdb_cursor:
            count += 1
            datum = caffe.proto.caffe_pb2.Datum()
            datum.ParseFromString(value)
            label = int(datum.label)
            image = caffe.io.datum_to_array(datum).astype(np.float32)
            #key,image,label
            #buffer n image
            if(count % 5000 == 0):
                print('{0} samples processed so far'.format(count))
            if(i < batch_size):
                i += 1
                inf = key, image, label
                batch.append(inf)
                #print(key)
            if(i >= batch_size):
                #process n image
                ims = []
                for x in range(len(batch)):
                    img = batch[x][1]
                    #img has c,w,h shape! its already gone through transpose and channel swap when it was being saved into lmdb!
                    #Method III : use center crop just like caffe does in test time
                    if (use_caffe_classifier != True):
                        #center crop
                        c, w, h = img.shape
                        startx = h//2 - cropx//2
                        starty = w//2 - cropy//2
                        img = img[:, startx:startx + cropx, starty:starty + cropy]
                        #transpose the image so we can subtract from mean
                        img = img.transpose(2, 1, 0)
                        img -= mean_image[0].mean(1).mean(1)
                        #transpose back to the original state
                        img = img.transpose(2, 1, 0)
                        ims.append(img)
                    else:
                        ims.append(img.transpose(2, 1, 0))
                if (use_caffe_classifier != True):
                    net1.blobs['data'].data[...] = ims[:]
                    out_1 = net1.forward()
                    plabe_ls = out_1['pred']
                else:
                    out_1 = net.predict(np.asarray(ims), oversample=multi_crop)
                    plabe_ls = out_1
                plbl = np.asarray(plabe_ls)
                plbl = plbl.argmax(axis=1)
                for j in range(len(batch)):
                    if (plbl[j] == batch[j][2]):
                        correct += 1
                    else:
                        misclassified.append(batch[j][0])
                    predicted_lables.append(plbl[j])
                    true_labels.append(batch[j][2])
                batch.clear()
                i = 0
            sys.stdout.write("\rAccuracy: %.2f%%" % (100.*correct/count))
            sys.stdout.flush()
        # NOTE: if the number of images is not a multiple of batch_size, the final
        # partial batch is never evaluated, yet those images are still counted in `count`.
        print(", %i/%i corrects" % (correct, count))
What is causing this difference in accuracies?
More information:
I am using Python 3.5 on Windows.
I read images from an LMDB dataset.
The images are 256x256 and are center-cropped to 224x224.
The model is fine-tuned from GoogLeNet.
For caffe.Classifier's predict to work properly, I had to change classify.py.
In training, I just use Caffe's defaults, such as random crops at training time and a center crop at test time.
Changes:
changed line 35 to:
self.transformer.set_transpose(in_, (2, 1, 0))
and line 99 to:
predictions = predictions.reshape((len(predictions) // 10, 10, -1))
1) First, revert line 35 (32?) of classify.py, self.transformer.set_transpose(in_, (2, 1, 0)), back to the original self.transformer.set_transpose(in_, (2, 0, 1)). That way it expects HWC input and transposes internally to CHW for downstream processing.
2) Run your Classifier branch as it is. You're likely to get a bad result; please check this. If so, it means the image database is not CWH as you've commented, but actually CHW. After you've confirmed this, change ims.append(img.transpose(2,1,0)) in your Classifier branch to ims.append(img.transpose(1,2,0)). Re-test your Classifier branch. The result should be 98.2% (go to Step 3) or 98.65% (go to Step 4).
3) If your result in Step 2 is 98.2%, also revisit your second change to classify.py. Theoretically, as your images have even height/width, // and / should make no difference (note, though, that under Python 3 / always returns a float, which reshape() rejects, so // is needed there regardless). If the result differs or the code crashes for another reason, something is seriously wrong with your image database: your assumption about the image size is incorrect. You need to check this. The sizes could be off by a pixel or so, which could explain the slight discrepancies in accuracy.
4) If your result in Step 2 is 98.65%, then you need to change the caffe.Net branch of your code. The database images are CHW, so make the first transpose img = img.transpose(1,2,0) and the second transpose, after mean subtraction, img = img.transpose(2,0,1). Then run your caffe.Net branch. If you still get 98.1% as before, check that mean subtraction is performed correctly by your network.
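For the caffe.Net branch, the center crop and mean subtraction can also be done directly in C x H x W layout, which avoids the pair of transposes altogether. This is a minimal sketch, assuming the LMDB images are C x H x W (the layout caffe.io.datum_to_array returns) and that mean_bgr is the per-channel mean computed in the question as mean_image[0].mean(1).mean(1); center_crop_chw and preprocess_chw are hypothetical names.

```python
import numpy as np

def center_crop_chw(img, crop_h, crop_w):
    """Center-crop a C x H x W array."""
    _, h, w = img.shape
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return img[:, top:top + crop_h, left:left + crop_w]

def preprocess_chw(img, mean_bgr, crop_h=224, crop_w=224):
    """Center-crop, then subtract a per-channel mean, staying in C x H x W."""
    cropped = center_crop_chw(img, crop_h, crop_w).astype(np.float32)
    # Reshaping the mean to (C, 1, 1) broadcasts it over every pixel of its channel
    return cropped - np.asarray(mean_bgr, dtype=np.float32).reshape(-1, 1, 1)
```

For a 256 x 256 image this takes the same central 224 x 224 window that test-time center cropping is meant to produce.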
In Steps 2 and 4, it's possible to get worse results, which would mean the problem is likely a difference between the mean subtraction your trained net performs and what your Python code expects. Check this.
About your 98.2% for caffe.Classifier:
If you look at lines 78-80, the center crop is done along crop_dims, not image_dims. If you further look at line 42 of the caffe.Classifier constructor, crop_dims is never user-determined; it is determined by the size of the net's input blob. Lastly, if you look at line 70, image_dims is used to resize the images before center cropping. So what's happening with your setup is: a) the images are first resized to 224 x 224, then uselessly center-cropped to 224 x 224 (I assume this is the H x W of your net's input). You will obviously get results poorer than 98.65%. What you need to do is set image_dims = (256, 256). That prevents the resizing; the crop size will be picked up automatically from your net, and you should get your 98.65%.
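Separately from the steps above, one more thing may be worth checking in the question's loop: when the number of images is not a multiple of batch_size, the final partial batch is never evaluated, but those images are still included in count, which lowers the reported accuracy. This is a minimal sketch of batching that keeps the last partial batch; batched is a hypothetical helper (Python 3.12 added a similar itertools.batched, which yields tuples). For the caffe.Net branch, the input blob would presumably also need reshaping to the smaller final batch size, which this sketch does not cover.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of up to batch_size items, including a final partial batch."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk
```

Iterating over batched(lmdb_cursor, batch_size) and counting only the images actually predicted keeps the numerator and denominator of the accuracy consistent.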