I ran into a problem reading phone numbers from images. I tried to detect them with Tesseract, but it sometimes returns a wrong result. For example, the number is 8 995 005-81-86, but Tesseract outputs 8 995 0005-81-86. How can I fix this? Would binarizing the image help?
The code is basic:
import pytesseract as pt
from PIL import Image
img = Image.open('1.png')
number = pt.image_to_string(img)
print(number)
https://i.stack.imgur.com/kvhAq.png
For best results, pass Tesseract black text on a white background:
import cv2
import pytesseract
from PIL import Image
# Load the image and convert it to grayscale
img = cv2.imread('kvhAq.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Binarize: pixels above 100 become 255 (white), the rest become 0 (black)
ret, thresh = cv2.threshold(img, 100, 255, cv2.THRESH_BINARY)
im = Image.fromarray(thresh.astype("uint8"))
print(pytesseract.image_to_string(im))
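To see what the threshold call above does to the pixels, here is a small NumPy sketch of the same rule (values above 100 become 255, everything else becomes 0) applied to a made-up grayscale array. The helper name binarize and the sample values are illustrative only. The inverted variant is an assumption for the case where the source image has light text on a dark background, which may not apply to your image.

```python
import numpy as np

def binarize(gray, thresh=100, maxval=255, invert=False):
    """Mimic cv2.threshold with THRESH_BINARY (THRESH_BINARY_INV when invert=True)."""
    gray = np.asarray(gray, dtype=np.uint8)
    mask = gray > thresh          # strictly greater than the threshold
    if invert:
        mask = ~mask              # swap foreground and background
    return np.where(mask, maxval, 0).astype(np.uint8)

# Hypothetical 2x3 grayscale patch, only for demonstration
sample = np.array([[10, 100, 101],
                   [200, 50, 255]], dtype=np.uint8)
binary = binarize(sample)
inverted = binarize(sample, invert=True)
print(binary)
print(inverted)
```

If the text comes out white on black after thresholding, the inverted variant may give Tesseract the black-on-white input it prefers.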
I have 100K known embeddings, i.e.
[emb_1, emb_2, ..., emb_100000]
Each of these embeddings is derived from a GPT-3 sentence embedding with dimension 2048.
My task: given a new embedding (embedding_new), find the 10 closest embeddings among the 100K above.
I am currently approaching this problem with brute force.
Every time a query asks for the closest embeddings, I compare embedding_new with [emb_1, emb_2, ..., emb_100000] and get the similarity scores.
Then I quicksort the similarity scores to get the 10 closest embeddings.
Alternatively, I have also thought about using Faiss.
Is there a better way to achieve this?
I found a solution using Vector Database Lite (VDBLITE)
VDBLITE here: https://pypi.org/project/vdblite/
import vdblite
import numpy as np
from time import time
from uuid import uuid4
import sys
from pprint import pprint as pp
if __name__ == '__main__':
    vdb = vdblite.Vdb()
    dimension = 12 # dimensions of each vector
    n = 200 # number of vectors
    np.random.seed(1)
    db_vectors = np.random.random((n, dimension)).astype('float32')
    print(db_vectors[0])
    # Store each vector together with some metadata
    for vector in db_vectors:
        info = {'vector': vector, 'time': time(), 'uuid': str(uuid4())}
        vdb.add(info)
    vdb.details()
    results = vdb.search(db_vectors[10])
    pp(results)
It looks like it uses FAISS behind the scenes.
Using your own idea, just make sure the embeddings are stacked into a matrix; NumPy makes this easy.
This is computed in linear time (in the number of embeddings) and should be fast.
import numpy as np
k = 10 # k best embeddings
emb_mat = np.stack([emb_1, emb_2, ..., emb_100000])
scores = np.dot(emb_mat, embedding_new)
best_k_ind = np.argpartition(scores, -k)[-k:] # indices of the k largest scores, in no particular order
top_k_emb = emb_mat[best_k_ind]
The 10 best embeddings will be found in top_k_emb (not sorted by score).
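As a sanity check on the partition step, the sketch below runs the same idea on small random stand-in data (1000 vectors of dimension 16 instead of 100K of dimension 2048) and compares it with a full sort. Note the negative index -k, which asks argpartition to place the k largest scores at the end. Normalizing the rows first is an assumption on my part: it makes the dot product equal to cosine similarity, which may or may not be the score you want.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_mat = rng.standard_normal((1000, 16)).astype(np.float32)   # stand-in for the real matrix
embedding_new = rng.standard_normal(16).astype(np.float32)     # stand-in for the query
k = 10

# Unit-normalize so that the dot product is the cosine similarity
emb_unit = emb_mat / np.linalg.norm(emb_mat, axis=1, keepdims=True)
query_unit = embedding_new / np.linalg.norm(embedding_new)

scores = emb_unit @ query_unit

# O(n) selection of the k largest scores (unordered) ...
top_k_unsorted = np.argpartition(scores, -k)[-k:]
# ... followed by sorting only those k entries, best first
top_k = top_k_unsorted[np.argsort(scores[top_k_unsorted])[::-1]]

# Reference result from a full O(n log n) sort
reference = np.argsort(scores)[::-1][:k]
print(top_k, reference)
```

Sorting only the k selected entries keeps the overall cost linear in the number of embeddings while still returning the neighbors in ranked order.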
For a general solution inside a software project you might consider Faiss by Facebook Research.
An example for using Faiss:
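import faiss
import numpy as np
d = 2048 # dimensionality of your embedding data
k = 10 # number of nearest neighbors to return
index = faiss.IndexFlatIP(d)
# Faiss expects a 2-D float32 array of shape (n, d)
emb_mat = np.stack([emb_1, emb_2, ..., emb_100000]).astype('float32')
index.add(emb_mat)
# The query must be 2-D as well: shape (1, d)
query = np.asarray(embedding_new, dtype='float32').reshape(1, -1)
D, I = index.search(query, k)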
You can use IndexFlatIP for inner product similarity, or IndexFlatL2 for Euclidean (L2-norm) distance.
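If the embeddings are scaled to unit length, the two choices should rank neighbors identically, because ||a - b||^2 = 2 - 2 (a . b) for unit vectors. The following NumPy check of that identity uses random stand-in data rather than real embeddings and does not call Faiss itself.

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.standard_normal((200, 8))
db /= np.linalg.norm(db, axis=1, keepdims=True)   # unit-length database vectors
q = rng.standard_normal(8)
q /= np.linalg.norm(q)                            # unit-length query

inner = db @ q                                    # inner product similarity
sq_l2 = np.sum((db - q) ** 2, axis=1)             # squared Euclidean distance

# The identity ||a - b||^2 = 2 - 2 a.b for unit vectors
identity_holds = np.allclose(sq_l2, 2.0 - 2.0 * inner)
# Largest inner product corresponds to smallest distance
same_top10 = np.array_equal(np.argsort(-inner)[:10], np.argsort(sq_l2)[:10])
print(identity_holds, same_top10)
```

For vectors that are not normalized the two metrics can disagree, so the choice of index matters in that case.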
To avoid memory issues with larger data sets (more than 1M vectors), refer to the Faiss cheat sheet infographic, slide 7.
I'm using the CIFAR-10 pre-trained VAE from lightning-bolts. It should be able to regenerate images with the quality shown in this picture taken from the docs (LHS are the real images, RHS are the generated ones):
However, when I write a simple script that loads the model, the weights, and tests it over the training set, I get a much worse reconstruction (top row are real images, bottom row are the generated ones):
Here is a link to a self-contained colab notebook that reproduces the steps I've followed to produce the pictures.
Am I doing something wrong in my inference process? Could it be that the weights are not as "good" as the docs claim?
Thanks!
First, the image from the docs you show is for the AE, not the VAE. The results for the VAE look much worse:
https://pl-bolts-weights.s3.us-east-2.amazonaws.com/vae/vae-cifar10/vae_output.png
Second, the docs state "Both input and generated images are normalized versions as the training was done with such images." So when you load the data you should specify normalize=True. When you plot your data, you will need to 'unnormalize' the data as well:
from pl_bolts.datamodules import CIFAR10DataModule
from pl_bolts.models.autoencoders import VAE
from pytorch_lightning import Trainer
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision import transforms
torch.manual_seed(17)
np.random.seed(17)
vae = VAE(32, lr=0.00001)
vae = vae.from_pretrained("cifar10-resnet18")
dm = CIFAR10DataModule(".", normalize=True)
dm.prepare_data()
dm.setup("fit")
dataloader = dm.train_dataloader()
print(dm.default_transforms())
mean = torch.tensor(dm.default_transforms().transforms[1].mean)
std = torch.tensor(dm.default_transforms().transforms[1].std)
unnormalize = transforms.Normalize((-mean / std).tolist(), (1.0 / std).tolist())
X, _ = next(iter(dataloader))
vae.eval()
X_hat = vae(X)
fig, axes = plt.subplots(2, 10, figsize=(10, 2))
for i in range(10):
    # Top row: real images
    ax_real = axes[0][i]
    ax_real.imshow(np.transpose(unnormalize(X[i]), (1, 2, 0)))
    ax_real.get_xaxis().set_visible(False)
    ax_real.get_yaxis().set_visible(False)
    # Bottom row: reconstructions
    ax_gen = axes[1][i]
    ax_gen.imshow(np.transpose(unnormalize(X_hat[i]).detach().numpy(), (1, 2, 0)))
    ax_gen.get_xaxis().set_visible(False)
    ax_gen.get_yaxis().set_visible(False)
Which gives something like this:
Without normalization it looks like:
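The unnormalize transform in the code above works because applying Normalize a second time with mean' = -mean/std and std' = 1/std undoes the first application: (y - (-mean/std)) / (1/std) = y * std + mean. Here is a NumPy sketch of that arithmetic. The per-channel statistics are made-up values, not the ones stored in the data module, and the normalize helper is a stand-in that mirrors the per-channel formula (x - mean) / std.

```python
import numpy as np

# Illustrative per-channel statistics (assumed values, shaped to broadcast over C x H x W)
mean = np.array([0.5, 0.4, 0.3]).reshape(3, 1, 1)
std = np.array([0.2, 0.25, 0.3]).reshape(3, 1, 1)

def normalize(x, mean, std):
    """Per-channel (x - mean) / std, the same arithmetic a Normalize transform applies."""
    return (x - mean) / std

rng = np.random.default_rng(17)
x = rng.random((3, 4, 4))                 # fake channel-first image with values in [0, 1)
x_norm = normalize(x, mean, std)          # what the model sees

# Normalizing again with the inverted statistics recovers the original pixels
x_back = normalize(x_norm, -mean / std, 1.0 / std)
print(np.abs(x_back - x).max())
```

Plotting x_norm directly would show shifted and rescaled colors, which is consistent with the washed-out look when the inverse step is skipped.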
What is the most effective way to concatenate the 4 corners, as shown in this photo?
(This is done inside getitem().)
left_img = Image.open('image.jpg')
...
output = right_img
This is how I would do it.
First, I would temporarily convert the image to a tensor:
from torchvision import transforms
tensor_image = transforms.ToTensor()(image)
Now assume you have a 3-channel image (similar principles apply to images with any number of channels, including 1-channel grayscale images).
You can find the red channel with tensor_image[0], the green channel with tensor_image[1], and the blue channel with tensor_image[2].
You can write a for loop that iterates through each channel, like
for i in range(tensor_image.size(0)):
    curr_channel = tensor_image[i]
Now, inside that for loop, you can extract from each channel the
First corner pixel with float(curr_channel[0][0])
Last top corner pixel with float(curr_channel[0][-1])
Bottom first pixel with float(curr_channel[-1][0])
Bottom and last pixel with float(curr_channel[-1][-1])
Make sure to convert all the pixel values to float or double values before the appending step below.
Now you have four values that correspond to the corner pixels of each channel.
Before the loop, create an empty list: new_image = []
Inside the loop, append the pixel values mentioned above using
new_image.append([[float(curr_channel[0][0]), float(curr_channel[0][-1])], [float(curr_channel[-1][0]), float(curr_channel[-1][-1])]])
After iterating through every channel you should have one big list containing three (or tensor_image.size(0)) lists of lists.
The next step is to convert this list of lists of lists to a torch.tensor (with torch imported) by running
new_image = torch.tensor(new_image)
To make sure everything is right, check that new_image.size() returns torch.Size([3, 2, 2]).
If that is the case, you now have the image you wanted, but in tensor format.
To convert it back to PIL, run
final_pil_image = transforms.ToPILImage()(new_image)
If everything went well, you should have a PIL image that fulfills your task. The only code it uses is clever indexing and one for loop.
With some more digging than I have done, it is probably possible to avoid the for loop and operate on all the channels at once.
Sarthak Jain
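One loop-free way to pick the four corner pixels of every channel at once is list indexing on the row and column axes. The sketch below uses a NumPy array in channel-first (C, H, W) layout as a stand-in for the tensor; the same two-step indexing is expected to carry over to a torch tensor, but that is an assumption not verified here.

```python
import numpy as np

# Fake 3-channel, channel-first image (C, H, W) filled with distinct values
chw = np.arange(3 * 4 * 5, dtype=np.float32).reshape(3, 4, 5)

# Step 1: keep only the first and last row; step 2: keep only the first and last column
corners = chw[:, [0, -1], :][:, :, [0, -1]]

print(corners.shape)
print(corners[0])
```

The result has shape (3, 2, 2): for each channel, the top-left, top-right, bottom-left, and bottom-right pixels.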
I don't know how fast this is, but here it is:
import numpy as np
from PIL import Image
img = np.array(Image.open('image.jpg'))
# shape is (height, width, channels)
h, w = img.shape[0], img.shape[1]
# the window size:
r = 4
upper_left = img[:r, :r]
lower_left = img[h-r:, :r]
upper_right = img[:r, w-r:]
lower_right = img[h-r:, w-r:]
upper_half = np.concatenate((upper_left, upper_right), axis=1)
lower_half = np.concatenate((lower_left, lower_right), axis=1)
img = np.concatenate((upper_half, lower_half))
Or, in short:
upper_half = np.concatenate((img[:r, :r], img[:r, w-r:]), axis=1)
lower_half = np.concatenate((img[h-r:, :r], img[h-r:, w-r:]), axis=1)
img = np.concatenate((upper_half, lower_half))
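The same slicing can be wrapped in a small function and checked on a toy array. This is only a sketch: the name concat_corners and the size check are my additions, and the non-square test array is chosen on purpose, because mixing up height and width only shows on non-square images.

```python
import numpy as np

def concat_corners(img, r):
    """Build a (2r, 2r, ...) array from the four r x r corner windows of an H x W image."""
    h, w = img.shape[:2]                      # NumPy image layout is (height, width, ...)
    if r > h or r > w:
        raise ValueError("window size r must not exceed the image height or width")
    top = np.concatenate((img[:r, :r], img[:r, w - r:]), axis=1)
    bottom = np.concatenate((img[h - r:, :r], img[h - r:, w - r:]), axis=1)
    return np.concatenate((top, bottom), axis=0)

demo = np.arange(6 * 8).reshape(6, 8)         # 6 rows, 8 columns
patch = concat_corners(demo, 2)
color_patch = concat_corners(np.zeros((6, 8, 3), dtype=np.uint8), 2)
print(patch)
```

Because only the first two axes are sliced, the function works for both grayscale (H, W) and color (H, W, C) arrays.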
I am making an image cropper using pygame for the interface and OpenCV for image processing.
I have created functions like crop(), colorfilter(), etc. I load the image with pygame.image.load() to show it on screen, but after I run crop() the result is a numpy.ndarray, which pygame cannot use. I get this error:
argument 1 must be pygame.Surface, not numpy.ndarray
How do I solve this problem? I need to blit() the cropped image. Should I save the image, read it back, and then delete it once done? I want to apply more than one filter.
The following function converts an OpenCV (cv2) image, which is just a numpy.array, to a pygame.Surface:
import numpy as np
import pygame
def cv2ImageToSurface(cv2Image):
    # Scale 16-bit images down to 8 bits per channel
    if cv2Image.dtype.name == 'uint16':
        cv2Image = (cv2Image / 256).astype('uint8')
    size = cv2Image.shape[1::-1]
    if len(cv2Image.shape) == 2:
        # Grayscale: repeat the single channel three times
        cv2Image = np.repeat(cv2Image.reshape(size[1], size[0], 1), 3, axis = 2)
        format = 'RGB'
    else:
        format = 'RGBA' if cv2Image.shape[2] == 4 else 'RGB'
        # Swap the blue and red channels (BGR -> RGB)
        cv2Image[:, :, [0, 2]] = cv2Image[:, :, [2, 0]]
    surface = pygame.image.frombuffer(cv2Image.flatten(), size, format)
    return surface.convert_alpha() if format == 'RGBA' else surface.convert()
See How do I convert an OpenCV (cv2) image (BGR and BGRA) to a pygame.Surface object for a detailed explanation of the function.
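One thing to watch when chaining several filters: for 8-bit 3- or 4-channel input, the channel swap in the function above writes into the array that was passed in, so the caller's image ends up in RGB order afterwards. A minimal sketch of a non-mutating swap is shown below; the helper name bgr_to_rgb is hypothetical and not part of pygame or OpenCV.

```python
import numpy as np

def bgr_to_rgb(cv2_image):
    """Return a copy with the first and third channels swapped; an alpha channel stays last."""
    out = cv2_image.copy()                    # leave the caller's array untouched
    out[:, :, [0, 2]] = cv2_image[:, :, [2, 0]]
    return out

bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255                             # pure blue in BGR order
rgb = bgr_to_rgb(bgr)

bgra = np.array([[[1, 2, 3, 4]]], dtype=np.uint8)
rgba = bgr_to_rgb(bgra)
print(rgb[0, 0], bgr[0, 0], rgba[0, 0])
```

Passing such a copy to the conversion step keeps the original BGR array available for further OpenCV processing.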
I have a Plotly scatter plot with hover text for each point. Hovering works when I plot with circle markers, but not otherwise (for example, with square markers the hover text does not show).
I received this error message from the browser: Uncaught TypeError: Cannot read property ....... at Object.hoverPoints
I don't really know what you're doing wrong here, but it works for me as expected:
Code:
import plotly.graph_objs as go
import plotly.offline as pyo
# Create random data with numpy
import numpy as np
N = 1000
random_x = np.random.randn(N)
random_y = np.random.randn(N)
# Create a trace
trace = go.Scatter(
    x = random_x,
    y = random_y,
    mode = 'markers',
    marker = ({'symbol':'triangle-left'})
)
data = [trace]
pyo.plot(data)
Reference for marker styling:
https://plot.ly/python/reference/#scatter-marker-symbol