Can I use pad_sequence with a transformer in PyTorch? - deep-learning

I'm trying to use a transformer to process image data (not NLP data), e.g. 480 x 640 images with different sequence lengths; examples would be [6, 480, 640], [7, 480, 640], [8, 480, 640]. I would like to put these three sequences into one batch.
However, most tutorials I have seen use torchtext to deal with the variable-length problem. But since I run the transformer on my own dataset, torchtext is not applicable (is it?). After searching, I found that pad_sequence can be used to deal with this problem.
However, I didn't find any tutorials about using pad_sequence with a transformer. Is it applicable? Has anyone tried it before?

Let's say we have 3 images with different dimensions. Applying the pad_sequence function to them gives the following:
import torch
from torch.nn.utils.rnn import pad_sequence
image_1 = torch.ones(25, 30)
image_2 = torch.ones(32, 30)
image_3 = torch.ones(29, 30)
images = pad_sequence([image_1, image_2, image_3])
print(images.size())
# torch.Size([32, 3, 30])
The behavior is the same if you are working with 3D images:
import torch
from torch.nn.utils.rnn import pad_sequence
image_1 = torch.ones(25, 30, 50)
image_2 = torch.ones(32, 30, 50)
image_3 = torch.ones(29, 30, 50)
images = pad_sequence([image_1, image_2, image_3])
print(images.size())
# torch.Size([32, 3, 30, 50])
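Note that by default pad_sequence puts the (padded) sequence dimension first; if your model expects the batch dimension first, you can pass batch_first=True:
images = pad_sequence([image_1, image_2, image_3], batch_first=True)
print(images.size())
# torch.Size([3, 32, 30, 50])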
One thing you should be aware of with this function is that it only works when the images share all dimensions except the first. In other words, if you have something like this:
import torch
from torch.nn.utils.rnn import pad_sequence
image_1 = torch.ones(25, 30, 50)
image_2 = torch.ones(32, 50, 30)
image_3 = torch.ones(29, 31, 50)
images = pad_sequence([image_1, image_2, image_3])
# RuntimeError: The size of tensor a (50) must match the size of tensor b (30) at non-singleton dimension 2
It won't work! The call raises immediately, so a subsequent print(images.size()) would never run.
But anyway, since you're working with images, I suggest you use the Pad transform from torchvision. Unlike pad_sequence, it pads a single image by amounts you specify (with more options for fill value and padding mode), so you control which sides get padded. Just follow the docs.
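For instance, here is a minimal sketch of padding one C x H x W tensor to a target height with Pad (the target height of 32 is just for illustration; this assumes a torchvision recent enough to accept tensors, i.e. 0.8+):
import torch
from torchvision import transforms

image = torch.ones(3, 25, 30)  # C x H x W
target_h = 32
pad = transforms.Pad((0, 0, 0, target_h - image.shape[1]))  # (left, top, right, bottom)
print(pad(image).shape)
# torch.Size([3, 32, 30])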

Related

How to find closest embedding vectors?

I have 100K known embeddings, i.e.
[emb_1, emb_2, ..., emb_100000]
Each of these embeddings is derived from a GPT-3 sentence embedding with dimension 2048.
My task: given a new embedding (embedding_new), find the 10 closest embeddings from the above 100K.
The way I am approaching this problem is brute force.
Every time a query asks for the closest embeddings, I compare embedding_new with [emb_1, emb_2, ..., emb_100000] and get the similarity scores.
Then I quicksort the similarity scores to get the 10 closest embeddings.
Alternatively, I have also thought about using Faiss.
Is there a better way to achieve this?
I found a solution using Vector Database Lite (VDBLITE)
VDBLITE here: https://pypi.org/project/vdblite/
import numpy as np
import vdblite
from time import time
from uuid import uuid4
from pprint import pprint as pp

if __name__ == '__main__':
    vdb = vdblite.Vdb()
    dimension = 12  # dimensions of each vector
    n = 200         # number of vectors
    np.random.seed(1)
    db_vectors = np.random.random((n, dimension)).astype('float32')
    print(db_vectors[0])
    for vector in db_vectors:
        info = {'vector': vector, 'time': time(), 'uuid': str(uuid4())}
        vdb.add(info)
    vdb.details()
    results = vdb.search(db_vectors[10])
    pp(results)
Looks like it uses FAISS behind the scenes.
Using your own idea: just make sure the embeddings are in matrix form; you can easily use NumPy for this.
This runs in linear time (in the number of embeddings) and should be fast.
import numpy as np

k = 10  # number of best embeddings to retrieve
emb_mat = np.stack([emb_1, emb_2, ..., emb_100000])  # shape (100000, 2048)
scores = np.dot(emb_mat, embedding_new)              # inner-product similarity
best_k_ind = np.argpartition(scores, -k)[-k:]        # indices of the k largest scores (unordered)
top_k_emb = emb_mat[best_k_ind]
The 10 best embeddings will be found in top_k_emb.
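One caveat: argpartition does not order those k results. If you need them ranked best-first, a small extra sort over just the k selected scores does it:
ranked = best_k_ind[np.argsort(scores[best_k_ind])[::-1]]  # best score first
top_k_emb = emb_mat[ranked]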
For a general solution inside a software project you might consider Faiss by Facebook Research.
An example of using Faiss:
import faiss
import numpy as np

d = 2048  # dimensionality of your embedding data
k = 10    # number of nearest neighbors to return

index = faiss.IndexFlatIP(d)
emb_mat = np.stack([emb_1, emb_2, ..., emb_100000]).astype('float32')  # Faiss expects float32
index.add(emb_mat)  # add() takes an (n, d) array, not a Python list
D, I = index.search(embedding_new.reshape(1, -1).astype('float32'), k)  # queries must be (m, d)
You can use IndexFlatIP for inner-product similarity, or IndexFlatL2 for Euclidean (L2) distance.
To get around memory issues (more than ~1M vectors), refer to this great Faiss cheat-sheet infographic, slide 7.
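If what you actually want is cosine similarity rather than a raw inner product, a common trick is to L2-normalize the vectors first (faiss.normalize_L2 does this in place on float32 arrays), which makes the inner product equal to the cosine:
faiss.normalize_L2(emb_mat)  # normalize the database vectors in place, before index.add
query = embedding_new.reshape(1, -1).astype('float32')
faiss.normalize_L2(query)    # normalize the query the same way before index.search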

PyTorch: How to normalize a tensor when the image is cropped randomly?

Let's say we are working with the CIFAR-10 dataset and we want to apply some data augmentation and additionally normalize the tensors. Here is some reproducible code for this:
from torchvision import transforms, datasets
import matplotlib.pyplot as plt

trafo = transforms.Compose([transforms.Pad(padding=4, fill=0, padding_mode="constant"),
                            transforms.RandomHorizontalFlip(p=0.5),
                            transforms.RandomCrop(size=(32, 32)),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0))])
cifar10_full = datasets.CIFAR10(root = "CIFAR-10", train = True, transform = trafo, target_transform = None, download = True)
The normalization I chose so far does nothing to the tensors, since I set the mean to 0 and the std to 1. According to the documentation of torchvision.transforms.Normalize, the provided means and standard deviations are per channel of the input. However, the problem is that I cannot calculate the per-channel mean of the augmented data because of the random flipping and cropping. Therefore, my idea was something along the following lines:
trafo_1 = transforms.Compose([transforms.Pad(padding=4, fill=0, padding_mode="constant"),
                              transforms.RandomHorizontalFlip(p=0.5),
                              transforms.RandomCrop(size=(32, 32)),
                              transforms.ToTensor()])
cifar10_full = datasets.CIFAR10(root = "CIFAR-10", train = True, transform = trafo_1, target_transform = None, download = True)
Now I could calculate the mean across each channel of the input, and then I wanted to normalize the tensors again. However, I cannot simply apply transforms.Normalize(), since cifar10_full is no longer the original dataset. How could I proceed instead? (One solution would be to simply fix the seed of the random generators, i.e. use torch.manual_seed(0), but I would like to avoid this for now...)
The mean and std are not per tensor; they come from the whole dataset. The exact values don't matter that much here: you just want a scale that represents the data well. Since the augmentations are random operations, there is no single "exact" mean or std to recover; just use the mean and std of the actual (unaugmented) data, which is pretty much the standard practice.
First, calculate the mean and std of the dataset (random sampling is fine), and use those for normalization.
# Calculate the mean and std of the complete dataset
import glob
import random

import cv2
import numpy as np
import tqdm

# calculating 3-channel mean and std for an image dataset
means = np.array([0, 0, 0], dtype=np.float32)
stds = np.array([0, 0, 0], dtype=np.float32)
total_images = 0
randomly_sample = 5000

for f in tqdm.tqdm(random.sample(glob.glob("dataset_path/**/*.jpg", recursive=True), randomly_sample)):
    img = cv2.imread(f)              # note: cv2 loads channels in BGR order
    means += img.mean(axis=(0, 1))
    stds += img.std(axis=(0, 1))
    total_images += 1

means = means / (total_images * 255.)  # scale to [0, 1] to match ToTensor()
stds = stds / (total_images * 255.)

print("Total images: ", total_images)
print("Means: ", means)
print("Stds: ", stds)
A simple sanity check: do you think that in actual testing or inference your images will be augmented this way too? Probably not; you will have clean images, which closely match the mean and std of the clean version of the data. So it's pointless to compute the statistics over augmented data (a few random samples suffice), unless you want to apply test-time augmentation (TTA).
If you do want TTA, then go ahead: run some augmentation on the images, randomly sample, and take the mean and std of those augmented images.

Python 3.8, Pygame 2 on Windows 10 can't save PNG file [duplicate]

I am making an image cropper using pygame as the interface and OpenCV for image processing.
I have created functions like crop(), colorfilter(), etc. I load the image with pygame.image.load() to show it on screen, but when I perform crop() the result is a numpy.ndarray, and pygame cannot display it; I get the error:
argument 1 must be pygame.Surface, not numpy.ndarray
How do I solve this problem? I need to blit() the cropped image. Should I save the image, read it back, and then delete it when done? I want to apply more than one filter.
The following function converts an OpenCV (cv2) image, i.e. a numpy.ndarray (they are the same thing), to a pygame.Surface:
import numpy as np
import pygame

def cv2ImageToSurface(cv2Image):
    if cv2Image.dtype.name == 'uint16':
        cv2Image = (cv2Image / 256).astype('uint8')  # downscale 16-bit images to 8-bit
    size = cv2Image.shape[1::-1]                     # (width, height)
    if len(cv2Image.shape) == 2:
        # grayscale: replicate the single channel into 3 channels
        cv2Image = np.repeat(cv2Image.reshape(size[1], size[0], 1), 3, axis=2)
        format = 'RGB'
    else:
        format = 'RGBA' if cv2Image.shape[2] == 4 else 'RGB'
        cv2Image[:, :, [0, 2]] = cv2Image[:, :, [2, 0]]  # swap BGR(A) -> RGB(A)
    surface = pygame.image.frombuffer(cv2Image.flatten(), size, format)
    return surface.convert_alpha() if format == 'RGBA' else surface.convert()
See How do I convert an OpenCV (cv2) image (BGR and BGRA) to a pygame.Surface object for a detailed explanation of the function.
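A hypothetical usage sketch for your case ("photo.jpg" is a placeholder; the display must be initialized before the convert() calls inside the function):
import cv2
import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))

cv2_img = cv2.imread("photo.jpg")     # a BGR numpy.ndarray, like the output of your crop()
surface = cv2ImageToSurface(cv2_img)  # convert it to a pygame.Surface
screen.blit(surface, (0, 0))
pygame.display.flip()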

Invalid shape error plotting 2D image in Python for MNIST Sign Language dataset with pixel values as columns

I have an MNIST Sign Language dataset with pixel values as columns.
I get the error below when I try to plot the image at one of the indexes:
import pandas as pd
import matplotlib.pyplot as plt

# Training dataset
dfr = pd.read_csv("sign_mnist_train.csv")
X_train_orig = dfr.iloc[:, 1:]
Y_train_orig = dfr['label']

# Testing dataset
dfe = pd.read_csv("sign_mnist_test.csv")
X_test_orig = dfe.iloc[:, 1:]
Y_test_orig = dfe['label']

# shapes of the datasets
print(dfr.shape)  # (27455, 785)
print(dfe.shape)  # (7172, 785)

# Example of a picture
index = 1
plt.imshow(X_train_orig.iloc[index])
# TypeError: Invalid shape (784,) for image data
It looks like the image you are trying to plot is a flattened one: your data has shape [B, N], where N = 1 x 28 x 28 = 784 and B = 27455 is the number of images, i.e. (27455, 784). That is fine if you want to feed a row to a Linear layer as a 784-long vector, but to plot a single image you have to reshape that row to (28, 28). You can try this:
import numpy as np

image = X_train_orig.iloc[index]
image = np.reshape(image.values, (28, 28))  # 784-long row -> 28 x 28 image
plt.imshow(image)

LSTM time-series recursive prediction converges to the same value

I'm working on time-series sequence prediction using an LSTM.
My goal is to use a window of the 25 past values to generate a prediction for the next 25 values. I'm doing that recursively:
I use the 25 known values to predict the next value, append that value as a known value, then shift the window of 25 values and predict the next one again, until I have 25 newly generated values (or more), as sketched in the loop below.
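In code, the recursive loop looks roughly like this (a sketch, not my exact code; regressor is the model defined below, and seed is a hypothetical array holding the last 25 scaled values):
import numpy as np

window = 25
seq = list(seed)                    # hypothetical: the last 25 known (scaled) values
generated = []
for _ in range(window):
    x = np.array(seq[-window:]).reshape(1, window, 1)  # (batch, timesteps, features)
    next_val = regressor.predict(x)[0, 0]
    generated.append(next_val)
    seq.append(next_val)            # feed the prediction back in as a known value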
I'm using "Keras" to implement the RNN
Architecture:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

regressor = Sequential()
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.1))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.1))
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.1))
regressor.add(Dense(units=1))
regressor.compile(optimizer='rmsprop', loss='mean_squared_error')
regressor.fit(X_train, y_train, epochs=10, batch_size=32)
Problem:
The recursive prediction always converges to the same value, no matter what sequence comes before.
This is certainly not what I want. I was expecting the generated sequence to differ depending on what came before, and I'm wondering if someone has an idea about this behavior and how to avoid it. Maybe I'm doing something wrong...
I tried different numbers of epochs and it didn't help much; actually, more epochs made it worse. Changing the batch size, number of units, number of layers, and window size didn't help either.
I'm using MinMaxScaler for the data.
Edit:
Scaling the new inputs for testing:
dataset_test = sc.transform(dataset_test.reshape(-1, 1))
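For completeness, the scaler itself follows the usual sklearn pattern (a sketch; dataset_train is a hypothetical array of the raw training values): fit on the training data only, then reuse the fitted scaler for the test data as above.
from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0, 1))
dataset_train_scaled = sc.fit_transform(dataset_train.reshape(-1, 1))  # fit on train only
# later, for test data:
# dataset_test = sc.transform(dataset_test.reshape(-1, 1))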