Pretrained lightning-bolts VAE not doing proper inference on training dataset - deep-learning

I'm using the CIFAR-10 pre-trained VAE from lightning-bolts. It should be able to regenerate images with the quality shown in this picture taken from the docs (real images on the left, generated ones on the right):
However, when I write a simple script that loads the model, the weights, and tests it over the training set, I get a much worse reconstruction (top row are real images, bottom row are the generated ones):
Here is a link to a self-contained colab notebook that reproduces the steps I've followed to produce the pictures.
Am I doing something wrong in my inference process? Could it be that the weights are not as "good" as the docs claim?
Thanks!

First, the image from the docs you show is for the AE, not the VAE. The results for the VAE look much worse:
https://pl-bolts-weights.s3.us-east-2.amazonaws.com/vae/vae-cifar10/vae_output.png
Second, the docs state "Both input and generated images are normalized versions as the training was done with such images." So when you load the data you should specify normalize=True. When you plot your data, you will need to 'unnormalize' the data as well:
from pl_bolts.datamodules import CIFAR10DataModule
from pl_bolts.models.autoencoders import VAE
from pytorch_lightning import Trainer
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision import transforms
torch.manual_seed(17)
np.random.seed(17)
vae = VAE(32, lr=0.00001)
vae = vae.from_pretrained("cifar10-resnet18")
dm = CIFAR10DataModule(".", normalize=True)
dm.prepare_data()
dm.setup("fit")
dataloader = dm.train_dataloader()
print(dm.default_transforms())
mean = torch.tensor(dm.default_transforms().transforms[1].mean)
std = torch.tensor(dm.default_transforms().transforms[1].std)
unnormalize = transforms.Normalize((-mean / std).tolist(), (1.0 / std).tolist())
X, _ = next(iter(dataloader))
vae.eval()
X_hat = vae(X)
fig, axes = plt.subplots(2, 10, figsize=(10, 2))
for i in range(10):
    ax_real = axes[0][i]
    ax_real.imshow(np.transpose(unnormalize(X[i]), (1, 2, 0)))
    ax_real.get_xaxis().set_visible(False)
    ax_real.get_yaxis().set_visible(False)
    ax_gen = axes[1][i]
    ax_gen.imshow(np.transpose(unnormalize(X_hat[i]).detach().numpy(), (1, 2, 0)))
    ax_gen.get_xaxis().set_visible(False)
    ax_gen.get_yaxis().set_visible(False)
Which gives something like this:
Without normalization it looks like:
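As a side note on why the unnormalize transform in the answer works: Normalize(mean, std) computes (x - mean) / std per channel, so applying Normalize(-mean / std, 1 / std) to the result gives (y + mean / std) * std = x again. Below is a minimal NumPy check of that arithmetic. The normalize helper and the mean/std values are placeholders for illustration, not the statistics the data module actually uses.

```python
import numpy as np

def normalize(img, mean, std):
    """Per-channel (x - mean) / std for a (C, H, W) array."""
    mean = np.asarray(mean, dtype=np.float64).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float64).reshape(-1, 1, 1)
    return (img - mean) / std

# Placeholder statistics (assumption): not the values used by CIFAR10DataModule.
mean = np.array([0.5, 0.4, 0.3])
std = np.array([0.2, 0.25, 0.3])

rng = np.random.default_rng(17)
x = rng.random((3, 32, 32))                     # fake image with values in [0, 1)

y = normalize(x, mean, std)                     # what the model is fed
x_back = normalize(y, -mean / std, 1.0 / std)   # the "unnormalize" step

print(np.allclose(x, x_back))
```

Plotting y directly instead of x_back is what produces the washed-out or clipped colours, because imshow expects float values in [0, 1].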

Related

How to find closest embedding vectors?

I have 100K known embeddings, i.e.
[emb_1, emb_2, ..., emb_100000]
Each of these embeddings is a GPT-3 sentence embedding with dimension 2048.
My task is: given a new embedding (embedding_new), find the 10 closest embeddings among the 100K above.
The way I am approaching this problem is brute force.
Every time a query asks for the closest embeddings, I compare embedding_new with [emb_1, emb_2, ..., emb_100000] and get the similarity scores.
Then I quicksort the similarity scores to get the 10 closest embeddings.
Alternatively, I have also thought about using Faiss.
Is there a better way to achieve this?
I found a solution using Vector Database Lite (VDBLITE)
VDBLITE here: https://pypi.org/project/vdblite/
import vdblite
import numpy as np
from time import time
from uuid import uuid4
import sys
from pprint import pprint as pp
if __name__ == '__main__':
    vdb = vdblite.Vdb()
    dimension = 12 # dimensions of each vector
    n = 200 # number of vectors
    np.random.seed(1)
    db_vectors = np.random.random((n, dimension)).astype('float32')
    print(db_vectors[0])
    for vector in db_vectors:
        info = {'vector': vector, 'time': time(), 'uuid': str(uuid4())}
        vdb.add(info)
    vdb.details()
    results = vdb.search(db_vectors[10])
    pp(results)
Looks like it uses FAISS behind the scenes.
Using your own idea, just make sure that the embeddings are stacked into a matrix; NumPy then makes this easy.
This is computed in linear time (in num. of embeddings) and should be fast.
import numpy as np
k = 10 # k best embeddings
emb_mat = np.stack([emb_1, emb_2, ..., emb_100000])
scores = np.dot(emb_mat, embedding_new)
best_k_ind = np.argpartition(scores, -k)[-k:] # indices of the k highest scores (in no particular order)
top_k_emb = emb_mat[best_k_ind]
The 10 best embeddings will be found in top_k_emb.
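Here is a self-contained sketch of the same brute-force search that can be run as-is. It uses random vectors in place of the real embeddings (an assumption), and the helper name top_k_inner_product is made up for this example. It also sorts the k candidates, since argpartition alone returns them in arbitrary order.

```python
import numpy as np

def top_k_inner_product(emb_mat, query, k):
    """Indices and scores of the k rows of emb_mat with the largest
    dot product with query, best match first."""
    scores = emb_mat @ query
    # With -k, argpartition puts the k largest scores in the last k slots (unordered).
    top = np.argpartition(scores, -k)[-k:]
    # Sort only those k candidates, highest score first.
    top = top[np.argsort(scores[top])[::-1]]
    return top, scores[top]

rng = np.random.default_rng(0)
emb_mat = rng.standard_normal((1000, 64)).astype(np.float32)
# L2-normalise the rows so that the inner product equals cosine similarity.
emb_mat /= np.linalg.norm(emb_mat, axis=1, keepdims=True)
query = emb_mat[42]  # query with a known vector, so the best match is row 42

idx, sims = top_k_inner_product(emb_mat, query, k=10)
print(idx[0], sims[0])
```

If the real embeddings are not unit length, normalising them once up front (as done here) keeps the dot product equivalent to cosine similarity.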
For a general solution inside a software project you might consider Faiss by Facebook Research.
An example for using Faiss:
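import faiss
import numpy as np
d = 2048 # dimensionality of your embedding data
k = 10 # number of nearest neighbors to return
index = faiss.IndexFlatIP(d)
emb_list = [emb_1, emb_2, ..., emb_100000]
# Faiss expects a float32 array of shape (n, d)
index.add(np.stack(emb_list).astype('float32'))
# queries must be 2-D: one row per query vector
D, I = index.search(np.asarray(embedding_new, dtype='float32').reshape(1, -1), k)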
You can use IndexFlatIP for inner-product similarity, or IndexFlatL2 for Euclidean (L2-norm) distance.
To avoid memory issues with large datasets (more than 1M vectors), refer to slide 7 of this great Faiss cheat-sheet infographic.

PyTorch: How to normalize a tensor when the image is cropped randomly?

Let's say we are working with the CIFAR-10 dataset and we want to apply some data augmentation and additionally normalize the tensors. Here is some reproducible code for this
from torchvision import transforms, datasets
import matplotlib.pyplot as plt
trafo = transforms.Compose([transforms.Pad(padding = 4, fill = 0, padding_mode = "constant"),
                            transforms.RandomHorizontalFlip(p=0.5),
                            transforms.RandomCrop(size = (32, 32)),
                            transforms.ToTensor(),
                            transforms.Normalize(mean = (0.0, 0.0, 0.0), std = (1.0, 1.0, 1.0))]
                           )
cifar10_full = datasets.CIFAR10(root = "CIFAR-10", train = True, transform = trafo, target_transform = None, download = True)
The normalization I chose so far does nothing to the tensors, since I set the mean and std to 0 and 1 respectively. According to the documentation of torchvision.transforms.Normalize, the provided means and standard deviations are for each channel of the input. However, the problem is that I cannot calculate the mean across each channel because of the random flipping and cropping. Therefore, my idea was something along the following lines
trafo_1 = transforms.Compose([transforms.Pad(padding = 4, fill = 0, padding_mode = "constant"),
                              transforms.RandomHorizontalFlip(p=0.5),
                              transforms.RandomCrop(size = (32, 32)),
                              transforms.ToTensor()]
                             )
cifar10_full = datasets.CIFAR10(root = "CIFAR-10", train = True, transform = trafo_1, target_transform = None, download = True)
Now I could calculate the mean across each channel of the input, and then I want to normalize the tensors. However, I cannot simply use transforms.Normalize(), as cifar10_full is not the original dataset anymore. How could I proceed instead? (One solution would be to simply fix the seed of the random generators, i.e. use torch.manual_seed(0), but I would like to avoid this for now...)
The mean and std are not computed per tensor but over the whole dataset. The exact values don't really matter: you just want a scale that is good enough for the whole data representation. Since the augmentations are random, there is no exact mean or std to compute anyway, so just use the mean and std of the original (unaugmented) data, which is pretty much the standard.
First, try to calculate the mean and std of the dataset (try random sampling), and use that for normalization.
# Calculate the mean, std of the complete dataset
import glob
import cv2
import numpy as np
import tqdm
import random
# calculating 3 channel mean and std for image dataset
means = np.array([0, 0, 0], dtype=np.float32)
stds = np.array([0, 0, 0], dtype=np.float32)
total_images = 0
randomly_sample = 5000
# "**/" is needed for glob to actually recurse into subdirectories
files = glob.glob("dataset_path/**/*.jpg", recursive = True)
for f in tqdm.tqdm(random.sample(files, min(randomly_sample, len(files)))):
    img = cv2.imread(f) # note: OpenCV returns the channels in BGR order
    means += img.mean(axis=(0,1))
    stds += img.std(axis=(0,1))
    total_images += 1
# average the per-image statistics and rescale from [0, 255] to [0, 1]
means = means / (total_images * 255.)
stds = stds / (total_images * 255.)
print("Total images: ", total_images)
print("Means: ", means)
print("Stds: ", stds)
Consider a simple scenario: do you think your images will be augmented this way during actual testing or inference? Probably not. You will have clean images that closely match the mean and std of the clean version of the data, so there is no point in calculating the mean and std of the augmented data (a few random samples of the clean data are enough), unless you want to apply test-time augmentation (TTA).
If you want to apply TTA too, then you can go ahead and run some augmentation on the images, do random sampling and take the mean and std of those images.
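One detail about the loop above: it averages the per-image standard deviations, which is a common approximation but not the same as the standard deviation over all pixels of the dataset. The sketch below computes the exact per-channel statistics from running sums. It uses random arrays as a stand-in for the real images (an assumption), and the helper name channel_stats is made up for this example.

```python
import numpy as np

def channel_stats(images):
    """Exact per-channel mean and std over an iterable of (H, W, C) uint8 images."""
    total = None
    total_sq = None
    n_pixels = 0
    for img in images:
        x = img.astype(np.float64) / 255.0   # scale to [0, 1], as ToTensor does
        if total is None:
            total = np.zeros(x.shape[-1])
            total_sq = np.zeros(x.shape[-1])
        total += x.sum(axis=(0, 1))
        total_sq += (x ** 2).sum(axis=(0, 1))
        n_pixels += x.shape[0] * x.shape[1]
    mean = total / n_pixels
    std = np.sqrt(total_sq / n_pixels - mean ** 2)   # Var[x] = E[x^2] - E[x]^2
    return mean, std

# Synthetic stand-in for a dataset (assumption): 200 random 32x32 RGB images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(200, 32, 32, 3), dtype=np.uint8)

mean, std = channel_stats(fake_images)
print("mean:", mean)
print("std:", std)
```

The resulting values can be passed to transforms.Normalize(mean=..., std=...) placed after ToTensor(), regardless of the random flips and crops that come before it.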

Rearranging Expression to a Standard Form for Transfer Function

I'm a newcomer to Python/Sympy and I'm hoping it can make life easier understanding control system topics. A common requirement for me is to cross-check equations developed in the literature against my own derivations. When it comes to transfer functions, the denominator is typically ordered with the higher orders of s on the left, with order decreasing moving to the right. The highest order s term has a unity coefficient.
Here's an example (taken from here):
I've developed my own transfer function using sympy and I'd like to rearrange it in the fashion just described.
import sympy as sp
from sympy import simplify
from IPython.display import display
s, tau_1, tau_2 = sp.symbols('s,tau_1,tau_2')
F = (1+s*tau_2)/(1+s*(tau_1+tau_2));
k_0, k_d, N = sp.symbols('k_0,k_d,N')
H = (k_0*k_d*F)/(s+((k_0*k_d*F/N)))
display(H.simplify())
Which yields:
Now, I'm not really expecting simplify to know which format I'd like the expression displayed, but I'm hoping there's an existing function or set of functions that will help me to arrange it the way I'd like. Is there?
FURTHER UPDATE:
After a bit of manipulation, I've managed to isolate the highest power and divide across by the coefficient top & bottom to leave the highest order term without a coefficient, as I wanted. It's not perfect by any stretch. An improvement would be to have each term stand alone and ordered highest to lowest as with most polynomial presentations. I notice that collect() doesn't order the power terms as you'd expect. What's that all about!?
import sympy as sp
from sympy import simplify
from sympy import poly
from sympy import degree
from IPython.display import display
s, tau_1, tau_2 = sp.symbols('s,tau_1,tau_2')
F = (1+s*tau_2)/(1+s*(tau_1+tau_2));
display(F)
k_0, k_d, N = sp.symbols('k_0,k_d,N')
H = (k_0*k_d*F)/(s+((k_0*k_d*F/N)))
display(H)
def normTF(expr):
    H_c = expr.ratsimp().collect(s)
    n,d=sp.fraction(H_c)
    collected = sp.Poly(d, s).as_expr()
    degree = sp.degree(collected, gen=s)
    terms = dict(i.as_independent(s)[::-1] for i in sp.Add.make_args(collected))
    sn=(n/terms[s**degree]).ratsimp().collect(s)
    sd=(d/terms[s**degree]).ratsimp().collect(s)
    return sn/sd
display(normTF(H))
Maybe this is what you want:
In [30]: H.cancel().collect(s)
Out[30]:
          N⋅k₀⋅k_d⋅s⋅τ₂ + N⋅k₀⋅k_d
─────────────────────────────────────────────
          2
k₀⋅k_d + s ⋅(N⋅τ₁ + N⋅τ₂) + s⋅(N + k₀⋅k_d⋅τ₂)
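To also get the unity coefficient on the highest power of s that the question asks for, one option is to work with the polynomial coefficients directly. This is a sketch rather than a built-in SymPy feature: the helper name monic_coeffs is made up for this example, and it returns coefficient lists (highest power first) instead of a single expression, because the printer decides the display order of terms in an expression on its own.

```python
import sympy as sp

s, tau_1, tau_2, k_0, k_d, N = sp.symbols('s tau_1 tau_2 k_0 k_d N')

F = (1 + s*tau_2) / (1 + s*(tau_1 + tau_2))
H = (k_0*k_d*F) / (s + k_0*k_d*F/N)

def monic_coeffs(expr, var):
    """Numerator and denominator coefficients in var (highest power first),
    scaled so that the denominator's leading coefficient is 1."""
    num, den = sp.fraction(sp.cancel(expr))
    den_poly = sp.Poly(den, var)
    lead = den_poly.LC()   # coefficient of the highest power of var
    num_c = [sp.factor(c / lead) for c in sp.Poly(num, var).all_coeffs()]
    den_c = [sp.factor(c / lead) for c in den_poly.all_coeffs()]
    return num_c, den_c

num_c, den_c = monic_coeffs(H, s)

# Print the denominator term by term, from the highest power down.
for power, coeff in zip(range(len(den_c) - 1, -1, -1), den_c):
    print(f"s^{power}: {coeff}")
```

Each coefficient can then be compared one by one against the form given in the literature, which avoids depending on how the whole fraction happens to be printed.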

how to work with the catboost overfitting detector

I am trying to understand the catboost overfitting detector. It is described here:
https://tech.yandex.com/catboost/doc/dg/concepts/overfitting-detector-docpage/#overfitting-detector
Other gradient boosting packages like lightgbm and xgboost use a parameter called early_stopping_rounds, which is easy to understand (it stops the training once the validation error hasn't decreased in early_stopping_rounds steps).
However I have a hard time understanding the p_value approach used by catboost. Can anyone explain how this overfitting detector works and when it stops the training?
It's not documented on the Yandex website or at the github repository, but if you look carefully through the python code posted to github (specifically here), you will see that the overfitting detector is activated by setting "od_type" in the parameters. Reviewing the recent commits on github, the catboost developers also recently implemented a tool similar to the "early_stopping_rounds" parameter used by lightGBM and xgboost, called "Iter."
To set the number of rounds after the most recent best iteration to wait before stopping, provide a numeric value in the "od_wait" parameter.
For example:
fit_param <- list(
  iterations = 500,
  thread_count = 10,
  loss_function = "Logloss",
  depth = 6,
  learning_rate = 0.03,
  od_type = "Iter",
  od_wait = 100
)
I am using the catboost library with R 3.4.1. I have found that setting the "od_type" and "od_wait" parameters in the fit_param list works well for my purposes.
I realize this is not answering your question about the way to use the p_value approach also implemented by the catboost developers; unfortunately I cannot help you there. Hopefully someone else can explain that setting to the both of us.
CatBoost now supports early_stopping_rounds (see the fit method parameters):
Sets the overfitting detector type to Iter and stops the training
after the specified number of iterations since the iteration with the
optimal metric value.
This works very much like early_stopping_rounds in xgboost.
Here is an example:
from catboost import CatBoostRegressor, Pool
from sklearn.model_selection import train_test_split
import numpy as np
y = np.random.normal(0, 1, 1000)
X = np.random.normal(0, 1, (1000, 1))
X[:, 0] += y * 2
X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.1)
# train only on the training split, so the evaluation data stays unseen
train_pool = Pool(X_train, y_train)
eval_pool = Pool(X_eval, y_eval)
model = CatBoostRegressor(iterations=1000, learning_rate=0.1)
model.fit(train_pool, eval_set=eval_pool, early_stopping_rounds=10)
The result should be something like this:
522: learn: 0.3994718 test: 0.4294720 best: 0.4292901 (514) total: 957ms remaining: 873ms
523: learn: 0.3994580 test: 0.4294614 best: 0.4292901 (514) total: 958ms remaining: 870ms
524: learn: 0.3994495 test: 0.4294806 best: 0.4292901 (514) total: 959ms remaining: 867ms
Stopped by overfitting detector (10 iterations wait)
bestTest = 0.4292900745
bestIteration = 514
Shrink model to first 515 iterations.
Setting early_stopping_rounds covers both od_type='Iter' and od_wait, so there is no need to set those two parameters individually.
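The stopping rule described above can be written out in a few lines of plain Python. This is only an illustration of the "Iter" rule as the quoted documentation describes it, not CatBoost's actual implementation, and the function name iter_early_stopping is made up for this example.

```python
def iter_early_stopping(eval_losses, wait):
    """Return (stop_iteration, best_iteration) for a sequence of validation losses.

    Training stops once `wait` iterations have passed since the iteration
    with the best (lowest) validation loss.
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(eval_losses):
        if loss < best_loss:
            best_loss, best_iter = loss, i      # new best: reset the wait
        elif i - best_iter >= wait:
            return i, best_iter                 # stopped early
    return len(eval_losses) - 1, best_iter      # ran to completion

losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.60, 0.61]
stop_at, best_at = iter_early_stopping(losses, wait=3)
print(stop_at, best_at)  # prints "6 3"
```

This matches the log above, where the best iteration is 514 and training stops at 524 with a wait of 10.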

Computing cosine_proximity loss between two outputs of the network

I'm using Keras 2.0.2 Functional API (Tensorflow 1.0.1) to implement a network that takes several inputs and produces two outputs a and b. I need to train the network using the cosine_proximity loss, such that b is the label for a. How do I do this?
Sharing my code here. The last line model.fit(..) is the problematic part because I don't have labeled data per se. The label is produced by the model itself.
from keras.models import Model
from keras.layers import Input, LSTM
from keras import losses
shared_lstm = LSTM(dim)
q1 = Input(shape=(..,.. ), name='q1')
q2 = Input(shape=(..,.. ), name='q2')
a = shared_lstm(q1)
b = shared_lstm(q2)
model = Model(inputs=[q1,q2], outputs=[a, b])
model.compile(optimizer='adam', loss=losses.cosine_proximity)
model.fit([testq1, testq2], [?????])
You can define a fake true label first. For example, define it as a 1-D array of ones of the size of your input data.
Now comes the loss function. You can write it as follows.
from keras import backend as K
def my_cosine_proximity(y_true, y_pred):
    a = y_pred[0]
    b = y_pred[1]
    # depends on whether you want to normalize
    a = K.l2_normalize(a, axis=-1)
    b = K.l2_normalize(b, axis=-1)
    return -K.mean(a * b, axis=-1) + 0 * y_true
I have multiplied y_true by zero and added it just so that Theano does not give a missing-input warning/error.
You should call your fit function normally i.e. by including your fake ground-truth labels.
model.compile('adam', my_cosine_proximity) # 'adam' used as an example optimizer
model.fit([testq1, testq2], fake_y_true)
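To see what the loss above evaluates to, here is the same arithmetic in plain NumPy. This is only a sketch of the computation, not Keras code, and it assumes a and b reach the loss as separate arrays of shape (batch, dim); how they actually arrive in y_pred depends on how the model's outputs are wired. The helper names are made up for this example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    norm = np.sqrt(np.sum(x * x, axis=axis, keepdims=True))
    return x / np.maximum(norm, eps)

def cosine_proximity(a, b):
    """Negative mean of the elementwise product of unit vectors, one value per sample."""
    a = l2_normalize(a)
    b = l2_normalize(b)
    return -np.mean(a * b, axis=-1)

a = np.array([[1.0, 0.0, 0.0, 0.0],
              [1.0, 2.0, 3.0, 4.0]])
b = np.array([[2.0, 0.0, 0.0, 0.0],       # same direction as a[0]
              [-1.0, -2.0, -3.0, -4.0]])  # opposite direction to a[1]

loss = cosine_proximity(a, b)
print(loss)  # prints [-0.25  0.25]
```

Because the mean is taken over the last axis, the value is the negative cosine similarity divided by the vector dimension (4 here), so minimising it pushes a and b towards the same direction.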