Why are the original ArcFace and DeepFace ArcFace embedding values on different scales? And how can I use a DeepFace ArcFace embedding for face swapping?

I used the original ArcFace model and DeepFace's ArcFace to get identity embeddings of aligned images, but I got values on completely different scales.
The inputs are the same; the only difference between the two paths is the input shape.
Original ArcFace model: (batch, channel, height, width)
DeepFace ArcFace model: (batch, height, width, channel)
The aligned images are normalized as below.
transforms_arcface = transforms.Compose([
    transforms.ColorJitter(0.2, 0.2, 0.2, 0.01),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

Xs = cv2.imread(source_image_path)[:, :, ::-1]  # BGR -> RGB
Xs = Image.fromarray(Xs)
normalized_Xs = transforms_arcface(Xs)
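If you only need to match the layout, a permute converts between the two conventions. This is a minimal sketch on the normalized_Xs tensor from above; note that DeepFace's model may additionally expect its own input size and pixel scaling, which a permute alone does not address.

import torch

# normalized_Xs has shape (C, H, W) after the transforms above
nchw = normalized_Xs.unsqueeze(0)   # (1, C, H, W) -- original ArcFace layout
nhwc = nchw.permute(0, 2, 3, 1)     # (1, H, W, C) -- DeepFace ArcFace layout
nhwc_np = nhwc.cpu().numpy()        # Keras models generally take numpy arrays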
I printed some of the outputs (embeddings) here.
# origin arcface
# maximum, minimum value of embedding
tensor(2.7417, device='cuda:2') tensor(-2.4630, device='cuda:2')
tensor(2.2528, device='cuda:1') tensor(-2.4806, device='cuda:1')
tensor(2.8164, device='cuda:0') tensor(-3.0586, device='cuda:0')
tensor(2.5641, device='cuda:2') tensor(-2.7087, device='cuda:2')
tensor(3.1357, device='cuda:1') tensor(-3.4846, device='cuda:1')
tensor(3.1438, device='cuda:0') tensor(-2.9450, device='cuda:0')
tensor(3.1075, device='cuda:0') tensor(2.4668, device='cuda:2')
# deepface arcface
# maximum, minimum value of embedding
tensor(0.3724, device='cuda:2') tensor(-0.4499, device='cuda:2')
tensor(0.4816, device='cuda:0') tensor(-0.6993, device='cuda:0')
tensor(0.5832, device='cuda:1') tensor(-0.5441, device='cuda:1')
tensor(0.4039, device='cuda:1') tensor(-0.4289, device='cuda:1')
tensor(0.3976, device='cuda:0') tensor(-0.3404, device='cuda:0')
tensor(0.6162, device='cuda:2') tensor(-0.4228, device='cuda:2')
tensor(0.3019, device='cuda:0') tensor(-0.4458, device='cuda:0')
As you can see, DeepFace's ArcFace produces values on a much smaller scale, so those embeddings seem useless when I try to generate a face-swapped image from them.
I want to transfer the source image's identity to the target image using the source embedding generated by DeepFace's ArcFace.
How can I get the same embedding as the original one?
To summarize: I'm trying to get an identity embedding from DeepFace's ArcFace model and generate a swapped-face image. The generator should keep the target image's attributes while replacing the identity part with the source embedding (the one I got from DeepFace's ArcFace).
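One plausible explanation, though it is an assumption rather than something confirmed by either code base: DeepFace returns an L2-normalized embedding, while the original checkpoint returns the raw 512-d feature. Per-value magnitudes around 0.5 are what you would expect from a unit-norm 512-d vector, and around 3 from an unnormalized one. If that is the case, the two embeddings point in the same direction and differ only in norm, which you can check with cosine similarity:

import torch
import torch.nn.functional as F

def cosine_similarity(raw_emb, deepface_emb):
    # close to 1.0 means the embeddings differ only by scale,
    # i.e. by L2 normalization
    a = F.normalize(raw_emb.flatten(), dim=0)
    b = F.normalize(deepface_emb.flatten(), dim=0)
    return torch.dot(a, b).item()

If the similarity is high, you can match the original scale by multiplying the DeepFace embedding by the norm of the original embedding, or, usually cleaner, L2-normalize whichever embedding you feed the generator and train it on unit-norm identity vectors.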

Related

PyTorch: How to normalize a tensor when the image is cropped randomly?

Let's say we are working with the CIFAR-10 dataset and we want to apply some data augmentation and additionally normalize the tensors. Here is some reproducible code for this
from torchvision import transforms, datasets
import matplotlib.pyplot as plt

trafo = transforms.Compose([
    transforms.Pad(padding=4, fill=0, padding_mode="constant"),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(size=(32, 32)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0))
])

cifar10_full = datasets.CIFAR10(root="CIFAR-10", train=True, transform=trafo,
                                target_transform=None, download=True)
The normalization I chose so far does nothing to the tensors, since I set the mean and std to 0 and 1 respectively. According to the documentation of torchvision.transforms.Normalize, the provided means and standard deviations are per channel of the input. However, the problem is that I cannot calculate the per-channel mean because of the random flipping and cropping. Therefore, my idea was something along the following lines:
trafo_1 = transforms.Compose([
    transforms.Pad(padding=4, fill=0, padding_mode="constant"),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(size=(32, 32)),
    transforms.ToTensor()
])

cifar10_full = datasets.CIFAR10(root="CIFAR-10", train=True, transform=trafo_1,
                                target_transform=None, download=True)
Now I could calculate the mean across each channel of the input and then normalize the tensors again. However, I cannot simply use transforms.Normalize(), as cifar10_full is no longer the original dataset. How could I proceed instead? (One solution would be to fix the seed of the random generators, i.e. use torch.manual_seed(0), but I would like to avoid this for now.)
The mean and std are not computed per tensor but over the whole dataset. What exactly the augmentations do doesn't really matter here: they are random operations, so there is no exact mean or std to recover. You just want a scale that represents the data well, and using the mean and std of the actual (unaugmented) data is pretty much the standard.
First, calculate the mean and std of the dataset (random sampling works) and use those for normalization.
# Calculate the mean and std of the complete dataset
import glob
import random

import cv2
import numpy as np
import tqdm

# calculating the 3-channel mean and std for an image dataset
means = np.array([0, 0, 0], dtype=np.float32)
stds = np.array([0, 0, 0], dtype=np.float32)
total_images = 0
randomly_sample = 5000

for f in tqdm.tqdm(random.sample(glob.glob("dataset_path/**/*.jpg", recursive=True), randomly_sample)):
    img = cv2.imread(f)
    means += img.mean(axis=(0, 1))
    stds += img.std(axis=(0, 1))
    total_images += 1

# divide by 255 to match the [0, 1] range produced by ToTensor()
means = means / (total_images * 255.)
stds = stds / (total_images * 255.)

print("Total images: ", total_images)
print("Means: ", means)
print("Stds: ", stds)
A simple sanity check: will your images be augmented this way during actual testing or inference too? Probably not. You will have clean images that closely match the mean and std of the clean version of the data, so computing the statistics on augmented images is pointless (a few random samples suffice), unless you want to apply test-time augmentation (TTA).
If you do want TTA, then go ahead and run the augmentations on the images, sample randomly, and take the mean and std of those images.
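Once you have those statistics, plug them back into the pipeline from the question. The numbers below are placeholders for whatever the script prints; also note that cv2.imread loads channels in BGR order while torchvision works in RGB, so reverse the three values first.

# placeholder statistics -- substitute the values printed above,
# reversed from BGR to RGB order
computed_means = (0.49, 0.48, 0.45)
computed_stds = (0.25, 0.24, 0.26)

trafo = transforms.Compose([
    transforms.Pad(padding=4, fill=0, padding_mode="constant"),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(size=(32, 32)),
    transforms.ToTensor(),
    transforms.Normalize(mean=computed_means, std=computed_stds),
])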

How can I concatenate the 4 corners of an image quickly when loading images in deep learning?

What is the most effective way to concatenate the 4 corners of an image, as shown in the attached photo? (This is done in __getitem__().)
left_img = Image.open('image.jpg')
...
output = right_img
This is how I would do it. First, temporarily convert the image to a tensor image:
from torchvision import transforms
tensor_image = transforms.ToTensor()(image)
Now assume you have a 3-channel image (similar principles apply to matrices with any number of channels, including 1-channel grayscale images).
You can access the red channel with tensor_image[0], the green channel with tensor_image[1], and the blue channel with tensor_image[2].
You can iterate through each channel with a for loop:
for i in range(tensor_image.size(0)):
    curr_channel = tensor_image[i]
Inside that loop, for each channel, you can extract the
top-left pixel with float(curr_channel[0][0]),
top-right pixel with float(curr_channel[0][-1]),
bottom-left pixel with float(curr_channel[-1][0]),
and bottom-right pixel with float(curr_channel[-1][-1]).
Make sure to convert all pixel values to float or double before the next appending step. You now have four values that correspond to the corner pixels of each channel.
Then create a list called new_image = [] and append the above-mentioned pixel values with
new_image.append([[curr_channel[0][0], curr_channel[0][-1]], [curr_channel[-1][0], curr_channel[-1][-1]]])
After iterating through every channel, you should have a big list that contains three (or, in general, tensor_image.size(0)) lists of lists.
The next step is to convert this list of lists of lists to a torch.Tensor by running
new_image = torch.tensor(new_image)
To make sure everything is right, new_image.size() should return torch.Size([3, 2, 2]).
If that is the case, you now have your wanted image, just in tensor format. To convert it back to PIL, run
final_pil_image = transforms.ToPILImage()(new_image)
If everything went well, you should have a PIL image that fulfills your task. The only machinery it uses is clever indexing and one for loop.
If you dig in more than I have, you can likely avoid the for loop and operate on all channels at once, as sketched below.
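For example, a loop-free sketch using fancy indexing on the same tensor_image as above:

import torch

# take the first and last row, then the first and last column,
# for all channels at once: the result has shape (channels, 2, 2)
corners = tensor_image[:, [0, -1], :][:, :, [0, -1]]
print(corners.size())  # torch.Size([3, 2, 2]) for a 3-channel image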
I don't know how quick this is, but here:
import numpy as np
from PIL import Image

img = np.array(Image.open('image.jpg'))
h, w = img.shape[0], img.shape[1]  # numpy puts rows (height) first

# the window size:
r = 4

upper_left = img[:r, :r]
lower_left = img[h-r:, :r]
upper_right = img[:r, w-r:]
lower_right = img[h-r:, w-r:]

upper_half = np.concatenate((upper_left, upper_right), axis=1)
lower_half = np.concatenate((lower_left, lower_right), axis=1)
img = np.concatenate((upper_half, lower_half))
or, in short:
upper_half = np.concatenate((img[:r, :r], img[:r, w-r:]), axis=1)
lower_half = np.concatenate((img[h-r:, :r], img[h-r:, w-r:]), axis=1)
img = np.concatenate((upper_half, lower_half))

Python 3.8, Pygame 2 on Windows 10 can't save PNG file [duplicate]

I am making an image cropper using pygame for the interface and OpenCV for the image processing.
I have created functions like crop(), colorfilter(), etc. I load the image with pygame.image.load() to show it on screen, but after crop() the result is a numpy.ndarray, which pygame cannot blit, and I get the error:
argument 1 must be pygame.Surface, not numpy.ndarray
How do I solve this? I need to blit() the cropped image. Should I save the image, read it back, and delete it when done? I want to apply more than one filter.
The following function converts an OpenCV (cv2) image, i.e. a numpy.array (they are the same thing), to a pygame.Surface:
import numpy as np
import pygame

def cv2ImageToSurface(cv2Image):
    if cv2Image.dtype.name == 'uint16':
        cv2Image = (cv2Image / 256).astype('uint8')
    size = cv2Image.shape[1::-1]  # (width, height)
    if len(cv2Image.shape) == 2:
        # grayscale: replicate the single channel three times
        cv2Image = np.repeat(cv2Image.reshape(size[1], size[0], 1), 3, axis=2)
        format = 'RGB'
    else:
        format = 'RGBA' if cv2Image.shape[2] == 4 else 'RGB'
        cv2Image[:, :, [0, 2]] = cv2Image[:, :, [2, 0]]  # BGR(A) -> RGB(A)
    surface = pygame.image.frombuffer(cv2Image.flatten(), size, format)
    return surface.convert_alpha() if format == 'RGBA' else surface.convert()
See How do I convert an OpenCV (cv2) image (BGR and BGRA) to a pygame.Surface object for a detailed explanation of the function.
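A minimal usage sketch ('image.png' is a placeholder path; convert() and convert_alpha() require a display, so set_mode() has to be called first):

import cv2
import pygame

pygame.init()
screen = pygame.display.set_mode((800, 600))

# IMREAD_UNCHANGED keeps an alpha channel if the file has one
cv2_img = cv2.imread('image.png', cv2.IMREAD_UNCHANGED)
surface = cv2ImageToSurface(cv2_img)

screen.blit(surface, (0, 0))
pygame.display.flip()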

plotting maps using OSM or other shapefiles and matplotlib for a standardized report

We are developing a standardized report for our activities. The last graph I need displays the geographic area of the activities (there are close to 100 locations).
The output for these reports is PDF, letter or A4 size.
The report is a matplotlib figure, where:
fig = plt.figure(figsize=(8.5, 11))
rect0 = 0, .7, .18, .3
rect1 = .3, .7, .18, .3
rect2 = .8, .29, .2, .7
rect3 = 0, 0, .8, .4
ax1 = fig.add_axes(rect0)
ax2 = fig.add_axes(rect1)
ax3 = fig.add_axes(rect2)
ax4 = fig.add_axes(rect3)
The contents and layout for axes 1-3 are settled and work great. However, ax4 is where the map contents would ideally be displayed.
I was hoping to do something like this:
map1 = Basemap(llcrnrlon=6.819087, llcrnrlat=46.368452, urcrnrlon=6.963978,
               urcrnrlat=46.482906, resolution='h', projection='tmerc',
               lon_0=6.88, lat_0=46.42, ax=ax4)
map1.readshapefile('a valid shape file that works')  # <----- this is the sticking point
map1.draw(...)  # locator coordinates go here
plt.savefig(...)  # report figure to be inserted into the document
plt.show()
However, I have not been successful in obtaining a shapefile that works from OpenStreetMap or GIS sources.
Nor have I identified the correct process to transform the data from OpenStreetMap,
or to extract that information from the OSM/XML document or the transformed GeoJSON document.
Ideally, I would like to grab the bounding-box information from OpenStreetMap and generate the map directly.
What is the process to get a shapefile that works with the .readshapefile() call?
Or, alternatively, how do I get the defined map into a matplotlib axes?
It might be easiest to use the cartopy.io.img_tiles module, which automatically pulls OSM tiles for use with cartopy. Using the pre-rendered tiles sidesteps the trouble of handling and styling individual shapefiles/XML.
See the cartopy docs on using these tiles within cartopy.
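A sketch of that approach for the ax4 slot from the question; the zoom level and the plotted coordinate are illustrative:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.io.img_tiles import OSM

tiler = OSM()
fig = plt.figure(figsize=(8.5, 11))
# same rectangle as rect3 in the question, but created as a GeoAxes
ax4 = fig.add_axes([0, 0, .8, .4], projection=tiler.crs)
# lon/lat bounds taken from the Basemap call above
ax4.set_extent([6.819087, 6.963978, 46.368452, 46.482906],
               crs=ccrs.PlateCarree())
ax4.add_image(tiler, 13)  # zoom level: higher means more detail, more tiles
# mark an activity location (illustrative coordinates)
ax4.plot(6.88, 46.42, 'ro', transform=ccrs.PlateCarree())
fig.savefig('report.pdf')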

pygame draw lifebar with a clipping area

I'd like to draw a lifebar with pygame using a clipping area (for example, limit the area to one half when half of the hit points are gone).
But even though the clipping area is correct, I always get the full image.
This is my lifebar class:
class Lifebar():
    def __init__(self, x, y, images, owner):
        self.x = x
        self.y = y
        self.images = images
        self.owner = owner
        self.owner.world.game.addGUI(self)
        self.inter = False

    def getValues(self):
        value1 = 1.0 * self.owner.hitpoints
        value2 = 1.0 * self.owner.maxhitpoints
        return [value1, value2]

    def render(self, surface):
        rendervalues = self.getValues()
        maxwidth = self.images[0].get_width()
        ratio = rendervalues[0] / rendervalues[1]
        actwidth = int(round(maxwidth * ratio))
        surface.blit(self.images[0], (self.x, self.y))
        surface.set_clip(self.x, 0, (self.x + actwidth), 1080)
        surface.blit(self.images[1], (self.x, self.y))
        self.owner.world.game.setclipDefault()
        surface.blit(self.images[2], (self.x, self.y))
I checked that the hit points weren't full and that the clipping area was limited in the x direction (via get_clip()).
I don't know whether I misunderstood how set_clip() works; I have only used it on the whole screen before (for objects that were partially off screen).
The .set_clip() method of a pygame.Surface object restricts the area of the destination surface that subsequent .blit() calls (and other drawing operations) are allowed to modify. Note that the rectangle is given as (x, y, width, height), not as left and right edges, so passing self.x + actwidth as the third value makes the clip area wider than intended.
To cut out a specific area of an image and draw it onto a surface, you can instead pass an optional rectangle as the third parameter to the .blit() method:
surface.blit(source_image,                    # source surface you want to cut out
             (destination_x, destination_y),  # coordinates on the destination surface
             (x_coordinate, y_coordinate, width, height))  # "cut out" rect
I hope this helps you a little bit :)
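Applied to the render() method from the question, that could look roughly like this; a sketch keeping the original attribute names, which sidesteps set_clip() entirely:

def render(self, surface):
    values = self.getValues()
    ratio = values[0] / values[1]
    actwidth = int(round(self.images[0].get_width() * ratio))
    # empty bar in the background
    surface.blit(self.images[0], (self.x, self.y))
    # filled part: draw only the leftmost actwidth columns of the full bar
    surface.blit(self.images[1], (self.x, self.y),
                 (0, 0, actwidth, self.images[1].get_height()))
    # frame / overlay on top
    surface.blit(self.images[2], (self.x, self.y))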