I am making an image cropper using pygame as the interface and OpenCV for the image processing.
I have created functions like crop(), colorfilter() etc., but I load the image with pygame.image.load() to show it on screen. When I perform crop() the result is a numpy.ndarray, which pygame cannot blit, and I get this error:
argument 1 must be pygame.Surface, not numpy.ndarray
How do I solve this problem? I need to blit() the cropped image. Should I save the image, read it back, and delete it when done? I want to apply more than one filter.
The following function converts an OpenCV (cv2) image, i.e. a numpy.ndarray (they are the same thing), to a pygame.Surface:
import numpy as np
import pygame

def cv2ImageToSurface(cv2Image):
    # Scale 16-bit images down to 8 bits per channel
    if cv2Image.dtype.name == 'uint16':
        cv2Image = (cv2Image / 256).astype('uint8')
    size = cv2Image.shape[1::-1]
    if len(cv2Image.shape) == 2:
        # Grayscale: replicate the single channel into 3 RGB channels
        cv2Image = np.repeat(cv2Image.reshape(size[1], size[0], 1), 3, axis = 2)
        format = 'RGB'
    else:
        format = 'RGBA' if cv2Image.shape[2] == 4 else 'RGB'
        # Swap blue and red channels (OpenCV is BGR, pygame expects RGB)
        cv2Image[:, :, [0, 2]] = cv2Image[:, :, [2, 0]]
    surface = pygame.image.frombuffer(cv2Image.flatten(), size, format)
    return surface.convert_alpha() if format == 'RGBA' else surface.convert()
See How do I convert an OpenCV (cv2) image (BGR and BGRA) to a pygame.Surface object for a detailed explanation of the function.
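For illustration, a minimal usage sketch (the file name and crop region are placeholders; a display must already exist, since convert() requires one):

import cv2
import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))

img = cv2.imread('photo.jpg')           # placeholder file name
cropped = img[50:200, 100:300]          # crop() boils down to a numpy slice
surface = cv2ImageToSurface(cropped)    # ndarray -> pygame.Surface
screen.blit(surface, (0, 0))
pygame.display.flip()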
I am working on the task of text-based pedestrian retrieval.
I use the image encoder of the CLIP model to encode the image and RoBERTa to encode the language, and finally compute the cosine similarity of the two 768-dimensional vectors.
I use only the image encoder of CLIP, with the projection layer removed, as the image encoder.
I replaced the text encoder and image encoder in the original framework, and the result of the original framework is reasonable.
With the new encoders, the top-k of text-to-image retrieval increased to 15 and then immediately dropped to 0.1 by epoch 9, while in the original framework the top-k can reach 58.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizerFast

class RobertaTextEncode(nn.Module):
    def __init__(self, args):
        super(RobertaTextEncode, self).__init__()
        self.out_channels = 768
        self.args = args
        self.in_planes = 768
        self.tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base')
        self.text_encode = RobertaModel.from_pretrained('roberta-base')

    def forward(self, captions):
        caption = [caption.text for caption in captions]
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        tokenized = self.tokenizer.batch_encode_plus(caption, truncation='longest_first', padding='max_length', max_length=self.args.MODEL.text_length, add_special_tokens=True, return_tensors='pt').to(device)
        encode_text = self.text_encode(**tokenized)
        text_feature = encode_text.pooler_output  # [b, 768]
        return text_feature

def load_pretrain_model(model_path):
    from .clip import clip
    url = clip._MODELS[model_path]
    model_path = clip._download(url)
    try:
        model = torch.jit.load(model_path, map_location="cpu").eval()
        state_dict = None
    except RuntimeError:
        state_dict = torch.load(model_path, map_location="cpu")
    h_resolution = int((224 - 32) // 32 + 1)
    w_resolution = int((224 - 32) // 32 + 1)
    model = clip.build_model(state_dict or model.state_dict(), h_resolution, w_resolution, 32)
    return model

class clipImageEncode(nn.Module):
    def __init__(self, cfg):
        super(clipImageEncode, self).__init__()
        clip_model = load_pretrain_model('ViT-B/32')
        clip_model.to('cuda')
        self.image_encode = clip_model.encode_image

    def forward(self, x):
        visual_feat = self.image_encode(x)
        return visual_feat
I want to know why. I would appreciate it if you could provide suggestions.
Is there any way to extract the detected labels, like person, cat, or dog, that are printed by the results.print() function? I want the detected labels to be saved in an array and used later. I am using the YOLOv5 model here.
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    # Make detections
    results = model(frame)
    results.print()
    # Showing the box and prediction
    cv2.imshow('YOLO', np.squeeze(results.render()))
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
The printed output of results.print() looks like this:
image 1/1: 480x640 1 person
Speed: 7.0ms pre-process, 80.6ms inference, 3.5ms NMS per image at shape (1, 3, 480, 640)
From this output, I want to extract the person label and store it in an array.
This might not be the optimal solution, but here's an approach that I used for a personal project:
lst = []
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    # Make detections
    results = model(frame)
    cv2.imshow('YOLO', np.squeeze(results.render()))
    df = results.pandas().xyxy[0]
    for i in df['name']:  # name -> labels
        lst.append(i)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
I used results.pandas().xyxy[0] to get the detections as a pandas data frame and then appended the labels to a list.
Assuming you use YOLOv5 with PyTorch, please see this link. It details how to interpret the results as JSON objects and also explains the structure.
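For illustration, a minimal sketch of that JSON route, reusing the results object from the snippets above:

# each detection becomes a record with xmin/ymin/xmax/ymax, confidence, class and name
records = results.pandas().xyxy[0].to_json(orient="records")
print(records)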
Let's say we are working with the CIFAR-10 dataset and we want to apply some data augmentation and additionally normalize the tensors. Here is some reproducible code for this
from torchvision import transforms, datasets
import matplotlib.pyplot as plt
trafo = transforms.Compose([transforms.Pad(padding = 4, fill = 0, padding_mode = "constant"),
                            transforms.RandomHorizontalFlip(p=0.5),
                            transforms.RandomCrop(size = (32, 32)),
                            transforms.ToTensor(),
                            transforms.Normalize(mean = (0.0, 0.0, 0.0), std = (1.0, 1.0, 1.0))])
cifar10_full = datasets.CIFAR10(root = "CIFAR-10", train = True, transform = trafo, target_transform = None, download = True)
The normalization I chose so far does nothing to the tensors, since I set the mean to 0 and the std to 1. According to the documentation of torchvision.transforms.Normalize, the provided means and standard deviations are for each channel of the input. However, the problem is that I cannot calculate the mean across each channel because of the random flipping and cropping. Therefore, my idea was something along the following lines:
trafo_1 = transforms.Compose([transforms.Pad(padding = 4, fill = 0, padding_mode = "constant"),
                              transforms.RandomHorizontalFlip(p=0.5),
                              transforms.RandomCrop(size = (32, 32)),
                              transforms.ToTensor()])
cifar10_full = datasets.CIFAR10(root = "CIFAR-10", train = True, transform = trafo_1, target_transform = None, download = True)
Now I could calculate the mean across each channel of the input, and then I wanted to normalize the tensors again. However, I cannot simply use transforms.Normalize(), as cifar10_full is not the original dataset anymore. How could I proceed instead? (One solution would be to simply fix the seed of the random generators, i.e. use torch.manual_seed(0), but I would like to avoid this for now...)
The mean and std are not per tensor; they are computed over the whole dataset. There is no exact mean or std you can match here, because the augmentations are random operations; you just want a scale that is good enough for the whole data representation. Use the mean and std of the actual data, which is pretty much the standard.
First, try to calculate the mean and std of the dataset (try random sampling), and use that for normalization.
# Calculate the mean, std of the complete dataset
import glob
import cv2
import numpy as np
import tqdm
import random

# calculating 3 channel mean and std for image dataset
means = np.array([0, 0, 0], dtype=np.float32)
stds = np.array([0, 0, 0], dtype=np.float32)
total_images = 0
randomly_sample = 5000
for f in tqdm.tqdm(random.sample(glob.glob("dataset_path/**/*.jpg", recursive = True), randomly_sample)):
    img = cv2.imread(f)  # note: cv2.imread loads channels in BGR order
    means += img.mean(axis=(0, 1))
    stds += img.std(axis=(0, 1))
    total_images += 1
# average over the sampled images and rescale from [0, 255] to [0, 1]
means = means / (total_images * 255.)
stds = stds / (total_images * 255.)
print("Total images: ", total_images)
print("Means: ", means)
print("Stds: ", stds)
Consider a simple scenario: do you think that in actual testing or inference your images will be augmented this way too? Probably not. You will have clean images that closely match the mean and std of the clean version of the data, so it is not very useful to calculate the mean and std on augmented images (a few random samples are enough), unless you want to apply TTA.
If you want to apply TTA too, then you can go ahead and run some augmentations on the images, do random sampling, and take the mean and std of those images.
What is the most effective way to concatenate the 4 corners of an image, as shown in this photo?
(This is done in __getitem__().)
left_img = Image.open('image.jpg')
...
output = right_img
This is how I would do it.
First, I would temporarily convert the image to a tensor image:
from torchvision import transforms
tensor_image = transforms.ToTensor()(image)
Now assume you have a 3 channel image (although similar principles apply to matrices with any number of channels, including 1 channel grayscale images).
You can find the red channel with tensor_image[0], the green channel with tensor_image[1], and the blue channel with tensor_image[2].
You can make a for loop iterating through each channel, like
for i in range(tensor_image.size(0)):
    curr_channel = tensor_image[i]
Now, inside that for loop, for each channel you can extract the
Top-left corner pixel with float(curr_channel[0][0])
Top-right corner pixel with float(curr_channel[0][-1])
Bottom-left corner pixel with float(curr_channel[-1][0])
Bottom-right corner pixel with float(curr_channel[-1][-1])
Make sure to convert all the pixel values to float or double values before this next appending step
Now you have four values that correspond to the corner pixels of each channel
Then you can make a list called new_image = []
You can then append the above mentioned pixel values using
new_image.append([[curr_channel[0][0], curr_channel[0][-1]], [curr_channel[-1][0], curr_channel[-1][-1]]])
Now, after iterating through every channel, you should have a big list that contains three (or tensor_image.size(0)) lists of lists.
Next step is to convert this list of lists of lists to a torch.tensor by running
new_image = torch.tensor(new_image)
To make sure everything is right new_image.size() should return torch.Size([3, 2, 2])
If that is the case you now have your wanted image but it is tensor format.
The way to convert it back to PIL is to run
final_pil_image = transforms.ToPILImage()(new_image)
If everything went well, you should have a PIL image that fulfills your task. The only code it uses is clever indexing and one for loop.
However, if you look a bit further than I did, you can avoid the for loop and perform the operation on all the channels at once.
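For illustration, a minimal loop-free sketch (my own variant, using advanced indexing on the channel-first tensor):

# take rows 0 and -1, then columns 0 and -1, across all channels at once
corners = tensor_image[:, [0, -1], :][:, :, [0, -1]]
print(corners.size())  # torch.Size([3, 2, 2])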
I don't know how quick this is but here:
import numpy as np
from PIL import Image

img = np.array(Image.open('image.jpg'))
h, w = img.shape[0], img.shape[1]  # shape is (rows, columns, channels)
# the window size:
r = 4
upper_left = img[:r, :r]
lower_left = img[h-r:, :r]
upper_right = img[:r, w-r:]
lower_right = img[h-r:, w-r:]
upper_half = np.concatenate((upper_left, upper_right), axis=1)
lower_half = np.concatenate((lower_left, lower_right), axis=1)
img = np.concatenate((upper_half, lower_half))
or, more concisely:
upper_half = np.concatenate((img[:r, :r], img[:r, w-r:]), axis=1)
lower_half = np.concatenate((img[h-r:, :r], img[h-r:, w-r:]), axis=1)
img = np.concatenate((upper_half, lower_half))
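If the result needs to go back to PIL for the rest of the pipeline (an assumption on my part), the round trip is simply:

corners_pil = Image.fromarray(img)  # img is still uint8, so this works directly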
I am making a little game where, when events happen, rectangles get spawned at a random x and y point, and I am having some trouble implementing functions for this. Here is some basic code:
xran = random.randint(5, 485)
yran = random.randint(5, 485)
xran1 = random.randint(5, 450)
yran1 = random.randint(5, 400)

def allRand():
    # This REGENERATES those randoms making it 'spawn' in a new location.
    xran = random.randint(0, 485)
    yran = random.randint(0, 485)
    xran1 = random.randint(5, 450)
    yran1 = random.randint(5, 400)

char = pygame.draw.rect(screen, black, (x, y, 15, 15), 0)
food = pygame.draw.rect(screen, green, (xran, yran, 10, 10), 0)
badGuy = pygame.draw.rect(screen, red, (xran1, yran1, 50, 100), 0)

if char.colliderect(food):
    score += 1
    print "Your score is:", score
    allRand()
Does calling a function that regenerates random numbers work for any of you? I know it regenerates them, because I have had it print the variables back, but for some reason my rects don't move there.
Note: This is just snippet of my code it was just meant to give an idea of what I am trying to do.
Thanks!
You need to declare xran, etc. with global inside the allRand() function. Otherwise, it's just creating new variables inside function scope, assigning them random values, then throwing them away when the function returns.
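For illustration, a minimal sketch of that fix applied to the question's function (only the global line is new):

def allRand():
    # rebind the module-level variables instead of creating locals
    global xran, yran, xran1, yran1
    xran = random.randint(0, 485)
    yran = random.randint(0, 485)
    xran1 = random.randint(5, 450)
    yran1 = random.randint(5, 400)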
Your allRand() method doesn't have any code; you must indent the lines you want in that function.
It kind of works because those statements still execute below your def, but not because you're calling the function.
To add to Lee Daniel Crocker's answer: when you create variables in a function, they exist only in that function. If you want them to exist outside, you can either make them global variables as he suggested, or you can return them and catch them:
>>> def square(number):
...     squared = number * number
...     return squared
...
>>> square(3)
9
>>> ninesquared = square(3)
>>> ninesquared
9
>>>
Read more
It looks like you need to master your basics. I suggest doing that first before trying things in pygame.
EDIT:
If you define variables outside of the function, they will not affect any variables you define in the function either.
>>> x = 5
>>> def rais():
...     x = 10
...
>>> x
5
>>> rais()
>>> x
5
>>>
Notice how rais did nothing?
If we change the line in rais to x = x + 1, Python will give us back an error that x is not defined (an UnboundLocalError: the local x is referenced before assignment).
If you want your variables to get into the function, you need to pass them in as parameters. But once again, they won't affect anything outside of the function unless you return and capture them. And once again, you can declare them as global variables and that will also work.
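For illustration, a minimal sketch of the return-and-capture approach applied to the question's allRand() (an alternative to the global fix):

def allRand():
    # return the new positions instead of relying on globals
    xran = random.randint(0, 485)
    yran = random.randint(0, 485)
    xran1 = random.randint(5, 450)
    yran1 = random.randint(5, 400)
    return xran, yran, xran1, yran1

# capture the returned values at the call site
xran, yran, xran1, yran1 = allRand()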