How does YOLO handle input images of different sizes? - deep-learning

I am working on custom object detection with YOLOv5. We can provide different input image sizes to the network. How can a DNN accept inputs of different sizes? Does YOLO have different backbones for different input sizes?
When I give the argument --imgsz as 640, the YOLO dataloader resizes the images to (384, 672, 3), and if --imgsz is 320, the resized images have shape (224, 352, 3).
Since conventional CNNs accept fixed, square-sized (equal height and width) inputs, how is YOLO handling the variable image sizes?
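For illustration, here is a minimal sketch of the stride-aligned letterbox resizing that a YOLOv5-style dataloader performs: the long side is scaled to --imgsz while keeping the aspect ratio, and each side is padded up to a multiple of the network stride (32). Because the backbone is fully convolutional, any height/width that is a multiple of the stride works; the function and parameter names below are illustrative, not the actual yolov5 utilities.

import cv2
import numpy as np

def letterbox(img, target=640, stride=32, pad_value=114):
    # Scale the long side to `target` while keeping the aspect ratio,
    # then pad height and width up to the nearest multiple of `stride`.
    h, w = img.shape[:2]
    r = target / max(h, w)
    new_h, new_w = int(round(h * r)), int(round(w * r))
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    pad_h = (stride - new_h % stride) % stride
    pad_w = (stride - new_w % stride) % stride
    return cv2.copyMakeBorder(resized, 0, pad_h, 0, pad_w,
                              cv2.BORDER_CONSTANT, value=(pad_value,) * 3)

# A 1280x720 frame with target=640 is scaled to 640x360 and padded to 640x384.
padded = letterbox(np.zeros((720, 1280, 3), dtype=np.uint8), target=640)
print(padded.shape)  # (384, 640, 3)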

Related

Treat point clouds as RGB images for activity recognition

I have a VLP-16 lidar and I want to use it for activity recognition. I want to treat the point clouds as RGB images so that I can use OpenPose. I use this repository to convert my point clouds into depth images, and the following are some examples:
# Use Pillow and OpenCV to check the image format
Pillow: RGB (150, 64)
OpenCV: (64, 150, 3)
As these two images show, I have two choices: a low-resolution image, or a higher-resolution image that is blurry and deformed (because the VLP-16 is sparse in the vertical direction).
I've tried several methods to process these two types of images:
Use OpenCV or other APIs to resize/crop the images.
Use image super-resolution (results shown below).
Use a filter for depth image completion (results shown below).
Use CycleGAN for image-to-image translation.
Use a network designed for low-resolution depth images.
In addition, I also tried lidar super-resolution (point cloud upsampling), but none of these methods let me achieve my goal: treating point clouds as RGB images for activity recognition.
May I have some suggestions on how to achieve this?
Any help is much appreciated!
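As a side note, the two shape reports above describe the same image: Pillow reports (width, height), while an OpenCV/NumPy array is (height, width, channels). Below is a minimal sketch of the kind of preprocessing discussed in the list above (the synthetic frame and target size are stand-ins, not the asker's data): upscale the sparse vertical axis and replicate the single depth channel so the result has an RGB-like shape.

import cv2
import numpy as np
from PIL import Image

# Stand-in for one projected VLP-16 depth frame: 64 rows x 150 columns,
# single channel. Real frames would come from the projection repository.
depth = np.zeros((64, 150), dtype=np.uint8)
depth[::4, :] = 200  # sparse vertical returns

# Upscale the sparse vertical axis. INTER_NEAREST keeps returns sharp;
# INTER_LINEAR gives a smoother but blurrier result.
up = cv2.resize(depth, (600, 256), interpolation=cv2.INTER_NEAREST)

# Replicate the single channel so the image can be fed to an RGB network.
rgb = cv2.cvtColor(up, cv2.COLOR_GRAY2BGR)

print(rgb.shape)                  # (256, 600, 3) in OpenCV order
print(Image.fromarray(rgb).size)  # (600, 256) in Pillow (width, height) order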

Does YOLO rescale anchors before fully connected layers? What is the solution for the absence of ROI pooling?

I read dozens of articles about YOLO but didn't find this answer. The question is: Faster R-CNN uses ROI pooling to rescale anchors before the fully connected layers, but YOLO doesn't. Some people say YOLO doesn't need ROI pooling because it doesn't have an RPN, but YOLO does have anchors with different sizes/proportions, each one trying to detect an object. How can a neural network be trained with these anchors of different sizes?
YOLO calculates a confidence score and a class score, but I can't understand how that is possible without reshaping the anchors.
You speak of fully connected layers and anchor boxes in YOLO. Only the first version of YOLO had a fully connected layer, and it had no anchors. YOLOv2 and v3 are both fully convolutional networks without any fully connected layers, but with anchors.
In the first YOLO, the width and height were predicted directly, relative to the width and height of the input. In YOLOv2 and v3, anchor boxes are used and only a rescaling of the anchor's width and height is learned, in the same manner as in e.g. Faster R-CNN and SSD.
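For concreteness, here is a minimal sketch of the YOLOv2/v3-style decoding the answer refers to (the numbers in the example are made up): the network predicts per-anchor offsets, and the anchor is only shifted and rescaled, so no ROI pooling is needed.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    # (tx, ty, tw, th): raw network outputs for one anchor in grid cell (cx, cy)
    # (pw, ph): the anchor's width and height (in grid-cell units)
    bx = cx + sigmoid(tx)   # box centre stays inside its grid cell
    by = cy + sigmoid(ty)
    bw = pw * np.exp(tw)    # anchor is only rescaled, never cropped or pooled
    bh = ph * np.exp(th)
    return bx, by, bw, bh

# Hypothetical prediction for a 3.6 x 5.1 anchor in grid cell (7, 4)
print(decode_box(0.2, -0.1, 0.3, 0.0, cx=7, cy=4, pw=3.6, ph=5.1))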

Is it possible to use images of different sizes in a Caffe network?

I have been researching anomaly detection recently. I received a dataset with images of various sizes and aspect ratios. I have learned that almost all Caffe networks need input images of the same size, or at least the same aspect ratio so they can easily be resized. However, my dataset contains anomalies with round, linear and irregular shapes, and all images have been cropped to fit the shape of each anomaly, so resizing this dataset is not an option. What should I do? Is it possible to use images of different sizes as input?

Fully convolutional autoencoder for variable-sized images in keras

I want to build a convolutional autoencoder where the size of the input is not constant. I'm doing that by stacking conv-pool layers until I reach an encoding layer, and then doing the reverse with upsample-conv layers. The problem is that, no matter what settings I use, I can't get exactly the same size in the output layer as in the input layer. The reason is that the UpSampling layer (given, say, a (2,2) size) doubles the size of its input, so I can't get odd dimensions, for instance. Is there a way to tie the output dimension of a given layer to the input dimension of a previous layer for individual samples (as I said, the input size for the max-pool layer is variable)?
Yes, there is.
You can use one of three methods:
Padding
Resizing
Crop or Pad
Padding will only increase the dimensions, so it is not useful for reducing the size.
Resizing is more costly, but it is the optimal solution in either case (up- or downsampling). It keeps all the values in range and simply resamples them to the given dimensions.
Crop-or-pad works like resizing but is more compute-efficient, since there is no interpolation; however, if you shrink to a smaller dimension, it crops from the edges.
By using these three, you can match your layers' dimensions.
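Here is a minimal sketch of what those options look like with TensorFlow's image ops (this assumes TF 2.x and is only an illustration; the surrounding layer arrangement is up to you):

import tensorflow as tf

def match_to_input(decoded, target_h, target_w):
    # Option 1: interpolate to the target size (handles odd dimensions,
    # both up- and downsampling).
    resized = tf.image.resize(decoded, (target_h, target_w))
    # Option 2: pad or crop without interpolation (cheaper; crops at the
    # edges when the target is smaller).
    cropped_or_padded = tf.image.resize_with_crop_or_pad(decoded, target_h, target_w)
    return resized, cropped_or_padded

x = tf.random.uniform((1, 31, 45, 3))        # odd-sized input sample
decoded = tf.random.uniform((1, 32, 48, 3))  # stand-in for a decoder output
for out in match_to_input(decoded, 31, 45):
    print(out.shape)                         # both: (1, 31, 45, 3)

Inside a Keras model, the same calls can be wrapped in a Lambda layer at the end of the decoder so the correction back to the input size happens as part of the forward pass.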

Supporting multiple screen sizes in Android AIR by making the stage MAX resolution

Hey everyone, this is not a duplicate! My question is: if I make the stage width and height of my Android AIR 4.0 application (built with Flash CS6) say 1080x1920, and make all my MovieClips etc. fit the stage, will it then fit all lower screen sizes and scale automatically, instead of me having to create multiple XML files and a different-sized image for every available screen size?
I don't know a really good method for doing this, so I thought this might be a logical approach: since it's already the largest possible size, can't it just shrink down to fit all devices? I tested on my small screen and the only problem I'm having is that it doesn't fill the whole width of the screen.
But when I add these lines of code in my constructor, everything fits perfectly:
stage.align = StageAlign.TOP_LEFT;                  // anchor content to the top-left corner
stage.scaleMode = StageScaleMode.EXACT_FIT;         // stretch to fill the screen (does not preserve aspect ratio)
stage.displayState = StageDisplayState.FULL_SCREEN; // switch to full-screen mode
Any thoughts?
If you follow the Flex model, it's the other way around. Flex apps are generally built for the lowest screen density (not resolution) and scale upwards. The sizing and placement respond naturally to the screen size. At each screen density range, a new set of images is used and a new multiplier is applied to all of the sizes and positions.
So let's look at it this way. In Flex, there is a set of DPI ranges: 120 (generally ignored), 160, 240, 320, 480, and 640. Every device maps to exactly one of those settings. Take the iPhone: you build for 160, but the iPhone is 320 dpi, so all of the values you author for 160 dpi are doubled for your app on the iPhone. You use images twice the size, too.
For most of my apps, I have at least four different sizes (one each for 160, 240, 320, and 480 dpi ranges) of every single image in my project. The images only scale if I don't have an image for the dpi range. The goal should be to never scale any images, up or down. In each range, the images remain the same size at all times and the only thing that changes is the positioning of them.
Now, I've used Flex as my example here, since the layout engine it uses is extremely thorough and well thought out, but it is entirely possible to build a simple system for this in AS3 as well (I did it last year with relative ease).
The biggest thing you need to do is forget about screen resolution. In this day and age, especially with Android, where there are hundreds of different screens in use, screen size and resolution are irrelevant. Screen density, on the other hand, is everything.
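To make the arithmetic in that answer concrete, here is a small sketch (written in Python purely to illustrate the idea, not Flex or AS3 code): pick the density bucket the device maps to, then scale every authored size by bucket / 160. The bucket selection below is a simplification of what the runtime actually does.

# Illustration only: Flex-style density buckets and the scaling arithmetic
# described in the answer above.
AUTHOR_DPI = 160
DPI_BUCKETS = [120, 160, 240, 320, 480, 640]

def bucket_for(device_dpi):
    # Map a physical device DPI to the nearest bucket (simplified mapping).
    return min(DPI_BUCKETS, key=lambda b: abs(b - device_dpi))

def scale(value, device_dpi):
    return value * bucket_for(device_dpi) / AUTHOR_DPI

# A 100px-wide button authored at 160 dpi becomes 200px on a ~320 dpi iPhone.
print(scale(100, 326))  # 200.0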