Darknet and Data Augmentation - deep-learning

In the Darknet deep learning framework's .cfg files we see parameters like
`angle`, `saturation`, `exposure`
These parameters are used for data augmentation in classic image classification problems.
Does Darknet perform image augmentation automatically for object detection when these parameters are set?

Yes. By default, data augmentation is done automatically by Darknet. For example, during training the input image is rotated by a random angle, up to the specified angle. The parameters in the .cfg file let you control those limits.
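For illustration, here is a rough torchvision counterpart of what those limits control; the .cfg values in the comments are just typical examples, and Darknet's internal sampling may differ in detail:

```
import torchvision.transforms as T

# Typical augmentation lines in a Darknet classifier .cfg (illustrative values):
#   angle=7          -> rotate by a random angle in [-7, 7] degrees
#   saturation=1.5   -> scale saturation by a random factor within the given limit
#   exposure=1.5     -> scale brightness ("exposure") by a random factor within the given limit

# A rough torchvision equivalent of those augmentations:
augment = T.Compose([
    T.RandomRotation(degrees=7),                    # random rotation within +/- 7 degrees
    T.ColorJitter(saturation=0.5, brightness=0.5),  # saturation/brightness factors in [0.5, 1.5]
])
```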

Related

Is it possible to use a transformer network architecture as an autoencoder to perform anomaly detection?

I would like to use the efficiency of the transformer architecture to do anomaly detection on time series. I am wondering:
Can we slightly modify the architecture to create a bottleneck in the transformer network (similar to a fully connected autoencoder, or an AE with LSTMs)?
Does it actually make sense to try to do that?
I would like the transformer to learn to reconstruct the input sequence at its output, with some intermediate latent space of lower dimensionality (the bottleneck).
My idea was to reduce `d_model` (the number of variables in the time series, or the embedding dimension in NLP), but it must be the same size as the input series according to `torch.nn.Transformer` (see here).
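A minimal sketch of that idea, assuming a multivariate series shaped (batch, seq_len, input_dim); the class and variable names here are hypothetical, and the bottleneck is realized with linear projections around a `torch.nn.TransformerEncoder` run at a reduced `d_model`:

```
import torch
import torch.nn as nn

class TransformerBottleneckAE(nn.Module):
    """Transformer autoencoder with a low-dimensional latent space (sketch)."""

    def __init__(self, input_dim, latent_dim, nhead=2, num_layers=2):
        super().__init__()
        # latent_dim must be divisible by nhead
        self.down = nn.Linear(input_dim, latent_dim)    # bottleneck projection
        enc_layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=nhead, batch_first=True)  # batch_first needs PyTorch >= 1.9
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        self.up = nn.Linear(latent_dim, input_dim)      # reconstruction head

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        z = self.encoder(self.down(x))   # latent sequence: (batch, seq_len, latent_dim)
        return self.up(z)                # reconstruction:  (batch, seq_len, input_dim)

# Train with a reconstruction loss, e.g. nn.MSELoss()(model(x), x),
# and use the per-window reconstruction error as the anomaly score.
```

Note that this compresses the feature dimension at each time step rather than the sequence length; a sequence-level bottleneck would need additional pooling.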

Is there an actual minimum input image size for popular computer vision models? (e.g., VGG, ResNet, etc.)

According to the documentation on pre-trained computer vision models for transfer learning (e.g., here), input images should come in "mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224".
However, when running transfer learning experiments on 3-channel images with height and width smaller than expected (e.g., smaller than 224), the networks generally run smoothly and often get decent performances.
Hence, it seems to me that the "minimum height and width" is somehow a convention and not a critical parameter. Am I missing something here?
There is a lower limit on your input size, which corresponds to the receptive field of the last convolutional layer of your network. Intuitively, you can observe the spatial dimensionality decreasing as you progress through the network. At least this is the case for feature-extractor CNNs, which aim at extracting feature embeddings from the input image; that is, most pre-trained models, such as vanilla VGG and ResNet networks, do not retain spatial dimensionality. If the input of a convolutional layer is smaller than the kernel size (even when padded), then you simply won't be able to perform the operation.
TLDR: adaptive pooling layer
For example, a standard ResNet-50 accepts inputs only in the 193-225 range, and this is due to the architecture and its downscaling layers (see the links below).
The only reason the default PyTorch model works on other sizes is that it uses an adaptive pooling layer, which removes the restriction on the input size. So it will work, but you should be ready for performance decay and other fun things :) (a quick shape check follows the links below)
Hope you will find these useful:
https://discuss.pytorch.org/t/how-can-torchvison-models-deal-with-image-whose-size-is-not-224-224/51077/3
What is Adaptive average pooling and How does it work?
https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html
https://github.com/pytorch/vision/blob/c187c2b12d86c3909e59a40dbe49555d85b98703/torchvision/models/resnet.py#L118
https://github.com/pytorch/vision/blob/c187c2b12d86c3909e59a40dbe49555d85b98703/torchvision/models/resnet.py#L151
https://developpaper.com/pytorch-implementation-examples-of-resnet50-resnet101-and-resnet152/
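As a quick sanity check of the adaptive-pooling point above (assuming a recent torchvision; the 112x112 size is just an arbitrary example below the documented minimum):

```
import torch
from torchvision.models import resnet50

model = resnet50()   # weights are irrelevant for this shape check
model.eval()

x = torch.randn(1, 3, 112, 112)   # well below the documented 224
with torch.no_grad():
    out = model(x)

print(out.shape)  # torch.Size([1, 1000]) -- AdaptiveAvgPool2d((1, 1)) absorbs the size change
```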

How to reject false alarm in object detection using SSD?

I used SSD for my object detection, but there are some false detections from other objects in the image.
This happens consistently with the same objects. So is there a way to reject those objects during training?
For YOLO, I can do as follows.
Just add images with these non-labeled objects to the training dataset and train. The network will learn not to detect such objects.
It is also desirable to add negative samples to your training dataset: https://github.com/AlexeyAB/darknet
From that repository's README: it is desirable that our training dataset include images with non-labeled objects that we do not want to detect, i.e. negative samples without bounding boxes (empty .txt files). (Credit to alexbe)
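With a Darknet/YOLO-style dataset layout, that just means creating an empty label file next to each negative image. A minimal sketch (the directory name and image extension are hypothetical):

```
from pathlib import Path

# Hypothetical folder holding background images that currently trigger false positives
negatives_dir = Path("data/negatives")

for img_path in negatives_dir.glob("*.jpg"):
    # An empty .txt file marks the image as a negative sample (nothing to detect)
    img_path.with_suffix(".txt").touch()
```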
In general, what we can do is hard negative mining, inspecting the confusion matrix, and data augmentation.

MXnet - ImageRecordIter and data augmentation for ROI-Pooling enabled CNN

How can I perform data augmentation when I use ROI-Pooling in a CNN network which I developed using MXNet?
For example, suppose I have a ResNet-50 architecture which uses an ROI-pooling layer, and I want to use random-crop data augmentation in the ImageRecordIter.
Is there an automatic way for the ROI coordinates passed to the ROI-pooling layer to be transformed so that they match the images generated by the data augmentation process of the ImageRecordIter?
You should be able to repurpose the ImageDetRecordIter for this. It is intended for use with Object Detection data containing bounding boxes, but you could define the bounding boxes as your ROIs. And now when you apply augmentation operations (such as flip and rotation), the coordinates of the bounding boxes will be adjusted in-line with the images.
Otherwise you can easily write your own transform function using Gluon, and can make use of any OpenCV augmentation to apply to both your image and ROIs. Just write a function that takes data and label, and returns the augmented data and label.
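A minimal sketch of such a Gluon transform, assuming the label is a NumPy array of ROI boxes in pixel (x1, y1, x2, y2) format (the label layout and the flip-only augmentation are just examples):

```
import mxnet as mx
import numpy as np

def flip_augment(data, label):
    # data:  HxWxC image as an mx.nd.NDArray
    # label: NumPy array of ROI boxes, one row per box: [x1, y1, x2, y2]
    if np.random.rand() < 0.5:
        data = mx.nd.flip(data, axis=1)          # flip along the width axis
        w = data.shape[1]
        label = label.copy()
        label[:, [0, 2]] = w - label[:, [2, 0]]  # mirror the x coordinates and keep x1 < x2
    return data, label

# Applied lazily per sample, e.g.: dataset = dataset.transform(flip_augment)
```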

Caffe - Image augmentation by cropping

The cropping strategy of Caffe is to apply a random crop for training and a center crop for testing.
From experiments, I observed that recognition accuracy improves if I provide two cropped versions (random and center) of the same image during training. These experimental data (size 100x100) are generated offline (not using Caffe) by applying random and center cropping to a 115x115 image.
I would like to know how to perform this task in Caffe.
Note: I was thinking of using 2 data layers, each with a different cropping (center and random), and then concatenating them. However, I found that Caffe does not allow center cropping during training.
The easy answer would be to prepare another, already-cropped copy of your training data, center-cropped to 100x100. Then mix this dataset with your original data and train. In this way, random cropping of these new images is effectively a no-op and gives you the center crops.
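For the offline pre-cropping step, a minimal sketch using Pillow (the folder names and image extension are hypothetical; 115x115 -> 100x100 center crop as in your setup):

```
from pathlib import Path
from PIL import Image

src_dir = Path("train_115")            # original 115x115 images (hypothetical folder)
dst_dir = Path("train_115_center100")  # pre-cropped copies to mix back into the training set
dst_dir.mkdir(exist_ok=True)

crop = 100
for img_path in src_dir.glob("*.png"):
    img = Image.open(img_path)
    left = (img.width - crop) // 2
    top = (img.height - crop) // 2
    img.crop((left, top, left + crop, top + crop)).save(dst_dir / img_path.name)
```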
A more complex way is hand-crafting your batches using the Caffe APIs (MATLAB and Python) and feeding the hand-crafted batches to the network on the fly.
You can check this link for different ways to achieve this.