Faster RCNN in Python cfg file - caffe

What does PROPOSAL_METHOD: gt signify in the faster_rcnn_end2end.yml file?
Does it mean that the ROI proposals are the ground-truth bounding boxes only?
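For context, the setting lives under the TRAIN section of the config. The excerpt below is illustrative rather than a verbatim copy of the file:

    # illustrative excerpt from faster_rcnn_end2end.yml
    TRAIN:
      HAS_RPN: True
      PROPOSAL_METHOD: gt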

Related

Darknet and Data Augmentation

In the Darknet deep learning framework's .cfg files we see parameters like
angle, saturation, exposure
These parameters are used for data augmentation in classic image classification problems.
Does Darknet automatically perform image augmentation for object detection when these parameters are set?
Yes. By default, Darknet performs data augmentation automatically. For example, during training the input image is rotated by a random angle, up to the specified maximum. The parameters in the .cfg file control those limits.
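For reference, these limits live in the [net] section of the .cfg file; the values below are the ones shipped with yolov3.cfg, and the comments give the usual interpretation:

    [net]
    # data augmentation limits
    angle=0          # max random rotation in degrees (0 disables rotation)
    saturation=1.5   # saturation jittered by a factor in [1/1.5, 1.5]
    exposure=1.5     # exposure (brightness) jittered by a factor in [1/1.5, 1.5]
    hue=.1           # hue shifted by a value in [-0.1, 0.1]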

3D annotation for instance segmentation

I'm trying to annotate some data for 3D instance segmentation. While it's fairly straightforward to draw masks for each 2D plane, it's not obvious how to connect the same "instances" together post-annotation (i.e. connect the "red" masks together, connect the "blue" masks together) without laboriously making sure the instances are instance-matched (i.e. colour-coded so that "red" masks always connect with "red" masks).
A naive approach I have thought of is to make many 2D segmentation masks and calculate the center of mass for each object detected. I can later re-assign the instances based on the closest matching center of mass, but I worry this would inadvertently generate "crossed-over" segmentation instances, where a mask ends up linked to a neighbouring object in the next slice. What are some high-throughput strategies to generate 3D annotations?
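A minimal sketch of that centroid-matching baseline, assuming each slice is a labelled 2D integer array; scipy and the greedy nearest-centroid assignment are my choices here, not something from the question:

    import numpy as np
    from scipy import ndimage

    def match_slices(labels_a, labels_b, max_dist=20.0):
        # Greedily link each instance in slice B to the nearest instance
        # (by center of mass) in slice A; returns {label_in_b: label_in_a}.
        ids_a = [i for i in np.unique(labels_a) if i != 0]
        ids_b = [i for i in np.unique(labels_b) if i != 0]
        com_a = {i: np.array(ndimage.center_of_mass(labels_a == i)) for i in ids_a}
        com_b = {i: np.array(ndimage.center_of_mass(labels_b == i)) for i in ids_b}
        mapping = {}
        for b, cb in com_b.items():
            dists = {a: np.linalg.norm(cb - ca) for a, ca in com_a.items()}
            if dists and min(dists.values()) <= max_dist:
                mapping[b] = min(dists, key=dists.get)
            # otherwise: treat b as the start of a new instance
        return mapping

Note that greedy matching can produce exactly the crossed-over links described above; replacing the greedy step with an optimal assignment over the full distance matrix (scipy.optimize.linear_sum_assignment) is a common fix.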
The boundaries of your 2D slices could be used as constraints to obtain the optimal 3D surface, as proposed in [1].
However, I think it is easier to generate 3D labels from markers, as in [2]. Its implementation is available here (feel free to open an issue if you encounter any problems :P).
Also, the napari package could be useful for developing a GUI without much effort.
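As a starting point, napari's labels layer already gives you 3D painting with per-instance integer IDs; a minimal sketch (the volume here is just a random placeholder):

    import numpy as np
    import napari

    volume = np.random.rand(64, 256, 256)            # placeholder 3D image
    labels = np.zeros(volume.shape, dtype=np.int32)  # one integer ID per instance

    viewer = napari.Viewer()
    viewer.add_image(volume, name='volume')
    viewer.add_labels(labels, name='instances')      # paint the same ID across slices
    napari.run()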
[1] Grady, Leo. "Minimal surfaces extend shortest path segmentation methods to 3D." IEEE Transactions on Pattern Analysis and Machine Intelligence 32.2 (2008): 321-334.
[2] Falcão, Alexandre X., and Felipe PG Bergo. "Interactive volume segmentation with differential image foresting transforms." IEEE Transactions on Medical Imaging 23.9 (2004): 1100-1108.
You can use 3D Slicer's Segment Editor. It is free, open-source, has many built-in tools, and is customizable/extensible in Python or C++ (you can plug in your own segmentation method with minimal effort). To solve a segmentation task, you typically first figure out a good segmentation workflow (which tools to use, in what combination, and with what parameters) using the interactive GUI; then, if necessary, you can make it semi-automatic or fully automatic using Python scripting.
You could create a segmentation by contouring every image slice, but that would be too tedious. Instead, you can use 3D region growing (the Grow from seeds effect) or segment just a few slices and interpolate between them (the Fill between slices effect).
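The Fill between slices idea can also be prototyped outside Slicer. One common approach, sketched below with scipy (this is not Slicer's actual implementation), is to interpolate signed distance transforms of two annotated slices and re-threshold:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def signed_distance(mask):
        # mask: boolean 2D array; positive inside the mask, negative outside
        return distance_transform_edt(mask) - distance_transform_edt(~mask)

    def fill_between(mask_a, mask_b, n_between):
        # morphologically interpolate n_between masks between two annotated slices
        sdf_a, sdf_b = signed_distance(mask_a), signed_distance(mask_b)
        slices = []
        for i in range(1, n_between + 1):
            t = i / (n_between + 1)
            slices.append((1 - t) * sdf_a + t * sdf_b > 0)
        return slices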

CNN, neural network angle detection

I have a dataset of 1000 images of one soft toy taken from different angles. Example: dataset
I have to train a neural network to detect my toy and also output its angle. I want the output to be: class, probability, angle.
Example: wanted output
Is there a way to modify SSD or YOLO (in TensorFlow or Darknet) so that the framework/network also calculates and outputs the angle?
While searching the internet, I didn't find an example of a network that does something like that.
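One common pattern, in case it helps, is to treat the angle as one extra regression target in the detection head. The PyTorch sketch below is hypothetical (the layer sizes and the loss suggestion are my assumptions, not part of YOLO or SSD):

    import torch.nn as nn

    class HeadWithAngle(nn.Module):
        # Predicts, per anchor: 4 box coords + 1 objectness + C class scores + 1 angle.
        def __init__(self, in_channels, num_anchors, num_classes):
            super().__init__()
            per_anchor = 4 + 1 + num_classes + 1   # the final +1 is the angle
            self.conv = nn.Conv2d(in_channels, num_anchors * per_anchor, kernel_size=1)

        def forward(self, feats):
            return self.conv(feats)

During training you would add an extra loss term for the angle (e.g. smooth L1); regressing sin/cos of the angle instead of the raw value avoids the 0/360 degree wrap-around.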

Cutting down the size of a fasttext bin file

Currently the bin file for fastText, wiki.en.bin, is about 8GB. Is there a version about half this size? The bin file consists of the model and the pretrained vectors generated from a large wiki corpus. Is there a smaller en version that would make things easier for lower-range machines? Loading this up takes too much memory.
Or, to get a smaller bin file for use with fastText, should I train my own set of fastText vectors on a smaller corpus?
You can use the quantize command:
$ ./fasttext quantize -output wiki.en
This will drastically reduce the size of your model without losing too much accuracy.
Currently, the native Facebook fastText library supports quantization only for the supervised models used for classification, and cannot compress unsupervised models for embedding lookup trained e.g. on wiki.
However, I have created a package compress-fasttext that is able to significantly reduce the size of unsupervised fastText models. You can read more about it in this Medium post.
There are a few models of different sizes (10MB to 200MB) compressed with this package for English and Russian, and a set of tiny models for 101 other languages.
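Usage is roughly as follows, based on the package's README (the API may change, so treat this as a sketch):

    import gensim
    import compress_fasttext

    # load the original unsupervised .bin model via gensim
    big = gensim.models.fasttext.load_facebook_vectors('wiki.en.bin')

    # prune rare vocabulary/n-grams and product-quantize the remaining vectors
    small = compress_fasttext.prune_ft_freq(big, pq=True)
    small.save('wiki.en.small')

    # later: load the compressed model and use it like regular keyed vectors
    small = compress_fasttext.models.CompressedFastTextKeyedVectors.load('wiki.en.small')
    print(small['apple'][:5])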

Which is best for object localization among R-CNN, fast R-CNN, faster R-CNN and YOLO

What is the difference between R-CNN, Fast R-CNN, Faster R-CNN and YOLO in terms of the following:
(1) Precision on the same image set
(2) Run time, given the SAME IMAGE SIZE
(3) Support for Android porting
Considering these three criteria, which is the best object localization technique?
R-CNN is the daddy algorithm of all the algorithms mentioned; it really paved the way for researchers to build more complex and better algorithms on top of it.
R-CNN, or Region-based Convolutional Neural Network, consists of 3 simple steps:
Scan the input image for possible objects using an algorithm called Selective Search, generating ~2000 region proposals
Run a convolutional neural net (CNN) on top of each of these region proposals
Take the output of each CNN and feed it into a) an SVM to classify the region and b) a linear regressor to tighten the bounding box of the object, if such an object exists.
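Step 1 can be reproduced today with the Selective Search implementation in opencv-contrib-python; this is not the original R-CNN code, just an illustration of the proposal step:

    import cv2

    img = cv2.imread('input.jpg')
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(img)
    ss.switchToSelectiveSearchFast()   # the 'Quality' variant yields more proposals
    rects = ss.process()               # (x, y, w, h) region proposals
    print(len(rects))                  # typically on the order of ~2000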
Fast R-CNN:
Fast R-CNN immediately followed R-CNN. Fast R-CNN is faster and better by virtue of the following points:
Performing feature extraction over the image before proposing regions, thus running only one CNN over the entire image instead of 2000 CNNs over 2000 overlapping regions
Replacing the SVM with a softmax layer, thus extending the neural network for predictions instead of creating a new model
Intuitively it makes a lot of sense to drop the 2000 separate forward passes and instead run the convolution once, making boxes on top of the resulting feature map.
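That single shared pass is what RoI pooling makes possible; a minimal sketch using torchvision's ready-made op (not the original Caffe implementation):

    import torch
    from torchvision.ops import roi_pool

    feature_map = torch.randn(1, 256, 50, 50)  # one CNN pass over the whole image
    # proposals as (batch_index, x1, y1, x2, y2) in input-image coordinates
    boxes = torch.tensor([[0, 10.0, 10.0, 200.0, 200.0],
                          [0, 50.0, 80.0, 300.0, 260.0]])
    # spatial_scale maps image coordinates onto the feature map (stride 16 here)
    pooled = roi_pool(feature_map, boxes, output_size=(7, 7), spatial_scale=1.0 / 16)
    print(pooled.shape)  # torch.Size([2, 256, 7, 7])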
Faster R-CNN:
One of the drawbacks of Fast R-CNN was the slow Selective Search algorithm, so Faster R-CNN introduced the Region Proposal Network (RPN).
Here is how the RPN works:
At the last layer of an initial CNN, a 3x3 sliding window moves across the feature map and maps it to a lower dimension (e.g. 256-d)
For each sliding-window location, it generates multiple possible regions based on k fixed-ratio anchor boxes (default bounding boxes)
Each region proposal consists of:
an “objectness” score for that region and
4 coordinates representing the bounding box of the region
In other words, we look at each location in our last feature map and consider k different boxes centered around it: a tall box, a wide box, a large box, etc. For each of those boxes, we output whether or not we think it contains an object, and what the coordinates for that box are.
The 2k scores represent the softmax probability of each of the k bounding boxes being an "object". Notice that although the RPN outputs bounding box coordinates, it does not try to classify any potential objects: its sole job is still proposing object regions. If an anchor box has an "objectness" score above a certain threshold, that box's coordinates get passed forward as a region proposal.
Once we have our region proposals, we feed them straight into what is essentially a Fast R-CNN. We add a pooling layer, some fully-connected layers, and finally a softmax classification layer and bounding box regressor. In a sense, Faster R-CNN = RPN + Fast R-CNN.
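To make the k anchor boxes concrete, here is a small numpy sketch that generates them at one sliding-window position; the 3 scales x 3 ratios giving k = 9 mirror the paper's default setup, though the exact numbers here are illustrative:

    import numpy as np

    def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
        # returns k = len(scales) * len(ratios) anchors as (x1, y1, x2, y2),
        # all centered on the sliding-window position (cx, cy)
        boxes = []
        for s in scales:
            for r in ratios:            # r is the aspect ratio w / h
                w, h = s * np.sqrt(r), s / np.sqrt(r)
                boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
        return np.array(boxes)

    print(anchors_at(112, 112).shape)   # (9, 4) -> k = 9 anchors per location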
YOLO:
YOLO uses a single CNN for both classification and localising objects with bounding boxes.
In the end you will have a tensor of shape 1470, i.e. 7*7*30, and the structure of the CNN output is as follows:
The 1470 vector output is divided into three parts, giving the probability, confidence and box coordinates. Each of these three parts is also further divided into 49 small regions, corresponding to the predictions at the 49 cells that form the original image.
In the postprocessing step, we take this 1470-vector output from the network and keep the boxes whose probability is higher than a certain threshold.
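That three-way split can be written out explicitly for the original YOLO on PASCAL VOC (S = 7 grid, B = 2 boxes per cell, C = 20 classes):

    import numpy as np

    S, B, C = 7, 2, 20
    out = np.random.rand(S * S * (C + B * 5))                 # 7*7*30 = 1470 values

    probs = out[:S * S * C].reshape(S, S, C)                  # 980 class probabilities
    confs = out[S * S * C:S * S * (C + B)].reshape(S, S, B)   # 98 box confidences
    boxes = out[S * S * (C + B):].reshape(S, S, B, 4)         # 392 box coordinates

    # a box survives when class probability * confidence clears a threshold
    class_scores = confs[:, :, :, None] * probs[:, :, None, :]   # (7, 7, 2, 20)
    keep = class_scores > 0.2                                    # then apply NMS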
I hope this gives you an understanding of these networks. To answer your question on how their performance differs:
On the same dataset: you can be sure that the performance of these networks is in the order they were mentioned, with YOLO being the best and R-CNN being the worst.
Given the SAME IMAGE SIZE, the run time: Faster R-CNN achieved much better speeds and state-of-the-art accuracy. It is worth noting that although later models did a lot to increase detection speed, few managed to outperform Faster R-CNN by a significant margin. Faster R-CNN may not be the simplest or fastest method for object detection, but it is still one of the best performing. That said, researchers have used YOLO for video segmentation, and it is by far the best and fastest option there.
Support for Android porting: As far as my knowledge goes, TensorFlow has some Android APIs for porting, but I am not sure how these networks will perform, or even whether you will be able to port them at all. That again depends on the hardware and the data size. Could you please provide the hardware and the size so that I can answer more clearly?
The YouTube video tagged by #A_Piro gives a nice explanation too.
P.S. I borrowed a lot of material from Joyce Xu's Medium blog.
If you are interested in these algorithms, you should take a look at this lesson, which goes through the algorithms you named: https://www.youtube.com/watch?v=GxZrEKZfW2o
P.S. There is also a Fast YOLO, if I remember correctly haha!
I have been working with YOLO and FRCNN a lot. To me, YOLO has the best accuracy and speed, but if you want to do research on image processing, I would suggest FRCNN, as a lot of previous work was done with it, and for research you really want to be consistent.
For object detection, I am trying SSD + MobileNet. It has a good balance of accuracy and speed, so it can also be ported to Android devices easily with good FPS.
It has less accuracy than Faster R-CNN but more speed than the other algorithms.
It also has good support for Android porting.
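For the Android side, the usual route is to convert such a model to TensorFlow Lite and run it through the TFLite interpreter; a minimal sketch (the model path is a placeholder):

    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path='ssd_mobilenet.tflite')  # placeholder path
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    dummy = np.zeros(inp['shape'], dtype=inp['dtype'])   # e.g. (1, 300, 300, 3)
    interpreter.set_tensor(inp['index'], dummy)
    interpreter.invoke()

    for out in interpreter.get_output_details():
        print(out['name'], interpreter.get_tensor(out['index']).shape)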