single-class or multi-class object detection for a specific class object? - deep-learning

One thing I have wondered about for a long time is how a CNN-based object detector performs when trained for a single class versus multiple classes.
For example, suppose I want to design a pedestrian detector using the well-known Faster R-CNN (VGG-16). The official version detects pedestrians with 76.7 AP (PASCAL 07 test) when the training data is PASCAL VOC07+12 trainval.
I am quite satisfied with these detection results, but what if I convert the framework into a single-class pedestrian detector and train it only on data containing pedestrians, so that both the training and testing sets are smaller?
I know it will need less computation than the original 20-class version, but I am curious about the detection performance.
Has anybody compared single-class and multi-class detectors on the same class?
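For anyone who wants to run this comparison, the single-class training set described above can be derived from the multi-class annotations by dropping every box that is not a pedestrian (PASCAL VOC labels this class "person"). This is only a minimal sketch: the annotation layout (a list of dicts with "image" and "boxes" keys) and the function name to_single_class are assumptions made for illustration, not the format used by Faster R-CNN or the VOC tools.

```python
def to_single_class(annotations, target="person", keep_empty=False):
    """Keep only boxes labelled `target` (hypothetical annotation layout)."""
    filtered = []
    for ann in annotations:
        boxes = [box for box in ann["boxes"] if box["label"] == target]
        # Images without the target class are dropped unless keep_empty is set;
        # keeping them provides pure-background examples.
        if boxes or keep_empty:
            filtered.append({"image": ann["image"], "boxes": boxes})
    return filtered


# Toy annotations, invented for this example.
sample = [
    {"image": "a.jpg", "boxes": [{"label": "person", "bbox": (10, 20, 50, 120)},
                                 {"label": "dog", "bbox": (60, 80, 90, 110)}]},
    {"image": "b.jpg", "boxes": [{"label": "car", "bbox": (0, 0, 100, 60)}]},
]
pedestrians_only = to_single_class(sample)
```

Whether images without pedestrians are kept (keep_empty=True) changes how many background examples the detector sees, which may be one reason the single-class and multi-class results differ.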

Yes, but the results vary quite a bit according to model and application. I've done this with several SVM applications and one CNN. As expected, the single-class version consumed fewer resources in every case.
However, the results were quite different. One SVM actually did better in single-class training; two were significantly worse, and the other 3-4 were about the same (within expected error range).
The CNN didn't fare so well; it needed some tweaks to the topology.

Related

When using the reinforcement learning model ddpg, the input data are sequence data

When using the reinforcement learning model DDPG, the input data are sequence data with a high-dimensional (21-dimensional) state and a low-dimensional (1-dimensional) action. Does this have any negative impact on training the model? If so, how can I solve it?
In general, in any machine learning scenario, dimensionality per se is not a problem; it is mostly a matter of how much variability there is in the input data. Of course, higher-dimensional data can have much higher variability than lower-dimensional data.
Even considering this, the problem can "easily" be solved by feeding more data to the ML algorithm and increasing the complexity that it is allowed to represent (i.e. more nodes and/or layers in a neural network).
In RL, this is even less of a problem because you don't really have a restriction on how much data you actually have. You can always run your agent some more on the environment to get more sample trajectories to train on. The only issue you might find here is that your computing time grows a lot (depending on how much more you need to train on the environment for this problem).
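The point about gathering more trajectories can be sketched as follows. Everything here is a stand-in: ToyEnv is a made-up environment with a 21-dimensional state and a 1-dimensional action, the policy is random, and collect is a hypothetical helper. A real DDPG setup would use its own environment, actor network and replay buffer.

```python
from collections import deque

import numpy as np


class ToyEnv:
    """Hypothetical environment: 21-dim state, 1-dim action."""

    def __init__(self, seed=0, horizon=50):
        self.rng = np.random.default_rng(seed)
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.state = self.rng.normal(size=21)
        return self.state

    def step(self, action):
        self.t += 1
        # Arbitrary dynamics, chosen only so the example runs.
        self.state = 0.9 * self.state + 0.1 * self.rng.normal(size=21)
        reward = -float(abs(action[0] - self.state[0]))
        done = self.t >= self.horizon
        return self.state, reward, done


def collect(env, policy, buffer, episodes):
    """Run `episodes` more rollouts and append the transitions to `buffer`."""
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state
    return buffer


policy_rng = np.random.default_rng(1)
random_policy = lambda state: policy_rng.uniform(-1.0, 1.0, size=1)

replay = deque(maxlen=100_000)
collect(ToyEnv(), random_policy, replay, episodes=4)
```

Calling collect again with more episodes simply grows the buffer. The cost is the extra environment time mentioned above.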

6D pose estimation of a known 3D CAD object with limited model training for a new object

I'm working on a project where I need to estimate the 6DOF pose of a known 3D CAD object in a single RGB image - i.e. this task: https://paperswithcode.com/task/6d-pose-estimation. There are several constraints on the problem:
Usable commercially (licensed under BSD, MIT, BOOST, etc.), not GPL.
The CAD object is known and we do NOT aim for generality (e.g. recognizing the class of all chairs).
The CAD object can be uploaded by a user, so it may have symmetries and a range of textures.
Inference step will be run on a smartphone, and should be able to run at >30fps.
The inference step can either be a) find the pose of the object once and then I can write code to continue to track it or b) find the pose of the object continuously. I.e. the model doesn't need to have any continuous refinement steps after the initial pose estimate is found.
Can be anywhere on the scale from a single instance of a single object to multiple instances of multiple objects (MiMo). MiMo is preferred, but not required.
If a deep learning approach is used, the training time required for a new CAD object should be on the order of hours, not days.
Can either 1) just find the initial pose of an object and not have any refinement steps after or 2) find the initial pose of the object and also have refinement steps after.
I am open to traditional approaches (e.g. 2D->3D correspondences, then solving with PnP), but it seems that deep learning approaches outperform them (the classical ones are too slow; see "Real time 6D pose estimation of known 3D CAD objects from a single 2D image or point clouds from RGBD Camera when objects are one on top of the other?"). Looking at deep learning approaches (PoseCNN, HybridPose, Pix2Pose, CosyPose), it seems most of them match these constraints, except that they require model training time. Perhaps I can use a single pre-trained model and then specialize it for each new CAD object with a shorter training step, but I am not sure of this, and I think success probably depends on the specific model chosen. For example, this project says it requires 3 hours of training time: https://github.com/DLR-RM/AugmentedAutoencoder.
So, my question: does anybody know of a state-of-the-art, commercially usable implementation that doesn't require extensive training time for a new CAD object?
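For reference, the classical route mentioned above (2D->3D correspondences followed by a PnP solve) can be sketched with NumPy alone. This is a minimal illustration, not a production solver: it uses a plain Direct Linear Transform and assumes known camera intrinsics K, at least six non-coplanar correspondences and noise-free points. The names solve_pnp_dlt and project, and all numeric values, are made up for the example.

```python
import numpy as np


def project(points_3d, R, t, K):
    """Pinhole projection of Nx3 object points to Nx2 pixel coordinates."""
    cam = points_3d @ R.T + t            # object frame -> camera frame
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def solve_pnp_dlt(object_points, image_points, K):
    """Estimate rotation R and translation t from 2D-3D correspondences (DLT)."""
    n = object_points.shape[0]
    pixels_h = np.hstack([image_points, np.ones((n, 1))])
    normalized = pixels_h @ np.linalg.inv(K).T     # remove the intrinsics
    X = np.hstack([object_points, np.ones((n, 1))])
    A = np.zeros((2 * n, 12))
    for i in range(n):
        x, y = normalized[i, 0], normalized[i, 1]
        A[2 * i, 0:4] = X[i]
        A[2 * i, 8:12] = -x * X[i]
        A[2 * i + 1, 4:8] = X[i]
        A[2 * i + 1, 8:12] = -y * X[i]
    # The pose [R|t], up to scale, spans the null space of A.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    # Recover the scale and project the 3x3 part onto a rotation matrix.
    U, S, Vt_r = np.linalg.svd(P[:, :3])
    R = U @ Vt_r
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:             # resolve the sign ambiguity
        R, t = -R, -t
    return R, t


# Synthetic check with an invented pose and camera.
rng = np.random.default_rng(0)
object_points = rng.uniform(-0.5, 0.5, size=(10, 3))
cz, sz, cx, sx = np.cos(0.3), np.sin(0.3), np.cos(-0.4), np.sin(-0.4)
Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
R_true, t_true = Rz @ Rx, np.array([0.1, -0.2, 5.0])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
image_points = project(object_points, R_true, t_true, K)
R_est, t_est = solve_pnp_dlt(object_points, image_points, K)
```

With real, noisy detections a robust solver with outlier rejection would normally replace this linear estimate. Finding the correspondences in the first place is the harder part of the problem.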

deep learning concept - hyperparameter tuning weights RNN/LSTM

When we build a model and train it, the initial weights are randomly initialized, unless specified (seed).
As we know, there are a variety of parameters we can adjust like epochs, optimizers, batch_size, etc to find the "best" model.
The concept I have trouble with is: Even if we do find the best model after tuning, the weights will be different, yielding different models and results. So the best model for this maybe wouldn't be the best if we compiled and ran it again with the "best parameters". If we seed the weights with the parameters for reproducibility, we don't know if those would be the best weights. On the other hand, if we tune the weights, then the "best parameters" won't be best parameters anymore? I am stuck in a loop. Is there a general guideline on what parameters to tune first as opposed to others?
Or is this whole logic flawed somewhere and I am way overthinking?
We initialize weights randomly to ensure that each node behaves differently from the others (breaking symmetry).
Depending on the hyperparameters (epochs, batch size, number of iterations, etc.), the weights are updated until the iterations run out. In the end, the updated weights are what we call the model.
The seed controls the randomness of the initialization. If I'm not wrong, a good learning algorithm (objective function and optimizer) converges irrespective of the seed value.
Again, a good model means tuning all the hyperparameters and making sure that the model is not underfitting.
On the other hand, the model shouldn't overfit either.
There is no such thing as the best parameters (weights, biases); we need to keep tuning the model until the results are satisfactory, and the main part of the work is data processing.
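One way to act on this, sketched under stated assumptions: instead of judging a hyperparameter setting by a single seeded run, train it with several seeds and compare the mean and spread of the validation score. The tiny NumPy network, the synthetic XOR-like data and the function names below are all made up for illustration and stand in for the RNN/LSTM from the question.

```python
import numpy as np


def make_data(n=200, seed=0):
    """Synthetic XOR-like binary problem (invented for this example)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)
    return X, y


def train_and_score(hidden, lr, seed, epochs=200):
    """Train a one-hidden-layer net from a seeded init; return validation accuracy."""
    X, y = make_data()
    X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]
    rng = np.random.default_rng(seed)          # the seed only affects the init
    W1 = rng.normal(scale=0.5, size=(2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X_tr @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        g = (p - y_tr) / len(y_tr)             # gradient of mean cross-entropy w.r.t. logits
        gW2, gb2 = h.T @ g, g.sum()
        gh = np.outer(g, W2) * (1.0 - h ** 2)
        gW1, gb1 = X_tr.T @ gh, gh.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    logits = np.tanh(X_va @ W1 + b1) @ W2 + b2
    return float(((logits > 0) == (y_va > 0.5)).mean())


def evaluate_config(hidden, lr, seeds=(0, 1, 2, 3, 4)):
    """Mean and standard deviation of the score over several initial seeds."""
    scores = np.array([train_and_score(hidden, lr, s) for s in seeds])
    return scores.mean(), scores.std()


mean_small, std_small = evaluate_config(hidden=4, lr=0.5)
mean_large, std_large = evaluate_config(hidden=16, lr=0.5)
```

If the gap between two settings is smaller than the spread across seeds, the data does not really support calling one of them "best".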

How do I verify that my model is actually functioning correctly in deep learning?

I have a dataset of around 6K chemical formulas which I am preprocessing via Keras' tokenization to perform binary classification. I am currently using a 1D convolutional neural network with dropouts and am obtaining an accuracy of 82% and validation accuracy of 80% after only two epochs. No matter what I try, the model just plateaus there and doesn't seem to be improving at all. Those same exact accuracies are reached with a vanilla LSTM too. What else can I try to improve my accuracies? Losses only have a difference of 0.04... Anyone have any ideas? Both models use an embedding layer and changing the output dimension isn't having an effect either.
According to your question, I believe your model has high bias and low variance (see this link for further details). Thus, your model is not fitting your data very well, which is underfitting. So, I suggest three things:
Train your model a little longer: I believe two epochs are too few to give your model a chance to understand the patterns in the data. Try lowering the learning rate and increasing the number of epochs.
Try a different architecture: you may change the number of convolutions, filters, and layers. You can also use different activation functions and other layers such as max pooling.
Do an error analysis: once you have finished training, apply your model to the test set and take a look at the errors. How many false positives and false negatives do you have? Is your model better at classifying one class than the other? Can you see a pattern in the errors that may be related to your data?
Finally, if none of these suggestions help, you may also try to increase the number of features, if possible.
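The error analysis suggested above can start from something as small as the sketch below, which counts false positives and false negatives and reports recall per class. The function name binary_error_report and the toy labels are made up for illustration. In practice y_pred would come from thresholding the model's predictions on the test set.

```python
import numpy as np


def binary_error_report(y_true, y_pred):
    """Confusion counts and per-class recall for 0/1 labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))      # positives found
    tn = int(np.sum(~y_true & ~y_pred))    # negatives found
    fp = int(np.sum(~y_true & y_pred))     # false alarms
    fn = int(np.sum(y_true & ~y_pred))     # missed positives
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "recall_positive": tp / (tp + fn) if (tp + fn) else float("nan"),
        "recall_negative": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
report = binary_error_report(y_true, y_pred)
```

A large gap between the two recall values suggests the model favors one class. Looking at the misclassified formulas themselves is then the next step.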

deep learning: How do I know my net is not memorizing

I have a convolutional neural network and my input data are 10,000 images of the same object from different views (angles in 3D around the object). My network converges, but I am not sure whether the network has memorized all the different angles / views or not. Since I only have one object, I cannot really test it with different data.
My training / test plot looks like this (red: training, green: test):
Since the test curve is lower than the training curve, should I expect that the network has learned all the images by heart, even though I have 10,000 somewhat different images?
First, "memorize" is not a term we apply to the learning process, since it's not exact regurgitation of prior examples.
This is a matter of your experimental process. You get to define the success criteria. Is 95% accuracy good enough for your intended application? What, to you, is good enough performance to declare success?
One way to build a more convincing argument is to make the typical third partition: besides training and test sets, save part of your data for validation. You do the training and testing as you've already done. When the model has converged, you apply it to the validation set to predict results. If that test passes your success criterion, then you have a finished model.
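The three-way partition described above can be sketched as an index split. The fractions, the seed and the function name three_way_split are arbitrary choices for illustration. The naming follows the answer, where the "validation" part is the one held back until the model has converged.

```python
import numpy as np


def three_way_split(n_samples, test_frac=0.15, val_frac=0.15, seed=0):
    """Shuffle indices once and cut them into train / test / validation parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_frac))
    n_test = int(round(n_samples * test_frac))
    val_idx = idx[:n_val]                      # untouched until the very end
    test_idx = idx[n_val:n_val + n_test]       # monitored during training
    train_idx = idx[n_val + n_test:]
    return train_idx, test_idx, val_idx


train_idx, test_idx, val_idx = three_way_split(10_000)
```

The held-back indices are used only once, after training has converged, to check the success criterion.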