I am currently working on vehicle detection using SSD MobileNet with the TensorFlow Object Detection API. I have made a custom dataset from the COCO dataset which comprises all the vehicle categories in COCO (car, bicycle, motorcycle, bus, and truck), and I also have a dataset of 730 rickshaw images.
Ultimately my goal is to detect rickshaws along with other vehicles as well. But so far I have failed.
There are a total of 16,000 instances in train_labels.csv; on average each class has 2,300 instances. I have set the batch size to 12 and train the COCO pre-trained model on my custom dataset for 12,000 steps.
Unfortunately I have not been able to get good results: after training, the model fails to classify even the other vehicles.
Any advice regarding the ratio of each class in the dataset, whether I need more rickshaw images, how many layers I should freeze, or even a different perspective, would be highly appreciated.
Since you have a custom dataset of 730 rickshaw images, I think there is no need to extract a separate dataset of other vehicles from COCO for fine-tuning.
What I mean is that the TensorFlow pre-trained model is already good at detecting all the other vehicles; your task is just to teach the model how to detect rickshaws.
Another option: since you already have a vehicle dataset, you can try training a model on it starting from the COCO checkpoints.
https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
Go through the above article; it will give you a fair idea of the end-to-end flow. The author tuned an SSD MobileNet model trained on the COCO dataset to detect raccoons, which was the only new class he wanted to detect. In your case, you just have to replace the raccoon images with rickshaw images and follow the exact same steps. The author used Google Cloud, but you can change the config file to tune it on a local machine. Considering you have only 730 new images, tuning it shouldn't take long.
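For local tuning, the parts of the pipeline config you would typically touch look roughly like this (a sketch for the TF1-era API the article uses; all paths are placeholders and field names can differ slightly between versions):

    model {
      ssd {
        num_classes: 1  # just the one new class if you follow the raccoon recipe
      }
    }
    train_config {
      batch_size: 12
      num_steps: 12000
      fine_tune_checkpoint: "path/to/ssd_mobilenet_coco/model.ckpt"  # placeholder
    }
    train_input_reader {
      label_map_path: "path/to/label_map.pbtxt"                     # placeholder
      tf_record_input_reader { input_path: "path/to/train.record" }
    }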
This is another good example in case things are not clear: https://towardsdatascience.com/building-a-toy-detector-with-tensorflow-object-detection-api-63c0fdf2ac95
Coming to your question about whether you need more data: more data is always better. What I would suggest is to tune the model using the steps above and check the mAP. If the mAP is low and the performance is not enough for your intended application, collect more data and tune again.
Please let me know if you have any questions.
Related
I have a dataset of 10k-15k pictures for supervised object detection which is very different from ImageNet or COCO (the pictures are much darker and represent completely different, industry-related things).
The model currently used is a Faster R-CNN which extracts features with a ResNet backbone.
Could training the backbone of the model from scratch in one stage and then training the whole network in another stage be beneficial for the task, instead of loading the network pre-trained on COCO and then retraining all the layers in a single stage?
From my experience, here are some important points:
your train set is not big enough to train the detector from scratch (though this depends on the network configuration; Faster R-CNN + ResNet-18 can work); it is better to use a network pre-trained on ImageNet;
the domain the network was pre-trained on is not really that important. The network, especially a big one, needs to learn all those arcs, circles, and other primitive shapes in order to use that knowledge for detecting more complex objects;
the brightness of your train images can be important but is not something to stop you from using a pre-trained network;
training from scratch requires many more epochs and much more data. The longer the training, the more sophisticated your LR control should be; at a minimum, it should not be constant, and it should change the LR based on the cumulative loss. The initial settings depend on multiple factors, such as network size, augmentations, and the number of epochs;
I played a lot with Faster R-CNN + ResNet (various numbers of layers) and other networks. I recommend you use Mask R-CNN instead of Faster R-CNN; just configure it not to use the masks and not to do segmentation. I don't know why, but it gives much better results.
don't spend your time on MobileNet; with your train set size you will not be able to train it to a reasonable AP and AR. Start with Mask R-CNN and a ResNet-18 backbone (see the sketch after this list).
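A minimal sketch of that last recommendation, assuming PyTorch/torchvision (the thread does not name a framework; in recent torchvision versions the 'pretrained' argument is called 'weights'), together with a loss-driven LR schedule as suggested above:

    import torch
    from torchvision.models.detection import MaskRCNN
    from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

    # ResNet-18 backbone with FPN, initialized from ImageNet weights
    backbone = resnet_fpn_backbone("resnet18", pretrained=True)

    # num_classes includes the background class
    model = MaskRCNN(backbone, num_classes=2)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    # lower the LR when the accumulated training loss stops improving
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=3)
    # in the training loop, call scheduler.step(epoch_loss) once per epoch

Note that torchvision's MaskRCNN expects mask annotations during training; switching the mask head off as described above is a single flag in frameworks like Detectron2 (MODEL.MASK_ON), otherwise you can simply ignore the predicted masks at inference time.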
I am working with neural networks for object classification right now, creating datasets for training and validation. I want to know if it is possible to create two training datasets comprising two completely different objects and labels (e.g. dataset 1 has cars and dataset 2 has cats). Does that still work, or should I create datasets where each file mixes both object types and labels across all the training files? Does such mixing/separation matter if I am training the network in one cycle with different datasets?
Depending on what you are using to train, many APIs (such as the TensorFlow Object Detection API) read the TFRecord files (datasets) in order, which is why the examples are shuffled when the files are created beforehand. Shuffling is quite important for training: otherwise the model starts training on one class alone, and then trains for a while on another class alone. It should reach the same standard eventually, but it is much better for the model to see an even distribution of classes across the training steps.
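A small sketch of that mixing with tf.data (an assumption on my part; the record file names are placeholders):

    import tensorflow as tf

    # two hypothetical single-class record files
    files = tf.data.Dataset.from_tensor_slices(["cars.tfrecord", "cats.tfrecord"])

    dataset = (
        files.interleave(tf.data.TFRecordDataset, cycle_length=2)  # alternate files
             .shuffle(buffer_size=10000)                           # mix classes in a buffer
             .batch(32)
    )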
I'm quite new to Caffe, and this could be a nonsense question.
I have trained my network from scratch. It trains well and gets reasonable accuracy in tests. The question is about retraining or fine-tuning this network. Suppose you have new image samples of the same original categories and you want to teach the net with these new images (because, for example, the net fails to predict on these particular images).
As far as I know, it is possible to resume training with a snapshot and solverstate, or to fine-tune using only the weights of the trained model. What is the best option in this case? Or is it better to retrain the net with the original images and the new ones together?
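For reference, the two mechanisms look roughly like this in pycaffe (a sketch; the file names are placeholders, and the two options are alternatives):

    import caffe

    # Option 1: resume training exactly where it stopped
    # (restores the weights *and* the solver state: iteration count, LR, history)
    solver = caffe.SGDSolver("solver.prototxt")
    solver.restore("snapshots/net_iter_10000.solverstate")
    solver.solve()

    # Option 2: fine-tune from the weights only
    # (fresh solver state, so a new and usually smaller base_lr applies)
    solver = caffe.SGDSolver("solver.prototxt")
    solver.net.copy_from("snapshots/net_iter_10000.caffemodel")
    solver.solve()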
Think of a possible "incremental training" scheme, because not all the cases for a particular category are available in the initial training. Is it possible to retrain the net only with the new samples? Should I change the learning rate or keep any parameters fixed in order to maintain the original prediction accuracy when training with the new samples? The net should behave the same on the original image set after fine-tuning.
Thanks in advance.
Hello, all experts.
I am new to CNNs and Caffe. I have a binary classification task between two classes. The dataset that I have collected is very small: about 50 images for class A and 50 for class B (I know that it is very, very small). They are images of humans.
I took the BVLC reference model and made changes such as the batch size for testing and training and the maximum number of iterations. I have tried many different setups, but the model doesn't work.
Any idea how to come up with an appropriate model or settings, or any other solutions?
Remark: I once randomly made a change to the BVLC model setup and it worked, but I lost the setup file.
I got the train.prototxt and solver.prototxt from Adil Moujahid.
I tried training batch sizes of 32, 64, 128, and 256, and test batch sizes of 5, 20, and 30, but failed.
As for the dataset, it is images of normal women and beautiful women that I want to classify, but Stack Overflow does not allow me to add more than 2 links.
I wonder whether there is any formula, equation, or procedure I can use to choose the right model settings.
Thank you in advance.
What do you mean by "doesn't work"? Does the loss stay too high? Does training converge but accuracy stay low? Andrew Ng has an excellent session on "debugging" CNNs: Nuts and Bolts of Building Applications using Deep Learning (NIPS slides, summary, additional summary).
My humble guess is that your network is overfitting: it learns the specific examples and can't generalize. Increasing the training dataset, regularization, or data augmentation can help.
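For example, even simple offline augmentation can multiply a ~100-image dataset severalfold before it ever reaches Caffe (a sketch using Pillow; the paths are placeholders):

    import os
    from PIL import Image

    def augment(path, out_dir):
        img = Image.open(path)
        base = os.path.splitext(os.path.basename(path))[0]
        # horizontal mirror
        img.transpose(Image.FLIP_LEFT_RIGHT).save(
            os.path.join(out_dir, base + "_flip.jpg"))
        # small rotations
        for angle in (-10, 10):
            img.rotate(angle).save(
                os.path.join(out_dir, "%s_rot%d.jpg" % (base, angle)))
        # central crop at ~90% of each dimension, resized back
        w, h = img.size
        img.crop((w // 20, h // 20, w - w // 20, h - h // 20)) \
           .resize((w, h)).save(os.path.join(out_dir, base + "_crop.jpg"))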
I am trying to train a learning model to recognize one specific scene. For example, say I would like to train it to recognize pictures taken at an amusement park, and I already have 10,000 pictures taken at amusement parks. I would like to train this model with those pictures so that it can score other pictures by the probability that they were taken at an amusement park. How do I do that?
Considering this is an image recognition problem, I would probably use a convolutional neural network, but I am not quite sure how to train it in this case.
Thanks!
There are several possible ways. The most trivial one is to collect a large number of negative examples (images from other places) and train a two-class model.
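A minimal sketch of that first option, assuming Keras and an ImageNet-pretrained backbone (the specific architecture here is my choice, not prescribed above):

    import tensorflow as tf

    # frozen ImageNet-pretrained feature extractor + a binary head
    base = tf.keras.applications.MobileNetV2(
        include_top=False, pooling="avg", input_shape=(224, 224, 3))
    base.trainable = False

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(amusement park)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=10)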
The second approach would be to train a network to extract meaningful low-dimensional representations from an input image (embeddings). Here you can use siamese training to explicitly train the network to learn similarities between images. Such an approach is employed for face recognition, for instance (see FaceNet). Having such embeddings, you can use some well-established methods for outlier detection, for instance a one-class SVM, or any other classifier. In this case you also need negative examples.
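The outlier-detection step might look like this (a sketch assuming scikit-learn; embed, park_images, and new_image are hypothetical stand-ins for your embedding function and data):

    import numpy as np
    from sklearn.svm import OneClassSVM

    # fit on embeddings of positive (amusement-park) images only
    park_embeddings = np.stack([embed(img) for img in park_images])
    clf = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale").fit(park_embeddings)

    # higher decision values mean "more park-like"
    score = clf.decision_function(embed(new_image).reshape(1, -1))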
I would heavily augment your data using image cropping - it is the most obvious way to increase the amount of training data in your case.
In general, your success in this task strongly depends on the task statement (are you restricted to amusement parks only, or any kind of place?) and on having the proper data.