Multiple pretrained networks in Caffe - deep-learning

Is there a simple way (e.g. without modifying caffe code) to load wights from multiple pretrained networks into one network? The network contains some layers with same dimensions and names as both pretrained networks.
I am trying to achieve this using NVidia DIGITS and Caffe.
EDIT: I thought it wouldn't be possible to do it directly from DIGITS, as confirmed by answers. Can anyone suggest a simple way to modify the DIGITS code to be able to select multiple pretrained networks? I checked the code a bit, and thought the training script would be a good place to start, but I don't have in-depth knowledge of Caffe, so I'm not sure what the best/quickest way to achieve this would be.

As Shai suggested, there was no way of doing this, so I decided to clone the official repository and make the appropriate changes. I changed the code so that multiple pretrained networks can be loaded by using a colon as separator.
I created a pull request on the official repository and my changes were then merged with the main branch of DIGITS, meaning it is now possible to use this functionality in DIGITS.

AFAIK there is no straight forward way of doing so.
However, you can use net surgery to load the pretrained models and manually assign their weights to the target net. Once you have a single net with all the weights initialized according to the various pretrained models, you can save it and use it as a single pretrained model for the rest of your work.

Related

Is it possible to use a stable-baselines model as the baseline for another model?

I recently trained a stable-baselines PPO model for a couple of days and it is performing well on test environments. Essentially, I am trying to iterate on this model. I was wondering if it is possible to use this model as a new baseline for future model training. So, instead of starting with some naive policy for my environment, it could use this model as a starting spot and potentially learn a better approach to solving the environment.
The answer is yes. You basically need to do following things to achieve this:
Save your PPO model after the training in any environment you used from stable-baseline repository.
Load the saved PPO model for your new environment training rather than create a new PPO model. In this case, the starting spot will be the trained policy and the policy will evolve from there for better.
One thing you might be interested in is Transfer Learning, which basically systematically does what you want to do (train a model in one environment and use that as a baseline for another new environment to save training time for the second environment). The key to make it work is make sure there is high similarity between the two environments. If they are totally different, the pre-trained policy from the first environment may not help much.
In addition, the network architecture of the PPO is the same when you do this. In reality, the optimal network architecture for different environments are different.

Stacking/chaining CNNs for different use cases

So I'm getting more and more into deep learning using CNNs.
I was wondering if there are examples of "chained" (I don't know what the correct term would be) CNNs - what I mean by that is, using e.g. a first CNN to perform a semantic segmentation task while using its output as input for a second CNN which for example performs a classification task.
My questions would be:
What is the correct term for this sequential use of neural networks?
Is there a way to pack multiple networks into one "big" network which can be trained in one a single step instead of training 2 models and combining them.
Also if anyone could maybe provide a link so I could read about that kind of stuff, I'd really appreciate it.
Thanks a lot in advance!
Sequential use of independent neural networks can have different interpretations:
The first model may be viewed as a feature extractor and the second one is a classifier.
It may be viewed as a special case of stacking (stacked generalization) with a single model on the first level.
It is a common practice in deep learning to chain multiple models together and train them jointly. Usually it calls end-to-end learning. Please see the answer about it: https://ai.stackexchange.com/questions/16575/what-does-end-to-end-training-mean

Custom translator - Model adjustment after training

I've used three parallel sentence files to train my custom translator model. No dictionary files and no tuning files too. After training is finished and I've checked test results, I want to make some adjustments in the model. And here are several questions:
Is it possible to tune the model after training? Am I right that the model can't be changed and the only way is to train a new model?
The best approach to adjusting the model is to use tune files. Is it correct?
There is no way to see an autogenerated tune file, so I have to provide my own tuning file for a more manageable tuning process. Is it so?
Could you please describe how the tuning file is generated, when I have 3 sentence files with different amount of sentences, which is: 55k, 24k and 58k lines. Are all tuning sentences is from the first file or from all three files proportionally to their size? Which logic is used?
I wish there were more authoritative answers on this, I'll share what I know as a fellow user.
What Microsoft Custom Translator calls "tuning data" is what is normally known as a validation set. It's just a way to avoid overfitting.
Is it possible to tune the model after training? Am I right that the model can't be changed and the only way is to train a new model?
Yes, with Microsoft Custom Translator you can only train a model based on the generic category you have selected for the project.
(With Google AutoML technically you can choose to train a new model based on one of your previous custom models. However, it's also not usable without some trial and error.)
The best approach to adjusting the model is to use tune files. Is it correct?
It's hard to make a definitive statement on this. The training set also has an effect. A good validation set on top of a bad training set won't get us good results.
There is no way to see an autogenerated tune file, so I have to provide my own tuning file for a more manageable tuning process. Is it so?
Yes, it seems to me that if you let it decide how to split the training set into the training set, tuning set and test set, you can only download the training set and the test set.
Maybe neither includes the tuning set, so theoretically you can diff them. But that doesn't solve the problem of the split being different between different models.
... Which logic is used?
Good question.

Is there an open source solution for Multiple camera multiple object (people) tracking system?

I have been trying to tackle a problem where I need to track multiple people through multiple camera viewpoints on a real-time basis.
I found a solution DeepCC (https://github.com/daiwc/DeepCC) on DukeMTMC dataset but unfortunately, this solution has been taken down because of data confidentiality issues. They were using Fast R-CNN for object detection, triplet loss for Re-identification and DeepSort for real-time multiple object tracking.
Questions:
1. Can someone share some other resources regarding the same problem?
2. Is there a way to download and still use the DukeMTMC database for multiple tracking problem?
3. Is anyone aware when the official website (http://vision.cs.duke.edu/DukeMTMC/) will be available again?
Please feel free to provide different variations of the question :)
Intel OpenVINO framewors has all part of this task:
Objects detection with pretrained Faster RCNN, SSD or YOLO.
Reidentification models.
And complete demo application.
And you can use another models. Or if you want to use detection on GPU then take opencv_dnn_cuda for detection and OpenVINO for reidentification.
A good deep learning library that I have used in the past for my work is called Mask R-CNN, or Mask Regions-Convolutional Neural-Network. Although I have only used this algorithm on images and not on videos, the same principles apply, and it's very easy to make the transition to detection objects in a video. The algorithm uses Tensorflow and Keras, where you can split your input data, i.e images of people, into two sets, training, and validation.
For training, use a third party software like via, to annotate the people in the images. After the annotations have been drawn, you will export a JSON file with all annotations drawn, which will be used for the training process. Do the same thing for the validation phase, BUT make sure the images in the validation have not been seen before by the algorithm.
Once you have annotated both groups and generated JSON files, you then can start training the algorithm. Mask R-CNN makes it very easy to train, with all you need to do is pass one line full of commands to start it. If you want to train data on your GPU instead of your CPU, then install Nvidia's CUDA, which works very well with supported GPUs, and requires no coding after the installation.
During the training stage, you will be generating weights files, which are stored in the .h5 format. Depending on the number of epochs you choose, there will be a weights file generated per epoch. Once the training has finished, you then will just have to reference that weights file anytime you want to detect relevant objects, i.e. in your video feed.
Some important info:
Mask R-CNN is somewhat of an older algorithm, but it still works flawlessly today. Although some people have updated the algorithm to Tenserflow 2.0+, to get the best use out of it, use the following.
Tensorflow-gpu 1.13.2+
Keras 2.0.0+
CUDA 9.0 to 10.0
Honestly, the hardest part for me in the past was not using the algorithm, but finding the right versions of Tensorflow, Keras, and CUDA, that all play well with each other, and don't error out. Although the above-mentioned versions will work, try and see if you can upgrade or downgrade certain libraries to see if you can get better results.
Article about Mask R-CNN with video, I find it to be very useful and resourceful.
https://www.pyimagesearch.com/2018/11/19/mask-r-cnn-with-opencv/
The GitHub repo can be found below.
https://github.com/matterport/Mask_RCNN
EDIT
You can use this method across multiple cameras, just set up multiple video captures within a computer vision library like OpenCV. I assume this would be done with Python, which both Mask R-CNN and OpenCV are primarly based in.

Azure Machine Learning Data Transformation

Can machine learning be used to transform/modifiy a list of numbers.
I have many pairs of binary files read from vehicle ECUs, an original or stock file before the vehicle was tuned, and a modified file which has the engine parameters altered. The files are basically lists of little or big endian 16 bit numbers.
I was wondering if it is at all possible to feed these pairs of files into machine learning, and for it to take a new stock file and attempt to transform or tune that stock file.
I would appreciate it if somebody could tell me if this is something which is at all possible. All of the examples I've found appear to make decisions on data rather than do any sort of a transformation.
Also I'm hoping to use azure for this.
We would need more information about your specific problem to answer. But, supervised machine learning can take data with a lot of inputs (like your stock file, perhaps) and an output (say a tuned value), and learn the correlations between those inputs and output, and then be able to predict the output for new inputs. (In machine learning terminology, these inputs are called "features" and the output is called a "label".)
Now, within supervised machine learning, there is a category of algorithms called regression algorithms. Regression algorithms allow you to predict a number (sounds like what you want).
Now, the issue that I see, if I'm understanding your problem correctly, is that you have a whole list of values to tune. Two things:
Do those values depend on each other and influence each other? Do any other factors not included in your stock file affect how the numbers should be tuned? Those will need to be included as features in your model.
Regression algorithms predict a single value, so you would need to build a model for each of the values in your stock file that you want to tune.
For more information, you might want to check out Choosing an Azure Machine Learning Algorithm and How to choose algorithms for Microsoft Azure Machine Learning.
Again, I would need to know more about your data to make better suggestions, but I hope that helps.