Simulation of a deep self-learning neural network on a robot

Do you know of any ready-made platforms/software/code that I can use to run a simulation of a self-learning deep neural network on a robotic platform, while being able to tweak the code to change specific parameters?
I have found scientific documentation and platforms like Webots, but for my project I need a ready-made solution in which I can change some parameters and measure their effect on the number of generations the algorithm needs to train the robot.

Related

Pretrained model or training from scratch for object detection?

I have a dataset composed of 10k-15k pictures for supervised object detection that is very different from ImageNet or COCO (the pictures are much darker and represent completely different, industrial-related things).
The model currently used is a FasterRCNN which extracts features with a Resnet used as a backbone.
Could training the backbone of the model from scratch in one stage and then training the whole network in another stage be beneficial for the task, instead of loading the network pretrained on COCO and then retraining all the layers of the whole network in a single stage?
From my experience, here are some important points:
your training set is not big enough to train the detector from scratch (though this depends on the network configuration; FasterRCNN+ResNet18 can work). It is better to use a network pre-trained on ImageNet;
the domain the network was pre-trained on is not really that important. The network, especially a big one, needs to learn all those arcs, circles, and other primitive shapes in order to use that knowledge for detecting more complex objects;
the brightness of your training images can be important, but it is not something that should stop you from using a pre-trained network;
training from scratch requires many more epochs and much more data. The longer the training, the more elaborate your LR control algorithm should be: at a minimum, it should not be constant, and it should change the LR based on the cumulative loss. The initial settings depend on multiple factors, such as network size, augmentations, and the number of epochs;
I played a lot with FasterRCNN+ResNet (with various numbers of layers) and with other networks. I recommend using MaskRCNN instead of FasterRCNN; just configure it not to use the masks and not to do the segmentation. I don't know why, but it gives much better results (a minimal fine-tuning sketch follows this list);
don't spend your time on MobileNet; with your training-set size you will not be able to train it to a reasonable AP and AR. Start with a MaskRCNN+ResNet18 backbone.
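For concreteness, here is a minimal PyTorch/torchvision sketch of the recommended route: load a COCO-pretrained detector and fine-tune all layers in a single stage. The class count and hyperparameters are placeholders, and torchvision also provides maskrcnn_resnet50_fpn if you want to follow the MaskRCNN suggestion.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 5  # hypothetical: 4 industrial object classes + background

# Load a detector with a ResNet-50 FPN backbone, pretrained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Swap the box predictor so it matches the custom number of classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Fine-tune all layers with a decaying learning rate, as suggested above.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=5e-4)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
```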

How do shared parameters in actor-critic models work?

Hello StackOverflow Community!
I have a question about Actor-Critic Models in Reinforcement Learning.
While watching the policy gradient lectures from Berkeley, it is said that in actor-critic algorithms, where we optimize both our policy with some policy parameters and our value function with some value-function parameters, some algorithms (e.g. A2C/A3C) use the same parameters in both optimization problems (i.e. policy parameters = value-function parameters).
I could not understand how this works. I was thinking that we should optimize them separately. How does this shared-parameter solution help us?
Thanks in advance :)
You can do it by sharing some (or all) layers of their networks. If you do, however, you are assuming that there is a common state representation (the intermediate layer output) that is optimal w.r.t. both. This is a very strong assumption, and it usually doesn't hold. It has been shown to work for learning from images, where you put (for instance) an autoencoder on top of both the actor and the critic networks and train it using the sum of their loss functions.
This is mentioned in the PPO paper (just before Eq. (9)). However, they say that they share layers only for learning Atari games, not for continuous control problems. They don't say why, but this can be explained as above: Atari games have a low-dimensional state representation that is optimal for both the actor and the critic (e.g., the encoded image learned by an autoencoder), while for continuous control you usually pass a low-dimensional state directly (coordinates, velocities, ...).
A3C, which you mentioned, was also used mostly for games (Doom, I think).
From my experience, in control tasks sharing layers has never worked if the state is already compact.
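As a concrete illustration, here is a minimal PyTorch sketch (all names and sizes are hypothetical) of the shared-parameter idea: one trunk produces the state representation used by both heads, and a single combined loss updates everything, as in typical A2C/A3C implementations.

```python
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        # Shared layers: both heads see the same learned state representation.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor
        self.value_head = nn.Linear(hidden, 1)           # critic

    def forward(self, obs):
        h = self.trunk(obs)
        dist = torch.distributions.Categorical(logits=self.policy_head(h))
        return dist, self.value_head(h)

# A single combined loss updates the shared trunk from both objectives:
#   loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
```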

What are the similarities between A3C and PPO in reinforcement learning policy gradient methods?

Is there an easy way to merge properties of PPO with an A3C method? A3C methods run a number of parallel actors and optimize the parameters. I am trying to merge PPO with A3C.
PPO has a built-in mechanism (the clipped surrogate objective) to prevent large gradient updates, and it generally outperforms A3C on most continuous control environments.
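For reference, here is a minimal sketch of that clipped surrogate objective (PyTorch; the epsilon value and tensor names are illustrative). The clipping keeps the probability ratio close to 1, which is what prevents destructively large policy updates.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (min) bound, negated because optimizers minimize.
    return -torch.min(unclipped, clipped).mean()
```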
In order for PPO to enjoy the benefits of parallel computing like A3C, Distributed PPO (DPPO) is the way to go.
Check out the links below to find out more information about DPPO.
Pseudo code from the original DeepMind paper
Original DeepMind paper: Emergence of Locomotion Behaviours in Rich Environments
If you plan to implement your DPPO code in Python with TensorFlow, I suggest you try Ray for the distributed-execution part.
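As a rough sketch of what that looks like (Ray's entry points have changed across versions; this follows the classic ray.tune API, and the environment, worker count, and stopping criterion are placeholders), RLlib's PPO collects rollouts from parallel workers, giving the A3C-style parallelism discussed above:

```python
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",  # placeholder environment
        "num_workers": 4,      # parallel rollout workers
    },
    stop={"episode_reward_mean": 190},  # placeholder stopping criterion
)
```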

Overview for Deep Learning Networks

I am fairly new to deep learning and get quite overwhelmed by the many different nets and their fields of application. Thus, I want to know if there is some kind of overview of which different networks exist, what their key features are, and what kind of purpose they serve.
For example, I know about LeNet, ConvNet, and AlexNet; somehow they are the same but still differ?
There are basically two types of learning for neural networks: supervised and unsupervised. Both need a training set to "learn". Imagine the training set as a massive book from which you can learn specific information. In supervised learning, the book is supplied with an answer key but without the solution manual; in contrast, unsupervised learning comes without an answer key or a solution manual. But the goal is the same: to find patterns between the questions and answers (supervised learning) or among the questions themselves (unsupervised learning).
Now that we have differentiated between those two, we can go into the models. Let's discuss supervised learning, which basically has three main models:
artificial neural network (ANN)
convolutional neural network (CNN)
recurrent neural network (RNN)
ANN is the simplest of all three. I believe you already understand it, so we can move forward to CNNs.
Basically, in a CNN all you have to do is convolve your input with feature detectors. Feature detectors are matrices with dimensions (row, column, depth), where depth is the number of feature detectors. The goal of convolving the input is to extract information related to spatial structure. Let's say you want to distinguish between cats and dogs: cats have whiskers but dogs do not, cats also have different eyes than dogs, and so on. The downside is that more convolution layers result in slower computation. To mitigate that, we do some processing called pooling or downsampling, which reduces the size of the feature maps while minimizing the loss of features or information. The next step is flattening, squashing all those 3D matrices into an (n, 1) vector so you can feed it into an ANN. From there on it is a normal ANN. Because a CNN is inherently able to detect certain features, it is mostly (maybe always) used for classification, for example image classification, time-series classification, or maybe even video classification. For a crash course in CNNs, check out this video by Siraj Raval. He's my favourite youtuber of all time!
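To make that pipeline concrete, here is a minimal Keras sketch (all shapes and sizes are illustrative): convolution to extract spatial features, pooling to downsample, flattening, then an ordinary dense (ANN) classifier.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu",
                  input_shape=(64, 64, 3)),  # feature detectors
    layers.MaxPooling2D((2, 2)),             # pooling / downsampling
    layers.Flatten(),                        # squash 3D feature maps to a vector
    layers.Dense(64, activation="relu"),     # the ordinary ANN part
    layers.Dense(2, activation="softmax"),   # e.g. cat vs. dog
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```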
Arguably the most sophisticated of all three, an RNN is best described as a neural network that has "memory", introduced by "loops" within it that allow information to persist. Why is this important? As you are reading this, your brain uses previous memory to comprehend all of this information; you don't rethink everything from scratch. That is exactly what traditional neural networks do: forget everything and re-learn again. But native RNNs aren't very effective, so when people talk about RNNs they mostly refer to LSTMs, which stands for Long Short-Term Memory. If that seems confusing, Christopher Olah gives an in-depth explanation in a very simple way; I advise you to check out his link for a complete understanding of how RNNs, and especially the LSTM variant, work.
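In the same spirit, here is a minimal Keras sketch of an LSTM sequence classifier (shapes are illustrative): the LSTM layer carries its hidden state across the time steps, which is the "memory" described above.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.LSTM(32, input_shape=(30, 8)),   # 30 time steps, 8 features each
    layers.Dense(1, activation="sigmoid"),  # e.g. a binary sequence label
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```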
As for unsupervised learning, I'm sorry that I haven't had the time to learn about it yet, so this is the best I can do. Good luck and have fun!
They are the same type of network: convolutional neural networks. The problem with an overview is that as soon as you post something, it is already outdated. Most of the networks you describe are already old, even though they are only a few years old.
Nevertheless, you can take a look at the networks supplied by Caffe (https://github.com/BVLC/caffe/tree/master/models).
In my personal view the most important concepts in deep learning are recurrent networks (https://keras.io/layers/recurrent/), residual connections, and inception blocks (see https://arxiv.org/abs/1602.07261). The rest are largely theoretical concepts that would not fit in a Stack Overflow answer.
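Of those, residual connections are easy to show in a few lines. Here is a hedged Keras sketch (shapes are illustrative, and the input is assumed to already have the same channel count as the block) of the core ResNet idea: the block's input is added back to its output.

```python
from tensorflow.keras import Input, layers, models

def residual_block(x, filters):
    # Assumes x already has `filters` channels so the Add shapes match.
    shortcut = x
    y = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.Add()([shortcut, y])  # skip connection: output = F(x) + x
    return layers.Activation("relu")(y)

inputs = Input(shape=(32, 32, 64))            # illustrative input shape
outputs = residual_block(inputs, filters=64)
model = models.Model(inputs, outputs)
```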

A simple Convolutional neural network code

I am interested in convolutional neural networks (CNNs) as an example of a computationally intensive application that is suitable for acceleration using reconfigurable hardware (e.g., an FPGA).
In order to do that, I need to examine a simple CNN implementation that I can use to understand how CNNs are implemented, how the computations in each layer take place, and how the output of each layer is fed into the input of the next one. I am familiar with the theoretical part (http://cs231n.github.io/convolutional-networks/).
But I am not interested in training the CNN; I want a complete, self-contained CNN implementation that is pre-trained, with all the weight and bias values known.
I know that there are plenty of CNN libraries, e.g. Caffe, but the problem is that there is no trivial example code that is self-contained. Even for the simplest Caffe example, "cpp_classification", many libraries are invoked, the architecture of the CNN is expressed as a .prototxt file, other types of inputs such as .caffemodel and .binaryproto are involved, and the OpenCV2 libraries are invoked too. There are layers and layers of abstraction and different libraries working together to produce the classification outcome.
I know that those abstractions are needed to produce a "usable" CNN implementation, but for a hardware person who needs bare-bones code to study, this is too much unrelated work.
My question is: can anyone guide me to a simple and self-contained CNN implementation that I can start with?
I can recommend tiny-cnn. It is simple, lightweight (e.g. header-only) and CPU-only, while providing several layers frequently used in the literature (for example pooling layers, dropout layers, or a local response normalization layer). This means that you can easily explore an efficient implementation of these layers in C++ without requiring knowledge of CUDA or digging through the I/O and framework code required by a framework such as Caffe. The implementation lacks some comments, but the code is still easy to read and understand.
The provided MNIST example is quite easy to use (I tried it myself some time ago) and trains efficiently. After training and testing, the weights are written to file, so you have a simple pre-trained model to start from; see the provided examples/mnist/test.cpp and examples/mnist/train.cpp. The model can easily be loaded for testing (or recognizing digits) so that you can step through the code while executing a learned model.
If you want to inspect a more complicated network, have a look at the Cifar-10 Example.
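To give a feel for the bare-bones computation involved, here is a purely illustrative NumPy sketch of the forward pass of one convolution layer (single channel, no padding or stride, cross-correlation as in most DL frameworks) followed by max pooling; this is the kind of loop nest a hardware design would have to implement:

```python
import numpy as np

def conv2d_forward(x, w, b):
    """x: (H, W) input, w: (kH, kW) kernel, b: scalar bias."""
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * w) + b
    return np.maximum(out, 0)  # ReLU activation

def max_pool(x, size=2):
    """Non-overlapping max pooling; trims edges that don't fit."""
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]
    return x.reshape(H // size, size, W // size, size).max(axis=(1, 3))

# Feeding one layer's output into the next layer's input:
image = np.random.rand(28, 28)
features = max_pool(conv2d_forward(image, np.random.randn(3, 3), 0.1))
```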
This is the simplest implementation I have seen: DNN McCaffrey
Also, the source code for this by Karpathy looks pretty straightforward.