Mini-batches in RL - reinforcement-learning

I just read the paper by Mnih (2013) and was wondering about the part where he talks about using RMSProp with minibatches of size 32 (page 6).
My understanding of these kinds of reinforcement learning algorithms is that there is only one, or at most a very small number of, training samples per fit, and that the network is updated after every fit.
In supervised learning, by contrast, I may have millions of samples, divide them into minibatches of e.g. 32, and update the network after every minibatch, which makes sense.
So my question is: if I only put one sample into the neural network at a time, how do minibatches make sense? Did I misunderstand something about that concept?
Thanks in advance!

The answer provided by Filip is correct. Just to add some intuition to it: the reason an experience replay is used is to decorrelate the experiences the RL agent has collected. This is essential when a non-linear function approximator such as a neural network is used.
Example: imagine you had 10 days to study for a chemistry test and a math test, and both tests were on the same day. If you spent the first 5 days on chemistry and the last 5 days on math, you would have forgotten most of the chemistry you studied. A neural network behaves similarly.
By decorrelating the experiences, a more general policy can be identified through the training data.
And while training the neural network, we have a batch of memory (i.e., data), and we sample random minibatches of 32 from it to do supervised learning, just like any other neural network is trained.

The paper you mention introduces two mechanisms that stabilize Q-learning when it is used with a deep neural network as function approximator. One of these mechanisms is called experience replay, and it is basically a memory buffer for observed experiences. You can find the description at the end of the fourth page of the paper. Instead of learning from the single experience you have just seen, you save it to the buffer. Learning is then done every N iterations by sampling a random minibatch of experiences from the replay buffer.
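To make that concrete, here is a minimal sketch of an experience-replay buffer; the class name, capacity, and transition layout are illustrative, not the paper's exact implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall off the end

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # uniform random minibatch: this is what breaks the correlation
        # between consecutive experiences
        return random.sample(self.buffer, batch_size)

# Usage sketch: interact with the environment, store each transition,
# and only update the Q-network once enough transitions have accumulated.
# buffer = ReplayBuffer()
# buffer.add(s, a, r, s_next, done)
# if len(buffer.buffer) >= 32:
#     batch = buffer.sample(32)   # feed this minibatch to the Q-network update
```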

Why is WGAN considered an adversarial network?

I finished going over the WGAN paper: WGAN Paper Link
After reading the algorithm provided by the authors, I find it odd that they would refer to the network as an adversarial network.
In the first part of the algorithm a 'critic' is trained to optimality, and they show this critic approximates the Wasserstein distance between our generator distribution and the real distribution. We then take this approximation and update the parameters of the generator distribution in the direction of the critic's gradient. So in a sense we're just approximating a loss function and then telling the generator which direction is best to go. 'Critic' is therefore a very good name for it, but calling the whole thing an adversarial network implies that the generator and the critic are at odds. Any ideas why this should still be nicknamed an adversarial network?
The name "adversarial" does not come from this paper, it comes from the GAN itself, this paper is merely an incremental work on top (and thus is not renaming anything). The reason why the "original" GAN is called Generative Adversarial Network is because it is trained in a form of a two-player, competitive game, where a generator task is to fool discriminator, and discriminators task is to well, not be fooled. This is the "at odds" part. And it is indeed critical to the whole system, vast majority of problems of GANs, that spawned hundreds of papers (like the one above) comes from the fact that greedy optimization of 2 player games has much more chaotic dynamics, and will not "just converge with small enough learning rate" that normal minimization of (smooth enough) loss function would. From math perspective, the subtle difference that makes things chaotic is that gradients that train discriminator are not back-propagated to the generator. Otherwise generator would be "helping" discriminator. Because of this stop gradient the emerging dynamics is no longer a gradient vector field of any loss, and instead it is a dynamical system emerging from simultaneous minimization of 2 functions (also called 2 player games).

Best Neural Network architecture for traditional large multiclass classification problem

I am new to deep learning (I just finished reading Deep Learning with PyTorch), and I was wondering what the best neural network architecture is for my case.
I have a large multiclass classification problem (a user identification problem) with about 1000 classes, where each class is a user. I have about 2000 features per user after one-hot encoding and cleaning. The data are highly imbalanced, but I can always use oversampling/downsampling techniques.
I was wondering what the best architecture to implement for my case is. I've only ever seen deep learning applied to time series or images, so I'm not sure what to use here. I was thinking about a multi-layer perceptron, but maybe there are better solutions.
Thanks for your tips and help. Have a nice day!
You can try triplet learning instead of simple classification.
From your 1000 users you can form about c * 1000 * 999 / 2 pairs, where c is the average number of samples per class/user.
https://arxiv.org/pdf/1412.6622.pdf
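As a rough sketch of what that could look like (assuming the ~2000 features mentioned in the question; the layer sizes and embedding dimension are illustrative, and this is not a complete training pipeline):

```python
import torch
import torch.nn as nn

# Embed each user's feature vector into a small space where samples of the
# same user end up close together and samples of different users far apart.
embedder = nn.Sequential(nn.Linear(2000, 256), nn.ReLU(), nn.Linear(256, 64))
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(embedder.parameters(), lr=1e-3)

# anchor and positive come from the same user, negative from a different user
anchor, positive, negative = torch.randn(32, 2000), torch.randn(32, 2000), torch.randn(32, 2000)
loss = criterion(embedder(anchor), embedder(positive), embedder(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At identification time, a new sample is assigned to the user whose stored
# embeddings are closest to it (e.g. nearest-neighbour search in the embedding space).
```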

Difference between Evolutionary Strategies and Reinforcement Learning?

I am learning about the approach employed in Reinforcement Learning for robotics and I came across the concept of Evolutionary Strategies. But I couldn't understand how RL and ES are different. Can anyone please explain?
To my understanding, I know of two main ones.
1) Reinforcement learning uses the concept of one agent, and the agent learns by interacting with the environment in different ways. Evolutionary algorithms usually start with many "agents", and only the "strong ones survive" (the agents with characteristics that yield the lowest loss).
2) A reinforcement learning agent learns from both positive and negative actions, whereas evolutionary algorithms only keep the best solutions; information about negative or suboptimal solutions is discarded and lost.
Example
You want to build an algorithm to regulate the temperature in the room.
The room is 15 °C, and you want it to be 23 °C.
Using reinforcement learning, the agent will try a bunch of different actions to increase and decrease the temperature. Eventually, it learns that increasing the temperature yields a good reward, but it also learns that decreasing the temperature yields a bad reward.
An evolutionary algorithm starts with a bunch of random agents, each with a preprogrammed set of actions it is going to perform. The agents that happen to have the "increase temperature" action survive and move on to the next generation. Eventually, only agents that increase the temperature survive and are deemed the best solution. However, the algorithm never learns what happens if you decrease the temperature.
TL;DR: RL is usually one agent trying different actions and learning and remembering all the information (positive or negative). ES uses many agents that try many actions, and only the agents with the best actions survive; it is basically a brute-force way to solve a problem.
I think the biggest difference between evolution strategies and reinforcement learning is that ES is a global optimization technique while RL is a local optimization technique. So RL may converge faster, but to a local optimum, while ES converges more slowly towards a global optimum.
Evolution strategies optimize on the population level. In each iteration (or generation), an evolution strategy algorithm (i) samples a batch of candidate solutions from the search space, (ii) evaluates them, and (iii) discards the ones with low fitness values. The sampling for the next generation happens around the mean of the best-scoring candidate solutions from the previous one. This lets evolution strategies direct the search towards a promising region of the search space.
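A minimal sketch of that loop on a toy fitness function; the population size, elite fraction, and noise scale are illustrative:

```python
import numpy as np

def fitness(x):
    return -np.sum(x ** 2)  # maximize: optimum at x = 0

dim, pop_size, n_elite, sigma = 5, 50, 10, 0.5
mean = np.random.randn(dim)  # current center of the search distribution

for generation in range(100):
    # (i) sample candidate solutions around the current mean
    population = mean + sigma * np.random.randn(pop_size, dim)
    # (ii) evaluate them
    scores = np.array([fitness(ind) for ind in population])
    # (iii) keep the best and recenter the search on their mean
    elite = population[np.argsort(scores)[-n_elite:]]
    mean = elite.mean(axis=0)
```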
Reinforcement learning requires the problem to be formulated as a Markov Decision Process (MDP). An RL agent optimizes its behavior (or policy) by maximizing a cumulative reward signal received on transitions from one state to another. Since the problem is abstracted as an MDP, learning can happen at the step or episode level. Learning per step (or per N steps) is done via temporal-difference (TD) learning, and learning per episode is done via Monte Carlo methods. So far this describes learning via action-value functions (learning the values of actions). Another way of learning is to optimize the parameters of a neural network representing the agent's policy directly via gradient ascent; this approach was introduced in the REINFORCE algorithm, and the general approach is known as policy-based RL.
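For intuition, here is a minimal sketch of a one-step TD (Q-learning) update on a toy tabular problem; the state/action sizes and hyperparameters are illustrative:

```python
import numpy as np

n_states, n_actions = 10, 2
Q = np.zeros((n_states, n_actions))  # action-value table
alpha, gamma = 0.1, 0.99             # learning rate, discount factor

def td_update(state, action, reward, next_state):
    # one-step TD target: reward plus discounted value of the best next action
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])

# Each environment transition (s, a, r, s') triggers one such update;
# this is what "learning per step" refers to above.
td_update(state=3, action=1, reward=1.0, next_state=4)
```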
For a comprehensive comparison check out this paper https://arxiv.org/pdf/2110.01411.pdf

Overview for Deep Learning Networks

I am fairly new to deep learning and get quite overwhelmed by the many different nets and their fields of application. I want to know if there is some kind of overview of which different kinds of networks exist, what their key features are, and what purposes they serve.
For example, I know about LeNet, ConvNet, and AlexNet; somehow they are the same but still differ?
There are basically two learning settings for neural networks: supervised and unsupervised learning. Both need a training set to "learn" from. Imagine the training set as a massive book from which you can learn specific information. In supervised learning, the book is supplied with an answer key but without a solution manual; in unsupervised learning it comes with neither. The goal, however, is the same: to find patterns between the questions and the answers (supervised learning) or among the questions themselves (unsupervised learning).
Now that we have differentiated between those two, we can go into the models. Let's discuss supervised learning, which basically has three main models:
artificial neural network (ANN)
convolutional neural network (CNN)
recurrent neural network (RNN)
ANNs are the simplest of all three. I believe you already understand them, so we can move on to CNNs.
Basically, in a CNN you convolve the input with feature detectors. Feature detectors are matrices with dimensions (rows, columns, depth), where the depth is the number of feature detectors. The goal of convolving the input is to extract information related to spatial structure. Say you want to distinguish between cats and dogs: cats have whiskers while dogs (mostly) do not, cats have different eyes than dogs, and so on. The downside is that more convolution layers mean slower computation. To mitigate that, we apply a processing step called pooling or downsampling, which reduces the size of the feature maps while losing as few features or as little information as possible. The next step is flattening, i.e. squashing those 3D matrices into an (n, 1) vector so you can feed it into an ANN. The final step is then self-explanatory: an ordinary ANN. Because a CNN is inherently able to detect certain features, it is mostly (maybe always) used for classification, for example image classification, time-series classification, or even video classification. For a crash course in CNNs, check out this video by Siraj Raval. He's my favourite youtuber of all time!
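As a minimal sketch of that conv -> pool -> flatten -> fully-connected pipeline (in PyTorch, with illustrative sizes for 32x32 RGB inputs and two classes such as cats vs. dogs):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 16 feature detectors over the input image
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling / downsampling: 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),                                # squash the 32 feature maps of 8x8 into a vector
    nn.Linear(32 * 8 * 8, 2),                    # ordinary ANN on top, producing class scores
)

logits = model(torch.randn(1, 3, 32, 32))        # one fake image -> scores for the 2 classes
```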
Arguably the most sophisticated of the three, an RNN is best described as a neural network that has "memory": it contains "loops" that allow information to persist. Why is this important? As you are reading this, your brain uses previous memory to comprehend the information; you don't rethink everything from scratch, yet that is what a traditional feed-forward network does, forgetting everything and re-learning each time. Plain RNNs aren't very effective, though, so when people talk about RNNs they mostly mean LSTMs, which stands for Long Short-Term Memory. If that seems confusing, Christopher Olah gives an in-depth explanation in a very accessible way; I advise you to check out his post for a complete understanding of how RNNs, and the LSTM variant in particular, work.
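As a minimal sketch of how an LSTM carries its "memory" across a sequence (in PyTorch, with illustrative sizes):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
classifier = nn.Linear(20, 2)

x = torch.randn(4, 15, 10)        # 4 sequences, 15 time steps, 10 features per step
outputs, (h_n, c_n) = lstm(x)     # h_n / c_n are the persistent hidden and cell "memory" states
logits = classifier(h_n[-1])      # classify each sequence from its final hidden state
```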
As for unsupervised learning, I'm sorry but I haven't had the time to learn about it yet, so this is the best I can do. Good luck and have fun!
They are the same type of network: convolutional neural networks. The problem with such an overview is that as soon as you post it, it is already outdated. Most of the networks you mention are already considered old, even though they are only a few years old.
Nevertheless, you can take a look at the networks supplied with Caffe (https://github.com/BVLC/caffe/tree/master/models).
In my personal view, the most important concepts in deep learning are recurrent networks (https://keras.io/layers/recurrent/), residual connections, and inception blocks (see https://arxiv.org/abs/1602.07261). The rest are largely theoretical concepts, which would not fit in a Stack Overflow answer.

Deep learning, Loss does not decrease

I have tried to fine-tune a pretrained model on a training set that has 20 classes. The important thing to mention is that, even though I have 20 classes, one class makes up about 1/3 of the training images. Could that be the reason that my loss does not decrease and that testing accuracy is only about 30%?
Thank you for any advice.
I had a similar problem. I resolved it by increasing the variance of the initial values of the neural network weights. This serves as preconditioning for the neural network and prevents the weights from dying out during backprop.
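As a rough sketch of what that can look like in PyTorch (the standard deviation and layer sizes are illustrative and should be tuned for your model):

```python
import torch.nn as nn

def init_weights(module):
    # re-initialize every Linear layer with a larger spread than the default
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.1)
        nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 20))
model.apply(init_weights)  # applies the initializer to every submodule
```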
I came across the neural network lectures from Prof. Jenny Orr's course and found them very informative. (I just realized that Jenny co-authored many papers with Yann LeCun and Léon Bottou in the early years of neural network training.)
Hope it helps!
Yes, it is very possible that your net is overfitting to the imbalanced labels. One solution is to perform data augmentation on the other labels to balance them out. For example, if you have image data, you can take random crops, horizontal/vertical flips, and use a variety of other techniques.
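As a rough sketch using torchvision (the exact transforms and sizes are illustrative and depend on your images):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomCrop(224, padding=8),   # random crops
    transforms.RandomHorizontalFlip(),       # horizontal flips
    transforms.RandomVerticalFlip(),         # vertical flips
    transforms.ToTensor(),
])
# Applying `augment` to images of the under-represented classes (possibly
# several times per image) is one way to balance out the label distribution.
```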
Edit:
One way to check whether you are overfitting to the imbalanced labels is to compute a histogram of your net's predicted labels. If it is highly skewed towards the over-represented class, you should try the data augmentation method above, retrain your net, and see if that helps.
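A minimal sketch of that check (here with a random stand-in for the net's outputs; in practice `logits` would come from running the net on a held-out set):

```python
import numpy as np

logits = np.random.randn(1000, 20)        # stand-in for model outputs on 1000 samples, 20 classes
predicted = logits.argmax(axis=1)         # predicted label per sample
counts = np.bincount(predicted, minlength=20)
print(counts / counts.sum())              # heavily skewed toward one class => likely overfitting to it
```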