Deep Q Learning: How to visualize convergence? - deep-learning

I have trained an RL agent in an environment similar to PuckWorld. There's no puck, though! The agent lives in a continuous space and wants to reach a fixed target. Each episode, the agent spawns at a random location, and noise is added to each action to make learning less trivial.
The reward is given every step as a scaled version of the distance to the target.
I want to plot the convergence of the neural network. For the same problem in a discrete space, using tabular Q-learning, I would plot the sum of all elements of the Q matrix against the episode number. This gave me a good understanding of the performance of the learner. How can I do the same for a neural network?
Plotting the reward collected in an episode vs episode number is not optimal here.
I use PyTorch. Any help is appreciated.
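To make it concrete, the kind of curve I am after could be produced by something like the sketch below, which evaluates the network on a fixed batch of states after every episode and logs an aggregate of the predicted Q-values (the network, the sizes, and the omitted training step here are placeholders, not code from my project):

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

state_dim, num_actions = 4, 8                      # placeholder sizes
q_net = nn.Sequential(                             # placeholder Q-network
    nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions)
)

# A fixed batch of states, sampled once; the network is evaluated on the
# same states after every episode so the curve is comparable over time.
eval_states = torch.rand(256, state_dim)

q_history = []
for episode in range(100):
    # ... one episode of environment interaction and training would go here ...
    with torch.no_grad():
        q = q_net(eval_states)                     # shape: (256, num_actions)
        q_history.append(q.mean().item())          # analogue of summing the Q table

plt.plot(q_history)
plt.xlabel("episode")
plt.ylabel("mean predicted Q over fixed states")
plt.show()
```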

Related

Deep Learning for Acoustic Emission concrete fracture specimens: regression of on-set time and classification of type of failure

How can I use deep learning for both regression and classification tasks?
I am facing a problem with acoustic emission on fracture of concrete specimens. The objective is to automatically find the on-set time instant (the time at which the acoustic emission begins) and the slope to the peak value, in order to determine the kind of fracture (mode I or mode II, based on the rise angle, RA).
I have tried a region-based CNN on images of the signals (fine-tuning Faster R-CNN using PyTorch), but unfortunately the results have not been outstanding so far.
I would like to work with sequences (time series) of amplitude data sampled at a certain frequency, but each recording has a different length. How can I deal with this?
Can I build a 1D CNN that performs a sort of anomaly detection based on the supervised points that I can mark manually on training examples?
I have a number of recordings, sampled at 100 Hz, which I would like to use to train the model. In examples on anomaly detection such as Timeseries anomaly detection using an Autoencoder, a single time series is windowed with a stride of one time step to obtain about 3700 samples for training the neural network. Instead, I have a varying number of recordings (time series), each with its own on-set time instant and a different overall length in seconds. How can I manage this?
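To make this concrete, I imagine cutting each recording into fixed-size windows around the marked points, roughly as in the sketch below (only the 100 Hz rate comes from my recordings; the window length, stride, and names are placeholders):

```python
import numpy as np

def make_windows(recordings, onsets, fs=100, win_s=2.0, stride_s=0.1):
    """Cut variable-length recordings into fixed-size, labelled windows.

    recordings: list of 1-D arrays with different lengths
    onsets:     on-set time of each recording, in seconds
    A window is labelled 1 if it contains the marked on-set, else 0.
    """
    win, stride = int(win_s * fs), int(stride_s * fs)
    X, y = [], []
    for signal, onset in zip(recordings, onsets):
        onset_idx = int(onset * fs)
        for start in range(0, len(signal) - win + 1, stride):
            X.append(signal[start:start + win])
            y.append(int(start <= onset_idx < start + win))
    return np.stack(X), np.array(y)

# Example: three recordings of different lengths, all sampled at 100 Hz.
recordings = [np.random.randn(n) for n in (1200, 3000, 800)]
X, y = make_windows(recordings, onsets=[4.0, 12.5, 2.0])
print(X.shape, int(y.sum()))
```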
What I actually need are the time instant of the beginning of the signal and the maximum point, to define the rise angle and classify the type of fracture. Can I perform classification directly with a CNN, simultaneously with the regression task for the on-set time instant?
Thank you in advance!
I finally solved it, thanks to the fundamental suggestion by @JonNordby, using a Sound Event Detection method. We adopted and adapted the code from GitHub user YashNita.
I labelled the data, marking the on-set and the peak of each event.
Then, I adopted a method for extracting features by computing the spectrogram of the input signals.
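For illustration, the feature-extraction step amounts to something like the following (a minimal sketch using librosa; the sampling rate, FFT, and mel parameters here are placeholders rather than the values used in the adapted code):

```python
import numpy as np
import librosa

def spectrogram_features(signal, sr=100, n_fft=64, hop_length=16, n_mels=16):
    """Log-scaled mel spectrogram of one amplitude signal."""
    mel = librosa.feature.melspectrogram(
        y=signal.astype(np.float32), sr=sr,
        n_fft=n_fft, hop_length=hop_length, n_mels=n_mels,
    )
    return librosa.power_to_db(mel, ref=np.max)    # shape: (n_mels, n_frames)

features = spectrogram_features(np.random.randn(3000))
print(features.shape)
```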
Finally, we were able to get a more precise recognition of the seismic events, which is directly connected to acoustic emission event detection.
For the moment, only the event recognition phase is done, but it would be simple to adapt the approach to also classify mode I versus mode II cracking.

Mini-batches in RL

I just read the Mnih et al. (2013) paper and was wondering about the part where they describe using RMSProp with minibatches of size 32 (page 6).
My understanding of these kinds of reinforcement learning algorithms is that there is only one, or at most a very small number of, training samples per fit, and that the network is updated on every fit.
In supervised learning, by contrast, I have up to millions of samples, divide them into minibatches of, say, 32, and update the network after every minibatch, which makes sense.
So my question is: if I put only one sample into the neural network at a time, how do minibatches make sense? Did I misunderstand the concept?
Thanks in advance!
The answer provided by Filip is correct. Just to add intuition: the reason experience replay is used is to decorrelate the experiences the RL agent has collected. This is essential when a non-linear function approximator, such as a neural network, is used.
Example: imagine you had 10 days to study for a chemistry test and a math test, and both tests were on the same day. If you spent the first 5 days on chemistry and the last 5 days on math, you would have forgotten most of the chemistry you studied. A neural network behaves similarly.
By decorrelating the experiences, a more general policy can be identified through the training data.
While training the neural network, we have a buffer of memories (i.e., data), and we sample random minibatches of 32 from it to do supervised-style learning, just as any other neural network is trained.
The paper you mentioned introduces two mechanisms that stabilize Q-learning when it is used with a deep neural network function approximator. One of these mechanisms is called experience replay, and it is basically a memory buffer of observed experiences. You can find the description near the end of the fourth page of the paper. Instead of learning only from the single experience you have just seen, you save it to the buffer. Learning is done every N iterations, and you sample a random minibatch of experiences from the replay buffer.
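In code, the buffer and the sampling step look roughly like this (a minimal sketch; the capacity is a placeholder, and the Q-learning update itself is only indicated in comments):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity memory of (s, a, r, s', done) transitions."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)       # oldest experiences fall out

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Random sampling is what decorrelates the experiences.
        batch = random.sample(self.buffer, batch_size)
        return map(list, zip(*batch))              # states, actions, rewards, ...

    def __len__(self):
        return len(self.buffer)

# Outline of use in the training loop:
# call buffer.push(s, a, r, s_next, done) after every environment step, and
# once len(buffer) >= 32, sample a random minibatch and take one gradient
# step of Q-learning on it, exactly as a supervised network would be updated.
```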

caffe - how to properly train alexnet with only 7 classes

I have a small dataset collected from ImageNet (7 classes, each with 1000 training images). I try to train an AlexNet model on it, but somehow the accuracy can't go any higher (about 68% at most). I removed the conv4 and conv5 layers to prevent the model from overfitting, and also decreased the number of neurons in each layer (conv and fc). Here is my setup.
Did I do anything wrong that keeps the accuracy so low?
I want to sort out a few terms:
(1) A perceptron is an individual cell in a neural net.
(2) In a CNN, we generally focus on the kernel (filter) as a unit; this is the square matrix of perceptrons that forms a pseudo-visual unit.
(3) The only place it usually makes sense to focus on an individual perceptron is in the FC layers. When you talk about removing some of the perceptrons, I think you mean kernels.
The most important part of training a model is to make sure that your model is properly fitted to the problem at hand. AlexNet (and CaffeNet, the BVLC implementation) is fitted to the full ImageNet data set. Alex Krizhevsky and his colleagues spent a lot of research effort in tuning their network to the problem. You are not going to get similar accuracy -- on a severely reduced data set -- by simply removing layers and kernels at random.
I suggested that you start from CONVNET (the CIFAR-10 net) because it's much better tuned to this scale of problem. Most of all, I strongly recommend that you make constant use of your visualization tools, so that you can detect when the various kernel layers begin to learn their patterns, and to see the effects of small changes in the topology.
You need to run some experiments to tune and understand your topology. Record the kernel visualizations at chosen times during the training -- perhaps at intervals of 10% of expected convergence -- and compare the visual acuity as you remove a few kernels, or delete an entire layer, or whatever else you choose.
For instance, I expect that if you do this with your current amputated CaffeNet, you'll find that the severe losses in depth and breadth greatly change the feature recognition it's learning. The current depth of building blocks is not enough to recognize edges, then shapes, then full body parts. However, I could be wrong -- you do have three remaining layers. That's why I asked you to post the visualizations you got, to compare with published AlexNet features.
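To give a sense of what recording the kernel visualizations might look like in practice, a pycaffe snippet along these lines dumps the conv1 filters from a training snapshot (a minimal sketch; the prototxt and caffemodel paths are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt
import caffe

# Placeholder paths -- point these at your deploy prototxt and a training snapshot.
net = caffe.Net('deploy.prototxt', 'snapshot_iter_10000.caffemodel', caffe.TEST)

filters = net.params['conv1'][0].data              # (num_output, channels, h, w)
filters = (filters - filters.min()) / (filters.max() - filters.min())

cols = 8
rows = int(np.ceil(len(filters) / cols))
for i, f in enumerate(filters):
    plt.subplot(rows, cols, i + 1)
    plt.imshow(f.transpose(1, 2, 0))               # assumes 3 input channels (RGB)
    plt.axis('off')
plt.savefig('conv1_filters_iter_10000.png')
```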
edit: CIFAR VISUALIZATION
CIFAR is much better differentiated between classes than is ILSVRC-2012. Thus, the training requires less detail per layer and fewer layers. Training is faster, and the filters are not nearly as interesting to the human eye. This is not a problem with the Gabor (not Garbor) filter; it's just that the model doesn't have to learn so many details.
For instance, for CONVNET to discriminate between a jonquil and a jet, we just need a smudge of yellow inside a smudge of white (the flower). For AlexNet to tell a jonquil from a cymbidium orchid, the network needs to learn about petal count or shape.

Is this a valid way to speed up k-fold cross-validation for deep neural network training?

In the context of convolutional neural network training, I need to do 10-fold cross-validation of my training set. Training just 1 of the 10 folds takes at least one hour on my GPU, which means the total time for training all 10 folds independently would be at least 10 hours! To speed up training, will my k-fold result be valid if I load and fine-tune the trained weights from the fully trained model of the first fold (fold1) for each of the remaining k-fold models (fold2, fold3, ... fold10)? Are there any side effects?
That won't be doing any cross-validation.
The point of retraining the net is to ensure that it's trained on a different subset of your data, and for every full training, it keeps aside a set of validation data that it has never seen. If you reload your weights from a previous training instance, you're going to be validating against data your network has already seen, and your cross-validation score will be inflated.
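For contrast, a valid loop re-initializes the model for every fold, along these lines (a minimal sketch around sklearn's KFold; build_model, train, and evaluate are placeholders for your own code):

```python
import numpy as np
from sklearn.model_selection import KFold

def build_model():
    # Placeholder: construct a *fresh* network with newly initialized weights.
    ...

X = np.random.rand(100, 32, 32, 3)                  # placeholder images
y = np.random.randint(0, 2, size=100)               # placeholder labels

scores = []
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(X)):
    model = build_model()                            # fresh weights: no leakage between folds
    # train(model, X[train_idx], y[train_idx])
    # scores.append(evaluate(model, X[val_idx], y[val_idx]))

# print(np.mean(scores), np.std(scores))
```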

What are the action and reward in a neural network which learns its weights by reinforcement learning?

My goal is to predict customer churn. I want to use reinforcement learning to train a recurrent neural network which predicts a target response for its input.
I understand that the state is represented by the input to the network at each time step, but I don't understand how the action is represented. Is it the values of the weights, which the neural network should choose according to some formula?
Also, how should we create a reward or punishment to teach the neural network its weights, given that we don't know the target response for each input?
The aim of reinforcement learning is typically to maximize long term reward for an agent playing a game of sorts (a Markov Decision Process). In typical reinforcement learning usage, neural networks are used to approximate the Q-function. So, the network's input is the state and action (or a feature representations thereof), and the output is the value of taking that action in that state. Reinforcement learning algorithms like Q-learning provide the details on how to choose actions at a given time step, and also dictate how updates to the value function should be done.
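As a rough picture of such an approximator (a minimal PyTorch sketch; the dimensions are placeholders, and many implementations instead map the state alone to one value per action):

```python
import torch
import torch.nn as nn

state_dim, action_dim = 10, 4                       # placeholder sizes

# Q(s, a): takes a state together with an action and returns the estimated value.
q_net = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

state = torch.rand(1, state_dim)
action = torch.zeros(1, action_dim)
action[0, 2] = 1.0                                  # one-hot encoding of action 2
value = q_net(torch.cat([state, action], dim=1))
print(value.item())
```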
It isn't clear how your specific goal of building a customer churn model might be formulated as a Markov Decision Problem. You could define your states to be statistics about customers' interactions with the company website, but it isn't clear what the actions might be, because it isn't clear what the agent is and what it can do. This is also why you are finding it difficult to define a reward function. The reward function should tell the agent if it's doing a good job. So, if we're imagining an MDP where the agent is trying to minimize customer churn, we might provide a negative reward proportional to the number of customers that turn over.
I don't think you want to learn a Q-function. I think it's more likely that you are interested simply in supervised learning, where you have some sample data and you want to learn a function that will tell you how much churn there will be. For this, you should be looking towards gradient descent methods and forward/backward propagation for training your neural network.
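If the supervised route fits your data, the model reduces to an ordinary classifier trained with forward/backward propagation, for example (a minimal PyTorch sketch with a plain feed-forward net; the feature count and the random data are placeholders):

```python
import torch
import torch.nn as nn

n_features = 20                                     # placeholder feature count
X = torch.rand(500, n_features)                     # placeholder customer features
y = torch.randint(0, 2, (500, 1)).float()           # 1 = churned, 0 = stayed

model = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)                     # supervised target: known churn labels
    loss.backward()                                 # backpropagation
    optimizer.step()                                # gradient descent update
```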