The PPO model doesn't iterate through the whole dataframe; it's basically repeating the first step many times (10,000 in this example)?
In this case, the DF's shape is (5476, 28) and each step's observation shape is (60, 28). I don't see that it's iterating through the whole DF.
# df shape - (5476, 28)
env = MyRLEnv(df)
model = PPO("MlpPolicy", env, verbose=4)
model.learn(total_timesteps=10000)
# inside MyRLEnv.__init__:
self.action_space = spaces.Discrete(4)
self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(60, 28), dtype=np.float64)
Thanks!
I also got stuck a few days ago on something similar to this. After inspecting it deeply, I found that the learn method actually runs the environment n times, where n = total_timesteps / size_of_df; in your case that is roughly 10000 / 5476 ≈ 1.8. So the algorithm resets the environment at the beginning, runs the step method over the entire dataframe, then resets the environment again and runs the step method over only about 80% of the dataframe. That is why, when the PPO algorithm stops, you see only about 80% of the dataframe being run.
Actor-critic algorithms run the environment numerous times to improve their efficiency, which is why it is usually suggested to keep total_timesteps fairly high: the algorithm then runs over the same data several times and learns better.
Example:
Say total_timesteps = 10000 and len(df) = 5000.
In that case it would run n = total_timesteps / len(df) = 2 full passes over the entire dataframe.
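Just to make the bookkeeping explicit, here is a tiny sketch of that arithmetic (this is plain step counting, not the Stable-Baselines3 internals, and it assumes one episode runs through the whole dataframe before the environment is reset):

total_timesteps = 10000
episode_length = 5476  # len(df) in the question

# total_timesteps counts calls to env.step(); each pass over the dataframe
# consumes episode_length of them before the environment is reset
full_passes, leftover = divmod(total_timesteps, episode_length)
print(full_passes, leftover, round(leftover / episode_length, 2))
# -> 1 full pass, then 4524 more steps (about 0.83 of a second pass)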
I'm trying to implement a simple GAN in PyTorch. The following training code works:
for epoch in range(max_epochs):  # loop over the dataset multiple times
    print(f'epoch: {epoch}')
    running_loss = 0.0
    for batch_idx, (data, _) in enumerate(data_gen_fn):
        # data preparation
        real_data = data
        input_shape = real_data.shape
        inputs_generator = torch.randn(*input_shape).detach()

        # generator forward
        fake_data = generator(inputs_generator).detach()

        # discriminator forward
        optimizer_generator.zero_grad()
        optimizer_discriminator.zero_grad()

        #################### ALERT CODE #######################
        predictions_on_real = discriminator(real_data)
        predictions_on_fake = discriminator(fake_data)
        predictions = torch.cat((predictions_on_real,
                                 predictions_on_fake), dim=0)
        #########################################################

        # loss discriminator
        labels_real_fake = torch.tensor([1]*batch_size + [0]*batch_size)
        loss_discriminator_batch = criterion_discriminator(predictions,
                                                           labels_real_fake)
        # update discriminator
        loss_discriminator_batch.backward()
        optimizer_discriminator.step()

        # generator
        # zero the parameter gradients
        optimizer_discriminator.zero_grad()
        optimizer_generator.zero_grad()

        fake_data = generator(inputs_generator)  # make again fake data but without detaching
        predictions_on_fake = discriminator(fake_data)  # D(G(encoding))

        # loss generator
        labels_fake = torch.tensor([1]*batch_size)
        loss_generator_batch = criterion_generator(predictions_on_fake,
                                                   labels_fake)
        loss_generator_batch.backward()  # dL(D(G(encoding)))/dW_{G,D}
        optimizer_generator.step()
If I plot the generated images for each iteration, I see that the generated images look like the real ones, so the training procedure seems to work well.
However, if I try to change the code in the ALERT CODE part, i.e., instead of:
#################### ALERT CODE #######################
predictions_on_real = discriminator(real_data)
predictions_on_fake = discriminator(fake_data)
predictions = torch.cat((predictions_on_real,
                         predictions_on_fake), dim=0)
#########################################################
I use the following:
#################### ALERT CODE #######################
predictions = discriminator(torch.cat( (real_data, fake_data), dim=0))
#######################################################
That is conceptually the same: in a nutshell, instead of doing two separate forward passes on the discriminator (the former on the real data, the latter on the fake data) and then concatenating the results, with the new code I first concatenate real and fake data and then make just one forward pass on the concatenated batch.
However, this version of the code does not work: the generated images always look like random noise.
Any explanation for this behavior? Why do we get different results?
Supplying inputs in either the same batch, or separate batches, can make a difference if the model includes dependencies between different elements of the batch. By far the most common source in current deep learning models is batch normalization. As you mentioned, the discriminator does include batchnorm, so this is likely the reason for different behaviors. Here is an example. Using single numbers and a batch size of 4:
import numpy as np

features = [1., 2., 5., 6.]
print("mean {}, std {}".format(np.mean(features), np.std(features)))
print("normalized features", (features - np.mean(features)) / np.std(features))
>>>mean 3.5, std 2.0615528128088303
>>>normalized features [-1.21267813 -0.72760688 0.72760688 1.21267813]
Now we split the batch into two parts. First part:
features = [1., 2.]
print("mean {}, std {}".format(np.mean(features), np.std(features)))
print("normalized features", (features - np.mean(features)) / np.std(features))
>>>mean 1.5, std 0.5
>>>normalized features [-1. 1.]
Second part:
features = [5., 6.]
print("mean {}, std {}".format(np.mean(features), np.std(features)))
print("normalized features", (features - np.mean(features)) / np.std(features))
>>>mean 5.5, std 0.5
>>>normalized features [-1. 1.]
As we can see, in the split-batch version, the two batches are normalized to the exact same numbers, even though the inputs are very different. In the joint-batch version, on the other hand, the larger numbers are still larger than the smaller ones as they are normalized using the same statistics.
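The same effect can be reproduced with an actual BatchNorm layer. Here is a minimal PyTorch sketch using a toy nn.BatchNorm1d with a single feature (not the discriminator from the question):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(1, affine=False)  # normalization only, no learnable scale/shift
bn.train()                            # training mode: uses per-batch statistics

real = torch.tensor([[1.], [2.]])
fake = torch.tensor([[5.], [6.]])

# Joint batch: real and fake share one mean/std
print(bn(torch.cat((real, fake), dim=0)).flatten())  # ~[-1.21, -0.73, 0.73, 1.21]

# Split batches: each half is normalized with its own mean/std
print(bn(real).flatten(), bn(fake).flatten())        # ~[-1, 1] and ~[-1, 1]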
Why does this matter?
With deep learning, it's always hard to say, and especially with GANs and their complex training dynamics. A possible explanation is that, as we can see in the example above, the separate batches result in more similar features after normalization even if the original inputs are quite different. This may help early in training, as the generator tends to output "garbage" which has very different statistics from real data.
With a joint batch, these differing statistics make it easy for the discriminator to tell the real and generated data apart, and we end up in a situation where the discriminator "overpowers" the generator.
By using separate batches, however, the different normalizations make the generated and real data look more similar, which makes the task less trivial for the discriminator and allows the generator to learn.
I have a MATLAB script that takes a JSON, created by myself on a remote server, which contains a long list of 3x3xN coordinates, e.g. for N=1:
str = '[1,2,3.14],[4,5.66,7.8],[0,0,0],';
I want to avoid splitting the string manually; is there any approach using strread or similar to read this 3×3×N tensor?
It's a multi-particle system and N can be large, though I have enough memory to store it all at once.
Any suggestion of how to format the array string in the JSON is very welcome as well.
If you can guarantee the format is always the same, I think it's easiest, safest and fastest to use sscanf:
fmt = '[%f,%f,%f],[%f,%f,%f],[%f,%f,%f],';
data = reshape(sscanf(str, fmt), 3, 3).';
Depending on the rest of your data (how is that "N" represented?), you might need to adjust that reshape/transpose.
EDIT
Based on your comment, I think this will solve your problem quite efficiently:
% Strip unneeded concatenation characters
str(str == ',') = ' ';
str(str == ']' | str == '[') = [];
% Reshape into workable dimensions
data = permute( reshape(sscanf(str, '%f '), 3,3,[]), [2 1 3]);
As noted by rahnema1, you can avoid the permute and/or character removal by adjusting your JSON generators to spit out the data column-major and without brackets, but you'll have to ask yourself these questions:
- whether that is really worth the effort, considering that this code right here is already quite tiny and pretty efficient
- whether other applications are going to use the JSON interface, because in essence you're de-generalizing the JSON output just to fit your processing script on the other end. I think that's a pretty bad design practice, but oh well.
Just something to keep in mind:
- emitting 500k values in binary is about 34 MB
- doing the same in ASCII is about 110 MB
Now depending a bit on your connection speed, I'd be getting really annoyed really quickly because every little test run takes about 3 times as long as it should be taking :)
So if an API call straight to the raw data is not possible, I would at least base64 that data in the JSON.
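For example, if the server side happens to be Python (an assumption; the question doesn't say how the JSON is generated), the base64 payload could look roughly like this:

import base64
import json
import numpy as np

# Hypothetical server-side sketch: ship the 3x3xN coordinates as raw float64
# bytes, base64-encoded, instead of spelling every number out in ASCII
coords = np.random.rand(3, 3, 1000)  # placeholder data
payload = {
    "shape": list(coords.shape),
    "data": base64.b64encode(coords.tobytes()).decode("ascii"),
}
print(len(json.dumps(payload)))  # roughly 4/3 of the binary size, well below ASCII

On the MATLAB side the string can then be base64-decoded and typecast back to double.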
You can use the eval function:
str = '[1,2,3.14],[4,5.66,7.8],[0,0,0],';
result=permute(reshape(eval(['[' ,str, ']']),3,3,[]),[2 1 3])
result =
1.00000 2.00000 3.14000
4.00000 5.66000 7.80000
0.00000 0.00000 0.00000
Using eval, all elements are concatenated to create a row vector. The row vector is then reshaped into a 3D array. Since MATLAB places matrix elements column-wise, the array has to be permuted so that each 3×3 matrix is transposed.
Note 1: there is no need to place [] in the JSON string, so you can use str2num instead of eval:
result=permute(reshape(str2num(str),3,3,[]),[2 1 3])
Note 2:
If you save the data column-wise, there is no need to permute:
str='1 4 0 2 5.66 0 3.14 7.8 0';
result=reshape(str2num(str),3,3,[])
Update: as Ander Biguri and excaza noted, there are security and speed issues related to eval and str2num, so after Rody Oldenhuis's suggestion about using sscanf I tested the 3 methods in Octave:
a=num2str(rand(1,60000));
disp('-----SSCANF---------')
tic
sscanf(a,'%f ');
toc
disp('-----STR2NUM---------')
tic
str2num(a);
toc
disp('-----STRREAD---------')
tic
strread(a,'%f ');
toc
and here is the result:
-----SSCANF---------
Elapsed time is 0.0344398 seconds.
-----STR2NUM---------
Elapsed time is 0.142491 seconds.
-----STRREAD---------
Elapsed time is 0.515257 seconds.
So it is both more secure and faster to use sscanf. In your case:
str='1 4 0 2 5.66 0 3.14 7.8 0';
result=reshape(sscanf(str,'%f '),3,3,[])
or
str='1, 4, 0, 2, 5.66, 0, 3.14, 7.8, 0';
result=reshape(sscanf(str,'%f,'),3,3,[])
I am wondering how to set up ONLY a test phase in Caffe for an LMDB file. I have already trained my model, everything seems good, my loss has decreased, and the output I get on images loaded in one by one also seems good.
Now I would like to see how my model performs on a separate LMDB test set, but I seem to be unable to do so successfully. It would not be ideal to loop over images one at a time, since my loss function is already defined in Caffe and this would require me to redefine it.
This is what I have so far, but the results don't make sense: when I compare the loss I get from the training set to the loss I get here, they are orders of magnitude apart. Does anyone have any idea what my problem could be?
caffe.set_device(0)
caffe.set_mode_gpu()

net = caffe.Net('/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/deploy_JP.prototxt',
                '/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/snapshot_iter_10000.caffemodel',
                caffe.TEST)

solver = None  # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.SGDSolver('/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/lenet_auto_solverJP_test.prototxt')

niter = 100
test_loss = zeros(niter)
count = 0
for it in range(niter):
    solver.test_nets[0].forward()  # SGD by Caffe
    # store the test loss
    test_loss[count] = solver.test_nets[0].blobs['loss']
    print(solver.test_nets[0].blobs['loss'].data)
    count = count + 1
See my answer here. Do not forget to subtract the mean, otherwise you'll get low accuracy. The link to the code, posted above, takes care of that.
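For reference, mean subtraction in the Python interface usually goes through caffe.io.Transformer. A minimal sketch (the mean file path and the input blob name 'data' are assumptions; adjust them to your setup, and net is the deploy net from the snippet above):

import caffe

# Load the dataset mean (path is hypothetical) and reduce it to per-channel values
blob = caffe.proto.caffe_pb2.BlobProto()
with open('mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mu = caffe.io.blobproto_to_array(blob)[0].mean(1).mean(1)

# Attach the mean to the input preprocessing of the deploy net
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))  # HWC -> CHW
transformer.set_mean('data', mu)              # subtract the per-channel mean

If the test net reads directly from LMDB through a Data layer, the equivalent is the transform_param { mean_file: ... } setting in the prototxt.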
I have a large data set in a MySQL database (at least 11 GB of data). I would like to train a NaiveBayes model on the entire set and then test it against a smaller, but also quite large, data set (~3 GB).
The second part seems feasible - I assume that I would run the following in a loop:
data_test <- sqlQuery(con, paste("select * from test_data LIMIT 10000", "OFFSET", (i*10000) ))
model_pred <- predict(model, data_test, type="raw")
...and then dump the predictions back to MySQL or a CSV.
How can I, however, train my model incrementally on such a large data set? I noticed in the R documentation of the function (http://www.inside-r.org/packages/cran/e1071/docs/naiveBayes) that there is an additional argument in the predict function, "newdata", which suggests that incremental learning is possible. The predict function, however, will return the predictions and not a new model.
Please provide me with an example of how to incrementally train my model.
I am working with the Dynamic Topic Models package that was developed by Blei. I am new to LDA, but I understand it.
I would like to know what does the output by the name of
lda-seq/topic-000-var-obs.dat store?
I know that lda-seq/topic-001-var-e-log-prob.dat stores the log of the variational posterior and by applying the exponential over it, I get the probability of the word within Topic 001.
Thanks
topic-000-var-e-log-prob.dat stores the log of the variational posterior of topic 1.
topic-001-var-e-log-prob.dat stores the log of the variational posterior of topic 2.
I have failed to find a concrete answer anywhere. However, the documentation's sample.sh states
The code creates at least the following files:
- topic-???-var-e-log-prob.dat: the e-betas (word distributions) for topic ??? for all times.
...
- gam.dat
without mentioning the topic-000-var-obs.dat file, which suggests that it is not imperative for most analyses.
Speculation
obs suggests observations. After digging around a little in the example/model_run results, I plotted the sum across epochs for each word/token using:
temp = scan("dtm/example/model_run/lda-seq/topic-000-var-obs.dat")
temp.matrix = matrix(temp, ncol = 10, byrow = TRUE)
plot(rowSums(temp.matrix))
and the result is something like:
The general trend of the non-negative values is decreasing, and many values are floored (in this case to -11.00972 = log(1.67e-05)), suggesting that these values are weightings or some other measure of influence on the model. The model removes some tokens, and the influence/importance of the others tapers off over the index. The latter trend may be caused by preprocessing, such as sorting tokens by tf-idf when creating the dictionary.
Interestingly, the row-sum values vary for both the floored tokens and the set with more positive values:
temp = scan("~/Documents/Python/inference/project/dtm/example/model_run/lda-seq/topic-009-var-obs.dat")
temp.matrix = matrix(temp, ncol = 10, byrow = TRUE)
plot(rowSums(temp.matrix))