How does batch size work in Keras? Stateful LSTM for time series - deep-learning

I am working on a time series problem using a stateful LSTM in Keras.
I have 40,000 samples, a batch size of 64, and a look-back of 7 days, so my tensor shape is (64, 7, 6), where 6 is the number of features.
My question is: when I set batch size = 64, how are samples selected by the Keras LSTM? Does it take the first 64 samples followed by the next 64 samples, or does it divide the samples into 625 windows (40000/64) and send the corresponding 64 samples from each window?
This is important because I am working on a time series problem with a stateful LSTM, where forecasting depends on the previous days.
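A minimal sketch of the two orderings, written as plain index arithmetic with no Keras calls (both function names are hypothetical). As far as the Keras documentation for stateful RNNs describes it, the state of sample i in one batch is reused as the initial state of sample i in the next batch, and with shuffle=False batches are taken as consecutive slices of the input array. Under that assumption, the first layout is what consecutive slicing produces, while the second is how the samples would need to be ordered so that each row of the batch follows one contiguous stretch of the series.

```python
def sequential_batches(n_samples, batch_size):
    """Consecutive slicing: batch k holds samples k*B .. k*B + B - 1."""
    return [list(range(start, start + batch_size))
            for start in range(0, n_samples - batch_size + 1, batch_size)]


def stream_aligned_batches(n_samples, batch_size):
    """Row i of every batch walks through one contiguous stream of the series.

    The series is cut into `batch_size` streams of `n_batches` samples each;
    batch k takes the k-th sample of every stream.
    """
    n_batches = n_samples // batch_size
    return [[i * n_batches + k for i in range(batch_size)]
            for k in range(n_batches)]


seq = sequential_batches(40000, 64)
aligned = stream_aligned_batches(40000, 64)

print(len(seq), seq[0][:3], seq[1][:3])
print(len(aligned), aligned[0][:3], aligned[1][:3])
```

In the second layout, row 0 sees samples 0, 1, 2, ... across successive batches, row 1 sees 625, 626, 627, ..., and so on, which is the ordering under which carrying state from batch to batch lines up with time.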

Related

Multiprocessing in StableBaselines3 - how is batch size distributed?

I am following this multiprocessing notebook. I want to understand how the batch_size parameter of the model is distributed across the multiple environments.
I have a model trained with 1 worker on 1 environment with a batch_size = 64, I understand that the network is updated in batches of 64 samples/timesteps.
Now what if I have that same model but trained with 4 workers on 4 environments, with parameter batch_size set to 64? Is the model now actually being updated with 64*4 samples/timesteps? Or is the 64 batch size being split 4 ways, so model updated with 64 samples, but 16 from each environment?
Thank you!

Question on the kernel dimensions for convolutions on mel filter bank features

I am currently trying to understand the following paper: https://arxiv.org/pdf/1703.08581.pdf. I am struggling to understand a part about how a convolution is performed on an input of log mel filterbank features:
We train seq2seq models for both end-to-end speech translation, and a baseline model for speech recognition. We found that the same architecture, a variation of that from [10], works well for both tasks. We use 80 channel log mel filterbank features extracted from 25ms windows with a hop size of 10ms, stacked with delta and delta-delta features. The output softmax of all models predicts one of 90 symbols, described in detail in Section 4, that includes English and Spanish lowercase letters.
The encoder is composed of a total of 8 layers. The input features are organized as a T × 80 × 3 tensor, i.e. raw features, deltas, and delta-deltas are concatenated along the ’depth’ dimension. This is passed into a stack of two convolutional layers with ReLU activations, each consisting of 32 kernels with shape 3 × 3 × depth in time × frequency. These are both strided by 2 × 2, downsampling the sequence in time by a total factor of 4, decreasing the computation performed in the following layers. Batch normalization [26] is applied after each layer.
As I understand it, the input to the convolutional layer is 3-dimensional: number of 25 ms windows (T) x 80 (features for each window) x 3 (features, delta features and delta-delta features). However, the kernels used on those inputs seem to have 4 dimensions and I do not understand why that is. Wouldn't a 4-dimensional kernel need a 4-dimensional input? In my head, the input has the same dimensions as an RGB picture: width (time) x height (frequency) x color channels (features, delta features and delta-delta features). Therefore I would think of a kernel for a 2D convolution as a filter of size a (filter width) x b (filter height) x 3 (depth of the input). Am I missing something here? What is wrong with my idea, or what is done differently in this paper?
Thanks in advance for your answer!
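I figured it out; it was just a misunderstanding on my side. The kernel shape "3 × 3 × depth" is 3-dimensional, and "in time × frequency" only names the two 3 × 3 axes. So the authors are using 32 kernels of shape 3x3 (each spanning the full input depth), which results (after two layers with 2x2 striding) in an output of shape t/4 x 20 x 32, where t stands for the time dimension.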
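The shape bookkeeping can be checked with a few lines of arithmetic. This is only a sketch: it assumes "same"-style padding, so that each stride-2 layer yields ceil(size / 2) outputs per axis (the quoted excerpt does not state the padding), and the helper names are hypothetical.

```python
import math


def conv_out(size, stride):
    # Assumption: "same" padding, so the output length is ceil(size / stride).
    return math.ceil(size / stride)


def encoder_conv_shapes(t, freq=80, depth=3, n_kernels=32,
                        kernel=(3, 3), stride=(2, 2), n_layers=2):
    """Return the weight and output shapes of each conv layer in the stack."""
    shapes = []
    for _ in range(n_layers):
        # Each of the n_kernels kernels is 3 x 3 x depth (time x frequency x depth).
        weights = (kernel[0], kernel[1], depth, n_kernels)
        t, freq = conv_out(t, stride[0]), conv_out(freq, stride[1])
        depth = n_kernels  # the next layer sees one channel per kernel
        shapes.append({"weights": weights, "output": (t, freq, depth)})
    return shapes


layers = encoder_conv_shapes(t=400)
for layer in layers:
    print(layer)
```

The apparent fourth dimension only shows up when all 32 kernels of a layer are stacked into one weight tensor; each individual kernel stays 3 x 3 x depth.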

How to determine steps_per_epoch when using image augmentation, given that it increases the number of images

model.fit_generator(datagen.flow(X_train,y_train,batch_size=32),epochs=10,steps_per_epoch=5000,validation_data=(X_test,y_test))
My total data size is 5000 and the batch size is 32. How do I determine the value for steps_per_epoch?
Case 1: when not using image augmentation
Case 2: when using image augmentation (because the number of images will increase, how do I account for that in steps_per_epoch?)
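The steps_per_epoch value is the total number of samples in your training set (before augmentation) divided by the batch size:
steps_per_epoch = 5000 / 32 ≈ 156
The exact quotient is 156.25, so this gives 156 full batches; round up to 157 if the final partial batch of 8 samples should also be covered in each epoch.
Using data augmentation does not affect this calculation, because the generator transforms images on the fly as each batch is drawn rather than adding images to the dataset. You can also get more info about working with this parameter, as well as fit_generator(), in my video on Training a CNN with Keras. The steps_per_epoch coverage starts around 4:08.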
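As a small worked example of that division (plain Python; the variable names are illustrative only), rounding down gives the number of full batches and rounding up also covers the last partial batch:

```python
import math

n_samples = 5000   # training images before augmentation
batch_size = 32

full_batches = n_samples // batch_size                     # full batches only
steps_per_epoch = math.ceil(n_samples / batch_size)        # includes the partial batch
last_batch_size = n_samples - full_batches * batch_size    # samples left over

print(full_batches, steps_per_epoch, last_batch_size)
```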

Training time on GeForce GTX Titan X with CUDA 7.5

I'm running the Caffe library on GeForce GTX Titan X with CUDA 7.5 (Ubuntu 14). I'm not sure whether Caffe is properly configured for my setup. My dataset consists of images with 256 x 256 pixels (3 channels), 100000 training / 10000 test samples. For the very first test I'm using AlexNet with new_height=256, new_width=256, crop_size=227. Running 1000 training iterations on one Titan X with batch_size=256 takes about 17 minutes... Is it not too slow for this hardware?
Any help or advice is much appreciated!
Running 1000 iterations with a batch of 256 images:
(256 height * 256 width * 3 channels * 256 batch size * 1000 iterations) bytes / ((1024*1024) bytes per MB * (17*60) seconds) ≈ 47 MB/s of input data throughput, assuming 1 byte per channel value.
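The same estimate as a short script, under the same assumption of 1 byte per channel value (uncompressed 8-bit images); it also reports images per second, which is often easier to compare against published benchmarks:

```python
height, width, channels = 256, 256, 3
batch_size, iterations = 256, 1000
seconds = 17 * 60

# Assumption: 1 byte per channel value (uncompressed 8-bit images).
bytes_per_image = height * width * channels
total_bytes = bytes_per_image * batch_size * iterations

mib_per_second = total_bytes / (1024 * 1024) / seconds
images_per_second = batch_size * iterations / seconds

print(round(mib_per_second, 1), round(images_per_second, 1))
```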
The following may improve the performance:
If the original images have a higher resolution, preprocess them to 256x256 beforehand, which avoids reading many unnecessary pixels from the hard disk.
Compile Caffe with the cuDNN flag enabled. This may lead to a speedup of around 30%.
Try creating an LMDB dataset from the input set and use the LMDB data for training.
Try using an SSD instead of a SATA hard disk.
No, it is not. Check out this link for Caffe performance and hardware configuration.

Multiple regression with lagged time series using libsvm

I'm trying to develop a forecaster for electricity consumption, so I want to perform a regression using daily data for an entire year. My dataset has several features. From googling, I've found that my problem is a multiple regression problem (please correct me if I am mistaken).
What I want to do is train an SVM for regression with several independent variables and one dependent variable with n lagged days. Here's a sample of my independent variables; I actually have around 10. (We used PCA to determine which variables had some correlation to our problem.)
Day Indep1 Indep2 Indep3
1 1.53 2.33 3.81
2 1.71 2.36 3.76
3 1.83 2.81 3.64
... ... ... ...
363 1.5 2.65 3.25
364 1.46 2.46 3.27
365 1.61 2.72 3.13
Independent variable 1 is actually my dependent variable in the future. So for example, with p=2 (lagged days) I would expect my SVM to train with the first 2 time steps of all three independent variables.
Indep1 Indep2 Indep3
1.53 2.33 3.81
1.71 2.36 3.76
And the output value of the dependent variable would be "1.83" (Indep variable 1 on time 3).
My main problem is that I don't know how to train properly. What I was doing is just putting the features of all p lagged days in one array for my "x" variables, and for my "y" variable I'm putting independent variable 1 at day p+1, since I want to predict the next day's power consumption.
Example of training.
x (p = 2, 3 independent variables): [1.53, 2.33, 3.81, 1.71, 2.36, 3.76]
y (next day): [1.83]
I tried with x being a two dimensional array but when you combine it for several days it becomes a 3d array and libsvm says it can't be.
Perhaps I should change from libsvm to another tool or maybe it's just that I'm training incorrectly.
Thanks for your help,
Aldo.
Let me answer with the python / numpy notation.
Assume the original time series data matrix with columns (Indep1, Indep2, Indep3, ...) is a numpy array data with shape (n_samples, n_variables). Let's generate it randomly for this example:
>>> import numpy as np
>>> n_samples, n_variables = 100, 5
>>> data = np.random.randn(n_samples, n_variables)
>>> data.shape
(100, 5)
If you want to use a window size of 2 time-steps, then the training set can be built as follows:
>>> targets = data[2:, 0] # shape is (n_samples - 2,)
>>> targets.shape
(98,)
>>> features = np.hstack([data[0:-2, :], data[1:-1, :]]) # shape is (n_samples - 2, n_variables * 2)
>>> features.shape
(98, 10)
Now you have your 2D input array + 1D targets that you can feed to libsvm or scikit-learn.
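The same construction can be generalized to any window size p. This is a sketch only: make_lagged is a hypothetical helper name, and it assumes the target is one column of the same data matrix, as in the question.

```python
import numpy as np


def make_lagged(data, p, target_col=0):
    """Build (features, targets) from a (n_samples, n_variables) series.

    Each feature row concatenates p consecutive time steps (oldest first);
    the target is `target_col` at the time step right after the window.
    """
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0]
    # Block j holds the j-th time step of every window.
    blocks = [data[j:n_samples - p + j, :] for j in range(p)]
    features = np.hstack(blocks)       # shape (n_samples - p, n_variables * p)
    targets = data[p:, target_col]     # shape (n_samples - p,)
    return features, targets


# First three days from the question's table.
sample = [[1.53, 2.33, 3.81],
          [1.71, 2.36, 3.76],
          [1.83, 2.81, 3.64]]
features, targets = make_lagged(sample, p=2)
print(features)
print(targets)
```

With p=2 this reproduces the row from the question: x is the six values of days 1 and 2, and y is Indep1 on day 3.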
Edit: it might very well be the case that extracting more time-series-oriented features such as a moving average, moving min, moving max, moving differences (time-based derivatives of the signal) or an STFT helps your SVM model make better predictions.