Lower batch size in the last iteration of the first training epoch than in the other iterations - deep-learning

I'm trying to train a deep neural network model. The output dimensions of each iteration in one epoch are [64, 1600, 8] (64 is the batch size), but in the last iteration of the first epoch the output changed to [54, 1600, 8] and I got a dimension error. Why did the batch size change in the last iteration?
Additionally, if I change the batch size to 32, the last iteration's output is [22, 1600, 8].
I expected the output of the last iteration to have the same shape as the other iterations.

The last iteration's batch size changed because you did not have enough data left to completely fill the batch. If you have a batch size of 10, for example, and 101 entries total in your data, then you will have 10 batches of 10 and 1 batch of 1.
The solution is either to drop the batch if it is not the correct size, or to adapt your model so that it detects the size of each batch and changes accordingly, instead of having the batch size hard-coded into your model parameters, as sketched below.
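A minimal sketch of the second option (the 1600×8 shape is taken from the question; the helper name is hypothetical):

import torch

def reshape_output(x):
    # Read the batch size from the tensor itself instead of hard-coding 64,
    # so the final, smaller batch (e.g. 54) is handled too.
    batch_size = x.size(0)
    return x.view(batch_size, 1600, 8)

print(reshape_output(torch.randn(64, 12800)).shape)  # torch.Size([64, 1600, 8])
print(reshape_output(torch.randn(54, 12800)).shape)  # torch.Size([54, 1600, 8])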

Seeing that you are using PyTorch, I'll add to Richard's answer by saying that PyTorch DataLoaders have built-in functionality to drop the last (incomplete) batch: checking the documentation, you can specify drop_last=True while instantiating the DataLoader.
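A minimal sketch (the dataset of 1014 random samples is made up so that 1014 % 64 == 54 and 1014 % 32 == 22, matching the shapes in the question):

import torch
from torch.utils.data import DataLoader, TensorDataset

data = torch.randn(1014, 1600, 8)      # hypothetical dataset
loader = DataLoader(TensorDataset(data), batch_size=64, drop_last=True)

# drop_last=True discards the final incomplete batch of 54 samples,
# so every batch the model sees has exactly 64 entries.
for (batch,) in loader:
    print(batch.shape)                  # always torch.Size([64, 1600, 8])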

Related

Slow prediction speed for translation model opus-mt-en-ro

I'm using the model Helsinki-NLP/opus-mt-en-ro from huggingface.
To produce output, I'm using the following code:
inputs = tokenizer(
    questions,
    max_length=max_input_length,
    truncation=True,
    return_tensors='pt',
    padding=True,
).to('cuda')
translation = model.generate(**inputs)
For small inputs (i.e., a small number of sentences in questions), it works fine. However, when the number of sentences increases (e.g., batch size = 128), it is very slow.
I have a dataset of 100K examples for which I have to produce output. How can I make it faster? (I already checked GPU usage; it varies between 25% and 70%.)
Update: Following the comment of dennlinger, here is the additional information:
Average question length: Around 30 tokens
Definition of slowness: with a batch of 128 questions, it takes around 25 seconds, so my dataset of 100K examples will take more than 5 hours. I'm using an Nvidia V100 GPU (16 GB), hence to('cuda') in the code. I cannot increase the batch size because it results in an out-of-memory error.
I didn't try different parameters, but I know that by default the number of beams equals 1.
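No answer is recorded here, but one common speed-up, sketched under the assumption that questions, tokenizer, model, and max_input_length are the objects from the question: sort the inputs by length so each batch contains similarly sized sequences (less compute wasted on padding), and run generation under torch.inference_mode() to skip autograd bookkeeping.

import torch

questions_sorted = sorted(questions, key=len)   # group similar lengths together

translations = []
with torch.inference_mode():                    # no autograd overhead at inference
    for i in range(0, len(questions_sorted), 128):
        batch = questions_sorted[i:i + 128]
        inputs = tokenizer(
            batch,
            max_length=max_input_length,
            truncation=True,
            return_tensors='pt',
            padding=True,
        ).to('cuda')
        outputs = model.generate(**inputs)
        # Note: results come back in sorted order, not the original order.
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))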

How is dividing into minibatches implemented in batch normalization for deeper layers?

Suppose we have a dataset X (a 2D array), and we divide it into batches X_1, ..., X_k.
For each batch we normalize it, then multiply the i-th component of each batch element by a parameter gamma_i and add beta_i to it.
A batch normalization layer can be repeated several times in a network, and I haven't found anything about how it is implemented deeper in the network.
In the later BN layers, do we use the same division into batches as at the beginning (the same rows of X as in the first BN layer), just with new gamma and beta parameters, or is the division done from scratch for every layer's input?
I hope my question is clear.
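The question is unanswered here, but a minimal NumPy sketch of the usual behaviour (my assumption, not stated in the question): the same minibatch flows through the whole network, and each BN layer simply normalizes whatever activations reach it, with its own gamma and beta.

import numpy as np

def batch_norm(h, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension, then scale and shift.
    return gamma * (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps) + beta

X_1 = np.random.randn(32, 10)                 # one minibatch (32 rows of X)
W1, W2 = np.random.randn(10, 16), np.random.randn(16, 4)
gamma1, beta1 = np.ones(16), np.zeros(16)     # parameters of the first BN layer
gamma2, beta2 = np.ones(4), np.zeros(4)       # separate parameters of the second

h1 = np.maximum(batch_norm(X_1 @ W1, gamma1, beta1), 0)   # first BN layer + ReLU
h2 = batch_norm(h1 @ W2, gamma2, beta2)       # deeper BN layer, same 32 rows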

Epochs and Iterations in Deeplearning4j

I recently started learning Deeplearning4j and I fail to understand how the concept of epochs and iterations is actually implemented.
In the online documentation it says:
an epoch is a complete pass through a given dataset ...
Not to be confused with an iteration, which is simply one
update of the neural net model’s parameters.
I ran training using a MultipleEpochsIterator, with 1 epoch, miniBatchSize = 1, and a dataset of 1000 samples for the first run, so I expected the training to finish after 1 epoch and 1000 iterations, but after more than 100,000 iterations it was still running.
int nEpochs = 1;
int miniBatchSize = 1;
MyDataSetFetcher fetcher = new MyDataSetFetcher(xDataDir, tDataDir, xSamples, tSamples);
// The same batch size set here was set in the model
BaseDatasetIterator baseIterator = new BaseDatasetIterator(miniBatchSize, sampleSize, fetcher);
MultipleEpochsIterator iterator = new MultipleEpochsIterator(nEpochs, baseIterator);
model.fit(iterator);
Then I did more tests, changing the batch size, but that didn't change the frequency of the log lines printed by the IterationListener. I thought that if I increased the batch size to 100, then with 1000 samples I would have just 10 parameter updates and therefore just 10 iterations, but the logs and the timestamp intervals are more or less the same. The arithmetic behind that expectation is sketched below.
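A quick framework-agnostic check of that arithmetic (plain Python, not DL4J code):

import math

dataset_size = 1000
for batch_size in (1, 100):
    # One parameter update per minibatch: 1000 iterations at batch size 1,
    # 10 iterations at batch size 100.
    updates_per_epoch = math.ceil(dataset_size / batch_size)
    print(batch_size, updates_per_epoch)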
BTW, there is a similar question, but the answer does not actually answer my question; I would like to understand the actual details better:
Deeplearning4j: Iterations, Epochs, and ScoreIterationListener
None of this will matter after 1.x (which is already out in alpha) - we got rid of iterations long ago.
Originally it was meant to be shortcut syntax so folks wouldn't have to write for loops.
Just focus on for loops with epochs now.

How do I read variable length 1D inputs in Tensorflow?

I'm trying to read variable length 1-D inputs into a Tensorflow CNN.
I have previously implemented reading fixed-length inputs by first constructing a CSV file (where the first column is the label and the remaining columns are the input values: flattened spectrogram data, all padded/truncated to the same length) using tf.TextLineReader().
This time I have a directory full of files, each containing one line of data I want to use as input (flattened spectrogram data again, but I do not want to force the files to the same dimensions), and the line lengths are not fixed. I'm getting an error trying to use the previous approach of compiling a CSV first. I looked into the documentation of tf.TextLineReader() and it specifies that all CSV rows must be the same shape, so I am stuck! Any help would be much appreciated, thanks :)
I'm assuming that the data isn't changing shape when you have a longer or shorter sample, right? By that I mean that if you trained your network on arrays of 1000 pixels, for example, with a kernel of, say, [5,1] size, that [5,1] kernel needs to see the same patterns in the variable-length data as it did in the training data. If your data is stretched or shrunk, then the correct solution is to interpolate the data to the same size as the training data so the shapes/patterns match.
Assuming you just want variable-length inputs, then in theory you should be able to do this by setting your batch size to 1 and varying the 1st dimension of the data.
So your input placeholder would look like:
X = tf.placeholder(dtype, shape=[1,None,1,1])
The 4 shape arguments are: 1 = batch size; None = unknown size of the variable-length dimension; 1 = unused because it's a 1D dataset; 1 = one channel, again unused but necessary for tf.conv2d to receive the expected 4D input.
This is not very different from configuring tensorflow to support variable batch sizes. So you should review this link below and understand that process.
get the size of a variable batch dimension
Note that you can't use a batch size of more than 1 here because you wouldn't be able to construct a matrix with missing values in the 2nd dimension. I expect the convolution operations to work with this variable dimension (though I haven't actually tried this).
Another option to deal with this problem would be to pad your inputs with 0's so they all have a common length, but that will need to have been trained into the model up front.
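A minimal sketch of the variable-length setup (TF1-style API; under TensorFlow 2 this requires tf.compat.v1 with eager execution disabled, and the [5,1] kernel and sample lengths are just illustrative):

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

X = tf.placeholder(tf.float32, shape=[1, None, 1, 1])   # batch=1, variable length
kernel = tf.Variable(tf.truncated_normal([5, 1, 1, 8], stddev=0.1))
conv = tf.nn.conv2d(X, kernel, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for length in (800, 1000, 1300):                     # samples of different lengths
        sample = np.random.rand(1, length, 1, 1).astype(np.float32)
        print(sess.run(conv, feed_dict={X: sample}).shape)  # (1, length, 1, 8)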

How to print probability for repeated measures logistic regression?

I would like SAS to print the probability of my binary dependent variable occurring (“Calliphoridae”, a particular fly family, being present (1) or not (0)) at a specific value of my continuous independent variable (“degree_index”, which was recorded from .055 to 2.89, but can be continuously recorded past 2.89 and always increases as time goes on), using Proc GENMOD. How do I change my code to print, for example, the probability that Calliphoridae is present at degree_index=.1?
My example code is:
proc genmod data=thesis descending;
  class Body_number;
  model Calliphoridae = degree_index / dist=binomial link=logit;
  repeated subject=Body_number / type=cs;
  estimate 'degree_index=.1' intercept 1 degree_index 0 / exp;
  estimate 'degree_index=.2' intercept 1 degree_index .1 / exp;
run;
I get an output for the contrast estimate results: the mean estimate at degree_index=.1 is .99, and at degree_index=.2 it is .98.
I think it is correctly modeling the probability; I just didn't include the square of the degree-day index. If you do, it allows the probability to increase and decrease. I realized this when I computed the probability by hand, p = exp(-1.1307x + .2119) / (1 + exp(-1.1307x + .2119)), to verify that this really was modeling the probability that y=1 for the mean estimates at specific x values, and then I realized that it is fitting a regression line and cannot both increase and decrease because there is only one x term. http://www.stat.sc.edu/~hansont/stat704/chapter14a.pdf
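A hedged check of that hand calculation (plain Python, using the two coefficients quoted above; with only the linear term the fitted probability is monotone in x, which is the limitation described):

import math

def p_present(x, slope=-1.1307, intercept=0.2119):
    # Inverse logit: p(y=1 | x) = exp(slope*x + intercept) / (1 + exp(slope*x + intercept))
    eta = slope * x + intercept
    return math.exp(eta) / (1 + math.exp(eta))

for x in (0.1, 0.2, 1.0, 2.89):
    print(x, round(p_present(x), 3))   # strictly decreasing in x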