How to set only the modified weights for each convolutional layer? - deep-learning

I am currently doing some experiments on modifying the weights, but not the biases, of each convolutional layer of a model.
For each layer of the model, I used layer.get_weights()[0] to get the weights. After modifying the values of the weights for that particular layer, I wanted to set the weights back on the corresponding layer.
I wanted to use the set_weights() method for that purpose; however, it takes both weights and biases as input, so I could not achieve that. What is the simplest method to set the weight values back on the layers of the model while keeping the biases the same as they are?
I am just a beginner, so if the question is not appropriate, kindly give me some suggestions and ideas.

layer.get_weights() returns a list of NumPy arrays: element 0 is the weights and element 1 is the biases. (I don't remember offhand whether this list can contain anything else for some layer types, but it is not important in your situation.)
So you can do something like:
params = layer.get_weights()       # [weights, biases] for a standard conv layer
weights = params[0]
biases = params[1]
my_weights = <your modifications>  # must keep the same shape as weights
layer.set_weights([my_weights, biases])
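For instance, a minimal sketch that applies this to every convolutional layer of a tf.keras model (the model name and the halving of the weights are placeholder assumptions, not part of your setup):

from tensorflow.keras.layers import Conv2D

for layer in model.layers:
    if isinstance(layer, Conv2D):                   # only touch convolutional layers
        weights, biases = layer.get_weights()
        layer.set_weights([weights * 0.5, biases])  # example modification: halve the weights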

Related

Using an LSTM to forecast precipitation

I built an LSTM to forecast precipitation, but it doesn't work well.
My code is very simple, and my data is very short: it contains only 720 points.
I use MinMaxScaler to scale the data.
This is my code, with SEQ_LEN = 12:
model = Sequential([
    layers.LSTM(2, input_shape=(SEQ_LEN, 1)),
    layers.Dense(1)])
My data looks like this [plot omitted], and the output compared with the true values looks like this [plot omitted].
I use the Adam optimizer and the MAE loss function, with epochs=10.
Is it underfitting, or is this simple net just unable to do this job?
The r2_score is no more than 0.55.
Please tell me how to adjust it. Thanks.
There are many options:
First of all, it would be better to find the optimal window size by changing the length (period) of the input sequences.
The second option would be changing the batch size of the dataset.
Third, change the optimizer to SGD (because you have few data points), and before training the model search for the best learning rate by setting a LearningRateScheduler callback.
Fourth, try another model architecture, e.g. one with convolutional layers.
Finally, it can sometimes help performance to put a Lambda layer after the last layer to scale up the values, because the LSTM's default activation function is tanh, whose outputs are limited to (-1, 1). The last few ideas are sketched below.
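A minimal sketch of the scheduler and Lambda ideas (the layer sizes, the scale factor, and the schedule constants are illustrative assumptions, not tuned values; x_train and y_train stand for your scaled data):

from tensorflow.keras import Sequential, layers
from tensorflow.keras.callbacks import LearningRateScheduler

model = Sequential([
    layers.LSTM(2, input_shape=(SEQ_LEN, 1)),
    layers.Dense(1),
    layers.Lambda(lambda t: t * 10.0),  # scale outputs beyond tanh's (-1, 1) range
])
model.compile(optimizer='sgd', loss='mae')

# Sweep the learning rate upward across epochs to find a good value for SGD.
lr_schedule = LearningRateScheduler(lambda epoch: 1e-4 * 10 ** (epoch / 20))
model.fit(x_train, y_train, epochs=100, callbacks=[lr_schedule])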

what's the meaning of 'parameterize' in deep learning?

What's the meaning of 'parameterize' in deep learning? As shown in the attached photo, does it mean the matrix 'A' can be changed by the optimization during training?
Yes. When something can be parameterized, it means that gradients can be calculated for it.
This means that dE/dw, the derivative of the error with respect to the weight, can be calculated (i.e., the operation must be differentiable) and subtracted from the model weights, scaled by a learning rate and whatever other parameters the optimizer involves.
What the paper is saying is that if you make a binary matrix a weight, compute the gradient dE/dw of that weight with respect to a loss, and then update the binary matrix through backpropagation, there is no activation function (which by requirement must be differentiable) that can keep the values discrete (like 0 and 1); instead you will end up with continuous (decimal) values.
Therefore, since it is difficult to have binary weights that remain binary after a backpropagation update through a differentiable activation function, another solution, such as sampling from a Bernoulli distribution, is used instead to initialize the parameters of the model.
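As a toy NumPy illustration of why a single gradient step destroys binarity (all numbers here are made up):

import numpy as np

w = np.array([0.0, 1.0, 1.0, 0.0])      # a "binary" weight vector
grad = np.array([0.3, -0.1, 0.2, 0.5])  # some hypothetical gradient dE/dw
lr = 0.1
print(w - lr * grad)                     # [-0.03  1.01  0.98 -0.05] -- no longer binary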
Hope this helps,

What is the dimension of stacked bidirectional LSTM layers?

I read a paper about machine translation that uses a projection layer. Its encoder has 6 bidirectional LSTM layers. If the input embedding dimension is 512, what will the dimension of the encoder output be? 512*2**5?
The paper's link: https://www.aclweb.org/anthology/P18-1008.pdf
Not quite. Unfortunately, Figure 1 in the mentioned paper is a bit misleading. It is not that the six encoding layers are in parallel, as it might be understood from the figure, but rather that these layers are successive, meaning that the hidden state/output from the previous layer is used in the subsequent layer as an input.
This, and the fact that the input (embedding) dimension is NOT the output dimension of the LSTM layer (the output dimension is in fact 2 * hidden_size), changes your output dimension to exactly that: 2 * hidden_size, before it is fed into the final projection layer, which again changes the dimension depending on your specifications.
It is not quite clear to me what the "add" described for the layer does, but if you look at a reference implementation it seems to be irrelevant to the answer. Specifically, observe that the encoding function is basically:
def encode(...):
    encode_inputs = self.embed(...)
    for l in range(num_layers):
        prev_input = encode_inputs
        encode_inputs = self.nth_layer(...)
        # ...
Obviously, there is a bit more happening here, but this illustrates the basic functional block of the network.
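To check the 2 * hidden_size claim concretely, here is a minimal PyTorch sketch (not the paper's code; the sizes are assumptions chosen to match the question):

import torch
import torch.nn as nn

emb_dim, hidden_size, num_layers = 512, 512, 6
encoder = nn.LSTM(input_size=emb_dim, hidden_size=hidden_size,
                  num_layers=num_layers, bidirectional=True)

x = torch.randn(10, 1, emb_dim)  # (seq_len, batch, emb_dim)
out, _ = encoder(x)
print(out.shape)                 # torch.Size([10, 1, 1024]), i.e. 2 * hidden_size

Note that the output width stays 2 * hidden_size no matter how many stacked layers there are; the layers are successive, not width-multiplying.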

How to reshape a pytorch matrix without mixing elements of items in a batch

In my neural network model, I represent an 8-word sentence with an 8x256-dimensional embedding matrix. I want to give it to an LSTM as input, where the LSTM takes a single word embedding at a time and processes it. According to the PyTorch documentation, the input should be in the shape (seq_len, batch, input_size). What is the correct way to convert my input to the desired shape? I don't want to mix up the numbers by mistake. I am quite new to PyTorch and row-major calculations, so I wanted to ask it here. I do it as follows; is it correct?
x = torch.rand(8,256)
lstm_input = torch.reshape(x,(8,1,256))
Your solution is correct: you added a singleton dimension for the "batch" dimension, leaving x with temporal dimension 8 and input dimension 256.
Since you are new to pytorch, here are a few equivalent ways of doing the same thing:
x = x[:, None, :]
Putting None at dim=1 tells PyTorch to add a singleton dimension there.
Another way is to use view:
x = x.view(8, 1, 256)
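A third equivalent way, for completeness, is unsqueeze:
lstm_input = x.unsqueeze(1)  # insert a singleton dimension at position 1
All three produce the same (8, 1, 256) tensor.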

Keras LSTM input - Predicting a parabolic trajectory

I want to predict the trajectory of a ball falling. That trajectory is parabolic. I know that LSTM may be too much for this (i.e. a simpler method could suffice).
I thought that we can do this with 2 LSTM layers and a Dense layer at the end.
The end result that I want is to give the model 3 heights h0,h1,h2 and let it predict h3. Then, I want to give it h1, h2, and the h3 it outputted previously to predict h4, and so on, until I can predict the whole trajectory.
Firstly, what would the input shape be for the first LSTM layer?
Would it be input_shape = (3,1) ?
Secondly, would the LSTM be able to predict a parabolic path?
I am getting almost a flat line, not a parabola, and I want to rule out the possibility that I am misunderstanding how to feed and shape input.
Thank you
The input shape is in the form (samples, timeSteps, features).
Your only feature is "height", so features = 1.
And since you're going to input sequences with different lengths, you can use timeSteps = None.
So, your input_shape could be (None, 1).
Since we're going to use a stateful=True layer below, we can use batch_input_shape=(1, None, 1). The first entry is the number of "samples" per batch; choose the amount you want.
Your model can indeed predict the trajectory, but maybe it will need more than one layer. (The exact answer about how many layers and cells it takes depends on knowing how the math inside the LSTM works.) A sketch of such a model follows below.
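A minimal sketch of the stateful setup described above (the cell count of 32 and the use of two stacked layers are illustrative assumptions, not recommendations):

from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.LSTM(32, return_sequences=True, stateful=True,
                batch_input_shape=(1, None, 1)),
    layers.LSTM(32, return_sequences=True, stateful=True),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')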
Training:
Now, first you need to train your network (only then it will be able to start predicting good things).
For training, suppose you have a sequence [h1, h2, h3, h4, h5, h6, ...] of true values in the correct order. (I suggest you actually collect many sequences (samples), so your model learns better.)
For this sequence, you want an output predicting the next step, then your target would be [h2,h3,h4,h5,h6,h7...]
So, supposing you have a data array with shape (manySequences, steps, 1), you make:
x_train = data[:, :-1, :]  # all steps except the last
y_train = data[:, 1:, :]   # all steps shifted one ahead (the next-step targets)
Now, your layers should be using return_sequences=True (every input step produces an output step), and you train the model with this data.
At this point, whether you're using stateful=True or stateful=False is not very relevant. (But if True, you always need model.reset_states() before every single epoch and sequence, as sketched below.)
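For the stateful=True case, a minimal sketch of such a manual epoch loop (n_epochs is a placeholder):

for epoch in range(n_epochs):
    model.reset_states()  # forget the state ("speed") between epochs
    model.fit(x_train, y_train, epochs=1, batch_size=1, shuffle=False, verbose=0)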
Predicting:
For predicting, you can use stateful=True in the model. This means that when you input h1, it will produce h2. And when you input h2 it will remember the "current speed" (the state of the model) to predict the correct h3.
(In the training phase, it's not important to have this, because you're inputting the entire sequences at once. So the speed will be understood between steps of the long sequences).
You can see the method reset_states() as set_current_speed_to(0). You will use it whenever the step you're going to input is the first step of a sequence.
Then you can do loops like this:
model.reset_states()                   # make speed = 0
nextH = someValueWithShape((1, 1, 1))  # the first height, shaped (samples, steps, features)
predictions = [nextH]
for i in range(steps):
    nextH = model.predict(nextH)
    predictions.append(nextH)
There is an example here, but it uses two features. One difference is that I use two models, one for training and one for predicting, but you can use only one with return_sequences=True and stateful=True (don't forget to call reset_states() at the beginning of every epoch in training).