I'm using the Keras functional API, and I would like to know: when exactly are the kernels initialized? Is it during the creation of the layer, like in
x = Dense(32, kernel_initializer='glorot_uniform')(x)
or is it during the compilation of the model? e.g.
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
I guess it's not during model.fit(...) or I wouldn't be able to fine-tune a pre-trained model, because the previous weights would be lost. Am I missing something?
It turns out that the Layer superclass defines the method build(input_shape) that all derived classes that have weights, such as Dense and Conv2D, must implement. In the method, among other things, the weight variables are created and initialized. This build is actually called by the Layer's method __call__, which is the one called in the line
x = Dense(32, kernel_initializer='glorot_uniform')(x)
right after the constructor, __init__.
Reference: https://github.com/fchollet/keras/blob/master/keras/engine/topology.py
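The mechanism described above can be illustrated with a toy class. This is only a sketch in plain Python, not the Keras source: ToyLayer and its attributes are hypothetical names, and the initializer merely imitates the Glorot-uniform bound sqrt(6 / (fan_in + fan_out)). It shows that the constructor stores configuration only, and that the first call triggers build, which creates the weights exactly once.

```python
import random


class ToyLayer:
    """Toy stand-in for a layer with weights; NOT the real Keras implementation."""

    def __init__(self, units):
        # The constructor only records configuration; no weights exist yet.
        self.units = units
        self.built = False
        self.kernel = None

    def build(self, input_dim):
        # Weights are created and initialized here, once the input size is known.
        limit = (6.0 / (input_dim + self.units)) ** 0.5  # Glorot-uniform bound
        self.kernel = [
            [random.uniform(-limit, limit) for _ in range(self.units)]
            for _ in range(input_dim)
        ]
        self.built = True

    def __call__(self, inputs):
        # The first call triggers build(); later calls reuse the same weights.
        if not self.built:
            self.build(len(inputs))
        return [
            sum(x * row[j] for x, row in zip(inputs, self.kernel))
            for j in range(self.units)
        ]


layer = ToyLayer(3)
kernel_after_init = layer.kernel      # None: nothing is initialized yet
layer([1.0, 2.0])
kernel_after_first_call = layer.kernel
layer([0.5, -1.0])
kernel_after_second_call = layer.kernel  # same object: no re-initialization
```

Under this pattern, neither compile nor fit touches the weight values at initialization time, which is consistent with being able to fine-tune a pre-trained model.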
In the Caffe deep-learning framework there is an argmax layer, which is not differentiable and hence cannot be used for end-to-end training of a CNN.
Can anyone tell me how I could implement the soft version of argmax, i.e. soft-argmax?
I want to regress coordinates from a heatmap and then use those coordinates in loss calculations. I am very new to this framework and therefore have no idea how to do this. Any help will be much appreciated.
I don't understand exactly what you want, but there are the following options:
Use an L2 loss to train a regression task (EuclideanLoss). Or SmoothL1Loss (from SSD Caffe by Wei Liu), or L1 (I don't know where you would get it).
Use softmax with cross-entropy loss (SoftmaxWithLoss) to train a classification task with classes corresponding to the possible values of the x or y coordinate. For example, one loss layer for x, and one for y. SoftmaxWithLoss accepts the label as a numeric value, and casts it to int with static_cast<int>(). But take into account that the implementation doesn't check that the cast value is within the 0..(num_classes-1) range, so you have to be careful.
If you want something more unusual, you'll have to write your own layer in C++, C++/CUDA or Python+NumPy. This is very often the case unless you are already using someone else's implementation.
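For the soft-argmax itself, a common formulation is the softmax-weighted mean of the pixel coordinates: each position gets weight exp(beta * h) normalized over the heatmap, and the output is the weighted average of the row and column indices. The sketch below is plain Python and shows the forward computation only; the function name and the beta sharpness parameter are assumptions for illustration, and a Caffe Python layer would additionally need a backward pass.

```python
import math


def soft_argmax_2d(heatmap, beta=10.0):
    """Return the softmax-weighted mean (row, col) of a 2D heatmap."""
    peak = max(v for row in heatmap for v in row)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    row_coord = sum(i * w for i, row in enumerate(weights) for w in row) / total
    col_coord = sum(j * w for row in weights for j, w in enumerate(row)) / total
    return row_coord, col_coord


heatmap = [
    [0.0, 0.1, 0.0],
    [0.1, 0.2, 5.0],
    [0.0, 0.1, 0.0],
]
row, col = soft_argmax_2d(heatmap)  # close to (1, 2), the location of the peak
```

A larger beta pushes the result toward the hard argmax; a smaller beta gives a smoother average over the whole heatmap.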
As this thread discusses, you cannot use a Python list to wrap your sub-modules (for example, your layers); otherwise, PyTorch will not update the parameters of the sub-modules inside the list. Instead, you should use nn.ModuleList to wrap your sub-modules so that their parameters get updated. However, I have also seen code like the following, where the author uses a Python list to collect the loss terms and then calls loss.backward() to do the update (in the REINFORCE algorithm from RL). Here is the code:
policy_loss = []
for log_prob in self.controller.log_probability_slected_action_list:
    policy_loss.append(-log_prob * (average_reward - b))
self.optimizer.zero_grad()
final_policy_loss = (torch.cat(policy_loss).sum()) * gamma
final_policy_loss.backward()
self.optimizer.step()
Why does using a list this way work for updating the modules' parameters, while the first case does not? I am very confused now. If I change the previous code to policy_loss = nn.ModuleList([]), it throws an exception saying that a float tensor is not a sub-module.
You are misunderstanding what Modules are. A Module stores parameters and defines an implementation of the forward pass.
You're allowed to perform arbitrary computation with tensors and parameters, resulting in other new tensors. Modules need not be aware of those tensors. You're also allowed to store tensors in Python lists. backward() has to be called on a scalar tensor, hence the sum of the concatenation. These tensors are losses, not parameters, so they should neither be attributes of a Module nor be wrapped in a ModuleList.
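The distinction can be checked with a small experiment. This is a minimal sketch assuming PyTorch is installed; the class names and layer sizes are made up for illustration. It counts the parameters each container exposes, then runs backward() through a plain Python list of loss tensors.

```python
import torch
import torch.nn as nn


class PlainListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Sub-modules held in a plain list are NOT registered with the parent.
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]


class ModuleListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each sub-module, so parameters() finds them.
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])


num_plain = len(list(PlainListNet().parameters()))        # 0
num_registered = len(list(ModuleListNet().parameters()))  # 2 weights + 2 biases

# A plain list of loss tensors is fine: the list is only a container, and
# autograd tracks each tensor back to the parameter that produced it.
weight = torch.ones(3, requires_grad=True)
losses = [(weight[i] * (i + 1)).unsqueeze(0) for i in range(3)]
total = torch.cat(losses).sum()  # backward() needs a scalar
total.backward()
grad = weight.grad.tolist()
```

An optimizer built from PlainListNet().parameters() would have nothing to update, which is the failure the ModuleList advice guards against; the list of losses never needs to be visible to the optimizer at all.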
I have a function defined and I just want to know if it is possible to perform it batchwise. For instance,
def function():
    # Some processes here
    return x
def forward():
    encode = self._encoding(embedded_premises, premises_lengths)
Now, since encode will be a 3D tensor of shape (batch size, seq_length, hidden size), I want to perform function() batchwise and also return x batchwise.
Is there any other way than looping over all batches?
If you're working with PyTorch functions inside your function, which you most likely are if you want your method to work with autograd, it can work batchwise. Most PyTorch operations respect, or can be made to respect, the first dimension as the batch dimension (for instance convolutions, linear layers, etc.). Sometimes it's more complex to express your operation such that it is both correct and fast, but in general PyTorch is built with the assumption that operations will be used on batched data, and this is made as simple as reasonably possible. If you have a more specific example of your function, please post it.
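As an illustration, here is a hypothetical per-sequence operation (the mean of the encoded vectors over each sequence's valid length) written once with a loop over the batch and once batchwise. It is a sketch assuming PyTorch is installed; the function names and tensor sizes are invented, since the original function body was not posted.

```python
import torch


def masked_mean_loop(encoded, lengths):
    # Reference version: one Python iteration per batch element.
    rows = [
        encoded[i, : int(lengths[i])].mean(dim=0)
        for i in range(encoded.size(0))
    ]
    return torch.stack(rows)


def masked_mean_batched(encoded, lengths):
    # Batched version: build a (batch, seq_len, 1) mask and reduce over dim 1.
    positions = torch.arange(encoded.size(1)).unsqueeze(0)    # (1, seq_len)
    mask = (positions < lengths.unsqueeze(1)).unsqueeze(-1)   # (batch, seq_len, 1)
    summed = (encoded * mask.to(encoded.dtype)).sum(dim=1)    # (batch, hidden)
    return summed / lengths.unsqueeze(1).to(encoded.dtype)


encoded = torch.randn(4, 5, 3)  # (batch, seq_len, hidden)
lengths = torch.tensor([5, 3, 1, 4])
same = torch.allclose(
    masked_mean_loop(encoded, lengths),
    masked_mean_batched(encoded, lengths),
    atol=1e-6,
)
```

Both versions are differentiable; the batched one replaces the Python loop with a mask and a reduction over the sequence dimension.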
While defining a prototxt in Caffe, I found that we sometimes use Softmax as the last layer type and sometimes SoftmaxWithLoss. I know the Softmax layer returns the probability that the input data belongs to each class, but it seems that SoftmaxWithLoss also returns the class probabilities. What is the difference between them? Or did I misunderstand the usage of the two layer types?
While Softmax returns the probability of each target class given the model predictions, SoftmaxWithLoss not only applies the softmax operation to the predictions, but also computes the multinomial logistic loss, returned as output. This is fundamental for the training phase (without a loss there will be no gradient that can be used to update the network parameters).
See SoftmaxWithLossLayer and Caffe Loss for more info.
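The difference in outputs can be seen on a single example. The sketch below is plain Python, not Caffe code: softmax returns a probability per class, while the hypothetical softmax_with_loss helper returns one number, the negative log-probability of the true class. Caffe's layer also normalizes the loss over the batch, which this single-example version leaves out.

```python
import math


def softmax(scores):
    # Subtract the maximum score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_loss(scores, label):
    # Multinomial logistic loss: negative log-probability of the true class.
    return -math.log(softmax(scores)[label])


scores = [2.0, 1.0, 0.1]
probs = softmax(scores)                     # one probability per class
loss = softmax_with_loss(scores, label=0)   # a single scalar to minimize
```

At test time only the probabilities are needed, which is why deployment prototxt files typically end in Softmax while training ones end in SoftmaxWithLoss.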
I am trying to implement a custom RNN layer in Keras, and I tried to follow what is explained in this link, which basically shows how to inherit from the existing RNN classes. However, the update equation of the hidden layer in my formulation is a bit different: h(t) = tanh(W.x + U.h(t-1) + V.r(t) + b), and I am a bit confused. In this equation, r(t) = f(x, p(t)) is a function of x, the fixed input distributed over time, and of p(t) = O(t-1).alpha + p(t-1), where O(t) is the Softmax output of each RNN cell.
I think that after calling super(customRNN, self).step in the inherited step function, the standard h(t) should be overridden by my definition of h(t). However, I am not sure how to modify the states and the get_constants function, or whether I need to modify any other parts of the Recurrent and SimpleRNN classes in Keras.
My intuition is that the get_constants function only returns the dropout matrices as extra states to the step function, so I am guessing at least one state should be added for the dropout matrix of V in my equations.
I have only recently started using Keras and could not find many references on defining custom Keras layers. Sorry if my question is overloaded with parameters; I just wanted to make sure that I am not missing anything. Thanks!