I am new to deep learning, i want to build a model that can identify similar images, i am reading classification is a Strong Baseline for Deep Metric Learning research paper. and here is they used the phrase: "remove the bias term inthe last linear layer". i have no idea what is bias term is and how to remove it from googlenet or other pretrained models. if someone help me out with this it would be great! :)
To compute the layer n outputs, a linear neural network computes a linear combination of the layer n-1 output for each layer n output, adds a scalar constant value to each layer n output (the bias term), and then applies an activation function. In pytorch, one could disable the bias in a linear layer using:
layer = torch.nn.Linear(n_in_features, n_out_features, bias=False)
To overwrite the existing structure of, say, the googlenet included in torchvision.models defined here, you can simply override the last fully connected layer after initialization:
from torchvision.models import googlenet
num_classes = 1000 # or whatever you need it to be
model = googlenet(num_classes)
model.fc = torch.nn.Linear(1000,num_classes,bias = False)
Related
Simple and short question. I have a network (Unet) which performs image segmentation. I want the logits as the output to feed into the cross entropy loss (using pytorch). Currently my final layer looks as so:
class Logits(nn.Sequential):
def __init__(self,
in_channels,
n_class
):
super(Logits, self).__init__()
# fully connected layer outputting the prediction layers for each of my classes
self.conv = self.add_module('conv_out',
nn.Conv2d(in_channels,
n_class,
kernel_size = 1
)
)
self.activ = self.add_module('sigmoid_out',
nn.Sigmoid()
)
Is it correct to use the sigmoid activation function here? Does this give me logits?
When people talk about "logits" they usually refer to the "raw" n_class-dimensional output vector. For multi-class classification (n_class > 2) you want to convert the n_class-dimensional vector of raw "logits" into a n_class-dim probability vector.
That is, you want prob = f(logits) with prob_i >= 0 for all n_class entries, and that sum(prob)=1.
The most straight forward way of doing that in a differentiable way is to use the Softmax function:
prob_i = softmax(logits) = exp(logits_i) / sum_j exp(logits_j)
It is easy to see that the output of softmax is indeed a n_class-dim probability vector (I leave it to you as a short exercise).
BTW, this is why the raw predictions are called "logits" because they are kind of "log" of the output predicted probabilities.
Now, it is customary not to explicitly compute the softmax on top of a classification network and defer its computation to the loss function, e.g. nn.CrossEntropyLoss that internally computes the softmax and requires the raw logits as inputs, rather than the normalized probabilities. This is done mainly for numerical stability.
Therefore, if you are training a multi-class classification network with nn.CrossEntropyLoss you do not need to worry at all about the final activation and simply output the raw logits from your final conv/linear layer.
Most importantly, do not use nn.Sigmoid() activation as it tends to have saturated gradients and will mess up your training.
As far as I understood, you are working on a multi-label classification task where a single input can have several labels, hence your usage of nn.Sigmoid (vs nn.Softmax for multi-class classification).
There a loss function which combines nn.Sigmoid and the nn.BCELoss: nn.BCEWithLogitsLoss. So you would have as input, a vector of logits whose length is the number of classes. And, the target would as well have the same shape: as a multi-hot-encoding, with 1s for active classes.
I'm trying to make a simple conv net in c#, and I want to make a Softmax outputlayer, but I don't really now what it is. Is it a fully connected layer with Softmax activation or just a layer which outputs the softmax of the data?
Softmax is just a function that takes a vector and outputs a vector of the same size having values within the range [0,1]. Also the values inside the vector follow the fundamental rule of probability ie. sum of values in vector = 1.
softmax(x)_i = exp(x_i) / ( SUM_{j=1}^K exp(x_j) ) # for each i = 1,.., K
But sometimes people use Softmax classifier which refers to a MLP with input and 1 output layer (which makes it a linear classifier like linear SVM) where softmax function is applied to the outputs of output layer. This setup gives the probability of the input being close to each of the output classes.
I came across a tutorial where the autor use a LSTM network for a time series prediction like this :
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
We agree that the LSTM in this act like a normal NN (and is useless ?) since the LSTM got only one time step without stateful = TRUE , Am I right ?
Generally speaking, you are correct. The input shape should be (window length, features n).
However, there has been some success in transforming the input to the way you describe above. Below is a whitepaper where they were able to beat many top performing algorithms by doing so, and they used convolutional 1D layers to handle the time series pattern through a separate input.
LSTM Fully Convolutional Networks for Time
Series Classification
Question: How do I print/return the softmax layer for a multiclass problem using Keras?
my motivation: it is important for visualization/debugging.
it is important to do this for the 'training' setting. ergo batch normalization and dropout must behave as they do in train time.
it should be efficient. calling vanilla model.predict() every now and then is less desirable as the model I am using is heavy and this is extra forward passes. The most desirable case is finding a way to simply display the original network output which was calculated during training.
it is ok to assume that this is done while using Tensorflow as a backend.
Thank you.
You can get the outputs of any layer by using: model.layers[index].output
For all layers use this:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functor = K.function([inp]+ [K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print layer_outs
While defining prototxt in caffe, I found sometimes we use Softmax as the last layer type, sometimes we use SoftmaxWithLoss, I know the Softmax layer will return the probability the input data belongs to each class, but it seems that SoftmaxwithLoss will also return the class probability, then what's the difference between them? or did I misunderstand the usage of the two layer types?
While Softmax returns the probability of each target class given the model predictions, SoftmaxWithLoss not only applies the softmax operation to the predictions, but also computes the multinomial logistic loss, returned as output. This is fundamental for the training phase (without a loss there will be no gradient that can be used to update the network parameters).
See
SoftmaxWithLossLayer
and Caffe Loss
for more info.