I am currently working on estimating uncertainty in deep learning models. I came across this tutorial where a probabilistic CNN is implemented for classification of the MNIST dataset. However, the model is a custom deep model, as shown below:
import tensorflow_probability as tfp
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

tfd = tfp.distributions
tfpl = tfp.layers

def get_probabilistic_model(input_shape, loss, optimizer, metrics):
    """
    This function should return the probabilistic model according to the
    above specification.
    The function takes input_shape, loss, optimizer and metrics as arguments,
    which should be used to define and compile the model.
    Your function should return the compiled model.
    """
    model = Sequential([
        Conv2D(kernel_size=(5, 5), filters=8, activation='relu', padding='VALID', input_shape=input_shape),
        MaxPooling2D(pool_size=(6, 6)),
        Flatten(),
        Dense(tfpl.OneHotCategorical.params_size(10)),
        tfpl.OneHotCategorical(10, convert_to_tensor_fn=tfd.Distribution.mode)
    ])
    model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
    return model
However, I want to use a pre-trained deep learning model like EfficientNet or MobileNet as the backbone instead, and get an estimate of the uncertainty of the predictions from those models for my problem. How do I go about doing that?
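Something like the following is roughly what I have in mind: keep the OneHotCategorical head from the tutorial but put it on top of a frozen pre-trained backbone. This is only an untested sketch; the choice of MobileNetV2, the 96x96 input size, and the negative log-likelihood loss are my own assumptions, not part of the tutorial.

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfpl = tfp.layers

num_classes = 10
# Pre-trained backbone, frozen so that only the new head is trained (assumption).
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights='imagenet')
backbone.trainable = False

probabilistic_model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(tfpl.OneHotCategorical.params_size(num_classes)),
    tfpl.OneHotCategorical(num_classes, convert_to_tensor_fn=tfd.Distribution.mode)
])
probabilistic_model.compile(
    loss=lambda y_true, y_pred: -y_pred.log_prob(y_true),  # negative log-likelihood
    optimizer='adam',
    metrics=['accuracy'])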
Related
I have created a PyTorch model and saved the state_dict as a model.pt file.
I then do the following to make the model ready for inference:
import torch
import torchvision

sdict = torch.load(PATH)
model = torchvision.models.detection.maskrcnn_resnet50_fpn()
model.load_state_dict(sdict['model_state'])
model.eval()
I have two models, both trained with the same parameters except for the number of epochs.
Model 1: trained for 100+ epochs
Model 2: trained for about 10 epochs
The inference time difference between the two models is 30x.
How can that be?
I am running the code on a CPU.
I want to use BERT only for embeddings and use the BERT output as the input to a classification net that I will build from scratch.
I am not sure whether I want to fine-tune the model.
I think the relevant classes are BertModel or BertForPreTraining.
The BertForPreTraining head contains two "actions":
self.predictions is the MLM (Masked Language Modeling) head, which is what gives BERT the power to fix grammar errors, and self.seq_relationship is the NSP (Next Sentence Prediction) head, usually referred to as the classification head.
class BertPreTrainingHeads(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.predictions = BertLMPredictionHead(config)
        self.seq_relationship = nn.Linear(config.hidden_size, 2)
I think the NSP isn't relevant for my task, so I can "override" it.
What does the MLM head do, and is it relevant for my goal, or should I use BertModel?
You should be using BertModel instead of BertForPreTraining.
BertForPreTraining is used to train BERT on the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks. It is not meant for classification.
BertModel simply gives the output of the BERT encoder; you can then fine-tune BERT along with the classifier that you build on top of it. For classification, if it is just a single layer on top of the BERT model, you can directly go with BertForSequenceClassification.
In any case, if you just want to take the output of the BERT model and train your classifier on it (without fine-tuning the BERT model), then you can freeze the BERT weights using:
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
for param in model.bert.parameters():
    param.requires_grad = False
The above code is borrowed from here
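If you go the BertModel route instead, a minimal sketch of the frozen-feature-extractor setup could look like the one below. The small classifier head and the use of the pooled [CLS] output are assumptions of mine, not something prescribed by the library:

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert = BertModel.from_pretrained('bert-base-uncased')
bert.eval()
for param in bert.parameters():
    param.requires_grad = False  # BERT stays frozen; only the classifier learns

# Hypothetical classifier head on top of the frozen embeddings (2 classes assumed).
classifier = nn.Sequential(
    nn.Linear(bert.config.hidden_size, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
)

inputs = tokenizer("an example sentence", return_tensors='pt')
with torch.no_grad():
    features = bert(**inputs, return_dict=True).pooler_output  # pooled [CLS] representation
logits = classifier(features)  # gradients flow only through the classifier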
I am using the VGG-16 network available in PyTorch out of the box to predict the class index of an image. I found out that for the same input file, if I predict multiple times, I get different outcomes. This seems counter-intuitive to me. Once the weights are loaded (since I am using the pretrained model) there should not be any randomness at any step, and hence multiple runs with the same input file should return the same prediction.
Here is my code:
import torch
import torchvision.models as models
from torchvision import transforms
from torch.nn.functional import softmax
from PIL import Image

VGG16 = models.vgg16(pretrained=True)

def VGG16_predict(img_path):
    transformer = transforms.Compose([transforms.CenterCrop(224), transforms.ToTensor()])
    data = transformer(Image.open(img_path))
    output = softmax(VGG16(data.unsqueeze(0)), dim=1).argmax().item()
    return output  # predicted class index

VGG16_predict(image)
Here is the image
Recall that many modules have two states for training vs evaluation: "Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details." (https://pytorch.org/docs/stable/torchvision/models.html)
In this case, the classifier layers include dropout, which is stochastic during training. Run VGG16.eval() if you want the evaluations to be non-random.
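As a quick illustration (reusing VGG16_predict and the same image path from the question), switching to eval mode makes repeated calls agree:

VGG16.eval()                 # puts dropout layers in inference mode, so the forward pass is deterministic
print(VGG16_predict(image))  # same class index
print(VGG16_predict(image))  # on every call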
I came across a tutorial where the author uses an LSTM network for time series prediction like this:
import numpy
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
Do we agree that the LSTM here acts like a normal NN (and is useless?), since the LSTM only gets a single time step and stateful=True is not set? Am I right?
Generally speaking, you are correct. The input shape should be (window length, number of features).
However, there has been some success in transforming the input the way you describe above. Below is a paper whose authors were able to beat many top-performing algorithms by doing so; they used 1D convolutional layers to handle the time-series pattern through a separate input.
LSTM Fully Convolutional Networks for Time Series Classification
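For contrast, here is a minimal sketch of the more usual framing, where each sample is a window of look_back time steps with a single feature, so the LSTM actually sees a sequence (the random data and layer sizes are placeholders of mine):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

look_back = 10
trainX = np.random.rand(100, look_back)  # placeholder data: 100 windows
trainY = np.random.rand(100, 1)

# Reshape to (samples, time steps, features): the time dimension is look_back, not 1.
trainX = trainX.reshape((trainX.shape[0], look_back, 1))

model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=10, batch_size=16, verbose=0)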
I am trying to learn deep learning.
In torch tutorial,
https://github.com/torch/tutorials/blob/master/2_supervised/2_model.lua
https://github.com/torch/tutorials/blob/master/3_unsupervised/2_models.lua
Supervised model
-- Simple 2-layer neural network, with tanh hidden units
model = nn.Sequential()
model:add(nn.Reshape(ninputs))
model:add(nn.Linear(ninputs,nhiddens))
model:add(nn.Tanh())
model:add(nn.Linear(nhiddens,noutputs))
Unsupervised model
-- encoder
encoder = nn.Sequential()
encoder:add(nn.Linear(inputSize,outputSize))
encoder:add(nn.Tanh())
encoder:add(nn.Diag(outputSize))
-- decoder
decoder = nn.Sequential()
decoder:add(nn.Linear(outputSize,inputSize))
-- complete model
module = unsup.AutoEncoder(encoder, decoder, params.beta)
Why does the unsupervised model need nn.Diag?
Thanks in advance.
It is in fact a scaling by a learnable vector (the diagonal of a matrix). This is mentioned in section 3.1 of the paper Learning Fast Approximations of Sparse Coding. It is multiplied by the tanh output, and together they form the non-linearity.
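For intuition, here is a rough PyTorch equivalent of nn.Diag (a sketch of mine, not the original Lua implementation): an element-wise scaling of the input by a learnable vector, i.e. multiplication by a diagonal matrix whose diagonal is learned.

import torch
import torch.nn as nn

class Diag(nn.Module):
    def __init__(self, size):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(size))  # learnable diagonal entries

    def forward(self, x):
        return x * self.gain  # equivalent to multiplying by torch.diag(self.gain)

# Encoder analogous to the Lua snippet above (layer sizes are placeholders):
encoder = nn.Sequential(
    nn.Linear(32, 16),
    nn.Tanh(),
    Diag(16),  # scales the tanh output; together they form the non-linearity
)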