I would like to modify the ImageNet Caffe model as described below:
As the input channel number for temporal nets is different from that
of spatial nets (20 vs. 3), we average the ImageNet model filters of
first layer across the channel, and then copy the average results 20
times as the initialization of temporal nets.
My question is: how can I achieve the above result? How can I open the Caffe model so that I can make those changes to it?
I read the net surgery tutorial but it doesn't cover the procedure needed.
Thank you for your assistance!
AMayer
The Net Surgery tutorial should give you the basics you need to cover this. But let me explain the steps you need to do in more detail:
Prepare the .prototxt network architectures: You need two files: the existing ImageNet .prototxt file, and your new temporal network architecture. You should make all layers except the first convolutional layers identical in both networks, including the names of the layers. That way, you can use the ImageNet .caffemodel file to initialize the weights automatically.
As the first conv layer has a different size, you have to give it a different name in your .prototxt file than it has in the ImageNet file. Otherwise, Caffe will try to initialize this layer with the existing weights too, which will fail as they have different shapes. (This is what happens in the edit to your question.) Just name it e.g. conv1b and change all references to that layer accordingly.
Load the ImageNet network for testing, so you can extract the parameters from the model file:
import caffe
import numpy as np
old_net = caffe.Net('imagenet.prototxt', 'imagenet.caffemodel', caffe.TEST)
Extract the weights from this loaded model.
conv_1_weights = old_net.params['conv1'][0].data
conv_1_biases = old_net.params['conv1'][1].data
Average the weights across the channels:
conv_av_weights = np.mean(conv_1_weights, axis=1, keepdims=True)
Load your new network together with the old .caffemodel file, as all layers except for the first layer directly use the weights from ImageNet:
new_net = caffe.Net('new_network.prototxt', 'imagenet.caffemodel', caffe.TEST)
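Assign your calculated average weights to the new network. The averaged weights have a single input channel, so NumPy broadcasting copies them into all 20 input channels of conv1b: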
new_net.params['conv1b'][0].data[...] = conv_av_weights
new_net.params['conv1b'][1].data[...] = conv_1_biases
Save your weights to a new .caffemodel file:
new_net.save('new_weights.caffemodel')
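The shape handling in the averaging and assignment steps can be checked without Caffe. The sketch below uses random NumPy arrays as stand-ins for the conv1 parameters; the 96 filters and the 7x7 kernel size are assumed values for illustration, not taken from any particular model file.

```python
import numpy as np

# Stand-in for old_net.params['conv1'][0].data: (num_filters, 3, kH, kW).
# 96 filters of size 7x7 are assumed values for illustration only.
rng = np.random.default_rng(0)
conv_1_weights = rng.standard_normal((96, 3, 7, 7)).astype(np.float32)

# Average over the input-channel axis; keepdims leaves a channel axis of size 1.
conv_av_weights = np.mean(conv_1_weights, axis=1, keepdims=True)

# Stand-in for new_net.params['conv1b'][0].data with 20 input channels.
new_weights = np.zeros((96, 20, 7, 7), dtype=np.float32)

# Broadcasting copies the single averaged channel into all 20 channels,
# so no explicit np.repeat or np.tile is needed.
new_weights[...] = conv_av_weights

print(conv_av_weights.shape)  # (96, 1, 7, 7)
print(new_weights.shape)      # (96, 20, 7, 7)
```

Every one of the 20 input channels of the new array ends up holding the same averaged filter, which is the initialization described in the question.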
I am working on Object detection in document images. I have a model working with DETR-Resnet-50 on custom data
I also found out that Layout Parser Faster RCNN - Mask RCNN models use Resnet 50 as a backbone. And since these are already trained on document images, fine tuning these would give better results.
I think one way to do this would be something like:
import torch
model = torch.hub.load('facebookresearch/detr:main', 'detr_resnet50', pretrained=True)
for name, param in model.named_parameters():
    print(name, param.shape)
    # param.data = weight
and when the layer names look similar, you change the weights manually.
Now the problem is that Layout Parser models are in detectron2 and my DETR models are in HuggingFace. How can I change the weights of backbone (resnet50) in DETR so that they are initialised with those weights instead of imagenet ones?
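The manual matching by name described above could be organised roughly as follows. This is only a sketch: plain dicts of NumPy arrays stand in for the two models' state dicts, and every key name and the rename rule are hypothetical, invented for this example rather than taken from detectron2 or HuggingFace. With real PyTorch models the same idea would be applied to the dicts returned by state_dict(), and the result loaded with load_state_dict(..., strict=False).

```python
import numpy as np

def copy_matching(src, dst, rename):
    """Copy arrays from src into dst where the renamed key exists and shapes agree.

    `rename` maps a source key to the key used in dst.
    Returns the list of dst keys that were overwritten.
    """
    copied = []
    for src_key, value in src.items():
        dst_key = rename(src_key)
        if dst_key in dst and dst[dst_key].shape == value.shape:
            dst[dst_key] = value.copy()
            copied.append(dst_key)
    return copied

# Hypothetical key names, invented for this example.
src = {"backbone.stem.conv1.weight": np.ones((64, 3, 7, 7)),
       "roi_heads.box.weight": np.ones((10, 5))}
dst = {"model.backbone.conv1.weight": np.zeros((64, 3, 7, 7)),
       "model.encoder.weight": np.zeros((4, 4))}

def rename(key):
    # Hypothetical prefix mapping between the two naming schemes.
    return key.replace("backbone.stem.", "model.backbone.")

copied = copy_matching(src, dst, rename)
print(copied)
```

Checking the shape as well as the name guards against silently copying a tensor into a layer that only happens to share a name. Printing the list of copied keys makes it easy to see which backbone layers were actually replaced.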
I'm using a ResNet50 model to classify images into two classes: normal cells and cancer cells.
I want to increase the accuracy, but I don't know what to modify.
# We use ResNet50 for transfer learning, so we import it here
from tensorflow.keras import layers
from tensorflow.keras.applications import resnet50
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adamax
# Initialize the model with weights='imagenet', i.e. keep its pretrained weights
model_name='resnet50'
base_model=resnet50.ResNet50(include_top=False, weights="imagenet",input_shape=img_shape, pooling='max')
last_layer=base_model.output # take the output of the last layer of the base model
# Add a flatten layer: we extend the network by adding a flatten layer
flatten=layers.Flatten()(last_layer)
# Add a dense layer
dense1=layers.Dense(100,activation='relu')(flatten)
# Add the final softmax output layer (it is fed from flatten, so dense1 above is unused)
output_layer=layers.Dense(class_count,activation='softmax')(flatten)
# Create the model from the input and output layers
model=Model(inputs=base_model.inputs,outputs=output_layer)
model.compile(Adamax(learning_rate=.001), loss='categorical_crossentropy', metrics=['accuracy'])
There were 48 errors in 534 test cases. Model accuracy = 91.01 %
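As a quick sanity check, the reported accuracy follows directly from the two counts. The variable names below are just for illustration.

```python
errors = 48
test_cases = 534

# Accuracy is the share of test cases that were classified correctly.
accuracy = 100.0 * (test_cases - errors) / test_cases
print(f"Model accuracy = {accuracy:.2f} %")
```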
Also, what do you think about the results in the graph?
This is the classification report.
I got good results, but is it possible to increase the accuracy further?
This is a broad question, as there are many ways to try to improve a network's accuracy. Some of them are:
Increase the dimension of the layers that are learned during transfer learning (make sure not to overfit)
Fine-tune some of the convolutional layers as well, rather than training only the MLP head
Let the optimization algorithm adapt the learning rate on its own
Experiment with additional augmentations of the dataset
and the list goes on.
Also, if possible, I would suggest comparing your results to publicly available benchmarks. Doing so may give you a better idea of the upper bound on the accuracy you can expect.
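As one concrete illustration of the augmentation suggestion, here is a minimal NumPy sketch that applies random flips and 90-degree rotations. It assumes that the orientation of a cell image carries no label information, which seems plausible for microscopy images but should be checked for your data. The function name and the 224x224x3 image size are made up for the example.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and 90-degree-rotated copy of an HxWxC image."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)   # horizontal flip
    if rng.random() < 0.5:
        image = np.flip(image, axis=0)   # vertical flip
    k = int(rng.integers(0, 4))          # number of 90-degree rotations
    return np.rot90(image, k=k, axes=(0, 1)).copy()

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))        # stand-in for one training image
augmented = augment(image, rng)
print(augmented.shape)
```

These transforms only rearrange pixels, so they add variety without changing intensities. In a Keras pipeline the same effect is usually obtained with the framework's own augmentation utilities rather than a hand-written function.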
I'm currently working on my bachelor project and I'm using the PointNet deep neural network.
My project group and I have created a dataset of point clouds (each an unsorted list of x 3D coordinates) and segmentation files, but we can't train PointNet to predict segmentation with the dataset.
Each segmentation file is a list containing the same amount of rows, as points in the corresponding point cloud, and each row is either a 1 or a 2, depending on the corresponding point belonging to segment 1 or 2.
When PointNet predicts it outputs a list of x elements, where each element is the segment that PointNet predicts the corresponding point belongs to.
When we run the benchmark dataset from the original PointNet implementation, the system runs and can predict segmentation, so we know that the error is in the dataset somewhere, even though we have tried our best to have our dataset look like the original benchmark dataset.
The implemented PointNet uses PyTorch conv2d, maxpool2d and linear transformations. For calculating the loss, both nn.functional.nll_loss and nn.NLLLoss have been used. When using nn.NLLLoss, the weight parameter was set to a tensor of [1,100] to combat potential imbalance in the data.
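To make the effect of the weight parameter concrete, here is a small NumPy sketch of a class-weighted negative log-likelihood with mean reduction, which is how I understand nn.NLLLoss to combine the weights. The function name is made up, and the sketch assumes zero-based class ids, so labels stored as 1 and 2 would have to be mapped to 0 and 1 before being used as indices here.

```python
import numpy as np

def weighted_nll(log_probs, targets, class_weights):
    """Weighted mean negative log-likelihood over N points.

    log_probs: (N, C) log-probabilities
    targets: (N,) zero-based class ids
    class_weights: (C,) per-class weights
    """
    picked = log_probs[np.arange(len(targets)), targets]
    w = class_weights[targets]
    # Each point's loss is scaled by its class weight; the sum is
    # normalised by the total weight rather than by N.
    return float(-(w * picked).sum() / w.sum())

log_probs = np.log(np.array([[0.9, 0.1],
                             [0.8, 0.2],
                             [0.3, 0.7]]))
targets = np.array([0, 0, 1])
weights = np.array([1.0, 100.0])

unweighted = weighted_nll(log_probs, targets, np.ones(2))
weighted = weighted_nll(log_probs, targets, weights)
print(unweighted, weighted)
```

With weights of [1, 100] the loss is dominated by the points of the second class, so a few points of the rare class influence training far more than the many points of the common one.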
These are the things we have tried:
We have tried downsampling the point clouds, i.e. removing points using voxel downsampling
We have tried rescaling and normalizing all values so they are between 0 and 1, using this formula: (data - np.min(data)) / (np.max(data) - np.min(data))
We have tried running a Euclidean clustering function on the data, so that each scanned object is separate
We have tried replicating another dataset, created from the same raw data, which we know has worked before
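For reference, the normalization formula in the list above uses a single global minimum and maximum over all coordinates. The sketch below compares it with a per-axis variant on a tiny made-up point cloud; the function names and values are only for illustration, and neither variant guards against a zero range.

```python
import numpy as np

def min_max_global(points):
    """Scale with one global min and max, as in the formula above."""
    return (points - np.min(points)) / (np.max(points) - np.min(points))

def min_max_per_axis(points):
    """Scale each coordinate axis to [0, 1] independently."""
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    return (points - lo) / (hi - lo)

# Made-up point cloud: N rows of (x, y, z).
points = np.array([[0.0, 0.0, 0.0],
                   [10.0, 2.0, 1.0],
                   [5.0, 1.0, 0.5]])

global_scaled = min_max_global(points)
axis_scaled = min_max_per_axis(points)
print(global_scaled)
print(axis_scaled)
```

The global version keeps the proportions of the object, while the per-axis version stretches every axis to the full range. Which of the two matches the benchmark dataset's preprocessing is worth checking, since the network sees quite different shapes in each case.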
In the attached link, images of the datafiles with a description can be found.
Cheers everyone
I'm working on supervised deep learning to estimate depth maps from monocular images.
The dataset is currently KITTI. The RGB input images come from the KITTI raw data, and the data from the following link is used as ground truth.
I designed a simple encoder-decoder network, but the results of training it are not very good, so I am trying various things.
While searching for methods, I found that the loss is usually computed only over valid areas of the ground truth by masking, because the ground truth contains many invalid areas, i.e. values that cannot be used, as shown in the image below.
So I trained with masking, but I am curious why this result keeps coming out.
This is the training part of my code.
How can I fix this problem?
for epoch in range(num_epoch):
    model.train() ### train ###
    for batch_idx, samples in enumerate(tqdm(train_loader)):
        x_train = samples['RGB'].to(device)
        y_train = samples['groundtruth'].to(device)
        pred_depth = model.forward(x_train)
        valid_mask = y_train != 0 #### Here is masking
        valid_gt_depth = y_train[valid_mask]
        valid_pred_depth = pred_depth[valid_mask]
        loss = loss_RMSE(valid_pred_depth, valid_gt_depth)
As far as I can understand, you are trying to estimate depth from an RGB image as input. This is an ill-posed problem, since the same input image can correspond to multiple plausible depth maps. You would need to integrate certain techniques to estimate accurate depth from RGB images, instead of simply taking an L1 or L2 loss between the predicted depth and the corresponding ground-truth depth image.
I would suggest going through some papers on estimating depth from single images, such as Depth Map Prediction from a Single Image using a Multi-Scale Deep Network, where they use one network to first estimate the global structure of the given image and then a second network that refines the local scene information. Instead of taking a simple RMSE loss, as you did, they use a scale-invariant error function in which the relationship between points is measured.
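To give an idea of what such a scale-invariant error looks like, here is a minimal NumPy sketch. As I understand the paper, the error is (1/n) * sum(d_i^2) - (lam/n^2) * (sum(d_i))^2 with d_i = log(pred_i) - log(gt_i), where lam = 1 gives the fully scale-invariant form and the paper trains with lam = 0.5. The function name is made up, and the sketch assumes zero marks invalid ground truth and that predictions are strictly positive.

```python
import numpy as np

def scale_invariant_error(pred, gt, lam=1.0):
    """Scale-invariant log error over valid (non-zero) ground-truth pixels."""
    valid = gt > 0                       # same idea as the masking in the question
    d = np.log(pred[valid]) - np.log(gt[valid])
    n = d.size
    return float(np.mean(d ** 2) - lam * (d.sum() ** 2) / (n ** 2))

gt = np.array([[0.0, 2.0], [4.0, 8.0]])      # 0 marks an invalid pixel
pred = np.array([[1.0, 1.0], [2.0, 4.0]])    # exactly half the true depth

sie = scale_invariant_error(pred, gt)            # close to 0: only a global scale error
msle = scale_invariant_error(pred, gt, lam=0.0)  # plain mean squared log error
print(sie, msle)
```

A prediction that is off by a constant factor everywhere is not penalised when lam = 1, whereas the plain squared log error still is. In a training loop the same expression would be written with tensor operations so that gradients can flow through it.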
Is there a simple way of renaming layers in a caffe network by using the pycaffe interface?
I have looked through the net surgery example, but I cannot find an example of what I need.
For example, I would like to load a trained Caffe model and change the name of conv1 layer and its corresponding blob to new-conv1.
I don't know of a direct way to do it, but here is a workaround:
Given a pretrained Caffe model my_model.caffemodel and its net architecture net.prototxt, make a copy of net.prototxt (say net_new.prototxt) and change the name of the conv1 layer to new-conv1 (you can also change the names of its bottom and top blobs if you want). Then copy the parameters from the old net to the new one and save the result:
import caffe
net_old = caffe.Net('net.prototxt','my_model.caffemodel',caffe.TEST)
net_new = caffe.Net('net_new.prototxt','my_model.caffemodel',caffe.TEST)
net_new.params['new-conv1'][0].data[...] = net_old.params['conv1'][0].data[...] #copy filter across 2 nets
net_new.params['new-conv1'][1].data[...] = net_old.params['conv1'][1].data[...] #copy bias
net_new.save('my_model_new.caffemodel')