How to load CUB-200-2011 dataset in pytorch?

I am trying to do fine-grained image classification on the CUB-200-2011 dataset, but I can't figure out the correct way to load the data. I'm fine-tuning an ImageNet-pretrained ResNet-101, freezing all layers except the last two. Please suggest a way to load the data in PyTorch.
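One possible way to do it, as a minimal sketch. It assumes the extracted CUB_200_2011 folder with its usual index files (images.txt, image_class_labels.txt, train_test_split.txt) next to the images/ directory. The names read_cub_index and CUBDataset are made up for this example. The class is written as a plain map-style dataset (only __len__ and __getitem__), which a PyTorch DataLoader accepts; in a real project you would probably subclass torch.utils.data.Dataset.

```python
from pathlib import Path


def read_cub_index(root, train=True):
    """Return a list of (image_path, label) pairs for one split of CUB-200-2011.

    Assumes `root` is the extracted CUB_200_2011 folder containing images.txt,
    image_class_labels.txt, train_test_split.txt and an images/ directory.
    """
    root = Path(root)

    def read_pairs(filename):
        # Each line is "<image_id> <value>"; split once so the value stays intact.
        pairs = {}
        with open(root / filename) as f:
            for line in f:
                if line.strip():
                    image_id, value = line.strip().split(" ", 1)
                    pairs[int(image_id)] = value
        return pairs

    paths = read_pairs("images.txt")
    labels = read_pairs("image_class_labels.txt")
    is_train = read_pairs("train_test_split.txt")

    samples = []
    for image_id in sorted(paths):
        if (is_train[image_id] == "1") == train:
            # Class ids in the file start at 1; shift them to start at 0.
            label = int(labels[image_id]) - 1
            samples.append((root / "images" / paths[image_id], label))
    return samples


class CUBDataset:
    """Map-style dataset: exposes __len__ and __getitem__."""

    def __init__(self, root, train=True, transform=None):
        self.samples = read_cub_index(root, train)
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        from PIL import Image  # imported lazily; requires Pillow

        path, label = self.samples[index]
        image = Image.open(path).convert("RGB")  # force 3 channels
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

To train, wrap it in torch.utils.data.DataLoader(CUBDataset(root, train=True, transform=...), batch_size=32, shuffle=True). The transform has to resize every image to one fixed size and convert it to a tensor, otherwise the default batching will fail on images of different sizes.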

Related

Freezing certain layers in neural networks using Pytorch Image Models

I am trying to do binary classification with transfer learning using timm.
In the process, I want to experiment with freezing/unfreezing different layers of different architectures, but so far I am only able to freeze/unfreeze entire models.
Can anyone illustrate this with a couple of model architectures, to cover the differences between architectures?
Below is how I create a couple of architectures with timm (convnext and resnet), which I can currently only freeze entirely. Can anyone illustrate partial freezing with these or other models, using only timm (it is more comprehensive than the PyTorch model zoo)?
import timm
convnext = timm.create_model('convnext_tiny_in22k', pretrained=True, num_classes=2)
resnet = timm.create_model('resnet50d', pretrained=True, num_classes=2)
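A common way to freeze only part of a model is to switch requires_grad by parameter name. The helper below is a minimal sketch with a made-up name (freeze_except). It relies only on named_parameters(), which every torch.nn.Module provides, timm models included. The prefixes in the usage comments are assumptions about how timm names its submodules, so print the parameter names of your own model first.

```python
def freeze_except(model, trainable_prefixes):
    """Freeze every parameter whose name does not start with a given prefix.

    Works on any object whose named_parameters() yields (name, param) pairs
    with a requires_grad attribute, as torch.nn.Module does.
    Returns the names of the parameters left trainable.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in model.named_parameters():
        # True for the selected layers, False (frozen) for everything else.
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Hypothetical usage with the two models from the question. The prefixes are
# assumptions about timm's module naming; check them with
#   for name, _ in model.named_parameters(): print(name)
#
#   freeze_except(resnet, ["layer4.", "fc."])        # last stage + classifier
#   freeze_except(convnext, ["stages.3.", "head."])  # last stage + head
```

After freezing, pass only the parameters that still have requires_grad set to the optimizer, for example [p for p in model.parameters() if p.requires_grad].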

Usefulness of Pretrained NN's for performing binary segmentation in images

I am trying to perform binary segmentation on a custom dataset (the DAGM dataset in my case).
I am curious whether networks pretrained on ImageNet, such as VGG or ResNet, will be of any particular use, since I am not trying to segment objects like cats or dogs but anomalies in the images.

Normally you would want to fine-tune, on your new dataset, a model that was previously trained and tuned on a similar problem. Neural networks extract features from samples and use those features to classify. If your network was previously trained on a biomedical dataset, for example, it has learned how to extract features from that kind of data. So try to find a model that was trained on a similar domain.
You can also check the link below for more insight into the issue.
https://en.wikipedia.org/wiki/Catastrophic_interference

How to train a pre-trained CNN on a new dataset which is not organised in classes (unsupervised)

I have a pretrained CNN (ResNet-18) trained on ImageNet, and now I want to extend it to my own dataset of video frames. The problem is that all the fine-tuning tutorials I found require the dataset to be organised in classes, like
class1/train/
class1/test/
class2/train/
class2/test/
but I only have frames from many videos. How can I train my CNN on them?
Can anyone point me in the right direction, e.g. a tutorial or paper?
PS: My final task is to get deep features for all the frames I provide at test time.

To train a network, you need a label (sometimes called y) for each input. The network then computes a loss between its logits (the network's output) and the given label.
The network then updates its weights by backpropagating that loss. This process is what we call 'training'.
Because you only have input data and no labels, you can only get the logits, so no loss can be calculated.
Fine-tuning essentially means 'additional training', so you cannot fine-tune your pre-trained network without labeled data.
The train/test split is not the problem right now.
If you have enough labeled input data, you can divide it by some ratio
(e.g. 80% of the data for training, 20% for testing).
We divide the data into these two sets to check how the trained network performs in more general, unseen situations.
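The 80/20 division mentioned above could look like the following sketch. The function name and the fixed seed are choices made for this example only.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and cut it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_items, test_items = split_train_test(range(10))
print(len(train_items), len(test_items))  # prints: 8 2
```

With video frames it is usually safer to split by video rather than by frame, since neighbouring frames are nearly identical and would otherwise end up on both sides of the split.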
However, if you simply feed your data into the pre-trained network (the encoder part), it will give you deep features. They are not tailored to your task, but they are still deep features.
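Getting such features needs no labels at all. Here is a minimal PyTorch sketch of the idea: drop the classification layer, put the rest in eval mode, and run the frames through it without gradients. The tiny network is only a stand-in so the example runs without downloading weights; extract_features is a made-up name.

```python
import torch
import torch.nn as nn


def extract_features(encoder, frames):
    """Run a batch of frames through an encoder without tracking gradients."""
    encoder.eval()  # use running BatchNorm statistics, disable dropout
    with torch.no_grad():
        features = encoder(frames)
    return torch.flatten(features, start_dim=1)  # one vector per frame


# Tiny stand-in for a pretrained network (assumption: not a real ResNet).
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),  # classification head, dropped below
)

# Keep everything except the final classification layer.
encoder = nn.Sequential(*list(net.children())[:-1])

frames = torch.randn(4, 3, 32, 32)  # 4 random "frames"
features = extract_features(encoder, frames)
print(features.shape)  # one 8-dimensional vector per frame
```

With a real pretrained ResNet-18 the same truncation (all children except the last fully connected layer) should give one 512-dimensional vector per frame.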
Added:
Unsupervised pre-training for convolutional neural network in theano
This covers the method you need: a deep feature encoder for the unsupervised setting. I hope it helps.

May I use CaffeNet for 3 labels? [duplicate]

I trained a GoogLeNet model from scratch, but it didn't give me promising results.
As an alternative, I would like to fine-tune the GoogLeNet model on my dataset. Does anyone know what steps I should follow?

Assuming you are trying to do image classification, these are the steps for finetuning a model:
1. Classification layer
The original classification layer "loss3/classifier" outputs predictions for 1000 classes (its num_output is set to 1000). You'll need to replace it with a new layer with the appropriate num_output. Replacing the classification layer:
Change layer's name (so that when you read the original weights from caffemodel file there will be no conflict with the weights of this layer).
Change num_output to the right number of output classes you are trying to predict.
Note that you need to change ALL classification layers. Usually there is only one, but GoogLeNet happens to have three: "loss1/classifier", "loss2/classifier" and "loss3/classifier".
2. Data
You need to make a new training dataset with the new labels you want to fine tune to. See, for example, this post on how to make an lmdb dataset.
3. How extensive a finetuning do you want?
When finetuning a model, you can train ALL of the model's weights, or fix some weights (usually the filters of the lower layers, those closest to the input) and train only the weights of the top-most layers. This choice is up to you and usually depends on the amount of training data available (the more examples you have, the more weights you can afford to finetune).
Each layer that holds trainable parameters has param { lr_mult: XX }. This coefficient determines how susceptible these weights are to SGD updates. Setting param { lr_mult: 0 } means you FIX the weights of this layer, and they will not be changed during training.
Edit your train_val.prototxt accordingly.
4. Run caffe
Run caffe train, but supply it with the caffemodel weights as the initial weights:
~$ $CAFFE_ROOT/build/tools/caffe train -solver /path/to/solver.prototxt -weights /path/to/orig_googlenet_weights.caffemodel

Fine-tuning is a very useful trick for achieving promising accuracy compared to earlier hand-crafted features. @Shai already posted a good tutorial for fine-tuning GoogLeNet using Caffe, so I just want to give some recommendations and tricks for fine-tuning in the general case.
Most of the time, we face a classification task where the new dataset (e.g. the Oxford 102 flower dataset or Cat&Dog) falls into one of the following four common situations (from CS231n):
New dataset is small and similar to original dataset.
New dataset is small but is different to original dataset (Most common cases)
New dataset is large and similar to original dataset.
New dataset is large but is different to original dataset.
In practice, most of the time we do not have enough data to train the network from scratch, but we may have enough to fine-tune a pre-trained model. Whichever of the cases above applies, the only thing we must care about is: do we have enough data to train the CNN?
If yes, we can train the CNN from scratch. However, in practice it is still beneficial to initialize the weights from a pre-trained model.
If no, we need to check whether the data is very different from the original dataset. If it is very similar, we can fine-tune only the fully connected layers, or train an SVM on the extracted features. However, if it is very different from the original dataset, we may need to fine-tune the convolutional layers as well to improve generalization.
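The rule of thumb above can be restated as a small function. This is only a summary of the reasoning in this answer, with made-up names and return strings, not a fixed recipe.

```python
def finetune_strategy(enough_data, similar_to_original):
    """Return the suggested strategy for a new dataset.

    enough_data: True if there is enough data to train the CNN itself.
    similar_to_original: True if the data resembles the pre-training dataset.
    """
    if enough_data:
        # Training everything is affordable; pretrained weights still help.
        return "train the whole CNN, initialized from pretrained weights"
    if similar_to_original:
        # Pretrained features already fit; only the classifier needs training.
        return "freeze the conv layers; train only the FC layers or an SVM"
    # Little data and a different domain: the features themselves must adapt.
    return "fine-tune the convolutional layers as well"


print(finetune_strategy(enough_data=False, similar_to_original=True))
```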

Building an Image search engine using Convolutional Neural Networks

I am trying to implement an image search engine using AlexNet (https://github.com/akrizhevsky/cuda-convnet2).
The idea is to implement an image search engine by training a neural net to classify images and then using the code from the net's last hidden layer as a similarity measure.
I am trying to figure out how to train the CNN on a new set of images to classify them. Does anyone know how to get started with this?
Thanks

You basically have two approaches to your problem:
- Either you have plenty of good training data (>1M) and dozens of GPUs, and you retrain the network from scratch using SGD with the classes you have for your queries.
- Or you don't, in which case you simply truncate a pretrained AlexNet (where exactly you truncate it is up to you) and apply it to your images (possibly resized to fit the network: 227x227x3, if I am not mistaken).
Then from each image you get a feature vector (sometimes called a descriptor), and you use those feature vectors to train a linear SVM on your images for your specific task.
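For the search part itself, the descriptors can also be compared directly, which is the similarity idea from the question. The following is a minimal sketch in plain Python, assuming each image has already been turned into a feature vector by the truncated network. The file names and 3-dimensional vectors are made up for illustration, and cosine similarity is just one common choice of measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, index, top_k=3):
    """Return the top_k (name, score) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy descriptors; real ones would come from the truncated network.
index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}

results = search([1.0, 0.0, 0.0], index, top_k=2)
print([name for name, _ in results])  # prints: ['cat_1.jpg', 'cat_2.jpg']
```

For a large collection you would store the vectors in an array and use a vectorised or approximate nearest-neighbour search instead of this linear scan.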
