DeeplabV3+ without a pretrained backbone: could that be the reason for a bad DSC? - deep-learning

I segment multiple targets in medical images (CT) with DeeplabV3+, but on 3D volumes, so I can't load a pretrained backbone (ResNet, etc.) into the network.
The details are:
patch size: 16, 256, 256 (cannot be changed)
batch size: 2 (the GPU cannot fit a bigger one)
optimizer: SGD
loss: Dice + CrossEntropy (following the nnUNet setting)
dataset: only about 20 cases
The original code is for the 2D case, and I converted each layer from 2D to 3D (e.g. nn.Conv2d to nn.Conv3d, and so on).
But in the end my validation DSC only reached around 0.6, and I have no idea what is wrong in my code. Could anyone give me a hand (an idea), please? Thanks a lot!
I want to increase the performance of the model, because right now I have no idea why my network is doing so badly. Thanks a lot.

You can try using a few 3x3 convolutional layers on the 3D volumes, keeping the spatial dimensions (H and W) of the features constant, and then converting the resulting tensor to a 3-channel tensor with a 1x1 convolutional layer. You will then have a tensor with the same height and width as the image but with 3 channels, and you can feed it into the pretrained models.
For reference, check here:
https://segmentation-models.readthedocs.io/en/latest/tutorial.html#training-with-non-rgb-data
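A minimal PyTorch sketch of that adapter idea, assuming the depth axis of the 3D patch is folded into the channel axis before the 1x1 convolution (the module name RGBAdapter and all shapes are illustrative, not part of the original answer):

import torch
import torch.nn as nn
from torchvision import models

class RGBAdapter(nn.Module):
    # maps a single-channel 3D patch (B, 1, D, H, W) to a 3-channel 2D tensor (B, 3, H, W)
    # so that a 2D pretrained backbone can be reused
    def __init__(self, depth=16, mid_channels=8):
        super().__init__()
        # a few 3x3 convolutions that keep H and W constant (padding=1)
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(mid_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # collapse depth into channels, then reduce to 3 channels with a 1x1 convolution
        self.to_rgb = nn.Conv2d(mid_channels * depth, 3, kernel_size=1)

    def forward(self, x):                  # x: (B, 1, D, H, W)
        f = self.conv3d(x)                 # (B, C, D, H, W), H and W unchanged
        b, c, d, h, w = f.shape
        f = f.reshape(b, c * d, h, w)      # fold depth into the channel axis
        return self.to_rgb(f)              # (B, 3, H, W)

adapter = RGBAdapter(depth=16)
backbone = models.resnet50(pretrained=True)   # 2D pretrained weights become usable
x = torch.randn(2, 1, 16, 256, 256)           # (batch, channel, D, H, W) patch
out = backbone(adapter(x))                    # for segmentation you would tap intermediate feature maps, not the final logits

Note that folding depth into channels discards true 3D context; it is just one way to make 2D pretrained weights usable on volumetric input.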

Related

Is there an actual minimum input image size for popular computer vision models? (E.g., vgg, resnet, etc.)

According to the documentation on pre-trained computer vision models for transfer learning (e.g., here), input images should come in "mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224".
However, when running transfer learning experiments on 3-channel images with height and width smaller than expected (e.g., smaller than 224), the networks generally run smoothly and often achieve decent performance.
Hence, it seems to me that the "minimum height and width" is more of a convention than a critical parameter. Am I missing something here?
There is a limitation on your input size which corresponds to the receptive field of the last convolutional layer of the network. Intuitively, you can observe the spatial dimensions decreasing as you progress through the network; at least this is the case for feature-extractor CNNs, which aim to extract feature embeddings from the input image. That is, most pre-trained models such as vanilla VGG and ResNet networks do not retain spatial dimensionality. If the input to a convolutional layer is smaller than the kernel size (even when padded), then you simply won't be able to perform the operation.
TLDR: adaptive pooling layer
For example, the standard resnet50 model accepts inputs only in the range 193-225, and this is due to the architecture and its downscaling layers (see below).
The only reason the default PyTorch model works is that it uses an adaptive pooling layer, which removes the restriction on the input size. So it's going to work, but you should be ready for performance decay and other fun things :)
Hope you will find it useful:
https://discuss.pytorch.org/t/how-can-torchvison-models-deal-with-image-whose-size-is-not-224-224/51077/3
What is Adaptive average pooling and How does it work?
https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html
https://github.com/pytorch/vision/blob/c187c2b12d86c3909e59a40dbe49555d85b98703/torchvision/models/resnet.py#L118
https://github.com/pytorch/vision/blob/c187c2b12d86c3909e59a40dbe49555d85b98703/torchvision/models/resnet.py#L151
https://developpaper.com/pytorch-implementation-examples-of-resnet50-resnet101-and-resnet152/
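As a quick illustration of what that adaptive pooling layer buys you (shapes are for a stock torchvision resnet50; the 128x128 input is just an example):

import torch
import torch.nn as nn
from torchvision import models

# AdaptiveAvgPool2d always produces the requested output size, so the final
# fully connected layer sees a fixed-length vector regardless of the input H/W.
pool = nn.AdaptiveAvgPool2d((1, 1))
print(pool(torch.randn(1, 2048, 7, 7)).shape)   # torch.Size([1, 2048, 1, 1])  (224x224 input)
print(pool(torch.randn(1, 2048, 4, 4)).shape)   # torch.Size([1, 2048, 1, 1])  (smaller input)

# Hence a stock resnet50 runs on inputs well below 224, as long as the feature
# map has not shrunk to nothing before the pooling layer.
model = models.resnet50(pretrained=True).eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 128, 128))
print(out.shape)                                # torch.Size([1, 1000])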

Why is the Batch Normalization layer followed by a Scale layer in Caffe?

I noticed that the Batch Normalization layer is followed by a Scale layer in MobileNet. It seems the BN layer and the Scale layer come as a pair.
And the Convolution layer + BN layer + Scale layer + ReLU layer combination works well.
So what does the Scale layer do?
It seems Caffe can't learn the parameters in the BN layer, so the Scale layer is needed, but why?
In tensorflow doc, https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm
When the next layer is linear (also e.g. nn.relu), this can be
disabled since the scaling can be done by the next layer.
This makes me even more confused.
Please help me, thanks!
Batch Normalization does two things: first it normalizes the activations using the mean and standard deviation of the batch, and then it applies a scale and bias to restore an appropriate range for the activations.
Caffe implements this with two layers: the BatchNorm layer only does the normalization part, without the scale and bias, which can be handled by the Scale layer, or might not even be needed if the next layer can do the scaling itself (this is what the TF docs mention).
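In formulas, the full operation is y = gamma * (x - mu) / sqrt(var + eps) + beta. A tiny NumPy sketch of how that splits across the two Caffe layers (the values of gamma, beta, and eps are illustrative):

import numpy as np

x = np.random.randn(4, 3)          # a batch of activations (batch, channels)
eps = 1e-5

# what Caffe's BatchNorm layer does: normalization only
mu = x.mean(axis=0)
var = x.var(axis=0)
x_hat = (x - mu) / np.sqrt(var + eps)

# what Caffe's Scale layer (with bias_term: true) adds: a learned scale and shift
gamma = np.ones(3)                 # learned scale, initialized to 1
beta = np.zeros(3)                 # learned bias, initialized to 0
y = gamma * x_hat + beta           # full batch-norm output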
Hope this helps.

May I use CaffeNet for 3 labels? [duplicate]

I trained a GoogLeNet model from scratch, but it didn't give me promising results.
As an alternative, I would like to fine-tune a GoogLeNet model on my dataset. Does anyone know what steps I should follow?
Assuming you are trying to do image classification, these are the steps for fine-tuning a model:
1. Classification layer
The original classification layer "loss3/classifier" outputs predictions for 1000 classes (its num_output is set to 1000). You'll need to replace it with a new layer with the appropriate num_output. When replacing the classification layer:
Change the layer's name (so that when you read the original weights from the caffemodel file there will be no conflict with the weights of this layer).
Change num_output to the right number of output classes you are trying to predict.
Note that you need to change ALL classification layers. Usually there is only one, but GoogLeNet happens to have three: "loss1/classifier", "loss2/classifier" and "loss3/classifier".
2. Data
You need to make a new training dataset with the new labels you want to fine tune to. See, for example, this post on how to make an lmdb dataset.
3. How extensive a fine-tuning do you want?
When fine-tuning a model, you can train ALL of the model's weights, or choose to fix some weights (usually the filters of the lower/deeper layers) and train only the weights of the top-most layers. This choice is up to you and usually depends on the amount of training data available (the more examples you have, the more weights you can afford to fine-tune).
Each layer that holds trainable parameters has param { lr_mult: XX }. This coefficient determines how susceptible these weights are to SGD updates. Setting param { lr_mult: 0 } means you FIX the weights of this layer and they will not be changed during the training process.
Edit your train_val.prototxt accordingly.
4. Run caffe
Run caffe train but supply it with caffemodel weights as an initial weights:
~$ $CAFFE_ROOT/build/tools/caffe train -solver /path/to/solver.prototxt -weights /path/to/orig_googlenet_weights.caffemodel
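If you want to double-check that the renamed classifier layers are re-initialized rather than loaded from the caffemodel, a small pycaffe sanity check can help (the layer name loss3/classifier_new below is just an example of a renamed layer; paths are placeholders as above):

import caffe
import numpy as np

caffe.set_mode_cpu()

# layers whose names were changed in the prototxt are NOT filled from the caffemodel
net = caffe.Net('/path/to/train_val.prototxt',
                '/path/to/orig_googlenet_weights.caffemodel',
                caffe.TEST)

w = net.params['loss3/classifier_new'][0].data   # weights of the renamed layer
print(w.shape)            # (num_output, 1024) -- num_output is now your class count
print(np.abs(w).mean())   # values come from the prototxt's weight_filler, not ImageNet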
Fine-tuning is a very useful trick for achieving promising accuracy compared to hand-crafted features. @Shai already posted a good tutorial for fine-tuning GoogLeNet using Caffe, so I just want to give some recommendations and tricks for fine-tuning in general cases.
Most of the time, we face a classification task where the new dataset (e.g. the Oxford 102 flower dataset or Cat&Dog) falls into one of the following four common situations (see CS231n):
The new dataset is small and similar to the original dataset.
The new dataset is small but different from the original dataset (the most common case).
The new dataset is large and similar to the original dataset.
The new dataset is large but different from the original dataset.
In practice, most of the time we do not have enough data to train the network from scratch, but it may be enough to fine-tune a pre-trained model. Whichever of the cases above applies, the only thing we must care about is: do we have enough data to train the CNN?
If yes, we can train the CNN from scratch. However, in practice it is still beneficial to initialize the weights from a pre-trained model.
If not, we need to check whether the data is very different from the original dataset. If it is very similar, we can just fine-tune the fully connected layers or train an SVM on top of the extracted features. However, if it is very different from the original dataset, we may need to fine-tune the convolutional layers as well to improve generalization.

Regression Using Caffe

I have 900 training samples and 100 test samples, where each sample has one label (e.g. 64, 136, and so on). Each sample is represented by a 1-dimensional vector of size 460000.
How can I do linear regression with these data using Caffe? I badly need a solution.
Thanks in advance.
You can use the Euclidean loss layer as the loss function.
Euclidean Loss Layer.
With that, just make sure your last layer has only one output neuron (num_output: 1 in your prototxt file).
You can check some examples here: Examples Caffe; in particular, the autoencoder example uses a fully connected network and the Euclidean loss.
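Since each sample is just a 460000-dimensional vector with a single scalar label, one common way to feed it to Caffe is through an HDF5Data layer; below is a minimal sketch of the data preparation (file names, shapes, and the random stand-in data are illustrative):

import h5py
import numpy as np

# stand-ins for the real features and regression targets
X = np.random.randn(10, 460000).astype(np.float32)             # use your 900 real samples here
y = np.random.uniform(0, 200, size=(10, 1)).astype(np.float32)  # one scalar label per sample

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=X)     # matched by the HDF5Data layer's top: "data"
    f.create_dataset('label', data=y)    # matched by top: "label", fed to the loss

# Caffe's HDF5Data layer reads a text file listing the .h5 files
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')

The last InnerProduct layer of the network would then have num_output: 1 and feed, together with the label, into a EuclideanLoss layer, as described above.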

How to feed the weights of the neurons to the blobs in caffe and vice versa?

I am very new to Caffe.
I have a huge weight vector which contains the weights connecting the neurons of a neural network written in C++. I want to use this weight vector to define a neural network in Caffe, with these weights as the initial weights of the connections. How do I feed these weights into Caffe's blobs, which are the fundamental way Caffe holds parameter values like weights and biases?
After every iteration, when the weights get updated, I also want to read their values back from the blobs and put them into this huge weight vector, which I will access from the rest of the C++ code.
Please tell me how to code this in Caffe. It is essentially a process of serializing and deserializing the weight vector to and from the blobs.
Any help will be greatly appreciated
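If the pycaffe interface is an option, each layer's parameter blobs are exposed as NumPy arrays, so copying in both directions is plain array assignment. A minimal sketch (the prototxt path and the layer name 'fc1' are hypothetical):

import caffe
import numpy as np

net = caffe.Net('net.prototxt', caffe.TEST)   # hypothetical net containing a layer named "fc1"

# blobs -> flat vector: read the current weights and biases out of the net
w = net.params['fc1'][0].data                 # weight blob as a NumPy array
b = net.params['fc1'][1].data                 # bias blob
flat = np.concatenate([w.ravel(), b.ravel()]) # the "huge weight vector"

# flat vector -> blobs: write external weights back in place (shapes must match)
net.params['fc1'][0].data[...] = flat[:w.size].reshape(w.shape)
net.params['fc1'][1].data[...] = flat[w.size:w.size + b.size].reshape(b.shape)

In the C++ API the same parameter blobs are reachable through each layer's blobs() vector and Blob::mutable_cpu_data(), which is one way to do the copy directly from the existing C++ code.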