I want to do a 2 class segmentation using dense_vnet model available on niftynet which originally does a 9 class segmentation
I tried to retrain only the last layer by making changes in config file according to this suggestion: HOw to fine tune niftynet pre trained model for custom data
vars_to_restore = ^((?!DenseVNet\/(skip_conv|fin_conv)).)*$
num_classes = 2
error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign
requires shapes of both tensors to match. lhs shape= [2] rhs shape=
[9]
[[{{node save/Assign_8}} = Assign[T=DT_FLOAT, class=["loc:#DenseVNet/conv/conv/b"], use_locking=true, validate_shape=true,
device="/job:localhost/replica:0/task:0/device:CPU:0"](DenseVNet/conv/conv/b,
save/RestoreV2:8)]]
It looks like you have restored too many layers, some of them are still designed to classify to 9 classes. Inspect the architecture and exclude restore for all layers which are designed to classify into 9 classes.
Related
I am working on Object detection in document images. I have a model working with DETR-Resnet-50 on custom data
I also found out that Layout Parser Faster RCNN - Mask RCNN models use Resnet 50 as a backbone. And since these are already trained on document images, fine tuning these would give better results.
I think one way to do this would be to do like:
model = torch.hub.load('facebookresearch/detr:main', 'detr_resnet50', pretrained=True)
for name, param in model.named_parameters():
print(name, param.shape)
#param.data = weight
and when the layers name seem similar, you change the weights manually.
Now the problem is that Layout Parser models are in detectron2 and my DETR models are in HuggingFace. How can I change the weights of backbone (resnet50) in DETR so that they are initialised with those weights instead of imagenet ones?
I'm using Resnet50 model to classify images into two classes: normal cells and cancer cells.
so I want to to increase the accuracy but i don't know what to modify.
# we are using resnet50 for transfer learnin here. So we have imported it
from tensorflow.keras.applications import resnet50
# initializing model with weights='imagenet'i.e. we are carring its original weights
model_name='resnet50'
base_model=resnet50.ResNet50(include_top=False, weights="imagenet",input_shape=img_shape, pooling='max')
last_layer=base_model.output # we are taking last layer of the model
# Add flatten layer: we are extending Neural Network by adding flattn layer
flatten=layers.Flatten()(last_layer)
# Add dense layer
dense1=layers.Dense(100,activation='relu')(flatten)
# Add dense layer to the final output layer
output_layer=layers.Dense(class_count,activation='softmax')(flatten)
# Creating modle with input and output layer
model=Model(inputs=base_model.inputs,outputs=output_layer)
model.compile(Adamax(learning_rate=.001), loss='categorical_crossentropy', metrics=['accuracy'])
There were 48 errors in 534 test cases Model accuracy= 91.01 %
Also what do you think about the results of the graph?
this is the classification report
i got good results but is there a possibility to increase accuracy more than that?
This is a broad question as there are many ways one can attempt to generally improve the network's accuracy. some of which may be
Increase the dimension of the layers that are learned in transfer learning (make sure not to overfit)
Use transfer learning with Convolution layers and not MLP
let the optimization algorithm choose the learning rate on its own
Play with additional augmentations to the dataset
and the list goes on.
Also, if possible, I would suggest comparing your results to other publicly available benchmarks - by doing so you might understand the upper bounds of the accuracies better
I would like to create a fully convolution network for binary image classification in pytorch that can take dynamic input image sizes, but I don't quite understand conceptually the idea behind changing the final layer from a fully connected layer to a convolution layer. Here and here both state that this is possible by using a 1x1 convolution.
Suppose I have a 16x16x1 image as input to the CNN. After several convolutions, the output is a 16x16x32. If using a fully connected layer, I can produce a single value output by creating 16*16*32 weights and feeding it to a single neuron. What I don't understand is how you would get a single value output by applying a 1x1 convolution. Wouldn't you end up with 16x16x1 output?
Check this link: http://cs231n.github.io/convolutional-networks/#convert
In this case, your convolution layer should be a 16 x 16 filter with 1 output channel. This will convert the 16 x 16 x 32 input into a single output.
Sample code to test:
from keras.layers import Conv2D, Input
from keras.models import Model
import numpy as np
input = Input((16,16,32))
output = Conv2D(1, 16)(input)
model = Model(input, output)
print(model.summary()) # check the output shape
output = model.predict(np.zeros((1, 16, 16, 32))) # check on sample data
print(f'output is {np.squeeze(output)}')
This approach of Fully convolutional networks are useful in segmentation tasks using patch based approaches since you can speed up prediction(inference) by feeding a bigger portion of the image.
For classification tasks, you usually have a fc layer at the end. In that case, a layer like AdaptiveAvgPool2d is used which ensures the fc layer sees a constant input feature size irrespective of the input image size.
https://pytorch.org/docs/stable/nn.html#adaptiveavgpool2d
See this pull request for torchvision VGG: https://github.com/pytorch/vision/pull/747
In case of Keras, GlobalAveragePooling2D. See the example, "Fine-tune InceptionV3 on a new set of classes".
https://keras.io/applications/
I hope you are familier with keras. Now see your image is of 16*16*1. Image will pass to the keras convoloutional layer but first we have to create the model. like model=Sequential() by this we are able to get keras model instance. now we will give our convoloutional layer with our parameters like
model.add(Conv2D(20,(2,2),padding="same"))
now here we are adding 20 filters to our image. and our image becomes 16*16*20 now for more best features we add more conv layers like
model.add(Conv2D(32,(2,2),padding="same"))
now we add 32 filters to your image after this your image will be size of 16*16*32
dont forgot to put activation after conv layers. If you are new than you should study about activations, Optimization and loss of the network. these are the basic part of neural Networks.
Now its time to move towards fully connected layer. First we need to flatten our image because fully connected layer only works on 2d vectors (no_of_ex,image_dim) in your case
imgae diminsion after applying flattening will be (16*16*32)
model.add(Flatten())
after flatening our image your network will give it to fully connected layers
model.add(Dense(32))
model.add(Activation("relu"))
model.add(Dense(8))
model.add(Activation("relu"))
model.add(Dense(2))
because you are having a problem of binary classification if you have to classify 3 classes than last layer will have 3 neuron if you have to classify 10 examples than your last dense layer willh have 10 neuron.
model.add(Activation("softmax"))
model.compile(loss='binary_crossentropy',
optimizer=Adam(),
metrics=['accuracy'])
return model
after this you have to fit this model.
estimator=model()
estimator.fit(X_train,y_train)
full code:
def model (classes):
model=Sequential()
# conv2d set =====> Conv2d====>relu=====>MaxPooling
model.add(Conv2D(20,(5,5),padding="same"))
model.add(Activation("relu"))
model.add(Conv2D(32,(5,5),padding="same"))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(32))
model.add(Activation("relu"))
model.add(Dense(8))
model.add(Activation("relu"))
model.add(Dense(2))
#now adding Softmax Classifer because we want to classify 10 class
model.add(Dense(classes))
model.add(Activation("softmax"))
model.compile(loss='categorical_crossentropy',
optimizer=Adam(lr=0.0001, decay=1e-6),
metrics=['accuracy'])
return model
You can take help from this kernal
I have pairs of movie witch contains 2783 features.
The vector is defined as: if the feature is in the movie it's 1 otherwise it's 0.
Example :
movie 1 = [0,0,1,0,1,0,1 ...] & movie 2 = [1,0,1,1,1,0,1 ...]
Each pair has for label 1 or 0.
movie1,movie2=0
movie1,movie4=1
movie2,movie150=0
The input is similar to SGNS (Skip gram negative sampling) word2vec model.
My goal is to find similarity between programs and learn embedding of each movie.
I'd to build a kind of 'SGNS implementation with keras'. However my input is not one hot and I can't use the Embedding layers. I tried to use Dense layers and merge them with a dot product. I'm not sure about the model architecture and I got errors.
from keras.layers import Dense,Input,LSTM,Reshape
from keras.models import Model,Sequential
n_of_features = 2783
n_embed_dims = 20
# movie1 vectors
word= Sequential()
word.add(Dense(n_embed_dims, input_dim=(n_words,)))
# movie2 vectors
context = Sequential()
context.add(Dense(n_embed_dims, input_dim=n_words,))
model = Sequential()
model.add(keras.layers.dot([word, context], axes=1))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
loss='mean_squared_error')
If someone has an idea how to implement it.
If you're not wedded to Keras, you could probably model this by turning each movie into a synthetic 'document' with tokens for each feature-that-is-present. Then, use a 'Paragraph Vectors' implementation in pure PV-DBOW mode to learn a vector for each movie.
(In pure PV-DBOW, dense doc-vectors are learned to predict each word in a document, without regard to order/word-adjacency/etc. It is a bit like skip-gram, but the training pairs are not 'word to every nearby word' but 'doc-token to every in-doc word'.)
In gensim, the Doc2Vec class with initialization parameter dm=0 uses pure PV-DBOW mode.
I have a network which estimates the depth_map from an input image. In a nutshell, I have an input_image and its corresponding ground_truth (depth map). Let us call this network a generator network. So far so good. Now I have heard of ´Generative Adversarial Networksand I thought I could improve my network with adding aDiscriminator`-Network as follows:
input -> neural network -> estimated_depth_image -> discriminator -> output: true or false depending on real or synthesised
^
|
ground_truth_depth_image
But then, how would I switch between feeding the estimated_depth_image with label 0 and feeding the ground_truth_image with label 1 into the discriminator. Is that even possible? If yes, how would you approach it?
My problem is, how do I feed my new dataset which is consists of all the groud_truth depth images with label 1 and all the estimated_deth_images with label 0 into the discriminator network at the same time using caffe?
As far as I know this is not easy to do in Caffe, because you would need two different optimizers (one for the generator and one for the discriminator). You also need to do the forward pass only in G, and then in D using the output of G and some real depth data. I would recommend to use Tensorflow, torch or pytorch to do that.