How to calculate the values of FC layers in a CNN

Hi, I am trying to understand how to calculate the values at each layer of a CNN with stride, but I am having a hard time understanding the values of FC, FC3, and FC4.
Please let me know how the Pool1 output of 5x5x10 is turned into 400, and how we go from 400 to 120 and then to 80.
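For reference, the input size of the first fully connected layer is simply the flattened pooling output (channels × height × width). Below is a minimal PyTorch sketch using the 400 → 120 → 80 sizes mentioned above; the 16 channels are an assumption chosen so that a 5x5 map flattens to 400 (a 5x5x10 map would flatten to 250, not 400):

import torch
import torch.nn as nn

# assumed: the last pooling layer outputs 16 feature maps of size 5x5 -> 16*5*5 = 400 values
fc_head = nn.Sequential(
    nn.Flatten(),          # [N, 16, 5, 5] -> [N, 400]
    nn.Linear(400, 120),   # FC: 400 inputs -> 120 outputs
    nn.Linear(120, 80),    # FC: 120 inputs -> 80 outputs
)
print(fc_head(torch.randn(1, 16, 5, 5)).shape)  # torch.Size([1, 80])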

Related

Network prediction and norm issues

I am facing an issue that can be reduced to the toy problem of learning the map f(X) = exp(-0.1 * ||X||) X, where X is a matrix and ||X|| is its 2-norm.
I want to approximate this map with a CNN, and I have a training dataset of N pairs of the form (X_i, f(X_i)). The matrices X_i have completely different norms: they can be 1, 10, 1e-5, and so on.
My question is: how would you normalize or otherwise manipulate the dataset so that the network trains well and generalizes?
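For concreteness, here is a small sketch of the toy map and dataset described above (the matrix size and scales are placeholders; I use the Frobenius norm, so swap in the spectral norm if that is what is meant by the 2-norm):

import numpy as np

def f(X):
    # target map: f(X) = exp(-0.1 * ||X||) * X
    return np.exp(-0.1 * np.linalg.norm(X)) * X   # np.linalg.norm defaults to the Frobenius norm

# training pairs (X_i, f(X_i)) whose norms differ by many orders of magnitude
rng = np.random.default_rng(0)
dataset = []
for s in [1.0, 10.0, 1e-5]:
    X = s * rng.standard_normal((8, 8))
    dataset.append((X, f(X)))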

Fully connected neural network with constant loss

I am working on a project to predict soccer player values from a set of inputs. The data consists of about 19,000 rows and 8 columns (7 input columns and 1 target column), all numerical values.
I am using a fully connected neural network for the prediction, but the problem is that the loss is not decreasing as it should.
The loss is very large (around 1e+13) and does not decrease; it just fluctuates.
This is the function I am using to run the model:
import torch

def gradient_descent(model, learning_rate, num_epochs, data_loader, criterion):
    losses = []
    optimizer = torch.optim.Adam(model.parameters())
    for epoch in range(num_epochs):  # one epoch
        for inputs, outputs in data_loader:  # one iteration
            inputs, outputs = inputs.to(torch.float32), outputs.to(torch.float32)
            logits = model(inputs)
            loss = criterion(torch.squeeze(logits), outputs)  # forward-pass
            optimizer.zero_grad()  # zero out the gradients
            loss.backward()  # compute the gradients (backward-pass)
            optimizer.step()  # take one step
            losses.append(loss.item())
        loss = sum(losses[-len(data_loader):]) / len(data_loader)  # mean loss over the epoch
        print(f'Epoch #{epoch}: Loss={loss:.3e}')
    return losses
The model is a fully connected neural network with 4 hidden layers, each with 7 neurons; the input layer has 7 neurons and the output layer has 1. I am using MSE as the loss function. I tried changing the learning rate, but it is still bad.
What could be the reason behind this?
Thank you!
It is difficult to diagnose your problem from the information you provided, but I'll try to point you in some useful directions.
Data Normalization:
The way we initialize the weights in a deep NN has a significant effect on the training process. See, e.g.:
He, K., Zhang, X., Ren, S. and Sun, J., Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (ICCV 2015).
Most initialization methods assume the inputs have zero mean and unit variance (or similar statistics). If your inputs violate these assumptions, you will find it difficult to train. See, e.g., this post.
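For example, a common recipe is to standardize each input feature using statistics computed on the training set only (a sketch; the tensors here are placeholders for your data):

import torch

X_train = torch.rand(19000, 7) * 1e3  # placeholder for the raw input features
X_val = torch.rand(1000, 7) * 1e3

mean, std = X_train.mean(dim=0), X_train.std(dim=0)
X_train_norm = (X_train - mean) / (std + 1e-8)  # zero mean, unit variance per feature
X_val_norm = (X_val - mean) / (std + 1e-8)      # always reuse the training-set statistics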
Normalize the Targets:
You are trying to solve a regression problem (MSE loss); it might be the case that your targets are poorly scaled, causing very large loss values. Try normalizing the targets so that they span a more compact range.
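With player values on the order of millions, an average squared error can easily reach 1e+12 or more even for a reasonable model, which matches the loss you report. A sketch of target scaling (the tensor is a placeholder), which can be undone after prediction:

import torch

y = torch.rand(19000) * 50e6     # placeholder targets, e.g. player values
y_mean, y_std = y.mean(), y.std()
y_norm = (y - y_mean) / y_std    # train against the normalized targets
# after prediction, map back to the original scale:
# y_pred = y_pred_norm * y_std + y_mean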
Learning Rate:
Try adjusting your learning rate: both increase and decrease it by orders of magnitude.
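Note that in the posted gradient_descent function the learning_rate argument is never actually passed to Adam, so changing it has no effect. Something like the following sweep (a sketch with a stand-in model) would actually apply it:

import torch

model = torch.nn.Sequential(torch.nn.Linear(7, 7), torch.nn.ReLU(), torch.nn.Linear(7, 1))  # stand-in model

for lr in (1e-1, 1e-2, 1e-3, 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # pass the chosen rate explicitly
    # ... train for a few epochs with this optimizer and compare the loss curves ...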

Calculating the number of weights in Convolutional Neural Network using Parameter Sharing

While reading the book Machine Learning: A Probabilistic Perspective by Murphy and this article by Mike O'Neill, I encountered some calculations about the number of weights in a convolutional neural network that I want to understand. The architecture of the network is like this:
And this is the explanation from the above article:
Layer #2 is also a convolutional layer, but with 50 feature maps. Each feature map is 5x5, and each unit in the feature maps is a 5x5 convolutional kernel of corresponding areas of all 6 of the feature maps of the previous layers, each of which is a 13x13 feature map. There are therefore 5x5x50 = 1250 neurons in Layer #2, (5x5+1)x6x50 = 7800 weights, and 1250x26 = 32500 connections.
The calculation of the number of weights, (5x5+1)x6x50 = 7800, seems strange to me. Shouldn't the actual calculation be (5x5x6+1)x50 = 7550, according to the parameter sharing explained here?
My argument is as follows:
We have 50 filters of size 5x5x6 and 1 bias for each filter, hence the total number of weights is (5x5x6+1)x50 = 7550. And this is PyTorch code which verifies this:
import torch
import torch.nn as nn
model = nn.Conv2d(in_channels=6, out_channels=50, kernel_size=5, stride=2)
params_count = sum(param.numel() for param in model.parameters() if param.requires_grad)
print(params_count) # 7550
Can anyone explain this, and which one is correct?
My calculations:
Layer 1 depth is 6, kernel: 5x5
Layer 2 depth is 50, kernel: 5x5
Total number of neurons in Layer 2: 5*5*50 = 1250
Total number of weights: 5*5*50*6 = 7500
Finally, biases for Layer 2: 50 (the depth is 50)
I agree with you: the total number of weights must be 7550 (7500 + 50).
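As a sanity check, the same PyTorch layer from the question exposes these numbers directly (a small sketch; the constructor arguments are copied from the code above):

import torch.nn as nn

conv = nn.Conv2d(in_channels=6, out_channels=50, kernel_size=5, stride=2)
print(conv.weight.shape)                        # torch.Size([50, 6, 5, 5]) -> 50*6*5*5 = 7500 weights
print(conv.bias.shape)                          # torch.Size([50])          -> 50 biases
print(conv.weight.numel() + conv.bias.numel())  # 7550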

How to get the output of a particular layer from a pretrained CNN in PyTorch

I have a CNN (ResNet18) pretrained on the ImageNet dataset, and I want to get the output for my input image from a particular layer.
For example:
My input image is a FloatTensor(3, 224, 336) and I send a batch of size 10 through my ResNet model; what I want is the output returned by model.layer4.
What I tried is out = model.layer4(Variable(input)), but it gave me an input dimension mismatch error (as expected). This is the exact error returned:
RuntimeError: Need input of dimension 4 and input.size[1] == 64 but got input to be of shape: [10 x 3 x 224 x 336] at /Users/soumith/miniconda2/conda-bld/pytorch_1501999754274/work/torch/lib/THNN/generic/SpatialConvolutionMM.c:47
So I'm confused about how to proceed to get my layer4 output.
PS: My ultimate task is to combine the layer4 output and the fully connected layer output together (tweaking the CNN, a kind of gated CNN), so if anyone has any insight into this case, please do tell me; maybe my above approach is not right.
You must create a module containing all the layers from the start up to the block you want:

import torch, torchvision

resnet = torchvision.models.resnet18(pretrained=True)
# children() order: conv1, bn1, relu, maxpool, layer1, layer2, layer3, layer4, avgpool, fc
f = torch.nn.Sequential(*list(resnet.children())[:8])  # keep everything up to and including layer4
features = f(imgs)  # imgs: a batch of images, e.g. of shape [10, 3, 224, 336]

Tune input features using backprop in Keras

I am trying to implement discriminant condition codes in Keras, as proposed in
Xue, Shaofei, et al., "Fast adaptation of deep neural network based on discriminant codes for speech recognition."
The main idea is that you encode each condition as an input parameter and let the network learn the dependency between the condition and the feature-label mapping. On a new dataset, instead of adapting the entire network, you just tune these weights using backprop. For example, say my network looks like this:
X ---->|-----|
       | DNN |----> Y
Z ---->|-----|
X: features, Y: labels, Z: condition codes
Now, given a pretrained DNN and X', Y' on a new dataset, I am trying to estimate Z' using backprop such that it minimizes the prediction error on Y'. The math seems straightforward, except I am not sure how to implement this in Keras without having access to the backprop itself.
For instance, can I add an Input() layer with trainable=True while all other layers are set to trainable=False? Can backprop in Keras update more than just layer weights? Or is there a way to hack Keras layers to do this?
Any suggestions welcome.
Thanks!
I figured out how to do this (exactly) in Keras by looking at fchollet's post here.
Using the Keras backend I was able to compute the gradient of my loss w.r.t. Z directly and used it to drive the update.
Code below:
import keras.backend as K
import numpy as np

model.summary()  # pretrained model

loss = K.categorical_crossentropy(Y, Y_out)        # loss between the model output and the targets
grads = K.gradients(loss, Z)[0]                     # gradient of the loss w.r.t. the condition codes Z
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)   # normalize the gradient
iterate = K.function([X, Z], [loss, grads])

step = 0.1
Z_adapt = Z_in.copy()
for i in range(100):
    loss_val, grads_val = iterate([X_in, Z_adapt])
    Z_adapt -= grads_val * step                     # gradient-descent update on Z only
    print("iter:", i, np.mean(loss_val))

print("Before:")
print(model.evaluate([X_in, Z_in], Y_out))
print("After:")
print(model.evaluate([X_in, Z_adapt], Y_out))
X, Y, and Z are nodes in the model graph. Z_in is an initial value for Z'; I set it to the average value from the training set. Z_adapt is the result after 100 iterations of gradient descent and should give you a better result.
Assume that the size of Z is m x n. Then you can first define an input layer of size m*n x 1, whose input will be an m*n x 1 vector of ones. Define a dense layer containing m*n neurons and set trainable=True for it. The response of this layer gives you a flattened version of Z; reshape it appropriately and give it as input to the rest of the network, which can be appended after it.
Keep in mind that if the size of Z is too large, the network may not be able to learn a dense layer with that many neurons. In that case, you may need to put additional constraints on it or look into convolutional layers; however, convolutional layers will put some constraints on Z.
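A minimal sketch of that construction (the sizes m and n, the feature dimension, and the stand-in frozen layers are all placeholders, not values from the paper):

from keras.layers import Input, Dense, Reshape, Concatenate
from keras.models import Model

m, n = 4, 8                                    # assumed size of Z (placeholder values)
ones_in = Input(shape=(m * n,))                # always fed a constant vector of ones
z_flat = Dense(m * n, use_bias=False, trainable=True, name="z_codes")(ones_in)
z = Reshape((m, n))(z_flat)                    # Z itself, if the downstream network expects a matrix

x_in = Input(shape=(20,))                      # assumed feature dimension
h = Concatenate()([x_in, z_flat])              # hand X and (flattened) Z to the rest of the network
h = Dense(64, activation="relu", trainable=False)(h)   # stand-in for the frozen pretrained layers
y = Dense(10, activation="softmax", trainable=False)(h)

model = Model([x_in, ones_in], y)
model.compile(optimizer="adam", loss="categorical_crossentropy")
# With everything else frozen, training updates only the "z_codes" weights, i.e. Z.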