In one particular application, I need to train only bias of the convolutional operation. Therefore I removed W parameter from trainable_weight. that looks like this:
self.trainable_weights = [ self.b]
I save model at 0 epoch and after 200 epoch and I found that W is somehow able to learn. Do not know what is going on. When I saw model.summary() it showing the correct number of learning parameters. Can anyone tell me what is wrong here?
Related
I'm totally new to pytorch, so it might be a very basic question. I have two networks that should be trained together.
First one takes data as input and returns its embedding as output.
Second one takes pairs of embedded datapoints and returns their 'similarity' as output.
Partial loss is then computed for every datapoint, and then all the losses are combined.
This final loss should be backpropagated through both networks.
How should the code for that look like? I'm thinking something like this:
def train_models(inputs, targets):
network1.train()
network2.train()
embeddings = network1(inputs)
paired_embeddings = pair_embeddings(embeddings)
similarities = network2(similarities)
"""
I don't know how the loss should be calculated here.
I have a loss formula for every embedded datapoint,
but not for every similarity.
But if I only calculate loss for every embedding (using similarites),
won't backpropagate() only modify network1,
since embeddings are network1's outputs
and have not been modified in network2?
"""
optimizer1.step()
optimizer2.step()
scheduler1.step()
scheduler2.step()
network1.eval()
network2.eval()
I hope this specific enough. I'll gladly share more details if necessary. I'm just so inexperienced with pytorch and deep learning in general, that I'm not even sure how to ask this question.
You can use single optimizer for this purpose, and even pass different learning rate for each network.
optimizer = optim.Adam([
{'params': network1.parameters()},
{'params': network2.parameters(), 'lr': 1e-3}
], lr=1e-4)
# ...
loss = loss1 + loss2
loss.backward()
optimizer.step()
I'm working on a multi input multi output which take an image and a numpy array as an input with output of 2 regression values , while training i notice that the total loss of the model is greater than the sum of the two outputs , what does it mean ?
I am unable to completely understand your problem, but from what I can comprehend is that unless your val loss and total loss is approximately near by then your model is going good, basically both of your losses should not differ much. For this kind of problems try and use tensorBoard to get graphical output for your model performance and check if model loss and val loss are varying closely or is there any hindrance.
I am trying to implement a CNN in Tensorflow (quite similar architecture to VGG), which then splits into two branches after the first fully connected layer. It follows this paper: https://arxiv.org/abs/1612.01697
Each of the two branches of the network outputs a set of 32 numbers. I want to write a joint loss function, which will take 3 inputs:
The predictions of branch 1 (y)
The predictions of branch 2 (alpha)
The labels Y (ground truth) (q)
and calculate a weighted loss, as in the image below:
Loss function definition
q_hat = tf.divide(tf.reduce_sum(tf.multiply(alpha, y),0), tf.reduce_sum(alpha,0))
loss = tf.abs(tf.subtract(q_hat, q))
I understand the fact that I need to use the tf functions in order to implement this loss function. Having implemented the above function, the network is training, but once trained, it is not outputting the expected results.
Has anyone ever tried combining outputs of two branches of a network in one joint loss function? Is this something TensorFlow supports? Maybe I am making a mistake somewhere here? Any help whatsoever would be greatly appreciated. Let me know if you would like me to add any further details.
From TensorFlow perspective, there is absolutely no difference between a "regular" CNN graph and a "branched" graph. For TensorFlow, it is just a graph that needs to be executed. So, TensorFlow certainly supports this. "Combining two branches into joint loss" is also nothing special. In fact, it is "good" that loss depends on both branches. It means that when you ask TensorFlow to compute loss, it will have to do the forward pass through both branches, which is what you want.
One thing I noticed is that your code for loss is different than the image. Your code appears to do this https://ibb.co/kbEH95
I have a kind of euclidean loss function which is:
\sum_{i,j} c_i*max{0,y_{ji}-k_{ji}} + p_i*max{0,k_{ji}-y_{ji}}
which y_{ji} are the output of caffe and k_{ji} are the real output value, i is the index of the items and j is index of samples.
The issue is about getting the values of parameters c_i and p_i.
When I have c_i = c_q for all i \neq q, and similarly for p_i, I simply get the values of them as parameters of the loss layer (I added two new parameters in the caffe.proto). However, the problems is that now I have around 300 items so that it is not reasonable to get them as loss layer parameters.
I tried to get their values in the loss layer, I mean I tried to add another bottom layer for loss layer, but it gave an error.
I am stuck here!
Please guide me how I can solve this issue.
Thanks in advance,
Afshin
I am wondering how to go about setting up ONLY a test phase in Caffe for an LMDB file. I have already trained my model, everything seems good, my loss has decreased, and the output I am getting on images loaded in one by one also seem good.
Now I would like to see how my model performs on a separate LMDB test set, but seem to be unable to do so successfully. It would not be ideal for me to do a loop by loading images one at a time since my loss function is already defined in caffe and this would require me to redefine it.
this is what I have so far, but the results of this dont make sense; when I compare the loss I have from the train set to the loss I get from this, they don't match (orders of magnitude apart). Does anyone have any idea what my problem could be?
caffe.set_device(0)
caffe.set_mode_gpu()
net = caffe.Net('/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/deploy_JP.prototxt','/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/snapshot_iter_10000.caffemodel',caffe.TEST)
solver = None # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.SGDSolver('/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/lenet_auto_solverJP_test.prototxt')
niter = 100
test_loss = zeros(niter)
count = 0
for it in range(niter):
solver.test_nets[0].forward() # SGD by Caffe
# store the test loss
test_loss[count] = solver.test_nets[0].blobs['loss']
print(solver.test_nets[0].blobs['loss'].data)
count = count+1
See my answer here. Do not forget to subtract the mean, otherwise you'll get low accuracy. The link to the code, posted above, takes care of that.