Tensorflow Multiple Input Loss Function - function

I am trying to implement a CNN in Tensorflow (quite similar architecture to VGG), which then splits into two branches after the first fully connected layer. It follows this paper: https://arxiv.org/abs/1612.01697
Each of the two branches of the network outputs a set of 32 numbers. I want to write a joint loss function, which will take 3 inputs:
The predictions of branch 1 (y)
The predictions of branch 2 (alpha)
The labels Y (ground truth) (q)
and calculate a weighted loss, as in the image below:
Loss function definition
q_hat = tf.divide(tf.reduce_sum(tf.multiply(alpha, y),0), tf.reduce_sum(alpha,0))
loss = tf.abs(tf.subtract(q_hat, q))
I understand the fact that I need to use the tf functions in order to implement this loss function. Having implemented the above function, the network is training, but once trained, it is not outputting the expected results.
Has anyone ever tried combining outputs of two branches of a network in one joint loss function? Is this something TensorFlow supports? Maybe I am making a mistake somewhere here? Any help whatsoever would be greatly appreciated. Let me know if you would like me to add any further details.

From TensorFlow perspective, there is absolutely no difference between a "regular" CNN graph and a "branched" graph. For TensorFlow, it is just a graph that needs to be executed. So, TensorFlow certainly supports this. "Combining two branches into joint loss" is also nothing special. In fact, it is "good" that loss depends on both branches. It means that when you ask TensorFlow to compute loss, it will have to do the forward pass through both branches, which is what you want.
One thing I noticed is that your code for loss is different than the image. Your code appears to do this https://ibb.co/kbEH95

Related

Harmonizing regression and classification losses

I'm investigating the task of training a neural network to predict one future value given a sinusoidal input. So for example, as seen in the Figure, the input signal is x and the expected output signal y. The model's output is y^. Doing the regression task is fairly straightforward, and there are a lot of choices for this problem. I'm using a simple recurrent neural network with mean-squared error (MSE) loss between y and y^.
Additionally, suppose I know that the sinusoid is made up of N modalities, e.g., at some points, the wave oscillates at 5 Hz, then 10 Hz, then back to 5 Hz, then up to 15 Hz maybe—i.e., N=3.
In this case, I have ground-truth class labels in a vector k and the model does both regression and classification, additionally outputting a vector k^. An example is shown in the Figure. As this is a multi-class problem with exclusivity, I figured binary cross entropy (BCE) loss should be relevant here.
I'm sure there is a lot of research about combining loss functions, but does just adding MSE and BCE make sense? Scaling one up or down by a factor of 10 doesn't seem to change the learning outcome too much. So I was wondering what is considered the standard approach to problems where there is a joint classification and regression objective.
Additionally, on top of just BCE, I want to penalize k^ for quickly jumping around between classes; for example, if the model guesses one class, I'd like it to stay in that one class and switch only when it's necessary. See how in the Figure, there are fast dark blue blips in k^. I would like the same solid bands as seen in k, and naive BCE loss doesn't account for that.
Appreciate any and all advice!

Bayesian models for CNN using Pyro & Pytorch

I am applying a Bayesian model for a CNN that has many layers (more than 3), using Stochastic Variational Inference in Pyro Package.
However after defining the NN, Model and Guide functions and running the training loop I found that the loss stops decreasing on loss ~8000 (which is extremely high). I tried different learning rates and different optimization functions but non of them reaches a loss lower than 8000.
At last I changed the autoguide function (I tried AutoNormal, AutoGaussian, AutoBeta) they all stopped dicreasing the loss at the same point.
enter image description here
The last thing I did is trying the AutoMultivatiateNormal and for this it reached negative values (it reached -1) but when I looked at the weight matrices I found that they are all turned into scalers!!
The following graph represent the loss pattern for the AutoMultivatiateNormal
enter image description here
Do anyone knows how to solve such problem?? and why is this happening??

Trade off between losses?

I have been working on a super-resolution task. I have this question about determining loss function, So in the case of the task at hand I felt like going with SSIM as a loss function to train my model. I did get a good set of results. Recently I come across perceptual loss function where we compare how a pretrained model looks at the Ground truth(GT) Images and the Super Resolution(SR) Image(Image generated by the model). My question is, I am thinking of using both ((1-SSIM(SR,GT))+Perceptual loss(SR,GT)) loss for backpropagation, so should I use a trade-off parameter between these two losses? if so how can I set up these trade-off parameters? or should I add these losses with equal weights.
PS: the perceptual loss is calculated by finding SSIMs between the feature maps of GT and SR images from the pre-trained model

How to implement soft-argmax in caffe?

In Caffe deep-learning framework there is an argmax layer which is not differentiable and hence can not be used for end to end training of a CNN.
Can anyone tell me how I could implement the soft version of argmax which is soft-argmax?
I want to regress coordinates from heatmap and then use those coordinates in loss calculations. I am very new to this framework therefore no idea how to do this. any help will be much appreciated.
I don't get exactly what you want, but there are following options:
Use L2 loss to train regression task (EuclideanLoss). Or SmoothL1Loss (from SSD Caffe by Wei Lui), or L1 (don't know were you get it).
Use softmax with cross-entropy loss (SoftmaxWithLoss) to train classification task with classes corresponding to the possible values of x or y coordinate. For example, one loss layer for x, and one for y. SoftmaxWithLoss accepts label as a numeric value, and casts it to int with static_cast(). But take into account that implementation doesn't check that the casted value is within 0..(num_classes-1) range, so you have to be careful.
If you want something more unusual, you'll have to write you own layer in C++, C++/CUDA or Python+NumPy. This is very often the case unless you are already using someone other's implementation.

How to perform multi labeling classification (for CNN)?

I am currently looking into multi-labeling classification and I have some questions (and I couldn't find clear answers).
For the sake of clarity let's take an example : I want to classify images of vehicles (car, bus, truck, ...) and their make (Audi, Volkswagen, Ferrari, ...).
So I thought about training two independant CNN (one for the "type" classification and one fore the "make" classifiaction) but I thought it might be possible to train only one CNN on all the classes.
I read that people tend to use sigmoid function instead of softmax to do that. I understand that sigmoid does not sum up to 1 like softmax does but I dont understand in what doing that enables to do multi-labeling classification ?
My second question is : Is it possible to take into account that some classes are completly independant ?
Thridly, in term of performances (accuracy and time to give the classification for a new image), isn't training two independant better ?
Thank you for those who could give my some answers or some ideas :)
Softmax is a special output function; it forces the output vector to have a single large value. Now, training neural networks works by calculating an output vector, comparing that to a target vector, and back-propagating the error. There's no reason to restrict your target vector to a single large value, and for multi-labeling you'd use a 1.0 target for every label that applies. But in that case, using a softmax for the output layer will cause unintended differences between output and target, differences that are then back-propagated.
For the second part: you define the target vectors; you can encode any sort of dependency you like there.
Finally, no - a combined network performs better than the two halves would do independently. You'd only run two networks in parallel when there's a difference in network layout, e.g. a regular NN and CNN in parallel might be viable.