Interpreting output of a multilabel classification task - deep-learning

I have a multilabel classifier network which has been trained to predict the affinity of a user towards a fruit. The core assumption is that a person can like more than one fruit, hence multiple fruits can be tagged against a person. Therefore the final layer of this network is a sigmoid layer, which scales the activation of each output node between 0 and 1, independently of the activations of the other output nodes. So for a given person with some input set of features, the network predicts 3 probability values, one for that user liking each of Apple, Orange and Banana, as shown in the table below.
There are two questions I want to answer:
Who are the top K users with the highest affinity for Apples? For example, users 3 and 4 show the highest affinity towards Apples in the table below.
Which fruit is user X's favourite fruit among Apple, Orange and Banana? For example, for user 2: "Banana > Apple > Orange".
We can definitely use this multilabel classifier to answer question 1. I would like to know whether (and why) we should use the output to answer question 2. Basically, for a given user, are the probability values for different fruits comparable?
Person | Apple | Orange | Banana
-------|-------|--------|-------
1      | 0.1   | 0.2    | 0.7
2      | 0.2   | 0.1    | 0.5
3      | 0.3   | 0.5    | 0.1
4      | 0.4   | 0.1    | 0.2
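For concreteness, here is a minimal sketch of such a sigmoid multilabel head and of answering question 1 by ranking a fruit's column across users. PyTorch is an assumption here, and the network name, layer sizes and feature count are made up for illustration:

    import torch
    import torch.nn as nn

    # Hypothetical multilabel network: one independent sigmoid per fruit.
    class FruitAffinityNet(nn.Module):
        def __init__(self, n_features, n_fruits=3):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
            self.head = nn.Linear(32, n_fruits)  # one logit per fruit

        def forward(self, x):
            # Sigmoid squashes each logit independently: scores are not
            # normalized across fruits, unlike a softmax.
            return torch.sigmoid(self.head(self.body(x)))

    model = FruitAffinityNet(n_features=8)
    scores = model(torch.randn(4, 8))     # rows = users, columns = fruits

    # Question 1: top-K users for Apple (column 0). This compares the same
    # sigmoid output across users, a like-for-like comparison.
    top_k_users = torch.topk(scores[:, 0], k=2).indices

Question 2 would instead compare different sigmoids within one row, which is exactly the comparability issue being asked about.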


Medical Image Segmentation / Image Segmentation

Total dataset: 100 cases
Training: 76 cases (18,000 slices)
Validation: 19 cases (4,000 slices)
Test: 5 cases (2,000 slices)
I have a dataset that consists of approx. 18,000 images, out of which approx. 15,000 images are of normal patients and around 3,000 images are of patients having some disease. For these 18,000 images, I also have their segmentation masks. So, 15,000 segmentation masks are empty, and 3,000 have patches.
Should I also feed the empty masks to my model (deep learning, i.e., a U-Net with a ResNet34 backbone), along with the masks containing patches?
EDIT: The patches are very small. Currently, I'm getting 57% IoU on validation. When I tested on the test cases, I got over-segmentation.

Convergence failure while training GAN for 128x128 images

Thanks for looking at this question!
I attempted to train a simple DCGAN to generate room designs from a dataset of 215 coloured images of size 128x128. My attempt can be summarised as below (a code sketch of this setup follows the list):
Generator: 5 deconvolution layers from (100x1) noise input to (128x128x1) grayscale image output
Discriminator: 4 convolution layers from (128x128x1) grayscale image input
Optimizer: Adam at a learning rate of 0.002 for both Generator and Discriminator
Batch size: 21 images/batch
Epochs: 100, with 10 batches/epoch
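For reference, a minimal sketch of the setup described above, assuming PyTorch is in use; the channel counts and kernel sizes are illustrative, chosen only so that 5 deconvolutions map (100x1) noise to a 128x128x1 image and 4 convolutions map the image back down to a single score:

    import torch.nn as nn

    # Generator: (100, 1, 1) noise -> (1, 128, 128) grayscale image.
    generator = nn.Sequential(
        nn.ConvTranspose2d(100, 512, 8, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),  # 8x8
        nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),  # 16x16
        nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),  # 32x32
        nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),    # 64x64
        nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),                              # 128x128
    )

    # Discriminator: (1, 128, 128) image -> single real/fake probability.
    discriminator = nn.Sequential(
        nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),     # 64x64
        nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),   # 32x32
        nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),  # 16x16
        nn.Conv2d(256, 1, 16, 1, 0), nn.Sigmoid(),                      # 1x1
    )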
Results:
1. Initially, D-loss was close to 0 and G-loss close to 1. After that, I cut my discriminator down by 2 convolution layers and reduced the Adam learning rate to 0.00002, hoping that the discriminator wouldn't overpower my generator.
2. After (1), D-loss and G-loss hover around 0.5 - 1.0. However, the generated images still look like noise even after 100 epochs.
Questions:
Is there something wrong in terms of how I trained my GAN?
How should I modify my approach to successfully train the GAN?
Thank you so much everyone for your help; really looking forward to your suggestions!

How can we define an RNN/LSTM neural network with multiple outputs for the input at time "t"?

I am trying to construct an RNN to predict the probability of a player playing the match, along with the runs scored and wickets taken by the player. I would use an LSTM so that performance in the current match would influence the player's future selection.
Architecture summary:
Input features: Match details - Venue, teams involved, team batting first
Input samples: Player roster of both teams.
Output:
Discrete (binary): Did the player play?
Discrete: Wickets taken.
Continuous: Runs scored.
Continuous: Balls bowled.
Question:
Most often an RNN uses "softmax" or "MSE" in the final layer to process "a" from the LSTM, providing only a single variable "Y" as output. But here there are four dependent variables (2 discrete and 2 continuous). Is it possible to stitch together all four as output variables?
If yes, how do we handle mix of continuous and discrete outputs with loss function?
(Though the output "a" from the LSTM has multiple features and carries the information to the next time step, we need multiple features at the output for training against the ground truth.)
You just do it. Without more detail on the software (if any) in use, it is hard to give more detail.
The output of the LSTM unit at every time step is one of the hidden layers of your network.
You can then feed it into 4 output layers:
1. sigmoid
2. I'd mess around with this a bit. Maybe 4x sigmoid (4 wickets to an innings, right?), or ReLU
3, 4. linear (squaring is also an option, or ReLU)
For training purposes, your loss function is the sum of your 4 individual losses.
If they were all MSE, you could concatenate your 4 outputs before calculating the loss.
But since the first is cross-entropy (for a decision sigmoid), you'd calculate the losses separately and sum them.
You can still concatenate the outputs afterwards to have a single output vector.
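A minimal sketch of this multi-head idea, assuming PyTorch (the layer sizes, the 5-class wicket head and the dummy targets are illustrative only):

    import torch
    import torch.nn as nn

    class PlayerModel(nn.Module):
        def __init__(self, n_features, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.played = nn.Linear(hidden, 1)   # binary head (logit)
            self.wickets = nn.Linear(hidden, 5)  # discrete head, classes 0..4
            self.runs = nn.Linear(hidden, 1)     # continuous head
            self.balls = nn.Linear(hidden, 1)    # continuous head

        def forward(self, x):
            a, _ = self.lstm(x)                  # hidden state "a" at every step
            h = a[:, -1]                         # take the last time step
            return self.played(h), self.wickets(h), self.runs(h), self.balls(h)

    model = PlayerModel(n_features=10)
    p, w, r, b = model(torch.randn(8, 5, 10))    # (batch, time, features)

    # Total loss = sum of per-head losses: cross-entropy for the discrete
    # outputs, MSE for the continuous ones (targets here are dummies).
    loss = (nn.functional.binary_cross_entropy_with_logits(p, torch.rand(8, 1).round())
            + nn.functional.cross_entropy(w, torch.randint(0, 5, (8,)))
            + nn.functional.mse_loss(r, torch.randn(8, 1))
            + nn.functional.mse_loss(b, torch.randn(8, 1)))
    loss.backward()

If one loss term dominates the gradient, you may also want to weight the individual terms before summing.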

Can LSTM train for regression with different numbers of feature in each sample?

In my problem, each training and testing sample has a different number of features. For example, the training samples are as follows:
There are four features in sample 1: x1, x2, x3, x4, with target y1
There are two features in sample 2: x6, x3, with target y2
There are three features in sample 3: x8, x1, x5, with target y3
Here x denotes a feature and y the target.
Can these samples be used to train an LSTM for regression and to make predictions?
Consider the following scenario: you have a (way too small) dataset of 6 sample sequences of lengths {1, 2, 3, 4, 5, 6} and you want to train your LSTM (or, more generally, an RNN) with minibatches of size 3 (you feed 3 sequences at a time at every training step); that is, you have 2 batches per epoch.
Let's say that, due to randomization, the batch on step 1 ended up being constructed from the sequences of lengths {2, 1, 5}:
batch 1
----------
2 | xx
1 | x
5 | xxxxx
and, the next batch of sequences of length {6, 3, 4}:
batch 2
----------
6 | xxxxxx
3 | xxx
4 | xxxx
What people typically do is pad the sample sequences up to the longest sequence in the minibatch (not necessarily to the length of the longest sequence overall) and concatenate the sequences together, one on top of another, to get a nice matrix that can be fed into the RNN. Let's say your features consist of real numbers and it is not unreasonable to pad with zeros:
batch 1
----------
2 | xx000
1 | x0000
5 | xxxxx
(batch * length = 3 * 5)
batch 2
----------
6 | xxxxxx
3 | xxx000
4 | xxxx00
(batch * length = 3 * 6)
This way, for the first batch your RNN will only run up to the necessary number of steps (5) to save some compute. For the second batch, it will have to go up to the longest one (6).
The padding value is chosen arbitrarily. It usually should not influence anything, unless you have bugs. Trying some bogus values, like Inf or NaN may help you during debugging and verification.
Importantly, when using padding like that, there are some other things to do for the model to work correctly. If you are using backpropagation, you should exclude the results of the padding from both the output computation and the gradient computation (deep learning frameworks will do that for you). If you are training a supervised model, the labels should typically also be padded, and the padding should not be considered for the loss calculation. For example, say you calculate cross-entropy for the entire batch (with padding). In order to obtain a correct loss, the bogus cross-entropy values that correspond to padding should be masked out with zeros, then each sequence should be summed independently and divided by its real length. That is, averaging should be performed without taking padding into account (in my example this is guaranteed due to the neutrality of zero with respect to addition). The same rule applies to regression losses and to metrics such as accuracy, MAE, etc. (that is, if you average together with padding, your metrics will also be wrong).
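As a concrete illustration of that masking, here is a sketch assuming PyTorch and per-time-step binary labels (the tensors are dummies matching batch 1 above, lengths {2, 1, 5}):

    import torch
    import torch.nn.functional as F
    from torch.nn.utils.rnn import pad_sequence

    # Three sequences of lengths {2, 1, 5}, each step carrying 3 features.
    seqs = [torch.randn(2, 3), torch.randn(1, 3), torch.randn(5, 3)]
    lengths = torch.tensor([s.shape[0] for s in seqs])
    padded = pad_sequence(seqs, batch_first=True)   # (3, 5, 3), zero-padded

    logits = padded.sum(-1)                # stand-in for per-step model outputs
    labels = torch.zeros_like(logits)      # padded per-step labels (dummy)

    # Mask is 1 for real time steps and 0 for padding.
    mask = (torch.arange(padded.shape[1])[None, :] < lengths[:, None]).float()

    per_step = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    per_seq = (per_step * mask).sum(1) / lengths    # average without padding
    loss = per_seq.mean()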
To save even more compute, people sometimes construct batches such that the sequences within a batch have roughly the same length (or even exactly the same, if the dataset allows it). This may introduce some undesired effects, though, as long and short sequences are then never in the same batch.
To conclude, padding is a powerful tool, and if you are attentive, it allows you to run RNNs very efficiently with batching and dynamic sequence lengths.
Yes. The input_size for your LSTM layer should be the maximum among all input sizes, and you fill the spare cells with zeros:
max(input_size) = 5
input array = [x1, x2, x3]
And you transform it this way:
[x1, x2, x3] -> [x1, x2, x3, 0, 0]
This approach is rather common and does not show any big negative influence on prediction accuracy.
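In code, that transform could look like this minimal NumPy sketch (the helper name and sizes are illustrative):

    import numpy as np

    MAX_INPUT_SIZE = 5   # maximal input_size over all samples

    def pad_features(x, size=MAX_INPUT_SIZE):
        # Copy the real features and leave the spare cells as zeros.
        out = np.zeros(size)
        out[:len(x)] = x
        return out

    pad_features([0.3, 1.2, 0.7])   # -> array([0.3, 1.2, 0.7, 0. , 0. ])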

Neural Network outputs

I need to develop a neural network to classify inputs into 3 categories. One of the categories is "don't know".
Should I train the network using a single-output perceptron which categorises the training examples as 1, 2, or 3? Or should I use a 2-output perceptron and a binary scheme (01, 10, 00/11) to classify the inputs?
You should use 3 output neurons (one for each class). In the training phase, set the output of the neuron representing the correct class to 1 and all others to 0. A single output with 1, 2 and 3 is not optimal, because it contains the implicit assumption that classes 2 and 3 are somehow "closer" to each other than 1 and 3 are. Two outputs with binary coding are also not good, because in addition to solving the classification problem your NN will have to learn the binary encoding.
Also, it's probably best to use a softmax activation on the output layer with a cross-entropy error function. Softmax will normalize the output, so the value at each neuron can be interpreted as a class probability.
Note that the "don't know" class is only useful if you have training examples labeled as "don't know". Otherwise, use two output neurons.