We are trying to solve a binary classification problem where the two classes were kept roughly balanced (50-50) during training, but at inference time the label distribution will not be balanced. Specifically, we expect one of the classes (the positive class, label 1) to be a severe minority compared to the other (the negative class, label 0).
The table below shows how the precision of the minority class falls as its proportion falls (the minority class being the positive class, label 1).
S.No. | Label 1 | Label 0 | Percentage of Label 1 | Precision_1 | Recall_1
------|---------|---------|-----------------------|-------------|---------
1     | 50      | 50      | 50                    | 0.9574      | 0.9
2     | 50      | 500     | 9                     | 0.5625      | 0.9
3     | 50      | 1000    | 4.7                   | 0.3982      | 0.9
4     | 50      | 2000    | 2.4                   | 0.2647      | 0.9
5     | 50      | 5000    | 0.9                   | 0.1278      | 0.9
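This trend follows directly from Bayes' rule: with recall (TPR) and false-positive rate (FPR) roughly fixed, precision = TPR*p / (TPR*p + FPR*(1-p)), where p is the minority-class prevalence. A minimal sketch of the relationship; the FPR of 0.04 below is an assumption back-solved from row 1 of the table (TP = 45, so FP = 2 out of 50 negatives), and the later rows deviate from it because the empirical FPR evidently drifts between runs:

    # Precision as a function of minority-class prevalence, holding the
    # classifier's recall (TPR) and false-positive rate (FPR) fixed.
    def precision_at_prevalence(prevalence, tpr=0.9, fpr=0.04):
        tp = tpr * prevalence          # expected true positives per example
        fp = fpr * (1.0 - prevalence)  # expected false positives per example
        return tp / (tp + fp)

    for p in [0.50, 0.09, 0.047, 0.024, 0.009]:
        print(f"prevalence {p:.3f} -> precision_1 {precision_at_prevalence(p):.3f}")

With recall and FPR fixed, precision is fully determined by the prevalence, so the drop in the table is expected rather than a training artifact.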
We will be doing inference on batches of such data, where the minority class can range from negligible (~0.01%) to much higher (~50-60%).
While class rebalancing (giving more weight to one of the classes during inference) could help if the percentage of the minority class were fixed, that is not the case here, since the percentage of the minority class varies from batch to batch.
Has anyone faced the same problem, and are there any suggestions for how to counter it? Please let us know.
I've run into an odd issue when embedding plaintext files in HTML here. These plaintext files vary in their number of lines, and I used to determine the optimal height of the field with a simple multiplication, using a ratio of 22.
It turns out that the larger the number of lines, the less well this works. I've put together the table below to describe how the ratio and the slope change, based on four data points (the optimal height being the one that doesn't generate a scrollbar):
Lines | Optimal height (px) | Ratio  | Slope
------|---------------------|--------|------
3     | 66                  | 22     | N/A
9     | 186                 | 20.66  | 20.00
23    | 366                 | 15.913 | 12.87
33    | 516                 | 15.636 | 15.00
You can also see the odd graph here. Currently, I use this equation to calculate the embed heights. This won't work well for all numbers of lines, however.
I don't understand why:
1. this isn't a linear fit, considering the font is monospaced, and
2. the slope changes with each data point.
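For what it's worth, a quick least-squares fit over the four (lines, height) pairs above (a sketch, assuming the table's values) shows the data is close to affine, i.e. height ≈ a*lines + b with a nonzero offset b:

    import numpy as np

    # (lines, optimal height) pairs from the table above
    lines = np.array([3, 9, 23, 33])
    height = np.array([66, 186, 366, 516])

    # Fit height = a * lines + b; a ~ per-line height, b ~ fixed chrome/padding
    a, b = np.polyfit(lines, height, 1)
    print(f"height ~= {a:.2f} * lines + {b:.2f}")
    print("residuals:", height - (a * lines + b))

Note that a constant offset b is exactly what makes the per-line ratio fall with line count: height/lines = a + b/lines, which shrinks toward a as the number of lines grows, so a single fixed ratio can never fit both short and long files.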
PNG files may contain optional chunks of information. One of these optional information blocks is the physical resolution of the image (chunk signature pHYs). [1] [2] It contains separate values for horizontal and vertical resolution as pixels per unit, and a unit specifier, which can be 0 for unit unspecified or 1 for metre ← that's quite confusing, because resolutions are traditionally expressed in DPI.
An inch is defined as exactly 25.4 mm.
So, if I calculate this correctly, 96 DPI means 3779.527559... dots per metre. For the pHYs chunk, this has to be rounded. I'd say 3780 is the right value, but I have also found 3779 suggested on the web. Images of both kinds also coexist on my machine.
The difference may not be important in most cases,
3779 * 0.0254 = 95.9866
3780 * 0.0254 = 96.012
but I try to avoid tricky layout problems when mixing images of both kinds in processes that are DPI-aware, like creating PDF files with LaTeX.
[1] Portable Network Graphics (PNG) Specification (Second Edition), section 11.3.5.3, pHYs Physical pixel dimensions
[2] PNG Specification: Chunk Specifications, section 4.2.4.2, pHYs Physical pixel dimensions
The relative difference is less than 0.03% (2.65/10000); it's hardly relevant.
Anyway, I'd go with 3780. Not only is it the nearest value, it would also give back the correct DPI if some (sloppy) converter rounds down (instead of rounding to nearest).
Also, if you google "72.009 DPI PNG" you'll see a similar (non-)issue with 72 DPI (example), and it seems that most people rounded that value up too (which is also the nearest): 2834.645 -> 2835.
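A minimal sketch of the round trip (plain arithmetic, no PNG library assumed) showing why 3780 is the safer choice:

    import math

    METRES_PER_INCH = 0.0254

    def dpi_to_ppm(dpi):
        # DPI -> pixels per metre; the pHYs chunk stores an integer
        return round(dpi / METRES_PER_INCH)

    def ppm_to_dpi(ppm):
        return ppm * METRES_PER_INCH

    print(dpi_to_ppm(96))                # 3780 (3779.5276 rounded to nearest)
    print(ppm_to_dpi(3780))              # 96.012  -> rounds to 96
    print(ppm_to_dpi(3779))              # 95.9866 -> also rounds to 96
    print(math.floor(ppm_to_dpi(3780)))  # 96 -- survives a sloppy floor
    print(math.floor(ppm_to_dpi(3779)))  # 95 -- a sloppy floor gets it wrong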
I would be thankful if you could answer my question. I am worried I am doing something wrong, because my network always gives a black image without any segmentation.
I am doing semantic segmentation in Caffe. The output of the score layer is (1, 5, 256, 256), i.e. (batch_size, num_classes, height, width). This is sent to a SoftmaxWithLoss layer, and the other input of the loss layer is the ground-truth image with 5 class labels, of shape (1, 1, 256, 256).
My question is: the dimensions of these two inputs of the loss layer do not match. Should I create 5 label images for the 5 classes and send a batch of 5 through the label layer into the loss layer?
How should I prepare label data for semantic segmentation?
Regards
Your dimensions are okay. You are outputting a 5-vector per pixel indicating the probability of each class. The ground truth is a single label (an integer in 0..4) per pixel, and the loss encourages the probability of the correct label to be the maximal one for that pixel.
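In other words, SoftmaxWithLoss pairs per-pixel class scores with an integer label map. A minimal NumPy sketch of the computation, using the question's shapes (an illustration of per-pixel softmax cross-entropy, not Caffe's actual implementation):

    import numpy as np

    scores = np.random.randn(1, 5, 256, 256)                  # (batch, num_classes, H, W)
    labels = np.random.randint(0, 5, size=(1, 1, 256, 256))   # one integer label per pixel

    # Softmax over the class axis, per pixel
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)

    # For every pixel, pick the probability assigned to its true class;
    # no one-hot / per-class label images are needed.
    h, w = labels.shape[2:]
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    p_true = probs[0, labels[0, 0], rows, cols]               # shape (256, 256)
    loss = -np.log(p_true).mean()
    print("loss:", loss)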
I need help. Trying to understand how the math of a deconv layer works. Let's talk about this layer:
layer {
  name: "decon"
  type: "Deconvolution"
  bottom: "conv2"
  top: "decon"
  convolution_param {
    num_output: 1
    kernel_size: 4
    stride: 2
    pad: 1
  }
}
So basically this layer is supposed to "upscale" an image by a factor of 2. If I look at the learned weights, I see e.g. this:
-0.0629104823 -0.1560362280 -0.1512266700 -0.0636162385
-0.0635886043 +0.2607241870 +0.2634004350 -0.0603787377
-0.0718072355 +0.3858278100 +0.3168329000 -0.0817491412
-0.0811873227 -0.0312164668 -0.0321144797 -0.0388795212
So far, so good. Now I'm trying to understand how to apply these weights to actually achieve the upscaling effect. I need to do this in my own code because I want to use simple pixel shaders.
Looking at the Caffe code, "DeconvolutionLayer::Forward_cpu" internally calls "backward_cpu_gemm", which does "gemm", followed by "col2im". My understanding of how all this works is this: gemm takes the input image, and multiplies each pixel with each of the 16 weights listed above. So basically gemm produces 16 output "images". Then col2im sums up these 16 "images" to produce the final output image. But due to the stride of 2, it stretches the 16 gemm images over the output image in such a way that each output pixel is only comprised of 4 gemm pixels. Does that sound correct to you so far?
My understanding is that each output pixel is calculated from the nearest 4 low-res pixels, using 4 weights from the 4x4 deconv weight matrix. If you look at the following image:
https://i.stack.imgur.com/X6iXE.png
Each output pixel uses either the yellow, pink, grey or white weights, but not the other weights. Do I understand that correctly? If so, I have a huge understanding problem, because in order for this whole concept to work correctly, e.g. the yellow weights should add up to the same sum as the pink weights etc. But they do not! As a result my pixel shader produces images where 1 out of 4 pixels is darker than the others, or every other line is darker, or things like that (depending on which trained model I'm using). Obviously, when running the model through Caffe, no such artifacts occur. So I must have a misunderstanding somewhere. But I can't find it... :-(
P.S.: Just to complete the information: there's a conv layer in front of the deconv layer with num_output of e.g. 64. So the deconv layer actually has e.g. 64 4x4 weight matrices, plus one bias, of course.
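For reference, a minimal NumPy sketch of the scatter ("gemm then col2im") view described above, for a single channel with kernel 4, stride 2, pad 1 (a simplified illustration, not Caffe's actual code); running it on a constant input makes the four weight groups directly visible:

    import numpy as np

    def deconv2d(x, w, stride=2, pad=1):
        # Transposed convolution, scatter view: every input pixel stamps the
        # whole kernel, scaled by its value, onto a stride-spaced output grid;
        # overlaps are summed, then the padding border is cropped.
        h_in, w_in = x.shape
        k = w.shape[0]
        out = np.zeros((stride * (h_in - 1) + k, stride * (w_in - 1) + k))
        for i in range(h_in):
            for j in range(w_in):
                out[i * stride : i * stride + k,
                    j * stride : j * stride + k] += x[i, j] * w
        return out[pad : out.shape[0] - pad, pad : out.shape[1] - pad]

    w = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for learned weights
    y = deconv2d(np.ones((4, 4)), w)              # constant input -> 8x8 output
    print(y)  # interior values repeat in a 2x2 pattern: the four weight groups

With a constant input, each interior output pixel receives the sum of one of the four parity groups of the kernel; unless those sums happen to be equal, the response to a flat input is itself a 2x2 checkerboard.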
After a lot of debugging I found that my understanding of the deconv layer was perfectly alright. I fixed all the artifacts by simply dividing the bias floats by 255.0. That's necessary because pixel shaders run in the 0-1 range, while the Caffe bias constants seem to be targeted at 0-255 pixel values.
Everything is working great now.
I still don't understand why the 4 weight groups don't sum to the same value and how that can possibly work. But what do I know; it does work, after all. I suppose some things will always be a mystery to me.
As the title says: does anyone know how to draw a circuit diagram that checks whether a 4-bit number is odd or even?
You don't really need a circuit for this: bit 0 of the input determines whether the number is odd or even, so you can ignore bits 1-3 and just use bit 0 as the odd/even output (it will be 1 for odd, 0 for even). So the circuit, such as it is, would look like this:
INPUT OUTPUT
bit 3 o------------- N/C
bit 2 o------------- N/C
bit 1 o------------- N/C
bit 0 o------------------------------------o odd/even
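The same check in software, for comparison (a trivial sketch of the bit-0 test the circuit implements):

    def is_odd(n: int) -> bool:
        # Bit 0 alone decides parity; bits 1-3 are irrelevant.
        return (n & 1) == 1

    for n in range(16):  # all 4-bit values
        print(f"{n:04b} -> {'odd' if is_odd(n) else 'even'}")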