In random undersampling, how can I define the drop ratio? - imblearn

When I use the undersampling code below, it drops the majority class until it matches the size of the minority class (a 50% vs 50% split).
I want the majority class to end up at 70% when the minority class is at 30%.
How can I achieve this, and which parameter sets the weight between the majority and minority classes?
sampler = RandomUnderSampler(ratio={1: 1000, 0: 65})
X_rs, y_rs = sampler.fit_sample(X, y)
print('Random undersampling {}'.format(Counter(y_rs)))

Answer
So basically, RandomUnderSampler(sampling_strategy = X) uses a strategy in which the minority class ends up at X times the size of the majority class. Therefore, if you choose X = 1, you get a result similar to auto, which balances the two classes 50/50.
If you choose 0.9 instead, the minority class will be 90% of the size of the majority class.
Therefore, if you want your total set split 70%-30%, you will need to do some math (jokes).
With sampling_strategy = 0.5, because we are using ratios, the majority class is double the size of the minority class, so we get 1/3 vs 2/3, which is approximately what you are looking for.
TLDR
sampling_strategy takes the minority-to-majority ratio, and you want a 30/70 split. Therefore, you have to pass it 3/7 ≈ 0.429.
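As a side note, in recent versions of imbalanced-learn the `ratio` argument was renamed to `sampling_strategy` and `fit_sample` to `fit_resample`. The strategy itself can be sketched in plain Python for intuition (a toy re-implementation, not imblearn's actual code; the function name is made up):

```python
import random
from collections import Counter

def random_undersample(y, sampling_strategy, seed=0):
    """Drop majority-class samples so that, after resampling,
    n_minority / n_majority == sampling_strategy (imblearn's convention)."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    # how many majority samples to keep for the requested ratio
    keep_majority = int(round(counts[minority] / sampling_strategy))
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    kept = set(rng.sample(majority_idx, keep_majority))
    return [i for i, label in enumerate(y) if label == minority or i in kept]

y = [1] * 900 + [0] * 300           # 900 majority, 300 minority
idx = random_undersample(y, 3 / 7)  # aim for a 30/70 split
print(Counter(y[i] for i in idx))   # → Counter({1: 700, 0: 300})
```

With 900 majority and 300 minority samples, sampling_strategy = 3/7 keeps 700 majority samples, i.e. exactly the 70/30 split the asker wants.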

Related

How does contrastive loss work intuitively in a siamese network?

I am having trouble getting a clear concept of the contrastive loss used in siamese networks.
Here is the PyTorch formula:
torch.mean((1-label) * torch.pow(euclidean_distance, 2) +
(label) * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
where margin=2.
If we convert this to equation form, it can be written as
(1-Y)*d^2 + Y*max(m-d, 0)^2
Y=0, if both images are from same class
Y=1, if both images are from different class
What I think: if the images are from the same class, the distance between the embeddings should decrease, and if the images are from different classes, the distance should increase.
I am unable to map this concept to contrastive loss.
Say Y is 1 and the distance value is large: the first part becomes zero because of (1-Y), and the second part also becomes zero, because max picks whichever of m-d and 0 is bigger, which is 0.
So the loss is zero, which does not make sense to me.
Can you please help me understand this?
If the distance of a negative sample is greater than the specified margin, it should be already separable from a positive sample. Therefore, there is no benefit in pushing it farther away.
For details please check this blog post, where the concept of "Equilibrium" gets explained and why the Contrastive Loss makes reaching this point easier.
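To make the answer concrete, here is a scalar sketch of the loss above in plain Python (the same formula as the PyTorch snippet, evaluated for a single pair):

```python
def contrastive_loss(d, label, margin=2.0):
    # label = 0 (same class):      loss = d^2, pulls the pair together
    # label = 1 (different class): loss = max(margin - d, 0)^2, pushes the
    #   pair apart, but only while d < margin; beyond the margin the loss
    #   is zero on purpose, because the pair is already separable
    return (1 - label) * d ** 2 + label * max(margin - d, 0.0) ** 2

print(contrastive_loss(3.0, 1))  # → 0.0, negative pair beyond the margin: nothing to do
print(contrastive_loss(1.0, 1))  # → 1.0, negative pair inside the margin gets pushed apart
print(contrastive_loss(2.0, 0))  # → 4.0, positive pair still far apart gets pulled together
```

The first line is exactly the asker's case: Y = 1 with a large distance gives zero loss, and that is the intended behavior the answer describes.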

Understanding MaskRCNN Annotation feed

I'm currently working on an object detection project using Matterport MaskRCNN.
Part of the job is to detect a green leaf that crosses a white grid. Until now I have defined the annotations (polygons) in such a way that every single leaf which crosses the net (and gives a white-green-white pattern) is considered a valid annotation.
But when changing the definition above from single-cross annotation to multi-cross (more than one leaf crossing the net at once), I started to see a serious decrease in model performance during the testing phase.
This raised my question. The only difference between the two comes down to the size of the annotation. So:
Which of the following is more influential on learning during MaskRCNN's training - pattern or size?
If the pattern is the influential factor, that's better, because the goal is to identify a crossing. Conversely, if the size of the annotation is the influencer, then that's a problem, because I don't want the model to look for multi-cross (or, alternatively, large single-cross) regions in the image.
P.S. - References to recommended articles that explain the subject will be welcomed
Thanks in advance
If I understand correctly, the shape of the annotation becomes longer and more stretched out when going for multi-cross annotation.
In that case you can change the size and side ratio of the anchors that are scanning the image for objects. With default settings the model often has squarish bounding boxes. This means that very long and narrow annotations create bounding boxes with a great difference between width and height. These objects seem to be harder to segment and detect by the model.
These are the default configurations in the config.py file:
# Length of square anchor side in pixels
RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
# Ratios of anchors at each cell (width/height). A value of 1 represents
# a square anchor, and 0.5 is a wide anchor
RPN_ANCHOR_RATIOS = [0.5, 1, 2]
You can play around with these values in inference mode and see if that gives you better results.
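In Matterport's codebase these values are usually overridden by subclassing Config rather than editing config.py in place. A sketch (the subclass name, the stub base class, and the specific ratio values are mine; in real code you would import mrcnn.config.Config):

```python
class Config:
    # Stand-in for mrcnn.config.Config, carrying the two relevant defaults
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]

class LeafInferenceConfig(Config):
    # Add more extreme width/height ratios so long, narrow multi-cross
    # annotations get anchors that match their elongated bounding boxes
    RPN_ANCHOR_RATIOS = [0.25, 0.5, 1, 2, 4]

print(LeafInferenceConfig.RPN_ANCHOR_RATIOS)  # → [0.25, 0.5, 1, 2, 4]
```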

Is there any difference between user units and pixels?

I've been reading several articles about SVG that make a clear distinction between using and not using units (this last case even has a name of its own), e.g.
<!-- the viewport will be 800px by 600px -->
<svg width="800" height="600">
<!-- SVG content drawn onto the SVG canvas -->
</svg>
In SVG, values can be set with or without a unit identifier. A
unitless value is said to be specified in user space using user units.
If a value is specified in user units, then the value is assumed to be
equivalent to the same number of “px” units. This means that the
viewport in the above example will be rendered as a 800px by 600px
viewport.
You can also specify values using units. The supported length unit
identifiers in SVG are: em, ex, px, pt, pc, cm, mm, in, and
percentages.
source
Is there any actual difference between omitting the unit and setting it to px?
Can I just set e.g. mm everywhere to avoid ambiguity, or will I eventually get different results?
<svg width="800mm" height="600mm">
Disclaimer: what follows is pure guessing (I only learnt the basics of SVG last week) but I'm sharing it because I believe it could help others with my same doubts and I hope it doesn't contain serious errors.
The SVG canvas is basically a mental concept—a infinite plane where you use Cartesian coordinates to place stuff and move around. It isn't too different from stroking shapes in a sheet of graph paper where you've drawn a cross to identify an arbitrary point as coordinate origin, except that notebooks are not infinite. In the same way that tou draw a 3-square radius circle in the sheet and you don't care that those squares represent 12 mm, you draw shapes in your SVG canvas using unitless dimensions because it doesn't really matter what exact physical size they represent. The SVG spec uses the term "user units" to express this idea.
Using actual units only makes sense in two situations:
When our virtual user units need to interact with the real world, e.g. when the canvas is to be displayed on a computer monitor or printed.
When we want an element in our graphic to be defined in such a way that it doesn't scale, neither up nor down, e.g. a stroke around a letter that needs to look identical no matter how we resize the logo it belongs to.
It's in these situations, more specifically #1, that the px equivalence comes in handy. When we need to render the graphic or make calculations that involve actual units, unitless dimensions are interpreted as pixels. We can think of it as a default, because we can render the canvas at any size and, in any case, pixels are no longer physical pixels in these days of high-res displays and built-in zoom.
And, for all this, it's probably better to just omit units in your SVG code. Adding them on a general basis only makes the code unnecessarily verbose.
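To put numbers on why 800 and 800mm are not interchangeable: CSS (which governs how SVG lengths are resolved) fixes 1in = 96px and, physically, 1in = 25.4mm. A quick sketch of the conversion:

```python
MM_PER_INCH = 25.4   # physical definition of the inch
PX_PER_INCH = 96.0   # CSS reference pixel density

def px_to_mm(px):
    return px * MM_PER_INCH / PX_PER_INCH

def mm_to_px(mm):
    return mm * PX_PER_INCH / MM_PER_INCH

# width="800" (user units) renders as 800 px,
# while width="800mm" renders as roughly 3024 px:
print(round(mm_to_px(800)))  # → 3024
```

So writing mm everywhere would not just "avoid ambiguity": it would make everything almost four times larger.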

ActionScript 3: what is the difference between scale and dimension?

I'd like to know: if I change the scale value, what happens to the object? I'm using the Flash AIR system.
I drew a movie clip box with 1000 x 1000 px dimensions.
I can change the size in two ways:
1st: control the dimensions with mc.width or mc.height
2nd: control the dimensions with mc.scaleX or mc.scaleY
If I want to change the object to 500 x 500 px,
which would you prefer: mc.width = mc.height = 500 vs mc.scaleX = mc.scaleY = 0.5?
What is the benefit of using scale method?
Some good reading in the documentation here.
Basically they do exactly the same thing; it just depends on the case which one is easier for the developer to define. If you know you need the result to be 212 pixels wide, or the same width as object1, it makes sense to say
object2.width = 212;
or
object2.width = object1.width;
Let's assume you prefer to keep object2's dimensions proportional. You could then say
object2.scaleY = object2.scaleX;
without even knowing how many pixels that is or having another object of that same height to set it to.
The final note is this: if you change the scale, the dimensions change, and when you change a dimension, the scale changes too. In other words, setting scaleX back to 1 will also restore the original width. Use them interchangeably; use whichever is simpler for you in that instance.
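The coupling described above can be modeled in a few lines of Python (a toy model of the width/scaleX relationship, not Flash's actual implementation; the class name is made up):

```python
class DisplayObjectModel:
    def __init__(self, natural_width):
        self._natural_width = float(natural_width)  # width when scaleX == 1
        self.scaleX = 1.0

    @property
    def width(self):
        # width is derived from the same state that backs scaleX ...
        return self._natural_width * self.scaleX

    @width.setter
    def width(self, pixels):
        # ... so setting width just updates scaleX, and vice versa
        self.scaleX = pixels / self._natural_width

mc = DisplayObjectModel(1000)
mc.width = 500          # equivalent to mc.scaleX = 0.5
print(mc.scaleX)        # → 0.5
mc.scaleX = 1.0
print(mc.width)         # → 1000.0
```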

CSS Additive Sizing

I am playing around with designing my own grid system. I decided to go with something that splits columns by percentage, i.e. 10%, 20%, 30%, etc. So I have Col-10 for a column that is 10% wide.
However, instead of doing a lot of coding, I want instead to use some sort of additive method. Think of it like money.
You have a 100 note, a 50 note, a 20 note, a 10 note, a 5 note, and a bunch of small coins, usually of the same denominations: a 100p coin, 50p coin, 20p coin, 10p coin, 5p coin, and of course the all-important 2p and 1p.
There is no 77 note. That would be made up of a 50, a 20 and a 5 note, plus 2 x 100p coins.
I want to do likewise with CSS. Instead of defining and using a specific width, let's say I want a column of width 77%. I would like to be able to use a class list like this:
<div class="Col-50 Col-20 Col-5 Col-2">Content</div>
and in my CSS I would have these classes defined according to their respective percentages.
My problem is: the last class here would be all that is applied, giving me a 2% column instead of the 77% column I intended.
Is there any magic CSS trick that will allow me to do some sort of additive percentage like what I am thinking, or is JavaScript the only option?
I could do this in JavaScript, but I want to avoid using JavaScript/jQuery or other code apart from HTML5/CSS3 at all costs, because I want to remove external dependencies. While rare, it is still possible to disable JavaScript in browsers, and I want my system to work without it if possible.
I also know I could use SCSS/LESS etc., but ultimately the end result would be a very large CSS file filled with almost every percentage between 0 and 100. That is not my goal.
I don't believe this is possible with CSS alone. CSS is a styling language, so (with the exception of calc(): https://developer.mozilla.org/en-US/docs/Web/CSS/calc) it doesn't do math calculations, and it really isn't designed to compound values in that manner. Even a preprocessor like SASS/Less, I don't believe, would be able to accomplish that, since the preprocessing is on the CSS side, not the HTML side. Perhaps an HTML preprocessor?
Either way, I'm not sure I follow the benefit of the classes; adding 4 classes just to specify a width seems superfluous.
Maybe if they come out with "Compounding Style Sheets"? :)
No, it isn't possible; at some point you would have to have 100 classes for the widths. If you are set on staying CSS-only, I would write a CSS generator that loops through, creates the redundant code, and saves it to a file for you. Then you could go in and add to the file as needed.
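The generator suggested above is only a few lines. A sketch in Python (the function name is made up; write the returned string out as your stylesheet):

```python
def generate_grid_css(step=1):
    # Emit one .Col-N rule per percentage from step up to 100
    rules = [".Col-%d { width: %d%%; }" % (p, p) for p in range(step, 101, step)]
    return "\n".join(rules)

css = generate_grid_css()
print(css.splitlines()[76])  # → .Col-77 { width: 77%; }
```

With step=10 you would get only the Col-10 … Col-100 classes the asker started from; step=1 covers every percentage, which is exactly the "very large CSS file" trade-off mentioned in the question.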