Why do negative frequencies of a discrete Fourier transform appear where high frequencies are supposed to be? - fft

I am very new to this topic, please help me understand why this happens.
If I take the DFT of a cosine wave,
cos(w*x) = 0.5*(exp(i*w*x) + exp(-i*w*x)),
I expect to get one peak at the frequency w, and the negative frequency -w should not be visible; but it appears at the far end of the spectrum, where higher frequencies are supposed to be.
Why? Do high frequencies produce the same effect as negative ones?
If you imagine a wave with frequency 3*pi/2, i.e. with phase samples (0, 3*pi/2, 6*pi/2, 9*pi/2), it looks exactly like a wave with negative frequency -pi/2 and phase samples (0, -pi/2, -pi, -3*pi/2), since corresponding samples differ by multiples of 2*pi.
Is that the reason for what is happening? Please help!

High frequencies between the Nyquist frequency (Fs/2) and the sample rate (Fs) alias to negative frequencies, because they lie above the Nyquist limit for that sample rate. And vice versa: negative frequencies alias to the top half of an FFT. "Alias" means you can't tell those frequencies apart at that sample rate, because sampling sine waves at either of those frequencies can produce absolutely identical sets of samples.
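A quick way to see this numerically (a minimal NumPy sketch; the length N=64 and bin k=5 are arbitrary choices):

import numpy as np

N = 64                    # number of samples
n = np.arange(N)
k = 5                     # cosine frequency, in DFT bins

x = np.cos(2 * np.pi * k * n / N)
X = np.abs(np.fft.fft(x))

# The two peaks land at bin k (the positive frequency) and bin N-k,
# which is where the "negative" frequency -k shows up:
print(np.sort(np.argsort(X)[-2:]))   # -> [ 5 59]

# A wave at bin N-k produces the *same* samples as one at -k (= +k for a cosine):
y = np.cos(2 * np.pi * (N - k) * n / N)
print(np.allclose(x, y))             # -> True: they alias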

How does contrastive loss work intuitively in a Siamese network?

I am having trouble getting a clear concept of the contrastive loss used in a Siamese network.
Here is the PyTorch formula:
torch.mean((1-label) * torch.pow(euclidean_distance, 2) +
(label) * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
where margin=2.
If we convert this to equation form, it can be written as
(1-Y)*D^2 + Y*max(m-D, 0)^2
Y=0 if both images are from the same class
Y=1 if the images are from different classes
The way I think of it: if the images are from the same class, the distance between the embeddings should decrease, and if the images are from different classes, the distance should increase.
I am unable to map this concept onto the contrastive loss.
Say Y is 1 and the distance value is large. The first part becomes zero because of the (1-Y) factor, and the second part also becomes zero, because max(m-D, 0) = 0 when D > m.
So the loss is zero, which does not make sense to me.
Can you please help me understand this?
If the distance for a negative pair is already greater than the specified margin, it is already separable from the positive pairs. Therefore, there is no benefit in pushing it farther away, and a loss of zero is exactly the intended behavior.
For details please check this blog post, where the concept of "Equilibrium" is explained, and why the contrastive loss makes reaching this point easier.
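As a sanity check, here is a minimal runnable version of that exact formula (a sketch; margin=2 follows the question, the embeddings are made up for illustration):

import torch
import torch.nn.functional as F

def contrastive_loss(emb1, emb2, label, margin=2.0):
    # label = 0: same class (pull together); label = 1: different class (push apart)
    d = F.pairwise_distance(emb1, emb2)
    same = (1 - label) * d.pow(2)                           # penalizes any distance
    diff = label * torch.clamp(margin - d, min=0.0).pow(2)  # penalizes only d < margin
    return torch.mean(same + diff)

# A negative pair already farther apart than the margin contributes zero loss,
# which is exactly the intended "don't push it farther" behavior:
a = torch.tensor([[0.0, 0.0]])
b = torch.tensor([[5.0, 0.0]])                       # distance 5 > margin 2
print(contrastive_loss(a, b, torch.tensor([1.0])))   # -> tensor(0.)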

What does a.sub_(lr*a.grad) actually do?

I am doing the fast.ai course, the SGD lesson, and I cannot understand this line.
It subtracts (learning rate * gradient) from the coefficients...
But why is it necessary to subtract?
here is the code:
def update():
    y_hat = x @ a
    loss = mse(y_hat, y)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)
Picture the loss function J plotted as a function of the parameter W. This is a simplified representation, with W being the only parameter; for a convex loss function, the curve is a U shape with a single minimum.
Note that the learning rate is positive. On the left side of the minimum, the gradient (the slope of the line tangent to the curve at that point) is negative, so the product of the learning rate and the gradient is negative. Subtracting that product from W therefore increases W (since two negatives make a positive). This is good, because moving right brings W toward the minimum, so the loss decreases.
On the right side, the gradient is positive, so the product of the learning rate and the gradient is positive. Subtracting the product from W reduces W. In this case too, the loss decreases.
We can extend the same idea to more parameters (the graph becomes higher-dimensional and hard to visualize, which is why we took a single parameter W initially) and to other loss functions (even non-convex ones, although then it won't always converge to the global minimum, only to a nearby local minimum).
Note: this explanation can be found in Andrew Ng's deeplearning.ai courses, but I couldn't find a direct link, so I wrote this answer.
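A minimal numeric sketch of that sign logic (plain Python; the toy loss J(W) = W^2 is my own choice, so the gradient is 2W):

lr = 0.1

def grad(w):                    # derivative of J(w) = w**2
    return 2 * w

for w in (-3.0, 3.0):           # start on the left and on the right of the minimum at 0
    for _ in range(5):
        w = w - lr * grad(w)    # the same update as a.sub_(lr * a.grad)
    print(w)                    # both runs have moved toward 0 (about -0.983 and 0.983)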
I'm assuming a represents your model parameters, based on y_hat = x @ a. The subtraction is necessary because the stochastic gradient descent algorithm aims to find a minimum of the loss function. You therefore take the gradient of the loss w.r.t. your model parameters and update them a little in the direction of the negative gradient.
Think of the analogy of sliding down a hill: if the landscape represents your loss, the gradient points in the direction of steepest ascent, so its negative is the direction of steepest descent. To get to the bottom (i.e. minimize the loss), you take little steps downhill from wherever you're standing.
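Here is a self-contained sketch of that in-place update in PyTorch (the data, shapes, and learning rate are invented for illustration; a.grad.zero_() is added because gradients accumulate between iterations):

import torch

torch.manual_seed(0)
x = torch.randn(100, 2)
y = x @ torch.tensor([3.0, 2.0]) + 0.1 * torch.randn(100)   # "true" parameters [3, 2]

a = torch.randn(2, requires_grad=True)   # the parameters being learned
lr = 0.1

for t in range(100):
    y_hat = x @ a
    loss = ((y_hat - y) ** 2).mean()     # mse
    loss.backward()                      # populates a.grad
    with torch.no_grad():
        a.sub_(lr * a.grad)              # in-place: a = a - lr * a.grad
        a.grad.zero_()                   # reset the gradient for the next pass

print(a)                                 # ends up close to [3.0, 2.0]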

What is soft constraint in box2d?

I am creating a mouse joint and I bumped into this term; what does it actually mean?
Documentation for the mouse joint: "A mouse joint is used to make a point on a body track a specified world point. This is a soft constraint with a maximum force. This allows the constraint to stretch without applying huge forces."
Let's say that we have a distance joint:
b2DistanceJointDef DistJointDef;
You can achieve a spring-like effect by tuning the frequency and damping ratio:
DistJointDef.frequencyHz = 0.5f;
DistJointDef.dampingRatio = 0.5f;
frequencyHz determines how fast the body oscillates as it stretches/shrinks over time,
whereas dampingRatio determines how quickly that spring-like effect dies out.
The same principles apply to mouse joints: you can modify their frequency and damping ratio to achieve a similar effect.
If I recall correctly, you can apply soft constraints to wheel joints as well.
Here is a little more info on the subject from the Box2D manual:
Softness is achieved by tuning two constants in the definition: frequency and damping ratio. Think of the frequency as the frequency of a harmonic oscillator (like a guitar string). The frequency is specified in Hertz. Typically the frequency should be less than half the frequency of the time step. So if you are using a 60Hz time step, the frequency of the distance joint should be less than 30Hz. The reason is related to the Nyquist frequency.
The damping ratio is non-dimensional and is typically between 0 and 1, but can be larger. At 1, the damping is critical (all oscillations should vanish).
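To build intuition for what those two constants do, here is a standalone mass-spring-damper sketch (plain Python, not Box2D code; it assumes the textbook harmonic-oscillator mapping k = m*(2*pi*f)^2 and c = 2*m*zeta*(2*pi*f)):

import math

def simulate(freq_hz, damping_ratio, steps=60, dt=1.0 / 60.0, m=1.0):
    omega = 2.0 * math.pi * freq_hz
    k = m * omega ** 2                    # spring stiffness from the frequency
    c = 2.0 * m * damping_ratio * omega   # damping coefficient from the ratio
    x, v = 1.0, 0.0                       # start stretched 1 unit from the target
    for _ in range(steps):                # one second of 60Hz time steps
        a = (-k * x - c * v) / m          # spring pulls back, damper resists motion
        v += a * dt
        x += v * dt
    return x

print(simulate(0.5, 0.1))   # low damping: still oscillating after one second
print(simulate(0.5, 1.0))   # critical damping: the stretch has mostly died out

A higher frequency makes the constraint stiffer (it snaps back faster), and a damping ratio near 1 kills the oscillation, matching the behavior described above.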

Distance (km) significant but inverse not

Could someone explain this to me please? I have a regression where one of my variables is the distance from one point to another, measured in km. The closer the points are together, the lower the value.
However, for interpretation I felt that the inverse of distance would make more sense: the closer you are to an area, holding all else equal... etc.
However, I'm getting two different results in my model depending on whether I use distance or the inverse of distance, 1/variable.
See the output.
This is the coefficient and significance level with normal distance:
sportmin | -.0003924**
This is with the inverse of distance:
inversesport | .0265864
Could someone explain what the issue is?
Theoretically the variable should be significant, positive or negative depending on whether you use distance or the inverse.
The problem is that 1/distance is not a linear transformation of distance. 1/distance is an asymptotic function with a strongly curved shape, whereas distance enters the model linearly. Whenever you change the curvature of a variable, its statistical significance will change; e.g. log(variable) will behave very differently from variable in a regression model.
If you apply a transformation that keeps distance a linear function (such as rescaling it), it will not change its significance.
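A small simulation makes this concrete (a sketch with invented data; it assumes the true relationship is linear in distance and uses statsmodels for the fit):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200
distance = rng.uniform(1, 50, n)                   # km, bounded away from zero
y = 2.0 - 0.05 * distance + rng.normal(0, 1, n)    # linear effect of distance

def slope_tstat(x):
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    return fit.tvalues[1]

print(slope_tstat(distance))          # strongly significant
print(slope_tstat(distance / 1000))   # linear rescaling: identical t-statistic
print(slope_tstat(1 / distance))      # nonlinear transform: a different, weaker fit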

How to divide tiny double precision numbers correctly without precision errors?

I'm trying to diagnose and fix a bug which boils down to X/Y yielding an unstable result when X and Y are small.
In this case, both cx and patharea increase smoothly. Their ratio settles onto a smooth asymptote at high values, but is erratic for "small" numbers. The obvious first thought is that we're reaching the limit of floating-point accuracy, but the actual numbers themselves are nowhere near it. ActionScript "Number" types are IEEE 754 double-precision floats, so they should have about 15 decimal digits of precision (if I read it right).
Some typical values of the denominator (patharea):
0.0000000002119123
0.0000000002137313
0.0000000002137313
0.0000000002155502
0.0000000002182787
0.0000000002200977
0.0000000002210072
And the numerator (cx):
0.0000000922932995
0.0000000930474444
0.0000000930582124
0.0000000938123574
0.0000000950458711
0.0000000958000159
0.0000000962901528
0.0000000970442977
0.0000000977984426
Each of these increases monotonically, but their ratio is erratic.
At larger numbers it settles down to a smooth hyperbola.
So, my question: what's the correct way to deal with very small numbers when you need to divide one by another?
I thought of multiplying numerator and/or denominator by 1000 in advance, but couldn't quite work it out.
The actual code in question is the recalculate() function here. It computes the centroid of a polygon, but when the polygon is tiny, the centroid jumps erratically around the place, and can end up a long distance from the polygon. The data series above are the result of moving one node of the polygon in a consistent direction (by hand, which is why it's not perfectly smooth).
This is Adobe Flex 4.5.
I believe the problem most likely is caused by the following line in your code:
sc = (lx*latp-lon*ly)*paint.map.scalefactor;
If your polygon is very small, then lx and lon are almost the same, as are ly and latp. Both products are very large compared to the result, so you are subtracting two numbers that are almost equal, and most of the significant digits cancel.
To get around this, we can make use of the fact that:
x1*y2 - x2*y1 = (x2 + (x1-x2))*y2 - x2*(y2 + (y1-y2))
              = x2*y2 + (x1-x2)*y2 - x2*y2 - x2*(y1-y2)
              = (x1-x2)*y2 - x2*(y1-y2)
So, try this:
dlon = lx - lon
dlat = ly - latp
sc = (dlon*latp-lon*dlat)*paint.map.scalefactor;
The value is mathematically the same, but the intermediate terms are much smaller and much closer in size to the result, so the cancellation error should shrink accordingly.
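You can see the improvement with a quick check (a Python sketch with made-up nearby coordinates; Fraction provides an exact reference value for the same inputs):

from fractions import Fraction

# Two nearby points far from the origin, like the lon/lat values above:
lx, ly = 12.3456789, 45.6789012
lon, latp = 12.3456788, 45.6789013

naive = lx * latp - lon * ly            # subtracts two nearly equal products

dlon = lx - lon
dlat = ly - latp
rewritten = dlon * latp - lon * dlat    # the suggested form

# Exact value, computed with rational arithmetic on the same rounded inputs:
exact = float(Fraction(lx) * Fraction(latp) - Fraction(lon) * Fraction(ly))

print(naive - exact)        # typically a visible rounding error
print(rewritten - exact)    # typically orders of magnitude closer to zero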
Jeffrey Sax has correctly identified the basic issue: loss of precision from combining terms that are (much) larger than the final result.
The suggested rewriting eliminates part of the problem, apparently enough for the actual case, given the happy response.
You may find, however, that if the polygon becomes even smaller, and/or moves farther away from the origin, the inaccuracy will show up again. In the rewritten formula the terms are still quite a bit larger than their difference.
Furthermore, there is another combining-large-and-comparable-numbers-with-different-signs issue in the algorithm. The various sc values from successive cycles of the iteration over the edges of the polygon combine into a final number that is (much) smaller than the individual sc(i) are. (In a convex polygon there is one contiguous run of positive values and one contiguous run of negative values; in non-convex polygons the negatives and positives may be interleaved.)
What the algorithm is effectively doing is computing the area of the polygon by adding the areas of triangles spanned by the edges and the origin, where some terms are negative (whenever an edge is traversed clockwise, viewed from the origin) and some positive (anti-clockwise).
You get rid of ALL the loss-of-precision issues by placing the origin at one of the polygon's corners, say (lx,ly), and then adding the triangle areas spanned by the edges and that corner (i.e. transforming lon to (lon-lx) and latp to (latp-ly)), with the additional bonus that you need to process two fewer triangles, because the edges that meet the chosen origin corner obviously yield zero area.
For the area part, that's all. For the centroid part, you will of course have to transform the result back to the original frame, i.e. add (lx,ly) at the end.
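Putting that recipe together (a Python sketch of the shoelace centroid with the origin shifted to the first corner; not the original ActionScript, and the test polygon is invented):

def centroid(points):
    # Shift the origin to the first corner so every cross product is small:
    x0, y0 = points[0]
    pts = [(x - x0, y - y0) for (x, y) in points]
    a_sum = cx_sum = cy_sum = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        cross = x1 * y2 - x2 * y1    # 2 * signed triangle area, on near-zero coords
        a_sum += cross
        cx_sum += (x1 + x2) * cross
        cy_sum += (y1 + y2) * cross
    area = a_sum / 2.0
    # Transform the result back to the original frame at the end:
    return (x0 + cx_sum / (6.0 * area), y0 + cy_sum / (6.0 * area))

# A tiny triangle far from the origin stays stable:
print(centroid([(1000.0, 1000.0), (1000.0001, 1000.0), (1000.0001, 1000.0001)]))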