Let me first explain the idea. The actual math question is below the screenshots.
For musical purposes I am building a groove algorithm where event positions are translated by a mathematical function F(x). The positions are normalized inside the groove range, so I am basically dealing with values between zero and one (which makes shaping groove curves much easier; the only limitation is x'>=0).
This groove algorithm accepts any event position and also works by filtering static notes from a data structure such as a timeline note track. To filter events in a certain range (the audio block size) I need the inverse groove function to locate the notes in the track and transform them into the groove space. So far so good. It works!
In short: I use an inverse function because it is F mirrored across the line y=x. So I can plug in a value x and get a y. This y can then be plugged into the inverse function to get the original x back.
Problem: I now want to be able to blend the groove into another, but the usual linear (hint hint) blending code does not behave like I expected it. To make it easier, I first tried to blend to y=x.
B(x)=alpha*F(x) + (1-alpha)*x;
iB(x)=alpha*iF(x) + (1-alpha)*x;
For alpha=1 we get the full curve. For alpha=0 we get the straight line. But for alpha between 0 and 1, B(x) and iB(x) are no longer mirrored (close, but not close enough), even though F(x) and iF(x) still are.
Is there a solution for that (besides quantizing the curve into line segments)? Is there any subject I should look into?
you are combining two functions, f(x) and g(x), so that y = a f(x) + (1-a) g(x). and given some y, a, f and g, you want to find x. at least, that is what i understand.
i don't see how to do this generally (although i haven't tried very hard - i mean, it would be worth asking someone else), but i suspect that for "nice" shaped functions, like you seem to be using, newton's method would be fairly quick.
you want to find x such that y = a f(x) + (1-a) g(x). in other words, when 0 = a f(x) + (1-a) g(x) - y.
so let's define r(x) = a f(x) + (1-a) g(x) - y and find the "zero" of that. start with a guess in the middle, x_0 = 0.5. calculate x_1 = x_0 - r(x_0) / r'(x_0). repeat. if you are lucky this will rapidly converge (if not, you might consider defining the functions relative to y=x, which you already seem to be doing, and trying it again).
see wikipedia
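A minimal sketch of that Newton iteration, applied to the blend with y=x from the question. The names blend and invert, the example curve F(x) = x*x, and the finite-difference derivative are assumptions made for illustration; they are not from the original posts.

```python
def blend(f, alpha):
    """Return B(x) = alpha*f(x) + (1 - alpha)*x."""
    return lambda x: alpha * f(x) + (1.0 - alpha) * x


def invert(b, y, x0=0.5, tol=1e-12, max_iter=50, h=1e-6):
    """Solve b(x) = y for x with Newton's method.

    The derivative is estimated with a central difference, so only b itself
    is needed. Assumes b is smooth and strictly increasing near the root.
    """
    x = x0
    for _ in range(max_iter):
        r = b(x) - y                      # residual r(x) from the answer above
        if abs(r) < tol:
            break
        slope = (b(x + h) - b(x - h)) / (2.0 * h)
        if slope == 0.0:
            break                         # Newton step undefined here
        x -= r / slope
    return x


# Hypothetical groove curve on [0, 1]; any smooth monotonic F would do.
F = lambda x: x * x
B = blend(F, 0.3)
x = 0.42
y = B(x)
x_back = invert(B, y)                     # numerically inverted blend
```

With this approach iB is never written down explicitly: B stays the linear blend, and its inverse is evaluated numerically whenever it is needed, so the pair is mirrored up to the chosen tolerance.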
This problem can't be solved algebraically, in general.
Consider for instance
y = 2e^x (inverse x = log(0.5y))
and
y = 2x (inverse x = 0.5y).
Blending these together with weight 0.5 gives y = e^x+x, and it is well-known that it is not possible to solve for x here using only elementary functions, even though the inverse of each piece was easy to find.
You will want to use a numerical method to approximate the inverse, as discussed by andrew above.
I am doing the fast.ai course, in the SGD lesson, and there is something I cannot understand.
This subtracts (learning rate * gradient) from the coefficients.
But why is it necessary to subtract?
Here is the code:
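def update():
    y_hat = x @ a  # matrix product of the inputs and the parameters
    loss = mse(y_hat, y)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)  # step against the gradient
        a.grad.zero_()  # clear the gradient so it does not accumulate across calls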
Look at the image. It shows the loss function J as a function of the parameter W. Here it is a simplified representation with W being the only parameter. So, for a convex loss function, the curve looks as shown.
Note that the learning rate is positive. On the left side, the gradient (slope of the line tangent to the curve at that point) is negative, so the product of the learning rate and gradient is negative. Thus, subtracting the product from W will actually increase W (since 2 negatives make a positive). In this case, this is good because loss decreases.
On the other hand (on the right side), the gradient is positive, so the product of the learning rate and gradient is positive. Thus, subtracting the product from W reduces W. In this case also, this is good because the loss decreases.
We can extend the same reasoning to more parameters (the graph will be higher dimensional and hard to visualize, which is why we started with a single parameter W) and to other loss functions (even non-convex ones, though then it is not guaranteed to reach the global minimum; with a suitably small learning rate it typically settles near a local minimum).
Note : This explanation can be found in Andrew Ng's courses of deeplearning.ai, but I couldn't find a direct link, so I wrote this answer.
I'm assuming a represents your model parameters, based on y_hat = x @ a. The subtraction is necessary because the stochastic gradient descent algorithm aims to find a minimum of the loss function. Therefore, you take the gradient of the loss w.r.t. your model parameters, and update them a little in the direction opposite to the gradient, which is what subtracting lr * a.grad does.
Think of the analogy of sliding down a hill: if the landscape represents your loss, the gradient points in the direction of steepest ascent, so the negative gradient is the direction of steepest descent. To get to the bottom (i.e. minimize loss), you take little steps in the direction of steepest descent from where you're standing.
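To see the sign argument in numbers, here is a small sketch in plain Python (no PyTorch). The quadratic loss J(w) = (w - 3)^2 and the names loss, grad and descend are made up for illustration.

```python
def loss(w):
    """Convex toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss: negative left of 3, positive right of 3."""
    return 2.0 * (w - 3.0)


def descend(w, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly subtract lr * gradient."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


# One step from each side of the minimum.
one_step_left = -5.0 - 0.1 * grad(-5.0)    # gradient is negative, so w increases
one_step_right = 10.0 - 0.1 * grad(10.0)   # gradient is positive, so w decreases

left = descend(-5.0)
right = descend(10.0)
```

Both starting points end up close to w = 3. Replacing the minus with a plus in descend would move w away from the minimum on every step.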
I am trying to understand Loss functions for Bounding Box Regression in CNNs. Currently I use Lasagne and Theano, which makes writing loss expressions very easy. Many sources propose different methods and I am asking myself which one is usually used in practice.
The bounding boxes coordinates are represented as normalized coordinates in the order [left, top, right, bottom] (using T.matrix('targets', dtype=theano.config.floatX)).
I have tried the following functions so far; however all of them have their drawbacks.
Intersection over Union
I was advised to use the Intersection over Union measure to identify how well the 2 bounding boxes align and overlap. However, a problem occurs when the boxes don't overlap: the intersection is 0, so the whole quotient becomes 0 regardless of how far apart the bounding boxes are. I implemented it as:
import theano.tensor as T

def get_area(A):
    # Width * height; assumes top > bottom (y axis pointing up)
    return (A[:,2] - A[:,0]) * (A[:,1] - A[:,3])

def get_intersection(A, B):
    # Clamp each extent at 0 so that disjoint boxes give an intersection of 0
    w = T.maximum(T.minimum(A[:,2], B[:,2]) - T.maximum(A[:,0], B[:,0]), 0)
    h = T.maximum(T.minimum(A[:,1], B[:,1]) - T.maximum(A[:,3], B[:,3]), 0)
    return w * h

def bbox_overlap_loss(A, B):
    """Computes the bounding box overlap using the
    Intersection over union"""
    intersection = get_intersection(A, B)
    union = get_area(A) + get_area(B) - intersection
    # Turn into loss
    l = 1.0 - intersection / union
    return l.mean()
Squared Diameter Difference
To create an error measure for non-overlapping bounding boxes, I tried to compute the squared difference of the bounding box diameters. It seems to work, but I am almost sure that there is a much better way to do this. I implemented it as:
def squared_diameter_loss(A, B):
    # Represent the squared distance from the real diameter
    # in normalized pixel coordinates
    l = (abs(A[:,0:2]-B[:,0:2]) + abs(A[:,2:4]-B[:,2:4]))**2
    return l.mean()
Euclidean Loss
The simplest function would be the Euclidean Loss, which is based on the squared difference of the bounding box parameters (the implementation below takes the mean of the squared errors and omits the square root). However, this doesn't take into account the area of the overlapping bounding box but only the difference of the parameters left, right, top, bottom. I implemented it as:
import lasagne

def euclidean_loss(A, B):
    # Element-wise squared error, averaged over all coordinates
    l = lasagne.objectives.squared_error(A, B)
    return l.mean()
Could someone advise which loss function would be best for bounding box regression in this use case, or spot whether I am doing something wrong here? Which loss function is usually used in practice?
Speaking from personal implementation experience, I had much better results training a CNN using IOU as the loss function as opposed to Euclidean (MSE or L2) Loss. Have not used the squared diameter difference loss. In general, a loss function that explicitly represents the goodness of your outputs for the tasks you hope to accomplish is probably best.
With regards to the IOU having a value of zero, you can introduce some additional term in the formulation so that it gracefully trends towards 0, perhaps based on normalized distance between bbox centers. This might give the additional effect of helping to center bounding boxes relative to the ground truth.
This response is mostly conceptual but I'd be happy to supply code examples if desired.
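As one possible reading of that suggestion, the sketch below adds a squared center-distance term to 1 - IoU for a single pair of boxes. It is plain Python rather than Theano, and the function name, the weight parameter and the exact form of the penalty are assumptions made for illustration, not a formulation taken from the answer.

```python
def iou_with_center_penalty(a, b, weight=1.0):
    """Loss = 1 - IoU + weight * squared distance between box centers.

    Boxes are (left, top, right, bottom) in normalized coordinates.
    The penalty term and `weight` are illustrative assumptions.
    """
    def spans(box):
        left, top, right, bottom = box
        # Sort the vertical extent so either y-axis convention works.
        return left, right, min(top, bottom), max(top, bottom)

    ax0, ax1, ay0, ay1 = spans(a)
    bx0, bx1, by0, by1 = spans(b)

    # Clamp at zero so disjoint boxes contribute no intersection.
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    iou = inter / union if union > 0.0 else 0.0

    # Distance between centers keeps a useful signal when IoU is 0.
    dx = (ax0 + ax1) / 2.0 - (bx0 + bx1) / 2.0
    dy = (ay0 + ay1) / 2.0 - (by0 + by1) / 2.0
    return 1.0 - iou + weight * (dx * dx + dy * dy)


target = (0.1, 0.1, 0.3, 0.3)
near = (0.4, 0.1, 0.6, 0.3)   # disjoint, close to the target
far = (0.7, 0.1, 0.9, 0.3)    # disjoint, farther away
```

With plain 1 - IoU both near and far would score exactly 1. With the extra term the farther box gets the larger loss, so a predicted box that misses the target still gets a signal that pulls it toward the target.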
RotateTo comes with two create methods - one which lets you specify only a single angle to rotate to, and another where one can specify a different angle for X and Y.
I don't understand what is going on when you use the latter (specifying angles for both X and Y). In the former, it's just that the same angle is used for both.
Can someone explain what is actually going on when you specify two different angles, and in what situation this would actually be useful? I've tried it out, but I can't figure out how it is useful.
EDIT: I'm aware that the output using different x,y vs same looks different. I've actually tried it out myself. My question is - what's the point ? In which situation is it useful?
There are differences between giving a single angle and giving two different angles for x and y. With the single-angle form we give one angle, for example 90°. With the second form we give separate angles for x and y to create the action. The resulting animations look different. Which one to use depends on your project's needs.
For example, if you code it like below:
Single angle: CCFiniteTimeAction* actionRotate1 = CCRotateTo::create(6.0, 90);
or
Different x and y angles: CCFiniteTimeAction* actionRotate1 = CCRotateTo::create(6.0, 90, 90);
The output will be like this:
90° Output
But when you give different angles for x and y, the difference from the single-angle animation becomes visible.
Single angle: CCFiniteTimeAction* actionRotate1 = CCRotateTo::create(6.0, 540);
The output is: Single Angle Output
Different x and y angles: CCFiniteTimeAction* actionRotate1 = CCRotateTo::create(6.0, 0, 540);
The output is: Different x and y Output
You can see the animation change when different angles are given for x and y. It gives a flip-like animation, whereas with a single angle the image is simply rotated to the desired angle.
Hope this helps you.
I'm trying to diagnose and fix a bug which boils down to X/Y yielding an unstable result when X and Y are small:
In this case, both cx and patharea increase smoothly. Their ratio approaches a smooth asymptote for large numbers, but is erratic for "small" numbers. The obvious first thought is that we're reaching the limit of floating point accuracy, but the actual numbers themselves are nowhere near it. ActionScript "Number" types are IEEE 754 double-precision floats, so should have about 15 decimal digits of precision (if I read it right).
Some typical values of the denominator (patharea):
0.0000000002119123
0.0000000002137313
0.0000000002137313
0.0000000002155502
0.0000000002182787
0.0000000002200977
0.0000000002210072
And the numerator (cx):
0.0000000922932995
0.0000000930474444
0.0000000930582124
0.0000000938123574
0.0000000950458711
0.0000000958000159
0.0000000962901528
0.0000000970442977
0.0000000977984426
Each of these increases monotonically, but the ratio is chaotic as seen above.
At larger numbers it settles down to a smooth hyperbola.
So, my question: what's the correct way to deal with very small numbers when you need to divide one by another?
I thought of multiplying numerator and/or denominator by 1000 in advance, but couldn't quite work it out.
The actual code in question is the recalculate() function here. It computes the centroid of a polygon, but when the polygon is tiny, the centroid jumps erratically around the place, and can end up a long distance from the polygon. The data series above are the result of moving one node of the polygon in a consistent direction (by hand, which is why it's not perfectly smooth).
This is Adobe Flex 4.5.
I believe the problem most likely is caused by the following line in your code:
sc = (lx*latp-lon*ly)*paint.map.scalefactor;
If your polygon is very small, then lx and lon are almost the same, as are ly and latp. They are both very large compared to the result, so you are subtracting two numbers that are almost equal.
To get around this, we can make use of the fact that:
x1*y2-x2*y1 = (x2+(x1-x2))*y2 - x2*(y2+(y1-y2))
= x2*y2 + (x1-x2)*y2 - x2*y2 - x2*(y1-y2)
= (x1-x2)*y2 - x2*(y1-y2)
So, try this:
dlon = lx - lon
dlat = ly - latp
sc = (dlon*latp-lon*dlat)*paint.map.scalefactor;
The value is mathematically the same, but the terms being subtracted are much smaller, so the rounding error should shrink accordingly.
Jeffrey Sax has correctly identified the basic issue - loss of precision from combining terms that are (much) larger than the final result.
The suggested rewriting eliminates part of the problem - apparently sufficient for the actual case, given the happy response.
You may find, however, that if the polygon becomes again (much) smaller and/or farther away from the origin, inaccuracy will show up again. In the rewritten formula the terms are still quite a bit larger than their difference.
Furthermore, there's another 'combining-large&comparable-numbers-with-different-signs'-issue in the algorithm. The various 'sc' values in subsequent cycles of the iteration over the edges of the polygon effectively combine into a final number that is (much) smaller than the individual sc(i) are. (if you have a convex polygon you will find that there is one contiguous sequence of positive values, and one contiguous sequence of negative values, in non-convex polygons the negatives and positives may be intertwined).
What the algorithm is doing, effectively, is computing the area of the polygon by adding areas of triangles spanned by the edges and the origin, where some of the terms are negative (whenever an edge is traversed clockwise, viewing it from the origin) and some positive (anti-clockwise walk over the edge).
You get rid of ALL the loss-of-precision issues by defining the origin at one of the polygon's corners, say (lx,ly), and then adding the triangle surfaces spanned by the edges and that corner (so: transforming lon to (lon-lx) and latp to (latp-ly)). This has the additional bonus that you need to process two fewer triangles, because the edges that link to the chosen origin corner obviously yield zero surfaces.
For the area-part that's all. For the centroid-part, you will of course have to "transform back" the result to the original frame, i.e. adding (lx,ly) at the end.
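A minimal sketch of that idea in Python (rather than ActionScript), assuming the polygon is a plain list of (x, y) vertices. The function name centroid and the test square are invented for illustration; this is not the recalculate() function from the question.

```python
def centroid(points):
    """Centroid of a simple polygon given as [(x, y), ...] vertices.

    All coordinates are shifted so that the first vertex is the origin,
    which keeps the cross products small for tiny, far-away polygons.
    """
    x0, y0 = points[0]
    local = [(x - x0, y - y0) for x, y in points]
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(local)
    for i in range(n):
        x1, y1 = local[i]
        x2, y2 = local[(i + 1) % n]
        cross = x1 * y2 - x2 * y1  # zero for the two edges touching the origin
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if area2 == 0.0:
        raise ValueError("degenerate polygon")
    # Transform back to the original frame.
    return x0 + cx / (3.0 * area2), y0 + cy / (3.0 * area2)


# A square about 1e-6 wide, placed far from the origin.
sq = [(1000.0, 2000.0), (1000.000001, 2000.0),
      (1000.000001, 2000.000001), (1000.0, 2000.000001)]
cx, cy = centroid(sq)
```

The loop still visits every edge for simplicity; the two edges that touch the chosen corner contribute a cross product of zero, which is the "two triangles less" remark above.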
I am developing a game with Flixel as a base, and part of what I need is a way to check for collisions along a line (a line from point A to point B, specifically). The best way to explain this is that I have a laser beam shooting from one ship to another object (or to a point in space if nothing is overlapping the line). I want the line to reach only until it hits an object. How can I determine mathematically / programmatically where along a line the line is running into an object?
I could try measuring the length of the line and checking points for collision until one does, but that seems like way too much overhead to do every frame when I'm sure there is a mathematical way to determine it.
Edit: Before checking an object for collision with the line itself, I would first eliminate any objects not within the line's bounding box - defined by the x of the left-most point, the y of the top-most point, the x of the right-most point, and the y of the bottom-most point. This will limit line-collision checks to a few objects.
Edit again: My question seems to still not be fully clear, sorry. Some of the solutions would probably work, but I'm looking for a simple, preferably mathematical solution. And when I say "rectangle" I mean one whose sides are locked to the x and y axis, not a rotatable rectangle. So a line is not a rectangle of width 0 unless it's at 90 or -90 degrees (assuming 0 degrees points to the right of the screen).
Here's a visual representation of what I'm trying to find:
So, you have a line segment (A-B) and I gather that line segment is moving, and you want to know at what point the line segment will collide with another line segment (your ship, whatever).
So mathematically what you want is to check when two lines intersect (two lines will always intersect unless parallel) and then check if the point where they intersect is on your screen.
First you need to convert the line segments to line equations, something like this:
typedef struct {
    GLfloat A;
    GLfloat B;
    GLfloat C;
} Line;

// Line through (x1,y1) and (x2,y2) in the form A*x + B*y = C
static inline Line LineMakeFromCoords(GLfloat x1, GLfloat y1, GLfloat x2, GLfloat y2) {
    return (Line) {y2-y1, x1-x2, (y2-y1)*x1+(x1-x2)*y1};
}

static inline Line LineMakeFromSegment(Segment segment) {
    return LineMakeFromCoords(segment.P1.x,segment.P1.y,segment.P2.x,segment.P2.y);
}
Then check if they intersect
static inline Point2D IntersectLines(Line line1, Line line2) {
    GLfloat det = line1.A*line2.B - line2.A*line1.B;
    if (det == 0) {
        // Lines are parallel
        return (Point2D) {0.0, 0.0}; // FIXME should return nil
    } else {
        return (Point2D) {(line2.B*line1.C - line1.B*line2.C)/det,
                          (line1.A*line2.C - line2.A*line1.C)/det};
    }
}
Point2D will give you the intersection point. Of course, you have to test your line segment against all of the ship's line segments, which can be a bit time consuming; that's where collision boxes, etc. enter the picture.
The math is all in wikipedia, check there if you need more info.
Edit:
Add-on to follow up comment:
Same as before, test your segment for collision against all four segments of the rectangle. You will get one of 3 cases:
No collision, or the collision point is not on screen (remember the collision tests are against lines, not line segments, and lines will always intersect unless parallel): taunt the player for missing :-)
One collision: draw or do whatever you want; the segment you're asking about will be A-C (C being the collision point).
Two collisions: check the length of each resulting segment (A-C1) and (A-C2) using something like the code below and keep the shorter one.
static inline float SegmentSizeFromPoints(Vertice3D P1, Vertice3D P2) {
    // Euclidean distance in the x-y plane
    return sqrtf(powf((P1.x - P2.x), 2.0) + powf((P1.y - P2.y), 2.0));
}
The tricky bit when dealing with collisions, is figuring out ways of minimizing the number of tests you have to make.
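The "test all four edges, keep the nearest hit" procedure can be sketched as follows. This is Python rather than C, and it uses a parametric segment-segment test instead of the A, B, C line form above, so hits outside either segment are rejected directly. The names segment_hit and laser_end and the rect layout (left, top, right, bottom) are assumptions for illustration.

```python
def segment_hit(p, q, a, b):
    """Return t in [0, 1] where segment p->q crosses segment a->b, or None."""
    rx, ry = q[0] - p[0], q[1] - p[1]
    sx, sy = b[0] - a[0], b[1] - a[1]
    det = rx * sy - ry * sx
    if det == 0:
        return None  # parallel (or collinear): treated as no hit here
    t = ((a[0] - p[0]) * sy - (a[1] - p[1]) * sx) / det  # position along p->q
    u = ((a[0] - p[0]) * ry - (a[1] - p[1]) * rx) / det  # position along a->b
    return t if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0 else None


def laser_end(p, q, rect):
    """Clip the beam p->q at the nearest edge of an axis-aligned rectangle.

    rect is (left, top, right, bottom). Returns the hit point, or q if the
    beam misses the rectangle.
    """
    left, top, right, bottom = rect
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    hits = []
    for i in range(4):
        t = segment_hit(p, q, corners[i], corners[(i + 1) % 4])
        if t is not None:
            hits.append(t)
    if not hits:
        return q
    t = min(hits)  # smallest t = collision closest to p
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


end_hit = laser_end((0.0, 5.0), (20.0, 5.0), (10.0, 0.0, 15.0, 10.0))
end_miss = laser_end((0.0, 20.0), (20.0, 20.0), (10.0, 0.0, 15.0, 10.0))
```

Because t measures the position along the beam, taking the minimum replaces the explicit length comparison of (A-C1) and (A-C2).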
1. Find the formula for the line: y = ((y2 - y1)/(x2 - x1)) * (x - x1) + y1
2. Find the bounding boxes for your sprites
3. For each sprite's bounding box:
4. For each corner of the current bounding box:
5. Enter the x value of the corner's coordinate into the line formula (from 1) and subtract the y value of the coordinate from the result
6. Record the sign from the calculation in 5
7. If all 4 signs are equal, then no collision has/will occur. If any sign is different, then a collision is possible, do further checks.
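The sign test in these steps can be sketched like this. It uses the cross-product form of the line instead of the slope formula, which gives the same side information but avoids dividing by zero for vertical lines. The name may_collide and the rect layout (left, top, right, bottom) are assumptions for illustration.

```python
def may_collide(p, q, rect):
    """Quick reject: False if all four corners lie strictly on one side
    of the infinite line through p and q."""
    left, top, right, bottom = rect
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    dx, dy = q[0] - p[0], q[1] - p[1]
    # The sign of the cross product tells which side of the line a corner is on.
    sides = [dx * (cy - p[1]) - dy * (cx - p[0]) for cx, cy in corners]
    all_pos = all(s > 0 for s in sides)
    all_neg = all(s < 0 for s in sides)
    return not (all_pos or all_neg)


crossing = may_collide((0.0, 5.0), (20.0, 5.0), (10.0, 0.0, 15.0, 10.0))
passing = may_collide((0.0, 20.0), (20.0, 20.0), (10.0, 0.0, 15.0, 10.0))
```

A True result only means the infinite line passes through the box; whether the segment itself reaches it still needs the further checks mentioned in the last step.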
I'm not mathematically gifted but I think you could do something like this:
1. Measure the distance between the centre of the block and the laser beam.
2. Measure the distance between the centre of the block and the edge of the block at a given angle (there would be a formula for this, I just don't know what it is).
3. Subtract the result of point 1 from the result of point 2.
Good thing about this is that if point 1 is larger than point 2 you know there hasn't been a collision yet.
Alternatively use box2d, and just use b2ContactPoint
You should look at the Separating Axis Theorem. This is generally used for polygons, but I think that you can make it work for a line and a polygon.
I found a link that explains it in a concise manner, here.