center of a cluster of points and track shape - center

I have plots of points which look like this.
The tracks which these points form can be a circle or an ellipse. Clearly the centers of the circular tracks in the two images above are different.
How can I find the center point of these tracks (circular/elliptical)? I want to find the (x,y) coordinates of the center; it does not have to be a point that's in the plotted data set, i.e., I don't want a medoid.
EDIT: Also, is there any way that I can find an equation for a circle/ellipse that envelops a majority of these points? In the elliptical track, I've added an ellipse that envelops the points on the track. The values were calculated by trial and error. The center was also found by eyeballing the plot. How can I do this programmatically?

See the smallest circle problem, and here is a paper (PDF download available) on the smallest ellipse problem. Both have O(N) algorithms and should be able to provide the formula for the circle or ellipse, from which you can get the center. However, they focus on enclosing all of the points. To solve that issue you'll need to remove a number of the bounding points, which you should get from the algorithms as well. Unfortunately, it's pretty much up to you as to what qualifies as a good enough solution.
A fast and simple randomized solution is:
1. Randomly divide the set of points into k sets of N/k points each.
2. Run the smallest circle/ellipse algorithm for each set.
3. For each of the k sets, pick at least 1 but no more than m bounding points to remove from the main point set.
4. Return to step 1, t times.
5. Return the result of the circle/ellipse algorithm on the remaining points.
The algorithm removes between k and mk bounding points every pass at a cost of O(N). For your purpose you'll probably want to remove some percentage of the bounding points, 1-25% seems like a good starting point. This solution assumes that k is very small compared to N, otherwise you'll be removing too many points.
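For the circle case, the passes described here could be sketched roughly as follows. This is only an illustration under assumptions: `smallest_circle` is a plain incremental (Welzl-style) routine written for this sketch, the names `peel_circle`, `boundary_points`, the defaults for `k`, `m`, `t` and both tolerances are mine, and an ellipse version would need a smallest-enclosing-ellipse routine in its place.

```python
import math
import random

EPS = 1e-9  # assumed tolerance for "inside the circle"

def circle_two(a, b):
    """Circle with segment ab as diameter, as (cx, cy, r)."""
    cx, cy = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    return (cx, cy, math.hypot(a[0] - cx, a[1] - cy))

def circle_three(a, b, c):
    """Circumcircle of a, b, c; falls back to the widest pair if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return max(circle_two(a, b), circle_two(a, c), circle_two(b, c),
                   key=lambda k: k[2])
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy, math.hypot(ax - ux, ay - uy))

def inside(circle, p):
    return math.hypot(p[0] - circle[0], p[1] - circle[1]) <= circle[2] + EPS

def smallest_circle(points, rng):
    """Incremental smallest enclosing circle (expected linear time)."""
    pts = list(points)
    rng.shuffle(pts)
    circle = None
    for i, p in enumerate(pts):
        if circle is not None and inside(circle, p):
            continue
        circle = (p[0], p[1], 0.0)        # p must lie on the boundary
        for j in range(i):
            q = pts[j]
            if inside(circle, q):
                continue
            circle = circle_two(p, q)     # p and q both on the boundary
            for s in pts[:j]:
                if not inside(circle, s):
                    circle = circle_three(p, q, s)
    return circle

def boundary_points(points, circle, tol=1e-7):
    """Points lying on the circle, within an assumed tolerance."""
    return [p for p in points
            if abs(math.hypot(p[0] - circle[0], p[1] - circle[1]) - circle[2]) <= tol]

def peel_circle(points, k=4, m=3, t=5, seed=0):
    """t passes; each removes up to m bounding points from each of k random subsets."""
    rng = random.Random(seed)
    pts = list(points)
    for _ in range(t):
        rng.shuffle(pts)
        doomed = set()
        for g in range(k):
            subset = pts[g::k]            # every k-th point: a random subset
            if subset:
                c = smallest_circle(subset, rng)
                doomed.update(boundary_points(subset, c)[:m])
        pts = [p for p in pts if p not in doomed]
    return smallest_circle(pts, rng)
```

The returned tuple is `(cx, cy, r)`, so the center comes for free. How many passes to run is still the judgment call mentioned above.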
A slower but likely better algorithm is useful in the case that you want to repeatedly remove one or all of the bounding points from the smallest ellipse, recalculate the smallest ellipse, then remove the bounding points again.
You can do this with a tree in which each parent node holds the bounding points (stored as a set for faster removal) of the smallest enclosing ellipse of its children. The number of bounding points should be no more than k (which I'm thinking is 9 for an ellipse, compared to 3 for a circle). Removing a point from the data structure is then O(k log N), as it requires recalculating the smallest circle/ellipse, which is O(k), for each parent that is affected, of which there are O(log N). So removing m points from the data structure should be O(mk log N). You might also want to consider calculating the area of the ellipse after every removed point and removing every point, for a cost of O(Nk log N), until you only have three points left. You could then analyze the area data to determine what ellipse should be used. A simple choice would be to use the ellipse that has the area closest to the average area of all of the ellipses created, but that may not be exactly what you seek. It also might be too slow, in which case I recommend a single pass of the faster algorithm.

This looks like an instance of Robust Ellipse Fitting. Check this paper: Outlier Elimination for Robust Ellipse and Ellipsoid Fitting http://arxiv.org/pdf/0910.4610.pdf.
A first rough and easy solution is provided by the ellipse of inertia (2D version of the ellipsoid of inertia http://en.wikipedia.org/wiki/Moment_of_inertia#Inertia_ellipsoid). Its center is just the centroid, and its axes are given by the eigenvectors/eigenvalues of the 2x2 matrix of inertia.
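A minimal sketch of that idea, using the 2x2 second-moment (covariance) matrix, which has the same principal directions as the inertia matrix. The function name and the `scale` parameter are my own: with `scale = sqrt(2)` the semi-axes come out right for points spread evenly around an elliptical track, while a uniformly filled ellipse would need `scale = 2`. Treat that factor as an assumption to tune for your data.

```python
import math

def inertia_ellipse(points, scale=math.sqrt(2.0)):
    """Return ((cx, cy), semi_major, semi_minor, angle) for a point cloud.

    The center is the centroid; the axes follow the eigenvectors of the
    2x2 second-moment matrix [[sxx, sxy], [sxy, syy]].
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = mean + root
    lam_minor = max(mean - root, 0.0)

    # Direction of the major axis, measured from the x axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), scale * math.sqrt(lam_major), scale * math.sqrt(lam_minor), angle
```

Because every point is weighted equally, a few far outliers will drag both the centroid and the axes, which is where the robust-fitting paper comes in.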

Quadtrees: a common intersect method failing to handle a simple case

I am writing a simple GUI library and am using quadtrees to determine which, if any, objects are interacted with during a mouse event. I was looking through a number of quadtree libraries on github and they all contained a method for adding a rectangular object to a quadtree.
The method, in all cases, simply checked to see if the rectangle intersected with the given quadtree:
return quadtree.x2 >= rect.x1
and quadtree.x1 <= rect.x2
and quadtree.y2 >= rect.y1
and quadtree.y1 <= rect.y2
However, this gives an unwanted result in one of the simplest cases: Imagine a 100x100 square area. I place four 50x50 square objects into the area with coordinates (0,0), (0,50), (50,0), and (50,50). If these objects had been placed into a 100x100 quadtree with a maximum capacity of one object, I would (visually) expect that the first layer of the quadtree would split and that the four resulting trees would each exactly contain one of the squares.
If I use the above method to determine which tree the squares are placed into, though, I find that each object intersects with all four trees. This would cause each of the trees to rapidly split until the maximum depth is reached.
The only way I see to avoid this is to use two checks:
return (quadtree.x2 > rect.x1
and quadtree.x1 < rect.x2
and quadtree.y2 > rect.y1
and quadtree.y1 < rect.y2)
or (quadtree.x2 == rect.x1
and quadtree.x1 == rect.x2
and quadtree.y2 == rect.y1
and quadtree.y1 == rect.y2)
(in the simplest case. Larger objects would have to be viewed within a bounding box since, for example, an object with coordinates (0,0), w=100, h=100 would belong in the upper-left quadtree as well.)
I could also calculate the overlap between the rectangles and the quadtrees to see if it's non-zero.
Am I missing something? It seems like this should be an ideal situation for a quadtree, yet, in most implementations, it's a huge mess.
I wouldn't call this an ideal situation, because the four rectangles overlap by a fractional amount. For example, if we assume a (fictional) floating precision of 10^(-10), every 'point' is actually a small rectangle with 10^(-10) length, and thus the rectangles overlap by 10^(-10). This is why you get the deep tree.
But I also think the tree could be improved with a slightly modified overlap check. With your code, the sub-nodes all overlap by a tiny amount. It would work better if it excluded the minimum (or maximum) values, for example:
return quadtree.x2 >= rect.x1
and quadtree.x1 < rect.x2
and quadtree.y2 >= rect.y1
and quadtree.y1 < rect.y2
So the lower left coordinate of a node is actually outside of that node. This would at least avoid points turning up in several nodes (such as the point (50,50)), and the lower left rectangle would be stored in only one node.
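To see the effect on the 100x100 example from the question, here is a small sketch comparing the inclusive test with a strict "overlap has non-zero area" test (the alternative the question mentions). Rectangles are `(x1, y1, x2, y2)` tuples and all function names are made up for the illustration.

```python
def intersects_inclusive(node, rect):
    """The test from the libraries: touching edges count as intersecting."""
    return (node[2] >= rect[0] and node[0] <= rect[2]
            and node[3] >= rect[1] and node[1] <= rect[3])

def overlaps_with_area(node, rect):
    """Strict test: true only if the overlap has non-zero area."""
    return (node[2] > rect[0] and node[0] < rect[2]
            and node[3] > rect[1] and node[1] < rect[3])

def quadrants_hit(test, rect, quadrants):
    """Number of quadrants the rectangle would be inserted into."""
    return sum(1 for q in quadrants if test(q, rect))

# The four quadrants of a 100x100 tree, and four 50x50 squares that tile it.
QUADRANTS = [(0, 0, 50, 50), (50, 0, 100, 50), (0, 50, 50, 100), (50, 50, 100, 100)]
SQUARES = list(QUADRANTS)

inclusive_counts = [quadrants_hit(intersects_inclusive, s, QUADRANTS) for s in SQUARES]
strict_counts = [quadrants_hit(overlaps_with_area, s, QUADRANTS) for s in SQUARES]
```

With the inclusive test every square lands in all four quadrants; with the strict test each lands in exactly one. The strict test does drop zero-area shapes such as points and lines, so those would still need a half-open rule like the one in this answer.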

How to divide tiny double precision numbers correctly without precision errors?

I'm trying to diagnose and fix a bug which boils down to X/Y yielding an unstable result when X and Y are small:
In this case, both cx and patharea increase smoothly. Their ratio is a smooth asymptote at high numbers, but erratic for "small" numbers. The obvious first thought is that we're reaching the limit of floating point accuracy, but the actual numbers themselves are nowhere near it. ActionScript "Number" types are IEEE 754 double-precision floats, so should have 15 decimal digits of precision (if I read it right).
Some typical values of the denominator (patharea):
0.0000000002119123
0.0000000002137313
0.0000000002137313
0.0000000002155502
0.0000000002182787
0.0000000002200977
0.0000000002210072
And the numerator (cx):
0.0000000922932995
0.0000000930474444
0.0000000930582124
0.0000000938123574
0.0000000950458711
0.0000000958000159
0.0000000962901528
0.0000000970442977
0.0000000977984426
Each of these increases monotonically, but the ratio is chaotic as seen above.
At larger numbers it settles down to a smooth hyperbola.
So, my question: what's the correct way to deal with very small numbers when you need to divide one by another?
I thought of multiplying numerator and/or denominator by 1000 in advance, but couldn't quite work it out.
The actual code in question is the recalculate() function here. It computes the centroid of a polygon, but when the polygon is tiny, the centroid jumps erratically around the place, and can end up a long distance from the polygon. The data series above are the result of moving one node of the polygon in a consistent direction (by hand, which is why it's not perfectly smooth).
This is Adobe Flex 4.5.
I believe the problem most likely is caused by the following line in your code:
sc = (lx*latp-lon*ly)*paint.map.scalefactor;
If your polygon is very small, then lx and lon are almost the same, as are ly and latp. They are both very large compared to the result, so you are subtracting two numbers that are almost equal.
To get around this, we can make use of the fact that:
x1*y2-x2*y1 = (x2+(x1-x2))*y2 - x2*(y2+(y1-y2))
= x2*y2 + (x1-x2)*y2 - x2*y2 - x2*(y1-y2)
= (x1-x2)*y2 - x2*(y1-y2)
So, try this:
dlon = lx - lon;
dlat = ly - latp;
sc = (dlon*latp-lon*dlat)*paint.map.scalefactor;
The value is mathematically the same, but the terms are an order of magnitude smaller, so the error should be an order of magnitude smaller as well.
Jeffrey Sax has correctly identified the basic issue - loss of precision from combining terms that are (much) larger than the final result.
The suggested rewriting eliminates part of the problem - apparently sufficient for the actual case, given the happy response.
You may find, however, that if the polygon becomes again (much) smaller and/or farther away from the origin, inaccuracy will show up again. In the rewritten formula the terms are still quite a bit larger than their difference.
Furthermore, there's another 'combining-large&comparable-numbers-with-different-signs'-issue in the algorithm. The various 'sc' values in subsequent cycles of the iteration over the edges of the polygon effectively combine into a final number that is (much) smaller than the individual sc(i) are. (if you have a convex polygon you will find that there is one contiguous sequence of positive values, and one contiguous sequence of negative values, in non-convex polygons the negatives and positives may be intertwined).
What the algorithm is doing, effectively, is computing the area of the polygon by adding areas of triangles spanned by the edges and the origin, where some of the terms are negative (whenever an edge is traversed clockwise, viewing it from the origin) and some positive (anti-clockwise walk over the edge).
You get rid of ALL the loss-of-precision issues by defining the origin at one of the polygon's corners, say (lx,ly), and then adding the triangle surfaces spanned by the edges and that corner (so: transforming lon to (lon-lx) and latp to (latp-ly)). As an additional bonus, you need to process two fewer triangles, because the edges that link to the chosen origin corner obviously yield zero surfaces.
For the area-part that's all. For the centroid-part, you will of course have to "transform back" the result to the original frame, i.e. adding (lx,ly) at the end.
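As a rough sketch of that local-origin idea (in Python rather than ActionScript, and with names of my own choosing, not the ones from the recalculate() function):

```python
def polygon_centroid(vertices):
    """Centroid and signed area of a simple polygon given as [(x, y), ...].

    All coordinates are first shifted so that vertex 0 is the origin, which
    keeps the cross products the same size as the polygon itself.
    """
    x0, y0 = vertices[0]
    pts = [(x - x0, y - y0) for x, y in vertices]

    twice_area = sum_x = sum_y = 0.0
    # Edges touching vertex 0 span zero-area triangles, so skip them.
    for i in range(1, len(pts) - 1):
        (x1, y1), (x2, y2) = pts[i], pts[i + 1]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        sum_x += (x1 + x2) * cross
        sum_y += (y1 + y2) * cross

    if twice_area == 0.0:
        raise ValueError("degenerate polygon: zero area")

    # Transform back to the original frame by adding (x0, y0).
    cx = x0 + sum_x / (3.0 * twice_area)
    cy = y0 + sum_y / (3.0 * twice_area)
    return cx, cy, twice_area / 2.0
```

The subtraction of nearby coordinates is exact (or very nearly so) in floating point, so the only rounding left is in products of small numbers, and the centroid can no longer land far outside the polygon.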

Point closest to combined geometric shapes (compound shape)

I have a single point and a set of shapes. I need to know if the point is contained within the compound shape of those shapes. That is, where all of the shapes intersect.
But that is the easy part.
If the point is outside the compound shape I need to find the position within that compound shape that is closest to the point.
These shapes can be of the type:
square
circle
ring (circle with another circle cut out of the center)
inverse circle (basically just the circular hole and a never-ending fill outside that hole, or to the end of the canvas if there must be a limit to its size)
part of circle (as in a pie chart)
part of ring (as above but
line
The example below has an inverted circle (the biggest circle with grey surrounding it), a ring (topleft) a square and a line.
If we don't consider the line, then the orange part is the shape to constrain to. If the line is taken into account then the saturated orange part of the line is the shape to constrain to.
The black small dots represent the points that need to be constrained. The blue dots represent the desired result. (a 1, b 2 etc.)
Point "f" has no corresponding constrained result, since it is already in the orange area.
For the purpose of this example, only point "e" is constrained to the line; all others are constrained to the orange area.
If none of the shapes intersect, then the point cannot be constrained. If the constraint consists of two lines that cross each other, then every point will be constrained to the same position (the exact position where the lines cross).
I have found methods that come close to this, but none that I can combine to produce the above functionality.
Some similar questions that I found:
Points within a semi circle
What algorithm can I use to determine points within a semi-circle?
Point closest to MovieClip
Flash: Closest point to MovieClip
Closest point through Minkowski Sum (this will work if I can convert the compound shape to polygons)
http://www.codezealot.org/archives/153
Select edge of polygon closest to point (similar to above)
For a point in an irregular polygon, what is the most efficient way to select the edge closest to the point?
PS: I noticed that the orange area may actually come across as yellow on some screens. It's the colored area in any case.
This isn't much of an answer, but it's a bit too long to fit into a comment ...
It's tempting to think, and therefore to advise you, to find the nearest point in each of the shapes to the point of interest, and to find the nearest of those nearest points.
BUT
The area you are interested in is constructed by union, intersection and difference of other areas and there will, therefore, be no general relationship between the closest points of the original shapes and the closest point of the combined shape. If you understand what I mean. For example, while the closest point of A union B is the closest of the set {closest point of A, closest point of B}, the closest point of A intersection B is not a simple function of that same set; at least not for the general case.
I suggest, therefore, that you are going to have to compute the (complex) shape which represents the area of interest and use one of the algorithms you've already discovered to find the closest point to your point of interest.
I look forward to someone much better versed in computational geometry proving me wrong.
Let's call I the intersection of all the shapes, C the contour of I, p the point you want to constrain and r the result point. We have:
If p is in I, then r = p
If p is not in I, then r is in C. So r is the nearest point in C to p.
So I think what you should do is the following:
1. If p is inside all of the shapes, return p.
2. Compute the contour C of the intersection of all the shapes; it is defined by a list of parts (segments, arcs, ...).
3. Find the nearest point to p in every part of C (computed in 2.) and return the nearest point among them to p.
I've discussed this question at length with my brother, and together we came to conclude that any resulting point will always lie on either the point where two shapes intersect, or where a shape intersects with the line from that shape perpendicular to the original point.
In the case of a circular shape constraint, the perpendicular line equals the line to its center. In the case of a line shape constraint, the perpendicular line is (of course) the line perpendicular to itself. In the case of a rectangle, the perpendicular line is the line perpendicular to the closest edge.
(And the same, theoretically, for complex polygon constraints.)
So a new approach (that I'll have to test still) will be to:
calculate all intersecting (with a shape constraint or with the perpendicular line from the original point to the shape constraint) points
keep only those that are valid: that lie within (comply with) all constraints
select the one closest to the original point
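Restricted to plain circles (filled discs), those three steps might look like the sketch below. It is written in Python for brevity; the function names, the `(cx, cy, r)` tuple layout and the tolerance are assumptions of the sketch, and rings, pie slices, squares and lines would each need their own projection and intersection routines.

```python
import math

EPS = 1e-9  # assumed tolerance for "complies with the constraint"

def inside_all(p, discs):
    return all(math.hypot(p[0] - cx, p[1] - cy) <= r + EPS for cx, cy, r in discs)

def project_to_circle(p, disc):
    """Where the line from p to the center crosses the circle, on p's side."""
    cx, cy, r = disc
    d = math.hypot(p[0] - cx, p[1] - cy)
    if d == 0.0:
        return None  # p is the center: no unique direction
    return (cx + r * (p[0] - cx) / d, cy + r * (p[1] - cy) / d)

def circle_intersections(a, b):
    """Intersection points of two circle outlines (possibly none)."""
    ax, ay, ar = a
    bx, by, br = b
    d = math.hypot(bx - ax, by - ay)
    if d == 0.0 or d > ar + br or d < abs(ar - br):
        return []
    t = (ar * ar - br * br + d * d) / (2.0 * d)   # distance from a's center to the chord
    h = math.sqrt(max(0.0, ar * ar - t * t))      # half the chord length
    mx, my = ax + t * (bx - ax) / d, ay + t * (by - ay) / d
    ox, oy = -h * (by - ay) / d, h * (bx - ax) / d
    return [(mx + ox, my + oy), (mx - ox, my - oy)]

def constrain(p, discs):
    """Closest point to p inside every disc, or None if they share no area."""
    if inside_all(p, discs):
        return p
    candidates = []
    for i, a in enumerate(discs):
        q = project_to_circle(p, a)
        if q is not None:
            candidates.append(q)
        for b in discs[i + 1:]:
            candidates.extend(circle_intersections(a, b))
    valid = [q for q in candidates if inside_all(q, discs)]
    if not valid:
        return None
    return min(valid, key=lambda q: math.hypot(q[0] - p[0], q[1] - p[1]))
```

For two unit discs centered at (0,0) and (1,0), a point high above the lens snaps to the lens's upper corner, while a point far to the right snaps to (1,0) on the first disc's outline.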
If this works, then one more optimization could be to determine first, which intersecting points are nearest and check if they are valid, and then work outward away from the original point until a valid one is found.
If this does not work, I will have another look at the polygon clipping method. For that approach I've come across this useful post:
Compute union of two arbitrary shapes
where clipping complex polygons is made much easier through http://code.google.com/p/gpcas/
The method holds true for all the cases (all points and their results) above, and also for a number of other scenarios that we tested (on paper).
I will try a live version tomorrow at work.

Rotate a circle around another circle

Short question: Given a point P and a line segment L, how do I find the point (or points) on L that are exactly X distance from P, if it is guaranteed that there is such a point?
The longer way to ask this question is with an image. Given two circles, one static and one dynamic, if you move the dynamic one towards the static one in a straight line, it's pretty easy to determine the point of contact (see 1, the green dot).
Now, if you move the dynamic circle towards the static circle at an angle, determining the point of contact is much more difficult, (see 2, the purple dot). That part I already have done. What I want to do is, after determining the point of contact, decrease the angle and determine the new point of contact (see 3, 4, the red dot).
In #4, you can see the angle is decreased by less than half, and the new point of contact is half-way between the straight-line point and the original point. In #7, you can see the angle is bisected, but the new point of contact moves much farther than half way back towards the straight-line point.
In my case, I always want to decrease the angle to 5/6ths its original value, but the original angle and distance between the circles are variable. The circles are all the same radius. The actual data I need after decreasing the angle is the vector between the new center of the dynamic circle and the static circle, that is, the blue line in 3, 4, 6, and 7, if that makes the calculation any easier.
So far, I know I have to move the dynamic circle along the line that the purple circle is a center of, towards the center of the static circle. Then the circle has to move directly back towards the original position of the dynamic circle. The hard part is knowing exactly how far back it has to move so that it's just touching the other circle.
To answer your short question, if you are on the Cartesian plane, then find the equation of the line L is sitting on (given the two endpoints of L, this is simple). Find the equation of the perpendicular to said line, which passes through P (this is done by taking the negative reciprocal of the slope, plugging in P's x and y values, and solving for the intercept). Then find the point where the two perpendicular lines intersect by using their equations as a single system of equations (with x's and y's equal). Then find the distance between the point of intersection and the point P, which is one leg of a right triangle. Finally, with that distance and the distance X you are given, use the Pythagorean theorem to find the length of the other leg of the triangle. Now the point you are looking for is a point on L, and also on the line on which L sits. So using the distance you just obtained, the intersection point you found before, and the equation of L's line, you can find the desired point's coordinates. There can be at most 2 such points, so all you have to test for is whether the coordinates of the points found are actually on L, or beyond L but still on its line. Sorry for the long answer and if you wanted a geometric explanation rather than an algebraic one.
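The same construction, sketched in Python with vectors instead of slopes so that vertical segments need no special case. The function name and the tolerance are mine.

```python
import math

def points_at_distance(a, b, p, dist, tol=1e-9):
    """Points on segment ab that lie exactly `dist` away from p (0, 1 or 2)."""
    ax, ay = a
    bx, by = b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:  # degenerate segment: a single point
        return [a] if abs(math.hypot(p[0] - ax, p[1] - ay) - dist) <= tol else []

    ux, uy = (bx - ax) / length, (by - ay) / length   # unit direction of L
    s = (p[0] - ax) * ux + (p[1] - ay) * uy           # position of the foot along L
    fx, fy = ax + s * ux, ay + s * uy                 # foot of the perpendicular from p
    h = math.hypot(p[0] - fx, p[1] - fy)              # first leg of the triangle
    if h > dist + tol:
        return []                                     # the line never gets that close

    off = math.sqrt(max(0.0, dist * dist - h * h))    # second leg (Pythagoras)
    offsets = [0.0] if off <= tol else [-off, off]
    # Keep only the candidates that fall on the segment itself.
    return [(ax + (s + o) * ux, ay + (s + o) * uy)
            for o in offsets if -tol <= s + o <= length + tol]
```

For the segment from (0,0) to (10,0) with P = (5,3) and X = 5, the foot is (5,0), the legs are 3 and 4, and the two results are (1,0) and (9,0).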
Draw a circle with the same centre as the stationary circle and the radius of the sum of both radii. There are two intersections with the translation line of the moving circle's centre. The place of the moving circle's center at the time of contact is the closer of those two intersections.
Let the ends of your segment be A and B, and the center of your stationary circle be C. Let the radius of both circles be r. Let the center of the moving circle at the moment of collision be D. We have a triangle ACD, of which we know: the distance AC, because it is constant, the angle DAC, because that's what you are changing, and the distance CD, which is exactly 2r. Theoretically, two sides and angle should let you get all the rest of a triangle...
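To spell out the "two sides and an angle" step, here is the law of cosines applied to triangle ACD. The symbols d, θ and s are shorthand I'm introducing: d = |AC|, θ = ∠DAC, s = |AD|, with |CD| = 2r as above.

```latex
\begin{align*}
(2r)^2 &= s^2 + d^2 - 2\,s\,d\cos\theta \\
s^2 - 2\,d\cos\theta\; s + \left(d^2 - 4r^2\right) &= 0 \\
s &= d\cos\theta \pm \sqrt{4r^2 - d^2\sin^2\theta}
\end{align*}
```

Assuming the moving circle starts outside the static one (d > 2r), the first contact is the smaller root, s = d cos θ − sqrt(4r² − d² sin² θ), and a contact exists only when d sin θ ≤ 2r. With u the unit vector along the direction of travel, D = A + s·u, and the vector you are after is C − D.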

Finding a free area in the stage

I'm drawing rectangles at random positions on the stage, and I don't want them to overlap.
So for each rectangle, I need to find a blank area to place it.
I've thought about trying a random position, verify if it is free with
private function containsRect(r:Rectangle):Boolean {
    var free:Boolean = true;
    // r is free only if it intersects none of the existing children
    for (var i:int = 0; i < numChildren; i++)
        free &&= !getChildAt(i).getBounds(this).intersects(r);
    return free;
}
and in case it returns false, to try with another random position.
The problem is that if there is no free space, I'll be stuck trying random positions forever.
Is there an elegant solution to this?
Let S be the area of the stage. Let A be the area of the smallest rectangle we want to draw. Let N = S/A
One possible deterministic approach:
When you draw a rectangle on an empty stage, this divides the stage into at most 4 regions that can fit your next rectangle. When you draw your next rectangle, one or two regions are split into at most 4 sub-regions (each) that can fit a rectangle, etc. You will never create more than N regions, where S is the area of your stage, and A is the area of your smallest rectangle. Keep a list of regions (unsorted is fine), each represented by its four corner points, and each labeled with its area, and use weighted-by-area reservoir sampling with a reservoir size of 1 to select a region with probability proportional to its area in at most one pass through the list. Then place a rectangle at a random location in that region. (Select a random point from bottom left portion of the region that allows you to draw a rectangle with that point as its bottom left corner without hitting the top or right wall.)
If you are not starting from a blank stage then just build your list of available regions in O(N) (by re-drawing all the existing rectangles on a blank stage in any order, for example) before searching for your first point to draw a new rectangle.
Note: You can change your reservoir size to k to select the next k rectangles all in one step.
Note 2: You could alternatively store available regions in a tree with each edge weight equaling the sum of areas of the regions in the sub-tree over the area of the stage. Then to select a region in O(logN) we recursively select the root with probability area of root region / S, or each subtree with probability edge weight / S. Updating weights when re-balancing the tree will be annoying, though.
Runtime: O(N)
Space: O(N)
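The selection step of the deterministic approach (weighted reservoir sampling with a reservoir of size 1) could be sketched like this. Regions are stored as `(x, y, width, height)` tuples rather than four corner points, and the function names are mine. Keeping the region list up to date after each placement is not shown.

```python
import random

def pick_region(regions, rng=random):
    """One pass over the list; each region is chosen with probability area / total."""
    chosen, total = None, 0.0
    for region in regions:
        area = region[2] * region[3]
        if area <= 0:
            continue
        total += area
        # Replace the current choice with probability area / (running total).
        if rng.random() * total < area:
            chosen = region
    return chosen

def place_in_region(region, w, h, rng=random):
    """Random top-left corner for a w x h rectangle that stays inside the region."""
    x, y, rw, rh = region
    if w > rw or h > rh:
        return None  # the rectangle does not fit here
    return (x + rng.uniform(0, rw - w), y + rng.uniform(0, rh - h))
```

The first region with positive area is always taken, and every later one replaces it with probability equal to its share of the area seen so far, which works out to area-proportional selection without knowing the total in advance.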
One possible randomized approach:
Select a point at random on the stage. If you can draw one or more rectangles that contain the point (not just one that has the point as its bottom left corner), then return a randomly positioned rectangle that contains the point. It is possible to position the rectangle without bias with some subtleties, but I will leave this to you.
At worst there is one space exactly big enough for our rectangle and the rest of the stage is filled. So this approach succeeds with probability > 1/N, or fails with probability < 1-1/N. Repeat N times. We now fail with probability < (1-1/N)^N < 1/e. By fail we mean that there is a space for our rectangle, but we did not find it. By succeed we mean we found a space if one existed. To achieve a reasonable probability of success we repeat either Nlog(N) times for 1/N probability of failure, or N² times for 1/e^N probability of failure.
Summary: Try random points until we find a space, stopping after NlogN (or N²) tries, in which case we can be confident that no space exists.
Runtime: O(NlogN) for high probability of success, O(N²) for very high probability of success
Space: O(1)
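A simplified sketch of the capped retry loop, in Python. Note that it samples the rectangle's top-left corner directly instead of the "point contained in the rectangle" trick described here, so the 1/N bound does not carry over exactly; the N log N cap is borrowed only as a stopping rule. Here A is taken to be the area of the rectangle being placed, and all names are assumptions.

```python
import math
import random

def overlaps(a, b):
    """True if rectangles (x, y, w, h) share positive area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])

def find_free_spot(stage_w, stage_h, w, h, placed, rng=random):
    """Try random positions; give up after about N log N attempts."""
    if w > stage_w or h > stage_h:
        return None
    n = max(2, math.ceil((stage_w * stage_h) / (w * h)))  # N = S / A
    tries = math.ceil(n * math.log(n))
    for _ in range(tries):
        x = rng.uniform(0, stage_w - w)
        y = rng.uniform(0, stage_h - h)
        candidate = (x, y, w, h)
        if not any(overlaps(candidate, r) for r in placed):
            return (x, y)
    return None  # probably no room left
```

Returning `None` is the signal to stop adding rectangles, which removes the risk of looping forever on a full stage.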
You can simplify things with a transformation. If you're looking for a valid place to put your LxH rectangle, you can instead grow all of the previous rectangles L units to the right, and H units down, and then search for a single point that doesn't intersect any of those. This point will be the lower-right corner of a valid place to put your new rectangle.
Next apply a scan-line sweep algorithm to find areas not covered by any rectangle. If you want a uniform distribution, you should choose a random y-coordinate (assuming you sweep down) weighted by free area distribution. Then choose a random x-coordinate uniformly from the open segments in the scan line you've selected.
I'm not sure how elegant this would be, but you could set up a maximum number of attempts. Maybe 100?
Sure you might still have some space available, but you could trigger the "finish" event anyway. It would be like when tween libraries snap an object to the destination point just because it's "close enough".
HTH
One possible check you could make to determine if there is enough space would be to check how much area the current set of rectangles is taking up. If the amount of area left over is less than the area of the new rectangle then you can immediately give up and bail out. I don't know what information you have available to you, or whether the rectangles are being laid down in a regular pattern, but if so you may be able to vary the check to see if there is obviously not enough space available.
This may not be the most appropriate method for you, but it was the first thing that popped into my head!
Assuming you define the dimensions of the rectangle before trying to draw it, I think something like this might work:
Establish a grid of possible centre points across the stage for the candidate rectangle. So for a 6x4 rectangle your first point would be at (3, 2), then (3 + 6 * x, 2 + 4 * y). If you can draw a rectangle between the four adjacent points then a possible space exists.
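for (x = 0; x < stage.width / rect.width - 1; x++)
  for (y = 0; y < stage.height / rect.height - 1; y++) {
    left = rect.width / 2 + x * rect.width;   // grid centre point in stage coordinates
    top = rect.height / 2 + y * rect.height;
    if (can_draw_rectangle_at([left, top], [left + rect.width, top + rect.height]))
      return true;
  }
return false;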
This doesn't tell you where you can draw it (although it should be possible to build a list of the possible drawing areas), just that you can.
I think that the only efficient way to do this with what you have is to maintain a 2D boolean array of open locations. Have the array of sufficient size such that the drawing positions still appear random.
When you draw a new rectangle, zero out the corresponding rectangular piece of the array. Then checking for a free area is constant^H^H^H^H^H^H^H time. Oops, that means a lookup is O(nm) time, where n is the length, m is the width. There must be a range based solution, argh.
Edit2: Apparently the answer is here but in my opinion this might be a bit much to implement on Actionscript, especially if you are not keen on the geometry.
Here's the algorithm I'd use
Put down N number of random points, where N is the number of rectangles you want
iteratively increase the dimensions of rectangles created at each point N until they touch another rectangle.
You can constrain the way that the initial points are put down if you want to have a minimum allowable rectangle size.
If you want all the space covered with rectangles, you can then incrementally add random points to the remaining "free" space until there is no area left uncovered.