Is the performance of SPATIAL geometry indices dependent on the size and density of geometry shapes? - mysql

Spatial Indexes
Given a spatial index, is the index's utility, that is to say the overall performance of the index, only as good as the geometries it contains?
For example, if I were to take a million geometry values and insert them into a table so that their points are densely located relative to one another, would the index perform any differently than it would for identical geometry shapes whose relative locations are significantly more sparse?
Question 1
For example, take these two geometry shapes.
Situation 1
LINESTRING(0 0,1 1,2 2)
LINESTRING(1 1,2 2,3 3)
Geometrically they are identical, but their coordinates are offset by a single unit. Imagine this repeated one million times.
Now take this situation,
Situation 2
LINESTRING(0 0,1 1,2 2)
LINESTRING(1000000 1000000,1000001 1000001,1000002 1000002)
LINESTRING(2000000 2000000,2000001 2000001,2000002 2000002)
LINESTRING(3000000 3000000,3000001 3000001,3000002 3000002)
In the above example:
the lines' dimensions are identical to those in situation 1,
the lines have the same number of points,
the lines are identical in size.
However,
the difference is that the lines are massively further apart.
Why is this important to me?
The reason I ask is that I want to know whether I should remove as much precision from my input geometries as I possibly can, and reduce their density and closeness to one another as far as my application allows, without losing accuracy.
Question 2
This question is similar to the first, but instead of asking about shapes that are spatially close to one another, it asks whether the shapes themselves should be reduced to the smallest possible shape that describes what the application requires.
For example, if I were to use a SPATIAL index on a geometry datatype to provide data on dates.
If I wanted to store a date range between two dates, I could use a datetime data type in MySQL. However, what if I wanted to use a geometry type instead, conveying the date range by converting each individual date into a unix_timestamp()?
For example:
Date("1st January 2011") to Timestamp = 1293861600
Date("31st January 2011") to Timestamp = 1296453600
Now, I could create a LINESTRING based on these two integers.
LINESTRING(1293861600 0,1296453600 1)
If my application is actually only concerned with days, and the number of seconds isn't important for date ranges at all, should I refactor my geometries so that they are reduced to their smallest possible size while still fulfilling what they need?
So instead of 1293861600, I would use 1293861600 / (3600 * 24), which happens to be 14975.25.
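To make that concrete, here is a rough sketch of what I have in mind (the table name and engine choice are just illustrative; MyISAM because SPATIAL indexes on InnoDB only arrived in MySQL 5.7). It stores the day-scaled range as a LINESTRING and finds overlapping ranges with MBRIntersects():
CREATE TABLE date_ranges (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  range_geom LINESTRING NOT NULL,
  SPATIAL INDEX (range_geom)
) ENGINE=MyISAM;
-- 1st Jan 2011 .. 31st Jan 2011, scaled to days rather than seconds
INSERT INTO date_ranges (range_geom)
VALUES (GeomFromText('LINESTRING(14975.25 0, 15005.25 1)'));
-- find stored ranges that overlap a query window, also expressed in days
SELECT id, AsText(range_geom)
FROM date_ranges
WHERE MBRIntersects(range_geom, GeomFromText('LINESTRING(14980 0, 14990 1)'));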
Can someone help fill in these gaps?

When inserting a new entry, the engine chooses the MBR which would be minimally extended.
By "minimally extended", the engine can mean either "area extension" or "perimeter extension", the former being default in MySQL.
This means that as long as your nodes have non-zero area, their absolute sizes do not matter: the larger MBRs remain larger and the smaller ones remain smaller, and ultimately all nodes will end up in the same MBRs.
These articles may be of interest to you:
Overlapping ranges in MySQL
Join on overlapping date ranges
As for density, the MBRs are recalculated on page splits, and there is a high chance that points lying too far from the main cluster will be moved into their own MBR on the first split. That MBR would be large, but within a few iterations it becomes the parent of all the outlying points.
This will decrease the search time for the outlying points and increase the search time for the cluster points by one page seek.

Related

How would I use DynamoDB to move this usage from my mysql db to nosql?

I'm currently experiencing issues with a service I've developed that relies heavily on large payload reads from the DB (500 rows). I'm seeing huge throughput, in the range of 35,000+ requests per minute at up to 500 rows per request going through the DB, and it is not scaling at all.
The data in question is retrieved primarily by a latitude / longitude WHERE clause that checks whether the latitude and longitude of the row fall between a minimum latitude/longitude coordinate and a maximum latitude/longitude coordinate. This effectively checks whether the row in question is within the bounding box created by the min / max passed into the WHERE clause.
This is the where portion of the query we rely on for reference.
s.latitude > {minimumLatitude} AND
s.longitude > {minimumLongitude} AND
s.latitude < {maximumLatitude} AND
s.longitude < {maximumLongitude}
So, with that said: MySQL is handling this fine, but I'm presently on RDS and having to rely heavily on an r3.8XL master and three r3.8XL read replicas just to get the throughput capacity I need to keep the application from slowing down and pushing the CPU to 100% usage.
Obviously, with how heavy the payload is and how frequently it's queried, this data needs to be moved into a more fitting service, something like ElastiCache or DynamoDB.
I've been leaning towards DynamoDB, but my only option there seems to be using SCAN, as there is no useful primary key I can associate with my data to reduce the result set, since the query relies on calculating whether the latitude / longitude of a point falls within a bounding box. DynamoDB filters on attributes would work great, as they support the basic conditions needed; however, on a table that would hold 250,000+ rows and grow by nearly 200,000 a day or more, that would be unusably expensive.
Another option to reduce the result set was to use a map-binning technique to associate a map region with the data, use that region as the primary key in Dynamo, and then filter further on the latitude / longitude attributes. This wouldn't be ideal though; we'd prefer to get data within specific bounds and not have excess redundant data passed back, since the min/max lat/lng can overlap multiple bins and would then pull back data from bins that is largely not needed.
At this point I'm continuously having to deploy read replicas to keep the service up and it's definitely not ideal. Any help would be greatly appreciated.
You seem to be overlooking what would be the obvious first thing to try... indexing the data using an index structure suited to the nature of the data... in MySQL.
B-trees are of limited help since you still have to examine all possible matches in one dimension after eliminating impossible matches in the other.
Aside: Assuming you already have an index on (lat,long), you will probably be able to gain some short-term performance improvement by adding a second index with the columns reversed (long,lat). Try this on one of your replicas¹ and see if it helps. If you have no indexes at all, then of course that is your first problem.
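For instance, assuming the table really is called s with latitude and longitude columns (names guessed from the WHERE clause above, so treat them as placeholders), the extra index might be added like this:
ALTER TABLE s ADD INDEX idx_long_lat (longitude, latitude);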
Now, the actual solution. This requires MySQL 5.7 because before then, the feature works with MyISAM but not with InnoDB. RDS doesn't like it at all if you try to use MyISAM.
This effectively checks whether the row in question is within the bounding box created by the min / max passed into the WHERE clause.
What you need is an R-Tree index. These indexes actually store the points (or lines, polygons, etc.) in an order that understands and preserves their proximity in more than one dimension... proximate points are closer in the index and minimum bounding rectangles ("bounding box") are easily and quickly identified.
The MySQL spatial extensions support this type of index.
There's even an MBRContains() function that compares the points in the index to the points in the query, using the R-Tree to find all the points contained in the MBR you're searching. Unlike the usual optimization rule that you should not use column names as function arguments in the WHERE clause to avoid triggering a table scan, this function is an exception -- the optimizer does not actually evaluate the function against every row but uses the meaning of the expression to evaluate it against the index.
There's a bit of a learning curve needed in order to understand the design of the spatial extensions but once you understand the principles, it falls into place nicely and the performance will exceed your expectations. You'll want a single column of type GEOMETRY and you'll want to store lat and long together in that one indexed column as a POINT.
To safely test this without disruption, make a replica, then detach it from your master, promoting it to become its own independent master, and upgrade it to 5.7 if necessary. Create a new table with the same structure plus a GEOMETRY column and a SPATIAL KEY, then populate it with INSERT ... SELECT.
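A sketch of what that might look like (the table and column names are assumptions based on the query fragment above, not your actual schema), assuming MySQL 5.7+ so the SPATIAL index works on InnoDB:
CREATE TABLE s_spatial (
  id        BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  latitude  DOUBLE NOT NULL,
  longitude DOUBLE NOT NULL,
  coords    POINT NOT NULL,   -- longitude as X, latitude as Y
  SPATIAL KEY idx_coords (coords)
) ENGINE=InnoDB;
INSERT INTO s_spatial (id, latitude, longitude, coords)
SELECT id, latitude, longitude, POINT(longitude, latitude)
FROM s;
-- bounding-box search via the R-tree instead of two B-tree range scans
-- (the polygon corners here are literal placeholder coordinates)
SELECT id, latitude, longitude
FROM s_spatial
WHERE MBRContains(
  ST_GeomFromText('POLYGON((-0.5 51.25, 0.3 51.25, 0.3 51.7, -0.5 51.7, -0.5 51.25))'),
  coords);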
Note that DynamoDB scan is a very "expensive" operation. On a table I was testing against just yesterday, a single scan consistently cost 112 read units each time it was run, regardless of the number of records, presumably because a scan always reads 1MB of data, which is 256 blocks of 4K (definition of a read unit) but not with strong consistency (so, half the cost). 1 MB ÷ 4KB ÷ 2 = 128 which I assume is close enough to 112 that this explains that number.
¹ It's a valid, supported operation to add an index to a MySQL replica but not the master, even in RDS. You need to temporarily make the replica writable by creating a new parameter group identical to the existing one, and then flipping read_only to 0 in that group. Associate the replica to the new parameter group, then wait for the state to change from applying to in-sync, log in to the replica and add the index. Then put the parameter group back when done.

Dealing with clusters when searching for points on map using mysql

I've found various questions with solutions similar to this problem but nothing quite on the money so far. Very grateful for any help.
I have a mysql (v.5.6.10) database with a single table called POSTS that stores millions upon millions of rows of lat/long points of interest on a map. Each point is classified as one of several different types. Each row is structured as id, type, coords:
id an unsigned bigint + primary key. This is auto incremented for each new row that is inserted.
type an unsigned tinyint used to encode the type of the point of interest.
coords a mysql geospatial POINT datatype representing the lat/long of the point of interest.
There is a SPATIAL index on 'coords'.
I need to find an efficient way to query the table and return up to X of the most recently-inserted points within a radius ("R") of a specific lat/long position ("Position"). The database is very dynamic so please assume that the data is radically different each time the table is queried.
If X is infinite, the problem is trivial. I just need to execute a query something like:
SELECT id, type, AsText(coords) FROM POSTS WHERE MBRContains(GeomFromText(BoundingBox), coords)
Where 'BoundingBox' is a mysql POLYGON datatype that perfectly encloses a circle of radius R from Position. Using a bounding box is, of course, not a perfect solution but this is not important for the particular problem that I'm trying to solve. I can order the results using "ORDER BY ID DESC" to retrieve and process the most-recently-inserted points first.
If X is less than infinite then I just need to modify the above to:
SELECT id, type, AsText(coords) FROM POSTS WHERE MBRContains(GeomFromText(BoundingBox), coords) ORDER BY id DESC LIMIT X
The problem that I am trying to solve is how do I obtain a good representative set of results from a given region on the map when the points in that region are heavily clustered (for example, within cities on the map search region). For example:
In the example above, I am standing at X and searching for the 5 most-recently-inserted points of type black within the black-framed bounding box. If these points were all inserted in the cluster in the bottom right hand corner (let's assume that cluster is London) then my set of results will not include the black point that is near the top right of the search region. This is a problem for my application as I do not want users to be given the impression that there are no points of interest outside any areas where points are clustered.
I have considered a few potential solutions but I can't find one that works efficiently when the number of rows is huge (10s of millions). Approaches that I have tried so far include:
Dividing the search region into S number of squares (i.e., turning it into a grid) and searching for up to x/S points within each square - i.e., executing a separate mysql query for each square in the grid. This works OK for a small number of rows but becomes inefficient when the number of rows is massive as you need to divide the region into a large number of squares for the approach to work effectively. With only a small number of squares, you cannot guarantee that each square won't contain a densely populated cluster. A large number of squares means a large number of mysql searches which causes things to chug.
Adding a column to each row in the table that stores the distance to the nearest neighbour for each point. The nearest neighbour distance for a given point is calculated when the point is inserted into the table. With this structure, I can then order the search results by the nearest neighbour distance column so that any points that are in clusters are returned last. This solution only works when I'm searching for ALL points within the search region. For example, consider the situation in the diagram shown above. If I want to find the 5 most-recently-inserted points of type green, the nearest neighbour distance that is recorded for each point will not be correct. Recalculating these distances for each and every query is going to be far too expensive, even using efficient algorithms like KD trees.
In fact, I can't see any approach that requires pre-processing of data in table rows (or, put another way, 'touching' every point in the relevant search region dataset) to be viable when the number of rows gets large. I have considered algorithms like k-means / DBSCAN, etc. and I can't find anything that will work with sufficient efficiency given the use case explained above.
Any pearls? My intuition tells me this CAN be solved but I'm stumped so far.
Post-processing seems more effective in that case. Fetch the last X points of a given type. Check whether there is clustering, for example too many points too close together relative to the distance from your point of view. Drop the oldest of them (or those which are very close together; your data may be referencing the same POI); how many to drop is up to you. Fetch the next X points and see whether any of them fall outside the cluster, or calculate a value for each point based on remoteness and recency and discard points according to that value.

Best practice for storing GPS data of a tracking app in mysql database

I have a datamodel question for a GPS tracking app. When someone uses our app it will save latitude, longitude, current speed, timestamp and burned_calories every 5 seconds. When a workout is completed, the average speed, total time/distance and burned calories of the workout will be stored in a database. So far so good..
What we also want is to store the data that is saved every 5 seconds, so we can use it later on to plot graphs/charts of a workout, for example.
How should we store this amount of data in a database? A single workout can contain 720 rows if someone runs for an hour. Perhaps a serialised/gzcompressed data array in a single row. I'm aware though that this is bad practice..
Would a relational one-to-many / many-to-many model be out of the question? I know MySQL can easily handle large amounts of data, but we are talking about 720 rows per workout × two workouts a week × 7,000 users = over 10 million rows a week.
(Of course, we could store only the data for every 10 seconds to halve the number of rows, or every 20 seconds, etc., but it would still be a large amount of data over time, and the accuracy of the graphs would decrease.)
How would you do this?
Thanks in advance for your input!
Just some ideas:
Quantize your lat/lon data. I believe that for technical reasons, the data most likely will be quantized already, so if you can detect that quantization, you might use it. The idea here is to turn double numbers into reasonable integers. In the worst case, you may quantize to the precision double numbers provide, which means using 64 bit integers, but I very much doubt your data is even close to that resolution. Perhaps a simple grid with about one meter edge length is enough for you?
Compute differences. Most numbers will be fairly large in terms of absolute values, but also very close together (unless your members run around half the world…). So this will result in rather small numbers. Furthermore, as long as people run with constant speed into a constant direction, you will quite often see the same differences. The coarser your spatial grid in step 1, the more likely you get exactly the same differences here.
Compute a Huffman code for these differences. You might try encoding lat and long movement separately, or computing a single code with 2d displacement vectors at its leaves. Try both and compare the results.
Store the result in a BLOB, together with the dictionary to decode your Huffman code, and the initial position so you can return data to absolute coordinates.
The result should be a fairly small set of data for each data set, which you can retrieve and decompress as a whole. Retrieving individual parts from the database is not possible, but it sounds like you wouldn't be needing that.
The benefit of Huffman coding over gzip is that you won't have to artificially introduce an intermediate byte stream. Directly encoding the actual differences you encounter, with their individual properties, should work much better.
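On the storage side only, a hypothetical schema for this (all names made up; the quantization, delta and Huffman steps happen in the application) might look like:
CREATE TABLE workout_tracks (
  workout_id     BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  user_id        BIGINT UNSIGNED NOT NULL,
  started_at     DATETIME NOT NULL,
  sample_seconds SMALLINT UNSIGNED NOT NULL DEFAULT 5,  -- sampling interval
  start_lat      INT NOT NULL,          -- quantized initial latitude (grid units)
  start_lon      INT NOT NULL,          -- quantized initial longitude (grid units)
  huffman_dict   BLOB NOT NULL,         -- dictionary needed to decode the deltas
  deltas         MEDIUMBLOB NOT NULL,   -- Huffman-coded displacement stream
  KEY idx_user_started (user_id, started_at)
);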

Optimal DB query for prefix search

I have a dataset which is a list of prefix ranges, and the prefixes aren't all the same size. Here are a few examples:
low: 54661601 high: 54661679 "bin": a
low: 526219100 high: 526219199 "bin": b
low: 4305870404 high: 4305870404 "bin": c
I want to look up which "bin" corresponds to a particular value with the corresponding prefix. For example, value 5466160179125211 would correspond to "bin" a. In the case of overlaps (of which there are few), we could return either the longest prefix or all prefixes.
The optimal algorithm is clearly some sort of tree into which the bin objects could be inserted, where each successive level of the tree represents more and more of the prefix.
The question is: how do we implement this (in one query) in a database? It is permissible to alter/add to the data set. What would be the best data & query design for this? An answer using mongo or MySQL would be best.
If you make a mild assumption about the number of overlaps in your prefix ranges, it is possible to do what you want optimally using either MongoDB or MySQL. In my answer below, I'll illustrate with MongoDB, but it should be easy enough to port this answer to MySQL.
First, let's rephrase the problem a bit. When you talk about matching a "prefix range", I believe what you're actually talking about is finding the correct range under a lexicographic ordering (intuitively, this is just the natural alphabetic ordering of strings). For instance, the set of numbers whose prefix matches 54661601 to 54661679 is exactly the set of numbers which, when written as strings, are lexicographically greater than or equal to "54661601", but lexicographically less than "54661680". So the first thing you should do is add 1 to all your high bounds, so that you can express your queries this way. In mongo, your documents would look something like
{low: "54661601", high: "54661680", bin: "a"}
{low: "526219100", high: "526219200", bin: "b"}
{low: "4305870404", high: "4305870405", bin: "c"}
Now the problem becomes: given a set of one-dimensional intervals of the form [low, high), how can we quickly find which interval(s) contain a given point? The easiest way to do this is with an index on either the low or high field. Let's use the high field. In the mongo shell:
db.coll.ensureIndex({high : 1})
For now, let's assume that the intervals don't overlap at all. If this is the case, then for a given query point "x", the only possible interval containing "x" is the one with the smallest high value greater than "x". So we can query for that document and check if its low value is also less than "x". For instance, this will print out the matching interval, if there is one:
db.coll.find({high : {'$gt' : "5466160179125211"}}).sort({high : 1}).limit(1).forEach(
function(doc){ if (doc.low <= "5466160179125211") printjson(doc) }
)
Suppose now that instead of assuming the intervals don't overlap at all, you assume that every interval overlaps with less than k neighboring intervals (I don't know what value of k would make this true for you, but hopefully it's a small one). In that case, you can just replace 1 with k in the "limit" above, i.e.
db.coll.find({high : {'$gt' : "5466160179125211"}}).sort({high : 1}).limit(k).forEach(
function(doc){ if (doc.low <= "5466160179125211") printjson(doc) }
)
What's the running time of this algorithm? The indexes are stored using B-trees, so if there are n intervals in your data set, it takes O(log n) time to lookup the first matching document by high value, then O(k) time to iterate over the next k documents, for a total of O(log n + k) time. If k is constant, or in fact anything less than O(log n), then this is asymptotically optimal (this is in the standard model of computation; I'm not counting number of external memory transfers or anything fancy).
The only case where this breaks down is when k is large, for instance if some large interval contains nearly all the other intervals. In this case, the running time is O(n). If your data is structured like this, then you'll probably want to use a different method. One approach is to use mongo's "2d" indexing, with your low and high values codifying x and y coordinates. Then your queries would correspond to querying for points in a given region of the x - y plane. This might do well in practice, although with the current implementation of 2d indexing, the worst case is still O(n).
There are a number of theoretical results that achieve O(log n) performance for all values of k. They go by names such as Priority Search Trees, Segment trees, Interval Trees, etc. However, these are special-purpose data structures that you would have to implement yourself. As far as I know, no popular database currently implements them.
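Going back to the main approach, a rough MySQL translation (table and column names are assumptions, and k is again the assumed bound on overlapping intervals, here 3) would be:
CREATE TABLE ranges (
  low  VARCHAR(32) NOT NULL,   -- lexicographic lower bound
  high VARCHAR(32) NOT NULL,   -- exclusive upper bound (original high + 1)
  bin  CHAR(1)     NOT NULL,
  KEY idx_high (high)
);
SELECT low, high, bin
FROM (
  SELECT low, high, bin
  FROM ranges
  WHERE high > '5466160179125211'
  ORDER BY high
  LIMIT 3
) AS candidates
WHERE low <= '5466160179125211';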
"Optimal" can mean different things to different people. It seems that you could do something like save your low and high values as varchars. Then all you have to do is
select bin from datatable where '5466160179125211' between low and high
Or if you had some reason to keep the values as integers in the table, you could do the CASTing in the query.
I have no idea whether this would give you terrible performance with a large dataset. And I hope I understand what you want to do.
With MySQL you may have to use a stored procedure, which you call to map a value to a bin. Said procedure would query the list of buckets for each row and do arithmetic or string ops to find the matching bucket. You could improve this design by using fixed-length prefixes arranged in a fixed number of layers: assign a fixed depth to your tree and give each layer its own table. You won't get tree-like performance with either of these approaches.
If you want to do something more sophisticated, I suspect you have to use a different platform.
Sql Server has a Hierarchy data type:
http://technet.microsoft.com/en-us/library/bb677173.aspx
PostgreSQL has a cidr data type. I'm not familiar with the level of query support it has, but in theory you could build a routing table inside of your db and use that to assign buckets:
http://www.postgresql.org/docs/7.4/static/datatype-net-types.html#DATATYPE-CIDR
Peyton! :)
If you need to keep everything as integers, and want it to work with a single query, this should work:
select bin from datatable where 5466160179125211 between
low*pow(10, floor(log10(5466160179125211))-floor(log10(low)))
and ((high+1)*pow(10, floor(log10(5466160179125211))-floor(log10(high)))-1);
In this case, it would search between the numbers 5466160100000000 (the lowest number with the low prefix and the same number of digits as the number to find) and 5466167999999999 (the highest number with the high prefix and the same number of digits as the number to find). This should still work in cases where the high prefix has more digits than the low prefix. It should also work (I think) in cases where the number is shorter than the length of the prefixes, where the varchar code in the previous solution can give incorrect results.
You'll want to experiment to compare the performance of having a lot of inline math in the query (as in this solution) vs. the performance of using varchars.
Edit: Performance seems to be really good either way even on big tables with no indexes; if you can use varchars then you might be able to further boost performance by indexing the low and high columns. Note that you'd definitely want to use varchars if any of the prefixes have initial zeroes. Here's a fix to allow for the case where the number is shorter than the prefix when using varchars:
select * from datatable2 where '5466' between low and high
and length('5466') >= length(high);

Efficient representation for growing circles in 2D space?

Imagine there's a 2D space, and in this space there are circles that grow at different constant rates. What's an efficient data structure for storing these circles, such that I can query "Which circles intersect point p at time t?".
EDIT: I do realize that I could store the initial state of the circles in a spatial data structure and do a query where I intersect a circle at point p with a radius of fastest_growth * t, but this isn't efficient when there are a few circles that grow extremely quickly whereas most grow slowly.
Additional Edit: I could further augment the above approach by splitting up the circles and grouping them by their growth rate, then applying the above approach to each group, but this requires a bounded time to be efficient.
Represent the circles as cones in 3d, where the third dimension is time. Then use a BSP tree to partition them the best you can.
In general, I think the worst case for testing for intersection is always O(n), where n is the number of circles. Most spatial data structures work by partitioning the space cleverly so that a fraction of the objects (hopefully close to half) are in each half. However, if the objects overlap then the partitioning cannot be perfect; there will always be cases where more than one object is in a partition. If you just think about the case of two circles overlapping, there is no way to draw a line such that one circle is entirely on one side and the other circle is entirely on the other side. Taken to the logical extreme, assuming arbitrary positioning of the circles and arbitrary radii, there is no way to partition them such that testing for intersection takes O(log(n)).
This doesn't mean that, in practice, you won't get a big advantage from using a tree, but the advantage you get will depend on the configuration of the circles and the distribution of the queries.
This is a simplified version of another problem I have posted about a week ago:
How to find first intersection of a ray with moving circles
I still haven't had the time to describe the solution that was expected there, but I will try to outline it here (for this simpler case).
The approach to solving this problem is to use a kinetic KD-tree. If you are not familiar with KD-trees, it is better to read about them first. You also need to add time as an additional coordinate (making the space 3D instead of 2D). I have not implemented this idea yet, but I believe this is the correct approach.
I'm sorry this is not completely thought through, but it seems like you might look into multiplicatively-weighted Voronoi Diagrams (MWVDs). It seems like an adversary could force you into computing one with a series of well-placed queries, so I have a feeling they provide a lower-bound to your problem.
Suppose you compute the MWVD on your input data. Then for a query, you would be returned the circle that is "closest" to your query point. You can then determine whether this circle actually contains the query point at the query time. If it doesn't, then you are done: no circle contains your point. If it does, then you should compute the MWVD without that generator and run the same query. You might be able to compute the new MWVD from the old one: the cell containing the generator that was removed must be filled in, and it seems (though I have not proved it) that the only generators that can fill it in are its neighbors.
Some sort of spatial index, such as a quadtree or BSP, will give you O(log(n)) access time.
For example, each node in the quadtree could contain a linked list of pointers to all those circles which intersect it.
How many circles, by the way? For small n, you may as well just iterate over them. If you constantly have to update your spatial index and jump all over cache lines, it may end up being faster to brute-force it.
How are the centres of your circles distributed? If they cover the plane fairly evenly you can discretise space and time, then do the following as a preprocessing step:
for (int t = 0; t < max_t; t++)
    foreach circle c, with centre and radius (x, y, r) at time t
        for (int X = x - r; X < x + r; X++)
            for (int Y = y - r; Y < y + r; Y++)
                circles_at[X][Y][t].push_back(&c)
(assuming you discretise space and time along integer boundaries, scale and offset however you like of course, and you can add circles later on or amortise the cost by deferring calculation for distant values of t)
Then your query for point (x,y) at time (t) could do a brute-force linear check over circles_at[x][y][ceil(t)]
The trade-off is obvious, increasing the resolution of any of the three dimensions will increase preprocessing time but give you a smaller bucket in circles_at[x][y][t] to test.
People are going to make a lot of recommendations about types of spatial indices to use, but I would like to offer a bit of orthogonal advice.
I think you are best off building a few indices based on time, i.e. t_0 < t_1 < t_2 ...
If a point intersects a circle at t_i, it will also intersect it at t_{i+1}. If you know the point in advance, you can eliminate all circles that intersect the point at t_i for all computation at t_{i+1} and later.
If you don't know the point in advance, then you can keep these time-point trees (built based on how big each circle would be at a given time). At query time (e.g. t_query), find i such that t_{i-1} < t_query <= t_i. If you check all the possible circles at t_i, you will not have any false negatives.
This is sort of a hack for a data structure that is "time dynamics aware", but I don't know of any. If you have a threaded environment, then you only need to maintain one spatial index and be working on the next one in the background. It will cost you a lot of computation for the benefit of being able to respond to queries with low latency. This solution should be compared at the very least to the O(n) solution (go through each point and check if dist(point, circle.center) < circle.radius).
Instead of considering the circles, you can test on their bounding boxes to filter out the ones which do not contain the point. If your bounding box sides are all sorted, this is essentially four binary searches.
The tricky part is reconstructing the sorted sides for any given time, t. To do that, you can start off with the original points: two lists for the left and right sides with the x coordinates, and two lists for top and bottom with the y coordinates. For any time greater than 0, all the left-side points will move to the left, etc. You only need to compare each position with the one next to it to obtain the time points at which an element and the one next to it are swapped. This should give you a list of time points at which to modify your ordered lists. If you now sort these modification records by time, then for any given starting time and ending time you can extract all the modification records between the two and apply them to your four lists in order. I haven't completely figured out the algorithm, but I think there will be edge cases where three or more successive elements cross over at exactly the same time point, so you may need to modify the algorithm to handle those edge cases as well. Perhaps a list modification record that contains the position in the list and the number of records to reorder would suffice.
I think it's possible to create a binary tree that solves this problem.
Each branch should contain a growing circle, a static circle for partitioning, and the latest time at which the partitioning circle cleanly partitions. Furthermore, the growing circle contained within a node should always have a faster growth rate than either of its child nodes' growing circles.
To do a query, take the root node. First check its growing circle: if it contains the query point at the query time, add it to the answer set. Then, if the time you're querying is greater than the time at which the partition is broken, query both children; otherwise, if the point falls within the partitioning circle, query the left node, else query the right node.
I haven't quite completed the details of performing insertions (the difficult part is updating the partition circle so that the number of nodes on the inside and outside is approximately equal and the time when the partition is broken is maximized).
To combat the case of a few circles that grow quickly, you could sort the circles in descending order by rate of growth and check each of the k fastest growers. To find the proper k given t, I think you can perform a binary search for the index k such that k*m = (t * growth rate of k)^2, where m is a constant factor you'll need to find by experimentation. This will balance the part that grows linearly with k against the part that falls quadratically with the growth rate.
If you, as already suggested, represent growing circles by vertical cones in 3D, then you can partition the space as a regular (perhaps hexagonal) grid of packed vertical cylinders. For each cylinder, calculate the minimal and maximal heights (times) of its intersections with all cones. If a circle's center (the vertex of its cone) is placed inside the cylinder, then the minimal time is zero. Then sort the cones by minimal intersection time. As a result of such indexing, for each cylinder you'll have an ordered sequence of records with three values: minimal time, maximal time and circle number.
When you check some point in 3D space, take the cylinder it belongs to and iterate over its sequence until the stored minimal time exceeds the time of the given point. All cones obtained this way whose maximal time is also less than the given time are guaranteed to contain the given point. Only the cones where the given time lies between the minimal and maximal intersection times need to be recalculated.
There is a classical tradeoff between indexing and runtime costs: the smaller the cylinder diameter, the smaller the range of intersection times, so fewer cones need recalculation at each point, but more cylinders have to be indexed. If circle centers are distributed unevenly, then it may be worth searching for a better cylinder placement configuration than a regular grid.
P.S. My first answer here - just registered to post it. Hope it isn’t late.