Assume I have a structure of ranges, and their associated data, for instance:
data = [
  [[0, 100], "New York"],
  [[101, 200], "Boston"],
  ...
]
I want a function that receives a number N as an argument and returns the entry whose range (the left element) contains N.
for instance,
> 103
< "Boston"
What will be the best structure to transform the above to achieve the fastest lookup time?
If your data set needs to be dynamic, use an interval tree.
I would suggest trying a B+ tree, although I haven't personally tried it on this problem. A B+ tree node can hold an array of keys, so you could set the data value for the key range 0-100 to "New York", with 101 pointing to the second child in the tree.
You can read about B+ trees here.
If your data set is small, I would recommend taking the straightforward approach instead.
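If the ranges are static, sorted, and non-overlapping (as in the example), the simplest fast structure is a sorted array of range starts searched with binary search, giving O(log n) lookups. A minimal sketch in Python (the names are illustrative):

```python
import bisect

# Sorted, non-overlapping ranges paired with their data, as in the question.
data = [((0, 100), "New York"), ((101, 200), "Boston")]
starts = [lo for (lo, hi), _ in data]  # parallel list of range starts

def lookup(n):
    i = bisect.bisect_right(starts, n) - 1  # rightmost range starting at or below n
    if i >= 0 and n <= data[i][0][1]:       # confirm n is inside that range
        return data[i][1]
    return None

print(lookup(103))  # Boston
```

Rebuilding the `starts` list on every change makes inserts O(n), which is why an interval tree is the better fit when the data is dynamic.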
I have just started reading about sparse Merkle trees, and I came across a function (get value) that is used to find the value for a specified key. I can't find an explanation on the internet of how the get-value function works.
My understanding is that each node is 256 bits, so there can be 2^256 leaf nodes, and the keys are indexed. We start from the root and keep choosing the left or right node based on whether the bit is 0 or 1, but I'm not able to understand the v = db.get(v)[32:] statement. How does it lead me to the value for the given key?
def get(db, root, key):
    v = root
    path = key_to_path(key)
    for i in range(256):
        if (path >> 255) & 1:
            v = db.get(v)[32:]
        else:
            v = db.get(v)[:32]
        path <<= 1
    return v
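To see what `db.get(v)[32:]` does, note that the database maps each internal node's hash to the 64-byte concatenation left_child_hash || right_child_hash. So `[:32]` selects the left child's hash, `[32:]` selects the right child's, and after walking all levels `v` is the leaf value for the key. A toy sketch with depth 2 instead of 256 (the tree-building code here is illustrative, not taken from any real sparse-Merkle-tree library):

```python
import hashlib

H = lambda b: hashlib.sha256(b).digest()  # 32-byte digest

# Toy database: each internal node's hash maps to left_child || right_child
# (64 bytes), so [:32] is the left child's hash and [32:] is the right's.
leaves = [("value%d" % i).encode().ljust(32) for i in range(4)]  # 32-byte leaf values
db = {}
p0 = H(leaves[0] + leaves[1]); db[p0] = leaves[0] + leaves[1]
p1 = H(leaves[2] + leaves[3]); db[p1] = leaves[2] + leaves[3]
root = H(p0 + p1); db[root] = p0 + p1

def get(db, root, key, depth=2):
    v = root
    for i in range(depth):
        bit = (key >> (depth - 1 - i)) & 1     # walk from the most significant bit
        v = db[v][32:] if bit else db[v][:32]  # pick the right or left child hash
    return v

print(get(db, root, 2) == leaves[2])  # key 0b10: go right, then left -> True
```

The original code's `path <<= 1` plus `(path >> 255) & 1` is just another way of walking the key bits from most significant to least significant.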
"A Merkle tree [21] is a binary tree that incorporates the use of cryptographic hash
functions. One or many attributes are inserted into the leaves, and every node
derives a digest which is recursively dependent on all attributes in its subtree.
That is, leaves compute the hash of their own attributes, and parents derive the
hash of their children’s digests concatenated left-to-right."
This is a citation from "https://eprint.iacr.org/2016/683.pdf"
Each hash has a roadmap to all its dependent hashes.
I have 4-dimensional data points stored in my MySQL database on a server: one time dimension plus three spatial GPS dimensions (lat, lon, alt). GPS data is sampled at a 1-minute timeframe for thousands of users and is being added to my server 24x7.
A sample REST/POST JSON looks like:
{
  "id": "1005",
  "location": {
    "lat": -87.8788,
    "lon": 37.909090,
    "alt": 0.0
  },
  "datetime": 11882784
}
Now, I need to filter out all the candidates (userID) whose positions were within k meters distance from a given userID for a given time period.
Sample REST/GET query params for filtering look like:
{
  "id": "1001",       // user for whom we need to filter out candidate IDs
  "maxDistance": 3,   // max distance in meters (Euclidean distance from the user's location to a candidate's location)
  "maxDuration": 14   // duration offset (in days) back from the current datetime
}
As you can see, thousands of entries are inserted into my database per minute, which results in a huge number of total entries. So I am afraid the trivial naive approach of iterating over all entries for filtering won't be feasible for my requirement. What algorithm should I implement on the server? I have tried a naive algorithm like:
params ($uid, $mDis, $mDay)
1. Init $candidates = []
2. For all the locations $Li of user with $uid
3.     For all locations $Di in database within $mDay
4.         $dif = EuclidianDis($Li, $Di)
5.         If $dif < $mDis
6.             $candidates += userId for $Di
7. Return $candidates
However, this approach is very slow in practice, and pre-calculation might not be feasible, as it would cost huge space across all userIDs. What other algorithm could improve efficiency?
You could implement a spatial hashing algorithm to efficiently query your database for candidates within a given area/time.
Divide the 3D space into a 3D grid of cubes with width k, and when inserting a data-point into your database, compute which cube the point lies in and compute a hash value based on the cube co-ordinates.
When querying for all data-points within k of another data-point d, compute the cube that d sits in and find the 26 adjacent cubes (+/- 1 in each dimension). Compute the hash values of these 27 cubes and query your database for all entries with those hash values within the given time period. This gives you a small candidate set which you can then iterate over to find all data-points within k of d.
If your value of k can range between 2-5 meters, give your cubes a width of 5.
Timestamps can be stored as a separate field; alternatively you can make your cubes 4-dimensional and include the timestamp in the hash, then search 81 cubes instead of 27.
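A sketch of the bucketing idea (for simplicity this treats the coordinates as if they were already metric; a real implementation would convert lat/lon degrees to meters first, and all names here are illustrative):

```python
from collections import defaultdict

def cell_key(x, y, z, width=5.0):
    # Integer coordinates of the grid cube containing the point.
    return (int(x // width), int(y // width), int(z // width))

def neighbor_keys(key):
    # The cube itself plus its 26 neighbours: 27 keys to look up.
    cx, cy, cz = key
    return [(cx + dx, cy + dy, cz + dz)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)]

# Toy in-memory index; a database would index a hash of the cell key instead.
points = {"a": (1.0, 1.0, 0.0), "b": (3.0, 2.0, 0.0), "c": (40.0, 2.0, 0.0)}
index = defaultdict(list)
for pid, p in points.items():
    index[cell_key(*p)].append(pid)

# Candidate set near "a": everything in the 27 cubes around it ("c" is too far).
candidates = [pid for k in neighbor_keys(cell_key(*points["a"]))
              for pid in index.get(k, [])]
```

The candidate set then gets the exact Euclidean-distance check, so the grid only has to be a coarse filter.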
I have an array of values and a linked list of indexes. Now I only want to keep those values from the array that correspond to the indexes in the linked list. Is there a standard algorithm to do this? Please give an example if possible.
So, suppose I have the array 1, 2, 5, 6, 7, 9
and the linked list 2->3.
I want to keep the values at indexes 2 and 3, that is, keep 5 and 6.
Thus my function should return 5 and 6.
In general, a linked list is inherently serial. Having a parallel machine will not speed up the traversal of your list, so the number of steps cannot go below O(n), where n is the size of the list.
However, if you have some additional way to access the list you can do something with it.
For example, all elements of the list could be stored in a fixed-size array (although not necessarily in consecutive cells). A list member could be represented in the array using the following struct.
struct ListNode {
    bool isValid;
    T data;
    int next;
};
The isValid flag indicates whether a given cell in the array is occupied by a valid list member or is just an empty cell.
Now, a parallel algorithm would read all cells at once, check whether each represents valid data, and if so, do something with it.
Second part: each thread holding a valid index idx into your input array A would mark A[idx] as not to be deleted. Once we know which elements of A should be removed and which should not, a parallel compaction algorithm can be applied.
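Sequentially, the mark-then-compact idea looks like this (the marking loop and the final compaction are exactly the steps a parallel version would distribute across threads; the function name is illustrative):

```python
def keep_listed_indices(values, index_list):
    keep = [False] * len(values)
    for idx in index_list:   # marking phase: one thread per list node in parallel
        keep[idx] = True
    # compaction phase: collect only the surviving values
    return [v for v, k in zip(values, keep) if k]

print(keep_listed_indices([1, 2, 5, 6, 7, 9], [2, 3]))  # [5, 6]
```

This matches the example in the question: indexes 2 and 3 select the values 5 and 6.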
Suppose you're working in a language with variable length arrays (e.g. with A[i] for all i in 1..A.length) and have to write a routine that takes n (n : 1..8) variable length arrays of items in a variable length array of length n, and needs to call a procedure with every possible length n array of items where the first is chosen from the first array, the second is chosen from the second array, and so forth.
If you want something concrete to visualize, imagine that your routine has to take data like:
[ [ 'top hat', 'bowler', 'derby' ], [ 'bow tie', 'cravat', 'ascot', 'bolo'] ... ['jackboots','galoshes','sneakers','slippers']]
and make the following procedure calls (in any order):
try_on ['top hat', 'bow tie', ... 'jackboots']
try_on ['top hat', 'bow tie', ... 'galoshes']
:
try_on ['derby','bolo',...'slippers']
This is sometimes called a Chinese menu problem, and for fixed n it can be coded quite simply (e.g. for n = 3, in pseudocode):
procedure register_combination( items : array [1..3] of vararray of An_item)
    for each i1 from items[1]
        for each i2 from items[2]
            for each i3 from items[3]
                register( [i1, i2, i3] )
But what if n can vary, giving a signature like:
procedure register_combination( items : vararray of vararray of An_item)
The code as written contained an ugly case statement, which I replaced with a much simpler solution. But I'm not sure it's the best (and it's surely not the only) way to refactor this.
How would you do it? Clever and surprising are good, but clear and maintainable are better--I'm just passing through this code and don't want to get called back. Concise, clear and clever would be ideal.
Edit: I'll post my solution later today, after others have had a chance to respond.
Teaser: I tried to sell a recursive solution, but they wouldn't go for it, so I had to stick to writing fortran in a HLL.
The answer I went with, posted below.
Either the recursive algorithm
procedure register_combination( items )
    register_combination2( [], items )
procedure register_combination2( head, items )
    if items == []
        print head
    else
        for i in items[0]
            register_combination2( head ++ [i], items[1:] )
or the same with tail calls optimised out, using an array for the indices, and incrementing the last index until it reaches the length of the corresponding array, then carrying the increment up.
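The same recursion rendered in Python, collecting the combinations into a list instead of calling try_on (the function name is illustrative):

```python
def all_combinations(items):
    out = []
    def rec(head, rest):
        if not rest:
            out.append(head)              # one complete combination, one item per list
        else:
            for x in rest[0]:
                rec(head + [x], rest[1:])  # extend head with each choice from this list
    rec([], list(items))
    return out

combos = all_combinations([["top hat", "bowler"], ["bow tie", "cravat"]])
# 2 * 2 = 4 combinations, e.g. ["top hat", "bow tie"]
```

In production Python you would just use itertools.product, which implements exactly this.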
Recursion.
Or, better yet, eliminate the recursion using an explicit stack and while loops.
For the problem you stated (calling a function with a variable number of arguments), it depends entirely on the programming language you're coding in; many languages allow passing variable arguments.
Since they were opposed to recursion (don't ask) and I was opposed to messy case statements (which, as it turned out, were hiding a bug) I went with this:
procedure register_combination( items : vararray of vararray of An_item)
    possible_combinations = 1
    for each item_list in items
        possible_combinations = possible_combinations * item_list.length
    for i from 0 to possible_combinations-1
        index = i
        this_combination = []
        for each item_list in items
            item_from_this_list = index mod item_list.length
            this_combination << item_list[item_from_this_list]
            index = index div item_list.length
        register(this_combination)
Basically, I figure out how many combinations there are, assign each one a number, and then loop through the numbers producing the corresponding combinations. Not a new trick, I suspect, but one worth knowing.
It's shorter, works for any practical combination of list lengths (if there are over 2^60 combinations, they have other problems), isn't recursive, and doesn't have the bug.
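The mixed-radix counting trick translates directly to Python (hypothetical name; this version returns the combinations rather than registering them):

```python
def combinations_by_number(items):
    possible = 1
    for item_list in items:
        possible *= len(item_list)       # total number of combinations
    result = []
    for i in range(possible):
        index = i
        combo = []
        for item_list in items:
            combo.append(item_list[index % len(item_list)])  # this list's "digit"
            index //= len(item_list)     # carry the remaining digits along
        result.append(combo)
    return result

combos = combinations_by_number([[1, 2, 3], [4, 5]])  # 6 combinations
```

Each i is decoded as a number whose digits have radixes equal to the list lengths, so every combination appears exactly once.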
I have a graph of multi-level dependencies like this, and I need to detect any circular reference in the graph.
A = B
B = C
C = [D, B]
D = [C, A]
Has anybody had a problem like this? Any solution? Thanks, and sorry for my English.
========= updated ==========
I have another situation:
1
2 = 1
3 = 2
4 = [2, 3]
5 = 4
In this case, my recursive code visits the "4" reference twice, but these references don't generate an infinite loop. My problem is to tell when the function visits a reference more than once without there being an infinite loop, and when there really is an infinite loop, so I can inform the user.
1 = 4
2 = 1
3 = 2
4 = [2, 3]
5 = 4
This case is a bit different from the second example: it does generate an infinite loop. How can I tell which cases generate an infinite loop and which don't?
Topological sorting. The description on Wikipedia is clear and works for all your examples.
Basically, you start with a node that has no dependencies, put it in a list of sorted nodes, and then remove that dependency from every node. For your second example that means you start with 1. Once you remove all dependencies on 1, you're left with 2. You end up sorting them 1, 2, 3, 4, 5 and seeing that there's no cycle.
For your third example, every node has a dependency, so there's nowhere to start. Such a graph must contain at least one cycle.
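A sketch of that removal process (Kahn's algorithm), with each graph given as a dict from node to the nodes it depends on (illustrative names):

```python
def topo_sort(deps):
    # deps: node -> iterable of nodes it depends on.
    # Returns a valid order, or None if the graph contains a cycle.
    remaining = {n: set(d) for n, d in deps.items()}
    order = []
    while remaining:
        ready = [n for n, d in remaining.items() if not d]  # no unmet dependencies
        if not ready:
            return None  # every remaining node depends on something: a cycle exists
        for n in ready:
            order.append(n)
            del remaining[n]
        for d in remaining.values():
            d.difference_update(ready)  # these dependencies are now satisfied
    return order

print(topo_sort({1: [], 2: [1], 3: [2], 4: [2, 3], 5: [4]}))  # [1, 2, 3, 4, 5]
print(topo_sort({1: [4], 2: [1], 3: [2], 4: [2, 3], 5: [4]}))  # None
```

The second call corresponds to the third example in the question: no node is ever dependency-free, so the sort stops immediately and reports the cycle.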
Keep a list of uniquely identified nodes. Loop through the entire tree, checking each node against the list, until you find a child node that is already in the unique list; take it from there (handle the loop or simply ignore it, depending on your requirement).
One way to detect circular dependency is to keep a record of the length of the dependency chains that your ordering algorithm detects. If a chain becomes longer than the total number of nodes (due to repetition over a loop) then there is a circular dependency. This should work both for an iterative and for a recursive algorithm.