Lock-free doubly linked skip list

There is a great deal of research on lock-free doubly linked lists. Likewise, there is plenty of research on lock-free skip lists. As best I can tell, however, nobody has managed a lock-free doubly linked skip list. Does anybody know of any research to the contrary, or a reason why this is the case?
Edit:
The specific scenario is for building a fast quantile (50%, 75%, etc.) accumulator. Samples are inserted into the skip list in O(log n) time. By maintaining an iterator to the current quantile, we can compare the inserted value to the current quantile in O(1) time, easily determine whether the inserted value falls to the left or right of the quantile, and work out by how much the quantile needs to move as a result. It's the left move that requires a previous pointer.
As I understand it, any difficulty will come from keeping the previous pointers consistent in the face of multiple threads inserting and removing at once. I imagine the solution will almost certainly involve a clever use of pointer marking.
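To make the idea concrete, here is a rough single-threaded sketch in Python, with a plain sorted list standing in for the skip list's bottom level and an integer index standing in for the iterator (names and structure are purely illustrative, not taken from any lock-free implementation):

    import bisect

    class QuantileAccumulator:
        """Single-threaded sketch of the quantile-tracking idea.

        A sorted Python list stands in for the skip list's bottom level, and an
        integer index stands in for the iterator; in the skip-list version the
        index moves by following next/prev pointers, which is why a doubly
        linked bottom level is wanted.
        """

        def __init__(self, q=0.5):
            self.q = q           # target quantile, e.g. 0.5 for the median
            self.values = []     # stands in for the skip list's bottom level
            self.idx = -1        # position of the current quantile element

        def insert(self, x):
            pos = bisect.bisect_left(self.values, x)   # O(log n) search
            self.values.insert(pos, x)                 # a real skip list inserts in O(log n)
            n = len(self.values)
            if n == 1:
                self.idx = 0
                return
            # If the new element landed at or before the iterator, the element
            # the iterator points at has shifted one slot to the right.
            if pos <= self.idx:
                self.idx += 1
            # The target rank changes by at most one per insert, so the iterator
            # needs at most one step left (prev pointer) or right (next pointer).
            target = int(self.q * (n - 1))
            self.idx += (target > self.idx) - (target < self.idx)

        def quantile(self):
            return self.values[self.idx] if self.values else None

The single step left is exactly the place where the doubly linked bottom level (or some substitute for it) is needed.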

But why would you do such a thing? I've not actually sat down and worked out exactly how skip lists work, but from my vague understanding, you'd never use the previous pointers. So why have the overhead of maintaining them?
But if you wanted to, I don't see why you couldn't. Just replace the singly linked lists with doubly linked lists. The doubly linked list is still logically coherent, so it's all the same.

I have an idea for you. We use a "cursor" to find an item in a skip list. The cursor also maintains the trail that was taken to get to the item. We use this trail for delete and insert -- it avoids a second search to perform those operations, and it embeds the version # of the list that was seen when the traversal was made. I am wondering if you could use the cursor to find the previous item more quickly.
You would have to go up a level on the cursor and then search for the item that is just barely less than your item. Alternatively, if the search made it to the lowest level of the linked list, just save the prev pointer as you traverse. The lowest level is probably used about 50% of the time to find your item, so performance would be decent.
Hmm... thinking about it now, it seems that the cursor would have the prev pointer 50% of the time, need to search again from one level up 25% of the time, from two levels up 12.5% of the time, and so on. So in infrequent cases, you have to do almost the entire search again.
I think the advantage of this would be that you don't have to figure out how to maintain a doubly linked skip list lock-free, and for the majority of cases you dramatically decrease the cost of locating the previous item.

As an alternative to maintaining backlinks, when a quantile needs to be updated, you could do another search to find the node whose key immediately precedes the current one. As I also just mentioned in a comment to johnnycrash, it's possible to build an array of the rightmost node found at each level -- and from that it would be possible to accelerate the second search. (Fomitchev's thesis mentions this as a possible optimization.)
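For illustration, here is a minimal single-threaded sketch of that "rightmost node per level" idea -- the standard skip-list update array -- with made-up names; a lock-free version would of course need CAS and pointer marking on top of this:

    import random

    class Node:
        def __init__(self, key, height):
            self.key = key
            self.next = [None] * height      # one forward pointer per level

    class SkipList:
        MAX_LEVEL = 16

        def __init__(self):
            self.head = Node(None, self.MAX_LEVEL)

        def _find(self, key):
            """Return (node_or_None, update), where update[i] is the rightmost
            node at level i whose key is < key -- the trail of the search."""
            update = [self.head] * self.MAX_LEVEL
            x = self.head
            for lvl in range(self.MAX_LEVEL - 1, -1, -1):
                while x.next[lvl] is not None and x.next[lvl].key < key:
                    x = x.next[lvl]
                update[lvl] = x
            nxt = x.next[0]
            found = nxt if nxt is not None and nxt.key == key else None
            return found, update

        def insert(self, key):
            _, update = self._find(key)
            height = 1
            while height < self.MAX_LEVEL and random.random() < 0.5:
                height += 1
            node = Node(key, height)
            for lvl in range(height):
                node.next[lvl] = update[lvl].next[lvl]
                update[lvl].next[lvl] = node

        def predecessor(self, key):
            # update[0] is exactly the node to the left of `key` at the bottom
            # level, so no second full search is needed.
            _, update = self._find(key)
            return None if update[0] is self.head else update[0].key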

Related

Adding an order column to database

I have a table containing articles.
By default, the articles are sorted based on their date added (desc.) so newest articles appear first.
However, I would like to give the editor the ability to change the order of the articles so they can be displayed in the order he likes. So I am thinking of adding an integer "order" column.
I am in a dilemma about how to handle this, as when an article's order is edited, I don't want to have to change all the others.
What is the best practice for this problem, and how do other CMSes like WordPress handle it?
Updating the records between the moved record's original position and its new position might be the simplest and most reliable solution, and it can be accomplished in two queries, assuming you don't have a unique key on the ordering column.
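As a rough sketch of those two statements (assuming a hypothetical articles(id, position) table; sqlite3 is used here purely for illustration):

    import sqlite3

    def move_article(conn, article_id, old_pos, new_pos):
        """Shift every row between the two positions by one, then drop the
        moved row into its new slot -- the two queries described above."""
        cur = conn.cursor()
        if new_pos < old_pos:    # moving up: rows in between shift down by one
            cur.execute(
                "UPDATE articles SET position = position + 1 "
                "WHERE position >= ? AND position < ?",
                (new_pos, old_pos))
        else:                    # moving down: rows in between shift up by one
            cur.execute(
                "UPDATE articles SET position = position - 1 "
                "WHERE position > ? AND position <= ?",
                (old_pos, new_pos))
        cur.execute("UPDATE articles SET position = ? WHERE id = ?",
                    (new_pos, article_id))
        conn.commit()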
The idea suggested by Bill's comment sounds like a good alternative, but with enough moves in the same region (about 32 for float, and 64 for double) you could still end up running into precision issues that will need to be checked for and handled.
Edit: Ok, I was curious and ran a test; it looks like you can halve a float column 149 times between 0 and 1 (only taking 0.5, 0.25, 0.125, etc., not counting 0.75 and the like), so it may not be a huge worry.
Edit2: Of course, all that means is that a malicious user can cause a problem by simply moving the third item between the first and second items 150 times (i.e., "swapping" the 2nd and 3rd by moving the new third each time).
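For the curious, that halving test can be reproduced with NumPy's float32 standing in for a single-precision column (a sketch of the experiment, not the original test code):

    import numpy as np

    # How many times can we take the midpoint between 0 and the current value
    # before a single-precision float stops changing?
    x = np.float32(1.0)
    count = 0
    while np.float32(x / 2) > np.float32(0.0):
        x = np.float32(x / 2)
        count += 1
    print(count)   # 149 halvings before the value underflows to zero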
More challenging is the UI to facilitate the migration of items.
First, determine what the main goal(s) are. Interview the Editors, then "read between the lines" -- they won't really tell you what they want.
If the only goal is to occasionally move an item to the top of the list, then you could simply have a flag saying which item needs to come first. (Beware: once the Editors have this feature, they will ask for more!)
Move an item to the 'top' of the list, but newer items will be inserted above it.
Move an item to the 'top' of the list, but newer items will be inserted below it.
Swap pairs of adjacent items. (This is often seen in UIs with only a small number of items; it's not viable for thousands -- unless the rearrangement is localized.)
Major scrambling.
Meanwhile, the UI needs to show enough info to be clear what the items are, yet compact enough to fit on a single screen. (This may be an unsolvable problem.)
Once you have decided on a UI, the internals in the database are not a big deal. INT vs FLOAT -- either would work.
INT -- easy for swapping adjacent pairs; messier for moving an item to the top of the list.
FLOAT -- runs out of steam after about 20 rearrangements (in the worst case). DOUBLE would last longer; BIGINT could simulate such -- by starting with large gaps between items' numbers.
Back to your question -- I doubt if there is a "standard" way to solve the problem. Think of it as a "simple" problem that can be dealt with.
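As a rough illustration of the "large gaps" idea mentioned above for BIGINT (all numbers and names here are made up), you hand out widely spaced positions, drop a moved item at the midpoint of its new neighbours, and renumber only when a gap finally closes:

    GAP = 1_000   # initial spacing between consecutive items

    def initial_positions(n):
        """Assign widely spaced integer positions to n items."""
        return [GAP * (i + 1) for i in range(n)]

    def position_between(left, right):
        """Position for an item dropped between two neighbours, or None if
        the gap is exhausted and the whole list needs renumbering."""
        if right - left < 2:
            return None
        return (left + right) // 2

    # Example: an item moved between positions 1000 and 2000 gets 1500; repeat
    # enough times in the same spot and you eventually get None, which is the
    # signal to re-space everything.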

Will alpha-beta pruning remove randomness in my solution with minimax?

Existing implementation:
In my implementation of Tic-Tac-Toe with minimax, I look for all boxes where I can get the best result and choose one of them randomly, so that the same solution isn't displayed each time.
For example, if the returned list is [1, 0, 1, -1], I will randomly choose between the two highest values.
Question about Alpha-Beta Pruning:
Based on what I understood, when the algorithm finds that it is winning from one path, it no longer needs to look at other paths that might or might not lead to a winning case.
So will this, as I suspect, cause the earliest box that leads to the best solution to be displayed as the result and look the same each time? For example, at the time of the first move, all moves lead to a draw. So will the first box be selected every time?
How can I bring randomness to the solution, as with the minimax version? One way I thought of would be to pass the indices to the alpha-beta algorithm in random order. The result would then be the first best solution found in that randomly ordered list of positions.
Thanks in advance. If there is some literature on this, I'd be glad to read it.
If someone could post a good reference for alpha-beta pruning, that would be excellent, as I had a hard time understanding how to apply it.
To randomly pick among multiple equally good solutions in alpha-beta pruning, you can modify your evaluation function to add a very small random number whenever you evaluate a game state. You should just make sure that the magnitude of that random number is always smaller than the smallest true (nonzero) difference between the evaluations of two states.
For example, if the true evaluation function for your game state can only return values -1, 0, and 1, you could add a randomly generated number in the range [0.0, 0.01] to the evaluation of every game state.
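As a minimal sketch of that idea on a toy Tic-Tac-Toe (hypothetical code, not your implementation), the noise is added only in the evaluation function and everything else is plain alpha-beta:

    import random

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
                 (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def winner(board):
        for a, b, c in WIN_LINES:
            if board[a] != ' ' and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def evaluate(board):
        """True evaluation (-1, 0, 1 from X's point of view) plus a tiny
        random tie-breaker, too small to outweigh any real difference."""
        w = winner(board)
        base = 1 if w == 'X' else -1 if w == 'O' else 0
        return base + random.uniform(0.0, 0.01)

    def alphabeta(board, player, alpha=float('-inf'), beta=float('inf')):
        if winner(board) or ' ' not in board:
            return evaluate(board), None
        best_move = None
        for i in range(9):
            if board[i] != ' ':
                continue
            board[i] = player
            score, _ = alphabeta(board, 'O' if player == 'X' else 'X', alpha, beta)
            board[i] = ' '
            if player == 'X' and score > alpha:
                alpha, best_move = score, i
            elif player == 'O' and score < beta:
                beta, best_move = score, i
            if alpha >= beta:
                break
        return (alpha if player == 'X' else beta), best_move

    # From an empty board every line of play is a draw, so the noise makes the
    # chosen opening vary from run to run:
    print(alphabeta(list(' ' * 9), 'X')[1])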
Without this, alpha-beta pruning doesn't necessarily find only one solution. Consider this example from Wikipedia. In the middle, you see that two solutions with an evaluation of 6 were found, so it can find more than one. I do think it will still find all moves leading to optimal solutions at the root node, but not necessarily all solutions deep down in the tree. Suppose, in the example image, that the pruned node with a score of 9 in the middle actually had a score of 6. It would still get pruned there, so that particular solution wouldn't be found, but the move from the root node leading to it (the middle move at the root) would still be found. So, eventually, you would be able to reach it.
Some interesting notes:
This implementation would also work in minimax, and it avoids the need to store a list of multiple (equally good) solutions.
In more complex games than Tic-Tac-Toe, where you cannot search the complete state space, adding a small random number for the max player and subtracting a small random number for the min player like this may actually slightly improve your heuristic evaluation function. The reason for this is as follows. Suppose in state A you have 5 moves available, and in state B you have 10 moves available, which all result in the same heuristic evaluation score. Intuitively, the successors of state B may be slightly better, because you had more moves available; in many games, having more moves available means that you are in a better position. Because you generated 10 random numbers for the 10 successors of state B, it is also a bit more likely that the highest generated random number is among those 10 (instead of among the 5 numbers generated for the successors of A).

Database design for recursive children

This design problem is turning out to be a bit more "interesting" than I'd expected....
For context, I'll be implementing whatever solution I derive in Access 2007 (not much choice--customer requirement. I might be able to talk them into a different back end, but the front end has to be Access (and therefore VBA & Access SQL)). The two major activities that I anticipate around these tables are batch importing new structures from flat files and reporting on the structures (with full recursion of the entire structure). Virtually no deletes or updates (aside from entire trees getting marked as inactive when a new version is created).
I'm dealing with two main tables, and wondering if I really have a handle on how to relate them: Products and Parts (there are some others, but they're quite straightforward by comparison).
Products are made up of Parts. A Part can be used in more than one Product, and most Products employ more than one Part. I think that a normal many-to-many resolution table can satisfy this requirement (mostly--I'll revisit this in a minute). I'll call this Product-Part.
The "fun" part is that many Parts are also made up of Parts. Once again, a given Part may be used in more than one parent Part (even within a single Product). Not only that, I think that I have to treat the number of recursion levels as effectively arbitrary.
I can capture the relations with a m-to-m resolution from Parts back to Parts, relating each non-root Part to its immediate parent part, but I have the sneaking suspicion that I may be setting myself up for grief if I stop there. I'll call this Part-Part. Several questions occur to me:
Am I borrowing trouble by wondering about this? In other words, should I just implement the two resolution tables as outlined above, and stop worrying?
Should I also create Part-Part rows for all the ancestors of each non-root Part, with an extra column in the table to store the number of generations?
Should Product-Part contain rows for every Part in the Product, or just the root Parts? If it's all Parts, would a generation indicator be useful?
I have (just today, from the Related Questions), taken a look at the Nested Set design approach. It looks like it could simplify some of the requirements (particularly on the reporting side), but thinking about generating the tree during the import of hundreds (occasionally thousands) of Parts in a Product import is giving me nightmares before I even get to sleep. Am I better off biting that bullet and going forward this way?
In addition to the specific questions above, I'd appreciate any other commentary on the structural design, as well as hints on how to process this, either inbound or outbound (though I'm afraid I can't entertain suggestions of changing the language/DBMS environment).
Bills of materials and exploded parts lists are always so much fun. I would implement Parts as your main table, with a Boolean field to say a part is "sellable". This removes the first-level recursion difference and the redundancy of Parts that are themselves Products. Then, implement Products as a view of Parts that are sellable.
You're on the right track with the PartPart cross-ref table. Implement a constraint on that table that says the parent Part and the child Part cannot be the same Part ID, to save yourself some headaches with infinite recursion.
Generational differences between BOMs can be maintained by creating a new Part at the level of the actual change, and in any higher levels in which the change must be accommodated (if you want to say that this new Part, as part of its parent hierarchy, results in a new Product). Then update the reference tree of any Part levels that weren't revised in this generational change (to maintain Parts and Products that should not change generationally if a child does). To avoid orphans (unreferenced Parts records that are unreachable from the top level), Parts can reference their predecessor directly, creating a linked list of ancestors.
This is a very complex web, to be sure; persisting tree-like structures of similarly-represented objects usually are. But, if you're smart about implementing constraints to enforce referential integrity and avoid infinite recursion, I think it'll be manageable.
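As a small illustration of the recursive reporting side (a Python sketch with made-up part names; in Access this would be VBA walking a recordset), exploding a Part's full tree from the PartPart cross-ref might look like this:

    # Hypothetical in-memory stand-in for the PartPart cross-reference table:
    # parent part ID -> list of (child part ID, quantity) rows.
    PART_PART = {
        "bike":  [("frame", 1), ("wheel", 2)],
        "wheel": [("rim", 1), ("spoke", 32)],
    }

    def explode(part_id, qty=1, depth=0, seen=frozenset()):
        """Recursively list every part under part_id with cumulative quantities.
        The `seen` set is a belt-and-braces guard against a cycle that the
        parent != child constraint alone would not catch."""
        if part_id in seen:
            raise ValueError("cycle detected at " + part_id)
        print("  " * depth + part_id + " x" + str(qty))
        for child_id, child_qty in PART_PART.get(part_id, []):
            explode(child_id, qty * child_qty, depth + 1, seen | {part_id})

    explode("bike")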
I would have one part table for atomic parts, then a superpart table with a superpartID and its related subparts. Then you can have a product/superpart table.
If a part is also a superpart, then you just have one row for the superpartID with the same partID.
Maybe 'component' is a better term than superpart. Components could be reused in larger components, for example.
You can find sample Bill of Materials database schemas at
http://www.databaseanswers.org/data_models/
The website offers Access applications for some of the models. Check with the author of the website.

Searching in graphs/trees with depth-first/breadth-first/A* algorithms

I have a couple of questions about searching in graphs/trees:
Let's assume I have an empty chess board and I want to move a pawn around from point A to B.
A. When using depth-first search or breadth-first search, must we use open and closed lists? That is, one list with all the elements left to check, and another with all the elements that were already checked? Is it even possible to do it without those lists? What about A*: does it need them?
B. When using lists, after having found a solution, how can you get the sequence of states from A to B? I assume that instead of just keeping the (x, y) states in the open and closed lists, you keep an "extended state" formed by (x, y, parent_of_this_node)?
C. State A has 4 possible moves (right, left, up, down). If my first move is left, should I allow the next state to move back to the original state, i.e., make the "right" move? If not, must I traverse the search tree every time to check which states I've already been to?
D. When I see a state in the tree where I've already been, should I just ignore it, as I know it's a dead end? I guess to do this I'd have to always keep the list of visited states, right?
E. Is there any difference between search trees and graphs? Are they just different ways to look at the same thing?
A. When using depth-first search or breadth-first search, must we use open and closed lists?
With DFS you definitely need to store at least the current path. Otherwise you would not be able to backtrack. If you decide to maintain a list of all visited (closed) nodes, you are able to detect and avoid cycles (expanding the same node more than once). On the other hand, you lose the space efficiency of DFS: DFS without a closed list only needs space proportional to the depth of the search space.
With BFS you need to maintain an open list (sometimes called the fringe). Otherwise the algorithm simply can't work. When you additionally maintain a closed list, you will (again) be able to detect and avoid cycles. With BFS the additional space for the closed list might not be that bad, since you have to maintain the fringe anyway. The relation between fringe size and closed-list size strongly depends on the structure of the search space, so it has to be considered here. E.g., for a branching factor of 2, both lists are equal in size and the cost of keeping the closed list doesn't seem very bad compared to its benefits.
What about A*: does it need them?
A*, as it can be seen as a special (informed) type of BFS, needs the open list. Omitting the closed list is more delicate than with BFS, as is deciding whether to update costs inside the closed list. Depending on those decisions, the algorithm can stop being optimal and/or complete, depending on the type of heuristic used, etc. I won't go into details here.
B.
Yup, the closed list should form some kind of inverse tree (pointers going towards the root node), so you can extract the solution path. You usually need the closed list for doing this. For DFS, your current stack is exactly the solution path (no need for a closed list here). Also note that sometimes you are not interested in the path itself but only in the solution, or merely in whether one exists.
C.
Read previous answers and look for the parts which talk about the detection of cycles.
D.
To avoid cycles with a closed list: don't expand nodes that are already inside the closed list. Note: with path-costs coming into play (remember A*), things might get more tricky.
E. Is there any difference between search trees and graphs?
You could consider searches that maintain a closed list to avoid cycles as graph searches, and those without one as tree searches.
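To tie A and B together, here is a minimal sketch of BFS on an empty board (hypothetical code): the deque is the open list (fringe), and the parent dictionary doubles as the closed list and as the back-links used to rebuild the path:

    from collections import deque

    def bfs_path(start, goal, width=8, height=8):
        frontier = deque([start])        # open list (fringe)
        parent = {start: None}           # closed list + back-links
        while frontier:
            x, y = frontier.popleft()
            if (x, y) == goal:
                path, node = [], goal
                while node is not None:  # walk the back-links to the start
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in parent:
                    parent[(nx, ny)] = (x, y)
                    frontier.append((nx, ny))
        return None

    print(bfs_path((0, 0), (3, 2)))      # prints a shortest path from (0, 0) to (3, 2)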
A) It's possible to avoid the open/closed lists - you could try all possible paths, but that would take a VERY long time.
B) Once you've reached the goal, you use the parent_of_this_node information to "walk backwards" from the goal. Start with the goal, get its parent, get the parent's parent, etc. until you reach the start.
C) I think it doesn't matter - there's no way that the step you describe would result in a shorter path (unless your steps have negative weight, in which case you can't use Dijkstra/A*). In my A* code, I check for this case and ignore it, but do whatever is easiest to code up.
D) It depends - I believe Dijkstra can never reopen the same node (can someone correct me on that?). A* definitely can revisit a node - if you find a shorter path to the same node, you keep that path, otherwise you ignore it.
E) Not sure, I've never done anything specifically for trees myself.
There's a good introduction to A* here:
http://theory.stanford.edu/~amitp/GameProgramming/
that covers a lot of details about how to implement the open set, pick a heuristic, etc.
A. Open and Closed lists are common implementation details, not part of the algorithm as such. It's common to do a depth-first tree search without either of these, for example; the canonical way is a recursive traversal of the tree.
B. It is typical to ensure that nodes refer back to previous nodes allowing for a plan to be reconstructed by following the back-links. Alternatively you just store the entire solution so far in each candidate, though it would then be misleading to call it a node really.
C. I'm assuming that moving left and then moving right brings you to an equivalent state - in this case, you would have already explored the original state, it would be on the closed list, and therefore it should not have been put back onto the open list. You don't traverse the search tree each time because you keep a closed list - often implemented as an O(1) structure - for precisely this purpose of knowing which states have already been fully examined. Note that you cannot always assume that being in the same position is the same as being in the same state - for most game path-finding purposes it is, but for general-purpose search it is not.
D. Yes, the list of visited states is what you're calling the closed list. You also want to check the open list to ensure you're not planning to examine a given state twice. You don't need to search any tree as such, since you typically store these things in linear structures. The algorithm as a whole is searching a tree (or a graph), and it generates a tree (of nodes representing the state space) but you don't explicitly search through a tree structure at any point within the algorithm.
E. A tree is a type of graph with no cycles/loops in it. Therefore you use the same graph search procedure to search either. It's common to generate a tree structure that represents your search through the graph, which is represented implicitly by the backwards links from each node to the node that preceded/generated it in the search. (Although if you go down the route of holding the entire plan in each state, there will be no tree, just a list of partial solutions.)
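As a small illustration of the point in A about the canonical recursive traversal, here is a sketch (the tree is made up) where the call stack plays the role of the open list and no closed list is needed because a tree has no cycles:

    def dfs(tree, target, path=()):
        """Recursive depth-first search of a tree given as nested dicts.
        The call stack is the only 'open list'."""
        path = path + (tree["value"],)
        if tree["value"] == target:
            return path
        for child in tree.get("children", []):
            found = dfs(child, target, path)
            if found:
                return found
        return None

    tree = {"value": "A", "children": [
        {"value": "B", "children": [{"value": "D"}]},
        {"value": "C"},
    ]}
    print(dfs(tree, "D"))    # ('A', 'B', 'D')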

Pathing in a non-geographic environment

For a school project, I need to create a way to build personalized queries based on end-user choices.
Since the user can choose basically any fields from any combination of tables, I need to find a way to map the tables in order to build a join and not pull in extraneous data (this may lead to incoherent reports, but we're willing to live with that).
For up to two tables, I already managed to design an algorithm that works fine. However, when I add another table, I can't find a way to find a path through my database. All tables available for the personalized reports can be linked together, so it really comes down to finding which path to use.
You might be able to try some form of an A* algorithm. Basically this looks at each of the possible next options to choose and applies a heuristic to it, a function that determines roughly how far it is between this node and your goal. It then chooses the one that is closer and repeats. The hardest part of implementing A* is designing a good heuristic.
Without more information on how the tables fit together, or what you mean by a 'path' through the tables, it's hard to recommend something though.
Looks like it didn't like my link, probably the * in it, try:
http://en.wikipedia.org/wiki/A*_search_algorithm
Edit:
If that is the whole database, I'd go with a depth-first exhaustive search.
I thought about using A* or a similar algorithm, but as you said, the hardest part is designing the heuristic.
My tables are centered around somewhat of a backbone, with quite a few branches each leading to at most a single leaf node. Here is the actual map (table names removed because I'm paranoid). Assuming I want to view data from the A, B and C tables, I need an algorithm to find the blue path.
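If it helps, here is a rough sketch of that depth-first approach (the adjacency map below is purely illustrative, since the real table names were removed): enumerate the simple paths between tables and take the union of the shortest paths from one selected table to each of the others; every table in that union then takes part in the join. This doesn't guarantee the absolute minimum set of tables in a general graph, but on a backbone-with-leaves shape like yours it should give the blue path.

    # Purely illustrative adjacency map of which tables share a join key.
    TABLE_GRAPH = {
        "A": ["Backbone1"],
        "Backbone1": ["A", "Backbone2"],
        "Backbone2": ["Backbone1", "B", "Backbone3"],
        "Backbone3": ["Backbone2", "C"],
        "B": ["Backbone2"],
        "C": ["Backbone3"],
    }

    def paths(start, goal, visited=()):
        """Depth-first enumeration of all simple paths from start to goal."""
        visited = visited + (start,)
        if start == goal:
            yield visited
            return
        for nxt in TABLE_GRAPH[start]:
            if nxt not in visited:
                yield from paths(nxt, goal, visited)

    def join_set(tables):
        """Union of the shortest paths from the first table to each other one."""
        first, rest = tables[0], tables[1:]
        needed = {first}
        for t in rest:
            needed |= set(min(paths(first, t), key=len))
        return needed

    print(sorted(join_set(["A", "B", "C"])))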