Adding an order column to a database - MySQL

I have a table containing articles.
By default, the articles are sorted based on their date added (desc.) so newest articles appear first.
However, I would like to give the editor the ability to change the order of the articles so they can be displayed in whatever order the editor likes. So I am thinking of adding an integer "order" column.
I am unsure how to handle this: when an article's order is edited, I don't want to have to change all the others.
What is the best practice for this problem, and how do other CMSes like WordPress handle it?

Updating the records between the moved record's original position and its new position is probably the simplest and most reliable solution, and it can be accomplished in two queries, assuming you don't have a unique key on the ordering column.
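Here is a minimal sketch of that two-query move in Python, assuming a table named articles with integer id and position columns and a DB-API cursor (e.g. from mysql-connector-python); all names are illustrative:

    def move_article(cursor, article_id, old_pos, new_pos):
        """Move one article, shifting only the rows between the two positions."""
        if old_pos == new_pos:
            return
        if old_pos < new_pos:
            # Moving down the list: close the gap by pulling the in-between rows up.
            cursor.execute(
                "UPDATE articles SET position = position - 1"
                " WHERE position > %s AND position <= %s",
                (old_pos, new_pos))
        else:
            # Moving up the list: open a slot by pushing the in-between rows down.
            cursor.execute(
                "UPDATE articles SET position = position + 1"
                " WHERE position >= %s AND position < %s",
                (new_pos, old_pos))
        # Second query: drop the moved row into its new slot.
        cursor.execute(
            "UPDATE articles SET position = %s WHERE id = %s",
            (new_pos, article_id))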
The idea suggested by Bill's comment sounds like a good alternative, but with enough moves in the same region (about 32 for float, and 64 for double) you could still end up running into precision issues that will need to be checked for and handled.
Edit: OK, I was curious and ran a test; it looks like you can halve a float column 149 times between 0 and 1 (only taking 0.5, 0.25, 0.125, etc., not counting 0.75 and the like), so it may not be a huge worry.
Edit 2: Of course, all that means is that a malicious user can cause a problem by simply moving the third item between the first and second items 150 times (i.e., "swapping" the 2nd and 3rd items by repeatedly moving the new third).
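For the curious, here is a rough reconstruction of that halving test in Python, using struct to round every intermediate result to single precision the way a FLOAT column would:

    import struct

    def f32(x):
        # Round-trip through 4 bytes to emulate a single-precision float column.
        return struct.unpack('f', struct.pack('f', x))[0]

    lo, hi, count = 0.0, 1.0, 0
    mid = f32((lo + hi) / 2)
    while lo < mid < hi:
        count += 1
        hi = mid                 # keep halving toward zero
        mid = f32((lo + hi) / 2)
    print(count)                 # 149: the halvings run down into the subnormal range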

More challenging is the UI to facilitate the migration of items.
First, determine what the main goal(s) are. Interview the Editors, then "read between the lines" -- they won't really tell you what they want.
If the only goal is to move an item to the top of the list once, then you could simply have a flag saying which item needs to come first; a sketch of that flag approach follows the list below. (Beware: once the Editors have this feature, they will ask for more!)
Move an item to the 'top' of the list, but newer items will be inserted above it.
Move an item to the 'top' of the list, but newer items will be inserted below it
Swap pairs of adjacent items. (This is often seen in UIs with only a small number of items; it is not viable for thousands, unless the rearrangement is localized.)
Major scrambling.
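The single-flag version mentioned above is just an extra column consulted first in the sort. A minimal sketch, with made-up table and column names:

    # Mark one article as the item that "needs to come first";
    # the boolean expression clears the flag on every other row in one pass.
    set_flag = "UPDATE articles SET pinned = (id = %s)"

    # Display query: the flagged item first, then newest-first as before.
    pinned_first = ("SELECT id, title FROM articles"
                    " ORDER BY pinned DESC, date_added DESC")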
Meanwhile, the UI needs to show enough info to be clear what the items are, yet compact enough to fit on a single screen. (This may be an unsolvable problem.)
Once you have decided on a UI, the internals in the database are not a big deal. INT vs FLOAT -- either would work.
INT -- easy for swapping adjacent pairs; messier for moving an item to the top of the list.
FLOAT -- runs out of steam after about 20 rearrangements (in the worst case). DOUBLE would last longer; BIGINT could simulate such -- by starting with large gaps between items' numbers.
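A sketch of the BIGINT-with-gaps idea (the gap size and names are arbitrary): new items get widely spaced keys, and a moved item takes the midpoint of its new neighbours, falling back to renumbering only when a gap is exhausted:

    GAP = 1_000_000  # initial spacing between BIGINT sort keys

    def key_between(prev_key, next_key):
        """Sort key for a row moved between two existing rows."""
        mid = (prev_key + next_key) // 2
        if mid == prev_key:
            return None  # gap exhausted: renumber the range with fresh GAP spacing
        return mid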
Back to your question -- I doubt if there is a "standard" way to solve the problem. Think of it as a "simple" problem that can be dealt with.

Will alpha-beta pruning remove randomness in my solution with minimax?

Existing implementation:
In my implementation of Tic-Tac-Toe with minimax, I look for all boxes where I can get the best result and choose one of them randomly, so that the same solution isn't displayed each time.
For example, if the returned list is [1, 0, 1, -1], I will at some point randomly choose between the two highest values.
Question about Alpha-Beta Pruning:
Based on what I understood, when the algorithm finds that it is winning from one path, it no longer needs to look at other paths that might or might not lead to a winning case.
So will this, as I suspect, cause the earliest possible box that leads to the best solution to be chosen as the result, and look the same each time? For example, at the time of the first move, all moves lead to a draw. So will the first box be selected every time?
How can I bring randomness to the solution, as with the minimax solution? One way I can think of would be to pass the indices to the alpha-beta algorithm in random order, so that the result is the first best solution in that randomly ordered list of positions.
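In code, that idea would look something like this (with available_moves and alphabeta_best_move standing in for my own implementation):

    import random

    moves = list(available_moves(board))  # the move generator
    random.shuffle(moves)                 # randomize the order fed to alpha-beta
    best = alphabeta_best_move(board, moves)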
Thanks in advance. If there is some literature on this, I'd be glad to read it.
If someone could post a good reference for alpha-beta pruning, that would be excellent, as I had a hard time understanding how to apply it.
To randomly pick among multiple best solutions (all equal) in alpha-beta pruning, you can modify your evaluation function to add a very small random number whenever you evaluate a game state. You should just make sure that the magnitude of that random number is never greater than the true difference between the evaluations of two states.
For example, if the true evaluation function for your game state can only return values -1, 0, and 1, you could add a randomly generated number in the range [0.0, 0.01] to the evaluation of every game state.
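As a minimal sketch, assuming a true_eval function (yours to supply) that returns -1, 0, or 1:

    import random

    def noisy_eval(state):
        # The jitter is far smaller than the gap (1) between true scores,
        # so it can only reorder states that are genuinely tied.
        return true_eval(state) + random.uniform(0.0, 0.01)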
Without this, alpha-beta pruning doesn't necessarily find only one solution. Consider this example from Wikipedia. In the middle, you see that two solutions with an evaluation of 6 were found, so it can find more than one. I do think it will still find all moves leading to optimal solutions at the root node, but not necessarily all solutions deep down in the tree. Suppose, in the example image, that the pruned node with a score of 9 in the middle actually had a score of 6. It would still get pruned there, so that particular solution wouldn't be found, but the move from the root node leading to it (the middle move at the root) would still be found. So, eventually, you would be able to reach it.
Some interesting notes:
This implementation would also work in minimax, and it avoids the need to store a list of multiple (equally good) solutions.
In more complex games than Tic-Tac-Toe, where you cannot search the complete state space, adding a small random number for the max player and deducting a small random number for the min player like this may actually slightly improve your heuristic evaluation function. The reason is as follows. Suppose in state A you have 5 moves available, and in state B you have 10 moves available, all of which result in the same heuristic evaluation score. Intuitively, the successors of state B may be slightly better, because you had more moves available; in many games, having more moves available means that you are in a better position. Because you generated 10 random numbers for the 10 successors of state B, it is also a bit more likely that the highest generated random number is among those 10 (instead of among the 5 numbers generated for the successors of A).

Storing row order in MySQL

I need to give my script's admin page the ability to change the display order of rows.
For that, there is a default order for newly added rows (they go to the end of the list), and the admin should be able to change the position of a specific row.
I'm planning to treat the rows like a doubly linked list in order to be able to re-position them.
Is it OK to use linked list method for saving the display position of mysql rows?
Is there a better method?
Should I use a separate table to store the order, or is it OK to add next and prev columns to the original table?
Is it possible then to use MySQL's ORDER BY with this method?
Edit: I also thought of using spaced order codes (e.g. 0, 100, 200, ...), but the gaps can eventually be exhausted.
I think you'll be better off just storing the ordering position in a dedicated field, instead of trying to implement a linked list.
The issue with the linked list is that it requires some sort of list traversal to "reconstruct" the order before you can display it to the user. Normally, you'd employ a recursive query to do that, but unfortunately MySQL (prior to 8.0) doesn't support recursive queries, so you'll either need to fiddle with stored procedures or end up making a database round-trip for each and every list node.
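To make that cost concrete, this is what reconstructing the order looks like without a recursive query -- one round-trip per node (table and column names are illustrative):

    def fetch_in_order(cursor, first_id):
        ordered, current = [], first_id
        while current is not None:
            cursor.execute(
                "SELECT id, title, next_id FROM items WHERE id = %s",
                (current,))
            row = cursor.fetchone()
            ordered.append(row)
            current = row[2]  # follow the `next` pointer: one query per row
        return ordered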
All in all, just updating the order field of several rows from time to time (when you need to reorder) is probably cheaper than traversing the list every time (when you need to display it), especially if you mostly move rows by small distances. And if you introduce gaps (as you already mentioned), the number of rows that you'll actually need to update will fall dramatically, at the price of increased complexity.
You may also be able to piggy-back the order field onto the clustering mechanism offered by InnoDB.
YMMV, of course, but I'd advise benchmarking the simple order field approach on representative amounts of data before attempting to implement anything more sophisticated...

Database Design: How should I store 'word difficulty' in MySQL?

I made a vocabulary app for Android that has a list of ~5000 words stored in a local database (SQLite), and I want to find out which words are more difficult than others.
To find out, I'm thinking of adding a very simple feature that puts two random words on the screen, asking the user to choose the more difficult one. Then another pair of random words will show, and this process can be repeated for as long as the user wants. The more users participate in this 'more difficult word' game, the better the app should, in theory, be able to distinguish difficult words from easy ones.
Since the difficulty would be based on input from all users, I know I need to keep track of it online so that every app could then fetch them from the database on my website (which is MySQL). I'm not sure what would be the most efficient way to keep track of the difficulty, but I came up with two possible solutions:
1) Add a difficulty column that holds integer values to the words table. Then for every pair of words that a user looks at and ranks, the word chosen as more difficult would have its difficulty increased by one, and the word not chosen would have its difficulty decreased by one. I could simply order by that integer value to get the most difficult ones.
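Option 1 amounts to two single-row updates per comparison (assuming a words table with an integer difficulty column, and a DB-API cursor; names are illustrative):

    def record_comparison(cursor, harder_id, easier_id):
        cursor.execute(
            "UPDATE words SET difficulty = difficulty + 1 WHERE id = %s",
            (harder_id,))
        cursor.execute(
            "UPDATE words SET difficulty = difficulty - 1 WHERE id = %s",
            (easier_id,))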
2) Create a difficulty table with two columns, more and less, that hold words (or IDs of the words, to save space) based on the results of each selection a user makes. I'm still unsure how I would get the most difficult words -- some combination of GROUP BY and ORDER BY?
The benefit of my second solution is that I can know how many times each word has been seen (the number of rows in which the word appears in the more column plus the number in which it appears in the less column). That helps with statistics, like finding out which word has the highest more/less ratio. But it would also take up much more space than my first suggested solution, and I don't know how it would scale.
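For option 2, the ranking could come from one aggregation over both columns. A sketch of the query, using the more/less table described above:

    query = """
        SELECT word_id,
               SUM(chosen)            AS more_count,
               COUNT(*)               AS seen_count,
               SUM(chosen) / COUNT(*) AS difficulty_ratio
        FROM (
            SELECT more AS word_id, 1 AS chosen FROM difficulty
            UNION ALL
            SELECT less AS word_id, 0 AS chosen FROM difficulty
        ) AS votes
        GROUP BY word_id
        ORDER BY difficulty_ratio DESC
    """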
Which do you think is the better solution, or what other ones should I consider?
Have you tried Sphinx for this? I'd guess a full-text search engine like Sphinx would handle this with great performance.

More efficient to have two tables or one table with tons of fields

Related but not quite the same thing: which is more efficient? (Or at least, reading through it didn't help me any.)
So I am working on a new site (selling insurance policies). We already have several sites up (it's a Rails application) that do this, so I have a table in my SQL database called policies.
As you can imagine it has lots of columns to support all the different options available.
While working on this new site I realized I needed to keep track of 20+ more options.
My concern is that the policies table is already large, but the columns in it right now are almost all used by every application we have. Whereas if I add these new columns, they would only be used by the new site and would leave tons of NULL cells in all the rest of the policies.
So my question is do I add those to the existing table or create a new table just for the policies sold on that site? Also I believe that if I created a new table I could leave out some of the columns (but not very many) from the main policies table because they are not needed for this application.
"[A]lmost all used" suggests that you could, upon considering it, split it more naturally.
Now, much of the efficiency concern here goes down to three things:
A single table can be scanned through more quickly than joins across several.
Large rows have a memory and disk-space cost in themselves.
If a single table represents something that is really a 1-to-many, then it requires more work on insert, delete or update.
Point 2 only really comes into play if there are a lot of cases where you need one particular subset of the data, another batch where you need a different subset, and maybe just a few where you need them all. If you're using most of the columns in most places, then it doesn't gain you anything. In that case, splitting tables is bad.
Points 1 and 3 argue for and against joining into one big table, respectively.
Before any of that though, let's get back to "almost all". If there are several rows with a batch of null fields, why? Often answering that "why?" reveals that there's really a natural split there that should be broken off into another table as part of normal normalisation*. Repetition of fields is an even stronger sign that this is the case.
Do this first.
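As a concrete (made-up) sketch of such a split: the columns every application uses stay in policies, and the new site's 20+ options move to a one-to-one side table, so the other sites' rows carry no NULL ballast:

    # Hypothetical DDL -- the column names are placeholders, not the real schema.
    side_table = """
        CREATE TABLE policy_site_options (
            policy_id INT PRIMARY KEY,
            option_a  VARCHAR(50),
            option_b  TINYINT(1),
            FOREIGN KEY (policy_id) REFERENCES policies(id)
        )
    """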
To denormalise - whether by splitting what is naturally one table, or joining what is naturally several - is a very particular type of optimisation - it makes some things more efficient at the cost of making other things less efficient, and it introduces possibilities of bugs that don't exist otherwise. I would never say you should never denormalise - I do it myself - but you need to be able to say "I am denormalising table X & Y in this manner, because it will help case C which happens enough and I can live with the extra cost to case D". Then you need to check it actually did help case C significantly and case D insignificantly, along with looking for hidden costs.
One of the reasons for normalising in the first place is it gives good average performance over a wide range of cases. It's the balance you want most of the time. Denormalising from the get-go rather than with a normalised database as a starting point is almost always premature.
*Fun trivia fact: the name "normalization" was in part a take on Richard Nixon's "Vietnamisation" policy, meaning there was a running joke in some quarters of adding "-isation" onto just about anything. Were it not for the White House's reaction to the Tet Offensive, we could be using the gerund "normalising", or something completely different, instead.

Lock free doubly linked skip list

There is tons of research on lock-free doubly linked lists. Likewise, there is tons of research on lock-free skip lists. As best I can tell, however, nobody has managed a lock-free doubly linked skip list. Does anybody know of any research to the contrary, or a reason why this is the case?
Edit:
The specific scenario is for building a fast quantile (50%, 75%, etc) accumulator. Samples are inserted into the skip list in O(log n) time. By maintaining an iterator to the current quantile, we can compare the inserted value to the current quantile in O(1) time, and can easily determine whether the inserted value is to the left or right of the quantile, and by how much the quantile needs to move as a result. It's the left move that requires a previous pointer.
As I understand it, any difficulty will come from keeping the previous pointers consistent in the face of multiple threads inserting and removing at once. I imagine the solution will almost certainly involve a clever use of pointer marking.
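To make the bookkeeping concrete, here is a toy single-threaded version of the accumulator, with a plain sorted list standing in for the skip list (the list insert is O(n); the skip list is what would make that step O(log n)):

    import bisect

    class QuantileAccumulator:
        def __init__(self, q):
            self.q, self.data, self.idx = q, [], 0   # q = 0.5 for the median

        def insert(self, x):
            pos = bisect.bisect_left(self.data, x)
            self.data.insert(pos, x)
            if len(self.data) > 1 and pos <= self.idx:
                self.idx += 1        # inserted to our left: our node shifted right
            target = int(self.q * (len(self.data) - 1))
            if self.idx < target:
                self.idx += 1        # step right: follow the `next` pointer
            elif self.idx > target:
                self.idx -= 1        # step left: the move that needs `prev`
            return self.data[self.idx]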
But why would you do such a thing? I've not actually sat down and worked out exactly how skip lists work, but from my vague understanding, you'd never use the previous pointers. So why have the overhead of maintaining them?
But if you wanted to, I don't see why you cannot. Just replace the singly linked list with a doubly linked list. The doubly linked list is logically coherent, so it's all the same.
I have an idea for you. We use a "cursor" to find the item in a skiplist. The cursor also maintains the trail that was taken to get to the item. We use this trail for delete and insert - it avoids a second search to perform those operations, and it embeds the version # of the list that was seen when the traversal was made. I am wondering if you could use the cursor to more quickly find the previous item.
You would have to go up a level on the cursor and then search for the item that is just barely less than your item. Alternatively, if the search made it to the lowest level of the linked list, just save the prev ptr as you traverse. The lowest level is probably used 50% of the time to find your item, so performance would be decent.
Hmm... thinking about it now, it seems that the cursor would have the prev pointer 50% of the time, need to search again from one level up 25% of the time, from two levels up 12.5% of the time, and so on. So in infrequent cases, you have to redo almost the entire search.
I think the advantage to this would be that you don't have to figure out how to "lock free" maintain a double linked skip list, and for the majority of cases you dramatically decrease the cost of locating the previous item.
As an alternative to maintaining backlinks, when a quantile needs to be updated, you could do another search to find the node whose key is less than the current one. As I also just mentioned in a comment to johnnycrash, it's possible to build an array of the rightmost node found at each level -- and from that it would be possible to accelerate the second search. (Fomitchev's thesis mentions this as a possible optimization.)
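For reference, this is the shape of that rightmost-node-per-level search on an ordinary, single-threaded skip list; making it lock-free is exactly the hard part, so treat this purely as a sketch of the trail idea (node layout is assumed: a head sentinel and per-node next arrays):

    def find_trail(head, key, levels):
        """Return, per level, the rightmost node with key < `key`.
        trail[0] is the immediate predecessor on the base-level list."""
        trail = [None] * levels
        node = head
        for level in range(levels - 1, -1, -1):   # walk from the top level down
            nxt = node.next[level]
            while nxt is not None and nxt.key < key:
                node, nxt = nxt, nxt.next[level]
            trail[level] = node
        return trail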