Scope random order with pagination - mysql

I'm trying to figure out how to make a scope that will return random ActiveRecords while also supporting will_paginate.
Right now each page view gets a completely different random set of ActiveRecords, so each pagination link just returns another random set of records.
How might I set up a scope so that it will be a random order that persists through pagination?
I'm guessing that I need to have some sort of seed that is based on time?

I'm guessing that you're using ORDER BY RAND() in your SQL to get your random ordering. If you are, then you can provide a seed for the random number generator:
RAND(), RAND(N)
[...] If a constant integer argument N is specified, it is used as the seed value, which produces a repeatable sequence of column values.
So you just need to pick a seed (possibly even by using seed = rand(1e6) or something similar in your Ruby code) and track that seed in the session. Then, for the next page, pull the seed out of the session and feed it to MySQL's RAND function.
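A minimal sketch of the seed-in-session idea, written in Python rather than Ruby. The `session` dict, the `items` table name and the `random_page_query` helper are hypothetical stand-ins; the only MySQL feature relied on is the seeded `RAND(N)` quoted above.

```python
import random

PER_PAGE = 10  # assumed page size


def random_page_query(session: dict, page: int, per_page: int = PER_PAGE):
    """Return (sql, params) for one page of a per-session random ordering."""
    # Pick the seed once per session and reuse it for every later page.
    if "order_seed" not in session:
        session["order_seed"] = random.randrange(1_000_000)
    offset = (page - 1) * per_page
    # `items` is a placeholder table name; %s is the placeholder style
    # used by common Python MySQL drivers.
    sql = "SELECT * FROM items ORDER BY RAND(%s) LIMIT %s OFFSET %s"
    return sql, (session["order_seed"], per_page, offset)
```

Because the seed lives in the session, every page request from the same visitor sorts by the same sequence, while a new session draws a new seed and therefore sees a different order.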
Keep in mind that ORDER BY RAND() (with or without a seed) isn't the cheapest operation on the planet. If your searchable table is small, you could generate a table with a bunch of columns, fill it with random numbers (probably generated by RAND), and join that table in to provide your random sequence to order by. You would provide different sequences to different viewers by choosing different columns from the random number table: for one user you sort by column 11 of the random number table/matrix but another user will use column 23. Keep in mind that tables (with a primary key) really are just functions on a finite domain (and vice versa) so which you choose is often just an implementation detail. Implementing RAND using a table will usually be pretty cumbersome but I thought I'd mention the option anyway.

Related

get 10 random ids every single time from my database

I need 10 random ids at a time. Each time I ask for a new set, the new ids must not include any that I already received in any previous set, unless the process is reset. I may have anywhere from 100 to 1 million ids in my database. I plan to use the ids to show 10 items on a web page, with next and previous buttons. Pages already shown have to stay consistent: if the user goes back to a previously shown page, it must show the original items.
My idea is to select random numbers with a seed 1000 times, store them on a Redis server, and pop 10 rows each time a user enters the page. Are there any other ideas?
For a large set of non-repeating 'random' numbers you are probably better off using encryption. If numbers do not repeat, then they are not truly random because they are constrained not to repeat. Every time you pick a number the pool of available numbers shrinks. Hence the output is not truly random.
To implement, set up a counter: 0, 1, 2, 3, ... Pick a constant key. Then encrypt the counter using the key to get a non-repeating output. Then increment the counter ready to generate the next output. Because encryption is reversible then different inputs using the same key are guaranteed to give different outputs. Encryption is a one-to-one process.
AES will give you 128 non-repeating bits, DES only goes to 64 bits. If 128 bits are not enough then you will have to do some research on larger block ciphers, such as Rijndael.
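The counter-plus-encryption idea can be sketched as follows. Python's standard library has no AES, so this sketch swaps in a toy Feistel network built on `hashlib` as the keyed one-to-one mapping, and uses cycle-walking to keep the output inside `0..total-1`. It is an illustration only, not a vetted cipher, and all function names are hypothetical.

```python
import hashlib


def _round_value(half: int, key: bytes, round_no: int, half_bits: int) -> int:
    """Keyed round function: hash the half-block and keep `half_bits` bits."""
    data = key + bytes([round_no]) + half.to_bytes(8, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def _feistel(block: int, key: bytes, half_bits: int, rounds: int = 4) -> int:
    """Balanced Feistel network: a keyed one-to-one mapping on 2*half_bits bits."""
    mask = (1 << half_bits) - 1
    left, right = block >> half_bits, block & mask
    for round_no in range(rounds):
        left, right = right, left ^ _round_value(right, key, round_no, half_bits)
    return (left << half_bits) | right


def nth_random_id(counter: int, key: bytes, total: int) -> int:
    """Map counter 0..total-1 to a distinct pseudo-random value in 0..total-1."""
    if not 0 <= counter < total:
        raise ValueError("counter must be in range(total)")
    half_bits = max(1, ((total - 1).bit_length() + 1) // 2)
    value = _feistel(counter, key, half_bits)
    # Cycle-walk: re-encrypt until the value falls back inside the target range.
    while value >= total:
        value = _feistel(value, key, half_bits)
    return value


def page_of_ids(page: int, key: bytes, total: int, per_page: int = 10) -> list:
    """Page 0 uses counters 0..9, page 1 uses counters 10..19, and so on."""
    start = page * per_page
    stop = min(start + per_page, total)
    return [nth_random_id(c, key, total) for c in range(start, stop)]
```

Since a page is just a fixed range of counter values, going back to an earlier page reproduces the same ids without storing anything; resetting the process amounts to choosing a new key. The outputs are positions in `0..total-1`, so they would still need to be mapped onto real ids.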
You are looking for a repeatable random sort. In MySQL, you can do this by passing a seed to the mathematical function rand(), as explained in the documentation:
for equal argument values, RAND(N) returns the same value each time, and thus produces a repeatable sequence of column values.
This gives you the first 10 records:
select t.*
from mytable t
order by rand(12345)
limit 10
You can then paginate; to get the "next" 10 records, you use the same seed to rand(), and offset the result:
select t.*
from mytable t
order by rand(12345)
limit 10 offset 10

MySQL Predictable Random Ordering

I am very surprised that I can't figure this out.
I'm currently outputting a table from MySQL in a somewhat random order. I say somewhat because there is a formula that is partially reliant on RAND(). In any case, we can assume the order is effectively random for my question.
This was all working great, except I want to keep the same order for a "session". I don't want it to keep jumping around while actively using the data. I have been trying to figure out how to have MySQL generate the same sequence a second time.
I know that you can do RAND(N) where N is a seed, but as far as I can tell that will be the exact same number each time. So basically there will be no random factor at all if I use that.
What I would like is a way I can feed a seed into my ORDER BY and always get a reliable output order. For the same seed, I will get the same order, and if I feed in a different seed, it will be a different random order.
The best I could come up with is to add a column holding a RAND value for each row and sort by that column. There are a few issues:
Additional memory is used in the database.
It doesn't work for multiple users, because I'd need a separate column for each user.
I have to think about this, but I'm pretty certain that there is a clever solution here that doesn't involve me adding an additional column to the database. Has anyone else ever encountered the need to do something like this?
As you mentioned, you can provide a seed to generate a sequence of random numbers. The algorithm for generating random numbers returns the same sequence of numbers for the same seed. For example:
SELECT Rand(1) AS rnd, CustomerId, CustomerName FROM Customers ORDER BY rnd
By doing so, you will always get the same random order for the seed 1. You can pass the session id or a similar number as the seed to get a stable order per session.
Hope it helps.
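To address the worry in the question that a seeded RAND(N) would yield "the exact same number each time": a seeded generator returns a repeatable sequence, one value per row, not one repeated value. A small sketch of that behaviour, using Python's `random.Random` as a stand-in (it is not the same algorithm as MySQL's RAND, and `seeded_order` is a hypothetical helper):

```python
import random


def seeded_order(row_ids, seed):
    """Mimic ORDER BY RAND(seed): one pseudo-random sort key per row, in scan order."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row_id) for row_id in row_ids]
    return [row_id for _, row_id in sorted(keyed)]
```

Calling `seeded_order(ids, 42)` twice gives the same ordering, while a different seed gives a different one, which is the per-session behaviour being asked for.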
I would strongly suggest that you use one of the columns to generate the number:
select t.*
from t
order by rand(t.id);
This assumes that id is an integer. You can get a different ordering by adjusting the seed, say, rand(t.id + 1).
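One property of deriving the sort key from the row itself is that each row's position depends only on its own id and the seed, so adding or deleting other rows does not reshuffle the rest. The sketch below illustrates this with a hash of `seed:id` in place of MySQL's `rand(t.id)`; the hash is a swapped-in technique and the helper names are hypothetical.

```python
import hashlib


def row_sort_key(row_id: int, seed: int) -> int:
    """Deterministic pseudo-random key derived from the row id and a seed."""
    digest = hashlib.sha256(f"{seed}:{row_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def order_by_row_key(row_ids, seed):
    """Sort row ids by their per-row key, like ORDER BY f(id, seed)."""
    return sorted(row_ids, key=lambda row_id: row_sort_key(row_id, seed))
```

Changing the seed changes every key and therefore the whole ordering, which plays the role of `rand(t.id + 1)` in the answer above.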

In a relational database, should all columns that will be ordered in a query have an index?

I'm accessing the database (predominantly MS SQL Server and PostgreSQL) through an ORM and defining attributes (like whether the field/column should have an index) via code.
I'm thinking that if a column will be ordered via ORDER BY, it should have an index, otherwise full table scan will be required every time (e.g. if you want to get top 5 records ordered by date).
As I'm defining these indexes in code (on Entity Framework POCO entities, as .NET attributes), I can access these metadata at runtime. When displaying the data in a grid, I'm planning to make only those columns sortable (by clicking on column header) that have an index attribute. Is my thinking correct, or maybe there exist some reasonable conditions where sorting can be desirable on non-indexed column, or vice-versa (indexed column sorting would not make much sense?..)
In short, is it good to assume that only those columns should be sortable in UI, that have corresponding index applied at the database level?
Or, to phrase more generic question: Should columns that will be ordered always have some sort of index?
Whether you need an index depends on how often you query the ordered sequence compared to how often you make changes that could influence the ordered sequence.
Every time you make a change that affects the ordered sequence, your database has to update the ordered index. So if you make considerably more changes than queries, the index will be reordered more often than the result of the ordering is actually used.
Furthermore it depends on who is willing to wait for the result: the one who makes changes that requires a re-index, or the one who does the queries.
I wouldn't be surprised if the index is ordered by a separate process after the change has been made. If the query is done while the ordering is not finished, the database will need to first finish enough of the ordering before the query can return.
On the other hand, if a new change is made while the ordering that was needed because of an earlier change was not finished, the database probably will not finish the previous ordering, but start ordering the new situation.
So I guess it is not mandatory to have an ordered index for every query. To order every possible column-combination will be too much work, but if quite often a certain ordering is requested by a process that is waiting for the results, it might be wise to create the ordered index.
ORDER BY does not require an index on the column, but if the column is not indexed the database has to sort the rows itself (a filesort, in MySQL terms) instead of reading them in index order. It is therefore generally preferable to index the columns you intend to use in WHERE / JOIN ON / HAVING / ORDER BY.
You can generate the query execution plan and see the differences between the versions (indexed over non-indexed)
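As a small illustration of comparing plans, the sketch below uses SQLite (bundled with Python) rather than SQL Server or PostgreSQL, so the plan wording is SQLite's own and will differ on other engines. The `events` table and index name are made up for the example.

```python
import sqlite3


def order_by_plan(conn, sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row for `sql`."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)")
query = "SELECT * FROM events ORDER BY created_at LIMIT 5"

# Plan before the index exists: expect an explicit sort step.
plan_without_index = order_by_plan(conn, query)

conn.execute("CREATE INDEX idx_events_created_at ON events (created_at)")

# Plan after the index exists: rows can be read in index order.
plan_with_index = order_by_plan(conn, query)

print(plan_without_index)
print(plan_with_index)
```

Without the index, SQLite reports a temporary B-tree for the ORDER BY; with the index, that sort step disappears and the index is named in the plan.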
Kudos to @Harald Coppoolse for a thorough answer. There's something else you should know about sorting in the DB: it is often preferable to do it at the app level. See item number 2 in the following list: https://www.brentozar.com/archive/2013/02/7-things-developers-should-know-about-sql-server/

Is the following possible in SQL query

Is the following possible? I've been racking my brain to think of a solution.
I have a very simple SQL table: a few text columns and two int columns.
Ideally, I want to let the user add a row by supplying just the text columns, and have SQL automatically put the numbers in the integer columns.
I'd like these numbers to be random but not already existing in the column (so every row has a unique number). Ideally they would also be 10 digits long (but I think that might be pushing it).
Is there any way I can achieve this within the query itself?
Thanks
Sure - you pass the strings as parameters to the INSERT statement, along with the numeric values once you have computed them. You can use a SQL function to generate the random number, or generate it in the code you're calling from.
You can generate unique int numbers for a row by setting the column to AUTO_INCREMENT. However, if you want something like a random hash, you need to do it in your backend (or in a stored procedure).
Just a thought: if you generate long enough random strings, you usually don't need to worry about duplicates. So it's safe to generate a random string, try to insert it, and retry only if you get a duplicate entry error. That won't happen most of the time, so it might be quicker than checking first with a SELECT.
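A minimal sketch of the insert-and-retry approach, using SQLite (bundled with Python) in place of MySQL and random 10-digit numbers in place of random strings. The `notes` table and the `insert_with_random_code` helper are hypothetical; the point is that the UNIQUE constraint does the duplicate check.

```python
import random
import sqlite3


def insert_with_random_code(conn, title, max_attempts=10):
    """Insert a row with a random 10-digit code, retrying on a duplicate."""
    for _ in range(max_attempts):
        code = random.randint(1_000_000_000, 9_999_999_999)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO notes (title, code) VALUES (?, ?)", (title, code)
                )
            return code
        except sqlite3.IntegrityError:
            continue  # the UNIQUE constraint rejected a duplicate; draw again
    raise RuntimeError("could not find an unused code")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (title TEXT, code INTEGER UNIQUE)")
```

The retry limit is there only so that a nearly full code space fails loudly instead of looping forever.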
You can generate a random number using MySQL. This will generate a random number between 0 and 10,000:
FLOOR(RAND() * 10001)
If you really want the numbers to always be 10 digits long, you can generate a number between 1,000,000,000 and 9,999,999,999 like this (values in this range do not fit in MySQL's INT type, so the column would need to be BIGINT):
FLOOR(RAND() * 9000000000) + 1000000000
The chance that a new number collides with any one existing row is about 1 in 9 billion (~0.00000001%), and it rises as you insert new rows. For a 0% chance of collision I'd suggest doing this the right way and handling this in code and not the database.
The random function explained:
RAND() generates a random decimal number between 0 and 1 (never actually 1). We multiply that number by the maximum number that we wish to produce plus 1. We add 1 because, for a maximum of 10, multiplying by 10 alone would give at most 9.xxxx and never 10 or above (remember, RAND() never returns 1); multiplying by 11 makes 10.xxxx possible, which FLOOR() then turns into 10. In the 10-digit case the multiplier is 9,000,000,000, which is exactly the number of 10-digit values (9,999,999,999 - 1,000,000,000 + 1); adding another 1 would make 10,000,000,000 possible and breach the 10-digit boundary. Finally, we add the minimum number we want produced (+ 1,000,000,000 in this case), which is the same amount that was subtracted from the maximum when working out the multiplier.
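The same range formula, written out as a small Python function so the roles of the minimum, the maximum and the "+ 1" are explicit. `random_int_between` is a hypothetical helper, and `rand` stands in for MySQL's RAND(), a float in [0, 1).

```python
import math
import random


def random_int_between(low, high, rand=random.random):
    """Uniform integer in [low, high], built from a float in [0, 1)."""
    span = high - low + 1  # number of distinct integers in the range
    return math.floor(rand() * span) + low
```

With `low=0, high=10000` the span is 10001, and with `low=1000000000, high=9999999999` it is 9000000000, matching the two SQL expressions above.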

Fast mysql query to randomly select N usernames

In my JSP application I have a search box that lets the user search for user names in the database. I send an AJAX call on each keystroke and fetch 5 random names starting with the entered string.
I am using the below query:
select userid,name,pic from tbl_mst_users where name like 'queryStr%' order by rand() limit 5
But this is very slow as I have more than 2000 records in my table.
Is there a better approach that takes less time and achieves the same result? I need random values.
How slow is "very slow", in seconds?
The reason why your query could be slow is most likely that you didn't place an index on name. 2000 rows should be a piece of cake for MySQL to handle.
The other possible reason is that you have many columns in the SELECT clause. I assume in this case the MySQL engine first copies all this data to a temp table before sorting this large result set.
I advise the following, so that you work only with indexes, for as long as possible:
SELECT userid, name, pic
FROM tbl_mst_users
JOIN (
-- here, MySQL works on indexes only
SELECT userid
FROM tbl_mst_users
WHERE name LIKE 'queryStr%'
ORDER BY RAND() LIMIT 5
) AS sub USING(userid); -- join other columns only after picking the rows in the sub-query.
This method is a bit better, but still does not scale well. However, it should be sufficient for small tables (2000 rows is, indeed, small).
The link provided by @user1461434 is quite interesting. It describes a solution with almost constant performance. The only drawback is that it returns only one random row at a time.
1. Does the table have an index on name? If not, add one.
2. MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).
3. http://jan.kneschke.de/projects/mysql/order-by-rand/
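The random-column trick from item 2 can be sketched in memory as follows. A sorted list plays the role of the indexed random column and `bisect` plays the role of the index lookup. The choice of "next larger, wrapping around to the smallest" is an assumption made for this sketch, not a claim about what MediaWiki actually does, and `RandomPicker` is a hypothetical name.

```python
import bisect
import random


class RandomPicker:
    """Keeps (random_key, row_id) pairs sorted, like an indexed random column."""

    def __init__(self, row_ids, rng=None):
        self._rng = rng or random.Random()
        # Each row gets its random key once, when it is "created".
        self._entries = sorted((self._rng.random(), row_id) for row_id in row_ids)
        self._keys = [key for key, _ in self._entries]

    def pick(self):
        """Draw a number and return the row with the next larger key, wrapping around."""
        target = self._rng.random()
        index = bisect.bisect_left(self._keys, target)
        if index == len(self._keys):
            index = 0  # nothing larger: wrap to the smallest key
        return self._entries[index][1]
```

Each pick costs one binary search, which is why the indexed version scales well. It also shows the distribution caveat mentioned above: a row whose key follows a large gap is picked more often than its neighbours.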