get 10 random ids every single time from my database - mysql

I need 10 random ids at time. I need to get a new set of random ids every time I ask for a new set, but the new ones must not include any of the ones I already got from any number of previous times I asked for a new set Unless the process is reset. I may have a total of 100 or 1million ids in my database. I plan to use the ids to show 10 items on a webpage, with next and previous buttons. The pages already shown have to be consistent with the original items shown if the users goes back to any previously shown page
I have an idea that I select random numbers with seed 1000 times ,store it on redis server and pop out every 10 rows when a user enter the page. Are there any different ideas?

For a large set of non-repeating 'random' numbers you are probably better off using encryption. If numbers do not repeat, then they are not truly random because they are constrained not to repeat. Every time you pick a number the pool of available numbers shrinks. Hence the output is not truly random.
To implement, set up a counter: 0, 1, 2, 3, ... Pick a constant key. Then encrypt the counter using the key to get a non-repeating output. Then increment the counter ready to generate the next output. Because encryption is reversible then different inputs using the same key are guaranteed to give different outputs. Encryption is a one-to-one process.
AES will give you 128 non-repeating bits, DES only goes to 64 bits. If 128 bits are not enough then you will have to do some research on larger block ciphers, such as Rijndael.

You are looking for a repeatable random sort. In MySQL, you can do this by passing a seed to mathematical function rand(), as explained in the documentation:
for equal argument values, RAND(N) returns the same value each time, and thus produces a repeatable sequence of column values.
This gives you the first 10 records:
select t.*
from mytable t
order by rand(12345)
limit 10
You can then paginate; to get the "next" 10 records, you use the same seed to rand(), and offset the result:
select t.*
from mytable t
order by rand(12345)
limit 10 offset 10

Related

Is the following possible in SQL query

Is the following possible, i been racking my brain to think of a solution.
I have an sql table, very simple table, few text columns and two int columns.
What i want to ideally do is allow user to add a row, but just the text columns and have the sql automatically put the numbers in the integer columns.
Ideally id like these numbers to random but not already exsist (so every row has a unique number) in the column. Also 10 digits long (but think that might be pushing it).
Is there anyway i can achieve this within the query itself?
Thanks
Sure - you pass the string as parameters to the Insert statement and the values as well - after you computed them. you can use SQL fucntion to generate the random number, or use the code you're calling from to generate them.
You can generate unique int numbers for a row with setting it AUTO_INCREMENT. However if you want something like a random hash, you need to do it in your backend. (or in a stored procedure)
Just a thought: if you generate long enough random strings you don't need to worry about having duplication usually. So it's safe to generate a random string, try to insert it and repeat until you get a duplicate entry error. Won't happen most of the time so it might be quicker than checking it first with a select.
You can generate a random number using MySQL. This will generate a random number between 0 and 10.000:
FLOOR(RAND() * 10001)
If you really want the numbers to always be 10 digits long you can generate a number between 1.000.000.000 and 9.999.999.999 like this:
FLOOR(RAND() * 9000000000) + 1000000000
The chance of the number not being unique is ~0.0000000001% and rising as you insert new rows. For a 0% chance of collision I'd suggest doing this the right way and handling this in code and not the database.
The random function explained:
What is happening is RAND() is generating a random decimal number between 0 and 1 (never actually 1). Then we multiply that number by the maximum number that we wish to produce plus 1. We add 1 because the biggest number produced for a set maximum number of 10 will be 9,XXXX and never actually 10 or above (remember I said that RAND() never generates 1), so we add plus one to produce the possibility of 10,XXXX which we later floor using FLOOR() to produce 10. In this case though we don't add 1 because 10.000.000.000 will become possible and it breaches our 10 digit boundary. Then we add the minimum number which we want produced (+ 1.000.000.000 in this case) while subtracting the same from the number we entered before (the maximum number).

Selecting unique fields but am unsure if DISTINCT or GROUP BY will do as I need

I am using MySQL ver 5.5.8.
Lets say I have the table,entries, structure like so:
entry_id int PK
member_id FK
there can be multiple entries for each member. I want to get 10 of them at random but I need to fetch them in a way that allows for the odds of being selected increase with the number of entries a member has. I know I could just do something like:
SELECT member_id
FROM entries
GROUP BY member_id
ORDER BY RAND()
LIMIT 10
But I'm not sure if that will do what I want. Will MySQL group the records THEN select 10? If that were the case then every member would have the same chance to get picked, which is not what I want. I have done some testing and searching but can't come up with a definitive answer. Does anyone know if this will do what I want or do I have to do things a different way? Any help would be appreciated. Thanks much!
LIMIT 10 will choose 10 records base in (in this case) a random order. This is indeed after the grouping.
Maybe you can ORDER BY RAND() / count(*). That way, the number is likely to be smaller for users with more questions, thus they are more likely to be in the top 10.
[edit]
By the way, it seems that over time (as the data grows) ORDER BY RAND() becomes slower. There are a couple of ways to work around that. Mediawiki (software behind Wikipedia) has an interesting method: It generates a random number for each page, so when you select 'random page', it generates one random number between 0 and 1 and selects the page that is closest to that number:
WHERE number > {randomNumber} ORDER BY number LIMIT 1`
That saves having to generate that temporary table for each query. You will need to periodically re-generate the numbers if your data grows, and you must make sure the numbers are evenly generated. That is easy enough: For new records, you can just generate a random number. Periodically the entire list is updated: All records are queried. Then, each record in that order is assigned a number between 0 and 1, but in an incrementing number, that increments 1 / recordCount. That way, the records are evenly spaced, and the change of finding them is the same for each one of them.
You could use that method too. It will make your query faster in the long run, and you could make the distribution smarter: 1) Instead of using 'memberCount', you can use 'totalEntryCount'. 2) Instead of incrementing by 1 / 'memberCount', you could use entryCountForMember / totalEntryCount. That way, the gap before members with more entries will be bigger, therefor, the chance of them matching the random number will be bigger as well. For instance, your members may look like this:
name entries number delta
bob 10 0.1 0.10
john 1 0.11 0.01
jim 5 0.16 0.05
fred 84 1 0.84
The delta isn't saved, of course, but it shows the added number. In the Mediawiki example, this delta would be the same for each page, but in your case, it could depend on the number of entries. Now you see, there's only a small gap between bob and john, so the chance that you pick a random number between 0 and bob is ten times as large as picking a random number between bob and john. So, chances of picking bob are ten times as large as picking john.
You will need a (cron) job to periodically redistribute the numbers, because you don't want to do that on each modification, but for the kind of data you're dealing with, it doesn't have to be real-time, and it makes your queries a lot faster if you got many members and many entries.

Scope random order with pagination

I'm trying to figure out how to make a scope that will return random ActiveRecords while also supporting will_paginate.
Right now each page view gets a completely different random set of ActiveRecords. Thus each pagination link, is actually just another random set of records.
How might I set up a scope so that it will be a random order that persists through pagination?
I'm guessing that I need to have some sort of seed that is based on time?
I'm guessing that you're using ORDER BY RAND() in your SQL to get your random ordering. If you are, then you can provide a seed for the random number generator:
RAND(), RAND(N)
[...] If a constant integer argument N is specified, it is used as the seed value, which produces a repeatable sequence of column values.
So you just need to pick a seed (possibly even by using seed = rand(1e6) or something similar in your Ruby code) and track that seed in the session. Then, for the next page, pull the seed out of the session and feed it to MySQL's RAND function.
Keep in mind that ORDER BY RAND() (with or without a seed) isn't the cheapest operation on the planet. If your searchable table is small, you could generate a table with a bunch of columns, fill it with random numbers (probably generated by RAND), and join that table in to provide your random sequence to order by. You would provide different sequences to different viewers by choosing different columns from the random number table: for one user you sort by column 11 of the random number table/matrix but another user will use column 23. Keep in mind that tables (with a primary key) really are just functions on a finite domain (and vice versa) so which you choose is often just an implementation detail. Implementing RAND using a table will usually be pretty cumbersome but I thought I'd mention the option anyway.

pick random number from a database which doesn't gets picked next time

I want to pick a random number from a database say 1, when I query again say 10 times I'm querying it should return all different random numbers.
Use ORDER BY RAND()
If you want to get just one random record, your query should look like:
SELECT * FROM table ORDER BY RAND() LIMIT 1
If you only want a random number, don't use a database, use a random number generator. If you don't want repeats, simply keep track of the random numbers you've seen before and, if you pick one again, increment the new number by one until you reach a number you haven't seen yet.
If you want, say 10, random records from the database, then use #sAc's solution, but get them all at the same time. That will ensure that you don't have repeats in your selection. If you must select them one at a time, use the same technique as for the random number and keep track of the records you've seen before. Don't use the LIMIT directive, but simply select the first record you haven't seen on each iteration.

ORDER BY RAND() alternative [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
MySQL: Alternatives to ORDER BY RAND()
I currently have a query that ends ORDER BY RAND(HOUR(NOW())) LIMIT 40 to get 40 random results. The list of results changes each hour.
This kills the query cache, which is damaging performance.
Can you suggest an alternative way of getting a random(ish) set of results that changes from time to time? It does not have to be every hour and it does not have to be totally random.
I would prefer a random result, rather than sorting on an arbitrary field in the table, but I will do that as a last resort...
(this is a list of new products that I want to shuffle around a bit every now and then).
If you have an ID column it's better to do a:
-- create a variable to hold the random number
SET #rownum := SELECT count(*) FROM table;
SET #row := (SELECT CEIL((rand() * #rownum));
-- use the random number to select on the id column
SELECT * from tablle WHERE id = #row;
The logic of selecting the random id number can be move to the application level.
SELECT * FROM table ORDER BY RAND LIMIT 40
is very inefficient because MySQL will process ALL the records in the table performing a full table scan on all the rows, order them randomly.
Its going to kill the cache because you are expecting a different result set each time. There is no way that you can cache a random set of values. If you want to cache a group of results, cache a large random set of values, and then within sub sections of the time that you are going to use those values do a random grab within the smaller set [outside of sql].
I think the better way is to download product identifiers to your middle layer, choose random 40 values when you need (once per hour or for every request) and use them in the query: product_id in (#id_1, #id_2, ..., #id_40).
you may have a column with random values that you update every hour.
This is going to be a significantly nasty query if it needs to sort a large data set into a random order (which really does require a sort), then discard all but the first 40 records.
A better solution would be to just pick 40 random records. There are lots of ways of doing this and it usually depends on having keys which are evenly distributed.
Another option is to pick the 40 random records in a batch job which is only run once per hour (or whatever) and then remember which ones they are.
One way to achieve it is to shuffle the objects you map the data to. If you don't map the data to objects, you could shuffle the result array from the database. I don't know if this will perform better or not, but you will at least get the benefits from the query cache as you mention.
You could also generate a random sequence from 1 to n, and index the result array (or object array) with those.
calculate the current hour in your PHP code and pass that to your query. this will result in a static value that can be cached.
note that you might also have a hidden bug. since you're only taking the hour, you only have 24 different values, which will repeat every day. which means that what's showing at 1 pm today will also be the same as what shows tomorrow at 6. you might want to change that.
Don't fight with the cache-- expoit it!
Write your query as you are (or even simpler). Then, in your code, cache the results, setting a cache expiry for 1 hour. If you are using a caching layer, like memcached, you are set. If not, you can build a fairly simple one:
[pseudocode]
global cache[24]
h = Time.hour
if (cache[h] == null) {
cache[h] = .. run your query
}
return cache[h];
If you only need a new set of random data once an hour, don't hit the database - save the results to your application's caching layer (or, if it doesn't have one, just put it out into a temporary file of some sort). Query cache is handy, but if you never need to even execute a query, even better...