I am trying to select a user at random from a very simple table for the purposes of generating sample data.
The table has just two columns, the integer primary key users_id which has all the values from 1 to 46 inclusive, and uname which is a varchar(60).
The query
select relusers.uname from relusers where relusers.users_id=floor(rand()*46+1);
is returning multiple rows. Perhaps I've been staring at this for too long but I fail to see how the above query could ever return more than one row. floor() returns a single integer which is being compared to the primary key column. Including users_id in the selection shows multiple different IDs being selected. Zero rows as a result I can understand, but multiple? Any ideas?
Your code is returning multiple rows because rand() is evaluated on each row. So, you have the change of multiple matches. And a chance of no matches at all.
You can use your idea, but try it this way:
select relusers.uname
from relusers cross join
(selext #rand := rand()) const
where relusers.users_id = floor(#rand*46+1);
This generates just one random value and hence just one row. But, with only 46 rows, the order by method should perform well enough:
select relusers.uname
from relusers
order by rand()
limit 1;
select * from relusers order by rand() limit 1
Related
I have used this line of code many times and it never gave me rows with ID(primary key) greater than 50.I have 134 rows in test table.My goal is the generation range from 0 to 134.I know that SELECT ROUND(RAND()*134) LIMIT 1 gives a perfect result.
In my case, it does not give me a perfect result
SELECT * FROM `test` WHERE `ID` > (SELECT RAND()*134) LIMIT 1;
It seems you need in:
SELECT test.*
FROM test
CROSS JOIN (SELECT ROUND(RAND()*134) random) rnd
WHERE test.id = rnd.random;
fiddle
Subquery generates one random value within the range, and the row with this id value is selected.
Why your query cannot work as you need?
SELECT * FROM `test` WHERE `ID` > (SELECT RAND()*134) LIMIT 1;
The table has 134 rows. For each row separate random number is generated (i.e. each separate row has now its own separate random value). Then id of each row is compared with the random generated for this row. Of course lower id has higher probability to match then the greater. This is the first logical error. Then LIMIT 1 takes all rows matched and return one indefinite (random) row from all matched. This is second logical error of a query.
I'm currently working on a multi-thread program (in Java) that will need to select random rows in a database, in order to update them. This is working well but I started to encounter some performance issue regarding my SELECT request.
I tried multiple solutions before finding this website :
http://jan.kneschke.de/projects/mysql/order-by-rand/
I tried with the following solution :
SELECT * FROM Table
JOIN (SELECT FLOOR( COUNT(*) * RAND() ) AS Random FROM Table)
AS R ON Table.ID > R.Random
WHERE Table.FOREIGNKEY_ID IS NULL
LIMIT 1;
It selects only one row below the random id number generated. This is working pretty good (an average of less than 100ms per request on 150k rows). But after the process of my program, the FOREIGNKEY_ID will no longer be NULL (it will be updated with some value).
The problem is, my SELECT will "forget" some rows than have an id below the random generated id, and I won't be able to process them.
So I tried to adapt my request, doing this :
SELECT * FROM Table
JOIN (SELECT FLOOR(
(SELECT COUNT(id) FROM Table WHERE FOREIGNKEY_ID IS NULL) * RAND() )
AS Random FROM Table)
AS R ON Table.ID > R.Random
WHERE Table.FOREIGNKEY_ID IS NULL
LIMIT 1;
With that request, no more problems of skipping some rows, but performances are decreasing drastically (an average of 1s per request on 150k rows).
I could simply execute the fast one when I still have a lot of rows to process, and switch to the slow one when it remains just a few rows, but it will be a "dirty" fix in the code, and I would prefer an elegant SQL request that can do the work.
Thank you for your help, please let me know if I'm not clear or if you need more details.
For your method to work more generally, you want max(id) rather than count(*):
SELECT t.*
FROM Table t JOIN
(SELECT FLOOR(MAX(id) * RAND() ) AS Random FROM Table) r
ON t.ID > R.Random
WHERE t.FOREIGNKEY_ID IS NULL
ORDER BY t.ID
LIMIT 1;
The ORDER BY is usually added to be sure that the "next" id is returned. In theory, MySQL could always return the maximum id in the table.
The problem is gaps in ids. And, it is easy to create distributions where you never get a random number . . . say that the four ids are 1, 2, 3, 1000. Your method will never get 1000000. The above will almost always get it.
Perhaps the simplest solution to your problem is to run the first query multiple times until it gets a valid row. The next suggestion would be an index on (FOREIGNKEY_ID, ID), which the subquery can use. That might speed the query.
I tend to favor something more along these lines:
SELECT t.id
FROM Table t
WHERE t.FOREIGNKEY_ID IS NULL AND
RAND() < 1.0 / 1000
ORDER BY RAND()
LIMIT 1;
The purpose of the WHERE clause is to reduce the volume considerable, so the ORDER BY doesn't take much time.
Unfortunately, this will require scanning the table, so you probably won't get responses in the 100 ms range on a 150k table. You can reduce that to an index scan with an index on t(FOREIGNKEY_ID, ID).
EDIT:
If you want a reasonable chance of a uniform distribution and performance that does not increase as the table gets larger, here is another idea, which -- alas -- requires a trigger.
Add a new column to the table called random, which is initialized with rand(). Build an index onrandom`. Then run a query such as:
select t.*
from ((select t.*
from t
where random >= #random
order by random
limit 10
) union all
(select t.*
from t
where random < #random
order by random desc
limit 10
)
) t
order by rand();
limit 1;
The idea is that the subqueries can use the index to choose a set of 20 rows that are pretty arbitrary -- 10 before and after the chosen point. The rows are then sorted (some overhead, which you can control with the limit number). These are randomized and returned.
The idea is that if you choose random numbers, there will be arbitrary gaps and these would make the chosen numbers not quite uniform. However, by taking a larger sample around the value, then the probability of any one value being chosen should approach a uniform distribution. The uniformity would still have edge effects, but these should be minor on a large amount of data.
Your ID's are probably gonna contain gaps. Anything that works with COUNT(*) is not going to be able to find all the ID's.
A table with records with ID's 1,2,3,10,11,12,13 has only 7 records. Doing a random with COUNT(*) will often result in a miss as records 4,5 and 6 donot exist, and it will then pick the nearest ID which is 3. This is not only unbalanced (it will pick 3 far too often) but it will also never pick records 10-13.
To get a fair uniformly distrubuted random selection of records, I would suggest loading the ID's of the table first. Even for 150k rows, loading a set of integer id's will not consume a lot of memory (<1 MB):
SELECT id FROM table;
You can then use a function like Collections.shuffle to randomize the order of the ID's. To get the rest of the data, you can select records one at a time or for example 10 at a time:
SELECT * FROM table WHERE id = :id
Or:
SELECT * FROM table WHERE id IN (:id1, :id2, :id3)
This should be fast if the id column has an index, and it will give you a proper random distribution.
If prepared statement can be used, then this should work:
SELECT #skip := Floor(Rand() * Count(*)) FROM Table WHERE FOREIGNKEY_ID IS NULL;
PREPARE STMT FROM 'SELECT * FROM Table WHERE FOREIGNKEY_ID IS NULL LIMIT ?, 1';
EXECUTE STMT USING #skip;
LIMIT in SELECT statement can be used to skip rows
I am trying to understand how Pagination works in mySQL. Considering I have a lot of data that is to be retrieved based on select query how does adding different columns in the select statement change the pagination?
e.g.
Select name from employee; vs Select name, employeeId from employee;
Will using employeeId in the select field help in retrieving data in more efficient manner even though that field in not required. Adding it as employeeId is indexed.
Thanks
Pagination deals with rows, not columns, and is as simple as a LIMIT clause:
LIMIT 10 OFFSET 10
Would give you 10 rows starting from the 10th record.
There is no implicit pagination in MySQL. If you are looking to implement pagination based on your query results, the LIMIT clause may come handy.
For example:
select name from employee limit a,b;
will return b rows from the table employee. These would be row numbers a+1 through a+b
select name from employee 0, 10
would return rows 1 through 10.
Using any number of columns will not affect the way you limit the number of rows which is normally referred to as pagination.
I have been looking on the web on how to select a random row on big tables, I have found various results, but then I analyzed my data and figured out that the best way for me to go is to count the rows and select a random one of those with LIMIT
While testing I start to wonder why this works:
SET #t = CEIL(RAND()*(SELECT MAX(id) FROM logo));
SELECT id
FROM logo
WHERE
current_status_id=29 AND
logo_type_id=4 AND
active='y' AND
id>=#t
ORDER BY id
LIMIT 1;
and gives random results, but this always returns the same 4 or 5 results ?
SELECT id
FROM logo
WHERE
current_status_id=29 AND
logo_type_id=4 AND
active='y' AND
id>=CEIL(RAND()*(SELECT MAX(id) FROM logo))
ORDER BY id
LIMIT 1;
the table has MANY fields (almost 100) and quite a few indexes. over 14 Million records and counting. When I select a random it is almost NEVER that I have to select it from the table, I always have to select depending on various fields values (all indexed).
Could it be a bug of my MySQL server version (5.6.13-log Source distribution)?
One possibility is that this statement in the documentation:
RAND() in a WHERE clause is re-evaluated every time the WHERE is executed.
is simply not always true. It is true when you do:
where rand() < 0.01
to get an approximate 1% sample of the rows. Perhaps the MySQL optimizer says something like "Oh, I'll evaluate the subquery to get one value back. And, just to be more efficient, I'll multiply that row by rand() before defining the constant."
If I had to guess, that would be the case.
Another possibility is that the data is arranged so the values you are looking for has one row with a large id. Or, it could be that there are lots of rows with small ids at the very beginning, and then a very large gap.
Your method of getting a random row, by the way is not guaranteed to return a result when you are doing filtering. I don't know if that is important to you.
EDIT:
Check to see if this version works as you expect:
SELECT id
FROM logo cross join
(SELECT MAX(id) as maxid FROM logo) c
WHERE current_status_id = 29 AND
logo_type_id = 4 AND
active = 'y' AND
id >= RAND() * maxid
ORDER BY id
LIMIT 1;
If so, the problem is that the max id is being calculated and then there is an extra step of multiplying it by rand() as execution of the query begins.
How can I SELECT the last row in a MySQL table?
I'm INSERTing data and I need to retrieve a column value from the previous row.
There's an auto_increment in the table.
Yes, there's an auto_increment in there
If you want the last of all the rows in the table, then this is finally the time where MAX(id) is the right answer! Kind of:
SELECT fields FROM table ORDER BY id DESC LIMIT 1;
Keep in mind that tables in relational databases are just sets of rows. And sets in mathematics are unordered collections. There is no first or last row; no previous row or next row.
You'll have to sort your set of unordered rows by some field first, and then you are free the iterate through the resultset in the order you defined.
Since you have an auto incrementing field, I assume you want that to be the sorting field. In that case, you may want to do the following:
SELECT *
FROM your_table
ORDER BY your_auto_increment_field DESC
LIMIT 1;
See how we're first sorting the set of unordered rows by the your_auto_increment_field (or whatever you have it called) in descending order. Then we limit the resultset to just the first row with LIMIT 1.
You can combine two queries suggested by #spacepille into single query that looks like this:
SELECT * FROM `table_name` WHERE id=(SELECT MAX(id) FROM `table_name`);
It should work blazing fast, but on INNODB tables it's fraction of milisecond slower than ORDER+LIMIT.
on tables with many rows are two queries probably faster...
SELECT #last_id := MAX(id) FROM table;
SELECT * FROM table WHERE id = #last_id;
Almost every database table, there's an auto_increment column(generally id )
If you want the last of all the rows in the table,
SELECT columns FROM table ORDER BY id DESC LIMIT 1;
OR
You can combine two queries into single query that looks like this:
SELECT columns FROM table WHERE id=(SELECT MAX(id) FROM table);
Make it simply use: PDO::lastInsertId
http://php.net/manual/en/pdo.lastinsertid.php
Many answers here say the same (order by your auto increment), which is OK, provided you have an autoincremented column that is indexed.
On a side note, if you have such field and it is the primary key, there is no performance penalty for using order by versus select max(id). The primary key is how data is ordered in the database files (for InnoDB at least), and the RDBMS knows where that data ends, and it can optimize order by id + limit 1 to be the same as reach the max(id)
Now the road less traveled is when you don't have an autoincremented primary key. Maybe the primary key is a natural key, which is a composite of 3 fields...
Not all is lost, though. From a programming language you can first get the number of rows with
SELECT Count(*) - 1 AS rowcount FROM <yourTable>;
and then use the obtained number in the LIMIT clause
SELECT * FROM orderbook2
LIMIT <number_from_rowcount>, 1
Unfortunately, MySQL will not allow for a sub-query, or user variable in the LIMIT clause
If you want the most recently added one, add a timestamp and select ordered in reverse order by highest timestamp, limit 1. If you want to go by ID, sort by ID. If you want to use the one you JUST added, use mysql_insert_id.
You can use an OFFSET in a LIMIT command:
SELECT * FROM aTable LIMIT 1 OFFSET 99
in case your table has 100 rows this return the last row without relying on a primary_key
Without ID in one query:
SELECT * FROM table_name LIMIT 1 OFFSET (SELECT COUNT(*) - 1 FROM table_name)
SELECT * FROM adds where id=(select max(id) from adds);
This query used to fetch the last record in your table.