Why is "LIMIT 0" even allowed in MySQL SELECT statements? - mysql

Limit 0, 1000 returns the first 1,000 results, but LIMIT 0 returns 0 results.
That's not very intuitive imho. For example, dumb old me thought that removing the 1000 would remove the upper limit to the SELECT query, thus returning all of the results.
Why would anybody even want to query MySQL for 0 results?

From the MySQL documentation
LIMIT 0 quickly returns an empty set. This can be useful for checking the validity of a query. When using one of the MySQL APIs, it can also be employed for obtaining the types of the result columns.

limit 0 can be used to get the same columns types of other tables
create table newtable
select col1 from table1 limit 0;
That way, a hard-coded description of the columns types for newtable is not needed, ensuring that the columns types will still match even if a change occurs in the description of table1 before creating newtable
It also works with a more complete statement, involving indexes, engine, multiple tables, etc
create table newtable (primary key (col1)) engine=memory
select col1,col2,col3 from table1,table2 limit 0;

*Polite corrections are welcomed and appreciated if I am incorrect here, but:
My understanding is that LIMIT 0, 1000 is telling SQL that you want to start with the first set of 1000 results in a given database, for the given criteria. For example, if there are 10,000 resulting rows in a dataset, LIMIT 0, 1000 would show you the first set of 1000 results. The zero is like the index of an array in JavaScript - the code starts ITS count with zero, rather than one, when referencing an array item. So item #1 is actually item #0, item #2 is actually item #1, and so on.

In addition to the answers already given, it is also useful when you want to make operations to a table based on the number of rows present in that table.
I.e. using PHP, if you want to delete all entries except for the one with the greatest id from table "myTable":
<?php
$con = mysqli_connect("hostname", "username", "password", "database"); // Connect
$totalRows = mysqli_query($con, "SELECT COUNT(*) FROM myTable"); // Get total row count
$mysqli_query($con, "DELETE FROM myTable WHERE id >= 0 LIMIT ($totalRows - 1);"); // Delete
?>
It's really useful because if you already only have 1 row left, you'll be left with LIMIT 0, which is what you want.

Related

Fastest way of randomising result with large dataset in mysql

I want to return rows order by random from a table with large number of rows to be scanned
Tried:
1) select * from table order by rand() limit 1
2) select * from table where id in (select id from table order by rand() limit 1)
2 is faster than 1 but still too slow on table with large rows
Update:
Query is used in real time app. Insert, select and update are roughly 10/sec. So caching will not be the ideal solution. Rows required for this specific case is 1. But looking for a general solution as well where query is fast and number of rows required>1
Fastest way is using prepared statement in mysql and limit
select #offset:=floor(rand()*total_rows_in_table);
PREPARE STMT FROM 'select id from table limit ?,1';
EXECUTE STMT USING #offset;
total_rows_in_table= total rows in table.
It is much faster as compared to above two.
Limitation: Fetching more than 1 rows is not truly random.
Generate a random set of IDs before doing the query (you can also get MAX(id) very quickly if you need it). Then do the query as id IN (your, list). This will use the index to look only at the IDs you requested, so it will be very fast.
Limitation: if some of your randomly chosen IDs don't exist, the query will return less results, so you'll need to do these operations in a loop until you have enough results.
If you can run two querys in the same "call" you can do something like this, sadly, this asumes there are no deleted records in your database... if they where some query's would not return anything.
I tested with some local records and the fastest i could do was this... that said i tested it on a table with no deleted rows.
SET #randy = CAST(rand()*(SELECT MAX(id) FROM yourtable) as UNSIGNED);
SELECT *
FROM yourtable
WHERE id = #randy;
Another solution that came from modifying a little the answer to this question, and from your own solution:
Using variables as OFFSET in SELECT statments inside mysql's stored functions
SET #randy = CAST(rand()*(SELECT MAX(id) FROM yourtable) as UNSIGNED);
SET #q1 = CONCAT('SELECT * FROM yourtable LIMIT 1 OFFSET ', #randy);
PREPARE stmt1 FROM #q1;
EXECUTE stmt1;
I imagine a table with, say, a million entries. You want to pick a row randomly, so you generate one random number per row, i.e. a million random numbers, and then seek the row with the minimum generated number. There are two tasks involved:
generating all those numbers
finding the minimum number
and then accessing the record of course.
If you wanted more than one row, the DBMS could sort all records and then return n records, but hopefully it would rather apply some part-sort operation where it only detects the n minimum numbers. Quite some task anyway.
There is no thorough way to circumvent this, I guess. If you want random access, this is the way to go.
If you would be ready to live with a less random result, however, I'd suggest to make ID buckets. Imagine ID buckets 000000-0999999, 100000-1999999, ... Then randomly choose one bucket and of this pick your random rows. Well, admittedly, this doesn't look very random and you would either get only old or only new records with such buckets; but it illustrates the technique.
Instead of creating the buckets by value, you'd create them with a modulo function. id % 1000 would give you 1000 buckets. The first with IDs xxx000, the second with IDs xxx001. This would solve the new/old records thing and get the buckets balanced. As IDs are a mere technical thing, it doesn't matter at all that the drawn IDs look so similar. And even if that bothers you, then don't make 1000 buckets, but say 997.
Now create a computed column:
alter table mytable add column bucket int generated always as (id % 997) stored;
Add an index:
create index idx on mytable(bucket);
And query the data:
select *
from mytable
where bucket = floor(rand() * 998)
order by rand()
limit 10;
Only about 0.1% of the table gets into the sorting here. So this should be rather fast. But I suppose that only pays with a very large table and a high number of buckets.
Disadvantages of the technique:
It can happen that you don't get as many rows as you want and you'd have to query again then.
You must choose the modulo number wisely. If there are just two thousand records in the table, you wouldn't make 1000 buckets of course, but maybe 100 and never demand more than, say, ten rows at a time.
If the table grows and grows, a once chosen number may no longer be optimal and you might want to alter it.
Rextester link: http://rextester.com/VDPIU7354
UPDATE: It just dawned on me that the buckets would be really random, if the generated column would not be based on a modulo on the ID, but on a RANDvalue instead:
alter table mytable add column bucket int generated always as (floor(rand() * 1000)) stored;
but MySQL throws an error "Expression of generated column 'bucket' contains a disallowed function". This doesn't seem to make sense, as a non-deterministic function should be okay with the STORED option, but at least in version 5.7.12 this doesn't work. Maybe in some later version?

MySql Big Data With Multiple Select [duplicate]

My iPhone application connects to my PHP web service to retrieve data from a MySQL database, a request can return up to 500 results.
What is the best way to implement paging and retrieve 20 items at a time?
Let's say I receive the first 20 entries from my database, how can I now request the next 20 entries?
From the MySQL documentation:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
For 500 records efficiency is probably not an issue, but if you have millions of records then it can be advantageous to use a WHERE clause to select the next page:
SELECT *
FROM yourtable
WHERE id > 234374
ORDER BY id
LIMIT 20
The "234374" here is the id of the last record from the prevous page you viewed.
This will enable an index on id to be used to find the first record. If you use LIMIT offset, 20 you could find that it gets slower and slower as you page towards the end. As I said, it probably won't matter if you have only 200 records, but it can make a difference with larger result sets.
Another advantage of this approach is that if the data changes between the calls you won't miss records or get a repeated record. This is because adding or removing a row means that the offset of all the rows after it changes. In your case it's probably not important - I guess your pool of adverts doesn't change too often and anyway no-one would notice if they get the same ad twice in a row - but if you're looking for the "best way" then this is another thing to keep in mind when choosing which approach to use.
If you do wish to use LIMIT with an offset (and this is necessary if a user navigates directly to page 10000 instead of paging through pages one by one) then you could read this article about late row lookups to improve performance of LIMIT with a large offset.
Define OFFSET for the query. For example
page 1 - (records 01-10): offset = 0, limit=10;
page 2 - (records 11-20) offset = 10, limit =10;
and use the following query :
SELECT column FROM table LIMIT {someLimit} OFFSET {someOffset};
example for page 2:
SELECT column FROM table
LIMIT 10 OFFSET 10;
There's literature about it:
Optimized Pagination using MySQL, making the difference between counting the total amount of rows, and pagination.
Efficient Pagination Using MySQL, by Yahoo Inc. in the Percona Performance Conference 2009. The Percona MySQL team provides it also as a Youtube video: Efficient Pagination Using MySQL (video),
The main problem happens with the usage of large OFFSETs. They avoid using OFFSET with a variety of techniques, ranging from id range selections in the WHERE clause, to some kind of caching or pre-computing pages.
There are suggested solutions at Use the INDEX, Luke:
"Paging Through Results".
"Pagination done the right way".
This tutorial shows a great way to do pagination.
Efficient Pagination Using MySQL
In short, avoid to use OFFSET or large LIMIT
you can also do
SELECT SQL_CALC_FOUND_ROWS * FROM tbl limit 0, 20
The row count of the select statement (without the limit) is captured in the same select statement so that you don't need to query the table size again.
You get the row count using SELECT FOUND_ROWS();
Query 1: SELECT * FROM yourtable WHERE id > 0 ORDER BY id LIMIT 500
Query 2: SELECT * FROM tbl LIMIT 0,500;
Query 1 run faster with small or medium records, if number of records equal 5,000 or higher, the result are similar.
Result for 500 records:
Query1 take 9.9999904632568 milliseconds
Query2 take 19.999980926514 milliseconds
Result for 8,000 records:
Query1 take 129.99987602234 milliseconds
Query2 take 160.00008583069 milliseconds
Here's how I'm solving this problem using node.js and a MySQL database.
First, lets declare our variables!
const
Key = payload.Key,
NumberToShowPerPage = payload.NumberToShowPerPage,
Offset = payload.PageNumber * NumberToShowPerPage;
NumberToShowPerPage is obvious, but the offset is the page number.
Now the SQL query...
pool.query("SELECT * FROM TableName WHERE Key = ? ORDER BY CreatedDate DESC LIMIT ? OFFSET ?", [Key, NumberToShowPerPage, Offset], (err, rows, fields) => {}));
I'll break this down a bit.
Pool, is a pool of MySQL connections. It comes from mysql node package module. You can create a connection pool using mysql.createPool.
The ?s are replaced by the variables in the array [PageKey, NumberToShow, Offset] in sequential order. This is done to prevent SQL injection.
See at the end were the () => {} is? That's an arrow function. Whatever you want to do with the data, put that logic between the braces.
Key = ? is something I'm using to select a certain foreign key. You would likely remove that if you don't use foreign key constraints.
Hope this helps.
If you are wanting to do this in a stored procedure you can try this
SELECT * FROM tbl limit 0, 20.
Unfortunately using formulas doesn't work so you can you execute a prepared statement or just give the begin and end values to the procedure.

How to get a random row from database in more optimized way

Hi i know how to get random row but i need an optimized way, i am migrating huge data from one schema to another in oracle.Other than performing count validation on each table. I am doing random record validation (for any random row , i am checking whether all column values match or not between the databases).I am using
SELECT * FROM (SELECT * FROM ADM_USER ORDER BY dbms_random.value) WHERE rownum = 1;
Before that i was using:
select * from ADM_USER where ADM_USER_ID=(select Round(dbms_random.value(1,max(ADM_USER_ID))) from ADM_USER)
The problem with the latter one is that values in ADM_USER_ID are not contiguous.So most of the times the query returns empty result set. The first one is good but for tables with huge cardinality it takes 6 to 7 seconnds.
Thanks in advance.
For Oracle, look at the SAMPLE clause. The following will look at random 1% of a table
select * from MDSYS.SDO_COORD_REF_SYS sample(1);
You can still add the rownum=1 filter on top of that.

Is it a bad idea to store row count and number of row to speed up pagination?

My website has more than 20.000.000 entries, entries have categories (FK) and tags (M2M). As for query even like SELECT id FROM table ORDER BY id LIMIT 1000000, 10 MySQL needs to scan 1000010 rows, but that is really unacceptably slow (and pks, indexes, joins etc etc don't help much here, still 1000010 rows). So I am trying to speed up pagination by storing row count and row number with triggers like this:
DELIMITER //
CREATE TRIGGER #trigger_name
AFTER INSERT
ON entry_table FOR EACH ROW
BEGIN
UPDATE category_table SET row_count = (#rc := row_count + 1)
WHERE id = NEW.category_id;
NEW.row_number_in_category = #rc;
END //
And then I can simply:
SELECT *
FROM entry_table
WHERE row_number_in_category > 10
ORDER BY row_number_in_category
LIMIT 10
(now only 10 rows scanned and therefore selects are blazing fast, although inserts are slower, but they are rare comparing to selects, so it is ok)
Is it a bad approach and are there any good alternatives?
Although I like the solution in the question. It may present some issues if data in the entry_table is changed - perhaps deleted or assigned to different categories over time.
It also limits the ways in which the data can be sorted, the method assumes that data is only sorted by the insert order. Covering multiple sort methods requires additional triggers and summary data.
One alternate way of paginating is to pass in offset of the field you are sorting/paginating by instead of an offset to the limit parameter.
Instead of this:
SELECT id FROM table ORDER BY id LIMIT 1000000, 10
Do this - assuming in this scenario that the last result viewed had an id of 1000000.
SELECT id FROM table WHERE id > 1000000 ORDER BY id LIMIT 0, 10
By tracking the offset of the pagination, this can be passed to subsequent queries for data and avoids the database sorting rows that are not ever going to be part of the end result.
If you really only wanted 10 rows out of 20million, you could go further and guess that the next 10 matching rows will occur in the next 1000 overall results. Perhaps with some logic to repeat the query with a larger allowance if this is not the case.
SELECT id FROM table WHERE id BETWEEN 1000000 AND 1001000 ORDER BY id LIMIT 0, 10
This should be significantly faster because the sort will probably be able to limit the result in a single pass.

MySQL Data - Best way to implement paging?

My iPhone application connects to my PHP web service to retrieve data from a MySQL database, a request can return up to 500 results.
What is the best way to implement paging and retrieve 20 items at a time?
Let's say I receive the first 20 entries from my database, how can I now request the next 20 entries?
From the MySQL documentation:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
For 500 records efficiency is probably not an issue, but if you have millions of records then it can be advantageous to use a WHERE clause to select the next page:
SELECT *
FROM yourtable
WHERE id > 234374
ORDER BY id
LIMIT 20
The "234374" here is the id of the last record from the prevous page you viewed.
This will enable an index on id to be used to find the first record. If you use LIMIT offset, 20 you could find that it gets slower and slower as you page towards the end. As I said, it probably won't matter if you have only 200 records, but it can make a difference with larger result sets.
Another advantage of this approach is that if the data changes between the calls you won't miss records or get a repeated record. This is because adding or removing a row means that the offset of all the rows after it changes. In your case it's probably not important - I guess your pool of adverts doesn't change too often and anyway no-one would notice if they get the same ad twice in a row - but if you're looking for the "best way" then this is another thing to keep in mind when choosing which approach to use.
If you do wish to use LIMIT with an offset (and this is necessary if a user navigates directly to page 10000 instead of paging through pages one by one) then you could read this article about late row lookups to improve performance of LIMIT with a large offset.
Define OFFSET for the query. For example
page 1 - (records 01-10): offset = 0, limit=10;
page 2 - (records 11-20) offset = 10, limit =10;
and use the following query :
SELECT column FROM table LIMIT {someLimit} OFFSET {someOffset};
example for page 2:
SELECT column FROM table
LIMIT 10 OFFSET 10;
There's literature about it:
Optimized Pagination using MySQL, making the difference between counting the total amount of rows, and pagination.
Efficient Pagination Using MySQL, by Yahoo Inc. in the Percona Performance Conference 2009. The Percona MySQL team provides it also as a Youtube video: Efficient Pagination Using MySQL (video),
The main problem happens with the usage of large OFFSETs. They avoid using OFFSET with a variety of techniques, ranging from id range selections in the WHERE clause, to some kind of caching or pre-computing pages.
There are suggested solutions at Use the INDEX, Luke:
"Paging Through Results".
"Pagination done the right way".
This tutorial shows a great way to do pagination.
Efficient Pagination Using MySQL
In short, avoid to use OFFSET or large LIMIT
you can also do
SELECT SQL_CALC_FOUND_ROWS * FROM tbl limit 0, 20
The row count of the select statement (without the limit) is captured in the same select statement so that you don't need to query the table size again.
You get the row count using SELECT FOUND_ROWS();
Query 1: SELECT * FROM yourtable WHERE id > 0 ORDER BY id LIMIT 500
Query 2: SELECT * FROM tbl LIMIT 0,500;
Query 1 run faster with small or medium records, if number of records equal 5,000 or higher, the result are similar.
Result for 500 records:
Query1 take 9.9999904632568 milliseconds
Query2 take 19.999980926514 milliseconds
Result for 8,000 records:
Query1 take 129.99987602234 milliseconds
Query2 take 160.00008583069 milliseconds
Here's how I'm solving this problem using node.js and a MySQL database.
First, lets declare our variables!
const
Key = payload.Key,
NumberToShowPerPage = payload.NumberToShowPerPage,
Offset = payload.PageNumber * NumberToShowPerPage;
NumberToShowPerPage is obvious, but the offset is the page number.
Now the SQL query...
pool.query("SELECT * FROM TableName WHERE Key = ? ORDER BY CreatedDate DESC LIMIT ? OFFSET ?", [Key, NumberToShowPerPage, Offset], (err, rows, fields) => {}));
I'll break this down a bit.
Pool, is a pool of MySQL connections. It comes from mysql node package module. You can create a connection pool using mysql.createPool.
The ?s are replaced by the variables in the array [PageKey, NumberToShow, Offset] in sequential order. This is done to prevent SQL injection.
See at the end were the () => {} is? That's an arrow function. Whatever you want to do with the data, put that logic between the braces.
Key = ? is something I'm using to select a certain foreign key. You would likely remove that if you don't use foreign key constraints.
Hope this helps.
If you are wanting to do this in a stored procedure you can try this
SELECT * FROM tbl limit 0, 20.
Unfortunately using formulas doesn't work so you can you execute a prepared statement or just give the begin and end values to the procedure.