SQL IN Query performance - better split it or not - mysql

I get up to 1000 IDs from another server to display to visitors, so I have to use an IN query like:
SELECT * FROM `table` WHERE `id` IN (23221, 42422, 2342342....) -- and so on, up to 1000
Let's say 1/3 of the visitors will view all 1000 IDs, while 2/3 of them will only view the first 50.
Which would be better for performance/workload: one query for all 1000 IDs, or splitting them into 20 queries of 50 IDs each? That is, when the first 50 have been viewed, query for the next 50, and so on.
EDIT:
I don't need to use LIMIT when splitting, which means each query would contain 50 IDs at most. So which is better: one query with 1000 IDs at once, or 20 queries with 50 IDs each?
EDIT:
OK, let me ask more briefly and directly: are 1000 IDs in one query too many? I read in "How to optimize an SQL query with many thousands of WHERE clauses" that large numbers of WHERE/OR conditions are bad.

"Let's say 1/3 of the visitors will view all 1000 IDs, while 2/3 of them will only view the first 50."
So you want to optimize the response based on how you expect visitors to use it.
"Which would be better for performance/workload: one query for all 1000 IDs, or splitting them into 20 queries of 50 IDs each? That is, when the first 50 have been viewed, query for the next 50, and so on."
Yes, you are correct: you should limit how much each response returns.
Here is one way to implement your requirement (I don't know MySQL well, but this is how you could get the desired result):
SELECT * FROM `table` WHERE `id` IN (23221, 42422, 2342342....)
order by `id`
LIMIT 10 OFFSET 10
If it were SQL Server:
CREATE PROCEDURE sp_SomeName
    @id varchar(8000),
    @skip int,
    @take int
AS
BEGIN
    SELECT * FROM some_table WHERE id IN (23221, 42422, 2342342....)
    ORDER BY id
    OFFSET @skip ROWS           -- if 0, start from the first row
    FETCH NEXT @take ROWS ONLY; -- if 10, this is the maximum number of rows returned
END
What the above query does: it fetches all rows for the posted IDs and sorts them by id in ascending order. From there it returns just the first 10/50/100 rows; the next call skips those and returns the next 10/50/100, or whatever skip and take values you choose. Hope this helps :)
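The alternative raised in the question, splitting the ID list itself into batches of 50 and issuing one IN query per batch on demand, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the items table, its columns, and the helper names chunked and fetch_by_ids are hypothetical.

```python
import sqlite3
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_by_ids(conn: sqlite3.Connection, ids: Sequence[int]) -> List[tuple]:
    """Run one parameterized IN query for a single batch of ids."""
    if not ids:
        return []
    # One "?" placeholder per id; the values are bound, never interpolated.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, name FROM items WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, list(ids)).fetchall()


# Demo data: hypothetical table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 2001)],
)

wanted_ids = list(range(2, 2001, 2))     # the 1000 ids received from the other server
batches = list(chunked(wanted_ids, 50))  # 20 batches of 50 ids each
first_page = fetch_by_ids(conn, batches[0])  # only query what the visitor sees first
```

Later batches are fetched only when the visitor actually reaches them, so visitors who stop after the first 50 never trigger the remaining 19 queries.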

You can look at the answer provided here:
MySQL Data - Best way to implement paging?
With the LIMIT clause you can return only a portion of the result. By changing the parameters of the LIMIT clause, you can reuse the same query for each portion.
Note that unless you use ORDER BY, a database server does not always return the same records in the same order. In other words, if a record is temporarily unavailable to read because of a concurrent update while the server can read the next one, it will fetch the next record (to return a result as soon as possible). I do not know for sure whether LIMIT forces the server to apply some sort of order (I am not that familiar with MySQL).

Related

Optimize LIMIT the number of rows to be SELECT in SQL

Consider a table Test having 1000 rows
Test Table
id name desc
1 Adi test1
2 Sam test2
3 Kal test3
.
.
1000 Jil test1000
If I need to fetch only a small subset, say 100 rows, I use a LIMIT clause in my query:
SELECT * FROM test LIMIT 100;
As I understand it, this query first fetches all 1000 rows and then returns 100 of them.
Can this be optimised so that the DB engine reads only 100 rows and returns them
(instead of fetching all 1000 rows first and then returning 100)?
The reason for this supposition is that the order of processing is:
FROM
WHERE
SELECT
ORDER BY
LIMIT
You can combine LIMIT row_count with ORDER BY. MySQL then stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.
Hope this helps, If you need any clarification just drop a comment.
The query you wrote will fetch only 100 rows, not 1000. But, if you change that query in any way, my statement may be wrong.
GROUP BY and ORDER BY are likely to incur a sort, which is arguably even slower than a full table scan. And that sort must be done before seeing the LIMIT.
Well, not always...
SELECT ... FROM t ORDER BY x LIMIT 100;
together with INDEX(x) -- This may use the index and fetch only 100 rows from the index. BUT... then it has to reach into the data 100 times to find the other columns that you ask for. UNLESS you only ask for x.
Etc, etc.
And here's another wrinkle. A lot of questions on this forum are "Why isn't MySQL using my index?" Back to your query. If there are "only" 1000 rows in your table, my example with the ORDER BY x won't use the index, because it is faster to simply read through the table, tossing 90% of the rows. On the other hand, if there were 9999 rows, then it would use the index. (The transition is somewhere around 20%, but that figure is imprecise.)
Confused? Fine. Let's discuss one query at a time. I can [probably] discuss the what and why of each one you throw at me. Be sure to include SHOW CREATE TABLE, the full query, and EXPLAIN SELECT... That way, I can explain what EXPLAIN tells you (or does not).
Did you know that having both a GROUP BY and ORDER BY may cause the use of two sorts? EXPLAIN won't point that out. And sometimes there is a simple trick to get rid of one of the sorts.
There are a lot of tricks up MySQL's sleeve.

MySql Big Data With Multiple Select [duplicate]

My iPhone application connects to my PHP web service to retrieve data from a MySQL database. A request can return up to 500 results.
What is the best way to implement paging and retrieve 20 items at a time?
Let's say I receive the first 20 entries from my database, how can I now request the next 20 entries?
From the MySQL documentation:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
For 500 records efficiency is probably not an issue, but if you have millions of records then it can be advantageous to use a WHERE clause to select the next page:
SELECT *
FROM yourtable
WHERE id > 234374
ORDER BY id
LIMIT 20
The "234374" here is the id of the last record from the previous page you viewed.
This will enable an index on id to be used to find the first record. If you use LIMIT offset, 20 you could find that it gets slower and slower as you page towards the end. As I said, it probably won't matter if you have only 200 records, but it can make a difference with larger result sets.
Another advantage of this approach is that if the data changes between the calls you won't miss records or get a repeated record. This is because adding or removing a row means that the offset of all the rows after it changes. In your case it's probably not important - I guess your pool of adverts doesn't change too often and anyway no-one would notice if they get the same ad twice in a row - but if you're looking for the "best way" then this is another thing to keep in mind when choosing which approach to use.
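A minimal sketch of this "id of the last record" approach is shown below. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the name column and the helper fetch_page are hypothetical, while the query shape (WHERE id > ? ORDER BY id LIMIT ?) follows the example above.

```python
import sqlite3
from typing import List, Tuple


def fetch_page(conn: sqlite3.Connection, last_seen_id: int,
               page_size: int) -> List[Tuple[int, str]]:
    """Return the next `page_size` rows whose id is greater than `last_seen_id`."""
    return conn.execute(
        "SELECT id, name FROM yourtable WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


# Demo data: 100 rows with ids 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 101)],
)

page1 = fetch_page(conn, 0, 20)             # first page: start below the smallest id
page2 = fetch_page(conn, page1[-1][0], 20)  # pass the last id of the previous page

# A row from page 1 disappears between calls; page 2 is unaffected because
# it is anchored on an id value, not on a row offset.
conn.execute("DELETE FROM yourtable WHERE id = 5")
page2_again = fetch_page(conn, page1[-1][0], 20)
```

With LIMIT 20, 20 instead, the same delete would shift every later row up by one position, and the second page would silently skip a record.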
If you do wish to use LIMIT with an offset (and this is necessary if a user navigates directly to page 10000 instead of paging through pages one by one) then you could read this article about late row lookups to improve performance of LIMIT with a large offset.
Define OFFSET for the query. For example
page 1 - (records 01-10): offset = 0, limit=10;
page 2 - (records 11-20) offset = 10, limit =10;
and use the following query :
SELECT column FROM table LIMIT {someLimit} OFFSET {someOffset};
example for page 2:
SELECT column FROM table
LIMIT 10 OFFSET 10;
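The page-to-offset arithmetic above can be written as a small helper. This is a sketch only: the function name limit_offset is hypothetical, and it assumes pages are numbered from 1, as in the "page 1" and "page 2" examples.

```python
from typing import Tuple


def limit_offset(page: int, page_size: int) -> Tuple[int, int]:
    """Translate a 1-based page number into (limit, offset) values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    return page_size, (page - 1) * page_size


limit, offset = limit_offset(page=2, page_size=10)
# These two integers would be bound to the LIMIT and OFFSET placeholders
# of the query, rather than concatenated into the SQL string.
```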
There's literature about it:
Optimized Pagination using MySQL, making the difference between counting the total amount of rows, and pagination.
Efficient Pagination Using MySQL, by Yahoo Inc. in the Percona Performance Conference 2009. The Percona MySQL team provides it also as a Youtube video: Efficient Pagination Using MySQL (video).
The main problem happens with the usage of large OFFSETs. They avoid using OFFSET with a variety of techniques, ranging from id range selections in the WHERE clause, to some kind of caching or pre-computing pages.
There are suggested solutions at Use the INDEX, Luke:
"Paging Through Results".
"Pagination done the right way".
This tutorial shows a great way to do pagination.
Efficient Pagination Using MySQL
In short, avoid using OFFSET or a large LIMIT.
You can also do:
SELECT SQL_CALC_FOUND_ROWS * FROM tbl LIMIT 0, 20;
The row count of the SELECT statement (as if there were no LIMIT) is captured by the same statement, so you don't need to query the table size again.
You get the row count using SELECT FOUND_ROWS();
Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17.
Query 1: SELECT * FROM yourtable WHERE id > 0 ORDER BY id LIMIT 500
Query 2: SELECT * FROM tbl LIMIT 0,500;
Query 1 runs faster on small or medium tables; at 5,000 records or more, the results are similar.
Result for 500 records:
Query 1 took 9.9999904632568 milliseconds
Query 2 took 19.999980926514 milliseconds
Result for 8,000 records:
Query 1 took 129.99987602234 milliseconds
Query 2 took 160.00008583069 milliseconds
Here's how I'm solving this problem using node.js and a MySQL database.
First, let's declare our variables:
const
Key = payload.Key,
NumberToShowPerPage = payload.NumberToShowPerPage,
Offset = payload.PageNumber * NumberToShowPerPage;
NumberToShowPerPage is self-explanatory; Offset is the page number multiplied by the page size, so this assumes PageNumber is zero-based (page 0 is the first page).
Now the SQL query...
pool.query("SELECT * FROM TableName WHERE `Key` = ? ORDER BY CreatedDate DESC LIMIT ? OFFSET ?", [Key, NumberToShowPerPage, Offset], (err, rows, fields) => {});
I'll break this down a bit.
pool is a pool of MySQL connections. It comes from the mysql npm package. You can create a connection pool using mysql.createPool.
The ?s are replaced, in order, by the values in the array [Key, NumberToShowPerPage, Offset]. This is done to prevent SQL injection. (KEY is a reserved word in MySQL, so a column with that name has to be quoted with backticks.)
See the () => {} at the end? That's an arrow function. Whatever you want to do with the data, put that logic between the braces.
Key = ? is something I'm using to select a certain foreign key. You would likely remove that if you don't use foreign key constraints.
Hope this helps.
If you want to do this in a stored procedure, you can try this:
SELECT * FROM tbl LIMIT 0, 20;
Unfortunately, using expressions in LIMIT doesn't work, so you can either execute a prepared statement or just pass the begin and end values to the procedure.

MySQL Limit Event Happens On When

I have a question about MySQL's LIMIT.
Let's say I have a table with 100 rows,
and I run a query against it (SELECT, WHERE, and so on),
then limit the result size with LIMIT 10.
In this case, does MySQL retrieve all 100 rows first and then cut the result down to 10, or does it count rows as it goes and stop retrieving once it has 10?
Let's think about this logically, and maybe the answer will become evident. Imagine you are using the following query:
SELECT someCol
FROM yourTable
ORDER BY someCol
LIMIT 10
It should be intuitive that MySQL has to know the ordinal position of every record in the result set in order to be able to guarantee that the 10 records returned are in fact the first 10 records of what the entire result set would be.
If MySQL were to just take the first 10 records which it hit during the scan, then in general it could not guarantee that the records returned respect the ordering you specified.

Is it a bad idea to store row count and number of row to speed up pagination?

My website has more than 20,000,000 entries; entries have categories (FK) and tags (M2M). Even for a query like SELECT id FROM table ORDER BY id LIMIT 1000000, 10, MySQL needs to scan 1000010 rows, which is unacceptably slow (and PKs, indexes, joins etc. don't help much here; it is still 1000010 rows). So I am trying to speed up pagination by storing the row count and row number with triggers like this:
DELIMITER //
CREATE TRIGGER trigger_name
BEFORE INSERT -- NEW values can only be modified in a BEFORE trigger
ON entry_table FOR EACH ROW
BEGIN
UPDATE category_table SET row_count = (@rc := row_count + 1)
WHERE id = NEW.category_id;
SET NEW.row_number_in_category = @rc;
END //
DELIMITER ;
And then I can simply:
SELECT *
FROM entry_table
WHERE row_number_in_category > 10
ORDER BY row_number_in_category
LIMIT 10
(now only 10 rows scanned and therefore selects are blazing fast, although inserts are slower, but they are rare comparing to selects, so it is ok)
Is it a bad approach and are there any good alternatives?
Although I like the solution in the question, it may present some issues if data in the entry_table is changed, for example deleted or assigned to different categories over time.
It also limits the ways in which the data can be sorted: the method assumes that data is only sorted by insert order. Covering multiple sort methods requires additional triggers and summary data.
One alternative way of paginating is to pass in the last seen value of the field you are sorting/paginating by, instead of an offset to the LIMIT clause.
Instead of this:
SELECT id FROM table ORDER BY id LIMIT 1000000, 10
Do this - assuming in this scenario that the last result viewed had an id of 1000000.
SELECT id FROM table WHERE id > 1000000 ORDER BY id LIMIT 0, 10
By tracking the offset of the pagination, this can be passed to subsequent queries for data and avoids the database sorting rows that are not ever going to be part of the end result.
If you really only want 10 rows out of 20 million, you could go further and guess that the next 10 matching rows will occur within the next 1000 ids, with some logic to repeat the query with a larger allowance if that turns out not to be the case.
SELECT id FROM table WHERE id BETWEEN 1000000 AND 1001000 ORDER BY id LIMIT 0, 10
This should be significantly faster because the sort will probably be able to limit the result in a single pass.
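The "guess a window, then retry with a larger allowance" logic could look roughly like this. It is a sketch under assumptions, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the entries table and the function next_ids are hypothetical, and it uses id > ? AND id <= ? rather than BETWEEN so that the last seen id itself is excluded.

```python
import sqlite3
from typing import List


def next_ids(conn: sqlite3.Connection, last_id: int, want: int,
             window: int = 1000, max_window: int = 1_000_000) -> List[int]:
    """Fetch up to `want` ids after `last_id`, trying a bounded id range first."""
    while window <= max_window:
        rows = conn.execute(
            "SELECT id FROM entries WHERE id > ? AND id <= ? ORDER BY id LIMIT ?",
            (last_id, last_id + window, want),
        ).fetchall()
        if len(rows) == want:
            return [r[0] for r in rows]
        window *= 10  # too few rows in this range: widen the guess and retry
    # Give up guessing and fall back to the open-ended query.
    rows = conn.execute(
        "SELECT id FROM entries WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, want),
    ).fetchall()
    return [r[0] for r in rows]


# Demo data with a large gap in the ids: 1..100, then 5000..5100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [(i,) for i in list(range(1, 101)) + list(range(5000, 5101))],
)

across_gap = next_ids(conn, last_id=95, want=10, window=10)  # must widen past the gap
tail = next_ids(conn, last_id=5095, want=10, window=10)      # fewer than 10 rows remain
```

The fallback matters at the end of the table, where no window can ever yield a full page.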
