MySQL pagination without double-querying?

I was wondering if there was a way to get the number of results from a MySQL query, and at the same time limit the results.
The way pagination works (as I understand it) is to first do something like:
query = SELECT COUNT(*) FROM `table` WHERE `some_condition`
After I get the num_rows(query), I have the number of results. But then to actually limit my results, I have to do a second query:
query2 = SELECT * FROM `table` WHERE `some_condition` LIMIT 0, 10
Is there any way to both retrieve the total number of results that would be given, AND limit the results returned in a single query? Or are there any other efficient ways of achieving this?

I almost never do two queries.
Simply return one more row than is needed, only display 10 on the page, and if there are more than are displayed, display a "Next" button.
SELECT x, y, z FROM `table` WHERE `some_condition` LIMIT 0, 11
// Iterate through and display 10 rows.
// if there were 11 rows, display a "Next" button.
Your query should return results in order of relevance; chances are most people aren't going to care about going to page 236 out of 412.
When you do a Google search and your results aren't on the first page, you likely go to page two, not nine.
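A minimal PHP sketch of this trick (assuming a mysqli connection in $db; the table and columns are the placeholders from the query above):
$pageSize = 10;

// Ask for one row more than the page needs.
$rows = $db->query(
    "SELECT x, y, z FROM `table` WHERE `some_condition` LIMIT 0, " . ($pageSize + 1)
)->fetch_all(MYSQLI_ASSOC);

// Display at most $pageSize rows; the extra row only signals that more exist.
foreach (array_slice($rows, 0, $pageSize) as $row) {
    echo $row['x'], "\n";
}
if (count($rows) > $pageSize) {
    echo '<a href="?page=2">Next</a>';
}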

No, that's how many applications that want to paginate have to do it. It's reliable and bullet-proof, albeit it runs the query twice, but you can cache the count for a few seconds and that will help a lot.
The other way is to use the SQL_CALC_FOUND_ROWS modifier and then call SELECT FOUND_ROWS(). Apart from the fact that you have to put the FOUND_ROWS() call afterwards, there is a problem with this: it tickles a bug in MySQL that affects ORDER BY queries, making it much slower on large tables than the naive approach of two queries.
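As a sketch of the two-query approach with the count cached for a few seconds (this assumes the APCu extension and a mysqli connection in $db; the cache key and names are illustrative):
// Reuse the expensive COUNT(*) for a few seconds across page views.
$total = apcu_fetch('results_total', $hit);
if (!$hit) {
    $total = (int) $db->query(
        "SELECT COUNT(*) FROM `table` WHERE `some_condition`"
    )->fetch_row()[0];
    apcu_store('results_total', $total, 10); // cache for 10 seconds
}

// The page itself is always fetched fresh.
$rows = $db->query(
    "SELECT x, y, z FROM `table` WHERE `some_condition` LIMIT 0, 10"
)->fetch_all(MYSQLI_ASSOC);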

Another approach to avoiding double-querying is to fetch all the rows for the current page using a LIMIT clause first, then only do a second COUNT(*) query if the maximum number of rows were retrieved.
In many applications, the most likely outcome will be that all of the results fit on one page, and having to do pagination is the exception rather than the norm. In these cases, the first query will not retrieve the maximum number of results.
For example, answers on a Stack Overflow question rarely spill onto a second page. Comments on an answer rarely spill over the limit of 5 or so required to show them all.
So in these applications you can simply do a query with a LIMIT first, and then, as long as that limit is not reached, you know exactly how many rows there are without the need for a second COUNT(*) query - which should cover the majority of situations.
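A hedged PHP sketch of that short-circuit (mysqli connection $db assumed; table and columns are placeholders):
$pageSize = 30;
$offset   = 0; // first page

$rows = $db->query(
    "SELECT x, y, z FROM `table` WHERE `some_condition` LIMIT $offset, $pageSize"
)->fetch_all(MYSQLI_ASSOC);

if (count($rows) < $pageSize) {
    // The page isn't full, so this is everything: no COUNT(*) needed.
    $total = $offset + count($rows);
} else {
    // Only pay for the count when pagination is actually required.
    $total = (int) $db->query(
        "SELECT COUNT(*) FROM `table` WHERE `some_condition`"
    )->fetch_row()[0];
}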

In most situations it is much faster and less resource intensive to do it in two separate queries than to do it in one, even though that seems counter-intuitive.
If you use SQL_CALC_FOUND_ROWS, then for large tables it makes your query much slower, significantly slower even than executing two queries, the first with a COUNT(*) and the second with a LIMIT. The reason for this is that SQL_CALC_FOUND_ROWS causes the LIMIT clause to be applied after fetching the rows instead of before, so it fetches the entire row for all possible results before applying the limits. This can't be satisfied by an index because it actually fetches the data.
If you take the two-query approach, with the first one only fetching COUNT(*) and not any actual data, it can be satisfied much more quickly because it can usually use indexes and doesn't have to fetch the actual row data for every row it looks at. Then, the second query only needs to look at the first $offset + $limit rows and then return.
This post from the MySQL performance blog explains this further:
http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
For more information on optimising pagination, check this post and this post.

For anyone looking for an answer in 2020, as per the MySQL documentation:
The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are deprecated as of MySQL 8.0.17 and will be removed in a future MySQL version. As a replacement, consider executing your query with LIMIT, and then a second query with COUNT(*) and without LIMIT to determine whether there are additional rows.
I guess that settles that.

My answer may be late, but you can skip the second query (with the limit) and just filter the info through your back end script. In PHP for instance, you could do something like:
if (count($queryResult) > 0) {
    $counter = 0;
    foreach ($queryResult as $result) {
        // Only display rows that fall inside the current page window.
        if ($counter >= $startAt && $counter < $startAt + $numOfRows) {
            // do what you want here
        }
        $counter++;
    }
}
But of course, when you have thousands of records to consider, it becomes inefficient very fast. A pre-calculated count may be a good idea to look into.
Here's a good read on the subject:
http://www.percona.com/ppc2009/PPC2009_mysql_pagination.pdf

SELECT col, col2, (SELECT COUNT(*) FROM `table` WHERE `some_condition`) / 10 AS total FROM `table` WHERE `some_condition` LIMIT 0, 10
Here 10 is the page size and 0 is the offset. The subquery repeats the WHERE condition so the total matches the filtered results, and dividing by the page size yields the number of pages (round it up). For page N, use an offset of (N - 1) * 10 in the query.
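A small PHP sketch of that arithmetic (assuming a mysqli connection in $db; returning the raw count and dividing in PHP keeps the rounding explicit):
$pageSize   = 10;
$pageNumber = max(1, (int) ($_GET['page'] ?? 1)); // 1-based page from the URL
$offset     = ($pageNumber - 1) * $pageSize;      // LIMIT takes an offset, not a page number

$rows = $db->query(
    "SELECT col, col2,
            (SELECT COUNT(*) FROM `table` WHERE `some_condition`) AS total
     FROM `table` WHERE `some_condition`
     LIMIT $offset, $pageSize"
)->fetch_all(MYSQLI_ASSOC);

$totalRows  = $rows ? (int) $rows[0]['total'] : 0;
$totalPages = (int) ceil($totalRows / $pageSize); // round up for a partial last page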

You can reuse most of the query in a subquery and set it to an identifier. For example, a movie query that finds movies containing the letter 's', ordering by runtime, would look like this on my site.
SELECT Movie.*, (
SELECT Count(1) FROM Movie
INNER JOIN MovieGenre
ON MovieGenre.MovieId = Movie.Id AND MovieGenre.GenreId = 11
WHERE Title LIKE '%s%'
) AS Count FROM Movie
INNER JOIN MovieGenre
ON MovieGenre.MovieId = Movie.Id AND MovieGenre.GenreId = 11
WHERE Title LIKE '%s%' LIMIT 8;
Do note that I'm not a database expert, and I am hoping someone will be able to optimize that a bit better. As it stands, running it straight from the SQL command-line interface, they both take ~0.02 seconds on my laptop.

SELECT *
FROM table
WHERE some_condition
ORDER BY RAND()
LIMIT 0, 10

Related

SQL get result and number of rows in the result with LIMIT

I have a large database in which I use LIMIT in order not to fetch all the results of the query every time (It is not necessary). But I have an issue: I need to count the number of results. The dumbest solution is the following and it works:
We just get the data that we need:
SELECT * FROM table_name WHERE param > 3 LIMIT 10
And then we find the length:
SELECT COUNT(1) FROM table_name WHERE param > 3 LIMIT 10
But this solution bugs me because unlike the query in question, the one that I work with is complex and you have to basically run it twice to achieve the result.
Another dumb solution for me was to do:
SELECT COUNT(1), param, anotherparam, additionalparam FROM table_name WHERE param > 3 LIMIT 10
But this returns only one row. At this point I would be OK if it just filled the count column with the same number in every row; I just need this information without wasting computation time.
Is there a better way to achieve this?
P.S. By the way, I am not looking to get 10 as the result of COUNT; I need the length without LIMIT.
You should (probably) run the query twice.
MySQL does have a FOUND_ROWS() function that reports the number of rows matched before the limit. But using this function is often worse for performance than running the query twice!
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
...when we have appropriate indexes for WHERE/ORDER clause in our query, it is much faster to use two separate queries instead of one with SQL_CALC_FOUND_ROWS.
There are exceptions to every rule, of course. If you don't have an appropriate index to optimize the query, it could be more costly to run the query twice. The only way to be sure is to repeat the tests shown in that blog, using your data and your query on your server.
This question is very similar to: How can I count the numbers of rows that a MySQL query returned?
See also: https://mariadb.com/kb/en/found_rows/
Running the query twice is probably the most efficient solution to your problem, but it's best to test both approaches using EXPLAIN with a reasonably sized dataset.

Limiting search result items

I want to optimize my paginated search result page
For example, I have 100 million posts to search, and the user just types "a". It will take very long to search all of that because we use SQL_CALC_FOUND_ROWS for pagination purposes.
The fact is that there is no need to search all millions of rows (posts); the answer "1000+" is enough for users. So we need to stop searching after we find 1000 results.
We want to show information like this to user:
Showing 1–10 of 1000+ results
[RESULTS]
Page 1 .... Page 100
How to do this without losing our pagination functionality?
My current query looks something like this:
SELECT SQL_CALC_FOUND_ROWS xxx_posts.ID
FROM xxx_posts
WHERE 1=1
AND (xxx_posts.post_title LIKE '%a%')
LIMIT 0, 10
Your particular example is too risky to try to speed up. However, for the general case...
SELECT id
FROM xxx_posts
WHERE ...
LIMIT 1000, 1;
If you get a row, then there are at least 1000 rows.
Do not use GROUP BY or ORDER BY unless an index can handle the WHERE and those clauses. Otherwise, they will require fetching all the rows, then sorting before getting to the LIMIT.
Your particular example is risky... Using LIKE with an initial wildcard cannot use an index. If there are not 1000+ matching rows, it checks every title in the entire table without satisfying the LIMIT 1000, 1. Nothing saved!
Can you use a FULLTEXT index?
See also MariaDB's setting to limit an individual statement's execution time.
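Putting the probe together with the page query, a rough PHP sketch (mysqli connection $db assumed; as noted above, the leading-wildcard LIKE itself prevents index use, so this only pays off with an indexable condition such as a FULLTEXT search):
// The page of results.
$rows = $db->query(
    "SELECT ID FROM xxx_posts WHERE post_title LIKE '%a%' LIMIT 0, 10"
)->fetch_all(MYSQLI_ASSOC);

// Probe for a 1001st match instead of counting everything.
$probe = $db->query(
    "SELECT ID FROM xxx_posts WHERE post_title LIKE '%a%' LIMIT 1000, 1"
);
if ($probe->num_rows > 0) {
    $label = '1000+'; // at least 1001 matches; stop counting here
} else {
    $label = (string) $db->query(
        "SELECT COUNT(*) FROM xxx_posts WHERE post_title LIKE '%a%'"
    )->fetch_row()[0];
}

echo "Showing 1–10 of $label results";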

Pagination and how to return all possible options together with database search results?

I'm working on a database containing over 5 million rows.
Question 1.
At the moment I'm doing the following:
SELECT COUNT(*) FROM cars
Count total rows to be returned. The above example is very basic, but queries do get more complex with a WHERE clause.
I'm showing 50 rows per page. Using PHP I count total pages and offset based on current page retrieved from PHP $_GET. This gets passed to the following query:
SELECT ID FROM cars ORDER BY ID DESC LIMIT $offset, 50
I fetch all IDs of rows to be displayed on the current page and put them in a single string.
$ID_list = implode( ',', array_column( $mysqli_fetch, 'ID' ) );
This then gets passed to final query.
SELECT ID, make, model, year, price FROM cars WHERE ID IN ($ID_list)
Performance-wise, I find that passing IDs to the third query is up to 8 times faster than just selecting all required columns in the second query.
What is the most efficient way to paginate results while displaying the total row count and page numbers? While OFFSET/LIMIT pagination is not efficient, the seek method makes it impossible to display page numbers. Is there an alternative method? Maybe I should look into technologies other than MySQLi?
Question 2.
What is the best approach in displaying all possible search results of returned data?
https://www.autotrader.co.uk/car-search?advertClassification=standard&postcode=B4%206TB&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE&page=1
The search in the website above starts with no filters applied. Now I can click on, for example, Make, and it shows the number of possible results next to each car brand name. Same goes for every other option. How is this achieved?
Question 1's issues and solution are discussed in http://mysql.rjweb.org/doc.php/pagination
That strongly recommends "remember where you left off" instead of OFFSET, providing a significant performance improvement; see the sketch below. It gets rid of $ID_list and lets you do the two SELECTs as one (which is another performance benefit). (Your 8x improvement was due to the combination of selecting multiple columns and skipping over rows (OFFSET).)
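For illustration, a sketch of "remember where you left off" using the question's table (mysqli connection $db assumed; it carries the last seen ID instead of a page number):
// The previous page passes its smallest ID in the URL; the first page starts from the top.
$lastId = (int) ($_GET['last_id'] ?? PHP_INT_MAX);

// One query replaces the ID-list and detail queries: the PRIMARY KEY
// seeks straight to the page start instead of skipping $offset rows.
$rows = $db->query(
    "SELECT ID, make, model, year, price
     FROM cars
     WHERE ID < $lastId
     ORDER BY ID DESC
     LIMIT 50"
)->fetch_all(MYSQLI_ASSOC);

$nextLastId = $rows ? end($rows)['ID'] : null; // pass as ?last_id= for the next page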
Question 2 is more difficult since you want to do multiple counts. Try using GROUP BY and COUNT(*) to get all the counts in a single query, as sketched below. The risk is that this might involve so much data (e.g., all 5M rows) that it takes "too long". In the few cases where a "covering" index is available, it might not be "too long".
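A sketch of such a faceted count (mysqli connection $db assumed; the price filter stands in for whatever filters are currently applied):
// One pass produces the per-make counts shown next to each filter option.
$counts = $db->query(
    "SELECT make, COUNT(*) AS n
     FROM cars
     WHERE price < 20000
     GROUP BY make"
)->fetch_all(MYSQLI_ASSOC);

foreach ($counts as $c) {
    echo "{$c['make']} ({$c['n']})\n"; // e.g. "Ford (1234)"
}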
You could do big group-bys every night -- counts by make and no filtering, counts by model-year and no filtering, etc. Store those in a table for quick fetching. Once you add filtering, the complexity makes this impractical. Note: doing such a nightly tally implies that you analyze the user's request in order to tailor the SELECT.
Even the count-how-many-rows-we-are-about-to-page-through (of Question 1) may be too costly.
See this for how to segregate the "common" attributes from the "rare" ones: http://mysql.rjweb.org/doc.php/eav . That leads to having several composite indexes of 2-3 columns in order to handle most of the SELECTs from people with random filtering criteria.
Keep the table size down by using minimal datatypes. Model_year could use a 1-byte YEAR datatype. An auto_inc for 5M cars could use a 3-byte MEDIUMINT UNSIGNED (16M limit).
Normalization (replacing a long string with a short id) saves space, but is likely to cost too much when the queries filter on multiple criteria. Eg: make = 'Ford' AND model = 'F150'.
AND is relatively easy to optimize in a WHERE clause; IN is worse and OR is even worse. For some of the IN and OR cases, you may need to resort to UNION to get rid of them. Example:
( SELECT ... WHERE make = 'BMW' )
UNION ALL
( SELECT ... WHERE make = 'Audi' )
There will be a number of other cases where you really need to "construct" the query in your app code, not simply hope that MySQL can do something optimal.
The above UNION does not allow for pagination; see my links on how to deal with such.

How to execute a query and get the count of total elements without running it twice, with Rails 4 and MySQL

I'm running a costly query in MySQL and Rails. This query is created dynamically and it also manages pagination with LIMIT and OFFSET. This is a summarized example:
SELECT fields
FROM tables
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
I would also like to get the total count of elements, but I would like to avoid running the query twice for performance purposes. I don't think it is possible, but maybe you will surprise me :)
Currently, I have something like:
objects = Object.find_by_sql(query)
totalCount = objects.count
But, of course, this is always returning the limit count.
Because you're using pagination and offsetting, you're not going to get a complete result. You can either run the two separate queries, or you can pull the complete dataset and then filter for pagination. The first option is likely to be faster, especially as your dataset grows.
To improve performance you'd get better results looking at a caching strategy at various points. Without knowing when the data changes I can't offer any specific advice.
Edit 1: Expanding for Clarification
It might help to explain why this is the case. When you put into place the limit and offset manually, Rails knows nothing about the data not returned by the query. So without having that data available, it's definitionally impossible to make Rails aware of the total count.
That said, a simple COUNT aggregation should always be very fast to execute. If you're seeing speed issues with the execution of that query you'll want to make sure you have the right indexes in place, and that Rails is rendering the conditions in an optimal format.
MySQL? Why not use SQL_CALC_FOUND_ROWS with FOUND_ROWS()?
With two queries: (the second query will not hit the database)
SELECT SQL_CALC_FOUND_ROWS * FROM users LIMIT 0,5;
SELECT FOUND_ROWS() AS rows_count;
But one piece of advice: you must test it. This might be slower or faster than two queries; it depends on factors like caching, engine, indexes, etc...
https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_found-rows
Is it possible to get the total number of rows along with an offset limit?
To count the records, just add one query as a column to your dynamically created query.
Check this:
SELECT fields,(SELECT count(*) FROM tables WHERE conditions) as obj_count
FROM tables
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
Using MySQL user variables (starting with the symbol @) we can write a more efficient query.
SELECT fields, @count AS obj_count
FROM tables, (SELECT @count := COUNT(*) FROM tables WHERE conditions) AS sub
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
This is a bit late, but try using objects.length. Length will count what you already have in the array.

MySQL matching query performance on large dataset

Does anyone have experience with query of the form
select * from TableX where columnY = 'Z' limit some_limits
For example:
select * from Topics where category_id = 100
Here columnY is indexed, but it's not the primary key. ColumnY = Z could return an unpredictable number of rows (from zero to a few thousand).
I am only wondering about the case of a quite large dataset, for example, more than 10 million items in TableX. What is the performance of such a query?
A little detail about the performance would be nice (a specific big-O analysis, for example).
It depends upon the number of records found. If your query returns a large number of records, it may take a long time for the browser to load them, and an even larger result set could make your browser unresponsive. But that is about how you consume the results, not the lookup itself: with a B-tree index on columnY, locating the first match costs O(log n) and reading k matching rows costs roughly O(log n + k), regardless of the total table size. The better solution for such problems is to limit the query as you did, with relevant limits. Furthermore, instead of limiting manually, you can loop with LIMIT, processing up to a certain index and then starting again from the following index, as sketched below. I am answering from a programming context. Hope this answers your question.
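A sketch of that batched loop (assuming a mysqli connection in $db and that Topics has an auto-increment primary key id):
$batchSize = 1000;
$lastId    = 0;

// Walk the matching rows in index-friendly batches instead of
// pulling thousands of rows in a single result set.
do {
    $rows = $db->query(
        "SELECT * FROM Topics
         WHERE category_id = 100 AND id > $lastId
         ORDER BY id
         LIMIT $batchSize"
    )->fetch_all(MYSQLI_ASSOC);

    foreach ($rows as $row) {
        // process $row here
    }
    if ($rows) {
        $lastId = end($rows)['id']; // resume after the last row seen
    }
} while (count($rows) === $batchSize);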