I have a large database in which I use LIMIT in order not to fetch all the results of the query every time (It is not necessary). But I have an issue: I need to count the number of results. The dumbest solution is the following and it works:
We just get the data that we need:
SELECT * FROM table_name WHERE param > 3 LIMIT 10
And then we find the length:
SELECT COUNT(1) FROM table_name WHERE param > 3 LIMIT 10
But this solution bugs me because unlike the query in question, the one that I work with is complex and you have to basically run it twice to achieve the result.
Another dumb solution for me was to do:
SELECT COUNT(1), param, anotherparam, additionalparam FROM table_name WHERE param > 3 LIMIT 10
But this results in only one row. At this point I will be ok if it would just fill the count row with the same number, I just need this information without wasting computation time.
Is there a better way to achieve this?
P.S. By the way, I am not looking to get 10 as the result of COUNT, I need the length without LIMIT.
You should (probably) run the query twice.
MySQL does have a FOUND_ROWS() function that reports the number of rows matched before the limit. But using this function is often worse for performance than running the query twice!
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
...when we have appropriate indexes for WHERE/ORDER clause in our query, it is much faster to use two separate queries instead of one with SQL_CALC_FOUND_ROWS.
There are exceptions to every rule, of course. If you don't have an appropriate index to optimize the query, it could be more costly to run the query twice. The only way to be sure is to repeat the tests shown in that blog, using your data and your query on your server.
This question is very similar to: How can I count the numbers of rows that a MySQL query returned?
See also: https://mariadb.com/kb/en/found_rows/
This is probably the most efficient solution to your problem, but it's best to test it using EXPLAIN with a reasonably sized dataset.
Related
I have this query :
SELECT *,(SELECT count(id) FROM riverLikes
WHERE riverLikes.river_id = River.id) as likeCounts
FROM River
WHERE user_id IN (1,2,3)
LIMIT 10
my question is my sub-query runs only 10 time ( foreach row that are fetched ) or it run for every row in the "River" table ?
my "River" has lots of records and i like to have the best performance to get the rivers .
thanks.
In general, calculated data (either subqueries or functions), is calculated for the rows that matter, being rows that are returned, or rows for which the outcome of the calculation is relevant to further filtering or grouping.
In addition, the query optimizer may do all kinds of magic, and it is unlikely that it will run the subquery many times as such. It can be transformed in such a way that all relevant information is fetched at once.
And even if it didn't do that, it all takes place within the same operation in the database SQL engine, so executing this subselect 10 times is way, way faster than executing that subselect as a separate select 10 times, because the SQL engine only has to parse and prepare it once, and doesn't suffer from roundtrip times.
A simple select like that could easily take 30 milliseconds or so when executed from PHP, so quick math would suggest that it'd take 300ms extra to have this subselect in a 10-row query, but that's not the case, because the lion's share of those 30ms is overhead of communication between PHP and the database.
Because of the reasons mentioned above, this subselect is possibly way faster than a join, and it's a common misconception that a join is (almost) always faster.
So, to get back to your example, the subquery won't be executed for all rows in River, but will only be executed, probably in optimized form, for those 10 records of Rivers 1, 2 and 3.
In most production-ready RDBMS's subquery will be run only for rows which included in result set, i.e. only 10 times in your case. I think it is true for mysql too.
EDIT:
For assurance run
EXPLAIN <your query>
And view execution plan of your query
The subquery in the select statement runs one time per row returned, in your sample 10 times
I´m running a cost-time query in MySQL and Rails. This query is created dynamically and it also manages pagination with LIMIT and OFFSET. This is a summarized example:
SELECT fields
FROM tables
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
I would also like to get the total count of elements, but I would like to avoid run the query twice for performance purposes. I don´t think is possible, but maybe you surprise me :)
Currently, I have something like:
objects = Object.find_by_sql(query)
totalCount = objects.count
But, of course, this is always returning the limit count.
Because you're using pagination and offsetting, you're not going to get a complete result. You can either run the two separate queries, or you can pull the complete dataset and then filter for pagination. The first option is likely to be faster, especially as your dataset grows.
To improve performance you'd getter better results looking at a caching strategy at various points. Without knowing when the data changes I can't offer any specific advice.
Edit 1: Expanding for Clarification
It might help to explain why this is the case. When you put into place the limit and offset manually, Rails knows nothing about the data not returned by the query. So without having that data available, it's definitionally impossible to make Rails aware of the total count.
That said, a simple COUNT aggregation should always be very fast to execute. If you're seeing speed issues with the execution of that query you'll want to make sure you have the right indexes in place, and that Rails is rendering the conditions in an optimal format.
MySQL? Why not to use SQL_CALC_FOUND_ROWS with FOUND_ROWS() ?
With two queries: (the second query will not hit the database)
SELECT SQL_CALC_FOUND_ROWS * FROM users LIMIT 0,5;
SELECT COUNT(FOUND_ROWS()) AS rows_count FROM users;
But one advise: you must test it. This might be slower or faster than two queries, it depends on some factors, like caching, engine, indexes, etc...
https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_found-rows
Is this possible to get total number of rows count with offset limit
To Count the records just add one query as a column to your dynamically created query.
Check this:
SELECT fields,(SELECT count(*) FROM tables WHERE conditions) as obj_count
FROM tables
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
Using MySQL session variables(starting with symbol #) we can write more efficient query.
SELECT fields,#count as obj_count
FROM tables,(SELECT #count:=count(*) FROM tables WHERE conditions) as sub
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
This is a bit late, but try using objects.length. Length will count what you already have in the array.
OK, let's assume I have a big table with a 1k+ records, and that I need to take three records from it. Now, let's assume there are no records that meet the conditions. By doing a COUNT(*) using the same conditions and then doing a SELECT if the count is greater than zero, am I making my queries faster by making sure there are records available before doing a SELECT, or is this just a waste of time?
That is a tiny table in the overall scheme of things. You should just query for your filtered results directly, and if you need to do something different in your app when no results are returned, just do a check against the number of rows returned to skip trying to work with the result set.
There would never be a case where the COUNT() approach performs better, because it would be doing the same exact query logic you would do on a full select anyway.
When I add LIMIT 1 to a MySQL query, does it stop the search after it finds 1 result (thus making it faster) or does it still fetch all of the results and truncate at the end?
Depending on the query, adding a limit clause can have a huge effect on performance. If you want only one row (or know for a fact that only one row can satisfy the query), and are not sure about how the internal optimizer will execute it (for example, WHERE clause not hitting an index and so forth), then you should definitely add a LIMIT clause.
As for optimized queries (using indexes on small tables) it probably won't matter much in performance, but again - if you are only interested in one row than add a LIMIT clause regardless.
Limit can affect the performance of the query (see comments and the link below) and it also reduces the result set that is output by MySQL. For a query in which you expect a single result there is benefits.
Moreover, limiting the result set can in fact speed the total query time as transferring large result sets use memory and potentially create temporary tables on disk. I mention this as I recently saw a application that did not use limit kill a server due to huge result sets and with limit in place the resource utilization dropped tremendously.
Check this page for more specifics: MySQL Documentation: LIMIT Optimization
The answer, in short, is yes. If you limit your result to 1, then even if you are "expecting" one result, the query will be faster because your database wont look through all your records. It will simply stop once it finds a record that matches your query.
If there is only 1 result coming back, then no, LIMIT will not make it any faster. If there are a lot of results, and you only need the first result, and there is no GROUP or ORDER by statements then LIMIT will make it faster.
If you really only expect one single result, it really makes sense to append the LIMIT to your query. I don't know the inner workings of MySQL, but I'm sure it won't gather a result set of 100'000+ records just to truncate it back to 1 at the end..
I was wondering if there was a way to get the number of results from a MySQL query, and at the same time limit the results.
The way pagination works (as I understand it) is to first do something like:
query = SELECT COUNT(*) FROM `table` WHERE `some_condition`
After I get the num_rows(query), I have the number of results. But then to actually limit my results, I have to do a second query:
query2 = SELECT COUNT(*) FROM `table` WHERE `some_condition` LIMIT 0, 10
Is there any way to both retrieve the total number of results that would be given, AND limit the results returned in a single query? Or are there any other efficient ways of achieving this?
I almost never do two queries.
Simply return one more row than is needed, only display 10 on the page, and if there are more than are displayed, display a "Next" button.
SELECT x, y, z FROM `table` WHERE `some_condition` LIMIT 0, 11
// Iterate through and display 10 rows.
// if there were 11 rows, display a "Next" button.
Your query should return in the order of most relevant first, chances are most people aren't going to care about going to page 236 out of 412.
When you do a google search and your results aren't on the first page, you likely go to page two, not nine.
No, that's how many applications that want to paginate have to do it. It's reliable and bullet-proof, albeit it makes the query twice, but you can cache the count for a few seconds and that will help a lot.
The other way is to use SQL_CALC_FOUND_ROWS clause and then call SELECT FOUND_ROWS(). Apart from the fact you have to put the FOUND_ROWS() call afterwards, there is a problem with this: there is a bug in MySQL that this tickles which affects ORDER BY queries making it much slower on large tables than the naive approach of two queries.
Another approach to avoiding double-querying is to fetch all the rows for the current page using a LIMIT clause first, then only do a second COUNT(*) query if the maximum number of rows were retrieved.
In many applications, the most likely outcome will be that all of the results fit on one page, and having to do pagination is the exception rather than the norm. In these cases, the first query will not retrieve the maximum number of results.
For example, answers on a Stackoverflow question rarely spill onto a second page. Comments on an answer rarely spill over the limit of 5 or so required to show them all.
So in these applications you can simply just do a query with a LIMIT first, and then as long as that limit is not reached, you know exactly how many rows there are without the need to do a second COUNT(*) query - which should cover the majority of situations.
In most situations it is much faster and less resource intensive to do it in two separate queries than to do it in one, even though that seems counter-intuitive.
If you use SQL_CALC_FOUND_ROWS, then for large tables it makes your query much slower, significantly slower even than executing two queries, the first with a COUNT(*) and the second with a LIMIT. The reason for this is that SQL_CALC_FOUND_ROWS causes the LIMIT clause to be applied after fetching the rows instead of before, so it fetches the entire row for all possible results before applying the limits. This can't be satisfied by an index because it actually fetches the data.
If you take the two queries approach, the first one only fetching COUNT(*) and not actually fetching and actual data, this can be satisfied much more quickly because it can usually use indexes and doesn't have to fetch the actual row data for every row it looks at. Then, the second query only needs to look at the first $offset + $limit rows and then return.
This post from the MySQL performance blog explains this further:
http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
For more information on optimising pagination, check this post and this post.
For anyone looking for an answer in 2020. As per MySQL documentation:
The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are deprecated as of MySQL 8.0.17 and will be removed in a future MySQL version. As a replacement, considering executing your query with LIMIT, and then a second query with COUNT(*) and without LIMIT to determine whether there are additional rows.
I guess that settles that.
My answer may be late, but you can skip the second query (with the limit) and just filter the info through your back end script. In PHP for instance, you could do something like:
if($queryResult > 0) {
$counter = 0;
foreach($queryResult AS $result) {
if($counter >= $startAt AND $counter < $numOfRows) {
//do what you want here
}
$counter++;
}
}
But of course, when you have thousands of records to consider, it becomes inefficient very fast. Pre-calculated count maybe a good idea to look into.
Here's a good read on the subject:
http://www.percona.com/ppc2009/PPC2009_mysql_pagination.pdf
SELECT col, col2, (SELECT COUNT(*) FROM `table`) / 10 AS total FROM `table` WHERE `some_condition` LIMIT 0, 10
Where 10 is the page size and 0 is the page number, you need to use pageNumber - 1 in the query.
You can reuse most of the query in a subquery and set it to an identifier. For example a movie query that finds movies containing the letter 's' ordering by runtime would look like this on my site.
SELECT Movie.*, (
SELECT Count(1) FROM Movie
INNER JOIN MovieGenre
ON MovieGenre.MovieId = Movie.Id AND MovieGenre.GenreId = 11
WHERE Title LIKE '%s%'
) AS Count FROM Movie
INNER JOIN MovieGenre
ON MovieGenre.MovieId = Movie.Id AND MovieGenre.GenreId = 11
WHERE Title LIKE '%s%' LIMIT 8;
Do note that I'm not a database expert, and am hoping someone will be able to optimize that a bit better. As it stands running it straight from the SQL command line interface they both take ~0.02 seconds on my laptop.
SELECT *
FROM table
WHERE some_condition
ORDER BY RAND()
LIMIT 0, 10