SQL group by and limit [duplicate] - mysql

When I add LIMIT 1 to a MySQL query, does it stop the search after it finds 1 result (thus making it faster) or does it still fetch all of the results and truncate at the end?

Depending on the query, adding a LIMIT clause can have a huge effect on performance. If you want only one row (or know for a fact that only one row can satisfy the query), and you are not sure how the internal optimizer will execute it (for example, a WHERE clause that does not hit an index, and so forth), then you should definitely add a LIMIT clause.
For optimized queries (using indexes on small tables) it probably won't matter much for performance, but again: if you are only interested in one row, then add a LIMIT clause regardless.
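For instance, a minimal sketch (the users table and email column here are hypothetical, not from the question):

-- Without a usable index, MySQL scans until the first match, then stops:
SELECT id FROM users WHERE email = 'alice@example.com' LIMIT 1;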

LIMIT can affect the performance of the query (see the comments and the link below), and it also reduces the result set that MySQL sends back. For a query in which you expect a single result, there are real benefits.
Moreover, limiting the result set can in fact reduce the total query time, since transferring large result sets uses memory and can create temporary tables on disk. I mention this because I recently saw an application that did not use LIMIT kill a server with huge result sets; with LIMIT in place, resource utilization dropped tremendously.
Check this page for more specifics: MySQL Documentation: LIMIT Optimization

The answer, in short, is yes. If you limit your result to 1, then even if you are "expecting" one result, the query will be faster because your database won't look through all of your records. It will simply stop once it finds a record that matches your query.

If only one result can come back anyway, then no, LIMIT will not make it any faster. If there are many results and you only need the first one, and there are no GROUP BY or ORDER BY clauses, then LIMIT will make it faster.
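To illustrate the difference (a sketch with a hypothetical users table; none of these names come from the question):

-- LIMIT can stop the scan at the first matching row:
SELECT * FROM users WHERE status = 'active' LIMIT 1;
-- With ORDER BY on an unindexed column, all matching rows must be found
-- and sorted before the first row can be returned, so LIMIT helps far less:
SELECT * FROM users WHERE status = 'active' ORDER BY last_login LIMIT 1;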

If you really expect only a single result, it makes sense to append LIMIT to your query. I don't know the inner workings of MySQL, but I'm sure it won't gather a result set of 100,000+ records just to truncate it back to 1 at the end.

Related

Pagination or Where Clause executed First

MySQL Data - Best way to implement paging?
SELECT * FROM SALES
WHERE name LIKE 'Sl%'
ORDER BY name DESC
LIMIT 1, 2;  -- skip 1 row, then return the next 2
Is the pagination (LIMIT) or the WHERE clause executed first? I am going to run this against a huge database.
Thanks
Pagination or Where Clause executed First
The limit (pagination) always applies last. Otherwise the database would just be taking a few random records and then attempting to apply your WHERE clause to them, possibly returning no records at all from your query. That would not make any sense.
The LIKE operation kills performance or hangs for a long time. If LIMIT were applied first, I would be happy to use LIKE on a HUGE table.
If your table is huge, then you need to make sure your WHERE clause always runs against an index.
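As a sketch: a prefix pattern such as 'Sl%' can use a B-tree index on name (a leading wildcard like '%Sl' cannot). The index name below is illustrative:

CREATE INDEX idx_sales_name ON SALES (name);
-- EXPLAIN should now show a range scan on idx_sales_name:
EXPLAIN SELECT * FROM SALES WHERE name LIKE 'Sl%' ORDER BY name DESC LIMIT 1, 2;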

Does the count query ignore sorting in MySQL?

I'm just wondering, if I do SELECT COUNT(*) FROM ... WHERE ... ORDER BY ..., does MySQL sort the records before counting? Or does it understand that it makes no sense in this case and just ignores ORDER BY?
The ORDER BY clause is (in this query) the last step to be executed.
The ORDER BY clause only affects the result, never the data you are looking into, so in this case the system will work this way:
1. Count the rows that meet your condition (maybe using an index or a full scan, depending on the data and the conditions).
2. Get the rows for your answer; in this case, just 1 row.
3. Try to order the rows of your answer; in this case with no effect, because you have only one row.
So the ORDER BY (in this case) will have no effect, because the ordering applies only to the rows of the answer, and you have only 1 row. And as someone said, it doesn't make any sense to try to order a result with just one row.
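A minimal illustration (the orders table and columns are hypothetical):

-- Both produce the same single-row result; ordering one row is a no-op.
-- (Under the default ONLY_FULL_GROUP_BY mode, the ORDER BY expression must be
-- aggregated or a positional/constant reference, hence ORDER BY 1 here.)
SELECT COUNT(*) FROM orders WHERE status = 'paid';
SELECT COUNT(*) FROM orders WHERE status = 'paid' ORDER BY 1;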
I'll try to answer your question with what I know; I'm not sure if it will help you, and if my answer is wrong I hope you can point it out.
I am using MySQL's InnoDB engine. InnoDB cannot maintain a row-count variable like MyISAM, so when performing a COUNT(*) operation a full table (or index) scan must be performed, but it does not need to be as resource-intensive as a SELECT * operation. As you may know, after some calls from the relatively top-level sub_select function, all branches will eventually call into the row_search_mvcc function, which reads one row from the B+-tree structure stored by the InnoDB storage engine into an in-memory buffer (uchar *) for subsequent processing. Locks, MVCC, etc. may be involved here, but row locks are not. This is how MySQL reads a row of data, and for COUNT(*) it does not need to read a whole row of valid data; at the code level, the rows read are evaluated in the evaluate_join_record function. So, in my opinion, whether you use ORDER BY or not has little effect on COUNT(*).
Responding to your specific question: it is neither ignored nor optimized away by the DB engine.
Performance is affected by the number of rows, not the number of keys.
Maybe you can find further info here
ORDER BY Optimization

How to execute a query and get the count of total elements without running it twice with Rails 4 and MySQL

I'm running a costly query in MySQL and Rails. The query is created dynamically, and it also manages pagination with LIMIT and OFFSET. This is a summarized example:
SELECT fields
FROM tables
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
I would also like to get the total count of elements, but I would like to avoid running the query twice for performance reasons. I don't think it's possible, but maybe you will surprise me :)
Currently, I have something like:
objects = Object.find_by_sql(query)
totalCount = objects.count
But, of course, this always returns the limited count.
Because you're using pagination and offsetting, you're not going to get a complete result. You can either run two separate queries, or you can pull the complete dataset and then filter for pagination in code. The first option is likely to be faster, especially as your dataset grows.
To improve performance you'd get better results looking at a caching strategy at various points. Without knowing when the data changes, I can't offer any specific advice.
Edit 1: Expanding for Clarification
It might help to explain why this is the case. When you put the limit and offset in place manually, Rails knows nothing about the data not returned by the query. Without that data available, it's definitionally impossible to make Rails aware of the total count.
That said, a simple COUNT aggregation should always be very fast to execute. If you're seeing speed issues with the execution of that query you'll want to make sure you have the right indexes in place, and that Rails is rendering the conditions in an optimal format.
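A sketch of the two-query approach, reusing the placeholders from the question:

-- 1) fetch the page:
SELECT fields FROM tables WHERE conditions ORDER BY order DESC LIMIT ? OFFSET ?;
-- 2) fetch the total with the same WHERE clause (no ORDER BY or LIMIT needed):
SELECT COUNT(*) FROM tables WHERE conditions;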
MySQL? Why not use SQL_CALC_FOUND_ROWS with FOUND_ROWS()?
With two queries (the second query does not scan the table again):
SELECT SQL_CALC_FOUND_ROWS * FROM users LIMIT 0, 5;
SELECT FOUND_ROWS() AS rows_count;
But one piece of advice: you must test it. This might be slower or faster than two queries; it depends on factors like caching, engine, indexes, etc.
https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_found-rows
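Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17; the manual recommends a plain COUNT(*) instead, along these lines:

SELECT * FROM users LIMIT 0, 5;
SELECT COUNT(*) AS rows_count FROM users;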
Is it possible to get the total row count together with OFFSET and LIMIT?
To count the records, just add one subquery as a column to your dynamically created query.
Check this:
SELECT fields, (SELECT COUNT(*) FROM tables WHERE conditions) AS obj_count
FROM tables
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
Using MySQL user-defined variables (which start with the @ symbol) we can write a more efficient query:
SELECT fields, @count AS obj_count
FROM tables, (SELECT @count := COUNT(*) FROM tables WHERE conditions) AS sub
WHERE conditions
ORDER BY order DESC LIMIT ? OFFSET ?
This is a bit late, but try using objects.length. Length will count what you already have in the array.

mysql order by query issue

I'm having a problem using ORDER BY in MySQL. I have a table called "site" with 3 fields: id, name, rank. The table contains around 1.4M records. When I run a query like,
select name from site limit 50000,10;
it returns 10 records in 7.45 seconds (checked via terminal). But when I use ORDER BY in the above query, like,
select name from site order by id limit 50000,10;
the query never seems to complete. Since id is set as the primary key, I thought it wouldn't need another index to speed up my query, but I don't know where the mistake is.
Any help greatly appreciated, Thanks.
This is "to be expected" with large LIMIT values:
From http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/
Beware of large LIMIT. Using an index to sort is efficient if you need the first few rows, even if some extra filtering takes place so you need to scan more rows by index than requested by LIMIT. However, if you're dealing with a LIMIT query with a large offset, efficiency will suffer. LIMIT 1000,10 is likely to be way slower than LIMIT 0,10. It is true most users will not go further than the 10th page of results; however, Search Engine Bots may very well do so. I've seen bots looking at 200+ pages in my projects. Also, for many web sites, failing to take care of this makes it a very easy task to launch a DOS attack: request a page with some large number from a few connections and it is enough. If you do not do anything else, make sure you block requests with too-large page numbers.
For some cases, for example if results are static, it may make sense to precompute results so you can query them by position. So instead of a query with LIMIT 1000,10 you will have WHERE position BETWEEN 1000 AND 1009, which has the same efficiency for any position (as long as it is indexed).
AND
One more note about ORDER BY ... LIMIT: it produces scary EXPLAIN output and may end up in the slow query log as a query which does not use indexes.
The last point is THE important point in your case: the combination of ORDER BY and LIMIT on a big table (1.4M rows) that "does not use indexes" (even though indexes exist!) makes for really slow performance...
EDIT - as per comment:
For this specific case you should use select name from site order by id and handle the splitting of the result set into chunks of 50,000 each in your code!
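Alternatively, a common sketch is keyset ("seek") pagination, which avoids large offsets entirely by remembering the last id seen (the :last_seen_id placeholder is illustrative):

-- first page:
SELECT name, id FROM site ORDER BY id LIMIT 10;
-- subsequent pages: seek past the last id from the previous page
SELECT name, id FROM site WHERE id > :last_seen_id ORDER BY id LIMIT 10;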
Can you try this:
SELECT name
FROM site
WHERE id >= ( SELECT id       -- find the id at offset 50,000 using only the primary key index
              FROM site
              ORDER BY id
              LIMIT 50000, 1 )
ORDER BY id
LIMIT 10;

Mysql SQL_CALC_FOUND_ROWS and pagination

So I have a table that has a little over 5 million rows. When I use SQL_CALC_FOUND_ROWS the query just hangs forever. When I take it out, the query executes within a second with LIMIT 25. My question: for pagination purposes, is there an alternative to getting the total number of rows?
SQL_CALC_FOUND_ROWS forces MySQL to scan ALL matching rows, even if they would never be fetched. Internally it amounts to the same query being executed without the LIMIT clause.
If the filtering you're doing via WHERE isn't too crazy, you could calculate and cache counts for the various filters to avoid the full-scan load imposed by SQL_CALC_FOUND_ROWS. Basically, run a "select count(*) from ... where ..." for the most common WHERE clauses.
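A sketch of that caching idea, with a hypothetical result_counts table refreshed on a schedule (all names here are illustrative):

CREATE TABLE result_counts (
  filter_key   VARCHAR(64) PRIMARY KEY,
  row_count    BIGINT NOT NULL,
  refreshed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- periodic refresh job:
REPLACE INTO result_counts (filter_key, row_count)
SELECT 'status=active', COUNT(*) FROM items WHERE status = 'active';
-- the paginated page then reads the cached total:
SELECT row_count FROM result_counts WHERE filter_key = 'status=active';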
Otherwise, you could go Google-style and just spit out some page numbers that occasionally have no relation whatsoever with reality (You know, you see "Goooooooooooogle", get to page 3, and suddenly run out of results).
Detailed talk about implementing Google-style pagination using MySQL
You should choose between COUNT(*) and SQL_CALC_FOUND_ROWS depending on the situation. If your query's search criteria use columns that are in an index, use COUNT(*). In that case MySQL will "read" from the index only, without touching the actual data in the table, while the SQL_CALC_FOUND_ROWS method has to load rows from disk, which can be expensive and time-consuming on massive tables.
More information on this topic in this article on mysqlperformanceblog.
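A minimal sketch of such a covering-index count (the index, table, and column names are illustrative):

CREATE INDEX idx_items_status ON items (status);
-- this count can be answered from the index alone, without touching row data:
SELECT COUNT(*) FROM items WHERE status = 'active';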