MySQL LIMIT operator - equal efficiency when not using it?

How is it that when I use LIMIT, MySQL checks the same number of rows? And how do I solve this?

The WHERE clause is processed first. Once the matches are found, the LIMIT is applied to the result set, so all of the rows have to be evaluated to determine whether they match the conditions before the limit can be applied.

The EXPLAIN output is misleading. MySQL will evaluate the query using the WHERE clause and so on, but it will stop after it finds LIMIT number of matching rows (1 in this case). This is a known issue with MySQL: http://bugs.mysql.com/bug.php?id=50168
To clarify: the LIMIT clause will work as expected; it's only the EXPLAIN output that's inaccurate.
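As an illustration, here is a minimal sketch (the users table, its index, and the status column are hypothetical, not from the original question):

-- Assume a table `users` with an index on `status`.
-- EXPLAIN's "rows" column typically shows the optimizer's estimate for the
-- whole WHERE match, even though execution stops after one row due to LIMIT 1.
EXPLAIN SELECT * FROM users WHERE status = 'active' LIMIT 1;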

Related

Is MySQL’s LAG function non-deterministic without an ORDER BY?

I’ve been working on a query using the LAG function. My initial query appeared to be returning the correct data, even though I left out the ORDER BY in the OVER clause. I was partitioning over several columns.
Then I added a WHERE clause and was surprised to find that the result set returned contained rows which were not in the unfiltered query.
My question is: is there a use case for using LAG without an ORDER BY in the OVER clause? I also read in the documentation that LAG does not even require an OVER clause, but it seems to me that without the OVER and the ORDER BY, LAG would return random values from the column. Am I missing something?
It seems like using LAG without an ORDER BY in the OVER clause should be an error.
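For reference, a minimal sketch of what an explicit ordering looks like (the table and column names here are hypothetical, not from the question; MySQL 8.0+ is assumed for window functions):

SELECT id,
       reading,
       LAG(reading) OVER (PARTITION BY sensor_id ORDER BY recorded_at) AS prev_reading
FROM sensor_readings;
-- Without the ORDER BY inside OVER(), the rows within each partition have no
-- defined order, so the "previous" row (and thus the LAG value) is indeterminate.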

SQL get result and number of rows in the result with LIMIT

I have a large database in which I use LIMIT in order not to fetch all the results of the query every time (It is not necessary). But I have an issue: I need to count the number of results. The dumbest solution is the following and it works:
We just get the data that we need:
SELECT * FROM table_name WHERE param > 3 LIMIT 10
And then we find the length:
SELECT COUNT(1) FROM table_name WHERE param > 3 LIMIT 10
But this solution bugs me because, unlike the example query here, the one I actually work with is complex, and you basically have to run it twice to get the result.
Another dumb solution for me was to do:
SELECT COUNT(1), param, anotherparam, additionalparam FROM table_name WHERE param > 3 LIMIT 10
But this results in only one row. At this point I would be OK if it just filled the count column with the same number in every row; I just need this information without wasting computation time.
Is there a better way to achieve this?
P.S. By the way, I am not looking to get 10 as the result of COUNT, I need the length without LIMIT.
You should (probably) run the query twice.
MySQL does have a FOUND_ROWS() function that reports the number of rows matched before the limit. But using this function is often worse for performance than running the query twice!
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
...when we have appropriate indexes for WHERE/ORDER clause in our query, it is much faster to use two separate queries instead of one with SQL_CALC_FOUND_ROWS.
There are exceptions to every rule, of course. If you don't have an appropriate index to optimize the query, it could be more costly to run the query twice. The only way to be sure is to repeat the tests shown in that blog, using your data and your query on your server.
This question is very similar to: How can I count the number of rows that a MySQL query returned?
See also: https://mariadb.com/kb/en/found_rows/
This is probably the most efficient solution to your problem, but it's best to test it using EXPLAIN with a reasonably sized dataset.
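To make the two options concrete, here is a sketch using the table and column from the question (note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17):

-- Option 1: two separate queries (usually faster when the WHERE clause is indexed)
SELECT * FROM table_name WHERE param > 3 LIMIT 10;
SELECT COUNT(*) FROM table_name WHERE param > 3;

-- Option 2: SQL_CALC_FOUND_ROWS, one scan but often slower overall
SELECT SQL_CALC_FOUND_ROWS * FROM table_name WHERE param > 3 LIMIT 10;
SELECT FOUND_ROWS();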

SQL group by and limit [duplicate]

When I add LIMIT 1 to a MySQL query, does it stop the search after it finds 1 result (thus making it faster) or does it still fetch all of the results and truncate at the end?
Depending on the query, adding a LIMIT clause can have a huge effect on performance. If you want only one row (or know for a fact that only one row can satisfy the query), and are not sure how the internal optimizer will execute it (for example, the WHERE clause not hitting an index and so forth), then you should definitely add a LIMIT clause.
As for optimized queries (using indexes on small tables), it probably won't matter much for performance, but again: if you are only interested in one row, then add a LIMIT clause regardless.
LIMIT can affect the performance of the query (see comments and the link below) and it also reduces the result set that is output by MySQL. For a query in which you expect a single result, there is a benefit.
Moreover, limiting the result set can in fact speed up the total query time, as transferring large result sets uses memory and potentially creates temporary tables on disk. I mention this because I recently saw an application that did not use LIMIT kill a server due to huge result sets; with LIMIT in place, the resource utilization dropped tremendously.
Check this page for more specifics: MySQL Documentation: LIMIT Optimization
The answer, in short, is yes. If you limit your result to 1, then even if you are "expecting" one result, the query will be faster because your database won't look through all your records. It will simply stop once it finds a record that matches your query.
If there is only 1 result coming back, then no, LIMIT will not make it any faster. If there are a lot of results, you only need the first result, and there are no GROUP BY or ORDER BY clauses, then LIMIT will make it faster.
If you really only expect one single result, it really makes sense to append the LIMIT to your query. I don't know the inner workings of MySQL, but I'm sure it won't gather a result set of 100,000+ records just to truncate it back to 1 at the end.
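One way to check this on your own data is to compare the actual row counts with and without the LIMIT (a sketch assuming MySQL 8.0.18+ for EXPLAIN ANALYZE; the orders table and filter are hypothetical):

-- The "actual rows" reported for the scan should drop dramatically with LIMIT 1
-- when there is no ORDER BY/GROUP BY forcing a full pass over the matches.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42 LIMIT 1;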

MySQL: Does LIMIT reduce the number of calls to user-defined functions?

I have a computationally expensive user-defined function that I need to use against a large dataset. I don't sort, nor ask for a row count (no FOUND_ROWS). If I specify LIMIT as part of the query, does the MySQL engine actually stop calling the function after getting LIMIT rows, or does it run the function against the entire dataset regardless? Example:
select cols, .. where fingerprint_match(col, arg) > score limit 5;
Ideally, fingerprint_match would be called as few as 5 times if the first (random) rows resulted in a passing score.
As documented under Optimizing LIMIT Queries:
MySQL sometimes optimizes a query that has a LIMIT row_count clause and no HAVING clause:
[ deletia ]
As soon as MySQL has sent the required number of rows to the client, it aborts the query unless you are using SQL_CALC_FOUND_ROWS.
I believe the query will stop processing as soon as the specified number of matches are found but ONLY IF there is no ORDER BY clause. Otherwise it must find and sort all matches before applying the limit.
The only evidence I have for this is the statement in the docs that "LIMIT 0 quickly returns an empty set. This can be useful for checking the validity of a query." This suggests to me that it doesn't bother applying the WHERE clause to any rows once the limit has already been satisfied.
http://dev.mysql.com/doc/refman/5.6/en/limit-optimization.html
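A sketch of the contrast described above (the items table, column names, and the 0.8 threshold are placeholders standing in for the query in the question):

-- No ORDER BY: MySQL can stop as soon as 5 rows pass the WHERE condition,
-- so fingerprint_match() may be evaluated for only a small part of the table.
SELECT id, col
FROM items
WHERE fingerprint_match(col, 'arg') > 0.8
LIMIT 5;

-- With ORDER BY on the computed score: every candidate row must be evaluated
-- and sorted before the first 5 can be returned.
SELECT id, col
FROM items
WHERE fingerprint_match(col, 'arg') > 0.8
ORDER BY fingerprint_match(col, 'arg') DESC
LIMIT 5;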

Rewrite a group-by over a randomly-ordered sub-query using only one select

Here's the thing. I have 3 tables, and I'm doing this query:
select t.nomefile, t.tipo_nome, t.ordine
from
(select nomefile, tipo_nome, categorie.ordine
from t_gallerie_immagini_new as immagini
join t_gallerie_new as collezioni on collezioni.id=immagini.id_ref
join t_gallerie_tipi as categorie on collezioni.type=categorie.id
order by RAND()
) as t
group by t.tipo_nome
order by t.ordine
It's applied to 3 tables, all in a 1-N relationship, which need to be joined, and then I take 1 random result for each distinct value in the higher-level table. This query works just fine; the problem is that I'm being asked to rewrite this query USING ONLY ONE SELECT. I've come up with another way of doing this with only one select, but the thing is that according to SQL syntax the GROUP BY must come before the ORDER BY, so it's pointless to order by random when you already have only the first record for each value in the higher-level table.
Does anyone have a clue how to write this query using only one SELECT?
Generally, if I am not much mistaken, an ORDER BY clause in the subquery of a query like this has to do with a technique that allows you to pull non-GROUP BY columns (in the outer query) according to the order specified. And so you may be out of luck here, because that means the subquery is important to this query.
Well, because in this specific case the order chosen is BY RAND() and not by a specific column/set of columns, you may have a very rough equivalent by doing both the joins and the grouping on the same level, like this:
select nomefile, tipo_nome, categorie.ordine
from t_gallerie_immagini_new as immagini
join t_gallerie_new as collezioni on collezioni.id=immagini.id_ref
join t_gallerie_tipi as categorie on collezioni.type=categorie.id
group by tipo_nome
order by categorie.ordine
You must understand, though, why this is not an exact equivalent. The thing is, MySQL does allow you to pull non-GROUP BY columns in a GROUP BY query, but if they are not correlated to the GROUP BY columns, then the values returned would be... no, not random, the term used by the manual is indeterminate. On the other hand, the technique mentioned in the first paragraph takes advantage of the fact that if the row set is ordered explicitly and unambiguously prior to grouping, then the non-GROUP BY column values will always be the same*. So indeterminateness has to do with the fact that "normally" rows are not ordered explicitly before grouping.
Now you can probably see the difference. The original version orders the rows explicitly. Even if it's BY RAND(), it is intentionally so, to ensure (as much as possible) different results in the output most of the times. But the modified version is "robbed" of the explicit ordering, and so you are likely to get identical results for many executions in a row, even if they are kind of "random".
So, in general, I consider your problem unsolvable for the above stated reasons, and if you choose to use something like the suggested modified version, then just be aware that it is likely to behave slightly differently from the original.
* The technique may not be well documented, by the way, and may have been found rather empirically than by following manuals.
I was not able to understand the reasons behind the request to rewrite this query; however, I found out that there is a solution which uses the "select" word only once. Here's the query:
SELECT g.type,
       SUBSTRING_INDEX(GROUP_CONCAT(i.nomefile ORDER BY RAND()), ',', 1) AS nomefile
FROM t_gallerie_new g
JOIN t_gallerie_immagini_new i ON g.id = i.id_ref
GROUP BY g.type;
Posting it here for anyone interested in this question.
NOTE: The use of GROUP_CONCAT has a couple of downsides: it is not recommended on medium/large tables, since it can increase the server-side payload. Also, there is a limit to the size of the string returned by GROUP_CONCAT (1024 bytes by default), so it is necessary to modify a parameter on the MySQL server to receive a longer string from this function.
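For reference, the server parameter mentioned above is group_concat_max_len, which can be raised per session (the value below is just an example):

-- Default is 1024 bytes; raise it so GROUP_CONCAT does not silently truncate
-- long lists of file names before SUBSTRING_INDEX picks the first one.
SET SESSION group_concat_max_len = 1000000;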