Behavior of LIMIT with a JOIN clause - MySQL

In MySQL, when three tables (with huge amounts of data) are joined with a JOIN clause and the SELECT ends with a LIMIT clause, does the SQL engine compute the full join of the tables and only then apply the LIMIT, or is there some optimization applied before that?
I'm asking because a JOIN is an expensive operation when we are working with big data volumes.
In this case, is it better to do the JOIN and pull all the data, or to run the SELECT with a LIMIT clause N times?

Short version: it depends on the query
Longer version: a LIMIT clause usually assumes the data is ordered in some way. If you do not explicitly specify an ORDER BY clause, you will see that execution of the query stops as soon as MySQL has found the required number of rows. If you have an ordering and it requires a filesort, all the data will be sorted in a temporary table and MySQL will then output the first N rows you specified.
But if you force a specific join order and index usage, MySQL will happily short-circuit the execution.
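A rough illustration (table and column names here are hypothetical): the first query below can stop as soon as 10 joined rows have been produced, while the second may have to materialize and filesort the whole join result before the LIMIT is applied:
-- Can short-circuit: no ORDER BY, so any 10 matching rows will do
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
LIMIT 10;
-- May require a filesort over the full join result before trimming to 10 rows,
-- unless an index on orders.created_at lets MySQL read the rows already in order
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
ORDER BY o.created_at
LIMIT 10;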

No, the LIMIT is only applied afterwards, as a final trimming of the result:
1/ the JOINs are calculated.
2/ the WHERE filters as well as the HAVING filters are processed.
3/ the LIMIT reduces the number of returned results. The order is still kept, if you specified any ORDER BY clause.
Improve your joins to improve calculation overhead.
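A minimal sketch of that logical order, with hypothetical table and column names (the optimizer may execute it differently under the hood, but the result has to match this order):
SELECT a.id, COUNT(*) AS cnt
FROM a
JOIN b ON b.a_id = a.id          -- 1/ the join is computed
WHERE b.status = 'active'        -- 2/ WHERE (and, after grouping, HAVING) filter rows
GROUP BY a.id
HAVING cnt > 5
ORDER BY cnt DESC                -- surviving rows are ordered
LIMIT 10;                        -- 3/ only then are the first 10 rows returned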

Related

json_object() and `group by` Optimisation in MySQL

I've got an optimisation problem with my query: once I use a GROUP BY aggregate in a query with JSON_OBJECT(), performance is heavily affected, and it seems that the JSON_OBJECT() function is called for EVERY row in the table, even if there is a LIMIT.
When there is no GROUP BY, the query executes really fast. I have simplified the query I'm using as much as possible, but I need the GROUP BY because I'm using JSON_ARRAYAGG() for another join.
I have ~25k rows in my table and the query takes 10x less time when the GROUP BY aggregate is removed.
select JSON_OBJECT('id',`b`.`id`) as bw
from a
left join `b` on `a`.`id` = `b`.`id_a`
group by `a`.`id`
LIMIT 1;
In general, JSON should be used for storing structured data that only the app needs to look inside. It is clumsy and probably very inefficient for MySQL to pick apart JSON for use with WHERE, GROUP BY, etc.
As for GROUP BY (or ORDER BY) plus LIMIT 1:
With just the LIMIT, MySQL simply peels off the first row it finds -- much faster, but which row you get is unpredictable.
With GROUP BY or ORDER BY, it may have to gather all possible rows, juggle them (grouping or sorting), and only then peel off 1 row -- much slower.
It sounds like you have an "array" of things in each JSON? The RDBMS equivalent involves a second table to handle all those arrays -- one element per row. Switching to that may lead to much faster code. (I don't understand your data well enough to give you a concrete suggestion.)
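A minimal sketch of that second-table approach, with hypothetical table and column names (one row per former array element instead of a JSON array inside b):
CREATE TABLE b_item (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  b_id  INT UNSIGNED NOT NULL,    -- references b.id
  value VARCHAR(255) NOT NULL,    -- one array element per row
  KEY idx_b_id (b_id)
);
With that in place, filtering, grouping, and joining can use the idx_b_id index directly instead of MySQL unpacking JSON for every row.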

Why does a PostgreSQL query give inconsistent results unless a predictable result ordering is enforced with ORDER BY

The PostgreSQL documentation says:
The query optimizer takes LIMIT into account when generating query plans, so you are very likely to get different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.
But with a MySQL InnoDB table, the result seems to be delivered in PRIMARY KEY order.
Why does the query give inconsistent results? What happens in the query?
What the Postgres documentation and your observations are telling you is that records in SQL tables have no internal order. Instead, database tables are modeled after unordered sets of records. Hence, in the following query which could run on either MySQL or Postgres:
SELECT *
FROM yourTable
LIMIT 5
The database is free to return whichever 5 records it wants. In the case of MySQL, if you are seeing an ordering based on primary key, it is only by coincidence, and MySQL does not offer any such contract that this would always happen.
To resolve this problem, you should always be using an ORDER BY clause when using LIMIT. So the following query is well-defined:
SELECT *
FROM yourTable
ORDER BY some_column
LIMIT 5

Does MySQL not use LIMIT to optimize query select functions?

I've got a complex query I have to run in an application that is giving me some performance trouble. I've simplified it here. The database is MySQL 5.6.35 on CentOS.
SELECT a.`po_num`,
Count(*) AS item_count,
Sum(b.`quantity`) AS total_quantity,
Group_concat(`web_sku` SEPARATOR ' ') AS web_skus
FROM `order` a
INNER JOIN `order_item` b
ON a.`order_id` = b.`order_key`
WHERE `store` LIKE '%foobar%'
LIMIT 200 offset 0;
The key part of this query is where I've placed "foobar" as a placeholder. If this value is something like big_store, the query takes much longer (roughly 0.4 seconds in the query provided here, much longer in the query I'm actually using) than if the value is small_store (roughly 0.1 seconds in the query provided). big_store would return significantly more results if there were no LIMIT.
But there is a LIMIT, and that's what surprises me. Both datasets have more rows than the LIMIT, which is only 200. It appears to me that MySQL performs the select functions COUNT, SUM, GROUP_CONCAT for all big_store/small_store rows and only then applies the LIMIT. I would imagine that it would be best to stop when you get to 200.
Could it not perform the COUNT, SUM, GROUP_CONCAT functions after grabbing the 200 rows it will use, making my query much quicker? This seems feasible to me except in cases where there's an ORDER BY on one of those columns.
Does MySQL not use LIMIT to optimize a query's select functions? If not, is there a good reason for that? If so, did I make a mistake in my thinking above?
It can stop short due to the LIMIT, but that is not a reasonable query since there is no ORDER BY.
Without ORDER BY, it will pick whatever 200 rows it feels like and stop short.
With an ORDER BY, it will have to scan the entire table that contains store (please qualify columns with which table they come from!). This is because of the leading wildcard. Only then can it trim to 200 rows.
Another problem -- without a GROUP BY, aggregates (SUM, etc.) are performed across the entire table (or at least the rows that remain after filtering), collapsing everything into a single row. The LIMIT does not apply until after that.
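If per-order totals were actually the intent (an assumption on my part), the query needs a GROUP BY, roughly like the sketch below; the a./b. qualifiers on store and web_sku are also assumptions, since the original query leaves them unqualified:
SELECT a.po_num,
       COUNT(*) AS item_count,
       SUM(b.quantity) AS total_quantity,
       GROUP_CONCAT(b.web_sku SEPARATOR ' ') AS web_skus
FROM `order` a
INNER JOIN order_item b ON a.order_id = b.order_key
WHERE a.store LIKE '%foobar%'
GROUP BY a.po_num
LIMIT 200 OFFSET 0;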
Perhaps what you are asking about is MariaDB 5.5.21's LIMIT ROWS EXAMINED clause.
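A rough sketch of that clause (MariaDB-only syntax, not available in stock MySQL 5.6; the tables t1 and t2 are hypothetical) -- it caps how many rows the server will examine while producing the limited result:
SELECT * FROM t1, t2 LIMIT 10 ROWS EXAMINED 10000;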
Think of it this way ... All of the components of a SELECT are done in the order specified by the syntax. Since LIMIT is last, it does not apply until after the other stuff is performed.
(There are a couple of exceptions: (1) SELECT col... must be done after FROM ..., since it would not know which table(s); (2) The optimizer readily reorders JOINed tables and the clauses in WHERE ... AND ....)
More details on that query.
The optimizer peeks ahead, and sees that the WHERE is filtering on order (that is where store is, yes?), so it decides to start with the table order.
It fetches all rows from order that match %foobar%.
For each such row, find the row(s) in order_item. Now it has some number of rows (possibly more than 200) with which to do the aggregates.
Perform the aggregates - COUNT, SUM, GROUP_CONCAT. (Actually this will probably be done as it gathers the rows -- another optimization.)
There is now 1 row (with an unpredictable value for a.po_num).
Skip 0 rows for the OFFSET part of the LIMIT. (OK, another out-of-order thingie.)
Deliver up to 200 rows. (There is only 1.)
Add ORDER BY (but no GROUP BY) -- big deal, sort the 1 row.
Add GROUP BY (but no ORDER BY), and now you may have more than 200 rows coming out, and it can stop short.
Add GROUP BY and ORDER BY and they are identical, then it may have to do a sort for the grouping, but not for the ordering, and it may stop at 200.
Add GROUP BY and ORDER BY and they are not identical, then it may have to do a sort for the grouping, and will have to re-sort for the ordering, and cannot stop at 200 until after the ORDER BY. That is, virtually all the work is performed on all the data.
Oh, and all of this gets worse if you don't have the optimal index. Oh, did I fail to insist on providing SHOW CREATE TABLE?
I apologize for my tone. I have thrown quite a few tips in your direction; please learn from them.

Difference in select query result on MyISAM vs InnoDB MySQL engines (especially for FULL TEXT SEARCHES)

I would like to know if there is any difference in the result output of the same select query on MyISAM Vs that on InnoDB for the same table.
The thing I am aware of is that MyISAM can do FULL TEXT searches. But will the order of the output differ?
The ordering of the output is determined by the order by clause. You have three possibilities.
First, there is no order by clause. Then the result set is in an indeterminate order. You cannot say that running the same query on the same data will produce results in the same order on multiple runs. You definitely cannot make any statement about runs on different databases.
Second, there is an order by clause and it is a stable sort -- meaning that each key for the order by uniquely identifies each row (there are no ties). Then the results are specified by both the SQL standard and MySQL documentation. The result sets will be in the same order.
Third, there is an order by clause and there are ties. The keys will be in the same order in both result sets. However, because keys with ties can be in any order, the two result sets are not guaranteed to be in the same order.
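A minimal sketch of the fix for that third case, with hypothetical table and column names: append a unique column (such as the primary key) to the ORDER BY as a tiebreaker, so every key identifies exactly one row and the ordering is fully deterministic.
SELECT *
FROM products
ORDER BY price, id   -- id is unique, so there can be no ties
LIMIT 20;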
Summary: if you want results in a particular order, use order by.

Does LIMIT OFFSET,LENGTH require ORDER BY for pagination?

I have a MyISAM table with 28,900 entries. I'm processing it in chunks of 1500, with a query like this:
SELECT * FROM table WHERE id>0 LIMIT $iStart,1500
Then I loop over this and increment $iStart by 1500 each time.
The problem is that the queries are returning the same rows in some cases. For example the LIMIT 0,1500 query returns some of the same rows as the LIMIT 28500,1500 query does.
If I don't ORDER the rows, can I not expect to use LIMIT for pagination?
(The table is static while these queries are happening, no other queries are going on that would alter its rows).
Like pretty much every other SQL engine out there, MySQL MyISAM tables make no guarantees at all about the order in which rows are returned unless you specify an ORDER BY clause. Typically the order they are returned in will be the order they were read off the filesystem in, which can change from query to query depending on updates, deletes, and even the state of cached selects.
If you want to avoid having the same row returned more than once then you must order by something, the primary key being the most obvious candidate.
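A minimal fix for the query above, assuming id is the primary key: order by it so each chunk is disjoint.
SELECT * FROM table WHERE id>0 ORDER BY id LIMIT $iStart,1500
An alternative worth considering is keyset pagination (WHERE id > $lastSeenId ORDER BY id LIMIT 1500), which also avoids scanning past ever-larger OFFSETs.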
You should use ORDER BY to ensure a consistent order. Otherwise the DBMS may return rows in an arbitrary order (though one might assume that the order, while arbitrary, stays consistent between queries if no rows are modified).