MySQL LIMIT with IN clause

SELECT *
FROM restaurant_rate
WHERE
table_id IN (SELECT id_table FROM rTable WHERE restaurant_id = ?)
LIMIT 0, 10;
The number of rows returned by the inner select is not fixed. In this case, does MySQL scan only 10 rows, searching one by one, or does it scan the whole table and return the top 10 rows?
id_table is an indexed column of rTable.

The way your query is written is quite inefficient, because it forces MySQL to evaluate the IN clause for every row in your table. If many records in rTable match the criteria (let's say 1000), every row in restaurant_rate will need to be compared with 1000 values before being accepted or rejected by the WHERE condition.
I would rewrite your query like this:
select rr.*
from restaurant_rate as rr
inner join rTable as r on rr.table_id = r.id_table
where r.restaurant_id = ?
limit 0, 10;
Things you must consider:
You need indexes on the columns involved in the join and in the WHERE condition.
Using LIMIT without ORDER BY does not make much sense; add an ORDER BY clause before the LIMIT. A sketch of both points follows.
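A minimal sketch, assuming restaurant_rate has a primary key named id (the index names are also made up; adjust them to your schema):

-- Composite index: the restaurant_id filter and id_table join are resolved
-- from the index alone.
CREATE INDEX idx_rtable_restaurant ON rTable (restaurant_id, id_table);
-- Index on the join column of the rated table.
CREATE INDEX idx_rate_table ON restaurant_rate (table_id);

SELECT rr.*
FROM restaurant_rate AS rr
INNER JOIN rTable AS r ON rr.table_id = r.id_table
WHERE r.restaurant_id = ?
ORDER BY rr.id   -- assumed primary key, for a deterministic LIMIT
LIMIT 0, 10;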

Related

In query with joins and multi-table/field ORDER BY, how to set LIMIT offset to start from a particular row identified by a unique id field?

Suppose I have four tables: tbl1 ... tbl4. Each has a unique numerical id field. tbl1, tbl2 and tbl3 each has a foreign key field for the next table in the sequence. E.g. tbl1 has a tbl2_id foreign key field, and so on. Each table also has a field order (and other fields not relevant to the question).
It is straightforward to join all four tables to return all rows of tbl1 together with the corresponding fields from the other three tables. It is also easy to order this result set by a specific ORDER BY combination of the order fields. It is also easy to return just the row that corresponds to some particular id in tbl1, e.g. WHERE tbl1.id = 7777.
QUESTION: what query most efficiently returns (e.g.) 100 rows, starting from the row corresponding to id=7777, in the order determined by the specific combination of order fields?
Using ROW_NUMBER (or an emulation of it in MySQL < 8) to get the position of the id=7777 row, and then using that position in a second run of the same query to set the offset in the LIMIT clause, would be one approach (with a read lock in between). But can it be done in a single query?
# FIRST QUERY: get row number of result row where tbl1.id = 7777
SELECT x.row_number
FROM
(SELECT @row_number := @row_number + 1 AS row_number, tbl1.id AS id
FROM (SELECT @row_number := 0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <some conditions>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
WHERE id=7777;
Store the row number from the above query and use it to bind :offset in the following query.
# SECOND QUERY : Get 100 rows starting from the one with id=7777
SELECT x.field1, x.field2, <etc.>
FROM
(SELECT @row_number := @row_number + 1 AS row_number, field1, field2
FROM (SELECT @row_number := 0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <same conditions as before>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
LIMIT :offset, 100;
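On MySQL 8+ the two steps can be collapsed into a single query with a window function. A sketch, assuming the same joins and the same placeholder conditions as above:

WITH ordered AS (
SELECT tbl1.id, field1, field2,
ROW_NUMBER() OVER (ORDER BY tbl4.`order`, tbl3.`order`, tbl2.`order`, tbl1.`order`) AS rn
FROM tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <some conditions>
)
SELECT *
FROM ordered
WHERE rn >= (SELECT rn FROM ordered WHERE id = 7777)
ORDER BY rn
LIMIT 100;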
Clarify question
In the general case, you won't ask for WHERE id1 > 7777. Instead, you have a tuple of (11,22,33,44) and you want to "continue where you left off".
Two discussions
That is messy, but not impossible. See Iterating through a compound key. It gives an example of doing it with 2 columns; 4 columns coming from 4 tables is an extension of that.
A variation
Here is another discussion of such: https://dba.stackexchange.com/questions/164428/should-i-store-data-pre-ordered-rather-than-ordering-on-the-fly/164755#164755
In actually implementing such, I have found that letting the "100" (LIMIT) be flexible can be easier to think through. The idea is: reach forward 100 rows (with LIMIT 100,1). Let's say you get (111,222,333,444). If you are currently at (111, ...), then deal with id2/3/4. If it is, say, (113, ...), then do WHERE id1 < 113 and leave off any specification of id2/3/4. This means fetching fewer than 100 rows, but it lands you just shy of starting id1=113.
That is, it involves constructing a WHERE clause with between 1 and 4 conditions.
In all cases, your query says ORDER BY id1, id2, id3, id4. And the only use for LIMIT is in the probe to figure out how far ahead the 100th row is (with LIMIT 100,1).
I think I can dig out some old Perl code for that.
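For concreteness, a sketch of that 1-to-4-condition WHERE clause, resuming after the hypothetical tuple (11,22,33,44) from above:

-- The same WHERE serves both the probe (LIMIT 100,1) and the actual fetch.
SELECT ...
FROM ...
WHERE id1 > 11
OR (id1 = 11 AND id2 > 22)
OR (id1 = 11 AND id2 = 22 AND id3 > 33)
OR (id1 = 11 AND id2 = 22 AND id3 = 33 AND id4 > 44)
ORDER BY id1, id2, id3, id4
LIMIT 100, 1;  -- probe: where does the 100th row ahead land?

MySQL also accepts the row-constructor shorthand WHERE (id1,id2,id3,id4) > (11,22,33,44), but whether it uses an index for that comparison depends on the version, so the expanded form above is the safe bet.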

Apache Drill: Providing a limit in the subquery for a lateral join is not returning the correct results

I am trying to create a simple query with an inner lateral join, but I want to restrict the join to a single result in the subquery:
select b.`CODE`
from foo.bar.`BRANCH` b
inner join lateral (
select branch_id
from foo.bar.`BRANCH_DISTANCE`
where branch_id=b.CODE
and distance < 100
limit 1
) on true
The BRANCH_DISTANCE table contains the distances between any two branches, and I want to return all branches that are within 100 km of another branch. That is why, in the subquery, as long as there is one record containing the branch with a distance of less than 100, it should return the branch (and stop looking for any further matches).
But when I add the limit, the query returns only one record. On removing the limit, around 2000 records are returned.
If I replace the select b.CODE with select distinct b.CODE, I get around 500 results (which is the correct answer).
My objective is to not use the distinct keyword in the select statement, and that is why I was adding the limit in the subquery: so that the join is not done on every record in the BRANCH_DISTANCE table that contains the branch code and distance < 100 (because it is possible for a branch to be less than 100 km away from more than one branch).
A join may multiply the resulting row count when the join column contains duplicate values (in this case, one or both of branch_id and b.CODE have duplicates).
To restrict the check to a single result per branch, use an IN clause instead.
So something like this should work as expected:
select b.`CODE`
from foo.bar.`BRANCH` b
where b.`CODE` in (
select branch_id
from foo.bar.`BRANCH_DISTANCE`
where distance < 100
)
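An EXISTS form expresses the same semi-join and likewise needs only one match per branch; whether Drill plans it better than IN on your version is something to verify, not a given:

select b.`CODE`
from foo.bar.`BRANCH` b
where exists (
select 1
from foo.bar.`BRANCH_DISTANCE` d
where d.branch_id = b.`CODE`
and d.distance < 100
)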

Performance in mysql joins with subquery and limit

In a join between two subqueries (or between a table and a subquery), is it preferable to specify the LIMIT clause in the inner query rather than on the outer query, since the placement would determine how many rows the DBMS has to iterate over to check the WHERE clause? For example:
SELECT *
FROM (
SELECT id
FROM Table1
WHERE score >= 100
LIMIT 10
) AS A
NATURAL JOIN Table2
would be better than
SELECT *
FROM (
SELECT id
FROM Table1
WHERE score >= 100
) AS A
NATURAL JOIN Table2
LIMIT 10
My thinking is that in the second query the DBMS first has to iterate (via a full table scan or an index) over ALL rows in Table1 where score >= 100 that can be mapped to Table2 on their common columns (which could be any number of rows), and only after that will it truncate to 10 rows, whereas in the first query it will only scan until it has found 10 rows from Table1 that satisfy the WHERE clause and can be mapped to Table2, then stop.
The two statements are not equivalent: when using LIMIT, order matters. If you place the limit on Table1, you might never see rows you would otherwise have seen with the limit placed on the whole dataset. Given that disclaimer, it seems like applying the limit and then joining would be more efficient, but the rule of thumb is that you should always measure.
Also consider that instead of joining the SELECT as a derived table, for which MySQL will have to build an internal temporary table, you could join the table itself, i.e.:
SELECT t0.col0, t1.col1
FROM
Table0 t0
JOIN Table1 t1 ON (t0.col0 = t1.col0 AND t1.score >= 100)
which might be even more efficient if you have good indexes and end up using them. But again, you should measure.
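One concrete way to measure is EXPLAIN ANALYZE (available from MySQL 8.0.18; on older versions, plain EXPLAIN shows the plan without timings):

EXPLAIN ANALYZE
SELECT *
FROM (SELECT id FROM Table1 WHERE score >= 100 LIMIT 10) AS A
NATURAL JOIN Table2;

EXPLAIN ANALYZE
SELECT t0.col0, t1.col1
FROM Table0 t0
JOIN Table1 t1 ON (t0.col0 = t1.col0 AND t1.score >= 100);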

Is there a point to restrict values in limited queries on big tables?

I have multiple similar tables with thousands of rows, and I run queries like this:
select * from firsttable where firsttable.type = :rt
union
select * from secondtable where secondtable.type = :rt
...
order by exp desc limit 50;
For every type there is a certain exp threshold that could be used to restrict the number of rows examined.
Will this filtered version of the query improve performance?
select * from firsttable where exp > $expFilter and firsttable.type = :rt
union
select * from secondtable where exp > $expFilter and secondtable.type = :rt
...
order by exp desc limit 50;
Assuming $expFilter filters out a significant number of rows, it could make a difference because there would be fewer rows to sort in the order by. How much of a difference it makes would depend on whether exp is indexed or not.
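A sketch of that indexing, assuming the table and column names above: a composite index lets each branch of the UNION resolve both the equality on type and the range on exp from the index alone.

CREATE INDEX idx_type_exp ON firsttable (type, exp);
CREATE INDEX idx_type_exp ON secondtable (type, exp);
-- Each SELECT can then read the type = :rt rows already ordered by exp,
-- leaving far fewer rows for the final ORDER BY ... LIMIT 50.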

Limit by different field than order in MySQL statement?

Okay, let's say I have the following MySQL query:
SELECT table1.*, COUNT(table2.link_id) AS count
FROM table1
LEFT JOIN table2 on (table1.key = table2.link_id)
GROUP BY table1.key
ORDER BY table1.name ASC
LIMIT 20
Simple, right? It returns the table1 info, with the number of times each row is linked in table2.
However, you'll notice that it limits the resulting rows to 20 and sorts them by table1.name, so it returns the first 20 results in alphabetical order.
What I was wondering is whether I could limit to the top 20 results based on count in descending order, while ALSO getting those 20 results back in alphabetical order. I know I can simply sort the returned array in follow-up code, but I'm wondering if there is a way to do this in a single query.
Use a subselect for the limit, and sort in the outer select:
SELECT * FROM (
SELECT table1.*, COUNT(table2.link_id) AS count
FROM table1
LEFT JOIN table2 ON (table1.key = table2.link_id)
GROUP BY table1.key
ORDER BY count DESC
LIMIT 20
) t
ORDER BY name ASC