MySQL: LIMIT parameter in JOIN producing unexpected results

I have read here that MySQL processes ordering before applying limits. However, I receive different results when applying a LIMIT parameter in conjunction with a JOIN subquery. Here is my query:
SELECT
t1.id,
(t2.counts / c.matches)
FROM
table_one t1
JOIN
table_two t2 ON t1.id = t2.id
JOIN
(
SELECT
t1.id, COUNT(DISTINCT t1.id) AS matches
FROM
table_one t1
JOIN table_two t2 ON t1.id = t2.id
WHERE
t1.id IN (3390 , 3236, 148, 2811, 829, 137)
AND t2.value_one <= 30
AND t2.value_two < 2
GROUP BY t1.id
ORDER BY (t2.counts / matches)
LIMIT 0, 50 -- PROBLEM IS HERE (I think)
) c ON c.id = t1.id
ORDER BY (t2.counts / c.matches), t1.id;
Here is a rough description of what I think is happening:
1. The sub-query selects a bunch of ids from table_one that meet the criteria.
2. These are ordered by (t2.counts / matches).
3. The top 50 (in ascending order) are fashioned into a table.
4. This resulting table is then joined on the id column.
5. Results are returned from the top-level JOIN, without a GROUP BY clause this time. table_one is a reference table, so this will return many rows with the same ID.
I appreciate that some of these joins don't make a lot of sense, but I have stripped down my query for readability; it's normally quite chunky.
The problem is that when I include the LIMIT parameter, I get a different set of results and not just the top 50. What I want to do is get the top results from the subquery and use these to join onto a bunch of other tables based on the reference table.
Here is what I have tried so far:
LIMIT on the outer query (this is undesirable as this cuts off important information).
Trying different LIMIT tables and values.
Any idea what is going wrong, or what else I could try?

I have found a solution to my problem. It seems that my matches column alias can't be used in my ORDER BY clause, which is weird since I don't get an error. Either way, this solves the problem:
SELECT
t1.id,
(t2.counts / c.matches)
FROM
table_one t1
JOIN
table_two t2 ON t1.id = t2.id
JOIN
(
SELECT
t1.id, COUNT(DISTINCT t1.id) AS matches
FROM
table_one t1
JOIN table_two t2 ON t1.id = t2.id
WHERE
t1.id IN (3390 , 3236, 148, 2811, 829, 137)
AND t2.value_one <= 30
AND t2.value_two < 2
GROUP BY t1.id
ORDER BY (t2.counts / COUNT(DISTINCT t1.id)) -- This line is changed
LIMIT 0, 50
) c ON c.id = t1.id
ORDER BY (t2.counts / c.matches), t1.id;
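A possible variation on the subquery, offered only as a sketch: compute the ratio once in the select list and sort by that single alias, so the ORDER BY no longer depends on how a name inside an expression is resolved. The alias ratio is a hypothetical name. Wrapping t2.counts in MIN() is an assumption made so the expression is well defined for each group; pick whichever aggregate matches the real data.

```sql
-- Sketch of the derived table only; join it back as "c" exactly as before.
SELECT
    t1.id,
    COUNT(DISTINCT t1.id) AS matches,
    -- MIN() is an assumption: t2.counts is not in the GROUP BY,
    -- so some aggregate is needed to pick one value per group.
    MIN(t2.counts) / COUNT(DISTINCT t1.id) AS ratio
FROM table_one t1
JOIN table_two t2 ON t1.id = t2.id
WHERE t1.id IN (3390, 3236, 148, 2811, 829, 137)
  AND t2.value_one <= 30
  AND t2.value_two < 2
GROUP BY t1.id
ORDER BY ratio      -- a bare select-list alias is allowed in ORDER BY
LIMIT 0, 50;
```

Note that in this stripped-down form COUNT(DISTINCT t1.id) is always 1, because the rows are grouped by t1.id; the full query presumably counts something else.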

Related

How to optimize mysql on left join

I'll try to explain at a very high level.
I have two complex SELECT queries (for the sake of example I have reduced them to the following):
SELECT id, t3_id FROM t1;
SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id;
Query 1 returns 16k rows and query 2 returns 15k.
Each query individually takes less than 1 second to compute.
However, what I need is to sort the results using the column added from query 2, so I tried a LEFT JOIN:
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
(SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id) AS t_t2
ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last
With this query, the execution time goes up to over 1 minute.
I would like to understand the reason:
what is the cause of such a huge explosion?
NOTE:
ALL the used columns on every table have been indexed
e.g. :
table t1 has index on id,t3_Id
table t2 has index on t3_id and added
EDIT1
Following @Tim Biegeleisen's suggestion, I changed the query to the following; it now executes in about 16 seconds. If I remove the ORDER BY, the query executes in less than 1 second, so the ORDER BY appears to be the sole cause of the slowdown.
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
t2 ON t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY MAX(t2.added)
Even though table t2 has an index on column t3_id, when you join t1 you are actually joining to a derived table, which either can't use the index, or can't use it completely effectively. Since t1 has 16K rows and you are doing a LEFT JOIN, this means the database engine will need to scan the entire derived table for each record in t1.
You should use MySQL's EXPLAIN to see what the exact execution strategy is, but my suspicion is that the derived table is what is slowing you down.
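As a rough illustration of the EXPLAIN suggestion, the statement below prefixes the original LEFT JOIN query with EXPLAIN. The index at the end is an assumption: the question does not say whether the existing index on t2 is a single composite index, and idx_t2_t3_added is a hypothetical name.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT t1.id, t1.t3_Id
FROM t1
LEFT JOIN (
    SELECT t3_id, MAX(added) AS last
    FROM t2
    GROUP BY t3_id
) AS t_t2 ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last;

-- Assumption: if t3_id and added are indexed separately, a composite
-- index may let MAX(added) per t3_id be read from the index alone.
CREATE INDEX idx_t2_t3_added ON t2 (t3_id, added);
```

In the plan, the rows worth checking are the one whose table is shown as `<derivedN>` (the materialized subquery), in particular its type and key columns, and any "Using temporary" or "Using filesort" entries in the Extra column.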
The correct query should be:
SELECT
t1.id,
t1.t3_Id,
MAX(t2.added) as last
FROM t1
LEFT JOIN t2 on t1.t3_Id = t2.t3_Id
GROUP BY t2.t3_id
ORDER BY last;
This happens because a temporary table is generated for each record.
I think you could try ordering everything once the records are available. Maybe:
select * from (
select t1.t3_id, t1.id, t2.last from
-- one row per t3_id from t1, keeping the highest id
(select t3_id, max(id) as id from t1 group by t3_id) as t1
-- latest added value per t3_id from t2
left join (select t3_id, max(added) as last from t2 group by t3_id) as t2
on t1.t3_id = t2.t3_id ) as xx
order by last

Mysql: select value that matches several criteria on multiple rows

Good evening,
I have two tables t1 and t2
In t1, I have two variables, ID (which uniquely identify each row) and DOC (which can be common to several IDs)
In t2, I have three variables: ID (which does not necessarily uniquely identify the rows here), AUTH, and TYPE. Each ID has a maximum of 1 distinct AUTH.
Sample data:
What I would like to do is to select the DOCs that have an ID with AUTH='EP', and that also have an ID with AUTH='US'. They could have additional IDs with other AUTH, but they have to have at least these two.
Thus, I would have a final table with the DOC, ID, and AUTH (there should be at least 2 IDs per DOC, but it can be more if there exists an additional AUTH besides US and EP for this DOC).
The desired results:
This should work:
SELECT DISTINCT (T1.ID), T1.DOC, T2.AUTH FROM T1
LEFT JOIN T2 ON T2.ID = T1.ID
WHERE T1.DOC IN( SELECT T1.DOC FROM T2
LEFT JOIN T1 ON T1.ID = T2.ID
WHERE T2.AUTH IN('EP','US')
GROUP BY T1.DOC HAVING COUNT(DISTINCT T2.AUTH) = 2) ;
If I understand correctly, the query is going to be something like this:
select t1.doc, t1.id, t2.auth from t1
left join t2 on t2.id = t1.id
where t1.doc in( select t1.doc from t2
left join t1 on t1.id = t2.id
where t2.auth in('EP','US') );
However, the result set is basically going to be the first sample data table, because ID 6 has AUTH = "EP" and ID 7 shares the same DOC as ID 6.
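Since the sample tables did not survive in this thread, here is a small made-up data set to illustrate the "must have both EP and US" condition. All rows and column types are hypothetical. The query is a sketch along the lines of the first answer: the HAVING COUNT(DISTINCT ...) = 2 test is what requires both values to be present.

```sql
-- Hypothetical schema and data, for illustration only.
CREATE TABLE t1 (ID INT PRIMARY KEY, DOC INT);
CREATE TABLE t2 (ID INT, AUTH VARCHAR(2), TYPE VARCHAR(10));

INSERT INTO t1 (ID, DOC) VALUES
    (1, 10), (2, 10),          -- DOC 10: EP and US
    (3, 20),                   -- DOC 20: EP only
    (4, 30), (5, 30), (6, 30); -- DOC 30: EP, US and JP
INSERT INTO t2 (ID, AUTH, TYPE) VALUES
    (1, 'EP', 'A'), (2, 'US', 'A'), (3, 'EP', 'A'),
    (4, 'EP', 'A'), (5, 'US', 'B'), (6, 'JP', 'A');

-- Keep every ID of a DOC, but only for DOCs that have both EP and US.
SELECT DISTINCT t1.DOC, t1.ID, t2.AUTH
FROM t1
JOIN t2 ON t2.ID = t1.ID
WHERE t1.DOC IN (
    SELECT a.DOC
    FROM t1 a
    JOIN t2 b ON b.ID = a.ID
    WHERE b.AUTH IN ('EP', 'US')
    GROUP BY a.DOC
    HAVING COUNT(DISTINCT b.AUTH) = 2
)
ORDER BY t1.DOC, t1.ID;
```

With this data the query returns the rows for DOC 10 (IDs 1 and 2) and DOC 30 (IDs 4, 5 and 6). DOC 20 is excluded because it has EP but no US; without the GROUP BY and HAVING it would be returned as well.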

Why Limit is not working in Mysql?

My question is: why are ORDER BY and LIMIT not working in the query below? Sometimes it returns 1 row, which is correct, and sometimes it returns all rows (from the join). Are there any known issues in MySQL regarding this?
select t1.*, t2.c_value
from table1 t1
inner join table2 t2 on t1.id = t2.f_id
where t1.user_id = 139
order by t1.id desc
limit 0,1
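One thing that may be relevant here: LIMIT is applied to the joined result, not to the rows of table1. If the intent is "the most recent table1 row for this user, plus all of its table2 rows", a sketch like the following applies the limit inside a derived table first. That intent is an assumption, as the question does not state which behaviour is wanted.

```sql
SELECT t1.*, t2.c_value
FROM (
    -- Pick the single newest table1 row for the user first.
    SELECT *
    FROM table1
    WHERE user_id = 139
    ORDER BY id DESC
    LIMIT 1
) AS t1
-- Then attach every matching table2 row to it.
INNER JOIN table2 t2 ON t1.id = t2.f_id;
```

Conversely, if exactly one joined row is wanted, the LIMIT belongs on the outer query, and adding a second sort key from table2 makes the choice of row deterministic.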

Using a query on a joined table instead of actual table

I have a query now which works great for finding player info including their rank:
SELECT t1.pid,
t1.type,
t1.pspeed,
t1.distance,
t1.maxspeed,
t1.prestrafe,
t1.strafes,
t1.sync,
t1.wpn,
1+SUM(t2.pid IS NOT NULL) as rank
FROM records t1
LEFT JOIN records t2
ON t1.type = t2.type
AND t1.pspeed = t2.pspeed
AND t1.distance < t2.distance
WHERE t1.pid = "'.$pid.'"
AND t1.type IN ("type1","type2","type3")
GROUP BY t1.type, t1.pspeed
The problem now is that I wish to use an authid instead of a pid as my variable. I can do an easy join on the records table along with a player table that has an id equal to the pid in the records table as well as the authid I wish to search by:
SELECT * FROM records JOIN players ON records.pid=players.id
I wish to use this second query in place of the FROM records portion of the first query but I've confused myself in my attempts via the many aliases used.
Here's a pitiful example of an attempt:
SELECT t1.pid,
t1.type,
t1.pspeed,
t1.distance,
t1.maxspeed,
t1.prestrafe,
t1.strafes,
t1.sync,
t1.wpn,
1+SUM(t2.pid IS NOT NULL) as rank
FROM (
SELECT *
FROM records
JOIN players ON records.pid=players.id
) t3 t1
LEFT JOIN records t2
ON t1.type = t2.type
AND t1.pspeed = t2.pspeed
AND t1.distance < t2.distance
WHERE t1.authid = "'.$authid.'"
AND t1.type IN ("type1","type2","type3")
GROUP BY t1.type, t1.pspeed
Anyone able to whip me into shape please feel free to do so. I'm awful at SQL still.
Your join looks pretty close. However, you gave your subquery two aliases. Try just one:
SELECT t1.pid, t1.type, t1.pspeed, t1.distance, t1.maxspeed, t1.prestrafe, t1.strafes, t1.sync, t1.wpn, 1+SUM(t2.pid IS NOT NULL) as rank
FROM
(
SELECT players.*, rec.pid, rec.type, rec.pspeed, rec.distance, rec.maxspeed,
rec.prestrafe, rec.strafes, rec.sync, rec.wpn
FROM records AS rec JOIN players ON rec.pid = players.id ) AS t1
LEFT OUTER JOIN records t2
ON t1.type = t2.type AND t1.pspeed = t2.pspeed AND t1.distance < t2.distance
WHERE t1.authid = "'.$authid.'" AND
t1.type IN ("type1","type2","type3")
GROUP BY t1.type, t1.pspeed
Also, naming your columns in your subquery should eliminate the duplicate column error.
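An alternative sketch that avoids the derived table altogether: join players directly and filter on its authid column. The alias p, the user variable @authid and its value are hypothetical. Two further assumptions: the result alias is renamed to player_rank because RANK is a reserved word from MySQL 8.0.2 onward, and the GROUP BY lists every selected t1 column so the query is also accepted under ONLY_FULL_GROUP_BY, which presumes one record per player for each type and pspeed.

```sql
SET @authid = 'example-authid';  -- hypothetical value, for illustration

SELECT t1.pid, t1.type, t1.pspeed, t1.distance, t1.maxspeed,
       t1.prestrafe, t1.strafes, t1.sync, t1.wpn,
       -- count the records that beat this one, plus one
       1 + SUM(t2.pid IS NOT NULL) AS player_rank
FROM records t1
JOIN players p ON p.id = t1.pid
LEFT JOIN records t2
       ON t1.type = t2.type
      AND t1.pspeed = t2.pspeed
      AND t1.distance < t2.distance
WHERE p.authid = @authid
  AND t1.type IN ('type1', 'type2', 'type3')
GROUP BY t1.pid, t1.type, t1.pspeed, t1.distance, t1.maxspeed,
         t1.prestrafe, t1.strafes, t1.sync, t1.wpn;
```

From application code, a bound parameter in a prepared statement is preferable to concatenating the value into the SQL string.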

MySQL query optimisation - to alias or not?

Is there any major difference from an optimisation point of view between the following two alternatives? In the first option I alias the table, so the total_paid calculation is only run once. In the second option, there is no table alias, but the SUM calculation is required a few times.
Option 1
SELECT tt.*, tt.amount - tt.total_paid as outstanding
FROM
(
SELECT t1.id, t1.amount, SUM(t2.paid) as total_paid
FROM table1 t1
LEFT JOIN table2 t2 on t1.id = t2.t1_id
GROUP BY t1.id
) AS tt
WHERE (tt.amount - tt.total_paid) > 0
LIMIT 0, 25
Option 2
SELECT t1.id, t1.amount, SUM(t2.paid) as total_paid
, (t1.amount - SUM(t2.paid)) as outstanding
FROM table1 t1
LEFT JOIN table2 t2 on t1.id = t2.t1_id
WHERE (t1.amount - SUM(t2.paid)) > 0
GROUP BY t1.id
LIMIT 0, 25
Or perhaps there is an even better option?
If you run the queries with EXPLAIN, you'll be able to see what's going on "inside".
Also, why don't you just run both and compare the execution times?
Read more on EXPLAIN here
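Before comparing timings, note that Option 2 as written is rejected by MySQL, because aggregate functions such as SUM() are not allowed in a WHERE clause; conditions on aggregates belong in HAVING. A minimal sketch of that form follows. The COALESCE is an assumption: with a LEFT JOIN, rows in table1 that have no payments get a NULL sum, and treating that as 0 keeps them in the result, which differs from what the original expression would do.

```sql
SELECT t1.id,
       t1.amount,
       COALESCE(SUM(t2.paid), 0) AS total_paid,
       t1.amount - COALESCE(SUM(t2.paid), 0) AS outstanding
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.t1_id
GROUP BY t1.id, t1.amount
HAVING outstanding > 0   -- MySQL allows select-list aliases in HAVING
LIMIT 0, 25;
```

This keeps the SUM in one place in the source text, much like Option 1, without the extra derived table. Whether it is actually faster is best checked with EXPLAIN on real data.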