I need to get data from multiple tables with a single query that returns approximately 10,600 rows. The problem is that the query takes a very long time to execute, about 90 seconds.
Is there any way I could improve the query without adding indexes? The tables are updated constantly (rows inserted, updated, deleted).
Here is the query:
SELECT
t1.ID
, t1.ref
, t1.type
, GROUP_CONCAT(DISTINCT t3.name) AS parish
, GROUP_CONCAT(DISTINCT t2.village) AS village
, GROUP_CONCAT(DISTINCT t2.code) AS code
, GROUP_CONCAT(DISTINCT t4.year) AS year
FROM table1 t1
LEFT OUTER JOIN table2 AS t2 ON t2.teade_ID = t1.ID
LEFT OUTER JOIN table3 AS t3 ON t2.parish_ID = t3.ID
LEFT OUTER JOIN table4 AS t4 ON t4.teade_ID = t1.ID
GROUP BY t1.ID, t1.ref, t1.type
ORDER BY t1.ID DESC
Any help is very much appreciated!
Plan A - Make the GROUP BY and ORDER BY match:
Normally an index is used primarily for the WHERE clause, but there is no filtering here, so the index can instead be used for the GROUP BY. What index(es) do you have? If you have PRIMARY KEY(ID), then changing to simply this is likely to work:
GROUP BY t1.ID
ORDER BY t1.ID DESC
If there is trouble with ONLY_FULL_GROUP_BY, you might need
GROUP BY t1.ID, t1.ref, t1.type
ORDER BY t1.ID DESC, t1.ref DESC, t1.type DESC
In either case note how the GROUP BY and ORDER BY "match" each other. With this (unlike what you have), both clauses can be done in a single step. Hence no need to gather all the rows, do the grouping, then sort. Getting rid of the sort is where you would gain speed.
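To verify that the rewrite actually removed the sort, compare the EXPLAIN output before and after; this check assumes ID really is the PRIMARY KEY of table1, in which case the "Using temporary; Using filesort" note in the Extra column should go away:
-- Sanity check: with PRIMARY KEY(ID) on table1, the Extra column should no
-- longer show "Using temporary; Using filesort" for the rewritten query.
EXPLAIN
SELECT
 t1.ID
 , t1.ref
 , t1.type
 , GROUP_CONCAT(DISTINCT t3.name) AS parish
 , GROUP_CONCAT(DISTINCT t2.village) AS village
 , GROUP_CONCAT(DISTINCT t2.code) AS code
 , GROUP_CONCAT(DISTINCT t4.year) AS year
FROM table1 t1
LEFT OUTER JOIN table2 AS t2 ON t2.teade_ID = t1.ID
LEFT OUTER JOIN table3 AS t3 ON t2.parish_ID = t3.ID
LEFT OUTER JOIN table4 AS t4 ON t4.teade_ID = t1.ID
GROUP BY t1.ID
ORDER BY t1.ID DESC;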
Plan B - Delay the access to the troublesome ref and type:
SELECT ID, t1x.ref, t1x.type, x.parish, x.village, x.code, x.year
FROM (
SELECT
t1.ID
, GROUP_CONCAT(DISTINCT t3.name) AS parish
, GROUP_CONCAT(DISTINCT t2.village) AS village
, GROUP_CONCAT(DISTINCT t2.code) AS code
, GROUP_CONCAT(DISTINCT t4.year) AS year
FROM table1 t1
LEFT OUTER JOIN table2 AS t2 ON t2.teade_ID = t1.ID
LEFT OUTER JOIN table3 AS t3 ON t2.parish_ID = t3.ID
LEFT OUTER JOIN table4 AS t4 ON t4.teade_ID = t1.ID
GROUP BY t1.ID
) x
JOIN table1 AS t1x USING(ID)
ORDER BY ID DESC
ORDER BY would be ignored inside the derived table; GROUP BY is not necessary in the outer query.
Plan C - Get rid of the GROUP BY on the assumption that ID is the PK:
SELECT ID, ref, type,
( SELECT GROUP_CONCAT(DISTINCT t3.name)
  FROM table2 t2 JOIN table3 t3 ON t3.ID = t2.parish_ID
  WHERE t2.teade_ID = t1.ID ) AS parish,
( ... ) AS ...,
( ... ) AS ...,
( ... ) AS ...
FROM table1 AS t1
ORDER BY ID DESC
The subqueries have the same semantics as your original LEFT JOIN.
Your original query suffers from "explode-implode". First the JOINs gather all the parishes, etc, leading to a big intermediate table. Then the grouping shrinks it back to only what you needed. Plan C avoids that explode-implode, and hence the GROUP BY.
Furthermore, there won't be a sort because it can simply scan the table in reverse order.
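If it helps, here is Plan C with all four subqueries written out. This is just a sketch reusing the table and column names from the original query, so double-check the join columns against your schema:
SELECT t1.ID, t1.ref, t1.type,
( SELECT GROUP_CONCAT(DISTINCT t3.name)
    FROM table2 t2 JOIN table3 t3 ON t3.ID = t2.parish_ID
    WHERE t2.teade_ID = t1.ID ) AS parish,
( SELECT GROUP_CONCAT(DISTINCT t2.village)
    FROM table2 t2 WHERE t2.teade_ID = t1.ID ) AS village,
( SELECT GROUP_CONCAT(DISTINCT t2.code)
    FROM table2 t2 WHERE t2.teade_ID = t1.ID ) AS code,
( SELECT GROUP_CONCAT(DISTINCT t4.year)
    FROM table4 t4 WHERE t4.teade_ID = t1.ID ) AS year
FROM table1 AS t1
ORDER BY t1.ID DESC;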
Aggregate before joining:
SELECT t1.ID, t1.ref, t1.type,
       t2.villages, t2.codes,
       t3.parishes, t4.years
FROM table1 t1 LEFT JOIN
     (SELECT t2.teade_ID, GROUP_CONCAT(t2.code) AS codes,
             GROUP_CONCAT(t2.village) AS villages
      FROM table2 t2
      GROUP BY t2.teade_ID
     ) t2
     ON t2.teade_ID = t1.ID LEFT JOIN
     (SELECT t2.teade_ID, GROUP_CONCAT(t3.name) AS parishes
      FROM table2 t2 JOIN
           table3 t3
           ON t2.parish_ID = t3.ID
      GROUP BY t2.teade_ID
     ) t3
     ON t3.teade_ID = t1.ID LEFT JOIN
     (SELECT t4.teade_ID, GROUP_CONCAT(t4.year) AS years
      FROM table4 t4
      GROUP BY t4.teade_ID
     ) t4
     ON t4.teade_ID = t1.ID
ORDER BY t1.ID DESC;
You might still need DISTINCT in the GROUP_CONCAT(); it is not clear from your question whether that is necessary.
Why is this faster? Your version generates a cross product of all the tables for each ID -- potentially greatly multiplying the size of the data. For example, if one ID has 3 matching rows in table2 and 4 in table4, the joins produce 12 intermediate rows for that ID before the GROUP BY collapses them back to one. More data makes the GROUP BY slower.
Also note that there is no aggregation in the outer query.
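If you want to see how big that intermediate result is on your data, a quick check is to count the joined rows before any grouping and compare that with the number of rows in table1; this simply reuses the joins from the original query:
-- Intermediate rows produced by the joins, before the GROUP BY collapses them
SELECT COUNT(*)
FROM table1 t1
LEFT OUTER JOIN table2 AS t2 ON t2.teade_ID = t1.ID
LEFT OUTER JOIN table3 AS t3 ON t2.parish_ID = t3.ID
LEFT OUTER JOIN table4 AS t4 ON t4.teade_ID = t1.ID;

-- Rows you actually need (one per table1 row)
SELECT COUNT(*) FROM table1;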
Related
I have a query
SELECT DISTINCT t1.country
FROM table2 t2
JOIN table3 t3
ON t2.table3_id = t3.id
JOIN table4 t4
ON t3.table4_id = t4.id
LEFT
JOIN table1 t1
ON t1.table2_id = t2.id
WHERE t2.type IN ('Some')
AND t4.locale_id = 11
AND t1.table2_id IS NOT NULL
AND t1.country IS NOT NULL
AND t1.country != ''
ORDER
BY t1.country ASC
When I remove DISTINCT and the ordering, it runs much faster in the MySQL console, BUT it takes just as long when I run it through Rails ActiveRecord:
ActiveRecord::Base.connection.execute(query)
So I have two questions.
First and main: why does the optimization have no effect in the Rails environment?
Second: do you know how to speed up this query further?
SELECT t1.name as r_name, t1.values as r_values
FROM table as t1
JOIN (
SELECT SUM(amount) as amount
FROM database2.table
WHERE ids IN (t1.values)
) as t2
WHERE t1.id = 20;
I get an error saying that t1.values inside the subquery is an unknown column.
You need to rewrite your query and move the inner WHERE into the join condition:
SELECT t1.name as r_name, t1.values as r_values
FROM table as t1
JOIN (
SELECT ids, SUM(amount) as amount
FROM database2.table
GROUP BY ids
) as t2 ON t2.ids = t1.values
WHERE t1.id = 20;
Also, you don't use the amount column, so what is the point of the join?
Another issue: you don't have any join condition defined.
I think you need to read about joins in SQL first :)
It seems you are trying to join database2.table to your t1 based on the t1.values list.
I added GROUP BY ids in t2 since you're using an aggregation function. That said, I'm not sure what the purpose of your SUM(amount) is.
SELECT t1.name as r_name, t1.values as r_values
FROM table as t1
JOIN (
SELECT SUM(amount) as amount, ids
FROM database2.table
GROUP BY ids
) as t2 on t2.ids IN (t1.values)
WHERE t1.id = 20;
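If t1.values actually holds a comma-separated list of ids in a single column (the question hints at this but does not say so), the IN () comparison above will only match against the string as a whole; in that case FIND_IN_SET() is the usual MySQL workaround. A sketch, keeping the same placeholder names:
SELECT t1.name as r_name, t1.values as r_values
FROM table as t1
JOIN (
    -- aggregate once per id, then match each id against the list in t1.values
    SELECT ids, SUM(amount) as amount
    FROM database2.table
    GROUP BY ids
) as t2 ON FIND_IN_SET(t2.ids, t1.values) > 0
WHERE t1.id = 20;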
I have 2 tables and a desired result, as shown in the image below: MySQL DB
What would be the best way to join the two tables to get the result shown in the image?
SELECT * FROM (SELECT id, desc FROM table2) as T1
LEFT JOIN (SELECT * FROM table1) as T2 ON T1.id = T2.id
I guess my SQL is not working.
You can use a LEFT JOIN with COALESCE:
SELECT t1.id, COALESCE(t2.desc, t1.desc) AS `desc`, t1.D1, t1.D2
FROM table1 as T1
LEFT JOIN table2 as T2 ON T1.id = T2.id
Use a left join with coalesce to prioritize table 2's values if they are present, but fall back on table 1's values if not.
select t1.id,
coalesce(t2.desc, t1.desc) as `desc`,
t1.d1, t1.d2
from table1 t1
left join table2 t2
on t2.id = t1.id
order by t1.id
You can use IFNULL():
SELECT t1.id, IFNULL(t2.desc, t1.desc) AS `desc`, t1.D1, t1.D2
FROM table1 as T1
LEFT JOIN table2 as T2 ON T1.id = T2.id
COALESCE or CASE ... WHEN is also possible, combined with the LEFT JOIN.
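For example, the CASE variant would look something like this (a sketch using the same column names as the answers above):
SELECT t1.id,
       CASE WHEN t2.desc IS NOT NULL THEN t2.desc ELSE t1.desc END AS `desc`,
       t1.D1, t1.D2
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t1.id = t2.id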
Is it possible to simplify this UNION to avoid the near redundancy of the queries being unioned? As seen here, both queries are similar; they just join on a different column in table2. The reason I use UNION, instead of just inner joining twice in the same query, is that the results must be in one column, because this query is used as a subquery.
SELECT t1.id as id
FROM table1 g
INNER JOIN table2 t1 on g.t_id = t1.id
WHERE g.id=1
UNION
SELECT t2.id as id2
FROM table1 g
INNER JOIN table2 t2 on g.t2_id = t2.id
WHERE g.id=1
I don't see why this couldn't be treated as a simple inner join that can be satisfied by a match in either of two predicates. I've removed the original table aliases of t1, t2, and g for the sake of clarity. Since I don't know if the query could produce duplicate rows, I used DISTINCT in order to collapse duplicate rows in the same manner that the UNION did in the original query.
SELECT DISTINCT table2.id
FROM table1
INNER JOIN table2
ON ( table1.t_id = table2.id OR table1.t2_id = table2.id )
WHERE table1.id = 1
;
It is possible to do this with two joins and the IFNULL() function:
SELECT IFNULL (t1.id, t2.id) as id
FROM table1 g
INNER JOIN table2 t1 on g.t_id = t1.id
INNER JOIN table2 t2 on g.t2_id = t2.id
WHERE g.id=1
You might find this simpler:
select distinct t.id
from table2 t
where t.id in (select g.t_id from table1 g) or
t.id in (select g.t2_id from table1 g)
However, the performance would be awful on MySQL. You can also do:
select distinct t.id
from table2 t
where exists (select 1 from table1 g where g.t_id = t.id or g.t2_id = t.id)
The second version should work better in MySQL.
I tried to count how many new tuples are in a subset of t2 compared to t1 by running:
SELECT
COUNT(t2.id)
FROM (
(SELECT id, col1 FROM t2 WHERE col2=0 AND col3=0) AS t
LEFT OUTER JOIN
t1
ON
t.id=t1.id
)
WHERE
t1.id IS NULL;
The subset is defined by
(SELECT id, col1 FROM t2 WHERE col2=0 AND col3=0) AS t
But the above query doesn't work; it produces errors.
There is no need to enclose the FROM clause in (). You are referencing t2.id in your aggregate COUNT(), but only t.id is visible in the outer query, because the subquery that selects from t2 is aliased as t. This version addresses the source of your errors:
SELECT
COUNT(t.id) AS idcount
FROM
(SELECT id, col1 FROM t2 WHERE col2=0 AND col3=0) AS t
LEFT OUTER JOIN t1 ON t.id = t1.id
WHERE t1.id IS NULL
However:
Since your subquery is actually pretty simple, I believe it isn't necessary at all. The whole thing can be done with a LEFT JOIN:
SELECT
/* The equivalent of COUNT(*) in this context */
COUNT(t2.id) AS idcount
FROM
t2
LEFT OUTER JOIN t1 ON t2.id = t1.id
WHERE
t1.id IS NULL
AND (t2.col2 = 0 AND t2.col3 = 0)
Are you sure you don't want to do COUNT(t.id)? t2 is in a subquery and is not available to the main query; only t and t1 are available.
The problem is the alias. You have:
select count(t2.id)
But, t2 is defined in the subquery, so it is out of scope.
You want:
select count(t.id)