How to optimize join which causes very slow performace - mysql

This query runs more than 12 seconds, even though all tables are relatively small - about 2 thousands rows.
SELECT attr_73206_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ o
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_
ORDER BY o.id_order
) AS o GROUP BY o.id_field
) AS o ON f1.id = o.id_field
Both tables have fields id as primary keys. Besides, id_field, id_order,attr_73206_ and all fields in master_slave are indexed. As for the logic of this query, on the whole it's of master-detail kind. Table object_73130_ is a master-table, table object_73200_ is a detail-table. They are linked by a master_slave table. object_73101_ is an ad-hoc table used to get a real value for the field attr_73206_ by its id. For each row in the master table the query returns a field from the very first row of its detail table. Firstly, the query had another look, but here at stackoverflow I was advised to use this more optimized structure (instead of a subquery which was used previously, and, by the way, the query started to run much faster). I observe that the subquery in the first JOIN block runs very fast but returns a number of rows comparable to the number of rows in the main master-table. In any way, I do not know how to optimize it. I just wonder why a simple fast-running join causes so much trouble. Oh, the main observation is that if I remove an ad-hoc object_73101_ from the query to return just an id, but not a real value, then the query runs as quick as a flash. So, all attention should be focused on this part of the query
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_
Why does it slow down the whole query so terribly?
EDIT
In this way it runs super-fast
SELECT t0.attr_73102_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, attr_73206_ FROM object_73200_ o
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id
ORDER BY o.id_order
) AS o GROUP BY o.id_field
) AS o ON f1.id = o.id_field
LEFT JOIN object_73101_ t0 ON t0.id = o.attr_73206_
So, you can see, that I just put the add-hoc join outside of the subquery. But, the problem is, that subquery is automatically created and I have an access to that part of algo which creates it and I can modify this algo, and I do not have access to the part of algo which builds the whole query, so the only thing I can do is just to fix the subquery somehow. Anyway, I still can't understand why INNER JOIN inside a subquery can slow down the whole query hundreds of times.
EDIT
A new version of query with different aliases for each table. This has no effect on the performance:
SELECT attr_73206_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ a
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = a.id
INNER JOIN object_73101_ t0 ON t0.id = a.attr_73206_
ORDER BY a.id_order
) AS b GROUP BY b.id_field
) AS c ON f1.id = c.id_field
EDIT
This is the result of EXPLAIN command:
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ROWS | Extra |
| 1 | PRIMARY | f1 | INDEX | NULL | PRIMARY | 4 | 1570 | USING INDEX
| 1 | PRIMARY | derived2| ALL | NULL | NULL | NULL | 1564 |
| 2 | DERIVED | derived3| ALL | NULL | NULL | NULL | 1575 | USING TEMPORARY; USING filesort
| 3 | DERIVED | m | RANGE | id_object,id_master,..| id_object | 4 | 1356 | USING WHERE; USING TEMPORARY; USING filesort
| 3 | DERIVED | a | eq_ref | PRIMARY,attr_73206_ | PRIMARY | 4 | 1 |
| 3 | DERIVED | t0 | eq_ref | PRIMARY | PRIMARY | 4 | 1 |
What is wrong with that?
EDIT
Here is the result of EXPLAIN command for the "super-fast" query
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ROWS | Extra
| 1 | PRIMARY | f1 | INDEX | NULL | PRIMARY | 4 | 1570 | USING INDEX
| 1 | PRIMARY | derived2| ALL | NULL | NULL | NULL | 1570 |
| 1 | PRIMARY | t0 | eq_ref| PRIMARY | PRIMARY | 4 | 1 |
| 2 | DERIVED | derived3| ALL | NULL | NULL | NULL | 1581 | USING TEMPORARY; USING filesort
| 3 | DERIVED | m | RANGE | id_object,id_master,| id_bject | 4 | 1356 | USING WHERE; USING TEMPORARY; USING filesort
| 3 | DERIVED | a | eq_ref | PRIMARY | PRIMARY | 4 | 1 |
CLOSED
I will use my own "super-fast" query, which I presented above. I think it is impossible to optimize it anymore.

Without knowing the exact nature of the data/query, there are a couple things that I'm seeing:
MySQL is notoriously bad at handling sub-selects, as it requires the creation of derived tables. In fact, some versions of MySQL also ignore indexes when using sub-selects. Typically, it's better to use JOINs instead of sub-selects, but if you need to use sub-selects, it's best to make that sub-select as lean as possible.
Unless you have a very specific reason for putting the ORDER BY in the sub-select, it may be a good idea to move it to the "main" query portion because the result set may be smaller (allowing for quicker sorting).
So all that being said, I tried to re-write your query using JOIN logic, but I was wondering What table the final value (attr_73102_) is coming from? Is it the result of the sub-select, or is it coming from table object_73130_? If it's coming from the sub-select, then I don't see why you're bothering with the original LEFT JOIN, as you will only be returning the list of values from the sub-select, and NULL for any non-matching rows from object_73130_.
Regardless, not knowing this answer, I think the query below MAY be syntactically equivalent:
SELECT t0.attr_73102_ AS attr_73270_
FROM object_73130_ f1
LEFT JOIN (object_73200_ o
INNER JOIN master_slave m ON m.id_slave_field = o.id
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_)
ON f1.id = o.id_field
WHERE m.id_object IN (73130,73290)
AND m.id_master IN (73200,73354)
GROUP BY o.id_field
ORDER BY o.id_order;

Related

Mysql query: fastest way to find users followed by or following another user

I have the following tables:
Relationships
id, follower_id, followee_id, status
Users
id, name, email
I want to find all users who are either following or followed by a specific user.
This is what I have so far but it is very slow:
SELECT DISTINCT
`users`.*
FROM
`users`
INNER JOIN
`relationships` ON ((`users`.`id` = `relationships`.`follower_id`
AND `relationships`.`followee_id` = 1)
OR (`users`.`id` = `relationships`.`followee_id`
AND `relationships`.`follower_id` = 1))
WHERE
`relationships`.`status` = 'following'
ORDER BY `users`.`id`
What I mean by slow
I have one user who has roughly 600 followers and 600 following and it takes about 5 seconds for this query to run which seems insanely slow for those numbers!
The explain method shows the following:
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
| 1 | SIMPLE | relationships | ALL | index_relationships_on_followed_id,index_relationships_on_follower_id | NULL | NULL | NULL | 727 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | users | ALL | PRIMARY | NULL | NULL | NULL | 767 | Range checked for each record (index map: 0x1) |
+----+-------------+---------------+------+-----------------------------------------------------------------------+------+---------+------+------+------------------------------------------------+
Try breaking this into two queries, with a union:
SELECT u.*
FROM `users` u INNER JOIN
`relationships` r
ON u.`id` = r.`follower_id` AND r.`followee_id` = 1
WHERE `r.`status` = 'following'
UNION
SELECT u.*
FROM `users` u INNER JOIN
`relationships` r
ON u.`id` = r.`followee_id` AND r.`follower_id` = 1
WHERE `r.`status` = 'following'
ORDER BY id;
This may be a case where a more complicated query has better performance. These queries will also benefit from indexes: relationships(status, follower_id, followee_id) and relationships(status, followee_id, follower_id).

Slow count query with where clause

I am trying to perform a count to get the total number of results in a pagination but the query is too slow 2.12s
+-------+
| size |
+-------+
| 50000 |
+-------+
1 row in set (2.12 sec)
my count query
select count(appeloffre0_.ID_APPEL_OFFRE) as size
from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where
(exists (select 1 from ao.lot lot2_ where lot2_.ID_APPEL_OFFRE=appeloffre0_.ID_APPEL_OFFRE and lot2_.ESTIMATION_COUT>=1))
and (exists (select 1 from ao.lieu_execution lieuexecut3_ where lieuexecut3_.appel_offre=appeloffre0_.ID_APPEL_OFFRE and lieuexecut3_.region=1))
and (exists (select 1 from ao.ao_activite aoactivite4_ where aoactivite4_.ID_APPEL_OFFRE=appeloffre0_.ID_APPEL_OFFRE and (aoactivite4_.ID_ACTIVITE=1)))
and appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE='fournitures' or appeloffre0_.CATEGORIE='travaux' or appeloffre0_.CATEGORIE='services')
and acheteur1_.ID_ENTITE_MERE=2
explain cmd :
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
| 1 | PRIMARY | acheteur1_ | ref | PRIMARY,acheteur_ibfk_1 | acheteur_ibfk_1 | 5 | const | 3 | Using where; Using index |
| 1 | PRIMARY | appeloffre0_ | ref | appel_offre_ibfk_2 | appel_offre_ibfk_2 | 4 | ao.acheteur1_.ID_ACHETEUR | 31061 | Using where |
| 4 | DEPENDENT SUBQUERY | aoactivite4_ | ref | ao_activites_activite_fk,ao_activites_ao_fk | ao_activites_ao_fk | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 3 | Using where |
| 3 | DEPENDENT SUBQUERY | lieuexecut3_ | ref | fk_ao_lieuex,fk_region_lieuex | fk_ao_lieuex | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | lot2_ | ref | FK_LOT_AO | FK_LOT_AO | 4 | ao.appeloffre0_.ID_APPEL_OFFRE | 5 | Using where |
+----+--------------------+--------------+------+---------------------------------------------+--------------------+---------+--------------------------------+-------+--------------------------+
the index acheteur_ibfk_1 is a FK references table ENTITE_MERE because i have and acheteur1_.ID_ENTITE_MERE=2 in where clause.
You can have multiple conditions on your joins by using ON condition1 AND condition2 etc.
SELECT COUNT(appeloffre0_.ID_APPEL_OFFRE) as size
FROM ao.appel_offre appeloffre0_
JOIN ao.acheteur acheteur1_ ON appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
JOIN ao.lot lot2_ ON appeloffre0_.ID_APPEL_OFFRE=lot2_.ID_APPEL_OFFRE AND lot2_.ESTIMATION_COUT>=1
JOIN ao.lieu_execution lieuexecut3_ ON appeloffre0_.ID_APPEL_OFFRE=lieuexecut3_.ID_APPEL_OFFRE AND lieuexecut3_.ID_ACTIVITE=1
JOIN ao.ao_activite aoactivite4_ ON appeloffre0_.ID_APPEL_OFFRE=aoactivite4_.ID_APPEL_OFFRE AND aoactivite4_.ID_ACTIVITE=1
WHERE appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
AND (appeloffre0_.CATEGORIE='fournitures' OR appeloffre0_.CATEGORIE='travaux' OR appeloffre0_.CATEGORIE='services')
AND acheteur1_.ID_ENTITE_MERE=2;
You can try:
select count(aa.ID_APPEL_OFFRE) as size
from (
select ID_APPEL_OFFRE, ID_ACHETEUR from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE in ('fournitures','travaux','services'))
and (acheteur1_.ID_ENTITE_MERE=2)) aa
inner join ao.lot lot2_ on lot2_.ID_APPEL_OFFRE=aa.ID_APPEL_OFFRE
inner join ao.lieu_execution lieuexecut3_ on lieuexecut3_.appel_offre=aa.ID_APPEL_OFFRE
inner join ao.ao_activite aoactivite4_ on aoactivite4_.ID_APPEL_OFFRE=aa.ID_APPEL_OFFRE
where
aoactivite4_.ID_ACTIVITE=1
and lot2_.ESTIMATION_COUT>=1
and lieuexecut3_.region=1;
But I haven't seen your tables so I am not 100% sure that you won't get duplicates because of joins.
A couple of low-hanging fruits might also be found by ensuring that your appeloffre0_.CATEGORIE and appeloffre0_.DATE_OUVERTURE_PLIS have indexes on them.
Other fields which should have indexes on them are ao.lot.ID_APPEL_OFFRE, ao.lieu_execution.ID_APPEL_OFFRE and ao.ao_activite.ID_APPEL_OFFRE, and ao.appel_offre.ID_ACHETEUR (all the joined fields).
I would have the following indexes on your tables if not already... These are covering indexes for your query meaning the index has the applicable column to get your results without having to go to the actual raw data pages.
table index
appel_offre ( DATE_OUVERTURE_PLIS, CATEGORIE, ID_APPEL_OFFRE, ID_ACHETEUR )
lot ( ID_APPEL_OFFRE, ESTIMATION_COUT )
lieu_execution ( appel_offre, region )
ao_activite ( ID_APPEL_OFFRE, ID_ACTIVITE )
Having indexes on just individual columns won't really help optimize what you are looking for. Also, I am doing count of DISTINCT ID_APPEL_OFFRE's in case any of the JOINed tables have more than 1 record, it does not create a Cartesian result count for you
select
count(distinct AOF.ID_APPEL_OFFRE) as size
from
ao.appel_offre AOF
JOIN ao.acheteur ACH
on AOF.ID_ACHETEUR = ACH.ID_ACHETEUR
and ACH.ID_ENTITE_MERE = 2
JOIN ao.lot
ON AOF.ID_APPEL_OFFRE = lot.ID_APPEL_OFFRE
and lot.ESTIMATION_COUT >= 1
JOIN ao.lieu_execution EX
ON AOF.ID_APPEL_OFFRE = EX.appel_offre
and EX.region = 1
JOIN ao.ao_activite ACT
ON AOF.ID_APPEL_OFFRE = ACT.ID_APPEL_OFFRE
and ACT.ID_ACTIVITE = 1
where
AOF.DATE_OUVERTURE_PLIS > '2015-01-01'
and ( AOF.CATEGORIE = 'fournitures'
or AOF.CATEGORIE = 'travaux'
or AOF.CATEGORIE = 'services')
Like #FuzzyTree said in his comment exists is faster than an inner join if it's not a 1:1 relationship because it terminates as soon as it finds 1 whereas the join will get every matching row.
But the solution is that We add in and not exists :
where ( appeloffre0_.ID_APPEL_OFFRE IN (select lot2_.ID_APPEL_OFFRE from ao.lot lot2_
where lot2_.ESTIMATION_COUT>=1)
)
So the query run very fast than exists or joins .
select count(appeloffre0_.ID_APPEL_OFFRE) as size
from ao.appel_offre appeloffre0_
inner join ao.acheteur acheteur1_
on appeloffre0_.ID_ACHETEUR=acheteur1_.ID_ACHETEUR
where
( appeloffre0_.ID_APPEL_OFFRE IN (select lot2_.ID_APPEL_OFFRE from ao.lot lot2_ where lot2_.ESTIMATION_COUT>=1))
and (appeloffre0_.ID_APPEL_OFFRE IN (select lieuexecut3_.appel_offre from ao.lieu_execution lieuexecut3_ where lieuexecut3_.region=1))
and (appeloffre0_.ID_APPEL_OFFRE IN (select aoactivite4_.ID_APPEL_OFFRE from ao.ao_activite aoactivite4_ where aoactivite4_.ID_ACTIVITE=1 ))
and appeloffre0_.DATE_OUVERTURE_PLIS>'2015-01-01'
and (appeloffre0_.CATEGORIE='fournitures' or appeloffre0_.CATEGORIE='travaux' or appeloffre0_.CATEGORIE='services')
and acheteur1_.ID_ENTITE_MERE=2

Fastest way of doing a 1 to 1 left join on tables with a 1 to many relationship (MySQL)

I have two tables that have a 1 to many relationship, which I'm doing a 1:1 left join on. The query returns the correct results but it shows up in my slow query log (it takes up to 5s). Is there a better way to write this query?
select * from
tablea a left join tableb b
on a.tablea_id = b.tablea_id
and b.tableb_id = (select max(tableb_id) from tableb b2 where b2.tablea_id = a.tablea_id)
i.e. I would like TableA left joined to the row in TableB with the largest tableb_id.
TableA
tablea_id
1
2
TableB
tableb_id, tablea_id, data
1, 1, x
2, 1, y
Expected Result
tablea_id, tableb_id, data
1, 2, y
2, null, null
TableA has an index on tablea_id and TableB has a composite index on tablea_id,tableb_id.
Explain Output
+----+--------------------+---------------+--------+-----------------+---------------+---------+----------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+--------+-----------------+---------------+---------+----------------------+-------+-------------+
| 1 | PRIMARY | c | index | NULL | department_id | 4 | NULL | 18966 | Using index |
| 1 | PRIMARY | recent_cv_lut | eq_ref | PRIMARY,case_id | PRIMARY | 4 | func | 1 | |
| 2 | DEPENDENT SUBQUERY | cases_visits | ref | case_id | case_id | 4 | abcd_records_v2.c.id | 2 | Using index |
+----+--------------------+---------------+--------+-----------------+---------------+---------+----------------------+-------+-------------+
Likely, that correlated subquery is getting executed for each row from tableb.
(Without the output from EXPLAIN, we're really just guessing as to whether appropriate indexes are available, and if MySQL is making use of them.)
It might be more efficient to use an inline view query, to get the maximum tableb_id value for each tablea_id in one shot, and then use a join operation. Something like this:
SELECT a.*
, b.*
FROM tablea a
LEFT
JOIN ( SELECT n.tablea_id
, MAX(n.tableb_id) AS max_tableb_id
FROM tableb n
GROUP
BY n.tablea_id
) m
ON m.tablea_id = a.tablea_id
LEFT
JOIN tableb b
ON b.tablea_id = m.tablea_id
AND b.tableb_id = m.max_tableb_id
That's an alternative, but there's no guarantee that's going to be faster. It really depends, on a whole load of things that we don't have any information about. (Number of rows, cardinality, datatypes, available indexes, etc.)
EDIT
As an alternative, we could do the join between tablea and tableb in an inline view. This might improve performance. (Again, it really depends on a lot of things we don't have any information about.)
SELECT m.tablea_id
, m.foo
, b.*
FROM ( SELECT a.tablea_id
, a.foo
, MAX(n.tableb_id) AS max_tableb_id
FROM tablea a
LEFT
JOIN tableb n ON n.tablea_id = a.tablea_id
GROUP
BY a.tablea_id
) m
LEFT
JOIN tableb b
ON b.tablea_id = m.tablea_id
AND b.tableb_id = m.max_tableb_id

MySQL grouping query optimization

I have three tables: categories, articles, and article_events, with the following structure
categories: id, name (100,000 rows)
articles: id, category_id (6000 rows)
article_events: id, article_id, status_id (20,000 rows)
The highest article_events.id for each article row describes the current status of each article.
I'm returning a table of categories and how many articles are in them with a most-recent-event status_id of '1'.
What I have so far works, but is fairly slow (10 seconds) with the size of my tables. Wondering if there's a way to make this faster. All the tables have proper indexes as far as I know.
SELECT c.id,
c.name,
SUM(CASE WHEN e.status_id = 1 THEN 1 ELSE 0 END) article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN (
SELECT article_id, MAX(id) event_id
FROM article_events
GROUP BY article_id
) most_recent ON most_recent.article_id = a.id
LEFT JOIN article_events e ON most_recent.event_id = e.id
GROUP BY c.id
Basically I have to join to the events table twice, since asking for the status_id along with the MAX(id) just returns the first status_id it finds, and not the one associated with the MAX(id) row.
Any way to make this better? or do I just have to live with 10 seconds? Thanks!
Edit:
Here's my EXPLAIN for the query:
ID | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
---------------------------------------------------------------------------------------------------------------------------
1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 124044 | Using index; Using temporary; Using filesort
1 | PRIMARY | a | ref | category_id | category_id | 4 | c.id | 3 |
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6351 |
1 | PRIMARY | e | eq_ref | PRIMARY | PRIMARY | 4 | most_recent.event_id | 1 |
2 | DERIVED | article_events | ALL | NULL | NULL | NULL | NULL | 19743 | Using temporary; Using filesort
If you can eliminate subqueries with JOINs, it often performs better because derived tables can't use indexes. Here's your query without subqueries:
SELECT c.id,
c.name,
COUNT(a1.article_id) AS article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN article_events ae1
ON ae1.article_id = a.id
LEFT JOIN article_events ae2
ON ae2.article_id = a.id
AND ae2.id > a1.id
WHERE ae2.id IS NULL
GROUP BY c.id
You'll want to experiment with the indexes and use EXPLAIN to test, but here's my guess (I'm assuming id fields are primary keys and you are using InnoDB):
categories: `name`
articles: `category_id`
article_events: (`article_id`, `id`)
Didn't try it, but I'm thinking this will save a bit of work for the database:
SELECT ae.article_id AS ref_article_id,
MAX(ae.id) event_id,
ae.status_id,
(select a.category_id from articles a where a.id = ref_article_id) AS cat_id,
(select c.name from categories c where c.id = cat_id) AS cat_name
FROM article_events
GROUP BY ae.article_id
Hope that helps
EDIT:
By the way... Keep in mind that joins have to go through each row, so you should start your selection from the small end and work your way up, if you can help it. In this case, the query has to run through 100,000 records, and join each one, then join those 100,000 again, and again, and again, even if values are null, it still has to go through those.
Hope this all helps...
I don't like that index on categories.id is used, as you're selecting the whole table.
Try running:
ANALYZE TABLE categories;
ANALYZE TABLE article_events;
and re-run the query.

Can I specify the order in which the joins occur?

Say I have three tables, A, B and C. Conceptually A (optionally) has one B, and B (always) has one C.
Table A:
a_id
... other stuff
Table B:
a_fk_id (foreign key to A.a_id, unique, primary, not null)
c_fk_id (foreign key to C.c_id, not null)
... other stuff
Table C:
c_id
... other stuff
I want to select All records from A as well as their associated records from B and C if present. However, the B and C data must only occur in the result if both B and C are present.
I feel like I want to do:
SELECT *
FROM
A
LEFT JOIN B on A.a_id=B.a_fk_id
INNER JOIN C on B.c_fk_id=C.c_id
But Joins seem to be left associative (the first join happens before the second join), so this will not give records from A that don't have an entry in C.
AFAICT I must use sub queries, something along the lines of:
SELECT *
FROM
A
LEFT JOIN (
SELECT * FROM B INNER JOIN C ON B.c_fk_id=C.c_id
) as tmp ON A.id = tmp.a_fk_id
but once I have a couple of such relationships in a query (in reality I may have two or three nested), I'm worried both about code complexity and about the query optimizer.
Is there a way for me to specify the join order, other than this subquery method?
Thanks.
In SQL Server you can do
SELECT *
FROM a
LEFT JOIN b
INNER JOIN c
ON b.c_fk_id = c.c_id
ON a.id = b.a_fk_id
The position of the ON clause means that the LEFT JOIN on b logically happens last. As far as I know this is standard (claimed to be ANSI prescribed here) but I'm sure the downvotes will notify me if it doesn't work in MySQL!
Edit: And that's what I get for talking faster than I think. My previous solution doesn't work because 'c' hasn't been joined yet. Let's try this again.
We can use a WHERE clause to limit the results to only those that match the criteria you're looking for, where C has a valid (IS NOT NULL) or B does not have a value (IS NULL). Like this:
SELECT *
FROM a
LEFT JOIN b ON (b.a = a.a)
LEFT JOIN c ON (c.b = b.b)
WHERE (c.c IS NOT NULL OR b.b IS NULL);
Without WHERE Results:
mysql> SELECT * FROM a LEFT JOIN b ON (b.a = a.a) LEFT JOIN c ON (c.b = b.b);
+------+------+------+------+------+
| a | a | b | c | b |
+------+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | NULL | NULL |
| 2 | 2 | 3 | 2 | 3 |
| 3 | NULL | NULL | NULL | NULL |
| 4 | NULL | NULL | NULL | NULL |
+------+------+------+------+------+
With WHERE Results:
mysql> SELECT * FROM a LEFT JOIN b ON (b.a = a.a) LEFT JOIN c ON (c.b = b.b) WHERE (c.c IS NOT NULL OR b.b IS NULL);
+------+------+------+------+------+
| a | a | b | c | b |
+------+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 3 | 2 | 3 |
| 3 | NULL | NULL | NULL | NULL |
| 4 | NULL | NULL | NULL | NULL |
+------+------+------+------+------+
Yes, you use the STRAIGHT_JOIN for this.
When using this keyword the join will occur in the exact order that you specify.
See: http://dev.mysql.com/doc/refman/5.5/en/join.html
Well, I thought up another solution as well, and I'm posting it for completeness (Though I'm actually using Martin's answer).
Use a RIGHT JOIN:
SELECT
*
FROM
b
INNER JOIN c ON b.c_fk_id = c.c_id
RIGHT JOIN a ON a.id = b.a_fk_id
I'm pretty sure every piece I've read about JOINS said that RIGHT JOINs were pointless, but there you are.