Let me say up front that I'm not a MySQL guru: I use it adequately, but I don't know many of its internals. In a system I just inherited, I've got this query:
SELECT DISTINCT profile2.f3
FROM node AS profile
JOIN node AS profile2
ON ( profile.f1 = profile2.f1 )
WHERE profile.f2 = "aString"
AND profile.f3 = "anotherString"
AND profile2.f2 = "aThirdString"
AND NOT EXISTS (SELECT profile3.f1
FROM node AS profile3
WHERE profile3.f1 = profile.f1
AND profile3.f2 = "yetAnotherString") ;
SHOW CREATE TABLE gives:
CREATE TABLE `node` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`graph` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`f1` varchar(200) NOT NULL,
`f2` varchar(200) NOT NULL,
`f3` mediumtext NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `nodeindex` (`graph`(20),`f1`(100),`f2`(100),`f3`(100)),
KEY `ix_node_f1` (`f1`),
KEY `ix_node_graph` (`graph`),
KEY `ix_node_f3` (`f3`(255)),
KEY `ix_node_f2` (`f2`),
KEY `node_po` (`f2`,`f3`(130)),
KEY `node_so` (`f1`,`f3`(130)),
KEY `node_sp` (`f1`,`f2`(130)),
FULLTEXT KEY `node_search` (`f3`)
) ENGINE=MyISAM AUTO_INCREMENT=455854703 DEFAULT CHARSET=utf8
EXPLAIN EXTENDED gives:
+----+--------------------+----------+------+--------------------------------------------------------------------------------------+---------+---------+-----------------------------------+-------+----------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+----------+------+--------------------------------------------------------------------------------------+---------+---------+-----------------------------------+-------+----------+------------------------------+
| 1 | PRIMARY | profile | ref | ix_node_f1,ix_node_f3,ix_node_f2,node_po,node_so,node_sp,node_search | node_po | 994 | const,const | 49084 | 100.00 | Using where; Using temporary |
| 1 | PRIMARY | profile2 | ref | ix_node_f1,ix_node_f2,node_po,node_so,node_sp | node_sp | 994 | sumazi_prdf.profile.f1,const | 1 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | profile3 | ref | ix_node_f1,ix_node_f2,node_po,node_so,node_sp | node_sp | 994 | sumazi_prdf.profile.f1,const | 1 | 100.00 | Using where |
+----+--------------------+----------+------+--------------------------------------------------------------------------------------+---------+---------+-----------------------------------+-------+----------+------------------------------+
As I say, I'm not an RDBMS guru, but my intuition suggests that the performance of this query could be substantially improved. Any suggestions?
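You can try the version below, which replaces the correlated NOT EXISTS with NOT IN and may be somewhat faster; alternatively, rewrite it with joins. Because f1 is declared NOT NULL, NOT IN returns the same rows as NOT EXISTS here.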
SELECT DISTINCT profile2.f3
FROM node AS profile
JOIN node AS profile2
ON ( profile.f1 = profile2.f1 )
WHERE profile.f2 = "aString"
AND profile.f3 = "anotherString"
AND profile2.f2 = "aThirdString"
AND profile.f1 NOT IN (SELECT profile3.f1
FROM node AS profile3
WHERE profile3.f2 = "yetAnotherString") ;
LEFT JOIN ... WHERE ... IS NULL tends to be faster than NOT EXISTS in MySQL; in other RDBMSs it tends to be the other way round. Try:
SELECT DISTINCT profile2.f3
FROM node AS profile
JOIN node AS profile2 ON profile.f1 = profile2.f1
LEFT JOIN node AS profile3 ON profile.f1 = profile3.f1
AND profile3.f2 = "yetAnotherString"
WHERE profile.f2 = "aString"
AND profile.f3 = "anotherString"
AND profile2.f2 = "aThirdString"
AND profile3.f1 IS NULL
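Before swapping one form for the other, it can help to confirm on a small data set that they return the same rows. The following is only a sketch: the table `demo_node` and its five rows are hypothetical and exist purely for illustration. Both SELECTs should return the single value `r1`, because the rows with f1 = 'b' are excluded by the row whose f2 is 'z'.

```sql
-- Hypothetical toy table (not part of the original schema). It is a regular
-- table rather than a TEMPORARY one, because MySQL cannot reference a
-- TEMPORARY table more than once in the same query.
CREATE TABLE demo_node (
  f1 VARCHAR(20) NOT NULL,
  f2 VARCHAR(20) NOT NULL,
  f3 VARCHAR(20) NOT NULL
);

INSERT INTO demo_node (f1, f2, f3) VALUES
  ('a', 'p', 'x'), ('a', 'q', 'r1'),
  ('b', 'p', 'x'), ('b', 'q', 'r2'), ('b', 'z', 'any');

-- Form 1: correlated NOT EXISTS, as in the question.
SELECT DISTINCT n2.f3
FROM demo_node AS n1
JOIN demo_node AS n2 ON n1.f1 = n2.f1
WHERE n1.f2 = 'p'
  AND n1.f3 = 'x'
  AND n2.f2 = 'q'
  AND NOT EXISTS (SELECT 1
                  FROM demo_node AS n3
                  WHERE n3.f1 = n1.f1
                    AND n3.f2 = 'z');

-- Form 2: LEFT JOIN ... IS NULL (anti-join).
SELECT DISTINCT n2.f3
FROM demo_node AS n1
JOIN demo_node AS n2 ON n1.f1 = n2.f1
LEFT JOIN demo_node AS n3 ON n3.f1 = n1.f1
                         AND n3.f2 = 'z'
WHERE n1.f2 = 'p'
  AND n1.f3 = 'x'
  AND n2.f2 = 'q'
  AND n3.f1 IS NULL;

DROP TABLE demo_node;
```

Running EXPLAIN on each form against the real `node` table then shows whether the optimizer actually treats them differently on your MySQL version.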
I have a query with 2 INNER JOINs that fetches only a few columns, but it is very slow even though I have indexes on all the required columns.
My query
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
INNER JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE
com.prestataireLAD REGEXP '.*'
AND pe_nom REGEXP 'bordeaux|chambéry-annecy|grenoble|lyon|marseille|metz|montpellier|nancy|nice|nimes|rouen|strasbourg|toulon|toulouse|vitry|vitry bis 1|vitry bis 2|vlg'
AND com.date_livraison BETWEEN '2022-06-11 00:00:00'
AND '2022-07-08 00:00:00';
It takes around 20 seconds to compute and fetch 4123 rows.
The problem
To find out what's wrong and why it is so slow, I've used the EXPLAIN statement. Here is the output:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|--------|----------------------------|-------------|---------|------------------------|--------|----------|-------------|
| 1 | SIMPLE | dys | | ALL | id_commande,id_commande_2 | | | | 878588 | 100.00 | Using where |
| 1 | SIMPLE | com | | eq_ref | id_commande,date_livraison | id_commande | 110 | db.dys.id_commande | 1 | 7.14 | Using where |
| 1 | SIMPLE | pe | | ref | pe_id | pe_id | 5 | db.com.code_pe | 1 | 100.00 | Using where |
I can see that the join on dysfonctionnements is the problem: it does a full table scan (type ALL) and doesn't use a key even though it could...
Table definitions
commandes (included relevant columns only)
CREATE TABLE `commandes` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) NOT NULL DEFAULT '',
`date_commande` datetime NOT NULL,
`date_livraison` datetime NOT NULL,
`code_pe` int(11) NOT NULL,
`traitement_dysfonctionnement` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`),
KEY `date_livraison` (`date_livraison`),
KEY `traitement_dysfonctionnement` (`traitement_dysfonctionnement`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
dysfonctionnements (again, relevant columns only)
CREATE TABLE `dysfonctionnements` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) DEFAULT NULL,
`dysfonctionnement` varchar(150) DEFAULT NULL,
`responsable` varchar(50) DEFAULT NULL,
`reimputation` varchar(50) DEFAULT NULL,
`montant` float DEFAULT NULL,
`listRembArticles` text,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`,`dysfonctionnement`),
KEY `id_commande_2` (`id_commande`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
pe (again, relevant columns only)
CREATE TABLE `pe` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`pe_id` int(11) DEFAULT NULL,
`pe_nom` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `pe_nom` (`pe_nom`),
KEY `pe_id` (`pe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Investigation
If I remove the db.pe table from the query and the WHERE clause on pe_nom, the query takes 1.7 seconds to fetch 7k rows, and with the EXPLAIN statement, I can see it is using keys as I expect it to do:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|-------|----------------------------|----------------|---------|------------------------|--------|----------|-----------------------------------------------|
| 1 | SIMPLE | com | | range | id_commande,date_livraison | date_livraison | 5 | | 389558 | 100.00 | Using index condition; Using where; Using MRR |
| 1 | SIMPLE | dys | | ref | id_commande,id_commande_2 | id_commande_2 | 111 | ooshop.com.id_commande | 1 | 100.00 | |
I'm open to any suggestions. I see no reason for MySQL not to use the key when it does so on a very similar query, and using it definitely makes the query faster...
I had a similar experience when the MySQL optimiser chose a join order that was far from optimal. At that time I used the MySQL-specific STRAIGHT_JOIN operator to override the default optimiser behaviour. In your case I would try this:
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
STRAIGHT_JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
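Also, one of the REGEXP conditions in your WHERE clause could probably be changed to an IN list, which I assume can use an index. Note that REGEXP matches substrings, so this is only equivalent if pe_nom holds exactly those names.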
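As a sketch of how the complete statement might look with both changes applied, the query below keeps the WHERE clause from the question, forces `com` to be read before `dys` with STRAIGHT_JOIN, and replaces the pe_nom REGEXP with an IN list. It assumes that pe_nom always contains one of the listed names exactly; the original REGEXP also matched any value that merely contains one of them, so the two are not equivalent otherwise.

```sql
SELECT
    dysfonctionnement,
    montant,
    listRembArticles,
    CASE WHEN dys.reimputation IS NOT NULL
         THEN dys.reimputation
         ELSE dys.responsable
    END AS responsable_final
FROM
    db.commandes AS com
    -- STRAIGHT_JOIN: read com first, then look up dys by id_commande
    STRAIGHT_JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
    INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE
    com.prestataireLAD REGEXP '.*'
    -- Assumption: pe_nom values are exactly these names (no extra text)
    AND pe.pe_nom IN ('bordeaux', 'chambéry-annecy', 'grenoble', 'lyon',
                      'marseille', 'metz', 'montpellier', 'nancy', 'nice',
                      'nimes', 'rouen', 'strasbourg', 'toulon', 'toulouse',
                      'vitry', 'vitry bis 1', 'vitry bis 2', 'vlg')
    AND com.date_livraison BETWEEN '2022-06-11 00:00:00'
                               AND '2022-07-08 00:00:00';
```

Comparing EXPLAIN for this version with the original plan shows whether `com` is now read first through the date_livraison index.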
Remove com.prestataireLAD REGEXP '.*'. It matches every non-NULL value, so unless the column contains NULLs it has no effect on the result set, and the Optimizer probably won't realize that. If you are dynamically building the WHERE clause, then eliminate anything else you can.
id_commande_2 is redundant. In queries where it might be useful, the UNIQUE can take care of it.
These indexes might help:
com: INDEX(date_livraison, id_commande, code_pe)
pe: INDEX(pe_nom, pe_id)
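These index suggestions could be applied roughly as follows. This is a sketch only: the index names `livraison_commande_pe` and `nom_id` are made up for illustration, and on large MyISAM tables each ALTER TABLE rebuilds the table, so it is worth trying on a copy first.

```sql
-- Drop the redundant single-column index; the UNIQUE KEY
-- (id_commande, dysfonctionnement) already starts with id_commande.
ALTER TABLE db.dysfonctionnements
    DROP INDEX id_commande_2;

-- Composite index so the date range, the join column and code_pe
-- can all be read from the index (hypothetical index name).
ALTER TABLE db.commandes
    ADD INDEX livraison_commande_pe (date_livraison, id_commande, code_pe);

-- Composite index covering the pe_nom filter and the join column
-- (hypothetical index name).
ALTER TABLE db.pe
    ADD INDEX nom_id (pe_nom, pe_id);
```

After adding them, re-run EXPLAIN on the slow query to check which of the new indexes the optimizer actually picks.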
I have an aggregate query with subqueries nested two levels deep. What is strange is that the two subqueries run acceptably fast, but the outer query is unacceptably slow.
The basic idea behind the query is to use a lookup table to find all elements linked to a key, where the key is selected by querying one of the elements. The resulting set should then be passed to the outer query, which matches it against its own keys/indexes.
Here with all outputs and statements:
We start with the two table definitions
CREATE TABLE `table1` (
`id1` int(11) NOT NULL DEFAULT '0',
`id2` int(11) NOT NULL,
`value` int(11) DEFAULT '0',
PRIMARY KEY (`id1`,`id2`),
KEY `k_id1` (`id1`),
KEY `k_id2` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `lookuptable1` (
`id3` int(11) NOT NULL,
`id4` int(11) NOT NULL,
PRIMARY KEY (`id3`,`id4`),
UNIQUE KEY `id4_idx` (`id4`),
KEY `id3_idx` (`id3`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The inner subquery with its own subquery
SELECT lt1.id4
FROM lookuptable1 lt1
WHERE lt1.id3 = (SELECT pt1.id3
FROM lookuptable1 pt1
WHERE pt1.id4 = 5960)
+-----------+
| id4 |
+-----------+
| 5960 |
| 17215 |
| 3625734 |
| 9312798 |
+-----------+
4 rows in set (0.00 sec)
As you can see: Fast enough.
But the outside query is where the bad bottleneck lies.
Complete query
SELECT
t1.id1,
sum(t1.value)
FROM table1 t1
WHERE t1.id2 = 3 AND t1.id1 IN
(
SELECT lt1.id4
FROM lookuptable1 lt1
WHERE lt1.id3 = (SELECT pt1.id3
FROM lookuptable1 pt1
WHERE pt1.id4 = 5960)
);
+-----------+-----------------------+
| id1 | sum(t1.value) |
+-----------+-----------------------+
| 9312798 | 0 |
+-----------+-----------------------+
1 row in set (8.01 sec)
That is 8 seconds too slow.
Here is the EXPLAIN EXTENDED output for this query:
+----+--------------------+-------+--------+-------------------+-------------+---------+------------+---------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+--------+-------------------+-------------+---------+------------+---------+----------+--------------------------+
| 1 | PRIMARY | t1 | index | NULL | PRIMARY | 8 | NULL | 1454343 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | lt1 | eq_ref | PRIMARY,id3,id4 | PRIMARY | 8 | const,func | 1 | 100.00 | Using where; Using index |
| 3 | SUBQUERY | pt1 | const | id4 | id4_idx | 4 | | 1 | 100.00 | Using index |
+----+--------------------+-------+--------+-------------------+-------------+---------+------------+---------+----------+--------------------------+
As I understand from this, the outer query doesn't actually use the index that it could.
What could we possibly be doing wrong in this query? Surely it should run much, much faster.
I tried running the outer query with the subqueries' result copy-pasted into the IN clause (in other words, the subqueries aren't run). It runs normally fast. Here's the EXPLAIN EXTENDED for that:
+----+-------------+-------+-------+----------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+----------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | range | PRIMARY,k_id1 | PRIMARY | 4 | NULL | 5 | 100.00 | Using where |
+----+-------------+-------+-------+----------------+---------+---------+------+------+----------+-------------+
Oh yeah. This is running on MySQL 5.5
You could avoid the IN clause by using an inner join:
SELECT
t1.id1,
sum(t1.value)
FROM table1 t1
INNER JOIN (
SELECT lt1.id4
FROM lookuptable1 lt1
WHERE lt1.id3 = (SELECT pt1.id3
FROM lookuptable1 pt1
WHERE pt1.id4 = 5960)
) t on t.id4 = t1.id1 and t1.id2 = 3
This could improve your query.
Also make sure table1 has a suitable index on (id1, id2); in the definition shown, the PRIMARY KEY already covers these columns.
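Another way to write the same lookup, without a derived table or any subquery, is a plain three-table join. This is a sketch under two assumptions: that id4 is unique in lookuptable1 (as the UNIQUE KEY `id4_idx` states, so `pt1` yields at most one row and the join cannot duplicate rows), and that one sum per id1 is wanted, which is why a GROUP BY is added here even though the original query has none.

```sql
SELECT
    t1.id1,
    SUM(t1.value)
FROM lookuptable1 AS pt1
-- all lookup rows that share the id3 of the row with id4 = 5960
JOIN lookuptable1 AS lt1 ON lt1.id3 = pt1.id3
-- matching rows of table1, found via its PRIMARY KEY (id1, id2)
JOIN table1 AS t1 ON t1.id1 = lt1.id4
                 AND t1.id2 = 3
WHERE pt1.id4 = 5960
GROUP BY t1.id1;  -- assumption: one total per id1 is the intended result
```

On MySQL 5.5 this shape avoids the DEPENDENT SUBQUERY plan entirely, since there is no IN (subquery) for the optimizer to rewrite.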
I have a problem with a query using JOIN and MAX/MIN. For example:
SELECT Min(a.date), Max(a.date)
FROM a
INNER JOIN b ON b.ID = a.ID AND b.cID = 5
Is it possible to add an index or change this query so that it performs better?
Below is the result of EXPLAIN:
+----+-------------+----------+------+-----------------+-----+---------+-----------+--------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+-----------------+-----+---------+-----------+--------+-----------------------+
| 1 | SIMPLE | b | ref | PRIMARY,cID | cID | 5 | const | 680648 | Using index |
| 1 | SIMPLE | a | ref | ID | ID | 5 | base.b.ID | 1 | Using index condition |
+----+-------------+----------+------+-----------------+-----+---------+-----------+--------+-----------------------+
Sorry, but I would rather not post the whole tables here; it could cause a lot of confusion.
CREATE TABLE `a` (
`ID` int(11) NOT NULL,
`date` datetime DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `date` (`date`)
)
CREATE TABLE `b` (
`bID` int(11) NOT NULL,
`ID` int(11) NOT NULL,
`cID` int(11) DEFAULT NULL,
PRIMARY KEY (`bID`),
KEY `cID` (`cID`)
)
b: INDEX(cID, ID)
will make that a "covering" index, so it will probably get through the 680648 rows faster. It should replace the current KEY(cID).
Key_len for b is 5. That disagrees with the table definition; something got simplified too much.
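The covering-index suggestion above could be applied along these lines. It is a sketch against the simplified table definition from the question; the index name `cID_ID` is made up for illustration.

```sql
-- Replace the single-column KEY(cID) with a composite index that also
-- contains the join column, so table b can be read from the index alone.
ALTER TABLE b
    DROP INDEX cID,
    ADD INDEX cID_ID (cID, ID);  -- hypothetical index name

-- Re-check the plan: the row for b should now name the new index.
EXPLAIN
SELECT MIN(a.date), MAX(a.date)
FROM a
INNER JOIN b ON b.ID = a.ID AND b.cID = 5;
```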
We have a simple database with 4 tables: files, file_versions, users, organizations.
I select all files owned by a given organization, with a condition on the trashing date, using this query:
select * FROM organizations o
LEFT JOIN users u ON o.id=u.organization_id
LEFT JOIN files f ON u.user_identity=f.owner_identity
LEFT JOIN file_versions fv ON f.owner_identity=fv.owner_identity
AND f.local_path=fv.local_path
WHERE o.id=2001237 AND o.trashed_file_age_limit>=1
AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
EXPLAIN shows that the optimizer chooses the wrong table order, which differs from the order in the query (organizations -> users -> files -> file_versions):
mysql> explain select * FROM organizations o LEFT JOIN users u ON o.id=u.organization_id LEFT JOIN files f ON u.user_identity=f.owner_identity LEFT JOIN file_versions fv ON f.owner_identity=fv.owner_identity AND f.local_path=fv.local_path WHERE o.id=2001237 AND o.trashed_file_age_limit>=1 AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
| 1 | SIMPLE | o | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | f | ALL | PRIMARY | NULL | NULL | NULL | 109615125 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY,identity,organization_id | identity | 36 | filemirror.f.owner_identity | 1 | Using where |
| 1 | SIMPLE | fv | ref | PRIMARY | PRIMARY | 3035 | filemirror.u.user_identity,filemirror.f.local_path | 1 | |
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
4 rows in set (0.01 sec)
Of course this query is slow because of the full scan of the files table, and I have to use STRAIGHT_JOIN (which is not equivalent to LEFT JOIN) to fix the table order and make the query faster.
mysql> explain select * FROM organizations o STRAIGHT_JOIN users u ON o.id=u.organization_id STRAIGHT_JOIN files f ON u.user_identity=f.owner_identity STRAIGHT_JOIN file_versions fv ON f.owner_identity=fv.owner_identity AND f.local_path=fv.local_path WHERE o.id=2001237 AND o.trashed_file_age_limit>=1 AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
| 1 | SIMPLE | o | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | u | ref | PRIMARY,identity,organization_id | PRIMARY | 4 | const | 36 | |
| 1 | SIMPLE | f | ref | PRIMARY | PRIMARY | 36 | filemirror.u.user_identity | 6089324 | Using where |
| 1 | SIMPLE | fv | ref | PRIMARY | PRIMARY | 3035 | filemirror.u.user_identity,filemirror.f.local_path | 1 | |
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
4 rows in set (0.00 sec)
My question is: why can MySQL change the table order in a non-symmetric join operation?
Tables structure:
CREATE TABLE `file_versions` (
`owner_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
`local_path` varchar(999) character set utf8 NOT NULL,
`version_number` int(11) unsigned NOT NULL,
...
PRIMARY KEY (`owner_identity`,`local_path`,`version_number`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `files` (
`owner_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
`local_path` varchar(999) character set utf8 NOT NULL,
`version_number` int(11) unsigned NOT NULL,
..
`trashing_date` int(11) default NULL,
...
PRIMARY KEY (`owner_identity`,`local_path`),
KEY `trashing_date` (`trashing_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `organizations` (
`id` int(11) NOT NULL,
...
`trashed_file_age_limit` int(11) default NULL,
...
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `users` (
`organization_id` int(11) NOT NULL,
`id` int(11) NOT NULL,
`user_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
...
PRIMARY KEY (`organization_id`,`id`),
UNIQUE KEY `identity` (`user_identity`),
KEY `organization_id` (`organization_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
MySQL version 5.5
Look at the row estimates: MySQL thinks it will need to read 109M rows of the files table in the first plan, and 6M for each of 36 users (about 216M rows) in the second plan. So reading all 109M rows only once, in primary key order, looks cheaper to it than reading them in separate blocks. Those estimates do not seem very reasonable to me, so I would try running ANALYZE TABLE on files; but they are only estimates, so you may not get better numbers.
Using a LEFT JOIN and then adding a condition on that table to the WHERE clause turns it into an INNER JOIN, as Strawberry says in their comment: the joined row must exist for the WHERE condition ever to be true. So MySQL feels free to reorder those tables, and the optimizer may even prefer to do the "really inner" joins first, which may be a second reason for that plan.
You can try using STRAIGHT_JOIN in a different way: if you put it just once, right after SELECT, the optimizer uses your join order where possible (it usually is, barring some odd right joins and other corner cases) without changing the join type of specific tables. It then acts as a sort of flag, the way SQL_NO_CACHE does, rather than as a special join type.
Then, to make it even better, you could add an index on files (owner_identity, trashing_date), which should help locate the matching files per user rather than globally, as the current key on (trashing_date) alone does.
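Put together, the three suggestions might look like the sketch below. The index name `owner_trashing` is made up for illustration, and adding an index to a table of roughly 109M rows can take a long time, so treat this as something to try on a copy first.

```sql
-- 1. Refresh the index statistics the optimizer bases its estimates on.
ANALYZE TABLE files;

-- 2. Composite index so files can be filtered by trashing_date per owner
--    (hypothetical index name).
ALTER TABLE files
    ADD INDEX owner_trashing (owner_identity, trashing_date);

-- 3. STRAIGHT_JOIN as a SELECT modifier: keep the LEFT JOINs as written,
--    but ask the optimizer to join the tables in the listed order.
SELECT STRAIGHT_JOIN *
FROM organizations o
LEFT JOIN users u ON o.id = u.organization_id
LEFT JOIN files f ON u.user_identity = f.owner_identity
LEFT JOIN file_versions fv ON f.owner_identity = fv.owner_identity
                          AND f.local_path = fv.local_path
WHERE o.id = 2001237
  AND o.trashed_file_age_limit >= 1
  AND f.trashing_date < (1433943058 - o.trashed_file_age_limit*24*60*60);
```

Prefixing the SELECT with EXPLAIN shows whether the order is now o, u, f, fv and whether the new index is used for f.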
In MySQL 5.0.75-0ubuntu10.2 I've got a fixed table layout like this:
Table parent with an id
Table parent2 with an id
Table children1 with a parentId
CREATE TABLE `Parent` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(200) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `Parent2` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(200) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `Children1` (
`id` int(11) NOT NULL auto_increment,
`parentId` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `parent` (`parentId`)
) ENGINE=InnoDB
A child has a parent in one of the tables Parent or Parent2. When I need to get a child I use a query like this:
select * from Children1 c
inner join (
select id as parentId from Parent
union
select id as parentId from Parent2
) p on p.parentId = c.parentId
Explaining this query yields:
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | Parent | index | NULL | PRIMARY | 4 | NULL | 1 | Using index |
| 3 | UNION | Parent2 | index | NULL | PRIMARY | 4 | NULL | 1 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
4 rows in set (0.00 sec)
which is reasonable given the layout.
Now the problem: the previous query is somewhat useless, since it returns no columns from the parent elements. The moment I add more columns to the inner query, no index is used any more:
mysql> explain select * from Children1 c inner join ( select id as parentId,name from Parent union select id as parentId,name from Parent2 ) p on p.parentId = c.parentId;
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | Parent | ALL | NULL | NULL | NULL | NULL | 1 | |
| 3 | UNION | Parent2 | ALL | NULL | NULL | NULL | NULL | 1 | |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
4 rows in set (0.00 sec)
Can anyone explain why the (PRIMARY) indices are not used any more? Is there a workaround for this problem if possible without having to change the DB layout?
Thanks!
I think that the optimizer falls down once you start pulling out multiple columns in the derived query because of the possibility that it would need to convert data types on the union (not in this case, but in general). It may also be due to the fact that your query essentially wants to be a correlated derived subquery, which isn't possible (from dev.mysql.com):
Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
What you are trying to do (but isn't valid) is:
select * from Children1 c
inner join (
select id as parentId from Parent where Parent.id = c.parentId
union
select id as parentId from Parent2 where Parent2.id = c.parentId
) p
Result: "Unknown column 'c.parentId' in 'where clause'"
Is there a reason you don't prefer two left joins and IFNULLs:
select *, IFNULL(p1.name, p2.name) AS name from Children1 c
left join Parent p1 ON p1.id = c.parentId
left join Parent2 p2 ON p2.id = c.parentId
The only difference between the queries is that in yours you'll get two rows if there is a parent in each table. If that's what you want/need then this will work well also and joins will be fast and always make use of the indexes:
(select * from Children1 c join Parent p1 ON p1.id = c.parentId)
union
(select * from Children1 c join Parent2 p2 ON p2.id = c.parentId)
My first thought is to insert a "significant" number of records into the tables and run ANALYZE TABLE to update the statistics. A table with 4 records will always be faster to read with a full scan than via the index!
Further, you can try USE INDEX to suggest an index to the optimizer (or FORCE INDEX to insist on it) and see how the plan changes.
I would also recommend reading this documentation to see which bits are relevant:
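As a minimal sketch of both steps, using the table and index names from the question (the index on Children1.parentId is called `parent` there):

```sql
-- Refresh the statistics once the tables hold a realistic number of rows.
ANALYZE TABLE Parent, Parent2, Children1;

-- Suggest the index on Children1.parentId and inspect the resulting plan.
-- USE INDEX is only a hint; FORCE INDEX (parent) is the stronger variant.
EXPLAIN
SELECT c.*, p1.name
FROM Children1 c USE INDEX (parent)
JOIN Parent p1 ON p1.id = c.parentId;
```

With only a handful of rows the optimizer may still prefer a full scan, which is why loading realistic data first matters more than the hint itself.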
MYSQL::Optimizing Queries with EXPLAIN
This article can also be useful
7 ways to convince MySQL to use the right index