I am running the following query:
SELECT p.val1, p.val2, p.val3, p.val4, p.val5, p.val6, p.val7, p.val8
FROM db1.tbl1 AS p
INNER JOIN db2.tbl2 vp ON p.pid = vp.pid
INNER JOIN db2.tbl1 AS vs ON vp.vid = vs.vid
INNER JOIN db3.tbl1 AS sa ON vs.sid = sa.sid
LEFT JOIN db4.tbl1 AS fs ON p.aid = fs.aid
WHERE sa.id = '11594'
AND fs.aid IS NULL
ORDER BY IF( (
ISNULL( egl )
OR egl = '' ) , 1, 0
), egl DESC
LIMIT 15
OFFSET 0
Unfortunately, it just hangs when run.
Running an EXPLAIN nets me this info:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|----|-------------|-------|-------|-----------------------------------|----------|---------|------------|--------|----------------
| 1 | SIMPLE | sa | const | PRIMARY,s_key,p_key,n_key,ignored | PRIMARY | 4 | const | 1 | Using filesort
| 1 | SIMPLE | p | ALL | PRIMARY, pid | NULL | NULL | NULL | 744704 |
| 1 | SIMPLE | vp | ref | PRIMARY,pid | pid | 130 | db1.p.pid | 1 | Using index
| 1 | SIMPLE | vs | ref | vid | vid | 130 | db2.vp.vid | 1 | Using where
| 1 | SIMPLE | fs | ref | a_key | a_key | 97 | func | 1 | Using where; Using index
If I add USE INDEX or FORCE INDEX after the FROM db1.tbl1 AS p, it does not change a thing.
My assumption is the problem is that table p isn't using any of the indexes. Is this assumption correct?
What are some reasons this query wouldn't use one of the possible keys?
The problem was with the ORDER BY clause. The DBMS was apparently trying to apply it to db1.tbl1 before the joins. Wrapping the query in an outer SELECT and moving the ORDER BY outside made the DBMS work as expected.
SELECT * FROM
(SELECT p.val1, p.val2, p.val3, p.val4, p.val5, p.val6, p.val7, p.val8
FROM db1.tbl1 AS p
INNER JOIN db2.tbl2 vp ON p.pid = vp.pid
INNER JOIN db2.tbl1 AS vs ON vp.vid = vs.vid
INNER JOIN db3.tbl1 AS sa ON vs.sid = sa.sid
LEFT JOIN db4.tbl1 AS fs ON p.aid = fs.aid
WHERE sa.id = '11594'
AND fs.aid IS NULL) AS tmp
ORDER BY IF( (
ISNULL( egl )
OR egl = '' ) , 1, 0
), egl DESC
LIMIT 15
OFFSET 0
Related
I have a SELECT query that selects over 50k records from a MySQL 5.5 database at once, and this amount is expected to grow. The query contains multiple subqueries and takes over 120s to execute.
Initially the sale_items and stock tables didn't have more than the ID keys, so I added some more:
SELECT
`p`.`id` AS `id`,
`p`.`Name` AS `Name`,
`p`.`Created` AS `Created`,
`p`.`Image` AS `Image`,
`s`.`company` AS `supplier`,
`s`.`ID` AS `supplier_id`,
`c`.`name` AS `category`,
IFNULL((SELECT
SUM(`stocks`.`Total_Quantity`)
FROM `stocks`
WHERE (`stocks`.`Product_ID` = `p`.`id`)), 0) AS `total_qty`,
IFNULL((SELECT
SUM(`sale_items`.`quantity`)
FROM `sale_items`
WHERE (`sale_items`.`product_id` = `p`.`id`)), 0) AS `total_sold`,
IFNULL((SELECT
SUM(`sale_items`.`quantity`)
FROM `sale_items`
WHERE ((`sale_items`.`product_id` = `p`.`id`) AND `sale_items`.`Sale_ID` IN (SELECT
`refunds`.`Sale_ID`
FROM `refunds`))), 0) AS `total_refund`
FROM ((`products` `p`
LEFT JOIN `cats` `c`
ON ((`c`.`ID` = `p`.`cat_id`)))
LEFT JOIN `suppliers` `s`
ON ((`s`.`ID` = `p`.`supplier_id`)))
This is the explain result
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 20981 | |
| 2 | DERIVED | p | ALL | NULL | NULL | NULL | NULL | 20934 | |
| 2 | DERIVED | c | eq_ref | PRIMARY | PRIMARY | 4 | p.cat_id | 1 | |
| 2 | DERIVED | s | eq_ref | PRIMARY | PRIMARY | 4 | p.supplier_id | 1 | |
| 5 | DEPENDENT SUBQUERY | sale_items | ref | sales_items_product_id | sales_items_product_id | 5 | p.id | 33 | Using where |
| 6 | DEPENDENT SUBQUERY | refunds | index_subquery | IDX_refunds_sale_id | IDX_refunds_sale_id | 5 | func | 1 | Using index; Using where |
| 4 | DEPENDENT SUBQUERY | sale_items | ref | sales_items_product_id | sales_items_product_id | 5 | p.id | 33 | Using where |
| 3 | DEPENDENT SUBQUERY | stocks | ref | IDX_stocks_product_id | IDX_stocks_product_id | 5 | p.id | 1 | Using where |
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
I expect the query to take less than 3s at most, but I can't figure out the best way to optimize it.
The query looks fine to me. You select all data and aggregate some of it, which takes time. Your explain plan shows there are indexes on the IDs, which is good. At first glance there is not much we can do here...
What you can do, though, is provide covering indexes, i.e. indexes that contain all columns you need from a table, so the data can be taken from the index directly.
create index idx1 on cats(id, name);
create index idx2 on suppliers(id, company);
create index idx3 on stocks(product_id, total_quantity);
create index idx4 on sale_items(product_id, quantity, sale_id);
This can really boost your query.
What you can try with the query itself is to move the subqueries to the FROM clause. MySQL's optimizer is not great, so although it should get the same execution plan, it may well be that it favors the FROM clause.
SELECT
p.id,
p.name,
p.created,
p.image,
s.company as supplier,
s.id AS supplier_id,
c.name AS category,
COALESCE(st.total, 0) AS total_qty,
COALESCE(si.total, 0) AS total_sold,
COALESCE(si.refund, 0) AS total_refund
FROM products p
LEFT JOIN cats c ON c.id = p.cat_id
LEFT JOIN suppliers s ON s.id = p.supplier_id
LEFT JOIN
(
SELECT product_id, SUM(total_quantity) AS total
FROM stocks
GROUP BY product_id
) st ON st.product_id = p.id
LEFT JOIN
(
SELECT
product_id,
SUM(quantity) AS total,
SUM(CASE WHEN sale_id IN (SELECT sale_id FROM refunds) THEN quantity END) as refund
FROM sale_items
GROUP BY product_id
) si ON si.product_id = p.id;
(If sale_id is unique in refunds, then you can even join it to sale_items. Again: this should usually not make a difference, but in MySQL it may still. MySQL was once notorious for treating IN clauses much worse than the FROM clause. This may not be the case anymore, I don't know. You can try - if refunds.sale_id is unique).
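To check that moving a subquery into the FROM clause returns the same totals, here is a minimal sketch. It runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, and the three products and their stock rows are invented for illustration. It also shows that the derived table has to select product_id itself, otherwise the ON clause has nothing to join on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stocks (product_id INTEGER, total_quantity INTEGER);
    INSERT INTO products VALUES (1, 'widget'), (2, 'gadget'), (3, 'unstocked');
    INSERT INTO stocks VALUES (1, 5), (1, 7), (2, 4);
""")

# Correlated subquery in the SELECT list: evaluated once per product row.
correlated = conn.execute("""
    SELECT p.id,
           IFNULL((SELECT SUM(s.total_quantity)
                   FROM stocks s
                   WHERE s.product_id = p.id), 0) AS total_qty
    FROM products p
    ORDER BY p.id
""").fetchall()

# Derived table: aggregate once, then join. product_id is selected so the
# ON clause can refer to it.
derived = conn.execute("""
    SELECT p.id, COALESCE(st.total, 0) AS total_qty
    FROM products p
    LEFT JOIN (
        SELECT product_id, SUM(total_quantity) AS total
        FROM stocks
        GROUP BY product_id
    ) st ON st.product_id = p.id
    ORDER BY p.id
""").fetchall()

print(correlated)
print(derived)
```

Both forms give the same rows here, including a 0 for the product without stock. Whether the second form is faster on a given MySQL version is something only EXPLAIN and timing on the real data can tell.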
I am having trouble with the following query, which I submit to a MySQL server. It takes 25s, even though every table has fewer than 20k rows except featured, which has 600k. Indexes exist where they should (that is, on the columns used in the ON clauses). Removing the GROUP BY improved things a bit, but the query is still slow in general. I am posting because I could not find a solution among the many similar cases on Stack Overflow. Any suggestions?
SELECT
doctor.id as doctor_id,
doctor.uuid as doctor_uuid,
doctor.firstname as doctor_firstname,
doctor.lastname as doctor_lastname,
doctor.cloudRdvMask as doctor_cloudRdvMask,
GROUP_CONCAT(recommendation.id SEPARATOR ' ') as recommendation_ids,
GROUP_CONCAT(recommendation.uuid SEPARATOR ' ') as recommendation_uuids,
GROUP_CONCAT(recommendation.disponibility SEPARATOR ' ') as recommendation_disponibilities,
GROUP_CONCAT(recommendation.user_id SEPARATOR ' ') as recommendation_user_ids,
GROUP_CONCAT(recommendation.user_uuid SEPARATOR ' ') as recommendation_user_uuids,
location.id as location_id,
location.uuid as location_uuid,
location.lat as location_lat,
location.lng as location_lng,
profession.id as profession_id,
profession.uuid as profession_uuid,
profession.name as profession_name
FROM featured as doctor
LEFT JOIN location as location
ON doctor.location_id = location.id
LEFT JOIN profession as profession
ON doctor.profession_id = profession.id
LEFT JOIN
(
SELECT
featured.id as id,
featured.uuid as uuid,
featured.doctor_id as doctor_id,
featured.disponibility as disponibility,
user.id as user_id,
user.uuid as user_uuid
FROM featured as featured
LEFT JOIN user as user
ON featured.user_id = user.id
WHERE discr = 'recommendation'
) as recommendation
ON recommendation.doctor_id = doctor.id
WHERE
doctor.discr = 'doctor'
AND
doctor.state = 'Publié'
GROUP BY doctor.uuid
Here comes the EXPLAIN result:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
1 | SIMPLE | doctor | NULL | ref | discr,state | discr | 767 | const | 194653 | 50.00 | Using where |
1 | SIMPLE | location | NULL | eq_ref | PRIMARY | PRIMARY | 4 | doctoome.doctor.location_id | 1 | 100.00 | NULL |
1 | SIMPLE | profession | NULL | eq_ref | PRIMARY | PRIMARY | 4 | doctoome.doctor.profession_id | 1 | 100.00 | NULL |
1 | SIMPLE | featured | NULL | ref | IDX_3C1359D487F4FB17,discr | IDX_3C1359D487F4FB17 | 5 | doctoome.doctor.id | 196 | 100.00 | Using where |
1 | SIMPLE | user | NULL | eq_ref | PRIMARY | PRIMARY | 4 | doctoome.featured.user_id | 1 | 100.00 | Using index |
EDIT: This link helped me; the query now takes 8s. https://www.percona.com/blog/2016/10/12/mysql-5-7-performance-tuning-immediately-after-installation/. I still find that slow, so I am leaving the question here in case anybody knows what else could be improved. Thanks
I think removing the subquery might help, along with some more indexes:
SELECT . . . -- you need to fix the `GROUP_CONCAT()` column references
FROM featured doctor LEFT JOIN
location location
ON doctor.location_id = location.id LEFT JOIN
profession profession
ON doctor.profession_id = profession.id LEFT JOIN
featured featured
ON featured.doctor_id = doctor.id AND
-- keep this filter in ON so doctors without recommendations are not dropped
featured.discr = 'recommendation' LEFT JOIN
user user
ON featured.user_id = user.id
WHERE doctor.discr = 'doctor' AND
doctor.state = 'Publié'
GROUP BY doctor.uuid;
Then you want an index on featured(discr, state, doctor_id, location_id, profession_id) and featured(doctor_id, discr, user_id).
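One thing to watch when flattening a subquery that sat on the right side of a LEFT JOIN is where its filter ends up. A condition on the right-hand table placed in WHERE discards the NULL-extended rows, which turns the LEFT JOIN into an inner join, while the same condition in ON keeps them. The sketch below shows the difference on an in-memory SQLite database via Python's sqlite3 (not MySQL), with a cut-down featured table and three invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE featured (id INTEGER PRIMARY KEY, discr TEXT, doctor_id INTEGER);
    INSERT INTO featured VALUES
        (1, 'doctor', NULL),
        (2, 'doctor', NULL),
        (3, 'recommendation', 1);
""")

# Filter on the right-hand table in WHERE: doctor 2 has no recommendation,
# so r.discr is NULL for it and the row is filtered out.
in_where = conn.execute("""
    SELECT d.id, COUNT(r.id)
    FROM featured d
    LEFT JOIN featured r ON r.doctor_id = d.id
    WHERE d.discr = 'doctor' AND r.discr = 'recommendation'
    GROUP BY d.id
    ORDER BY d.id
""").fetchall()

# Same filter in ON: doctor 2 survives with a count of 0.
in_on = conn.execute("""
    SELECT d.id, COUNT(r.id)
    FROM featured d
    LEFT JOIN featured r ON r.doctor_id = d.id AND r.discr = 'recommendation'
    WHERE d.discr = 'doctor'
    GROUP BY d.id
    ORDER BY d.id
""").fetchall()

print(in_where)
print(in_on)
```

If doctors without any recommendation should still appear in the result, as they do in the original query, the ON placement is the one that preserves that.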
I have a report that needs running to satisfy our reporting requirements for a government body. The report is supposed to return the study load for each student in each module for a given period of time.
For example, the report needs to return the students enrolled in a given module for a given intake in a given year and semester, with a census date (a government-specified date after which the student is liable for the cost of the unit even if they withdraw).
So I've written this MySQL query:
SELECT
e.enrolstudent AS '313',
(SELECT c.ntiscode FROM course c WHERE c.courseid=ec.courseid) AS '307',
e.startdate as '534',
'AOU' as '333',
m.mod_eftsl as '339',
e.enrolmod as '354',
e.census_date as '489',
m.diciplinecode as '464',
(CASE
WHEN m.mode = 'Face to Face' THEN 1
WHEN m.mode = 'Online' THEN 2
WHEN m.mode = 'RPL' THEN 5
ELSE 3
END) AS '329',
'A6090' as '477',
up.citizen AS '358',
vf.maxcontribute as '392',
vf.studentstatus as '490',
vf.total_amount_charged as '384',
vf.amount_paid as '381',
vf.loan_fee as '529',
u.chessn as '488',
m.workexp as '337',
'0' as '390',
m.sumwinschool as '551',
vf.help_debt as '558'
FROM
enrolment e
INNER JOIN enrolcourse AS ec ON ec.studentid=e.enrolstudent
INNER JOIN vetfee AS vf ON vf.userid=e.enrolstudent
INNER JOIN users AS u ON u.userid = e.enrolstudent
INNER JOIN users_personal AS up ON up.userid = e.enrolstudent
INNER JOIN module AS m ON m.modshortname = e.enrolmod
WHERE
e.online_intake in (select oi.intakecode from online_intake oi where STR_TO_DATE(oi.censusdate,'%d-%m-%Y') > '2015-07-01' and STR_TO_DATE(oi.censusdate,'%d-%m-%Y') < '2015-09-31') AND
e.enrolstudent NOT LIKE '%onlinetutor%' AND
e.enrolstudent NOT LIKE '%tes%' AND
e.enrolstudent NOT like '%student%' AND
e.enrolrole = 'student'
ORDER BY e.enrolstudent;
It seems to hang; I've left it running for an hour with no result. There are only 10189 records in the enrolment table, 1538 in enrolcourse and 650 in module. I don't think it's the number of records. I'm guessing I've just constructed my query wrong, as this is my first time using joins (other than natural). Any ideas or tips for improving this would be greatly appreciated.
select count(*) from enrolment;
+----------+
| count(*) |
+----------+
| 10189 |
+----------+
select count(*) from enrolcourse;
+----------+
| count(*) |
+----------+
| 1538 |
+----------+
select count(*) from vetfee;
+----------+
| count(*) |
+----------+
| 1538 |
+----------+
select count(*) from users;
+----------+
| count(*) |
+----------+
| 1249 |
+----------+
select count(*) from users_personal;
+----------+
| count(*) |
+----------+
| 941 |
+----------+
select count(*) from module;
+----------+
| count(*) |
+----------+
| 650 |
+----------+
Here are the results of the EXPLAIN:
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | PRIMARY | m | ALL | NULL | NULL | NULL | NULL | 691 | Using temporary; Using filesort |
| 1 | PRIMARY | up | ALL | NULL | NULL | NULL | NULL | 987 | Using join buffer |
| 1 | PRIMARY | u | ALL | NULL | NULL | NULL | NULL | 1180 | Using where; Using join buffer |
| 1 | PRIMARY | ec | ALL | NULL | NULL | NULL | NULL | 1607 | Using where; Using join buffer |
| 1 | PRIMARY | e | ALL | NULL | NULL | NULL | NULL | 10629 | Using where; Using join buffer |
| 1 | PRIMARY | vf | ALL | NULL | NULL | NULL | NULL | 10959 | Using where; Using join buffer |
| 3 | DEPENDENT SUBQUERY | oi | ALL | NULL | NULL | NULL | NULL | 42 | Using where |
| 2 | DEPENDENT SUBQUERY | c | ALL | NULL | NULL | NULL | NULL | 23 | Using where |
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
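Get rid of those correlated subqueries. Use a join instead.
Also, use BETWEEN to save one STR_TO_DATE call. BETWEEN is inclusive at both ends, and '2015-09-31' in the original query is not a real date, since September has 30 days.
Finally, you should look at a way of eliminating all those LIKE calls.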
SELECT
e.enrolstudent AS '313',
c.ntiscode AS '307',
e.startdate as '534',
'AOU' as '333',
m.mod_eftsl as '339',
e.enrolmod as '354',
e.census_date as '489',
m.diciplinecode as '464',
(CASE
WHEN m.mode = 'Face to Face' THEN 1
WHEN m.mode = 'Online' THEN 2
WHEN m.mode = 'RPL' THEN 5
ELSE 3
END) AS '329',
'A6090' as '477',
up.citizen AS '358',
vf.maxcontribute as '392',
vf.studentstatus as '490',
vf.total_amount_charged as '384',
vf.amount_paid as '381',
vf.loan_fee as '529',
u.chessn as '488',
m.workexp as '337',
'0' as '390',
m.sumwinschool as '551',
vf.help_debt as '558'
FROM
enrolment e
INNER JOIN enrolcourse AS ec ON ec.studentid=e.enrolstudent
INNER JOIN course AS c ON c.courseid = ec.courseid
INNER JOIN vetfee AS vf ON vf.userid=e.enrolstudent
INNER JOIN users AS u ON u.userid = e.enrolstudent
INNER JOIN users_personal AS up ON up.userid = e.enrolstudent
INNER JOIN module AS m ON m.modshortname = e.enrolmod
INNER JOIN online_intake oi ON oi.intakecode = e.online_intake
AND STR_TO_DATE(oi.censusdate, '%d-%m-%Y') BETWEEN '2015-07-01' AND '2015-09-30'
WHERE e.enrolstudent NOT LIKE '%onlinetutor%'
AND e.enrolstudent NOT LIKE '%tes%'
AND e.enrolstudent NOT like '%student%'
AND e.enrolrole = 'student'
ORDER BY e.enrolstudent;
Given your posted EXPLAIN output, you'll also want to add the following indexes:
ALTER TABLE enrolment
ADD INDEX (enrolstudent),
ADD INDEX (enrolmod),
ADD INDEX (online_intake);
ALTER TABLE enrolcourse
ADD INDEX (studentid),
ADD INDEX (courseid);
ALTER TABLE course
ADD INDEX (courseid);
ALTER TABLE vetfee
ADD INDEX (userid);
ALTER TABLE users
ADD INDEX (userid);
ALTER TABLE users_personal
ADD INDEX (userid);
ALTER TABLE module
ADD INDEX (modshortname);
ALTER TABLE online_intake
ADD INDEX (intakecode);
I have a query, which is not operating on a lot of data (IMHO) but takes a number of minutes (5-10) to execute and ends up filling the /tmp space (takes up to 20GB) while executing. Once it's finished the space is freed again.
The query is as follows:
SELECT c.name, count(b.id), c.parent_accounting_reference, o.contract, a.contact_person, a.address_email, a.address_phone, a.address_fax, concat(ifnull(concat(a.description, ', '),''), ifnull(concat(a.apt_unit, ', '),''), ifnull(concat(a.preamble, ', '),''), ifnull(addr_entered,'')) FROM
booking b
join visit v on (b.visit_id = v.id)
join super_booking s on (v.super_booking_id = s.id)
join customer c on (s.customer_id = c.id)
join address a on (a.customer_id = c.id)
join customer_number cn on (cn.customer_numbers_id = c.id)
join number n on (cn.number_id = n.id)
join customer_email ce on (ce.customer_emails_id = c.id)
join email e on (ce.email_id = e.id)
left join organization o on (o.accounting_reference = c.parent_accounting_reference)
left join address_type at on (a.type_id = at.id and at.name_key = 'billing')
where s.company_id = 1
and v.expected_start_date between '2015-01-01 00:00:00' and '2015-02-01 00:00:00'
group by s.customer_id
order by count(b.id) desc
And the explain plan for the same is:
+----+-------------+-------+--------+--------------------------------------------------------------+---------------------+---------+--------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------------------------+---------------------+---------+--------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | s | ref | PRIMARY,FKC4F8739580E01B03,FKC4F8739597AD73B1 | FKC4F8739580E01B03 | 9 | const | 74088 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ce | ref | FK864C4FFBAF6458E3,customer_emails_id,customer_emails_id_2 | customer_emails_id | 9 | id_dev.s.customer_id | 1 | Using where |
| 1 | SIMPLE | cn | ref | FK530F62CA30E87991,customer_numbers_id,customer_numbers_id_2 | customer_numbers_id | 9 | id_dev.ce.customer_emails_id | 1 | Using where |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 8 | id_dev.s.customer_id | 1 | |
| 1 | SIMPLE | e | eq_ref | PRIMARY | PRIMARY | 8 | id_dev.ce.email_id | 1 | Using index |
| 1 | SIMPLE | n | eq_ref | PRIMARY | PRIMARY | 8 | id_dev.cn.number_id | 1 | Using index |
| 1 | SIMPLE | v | ref | PRIMARY,FK6B04D4BEF4FD9A | FK6B04D4BEF4FD9A | 8 | id_dev.s.id | 1 | Using where |
| 1 | SIMPLE | b | ref | FK3DB0859E1684683 | FK3DB0859E1684683 | 8 | id_dev.v.id | 1 | Using index |
| 1 | SIMPLE | o | ref | org_acct_reference | org_acct_reference | 767 | id_dev.c.parent_accounting_reference | 1 | |
| 1 | SIMPLE | a | ref | FKADDRCUST,customer_address_idx | FKADDRCUST | 9 | id_dev.c.id | 256 | Using where |
| 1 | SIMPLE | at | eq_ref | PRIMARY | PRIMARY | 8 | id_dev.a.type_id | 1 | |
+----+-------------+-------+--------+--------------------------------------------------------------+---------------------+---------+--------------------------------------+-------+----------------------------------------------+
It appears to be using the correct indexes and such so I can't understand why the large usage of /tmp and long execution time.
Your query uses a temporary table, which you can see by the Using temporary; note in the EXPLAIN result. Your MySQL settings are probably configured to use /tmp to store temporary tables.
If you want to optimize the query further, you should probably investigate why the temporary table is needed at all. The best way to do that is gradually simplifying the query until you figure out what is causing it. In this case, probably just the amount of rows needed to be processed, so if you really do need all this data, you probably need the temp table too. But don't give up on optimizing on my account ;)
By the way, on another note, you might want to look into COALESCE for handling NULL values.
You're stuck with a temporary table, because you're doing an aggregate query and then ordering it by one of the results in the aggregate. Your optimizing goal should be to reduce the number of rows and/or columns in that temporary table.
Add an index on visit.expected_start_date. This may help MySQL satisfy your query more quickly, especially if your visit table has many rows that lie outside the date range in your query.
It looks like you're trying to find the customers with the most bookings in a particular date range.
So, let's start with a subquery to summarize the least amount of material from your database.
SELECT count(*) booking_count, s.customer_id
FROM visit v
JOIN super_booking s ON v.super_booking_id = s.id
JOIN booking b ON v.id = b.visit_id
WHERE v.expected_start_date >= '2015-01-01 00:00:00'
AND v.expected_start_date < '2015-02-01 00:00:00'
AND s.company_id = 1
GROUP BY s.customer_id
This gives back a list of booking counts and customer ids for the date range and company id in question. It will be pretty efficient, especially if you put an index on expected_start_date in the visit table.
Then, let's join that subquery to the one that pulls out all that information you need.
SELECT c.name, booking_count, c.parent_accounting_reference,
o.contract,
a.contact_person, a.address_email, a.address_phone, a.address_fax,
concat(ifnull(concat(a.description, ', '),''),
ifnull(concat(a.apt_unit, ', '),''),
ifnull(concat(a.preamble, ', '),''),
ifnull(addr_entered,''))
FROM (
SELECT count(*) booking_count, s.customer_id
FROM visit v
JOIN super_booking s ON v.super_booking_id = s.id
JOIN booking b ON v.id = b.visit_id
WHERE v.expected_start_date >= '2015-01-01 00:00:00'
AND v.expected_start_date < '2015-02-01 00:00:00'
AND s.company_id = 1
GROUP BY s.customer_id
) top
join customer c on top.customer_id = c.id
join address a on (a.customer_id = c.id)
join customer_number cn on (cn.customer_numbers_id = c.id)
join number n on (cn.number_id = n.id)
join customer_email ce on (ce.customer_emails_id = c.id)
join email e on (ce.email_id = e.id)
left join organization o on (o.accounting_reference = c.parent_accounting_reference)
left join address_type at on (a.type_id = at.id and at.name_key = 'billing')
order by booking_count DESC
That should speed your work up a whole bunch, by reducing the size of the data you need to summarize.
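Aggregating before joining also changes what gets counted, which is worth checking against what the report should show. When the count is taken after joining to a table with several rows per customer (address has an estimated 256 in the EXPLAIN), every booking is counted once per matching row. The sketch below illustrates this on an in-memory SQLite database via Python's sqlite3 (not MySQL), with a simplified schema and invented rows: one customer, two bookings, three addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE address (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO booking VALUES (1, 1), (2, 1);
    INSERT INTO address VALUES (10, 1), (11, 1), (12, 1);
""")

# Join first, count afterwards: 2 bookings x 3 addresses = 6 joined rows.
joined_then_counted = conn.execute("""
    SELECT b.customer_id, COUNT(b.id)
    FROM booking b
    JOIN address a ON a.customer_id = b.customer_id
    GROUP BY b.customer_id
""").fetchall()

# Count first in a derived table, join afterwards: the count stays at 2,
# and there is one output row per address.
counted_then_joined = conn.execute("""
    SELECT top.customer_id, top.booking_count, a.id
    FROM (
        SELECT customer_id, COUNT(*) AS booking_count
        FROM booking
        GROUP BY customer_id
    ) top
    JOIN address a ON a.customer_id = top.customer_id
    ORDER BY a.id
""").fetchall()

print(joined_then_counted)
print(counted_then_joined)
```

So the two shapes are not row-for-row equivalent: the pre-aggregated form returns the real booking count but one row per joined address, email and number, so a final GROUP BY or a narrower join may still be needed.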
Note: Beware the trap in date BETWEEN this AND that. You really want
date >= this
AND date < that
because BETWEEN means
date >= this
AND date <= that
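A small sketch of that trap, using an in-memory SQLite database through Python's sqlite3 instead of MySQL. SQLite stores these timestamps as text, which compares correctly for this fixed-width format. The three visit rows are invented: one at the start of January, one in its last second, and one at exactly midnight on 1 February.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visit (id INTEGER PRIMARY KEY, expected_start_date TEXT);
    INSERT INTO visit VALUES
        (1, '2015-01-01 00:00:00'),
        (2, '2015-01-31 23:59:59'),
        (3, '2015-02-01 00:00:00');
""")

# BETWEEN includes both endpoints, so the midnight row of 1 February matches.
between_ids = [row[0] for row in conn.execute("""
    SELECT id FROM visit
    WHERE expected_start_date BETWEEN '2015-01-01 00:00:00'
                                  AND '2015-02-01 00:00:00'
    ORDER BY id
""")]

# Half-open range: start included, end excluded, so only January matches.
half_open_ids = [row[0] for row in conn.execute("""
    SELECT id FROM visit
    WHERE expected_start_date >= '2015-01-01 00:00:00'
      AND expected_start_date <  '2015-02-01 00:00:00'
    ORDER BY id
""")]

print(between_ids)
print(half_open_ids)
```

With monthly reports built on BETWEEN, a visit at exactly midnight on the boundary would be counted in two consecutive months.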
This is a follow-up to "MySQL - Find rows matching all rows from joined table".
Thanks to this site, that query runs perfectly.
But now I have had to extend the query to search by artist and track. This has led me to the following query:
SELECT DISTINCT `t`.`id`
FROM `trackwords` AS `tw`
INNER JOIN `wordlist` AS `wl` ON wl.id=tw.wordid
INNER JOIN `track` AS `t` ON tw.trackid=t.id
WHERE (wl.trackusecount>0) AND
(wl.word IN ('please','dont','leave','me')) AND
t.artist IN (
SELECT a.id
FROM artist as a
INNER JOIN `artistalias` AS `aa` ON aa.ref=a.id
WHERE a.name LIKE 'pink%' OR aa.name LIKE 'pink%'
)
GROUP BY tw.trackid
HAVING (COUNT(*) = 4);
The EXPLAIN for this query looks quite good, I think:
+----+--------------------+-------+--------+----------------------------+---------+---------+-----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+----------------------------+---------+---------+-----------------+------+----------------------------------------------+
| 1 | PRIMARY | wl | range | PRIMARY,word,trackusecount | word | 767 | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | tw | ref | wordid,trackid | wordid | 4 | mbdb.wl.id | 31 | |
| 1 | PRIMARY | t | eq_ref | PRIMARY | PRIMARY | 4 | mbdb.tw.trackid | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | aa | ref | ref,name | ref | 4 | func | 2 | |
| 2 | DEPENDENT SUBQUERY | a | eq_ref | PRIMARY,name,namefull | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+-------+--------+----------------------------+---------+---------+-----------------+------+----------------------------------------------+
Do you see room for optimization? The query has a runtime of around 7 seconds, which is unfortunately too much. Any suggestions are welcome.
TIA
You have two possible selective conditions here: the artist's name and the word list.
Assuming that the words are more selective than artists:
SELECT tw.trackid
FROM (
SELECT tw.trackid
FROM wordlist AS wl
JOIN trackwords AS tw
ON tw.wordid = wl.id
WHERE wl.trackusecount > 0
AND wl.word IN ('please','dont','leave','me')
GROUP BY
tw.trackid
HAVING COUNT(*) = 4
) tw
INNER JOIN
track AS t
ON t.id = tw.trackid
AND EXISTS
(
SELECT NULL
FROM artist a
WHERE a.name LIKE 'pink%'
AND a.id = t.artist
UNION ALL
SELECT NULL
FROM artist a
JOIN artistalias aa
ON aa.ref = a.id
AND aa.name LIKE 'pink%'
WHERE a.id = t.artist
)
You need to have the following indexes for this to be efficient:
wordlist (word, trackusecount)
trackwords (wordid, trackid)
artistalias (ref, name)
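The inner GROUP BY ... HAVING COUNT(*) = 4 is the part that finds tracks containing all of the search words. A minimal sketch of that step follows, run on an in-memory SQLite database through Python's sqlite3 rather than MySQL. The tables are cut down to the columns involved and the rows are invented. It uses COUNT(DISTINCT wl.id) and derives the target count from the word list, which is an assumption-free variant only if each (wordid, trackid) pair is unique; with a plain COUNT(*), duplicate pairs could produce false matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wordlist (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE trackwords (wordid INTEGER, trackid INTEGER,
                             UNIQUE (wordid, trackid));
    INSERT INTO wordlist VALUES
        (1, 'please'), (2, 'dont'), (3, 'leave'), (4, 'me'), (5, 'stay');
    INSERT INTO trackwords VALUES
        (1, 100), (2, 100), (3, 100), (4, 100),  -- track 100: all four words
        (1, 200), (4, 200), (5, 200);            -- track 200: only two of them
""")

words = ["please", "dont", "leave", "me"]
placeholders = ", ".join("?" for _ in words)

# Keep only tracks whose number of distinct matched words equals the
# number of words searched for.
sql = f"""
    SELECT tw.trackid
    FROM wordlist wl
    JOIN trackwords tw ON tw.wordid = wl.id
    WHERE wl.word IN ({placeholders})
    GROUP BY tw.trackid
    HAVING COUNT(DISTINCT wl.id) = ?
"""
matches = [row[0] for row in conn.execute(sql, words + [len(words)])]

print(matches)
```

Binding the expected count as a parameter keeps it in step with the word list, so the hard-coded 4 does not have to be edited whenever the search phrase changes.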
Have you already indexed the name columns? That should speed this up.
You can also try using fulltext searching with Match and Against.