I'm making a query that allows me to order recipes by score.
Tables structure
Structure is that a flyer contains one or many flyer_items, which can contain one or many ingredients_to_flyer_item (this table links ingredient to the flyer item). The other table ingredient_to_recipe links the same ingredients but to one or many recipes. Link to .sql file is included at the end.
Example query
I want to get recipe_id and a SUM of the MAX price weight of each ingredient that are part of the recipe (linked by ingredient_to_recipe), but if a recipe has multiple ingredients that belongs to the same flyers_item, it should be counted once.
SELECT itr.recipe_id,
SUM(itr.weight),
SUM(max_price_weight),
SUM(itr.weight + max_price_weight) AS score
FROM
( SELECT MAX(itf.max_price_weight) AS max_price_weight,
itf.flyer_item_id,
itf.ingredient_id
FROM
(SELECT ifi.ingredient_id,
MAX(i.price_weight) AS max_price_weight,
ifi.flyer_item_id
FROM flyer_items i
JOIN ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
WHERE i.flyer_id IN (1,
2)
GROUP BY ifi.ingredient_id ) itf
GROUP BY itf.flyer_item_id) itf2
JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
WHERE recipe_id = 5730
GROUP BY itr.`recipe_id`
ORDER BY score DESC
LIMIT 0,10
The query almost works fine, because most of the results are good, but for some lines, some ingredients are ignored and not counted from the score as they should.
Test cases
| recipe_id | 'score' with current query | what 'score' should be | explanation |
|-----------|----------------------------|------------------------|-----------------------------------------------------------------------------|
| 8376 | 51 | 51 | Good result |
| 3152 | 1 | 18 | Only 1 ingredient having a score of one is counted, should be 4 ingredients |
| 4771 | 41 | 45 | One ingredient worth score 4 is ignored |
| 10230 | 40 | 40 | Good result |
| 8958 | 39 | 39 | Good result |
| 4656 | 28 | 34 | One ingredient worth 6 is ignored |
| 11338 | 1 | 10 | 2 ingredients, worth 4 and 5 are ignored |
I have a very difficult time finding an easy way to explain it. Let me know if anything else could help.
Here is a link to the demo database to run the query, test examples and test cases: https://nofile.io/f/F4YSEu8DWmT/meta.zip
Thank you very much.
Update (as asked by Rick James):
Here is the furthest I could make it work. The results are always good, in subquery too, but, I've completely taken out the group by 'flyer_item_id'. So with this query, I get the good score, but if many ingredients of the recipe are the same flyer_item_item, they will be counted multiple times (Like score would be 59 for recipe_id = 10557 instead of the good 56, because 2 ingredients worth 3 are in the same flyers_item). The only thing I need more is to count one MAX(price_weight) per flyer_item_id per recipe, (which I originally tried by grouping by 'flyer_item_id' over the first group_by ingredient_id.
SELECT itr.recipe_id,
SUM(itr.weight) as total_ingredient_weight,
SUM(itf.price_weight) as total_price_weight,
SUM(itr.weight+itf.price_weight) as score
FROM
(SELECT fi1.id, MAX(fi1.price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id, recipe_id
FROM flyer_items fi1
INNER JOIN (
SELECT flyer_items.id as id, MAX(price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id
FROM flyer_items
JOIN ingredient_to_flyer_item ON flyer_items.id = ingredient_to_flyer_item.flyer_item_id
GROUP BY id
) fi2 ON fi1.id = fi2.id AND fi1.price_weight = fi2.price_weight
JOIN ingredient_to_flyer_item ON fi1.id = ingredient_to_flyer_item.flyer_item_id
JOIN ingredient_to_recipe ON ingredient_to_flyer_item.ingredient_id = ingredient_to_recipe.ingredient_id
GROUP BY ingredient_to_flyer_item.ingredient_id) AS itf
INNER JOIN `ingredient_to_recipe` AS `itr` ON `itf`.`ingredient_id` = `itr`.`ingredient_id`
GROUP BY `itr`.`recipe_id`
ORDER BY `score` DESC
LIMIT 10
Here is the explain, but I'm not sure it's useful as the last working part is still missing:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
|----|-------------|--------------------------|------------|--------|-------------------------------|---------------|---------|-------------------------------------------------------|--------|----------|---------------------------------|---|
| 1 | PRIMARY | itr | NULL | ALL | recipe_id,ingredient_id | NULL | NULL | NULL | 151800 | 100.00 | Using temporary; Using filesort | |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | metadata3.itr.ingredient_id | 10 | 100.00 | NULL | |
| 2 | DERIVED | ingredient_to_flyer_item | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 2 | DERIVED | fi1 | NULL | eq_ref | id_2,id,price_weight | id_2 | 4 | metadata3.ingredient_to_flyer_item.flyer_item_id | 1 | 100.00 | NULL | |
| 2 | DERIVED | <derived3> | NULL | ref | <auto_key0> | <auto_key0> | 9 | metadata3.ingredient_to_flyer_item.flyer_item_id,m... | 10 | 100.00 | NULL | |
| 2 | DERIVED | ingredient_to_recipe | NULL | ref | ingredient_id | ingredient_id | 4 | metadata3.ingredient_to_flyer_item.ingredient_id | 40 | 100.00 | NULL | |
| 3 | DERIVED | ingredient_to_flyer_item | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 3 | DERIVED | flyer_items | NULL | eq_ref | id_2,id,flyer_id,price_weight | id_2 | 4 | metadata3.ingredient_to_flyer_item.flyer_item_id | 1 | 100.00 | NULL | |
Update 2
I managed to find a query that works, but now I have to make it faster, it takes over 500ms to run.
SELECT sum(ff.price_weight) as price_weight, sum(ff.weight) as weight, sum(ff.price_weight+ff.weight) as score, ff.recipe_id FROM
(
SELECT DISTINCT
itf.flyer_item_id as flyer_item_id,
itf.recipe_id,
itf.weight,
aprice_weight AS price_weight
FROM
(SELECT itfin.flyer_item_id AS flyer_item_id,
itfin.price_weight AS aprice_weight,
itfin.ingredient_id,
itr.recipe_id,
itr.weight
FROM
(SELECT ifi2.flyer_item_id, ifi2.ingredient_id as ingredient_id, MAX(ifi2.price_weight) as price_weight
FROM
ingredient_to_flyer_item ifi1
INNER JOIN (
SELECT id, MAX(price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id, ingredient_to_flyer_item.flyer_item_id
FROM ingredient_to_flyer_item
GROUP BY ingredient_id
) ifi2 ON ifi1.price_weight = ifi2.price_weight AND ifi1.ingredient_id = ifi2.ingredient_id
WHERE flyer_id IN (1,2)
GROUP BY ifi1.ingredient_id) AS itfin
INNER JOIN `ingredient_to_recipe` AS `itr` ON `itfin`.`ingredient_id` = `itr`.`ingredient_id`
) AS itf
) ff
GROUP BY recipe_id
ORDER BY `score` DESC
LIMIT 20
Here is the EXPLAIN:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
|----|-------------|--------------------------|------------|-------|----------------------------------------------|---------------|---------|---------------------|------|----------|---------------------------------|---|
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1318 | 100.00 | Using temporary; Using filesort | |
| 2 | DERIVED | <derived4> | NULL | ALL | NULL | NULL | NULL | NULL | 37 | 100.00 | Using temporary | |
| 2 | DERIVED | itr | NULL | ref | ingredient_id | ingredient_id | 4 | itfin.ingredient_id | 35 | 100.00 | NULL | |
| 4 | DERIVED | <derived5> | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 4 | DERIVED | ifi1 | NULL | ref | ingredient_id,itx_full,price_weight,flyer_id | ingredient_id | 4 | ifi2.ingredient_id | 1 | 12.50 | Using where | |
| 5 | DERIVED | ingredient_to_flyer_item | NULL | index | ingredient_id,itx_full | ingredient_id | 4 | NULL | 249 | 100.00 | NULL | |
Sounds like "explode-implode". This is where the query has a JOIN and GROUP BY.
The JOIN gathers the appropriate combinations of rows from the joined tables; then
The GROUP BY COUNTs, SUMs, etc, giving you inflated values for the aggregates.
There are two common fixes, both involve doing the aggregation separate from the JOIN.
Case 1:
SELECT ...
( SELECT SUM(x) FROM t2 WHERE id = ... ) AS sum_x,
...
FROM t1 ...
That case gets clumsy if you need multiple aggregates from t2, since it allows only one at a time.
Case 2:
SELECT ...
FROM ( SELECT grp,
SUM(x) AS sum_x,
COUNT(*) AS ct
FROM t2 ) AS s
JOIN t1 ON t1.grp = s.grp
You have 2 JOINs and 3 GROUP BYs, so I recommend you debug (and rewrite) your query from the inside out.
SELECT ifi.ingredient_id,
MAX(price_weight) as max_price_weight,
flyer_item_id
from flyer_items i
join ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
where flyer_id in (1, 2)
group by ifi.ingredient_id
But I can't help you, since you have not qualified price_weight by the table (or an alias) it is in. (Ditto for some other columns.)
(Actually, MAX and MIN won't get inflated values; AVG will get slightly wrong values; COUNT and SUM get "wrong" values.)
Hence, I will leave the rest as an "exercise" to the reader".
INDEXes
itr: (ingredient_id, recipe_id) -- for the JOIN and WHERE and GROUP BY
itr: (recipe_id, ingredient_id, weight) -- for 1st Update
(There is no optimization available for the ORDER BY and LIMIT)
flyer_items: (flyer_id, price_weight) -- unless flyer_id is the PRIMARY KEY
ifi: (flyer_item_id, ingredient_id)
ifi: (ingredient_id, flyer_item_id) -- for 1st Update
Please provide `SHOW CREATE TABLE for the relevant tables.
Please provide EXPLAIN SELECT ....
If ingredient_to_flyer_item is a many:many mapping tables, please follow the tips here . Ditto for ingredient_to_recipe?
GROUP BY itf.flyer_item_id is probably invalid since it does not include the non-aggregated ifi.ingredient_id. See "only_full_group_by".
Reformulate
After you finish evaluating the INDEXes, try the following. Caution: I do not know if it will work correctly.
JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
to
JOIN ( SELECT recipe_id,
ingredient_id,
SUM(weight) AS sum_weight
FROM ingredient_to_recipe ) AS itr
And change the initial SELECT to replace SUMs by these computed sums. (I suspect I have not handled ingredient_id correctly.)
What version of MySQL/MariaDB are you running?
I've been wanting to take a look at this but unfortunately haven't had time until now. I think this query will give you the results you are looking for.
SELECT recipe_id, SUM(weight) AS weight, SUM(max_price_weight) AS price_weight, SUM(weight + max_price_weight) AS score
FROM (SELECT recipe_id, ingredient_id, MAX(weight) AS weight, MAX(price_weight) AS max_price_weight
FROM (SELECT itr.recipe_id, MIN(itr.ingredient_id) AS ingredient_id, MAX(itr.weight) AS weight, fi.id, MAX(fi.price_weight) AS price_weight
FROM ingredient_to_recipe itr
JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
GROUP BY itr.recipe_id, fi.id) ri
GROUP BY recipe_id, ingredient_id) r
GROUP BY recipe_id
ORDER BY score DESC
LIMIT 10
It groups first by flyer_item_id and then on MIN(ingredient_id) to take account of ingredients within a recipe which have the same flyer_item_id. Then it sums the results to get the score you want. If I use the query with a
HAVING recipe_id IN (8376, 3152, 4771, 10230, 8958, 4656, 11338)
clause it gives the following results, which match your "what score should be" column above:
recipe_id weight price_weight score
8376 10 41 51
4771 5 40 45
10230 10 30 40
8958 15 24 39
4656 15 19 34
3152 0 18 18
11338 0 10 10
I'm not sure how fast this query will execute on your system, it's comparable to your query on my laptop (which I would expect to be quite a bit slower). I'm pretty sure there are some optimisations possible but again, haven't had time to look into them thoroughly.
I hope this provides you with a bit more help getting to a workable solution.
I'm not sure I fully understood the problem. It seems to me you are grouping by the wrong column flyer_items.id. You should be grouping by the column ingredient_id instead. If you do this, it makes more sense (to me). Here's how I see it:
select
itr.recipe_id,
sum(itr.weight),
sum(max_price_weight),
sum(itr.weight + max_price_weight) as score
from (
select
ifi.ingredient_id,
max(price_weight) as max_price_weight
from flyer_items i
join ingredients_to_flyer_item ifi on i.id = ifi.flyer_item_id
where flyer_id in (1, 2)
group by ifi.ingredient_id
) itf
join `ingredient_to_recipe` as itr on itf.`ingredient_id` = itr.`ingredient_id`
group by itr.`recipe_id`
order by score desc
limit 0,10;
I hope it helps.
I'm trying to select all rows in this table, with the constraint that revised id's are selected instead of the original ones. So, if a row has a revision, that revision is selected instead of that row, if there are multiple revision numbers the highest revision number is preferred.
I think an example table, output, and query will explain this better:
Table:
+----+-------+-------------+-----------------+-------------+
| id | value | original_id | revision_number | is_revision |
+----+-------+-------------+-----------------+-------------+
| 1 | abcd | null | null | 0 |
| 2 | zxcv | null | null | 0 |
| 3 | qwert | null | null | 0 |
| 4 | abd | 1 | 1 | 1 |
| 5 | abcde | 1 | 2 | 1 |
| 6 | zxcvb | 2 | 1 | 1 |
| 7 | poiu | null | null | 0 |
+----+-------+-------------+-----------------+-------------+
Desired Output:
+----+-------+-------------+-----------------+
| id | value | original_id | revision_number |
+----+-------+-------------+-----------------+
| 3 | qwert | null | null |
| 5 | abcde | 1 | 2 |
| 6 | zxcvb | 2 | 1 |
| 7 | poiu | null | null |
+----+-------+-------------+-----------------+
View Called revisions_max:
SELECT
responses.original_id AS original_id,
MAX(responses.revision_number) AS revision
FROM
responses
WHERE
original_id IS NOT NULL
GROUP BY responses.original_id
My Current Query:
SELECT
responses.*
FROM
responses
WHERE
id NOT IN (
SELECT
original_id
FROM
revisions_max
)
AND
is_revision = 0
UNION
SELECT
responses.*
FROM
responses
INNER JOIN revisions_max ON revisions_max.original_id = responses.original_id
AND revisions_max.revision_number = responses.revision_number
This query works, but takes 0.06 seconds to run. With a table of only 2000 rows. This table will quickly start expanding to tens or hundreds of thousands of rows. The query under the union is what takes most of the time.
What can I do to improve this queries performance?
How about using coalesce()?
SELECT COALESCE(y.id, x.id) AS id,
COALESCE(y.value, x.value) AS value,
COALESCE(y.original_id, x.original_id) AS original_id,
COALESCE(y.revision_number, x.revision_number) AS revision_number
FROM responses x
LEFT JOIN (SELECT r1.*
FROM responses r1
INNER JOIN (SELECT responses.original_id AS
original_id,
Max(responses.revision_number) AS
revision
FROM responses
WHERE original_id IS NOT NULL
GROUP BY responses.original_id) rev
ON r1.original_id = rev.original_id
AND r1.revision_number = rev.revision) y
ON x.id = y.original_id
WHERE y.id IS NOT NULL
OR x.original_id IS NULL;
The approach I would take with any other DBMS is to use NOT EXISTS:
SELECT r1.*
FROM Responses AS r1
WHERE NOT EXISTS
( SELECT 1
FROM Responses AS r2
WHERE r2.original_id = COALESCE(r1.original_id, r1.id)
AND r2.revision_number > COALESCE(r1.revision_number, 0)
);
To remove any rows where a higher revision number exists for the same id (or original_id if it is populated). However, in MySQL, LEFT JOIN/IS NULL will perform better than NOT EXISTS1. As such I would rewrite the above as:
SELECT r1.*
FROM Responses AS r1
LEFT JOIN Responses AS r2
ON r2.original_id = COALESCE(r1.original_id, r1.id)
AND r2.revision_number > COALESCE(r1.revision_number, 0)
WHERE r2.id IS NULL;
Example on DBFiddle
I realise that you have said that you don't want to use LEFT JOIN and check for nulls, but I don't see that there is a better solution.
1. At least this was the case historically, I don't actively use MySQL so don't keep up to date with developments in the optimiser
Below is the explain output for the slow query that has 10's of "copying to tmp table" state in mysql processlist.
explain SELECT distinct
(radgroupreply.groupname),
count(distinct (radusergroup.username)) AS users
FROM
radgroupreply
LEFT JOIN
radusergroup ON radgroupreply.groupname = radusergroup.groupname
WHERE
(radgroupreply.groupname NOT LIKE 'FB-%' AND radgroupreply.groupname NOT LIKE '%Dropped%')
GROUP BY radgroupreply.groupname
UNION SELECT distinct
(radgroupcheck.groupname),
count(distinct (radusergroup.username))
FROM
radgroupcheck
LEFT JOIN
radusergroup ON radgroupcheck.groupname = radusergroup.groupname
WHERE
(radgroupcheck.groupname NOT LIKE 'FB-%' AND radgroupcheck.groupname NOT LIKE '%Dropped%')
GROUP BY radgroupcheck.groupname
ORDER BY groupname asc;
+----+--------------+---------------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+---------------+------+---------------+------+---------+------+-------+----------------------------------------------+
| 1 | PRIMARY | radgroupreply | ALL | NULL | NULL | NULL | NULL | 456 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | radusergroup | ALL | NULL | NULL | NULL | NULL | 10261 | |
| 2 | UNION | radgroupcheck | ALL | NULL | NULL | NULL | NULL | 167 | Using where; Using temporary; Using filesort |
| 2 | UNION | radusergroup | ALL | NULL | NULL | NULL | NULL | 10261 | |
|NULL| UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using filesort |
+----+--------------+---------------+------+---------------+------+---------+------+-------+----------------------------------------------+
5 rows in set (0.00 sec)
I cant get my head around this query to create compound/single index and optimize since it has multiple joins, group by and like operations.
Here are three observations to get started.
The select distinct is unnecessary (the group by takes care of that).
The left joins are unnecessary (the where clauses turn them into inner joins).
The UNION should probably be UNION ALL. I doubt you really want to incur the overhead of removing duplicates.
So, you can write the query as:
SELECT rr.groupname, count(distinct rg.username) AS users
FROM radgroupreply rr JOIN
radusergroup rg
ON rr.groupname = rg.groupname
WHERE rr.groupname NOT LIKE 'FB-%' AND rr.groupname NOT LIKE '%Dropped%'
GROUP BY rr.groupname
UNION ALL
SELECT rc.groupname, count(rg.username)
FROM radgroupcheck rc JOIN
radusergroup rg
ON rc.groupname = rg.groupname
WHERE rc.groupname NOT LIKE 'FB-%' AND rc.groupname NOT LIKE '%Dropped%'
GROUP BY rc.groupname
ORDER BY groupname asc;
This query can take advantage of indexes on radusergroup(groupname). I am guessing an index on rc(radusergroup) would be used.
I would also advice you to remove the DISTINCT in COUNT(DISTINCT) if it is not necessary.
I have this sql query running on MySQL 5.1 non-normalized table. It works the way i want it to, but it can be quite slow. I added an index on the day column but it still needs to be faster. Any suggestions on how to get this faster? (maybe with a join instead?)
SELECT DISTINCT(bucket) AS b,
(possible_free_slots -
(SELECT COUNT(availability)
FROM ip_bucket_list
WHERE bucket = b
AND availability = 'used'
AND tday = 'evening'
AND day LIKE '2012-12-14%'
AND network = '10_83_mh1_bucket')) AS free_slots
FROM ip_bucket_list
ORDER BY free_slots DESC;
The individual queries are fast:
SELECT DISTINCT(bucket) FROM ip_bucket_list;
1024 rows in set (0.05 sec)
SELECT COUNT(availability) from ip_bucket_list WHERE bucket = 0 AND availability = 'used' AND tday = 'evening' AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket';
1 row in set (0.00 sec)
Table:
mysql> describe ip_bucket_list;
+---------------------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ip | varchar(50) | YES | | NULL | |
| bucket | int(11) | NO | MUL | NULL | |
| availability | varchar(20) | YES | | NULL | |
| network | varchar(100) | NO | MUL | NULL | |
| possible_free_slots | int(11) | NO | | NULL | |
| tday | varchar(20) | YES | | NULL | |
| day | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
+---------------------+--------------+------+-----+-------------------+----------------+
and the DESC:
DESC SELECT DISTINCT(bucket) as b,(possible_free_slots - (SELECT COUNT(availability) from ip_bucket_list WHERE bucket = b AND availability = 'used' AND tday = 'evening' AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket')) as free_slots FROM ip_bucket_list ORDER BY free_slots DESC;
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
| 1 | PRIMARY | ip_bucket_list | ALL | NULL | NULL | NULL | NULL | 328354 | Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | ip_bucket_list | ref | bucket,network,ip_bucket_list_day_index | bucket | 4 | func | 161 | Using where |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
I would move the correlated subquery from the SELECT clause into the FROM clause, using a join:
SELECT distinct bucket as b,
(possible_free_slots - a.avail) as free_slots
FROM ip_bucket_list ipbl left outer join
(SELECT bucket COUNT(availability) as avail
from ip_bucket_list
WHERE availability = 'used' AND tday = 'evening' AND
day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
) on a
on ipbl.bucket = avail.bucket
ORDER BY free_slots DESC;
The version in the SELECT clause is probably being re-run for every row (even before the distinct is running). By putting it in the from clause, the ip_bucket_list table will be scanned only once.
Also, if you are expecting each bucket to only show up once, then I would recommend that you use group by rather than distinct. It would clarify the purpose of the query. You may be able to eliminate the second reference to the table altogether, with something like:
SELECT bucket as b,
max(possible_free_slots -
(case when availability = 'used' AND tday = 'evening' AND
day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
then 1 else 0
end)
) as free_slots
FROM ip_bucket_list
group by bucket
ORDER BY free_slots DESC;
To speed up your version of the query, you need an index on bucket, because this is used for the correlated subquery.
Try moving the subquery into the main query - like so:
SELECT b.bucket AS b,
b.possible_free_slots - COUNT(l.availability) AS free_slots
FROM ip_bucket_list b
LEFT JOIN ip_bucket_list l
ON l.bucket = b.bucket
AND l.availability = 'used'
AND l.tday = 'evening'
AND l.day LIKE '2012-12-14%'
AND l.network = '10_83_mh1_bucket'
GROUP BY b.bucket, b.possible_free_slots
ORDER BY 2 DESC
this is a follow up from MySQL - Find rows matching all rows from joined table
Thanks to this site the query runs perfectly.
But now i had to extend the query for a search for artist and track. This has lead me to the following query:
SELECT DISTINCT`t`.`id`
FROM `trackwords` AS `tw`
INNER JOIN `wordlist` AS `wl` ON wl.id=tw.wordid
INNER JOIN `track` AS `t` ON tw.trackid=t.id
WHERE (wl.trackusecount>0) AND
(wl.word IN ('please','dont','leave','me')) AND
t.artist IN (
SELECT a.id
FROM artist as a
INNER JOIN `artistalias` AS `aa` ON aa.ref=a.id
WHERE a.name LIKE 'pink%' OR aa.name LIKE 'pink%'
)
GROUP BY tw.trackid
HAVING (COUNT(*) = 4);
The Explain for this query looks quite good i think:
+----+--------------------+-------+--------+----------------------------+---------+---------+-----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+----------------------------+---------+---------+-----------------+------+----------------------------------------------+
| 1 | PRIMARY | wl | range | PRIMARY,word,trackusecount | word | 767 | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | tw | ref | wordid,trackid | wordid | 4 | mbdb.wl.id | 31 | |
| 1 | PRIMARY | t | eq_ref | PRIMARY | PRIMARY | 4 | mbdb.tw.trackid | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | aa | ref | ref,name | ref | 4 | func | 2 | |
| 2 | DEPENDENT SUBQUERY | a | eq_ref | PRIMARY,name,namefull | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+-------+--------+----------------------------+---------+---------+-----------------+------+----------------------------------------------+
Did you see room for optimization ? Query has a runtime from around 7secs, which is to much unfortunatly. Any suggestions are welcome.
TIA
You have two possible selective conditions here: artists's name and the word list.
Assuming that the words are more selective than artists:
SELECT tw.trackid
FROM (
SELECT tw.trackid
FROM wordlist AS wl
JOIN trackwords AS tw
ON tw.wordid = wl.id
WHERE wl.trackusecount > 0
AND wl.word IN ('please','dont','leave','me')
GROUP BY
tw.trackid
HAVING COUNT(*) = 4
) tw
INNER JOIN
track AS t
ON t.id = tw.trackid
AND EXISTS
(
SELECT NULL
FROM artist a
WHERE a.name LIKE 'pink%'
AND a.id = t.artist
UNION ALL
SELECT NULL
FROM artist a
JOIN artistalias aa
ON aa.ref = a.id
AND aa.name LIKE 'pink%'
WHERE a.id = t.artist
)
You need to have the following indexes for this to be efficient:
wordlist (word, trackusecount)
trackwords (wordid, trackid)
artistalias (ref, name)
Have you already indexed the name columns? That should speed this up.
You can also try using fulltext searching with Match and Against.