Need help speeding up a MySQL query - mysql

I need a query that quickly shows the articles within a particular module (a subset of articles) that a user has NOT uploaded a PDF for. The query I am using below takes about 37 seconds, given there are 300,000 articles in the Article table, and 6,000 articles in the Module.
SELECT *
FROM article a
INNER JOIN article_module_map amm ON amm.article=a.id
WHERE amm.module = 2 AND
a.id NOT IN (
SELECT afm.article
FROM article_file_map afm
INNER JOIN article_module_map amm ON amm.article = afm.article
WHERE afm.organization = 4 AND
amm.module = 2
)
What I am doing in the above query is first narrowing the list of articles to the selected module, and then narrowing that list further to the articles that are not in the subquery. The subquery generates the list of articles that an organization has already uploaded PDFs for. Hence, the end result is a list of articles that an organization has not yet uploaded PDFs for.
Help would be hugely appreciated, thanks in advance!
EDIT 2012/10/25
With @fthiella's help, the query below ran in an astonishing 1.02 seconds, down from 37+ seconds!
SELECT a.* FROM (
SELECT article.* FROM article
INNER JOIN article_module_map
ON article.id = article_module_map.article
WHERE article_module_map.module = 2
) AS a
LEFT JOIN article_file_map
ON a.id = article_file_map.article
AND article_file_map.organization=4
WHERE article_file_map.id IS NULL

I am not sure that I understand the logic and the structure of the tables correctly. This is my query:
SELECT
article.id
FROM
article
INNER JOIN
article_module_map
ON article.id = article_module_map.article
AND article_module_map.module=2
LEFT JOIN
article_file_map
ON article.id = article_file_map.article
AND article_file_map.organization=4
WHERE
article_file_map.id IS NULL
I extract all of the articles that belong to module 2. I then select those for which organization 4 didn't provide a file.
I used a LEFT JOIN instead of a subquery. In some circumstances this could be faster.
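To check that the LEFT JOIN ... IS NULL form returns the same rows as the NOT IN form, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows (and the alias amm2) are made up for illustration; it only shows that the two queries are equivalent on this data, not how fast either one runs on a real server.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question,
# the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_module_map (id INTEGER PRIMARY KEY, article INTEGER, module INTEGER);
    CREATE TABLE article_file_map (id INTEGER PRIMARY KEY, article INTEGER, organization INTEGER);

    INSERT INTO article (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    -- Articles 1, 2 and 3 belong to module 2; article 4 belongs to module 9.
    INSERT INTO article_module_map (article, module) VALUES (1, 2), (2, 2), (3, 2), (4, 9);
    -- Organization 4 uploaded a file for article 1; organization 5 for article 2.
    INSERT INTO article_file_map (article, organization) VALUES (1, 4), (2, 5);
""")

# Original approach: filter with NOT IN (subquery).
not_in_sql = """
    SELECT a.id
    FROM article a
    INNER JOIN article_module_map amm ON amm.article = a.id
    WHERE amm.module = 2
      AND a.id NOT IN (
          SELECT afm.article
          FROM article_file_map afm
          INNER JOIN article_module_map amm2 ON amm2.article = afm.article
          WHERE afm.organization = 4 AND amm2.module = 2)
    ORDER BY a.id
"""

# Answer's approach: LEFT JOIN and keep only rows with no matching file.
left_join_sql = """
    SELECT article.id
    FROM article
    INNER JOIN article_module_map
            ON article.id = article_module_map.article
           AND article_module_map.module = 2
    LEFT JOIN article_file_map
            ON article.id = article_file_map.article
           AND article_file_map.organization = 4
    WHERE article_file_map.id IS NULL
    ORDER BY article.id
"""

not_in_ids = [row[0] for row in conn.execute(not_in_sql)]
left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
print(not_in_ids, left_join_ids)
```

Both lists contain the module 2 articles that organization 4 has no file for.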
EDIT Thank you for your comment. I wasn't sure it would run faster, but it surprises me that it is so much slower! Anyway, it was worth a try!
Now, out of curiosity, I would like to try all the combinations of LEFT/INNER JOIN and subquery, to see which one runs faster, e.g.:
SELECT *
FROM
(SELECT *
FROM
article INNER JOIN article_module_map
ON article.id = article_module_map.article
WHERE
article_module_map.module=2)
LEFT JOIN
etc.
Maybe also removing *, and I would like to see what changes between putting the condition in the WHERE clause and in the ON clause. Anyway, I don't think that will help much; you should concentrate on indexes now.
Indexes on keys/foreign keys should be okay already, but what if you add an index on article_module_map.module and/or article_file_map.organization?
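As a rough sketch of what adding those two indexes could look like: the index names idx_amm_module and idx_afm_organization are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL. The CREATE INDEX statements themselves use syntax that MySQL also accepts, but the query plan printed at the end is SQLite's, so treat it only as an illustration of checking whether an index gets picked up.

```python
import sqlite3

# SQLite stands in for MySQL; index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article_module_map (id INTEGER PRIMARY KEY, article INTEGER, module INTEGER);
    CREATE TABLE article_file_map (id INTEGER PRIMARY KEY, article INTEGER, organization INTEGER);

    -- One single-column index per filter column mentioned in the answer.
    CREATE INDEX idx_amm_module ON article_module_map (module);
    CREATE INDEX idx_afm_organization ON article_file_map (organization);
""")

# List the indexes that now exist.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)

# Ask the engine how it would run the module filter (MySQL's counterpart is EXPLAIN).
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT article FROM article_module_map WHERE module = 2"
).fetchall()
for plan_row in plan_rows:
    print(plan_row)
```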

When optimizing queries, I usually check the following points:
First: I would avoid using * in the SELECT clause; instead, name the specific fields you want. This can increase the speed dramatically (I had one query that took 7 seconds with *, and naming the fields brought it down to 0.1s).
Second: As @Adder says, add indexes to your tables.
Third: Try using INNER JOIN instead of WHERE amm.module = 2 AND a.id NOT IN ( ... ). I think I read (I don't remember it well, so take it with caution) that MySQL usually optimizes INNER JOINs, and as your subquery is a filter, maybe using three INNER JOINs plus a WHERE would be faster.

Related

Is there any difference, performance wise, with these two queries? (Repeating the where clause inside the sub-query) MYSQL

I have a query that goes something like this.
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
As you can see, there is a WHERE clause in the outer main query.
But also, on the inside, we have an Inner Join statement with the line SELECT INNER_E.* FROM Equipment INNER_E. This inner join makes us only retrieve the fault codes that are inside the equipment table (correct me if I'm wrong).
I am trying to optimize this query.
My question is, does it make any difference to do this
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
WHERE INNER_E.id_organization = 100057 AND INNER_E.equipment_status = 'ACTIVE'
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
So repeating the where clause inside the inner sub query, to further limit it before it joins. Or does the optimizer know to do this automatically?
I tried implementing that line in code, and it seemed to only make my query slower strangely enough. Is there any way I can optimize that query above, or since it's pretty simple, is that the best it's going to get without indexes?
I tried running the Explain Select statement, but I have a hard time parsing what it's telling me. Are there any good resources I can look into to learn some tips or techniques to optimize my query?
I don't have any aggregate functions in my Select fields. So is the only real answer Indexes?
Why is the first subquery needed? Perhaps simply
Select *
FROM FaultCode FC
JOIN Equipment AS E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type
AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057
AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN';
Likely Indexes:
FC: INDEX(code_status, EquipmentID)
E: INDEX(id_organization, equipment_status, EquipmentID)
Probably unwise to do SELECT * -- It will give you all the columns of all 4 tables. (Without further details, I cannot suggest any "covering" indexes, which seems likely for AT.)
With my version of the query, your question about repeating the WHERE vanishes. With your version, it is likely to help. I don't think the Optimizer is smart enough to catch on to what you are doing.
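To illustrate that the flat join and the derived-table version with the filter moved inside select the same rows, here is a small sketch. It runs on SQLite through Python's sqlite3 rather than MySQL, leaves out the AssetType and Project joins, and assumes a hypothetical id column on FaultCode plus made-up rows; the index names are also invented. It says nothing about which form MySQL executes faster.

```python
import sqlite3

# SQLite stands in for MySQL; FaultCode.id, the index names and the rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Equipment (EquipmentID INTEGER PRIMARY KEY, id_organization INTEGER, equipment_status TEXT);
    CREATE TABLE FaultCode (id INTEGER PRIMARY KEY, EquipmentID INTEGER, code_status TEXT);

    -- The composite indexes suggested in the answer.
    CREATE INDEX fc_status_equipment ON FaultCode (code_status, EquipmentID);
    CREATE INDEX e_org_status_equipment ON Equipment (id_organization, equipment_status, EquipmentID);

    INSERT INTO Equipment VALUES (1, 100057, 'ACTIVE'), (2, 100057, 'RETIRED'), (3, 42, 'ACTIVE');
    INSERT INTO FaultCode (EquipmentID, code_status)
    VALUES (1, 'OPEN'), (1, 'CLOSED'), (2, 'OPEN'), (3, 'OPEN');
""")

# Flat join: all filters live in the outer WHERE.
flat_sql = """
    SELECT FC.id
    FROM FaultCode FC
    JOIN Equipment AS E USING (EquipmentID)
    WHERE E.id_organization = 100057
      AND E.equipment_status = 'ACTIVE'
      AND FC.code_status = 'OPEN'
    ORDER BY FC.id
"""

# Derived table: the Equipment filters are applied inside the subquery.
derived_sql = """
    SELECT FC.id
    FROM FaultCode FC
    JOIN (
        SELECT INNER_E.* FROM Equipment INNER_E
        WHERE INNER_E.id_organization = 100057 AND INNER_E.equipment_status = 'ACTIVE'
    ) E USING (EquipmentID)
    WHERE FC.code_status = 'OPEN'
    ORDER BY FC.id
"""

flat_ids = [row[0] for row in conn.execute(flat_sql)]
derived_ids = [row[0] for row in conn.execute(derived_sql)]
print(flat_ids, derived_ids)
```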
Show us the EXPLAINs. We can help some with what the cryptic stuff is saying. (And what it is not saying.)
"the best it's going to get without indexes" -- Are you saying you have no indexes??! Not even a PRIMARY KEY for each table? "So is the only real answer Indexes?" Every time you write a query against a non-tiny table, you should ask "do the table(s) have adequate indexes for this query?"

Optimising a Query with 1 million rows

I've been trying to optimise this query. Originally I was using INNER JOIN for the vip.tvip table, but I noticed that people who didn't exist in that table weren't showing up, and read that I have to use a LEFT JOIN, which has caused further issues.
SELECT sb_admins.srv_group AS role, rankme.lastconnect, rankme.steam, rankme.name, rankme.pfp, vip.tvip.vip_level FROM bans.sb_admins
INNER JOIN rankme ON CONCAT("STEAM_0:", rankme.authid) = sb_admins.authid
LEFT JOIN vip.tvip ON tvip.playerid = rankme.authid
AND gid > 0 ORDER BY rankme.name;
This is the query I'm currently using. It takes around 5 seconds to return the result because the rankme table has 1.3 million rows. I am also attaching the EXPLAIN for this query; I'm not that well versed in MySQL queries, so apologies if I am butchering this.
If someone could give some insight on how to fix this, it would be tremendously helpful. I have created keys for anything I could, such as name being a FULLTEXT key, but to no avail.
Cheers.
Could you try:
SELECT sb_admins.srv_group AS role, rankme.lastconnect, rankme.steam, rankme.name, rankme.pfp, vip.tvip.vip_level FROM bans.sb_admins
INNER JOIN rankme ON rankme.authid = REPLACE(sb_admins.authid,"STEAM_0:","")
LEFT JOIN vip.tvip ON tvip.playerid = rankme.authid
AND gid > 0 ORDER BY rankme.name;
This should be able to use the index on rankme.authid in rankme. (if that exists...)
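The idea is to leave rankme.authid bare on one side of the comparison and apply the string function to the small sb_admins table instead. A minimal sketch, assuming made-up authid values and a hypothetical index name idx_rankme_authid: it runs on SQLite via Python's sqlite3, where the || operator plays the role of MySQL's CONCAT, and it only checks that both join conditions match the same rows.

```python
import sqlite3

# SQLite stands in for MySQL; the authid values and index name are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sb_admins (authid TEXT, srv_group TEXT);
    CREATE TABLE rankme (authid TEXT, name TEXT);
    CREATE INDEX idx_rankme_authid ON rankme (authid);

    INSERT INTO sb_admins VALUES ('STEAM_0:1:111', 'admin'), ('STEAM_0:0:222', 'mod');
    INSERT INTO rankme VALUES ('1:111', 'alice'), ('0:222', 'bob'), ('1:333', 'carol');
""")

# Original condition: the indexed column is wrapped in an expression.
concat_sql = """
    SELECT rankme.name, sb_admins.srv_group
    FROM sb_admins
    INNER JOIN rankme ON 'STEAM_0:' || rankme.authid = sb_admins.authid
    ORDER BY rankme.name
"""

# Suggested condition: the indexed column is compared directly.
replace_sql = """
    SELECT rankme.name, sb_admins.srv_group
    FROM sb_admins
    INNER JOIN rankme ON rankme.authid = REPLACE(sb_admins.authid, 'STEAM_0:', '')
    ORDER BY rankme.name
"""

concat_rows = conn.execute(concat_sql).fetchall()
replace_rows = conn.execute(replace_sql).fetchall()
print(concat_rows, replace_rows)
```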

Convert a MySQL NOT EXISTS into an INNER JOIN

I'm a little new to joins, so I'm not even sure if this is possible. I've been Googling and trying a few things..
What I need:
Select data.id where the corresponding user2data.user_id does not exist where user2data.user_id = 'X'
Exciting right? :D
What works:
SELECT * FROM data WHERE NOT EXISTS (SELECT * FROM user2data WHERE user2data.user_id=1 AND user2data.data_id=data.id) LIMIT 100;
However, it's slow, even though all 3 columns are indexed. I tried an OUTER JOIN for this purpose from another SO answer, but it's EVEN SLOWER than the above. What I need is an INNER JOIN.
Please let me know if this is actually possible, or if there is an alternative that takes advantage of the indexes.
Thanks and best
You could use a LEFT JOIN:
SELECT data.*
FROM data
LEFT JOIN user2data ON ( user2data.user_id=1 AND user2data.data_id=data.id )
WHERE user2data.data_id IS NULL
LIMIT 100;

Optimize "JOIN" query

this is my query from my source code
SELECT `truyen`.*, MAX(chapter.chapter) AS last_chapter
FROM (`truyen`)
LEFT JOIN `chapter` ON `chapter`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE \'%%\'
GROUP BY `truyen`.`Id`
LIMIT 250
When I install it on an iFastnet host, it causes over 500,000 rows to be examined due to the join, and the query gets blocked (it would use over 100% of a CPU, which would ultimately cause server instability).
I also tried adding this line before the query. It fixed the problem above but led to another issue: some functions no longer run correctly.
mysql_query("SET SQL_BIG_SELECTS=1");
How can I fix this problem without buying another hosting plan?
Thanks.
You might be looking for an INNER JOIN. That would remove results that do not match. I find INNER JOINs to be faster than LEFT JOINs.
However, I'm not sure what results you are actually looking for. But because you are using the GROUP BY, it looks like the INNER JOIN might work for you.
One thing I would recommend is to copy the query that it generates and run it in MySQL with DESCRIBE in front of it.
So if the query ended up being:
SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
You would type:
DESCRIBE SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
This will tell you whether you could add an index to your table to make the JOIN faster.
I hope this at least points you in the right direction.
Michael Berkowski seems to agree with the indexing, which you will be able to see from the DESCRIBE.
Please check whether you have indexes on chapter.chapter and chapter.truyen. If not, add them and try again. If this is not successful, try these suggestions:
Are you able to permanently flag the last chapter in a column of your chapter table on insert/update? Then you could use it to reduce the joined rows and drop the GROUP BY. Maybe like this:
SELECT `truyen`.*, `chapter`.`chapter` as `last_chapter`
FROM `truyen`, `chapter`
WHERE `chapter`.`truyen` = `truyen`.`Id`
AND `chapter`.`flag_last_chapter` = 1
AND `truyen`.`title` LIKE '%queryString%'
LIMIT 250
Or create a new table for that instead:
INSERT INTO new_table (truyen, last_chapter)
SELECT truyen, MAX(chapter) FROM chapter GROUP BY truyen;
SELECT `truyen`.*, `new_table`.`last_chapter`
FROM (`truyen`)
LEFT JOIN `new_table` ON `new_table`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE '%queryString%'
GROUP BY `truyen`.`Id`
LIMIT 250
Otherwise you could just fetch the 250 rows of truyen, collect the truyen ids in an array, and build another SQL statement to select the matching rows of the chapter table. I have seen in your original question that you can use PHP for that, so you could merge the results afterwards:
SELECT * FROM truyen
WHERE title LIKE '%queryString%'
LIMIT 250;
SELECT truyen, MAX(chapter) AS last_chapter
FROM chapter
WHERE truyen IN (comma_separated_ids_from_first_select)
GROUP BY truyen;
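The two-step fetch and merge could look roughly like the sketch below. The answer has PHP in mind; this version uses Python with the built-in sqlite3 module as a stand-in for MySQL, and the sample rows and variable names are invented. Note that the second query needs a GROUP BY truyen to return one maximum per story, and that the ids are passed as bound parameters rather than pasted into the SQL string.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE truyen (Id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE chapter (id INTEGER PRIMARY KEY, truyen INTEGER, chapter INTEGER);

    INSERT INTO truyen VALUES (1, 'Alpha story'), (2, 'Beta story'), (3, 'Gamma story'), (4, 'Other');
    -- Story 3 has no chapters yet.
    INSERT INTO chapter (truyen, chapter) VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# Step 1: fetch at most 250 matching stories, no join involved.
search = "%story%"
stories = conn.execute(
    "SELECT Id, title FROM truyen WHERE title LIKE ? ORDER BY Id LIMIT 250",
    (search,),
).fetchall()

# Step 2: fetch the last chapter only for the ids found in step 1.
story_ids = [story_id for story_id, _title in stories]
last_chapter = {}
if story_ids:
    placeholders = ", ".join("?" for _ in story_ids)
    rows = conn.execute(
        f"SELECT truyen, MAX(chapter) FROM chapter "
        f"WHERE truyen IN ({placeholders}) GROUP BY truyen",
        story_ids,
    ).fetchall()
    last_chapter = dict(rows)

# Step 3: merge in application code; stories without chapters get None,
# which mirrors what the LEFT JOIN would have returned.
merged = [(story_id, title, last_chapter.get(story_id)) for story_id, title in stories]
print(merged)
```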

Help me optimize this query

I have this query for an application that I am designing. There is a table of references, an authors table and a reference_authors table. There is a subquery to return all authors for a given reference, which I then display formatted in PHP. The subquery and the main query, run individually, are both nice and speedy. However, as soon as the subquery is put into the main query, the whole thing takes over 120s to run. I would appreciate some fresh eyes on this one.
Thanks.
SELECT
rf.reference_id,
rf.reference_type_id,
rf.article_title,
rf.publication,
rf.annotation,
rf.publication_year,
(SELECT GROUP_CONCAT(a.author_name)
FROM authors_final AS a
INNER JOIN reference_authors AS ra2 ON ra2.author_id = a.author_id
WHERE ra2.reference_id = rf.reference_id
GROUP BY ra2.reference_id) AS authors
FROM
references_final AS rf
INNER JOIN reference_authors AS ra ON rf.reference_id = ra.reference_id
LEFT JOIN reference_institutes AS ri ON rf.reference_id = ri.reference_id;
Here is the fixed query. Thanks guys for the recommendations.
SELECT
rf.reference_id,
rf.reference_type_id,
rf.article_title,
rf.publication,
rf.annotation,
rf.publication_year,
GROUP_CONCAT(a.author_name) AS authors
FROM
references_final as rf
INNER JOIN (reference_authors AS ra INNER JOIN authors_final AS a ON ra.author_id = a.author_id)
ON rf.reference_id = ra.reference_id
LEFT JOIN reference_institutes AS ri ON rf.reference_id = ri.reference_id
GROUP BY rf.reference_id
Although not every subquery can be rewritten as an inner join, I think yours can.
From 120 seconds to 78 milliseconds is not a bad improvement--about three orders of magnitude. Take the rest of the day off.
When you come back tomorrow, start looking for other subqueries in your source code.
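For anyone wanting to check such a rewrite on a small data set first, here is a sketch comparing the correlated subquery with the join plus GROUP BY form. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, trims the tables to a few columns, drops the reference_institutes join, and uses invented rows. The order of names inside GROUP_CONCAT is not guaranteed, so the helper authors_by_reference (a name made up here) sorts them before comparing.

```python
import sqlite3

# SQLite stands in for MySQL; tables are trimmed and rows invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE references_final (reference_id INTEGER PRIMARY KEY, article_title TEXT);
    CREATE TABLE authors_final (author_id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE reference_authors (reference_id INTEGER, author_id INTEGER);

    INSERT INTO references_final VALUES (1, 'First'), (2, 'Second');
    INSERT INTO authors_final VALUES (10, 'Ann'), (11, 'Bob'), (12, 'Cy');
    INSERT INTO reference_authors VALUES (1, 10), (1, 11), (2, 12);
""")

# Correlated subquery: evaluated once per outer row.
correlated_sql = """
    SELECT rf.reference_id,
           (SELECT GROUP_CONCAT(a.author_name)
              FROM authors_final AS a
              INNER JOIN reference_authors AS ra2 ON ra2.author_id = a.author_id
             WHERE ra2.reference_id = rf.reference_id) AS authors
    FROM references_final AS rf
"""

# Join rewrite: one pass over the joined rows, grouped per reference.
joined_sql = """
    SELECT rf.reference_id, GROUP_CONCAT(a.author_name) AS authors
    FROM references_final AS rf
    INNER JOIN reference_authors AS ra ON rf.reference_id = ra.reference_id
    INNER JOIN authors_final AS a ON ra.author_id = a.author_id
    GROUP BY rf.reference_id
"""


def authors_by_reference(sql):
    """Run the query and return {reference_id: sorted list of author names}."""
    return {ref_id: sorted(names.split(",")) for ref_id, names in conn.execute(sql)}


correlated_result = authors_by_reference(correlated_sql)
joined_result = authors_by_reference(joined_sql)
print(correlated_result, joined_result)
```

One difference to keep in mind: with the INNER JOIN form, a reference that has no authors drops out of the result, whereas the correlated subquery would return it with a NULL author list.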
You say the subquery is nice and speedy in isolation, but it's now obviously running for every single row: 100 rows = 100 subqueries.
Assuming you have indexes on all your foreign keys, that's as good as it gets as a subquery.
One option is to left join authors, producing one row per reference/author pair. You'll have a lot more rows returned and will need some code to get to the same end result, but it will put less strain on the db and will run quicker.
If you've got paging on and are returning, say, 10 rows, issuing 10 individual calls to get the authors in isolation would also be pretty quick.
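The "left join and regroup in code" option might be sketched as follows. This is only an illustration: it uses Python with sqlite3 standing in for MySQL, trimmed tables and invented rows, and the dictionary layout of the grouped result is an arbitrary choice.

```python
import sqlite3

# SQLite stands in for MySQL; tables are trimmed and rows invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE references_final (reference_id INTEGER PRIMARY KEY, article_title TEXT);
    CREATE TABLE authors_final (author_id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE reference_authors (reference_id INTEGER, author_id INTEGER);

    INSERT INTO references_final VALUES (1, 'First'), (2, 'Second');
    INSERT INTO authors_final VALUES (10, 'Ann'), (11, 'Bob');
    -- Reference 2 has no authors.
    INSERT INTO reference_authors VALUES (1, 10), (1, 11);
""")

# One flat result set: a row per (reference, author) pair, no subquery.
flat_rows = conn.execute("""
    SELECT rf.reference_id, rf.article_title, a.author_name
    FROM references_final AS rf
    LEFT JOIN reference_authors AS ra ON ra.reference_id = rf.reference_id
    LEFT JOIN authors_final AS a ON a.author_id = ra.author_id
    ORDER BY rf.reference_id, a.author_name
""").fetchall()

# Regroup in application code: collect the author names under each reference.
grouped = {}
for reference_id, title, author_name in flat_rows:
    entry = grouped.setdefault(reference_id, {"title": title, "authors": []})
    if author_name is not None:  # NULL means the reference has no authors
        entry["authors"].append(author_name)

print(grouped)
```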