My query selects the maximum value from a second table (laboratory values) for a subset of cases. It is extremely slow (>25 seconds) because the subquery, which does not see the parent's filter, groups every laboratory.CaseID (15k in a 10 million record table) before only 10 of them are used. The problem is easy to see in an EXPLAIN.
SELECT cases.CaseID,t.val FROM cases
LEFT JOIN
(SELECT Max(laboratory.LaboratoryValue) as val , CaseID FROM laboratory
WHERE laboratory.LaboratoryID =682
GROUP by laboratory.CaseID) as t
ON t.CaseID = cases.CaseID
WHERE cases.Diagnosis = 16;
I tried to optimize the query using a TEMPORARY TABLE, which brings it down to 0.008 s, but this method seems clumsy when it has to be repeated for many fields.
CREATE TEMPORARY TABLE tmptest
SELECT cases.CaseID FROM cases WHERE cases.Diagnosis = 16;
SELECT cases.CaseID,t.val FROM cases
LEFT JOIN
(SELECT Max(laboratory.LaboratoryValue) as val , CaseID FROM laboratory
WHERE laboratory.LaboratoryID =682
AND laboratory.CaseID IN (SELECT CaseID FROM tmptest)
GROUP by laboratory.CaseID) as t
ON t.CaseID = cases.CaseID
WHERE cases.Diagnosis = 16;
Is there a simpler solution for query optimization?
As requested, I ran the EXPLAIN. It shows the problem: CaseID is (as expected) not used.
The second query does it right (NoDoubles contains both indexes).
These indexes are needed:
laboratory: INDEX(LaboratoryID, CaseID, LaboratoryValue)
cases: INDEX(Diagnosis, CaseID)
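To sanity-check that an index shaped like the laboratory one can answer the per-case MAX lookup by itself, something like the following may help. It is only a sketch: it uses Python's sqlite3 as a stand-in for MySQL (so the plan wording and the optimizer are SQLite's, not MySQL's), and the index name ix_lab is made up. On the real server you would run EXPLAIN and look for "Using index" in the Extra column.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only the index shape is taken from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE laboratory (
        CaseID INTEGER, LaboratoryID INTEGER, LaboratoryValue REAL
    );
    -- ix_lab is a hypothetical name for INDEX(LaboratoryID, CaseID, LaboratoryValue)
    CREATE INDEX ix_lab ON laboratory (LaboratoryID, CaseID, LaboratoryValue);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT MAX(LaboratoryValue) FROM laboratory
    WHERE LaboratoryID = 682 AND CaseID = 1
""").fetchall()

# The last column of each plan row is the human-readable step description.
plan_text = " ".join(row[-1] for row in plan)
uses_covering_index = "COVERING INDEX" in plan_text
print(plan_text)
```

If the plan mentions a covering index, the lookup never has to touch the table rows, which is the point of putting LaboratoryValue last in the index.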
Two more formulations to try and time:
SELECT c.CaseID, L.val
FROM
( SELECT CaseID, Max(laboratory.LaboratoryValue) as val
FROM laboratory
WHERE laboratory.LaboratoryID = 682
GROUP by laboratory.CaseID
) AS L
JOIN cases AS c USING(CaseID)
WHERE c.Diagnosis = 16;
SELECT c.CaseID,
( SELECT Max(laboratory.LaboratoryValue)
FROM laboratory
WHERE laboratory.LaboratoryID = 682
AND CaseID = c.CaseID
) AS val
FROM cases AS c
WHERE c.Diagnosis = 16;
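These two are not identical to each other, because of "LEFT": the first uses JOIN, so cases without a matching laboratory row are dropped; the second is equivalent to your LEFT JOIN and returns NULL for them. So, verify that you get the "right" answers.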
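The difference between the two formulations is easy to see on a tiny data set. Here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL (an assumption; the sample rows are invented): case 2 has diagnosis 16 but no laboratory row for LaboratoryID 682.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (CaseID INTEGER PRIMARY KEY, Diagnosis INTEGER);
    CREATE TABLE laboratory (CaseID INTEGER, LaboratoryID INTEGER, LaboratoryValue REAL);
    INSERT INTO cases VALUES (1, 16), (2, 16), (3, 7);
    INSERT INTO laboratory VALUES (1, 682, 4.5), (1, 682, 9.0), (1, 100, 50.0), (3, 682, 2.0);
""")

# Formulation 1: derived table + inner JOIN (drops cases with no lab value).
derived_join = conn.execute("""
    SELECT c.CaseID, L.val
    FROM (SELECT CaseID, MAX(LaboratoryValue) AS val
          FROM laboratory WHERE LaboratoryID = 682 GROUP BY CaseID) AS L
    JOIN cases AS c USING (CaseID)
    WHERE c.Diagnosis = 16
    ORDER BY c.CaseID
""").fetchall()

# Formulation 2: correlated scalar subquery (keeps every case, NULL when no lab value).
correlated = conn.execute("""
    SELECT c.CaseID,
           (SELECT MAX(LaboratoryValue) FROM laboratory
            WHERE LaboratoryID = 682 AND CaseID = c.CaseID) AS val
    FROM cases AS c
    WHERE c.Diagnosis = 16
    ORDER BY c.CaseID
""").fetchall()

print(derived_join)
print(correlated)
```

Only the second result contains case 2 (with a NULL value), which matches what the original LEFT JOIN would return.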
For further discussion, please provide EXPLAIN for whatever formulations you want to discuss. And SHOW CREATE TABLE.
In a current web project of mine, we are implementing a complex search function. As part of that search functionality, we are using the MySQL COUNT function to be able to return the number of matching results.
We are running into a performance hiccup as a result. When getting the actual list of results, MySQL properly uses the indexes we have setup and returns results very quickly. When using the COUNT query, however, the results are sometimes returned very slowly. When examining the execution plans for various search queries, we have discovered that sometimes the COUNT query is doing a full table scan. Other times, despite the query logic being practically identical, the query is using an index. We can't seem to notice any particular pattern that distinguishes the two.
Here is an example of a query that is NOT doing a full table scan:
select COUNT(DISTINCT text.name) AS count
from `text_epigraphy`
inner join `text` ON `text`.`uuid` = `text_epigraphy`.`text_uuid`
inner join `hierarchy` ON `hierarchy`.`uuid` = `text_epigraphy`.`text_uuid`
inner join `text_epigraphy` as `t1` ON `t1`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t1`.`reading_uuid` in ('01f1e805-1278-ec9b-9f69-fced97bc923e',
'07a120bc-02ec-c1ac-e0ba-532de39766ed', '126f978b-bd99-40f0-8f3b-d2bcec1ed3fe',
'44ec304e-71f4-4995-a30d-0ca6d3bec95a', '4a1d8673-9e30-2d1e-7b87-453dec2886db',
'bce40e36-d6eb-c44a-d114-8c7653a0e68c', 'c9083b77-6122-7933-ea21-63d3777749f3' )
and t1.char_on_tablet=text_epigraphy.char_on_tablet + 1
and t1.line=text_epigraphy.line
inner join `text_epigraphy` as `t2` ON `t2`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t2`.`reading_uuid` in ('3fc156dc-e831-493e-5dc1-84a547aeb4fa',
'70f9be19-62b6-3fe8-ddda-32bd50a8d36e' )
and t2.char_on_tablet=text_epigraphy.char_on_tablet + 2
and t2.line=text_epigraphy.line
inner join `text_epigraphy` as `t3` ON `t3`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t3`.`reading_uuid` in ('1ee91402-ebb0-3be9-cc38-9d4187816031',
'25a44259-fe7a-2b73-6e2c-02171c924805', 'a23fd531-c796-353e-4a53-54680248438a',
'd55fa6ad-c523-2e33-6378-b4f2e2a020f1' )
and t3.char_on_tablet=text_epigraphy.char_on_tablet + 3
and t3.line=text_epigraphy.line
where `text_epigraphy`.`reading_uuid` in ('6c0e47d0-00aa-26fb-e184-07038ca64323',
'd8904652-f049-11f9-3f7a-038f1e3b6055', 'eca27c41-d3ca-417c-15e0-db5353ddaefb' )
and 1 = 1
and (1 = 1
or 1 = 0)
limit 1
And yet this query IS doing a full table scan:
select COUNT(DISTINCT text.name) AS count
from `text_epigraphy`
inner join `text` ON `text`.`uuid` = `text_epigraphy`.`text_uuid`
inner join `hierarchy` ON `hierarchy`.`uuid` = `text_epigraphy`.`text_uuid`
inner join `text_epigraphy` as `t1` ON `t1`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t1`.`reading_uuid` in ('3fc156dc-e831-493e-5dc1-84a547aeb4fa')
and t1.char_on_tablet=text_epigraphy.char_on_tablet + 1
and t1.line=text_epigraphy.line
inner join `text_epigraphy` as `t2` ON `t2`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t2`.`reading_uuid` in ('1ee91402-ebb0-3be9-cc38-9d4187816031',
'25a44259-fe7a-2b73-6e2c-02171c924805', 'a23fd531-c796-353e-4a53-54680248438a',
'd55fa6ad-c523-2e33-6378-b4f2e2a020f1' )
and t2.char_on_tablet=text_epigraphy.char_on_tablet + 2
and t2.line=text_epigraphy.line
where `text_epigraphy`.`reading_uuid` in ('c9083b77-6122-7933-ea21-63d3777749f3')
and 1 = 1
and (1 = 1
or 1 = 0)
limit 1
Like I said, we can't quite figure out why some searches do a full table scan when using COUNT, but it results in significantly slower searches. We would appreciate help figuring out what causes the difference and how we might avoid the full table scan, or at least optimize the queries.
Can't you remove hierarchy?
What indexes exist on text_epigraphy? This looks useful:
INDEX(line, char_on_tablet, reading_uuid, text_uuid)
On text: INDEX(uuid, name)
After that, please provide EXPLAIN SELECT; then I will look at your question.
I have the following query which displays a list of accounts with a certain margin level:
SELECT
crm_margincall.id,
crm_margincall.CreationTime,
ba.name AS crm_bankaccount_id,
crm_margincall.name,
crm_margincall.MarginCallLevel,
crm_margincall.UseOfEquityForMargin,
crm_margincall.MarginRequired,
crm_margincall.NetEquityForMargin,
crm_margincall.MarginDeficit,
crm_margincall.balance,
crm_margincall.deposited,
crm_margincall.prefunded,
crm_margincall.required
FROM
crm_margincall
LEFT JOIN
crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
WHERE
crm_margincall.name = 'MarginCall'
AND
crm_margincall.MarginCallLevel >= 100
AND
crm_margincall.crm_account_id NOT IN
(
SELECT
x.crm_account_id
FROM
crm_margincall x
WHERE
x.crm_account_id = crm_margincall.crm_account_id
AND
x.name = 'LevelDrop'
AND
x.MarginCallLevel < 100
AND
x.id > crm_margincall.id
)
ORDER BY
id
DESC
This query, on a table of ~22,500 records, takes >10 seconds to run. This is caused by the subquery in the NOT IN section (I tried NOT EXISTS; it isn't much faster). How can I join this table to itself to achieve the same effect?
This can be done in several ways, but a scan of 22,500 records taking 10 seconds means either a hardware issue or a very inefficient JOIN.
The most likely cause of the latter is a missing index or a misconfigured index, and to investigate this, you need to issue an EXPLAIN:
EXPLAIN SELECT ...
Totally shooting in the dark, judging from the selected columns being used, I'd try with
CREATE INDEX test_index ON crm_margincall(name, crm_account_id, MarginCallLevel, id)
Other improvements might be possible, but you'd need to prepare a sample structure with some fake data in a SQLfiddle to really allow debugging.
Try something like this:
SELECT
crm_margincall.id,
crm_margincall.CreationTime,
ba.name AS crm_bankaccount_id,
crm_margincall.name,
crm_margincall.MarginCallLevel,
crm_margincall.UseOfEquityForMargin,
crm_margincall.MarginRequired,
crm_margincall.NetEquityForMargin,
crm_margincall.MarginDeficit,
crm_margincall.balance,
crm_margincall.deposited,
crm_margincall.prefunded,
crm_margincall.required
FROM
crm_margincall
LEFT JOIN
crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
LEFT JOIN
crm_margincall x ON x.crm_account_id = crm_margincall.crm_account_id
AND
x.name = 'LevelDrop'
AND
x.MarginCallLevel < 100
AND
x.id > crm_margincall.id
WHERE
crm_margincall.name = 'MarginCall'
AND
crm_margincall.MarginCallLevel >= 100
AND
x.id IS NULL -- anti-join: keep only rows with no later LevelDrop (assumes id is never NULL)
ORDER BY
crm_margincall.id
DESC
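One way to express a NOT IN condition like the one in the question as a self-join is a LEFT JOIN anti-join: join to the "later LevelDrop" rows and keep only the rows where nothing matched. The sketch below checks on a few invented rows that both forms return the same ids. It uses Python's sqlite3 as a stand-in for MySQL (an assumption) and only the four columns that matter for the filter.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_margincall (
        id INTEGER PRIMARY KEY, crm_account_id INTEGER, name TEXT, MarginCallLevel INTEGER
    );
    INSERT INTO crm_margincall VALUES
        (1, 10, 'MarginCall', 120),  -- followed by a LevelDrop (id 2): excluded
        (2, 10, 'LevelDrop',   80),
        (3, 10, 'MarginCall', 110),  -- no later LevelDrop: kept
        (4, 20, 'MarginCall', 150),  -- no LevelDrop at all: kept
        (5, 30, 'LevelDrop',   90),
        (6, 30, 'MarginCall', 130),  -- LevelDrop came earlier: kept
        (7, 40, 'MarginCall',  90);  -- level below 100: filtered out
""")

not_in_ids = [r[0] for r in conn.execute("""
    SELECT m.id FROM crm_margincall AS m
    WHERE m.name = 'MarginCall' AND m.MarginCallLevel >= 100
      AND m.crm_account_id NOT IN (
          SELECT x.crm_account_id FROM crm_margincall AS x
          WHERE x.crm_account_id = m.crm_account_id
            AND x.name = 'LevelDrop' AND x.MarginCallLevel < 100 AND x.id > m.id)
    ORDER BY m.id DESC
""")]

anti_join_ids = [r[0] for r in conn.execute("""
    SELECT m.id FROM crm_margincall AS m
    LEFT JOIN crm_margincall AS x
           ON x.crm_account_id = m.crm_account_id
          AND x.name = 'LevelDrop' AND x.MarginCallLevel < 100 AND x.id > m.id
    WHERE m.name = 'MarginCall' AND m.MarginCallLevel >= 100
      AND x.id IS NULL
    ORDER BY m.id DESC
""")]

print(not_in_ids, anti_join_ids)
```

Whether the anti-join is actually faster on the real table still depends on the indexes, so it is worth comparing the EXPLAIN output of both.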
I run the following MySQL statement. It takes over 10 seconds, although it returns just 10 rows. BTW, if I remove the LIMIT 0, 10, it returns 1,000,000 rows. I have created Index1 on column SceneCode and Index2 on column ProviderId.
SELECT *
FROM
(SELECT * FROM gf_sceneprovider WHERE SceneCode='DL00000003' ) AS sprovider
LEFT JOIN (SELECT * FROM gf_sceneprovidertemplate WHERE SceneCode='DL00000003' ) AS stemplate
ON sprovider.ProviderId = stemplate.ProviderId
INNER JOIN gf_provider AS provider
ON provider.ProviderId = sprovider.ProviderId
LIMIT 0, 10
I would do away with the subqueries in favor of direct joins:
SELECT *
FROM gf_sceneprovider sprovider
LEFT JOIN gf_sceneprovidertemplate stemplate
ON sprovider.ProviderId = stemplate.ProviderId
AND stemplate.SceneCode = 'DL00000003' -- kept in ON so providers without a template still appear
INNER JOIN gf_provider AS provider
ON provider.ProviderId = sprovider.ProviderId
WHERE sprovider.SceneCode = 'DL00000003'
LIMIT 0, 10
Then, add indices on the join columns if possible. Your original subqueries might prevent the indices on the gf_sceneprovider and gf_sceneprovidertemplate tables from being used effectively, or at all. The reason is that each subquery essentially creates an on-the-fly table which, unlike the table it selects from, has no indices. I think some RDBMSs can cope with this in certain scenarios, but it looks like that is not the case here.
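One thing to watch when flattening a filtered subquery that sits on the right side of a LEFT JOIN: a condition on that table placed in WHERE throws away the unmatched rows, while the same condition in ON preserves them. A small sketch of the difference, using Python's sqlite3 as a stand-in for MySQL (an assumption) with invented rows and a hypothetical TemplateName column:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; TemplateName and all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gf_sceneprovider (ProviderId INTEGER, SceneCode TEXT);
    CREATE TABLE gf_sceneprovidertemplate (ProviderId INTEGER, SceneCode TEXT, TemplateName TEXT);
    INSERT INTO gf_sceneprovider VALUES (1, 'DL00000003'), (2, 'DL00000003'), (3, 'OTHER');
    INSERT INTO gf_sceneprovidertemplate VALUES (1, 'DL00000003', 't1'), (2, 'OTHER', 't2');
""")

# Filter on the right-hand table inside ON: provider 2 survives with a NULL template.
filter_in_on = conn.execute("""
    SELECT p.ProviderId, t.TemplateName
    FROM gf_sceneprovider AS p
    LEFT JOIN gf_sceneprovidertemplate AS t
           ON t.ProviderId = p.ProviderId AND t.SceneCode = 'DL00000003'
    WHERE p.SceneCode = 'DL00000003'
    ORDER BY p.ProviderId
""").fetchall()

# Same filter moved to WHERE: the NULL row fails the test, so it acts like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT p.ProviderId, t.TemplateName
    FROM gf_sceneprovider AS p
    LEFT JOIN gf_sceneprovidertemplate AS t
           ON t.ProviderId = p.ProviderId
    WHERE p.SceneCode = 'DL00000003' AND t.SceneCode = 'DL00000003'
    ORDER BY p.ProviderId
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

The ON version matches what the original query with the filtered subquery returned; the WHERE version silently loses provider 2.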
This query is currently used in a webshop to retrieve technical data about articles.
It has served its purpose fine, except that the number of products shown has increased lately, resulting in unacceptably long loading times for some categories.
For one of the worst pages, this query (and some others) gets requested about 80 times.
I only recently learned that MySQL does not optimize subqueries without a dependent parameter so that they run only once.
So if someone could help me with one of the queries and explain how to replace the INs and EXISTSs with joins, I will probably be able to change the other ones myself.
select distinct criteria.cri_id, des_texts.tex_text, article_criteria.acr_value, article_criteria.acr_kv_des_id
from article_criteria, designations, des_texts, criteria, articles
where article_criteria.acr_cri_id = criteria.cri_id
and article_criteria.acr_art_id = articles.art_id
and articles.art_deliverystatus = 1
and criteria.cri_des_id = designations.des_id
and designations.des_lng_id = 9
and designations.des_tex_id = des_texts.tex_id
and criteria.cri_id = 328
and article_criteria.acr_art_id IN (Select distinct link_art.la_art_id
from link_art, link_la_typ
where link_art.la_id = link_la_typ.lat_la_id
and link_la_typ.lat_typ_id = 17484
and link_art.la_ga_id IN (Select distinct link_ga_str.lgs_ga_id
from link_ga_str, search_tree
where link_ga_str.lgs_str_id = search_tree.str_id
and search_tree.str_type = 1
and search_tree.str_id = 10132
and EXISTS (Select *
from link_la_typ
where link_la_typ.lat_typ_id = 17484
and link_ga_str.lgs_ga_id = link_la_typ.lat_ga_id)))
order by article_criteria.acr_value
I think this one is the main bad guy, with its sub-sub-sub-queries.
I just noticed I can remove the last EXISTS and still get the same results, but with no increase in speed. That's not part of the question, though ;) I'll figure out myself whether I still need that part.
Any help or pointers are appreciated; if I left out some useful information, tell me as well.
I think this is equivalent:
SELECT DISTINCT c.cri_id, dt.tex_text, ac.acr_value, ac.acr_kv_des_id
FROM article_criteria AS ac
JOIN criteria AS c ON ac.acr_cri_id = c.cri_id
JOIN articles AS a ON ac.acr_art_id = a.art_id
JOIN designations AS d ON c.cri_des_id = d.des_id
JOIN des_texts AS dt ON dt.tex_id = d.des_tex_id
JOIN (SELECT distinct la.la_art_id
FROM link_art AS la
JOIN link_la_typ AS llt ON la.la_id = llt.lat_la_id
JOIN (SELECT DISTINCT lgs.lgs_ga_id
FROM link_ga_str AS lgs
JOIN search_tree AS st ON lgs.lgs_str_id = st.str_id
JOIN link_la_typ AS llt ON lgs.lgs_ga_id = llt.lat_ga_id
WHERE st.str_type = 1
AND st.str_id = 10132
AND llt.lat_typ_id = 17484) AS lgs
ON la.la_ga_id = lgs.lgs_ga_id
WHERE llt.lat_typ_id = 17484) AS la
ON ac.acr_art_id = la.la_art_id
WHERE a.art_deliverystatus = 1
AND d.des_lng_id = 9
AND c.cri_id = 328
ORDER BY ac.acr_value
All the IN <subquery> clauses can be replaced with JOIN <subquery>, where you then JOIN on the column being tested equaling the column returned by the subquery. And the EXISTS test is converted to a join with the table, moving the comparison in the subquery's WHERE clause into the ON clause of the JOIN.
It's probably possible to flatten the whole thing, instead of joining with subqueries. But I suspect performance will be poor, because this won't reduce the temporary tables using DISTINCT. So you'll get combinatorial explosion in the resulting cross product, which will then have to be reduced at the end with the DISTINCT at the top.
I've converted all the implicit joins to ANSI JOIN clauses, to make the structure clearer, and added table aliases to make things more readable.
In general, you can convert a FROM tab1 WHERE ... val IN (SELECT blah) to a join like this.
FROM tab1
JOIN (
SELECT tab1_id
FROM tab2
JOIN tab3 ON whatever = whatever
WHERE whatever
) AS sub1 ON tab1.id = sub1.tab1_id
The JOIN (an inner join) will drop the rows that don't match the ON condition from your query.
If your tab1_id values can come up duplicate from your inner query, use SELECT DISTINCT. But don't use SELECT DISTINCT unless you need to; it is costly to evaluate.
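To make the duplicate issue concrete, here is a minimal sketch on invented rows, using Python's sqlite3 as a stand-in for MySQL (an assumption). tab2 contains the value 1 twice, so the plain JOIN returns tab1 row 1 twice, while IN and JOIN with SELECT DISTINCT agree.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tab1/tab2 rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE tab2 (tab1_id INTEGER);
    INSERT INTO tab1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tab2 VALUES (1), (1), (3);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

with_in = ids("""
    SELECT id FROM tab1
    WHERE id IN (SELECT tab1_id FROM tab2)
    ORDER BY id
""")

join_plain = ids("""
    SELECT tab1.id FROM tab1
    JOIN (SELECT tab1_id FROM tab2) AS sub1 ON tab1.id = sub1.tab1_id
    ORDER BY tab1.id
""")

join_distinct = ids("""
    SELECT tab1.id FROM tab1
    JOIN (SELECT DISTINCT tab1_id FROM tab2) AS sub1 ON tab1.id = sub1.tab1_id
    ORDER BY tab1.id
""")

print(with_in, join_plain, join_distinct)
```

So DISTINCT is only needed when the inner query can really produce repeats; if tab1_id is unique there, the plain JOIN already matches IN.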
I'm in over my head with a big MySQL query (MySQL 5.0), and I'm hoping somebody here can help.
Earlier I asked how to get distinct values from a joined query ("mysql count only for distinct values in joined query").
The response I got worked (joining to an aliased subquery):
select *
from media m
inner join
( select uid
from users_tbl
limit 0,30) map
on map.uid = m.uid
inner join users_tbl u
on u.uid = m.uid
Unfortunately, my query has grown more unruly, and though I have it running, joining to a derived table is taking too long because there are no indexes available on the derived table.
My query now looks like this:
SELECT mdate.bid, mdate.fid, mdate.date, mdate.time, mdate.title, mdate.name,
mdate.address, mdate.rank, mdate.city, mdate.state, mdate.lat, mdate.`long`,
ext.link,
ext.source, ext.pre, meta, mdate.img
FROM ext
RIGHT OUTER JOIN (
SELECT media.bid,
media.date, media.time, media.title, users.name, users.img, users.rank, media.address,
media.city, media.state, media.lat, media.`long`,
GROUP_CONCAT(tags.tagname SEPARATOR ' | ') AS meta
FROM media
JOIN users ON media.bid = users.bid
LEFT JOIN tags ON users.bid=tags.bid
WHERE `long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND date = '2009-02-23'
GROUP BY media.bid, media.date
ORDER BY media.date, users.rank DESC
LIMIT 0, 30
) mdate ON (mdate.bid = ext.bid AND mdate.date = ext.date)
phew!
So, as you can see, if I understand my problem correctly, I have two derived tables without indexes (and I don't deny that I may have screwed up the JOIN statements somehow, but I kept messing with different types, and this ended up giving me the result I wanted).
What's the best way to create a query similar to this which will allow me to take advantage of the indexes?
Dare I say, I actually have one more table to add into the mix at a later date.
Currently, my query takes 0.8 seconds to complete, but I'm sure that if I could take advantage of the indexes, it could be significantly faster.
First, check for indices on ext(bid, date), users(bid) and tags(bid); you should really have them.
It seems, though, that it's `long` and lat that cause you the most problems. You should try storing them together in a single POINT column (call it coordinate), creating a SPATIAL INDEX on that column, and querying like this, where @MySquare holds the bounding box of your search area:
WHERE MBRContains(@MySquare, coordinate)
If you can't change your schema for some reason, you can try creating additional indices that include date as a first field:
CREATE INDEX ix_date_long ON media (date, `long`);
CREATE INDEX ix_date_lat ON media (date, lat);
These indices will be more efficient for your query, as you use an exact match on date combined with a range search on the axes.
Starting fresh:
Question - why are you grouping by both media.bid and media.date? Can a bid have records for more than one date?
Here's a simpler version to try:
SELECT
media.bid,
fid, -- left unqualified: the question does not show which table holds fid
media.date,
media.time,
media.title,
users.name,
media.address,
users.rank,
media.city,
media.state,
media.lat,
media.`long`,
ext.link,
ext.source,
ext.pre,
users.img,
( SELECT GROUP_CONCAT(tags.tagname SEPARATOR ' | ')
FROM tags
WHERE tags.bid = ext.bid
) AS meta
FROM
ext
LEFT JOIN
media ON ext.bid = media.bid AND ext.date = media.date
JOIN
users ON ext.bid = users.bid
WHERE
media.`long` BETWEEN -122.52224684058 AND -121.79760915942
AND media.lat BETWEEN 37.07500915942 AND 37.79964684058
AND ext.date = '2009-02-23'
AND users.userid IN
(
-- MySQL rejects LIMIT directly inside an IN subquery, so wrap it in a derived table
SELECT userid FROM
( SELECT userid FROM users ORDER BY rank DESC LIMIT 30 ) AS top_users
)
ORDER BY
media.date,
users.rank DESC
LIMIT 0, 30
You might want to compare performance against using a temporary table for each selection and joining those tables together:
CREATE TEMPORARY TABLE whatever SELECT ...;
CREATE TEMPORARY TABLE whatever2 SELECT ...;
SELECT ... FROM whatever JOIN whatever2 ON ...;
DROP TEMPORARY TABLE whatever;
DROP TEMPORARY TABLE whatever2;
If your system has enough memory to hold full tables this might work out much faster. It depends on how big your database is.
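A rough sketch of that staging approach, again with Python's sqlite3 standing in for MySQL (an assumption; SQLite spells it CREATE TEMP TABLE ... AS SELECT). The table and column subset, the tmp_ names, the index name and the rows are all invented for illustration. Unlike a derived table, a temporary table can be given its own index before the join.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows and tmp_* names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (bid INTEGER, date TEXT, title TEXT);
    CREATE TABLE ext (bid INTEGER, date TEXT, link TEXT);
    INSERT INTO media VALUES (1, '2009-02-23', 'show A'),
                             (2, '2009-02-23', 'show B'),
                             (3, '2009-02-24', 'show C');
    INSERT INTO ext VALUES (1, '2009-02-23', 'link-a'),
                           (3, '2009-02-24', 'link-c');

    -- Stage each selection once, then index the staged rows for the join.
    CREATE TEMP TABLE tmp_media AS
        SELECT bid, date, title FROM media WHERE date = '2009-02-23';
    CREATE TEMP TABLE tmp_ext AS
        SELECT bid, date, link FROM ext WHERE date = '2009-02-23';
    CREATE INDEX ix_tmp_ext ON tmp_ext (bid, date);
""")

staged_rows = conn.execute("""
    SELECT m.bid, m.title, e.link
    FROM tmp_media AS m
    LEFT JOIN tmp_ext AS e ON e.bid = m.bid AND e.date = m.date
    ORDER BY m.bid
""").fetchall()

# Clean up the staging tables once the result has been read.
conn.executescript("DROP TABLE tmp_media; DROP TABLE tmp_ext;")
print(staged_rows)
```

Whether this beats a single query depends on how many rows get staged, so time both variants on real data.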