I run the following MySQL statement, but it takes over 10 seconds even though it returns just 10 rows. BTW, if I remove the LIMIT 0, 10, it returns 1,000,000 rows. I have created Index1 on column SceneCode and Index2 on column ProviderId.
SELECT *
FROM
(SELECT * FROM gf_sceneprovider WHERE SceneCode='DL00000003' ) AS sprovider
LEFT JOIN (SELECT * FROM gf_sceneprovidertemplate WHERE SceneCode='DL00000003' ) AS stemplate
ON sprovider.ProviderId = stemplate.ProviderId
INNER JOIN gf_provider AS provider
ON provider.ProviderId = sprovider.ProviderId
LIMIT 0, 10
I would do away with the subqueries in favor of direct joins (keeping the stemplate SceneCode filter in the ON clause so the LEFT JOIN keeps its outer semantics):
SELECT *
FROM gf_sceneprovider sprovider
LEFT JOIN gf_sceneprovidertemplate stemplate
ON sprovider.ProviderId = stemplate.ProviderId
AND stemplate.SceneCode = 'DL00000003'
INNER JOIN gf_provider AS provider
ON provider.ProviderId = sprovider.ProviderId
WHERE sprovider.SceneCode = 'DL00000003'
LIMIT 0, 10
Then, add indices on the join columns if possible. Your original subqueries might prevent the indices on the gf_sceneprovider and gf_sceneprovidertemplate tables from being used effectively, or at all. The reason is that each subquery essentially creates an on-the-fly (derived) table which, unlike the tables it selects from, has no indices. Some RDBMSs can cope with this in certain scenarios, but it looks like that is not the case here.
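For example, composite indices that cover both the filter and the join column might look like this (a sketch only; the index names are made up, and you may already have single-column indexes these would replace):

ALTER TABLE gf_sceneprovider ADD INDEX idx_scene_provider (SceneCode, ProviderId);
ALTER TABLE gf_sceneprovidertemplate ADD INDEX idx_scene_provider (SceneCode, ProviderId);

gf_provider should already be covered if ProviderId is its primary key.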
Related
My query selects the maximum value from a second table (laboratory values) for a subset of cases. Obviously this method is extremely slow (>25 seconds), as I am using a subquery that, not knowing the parent's filter, groups all laboratory.CaseID values (15k in a 10-million-record table) before just 10 of them are used. The problem is easy to see in an EXPLAIN.
SELECT cases.CaseID,t.val FROM cases
LEFT JOIN
(SELECT Max(laboratory.LaboratoryValue) as val , CaseID FROM laboratory
WHERE laboratory.LaboratoryID =682
GROUP by laboratory.CaseID) as t
ON t.CaseID = cases.CaseID
WHERE cases.Diagnosis = 16;
I tried to optimize the query using a TEMPORARY TABLE, which speeds it up to 0.008 s, but this method seems a bit clumsy when applied to many fields.
CREATE TEMPORARY TABLE tmptest
SELECT cases.CaseID FROM cases WHERE cases.Diagnosis = 16;
SELECT cases.CaseID,t.val FROM cases
LEFT JOIN
(SELECT Max(laboratory.LaboratoryValue) as val , CaseID FROM laboratory
WHERE laboratory.LaboratoryID =682
AND laboratory.CaseID IN (SELECT CaseID FROM tmptest)
GROUP by laboratory.CaseID) as t
ON t.CaseID = cases.CaseID
WHERE cases.Diagnosis = 16;
Is there a simpler solution for query optimization?
As requested, the EXPLAIN. It shows the problem: CaseID is (as expected) not used.
The second query does it right (NoDoubles contains both indexes).
These indexes are needed:
laboratory: INDEX(LaboratoryID, CaseID, LaboratoryValue)
cases: INDEX(Diagnosis, CaseID)
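In DDL form that would be roughly the following (the index names are arbitrary):

ALTER TABLE laboratory ADD INDEX lab_lid_cid_val (LaboratoryID, CaseID, LaboratoryValue);
ALTER TABLE cases ADD INDEX cases_diag_cid (Diagnosis, CaseID);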
Two more formulations to try and time:
SELECT c.CaseID, L.val
FROM
( SELECT CaseID, Max(laboratory.LaboratoryValue) as val
FROM laboratory
WHERE laboratory.LaboratoryID = 682
GROUP by laboratory.CaseID
) AS L
JOIN cases AS c USING(CaseID)
WHERE c.Diagnosis = 16;
SELECT c.CaseID,
( SELECT Max(laboratory.LaboratoryValue)
FROM laboratory
WHERE laboratory.LaboratoryID = 682
AND CaseID = c.CaseID
) AS val
FROM cases AS c
WHERE c.Diagnosis = 16;
They are not identical because of "LEFT". So, verify that you get the "right" answers. The first uses JOIN; the second is equivalent to your LEFT JOIN.
For further discussion, please provide EXPLAIN for whatever formulations you want to discuss. And SHOW CREATE TABLE.
I have what seems like an easy many-to-many relationship query with pagination. It works fine, but the downside is the time it takes. On the prod server, it's more than 20 seconds. On my development environment, 13 seconds.
Here is the code:
$query = $this->excerpt->orderBy($sort, $order);
$excerpts = $query->with('source.authors')
    ->with('excerptType')
    ->with('tags')
    ->whereHas('tags', function($q) use ($tagId) {
        $q->where('tag_id', $tagId);
    })
    ->paginate($this->paginateCount);
These two queries take the longest
select count(*) as aggregate
from `excerpt`
where (select count(*)
from `tags`
inner join `excerpt_tag`
on `tags`.`id` = `excerpt_tag`.`tag_id`
where `excerpt_tag`.`excerpt_id` = `excerpt`.`id`
and `tag_id` = '655') >= 1
2.02 secs
select *
from `excerpt`
where (select count(*) from `tags`
inner join `excerpt_tag`
on `tags`.`id` = `excerpt_tag`.`tag_id`
where `excerpt_tag`.`excerpt_id` = `excerpt`.`id`
and `tag_id` = '655') >= 1
order by `created_at` desc limit 15 offset 0
2.02 secs
I was thinking of changing this to a simple query with inner joins, like:
select *
from `excerpt`
inner join excerpt_tag on excerpt.id = excerpt_tag.excerpt_id
inner join tags on excerpt_tag.tag_id = tags.id
where tags.id = 655
limit 10 offset 0
But then I lose the advantage of eager loading and so on.
Does anyone have an idea on what the best way to speed this up would be?
Change
( SELECT COUNT(*) ... ) > 0
to
EXISTS ( SELECT 1 ... )
Follow the instructions here for index tips in many:many tables.
If a tag is just a short string, don't bother having a table (tags) for them. Instead, simply have the tag in the excerpt_tag and get rid of tag_id.
A LIMIT without an ORDER BY is somewhat meaningless -- which 10 rows you get will be unpredictable.
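As a sketch, the EXISTS rewrite applied to the slow count query above might look like this (same tables and columns as in the question; verify it returns the same aggregate):

select count(*) as aggregate
from `excerpt`
where exists (select 1
              from `tags`
              inner join `excerpt_tag` on `tags`.`id` = `excerpt_tag`.`tag_id`
              where `excerpt_tag`.`excerpt_id` = `excerpt`.`id`
                and `tag_id` = '655')

A composite index on excerpt_tag (excerpt_id, tag_id), assuming one does not already exist, lets that EXISTS probe stop at the first matching row.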
Well, I have a solution that has led to a significant improvement, adding only a few lines of code and only 1 or maybe 2 extra SQL queries.
I decided to query the tags first to find out which excerpts were connected, and then use a whereIn to query all the information from those excerpts, hoping to still make use of the with function and eager loading, or at least keep the number of queries down to an absolute minimum.
Here is the code with the solution:
// workaround to make excerpt query faster
$excerptsWithTag = $this->tag->with(['excerpts' => function($query) {
    $query->select('excerpt.id');
}])->find($tagId, ['tags.id']);

// actual excerpt query
$excerptIds = array_column($excerptsWithTag->excerpts()->get()->toArray(), 'id');

$query = $this->excerpt->orderBy($sort, $order);
$excerpts = $query->with([
        'source.authors',
        'excerptType',
        'tags'
    ])
    ->whereIn('excerpt.id', $excerptIds)
    ->paginate($this->paginateCount);
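In raw SQL terms, the workaround roughly replaces the correlated count per excerpt row with two simple queries (a sketch only; the SQL Eloquent actually generates will differ in detail):

select `excerpt_id` from `excerpt_tag` where `tag_id` = 655;

select * from `excerpt`
where `id` in (/* ids from the first query */)
order by `created_at` desc
limit 15 offset 0;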
There is very likely a much more eloquent way to solve this problem, but this works and I'm happy.
In a current web project of mine, we are implementing a complex search function. As part of that search functionality, we are using the MySQL COUNT function to be able to return the number of matching results.
We are running into a performance hiccup as a result. When getting the actual list of results, MySQL properly uses the indexes we have set up and returns results very quickly. When using the COUNT query, however, the results are sometimes returned very slowly. When examining the execution plans for various search queries, we have discovered that sometimes the COUNT query does a full table scan. Other times, despite the query logic being practically identical, it uses an index. We can't spot any particular pattern that distinguishes the two.
Here is an example of a query that is NOT doing a full table scan:
select COUNT(DISTINCT text.name) AS count
from `text_epigraphy`
inner join `text` ON `text`.`uuid` = `text_epigraphy`.`text_uuid`
inner join `hierarchy` ON `hierarchy`.`uuid` = `text_epigraphy`.`text_uuid`
inner join `text_epigraphy` as `t1` ON `t1`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t1`.`reading_uuid` in ('01f1e805-1278-ec9b-9f69-fced97bc923e',
'07a120bc-02ec-c1ac-e0ba-532de39766ed', '126f978b-bd99-40f0-8f3b-d2bcec1ed3fe',
'44ec304e-71f4-4995-a30d-0ca6d3bec95a', '4a1d8673-9e30-2d1e-7b87-453dec2886db',
'bce40e36-d6eb-c44a-d114-8c7653a0e68c', 'c9083b77-6122-7933-ea21-63d3777749f3' )
and t1.char_on_tablet=text_epigraphy.char_on_tablet + 1
and t1.line=text_epigraphy.line
inner join `text_epigraphy` as `t2` ON `t2`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t2`.`reading_uuid` in ('3fc156dc-e831-493e-5dc1-84a547aeb4fa',
'70f9be19-62b6-3fe8-ddda-32bd50a8d36e' )
and t2.char_on_tablet=text_epigraphy.char_on_tablet + 2
and t2.line=text_epigraphy.line
inner join `text_epigraphy` as `t3` ON `t3`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t3`.`reading_uuid` in ('1ee91402-ebb0-3be9-cc38-9d4187816031',
'25a44259-fe7a-2b73-6e2c-02171c924805', 'a23fd531-c796-353e-4a53-54680248438a',
'd55fa6ad-c523-2e33-6378-b4f2e2a020f1' )
and t3.char_on_tablet=text_epigraphy.char_on_tablet + 3
and t3.line=text_epigraphy.line
where `text_epigraphy`.`reading_uuid` in ('6c0e47d0-00aa-26fb-e184-07038ca64323',
'd8904652-f049-11f9-3f7a-038f1e3b6055', 'eca27c41-d3ca-417c-15e0-db5353ddaefb' )
and 1 = 1
and (1 = 1
or 1 = 0)
limit 1
And yet this query IS doing a full table scan:
select COUNT(DISTINCT text.name) AS count
from `text_epigraphy`
inner join `text` ON `text`.`uuid` = `text_epigraphy`.`text_uuid`
inner join `hierarchy` ON `hierarchy`.`uuid` = `text_epigraphy`.`text_uuid`
inner join `text_epigraphy` as `t1` ON `t1`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t1`.`reading_uuid` in ('3fc156dc-e831-493e-5dc1-84a547aeb4fa')
and t1.char_on_tablet=text_epigraphy.char_on_tablet + 1
and t1.line=text_epigraphy.line
inner join `text_epigraphy` as `t2` ON `t2`.`text_uuid` = `text_epigraphy`.`text_uuid`
and `t2`.`reading_uuid` in ('1ee91402-ebb0-3be9-cc38-9d4187816031',
'25a44259-fe7a-2b73-6e2c-02171c924805', 'a23fd531-c796-353e-4a53-54680248438a',
'd55fa6ad-c523-2e33-6378-b4f2e2a020f1' )
and t2.char_on_tablet=text_epigraphy.char_on_tablet + 2
and t2.line=text_epigraphy.line
where `text_epigraphy`.`reading_uuid` in ('c9083b77-6122-7933-ea21-63d3777749f3')
and 1 = 1
and (1 = 1
or 1 = 0)
limit 1
Like I said, we can't quite figure out why some searches do a full table scan when using COUNT, but it results in significantly slower searches. If anyone could help us figure out what is causing the difference, and how we might avoid the full table scan or at least optimize the queries, it would be much appreciated.
Can't you remove hierarchy?
What indexes exist on text_epigraphy? This looks useful:
INDEX(line, char_on_tablet, reading_uuid, text_uuid)
On text: INDEX(uuid, name)
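In DDL form, something like the following (index names are arbitrary; skip any that already exist):

ALTER TABLE text_epigraphy ADD INDEX te_line_char_reading_text (line, char_on_tablet, reading_uuid, text_uuid);
ALTER TABLE text ADD INDEX text_uuid_name (uuid, name);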
After that, please provide EXPLAIN SELECT; then I will look at your question.
So, this query is currently used in a webshop to retrieve technical data about articles.
It has served its purpose fine, except that the number of products shown has increased lately, resulting in unacceptably long loading times for some categories.
For one of the worst pages, this query (and some others) gets requested about 80 times.
I only recently learned that MySQL does not optimize sub-queries that have no dependent parameter so that they run only once.
So if someone could help me with one of the queries and explain how to replace the INs and EXISTSes with joins, I will probably be able to change the other ones myself.
select distinct criteria.cri_id, des_texts.tex_text, article_criteria.acr_value, article_criteria.acr_kv_des_id
from article_criteria, designations, des_texts, criteria, articles
where article_criteria.acr_cri_id = criteria.cri_id
and article_criteria.acr_art_id = articles.art_id
and articles.art_deliverystatus = 1
and criteria.cri_des_id = designations.des_id
and designations.des_lng_id = 9
and designations.des_tex_id = des_texts.tex_id
and criteria.cri_id = 328
and article_criteria.acr_art_id IN (Select distinct link_art.la_art_id
from link_art, link_la_typ
where link_art.la_id = link_la_typ.lat_la_id
and link_la_typ.lat_typ_id = 17484
and link_art.la_ga_id IN (Select distinct link_ga_str.lgs_ga_id
from link_ga_str, search_tree
where link_ga_str.lgs_str_id = search_tree.str_id
and search_tree.str_type = 1
and search_tree.str_id = 10132
and EXISTS (Select *
from link_la_typ
where link_la_typ.lat_typ_id = 17484
and link_ga_str.lgs_ga_id = link_la_typ.lat_ga_id)))
order by article_criteria.acr_value
I think this one is the main bad guy, with sub-sub-sub-queries.
I just noticed I can remove the last EXISTS and still get the same results, but with no increase in speed. Not part of the question though ;) I'll figure out myself whether I still need that part.
Any help or pointers are appreciated; if I left out some useful information, tell me as well.
I think this is equivalent:
SELECT DISTINCT c.cri_id, dt.tex_text, ac.acr_value, ac.acr_kv_des_id
FROM article_criteria AS ac
JOIN criteria AS c ON ac.acr_cri_id = c.cri_id
JOIN articles AS a ON ac.acr_art_id = a.art_id
JOIN designations AS d ON c.cri_des_id = d.des_id
JOIN des_texts AS dt ON dt.tex_id = d.des_tex_id
JOIN (SELECT distinct la.la_art_id
FROM link_art AS la
JOIN link_la_typ AS llt ON la.la_id = llt.lat_la_id
JOIN (SELECT DISTINCT lgs.lgs_ga_id
FROM link_ga_str AS lgs
JOIN search_tree AS st ON lgs.lgs_str_id = st.str_id
JOIN link_la_typ AS llt ON lgs.lgs_ga_id = llt.lat_ga_id
WHERE st.str_type = 1
AND st.str_id = 10132
AND llt.lat_typ_id = 17484) AS lgs
ON la.la_ga_id = lgs.lgs_ga_id
WHERE llt.lat_typ_id = 17484) AS la
ON ac.acr_art_id = la.la_art_id
WHERE a.art_deliverystatus = 1
AND d.des_lng_id = 9
AND c.cri_id = 328
ORDER BY ac.acr_value
All the IN <subquery> clauses can be replaced with JOIN <subquery>, where you then JOIN on the column being tested equaling the column returned by the subquery. And the EXISTS test is converted to a join with the table, moving the comparison in the subquery's WHERE clause into the ON clause of the JOIN.
It's probably possible to flatten the whole thing, instead of joining with subqueries. But I suspect performance will be poor, because this won't reduce the temporary tables using DISTINCT. So you'll get combinatorial explosion in the resulting cross product, which will then have to be reduced at the end with the DISTINCT at the top.
I've converted all the implicit joins to ANSI JOIN clauses, to make the structure clearer, and added table aliases to make things more readable.
In general, you can convert a FROM tab1 WHERE ... val IN (SELECT blah) to a join like this.
FROM tab1
JOIN (
SELECT tab1_id
FROM tab2
JOIN tab3 ON whatever = whatever
WHERE whatever
) AS sub1 ON tab1.id = sub1.tab1_id
The JOIN (an inner join) will drop the rows that don't match the ON condition from your query.
If your tab1_id values can come back duplicated from your inner query, use SELECT DISTINCT. But don't use SELECT DISTINCT unless you need to; it is costly to evaluate.
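The EXISTS case follows the same pattern (a generic sketch with placeholder names): the comparison that lived in the subquery's WHERE clause moves into the ON clause of a join with the same table.

FROM tab1
JOIN tab2 ON tab2.some_col = tab1.some_col  -- was: WHERE EXISTS (SELECT * FROM tab2 WHERE tab2.some_col = tab1.some_col)
WHERE whatever

If tab2 can match more than one row per tab1 row, the join produces duplicates, which the outer DISTINCT then has to remove, as discussed above.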
I'm in over my head with a big MySQL query (MySQL 5.0), and I'm hoping somebody here can help.
Earlier I asked how to get distinct values from a joined query
mysql count only for distinct values in joined query
The response I got worked (using a subquery with JOIN ... AS):
select *
from media m
inner join
( select uid
from users_tbl
limit 0,30) map
on map.uid = m.uid
inner join users_tbl u
on u.uid = m.uid
Unfortunately, my query has grown more unruly, and though I have it running, joining onto a derived table is taking too long because there are no indexes available on the derived query.
My query now looks like this:
SELECT mdate.bid, mdate.fid, mdate.date, mdate.time, mdate.title, mdate.name,
mdate.address, mdate.rank, mdate.city, mdate.state, mdate.lat, mdate.`long`,
ext.link,
ext.source, ext.pre, meta, mdate.img
FROM ext
RIGHT OUTER JOIN (
SELECT media.bid,
media.date, media.time, media.title, users.name, users.img, users.rank, media.address,
media.city, media.state, media.lat, media.`long`,
GROUP_CONCAT(tags.tagname SEPARATOR ' | ') AS meta
FROM media
JOIN users ON media.bid = users.bid
LEFT JOIN tags ON users.bid=tags.bid
WHERE `long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND date = '2009-02-23'
GROUP BY media.bid, media.date
ORDER BY media.date, users.rank DESC
LIMIT 0, 30
) mdate ON (mdate.bid = ext.bid AND mdate.date = ext.date)
phew!
So, as you can see, if I understand my problem correctly, I have two derived tables without indexes (and I don't deny that I may have screwed up the JOIN statements somehow, but I kept messing with different types, and this ended up giving me the result I wanted).
What's the best way to create a query similar to this which will allow me to take advantage of the indexes?
Dare I say, I actually have one more table to add into the mix at a later date.
Currently, my query is taking .8 seconds to complete, but I'm sure if I could take advantage of the indexes, this could be significantly faster.
First, check for indices on ext(bid, date), users(bid) and tags(bid); you should really have them.
It seems, though, that it's LONG and LAT that cause you the most problems. You should try keeping your LONG and LAT together in a single POINT column (say, coordinate), create a SPATIAL INDEX on that column, and query like this:
WHERE MBRContains(@MySquare, coordinate)
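A sketch of that change, assuming a MyISAM table (MySQL 5.0 only supports SPATIAL indexes on MyISAM); the column and index names here are chosen for illustration:

ALTER TABLE media ADD COLUMN coordinate POINT;
UPDATE media SET coordinate = GeomFromText(CONCAT('POINT(', `long`, ' ', lat, ')'));
ALTER TABLE media MODIFY coordinate POINT NOT NULL;
ALTER TABLE media ADD SPATIAL INDEX sp_coordinate (coordinate);

SET @MySquare = GeomFromText('POLYGON((
    -122.52224684058 37.07500915942,
    -121.79760915942 37.07500915942,
    -121.79760915942 37.79964684058,
    -122.52224684058 37.79964684058,
    -122.52224684058 37.07500915942))');

SELECT ... FROM media WHERE MBRContains(@MySquare, coordinate) AND date = '2009-02-23';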
If you can't change your schema for some reason, you can try creating additional indices that include date as the first field:
CREATE INDEX ix_date_long ON media (date, `long`)
CREATE INDEX ix_date_lat ON media (date, lat)
These indices will be more efficient for your query, as you use an exact search on date combined with a range search on the axes.
Starting fresh:
Question - why are you grouping by both media.bid and media.date? Can a bid have records for more than one date?
Here's a simpler version to try:
SELECT
mdate.bid,
mdate.fid,
mdate.date,
mdate.time,
mdate.title,
mdate.name,
mdate.address,
mdate.rank,
mdate.city,
mdate.state,
mdate.lat,
mdate.`long`,
ext.link,
ext.source,
ext.pre,
mdate.img,
( SELECT GROUP_CONCAT(tags.tagname SEPARATOR ' | ')
FROM tags
WHERE ext.bid = tags.bid
GROUP BY tags.bid
) AS meta
FROM
ext
LEFT JOIN
media ON ext.bid = media.bid AND ext.date = media.date
JOIN
users ON ext.bid = users.bid
WHERE
`long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND ext.date = '2009-02-23'
AND users.userid IN
(
SELECT userid FROM users ORDER BY rank DESC LIMIT 30
)
ORDER BY
media.date,
users.rank DESC
LIMIT 0, 30
You might want to compare your performance against using a temporary table for each selection, and joining those tables together.
create temporary table whatever ( ... )
create temporary table whatever2 ( ... )
insert into whatever select ...
insert into whatever2 select ...
select ... from whatever join whatever2 on ...
....
drop temporary table whatever
drop temporary table whatever2
If your system has enough memory to hold full tables this might work out much faster. It depends on how big your database is.