Okay, so as the title says, the queries were running fine 2 days ago, then all of a sudden yesterday the site was loading very slowly. I tracked it down to a couple of queries. For one of them I was able to add an index, which seems to have helped, but this one I just can't figure out. I tried running a repair and optimize on the tables, and that didn't help. I don't know what could have changed so much that a query would go from less than a second to 20+ seconds. Any help would be much appreciated.
SELECT city
FROM listings LEFT JOIN agencies
ON listings.agencyid_fk = agencies.agencyid
WHERE listingstatus IN (1,3) AND appid_fk = 101 AND active = 1
AND auction IS NULL AND agencies.multilist = 1
AND notagency IS NULL
GROUP BY city
ORDER BY city;
I wasn't sure how to export the EXPLAIN result to make it readable on here, so here it is as plain text:
+----+-------------+----------+--------+------------------------+----------+---------+---------------------------+-------+----------------------------------------------+
| id | select_type | table    | type   | possible_keys          | key      | key_len | ref                       | rows  | Extra                                        |
+----+-------------+----------+--------+------------------------+----------+---------+---------------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | listings | ref    | appid_fk,listingstatus | appid_fk | 2       | const                     | 21699 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | agencies | eq_ref | PRIMARY,active         | PRIMARY  | 2       | mls2.listings.agencyid_fk |     1 | Using where                                  |
+----+-------------+----------+--------+------------------------+----------+---------+---------------------------+-------+----------------------------------------------+
And here are the tables...
listings table:
CREATE TABLE mls2.listings (
listingid INT(11) AUTO_INCREMENT NOT NULL,
appid_fk SMALLINT(3) NOT NULL DEFAULT '0',
agencyid_fk SMALLINT(3) NOT NULL DEFAULT '0',
listingstatus SMALLINT(3),
city VARCHAR(30) CHARACTER SET latin1 COLLATE latin1_swedish_ci,
multilist TINYINT(1),
auction TINYINT(1),
PRIMARY KEY (listingid)
) ENGINE = myisam ROW_FORMAT = DEFAULT CHARACTER SET latin1;
agencies table:
CREATE TABLE mls2.agencies (
agencyid SMALLINT(6) AUTO_INCREMENT NOT NULL,
multilist TINYINT(4) DEFAULT '0',
notagency TINYINT(1),
active TINYINT(1),
PRIMARY KEY (agencyid)
) ENGINE = myisam ROW_FORMAT = DEFAULT CHARACTER SET latin1;
Once you've taken on board the comments above, try adding the following indexes to your tables...
INDEX(city,listingstatus,appid_fk,auction)
INDEX(active,multilist,notagency)
In both cases, the order in which columns are arranged in the index may make a difference, so play around with those, although there are so few rows in the agencies table, that that one won't really matter.
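As a sketch, here are those indexes as ALTER TABLE statements. The placement is inferred from which table owns the columns (the first index belongs on listings, the second on agencies), and the index names are illustrative:

ALTER TABLE listings
  ADD INDEX idx_listings_filter (city, listingstatus, appid_fk, auction);

ALTER TABLE agencies
  ADD INDEX idx_agencies_filter (active, multilist, notagency);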
Next, get rid of the GROUP BY clause, and write your query as follows.
SELECT DISTINCT l.city
FROM listings l
JOIN agencies a
ON a.agencyid = l.agencyid_fk
WHERE l.listingstatus IN (1,3)
AND l.appid_fk = 101
AND a.active = 1
AND l.auction IS NULL
AND a.multilist = 1
AND a.notagency IS NULL
ORDER BY city;
Note: Although irrelevant for this particular problem, your original question showed that this schema is desperately in need of normalisation.
Related
I have a query whose purpose is to generate statistics for how many musical works (tracks) have been downloaded from a site over different periods (by month, by quarter, by year, etc.). The query operates on the tables entityusage, entityusage_file and track.
To get the number of downloads for tracks belonging to a specific album, I would do the following query:
select
date_format(eu.updated, '%Y-%m-%d') as p, count(eu.id) as c
from entityusage as eu
inner join entityusage_file as euf
ON euf.entityusage_id = eu.id
inner join track as t
ON t.id = euf.track_id
where
t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7'
and entitytype = 't'
and action = 1
group by date_format(eu.updated, '%Y%m%d')
I need to set entitytype = 't' as entityusage can hold downloads of other entities as well (if entitytype = 'a' then an entire album was downloaded, and entityusage_file would then hold all the tracks which the album "translated" into at the point of download).
This query takes 40 - 50 seconds. I've been trying to optimize it for a while, but I have the feeling that I'm approaching this the wrong way.
This is one of 4 similar queries which must run to generate a report. The report should preferably be able to finish while a user waits for it. Right now, I'm looking at 3 - 4 minutes. That's a long time to wait.
Can this query be optimised further with indexes, or do I need to take another approach to get this job done?
CREATE TABLE `entityusage` (
`id` char(36) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`entitytype` varchar(5) NOT NULL,
`entityid` char(36) NOT NULL,
`externaluser` int(10) NOT NULL,
`action` tinyint(1) NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `e` (`entityid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `entityusage_file` (
`id` char(36) NOT NULL,
`entityusage_id` char(36) NOT NULL,
`track_id` char(36) NOT NULL,
`file_id` char(36) NOT NULL,
`type` varchar(3) NOT NULL,
`quality` int(1) NOT NULL,
`size` int(20) NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `file_id` (`file_id`),
KEY `entityusage_id` (`entityusage_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `track` (
`id` char(36) NOT NULL,
`album_id` char(36) NOT NULL,
`number` int(3) NOT NULL DEFAULT '0',
`title` varchar(255) DEFAULT NULL,
`updated` datetime NOT NULL DEFAULT '2000-01-01 00:00:00',
PRIMARY KEY (`id`),
KEY `album` (`album_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC;
An EXPLAIN on the query gives me the following :
+------+-------------+-------+--------+----------------+----------------+---------+------------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------+----------------+---------+------------------------------+---------+----------------------------------------------+
| 1 | SIMPLE | eu | ALL | NULL | NULL | NULL | NULL | 7832817 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | euf | ref | entityusage_id | entityusage_id | 108 | func | 1 | Using index condition |
| 1 | SIMPLE | t | eq_ref | PRIMARY,album | PRIMARY | 108 | trackerdatabase.euf.track_id | 1 | Using where |
+------+-------------+-------+--------+----------------+----------------+---------+------------------------------+---------+----------------------------------------------+
This is your query:
select date_format(eu.updated, '%Y-%m-%d') as p, count(eu.id) as c
from entityusage eu join
entityusage_file euf
on euf.entityusage_id = eu.id join
track t
on t.id = euf.track_id
where t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7' and
eu.entitytype = 't' and
eu.action = 1
group by date_format(eu.updated, '%Y%m%d');
I would suggest indexes on track(album_id, id), entityusage_file(track_id, entityusage_id), and entityusage(id, entitytype, action).
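As a sketch, those suggestions as DDL (the index names are illustrative):

ALTER TABLE track ADD INDEX idx_track_album_id (album_id, id);
ALTER TABLE entityusage_file ADD INDEX idx_euf_track_eu (track_id, entityusage_id);
ALTER TABLE entityusage ADD INDEX idx_eu_id_type_action (id, entitytype, action);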
Assuming that entityusage_file is mostly a many:many mapping table, see this for tips on improving it. Note that it calls for getting rid of the id and making a pair of 2-column indexes, one of which is the PRIMARY KEY(track_id, entityusage_id). Since your table has a few extra columns, that link does not cover everything.
The UUIDs could be shrunk from 108 bytes to 36, then to 16 by going to BINARY(16) and using a compression function. Many exist (including a builtin pair in version 8.0); here's mine.
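For illustration, the 8.0 builtin pair mentioned above is UUID_TO_BIN/BIN_TO_UUID (earlier versions would need UNHEX(REPLACE(...)) or a custom function):

SELECT UUID_TO_BIN('0054a47e-b594-407b-86df-3be078b4e7b7') AS packed;   -- 16 bytes
SELECT BIN_TO_UUID(UUID_TO_BIN('0054a47e-b594-407b-86df-3be078b4e7b7')) AS unpacked;  -- round-trips to text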
To explain one thing... The query execution should have started with track (on the assumption that '0054a47e-b594-407b-86df-3be078b4e7b7' is very selective). The hangup was that there was no index to get from there to the next table; Gordon's suggested indexes include such an index.
date_format(eu.updated, '%Y-%m-%d') and date_format(eu.updated, '%Y%m%d') can be simplified to DATE(eu.updated). (No significant performance change.)
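With that simplification, the query would read:

SELECT DATE(eu.updated) AS p, COUNT(eu.id) AS c
FROM entityusage eu
JOIN entityusage_file euf ON euf.entityusage_id = eu.id
JOIN track t ON t.id = euf.track_id
WHERE t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7'
  AND eu.entitytype = 't'
  AND eu.action = 1
GROUP BY DATE(eu.updated);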
(The other Answers and Comments cover a number of issues; I won't repeat them here.)
Because the GROUP BY operation is on an expression involving a function, MySQL can't use an index to optimize that operation. It's going to require a "Using filesort" operation.
I believe the indexes that Gordon suggested are the best bets, given the current table definitions. But even with those indexes, the "tall post" is the eu table, chunking through and sorting all those rows.
To get more reasonable performance, you may need to introduce a "precomputed results" table. It's going to be expensive to generate the counts for everything... but we can pay that price ahead of time...
CREATE TABLE usage_track_by_day
( updated_dt DATE NOT NULL
, PRIMARY KEY (track_id, updated_dt)
)
AS
SELECT euf.track_id             -- track_id lives on entityusage_file, so join that table in
     , DATE(eu.updated) AS updated_dt
     , SUM(IF(eu.action = 1, 1, 0)) AS cnt
  FROM entityusage eu
  JOIN entityusage_file euf
    ON euf.entityusage_id = eu.id
 WHERE eu.updated IS NOT NULL
 GROUP BY euf.track_id
        , DATE(eu.updated);
An index ON entityusage (updated, action), along with the existing entityusage_id key on entityusage_file, may benefit the performance of this aggregation.
Then, we could write a query against the new "precomputed results" table, with a better shot at reasonable performance.
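For example, a sketch of the per-album report rewritten against it (assuming the precomputed table is kept up to date):

SELECT u.updated_dt AS p, SUM(u.cnt) AS c
  FROM usage_track_by_day u
  JOIN track t ON t.id = u.track_id
 WHERE t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7'
 GROUP BY u.updated_dt;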
The "precomputed results" table would get stale, and would need to be periodically refreshed.
This isn't necessarily the best solution to the issue, but it's a technique we can use in data warehouse/data mart applications. It lets us churn through lots of detail rows to get counts one time, and then save those counts for fast access.
Can you try this? I can't really test it without some sample data from you.
In this case, the query looks in the track table first and then joins the other tables.
SELECT
date_format(eu.updated, '%Y-%m-%d') AS p
, count(eu.id) AS c
FROM track AS t
INNER JOIN entityusage_file AS euf ON t.id = euf.track_id
INNER JOIN entityusage AS eu ON euf.entityusage_id = eu.id
WHERE
t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7'
AND eu.entitytype = 't'
AND eu.action = 1
GROUP BY date_format(eu.updated, '%Y%m%d');
I have a MySQL DB of ~1 million entries.
I run the query:
SELECT a.id as aid, a.title as atitle, a.slug, summary,
a.link as alink, author, published, image, a.cat as acat,
a.rss as arss, a.site as asite
FROM articles a
ORDER BY published DESC
LIMIT 616150, 50;
It takes about 5 minutes or more to load.
My TABLE and INDEXes:
CREATE TABLE IF NOT EXISTS `articles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`slug` varchar(255) NOT NULL,
`summary` text NOT NULL,
`link` text NOT NULL,
`author` varchar(255) NOT NULL,
`published` datetime NOT NULL,
`image` text NOT NULL,
`cat` int(11) NOT NULL,
`rss` int(11) NOT NULL,
`site` int(11) NOT NULL,
`bitly` varchar(255) NOT NULL,
`checked` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `title` (`title`),
KEY `cat` (`cat`),
KEY `published` (`published`),
KEY `site` (`site`),
KEY `rss` (`rss`),
KEY `checked` (`checked`),
KEY `id_publ_index` (`id`,`published`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1230234;
What explain says:
mysql> EXPLAIN EXTENDED SELECT a.id as aid, a.title as atitle, a.slug, summary, a.link as alink, author, published, image, a.cat as acat, a.rss as arss, a.site as asite FROM articles a ORDER BY published DESC LIMIT 616150, 50;
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+----------+-------+
| 1 | SIMPLE | a | index | NULL | published | 8 | NULL | 616200 | 152.94 | |
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+----------+-------+
1 row in set, 1 warning (0.46 sec)
Any tips on how to optimize this query? Why does MySQL need to read all 616,200 rows and not just the 50 that were asked for?
Thank you for your time.
The reason that you are seeing the published key being used is that it is what you are ordering by. How often does this query need to run?
There is one simple thing that you can do to help this query run much, much faster:
Make better use of your published key. Use WHERE to define what range of dates you want to retrieve from your table.
The reason why you are reading 616,200 rows of your table right now is that you are not using the index to limit the range. MySQL needs to use your full index to:
1. Read the first 616,200 rows in DESC order, and then
2. Limit the result to 50 rows.
If possible, you should filter the results of your database in a different way. Changing your results to be based on the WHERE (making more efficient use of your index) will be the quickest way.
For example:
SELECT a.id as aid, a.title as atitle, a.slug, summary,
a.link as alink, author, published, image, a.cat as acat,
a.rss as arss, a.site as asite
FROM articles a
WHERE published > '2010-01-01'
ORDER BY published DESC
LIMIT 6150, 50;
The sad part is that ORDER BY and LIMIT do not scale too well, and you will lose your speed very quickly (e.g., change your limit to 0, 50, and then to 900000, 50 and see how your speed is affected), so adding more information to your WHERE clause will help your query run much faster.
EDIT:
There is no way I can know what to display by date, so putting a WHERE on it is not possible. In addition, this query is run on a news aggregator that collects news every ... second. The limit is made so I can create paginated results.
Because you are inserting new posts, your LIMIT statement is going to cause the news items to jump as a user pages through them anyway. For instance, if I am on page one and three items get added before I press 'Next', then by the time I click 'Next', I will see the last three items from the previous page.
For the best possible user experience, you should try adding the ID of the last seen news item or the date of the last seen news item to the pagination somehow. This could be done either by sessions or part of the query URL, but it will allow you to have better use of your indexes.
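A sketch of that keyset-style approach, assuming the application passes back the published value of the last item the user saw (the date literal here is hypothetical):

SELECT a.id as aid, a.title as atitle, a.slug, summary,
       a.link as alink, author, published, image, a.cat as acat,
       a.rss as arss, a.site as asite
FROM articles a
WHERE published < '2010-06-15 12:00:00'  -- published value of the last item on the previous page
ORDER BY published DESC
LIMIT 50;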
I understand why the limit is there - it's just: how can you fix the issue of the query being slow after a certain number of pages have been hit?
To efficiently fix your speed issues, you will need to make better use of an index rather than relying on LIMIT as your sole method of pagination. LIMIT is amazing, yes, but it is not optimized for retrieving records the way you are trying to do it, because you need to sort by the date.
Even though, as you say, 'there is no way I can know what to display by date' (at least currently...), there must be a way for your application to limit what needs to be fetched from your db, in the same way Facebook does not need to go through every individual post by every member of the website just to make results show on your wall. You need to find out how it can be made more efficient.
I am trying to optimise my site and would be grateful for some help.
The site is a mobile phone comparison site and the query is to show the offers for a particular phone. The data is in 2 tables: the cmp_deals table has 931,000 entries and the cmp_tariffs table has 2,600 entries. The common field is tariff_id.
###The deals table###
id int(11)
hs_id varchar(30)
tariff_id varchar(30)
hs_price decimal(4,2)
months_free int(2)
months_half int(2)
cash_back decimal(4,2)
free_gift text
retailer_id varchar(20)
###Deals Indexes###
hs_id INDEX (cardinality 430)
months_half INDEX (cardinality 33)
months_free INDEX (cardinality 25)
hs_price INDEX (cardinality 2223)
cash_back INDEX
###The tariff table###
ID int(11)
tariff_id varchar(30)
tariff_name varchar(255)
tariff_desc text
anytime_mins int(5)
offpeak_mins int(5)
texts int(5)
line_rental decimal(4,2)
cost_offset decimal(4,2)
No Indexes
The initial query is
SELECT * FROM `cmp_deals`
LEFT JOIN `cmp_tariffs` USING (tariff_id)
WHERE hs_id = 'iphone432gbwhite'
and then the results can be sorted by various items in the cmp_deals table such as cash_back or free_gift
SELECT * FROM `cmp_deals`
LEFT JOIN `cmp_tariffs` USING (tariff_id)
WHERE hs_id = 'iphone432gbwhite'
ORDER BY hs_price DESC
LIMIT 30
This is the result when I run an "EXPLAIN" command, but I don't really understand the results.
+----+-------------+-------------+------+---------------+-------+---------+-------+------+----------------------------------------------+
| id | select_type | table       | type | possible_keys | key   | key_len | ref   | rows | Extra                                        |
+----+-------------+-------------+------+---------------+-------+---------+-------+------+----------------------------------------------+
|  1 | SIMPLE      | cmp_deals   | ref  | hs_id         | hs_id | 92      | const |  179 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | cmp_tariffs | ALL  | NULL          | NULL  | NULL    | NULL  | 2582 |                                              |
+----+-------------+-------------+------+---------------+-------+---------+-------+------+----------------------------------------------+
I would like to know, please, if I am doing these queries in the most efficient way, as the queries are averaging 2+ seconds.
Thanks in advance
Can't say I'm a fan of all those double IDs (numeric and human-readable). If you don't actually need the varchar versions, drop them.
Change the foreign key cmp_deals.tariff_id to reference cmp_tariffs.ID (i.e., make it an INT and, if using InnoDB, actually create foreign key constraints).
At the very least make cmp_tariffs.ID a primary key and optionally cmp_tariffs.tariff_id a unique index.
Having zero indexes on the tariffs table means it has to do a table scan to complete the join.
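A minimal sketch of those changes (the index name is illustrative, and converting cmp_deals.tariff_id to a numeric key would require backfilling the values first):

ALTER TABLE cmp_tariffs ADD PRIMARY KEY (ID);
ALTER TABLE cmp_tariffs ADD UNIQUE INDEX idx_tariff_id (tariff_id);  -- covers the current varchar join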
I am having some issues with a group query with MySQL.
Question
Is there a reason why a query won't use a 10 character partial index on a varchar(255) field to optimize a group by?
Details
My setup:
CREATE TABLE `sessions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`ref_source` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`guid` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`initial_path` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`referrer_host` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`campaign` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_sessions_on_user_id` (`user_id`),
KEY `index_sessions_on_referrer_host` (`referrer_host`(10)),
KEY `index_sessions_on_initial_path` (`initial_path`(10)),
KEY `index_sessions_on_campaign` (`campaign`(10))
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
A number of columns and indexes are not shown here since they don't really impact the issue.
What I want to do is run a query to see all of the referring hosts and the number of sessions coming from each. I don't have a huge table, but it is big enough that full table scans aren't fun. The query I want to run is:
SELECT COUNT(*) AS count_all, referrer_host AS referrer_host FROM `sessions` GROUP BY referrer_host;
The explain gives:
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | sessions | ALL | NULL | NULL | NULL | NULL | 303049 | Using temporary; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
I have a partial index on referrer_host, but it isn't using it. Even if I try to USE INDEX or FORCE INDEX it doesn't help. The explain is the same, as is the performance.
If I add a full index on referrer_host, instead of a 10 character partial index, everything works better, if not instantly. (350ms vs. 10 seconds)
I have tested partial indexes that are bigger than the longest entry in the field to no avail as well. The full index is the only thing that seems to work.
With the full index, the query can scan the entire index and return the number of records pointed to for each unique key; the table isn't touched.
With the partial index, the engine doesn't know the value of referrer_host until it looks at the record. It has to scan the whole table!
If most of the values for referrer_host are less than 10 chars, then in theory the optimiser could use the index and then only check rows that have more than 10 chars. But because this is not a clustered index, it would have to make many non-sequential disk reads to find those records. It could end up being even slower, because a table scan will at least be a sequential read. Instead of making assumptions, the optimiser just does a scan.
Try this query:
EXPLAIN SELECT COUNT(referrer_host) AS count_all, referrer_host FROM `sessions` GROUP BY referrer_host;
Now the count will come out as zero for the referrer_host = NULL group, since COUNT(referrer_host) skips NULLs, but I'm uncertain if there's another way around this.
You're grouping on referrer_host for all the rows in the table. As your index doesn't include referrer_host (it contains the first 10 chars!), it's going to scan the whole table.
I'll bet that this is faster, though less detailed:
SELECT COUNT(*) AS count_all, substring(referrer_host,1,10) AS referrer_host FROM `sessions` GROUP BY referrer_host;
If you need the full referrer, index it.
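A sketch of that fix, replacing the prefix index with a full-width one (the question above reports roughly 350ms vs. 10 seconds):

ALTER TABLE sessions
  DROP INDEX index_sessions_on_referrer_host,
  ADD INDEX index_sessions_on_referrer_host (referrer_host);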
I have a web application that uses a table schema similar to the one below. Simply put, I want to optimize the selection of articles. Articles are selected based on the given tag. For example, if the tag is 'iphone', the query should output all open articles about 'iphone' from the last month.
CREATE TABLE `article` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(100) NOT NULL,
`body` varchar(200) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
`author_id` int(11) NOT NULL,
`section` varchar(30) NOT NULL,
`status` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
CREATE TABLE `tags` (
`name` varchar(30) NOT NULL,
`article_id` int(11) NOT NULL,
PRIMARY KEY (`name`,`article_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `users` (
`id` int(11) NOT NULL auto_increment,
`username` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3 ;
The following is my MySQL query
explain select article.id,users.username,article.title
from article,users,tags
where article.id=tags.article_id and tags.name = 'iphone4'
and article.author_id=users.id and article.status = '1'
and article.section = 'mobile'
and article.date > '2010-02-07 13:25:46'
ORDER BY tags.article_id DESC
The output is:
+----+-------------+---------+--------+---------------+---------+---------+------------------------+------+--------------------------+
| id | select_type | table   | type   | possible_keys | key     | key_len | ref                    | rows | Extra                    |
+----+-------------+---------+--------+---------------+---------+---------+------------------------+------+--------------------------+
|  1 | SIMPLE      | tags    | ref    | PRIMARY       | PRIMARY | 92      | const                  |   55 | Using where; Using index |
|  1 | SIMPLE      | article | eq_ref | PRIMARY       | PRIMARY | 4       | test.tags.article_id   |    1 | Using where              |
|  1 | SIMPLE      | users   | eq_ref | PRIMARY       | PRIMARY | 4       | test.article.author_id |    1 |                          |
+----+-------------+---------+--------+---------------+---------+---------+------------------------+------+--------------------------+
Is it possible to optimize it more?
This query may be optimized, depending on which condition is more selective: tags.name = 'iphone4' or article.date > '2010-02-07 13:25:46'
If there are fewer articles tagged iphone than articles posted after Feb 7, then your original query is nice.
If there are many articles tagged iphone, but few those posted after Feb 7, then this query will be more efficient:
SELECT article.id, users.username, article.title
FROM tags
JOIN article
ON article.id = tags.article_id
AND article.status = '1'
AND article.section = 'mobile'
AND article.date > '2010-02-07 13:25:46'
JOIN users
ON users.id = article.author_id
WHERE tags.name = 'iphone4'
ORDER BY
    article.date DESC, article.id DESC
Note that the ORDER BY condition has changed. This may or may not be what you want; however, generally the orders of id and date correspond to each other.
If you really need your original ORDER BY condition you may leave it but it will add a filesort (or just revert to your original plan).
In either case, create an index on
article (status, section, date, id)
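As a sketch (the index name is illustrative):

ALTER TABLE article ADD INDEX idx_article_status_section_date_id (status, section, date, id);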
the query should output all open articles about 'iphone' from the last month.
So the only query you are going to run on this data uses the tag and the date. You've got a index for the tag in the tags table, but the date is stored in a different table (article - you're a bit inconsistent with your naming schema). Adding an index on the article table using date would be no benefit at all. Using id,date (in that order) would help a little - but really the date needs to be denormalised into the tags table to get the query running really fast.
Unless you're regularly moving around bulk data sets - just add a datetime column with a default of the current timestamp to the tags table.
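A sketch of that denormalisation (the column default and index choice are my assumptions):

ALTER TABLE tags ADD COLUMN `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP;
ALTER TABLE tags ADD INDEX idx_tags_name_date (name, `date`);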
I expect that you may want to interact with the data in lots of other ways - really you should set a low (no?) threshold for slow query logging, then analyse the resulting data to identify where your performance problems are (try looking at the queries with the highest values for duration^2 * frequency first).
There's a script at the URL below which is useful for this analysis:
http://www.retards.org/projects/mysql/
You could index the additional fields in article that you are referencing in your select statement. In this case, I would suggest you create an index in article like this:
CREATE INDEX article_idx ON article (author_id, status, section, date);
Creating that index should speed up your query, depending on how many overall records you are dealing with. From my understanding, properly creating indexes involves looking at the queries you've written and indexing the columns that are part of your WHERE clause. This helps the query optimizer better process the query in general. That does not mean you should create an index on each individual column, however, as that is both inefficient and ineffective. When possible, create multiple-column indexes that represent your select statement.