I encountered a very puzzling optimization case. I'm no SQL expert, but this case still seems to defy my understanding of clustered-key principles.
I have the following table schema:
CREATE TABLE `orders` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`chargeQuote` tinyint(1) NOT NULL,
`features` int(11) NOT NULL,
`sequenceIndex` int(11) NOT NULL,
`createdAt` bigint(20) NOT NULL,
`previousSeqId` bigint(20) NOT NULL,
`refOrderId` bigint(20) NOT NULL,
`refSeqId` bigint(20) NOT NULL,
`seqId` bigint(20) NOT NULL,
`updatedAt` bigint(20) NOT NULL,
`userId` bigint(20) NOT NULL,
`version` bigint(20) NOT NULL,
`amount` decimal(36,18) NOT NULL,
`fee` decimal(36,18) NOT NULL,
`filledAmount` decimal(36,18) NOT NULL,
`makerFeeRate` decimal(36,18) NOT NULL,
`price` decimal(36,18) NOT NULL,
`takerFeeRate` decimal(36,18) NOT NULL,
`triggerOn` decimal(36,18) NOT NULL,
`source` varchar(32) NOT NULL,
`status` varchar(50) NOT NULL,
`symbol` varchar(32) NOT NULL,
`type` varchar(50) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_STATUS` (`status`) USING BTREE,
KEY `IDX_USERID_SYMBOL_STATUS_TYPE` (`userId`,`symbol`,`status`,`type`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=7937243 DEFAULT CHARSET=utf8mb4;
This is a big table: 100 million rows. It's already sharded by createdAt, so 100 million rows is one month's worth of orders.
I have the below slow query. The query is pretty straightforward:
select id,chargeQuote,features,sequenceIndex,createdAt,previousSeqId,refOrderId,refSeqId,seqId,updatedAt,userId,version,amount,fee,filledAmount,makerFeeRate,price,takerFeeRate,triggerOn,source,`status`,symbol,type
from orders where 1=1
and userId=100000
and createdAt >= '1567775174000' and createdAt <= '1567947974000'
and symbol in ( 'BTC_USDT' )
and status in ( 'FULLY_FILLED' , 'PARTIAL_CANCELLED' , 'FULLY_CANCELLED' )
and type in ( 'BUY_LIMIT' , 'BUY_MARKET' , 'SELL_LIMIT' , 'SELL_MARKET' )
order by id desc limit 0,20;
This query takes 24 seconds. The number of rows that satisfy userId=100000 is very small, around 100. And the number of rows that satisfy the entire WHERE clause is 0.
But when I did a small tweak, that is, I changed the order by clause:
order by id desc limit 0,20; -- before
order by createdAt desc, id desc limit 0,20; -- after
It became very fast, 0.03 seconds.
I can see it made a big difference in the MySQL engine, because EXPLAIN shows that before the change it was using key: PRIMARY, while after the change it finally uses key: IDX_USERID_SYMBOL_STATUS_TYPE, as expected, which I guess is why it's so fast. Here's the explain plan:
-- before (ORDER BY id DESC):
select_type: SIMPLE, table: orders, type: index
possible_keys: IDX_STATUS,IDX_USERID_SYMBOL_STATUS_TYPE
key: PRIMARY, key_len: 8, rows: 20360, filtered: 0.02
Extra: Using where

-- after (ORDER BY createdAt DESC, id DESC):
select_type: SIMPLE, table: orders, type: range
possible_keys: IDX_STATUS,IDX_USERID_SYMBOL_STATUS_TYPE
key: IDX_USERID_SYMBOL_STATUS_TYPE, key_len: 542, rows: 26220, filtered: 11.11
Extra: Using index condition; Using where; Using filesort
So what gives? Actually, I was very surprised that the result was not naturally sorted by id (which is the PRIMARY KEY). Isn't that the clustered key in MySQL? And why did it choose not to use the index when sorting by id?
I'm very puzzled because the more demanding query (sorting by two columns) is super fast, while the more lenient one is slow.
And no, I already tried ANALYZE TABLE orders; it made no difference.
MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:
1. Read all qualifying rows, sort them, and pick the top n rows.
2. Read the rows in sorted order and stop when n qualifying rows have been found.
In order to decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straightforward, especially for columns that are not indexed, or for columns where values are correlated. In your case, the MySQL optimizer evidently thinks that the second strategy is the best. In other words, it does not see that the WHERE clause will not be satisfied by any rows; rather, it thinks that 2% of the rows will satisfy the WHERE clause, and that it will be able to find 20 rows by scanning only part of the table backwards in PRIMARY key order.
How the filtering effect of a WHERE clause is estimated varies quite a bit between 5.6, 5.7, and 8.0. If you are using MySQL 8.0, you can try to create histograms for the columns involved to see if that can improve the estimation. If not, I think your only option is to use a FORCE INDEX hint to make the optimizer choose the desired index.
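For illustration, here is roughly what those two options could look like (a sketch only; the histogram column list and bucket count are my guesses, and the histogram syntax requires MySQL 8.0):

-- Collect histograms on the non-indexed, correlated columns:
ANALYZE TABLE orders UPDATE HISTOGRAM ON createdAt, symbol, type WITH 32 BUCKETS;

-- Or pin the index choice explicitly:
SELECT *
FROM orders FORCE INDEX (IDX_USERID_SYMBOL_STATUS_TYPE)
WHERE userId = 100000
  AND createdAt >= 1567775174000 AND createdAt <= 1567947974000
  AND symbol IN ('BTC_USDT')
  AND status IN ('FULLY_FILLED', 'PARTIAL_CANCELLED', 'FULLY_CANCELLED')
  AND type IN ('BUY_LIMIT', 'BUY_MARKET', 'SELL_LIMIT', 'SELL_MARKET')
ORDER BY id DESC LIMIT 0, 20;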
For your fast query, the second strategy is not an option since there is no index on createdAt that can be used to avoid sorting.
Update:
Reading Rick's answer, I realized that an index on only userId should speed up your ORDER BY id query. In such an index, the entries for a given userId will be sorted on primary key. Hence, using this index will both make it possible to only access the rows of the requested userId, and access the rows in the requested sort order (by id).
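A minimal sketch of that idea (the index name is mine):

ALTER TABLE orders ADD INDEX IDX_USERID (userId);

InnoDB appends the primary key to every secondary index entry, so within a given userId the entries of this index are already ordered by id.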
The main filters work well with the cardinality estimator. When ORDER BY is combined with LIMIT, that effectively acts as another filter, since the data needs to be filtered further. This can push the cardinality estimator toward an inaccurate estimate, which eventually results in a poor plan being selected. To prove this, run the 24-second query without the LIMIT clause; it should respond about as quickly as your tweaked version.
To solve this, if performance is already very good with just the main filters, run that selection first, and then filter a second time over a result set that is significantly smaller than the whole table. Use something like:
SELECT * FROM (SELECT ... /* main select statement */) AS t
ORDER BY x LIMIT y;
...or...
INSERT INTO temp SELECT ... /* main select statement */;
SELECT * FROM temp ORDER BY x LIMIT y;
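Applied to the query from the question, the first variant might look like this (a sketch; note that MySQL may merge a simple derived table back into the outer query, in which case the temporary-table variant is the surer way to force the two-step execution):

SELECT *
FROM (
    SELECT *
    FROM orders
    WHERE userId = 100000
      AND createdAt >= 1567775174000 AND createdAt <= 1567947974000
      AND symbol IN ('BTC_USDT')
      AND status IN ('FULLY_FILLED', 'PARTIAL_CANCELLED', 'FULLY_CANCELLED')
      AND type IN ('BUY_LIMIT', 'BUY_MARKET', 'SELL_LIMIT', 'SELL_MARKET')
) AS t
ORDER BY id DESC
LIMIT 0, 20;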
Given
and userId=100000
and createdAt >= '1567775174000' and createdAt <= '1567947974000'
and ... -- I am not making use of the other items
order by createdAt DESC, id desc -- I am assuming this change
limit 0,20;
I would try
INDEX(userId, createdAt, id) -- in this order
1. userId, tested by =, goes first; it narrows down the part of the index to look at.
2. Leave out the columns tested by IN. If there are multiple values in an IN, we can't make use of step 4.
3. createdAt filters further by range.
4. createdAt and id are compared in the same direction (DESC). (Yes, I know 8.0 has an improvement, but I don't think you wanted (ASC, DESC).)
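In DDL form, that would be something like (my statement; name the index as you like):

ALTER TABLE orders ADD INDEX IDX_USERID_CREATEDAT_ID (userId, createdAt, id);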
What is the proper indexing for this query?
I tried different combinations of indexes for this query, but it still shows Using temporary, Using filesort, etc.
Total table data - 760,346 rows
product = 'Dresses' - Total rows = 122,554
CREATE TABLE IF NOT EXISTS `product_data` (
`table_id` int(11) NOT NULL AUTO_INCREMENT,
`id` int(11) NOT NULL,
`price` int(11) NOT NULL,
`store` varchar(255) NOT NULL,
`brand` varchar(255) DEFAULT NULL,
`product` varchar(255) NOT NULL,
`model` varchar(255) NOT NULL,
`size` varchar(50) NOT NULL,
`discount` varchar(255) NOT NULL,
`gender_id` int(11) NOT NULL,
`availability` int(11) NOT NULL,
PRIMARY KEY (`table_id`),
UNIQUE KEY `table_id` (`table_id`),
KEY `id` (`id`),
KEY `discount` (`discount`),
KEY `step_one` (`product`,`availability`),
KEY `step_two` (`product`,`availability`,`brand`,`store`),
KEY `step_three` (`product`,`availability`,`brand`,`store`,`id`),
KEY `step_four` (`brand`,`store`),
KEY `step_five` (`brand`,`store`,`id`)
) ENGINE=InnoDB ;
Query :
SELECT id ,store,brand FROM `product_data` WHERE product='dresses' and
availability='1' group by brand,store order by store limit 10;
Execution time: (10 total, Query took 1.0941 sec)
EXPLAIN PLAN :
possible_keys :- step_one, step_two, step_three, step_four, step_five
key :- step_two
ref :- const,const
rows :- 229438
Extra :-Using where; Using temporary; Using filesort
I tried these indexes
Key step_one (product,availability)
Key step_two (product,availability,brand,store)
Key step_three (product,availability,brand,store,id)
Key step_four (brand,store)
Key step_five (brand,store,id)
The real problem is not the index, but the mismatch between GROUP BY and ORDER BY, which prevents MySQL from taking advantage of the LIMIT.
This
INDEX(product, availability, store, brand, id)
will be "covering" and in the right order. But note that I have swapped store and brand...
Change the query to
SELECT id ,store,brand
FROM `product_data`
WHERE product='dresses'
and availability='1'
GROUP BY store, brand -- change
ORDER BY store, brand -- change
limit 10;
That changes the GROUP BY to start with store, to reflect the ORDER BY ordering -- this avoids an extra sort. And it changes the ORDER BY to be identical to the GROUP BY so that the two can be combined.
Given those changes, the INDEX can now go all the way through to the LIMIT, thereby allowing the processing to look at only 10 rows, not a much larger set.
Anything less than all these changes will not be as efficient.
Further discussion:
INDEX(product, availability, -- these two can be in either order
store, brand, -- must match both `GROUP BY` and `ORDER BY`
id) -- tacked on (on the end) to make it "covering"
"Covering" means that all the columns for the SELECT are found in the INDEX, so no need to reach over into the data.
But... The whole query does not make sense because of the inclusion of id in the SELECT. If you want to find what stores have available dresses, then get rid of id. If you want to list all the available dresses, then change id to GROUP_CONCAT(id).
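For instance, the GROUP_CONCAT variant could look like this (a sketch, keeping the rewritten GROUP BY/ORDER BY from above):

SELECT store, brand, GROUP_CONCAT(id) AS ids
FROM product_data
WHERE product = 'dresses'
  AND availability = 1
GROUP BY store, brand
ORDER BY store, brand
LIMIT 10;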
Of your indexes, the best one is step_two. The product field is used in the WHERE clause and has more variation than the availability field.
Couple of notes about the query:
availability='1' should be availability=1, so that a needless string-to-int conversion is avoided.
"group by brand" should not be used, as GROUP BY should only be used when you use aggregate functions as selected columns. What was it that you were trying to achieve with the group by?
Your group by clause doesn't really make sense without an aggregate function.
If you can re-write the query to
SELECT id ,store
FROM `product_data`
WHERE product='dresses'
and availability='1'
order by store limit 10;
Then an index on (product,availability,store) will remove all filesorts.
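In DDL form, that could be (the index name is mine):

ALTER TABLE product_data ADD INDEX product_availability_store (product, availability, store);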
See SQLFiddle: http://sqlfiddle.com/#!9/60f33d/2
UPDATE:
The SQLFiddle makes your intention clear - you're using GROUP BY to simulate DISTINCT. I don't think you can get rid of the filesort and temporary table steps in your query if this is the case - but I also don't think those steps should be hugely expensive.
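If id isn't actually needed, the same intent can be stated directly with DISTINCT (a sketch; like the GROUP BY form, it still needs a temporary table and a sort, but it makes the purpose explicit):

SELECT DISTINCT store, brand
FROM product_data
WHERE product = 'dresses'
  AND availability = 1
ORDER BY store
LIMIT 10;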
I have a site where there is an activity feed, similar to how social sites like Facebook have one. It is a "newest first" list that describes actions taken by users. In production, there's about 200k entries in that table.
Since this is going to be asked anyway, I'll first share the full table structure:
CREATE TABLE `karmalog` (
`id` int(11) NOT NULL auto_increment,
`guid` char(36) default NULL,
`user_id` int(11) default NULL,
`user_name` varchar(45) default NULL,
`user_avat_url` varchar(255) default NULL,
`user_sec_id` int(11) default NULL,
`user_sec_name` varchar(45) default NULL,
`user_sec_avat_url` varchar(255) default NULL,
`event` enum('EDIT_PROFILE','EDIT_AVATAR','EDIT_EMAIL','EDIT_PASSWORD','FAV_IMG_ADD','FAV_IMG_ADDED','FAV_IMG_REMOVE','FAV_IMG_REMOVED','FOLLOW','FOLLOWED','UNFOLLOW','UNFOLLOWED','COM_POSTED','COM_POST','COM_VOTE','COM_VOTED','IMG_VOTED','IMG_UPLOAD','LIST_CREATE','LIST_DELETE','LIST_ADMINDELETE','LIST_VOTE','LIST_VOTED','IMG_UPD','IMG_RESTORE','IMG_UPD_LIC','IMG_UPD_MOD','IMG_GEO','IMG_UPD_MODERATED','IMG_VOTE','IMG_VOTED','TAG_FAV_ADD','CLASS_DOWN','CLASS_UP','IMG_DELETE','IMG_ADMINDELETE','IMG_ADMINDELETEFAV','SET_PASSWORD','IMG_RESTORED','IMG_VIEW','FORUM_CREATE','FORUM_DELETE','FORUM_ADMINDELETE','FORUM_REPLY','FORUM_DELETEREPLY','FORUM_ADMINDELETEREPLY','FORUM_SUBSCRIBE','FORUM_UNSUBSCRIBE','TAG_INFO_EDITED','IMG_ADDSPECIE','IMG_REMOVESPECIE','SPECIE_ADDVIDEO','SPECIE_REMOVEVIDEO','EARN_MEDAL','JOIN') NOT NULL,
`event_type` enum('follow','tag','image','class','list','forum','specie','medal','user') NOT NULL,
`active` bit(1) NOT NULL,
`delete` bit(1) NOT NULL default '\0',
`object_id` int(11) default NULL,
`object_cache` text,
`object_sec_id` int(11) default NULL,
`object_sec_cache` text,
`karma_delta` int(11) NOT NULL,
`gold_delta` int(11) NOT NULL,
`newkarma` int(11) NOT NULL,
`newgold` int(11) NOT NULL,
`migrated` int(11) NOT NULL default '0',
`date_created` timestamp NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `user_sec_id` (`user_sec_id`),
KEY `image_id` (`object_id`),
KEY `date_event` (`date_created`,`event`),
KEY `event` (`event`),
KEY `date_created` (`date_created`),
CONSTRAINT `karmalog_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`) ON DELETE SET NULL,
CONSTRAINT `karmalog_ibfk_2` FOREIGN KEY (`user_sec_id`) REFERENCES `user` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Before optimizing this table, my query had 5 joins and I ran into slow query times. I have denormalized all of that data, so that not a single join is left. So the table and query are flat.
As you can see in the table design, there's an "event" field which is an enum, holding a few dozen possible values. Throughout the site, I show activity feeds based on specific event types. Typically that query looks like this:
SELECT * FROM karmalog as k
WHERE k.event IN ($events) AND k.delete=0
ORDER BY k.date_created DESC, k.id DESC
LIMIT 0,30
What this query does is to find the latest 30 entries in the total set that match any of the events passed in $events, which can be multiple.
Due to removing the joins and having indices on most fields, I was expecting this to perform very well, but it doesn't. On 200k entries, it still takes over 3 seconds and I don't understand why.
Regarding solutions, I know I could archive older entries or partition the table per event type, but that will have quite a code impact, and I first would like to understand why the above is so slow.
As a temporary work-around, I'm now doing this:
SELECT * FROM
(SELECT * FROM karmalog ORDER BY date_created DESC, id DESC LIMIT 0,1000) as karma
WHERE karma.event IN ($events) AND karma.delete=0
LIMIT $page,$pagesize
What this does is to limit the baseset to search in to the latest 1000 entries only, hoping and guessing that there's 30 entries to be found for the filters that I pass in. It's not very robust though. It will not work for more rare events, and it brings pagination issues.
Therefore, I'd first like to get to the root cause of why my initial query is slow, against my expectation.
Edit: I was asked to share the execution plan. Here's the test query:
EXPLAIN SELECT * FROM karmalog
WHERE event IN ('FAV_IMG_ADD','FOLLOW','COM_POST','IMG_VOTE','LIST_VOTE','JOIN','CLASS_UP','LIST_CREATE','FORUM_REPLY','FORUM_CREATE','FORUM_SUBSCRIBE','IMG_GEO','IMG_ADDSPECIE','SPECIE_ADDVIDEO','EARN_MEDAL') AND karmalog.delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,36
Execution plan:
id = 1
select_type = SIMPLE
table = karmalog
type = range
possible_keys = event
key = event
key_len = 1
ref = NULL
rows = 80519
Extra = Using where; Using filesort
I'm not sure how to read the above, but I do know that the sort clause really seems to kill this query. With this sorting it takes 4.3 secs; without it, 0.03 secs.
SELECT * sometimes slows down ordered queries by a huge amount, so let's start by refactoring your query as follows:
SELECT k.*
FROM karmalog AS k
JOIN (
SELECT id
FROM karmalog
WHERE event IN ($events)
AND delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,30
) AS m ON k.id = m.id
ORDER BY k.date_created DESC, k.id DESC
This will do your ORDER BY ... LIMIT operation without having to haul the whole table around in the sorting phase. Finally it will look up the appropriate thirty rows from the original table and sort just those again. This might save a whole lot of I/O and in-memory data shuffling.
Second, if id column values are assigned in ascending order as records are inserted, then the use of date_created in your ORDER BY operation is redundant. But MySQL doesn't know that, so leaving it out might help. This will be true if you always use the current date when inserting, and never update the dates.
Third, you might be able to use a compound covering index for the selection (inner) query. This is an index that contains all the fields you need. When you use a covering index, the whole query can be satisfied from the index, and there's no need to bounce back to the original table. This saves disk access time.
Try this compound covering index: (delete, event, id). If you decide you can't get rid of the use of date_created in your ordering, try this instead: (delete, event, date_created, id).
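In DDL form that might be (a sketch; `delete` is a reserved word in MySQL, so it needs backticks):

ALTER TABLE karmalog ADD INDEX del_event_date_id (`delete`, `event`, date_created, id);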
Add a compound index over the two relevant columns. In your table, you can do that by specifying e.g.
KEY `date_created` (`date_created`, `event`)
This key can still be used to satisfy plain old date_created range searches. But in addition, the event data is included as well, so the database will be able to find the relevant rows by looking only at the index.
If you want, you can try the other order as well: first event and then date. This might allow some optimization if there are many event types but your filter only contains few. On the other hand, I'm not sure the system will be able to make use of the LIMIT clause in this case, so I'm not certain that this other order will be any help at all.
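That alternative order would look something like this (the index name is mine):

ALTER TABLE karmalog ADD INDEX event_date (`event`, date_created);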
Edit: I completely missed that your date_event index already has this info. According to your execution plan, though, that one isn't used. Looks like the optimizer is getting things wrong. You could try removing the event index, and perhaps the date index as well, and see what happens then.
I have a simple MySQL query, but when I have a lot of records (currently about 103,000), the performance is really slow and EXPLAIN says it is using filesort; I'm not sure if this is why it is slow. Does anyone have any suggestions to speed it up, or to stop it using filesort?
MySQL query:
SELECT *
FROM adverts
WHERE (price >= 0)
AND (status = 1)
AND (approved = 1)
ORDER BY date_updated DESC
LIMIT 19990, 10
The Explain results :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE adverts range price price 4 NULL 103854 Using where; Using filesort
Here is the adverts table and indexes:
CREATE TABLE `adverts` (
`advert_id` int(10) NOT NULL AUTO_INCREMENT,
`user_id` int(10) NOT NULL,
`type_id` tinyint(1) NOT NULL,
`breed_id` int(10) NOT NULL,
`advert_type` tinyint(1) NOT NULL,
`headline` varchar(50) NOT NULL,
`description` text NOT NULL,
`price` int(4) NOT NULL,
`postcode` varchar(7) NOT NULL,
`town` varchar(60) NOT NULL,
`county` varchar(60) NOT NULL,
`latitude` float NOT NULL,
`longitude` float NOT NULL,
`telephone1` varchar(15) NOT NULL,
`telephone2` varchar(15) NOT NULL,
`email` varchar(80) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`approved` tinyint(1) NOT NULL DEFAULT '0',
`date_created` datetime NOT NULL,
`date_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`expiry_date` datetime NOT NULL,
PRIMARY KEY (`advert_id`),
KEY `price` (`price`),
KEY `user` (`user_id`),
KEY `type_breed` (`type_id`,`breed_id`),
KEY `headline_keywords` (`headline`),
KEY `date_updated` (`date_updated`),
KEY `type_status_approved` (`advert_type`,`status`,`approved`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
The problem is that MySQL only uses one index when executing the query. If you add a new index that uses the 3 fields in your WHERE clause, it will find the rows faster.
ALTER TABLE `adverts` ADD INDEX price_status_approved(`price`, `status`, `approved`);
According to the MySQL documentation ORDER BY Optimization:
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
The key used to fetch the rows is not the same as the one used in the ORDER BY.
This is what happens in your case.
As the output of EXPLAIN tells us, the optimizer uses the key price to find the rows. However, the ORDER BY is on the field date_updated which does not belong to the key price.
To find the rows faster AND sort the rows faster, you need to add an index that contains all the fields used in the WHERE and in the ORDER BY clauses:
ALTER TABLE `adverts` ADD INDEX status_approved_date_updated(`status`, `approved`, `date_updated`);
The field used for sorting must be in the last position in the index. It is useless to include price in the index, because the condition used in the query will return a range of values.
If EXPLAIN still shows that it is using filesort, you may try forcing MySQL to use an index you choose:
SELECT adverts.*
FROM adverts
FORCE INDEX(status_approved_date_updated)
WHERE price >= 0
AND adverts.status = 1
AND adverts.approved = 1
ORDER BY date_updated DESC
LIMIT 19990, 10
It is usually not necessary to force an index, because the MySQL optimizer most often makes the correct choice. But sometimes it makes a bad choice, or not the best one. You will need to run some tests to see whether it improves performance or not.
Remove the ticks around the '0' - they may currently prevent use of the index, but I am not sure.
Nevertheless, it is better style, since price is an int column and not a character column.
SELECT adverts.*
FROM adverts
WHERE price >= 0
  AND adverts.status = 1
  AND adverts.approved = 1
ORDER BY date_updated DESC
LIMIT 19990, 10
MySQL does not make use of the key date_updated for the sorting; it just uses the price key, as that is used in the WHERE clause. You could try to use index hints:
http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Add something like
USE KEY FOR ORDER BY (date_updated)
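In context, that hint would look something like this (a sketch; date_updated inside the parentheses is the index name, which here happens to match the column name):

SELECT *
FROM adverts USE INDEX FOR ORDER BY (date_updated)
WHERE price >= 0
  AND status = 1
  AND approved = 1
ORDER BY date_updated DESC
LIMIT 19990, 10;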
I have two suggestions. First, remove the quotes around the zero in your where clause. That line should be:
price >= 0
Second, create this index (note it ends with date_updated, the column you sort by):
CREATE INDEX `helper` ON `adverts`(`status`,`approved`,`price`,`date_updated`);
This should allow MySQL to find the 10 rows specified by your LIMIT clause by using only the index. Filesort itself is not a bad thing... the number of rows that need to be processed is.
Your WHERE condition uses price, status, approved to select, and then date_updated is used to sort.
So you need a single index with those fields; I'd suggest indexing on approved, status, price and date_updated, in this order.
The general rule is placing WHERE equalities first, then ranges (more than, less or equal, between, etc), and sorting fields last. (Note that leaving one field out might make the index less usable, or even unusable, for this purpose).
CREATE INDEX advert_ndx ON adverts (approved, status, price, date_updated);
This way, access to the table data is only needed after LIMIT has worked its magic, and you will need to retrieve only a small number of records.
I'd also remove any unneeded indexes, which would speed up INSERTs and UPDATEs.
I have a web application that uses a table schema similar to the one below. Simply put, I want to optimize the selection of articles. Articles are selected based on the given tag. For example, if the tag is 'iphone', the query should output all open articles about 'iphone' from the last month.
CREATE TABLE `article` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(100) NOT NULL,
`body` varchar(200) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
`author_id` int(11) NOT NULL,
`section` varchar(30) NOT NULL,
`status` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
CREATE TABLE `tags` (
`name` varchar(30) NOT NULL,
`article_id` int(11) NOT NULL,
PRIMARY KEY (`name`,`article_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `users` (
`id` int(11) NOT NULL auto_increment,
`username` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3 ;
The following is my MySQL query:
explain select article.id,users.username,article.title
from article,users,tags
where article.id=tags.article_id and tags.name = 'iphone4'
and article.author_id=users.id and article.status = '1'
and article.section = 'mobile'
and article.date > '2010-02-07 13:25:46'
ORDER BY tags.article_id DESC
The output is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tags ref PRIMARY PRIMARY 92 const 55 Using where; Using index
1 SIMPLE article eq_ref PRIMARY PRIMARY 4 test.tags.article_id 1 Using where
1 SIMPLE users eq_ref PRIMARY PRIMARY 4 test.article.author_id 1
Is it possible to optimize it more?
This query may be optimized, depending on which condition is more selective: tags.name = 'iphone4' or article.date > '2010-02-07 13:25:46'.
If there are fewer articles tagged iphone than articles posted after Feb 7, then your original query is fine.
If there are many articles tagged iphone, but few posted after Feb 7, then this query will be more efficient:
SELECT article.id, users.username, article.title
FROM tags
JOIN article
ON article.id = tags.article_id
AND article.status = '1'
AND article.section = 'mobile'
AND article.date > '2010-02-07 13:25:46'
JOIN users
ON users.id = article.author_id
WHERE tags.name = 'iphone4'
ORDER BY
article.date DESC, tags.article_id DESC
Note that the ORDER BY condition has changed. This may or may not be what you want, however, generally the orders of id and date correspond to each other.
If you really need your original ORDER BY condition you may leave it but it will add a filesort (or just revert to your original plan).
In either case, create an index on
article (status, section, date, id)
the query should output all open articles about 'iphone' from the last month.
So the only query you are going to run on this data uses the tag and the date. You've got an index for the tag in the tags table, but the date is stored in a different table (article - you're a bit inconsistent with your naming schema). Adding an index on the article table using date would be of no benefit at all. Using id,date (in that order) would help a little - but really the date needs to be denormalised into the tags table to get the query running really fast.
Unless you're regularly moving around bulk data sets - just add a datetime column with a default of the current timestamp to the tags table.
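A sketch of that denormalisation (the column and index names are mine):

ALTER TABLE tags
    ADD COLUMN date_created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ADD INDEX name_date (name, date_created);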
I expect that you may be wanting to interact with the data in lots of other ways - really you should set a low (or no) threshold for slow query logging, then analyse the resulting data to identify where your performance problems are (try looking at the queries with the highest values for duration^2 * frequency first).
There's a script at the URL below which is useful for this analysis:
http://www.retards.org/projects/mysql/
You could index the additional fields in article that you are referencing in your select statement. In this case, I would suggest you create an index in article like this:
CREATE INDEX article_idx ON article (author_id, status, section, date);
Creating that index should speed up your query, depending on how many overall records you are dealing with. From my understanding, properly creating indexes involves looking at the queries you've written and indexing the columns that are part of your WHERE clause. This helps the query optimizer better process the query in general. That does not mean creating an index on each individual column, however, as it's both inefficient and ineffective to do so. When possible, create multiple-column indexes that reflect your SELECT statement.