I'm running the following query on a MacBook Pro (2.53 GHz, 4 GB of RAM):
SELECT
c.id AS id,
c.name AS name,
c.parent_id AS parent_id,
s.domain AS domain_name,
s.domain_id AS domain_id,
NULL AS stats
FROM
stats s
LEFT JOIN stats_id_category sic ON s.id = sic.stats_id
LEFT JOIN categories c ON c.id = sic.category_id
GROUP BY
c.name
It takes about 17 seconds to complete.
EXPLAIN:
(screenshot) http://img7.imageshack.us/img7/1364/picture1va.png
The tables:
Information:
Number of rows: 147397
Data size: 20.3MB
Index size: 1.4MB
Table:
CREATE TABLE `stats` (
`id` int(11) unsigned NOT NULL auto_increment,
`time` int(11) NOT NULL,
`domain` varchar(40) NOT NULL,
`ip` varchar(20) NOT NULL,
`user_agent` varchar(255) NOT NULL,
`domain_id` int(11) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
`referrer` varchar(400) default NULL,
KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147398 DEFAULT CHARSET=utf8
Information second table:
Number of rows: 1285093
Data size: 11MB
Index size: 17.5MB
Second table:
CREATE TABLE `stats_id_category` (
`stats_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
KEY `stats_id` (`stats_id`,`category_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Information third table:
Number of rows: 161
Data size: 3.9KB
Index size: 8KB
Third table:
CREATE TABLE `categories` (
`id` int(11) NOT NULL auto_increment,
`parent_id` int(11) default NULL,
`name` varchar(40) NOT NULL,
`questions_category_id` int(11) NOT NULL default '0',
`rank` int(2) NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=205 DEFAULT CHARSET=latin1
Hopefully someone can help me speed this up.
I see several WTFs in your query:
You use two LEFT OUTER JOINs but then you group by the c.name column which might have no matches. So perhaps you don't really need an outer join? If that's the case, you should use an inner join, because outer joins are often slower.
You are grouping by c.name but this gives ambiguous results for every other column in your select-list. I.e. there might be multiple values in these columns in each grouping by c.name. You're lucky you're using MySQL, because this query would simply give an error in any other RDBMS.
This is a performance issue because the GROUP BY is likely causing the "using temporary; using filesort" you see in the EXPLAIN. This is a notorious performance-killer, and it's probably the single biggest reason this query is taking 17 seconds. Since it's not clear why you're using GROUP BY at all (using no aggregate functions, and violating the Single-Value Rule), it seems like you need to rethink this.
You are grouping by c.name which doesn't have a UNIQUE constraint on it. You could in theory have multiple categories with the same name, and these would be lumped together in a group. I wonder why you don't group by c.id if you want one group per category.
SELECT NULL AS stats: I don't understand why you need this. It's kind of like creating a variable that you never use. It shouldn't harm performance, but it's just another WTF that makes me think you haven't thought this query through very well.
You say in a comment you're looking for number of visitors per category. But your query doesn't have any aggregate functions like SUM() or COUNT(). And your select-list includes s.domain and s.domain_id which would be different for every visitor, right? So what value do you expect to be in the result set if you only have one row per category? This isn't really a performance issue either, it just means your query results don't tell you anything useful.
Your stats_id_category table has an index over its two columns, but no primary key. So you can easily get duplicate rows, and this means your count of visitors may be inaccurate. You need to drop that redundant index and use a primary key instead. I'd order category_id first in that primary key, so the join can take advantage of the index.
ALTER TABLE stats_id_category DROP KEY stats_id,
ADD PRIMARY KEY (category_id, stats_id);
Now you can eliminate one of your joins, if all you need to count is the number of visitors:
SELECT c.id, c.name, c.parent_id, COUNT(*) AS num_visitors
FROM categories c
INNER JOIN stats_id_category sic ON (sic.category_id = c.id)
GROUP BY c.id;
Now the query doesn't need to read the stats table at all, or even the data rows of the stats_id_category table. It can get its count simply by reading that table's index, which should eliminate a lot of work.
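If you want to verify the covering-index access, EXPLAIN should say so in its Extra column. A quick check might look like this (a sketch; exact output varies by MySQL version):
EXPLAIN SELECT c.id, c.name, c.parent_id, COUNT(*) AS num_visitors
FROM categories c
INNER JOIN stats_id_category sic ON (sic.category_id = c.id)
GROUP BY c.id;
-- with the new PRIMARY KEY (category_id, stats_id) in place,
-- the row for sic should show "Using index" under Extra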
You are missing the third table in the information provided (categories).
Also, it seems odd that you are doing a LEFT JOIN and then using the right table (which might be all NULLs) in the GROUP BY. You will end up grouping all of the non-matching rows together as a result; is that what you intended?
Finally, can you provide an EXPLAIN for the SELECT?
Harrison is right; we need the other table. I would start by adding an index on category_id to stats_id_category, though.
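For example, a minimal sketch (the index name is arbitrary):
ALTER TABLE stats_id_category ADD INDEX category_id (category_id);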
I agree with Bill. Point 2 is very important; the query doesn't even make logical sense. Also, the simple fact that there is no WHERE clause means that you have to pull back every row in the stats table, which seems to be around 140,000. It then has to sort all that data so that it can perform the GROUP BY. This is because sorting [O(n log n)] and then finding duplicates [O(n)] is much faster than finding duplicates without sorting the data set [roughly O(n^2)].
I have a large table with user votes. I tried nearly every tutorial and essay about INDEX usage and, after failing that, every possible combination of fields as keys, but the query stays slow.
Is there any index I can use to speed this up?
(I will spare you my hideous attempts at indices so far...)
CREATE TABLE IF NOT EXISTS `votes` (
`uid` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`objectId` bigint(15) NOT NULL,
`vote` tinyint(1) NOT NULL,
`created` datetime NOT NULL,
UNIQUE KEY `unique input` (`uid`,`objectId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The table has around 1.3 million rows and will continue growing. This is the query I am trying:
EXPLAIN SELECT objectId, COUNT(uid) AS voteCount, AVG(vote) AS rating
FROM votes
GROUP BY objectId
Any pointers?
The only approach I could suggest is the following, although I don't know if it would increase performance. It assumes that you have an Objects table, with one row per ObjectId:
SELECT ObjectId,
(select count(*) from votes v where v.objectid = o.objectid) as votecount,
(select avg(vote) from votes v where v.objectid = o.objectid) as rating
FROM objects o;
Then you want the following index: votes(objectid, vote).
This would replace the outer group by with index scans, which may speed up the query.
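A sketch of that index (the name is arbitrary):
CREATE INDEX votes_objectid_vote ON votes (objectId, vote);
With (objectId, vote) the GROUP BY can walk the index in objectId order and compute COUNT and AVG from the index alone, without touching the table rows.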
I don't see how an index will help to speed this up, because the average function will require that you interact with each and every row. There's no WHERE clause.
Maybe you could create a VIEW that would amortize the cost for you.
I think duffymo is correct. However, you could try swapping the two columns in your unique key around (or adding an index on objectId only), as it may help the GROUP BY.
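For example, a sketch that keeps the uniqueness guarantee but leads with objectId so the GROUP BY can scan the index in order:
ALTER TABLE votes
DROP KEY `unique input`,
ADD UNIQUE KEY `unique input` (objectId, uid);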
I have a site where there is an activity feed, similar to how social sites like Facebook have one. It is a "newest first" list that describes actions taken by users. In production, there's about 200k entries in that table.
Since this is going to be asked anyway, I'll first share the full table structure:
CREATE TABLE `karmalog` (
`id` int(11) NOT NULL auto_increment,
`guid` char(36) default NULL,
`user_id` int(11) default NULL,
`user_name` varchar(45) default NULL,
`user_avat_url` varchar(255) default NULL,
`user_sec_id` int(11) default NULL,
`user_sec_name` varchar(45) default NULL,
`user_sec_avat_url` varchar(255) default NULL,
`event` enum('EDIT_PROFILE','EDIT_AVATAR','EDIT_EMAIL','EDIT_PASSWORD','FAV_IMG_ADD','FAV_IMG_ADDED','FAV_IMG_REMOVE','FAV_IMG_REMOVED','FOLLOW','FOLLOWED','UNFOLLOW','UNFOLLOWED','COM_POSTED','COM_POST','COM_VOTE','COM_VOTED','IMG_VOTED','IMG_UPLOAD','LIST_CREATE','LIST_DELETE','LIST_ADMINDELETE','LIST_VOTE','LIST_VOTED','IMG_UPD','IMG_RESTORE','IMG_UPD_LIC','IMG_UPD_MOD','IMG_GEO','IMG_UPD_MODERATED','IMG_VOTE','IMG_VOTED','TAG_FAV_ADD','CLASS_DOWN','CLASS_UP','IMG_DELETE','IMG_ADMINDELETE','IMG_ADMINDELETEFAV','SET_PASSWORD','IMG_RESTORED','IMG_VIEW','FORUM_CREATE','FORUM_DELETE','FORUM_ADMINDELETE','FORUM_REPLY','FORUM_DELETEREPLY','FORUM_ADMINDELETEREPLY','FORUM_SUBSCRIBE','FORUM_UNSUBSCRIBE','TAG_INFO_EDITED','IMG_ADDSPECIE','IMG_REMOVESPECIE','SPECIE_ADDVIDEO','SPECIE_REMOVEVIDEO','EARN_MEDAL','JOIN') NOT NULL,
`event_type` enum('follow','tag','image','class','list','forum','specie','medal','user') NOT NULL,
`active` bit(1) NOT NULL,
`delete` bit(1) NOT NULL default '\0',
`object_id` int(11) default NULL,
`object_cache` text,
`object_sec_id` int(11) default NULL,
`object_sec_cache` text,
`karma_delta` int(11) NOT NULL,
`gold_delta` int(11) NOT NULL,
`newkarma` int(11) NOT NULL,
`newgold` int(11) NOT NULL,
`migrated` int(11) NOT NULL default '0',
`date_created` timestamp NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `user_sec_id` (`user_sec_id`),
KEY `image_id` (`object_id`),
KEY `date_event` (`date_created`,`event`),
KEY `event` (`event`),
KEY `date_created` (`date_created`),
CONSTRAINT `karmalog_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`) ON DELETE SET NULL,
CONSTRAINT `karmalog_ibfk_2` FOREIGN KEY (`user_sec_id`) REFERENCES `user` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Before optimizing this table, my query had 5 joins and I ran into slow query times. I have denormalized all of that data, so that not a single join remains. So the table and query are flat.
As you can see in the table design, there's an "event" field which is an enum, holding a few dozen possible values. Throughout the site, I show activity feeds based on specific event types. Typically that query looks like this:
SELECT * FROM karmalog as k
WHERE k.event IN ($events) AND k.delete=0
ORDER BY k.date_created DESC, k.id DESC
LIMIT 0,30
What this query does is to find the latest 30 entries in the total set that match any of the events passed in $events, which can be multiple.
Due to removing the joins and having indices on most fields, I was expecting this to perform very well, but it doesn't. On 200k entries, it still takes over 3 seconds and I don't understand why.
Regarding solutions, I know I could archive older entries or partition the table per event type, but that will have quite a code impact, and I first would like to understand why the above is so slow.
As a temporary work-around, I'm now doing this:
SELECT * FROM
(SELECT * FROM karmalog ORDER BY date_created DESC, id DESC LIMIT 0,1000) as karma
WHERE karma.event IN ($events) AND karma.delete=0
LIMIT $page,$pagesize
What this does is limit the base set to search to the latest 1000 entries only, hoping and guessing that there are 30 entries to be found for the filters that I pass in. It's not very robust, though. It will not work for rarer events, and it brings pagination issues.
Therefore, I'd first like to get to the root cause of why my initial query is slow, against my expectations.
Edit: I was asked to share the execution plan. Here's the test query:
EXPLAIN SELECT * FROM karmalog
WHERE event IN ('FAV_IMG_ADD','FOLLOW','COM_POST','IMG_VOTE','LIST_VOTE','JOIN','CLASS_UP','LIST_CREATE','FORUM_REPLY','FORUM_CREATE','FORUM_SUBSCRIBE','IMG_GEO','IMG_ADDSPECIE','SPECIE_ADDVIDEO','EARN_MEDAL') AND karmalog.delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,36
Execution plan:
id = 1
select_type = SIMPLE
table = karmalog
type = range
possible_keys = event
key = event
key_len = 1
ref = NULL
rows = 80519
Extra = Using where; Using filesort
I'm not sure how to read into the above, but I do know that the sort clause really seems to kill this query: with the sorting it takes 4.3 secs, without it 0.03 secs.
SELECT * sometimes slows down ordered queries by a huge amount, so let's start by refactoring your query as follows:
SELECT k.*
FROM karmalog AS k
JOIN (
SELECT id
FROM karmalog
WHERE event IN ($events)
AND delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,30
) AS m ON k.id = m.id
ORDER BY k.date_created DESC, k.id DESC
This will do your ORDER BY ... LIMIT operation without having to haul the whole table around in the sorting phase. Finally it will look up the appropriate thirty rows from the original table and sort just those again. This might save a whole lot of I/O and in-memory data shuffling.
Second, if id column values are assigned in ascending order as records are inserted, then the use of date_created in your ORDER BY operation is redundant. But MySQL doesn't know that, so leaving it out might help. This will be true if you always use the current date when inserting, and never update the dates.
Third, you might be able to use a compound covering index for the selection (inner) query. This is an index that contains all the fields you need. When you use a covering index, the whole query can be satisfied from the index, and there's no need to bounce back to the original table. This saves disk access time.
Try this compound covering index: (delete, event, id). If you decide you can't get rid of the use of date_created in your ordering, try this instead: (delete, event, date_created, id)
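As a sketch, those would be created like this (delete is a reserved word in MySQL, hence the backquotes; the index names are arbitrary):
ALTER TABLE karmalog ADD KEY delete_event_id (`delete`, `event`, `id`);
-- or, if date_created stays in the ORDER BY:
ALTER TABLE karmalog ADD KEY delete_event_date_id (`delete`, `event`, `date_created`, `id`);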
Add a compound index over the two relevant columns. In your table, you can do that by specifying e.g.
KEY `date_created` (`date_created`, `event`)
This key can still be used to satisfy plain old date_created range searches. But in addition to that, the event data is included as well, so the DBMS will be able to find the relevant rows by looking only at the index.
If you want, you can try the other order as well: first event and then date. This might allow some optimization if there are many event types but your filter only matches a few. On the other hand, I'm not sure the system will be able to make use of the LIMIT clause in that case, so I'm not certain that this other order will be any help at all.
Edit: I completely missed that your date_event index already has this info. According to your execution plan, though, that one isn't used. It looks like the optimizer is getting things wrong. You could try removing the event index, and perhaps the date_created index as well, and see what happens then.
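For example (a sketch; try dropping the redundant indexes first, since FORCE INDEX is the heavier hammer):
ALTER TABLE karmalog DROP KEY `event`, DROP KEY `date_created`;
-- or, without changing the schema, nudge the optimizer directly:
SELECT * FROM karmalog FORCE INDEX (date_event)
WHERE event IN ($events) AND karmalog.delete = 0
ORDER BY date_created DESC, id DESC
LIMIT 0, 36;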
One question that I should be able to answer myself, but I don't, and I also can't find any answer on Google:
I have a table that contains 5 million rows with this structure:
CREATE TABLE IF NOT EXISTS `files_history2` (
`FILES_ID` int(10) unsigned DEFAULT NULL,
`DATE_FROM` date DEFAULT NULL,
`DATE_TO` date DEFAULT NULL,
`CAMPAIGN_ID` int(10) unsigned DEFAULT NULL,
`CAMPAIGN_STATUS_ID` int(10) unsigned DEFAULT NULL,
`ON_HOLD` decimal(1,0) DEFAULT NULL,
`DIVISION_ID` int(11) DEFAULT NULL,
KEY `DATE_FROM` (`DATE_FROM`),
KEY `FILES_ID` (`FILES_ID`),
KEY `CAMPAIGN_ID` (`CAMPAIGN_ID`),
KEY `CAMP_DATE` (`CAMPAIGN_ID`,`DATE_FROM`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When I execute
SELECT files_id, min( date_from )
FROM files_history2
WHERE campaign_id IS NOT NULL
GROUP BY files_id
the query sits with status "Sending data" for more than eight hours (then I killed the process).
Here the explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE files_history2 ALL CAMPAIGN_ID,CAMP_DATE NULL NULL NULL 5073254 Using where; Using temporary; Using filesort
I assume that I have created the necessary keys, but then the query shouldn't take that long, should it?
I would suggest a different index... an index on (Files_ID, Date_From, Campaign_ID)...
Since your GROUP BY is on Files_ID, you want THOSE grouped. Then the MIN(Date_From), so that is in second position... Then FINALLY the Campaign_ID to qualify for NOT NULL, and here's why...
If you put all your campaign IDs first, great, you get all the NULLs out of the way... but now, with 1,000 campaigns, and Files_IDs spanning MANY campaigns and also many dates, you are going to choke.
With the index I'm proposing, by Files_ID first, you have each "files_id" already ordered to match your GROUP BY. Then, within that, all the earliest dates are at the top of the indexed list... great, almost there. Then, finally, the campaign ID: skip over whatever NULLs may be there and you are done, on to the next Files_ID.
Hope this makes sense -- unless you have TONs of entries with NULL-valued campaigns.
Also, by having all 3 parts of the index matching the criteria and output columns of your query, it never has to go back to the raw data file for the data; it gets it all from the index directly.
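A sketch of that index (the name is arbitrary):
ALTER TABLE files_history2 ADD KEY files_date_campaign (FILES_ID, DATE_FROM, CAMPAIGN_ID);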
I'd create a covering index (CAMPAIGN_ID, files_id, date_from) and check the performance. I suspect your issue is due to the grouping and date_from not being able to use the same index.
CREATE INDEX your_index_name ON files_history2 (CAMPAIGN_ID, files_id, date_from);
If this works, you could drop the single-column index CAMPAIGN_ID, as it's included in the composite index.
Well, the query is slow due to the aggregation (the MIN function) combined with the grouping.
One solution is to alter your query by moving the aggregation into a derived table in the FROM clause, which will be a lot faster than the approach you are using.
Try the following:
SELECT f.files_id, f1.datefrom
FROM files_history2 AS f
JOIN (
SELECT files_id, MIN(date_from) AS datefrom
FROM files_history2
WHERE campaign_id IS NOT NULL
GROUP BY files_id
) AS f1 ON f.files_id = f1.files_id AND f.date_from = f1.datefrom;
This should perform a lot better; if it doesn't, a temporary table would be the only way to go.
Query:
SELECT
r.reply_id,
r.msg_id,
r.uid,
r.body,
r.date,
u.username as username,
u.profile_picture as profile_picture
FROM
pm_replies as r
LEFT JOIN users as u
ON u.uid = r.uid
WHERE
r.msg_id = '784351921943772258'
ORDER BY r.date DESC
I tried all the index combinations I could think of and searched Google for how best to index this, but nothing worked.
This query takes 0.33 s on 500 returned items, and counting...
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL index1 NULL NULL NULL 540 Using where; Using filesort
1 SIMPLE u eq_ref uid uid 8 site.r.uid 1
SHOW CREATE TABLE pm_replies:
CREATE TABLE `pm_replies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`reply_id` bigint(20) NOT NULL,
`msg_id` bigint(20) NOT NULL,
`uid` bigint(20) NOT NULL,
`body` text COLLATE utf8_unicode_ci NOT NULL,
`date` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `index1` (`msg_id`,`date`,`uid`)
) ENGINE=MyISAM AUTO_INCREMENT=541 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
SHOW CREATE TABLE users:
CREATE TABLE `users` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`uid` bigint(20) NOT NULL,
`username` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`email` text CHARACTER SET latin1 NOT NULL,
`password` text CHARACTER SET latin1 NOT NULL,
`profile_picture` text COLLATE utf8_unicode_ci NOT NULL,
`date_registered` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uid` (`uid`),
UNIQUE KEY `username` (`username`)
) ENGINE=MyISAM AUTO_INCREMENT=2004 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
For the query as it is, the best indexes would seem to be...
pm_replies: (msg_id, date, uid)
users: (uid)
The important one is pm_replies. You use it both to filter your data (the filter column is first) and to order your data (the order column is second).
This would be different if you removed the filter. Then you'd just want (date, uid) as your index.
The last field in the index just makes it a fraction friendlier to the join; the important part is actually the index on users.
There is a lot more that could be said on this, a whole chapter in a book at the very least, and several books if you wanted. But I hope this helps.
EDIT
Note that my suggested index for pm_replies is one index covering three fields, not just three separate indexes. This ensures that all the entries in the index are pre-sorted by those columns. It's like sorting data in Excel by three columns.
Having three separate indexes is like having the Excel data on three tabs, each sorted by a different field.
Only with one index over three fields do you get this behaviour...
- You can select one 'bunch' of records with the same msg_id
- That whole 'bunch' are next to each other, no gaps, etc
- That whole 'bunch' are sorted in date order for that msg_id
- For any rows with the same date, they're ordered by uid
(Again, the uid part is really very minor.)
Please try this:
SELECT
r.reply_id,
r.msg_id,
r.uid,
r.body,
r.date,
u.username as username,
u.profile_picture as profile_picture
FROM
pm_replies as r
LEFT JOIN users as u
ON (u.uid = r.uid AND r.msg_id = '784351921943772258')
ORDER BY r.date DESC
In my case it helped.
Add date to your index1 key so that msg_id and date are both in the index.
What Dems is saying should be correct, but there is one additional detail if you are using InnoDB: perhaps you are paying the price of secondary indexes on clustered tables. Essentially, accessing a row through the secondary index requires an additional lookup through the primary, i.e. clustering, index. This "double lookup" might make the index less attractive to the query optimizer.
To alleviate this, try covering all the fields in your select statement with the index:
pm_replies: (msg_id, date, uid, reply_id, body)
users: (uid, username, profile_picture)
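As a sketch (the index names are arbitrary; note that body and profile_picture are TEXT columns, so MySQL can only index a prefix of them, which weakens the covering effect):
ALTER TABLE pm_replies ADD KEY covering_idx (msg_id, date, uid, reply_id, body(255));
ALTER TABLE users ADD KEY covering_idx (uid, username, profile_picture(255));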
It appears the optimizer is trying to force the index by ID to make the join to the users table. Since you are doing a left join (which doesn't make sense, since I would expect every entry to have a user ID, and thus a normal INNER JOIN), I'll keep it as a left join.
So, I would try the following: query just the replies based on the message ID and order by the date descending on its own merits, THEN left join, such as
SELECT
r.reply_id,
r.msg_id,
r.uid,
r.body,
r.date,
u.username as username,
u.profile_picture as profile_picture
FROM
( select R2.*
from pm_replies R2
where r2.msg_id = '784351921943772258' ) r
LEFT JOIN users as u
ON u.uid = r.uid
ORDER BY
r.date DESC
In addition, since I don't have MySQL readily available, I can't remember if ORDER BY is allowed in a subquery. If it is, you can optimize the inner prequery (using alias "R2") and put the ORDER BY there, so it uses the (msg_id, date) index and returns just that set... THEN it joins to the users table on the ID, at which point no index is required on the SOURCE result set, just the index on the users table to find the match.
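If it is allowed, the sketch would look like this (the outer ORDER BY is kept as a safety net, since MySQL is free to ignore an ORDER BY inside a derived table):
SELECT
r.reply_id,
r.msg_id,
r.uid,
r.body,
r.date,
u.username as username,
u.profile_picture as profile_picture
FROM
( select R2.*
from pm_replies R2
where R2.msg_id = '784351921943772258'
order by R2.date DESC ) r
LEFT JOIN users as u
ON u.uid = r.uid
ORDER BY
r.date DESC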
I have a web application that uses a table schema similar to the one below. Simply put, I want to optimize the selection of articles. Articles are selected based on the given tag. For example, if the tag is 'iphone', the query should output all open articles about 'iphone' from the last month.
CREATE TABLE `article` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(100) NOT NULL,
`body` varchar(200) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
`author_id` int(11) NOT NULL,
`section` varchar(30) NOT NULL,
`status` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
CREATE TABLE `tags` (
`name` varchar(30) NOT NULL,
`article_id` int(11) NOT NULL,
PRIMARY KEY (`name`,`article_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `users` (
`id` int(11) NOT NULL auto_increment,
`username` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3 ;
The following is my MySQL query
explain select article.id,users.username,article.title
from article,users,tags
where article.id=tags.article_id and tags.name = 'iphone4'
and article.author_id=users.id and article.status = '1'
and article.section = 'mobile'
and article.date > '2010-02-07 13:25:46'
ORDER BY tags.article_id DESC
The output is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tags ref PRIMARY PRIMARY 92 const 55 Using where; Using index
1 SIMPLE article eq_ref PRIMARY PRIMARY 4 test.tags.article_id 1 Using where
1 SIMPLE users eq_ref PRIMARY PRIMARY 4 test.article.author_id 1
Is it possible to optimize it more?
This query may be optimized, depending on which condition is more selective: tags.name = 'iphone4' or article.date > '2010-02-07 13:25:46'.
If there are fewer articles tagged iphone than articles posted after Feb 7, then your original query is nice.
If there are many articles tagged iphone, but few posted after Feb 7, then this query will be more efficient:
SELECT article.id, users.username, article.title
FROM tags
JOIN article
ON article.id = tags.article_id
AND article.status = '1'
AND article.section = 'mobile'
AND article.date > '2010-02-07 13:25:46'
JOIN users
ON users.id = article.author_id
WHERE tags.name = 'iphone4'
ORDER BY
article.date DESC, tags.article_id DESC
Note that the ORDER BY condition has changed. This may or may not be what you want; generally, however, the orders of id and date correspond to each other.
If you really need your original ORDER BY condition you may leave it but it will add a filesort (or just revert to your original plan).
In either case, create an index on
article (status, section, date, id)
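For example, a sketch (the index name is arbitrary):
CREATE INDEX article_covering ON article (status, section, date, id);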
the query should output all open articles about 'iphone' from the last month.
So the only query you are going to run on this data uses the tag and the date. You've got an index for the tag in the tags table, but the date is stored in a different table (article; you're a bit inconsistent with your naming scheme). Adding an index on the article table using date would be of no benefit at all. Using id, date (in that order) would help a little, but really the date needs to be denormalised into the tags table to get the query running really fast.
Unless you're regularly moving around bulk data sets, just add a datetime column with a default of the current timestamp to the tags table.
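A sketch of that change (column and index names are assumptions; TIMESTAMP is used because older MySQL versions only allow a CURRENT_TIMESTAMP default on TIMESTAMP columns, not DATETIME):
ALTER TABLE tags
ADD COLUMN date timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
ADD KEY name_date (name, date);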
I expect that you may want to interact with the data in lots of other ways. Really, you should set a low (no?) threshold for slow query logging, then analyse the resulting data to identify where your performance problems are (try looking at the queries with the highest values for duration^2 * frequency first).
There's a script at the URL below which is useful for this analysis:
http://www.retards.org/projects/mysql/
You could index the additional fields in article that you are referencing in your select statement. In this case, I would suggest you create an index in article like this:
CREATE INDEX article_idx ON article (author_id, status, section, date);
Creating that index should speed up your query, depending on how many overall records you are dealing with. From my understanding, properly creating indexes involves looking at the queries you've written and indexing the columns that are part of your WHERE clause. This helps the query optimizer better process the query in general. That does not mean you should create an index on each individual column, however, as that is both inefficient and ineffective. When possible, create multiple-column indexes that represent your select statement.