Optimization of MySQL EXPLAIN output - MySQL

I just ran MySQL EXPLAIN on a query and was surprised to see that it has to examine more than 250,000 rows to sort the result, even though I have an index for the WHERE clause. I don't doubt what EXPLAIN reports, since that many new rows have been added, but how do I fix this? The table structure is below.
tableA is a forum table where users can post content:
id userid created title
1 3 12232 xyz
2 etc...............
My query is:
EXPLAIN SELECT * FROM tableA WHERE userid='2' ORDER BY created DESC LIMIT 3
The output of this EXPLAIN is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tableA ref userid userid 4 const 275216 Using where; Using temporary; Using filesort
My worry is how to reduce this to 3 or 4 rows. I only want to display three results, yet MySQL examines 275216 rows before returning the output. Those 275216 rows were created after userid 2 posted in the forum. How can I tell MySQL to look only at the relevant data, so that it reads from a very small set of rows? I want MySQL to examine at most 20-30 rows to serve 3 rows.

Try this:
SELECT * FROM tableA FORCE INDEX (userid) WHERE userid=2 ORDER BY created DESC LIMIT 3
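Since userid is already the index that EXPLAIN reports under key, forcing it may not change the plan. A commonly suggested alternative is a composite index on the WHERE column followed by the ORDER BY column, so that MySQL can read the newest rows for one user directly from the index and stop after three. This is a minimal sketch that assumes the columns shown in the question; the index name idx_userid_created is made up for illustration:

```sql
-- Hypothetical index name; assumes tableA has the userid and created columns shown above.
ALTER TABLE tableA ADD INDEX idx_userid_created (userid, created);

-- Re-check the plan. userid is compared as a number here, assuming it is an integer column.
EXPLAIN SELECT * FROM tableA WHERE userid = 2 ORDER BY created DESC LIMIT 3;
```

If the optimizer picks this index, Using filesort should no longer be needed, because entries with the same userid are already ordered by created inside the index. The rows column is only an estimate and may still look large even when the query stops early.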

Related

JOINs being done in weird order; messing up ORDER BY?

Let's say I have three tables: customers, servers and payments. Each customer can have multiple servers and each server can have multiple payments. Let's also say I want to find the most recent payments and get info about the servers and customers those payments are attached to. Here's a query that could do this:
SELECT *
FROM payments p
JOIN customers c ON p.custID = c.custID
JOIN servers s ON s.serverID = p.serverID
WHERE c.hold = 0
AND c.archive = 0
ORDER BY p.paymentID DESC
LIMIT 10;
The problem is that when I run EXPLAIN on this query I get this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE c ref PRIMARY,hold_archive hold_archive 3 const,const 28728 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p ref custID custID 5 customers.custID 3 Using where
1 SIMPLE s eq_ref PRIMARY PRIMARY 4 payments.serverID 1 Using index
The problem is that the query takes a while to run. If I remove the ORDER BY it becomes 10x as fast. But I need the ORDER BY. Here's the EXPLAIN when I remove the ORDER BY:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE c ref PRIMARY,hold_archive hold_archive 3 const,const 28728 Using where; Using index
1 SIMPLE p ref custID custID 5 customers.custID 3 Using where
1 SIMPLE s eq_ref PRIMARY PRIMARY 4 payments.serverID 1 Using index
So the big difference here is that "Using temporary" and "Using filesort" are missing from the Extra column.
It seems like the reason, in this case, is that the column I'm doing the ORDER BY on doesn't belong to the first table listed in the EXPLAIN.
Another observation: if I remove one of the WHERE conditions (whilst keeping the ORDER BY), it speeds up similarly, but I need both conditions. Here's an example EXPLAIN of that:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index custID,serverID PRIMARY 4 NULL 10 Using where
1 SIMPLE c eq_ref PRIMARY,hold_archive PRIMARY 4 payments.custID 1 Using where
1 SIMPLE s eq_ref PRIMARY PRIMARY 4 payments.serverID 1 Using index
Here the ORDER BY column /is/ in the first table listed in the EXPLAIN. But why is MySQL rearranging the order the tables are JOINed in, and how can I stop it from doing that? You can force indexes in MySQL, but it doesn't seem like that would help.
Any ideas?
On being 10x faster without the ORDER BY: MySQL can find "any 10 rows" a lot faster than it can "find all possible rows, sort them, then deliver 10".
Having WHERE and ORDER BY hit different columns is hard to optimize.
What percentage of payments have hold=0 and archive=0? It sounds like a small percentage? How many rows in each table?
Does anything else need INDEX(hold, archive)? If not, get rid of it. It seems to be only causing trouble here.
If hold=0 and archive=0 is common, then you would prefer the execution to go like your 3rd EXPLAIN, that is, scanning payments in descending order. With most of them matching the WHERE, it will usually need to hit not much more than 10 rows before finding 10 matching rows.
Another solution (other than getting rid of the index) is to change JOIN to STRAIGHT_JOIN in the query. This tells the Optimizer that you know better, and payments should be scanned first, customers second. That works well if my previous paragraph applies.
But the query will screw up (by being slow) if, say, you look for archive=1.
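The STRAIGHT_JOIN suggestion above could be sketched as follows. This assumes the table and column names from the question and only changes the join keyword, which asks MySQL to read the tables in the order they are written:

```sql
-- Sketch only: STRAIGHT_JOIN makes MySQL read the left table before the right one,
-- so payments is scanned first (ideally in paymentID order via its PRIMARY key).
SELECT *
FROM payments p
STRAIGHT_JOIN customers c ON p.custID = c.custID
STRAIGHT_JOIN servers s ON s.serverID = p.serverID
WHERE c.hold = 0
  AND c.archive = 0
ORDER BY p.paymentID DESC
LIMIT 10;
```

Running EXPLAIN on this version should list p first. As noted above, this order only pays off when most customers match the WHERE conditions.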

Removing all duplicates except one - optimized queries

I have tried the following two queries:
delete from app where not exists
(select a2.app_package, max(a2.id) from (select * from app) as a2
where a2.app_package = app.app_package having max(a2.id) = app.id);
and
DELETE FROM app
USING app,
(select app_package, max(id) as ID from app
group by app_package
) as A
where A.ID > app.ID AND
A.app_package = app.app_package;
and am really stuck as to which one would execute faster.
SQLFiddles:
http://sqlfiddle.com/#!2/46498/1
http://sqlfiddle.com/#!2/142593/1
Both execution plans are the same:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS FILTERED EXTRA
1 SIMPLE app ALL 7 100
Are there further optimizations that could be made?
The execution plan you are showing is not that of the DELETE query but that of the SELECT * FROM app query, which just does a full table scan (as expected, since you aren't filtering on anything).
To see the real execution plan, you will need to run EXPLAIN on the DELETE statements themselves (apparently not possible in SQL Fiddle).
I took the liberty of assuming that you have an index on app_package. If you don't, you should definitely add one.
The first example (simply replace DELETE FROM with SELECT * FROM) shows that you are doing full table scans (bad) and using a DEPENDENT SUBQUERY, which will be run for almost every record in the outer table (which is bad as well).
1 PRIMARY app ALL 7 Using where
2 DEPENDENT SUBQUERY <derived3> ALL 7 Using where
3 DERIVED app ALL 7
To see that of the second one, you will have to translate the delete into a SELECT statement, something like this
SELECT * FROM app, (
SELECT app_package, MAX( id ) AS ID
FROM app
GROUP BY app_package
) AS A
WHERE A.ID > app.ID
AND A.app_package = app.app_package
which gives
1 PRIMARY <derived2> ALL 4
1 PRIMARY app ref 1 Using where
2 DERIVED app index 7
As you can see, this one isn't using dependent subqueries and isn't doing full table scans of app. It will definitely run faster as the amount of data in the table grows.
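For reference, the index assumed in this answer and a direct way to inspect the DELETE plan could look like the sketch below. It assumes MySQL 5.6.3 or later, where EXPLAIN accepts DELETE statements; the index name idx_app_package is made up for illustration:

```sql
-- Hypothetical index name; assumes app has the app_package column used in the question.
ALTER TABLE app ADD INDEX idx_app_package (app_package);

-- Assumes MySQL 5.6.3 or later. EXPLAIN shows the plan without deleting any rows.
EXPLAIN
DELETE FROM app
USING app,
  (SELECT app_package, MAX(id) AS ID
   FROM app
   GROUP BY app_package) AS A
WHERE A.ID > app.ID
  AND A.app_package = app.app_package;
```

On older versions, translating the DELETE into a SELECT as shown above remains the workaround.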

Query Optimization for Friends Feed - MySQL

I'm having some weird trouble with a friends feed query. Here is the background:
I have 3 tables
checkin - around 13m records
users - around 250k records
friends - around 1.5m records
The checkin table lists activities performed by users. There are numerous indexes on it, including indexes on user_id, created_at, and (user_id, created_at).
The users table is just the basic user information. There is an index on user_id.
The friends table has a user_id, target_id and is_approved. There is an index on the (user_id, is_approved) fields.
In my query, I am trying to pull down just a basic friends feed of any users - so I have been doing this:
SELECT checkin_id, created_at
FROM checkin
WHERE (user_id IN (SELECT friend_id from friends where user_id = 1 and is_approved = 1) OR user_id = 1)
ORDER by created_at DESC
LIMIT 0, 15
The goal of the query is just to pull the checkin_id and created_at for the activity of a user's friends plus the user's own activity. It's a pretty simple query, and when a user's friends have tons of recent activity, it is very quick. Here is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY checkin index user_id,user_id_2 created_at 8 NULL 15 Using where
2 DEPENDENT SUBQUERY friends eq_ref user_id,friend_id,is_approved,friend_looku... PRIMARY 8 const,func 1 Using where
As an explanation, user_id is a simple index on user_id - while user_id_2 is an index on user_id and created_at. On the friends table, friends_lookup is the index of user_id and is_approved.
This is a very simple query and gets completed in: Showing rows 0 - 14 (15 total, Query took 0.0073 sec).
However, when a user's friends' activity is not very recent and there isn't a lot of data, the same query takes around 5-7 seconds. It has the same EXPLAIN as the previous query but takes longer.
The number of friends doesn't seem to have an effect; the query just seems to speed up with more recent activity.
Does anyone have any tips for optimizing these queries so that they run at the same speed regardless of activity?
Server Setup
This is a dedicated MySQL server running 16GB of RAM. It is running Ubuntu 10.10 and the version of MySQL is 5.1.49
UPDATE
Most people have suggested removing the IN piece and moving it into an INNER JOIN:
SELECT c.checkin_id, c.created_at
FROM checkin c
INNER JOIN friends f ON c.user_id = f.friend_id
WHERE f.user_id =1
AND f.is_approved =1
ORDER BY c.created_at DESC
LIMIT 0 , 15
This query is 10x worse - as reported in the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE f ref PRIMARY,user_id,friend_id,is_approved,friend_looku... friend_lookup 5 const,const 938 Using temporary; Using filesort
1 SIMPLE c ref user_id,user_id_2 user_id 4 untappd_prod.f.friend_id 71 Using where
The goal of this query is to get all the friends' activity and your own in the same query (instead of having to create two queries, merge the results together and sort by created_at). I also can't remove the index on user_id, as it's an important piece of another query.
The interesting part is that when I run this query on a user account that doesn't have a lot of activity, I get this EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE f index_merge PRIMARY,user_id,friend_id,is_approved,friend_looku... user_id,friend_lookup 4,5 NULL 11 Using intersect(user_id,friend_lookup); Using wher...
1 SIMPLE c ref user_id,user_id_2 user_id 4 untappd_prod.f.friend_id 71 Using where
Any advice?
So, you have a few things going on here.
In the EXPLAIN output, the optimizer uses the index shown under "key", not everything listed under possible_keys. In your first EXPLAIN it chose the created_at index, so it walks the check-ins from newest to oldest and tests each one against the friend list. That is why it needs to scan many more records when the matching data is not recent.
On the checkin table, only the (user_id, created_at) and created_at indexes are necessary. You don't need another index on user_id alone; the optimizer can use (user_id, created_at), since user_id is its first column.
Try this:
1. Use a join between friends and checkin and remove the IN clause, so that friends becomes the driving table. You should see it first in the execution path of your EXPLAIN plan.
2. With step 1 done, make sure that checkin is using the (user_id, created_at) index in the execution path.
3. Write another query for the OR condition, where user_id in the checkin table is 1. I think the data for these two sets should be mutually exclusive, so that should be OK; otherwise you would not have needed the OR condition after the IN clause in the first place.
4. Remove the user_id index that is by itself, since you have the (user_id, created_at) index.
Your goal is for the right index to show up under key, not just under possible_keys.
This should take care of older, non-recent check-ins as well as recent ones.
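Steps 1 and 3 above could be combined into a single statement with UNION ALL, as in the sketch below. It assumes the table and column names from the question and that a user never appears in their own friend list, so the two branches cannot return the same check-in:

```sql
-- Branch 1: the friends' check-ins, with friends as the driving table.
(SELECT c.checkin_id, c.created_at
 FROM friends f
 INNER JOIN checkin c ON c.user_id = f.friend_id
 WHERE f.user_id = 1
   AND f.is_approved = 1
 ORDER BY c.created_at DESC
 LIMIT 15)
UNION ALL
-- Branch 2: the user's own check-ins, which can use the (user_id, created_at) index.
(SELECT checkin_id, created_at
 FROM checkin
 WHERE user_id = 1
 ORDER BY created_at DESC
 LIMIT 15)
-- Merge both branches and keep the 15 newest rows overall.
ORDER BY created_at DESC
LIMIT 15;
```

Each branch is limited to 15 rows because the final result can never need more than 15 from either side. The first branch may still need a sort over all of the friends' check-ins, so it is worth checking its EXPLAIN separately.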
My first suggestion is to remove the dependent subquery and turn it into a join. I've found that MySQL is not good at processing these types of queries. Try this:
SELECT c.checkin_id, c.created_at
FROM checkin c
INNER JOIN friends f
ON c.user_id = f.friend_id
WHERE f.user_id = 1
AND f.is_approved = 1
ORDER by c.created_at DESC
LIMIT 0, 15
My second suggestion, since you have a dedicated server, is to use the InnoDB storage engine for all your tables. Make sure that you tweak default InnoDB settings, especially for innodb_buffer_pool_size: http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/

Optimizing this mysql query to make use of my indexes if possible?

I have a MySQL query that I thought should be using my indexes, but it still seems to need to scan a lot of rows (I think).
Here is my query:
SELECT DISTINCT DAY(broadcast_at) AS 'days'
from v3211062009
where month(broadcast_at) = 5 and
year(broadcast_at) = 2012
and deviceid = 337 order by days;
On my table I have an index set up on broadcast_at, deviceid. However, the result of an EXPLAIN on this query looks like this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE v3211062009 ref indx_deviceid,indx_tracking_query indx_tracking_query 4 const 172958 Using where; Using index; Using temporary; Using filesort
I don't understand why it needs to examine so many rows. The total number of rows for this deviceid is only 184085, so my query seems to be looking at almost all of them just to get the result set. Is the index on broadcast_at not working?
I'm obviously doing something fundamentally wrong but can't figure it out. Changing the order of the columns in my index didn't work.
I don't think MySQL can take advantage of the index on broadcast_at if you use functions on that field.
How does it perform if you do:
SELECT DISTINCT DAY(broadcast_at) AS 'days'
from v3211062009
where broadcast_at >= ('2012-05-01') AND
broadcast_at < ('2012-06-01')
and deviceid = 337 order by days;
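For the range version to narrow the scan, the index generally needs the equality column first and the range column second. The question does not show the column order of indx_tracking_query, so the sketch below creates a separate index under a made-up name, idx_device_broadcast, purely for illustration:

```sql
-- Hypothetical index name. Equality column (deviceid) first, range column (broadcast_at) second.
ALTER TABLE v3211062009 ADD INDEX idx_device_broadcast (deviceid, broadcast_at);

-- The WHERE clause compares the bare column, so MySQL can seek to May 2012 for this device.
EXPLAIN SELECT DISTINCT DAY(broadcast_at) AS days
FROM v3211062009
WHERE deviceid = 337
  AND broadcast_at >= '2012-05-01'
  AND broadcast_at < '2012-06-01'
ORDER BY days;
```

Wrapping broadcast_at in MONTH() or YEAR() hides the column from the index lookup, which is why the original query could only use the deviceid part. The DISTINCT on DAY(broadcast_at) may still need a temporary table, but over far fewer rows.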

Why is this query using where instead of index?

EXPLAIN EXTENDED SELECT `board` . *
FROM `board`
WHERE `board`.`category_id` = '5'
AND `board`.`board_id` = '0'
AND `board`.`display` = '1'
ORDER BY `board`.`order` ASC
The output of the above query is
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE board ref category_id_2 category_id_2 9 const,const,const 4 100.00 Using where
I'm a little confused by this because I have an index that contains the columns that I'm using in the same order they're used in the query...:
category_id_2 BTREE No No
category_id 33 A
board_id 33 A
display 33 A
order 66 A
The output of EXPLAIN can sometimes be misleading.
For instance, filesort has nothing to do with files, using where does not mean you are using a WHERE clause, and using index can show up on the tables without a single index defined.
Using where just means there is some restricting clause on the table (WHERE or ON), and not all records will be returned. Note that LIMIT does not count as a restricting clause (though it can be).
Using index means that all information is returned from the index, without seeking the records in the table. This is only possible if all fields required by the query are covered by the index.
Since you are selecting *, this is impossible. Fields other than category_id, board_id, display and order are not covered by the index and should be looked up.
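To see the difference, the query could be restricted to the columns that category_id_2 contains. This is a sketch that assumes the index definition shown in the question; the literals are written as numbers on the assumption that the columns are integers:

```sql
-- Selects only columns that are part of the category_id_2 index,
-- so MySQL does not have to visit the table rows at all.
EXPLAIN SELECT `category_id`, `board_id`, `display`, `order`
FROM `board`
WHERE `category_id` = 5
  AND `board_id` = 0
  AND `display` = 1
ORDER BY `board`.`order` ASC;
```

With this column list, Extra should be able to report Using index, since every requested field can be read from the index itself.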
It is actually using index category_id_2.
It's using the index category_id_2 properly, as shown by the key field of the EXPLAIN.
Using where just means that you're selecting only some rows by using the WHERE clause, so you won't get the entire table back ;)