SQL: optimizing a large OR query - MySQL

I'm attempting to run a query that finds any matches between multiple phone number columns on two tables, and it is taking far too long (>5 minutes), even with the data filtered as much as possible. I've separated the searchable columns from both tables into their own tables, just to reduce the total number of rows.
This is from a legacy application I inherited.
Query
select count(b.bid)
from customers_with_phone c,buyers_orders_with_phone b
where
(b.hphone=c.pprim or b.hphone=c.phome or b.hphone=c.pwork or b.hphone=c.pother)
or (b.wphone=c.pprim or b.wphone=c.phome or b.wphone=c.pwork or b.wphone=c.pother)
or (b.cphone=c.pprim or b.cphone=c.phome or b.cphone=c.pwork or b.cphone=c.pother)
group by b.bid;
Tables
mysql> show columns from customers_with_phone;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| pnum | int(11) | YES | | NULL | |
| pprim | text | YES | | NULL | |
| phome | text | YES | | NULL | |
| pwork | text | YES | | NULL | |
| pother | text | YES | | NULL | |
+--------+---------+------+-----+---------+-------+
mysql> show columns from buyers_orders_with_phone;
+--------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+------+------+-----+---------+-------+
| bid | text | YES | | NULL | |
| hphone | text | YES | | NULL | |
| wphone | text | YES | | NULL | |
| cphone | text | YES | | NULL | |
+--------+------+------+-----+---------+-------+
Explain
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 8673 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 75931 | 100.00 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
I realize that neither table has a primary key; these are only the columns I need to search on, extracted from their original tables. But using the original tables takes even longer because there is far more data to filter through.
I have other, similar queries that will run against much more data, so if I can make this one finish in a reasonable time, I can apply the same approach to the others.

A primary key is not an optimization. What you need are non-clustered indexes on your telephone text fields (one index per column). With these, you won't need to extract your data into separate tables.
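A minimal sketch of that suggestion, assuming phone numbers fit in 20 characters. It is shown against the extracted customers_with_phone table because that is the schema visible here; the same applies to the original tables. MySQL can only index TEXT columns with a prefix length, so the columns are converted to VARCHAR first, and the index names are made up:
ALTER TABLE customers_with_phone
  MODIFY pprim  VARCHAR(20),  -- TEXT can only take a prefix index,
  MODIFY phome  VARCHAR(20),  -- e.g. INDEX (pprim(20)), so convert
  MODIFY pwork  VARCHAR(20),  -- to VARCHAR if the data allows it
  MODIFY pother VARCHAR(20),
  ADD INDEX idx_pprim  (pprim),   -- one index per phone column
  ADD INDEX idx_phome  (phome),
  ADD INDEX idx_pwork  (pwork),
  ADD INDEX idx_pother (pother);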

The legacy query is awful, sorry: it is a full Cartesian product. The data structure cannot handle such queries effectively; you have three phone fields in one table and four in the other, and you are trying to figure out whether any pair matches.
Possibly a primary key plus a key on every phone column can improve this query, I'm not sure, but it can also make delete/insert/update performance worse.
By the way, you wrote that it is impossible to index a nullable column. That is not correct.
The only solution I really believe in is a radical one: change the data structure, or add some kind of caching mechanism maintained by triggers. But that is hard.
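A hedged sketch of the "change the data structure" idea: unpivot each table into one row per phone number, so the match becomes a single indexed equality join. All table and index names here are invented for illustration:
CREATE TABLE customer_phones (
  pnum  INT NOT NULL,
  phone VARCHAR(20) NOT NULL,
  INDEX idx_phone (phone)   -- nullable columns are indexable too, but the
);                          -- unpivot below simply skips NULLs

INSERT INTO customer_phones (pnum, phone)
SELECT pnum, pprim  FROM customers_with_phone WHERE pprim  IS NOT NULL
UNION ALL
SELECT pnum, phome  FROM customers_with_phone WHERE phome  IS NOT NULL
UNION ALL
SELECT pnum, pwork  FROM customers_with_phone WHERE pwork  IS NOT NULL
UNION ALL
SELECT pnum, pother FROM customers_with_phone WHERE pother IS NOT NULL;

-- With the same unpivot applied to the buyers side (say, a buyer_phones
-- table with columns bid and phone), the original tangle of ORs collapses
-- to one indexed join; COUNT(DISTINCT c.pnum) counts each matching
-- customer once even if several of their numbers match:
SELECT b.bid, COUNT(DISTINCT c.pnum)
FROM buyer_phones b
JOIN customer_phones c ON c.phone = b.phone
GROUP BY b.bid;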

Related

Index not used in query. How to improve performance?

I have this query:
SELECT
*
FROM
`av_cita`
JOIN `av_cita_cstm` ON (
(
`av_cita`.`id` = `av_cita_cstm`.`id_c`
)
)
WHERE
av_cita.deleted = 0
This query takes over 120 seconds to finish, yet I have added all indexes.
When I ask for the execution plan:
explain SELECT * FROM `av_cita`
JOIN `av_cita_cstm` ON ( ( `av_cita`.`id` = `av_cita_cstm`.`id_c` ) )
WHERE av_cita.deleted = 0;
I get this:
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
| 1 | SIMPLE | av_cita | ALL | PRIMARY,delete_index | NULL | NULL | NULL | 192549 | Using where |
| 1 | SIMPLE | av_cita_cstm | eq_ref | PRIMARY | PRIMARY | 108 | rednacional_v2.av_cita.id | 1 | |
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
delete_index is listed in the possible_keys column, but the key is null, and it doesn't use the index.
Table and index definitions:
+------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| id | char(36) | NO | PRI | NULL | |
| name | varchar(255) | YES | MUL | NULL | |
| date_entered | datetime | YES | MUL | NULL | |
| date_modified | datetime | YES | | NULL | |
| modified_user_id | char(36) | YES | | NULL | |
| created_by | char(36) | YES | MUL | NULL | |
| description | text | YES | | NULL | |
| deleted | tinyint(1) | YES | MUL | 0 | |
| assigned_user_id | char(36) | YES | MUL | NULL | |
+------------------+--------------+------+-----+---------+-------+
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| av_cita | 0 | PRIMARY | 1 | id | A | 192786 | NULL | NULL | | BTREE | | |
| av_cita | 1 | delete_index | 1 | deleted | A | 2 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | name_index | 1 | name | A | 96393 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | date_entered_index | 1 | date_entered | A | 96393 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | created_by | 1 | created_by | A | 123 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | assigned_user_id | 1 | assigned_user_id | A | 1276 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | deleted_id | 1 | deleted | A | 2 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | deleted_id | 2 | id | A | 192786 | NULL | NULL | | BTREE | | |
| av_cita | 1 | id | 1 | id | A | 192786 | NULL | NULL | | BTREE | | |
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
How can I improve the performance of this query?
The query is losing time making the join. I would strongly suggest creating an index on av_cita_cstm.id_c. The plan will then probably change to use that index for the av_cita_cstm table, which is much better than PRIMARY. As a consequence, PRIMARY will be used on av_cita.
I think that will bring a big improvement. You might get still more improvement if you make sure delete_index is defined with two fields, (deleted, id), and then move the WHERE condition of the SQL statement into the join condition. But I am not sure MySQL will see this as a possibility.
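A minimal sketch of those two suggestions; the index name is invented. Note that the SHOW INDEXES output above already lists deleted_id as (deleted, id), which is exactly the two-field shape described:
CREATE INDEX idx_av_cita_cstm_id_c ON av_cita_cstm (id_c);

-- The variant that moves the filter into the join condition:
SELECT *
FROM av_cita
JOIN av_cita_cstm ON av_cita.id = av_cita_cstm.id_c
                 AND av_cita.deleted = 0;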
The index on deleted is not used probably because the optimizer has decided that a full table-scan is cheaper than using the index. MySQL tends to make this decision if the value you search for is found on about 20% or more of the rows in the table.
By analogy, think of the index at the back of a book. You can understand why common words like "the" aren't indexed. It would be easier to just read the book cover-to-cover than to flip back and forth to the index, which only tells you that "the" appears on a majority of pages.
If you think MySQL has made the wrong decision, you can make it pretend that a table-scan is more expensive than using a specific index:
SELECT
*
FROM
`av_cita` FORCE INDEX (delete_index)
JOIN `av_cita_cstm` ON (
(
`av_cita`.`id` = `av_cita_cstm`.`id_c`
)
)
WHERE
av_cita.deleted = 0
Read http://dev.mysql.com/doc/refman/5.7/en/index-hints.html for more information about index hints. Don't overuse index hints; they're useful only in rare cases. Most of the time the optimizer makes the right decision.
Your EXPLAIN plan shows that your join to av_cita_cstm is already using a unique index (the clue is "type: eq_ref" and also the "rows: 1"). I don't think any new index is needed in that table.
I notice the EXPLAIN shows that the table-scan on av_cita scans an estimated 192549 rows. I'm really surprised that this takes 120 seconds. On any reasonably powerful computer, that should run much faster.
That makes me wonder if you have something else that needs tuning or configuration on this server:
What other processes are running on the server? A lot of applications, perhaps? Are the other processes also running slowly on this server? Do you need to increase the power of the server, or move applications onto their own server?
If you're on MySQL 5.7, try querying the sys schema:
select * from sys.innodb_buffer_stats_by_table
where object_name like 'av_cita%';
Are there other costly SQL queries running concurrently?
Did you under-allocate MySQL's innodb_buffer_pool_size? If it's too small, it could be furiously recycling pages in RAM as it scans your table.
select @@innodb_buffer_pool_size;
Did you over-allocate innodb_buffer_pool_size? Once I helped tune a server that was running very slowly. It turned out they had a 4GB buffer pool, but only 1GB of physical RAM. The operating system was swapping like crazy, causing everything to run slowly.
Another thought: You have shown us the columns in av_cita, but not the table structure for av_cita_cstm. Why are you fetching SELECT *? Do you really need all the columns? Are there huge BLOB/TEXT columns in the latter table? If so, it could be reading a large amount of data from disk that you don't need.
When you ask SQL questions, it would help if you run
SHOW CREATE TABLE av_cita\G
SHOW TABLE STATUS LIKE 'av_cita'\G
And also run the same commands for the other table av_cita_cstm, and include the output in your question above.

How can I make this UPDATE query faster?

I need to make this update query more efficient.
UPDATE #table_name# SET #column_name2# = 1 WHERE #column_name1# in (A list of data)
Right now it takes more than 2 minutes to finish when my list of data is quite large. Here is the result of EXPLAIN for this query:
+----+-------------+--------------+-------+---------------+---------+---------+------+--------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+---------+---------+------+--------+------------------------------+
| 1 | SIMPLE | #table_name# | index | NULL | PRIMARY | 38 | NULL | 763719 | Using where; Using temporary |
+----+-------------+--------------+-------+---------------+---------+---------+------+--------+------------------------------+
In class, I was told that an acceptable query should at least have a type of range, and that it is better to reach ref. Right now mine is index, which I think is the second slowest. I'm wondering if there's a way to optimize that.
Here is the table format:
+--------------------+-------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+-------------+------+-----+-------------------+-------+
| #column_name1# | varchar(12) | NO | PRI | | |
| #column_name2# | tinyint(4) | NO | | 0 | |
| #column_name3# | tinyint(4) | NO | | 0 | |
| ENTRY_TIME | datetime | NO | | CURRENT_TIMESTAMP | |
+--------------------+-------------+------+-----+-------------------+-------+
My friend suggested that using EXISTS rather than an IN clause may help. However, it looks like I cannot write EXISTS against a literal list such as exists (A list of data).
For this query:
UPDATE #table_name#
SET #column_name2# = 1
WHERE #column_name1# in (A list of data);
You want an index on #table_name#(#column_name1#).
Do note that the number of records being updated has a very big impact on performance. If the "list of data" is really a subquery, then other methods are likely to be more helpful for improving performance.
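When the list really is large, one hedged alternative (a common pattern, not something from the answer above) is to load the values into a temporary table and update through a join; all names here are hypothetical:
CREATE TEMPORARY TABLE ids_to_update (
  val VARCHAR(12) NOT NULL PRIMARY KEY   -- same type as #column_name1#
);
-- ... INSERT the list of data into ids_to_update ...

UPDATE #table_name# t
JOIN ids_to_update i ON i.val = t.#column_name1#
SET t.#column_name2# = 1;
This lets MySQL drive the update from the smaller table instead of checking each row against a long literal list.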

What index should increase performance of select query?

This is table structure:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| visitor_hash | varchar(40) | YES | MUL | NULL | |
| uri | varchar(255) | YES | | NULL | |
| ip_address | char(15) | YES | MUL | NULL | |
| last_visit | datetime | YES | | NULL | |
| visits | int(11) | NO | | NULL | |
| object_app | varchar(255) | YES | MUL | NULL | |
| object_model | varchar(255) | YES | | NULL | |
| object_id | varchar(255) | YES | | NULL | |
| blocked | tinyint(1) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
This is request:
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
The response time is ~77.63 ms.
CREATE INDEX resource_model ON visits_visit (object_model(100));
After adding this index, the response time increased to ~150 ms.
How to improve performance for this case? Thank you.
UPDATED:
In reply to Michal Komorowski:
This is explain before index:
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ALL | NULL | NULL | NULL | NULL | 142938 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)
And this is after index:
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ref | resource_model | resource_model | 303 | const | 64959 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
1 row in set (0.00 sec)
I don't know what this information tells me.
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
78.85 ms before indexing and 365.59 ms after indexing.
I also have this index:
CREATE INDEX resource ON visits_visit (object_app(100), object_model(100), object_id(100));
But I need that one, because other SELECT queries filter on these three columns in their WHERE clauses.
UPDATE:
I'm using the Django Debug Toolbar to measure request performance.
UPDATE:
Query:
ANALYZE TABLE visits_visit;
Output:
+-----------------------------+---------+----------+-----------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------------+---------+----------+-----------------------------+
| **************.visits_visit | analyze | status | Table is already up to date |
+-----------------------------+---------+----------+-----------------------------+
1 row in set (0.00 sec)
UPDATE:
SHOW INDEXES FROM visits_visit;
Output:
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visits_visit | 0 | PRIMARY | 1 | id | A | 142938 | NULL | NULL | | BTREE | | |
| visits_visit | 1 | visits_visit_0880babc | 1 | visitor_hash | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | visits_visit_5325a746 | 1 | ip_address | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 1 | object_app | A | 1 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 2 | object_model | A | 3 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 3 | object_id | A | 959 | 100 | NULL | YES | BTREE | | |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
It seems to me that although you have an index, MySQL doesn't know how to use it properly. This happens when information about the data distribution (statistics) within a table is not up to date. To update it, run ANALYZE TABLE visits_visit and then check the results.
I was confused by my own misunderstanding of SQL mechanics, so I decided to create a Popular model and save instances into it every 24 hours. Thanks to everyone who tried to help.
As I said in your other question, Prefix indexes are virtually useless; don't use them except in rare circumstances.
Shrink the fields to reasonable lengths and you won't be tempted to use Prefix indexes.
The optimal index for that query is INDEX(object_model, object_id). Attempting to use INDEX(object_model(##), ...) will not get past object_model to anything after it.
If object_model is things like 'News', I suspect the other possible values are short, and perhaps there is a finite number of models. For "short" change to some smaller VARCHAR. For "finite", consider using ENUM('News', 'Weather', 'Sports', ...).
As for why it took longer after indexing...
Without the index, the Optimizer had no choice but to scan the entire table. This is a simple linear scan. It would read but not count any non-News rows.
With the index, the Optimizer has the additional choice of using the index. But, perhaps most rows are News? Well, it would scan the index (nice), but for each News item in the index, it would have to look up the row to get object_id (not so nice). It seems (from the timings) that the latter is less efficient.
By shrinking the declarations and using INDEX(object_model, object_id) (in this order), the query can be performed in the index. Think of the index as a mini-table with just those two columns in it. It is smaller. It is ordered by model, so it only needs to scan the 'News' part. The explain will show this "covering" by saying "Using index".
In all cases, the GROUP BY adds some overhead -- either keeping a hash of object_id in RAM or saving intermediate results and sorting them. Then the ORDER BY requires a sort (or a priority hash) before the LIMIT can apply.
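A sketch of the shrink-and-cover suggestion from this answer; the VARCHAR lengths are guesses that would have to match the real data, and the index name is made up:
ALTER TABLE visits_visit
  MODIFY object_model VARCHAR(50),                -- shrink from VARCHAR(255)
  MODIFY object_id    VARCHAR(50),
  ADD INDEX model_id (object_model, object_id);   -- no prefix lengths, so the
                                                  -- query can be covered
With this in place, EXPLAIN should show "Using index" for the SELECT above.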

Splitting a database according to user ID

I have a database table with 5m rows; it keeps growing, and operations on it are getting harder and harder.
Is it a good idea to split the table into 10 tables (v0_table, v1_table, ..., v9_table), where the number (v*) is the first digit of the user's ID?
The user IDs in my case are not auto-increment, so this would spread the data evenly across those 10 tables.
The problem is that I have never done anything like this before....
Can anyone spot any disadvantages?
EDIT:
I would appreciate any help with tuning the structure or the query.
So the slowest query is the following one:
SELECT logos.user,
logos.date,
logos.level,
logos.title,
Count(guesses.id),
Sum(guesses.points)
FROM logos
LEFT JOIN guesses
ON guesses.user = '".$user['uid']."'
AND guesses.done = '1'
AND guesses.logo = logos.id
WHERE open = '1'
GROUP BY level
The guesses table:
+--------+------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| logo | int(11) | NO | MUL | NULL | |
| user | int(11) | NO | MUL | NULL | |
| date | timestamp | NO | | CURRENT_TIMESTAMP | |
| points | int(4) | YES | MUL | 100 | |
| done | tinyint(1) | NO | MUL | 0 | |
+--------+------------+------+-----+-------------------+----------------+
LOGOS table:
+-------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| img | varchar(222) | NO | MUL | NULL | |
| level | int(3) | NO | MUL | NULL | |
| date | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| user | int(11) | NO | MUL | NULL | |
| open | tinyint(1) | NO | MUL | 0 | |
+-------+--------------+------+-----+-------------------+----------------+
EXPLAIN:
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+
| 1 | SIMPLE | logos | ref | open | open | 1 | const | 521 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | guesses | ref | done,user,logo | user | 4 | const | 87 | |
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+
Your problem isn't that you have too much data, it's that this data is not properly indexed. Try adding an index:
CREATE INDEX open_level ON logos(open, level)
This should eliminate Using temporary; Using filesort on logos.
Basically, for this query you need an index on this table covering two things: open (for WHERE open = '1') and level (for GROUP BY level), in this order, as MySQL will first filter by open and then group the results by level (implicitly sorting by it in the process).
Short and sweet: No. This is never a good idea. Is your table properly indexed? Is MySQL properly tuned? Are your queries efficient? Are you using any caching?
Instead of sharding your table, you may want to examine other tables in your database to see if they can be split off into other DBs. For example, tables that are never joined to are great candidates for this type of vertical partitioning.
This allows you to optimize hardware for smaller sets of data.

MySQL: How do I speed up a "Count()" query with a "JOIN" and "order_by"?

I have the following two (simplified for the sake of example) tables in my MySQL db:
DESCRIBE appname_item;
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(200) | NO | | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
DESCRIBE appname_favorite;
+---------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | NULL | |
| item_id | int(11) | NO | MUL | NULL | |
+---------------+----------+------+-----+---------+----------------+
I'm trying to get a list of items ordered by the number of favorites. The query below works, however there are thousands of records in the Item table, and the query is taking up to a couple of minutes to complete.
SELECT `appname_item`.`id`, `appname_item`.`name`, COUNT(`appname_favorite`.`id`) AS `num_favorites`
FROM `appname_item`
LEFT OUTER JOIN `appname_favorite` ON (`appname_item`.`id` = `appname_favorite`.`item_id`)
GROUP BY `appname_item`.`id`, `appname_item`.`name`
ORDER BY `num_favorites` DESC;
Here are the results of EXPLAIN, which provides some insight as to why the query is so slow (type "ALL", "using temporary", and "using filesort" should all be avoided if possible.)
+----+-------------+--------------------+------+-----------------------------+-----------------------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+------+-----------------------------+-----------------------------+---------+-------------------------------+------+---------------------------------+
| 1 | SIMPLE | appname_item | ALL | NULL | NULL | NULL | NULL | 574 | Using temporary; Using filesort |
| 1 | SIMPLE | appname_favorite | ref | appname_favorite_67b70d25 | appname_favorite_67b70d25 | 4 | appname.appname_item.id | 1 | |
+----+-------------+--------------------+------+-----------------------------+-----------------------------+---------+-------------------------------+------+---------------------------------+
I know that the easiest way to optimize the query is to add an index, but I can't seem to figure out how to add an index for a COUNT() query that involves a JOIN and an ORDER BY. I should also mention that I am running this through the Django ORM, so I would prefer not to change the SQL query and instead fix and fine-tune the database to run the query in the most efficient way.
I've been trying to figure this out for a while, so any help would be much appreciated!
UPDATE
Here are the indexes that are already in the db:
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| appname_favorite | 0 | PRIMARY | 1 | id | A | 594 | NULL | NULL | | BTREE | |
| appname_favorite | 1 | appname_favorite_fbfc09f1 | 1 | user_id | A | 12 | NULL | NULL | | BTREE | |
| appname_favorite | 1 | appname_favorite_67b70d25 | 1 | item_id | A | 594 | NULL | NULL | | BTREE | |
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Actually, you can't avoid the filesort, because the count is determined at query time and is unknown to the index. The only solution I can imagine is to create a composite index for table appname_item, which may help a little or not at all, depending on your particular data:
ALTER TABLE appname_item ADD UNIQUE INDEX `item_id_name` (`id` ASC, `name` ASC);
There is nothing wrong with your query - it looks good.
It could be that the optimizer has out-of-date info about the table. Try running this:
ANALYZE TABLE <tableaname>;
for all tables involved.
First, regarding the COUNT() function, you can check this answer for more detail:
https://stackoverflow.com/a/2710630/1020600
For example, using MySQL, count(*) will be fast under a MyISAM table
but slow under an InnoDB. Under InnoDB you should use count(1) or
count(pk)
If your storage engine is MyISAM and you want to count rows (I guess so), count(*) is enough.
From your EXPLAIN, I found there is no key used for appname_item. If I add the condition
where `appname_item`.`id` = `appname_favorite`.`item_id`
then the "key" appears. so funny but it's work.
The final SQL looks like this:
explain SELECT `appname_item`.`id`, `appname_item`.`name`, COUNT(*) AS `num_favorites`
FROM `appname_item`
LEFT OUTER JOIN `appname_favorite` ON (`appname_item`.`id` = `appname_favorite`.`item_id`)
where `appname_item`.`id` = `appname_favorite`.`item_id`
GROUP BY `appname_item`.`id`, `appname_item`.`name`
ORDER BY `num_favorites` DESC;
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------+------+----------------------------------------------+
| id | select_type | table            | type   | possible_keys | key     | key_len | ref                           | rows | Extra                                        |
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | appname_favorite | index  | item_id       | item_id | 5       | NULL                          | 2312 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | appname_item     | eq_ref | PRIMARY       | PRIMARY | 4       | test.appname_favorite.item_id |    1 | Using where                                  |
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------+------+----------------------------------------------+
On my machine, appname_item has 1686 rows and appname_favorite has 2312 rows. The old SQL takes 15 to 23 ms; the new SQL takes 3.7 to 5.3 ms.