I am running a basic select on a table with 189,000 records. The table structure is:
items
id - primary key
ad_count - int, indexed
company_id - varchar, indexed
timestamps
The select query is:
select *
from `items`
where `company_id` is not null
and `ad_count` <= 100
order by `ad_count` desc, `items`.`id` asc
limit 50
On my production servers, just the MySQL portion of the execution takes 300-400 ms.
If I run an explain, I get:
select type: SIMPLE
table: items
type: range
possible_keys: items_company_id_index,items_ad_count_index
key: items_company_id_index
key_len: 403
ref: NULL
rows: 94735
Extra: Using index condition; Using where; Using filesort
When fetching this data in our application, we paginate it in groups of 50; the above query is "the first page".
I'm not too familiar with dissecting explain queries. Is there something I'm missing here?
An ORDER BY clause that mixes sort directions (one column ASC, another DESC) can force a temporary table and a filesort. MySQL up to and including v5.7 handles this poorly: an index cannot satisfy a mixed-direction sort, so indexing the ORDER BY columns does not help such a query.
Therefore, if the application's requirements allow, it's best to use the same direction for all columns in the ORDER BY clause.
So in this case:
order by `ad_count` desc, `items`.`id` asc
Will become:
order by `ad_count` desc, `items`.`id` desc
P.S. As a small tip to read more about: MySQL 8.0 supports descending indexes, so a mixed-direction ORDER BY like this one can use an index defined with matching column directions and may perform significantly better there.
Try replacing items_company_id_index with a multi-column index on (company_id, ad_count).
DROP INDEX items_company_id_index ON items;
CREATE INDEX items_company_id_ad_count_index ON items (company_id, ad_count);
This will allow it to use the index to test both conditions in the WHERE clause. Currently, it's using the index just to find non-null company_id, and then doing a full scan of those records to test ad_count. If most records have non-null company_id, it's scanning most of the table.
You don't need to keep the old index on just the company_id column: because of the way B-trees work, a multi-column index also serves as an index on any leading prefix of its columns.
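As a rough, runnable sketch of the prefix rule, the snippet below uses SQLite (through Python's sqlite3 module) purely as a stand-in for MySQL, so the planner and its EXPLAIN QUERY PLAN output are not what MySQL would print. The `query_plan` helper and the 'acme' value are made up for illustration; the table and index names follow the statements above.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a statement (not MySQL EXPLAIN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, ad_count INTEGER, company_id TEXT)"
)
conn.execute(
    "CREATE INDEX items_company_id_ad_count_index ON items (company_id, ad_count)"
)

# Filter on the leading column only: the composite index still applies.
prefix_plan = query_plan(conn, "SELECT * FROM items WHERE company_id = ?", ("acme",))

# Filter on both columns: both conditions are checked inside the same index.
both_plan = query_plan(
    conn,
    "SELECT * FROM items WHERE company_id = ? AND ad_count <= ?",
    ("acme", 100),
)

print(prefix_plan)
print(both_plan)
```

Both plans should name items_company_id_ad_count_index, which is the reason a separate single-column index on company_id becomes redundant.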
I could be wrong here (depending on your MySQL version this could be faster), but try an INNER JOIN with your company table.
Like:
Select *
From items
INNER JOIN companies ON companies.id = items.company_id
and items.ad_count <= 100
LIMIT 50;
Because you have many indexes, maintaining their B-trees slows the database down every time a row is inserted. Maybe remove the index on ad_count? (This depends on how often you use that column in queries.)
Mysql Version - 5.5.39
I have these two tables Bugs and BugStatus
I want to fetch the Open and Closed bug counts for a given user.
I am currently using this query
SELECT BugStatus.name,
count(BugStatus.name) AS count
FROM bugs
INNER JOIN BugStatus ON bugs.status = BugStatus.id
WHERE bugs.assignee='irakam'
GROUP BY BugStatus.name;
Now let's assume I am going to have 100,000 rows in my Bugs table. Does this query still hold up, or how should I modify it? I did use EXPLAIN but I am still confused. Can this query be optimised?
Select bs.name,
count(*) as count -- simply count(*) unless you are avoiding nulls
from bugs
inner join BugStatus AS bs ON bugs.status = bs.id
where bugs.assignee='irakam'
group by bs.name;
bugs: INDEX(assignee) -- since filtering occurs first
Index Cookbook
You can further optimize your table by creating an index on bugs.status and bugs.assignee:
CREATE INDEX idx_bugs_assignee_status on bugs(assignee, status);
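To try the index and the count query together, here is a small sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, a guessed minimal schema for bugs and BugStatus, and made-up sample rows, so it only illustrates the shape of the query and that the planner can answer the filter from the (assignee, status) index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BugStatus (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE bugs (
    id INTEGER PRIMARY KEY,
    assignee TEXT NOT NULL,
    status INTEGER NOT NULL REFERENCES BugStatus(id)
);
CREATE INDEX idx_bugs_assignee_status ON bugs (assignee, status);
INSERT INTO BugStatus (id, name) VALUES (1, 'Open'), (2, 'Closed');
-- Hypothetical sample data: three bugs for irakam, one for another user.
INSERT INTO bugs (assignee, status) VALUES
    ('irakam', 1), ('irakam', 1), ('irakam', 2), ('someone_else', 1);
""")

sql = """
SELECT bs.name, COUNT(*) AS count
FROM bugs
INNER JOIN BugStatus AS bs ON bugs.status = bs.id
WHERE bugs.assignee = ?
GROUP BY bs.name
"""

# Per-status counts for one assignee, collected into a dict.
counts = dict(conn.execute(sql, ("irakam",)).fetchall())

# SQLite's description of how it runs the query (differs from MySQL EXPLAIN).
plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("irakam",))
)

print(counts)
print(plan)
```

Since the query only touches assignee and status in bugs, an index holding both columns can satisfy the bugs side without reading the table rows.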
As far as the execution plan goes:
Select Type: Simple
This means you are executing a simple query, without any subqueries or unions.
Type: ALL
This means that a full-table scan is being done on the BugStatus table (every row is inspected). This should be avoided for large tables, but it is fine for BugStatus, since it only contains 2 rows.
Type: ref
This means all rows with the matching index values are read from the Bugs table, for each combination of rows found in BugStatus.
possible_keys
This lists out the possible indexes that might be used to answer your query (The primary key of BugStatus, and the foreign key on bugs.status)
Key
This is the actual index that the optimizer chose to answer your query (none in the case of the BugStatus table, since a full-table scan is being performed on it, and the foreign key on status in the case of the bugs table.)
ref
This shows the index that was used on the joined table to compare results.
rows
This column indicates the estimated number of rows MySQL expects to examine to execute the query.
extra: Using temporary; Using filesort
'Using temporary' means that MySQL needs to create a temporary table to hold the result, which happens here because of your GROUP BY clause.
'Using filesort' means MySQL has to do an extra pass over the rows to sort them, rather than reading them in index order.
extra: Using where
Means a WHERE condition is applied to filter the rows read from the table before they are passed on.
See: https://dev.mysql.com/doc/refman/5.5/en/explain-output.html
The following query is showing up in my log as not using an index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY YEAR(ceremonydate) DESC LIMIT 1;
Explain shows it's not using an index:
id: 1
select_type: SIMPLE
table: awardinfo
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 72
Extra: Using filesort
ordinal, ceremonydate both have an index. Are they not being used due to the yr alias? Is there a way to create an index on YEAR(ceremonydate) instead of just ceremonydate? Or is there a way to index an alias?
It is because of the YEAR() call in the ORDER BY, not the alias as such. ORDER BY can use an index only if it orders by something that is indexed. While ceremonydate may be indexed, YEAR(ceremonydate) turns the value into something different, and YEAR(ceremonydate) is not indexed.
And since you can't index an alias, this means that for an ORDER BY to use an index, it must name a simple column or a list of columns.
You should be able to do this and use the index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY ceremonydate DESC LIMIT 1;
Without knowing what your data looks like, that may work for you instead.
More info: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
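A quick way to convince yourself that the rewrite returns the same year is to run both forms side by side. The sketch below assumes SQLite (through Python's sqlite3) as a stand-in for MySQL, uses strftime('%Y', ...) in place of YEAR(), and invents three sample rows and the index name idx_ceremonydate; the plan text is SQLite's, not MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE awardinfo (ordinal INTEGER, ceremonydate TEXT);
CREATE INDEX idx_ceremonydate ON awardinfo (ceremonydate);
-- Hypothetical rows; two ceremonies fall in the same year on purpose.
INSERT INTO awardinfo VALUES (1, '1929-05-16'), (2, '1930-04-03'), (3, '1930-11-05');
""")

def plan(sql):
    """SQLite's plan description for a statement (not MySQL EXPLAIN)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# SQLite stand-in for MySQL's YEAR(ceremonydate).
year = "CAST(strftime('%Y', ceremonydate) AS INTEGER)"

by_expr = f"SELECT ordinal, {year} AS yr FROM awardinfo ORDER BY {year} DESC LIMIT 1"
by_column = f"SELECT ordinal, {year} AS yr FROM awardinfo ORDER BY ceremonydate DESC LIMIT 1"

expr_plan, column_plan = plan(by_expr), plan(by_column)
expr_row = conn.execute(by_expr).fetchone()
column_row = conn.execute(by_column).fetchone()

print(expr_plan)    # sorts the computed values
print(column_plan)  # walks the index on ceremonydate
print(expr_row, column_row)
```

Both queries report the same yr, because the latest date always carries the latest year. One difference to keep in mind: when several rows share that year, ordering by the expression may return any of them, while ordering by the column always returns the most recent ceremony.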
You can test this by running a simple select query and using the alias:
SELECT ordinal, ceremonydate as yr FROM ... and then add complexity back to your query to see where the index stops being used. Most likely, because you are ordering by YEAR(ceremonydate), MySQL has to compute the value for every row and sort the results (the Using filesort in your EXPLAIN). Your best bet is to process ceremonydate in your code. MySQL loses a lot of efficiency with inline computation like YEAR() in an ORDER BY because it cannot use the index on the underlying column.
I wish to fetch the last 10 rows from the table of 1 M rows.
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`updated_date` datetime NOT NULL,
PRIMARY KEY (`id`)
)
One way of doing this is -
select * from test order by -id limit 10;
10 rows in set (0.14 sec)
Another way of doing this is -
select * from test order by id desc limit 10;
10 rows in set (0.00 sec)
So I did an 'EXPLAIN' on these queries -
Here is the result for the query where I use 'order by desc'
EXPLAIN select * from test order by id desc limit 10;
And here is the result for the query where I use 'order by -id'
EXPLAIN select * from test order by -id limit 10;
I thought these would be the same, but it seems there are differences in the execution plan.
An RDBMS uses heuristics to build the execution plan; it cannot always determine that two statements are semantically equivalent, because that problem is too hard (in both theoretical and practical complexity).
So MySQL is not able to use the index, as you do not have an index on "-id", which is an expression applied to the field "id". It seems trivial, but an RDBMS must minimize the time spent computing plans, so it can get stuck on simple cases like this.
When no optimization (such as using an index) can be found for a query, the system falls back to the implementation that works in every case: a scan of the full table.
As you can see in Explain results,
1 : order by id
MySQL uses the index on id (the primary key), so it only needs to read 10 rows, which are already in order, and no filesort is needed.
2 : order by -id
MySQL cannot use the index on id for the expression -id, so it has to read all the rows (e.g. 455952), compute -id for each one, and sort them with filesort. So it will obviously take more time :)
You use ORDER BY with an expression that includes terms other than the key column name:
SELECT * FROM t1 ORDER BY ABS(key);
SELECT * FROM t1 ORDER BY -key;
You index only a prefix of a column named in the ORDER BY clause. In this case, the index cannot be used to fully resolve the sort order. For example, if you have a CHAR(20) column, but index only the first 10 bytes, the index cannot distinguish values past the 10th byte and a filesort will be needed.
The type of table index used does not store rows in order. For example, this is true for a HASH index in a MEMORY table.
These cases are taken from the MySQL manual: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
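The same effect can be reproduced in a few lines. This sketch assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and fills the test table with 1,000 made-up rows; the plan strings are SQLite's, but the principle is the same: an ORDER BY on an expression of the key cannot be read off the key's index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, updated_date TEXT NOT NULL)"
)
# 1,000 hypothetical rows; ids are assigned 1..1000.
conn.executemany(
    "INSERT INTO test (updated_date) VALUES (?)",
    [("2020-01-01 00:00:00",)] * 1000,
)

def plan(sql):
    """SQLite's plan description for a statement (not MySQL EXPLAIN)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

desc_sql = "SELECT * FROM test ORDER BY id DESC LIMIT 10"
neg_sql = "SELECT * FROM test ORDER BY -id LIMIT 10"

desc_plan, neg_plan = plan(desc_sql), plan(neg_sql)
desc_ids = [row[0] for row in conn.execute(desc_sql)]
neg_ids = [row[0] for row in conn.execute(neg_sql)]

print(desc_plan)  # reads the primary key backwards, no sort step
print(neg_plan)   # needs a separate sort step for the computed -id
print(desc_ids == neg_ids)
```

The two statements return the same ten rows; only the amount of work differs.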
I have this query that runs unbelievably slow (4 minutes):
SELECT * FROM `ad` WHERE `ad`.`user_id` = USER_ID ORDER BY `ad`.`id` desc LIMIT 20;
Ad table has approximately 10 million rows.
SELECT COUNT(*) FROM `ad` WHERE `ad`.`user_id` = USER_ID;
Returns a count of about 10k.
Table has following indexes:
PRIMARY KEY (`id`),
KEY `idx_user_id` (`user_id`,`status`,`sorttime`),
EXPLAIN gives this:
id: 1
select_type: SIMPLE
table: ad
type: index
possible_keys: idx_user_id
key: PRIMARY
key_len: 4
ref: NULL
rows: 4249
Extra: Using where
I fail to understand why it takes so long. This query is generated by an ORM (pagination), so it would be nice to optimize it from outside (maybe by adding an extra index).
BTW this query works fast:
select aa.*
from (select id from ad where user_id=USER_ID order by id desc limit 20) as a
join ad as aa on a.id = aa.id ;
Edit: I tried another user with far fewer rows (dozens) than the original one. I am wondering why the original query doesn't use idx_user_id:
EXPLAIN SELECT * FROM `ad` WHERE `ad`.`user_id` = ANOTHER_ID ORDER BY `ad`.`id` desc LIMIT 20;
id: 1
select_type: SIMPLE
table: ad
type: ref
possible_keys: idx_user_id
key: idx_user_id
key_len: 3
ref: const
rows: 84
Extra: Using where; Using filesort
Edit2: With Alexander's help I tried forcing MySQL to use the index I want, and the following query is much faster (1 sec instead of 4 mins):
SELECT *
FROM `ad` USE INDEX (idx_user_id)
WHERE `ad`.`user_id` = 1884774
ORDER BY `ad`.`id` desc LIMIT 20;
In the EXPLAIN output you can see that the key value is PRIMARY. This means the MySQL optimizer decided it is faster to walk the table rows in primary key order (they are already sorted by id) and pick out the first 20 rows with the given user_id than to use the idx_user_id key, which it considered as a possible key and then rejected.
In your second query the optimizer sees that the subquery needs only id values, and decides to use the idx_user_id index, as that index lets it compute the list of ids without touching the table itself. Then only 20 rows are fetched by primary key lookup, which is very fast for so few rows.
As your query with ANOTHER_ID shows, MySQL's wrong decision was based on the number of rows for the previous USER_ID value. That number was so large that the optimizer guessed it would find the first 20 rows with this user_id faster by reading the table rows themselves and skipping rows with other user_id values.
If table rows are accessed through an index, that requires random access. On a typical HDD, random access is about 100 times slower than a sequential scan. So for an index to be useful it must reduce the number of rows read to less than 1% of the total. If the rows for a specific USER_ID account for more than 1% of the table, a full table scan may be more efficient than the index if we want to retrieve all of those rows. But the MySQL optimizer doesn't take into account that only 20 of these rows will be retrieved. So it mistakenly decided not to use the index and to scan the table instead.
In order to make your query fast for any user_id value you can add one more index which will allow the query execution in the fastest way possible:
create index idx_user_id_2 on ad(user_id, id);
This index allows MySQL to do both the filtering and the sorting. For that to work, the columns used for filtering must come first and the columns used for ordering second. MySQL should be smart enough to use this index, because it lets the query read exactly the needed rows in order, without skipping any.
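Here is a small sketch of the "filter column first, sort column second" idea. It assumes SQLite (through Python's sqlite3) as a stand-in for MySQL, a cut-down ad table with a hypothetical title column, and 100 invented rows split between two users, so the plan output is SQLite's and only illustrates the principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ad (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    status INTEGER,
    sorttime INTEGER,
    title TEXT  -- hypothetical extra column, not in the original post
);
CREATE INDEX idx_user_id ON ad (user_id, status, sorttime);
""")
# Hypothetical data: ids 1..100, even ids belong to user 1, odd ids to user 0.
conn.executemany(
    "INSERT INTO ad (user_id, status, sorttime, title) VALUES (?, 1, 0, 'x')",
    [(n % 2,) for n in range(100)],
)

def plan(sql, params):
    """SQLite's plan description for a statement (not MySQL EXPLAIN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)

sql = "SELECT * FROM ad WHERE user_id = ? ORDER BY id DESC LIMIT 20"

before = plan(sql, (1,))
conn.execute("CREATE INDEX idx_user_id_2 ON ad (user_id, id)")
after = plan(sql, (1,))
ids = [row[0] for row in conn.execute(sql, (1,))]

print(before)
print(after)  # uses idx_user_id_2 and needs no separate sort step
print(ids)
```

With the new index the rows for one user are already stored in id order, so the engine can stop after reading 20 index entries.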
I have a table 1 and table 2.
Table 1
PARTNUM - ID_BRAND
partnum is the primary key
id_brand is "indexed"
Table 2
ID_BRAND - BRAND_NAME
id_brand is the primary key
brand_name is "indexed"
Table 1 contains 1 million records and table 2 contains 1,000 records.
I'm trying to optimize a query using EXPLAIN, and after a lot of attempts I have reached a dead end.
EXPLAIN
SELECT pm.partnum, pb.brand_name
FROM products_main AS pm
LEFT JOIN products_brands AS pb ON pm.id_brand=pb.id_brand
ORDER BY pb.brand_name ASC
LIMIT 0, 10
The query returns this execution plan:
ID, SELECT_TYPE, TABLE, TYPE, POSSIBLE_KEYS, KEY, KEY_LEN , REF, ROWS, EXTRA
1, SIMPLE, pm, range, PRIMARY, PRIMARY, 1, , 1000000, Using where; Using temporary; Using filesort
1, SIMPLE, pb, ref, PRIMARY, PRIMARY, 4, demo.pm.id_pbrand, 1,
The MySQL query optimizer shows a temporary + filesort in the execution plan.
How can I avoid this?
The "EVIL" is in the ORDER BY pb.brand_name ASC. Ordering by that external field seems to be the bottleneck.
First of all, I question the use of an outer join, seeing as the ORDER BY operates on the right-hand side, and the NULLs injected by the left join are likely to play havoc with it.
Regardless, the simplest approach to speeding up this query would be a covering index on pb.id_brand and pb.brand_name. This will allow the ORDER BY to be evaluated 'using index' along with the join condition. The alternative is to find some way to reduce the size of the intermediate result passed to the ORDER BY.
Still, the combination of outer-join, order-by, and limit, leaves me wondering what exactly you are querying for, and if there might not be a better way of expressing the query itself.
Try replacing the join with a subquery. MySQL's optimizer kind of sucks; subqueries often give better performance than joins.
First, try changing your index on the products_brands table. Delete the existing one on brand_name, and create a new one:
ALTER TABLE products_brands ADD INDEX newIdx (brand_name, id_brand)
Then, the table will already have an "orderedByBrandName" index with the ids you need for the join, and you can try:
EXPLAIN
SELECT pb.brand_name, pm.partnum
FROM products_brands AS pb
LEFT JOIN products_main AS pm ON pb.id_brand = pm.id_brand
ORDER BY pb.brand_name ASC -- keep the ORDER BY: row order is not guaranteed without it
LIMIT 0, 10
Note that I also changed the order of the tables in the query, so you start with the small one.
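To see the effect of driving the join from the small table, here is a sketch with the ORDER BY on brand_name kept explicit. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, integer keys, a hypothetical index name idx_main_brand for the existing index on products_main.id_brand, and a handful of made-up rows; the plan text is SQLite's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products_brands (id_brand INTEGER PRIMARY KEY, brand_name TEXT NOT NULL);
CREATE TABLE products_main (partnum INTEGER PRIMARY KEY, id_brand INTEGER);
CREATE INDEX idx_main_brand ON products_main (id_brand);
CREATE INDEX newIdx ON products_brands (brand_name, id_brand);
-- Hypothetical sample data.
INSERT INTO products_brands VALUES (1, 'Zeta'), (2, 'Acme'), (3, 'Mint');
INSERT INTO products_main VALUES (10, 1), (11, 2), (12, 2), (13, 3);
""")

sql = """
SELECT pb.brand_name, pm.partnum
FROM products_brands AS pb
LEFT JOIN products_main AS pm ON pb.id_brand = pm.id_brand
ORDER BY pb.brand_name ASC
LIMIT 10
"""

rows = conn.execute(sql).fetchall()
brand_order = [row[0] for row in rows]

# SQLite's description of how it runs the query (differs from MySQL EXPLAIN).
plan = " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(brand_order)
print(plan)  # brands are read in index order, so no separate sort step
```

Keep in mind that this form lists brands that have no products and drops products that have no brand, which is the opposite of the original LEFT JOIN, so check that this matches what you actually need.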
This question is somewhat outdated, but I did find it, and so will other people.
MySQL uses a temporary table if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join order.
So you just need to control the join order yourself with STRAIGHT_JOIN, bypassing the order chosen by the optimizer:
SELECT STRAIGHT_JOIN pm.partnum, pb.brand_name
FROM products_brands AS pb
RIGHT JOIN products_main AS pm ON pm.id_brand=pb.id_brand
ORDER BY pb.brand_name ASC
LIMIT 0, 10
Also make sure that the max_heap_table_size and tmp_table_size variables are set large enough to hold the results:
SET global tmp_table_size=100000000;
SET global max_heap_table_size=100000000;
-- 100 megabytes in this example. These can be set in my.cnf config file, too.