I'm looking for suggestions or optimizations.
Table definition:
CREATE TABLE IF NOT EXISTS MilestonesAndFlags(
    id SERIAL,
    site_id BIGINT,
    milestone BIGINT,
    value BIGINT,
    timestamp BIGINT,
    timestamp_confirmation BIGINT,
    comment TEXT,
    INDEX(site_id),
    INDEX(milestone),
    INDEX(milestone, site_id)
);
In this table I store different milestones, with timestamps (to allow a historical view of any changes), per site. The table has about a million rows at the moment.
The problem occurs when I try to get the latest milestone value per site using queries like:
SELECT site_id,
value
FROM MilestonesAndFlags
WHERE id IN
(SELECT max(id)
FROM MilestonesAndFlags
WHERE milestone=1
GROUP BY milestone,
site_id);
This query takes more than 5 minutes to execute on my PC.
The EXPLAIN output seems OK:
+----+--------------------+--------------------+------+-----------------------+-------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+------+-----------------------+-------------+---------+-------+--------+--------------------------+
| 1 | PRIMARY | MilestonesAndFlags | ALL | NULL | NULL | NULL | NULL | 1111320| Using where |
| 2 | DEPENDENT SUBQUERY | MilestonesAndFlags | ref | milestone,milestone_2 | milestone_2 | 9 | const | 180660| Using where; Using index |
+----+--------------------+--------------------+------+-----------------------+-------------+---------+-------+--------+--------------------------+
Any suggestions for a better query or table structure?
MySQL >= 5.5
I'll take a shot and propose that you use a derived (aliased temporary) table instead of the WHERE clause with the dependent subquery. I'm not sure whether MySQL optimizes this or runs the subquery for every row of the outer query.
It would be very interesting if you ran the queries on large data sets and came back with your results.
Example:
SELECT *
FROM MilestonesAndFlags AS MF,
(SELECT max(id) AS id
FROM MilestonesAndFlags
WHERE milestone=1
GROUP BY milestone,
site_id) AS MaxMF
WHERE MaxMF.id = MF.id;
SQLFiddle: http://sqlfiddle.com/#!2/a0d628/10
Pros and cons:
Pro:
Avoids the dependent subquery.
Con:
The join performs a projection and selection: all rows of the temporary table are "multiplied" with the rows of the original table, and only then does the WHERE condition filter.
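As a further sketch (untested here), the same groupwise maximum can also be written as a self-join that avoids the subquery entirely; the column names are taken from the question's schema:

```sql
-- Keep only rows for which no later row exists
-- for the same (milestone, site_id) pair.
SELECT m.site_id, m.value
FROM MilestonesAndFlags AS m
LEFT JOIN MilestonesAndFlags AS newer
       ON newer.milestone = m.milestone
      AND newer.site_id   = m.site_id
      AND newer.id        > m.id
WHERE m.milestone = 1
  AND newer.id IS NULL;
```

Whether this beats the derived-table form depends on the MySQL version and on the (milestone, site_id) index, so it is worth comparing EXPLAIN output for both.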
Update
I suspect the MySQL version also plays a major role in which optimizations are applied.
Below are the EXPLAIN results from two different MySQL versions; one marks the subquery as DEPENDENT and the other does not.
MySQL 5.5.32
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA
1 PRIMARY MilestonesAndFlags ALL (null) (null) (null) (null) 29 Using where; Using filesort
2 DEPENDENT SUBQUERY MilestonesAndFlags ref milestone,milestone_2 milestone_2 9 const 15 Using where; Using index
http://sqlfiddle.com/#!2/a0d628/11
MySQL 5.6.6-m9
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA
1 PRIMARY MilestonesAndFlags ALL (null) (null) (null) (null) 29 Using where; Using filesort
2 SUBQUERY MilestonesAndFlags ref milestone,milestone_2 milestone_2 9 const 15 Using where; Using index
http://sqlfiddle.com/#!9/a0d62/2
Related
I'm having trouble optimizing what I think is a reasonable / straightforward query in MySQL. After spending a few late nights on this, I thought I'd post my question here as I'm sure the solution will be obvious to somebody.
Here are the columns in a simplified version of my table T:
id: varchar(32) not null (primary key)
timestamp: bigint(20) unsigned not null
family: char(32) not null
size: bigint(20) unsigned not null
And here's the query that I need to optimize:
select
id
from
T
where
family = 'some constant'
and
timestamp between T1 and T2
order by
size desc
limit
5
My table is fairly large (~630M rows) so I'm hoping that an index can do most of the work for me... but I'm having trouble picking the right columns for my index.
It seems that in order for MySQL to use an index to answer a range query (like the one I'm doing with the timestamp), that column must be the last column in the index. But then there's the ORDER BY, which is on a different column. I'm not sure which of these columns should come last in my index, so I've tried creating the following indices:
i1, on (family, timestamp, size)
i2, on (family, size, timestamp)
... but neither of these seems to help.
Any idea what I'm doing wrong?
(BTW I'm running MySQL 8 in Amazon RDS, in case that makes a difference.)
Thanks in advance for any helpful suggestions you may have!
EDIT #1 ---------------------------------------
I just created this simplified table that I described above, and copied 10M rows worth of data from the original table to the simplified table, just to keep things clean. Then I ran the following query:
mysql> select
-> id, size
-> from
-> T
-> where
-> family = 'be0bf4a203797729f38c6355b6d80903'
-> and
-> timestamp between 1578460425887 and 1584710866343
-> order by
-> size desc;
... and it took 1.27 seconds. I really need this to be faster; otherwise this sort of query (which I need to run many times per second) will take much too long on the real dataset.
Here are the results of an EXPLAIN on the query above:
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
| 1 | SIMPLE | T | NULL | range | idx_family_timestamp_size | idx_family_timestamp_size | 136 | NULL | 178324 | 100.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
I bet it's the filesort that's killing performance. Any ideas?
EDIT #2 ---------------------------------------
Oops, I just realized that I forgot the LIMIT in EDIT #1's query. I've fixed that here, and also grew T to 100M rows -- so now it's 10x the size that it was in my previous edit.
Now my query takes almost 10 sec. to run, and the results of the EXPLAIN are as follows:
mysql> explain
-> select
-> id, size
-> from
-> T
-> where
-> family = 'be0bf4a203797729f38c6355b6d80903'
-> and
-> timestamp between 1578460425887 and 1584710866343
-> order by
-> size desc
-> limit
-> 5;
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+
| 1 | SIMPLE | T | NULL | range | idx_family_timestamp_size,idx_family_size_timestamp | idx_family_size_timestamp | 144 | NULL | 410744 | 100.00 | Using where; Using index for skip scan; Using filesort |
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+
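For reference, the two candidate indexes described earlier can be compared head-to-head by forcing each one in turn (a sketch; the literal values are the ones from the query above):

```sql
-- i1 and i2 as defined earlier in the question.
CREATE INDEX i1 ON T (family, timestamp, size);
CREATE INDEX i2 ON T (family, size, timestamp);

-- Force each index and compare the plans and timings.
EXPLAIN SELECT id, size FROM T FORCE INDEX (i1)
WHERE family = 'be0bf4a203797729f38c6355b6d80903'
  AND timestamp BETWEEN 1578460425887 AND 1584710866343
ORDER BY size DESC
LIMIT 5;
```

Repeating the EXPLAIN with FORCE INDEX (i2) makes it easy to see which plan avoids the filesort (if either does) on the real data distribution.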
I have the following MySQL query:
SELECT statusdate,consigneenamee,weight,productcode,
pieces,statustime,statuscode,statusdesc,generatingiata,
shipmentdate,shipmenttime,consigneeairportcode,signatory
FROM notes
where (shipperref='180908184' OR shipperref='180908184'
OR shipperref='180908184' OR shipperref='180908184 '
OR shipperref like 'P_L_%180908184')
order by edicheckpointdate asc, edicheckpointtime asc;
I added an index to speed up this query using MySQL Workbench, but when I run EXPLAIN, the key column still shows NULL:
+----+-------------+---------------+------+---------------+------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | dhltracking_2 | ALL | index2 | NULL | NULL | NULL | 3920874 | Using where; Using filesort |
+----+-------------+---------------+------+---------------+------+---------+------+---------+-----------------------------+
Any reason why this is happening and how I can speed up this query?
My Index:
You have a LIKE clause in your query, and I think your index spans more than 20-30% of the table's rows (or more), which is why MySQL may ignore it for performance reasons.
My proposal:
Add FORCE INDEX, as @Solarflare proposes
Use a FULLTEXT index (it works on CHAR and VARCHAR too) with a MATCH ... AGAINST search (https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html)
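A minimal sketch of the FULLTEXT variant, assuming shipperref is a CHAR/VARCHAR column (the index name is illustrative):

```sql
ALTER TABLE notes ADD FULLTEXT INDEX ft_shipperref (shipperref);

SELECT statusdate, consigneenamee, weight, productcode
FROM notes
WHERE MATCH(shipperref) AGAINST ('+180908184' IN BOOLEAN MODE)
ORDER BY edicheckpointdate ASC, edicheckpointtime ASC;
```

Note that FULLTEXT matching is token-based, so it is not a drop-in replacement for the LIKE 'P_L_%180908184' pattern; verify that the result sets match before switching.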
Please help me!
How do I optimize a query like:
SELECT idu
FROM `user`
WHERE `username`!='manager'
AND `username`!='user1#yahoo.com'
ORDER BY lastdate DESC
This is the explain:
explain SELECT idu FROM `user` WHERE `username`!='manager' AND `username`!='user1#yahoo.com' order by lastdate DESC;
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | user | ALL | username,username-lastdate | NULL | NULL | NULL | 208478 | Using where; Using filesort |
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)
I want to avoid the filesort in a big database.
Since this query is scanning all rows anyway, you need an index on lastdate to keep MySQL from having to sort the results itself (the filesort, which isn't necessarily done on disk or in a temp table).
For super read performance, add the following multi-column "covering" index:
user(lastdate, username, idu)
A "covering" index would allow MySQL to just scan the index instead of the actual table data.
If using InnoDB and any of the above columns are your primary key, you don't need it in the index.
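As DDL, the covering index proposed above would look like this (the index name is illustrative):

```sql
CREATE INDEX idx_user_lastdate_covering
    ON `user` (lastdate, username, idu);
```

With this in place, EXPLAIN should show "Using index" and no filesort, since MySQL can read the rows in lastdate order (backwards, for DESC) directly from the index.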
I have a query:
select SQL_NO_CACHE id from users
where id>1 and id <1000
and id in ( select owner_id from comments and content_type='Some_string');
(note that this is a shortened version of the actual large query used for my Sphinx indexing, reduced to show the problem)
This query takes about 3.5 seconds (widening the range to id = 1..5000 makes it about 15 seconds).
The users table has about 35000 entries and the comments table has about 8000 entries.
Explain on above query:
explain select SQL_NO_CACHE id from users
where id>1 and id <1000
and id in ( select distinct owner_id from d360_core_comments);
| id | select_type        | table              | type  | possible_keys | key     | key_len | ref  | rows | Extra                        |
| 1  | PRIMARY            | users              | range | PRIMARY       | PRIMARY | 4       | NULL | 1992 | Using where; Using index     |
| 2  | DEPENDENT SUBQUERY | d360_core_comments | ALL   | NULL          | NULL    | NULL    | NULL | 6901 | Using where; Using temporary |
where the subquery on its own (select owner_id from d360_core_comments where content_type='Community20::Topic') takes almost 0.0 seconds.
However, if I add an index on owner_id,content_type (note the order here):
create index tmp_user on d360_core_comments (owner_id,content_type);
My subquery still runs in ~0.0 seconds on its own, with NO index used:
mysql> explain select owner_id from d360_core_comments where
content_type='Community20::Topic';
| id | select_type | table              | type | possible_keys | key  | key_len | ref  | rows | Extra       |
| 1  | SIMPLE      | d360_core_comments | ALL  | NULL          | NULL | NULL    | NULL | 6901 | Using where |
However, my main query (select SQL_NO_CACHE id from users where id>1 and id <1000 and id in ( select owner_id from d360_core_comments where content_type='Community20::Topic');)
now runs in ~0 seconds with the following explain:
mysql> explain select SQL_NO_CACHE id from users where id>1 and id
<1000 and id in ( select owner_id from d360_core_comments where
content_type='Community20::Topic');
| id | select_type        | table              | type           | possible_keys | key      | key_len | ref  | rows | Extra                    |
| 1  | PRIMARY            | users              | range          | PRIMARY       | PRIMARY  | 4       | NULL | 1992 | Using where; Using index |
| 2  | DEPENDENT SUBQUERY | d360_core_comments | index_subquery | tmp_user      | tmp_user | 5       | func | 34   | Using where              |
So the main questions I have are:
If the index defined on the subquery's table is not used when I run the subquery on its own, how is it optimizing the full query here?
And why was the first query so slow in the first place, when the subquery and the main query are each much faster independently?
What seems to happen in the full query without the index is that MySQL builds (some sort of) temporary table of all the owner_id values that the subquery generates. Then, for each row from the users table that matches the id constraint, a lookup in this temporary construct is performed. It is unclear whether the overhead is in creating the temporary construct, or whether the lookup is implemented suboptimally (so that all elements are linearly scanned for each row from the outer query).
When you create the index on (owner_id, content_type), this changes nothing when you run only the subquery, because the subquery has no condition on owner_id, and content_type is not the leading column of the index.
However, when you run the full query with the index, there is more information available: we now have values coming from the outer query that can be matched against owner_id, which is covered by the index. So the execution now appears to be: run the first part of the outer query, and for each matching row do an index lookup by owner_id. In other words, a possible execution plan is:
From Index-Users-Id get all id matching id > 1 AND id < 1000
For each such row:
    Include the row if Index-Comment-OwnerId contains row.id
        and the row matches content_type = 'Some_string'
So in this case, running the ~1000 index lookups is faster than building a temporary construct of the 8000 possible owner_id values. But this is only a hypothesis, since I don't know MySQL internals very well.
If you read this section of the MySQL Reference Manual: Optimizing Subqueries with EXISTS Strategy, you'll see that the query optimizer transforms your subquery condition from:
id in ( select distinct owner_id
from d360_core_comments
where content_type='Community20::Topic')
into:
exists ( select 1
from d360_core_comments
where content_type='Community20::Topic'
and owner_id = users.id )
This is why an index on (owner_id, content_type) is not useful when the subquery is tested as a standalone query, but is useful when you consider the transformed subquery.
The first thing you should know is that MySQL cannot optimize dependent subqueries; this is a long-standing, well-known MySQL deficiency that was slated to be fixed in MySQL 6.x (just google "mysql dependent subquery" and you will see). That is, the subquery is basically executed for each matching row in the users table. Since you have an additional condition, the overall execution time depends on that condition. The solution is to substitute a join for the subquery (the very optimization that you expect from MySQL under the hood).
Second, there is a syntax error in your subquery, and I think there was supposed to be a condition on owner_id. Thus, when you add an index on owner_id it is used, but it is not enough for the second condition (hence no "Using index"); why it is not mentioned in EXPLAIN at all is a question (I think it's because of the condition on users.id).
Third, I do not know why you need the id > 1 and id < 5000 conditions, but you should understand that these are two range conditions, which require a very accurate, sometimes non-obvious and data-dependent indexing approach (as opposed to equality conditions). If you do not actually need them, and use them only to understand why the query takes so long, then that was a bad idea and they shed no light.
In case, the conditions are required and the index on owner_id is still there, I would rewrite the query as follows:
SELECT id
FROM (
SELECT owner_id as id
FROM comments
WHERE owner_id < 5000 AND content_type = 'some_string'
) as ids
JOIN users USING (id)
WHERE id > 1;
P.S. A composite index on (content_type, owner_id) will even be better for the query.
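The composite index from the P.S., written out as DDL (the index name is illustrative):

```sql
CREATE INDEX comments_content_owner
    ON d360_core_comments (content_type, owner_id);
```

With content_type first, the equality condition can use the index prefix, and the matching owner_id values can be read directly from the index, already filtered.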
Step 1: Use id BETWEEN x AND y instead of id >= x AND id <= y. You may find some surprising gains because it indexes better.
Step 2: Adjust your sub-SELECT to do the filtering so it doesn't have to be done twice:
SELECT SQL_NO_CACHE id
FROM users
WHERE id IN (SELECT owner_id
FROM comments
WHERE content_type='Some_string'
AND owner_id BETWEEN 1 AND 1000);
There seem to be several errors in your statement. You're selecting 2 through 999, for instance (presumably off by one on both ends), and the subselect wasn't valid.
Please consider the following schema:
CREATE TABLE articles (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    cat_id INT UNSIGNED NOT NULL,
    status INT UNSIGNED NOT NULL,
    date_added DATETIME,
    PRIMARY KEY (id)
) ENGINE = InnoDB;
CREATE INDEX cat_list_INX ON articles (cat_id, status, date_added);
CREATE INDEX categories_list_INX ON articles (cat_id, status);
I have written the following two queries, which should be completely satisfied by the two indices above, but MySQL is still putting "Using where" in the Extra column.
mysql> EXPLAIN SELECT cat_id FROM articles USE INDEX (cat_list_INX) WHERE cat_id=158 AND status=2 ORDER BY date_added DESC LIMIT 500, 5;
+----+-------------+----------+------+---------------+--------------+---------+-------------+-------+--------------------------+
| id | select_type | table    | type | possible_keys | key          | key_len | ref         | rows  | Extra                    |
+----+-------------+----------+------+---------------+--------------+---------+-------------+-------+--------------------------+
| 1 | SIMPLE | articles | ref | cat_list_INX | cat_list_INX | 5 | const,const | 50698 | Using where; Using index |
+----+-------------+----------+------+---------------+--------------+---------+-------------+-------+--------------------------+
mysql> EXPLAIN SELECT cat_id FROM articles USE INDEX (categories_list_INX) WHERE cat_id=158 AND status=2;
+----+-------------+----------+------+---------------------+---------------------+---------+-------------+-------+--------------------------+
| id | select_type | table    | type | possible_keys       | key                 | key_len | ref         | rows  | Extra                    |
+----+-------------+----------+------+---------------------+---------------------+---------+-------------+-------+--------------------------+
| 1 | SIMPLE | articles | ref | categories_list_INX | categories_list_INX | 5 | const,const | 52710 | Using where; Using index |
+----+-------------+----------+------+---------------------+---------------------+---------+-------------+-------+--------------------------+
As far as I know, "Using where" requires an additional disk seek. Why is it not just "Using index"?
The first query is filtering records at the MySQL level, outside of the storage engine, because of your ORDER BY clause on the date_added field.
This can be mitigated by moving the date_added field first in the index, like this:
CREATE INDEX cat_list_INX ON articles (date_added, cat_id, status);
For the 2nd query, my version of MySQL is not showing "Using where" (I would not expect it to, either); maybe it's because I have no records.
mysql> EXPLAIN SELECT cat_id FROM articles USE INDEX (categories_list_INX) WHERE cat_id=158 AND status=2;
+----+-------------+----------+------+---------------------+---------------------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------------+---------------------+---------+-------------+------+-------------+
| 1 | SIMPLE | articles | ref | categories_list_INX | categories_list_INX | 8 | const,const | 1 | Using index |
+----+-------------+----------+------+---------------------+---------------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
Extra column info from High Performance MySQL:
Using Index: This indicates that MySQL will use a covering index to avoid accessing the table. Don't confuse covering indexes with the index access type.
Using Where: This means the MySQL server will post-filter rows after the storage engine retrieves them. Many WHERE conditions that involve columns in an index can be checked by the storage engine when (and if) it reads the index, so not all queries with a WHERE clause will show "Using where". Sometimes the presence of "Using where" is a hint that the query can benefit from different indexing.