MySQL query optimization strategy on a single table with a covering index

I am currently working on MySQL query optimization. My MySQL table contains 200 million records.
After a lot of searching I finally decided to use a covering index.
So I created an index with the columns in this order:
alter table table_name add index
index_name(MODEL, COUNTRY, REGION, NETWORK, E_TUAL,
ECHS, DEVID, COUNTRY_CODE, SOURCE);
When I run this query, performance is good compared to before:
SELECT E_TUAL,ECHS, DEVID, MODEL, COUNTRY, REGION, COUNTRY_CODE, NETWORK, SOURCE
FROM table_name
WHERE model = 'fox | s453' AND country = 'india' AND
E_TUAL <= '1435755600' AND
E_TUAL >= '1433163600'
ORDER BY E_TUAL DESC
LIMIT 101 OFFSET 0;
+----+-------------+-----------+------+---------------+--------+---------+-------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+--------+---------+-------------+------+-----------------------------+
| 1 | SIMPLE | mytable | ref | genrl | genrl | 131 | const | 239 | Using where; Using filesort |
+----+-------------+-----------+------+---------------+--------+---------+-------------+------+-----------------------------+
But this query is worse than it was before I added the index:
SELECT E_TUAL,ECHS, DEVID, MODEL,COUNTRY, REGION, COUNTRY_CODE, NETWORK, SOURCE
FROM table_name
WHERE model = 'fox | s453' AND
country is not null AND
E_TUAL <= '1435755600' AND
E_TUAL >= '1433163600'
ORDER BY E_TUAL DESC
LIMIT 101 OFFSET 0;
+----+-------------+-----------+-------+---------------+--------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+--------+---------+------+------+-----------------------------+
| 1 | SIMPLE | mytable | range | genrl | genrl | 131 | NULL | 1105 | Using where; Using filesort |
+----+-------------+-----------+-------+---------------+--------+---------+------+------+-----------------------------+
I wrote country is not null to keep the column order, so that the MySQL optimizer uses the index.
Please help: how can I improve efficiency when some of the columns are absent from the WHERE clause,
and what can I do to make use of the index?
I am not good at English, so I am sorry for any mistakes.

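Dispense with the covering index and use two separate indexes. The index names below are arbitrary, but each index on the table needs its own name. For the first query:
alter table table_name add index idx_model_country_etual(MODEL, COUNTRY, E_TUAL);
And for the second query:
alter table table_name add index idx_model_etual(MODEL, E_TUAL);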
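A covering index might provide marginal improvement, but it will use a lot of space. Instead, build the index from the columns in the WHERE clause first, and then add the ORDER BY column if the index can serve it.
As a note: your ordering of the columns was not optimal for either query. REGION and NETWORK sit between COUNTRY and E_TUAL in your index, so the range condition on E_TUAL cannot be used to narrow the index scan.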
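A minimal sketch of how the two narrower indexes could be checked once they exist. It assumes the COUNTRY IS NOT NULL filter was only there to steer the optimizer and can be dropped from the second query; the actual EXPLAIN output depends on the data and the MySQL version.

```sql
-- First query: the equality columns (MODEL, COUNTRY) lead the index,
-- and the range/sort column (E_TUAL) comes last.
EXPLAIN
SELECT E_TUAL, ECHS, DEVID, MODEL, COUNTRY, REGION, COUNTRY_CODE, NETWORK, SOURCE
FROM table_name
WHERE MODEL = 'fox | s453' AND COUNTRY = 'india'
  AND E_TUAL <= '1435755600'
  AND E_TUAL >= '1433163600'
ORDER BY E_TUAL DESC
LIMIT 101 OFFSET 0;

-- Second query: no COUNTRY condition, so an index on (MODEL, E_TUAL) fits.
-- Assumption: the COUNTRY IS NOT NULL filter is not needed for correctness.
EXPLAIN
SELECT E_TUAL, ECHS, DEVID, MODEL, COUNTRY, REGION, COUNTRY_CODE, NETWORK, SOURCE
FROM table_name
WHERE MODEL = 'fox | s453'
  AND E_TUAL <= '1435755600'
  AND E_TUAL >= '1433163600'
ORDER BY E_TUAL DESC
LIMIT 101 OFFSET 0;
```

The thing to look for is whether "Using filesort" is gone from the Extra column, since the index already delivers rows in E_TUAL order within a fixed MODEL (and COUNTRY).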

Related

Why does an indexed mysql query filtered on less char values result in more rows examined?

When I run the following query, I see the expected rows examined as 40
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action in ('R','C')
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 40 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
But when I run this query that simply changes AND s.period_end_action in ('R','C') to AND s.period_end_action = 'C', I see the expected rows examined as 611
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action = 'C'
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 611 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
I have the following indexes on the subscription table:
_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
INDEX(status, period_end_action),
INDEX(current_period_end_date),
Any ideas? I don't understand why removing one of the period_end_action values would cause such a large increase in rows examined?
(I agree with others that EXPLAIN often has terrible row estimates.)
Actually the numbers might be reasonable (though I doubt it). The optimizer decided to do a table scan in both cases. And the query with fewer options for period_end_action probably has to scan farther to get the 20 rows. This is because it punted on using either of your secondary indexes.
These indexes are more likely to help your second query:
INDEX(period_end_action, _id)
INDEX(period_end_action, status)
INDEX(period_end_action, current_period_end_date)
The optimal index usually starts with any columns tested by =.
Since there is no such thing for your first query, the Optimizer probably decided to scan in _id order so that it could avoid the "sort" mandated by ORDER BY.
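As a sketch, the three suggested indexes could be added and the second query re-checked like this. The index names are made up for illustration, and which index (if any) the optimizer picks will depend on the data distribution.

```sql
-- Hypothetical index names; the column lists are the ones suggested above.
ALTER TABLE subscription
  ADD INDEX idx_action_id (period_end_action, _id),
  ADD INDEX idx_action_status (period_end_action, status),
  ADD INDEX idx_action_end_date (period_end_action, current_period_end_date);

-- Re-run the second query's EXPLAIN and compare the key and rows columns.
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
  AND s.status IN ('A', 'T')
  AND s.period_end_action = 'C'
ORDER BY s._id ASC LIMIT 20;
```

In practice you would keep only the index that EXPLAIN actually chooses and drop the other two, since each extra index costs space and write time.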

How to make Mysql use index for selects with unary condition in 'where'

I have a query in Ruby on Rails application with a strange unary condition in where:
SELECT * FROM messages WHERE (active) ORDER BY id DESC;
I didn't even know that such conditions are allowed and can't find documentation describing this syntax anywhere. Experiments show that this is some kind of an equivalent to
SELECT * FROM messages WHERE active!=0 ORDER BY id DESC;
The problem is that MySQL uses an index for the second variant only:
mysql> explain SELECT * FROM messages WHERE (active) ORDER BY id DESC;
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | messages | ALL | NULL | NULL | NULL | NULL | 560646 | Using where; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
mysql> explain SELECT * FROM messages WHERE active!=0 ORDER BY id DESC;
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | messages | range | active_id,active | active | 2 | NULL | 1394 | Using index condition; Using filesort |
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
I can't change the query text since, as it was explained to me, the application generates queries on the fly and they are not stored anywhere. So my questions are:
1. Do I understand the meaning of this unary clause correctly?
2. Why don't such queries use indexes?
3. Is it possible to make MySQL use an index on this one without changing the query text?
1. You are right, both clauses should return the same result (assuming an integer type).
2. The server code doesn't appear to recognise the equivalence. I tested on MySQL 8.0 and got the same query plans. MariaDB 10.2 and 10.3 both appear to use an index in both cases.
3. No.
If active only takes the values 0 and 1, a SELECT * FROM messages WHERE active=1 ORDER BY id DESC query will be able to use the index for ordering (hence no filesort), provided id is the primary key.
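A quick way to compare the three forms side by side, as a sketch. It assumes an InnoDB table where id is the primary key and active holds only 0 or 1; the definitions of the active and active_id indexes are not shown in the question, so the plans you get may differ.

```sql
-- Unary form generated by the application (no index used in the question).
EXPLAIN SELECT * FROM messages WHERE (active) ORDER BY id DESC;

-- Inequality form: range access on the index, but still a filesort.
EXPLAIN SELECT * FROM messages WHERE active != 0 ORDER BY id DESC;

-- Equality form: with a single value of active, rows come out of the
-- index already ordered by id, so the filesort can be skipped.
EXPLAIN SELECT * FROM messages WHERE active = 1 ORDER BY id DESC;
```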

How to avoid Using Filesort in "!=" mysql

Please, help me!
How to optimize a query like:
SELECT idu
FROM `user`
WHERE `username`!='manager'
AND `username`!='user1#yahoo.com'
ORDER BY lastdate DESC
This is the explain:
explain SELECT idu FROM `user` WHERE `username`!='manager' AND `username`!='ser1#yahoo.com' order by lastdate DESC;
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | user | ALL | username,username-lastdate | NULL | NULL | NULL | 208478 | Using where; Using filesort |
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)
I want to avoid the filesort on a big database.
Since this query is just scanning all rows, you need an index on lastdate to keep MySQL from having to sort the results itself (using filesort, which doesn't always go to disk or a temp table).
For super read performance, add the following multi-column "covering" index:
user(lastdate, username, idu)
A "covering" index would allow MySQL to just scan the index instead of the actual table data.
If you are using InnoDB and any of the above columns is part of your primary key, you don't need to include it in the index, because InnoDB secondary indexes already store the primary key columns.
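As a sketch, the covering index could be created and checked like this. The index name is hypothetical, and whether the optimizer actually prefers the index scan over a table scan plus sort depends on table size and statistics.

```sql
-- Hypothetical index name; lastdate comes first so the index order
-- matches ORDER BY lastdate, and username/idu make it covering.
ALTER TABLE `user`
  ADD INDEX idx_lastdate_username_idu (lastdate, username, idu);

EXPLAIN SELECT idu
FROM `user`
WHERE `username` != 'manager'
  AND `username` != 'user1#yahoo.com'
ORDER BY lastdate DESC;
```

If the index is used as intended, Extra should show "Using index" and no longer show "Using filesort".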

Should I use derived table in this situation?

I need to fetch 10 random rows from a table, the query below will not do it as it is going to be very slow on a large scale (I've read strong arguments against it):
SELECT `title` FROM table1 WHERE id1 = 10527 and id2 = 37821 ORDER BY RAND() LIMIT 10;
EXPLAIN:
select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------------+-------------+------+---------------+-------+---------+------+------+----------------+
SIMPLE | table1 | ref | id1,id2 | id2 | 5 | const| 7 | Using where; Using temporary; Using filesort
I tried the following workaround:
SELECT * FROM
(SELECT `title`, RAND() as n1
FROM table1
WHERE id1 = 10527 and id2 = 37821) TTA
ORDER BY n1 LIMIT 10;
EXPLAIN:
select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------------+-------------+------+---------------+-------+---------+------+------+----------------+
PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | Using filesort |
DERIVED | table1 | ref | id1,id2 | id2 | 5 |const | 7 | Using where |
But I have also read a couple of statements against using derived tables.
Could you please tell me if the latter query is going to make any improvement?
You should try the first method to see if it works for you. If you have an index on table1(id1, id2) and there are not very many occurrences of any given value pair, then the performance is probably fine for what you want to do.
Your second query is going to have somewhat worse performance than the first. The issue with the performance of order by rand() is not the time taken to calculate random numbers. The issue is the order by, and your second query is basically doing the same thing, with the additional overhead of a derived table.
If you know that there were always at least, say, 1000 matching values, then the following would generally work faster:
SELECT `title`
FROM table1
WHERE id1 = 10527 and id2 = 37821 and rand() < 0.05
ORDER BY RAND()
LIMIT 10;
This would take a random sample of about 5% of the data and with 1,000 matching rows, you would almost always have at least 10 rows to choose from.
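The arithmetic behind the 5% figure is that 1,000 rows x 0.05 gives about 50 candidate rows on average, comfortably more than the 10 needed. If the number of matching rows is not known in advance, one possible variation (a sketch, not part of the answer above) is to derive the sampling fraction from a count so that roughly 50 candidates survive regardless of the match count; the target of 50 is an arbitrary choice.

```sql
-- Sketch: aim for about 50 candidate rows, whatever the match count is.
-- If fewer than 50 rows match, the fraction exceeds 1 and every row passes.
SELECT t.`title`
FROM table1 AS t
CROSS JOIN (
    SELECT COUNT(*) AS cnt
    FROM table1
    WHERE id1 = 10527 AND id2 = 37821
) AS c
WHERE t.id1 = 10527 AND t.id2 = 37821
  AND RAND() < 50 / c.cnt
ORDER BY RAND()
LIMIT 10;
```

This costs one extra indexed count, so it only pays off when the number of matching rows is large enough for the sort to dominate.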

Is index dependent on selected columns?

Most of my queries filter on time, so I created an index on the created-time column. But the index only seems to be used if I select only the indexed column. Does a MySQL index depend on the selected columns?
My assumption about indexes
I thought an index is like the index page of a telephone directory. For example, if I want to find "Mark", the index page shows on which page the letter "M" starts in the directory. I assumed MySQL works the same way.
Table
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | varchar(100) | YES | | NULL | |
| OPERATION | varchar(100) | YES | | NULL | |
| PID | int(11) | YES | | NULL | |
| CREATED_TIME | bigint(20) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
Indexes On the table.
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| IndexTest | 0 | PRIMARY | 1 | ID | A | 10261 | NULL | NULL | | BTREE | | |
| IndexTest | 1 | t_dx | 1 | CREATED_TIME | A | 410 | NULL | NULL | YES | BTREE | | |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Queries Using Indexes:
explain select * from IndexTest where ID < 5;
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | IndexTest | range | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
explain select CREATED_TIME from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | IndexTest | range | t_dx | t_dx | 9 | NULL | 5248 | Using where; Using index |
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
Queries Not using Indexes
explain select count(distinct(PID)) from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | IndexTest | ALL | t_dx | NULL | NULL | NULL | 10261 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
explain select PID from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | IndexTest | ALL | t_dx | NULL | NULL | NULL | 10261 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
Short answer: No.
Whether indexes are used depends on the expressions in your WHERE clause, JOINs etc, but not on the columns you select.
But no rule without an exception (or actually a long list of those):
Long answer: Usually not
There are a number of factors used by the MySQL Optimizer in order to determine whether it should use an index.
The optimizer may decide to ignore an index if...
another (otherwise non-optimal) index saves it from accessing the table data at all
it fails to understand that an expression is a constant
its estimates suggest the query will return most of the table anyway
using the index would cause the creation of a temporary file
... and tons of other reasons, some of which seem not to be documented anywhere
Sometimes the choices made by said optimizer are... erm... let's call them sub-optimal. Now what do you do in those cases?
You can help the optimizer by doing an OPTIMIZE TABLE and/or ANALYZE TABLE. That is easy to do, and sometimes helps.
You can make it use a certain index with the USE INDEX(indexname) or FORCE INDEX(indexname) syntax
You can make it ignore a certain index with the IGNORE INDEX(indexname) syntax
More details on Index Hints, Optimize Table and Analyze Table on the MySQL documentation website.
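Applied to the table from the question, those three options look roughly like this. This is a sketch: the hints only change which plan is chosen, not whether that plan is actually faster, so compare timings before keeping one.

```sql
-- Refresh the index statistics the optimizer bases its estimates on.
ANALYZE TABLE IndexTest;

-- Force the optimizer to use the t_dx index.
SELECT PID
FROM IndexTest FORCE INDEX (t_dx)
WHERE CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE()) * 1000;

-- Tell the optimizer not to consider t_dx at all.
SELECT PID
FROM IndexTest IGNORE INDEX (t_dx)
WHERE CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE()) * 1000;
```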
Actually, it makes no difference whether you select the column or not. Indexes are used for lookups, that is, for quickly reducing the number of records you need to retrieve. That usually makes them useful when you have joins or WHERE conditions. Indexes also help a lot with ordering.
Updates and deletes can also be sped up quite a lot by indexes on the columns in their WHERE conditions.
As an example:
table: id int pk ai, col1 ... indexed, col2 ...
select * from table -> does not use an index
select id from table where col1 = something -> uses the col1 index although it is not selected.
Looking at the second query, mysql does a lookup in the index, locates the records, then in this case stops and delivers (both id and col1 have index and id happens to be pk, so no need for a secondary lookup).
Situation changes a little in this case:
select col2 from table where col1 = something
This will make internally 2 lookups: 1 for the condition, and 1 on the pk for delivering the col2 data. Please notice that again, you don't need to select the col1 column to use the index.
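The example can be made concrete with a small table. This is a sketch: the table name demo, the column types, and the index names are all made up for illustration, and on an empty or tiny table EXPLAIN may report a different plan.

```sql
-- Hypothetical table matching the description: id is the auto-increment
-- primary key, col1 is indexed, col2 is not.
CREATE TABLE demo (
  id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  col1 INT,
  col2 INT,
  INDEX idx_col1 (col1)
) ENGINE=InnoDB;

-- Answered from idx_col1 alone: InnoDB stores the primary key (id)
-- inside every secondary index entry.
EXPLAIN SELECT id FROM demo WHERE col1 = 42;

-- Needs idx_col1 for the lookup, then the primary key to fetch col2.
EXPLAIN SELECT col2 FROM demo WHERE col1 = 42;

-- Optional: an index on (col1, col2) lets the second query skip the
-- extra primary-key lookup.
ALTER TABLE demo ADD INDEX idx_col1_col2 (col1, col2);
```

In both EXPLAIN outputs the key column should name an index on col1, even though col1 is never selected.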
Getting back to your query, the problem lies with: UNIX_TIMESTAMP(CURRENT_DATE())*1000;
If you remove that, your index will be used for lookups.
Is mysql index is dependant the selected columns?.
Yes, absolutely.
For example:
MySQL cannot use the index to perform lookups if the columns do not form a leftmost
prefix of the index. Suppose that you have the SELECT statements shown here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first two queries use the index.
The third and fourth queries do involve indexed columns, but (col2) and (col2, col3)
are not leftmost prefixes of (col1, col2, col3).
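The quoted rule can be tried out on a throwaway table. This is only a sketch: the column types, the payload column, the index name, and the literal values are assumptions added for illustration. The payload column is there so that SELECT * cannot be answered from the index alone.

```sql
-- Hypothetical table with a three-column index.
CREATE TABLE tbl_name (
  id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  col1    INT,
  col2    INT,
  col3    INT,
  payload VARCHAR(100),
  INDEX idx_c1_c2_c3 (col1, col2, col3)
) ENGINE=InnoDB;

-- Leftmost prefixes (col1) and (col1, col2): index lookups are possible.
EXPLAIN SELECT * FROM tbl_name WHERE col1 = 1;
EXPLAIN SELECT * FROM tbl_name WHERE col1 = 1 AND col2 = 2;

-- Not leftmost prefixes: the index cannot be used for lookups.
EXPLAIN SELECT * FROM tbl_name WHERE col2 = 2;
EXPLAIN SELECT * FROM tbl_name WHERE col2 = 2 AND col3 = 3;
```

Note that all four statements select the same columns; only the WHERE clause differs between the ones that can use the index and the ones that cannot.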
Have a read through the extensive documentation.
For MySQL queries the answer is yes, but not in every case.
The query:
explain select * from IndexTest where ID < 5;
uses the table's clustered index. With InnoDB that is the table's primary key, so the query uses PRIMARY.
The second query:
select CREATED_TIME from IndexTest where CREATED_TIME >
UNIX_TIMESTAMP(CURRENT_DATE())*1000;
fetches only the indexed column, so MySQL does not need to read the table data at all, only the index. That is why your EXPLAIN output shows "Using index".
The query:
select count(distinct(PID)) from IndexTest where CREATED_TIME >
UNIX_TIMESTAMP(CURRENT_DATE())*1000;
is roughly equivalent to:
select PID from IndexTest where
CREATED_TIME>UNIX_TIMESTAMP(CURRENT_DATE())*1000 group by PID
MySQL could use the index to fetch the rows here as well, but given how many rows the WHERE condition matches, the optimizer estimates that fetching them through the index is more expensive than scanning the whole table. You can also use FORCE INDEX.
The same reasoning applies to your last query.
I hope this answer helps.
Indexing speeds up the search on that particular column and its associated data, rather than on the table data as a whole. So you have to include the indexed column to speed up the SELECT.