I am using MySQl database.
I know if I create a index for a column, it will be fast to query data from a table by using that column index. But, I still have the following questions:
(suppose I have a table named cars, there is a column named country, and I have created index for country column)
I know for example the query SELECT * FROM cars WHERE country='japan'will use the index on column country to query data which is fast. How about != operation? will SELECT * FROM cars WHERE country!='japan'; also use index to query data?
Does WHERE ... IN ... operation use index to query data? For example SELECT * FROM cars WHERE country IN ('japan','usa','sweden');
The general answer is: it depends. It depends on what the database optimizer thinks is the best way to retrieve the data, and its decision may need on the distribution of the data.
For example, if 99% of your rows have country = 'japan', maybe the first query (=) will not use the index, but the country with != will use it.
You can use EXPLAIN SELECT to find out if your query uses an index or not.
For example:
EXPLAIN SELECT *
FROM A
WHERE foo NOT IN (1,4,5,6);
Might yield:
+----+-------------+-------+------+---------------
| id | select_type | table | type | possible_keys
+----+-------------+-------+------+---------------
| 1 | SIMPLE | A | ALL | NULL
+----+-------------+-------+------+---------------
+------+---------+------+------+-------------+
| key | key_len | ref | rows | Extra |
+------+---------+------+------+-------------+
| NULL | NULL | NULL | 2 | Using where |
+------+---------+------+------+-------------+
In this case, the query had no possible_keys and therefore used no key to do the query. It's the key column you'd be interested in.
More information here:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
http://dev.mysql.com/doc/refman/5.0/en/optimization-indexes.html
Use 'EXPLAIN' to see what happens with your query. You will probably be interested in the 'possible_keys' and 'key' column.
EXPLAIN SELECT * FROM CARS WHERE `country` != 'japan'
Both queries will use indexes (assuming there is index with country as it's first column)
When in doubt use EXPLAIN. You will also want to read (at least parts of) this
Related
I have a table below:
CREATE TABLE `student` (
`name` varchar(30) NOT NULL DEFAULT '',
`city` varchar(30) NOT NULL DEFAULT '',
`age` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`name`,`city`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I want to know, if I execute the following two SQLs, do they have the same performance?
mysql> select * from student where name='John' and city='NewYork';
mysql> select * from student where city='NewYork' and name='John';
Involved question:
If there is a multi-column indexes (name, city), do the two SQLs all use it?
Does the optimizer change the second sql to the first because of the index?
I execute explain on the two of them, the result is below:
mysql> explain select * from student where name='John' and city='NewYork';
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
| 1 | SIMPLE | student | const | PRIMARY | PRIMARY | 184 | const,const | 1 | NULL |
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
mysql> explain select * from student where city='NewYork' and name='John';
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
| 1 | SIMPLE | student | const | PRIMARY | PRIMARY | 184 | const,const | 1 | NULL |
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
If, given an index on (name,city), I execute the following two SQLs, do they have the same performance?
where name='John' and city='NewYork'
where city='NewYork' and name='John'
Yes.
The query planner doesn't care about the order of WHERE clauses. If both your clauses filter on equality, the planner can use the index. SQL is a declarative language, not procedural. That is, you say what you want, not how to get it. It's a little counterintuitive for many programmers.
It can also use the (name,city) index for WHERE name LIKE 'Raymo%' because name is first in the index. It cannot use that index for WHERE city = 'Kalamazoo', though.
It can use the index for WHERE city LIKE 'Kalam%' AND name = 'Raymond'. In that case it uses the index to find the name value, then scans for matching cities.
If you had an index on (city,name) you could also use that for WHERE city = 'Kalamazoo' AND name = 'Raymond'. If both indexes exist, the query planner will pick one, probably based on some kind of cardinality consideration.
Note. If instead you have two different indexes on city and name, the query planner can't (as of mid-2017) use more than one of them to satisfy WHERE city = 'Kalamazoo' AND name = 'Raymond'.
http://use-the-index-luke.com/ for good info.
The order of columns in a multi-column index matters.
The documentation of the multiple-column indexes reads:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
This means an index on columns name and city can be used when an index on column name is needed but it cannot be used instead of an index on column city.
The order of conditions in the WHERE clause doesn't matter. The MySQL optimizer does a lot of work on the conditions on the WHERE clause to eliminate as many candidate rows as possible as early as possible and to read as little data as possible from the tables and indexes (because some of the read data is dropped because it doesn't match the entire WHERE clause).
I'm running follwing query on the table, I'm changing values in the where condition, while running in one case it's taking one index and another case taking it's another(wrong??) index.
row count for query 1 is 402954 it's taking approx 1.5 sec
row count for query 2 is 52097 it's taking approx 35 sec
Both queries query 1 and query 2 are same , only I'm changing values in the where condition
query 1
EXPLAIN SELECT
log_type,count(DISTINCT subscriber_id) AS distinct_count,
count(subscriber_id) as total_count
FROM campaign_logs
WHERE
domain = 'xxx' AND
campaign_id='123' AND
log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED') AND
log_time BETWEEN
CONVERT_TZ('2015-02-12 00:00:00','+05:30','+00:00') AND
CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00')
GROUP BY log_type;
EXPLAIN of above query
+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+
| 1 | SIMPLE | campaign_logs | range | campaign_id_index,domain_index,log_type_index,log_time_index,campaignid_domain_logtype_logtime_index | campaignid_domain_logtype_logtime_index | 468 | NULL | 402954 | Using where |
+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+
query 2
EXPLAIN SELECT
log_type,count(DISTINCT subscriber_id) AS distinct_count,
count(subscriber_id) as total_count
FROM stats.campaign_logs
WHERE
domain = 'yyy' AND
campaign_id='345' AND
log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED') AND
log_time BETWEEN
CONVERT_TZ('2014-02-05 00:00:00','+05:30','+00:00') AND
CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00')
GROUP BY log_type;
explain of above query
+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
| 1 | SIMPLE | campaign_logs | index_merge | campaign_id_index,domain_index,log_type_index,log_time_index,campaignid_domain_logtype_logtime_index | campaign_id_index,domain_index | 153,153 | NULL | 52097 | Using intersect(campaign_id_index,domain_index); Using where; Using filesort |
+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
Query 1 is using correct index because I have composite index
Query 2 is using index merge , it's taking long time to execute
Why MySql using different indexes for same query
I know we can mention USE INDEX in the query , but why MySql is not picking correct index in this case??. am I doing anything wrong??
No, you're not doing anything wrong.
As Chipmonkey stated in comments, sometimes MySQL will choose the wrong execution plan because of outdated table statistics. You can update the table statistics by performing ANALYZE TABLE.
Still, MySQL optimizer isn't that sophisticated. It sees that in both cases, MySQL will have to visit both the secondary index and then perform a lookup to the clustered index to get the actual table data, so when it saw that perhaps the second query had better selectivity by using the two separate indexes and merging them, you can't blame it too much just because it guessed wrong.
I'm guessing that if you had a covering index so that MySQL could perform the entire query with just the index, it will favor that index over performing a merge.
Try adding subscriber_id to the end of your multi-column index to get a covering index.
Otherwise, use USE INDEX or FORCE INDEX, because that's what they're there for. You know more about the data than MySQL does.
I suggest you try this:
Add this permutation of your compound index.
(campaign_id,domain,log_time,log_type,subscriber_id)
Change your query to remove the WHERE log_type IN() criterion, thus allowing the aggregate function to use all the records it finds in the range scan on log_time. Including subscriber_id in the index should allow the whole query to be satisfied directly from the index. That is, this is a covering index.
Finally, you can filter on your log_type values by wrapping the whole query in
SELECT *
FROM (/*the whole query*/) x
WHERE log_type IN
('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
ORDER BY log_type
This should give you better, and more predictable, performance.
(Unless the log_types you want are a tiny subset of the records, in which case please ignore this suggestion.)
I have a table that has 4,000,000 records.
The table is created that : (user_id int, partner_id int, PRIMARY_KEY ( user_id )) engine=InnoDB;
I want to test the performance of select 100 records.
Then, I tested following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( 1 );
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | MY_TABLE | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set, 1 warning (0.00 sec)
This is OK.
But, this query is buffered by mysql.
So, this test make no after the first test.
Then, I thinked of a sql that select by random value.
I tested following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( select ceil( rand() ) );
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | PRIMARY | MY_TABLE | index | NULL | PRIMARY | 4 | NULL | 3998727 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
But, it's bad.
Explain shows that possible_keys is NULL.
So, full index scanning is planned, and in fact, it's too slow rather than the one before.
Then, I want to ask you to teach me how do I write random value with index looking up.
Thanks
Using rand() in SQL is usually a sure-fire way to make the query slow. A common theme here is people using it in ORDER BY to get a random sequence. It's slow because not only does it throw away the indexes, but it also reads through the whole table.
However in your case, the fact that the function calls are in a sub-query ought to allow the outer query to still use its indexes. The fact that it isn't seems quite odd (so I've given the question a +1 vote).
My theory is that perhaps MySQL's optimiser is getting it wrong -- it's seeing the functions in the inner query, and deciding incorrectly that it can't use an index.
The only thing I can suggest to work around that is using force index to push MySQL into using the index you want.
See the definition of rand().
If i understand right, you are trying to get a random record from the database. If that is the case, again from the rand() definition:
ORDER BY RAND() combined with LIMIT is useful for selecting a random sample from a set of rows:
SELECT * FROM table1, table2 WHERE a=b AND c<d -> ORDER BY RAND() LIMIT 1000;
It's a limitation of the MySQL optimizer, that it can't tell that the subquery returns exactly one value, it has to assume the subquery returns multiple rows with unpredictable values, potentially even all the values of user_id. Therefore it decides it's just going to do an index scan.
Here's a workaround:
mysql> explain select user_id from MY_TABLE use index (PRIMARY)
where user_id = ( select ceil( rand() ) );
Note that MySQL's RAND() function returns a value in the range 0 <= v < 1.0. If you CEIL() it, you'll likely get the value 1. Therefore you'll virtually always get the row where user_id=1. If you don't have such a row in your table, you'll get an empty set result. You certainly won't get a user chosen randomly among all your users.
To fix that problem, you'd have to multiply the rand() by the number of distinct user_id values. And that brings up the problem that you might have gaps, so a randomly chosen value won't match any existing user_id.
Re your comment:
You'll always see possible keys as NULL when you get an index scan (i.e., "type" is "index").
I tried your explain query on a similar table, and it appears that the optimizer can't figure out that the subquery is a constant expression. You can workaround this limitation by calculating the random number in application code and then using the result as a constant value in your query:
select user_id from MY_TABLE use index (PRIMARY)
where user_id = $random;
I have a table from a legacy system which does not have a primary key. It records transactional data for issuing materials in a factory.
For simplicities sake, lets say each row contains job_number, part_number, quantity & date_issued.
I added an index to the date issued column. When I run an EXPLAIN SELECT * FROM issued_parts WHERE date_issued > '20100101', it shows this:
+----+-------------+----------------+------+-------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | issued_parts | ALL | date_issued_alloc | NULL | NULL | NULL | 9724620 | Using where |
+----+-------------+----------------+------+-------------------+------+---------+------+---------+-------------+
So it sees the key, but it doesn't use it?
Can someone explain why?
Something tells me the MySQL Query Optimizer decided correctly.
Here is how you can tell. Run these:
Count of Rows
SELECT COUNT(1) FROM issued_parts;
Count of Rows Matching Your Query
SELECT COUNT(1) FROM issued_parts WHERE date_issued > '20100101';
If the number of rows you are actually retrieving exceeds 5% of the table's total number, the MySQL Query Optimizer decides it would be less effort to do a full table scan.
Now, if your query was more exact, for example, with this:
SELECT * FROM issued_parts WHERE date_issued = '20100101';
then, you will get a different EXPLAIN plan altogether.
possible_keys names keys with the relevant columns in, but that doesn't mean that each key in it is going to be useful for the query. In this case, none are.
There are multiple types of indexes (indices?). A hash index is a fast way to do a lookup on an item given a specific value. If you have a bunch of discreet values that you are querying against, (for example, a list of 10 dates) then you can calculate a hash for each of those values, and look them up in the index. Since you aren't doing a lookup on a specific value, but rather doing a comparison, a hash index won't help you.
On the other hand, a B-Tree index can help you because it gives an ordering to the elements it is indexing. For instance, see here: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html for mysql (search for B-Tree Index Characteristics) . You may want to check that your table is using a b-tree index for it's index column.
I am querying a mySQL database to retrieve the data from 1 particular row. I'm using the table primary key as the WHERE constraint parameter.
E.g.
SELECT name FROM users WHERE userid = 4
The userid column is the primary key of the table. Is it good practice to use LIMIT 1 on the end of that mySQL statement? Or are there any speed benefits?
I would call that a bad practice as when it comes to something like a userid it's generally unique and you won't have more than one. Therefore, having LIMIT 1 seems pretty contradictory and someone who comes to maintain your code later may have to second-guess your design.
Also, I don't think it has any speed benefit at all. You can check out mySQL's Explain for a simple tool to analyze a query.
Note, as mentioned in the comments. LIMIT # does have speed and general benefits in other cases, just not this one.
The userid column is the primary key of the table. Is it good practice to use LIMIT 1 on the end of that mySQL statement? Or are there any speed benefits?
It is not good practice to use LIMIT 1 at the end of the example - it's completely unnecessary, because the userid column is a primary key. A primary key means there is only one row/record in the table with that value, only one row/record will ever be returned.
But the ultimate indicator is the explain plan:
explain SELECT t.name FROM USERS t WHERE t.userid = 4
...returns:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
-----------------------------------------------------------------------------------------------------
1 | SIMPLE | users | const | PRIMARY | PRIMARY | 4 | const | 1 |
...and:
explain SELECT t.name FROM USERS t WHERE t.userid = 4 LIMIT 1
...returns:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
-----------------------------------------------------------------------------------------------------
1 | SIMPLE | users | const | PRIMARY | PRIMARY | 4 | const | 1 |
Conclusion
No difference, no need. It appears to be optimized out in this case (only searching against the primary key).
The LIMIT clause
Using LIMIT without an ORDER BY will return an arbitrary row/record if more than one is returned. For example, using the "John Smith" scenario where 2+ people can have the name "John Smith":
SELECT t.userid
FROM USERS t
WHERE t.first_name = 'John'
AND t.last_name = 'Smith'
LIMIT 1
...risks returning any of the possible userid values where the first name is "John" and the last name is "Smith". It can't be guaranteed to always return the same value, and the likelihood of getting a different value every time increases with the number of possible records.
Personally I don't care for the use of LIMIT. The syntax isn't supported on Oracle, SQL Server or DB2 - making queries less portable. LIMIT is a tool to be used conservatively, not the first thing you reach for - know when to use aggregate and/or analytic functions.