How does SQL engines differ when we use equal sign and IN operator have same value? Does execution time changes?
1st one using equality check operator
WHERE column_value = 'All'
2nd one using IN operator and single value
WHERE column_value IN ('All')
Does SQL engine changes IN to = if only one value is there?
Is there any difference for same in MySQL and PostgreSQL?
There is no difference between those two statements, and the optimiser will transform the IN to the = when IN has just one element in it.
Though when you have a question like this, just run both statements, run their execution plan and see the differences. Here - you won't find any.
After a big search online, I found a document on SQL to support this (I assume it applies to all DBMS):
If there is only one value inside the parenthesis, this commend [sic] is equivalent to,
WHERE "column_name" = 'value1
Here is the execution plan of both queries in Oracle (most DBMS will process this the same):
EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number = '123456789'
Plan hash value: 2312174735
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
| 2 | INDEX UNIQUE SCAN | SYS_C0029838 |
-----------------------------------------------------
And for IN() :
EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number in('123456789');
Plan hash value: 2312174735
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
| 2 | INDEX UNIQUE SCAN | SYS_C0029838 |
-----------------------------------------------------
As you can see, both are identical. This is on an indexed column. Same goes for an unindexed column (just full table scan).
There are no big differences really, but if your column_value is indexed, IN operator may not read it as an index.
Encountered this problem once, so be careful.
There is no difference when you are using it with a single value. If you will check the table scan, index scan, or index seek for the above two queries you will find that there is no difference between the two queries.
Is there any difference for same in Mysql and PostgresSQL?
No it would not have any difference on the two engines(Infact it would be same for most of the databases including SQL Server, Oracle etc). Both engines will convert IN to =
Teach a man to fish, etc. Here's how to see for yourself what variations on your queries will do:
mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id = "AMH"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where
And let's try it the other way:
mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id in ("AMH")\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where
You can read here about how to interpret the results of a mysql EXPLAIN request. For now, note that we got identical output for both queries: exactly the same "execution plan" is generated. The type row tells us that the query uses a non-unique index (a foreign key, in this case), and the ref row tells us that the query is executed by comparing a constant value against this index.
For single IN Clause,there is no difference..below is demo using an EMPS table i have..
select * from emps where empid in (1)
select * from emps where empid=1
Predicate for First Query in execution plan:
[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[#1],0)
Predicate for second query in execution plan:
[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[#1],0)
If you have multiple values in IN Clause,its better to convert them to joins
Just to add a different perspective, one of the main points of rdbms systems is that they will rewrite your query for you, and pick the best execution plan for that query and all equivalent ones. This means that as long as two queries are logically identical, the should always generate the same execution plan on a given rdbms.
That being said, many queries are equivalent (same result set) but only because of constraints the database itself is unaware of, so be careful about those cases (E.g for a flag field with numbers 1-6, the db doesn't know <3 is the same as in (1,2)). But at the end of the day, if you're just thinking about legibility of and and or statements it won't make a difference for performance which way you write them.
Related
I have a table the_table with attributes the_table.id, the_table.firstVal and the_table.secondVal (the primary key is the_table.id, of course).
After defining an index over the first non-key attribute like this:
CREATE INDEX idx_firstval
ON the_table (firstVal);
The EXPLAIN result for the following disjunctive (OR) query
SELECT * FROM the_table WHERE the_table.firstVal = 'A' OR the_table.secondVal = 'B';
is
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
| 1 | SIMPLE | the_table | ALL | idx_firstval | NULL | NULL | NULL | 3436 | Using where
which shows that the index idx_firstval is not used. Now, the EXPLAIN result for the following conjunctive (AND) query
SELECT * FROM the_table WHERE the_table.firstVal = 'A' AND the_table.secondVal = 'B';
is
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
| 1 | SIMPLE | the_table | ref | idx_firstval | idx_firstval | 767 | const | 124 | Using index condition; Using where
which shows the index in use, this time around.
Why is MySQL choosing not to use indexes for the disjunctive query, but it is for the conjunctive one?
I've scoured SO, and as suggested by the answer in this thread, "using OR in a query will often cause the Query Optimizer to abandon use of index seeks and revert to scans". However, this doesn't answer why it happens, just that it does.
Another thread tries to answer why a disjunctive query doesn't use indexes, but I think it fails at doing so - it is merely concluded that the OP is using a small database. I'm wanting to know the difference between the disjunctive and the conjunctive case.
Because MySQL execution plan uses only one index for a table.
If MySQL uses range scan on idx_firstval to satisfy equality predicate on firstVal column, that leaves MySQL still needing to check the condition on secondVal column.
With the AND, MySQL only needs to check the rows returned from the range scan of the index. The set of rows that need to be checked is constrained by the condition.
With the OR, MySQL needs to check the rows that were not returned by the index range scan, all the rest of the rows in the table. Without an index, that means a full scan of the table. And if we're doing a full scan of the table to check secondVal, then it will be less expensive to check both conditions on the scan (i.e. a plan that includes an index accesses as well as a full scan will be more expensive.)
(If a composite index containing both firstVal and secondVal is available, then for the OR query, it is conceivable that optimizer might think its less expensive to check all the rows in the table by doing a full index scan, and then looking up the data pages.)
When we understand what operations are available to the optimizer, that's leads us to avoiding the OR and to rewrite the query, to return an equivalent resultset, with a query pattern that more explicitly defines a combination of two sets
SELECT a.*
FROM the_table a
WHERE a.firstVal = 'A'
UNION ALL
SELECT b.*
FROM the_table b
WHERE b.secondVal = 'B'
AND NOT ( b.firstVal <=> 'A' )
(Add an ORDER BY if we expect rows to be returned in a particular order)
I am surprised that MySQL is using an index for either of the two queries. The correct index to use here would be a composite index which covers the two columns in the WHERE clause:
CREATE INDEX idx ON the_table (firstVal, secondVal);
As to why MySQL is using the index in the second case, one possibility might be if most of the records in the_table have firstVal values which are not A. In this case, simply knowing that the equality the_table.firstVal = 'A' is false would mean that the entire outcome of the WHERE clause would be known (as false). So, the answer as to why the index is being used could have something to do with the cardinality of your exact data. But in any case, consider using the composite index to cover all bases.
How does SQL engines differ when we use equal sign and IN operator have same value? Does execution time changes?
1st one using equality check operator
WHERE column_value = 'All'
2nd one using IN operator and single value
WHERE column_value IN ('All')
Does SQL engine changes IN to = if only one value is there?
Is there any difference for same in MySQL and PostgreSQL?
There is no difference between those two statements, and the optimiser will transform the IN to the = when IN has just one element in it.
Though when you have a question like this, just run both statements, run their execution plan and see the differences. Here - you won't find any.
After a big search online, I found a document on SQL to support this (I assume it applies to all DBMS):
If there is only one value inside the parenthesis, this commend [sic] is equivalent to,
WHERE "column_name" = 'value1
Here is the execution plan of both queries in Oracle (most DBMS will process this the same):
EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number = '123456789'
Plan hash value: 2312174735
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
| 2 | INDEX UNIQUE SCAN | SYS_C0029838 |
-----------------------------------------------------
And for IN() :
EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number in('123456789');
Plan hash value: 2312174735
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
| 2 | INDEX UNIQUE SCAN | SYS_C0029838 |
-----------------------------------------------------
As you can see, both are identical. This is on an indexed column. Same goes for an unindexed column (just full table scan).
There are no big differences really, but if your column_value is indexed, IN operator may not read it as an index.
Encountered this problem once, so be careful.
There is no difference when you are using it with a single value. If you will check the table scan, index scan, or index seek for the above two queries you will find that there is no difference between the two queries.
Is there any difference for same in Mysql and PostgresSQL?
No it would not have any difference on the two engines(Infact it would be same for most of the databases including SQL Server, Oracle etc). Both engines will convert IN to =
Teach a man to fish, etc. Here's how to see for yourself what variations on your queries will do:
mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id = "AMH"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where
And let's try it the other way:
mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id in ("AMH")\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where
You can read here about how to interpret the results of a mysql EXPLAIN request. For now, note that we got identical output for both queries: exactly the same "execution plan" is generated. The type row tells us that the query uses a non-unique index (a foreign key, in this case), and the ref row tells us that the query is executed by comparing a constant value against this index.
For single IN Clause,there is no difference..below is demo using an EMPS table i have..
select * from emps where empid in (1)
select * from emps where empid=1
Predicate for First Query in execution plan:
[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[#1],0)
Predicate for second query in execution plan:
[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[#1],0)
If you have multiple values in IN Clause,its better to convert them to joins
Just to add a different perspective, one of the main points of rdbms systems is that they will rewrite your query for you, and pick the best execution plan for that query and all equivalent ones. This means that as long as two queries are logically identical, the should always generate the same execution plan on a given rdbms.
That being said, many queries are equivalent (same result set) but only because of constraints the database itself is unaware of, so be careful about those cases (E.g for a flag field with numbers 1-6, the db doesn't know <3 is the same as in (1,2)). But at the end of the day, if you're just thinking about legibility of and and or statements it won't make a difference for performance which way you write them.
I have a query of such like
$query = "SELECT * FROM tbl_comments WHERE id=222 ORDER BY comment_time";
Do I need to add an index on the comment_time field?
Also, if I want to get the data between two dates then how should I build the index?
Yes, index will help you, when using ORDER BY. Because INDEX is a sorted data structure, so the request will be executed faster.
Look at this example: table test2 with 3 rows. I used LIMIT after order by to show the difference in execution.
DROP TABLE IF EXISTS `test2`;
CREATE TABLE `test2` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` varchar(10) CHARACTER SET utf8 COLLATE utf8_swedish_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `ix_value` (`value`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;
-- ----------------------------
-- Records of test2
-- ----------------------------
INSERT INTO `test2` VALUES ('1', '10');
INSERT INTO `test2` VALUES ('2', '11');
INSERT INTO `test2` VALUES ('2', '9');
-- ----------------------------
-- Without INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value LIMIT 1\G
*************************** 1. row *************************
id: 1
select_type: SIMPLE
table: test2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
1 row in set (0.00 sec)
MySQL checked 3 rows to output the result.
After CREATE INDEX, we get this:
mysql> CREATE INDEX ix_value ON test2 (value) USING BTREE;
Query OK, 0 rows affected (0.14 sec)
-- ----------------------------
-- With INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value LIMIT 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test2
type: index
possible_keys: NULL
key: ix_value
key_len: 32
ref: NULL
rows: 1
Extra: Using index
1 row in set (0.00 sec)
Now MySQL used only 1 row.
Answering the received comments, I tried the same query without LIMIT:
-- ----------------------------
-- Without INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value\G
*************************** 1. row ******************
id: 1
select_type: SIMPLE
table: test2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
-- ----------------------------
-- With INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value\G
*************************** 1. row *****************
id: 1
select_type: SIMPLE
table: test2
type: index
possible_keys: NULL
key: ix_value
key_len: 32
ref: NULL
rows: 3
Extra: Using index
As we see, it uses index, for the 2-nd ORDER BY.
To build an index on your field, use this:
CREATE INDEX ix_comment_time ON tbl_comments (comment_time) USING BTREE;
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
An index on the comment_time field might not help at all for a query like this:
SELECT *
FROM tbl_comments
WHERE id=222
ORDER BY comment_time;
The query needs to scan the the table to find the matching id values. It can do this by scanning the index, looking up the rows, and doing the test. If there is one row that matches and it has the highext comment_time, then this requires scanning the index and reading the table.
Without the index, it would scan the table, find the row, and very quickly sort the 1 row. The sequential scan of the table would typically be faster than an index scan followed by a page lookup (and would definitely be faster on a table larger than available memory).
On the other hand, an index on id, comment_time would be very helpful.
Technically you don't need indices on every field, as it will work too, however for performance reasons you might need one or more.
EDIT
This problem is known from the beginning of software design. Typically if you increase amount of memory used by the program, you will reduce its speed (assuming the program is well-written). Assigning an index to a field increases data used by the db, but makes searching faster. If you do not want to search anything by this field (you actually do in the question), it would not be necessary.
In modern era the indices are not so big comparing to disk data size and adding one or more should not be a bad idea.
Normally it is very difficult to surely tell "do I need index or not". Some help is provided by EXPLAIN statement (refer to the manual).
Regarding your first question, you don't have to create index on comment_time. If the number of records is very large you'll need indices to speed your retrieval. But for your operation you don't need indices.
For your second question using a WHERE Clause like this will help you.
WHERE(comment_time BETWEEN 'startDate' AND 'endDate');
You don't have to put the index on comment_time if your where id is distinct.
To increase the speed of retrieval of data you would need index. This will work with out index also. For your second question you can use WHERE and BETWEEN clause.
Refer: http://www.w3schools.com/sql/sql_between.asp
The EXPLAIN statement is very useful in situations like that. For your query, you would use it as follows:
EXPLAIN SELECT * FROM tbl_comments WHERE id=222 ORDER BY comment_time
This will output which indexes are being used to execute the query and allows you to perform experiments with different indexes to find the best configuration. In order to speed up sorting, you will want a BTREE index since it stores data in a sorted manner. To speed up finding items with a certain id, a HASH index is the better option since it provides quick lookups for equality predicates. Note that MySQL might not be able to use a combination of both indexes to execute your query and will instead use just one of them.
Further information: http://dev.mysql.com/doc/refman/5.7/en/using-explain.html
For range predicates, like dates in a range of dates, a BTREE index will perform better than a HASH index.
Further information: http://dev.mysql.com/doc/refman/5.7/en/create-index.html
From time to time I encounter a strange MySQL behavior. Let's assume I have indexes (type, rel, created), (type), (rel). The best choice for a query like this one:
SELECT id FROM tbl
WHERE rel = 3 AND type = 3
ORDER BY created;
would be to use index (type, rel, created).
But MySQL decides to intersect indexes (type) and (rel), and that leads to worse perfomance. Here is an example:
mysql> EXPLAIN
-> SELECT id FROM tbl
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: index_merge
possible_keys: idx_type,idx_rel,idx_rel_type_created
key: idx_type,idx_rel
key_len: 1,2
ref: NULL
rows: 4343
Extra: Using intersect(idx_type,idx_rel); Using where; Using filesort
And the same query, but with a hint added:
mysql> EXPLAIN
-> SELECT id FROM tbl USE INDEX (idx_type_rel_created)
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: ref
possible_keys: idx_type_rel_created
key: idx_type_rel_created
key_len: 3
ref: const,const
rows: 8906
Extra: Using where
I think MySQL takes an execution plan which contains less number in the "rows" column of the EXPLAIN command. From that point of view, index intersection with 4343 rows looks really better than using my combined index with 8906 rows. So, maybe the problem is within those numbers?
mysql> SELECT COUNT(*) FROM tbl WHERE type=3 AND rel=3;
+----------+
| COUNT(*) |
+----------+
| 3056 |
+----------+
From this I can conclude that MySQL is mistaken at calculating approximate number of rows for combined index.
So, what can I do here to make MySQL take the right execution plan?
I can not use optimizer hints, because I have to stick to Django ORM
The only solution I found yet is to remove those one-field indexes.
MySQL version is 5.1.49.
The table structure is:
CREATE TABLE tbl (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` tinyint(1) NOT NULL,
`rel` smallint(2) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_type` (`type`),
KEY `idx_rel` (`rel`),
KEY `idx_type_rel_created` (`type`,`rel`,`created`)
) ENGINE=MyISAM;
It's hard to tell exactly why MySQL chooses index_merge_intersection over the index scan, but you should note that with the composite indexes, statistics up to the given column are stored for the composite indexes.
The value of information_schema.statistics.cardinality for the column type of the composite index will show the cardinality of (rel, type), not type itself.
If there is a correlation between rel and type, then cardinality of (rel, type) will be less than product of cardinalities of rel and type taken separately from the indexes on corresponding columns.
That's why the number of rows is calculated incorrectly (an intersection cannot be larger in size than a union).
You can forbid index_merge_intersection by setting it to off in ##optimizer_switch:
SET optimizer_switch = 'index_merge_intersection=off'
Another thing is worth mentioning: you would not have the problem if you deleted the index on type only. the index is not required since it duplicates a part of the composite index.
Some time the intersection on same table could be interesting, and you may not want to remove an index on a single colum so as some other query work well with intersection.
In such case, if the bad execution plan concerns only one single query, a solution is to exclude the unwanted index. Il will then prevent the usage of intersection only for that sepcific query...
In your example :
SELECT id FROM tbl IGNORE INDEX(idx_type)
WHERE rel = 3 AND type = 3
ORDER BY created;
enter code here
MySQL questions are some of my favorites on StackOverflow.
Unfortunately, things like this:
SELECT foo, bar, baz, quux, frozzle, lambchops FROM something JOIN somethingelse ON 1=1 JOIN (SELECT * FROM areyouserious) v ON 0=5 WHERE lambchops = 'good';
make my eyes bleed.
Also, attempts at describing your schema often go like this:
I have a table CrazyTable with a column that is a date and it has a primary key of Foo_Key but I want to join on SOMETABLE using a substring of column_bar (which is in CrazyTable) which pertains to the phase of the moon (which I store in moon_phases as a thrice-serialized PHP array).
Here is an example of a question I asked, that had I not followed the steps below, I would never have gotten a satisfactory answer from anyone: I have no shame..
I will answer below with what helps me the most with getting the best answer to your question. What helps you?
Use SHOW CREATE TABLE
This tells me more about your tables than your words ever could:
mysql> show create table magic\G
*************************** 1. row ***************************
Table: magic
Create Table: CREATE TABLE `magic` (
`id` int(11) DEFAULT NULL,
`what` varchar(255) DEFAULT NULL,
`the` datetime DEFAULT NULL,
`heck` text,
`soup_is_good` double DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
CAVEAT: If you have 70 columns in your table, omit the unnecessary ones. What's necessary?
Fields JOINed on
Fields SELECTed
Fields WHEREed on
Use EXPLAIN
This allows me to see how best to optimize your currently working, yet presumably slow query:
mysql> explain select * from magic\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: magic
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
Extra:
1 row in set (0.00 sec)
Use \G
Having to scroll right is generally an inconvenience.
Usual:
mysql> select * from magic;
+------------+-------------------------------+---------------------+-------------------+--------------+
| id | what | the | heck | soup_is_good |
+------------+-------------------------------+---------------------+-------------------+--------------+
| 1000000000 | A really long text string yay | 2009-07-29 22:28:17 | OOOH A TEXT FIELD | 100.5 |
+------------+-------------------------------+---------------------+-------------------+--------------+
1 row in set (0.00 sec)
Better:
mysql> select * from magic\G
*************************** 1. row ***************************
id: 1000000000
what: A really long text string yay
the: 2009-07-29 22:28:17
heck: OOOH A TEXT FIELD
soup_is_good: 100.5
1 row in set (0.00 sec)
CAVEAT: \G obviously turns one row of data into several. This becomes equally cumbersome for several rows of data. Do what looks best.
Use an external pastebin for obnoxiously large chunks of data:
Pastie
gist.github
Let us know your expectations
Slow? - We don't know what slow is to you. Seconds, minutes, hours? It helps to know.
Faster - We don't know this either. What's your expectation of fast?
Frequency - Is this a query that you plan to run just once? Daily? Hundreds or thousands of times a day? This helps us know when it's Good Enough.
Procedure Analyse
select * from yourtable procedure analyse()\G
The above will let others know the max and min values stored in the table. That helps.
Knowing which indexes you have on the tables concerned is vital, imo. You state you are using a substring of column_bar in the where clause - you may need to denormalize and store this substring in another column and then index it. There again cardinality of the column can make it worthless using an index on that column, if (for example) there are only 2 distinct values present. For a useful video tutorial on Performance Tuning Best Practices watch this youtube video by Jay Pipes.