MySQL OR vs IN performance

MySQL OR vs IN performance - mysql

I am wondering if there is any difference with regards to performance between the following
SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4)
SELECT ... FROM ... WHERE someFIELD between 0 AND 5
SELECT ... FROM ... WHERE someFIELD = 1 OR someFIELD = 2 OR someFIELD = 3 ...
or will MySQL optimize the SQL in the same way compilers optimize code?
EDIT
Changed the AND's to OR's for the reason stated in the comments.

I needed to know this for sure, so I benchmarked both methods. I consistenly found IN to be much faster than using OR.
Do not believe people who give their "opinion", science is all about testing and evidence.
I ran a loop of 1000x the equivalent queries (for consistency, I used sql_no_cache):
IN: 2.34969592094s
OR: 5.83781504631s
Update:
(I don't have the source code for the original test, as it was 6 years ago, though it returns a result in the same range as this test)
In request for some sample code to test this, here is the simplest possible use case. Using Eloquent for syntax simplicity, raw SQL equivalent executes the same.
$t = microtime(true);
for($i=0; $i<10000; $i++):
$q = DB::table('users')->where('id',1)
->orWhere('id',2)
->orWhere('id',3)
->orWhere('id',4)
->orWhere('id',5)
->orWhere('id',6)
->orWhere('id',7)
->orWhere('id',8)
->orWhere('id',9)
->orWhere('id',10)
->orWhere('id',11)
->orWhere('id',12)
->orWhere('id',13)
->orWhere('id',14)
->orWhere('id',15)
->orWhere('id',16)
->orWhere('id',17)
->orWhere('id',18)
->orWhere('id',19)
->orWhere('id',20)->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080514.3635
1482080517.3713
3.0078368186951
$t = microtime(true);
for($i=0; $i<10000; $i++):
$q = DB::table('users')->whereIn('id',[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080534.0185
1482080536.178
2.1595389842987

I also did a test for future Googlers. Total count of returned results is 7264 out of 10000
SELECT * FROM item WHERE id = 1 OR id = 2 ... id = 10000
This query took 0.1239 seconds
SELECT * FROM item WHERE id IN (1,2,3,...10000)
This query took 0.0433 seconds
IN is 3 times faster than OR

The accepted answer doesn't explain the reason.
Below are quoted from High Performance MySQL, 3rd Edition.
In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(Log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists)

I think the BETWEEN will be faster since it should be converted into:
Field >= 0 AND Field <= 5
It is my understanding that an IN will be converted to a bunch of OR statements anyway. The value of IN is the ease of use. (Saving on having to type each column name multiple times and also makes it easier to use with existing logic - you don't have to worry about AND/OR precedence because the IN is one statement. With a bunch of OR statements, you have to ensure you surround them with parentheses to make sure they are evaluated as one condition.)
The only real answer to your question is PROFILE YOUR QUERIES. Then you will know what works best in your particular situation.

It depends on what you are doing; how wide is the range, what is the data type (I know your example uses a numeric data type but your question can also apply to a lot of different data types).
This is an instance where you want to write the query both ways; get it working and then use EXPLAIN to figure out the execution differences.
I'm sure there is a concrete answer to this but this is how I would, practically speaking, figure out the answer for my given question.
This might be of some help: http://forge.mysql.com/wiki/Top10SQLPerformanceTips
Regards,
Frank

I think one explanation to sunseeker's observation is MySQL actually sort the values in the IN statement if they are all static values and using binary search, which is more efficient than the plain OR alternative. I can't remember where I've read that, but sunseeker's result seems to be a proof.

Just when you thought it was safe...
What is your value of eq_range_index_dive_limit? In particular, do you have more or fewer items in the IN clause?
This will not include a Benchmark, but will peer into the inner workings a little. Let's use a tool to see what is going on -- Optimizer Trace.
The query: SELECT * FROM canada WHERE id ...
With an OR of 3 values, part of the trace looks like:
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(multiple equal(296172, `canada`.`id`) or multiple equal(295093, `canada`.`id`) or multiple equal(293626, `canada`.`id`))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"296172 <= id <= 296172"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note how ICP is being given ORs. This implies that OR is not turned into IN, and InnoDB will be performing a bunch of = tests through ICP. (I do not feel it is worth considering MyISAM.)
(This is Percona's 5.6.22-71.0-log; id is a secondary index.)
Now for IN() with a few values
eq_range_index_dive_limit = 10; there are 8 values.
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"295573 <= id <= 295573",
"295588 <= id <= 295588",
"295810 <= id <= 295810",
"296127 <= id <= 296127",
"296172 <= id <= 296172",
"297148 <= id <= 297148"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note that the IN does not seem to be turned into OR.
A side note: Notice that the constant values were sorted. This can be beneficial in two ways:
By jumping around less, there may be better caching, less I/O to get to all the values.
If two similar queries are coming from separate connections, and they are in transactions, there is a better chance of getting a delay instead of a deadlock due to overlapping lists.
Finally, IN() with a lots of values
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"291752 <= id <= 291752",
"291839 <= id <= 291839",
...
"297196 <= id <= 297196",
"297201 <= id <= 297201"
],
"index_dives_for_eq_ranges": false,
"rows": 111,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"table_condition_attached": null,
"access_type": "range"
}
]
Side note: I needed this due to the bulkiness of the trace:
##global.optimizer_trace_max_mem_size = 32222;

OR will be slowest. Whether IN or BETWEEN is faster will depend on your data, but I'd expect BETWEEN to be faster normally as it can simple take a range from an index (assuming someField is indexed).

Below are details of 6 queries using MySQL 5.6 #SQLFiddle
In summary the 6 queries cover independently indexed columns and 2 queries were used per data type. All queries resulted in use of an index regardless of IN() or ORs being used.
| ORs | IN()
integer | uses index | uses index
date | uses index | uses index
varchar | uses index | uses index
I really just wanted to debunk statements made that OR means no index can be used. This isn't true. Indexes can be used in queries using OR as the 6 queries in the following examples display.
Also it seems to me that many have ignored the fact that IN() is a syntax shortcut for a set of ORs. At small scale perfomance differences between using IN() -v- OR are extremely (infintessinally) marginal.
While at larger scale IN() is certainly more convenient, but it sill equates to a set of OR conditions logically. Circumstance change for each query so testing your query on your tables is always best.
Summary of the 6 explain plans, all "Using index condition" (scroll right)
Query select_type table type possible_keys key key_len ref rows filtered Extra
------------- --------- ------- --------------- ----------- --------- ----- ------ ---------- -----------------------
Integers using OR SIMPLE mytable range aNum_idx aNum_idx 4 10 100.00 Using index condition
Integers using IN SIMPLE mytable range aNum_idx aNum_idx 4 10 100.00 Using index condition
Dates using OR SIMPLE mytable range aDate_idx aDate_idx 6 7 100.00 Using index condition
Dates using IN SIMPLE mytable range aDate_idx aDate_idx 6 7 100.00 Using index condition
Varchar using OR SIMPLE mytable range aName_idx aName_idx 768 10 100.00 Using index condition
Varchar using IN SIMPLE mytable range aName_idx aName_idx 768 10 100.00 Using index condition
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE `myTable` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`aName` varchar(255) default NULL,
`aDate` datetime,
`aNum` mediumint(8),
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
ALTER TABLE `myTable` ADD INDEX `aName_idx` (`aName`);
ALTER TABLE `myTable` ADD INDEX `aDate_idx` (`aDate`);
ALTER TABLE `myTable` ADD INDEX `aNum_idx` (`aNum`);
INSERT INTO `myTable` (`aName`,`aDate`)
VALUES
("Daniel","2017-09-19 01:22:31")
,("Quentin","2017-06-03 01:06:45")
,("Chester","2017-06-14 17:49:36")
,("Lev","2017-08-30 06:27:59")
,("Garrett","2018-10-04 02:40:37")
,("Lane","2017-01-22 17:11:21")
,("Chaim","2017-09-20 11:13:46")
,("Kieran","2018-03-10 18:37:26")
,("Cedric","2017-05-20 16:25:10")
,("Conan","2018-07-10 06:29:39")
,("Rudyard","2017-07-14 00:04:00")
,("Chadwick","2018-08-18 08:54:08")
,("Darius","2018-10-02 06:55:56")
,("Joseph","2017-06-19 13:20:33")
,("Wayne","2017-04-02 23:20:25")
,("Hall","2017-10-13 00:17:24")
,("Craig","2016-12-04 08:15:22")
,("Keane","2018-03-12 04:21:46")
,("Russell","2017-07-14 17:21:58")
,("Seth","2018-07-25 05:51:30")
,("Cole","2018-06-09 15:32:53")
,("Donovan","2017-08-12 05:21:35")
,("Damon","2017-06-27 03:44:19")
,("Brian","2017-02-01 23:35:20")
,("Harper","2017-08-25 04:29:27")
,("Chandler","2017-09-30 23:54:06")
,("Edward","2018-07-30 12:18:07")
,("Curran","2018-05-23 09:31:53")
,("Uriel","2017-05-08 03:31:43")
,("Honorato","2018-04-07 14:57:53")
,("Griffin","2017-01-07 23:35:31")
,("Hasad","2017-05-15 05:32:41")
,("Burke","2017-07-04 01:11:19")
,("Hyatt","2017-03-14 17:12:28")
,("Brenden","2017-10-17 05:16:14")
,("Ryan","2018-10-10 08:07:55")
,("Giacomo","2018-10-06 14:21:21")
,("James","2018-02-06 02:45:59")
,("Colt","2017-10-10 08:11:26")
,("Kermit","2017-09-18 16:57:16")
,("Drake","2018-05-20 22:08:36")
,("Berk","2017-04-16 17:39:32")
,("Alan","2018-09-01 05:33:05")
,("Deacon","2017-04-20 07:03:05")
,("Omar","2018-03-02 15:04:32")
,("Thaddeus","2017-09-19 04:07:54")
,("Troy","2016-12-13 04:24:08")
,("Rogan","2017-11-02 00:03:25")
,("Grant","2017-08-21 01:45:16")
,("Walker","2016-11-26 15:54:52")
,("Clarke","2017-07-20 02:26:56")
,("Clayton","2018-08-16 05:09:29")
,("Denton","2018-08-11 05:26:05")
,("Nicholas","2018-07-19 09:29:55")
,("Hashim","2018-08-10 20:38:06")
,("Todd","2016-10-25 01:01:36")
,("Xenos","2017-05-11 22:50:35")
,("Bert","2017-06-17 18:08:21")
,("Oleg","2018-01-03 13:10:32")
,("Hall","2018-06-04 01:53:45")
,("Evan","2017-01-16 01:04:25")
,("Mohammad","2016-11-18 05:42:52")
,("Armand","2016-12-18 06:57:57")
,("Kaseem","2018-06-12 23:09:57")
,("Colin","2017-06-29 05:25:52")
,("Arthur","2016-12-29 04:38:13")
,("Xander","2016-11-14 19:35:32")
,("Dante","2016-12-01 09:01:04")
,("Zahir","2018-02-17 14:44:53")
,("Raymond","2017-03-09 05:33:06")
,("Giacomo","2017-04-17 06:12:52")
,("Fulton","2017-06-04 00:41:57")
,("Chase","2018-01-14 03:03:57")
,("William","2017-05-08 09:44:59")
,("Fuller","2017-03-31 20:35:20")
,("Jarrod","2017-02-15 02:45:29")
,("Nissim","2018-03-11 14:19:25")
,("Chester","2017-11-05 00:14:27")
,("Perry","2017-12-24 11:58:04")
,("Theodore","2017-06-26 12:34:12")
,("Mason","2017-10-02 03:53:49")
,("Brenden","2018-10-08 10:09:47")
,("Jerome","2017-11-05 20:34:25")
,("Keaton","2018-08-18 00:55:56")
,("Tiger","2017-05-21 16:59:07")
,("Benjamin","2018-04-10 14:46:36")
,("John","2018-09-05 18:53:03")
,("Jakeem","2018-10-11 00:17:38")
,("Kenyon","2017-12-18 22:19:29")
,("Ferris","2017-03-29 06:59:13")
,("Hoyt","2017-01-03 03:48:56")
,("Fitzgerald","2017-07-27 11:27:52")
,("Forrest","2017-10-05 23:14:21")
,("Jordan","2017-01-11 03:48:09")
,("Lev","2017-05-25 08:03:39")
,("Chase","2017-06-18 19:09:23")
,("Ryder","2016-12-13 12:50:50")
,("Malik","2017-11-19 15:15:55")
,("Zeph","2018-04-04 11:22:12")
,("Amala","2017-01-29 07:52:17")
;
.
update MyTable
set aNum = id
;
Query 1:
select 'aNum by OR' q, mytable.*
from mytable
where aNum = 12
OR aNum = 22
OR aNum = 27
OR aNum = 32
OR aNum = 42
OR aNum = 52
OR aNum = 62
OR aNum = 65
OR aNum = 72
OR aNum = 82
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| aNum by OR | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| aNum by OR | 22 | Donovan | 2017-08-12T05:21:35Z | 22 |
| aNum by OR | 27 | Edward | 2018-07-30T12:18:07Z | 27 |
| aNum by OR | 32 | Hasad | 2017-05-15T05:32:41Z | 32 |
| aNum by OR | 42 | Berk | 2017-04-16T17:39:32Z | 42 |
| aNum by OR | 52 | Clayton | 2018-08-16T05:09:29Z | 52 |
| aNum by OR | 62 | Mohammad | 2016-11-18T05:42:52Z | 62 |
| aNum by OR | 65 | Colin | 2017-06-29T05:25:52Z | 65 |
| aNum by OR | 72 | Fulton | 2017-06-04T00:41:57Z | 72 |
| aNum by OR | 82 | Brenden | 2018-10-08T10:09:47Z | 82 |
Query 2:
select 'aNum by IN' q, mytable.*
from mytable
where aNum IN (
12
, 22
, 27
, 32
, 42
, 52
, 62
, 65
, 72
, 82
)
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| aNum by IN | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| aNum by IN | 22 | Donovan | 2017-08-12T05:21:35Z | 22 |
| aNum by IN | 27 | Edward | 2018-07-30T12:18:07Z | 27 |
| aNum by IN | 32 | Hasad | 2017-05-15T05:32:41Z | 32 |
| aNum by IN | 42 | Berk | 2017-04-16T17:39:32Z | 42 |
| aNum by IN | 52 | Clayton | 2018-08-16T05:09:29Z | 52 |
| aNum by IN | 62 | Mohammad | 2016-11-18T05:42:52Z | 62 |
| aNum by IN | 65 | Colin | 2017-06-29T05:25:52Z | 65 |
| aNum by IN | 72 | Fulton | 2017-06-04T00:41:57Z | 72 |
| aNum by IN | 82 | Brenden | 2018-10-08T10:09:47Z | 82 |
Query 3:
select 'adate by OR' q, mytable.*
from mytable
where aDate= str_to_date("2017-02-15 02:45:29",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-03-10 18:37:26",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-05-20 16:25:10",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-07-10 06:29:39",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-07-14 00:04:00",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-08-18 08:54:08",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-10-02 06:55:56",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-04-20 07:03:05",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-03-02 15:04:32",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-09-19 04:07:54",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2016-12-13 04:24:08",'%Y-%m-%d %h:%i:%s')
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| adate by OR | 47 | Troy | 2016-12-13T04:24:08Z | 47 |
| adate by OR | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
| adate by OR | 44 | Deacon | 2017-04-20T07:03:05Z | 44 |
| adate by OR | 46 | Thaddeus | 2017-09-19T04:07:54Z | 46 |
| adate by OR | 10 | Conan | 2018-07-10T06:29:39Z | 10 |
| adate by OR | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| adate by OR | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
Query 4:
select 'adate by IN' q, mytable.*
from mytable
where aDate IN (
str_to_date("2017-02-15 02:45:29",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-03-10 18:37:26",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-05-20 16:25:10",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-07-10 06:29:39",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-07-14 00:04:00",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-08-18 08:54:08",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-10-02 06:55:56",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-04-20 07:03:05",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-03-02 15:04:32",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-09-19 04:07:54",'%Y-%m-%d %h:%i:%s')
, str_to_date("2016-12-13 04:24:08",'%Y-%m-%d %h:%i:%s')
)
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| adate by IN | 47 | Troy | 2016-12-13T04:24:08Z | 47 |
| adate by IN | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
| adate by IN | 44 | Deacon | 2017-04-20T07:03:05Z | 44 |
| adate by IN | 46 | Thaddeus | 2017-09-19T04:07:54Z | 46 |
| adate by IN | 10 | Conan | 2018-07-10T06:29:39Z | 10 |
| adate by IN | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| adate by IN | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
Query 5:
select 'name by OR' q, mytable.*
from mytable
where aname = 'Alan'
OR aname = 'Brian'
OR aname = 'Chandler'
OR aname = 'Darius'
OR aname = 'Evan'
OR aname = 'Ferris'
OR aname = 'Giacomo'
OR aname = 'Hall'
OR aname = 'James'
OR aname = 'Jarrod'
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| name by OR | 43 | Alan | 2018-09-01T05:33:05Z | 43 |
| name by OR | 24 | Brian | 2017-02-01T23:35:20Z | 24 |
| name by OR | 26 | Chandler | 2017-09-30T23:54:06Z | 26 |
| name by OR | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
| name by OR | 61 | Evan | 2017-01-16T01:04:25Z | 61 |
| name by OR | 90 | Ferris | 2017-03-29T06:59:13Z | 90 |
| name by OR | 37 | Giacomo | 2018-10-06T14:21:21Z | 37 |
| name by OR | 71 | Giacomo | 2017-04-17T06:12:52Z | 71 |
| name by OR | 16 | Hall | 2017-10-13T00:17:24Z | 16 |
| name by OR | 60 | Hall | 2018-06-04T01:53:45Z | 60 |
| name by OR | 38 | James | 2018-02-06T02:45:59Z | 38 |
| name by OR | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
Query 6:
select 'name by IN' q, mytable.*
from mytable
where aname IN (
'Alan'
,'Brian'
,'Chandler'
, 'Darius'
, 'Evan'
, 'Ferris'
, 'Giacomo'
, 'Hall'
, 'James'
, 'Jarrod'
)
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| name by IN | 43 | Alan | 2018-09-01T05:33:05Z | 43 |
| name by IN | 24 | Brian | 2017-02-01T23:35:20Z | 24 |
| name by IN | 26 | Chandler | 2017-09-30T23:54:06Z | 26 |
| name by IN | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
| name by IN | 61 | Evan | 2017-01-16T01:04:25Z | 61 |
| name by IN | 90 | Ferris | 2017-03-29T06:59:13Z | 90 |
| name by IN | 37 | Giacomo | 2018-10-06T14:21:21Z | 37 |
| name by IN | 71 | Giacomo | 2017-04-17T06:12:52Z | 71 |
| name by IN | 16 | Hall | 2017-10-13T00:17:24Z | 16 |
| name by IN | 60 | Hall | 2018-06-04T01:53:45Z | 60 |
| name by IN | 38 | James | 2018-02-06T02:45:59Z | 38 |
| name by IN | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |

2018: IN (...) is faster. But >= && <= is even faster than IN.
Here's my benchmark.

I'll bet they are the same, you can run a test by doing the following:
loop over the "in (1,2,3,4)" 500 times and see how long it takes. loop over the "=1 or =2 or=3..." version 500 times and seeing how long it runs.
you could also try a join way, if someField is an index and your table is big it could be faster...
SELECT ...
FROM ...
INNER JOIN (SELECT 1 as newField UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) dt ON someFIELD =newField
I tried the join method above on my SQL Server and it is nearly the same as the in (1,2,3,4), and they both result in a clustered index seek. I'm not sure how MySQL will handle them.

As explained by others, IN is better chosen than OR with respect to query performance.
Queries with OR condition might take more longer execution time in the below cases.
to execute if MySQL optimizer choses any other index to be efficient(during false positive cases).
If the number of records is more ( As clearly stated by Jacob )

From what I understand about the way that the compiler optimizes these types of queries, using the IN clause is more efficient than multiple OR clauses. If you have values where the BETWEEN clause can be used, that is more efficient still.

I know that, as long as you have an index on Field, the BETWEEN will use it to quickly find one end, then traverse to the other. This is most efficient.
Every EXPLAIN I've seen shows "IN ( ... )" and " ... OR ..." to be interchangeable and equally (in)efficient. Which you would expect, since the optimizer has no way to know whether or not they comprise an interval. It's also equivalent to a UNION ALL SELECT on the individual values.

Related

which query is faster? and why? [duplicate]

I am wondering if there is any difference with regards to performance between the following
SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4)
SELECT ... FROM ... WHERE someFIELD between 0 AND 5
SELECT ... FROM ... WHERE someFIELD = 1 OR someFIELD = 2 OR someFIELD = 3 ...
or will MySQL optimize the SQL in the same way compilers optimize code?
EDIT
Changed the AND's to OR's for the reason stated in the comments.

I needed to know this for sure, so I benchmarked both methods. I consistenly found IN to be much faster than using OR.
Do not believe people who give their "opinion", science is all about testing and evidence.
I ran a loop of 1000x the equivalent queries (for consistency, I used sql_no_cache):
IN: 2.34969592094s
OR: 5.83781504631s
Update:
(I don't have the source code for the original test, as it was 6 years ago, though it returns a result in the same range as this test)
In request for some sample code to test this, here is the simplest possible use case. Using Eloquent for syntax simplicity, raw SQL equivalent executes the same.
$t = microtime(true);
for($i=0; $i<10000; $i++):
$q = DB::table('users')->where('id',1)
->orWhere('id',2)
->orWhere('id',3)
->orWhere('id',4)
->orWhere('id',5)
->orWhere('id',6)
->orWhere('id',7)
->orWhere('id',8)
->orWhere('id',9)
->orWhere('id',10)
->orWhere('id',11)
->orWhere('id',12)
->orWhere('id',13)
->orWhere('id',14)
->orWhere('id',15)
->orWhere('id',16)
->orWhere('id',17)
->orWhere('id',18)
->orWhere('id',19)
->orWhere('id',20)->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080514.3635
1482080517.3713
3.0078368186951
$t = microtime(true);
for($i=0; $i<10000; $i++):
$q = DB::table('users')->whereIn('id',[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080534.0185
1482080536.178
2.1595389842987

I also did a test for future Googlers. Total count of returned results is 7264 out of 10000
SELECT * FROM item WHERE id = 1 OR id = 2 ... id = 10000
This query took 0.1239 seconds
SELECT * FROM item WHERE id IN (1,2,3,...10000)
This query took 0.0433 seconds
IN is 3 times faster than OR

The accepted answer doesn't explain the reason.
Below are quoted from High Performance MySQL, 3rd Edition.
In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(Log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists)

I think the BETWEEN will be faster since it should be converted into:
Field >= 0 AND Field <= 5
It is my understanding that an IN will be converted to a bunch of OR statements anyway. The value of IN is the ease of use. (Saving on having to type each column name multiple times and also makes it easier to use with existing logic - you don't have to worry about AND/OR precedence because the IN is one statement. With a bunch of OR statements, you have to ensure you surround them with parentheses to make sure they are evaluated as one condition.)
The only real answer to your question is PROFILE YOUR QUERIES. Then you will know what works best in your particular situation.

It depends on what you are doing; how wide is the range, what is the data type (I know your example uses a numeric data type but your question can also apply to a lot of different data types).
This is an instance where you want to write the query both ways; get it working and then use EXPLAIN to figure out the execution differences.
I'm sure there is a concrete answer to this but this is how I would, practically speaking, figure out the answer for my given question.
This might be of some help: http://forge.mysql.com/wiki/Top10SQLPerformanceTips
Regards,
Frank

I think one explanation to sunseeker's observation is MySQL actually sort the values in the IN statement if they are all static values and using binary search, which is more efficient than the plain OR alternative. I can't remember where I've read that, but sunseeker's result seems to be a proof.

Just when you thought it was safe...
What is your value of eq_range_index_dive_limit? In particular, do you have more or fewer items in the IN clause?
This will not include a Benchmark, but will peer into the inner workings a little. Let's use a tool to see what is going on -- Optimizer Trace.
The query: SELECT * FROM canada WHERE id ...
With an OR of 3 values, part of the trace looks like:
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(multiple equal(296172, `canada`.`id`) or multiple equal(295093, `canada`.`id`) or multiple equal(293626, `canada`.`id`))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"296172 <= id <= 296172"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note how ICP is being given ORs. This implies that OR is not turned into IN, and InnoDB will be performing a bunch of = tests through ICP. (I do not feel it is worth considering MyISAM.)
(This is Percona's 5.6.22-71.0-log; id is a secondary index.)
Now for IN() with a few values
eq_range_index_dive_limit = 10; there are 8 values.
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"295573 <= id <= 295573",
"295588 <= id <= 295588",
"295810 <= id <= 295810",
"296127 <= id <= 296127",
"296172 <= id <= 296172",
"297148 <= id <= 297148"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note that the IN does not seem to be turned into OR.
A side note: Notice that the constant values were sorted. This can be beneficial in two ways:
By jumping around less, there may be better caching, less I/O to get to all the values.
If two similar queries are coming from separate connections, and they are in transactions, there is a better chance of getting a delay instead of a deadlock due to overlapping lists.
Finally, IN() with a lots of values
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"291752 <= id <= 291752",
"291839 <= id <= 291839",
...
"297196 <= id <= 297196",
"297201 <= id <= 297201"
],
"index_dives_for_eq_ranges": false,
"rows": 111,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"table_condition_attached": null,
"access_type": "range"
}
]
Side note: I needed this due to the bulkiness of the trace:
##global.optimizer_trace_max_mem_size = 32222;

OR will be slowest. Whether IN or BETWEEN is faster will depend on your data, but I'd expect BETWEEN to be faster normally as it can simple take a range from an index (assuming someField is indexed).

Below are details of 6 queries using MySQL 5.6 #SQLFiddle
In summary the 6 queries cover independently indexed columns and 2 queries were used per data type. All queries resulted in use of an index regardless of IN() or ORs being used.
| ORs | IN()
integer | uses index | uses index
date | uses index | uses index
varchar | uses index | uses index
I really just wanted to debunk statements made that OR means no index can be used. This isn't true. Indexes can be used in queries using OR as the 6 queries in the following examples display.
Also it seems to me that many have ignored the fact that IN() is a syntax shortcut for a set of ORs. At small scale perfomance differences between using IN() -v- OR are extremely (infintessinally) marginal.
While at larger scale IN() is certainly more convenient, but it sill equates to a set of OR conditions logically. Circumstance change for each query so testing your query on your tables is always best.
Summary of the 6 explain plans, all "Using index condition" (scroll right)
Query select_type table type possible_keys key key_len ref rows filtered Extra
------------- --------- ------- --------------- ----------- --------- ----- ------ ---------- -----------------------
Integers using OR SIMPLE mytable range aNum_idx aNum_idx 4 10 100.00 Using index condition
Integers using IN SIMPLE mytable range aNum_idx aNum_idx 4 10 100.00 Using index condition
Dates using OR SIMPLE mytable range aDate_idx aDate_idx 6 7 100.00 Using index condition
Dates using IN SIMPLE mytable range aDate_idx aDate_idx 6 7 100.00 Using index condition
Varchar using OR SIMPLE mytable range aName_idx aName_idx 768 10 100.00 Using index condition
Varchar using IN SIMPLE mytable range aName_idx aName_idx 768 10 100.00 Using index condition
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE `myTable` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`aName` varchar(255) default NULL,
`aDate` datetime,
`aNum` mediumint(8),
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
ALTER TABLE `myTable` ADD INDEX `aName_idx` (`aName`);
ALTER TABLE `myTable` ADD INDEX `aDate_idx` (`aDate`);
ALTER TABLE `myTable` ADD INDEX `aNum_idx` (`aNum`);
INSERT INTO `myTable` (`aName`,`aDate`)
VALUES
("Daniel","2017-09-19 01:22:31")
,("Quentin","2017-06-03 01:06:45")
,("Chester","2017-06-14 17:49:36")
,("Lev","2017-08-30 06:27:59")
,("Garrett","2018-10-04 02:40:37")
,("Lane","2017-01-22 17:11:21")
,("Chaim","2017-09-20 11:13:46")
,("Kieran","2018-03-10 18:37:26")
,("Cedric","2017-05-20 16:25:10")
,("Conan","2018-07-10 06:29:39")
,("Rudyard","2017-07-14 00:04:00")
,("Chadwick","2018-08-18 08:54:08")
,("Darius","2018-10-02 06:55:56")
,("Joseph","2017-06-19 13:20:33")
,("Wayne","2017-04-02 23:20:25")
,("Hall","2017-10-13 00:17:24")
,("Craig","2016-12-04 08:15:22")
,("Keane","2018-03-12 04:21:46")
,("Russell","2017-07-14 17:21:58")
,("Seth","2018-07-25 05:51:30")
,("Cole","2018-06-09 15:32:53")
,("Donovan","2017-08-12 05:21:35")
,("Damon","2017-06-27 03:44:19")
,("Brian","2017-02-01 23:35:20")
,("Harper","2017-08-25 04:29:27")
,("Chandler","2017-09-30 23:54:06")
,("Edward","2018-07-30 12:18:07")
,("Curran","2018-05-23 09:31:53")
,("Uriel","2017-05-08 03:31:43")
,("Honorato","2018-04-07 14:57:53")
,("Griffin","2017-01-07 23:35:31")
,("Hasad","2017-05-15 05:32:41")
,("Burke","2017-07-04 01:11:19")
,("Hyatt","2017-03-14 17:12:28")
,("Brenden","2017-10-17 05:16:14")
,("Ryan","2018-10-10 08:07:55")
,("Giacomo","2018-10-06 14:21:21")
,("James","2018-02-06 02:45:59")
,("Colt","2017-10-10 08:11:26")
,("Kermit","2017-09-18 16:57:16")
,("Drake","2018-05-20 22:08:36")
,("Berk","2017-04-16 17:39:32")
,("Alan","2018-09-01 05:33:05")
,("Deacon","2017-04-20 07:03:05")
,("Omar","2018-03-02 15:04:32")
,("Thaddeus","2017-09-19 04:07:54")
,("Troy","2016-12-13 04:24:08")
,("Rogan","2017-11-02 00:03:25")
,("Grant","2017-08-21 01:45:16")
,("Walker","2016-11-26 15:54:52")
,("Clarke","2017-07-20 02:26:56")
,("Clayton","2018-08-16 05:09:29")
,("Denton","2018-08-11 05:26:05")
,("Nicholas","2018-07-19 09:29:55")
,("Hashim","2018-08-10 20:38:06")
,("Todd","2016-10-25 01:01:36")
,("Xenos","2017-05-11 22:50:35")
,("Bert","2017-06-17 18:08:21")
,("Oleg","2018-01-03 13:10:32")
,("Hall","2018-06-04 01:53:45")
,("Evan","2017-01-16 01:04:25")
,("Mohammad","2016-11-18 05:42:52")
,("Armand","2016-12-18 06:57:57")
,("Kaseem","2018-06-12 23:09:57")
,("Colin","2017-06-29 05:25:52")
,("Arthur","2016-12-29 04:38:13")
,("Xander","2016-11-14 19:35:32")
,("Dante","2016-12-01 09:01:04")
,("Zahir","2018-02-17 14:44:53")
,("Raymond","2017-03-09 05:33:06")
,("Giacomo","2017-04-17 06:12:52")
,("Fulton","2017-06-04 00:41:57")
,("Chase","2018-01-14 03:03:57")
,("William","2017-05-08 09:44:59")
,("Fuller","2017-03-31 20:35:20")
,("Jarrod","2017-02-15 02:45:29")
,("Nissim","2018-03-11 14:19:25")
,("Chester","2017-11-05 00:14:27")
,("Perry","2017-12-24 11:58:04")
,("Theodore","2017-06-26 12:34:12")
,("Mason","2017-10-02 03:53:49")
,("Brenden","2018-10-08 10:09:47")
,("Jerome","2017-11-05 20:34:25")
,("Keaton","2018-08-18 00:55:56")
,("Tiger","2017-05-21 16:59:07")
,("Benjamin","2018-04-10 14:46:36")
,("John","2018-09-05 18:53:03")
,("Jakeem","2018-10-11 00:17:38")
,("Kenyon","2017-12-18 22:19:29")
,("Ferris","2017-03-29 06:59:13")
,("Hoyt","2017-01-03 03:48:56")
,("Fitzgerald","2017-07-27 11:27:52")
,("Forrest","2017-10-05 23:14:21")
,("Jordan","2017-01-11 03:48:09")
,("Lev","2017-05-25 08:03:39")
,("Chase","2017-06-18 19:09:23")
,("Ryder","2016-12-13 12:50:50")
,("Malik","2017-11-19 15:15:55")
,("Zeph","2018-04-04 11:22:12")
,("Amala","2017-01-29 07:52:17")
;
.
update MyTable
set aNum = id
;
Query 1:
select 'aNum by OR' q, mytable.*
from mytable
where aNum = 12
OR aNum = 22
OR aNum = 27
OR aNum = 32
OR aNum = 42
OR aNum = 52
OR aNum = 62
OR aNum = 65
OR aNum = 72
OR aNum = 82
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| aNum by OR | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| aNum by OR | 22 | Donovan | 2017-08-12T05:21:35Z | 22 |
| aNum by OR | 27 | Edward | 2018-07-30T12:18:07Z | 27 |
| aNum by OR | 32 | Hasad | 2017-05-15T05:32:41Z | 32 |
| aNum by OR | 42 | Berk | 2017-04-16T17:39:32Z | 42 |
| aNum by OR | 52 | Clayton | 2018-08-16T05:09:29Z | 52 |
| aNum by OR | 62 | Mohammad | 2016-11-18T05:42:52Z | 62 |
| aNum by OR | 65 | Colin | 2017-06-29T05:25:52Z | 65 |
| aNum by OR | 72 | Fulton | 2017-06-04T00:41:57Z | 72 |
| aNum by OR | 82 | Brenden | 2018-10-08T10:09:47Z | 82 |
Query 2:
select 'aNum by IN' q, mytable.*
from mytable
where aNum IN (
12
, 22
, 27
, 32
, 42
, 52
, 62
, 65
, 72
, 82
)
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| aNum by IN | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| aNum by IN | 22 | Donovan | 2017-08-12T05:21:35Z | 22 |
| aNum by IN | 27 | Edward | 2018-07-30T12:18:07Z | 27 |
| aNum by IN | 32 | Hasad | 2017-05-15T05:32:41Z | 32 |
| aNum by IN | 42 | Berk | 2017-04-16T17:39:32Z | 42 |
| aNum by IN | 52 | Clayton | 2018-08-16T05:09:29Z | 52 |
| aNum by IN | 62 | Mohammad | 2016-11-18T05:42:52Z | 62 |
| aNum by IN | 65 | Colin | 2017-06-29T05:25:52Z | 65 |
| aNum by IN | 72 | Fulton | 2017-06-04T00:41:57Z | 72 |
| aNum by IN | 82 | Brenden | 2018-10-08T10:09:47Z | 82 |
Query 3:
select 'adate by OR' q, mytable.*
from mytable
where aDate= str_to_date("2017-02-15 02:45:29",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-03-10 18:37:26",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-05-20 16:25:10",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-07-10 06:29:39",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-07-14 00:04:00",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-08-18 08:54:08",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-10-02 06:55:56",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-04-20 07:03:05",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-03-02 15:04:32",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-09-19 04:07:54",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2016-12-13 04:24:08",'%Y-%m-%d %h:%i:%s')
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| adate by OR | 47 | Troy | 2016-12-13T04:24:08Z | 47 |
| adate by OR | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
| adate by OR | 44 | Deacon | 2017-04-20T07:03:05Z | 44 |
| adate by OR | 46 | Thaddeus | 2017-09-19T04:07:54Z | 46 |
| adate by OR | 10 | Conan | 2018-07-10T06:29:39Z | 10 |
| adate by OR | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| adate by OR | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
Query 4:
select 'adate by IN' q, mytable.*
from mytable
where aDate IN (
str_to_date("2017-02-15 02:45:29",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-03-10 18:37:26",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-05-20 16:25:10",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-07-10 06:29:39",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-07-14 00:04:00",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-08-18 08:54:08",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-10-02 06:55:56",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-04-20 07:03:05",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-03-02 15:04:32",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-09-19 04:07:54",'%Y-%m-%d %h:%i:%s')
, str_to_date("2016-12-13 04:24:08",'%Y-%m-%d %h:%i:%s')
)
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| adate by IN | 47 | Troy | 2016-12-13T04:24:08Z | 47 |
| adate by IN | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
| adate by IN | 44 | Deacon | 2017-04-20T07:03:05Z | 44 |
| adate by IN | 46 | Thaddeus | 2017-09-19T04:07:54Z | 46 |
| adate by IN | 10 | Conan | 2018-07-10T06:29:39Z | 10 |
| adate by IN | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| adate by IN | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
Query 5:
select 'name by OR' q, mytable.*
from mytable
where aname = 'Alan'
OR aname = 'Brian'
OR aname = 'Chandler'
OR aname = 'Darius'
OR aname = 'Evan'
OR aname = 'Ferris'
OR aname = 'Giacomo'
OR aname = 'Hall'
OR aname = 'James'
OR aname = 'Jarrod'
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| name by OR | 43 | Alan | 2018-09-01T05:33:05Z | 43 |
| name by OR | 24 | Brian | 2017-02-01T23:35:20Z | 24 |
| name by OR | 26 | Chandler | 2017-09-30T23:54:06Z | 26 |
| name by OR | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
| name by OR | 61 | Evan | 2017-01-16T01:04:25Z | 61 |
| name by OR | 90 | Ferris | 2017-03-29T06:59:13Z | 90 |
| name by OR | 37 | Giacomo | 2018-10-06T14:21:21Z | 37 |
| name by OR | 71 | Giacomo | 2017-04-17T06:12:52Z | 71 |
| name by OR | 16 | Hall | 2017-10-13T00:17:24Z | 16 |
| name by OR | 60 | Hall | 2018-06-04T01:53:45Z | 60 |
| name by OR | 38 | James | 2018-02-06T02:45:59Z | 38 |
| name by OR | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
Query 6:
select 'name by IN' q, mytable.*
from mytable
where aname IN (
'Alan'
,'Brian'
,'Chandler'
, 'Darius'
, 'Evan'
, 'Ferris'
, 'Giacomo'
, 'Hall'
, 'James'
, 'Jarrod'
)
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| name by IN | 43 | Alan | 2018-09-01T05:33:05Z | 43 |
| name by IN | 24 | Brian | 2017-02-01T23:35:20Z | 24 |
| name by IN | 26 | Chandler | 2017-09-30T23:54:06Z | 26 |
| name by IN | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
| name by IN | 61 | Evan | 2017-01-16T01:04:25Z | 61 |
| name by IN | 90 | Ferris | 2017-03-29T06:59:13Z | 90 |
| name by IN | 37 | Giacomo | 2018-10-06T14:21:21Z | 37 |
| name by IN | 71 | Giacomo | 2017-04-17T06:12:52Z | 71 |
| name by IN | 16 | Hall | 2017-10-13T00:17:24Z | 16 |
| name by IN | 60 | Hall | 2018-06-04T01:53:45Z | 60 |
| name by IN | 38 | James | 2018-02-06T02:45:59Z | 38 |
| name by IN | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |

2018: IN (...) is faster. But >= && <= is even faster than IN.
Here's my benchmark.

I'll bet they are the same, you can run a test by doing the following:
loop over the "in (1,2,3,4)" 500 times and see how long it takes. loop over the "=1 or =2 or=3..." version 500 times and seeing how long it runs.
you could also try a join way, if someField is an index and your table is big it could be faster...
SELECT ...
FROM ...
INNER JOIN (SELECT 1 as newField UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) dt ON someFIELD =newField
I tried the join method above on my SQL Server and it is nearly the same as the in (1,2,3,4), and they both result in a clustered index seek. I'm not sure how MySQL will handle them.

As explained by others, IN is better chosen than OR with respect to query performance.
Queries with OR condition might take more longer execution time in the below cases.
to execute if MySQL optimizer choses any other index to be efficient(during false positive cases).
If the number of records is more ( As clearly stated by Jacob )

From what I understand about the way that the compiler optimizes these types of queries, using the IN clause is more efficient than multiple OR clauses. If you have values where the BETWEEN clause can be used, that is more efficient still.

I know that, as long as you have an index on Field, the BETWEEN will use it to quickly find one end, then traverse to the other. This is most efficient.
Every EXPLAIN I've seen shows "IN ( ... )" and " ... OR ..." to be interchangeable and equally (in)efficient. Which you would expect, since the optimizer has no way to know whether or not they comprise an interval. It's also equivalent to a UNION ALL SELECT on the individual values.

ORDER BY first OR condition true

I have the folloing table in a MySQL 5.6+ database
+----+-------+-------+--------------+--------------+
| id | mood1 | mood2 | mood1_visual | mood2_visual |
+----+-------+-------+--------------+--------------+
| 1 | 5 | 7 | 5c | 7a |
| 2 | 5 | 7 | 5b | 7b |
| 3 | 5 | 7 | 5c | 7d |
| 4 | 5 | 8 | 5a | 8a |
| 5 | 5 | 7 | 5c | 7a |
| 6 | 5 | 8 | 5b | 8a |
+----+-------+-------+--------------+--------------+
I need to select rows where
(`mood1_visual`='5c' AND `mood2_visual`='7a') OR
(`mood1`=5 AND `mood2`=7 AND (`mood1_visual`<>'5c' OR `mood2_visual`<>'7a'))
In the example above those would be the rows 1,2,3 and 5. However I need them sorted with the ones that match the first part of the WHERE clause before the others, i.e. I need them sorted 1,5,2,3, because row 1 and row 5 satisfy the first part of the WHERE clause
(`mood1_visual`='5c' AND `mood2_visual`='7a')
How do I tell that in the ORDER BY clause? Is there anything like
ORDER BY THE ONES THAT MATCH CONDITION FIRST THEN THE OTHERS ?
Please note that 5,1,2,3 or 1,5,3,2 or 5,1,3,2 are all valid sortings for my needs, just in case that made the solution easier.

MySQL can order by arbitrary expressions, e.g.
ORDER BY (foo = bar)
The boolean result of that will be typecast to an integer 0 or 1, so if you want all of the "equal" values first (1), then
ORDER BY (foo = bar) DESC
and if you want all of the false (0) values first, then
ORDER BY (foo = bar) ASC

What if you use CASE expression with ORDER BY like
ORDER BY
CASE
WHEN (`mood1_visual`='5c' AND `mood2_visual`='7a') THEN id
END

What is the difference when comparing with parentheses: WHERE (a, b)=(1,2)

I stumbled across the follwoing (valid) query in MySQL (also works in Oracle/MSSQL when replacing = with IN):
SELECT * from mytable WHERE (a, b)=(1,2)
It's the same as
SELECT * from mytable WHERE a=1 and b=2
I think the definition in the MySQL docs is here:
simple_expr:
[...]
| (expr [, expr] ...)
[...]
What is this called? Are there any pros and cons for using it?

It can be very handy when needed to compare multiple columns to multiple combination of values by using IN() :
SELECT * FROM YourTable
WHERE (col1,col2) IN((1,2),(2,3),(4,4)...)
Instead of:
SELECT * FROM YourTable
WHERE (col1 = 1 and col2 = 2) OR
(col1 = 2 and col2 = 3) OR
(col1 = 4 and col2 = 4) OR
....
After reviewing the execution plan of both queries, I can say that in Oracle(Using IN() which is basically the same), the optimizer evaluate both the same way and both are using the indexes :
Separate conditions:
EXPLAIN PLAN FOR
SELECT * FROM dim_remedy_tickets_cache t
where t.tt_id = '1' and t.region_name = 'one';
6 | 0 | SELECT STATEMENT | | 1 | 311 | 30 (0)| 00:00:01 |
7 | 1 | TABLE ACCESS BY INDEX ROWID| DIM_REMEDY_TICKETS_CACHE | 1 | 311 | 30 (0)| 00:00:01 |
8 | 2 | INDEX RANGE SCAN | DIM_REMEDY_TICKETS_HISTORYPK | 1 | | 20 (0)| 00:00:01 |
Combined conditions:
EXPLAIN PLAN FOR
SELECT * FROM dim_remedy_tickets_cache t
where (t.tt_id,t.region_name) in (('1','one'))
6 | 0 | SELECT STATEMENT | | 1 | 311 | 30 (0)| 00:00:01 |
7 | 1 | TABLE ACCESS BY INDEX ROWID| DIM_REMEDY_TICKETS_CACHE | 1 | 311 | 30 (0)| 00:00:01 |
8 | 2 | INDEX RANGE SCAN | DIM_REMEDY_TICKETS_HISTORYPK | 1 | | 20 (0)| 00:00:01 |
I assume all RDBMS will evaluate this queries the same.
Row constructors are legal in other contexts. For example, the following two statements are semantically equivalent (and are handled in the same way by the optimizer):
So:
Cons - May be less readable for some people , but basically no cons.
Pros - Less code , and the combination of multiple columns comparison using IN() :

MySQL Count within an IF

+-------------+--------------+----------+-------+
| ticketRefNo | nameOnTicket | boughtBy | event |
+-------------+--------------+----------+-------+
| 38 | J XXXXXXXXX | 2 | 13 |
| 39 | C YYYYYYY | 1 | 13 |
| 40 | M ZZZZZZZZZZ | 3 | 14 |
| 41 | C AAAAAAA | 3 | 15 |
| 42 | D BBBBBB | 3 | 16 |
| 43 | A CCCCC | 3 | 17 |
+-------------+--------------+----------+-------+
+-------------+------------------+--------------+---------------------+--------+
| ticketRefNo | cardNo | cardHolder | exp | issuer |
+-------------+------------------+--------------+---------------------+--------+
| 38 | 4444111133332222 | J McKenny | 2016-01-01 00:00:00 | BOS |
| 39 | 4434111133332222 | C Dempsey | 2016-04-01 00:00:00 | BOS |
| 40 | 4244111133332222 | M Gunn-Davis | 2018-02-01 00:00:00 | RBS |
+-------------+------------------+--------------+---------------------+--------+
+-------------+-------------+----------+
| ticketRefNo | boxOfficeID | paidWith |
+-------------+-------------+----------+
| 41 | 1 | card |
| 42 | 2 | cash |
| 43 | 3 | chequ |
+-------------+-------------+----------+
I have a database with the data shown above. It represents a ticket-buying system. I would like to be able to see a list of tickets bought with the name of the event and either the boxOfficeID or the issuer of the debit card.
I have tried running the following code, to no avail.
SELECT t.ticketRefNo AS 'Reference', t.event AS 'Event',
IF(COUNT(SELECT * FROM Online WHERE t.ticketRefNo=o.ticketRefNo;) >= 1,
o.issuer, InPerson.boxOfficeID) AS 'Card Issuer or Box Office'
FROM Ticket AS t, InPerson, Online AS o
WHERE t.ticketRefNo=o.ticketRefNo;
Cheers in advance!

Some notes: the semicolon character isn't valid syntax; if you have a need to delimit the subquery, wrap it in parens. Escape column aliases like you'd escape any other identifier: use backticks, not single quotes. Single quotes are used around string literals.
Assuming that issuer in the Online table is NOT NULL, and assuming that ticketRefNo is unique in both the Online and InPerson tables, you could do something like this:
SELECT t.ticketRefNo AS `Reference`
, t.event AS `Event`
, IF(o.ticketRefNo IS NOT NULL,o.issuer,i.boxOfficeId)
AS `Card Issuer or Box Office`
FROM Ticket t
LEFT
JOIN InPerson i
ON i.ticketRefNo = t.ticketRefNo
LEFT
JOIN Online o
ON o.ticketRefNo = t.ticketRefNo
Use outer join operations to find matching rows in the InPerson and Online tables, and use a conditional test to see if you got a matching row from the Online table. A NULL will be returned if there wasn't a matching row found.

It's not a good idea to have one column JOINing to two different tables with some values in each of the two tables.
But here goes anyway:
( SELECT ... FROM Ticket t JOIN InPerson x USING(ticketRefNo) ... )
UNION ALL
( SELECT ... FROM Ticket t JOIN Online x USING(ticketRefNo) ... )
ORDER BY ...
The ALL assumes that InPerson and Online never have any overlapping ticketRefNos.
The ORDER BY an the end is in case you want to sort things, although I see no need for it in your attempted SELECT.
The two SELECTs must have the same number of columns.

MySQL using GROUP BY to group by multiple columns

I'd like to use GROUP BY multiple columns, I think it's best to start with an example:
SELECT
eventsviews.eventId,
showsActive.showId,
showsActive.venueId,
COUNT(*) AS count
FROM eventsviews
INNER JOIN events ON events.eventId = eventsviews.eventId
INNER JOIN showsActive ON showsActive.eventId = eventsviews.eventId
WHERE events.status = 1
GROUP BY showsActive.venueId, showsActive.showId, showsActive.eventId
ORDER BY count DESC
LIMIT 100;
Output:
| *eventId* | *showId* | *venueId* | *count* |
+-----------+----------+-----------+---------+
[...snip...]
| 95 | 92099 | 9770 | 32 |
| 95 | 105472 | 10702 | 32 |
| 3804 | 41225 | 8165 | 17 |
| 3804 | 41226 | 8165 | 17 |
| 923 | 2866 | 5451 | 14 |
| 923 | 20184 | 5930 | 14 |
[...snip...]
What I would like instead:
| *eventId* | *showId* | *venueId* | *count* |
+-----------+----------+-----------+---------+
| 95 | 92099 | 9770 | 32 |
| 3804 | 41226 | 8165 | 17 |
| 923 | 20184 | 5930 | 14 |
So, I want my data grouped by eventId, but only once for each showId and venueId ...
I actually have a SQL query that does that, but it has 8 subqueries and is as slow as a T-Ford ... And since this is executed on every page load, speeding things up looks like a good idea!
There are a few questions like this, and I've tried many different things, but I've been at this query for an hour and I can't seem to get it to work as I want :-(
Thanks!

You probably want either a min or a max on showid, and then not include it in the group by, I can't tell which because looking at your "prefered" resultset, you have both.

If you want your data grouped by eventId, group just by eventId and you'll get exactly the result you're looking for.
This is a MySQL feature (?) that it allows you to select non-aggregate columns, in which case it will return the first row available. In other DBMS it's achieved by DISTINCT ON, which is not available in MySQL.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

MySQL OR vs IN performance - mysql

I also did a test for future Googlers. Total count of returned results is 7264 out of 10000 SELECT * FROM item WHERE id = 1 OR id = 2 ... id = 10000 This query took 0.1239 seconds SELECT * FROM item WHERE id IN (1,2,3,...10000) This query took 0.0433 seconds IN is 3 times faster than OR

I think one explanation to sunseeker's observation is MySQL actually sort the values in the IN statement if they are all static values and using binary search, which is more efficient than the plain OR alternative. I can't remember where I've read that, but sunseeker's result seems to be a proof.

OR will be slowest. Whether IN or BETWEEN is faster will depend on your data, but I'd expect BETWEEN to be faster normally as it can simple take a range from an index (assuming someField is indexed).

2018: IN (...) is faster. But >= && <= is even faster than IN. Here's my benchmark.

From what I understand about the way that the compiler optimizes these types of queries, using the IN clause is more efficient than multiple OR clauses. If you have values where the BETWEEN clause can be used, that is more efficient still.

Related

which query is faster? and why? [duplicate]

ORDER BY first OR condition true

What is the difference when comparing with parentheses: WHERE (a, b)=(1,2)

MySQL Count within an IF

MySQL using GROUP BY to group by multiple columns

Categories

Resources