MySQL SUM returns unexpected value - mysql

I've been trying to use SUM() to total up some numeric values in my database, but it seems to be returning unexpected values. Here is the PHP-side for generating the SQL:
public function calcField(string $field, array $weeks)
{
$stmt = $this->conn->prepare('SELECT SUM(`'. $field .'`) AS r FROM `ws` WHERE `week` IN(?);');
$stmt->execute([implode(',', $weeks)]);
return $stmt->fetch(\PDO::FETCH_ASSOC)['r'];
}
Let's give it some example data for you folks at home:
$field = 'revenue';
$weeks = [14,15,16,17,18,19,20,21,22,23,24,25,26];
This returns this value:
4707.92
Without seeing the data, this may have seemed to have worked, but here's the rows for those weeks:
+----+------+------+---------+-------+---------+---------+------+----------+
| id | week | year | revenue | sales | gpm_ave | uploads | pool | sold_ave |
+----+------+------+---------+-------+---------+---------+------+----------+
| 2 | 14 | 2019 | 4707.92 | 292 | 13 | 0 | 1479 | 20 |
| 3 | 15 | 2019 | 4373.32 | 304 | 13 | 0 | 1578 | 19 |
| 4 | 16 | 2019 | 4513.10 | 275 | 14 | 0 | 1460 | 19 |
| 5 | 17 | 2019 | 4944.80 | 336 | 14 | 0 | 1642 | 20 |
| 6 | 18 | 2019 | 4343.87 | 339 | 13 | 0 | 1652 | 21 |
| 7 | 19 | 2019 | 3918.59 | 356 | 14 | 0 | 1419 | 25 |
| 8 | 20 | 2019 | 4091.20 | 247 | 19 | 0 | 1602 | 15 |
| 9 | 21 | 2019 | 4177.22 | 242 | 12 | 0 | 1588 | 15 |
| 10 | 22 | 2019 | 3447.88 | 227 | 18 | 0 | 1585 | 14 |
| 11 | 23 | 2019 | 3334.18 | 216 | 15 | 0 | 1675 | 13 |
| 12 | 24 | 2019 | 4736.15 | 281 | 13 | 0 | 1388 | 20 |
| 13 | 25 | 2019 | 4863.84 | 252 | 12 | 0 | 1465 | 17 |
| 14 | 26 | 2019 | 4465.95 | 281 | 21 | 0 | 1704 | 16 |
+----+------+------+---------+-------+---------+---------+------+----------+
As you can see, the total should be far greater than 4707.92 - and I notice that the first row revenue = 4707.92.
Here's what things get weird, if I add this into the function:
echo 'SELECT SUM(`'. $field .'`) AS r FROM `ws` WHERE `week` IN('. implode(',', $weeks) .');';
Which outputs:
SELECT SUM(revenue) AS r FROM ws WHERE week IN(14,15,16,17,18,19,20,21,22,23,24,25,26);
Copying and pasting this into MySQL CLI returns:
MariaDB [nmn]> SELECT SUM(revenue) AS r FROM ws WHERE week IN(14,15,16,17,18,19,20,21,22,23,24,25,26);
+----------+
| r |
+----------+
| 55918.02 |
+----------+
1 row in set (0.00 sec)
Which, looks a lot more accurate. However, that very same SQL statement returns the first row value rather than summing the column for those weeks.
This function gets triggered by an AJAX script:
$d = new Page\Snapshot\D();
# at the minute only outputting dump of values to see what happens
echo '<pre>'. print_r(
$d->getQuarterlySnapshot(new Page\Snapshot\S(), new App\Core\Date(), $_POST['quarter'], '2019'),
1
). '</pre>';
The function $d->getQuarterlySnapshot function:
public function getQuarterlySnapshot(S $s, Date $date, int $q, string $year)
{
switch($q)
{
case 1:
$start = $year. '-01-01 00:00:00';
$end = $year. '-03-31 23:59:59';
break;
case 2:
$start = $year. '-04-01 00:00:00';
$end = $year. '-06-30 23:59:59';
break;
case 3:
$start = $year. '-07-01 00:00:00';
$end = $year. '-09-30 23:59:59';
break;
case 4:
$start = $year. '-10-01 00:00:00';
$end = $year. '-12-31 23:59:59';
break;
}
$weeks = $date->getWeeksInRange('2019', 'W', $start, $end);
foreach ($weeks as $key => $week){$weeks[$key] = $week[0];}
return [
'rev' => $s->calcField('revenue', $weeks),
'sales' => $s->calcField('sales', $weeks),
'gpm_ave' => $s->calcField('gpm_ave', $weeks),
'ul' => $s->calcField('uploads', $weeks),
'pool' => $s->calcField('pool', $weeks),
'sold_ave' => $s->calcField('sold_ave', $weeks)
];
}
So I don't overwrite the value anywhere (that I can see at least). How do I use SUM() with the IN() conditional?

As pointed out in the comments by #MadhurBhaiya:
Comma separated week_id values is passing as a single string in the query: week_id in ('1,2,3...,15') . MySQL implicitly typecasts this to 1 and thus gets the first row only. You will need to change the query preparation code
Led me to breaking up the single ? into named parameters using a foreach loop:
public function calcField(string $field, array $weeks)
{
$data = [];
$endKey = end(array_keys($weeks));
$sql = 'SELECT SUM(`'. $field .'`) AS r FROM `ws` WHERE `week` IN(';
foreach ($weeks as $key => $week)
{
$sql .= ':field'. $key;
$sql .= ($key !== $endKey ? ',' : '');
$data[':field'. $key] = $week;
}
$sql .= ');';
$stmt = $this->conn->prepare($sql);
$stmt->execute($data);
return $stmt->fetch(\PDO::FETCH_ASSOC)['r'];
}
Now each numeric value is treated individually, which, is getting me the expected value.

Yes, it is showing only first row data in your query because you have mentioned where week IN (1,2,3,4,5,6,7,8,9,10,11,12,13); and your data shows that you have week 13 and 1 to 12 does not exist in your data from IN clause of your query.

You are selecting only the row with week 13, the week 1-12 is not existing in your data. That's why the result of your query is 43.900001525878906
Solution 1:
You might want to change the values in your IN()because you are filtering your data by the column week.
SELECT SUM(`revenue`) as `r` FROM `ws` WHERE `week` IN (13,14,15,16,17,18,19,20,21,22,23,24,25);
Solution 2:
You can change the week in your where clause to id
SELECT SUM(`revenue`) as `r` FROM `ws` WHERE `id` IN (1,2,3,4,5,6,7,8,9,10,11,12,13);

Related

which query is faster? and why? [duplicate]

I am wondering if there is any difference with regards to performance between the following
SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4)
SELECT ... FROM ... WHERE someFIELD between 0 AND 5
SELECT ... FROM ... WHERE someFIELD = 1 OR someFIELD = 2 OR someFIELD = 3 ...
or will MySQL optimize the SQL in the same way compilers optimize code?
EDIT
Changed the AND's to OR's for the reason stated in the comments.
I needed to know this for sure, so I benchmarked both methods. I consistenly found IN to be much faster than using OR.
Do not believe people who give their "opinion", science is all about testing and evidence.
I ran a loop of 1000x the equivalent queries (for consistency, I used sql_no_cache):
IN: 2.34969592094s
OR: 5.83781504631s
Update:
(I don't have the source code for the original test, as it was 6 years ago, though it returns a result in the same range as this test)
In request for some sample code to test this, here is the simplest possible use case. Using Eloquent for syntax simplicity, raw SQL equivalent executes the same.
$t = microtime(true);
for($i=0; $i<10000; $i++):
$q = DB::table('users')->where('id',1)
->orWhere('id',2)
->orWhere('id',3)
->orWhere('id',4)
->orWhere('id',5)
->orWhere('id',6)
->orWhere('id',7)
->orWhere('id',8)
->orWhere('id',9)
->orWhere('id',10)
->orWhere('id',11)
->orWhere('id',12)
->orWhere('id',13)
->orWhere('id',14)
->orWhere('id',15)
->orWhere('id',16)
->orWhere('id',17)
->orWhere('id',18)
->orWhere('id',19)
->orWhere('id',20)->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080514.3635
1482080517.3713
3.0078368186951
$t = microtime(true);
for($i=0; $i<10000; $i++):
$q = DB::table('users')->whereIn('id',[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080534.0185
1482080536.178
2.1595389842987
I also did a test for future Googlers. Total count of returned results is 7264 out of 10000
SELECT * FROM item WHERE id = 1 OR id = 2 ... id = 10000
This query took 0.1239 seconds
SELECT * FROM item WHERE id IN (1,2,3,...10000)
This query took 0.0433 seconds
IN is 3 times faster than OR
The accepted answer doesn't explain the reason.
Below are quoted from High Performance MySQL, 3rd Edition.
In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(Log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists)
I think the BETWEEN will be faster since it should be converted into:
Field >= 0 AND Field <= 5
It is my understanding that an IN will be converted to a bunch of OR statements anyway. The value of IN is the ease of use. (Saving on having to type each column name multiple times and also makes it easier to use with existing logic - you don't have to worry about AND/OR precedence because the IN is one statement. With a bunch of OR statements, you have to ensure you surround them with parentheses to make sure they are evaluated as one condition.)
The only real answer to your question is PROFILE YOUR QUERIES. Then you will know what works best in your particular situation.
It depends on what you are doing; how wide is the range, what is the data type (I know your example uses a numeric data type but your question can also apply to a lot of different data types).
This is an instance where you want to write the query both ways; get it working and then use EXPLAIN to figure out the execution differences.
I'm sure there is a concrete answer to this but this is how I would, practically speaking, figure out the answer for my given question.
This might be of some help: http://forge.mysql.com/wiki/Top10SQLPerformanceTips
Regards,
Frank
I think one explanation to sunseeker's observation is MySQL actually sort the values in the IN statement if they are all static values and using binary search, which is more efficient than the plain OR alternative. I can't remember where I've read that, but sunseeker's result seems to be a proof.
Just when you thought it was safe...
What is your value of eq_range_index_dive_limit? In particular, do you have more or fewer items in the IN clause?
This will not include a Benchmark, but will peer into the inner workings a little. Let's use a tool to see what is going on -- Optimizer Trace.
The query: SELECT * FROM canada WHERE id ...
With an OR of 3 values, part of the trace looks like:
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(multiple equal(296172, `canada`.`id`) or multiple equal(295093, `canada`.`id`) or multiple equal(293626, `canada`.`id`))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"296172 <= id <= 296172"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note how ICP is being given ORs. This implies that OR is not turned into IN, and InnoDB will be performing a bunch of = tests through ICP. (I do not feel it is worth considering MyISAM.)
(This is Percona's 5.6.22-71.0-log; id is a secondary index.)
Now for IN() with a few values
eq_range_index_dive_limit = 10; there are 8 values.
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"295573 <= id <= 295573",
"295588 <= id <= 295588",
"295810 <= id <= 295810",
"296127 <= id <= 296127",
"296172 <= id <= 296172",
"297148 <= id <= 297148"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note that the IN does not seem to be turned into OR.
A side note: Notice that the constant values were sorted. This can be beneficial in two ways:
By jumping around less, there may be better caching, less I/O to get to all the values.
If two similar queries are coming from separate connections, and they are in transactions, there is a better chance of getting a delay instead of a deadlock due to overlapping lists.
Finally, IN() with a lots of values
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"291752 <= id <= 291752",
"291839 <= id <= 291839",
...
"297196 <= id <= 297196",
"297201 <= id <= 297201"
],
"index_dives_for_eq_ranges": false,
"rows": 111,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"table_condition_attached": null,
"access_type": "range"
}
]
Side note: I needed this due to the bulkiness of the trace:
##global.optimizer_trace_max_mem_size = 32222;
OR will be slowest. Whether IN or BETWEEN is faster will depend on your data, but I'd expect BETWEEN to be faster normally as it can simple take a range from an index (assuming someField is indexed).
Below are details of 6 queries using MySQL 5.6 #SQLFiddle
In summary the 6 queries cover independently indexed columns and 2 queries were used per data type. All queries resulted in use of an index regardless of IN() or ORs being used.
| ORs | IN()
integer | uses index | uses index
date | uses index | uses index
varchar | uses index | uses index
I really just wanted to debunk statements made that OR means no index can be used. This isn't true. Indexes can be used in queries using OR as the 6 queries in the following examples display.
Also it seems to me that many have ignored the fact that IN() is a syntax shortcut for a set of ORs. At small scale perfomance differences between using IN() -v- OR are extremely (infintessinally) marginal.
While at larger scale IN() is certainly more convenient, but it sill equates to a set of OR conditions logically. Circumstance change for each query so testing your query on your tables is always best.
Summary of the 6 explain plans, all "Using index condition" (scroll right)
Query select_type table type possible_keys key key_len ref rows filtered Extra
------------- --------- ------- --------------- ----------- --------- ----- ------ ---------- -----------------------
Integers using OR SIMPLE mytable range aNum_idx aNum_idx 4 10 100.00 Using index condition
Integers using IN SIMPLE mytable range aNum_idx aNum_idx 4 10 100.00 Using index condition
Dates using OR SIMPLE mytable range aDate_idx aDate_idx 6 7 100.00 Using index condition
Dates using IN SIMPLE mytable range aDate_idx aDate_idx 6 7 100.00 Using index condition
Varchar using OR SIMPLE mytable range aName_idx aName_idx 768 10 100.00 Using index condition
Varchar using IN SIMPLE mytable range aName_idx aName_idx 768 10 100.00 Using index condition
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE `myTable` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`aName` varchar(255) default NULL,
`aDate` datetime,
`aNum` mediumint(8),
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
ALTER TABLE `myTable` ADD INDEX `aName_idx` (`aName`);
ALTER TABLE `myTable` ADD INDEX `aDate_idx` (`aDate`);
ALTER TABLE `myTable` ADD INDEX `aNum_idx` (`aNum`);
INSERT INTO `myTable` (`aName`,`aDate`)
VALUES
("Daniel","2017-09-19 01:22:31")
,("Quentin","2017-06-03 01:06:45")
,("Chester","2017-06-14 17:49:36")
,("Lev","2017-08-30 06:27:59")
,("Garrett","2018-10-04 02:40:37")
,("Lane","2017-01-22 17:11:21")
,("Chaim","2017-09-20 11:13:46")
,("Kieran","2018-03-10 18:37:26")
,("Cedric","2017-05-20 16:25:10")
,("Conan","2018-07-10 06:29:39")
,("Rudyard","2017-07-14 00:04:00")
,("Chadwick","2018-08-18 08:54:08")
,("Darius","2018-10-02 06:55:56")
,("Joseph","2017-06-19 13:20:33")
,("Wayne","2017-04-02 23:20:25")
,("Hall","2017-10-13 00:17:24")
,("Craig","2016-12-04 08:15:22")
,("Keane","2018-03-12 04:21:46")
,("Russell","2017-07-14 17:21:58")
,("Seth","2018-07-25 05:51:30")
,("Cole","2018-06-09 15:32:53")
,("Donovan","2017-08-12 05:21:35")
,("Damon","2017-06-27 03:44:19")
,("Brian","2017-02-01 23:35:20")
,("Harper","2017-08-25 04:29:27")
,("Chandler","2017-09-30 23:54:06")
,("Edward","2018-07-30 12:18:07")
,("Curran","2018-05-23 09:31:53")
,("Uriel","2017-05-08 03:31:43")
,("Honorato","2018-04-07 14:57:53")
,("Griffin","2017-01-07 23:35:31")
,("Hasad","2017-05-15 05:32:41")
,("Burke","2017-07-04 01:11:19")
,("Hyatt","2017-03-14 17:12:28")
,("Brenden","2017-10-17 05:16:14")
,("Ryan","2018-10-10 08:07:55")
,("Giacomo","2018-10-06 14:21:21")
,("James","2018-02-06 02:45:59")
,("Colt","2017-10-10 08:11:26")
,("Kermit","2017-09-18 16:57:16")
,("Drake","2018-05-20 22:08:36")
,("Berk","2017-04-16 17:39:32")
,("Alan","2018-09-01 05:33:05")
,("Deacon","2017-04-20 07:03:05")
,("Omar","2018-03-02 15:04:32")
,("Thaddeus","2017-09-19 04:07:54")
,("Troy","2016-12-13 04:24:08")
,("Rogan","2017-11-02 00:03:25")
,("Grant","2017-08-21 01:45:16")
,("Walker","2016-11-26 15:54:52")
,("Clarke","2017-07-20 02:26:56")
,("Clayton","2018-08-16 05:09:29")
,("Denton","2018-08-11 05:26:05")
,("Nicholas","2018-07-19 09:29:55")
,("Hashim","2018-08-10 20:38:06")
,("Todd","2016-10-25 01:01:36")
,("Xenos","2017-05-11 22:50:35")
,("Bert","2017-06-17 18:08:21")
,("Oleg","2018-01-03 13:10:32")
,("Hall","2018-06-04 01:53:45")
,("Evan","2017-01-16 01:04:25")
,("Mohammad","2016-11-18 05:42:52")
,("Armand","2016-12-18 06:57:57")
,("Kaseem","2018-06-12 23:09:57")
,("Colin","2017-06-29 05:25:52")
,("Arthur","2016-12-29 04:38:13")
,("Xander","2016-11-14 19:35:32")
,("Dante","2016-12-01 09:01:04")
,("Zahir","2018-02-17 14:44:53")
,("Raymond","2017-03-09 05:33:06")
,("Giacomo","2017-04-17 06:12:52")
,("Fulton","2017-06-04 00:41:57")
,("Chase","2018-01-14 03:03:57")
,("William","2017-05-08 09:44:59")
,("Fuller","2017-03-31 20:35:20")
,("Jarrod","2017-02-15 02:45:29")
,("Nissim","2018-03-11 14:19:25")
,("Chester","2017-11-05 00:14:27")
,("Perry","2017-12-24 11:58:04")
,("Theodore","2017-06-26 12:34:12")
,("Mason","2017-10-02 03:53:49")
,("Brenden","2018-10-08 10:09:47")
,("Jerome","2017-11-05 20:34:25")
,("Keaton","2018-08-18 00:55:56")
,("Tiger","2017-05-21 16:59:07")
,("Benjamin","2018-04-10 14:46:36")
,("John","2018-09-05 18:53:03")
,("Jakeem","2018-10-11 00:17:38")
,("Kenyon","2017-12-18 22:19:29")
,("Ferris","2017-03-29 06:59:13")
,("Hoyt","2017-01-03 03:48:56")
,("Fitzgerald","2017-07-27 11:27:52")
,("Forrest","2017-10-05 23:14:21")
,("Jordan","2017-01-11 03:48:09")
,("Lev","2017-05-25 08:03:39")
,("Chase","2017-06-18 19:09:23")
,("Ryder","2016-12-13 12:50:50")
,("Malik","2017-11-19 15:15:55")
,("Zeph","2018-04-04 11:22:12")
,("Amala","2017-01-29 07:52:17")
;
.
update MyTable
set aNum = id
;
Query 1:
select 'aNum by OR' q, mytable.*
from mytable
where aNum = 12
OR aNum = 22
OR aNum = 27
OR aNum = 32
OR aNum = 42
OR aNum = 52
OR aNum = 62
OR aNum = 65
OR aNum = 72
OR aNum = 82
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| aNum by OR | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| aNum by OR | 22 | Donovan | 2017-08-12T05:21:35Z | 22 |
| aNum by OR | 27 | Edward | 2018-07-30T12:18:07Z | 27 |
| aNum by OR | 32 | Hasad | 2017-05-15T05:32:41Z | 32 |
| aNum by OR | 42 | Berk | 2017-04-16T17:39:32Z | 42 |
| aNum by OR | 52 | Clayton | 2018-08-16T05:09:29Z | 52 |
| aNum by OR | 62 | Mohammad | 2016-11-18T05:42:52Z | 62 |
| aNum by OR | 65 | Colin | 2017-06-29T05:25:52Z | 65 |
| aNum by OR | 72 | Fulton | 2017-06-04T00:41:57Z | 72 |
| aNum by OR | 82 | Brenden | 2018-10-08T10:09:47Z | 82 |
Query 2:
select 'aNum by IN' q, mytable.*
from mytable
where aNum IN (
12
, 22
, 27
, 32
, 42
, 52
, 62
, 65
, 72
, 82
)
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| aNum by IN | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| aNum by IN | 22 | Donovan | 2017-08-12T05:21:35Z | 22 |
| aNum by IN | 27 | Edward | 2018-07-30T12:18:07Z | 27 |
| aNum by IN | 32 | Hasad | 2017-05-15T05:32:41Z | 32 |
| aNum by IN | 42 | Berk | 2017-04-16T17:39:32Z | 42 |
| aNum by IN | 52 | Clayton | 2018-08-16T05:09:29Z | 52 |
| aNum by IN | 62 | Mohammad | 2016-11-18T05:42:52Z | 62 |
| aNum by IN | 65 | Colin | 2017-06-29T05:25:52Z | 65 |
| aNum by IN | 72 | Fulton | 2017-06-04T00:41:57Z | 72 |
| aNum by IN | 82 | Brenden | 2018-10-08T10:09:47Z | 82 |
Query 3:
select 'adate by OR' q, mytable.*
from mytable
where aDate= str_to_date("2017-02-15 02:45:29",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-03-10 18:37:26",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-05-20 16:25:10",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-07-10 06:29:39",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-07-14 00:04:00",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-08-18 08:54:08",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-10-02 06:55:56",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-04-20 07:03:05",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-03-02 15:04:32",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-09-19 04:07:54",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2016-12-13 04:24:08",'%Y-%m-%d %h:%i:%s')
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| adate by OR | 47 | Troy | 2016-12-13T04:24:08Z | 47 |
| adate by OR | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
| adate by OR | 44 | Deacon | 2017-04-20T07:03:05Z | 44 |
| adate by OR | 46 | Thaddeus | 2017-09-19T04:07:54Z | 46 |
| adate by OR | 10 | Conan | 2018-07-10T06:29:39Z | 10 |
| adate by OR | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| adate by OR | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
Query 4:
select 'adate by IN' q, mytable.*
from mytable
where aDate IN (
str_to_date("2017-02-15 02:45:29",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-03-10 18:37:26",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-05-20 16:25:10",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-07-10 06:29:39",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-07-14 00:04:00",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-08-18 08:54:08",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-10-02 06:55:56",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-04-20 07:03:05",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-03-02 15:04:32",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-09-19 04:07:54",'%Y-%m-%d %h:%i:%s')
, str_to_date("2016-12-13 04:24:08",'%Y-%m-%d %h:%i:%s')
)
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| adate by IN | 47 | Troy | 2016-12-13T04:24:08Z | 47 |
| adate by IN | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
| adate by IN | 44 | Deacon | 2017-04-20T07:03:05Z | 44 |
| adate by IN | 46 | Thaddeus | 2017-09-19T04:07:54Z | 46 |
| adate by IN | 10 | Conan | 2018-07-10T06:29:39Z | 10 |
| adate by IN | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| adate by IN | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
Query 5:
select 'name by OR' q, mytable.*
from mytable
where aname = 'Alan'
OR aname = 'Brian'
OR aname = 'Chandler'
OR aname = 'Darius'
OR aname = 'Evan'
OR aname = 'Ferris'
OR aname = 'Giacomo'
OR aname = 'Hall'
OR aname = 'James'
OR aname = 'Jarrod'
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| name by OR | 43 | Alan | 2018-09-01T05:33:05Z | 43 |
| name by OR | 24 | Brian | 2017-02-01T23:35:20Z | 24 |
| name by OR | 26 | Chandler | 2017-09-30T23:54:06Z | 26 |
| name by OR | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
| name by OR | 61 | Evan | 2017-01-16T01:04:25Z | 61 |
| name by OR | 90 | Ferris | 2017-03-29T06:59:13Z | 90 |
| name by OR | 37 | Giacomo | 2018-10-06T14:21:21Z | 37 |
| name by OR | 71 | Giacomo | 2017-04-17T06:12:52Z | 71 |
| name by OR | 16 | Hall | 2017-10-13T00:17:24Z | 16 |
| name by OR | 60 | Hall | 2018-06-04T01:53:45Z | 60 |
| name by OR | 38 | James | 2018-02-06T02:45:59Z | 38 |
| name by OR | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
Query 6:
select 'name by IN' q, mytable.*
from mytable
where aname IN (
'Alan'
,'Brian'
,'Chandler'
, 'Darius'
, 'Evan'
, 'Ferris'
, 'Giacomo'
, 'Hall'
, 'James'
, 'Jarrod'
)
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| name by IN | 43 | Alan | 2018-09-01T05:33:05Z | 43 |
| name by IN | 24 | Brian | 2017-02-01T23:35:20Z | 24 |
| name by IN | 26 | Chandler | 2017-09-30T23:54:06Z | 26 |
| name by IN | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
| name by IN | 61 | Evan | 2017-01-16T01:04:25Z | 61 |
| name by IN | 90 | Ferris | 2017-03-29T06:59:13Z | 90 |
| name by IN | 37 | Giacomo | 2018-10-06T14:21:21Z | 37 |
| name by IN | 71 | Giacomo | 2017-04-17T06:12:52Z | 71 |
| name by IN | 16 | Hall | 2017-10-13T00:17:24Z | 16 |
| name by IN | 60 | Hall | 2018-06-04T01:53:45Z | 60 |
| name by IN | 38 | James | 2018-02-06T02:45:59Z | 38 |
| name by IN | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
2018: IN (...) is faster. But >= && <= is even faster than IN.
Here's my benchmark.
I'll bet they are the same, you can run a test by doing the following:
loop over the "in (1,2,3,4)" 500 times and see how long it takes. loop over the "=1 or =2 or=3..." version 500 times and seeing how long it runs.
you could also try a join way, if someField is an index and your table is big it could be faster...
SELECT ...
FROM ...
INNER JOIN (SELECT 1 as newField UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) dt ON someFIELD =newField
I tried the join method above on my SQL Server and it is nearly the same as the in (1,2,3,4), and they both result in a clustered index seek. I'm not sure how MySQL will handle them.
As explained by others, IN is better chosen than OR with respect to query performance.
Queries with OR condition might take more longer execution time in the below cases.
to execute if MySQL optimizer choses any other index to be efficient(during false positive cases).
If the number of records is more ( As clearly stated by Jacob )
From what I understand about the way that the compiler optimizes these types of queries, using the IN clause is more efficient than multiple OR clauses. If you have values where the BETWEEN clause can be used, that is more efficient still.
I know that, as long as you have an index on Field, the BETWEEN will use it to quickly find one end, then traverse to the other. This is most efficient.
Every EXPLAIN I've seen shows "IN ( ... )" and " ... OR ..." to be interchangeable and equally (in)efficient. Which you would expect, since the optimizer has no way to know whether or not they comprise an interval. It's also equivalent to a UNION ALL SELECT on the individual values.

Adding one hour to datetime and comparing with present time based on hours and minutes

I have written a query where I am adding one hour to a datetime which is present in database, and comparing that time to the present time based on minutes and hours.
Table structure of upgrades:
id| postid |type |status| counter| datetime | autoreposttime
1 | 139 | M | P | 1 | 2017-04-26 10:49:23 | 60
2 | 140 | M | P | 1 | 2017-04-26 10:49:27 | 60
3 | 141 | M | P | 1 | 2017-04-26 10:49:31 | 60
4 | 142 | M | P | 1 | 2017-04-26 10:49:34 | 60
Table structure of posts:
post_id | locationid | priority_time
81 | 1 | 2017-04-20 18:29:17
82 | 27 | 2017-04-20 18:29:19
85 | 27 | 2017-04-20 18:29:07
Here is my SQL query which I have written in where I want to retrieve rows where datetime + 1 hour is equal to present time.
$posts = DB::table('upgrades')
->join('posts','posts.post_id','=','upgrades.postid')
->where(DB::raw('DATE_FORMAT(DATE_ADD(`upgrades`.`datetime`, INTERVAL 1 HOUR),"%Y-%m-%d %H:%i")'),$datetime)
->select('posts.*','upgrades.*')->get();
I'm getting data as null, and I think there's some problem in the where condition. What is wrong with my query?
One way to do that is to use Carbon. If I understood you correctly, you want result for specified minute, so here's an example:
$time = Carbon::now()->subHour();
$posts = ....
->whereBetween('datetime', [$time->format('Y-m-d H:i:00'), $time->format('Y-m-d H:i:59'])
....
->get();

eloquent with Order by not returning correctly the SUM of field

I'm trying to work with eloquents and I'm having problems in generating the expected result.
To be honest, I will show you what I have, what I want and what I did. But, maybe there is another way to do what I want and I will be happy to test it as well. :)
What I have:
I do have a table, that is:
+----+------------+--------------+---------------+---------------------+---------------------+
| id | machine_id | balance_left | balance_found | created_at | updated_at |
+----+------------+--------------+---------------+---------------------+---------------------+
| 1 | 1 | 12.30 | 12.30 | 2016-01-26 16:39:18 | 2016-01-28 16:39:18 |
| 2 | 2 | 45.60 | 45.60 | 2016-01-26 16:39:18 | 2016-01-28 16:39:18 |
| 3 | 3 | 78.90 | 78.90 | 2016-01-26 16:40:22 | 2016-01-28 16:40:22 |
| 4 | 4 | 90.12 | 90.12 | 2016-01-26 16:40:22 | 2016-01-28 16:40:22 |
| 5 | 1 | 5.60 | 3.40 | 2016-01-28 17:30:33 | 2016-01-28 17:30:33 |
| 6 | 2 | 7.89 | 4.56 | 2016-01-28 17:30:33 | 2016-01-28 17:30:33 |
+----+------------+--------------+---------------+---------------------+---------------------+
What I need:
Given a date, I do need to get the newest results and SUM the 'balance_left'. BUT, having the machine_id as unique entry.
For example, if I pass as date 26/01 at 23:59, I would expect to receive the lines with ID 1, 2, 3 and 4. And, the SUM would be 12.30 + 45.60 + 78.90 + 90.12 = 226.92.
Now, if I pass as date 28/01 at 23:59, I would expect to receive the lines with ID 3, 4, 5 and 6. Once that the entries 5 and 6 are the NEWEST entries for the machine_id 1 and 2. In this case, the sum would be: 78.90 +90.12 + 5.60 + 7.89 = 182.51
What I did:
$sub = App\Transaction::orderBy('created_at','DESC');
$transcations = DB::table(DB::raw("({$sub->toSql()}) as sub"))->where('created_at','<=', '2016-01-28 23:59:59')->groupBy('machine_id')->sum('balance_left');
Without the SUM, but using a select('balance_left'), I'm getting the correct lines:
=> [
{#672
+"balance_left": 5.6,
},
{#671
+"balance_left": 7.89,
},
{#643
+"balance_left": 78.9,
},
{#677
+"balance_left": 90.12,
},
]
The problem is that when I request the sum, I'm getting:
=> 17.9
And, after a while, I discovered that actually, he is doing a sum of the machine_id entry, so the lines with ID 1 and 5, have machine_id 1, and then this sum is the sum of 12.30 +5.60 = 17.90
Problem:
How can I actually SUM the lines returned? Or, is there any other better way to return what I'm expecting?
Thanks a lot!

Count and select all dates for a specific field in MySQL

i have a data format like this:
+----+--------+---------------------+
| ID | utente | data |
+----+--------+---------------------+
| 1 | Man1 | 2014-02-10 12:12:00 |
+----+--------+---------------------+
| 2 | Women1 | 2015-02-10 12:12:00 |
+----+--------+---------------------+
| 3 | Man2 | 2016-02-10 12:12:00 |
+----+--------+---------------------+
| 4 | Women1 | 2014-03-10 12:12:00 |
+----+--------+---------------------+
| 5 | Man1 | 2014-04-10 12:12:00 |
+----+--------+---------------------+
| 6 | Women1 | 2014-02-10 12:12:00 |
+----+--------+---------------------+
I want to make a report that organise the ouptout in way like this:
+---------+--------+-------+---------------------+---------------------+---------------------+
| IDs | utente | count | data1 | data2 | data3 |
+---------+--------+-------+---------------------+---------------------+---------------------+
| 1, 5 | Man1 | 2 | 2014-02-10 12:12:00 | 2014-04-10 12:12:00 | |
+---------+--------+-------+---------------------+---------------------+---------------------+
| 2, 4, 6 | Women1 | 3 | 2015-02-10 12:12:00 | 2014-03-10 12:12:00 | 2014-05-10 12:12:00 |
+---------+--------+-------+---------------------+---------------------+---------------------+
All the row thath include the same user (utente) more than one time will be included in one row with all the dates and the count of records.
Thanks
While it's certainly possible to write a query that returns the data in the format you want, I would suggest you to use a GROUP BY query and two GROUP_CONCAT aggregate functions:
SELECT
GROUP_CONCAT(ID) as IDs,
utente,
COUNT(*) as cnt,
GROUP_CONCAT(data ORDER BY data) AS Dates
FROM
tablename
GROUP BY
utente
then at the application level you can split your Dates field to multiple columns.
Looks like a fairly standard "Breaking" report, complicated only by the fact that your dates extend horizontally instead of down...
SELECT * FROM t ORDER BY utente, data
$lastutente = $lastdata = '';
echo "<table>\n";
while ($row = fetch()) {
if ($lastutente != $row['utente']) {
if ($lastutente != '') {
/****
* THIS SECTION REF'D BELOW
***/
echo "<td>$cnt</td>\n";
foreach ($datelst[] as $d)
echo "<td>$row[data]</td>\n";
for ($i = count($datelst); $i < $NumberOfDateCells; $i++)
echo "<td> </td>\n";
echo "</tr>\n";
/****
* END OF SECTION REF'D BELOW
***/
}
echo "<tr><td>$row[utente]</td>\n"; // start a new row - you probably want to print other stuff too
$datelst = array();
$cnt = 0;
}
if ($lastdata != $row['data']) {
datelst[] = $row['data'];
}
$cnt += $row['cnt']; // or $cnt++ if it's one per row
}
print the end of the last row - see SECTION REF'D ABOVE
echo "</table>\n";
You could add a GROUP BY utente, data to your query above to put a little more load on mysql and a little less on your code - then you should have SUM(cnt) as cnt or COUNT(*) as cnt.

MySQL OR vs IN performance

I am wondering if there is any difference with regards to performance between the following
SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4)
SELECT ... FROM ... WHERE someFIELD between 0 AND 5
SELECT ... FROM ... WHERE someFIELD = 1 OR someFIELD = 2 OR someFIELD = 3 ...
or will MySQL optimize the SQL in the same way compilers optimize code?
EDIT
Changed the AND's to OR's for the reason stated in the comments.
I needed to know this for sure, so I benchmarked both methods. I consistenly found IN to be much faster than using OR.
Do not believe people who give their "opinion", science is all about testing and evidence.
I ran a loop of 1000x the equivalent queries (for consistency, I used sql_no_cache):
IN: 2.34969592094s
OR: 5.83781504631s
Update:
(I don't have the source code for the original test, as it was 6 years ago, though it returns a result in the same range as this test)
In request for some sample code to test this, here is the simplest possible use case. Using Eloquent for syntax simplicity, raw SQL equivalent executes the same.
$t = microtime(true);
for($i=0; $i<10000; $i++):
$q = DB::table('users')->where('id',1)
->orWhere('id',2)
->orWhere('id',3)
->orWhere('id',4)
->orWhere('id',5)
->orWhere('id',6)
->orWhere('id',7)
->orWhere('id',8)
->orWhere('id',9)
->orWhere('id',10)
->orWhere('id',11)
->orWhere('id',12)
->orWhere('id',13)
->orWhere('id',14)
->orWhere('id',15)
->orWhere('id',16)
->orWhere('id',17)
->orWhere('id',18)
->orWhere('id',19)
->orWhere('id',20)->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080514.3635
1482080517.3713
3.0078368186951
$t = microtime(true);
for($i=0; $i<10000; $i++):
$q = DB::table('users')->whereIn('id',[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080534.0185
1482080536.178
2.1595389842987
I also did a test for future Googlers. Total count of returned results is 7264 out of 10000
SELECT * FROM item WHERE id = 1 OR id = 2 ... id = 10000
This query took 0.1239 seconds
SELECT * FROM item WHERE id IN (1,2,3,...10000)
This query took 0.0433 seconds
IN is 3 times faster than OR
The accepted answer doesn't explain the reason.
Below are quoted from High Performance MySQL, 3rd Edition.
In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(Log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists)
I think the BETWEEN will be faster since it should be converted into:
Field >= 0 AND Field <= 5
It is my understanding that an IN will be converted to a bunch of OR statements anyway. The value of IN is the ease of use. (Saving on having to type each column name multiple times and also makes it easier to use with existing logic - you don't have to worry about AND/OR precedence because the IN is one statement. With a bunch of OR statements, you have to ensure you surround them with parentheses to make sure they are evaluated as one condition.)
The only real answer to your question is PROFILE YOUR QUERIES. Then you will know what works best in your particular situation.
It depends on what you are doing; how wide is the range, what is the data type (I know your example uses a numeric data type but your question can also apply to a lot of different data types).
This is an instance where you want to write the query both ways; get it working and then use EXPLAIN to figure out the execution differences.
I'm sure there is a concrete answer to this but this is how I would, practically speaking, figure out the answer for my given question.
This might be of some help: http://forge.mysql.com/wiki/Top10SQLPerformanceTips
Regards,
Frank
I think one explanation to sunseeker's observation is MySQL actually sort the values in the IN statement if they are all static values and using binary search, which is more efficient than the plain OR alternative. I can't remember where I've read that, but sunseeker's result seems to be a proof.
Just when you thought it was safe...
What is your value of eq_range_index_dive_limit? In particular, do you have more or fewer items in the IN clause?
This will not include a Benchmark, but will peer into the inner workings a little. Let's use a tool to see what is going on -- Optimizer Trace.
The query: SELECT * FROM canada WHERE id ...
With an OR of 3 values, part of the trace looks like:
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(multiple equal(296172, `canada`.`id`) or multiple equal(295093, `canada`.`id`) or multiple equal(293626, `canada`.`id`))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"296172 <= id <= 296172"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note how ICP is being given ORs. This implies that OR is not turned into IN, and InnoDB will be performing a bunch of = tests through ICP. (I do not feel it is worth considering MyISAM.)
(This is Percona's 5.6.22-71.0-log; id is a secondary index.)
Now for IN() with a few values
eq_range_index_dive_limit = 10; there are 8 values.
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"295573 <= id <= 295573",
"295588 <= id <= 295588",
"295810 <= id <= 295810",
"296127 <= id <= 296127",
"296172 <= id <= 296172",
"297148 <= id <= 297148"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note that the IN does not seem to be turned into OR.
A side note: Notice that the constant values were sorted. This can be beneficial in two ways:
By jumping around less, there may be better caching, less I/O to get to all the values.
If two similar queries are coming from separate connections, and they are in transactions, there is a better chance of getting a delay instead of a deadlock due to overlapping lists.
Finally, IN() with a lots of values
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"291752 <= id <= 291752",
"291839 <= id <= 291839",
...
"297196 <= id <= 297196",
"297201 <= id <= 297201"
],
"index_dives_for_eq_ranges": false,
"rows": 111,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"table_condition_attached": null,
"access_type": "range"
}
]
Side note: I needed this due to the bulkiness of the trace:
##global.optimizer_trace_max_mem_size = 32222;
OR will be slowest. Whether IN or BETWEEN is faster will depend on your data, but I'd expect BETWEEN to be faster normally as it can simple take a range from an index (assuming someField is indexed).
Below are details of 6 queries using MySQL 5.6 #SQLFiddle
In summary the 6 queries cover independently indexed columns and 2 queries were used per data type. All queries resulted in use of an index regardless of IN() or ORs being used.
| ORs | IN()
integer | uses index | uses index
date | uses index | uses index
varchar | uses index | uses index
I really just wanted to debunk statements made that OR means no index can be used. This isn't true. Indexes can be used in queries using OR as the 6 queries in the following examples display.
Also it seems to me that many have ignored the fact that IN() is a syntax shortcut for a set of ORs. At small scale perfomance differences between using IN() -v- OR are extremely (infintessinally) marginal.
While at larger scale IN() is certainly more convenient, but it sill equates to a set of OR conditions logically. Circumstance change for each query so testing your query on your tables is always best.
Summary of the 6 explain plans, all "Using index condition" (scroll right)
Query select_type table type possible_keys key key_len ref rows filtered Extra
------------- --------- ------- --------------- ----------- --------- ----- ------ ---------- -----------------------
Integers using OR SIMPLE mytable range aNum_idx aNum_idx 4 10 100.00 Using index condition
Integers using IN SIMPLE mytable range aNum_idx aNum_idx 4 10 100.00 Using index condition
Dates using OR SIMPLE mytable range aDate_idx aDate_idx 6 7 100.00 Using index condition
Dates using IN SIMPLE mytable range aDate_idx aDate_idx 6 7 100.00 Using index condition
Varchar using OR SIMPLE mytable range aName_idx aName_idx 768 10 100.00 Using index condition
Varchar using IN SIMPLE mytable range aName_idx aName_idx 768 10 100.00 Using index condition
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE `myTable` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`aName` varchar(255) default NULL,
`aDate` datetime,
`aNum` mediumint(8),
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
ALTER TABLE `myTable` ADD INDEX `aName_idx` (`aName`);
ALTER TABLE `myTable` ADD INDEX `aDate_idx` (`aDate`);
ALTER TABLE `myTable` ADD INDEX `aNum_idx` (`aNum`);
INSERT INTO `myTable` (`aName`,`aDate`)
VALUES
("Daniel","2017-09-19 01:22:31")
,("Quentin","2017-06-03 01:06:45")
,("Chester","2017-06-14 17:49:36")
,("Lev","2017-08-30 06:27:59")
,("Garrett","2018-10-04 02:40:37")
,("Lane","2017-01-22 17:11:21")
,("Chaim","2017-09-20 11:13:46")
,("Kieran","2018-03-10 18:37:26")
,("Cedric","2017-05-20 16:25:10")
,("Conan","2018-07-10 06:29:39")
,("Rudyard","2017-07-14 00:04:00")
,("Chadwick","2018-08-18 08:54:08")
,("Darius","2018-10-02 06:55:56")
,("Joseph","2017-06-19 13:20:33")
,("Wayne","2017-04-02 23:20:25")
,("Hall","2017-10-13 00:17:24")
,("Craig","2016-12-04 08:15:22")
,("Keane","2018-03-12 04:21:46")
,("Russell","2017-07-14 17:21:58")
,("Seth","2018-07-25 05:51:30")
,("Cole","2018-06-09 15:32:53")
,("Donovan","2017-08-12 05:21:35")
,("Damon","2017-06-27 03:44:19")
,("Brian","2017-02-01 23:35:20")
,("Harper","2017-08-25 04:29:27")
,("Chandler","2017-09-30 23:54:06")
,("Edward","2018-07-30 12:18:07")
,("Curran","2018-05-23 09:31:53")
,("Uriel","2017-05-08 03:31:43")
,("Honorato","2018-04-07 14:57:53")
,("Griffin","2017-01-07 23:35:31")
,("Hasad","2017-05-15 05:32:41")
,("Burke","2017-07-04 01:11:19")
,("Hyatt","2017-03-14 17:12:28")
,("Brenden","2017-10-17 05:16:14")
,("Ryan","2018-10-10 08:07:55")
,("Giacomo","2018-10-06 14:21:21")
,("James","2018-02-06 02:45:59")
,("Colt","2017-10-10 08:11:26")
,("Kermit","2017-09-18 16:57:16")
,("Drake","2018-05-20 22:08:36")
,("Berk","2017-04-16 17:39:32")
,("Alan","2018-09-01 05:33:05")
,("Deacon","2017-04-20 07:03:05")
,("Omar","2018-03-02 15:04:32")
,("Thaddeus","2017-09-19 04:07:54")
,("Troy","2016-12-13 04:24:08")
,("Rogan","2017-11-02 00:03:25")
,("Grant","2017-08-21 01:45:16")
,("Walker","2016-11-26 15:54:52")
,("Clarke","2017-07-20 02:26:56")
,("Clayton","2018-08-16 05:09:29")
,("Denton","2018-08-11 05:26:05")
,("Nicholas","2018-07-19 09:29:55")
,("Hashim","2018-08-10 20:38:06")
,("Todd","2016-10-25 01:01:36")
,("Xenos","2017-05-11 22:50:35")
,("Bert","2017-06-17 18:08:21")
,("Oleg","2018-01-03 13:10:32")
,("Hall","2018-06-04 01:53:45")
,("Evan","2017-01-16 01:04:25")
,("Mohammad","2016-11-18 05:42:52")
,("Armand","2016-12-18 06:57:57")
,("Kaseem","2018-06-12 23:09:57")
,("Colin","2017-06-29 05:25:52")
,("Arthur","2016-12-29 04:38:13")
,("Xander","2016-11-14 19:35:32")
,("Dante","2016-12-01 09:01:04")
,("Zahir","2018-02-17 14:44:53")
,("Raymond","2017-03-09 05:33:06")
,("Giacomo","2017-04-17 06:12:52")
,("Fulton","2017-06-04 00:41:57")
,("Chase","2018-01-14 03:03:57")
,("William","2017-05-08 09:44:59")
,("Fuller","2017-03-31 20:35:20")
,("Jarrod","2017-02-15 02:45:29")
,("Nissim","2018-03-11 14:19:25")
,("Chester","2017-11-05 00:14:27")
,("Perry","2017-12-24 11:58:04")
,("Theodore","2017-06-26 12:34:12")
,("Mason","2017-10-02 03:53:49")
,("Brenden","2018-10-08 10:09:47")
,("Jerome","2017-11-05 20:34:25")
,("Keaton","2018-08-18 00:55:56")
,("Tiger","2017-05-21 16:59:07")
,("Benjamin","2018-04-10 14:46:36")
,("John","2018-09-05 18:53:03")
,("Jakeem","2018-10-11 00:17:38")
,("Kenyon","2017-12-18 22:19:29")
,("Ferris","2017-03-29 06:59:13")
,("Hoyt","2017-01-03 03:48:56")
,("Fitzgerald","2017-07-27 11:27:52")
,("Forrest","2017-10-05 23:14:21")
,("Jordan","2017-01-11 03:48:09")
,("Lev","2017-05-25 08:03:39")
,("Chase","2017-06-18 19:09:23")
,("Ryder","2016-12-13 12:50:50")
,("Malik","2017-11-19 15:15:55")
,("Zeph","2018-04-04 11:22:12")
,("Amala","2017-01-29 07:52:17")
;
.
update MyTable
set aNum = id
;
Query 1:
select 'aNum by OR' q, mytable.*
from mytable
where aNum = 12
OR aNum = 22
OR aNum = 27
OR aNum = 32
OR aNum = 42
OR aNum = 52
OR aNum = 62
OR aNum = 65
OR aNum = 72
OR aNum = 82
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| aNum by OR | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| aNum by OR | 22 | Donovan | 2017-08-12T05:21:35Z | 22 |
| aNum by OR | 27 | Edward | 2018-07-30T12:18:07Z | 27 |
| aNum by OR | 32 | Hasad | 2017-05-15T05:32:41Z | 32 |
| aNum by OR | 42 | Berk | 2017-04-16T17:39:32Z | 42 |
| aNum by OR | 52 | Clayton | 2018-08-16T05:09:29Z | 52 |
| aNum by OR | 62 | Mohammad | 2016-11-18T05:42:52Z | 62 |
| aNum by OR | 65 | Colin | 2017-06-29T05:25:52Z | 65 |
| aNum by OR | 72 | Fulton | 2017-06-04T00:41:57Z | 72 |
| aNum by OR | 82 | Brenden | 2018-10-08T10:09:47Z | 82 |
Query 2:
select 'aNum by IN' q, mytable.*
from mytable
where aNum IN (
12
, 22
, 27
, 32
, 42
, 52
, 62
, 65
, 72
, 82
)
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| aNum by IN | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| aNum by IN | 22 | Donovan | 2017-08-12T05:21:35Z | 22 |
| aNum by IN | 27 | Edward | 2018-07-30T12:18:07Z | 27 |
| aNum by IN | 32 | Hasad | 2017-05-15T05:32:41Z | 32 |
| aNum by IN | 42 | Berk | 2017-04-16T17:39:32Z | 42 |
| aNum by IN | 52 | Clayton | 2018-08-16T05:09:29Z | 52 |
| aNum by IN | 62 | Mohammad | 2016-11-18T05:42:52Z | 62 |
| aNum by IN | 65 | Colin | 2017-06-29T05:25:52Z | 65 |
| aNum by IN | 72 | Fulton | 2017-06-04T00:41:57Z | 72 |
| aNum by IN | 82 | Brenden | 2018-10-08T10:09:47Z | 82 |
Query 3:
select 'adate by OR' q, mytable.*
from mytable
where aDate= str_to_date("2017-02-15 02:45:29",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-03-10 18:37:26",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-05-20 16:25:10",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-07-10 06:29:39",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-07-14 00:04:00",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-08-18 08:54:08",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-10-02 06:55:56",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-04-20 07:03:05",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2018-03-02 15:04:32",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2017-09-19 04:07:54",'%Y-%m-%d %h:%i:%s')
OR aDate = str_to_date("2016-12-13 04:24:08",'%Y-%m-%d %h:%i:%s')
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| adate by OR | 47 | Troy | 2016-12-13T04:24:08Z | 47 |
| adate by OR | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
| adate by OR | 44 | Deacon | 2017-04-20T07:03:05Z | 44 |
| adate by OR | 46 | Thaddeus | 2017-09-19T04:07:54Z | 46 |
| adate by OR | 10 | Conan | 2018-07-10T06:29:39Z | 10 |
| adate by OR | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| adate by OR | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
Query 4:
select 'adate by IN' q, mytable.*
from mytable
where aDate IN (
str_to_date("2017-02-15 02:45:29",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-03-10 18:37:26",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-05-20 16:25:10",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-07-10 06:29:39",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-07-14 00:04:00",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-08-18 08:54:08",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-10-02 06:55:56",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-04-20 07:03:05",'%Y-%m-%d %h:%i:%s')
, str_to_date("2018-03-02 15:04:32",'%Y-%m-%d %h:%i:%s')
, str_to_date("2017-09-19 04:07:54",'%Y-%m-%d %h:%i:%s')
, str_to_date("2016-12-13 04:24:08",'%Y-%m-%d %h:%i:%s')
)
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| adate by IN | 47 | Troy | 2016-12-13T04:24:08Z | 47 |
| adate by IN | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
| adate by IN | 44 | Deacon | 2017-04-20T07:03:05Z | 44 |
| adate by IN | 46 | Thaddeus | 2017-09-19T04:07:54Z | 46 |
| adate by IN | 10 | Conan | 2018-07-10T06:29:39Z | 10 |
| adate by IN | 12 | Chadwick | 2018-08-18T08:54:08Z | 12 |
| adate by IN | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
Query 5:
select 'name by OR' q, mytable.*
from mytable
where aname = 'Alan'
OR aname = 'Brian'
OR aname = 'Chandler'
OR aname = 'Darius'
OR aname = 'Evan'
OR aname = 'Ferris'
OR aname = 'Giacomo'
OR aname = 'Hall'
OR aname = 'James'
OR aname = 'Jarrod'
Results:
| q | id | aName | aDate | aNum |
|-------------|----|----------|----------------------|------|
| name by OR | 43 | Alan | 2018-09-01T05:33:05Z | 43 |
| name by OR | 24 | Brian | 2017-02-01T23:35:20Z | 24 |
| name by OR | 26 | Chandler | 2017-09-30T23:54:06Z | 26 |
| name by OR | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
| name by OR | 61 | Evan | 2017-01-16T01:04:25Z | 61 |
| name by OR | 90 | Ferris | 2017-03-29T06:59:13Z | 90 |
| name by OR | 37 | Giacomo | 2018-10-06T14:21:21Z | 37 |
| name by OR | 71 | Giacomo | 2017-04-17T06:12:52Z | 71 |
| name by OR | 16 | Hall | 2017-10-13T00:17:24Z | 16 |
| name by OR | 60 | Hall | 2018-06-04T01:53:45Z | 60 |
| name by OR | 38 | James | 2018-02-06T02:45:59Z | 38 |
| name by OR | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
Query 6:
select 'name by IN' q, mytable.*
from mytable
where aname IN (
'Alan'
,'Brian'
,'Chandler'
, 'Darius'
, 'Evan'
, 'Ferris'
, 'Giacomo'
, 'Hall'
, 'James'
, 'Jarrod'
)
Results:
| q | id | aName | aDate | aNum |
|------------|----|----------|----------------------|------|
| name by IN | 43 | Alan | 2018-09-01T05:33:05Z | 43 |
| name by IN | 24 | Brian | 2017-02-01T23:35:20Z | 24 |
| name by IN | 26 | Chandler | 2017-09-30T23:54:06Z | 26 |
| name by IN | 13 | Darius | 2018-10-02T06:55:56Z | 13 |
| name by IN | 61 | Evan | 2017-01-16T01:04:25Z | 61 |
| name by IN | 90 | Ferris | 2017-03-29T06:59:13Z | 90 |
| name by IN | 37 | Giacomo | 2018-10-06T14:21:21Z | 37 |
| name by IN | 71 | Giacomo | 2017-04-17T06:12:52Z | 71 |
| name by IN | 16 | Hall | 2017-10-13T00:17:24Z | 16 |
| name by IN | 60 | Hall | 2018-06-04T01:53:45Z | 60 |
| name by IN | 38 | James | 2018-02-06T02:45:59Z | 38 |
| name by IN | 76 | Jarrod | 2017-02-15T02:45:29Z | 76 |
2018: IN (...) is faster. But >= && <= is even faster than IN.
Here's my benchmark.
I'll bet they are the same, you can run a test by doing the following:
loop over the "in (1,2,3,4)" 500 times and see how long it takes. loop over the "=1 or =2 or=3..." version 500 times and seeing how long it runs.
you could also try a join way, if someField is an index and your table is big it could be faster...
SELECT ...
FROM ...
INNER JOIN (SELECT 1 as newField UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) dt ON someFIELD =newField
I tried the join method above on my SQL Server and it is nearly the same as the in (1,2,3,4), and they both result in a clustered index seek. I'm not sure how MySQL will handle them.
As explained by others, IN is better chosen than OR with respect to query performance.
Queries with OR condition might take more longer execution time in the below cases.
to execute if MySQL optimizer choses any other index to be efficient(during false positive cases).
If the number of records is more ( As clearly stated by Jacob )
From what I understand about the way that the compiler optimizes these types of queries, using the IN clause is more efficient than multiple OR clauses. If you have values where the BETWEEN clause can be used, that is more efficient still.
I know that, as long as you have an index on Field, the BETWEEN will use it to quickly find one end, then traverse to the other. This is most efficient.
Every EXPLAIN I've seen shows "IN ( ... )" and " ... OR ..." to be interchangeable and equally (in)efficient. Which you would expect, since the optimizer has no way to know whether or not they comprise an interval. It's also equivalent to a UNION ALL SELECT on the individual values.