I have a MySQL table that holds about 8 million records, and I need to run some analytics on it to get averages, as shown in the table definition and query below. The result contains hourly analytics (the average of a parameter value) for the last year of data.
MySQL Server Version : 8.0.15
Table:
create table `temp_data` (
`dateLogged` datetime NOT NULL,
`paramName` varchar(30) NOT NULL,
`paramValue` float NOT NULL,
`sensorId` varchar(20) NOT NULL,
`locationCode` varchar(30) NOT NULL,
PRIMARY KEY (`sensorId`,`paramName`,`dateLogged`),
KEY `summary` (`locationCode`,`paramName`,`dateLogged`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
Query: The query below transposes row-based parameters into columns and, while doing so, computes the average of the parameter values.
SELECT dateLogged,
ROUND(avg( ROUND(IF(paramName = 'temp1', paramValue, NULL),2) ),2) AS T1,
ROUND(avg( ROUND(IF(paramName = 'temp2', paramValue, NULL),2) ),2) AS T2,
ROUND(avg( ROUND(IF(paramName = 'temp3', paramValue, NULL),2) ),2) AS T3,
ROUND(avg( ROUND(IF(paramName = 'temp4', paramValue, NULL),2) ),2) as T4
FROM temp_data where locationCode='A123' and paramName in ('temp1','temp2','temp3','temp4')
group by dateLogged order by dateLogged;
Result:
+---------------------+--------+---------+-------+-------+
| date                | T1     | T2      | T3    | T4    |
+---------------------+--------+---------+-------+-------+
| 2018-12-01 00:00:00 |  95.46 |   99.12 | 96.44 | 95.86 |
| 2018-12-01 01:00:00 | 100.38 |  101.09 | 99.56 | 99.70 |
| 2018-12-01 02:00:00 | 101.41 |  102.08 | 99.47 | 99.88 |
| 2018-12-01 03:00:00 |  98.79 |  100.47 | 98.59 | 99.75 |
| 2018-12-01 04:00:00 |  98.23 |  100.58 | 98.38 | 98.93 |
| 2018-12-01 05:00:00 | 101.03 |  101.80 | 99.37 | 99.88 |
| ...                 | ...    | ...     | ...   | ...   |
+---------------------+--------+---------+-------+-------+
Problem:
There are now over 8 million records in the table, and the query takes approximately 35 to 40 seconds to execute.
I'm looking for suggestions on how to improve the query performance and, hopefully, bring it down to under 10 seconds.
Note:
The table holds up to 1 year of data; anything older is archived and deleted.
Result of describe:
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
| 1 | SIMPLE | temp_data | NULL | ref | PRIMARY,summary | summary | 53 | const | 3524800 | 50.00 | Using index condition; Using temporary; Using filesort |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
As temp1 through temp4 are fixed, we can use a generated column to index this:
alter table temp_data add p1234 bool as (paramName IN ('temp1','temp2','temp3','temp4')) NOT NULL,
ADD KEY s1234 (locationCode, p1234, paramName, paramValue, dateLogged);
Then change the query to:
SELECT dateLogged, paramName,
ROUND(avg( ROUND(paramValue,2) ),2)
FROM temp_data where locationCode='A123' and p1234
group by dateLogged, paramName
order by dateLogged, paramName;
Handle the T1 -> T4 paramName formatting in the application code
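If it is more convenient to keep the pivot in SQL rather than in the application, here is a minimal sketch (untested, and assuming the p1234 column and s1234 index above are in place): aggregate per (dateLogged, paramName) first so the covering index does the heavy lifting, then spread the four parameter rows into columns in a cheap outer pass:
-- Sketch only: the inner query groups on the covering index;
-- the outer query pivots at most 4 rows per dateLogged value.
SELECT dateLogged,
       MAX(IF(paramName = 'temp1', avgVal, NULL)) AS T1,
       MAX(IF(paramName = 'temp2', avgVal, NULL)) AS T2,
       MAX(IF(paramName = 'temp3', avgVal, NULL)) AS T3,
       MAX(IF(paramName = 'temp4', avgVal, NULL)) AS T4
FROM (
    SELECT dateLogged, paramName,
           ROUND(AVG(ROUND(paramValue, 2)), 2) AS avgVal
    FROM temp_data
    WHERE locationCode = 'A123' AND p1234
    GROUP BY dateLogged, paramName
) g
GROUP BY dateLogged
ORDER BY dateLogged;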
Related
I have the following table my_entry:
Id int(11) AI PK
InternalId varchar(30)
UpdatedDate datetime
IsDeleted bit(1)
And I have the following query:
SELECT
`Id`, `InternalId`
FROM
`my_entry` AS `x`
WHERE
(`IsDeleted` = FALSE)
AND ((`UpdatedDate` IS NULL
OR DATE(`UpdatedDate`) != DATE(STR_TO_DATE('17/10/2019', '%d/%m/%Y'))))
ORDER BY `x`.`UpdatedDate`
Limit 200;
The table has around 3M records. I have a program running that executes the above query and returns 200 entries from the table that weren't updated today. The program then changes those 200 entries and updates them, setting UpdatedDate to today's date. On the next execution those 200 entries are ignored and a new 200 entries get selected, and this keeps running until all the entries in the table have been selected and updated for today.
This way I can ensure that all the entries are updated at least once every day.
This works perfectly fine for the first few thousand entries: the select query executes in a couple of milliseconds. But as soon as more entries have been updated and carry today's date in UpdatedDate, the query keeps slowing down, reaching execution times of up to 20 seconds.
I'm wondering if I can do something to optimize the query, or if there is a better approach to take without using the UpdatedDate.
I was thinking of using the Id and paginating the entries, but I'm afraid this way I might miss some of them.
What I already tried:
Adding indexes to both the UpdatedDate and IsDeleted.
Changing the UpdatedDate type from datetime to date.
Edit:
MySql version: 5.6.45
The table in hand:
CREATE TABLE `my_entry` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`InternalId` varchar(30) NOT NULL,
`UpdatedDate` date DEFAULT NULL,
`IsDeleted` bit(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`Id`),
UNIQUE KEY `InternalId` (`InternalId`),
KEY `UpdatedDate` (`UpdatedDate`),
KEY `entry_isdeleted_index` (`IsDeleted`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=8204626 DEFAULT CHARSET=utf8mb4
The output of the EXPLAIN query:
+----+-------------+-------+-------+-------------------------------------+-------------+---------+------+------+---------------+
| id | select_type | table | type  | possible_keys                       | key         | key_len | ref  | rows | Extra         |
+----+-------------+-------+-------+-------------------------------------+-------------+---------+------+------+---------------+
|  1 | SIMPLE      | x     | index | UpdatedDate,entry_isdeleted_index   | UpdatedDate | 4       | NULL |  400 | Using where   |
+----+-------------+-------+-------+-------------------------------------+-------------+---------+------+------+---------------+
Example of data in the table:
+------------+--------+---------------------+-----------+
| InternalId | Id | UpdatedDate | IsDeleted |
+------------+--------+---------------------+-----------+
| 328044773 | 552990 | 2019-10-17 10:11:29 | 0 |
| 330082707 | 552989 | 2019-10-17 10:11:29 | 0 |
| 329701688 | 552988 | 2019-10-17 10:11:29 | 0 |
| 329954358 | 552987 | 2019-10-16 10:11:29 | 0 |
| 964227577 | 552986 | 2019-10-16 12:33:29 | 0 |
| 329794593 | 552985 | 2019-10-16 12:33:29 | 0 |
| 400015773 | 552984 | 2019-10-16 12:33:29 | 0 |
| 330674329 | 552983 | 2019-10-16 12:33:29 | 0 |
+------------+--------+---------------------+-----------+
Example expected output of the query:
+------------+--------+
| InternalId | Id |
+------------+--------+
| 329954358 | 552987 |
| 964227577 | 552986 |
| 329794593 | 552985 |
| 400015773 | 552984 |
| 330674329 | 552983 |
+------------+--------+
First, simplify the date arithmetic. Then take the following approach:
Take NULL values in one subquery
Take rows on the date in another
Then order and select the results
Start by writing the query as:
SELECT Id, InternalId
FROM ((SELECT Id, InternalId, 2 as priority
FROM my_entry
WHERE NOT IsDeleted AND UpdatedDate IS NULL
LIMIT 200
) UNION ALL
(SELECT Id, InternalId, 1 as priority
FROM my_entry
WHERE NOT IsDeleted AND UpdatedDate <> '2019-10-17'
LIMIT 200
)
) t
ORDER BY priority
LIMIT 200;
The index that you want is either (updateddate, isdeleted) or (isdeleted, updateddate). You can add id and internalid.
The idea is to select at most 200 rows from the two subqueries without sorting. Then the outer query is sorting at most 400 rows -- and that should not take multiple seconds.
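For reference, a minimal sketch of such an index (the name idx_isdeleted_updateddate is just an example, and the column order is one of the two variants mentioned above):
ALTER TABLE my_entry
    ADD KEY idx_isdeleted_updateddate (IsDeleted, UpdatedDate, Id, InternalId);
With InnoDB the primary key Id is appended to every secondary index anyway, but listing Id and InternalId explicitly makes the index covering for the two subqueries.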
For a project I'm working on, I have a single table with two dates representing a range, and I needed a way to "multiply" my rows for every day in between the two dates.
So, for instance, I have start 2017-07-10 and end 2017-07-14.
I needed to get 4 rows: 2017-07-10, 2017-07-11, 2017-07-12, 2017-07-13.
To do this, I found someone here mentioning the use of a "calendar table" containing all the dates for years.
So I built it; now I have these two simple tables:
CREATE TABLE `time_sample` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start` varchar(16) DEFAULT NULL,
`end` varchar(16) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `start_idx` (`start`),
KEY `end_idx` (`end`)
) ENGINE=MyISAM AUTO_INCREMENT=222 DEFAULT CHARSET=latin1;
This table contains my date ranges; start and end are indexed, and the primary key is an incremental int.
Sample Row:
id start end
1 2015-05-13 2015-05-18
Second table:
CREATE TABLE `time_dimension` (
`id` int(11) NOT NULL,
`db_date` date NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `td_dbdate_idx` (`db_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This has a date indexed for every day for many years to come.
Sample row:
id db_date
20120101 2012-01-01
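A table like this can be populated once with a recursive CTE on MySQL 8.0+; a minimal sketch, assuming the yyyymmdd integer id convention shown above (older versions need a numbers table or a stored-procedure loop instead):
-- Raise the recursion limit, since the default of 1000 rows is not enough
-- for a multi-year range.
SET SESSION cte_max_recursion_depth = 100000;

INSERT INTO time_dimension (id, db_date)
WITH RECURSIVE seq (d) AS (
    SELECT DATE '2012-01-01'
    UNION ALL
    SELECT d + INTERVAL 1 DAY FROM seq WHERE d < '2030-12-31'
)
SELECT CAST(DATE_FORMAT(d, '%Y%m%d') AS UNSIGNED), d
FROM seq;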
Now, I made the join:
select * from time_sample s join time_dimension t on (t.db_date >= start and t.db_date < end);
This takes 3ms. Even if my first table is HUGE, this query will always be very quick (the maximum I've seen was 50ms with a lot of records).
The issue I have is when grouping the results (I need them grouped for my application):
select * from time_sample s join time_dimension t on (t.db_date >= start and t.db_date < end) group by db_date;
This takes more than one second even with few rows in the first table, and it increases dramatically. Why is this happening, and how can I avoid it?
Changing the data types doesn't help, and having the second table with just one column doesn't help either.
Can I have some suggestions, please? :(
I cannot replicate this result...
I have a calendar table with lots of dates: calendar(dt) where dt is a PRIMARY KEY DATE data type.
DROP TABLE IF EXISTS time_sample;
CREATE TABLE time_sample (
id int(11) NOT NULL AUTO_INCREMENT,
start date not NULL,
end date null,
PRIMARY KEY (id),
KEY (start,end)
);
INSERT INTO time_sample (start,end) VALUES ('2010-03-13','2010-05-09');
SELECT *
FROM calendar x
JOIN time_sample y
ON x.dt BETWEEN y.start AND y.end;
+------------+----+------------+------------+
| dt | id | start | end |
+------------+----+------------+------------+
| 2010-03-13 | 1 | 2010-03-13 | 2010-05-09 |
| 2010-03-14 | 1 | 2010-03-13 | 2010-05-09 |
| 2010-03-15 | 1 | 2010-03-13 | 2010-05-09 |
| 2010-03-16 | 1 | 2010-03-13 | 2010-05-09 |
...
| 2010-05-09 | 1 | 2010-03-13 | 2010-05-09 |
+------------+----+------------+------------+
58 rows in set (0.10 sec)
EXPLAIN
SELECT * FROM calendar x JOIN time_sample y ON x.dt BETWEEN y.start AND y.end;
+----+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | y | system | start | NULL | NULL | NULL | 1 | |
| 1 | SIMPLE | x | range | PRIMARY | PRIMARY | 3 | NULL | 57 | Using where; Using index |
+----+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)
Even with a GROUP BY, I'm struggling to reproduce the problem. Here's a simple COUNT...
SELECT SQL_NO_CACHE dt, COUNT(1) FROM calendar x JOIN time_sample y WHERE x.dt BETWEEN y.start AND y.end GROUP BY dt ORDER BY COUNT(1) DESC LIMIT 3;
+------------+----------+
| dt | COUNT(1) |
+------------+----------+
| 2010-04-03 | 2 |
| 2010-05-05 | 2 |
| 2010-03-13 | 2 |
+------------+----------+
3 rows in set (0.36 sec)
EXPLAIN
SELECT SQL_NO_CACHE dt, COUNT(1) FROM calendar x JOIN time_sample y WHERE x.dt BETWEEN y.start AND y.end GROUP BY dt ORDER BY COUNT(1) DESC LIMIT 3;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | y | index | start | start | 7 | NULL | 2 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | x | index | PRIMARY | PRIMARY | 3 | NULL | 1000001 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
I have a table with currency exchange rates that I fill with data published by the ECB. That data contains gaps in the date dimension, e.g. for holidays.
CREATE TABLE `imp_exchangerate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rate_date` date NOT NULL,
`currency` char(3) NOT NULL,
`rate` decimal(14,6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `rate_date` (`rate_date`,`currency`),
KEY `imp_exchangerate_by_currency` (`currency`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I also have a date dimension as you'd expect in a data warehouse:
CREATE TABLE `d_date` (
`date_id` int(11) NOT NULL,
`full_date` date DEFAULT NULL,
-- etc.
PRIMARY KEY (`date_id`),
KEY `full_date` (`full_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Now I try to fill the gaps in the exchange rates like this:
SELECT
d.full_date,
currency,
(SELECT rate FROM imp_exchangerate
WHERE rate_date <= d.full_date AND currency = c.currency
ORDER BY rate_date DESC LIMIT 1) AS rate
FROM
d_date d,
(SELECT DISTINCT currency FROM imp_exchangerate) c
WHERE
d.full_date >=
(SELECT min(rate_date) FROM imp_exchangerate
WHERE currency = c.currency) AND
d.full_date <= curdate()
Explain says:
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 201 | |
| 1 | PRIMARY | d | range | full_date | full_date | 4 | NULL | 6047 | Using where; Using index; Using join buffer (flat, BNL join) |
| 4 | DEPENDENT SUBQUERY | imp_exchangerate | ref | imp_exchangerate_by_currency | imp_exchangerate_by_currency | 3 | c.currency | 664 | |
| 3 | DERIVED | imp_exchangerate | range | NULL | imp_exchangerate_by_currency | 3 | NULL | 201 | Using index for group-by |
| 2 | DEPENDENT SUBQUERY | imp_exchangerate | index | rate_date,imp_exchangerate_by_currency | rate_date | 6 | NULL | 1 | Using where |
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
MySQL needs multiple hours to execute that query. Are there any ideas for how to improve it? I have tried an index on rate without any noticeable impact.
I have had a solution for a while now: get rid of the dependent subqueries. I had to think from different angles in multiple places, and here is the result:
SELECT
cd.date_id,
x.currency,
x.rate
FROM
imp_exchangerate x INNER JOIN
(SELECT
d.date_id,
max(rate_date) as rate_date,
currency
FROM
d_date d INNER JOIN
imp_exchangerate ON rate_date <= d.full_date
WHERE
d.full_date <= curdate()
GROUP BY
d.date_id,
currency) cd ON x.rate_date = cd.rate_date and x.currency = cd.currency
This query now finishes in less than 10 minutes, compared to multiple hours for the original query.
Lesson learned: avoid dependent subqueries in MySQL like the plague!
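One further, hedged suggestion on top of that rewrite: the join and the GROUP BY work on (currency, rate_date), so a composite index in exactly that order may help as well; a sketch (the index name is arbitrary):
ALTER TABLE imp_exchangerate
    ADD KEY imp_exchangerate_currency_date (currency, rate_date);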
The following query hangs in the "Sending data" phase for an incredibly long time. It is a large query, but I'm hoping to get some assistance with my indexes and possibly learn a bit more about how MySQL actually chooses which index it's going to use.
Below is the query as well as the output of a DESCRIBE statement.
mysql> DESCRIBE SELECT e.employee_number, s.current_status_start_date, e.company_code, e.location_code, s.last_suffix_first_mi, s.job_title, SUBSTRING(e.job_code,1,1) tt_jobCode,
-> SUM(e.current_amount) tt_grossWages,
-> IFNULL((SUM(e.current_amount) - IF(tt1.tt_reduction = '','0',tt1.tt_reduction)),SUM(e.current_amount)) tt_taxableWages,
-> t.new_code, STR_TO_DATE(s.last_hire_date, '%Y-%m-%d') tt_hireDate,
-> IF(s.current_status_code = 'T',STR_TO_DATE(s.current_status_start_date, '%Y-%m-%d'),'') tt_terminationDate,
-> IFNULL(tt_totalHours,'0') tt_totalHours
-> FROM check_earnings e
-> LEFT JOIN (
-> SELECT * FROM summary
-> GROUP BY employee_no
-> ORDER BY current_status_start_date DESC
-> ) s
-> ON e.employee_number = s.employee_no
-> LEFT JOIN (
-> SELECT employee_no, SUM(current_amount__employee) tt_reduction
-> FROM check_deductions
-> WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND (
-> deduction_code IN ('DECMP','FSAM','FSAC','DCMAK','DCMAT','401KD')
-> OR deduction_code LIKE 'IM%'
-> OR deduction_code LIKE 'ID%'
-> OR deduction_code LIKE 'IV%'
-> )
-> GROUP BY employee_no
-> ORDER BY employee_no ASC
-> ) tt1
-> ON e.employee_number = tt1.employee_no
-> LEFT JOIN translation t
-> ON e.location_code = t.old_code
-> LEFT JOIN (
-> SELECT employee_number, SUM(current_hours) tt_totalHours
-> FROM check_earnings
-> WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND earnings_code IN ('REG1','REG2','REG3','REG4')
-> GROUP BY employee_number
-> ) tt2
-> ON e.employee_number = tt2.employee_number
-> WHERE STR_TO_DATE(e.pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(e.pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND SUBSTRING(e.job_code,1,1) != 'E'
-> AND e.location_code != '639'
-> AND t.field = 'location_state'
-> GROUP BY e.employee_number
-> ORDER BY s.current_status_start_date DESC, e.location_code ASC, s.last_suffix_first_mi ASC;
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
| 1 | PRIMARY | e | ALL | location_code | NULL | NULL | NULL | 3498603 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | t | ref | field,old_code | old_code | 303 | historical.e.location_code | 1 | Using where |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 16741 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 2530 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 2919 | |
| 4 | DERIVED | check_earnings | index | NULL | employee_number | 303 | NULL | 3498603 | Using where |
| 3 | DERIVED | check_deductions | index | deduction_code | employee_no | 303 | NULL | 6387048 | Using where |
| 2 | DERIVED | summary | index | NULL | employee_no | 303 | NULL | 17608 | Using temporary; Using filesort |
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
8 rows in set, 65535 warnings (32.77 sec)
EDIT: After playing with some indexes, it now spends the most time in the "Copying to tmp table" state.
There's no way you can avoid use of a temp table in that query. One reason is that you are grouping by different columns than you are sorting by.
Another reason is the use of derived tables (subqueries in the FROM/JOIN clauses).
One way you could speed this up is to create summary tables to store the result of those subqueries so you don't have to do them during every query.
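For example, the derived table tt2 (regular hours per employee) could be materialized once and refreshed after each data load; a rough sketch, assuming pay_date is stored as DATE or as 'YYYY-MM-DD' strings so a plain range comparison is valid (the table and index names are illustrative):
-- Materialize the per-employee regular-hours rollup once,
-- then join this table instead of recomputing the subquery on every run.
CREATE TABLE summary_regular_hours AS
SELECT employee_number, SUM(current_hours) AS tt_totalHours
FROM check_earnings
WHERE pay_date >= '2012-06-01'
  AND pay_date <= '2013-06-01'
  AND earnings_code IN ('REG1','REG2','REG3','REG4')
GROUP BY employee_number;

ALTER TABLE summary_regular_hours ADD KEY (employee_number);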
You are also forcing table-scans by searching on the result of functions like STR_TO_DATE() and SUBSTR(). These cannot be optimized with an index.
Re your comment:
I can make a query against a far smaller table run for 72 hours if it's poorly optimized.
Note, for example, that the output of your DESCRIBE shows "ALL" for several of the tables involved in the join. This means it has to do a table-scan of all the rows (shown in the 'rows' column).
A rule of thumb: how many row comparisons does it take to resolve the join? Multiply the 'rows' values of all the tables joined together with the same 'id'.
+----+-------------+------------------+-------+---------+
| id | select_type | table | type | rows |
+----+-------------+------------------+-------+---------+
| 1 | PRIMARY | e | ALL | 3498603 |
| 1 | PRIMARY | t | ref | 1 |
| 1 | PRIMARY | <derived2> | ALL | 16741 |
| 1 | PRIMARY | <derived3> | ALL | 2530 |
| 1 | PRIMARY | <derived4> | ALL | 2919 |
So it may be evaluating the join conditions 432,544,383,105,752,610 times (those row counts are approximate, so it may not really be as bad as that). It's actually a miracle it takes only 5 hours!
What you need to do is use indexes to help the query reduce the number of rows it needs to examine.
For example, why are you using STR_TO_DATE() given that the date you are parsing is the native date format for MySQL? Why don't you store those columns as a DATE data type? Then the search could use an index.
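A hedged sketch of that change (it assumes pay_date currently holds 'YYYY-MM-DD' strings that convert cleanly, and the index name is just an example; verify on a copy before altering a production table):
ALTER TABLE check_earnings
    MODIFY pay_date DATE,
    ADD KEY idx_paydate (pay_date);

-- The range filter can then use the index directly:
--   WHERE e.pay_date >= '2012-06-01' AND e.pay_date <= '2013-06-01'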
You don't need to "play with indexes." It's not like indexing is a mystery or has random effects. See my presentation How to Design Indexes, Really for some introduction.
I have the following two MySQL tables which I need to join:
CREATE TABLE `tbl_L` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
`series_id` int(11) NOT NULL,
PRIMARY KEY (`series_id`,`datetime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `tbl_R` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
`series_id` int(11) NOT NULL,
PRIMARY KEY (`series_id`,`datetime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I need to select all the dates and values from tbl_L, but also the values in tbl_R that have the same datetime as an entry in tbl_L. A trivial join, like so:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
WHERE
tbl_L.series_id = 1 AND tbl_R.series_id = 2 ORDER BY tbl_L.datetime ASC
Won't work, because it will only return datetimes that exist in both tbl_L and tbl_R (since the right table is referenced in the WHERE clause).
Modifying the query to look like this:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
AND tbl_R.series_id = 2
AND tbl_L.series_id = 1
ORDER BY tbl_L.datetime ASC;
Significantly slows it down (from a few milliseconds to a few long seconds).
Edit: and also doesn't actually work. I will clarify what I need to achieve:
Assume the following data in the tables:
mysql> SELECT * FROM tbl_R;
+---------------------+------------+-----------+
| datetime | value | series_id |
+---------------------+------------+-----------+
| 2013-02-20 19:21:00 | 5.87000000 | 2 |
| 2013-02-20 19:22:00 | 5.90000000 | 2 |
| 2013-02-20 19:23:00 | 5.80000000 | 2 |
| 2013-02-20 19:25:00 | 5.65000000 | 2 |
+---------------------+------------+-----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM tbl_L;
+---------------------+-------------+-----------+
| datetime | value | series_id |
+---------------------+-------------+-----------+
| 2013-02-20 19:21:00 | 13.16000000 | 1 |
| 2013-02-20 19:23:00 | 13.22000000 | 1 |
| 2013-02-20 19:24:00 | 13.14000000 | 1 |
| 2013-02-20 19:25:00 | 13.04000000 | 1 |
+---------------------+-------------+-----------+
4 rows in set (0.00 sec)
Again, I need all entries in tbl_L joined with the entries in tbl_R that match in terms of datetime, otherwise NULL.
My output should look like this:
+---------------------+-------------+-------------+
| datetime | val_L | val_R |
+---------------------+-------------+-------------+
| 2013-02-20 19:21:00 | 13.16000000 | 5.870000000 |
| 2013-02-20 19:23:00 | 13.22000000 | 5.800000000 |
| 2013-02-20 19:24:00 | 13.14000000 | NULL |
| 2013-02-20 19:25:00 | 13.04000000 | 5.650000000 |
+---------------------+-------------+-------------+
Thanks again!
You can get the data you want by moving only the condition for tbl_R into the join's ON clause like this:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
AND tbl_R.series_id = 2
WHERE
tbl_L.series_id = 1 ORDER BY tbl_L.datetime ASC
Also, make sure the query has an index it can use on tbl_L: one with series_id as the leading column helps both the series_id filter and the ORDER BY on datetime. The primary key (series_id, datetime) shown above already provides this; if your real table lacks such a key, adding an index on tbl_L.series_id will help the query's performance.