MySQL Left Join and NULL columns from right table

I have the following two MySQL tables which I need to join:
CREATE TABLE `tbl_L` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
`series_id` int(11) NOT NULL,
PRIMARY KEY (`series_id`,`datetime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `tbl_R` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
`series_id` int(11) NOT NULL,
PRIMARY KEY (`series_id`,`datetime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I need to select all the dates and values from tbl_L, but also the values in tbl_R that have the same datetime as an entry in tbl_L. A trivial join, like so:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
WHERE
tbl_L.series_id = 1 AND tbl_R.series_id = 2 ORDER BY tbl_L.datetime ASC
won't work, because it returns only the datetimes that are in both tbl_L and tbl_R: the WHERE clause filters on the right table, which discards the rows where tbl_R is NULL.
Modifying the query to look like this:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
AND tbl_R.series_id = 2
AND tbl_L.series_id = 1
ORDER BY tbl_L.datetime ASC;
significantly slows it down (from a few milliseconds to several seconds).
Edit: and also doesn't actually work. I will clarify what I need to achieve:
Assume the following data in the tables:
mysql> SELECT * FROM tbl_R;
+---------------------+------------+-----------+
| datetime            | value      | series_id |
+---------------------+------------+-----------+
| 2013-02-20 19:21:00 | 5.87000000 |         2 |
| 2013-02-20 19:22:00 | 5.90000000 |         2 |
| 2013-02-20 19:23:00 | 5.80000000 |         2 |
| 2013-02-20 19:25:00 | 5.65000000 |         2 |
+---------------------+------------+-----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM tbl_L;
+---------------------+-------------+-----------+
| datetime            | value       | series_id |
+---------------------+-------------+-----------+
| 2013-02-20 19:21:00 | 13.16000000 |         1 |
| 2013-02-20 19:23:00 | 13.22000000 |         1 |
| 2013-02-20 19:24:00 | 13.14000000 |         1 |
| 2013-02-20 19:25:00 | 13.04000000 |         1 |
+---------------------+-------------+-----------+
4 rows in set (0.00 sec)
Again, I need all entries in tbl_L joined with the entries in tbl_R that match in terms of datetime, otherwise NULL.
My output should look like this:
+---------------------+-------------+-------------+
| datetime            | val_L       | val_R       |
+---------------------+-------------+-------------+
| 2013-02-20 19:21:00 | 13.16000000 |  5.87000000 |
| 2013-02-20 19:23:00 | 13.22000000 |  5.80000000 |
| 2013-02-20 19:24:00 | 13.14000000 |        NULL |
| 2013-02-20 19:25:00 | 13.04000000 |  5.65000000 |
+---------------------+-------------+-------------+
Thanks again!

You can get the data you want by moving only the condition for tbl_R into the join's ON clause like this:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
AND tbl_R.series_id = 2
WHERE
tbl_L.series_id = 1 ORDER BY tbl_L.datetime ASC
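Also, check that the query can use an index on tbl_L.series_id. With the schema shown, the primary key (series_id, datetime) already leads with series_id, so it can serve both the tbl_L.series_id = 1 filter and the join lookup on tbl_R; a separate index on series_id is only needed if that primary key is missing.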
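To see the difference between the two filter placements on the question's sample data, here is a minimal sketch. The temporary table names demo_L and demo_R and the aliases l and r are made up for illustration, so the script does not touch the real tbl_L and tbl_R; the columns and rows mirror those shown in the question.

```sql
-- Hypothetical scratch tables (names are assumptions, not from the question).
CREATE TEMPORARY TABLE demo_L (
  `datetime`  datetime      NOT NULL,
  `value`     decimal(14,8) DEFAULT NULL,
  `series_id` int           NOT NULL,
  PRIMARY KEY (`series_id`, `datetime`)
);
CREATE TEMPORARY TABLE demo_R (
  `datetime`  datetime      NOT NULL,
  `value`     decimal(14,8) DEFAULT NULL,
  `series_id` int           NOT NULL,
  PRIMARY KEY (`series_id`, `datetime`)
);

-- Sample rows copied from the question.
INSERT INTO demo_L (`datetime`, `value`, `series_id`) VALUES
  ('2013-02-20 19:21:00', 13.16, 1),
  ('2013-02-20 19:23:00', 13.22, 1),
  ('2013-02-20 19:24:00', 13.14, 1),
  ('2013-02-20 19:25:00', 13.04, 1);
INSERT INTO demo_R (`datetime`, `value`, `series_id`) VALUES
  ('2013-02-20 19:21:00', 5.87, 2),
  ('2013-02-20 19:22:00', 5.90, 2),
  ('2013-02-20 19:23:00', 5.80, 2),
  ('2013-02-20 19:25:00', 5.65, 2);

-- Right-table filter in WHERE: the 19:24 row has r.series_id = NULL after
-- the LEFT JOIN, fails the WHERE test, and is dropped (3 rows returned).
SELECT l.`datetime`, l.`value` AS val_L, r.`value` AS val_R
FROM demo_L AS l
LEFT JOIN demo_R AS r
  ON l.`datetime` = r.`datetime`
WHERE l.series_id = 1 AND r.series_id = 2
ORDER BY l.`datetime`;

-- Right-table filter in ON: every demo_L row for series 1 is kept, and the
-- 19:24 row comes back with val_R = NULL (4 rows returned).
SELECT l.`datetime`, l.`value` AS val_L, r.`value` AS val_R
FROM demo_L AS l
LEFT JOIN demo_R AS r
  ON l.`datetime` = r.`datetime`
 AND r.series_id = 2
WHERE l.series_id = 1
ORDER BY l.`datetime`;
```

The second SELECT has the same shape as the query in the answer; only the table names differ.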

Related

MySQL Analytics Query - Improve Performance

I have a MySQL table that holds about 8 million records, and I need to run some analytics on it to get averages, as shown in the table definition and query below. The result contains hourly analytics (the average of a parameter value) for the last year of data.
MySQL Server Version : 8.0.15
Table:
create table `temp_data` (
`dateLogged` datetime NOT NULL,
`paramName` varchar(30) NOT NULL,
`paramValue` float NOT NULL,
`sensorId` varchar(20) NOT NULL,
`locationCode` varchar(30) NOT NULL,
PRIMARY KEY (`sensorId`,`paramName`,`dateLogged`),
KEY `summary` (`locationCode`,`paramName`,`dateLogged`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED;
Query: The below query transposes row based parameters into columns and while doing so computes the average of param values
SELECT dateLogged,
ROUND(avg( ROUND(IF(paramName = 'temp1', paramValue, NULL),2) ),2) AS T1,
ROUND(avg( ROUND(IF(paramName = 'temp2', paramValue, NULL),2) ),2) AS T2,
ROUND(avg( ROUND(IF(paramName = 'temp3', paramValue, NULL),2) ),2) AS T3,
ROUND(avg( ROUND(IF(paramName = 'temp4', paramValue, NULL),2) ),2) as T4
FROM temp_data where locationCode='A123' and paramName in ('temp1','temp2','temp3','temp4')
group by dateLogged order by dateLogged;
Result:
+---------------------+--------+---------+-------+-------+
| date | T1 | T2 | T3 | T4 |
+---------------------+--------+---------+-------+-------+
| 2018-12-01 00:00:00 | 95.46 | 99.12 | 96.44 | 95.86 |
| 2018-12-01 01:00:00 | 100.38 | 101.09 | 99.56 | 99.70 |
| 2018-12-01 02:00:00 | 101.41 | 102.08 | 99.47 | 99.88 |
| 2018-12-01 03:00:00 | 98.79 | 100.47 | 98.59 | 99.75 |
| 2018-12-01 04:00:00 | 98.23 | 100.58 | 98.38 | 98.93 |
| 2018-12-01 05:00:00 | 101.03 | 101.80 | 99.37 | 99.88 |
... ... ... ... ...
+---------------------+--------+---------+---------+-----+
Problem:
Now there are over 8 million records in the table, and the query takes approximately 35 to 40 seconds to execute.
Looking for suggestions on how to improve the query performance and hopefully, bring it down to under 10 seconds.
Note:
The table has data for up to 1 year and data beyond that is archived and deleted
Result of describe:
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
| 1 | SIMPLE | temp_data | NULL | ref | PRIMARY,summary | summary | 53 | const | 3524800 | 50.00 | Using index condition; Using temporary; Using filesort |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+

Since temp1 through temp4 are fixed, we can use a generated column to index this:
alter table temp_data add p1234 bool as (paramName IN ('temp1','temp2','temp3','temp4')) NOT NULL,
ADD KEY s1234 (locationCode, p1234, paramName, paramValue, dateLogged);
Then change the query to:
SELECT dateLogged, paramName,
ROUND(avg( ROUND(paramValue,2) ),2)
FROM temp_data where locationCode='A123' and p1234
group by dateLogged, paramName
order by dateLogged, paramName;
Then handle the T1 -> T4 paramName formatting in the application code.
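If the pivot has to stay in SQL rather than move into application code, one possible sketch wraps the grouped query in a derived table and pivots its rows. The aliases avgValue and per_param are made up for illustration, p1234 is the generated column added by the ALTER TABLE above, and whether this is as fast as pivoting in the application is untested here.

```sql
-- Sketch only: pivot the per-(dateLogged, paramName) averages back into columns.
-- Each (dateLogged, paramName) pair yields exactly one inner row, so MAX()
-- just picks that single value; it does not change the average.
SELECT dateLogged,
       MAX(IF(paramName = 'temp1', avgValue, NULL)) AS T1,
       MAX(IF(paramName = 'temp2', avgValue, NULL)) AS T2,
       MAX(IF(paramName = 'temp3', avgValue, NULL)) AS T3,
       MAX(IF(paramName = 'temp4', avgValue, NULL)) AS T4
FROM (
    -- Same grouped query as above, with an alias on the aggregate.
    SELECT dateLogged, paramName,
           ROUND(AVG(ROUND(paramValue, 2)), 2) AS avgValue
    FROM temp_data
    WHERE locationCode = 'A123' AND p1234
    GROUP BY dateLogged, paramName
) AS per_param
GROUP BY dateLogged
ORDER BY dateLogged;
```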

Selecting the oldest updated set of entries

I have the following table my_entry:
Id int(11) AI PK
InternalId varchar(30)
UpdatedDate datetime
IsDeleted bit(1)
And I have the following query:
SELECT
`Id`, `InternalId`
FROM
`my_entry` AS `x`
WHERE
(`IsDeleted` = FALSE)
AND ((`UpdatedDate` IS NULL
OR DATE(`UpdatedDate`) != DATE(STR_TO_DATE('17/10/2019', '%d/%m/%Y'))))
ORDER BY `x`.`UpdatedDate`
Limit 200;
The table has around 3M records. A program repeatedly executes the above query, which returns 200 entries that weren't updated today; the program then changes those 200 entries and writes them back with UpdatedDate set to today's date. On the next execution those 200 entries are skipped and a new batch of 200 is selected. This repeats until every entry in the table has been selected and updated for today.
This way I can ensure that all the entries are updated at least once every day.
This works perfectly fine for the first few thousand entries, where the select query executes in a couple of milliseconds. But as more entries are updated and carry today's date in UpdatedDate, the query keeps slowing down, reaching execution times of up to 20 seconds.
I'm wondering if I can do something to optimize the query, or if there is a better approach to take without using the UpdatedDate.
I was thinking of using the Id and paginating the entries, but I'm afraid this way I might miss some of them.
What I already tried:
Adding indexes to both the UpdatedDate and IsDeleted.
Changing the UpdatedDate type from datetime to date.
Edit:
MySQL version: 5.6.45
The table in hand:
CREATE TABLE `my_entry` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`InternalId` varchar(30) NOT NULL,
`UpdatedDate` date DEFAULT NULL,
`IsDeleted` bit(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`Id`),
UNIQUE KEY `InternalId` (`InternalId`),
KEY `UpdatedDate` (`UpdatedDate`),
KEY `entry_isdeleted_index` (`IsDeleted`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=8204626 DEFAULT CHARSET=utf8mb4;
The output of the EXPLAIN query:
+----+-------------+-------+-------+-------------------------------------+-------------+---------+------+------+---------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------+-------------+---------+------+------+---------------+
| 1 | SIMPLE | x | index | UpdatedDate,entry_isdeleted_index | UpdatedDate | 4 | NULL | 400 | Using where |
+----+-------------+-------+-------+-------------------------------------+-------------+---------+------+------+---------------+
Example of data in the table:
+------------+--------+---------------------+-----------+
| InternalId | Id | UpdatedDate | IsDeleted |
+------------+--------+---------------------+-----------+
| 328044773 | 552990 | 2019-10-17 10:11:29 | 0 |
| 330082707 | 552989 | 2019-10-17 10:11:29 | 0 |
| 329701688 | 552988 | 2019-10-17 10:11:29 | 0 |
| 329954358 | 552987 | 2019-10-16 10:11:29 | 0 |
| 964227577 | 552986 | 2019-10-16 12:33:29 | 0 |
| 329794593 | 552985 | 2019-10-16 12:33:29 | 0 |
| 400015773 | 552984 | 2019-10-16 12:33:29 | 0 |
| 330674329 | 552983 | 2019-10-16 12:33:29 | 0 |
+------------+--------+---------------------+-----------+
Example expected output of the query:
+------------+--------+
| InternalId | Id |
+------------+--------+
| 329954358 | 552987 |
| 964227577 | 552986 |
| 329794593 | 552985 |
| 400015773 | 552984 |
| 330674329 | 552983 |
+------------+--------+

First, simplify the date arithmetic. Then take the following approach:
Take rows with a NULL UpdatedDate in one subquery
Take rows whose UpdatedDate is not the given date in another
Then order the combined rows and select the results
Start by writing the query as:
SELECT Id, InternalId
FROM ((SELECT Id, InternalId, 2 as priority
FROM my_entry
WHERE NOT IsDeleted AND UpdatedDate IS NULL
LIMIT 200
) UNION ALL
(SELECT Id, InternalId, 1 as priority
FROM my_entry
WHERE NOT IsDeleted AND UpdatedDate <> '2019-10-17'
LIMIT 200
)
) t
ORDER BY priority
LIMIT 200;
The index that you want is on either (UpdatedDate, IsDeleted) or (IsDeleted, UpdatedDate). You can also add Id and InternalId to it.
The idea is to select at most 200 rows from the two subqueries without sorting. Then the outer query is sorting at most 400 rows -- and that should not take multiple seconds.
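As a sketch of the second index option, the statement below adds a composite index. The index name idx_isdeleted_updateddate is hypothetical. Because InnoDB secondary indexes already carry the primary key columns, Id does not need to be listed; appending InternalId is an optional choice that lets the subqueries read both selected columns from the index alone.

```sql
-- Hypothetical index name; column order follows the (IsDeleted, UpdatedDate) option.
ALTER TABLE my_entry
  ADD INDEX idx_isdeleted_updateddate (IsDeleted, UpdatedDate, InternalId);

-- Check which index the optimizer actually picks for one of the subqueries.
EXPLAIN
SELECT Id, InternalId
FROM my_entry
WHERE NOT IsDeleted AND UpdatedDate IS NULL
LIMIT 200;
```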

MySQL: Strange behavior of UPDATE query (ERROR 1062 Duplicate entry)

I have a MySQL database that stores news articles with the publication date (just day information), the source, and the category. Based on these I want to generate a table that holds the article counts with respect to these 3 parameters.
Since for some combinations of these 3 parameters there might be no article, a simple GROUP BY won't do. I therefore first generate a table news_article_counts with all possible combinations of the 3 parameters and a default article_count of 0 -- like this:
SELECT * FROM news_article_counts;
+--------------+------------+----------+---------------+
| published_at | source | category | article_count |
+--------------+------------+----------+---------------+
| 2016-08-05 | 1826089206 | 0 | 0 |
| 2016-08-05 | 1826089206 | 1 | 0 |
| 2016-08-05 | 1826089206 | 2 | 0 |
| 2016-08-05 | 1826089206 | 3 | 0 |
| 2016-08-05 | 1826089206 | 4 | 0 |
| ... | ... | ... | ... |
+--------------+------------+----------+---------------+
For testing, I now created a temporary table tmp as the GROUP BY result from the original news article table:
SELECT * FROM tmp LIMIT 6;
+--------------+------------+----------+-----+
| published_at | source | category | cnt |
+--------------+------------+----------+-----+
| 2016-08-05 | 1826089206 | 3 | 1 |
| 2003-09-19 | 1826089206 | 4 | 1 |
| 2005-08-08 | 1826089206 | 3 | 1 |
| 2008-07-22 | 1826089206 | 4 | 1 |
| 2008-11-26 | 1826089206 | 8 | 1 |
| ... | ... | ... | ... |
+--------------+------------+----------+-----+
Given these two tables, the following query works as expected:
SELECT * FROM news_article_counts c, tmp t
WHERE c.published_at = t.published_at AND c.source = t.source AND c.category = t.category;
But now I need to update the article_count of table news_article_counts with the values in table tmp where the 3 parameters match up. For this I'm using the following query (I've tried different ways but with the same results):
UPDATE
news_article_counts c
INNER JOIN
tmp t
ON
c.published_at = t.published_at AND
c.source = t.source AND
c.category = t.category
SET
c.article_count = t.cnt;
Executing this query yields this error:
ERROR 1062 (23000): Duplicate entry '2018-04-07 14:46:17-1826089206-1' for key 'uniqueIndex'
uniqueIndex is a joint index over published_at, source, category of table news_article_counts. But this shouldn't be a problem since I do not -- as far as I can tell -- update any of those 3 values, only article_count.
What confuses me most is that the error mentions the timestamp at which I executed the query (here: 2018-04-07 14:46:17). I have absolutely no idea where this comes into play. In fact, some rows in news_article_counts now have 2018-04-07 14:46:17 as the value for published_at. While this explains the error, I cannot see why published_at gets overwritten with the current timestamp. There is no ON UPDATE CURRENT_TIMESTAMP on this column; see:
CREATE TABLE IF NOT EXISTS `test`.`news_article_counts` (
`published_at` TIMESTAMP NOT NULL,
`source` INT UNSIGNED NOT NULL,
`category` INT UNSIGNED NOT NULL,
`article_count` INT UNSIGNED NOT NULL DEFAULT 0,
UNIQUE INDEX `uniqueIndex` (`published_at` ASC, `source` ASC, `category` ASC))
ENGINE = MyISAM
DEFAULT CHARACTER SET = utf8mb4;
What am I missing here?
UPDATE 1: I actually checked the table definition of news_article_counts in the database. And there's indeed the following:
mysql> SHOW COLUMNS FROM news_article_counts;
+---------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+-------------------+-----------------------------+
| published_at | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| source | int(10) unsigned | NO | | NULL | |
| category | int(10) unsigned | NO | | NULL | |
| article_count | int(10) unsigned | NO | | 0 | |
+---------------+------------------+------+-----+-------------------+-----------------------------+
But why is on update CURRENT_TIMESTAMP set? I double- and triple-checked my CREATE TABLE statement. I removed the joint index and added an artificial primary key (auto_increment). Nothing helped. I've even tried to explicitly remove these attributes from published_at with:
ALTER TABLE `news_article_counts` CHANGE `published_at` `published_at` TIMESTAMP NOT NULL;
Nothing seems to work for me.

It looks like you have the explicit_defaults_for_timestamp system variable disabled. One of the effects of this is:
The first TIMESTAMP column in a table, if not explicitly declared with the NULL attribute or an explicit DEFAULT or ON UPDATE attribute, is automatically declared with the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP attributes.
You could try enabling this system variable, but that could potentially impact other applications. I think it only takes effect when you're actually creating a table, so it shouldn't affect any existing tables.
If you don't want to make a system-level change like this, you could add an explicit DEFAULT attribute to the published_at column of this table; then ON UPDATE won't be added automatically.
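A minimal sketch of the per-table fix follows, assuming DEFAULT CURRENT_TIMESTAMP is an acceptable default for published_at (any other explicit default should have the same effect on the ON UPDATE behaviour). It does not restore rows whose published_at was already overwritten.

```sql
-- Check whether the variable is disabled (OFF) on this server.
SHOW VARIABLES LIKE 'explicit_defaults_for_timestamp';

-- Give published_at an explicit DEFAULT and no ON UPDATE clause, so the
-- automatic ON UPDATE CURRENT_TIMESTAMP attribute is not applied.
ALTER TABLE news_article_counts
  MODIFY published_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;

-- The Extra column for published_at should no longer show
-- "on update CURRENT_TIMESTAMP".
SHOW COLUMNS FROM news_article_counts;
```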

MySql calendar table and performances

For a project I'm working on, I have a single table with two dates representing a date range, and I needed a way to "multiply" my rows for every day between the two dates.
So for instance, if I have start 2017-07-10 and end 2017-07-14,
I need 4 rows with 2017-07-10, 2017-07-11, 2017-07-12, 2017-07-13.
To do this, I found someone here mentioning the use of a "calendar table" with all the dates for years.
So I built it, and now I have these two simple tables:
CREATE TABLE `time_sample` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start` varchar(16) DEFAULT NULL,
`end` varchar(16) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `start_idx` (`start`),
KEY `end_idx` (`end`)
) ENGINE=MyISAM AUTO_INCREMENT=222 DEFAULT CHARSET=latin1;
This table contains my date ranges, start and end are indexed, the primary key is an incremental int.
Sample Row:
id start end
1 2015-05-13 2015-05-18
Second table:
CREATE TABLE `time_dimension` (
`id` int(11) NOT NULL,
`db_date` date NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `td_dbdate_idx` (`db_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This has a date indexed for every day for many years to come.
Sample row:
id db_date
20120101 2012-01-01
Now, I made the join:
select * from time_sample s join time_dimension t on (t.db_date >= start and t.db_date < end);
This takes 3ms. Even if my first table is HUGE, this query is always very quick (the maximum I've seen was 50ms with a lot of records).
The issue I have appears when grouping the results (I need them grouped for my application):
select * from time_sample s join time_dimension t on (t.db_date >= start and t.db_date < end) group by db_date;
This takes more than one second with not so many rows in the first table, and the time increases dramatically as rows are added. Why is this happening, and how can I avoid it?
Changing the data types doesn't help, and neither does reducing the second table to a single column.
Can I have suggestions, please?

I cannot replicate this result...
I have a calendar table with lots of dates: calendar(dt) where dt is a PRIMARY KEY DATE data type.
DROP TABLE IF EXISTS time_sample;
CREATE TABLE time_sample (
id int(11) NOT NULL AUTO_INCREMENT,
start date not NULL,
end date null,
PRIMARY KEY (id),
KEY (start,end)
);
INSERT INTO time_sample (start,end) VALUES ('2010-03-13','2010-05-09');
SELECT *
FROM calendar x
JOIN time_sample y
ON x.dt BETWEEN y.start AND y.end;
+------------+----+------------+------------+
| dt | id | start | end |
+------------+----+------------+------------+
| 2010-03-13 | 1 | 2010-03-13 | 2010-05-09 |
| 2010-03-14 | 1 | 2010-03-13 | 2010-05-09 |
| 2010-03-15 | 1 | 2010-03-13 | 2010-05-09 |
| 2010-03-16 | 1 | 2010-03-13 | 2010-05-09 |
...
| 2010-05-09 | 1 | 2010-03-13 | 2010-05-09 |
+------------+----+------------+------------+
58 rows in set (0.10 sec)
EXPLAIN
SELECT * FROM calendar x JOIN time_sample y ON x.dt BETWEEN y.start AND y.end;
+----+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | y | system | start | NULL | NULL | NULL | 1 | |
| 1 | SIMPLE | x | range | PRIMARY | PRIMARY | 3 | NULL | 57 | Using where; Using index |
+----+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)
Even with a GROUP BY, I'm struggling to reproduce the problem. Here's a simple COUNT...
SELECT SQL_NO_CACHE dt, COUNT(1) FROM calendar x JOIN time_sample y WHERE x.dt BETWEEN y.start AND y.end GROUP BY dt ORDER BY COUNT(1) DESC LIMIT 3;
+------------+----------+
| dt | COUNT(1) |
+------------+----------+
| 2010-04-03 | 2 |
| 2010-05-05 | 2 |
| 2010-03-13 | 2 |
+------------+----------+
3 rows in set (0.36 sec)
EXPLAIN
SELECT SQL_NO_CACHE dt, COUNT(1) FROM calendar x JOIN time_sample y WHERE x.dt BETWEEN y.start AND y.end GROUP BY dt ORDER BY COUNT(1) DESC LIMIT 3;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | y | index | start | start | 7 | NULL | 2 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | x | index | PRIMARY | PRIMARY | 3 | NULL | 1000001 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+----------------------------------------------+

How to improve MySQL "fill the gaps" query

I have a table with currency exchange rates that I fill with data published by the ECB. That data contains gaps in the date dimension like e.g. holidays.
CREATE TABLE `imp_exchangerate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rate_date` date NOT NULL,
`currency` char(3) NOT NULL,
`rate` decimal(14,6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `rate_date` (`rate_date`,`currency`),
KEY `imp_exchangerate_by_currency` (`currency`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I also have a date dimension, as you'd expect in a data warehouse:
CREATE TABLE `d_date` (
`date_id` int(11) NOT NULL,
`full_date` date DEFAULT NULL,
-- etc.
PRIMARY KEY (`date_id`),
KEY `full_date` (`full_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Now I try to fill the gaps in the exchange rates like this:
SELECT
d.full_date,
currency,
(SELECT rate FROM imp_exchangerate
WHERE rate_date <= d.full_date AND currency = c.currency
ORDER BY rate_date DESC LIMIT 1) AS rate
FROM
d_date d,
(SELECT DISTINCT currency FROM imp_exchangerate) c
WHERE
d.full_date >=
(SELECT min(rate_date) FROM imp_exchangerate
WHERE currency = c.currency) AND
d.full_date <= curdate()
Explain says:
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 201 | |
| 1 | PRIMARY | d | range | full_date | full_date | 4 | NULL | 6047 | Using where; Using index; Using join buffer (flat, BNL join) |
| 4 | DEPENDENT SUBQUERY | imp_exchangerate | ref | imp_exchangerate_by_currency | imp_exchangerate_by_currency | 3 | c.currency | 664 | |
| 3 | DERIVED | imp_exchangerate | range | NULL | imp_exchangerate_by_currency | 3 | NULL | 201 | Using index for group-by |
| 2 | DEPENDENT SUBQUERY | imp_exchangerate | index | rate_date,imp_exchangerate_by_currency | rate_date | 6 | NULL | 1 | Using where |
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
MySQL needs multiple hours to execute that query. Are there any ideas on how to improve that? I have tried an index on rate without any noticeable impact.

I have had a solution for a while now: get rid of dependent subqueries. I had to think from different angles in multiple places, and here is the result:
SELECT
cd.date_id,
x.currency,
x.rate
FROM
imp_exchangerate x INNER JOIN
(SELECT
d.date_id,
max(rate_date) as rate_date,
currency
FROM
d_date d INNER JOIN
imp_exchangerate ON rate_date <= d.full_date
WHERE
d.full_date <= curdate()
GROUP BY
d.date_id,
currency) cd ON x.rate_date = cd.rate_date and x.currency = cd.currency
This query now finishes in less than 10 minutes, compared to multiple hours for the original query.
Lesson learned: avoid dependent subqueries in MySQL like the plague!
