How to optimize a query for MAX(date) in MySQL

I have this SQL Query:
SELECT company.*, salesorder.lastOrderDate
FROM company
INNER JOIN
(
SELECT companyId, MAX(orderDate) AS lastOrderDate
FROM salesorder
GROUP BY companyId
) salesorder ON salesorder.companyId = company.companyId;
This adds one extra column to the company master table: each company's last order date.
The problem is that, when I analyze this query, it doesn't seem very efficient.
Is there a way to make this more efficient?
salesorder:
orderId, companyId, orderDate
1 333 2015-01-01
2 555 2016-01-01
3 333 2017-01-01
company
companyId, name
333 Acme
555 Microsoft
Query result:
companyId, name, lastOrderDate
333 Acme 2017-01-01
555 Microsoft 2016-01-01
EXPLAIN SELECT: [image not available]
Table definitions:
CREATE TABLE `salesorder` (
`orderId` int(11) NOT NULL,
`companyId` int(11) DEFAULT NULL,
`orderDate` date DEFAULT NULL,
PRIMARY KEY (`orderId`),
UNIQUE KEY `orderId_UNIQUE` (`orderId`) /*!80000 INVISIBLE */,
KEY `testComposite` (`companyId`,`orderDate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE `company` (
`companyId` int(11) NOT NULL,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`companyId`),
UNIQUE KEY `companyId_UNIQUE` (`companyId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

It looks like you could simplify the query like this:
SELECT c.*, MAX(o.OrderDate) As lastOrderDate
FROM company c
INNER JOIN salesorder o on o.companyId = c.companyId
GROUP BY <list all company fields here>;
MySQL might even let you get away with just c.companyId in the GROUP BY clause, but that's not really standard and not great practice.
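A quick way to check that the JOIN + GROUP BY rewrite returns the same rows as the derived-table version is to run both against the sample data from the question. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: it compares result rows only, not MySQL's execution plan), stores orderDate as ISO-8601 text so MAX() compares correctly, and lists every selected company column in the GROUP BY.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: we only
# compare result rows here, not execution plans).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (companyId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE salesorder (orderId INTEGER PRIMARY KEY, companyId INTEGER, orderDate TEXT);
INSERT INTO company VALUES (333, 'Acme'), (555, 'Microsoft');
INSERT INTO salesorder VALUES
  (1, 333, '2015-01-01'), (2, 555, '2016-01-01'), (3, 333, '2017-01-01');
""")

# Original form: aggregate per company in a derived table, then join.
derived = conn.execute("""
SELECT company.companyId, company.name, so.lastOrderDate
FROM company
INNER JOIN (SELECT companyId, MAX(orderDate) AS lastOrderDate
            FROM salesorder
            GROUP BY companyId) so ON so.companyId = company.companyId
ORDER BY company.companyId
""").fetchall()

# Simplified form: join first, then group by every selected company column.
grouped = conn.execute("""
SELECT c.companyId, c.name, MAX(o.orderDate) AS lastOrderDate
FROM company c
INNER JOIN salesorder o ON o.companyId = c.companyId
GROUP BY c.companyId, c.name
ORDER BY c.companyId
""").fetchall()

print(derived)
print(grouped)
```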

Add the composite index with the columns in this order:
INDEX(companyId, orderDate)
Single column indexes are not as efficient (in this query).
Since a PRIMARY KEY is a unique key, do not redundantly declare a UNIQUE key.
With only a few rows in the table, you cannot trust EXPLAIN (or EXPLAIN-like output) to tell you how bad the query will be. Try it with at least a few dozen rows, and please provide the output of EXPLAIN FORMAT=JSON SELECT ...
Note that it says "Using index". That means the subquery in question can be performed entirely inside the index's B-tree, which is good. (I presume you ran the EXPLAIN after adding my suggested index?)
Your previous image showed a lot of rows; what gives?
I'm still puzzled as to why there are 3 rows in the EXPLAIN and two table scans. Anyway, here is another formulation to try:
SELECT c.*,
( SELECT MAX(orderDate)
FROM salesorder
WHERE companyId = c.companyId
) AS lastOrderDate
FROM company AS c;
(and my INDEX is still important)
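A minimal sketch of the correlated-subquery formulation, again with Python's sqlite3 standing in for MySQL. The index name idx_company_date and the extra company (777, 'NoOrders Inc') are hypothetical. Note one semantic difference from the INNER JOIN versions: a company with no orders is still returned, with a NULL lastOrderDate. SQLite's EXPLAIN QUERY PLAN output is not MySQL's EXPLAIN, but it should show the per-company MAX() being answered from the composite index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (companyId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE salesorder (orderId INTEGER PRIMARY KEY, companyId INTEGER, orderDate TEXT);
-- Composite index in the recommended column order (name is hypothetical).
CREATE INDEX idx_company_date ON salesorder (companyId, orderDate);
-- 777 is a hypothetical company with no orders.
INSERT INTO company VALUES (333, 'Acme'), (555, 'Microsoft'), (777, 'NoOrders Inc');
INSERT INTO salesorder VALUES
  (1, 333, '2015-01-01'), (2, 555, '2016-01-01'), (3, 333, '2017-01-01');
""")

QUERY = """
SELECT c.companyId, c.name,
       (SELECT MAX(orderDate)
        FROM salesorder
        WHERE companyId = c.companyId) AS lastOrderDate
FROM company AS c
ORDER BY c.companyId
"""

rows = conn.execute(QUERY).fetchall()
# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

print(rows)
print(plan)
```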

Related

Is there a shorter alternative to my MySQL query?

I'm a Java student and do SQL too. In a lesson we were shown an example database sketch and a query, which I replicate in this question.
I built an example in MySQL with three tables:
CREATE TABLE `employed` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
CREATE TABLE `department` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;
CREATE TABLE `employees_departments` (
`employed_id` int(11) NOT NULL,
`department_id` int(11) NOT NULL,
PRIMARY KEY (`employed_id`,`department_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
employed was filled with
(1, 'Karl'), (2, 'Bengt'), (3, 'Adam'), (4, 'Stefan')
department was filled with
(4, 'HR'), (5, 'Sälj'), (6, 'New departm')
employees_departments was filled with
(1, 4), (2, 5), (3, 4)
So "Stefan" has no department, and "New departm" has no employed.
I wanted a query that would return the employees with all their departments, plus employees without a department and departments with no employees. I found one solution like this:
select A.name, C.name from employed A
left join employees_departments B on (A.id=B.employed_id)
left join department C on (B.department_id = C.id)
union
select C.name, A.name from department A
left join employees_departments B on (A.id=B.department_id)
left join employed C on (B.employed_id = C.id)
It would be nice if there were a shorter query for this.
Also, I made this without foreign key constraints, to keep the example as simple as possible.
Greetings
MySQL doesn't support a FULL OUTER join operation.
We can emulate that by combining two sets... the result of an OUTER JOIN and the result from an anti-JOIN.
(
SELECT ee.name AS employed_name
, dd.name AS department_name
FROM employed ee
LEFT
JOIN employees_departments ed
ON ed.employed_id = ee.id
LEFT
JOIN department dd
ON dd.id = ed.department_id
)
UNION ALL
(
SELECT nn.name AS employed_name
, nd.name AS department_name
FROM department nd
LEFT
JOIN employees_departments ne
ON ne.department_id = nd.id
LEFT
JOIN employed nn
ON nn.id = ne.employed_id
WHERE nn.id IS NULL
)
The first SELECT returns every employee name along with the matching department name, including employees that have no department.
The second SELECT returns only the names of departments that have no matching rows in employed.
The results of the two SELECTs are concatenated using the UNION ALL set operator. The second SELECT returns only departments without employees, so it cannot repeat a row from the first, and no duplicate removal is needed. (UNION ALL avoids the potentially expensive duplicate-removal step that the UNION set operator would force.)
This is the shortest query pattern to return these rows.
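The corrected outer-join-plus-anti-join pattern can be checked against the sample data from the question. This sketch runs it through Python's sqlite3 as a stand-in for MySQL; SQLite does not accept the parentheses around each SELECT in a UNION ALL, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employed (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees_departments (
  employed_id INTEGER NOT NULL,
  department_id INTEGER NOT NULL,
  PRIMARY KEY (employed_id, department_id)
);
INSERT INTO employed VALUES (1, 'Karl'), (2, 'Bengt'), (3, 'Adam'), (4, 'Stefan');
INSERT INTO department VALUES (4, 'HR'), (5, 'Sälj'), (6, 'New departm');
INSERT INTO employees_departments VALUES (1, 4), (2, 5), (3, 4);
""")

# First half: outer join, every employee with their department(s), if any.
# Second half: anti-join, only departments with no matching employee.
FULL_OUTER = """
SELECT ee.name AS employed_name, dd.name AS department_name
FROM employed ee
LEFT JOIN employees_departments ed ON ed.employed_id = ee.id
LEFT JOIN department dd ON dd.id = ed.department_id
UNION ALL
SELECT nn.name AS employed_name, nd.name AS department_name
FROM department nd
LEFT JOIN employees_departments ne ON ne.department_id = nd.id
LEFT JOIN employed nn ON nn.id = ne.employed_id
WHERE nn.id IS NULL
"""

rows = conn.execute(FULL_OUTER).fetchall()
for employed_name, department_name in rows:
    print(employed_name, department_name)
```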
We could make the SQL a little shorter. For example, if a foreign key relationship between employees_departments and employed were enforced (the original post gives no indication that it is, so we don't assume it), then we could omit the employed table from the second SELECT:
UNION ALL
(
SELECT NULL AS employed_name
, nd.name AS department_name
FROM department nd
LEFT
JOIN employees_departments ne
ON ne.department_id = nd.id
WHERE ne.department_id IS NULL
)
With suitable indexes available, this is going to give us the most efficient access plan.
Is there shorter SQL that will return an equivalent result? If there is, it's likely not going to perform as efficiently as the above.
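To see why the shorter second SELECT depends on the foreign key, this sketch declares hypothetical REFERENCES constraints in SQLite (standing in for MySQL; SQLite enforces them only after PRAGMA foreign_keys = ON) and compares the two anti-join forms. With the constraint enforced, an orphan link row pointing at a missing employee cannot be inserted, so both forms return the same department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE employed (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees_departments (
  employed_id INTEGER NOT NULL REFERENCES employed (id),
  department_id INTEGER NOT NULL REFERENCES department (id),
  PRIMARY KEY (employed_id, department_id)
);
INSERT INTO employed VALUES (1, 'Karl'), (2, 'Bengt'), (3, 'Adam'), (4, 'Stefan');
INSERT INTO department VALUES (4, 'HR'), (5, 'Sälj'), (6, 'New departm');
INSERT INTO employees_departments VALUES (1, 4), (2, 5), (3, 4);
""")

# Anti-join that goes through employed (correct with or without the FK).
long_form = conn.execute("""
SELECT nn.name, nd.name FROM department nd
LEFT JOIN employees_departments ne ON ne.department_id = nd.id
LEFT JOIN employed nn ON nn.id = ne.employed_id
WHERE nn.id IS NULL
""").fetchall()

# Shorter anti-join: relies on the FK guaranteeing each link row has an employee.
short_form = conn.execute("""
SELECT NULL, nd.name FROM department nd
LEFT JOIN employees_departments ne ON ne.department_id = nd.id
WHERE ne.department_id IS NULL
""").fetchall()

# With the FK enforced, a link to a nonexistent employee (99) is rejected.
try:
    conn.execute("INSERT INTO employees_departments VALUES (99, 6)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(long_form, short_form, orphan_rejected)
```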

MySQL - Query with LEFT JOIN and group by not returning rows for count 0

I am running the query below:
select c.id, c.code, c.name, count(a.iso_country) from countries c
left join airports a on c.code = a.iso_country group by a.iso_country
order by count(a.iso_country);
In my 'countries' table, I have 247 rows.
In 'airports' table, 'iso_country' column maps to 'code' column in 'countries' table.
Below are the table definitions.
Countries table -
CREATE TABLE `countries` (
`id` int(11) NOT NULL,
`code` varchar(2) NOT NULL,
`name` text,
`continent` text,
PRIMARY KEY (`id`),
UNIQUE KEY `code_UNIQUE` (`code`),
KEY `code_idx` (`code`)
)
Airports table -
CREATE TABLE `airports` (
`id` int(11) NOT NULL,
`type` text,
`name` text,
`continent` text,
`iso_country` varchar(2) DEFAULT NULL,
`iso_region` text,
PRIMARY KEY (`id`),
KEY `country_iso_code_fk_idx` (`iso_country`),
CONSTRAINT `country_fk` FOREIGN KEY (`iso_country`) REFERENCES `countries`
(`code`) )
The issue I'm facing is that the query above returns 242 countries: 241 countries with airports and 1 with NULL values for airports. It doesn't include the other 5 countries that also don't have any airports. Please tell me what I am doing wrong in this query.
PS: I am a novice in SQL.
I'm running on MySQL 5.7 Community Edition.
Thanks in advance.
You want a count of airports by country, including those where there are none, right?
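Your query groups by a.iso_country, which is NULL for every country that has no airports, so all of those countries collapse into a single group (the one row with NULL values you see). Group by the countries columns instead: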
SELECT
c.id, c.code, c.name, count(a.iso_country) AS airport_count
FROM
countries c LEFT JOIN airports a ON c.code = a.iso_country
GROUP BY
c.id, c.code, c.name
ORDER BY
airport_count DESC;
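A small sketch of the difference, using Python's sqlite3 as a stand-in for MySQL with made-up country codes: grouping by a.iso_country merges every airport-less country into the single NULL group, while grouping by the countries columns keeps them apart with a count of 0. SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts the non-grouped c.code in the first query and picks an arbitrary value from each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE, name TEXT);
CREATE TABLE airports (id INTEGER PRIMARY KEY, name TEXT, iso_country TEXT);
-- Invented data: two countries with airports, three without.
INSERT INTO countries VALUES
  (1, 'AA', 'Alpha'), (2, 'BB', 'Beta'), (3, 'CC', 'Gamma'),
  (4, 'DD', 'Delta'), (5, 'EE', 'Epsilon');
INSERT INTO airports VALUES (10, 'A1', 'AA'), (11, 'A2', 'AA'), (12, 'B1', 'BB');
""")

# Grouping by the right-hand column: every country without airports has
# a.iso_country = NULL, so they all fall into one group.
by_airport_column = conn.execute("""
SELECT c.code, COUNT(a.iso_country)
FROM countries c LEFT JOIN airports a ON c.code = a.iso_country
GROUP BY a.iso_country
""").fetchall()

# Grouping by the countries columns keeps one row per country.
by_country = conn.execute("""
SELECT c.id, c.code, c.name, COUNT(a.iso_country) AS airport_count
FROM countries c LEFT JOIN airports a ON c.code = a.iso_country
GROUP BY c.id, c.code, c.name
ORDER BY airport_count DESC, c.id
""").fetchall()

print(len(by_airport_column), by_country)
```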
Could it have something to do with NULL values? Your a.iso_country column is DEFAULT NULL and c.code is NOT NULL. Usually, when comparing values where either side can be NULL, you want to use something like COALESCE to provide a fallback value in case it is NULL, because NULL does not compare equal to '' or to 0. What if you join on COALESCE(a.iso_country, '') = COALESCE(c.code, '')?
I'm also not sure how COUNT behaves when given a NULL. You may need to be careful there.
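For reference on the COUNT question: COUNT(expr) counts only non-NULL values, while COUNT(*) counts rows. After a LEFT JOIN, a country with no airports still produces one row whose airport columns are NULL, so COUNT(*) reports 1 for it and COUNT(a.iso_country) reports 0. A minimal sketch with Python's sqlite3 standing in for MySQL and hypothetical codes 'AA' and 'ZZ':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT PRIMARY KEY);
CREATE TABLE airports (id INTEGER PRIMARY KEY, iso_country TEXT);
INSERT INTO countries VALUES ('AA'), ('ZZ');  -- 'ZZ' has no airports
INSERT INTO airports VALUES (1, 'AA'), (2, 'AA');
""")

rows = conn.execute("""
SELECT c.code,
       COUNT(*) AS row_count,                -- counts joined rows, NULLs included
       COUNT(a.iso_country) AS airport_count -- skips the NULL-extended row
FROM countries c LEFT JOIN airports a ON a.iso_country = c.code
GROUP BY c.code
ORDER BY c.code
""").fetchall()

print(rows)
```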
Can you try checking this condition in the WHERE clause? It checks for the NULLs that the LEFT JOIN produces where no airport is found (a is the alias for airports):
where (a.name is null)

Mysql query not optimized and very slow, but why?

In the software I develop (car dealer software), there's an agenda section with all of the users' appointments.
This section loads quickly under normal daily use of the agenda (thousands of rows), but it starts to become really slow when the agenda tables reach 1 million rows.
The structure:
1) Main table
CREATE TABLE IF NOT EXISTS `agenda` (
`id_agenda` int(11) NOT NULL AUTO_INCREMENT,
`id_user` int(11) NOT NULL DEFAULT '0',
`id_agency` int(11) NOT NULL DEFAULT '0',
`id_customer` int(11) DEFAULT NULL,
`id_car` int(11) DEFAULT NULL,
`id_owner` int(11) DEFAULT NULL,
`type` int(11) NOT NULL DEFAULT '8',
`title` varchar(255) NOT NULL DEFAULT '',
`text` text NOT NULL,
`start_day` date NOT NULL DEFAULT '0000-00-00',
`end_day` date NOT NULL DEFAULT '0000-00-00',
`start_hour` time NOT NULL DEFAULT '00:00:00',
`end_hour` time NOT NULL DEFAULT '00:00:00',
PRIMARY KEY (`id_agenda`),
KEY `start_day` (`start_day`),
KEY `id_customer` (`id_customer`),
KEY `id_car` (`id_car`),
KEY `id_user` (`id_user`),
KEY `id_owner` (`id_owner`),
KEY `type` (`type`),
KEY `id_agency` (`id_agency`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
2) Secondary table
CREATE TABLE IF NOT EXISTS `agenda_cars` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_agenda` int(11) NOT NULL,
`id_car` int(11) NOT NULL,
`id_owner` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `id_agenda` (`id_agenda`),
KEY `id_car` (`id_car`),
KEY `id_owner` (`id_owner`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Query:
SELECT a.id_agenda
FROM agenda as a
LEFT JOIN agenda_cars as agc on agc.id_agenda = a.id_agenda
WHERE
(a.id_customer = '22' OR (a.id_owner = '22' OR agc.id_owner = '22' ))
GROUP BY a.id_agenda
ORDER BY a.start_day, a.start_hour
Explain:
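+----+-------------+-------+-------+---------------+-----------+---------+----------------+---------+---------------------------------+
| id | select_type | table | type  | possible_keys | key       | key_len | ref            | rows    | Extra                           |
+----+-------------+-------+-------+---------------+-----------+---------+----------------+---------+---------------------------------+
| 1  | SIMPLE      | a     | index | PRIMARY       | PRIMARY   | 4       | NULL           | 1051987 | Using temporary; Using filesort |
| 1  | SIMPLE      | agc   | ref   | id_agenda     | id_agenda | 4       | db.a.id_agenda | 1       | Using where                     |
+----+-------------+-------+-------+---------------+-----------+---------+----------------+---------+---------------------------------+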
With id 22 the query takes about 10 seconds, and with other ids it can take 20 seconds. That is just the query; loading everything into the web page of course takes longer.
I don't understand why it takes so long to get the data. I think the indexes are configured correctly and the query is pretty simple, so why?
Too much data?
I've solved it this way:
SELECT a.id_agenda
FROM
(
SELECT id_agenda
FROM agenda
WHERE (id_customer = '22' OR id_owner = '22' )
UNION
SELECT id_agenda
FROM agenda_cars
WHERE id_owner = '22'
) as at
INNER JOIN agenda as a on a.id_agenda = at.id_agenda
GROUP BY a.id_agenda
ORDER BY a.start_day, a.start_hour
This version of the query is ten times faster than the previous one... but why?
Thanks to everyone who helps clear up my doubts!
UPDATE after Rick James's solution:
Suggested query:
SELECT a.id_agenda
FROM
(
SELECT id_agenda FROM agenda WHERE id_customer = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda WHERE id_owner = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda_cars WHERE id_owner = '22'
) as at
INNER JOIN agenda as a ON a.id_agenda = at.id_agenda
ORDER BY a.start_datetime;
Result: 279 total, 0.0111 sec
EXPLAIN:
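+------+--------------+--------------+--------+---------------+-------------+---------+--------------+------+---------------------------------+
| id   | select_type  | table        | type   | possible_keys | key         | key_len | ref          | rows | Extra                           |
+------+--------------+--------------+--------+---------------+-------------+---------+--------------+------+---------------------------------+
| 1    | PRIMARY      | <derived2>   | ALL    | NULL          | NULL        | NULL    | NULL         | 366  | Using temporary; Using filesort |
| 1    | PRIMARY      | a            | eq_ref | PRIMARY       | PRIMARY     | 4       | at.id_agenda | 1    | NULL                            |
| 2    | DERIVED      | agenda       | ref    | id_customer   | id_customer | 5       | const        | 1    | Using index                     |
| 3    | UNION        | agenda       | ref    | id_owner      | id_owner    | 5       | const        | 114  | Using index                     |
| 4    | UNION        | agenda_cars  | ref    | id_owner      | id_owner    | 4       | const        | 250  | NULL                            |
| NULL | UNION RESULT | <union2,3,4> | ALL    | NULL          | NULL        | NULL    | NULL         | NULL | Using temporary                 |
+------+--------------+--------------+--------+---------------+-------------+---------+--------------+------+---------------------------------+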
Before I dig into what can be done, let me list several red flags I see.
OR is hard to optimize
Filtering (WHERE) on multiple tables JOINed together is hard to optimize.
GROUP BY x ORDER BY z means two passes over the data, usually 2 temp tables and filesorts.
Did you really mean LEFT? It says "the right table (agc) might be missing, in which case provide NULLs".
(You may not be able to get rid of all of the red flags.)
Red flags in the Schema:
Indexing every column -- usually not useful
Only single-column indexes -- "composite" indexes often help.
DATE and TIME as separate columns -- usually makes for clumsy queries.
OK, those are off the top of my head; now to study the query... (Oh, and thanks for providing the CREATEs and EXPLAIN!)
The ON implies a 1:many relationship between agenda:agenda_cars. Is that correct?
id_owner and id_car are in both tables, yet are not included in the ON; what's up?
(Here's the meat of the answer to your final question.) Why have GROUP BY? I see no aggregates. I will guess that the 1:many relationship led to multiple rows, and you needed to de-dup? For de-duping, please use DISTINCT. But the real solution is to avoid the "inflate (JOIN) - deflate (GROUP BY)" syndrome. Your subquery is a good start on that.
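The inflate-deflate effect is easy to see on a handful of rows. In this sketch (Python's sqlite3 standing in for MySQL; the rows are invented), agenda 1 has three cars, so the LEFT JOIN emits it three times, and DISTINCT (or the original GROUP BY) then has to collapse those copies again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agenda (id_agenda INTEGER PRIMARY KEY, id_customer INTEGER, id_owner INTEGER);
CREATE TABLE agenda_cars (id INTEGER PRIMARY KEY, id_agenda INTEGER NOT NULL,
                          id_owner INTEGER NOT NULL);
-- Invented rows: agenda 1 has three cars, agenda 3 has one.
INSERT INTO agenda VALUES (1, 22, NULL), (2, NULL, 22), (3, 5, 5);
INSERT INTO agenda_cars VALUES (10, 1, 22), (11, 1, 22), (12, 1, 22), (13, 3, 22);
""")

JOIN_AND_FILTER = """
FROM agenda a
LEFT JOIN agenda_cars agc ON agc.id_agenda = a.id_agenda
WHERE a.id_customer = 22 OR a.id_owner = 22 OR agc.id_owner = 22
"""

# Inflate: the 1:many JOIN emits one row per matching agenda_cars row.
inflated = conn.execute("SELECT a.id_agenda " + JOIN_AND_FILTER).fetchall()

# Deflate: DISTINCT (like the original GROUP BY) collapses the copies again.
deflated = conn.execute(
    "SELECT DISTINCT a.id_agenda " + JOIN_AND_FILTER + " ORDER BY a.id_agenda"
).fetchall()

print(len(inflated), deflated)
```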
Rolling some of the above comments in, plus more:
SELECT a.id_agenda
FROM
(
SELECT id_agenda FROM agenda WHERE id_customer = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda WHERE id_owner = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda_cars WHERE id_owner = '22'
) as at
INNER JOIN agenda as a ON a.id_agenda = at.id_agenda
ORDER BY a.start_datetime;
Notes:
Got rid of the other OR
Explicit UNION DISTINCT to be clear that dups are expected.
Tossed GROUP BY, and did not use SELECT DISTINCT either; UNION DISTINCT deals with the need.
You have the 4 necessary indexes (one per subquery): (id_customer), (id_owner) (on both tables) and PRIMARY KEY(id_agenda).
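The indexes are "covering" for the two agenda subqueries -- an extra bonus. (The agenda_cars subquery is not covered: an InnoDB secondary index also carries the PRIMARY KEY, which there is id, not id_agenda; hence no "Using index" in its EXPLAIN row.)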
There will be one unavoidable tmp table and file sort -- for the ORDER BY, but it won't be on a million rows.
(No need for composite indexes -- this time.)
I changed to a DATETIME; change back if you have a good reason for splitting them.
Did I get you another 10x? Did I explain it sufficiently?
Oh, one more thing...
This query returns a list of ids ordered by something that it does not return (date+time). What will you do with the ids? If you are using this as a subquery in another query, then the Optimizer has a right to throw away the ORDER BY. Just warning you.
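A sketch comparing the original OR + GROUP BY shape with the UNION rewrite on invented rows, using Python's sqlite3 as a stand-in for MySQL. SQLite spells UNION DISTINCT as plain UNION (in MySQL the two mean the same), the index names are hypothetical, and the single start_datetime column follows the suggestion to merge date and time. Both queries should return the same ids in the same order; only the rewrite lets each branch use its own single-column index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agenda (id_agenda INTEGER PRIMARY KEY, id_customer INTEGER,
                     id_owner INTEGER, start_datetime TEXT);
CREATE TABLE agenda_cars (id INTEGER PRIMARY KEY, id_agenda INTEGER NOT NULL,
                          id_owner INTEGER NOT NULL);
-- One index per UNION branch (hypothetical names).
CREATE INDEX agenda_customer ON agenda (id_customer);
CREATE INDEX agenda_owner ON agenda (id_owner);
CREATE INDEX agenda_cars_owner ON agenda_cars (id_owner);
INSERT INTO agenda VALUES
  (1, 22, NULL, '2020-03-01 09:00'), (2, NULL, 22, '2020-01-01 10:00'),
  (3, 5, 5, '2020-02-01 08:00'), (4, 7, 7, '2019-12-31 12:00');
INSERT INTO agenda_cars VALUES (10, 1, 22), (11, 1, 22), (12, 3, 22), (13, 4, 9);
""")

# Original shape: OR across the JOINed tables, then GROUP BY to de-dup.
or_version = conn.execute("""
SELECT a.id_agenda FROM agenda a
LEFT JOIN agenda_cars agc ON agc.id_agenda = a.id_agenda
WHERE a.id_customer = 22 OR a.id_owner = 22 OR agc.id_owner = 22
GROUP BY a.id_agenda
ORDER BY a.start_datetime
""").fetchall()

# Rewritten shape: one lookup per condition, de-duplicated by UNION.
union_version = conn.execute("""
SELECT a.id_agenda
FROM (SELECT id_agenda FROM agenda WHERE id_customer = 22
      UNION
      SELECT id_agenda FROM agenda WHERE id_owner = 22
      UNION
      SELECT id_agenda FROM agenda_cars WHERE id_owner = 22) AS ids
INNER JOIN agenda AS a ON a.id_agenda = ids.id_agenda
ORDER BY a.start_datetime
""").fetchall()

print(or_version, union_version)
```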

MYSQL - JOIN with OR condition

I have 3 tables, CompanyMaster (which has 3 million rows), Token1 and Token2, with the following structure:
CompanyMaster
CREATE TABLE `CompanyMaster` (
`CompanyUID` int(11) NOT NULL AUTO_INCREMENT,
`WebDomain` varchar(150) DEFAULT NULL,
`CompanyPrimaryName` varchar(200) DEFAULT NULL,
PRIMARY KEY (`CompanyUID`)
) ENGINE=InnoDB AUTO_INCREMENT=3941244 DEFAULT CHARSET=latin1
Token1
CREATE TABLE `Token1`(
`CompanyUID` int(11) NOT NULL,
`Token` varchar(50) NOT NULL,
KEY `Token` (`Token`),
KEY `CompanyUID` (`CompanyUID`),
CONSTRAINT `CompanyAlias4_ibfk_1` FOREIGN KEY (`CompanyUID`) REFERENCES `CompanyMaster` (`CompanyUID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Token2
CREATE TABLE `Token2` (
`CompanyUID` int(11) NOT NULL,
`Token` varchar(100) NOT NULL,
KEY `Token` (`Token`),
KEY `CompanyUID` (`CompanyUID`),
CONSTRAINT `CompanyAlias5_ibfk_1` FOREIGN KEY (`CompanyUID`) REFERENCES `CompanyMaster` (`CompanyUID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I want to get the WebDomain from the CompanyMaster table using the Token1 and Token2 tables.
The query I am using is:
SELECT WebDomain FROM CompanyMaster WHERE CompanyUID IN (
SELECT CompanyUID FROM Token1 WHERE Token='appleinc'
UNION
SELECT CompanyUID FROM Token2 WHERE Token='d012233:q122100:')
This query takes almost 30 seconds to return the result. I executed the subquery alone and it takes < 100 ms, so the problem is with the IN condition.
I replaced the query with a JOIN and it executes in < 200 ms:
SELECT c.CompanyUID FROM `CompanyMaster` c
JOIN `Token1` tk1
ON tk1.CompanyUID = c.CompanyUID AND tk1.Token= 'appleinc'
JOIN `Token2` tk2
ON tk2.CompanyUID = c.CompanyUID AND tk2.Token= 'd012233:q122100:'
But the problem with the query above is that if either tk1.Token = 'appleinc' or tk2.Token = 'd012233:q122100:' fails to match, it returns no rows. I want the matching rows even if only one condition matches.
Please help me solve this. I also want the query to execute in less than 10 ms. Is that achievable?
You should certainly get better performance with UNION ALL than with UNION: inside IN it makes no difference to the output in your case, but UNION ALL does not need to filter out duplicates like UNION does:
SELECT WebDomain
FROM CompanyMaster
WHERE CompanyUID IN
( SELECT CompanyUID
FROM Token1
WHERE Token = 'appleinc'
UNION ALL
SELECT CompanyUID
FROM Token2
WHERE Token = 'd012233:q122100:')
However, if you put the UNION in the outer query, it might give even better performance, like this:
SELECT WebDomain
FROM CompanyMaster m
INNER JOIN Token1 t ON t.CompanyUID = m.CompanyUID
WHERE Token = 'appleinc'
UNION
SELECT WebDomain
FROM CompanyMaster m
INNER JOIN Token2 t ON t.CompanyUID = m.CompanyUID
WHERE Token = 'd012233:q122100:'
Here it is probably important to get only unique values, so you need UNION without ALL.
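The shapes from this thread can be compared on invented data in which company 1 matches only the Token1 value and company 2 only the Token2 value. The sketch uses Python's sqlite3 as a stand-in for MySQL, so it checks results, not timing: the double INNER JOIN returns nothing, while both UNION forms return both domains. UNION does not guarantee row order, so compare the results as sorted lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyMaster (CompanyUID INTEGER PRIMARY KEY, WebDomain TEXT);
CREATE TABLE Token1 (CompanyUID INTEGER NOT NULL, Token TEXT NOT NULL);
CREATE TABLE Token2 (CompanyUID INTEGER NOT NULL, Token TEXT NOT NULL);
CREATE INDEX t1_token ON Token1 (Token);
CREATE INDEX t2_token ON Token2 (Token);
-- Invented data: company 1 matches only Token1, company 2 only Token2.
INSERT INTO CompanyMaster VALUES (1, 'apple.com'), (2, 'example.org'), (3, 'other.net');
INSERT INTO Token1 VALUES (1, 'appleinc'), (3, 'something');
INSERT INTO Token2 VALUES (2, 'd012233:q122100:'), (3, 'else');
""")
params = ("appleinc", "d012233:q122100:")

# Both tokens required (two INNER JOINs): no company matches both.
both_required = conn.execute("""
SELECT c.CompanyUID FROM CompanyMaster c
JOIN Token1 tk1 ON tk1.CompanyUID = c.CompanyUID AND tk1.Token = ?
JOIN Token2 tk2 ON tk2.CompanyUID = c.CompanyUID AND tk2.Token = ?
""", params).fetchall()

# UNION in the outer query: either token is enough.
either = conn.execute("""
SELECT WebDomain FROM CompanyMaster m
INNER JOIN Token1 t ON t.CompanyUID = m.CompanyUID WHERE Token = ?
UNION
SELECT WebDomain FROM CompanyMaster m
INNER JOIN Token2 t ON t.CompanyUID = m.CompanyUID WHERE Token = ?
""", params).fetchall()

# IN with UNION ALL: duplicates inside IN do not change the result.
in_version = conn.execute("""
SELECT WebDomain FROM CompanyMaster WHERE CompanyUID IN (
  SELECT CompanyUID FROM Token1 WHERE Token = ?
  UNION ALL
  SELECT CompanyUID FROM Token2 WHERE Token = ?)
""", params).fetchall()

print(both_required, sorted(either), sorted(in_version))
```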
You can use a WHERE clause to filter your records on the basis of Token1 and Token2, and change that clause according to your requirements.
Please check the following SQL. I hope it solves your problem.
SELECT
c.CompanyUID, c.WebDomain
FROM
CompanyMaster c
LEFT JOIN Token1 tk1 ON tk1.CompanyUID = c.CompanyUID
LEFT JOIN Token2 tk2 ON tk2.CompanyUID = c.CompanyUID
WHERE
tk1.Token = '123' OR tk2.Token = 'xyz';
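One caveat with the LEFT JOIN + OR approach: each LEFT JOIN multiplies rows, so a company with several token rows comes back once per combination, and SELECT DISTINCT is needed to collapse the copies. A sketch with Python's sqlite3 standing in for MySQL and invented rows, where company 1 has one Token1 row and two Token2 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyMaster (CompanyUID INTEGER PRIMARY KEY, WebDomain TEXT);
CREATE TABLE Token1 (CompanyUID INTEGER NOT NULL, Token TEXT NOT NULL);
CREATE TABLE Token2 (CompanyUID INTEGER NOT NULL, Token TEXT NOT NULL);
INSERT INTO CompanyMaster VALUES (1, 'a.example'), (2, 'b.example'), (3, 'c.example');
INSERT INTO Token1 VALUES (1, '123');
INSERT INTO Token2 VALUES (1, 'xyz'), (1, 'abc'), (2, 'xyz');
""")

JOINS = """
FROM CompanyMaster c
LEFT JOIN Token1 tk1 ON tk1.CompanyUID = c.CompanyUID
LEFT JOIN Token2 tk2 ON tk2.CompanyUID = c.CompanyUID
WHERE tk1.Token = '123' OR tk2.Token = 'xyz'
"""

# Company 1 appears twice: its one Token1 row pairs with each of its two Token2 rows.
rows = conn.execute("SELECT c.CompanyUID, c.WebDomain " + JOINS).fetchall()

# DISTINCT returns each matching company once.
distinct_rows = conn.execute(
    "SELECT DISTINCT c.CompanyUID, c.WebDomain " + JOINS + " ORDER BY c.CompanyUID"
).fetchall()

print(len(rows), distinct_rows)
```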

Why is my MySQL group by so slow?

I am trying to query a table, partitioned by month, that is approaching 20M rows. I need to group by DATE(transaction_utc) as well as country_id. Without the GROUP BY and aggregates, the query returns just over 40k rows, which isn't many; however, adding the GROUP BY makes the query substantially slower, unless the GROUP BY is on the transaction_utc column itself, in which case it is FAST.
I've been trying to optimize the first query below by tweaking the query and/or the indexes, and got to the point below (about 2x as fast as initially), but I'm still stuck with a 5 s query for summarizing 45k rows, which seems like way too much.
For reference, this box is a brand-new MariaDB 5.5.x server with 24 logical cores and 64 GB RAM, and with far more InnoDB buffer pool available than index space, so there shouldn't be any RAM or CPU pressure.
So, I'm looking for ideas on what is causing this slow down and suggestions on speeding it up. Any feedback would be greatly appreciated! :)
Ok, onto the details...
The following query (the one I actually need) takes approx 5 seconds (+/-), and returns less than 100 rows.
SELECT lss.`country_id` AS CountryId
, Date(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`)
EXPLAIN SELECT for the same query is as follows. Notice that it's not using the transaction_utc key. Shouldn't it be using my covering index instead?
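+----+-------------+-------+--------+---------------------------------------+------------+---------+------------------------+---------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                         | key        | key_len | ref                    | rows    | Extra                                        |
+----+-------------+-------+--------+---------------------------------------+------------+---------+------------------------+---------+----------------------------------------------+
| 1  | SIMPLE      | lss   | ref    | idx_unique,transaction_utc,country_id | idx_unique | 50      | const                  | 1208802 | Using where; Using temporary; Using filesort |
| 1  | SIMPLE      | c     | eq_ref | PRIMARY                               | PRIMARY    | 4       | georiot.lss.country_id | 1       |                                              |
+----+-------------+-------+--------+---------------------------------------+------------+---------+------------------------+---------+----------------------------------------------+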
Now onto a couple of other variations I've tried, to determine what's going on...
The following query (changed group by) takes about 5 seconds (+/-), and returns only 3 rows:
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`
The following query (removed group by) takes 4-5 seconds (+/-) and returns 1 row:
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
The following query takes .00X seconds (+/-) and returns ~45k rows. To me, this shows that at most we're only trying to group 45k rows into fewer than 100 groups (as in my initial query):
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
GROUP BY lss.`transaction_utc`
TABLE SCHEMA:
CREATE TABLE IF NOT EXISTS `sales` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_linkshare_account_id` int(11) unsigned NOT NULL,
`username` varchar(16) NOT NULL,
`country_id` int(4) unsigned NOT NULL,
`order` varchar(16) NOT NULL,
`raw_tracking_code` varchar(255) DEFAULT NULL,
`transaction_utc` datetime NOT NULL,
`processed_utc` datetime NOT NULL ,
`sku` varchar(16) NOT NULL,
`sale_original` decimal(10,4) NOT NULL,
`sale_usd` decimal(10,4) NOT NULL,
`quantity` int(11) NOT NULL,
`commission_original` decimal(10,4) NOT NULL,
`commission_usd` decimal(10,4) NOT NULL,
`original_currency` char(3) NOT NULL,
PRIMARY KEY (`id`,`transaction_utc`),
UNIQUE KEY `idx_unique` (`username`,`order`,`processed_utc`,`sku`,`transaction_utc`),
KEY `raw_tracking_code` (`raw_tracking_code`),
KEY `idx_usd_amounts` (`sale_usd`,`commission_usd`),
KEY `idx_countries` (`country_id`),
KEY `transaction_utc` (`transaction_utc`,`username`,`country_id`,`sale_usd`,`commission_usd`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( TO_DAYS(`transaction_utc`))
(PARTITION pOLD VALUES LESS THAN (735112) ENGINE = InnoDB,
PARTITION p201209 VALUES LESS THAN (735142) ENGINE = InnoDB,
PARTITION p201210 VALUES LESS THAN (735173) ENGINE = InnoDB,
PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB,
PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB,
PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ AUTO_INCREMENT=19696320 ;
The offending part is probably the GROUP BY DATE(transaction_utc). You also claim to have a covering index for this query but I see none. Your 5-column index has all the columns used in the query but not in the best order (which is: WHERE - GROUP BY - SELECT).
So the engine, finding no useful index, would have to evaluate this function for all 20M rows. Actually, it finds an index that starts with username (idx_unique) and uses that, so it has to evaluate the function for (only) 1.2M rows. If you had a (transaction_utc) or a (username, transaction_utc) index, it would choose the most useful of the three.
Can you afford to change the table structure by splitting the column into date and time parts?
If you can, then an index on (username, country_id, transaction_date) or, swapping the two grouping columns, on (username, transaction_date, country_id) would be quite efficient.
A covering index on (username, country_id, transaction_date, sale_usd, commission_usd) would be even better.
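A sketch of the "WHERE columns, then GROUP BY columns, then SELECT columns" ordering, assuming the datetime has already been split so that a plain transaction_date column exists. It uses Python's sqlite3 as a stand-in for MySQL/MariaDB with invented rows and a hypothetical index name idx_user_country_date. SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN, but it should report that the sales lookup goes through that index rather than through a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_id INTEGER PRIMARY KEY, name TEXT);
-- Assumption: date and time are split, so transaction_date holds only the date.
CREATE TABLE sales (
  id INTEGER PRIMARY KEY,
  username TEXT NOT NULL,
  country_id INTEGER NOT NULL,
  transaction_date TEXT NOT NULL,
  sale_usd REAL NOT NULL,
  commission_usd REAL NOT NULL
);
-- WHERE column first, then the GROUP BY columns, then the summed columns.
CREATE INDEX idx_user_country_date
  ON sales (username, country_id, transaction_date, sale_usd, commission_usd);
INSERT INTO countries VALUES (1, 'Sweden'), (2, 'Norway');
INSERT INTO sales VALUES
  (1, 'someuser', 1, '2012-10-01', 10.0, 1.0),
  (2, 'someuser', 1, '2012-10-01', 5.0, 0.5),
  (3, 'someuser', 2, '2012-10-02', 7.0, 0.7),
  (4, 'otheruser', 1, '2012-10-01', 99.0, 9.9);
""")

QUERY = """
SELECT s.country_id, s.transaction_date, c.name,
       SUM(s.sale_usd) AS SaleUSD, SUM(s.commission_usd) AS CommissionUSD
FROM sales s
JOIN countries c ON s.country_id = c.country_id
WHERE s.username = ? AND s.transaction_date BETWEEN ? AND ?
GROUP BY s.country_id, s.transaction_date
ORDER BY s.country_id, s.transaction_date
"""
params = ("someuser", "2012-09-26", "2012-10-26")

rows = conn.execute(QUERY, params).fetchall()
# Last column of each EXPLAIN QUERY PLAN row is the readable plan step.
plan = [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params)]

print(rows)
print(plan)
```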
If you want to keep the current structure, try changing the order inside your 5-column index to:
(username, country_id, transaction_utc, sale_usd, commission_usd)
or to:
(username, transaction_utc, country_id, sale_usd, commission_usd)
Since you are using MariaDB, you can use the VIRTUAL columns feature, without changing the existing columns:
Add a virtual (persistent) column and the appropriate index:
ALTER TABLE sales
ADD COLUMN transaction_date DATE
AS (DATE(transaction_utc))
PERSISTENT,
ADD INDEX special_IDX
(username, country_id, transaction_date, sale_usd, commission_usd) ;