Correctly optimising MySQL data for date range queries? - mysql

I have a table with lots of numeric data. I need to query it to get the closest row for a specific date_added and name.
My problem is that this data is not ordered by date, so when returning results I need to include ORDER BY date_added (otherwise it doesn't return the correct row). Currently this takes a good 90 seconds to run because of the ORDER BY clause.
Are there any ways I can further optimise this? I've already indexed the date_added and name columns, so I'm not really sure what else can be done. I considered creating a new table with the data reordered in date_added order, but this isn't practical as new entries need to be added regularly.
I've stored the numeric data as decimal as it can potentially be very small, very large or both. Perhaps storing this data in a different way would be more efficient?

Add a compound index on (name, date_added). With it, the query will run without using filesort.
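A minimal sketch of that index, assuming the numeric_data table shown further down in this thread (the index name idx_name_date is arbitrary):
-- Compound index so the WHERE on name and the ORDER BY on date_added
-- can both be satisfied from the same index.
ALTER TABLE numeric_data ADD INDEX idx_name_date (name, date_added);
-- With this index in place, EXPLAIN should no longer show "Using filesort" for the query.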
An alternative way to write the query:
SELECT date_added, data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8, data_9
FROM numeric_data
WHERE date_added =
      (SELECT MIN(date_added)
       FROM numeric_data
       WHERE date_added >= '2018-05-03 11:00:00' AND name = 'aaa')
  AND name = 'aaa'
LIMIT 1;
Fiddle: http://sqlfiddle.com/#!9/4e8d89/1 .

You can use range partitioning:
https://dev.mysql.com/doc/refman/5.7/en/partitioning-range.html
You need to define your partitions depending on the date range your data covers.
CREATE TABLE `numeric_data` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`date_added` datetime NOT NULL,
`name` varchar(8) COLLATE utf8mb4_unicode_ci NOT NULL,
`data_1` decimal(30,17) NOT NULL,
`data_2` decimal(30,17) NOT NULL,
`data_3` decimal(30,17) NOT NULL,
`data_4` decimal(30,17) NOT NULL,
`data_5` decimal(30,17) NOT NULL,
`data_6` decimal(30,17) NOT NULL,
`data_7` decimal(30,17) NOT NULL,
`data_8` decimal(30,17) NOT NULL,
`data_9` decimal(30,17) NOT NULL,
-- date_added is added to the primary key: every unique key on a partitioned table must include the partitioning column
PRIMARY KEY (`id`,`date_added`),
KEY `date_added` (`date_added`),
KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=60000000 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
PARTITION BY RANGE( TO_DAYS(date_added) ) (
PARTITION p1 VALUES LESS THAN (TO_DAYS('2018-01-01')),
PARTITION p2 VALUES LESS THAN (TO_DAYS('2018-02-01')),
PARTITION p3 VALUES LESS THAN (TO_DAYS('2018-03-01')),
PARTITION p4 VALUES LESS THAN (TO_DAYS('2018-04-01')),
PARTITION future VALUES LESS THAN MAXVALUE
);
The query below will only use partition "future":
SELECT date_added, data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8, data_9
FROM numeric_data
WHERE date_added >= '2018-05-03 11:00:00'
AND name = 'aaa'
ORDER BY date_added LIMIT 1
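To confirm the pruning, EXPLAIN PARTITIONS can be run against that query; a sketch (the partitions listed depend on your actual partition boundaries):
EXPLAIN PARTITIONS
SELECT date_added, data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8, data_9
FROM numeric_data
WHERE date_added >= '2018-05-03 11:00:00'
AND name = 'aaa'
ORDER BY date_added LIMIT 1;
-- With the boundaries defined above, only the "future" partition should appear in the partitions column of the plan.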

Related

MySQL query with IN clause loses performance

I have a table to store data from csv files. It is a large table (over 40 million rows). This is its structure:
CREATE TABLE `imported_lines` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`day` date NOT NULL,
`name` varchar(256) NOT NULL,
`origin_id` int(11) NOT NULL,
`time` time(3) NOT NULL,
`main_index` tinyint(4) NOT NULL DEFAULT 0,
`transaction_index` tinyint(4) NOT NULL DEFAULT 0,
`data` varchar(4096) NOT NULL,
`error` bit(1) NOT NULL,
`expressions_applied` bit(1) NOT NULL,
`count_records` smallint(6) NOT NULL DEFAULT 0,
`client_id` tinyint(4) NOT NULL DEFAULT 0,
`receive_date` datetime(3) NOT NULL,
PRIMARY KEY (`id`,`client_id`),
UNIQUE KEY `uq` (`client_id`,`name`,`origin_id`,`receive_date`),
KEY `dh` (`day`,`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY HASH (`client_id`) PARTITIONS 15 */
When I perform a SELECT with a one-day filter, it returns data very quickly (0.4 s). But as I increase the date range, it becomes slower, until it gets a timeout error.
This is the query:
SELECT origin_id, error, main_index, transaction_index,
expressions_applied, name, day,
COUNT(id) AS total, SUM(count_records) AS sum_records
FROM imported_lines FORCE INDEX (dh)
WHERE client_id = 1
AND day >= '2017-07-02' AND day <= '2017-07-03'
AND name IN ('name1', 'name2', 'name3', ...)
GROUP BY origin_id, error, main_index, transaction_index, expressions_applied, name, day;
I think the IN clause may be hurting performance. I also tried adding the uq index to this query, which gave a small gain (FORCE INDEX (dh, uq)).
I also tried INNER JOIN (SELECT name FROM providers WHERE id = 2) prov ON prov.name = il.name, but that doesn't result in a quicker query either.
EDIT
EXPLAINing the query
id: 1
select_type: SIMPLE
table: imported_lines
type: range
possible_keys: uq, dh
key: dh
key_len: 261
ref: NULL
rows: 297988
extra: Using where; Using temporary; Using filesort
Any suggestions on what I should do?
I have made a few changes, adding a new index with multiple columns (as suggested by @Uueerdo) and rewriting the query as another user suggested (but they have since deleted their answer).
I ran a few EXPLAIN PARTITIONS on the queries, tested with SQL_NO_CACHE to guarantee the query cache wasn't used, and searching a whole month of data now takes 1.8s.
It's so much faster!
This is what I did:
ALTER TABLE `imported_lines` DROP INDEX dh;
ALTER TABLE `imported_lines` ADD INDEX dhc (`day`, `name`, `client_id`);
Query:
SELECT origin_id, error, main_index, transaction_index,
expressions_applied, name, day,
COUNT(id) AS total, SUM(count_records) AS sum_records
FROM imported_lines il
INNER JOIN (
SELECT id FROM imported_lines
WHERE client_id = 1
AND day >= '2017-07-01' AND day <= '2017-07-31'
AND name IN ('name1', 'name2', 'name3', ...)
) AS il_filter
ON il_filter.id = il.id
WHERE il.client_id = 1
GROUP BY origin_id, error, main_index, transaction_index, expressions_applied, name, day;
Using the INNER JOIN, EXPLAIN PARTITIONS showed the query began to use the index. Also, with WHERE il.client_id = 1, the query reduces the number of partitions it has to look up.
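For reference, the kind of check described above looks roughly like this (a sketch; EXPLAIN PARTITIONS shows which partitions would be read, and with the equality on client_id it should list only the hash partition that client_id = 1 maps to):
EXPLAIN PARTITIONS
SELECT id FROM imported_lines
WHERE client_id = 1
AND day >= '2017-07-01' AND day <= '2017-07-31'
AND name IN ('name1', 'name2', 'name3');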
Thanks for your help!

Optimize a query

How can I make my response time faster? The average response time is currently about 0.2s (8,039 records in my items table and 81 records in my tracking table).
Query
SELECT a.name, b.cnt
FROM `items` a
LEFT JOIN (
    SELECT guid, COUNT(*) cnt
    FROM tracking
    WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
    GROUP BY guid
) b ON a.`id` = b.guid
WHERE a.`type` = 'streaming' AND a.`state` = 1
ORDER BY b.cnt DESC LIMIT 15 OFFSET 75
Tracking table structure
CREATE TABLE `tracking` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`ip` int(11) NOT NULL,
`date` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `i1` (`ip`,`guid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4303 DEFAULT CHARSET=latin1;
Items table structure
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`embed` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`tags` varchar(255) DEFAULT NULL,
`date` int(11) DEFAULT NULL,
`vote_val_total` float DEFAULT '0',
`vote_total` float(11,0) DEFAULT '0',
`rate` float DEFAULT '0',
`icon` text CHARACTER SET ascii,
`state` int(11) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9258 DEFAULT CHARSET=latin1;
Your query, as written, doesn't make much sense. It produces all possible combinations of rows in your two tables and then groups them.
You may want this:
SELECT a.*, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.guid = b.guid
ORDER BY b.cnt DESC
The high-volume data in this query come from the relatively large tracking table. So, you should add a compound index to it, using the columns (date, guid). This will allow your query to random-access the index by date and then scan it for guid values.
ALTER TABLE tracking ADD INDEX guid_summary (`date`, guid);
I suppose you'll see a nice performance improvement.
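To check that the new (date, guid) index is actually picked up by the derived table, an EXPLAIN of the subquery on its own is a quick test (a sketch):
EXPLAIN
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
GROUP BY guid;
-- With the (date, guid) index, the key column should show guid_summary and,
-- since both columns are in the index, Extra should include "Using index".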
Pro tip: Don't use SELECT *. Instead, give a list of the columns you want in your result set. For example,
SELECT a.guid, a.name, a.description, b.cnt
Why is this important?
First, it makes your software more resilient against somebody adding columns to your tables in the future.
Second, it tells the MySQL server to sling around only the information you want. That can improve performance really dramatically, especially when your tables get big.
Since tracking has significantly fewer rows than items, I will propose the following.
SELECT i.name, c.cnt
FROM
(
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day )
GROUP BY guid
) AS c
JOIN items AS i ON i.id = c.guid
WHERE i.type = 'streaming'
AND i.state = 1
ORDER BY c.cnt DESC
LIMIT 15 OFFSET 75
It will fail to display any items for which cnt is 0. (Your version displays the items with NULL for the count.)
Composite indexes needed:
items: The PRIMARY KEY(id) is sufficient.
tracking: INDEX(date, guid) -- "covering"
Other issues:
If ip is an IP-address, it needs to be INT UNSIGNED. But that covers only IPv4, not IPv6.
It seems like date is not just a "date", but really a date+time. Please rename it to avoid confusion.
float(11,0) -- Don't use FLOAT for integers. Don't use (m,n) on FLOAT or DOUBLE. INT UNSIGNED makes more sense here.
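A hedged sketch of those column changes (the new column name visited_at is just an example; adjust to your naming):
ALTER TABLE tracking
MODIFY ip INT UNSIGNED NOT NULL,                        -- IPv4 only, as noted above
CHANGE `date` visited_at INT UNSIGNED DEFAULT NULL;     -- really a Unix timestamp, renamed to avoid confusion
ALTER TABLE items
MODIFY vote_total INT UNSIGNED DEFAULT 0;               -- a count, so an integer type instead of FLOAT(11,0)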
OFFSET is naughty when it comes to performance -- it must scan over the skipped records. But, in your query, there is no way to avoid collecting all the possible rows, sorting them, stepping over 75, and only finally delivering 15 rows. (And, with no more than 81, it won't be a full 15.)
What version are you using? There have been important changes to the Optimization of LEFT JOIN ( SELECT ... ). Please provide EXPLAIN SELECT for each query under discussion.

SQL VIEW simplification/solution for faster queries

I'm trying to break down and re-write a view that had been created by a long-gone developer. The query takes well over three minutes to run, I'm assuming because of all the CONCATs.
CREATE VIEW `active_users_over_time` AS
select
`users_activity`.`date` AS `date`,
time_format(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),'%H:%i:%s') AS `time`,
`users_activity`.`username` AS `username`,
count(addtime(concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0'))) AS `checkouts`
from `users_activity`
group by
concat(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),
`users_activity`.`username`);
The data comes from the SQL table:
CREATE TABLE `users_activity` (
`id` int(10) unsigned NOT NULL auto_increment,
`featureid` smallint(5) unsigned NOT NULL,
`date` date NOT NULL,
`time` time NOT NULL,
`duration_checkout` int unsigned NOT NULL,
`update_date` date NOT NULL,
`username` varchar(255) NOT NULL,
`checkout` smallint(5) unsigned NOT NULL,
`licid` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `featureid_licid_username` (`featureid`,`licid`,`date`,`time`,`username`),
FOREIGN KEY(featureid) REFERENCES features(id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I'm having a hard time deciphering exactly what is needed and what isn't.
Does anyone have any ideas? Thanks.
I think this does everything that the original query did, skipping a bunch of redundant steps:
select `date`
, `time`
, `username`
, count(1) as `checkouts`
from
(
select
`users_activity`.`date` AS `date`
,time_format(
addtime(`users_activity`.`date`,`users_activity`.`time`)
+ interval `users_activity`.`duration_checkout` second
,'%H:%i:%s'
) AS `time`
,`users_activity`.`username` AS `username`
from `users_activity`
) x
group by `username`, `date`, `time`
You may also want to look at what indexes are on the table to see if optimisations can be made elsewhere (e.g. if you don't already have an index on the username and date fields you'd get a lot of benefit for this query by adding one).
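For example, something along these lines (a sketch; the index name is arbitrary, and how much it helps depends on how the view is queried):
CREATE INDEX idx_username_date ON users_activity (username, `date`);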
You can start by rewriting the GROUP BY clause from this:
group by
concat(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),
`users_activity`.`username`);
to this one:
GROUP BY `users_activity`.`date`,
`users_activity`.`time`,
`users_activity`.`duration_checkout`,
`users_activity`.`username`
This change should give some slight savings on converting dates to strings and concatenating them, and the result of the query shouldn't change.
Then you may consider creating a composite index on GROUP BY columns.
According to this link: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index
It means, that if we create the following index:
CREATE INDEX idx_name ON `users_activity`(
`date`,`time`,`duration_checkout`,`username`
);
then MySQL might use it to optimize GROUP BY (but there is no guarantee).
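One way to check whether MySQL actually uses the index for the grouping (a sketch, using a simplified version of the view's query; plan output varies by version):
EXPLAIN
SELECT `date`, `time`, duration_checkout, username, COUNT(*) AS checkouts
FROM users_activity
GROUP BY `date`, `time`, duration_checkout, username;
-- If idx_name is used for the grouping, Extra should no longer show
-- "Using temporary; Using filesort".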

How to aggregate data without group by

I am having a little bit of a situation here.
The environment
I have a database for series here.
One table for the series itself, one for the seasons connected to the series table, and one for the episodes connected to the seasons table.
Since there are air dates for different countries, I have another table called `series_data` which looks like the following:
CREATE TABLE IF NOT EXISTS `episode_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`episode_id` int(11) NOT NULL,
`country` char(3) NOT NULL,
`title` varchar(255) NOT NULL,
`date` date NOT NULL,
`tba` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `episode_id` (`episode_id`),
KEY `date` (`date`),
KEY `country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Now I am trying to collect the last aired episodes from each series in the database using the following query:
SELECT
*
FROM
`episode_data` ed
WHERE
`ed`.`date` < CURDATE( ) &&
`ed`.`date` != '1970-01-01' &&
`ed`.`series_id` = 1
GROUP BY
`ed`.`country` DESC
ORDER BY
`ed`.`date` DESC
Since I have everything normalized, in the query I replaced 'episode_id' with 'series_id' to make it less complicated.
What I am trying to accomplish
I want the last aired episode for each country that is actually announced (ed.date != '1970-01-01'), returned as the result of one query.
What's the problem
I know now (I searched Google and found answers here that didn't work for me) that the ordering takes place AFTER the grouping, so my "date" ordering is completely useless.
The other problem is that the query above works, but it always takes the entries with the lowest id matching my conditions, because those come first in the table's index.
What is the question?
How may I accomplish the above? I do not know if grouping is the right way to do it. If there is no "one-liner", I think the only way is a subquery, which I want to avoid since, as far as I know, it is slower than a one-liner with the right indexes set.
Hopefully everything you need is in here :)
Example data
CREATE TABLE IF NOT EXISTS `episode_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`episode_id` int(11) NOT NULL,
`country` char(3) NOT NULL,
`title` varchar(255) NOT NULL,
`date` date NOT NULL,
`tba` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `episode_id` (`episode_id`),
KEY `date` (`date`),
KEY `country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `episode_data` (`id`, `episode_id`, `country`, `title`, `date`, `tba`) VALUES
(4942, 2471, 'de', 'Väter und Töchter', '2013-08-06', 0),
(4944, 2472, 'de', 'Neue Perspektiven', '2013-08-13', 0),
(5013, 2507, 'us', 'Into the Deep', '2013-08-06', 0),
(5015, 2508, 'us', 'The Mirror Has Three Faces', '2013-08-13', 0);
Attention!
This is the original table data with "EPISODE_ID", not "SERIES_ID".
The rows I want are those with the dates closest to today, which here are 4944 and 5015.
If you want the last aired date for each country, then use this aggregation:
SELECT country, max(date) as lastdate
FROM `episode_data` ed
WHERE `ed`.`date` < CURDATE( ) AND
`ed`.`date` != '1970-01-01' AND
`ed`.`series_id` = 1
GROUP BY `ed`.`country`;
If you are trying to get the episode_id and title as well, you can use group_concat() and substring_index():
SELECT country, max(date) as lastdate,
substring_index(group_concat(episode_id order by date desc), ',', 1
) as episode_id,
substring_index(group_concat(title order by date desc separator '|'), '|', 1
) as title
FROM `episode_data` ed
WHERE `ed`.`date` < CURDATE( ) AND
`ed`.`date` != '1970-01-01' AND
`ed`.`series_id` = 1
GROUP BY `ed`.`country`;
Note that this uses a different separator for the title, under the assumption that it might have a comma.
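One practical caveat with the group_concat() approach: the concatenated value is truncated at group_concat_max_len (1024 bytes by default), so for long titles or large groups the limit may need to be raised for the session, e.g.:
SET SESSION group_concat_max_len = 1000000;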

Why is my MySQL group by so slow?

I am trying to query against a partitioned table (by month) approaching 20M rows. I need to group by DATE(transaction_utc) as well as country_id. The rows that get returned if I turn off the GROUP BY and aggregates number just over 40k, which isn't too many; however, adding the GROUP BY makes the query substantially slower unless said GROUP BY is on the transaction_utc column, in which case it gets FAST.
I've been trying to optimize this first query below by tweaking the query and/or the indexes, and got to the point below (about 2x as fast as initially); however, I'm still stuck with a 5s query for summarizing 45k rows, which seems way too long.
For reference, this box is a brand new 24-logical-core, 64GB RAM, MariaDB 5.5.x server with way more InnoDB buffer pool available than index space on the server, so there shouldn't be any RAM or CPU pressure.
So, I'm looking for ideas on what is causing this slow down and suggestions on speeding it up. Any feedback would be greatly appreciated! :)
Ok, onto the details...
The following query (the one I actually need) takes approx 5 seconds (+/-), and returns less than 100 rows.
SELECT lss.`country_id` AS CountryId
, Date(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`)
EXPLAIN SELECT for the same query is as follows. Notice that it's not using the transaction_utc key. Shouldn't it be using my covering index instead?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE lss ref idx_unique,transaction_utc,country_id idx_unique 50 const 1208802 Using where; Using temporary; Using filesort
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 georiot.lss.country_id 1
Now onto a couple of other options that I've tried, to attempt to determine what's going on...
The following query (changed group by) takes about 5 seconds (+/-), and returns only 3 rows:
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`
The following query (removed group by) takes 4-5 seconds (+/-) and returns 1 row:
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
The following query takes .00X seconds (+/-) and returns ~45k rows. This to me shows that at max we're only trying to group 45K rows into less than 100 groups (as in my initial query):
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
GROUP BY lss.`transaction_utc`
TABLE SCHEMA:
CREATE TABLE IF NOT EXISTS `sales` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_linkshare_account_id` int(11) unsigned NOT NULL,
`username` varchar(16) NOT NULL,
`country_id` int(4) unsigned NOT NULL,
`order` varchar(16) NOT NULL,
`raw_tracking_code` varchar(255) DEFAULT NULL,
`transaction_utc` datetime NOT NULL,
`processed_utc` datetime NOT NULL ,
`sku` varchar(16) NOT NULL,
`sale_original` decimal(10,4) NOT NULL,
`sale_usd` decimal(10,4) NOT NULL,
`quantity` int(11) NOT NULL,
`commission_original` decimal(10,4) NOT NULL,
`commission_usd` decimal(10,4) NOT NULL,
`original_currency` char(3) NOT NULL,
PRIMARY KEY (`id`,`transaction_utc`),
UNIQUE KEY `idx_unique` (`username`,`order`,`processed_utc`,`sku`,`transaction_utc`),
KEY `raw_tracking_code` (`raw_tracking_code`),
KEY `idx_usd_amounts` (`sale_usd`,`commission_usd`),
KEY `idx_countries` (`country_id`),
KEY `transaction_utc` (`transaction_utc`,`username`,`country_id`,`sale_usd`,`commission_usd`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( TO_DAYS(`transaction_utc`))
(PARTITION pOLD VALUES LESS THAN (735112) ENGINE = InnoDB,
PARTITION p201209 VALUES LESS THAN (735142) ENGINE = InnoDB,
PARTITION p201210 VALUES LESS THAN (735173) ENGINE = InnoDB,
PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB,
PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB,
PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ AUTO_INCREMENT=19696320 ;
The offending part is probably the GROUP BY DATE(transaction_utc). You also claim to have a covering index for this query but I see none. Your 5-column index has all the columns used in the query but not in the best order (which is: WHERE - GROUP BY - SELECT).
So, the engine, finding no useful index, would have to evaluate this function for all the 20M rows. Actually, it finds an index that starts with username (the idx_unique) and it uses that, so it has to evaluate the function for (only) 1.2M rows. If you had a (transaction_utc) or a (username, transaction_utc) it would choose the most useful of the three.
Can you afford to change the table structure by splitting the column into date and time parts?
If you can, then an index on (username, country_id, transaction_date) or (changing the order of the two columns used for grouping), on (username, transaction_date, country_id) would be quite efficient.
A covering index on (username, country_id, transaction_date, sale_usd, commission_usd) even better.
If you want to keep the current structure, try changing the order inside your 5-column index to:
(username, country_id, transaction_utc, sale_usd, commission_usd)
or to:
(username, transaction_utc, country_id, sale_usd, commission_usd)
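For reference, a sketch of how that change could be applied to the existing 5-column index (it already exists in the schema above under the name transaction_utc; pick whichever of the two column orders suits your queries):
ALTER TABLE sales
DROP INDEX transaction_utc,
ADD INDEX transaction_utc (username, transaction_utc, country_id, sale_usd, commission_usd);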
Since you are using MariaDB, you can use the VIRTUAL columns feature, without changing the existing columns:
Add a virtual (persistent) column and the appropriate index:
ALTER TABLE sales
ADD COLUMN transaction_date DATE NOT NULL
AS DATE(transaction_utc)
PERSISTENT,  -- note the comma: the ADD COLUMN and ADD INDEX are two alterations in one statement
ADD INDEX special_IDX
(username, country_id, transaction_date, sale_usd, commission_usd);
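With the persistent column and index in place, the original query could then group on the plain column instead of DATE(transaction_utc); a sketch of how it might look:
SELECT lss.`country_id` AS CountryId
, lss.`transaction_date` AS TransactionDate
, c.`name` AS CountryName
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser'
GROUP BY lss.`country_id`, lss.`transaction_date`;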