How to partition a MySQL table with two indexes

I have a table game_log with fields id, game_id and several varchar fields.
id is primary key and game_id is non-unique key.
There are two frequent queries:
SELECT * FROM game_log ORDER BY id DESC LIMIT 20
SELECT * FROM game_log WHERE game_id = <value> ORDER BY id DESC
The table is huge (6.1 GB and 32M rows), InnoDB. Rows are added one at a time, in random order. Also, some games are deleted.
I need to reduce disk I/O and improve responsiveness.
Should I use key or range partitioning? If range, then by id or by game_id? Is there any theory?

Use partitioning by range.
If you partition by key, both of your example queries have to touch every partition.
The theory is that partitioning by KEY is like partitioning by hash, in that consecutive values of the primary key are bound to be stored in separate partitions. By querying a range of id values, you spoil the partition pruning.
Demo:
CREATE TABLE `game_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`game_id` int(11) NOT NULL DEFAULT '0',
`xyz` varchar(15) DEFAULT NULL,
PRIMARY KEY (`id`,`game_id`)
)
PARTITION BY KEY ()
PARTITIONS 13;
INSERT INTO game_log (game_id) VALUES (1), (2), (3), (4), (5), (6);
EXPLAIN PARTITIONS SELECT * FROM game_log ORDER BY id DESC LIMIT 3\G
id: 1
select_type: SIMPLE
table: game_log
partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12
EXPLAIN PARTITIONS SELECT * FROM game_log WHERE game_id = 4 ORDER BY id DESC LIMIT 3\G
id: 1
select_type: SIMPLE
table: game_log
partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12
Whereas if you partition by range on game_id, partition pruning can help you at least when you query for a specific game_id. But your query across all games (ORDER BY id DESC) is still bound to touch every partition.
CREATE TABLE `game_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`game_id` int(11) NOT NULL DEFAULT '0',
`xyz` varchar(15) DEFAULT NULL,
PRIMARY KEY (`id`,`game_id`)
)
PARTITION BY RANGE (game_id)
(PARTITION p1 VALUES LESS THAN (3),
PARTITION p2 VALUES LESS THAN (6),
PARTITION p3 VALUES LESS THAN MAXVALUE);
INSERT INTO game_log (game_id) VALUES (1), (2), (3), (4), (5), (6);
EXPLAIN PARTITIONS SELECT * FROM game_log ORDER BY id DESC LIMIT 3\G
id: 1
select_type: SIMPLE
table: game_log
partitions: p1,p2,p3
EXPLAIN PARTITIONS SELECT * FROM game_log WHERE game_id = 4 ORDER BY id DESC LIMIT 3\G
id: 1
select_type: SIMPLE
table: game_log
partitions: p2
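If you want to double-check how rows end up distributed, INFORMATION_SCHEMA can report per-partition row counts. A quick sketch (note that TABLE_ROWS is only an estimate for InnoDB):
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'game_log' AND TABLE_SCHEMA = DATABASE();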

Related

partition mysql table on a none primary key column

I have a table:
+----+---------+----------+
| id | user_id | comment |
+----+---------+----------+
Where column type is:
id (bigint not null primary key autoincrement)
user_id (bigint not null)
comment (text)
How can I partition this table on user_id by range? I tried to partition it by range in PHPMyAdmin, but it doesn't allow me to because user_id isn't a primary key. If I have 10 billion users and each has an unbounded number of comments, this table will be very large. I want to partition it like:
partition 1 (user_id<500)
+----+---------+----------+
| id | user_id | comment |
+----+---------+----------+
partition 2 (user_id<1000)
+----+---------+----------+
| id | user_id | comment |
+----+---------+----------+
And so on.
Ensure you have satisfied the criteria for when to use partitioning. Partitioning helps in rather rare cases and needs to map closely to your queries. A 500-user range seems tiny. MySQL can handle large tables without partitioning, so don't assume it's necessary.
The form is:
CREATE TABLE tbl (
id bigint unsigned AUTO_INCREMENT NOT NULL,
user_id bigint unsigned NOT NULL,
comment text NOT NULL,
PRIMARY KEY (user_id, id),
key(id))
PARTITION BY RANGE (user_id) (
PARTITION p0 VALUES LESS THAN (500),
PARTITION p1 VALUES LESS THAN (1000),
PARTITION p2 VALUES LESS THAN (2000),
PARTITION p3 VALUES LESS THAN (3000)
);
ref: fiddle
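To verify that pruning kicks in with this layout, a quick check along these lines should report a single partition (a sketch against the tbl definition above; the inserted values are arbitrary, and EXPLAIN PARTITIONS is the pre-8.0 syntax, newer versions show a partitions column in plain EXPLAIN):
INSERT INTO tbl (user_id, comment) VALUES (42, 'hello'), (1500, 'world');
EXPLAIN PARTITIONS SELECT * FROM tbl WHERE user_id = 42\G
-- partitions: p0 (only the range covering user_id < 500 is scanned)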
Yes, since user_id is not part of the table's primary key or unique keys, you can't partition solely on user_id, as the docs state very clearly:
every unique key on the table must use every column in the table's partitioning expression
So for your case, what you can do is add a unique key on your table covering both id and user_id:
alter table myTable add unique key uk_id_userid (id, user_id);
And then add the range partition for only user_id column as such:
alter table myTable partition by range (user_id) (
PARTITION p0 VALUES LESS THAN (10),
PARTITION p1 VALUES LESS THAN (20),
PARTITION p2 VALUES LESS THAN (30),
PARTITION p3 VALUES LESS THAN (40)
);
Note: since your table already contains values, your partition ranges must cover all existing values of the user_id column. That is, if you have a user_id of 1000, you cannot define your last partition as PARTITION p3 VALUES LESS THAN (1000); that will fail. You will need one more partition, i.e. PARTITION p3 VALUES LESS THAN (2000) or PARTITION p3 VALUES LESS THAN MAXVALUE.
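For instance (illustrative values), if the table already holds a row with user_id = 1000:
-- Fails: the last range does not cover the existing value 1000
alter table myTable partition by range (user_id) (
PARTITION p0 VALUES LESS THAN (500),
PARTITION p1 VALUES LESS THAN (1000)
);
-- ERROR 1526: Table has no partition for value 1000
-- Works: MAXVALUE catches everything above 1000
alter table myTable partition by range (user_id) (
PARTITION p0 VALUES LESS THAN (500),
PARTITION p1 VALUES LESS THAN (1000),
PARTITION p2 VALUES LESS THAN MAXVALUE
);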
See it working here: http://sqlfiddle.com/#!9/8ca7ed
Full working example:
create table myTable (
id bigint not null auto_increment,
user_id bigint not null,
comment text,
key (id)
) engine=InnoDb;
insert into myTable (user_id, comment) values
(1, 'bla'), (1, 'ble'), (1, 'bli'), (1, 'blo'),
(12, 'bla'), (12, 'ble'), (12, 'bli'), (12, 'blo'),
(23, 'bla'), (23, 'ble'), (23, 'bli'), (23, 'blo'),
(34, 'bla'), (34, 'ble'), (34, 'bli'), (34, 'blo');
alter table myTable add unique key uk_id_userid (id, user_id);
alter table myTable partition by range (user_id) (
PARTITION p0 VALUES LESS THAN (10),
PARTITION p1 VALUES LESS THAN (20),
PARTITION p2 VALUES LESS THAN (30),
PARTITION p3 VALUES LESS THAN (40)
);

Moving MANY partitions from one table to another -Mysql

I have a table with many partitions by date
I want to move some of the oldest partitions to another table.
I succeeded in moving the oldest partition by following the manual,
but when I try to move more partitions I get: Error Code: 1737. Found a row that does not match the partition.
So I dropped the oldest partition and moved the next one, but then the rows from the first partition returned to the original table (I did NOT see any documentation about records that go back...).
How can I move the three first partitions to another table?
THANKS
CREATE TABLE `TestPartA` (
`Name` VARCHAR(50) NOT NULL,
`Time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`Slot` INT(11) NOT NULL DEFAULT '-1',
`Text` VARCHAR(50) NOT NULL,
PRIMARY KEY (`Name`, `Text`, `Time`),
INDEX `ClusterTimeIdx` (`Name`, `Time`, `Slot`),
INDEX `Time` (`Time`)
)
PARTITION BY RANGE (TO_DAYS(TIME))
(PARTITION p20190407 VALUES LESS THAN (TO_DAYS('2019-04-07')) ,
PARTITION p20190421 VALUES LESS THAN (TO_DAYS('2019-04-21')) ,
PARTITION p20190428 VALUES LESS THAN (TO_DAYS('2019-04-28')),
PARTITION p20190505 VALUES LESS THAN (TO_DAYS('2019-05-05'))) ;
CREATE TABLE TestPartB LIKE TestPartA;
ALTER TABLE TestPartB REMOVE PARTITIONING;
insert into TestPartA values ('A','2019-04-02',1,'W1');
insert into TestPartA values ('A','2019-04-04',1,'W1');
insert into TestPartA values ('A','2019-04-08',1,'W1');
insert into TestPartA values ('A','2019-04-20',1,'W1');
insert into TestPartA values ('A','2019-05-01',1,'W1');
SELECT PARTITION_NAME, TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'TestPartA';
-- move the first partition
ALTER TABLE TestPartA EXCHANGE PARTITION p20190407 WITH TABLE TestPartB; -- Works GREAT
select * from TestPartA;
select * from TestPartB;
SELECT PARTITION_NAME, TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'TestPartA'; -- this is not working any more - but according to documentation it happens sometimes
--move the second partition
ALTER TABLE TestPartA EXCHANGE PARTITION p20190421 WITH TABLE TestPartB; -- FAILED
ALTER TABLE TestPartA drop PARTITION p20190407;
ALTER TABLE TestPartA EXCHANGE PARTITION p20190421 WITH TABLE TestPartB; -- Succeeds, but the rows from the first partition returned to table A
select * from TestPartA;
select * from TestPartB;
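For reference, EXCHANGE PARTITION swaps contents in both directions: the partition's rows move to the standalone table, and the standalone table's rows move into the partition. That explains both the 1737 error (TestPartB was no longer empty, and its rows fell outside p20190421's range while p20190407 still existed) and the rows that "returned" to TestPartA after the drop, when p20190421 suddenly covered them. A sketch of one common workaround, using a hypothetical empty staging table for each exchange:
CREATE TABLE TestPartStage LIKE TestPartA;
ALTER TABLE TestPartStage REMOVE PARTITIONING;
-- repeat per partition: swap it out, copy to the archive, empty the stage
ALTER TABLE TestPartA EXCHANGE PARTITION p20190407 WITH TABLE TestPartStage;
INSERT INTO TestPartB SELECT * FROM TestPartStage;
TRUNCATE TABLE TestPartStage;
ALTER TABLE TestPartA EXCHANGE PARTITION p20190421 WITH TABLE TestPartStage;
INSERT INTO TestPartB SELECT * FROM TestPartStage;
TRUNCATE TABLE TestPartStage;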

Optimize COUNT(*)

I have a table items from which I'm selecting 40 rows at a time ordered by the popularity of the item.
The popularity score is simply downloads/impressions.
Query:
SELECT id, name
FROM items
ORDER BY (SELECT COUNT(*) FROM downloads WHERE item = items.id)/
(SELECT COUNT(*) FROM impressions WHERE item = items.id)
LIMIT 40;
The problem is that the query takes forever to complete (ranging from 2 to 10 seconds).
At the moment we have 25K items, 18M impressions, and 560K downloads.
We already tried adding the fields downloads and impressions in the table items and keeping the count updated using triggers (after an insert in the tables impressions and downloads we increment the values), but we've had some issues with deadlocking.
Is there a better way to optimize this query?
Thanks.
Edit
Here's the output of EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY items ALL NULL NULL NULL NULL 20496 Using filesort
3 DEPENDENT SUBQUERY impressions ref PRIMARY PRIMARY 4 db.items.id 74 Using index
2 DEPENDENT SUBQUERY downloads ref PRIMARY PRIMARY 4 db.items.id 274 Using index
Tables:
CREATE TABLE `items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(35) DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=24369 DEFAULT CHARSET=utf8mb4;
CREATE TABLE `impressions` (
`item` int(10) unsigned NOT NULL,
`user` char(36) NOT NULL DEFAULT '',
PRIMARY KEY (`item`,`user`),
CONSTRAINT `impression_ibfk_1` FOREIGN KEY (`item`) REFERENCES `items` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `downloads` (
`item` int(10) unsigned NOT NULL,
`user` char(36) NOT NULL DEFAULT '',
PRIMARY KEY (`item`,`user`),
CONSTRAINT `download_ibfk_1` FOREIGN KEY (`item`) REFERENCES `items` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
I think the following query can resolve your problem:
SELECT
downloads.item, items.name, downloads.cnt/impressions.cnt AS rate
FROM (
SELECT item, COUNT(*) AS cnt FROM downloads GROUP BY item
) AS downloads
JOIN (
SELECT item, COUNT(*) AS cnt FROM impressions GROUP BY item
) AS impressions ON impressions.item = downloads.item
JOIN items ON items.id = downloads.item
ORDER BY rate DESC
LIMIT 40;
Also, make sure the downloads and impressions tables are indexed on the item column.
Not solvable with that approach.
There are two solutions:
Keep counters (by item.id) for impressions and downloads.
Summary tables.
Counters: This involves adding an extra column for each counter to the items table, or building a parallel table with id and the various counters. For a really high volume of counts, the latter avoids some clashes between various queries.
Summary Tables: Build and incrementally augment a table (or tables) that summarizes counts like these, plus perhaps other SUMs, COUNTs, etc. The table would perhaps be augmented daily with the previous day's information. Then "sum the counts" to get the grand total; this will be much faster than your current query.
More on Summary Tables: http://mysql.rjweb.org/doc.php/summarytables
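A rough sketch of the counters variant (the item_counts table and its column names are illustrative, not from the question):
CREATE TABLE item_counts (
item int unsigned NOT NULL PRIMARY KEY,
downloads int unsigned NOT NULL DEFAULT 0,
impressions int unsigned NOT NULL DEFAULT 0
) ENGINE=InnoDB;
-- bump the counter in the same transaction as each new download row:
INSERT INTO item_counts (item, downloads) VALUES (123, 1)
ON DUPLICATE KEY UPDATE downloads = downloads + 1;
-- the top 40 then reads ~25K counter rows instead of 18M detail rows:
SELECT i.id, i.name
FROM items i
JOIN item_counts c ON c.item = i.id
WHERE c.impressions > 0
ORDER BY c.downloads / c.impressions DESC
LIMIT 40;
Single-row upserts like this touch only one counter row per item, which tends to produce less lock contention than the trigger setup described in the question.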
I'd count downloads and impressions first and then get the top 40:
with d as (select item, count(*) as total from downloads group by item)
, i as (select item, count(*) as total from impressions group by item)
, top40 as (select item from d join i using (item) order by d.total / i.total limit 40)
select *
from items
where id in
(
select item from top40
);
The WITH clause is available as of MySQL 8. In earlier versions, you'd work with subqueries instead.
As item is a foreign key in downloads and impressions and id is the primary key in items, I suppose there is an index on them. Otherwise create one:
create unique index idx1 on items(id);
create index idx2 on downloads(item);
create index idx3 on impressions(item);

MySQL - how to keep a unique constraint while partitioning by RANGE (timestamp)?

I have one table that I want to partition by RANGE on created_at (a timestamp), so old data can be deleted easily (by dropping partitions).
CREATE TABLE `orders` (
`order_id` NVARCHAR(64) NOT NULL,
`amount` INTEGER NOT NULL,
`created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`modified_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
KEY `order_id` (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE dropship.orders
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2019-03-01 00:00:00') ),
PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2019-04-01 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2019-05-01 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2019-06-01 00:00:00') ),
PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2019-07-01 00:00:00') ),
PARTITION p5 VALUES LESS THAN (MAXVALUE)
);
This table only has two usages: get by order_id, or update by order_id.
select * from orders where order_id = '123';
update orders set amount = 10 where order_id = '123';
Due to the limitations of MySQL partitioning, I cannot add a unique key on order_id alone, since the created_at field is used for partitioning.
All columns used in the table's partitioning expression must be part of every unique key that the table may have, including any primary key.
Question:
Any way to make order_id unique in this table please?
I have thought about partitioning by order_id, but it's hard to delete old data in that way.
Any suggestion is welcome. (For example may be you have better design for this table).
BEGIN;
SELECT 1 FROM orders WHERE order_id = 234 FOR UPDATE;
-- if that returns a row, you have a dup error; otherwise:
INSERT INTO orders ... order_id = 234;
COMMIT;
But, as Raymond points out, you may as well drop PARTITIONing and make the column the PRIMARY KEY. This would make all the stated operations slightly faster.
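A sketch of that simpler design (the index name and the 90-day retention window are illustrative):
CREATE TABLE orders (
order_id varchar(64) NOT NULL PRIMARY KEY,
amount int NOT NULL,
created_at timestamp DEFAULT CURRENT_TIMESTAMP,
modified_at timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
KEY idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- old rows are purged with a ranged DELETE instead of DROP PARTITION,
-- in small batches to keep transactions short:
DELETE FROM orders WHERE created_at < NOW() - INTERVAL 90 DAY LIMIT 10000;
The PRIMARY KEY on order_id gives you the uniqueness guarantee plus fast point lookups and updates, which are the only two stated usages.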

Why is my MySQL group by so slow?

I am trying to query against a partitioned table (by month) approaching 20M rows. I need to group by DATE(transaction_utc) as well as country_id. The number of rows returned if I turn off the GROUP BY and aggregates is just over 40K, which isn't too many; however, adding the GROUP BY makes the query substantially slower, unless said GROUP BY is on the transaction_utc column, in which case it gets FAST.
I've been trying to optimize this first query below by tweaking the query and/or the indexes, and got to the point below (about 2x as fast as initially); however, I'm still stuck with a 5 s query for summarizing 45K rows, which seems way too slow.
For reference, this box is a brand new 24-logical-core, 64 GB RAM, MariaDB 5.5.x server with way more InnoDB buffer pool available than index space, so there shouldn't be any RAM or CPU pressure.
So I'm looking for ideas on what is causing this slowdown, and suggestions for speeding it up. Any feedback would be greatly appreciated! :)
Ok, onto the details...
The following query (the one I actually need) takes approx 5 seconds (+/-), and returns less than 100 rows.
SELECT lss.`country_id` AS CountryId
, Date(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`)
EXPLAIN SELECT for the same query is as follows. Notice that it's not using the transaction_utc key. Shouldn't it be using my covering index instead?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE lss ref idx_unique,transaction_utc,country_id idx_unique 50 const 1208802 Using where; Using temporary; Using filesort
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 georiot.lss.country_id 1
Now onto a couple other options that I've tried to attempt to determine whats going on...
The following query (changed group by) takes about 5 seconds (+/-), and returns only 3 rows:
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`
The following query (removed group by) takes 4-5 seconds (+/-) and returns 1 row:
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
The following query takes .00X seconds (+/-) and returns ~45K rows. This to me shows that, at most, we're only trying to group 45K rows into fewer than 100 groups (as in my initial query):
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
GROUP BY lss.`transaction_utc`
TABLE SCHEMA:
CREATE TABLE IF NOT EXISTS `sales` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_linkshare_account_id` int(11) unsigned NOT NULL,
`username` varchar(16) NOT NULL,
`country_id` int(4) unsigned NOT NULL,
`order` varchar(16) NOT NULL,
`raw_tracking_code` varchar(255) DEFAULT NULL,
`transaction_utc` datetime NOT NULL,
`processed_utc` datetime NOT NULL ,
`sku` varchar(16) NOT NULL,
`sale_original` decimal(10,4) NOT NULL,
`sale_usd` decimal(10,4) NOT NULL,
`quantity` int(11) NOT NULL,
`commission_original` decimal(10,4) NOT NULL,
`commission_usd` decimal(10,4) NOT NULL,
`original_currency` char(3) NOT NULL,
PRIMARY KEY (`id`,`transaction_utc`),
UNIQUE KEY `idx_unique` (`username`,`order`,`processed_utc`,`sku`,`transaction_utc`),
KEY `raw_tracking_code` (`raw_tracking_code`),
KEY `idx_usd_amounts` (`sale_usd`,`commission_usd`),
KEY `idx_countries` (`country_id`),
KEY `transaction_utc` (`transaction_utc`,`username`,`country_id`,`sale_usd`,`commission_usd`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( TO_DAYS(`transaction_utc`))
(PARTITION pOLD VALUES LESS THAN (735112) ENGINE = InnoDB,
PARTITION p201209 VALUES LESS THAN (735142) ENGINE = InnoDB,
PARTITION p201210 VALUES LESS THAN (735173) ENGINE = InnoDB,
PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB,
PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB,
PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ AUTO_INCREMENT=19696320 ;
The offending part is probably the GROUP BY DATE(transaction_utc). You also claim to have a covering index for this query, but I see none. Your 5-column index has all the columns used in the query, but not in the best order (which is: WHERE, then GROUP BY, then SELECT).
So the engine, finding no useful index, would have to evaluate this function for all 20M rows. Actually, it finds an index that starts with username (the idx_unique) and uses that, so it has to evaluate the function for (only) 1.2M rows. If you had an index on (transaction_utc) alone or on (username, transaction_utc), it would choose the most useful of the three.
Can you afford to change the table structure by splitting the column into date and time parts?
If you can, then an index on (username, country_id, transaction_date) or (changing the order of the two columns used for grouping), on (username, transaction_date, country_id) would be quite efficient.
A covering index on (username, country_id, transaction_date, sale_usd, commission_usd) even better.
If you want to keep the current structure, try changing the order inside your 5-column index to:
(username, country_id, transaction_utc, sale_usd, commission_usd)
or to:
(username, transaction_utc, country_id, sale_usd, commission_usd)
Since you are using MariaDB, you can use the VIRTUAL columns feature, without changing the existing columns:
Add a virtual (persistent) column and the appropriate index:
ALTER TABLE sales
ADD COLUMN transaction_date DATE
AS (DATE(transaction_utc)) PERSISTENT,
ADD INDEX special_IDX
(username, country_id, transaction_date, sale_usd, commission_usd);
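The original query can then filter and group on the new column directly (a sketch; note that date-only BETWEEN bounds are not exactly equivalent to the original datetime bounds, and partition pruning still keys off transaction_utc):
SELECT lss.`country_id` AS CountryId
, lss.`transaction_date` AS TransactionDate
, c.`name` AS CountryName
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE lss.`username` = 'someuser'
AND lss.`transaction_date` BETWEEN '2012-09-26' AND '2012-10-26'
GROUP BY lss.`country_id`, lss.`transaction_date`;
This avoids evaluating DATE() per row, and the index above covers the query: constant username prefix, then the two grouping columns, then the summed columns.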