MySQL Partitioning a Table That Contains a Primary Key - mysql

I have a table that I want to partition:
CREATE TABLE `tbl_orders` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL DEFAULT '0' COLLATE 'utf8mb4_general_ci',
`system_id` INT(11) NOT NULL DEFAULT '0',
`created_at` DATETIME NULL DEFAULT NULL,
`updated_at` DATETIME NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
INDEX `system_id` (`system_id`) USING BTREE
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=8
;
ALTER table tbl_orders
PARTITION BY HASH(system_id)
PARTITIONS 4;
What I'm trying to achieve: partition the table by system_id in order to speed up queries.
When I run the partition I get the following error:
/* SQL Error (1503): A PRIMARY KEY must include all columns in the table's partitioning function */
What would I change to run this partition successfully whilst still achieving my aim which is to split the table on system_id?
Is partitioning this way achievable with a primary key on the table?

PARTITIONing requires you to add the "partition key" (system_id) to every Unique index, including the PRIMARY KEY.
You will, I predict, find that PARTITION BY HASH is useless for performance. It may even slow down the query.
Please show a query that you hope to speed up; I will advise in more detail.
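For example, one way to satisfy that rule is to add system_id to the PRIMARY KEY before partitioning (a sketch only; note that the composite key no longer guarantees that id alone is unique, so verify that against your data first):

```sql
-- Include the partition column in the PRIMARY KEY, then partition.
-- Caveat: (id, system_id) does not enforce uniqueness of id by itself;
-- AUTO_INCREMENT still works because id remains the first key column.
ALTER TABLE tbl_orders
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, system_id);

ALTER TABLE tbl_orders
    PARTITION BY HASH (system_id)
    PARTITIONS 4;
```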

Related

Update index values extremely slow on MySQL

I have three tables, one is in database db1 and two are in database db2, all on the same MySQL server:
CREATE TABLE `db1`.`user` (
`id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`user_name` varchar(20) NOT NULL,
`password_hash` varchar(71) DEFAULT NULL,
`email_address` varchar(100) NOT NULL,
`registration_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`registration_hash` char(16) DEFAULT NULL,
`active` bit(1) NOT NULL DEFAULT b'0',
`public` bit(1) NOT NULL DEFAULT b'0',
`show_name` bit(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`id`),
UNIQUE KEY `user_name` (`user_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `db2`.`ref` (
`id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `db2`.`combination` (
`ref_id` bigint(20) UNSIGNED NOT NULL,
`user_id` bigint(20) UNSIGNED NOT NULL,
`arbitrary_number` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY (`ref_id`,`user_id`),
KEY `combination_user` (`user_id`),
KEY `combination_number` (`user_id`,`arbitrary_number`),
CONSTRAINT `combination_ref` FOREIGN KEY (`ref_id`) REFERENCES `ref` (`id`) ON UPDATE CASCADE,
CONSTRAINT `combination_user` FOREIGN KEY (`user_id`) REFERENCES `db1`.`user` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The table db1.user has around 600 records, the table db2.ref has around 800 records and the table db2.combination has around 300K records.
Now using Perl and DBD::mysql I perform the following query:
UPDATE `db1`.`user` SET `id` = (`id` + 1000)
ORDER BY `id` DESC
However, this query always aborts with a message that the connection to the MySQL server was lost. Executing the same query via phpMyAdmin results in a timeout as well. Somehow the query just takes a very long time to execute; I guess that is because all the foreign key values need to be updated.
Setting the FOREIGN_KEY_CHECKS variable to OFF will not update the user_id column in the db2.combination table, which does need to be updated.
I have also tried to manipulate the different timeouts (as suggested all over the internet), like this:
SET SESSION net_read_timeout=3000;
SET SESSION net_write_timeout=3000;
SET SESSION wait_timeout=6000;
I have verified that the new values are actually set by retrieving them again. However, even with these long timeouts the query still fails to execute, and after about 30 seconds the connection to the MySQL server is again lost (while the UPDATE query is still executing).
Any suggestions on how to speed up this query are more than welcome.
BTW: The PK columns have a very large integer type (BIGINT). I will also make this type smaller (change it to INT). Could this type change also improve the speed significantly?
UPDATE
I also performed an EXPLAIN for the query, and it mentions in the Extra column that the query is doing a filesort. I would have expected that, due to the indexes on the table (I added them, as they were not there in the first place), no filesort would take place.
The 300K CASCADEs are probably the really slow part of the task. So, let's avoid them. (However, there may be a check to verify the resulting links; this should be not-too-slow.)
Disable FOREIGN KEY processing
Create new tables without FOREIGN KEYs. new_user, new_combination. (I don't know if new_ref is needed.)
Do this to populate the tables:
INSERT INTO new_xx (user_id, ...)
    SELECT user_id + 1000, ... FROM xx;
ALTER TABLE new_xx ADD FOREIGN KEY ...;  -- for each xx
RENAME TABLE xx TO old_xx, new_xx TO xx;
DROP TABLE old_xx;
Enable FOREIGN KEY processing
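Concretely, for the two tables in the question the recipe might look like the sketch below (the step order is adjusted slightly so the re-added constraints end up pointing at the new tables after the RENAME; verify row counts before the DROP):

```sql
SET FOREIGN_KEY_CHECKS = 0;

-- New copies; CREATE TABLE ... LIKE copies indexes but not FOREIGN KEYs.
CREATE TABLE db1.new_user LIKE db1.user;
CREATE TABLE db2.new_combination LIKE db2.combination;

-- Repopulate with the shifted ids.
INSERT INTO db1.new_user
  SELECT id + 1000, user_name, password_hash, email_address,
         registration_time, registration_hash, active, public, show_name
  FROM db1.user;
INSERT INTO db2.new_combination
  SELECT ref_id, user_id + 1000, arbitrary_number
  FROM db2.combination;

-- Swap, then drop the old tables (this also frees the old constraint names).
RENAME TABLE db1.user TO db1.old_user, db1.new_user TO db1.user;
RENAME TABLE db2.combination TO db2.old_combination,
             db2.new_combination TO db2.combination;
DROP TABLE db1.old_user, db2.old_combination;

-- Re-add the FOREIGN KEYs last, so they point at the new tables.
ALTER TABLE db2.combination
  ADD CONSTRAINT combination_ref FOREIGN KEY (ref_id)
    REFERENCES db2.ref (id) ON UPDATE CASCADE,
  ADD CONSTRAINT combination_user FOREIGN KEY (user_id)
    REFERENCES db1.user (id) ON UPDATE CASCADE;

SET FOREIGN_KEY_CHECKS = 1;
```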

mysql - 500 tables with 100K-1M rows or 1 table with 50-500M rows

I've read many similar posts, yet I don't understand what to choose.
From a software perspective, it is a game leaderboard. One table for all leaderboards, or 500 small tables, one for each game level?
I've tested both variants, and have found:
1 big table works slower (with all needed indexes created).
1 big table should be partitioned into at least 10 partitions for adequate speed.
500 small tables are not that convenient, but twice as fast (a 50M-row big table vs 100K-row small tables).
500 small tables don't need partitioning (I heard about some problems with it in MySQL; maybe everything is fixed in MariaDB 10.0, which I use, but just in case).
The only problem here is possibly having many open tables at once. I didn't think this was a problem until I read the setup suggestions in phpMyAdmin, so now I doubt whether I should use that many tables.
Just in case, here are the schemas.
"small" table:
CREATE TABLE IF NOT EXISTS `level0` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT '0',
`score` int(11) NOT NULL,
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`),
KEY `score` (`score`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE IF NOT EXISTS `leaderboard` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT '0',
`level_no` int(11) NOT NULL,
`score` int(11) NOT NULL,
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `level_no` (`level_no`),
KEY `score` (`score`),
KEY `timestamp` (`timestamp`),
KEY `lev_sc` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (id)
PARTITIONS 10 */
Queries for ranking:
SELECT COUNT(score) FROM level0 WHERE score > $current_score
ORDER BY score desc
SELECT COUNT(score) FROM leaderboard WHERE
level_no = 0 and score > $current_score ORDER BY score desc
update
I've learned about indexes and ended up with the following schema for big table (20M rows):
CREATE TABLE IF NOT EXISTS `leaderboard` (
`user_id` int(11) NOT NULL DEFAULT '0',
`level_no` smallint(5) unsigned NOT NULL,
`score` int(11) unsigned NOT NULL,
`timestamp` int(11) unsigned NOT NULL,
PRIMARY KEY (`level_no`,`user_id`),
KEY `user_id` (`user_id`),
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
and for the small one (100K rows, taken from leaderboard where level_no=20):
CREATE TABLE IF NOT EXISTS `level20` (
`user_id` int(11) NOT NULL DEFAULT '0',
`score` int(11) NOT NULL,
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`user_id`),
KEY `score` (`score`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
shared table with long literal user ids:
CREATE TABLE IF NOT EXISTS `player_ids` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`store_user_id` char(64) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `store_user_id` (`store_user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
For tests I've used these queries:
SELECT COUNT(*) AS rank FROM level20 lev WHERE score >
(SELECT score FROM level20 lt INNER JOIN player_ids pids ON
pids.id = lt.user_id WHERE pids.store_user_id='3FGTOHQN6UMwXI47IiRRMf9WI777SSJ6A' );
SELECT COUNT(*) AS rank FROM leaderboard lev WHERE level_no=20 and score >
(SELECT score FROM leaderboard lt INNER JOIN player_ids pids ON
pids.id = lt.user_id WHERE pids.store_user_id='3FGTOHQN6UMwXI47IiRRMf9WI777SSJ6A' and level_no=20 ) ;
I like the idea of using one big table, yet, while I'm getting similar timings (~0.050 s for the small table and ~0.065 s for the big one) on both queries, EXPLAIN still confuses me a little:
for the small table:
type | key | key_len | ref | rows | Extra
index | score | 4 | (null) | 50049 | Using where; Using index
and for the big table:
ref | PRIMARY | 2 | const | 164030 | Using where
As you can see, about 3x fewer rows were scanned in the small table. The data in all tables is identical; level20 was filled with this query:
INSERT INTO level20 (user_id, score, timestamp) SELECT user_id, score,
timestamp FROM leaderboard WHERE level_no=20;
another update
I experimented today with the tables and found that changing INT to MEDIUMINT barely changes the size of the table. Here are the statistics after optimization (recreate + analyze):
#medium ints
CREATE TABLE IF NOT EXISTS `leaderboard1` (
`user_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`level_no` smallint(5) unsigned NOT NULL DEFAULT '0',
`score` mediumint(8) unsigned NOT NULL DEFAULT '0',
`timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`level_no`,`user_id`),
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Data 628 Mb
Index 521.6 Mb
Total 1.1 Gb
#ints
CREATE TABLE IF NOT EXISTS `leaderboard` (
`user_id` int(11) NOT NULL DEFAULT '0',
`level_no` smallint(5) unsigned NOT NULL,
`score` int(11) unsigned NOT NULL,
`timestamp` int(11) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`level_no`),
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Data 670 Mb
Index 597.8Mb
Total 1.2 Gb
And my queries work almost the same way on both tables. I have a feeling that the table with MEDIUMINTs is better, so I'll keep it, yet I'm still a little bit confused.
Your queries are a bit strange. Try this
SELECT COUNT(*)
FROM leaderboard
WHERE level_no = 0 and score > $current_score
Your ORDER BY here is pointless because this query can only return a single row: it's an aggregate query without any GROUP BY.
Five hundred tables is a terrible idea. Your administration tasks will be quite unpleasant.
Also, partitioning your tables rarely helps query performance. In the case you've proposed, partitioning on hash(id), will definitely wreck performance for the query you've shown; every query will have to read every partition.
Keep it simple. One table. When it gets reasonably big, use EXPLAIN to analyze your query performance, and consider adding appropriate compound indexes.
Don't create indexes you don't need. They slow down inserts and waste hard drive space. Read this http://use-the-index-luke.com/ .
Edit MySQL is built for this sort of four-longword table with half a billion rows. You will get this working if you're patient and learn about indexing. Don't waste your irreplaceable time with hundreds of smaller tables or with partitioning. More RAM may help, though.
The best thing for performance with InnoDB is making sure that all of your frequently used data fits in the buffer pool. With your posted table structures, it looks like you'll need roughly 500MB of buffer pool space to keep all of the data in the buffer pool.
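As an illustration, you can check the current setting and resize it like this (the 512M value is just an example sized to the estimate above; before MySQL 5.7 / MariaDB 10.2 the variable is not dynamic, so set it in my.cnf and restart instead):

```sql
-- Check the current buffer pool size (bytes).
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- MySQL 5.7+ / MariaDB 10.2+ can resize online (needs SUPER privilege):
SET GLOBAL innodb_buffer_pool_size = 512 * 1024 * 1024;
```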
A better structure for the leaderboard table would be:
CREATE TABLE IF NOT EXISTS `leaderboard` (
`user_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
`level_no` SMALLINT(5) UNSIGNED NOT NULL,
`score` int(10) NOT NULL,
`timestamp` int(10) UNSIGNED NOT NULL,
PRIMARY KEY (`level_no`,`user_id`),
KEY `user_id` (`user_id`),
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Changes:
The timestamp and user_id columns are UNSIGNED: this expands the range for user IDs, and (assuming you're not storing negative time values) an unsigned timestamp also reaches past the signed 32-bit 2038 limit.
The timestamp may be easier to use as a TIMESTAMP type: TIMESTAMP uses 4 bytes like INT but displays as a datetime.
Removed the level_no index: it is redundant with the level_no_score index since prefixes of indexes can be used instead of the whole thing.
Using (level_no, user_id) as the primary key will help if you frequently use those columns in queries and removes an unneeded column (id). InnoDB does implicitly create a primary key only if one is not explicitly defined, so creating the id column only to use as a primary key is a waste.
The "correct" primary index also depends on the data and access pattern. What is unique in the table? Is it really level_no and user_id or is it just user? If it's just user_id that will probably be a better primary key.
To save space (hence make things more cacheable, hence faster), shrink from INT (4 bytes) to MEDIUMINT UNSIGNED (3 bytes, 0-16M range) or smaller.
CHAR(64) -- are the strings always 64 characters? If not, use VARCHAR(64) to save space. ('3FGTOHQN6UMwXI47IiRRMf9WI777SSJ6A' is only 33?)
For leaderboard, I think you can get rid of one index:
PRIMARY KEY (`user_id`, `level_no`), -- reversed
# KEY `user_id` (`user_id`), -- not needed
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`) -- takes care of any lookup by just `level_no`
Re "3x": "Rows" in EXPLAIN is an estimate. Sometimes it is a crude estimate.
You know SQL; why go to the effort to code "SELECT" yourself for NoSQL?
PARTITIONing does not automatically provide any performance boost. And you have not shown any queries that would benefit.
I agree that 500 similar tables is more trouble than it is worth.
2GB of RAM? Better keep innodb_buffer_pool_size down at maybe 300M. Swapping is much worse than shrinking the buffer_pool.
leaderboard PK -- You are saying that one user_id can be in multiple levels?
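Putting the advice above together, the ranking query might be written like this (a sketch, reusing the sample id from the question; with PRIMARY KEY (level_no, user_id) and the UNIQUE key on store_user_id, both inner lookups become single-row point reads, and the outer COUNT can range-scan level_no_score):

```sql
SELECT COUNT(*) AS rank
FROM leaderboard
WHERE level_no = 20
  AND score > ( SELECT score
                FROM leaderboard
                WHERE level_no = 20
                  AND user_id = ( SELECT id
                                  FROM player_ids
                                  WHERE store_user_id =
                                    '3FGTOHQN6UMwXI47IiRRMf9WI777SSJ6A' ) );
```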

Optimizing aggregation on MySQL Table with 850 million rows

I have a query that I'm using to summarize via aggregations.
The table is called 'connections' and has about 843 million rows.
CREATE TABLE `connections` (
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
KEY `app_id` (`app_id`),
KEY `time_started_dt` (`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
When I try to run a query, such as the one below, it takes over 10 hours and I end up killing it. Does anyone see any mistakes I'm making, or have any suggestions as to how I could optimize the query?
SELECT
app_id,
MAX(time_started_dt),
MIN(time_started_dt),
COUNT(*)
FROM
connections
GROUP BY
app_id
I suggest you create a composite index on (app_id, time_started_dt):
ALTER TABLE connections ADD INDEX(app_id, time_started_dt)
To get that query to perform, you really need a suitable covering index, with app_id as the leading column, e.g.
CREATE INDEX `connections_IX1` ON `connections` (`app_id`,`time_started_dt`);
NOTE: creating the index may take hours, and the operation will prevent insert/update/delete to the table while it is running.
An EXPLAIN will show the proposed execution plan for your query. With the covering index in place, you'll see "Using index" in the plan. (A "covering index" is an index that can be used by MySQL to satisfy a query without having to access the underlying table. That is, the query can be satisfied entirely from the index.)
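For instance, once the composite index exists, you would hope to see something like this (illustrative, not guaranteed output):

```sql
EXPLAIN
SELECT app_id, MAX(time_started_dt), MIN(time_started_dt), COUNT(*)
FROM connections
GROUP BY app_id;
-- Desired plan: key = connections_IX1, Extra = "Using index",
-- i.e. the whole query is answered from the index with no table access.
```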
With the large number of rows in this table, you may also want to consider partitioning.
I have tried your query on randomly generated data (around 1 million rows). Adding a PRIMARY KEY improved the performance of your query by about 10%.
As already suggested by other people, a composite index should be added to the table. The index on time_started_dt alone is useless for this query.
CREATE TABLE `connections` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `composite_idx` (`app_id`,`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Very slow query when using ORDER BY and LIMIT?

The following query takes 10 seconds to finish when using ORDER BY. Without ORDER BY it finishes in 0.0005 seconds. I already have indexes on the fields "sku", "vid" and "timestamp". I have more than 200,000 records in this table. Please help; what is wrong with the query when using ORDER BY?
SELECT i.pn,i.sku,i.title, fl.f_inserted,fl.f_special, fl.f_notinserted
FROM inventory i
LEFT JOIN inventory_flags fl ON fl.sku = i.sku AND fl.vid = i.vid
WHERE i.qty >=2 ORDER BY i.timestamp LIMIT 0,100;
-- --------------------------------------------------------
--
-- Table structure for table `inventory`
--
CREATE TABLE IF NOT EXISTS `inventory` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`pn` varchar(60) DEFAULT NULL,
`sku` varchar(60) DEFAULT NULL,
`title` varchar(60) DEFAULT NULL,
`qty` int(11) DEFAULT NULL,
`vid` int(11) DEFAULT NULL,
`timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `vid` (`vid`),
KEY `sku` (`sku`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
-- --------------------------------------------------------
--
-- Table structure for table `inventory_flags`
--
CREATE TABLE IF NOT EXISTS `inventory_flags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`f_inserted` tinyint(1) DEFAULT NULL,
`f_notinserted` tinyint(1) DEFAULT NULL,
`f_special` tinyint(1) DEFAULT NULL,
`timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`sku` varchar(60) DEFAULT NULL,
`vid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `vid` (`vid`),
KEY `sku` (`sku`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
EXPLAIN RESULT:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | fl | system | vid,sku | NULL | NULL | NULL | 0 | const row not found
1 | SIMPLE | i | index | NULL | timestamp | 5 | NULL | 10 | Using where
Instead of adding separate single-column indexes, you need to put multi-column indexes on the tables, since you are using more than one column from the same table in the join condition.
After including the columns from the WHERE clause, also include the columns used in the ORDER BY clause in the composite index.
Try adding the following indexes and test them using EXPLAIN:
ALTER TABLE inventory_flags ADD INDEX ix_if (sku, vid);
ALTER TABLE inventory ADD INDEX ix_i (sku, qty, timestamp);
Also try to avoid the DISTINCT clause in your query; it is equivalent to a GROUP BY clause. If you still need it, consider adding a covering index.
If sku is unique to each inventory item then define it as UNIQUE - it'll speed things up. (Or the combination of sku and vid - define a composite index in that case.)
Why are you doing SELECT DISTINCT? The vast majority of the time using DISTINCT is a sign that your query or your table structure is wrong.
Since it's DISTINCT, and sku is not UNIQUE it can't use the index on timestamp to speed things up, so it has to sort a table with 200,000 records - it can't even use an index on qty to speed that part up.
PS. Omesh has some good advice as well.
You can use FORCE INDEX (index_name). Try it, and you will see in the EXPLAIN output that MySQL now uses that index for the ORDER BY.
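For the query in the question, that might look like the sketch below (FORCE INDEX can backfire if the qty filter is very selective, so compare EXPLAIN with and without it):

```sql
-- Force the optimizer to walk the `timestamp` index, so the first 100
-- qualifying rows can be returned without a filesort.
SELECT i.pn, i.sku, i.title, fl.f_inserted, fl.f_special, fl.f_notinserted
FROM inventory i FORCE INDEX (`timestamp`)
LEFT JOIN inventory_flags fl ON fl.sku = i.sku AND fl.vid = i.vid
WHERE i.qty >= 2
ORDER BY i.timestamp
LIMIT 0, 100;
```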

MySQL ORDER BY optimization in many to many tables

Tables:
CREATE TABLE IF NOT EXISTS `posts` (
`post_n` int(10) NOT NULL auto_increment,
`id` int(10) default NULL,
`date` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`post_n`,`visibility`),
KEY `id` (`id`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE IF NOT EXISTS `subscriptions` (
`subscription_n` int(10) NOT NULL auto_increment,
`id` int(10) NOT NULL,
`subscribe_id` int(10) NOT NULL,
PRIMARY KEY (`subscription_n`),
KEY `id` (`id`),
KEY `subscribe_id` (`subscribe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Query:
SELECT posts.* FROM posts, subscriptions
WHERE posts.id=subscriptions.subscribe_id AND subscriptions.id=1
ORDER BY date DESC LIMIT 0, 15
It's slow because the indexes id and subscribe_id are used, but the index date is not, thus the ordering is very slow.
Are there any options to change the query, indexes, or architecture?
Possible Improvements:
First, you'll gain a couple of microseconds per query if you name your fields instead of using SELECT posts.*, which causes a schema lookup. Change your query to:
SELECT posts.post_n, posts.id, posts.date
FROM posts, subscriptions
WHERE posts.id=subscriptions.subscribe_id
AND subscriptions.id=1
ORDER BY date DESC
LIMIT 0, 15
Next, this requires MySQL 5.1 or higher, but you might want to consider partitioning your tables. You might consider KEY partitioning for both tables.
This should get you started.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-types.html
E.g.
SET SQL_MODE = 'ANSI';
-- to allow default date
CREATE TABLE IF NOT EXISTS `posts` (
`post_n` int(10) NOT NULL auto_increment,
`id` int(10) default NULL,
`date` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`post_n`,`id`),
KEY `id` (`id`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
PARTITION BY KEY(id) PARTITIONS 32;
--
CREATE TABLE IF NOT EXISTS `subscriptions` (
`subscription_n` int(10) NOT NULL auto_increment,
`id` int(10) NOT NULL,
`subscribe_id` int(10) NOT NULL,
PRIMARY KEY (`subscription_n`,`subscribe_id`),
KEY `id` (`id`),
KEY `subscribe_id` (`subscribe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
PARTITION BY KEY(subscribe_id) PARTITIONS 32;
I had to adjust your primary key a bit. So beware: this may NOT work for you. Please test it and make sure; I hope it does, though. Make sure to run sysbench against the old and new structures/queries to compare results before going to production.
:-)
If you're able to modify the table, you could add a multi-field index containing both ID and date. (or modify one of the existing keys to contain them both).
If you can't make changes to the database, and if you know that your result set is going to be small, you can force it to use a specific named key with USE KEY(name). The ordering would then be done after the fact, just on the results returned.
Hope that helps.
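A sketch of the first suggestion (the index name is made up):

```sql
-- Lets MySQL find a subscriber's posts via id and read them already
-- ordered by date within each id, instead of sorting everything afterwards.
ALTER TABLE posts ADD INDEX id_date (id, `date`);
```

Whether this fully avoids the filesort depends on how many subscribe_ids match: with a single matching id the rows come out in date order, while with many ids MySQL may still merge-sort the results.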