I have a MySQL table with a PRIMARY KEY AUTO_INCREMENT column. I started running an event that runs ALTER TABLE update_log REORGANIZE PARTITION. Suddenly my clients (which INSERT) started getting errors java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '146255393' for key 'PRIMARY'.
I noticed that the duplicate entry was the same as the AUTO_INCREMENT=… value returned by SHOW CREATE TABLE update_log;:
CREATE TABLE `update_log` (
`update_id` bigint(20) NOT NULL AUTO_INCREMENT,
`org_id` int(11) NOT NULL,
`inventory_id` binary(16) NOT NULL,
`item_url` text NOT NULL,
`item_update_count` bigint(20) NOT NULL,
PRIMARY KEY (`update_id`)
) ENGINE=InnoDB AUTO_INCREMENT=146255393 DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY RANGE (update_id)
(PARTITION initial VALUES LESS THAN (1000000) ENGINE = InnoDB,
PARTITION `1000000-1999999` VALUES LESS THAN (2000000) ENGINE = InnoDB,
PARTITION `2000000-2999999` VALUES LESS THAN (3000000) ENGINE = InnoDB,
PARTITION `3000000-3999999` VALUES LESS THAN (4000000) ENGINE = InnoDB,
PARTITION `4000000-4999999` VALUES LESS THAN (5000000) ENGINE = InnoDB,
PARTITION `5000000-5999999` VALUES LESS THAN (6000000) ENGINE = InnoDB,
PARTITION `6000000-6999999` VALUES LESS THAN (7000000) ENGINE = InnoDB,
PARTITION `7000000-7999999` VALUES LESS THAN (8000000) ENGINE = InnoDB,
PARTITION `8000000-8999999` VALUES LESS THAN (9000000) ENGINE = InnoDB,
PARTITION `9000000-9999999` VALUES LESS THAN (10000000) ENGINE = InnoDB,
PARTITION `10000000-10999999` VALUES LESS THAN (11000000) ENGINE = InnoDB,
PARTITION `11000000-11999999` VALUES LESS THAN (12000000) ENGINE = InnoDB,
PARTITION `12000000-12999999` VALUES LESS THAN (13000000) ENGINE = InnoDB,
PARTITION `13000000-13999999` VALUES LESS THAN (14000000) ENGINE = InnoDB,
PARTITION `14000000-14999999` VALUES LESS THAN (15000000) ENGINE = InnoDB,
PARTITION `15000000-15999999` VALUES LESS THAN (16000000) ENGINE = InnoDB,
PARTITION `16000000-16999999` VALUES LESS THAN (17000000) ENGINE = InnoDB,
PARTITION `17000000-17999999` VALUES LESS THAN (18000000) ENGINE = InnoDB,
PARTITION `18000000-18999999` VALUES LESS THAN (19000000) ENGINE = InnoDB,
PARTITION `19000000-19999999` VALUES LESS THAN (20000000) ENGINE = InnoDB,
PARTITION `20000000-20999999` VALUES LESS THAN (21000000) ENGINE = InnoDB,
PARTITION `21000000-21999999` VALUES LESS THAN (22000000) ENGINE = InnoDB,
PARTITION `22000000-22999999` VALUES LESS THAN (23000000) ENGINE = InnoDB,
PARTITION `23000000-23999999` VALUES LESS THAN (24000000) ENGINE = InnoDB,
PARTITION `24000000-24999999` VALUES LESS THAN (25000000) ENGINE = InnoDB,
PARTITION `25000000-25999999` VALUES LESS THAN (26000000) ENGINE = InnoDB,
PARTITION `26000000-26999999` VALUES LESS THAN (27000000) ENGINE = InnoDB,
PARTITION `27000000-27999999` VALUES LESS THAN (28000000) ENGINE = InnoDB,
PARTITION `28000000-28999999` VALUES LESS THAN (29000000) ENGINE = InnoDB,
PARTITION `29000000-29999999` VALUES LESS THAN (30000000) ENGINE = InnoDB,
PARTITION `30000000-30999999` VALUES LESS THAN (31000000) ENGINE = InnoDB,
PARTITION `31000000-31999999` VALUES LESS THAN (32000000) ENGINE = InnoDB,
PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
So I manually altered the AUTO_INCREMENT value to be one greater than the maximum value in the table: ALTER TABLE update_log AUTO_INCREMENT=146262544;
Then the same error happened again (java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '146255393' for key 'PRIMARY'), and I again tried to increase AUTO_INCREMENT by hand to a much higher value: ALTER TABLE update_log AUTO_INCREMENT=146300000; Then the error happened again (java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '146311957' for key 'PRIMARY').
Edit (to answer a question from @RaymondNijland): All the inserting clients get the same error at the same position. The insertions into the table are made using the Slick library, which compiles to:
insert into `update_log` (`org_id`,`inventory_id`,`item_url`,`item_update_count`)
select s37.`org_id`, s38.`inventory_id`, s38.`item_url`, s38.`item_update_count`
from `org` s37, `inventory_version` s38
where (s37.`org_code` = 'myorg') and ((s38.`org_id` = s37.`org_id`) and (s38.`inventory_id` = ?))
By the way, I also noticed in SHOW TABLE STATUS\G that Rows does not match COUNT(*):
mysql> SHOW TABLE STATUS\G
…
*************************** 6. row ***************************
Name: update_log
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 0
Avg_row_length: 0
Data_length: 1425408
Max_data_length: 0
Index_length: 0
Data_free: 158334976
Auto_increment: 146311958
Create_time: 2019-08-23 22:01:31
Update_time: 2019-08-23 22:01:31
Check_time: NULL
Collation: utf8mb4_general_ci
Checksum: NULL
Create_options: partitioned
Comment:
…
mysql> SELECT MAX(update_id), COUNT(*) FROM update_log\G
*************************** 1. row ***************************
MAX(update_id): 146312275
COUNT(*): 255167
What’s going on? Am I doomed to manually increment my auto_increment primary key forever?
This is my table:
CREATE TABLE `fa_nls_og` (
`Incr_Dollar_YAG_pct_Chg` double(50,4) DEFAULT NULL,
`Incr_U_YAG_pct_Chg` double(50,4) DEFAULT NULL,
`Incr_U_YAG_Chg` double(50,4) DEFAULT NULL,
`Incr_EQ_YAG_pct_Chg` double(50,4) DEFAULT NULL,
`Incr_EQ_YAG_Chg` double(50,4) DEFAULT NULL,
`Baseline_EQ_YAG_pct_Chg` double(50,4) DEFAULT NULL,
`Baseline_EQ_YAG_Chg` double(50,4) DEFAULT NULL,
`Baseline_Units_YAG_pct_Chg` double(50,4) DEFAULT NULL,
`Baseline_Units_YAG_Chg` double(50,4) DEFAULT NULL,
`Units_YAG_Period` double(50,4) DEFAULT NULL,
`Units_YAG_pct_Chg` double(50,4) DEFAULT NULL,
`Units_YAG_Chg` double(50,4) DEFAULT NULL,
`PERIOD_YEAR` int(50) DEFAULT NULL,
`CAT_NO` int(10) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (PERIOD_YEAR)
SUBPARTITION BY KEY (CAT_NO) SUBPARTITIONS 12
(PARTITION pytd VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p0 VALUES LESS THAN (201401) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (201402) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (201403) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (201404) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (201405) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (201406) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN (201407) ENGINE = InnoDB,
PARTITION p14 VALUES LESS THAN (201408) ENGINE = InnoDB,
PARTITION p16 VALUES LESS THAN (201409) ENGINE = InnoDB,
PARTITION p18 VALUES LESS THAN (201410) ENGINE = InnoDB,
PARTITION p20 VALUES LESS THAN (201411) ENGINE = InnoDB,
PARTITION p22 VALUES LESS THAN (201412) ENGINE = InnoDB,
PARTITION p24 VALUES LESS THAN (201501) ENGINE = InnoDB,
PARTITION p26 VALUES LESS THAN (201502) ENGINE = InnoDB,
PARTITION p28 VALUES LESS THAN (201503) ENGINE = InnoDB,
PARTITION p30 VALUES LESS THAN (201504) ENGINE = InnoDB,
PARTITION p32 VALUES LESS THAN (201505) ENGINE = InnoDB,
PARTITION p34 VALUES LESS THAN (201506) ENGINE = InnoDB,
PARTITION p36 VALUES LESS THAN (201507) ENGINE = InnoDB,
PARTITION p38 VALUES LESS THAN (201508) ENGINE = InnoDB,
PARTITION p40 VALUES LESS THAN (201509) ENGINE = InnoDB,
PARTITION p42 VALUES LESS THAN (201510) ENGINE = InnoDB,
PARTITION p44 VALUES LESS THAN (201511) ENGINE = InnoDB,
PARTITION p46 VALUES LESS THAN (201512) ENGINE = InnoDB,
PARTITION p48 VALUES LESS THAN (201601) ENGINE = InnoDB,
PARTITION p50 VALUES LESS THAN (201602) ENGINE = InnoDB,
PARTITION p52 VALUES LESS THAN (201603) ENGINE = InnoDB,
PARTITION p54 VALUES LESS THAN (201604) ENGINE = InnoDB,
PARTITION p56 VALUES LESS THAN (201605) ENGINE = InnoDB,
PARTITION p58 VALUES LESS THAN (201606) ENGINE = InnoDB,
PARTITION p60 VALUES LESS THAN (201607) ENGINE = InnoDB,
PARTITION p62 VALUES LESS THAN (201608) ENGINE = InnoDB,
PARTITION p64 VALUES LESS THAN (201609) ENGINE = InnoDB,
PARTITION p66 VALUES LESS THAN (201610) ENGINE = InnoDB,
PARTITION p68 VALUES LESS THAN (201611) ENGINE = InnoDB,
PARTITION p70 VALUES LESS THAN (201612) ENGINE = InnoDB,
PARTITION p72 VALUES LESS THAN (201701) ENGINE = InnoDB,
PARTITION p74 VALUES LESS THAN (201702) ENGINE = InnoDB,
PARTITION p76 VALUES LESS THAN (201703) ENGINE = InnoDB,
PARTITION p78 VALUES LESS THAN (201704) ENGINE = InnoDB,
PARTITION p80 VALUES LESS THAN (201705) ENGINE = InnoDB,
PARTITION p82 VALUES LESS THAN (201706) ENGINE = InnoDB,
PARTITION p84 VALUES LESS THAN (201707) ENGINE = InnoDB,
PARTITION p86 VALUES LESS THAN (201708) ENGINE = InnoDB,
PARTITION p88 VALUES LESS THAN (201709) ENGINE = InnoDB,
PARTITION p90 VALUES LESS THAN (201710) ENGINE = InnoDB,
PARTITION p92 VALUES LESS THAN (201711) ENGINE = InnoDB,
PARTITION p94 VALUES LESS THAN (201712) ENGINE = InnoDB,
PARTITION p96 VALUES LESS THAN (201801) ENGINE = InnoDB,
PARTITION p98 VALUES LESS THAN (201802) ENGINE = InnoDB,
PARTITION p100 VALUES LESS THAN (201803) ENGINE = InnoDB,
PARTITION p102 VALUES LESS THAN (201804) ENGINE = InnoDB,
PARTITION p104 VALUES LESS THAN (201805) ENGINE = InnoDB,
PARTITION p106 VALUES LESS THAN (201806) ENGINE = InnoDB,
PARTITION p108 VALUES LESS THAN (201807) ENGINE = InnoDB,
PARTITION p110 VALUES LESS THAN (201808) ENGINE = InnoDB,
PARTITION p112 VALUES LESS THAN (201809) ENGINE = InnoDB,
PARTITION p114 VALUES LESS THAN (201810) ENGINE = InnoDB,
PARTITION p116 VALUES LESS THAN (201811) ENGINE = InnoDB,
PARTITION p118 VALUES LESS THAN (201812) ENGINE = InnoDB,
PARTITION p144 VALUES LESS THAN (201901) ENGINE = InnoDB,
PARTITION p146 VALUES LESS THAN (201902) ENGINE = InnoDB,
PARTITION p148 VALUES LESS THAN (201903) ENGINE = InnoDB,
PARTITION p150 VALUES LESS THAN (201904) ENGINE = InnoDB,
PARTITION p152 VALUES LESS THAN (201905) ENGINE = InnoDB,
PARTITION p154 VALUES LESS THAN (201906) ENGINE = InnoDB,
PARTITION p156 VALUES LESS THAN (201907) ENGINE = InnoDB,
PARTITION p158 VALUES LESS THAN (201908) ENGINE = InnoDB,
PARTITION p160 VALUES LESS THAN (201909) ENGINE = InnoDB,
PARTITION p162 VALUES LESS THAN (201910) ENGINE = InnoDB,
PARTITION p164 VALUES LESS THAN (201911) ENGINE = InnoDB,
PARTITION p166 VALUES LESS THAN (201912) ENGINE = InnoDB,
PARTITION p120 VALUES LESS THAN (202001) ENGINE = InnoDB,
PARTITION p122 VALUES LESS THAN (202002) ENGINE = InnoDB,
PARTITION p124 VALUES LESS THAN (202003) ENGINE = InnoDB,
PARTITION p126 VALUES LESS THAN (202004) ENGINE = InnoDB,
PARTITION p128 VALUES LESS THAN (202005) ENGINE = InnoDB,
PARTITION p130 VALUES LESS THAN (202006) ENGINE = InnoDB,
PARTITION p132 VALUES LESS THAN (202007) ENGINE = InnoDB,
PARTITION p134 VALUES LESS THAN (202008) ENGINE = InnoDB,
PARTITION p136 VALUES LESS THAN (202009) ENGINE = InnoDB,
PARTITION p138 VALUES LESS THAN (202010) ENGINE = InnoDB,
PARTITION p140 VALUES LESS THAN (202011) ENGINE = InnoDB,
PARTITION p142 VALUES LESS THAN (202012) ENGINE = InnoDB
) */
I update one of the columns with a WHERE clause on PERIOD_YEAR and CAT_NO.
The first update takes 5 seconds, but running the update a second time takes 30 minutes. Can anyone help?
Too many partitions. After about 50, performance degrades. (You have 86 partitions × 12 subpartitions, which is over 1,000!)
Don't pre-build partitions; it slows down operations.
SUBPARTITIONs have no performance benefit.
Don't use DOUBLE(m,n); it leads to extra rounding. Either use plain DOUBLE (with about 16 significant digits) or DECIMAL(m,n) with reasonable values for m and n. DOUBLE (with or without (m,n)) takes 8 bytes; DECIMAL(50,4) takes 23 bytes!
int(50) -- The (50) means nothing. INT always takes 4 bytes. If the column really holds a year, use the YEAR datatype (only 1 byte); if it holds YYYYMM values such as 201401, as the partition bounds suggest, MEDIUMINT UNSIGNED (3 bytes) is enough.
Have a PRIMARY KEY.
If that is your main query, have INDEX(PERIOD_YEAR, CAT_NO).
After all of that, get rid of all partitioning -- it is not providing anything useful (based on what you have said so far). The INDEX will give you the speed you are missing. My other tips help in various other ways, some of them helping directly or indirectly (eg, small = faster) with speed.
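To put rough numbers on the size tips above, here is a small sketch. It assumes MySQL's documented packed DECIMAL layout (4 bytes per 9 digits, fewer bytes for leftover digits) and counts only raw column bytes, ignoring row and page overhead. The helper name decimal_bytes is made up for illustration.

```python
# Bytes used by 0..8 leftover digits in MySQL's packed DECIMAL format.
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]

def decimal_bytes(precision: int, scale: int) -> int:
    """Storage for DECIMAL(precision, scale); each side of the point is packed separately."""
    def side(digits: int) -> int:
        return (digits // 9) * 4 + LEFTOVER_BYTES[digits % 9]
    return side(precision - scale) + side(scale)

DOUBLE_BYTES = 8
value_columns = 12  # the twelve double(50,4) columns in fa_nls_og

print(decimal_bytes(50, 4))                   # 23
print(value_columns * DOUBLE_BYTES)           # 96 bytes per row as DOUBLE
print(value_columns * decimal_bytes(50, 4))   # 276 bytes per row as DECIMAL(50,4)

# Partition count taken from the CREATE TABLE in the question.
partitions, subpartitions = 86, 12
print(partitions * subpartitions)             # 1032
```

A narrower DECIMAL, say DECIMAL(14,4), would come out much smaller than DECIMAL(50,4), which is why "reasonable values for m and n" matters.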
I have a very large table on a MySQL 5.6.10 instance (roughly 480 million rows).
The storage engine is InnoDB. (Table and DB Default).
The table was partitioned by hash of merchantId (bigint: a kind of client identifier) which helped when queries related to a single merchant. Due to significant performance degradation when queries spanned multiple merchants, I decided to repartition the table by Range on ACTION_DATE (the DATE that an activity occurred). Thinking I was being clever, I decided to add a few (5) new fields for future use (unused_varchar1 varchar(200), etc.), since the table is so large, adding new fields essentially requires a rebuild anyway, so why not...
I created the new table structure as _new and dumped the existing table to a secondary server using mysqldump. I then used an awk script to adjust the name and a few other details to fit the new table (change tableName to tableName_new), and started the load.
The existing table was approximately 430 GB. The text file similarly was about 403 GB. I was surprised therefore that the new table ended up taking about 840 GB!! (Based on the Linux file size of the .ibd files)
So, I have 2 basic questions, which really amount to why and what now...
I imagine that the new table is larger because the dump file was in the order of the previous partitioning (merchantId) while the load was inserting into the new partitioning (activity date), creating a semi-random insertion order. The randomness led MySQL to leave plenty of space (roughly 50%) in the pages for future insertions. (I'm a little fuzzy on the terminology here, having spent much more time in my career with SQL Server DBs than MySQL DBs...) I'm not able to find any internal statistics in MySQL for free space per page. The INFORMATION_SCHEMA.TABLES DATA_FREE stat is an unconvincing 68MB.
If it helps these are the relevant stats from I_S.TABLES:
TABLE_TYPE: BASE TABLE
Engine: InnoDB
VERSION: 10
ROW_FORMAT: Compact
TABLE_ROWS: 488,094,271
AVG_ROW_LENGTH: 1,564
DATA_LENGTH: 763,509,358,592 (711 GB)
INDEX_LENGTH: 100,065,574,912 (93.19 GB)
DATA_FREE: 68,157,440 (0.06 GB)
I realize that that doesn't add up to 840 GB, but as I said, that was the size of the .ibd files which seems to be slightly different than the I_S.TABLES stats. Either way, it is significantly more than the text dump file.
I digress...
My question is whether the repartitioning explains the roughly doubled size. Or is there another explanation? I think the extra columns (2 Bigint, 2 Varchar(200), 1 Date) are not the culprit since they are all null. My napkin calculation was that the additional columns would add < 9 GB. Likewise, one additional index on UID should be a relatively small addition.
The follow up question is what can I do now if I want to try to compact the table. (Server now only has about 385 GB free...)
If I repeated the procedure, dump to file, reload, this time in the current partition order, would I end up with a table more like the size of my original table ~430 GB?
Following are relevant parts of DDL.
OLD TABLE:
CREATE TABLE table_name (
`AUTO_SEQ` bigint(20) NOT NULL,
`MERCHANT_ID` bigint(20) NOT NULL,
`AFFILIATE_ID` bigint(20) DEFAULT NULL,
`PROGRAM_ID` bigint(20) NOT NULL,
`ACTION_DATE` date DEFAULT NULL,
`UID` varchar(128) DEFAULT NULL,
... additional columns ...
PRIMARY KEY (`AUTO_SEQ`,`MERCHANT_ID`,`PROGRAM_ID`),
KEY `oc_rpt_mpad_idx` (`MERCHANT_ID`,`PROGRAM_ID`,`ACTION_DATE`,`AFFILIATE_ID`),
KEY `oc_rpt_mapd` (`MERCHANT_ID`,`ACTION_DATE`),
KEY `oc_rpt_apda_idx` (`AFFILIATE_ID`,`PROGRAM_ID`,`ACTION_DATE`,`MERCHANT_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (merchant_id)
PARTITIONS 16 */
NEW TABLE:
CREATE TABLE `tableName_new` (
`AUTO_SEQ` bigint(20) NOT NULL,
`MERCHANT_ID` bigint(20) NOT NULL,
`AFFILIATE_ID` bigint(20) DEFAULT NULL,
`PROGRAM_ID` bigint(20) NOT NULL,
`ACTION_DATE` date NOT NULL DEFAULT '0000-00-00',
`UID` varchar(128) DEFAULT NULL,
... additional columns...
# NEW COLUMNS (ALL NULL)
`UNUSED_BIGINT1` bigint(20) DEFAULT NULL,
`UNUSED_BIGINT2` bigint(20) DEFAULT NULL,
`UNUSED_VARCHAR1` varchar(200) DEFAULT NULL,
`UNUSED_VARCHAR2` varchar(200) DEFAULT NULL,
`UNUSED_DATE1` date DEFAULT NULL,
PRIMARY KEY (`AUTO_SEQ`,`ACTION_DATE`),
KEY `oc_rpt_mpad_idx` (`MERCHANT_ID`,`PROGRAM_ID`,`ACTION_DATE`,`AFFILIATE_ID`),
KEY `oc_rpt_mapd` (`ACTION_DATE`),
KEY `oc_rpt_apda_idx` (`AFFILIATE_ID`,`PROGRAM_ID`,`ACTION_DATE`,`MERCHANT_ID`),
KEY `oc_uid` (`UID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50500 PARTITION BY RANGE COLUMNS(ACTION_DATE)
(PARTITION p01 VALUES LESS THAN ('2012-01-01') ENGINE = InnoDB,
PARTITION p02 VALUES LESS THAN ('2012-04-01') ENGINE = InnoDB,
PARTITION p03 VALUES LESS THAN ('2012-07-01') ENGINE = InnoDB,
PARTITION p04 VALUES LESS THAN ('2012-10-01') ENGINE = InnoDB,
PARTITION p05 VALUES LESS THAN ('2013-01-01') ENGINE = InnoDB,
PARTITION p06 VALUES LESS THAN ('2013-04-01') ENGINE = InnoDB,
PARTITION p07 VALUES LESS THAN ('2013-07-01') ENGINE = InnoDB,
PARTITION p08 VALUES LESS THAN ('2013-10-01') ENGINE = InnoDB,
PARTITION p09 VALUES LESS THAN ('2014-01-01') ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN ('2014-04-01') ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN ('2014-07-01') ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN ('2014-10-01') ENGINE = InnoDB,
PARTITION p13 VALUES LESS THAN ('2015-01-01') ENGINE = InnoDB,
PARTITION p14 VALUES LESS THAN ('2015-04-01') ENGINE = InnoDB,
PARTITION p15 VALUES LESS THAN ('2015-07-01') ENGINE = InnoDB,
PARTITION p16 VALUES LESS THAN ('2015-10-01') ENGINE = InnoDB,
PARTITION p17 VALUES LESS THAN ('2016-01-01') ENGINE = InnoDB,
PARTITION p18 VALUES LESS THAN ('2016-04-01') ENGINE = InnoDB,
PARTITION p19 VALUES LESS THAN ('2016-07-01') ENGINE = InnoDB,
PARTITION p20 VALUES LESS THAN ('2016-10-01') ENGINE = InnoDB,
PARTITION p21 VALUES LESS THAN ('2017-01-01') ENGINE = InnoDB,
PARTITION p22 VALUES LESS THAN ('2017-04-01') ENGINE = InnoDB,
PARTITION p23 VALUES LESS THAN ('2017-07-01') ENGINE = InnoDB,
PARTITION p24 VALUES LESS THAN ('2017-10-01') ENGINE = InnoDB,
PARTITION p25 VALUES LESS THAN ('2018-01-01') ENGINE = InnoDB,
PARTITION p26 VALUES LESS THAN ('2018-04-01') ENGINE = InnoDB,
PARTITION p27 VALUES LESS THAN ('2018-07-01') ENGINE = InnoDB,
PARTITION p28 VALUES LESS THAN ('2018-10-01') ENGINE = InnoDB,
PARTITION p29 VALUES LESS THAN ('2019-01-01') ENGINE = InnoDB,
PARTITION p30 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */
adding new fields essentially requires a rebuild anyway, so why not
I predict you will regret it.
The existing table was approximately 430 GB.
According to size of .ibd? Or SHOW TABLE STATUS? Or the dump size, which would be bogus (see below).
it is significantly more than the text dump file
The lengths in TABLE STATUS include several flavors of overhead (BTree, free space, extra extents, etc), plus the indexes (which are not in the dump file).
Also, think about a BIGINT that contains 1234. The .ibd will use 8 bytes plus some overhead; the dump will have 5 ('1234', plus a comma). That leads to my next point...
Are there really more than 4 billion merchants? merchant_id is BIGINT (8 bytes); INT UNSIGNED is only 4 bytes and allows 0..4 billion.
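A quick sketch of the dump-versus-disk point, and of the INT UNSIGNED range. It only counts digits plus one separating comma for the dump, and the fixed column width for the table; real rows carry extra overhead on both sides. The helper name dump_chars is hypothetical.

```python
def dump_chars(value: int) -> int:
    """Characters a value occupies in a text dump: its digits plus one comma."""
    return len(str(value)) + 1

BIGINT_BYTES = 8        # fixed on-disk width of a BIGINT column
INT_BYTES = 4           # fixed on-disk width of an INT column
INT_UNSIGNED_MAX = 2**32 - 1

print(dump_chars(1234), BIGINT_BYTES)   # 5 8
print(dump_chars(9223372036854775807))  # 20, only the largest values exceed 8
print(INT_UNSIGNED_MAX)                 # 4294967295, the "0..4 billion" range
```

So for small ids the text dump understates the on-disk size, and a 4-byte INT UNSIGNED halves the column wherever the ids fit.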
What's in uid? If it is some sort of UUID, it seems awfully long.
Do you happen to have the "stats from I_S.TABLES" from the old table?
So far, I have not addressed "whether the repartitioning explains the roughly doubled size".
extra columns (2 Bigint, 2 Varchar(200), 1 Date)
That's about 29 bytes per row (15GB of Data_length), perhaps less since they are NULL.
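The arithmetic behind that estimate, as a sketch. It takes the 29 bytes per row quoted above at face value and uses TABLE_ROWS and DATA_LENGTH from the I_S.TABLES output in the question (TABLE_ROWS is only an estimate for InnoDB).

```python
rows = 488_094_271              # TABLE_ROWS from I_S.TABLES
data_length = 763_509_358_592   # DATA_LENGTH from I_S.TABLES, in bytes
extra_bytes_per_row = 29        # upper bound quoted above; less when the columns are NULL

extra = rows * extra_bytes_per_row
share = 100 * extra / data_length

print(extra)                    # 14154733859 bytes
print(round(extra / 1e9, 1))    # 14.2 (decimal GB)
print(round(share, 1))          # 1.9 (percent of DATA_LENGTH)
```

Even at the full 29 bytes, the new columns account for about 2% of the data size, so they cannot explain a doubling.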
You seem to be using the default ROW_FORMAT. I suspect this did not change in the conversion.
It is usually unwise to start an index with the "partition key" (merchant_id or action_date). This is because you are already "pruning" on that key; you are better off starting the index with something else. (Caveat: There are exceptions.)
Check the CHARACTER SET and datatype of the "additional columns". If something changed, that could be significant.
would I end up with a table more like the size of my original table ~430 GB?
Alas, until we figure out why it grew, I can't answer that question.
I'm more interested in whether random insertion vs. the partition (ACTION_DATE) would lead to wasted space / half empty pages.
I recommend you try the following experiment. Do not use optimize partition; see http://bugs.mysql.com/bug.php?id=42822 . Instead do this to defragment one partition (such as p02):
ALTER TABLE table_name REBUILD PARTITION p02;
You could do this SELECT before and after in order to see the change(s) to the PARTITIONs:
SELECT *
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'dbname' -- change as needed
AND TABLE_NAME = 'table_name' -- change as needed
ORDER BY PARTITION_ORDINAL_POSITION,
SUBPARTITION_ORDINAL_POSITION;
It's a generic query to get the table-status-like info for the partitions of one table.
If the REBUILD cuts the partition by about 50%, then we have the answer.
Generally, randomly inserting into a BTree should leave you with about 69% (not 50%) of the "full" size. Hence, I'm not 'expecting' this to be the solution/answer.
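To see why the 69% figure does not account for the growth, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the old 430 GB table had completely full pages, and it uses the sizes reported in the question.

```python
old_size_gb = 430           # original table, assumed fully packed (an assumption)
new_size_gb = 840           # size of the new .ibd files per the question
random_insert_fill = 0.69   # typical page fill after random BTree inserts

observed_growth = new_size_gb / old_size_gb
expected_growth = 1 / random_insert_fill
expected_size_gb = old_size_gb * expected_growth

print(round(observed_growth, 2))   # 1.95
print(round(expected_growth, 2))   # 1.45
print(round(expected_size_gb))     # 623
```

Random insertion alone would predict something near 620 GB, well short of the 840 GB observed, which is why the REBUILD experiment on one partition is worth running.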
I have been testing for more than 10 hours to get a table structure with a primary key (id)
and partitioning by a bigint column, but nothing works :/
Is that possible?
Maybe somebody could give me a good hint ;)
CREATE TABLE IF NOT EXISTS `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uniqueID` bigint(20) DEFAULT NULL,
`value` int(11) DEFAULT NULL,
`m1` text CHARACTER SET utf8,
`m2` text CHARACTER SET utf8,
`m3` text CHARACTER SET utf8,
`m4` text CHARACTER SET utf8,
`m5` text CHARACTER SET utf8,
PRIMARY KEY (`id`),
UNIQUE KEY `uniqueID` (`uniqueID`),
KEY `value` (`value`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci PACK_KEYS=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
/*!50500 PARTITION BY RANGE COLUMNS(uniqueID)
(PARTITION p1 VALUES LESS THAN ('0') ENGINE = MyISAM,
PARTITION p2 VALUES LESS THAN ('1') ENGINE = MyISAM,
PARTITION p3 VALUES LESS THAN ('2') ENGINE = MyISAM,
PARTITION p4 VALUES LESS THAN ('3') ENGINE = MyISAM,
PARTITION p5 VALUES LESS THAN ('4') ENGINE = MyISAM,
PARTITION p6 VALUES LESS THAN ('5') ENGINE = MyISAM,
PARTITION p7 VALUES LESS THAN ('6') ENGINE = MyISAM,
PARTITION p8 VALUES LESS THAN ('9') ENGINE = MyISAM,
PARTITION p9 VALUES LESS THAN (MAXVALUE) ENGINE = MyISAM) */;
With this partitioning, I want to split the bigint values by their first digit. For example:
16275214652090176103 would be a part of partition p2
This table will hold 100M records :/
Thanks in advance
The column you are partitioning on must be part of every unique key on the table, including the primary key; it does not have to be the last column in the key. So, if you define your primary key as PRIMARY KEY (id, uniqueID), you should be able to partition on uniqueID.
That being said, given that your uniqueID field is a bigint and you are trying to partition based on the bigint being less than a string, I'm not sure what you are trying to do will work as desired. Perhaps hash partitioning rather than range partitioning would be more useful to you? For more details see https://dev.mysql.com/doc/refman/5.1/en/partitioning-hash.html
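A small sketch of why range bounds 0..9 will not split a bigint by its first digit. It assumes the bounds are compared as numbers (the quoted string bounds may simply be rejected for an integer column) and models HASH partitioning as a plain modulo. The helper names are made up for illustration.

```python
from bisect import bisect_right

SIGNED_BIGINT_MAX = 2**63 - 1
UNSIGNED_BIGINT_MAX = 2**64 - 1
value = 16275214652090176103  # example value from the question

print(value <= SIGNED_BIGINT_MAX)    # False: too big for bigint(20) as declared
print(value <= UNSIGNED_BIGINT_MAX)  # True: it needs BIGINT UNSIGNED

# Numeric bounds from the question; p9 is the MAXVALUE partition.
bounds = [0, 1, 2, 3, 4, 5, 6, 9]
names = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9"]

def range_partition(v: int) -> str:
    """First partition whose VALUES LESS THAN bound exceeds v."""
    return names[bisect_right(bounds, v)]

def leading_digit(v: int) -> int:
    return int(str(v)[0])

print(leading_digit(value))    # 1
print(range_partition(value))  # p9: every value >= 9 lands in the last partition

# Hash-style alternative: spread rows by remainder instead of by range.
print(value % 10)              # 3
```

With numeric comparison nearly all 100M rows would pile into p9, whereas a modulo spread distributes them across however many partitions you pick.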
I have created a table in MySQL using range partitioning and have inserted millions of rows.
CREATE TABLE `PART_SAMPLE` (
`TRANSACTION_ID` bigint(25) NOT NULL AUTO_INCREMENT,
`TASK_ID` int(11) DEFAULT NULL,
`STATUS_CODE` int(10) DEFAULT NULL,
`FIELD10` int(5) DEFAULT NULL,
KEY `TXN_ID` (`TRANSACTION_ID`),
KEY `TASK_IDX` (`TASK_ID`),
KEY `id_idx_task_status` (`TASK_ID`,`STATUS_CODE`),
KEY `IDX_STATUS` (`STATUS_CODE`),
KEY `Fld_idx` (`FIELD10`)
) ENGINE=MyISAM AUTO_INCREMENT=12249932 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (FIELD10)
(PARTITION p0 VALUES LESS THAN (0) ENGINE = MyISAM,
PARTITION p1 VALUES LESS THAN (1) ENGINE = MyISAM,
PARTITION p2 VALUES LESS THAN (2) ENGINE = MyISAM,
........
PARTITION p9 VALUES LESS THAN (9) ENGINE = MyISAM,
PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */
Each FIELD10 value (0-10) has about 3 million rows.
But when I execute a select query like this:
select TASK_ID,STATUS_CODE,count(*) from PART_SAMPLE where FIELD10=X group by TASK_ID,STATUS_CODE;
X can be any value covered by the partitions.
For X values 0, 2, 5 and 8 it takes only 10 seconds to retrieve the result, but for the rest it takes about 50 seconds. As I understand it, since the amount of data is the same for every FIELD10 value, the query should take almost the same time for any of them. Why is there this time difference?