MySQL partition contains more records than expected

I have partitioned a MySQL table containing 53 rows. But when I query the number of records across all partitions, the count is almost three times what I expect. Even phpMyAdmin thinks there are 156 records.
Have I done something wrong in my table design or partitioning?
The picture below shows the count of records per partition:
phpMyAdmin:
Finally, this is my table:
CREATE TABLE cl_inbox (
id int(11) NOT NULL AUTO_INCREMENT,
user int(11) NOT NULL,
contact int(11) DEFAULT NULL,
sdate timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
body text NOT NULL,
userstatus tinyint(4) NOT NULL DEFAULT 1 COMMENT '0: new, 1:read, 2: deleted',
contactstatus tinyint(4) NOT NULL DEFAULT 0,
class tinyint(4) NOT NULL DEFAULT 0,
attachtype tinyint(4) NOT NULL DEFAULT 0,
attachsrc varchar(255) DEFAULT NULL,
PRIMARY KEY (id, user),
INDEX i_class (class),
INDEX i_contact_user (contact, user),
INDEX i_contactstatus (contactstatus),
INDEX i_user_contact (user, contact),
INDEX i_userstatus (userstatus)
)
ENGINE = INNODB
AUTO_INCREMENT = 69
AVG_ROW_LENGTH = 19972
CHARACTER SET utf8
COLLATE utf8_general_ci
ROW_FORMAT = DYNAMIC
PARTITION BY KEY (`user`)
(
PARTITION partition1 ENGINE = INNODB,
PARTITION partition2 ENGINE = INNODB,
PARTITION partition3 ENGINE = INNODB,
.....
PARTITION partition128 ENGINE = INNODB
);

Those numbers are approximations, just as with SHOW TABLE STATUS and EXPLAIN. InnoDB estimates row counts from sampled index pages rather than counting rows, so on a small table the estimate can easily be off by a factor of two or three.
Meanwhile, you will probably find that PARTITION BY KEY provides no performance improvement. If you find otherwise, I would be very interested to hear about it.
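To see the discrepancy directly, compare the sampled statistics with an exact count (a sketch using the table from the question; explicit PARTITION selection in a SELECT requires MySQL 5.6 or later):

```sql
-- Approximate, sampled per-partition row counts (can be off by 2-3x):
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'cl_inbox';

-- Exact total, which scans the table:
SELECT COUNT(*) FROM cl_inbox;

-- Exact count for a single partition (MySQL 5.6+):
SELECT COUNT(*) FROM cl_inbox PARTITION (partition1);
```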

Related

Querying record in a single partition very slow

I have a large partitioned table (over 2 billion records); each partition contains roughly 500 million records. I recently moved from physical hardware to AWS, using mysqldump to back up and restore the MySQL data. I have also recently created a new partition (p108). Queries against data in the old partitions (created on the old server) run as normal, returning data in seconds. However, querying records in the newly created partition (p108) is very slow, taking minutes.
SHOW CREATE TABLE results:
CREATE TABLE `termusage`
(
`id` BIGINT(20) NOT NULL auto_increment,
`terminal` BIGINT(20) DEFAULT NULL,
`date` DATETIME DEFAULT NULL,
`dest` VARCHAR(255) DEFAULT NULL,
`feattrans` BIGINT(20) DEFAULT NULL,
`cost_type` TINYINT(4) DEFAULT NULL,
`cost` DECIMAL(16, 6) DEFAULT NULL,
`gprsup` BIGINT(20) DEFAULT NULL,
`gprsdown` BIGINT(20) DEFAULT NULL,
`duration` TIME DEFAULT NULL,
`file` BIGINT(20) DEFAULT NULL,
`custcost` DECIMAL(16, 6) DEFAULT '0.000000',
`invoice` BIGINT(20) NOT NULL DEFAULT '99999999',
`carriertrans` BIGINT(20) DEFAULT NULL,
`session_start` DATETIME DEFAULT NULL,
`session_end` DATETIME DEFAULT NULL,
`mt_mo` VARCHAR(4) DEFAULT NULL,
`grps_rounded` BIGINT(20) DEFAULT NULL,
`gprs_rounded` BIGINT(20) DEFAULT NULL,
`country` VARCHAR(25) DEFAULT NULL,
`network` VARCHAR(25) DEFAULT NULL,
`ctn` VARCHAR(20) DEFAULT NULL,
`pricetrans` BIGINT(20) DEFAULT NULL,
PRIMARY KEY (`id`, `invoice`),
KEY `idx_terminal` (`invoice`, `terminal`),
KEY `idx_feattrans` (`invoice`, `feattrans`),
KEY `idx_file` (`invoice`, `file`),
KEY `termusage_carriertrans_idx` (`carriertrans`),
KEY `idx_ctn` (`invoice`, `ctn`),
KEY `idx_pricetrans` (`invoice`, `pricetrans`)
)
engine=innodb
auto_increment=17449438880
DEFAULT charset=latin1
/*!50500 PARTITION BY RANGE COLUMNS(invoice)
(PARTITION p103 VALUES LESS THAN (621574) ENGINE = InnoDB,
PARTITION p104 VALUES LESS THAN (628214) ENGINE = InnoDB,
PARTITION p106 VALUES LESS THAN (634897) ENGINE = InnoDB,
PARTITION p107 VALUES LESS THAN (649249) ENGINE = InnoDB,
PARTITION p108 VALUES LESS THAN (662763) ENGINE = InnoDB,
PARTITION plast VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */
I created the partition p108 using the following query:
ALTER TABLE termusage REORGANIZE PARTITION plast
INTO ( PARTITION p108 VALUES LESS THAN (662763),
       PARTITION plast VALUES LESS THAN MAXVALUE );
I can see the file termusage#p#p108.ibd; it looks normal, and the data is there, as I can get results from the query.
information_schema.PARTITIONS shows the following for the table, which indicates there is some kind of issue:
Name Pos Rows Avg Data Length Method
p103 1 412249206 124 51124371456 RANGE COLUMNS
p104 2 453164890 133 60594061312 RANGE COLUMNS
p106 3 542767414 135 73562849280 RANGE COLUMNS
p107 4 587042147 129 76288098304 RANGE COLUMNS
p108 5 0 0 16384 RANGE COLUMNS
plast 6 0 0 16384 RANGE COLUMNS
How can I fix the partition?
Update
Explain for good query
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, t, p107, ref, idx_terminal,idx_feattrans,idx_file,idx_ctn,idx_pricetrans, idx_terminal, 17, const,const, 603, 100.00, Using index condition; Using temporary; Using filesort
Explain for poor query
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, t, p108, ALL, idx_terminal,idx_feattrans,idx_file,idx_ctn,idx_pricetrans, , , , 1, 100.00, Using where; Using temporary; Using filesort
For future readers: the issue was resolved by running ALTER TABLE termusage ANALYZE PARTITION p108.
The table and index statistics that guide the optimizer's choice of how to read the table were out of date. It is common to run ANALYZE after a significant data load or delete to make sure these statistics are updated.
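The fix and a follow-up check can be sketched like this (the ANALYZE statement is the one described above; the verification query is an assumption about what you would check afterwards):

```sql
-- Rebuild the stale index statistics for the new partition:
ALTER TABLE termusage ANALYZE PARTITION p108;

-- Verify the optimizer now sees rows in p108 (TABLE_ROWS should be non-zero):
SELECT PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH
FROM information_schema.PARTITIONS
WHERE TABLE_NAME = 'termusage'
  AND PARTITION_NAME = 'p108';
```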

Design Database to store lists

I apologize for the ambiguity of the column and table names.
My database has two tables, A and B, with a many-to-many relationship between them.
Table A has around 200 records.
Table A structure:
Id   Definition
12   Def1
42   Def2
... etc.
Table B has around 5 billion records:
Column 1   Associated Id (from table A)
abc        12
abc        21
pqr        42
I am trying to optimize the way data is stored in table B, as it holds a lot of redundant data. The structure I am thinking of is as follows:
Column 1   Associated Ids
abc        12, 21
pqr        42
The "Associated Ids" column can be updated when new rows are added to table A.
Is this a good structure for this scenario? If yes, what should the column type be for "Associated Ids"? I am using a MySQL database.
Create table statements.
CREATE TABLE `A` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(100) DEFAULT NULL,
`name` varchar(100) DEFAULT NULL,
`creat_usr_id` varchar(20) NOT NULL,
`creat_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`modfd_usr_id` varchar(20) DEFAULT NULL,
`modfd_ts` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `A_ak1` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=277 DEFAULT CHARSET=utf8;
CREATE TABLE `B`(
`col1` varchar(128) NOT NULL,
`id` int(11) NOT NULL,
`added_dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`creat_usr_id` varchar(20) NOT NULL,
`creat_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`col1`,`id`,`added_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(added_dt))
(PARTITION Lessthan_2016 VALUES LESS THAN (1451606400) ENGINE = InnoDB,
PARTITION Lessthan_201603 VALUES LESS THAN (1456790400) ENGINE = InnoDB,
PARTITION Lessthan_201605 VALUES LESS THAN (1462060800) ENGINE = InnoDB,
PARTITION Lessthan_201607 VALUES LESS THAN (1467331200) ENGINE = InnoDB,
PARTITION Lessthan_201609 VALUES LESS THAN (1472688000) ENGINE = InnoDB,
PARTITION Lessthan_201611 VALUES LESS THAN (1477958400) ENGINE = InnoDB,
PARTITION Lessthan_201701 VALUES LESS THAN (1483228800) ENGINE = InnoDB,
PARTITION pfuture VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
Indexes:
Table  Non_unique  Key_name  Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Index_type
B      0           PRIMARY   1             col1         A          2            NULL      NULL    BTREE
B      0           PRIMARY   2             id           A          6            NULL      NULL    BTREE
B      0           PRIMARY   3             added_dt     A          6            NULL      NULL    BTREE
5 billion rows here. Let me walk through things:
col1 varchar(128) NOT NULL,
How often is this column repeated? That is, is it worth it to 'normalize' it?
id int(11) NOT NULL,
Cut the size of this column in half (4 bytes -> 2), since you have only 200 distinct ids:
a_id SMALLINT UNSIGNED NOT NULL
Range of values: 0..65535
added_dt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
Please explain why this is part of the PK. That is a rather odd thing to do.
creat_usr_id varchar(20) NOT NULL,
creat_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
Toss these as clutter, unless you can justify keeping track of 5 billion actions this way.
PRIMARY KEY (col1,id,added_dt)
I'll bet you will eventually get two rows in the same second. A PK is 'unique'. Perhaps you need only (col1, a_id)? Otherwise, you are allowing a col1-a_id pair to be added multiple times. Or maybe you want IODKU (INSERT ... ON DUPLICATE KEY UPDATE) to add a new row versus update the timestamp?
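The IODKU pattern alluded to above would look roughly like this (a hedged sketch, assuming the slimmed-down schema proposed in this answer, with a unique key on (col1, a_id) and a last-seen timestamp; not the poster's actual schema):

```sql
-- If the (col1, a_id) pair already exists, refresh its timestamp;
-- otherwise insert a new row. Requires a UNIQUE/PRIMARY key on (col1, a_id).
INSERT INTO B (col1, a_id, added_dt)
VALUES ('abc', 12, NOW())
ON DUPLICATE KEY UPDATE added_dt = NOW();
```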
PARTITION...
This is useful if (and probably only if) you intend to remove 'old' rows. Otherwise, please explain why you picked partitioning.
It is hard to review a schema without seeing the main SELECTs. In the case of large tables, we should also review the INSERTs, UPDATEs, and DELETEs, since each of them could pose serious performance problems.
At 100 rows inserted per second, it would take more than a year and a half to add 5B rows (5×10⁹ rows ÷ 100 rows/s = 5×10⁷ seconds ≈ 580 days). How fast will the rows be coming in? Ingest rate may be a significant performance issue, too.

MySQL update query locks table for insertion

I am in a problematic situation. I have found dozens of questions on the same topic, but maybe I am simply unable to map those solutions onto my issue.
I have a system built in CodeIgniter, and it does the following:
codeigniter->start_transaction()
UPDATE T SET A = 1, MODIFIED = NOW()
WHERE PK IN
( SELECT PK FROM
(SELECT PK, LAST_INSERT_ID(PK) FROM T
where FK = 31 AND A=0 AND R=1 AND R_FK = 21
AND DEAD = 0 LIMIT 0,1) AS TBL1
) and A=0 AND R = 1 AND R_FK = 21 AND DEAD = 0
-- What this query does: it dynamically picks one row that is not yet dead,
-- not yet assigned, and linked to id 21 (R_FK) from the R table;
-- when it finds the row, it marks it as assigned (A = 1).
-- PK = LAST_INSERT_ID(PK) ensures LAST_INSERT_ID() is set to this row's id, so I can retrieve it from PHP.
GOTO MODULE B
MODULE B {
INSERT INTO T(A,B,C,D,E,F,R,FK,R_FK,DEAD,MODIFIED) VALUES(ALL VALUES)
-- this line gives me lock wait timeout exceeded.
}
The MySQL version is 5.1.63-community-log.
Table T is an InnoDB table and has only one ordinary index, on the FK field; there are no foreign key constraints. The primary key (PK) field is an auto_increment field.
I get a lock wait timeout in the above case, due to the first transactional UPDATE holding locks on the table. How can I avoid locking the table with that UPDATE query while using transactions? I cannot commit the transaction until I receive a response from MODULE B.
I don't have much detailed knowledge about databases and their internals, so please bear with me if I said something that doesn't make sense.
--UPDATE--
-- TABLE T Structure
CREATE TABLE `T` (
`PK` int(11) NOT NULL AUTO_INCREMENT,
`FK` int(11) DEFAULT NULL,
`P` varchar(1024) DEFAULT NULL,
`DEAD` tinyint(1) NOT NULL DEFAULT '0',
`A` tinyint(1) NOT NULL DEFAULT '0',
`MODIFIED` datetime DEFAULT NULL,
`R` tinyint(4) NOT NULL DEFAULT '0',
`R_FK` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`PK`),
KEY `FK_REFERENCE_54` (`FK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- Indexes information
SHOW INDEX FROM T;
1. Field FK, Cardinality 65, NULL => Yes, Index_Type => BTREE
2. Field PK, Cardinality 11153, Index_Type => BTREE
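Not from the original thread, but one common mitigation can be sketched here: under REPEATABLE READ, InnoDB locks every row it scans via the chosen index before applying the remaining predicates, and with only the FK index available, the UPDATE above locks far more rows than the one it changes. A composite index covering the WHERE clause narrows both the scan and the locks (the index name and column order are assumptions, chosen to match the equality predicates in the query):

```sql
-- Hypothetical index matching the UPDATE's WHERE clause
-- (FK = 31 AND A = 0 AND R = 1 AND R_FK = 21 AND DEAD = 0),
-- so InnoDB can lock only the matching row(s):
ALTER TABLE T ADD INDEX idx_claim (FK, R_FK, A, R, DEAD);
```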

Creating temporary table from TokuDB query too slow

I have this table in one server:
CREATE TABLE `mh` (
`M` char(13) NOT NULL DEFAULT '',
`F` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`D` char(6) DEFAULT NULL,
`A` int(11) DEFAULT NULL,
`DC` char(13) DEFAULT NULL,
`S` char(22) DEFAULT NULL,
`S0` int(11) DEFAULT NULL,
PRIMARY KEY (`F`,`M`),
KEY `IDX_S` (`S`),
KEY `IDX_M` (`M`),
KEY `IDX_A` (`M`,`A`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1;
And the same table but using MyISAM engine in another similar server.
When I execute this query:
CREATE TEMPORARY TABLE temp
(S VARCHAR(22) PRIMARY KEY)
AS
(
SELECT S, COUNT(S) AS HowManyS
FROM mh
WHERE A = 1 AND S IS NOT NULL
GROUP BY S
);
The table has 120 million rows. The server using TokuDB executes the query in 3 hours; the server using MyISAM, in 22 minutes.
While running, the TokuDB query shows a "Queried about 38230000 rows, Fetched about 303929 rows, loading data still remains" status.
Why does the TokuDB query take so long? TokuDB is a really good engine, but I don't know what I'm doing wrong with this query.
Both servers run MariaDB 5.5.38.
TokuDB is not currently using its bulk-fetch algorithm on this statement, as noted in https://github.com/Tokutek/tokudb-engine/issues/143. I've added a link to this page there so it is considered as part of the upcoming effort.

Unable to optimise MySQL query further: what am I missing?

I've got a query that seems impossible to optimise further (with regard to execution time). It's a plain, simple query, indexes are in place, and I've tried tuning InnoDB settings, but nothing really seems to help.
Tables
The query is a JOIN between three tables: trk, auf and paf.
trk: temporary table holding ids representing tracks.
auf: table representing audio files associated with the tracks.
paf: table holding the ids of published audio files; acts as a "filter".
// 'trk' table
CREATE TEMPORARY TABLE auf_713340 (
`id` char(36),
PRIMARY KEY (id)
) ENGINE=MEMORY;
// 'auf' table
CREATE TABLE `file` (
`id` char(36) NOT NULL,
`track_id` char(36) NOT NULL,
`type` varchar(3) DEFAULT NULL,
`quality` int(1) DEFAULT '0',
`size` int(20) DEFAULT '0',
`duration` float DEFAULT '0',
`bitrate` int(6) DEFAULT '0',
`samplerate` int(5) DEFAULT '0',
`tagwritten` datetime DEFAULT NULL,
`tagwriteattempts` int(3) NOT NULL DEFAULT '0',
`audiodataread` datetime DEFAULT NULL,
`audiodatareadattempts` int(3) NOT NULL DEFAULT '0',
`converted` datetime DEFAULT NULL,
`convertattempts` int(3) NOT NULL DEFAULT '0',
`waveformgenerated` datetime DEFAULT NULL,
`waveformgenerationattempts` int(3) NOT NULL DEFAULT '0',
`flag` int(1) NOT NULL DEFAULT '0',
`status` int(1) NOT NULL DEFAULT '0',
`updated` datetime NOT NULL DEFAULT '2000-01-01 00:00:00',
PRIMARY KEY (`id`),
KEY `FK_file_track` (`track_id`),
KEY `file_size` (`size`),
KEY `file_type` (`type`),
KEY `file_quality` (`quality`),
CONSTRAINT `file_ibfk_1` FOREIGN KEY (`track_id`) REFERENCES `track` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
// 'paf' table
CREATE TABLE `publishedfile` (
`file_id` varchar(36) NOT NULL,
`data` varchar(255) DEFAULT NULL,
`file_updated` datetime NOT NULL,
PRIMARY KEY (`file_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The query usually takes between 1500 ms and 2500 ms to execute, with somewhere between 50 and 100 ids in the trk table. The auf table holds about 1.1 million rows, and the paf table about 900,000 rows.
The MySQL server runs on a 4GB Rackspace Cloud Server instance.
The Query
SELECT auf.*
FROM auf_713340 trk
INNER JOIN file auf
ON auf.track_id = trk.id
INNER JOIN publishedfile paf
ON auf.id = paf.file_id
The Query w/EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE trk ALL NULL NULL NULL NULL 60
1 SIMPLE auf ref PRIMARY,FK_file_track FK_file_track 108 func 1 Using where
1 SIMPLE paf eq_ref PRIMARY PRIMARY 110 trackerdatabase_development.auf.id 1 Using where; Using index
The InnoDB configuration
[mysqld]
# The size of memory used to cache table data and indexes. The larger
# this value is, the less I/O is needed to access data in tables.
# Default value is 8MB. Recommendations point towards 70% - 80% of
# available system memory.
innodb_buffer_pool_size=2850M
# Recommendations point towards using O_DIRECT to avoid double buffering.
# innodb_flush_method=O_DIRECT
# Recommendations point towards using 256M.
# #see http://www.mysqlperformanceblog.com/2006/07/03/choosing-proper-innodb_log_file_size/
innodb_log_file_size=256M
# The size in bytes of the buffer that InnoDB uses to write to the log files
# on disk. Recommendations point towards using 4MB.
innodb_log_buffer_size=4M
# The size of the buffer used for MyISAM index blocks.
key_buffer_size=128M
Now, the question is; what can I do to get the query to perform better? After all, the tables in question are not that big and indexes are in place..?
In the auf table, make the id field int(11) and make it auto-increment. For all int fields with a display width greater than 11, change them to 11.
Thanks,
Ripa Saha
Try this:
SELECT auf.*
FROM file auf
WHERE EXISTS
( SELECT *
FROM auf_713340 trk
WHERE auf.track_id = trk.id
)
AND EXISTS
( SELECT *
FROM publishedfile paf
WHERE auf.id = paf.file_id
) ;
I would also test and compare efficiency with the temporary table defined with the InnoDB engine, or with the (primary) index created as a BTREE index. MEMORY tables use HASH indexes by default, not BTREE, if I remember correctly.
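The BTREE variant suggested above can be sketched like this (a sketch of the same temporary table from the question; `USING BTREE` is supported on MEMORY tables, which otherwise default to HASH indexes):

```sql
-- Same temp table as in the question, but with an explicit BTREE primary key
-- instead of the MEMORY engine's default HASH index:
CREATE TEMPORARY TABLE auf_713340 (
  `id` CHAR(36),
  PRIMARY KEY (`id`) USING BTREE
) ENGINE=MEMORY;
```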