I want to partition a MySQL table by a datetime column, one partition per day. The create table script is like this:
CREATE TABLE raw_log_2011_4 (
id bigint(20) NOT NULL AUTO_INCREMENT,
logid char(16) NOT NULL,
tid char(16) NOT NULL,
reporterip char(46) DEFAULT NULL,
ftime datetime DEFAULT NULL,
KEY id (id)
) ENGINE=InnoDB AUTO_INCREMENT=286802795 DEFAULT CHARSET=utf8
PARTITION BY hash (day(ftime)) partitions 31;
But when I select the data for a given day, it cannot locate the partition. The select statement is like this:
explain partitions select * from raw_log_2011_4 where day(ftime) = 30;
When I use another statement, it does locate a partition, but then I cannot select the data for a given day:
explain partitions select * from raw_log_2011_4 where ftime = '2011-03-30';
Can anyone tell me how I can select the data for a given day and still make use of the partitions? Thanks!
Partitioning by HASH is a very bad idea with datetime columns, because it cannot use partition pruning. From the MySQL docs:
Pruning can be used only on integer columns of tables partitioned by
HASH or KEY. For example, this query on table t4 cannot use pruning
because dob is a DATE column:
SELECT * FROM t4 WHERE dob >= '2001-04-14' AND dob <= '2005-10-15';
However, if the table stores year values in an INT column, then a
query having WHERE year_col >= 2001 AND year_col <= 2005 can be
pruned.
So you can store the value of TO_DAYS(ftime) in an extra INTEGER column to use pruning.
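A minimal sketch of that idea, using a hypothetical extra column named ftime_days that you would fill with TO_DAYS(ftime) on insert:
CREATE TABLE raw_log_2011_4 (
id bigint(20) NOT NULL AUTO_INCREMENT,
logid char(16) NOT NULL,
tid char(16) NOT NULL,
reporterip char(46) DEFAULT NULL,
ftime datetime DEFAULT NULL,
ftime_days int unsigned NOT NULL,  -- populate with TO_DAYS(ftime)
KEY id (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY HASH (ftime_days) PARTITIONS 31;
-- Pruning works here because the WHERE clause compares the integer column to a constant:
EXPLAIN PARTITIONS
SELECT * FROM raw_log_2011_4 WHERE ftime_days = TO_DAYS('2011-04-30');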
Another option is to use RANGE partitioning:
CREATE TABLE raw_log_2011_4 (
id bigint(20) NOT NULL AUTO_INCREMENT,
logid char(16) NOT NULL,
tid char(16) NOT NULL,
reporterip char(46) DEFAULT NULL,
ftime datetime DEFAULT NULL,
KEY id (id)
) ENGINE=InnoDB AUTO_INCREMENT=286802795 DEFAULT CHARSET=utf8
PARTITION BY RANGE( TO_DAYS(ftime) ) (
PARTITION p20110401 VALUES LESS THAN (TO_DAYS('2011-04-02')),
PARTITION p20110402 VALUES LESS THAN (TO_DAYS('2011-04-03')),
PARTITION p20110403 VALUES LESS THAN (TO_DAYS('2011-04-04')),
PARTITION p20110404 VALUES LESS THAN (TO_DAYS('2011-04-05')),
...
PARTITION p20110426 VALUES LESS THAN (TO_DAYS('2011-04-27')),
PARTITION p20110427 VALUES LESS THAN (TO_DAYS('2011-04-28')),
PARTITION p20110428 VALUES LESS THAN (TO_DAYS('2011-04-29')),
PARTITION p20110429 VALUES LESS THAN (TO_DAYS('2011-04-30')),
PARTITION future VALUES LESS THAN MAXVALUE
);
Now the following query will only use partition p20110403:
SELECT * FROM raw_log_2011_4 WHERE ftime = '2011-04-03';
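To double-check the pruning, the same EXPLAIN PARTITIONS technique from the question can be used; the partitions column of the output should list only p20110403:
EXPLAIN PARTITIONS
SELECT * FROM raw_log_2011_4 WHERE ftime = '2011-04-03';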
Hi, you are using the wrong partitioning expression in the table definition. The table definition should look like this:
CREATE TABLE raw_log_2011_4 (
id bigint(20) NOT NULL AUTO_INCREMENT,
logid char(16) NOT NULL,
tid char(16) NOT NULL,
reporterip char(46) DEFAULT NULL,
ftime datetime DEFAULT NULL,
KEY id (id)
) ENGINE=InnoDB AUTO_INCREMENT=286802795 DEFAULT CHARSET=utf8
PARTITION BY hash (TO_DAYS(ftime)) partitions 31;
And your select command would be:
explain partitions
select * from raw_log_2011_4 where TO_DAYS(ftime) = TO_DAYS('2011-03-30');
The above command will select all the data required. Note how the TO_DAYS function converts a date into a day number, for example:
mysql> SELECT TO_DAYS(950501);
-> 728779
mysql> SELECT TO_DAYS('2007-10-07');
-> 733321
Why use TO_DAYS()? Because the MySQL optimizer recognizes two date-based functions for partition pruning purposes:
1. TO_DAYS()
2. YEAR()
and this should solve your problem.
I just recently read a MySQL blog post relating to this, at http://dev.mysql.com/tech-resources/articles/mysql_55_partitioning.html.
Versions earlier than 5.5 required special gymnastics in order to do partitioning based on dates. The link above discusses it and shows examples.
Versions 5.5 and later allowed you to do direct partitioning using non-numeric values such as dates and strings.
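For example, on 5.5 and later the table from the question could be partitioned directly on the DATETIME column with RANGE COLUMNS; an abbreviated, hedged sketch (only a few columns and partitions shown):
CREATE TABLE raw_log_2011_4 (
id bigint(20) NOT NULL AUTO_INCREMENT,
ftime datetime DEFAULT NULL,
KEY id (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE COLUMNS (ftime) (
PARTITION p20110401 VALUES LESS THAN ('2011-04-02'),
PARTITION p20110402 VALUES LESS THAN ('2011-04-03'),
PARTITION future VALUES LESS THAN (MAXVALUE)
);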
Don't use CHAR, use VARCHAR. That will save a lot of space, hence decrease I/O, hence speed up queries. (Exception: If the column is really fixed length, then use CHAR. And it will probably be CHARACTER SET ascii.)
reporterip: (46) is unnecessarily big for an IP address, even IPv6. See My blog for further discussion, including how to shrink it to 16 bytes.
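One hedged way to get there (not necessarily the exact approach in that blog post) is a VARBINARY(16) column filled via INET6_ATON(), which requires MySQL 5.6+; the column name below is made up:
ALTER TABLE raw_log_2011_4 ADD COLUMN reporterip_bin VARBINARY(16) DEFAULT NULL;
UPDATE raw_log_2011_4 SET reporterip_bin = INET6_ATON(reporterip);  -- handles both IPv4 and IPv6 text forms
SELECT INET6_NTOA(reporterip_bin) FROM raw_log_2011_4 LIMIT 10;     -- convert back to text for display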
PARTITION BY RANGE(TO_DAYS(...)) as @Steyx suggested, but don't have more than about 50 partitions. The more partitions you have, the slower queries get, in spite of the "pruning". HASH partitioning is essentially useless.
More discussion of partitioning, especially the type you are looking at. That includes code for a sliding set of partitions over time.
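A hedged sketch of such a sliding window, reusing the RANGE(TO_DAYS(...)) layout from that answer (partition names are illustrative):
-- drop the oldest day once it falls outside the retention window
ALTER TABLE raw_log_2011_4 DROP PARTITION p20110401;
-- carve the next day out of the catch-all partition
ALTER TABLE raw_log_2011_4 REORGANIZE PARTITION future INTO (
PARTITION p20110430 VALUES LESS THAN (TO_DAYS('2011-05-01')),
PARTITION future VALUES LESS THAN MAXVALUE
);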
Related
I am trying to partition my MySQL InnoDB table. Right now there are approximately 2 million rows of history data in the location table (and it is always growing). I must periodically delete the old data by year.
I use MySQL 5.7.22 Community Server.
CREATE TABLE `geo_data` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`ID_DISP` bigint(20) DEFAULT NULL,
`SYS_TIMESTAMP` datetime DEFAULT NULL,
`DATA_TIMESTAMP` bigint(20) DEFAULT NULL,
`X` double DEFAULT NULL,
`Y` double DEFAULT NULL,
`SPEED` bigint(20) DEFAULT NULL,
`HEADING` bigint(20) DEFAULT NULL,
`ID_DATA_TYPE` bigint(20) DEFAULT NULL,
`PROCESSED` bigint(20) DEFAULT NULL,
`ALTITUDE` bigint(20) DEFAULT NULL,
`ID_UNIT` bigint(20) DEFAULT NULL,
`ID_DRIVER` bigint(20) DEFAULT NULL,
UNIQUE KEY `part_id` (`ID`,`DATA_TIMESTAMP`,`ID_DISP`),
KEY `Index_idDisp_dataTS_type` (`ID_DISP`,`DATA_TIMESTAMP`,`ID_DATA_TYPE`),
KEY `Index_idDisp_dataTS` (`ID_DISP`,`DATA_TIMESTAMP`),
KEY `Index_TS` (`DATA_TIMESTAMP`),
KEY `idx_sysTS_idDisp` (`ID_DISP`,`SYS_TIMESTAMP`),
KEY `idx_clab_geo_data_ID_UNIT_DATA_TIMESTAMP_ID_DATA_TYPE` (`ID_UNIT`,`DATA_TIMESTAMP`,`ID_DATA_TYPE`),
KEY `idx_idUnit_dataTS` (`ID_UNIT`,`DATA_TIMESTAMP`),
KEY `idx_clab_geo_data_ID_DRIVER_DATA_TIMESTAMP_ID_DATA_TYPE` (`ID_DRIVER`,`DATA_TIMESTAMP`,`ID_DATA_TYPE`)
) ENGINE=InnoDB AUTO_INCREMENT=584390 DEFAULT CHARSET=latin1;
I have to partition by DATA_TIMESTAMP (a GPS timestamp stored as a number).
ALTER TABLE geo_data
PARTITION BY RANGE (year(from_unixtime(data_timestamp)))
(
PARTITION p2018 VALUES LESS THAN ('2018'),
PARTITION p2019 VALUES LESS THAN ('2019'),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
Error Code: 1697. VALUES value for partition 'p2018' must have type INT
How can I fix this?
Later I would also like to add a subpartition range by ID_DISP. How can I do that?
Thanks in advance!
Since data_timestamp is actually a BIGINT, you are not permitted to use date functions on it in the partitioning expression. There seem to have been two errors, and this probably fixes both:
ALTER TABLE geo_data
PARTITION BY RANGE (data_timestamp)
(
PARTITION p2018 VALUES LESS THAN (UNIX_TIMESTAMP('2018-01-01') * 1000),
PARTITION p2019 VALUES LESS THAN (UNIX_TIMESTAMP('2019-01-01') * 1000),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
I am assuming your data_timestamp is really milliseconds, a la Java. If not, then decide what to do with the * 1000.
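A quick, hedged sanity check for that assumption: if the values are milliseconds, the undivided column will be out of FROM_UNIXTIME's range (and come back NULL) while the divided one will look like plausible dates; if they are seconds, the undivided one will look right:
SELECT DATA_TIMESTAMP,
FROM_UNIXTIME(DATA_TIMESTAMP) AS if_seconds,
FROM_UNIXTIME(DATA_TIMESTAMP / 1000) AS if_milliseconds
FROM geo_data ORDER BY ID LIMIT 5;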
SUBPARTITIONs are useless; don't bother with them. If you really want to partition by Month or Quarter, then simply do it at the PARTITION level.
Recommendation: Don't have more than about 50 partitions.
How many "drivers" do you have? I suspect you do not have trillions. So, don't blindly use BIGINT for ids. Each takes 8 bytes. SMALLINT UNSIGNED, for example, would take only 2 bytes and allow 64K drivers (etc).
If X and Y are latitude and longitude, it might be clearer to name them as such. Here is what datatype to use instead of the 8-byte DOUBLE, depending on the resolution you have (and need): 4-byte FLOATs are probably good enough for vehicles.
The table has several redundant indexes; toss them. Also, note that when you have INDEX(a,b,c), it is redundant to also have INDEX(a,b).
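From the DDL you posted, for example, these two are left-prefixes of longer indexes and can be dropped (a sketch; verify against your query patterns first):
ALTER TABLE geo_data
DROP INDEX Index_idDisp_dataTS,   -- prefix of Index_idDisp_dataTS_type
DROP INDEX idx_idUnit_dataTS;     -- prefix of idx_clab_geo_data_ID_UNIT_DATA_TIMESTAMP_ID_DATA_TYPE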
See also my discussion on Partitioning, especially related to time-series, such as yours.
Hmmm... I wonder if the 63 bits of precision for SPEED will let you record them when they go at the speed of light?
Another point: Don't create p2019 until just before the start of 2019. You have pmax in case you goof and fail to add that partition in time. And the REORGANIZE PARTITION mentioned in my discussion covers how to recover from such a goof.
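If p2019 had been left out of the initial ALTER, it could later be carved out of pmax like this (a hedged sketch, keeping the same millisecond assumption as above):
ALTER TABLE geo_data REORGANIZE PARTITION pmax INTO (
PARTITION p2019 VALUES LESS THAN (UNIX_TIMESTAMP('2019-01-01') * 1000),
PARTITION pmax VALUES LESS THAN MAXVALUE
);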
Update:
It seems you cannot use from_unixtime in a PARTITION BY RANGE expression, because the partitioning function must resolve to an integer using only a limited set of allowed functions, and FROM_UNIXTIME() is not one of them. For more info see this answer.
It's expecting an INT, not a STRING (as per the error message), so try:
ALTER TABLE geo_data
PARTITION BY RANGE (year(from_unixtime(data_timestamp)))
(
PARTITION p2018 VALUES LESS THAN (2018),
PARTITION p2019 VALUES LESS THAN (2019),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
Here I have specified the year in the partition values as an int, i.e. 2018 / 2019, and not as strings, i.e. '2018' / '2019'.
I have a very large table with 500 million rows and the following columns:
id - Bigint - Autoincrementing primary index.
date - Datetime - Approximately 1.5 million rows per date; data older than 1 year is deleted.
uid - VARCHAR(60) - A user ID
sessionNumber - INT
start - INT - epoch of start time.
end - INT - epoch of end time.
More columns not relevant for this query.
The combination of uid and sessionNumber forms a unique index. I also have an index on date.
Due to the sheer size, I'd like to partition the table.
Most of my accesses would be by date, so partitioning by date ranges seems intuitive, but as the date is not part of the unique index, this is not an option.
Option 1: RANGE PARTITION on Date and BEFORE INSERT TRIGGER
I don't really have a regular issue with the uid and sessionNumber uniqueness being violated. The source data is consistent, but sessions that span two days may be inserted on two consecutive days with midnight being the end time of the first and start time of the second.
I'm trying to understand if I could remove the unique key and instead use a trigger that would:
check if there is a session with the same identifiers on the previous day and, if so,
update that session's end date, and
cancel the actual insert.
However, I am not sure whether I can 1) trigger an update on the same table, or 2) prevent the actual insert.
Option 2: LINEAR HASH PARTITION on UID
My second option is to use a linear hash partition on the UID. However, I cannot find any example that takes a VARCHAR and converts it to an INTEGER for use in HASH partitioning, nor a permitted way to do that conversion myself. For example,
ALTER TABLE mytable
PARTITION BY HASH (CAST(md5(uid) AS UNSIGNED integer))
PARTITIONS 20
returns an error saying that the partition function is not allowed.
HASH partitioning must work with a 32-bit integer. But you can't convert an MD5 string to an integer simply with CAST().
Instead of MD5, CRC32() can take an arbitrary string and converts to a 32-bit integer. But this is also not a valid function for partitioning.
mysql> alter table v partition by hash(crc32(uid));
ERROR 1564 (HY000): This partition function is not allowed
You could partition by the string using KEY Partitioning instead of HASH partitioning. KEY Partitioning accepts strings. It passes whatever input string through MySQL's built-in PASSWORD() function, which is basically related to SHA1.
However, this leads to another problem with your partitioning strategy:
mysql> alter table v partition by key(uid);
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
Your table's primary key id does not include the column uid that you want to partition by. This is a restriction of MySQL's partitioning:
every unique key on the table must use every column in the table's partitioning expression.
Here's the table I'm testing with (it would have been a good idea for you to include this in your question):
CREATE TABLE `v` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`uid` varchar(60) NOT NULL,
`sessionNumber` int(11) NOT NULL,
`start` int(11) NOT NULL,
`end` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uid` (`uid`,`sessionNumber`),
KEY `date` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
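Given that test table, if you really did want to partition by uid, one hedged way to satisfy the rule above is to fold uid into the primary key as well (the UNIQUE key already contains it), and then partition by KEY:
ALTER TABLE v DROP PRIMARY KEY, ADD PRIMARY KEY (id, uid);
ALTER TABLE v PARTITION BY KEY (uid) PARTITIONS 20;
Whether that is actually a good idea is a separate question, which brings me to the next point.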
Before going any further, I have to wonder why you want to use partitioning anyway? "Sheer size" is not a reason to partition a table.
Partitioning, like any optimization, is done for the sake of specific queries you want to optimize for. Any optimization improves one query at the expense of other queries. Optimization has nothing to do with the table. The table is happy to sit there with 5 billion rows, and it doesn't care. Optimization is for the queries.
So you need to know which queries you want to optimize for. Then decide on a strategy. Partitioning might not be the best strategy for the set of queries you need to optimize!
I'll assume your 'uid' is a 128-bit UUID kind of value, which can be stored as a BINARY(16), because that is generally worth the trouble.
Next, stay away from the 'datetime' type, as it is stored like a packed string, and doesn't hold any timezone information. Store date-time-values either as pure numerical values (the number of seconds since the UNIX-epoch), or let MySQL do that for you and use the timestamp(N) type.
Also don't call a column 'date', not just because that is a reserved word, but also because the value contains time details too.
Next, stay away from using anything else than latin1 as the CHARSET of (all) your tables. Only ever do UTF-8-ness at the column level. This to prevent unnecessarily byte-wide columns and indexes creeping in over time. Adopt this habit and you'll happily look back on it after some years, promised.
This makes the table look like:
CREATE TABLE `v` (
`uuid` binary(16) NOT NULL,
`mysql_created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`visitor_uuid` BINARY(16) NOT NULL,
`sessionNumber` int NOT NULL,
`start` int NOT NULL,
`end` int NOT NULL,
PRIMARY KEY (`uuid`),
UNIQUE KEY (`visitor_uuid`,`sessionNumber`),
KEY (`mysql_created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE COLUMNS (`uuid`)
( PARTITION `p_0` VALUES LESS THAN (X'10')
, PARTITION `p_1` VALUES LESS THAN (X'20')
...
, PARTITION `p_9` VALUES LESS THAN (X'A0')
, PARTITION `p_A` VALUES LESS THAN (X'B0')
...
, PARTITION `p_F` VALUES LESS THAN (MAXVALUE)
);
To make the KEY (mysql_created_at) cover only the date part, you need a calculated column, which can be added in-place; an index on it is also light to add, so I'll leave that as homework.
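A hedged sketch of that homework, assuming MySQL 5.7+ generated columns (the column and index names are made up, and I have not verified this against the partitioned layout above):
ALTER TABLE v
ADD COLUMN created_day DATE GENERATED ALWAYS AS (DATE(mysql_created_at)) VIRTUAL,  -- VIRTUAL keeps the add in-place
ADD KEY idx_created_day (created_day);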
I'm running a table that has built up to 600 million rows and is rapidly growing, which has been slowing down queries that need to run as quickly as possible. Current schema is:
CREATE TABLE `user_history` (
`userId` int(11) NOT NULL,
`asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `userId` (`userId`,`asin`,`dateSent`),
KEY `dateSent` (`dateSent`,`asin`),
KEY `asin` (`asin`,`dateSent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Everything I've read about partitioning suggested that this was a prime candidate for partitioning by date range. We only tend to use the last 14 days data, but the client doesn't want to delete old data. The new schema looks like:
CREATE TABLE `user_history_partitioned` (
`userId` int(11) NOT NULL,
`asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`dateSent`,`asin`,`userId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
PARTITION BY RANGE ( UNIX_TIMESTAMP(dateSent) ) (
PARTITION Apr2013 VALUES LESS THAN (UNIX_TIMESTAMP('2013-05-01')),
etc...
PARTITION Mar2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-04-01')),
PARTITION Apr2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-05-01')),
PARTITION May2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-06-01')),
PARTITION Future VALUES LESS THAN MAXVALUE);
The idea of the Future partition is that REORGANIZE PARTITION on a populated partition was taking a long time to complete. So Future will always be empty and can be reorganized into new partitions instantly. And other queries using this table have been reordered to use the primary key only, to reduce the number of indexes on the table.
The time-critical query is apropos of:
SELECT SQL_NO_CACHE *
FROM books B
WHERE (non-relevant stuff deleted)
AND NOT EXISTS
(
SELECT 1 FROM user_history H
WHERE
H.userId=$userId
AND H.asin=B.ASIN
AND dateSent > DATE_SUB(NOW(), INTERVAL 14 DAY)
)
AND (non-relevant stuff deleted)
LIMIT 1
So we're avoiding duplicates that have already been selected for the same user in the last 14 days. And this currently returns in < 0.1 secs, which is okay but slower than it used to be on the current schema.
For the new schema, the inner SELECT has been reordered to:
SELECT 1 FROM user_history_partitioned H
WHERE dateSent > DATE_SUB(NOW(), INTERVAL 14 DAY)
AND H.asin=B.ASIN
AND H.userId=$userId
And it's taking 5 minutes per query, and I can't see why. The idea is that the current partition and its indexes should be in memory (or maybe the previous month's too, at some times of the month), and the primary index covers the WHERE clause. But from the time it's taking, it could be performing a full table scan on asin or userId, which is difficult to identify from EXPLAIN because it's an inner query.
What am I missing? Do I need another combined index for (asin, userID)? If so, why?
Thanks,
PS: Tried wrapping the DATE_SUB(...) as UNIX_TIMESTAMP(DATE_SUB(...)) just in case it was a type conversion issue, but made no difference.
I'm trying to shorten the time it takes to query this table. This is one of our production tables, so we cannot make any changes to it.
I'm using the following query on a bash script
SELECT min(date_time) FROM db.event
But this is taking about 20 minutes. We have a lot of data in these partitions - around 100 GB.
The bash script needs to find the oldest record in this table. Is there any better way to make it more efficient, say 10 minutes or less?
*************************** 1. row ***************************
Table: event
Create Table: CREATE TABLE `event` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(100) NOT NULL,
`application` varchar(100) NOT NULL,
`cdr_id` varchar(100) NOT NULL,
`date_time` datetime NOT NULL,
PRIMARY KEY (`id`,`date_time`),
KEY `ix_cdr` (`cdr_id`),
KEY `ix_type_app_cdr` (`type`,`application`,`cdr_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2985903027 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY LIST (to_days(date_time))
(PARTITION p735660 VALUES IN (735060) ENGINE = InnoDB,
...
... many partitions here
...
...
PARTITION p735660 VALUES IN (735566) ENGINE = InnoDB)
You want to create an index on date_time:
create index event_datetime on event(date_time)
The field date_time has to be the first column in the index.
EDIT:
If you can't make changes to the table, then you have a problem. If the datetime is correlated with the id, then you could do:
select date_time
from event
order by id
limit 1;
But this relies on the big assumption that the minimum date is on the minimum id.
I updated MySQL from version 5.0 to 5.5, and I am new to studying MySQL partitioning. First, I typed:
SHOW VARIABLES LIKE '%partition%'
Variable_name Value
have_partitioning YES
This confirms that the new version supports partitioning. I tried to partition my table into 10-minute intervals, then INSERT, UPDATE, and QUERY a huge amount of data against this table as a test.
First, I need to create a new table. I typed this code:
CREATE TABLE test (
`id` int unsigned NOT NULL auto_increment,
`words` varchar(100) collate utf8_unicode_ci NOT NULL,
`date` varchar(10) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `index` (`words`)
)
ENGINE=MyISAM
DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci
AUTO_INCREMENT=0
PARTITION BY RANGE (MINUTE(`date`))
(
PARTITION p0 VALUES LESS THAN (1322644000),
PARTITION p1 VALUES LESS THAN (1322644600) ,
PARTITION p2 VALUES LESS THAN (1322641200) ,
PARTITION p3 VALUES LESS THAN (1322641800) ,
PARTITION p4 VALUES LESS THAN MAXVALUE
);
It returns the error: #1564 - This partition function is not allowed. What is the problem? Thanks.
UPDATE
I modified date to int NOT NULL, and changed PARTITION BY RANGE (MINUTE(date)) to PARTITION BY RANGE COLUMNS(date):
CREATE TABLE test (
`id` int unsigned NOT NULL auto_increment,
`words` varchar(100) collate utf8_unicode_ci NOT NULL,
`date` int NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `index` (`words`)
)
ENGINE=MyISAM
DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci
AUTO_INCREMENT=0
PARTITION BY RANGE COLUMNS(`date`)
(
PARTITION p0 VALUES LESS THAN (1322644000),
PARTITION p1 VALUES LESS THAN (1322644600) ,
PARTITION p2 VALUES LESS THAN (1322641200) ,
PARTITION p3 VALUES LESS THAN (1322641800) ,
PARTITION p4 VALUES LESS THAN MAXVALUE
);
That caused a new error: #1214 - The used table type doesn't support FULLTEXT indexes
I am sorry to say that MySQL does not support FULLTEXT indexes and partitioning at the same time.
See partitioning limitations
FULLTEXT indexes. Partitioned tables do not support FULLTEXT indexes or searches. This includes partitioned tables employing the MyISAM storage engine.
One issue might be that SELECT MINUTE('2008-10-10 56:56:98') returns NULL. The MINUTE() function extracts the minute from a TIME or DATETIME value, whereas in your case date is a VARCHAR.
Also, a partitioning key must be either an integer column or an expression that resolves to an integer, but in your case it's a VARCHAR.
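A hedged sketch of what the intended 10-minute layout could look like instead, assuming date is changed to a TIMESTAMP column (UNIX_TIMESTAMP() is one of the allowed partitioning functions, but only on TIMESTAMP columns), the FULLTEXT index is dropped because of the limitation quoted above, and the boundary values are illustrative 600-second steps. Note that the primary key must also include the partitioning column:
CREATE TABLE test (
`id` int unsigned NOT NULL auto_increment,
`words` varchar(100) collate utf8_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`, `date`)
)
ENGINE=MyISAM
DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci
PARTITION BY RANGE (UNIX_TIMESTAMP(`date`))
(
PARTITION p0 VALUES LESS THAN (1322644000),
PARTITION p1 VALUES LESS THAN (1322644600),
PARTITION p2 VALUES LESS THAN (1322645200),
PARTITION p3 VALUES LESS THAN (1322645800),
PARTITION p4 VALUES LESS THAN MAXVALUE
);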