How to partition a MySQL table by the column values "user_id" and "gps_time"? - mysql

My table schema:
CREATE TABLE `test_table` (
`his_id` int(11) NOT NULL,
`user_id` varchar(45) NOT NULL,
`gps_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`his_id`,`user_id`)
)
I want to partition this table by user_id and gps_time:
user_id should be partitioned by its first character (A~Z, a~z, 0~9), and
gps_time should be partitioned into the last 3 months (i.e. 3 partitions).
How do I do that?
Thanks a lot.

With MySQL 5.5, you can use multiple columns with RANGE partitioning.
From your question, it's not entirely clear how many partitions you want; it sounds as if you want a whole boatload of partitions, but I don't believe that's what you really want.
The syntax for RANGE partitioning is in the MySQL Reference Manual, available online here: http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
(Be sure to check the manual for the version of MySQL you are actually running; there have been some significant changes to partitioning in 5.0, 5.1, 5.5, etc.)
With MySQL 5.5.x, if you want separate partitions for the first character of user_id and for a range of gps_time values, you could do something like this:
PARTITION BY RANGE COLUMNS(user_id, gps_time)
( PARTITION pA0 VALUES LESS THAN ('B','2014-07-01')
, PARTITION pA1 VALUES LESS THAN ('B','2014-08-01')
, PARTITION pA2 VALUES LESS THAN ('B','2014-09-01')
, PARTITION pA3 VALUES LESS THAN ('B',MAXVALUE)
, PARTITION pB0 VALUES LESS THAN ('C','2014-07-01')
, PARTITION pB1 VALUES LESS THAN ('C','2014-08-01')
, PARTITION pB2 VALUES LESS THAN ('C','2014-09-01')
, PARTITION pB3 VALUES LESS THAN ('C',MAXVALUE)
, ...
, PARTITION pMX VALUES LESS THAN (MAXVALUE,MAXVALUE)
)
But that'd be over 100 partitions. I can't imagine a scenario where that's what you really want. (I'm not sure what the upper limit on partitions for a table is.)
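One thing worth checking before adopting a layout like the one above: RANGE COLUMNS compares the whole column list as a tuple, left to right, so the second column only decides anything when the first column equals the bound exactly. Below is a rough Python model of that rule (Python tuples compare the same way). It is only a sketch: it ignores MySQL collations, uses a shortened, hypothetical subset of the partition list, and uses a sentinel string in place of MAXVALUE.

```python
# Sentinel that sorts after every realistic value; stands in for MAXVALUE.
MAXVALUE = "\uffff"

# (partition name, upper bound tuple), in the same order as in the DDL.
PARTITIONS = [
    ("pA0", ("B", "2014-07-01")),
    ("pA1", ("B", "2014-08-01")),
    ("pA2", ("B", "2014-09-01")),
    ("pA3", ("B", MAXVALUE)),
    ("pB0", ("C", "2014-07-01")),
    ("pMX", (MAXVALUE, MAXVALUE)),
]


def partition_for(user_id, gps_time):
    """Return the first partition whose bound tuple is strictly greater than the row."""
    row = (user_id, gps_time)
    for name, bound in PARTITIONS:
        if row < bound:  # tuple comparison: first column, then second on a tie
            return name
    raise ValueError("no partition accepts this row")


print(partition_for("Alice", "2014-08-15"))  # user_id sorts before 'B'
print(partition_for("B", "2014-08-15"))      # user_id equals the bound 'B'
```

In this model every user_id that sorts before 'B' lands in pA0 whatever its gps_time is, and only a user_id that is exactly 'B' gets spread over pA1 to pA3. It would be worth confirming on a test table (for example via information_schema.PARTITIONS) that rows end up where you expect.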
With MySQL 5.1, I don't believe it's possible to partition on multiple columns. You could, however, partition on just the user_id column, and then create subpartitions (within each partition) on the gps_time column... but I've never done that before.

Related

MySQL : optimize partitioning to speed up requests [duplicate]

I have a huge table that stores many tracked events, such as a user click.
The table is already in the 10s of millions, and it's growing larger every day.
The queries are starting to get slower when I try to fetch events from a large timeframe, and after reading quite a bit on the subject I understand that partitioning the table may boost the performance.
What I want to do is partition the table on a per month basis.
I have only found guides that show how to partition each month manually. Is there a way to just tell MySQL to partition by month and have it do that automatically?
If not, what is the command to do it manually, considering the column I partition by is a datetime?
As explained by the manual: http://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
This is easily possible by hash partitioning of the month output.
CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE)
ENGINE=INNODB
PARTITION BY HASH( MONTH(tr_date) )
PARTITIONS 6;
Do note that this partitions by month only, not by year. Also, there are only 6 partitions in this example, so the 12 months share them two to a partition.
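To see how the months are spread over the partitions: for PARTITION BY HASH the manual gives the partition number as MOD(expr, num). A tiny Python model of that rule for the ti table above (the function name is made up for illustration):

```python
def hash_partition(month, num_partitions=6):
    """Model of HASH(MONTH(tr_date)) with PARTITIONS n: MOD(expr, n)."""
    return month % num_partitions


# Group the 12 months by the partition number they map to.
layout = {}
for month in range(1, 13):
    layout.setdefault(hash_partition(month), []).append(month)

for number in sorted(layout):
    print("p%d holds months %s" % (number, layout[number]))
```

January and July share a partition, February and August share another, and so on, for every year in the table.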
And for partitioning an existing table (manual: https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html):
ALTER TABLE ti
PARTITION BY HASH( MONTH(tr_date) )
PARTITIONS 6;
Querying can be done both from the entire table:
SELECT * from ti;
Or from specific partitions, selected by name (MySQL 5.6 and later; with PARTITIONS 6 the default names are p0 through p5):
SELECT * from ti PARTITION (p0);
CREATE TABLE `mytable` (
`post_id` int DEFAULT NULL,
`viewid` int DEFAULT NULL,
`user_id` int DEFAULT NULL,
`post_Date` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PARTITION BY RANGE (extract(year_month from `post_Date`))
(PARTITION P0 VALUES LESS THAN (202012) ENGINE = InnoDB,
PARTITION P1 VALUES LESS THAN (202104) ENGINE = InnoDB,
PARTITION P2 VALUES LESS THAN (202108) ENGINE = InnoDB,
PARTITION P3 VALUES LESS THAN (202112) ENGINE = InnoDB,
PARTITION P4 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
Be aware of the "lazy" effect when partitioning by hash.
As the docs say:
You should also keep in mind that this expression is evaluated each time a row is inserted or updated (or possibly deleted); this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.
The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value, as this allows for “pruning” on ranges of partitions. That is, the more closely that the expression varies with the value of the column on which it is based, the more efficiently MySQL can use the expression for hash partitioning.
For example, where date_col is a column of type DATE, then the expression TO_DAYS(date_col) is said to vary directly with the value of date_col, because for every change in the value of date_col, the value of the expression changes in a consistent manner. The variance of the expression YEAR(date_col) with respect to date_col is not quite as direct as that of TO_DAYS(date_col), because not every possible change in date_col produces an equivalent change in YEAR(date_col).
HASHing by month with 6 partitions means that two months a year will land in the same partition. What good is that?
Don't bother partitioning; index the table.
Assuming these are the only two queries you use:
SELECT * from ti;
SELECT * from ti PARTITION (p0);
then start the PRIMARY KEY with the_date.
The first query simply reads the entire table; no change between partitioned and not.
The second query, assuming you want a single month, not all the months that map into the same partition, would need to be
SELECT * FROM ti WHERE the_date >= '2019-03-01'
AND the_date < '2019-03-01' + INTERVAL 1 MONTH;
If you have other queries, let's see them.
(I have not found any performance justification for ever using PARTITION BY HASH.)
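The half-open range in the single-month query above (>= first of the month, < first of the next month) can be built in application code. A minimal sketch, assuming the table ti and column the_date from this answer; the helper names are made up, and in real code the dates should be bound as parameters rather than formatted into the string.

```python
from datetime import date


def month_bounds(day):
    """Return (first day of day's month, first day of the following month)."""
    start = day.replace(day=1)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    return start, end


def month_query(day):
    """Build the range query for the month containing `day` (illustration only)."""
    start, end = month_bounds(day)
    return (
        "SELECT * FROM ti WHERE the_date >= '%s' AND the_date < '%s'"
        % (start.isoformat(), end.isoformat())
    )


print(month_query(date(2019, 3, 15)))
```

Using a strict upper bound on the next month's first day avoids having to know how many days the month has, and it stays correct for DATETIME values late on the last day.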

mysql select query optimization of partitioned table in non-cluster environment

I have a SELECT query on a partitioned table with 123 million records which takes more than 10 minutes to fetch data. My query looks like: select * from tableName where column1='1.1.1.1' order by timestamp desc;
Table is already indexed on column1.
Any help appreciated.
(From comments)
CREATE TABLE mytable (
column1 varchar(256) NOT NULL,
column2 varchar(100) NOT NULL,
column3 smallint(5) unsigned NOT NULL,
column4 smallint(5) unsigned NOT NULL,
timestamp bigint(20) unsigned NOT NULL,
KEY mytable_idx (column2,timestamp,column3,column4),
KEY ip_addr_index (column1),
KEY ts_idx (timestamp)
) /*!50100 PARTITION BY RANGE ((TIMESTAMP))
(PARTITION p1498800000 VALUES LESS THAN (1498800000) ENGINE = InnoDB,
PARTITION p1500000000 VALUES LESS THAN (1500000000) ENGINE = InnoDB,
PARTITION p1501200000 VALUES LESS THAN (1501200000) ENGINE = InnoDB,
PARTITION p1502400000 VALUES LESS THAN (1502400000) ENGINE = InnoDB,
PARTITION p1503600000 VALUES LESS THAN (1503600000) ENGINE = InnoDB,
PARTITION p1504800000 VALUES LESS THAN (1504800000) ENGINE = InnoDB,
PARTITION p1506000000 VALUES LESS THAN (1506000000) ENGINE = InnoDB
) */
For this query:
select *
from tableName
where column1 = '1.1.1.1'
order by timestamp desc;
You want an index on (column1, timestamp desc). Note: The desc may be ignored in earlier versions of MySQL.
PARTITIONing does not intrinsically provide speed. Please provide SHOW CREATE TABLE so we can discuss whether partitioning actually hurts performance in your case.
INDEX(column1, timestamp) -- In this order
is optimal whether the table is partitioned or not. In particular, that index will work just as well for a non-partitioned table. (Gordon's comment about DESC has no impact on performance, whether old or new version.)
With 123 million rows, you should keep an eye on datatypes. If you have
column1 VARCHAR(15) CHARACTER SET utf8
then that ipv4_address can be improved from up-to-17 bytes to exactly 4:
BINARY(4)
with suitable conversions on INSERT and SELECT. Making that change would also allow for CIDR and other range tests, which are not possible with VARCHAR. Will you need to handle IPv6? I discuss that here.
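For the BINARY(4) suggestion, the conversions can be done in the application. A minimal sketch using Python's standard ipaddress module; the function names are made up for illustration, and it handles IPv4 only.

```python
import ipaddress


def pack_ipv4(text):
    """'1.1.1.1' -> 4 raw bytes, suitable for a BINARY(4) column."""
    return ipaddress.IPv4Address(text).packed


def unpack_ipv4(raw):
    """4 raw bytes -> dotted-quad string."""
    return str(ipaddress.IPv4Address(raw))


def cidr_bounds(cidr):
    """Lowest and highest packed address in a CIDR block, for a BETWEEN test."""
    net = ipaddress.ip_network(cidr)
    return net.network_address.packed, net.broadcast_address.packed


low, high = cidr_bounds("1.1.1.0/24")
print(pack_ipv4("1.1.1.1"), unpack_ipv4(pack_ipv4("1.1.1.1")))
print(low <= pack_ipv4("1.1.1.1") <= high)
```

Because the packed form is fixed-width and big-endian, byte order matches numeric order, so a range test between the two bounds can use the index on the column.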
How many rows match 1.1.1.1? Are there any TEXT columns? What is the PRIMARY KEY? Which Engine? Each of those questions may have an impact on the "10 minutes".
It is important to understand when a "composite" index is better than a single-column index. More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
(After the CREATE TABLE was posted)
Replace this
KEY ip_addr_index (column1)
with
KEY ip_addr_index (column1, timestamp)
Don't create more than one future partition before it is needed. Always have a LESS THAN (MAXVALUE) partition just in case.
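With a catch-all partition in place, the next partition is added by splitting the catch-all with REORGANIZE PARTITION. A sketch that only builds the statement text; the catch-all name p_future is hypothetical (the posted table does not have one yet), and the new partition name just follows the p<timestamp> pattern from the CREATE TABLE.

```python
def split_future_partition(table, boundary, future_name="p_future"):
    """Build an ALTER TABLE that carves a new range off the catch-all partition."""
    new_name = "p%d" % boundary
    return (
        "ALTER TABLE %s REORGANIZE PARTITION %s INTO (\n"
        "  PARTITION %s VALUES LESS THAN (%d),\n"
        "  PARTITION %s VALUES LESS THAN MAXVALUE\n"
        ")" % (table, future_name, new_name, boundary, future_name)
    )


print(split_future_partition("mytable", 1507200000))
```

Run shortly before the new range is needed, while the catch-all is still empty or nearly so, the split has very little data to move.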
IPv4 can live with VARCHAR(15); IPv6 fits in VARCHAR(39), or in BINARY(16) after packing.
For that one query, 7 queries must be done (one per partition); the results put together, then sorted. Without partitioning, it becomes one query, no sort (since the index is already sorted). So, (I believe) that partitioning slows that query down.
When discussing performance in 123M rows, I need to see all the main queries in one sitting in order to advise. Optimizing for one query is all too likely to de-optimize for some other.
There seems to be no reason to use BIGINT for TIMESTAMP. INT UNSIGNED would save 4 bytes per row of data, plus more for the indexes. Perhaps a total savings of 2GB of disk space. That translates into some speedup for some queries.
If timestamp is always used in a "range", then this index (column2,timestamp,column3,column4) is probably in an inefficient order. Please provide the query that benefits from this index so I can further elaborate.

UNIX_TIMESTAMP field partition for a whole year

I am quite new in the subject of partitions and the necessity has arisen due to the great accumulation of data.
Basically it is an access control system: there are currently 20 departments and each department has approximately 100 users. The system records the date and time of the entries and exits (from_date / to_date). My intention is to partition by department and then by month throughout the year.
Plan:
Partition the table by [ dep_id and date (from_date and to_date) ]
Problem
I have the following table.
CREATE TABLE `employee` (
`employee_id` smallint(5) NOT NULL,
`dep_id` int(11) NOT NULL,
`from_date` int(11) NOT NULL,
`to_date` int(11) NOT NULL,
KEY `index1` (`employee_id`,`from_date`,`to_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I have the dates (from_date and to_date) in UNIX_TIMESTAMP format (INT(11)).
I am looking to partition the table by each month of the year.
Is that possible?
MySQL 5.7
It is possible to use range partitioning on an integer column.
Assuming my_int_col is unix-style integer seconds since 1970-01-01
we could achieve monthly partitions with something like this:
PARTITION BY RANGE (my_int_col)
( PARTITION p20180101 VALUES LESS THAN ( UNIX_TIMESTAMP('2018-01-01 00:00') )
, PARTITION p20180201 VALUES LESS THAN ( UNIX_TIMESTAMP('2018-02-01 00:00') )
, PARTITION p20180301 VALUES LESS THAN ( UNIX_TIMESTAMP('2018-03-01 00:00') )
, PARTITION p20180401 VALUES LESS THAN ( UNIX_TIMESTAMP('2018-04-01 00:00') )
, PARTITION p20180501 VALUES LESS THAN ( UNIX_TIMESTAMP('2018-05-01 00:00') )
, PARTITION p20180601 VALUES LESS THAN ( UNIX_TIMESTAMP('2018-06-01 00:00') )
)
Be careful of the time_zone setting of the session. Those date literals will be interpreted as values in the current time_zone... e.g. if you want those to be UTC datetime, time_zone should be +00:00.
Or, replace the UNIX_TIMESTAMP() expression with a literal integer value... that's what MySQL is going to do with the UNIX_TIMESTAMP() expressions.
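If literal integers are preferred, so that the session time_zone cannot shift the boundaries, they can be computed outside MySQL. A small sketch, assuming the boundaries should fall at midnight UTC and reusing the pYYYYMMDD naming from the example; the function names are illustrative.

```python
import calendar


def utc_epoch(year, month):
    """Seconds since 1970-01-01 00:00 UTC for the first day of the month."""
    return calendar.timegm((year, month, 1, 0, 0, 0))


def monthly_partitions(year, first_month, last_month):
    """Build the partition definitions with literal integer boundaries."""
    lines = []
    for month in range(first_month, last_month + 1):
        name = "p%04d%02d01" % (year, month)
        lines.append(
            "PARTITION %s VALUES LESS THAN (%d)" % (name, utc_epoch(year, month))
        )
    return "\n, ".join(lines)


print(monthly_partitions(2018, 1, 6))
```

calendar.timegm always interprets its input as UTC, so the result does not depend on the time zone of the machine that runs the script.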
Obviously, you can name the partitions whatever you want.
Note: applying partitioning to an existing table will require MySQL to create an entire copy of the table, holding an exclusive lock on the original table while the operation completes. So you will need sufficient storage (disk) space, and a window of time for the operation to complete.
It's possible to create a new table that is partitioned, and then copy the older data a chunk at a time. But make the chunks reasonably sized, to avoid ballooning the ibdata1 with large transactions. And then do some RENAME TABLE statements to move the old table out, and move the new table in.
Some caveats to note with partitioned tables: there's no foreign key support, and there's no guarantee that a partitioned table will give better DML performance than a non-partitioned table.
Strategic indexes and carefully planned queries are the key to performance with "very large" tables. And this is true with partitioned tables as well.
Partitioning isn't a magic bullet for performance problems that some novices would like it to be.
As far as creating subpartitions within partitions, I wouldn't recommend it.

Mysql range partition with range select

I couldn't find an example like mine, so here's the thing:
I have a big data set that I need to aggregate on top of.
We're talking about ~500M rows with a date field ranging from 2 years ago until now.
My first instinct was to partition the table by this field (creating a partition on the date field), which leaves roughly 20M rows per partition.
Then I have indexes on the other fields I will aggregate/group by.
Here's my table definition (simplified for brevity's sake):
create table t1(
date_field datetime not null,
additional_id int not null,
category_id int not null,
value_field1 double,
value_field2 double,
primary key(additional_id,date_field)
)
ENGINE=InnoDB
PARTITION BY RANGE(YEAR(date_field)*100 + MONTH(date_field)) (
PARTITION p_201411 VALUES LESS THAN (201411),
PARTITION p_201412 VALUES LESS THAN (201412),
#all the partitions until the current month...
PARTITION p_201610 VALUES LESS THAN (201610),
PARTITION p_201611 VALUES LESS THAN (201611),
PARTITION p_catchall VALUES LESS THAN MAXVALUE );
If I execute a query that gets a date directly, only the partition for the month is used, based on the output of explain partitions on top of a query such as the following one:
select value_field1 from t1 where additional_id=x and date_field='2014-11-05'
However, if I use a date range (even if inside the same partition), all partitions are scanned
select value_field1 from t1 where additional_id=x and date_field> '2014-11-05' and date_field <'2014-11-10'
(Same result if I use between).
What am I missing here? Is this really the right way to partition this table?
Thanks in advance
Short answer: Do not use complex expressions for PARTITION BY RANGE.
Long answer: (Aside from criticizing the implementation of BY RANGE with range queries.)
Instead, do this:
PARTITION BY RANGE (TO_DAYS(date_field)) (
PARTITION p_201411 VALUES LESS THAN (TO_DAYS('2014-11-01')),
...
PARTITION p_catchall VALUES LESS THAN MAXVALUE ); -- unchanged
Newer versions of MySQL have slightly more friendly expressions you can use.
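Writing out 25 monthly boundaries by hand is tedious, so the TO_DAYS clause can be generated. A sketch that keeps the question's naming (p_YYYYMM holds rows before the first of that month) and its column date_field; the helper names are made up.

```python
from datetime import date


def month_starts(first, last):
    """Yield the first day of each month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def to_days_partitions(first, last):
    """Build the PARTITION BY RANGE (TO_DAYS(...)) clause as text."""
    lines = [
        "PARTITION p_%s VALUES LESS THAN (TO_DAYS('%s'))"
        % (d.strftime("%Y%m"), d.isoformat())
        for d in month_starts(first, last)
    ]
    lines.append("PARTITION p_catchall VALUES LESS THAN MAXVALUE")
    return "PARTITION BY RANGE (TO_DAYS(date_field)) (\n  " + ",\n  ".join(lines) + "\n)"


print(to_days_partitions(date(2014, 11, 1), date(2016, 11, 1)))
```

For November 2014 through November 2016 this produces 25 monthly partitions plus the catch-all.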
If this is your typical query:
additional_id=x and date_field> '2014-11-05'
and date_field <'2014-11-10'
then partitioning is no faster than the equivalent non-partitioned table. You even have the perfect index for the non-partitioned version.
If, on the other hand, you are DROPping old partitions when they 'expire', the PARTITIONing is excellent.
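Expiring old data then becomes an ALTER TABLE ... DROP PARTITION instead of a large DELETE. A sketch that picks the expired partitions by name, assuming the p_YYYYMM naming from the question, where p_YYYYMM holds only rows before the first of that month; the function name and the cutoff convention are my own.

```python
def drop_expired_sql(table, names, keep_from_yyyymm):
    """Build a DROP PARTITION statement for partitions older than the cutoff.

    keep_from_yyyymm is the first month whose data must be kept. Partition
    p_YYYYMM only contains rows before YYYY-MM-01, so every partition with
    YYYYMM <= keep_from_yyyymm is entirely expired.
    """
    expired = [
        name
        for name in names
        if name.startswith("p_")
        and name[2:].isdigit()
        and int(name[2:]) <= keep_from_yyyymm
    ]
    if not expired:
        return None  # nothing to drop yet
    return "ALTER TABLE %s DROP PARTITION %s" % (table, ", ".join(expired))


partitions = ["p_201411", "p_201412", "p_201501", "p_201502", "p_catchall"]
print(drop_expired_sql("t1", partitions, 201501))
```

The catch-all partition is never selected because its name does not carry a month.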
25 partitions is good.
More discussion .
A side note: additional_id int is limited to 2 billion, so you are 1/4 of the way to overflowing. INT UNSIGNED would get you to 4 billion; you might consider an ALTER. (Of course, I don't know whether additional_id is unique in this table; so maybe it is not an issue.)
