SQL Partitioning of a very large table - mysql

I'm trying to partition my very large MySQL table called companyScores (60million rows and 50 columns).
Basically, the table features companies (with the column varchar "company_idx" with unique IDs going from 0 to 10,000 companies) and their respective timestamp (with the column "timestamp") and scores "Scores" (with the column "Scores").
I'd like to include around 500 companies into each partition.
Please let me know if the following would do the job?
ALTER TABLE `companyScores`
PARTITION BY RANGE( company_idx ) (
PARTITION p0 VALUES LESS THAN (500),
PARTITION p1 VALUES LESS THAN (1000),
PARTITION p2 VALUES LESS THAN (1500),
PARTITION p3 VALUES LESS THAN (2000),
and so on...
);
Would the above work?
Also, can we easily insert new values into this database once it has been partitioned?

Would the above work?
No. For several reasons.
If company_idx is a varchar, you need to use RANGE COLUMNS. The RANGE partitioning only works on integers. If you try to use RANGE partitioning on a varchar, you get this error:
ERROR 1659 (HY000): Field 'company_idx' is of a not allowed type for this type of partitioning
Assuming you correct that, you have another problem:
Your partition clauses use integer values, not quoted string values. Those are different types, and the partitioning engine won't use them for defining partitions. If you try, you'll get this error:
ERROR 1654 (HY000): Partition column values of incorrect type
Assuming you correct that by quoting the numbers, you have another problem:
You list the partition for 500 before the string 1000, but the string '500' should come after the string '1000' lexically. RANGE or RANGE COLUMNS partitions must be declared in increasing order. If you try to do it in the order you have, you'll get this error:
ERROR 1493 (HY000): VALUES LESS THAN value must be strictly increasing for each partition
Assuming you correct the order, it works, but it might not do what you want:
CREATE TABLE `companyScores` (
`company_idx` varchar(10) NOT NULL,
PRIMARY KEY (`company_idx`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50500 PARTITION BY RANGE COLUMNS(company_idx)
(PARTITION p1 VALUES LESS THAN ('1000') ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN ('1500') ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN ('2000') ENGINE = InnoDB,
PARTITION p0 VALUES LESS THAN ('500') ENGINE = InnoDB) */
Now another question you asked:
Also, can we easily insert new values into this database once it has been partitioned?
If you insert a new value that isn't covered by the partitions you defined, you'll get this error:
mysql> insert into companyScores set company_idx = '700';
ERROR 1526 (HY000): Table has no partition for value from column_list
Why is that? You have a partition for company_idx less than 1000 right?
No. You have a partition for company_idx less than the string '1000'. You tried to insert the string '700', which is lexically greater than '500', as well as all the other partitions. Therefore it's beyond any of the partitions defined.
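The ordering described above can be checked directly in the mysql client. This is a minimal sketch: the column aliases are made up for illustration, and it assumes a collation in which digit characters sort by their character codes (which holds for the common utf8mb4 and latin1 collations).

```sql
-- String comparison goes character by character:
-- '5' sorts after '1', and '7' sorts after '5'.
SELECT '500' < '1000' AS s500_before_s1000,  -- 0: '500' sorts after '1000'
       '700' < '500'  AS s700_before_s500,   -- 0: '700' sorts after the highest bound
       500 < 1000     AS i500_before_i1000;  -- 1: integers compare numerically
```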
You could solve all of the above problems if you change your company_idx to an integer column.
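A sketch of what that could look like, assuming every existing company_idx value is a clean numeric string and that every primary or unique key on the real table includes company_idx (MySQL requires this of the partitioning column). Only the first few partitions are spelled out, and pmax is a hypothetical catch-all so that new IDs do not fail with error 1526.

```sql
-- Convert the ID column to an integer type first.
ALTER TABLE companyScores
  MODIFY company_idx INT NOT NULL;

-- Integer bounds compare numerically, so plain RANGE works.
ALTER TABLE companyScores
PARTITION BY RANGE (company_idx) (
  PARTITION p0   VALUES LESS THAN (500),
  PARTITION p1   VALUES LESS THAN (1000),
  PARTITION p2   VALUES LESS THAN (1500),
  PARTITION p3   VALUES LESS THAN (2000),
  PARTITION pmax VALUES LESS THAN MAXVALUE  -- hypothetical catch-all
);
```

With this layout, a row with company_idx = 700 lands in p1. Both statements rebuild the table, which may take a while on 60 million rows.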

Related

Using MySQL partitioning by an AUTO INCREMENT field, how can I guarantee that INSERT/LOAD DATA statements are only accessing specified partitions?

General context
I want to be able to tell, when inserting into non-balanced RANGE-partitioned MySQL tables with AUTO INCREMENT primary keys, whether my inserts are causing MySQL to communicate in any way with partitions other than the ones I specify. This is useful for budgeting future capacity for large-scale data loading; with that assurance, I could much more accurately predict the performance and hardware resource cost of loading data into the database.
I am using MySQL 5.6.
Specific context
Say I have the following table in MySQL (5.6):
CREATE TABLE foo (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`data` varchar(6) COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9001 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
/*!12345 PARTITION BY RANGE (id)
(PARTITION cold VALUES LESS THAN (8000) ENGINE = InnoDB,
PARTITION hot VALUES LESS THAN (9000) ENGINE = InnoDB,
PARTITION overflow VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
Assume the table is not sparse: no rows have been deleted, so count(*) = max(id) = 9001.
Questions
If I do INSERT INTO foo PARTITION (hot) (data) VALUES ('abc') or an equivalent LOAD DATA statement with the PARTITION clause included, are any partitions other than the selected hot partition being accessed?
How would I tell what partitions are being accessed by those DML statements?
What I've tried
The MySQL documentation on partition selection says:
REPLACE and INSERT now lock only those partitions having rows to be
inserted or replaced. However, if an AUTO_INCREMENT value is generated
for any partitioning column then all partitions are locked.
Additionally, it says:
Locks imposed by LOAD DATA statements on partitioned tables cannot be
pruned.
Those statements don't help clarify which partitions are being accessed by DML queries which explicitly specify the partition.
I've tried doing EXPLAIN PARTITIONS INSERT INTO foo ..., but the partitions column of the output is always NULL.
According to the documentation,
For statements that insert rows, the behavior differs in that failure to find a suitable partition causes the statement to fail. This is true for both INSERT and REPLACE statements
So when you try to insert a row that does not match your specified partition, you'll receive
Error Code: 1748. Found a row not matching the given partition set
This includes statements where some rows match and some don't, so you cannot use this to fill "hot" and throw away rows that would go into "overflow" (the whole query will fail).
The EXPLAIN output for MySQL 5.6 does not include a separate row for the insert; the value for partitions relates to the source of the data you insert (in cases where you e.g. use insert ... select ... partition ...), even if you use values() (then the source is "no table", and the relevant partition is just null). For MySQL 5.7+, there is an "insert" type, and it would indeed list only your specified partition.
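To make the failure mode concrete, here is a small sketch using a scratch copy of the table. The name foo_demo is made up, and explicit id values are supplied so the target partition is unambiguous.

```sql
CREATE TABLE foo_demo (
  id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  data VARCHAR(6) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB
PARTITION BY RANGE (id) (
  PARTITION cold     VALUES LESS THAN (8000),
  PARTITION hot      VALUES LESS THAN (9000),
  PARTITION overflow VALUES LESS THAN MAXVALUE
);

-- 8500 falls in [8000, 9000), so it matches the named partition.
INSERT INTO foo_demo PARTITION (hot) (id, data) VALUES (8500, 'abc');

-- 9500 belongs in overflow, so this fails with error 1748.
INSERT INTO foo_demo PARTITION (hot) (id, data) VALUES (9500, 'abc');

-- One non-matching row rejects the whole statement, including 8600.
INSERT INTO foo_demo PARTITION (hot) (id, data)
VALUES (8600, 'def'), (9600, 'ghi');

-- On MySQL 5.7 or later, the partitions column of the plan shows the target.
EXPLAIN INSERT INTO foo_demo PARTITION (hot) (id, data) VALUES (8700, 'jkl');
```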

MySQL Partitioning Error - Error Code : 1486

MySQL throwing error while creating partitions on table.
Error Code : 1486
Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed.
I have tried following query :
alter table test.tbl1
partition by range(unix_timestamp(sys_time))
(
PARTITION p20151001 VALUES LESS THAN (unix_timestamp('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (unix_timestamp('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (unix_timestamp('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (unix_timestamp('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (unix_timestamp('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (unix_timestamp('2016-03-01'))
);
How can I get around this problem?
Thanks in Advance
Reading here, it may be that you are using MySQL 5.1:
https://dev.mysql.com/tech-resources/articles/mysql_55_partitioning.html
Another pain point in MySQL 5.1 is the handling of date columns. You
can't use them directly, but you need to convert such columns using
either YEAR or TO_DAYS
If your column sys_time is a DATETIME, you don't need to convert it to a Unix timestamp in order to partition on it; TO_DAYS is enough, since you're not partitioning by year:
alter table test.tbl1
partition by range (TO_DAYS(sys_time))
(
PARTITION p20151001 VALUES LESS THAN (TO_DAYS('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (TO_DAYS('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (TO_DAYS('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (TO_DAYS('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (TO_DAYS('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (TO_DAYS('2016-03-01'))
);
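If sys_time is a TIMESTAMP, keep UNIX_TIMESTAMP(sys_time) in the partitioning expression: RANGE does not accept a bare TIMESTAMP column, and UNIX_TIMESTAMP() on a TIMESTAMP column is the one time zone-dependent expression the MySQL documentation permits. Error 1486 typically means the same expression was applied to a DATETIME column instead:
alter table test.tbl1
partition by range(unix_timestamp(sys_time))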
(
PARTITION p20151001 VALUES LESS THAN (unix_timestamp('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (unix_timestamp('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (unix_timestamp('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (unix_timestamp('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (unix_timestamp('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (unix_timestamp('2016-03-01'))
);
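Which of the two variants applies depends on the declared type of sys_time. One way to check it, using the schema and table names from the question:

```sql
SELECT COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'test'
  AND TABLE_NAME   = 'tbl1'
  AND COLUMN_NAME  = 'sys_time';
-- DATA_TYPE reads 'datetime' or 'timestamp' for the two cases discussed above.
```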

MySQL partitioning by range - error in statement?

I am trying to alter an existing table by adding partitioning, but I get SQL errors even though the statement looks like what the documentation describes.
Hopefully somebody can point out my mistake.
The table orders has a field called date_order_start, which is DATE, so it has no time information. This field has an index on it. This index is not unique and not part of another unique index.
I want to partition my table by using this statement:
ALTER TABLE orders
PARTITION BY RANGE (date_order_start) (
startpoint VALUES LESS THAN (0),
from20140701 VALUES LESS THAN ('2014-07-01'),
from20140801 VALUES LESS THAN ('2014-08-01'),
from20140901 VALUES LESS THAN ('2014-09-01'),
future VALUES LESS THAN MAXVALUE
);
Error....
Before I tried this:
ALTER TABLE orders
PARTITION BY RANGE (TO_DAYS(date_order_start)) (
startpoint VALUES LESS THAN (0),
from20140701 VALUES LESS THAN (TO_DAYS('2014-07-01')),
from20140801 VALUES LESS THAN (TO_DAYS('2014-08-01')),
from20140901 VALUES LESS THAN (TO_DAYS('2014-09-01')),
future VALUES LESS THAN MAXVALUE
);
But also got an error:
**Error Code: 1064**. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'from20140701 VALUES LESS THAN ('2014-07-01'), from20140801 VALUES LESS T' at line 4
Well.... that does not help.
Can anybody spot the error?
A variation without the startpoint partition didn't work either. I thought maybe the (0) was the problem.
I used these pages for information:
http://dev.mysql.com/tech-resources/articles/mysql_55_partitioning.html
http://dev.mysql.com/doc/refman/5.5/en/alter-table-partition-operations.html
I'm wondering if you're just missing the partition keyword:
ALTER TABLE orders
PARTITION BY RANGE (date_order_start) (
PARTITION startpoint VALUES LESS THAN (0),
PARTITION from20140701 VALUES LESS THAN ('2014-07-01'),
PARTITION from20140801 VALUES LESS THAN ('2014-08-01'),
PARTITION from20140901 VALUES LESS THAN ('2014-09-01'),
PARTITION future VALUES LESS THAN MAXVALUE
);
Also, is the VALUES LESS THAN (0) part really necessary?
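Beyond the missing keyword, plain RANGE expects an integer expression, so a DATE column is usually partitioned either through TO_DAYS() or with RANGE COLUMNS. Here is a sketch of the latter, assuming MySQL 5.5 or later and that any primary or unique key on orders includes date_order_start (otherwise MySQL rejects the statement). The startpoint partition is left out because 0 is not a date.

```sql
ALTER TABLE orders
PARTITION BY RANGE COLUMNS (date_order_start) (
  PARTITION from20140701 VALUES LESS THAN ('2014-07-01'),
  PARTITION from20140801 VALUES LESS THAN ('2014-08-01'),
  PARTITION from20140901 VALUES LESS THAN ('2014-09-01'),
  PARTITION future       VALUES LESS THAN (MAXVALUE)
);
```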

MySQL table partition by month

I have a huge table that stores many tracked events, such as a user click.
The table is already in the 10s of millions, and it's growing larger every day.
The queries are starting to get slower when I try to fetch events from a large timeframe, and after reading quite a bit on the subject I understand that partitioning the table may boost the performance.
What I want to do is partition the table on a per month basis.
I have only found guides that show how to partition manually each month, is there a way to just tell MySQL to partition by month and it will do that automatically?
If not, what is the command to do it manually considering my partitioned by column is a datetime?
As explained by the manual: http://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
This is easily possible by hash partitioning of the month output.
CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE)
ENGINE=INNODB
PARTITION BY HASH( MONTH(tr_date) )
PARTITIONS 6;
Do note that this only partitions by month and not by year, also there are only 6 partitions (so 6 months) in this example.
And for partitioning an existing table (manual: https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html):
ALTER TABLE ti
PARTITION BY HASH( MONTH(tr_date) )
PARTITIONS 6;
Querying can be done both from the entire table:
SELECT * from ti;
Or from specific partitions:
SELECT * from ti PARTITION (HASH(MONTH(some_date)));
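Note that the PARTITION clause of a SELECT takes partition names, not an expression. A sketch, assuming MySQL 5.6 or later and the default names p0 to p5 that MySQL gives the six HASH partitions of ti; the dates are arbitrary examples.

```sql
-- Read everything stored in one named partition.
SELECT * FROM ti PARTITION (p3);

-- A partition can hold more than one month, so filter on the date as well.
SELECT * FROM ti PARTITION (p3)
WHERE tr_date >= '2019-03-01' AND tr_date < '2019-04-01';
```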
CREATE TABLE `mytable` (
`post_id` int DEFAULT NULL,
`viewid` int DEFAULT NULL,
`user_id` int DEFAULT NULL,
`post_Date` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PARTITION BY RANGE (extract(year_month from `post_Date`))
(PARTITION P0 VALUES LESS THAN (202012) ENGINE = InnoDB,
PARTITION P1 VALUES LESS THAN (202104) ENGINE = InnoDB,
PARTITION P2 VALUES LESS THAN (202108) ENGINE = InnoDB,
PARTITION P3 VALUES LESS THAN (202112) ENGINE = InnoDB,
PARTITION P4 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
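MySQL does not create new RANGE partitions on its own, so a layout like this has to be extended by hand or by a scheduled job. One common approach is to split the MAXVALUE partition. In this sketch the new bound 202204 and the name P5 are only examples.

```sql
ALTER TABLE mytable
REORGANIZE PARTITION P4 INTO (
  PARTITION P4 VALUES LESS THAN (202204),
  PARTITION P5 VALUES LESS THAN MAXVALUE
);
```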
Be aware of the "lazy" effect when partitioning by hash.
As the docs say:
You should also keep in mind that this expression is evaluated each time a row is inserted or updated (or possibly deleted); this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.
The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value, as this allows for “pruning” on ranges of partitions. That is, the more closely that the expression varies with the value of the column on which it is based, the more efficiently MySQL can use the expression for hash partitioning.
For example, where date_col is a column of type DATE, then the expression TO_DAYS(date_col) is said to vary directly with the value of date_col, because for every change in the value of date_col, the value of the expression changes in a consistent manner. The variance of the expression YEAR(date_col) with respect to date_col is not quite as direct as that of TO_DAYS(date_col), because not every possible change in date_col produces an equivalent change in YEAR(date_col).
HASHing by month with 6 partitions means that two months a year will land in the same partition. What good is that?
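The collision is easy to see, since HASH partitioning places a row in partition MOD(expr, number_of_partitions). A quick check with arbitrary sample dates:

```sql
SELECT MOD(MONTH('2019-01-15'), 6) AS january,   -- 1
       MOD(MONTH('2019-07-15'), 6) AS july,      -- 1: same partition as January
       MOD(MONTH('2019-06-15'), 6) AS june,      -- 0
       MOD(MONTH('2019-12-15'), 6) AS december;  -- 0: same partition as June
```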
Don't bother partitioning, index the table.
Assuming these are the only two queries you use:
SELECT * from ti;
SELECT * from ti PARTITION (HASH(MONTH(some_date)));
then start the PRIMARY KEY with the_date.
The first query simply reads the entire table; no change between partitioned and not.
The second query, assuming you want a single month, not all the months that map into the same partition, would need to be
SELECT * FROM ti WHERE the_date >= '2019-03-01'
AND the_date < '2019-03-01' + INTERVAL 1 MONTH;
If you have other queries, let's see them.
(I have not found any performance justification for ever using PARTITION BY HASH.)

Error #1526 when partitioning table on mysql

Sorry, I don't know English, but I need help :(
I'm partitioning by LIST COLUMNS with an ALTER TABLE statement.
My table:
table member_list:
id int,
name varchar(255),
company varchar(255),
cell_phone varchar(20)
It has no key.
It currently holds more than 900,000 records. After inserting them, I tried to partition the table by LIST COLUMNS:
alter table member_list
partition by list columns(company)(
partition p1 values in ('Lavasoft','Cakewalk','Lycos'),
partition p2 values in ('Adobe','Vivoo','Apple Systems','Sibelius'),
partition p3 values in ('Finale','Borland','Macromedia','FPT'),
partition p4 values in ('Chami','Yahoo','Google','Altavista')
)
After running it:
#1526 - Table has no partition for value from column_list
MySQL returned this error, and I cannot find help on the Oracle pages. I hope you can help me. Thanks
#1526 - Table has no partition for value from column_list
The error message is telling you that there is a value in your data in one of the columns you have chosen for partitioning that is not accounted for in your defined partitions.
In this case, there is something in the "company" field that cannot be placed into any of the partitions. For instance, on some record, company="Blackberry." MySQL cannot put this row into any of your partitions.
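To find the offending values before running the ALTER TABLE, one could list every company that appears in none of the partition definitions. This sketch reuses the lists from the question. The NULL check is there because NOT IN never matches NULL, and a NULL company would also need a partition.

```sql
SELECT company, COUNT(*) AS row_count
FROM member_list
WHERE company IS NULL
   OR company NOT IN (
        'Lavasoft', 'Cakewalk', 'Lycos',
        'Adobe', 'Vivoo', 'Apple Systems', 'Sibelius',
        'Finale', 'Borland', 'Macromedia', 'FPT',
        'Chami', 'Yahoo', 'Google', 'Altavista')
GROUP BY company;
```

Each value returned has to be added to a VALUES IN list (or the rows cleaned up) before the partitioning statement can succeed.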
Plain LIST partitioning accepts only integer values, but LIST COLUMNS, which you are using, also accepts string columns such as varchar, so the column type is not the problem here. Also note that if the table has a primary or unique key, every such key must include the partitioning column.