Is it possible to partition a table using 2 columns instead of only 1 for the partition function?
Consider a table with 3 columns:
ID (int, primary key),
Date (datetime),
Num (int)
I want to partition this table by 2 columns: Date and Num.
This is what I do to partition a table using 1 column (date):
CREATE PARTITION FUNCTION PFN_MonthRange (datetime)
AS
RANGE LEFT FOR VALUES ('2009-11-30 23:59:59:997',
'2009-12-31 23:59:59:997',
'2010-01-31 23:59:59:997',
'2010-02-28 23:59:59:997',
'2010-03-31 23:59:59:997')
GO
Bad News: The partition function has to be defined on a single column.
Good News: That single column could be a persisted computed column that is a combination of the two columns you're trying to partition by.
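As a sketch of that idea, assume Num is a non-negative integer below 10,000 (an assumption, not stated in the question). The two columns can then be packed into one integer that sorts first by month and then by Num, so a single RANGE partition function over that value behaves like partitioning by both. In SQL Server the expression would live in a PERSISTED computed column; the Python below only models the key and the RANGE LEFT lookup, and the names `combined_key`, `BOUNDARIES` and `partition_number` are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime

NUM_SLOTS = 10_000  # assumption: 0 <= Num < 10,000

def combined_key(d: datetime, num: int) -> int:
    """Pack (year, month, Num) into one int that sorts by month first, then by Num.

    Day and time of day are dropped, so the granularity is one month.
    """
    if not 0 <= num < NUM_SLOTS:
        raise ValueError("Num is outside the assumed range")
    return (d.year * 100 + d.month) * NUM_SLOTS + num

# Hypothetical RANGE LEFT boundaries: everything up to Nov 2009, then Dec 2009
# split into Num <= 499 and Num >= 500, then Jan 2010 as one partition.
BOUNDARIES = [
    combined_key(datetime(2009, 11, 1), NUM_SLOTS - 1),
    combined_key(datetime(2009, 12, 1), 499),
    combined_key(datetime(2009, 12, 1), NUM_SLOTS - 1),
    combined_key(datetime(2010, 1, 1), NUM_SLOTS - 1),
]

def partition_number(d: datetime, num: int) -> int:
    """1-based partition number, RANGE LEFT style (a boundary value belongs to the left)."""
    return bisect_left(BOUNDARIES, combined_key(d, num)) + 1
```

Putting the month in the high-order digits is what keeps whole months contiguous, so boundaries can still be laid out month by month while splitting a busy month further by Num.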
I found this to be an easier solution:
select ROW_NUMBER() over (partition by CHECKSUM(value,ID) order by SortOrder) as Row From your_table
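Note that this PARTITION BY belongs to the ROW_NUMBER() window function: it restarts the numbering for each group of rows and does not change how the table is stored. A window PARTITION BY also accepts several columns directly (PARTITION BY value, ID), so CHECKSUM is not needed, and CHECKSUM can even merge two distinct pairs into one group if their checksums collide. The Python sketch below models ROW_NUMBER() OVER (PARTITION BY value, ID ORDER BY SortOrder); the helper `row_number` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def row_number(rows, partition_cols, order_col):
    """Mimic ROW_NUMBER() OVER (PARTITION BY <partition_cols> ORDER BY <order_col>)."""
    key = itemgetter(*partition_cols)
    # Sort by the partition key, then by the ORDER BY column, so each group is contiguous.
    ordered = sorted(rows, key=lambda r: (key(r), r[order_col]))
    result = []
    for _, group in groupby(ordered, key=key):
        # Numbering restarts at 1 for every (value, ID) group.
        for n, row in enumerate(group, start=1):
            result.append({**row, "Row": n})
    return result

rows = [
    {"value": "a", "ID": 1, "SortOrder": 2},
    {"value": "a", "ID": 1, "SortOrder": 1},
    {"value": "a", "ID": 2, "SortOrder": 5},
    {"value": "b", "ID": 1, "SortOrder": 3},
]
numbered = row_number(rows, ["value", "ID"], "SortOrder")
```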
Natively, no, you cannot partition by two columns in SQL Server.
There are a few things you could do. One is to have a lookup table that maps each value to an arbitrary integer (its partition), but you only have 1,000 partitions at most (the limit in SQL Server 2008 and earlier; SQL Server 2012 raised it to 15,000), so different values are going to start sharing the same partition. The computed-column approach suffers from the same problem: with a 1k partition limit, chances are you will blow it.
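To see how quickly that limit is reached, here is the arithmetic as a small Python sketch. The figures (five years of monthly data and 20 distinct Num values) are made up for illustration; with one partition per (month, Num) combination, the count is simply the product of the two.

```python
def partitions_needed(months: int, distinct_nums: int) -> int:
    # One partition per (month, Num) combination.
    return months * distinct_nums

LIMIT = 1_000  # the per-table partition limit cited above

needed = partitions_needed(months=5 * 12, distinct_nums=20)
print(needed, needed > LIMIT)
```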
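I would probably just stick to a date partition and use RANGE RIGHT with a boundary on the 1st of each month, instead of RANGE LEFT with a boundary on the last instant of each month. First-of-month boundaries do not depend on month lengths or on the 23:59:59.997 precision of datetime.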
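A small Python model of the two boundary conventions may make the difference concrete: under RANGE LEFT a boundary value belongs to the partition on its left, under RANGE RIGHT to the one on its right, which corresponds to `bisect_left` versus `bisect_right`. The boundaries are a subset of the question's. The 23:59:59.998 value is hypothetical: it cannot occur in a SQL Server datetime column (precision of about 3.33 ms), but it can in datetime2, which is why last-instant boundaries are fragile.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# RANGE LEFT boundaries: the last instant of each month (datetime precision).
LEFT_BOUNDS = [
    datetime(2009, 11, 30, 23, 59, 59, 997000),
    datetime(2009, 12, 31, 23, 59, 59, 997000),
]
# RANGE RIGHT boundaries: the first instant of the next month.
RIGHT_BOUNDS = [datetime(2009, 12, 1), datetime(2010, 1, 1)]

def range_left(value: datetime) -> int:
    """1-based partition number; a value equal to a boundary goes to the LEFT partition."""
    return bisect_left(LEFT_BOUNDS, value) + 1

def range_right(value: datetime) -> int:
    """1-based partition number; a value equal to a boundary goes to the RIGHT partition."""
    return bisect_right(RIGHT_BOUNDS, value) + 1

late_nov = datetime(2009, 11, 30, 23, 59, 59, 997000)
edge = datetime(2009, 11, 30, 23, 59, 59, 998000)  # only possible with finer precision
```

With RANGE LEFT, `edge` falls into the December partition even though it is still November 30; with first-of-month RANGE RIGHT boundaries it stays in November.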
What do you intend to gain from the second partition value?
I couldn't find an example like mine, so here's the thing:
I have a big data set that I need to aggregate on top of.
We're talking about ~500M rows with a date field ranging from 2 years ago until now.
My first instinct was to partition the table by this field (one partition per month), which leaves roughly 20M rows per partition.
Then I have indexes on the other fields I will aggregate/group by.
Here's my table definition (simplified for brevity's sake):
create table t1(
date_field datetime not null,
additional_id int not null,
category_id int not null,
value_field1 double,
value_field2 double,
primary key(additional_id,date_field)
)
ENGINE=InnoDB
PARTITION BY RANGE(YEAR(date_field)*100 + MONTH(date_field)) (
PARTITION p_201411 VALUES LESS THAN (201411),
PARTITION p_201412 VALUES LESS THAN (201412),
#all the partitions until the current month...
PARTITION p_201610 VALUES LESS THAN (201610),
PARTITION p_201611 VALUES LESS THAN (201611),
PARTITION p_catchall VALUES LESS THAN MAXVALUE );
If I query for a single date, EXPLAIN PARTITIONS shows that only the partition for that month is used, for example with this query:
select value_field1 from t1 where additional_id=x and date_field='2014-11-05'
However, if I use a date range (even one inside a single partition), all partitions are scanned:
select value_field1 from t1 where additional_id=x and date_field > '2014-11-05' and date_field < '2014-11-10'
(Same result if I use BETWEEN.)
What am I missing here? Is this really the right way to partition this table?
Thanks in advance
Short answer: Do not use complex expressions for PARTITION BY RANGE.
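Long answer: (Setting aside my complaints about how BY RANGE partitioning copes with range queries in general.) For a range condition, MySQL can prune partitions only when the partitioning expression is the bare column or one of a few functions the optimizer knows how to handle, such as YEAR(), TO_DAYS() and TO_SECONDS(). With an arithmetic expression like YEAR(date_field)*100 + MONTH(date_field), it gives up and scans every partition.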
Instead, do this:
PARTITION BY RANGE (TO_DAYS(date_field)) (
PARTITION p_201411 VALUES LESS THAN (TO_DAYS('2014-11-01')),
...
PARTITION p_catchall VALUES LESS THAN MAXVALUE ); -- unchanged
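Newer versions of MySQL (5.5 and later) also offer PARTITION BY RANGE COLUMNS(date_field), which lets you write the boundaries as plain literals such as VALUES LESS THAN ('2014-11-01').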
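Why a monotonic partitioning function helps can be sketched in Python: if f(d) never decreases as d grows, every row with lo <= d <= hi lies in one contiguous run of partitions, found from f(lo) and f(hi) alone. `to_days` below reproduces MySQL's TO_DAYS() for modern dates (it matches the manual's example TO_DAYS('2007-10-07') = 733321); the boundaries and the `prune` helper are hypothetical. YEAR()*100 + MONTH() is monotonic as well; the point is that MySQL only applies range pruning to the functions it recognizes, not to arbitrary expressions.

```python
from bisect import bisect_right
from datetime import date

def to_days(d: date) -> int:
    """MySQL TO_DAYS() equivalent for Gregorian dates: Python's ordinal plus 365."""
    return d.toordinal() + 365

# Hypothetical VALUES LESS THAN boundaries; the index after the last one is MAXVALUE.
BOUNDS = [to_days(date(2014, 11, 1)), to_days(date(2014, 12, 1)), to_days(date(2015, 1, 1))]

def partition_index(d: date) -> int:
    """0-based partition for d: partition i covers BOUNDS[i-1] <= TO_DAYS(d) < BOUNDS[i]."""
    return bisect_right(BOUNDS, to_days(d))

def prune(lo: date, hi: date) -> list[int]:
    """Partitions that can hold rows with lo <= date_field <= hi.

    Only valid because TO_DAYS is monotonic: the matching partitions are contiguous.
    """
    return list(range(partition_index(lo), partition_index(hi) + 1))
```

For the question's range, `prune(date(2014, 11, 5), date(2014, 11, 10))` returns a single partition.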
If this is your typical query:
additional_id=x and date_field > '2014-11-05' and date_field < '2014-11-10'
then partitioning is no faster than the equivalent non-partitioned table. You already have the perfect index for the non-partitioned version: PRIMARY KEY(additional_id, date_field).
If, on the other hand, you are DROPping old partitions when they 'expire', the PARTITIONing is excellent.
25 partitions is good.
A side note: additional_id int is limited to 2 billion, so you are 1/4 of the way to overflowing. INT UNSIGNED would get you to 4 billion; you might consider an ALTER. (Of course, I don't know whether additional_id is unique in this table; so maybe it is not an issue.)
I'm trying to optimize my MySQL DB so I can query it as quickly as possible.
It goes like this:
My DB consists of one table that has (for now) about 18 million rows, and it is growing rapidly.
This table has the following columns: idx, time, tag_id, x, y, z.
No column has any null values.
'idx' is an INT(11) AUTO_INCREMENT primary key column. Right now it's in ascending order.
'time' is a DATETIME column. It's also ascending. 50% of the 'time' values in the table are distinct (the rest probably appear two or three times at most).
'tag_id' is an INT(11) column. It's not ordered in any way, and there are between 30 and 100 distinct tag_id values spread over the whole table. It's also a foreign key to another table.
INSERT -
A new row is inserted into the table every 2-3 seconds. 'idx' is generated by the server (AUTO_INCREMENT). Since the 'time' column represents the time the row was inserted, every new 'time' value is greater than or equal to that of the previous row. The other column values have no particular order.
SELECT -
Here is an example of a typical query:
"select x, y, z, time from table where date(time) between '2014-08-01' and '2014-10-01' and tag_id = 123456"
So 'time' and 'tag_id' are the only columns that appear in the WHERE clause, and both of them will ALWAYS appear in the WHERE clause of every query. 'x', 'y', 'z' and 'time' will always appear in the SELECT list; 'tag_id' might sometimes appear there too.
The queries usually look for more recent times rather than older ones, meaning later rows in the table are searched more often.
INDEXES -
Right now, 'idx', being the PK, is the clustered ASC index. 'time' also has a non-clustered (secondary) ASC index.
That's it. Considering all this data, a typical query returns results in around 30 seconds. I'm trying to lower this time. Any advice?
I'm thinking about changing one or both of the indexes from ASC to DESC (since the higher values are searched more often). If I change 'idx' to DESC, it will physically reverse the whole table; if I change 'time' to DESC, it will reverse the 'time' index tree. But since this is an 18-million-row table, changes like this might take the server a long time, so I want to be sure it's a good idea. The question is: if I reverse the order and a new row is inserted, will the server know to put it at the beginning of the table quickly, or will it search the table for the right place every time? And will putting a new row at the beginning of the table mean that some kind of data shifting has to be done to the whole table every time?
Or maybe I just need a different indexing technique?
Any ideas are very welcome. Thanks!
select x, y, z, time from table
where date(time) between '2014-08-01' and '2014-10-01' and tag_id = 123456
Putting a column inside a function call like date(time) spoils any chance of using an index for that column. You must use only a bare column for comparison, if you want to use an index.
So if you want to compare it to dates, you should store a DATE column. If you have a DATETIME column, you may have to use a search term like this:
WHERE `time` >= '2014-08-01 00:00:00' AND `time` < '2014-10-02 00:00:00' ...
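Converting an inclusive date range into that half-open datetime range is mechanical: keep the start at midnight and move the end to midnight of the day after the last day. The Python sketch below (the helper name `sargable_bounds` is hypothetical) builds the bounds and renders the WHERE clause; the test checks that they select the same rows as date(time) BETWEEN ... on a few boundary timestamps.

```python
from datetime import date, datetime, time, timedelta

def sargable_bounds(first_day: date, last_day: date) -> tuple[datetime, datetime]:
    """Turn `date(t) BETWEEN first_day AND last_day` into `start <= t < end`."""
    start = datetime.combine(first_day, time.min)
    # Midnight of the following day, so the whole of last_day is included.
    end = datetime.combine(last_day + timedelta(days=1), time.min)
    return start, end

start, end = sargable_bounds(date(2014, 8, 1), date(2014, 10, 1))
where = f"WHERE `time` >= '{start:%Y-%m-%d %H:%M:%S}' AND `time` < '{end:%Y-%m-%d %H:%M:%S}'"
print(where)
```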
Also, you should use multi-column indexes where you can. Put columns used in equality conditions first, then one column used in range conditions. For more on this rule, see my presentation "How to Design Indexes, Really."
You may also benefit from adding columns that are not used for searching, so that the query can retrieve the columns from the index entry alone. Put these columns following the columns used for searching or sorting. This is called an index-only query.
So for this query, your index should be:
ALTER TABLE `this_table` ADD INDEX (tag_id, `time`, x, y, z);
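To see why the equality column must come first, model the index as a sorted list of (tag_id, time, x, y, z) tuples, which is roughly how a B-tree orders its entries. All entries for one tag_id within a time range are then adjacent, so two binary searches find the slice. Because x, y and z are stored in the entry, the rows can be returned without visiting the table. This is a simplified Python model, not how InnoDB is implemented; the sample rows and the `range_scan` helper are made up.

```python
from bisect import bisect_left
from datetime import datetime

# Index entries sorted by (tag_id, time, x, y, z), as in INDEX (tag_id, `time`, x, y, z).
index = sorted([
    (123456, datetime(2014, 7, 31, 12, 0), 1.0, 2.0, 3.0),
    (123456, datetime(2014, 8, 15, 9, 30), 4.0, 5.0, 6.0),
    (123456, datetime(2014, 9, 30, 18, 0), 7.0, 8.0, 9.0),
    (123456, datetime(2014, 10, 2, 0, 0), 1.5, 2.5, 3.5),
    (777, datetime(2014, 8, 20, 8, 0), 0.0, 0.0, 0.0),
])

def range_scan(tag_id: int, start: datetime, end: datetime):
    """Return (x, y, z, time) for tag_id and start <= time < end, using only the index."""
    # (tag_id, start) sorts just before any entry with that exact tag and time.
    lo = bisect_left(index, (tag_id, start))
    hi = bisect_left(index, (tag_id, end))
    return [(x, y, z, t) for _, t, x, y, z in index[lo:hi]]
```

If time came first instead, the entries for one tag_id would be scattered across the whole time range and could not be isolated with two searches.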
Regarding ASC versus DESC: the syntax accepts a direction for each index column, but in the two most popular storage engines used in MySQL, InnoDB and MyISAM, it makes no difference (before MySQL 8.0, DESC in an index definition is parsed but ignored; 8.0 added true descending indexes). Either direction of sorting can use either type of index more or less equally well.
I want to partition a table in MySQL while preserving the table's structure.
I have a column, 'Year', based on which I want to split up the table into different tables for each year respectively. The new tables will have names like 'table_2012', 'table_2013' and so on. The resultant tables need to have all the fields exactly as in the source table.
I have tried the following two pieces of SQL script with no success:
1.
CREATE TABLE all_data_table
( column1 int default NULL,
column2 varchar(30) default NULL,
column3 date default NULL
) ENGINE=InnoDB
PARTITION BY RANGE ((year))
(
PARTITION p0 VALUES LESS THAN (2010),
PARTITION p1 VALUES LESS THAN (2011) , PARTITION p2 VALUES LESS THAN (2012) ,
PARTITION p3 VALUES LESS THAN (2013), PARTITION p4 VALUES LESS THAN MAXVALUE
);
2.
ALTER TABLE all_data_table PARTITION BY RANGE COLUMNS (`year`) (
PARTITION p0 VALUES LESS THAN (2011),
PARTITION p1 VALUES LESS THAN (2012),
PARTITION p2 VALUES LESS THAN (2013),
PARTITION p3 VALUES LESS THAN (MAXVALUE)
);
Any assistance would be appreciated!
This is old, but seeing as it comes up highly ranked in partitioning searches, I figured I'd give some additional details for people who might hit this page. What you are talking about in having a table_2012 and table_2013 is not "MySQL Partitioning" but "Manual Partitioning".
Partitioning means that you have one "logical table" with a single table name, which--behind the scenes--is divided among multiple files. When you have millions to billions of rows, over years, but typically you are only searching a single month, partitioning by Year/Month can have a great performance benefit because MySQL only has to search against the file that contains the Year/Month that you are searching for...so long as you include the partition key in your WHERE.
When you create multiple tables like table_2012 and table_2013, you are MANUALLY partitioning the tables, which you don't do with the MySQL PARTITION configuration. To manually partition the tables, during 2012, you put all data into the 2012 table. When you hit 2013, you start putting all the data into the 2013 table. You have to make sure to create the table before you hit 2013 or it won't have any place to go. Then, when you query across the years (e.g. from Nov 2012 - Jan 2013), you have to do a UNION between table_2012 and table_2013.
SELECT * FROM table_2012 WHERE #...
UNION
SELECT * FROM table_2013 WHERE #...
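With manual partitioning, the application has to work out which yearly tables a query touches and stitch the SELECTs together itself. Here is a Python sketch of that bookkeeping, assuming the table_YYYY naming from the question and filtering on its date column column3; the helpers `tables_for_range` and `union_query` are hypothetical. It uses UNION ALL, which, unlike UNION, does not spend time removing duplicates across the yearly tables.

```python
from datetime import date

def tables_for_range(start: date, end: date) -> list[str]:
    """Yearly tables (table_YYYY) that may hold rows between start and end inclusive."""
    return [f"table_{year}" for year in range(start.year, end.year + 1)]

def union_query(start: date, end: date) -> str:
    """Build one SELECT per yearly table and glue them together with UNION ALL."""
    where = f"WHERE column3 BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
    selects = [f"SELECT * FROM {t} {where}" for t in tables_for_range(start, end)]
    return "\nUNION ALL\n".join(selects)

print(union_query(date(2012, 11, 1), date(2013, 1, 31)))
```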
With partitioning, this manual work is not necessary. You do the initial setup of the partitions, then you treat it as a single table. No unions required, no checking the date before you insert, etc. This makes life much easier. MySQL figures out which partitions it needs to query. However, you MUST make sure to query against the Year column or it will have to scan ALL files. E.g. SELECT * FROM all_data_table WHERE Month=12 will scan all partitions for Month=12. To ensure you are only scanning the partition files you need, include the partition column in every query that you can.
Possible negatives to partitioning: if you have billions of rows and you do an ALTER TABLE on the table to, say, add a column, it's going to have to update every row, which takes a VERY long time. At the company I currently work for, the boss doesn't think it's worth the time it takes to rewrite a billion historical rows just to add a column that only matters going forward, so this is one of the reasons we do manual partitioning instead of letting MySQL do it.
DISCLAIMER: I am not an expert at partitioning...so if I'm wrong in any of this, please let me know and I'll fix the incorrect parts.
From what I see, you want to create many tables from one big table.
I think you should try creating views instead.
From what I've read about partitioning, it splits the physical storage of the table into separate pieces, but from the top you still see it as a single table.
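Here is a sketch of the view approach, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere (CREATE VIEW ... AS SELECT ... WHERE is essentially the same statement in MySQL). It assumes all_data_table has a `year` column, as the question describes; the sample rows are made up. Each view behaves like one of the table_YYYY tables the question asks for, while the data stays in a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_data_table (column1 INT, column2 VARCHAR(30), column3 DATE, year INT)"
)
conn.executemany(
    "INSERT INTO all_data_table VALUES (?, ?, ?, ?)",
    [(1, "a", "2012-03-01", 2012), (2, "b", "2012-07-15", 2012), (3, "c", "2013-01-09", 2013)],
)

# One view per year; the year comes from this loop, not from user input.
for year in (2012, 2013):
    conn.execute(f"CREATE VIEW table_{year} AS SELECT * FROM all_data_table WHERE year = {year}")

rows_2012 = conn.execute("SELECT column1, column2 FROM table_2012 ORDER BY column1").fetchall()
print(rows_2012)
```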
I have a table with an integer column ranging from 1 to 32 (this column identifies the type of record stored).
Types 5 and 12 represent 70% of the total number of rows, and this number is greater than 1M rows, so it seems to make sense to partition the table.
Question is: how can I create a set of 3 partitions, one containing the type 5 records, the second containing the type 12 records, and the third one with the remaining records?
http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html
create table some_table (
id INT NOT NULL,
some_id INT NOT NULL
)
PARTITION BY LIST(some_id) (
PARTITION fives VALUES IN (5),
PARTITION twelves VALUES IN (12),
PARTITION rest VALUES IN (1,2,3,4,6,7,8,9,10,11,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)
);
Use PARTITION BY LIST.
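How LIST routing works can be modelled in a few lines of Python. Each value maps to exactly one named partition, and a value that appears in no list is rejected (MySQL refuses such an INSERT with an error rather than guessing). The sketch also generates the long "rest" list instead of typing it out; the mapping mirrors the DDL above, and the names `PARTITIONS`, `ROUTE` and `partition_for` are hypothetical.

```python
TYPES = range(1, 33)  # the type column holds 1..32
PARTITIONS = {
    "fives": [5],
    "twelves": [12],
    "rest": [t for t in TYPES if t not in (5, 12)],
}
# Invert to value -> partition name; in LIST partitioning a value belongs to one list only.
ROUTE = {v: name for name, values in PARTITIONS.items() for v in values}

def partition_for(type_value: int) -> str:
    if type_value not in ROUTE:
        raise ValueError(f"no partition for value {type_value}")
    return ROUTE[type_value]

# The VALUES IN (...) list for the catch-all partition.
print(", ".join(map(str, PARTITIONS["rest"])))
```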
Provided that the type column is indexed, MySQL has effectively already partitioned the table logically for you. Unless you really need physical partitioning, it seems to me you are only making trouble for yourself.
I'm trying to implement a partitioning strategy for a MySQL 5.5 (InnoDB) table, and I'm not sure whether my understanding is right or whether I need to change the syntax I used to create the partitions.
Table "Apple" has 10 million rows, with columns "A" to "H".
The PK is columns "A", "B" and "C".
Column "A" is a char column and can identify groups of 2 million rows.
I thought column "A" would be a nice candidate to build a partition around, since I select and delete by this column and could simply truncate the partition when the data is no longer needed.
I issued this command:
ALTER TABLE Apple
PARTITION BY KEY (A);
After looking at the partition info using this command:
SELECT PARTITION_NAME, TABLE_ROWS FROM
INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'Apple';
I see all the data is in partition p0.
Am I wrong in thinking that MySQL was going to split the data into partitions of 2 million rows automagically?
Did I need to specify the number of partitions in the Alter command?
I was hoping this would create groups of 2 million rows in a partition and then create a new partition as new data comes in with a unique value for column "A".
Sorry if this was too wordy.
Thanks - JeffSpicoli
Yes, you need to specify the number of partitions, for example ALTER TABLE Apple PARTITION BY KEY (A) PARTITIONS 5 (if you omit the PARTITIONS clause, the default is 1 partition). PARTITION BY KEY uses an internal hashing function (http://dev.mysql.com/doc/refman/5.1/en/partitioning-key.html), so the partition is selected not from the column's value directly but from a hash computed from it. A hash function returns the same result for the same input, so yes, all rows with the same value will be in the same partition.
But maybe you want to partition by RANGE if you want to be able to DROP PARTITION: with KEY partitioning you only know that the rows are spread roughly evenly across the partitions, and many different values end up in the same partition. (Since "A" is a CHAR column, in MySQL 5.5 that means RANGE COLUMNS or LIST COLUMNS, which accept string columns.)
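A Python model of hash-style routing shows both points. With a single partition everything lands in p0, and with several partitions each value always lands in the same place, but unrelated values share partitions. zlib.crc32 is only a stand-in here; MySQL's KEY partitioning uses its own internal hash, so real partition numbers will differ. The helper `key_partition` and the sample values are hypothetical.

```python
import zlib

def key_partition(value: str, num_partitions: int) -> int:
    """Stand-in for PARTITION BY KEY: hash the value, then take it modulo the partition count."""
    return zlib.crc32(value.encode("utf-8")) % num_partitions

values = ["AAA", "BBB", "CCC", "DDD", "EEE"]

one = {v: key_partition(v, 1) for v in values}   # the default of a single partition
four = {v: key_partition(v, 4) for v in values}  # e.g. ... PARTITIONS 4
print(one)
print(four)
```

With five values and four partitions, at least two different values must share a partition, so dropping one partition would remove more than one value's rows.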