Partitioning table on YEAR and create view in MYSQL - mysql

I have 2 problems with a partitioned table in mysql.
My table has three columns
id_row INT NOT NULL AUTO_INCREMENT
name_element VARCHAR(45) NULL
date_element DATETIME NOT NULL
I modified the table to apply partitioning by range on YEAR(date_element) as follows:
ALTER TABLE `orderslist`
PARTITION BY RANGE(YEAR(date_element))
PARTITIONS 5(
PARTITION part_2013 VALUES LESS THAN (2014),
PARTITION part_2014 VALUES LESS THAN (2015),
PARTITION part_2015 VALUES LESS THAN (2016),
PARTITION part_2016 VALUES LESS THAN (2017),
PARTITION part_2017 VALUES LESS THAN (MAXVALUE));
but when I use
EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE YEAR(date_element) > '2015';
the query uses all the partitions and not only part_2015, part_2016 and part_2017.
Instead if I use
EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE date_element > '2015-10-10 10:00:00';
it works.
So my questions are:
How can I make the first query work?
Is there a way to create a materialized view from this table without losing the partitions?
Thank you

In your first example: EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE YEAR(date_element) > '2015'; there's no way for the engine to identify beforehand in which partition your data is.
It must evaluate YEAR(date_element) for every row to find out the year. It's a classic example of filtering by a function's result. DBMSs in general can't use indexes to find data this way, since the function's result is unknown and must be evaluated for every row, so your search turns into a full scan.
I understand your point here, since you used the same function to define the partitioning and to find data, but for some reason this optimization is not there. In other words: the engine doesn't notice both functions are the same.
In the second statement, you're comparing a column directly to a constant value; this is what the engine prefers, and indexes come into play.
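One way to get pruning for the first query, as a sketch rather than a guaranteed fix, is to express the year condition as a range on date_element itself, which is the form the second query in the question already uses. YEAR(date_element) > 2015 means 2016 or later, so the boundary literal below is derived from that; which partitions EXPLAIN actually lists may vary by server version.

```sql
-- Same rows as YEAR(date_element) > 2015, but written as a range on the
-- bare column so the optimizer can compare it with the partition bounds.
EXPLAIN PARTITIONS
SELECT *
FROM orderslist
WHERE date_element >= '2016-01-01 00:00:00';
```

On MySQL 5.7 and later, plain EXPLAIN already shows the partitions column, and the EXPLAIN PARTITIONS form was removed in 8.0.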

MySQL's PARTITIONing is quite finicky. Whereas YEAR() is recognized, it is probably recognized only in some kinds of expressions; with > it plays dumb.
Why are you partitioning on YEAR? It may not be useful.
If your queries are like what you described, then an appropriate index on a non-partitioned table is likely to run just as fast.
Please provide the important queries and SHOW CREATE TABLE (with or without partitioning) so we can analyze what makes the most sense.
Also, what is PARTITIONS 5??
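The non-partitioned alternative mentioned above could look roughly like this. It is only a sketch: it assumes the table is currently partitioned as in the question, and idx_date_element is a hypothetical index name.

```sql
-- Turn the table back into an ordinary, non-partitioned table.
ALTER TABLE orderslist REMOVE PARTITIONING;

-- Index the column used in the range filter (index name is hypothetical).
ALTER TABLE orderslist ADD INDEX idx_date_element (date_element);

-- Check whether the range query can now use the index.
EXPLAIN
SELECT *
FROM orderslist
WHERE date_element > '2015-10-10 10:00:00';
```

Whether the optimizer actually picks the index depends on how many rows the range matches, so it is worth comparing both layouts with realistic data.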

Related

MySQL: How avoid all partitions scan (year-based) when doing ID lookup?

If I have a table partitioned by year, how do I avoid scanning all partitions when I have to look up a row by its ID and can't use partition pruning in the lookup query?
CREATE TABLE part_table (
id bigint NOT NULL auto_increment,
moment datetime NOT NULL,
KEY (id),
KEY (moment)
)-- partitioning information (in years)
PARTITION BY RANGE( YEAR(moment) ) (
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION pFuture VALUES LESS THAN (maxvalue) )
;
With e.g. lookup query:
SELECT * FROM part_table WHERE ID = <nr>
Don't you want PRIMARY KEY(id, moment) or PRIMARY KEY(moment, id) instead of INDEX(id)?
Indexes are partitioned. Each partition is essentially a "table". It has a BTree for the data and PK, and a BTree for each secondary index.
So, to find id=123 requires checking INDEX(id) in each partition. Herein lies one of the reasons why a PARTITIONed table is sometimes slower than the equivalent non-partitioned table.
It is inefficient to pre-create future partitions (other than one).
Show us the main queries you have. I will probably explain why you should not partition the table. I see two possible benefits in your definition:
Dropping 'old' data is much faster than DELETEing it.
WHERE something-else AND moment BETWEEN ...
Some cases
For this discussion, I am assuming partitioning by a datetime in some fashion (BY RANGE(TO_DAYS(moment)) or BY ... (YEAR(moment)), etc).
WHERE id BETWEEN 111 and 222
Partitioning probably hurts slightly because, regardless of what indexes are available, the query must look in every partition.
WHERE id BETWEEN 111 and 222
AND moment > NOW() - INTERVAL 1 MONTH
with some index starting with `id`
This is a case where partition "pruning" is beneficial. It will look in one or two partitions (depending on whether or not the query is being run in January). Then it will somewhat efficiently use the index to lookup by id.
Now let me discuss two flavors of an index starting with id (assuming either of the WHERE clauses above):
PRIMARY KEY(id, moment)
The PK is "clustered" with the data. That is, the data is sorted by first id then moment. Hence the id BETWEEN... will find the rows consecutively in the BTree -- this is the most efficient. The AND moment... works to filter out some of the rows.
INDEX(id)
is not "clustered". It is a secondary index. Secondary indexes take extra steps: (1) search the secondary BTree for the ids, but without filtering by moment; (2) reach into the data BTree using the artificial PK that was provided for you; (3) now the filtering by moment can happen. More steps, more blocks to read, etc.
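As a sketch of the first flavor, the question's table could be declared with the composite primary key instead of KEY (id). This assumes InnoDB and keeps the partition layout from the question unchanged.

```sql
CREATE TABLE part_table (
  id BIGINT NOT NULL AUTO_INCREMENT,
  moment DATETIME NOT NULL,
  -- Clustered key: rows are stored by id, then moment. It also contains
  -- the partitioning column, which MySQL requires of every unique key.
  PRIMARY KEY (id, moment),
  KEY (moment)
)
PARTITION BY RANGE (YEAR(moment)) (
  PARTITION p2020 VALUES LESS THAN (2021),
  PARTITION p2021 VALUES LESS THAN (2022),
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION p2025 VALUES LESS THAN (2026),
  PARTITION pFuture VALUES LESS THAN (MAXVALUE)
);

-- The second WHERE clause discussed above: the range on moment allows
-- pruning, and the range on id walks the clustered key within each
-- remaining partition.
SELECT *
FROM part_table
WHERE id BETWEEN 111 AND 222
  AND moment > NOW() - INTERVAL 1 MONTH;
```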
DROP PARTITION p2020
is much faster and less invasive than DELETE ... WHERE moment < '2021-01-01'.
More
It is important to look at all the main queries. X=constant versus X BETWEEN... can make a big difference in optimization; please provide concrete examples that are realistic for your app.
Also, sometimes a "covering" index can make up for otherwise inefficient indexes. So those examples need to show all the columns in the important queries. And what datatypes they are.
In the absence of such details, I will make the following broad statements (which might be invalidated by the specifics):
If the WHERE references only one column, the PARTITIONing is probably never beneficial.
If the WHERE has one = test and one 'range' test, there is probably a composite index that will work much better than partitioning.
Partitioning may shine when there are two range tests, but only if 'pruning' can be applied. (There are a lot of limitations on pruning.)
With 2 ranges, the one that is not being pruned on should be at the beginning of the PRIMARY KEY.
When pruning is used but the rest of the WHERE cannot use some index, that implies a scan of the partition. If there are only a few partitions, that could be a big scan.
Don't pre-build more than one partition. When not pruning, it is somewhat costly to open all the partitions only to find some are empty.
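The two maintenance points above (dropping old data, and keeping only one pre-built future partition) might be handled along these lines. This is a sketch against the part_table definition from the question; p2026 is a hypothetical partition name.

```sql
-- Remove the oldest year as a whole partition instead of deleting
-- its rows one by one.
ALTER TABLE part_table DROP PARTITION p2020;

-- Shortly before 2026 begins, split the catch-all partition so that
-- there is still exactly one "future" partition afterwards.
ALTER TABLE part_table REORGANIZE PARTITION pFuture INTO (
  PARTITION p2026 VALUES LESS THAN (2027),
  PARTITION pFuture VALUES LESS THAN (MAXVALUE)
);
```

Reorganizing pFuture is cheap while it is still empty, which is one reason to run it on a schedule before the new year's rows arrive.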

MySQL - Access Partitions Through Views

I've created a view on partitioned table. When I pass the partitioned column to the SELECT statement of view, the optimizer is not going to that particular partition when checked through EXPLAIN statement.
Is there any way to make the view access a single partition of its table?
[Edit] : Here is how I created the view on two partitioned tables
CREATE TABLE Partition1 (ID INT,NAME VARCHAR(100),DOB DATE)
PARTITION BY LIST (YEAR(DOB))
(
PARTITION P_2000 VALUES IN (2000),
PARTITION P_2001 VALUES IN (2001)
);
CREATE TABLE NOPART (ID INT,DOB DATE)
PARTITION BY LIST (YEAR(DOB))
(
PARTITION P_2000 VALUES IN (2000),
PARTITION P_2001 VALUES IN (2001)
);
CREATE OR REPLACE VIEW P_VIEW
AS
SELECT ID,DOB
FROM PARTITION1
UNION
SELECT ID,DOB
FROM NOPART;
EXPLAIN
SELECT * FROM P_VIEW
WHERE DOB = '2001-01-01';
When I run the "Explain" it shows optimizer is going to both partitions "p_2000" and "p_2001".
There are many deficiencies in the implementation of VIEWs. You may have hit one.
There are many uses of PARTITIONing that do not provide any performance. BY RANGE is probably the only variant that helps performance for some use cases. A table with less than a million rows is not worth partitioning.
Without seeing your CREATE TABLE, CREATE VIEW, and SELECT, we can only give you vague answers like I have.
(Responding to added code) Unless there is more to it than that, PARTITIONing in that way provides no benefit over having an index on DOB.
Furthermore, the VIEW + PARTITION approach (without an index) must scan the entire 2001 partition looking for the few rows for '2001-01-01'. Instead, the simple index approach can find them immediately -- 365 times as fast. (OK, not really that much faster, but still.)
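The "simple index approach" could be sketched as follows. The table name people and the index name idx_dob are hypothetical; the columns are taken from the question's Partition1 table.

```sql
-- Non-partitioned variant of the question's table with an index on DOB.
CREATE TABLE people (
  ID INT,
  NAME VARCHAR(100),
  DOB DATE,
  INDEX idx_dob (DOB)
);

-- An equality test on DOB can be answered from the index instead of
-- scanning a whole year's worth of rows.
EXPLAIN
SELECT ID, DOB
FROM people
WHERE DOB = '2001-01-01';
```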

how does MySQL know which partition to look up?

Let's analyse the simplest possible example of MySQL partitioning by hash (slightly modified version of http://dev.mysql.com/doc/refman/5.5/en/alter-table-partition-operations.html):
CREATE TABLE t1 (
id INT,
year_col INT
);
ALTER TABLE t1
PARTITION BY HASH(year_col)
PARTITIONS 8;
Let's say we put millions of records in there. The question is: if a specific query comes in (e.g. SELECT * FROM t1 WHERE year_col = 5), how does MySQL know which partition to look up? There are 8 partitions. I guess that the hash function is calculated, MySQL recognizes that the condition matches the partitioning key, and then it knows which partition that is. But what if the query is SELECT * FROM t1 WHERE year_col IN (5, 45, 5435)? How about other non-trivial queries? Is there any general algorithm for that?
This is called Partition pruning:
The optimizer can perform pruning whenever a WHERE condition can be reduced to either one of the following two cases:
partition_column = constant
partition_column IN (constant1, constant2, ..., constantN)
In the first case, the optimizer simply evaluates the partitioning expression for the value given, determines which partition contains that value, and scans only this partition. (...)
In the second case, the optimizer evaluates the partitioning expression for each value in the list, creates a list of matching partitions, and then scans only the partitions in this partition list. (...)
MySQL can apply partition pruning to SELECT, DELETE, and UPDATE statements. INSERT statements currently cannot be pruned.
Pruning can also be applied to short ranges, which the optimizer can convert into equivalent lists of values. (...)
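For the t1 example, the two cases can be tried out directly. The sketch below relies on the MySQL manual's description of HASH partitioning, where the partition number is MOD(expr, number_of_partitions); the column aliases are made up for illustration.

```sql
-- With PARTITION BY HASH(year_col) PARTITIONS 8, the partition number is
-- MOD(year_col, 8), so the values from the question map as follows.
SELECT MOD(5, 8)    AS part_for_5,     -- 5
       MOD(45, 8)   AS part_for_45,    -- 5
       MOD(5435, 8) AS part_for_5435;  -- 3

-- The partitions column of each plan should list only the partitions
-- that can contain matching rows.
EXPLAIN PARTITIONS SELECT * FROM t1 WHERE year_col = 5;
EXPLAIN PARTITIONS SELECT * FROM t1 WHERE year_col IN (5, 45, 5435);
```

On MySQL 5.7 and later, plain EXPLAIN shows the same partitions column.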

Partitioning a MySQL table based on a column value.

I want to partition a table in MySQL while preserving the table's structure.
I have a column, 'Year', based on which I want to split up the table into different tables for each year respectively. The new tables will have names like 'table_2012', 'table_2013' and so on. The resultant tables need to have all the fields exactly as in the source table.
I have tried the following two pieces of SQL script with no success:
1.
CREATE TABLE all_data_table
( column1 int default NULL,
column2 varchar(30) default NULL,
column3 date default NULL
) ENGINE=InnoDB
PARTITION BY RANGE ((year))
(
PARTITION p0 VALUES LESS THAN (2010),
PARTITION p1 VALUES LESS THAN (2011) , PARTITION p2 VALUES LESS THAN (2012) ,
PARTITION p3 VALUES LESS THAN (2013), PARTITION p4 VALUES LESS THAN MAXVALUE
);
2.
ALTER TABLE all_data_table PARTITION BY RANGE COLUMNS (`year`) (
PARTITION p0 VALUES LESS THAN (2011),
PARTITION p1 VALUES LESS THAN (2012),
PARTITION p2 VALUES LESS THAN (2013),
PARTITION p3 VALUES LESS THAN (MAXVALUE)
);
Any assistance would be appreciated!
This is old, but seeing as it comes up highly ranked in partitioning searches, I figured I'd give some additional details for people who might hit this page. What you are talking about in having a table_2012 and table_2013 is not "MySQL Partitioning" but "Manual Partitioning".
Partitioning means that you have one "logical table" with a single table name, which--behind the scenes--is divided among multiple files. When you have millions to billions of rows, over years, but typically you are only searching a single month, partitioning by Year/Month can have a great performance benefit because MySQL only has to search against the file that contains the Year/Month that you are searching for...so long as you include the partition key in your WHERE.
When you create multiple tables like table_2012 and table_2013, you are MANUALLY partitioning the tables, which you don't do with the MySQL PARTITION configuration. To manually partition the tables, during 2012, you put all data into the 2012 table. When you hit 2013, you start putting all the data into the 2013 table. You have to make sure to create the table before you hit 2013 or it won't have any place to go. Then, when you query across the years (e.g. from Nov 2012 - Jan 2013), you have to do a UNION between table_2012 and table_2013.
SELECT * FROM table_2012 WHERE #...
UNION
SELECT * FROM table_2013 WHERE #...
With partitioning, this manual work is not necessary. You do the initial setup of the partitions, then you treat it as a single table. No unions required, no checking the date before you insert, etc. This makes life much easier. MySQL handles figuring out what tables it needs to query. However, you MUST make sure to query against the Year column or it will have to scan ALL files. E.g. SELECT * FROM all_data_table WHERE Month=12 will scan all partitions for Month=12. To ensure you are only scanning the partition files that you need to scan, you want to make sure to include the partition column in every query that you can.
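A minimal sketch of that single-table setup, based on the table from the question. The question's definition has no year column, so this assumes the year is derived from the column3 date; that mapping is an assumption, not something the question states.

```sql
CREATE TABLE all_data_table (
  column1 INT DEFAULT NULL,
  column2 VARCHAR(30) DEFAULT NULL,
  column3 DATE DEFAULT NULL
) ENGINE=InnoDB
-- Partition on an expression over an existing column.
PARTITION BY RANGE (YEAR(column3)) (
  PARTITION p0 VALUES LESS THAN (2010),
  PARTITION p1 VALUES LESS THAN (2011),
  PARTITION p2 VALUES LESS THAN (2012),
  PARTITION p3 VALUES LESS THAN (2013),
  PARTITION p4 VALUES LESS THAN MAXVALUE
);

-- Nov 2012 to Jan 2013 in one statement, no UNION needed. The range is
-- written on column3 itself so that the optimizer can prune partitions.
SELECT *
FROM all_data_table
WHERE column3 >= '2012-11-01'
  AND column3 <  '2013-02-01';
```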
Possible negatives to partitioning...if you have billions of rows and you do an ALTER TABLE on the table to--say--add a column...it's going to have to update every row taking a VERY long time. At the company I currently work for, the boss doesn't think it's worth the time it takes to update the billion rows historically when we are adding a new column for going forward...so this is one of the reasons we do manual partitioning instead of letting MySQL do it.
DISCLAIMER: I am not an expert at partitioning...so if I'm wrong in any of this, please let me know and I'll fix the incorrect parts.
From what I see, you want to create many tables from one big table.
I think you should try to create views instead.
As far as I can tell from reading about partitioning, it partitions the physical storage of the table and stores the parts separately. But seen from above, they still appear as a single table.

MySQL Partitioning and Unix Timestamp

I've just started reading on MySQL partitions, they kind of look too good to be true, please bear with me.
I have a table which I would like to partition (which I hope would bring better performance).
This is the case / question:
We have a column which stores Unix timestamp values. Is it possible to partition the table so that, based on the Unix timestamp, the partitions are separated by date? Or do I have to use range-based partitioning and define the ranges beforehand?
Cheers
You can do whatever you feel like. See: http://dev.mysql.com/doc/refman/5.5/en/partitioning-types.html
An example of partitioning by unix_timestamp would be:
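-- KEY takes a parenthesized column list
ALTER TABLE table1 PARTITION BY KEY (myINT11timestamp) PARTITIONS 1000;
-- or (partitioning expressions allow DIV but not the / operator)
ALTER TABLE table1 PARTITION BY HASH (myINT11timestamp DIV 1000) PARTITIONS 10;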
Everything you wanted to know about partitions in MySQL 5.5: http://dev.mysql.com/tech-resources/articles/mysql_55_partitioning.html
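If the goal is to split by calendar period rather than to spread rows evenly, range partitioning on the integer column with pre-defined boundaries is another option. The sketch below assumes myINT11timestamp is an INT column holding Unix timestamps in UTC; the partition names are hypothetical, and yearly boundaries are used only to keep it short (daily boundaries would advance in steps of 86400).

```sql
ALTER TABLE table1
PARTITION BY RANGE (myINT11timestamp) (
  -- rows before 2013-01-01 00:00:00 UTC
  PARTITION p2012 VALUES LESS THAN (1356998400),
  -- rows before 2014-01-01 00:00:00 UTC
  PARTITION p2013 VALUES LESS THAN (1388534400),
  -- everything later
  PARTITION pFuture VALUES LESS THAN (MAXVALUE)
);
```

Note that if table1 has a primary key or unique keys, each of them must include the partitioning column, otherwise MySQL rejects the ALTER.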