Let's analyse the simplest possible example of MySQL partitioning by hash (a slightly modified version of the example at http://dev.mysql.com/doc/refman/5.5/en/alter-table-partition-operations.html):
CREATE TABLE t1 (
id INT,
year_col INT
);
ALTER TABLE t1
PARTITION BY HASH(year_col)
PARTITIONS 8;
Let's say we put millions of records in there. The question is: if a specific query comes in (e.g. SELECT * FROM t1 WHERE year_col = 5), how does MySQL know which partition to look up? There are 8 partitions. I guess that the hash function is calculated, MySQL recognizes that it matches the partitioning key, and then MySQL knows which partition that is. But what if the query is SELECT * FROM t1 WHERE year_col IN (5, 45, 5435)? How about other non-trivial queries? Is there any general algorithm for that?
This is called Partition pruning:
The optimizer can perform pruning whenever a WHERE condition can be reduced to either one of the following two cases:
partition_column = constant
partition_column IN (constant1, constant2, ..., constantN)
In the first case, the optimizer simply evaluates the partitioning expression for the value given, determines which partition contains that value, and scans only this partition. (...)
In the second case, the optimizer evaluates the partitioning expression for each value in the list, creates a list of matching partitions, and then scans only the partitions in this partition list. (...)
MySQL can apply partition pruning to SELECT, DELETE, and UPDATE statements. INSERT statements currently cannot be pruned.
Pruning can also be applied to short ranges, which the optimizer can convert into equivalent lists of values. (...)
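To make the two cases concrete for the t1 table above, here is a minimal Python model of the lookup. It is a sketch, not MySQL's code: it assumes that HASH partitioning places a row in partition MOD(expr, number_of_partitions), that unnamed partitions are called p0 to p7, and that the values are non-negative. The function names are made up for illustration.

```python
# Illustrative model of pruning for PARTITION BY HASH(year_col) PARTITIONS 8.
# Assumptions: partition number = MOD(year_col, 8), default names p0..p7,
# non-negative values only (Python's % differs from MySQL MOD for negatives).
NUM_PARTITIONS = 8


def partition_for(value, num_partitions=NUM_PARTITIONS):
    """Return the name of the partition that would store this year_col value."""
    return f"p{value % num_partitions}"


def prune(values, num_partitions=NUM_PARTITIONS):
    """Return the sorted, de-duplicated partitions an IN (...) list can touch."""
    return sorted({partition_for(v, num_partitions) for v in values})


print(partition_for(5))      # case 1: year_col = 5
print(prune([5, 45, 5435]))  # case 2: year_col IN (5, 45, 5435)
```

Under these assumptions, year_col = 5 maps to p5, and the IN list maps to p3 and p5 only, because 5 and 45 both leave remainder 5 and 5435 leaves remainder 3. EXPLAIN PARTITIONS on the real table is the way to confirm what the server actually does.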
Suppose I have a MySQL table with an indexed field called balance. However, 95% of the rows in the table have balance = 0. So if I were to run:
Select * from mytable where balance > 0.02
the query would take quite a while if the table had 1 million+ rows, as the BTree index does not have a distinct set of values for balance.
In this situation, without changing the data, how would one optimize the SQL query?
First, your query is likely to be returning a lot of rows. That is going to take time.
If you only need a few, you can add limit:
Select *
from mytable
where balance > 0.02
limit 100;
Second, if you have any particularly large columns, then those could dominate the time for returning rows. If this is an issue, then select only the columns you really need.
Third, an index might help. If very few rows satisfy the where clause then an index on balance should speed the query. However, if a lot of rows match the filter condition, then you are returning a lot of data -- and that can take time.
Also, this assumes that something called mytable is really a table. If it is a view, then all bets are off. You need to optimize the view and not the query.
This is a radical approach, but if this query is very critical you could partition the table on the balance field.
EDIT: For some reason MySQL partitioning expressions are restricted to INT values, so maybe this workaround will work:
ALTER TABLE mytable
PARTITION BY RANGE( CEILING(balance) ) (
PARTITION p0 VALUES LESS THAN (1),
PARTITION p1 VALUES LESS THAN MAXVALUE
);
NOTE: This approach will only work if balance is declared as a Decimal type, not a Float type.
I have 2 problems with a partitioned table in mysql.
My table has three columns
id_row INT NOT NULL AUTO_INCREMENT
name_element VARCHAR(45) NULL
date_element DATETIME NOT NULL
I modify the table to apply partitioning by range on YEAR(date_element) as follows:
ALTER TABLE `orderslist`
PARTITION BY RANGE(YEAR(date_element))
PARTITIONS 5(
PARTITION part_2013 VALUES LESS THAN (2014),
PARTITION part_2014 VALUES LESS THAN (2015),
PARTITION part_2015 VALUES LESS THAN (2016),
PARTITION part_2016 VALUES LESS THAN (2017),
PARTITION part_2017 VALUES LESS THAN (MAXVALUE));
but when I use
EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE YEAR(date_element) > '2015';
the query uses all the partitions and not only part_2015, part_2016 and part_2017.
Instead if I use
EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE date_element > '2015-10-10 10:00:00';
it works.
So my questions are:
How can I make the first query work?
Is there a way to create a materialized view from this table without losing the partitions?
Thank you
In your first example, EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE YEAR(date_element) > '2015';, there's no way for the engine to identify beforehand which partition your data is in.
It must evaluate YEAR(date_element) for every row to find out the year. It's a classic example of filtering by a function's result. A DBMS in general can't use indexes to find data this way, since the function's result is unknown and must be evaluated for every row, so your search turns into a full scan.
I understand your point here, since you used the same function to define the partitioning and to find the data, but for some reason this optimization is not there. In other words: the engine doesn't notice that both functions are the same.
In the second statement, you're directly comparing a column to an arbitrary value. This is what the engine prefers, and that is when pruning and indexes come into play.
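One way to get the first query to behave like the second is to express the year filter as a range on the bare column. The Python sketch below models that idea for the five partitions in the question. It is an idealized model, not the optimizer's real algorithm: the helper names are hypothetical, None stands in for MAXVALUE, and the real EXPLAIN PARTITIONS output may list extra partitions.

```python
from bisect import bisect_right
from datetime import datetime

# (name, upper bound) pairs taken from the VALUES LESS THAN clauses.
# Assumption: None stands in for MAXVALUE.
PARTITIONS = [
    ("part_2013", 2014),
    ("part_2014", 2015),
    ("part_2015", 2016),
    ("part_2016", 2017),
    ("part_2017", None),
]


def partitions_from(ts):
    """Partitions that can hold rows with date_element at or after ts."""
    bounds = [bound for _, bound in PARTITIONS if bound is not None]
    # First partition whose upper bound is strictly greater than YEAR(ts).
    first = bisect_right(bounds, ts.year)
    return [name for name, _ in PARTITIONS[first:]]


def year_gt_as_range(year):
    """Rewrite YEAR(date_element) > year as a predicate on the bare column."""
    return f"date_element >= '{year + 1}-01-01 00:00:00'"


print(year_gt_as_range(2015))
print(partitions_from(datetime(2016, 1, 1)))
print(partitions_from(datetime(2015, 10, 10, 10, 0, 0)))
```

In this model the rewritten predicate only needs part_2016 and part_2017, while the original second query (date_element > '2015-10-10 10:00:00') needs part_2015, part_2016 and part_2017.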
MySQL's PARTITIONing is quite finicky. YEAR() is recognized in the partitioning expression, but pruning seems to happen only when the WHERE clause compares the column itself; with YEAR(date_element) > ... it plays dumb.
Why are you partitioning on YEAR? It may not be useful.
If your queries are like what you described, then an appropriate index on a non-partitioned table is likely to run just as fast.
Please provide the important queries and SHOW CREATE TABLE (with or without partitioning) so we can analyze what makes the most sense.
Also, what is PARTITIONS 5??
I've created a view on a partitioned table. When I filter on the partitioning column in a SELECT against the view, EXPLAIN shows that the optimizer does not restrict the scan to the matching partition.
Is there any way to make the view access a single partition of its table?
[Edit] : Here is how I created the view on two partitioned tables
CREATE TABLE Partition1 (ID INT,NAME VARCHAR(100),DOB DATE)
PARTITION BY LIST (YEAR(DOB))
(
PARTITION P_2000 VALUES IN (2000),
PARTITION P_2001 VALUES IN (2001)
);
CREATE TABLE NOPART (ID INT,DOB DATE)
PARTITION BY LIST (YEAR(DOB))
(
PARTITION P_2000 VALUES IN (2000),
PARTITION P_2001 VALUES IN (2001)
);
CREATE OR REPLACE VIEW P_VIEW
AS
SELECT ID,DOB
FROM Partition1
UNION
SELECT ID,DOB
FROM NOPART;
EXPLAIN
SELECT * FROM P_VIEW
WHERE DOB = '2001-01-01';
When I run the "Explain" it shows optimizer is going to both partitions "p_2000" and "p_2001".
There are many deficiencies in the implementation of VIEWs. You may have hit one.
There are many uses of PARTITIONing that do not provide any performance benefit. BY RANGE is probably the only variant that helps performance for some use cases. A table with fewer than a million rows is not worth partitioning.
Without seeing your CREATE TABLE, CREATE VIEW, and SELECT, we can only give you vague answers like I have.
(Responding to added code) Unless there is more to it than that, PARTITIONing in that way provides no benefit over having an index on DOB.
Furthermore, the VIEW + PARTITION approach (without an index) must scan the entire 2001 partition looking for the few rows for '2001-01-01'. Instead the simple index approach can find them immediately -- 365 times as fast. (OK, not really that much faster, but still.)
I have a MySQL DB table with two columns, 'Key' and 'Used'. Key is a string, Used is an integer. Is there a very fast way to search for a specific Key and then return its Used value in a huge table with 6000000 rows of data?
You can make it fast by creating an index on the key field:
CREATE INDEX mytable_key_idx ON mytable (`key`);
You can actually make it even faster for reading by creating a covering index on both (key, used) fields:
CREATE INDEX mytable_key_used_idx ON mytable (`key`, `used`);
In this case, when reading, MySQL can retrieve the used value from the index itself, without reading the table (index-only scan). However, if you have a lot of write activity, a covering index may be slower, because every change to used now has to update both the index and the actual table.
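The difference between the two indexes can be pictured with a toy in-memory model. This is only a sketch with made-up sample rows and hypothetical helper names; a real BTree index is far more involved, but the extra hop into the table is the point.

```python
from bisect import bisect_left

# Hypothetical sample rows in storage order: (key, used).
table = [("k3", 7), ("k1", 2), ("k2", 5)]

# Plain index on key: sorted (key, row position) pairs.
key_index = sorted((k, pos) for pos, (k, _) in enumerate(table))
# Covering index on (key, used): the used value lives in the index itself.
covering_index = sorted(table)


def lookup_plain(key):
    """Find the row position via the index, then read the table row."""
    i = bisect_left(key_index, (key,))
    if i < len(key_index) and key_index[i][0] == key:
        return table[key_index[i][1]][1]  # second hop into the table
    return None


def lookup_covering(key):
    """Answer the query from the index entry alone (index-only scan)."""
    i = bisect_left(covering_index, (key,))
    if i < len(covering_index) and covering_index[i][0] == key:
        return covering_index[i][1]
    return None


print(lookup_plain("k2"), lookup_covering("k2"))
```

Both lookups return the same value; the covering variant simply never touches the table list.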
The normative SQL for that would be:
SELECT t.key, t.used FROM mytable t WHERE t.key = 'particularvalue' ;
The output from
EXPLAIN
SELECT t.key, t.used FROM mytable t WHERE t.key = 'particularvalue' ;
would give details about the access plan, which indexes are being considered, etc.
The output from
SHOW CREATE TABLE mytable ;
would give information about the table, the engine being used and the available indexes, as well as the datatypes.
Slow performance on a query like this usually indicates a suboptimal access plan, either because suitable indexes are not available or because they are not being used. Sometimes a character set mismatch between the column datatype and the literal in the predicate can make an index "unusable" by a particular query.
What is the complexity of the "group by" statement in MySQL?
I am managing very big tables and I would also like to know if there is any method to calculate how much time a query is going to take.
This question is impossible to answer without knowledge of what the entire query looks like. Some group bys can be prohibitively expensive while others are very cheap; it all depends on how the indexes in the database are set up, whether the value you group by can be cached, etc.
For example, this is a very cheap group by:
CREATE TABLE t (a INT, KEY(a));
SELECT * FROM t WHERE 1 GROUP BY a;
since a is indexed.
But for something like this, it's very expensive since it would require a table scan.
CREATE TABLE t (a INT);
SELECT * FROM t WHERE 1 GROUP BY a;
Generally, if a key is not available, the database will create a temporary table in memory for group by clauses, go through all the values, and insert each value into the temporary table with an index to the corresponding row in the result set. Then it will select from the temporary table, pick the first row from each group and send that back as the result. Depending on whether you use the "extra" rows per group (i.e. using MAX(), GROUP_CONCAT() or similar), it will need to fetch all rows again.
You can use EXPLAIN to figure out what strategy MySQL will use. The 'Extra' column will contain (in ascending order of cost to execute) 'Using index' if an index can be used, 'Using filesort' if the rows have to be sorted in an extra pass, and 'Using temporary' if a temporary table will be required.
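The two strategies can be sketched in a few lines of Python. This is a simplified model, not MySQL's implementation: a dict stands in for the in-memory temporary table, a pre-sorted list stands in for reading rows through an index on a, and COUNT(*) is used as the per-group aggregate purely for illustration.

```python
from itertools import groupby


def group_with_temp_table(rows):
    """No usable index: collect groups in a temporary lookup keyed by a."""
    temp = {}
    for a in rows:
        temp[a] = temp.get(a, 0) + 1  # one temp-table probe per row
    return sorted(temp.items())


def group_with_index(rows_in_key_order):
    """Index on a: rows arrive sorted, so each group is emitted in one pass."""
    return [(key, sum(1 for _ in grp)) for key, grp in groupby(rows_in_key_order)]


rows = [3, 1, 3, 2, 1, 3]                # values of column a, in storage order
print(group_with_temp_table(rows))
print(group_with_index(sorted(rows)))    # sorted() mimics an index scan
```

Both produce the same groups; the indexed version needs no extra structure because equal values are already adjacent.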