MySQL partition pruning always includes first partition in inequality query - mysql

I have a database partitioned by range on to_days(created_at).
The partitions are monthly (p1 - p50) with a pmax catchall on the end. In the example below, I'm expecting only partition p45 to be hit.
When I run
explain partitions select * from units where created_at > "2013-01-01 00:00:00" and NOW()
I get p1,p45 listed under the partitions column.
This happens in both 5.1 and 5.5.
Why is the optimizer including the first partition for an inequality check?

You asked this a long time ago, but I also ran into this issue and found a workaround here:
http://datacharmer.blogspot.com/2010/05/two-quick-performance-tips-with-mysql.html
... basically, you should create a first partition that contains values less than (0), which will always be empty. The MySQL query optimizer will still include this first partition, but at least it shouldn't be doing any resource-intensive scanning.
UPDATE: Here's a short summary of the URL linked in my original answer:
The official MySQL bugtracker acknowledges this behavior as a feature:
Bug Description:
Regardless of the range in the BETWEEN clause a table partitioned by RANGE using TO_DAYS function always includes the first partition in the table when pruning.
Response:
This is not a bug, since TO_DAYS() returns NULL for invalid dates, it needs to scan the first partition as well (since that holds all NULL values) for ranges.
...
A performance workaround is to create a specific partition to hold all NULL values (like '... LESS THAN (0)'), which also would catch all bad dates.
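The workaround quoted above might look like the following sketch. The table name units and the column created_at come from the question; the id column, the primary key, the partition names, and the month boundaries are assumptions made for illustration, and only a few monthly partitions are shown.

```sql
-- Hypothetical layout: an always-empty first partition absorbs NULL
-- results from TO_DAYS(), so the partition the optimizer always
-- includes has nothing in it to scan.
CREATE TABLE units (
    id INT NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id, created_at)   -- partition column must be part of every unique key
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p_null    VALUES LESS THAN (0),
    PARTITION p_2012_12 VALUES LESS THAN (TO_DAYS('2013-01-01')),
    PARTITION p_2013_01 VALUES LESS THAN (TO_DAYS('2013-02-01')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- EXPLAIN PARTITIONS is the 5.1/5.5 syntax used in the question.
-- The partitions column is still expected to list the first partition,
-- but that partition is now the empty p_null.
EXPLAIN PARTITIONS
SELECT * FROM units
WHERE created_at > '2013-01-01 00:00:00';
```

On an existing table the same effect would need a repartitioning step, which is not shown here.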

Related

MySQL indexing has no speed effect through PHP but does on PhpMyAdmin

I am trying to speed up a simple SELECT query on a table that has around 2 million entries, in a MariaDB MySQL database. It took over 1.5s until I created an index for the columns that I need, and running it through PhpMyAdmin showed a significant boost in speed (now takes around 0.09s).
The problem is, when I run it through my PHP server (mysqli), the execution time does not change at all. I'm logging my execution time by running microtime() before and after the query, and it takes ~1.5s to run it, regardless of having the index or not (tried removing/readding it to see the difference).
Query example:
SELECT `pair`, `price`, `time` FROM `live_prices` FORCE INDEX
(pairPriceTime) WHERE `time` = '2022-08-07 03:01:59';
Index created:
ALTER TABLE `live_prices` ADD INDEX pairPriceTime (pair, price, time);
Any thoughts on this? Does PHP PDO ignore indexes? Do I need to restart the server in order for it to "acknowledge" that there is a new index? (Which is a problem since I'm using a shared hosting service...)
If that is really the query, then it needs an INDEX starting with the value tested in the WHERE:
INDEX(time)
Or, to make a "covering index":
INDEX(time, pair, price)
However, I suspect that most of your accesses involve pair? If so, then other queries may need
INDEX(pair, time)
especially if you ask for a range of times.
To discuss various options further, please provide EXPLAIN SELECT ...
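As a rough sketch of the suggestion above, the covering index and the requested EXPLAIN could be written as follows. The table and column names are taken from the question; the index name idx_time_pair_price is hypothetical.

```sql
-- Index that starts with the column tested in the WHERE clause and
-- also contains the other selected columns (a "covering" index).
ALTER TABLE `live_prices`
    ADD INDEX idx_time_pair_price (`time`, `pair`, `price`);

-- Same query as in the question, without FORCE INDEX, so the
-- optimizer is free to pick the new index.
EXPLAIN
SELECT `pair`, `price`, `time`
FROM `live_prices`
WHERE `time` = '2022-08-07 03:01:59';
```

In the EXPLAIN output, the key column shows which index was actually chosen, which is the detail needed to compare the phpMyAdmin and PHP runs.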
PDO, mysqli, phpmyadmin -- These all work the same way. (A possible exception deals with an implicit LIMIT on phpmyadmin.)
Try hard to avoid the use of FORCE INDEX -- what helps on today's query and dataset may hurt on tomorrow's.
When you see puzzling anomalies in timings, run the query twice. Caching may be the explanation.
The MySQL documentation says:
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
The MariaDB documentation on FORCE INDEX says this:
FORCE INDEX works by only considering the given indexes (like with USE_INDEX) but in addition, it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
Use of the index is not mandatory. Since you have only specified one condition - the time, it can choose to use some other index for the fetch. I would suggest that you use another condition for the select in the where clause or add an order by
order by pair, price, time
I ended up creating another index (just for the time column) and it did the trick, running at ~0.002s now. Setting the LIMIT clause had no effect since I was always getting 423 rows (for 423 coin pairs).
Bottom line, I probably needed a more specific index, although the weird part is that the first index worked great on PMA but not through PHP, but the second one now applies to both approaches.
Thank you all for the kind replies :)

MySQL - rebuild partition vs optimize partition

I have partitioned tables in MySQL 5.1.41 that hold a very large amount of data. Recently I deleted a lot of data, which left around 500 GB of fragmentation, although the partitions still contain a lot of data.
To return that space to the OS, I need to defragment the partitions. I consulted the MySQL documentation, https://dev.mysql.com/doc/refman/5.1/en/partitioning-maintenance.html, but the following statements confused me:
Rebuilding partitions: Rebuilds the partition; this has the same effect as dropping all records stored in the partition, then reinserting them. This can be useful for purposes of defragmentation.
Optimizing partitions: If you have deleted a large number of rows from a partition or if you have made many changes to a partitioned table with variable-length rows (that is, having VARCHAR, BLOB, or TEXT columns), you can use ALTER TABLE ... OPTIMIZE PARTITION to reclaim any unused space and to defragment the partition data file.
I tried both and observed that sometimes "rebuild" finishes faster and sometimes "optimize" does. Each partition I run these commands on holds anywhere from millions to billions of records. I'm aware of what MySQL does for each of the above statements.
Should the choice depend on the number of rows in the partition? If so, up to how many rows can I use "optimize", and beyond how many should I use "rebuild"?
Also, which is better to use?
MyISAM or InnoDB? (The answer will be different.)
For MyISAM, REBUILD/REORGANIZE/OPTIMIZE will take about the same effort per partition.
For InnoDB, OPTIMIZE PARTITION rebuilds all partitions. So, don't use this if you want to do the partitions one at a time. REORGANIZE PARTITION of the partition into an identical partition definition should act only on the one partition. I recommend that.
It is generally not worth using partitioning unless you have at least a million rows. Also, BY RANGE is the only form that has any performance benefits that I have found.
Perhaps the main use of partitioning is with a time-series where you want to delete "old" data. PARTITION BY RANGE with weekly or monthly partitions lets you very efficiently DROP PARTITION rather than DELETE. More in my blog.
(My answer applies to all versions through 5.7, not just your antique 5.1.)
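The two operations recommended above could be sketched like this. The table events, its columns, and the partition names and boundaries are all hypothetical; the point is only the shape of the ALTER TABLE statements.

```sql
-- Hypothetical time-series table with monthly RANGE partitions.
CREATE TABLE events (
    id INT NOT NULL,
    created_at DATETIME NOT NULL,
    payload VARCHAR(255)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p1   VALUES LESS THAN (TO_DAYS('2013-02-01')),
    PARTITION p2   VALUES LESS THAN (TO_DAYS('2013-03-01')),
    PARTITION p3   VALUES LESS THAN (TO_DAYS('2013-04-01')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- Defragment a single partition by reorganizing it into an identical
-- definition (same name, same upper bound), one partition at a time.
ALTER TABLE events REORGANIZE PARTITION p2 INTO (
    PARTITION p2 VALUES LESS THAN (TO_DAYS('2013-03-01'))
);

-- Remove old data by dropping the whole partition instead of
-- running a large DELETE.
ALTER TABLE events DROP PARTITION p1;
```

It is worth trying the REORGANIZE step on a copy of one partition first to confirm how long it takes and how much space it returns on your version.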

Why MySql can not use PARTITION pruning with INSERT statements?

Here is a sentence:
MySQL can apply partition pruning to SELECT, DELETE, and UPDATE statements. INSERT statements currently cannot be pruned.
So when a new row is inserted, MySQL cannot determine which partition it belongs to? That sounds very strange. Is it a mistake, or what do they mean by this phrase?
As I read it, the partition-pruning optimization currently relies on parsing of the WHERE clause to determine which partitions to access. The INSERT statement has no WHERE clause, and the optimizer currently has no other mechanism by which to prune.
It looks as though partition pruning is, at present, something of a work-in-progress.

When some partitions are crashed can we query on other partitions

I have a large table with about 40 partitions.
Each partition holds the data for a different area.
I found that some partitions have crashed, and I want to keep working on the other partitions while leaving the crashed partitions as they are.
So can I query the other partitions, using PARTITION in a SELECT statement, while some partitions are crashed?
I would appreciate it if somebody could help me. Thanks in advance.
You can, in some sense, restrict SELECT statements to certain partitions. There's no parameter that allows you to select a partition (it wouldn't make sense, since partition access is controlled by the partitioning limits), but you can write your query so that it only retrieves data from specific partitions. For instance, if you have partitioned by date, you can use a WHERE clause that only addresses specific dates, and so work only with specific partitions.
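A minimal sketch of that idea, assuming a table partitioned by area as in the question. The table area_data, its columns, and the partition names are hypothetical, and whether this is enough to keep MySQL from touching a crashed partition is not confirmed here.

```sql
-- Hypothetical table with one LIST partition per area.
CREATE TABLE area_data (
    id INT NOT NULL,
    area_id INT NOT NULL,
    reading INT
)
PARTITION BY LIST (area_id) (
    PARTITION p_north VALUES IN (1),
    PARTITION p_south VALUES IN (2),
    PARTITION p_east  VALUES IN (3)
);

-- The WHERE clause names only the areas that should be read, so
-- partition pruning can leave the remaining partitions out.
-- The partitions column of the output shows which ones are used.
EXPLAIN PARTITIONS
SELECT * FROM area_data
WHERE area_id IN (1, 2);
```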

How to partition and subpartition MySQL by key?

I want to add partition to my innoDB table. I have tried to search the syntax for this, but have not found specifics.
Is this syntax wrong? :
ALTER TABLE Product PARTITION BY HASH(catetoryID1) PARTITIONS 6
SUBPARTITION BY KEY(catetoryID2) SUBPARTITIONS 10;
Does SUBPARTITIONS 10 mean each main partition has 10 subpartitions, or does it mean all main partitions have 10 subpartitions divided among them?
It's strange you didn't find the syntax. The MySQL online documentation has quite detailed syntax listed for most common operations.
Look here for overall syntax of the alter table to work with partitions:
http://dev.mysql.com/doc/refman/5.5/en/create-table.html
The syntax for partition management would remain same even when used with the alter table statement, with a few nuances that are listed on the alter table syntax pages in the MySQL docs.
To answer your first question, the problem is not your syntax but rather that you are trying to sub-partition a table partitioned first by Hash partitioning - this is not allowed, at least in MySQL 5.5. Only Range or List partitions can be sub-partitioned.
Look here for a complete list of partitioning types:
http://dev.mysql.com/doc/refman/5.5/en/partitioning-types.html
As for the second question, assuming what you were trying would work, you'd be creating 6 partitions hashed by catetoryID1, and then within these you'd have 10 sub-partitions hashed by catetoryID2. So you'd have in all
6 x 10 = 60 partitions
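For comparison, a form that respects the restriction above uses RANGE at the top level with KEY subpartitions. This is only a sketch: the table Product_example, the name column, and the range boundaries are hypothetical, while the column names catetoryID1 and catetoryID2 are copied from the question.

```sql
-- RANGE partitions can be sub-partitioned; HASH partitions cannot.
-- SUBPARTITIONS 10 applies to each partition, so 3 partitions
-- give 3 x 10 = 30 subpartitions in total.
CREATE TABLE Product_example (
    catetoryID1 INT NOT NULL,
    catetoryID2 INT NOT NULL,
    name VARCHAR(100)
)
PARTITION BY RANGE (catetoryID1)
SUBPARTITION BY KEY (catetoryID2)
SUBPARTITIONS 10 (
    PARTITION p0 VALUES LESS THAN (100),
    PARTITION p1 VALUES LESS THAN (200),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

-- information_schema.PARTITIONS has one row per subpartition,
-- so this lists how the subpartitions are spread over partitions.
SELECT PARTITION_NAME, SUBPARTITION_NAME
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'Product_example';
```

If the real table has a primary or unique key, every such key would need to include both partitioning columns for this to be accepted.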
Rules of thumb:
SUBPARTITION is useless. It provides no speed, and nothing else.
Due to various inefficiencies, don't have more than about 50 partitions.
PARTITION BY RANGE is the only useful one.
Often an INDEX can provide better performance than PARTITION; let's see your SELECT.
My blog on partitioning: http://mysql.rjweb.org/doc.php/partitionmaint