MySQL Partition By DATEDIFF - mysql

I have a table with the ReferenceDate field. I intend to partition it using this field as follows:
partition_0: Values more than 1 year old;
partition_1: Values older than 6 months;
partition_2: Values older than 3 months;
partition_3: Values for the last 3 months;
For this I tried the following script to change the table:
ALTER TABLE `MyTable`
PARTITION BY RANGE (DATEDIFF(NOW(), `ReferenceDate`))
(
PARTITION p0_historic_data VALUES LESS THAN (90),
PARTITION p1_intermediary_data VALUES LESS THAN (180),
PARTITION p2_intermediary_data VALUES LESS THAN (365),
PARTITION p3_current_data VALUES LESS THAN MAXVALUE
);
However, I believe that I cannot use the NOW() function in the partitioning clause. I was able to use TO_DAYS, but it doesn't give me the result I need: DATEDIFF gives the difference between the current date and ReferenceDate, whereas TO_DAYS returns the number of days from year 0 to the given date.
I would like to know if there is really no way to use DATEDIFF, or if there is any alternative along those lines.

A PARTITIONed table is one where some of the rows are permanently put in one 'sub-table' or another, based on the instructions in PARTITION BY ....
So, it is flatly not possible: the value of NOW() keeps changing, so to implement such a scheme MySQL would have to move rows from one partition to another, even when you are not touching the table.
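What MySQL does accept is a deterministic expression with fixed boundaries. The following is only a minimal sketch, assuming ReferenceDate is a DATE or DATETIME column that is part of every unique key on the table (a MySQL partitioning requirement). The partition names and boundary dates are hypothetical:

```sql
-- Fixed calendar boundaries instead of "age relative to NOW()".
-- Partition names and dates below are illustrative only.
ALTER TABLE `MyTable`
PARTITION BY RANGE (TO_DAYS(`ReferenceDate`))
(
    PARTITION p_before_2023 VALUES LESS THAN (TO_DAYS('2023-01-01')),
    PARTITION p_2023_h1     VALUES LESS THAN (TO_DAYS('2023-07-01')),
    PARTITION p_2023_h2     VALUES LESS THAN (TO_DAYS('2024-01-01')),
    PARTITION p_future      VALUES LESS THAN MAXVALUE
);

-- Later, split the catch-all partition to add a new period.
ALTER TABLE `MyTable`
REORGANIZE PARTITION p_future INTO (
    PARTITION p_2024_h1 VALUES LESS THAN (TO_DAYS('2024-07-01')),
    PARTITION p_future  VALUES LESS THAN MAXVALUE
);
```

With this layout a row never changes partition on its own; "ageing" is done by explicit maintenance statements run on whatever schedule you choose.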
Even if partitioning on DATEDIFF were possible, it might not provide any performance improvement. After all, you can have something like this:
WHERE ReferenceDate >= NOW() - INTERVAL 180 DAY
AND ReferenceDate < NOW() - INTERVAL 90 DAY
Then, if you also have
AND CustomerId = 123
then this index would be excellent for finding the desired rows:
INDEX(CustomerId, ReferenceDate)
That does not need PARTITIONing.
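Put together, the index-based approach might look like the sketch below. The index name idx_customer_refdate is made up for illustration, and CustomerId is assumed to be a column of MyTable as in the fragment above:

```sql
-- Composite index: the equality column first, then the range column.
ALTER TABLE `MyTable`
    ADD INDEX idx_customer_refdate (`CustomerId`, `ReferenceDate`);

-- Rows between 90 and 180 days old for one customer.
SELECT *
FROM `MyTable`
WHERE `CustomerId` = 123
  AND `ReferenceDate` >= NOW() - INTERVAL 180 DAY
  AND `ReferenceDate` <  NOW() - INTERVAL 90 DAY;
```

Because the age window is computed at query time, nothing has to be moved as rows get older.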

Related

How to generate faster mysql query with 1.6M rows

I have a table that has 1.6M rows. Whenever I use the query below, I get an average of 7.5 seconds.
select * from table
where pid = 170
and cdate between '2017-01-01 0:00:00' and '2017-12-31 23:59:59';
I tried adding a LIMIT of 1000 or 10000, and changing the date filter to a single month, but the query still averages 7.5s. I also tried adding a composite index on pid and cdate, but that made it 1 second slower.
Here is the INDEX list
https://gist.github.com/primerg/3e2470fcd9b21a748af84746554309bc
Can I still make it faster? Is this an acceptable performance considering the amount of data?
It looks like the index is missing. Create this index and see if it helps.
CREATE INDEX cid_date_index ON table_name (pid, cdate);
And also modify your query to below.
select * from table
where pid = 170
and cdate between CAST('2017-01-01 0:00:00' AS DATETIME) and CAST('2017-12-31 23:59:59' AS DATETIME);
Please provide SHOW CREATE TABLE clicks.
How many rows are returned? If it is 100K rows, the effort to shovel that many rows is significant. And what will you do with that many rows? If you then summarize them, consider summarizing in SQL!
Do have cdate as DATETIME.
Do you use id for anything? Perhaps this would be better:
PRIMARY KEY (pid, cdate, id), -- to get benefit from clustering
INDEX(id) -- if still needed (and to keep AUTO_INCREMENT happy)
This smells like Data Warehousing. DW benefits significantly from building and maintaining Summary table(s), such as one that has the daily click count (etc), from which you could very rapidly sum up 365 counts to get the answer.
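As a rough sketch of the summary-table idea: the table name daily_click_summary, its column names and its column types are all hypothetical, and clicks is assumed to be the source table with the pid and cdate columns from the question.

```sql
-- One row per (pid, day); names and types are illustrative assumptions.
CREATE TABLE daily_click_summary (
    pid         INT UNSIGNED NOT NULL,
    click_date  DATE         NOT NULL,
    click_count INT UNSIGNED NOT NULL,
    PRIMARY KEY (pid, click_date)
);

-- Refresh yesterday's counts (safe to re-run thanks to the upsert).
INSERT INTO daily_click_summary (pid, click_date, click_count)
SELECT pid, DATE(cdate), COUNT(*)
FROM clicks
WHERE cdate >= CURDATE() - INTERVAL 1 DAY
  AND cdate <  CURDATE()
GROUP BY pid, DATE(cdate)
ON DUPLICATE KEY UPDATE click_count = VALUES(click_count);

-- A yearly total now reads at most 366 small rows instead of raw clicks.
SELECT SUM(click_count) AS clicks_2017
FROM daily_click_summary
WHERE pid = 170
  AND click_date >= '2017-01-01'
  AND click_date <  '2017-01-01' + INTERVAL 1 YEAR;
```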
CAST is unnecessary. Furthermore 0:00:00 is optional -- it can be included or excluded for either DATE or DATETIME. I prefer
cdate >= '2017-01-01'
AND cdate < '2017-01-01' + INTERVAL 1 YEAR
to avoid leap year, midnight, date arithmetic, etc.
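Applied to the original query, the half-open range could be written as in this sketch (assuming the table is the clicks table mentioned above). Prefixing it with EXPLAIN is one way to check which index the optimizer picks:

```sql
-- Half-open range: includes all of 2017, excludes midnight of 2018-01-01.
EXPLAIN
SELECT *
FROM clicks
WHERE pid = 170
  AND cdate >= '2017-01-01'
  AND cdate <  '2017-01-01' + INTERVAL 1 YEAR;
```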

How can I make this sql query faster?

I have a table user_notifications that has 1,100,000 records. The query below takes more than 3 minutes to complete. What can I do to improve the fetch time?
SELECT `user_notifications`.`user_id`
FROM `user_notifications`
WHERE `user_notifications`.`notification_template_id` = 175
AND (DATE(sent_at) >= DATE_SUB(CURDATE(), INTERVAL 4 day))
AND `user_notifications`.`user_id` IN (
1203, 1282, 1499, 2244, 2575, 2697, 2828, 2900, 3085, 3989,
5264, 5314, 5368, 5452, 5603, 6133, 6498..
)
The user ids in the IN block sometimes number up to 1k.
For optimisation I have indexed the user_id and notification_template_id columns in the user_notifications table.
Big IN() lists are inherently slow. Create a temporary table with an index and put the values from the IN() list into that temporary table instead; then you'll get the power of an indexed join instead of a giant IN() list.
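A minimal sketch of that temporary-table approach follows. The table name tmp_user_ids and the INT UNSIGNED type for user_id are assumptions, and only the first few ids from the question are shown:

```sql
-- The primary key doubles as the index used by the join.
CREATE TEMPORARY TABLE tmp_user_ids (
    user_id INT UNSIGNED NOT NULL PRIMARY KEY
);

INSERT INTO tmp_user_ids (user_id)
VALUES (1203), (1282), (1499), (2244), (2575);  -- remaining ids omitted

SELECT un.user_id
FROM user_notifications AS un
JOIN tmp_user_ids AS t
  ON t.user_id = un.user_id
WHERE un.notification_template_id = 175
  AND un.sent_at >= CURDATE() - INTERVAL 4 DAY;  -- same rows as the DATE(sent_at) test

DROP TEMPORARY TABLE tmp_user_ids;
```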
You seem to be querying for a small date range. How about having an index based on SENT_AT column? Do you know what index the current query is using?
(1) Don't hide columns in functions if you might need to use an index:
AND (DATE(sent_at) >= DATE_SUB(CURDATE(), INTERVAL 4 day))
-->
AND sent_at >= CURDATE() - INTERVAL 4 day
(2) Use a "composite" index for
WHERE `notification_template_id` = 175
AND sent_at >= ...
AND `user_id` IN (...)
The first column should be the one with '='. It is unclear what to put next, so I suggest adding both of these indexes:
INDEX(notification_template_id, user_id, sent_at)
INDEX(notification_template_id, sent_at)
The Optimizer will probably pick between them correctly.
Composite indexes are not the same as indexes on the individual columns.
(3) Yes, you could try putting the IN list in a tmp table, but the cost of doing such might outweigh the benefit. I don't think of 1K values in IN() as being "too many".
(4) My cookbook on building indexes.
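Points (1) and (2) combined might look like the sketch below. The index names are invented for illustration and the IN list is shortened to the first few ids from the question:

```sql
-- Two candidate composite indexes; the equality column leads in both.
ALTER TABLE user_notifications
    ADD INDEX idx_template_user_sent (notification_template_id, user_id, sent_at),
    ADD INDEX idx_template_sent (notification_template_id, sent_at);

-- sent_at is no longer wrapped in DATE(), so an index on it can be used.
EXPLAIN
SELECT user_id
FROM user_notifications
WHERE notification_template_id = 175
  AND sent_at >= CURDATE() - INTERVAL 4 DAY
  AND user_id IN (1203, 1282, 1499, 2244, 2575);
```

The key column of the EXPLAIN output shows which of the two indexes was chosen; the one that is never used can be dropped afterwards.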

MySQL - Move data between partitions aka re-partition

I have a mysql table whose partitions look as below
p2015h1 - Contains data where date < 2015-07-01 (has data from 2015-06-01, hence only a month's worth of data)
p2015h2 - Contains data where date < 2016-01-01
p2016h1 - Contains data where date < 2016-07-01
p2016h2 - Contains data where date < 2017-01-01
I'd like the new partitions to be quarterly based as below -
p0 - Contains data where date < 2015-10-01
p1 - Contains data where date < 2016-01-01
p2 - Contains data where date < 2016-04-01
p3 - Contains data where date < 2016-07-01
I started by reorganizing the first partition & executed the below command. All went well.
alter table `table1` reorganize partition `p2015half1` into (partition `p0` values less than ('2015-10-01'));
Now, since the existing partition p2015h2 includes data up to 2015-10-01, how can I move that part into partition p0? I would need to do the same with the other partitions as I continue building the new ones.
I did try to remove partitioning from the table entirely, but the table has billions of rows, so the operation would take days. After that I would have to rebuild the partitions, which would take days again. Hence, I decided to take the approach of splitting partitions.
I'm stuck at this point and would appreciate any guidance.
mysql> alter table `table1` reorganize partition p0,p2015half2 into (partition p00 values less than ('2015-07-01'), partition p1 values less than ('2016-01-01'));
mysql> alter table `table1` reorganize partition p00 into (partition p0 values less than ('2015-07-01'));
mysql> alter table `table1` reorganize partition p2016half1,p2016half2 into (partition p2 values less than ('2016-04-01'), partition p3 values less than ('2016-07-01'),partition p4 values less than maxvalue);
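After each REORGANIZE step it can be worth checking the resulting layout before moving on. A small sketch, assuming the table lives in the currently selected schema:

```sql
-- List partitions in order with their upper bounds and row estimates.
-- TABLE_ROWS is only an estimate for InnoDB tables.
SELECT PARTITION_NAME,
       PARTITION_DESCRIPTION,
       TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'table1'
ORDER BY PARTITION_ORDINAL_POSITION;
```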

doesn't partition pruning work if I have range size larger than number of partitions?

I have 15 million rows in my table, and new data arrives every 4 seconds. So I decided to create one partition per day, as follows:
ALTER TABLE vehicle_gps
PARTITION BY RANGE(UNIX_TIMESTAMP(gps_time)) (
PARTITION p01 VALUES LESS THAN (UNIX_TIMESTAMP('2014-01-01 00:00:00')),
.
.
.
PARTITION p365 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01 00:00:00')));
I had to make 365 partitions as shown. Each daily partition contains around 100 thousand rows.
When I fetch data with a query such as
SELECT gps_time FROM vehicle_gps
WHERE gps_time BETWEEN '2014-05-01 00:00:00' AND '2014-05-06 00:00:00';
I found that partition pruning is not happening. The MySQL manual says that if the values in the range outnumber the partitions, pruning won't happen. If so, what is the point of partitioning tables that hold as much data as mine? I'm new to partitioning and confused, so please correct me if I'm wrong and help me learn.
Thank You :)
With HASH or KEY partitioning it just doesn't work with dates. A small extract from the MySQL documentation:
Pruning can be used only on integer columns of tables partitioned by HASH or KEY. For example, this query cannot use pruning because dob is a DATE column:
SELECT * FROM t4 WHERE dob >= '2001-04-14' AND dob <= '2005-10-15';
However, if the table stores year values in an INT column, then a query having WHERE year_col >= 2001 AND year_col <= 2005 can be pruned.
Hope it helps!
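Note that the extract above concerns HASH and KEY partitioning. For RANGE partitioning on a DATE or DATETIME column, the same documentation says pruning can be applied when the partitioning expression uses YEAR(), TO_DAYS() or TO_SECONDS(). Whatever the scheme, the simplest way to see what the server actually does is to ask it. A sketch using the query from the question:

```sql
-- MySQL 5.7 and later: plain EXPLAIN includes a "partitions" column
-- listing the partitions the query will read.
EXPLAIN
SELECT gps_time
FROM vehicle_gps
WHERE gps_time BETWEEN '2014-05-01 00:00:00' AND '2014-05-06 00:00:00';

-- MySQL 5.6 and earlier need the PARTITIONS keyword instead:
-- EXPLAIN PARTITIONS SELECT ...;
```

If only a handful of partitions are listed, pruning is happening; if all 365 appear, it is not.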

Select day of week from date

I have the following table in MySQL that records event counts of stuff happening each day
event_date event_count
2011-05-03 21
2011-05-04 12
2011-05-05 12
I want to be able to query this efficiently by date range AND by day of week. For example - "What is the event_count on Tuesdays in May?"
Currently the event_date field is a date type. Are there any functions in MySQL that let me query this column by day of week, or should I add another column to the table to store the day of week?
The table will hold hundreds of thousands of rows, so given a choice I'll choose the most efficient solution (as opposed to most simple).
Use DAYOFWEEK in your query, something like:
SELECT * FROM mytable WHERE MONTH(event_date) = 5 AND DAYOFWEEK(event_date) = 7;
This will find all info for Saturdays in May.
To get the fastest reads store a denormalized field that is the day of the week (and whatever else you need). That way you can index columns and avoid full table scans.
Just try the above first to see if it suits your needs and if it doesn't, add some extra columns and store the data on write. Just watch out for update anomalies (make sure you update the day_of_week column if you change event_date).
Note that the denormalized fields will increase the time taken to do writes, increase calculations on write, and take up more space. Make sure you really need the benefit and can measure that it helps you.
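One way to get the denormalized column without the update anomaly is a stored generated column, which MySQL keeps in sync with event_date automatically. This is a sketch assuming MySQL 5.7.6 or later; the column name day_of_week and the index name idx_dow_date are hypothetical:

```sql
-- DAYOFWEEK() returns 1 = Sunday ... 7 = Saturday.
ALTER TABLE mytable
    ADD COLUMN day_of_week TINYINT
        GENERATED ALWAYS AS (DAYOFWEEK(event_date)) STORED,
    ADD INDEX idx_dow_date (day_of_week, event_date);

-- Event counts on Tuesdays (3) in May 2011, using the new index.
SELECT event_date, event_count
FROM mytable
WHERE day_of_week = 3
  AND event_date >= '2011-05-01'
  AND event_date <  '2011-06-01';
```

The date range is written against event_date directly rather than MONTH(event_date), so the second index column can narrow the scan.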
Check the DAYOFWEEK() function.
If you want a textual representation of the day of the week, use the DAYNAME() function.