Dropping Partitions in Vertica - partitioning

I have a table like below in Vertica,
Seq_No CO_NO DATE
1 PQ01 01-Sep-15
2 XY01 01-Oct-15
3 AB01 01-Nov-15
4 PQ02 01-Dec-15
. . .
. . .
. . .
14 XYZ9 01-Oct-16
The table is partitioned by month and year on the DATE column.
At any point in time there must be only 13 partitions, i.e. 13 months of data.
When the current month's data arrives (Oct-16), we need to drop last year's September partition (Sep-15), so that only the latest 13 months of data remain in the table.
How can we achieve this in Vertica?

To do this, use the DROP_PARTITION function:
SELECT DROP_PARTITION('schema.table',CAST(TO_CHAR(ADD_MONTHS(SYSDATE,-13),'YYYYMM') AS INTEGER));
What you need is a cron job that runs at the beginning of every month.
Before enabling it, manually drop all partitions older than 13 months, then let the job do its work.
Note: your table must be partitioned like this:
PARTITION BY (((date_part('year', Datecol) * 100) + date_part('month', Datecol)))
Test the partition drop before using it: create a dummy table and run it there.
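If the cron job is a script rather than a bare SQL call, the key arithmetic can be checked outside the database first. Below is a minimal Python sketch that mirrors the ADD_MONTHS / TO_CHAR expression above. The helper names partition_key_to_drop and drop_statement are made up for illustration, and the script only builds the statement text; it does not connect to Vertica.

```python
from datetime import date


def partition_key_to_drop(today: date, months_back: int = 13) -> int:
    """Return the YYYYMM key of the month `months_back` months before `today`."""
    # Work in "months since year 0" so the subtraction crosses year boundaries cleanly.
    total_months = today.year * 12 + (today.month - 1) - months_back
    year, month_index = divmod(total_months, 12)
    return year * 100 + month_index + 1


def drop_statement(table: str, today: date) -> str:
    """Build the DROP_PARTITION call shown above for a given run date (hypothetical helper)."""
    return f"SELECT DROP_PARTITION('{table}', {partition_key_to_drop(today)});"


if __name__ == "__main__":
    # Run date in Oct-16, as in the question: the Sep-15 partition is the one to drop.
    print(drop_statement("schema.table", date(2016, 10, 1)))
```

The same function is an easy way to sanity-check year boundaries, for example a run in January.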

I'm assuming your focus is on the "At any point of time" part of your question. I can think of two solutions.
1. Add a script to your loading job that finds any partitions older than your threshold and drops them. Look at the partitions system view; if you want a more generic approach, you can extract the partition expression from the tables system view.
2. Instead of having to stay on top of the partition drops, you could create a view over your table and query that instead, so that only the past year of data is visible. Example:
create view myview
as
select * from mytable
where mydate >= current_timestamp - interval '1 year'
Or something similar, like trunc(current_timestamp - interval '1 year','MM'), etc. Then you can drop partitions at your leisure.
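For the first option, the "find anything older than the threshold" step might look roughly like the Python sketch below. It assumes the partition keys are YYYYMM integers and that you have already read them from the partitions system view into a list; the keys here are hard-coded, and the function name stale_partitions is hypothetical.

```python
from datetime import date


def stale_partitions(existing_keys, today: date, keep_months: int = 13):
    """Return the YYYYMM keys that fall before the oldest month we want to keep."""
    # The current month counts as one of the kept months, so go back keep_months - 1.
    total_months = today.year * 12 + (today.month - 1) - (keep_months - 1)
    year, month_index = divmod(total_months, 12)
    oldest_kept = year * 100 + month_index + 1
    return sorted(key for key in existing_keys if key < oldest_kept)


if __name__ == "__main__":
    # Sep-15 through Oct-16, as in the question: 14 partitions, one too many.
    keys = [201509, 201510, 201511, 201512, 201601, 201602, 201603,
            201604, 201605, 201606, 201607, 201608, 201609, 201610]
    print(stale_partitions(keys, date(2016, 10, 20)))
```

Each returned key would then be passed to the partition drop call, one at a time. Unlike a fixed "minus 13 months" expression, this also catches up if the job missed a month.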

Related

Isolating MySql records created today

I am just learning MySQL (and SQL in general) and I have a question. I ran a process to populate a table with 72 records. That worked, but I needed to run the process again, and this time it added a second record for each user, for a total of 144 records. How can I isolate the newest records created today?
A simple solution is to use current_date to figure out today's date and date() to remove the time portion of your column. Then:
where current_date = date(createdTS)
This is fine for a small dataset such as yours. As a general solution, you'd want a query that doesn't need to manipulate every row, e.g.
where createdTS >= current_date and createdTS < current_date + interval 1 day
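The second form is a half-open range: from midnight today (inclusive) to midnight tomorrow (exclusive). Here is a small Python sketch of the same comparison, just to show which timestamps fall inside it. The function names and sample values are made up for illustration.

```python
from datetime import date, datetime, time, timedelta


def today_bounds(today: date):
    """Return (start, end) where start is midnight today and end is midnight tomorrow."""
    start = datetime.combine(today, time.min)
    return start, start + timedelta(days=1)


def created_today(timestamps, today: date):
    """Keep only timestamps in the half-open range [start, end)."""
    start, end = today_bounds(today)
    return [ts for ts in timestamps if start <= ts < end]


if __name__ == "__main__":
    stamps = [
        datetime(2019, 7, 24, 23, 59, 59),  # yesterday, excluded
        datetime(2019, 7, 25, 0, 0, 0),     # exactly midnight, included
        datetime(2019, 7, 25, 15, 30, 0),   # today, included
        datetime(2019, 7, 26, 0, 0, 0),     # midnight tomorrow, excluded
    ]
    print(created_today(stamps, date(2019, 7, 25)))
```

Because the column itself is never wrapped in a function, the database is free to use an index on it, which is the point of the rewrite.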
You just have to use your createdTS column (assuming you know the timestamps of both runs).
SELECT * FROM `my_table` WHERE `createdTS` > '2019-07-25 15:00:00'
You could also RANK() over and get only the newest run for each user (something like this)

SQL Like Search

I am trying to find the devices that did not contain 01: in the past 7 days.
I have tried "Where column Not Like '%01:%'", but it just removes the 01: rows and still shows the machine that had the 01: entry in the past 7 days.
I have a table called devices. Each location has a unique ID number. Each device runs a job at 1am and 7pm. Devices should have 1 entry for 01:00:00 per week and 3 entries for 19:00:00 per week. An example of the cell data is 2017-10-23 19:00:02.
So I begin with
Select * From devices
Where locationid=##
AND jobdate < DATE_SUB(NOW(), INTERVAL 7 DAY)
AND jobdate not like '%01:%'
What I get in the result is the machine that did run at 01:00 2 days ago. The job date shows 19:00, so it seems the filter just removed the 01: row.
I am thinking of grouping the job data and then listing the computers that did not have 2017-10-23 01:00:02.
There is a good deal of intuition in the following suggestion, more on that later.
Most databases don't actually store date/time information in a WYSIWYG fashion. Indeed, if you think about it long enough you will understand that date/times are really "sets of numbers". That is why we can do things like calculate the number of days from date1 to date2. So, IF the data is stored as a datetime data type, don't attempt to use LIKE (which is for text) against a datetime column. Instead, look for date- and time-related functions that may apply to your situation. Here you are looking for rows not equal to a specific time of day (I think). So, to remove the "date" from consideration, convert the value to a "time", and then you can filter on that.
So below, I introduce a new column jobtime which is the time portion of jobdate, and then I look for any times not equal to a given value.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Devices
(`locationid` varchar(2), `jobdate` datetime)
;
INSERT INTO Devices
(`locationid`, `jobdate`)
VALUES
('##', '2017-10-23 01:00:00'),
('##', '2017-10-23 19:00:02')
;
Query 1:
select
*
from (
select locationid, cast(jobdate as time) jobtime, jobdate
from devices
) d
where locationid = '##'
and jobtime <> '01:00:00'
;
Results:
| locationid | jobtime | jobdate |
|------------|----------|----------------------|
| ## | 19:00:02 | 2017-10-23T19:00:02Z |
...
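The same idea can be seen outside SQL. This is only a sketch in Python, using the two sample rows above: a datetime is a structured value, so you can subtract two of them or pull out the time-of-day part and compare it, with no text matching involved.

```python
from datetime import datetime, time

# The two sample rows from the schema setup above.
job_dates = [
    datetime(2017, 10, 23, 1, 0, 0),
    datetime(2017, 10, 23, 19, 0, 2),
]

# Equivalent of cast(jobdate as time) <> '01:00:00': compare the time part only.
not_one_am = [d for d in job_dates if d.time() != time(1, 0, 0)]

# Date arithmetic works because the values are numbers underneath, not strings.
days_between = (datetime(2017, 10, 30) - datetime(2017, 10, 23)).days

if __name__ == "__main__":
    print(not_one_am)
    print(days_between)
```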
Why is there "intuition" above? (The "more on that later" part.)
It is remarkably frustrating to not know which database is in use because the syntax differs so much between the vendors. It is also essential to know the EXACT data type of the jobdate column - because if it is varchar for example I have just made a complete fool of myself in the query above. In other words we are not likely to answer because key facts are missing.
Finally, you have data! It's in your table(s) already. Why not make it easy on everyone by sharing a few bits of it? Provide "sample data" with your question, and the "expected result" too (i.e. provide 2 things, not one without the other, and do not use images of data!!!). Hopefully you can see from the example above how useful sample data & result is. For example, if my intuition is way off, you can tell in an instant that it is - even if you don't read the SQL.
Rant over, not all points raised here apply to this question.

Stored procedure for update date range and price in mysql

I have a table, let's say:
tblHotel
id
start_date
end_date
rate
Now I want to write a procedure that updates records for a date range. For example, I have this data:
id start_date end_date rate
1 2016/01/01 2016/01/10 10
2 2016/01/11 2016/01/20 50
Now, if a new date range and rate come from the supplier, I want to update the table's records. Say the new range is:
start_date end_date rate
2016/01/05 2016/01/12 100
Now updated records should be like this:
id start_date end_date rate
1 2016/01/01 2016/01/04 10
2 2016/01/05 2016/01/12 100
3 2016/01/13 2016/01/20 50
I'm not going to write the code for you, but handling overlapping time frame is tricky. You need to handle this as different cases:
If nothing overlaps, then this is simple:
insert into tblHotel(start_date, end_date, rate)
select $start_date, $end_date, $rate
from dual
where not exists (select 1
                  from tblHotel h
                  where h.start_date <= $end_date and h.end_date >= $start_date
                 );
Easy. In the stored procedure, the where condition can be handled using if logic.
Then the hard part. There are four types of overlaps:
   -------hhhhhhhhhhh--------
a) ---------xxxxx------------
b) -----xxxxxx---------------
c) --------------xxxxxx------
d) --xxxxxxxxxxxxxxxxxxxxxx--
And, then it gets a bit more complicated because a new rate period could overlap with more than one existing period.
Arrrg! How do you approach this? Carefully and with test cases. You might even want to use a cursor (although there are non-cursor-based methods as well).
The idea is to pull out one overlapping existing period. Then, for that period handle the logic:
a) The existing period needs to be split into two parts (with appropriate end dates). Then the new period can just be added.
b) The start date of the existing period has to change to one more than the end date of the new one. Then the new one inserted.
c) The end date of the existing period has to change to one less than the start date of the new one. Then the new one inserted.
d) The old record is removed and the new one inserted.
As I say, good tests for your stored procedure are important, so you can actually check that it works.
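To make the four cases concrete before writing the stored procedure, here is a rough sketch of the same logic in Python rather than MySQL. It assumes inclusive start and end dates and a table small enough to hold in a list; the function name apply_new_rate is made up, and this is one possible way to express the cases, not the only one.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def apply_new_rate(periods, new_start, new_end, new_rate):
    """Insert a new (start, end, rate) period, trimming or splitting overlapped ones.

    `periods` is a list of (start, end, rate) tuples with inclusive dates.
    """
    result = []
    for start, end, rate in periods:
        if end < new_start or start > new_end:
            result.append((start, end, rate))  # no overlap: keep unchanged
            continue
        if start < new_start:
            # Cases a and c: the part before the new period survives.
            result.append((start, new_start - ONE_DAY, rate))
        if end > new_end:
            # Cases a and b: the part after the new period survives.
            result.append((new_end + ONE_DAY, end, rate))
        # Case d: neither part survives, so the old record simply disappears.
    result.append((new_start, new_end, new_rate))
    return sorted(result)


if __name__ == "__main__":
    existing = [
        (date(2016, 1, 1), date(2016, 1, 10), 10),
        (date(2016, 1, 11), date(2016, 1, 20), 50),
    ]
    for row in apply_new_rate(existing, date(2016, 1, 5), date(2016, 1, 12), 100):
        print(row)
```

With the data from the question, this yields 01/01 to 01/04 at 10, 01/05 to 01/12 at 100, and 01/13 to 01/20 at 50. Because every existing period is checked, a new period that overlaps several old ones is handled by the same loop. The same cases make a reasonable test list for the stored procedure.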

doesn't partition pruning work if I have range size larger than number of partitions?

I have 15 million rows in my table, and new data arrives every 4 seconds. So I decided to partition by day, as follows:
ALTER TABLE vehicle_gps
PARTITION BY RANGE(UNIX_TIMESTAMP(gps_time)) (
PARTITION p01 VALUES LESS THAN (UNIX_TIMESTAMP('2014-01-01 00:00:00')),
.
.
.
PARTITION p365 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01 00:00:00')));
I had to create 365 partitions as shown. Each daily partition contains around 100 thousand rows.
And if I fetch data with a query such as
SELECT gps_time FROM vehicle_gps
WHERE gps_time BETWEEN '2014-05-01 00:00:00' AND '2014-05-06 00:00:00';
I found that partition pruning is not happening. The MySQL manual says that if the number of values in the range is larger than the number of partitions, pruning won't happen. If so, what is the point of partitioning tables that hold huge amounts of data like mine? I'm new to partitioning and confused, so please correct me if I'm wrong and help me learn.
Thank You :)
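The passage you found is about range conditions on tables partitioned by HASH or KEY, where pruning does not work with date columns. A small extract from the MySQL documentation: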
Pruning can be used only on integer columns of tables partitioned by HASH or KEY. For example, this query cannot use pruning because dob is a DATE column:
SELECT * FROM t4 WHERE dob >= '2001-04-14' AND dob <= '2005-10-15';
However, if the table stores year values in an INT column, then a query having WHERE year_col >= 2001 AND year_col <= 2005 can be pruned.
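To see why the size of the range matters for HASH partitioning, here is a toy model in Python. It assumes rows are assigned with MOD(year_col, number_of_partitions) and that the optimizer stops pruning once the range holds at least as many values as there are partitions. It is a simplification for illustration, not MySQL's actual implementation, and the function name is made up.

```python
def hash_partitions_for_range(low: int, high: int, num_partitions: int):
    """Return the set of partition numbers a WHERE col BETWEEN low AND high could touch."""
    range_size = high - low + 1
    if range_size >= num_partitions:
        # Too many distinct values: every partition may hold a match, nothing is pruned.
        return set(range(num_partitions))
    # A small integer range reduces to a handful of equalities, each mapping to one partition.
    return {value % num_partitions for value in range(low, high + 1)}


if __name__ == "__main__":
    # year_col >= 2001 AND year_col <= 2005 on a table with 8 hash partitions.
    print(hash_partitions_for_range(2001, 2005, 8))
    # A range far wider than the partition count touches everything.
    print(hash_partitions_for_range(0, 99, 8))
```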
Hope it helps!

Correct MySQL Structure for a Time Range for Query Optimization?

I have a scenario where I want to be able to SELECT rows from a MySQL table, but exclude rows where the current time-of-day is inside a time-range.
Example:
The "quiet" period for one row is 10pm - 8:30am.
My SQL SELECT statement should not return that row if the current server time is after 10pm or before 8:30am.
Example 2: The "quiet period" is NULL and ignored.
Example 3: A new row is created with a quiet period from 9:53am to 9:55am. If the current server time is in that 2-minute window, the row is not returned by the SELECT.
My question:
What data format would you use in the database, and how would you write the query?
I have thought about a few different approaches (defining start_time as one column and duration as another, defining both in seconds, using date stamps, and so on). None of them seems ideal, and they all require a lot of calculation.
Thanks!
I would store the start and end times as native MySQL TIME fields.
You would need to treat ranges that span midnight as two separate ranges, but then you can query the table like this to find all currently active quiet periods:
SELECT DISTINCT name FROM `quiet_periods`
WHERE start_time<=CURTIME() AND CURTIME()<=end_time
Or, to find all quiet periods that are not currently active:
SELECT name FROM quiet_periods WHERE name NOT IN (
SELECT name FROM `quiet_periods`
WHERE start_time<=CURTIME() AND CURTIME()<=end_time
)
So with sample data
id -- name -- start_time -- end_time
1 -- late_night -- 00:00:00 -- 08:30:00
2 -- late_night -- 22:00:00 -- 23:59:59
3 -- null_period -- NULL -- NULL
4 -- nearly_10am -- 09:53:00 -- 09:55:00
At 11pm this would return
null_period
nearly_10am
from the second query.
Depending on performance and how many rows you had you might want to refactor the second query into a JOIN and probably add the relevant INDEXes too.
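The "two separate ranges" step is the only fiddly part, so here is a small sketch of it in Python, assuming the application does the splitting before the rows are inserted. The helper names split_at_midnight and is_quiet are hypothetical; is_quiet mirrors the start_time<=CURTIME() AND CURTIME()<=end_time check above.

```python
from datetime import time

END_OF_DAY = time(23, 59, 59)
START_OF_DAY = time(0, 0, 0)


def split_at_midnight(start: time, end: time):
    """Return one range, or two ranges if the period wraps past midnight."""
    if start <= end:
        return [(start, end)]
    return [(start, END_OF_DAY), (START_OF_DAY, end)]


def is_quiet(now: time, ranges) -> bool:
    """True if `now` falls inside any (start, end) range, bounds inclusive."""
    return any(start <= now <= end for start, end in ranges)


if __name__ == "__main__":
    late_night = split_at_midnight(time(22, 0), time(8, 30))
    print(late_night)
    print(is_quiet(time(23, 0), late_night))
    print(is_quiet(time(9, 0), late_night))
```

For the 10pm to 8:30am example this produces the same two late_night rows as in the sample data, so the plain range comparison in the queries keeps working.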