I've been hammering my head against my desk for the past few days on this, and so I turn to you, Stack Overflow.
The software I'm working on has time-sensitive data. The usual solution for this is effective and expiration dates.
EFF_DT      XPIR_DT     VALUE
2000-05-01  2000-10-31  100
2000-11-01  (null)      90
This would be easy. Unfortunately, we require data that repeats on a yearly basis arbitrarily far into the future. In other words, each May 1 (starting in 2000) we may want the effective value to be 100, and each November 1 we may want to change it to 90.
This may be true for a long time (>50 years), and so I don't want to just create a hundred records. I.e., I don't want to do this:
EFF_DT      XPIR_DT     VALUE
2000-05-01  2000-10-31  100
2000-11-01  2001-04-30  90
2001-05-01  2001-10-31  100
2001-11-01  2002-04-30  90
2002-05-01  2002-10-31  100
2002-11-01  2003-04-30  90
...
2049-05-01  2049-10-31  100
2049-11-01  2050-04-30  90
2050-05-01  2050-10-31  100
2050-11-01  2051-04-30  90
These values may also change with time. Values before 2000 might have been constant (no flip-flopping) and values for the coming decade may be different than the values for the last:
EFF_DT      XPIR_DT     REPEATABLE  VALUE
1995-01-01  2000-04-30  false       85
2000-05-01  2010-04-30  true        100
2000-11-01  2010-10-31  true        90
2010-05-01  (null)      true        120
2010-11-01  (null)      true        115
We already have a text file (from a legacy app) that stores data in a form very close to this, so there are benefits to adhering to this type of structure as closely as possible.
The question then comes on retrieval: which value would apply to today, 2010-03-09?
It seems that the best way to do this would be to find the most recent instance of each effective date (of all the active rows), then see which is the greatest.
EFF_DT      MOST_RECENT  XPIR_DT     VALUE
2000-05-01  2009-05-01   2010-04-30  100
2000-11-01  2009-11-01   2010-10-31  90
The value for today would be 90, since 2009-11-01 is later than 2009-05-01.
On, say, 2007-06-20:
EFF_DT      MOST_RECENT  XPIR_DT     VALUE
2000-05-01  2007-05-01   2010-04-30  100
2000-11-01  2006-11-01   2010-10-31  90
The value would be 100 since 2007-05-01 is later than 2006-11-01.
Using the MySQL date functions, what's the most efficient way to calculate the MOST_RECENT field?
Or, can anyone think of a better way to do this?
The language is Java, if it matters. Thanks all!
Suppose your wanted date is '2007-06-20'.
You need to combine the non-repeating rows with the repeating ones, so you could do something like this (untested and probably needs some tinkering, but it should give you the general idea; a null XPIR_DT is treated as open-ended, and MOST_RECENT is each row's latest occurrence on or before the wanted date):
select * from (
    -- non-repeating rows: active if the wanted date falls inside [EFF_DT, XPIR_DT]
    select t.*, EFF_DT as MOST_RECENT
    from mytable t
    where repeatable = false
      and EFF_DT <= '2007-06-20'
      and (XPIR_DT is null or '2007-06-20' <= XPIR_DT)
    union all
    -- repeating rows: the most recent occurrence is this year's anniversary of
    -- EFF_DT, or last year's if this year's hasn't happened yet
    select t.*,
           if(str_to_date(date_format(EFF_DT, '2007-%m-%d'), '%Y-%m-%d') <= '2007-06-20',
              str_to_date(date_format(EFF_DT, '2007-%m-%d'), '%Y-%m-%d'),
              str_to_date(date_format(EFF_DT, '2006-%m-%d'), '%Y-%m-%d')) as MOST_RECENT
    from mytable t
    where repeatable = true
      and EFF_DT <= '2007-06-20'
      and (XPIR_DT is null or '2007-06-20' <= XPIR_DT)
) x
order by MOST_RECENT desc
limit 1
I've had to do similar things with recurring appointments & events, and you might find that MySQL will be a lot happier with the "static" date style that you don't want - each recurring instance spelled out in hundreds of rows.
If possible, I'd consider creating a separate table to store them flattened out, while keeping the effective/expires dates where they are (to match the legacy data and act as a parent), with a 1:many relation between the two tables (i.e. an "event_id" on the flattened data referencing the original's PK). Writing all those records will obviously take longer, but it directly lightens the read load (where things generally need to be faster).
Creating a stored procedure or external program to handle recalculating a flat start_date / end_date / value table should be fairly basic, given a common interval. Querying the data could then be as simple as WHERE #somedate BETWEEN start_date AND end_date, instead of increasingly complex conversions & date math.
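For example, a rough sketch of the flattened child table and the lookup it enables (table and column names here are illustrative, not taken from the legacy schema):
CREATE TABLE flattened_values (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  event_id   INT NOT NULL,  -- references the effective/expires parent row
  start_date DATE NOT NULL,
  end_date   DATE NOT NULL,
  value      INT NOT NULL,
  INDEX idx_dates (start_date, end_date)
);
-- retrieval is then a plain indexed range test
SELECT value
FROM flattened_values
WHERE '2010-03-09' BETWEEN start_date AND end_date;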
Again, INSERTs and UPDATEs will be slower, but "hundreds of rows" isn't even scratching the surface of what MySQL's capable of. If it's just 2 dates, an int, and some sort of int key, writing a few hundred records shouldn't take but a couple seconds on a sub-par server. If we were talking millions of records then maybe something could be tweaked (do you really need to track 50 years ahead or just the next 5? can recalculation be moved to off-peak times via cron? etc), but even then MySQL will just be that much more effective compared to calculating the difference every time.
Also maybe of interest: What's the best way to model recurring events in a calendar application? & Data structure for storing recurring events?
Here is a query that you can use to calculate the most recent EFF_DT for a data set. You will have to fill in the WHERE clause because I'm not sure how this data is organized.
select EFF_DT from date_table where 1 order by EFF_DT desc limit 1
The flip-flop of 90 and 100 is more complex, but you should be able to take care of it using the MySQL date and time functions. This is a tricky one, and I'm not 100% sure what you are trying to do. But this query checks whether the month of XPIR_DT is May (the 5th month) or later and before November (the 11th month). If that is true then the query will return 90; if it's false then you'll get 100.
select if(month(XPIR_DT) >= 5 and month(XPIR_DT) < 11, 90, 100) from date_table where id = 1
Related
I am trying to find the devices that did not have a 01: job in the past 7 days.
I have tried "Where column Not Like '%01:%'" but it just filters out the 01: rows and still shows the machines that had the 01: in the past 7 days.
I have a table called devices. Each location has a unique ID number. Each device runs a job at 1am and 7pm. Devices should have 1 entry for 01:00:00 per week, then 3 entries for 19:00:00 per week. An example of the cell data is 2017-10-23 19:00:02.
So I begin with
Select * From devices
Where locationid=##
AND jobdate > DATE_SUB(NOW(), INTERVAL 7 DAY)
AND jobdate not like '%01:%'
What I get in the result is the machine that did run at 01:00 2 days ago. The job date shows 19:00, so it sounds like it just removed the 01: rows.
I am thinking of grouping the job data and then listing the computers that did not have an entry like 2017-10-23 01:00:02.
There is a good deal of intuition in the following suggestion, more on that later.
Most databases don't actually store date/time information in a WYSIWYG fashion. Indeed, if you think about it long enough, you will see that date/times are really "sets of numbers". That is why we can do things like calculate the number of days from date1 to date2, etc. So, IF the data is stored as a datetime data type, don't attempt to use LIKE (which is for text) against a datetime column. Instead, look for date- and time-related functions that may apply to your situation. Here you are looking for "not equal to a specific time of day" (I think). So, to remove the "date" from consideration, convert it to a "time", and then you can filter on that.
So below, I introduce a new column jobtime which is the time portion of jobdate, and then I look for any times not equal to a given value.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Devices
(`locationid` varchar(2), `jobdate` datetime)
;
INSERT INTO Devices
(`locationid`, `jobdate`)
VALUES
('##', '2017-10-23 01:00:00'),
('##', '2017-10-23 19:00:02')
;
Query 1:
select *
from (
      select locationid, cast(jobdate as time) as jobtime, jobdate
      from devices
     ) d
where locationid = '##'
  and jobtime <> '01:00:00'
;
Results:
| locationid | jobtime | jobdate |
|------------|----------|----------------------|
| ## | 19:00:02 | 2017-10-23T19:00:02Z |
...
Why is there "intuition" above? (the "more on that later")
It is remarkably frustrating not to know which database is in use, because the syntax differs so much between vendors. It is also essential to know the EXACT data type of the jobdate column, because if it is varchar, for example, I have just made a complete fool of myself in the query above. In other words, we are not likely to answer well because key facts are missing.
Finally, you have data! It's in your table(s) already. Why not make it easy on everyone by sharing a few bits of it? Provide "sample data" with your question, and the "expected result" too (i.e. provide both, not one without the other, and do not use images of data!). Hopefully you can see from the example above how useful sample data and results are. For example, if my intuition is way off, you can tell in an instant, even if you don't read the SQL.
Rant over, not all points raised here apply to this question.
I would like to discuss the "best" way to store date periods in a database. Let's talk about SQL/MySQL, but this question may apply to any database. I have the sensation I have been doing something wrong for years...
In English, the information I have is:
- In year 2014, the value is 1000
- In year 2015, the value is 2000
- In year 2016, there is no value
- In year 2017 (and onward), the value is 3000
Someone may store it as:
BeginDate   EndDate     Value
2014-01-01  2014-12-31  1000
2015-01-01  2015-12-31  2000
2017-01-01  NULL        3000
Others may store it as:
Date        Value
2014-01-01  1000
2015-01-01  2000
2016-01-01  NULL
2017-01-01  3000
The validation rules for the first method look like mayhem to develop, in order to avoid holes and overlaps.
With the second method, the problem seems to be filtering a single date that falls inside a period.
Which would my colleagues prefer? Any other suggestions?
EDIT: I used full years only as an example; my data usually changes with day granularity.
EDIT 2: I thought about using the stored "Date" as "BeginDate", ordering the rows by Date, and then selecting the "EndDate" from the next (or previous) row. Storing "BeginDate" and "Interval" would lead to the same hole/overlap problem as method one, which I would need a complex validation rule to avoid.
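A minimal sketch of that next-row idea, assuming MySQL 8+ window functions (the table name period_table is illustrative):
-- derive each row's EndDate from the next row's Date, so only one date is stored per row
select `Date` as BeginDate,
       date_sub(lead(`Date`) over (order by `Date`), interval 1 day) as EndDate,
       Value
from period_table
order by BeginDate;
The last row's EndDate comes out NULL, which matches the open-ended 2017 period above.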
It mostly depends on the way you will be using this information - I'm assuming you do more than just store values for a year in your database.
Lots of guesses here, but I guess you have other tables with time-bounded data, and that you need to compare the dates to find matches.
For instance, in your current schema:
select *
from other_table ot
inner join year_table yt on ot.transaction_date between yt.year_start and yt.year_end
That should be an easy query to optimize - it's a straight data comparison, and if the table is big enough, you can add indexes to speed it up.
In your second schema suggestion, it's not as easy:
select *
from other_table ot
inner join year_table yt
on ot.transaction_date between yt.year_start
and yt.year_start + INTERVAL 1 YEAR
Crucially - this is harder to optimize, as every comparison needs to execute a scalar function. It might not matter - but with a large table, or a more complex query, it could be a bottleneck.
You can also store the year as an integer (as some of the commenters recommend).
select *
from other_table ot
inner join year_table yt on year(ot.transaction_date) = yt.year
Again - this is likely to have a performance impact, as every comparison requires a function to execute.
The purist in me doesn't like to store this as an integer - so you could also use MySQL's YEAR datatype.
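For instance, a sketch (the table name is illustrative):
CREATE TABLE year_table (
  `year` YEAR NOT NULL PRIMARY KEY,
  value  INT NULL
);
YEAR stores values from 1901 to 2155 in a single byte, so it keeps the semantic intent without the storage cost of a full date.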
So, assuming data size isn't an issue you're optimizing for, the solution really would lie in the way your data in this table relates to the rest of your schema.
I have, in part, the following MySQL schema:
ServiceRequests
----------
id int
RequestDateTime datetime
This is what a typical collection of records might look like.
1 | 2009-10-11 14:34:22
2 | 2009-10-11 14:34:56
3 | 2009-10-11 14:35:01
In this case the average request time is (34+5)/2 = 19.5 seconds, being
14:34:22 ---> (34 seconds) ----> 14:34:56 ------> (5 seconds) -----> 14:35:01
Basically I need to work out the difference in time between consecutive records, sum that up and divide by the number of records.
The closest thing I can think of is to convert the timestamp to epoch time and start there. I can add a field to the table to precalculate the epoch time if necessary.
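For what it's worth, MySQL can already do that conversion on the fly with UNIX_TIMESTAMP(), so a precalculated column may not be needed:
SELECT id, UNIX_TIMESTAMP(RequestDateTime) AS epoch_seconds
FROM ServiceRequests;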
How do I determine 19.5 using a sql statement(s)?
You don't really need to know the time difference of each record to get the average. You have x data points ranging from some point t0 to t1. Notice that the last time - first time is also 39 sec. (max - min) / (count - 1) should work for you:
select (max(RequestDateTime) - min(RequestDateTime)) / (count(id) - 1) from ServiceRequests;
Note: This will not work if the table has only one record, due to a divide by zero (and an empty table returns NULL).
Note2: Different databases handle subtraction of dates differently so you may need to turn that difference into seconds.
Hint: maybe using TIMEDIFF(expr1,expr2) and/or TIME_TO_SEC(expr3)
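Putting the hint together, a MySQL-specific sketch that converts the span to seconds before dividing:
select time_to_sec(timediff(max(RequestDateTime), min(RequestDateTime)))
       / (count(id) - 1) as avg_gap_seconds
from ServiceRequests;
For the three sample rows above this yields 39 / 2 = 19.5. Note that TIMEDIFF returns a TIME value, which caps at roughly 35 days; TIMESTAMPDIFF(SECOND, min(...), max(...)) avoids that limit for longer spans.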
I have a MySQL table containing a column to store time and another to store a value associated with that time.
time | value
------------
1 | 0.5
3 | 1.0
4 | 1.5
.... | .....
The events are not periodic, i.e., the time values do not increment by a fixed interval.
As there is a large number of rows (> 100000), for the purpose of showing the values in a graph I would like to be able to aggregate (mean) the values over an interval of fixed size, across the entire length of time for which data is available. So basically the output should consist of pairs of intervals and mean values.
Currently, I am splitting the total time range into fixed chunks, executing an individual aggregate query for each chunk, and collecting the results in application code (Java). Is there a way to do all of these steps in SQL? Also, I am currently using MySQL but am open to other databases that might support an efficient solution.
SELECT FLOOR(time / x) AS Inter, AVG(value) AS Mean
FROM `table`
GROUP BY Inter;
Where x is your interval of fixed size.
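A variant that labels each bucket by its starting time, which is often handier for graphing (same grouping, just scaled back up):
SELECT FLOOR(time / x) * x AS interval_start, AVG(value) AS mean_value
FROM `table`
GROUP BY interval_start
ORDER BY interval_start;
With x = 2, the sample rows above land in the buckets starting at 0 (time 1), 2 (time 3), and 4 (time 4).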
I've usually solved this through a "period" table, with all the valid times in it, and an association with the period on which I report.
For instance:
time  day  week  month  year
1     1    1     1      2001
2     1    1     1      2001
....
999   7    52    12     2010
You can then join your time to the "period" table time, and use AVG.
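A sketch of that join, with period_table standing in for the "period" table described above:
select p.year, p.week, avg(t.value) as mean_value
from `table` t
join period_table p on p.time = t.time
group by p.year, p.week
order by p.year, p.week;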
I have the following table in MySQL that records event counts of stuff happening each day
event_date   event_count
2011-05-03   21
2011-05-04   12
2011-05-05   12
I want to be able to query this efficiently by date range AND by day of week. For example - "What is the event_count on Tuesdays in May?"
Currently the event_date field is a date type. Are there any functions in MySQL that let me query this column by day of week, or should I add another column to the table to store the day of week?
The table will hold hundreds of thousands of rows, so given a choice I'll choose the most efficient solution (as opposed to most simple).
Use DAYOFWEEK in your query, something like:
SELECT * FROM mytable WHERE MONTH(event_date) = 5 AND DAYOFWEEK(event_date) = 7;
This will find all info for Saturdays in May.
To get the fastest reads store a denormalized field that is the day of the week (and whatever else you need). That way you can index columns and avoid full table scans.
Just try the above first to see if it suits your needs and if it doesn't, add some extra columns and store the data on write. Just watch out for update anomalies (make sure you update the day_of_week column if you change event_date).
Note that the denormalized fields will increase the time taken to do writes, increase calculations on write, and take up more space. Make sure you really need the benefit and can measure that it helps you.
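A sketch of the denormalized route, assuming MySQL 5.7+ generated columns (the table name events is illustrative); a STORED generated column keeps day_of_week in sync automatically, sidestepping the update anomaly:
ALTER TABLE events
  ADD COLUMN day_of_week TINYINT AS (DAYOFWEEK(event_date)) STORED,
  ADD INDEX idx_dow_date (day_of_week, event_date);
-- "Tuesdays in May 2011" can then use the index (DAYOFWEEK: 1 = Sunday, so 3 = Tuesday)
SELECT SUM(event_count)
FROM events
WHERE day_of_week = 3
  AND event_date BETWEEN '2011-05-01' AND '2011-05-31';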
Check the DAYOFWEEK() function.
If you want a textual representation of the day of week, use the DAYNAME() function.