The price calculation for the products I rent out is very complicated; the signature for it basically looks like this:
price = f( from_date, to_date, x ) - i.e. it depends on 3 parameters: from_date, to_date, and x, where from_date and to_date are dates, and x is a numeric (int) value in the range 1 to 10.
f is the complicated underlying function.
Main constraints:
1) from_date accepts values in [ today; today + 1 year ]
2) to_date > from_date
3) |(to_date - from_date)| takes values in [1..365]
I use MySQL as my main storage and I have around 50,000 items.
I would like to build a search page where you could filter or sort by price (after entering the dates; x defaults to 1 if omitted).
I understand that this cannot be converted into a MySQL query, because it is too complicated (and even if it were possible, it would be very slow and inefficient).
I thought that maybe I could use a key-value storage like redis or memcache to pre-calculate price values for all possible date ranges, store them there, and invalidate if needed (and that will not happen often).
But, using some basic math, I figured out that with this approach I would need at most 33,580,500,000 keys: (366 * 367 / 2) * 10 * 50,000.
So my question is: am I thinking in the right direction?
If yes, are there any efficient solutions (in terms of number of keys, memory footprint, and scalability) to this problem?
Thanks in advance!
I would like to discuss the "best" way to store date periods in a database. Let's talk about SQL/MySQL, but this question could apply to any database. I have a feeling I have been doing something wrong for years...
In plain English, the information I have is:
-In year 2014, value is 1000
-In year 2015, value is 2000
-In year 2016, there is no value
-In year 2017 (and go on), value is 3000
Some may store it as:
BeginDate EndDate Value
2014-01-01 2014-12-31 1000
2015-01-01 2015-12-31 2000
2017-01-01 NULL 3000
Others may store it as:
Date Value
2014-01-01 1000
2015-01-01 2000
2016-01-01 NULL
2017-01-01 3000
With the first method, the validation rules needed to avoid holes and overlaps look like mayhem to develop.
With the second method, the problem seems to be filtering a single (punctual) date against a period.
Which do my colleagues prefer? Any other suggestions?
EDIT: I used full year only for example, my data usually change with day granularity.
EDIT 2: I thought about using the stored "Date" as "BeginDate", ordering the rows by Date, then taking the "EndDate" from the next (or previous) row. Storing "BeginDate" and "Interval" would lead to the same hole/overlap problem as the first method, which I would need a complex validation rule to avoid.
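For reference, a minimal sketch of that EDIT 2 idea, assuming MySQL 8+ (for LEAD) and an illustrative table named period_value with columns Date and Value:
-- derive EndDate from the next row's Date
SELECT
    `Date` AS BeginDate,
    DATE_SUB(LEAD(`Date`) OVER (ORDER BY `Date`), INTERVAL 1 DAY) AS EndDate,
    `Value`
FROM period_value;
-- looking up the value effective on a given day stays a single query
-- (returns NULL for a date in 2016, matching the "no value" row)
SELECT `Value`
FROM period_value
WHERE `Date` <= '2016-06-15'
ORDER BY `Date` DESC
LIMIT 1;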
It mostly depends on the way you will be using this information - I'm assuming you do more than just store values for a year in your database.
Lots of guesses here, but I assume you have other tables with time-bounded data, and that you need to compare the dates to find matches.
For instance, in your current schema:
select *
from other_table ot
inner join year_table yt on ot.transaction_date between yt.year_start and yt.year_end
That should be an easy query to optimize - it's a straight data comparison, and if the table is big enough, you can add indexes to speed it up.
In your second schema suggestion, it's not as easy:
select *
from other_table ot
inner join year_table yt
on ot.transaction_date between yt.year_start
and yt.year_start + INTERVAL 1 YEAR
Crucially, this is harder to optimize, as every comparison has to evaluate the date arithmetic (year_start + INTERVAL 1 YEAR) on the fly instead of comparing two stored values. It might not matter - but with a large table, or a more complex query, it could be a bottleneck.
You can also store the year as an integer (as some of the commenters recommend).
select *
from other_table ot
inner join year_table yt on year(ot.transaction_date) = yt.year
Again, this is likely to have a performance impact: wrapping the column in YEAR() means the function runs for every row, and an index on transaction_date can't be used for the join.
The purist in me doesn't like to store this as an integer - so you could also use MySQL's YEAR datatype.
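If the year does get stored as a number (INT or MySQL's YEAR type), one way to keep the join index-friendly is to turn the year back into a date range instead of wrapping the column in YEAR() - a hedged sketch, reusing the column names assumed above:
select *
from other_table ot
inner join year_table yt
    on ot.transaction_date >= MAKEDATE(yt.`year`, 1)       -- Jan 1 of that year
   and ot.transaction_date <  MAKEDATE(yt.`year` + 1, 1)   -- Jan 1 of the next year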
So, assuming data size isn't an issue you're optimizing for, the solution really would lie in the way your data in this table relates to the rest of your schema.
I have a table.
And it has two fields id and datetime.
What I need to do is, for any two given datetimes, I need to divide the time range into 10 equal intervals and give row count of each interval.
Please let me know whether this is possible without using any external support from languages like Java or PHP.
-- time1 and time2 are the two given datetimes (substitute literals or parameters)
SELECT
    FLOOR((UNIX_TIMESTAMP(date_col) - UNIX_TIMESTAMP(time1))
          / ((UNIX_TIMESTAMP(time2) - UNIX_TIMESTAMP(time1)) / 10)) AS bucket,
    COUNT(id)
FROM my_table
WHERE date_col >= time1 AND date_col <= time2
GROUP BY bucket
I haven't tested it, but something like this should work: each row falls into one of ten equal-width buckets (0-9; a row landing exactly on time2 gets bucket 10, which you can fold back into bucket 9 with LEAST if needed).
The easiest way to divide date intervals is if you store them as longs (i.e. the number of ticks since the "beginning of time", such as a Unix timestamp). As far as I know, there is no way to do it using MySQL's datetime.
How you decide to do it ultimately depends on your application. I would store them as longs and have whatever front end you are using handle the conversion to a more readable format.
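For illustration, a minimal sketch of that approach, assuming the timestamp is stored as a BIGINT of epoch seconds in a column named ts (the column name and example values are assumptions):
SET @t1 = 1609459200, @t2 = 1612137600;  -- the two given times as epoch seconds
SELECT
    FLOOR((ts - @t1) / ((@t2 - @t1) / 10)) AS bucket,   -- buckets 0-9 (a row exactly at @t2 lands in 10)
    COUNT(id)
FROM my_table
WHERE ts BETWEEN @t1 AND @t2
GROUP BY bucket;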
What exactly do you mean by giving the row count of each interval? That part doesn't make sense to me.
I have a database table that holds order information and saves the current gold price (world price) automatically at the time of entry.
If I look back on the table and the saved gold price differs from the current gold price by +100 or more, I want to show that in a report.
How would I write the MySQL query to do this? I know how to do date diffs, but not differences of numeric values.
Example:
select * from table where saved_price < current_price - 100
Is there a better way to do this?
Use ABS() if the sign of the difference doesn't matter:
select * from table where abs(saved_price - current_price) > 100
If the sign is interesting, your suggested approach is fine. I'd write it like this, but use whichever form you find most readable and understandable:
select * from table where (current_price - saved_price) >= 100
I'm currently developing a program that will generate reports based upon lead data. My issue is that I'm running 3 queries for something that I would like to only have to run one query for. For instance, I want to gather data for leads generated in the past day
submission_date > (NOW() - INTERVAL 1 DAY)
and I would like to find out how many total leads there were, and how many sold leads there were, in that timeframe (sold = 1 / sold = 0). The issue is that this is currently done with 2 queries, one with WHERE sold = 1 and one with WHERE sold = 0. This is all well and good, but when I want to generate this data for the past day, week, month, year, and all time, I will have to run 10 queries to obtain it. I feel like there HAS to be a more efficient way of doing this. I know I can create a MySQL function for this, but I don't see how that would solve the problem.
Thanks!!
Why not GROUP BY sold, so you get the totals for sold and not sold?
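A minimal sketch of that, assuming a table named leads with the sold and submission_date columns from the question:
SELECT sold, COUNT(*) AS total
FROM leads
WHERE submission_date > (NOW() - INTERVAL 1 DAY)
GROUP BY sold;
This returns one row per sold value (0 and 1) with its count for the chosen timeframe.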
One way to do it is to exploit the aggregate functions (usually SUM and COUNT help the most in this situation) along with MySQL's IF() function.
For example, you could use a query such as:
SELECT
    SUM(IF(sold = 1, 1, 0)) AS TotalSold,
    SUM(IF(sold = 0, 1, 0)) AS TotalUnsold,
    SUM(IF(submission_date > (NOW() - INTERVAL 1 WEEK)
           AND sold = 1, 1, 0)) AS TotalSoldThisWeek
FROM ...
WHERE ...
The condition (e.g. sold = 1) could be as complex as you want by using AND and OR.
Disclaimer: the code wasn't tested; it is just provided as an example that should work with minor modifications.
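Extending the same idea, all of the requested timeframes can come back in a single pass. A sketch, assuming a table named leads and the column names used above (in MySQL, SUM over a boolean expression counts the rows where it is true):
SELECT
    SUM(sold = 1) AS sold_all_time,
    SUM(sold = 0) AS unsold_all_time,
    SUM(sold = 1 AND submission_date > NOW() - INTERVAL 1 DAY)   AS sold_past_day,
    SUM(sold = 0 AND submission_date > NOW() - INTERVAL 1 DAY)   AS unsold_past_day,
    SUM(sold = 1 AND submission_date > NOW() - INTERVAL 1 WEEK)  AS sold_past_week,
    SUM(sold = 0 AND submission_date > NOW() - INTERVAL 1 WEEK)  AS unsold_past_week,
    SUM(sold = 1 AND submission_date > NOW() - INTERVAL 1 MONTH) AS sold_past_month,
    SUM(sold = 0 AND submission_date > NOW() - INTERVAL 1 MONTH) AS unsold_past_month,
    SUM(sold = 1 AND submission_date > NOW() - INTERVAL 1 YEAR)  AS sold_past_year,
    SUM(sold = 0 AND submission_date > NOW() - INTERVAL 1 YEAR)  AS unsold_past_year
FROM leads;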
I've been hammering my head against my desk for the past few days on this, and so I turn to you, Stack Overflow.
The software I'm working on has time-sensitive data. The usual solution for this is effective and expiration dates.
EFF_DT XPIR_DT VALUE
2000-05-01 2000-10-31 100
2000-11-01 (null) 90
This would be easy. Unfortunately, we require data that repeats on a yearly basis arbitrarily far into the future. In other words, each May 1 (starting in 2000) we may want the effective value to be 100, and each November 1 we may want to change it to 90.
This may be true for a long time (>50 years), and so I don't want to just create a hundred records. I.e., I don't want to do this:
EFF_DT XPIR_DT VALUE
2000-05-01 2000-10-31 100
2000-11-01 2001-04-30 90
2001-05-01 2001-10-31 100
2001-11-01 2002-04-30 90
2002-05-01 2002-10-31 100
2002-11-01 2003-04-30 90
...
2049-05-01 2049-10-31 100
2049-11-01 2050-04-30 90
2050-05-01 2050-10-31 100
2050-11-01 2051-04-30 90
These values may also change with time. Values before 2000 might have been constant (no flip-flopping) and values for the coming decade may be different than the values for the last:
EFF_DT XPIR_DT REPEATABLE VALUE
1995-01-01 2000-04-30 false 85
2000-05-01 2010-04-30 true 100
2000-11-01 2010-10-31 true 90
2010-05-01 (null) true 120
2010-11-01 (null) true 115
We already have a text file (from a legacy app) that stores data in a form very close to this, so there are benefits to adhering to this type of structure as closely as possible.
The question then comes on retrieval: which value would apply to today, 2010-03-09?
It seems that the best way to do this would be to find the most recent instance of each effective date (of all the active rows), then see which is the greatest.
EFF_DT MOST_RECENT XPIR_DT VALUE
2000-05-01 2009-05-01 2010-04-30 100
2000-11-01 2009-11-01 2010-10-31 90
The value for today would be 90, since 2009-11-01 is later than 2009-05-01.
On, say, 2007-06-20:
EFF_DT MOST_RECENT XPIR_DT VALUE
2000-05-01 2007-05-01 2010-04-30 100
2000-11-01 2006-11-01 2010-10-31 90
The value would be 100 since 2007-05-01 is later than 2006-11-01.
Using the MySQL date functions, what's the most efficient way to calculate the MOST_RECENT field?
Or, can anyone think of a better way to do this?
The language is Java, if it matters. Thanks all!
Suppose your wanted 'date' is '2007-06-20'.
You need to combine the non-repeating elements with the repeating ones, so you could do something like this (untested and probably needing some tinkering, but it should give you the general idea):
select * from (
    select *, EFF_DT as LAST_OCCURRENCE
    from mytable
    where repeatable = false
      and EFF_DT <= '2007-06-20'
      and (XPIR_DT is null or '2007-06-20' < XPIR_DT)
    union all
    select *,
        -- most recent anniversary of EFF_DT on or before the wanted date
        str_to_date(concat(year('2007-06-20')
                             - (date_format(EFF_DT, '%m-%d') > date_format('2007-06-20', '%m-%d')),
                           date_format(EFF_DT, '-%m-%d')),
                    '%Y-%m-%d') as LAST_OCCURRENCE
    from mytable
    where repeatable = true
      and EFF_DT <= '2007-06-20'
      and (XPIR_DT is null or '2007-06-20' < XPIR_DT)
) as candidates
order by LAST_OCCURRENCE desc
limit 1
I've had to do similar things with recurring appointments & events, and you might find that MySQL will be a lot happier with the "static" date style that you don't want - each recurring instance spelled out in hundreds of rows.
If possible, I'd consider creating a separate table to store them flattened out, while keeping the effective/expires dates where they are (to match legacy data & act as a parent), and a 1:many relation between the two tables (i.e. an "event_id" on the flattened data referencing the original's PK). Writing all those records will obviously take longer, but it's directly lightening the load from reading them (where things generally need to be faster).
Creating a stored procedure or external program to handle recalculating a flat start_date / end_date / value table should be fairly basic, given a common interval. Querying the data could then be as simple as WHERE #somedate BETWEEN start_date AND end_date, instead of increasingly complex conversions & date math.
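A minimal sketch of that layout, with illustrative names (rate_rule_flat as the flattened child table; rule_id points back at the original EFF_DT/XPIR_DT row):
CREATE TABLE rate_rule_flat (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    rule_id    INT NOT NULL,              -- FK to the original effective/expires row
    start_date DATE NOT NULL,
    end_date   DATE NOT NULL,
    value      INT NOT NULL,
    KEY idx_dates (start_date, end_date)
);
-- lookup becomes a plain, index-friendly range test
SELECT value
FROM rate_rule_flat
WHERE '2010-03-09' BETWEEN start_date AND end_date
ORDER BY start_date DESC
LIMIT 1;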
Again, INSERTs and UPDATEs will be slower, but "hundreds of rows" isn't even scratching the surface of what MySQL's capable of. If it's just 2 dates, an int, and some sort of int key, writing a few hundred records shouldn't take but a couple seconds on a sub-par server. If we were talking millions of records then maybe something could be tweaked (do you really need to track 50 years ahead or just the next 5? can recalculation be moved to off-peak times via cron? etc), but even then MySQL will just be that much more effective compared to calculating the difference every time.
Also maybe of interest: What's the best way to model recurring events in a calendar application? & Data structure for storing recurring events?
Here is a query that you can use to calculate the most recent EFF_DT for a data set. You will have to fill in the WHERE clause because I'm not sure how this data is organized.
select EFF_DT from date_table where 1 order by EFF_DT desc limit 1
The flip-flop of 90 and 100 is more complex, but you should be able to take care of it using the MySQL date and time functions. This is a tricky one, and I'm not 100% sure what you are trying to do. But this query checks whether the month of XPIR_DT is May (the 5th month) or later and before November (the 11th month). If that is true then the query will return 90; if it's false you'll get 100.
select if((month(XPIR_DT)>=5) and (month(XPIR_DT)<11),90,100) from date_table where id=1