I have a relational database (RDB) with a quantity x, the date tracking of that quantity started (date_1), and the date tracking finished (date_2). If tracking is still ongoing, date_2 is NULL.
What I would like to do is take the number x and average it over the period between date_1 and date_2. If date_2 is NULL, use the current time instead. Any help?
[EDIT] To clarify in RDB terms: one row with a data column (x), a data column (date_1) and a data column (date_2), along with other fields of importance.
[EDIT] So imagine x as some integer like 100,000 and the dates being March 30, 2016 12:29:45 and April 3, 2016 03:42:29. I'm not sure how to break down the date/times yet, so I'm open to suggestions. The end goal is to calculate how much of x can be allocated to one month versus the other. How finely you break down the time frame (days vs. seconds) will ultimately change those numbers.
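One way to read the goal above is as a proration: split x across calendar months in proportion to the time each month contributes to the tracked interval. The sketch below is only an illustration at second-level granularity; the function name `allocate_by_month` and the choice to fall back to the current time when the end date is missing are assumptions, not part of the original question.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple


def allocate_by_month(x: float, start: datetime,
                      end: Optional[datetime] = None) -> Dict[Tuple[int, int], float]:
    """Split x across (year, month) keys in proportion to elapsed seconds."""
    if end is None:
        # Mirrors "date_2 is NULL": tracking is still running, so use now.
        end = datetime.now()
    total = (end - start).total_seconds()
    if total <= 0:
        raise ValueError("end must be after start")

    shares: Dict[Tuple[int, int], float] = {}
    cursor = start
    while cursor < end:
        # First instant of the month after the cursor's month.
        if cursor.month == 12:
            next_month = datetime(cursor.year + 1, 1, 1)
        else:
            next_month = datetime(cursor.year, cursor.month + 1, 1)
        segment_end = min(next_month, end)
        seconds = (segment_end - cursor).total_seconds()
        shares[(cursor.year, cursor.month)] = x * seconds / total
        cursor = segment_end
    return shares


# The dates from the question: most of the interval falls in April.
shares = allocate_by_month(
    100_000,
    datetime(2016, 3, 30, 12, 29, 45),
    datetime(2016, 4, 3, 3, 42, 29),
)
```

Working in whole days instead of seconds would shift the split slightly, which is the granularity effect described above.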
I am trying to pull a report from a data set. The conditions are as follows:
Customer A,B and C produced 100, 150 and 200 tickets respectively in a year.
A's period from 1/1/2022 till 3/30/2022
B's period from 1/10/2022 till 6/20/2022
C's period from 6/10/2022 till 9/5/2022
I want to pull how many cases each customer produced while they were in their incubation period, so that the report does not include any cases outside the customer's incubation period.
The start date and end date are available in a table.
Hopefully I was able to explain this, thanks for your help.
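A report like this is usually a join between the period table and the ticket table, with the date range in the join condition. The sketch below uses SQLite through Python only so that it runs anywhere; the table names `customer_periods` and `tickets`, their columns, and the sample tickets are all hypothetical, since the question does not show the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_periods (
    customer   TEXT PRIMARY KEY,
    start_date TEXT,   -- ISO dates, so string comparison matches date order
    end_date   TEXT
);
CREATE TABLE tickets (
    ticket_id  INTEGER PRIMARY KEY,
    customer   TEXT,
    created_on TEXT
);
INSERT INTO customer_periods VALUES
    ('A', '2022-01-01', '2022-03-30'),
    ('B', '2022-01-10', '2022-06-20'),
    ('C', '2022-06-10', '2022-09-05');
INSERT INTO tickets (customer, created_on) VALUES
    ('A', '2022-02-15'),   -- inside A's period
    ('A', '2022-04-02'),   -- after A's period, ignored
    ('B', '2022-01-10'),   -- on the boundary, counted
    ('B', '2022-06-21'),   -- after B's period, ignored
    ('C', '2022-07-01'),
    ('C', '2022-08-01');
""")

# LEFT JOIN keeps customers that have no tickets inside their period (count 0).
rows = conn.execute("""
    SELECT p.customer, COUNT(t.ticket_id) AS tickets_in_period
    FROM customer_periods p
    LEFT JOIN tickets t
           ON t.customer = p.customer
          AND t.created_on BETWEEN p.start_date AND p.end_date
    GROUP BY p.customer
    ORDER BY p.customer
""").fetchall()
```

Putting the date filter in the ON clause rather than in WHERE is what keeps a customer with zero qualifying tickets in the report.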
!!! ---------------------!!!
After exchanging some comments with people, I have decided to include the source data, so anyone who wishes to help with solving this could load it into a table and run the query.
Here is the link that retrieves data from Yahoo Finance for symbol SPY in csv format.
https://query1.finance.yahoo.com/v7/finance/download/SPY?period1=1584963550&period2=1616499550&interval=1d&events=history&includeAdjustedClose=true
The file header needs to be changed. Change Date to Source_Date and Adj Close to Adj_Close. The file does not have a Symbol column, so the query doesn't need to reference it. The only two relevant columns are Source_Date and Adj_Close.
!!! ---------------------!!!
The issue with my query is that it takes a very long time to run. The query is not wrong. I know exactly why it takes such a long time to run. I just couldn't come up with anything more efficient.
First, here is the business logic.
Let's say I bought Apple stock and it went down. It's been 10 days since I bought it and it is still down. I extracted the entire history of daily prices for Apple going back to 1993 and uploaded it into a database table. Now I want to write a query that will tell me how often Apple's stock price took more than 10 days to recover.
For example. I bought Apple at $100. It went down to $90. It's been 10 days since I bought it. I run my query and it comes back with something like this:
-- Buy Date: April 1, 2001. Buy price: $10. Recovered on: April 12, 2001. Days to recovery: 12
-- Buy Date: June 12, 2006. Buy price: $23. Recovered on: July 20, 2006. Days to recovery: 38
-- Buy Date: January 15, 2009. Buy price: $65. Recovered on: December 30, 2010. Days to recovery: 700
Each example indicates that on each day between Buy Date and Recovered on date, the price stayed below the buy price.
My query has two steps.
The first step. Join the same table to itself, qualified as x and y. For each x.record it searches for the next minimum date where y.price is higher than x.price.
The second query will simply extract only those records from the first query result set where the difference between x.date and y.date is greater than 10.
The reason why the first query (see below) runs for such a long time is because for each record in x.table the query must search an entire y.table. That's a lot of table scans. It comes back with the result but it takes between 50 and 60 seconds.
The table structure is very simple: Symbol, Date, and Price. Symbol and Date form the primary key.
SELECT x.Symbol,
x.Source_Date as 'Source_Date',
min(y.Source_Date) as 'Recovery_Date'
FROM transformed_source x,
transformed_source y
where x.Symbol = 'AAPL'
and y.Symbol = x.Symbol
and y.Source_Date > x.Source_Date
and y.Adj_Close > x.Adj_Close
group by x.Symbol, x.Source_Date
*** Note: this query will miss records where the price never recovered, so I will need to modify it with an outer join. That won't make it run any faster, so I am working with this version for now.
Any ideas are welcome.
Thank you
You might find that a correlated subquery is better:
SELECT ts.Symbol, ts.Source_Date,
(SELECT MIN(ts2.Source_Date)
FROM transformed_source ts2
WHERE ts2.Symbol = ts.Symbol AND
ts2.Source_Date > ts.Source_Date AND
ts2.Adj_Close > ts.Adj_Close
) as Recovery_Date
FROM transformed_source ts
WHERE ts.Symbol = 'AAPL';
Then for performance, you want an index on transformed_source(Symbol, Source_Date, Adj_Close).
If you're using an RDBMS that provides row pattern matching (e.g. Oracle with MATCH_RECOGNIZE), then you can write a query that makes a single pass over your table.
I've put together a DBFiddle
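If the data can be pulled out of the database, the same "first later day with a higher price" lookup can also be done in one pass in application code. The sketch below is not the MATCH_RECOGNIZE approach; it swaps in a monotonic stack (the classic "next greater element" technique). The function name `recovery_dates` and the sample prices are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple


def recovery_dates(rows: List[Tuple[date, float]]) -> List[Tuple[date, Optional[date]]]:
    """rows must be sorted by date ascending.

    For each day, return the first later date whose price is strictly
    higher, or None if the price never recovered.
    """
    recovered: List[Optional[date]] = [None] * len(rows)
    waiting: List[int] = []  # indices of days still waiting for a higher price
    for i, (day, price) in enumerate(rows):
        # Every waiting day with a lower price recovers today.
        while waiting and rows[waiting[-1]][1] < price:
            recovered[waiting.pop()] = day
        waiting.append(i)
    return [(rows[i][0], recovered[i]) for i in range(len(rows))]


prices = [
    (date(2001, 4, 1), 100.0),
    (date(2001, 4, 2), 90.0),
    (date(2001, 4, 5), 95.0),
    (date(2001, 4, 12), 101.0),
    (date(2001, 4, 13), 80.0),
]
result = recovery_dates(prices)

# Second step from the question: keep recoveries that took more than 10 days.
slow = [(buy, rec, (rec - buy).days)
        for buy, rec in result
        if rec is not None and (rec - buy).days > 10]
```

Each row is pushed and popped at most once, so the work is linear in the number of rows instead of one scan per row.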
Right now I am developing Hotel reservation system.
I need to store prices for specific dates or date ranges in the future, since the price varies from day to day, so I need to store those price and date details in the database. I thought of two structures.
1st model :
room_prices :
room_id :
from_date :
to_date :
price:
is_available:
2nd design:
room_prices :
room_id :
date:
price:
is_available
I found the second method easier, but the stored data grows quickly as my hotel list grows.
Suppose I want to store the next two months of price data: I need to create about 60 records for every hotel.
With the first design, I don't need that many records.
ex:
```
price values for X hotel :
1-Dec-15 - 10-Dec-15 -- $200,
1st design requires : only 1 row
2nd design requires : 10 rows
```
I am using MySQL. Will either design degrade performance when searching the room_prices table?
Could someone suggest a better design?
I have actually worked on designing and implementing a Hotel Reservation system and can offer the following advice based on my experience.
I would recommend your second design option, storing one record for each individual Date / Hotel combination. The reason being that although there will be periods where a Hotel's Rate is the same across multiple days it is more likely that, depending on availability, it will change over time and become different (hotels tend to increase the room rate as the availability drops).
Also there are other important pieces of information that will need to be stored that are specific to a given day:
You will need to manage the hotel availability, i.e. on Date x there are y rooms available. This will almost certainly vary by day.
Some hotels have blackout periods where the hotel is unavailable for short periods of time (typically specific days).
Lead Time - some Hotels only allow rooms to be booked a certain number of days in advance; this can differ between weekdays and weekends.
Minimum nights - again data stored by individual date that says if you arrive on this day you must stay x number of nights (say over a weekend).
Also consider a person booking a week long stay, the database query to return the rates and availability for each day of that stay is a lot more concise if you have a pricing record for each Date. You can simply do a query where the Room Rate Date is BETWEEN the Arrival and Departure Date to return a dataset with one record per date of the stay.
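The query described above can be sketched as follows. SQLite through Python is used only so the example is runnable; the table follows the one-row-per-date design from the question, while the sample rates, the year 2015 and the stay dates are made up for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE room_prices (
        room_id      INTEGER,
        date         TEXT,      -- ISO 'YYYY-MM-DD'
        price        REAL,
        is_available INTEGER,
        PRIMARY KEY (room_id, date)
    )
""")

# One row per date for December: 100 normally, 110 on the 3rd and 25th.
first_day = date(2015, 12, 1)
conn.executemany(
    "INSERT INTO room_prices VALUES (?, ?, ?, ?)",
    [
        (1, (first_day + timedelta(days=i)).isoformat(),
         110.0 if i in (2, 24) else 100.0, 1)
        for i in range(31)
    ],
)

# One record per date of the stay, straight from a BETWEEN filter.
stay = conn.execute(
    """
    SELECT date, price
    FROM room_prices
    WHERE room_id = ? AND date BETWEEN ? AND ?
    ORDER BY date
    """,
    (1, "2015-12-01", "2015-12-07"),
).fetchall()
total_price = sum(price for _, price in stay)
```

Whether the departure date itself should be charged is a business rule; if it is not, the upper bound would be the day before departure.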
I realise with this approach you will store more records but with well indexed tables the performance will be fine and the management of the data will be much simpler. Judging by your comment you are only talking in the region of 18000 records which is a pretty small volume (the system I worked on has several million and works fine).
To illustrate the extra data management if you DON'T store one record per day, imagine that a Hotel has a rate of 100 USD and 20 rooms available for the whole of December:
You will start with one record:
1-Dec to 31st Dec Rate 100 Availability 20
Then you sell one room on the 10th Dec.
Your business logic now has to create three records from the one above:
1-Dec to 9th Dec Rate 100 Availability 20
10-Dec to 10th Dec Rate 100 Availability 19
11-Dec to 31st Dec Rate 100 Availability 20
Then the rate changes on the 3rd and 25th Dec to 110
Your business logic now has to split the data again:
1-Dec to 2-Dec Rate 100 Availability 20
3-Dec to 3-Dec Rate 110 Availability 20
4-Dec to 9-Dec Rate 100 Availability 20
10-Dec to 10-Dec Rate 100 Availability 19
11-Dec to 24-Dec Rate 100 Availability 20
25-Dec to 25-Dec Rate 110 Availability 20
26-Dec to 31-Dec Rate 100 Availability 20
That is more business logic and more overhead than storing one record per date.
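To make that overhead concrete, here is a rough sketch of the splitting logic a date-range design would need for the December example. The names `RateRange` and `change_day` are hypothetical, and the year 2015 is assumed purely so the dates are valid.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional


class RateRange(NamedTuple):
    start: date
    end: date          # inclusive
    rate: int
    availability: int


def change_day(ranges: List[RateRange], day: date,
               rate: Optional[int] = None,
               availability_delta: int = 0) -> List[RateRange]:
    """Apply a change to a single day, splitting the range that contains it."""
    one_day = timedelta(days=1)
    result: List[RateRange] = []
    for r in ranges:
        if not (r.start <= day <= r.end):
            result.append(r)
            continue
        if r.start < day:                      # part before the changed day
            result.append(r._replace(end=day - one_day))
        result.append(RateRange(               # the changed day itself
            day, day,
            r.rate if rate is None else rate,
            r.availability + availability_delta,
        ))
        if day < r.end:                        # part after the changed day
            result.append(r._replace(start=day + one_day))
    return result


ranges = [RateRange(date(2015, 12, 1), date(2015, 12, 31), 100, 20)]
ranges = change_day(ranges, date(2015, 12, 10), availability_delta=-1)
after_sale = len(ranges)
ranges = change_day(ranges, date(2015, 12, 3), rate=110)
ranges = change_day(ranges, date(2015, 12, 25), rate=110)
```

With one row per date, each of these three changes would be a single-row update with no splitting at all.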
I can guarantee you that by the time you have finished your system will end up with one row per date anyway so you might as well design it that way from the beginning and get the benefits of easier data management and quicker database queries.
I think the first solution is better and, as you already noticed, it reduces the amount of storage you need for prices. Another possible approach: store a single date and assume that each price stays valid until a row with a later date is found. This is basically the same structure as your second design, but with internal logic in which a newer date overrides the previous price from that date onward.
Let's say that you have ROOM 1 with price $200 from 1st December and with price $250 from 12 December, then you will have only two rows :
1-Dec-15 -- $200
12-Dec-15 -- $250
And you will assume in your logic that a price is valid from the specified date up until a new price is found.
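A minimal sketch of that lookup, assuming the price rows for one room are loaded in date order; the helper name `price_on` is hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# (effective_from, price) rows for ROOM 1, sorted by date.
price_changes: List[Tuple[date, int]] = [
    (date(2015, 12, 1), 200),
    (date(2015, 12, 12), 250),
]


def price_on(day: date, changes: List[Tuple[date, int]]) -> Optional[int]:
    """Return the price from the latest row dated on or before `day`."""
    effective_dates = [effective_from for effective_from, _ in changes]
    position = bisect_right(effective_dates, day)
    if position == 0:
        return None  # no price has been defined yet for this day
    return changes[position - 1][1]


price_dec_11 = price_on(date(2015, 12, 11), price_changes)
price_dec_12 = price_on(date(2015, 12, 12), price_changes)
price_nov_30 = price_on(date(2015, 11, 30), price_changes)
```

The trade-off is that every read has to find the most recent earlier row, and per-day details such as availability still need somewhere to live.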
Another answer shows how to set a default timestamp. However, it updates only the entry you edit. I was wondering if there is another way that would edit, let's say, the current year in all the entries.
For example, my database has 2 entries that were inserted in the year 2011, so the current date column for the 2 entries would be 2011. I would like to make this column update to 2012 when a new user is inserted, which means now there would be 3 entries and the current year column for all 3 entries would be 2012.
Is this possible?
My main objective is to calculate age.
You really don't need to - and should NOT - store the current year in a table's column (and possibly in several million rows) - unless you are writing a Star Trek application where some characters live in 2012 and others in 42012 (in other words if the "current year" is not the same for all).
You can always use this in your computations:
YEAR( CURRENT_DATE() )
If you store the user's birthdate as a DATE column, then you can calculate their age. These calculations range from the "good 99% of the time" to the "good for everyone, including leap-year babies".
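As an illustration of the second kind of calculation, here is a sketch in Python rather than SQL. The function name `age_on` is hypothetical, and it assumes that someone born on February 29 is treated as having their birthday on March 1 in non-leap years.

```python
from datetime import date


def age_on(birthdate: date, today: date) -> int:
    """Age in whole years, computed from the birthdate instead of a stored year."""
    # Comparing (month, day) tuples avoids building Feb 29 in a non-leap year.
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


leap_baby = date(2000, 2, 29)
age_before = age_on(leap_baby, date(2011, 2, 28))   # birthday not reached yet
age_after = age_on(leap_baby, date(2011, 3, 1))     # counted from March 1
age_simple = age_on(date(1980, 6, 15), date(2012, 6, 15))
```

Because the age is derived at query time, nothing in the table has to be updated when the year changes.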
This is the query I have already:
use willkara;
select EngagementNumber,AgentID, EntertainerID, StartDate, EndDate, ContractPrice, ContractPrice/DateDiff(EndDate,StartDate) AS PricePerDay
FROM EA_Engagements
where StartDate <= '1999-8-13'
and EndDate >= '1999-8-8'
ORDER BY EngagementNumber;
And this is the problem:
I need a list of engagements that occurred between 8/8/1999 and 8/13/1999. I only want to see the engagements that started on or after 8/8/1999 and ended on or before 8/13/1999. For each of those engagements, I need to know how long (in days) the engagement was, and the IDs of the entertainer and the agent, and the contract price per day of entertainment. Remember, when we compute the length of an engagement, we include both the day it started and the day it ended. Please sort the information in Engagement number order. [2 rows]
8 columns needed; Last column must be labeled PricePerDay
For some reason, some of the end dates are 8-15 and 8-19, and the results are only supposed to include engagements that end on or before the 13th.
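Since they are two separate fields, it is theoretically possible for someone to enter a start date that is later than the end date, or vice versa, leading to incorrect results. I'd adjust your query accordingly: the current conditions match any engagement that overlaps the window, so filter on StartDate >= '1999-8-8' and EndDate <= '1999-8-13' instead, or use BETWEEN on each field.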
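A sketch of the adjusted filter, including the inclusive day count the assignment asks for. It runs against SQLite from Python so it is self-contained; the sample rows and the `DaysLong` column name are invented for illustration. In MySQL the day count would be DATEDIFF(EndDate, StartDate) + 1 rather than the julianday arithmetic used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EA_Engagements (
        EngagementNumber INTEGER PRIMARY KEY,
        AgentID          INTEGER,
        EntertainerID    INTEGER,
        StartDate        TEXT,
        EndDate          TEXT,
        ContractPrice    REAL
    )
""")
conn.executemany(
    "INSERT INTO EA_Engagements VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 100, "1999-08-08", "1999-08-13", 600.0),  # fully inside
        (2, 11, 101, "1999-08-10", "1999-08-15", 500.0),  # ends too late
        (3, 12, 102, "1999-08-09", "1999-08-12", 400.0),  # fully inside
        (4, 13, 103, "1999-08-05", "1999-08-19", 900.0),  # only overlaps
    ],
)

engagements = conn.execute("""
    SELECT EngagementNumber, AgentID, EntertainerID,
           StartDate, EndDate, ContractPrice,
           -- +1 because both the start day and the end day count
           CAST(julianday(EndDate) - julianday(StartDate) AS INTEGER) + 1 AS DaysLong,
           ContractPrice / (julianday(EndDate) - julianday(StartDate) + 1) AS PricePerDay
    FROM EA_Engagements
    WHERE StartDate >= '1999-08-08'
      AND EndDate   <= '1999-08-13'
    ORDER BY EngagementNumber
""").fetchall()
```

Engagements 2 and 4 overlap the window, which is why the original conditions returned them, but they do not fall entirely inside it.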