SQL Server 2008 SP - sql-server-2008

I have two tables actual and forecast
Actual
month actual
6 20
7 60
8 70
and Forecast
month forecast
9 50
10 150
11 85
I have to combine them in the same column, i.e. as long as data is available in the actual table it should be taken from there, and when it is not available there it should be taken from the forecast table.
month actual/forecast
6 20
7 60
8 70
9 50
10 150
11 85

I'd do something like the following. The Source column is only for your test purposes. I'm also assuming that you might require some year or something included in your queries if your data spans multiple years.
SELECT month, actual as 'actual/forecast', 'A' as Source FROM Actual
UNION
SELECT month, forecast as 'actual/forecast', 'F' As Source FROM Forecast
WHERE month NOT IN (SELECT month FROM Actual)
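For anyone who wants to try the idea without a SQL Server instance, here is a minimal sketch that runs the same kind of query in SQLite through Python's sqlite3 module. The function name combined_series, the overlapping forecast row for month 8 in the usage note, and the use of SQLite instead of SQL Server are my assumptions for illustration only.

```python
import sqlite3


def combined_series(actual_rows, forecast_rows):
    """Return (month, value, source) rows: every actual row, plus
    forecast rows only for months that have no actual row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Actual (month INTEGER, actual INTEGER)")
    conn.execute("CREATE TABLE Forecast (month INTEGER, forecast INTEGER)")
    conn.executemany("INSERT INTO Actual VALUES (?, ?)", actual_rows)
    conn.executemany("INSERT INTO Forecast VALUES (?, ?)", forecast_rows)
    rows = conn.execute(
        """
        SELECT month, actual AS value, 'A' AS source FROM Actual
        UNION ALL
        SELECT month, forecast, 'F' FROM Forecast
        WHERE month NOT IN (SELECT month FROM Actual)
        ORDER BY month
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling combined_series([(6, 20), (7, 60), (8, 70)], [(8, 75), (9, 50), (10, 150), (11, 85)]) keeps the actual value for month 8 and falls back to the forecast for months 9 to 11.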

Maybe this will help:
IF (SELECT COUNT(*) FROM (
SELECT month, actual FROM Actual
UNION ALL
SELECT month, forecast FROM Forecast
) AS T) = 0
BEGIN
--UPDATE here forecast TABLE
END
ELSE
BEGIN
--UPDATE here T table
END

Related

SQL - sum data for all time, 30 days and 90 days for multiple columns individually

BACKGROUND:
I have data that looks like this
date src subsrc subsubsrc param1 param2
2020-02-01 src1 ksjd dfd8 47 31
2020-02-02 src1 djsk zmnc 44 95
2020-02-03 src2 skdj awes 92 100
2020-02-04 src2 mxsf kajs 80 2
2020-02-05 src3 skdj asio 46 53
2020-02-06 src3 dekl jdqo 19 18
2020-02-07 src3 dskl dqqq 69 18
2020-02-08 src4 sqip riow 64 46
2020-02-09 src5 ss01 qwep 34 34
I am trying to aggregate for all time, last 30 days and last 90 days (no rolling sum)
So my final data would look like this:
src subsrc subsubsrc p1_all p1_30 p1_90 p2_all p2_30 p2_90
src1 ksjd dfd8 7 1 7 98 7 98
src1 djsk zmnc 0 0 0 0 0 0
src2 skdj awes 12 12 12 4 4 4
src2 mxsf kajs 6 6 6 31 31 31
src3 skdj asio 0 0 0 0 0 0
src3 dekl jdqo 20 20 20 17 17 17
src3 dskl dqqq 3 3 3 4 4 4
src4 sqip qwep 0 0 0 0 0 0
src5 ss01 qwes 15 15 15 2 2 2
ABOUT DATA:
This is only dummy data and therefore incorrect.
There are tens of thousands of rows in my data.
There are a dozen src columns that make up the key for the table.
There are a dozen param columns that I have to sum for 30 days, 90 days and all time.
Also, there are null values in the param columns.
There might also be multiple rows for the same day and src column.
New data is being added every day and the query is probably going to be run every day to get the latest 30, 90, all time data.
WHAT I HAVE TRIED:
This is what I have come up with:
SELECT src, subsrc, subsubsrc,
SUM(param1) as param1_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param1 END) as param1_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param1 END) as param1_90,
SUM(param2) as param2_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param2 END) as param2_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param2 END) as param2_90,
FROM `MY_TABLE`
GROUP BY src
ORDER BY src
This actually works but I can anticipate how long this query is going to become for multiple sources and even more param columns.
I have been trying something called "Filtered aggregate functions (or manual pivot)" explained HERE. But I am unable to understand/implement it for my case.
Also I have looked at dozens of answers and most of them are running sums for each day OR are complicated cases of this basic calculation. Maybe I am not searching it correctly.
As you can see, I am a newbie in SQL and would really appreciate any help.
Your query looks quite good; conditional aggregation is the canonical method to pivot a dataset.
One way to possibly increase performance would be to change the date filter in the conditional expressions: using a date function precludes the use of an index.
Instead, you could phrase this as:
select
src,
subsrc,
subsubsrc,
sum(param1) as param1_all,
sum(case when date >= current_date - interval 30 day then param1 end) as param1_30,
sum(case when date >= current_date - interval 90 day then param1 end) as param1_90,
sum(param2) as param2_all,
sum(case when date >= current_date - interval 30 day then param2 end) as param2_30,
sum(case when date >= current_date - interval 90 day then param2 end) as param2_90
from my_table
group by src, subsrc, subsubsrc
order by src, subsrc, subsubsrc
For performance, the following index may be helpful: (src, subsrc, subsubsrc, date).
Note that I included all three non-aggregated columns (src, subsrc, subsubsrc) in the group by clause: starting with MySQL 5.7, this is mandatory by default (although you can play around with sql modes to alter that behavior), and most other databases implement the same constraint.
Your first approach isn't a bad one if you are able to build the query programmatically. One alternative might be to create side tables for the 30 and 90 day cases first so you can effectively select all columns from each. This could also be done in sub-queries but there are performance considerations.
Some untested pseudo code to hopefully clarify:
SELECT
MY_TABLE.src,
SUM(MY_TABLE.param1) as param1_all,
-- other "all" sums here
SUM(t30.param1) as param1_30,
-- other "30" sums here
SUM(t90.param1) as param1_90
-- other "90" sums here
-- Caution: as written, each join repeats rows for every match on src,
-- so the sums are inflated unless the side tables hold one
-- pre-aggregated row per key.
FROM MY_TABLE
LEFT JOIN (
SELECT *
FROM MY_TABLE
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
) as t30 on t30.src = MY_TABLE.src
LEFT JOIN (
SELECT *
FROM MY_TABLE
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
) as t90 on t90.src = MY_TABLE.src
GROUP BY MY_TABLE.src
ORDER BY MY_TABLE.src
Note the date conditions have been switched to not use a function on the date column but instead compare to a date value. Your original approach would defeat any index on date (which you will want to make this more efficient).
If you first put these sub-queries into side tables that have a key on src the joins will be more efficient too. You could even group/sum directly into those side tables first rather than creating larger copies of your data, and then join the aggregated data together.
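To make the "group/sum first, then join" idea concrete, here is a rough sketch run against SQLite from Python. It is not the poster's exact query: the function name sums_by_src, the cut-down table (date, src, param1 only) and the cutoff dates passed in as ISO strings are assumptions for illustration.

```python
import sqlite3


def sums_by_src(rows, cutoff_30, cutoff_90):
    """Join three pre-aggregated sub-queries (one row per src each),
    so the joins cannot multiply rows and inflate the sums."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (date TEXT, src TEXT, param1 INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT a.src, a.param1_all, t30.param1_30, t90.param1_90
        FROM (SELECT src, SUM(param1) AS param1_all
              FROM my_table GROUP BY src) AS a
        LEFT JOIN (SELECT src, SUM(param1) AS param1_30
                   FROM my_table WHERE date >= ? GROUP BY src) AS t30
               ON t30.src = a.src
        LEFT JOIN (SELECT src, SUM(param1) AS param1_90
                   FROM my_table WHERE date >= ? GROUP BY src) AS t90
               ON t90.src = a.src
        ORDER BY a.src
        """,
        (cutoff_30, cutoff_90),
    ).fetchall()
    conn.close()
    return result
```

A src with no rows inside a window comes back as NULL (None in Python) for that window because of the LEFT JOIN; wrap the column in COALESCE(..., 0) if you would rather see zeros.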
Your code looks good. Your RDBMS needs to loop over all records under the hood and do some calculations. One thing you can improve is that you are calculating date differences for all records. It would make sense to calculate the moments 30 days ago and 90 days ago beforehand and only compare the dates against those.
Since you already know that the number of rows and parameters will increase in the future, it makes sense to create a cron job which daily computes this in the following manner:
the first time it calculates the values, it should store all the results along with the date it was running at (maybe into a table dedicated for this analytics)
on subsequent days you can calculate the all time sum by loading the items which were created since the last check
you will still need to calculate the 30 and 90 day stuff, but that would be much less of a problem than calculating this for all time
If you do this properly and have daily information, then later on you will be able to analyze trends in history as well.
I'd recommend you use 3 different queries for that:
Sum for all time
Sum for 30 days
Sum for 90 days
Because when you try to do everything in one query, you end up with a full table scan because of CASE-WHEN-END (BTW, MySQL has the more compact IF() form). This is far from optimal.
If you split it into 3 different queries and add an index on the date column, the 2nd and 3rd queries won't need a full scan. Only the 1st query will, and that one can be optimised separately (for example by caching).
Also, this approach: DATE_DIFF(CURRENT_DATE,date,day) <= 90
should be changed to: date >= 'date-90-days-ago' (where 'date-90-days-ago' is a fixed date)
That way you don't have to compute the difference of 2 dates for every row. You just compute 2 dates, 30 days ago and 90 days ago, and compare all other dates to these two. This approach will also benefit from the index on the date column.
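Here is a small sketch of the "compute the two cutoff dates once" part, again using SQLite from Python as a stand-in. The function name windowed_sums, the reduced table layout and the explicit today argument (used instead of the real current date so the result is reproducible) are assumptions of mine.

```python
import sqlite3
from datetime import date, timedelta


def windowed_sums(rows, today):
    """Compute the 30- and 90-day cutoffs once in Python, then compare
    the stored dates against those fixed values in the query."""
    cutoffs = {
        "c30": (today - timedelta(days=30)).isoformat(),
        "c90": (today - timedelta(days=90)).isoformat(),
    }
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (date TEXT, src TEXT, param1 INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT src,
               SUM(param1) AS param1_all,
               SUM(CASE WHEN date >= :c30 THEN param1 END) AS param1_30,
               SUM(CASE WHEN date >= :c90 THEN param1 END) AS param1_90
        FROM my_table
        GROUP BY src
        ORDER BY src
        """,
        cutoffs,
    ).fetchall()
    conn.close()
    return result
```

The dates are stored as ISO yyyy-mm-dd text here, so plain string comparison orders them correctly. NULL param values are simply skipped by SUM, and a window with nothing to add up yields NULL.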

How can I get the month that is not yet updated in SQL by inserting another row on every update?

I have a table that contains records of different transactions and needs to be updated monthly. Once the records for a specific month have been successfully updated, a new record is inserted into that table to indicate that the month has been updated. Let's take this example.
**date_of_transaction** **type**
2015-04-21 1 //A deposit record
2015-04-24 2 //A withdrawal record
2015-04-29 1
2015-04-30 2
2015-04-30 3 //3, means an update record
2015-05-14 1
2015-05-22 1
2015-05-27 2
2015-05-30 2
2015-06-09 1
2015-06-12 2
2015-06-17 2
2015-06-19 2
Let's suppose that today is July 23, 2015. I can only get data up to one month before the current month, so the only records I can get are those from June and earlier.
As you can see, an update was performed in April, indicated by the '3' in the type attribute, but no updates occurred in May and June. How can I get the months that have not yet been updated?
This will return the months that have no type = 3 rows:
SELECT MONTH([trans-date]) FROM [table] GROUP BY MONTH([trans-date]) HAVING MAX([trans-type])<3
Note: this will not work if 3 is not max value in the column
My approach would be to find all the months first, then find the months whose records were updated, and then select only those months whose records weren't updated (a set minus operation).
A MySQL query would be something like this:
select distinct extract(MONTH FROM date_of_transaction) from your_table_name where extract(MONTH FROM date_of_transaction) not in (select extract(MONTH FROM date_of_transaction) from your_table_name where type = 3);
You can try this;
select *
from tbl
where date_of_transaction < '2015-07-23'
and
date_format(date_of_transaction, '%M-%Y') in (
select
date_format(date_of_transaction, '%M-%Y')
from tbl
group by date_format(date_of_transaction, '%M-%Y')
having max(type) != 3
)
Grouping on date_format(date_of_transaction, '%M-%Y') takes both month and year into consideration, and the having clause filters out the months that contain a type = 3 row.
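If it helps, here is a runnable sketch of the same idea in SQLite via Python. It counts type = 3 rows per month instead of relying on 3 being the largest type value. The function name months_without_update and the before parameter (the first day of the current month) are assumptions for illustration; the table and column names follow the answer above.

```python
import sqlite3


def months_without_update(rows, before):
    """Return the 'YYYY-MM' months earlier than `before` that contain
    no row with type = 3 (i.e. no update record)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (date_of_transaction TEXT, type INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT strftime('%Y-%m', date_of_transaction) AS ym
        FROM tbl
        WHERE date_of_transaction < ?
        GROUP BY ym
        HAVING SUM(CASE WHEN type = 3 THEN 1 ELSE 0 END) = 0
        ORDER BY ym
        """,
        (before,),
    ).fetchall()
    conn.close()
    return [ym for (ym,) in result]
```

With the sample data from the question and before = '2015-07-01', April is dropped because of its type 3 row, leaving May and June.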

Creating columns per where clause

Is it possible to construct a query that creates multiple columns based on multiple conditions and aggregates by time? For example, take a simple table like so:
id value created
1 45 datetime
2 52 datetime
3 24 datetime
4 33 datetime
5 20 datetime
I can get all values between 10 and 20 per week of the year like so:
SELECT count(*) as '10-20', week(created) as 'week', year(created) as 'year'
FROM table
WHERE value > 10 AND value < 20
GROUP BY year(created), week(created)
This will give me for example:
10-20 week year
40 1 2014
21 2 2014
3 33 2014
I could repeat the query for the ranges 20-30, 30-40 and 40-50 and manually join the outputs, but I'd like a single query that combines these into one table, so the output would be:
10-20 20-30 30-40 40-50 week year
40 0 33 42 1 2014
21 1 0 2 1 2014
0 0 32 0 12 2014
3 42 34 32 33 2014
You can use SUM instead of COUNT, with an IF inside the brackets of the sum
Something like this
SELECT SUM(IF(value > 10 AND value < 20, 1, 0)) as '10-20',
SUM(IF(value > 20 AND value < 30, 1, 0)) as '20-30',
SUM(IF(value > 30 AND value < 40, 1, 0)) as '30-40',
SUM(IF(value > 40 AND value < 50, 1, 0)) as '40-50',
week(created) as 'week', year(created) as 'year'
FROM table
GROUP BY year(created), week(created)
Note that there is an issue here (and in your original code) for items with a value that falls exactly on a border (e.g. a value of 20): they are not counted in any range.
In terms of flexibility, you would probably be better off with another table that stores the ranges: join to it to get one row per range per week / year, and then turn the rows into columns in your script. That saves amending the code whenever a range is added.
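A rough sketch of the ranges-table variant, using SQLite from Python with the pivot done in the script. The table names readings and ranges, the function name counts_per_range and the half-open bounds (lo inclusive, hi exclusive, so a border value like 20 lands in exactly one range) are my assumptions, not something from the question.

```python
import sqlite3


def counts_per_range(readings, ranges):
    """Count readings per (year, week, range) in SQL, then pivot the
    rows into {(year, week): {label: count}} in Python."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (value INTEGER, created TEXT)")
    conn.execute("CREATE TABLE ranges (label TEXT, lo INTEGER, hi INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
    conn.executemany("INSERT INTO ranges VALUES (?, ?, ?)", ranges)
    rows = conn.execute(
        """
        SELECT strftime('%Y', r.created) AS yr,
               strftime('%W', r.created) AS wk,
               g.label,
               COUNT(*) AS n
        FROM readings AS r
        JOIN ranges AS g ON r.value >= g.lo AND r.value < g.hi
        GROUP BY yr, wk, g.label
        """
    ).fetchall()
    conn.close()
    table = {}
    for yr, wk, label, n in rows:
        # One dict per week; ranges with no readings are simply absent.
        table.setdefault((yr, wk), {})[label] = n
    return table
```

Adding a new range is then just an extra row in the ranges table. Note that SQLite's '%W' week number is not necessarily the same numbering as MySQL's week(), so treat the week labels as illustrative.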

Date ranking in Access SQL?

I have a query pulling the last six months of data from a table which has a column, UseDates (so as of today in June, this table has dates for December 2011 through May 2012).
I wish to include a "rank" column that associates a 1 to all December dates, 2 to all January dates, etc -- up to 6 for the dates corresponding one month prior. If I were to open up this query a month from now, the 1 would then be associated with January, etc.
I hope this makes sense!
Example, if I ran the query right now
UseDate Rank
12/31/2011 1
1/12/2012 2
...
5/23/2012 6
Example, if I ran the query in August:
UseDate Rank
2/16/2012 1
3/17/2012 2
...
7/21/2012 6
Example, if I ran the query in March:
UseDate Rank
9/16/2011 1
10/17/2011 2
...
2/24/2012 6
SELECT
UseDates,
DateDiff("m", Date(), UseDates) + 7 AS [Rank]
FROM YourTable;
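To check the arithmetic behind DateDiff("m", Date(), UseDates) + 7, here is the same calculation written out in Python. The function name month_rank and the window parameter are hypothetical; the month difference only looks at calendar year and month, which is how DateDiff with the "m" interval counts months.

```python
from datetime import date


def month_rank(use_date, today, window=6):
    """Rank 1 for the oldest month in the window, `window` for last month.
    Mirrors DateDiff("m", today, use_date) + window + 1."""
    months_apart = (use_date.year - today.year) * 12 + (use_date.month - today.month)
    return months_apart + window + 1
```

For a query run in June 2012, month_rank(date(2011, 12, 31), date(2012, 6, 15)) gives 1 and month_rank(date(2012, 5, 23), date(2012, 6, 15)) gives 6, matching the first example in the question.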
You can use the Month function on UseDates and subtract the result from the month returned by the Now function. If the difference goes negative, just add 12. You may also want to add 1, since the ranks start at 1 rather than 0. This should work for date ranges of half a year, but you'll get into trouble when you need to "rank" across several years.
You can rank with a count.
SELECT
Table3.ADate,
(SELECT Count(b.ADate)
FROM Table3 b
WHERE b.ADate<=Table3.ADate) AS Expr1
FROM Table3;
You have to repeat any where statement in the subquery:
SELECT
Table3.ADate,
(SELECT Count(b.ADate)
FROM Table3 b
WHERE b.ADate<=Table3.ADate And b.ADate>#2012/02/01#) AS Expr1
FROM Table3
WHERE Table3.ADate>#2012/02/01#

MySQL select/where statement

I have a web application linked to a MySQL database with the following fields:
field 1:trip_id
field 2:trip_destination
field 3:trip_description
field 4:trip_duration
In the web application I have a listbox based on the following:
ListBox value =1: trip duration 1 - 5 days
ListBox value =2: trip duration 6 - 10 days
Listbox value =3: trip duration 11 -20 days
ListBox value =4: trip duration over 20 days
How do I put this in the sql select statement?
SELECT * FROM trip_table WHERE trip_duration BETWEEN start_day AND end_day;
BETWEEN is inclusive on both ends, so you would then replace start_day and end_day with the bounds of the chosen period, e.g. start_day = 6 and end_day = 10.
Hope this helps.
In its simplest form, using the range values your application controls for the listbox (I'm not a PHP programmer, so I'll leave those blanks for you to fill in), the query could be:
select *
from TripTable
where trip_duration >= ?MinimumDays
AND trip_duration <= ?MaximumDays
If you are trying to get all trips, and have them pre-stamped with a 1-4 classification, I would apply a CASE WHEN
select *,
case
when trip_duration >= 1 and trip_duration <=5 then 1
when trip_duration >= 6 and trip_duration <=10 then 2
when trip_duration >= 11 and trip_duration <=20 then 3
when trip_duration > 20 then 4
end as TripDurationType
from
TripTable
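For the application side, here is a minimal sketch of mapping the listbox value to a duration range and passing the bounds as query parameters. It uses Python with SQLite purely for illustration (the question is about MySQL behind a web application); DURATION_RANGES, make_demo_db and trips_for_choice are hypothetical names, and the demo table keeps only two of the four columns.

```python
import sqlite3

# Listbox value -> (minimum days, maximum days); None means "no upper limit".
DURATION_RANGES = {1: (1, 5), 2: (6, 10), 3: (11, 20), 4: (21, None)}


def make_demo_db(trips):
    """Build an in-memory trip_table from (trip_id, trip_duration) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trip_table (trip_id INTEGER, trip_duration INTEGER)")
    conn.executemany("INSERT INTO trip_table VALUES (?, ?)", trips)
    return conn


def trips_for_choice(conn, choice):
    """Return the trip ids whose duration falls in the chosen listbox range."""
    lo, hi = DURATION_RANGES[choice]
    if hi is None:
        sql = "SELECT trip_id FROM trip_table WHERE trip_duration >= ? ORDER BY trip_id"
        params = (lo,)
    else:
        sql = ("SELECT trip_id FROM trip_table "
               "WHERE trip_duration BETWEEN ? AND ? ORDER BY trip_id")
        params = (lo, hi)
    return [row[0] for row in conn.execute(sql, params)]
```

Keeping the ranges in one lookup table and binding the bounds as parameters avoids building the SQL string from user input.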