I have 3 columns (id, date, amount) and am trying to calculate a 4th column (calculated_column).
How can I write a SQL query to do the following?
For each row, look at the ID (e.g. 1) and sum the amounts of all rows with the same ID in the same month, from the beginning of that month up to and including that row. For example, the first occurrence of id=1 in September (1-Sep) should give 5, and the second occurrence should give 5+6=11.
For the next month (Oct) the sum restarts: the first occurrence of id=1 stores 3 in calculated_column, and the second occurrence of id=1 in Oct stores the sum from the beginning of that month for the same id (3+2=5).
Assuming I've understood correctly, I would suggest a correlated subquery such as:
select t.*,
(
select sum(u.amount) from table1 u
where
u.id = t.id and
date_format(u.date, '%Y-%m') = date_format(t.date, '%Y-%m') and u.date <= t.date
) as calculated_column
from table1 t
(Change the table name table1 to suit your data)
In Oracle and MySQL 8+, you can use window functions. The corresponding date arithmetic varies (to_char below is Oracle syntax), but here is the idea:
select t.*,
(case when date = max(date) over (partition by to_char(date, 'YYYY-MM')) and
id = 1
then sum(amount) over (partition by to_char(date, 'YYYY-MM'))
end) as calculated_column
from t;
The outer case is simply to put the value on the appropriate row of the result set. The code would be simpler if all rows in the month had the same value.
Here is a solution for Oracle. Since you did not give the table name, I named it my_table; change it to the real name.
select
t1.id,
t1.date,
t1.amount,
decode(t1.id, 1, sum(nvl(t2.amount, 0)), null) calculated_column
from my_table t1
left join my_table t2
on trunc(t2.date, 'month') = trunc(t1.date, 'month')
and t1.id = 1
group by t1.id, t1.date, t1.amount
If your version supports window functions (e.g. MySQL 8 upwards):
-- MySQL 8+
select
t.*
, sum(amount) over (partition by id, date_format(date, '%Y-%m-01') order by date) as calculated_column
from t
;
-- Oracle
select
t.*
, sum(amount) over (partition by id, trunc(date, 'MM') order by date) as calculated_column
from t
;
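To make the windowed running sum concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite is only a stand-in so the example runs without a server: strftime('%Y-%m', date) takes the place of date_format / trunc, the table name t follows the answer above, and the sample rows are assumptions chosen to match the Sep/Oct figures in the question. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical sample rows (id, date, amount); the exact dates are assumptions
# chosen to reproduce the September/October example from the question.
ROWS = [
    (1, "2020-09-01", 5),
    (1, "2020-09-15", 6),
    (2, "2020-09-20", 4),
    (1, "2020-10-03", 3),
    (1, "2020-10-20", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Running total per id that restarts every calendar month.
# Partitioning by (id, year-month) is what resets the sum in October.
running = conn.execute(
    """
    SELECT id, date, amount,
           SUM(amount) OVER (
               PARTITION BY id, strftime('%Y-%m', date)
               ORDER BY date
           ) AS calculated_column
    FROM t
    ORDER BY id, date
    """
).fetchall()

for row in running:
    print(row)
```

For id 1 the last column should read 5, 11 in September and 3, 5 in October, which is the result the question asks for.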
I am trying to determine the average difference in days between events stored in a column, using MySQL Workbench.
The sample data looks something like this:
I want to determine the average duration between events, grouped by organizer. Any suggestions, please?
If you are running MySQL 8.0, you can use lag() for this:
select nid, avg(datediff(event_date, lag_event_date)) avg_diff
from (
select
t.*,
lag(event_date) over(partition by nid order by event_date) lag_event_date
from mytable t
) t
group by nid
In earlier versions, a typical workaround is a correlated subquery:
select nid, avg(datediff(event_date, lag_event_date)) avg_diff
from (
select
t.*,
(
select max(t1.event_date)
from mytable t1
where t1.nid = t.nid and t1.event_date < t.event_date
) lag_event_date
from mytable t
) t
group by nid
The simplest method is to take the largest date minus the smallest date and divide by one less than the count. This works because the gaps between consecutive events add up to exactly that overall span. In MySQL:
select organizer,
datediff(max(date), min(date)) * 1.0 / nullif(count(*) - 1, 0) as avg_day_diff
from t
group by organizer;
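As a quick check of that shortcut, the sketch below compares it with the lag()-based average on made-up data, using Python's built-in sqlite3 module as a stand-in for MySQL. The organizer names and dates are hypothetical, and julianday() differences replace MySQL's datediff(); window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical events; organizer names and dates are made up for illustration.
EVENTS = [
    ("alice", "2020-01-01"),
    ("alice", "2020-01-03"),
    ("alice", "2020-01-08"),
    ("alice", "2020-01-11"),
    ("bob", "2020-02-01"),
    ("bob", "2020-02-11"),
    ("carol", "2020-03-05"),  # a single event, so there is no gap to average
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (organizer TEXT, date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", EVENTS)

# Shortcut: (latest date - earliest date) / (number of events - 1).
# NULLIF avoids a division by zero for organizers with one event.
shortcut = dict(conn.execute(
    """
    SELECT organizer,
           (julianday(MAX(date)) - julianday(MIN(date)))
               / NULLIF(COUNT(*) - 1, 0) AS avg_day_diff
    FROM t
    GROUP BY organizer
    """
).fetchall())

# Long form: average the gap between each event and the previous one.
via_lag = dict(conn.execute(
    """
    SELECT organizer,
           AVG(julianday(date) - julianday(prev_date)) AS avg_day_diff
    FROM (
        SELECT organizer, date,
               LAG(date) OVER (PARTITION BY organizer ORDER BY date) AS prev_date
        FROM t
    )
    GROUP BY organizer
    """
).fetchall())

print(shortcut)
print(via_lag)
```

Both dictionaries should agree for every organizer, with no average (None) for the organizer that has only one event.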
Try the following (SQL Server syntax):
LAG([EVENT DATE],1) OVER ( ORDER BY [EVENT DATE]) AS PREV_EVENT_DATE,
DATEDIFF(DD, LAG([EVENT DATE],1) OVER ( ORDER BY [EVENT DATE]), [EVENT DATE]) AS DAYS_BETWEEN_EVENTS
DAYS_BETWEEN_EVENTS can then be used to calculate your average days difference. To get the gaps per organizer rather than across all rows, add a PARTITION BY on the organizer column inside each OVER clause.
The key piece of SQL to use in these instances is the LAG function, because it allows you to return a value from the previous row.
I'm trying to get a complete set of buckets for a given dataset, even if no records exist for some buckets.
For example, I want to display totals by day of week, with zero total for days with no records.
SELECT
WEEKDAY(transaction_date) AS day_of_week,
SUM(sales) AS total_sales
FROM table1
GROUP BY day_of_week
If I have sales every day, I'll get 7 rows in my result representing total sales on days 0-6.
If I don't have sales on Day 2, I get no result for Day 2.
What's the most efficient way to force a zero value for day 2?
Should I join to a temporary table or array of defined buckets? ['0','1','2','3','4','5','6']
Or is it better to insert zeros outside of MySQL, after I've done the query?
I am using MySQL, but this is a general SQL question.
In MySQL, you could simply use a derived table of numbers from 0 to 6 (the values weekday() returns, with 0 = Monday), left join it with the table, then aggregate. coalesce() turns the null sum of an empty bucket into 0:
select d.day_of_week, coalesce(sum(t1.sales), 0) AS total_sales
from (
select 0 day_of_week union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
) d
left join table1 t1 on weekday(t1.transaction_date) = d.day_of_week
group by d.day_of_week
Very recent versions (MySQL 8.0.19 and later) have the values row(...) syntax, which shortens the query:
select d.day_of_week, coalesce(sum(t1.sales), 0) AS total_sales
from (values row(0), row(1), row(2), row(3), row(4), row(5), row(6)) d(day_of_week)
left join table1 t1 on weekday(t1.transaction_date) = d.day_of_week
group by d.day_of_week
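The bucket-list-plus-left-join pattern can be tried end to end with the sketch below, which uses Python's built-in sqlite3 module rather than MySQL. Two things are assumptions of the sketch, not of the answer above: the sample sales rows are made up, and SQLite's strftime('%w', ...) numbers the days 0 = Sunday to 6 = Saturday, whereas MySQL's weekday() uses 0 = Monday.

```python
import sqlite3

# Hypothetical sales rows. 2020-03-01 is a Sunday, 2020-03-02 a Monday,
# and 2020-03-04 a Wednesday, so four of the seven days have no sales.
SALES = [
    ("2020-03-01", 10.0),
    ("2020-03-02", 20.0),
    ("2020-03-02", 5.0),
    ("2020-03-04", 7.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (transaction_date TEXT, sales REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", SALES)

# The list of buckets drives the query; the sales table is only joined in.
# COALESCE replaces the NULL sum of an unmatched bucket with 0.
totals = conn.execute(
    """
    WITH buckets(day_of_week) AS (
        VALUES (0), (1), (2), (3), (4), (5), (6)
    )
    SELECT b.day_of_week, COALESCE(SUM(s.sales), 0) AS total_sales
    FROM buckets b
    LEFT JOIN table1 s
           ON CAST(strftime('%w', s.transaction_date) AS INTEGER) = b.day_of_week
    GROUP BY b.day_of_week
    ORDER BY b.day_of_week
    """
).fetchall()

for day, total in totals:
    print(day, total)
```

The result always has seven rows, and the days without sales show a total of 0 instead of disappearing.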
Basically you want the answer to be 0 when the sum is actually NULL for that bucket. MAX does not treat NULL that way, but COALESCE does, since it returns its first non-NULL argument:
COALESCE(SUM(sales), 0)
Note that this only helps once a row exists for the bucket (for example via a left join from a list of buckets); it cannot create a missing row by itself.
First off, you need a calendar table, or you can create a calendar subset on the fly. I am not sure of the MySQL syntax, but here is what it would look like in SQL Server.
DECLARE
@FromDate DATE
, @ToDate DATE
-- set these variables to appropriate values
SET @FromDate = '2020-03-01';
SET @ToDate = '2020-03-31';
;WITH cteCalendar (MyDate) AS
(
SELECT CONVERT(DATE, @FromDate) AS MyDate
UNION ALL
SELECT DATEADD(DAY, 1, MyDate)
FROM cteCalendar
WHERE DATEADD(DAY, 1, MyDate) <= @ToDate
)
-- DATEPART(WEEKDAY, ...) is the closest SQL Server counterpart of MySQL's WEEKDAY();
-- its numbering depends on the DATEFIRST setting
SELECT DATEPART(WEEKDAY, cte.MyDate) AS day_of_week,
COALESCE(SUM(t1.sales), 0) AS total_sales
FROM cteCalendar cte
LEFT JOIN table1 t1 ON cte.MyDate = t1.transaction_date
GROUP BY DATEPART(WEEKDAY, cte.MyDate)
I have a SELECT query on MySQL that returns a result with two columns:
number date
1 date1
1 date2
2 date3
.
.
How do I select from that query's result and keep only the most recent date for each number?
I am having trouble working with the query result.
You can use your query as a derived table. Since you didn't provide your query, let's use this for example:
SELECT Name, Date
FROM YourQuery
Now take MAX(Date) and GROUP BY Name with your query as the derived table:
SELECT MAX(Date), Name
FROM (
SELECT Name, Date
FROM YourQuery
) a
GROUP BY Name
If you just want to show the number and date, you can do a regular group by on the returned select and it should do the trick:
select
a.number,
max(a.date) from
(select number, date from table_name ) a
group by a.number
If you have other columns that you would like to show on that row with the most recent date, this should do the trick:
select
a.number,
a.recent_date,
b.some_other_column
from (
select number,
max(date) recent_date
from table_name group by number) a
inner join table_name b on a.number= b.number and a.recent_date= b.date
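with cte as (
select number, date, row_number() over(partition by number order by date desc) as rn from thistable )
select number, date from cte where rn=1
You can use this query to get the most recent record for each number. It needs MySQL 8+ for row_number(), and the partition by number clause is what restarts the numbering for every number.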
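To see that the row_number() approach and the plain max()/group by approach pick the same rows, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The table name readings and its rows are hypothetical; window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical (number, date) pairs standing in for the asker's query result.
ROWS = [
    (1, "2020-01-01"),
    (1, "2020-02-01"),
    (2, "2020-01-15"),
    (3, "2020-03-01"),
    (3, "2020-01-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (number INTEGER, date TEXT)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", ROWS)

# Variant 1: aggregate, one row per number with its latest date.
latest_by_group = conn.execute(
    "SELECT number, MAX(date) FROM readings GROUP BY number ORDER BY number"
).fetchall()

# Variant 2: number the rows of each number from newest to oldest and
# keep the first one. PARTITION BY restarts the count for each number.
latest_by_rank = conn.execute(
    """
    SELECT number, date
    FROM (
        SELECT number, date,
               ROW_NUMBER() OVER (PARTITION BY number ORDER BY date DESC) AS rn
        FROM readings
    )
    WHERE rn = 1
    ORDER BY number
    """
).fetchall()

print(latest_by_group)
print(latest_by_rank)
```

The ranked variant is the one to reach for when other columns from the winning row are needed, since it keeps whole rows rather than aggregates.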
I have a lookup table that relates dates and people associated with those dates:
id, user_id,date
1,1,2014-11-01
2,2,2014-11-01
3,1,2014-11-02
4,3,2014-11-02
5,1,2014-11-03
I can group these by date(day):
SELECT DATE_FORMAT(
MIN(date),
'%Y/%m/%d 00:00:00 GMT-0'
) AS date,
COUNT(*) as count
FROM user_x_date
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 43200)
But how can I get the number of unique users that have not shown up previously? For instance, this would be a valid result:
unique, non-unique, date
2,0,2014-11-01
1,1,2014-11-02
0,1,2014-11-03
Is this possible without having to rely on a scripting language to keep track of this data?
I think this query will do what you want, at least it seems to work for your limited sample data.
The idea is to use a correlated sub-query to check if the user_id has occurred on a date before the date of the current row and then do some basic arithmetic to determine number of unique/non-unique users for each date.
Please give it a try.
select
sum(u) - sum(n) as "unique",
sum(n) as "non-unique",
date
from (
select
date,
count(user_id) u,
case when exists (
select 1
from Table1 i
where i.user_id = o.user_id
and i.date < o.date
) then 1 else 0
end n
from Table1 o
group by date, user_id
) q
group by date
order by date;
Sample SQL Fiddle
I didn't include the id column in the sample fiddle as it's not needed (or used) to produce the result and won't change anything.
This is the relevant question: "But how can I get the number of unique users that have not shown up previously?"
Calculate the first time a person shows up, and then use that for the aggregation:
SELECT date, count(*) as FirstVisit
FROM (SELECT user_id, MIN(date) as date
FROM user_x_date
GROUP BY user_id
) x
GROUP BY date;
I would then use this as a subquery for another aggregation:
SELECT v.date, v.NumVisits, COALESCE(fv.FirstVisit, 0) as NumFirstVisit
FROM (SELECT date, count(*) as NumVisits
FROM user_x_date
GROUP BY date
) v LEFT JOIN
(SELECT date, count(*) as FirstVisit
FROM (SELECT user_id, MIN(date) as date
FROM user_x_date
GROUP BY user_id
) x
GROUP BY date
) fv
ON v.date = fv.date;
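The first-visit idea can be checked against the sample rows from the question with the sketch below, run through Python's built-in sqlite3 module instead of MySQL. The table and column names follow the question; the new_users and returning_users column names are my own, and the subtraction that derives returning users from the two counts is an addition to the query above.

```python
import sqlite3

# Sample rows from the question: (id, user_id, date).
ROWS = [
    (1, 1, "2014-11-01"),
    (2, 2, "2014-11-01"),
    (3, 1, "2014-11-02"),
    (4, 3, "2014-11-02"),
    (5, 1, "2014-11-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_x_date (id INTEGER, user_id INTEGER, date TEXT)")
conn.executemany("INSERT INTO user_x_date VALUES (?, ?, ?)", ROWS)

# v  = distinct visitors per day
# fv = users whose first ever visit falls on that day
# Returning users are the visitors that are not first-time visitors.
per_day = conn.execute(
    """
    SELECT v.date,
           COALESCE(fv.first_visits, 0) AS new_users,
           v.visitors - COALESCE(fv.first_visits, 0) AS returning_users
    FROM (SELECT date, COUNT(DISTINCT user_id) AS visitors
          FROM user_x_date
          GROUP BY date) v
    LEFT JOIN (SELECT first_date AS date, COUNT(*) AS first_visits
               FROM (SELECT user_id, MIN(date) AS first_date
                     FROM user_x_date
                     GROUP BY user_id)
               GROUP BY first_date) fv
           ON v.date = fv.date
    ORDER BY v.date
    """
).fetchall()

for row in per_day:
    print(row)
```

The three rows match the unique / non-unique table given in the question: 2 and 0, then 1 and 1, then 0 and 1.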
I have a table of production readings and need to get a result set containing a row for the min(timestamp) for EACH hour.
The column layout is quite simple:
ID,TIMESTAMP,SOURCE_ID,SOURCE_VALUE
The data sample would look like:
123,'2013-03-01 06:05:24',PMPROD,12345678.99
124,'2013-03-01 06:15:17',PMPROD,88888888.99
125,'2013-03-01 06:25:24',PMPROD,33333333.33
126,'2013-03-01 06:38:14',PMPROD,44444444.44
127,'2013-03-01 07:12:04',PMPROD,55555555.55
128,'2013-03-01 10:38:14',PMPROD,44444444.44
129,'2013-03-01 10:56:14',PMPROD,22222222.22
130,'2013-03-01 15:28:02',PMPROD,66666666.66
Records are added to this table throughout the day and the source_value is already calculated, so no sum is needed.
I can't figure out how to get a row for the min(timestamp) for each hour of the current_date.
select *
from source_readings
use index(ID_And_Time)
where source_id = 'PMPROD'
and date(timestamp)=CURRENT_DATE
and timestamp =
( select min(timestamp)
from source_readings use index(ID_And_Time)
where source_id = 'PMPROD'
)
The above code, of course, gives me one record. I need one record for the min(hour(timestamp)) of the current_date.
My result set should contain the rows for IDs: 123,127,128,130. I've played with it for hours. Who can be my hero? :)
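Try the query below. It joins on the minimum timestamp of each hour rather than on ID, because an ID selected alongside GROUP BY current_hour is not guaranteed to come from the earliest row:
SELECT sr.*
FROM source_readings sr
JOIN
(
SELECT DATE_FORMAT(timestamp, '%Y-%m-%d %H') AS current_hour, MIN(timestamp) AS min_timestamp
FROM source_readings
WHERE source_id = 'PMPROD'
GROUP BY current_hour
) AS reading_min
ON sr.timestamp = reading_min.min_timestamp
WHERE sr.source_id = 'PMPROD'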
SELECT a.*
FROM Table1 a
INNER JOIN
(
SELECT DATE(TIMESTAMP) date,
HOUR(TIMESTAMP) hour,
MIN(TIMESTAMP) min_date
FROM Table1
GROUP BY DATE(TIMESTAMP), HOUR(TIMESTAMP)
) b ON DATE(a.TIMESTAMP) = b.date AND
HOUR(a.TIMESTAMP) = b.hour AND
a.timestamp = b.min_date
SQLFiddle Demo
With window function:
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY HOUR(timestamp) ORDER BY timestamp) rn
FROM source_readings -- original table
WHERE date(timestamp)=CURRENT_DATE AND source_id = 'PMPROD' -- your custom filter
)
SELECT * -- this will contain `rn` column. you can select only necessary columns
FROM ranked
WHERE rn=1
I haven't tested it, but the basic idea is:
1) ROW_NUMBER() OVER(PARTITION BY HOUR(timestamp) ORDER BY timestamp)
This will give each row a number, starting from 1 for each hour, increasing by timestamp. The result might look like:
|rest of columns |rn
123,'2013-03-01 06:05:24',PMPROD,12345678.99,1
124,'2013-03-01 06:15:17',PMPROD,88888888.99,2
125,'2013-03-01 06:25:24',PMPROD,33333333.33,3
126,'2013-03-01 06:38:14',PMPROD,44444444.44,4
127,'2013-03-01 07:12:04',PMPROD,55555555.55,1
128,'2013-03-01 10:38:14',PMPROD,44444444.44,1
129,'2013-03-01 10:56:14',PMPROD,22222222.22,2
130,'2013-03-01 15:28:02',PMPROD,66666666.66,1
2) Then in the main query we select only the rows with rn=1, in other words the rows that have the lowest timestamp in each hourly partition (the first row in each hour after sorting by timestamp).
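The two steps above can be run against the sample readings from the question with the following sketch, which uses Python's built-in sqlite3 module as a stand-in for MySQL 8. Two deviations are assumptions of the sketch: strftime('%Y-%m-%d %H', timestamp) replaces HOUR(timestamp) so that the partition also separates days, and the CURRENT_DATE filter is left out because the sample data is from 2013.

```python
import sqlite3

# Sample readings from the question: (id, timestamp, source_id, source_value).
READINGS = [
    (123, "2013-03-01 06:05:24", "PMPROD", 12345678.99),
    (124, "2013-03-01 06:15:17", "PMPROD", 88888888.99),
    (125, "2013-03-01 06:25:24", "PMPROD", 33333333.33),
    (126, "2013-03-01 06:38:14", "PMPROD", 44444444.44),
    (127, "2013-03-01 07:12:04", "PMPROD", 55555555.55),
    (128, "2013-03-01 10:38:14", "PMPROD", 44444444.44),
    (129, "2013-03-01 10:56:14", "PMPROD", 22222222.22),
    (130, "2013-03-01 15:28:02", "PMPROD", 66666666.66),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_readings "
    "(id INTEGER, timestamp TEXT, source_id TEXT, source_value REAL)"
)
conn.executemany("INSERT INTO source_readings VALUES (?, ?, ?, ?)", READINGS)

# Step 1: number the rows within each hour, earliest first.
# Step 2: keep only the row numbered 1 in every hour.
first_per_hour = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY strftime('%Y-%m-%d %H', timestamp)
                       ORDER BY timestamp
                   ) AS rn
            FROM source_readings
            WHERE source_id = 'PMPROD'
        )
        WHERE rn = 1
        ORDER BY id
        """
    )
]

print(first_per_hour)
```

The printed IDs are 123, 127, 128 and 130, the rows the question lists as the expected result.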