Teradata SQL: Group by when summing over rows unbounded - teradata-sql-assistant

I am a little confused about how the sum over rows unbounded function works.
I am running the following code to get the following output:
select
policy_id,
incurred_date,
paid_date,
row_number () over (partition by policy_agreement_id, incurred_date order by paid_date) as row_num
from table1
qualify row_num = 1
policy_id | incurred_date | paid_date | row_num
111 | 01/01/2019 | 01/10/2019 | 1
222 | 01/01/2019 | 01/10/2019 | 1
333 | 01/01/2019 | 01/11/2019 | 1
444 | 01/01/2019 | 01/11/2019 | 1
etc..
What I want to do is get a running total of the policy_ids for each incurred_date as paid_date increases.
When I run the following code, I am not getting what I want:
select
incurred_date,
paid_date,
sum(row_num) over (partition by incurred_date order by paid_date rows unbounded preceding) as pol_count
from (
select
policy_id,
incurred_date,
paid_date,
row_number () over (partition by policy_agreement_id, incurred_date order by paid_date) as row_num
from table1
qualify row_num = 1
) sub
group by 1,2, row_num
incurred_date | paid_date | pol_count
01/01/2019 | 01/10/2019 | 1
01/01/2019 | 01/11/2019 | 2
Desired output:
incurred_date | paid_date | pol_count
01/01/2019 | 01/10/2019 | 2
01/01/2019 | 01/11/2019 | 4
I understand this is probably because row_num is in the group by and is always equal to one; however, I am unable to run my code without it there. I am able to get the desired output through other means, but I am curious as to why row_num needs to be in the group by, or whether there is something I am missing.

It looks like you are seeing this behavior because you have a window function and an aggregate function in the same query. The aggregate (GROUP BY) is evaluated before the window function (SUM() OVER(...)), and after the QUALIFY every row_num is 1, so the GROUP BY collapses the data to one row per (incurred_date, paid_date) with row_num = 1. The running SUM then just counts those groups, which is why, going by your example, you will never get a pol_count greater than 2: there are only two unique paid_date values for each incurred_date.
If you just want to get a running total of policy_ids for each incurred_date, try using a cumulative sum:
SELECT
src.incurred_date,
src.paid_date,
SUM(src.pol_count) OVER(
PARTITION BY src.incurred_date
ORDER BY src.paid_date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Cumulative sum
) AS pol_count
FROM (
SELECT incurred_date, paid_date, COUNT(*) AS pol_count -- Get counts
FROM table1
GROUP BY incurred_date, paid_date
) src
This assumes that table1 stores one row per policy_id.
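If table1 can hold more than one row per policy_id for a given incurred_date/paid_date pair (for example, one row per payment), a sketch of the same query that de-duplicates inside the derived table with COUNT(DISTINCT policy_id):
SELECT
src.incurred_date,
src.paid_date,
SUM(src.pol_count) OVER(
PARTITION BY src.incurred_date
ORDER BY src.paid_date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Cumulative sum
) AS pol_count
FROM (
SELECT incurred_date, paid_date, COUNT(DISTINCT policy_id) AS pol_count -- count each policy once
FROM table1
GROUP BY incurred_date, paid_date
) src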

Related

Min and Max on a SUM column

I have a table like:
Phrase | qty
phrase_1 | 4
phrase_1 | 1
phrase_1 | 8
phrase_2 | 2
phrase_3 | 3
phrase_3 | 2
What I initially return is:
phrase_1 | 13
phrase_3 | 5
phrase_2 | 2
Using:
SELECT phrase, sum(qty) as total
FROM mytable
GROUP By phrase
ORDER BY total DESC
What I need, and can't figure out, is how to return the min and max with the results.
so I would get:
phrase, qty, min, max
phrase_1 | 13 | 2 | 13
phrase_3 | 5 | 2 | 13
phrase_2 | 2 | 2 | 13
This is because I want to run a normalization on the result set and return a new order based on values between 0 and 1.
Something like (this doesn't work):
SELECT phrase, sum(qty) as total, (total - min(total)/max(total) - min(total)) AS rank
FROM mytable
GROUP By phrase
ORDER BY rank DESC
The above statement is ultimately what I'm looking to do, and I'm not sure if it's possible.
With some subqueries you can achieve your goal, but it will never be pretty:
CREATE TABLE mytable (
`Phrase` VARCHAR(8),
`qty` INTEGER
);
INSERT INTO mytable
(`Phrase`, `qty`)
VALUES
('phrase_1', '4'),
('phrase_1', '1'),
('phrase_1', '8'),
('phrase_2', '2'),
('phrase_3', '3'),
('phrase_3', '2');
SELECT phrase, total, (total - mi/ma - mi) AS rank
FROM
  (SELECT phrase, SUM(qty) AS total
   FROM mytable
   GROUP BY phrase
   ORDER BY total DESC) t1
CROSS JOIN
  (SELECT MIN(total) mi, MAX(total) ma
   FROM
     (SELECT phrase, SUM(qty) AS total
      FROM mytable
      GROUP BY phrase
      ORDER BY total DESC) t1
  ) t2
phrase | total | rank
:------- | ----: | ------:
phrase_1 | 13 | 10.8462
phrase_3 | 5 | 2.8462
phrase_2 | 2 | -0.1538
db<>fiddle here
You want window functions:
SELECT phrase, sum(qty) as total,
MIN(SUM(qty)) OVER () as min_total,
MAX(SUM(qty)) OVER () as max_total
FROM mytable
GROUP By phrase
ORDER BY total DESC
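To get the 0-to-1 normalization the question asks for, that result can be wrapped in a derived table. A sketch, assuming MySQL 8+ for the window functions; note the parentheses (which the expression in the question was missing) and the alias norm_rank, since RANK is a reserved word in MySQL 8:
SELECT phrase, total,
  (total - min_total) / (max_total - min_total) AS norm_rank
FROM (
  SELECT phrase, SUM(qty) AS total,
    MIN(SUM(qty)) OVER () AS min_total,
    MAX(SUM(qty)) OVER () AS max_total
  FROM mytable
  GROUP BY phrase
) t
ORDER BY norm_rank DESC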
You can use the code below (note that SELECT ... INTO #temp is SQL Server temp-table syntax):
SELECT phrase, sum(qty) as total,
MIN(SUM(qty)) OVER () as min_total,
MAX(SUM(qty)) OVER () as max_total
into #temp
FROM mytable
GROUP By phrase
ORDER BY total DESC
Select *,(total - min_total/max_total - min_total) AS rank From #temp
Drop Table #temp
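If you are on MySQL instead, the same two-step idea would look roughly like this sketch, using CREATE TEMPORARY TABLE (MySQL 8+ is assumed for the window functions, tmp_totals is just an illustrative name, and the normalization expression is parenthesized here):
CREATE TEMPORARY TABLE tmp_totals AS
SELECT phrase, SUM(qty) AS total,
  MIN(SUM(qty)) OVER () AS min_total,
  MAX(SUM(qty)) OVER () AS max_total
FROM mytable
GROUP BY phrase;

SELECT *, (total - min_total) / (max_total - min_total) AS norm_rank
FROM tmp_totals
ORDER BY norm_rank DESC;

DROP TEMPORARY TABLE tmp_totals;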

Calculate unique items seen by users via sql

I need help resolving the following case.
The data which users want to see is accessible via pagination requests, and these requests are later stored in the database in the following form:
+----+---------+-------+--------+
| id | user id | first | amount |
+----+---------+-------+--------+
| 1 | 1 | 0 | 5 |
| 2 | 1 | 10 | 10 |
| 3 | 1 | 10 | 5 |
| 4 | 1 | 15 | 10 |
| 5 | 2 | 0 | 10 |
| 6 | 2 | 0 | 5 |
| 7 | 2 | 10 | 5 |
+----+---------+-------+--------+
The table is ordered by user id asc, first asc, amount desc.
The task is to write an SQL statement which calculates the total unique amount of data each user has seen.
For the first user total amount must be 20, since the request with id=1 returned first 5 items, with id=2 returned another 10 items. Request with id=3 returns data already 'seen' by request with id=2. Request with id=4 intersects with id=2, but still returns 5 'unseen' pieces of data.
For the second user total amount must be 15.
As a result of the SQL statement, I should get the following output:
+---------+-------+
| user id | total |
+---------+-------+
| 1 | 20 |
+---------+-------+
| 2 | 15 |
+---------+-------+
I am using MySQL 5.7, so window functions are not available to me. I have been stuck on this task for a day already and still cannot get the desired output. If it is not possible with this setup, I will end up calculating the results in the application code. I would appreciate any suggestions or help with resolving this task, thank you!
This is a type of gaps-and-islands problem. In this case, use a cumulative max to determine if one request intersects with a previous request. If not, that is the beginning of an "island" of adjacent requests. A cumulative sum of the beginnings assigns each row to an "island", and an aggregation then collapses each island into a single range.
So, the islands look like this:
select userid, min(first), max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp;
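For the sample data in the question, that query yields one row per island, which is where the totals of 20 and 15 come from:
userid | min(first) | last
1 | 0 | 5
1 | 10 | 25
2 | 0 | 15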
You then want this summed by userid, so that is one more level of aggregation:
with islands as (
select userid, min(first) as first, max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp
)
select userid, sum(last - first) as total
from islands
group by userid;
Here is a db<>fiddle.
This logic is similar to Gordon's, but runs on older releases of MySQL, too.
select userid
-- overall length minus gaps
,max(maxlast)-min(minfirst) + sum(gaplen) as total
from
(
select userid
,prevlast
,min(first) as minfirst -- first of group
,max(last) as maxlast -- last of group
-- if there was a gap, calculate length of gap
,min(case when prevlast < first then prevlast - first else 0 end) as gaplen
from
(
select t.*
,first + amount as last -- last value in range
,( -- maximum end of all previous rows
select max(first + amount)
from t as t2
where t2.userid = t.userid
and t2.first < t.first
) as prevlast
from t
) as dt
group by userid, prevlast
) as dt
group by userid
order by userid
See fiddle

Use Group By against two separate columns in SQL to compute a new column

I have a mysql table called transactions which looks as follows:
|---------|--------------|--------------|--------------------------|
|order_id |customer_name | brand_name | order_time_stamp |
|---------|--------------|--------------|--------------------------|
| 1 | Jack | Pepsi | 2019-02-23 20:02:21.550. |
|---------|--------------|--------------|--------------------------|
| 2 | Dorothy | Fanta | 2019-02-23 20:03:21.550. |
|---------|--------------|--------------|--------------------------|
| 3 | Dorothy | Fanta | 2019-02-23 20:04:21.550. |
|---------|--------------|--------------|--------------------------|
| 4 | Jack | Fanta | 2019-02-23 20:05:21.550. |
|---------|--------------|--------------|--------------------------|
As is evident, this is a table that captures every order at an online store with the order_id being the primary key. What I am trying to capture is the number of additional orders grouped by brand_name as follows:
|------------|--------------------|
| brand_name | additional orders |
|------------|--------------------|
| Pepsi | 0 |
|------------|--------------------|
| Fanta | 1 |
|------------|--------------------|
However, additional orders are defined at the customer level: the count of all of a customer's orders after their first order.
My strategy was to use the rank() function as follows:
select rank() over( partition by customer_name order by order_time_stamp) as rank
from transactions
This creates an additional column which assigns a rank per customer. However, I am not sure how to now group this at the brand level and get the output I have shown.
You can use row_number() to rank the orders per customer, then filter on "additional" orders (that is, every order whose rank is greater than 1), then aggregate by brand_name:
select brand_name, count(*) no_additional_orders
from (
select
t.*,
row_number() over(partition by customer_name order by order_time_stamp) rn
from transactions t
) t
where rn > 1
group by brand_name
If you also want to take into account brands that have no additional orders, then you can move the filtering logic into the aggregate function:
select brand_name, sum(rn > 1) no_additional_orders
from (
select t.*, row_number() over(partition by customer_name order by order_time_stamp) rn
from transactions t
) t
group by brand_name
Your data is rather confusing. I think you want everything after the earliest timestamp, not the earliest order. This is a subtle difference, but important:
select brand_name,
sum(order_time_stamp > min_ots)
from (select t.*, min(order_time_stamp) over (partition by customer_name) as min_ots
from transactions t
) t
group by brand_name;
You can do something similar with rank() as well:
select brand_name,
sum(seqnum > 1)
from (select t.*,
rank() over (partition by customer_name order by order_time_stamp) as seqnum
from transactions t
) t
group by brand_name;
You want to count all the orders of each customer_name per brand_name except one, because each customer's first order should not be included in the sum.
You can do it by subtracting from the total number of orders the number of distinct customers that ordered the product, which is equal to the number of first orders:
select brand_name,
count(*) - count(distinct customer_name) additional_orders
from transactions
group by brand_name
See the demo.
Results:
brand_name | additional_orders
Pepsi | 0
Fanta | 1

Get Earliest or Latest Date of Row Detail in MySQL [duplicate]

This question already has answers here:
Retrieving the last record in each group - MySQL
I have a table like so. The way it works is that the billing occurs daily to make sure that accounts are current.
+------+------------+-------------+
| ID | AcctType | BillingDate |
+------+------------+-------------+
| 100 | Individual | 2020-01-01 |
| 100 | Individual | 2020-01-02 |
| 100 | Individual | 2020-01-03 |
| 101 | Group | 2020-01-01 |
| 101 | Group | 2020-01-02 |
| 101 | Individual | 2020-01-01 |
+------+------------+-------------+
What I need to find is the first and last AcctType of each plan by ID since the AcctType can change. I am using MySQL and the aggregation of select ID, AcctType, min(BillingDate) from table group by ID won't work because AcctType will return a random value associated with the ID. How do I reliably get the latest and earliest AcctType by ID? Using version 5.6.
If you are running MySQL 8.0, you can use window functions for this:
select distinct
id,
first_value(acctType) over(
partition by id
order by billingDate
rows between unbounded preceding and unbounded following
) firstAccType,
last_value(acctType) over(
partition by id
order by billingDate
rows between unbounded preceding and unbounded following
) lastAccType
from mytable
This generates a single record for each id, with the first and last value of accType in columns.
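One detail worth noting: the explicit frame clause matters here. With an ORDER BY but no frame, the default window frame ends at the current row, so last_value() would not look past the current row and its peers and would not return the overall last value per id. A minimal sketch of the difference (runningLast is just an illustrative alias):
select
id,
billingDate,
-- default frame (RANGE ... AND CURRENT ROW): only sees rows up to the current one and its peers
last_value(acctType) over(partition by id order by billingDate) as runningLast,
-- whole-partition frame: returns the true last acctType per id
last_value(acctType) over(
partition by id
order by billingDate
rows between unbounded preceding and unbounded following
) as lastAccType
from mytable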
In earlier versions, using correlated subqueries is probably the simplest way to achieve the same result:
select distinct
id,
(
select t1.accType
from mytable t1
where t1.id = t.id
order by billingDate asc
limit 1
) firstAccType,
(
select t1.accType
from mytable t1
where t1.id = t.id
order by billingDate desc
limit 1
) lastAccType
from mytable t

Select last inserted value of each month for every year from DATETIME

I have a DATETIME column to store when the values were introduced, as this example shows:
CREATE TABLE IF NOT EXISTS salary (
change_id INT(11) NOT NULL AUTO_INCREMENT,
emp_salary FLOAT(8,2),
change_date DATETIME,
PRIMARY KEY (change_id)
);
I am going to fill the example like this:
+-----------+------------+---------------------+
| change_id | emp_salary | change_date |
+-----------+------------+---------------------+
| 1 | 200.00 | 2018-06-18 13:17:17 |
| 2 | 700.00 | 2018-06-25 15:20:30 |
| 3 | 300.00 | 2018-07-02 12:17:17 |
+-----------+------------+---------------------+
I want to get the last inserted value of each month for every year.
So for the example I made, this should be the output of the Select:
+-----------+------------+---------------------+
| change_id | emp_salary | change_date |
+-----------+------------+---------------------+
| 2 | 700.00 | 2018-06-25 15:20:30 |
| 3 | 300.00 | 2018-07-02 12:17:17 |
+-----------+------------+---------------------+
1 won't appear because it is an outdated version of 2.
You could use a self join to pick the group-wise maximum row. In the inner query, select the max of change_date, grouping your data by month and year:
select t.*
from your_table t
join (
select max(change_date) max_change_date
from your_table
group by date_format(change_date, '%Y-%m')
) t1
on t.change_date = t1.max_change_date
Demo
If you can use MySQL 8, which has support for window functions, you could use a common table expression and the rank() function to pick the row with the highest change_date for each year and month:
with cte as(
select *,
rank() over (partition by date_format(change_date, '%Y-%m') order by change_date desc ) rnk
from your_table
)
select * from cte where rnk = 1;
Demo
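One caveat, assuming change_date values could ever tie within a month (the question does not say): rank() would then return every tied row. Using row_number() with change_id as a tie-breaker guarantees a single row per month; a sketch:
with cte as(
select *,
row_number() over (partition by date_format(change_date, '%Y-%m') order by change_date desc, change_id desc) rn
from your_table
)
select * from cte where rn = 1;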
The query below should work for you.
It uses GROUP BY on month and year to find the max record for each month and year.
SELECT s1.*
FROM salary s1
INNER JOIN (
SELECT MAX(change_date) maxDate
FROM salary
GROUP BY MONTH(change_date), YEAR(change_date)
) s2 ON s2.maxDate = s1.change_date;
Fiddle link : http://sqlfiddle.com/#!9/1bc20b/15