Generating accurate subtotals on fanout joins - mysql

My data model:
The query:
SELECT
ProductSummary.Product,
ProductSummary.ID AS SummaryID,
Transactions.DateOfSale,
ProductSummary.Revenue
FROM
ProductSummary JOIN
Transactions ON (Transactions.ProductID = ProductSummary.ID)
WHERE
Transactions.DateOfSale < '2014-01-10'
The data itself looks fine; however, I also want to show a subtotal, and the subtotal for a table should be the amount displayed when that table is not joined.
For example, when subtotaling Revenue, the answer should always be what I would get from SELECT SUM(Revenue) FROM ProductSummary (after applying any necessary filters). How can I generate that?

One way to do this is to use an analytic (window) function to flag the first joined row for each ProductSummary row, so that its Revenue is counted only once when totaling, for example:
WITH
ProductSummary (Product, ID, Revenue) AS (
SELECT 'Car', 1, 12 UNION ALL
SELECT 'Phone', 2, 7
),
Transactions (SummaryID, ID, DateOfSale) AS (
SELECT 1, 1, DATE '2014-01-01' UNION ALL
SELECT 1, 2, DATE '2014-01-02' UNION ALL
SELECT 1, 3, DATE '2014-01-03' UNION ALL
SELECT 2, 4, DATE '2014-01-04' UNION ALL
SELECT 1, 5, DATE '2014-01-04' UNION ALL
SELECT 1, 6, DATE '2014-01-04' UNION ALL
SELECT 1, 7, DATE '2020-01-01'
)
SELECT
ProductSummary.Product,
ProductSummary.ID AS SummaryID,
Transactions.DateOfSale,
ProductSummary.Revenue,
IF(
ROW_NUMBER() OVER (PARTITION BY ProductSummary.ID) = 1,
ProductSummary.Revenue,
0
) RevenueUnique
FROM
ProductSummary
JOIN Transactions ON (Transactions.SummaryID=ProductSummary.ID)
WHERE
Transactions.DateOfSale < DATE '2014-01-10';
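To check that the RevenueUnique column really gives the un-fanned-out subtotal, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. SQLite stands in for MySQL here (an assumption made only so the example is self-contained), so IF() is written as CASE, and an ORDER BY is added inside the window so the flagged row is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductSummary (Product TEXT, ID INTEGER, Revenue INTEGER);
INSERT INTO ProductSummary VALUES ('Car', 1, 12), ('Phone', 2, 7);
CREATE TABLE Transactions (SummaryID INTEGER, ID INTEGER, DateOfSale TEXT);
INSERT INTO Transactions VALUES
  (1, 1, '2014-01-01'), (1, 2, '2014-01-02'), (1, 3, '2014-01-03'),
  (2, 4, '2014-01-04'), (1, 5, '2014-01-04'), (1, 6, '2014-01-04'),
  (1, 7, '2020-01-01');
""")

# Sum both the fanned-out Revenue and the "first row only" RevenueUnique.
naive_total, unique_total = conn.execute("""
SELECT SUM(Revenue), SUM(RevenueUnique)
FROM (
  SELECT ps.Revenue,
         CASE WHEN ROW_NUMBER() OVER (PARTITION BY ps.ID ORDER BY t.ID) = 1
              THEN ps.Revenue ELSE 0 END AS RevenueUnique
  FROM ProductSummary AS ps
  JOIN Transactions AS t ON t.SummaryID = ps.ID
  WHERE t.DateOfSale < '2014-01-10'
) AS joined
""").fetchone()

print(naive_total, unique_total)  # 67 19
```

The naive sum counts the Car revenue five times (once per matching transaction), while the flagged sum matches SUM(Revenue) over ProductSummary. Because the window function is evaluated after the WHERE clause, a summary row whose transactions are all filtered out contributes nothing to either total.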

Related

Emulating a ROLLUP without that keyword

In Postgres, I can emulate a two-dimensional pivot table by doing a query such as:
SELECT ... FROM ...
GROUP BY
ROLLUP(x,y,z), -- ROWS
ROLLUP(a,b,c) -- COLS
As a concrete example in dbfiddle:
However, if a database did not have access to the ROLLUP keyword, how could this be emulated? I know a one-dimensional ROLLUP could be done with a UNION ALL, such as:
SELECT a, b, SUM(c) FROM Input GROUP BY ROLLUP(a, b);
-->
SELECT NULL, NULL, SUM(c) FROM Input UNION ALL
SELECT a, NULL, SUM(c) FROM Input GROUP BY a UNION ALL
SELECT a, b, SUM(c) FROM Input GROUP BY a, b;
But how could the two-dimensional case be done without access to the ROLLUP keyword? We can use Postgres (or MySQL) as the database here, but in your answer just refrain from using the ROLLUP keyword.
You can emulate this also with UNION ALLs since we have:
GROUP BY ROLLUP(a,b)
--> GROUP BY (), (a), (a,b)
So with ROLLUP(a,b), ROLLUP(x,y) we take the cross product of the two lists of grouping sets:
GROUP BY ROLLUP(a,b), ROLLUP(x,y)
--> GROUP BY [(), (a), (a,b)] *cross product* [(), (x), (x,y)]
--> (),() + (),x + (),x,y + a,() + a,x + a,x,y + a,b,() + a,b,x + a,b,x,y
So applying it to the original question we would have:
WITH sales (Year, Half, Category, Product, Revenue) AS (
SELECT 2020, 'H1', 'Electronics', 'Phone', 200 UNION ALL
SELECT 2020, 'H1', 'Electronics', 'Computer', 300 UNION ALL
SELECT 2020, 'H2', 'Electronics', 'Phone', 100 UNION ALL
SELECT 2020, 'H2', 'Electronics', 'Computer', 175 UNION ALL
SELECT 2021, 'H1', 'Electronics', 'Phone', 109 UNION ALL
SELECT 2021, 'H1', 'Electronics', 'Computer', 32 UNION ALL
SELECT 2021, 'H2', 'Electronics', 'Phone', 93 UNION ALL
SELECT 2021, 'H2', 'Electronics', 'Computer', 111
)
SELECT SUM(Revenue) AS "sum", NULL AS category, NULL AS product, NULL AS year, NULL AS half FROM Sales GROUP BY (),() UNION ALL
SELECT SUM(Revenue), NULL, NULL, Year, NULL FROM Sales GROUP BY (),Year UNION ALL
SELECT SUM(Revenue), NULL, NULL, Year, Half FROM Sales GROUP BY (),Year,Half UNION ALL
SELECT SUM(Revenue), Category, NULL, NULL, NULL FROM Sales GROUP BY Category,() UNION ALL
SELECT SUM(Revenue), Category, NULL, Year, NULL FROM Sales GROUP BY Category,Year UNION ALL
SELECT SUM(Revenue), Category, NULL, Year, Half FROM Sales GROUP BY Category,Year,half UNION ALL
SELECT SUM(Revenue), Category, Product, NULL, NULL FROM Sales GROUP BY Category,Product,() UNION ALL
SELECT SUM(Revenue), Category, Product, Year, NULL FROM Sales GROUP BY Category,Product,Year UNION ALL
SELECT SUM(Revenue), Category, Product, Year, half FROM Sales GROUP BY Category,Product,Year,Half
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=be3b0d89ad97eaf44b522caf0df9d7da
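The cross product of grouping sets can also be generated mechanically rather than typed out by hand. The following is a sketch only, using Python's sqlite3 module (SQLite has no ROLLUP keyword, which makes it a convenient stand-in); the helper name prefixes and the Total alias are made up for illustration.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Year INT, Half TEXT, Category TEXT, Product TEXT, Revenue INT)"
)
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", [
    (2020, "H1", "Electronics", "Phone", 200),
    (2020, "H1", "Electronics", "Computer", 300),
    (2020, "H2", "Electronics", "Phone", 100),
    (2020, "H2", "Electronics", "Computer", 175),
    (2021, "H1", "Electronics", "Phone", 109),
    (2021, "H1", "Electronics", "Computer", 32),
    (2021, "H2", "Electronics", "Phone", 93),
    (2021, "H2", "Electronics", "Computer", 111),
])

def prefixes(cols):
    """ROLLUP(a, b) expands to the grouping sets (), (a), (a, b)."""
    return [cols[:i] for i in range(len(cols) + 1)]

row_dims = ["Category", "Product"]
col_dims = ["Year", "Half"]

# One SELECT per pair of (row prefix, column prefix), glued with UNION ALL.
branches = []
for rows, cols in product(prefixes(row_dims), prefixes(col_dims)):
    grouped = rows + cols
    select_list = ", ".join(
        c if c in grouped else f"NULL AS {c}" for c in row_dims + col_dims
    )
    group_by = f" GROUP BY {', '.join(grouped)}" if grouped else ""
    branches.append(
        f"SELECT SUM(Revenue) AS Total, {select_list} FROM Sales{group_by}"
    )

result = conn.execute(" UNION ALL ".join(branches)).fetchall()
print(len(branches), len(result))  # 9 28
```

Each ROLLUP contributes three prefixes, so the two together produce 3 x 3 = 9 branches; the first branch has no GROUP BY at all and yields the grand total row.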

How to rewrite the following into an EXISTS

I have the following SQL statement that I would like to convert to use an EXISTS. How would this be done?
with Sales as (
select 'Office Supplies' Category , 2014 Year,22593.42 Profit UNION all
select 'Technology', 2014, 21492.83 UNION all
select 'Furniture', 2014, 5457.73 UNION all
select 'Office Supplies', 2015, 25099.53 UNION all
select 'Technology', 2015, 33503.87 UNION all
select 'Furniture', 2015, 50000.00 UNION all
select 'Office Supplies', 2016, 35061.23 UNION all
select 'Technology', 2016, 39773.99 UNION all
select 'Furniture', 2016, 6959.95
)
select Category, Profit - LAG(Profit) OVER (PARTITION BY Category ORDER BY Year) Diff
FROM Sales where 1=1 qualify Diff < 0
In other words, I want the query to be something like:
SELECT * FROM tbl WHERE EXISTS (...)
If LAG exists then you can use a sub-query instead of Teradata's QUALIFY.
SELECT Category, `Year`, Profit
, PreviousProfit
, (Profit - PreviousProfit) AS Diff
FROM
(
SELECT Category, `Year`, Profit
, LAG(Profit) OVER (PARTITION BY Category ORDER BY `Year`) AS PreviousProfit
FROM Sales AS sale
) AS sale
WHERE EXISTS (
SELECT 1
FROM Sales AS sale2
WHERE sale2.Category = sale.Category
AND sale2.Year = sale.Year - 1
AND sale2.Profit > sale.Profit
)
AND PreviousProfit > Profit
With the PreviousProfit > Profit filter in place, the EXISTS isn't really needed.
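For comparison, here is a sketch of the EXISTS-only form with no LAG at all, run through Python's sqlite3 module so it can be tried directly. It assumes, as the correlated condition in the answer does, that the previous row for a category is always the row for Year - 1; with gaps in the years it would behave differently from LAG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Category TEXT, Year INT, Profit REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", [
    ("Office Supplies", 2014, 22593.42),
    ("Technology", 2014, 21492.83),
    ("Furniture", 2014, 5457.73),
    ("Office Supplies", 2015, 25099.53),
    ("Technology", 2015, 33503.87),
    ("Furniture", 2015, 50000.00),
    ("Office Supplies", 2016, 35061.23),
    ("Technology", 2016, 39773.99),
    ("Furniture", 2016, 6959.95),
])

# Keep a row only if the same category had a higher profit the year before.
rows = conn.execute("""
SELECT s.Category, s.Year
FROM Sales AS s
WHERE EXISTS (
  SELECT 1
  FROM Sales AS prev
  WHERE prev.Category = s.Category
    AND prev.Year = s.Year - 1
    AND prev.Profit > s.Profit
)
ORDER BY s.Category, s.Year
""").fetchall()

print(rows)  # [('Furniture', 2016)]
```

Only Furniture in 2016 dropped relative to the prior year, which is the same row the QUALIFY Diff < 0 query selects.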

Drop Off Funnel in SQL

I have a table that has user_seq_id and the number of days a user was active in the program. I want to understand the drop-off funnel, that is, how many users were active on day 0 (100%), on day 1, on day 2, and so on.
Input table :
create table test (
user_seq_id int ,
NoOfDaysUserWasActive int
);
insert into test (user_seq_id , NoOfDaysUserWasActive)
values (13451, 2), (76453, 1), (22342, 3), (11654, 0),
(54659, 2), (64420, 1), (48906, 5);
I want Day, ActiveUsers, and % Distribution of these users.
One method doesn't use window functions at all. Just a list of days and aggregation:
select v.day, count(t.user_seq_id),
count(t.user_seq_id) * 1.0 / c.cnt as ratio
from (select 0 as day union all select 1 union all select 2 union all select 3 union all select 4 union all select 5
) v(day) left join
test t
on v.day <= t.NoOfDaysUserWasActive cross join
(select count(*) as cnt from test) c
group by v.day, c.cnt
order by v.day asc;
Here is a db<>fiddle.
The mention of window function suggests that you are thinking:
select NoOfDaysUserWasActive,
sum(count(*)) over (order by NoOfDaysUserWasActive desc) as cnt,
sum(count(*)) over (order by NoOfDaysUserWasActive desc) * 1.0 / sum(count(*)) over () as ratio
from test
group by NoOfDaysUserWasActive
order by NoOfDaysUserWasActive
The problem is that this does not "fill in" the days that are not explicitly in the original data. If that is not an issue, then this should have better performance.
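As a runnable illustration of the first method, the sketch below uses Python's sqlite3 module and replaces the hard-coded day list with a recursive CTE that counts from 0 up to the largest NoOfDaysUserWasActive. The recursive CTE and the pct column name are additions for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (user_seq_id INT, NoOfDaysUserWasActive INT);
INSERT INTO test (user_seq_id, NoOfDaysUserWasActive)
VALUES (13451, 2), (76453, 1), (22342, 3), (11654, 0),
       (54659, 2), (64420, 1), (48906, 5);
""")

rows = conn.execute("""
WITH RECURSIVE days(day) AS (
  SELECT 0
  UNION ALL
  SELECT day + 1 FROM days
  WHERE day < (SELECT MAX(NoOfDaysUserWasActive) FROM test)
)
SELECT d.day,
       COUNT(t.user_seq_id) AS active_users,
       ROUND(100.0 * COUNT(t.user_seq_id) / (SELECT COUNT(*) FROM test), 1) AS pct
FROM days AS d
LEFT JOIN test AS t ON d.day <= t.NoOfDaysUserWasActive
GROUP BY d.day
ORDER BY d.day
""").fetchall()

for day, active_users, pct in rows:
    print(day, active_users, pct)
```

A user counts as active on day N if they were active for at least N days, so day 4 still shows one user even though no row in the table has the value 4.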

Problems with an UNION MySQL

I have this query:
SELECT COUNT(*) AS invoice_count, IFNULL(SUM(qa_invoices.invoice_total), 0)
AS invoice_total, IFNULL(SUM(qa_invoices.invoice_discount) ,0) AS invoice_discount
FROM qa_invoices
WHERE (DATE(qa_invoices.invoice_date) BETWEEN '12/06/25' AND '12/06/25')
AND qa_invoices.status_code IN (5, 8)
UNION
SELECT IFNULL(SUM(qa_returns.client_credit), 0)
FROM qa_returns
WHERE (DATE(qa_returns.returnlog_date) BETWEEN '12/06/25' AND '12/06/25');
I get the error:
The used SELECT statements have a different number of columns.
I'm trying to join these two SELECTs with a UNION. Since returnlog_date and invoice_date use the same date condition, it would be better if there were a way to combine both queries into one.
Use a subselect:
SELECT
COUNT(*) AS invoice_count,
IFNULL(SUM(invoice_total), 0) AS invoice_total,
IFNULL(SUM(invoice_discount), 0) AS invoice_discount,
(
SELECT IFNULL(SUM(qa_returns.client_credit), 0)
FROM qa_returns
WHERE qa_returns.returnlog_date >= '2012-06-25'
AND qa_returns.returnlog_date < '2012-06-26'
) AS client_credit
FROM qa_invoices
WHERE invoice_date >= '2012-06-25'
AND invoice_date < '2012-06-26'
AND status_code IN (5, 8)
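Here is a small sketch of the subselect approach using Python's sqlite3 module. The table layouts and all sample rows are invented for illustration (the question does not show its schema or data); dates are stored as ISO-8601 text so the half-open range comparison works on strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema and data, not taken from the question.
CREATE TABLE qa_invoices (invoice_date TEXT, invoice_total REAL,
                          invoice_discount REAL, status_code INT);
CREATE TABLE qa_returns (returnlog_date TEXT, client_credit REAL);
INSERT INTO qa_invoices VALUES
  ('2012-06-25 09:00:00', 100.0, 10.0, 5),
  ('2012-06-25 17:30:00',  50.0,  0.0, 8),
  ('2012-06-25 12:00:00', 999.0,  0.0, 1),  -- excluded: status_code not in (5, 8)
  ('2012-06-26 00:00:00',  70.0,  7.0, 5);  -- excluded: next day
INSERT INTO qa_returns VALUES
  ('2012-06-25 10:00:00', 20.0),
  ('2012-06-24 23:59:59',  5.0);            -- excluded: previous day
""")

row = conn.execute("""
SELECT
  COUNT(*) AS invoice_count,
  IFNULL(SUM(invoice_total), 0) AS invoice_total,
  IFNULL(SUM(invoice_discount), 0) AS invoice_discount,
  (
    SELECT IFNULL(SUM(client_credit), 0)
    FROM qa_returns
    WHERE returnlog_date >= '2012-06-25'
      AND returnlog_date < '2012-06-26'
  ) AS client_credit
FROM qa_invoices
WHERE invoice_date >= '2012-06-25'
  AND invoice_date < '2012-06-26'
  AND status_code IN (5, 8)
""").fetchone()

print(row)  # (2, 150.0, 10.0, 20.0)
```

The result is a single row with four columns, which sidesteps the column-count problem entirely. Comparing the date column against a half-open range, instead of wrapping it in DATE(), also leaves the column bare so an index on it can be used.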
The error tells you exactly what the problem is: for a UNION, each query must return the same number of columns.
I am not sure which column in your second query corresponds to which column in your first query, but you can pad the second query with zeros.
Something like this:
SELECT COUNT(*) AS invoice_count
, IFNULL(SUM(qa_invoices.invoice_total), 0) AS invoice_total
, IFNULL(SUM(qa_invoices.invoice_discount) ,0) AS invoice_discount
FROM qa_invoices
WHERE (DATE(qa_invoices.invoice_date) BETWEEN '12/06/25' AND '12/06/25')
AND qa_invoices.status_code IN (5, 8)
UNION
SELECT 0
, IFNULL(SUM(qa_returns.client_credit), 0)
, 0
FROM qa_returns
WHERE (DATE(qa_returns.returnlog_date) BETWEEN '12/06/25' AND '12/06/25');
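The column-count rule is easy to reproduce in isolation. This is a sketch with literal values in Python's sqlite3 module rather than the question's tables; SQLite's error text differs from MySQL's, but the rule is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three columns on the left, one on the right: the statement is rejected.
try:
    conn.execute("SELECT 1, 2, 3 UNION SELECT 4")
    mismatch_error = None
except sqlite3.Error as exc:
    mismatch_error = str(exc)
print(mismatch_error)

# Padding the shorter SELECT with zeros makes the shapes match.
padded = conn.execute(
    "SELECT 1 AS a, 2 AS b, 3 AS c UNION ALL SELECT 0, 4, 0"
).fetchall()
print(padded)

# Plain UNION removes duplicate rows, so two identical rows collapse into one.
deduped = conn.execute("SELECT 0, 0, 0 UNION SELECT 0, 0, 0").fetchall()
print(deduped)  # [(0, 0, 0)]
```

The last query is worth keeping in mind for the padded approach: with plain UNION, if both halves happened to produce identical rows (for example, all zeros on a day with no invoices and no returns), only one row would come back. UNION ALL keeps both.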
The result sets you UNION together must have exactly the same number of columns.
To do a UNION, both queries need to have the same number of columns.

Get column name which has the max value in a row sql

I have a table in my database where I store categories for news articles, and each time a user reads an article it increments the value in the associated column. Like this:
Now I want to execute a query where I can get the column names with the 4 highest values for each record. For example, for user 9, it would return this:
I've tried several things and searched a lot, but I don't know how to do it. Can anyone help me?
This should do it:
select
userid,
max(case when `rank`=1 then name end) as `highest value`,
max(case when `rank`=2 then name end) as `2nd highest value`,
max(case when `rank`=3 then name end) as `3rd highest value`,
max(case when `rank`=4 then name end) as `4th highest value`
from
(
select userID, @rownum := @rownum + 1 AS `rank`, name, amt from (
select userID, Buitenland as amt, 'Buitenland' as name from newsarticles where userID = 9 union
select userID, Economie, 'Economie' from newsarticles where userID = 9 union
select userID, Sport, 'Sport' from newsarticles where userID = 9 union
select userID, Cultuur, 'Cultuur' from newsarticles where userID = 9 union
select userID, Wetenschap, 'Wetenschap' from newsarticles where userID = 9 union
select userID, Media, 'Media' from newsarticles where userID = 9
) amounts, (SELECT @rownum := 0) r
order by amt desc
limit 4
) top4
group by userid
Demo: http://www.sqlfiddle.com/#!2/ff624/11
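On engines with window functions, the same unpivot-then-rank idea can be written without user variables and for all users at once. The sketch below uses Python's sqlite3 module; the column list follows the query above, but the table contents are invented because the question's sample data is not reproduced here, and ties are broken alphabetically by name as an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data: the question's actual values are not shown.
CREATE TABLE newsarticles (userID INT, Buitenland INT, Economie INT,
                           Sport INT, Cultuur INT, Wetenschap INT, Media INT);
INSERT INTO newsarticles VALUES (9, 5, 12, 3, 8, 1, 7), (10, 2, 0, 9, 4, 6, 1);
""")

categories = ["Buitenland", "Economie", "Sport", "Cultuur", "Wetenschap", "Media"]

# Unpivot: one (userID, name, amt) row per category column.
unpivot = " UNION ALL ".join(
    f"SELECT userID, '{c}' AS name, {c} AS amt FROM newsarticles"
    for c in categories
)

rows = conn.execute(f"""
SELECT userID, name
FROM (
  SELECT userID, name,
         ROW_NUMBER() OVER (PARTITION BY userID ORDER BY amt DESC, name) AS rn
  FROM ({unpivot}) AS amounts
) AS ranked
WHERE rn <= 4
ORDER BY userID, rn
""").fetchall()

# Collect the ranked names per user.
top4 = {}
for user_id, name in rows:
    top4.setdefault(user_id, []).append(name)

print(top4[9])  # ['Economie', 'Cultuur', 'Media', 'Buitenland']
```

Because the ranking is partitioned by userID, dropping the per-user filter is enough to get the top four for every user in one pass.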
A very simple way of doing this is shown below
select userId,
substring_index(four_highest,',',1) as 'highest value',
substring_index(substring_index(four_highest,',',2),',',-1) as '2nd highest value',
substring_index(substring_index(four_highest,',',3),',',-1) as '3rd highest value',
substring_index(four_highest,',',-1) as '4th highest value'
from
(
select userid, convert(group_concat(col order by val desc) using utf8) as four_highest from
(
select userId,Buitenland as val,'Buitenland' as col from test where userid=9 union
select userId,Economie as val,'Economie' as col from test where userid=9 union
select userId,Sport as val ,'Sport' as col from test where userid=9 union
select userId,Cultuur as val,'Cultuur' as col from test where userid=9 union
select userId,Wetenschap as val,'Wetenschap' as col from test where userid=9 union
select userId,Media as val,'Media' as col from test where userid=9 order by val desc limit 4
) inner_query
)outer_query;
PL/SQL, maybe? Set user_id, query your table, store the returned row in an n x 2 array of column names and values (where n is the number of columns), and sort the array by value.
Of course, the correct thing to do is redesign your database in the manner that @octern suggests.
This will get you started with the concept of grabbing the highest value from multiple columns on a single row (modify for your specific tables - I created a fake one).
create table fake
(
id int Primary Key,
col1 int,
col2 int,
col3 int,
col4 int
);
insert into fake values (1, 5, 9, 27, 10);
insert into fake values (2, 3, 5, 1, 20);
insert into fake values (3, 89, 9, 27, 6);
insert into fake values (4, 17, 40, 1, 20);
-- The VALUES row constructor turns the four columns into four rows to aggregate over
SELECT *,(SELECT Max(v)
FROM (VALUES (col1), (col2), (col3), (col4) ) AS value(v)) AS max_value
FROM fake;
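The correlated VALUES trick above is SQL Server syntax. As a rough equivalent that can be run anywhere Python is installed, the sketch below uses SQLite's multi-argument scalar MAX() through the sqlite3 module; this is a swapped-in technique, not the answer's method. MySQL offers GREATEST() for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fake (id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 INT, col4 INT);
INSERT INTO fake VALUES (1, 5, 9, 27, 10);
INSERT INTO fake VALUES (2, 3, 5, 1, 20);
INSERT INTO fake VALUES (3, 89, 9, 27, 6);
INSERT INTO fake VALUES (4, 17, 40, 1, 20);
""")

# With two or more arguments, SQLite's MAX() is a per-row scalar function,
# not an aggregate.
rows = conn.execute(
    "SELECT id, MAX(col1, col2, col3, col4) AS max_value FROM fake ORDER BY id"
).fetchall()

print(rows)  # [(1, 27), (2, 20), (3, 89), (4, 40)]
```

This returns the largest value per row but not the name of the column it came from; getting the name still requires unpivoting the columns into rows, as in the earlier answers.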