SQL subquery to fetch rows by column values - mysql

I have a PostgreSQL table like this:
CREATE TABLE foo(man_id, subgroup, power, grp)
AS VALUES
(1, 'Sub_A', 4, 'Group_A'),
(2, 'Sub_B', -1, 'Group_A'),
(3, 'Sub_A', -1, 'Group_B'),
(4, 'Sub_B', 6, 'Group_B'),
(5, 'Sub_A', 5, 'Group_A'),
(6, 'Sub_B', 1, 'Group_A'),
(7, 'Sub_A', -1, 'Group_B'),
(8, 'Sub_B', 2, 'Group_B'),
(9, 'Sub_C', 2, 'Group_B');
The power calculation works like this:
Total Power of Subgroup Sub_A in the grp Group_A is (4 + 5 ) = 9
Total Power of Subgroup Sub_B in the grp Group_A is ((-1) + 1 ) = 0
Total Power of Subgroup Sub_A in the grp Group_B is ((-1) + (-1) ) = -2
Total Power of Subgroup Sub_B in the grp Group_B is (6 + 2 ) = 8
So the power of Sub_A in Group_A is not equal to the power of Sub_A in Group_B.
So the power of Sub_B in Group_A is not equal to the power of Sub_B in Group_B.
I want to query the database and fetch the rows where, for the same subgroup name, the total power is not equal across all grp values.
What would be the recommended way to do this?
I can find the sum of total power:
SELECT sum(power) AS total_power
FROM foo
GROUP BY grp
MySQL solution will be accepted as well.

One way:
SELECT f.*
FROM (
SELECT subgroup
FROM (
SELECT subgroup, grp, sum(power) AS total_power
FROM foo
GROUP BY subgroup, grp
) sub
GROUP BY 1
HAVING min(total_power) <> max(total_power) -- can fail for NULL values;
) sg
JOIN foo f USING (subgroup);
In your example all rows qualify except for the last one with 'Sub_C'.
Closely related to your previous question:
Do all groups have equal total power for given subgroup?
Similar explanation and considerations.
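A minimal sketch for trying the min/max query locally, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL/MySQL. `GROUP BY 1` is spelled out as `GROUP BY subgroup`, and only `man_id` is selected with an `ORDER BY` so the result is stable:

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL/MySQL here.
ROWS = [
    (1, "Sub_A", 4, "Group_A"), (2, "Sub_B", -1, "Group_A"),
    (3, "Sub_A", -1, "Group_B"), (4, "Sub_B", 6, "Group_B"),
    (5, "Sub_A", 5, "Group_A"), (6, "Sub_B", 1, "Group_A"),
    (7, "Sub_A", -1, "Group_B"), (8, "Sub_B", 2, "Group_B"),
    (9, "Sub_C", 2, "Group_B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (man_id INTEGER, subgroup TEXT, power INTEGER, grp TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?, ?)", ROWS)

# Subgroups whose per-group totals are not all equal (min <> max), joined back to foo.
QUERY = """
SELECT f.man_id
FROM (
    SELECT subgroup
    FROM (
        SELECT subgroup, grp, sum(power) AS total_power
        FROM foo
        GROUP BY subgroup, grp
    ) sub
    GROUP BY subgroup
    HAVING min(total_power) <> max(total_power)
) sg
JOIN foo f USING (subgroup)
ORDER BY f.man_id
"""

man_ids = [row[0] for row in conn.execute(QUERY)]
print(man_ids)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Sub_C has a single per-group total (2), so its min equals its max and row 9 is dropped.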

I think a way to phrase your problem is that you want to total the power for each subgroup within a group, then find whether a subgroup with the same name exists in another group with a different total power.
The first step is to total the powers like you want:
SELECT grp, subgroup, sum(power) as power
FROM foo
GROUP BY grp, subgroup
That should give you a result like:
grp      subgroup  power
-------  --------  -----
Group_A  Sub_A         9
Group_A  Sub_B         0
Group_B  Sub_A        -2
Group_B  Sub_B         8
Group_B  Sub_C         2
Once you have that, you can put it in a CTE and join the results with itself for the comparison. You don't specify whether Sub_C should appear. If 'not existing in another group' qualifies as 'having a different total power', you would want a left join that checks for NULLs in alias b (note that with the < condition, the highest-ordered group of every subgroup would then also get a NULL partner). The < in the join makes each difference appear only once, with the lower-ordered group as grp1.
WITH totals AS (
SELECT grp, subgroup, sum(power) as power
FROM foo
GROUP BY grp, subgroup
ORDER BY grp, subgroup
)
SELECT a.subgroup,
a.grp as grp1, a.power as Power1,
b.grp as grp2, b.power as Power2
FROM totals a
INNER JOIN totals b ON b.subgroup = a.subgroup
and a.grp < b.grp
WHERE b.power <> a.power
ORDER BY a.subgroup, a.grp, b.grp
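A minimal sketch that runs this self-join, assuming Python's `sqlite3` module as a stand-in database (the `ORDER BY` inside the CTE is left out, since it does not affect the final result):

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL/MySQL here.
ROWS = [
    (1, "Sub_A", 4, "Group_A"), (2, "Sub_B", -1, "Group_A"),
    (3, "Sub_A", -1, "Group_B"), (4, "Sub_B", 6, "Group_B"),
    (5, "Sub_A", 5, "Group_A"), (6, "Sub_B", 1, "Group_A"),
    (7, "Sub_A", -1, "Group_B"), (8, "Sub_B", 2, "Group_B"),
    (9, "Sub_C", 2, "Group_B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (man_id INTEGER, subgroup TEXT, power INTEGER, grp TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?, ?)", ROWS)

# Pair each (grp, subgroup) total with the same subgroup in a later grp.
QUERY = """
WITH totals AS (
    SELECT grp, subgroup, sum(power) AS power
    FROM foo
    GROUP BY grp, subgroup
)
SELECT a.subgroup,
       a.grp AS grp1, a.power AS power1,
       b.grp AS grp2, b.power AS power2
FROM totals a
INNER JOIN totals b ON b.subgroup = a.subgroup AND a.grp < b.grp
WHERE b.power <> a.power
ORDER BY a.subgroup, a.grp, b.grp
"""

pairs = conn.execute(QUERY).fetchall()
for pair in pairs:
    print(pair)
# ('Sub_A', 'Group_A', 9, 'Group_B', -2)
# ('Sub_B', 'Group_A', 0, 'Group_B', 8)
```

With the inner join, Sub_C has no partner and does not appear.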

with totals as (
select grp, subgroup, sum(power) as total_power
from foo
group by grp, subgroup
)
select * from totals t1
where t1.total_power <> all (
select t2.total_power from totals t2
where t2.subgroup = t1.subgroup and t2.grp <> t1.grp
);
or
with totals as (
select grp, subgroup, sum(power) as total_power
from foo
group by grp, subgroup
), counts as (
select grp, subgroup, total_power,
count(*) over (partition by subgroup, total_power) as matches
from totals
)
select * from counts where matches = 1;
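A minimal sketch of the window-count variant, assuming Python's `sqlite3` module as a stand-in database (SQLite has no `<> ALL`, so only the second query is shown). A totals row is kept when no other group of the same subgroup has the same total. In the sample data no two groups share a total, so all five rows survive, including Sub_C, which exists in only one group; the `<> ALL` version behaves the same way, since `<> ALL` over an empty subquery is true:

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL/MySQL here.
ROWS = [
    (1, "Sub_A", 4, "Group_A"), (2, "Sub_B", -1, "Group_A"),
    (3, "Sub_A", -1, "Group_B"), (4, "Sub_B", 6, "Group_B"),
    (5, "Sub_A", 5, "Group_A"), (6, "Sub_B", 1, "Group_A"),
    (7, "Sub_A", -1, "Group_B"), (8, "Sub_B", 2, "Group_B"),
    (9, "Sub_C", 2, "Group_B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (man_id INTEGER, subgroup TEXT, power INTEGER, grp TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?, ?)", ROWS)

# matches = how many groups of this subgroup share this exact total.
QUERY = """
WITH totals AS (
    SELECT grp, subgroup, sum(power) AS total_power
    FROM foo
    GROUP BY grp, subgroup
), counts AS (
    SELECT grp, subgroup, total_power,
           count(*) OVER (PARTITION BY subgroup, total_power) AS matches
    FROM totals
)
SELECT grp, subgroup, total_power
FROM counts
WHERE matches = 1
ORDER BY grp, subgroup
"""

kept = conn.execute(QUERY).fetchall()
print(len(kept))  # 5
```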

I would use window functions:
select f.*
from (select f.*,
min(sum_value) over (partition by subgroup) as min_sum_value,
max(sum_value) over (partition by subgroup) as max_sum_value
from (select f.*,
sum(power) over (partition by subgroup, grp) as sum_value
from foo f
) f
) f
where min_sum_value <> max_sum_value;
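A minimal sketch that runs the window-function version, assuming Python's `sqlite3` module as a stand-in database. The inner window sums power per (subgroup, grp); the middle one spreads the smallest and largest of those sums over every row of the subgroup:

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL/MySQL here.
ROWS = [
    (1, "Sub_A", 4, "Group_A"), (2, "Sub_B", -1, "Group_A"),
    (3, "Sub_A", -1, "Group_B"), (4, "Sub_B", 6, "Group_B"),
    (5, "Sub_A", 5, "Group_A"), (6, "Sub_B", 1, "Group_A"),
    (7, "Sub_A", -1, "Group_B"), (8, "Sub_B", 2, "Group_B"),
    (9, "Sub_C", 2, "Group_B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (man_id INTEGER, subgroup TEXT, power INTEGER, grp TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
SELECT f.man_id
FROM (SELECT f.*,
             min(sum_value) OVER (PARTITION BY subgroup) AS min_sum_value,
             max(sum_value) OVER (PARTITION BY subgroup) AS max_sum_value
      FROM (SELECT f.*,
                   sum(power) OVER (PARTITION BY subgroup, grp) AS sum_value
            FROM foo f
           ) f
     ) f
WHERE min_sum_value <> max_sum_value
ORDER BY f.man_id
"""

man_ids = [row[0] for row in conn.execute(QUERY)]
print(man_ids)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Unlike the GROUP BY plus join approach, this reads `foo` once and returns the original rows directly.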

Related

MySQL and using SELECT from custom position

I have a MySQL problem that I can't figure out.
I run a query:
SELECT id, totalsum FROM table ORDER BY totalsum DESC
This could give me the following result:
1, 100000
4, 90000
8, 80000
3, 50000
5, 40000
...
What I need is a query that works something like this:
SELECT id, totalsum
FROM table ORDER BY totalsum DESC
START LISTING FROM id=8 AND CONTINUE TO THE END OF RESULT / LIMIT
resulting in something like this:
8, 80000
3, 50000
5, 40000
...
I cannot use this query:
SELECT id, totalsum
FROM table
WHERE id>=8
ORDER BY totalsum DESC
Because the ids of the following rows can be either smaller or larger than 8.
I have tried using LIMIT and OFFSET, but that is very slow.
Any advice pointing me in the right direction will be appreciated!
Here's a way to do it:
Assign each row a row_num based on totalsum in descending order (CTE)
Select from the above where row_num >= the row_num of id=8
create table a_table (
id int,
total int);
insert into a_table values
(1, 100000),
(4, 90000),
(8, 80000),
(3, 50000),
(5, 40000);
with cte as (
select id,
total,
row_number() over (order by total desc) as row_num
from a_table)
select *
from cte
where row_num >= (select row_num from cte where id=8);
Result:
id|total|row_num|
--+-----+-------+
8|80000| 3|
3|50000| 4|
5|40000| 5|
EDIT:
The query above may return a wrong result if other rows have the same total as id=8, because ROW_NUMBER() orders tied rows arbitrarily. As a comment pointed out, the following query does the job:
select id, total
from a_table
where total <= (select total from a_table where id=8)
order by total desc;
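A minimal sketch comparing both queries on the sample data, assuming Python's `sqlite3` module as a stand-in for MySQL 8 (an `ORDER BY row_num` is added to the first query so its output order is defined):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (id INTEGER, total INTEGER)")
conn.executemany(
    "INSERT INTO a_table VALUES (?, ?)",
    [(1, 100000), (4, 90000), (8, 80000), (3, 50000), (5, 40000)],
)

# First answer: number rows by total, keep those from id=8's position onward.
RANKED = """
WITH cte AS (
    SELECT id, total, row_number() OVER (ORDER BY total DESC) AS row_num
    FROM a_table
)
SELECT id, total, row_num
FROM cte
WHERE row_num >= (SELECT row_num FROM cte WHERE id = 8)
ORDER BY row_num
"""

# EDIT version: compare against id=8's total directly, no numbering needed.
THRESHOLD = """
SELECT id, total
FROM a_table
WHERE total <= (SELECT total FROM a_table WHERE id = 8)
ORDER BY total DESC
"""

ranked = conn.execute(RANKED).fetchall()
from_threshold = conn.execute(THRESHOLD).fetchall()
print(ranked)          # [(8, 80000, 3), (3, 50000, 4), (5, 40000, 5)]
print(from_threshold)  # [(8, 80000), (3, 50000), (5, 40000)]
```

The threshold form is a plain range condition on `total`, so an index on that column can usually serve it; whether that beats LIMIT/OFFSET on the real table is untested here.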

How to transpose values in rows to columns in MySQL

This image shows what my raw table looks like:
The following conditions produce the transposed table shown in the image below:
Each row has a unique id
We only need columns for groups A,B,C in the group field and not others.
There can be one or multiple ids in group A for the same app id; I need the row for which the date is minimum.
There can be one or multiple ids in groups B and C for the same app id; I need the rows for which the date is maximum.
The image below shows what my final table should look like:
Each row has a unique id
We only need columns for groups A,B,C in the group field and not others.
add this to your query
WHERE `GROUP` IN ('A','B','C')
There can be one or multiple ids in group A for the same app id; I need the row for which the date is minimum.
add somewhere after the SELECT:
MIN(date) OVER (PARTITION BY appid, `group`)
There can be one or multiple ids in groups B and C for the same app id; I need the rows for which the date is maximum.
change the expression added for point 3 to:
CASE WHEN `group` IN ('B','C')
THEN MAX(date) OVER (PARTITION BY appid, `group`)
ELSE MIN(date) OVER (PARTITION BY appid, `group`)
END
Maybe this helps you make a serious attempt at solving this yourself (and learn from it) instead of asking for a solution and then copy/pasting it...
BTW: Naming fields after SQL keywords like GROUP and DATE is not a good idea. A better name for the column GROUP might be CategoryGroup (or whatever this group refers to).
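A minimal sketch of where these hints lead, assuming Python's `sqlite3` module as a stand-in for MySQL 8. The rows are hypothetical (the real data was only shown as an image), `date`/`group`/`appid` are renamed to `dt`/`grp`/`app_id`, and dates are ISO strings so that MIN/MAX compare them correctly. The CASE expression computes the wanted date per (app_id, grp), and the outer query keeps the rows that match it:

```python
import sqlite3

# Hypothetical rows; real data unknown. dt/grp stand in for date/group.
ROWS = [
    (1, 100, "2021-01-05", "A", 10),
    (2, 100, "2021-01-01", "A", 20),  # earliest A for app 100
    (3, 100, "2021-02-01", "B", 30),
    (4, 100, "2021-03-01", "B", 40),  # latest B for app 100
    (5, 100, "2021-01-15", "C", 50),  # only C for app 100
    (6, 100, "2021-01-20", "D", 60),  # group D is filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, app_id INTEGER, dt TEXT, grp TEXT, score1 INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", ROWS)

# MIN date for A, MAX date for B and C, per (app_id, grp); keep matching rows.
QUERY = """
SELECT id
FROM (SELECT id, dt,
             CASE WHEN grp IN ('B', 'C')
                  THEN MAX(dt) OVER (PARTITION BY app_id, grp)
                  ELSE MIN(dt) OVER (PARTITION BY app_id, grp)
             END AS wanted_dt
      FROM tab
      WHERE grp IN ('A', 'B', 'C')) x
WHERE dt = wanted_dt
ORDER BY id
"""

ids = [row[0] for row in conn.execute(QUERY)]
print(ids)  # [2, 4, 5]
```

If two rows share the extreme date within a group, both are kept; the pivot into columns would still be a separate step.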
I took a different approach to this. The SQL is longer but I think it's more auditable.
The main logic point is that I broke A and BC into 2 different subqueries, and used QUALIFY ROW_NUMBER() to choose the correct row, based on either ASC or DESC per your requirements.
I know you are using MySQL, and this might not work there: QUALIFY and PIVOT are not available in MySQL, and I don't have an instance to test it. Here is the SQL I got from building this logic in Rasgo, which I tested on Snowflake, where it worked.
-- This splits the data into group A only
WITH CTE_A AS (
SELECT
*
FROM
{{ your_table }}
WHERE
my_group = 'A'
),
-- This splits the data into group B and C only
CTE_B AS (
SELECT
*
FROM
{{ your_table }}
WHERE
my_group IN('B', 'C')
),
-- Selecting from A only, it keeps the earliest row (date ASCENDING)
CTE_A_FIRST AS (
SELECT
*
FROM
CTE_A QUALIFY ROW_NUMBER() OVER (
PARTITION BY APP_ID,
MY_GROUP
ORDER BY
MY_DATE ASC
) = 1
),
-- Selecting from B and C only, it keeps the most recent row (date DESCENDING)
CTE_B_LAST AS (
SELECT
*
FROM
CTE_B QUALIFY ROW_NUMBER() OVER (
PARTITION BY APP_ID,
MY_GROUP
ORDER BY
MY_DATE DESC
) = 1
),
-- Here we just union A and BC back to one another
CTE_ABC AS (
SELECT
ID,
APP_ID,
MY_DATE,
MY_GROUP,
SCORE1,
SCORE2
FROM
CTE_B_LAST
UNION ALL
SELECT
ID,
APP_ID,
MY_DATE,
MY_GROUP,
SCORE1,
SCORE2
FROM
CTE_A_FIRST
),
-- We pivot the date horizontally so we get a date for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_DATE AS (
SELECT
APP_ID,
B,
C,
A
FROM
(
SELECT
APP_ID,
MY_DATE,
MY_GROUP
FROM
CTE_ABC
) PIVOT (
MIN (MY_DATE) FOR MY_GROUP IN ('B', 'C', 'A')
) as p (APP_ID, B, C, A)
),
-- We pivot the SCORE1 horizontally so we get a SCORE1 for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_SCORE1 AS (
SELECT
APP_ID,
B,
C,
A
FROM
(
SELECT
APP_ID,
SCORE1,
MY_GROUP
FROM
CTE_ABC
) PIVOT (
MIN (SCORE1) FOR MY_GROUP IN ('B', 'C', 'A')
) as p (APP_ID, B, C, A)
),
-- We pivot the SCORE2 horizontally so we get a SCORE2 for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_SCORE2 AS (
SELECT
APP_ID,
B,
C,
A
FROM
(
SELECT
APP_ID,
SCORE2,
MY_GROUP
FROM
CTE_ABC
) PIVOT (
MIN (SCORE2) FOR MY_GROUP IN ('B', 'C', 'A')
) as p (APP_ID, B, C, A)
),
-- We join the subqueries above together on the APP_IDs
CTE_JOINED AS (
SELECT
t0.*,
t1.APP_ID as SCORE1_APP_ID,
t1.B as SCORE1_B,
t1.C as SCORE1_C,
t1.A as SCORE1_A,
t2.APP_ID as SCORE2_APP_ID,
t2.B as SCORE2_B,
t2.C as SCORE2_C,
t2.A as SCORE2_A
FROM
CTE_PVT_DATE t0
INNER JOIN CTE_PVT_SCORE1 t1 ON t0.APP_ID = t1.APP_ID
INNER JOIN CTE_PVT_SCORE2 t2 ON t0.APP_ID = t2.APP_ID
)
-- The final select is really just renaming ...
-- the magic has already happened
SELECT
A AS DATE_A,
B AS DATE_B,
C AS DATE_C,
APP_ID,
SCORE1_B,
SCORE1_C,
SCORE1_A,
SCORE2_B,
SCORE2_C,
SCORE2_A
FROM
CTE_JOINED
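Since MySQL has no QUALIFY, a common substitute is to compute ROW_NUMBER() in a CTE or subquery and filter on it in the outer query. A minimal sketch of that filtering step, assuming Python's `sqlite3` module as a stand-in for MySQL 8 and hypothetical rows (the real data was only in an image; `dt`/`grp` stand in for the date/group columns):

```python
import sqlite3

# Hypothetical rows; real data unknown.
ROWS = [
    (1, 100, "2021-01-05", "A", 10),
    (2, 100, "2021-01-01", "A", 20),
    (3, 100, "2021-02-01", "B", 30),
    (4, 100, "2021-03-01", "B", 40),
    (5, 100, "2021-01-15", "C", 50),
    (6, 100, "2021-01-20", "D", 60),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, app_id INTEGER, dt TEXT, grp TEXT, score1 INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", ROWS)

# Same logic as CTE_A_FIRST / CTE_B_LAST, with "WHERE rn = 1" replacing QUALIFY.
QUERY = """
WITH a_first AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY app_id, grp ORDER BY dt ASC) AS rn
    FROM tab
    WHERE grp = 'A'
), bc_last AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY app_id, grp ORDER BY dt DESC) AS rn
    FROM tab
    WHERE grp IN ('B', 'C')
)
SELECT id FROM a_first WHERE rn = 1
UNION ALL
SELECT id FROM bc_last WHERE rn = 1
ORDER BY id
"""

ids = [row[0] for row in conn.execute(QUERY)]
print(ids)  # [2, 4, 5]
```

ROW_NUMBER() always keeps exactly one row per (app_id, grp), even when dates tie; which tied row wins is arbitrary unless the ORDER BY gets a tie-breaker.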
I'll build my attempt in several steps and then show you the full solution made up of these steps, so that you can understand it piece by piece. Given the following definition of your input table:
CREATE TABLE tab(
id INT,
app_id INT,
`date` VARCHAR(20),
`group` VARCHAR(20),
score1 INT,
score2 INT
);
STEP 1. Converting date from a 'MM/DD/YYYY' string into a proper DATE ("YYYY-MM-DD"). The function STR_TO_DATE comes in handy for this.
WITH formatted_tab AS (
SELECT id,
app_id,
STR_TO_DATE(date, '%m/%d/%Y') AS date,
group,
score1,
score2
FROM tab
)
STEP 2. Extracting the useful dates according to the group field. Since group "A" is treated differently from groups "B" and "C", the idea here is to address each case with a different query, where
in the former case the MIN aggregation function is applied,
in the latter case the MAX aggregation function is applied.
Then the two result sets are combined with a UNION operation.
(
SELECT app_id,
MIN(date) AS date,
group
FROM formatted_tab
WHERE group IN ('A')
GROUP BY app_id,
group
UNION
SELECT app_id,
MAX(date) AS date,
group
FROM formatted_tab
WHERE group IN ('B', 'C')
GROUP BY app_id,
group
) needed_dates
STEP 3. Getting back scores corresponding to group and date field. This is done with a simple INNER JOIN between the last generated table and the formatted table.
(
SELECT needed_dates.*,
formatted_tab.score1,
formatted_tab.score2
FROM needed_dates
INNER JOIN formatted_tab
ON needed_dates.app_id = formatted_tab.app_id
AND needed_dates.date = formatted_tab.date
AND needed_dates.group = formatted_tab.group
) needed_infos
STEP 4. Pivoting the table using MySQL tools such as:
the IF statement to retrieve the values corresponding to a specific group
the MAX aggregation function, to aggregate on the same group
These tools are applied for each group you specified ('A', 'B' and 'C').
SELECT app_id,
MAX(IF(group='A', date , NULL)) AS date_groupA,
MAX(IF(group='B', date , NULL)) AS date_groupB,
MAX(IF(group='C', date , NULL)) AS date_groupC,
MAX(IF(group='A', score1, NULL)) AS score1_groupA,
MAX(IF(group='A', score2, NULL)) AS score2_groupA,
MAX(IF(group='B', score1, NULL)) AS score1_groupB,
MAX(IF(group='B', score2, NULL)) AS score2_groupB,
MAX(IF(group='C', score1, NULL)) AS score1_groupC,
MAX(IF(group='C', score2, NULL)) AS score2_groupC
FROM needed_infos
GROUP BY app_id
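A minimal sketch of Steps 2 to 4 together, assuming Python's `sqlite3` module as a stand-in for MySQL 8. SQLite has no `IF()`, so `CASE WHEN ... END` is used instead (MySQL accepts that too). The rows are hypothetical, the dates are already ISO strings (i.e. Step 1 is assumed done), `dt`/`grp` stand in for `date`/`group`, and only `score1` is pivoted to keep it short:

```python
import sqlite3

# Hypothetical rows; real data unknown. Step 1 (date conversion) assumed done.
ROWS = [
    (1, 100, "2021-01-05", "A", 10),
    (2, 100, "2021-01-01", "A", 20),
    (3, 100, "2021-02-01", "B", 30),
    (4, 100, "2021-03-01", "B", 40),
    (5, 100, "2021-01-15", "C", 50),
    (6, 100, "2021-01-20", "D", 60),
    (7, 200, "2021-04-01", "A", 70),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE formatted_tab (id INTEGER, app_id INTEGER, dt TEXT, grp TEXT, score1 INTEGER)")
conn.executemany("INSERT INTO formatted_tab VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
WITH needed_dates AS (              -- Step 2: MIN for A, MAX for B and C
    SELECT app_id, MIN(dt) AS dt, grp FROM formatted_tab
    WHERE grp = 'A' GROUP BY app_id, grp
    UNION
    SELECT app_id, MAX(dt) AS dt, grp FROM formatted_tab
    WHERE grp IN ('B', 'C') GROUP BY app_id, grp
), needed_infos AS (                -- Step 3: join the scores back
    SELECT n.app_id, n.dt, n.grp, t.score1
    FROM needed_dates n
    JOIN formatted_tab t
      ON t.app_id = n.app_id AND t.dt = n.dt AND t.grp = n.grp
)
SELECT app_id,                      -- Step 4: pivot with conditional MAX
       MAX(CASE WHEN grp = 'A' THEN dt END) AS date_groupA,
       MAX(CASE WHEN grp = 'B' THEN dt END) AS date_groupB,
       MAX(CASE WHEN grp = 'C' THEN dt END) AS date_groupC,
       MAX(CASE WHEN grp = 'A' THEN score1 END) AS score1_groupA,
       MAX(CASE WHEN grp = 'B' THEN score1 END) AS score1_groupB,
       MAX(CASE WHEN grp = 'C' THEN score1 END) AS score1_groupC
FROM needed_infos
GROUP BY app_id
ORDER BY app_id
"""

pivoted = conn.execute(QUERY).fetchall()
for row in pivoted:
    print(row)
# (100, '2021-01-01', '2021-03-01', '2021-01-15', 20, 40, 50)
# (200, '2021-04-01', None, None, 70, None, None)
```

An app with no row in a group simply gets NULL (None) in that group's columns.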
Full attempt. This is the combination of the previous snippets. The only difference is the backticks around the field names, which prevent MySQL from confusing them with keywords such as "date" (the DATE type) and "group" (used in the GROUP BY clause).
WITH `formatted_tab` AS (
SELECT `id`,
`app_id`,
STR_TO_DATE(`date`, '%m/%d/%Y') AS `date`,
`group`,
`score1`,
`score2`
FROM `tab`
)
SELECT `app_id`,
MAX(IF(`group`='A', `date` , NULL)) AS date_groupA,
MAX(IF(`group`='B', `date` , NULL)) AS date_groupB,
MAX(IF(`group`='C', `date` , NULL)) AS date_groupC,
MAX(IF(`group`='A', `score1`, NULL)) AS score1_groupA,
MAX(IF(`group`='A', `score2`, NULL)) AS score2_groupA,
MAX(IF(`group`='B', `score1`, NULL)) AS score1_groupB,
MAX(IF(`group`='B', `score2`, NULL)) AS score2_groupB,
MAX(IF(`group`='C', `score1`, NULL)) AS score1_groupC,
MAX(IF(`group`='C', `score2`, NULL)) AS score2_groupC
FROM ( SELECT needed_dates.*,
formatted_tab.score1,
formatted_tab.score2
FROM ( SELECT `app_id`,
MIN(`date`) AS `date`,
`group`
FROM `formatted_tab`
WHERE `group` IN ('A')
GROUP BY `app_id`,
`group`
UNION
SELECT `app_id`,
MAX(`date`) AS `date`,
`group`
FROM `formatted_tab`
WHERE `group` IN ('B', 'C')
GROUP BY `app_id`,
`group`
) needed_dates
INNER JOIN formatted_tab
ON needed_dates.app_id = formatted_tab.app_id
AND needed_dates.date = formatted_tab.date
AND needed_dates.group = formatted_tab.group
) needed_infos
GROUP BY `app_id`

Drop Off Funnel in SQL

I have a table that has user_seq_id and the number of days each user was active in the program. I want to understand the drop-off funnel, i.e. how many users were active on day 0 (100%), on day 1, day 2, and so on.
Input table:
create table test (
user_seq_id int ,
NoOfDaysUserWasActive int
);
insert into test (user_seq_id , NoOfDaysUserWasActive)
values (13451, 2), (76453, 1), (22342, 3), (11654, 0),
(54659, 2), (64420, 1), (48906, 5);
I want the columns Day, ActiveUsers, and the % distribution of these users.
One method doesn't use window functions at all, just a list of days and aggregation:
select v.day, count(t.user_seq_id),
count(t.user_seq_id) / c.cnt as ratio
from (select 0 as day union all select 1 union all select 2 union all select 3 union all select 4 union all select 5
) v(day) left join
test t
on v.day <= t.NoOfDaysUserWasActive cross join
(select count(*) as cnt from test) c
group by v.day, c.cnt
order by v.day asc;
The mention of window functions suggests that you are thinking of something like:
select NoOfDaysUserWasActive,
sum(count(*)) over (order by NoOfDaysUserWasActive desc) as cnt,
sum(count(*)) over (order by NoOfDaysUserWasActive desc) / sum(count(*)) over () as ratio
from test
group by NoOfDaysUserWasActive
order by NoOfDaysUserWasActive
The problem is that this does not "fill in" the days that are not explicitly in the original data. If that is not an issue, then this should have better performance.
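A minimal sketch that runs both ideas on the sample data, assuming Python's `sqlite3` module as a stand-in database. Two adaptations: `1.0 *` forces real division (SQLite, like PostgreSQL, truncates integer / integer), and the running total is taken over counts grouped in a subquery, an equivalent restructuring of `sum(count(*)) over (...)`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (user_seq_id INTEGER, NoOfDaysUserWasActive INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(13451, 2), (76453, 1), (22342, 3), (11654, 0), (54659, 2), (64420, 1), (48906, 5)],
)

# Approach 1: explicit list of days, LEFT JOIN users still active that day.
DAYS_QUERY = """
SELECT v.day,
       count(t.user_seq_id) AS active_users,
       1.0 * count(t.user_seq_id) / c.cnt AS ratio
FROM (SELECT 0 AS day UNION ALL SELECT 1 UNION ALL SELECT 2
      UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) v
LEFT JOIN test t ON v.day <= t.NoOfDaysUserWasActive
CROSS JOIN (SELECT count(*) AS cnt FROM test) c
GROUP BY v.day, c.cnt
ORDER BY v.day
"""

# Approach 2: running total (from the highest day down) over grouped counts.
# Days on which nobody stopped (day 4 here) are missing from this output.
WINDOW_QUERY = """
SELECT days, sum(n) OVER (ORDER BY days DESC) AS active_users
FROM (SELECT NoOfDaysUserWasActive AS days, count(*) AS n
      FROM test
      GROUP BY NoOfDaysUserWasActive) g
ORDER BY days
"""

funnel = conn.execute(DAYS_QUERY).fetchall()
running = conn.execute(WINDOW_QUERY).fetchall()
print([(day, users) for day, users, _ in funnel])  # [(0, 7), (1, 6), (2, 4), (3, 2), (4, 1), (5, 1)]
print(running)  # [(0, 7), (1, 6), (2, 4), (3, 2), (5, 1)]
```

The gap at day 4 in the second result is exactly the "fill in" problem described above.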

Order by the difference of two selects in MySQL

I have a table like this:
groupX groupY quantity
A B 10
A C 2
C D 7
B A 13
C B 1
D B 9
So the same individuals appear in both columns, groupX and groupY. I would like to write a SELECT that does the following:
Select
(Select groupX, sum(quantity) group by groupX) as M
- (Select groupY, sum(quantity)
group by groupY) as N
ORDER BY M-N Desc
I mean, I need to sum the quantities for each individual when they appear in groupX and when they appear in groupY, and then calculate the difference between the first and the second sum for each individual. Finally, I need the query to order the individuals by that difference.
Of course the query I wrote does not work.
With this table as source
CREATE TABLE tableA
(`groupX` varchar(1), `groupY` varchar(1), `quantity` int)
;
INSERT INTO tableA
(`groupX`, `groupY`, `quantity`)
VALUES
('A', 'B', 10),
('A', 'C', 2),
('C', 'D', 7),
('B', 'A', 13),
('C', 'B', 1),
('D', 'B', 9)
;
Use this statement:
SELECT T1.groupX groupname, (sum1 - sum2) as res
From
(SELECT groupX ,Sum(quantity) sum1 From tableA Group by groupX) T1
inner join (SELECT groupY,Sum(quantity) sum2 From tableA Group by groupY) T2
On T1.groupX = T2.groupY
ORDER by res;
You will get
groupname res
B -7
A -1
D 2
C 6
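As I don't know all your data, it may be better to use a LEFT JOIN, which would include all unique members of groupX that have no corresponding member in groupY (wrap sum2 in COALESCE(sum2, 0), otherwise res is NULL for them).
A RIGHT JOIN does the reverse. Add DESC to the ORDER BY if you want the largest difference first, as in your question.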
See example here https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=6c2facaaf6564d4025f24f6aab35adf7
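A minimal sketch that checks the join on the sample data in descending order, assuming Python's `sqlite3` module as a stand-in for MySQL. The second query is an alternative technique, not part of the answer above: it counts groupX quantities as positive and groupY quantities as negative in one UNION ALL, so a member that appears in only one column still gets a row without any outer join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (groupX TEXT, groupY TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [("A", "B", 10), ("A", "C", 2), ("C", "D", 7), ("B", "A", 13), ("C", "B", 1), ("D", "B", 9)],
)

# The answer's join of the two grouped sums, largest difference first.
JOIN_QUERY = """
SELECT T1.groupX AS groupname, sum1 - sum2 AS res
FROM (SELECT groupX, sum(quantity) AS sum1 FROM tableA GROUP BY groupX) T1
INNER JOIN (SELECT groupY, sum(quantity) AS sum2 FROM tableA GROUP BY groupY) T2
        ON T1.groupX = T2.groupY
ORDER BY res DESC
"""

# Alternative: signed quantities in one column, then a single GROUP BY.
SIGNED_QUERY = """
SELECT member, sum(q) AS res
FROM (SELECT groupX AS member, quantity AS q FROM tableA
      UNION ALL
      SELECT groupY, -quantity FROM tableA) u
GROUP BY member
ORDER BY res DESC
"""

by_join = conn.execute(JOIN_QUERY).fetchall()
by_sign = conn.execute(SIGNED_QUERY).fetchall()
print(by_join)  # [('C', 6), ('D', 2), ('A', -1), ('B', -7)]
print(by_sign)  # [('C', 6), ('D', 2), ('A', -1), ('B', -7)]
```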

aggregation condition in case when

I have a dataset with a structure similar to the one below:
fruit, value
apple, 234
apple, 2341
pear, 3233
grape, 323
pear, 3234
grape, 1234
I am trying to count the values that fall in the bottom 10% of the range between the minimum and maximum value, with a query like the one below. (The ultimate goal is to see these counts as the threshold goes up in increments of 10%.) I also have a GROUP BY clause, so I would like the counts to be grouped by fruit. Below is the query I have tried:
select fruit, count(case when (value <= (((max(value) - min(value)) * .1) + min(value))) then 1 end)
from fruit_juice
group by substring(fruit, 5, 5);
Aggregate the table in the from clause to get the limits you want. Join those results back to your query and use those values for the query:
select substring(fj.fruit, 5, 5),
sum(fj.value <= fmm.minv + (fmm.maxv - fmm.minv) * 0.1)
from fruit_juice fj join
(select substring(fruit, 5, 5) as fruit5,
max(value) as maxv, min(value) as minv
from fruit_juice
group by substring(fruit, 5, 5)
) fmm
on fmm.fruit5 = substring(fj.fruit, 5, 5)
group by substring(fruit, 5, 5);
Note that your group by expressions should match the expressions in the select clause.
EDIT:
I'm not sure where the substring() is coming from in your question, so this version removes it:
select fj.fruit, sum(fj.value <= fmm.minv + (fmm.maxv - fmm.minv) * 0.1)
from fruit_juice fj join
(select fruit,
max(value) as maxv, min(value) as minv
from fruit_juice
group by fruit
) fmm
on fmm.fruit = fj.fruit
group by fj.fruit;
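A minimal sketch of the second query with the threshold fraction turned into a bound parameter, which covers the "increments of 10%" goal by calling it once per fraction. It assumes Python's `sqlite3` module as a stand-in for MySQL. `bottom_counts` is a hypothetical helper name, and an `ORDER BY` is added for a stable result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit_juice (fruit TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO fruit_juice VALUES (?, ?)",
    [("apple", 234), ("apple", 2341), ("pear", 3233),
     ("grape", 323), ("pear", 3234), ("grape", 1234)],
)

# Per fruit: count values within the bottom `fraction` of that fruit's min..max range.
# A comparison yields 0/1 in SQLite (as in MySQL), so sum() counts the matches.
QUERY = """
SELECT fj.fruit, sum(fj.value <= fmm.minv + (fmm.maxv - fmm.minv) * ?)
FROM fruit_juice fj
JOIN (SELECT fruit, max(value) AS maxv, min(value) AS minv
      FROM fruit_juice
      GROUP BY fruit) fmm
  ON fmm.fruit = fj.fruit
GROUP BY fj.fruit
ORDER BY fj.fruit
"""

def bottom_counts(fraction):
    """Hypothetical helper: {fruit: count} for one threshold fraction."""
    return dict(conn.execute(QUERY, (fraction,)).fetchall())

print(bottom_counts(0.1))  # {'apple': 1, 'grape': 1, 'pear': 1}
print(bottom_counts(1.0))  # {'apple': 2, 'grape': 2, 'pear': 2}
```

Looping over 0.1, 0.2, ..., 1.0 gives the counts for each 10% step.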