How to determine equivalent groupings (recursively) in SQL? - mysql

I have a list of products identified by their SKUs. To simplify it, I just name them as A, B, C, D,... here. Each of these SKUs has been assigned by default an already existing GroupID, for simplicity I just number them as 1, 2, 3,... here.
The same GroupID would mean "These SKUs are equivalent, so it is ok to use/buy either one of them, as it makes no difference".
The problem is that some SKUs show up more than once because they come from different buying sources, and each source assigns its own grouping.
The goal is therefore to consolidate the groupings so that equivalent SKUs end up with the same GroupID.
I apologize if my illustration is not super pretty, but I'm trying. Here's a small sample of what the raw data looks like (the first line holds the column names):
Source SKU GroupID
Seller1 A 1
Seller1 B 1
Seller1 C 1
Seller2 B 2
Seller2 D 2
Seller2 E 2
Seller3 A 3
Seller3 B 3
Seller4 F 4
Seller4 G 4
Seller4 H 4
The result should be like:
Source SKU GroupID
Seller1 A 1
Seller1 B 1
Seller1 C 1
Seller2 B 1
Seller2 D 1
Seller2 E 1
Seller3 A 1
Seller3 B 1
Seller4 F 4
Seller4 G 4
Seller4 H 4
Basically, if any SKU in GroupID X also appears in GroupID Y, then GroupID Y = GroupID X. But that has to be applied across all GroupIDs, so it appears to be recursive.
I wish I could show the code I have tried over the last few days, but I literally only managed to produce garbage.
In C# I'd know how to deal with this, but I can't seem to wrap my head around SQL as I am not that experienced and unfortunately I would need this in SQL.
I would be thankful for any kind of help, even if it's just a hint or direction you guys would suggest I should try. Thanks a lot!

You want a correspondence between groups, which you can calculate with a recursive CTE:
with recursive tt as (
select distinct t1.groupid as groupid1, t2.groupid as groupid2
from t t1 join
t t2
on t1.sku = t2.sku
),
cte as (
select tt.groupid1, tt.groupid2, concat_ws(',', tt.groupid1, tt.groupid2) as visited
from tt
union all
select cte.groupid1, tt.groupid2, concat_ws(',', visited, tt.groupid2)
from cte join
tt
on cte.groupid2 = tt.groupid1
where find_in_set(tt.groupid2, cte.visited) = 0
)
select groupid1, min(groupid2) as overall_group
from cte
group by groupid1;
You can then join this back to the original table to get the "overall group":
with recursive tt as (
select distinct t1.groupid as groupid1, t2.groupid as groupid2
from t t1 join
t t2
on t1.sku = t2.sku
),
cte as (
select tt.groupid1, tt.groupid2, concat_ws(',', tt.groupid1, tt.groupid2) as visited
from tt
union all
select cte.groupid1, tt.groupid2, concat_ws(',', visited, tt.groupid2)
from cte join
tt
on cte.groupid2 = tt.groupid1
where find_in_set(tt.groupid2, cte.visited) = 0
)
select t.*, g.overall_group
from t join
(select groupid1, min(groupid2) as overall_group
from cte
group by groupid1
) g
on t.groupid = g.groupid1;
Here is a db<>fiddle.
Note: Your sample data is rather "complete" so you don't need a recursive CTE for that particular data. However, I am guessing that your real groups have a bit less overlap in which case recursion is necessary.
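For readers more at home in procedural code, the consolidation above amounts to finding connected components of groups that are linked by a shared SKU. The following is a minimal Python sketch, not part of the original answer: `consolidate` is a hypothetical helper, the rows are the question's sample data, and it keeps the smallest GroupID of each component, mirroring `min(groupid2)` in the query. It can serve as a cross-check for the SQL result.

```python
# Sample rows from the question: (Source, SKU, GroupID)
rows = [
    ("Seller1", "A", 1), ("Seller1", "B", 1), ("Seller1", "C", 1),
    ("Seller2", "B", 2), ("Seller2", "D", 2), ("Seller2", "E", 2),
    ("Seller3", "A", 3), ("Seller3", "B", 3),
    ("Seller4", "F", 4), ("Seller4", "G", 4), ("Seller4", "H", 4),
]


def consolidate(rows):
    """Merge groups that share at least one SKU (union-find)."""
    parent = {}

    def find(g):
        # Return the representative of g's component, with path halving.
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    def union(a, b):
        # Attach the larger root to the smaller one, so the
        # representative is always the smallest GroupID seen.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_group = {}  # SKU -> first GroupID it was seen in
    for _, sku, gid in rows:
        find(gid)
        if sku in first_group:
            union(first_group[sku], gid)
        else:
            first_group[sku] = gid

    return [(src, sku, find(gid)) for src, sku, gid in rows]


result = consolidate(rows)
for row in result:
    print(row)
```

On the sample data, groups 1, 2 and 3 collapse into group 1, while group 4 stays on its own.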

First, find the sellers whose SKUs overlap with other sellers, based on counts, then filter using GROUP BY:
select table1.Source, SKU, case when table1.Source = t6.Source and t6.cnt > 1 then 1 else 2 end as GroupID
from table1
left join
(select t5.Source, count(t5.cnt) as cnt from (
select distinct t4.Source, t4.cnt from (
select t3.Source, count(t3.SKU) as cnt from (
select t1.Source, t1.SKU from table1 t1
left join table1 t2 on t2.SKU = t1.SKU ) t3
group by t3.Source, t3.SKU
order by t3.Source) t4) as t5
group by t5.Source) t6 on t6.Source = table1.Source

Related

How to find no of users in mysql table?

I have table like below
A B C D
part1 p NA NA
part2 NA NA p
Part3 p p NA
I have to create a table with the number of part1 users, part2 users and part3 users. p represents a user.
I tried the query below for part1 users, but it gives 1 for part1, part2 and part3 users alike.
SELECT count(distinct A) as no_of_part1_users
FROM table1
WHERE (A="part1" AND B="x") OR (A="part1" AND C="x") OR (A="part1" AND D="x");
I am getting output like below:
no_of_part1_users
1
I applied the same logic for part2 and part3 users, but I am getting the same output.
Desired output:
no_of_part1_users no_of_part2_users no_of_part3_users
1 1 2
To produce what you want, you could use a FROM-less SELECT with one subquery per count. Each subquery uses a derived table that transforms the columns into rows using UNION ALL. The rows can then be filtered and counted.
SELECT (SELECT count(*)
FROM (SELECT b u
FROM table1
WHERE a = 'part1'
UNION ALL
...
UNION ALL
SELECT d u
FROM table1
WHERE a = 'part1') x
WHERE u <> 'NA') no_of_part1_users,
...
(SELECT count(*)
FROM (SELECT b u
FROM table1
WHERE a = 'part3'
UNION ALL
...
UNION ALL
SELECT d u
FROM table1
WHERE a = 'part3') x
WHERE u <> 'NA') no_of_part3_users;
But yeah, that's pretty ugly. Your design is really bad indeed. You should really fix that instead. Relational tables aren't spreadsheets!
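As a runnable illustration of the unpivot-then-count idea, here is a small sketch that is not part of the original answer. It assumes SQLite through Python's `sqlite3` module rather than MySQL, loads the question's three sample rows, and returns one row per part instead of one column per part. `lower(a)` is used only because the sample spells `Part3` with a capital letter and SQLite compares text case-sensitively by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (a TEXT, b TEXT, c TEXT, d TEXT);
INSERT INTO table1 VALUES
  ('part1', 'p',  'NA', 'NA'),
  ('part2', 'NA', 'NA', 'p'),
  ('Part3', 'p',  'p',  'NA');
""")

# Turn the columns b, c, d into rows, drop the 'NA' markers,
# then count what is left per part.
rows = conn.execute("""
SELECT lower(a) AS part, count(*) AS no_of_users
FROM (SELECT a, b AS u FROM table1
      UNION ALL
      SELECT a, c FROM table1
      UNION ALL
      SELECT a, d FROM table1) x
WHERE u <> 'NA'
GROUP BY lower(a)
ORDER BY part
""").fetchall()

counts = dict(rows)
print(counts)
```

The counts match the desired output from the question (1, 1 and 2).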

Get all applications where count of positions more than 1

I have 2 tables
LoanApplications (Id, Name, CreationDate, LoanApplicationStatusId)
Positions(Id, Name, CreationDate, LoanApplicationId)
I need to find all loan applications that have more than 1 position and update LoanApplicationStatusId to 2.
I wrote this query to get these LoanApplications:
SELECT e.Id, count(Name) FROM LoanApplications e
INNER JOIN Positions d ON e.Id=d.LoanApplicationId
GROUP BY e.Id
HAVING COUNT(Name)>1
But I don't understand how to make an update now.
Can you help me?
The straightforward approach would be a simple correlated subselect:
UPDATE LoanApplications l
SET LoanApplicationStatusId = 2
where (select count(1) from Positions p where p.LoanApplicationId = l.id) > 1
Simply select id of apps which have more than one row, and use it in UPDATE as a condition
UPDATE LoanApplications
JOIN ( SELECT LoanApplicationId
FROM Positions
GROUP BY LoanApplicationId
HAVING COUNT(LoanApplicationId) > 1 ) multi_positional ON id = LoanApplicationId
SET LoanApplicationStatusId = 2
I got this: "Unsafe query: 'Update' statement without 'where' updates all table rows at once" – Eugene Sukh
Convert this query to
UPDATE LoanApplications
JOIN ( SELECT LoanApplicationId
FROM Positions
GROUP BY LoanApplicationId
HAVING COUNT(LoanApplicationId) > 1 ) multi_positional
SET LoanApplicationStatusId = 2
WHERE LoanApplications.id = multi_positional.LoanApplicationId
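The same update can also be phrased without a join, as `WHERE Id IN (subquery)`. The sketch below is not from the original answers. It assumes SQLite through Python's `sqlite3` module (which has no `UPDATE ... JOIN` syntax), and the three applications and their positions are made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LoanApplications (
    Id INTEGER PRIMARY KEY, Name TEXT, LoanApplicationStatusId INTEGER);
CREATE TABLE Positions (
    Id INTEGER PRIMARY KEY, Name TEXT, LoanApplicationId INTEGER);
-- Hypothetical data: app 1 has two positions, app 2 has one, app 3 has none.
INSERT INTO LoanApplications VALUES (1, 'app1', 1), (2, 'app2', 1), (3, 'app3', 1);
INSERT INTO Positions VALUES (1, 'p1', 1), (2, 'p2', 1), (3, 'p3', 2);
""")

# Update only the applications whose Id appears more than once in Positions.
conn.execute("""
UPDATE LoanApplications
SET LoanApplicationStatusId = 2
WHERE Id IN (SELECT LoanApplicationId
             FROM Positions
             GROUP BY LoanApplicationId
             HAVING COUNT(*) > 1)
""")

statuses = dict(conn.execute(
    "SELECT Id, LoanApplicationStatusId FROM LoanApplications ORDER BY Id"))
print(statuses)
```

Only application 1 ends up with status 2; the others keep their original status.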

repeated rows in json_agg() in query with 2 lateral joins

I have a strange result when performing a lateral join on a query
I have the following table structure
task->id
comment -> id , taskId, comment
tasklink -> taskId, type, userid
with a single task record (id 10), 1 comment record ("row1", "a test comment") and 5 tasklink records (all with taskid 10)
I expected this query
select task.id,
json_agg(json_build_object('id',c.id, 'user',c.comment)) as comments,
json_agg(json_build_object('type',b.type, 'user',b.userid)) as users
FROM task
left join lateral (select c.* from comment c where task.id = c.taskid) c on true
left join lateral (select b.* from taskuserlink b where task.id = b.taskid) b on true
where task.id = 10
GROUP BY task.id ;
to return
id | comments | users
---------------------------------------------------------------------
10 "[{"id":"row1","user":"a test comment"}]" "[{"type":"updatedBy","user":1},{"type":"closedBy","user":5},{"type":"updatedBy","user":5},{"type":"createdBy","user":5},{"type":"ownedBy","user":5}]"
instead, I got this
id | comments | users
10 "[{"id":"row1","user":"a test comment"},{"id":"row1","user":"a test comment"},{"id":"row1","user":"a test comment"},{"id":"row1","user":"a test comment"},{"id":"row1","user":"a test comment"}]" "[{"type":"updatedBy","user":1},{"type":"closedBy","user":5},{"type":"updatedBy","user":5},{"type":"createdBy","user":5},{"type":"ownedBy","user":5}]"
i.e., for every link row, the comment row is duplicated.
I think I am missing something really obvious, but as I have only just started using Postgres (and SQL), I'm a little stumped.
I would appreciate some guidance on where I'm going wrong.
Move the aggregates into subqueries:
select id, comments, users
from task t
left join lateral (
select json_agg(json_build_object('id',c.id, 'user',c.comment)) as comments
from comment c
where t.id = c.taskid
) c on true
left join lateral (
select json_agg(json_build_object('type',b.type, 'user',b.userid)) as users
from taskuserlink b
where t.id = b.taskid
) b on true
DbFiddle.
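The duplication comes from the joins themselves: one comment row combined with five link rows gives five joined rows, and each aggregate then sees all five. The sketch below is not part of the original answer. It assumes SQLite through Python's `sqlite3` module instead of Postgres, replaces `json_agg` with plain `count` so that it runs on any SQLite build, and uses the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task (id INTEGER PRIMARY KEY);
CREATE TABLE comment (id TEXT, taskid INTEGER, comment TEXT);
CREATE TABLE taskuserlink (taskid INTEGER, type TEXT, userid INTEGER);
INSERT INTO task VALUES (10);
INSERT INTO comment VALUES ('row1', 10, 'a test comment');
INSERT INTO taskuserlink VALUES
  (10, 'updatedBy', 1), (10, 'closedBy', 5), (10, 'updatedBy', 5),
  (10, 'createdBy', 5), (10, 'ownedBy', 5);
""")

# Joining both child tables first: 1 comment x 5 links = 5 joined rows,
# so an aggregate over the comment columns sees the comment 5 times.
joined = conn.execute("""
SELECT count(c.id), count(b.type)
FROM task t
LEFT JOIN comment c ON c.taskid = t.id
LEFT JOIN taskuserlink b ON b.taskid = t.id
WHERE t.id = 10
""").fetchone()

# Aggregating each child table in its own subquery avoids the multiplication.
separate = conn.execute("""
SELECT (SELECT count(*) FROM comment c WHERE c.taskid = t.id),
       (SELECT count(*) FROM taskuserlink b WHERE b.taskid = t.id)
FROM task t
WHERE t.id = 10
""").fetchone()

print(joined, separate)
```

The first query counts the single comment five times, while the second reports one comment and five links.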

correct sql query to simplify relationship (mysql)

I have the following table:
table A
id emp emp_dst
1 a b
2 a d
3 b c
4 b a
5 c d
6 d a
7 d b
8 d c
My SQL query should return the following simplified table, since the pair (a, b) is equivalent to the pair (b, a):
table B
emp emp_dst
a b
a d
b c
d b
d c
But I have no idea how to do this in an SQL query in MySQL.
I tried rewriting the query with UNION, but the results are wrong.
An alternative that suits my personal preferences better...
(Based on your comment that the id in the results was not relevant.)
SELECT
CASE WHEN emp <= emp_dst THEN emp ELSE emp_dst END AS emp,
CASE WHEN emp <= emp_dst THEN emp_dst ELSE emp END AS emp_dst
FROM
yourTable
GROUP BY
1, 2
ORDER BY
1, 2
If you want an id, then you can add MIN(id). Just note that the id found may actually have the two values the other way around.
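To see the normalisation at work on the question's data, here is a small sketch that is not from the original answer. It assumes SQLite through Python's `sqlite3` module, keeps the placeholder table name `yourTable`, and swaps `GROUP BY 1, 2` for `SELECT DISTINCT`, which gives the same deduplicated pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (id INTEGER PRIMARY KEY, emp TEXT, emp_dst TEXT);
INSERT INTO yourTable VALUES
  (1, 'a', 'b'), (2, 'a', 'd'), (3, 'b', 'c'), (4, 'b', 'a'),
  (5, 'c', 'd'), (6, 'd', 'a'), (7, 'd', 'b'), (8, 'd', 'c');
""")

# Put the smaller value first in every pair, then drop duplicates.
pairs = conn.execute("""
SELECT DISTINCT
  CASE WHEN emp <= emp_dst THEN emp ELSE emp_dst END AS emp,
  CASE WHEN emp <= emp_dst THEN emp_dst ELSE emp END AS emp_dst
FROM yourTable
ORDER BY 1, 2
""").fetchall()

for pair in pairs:
    print(pair)
```

Five pairs remain, each written with the smaller value first, so `d b` and `d c` from the expected table show up as `b d` and `c d`.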
An alternative that uses a LEFT JOIN rather than GROUP BY.
SELECT
yourTable.*
FROM
yourTable
LEFT JOIN
yourTable AS reflection
ON reflection.emp_dst = yourTable.emp
AND reflection.emp = yourTable.emp_dst
AND reflection.id <> yourTable.id
WHERE
(reflection.id IS NULL)
OR (yourTable.emp < reflection.emp_dst)
OR (yourTable.emp = reflection.emp_dst AND yourTable.id < reflection.id)
ORDER BY
yourTable.emp,
yourTable.emp_dst
(The last OR is only needed if the table can contain a self-pair such as 'a', 'a' that appears twice.)
Note: This may benefit from having two indexes...
CREATE INDEX yourTable_e_ed_id ON yourTable( emp, emp_dst, id );
CREATE INDEX yourTable_ed_e_id ON yourTable( emp_dst, emp, id );

SELECT group by twice

I'm not strong in databases at all and I need your help.
I need an SQL query that groups by two columns.
Example of my data in table
<table border="1" style="border-collapse:collapse">
<tr><th>id</th><th>market_id</th><th>price</th><th>low</th><th>high</th><th>symbol</th><th>created_at</th></tr>
<tr><td>1</td><td>1</td><td>5773.8</td><td>5685</td><td>6020</td><td>btcusd</td><td>2017-10-27 16:46:10</td></tr>
<tr><td>2</td><td>1</td><td>0.4274</td><td>0.39</td><td>0.43983</td><td>iotusd</td><td>2017-10-27 16:46:11</td></tr>
<tr><td>3</td><td>1</td><td>0.20026</td><td>0.1986</td><td>0.20352</td><td>xrpusd</td><td>2017-10-27 16:46:12</td></tr>
<tr><td>4</td><td>2</td><td>5771</td><td>5685</td><td>6020</td><td>btcusd</td><td>2017-10-27 16:46:18</td></tr>
<tr><td>5</td><td>2</td><td>0.4274</td><td>0.39</td><td>0.43983</td><td>iotusd</td><td>2017-10-27 16:46:18</td></tr>
<tr><td>6</td><td>2</td><td>0.20026</td><td>0.1986</td><td>0.20352</td><td>xrpusd</td><td>2017-10-27 16:46:19</td></tr>
<tr><td>7</td><td>1</td><td>5773.1</td><td>5685</td><td>6020</td><td>btcusd</td><td>2017-10-27 16:46:25</td></tr>
<tr><td>8</td><td>1</td><td>0.4274</td><td>0.39</td><td>0.43983</td><td>iotusd</td><td>2017-10-27 16:46:25</td></tr>
<tr><td>9</td><td>1</td><td>0.20026</td><td>0.1986</td><td>0.20352</td><td>xrpusd</td><td>2017-10-27 16:46:26</td></tr>
<tr><td>10</td><td>2</td><td>5773.1</td><td>5685</td><td>6020</td><td>btcusd</td><td>2017-10-27 16:46:32</td></tr>
<tr><td>11</td><td>2</td><td>0.42741</td><td>0.39</td><td>0.43983</td><td>iotusd</td><td>2017-10-27 16:46:32</td></tr>
<tr><td>12</td><td>2</td><td>0.20026</td><td>0.1986</td><td>0.20352</td><td>xrpusd</td><td>2017-10-27 16:46:33</td></tr></table>
I would like to get the latest data for every market_id and symbol.
That means I need something like this in the end:
- id market_id symbol
- 7 1 btcusd
- 8 1 iotusd
- 9 1 xrpusd
- 10 2 btcusd
- 11 2 iotusd
- 12 2 xrpusd
Really need help, a little bit blocked.
You are almost there. Try this
SELECT c.*
FROM CRYPTO AS c
JOIN (
SELECT market_id, symbol, MAX(id) AS maxid
FROM CRYPTO
GROUP BY market_id, symbol
) AS c2
ON c2.maxid = c.id AND c.market_id = c2.market_id AND c.symbol = c2.symbol
Along these lines...
SELECT MAX(id), market_id, symbol
FROM crypto
GROUP BY market_id, symbol
Here's my comment stated as SQL.
SELECT A.id, A.market_id, A.symbol, A.price, A.low, A.high
FROM CRYPTO A
INNER JOIN (SELECT MAX(created_at) AS MCA, market_id, symbol
FROM CRYPTO
GROUP BY market_id, symbol) B
ON A.created_at = B.MCA
AND A.market_id = B.market_id
AND A.symbol = B.symbol
What this does:
The derived table (aliased B) generates one row for each market_id and symbol, holding the max created_at time. Joining this derived table back to the base table (aliased A) limits the data to just the rows having that max created_at. This lets us show the whole record from A for each unique market_id and symbol, but only for records having the max created_at.
Other engines would allow you to use a cross apply or an analytic to obtain the desired results.
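The "analytic" route mentioned above can be sketched with `ROW_NUMBER()`. This is an illustration rather than part of the original answer. It assumes an engine with window functions (SQLite 3.25 or later here, through Python's `sqlite3` module; MySQL has supported them since 8.0) and loads a trimmed version of the question's table with only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crypto (
    id INTEGER PRIMARY KEY, market_id INTEGER, symbol TEXT, created_at TEXT);
INSERT INTO crypto VALUES
  (1,  1, 'btcusd', '2017-10-27 16:46:10'),
  (2,  1, 'iotusd', '2017-10-27 16:46:11'),
  (3,  1, 'xrpusd', '2017-10-27 16:46:12'),
  (4,  2, 'btcusd', '2017-10-27 16:46:18'),
  (5,  2, 'iotusd', '2017-10-27 16:46:18'),
  (6,  2, 'xrpusd', '2017-10-27 16:46:19'),
  (7,  1, 'btcusd', '2017-10-27 16:46:25'),
  (8,  1, 'iotusd', '2017-10-27 16:46:25'),
  (9,  1, 'xrpusd', '2017-10-27 16:46:26'),
  (10, 2, 'btcusd', '2017-10-27 16:46:32'),
  (11, 2, 'iotusd', '2017-10-27 16:46:32'),
  (12, 2, 'xrpusd', '2017-10-27 16:46:33');
""")

# Number the rows of each (market_id, symbol) pair from newest to oldest,
# then keep only the newest one (rn = 1). id breaks ties on created_at.
latest = conn.execute("""
SELECT id, market_id, symbol
FROM (SELECT c.*,
             ROW_NUMBER() OVER (PARTITION BY market_id, symbol
                                ORDER BY created_at DESC, id DESC) AS rn
      FROM crypto c) ranked
WHERE rn = 1
ORDER BY id
""").fetchall()

for row in latest:
    print(row)
```

On the sample data this keeps ids 7 through 12, the same rows the question asks for.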
I tried these queries:
SELECT * FROM CRYPTO as C3
JOIN (
SELECT MAX(id) as max
FROM CRYPTO as C1
GROUP BY symbol
) AS C2
ON C2.max = C3.id
SELECT M.id, M.name, R.symbol FROM MARKET AS M
JOIN (
SELECT DISTINCT C.symbol, C.market_id
FROM CRYPTO as C
) as R
ON M.id = R.market_id
But in the end I could not find the right combination.