How to select columns with subset of another column in mysql? - mysql

I have a complicated version of this problem (How to select columns with same set of values in mysql?) to deal with.
In a relation R(A, B, C), the problem is to find pairs of A's that have 4 or more B's in common. FYI: (A, B) is a candidate key.
All I was able to come up with is this:
Query:
select * from
(select A, group_concat(B separator ', ') all_bs from R group by A having
(count(B))>3) p1
join
(select A, group_concat(B separator ', ') all_bs from R group by A having
(count(B))>3) p2
on p1.all_bs = p2.all_bs and p1.A <> p2.A;
Output:
Null Set
But, the answer is supposed to be something else.
Any idea how to deal with this?
Sample Data:
A B C
a1 b1 asdas
a1 b2 sdvsd
a1 b3 sdfs
a1 b4 evevr
a2 b1 jdjd
a2 b2 dkjlfnv
a2 b3 sdfs
a2 b4 evevr
a2 b5 adfgaf
a3 b1 sdfsdf
Expected Output
A A count
a1 a2 4

It should be something like this:
SELECT
first.A AS first_A,
second.A AS second_A,
COUNT(*) AS countSameBs
FROM
R first
JOIN
R second ON
first.B = second.B AND
first.A < second.A -- keeps each pair once and also excludes first.A = second.A
GROUP BY
first.A, second.A
HAVING
COUNT(*) >= 4
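The self-join idea above can be checked against the sample data from the question. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere), and the aliases r1, r2 and common_bs are illustrative names.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B TEXT, C TEXT, PRIMARY KEY (A, B))")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [
        ("a1", "b1", "asdas"), ("a1", "b2", "sdvsd"), ("a1", "b3", "sdfs"),
        ("a1", "b4", "evevr"), ("a2", "b1", "jdjd"), ("a2", "b2", "dkjlfnv"),
        ("a2", "b3", "sdfs"), ("a2", "b4", "evevr"), ("a2", "b5", "adfgaf"),
        ("a3", "b1", "sdfsdf"),
    ],
)

# Join R to itself on B; r1.A < r2.A lists each pair of A's exactly once.
# COUNT(*) is then the number of B values the two A's share.
pairs = conn.execute(
    """
    SELECT r1.A, r2.A, COUNT(*) AS common_bs
    FROM R AS r1
    JOIN R AS r2 ON r1.B = r2.B AND r1.A < r2.A
    GROUP BY r1.A, r2.A
    HAVING COUNT(*) >= 4
    """
).fetchall()

print(pairs)  # [('a1', 'a2', 4)]
```

Because (A, B) is a candidate key, each shared B contributes exactly one joined row per pair, so COUNT(*) is the number of common B's.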

Related

Remove rows that do not differ from the previous row in MySQL

Suppose I have a table that records changes to my database over time:
TimeOfChange FieldA FieldB FieldC
-------------------------------------
2019-01-01 A1 B1 C1 /*(R1)*/
2019-01-02 A2 B2 C1 /*(R2)*/
2019-01-03 A2 B2 C1 /*(R3)*/
2019-01-05 A1 B1 C2 /*(R4)*/
2019-01-07 A1 B1 C1 /*(R5)*/
My database has many rows where nothing significant changed, e.g. row (R3) is the same as (R2).
I would like to remove these rows. I have found many references on how to use a common table expression to remove duplicate rows from the table. So it's possible to remove the duplicate (ignoring the TimeOfChange column) rows. But this will remove (R5) as well because it is the same as R1. I only want to remove the rows that have the same ABC values as the previous row, when ordered by the TimeOfChange column. How do I do that?
edit: You can assume that TimeOfChange values are all unique
Assuming the TimeOfChange is unique, you can do:
delete
from data
where TimeOfChange in (
select TimeOfChange
from (
select d2.TimeOfChange
from data d1
join data d2
where d2.TimeOfChange in (
select min(x.TimeOfChange)
from data x
where x.TimeOfChange>d1.TimeOfChange
) and d1.FieldA=d2.FieldA and d1.FieldB=d2.FieldB and d1.FieldC=d2.FieldC
) as q
);
So you first determine which row is the "next" one for each row, and then check whether that "next" row has the same values as the "current" one. The matching "next" rows form the result set that you use in the DELETE. The outer select TimeOfChange from (...) as q wrapper is there to circumvent MySQL's restriction on reusing the DELETE's target table in a subquery.
You will probably get much better performance if you move the logic into a stored procedure and store the ids of the rows to be deleted in a temporary table.
See DB Fiddle
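The "compare each row with its next row" logic can be tried on the question's five rows. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, and SQLite does not need the extra derived-table wrapper, so that wrapper is left out here.

```python
import sqlite3

# SQLite stand-in for MySQL (assumption); table and column names follow the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (TimeOfChange TEXT PRIMARY KEY, FieldA TEXT, FieldB TEXT, FieldC TEXT)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("2019-01-01", "A1", "B1", "C1"),  # R1
        ("2019-01-02", "A2", "B2", "C1"),  # R2
        ("2019-01-03", "A2", "B2", "C1"),  # R3, same values as R2
        ("2019-01-05", "A1", "B1", "C2"),  # R4
        ("2019-01-07", "A1", "B1", "C1"),  # R5, same as R1 but not adjacent to it
    ],
)

# d2 is the row immediately after d1; delete d2 when its fields equal d1's.
conn.execute(
    """
    DELETE FROM data
    WHERE TimeOfChange IN (
        SELECT d2.TimeOfChange
        FROM data AS d1
        JOIN data AS d2
          ON d2.TimeOfChange = (
                SELECT MIN(x.TimeOfChange)
                FROM data AS x
                WHERE x.TimeOfChange > d1.TimeOfChange
             )
         AND d1.FieldA = d2.FieldA
         AND d1.FieldB = d2.FieldB
         AND d1.FieldC = d2.FieldC
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT TimeOfChange FROM data ORDER BY TimeOfChange")]
print(remaining)  # ['2019-01-01', '2019-01-02', '2019-01-05', '2019-01-07']
```

Only R3 is removed; R5 survives because its immediate predecessor R4 differs in FieldC.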
Presuming you really meant "when the same A, B, C occurred on the most recent prior day that had any data", this should be usable to identify the rows that need to be removed:
SELECT t2.TimeOfChange, t2.FieldA, t2.FieldB, t2.FieldC
FROM (
SELECT tMain.TimeOfChange, tMain.FieldA, tMain.FieldB, tMain.FieldC
, MAX(tPrev.TimeOfChange) AS prevTimeOfChange
FROM t AS tMain
LEFT JOIN t AS tPrev ON tMain.TimeOfChange > tPrev.TimeOfChange
GROUP BY tMain.TimeOfChange, tMain.FieldA, tMain.FieldB, tMain.FieldC
) AS t2
INNER JOIN t AS tPrev2
ON t2.prevTimeOfChange = tPrev2.TimeOfChange
AND t2.FieldA = tPrev2.FieldA
AND t2.FieldB = tPrev2.FieldB
AND t2.FieldC = tPrev2.FieldC
This can then be used in a DELETE with some indirection to force a temp table to be created.
DELETE td
FROM t AS td
WHERE (td.TimeOfChange, td.FieldA, td.FieldB, td.FieldC)
IN (SELECT * FROM ([the query above]) AS tt) -- Yes, you have to wrap the query from above in a select * so mysql will not reject it.
;
However, after getting this far, what happens when....
2019-01-01 A1 B1 C1
2019-01-02 A2 B2 C1
2019-01-03 A2 B2 C1
2019-01-04 A1 B1 C2
2019-01-05 A1 B1 C3
2019-01-05 A1 B1 C1
2019-01-06 A1 B1 C3
2019-01-07 A1 B1 C1
becomes
2019-01-01 A1 B1 C1
2019-01-02 A2 B2 C1
2019-01-04 A1 B1 C2
2019-01-05 A1 B1 C3
2019-01-05 A1 B1 C1
2019-01-07 A1 B1 C1
Does a second pass now need to be made to remove the 2019-01-07 entry?
Are you going to run the query repeatedly until no rows are affected?
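To make the multi-pass question concrete, here is a sketch of the "repeat until no rows are affected" option, run on the eight rows listed above. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and needs a SQLite build with row-value support; the names DELETE_PASS and deleted_per_pass are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (TimeOfChange TEXT, FieldA TEXT, FieldB TEXT, FieldC TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2019-01-01", "A1", "B1", "C1"),
        ("2019-01-02", "A2", "B2", "C1"),
        ("2019-01-03", "A2", "B2", "C1"),
        ("2019-01-04", "A1", "B1", "C2"),
        ("2019-01-05", "A1", "B1", "C3"),
        ("2019-01-05", "A1", "B1", "C1"),
        ("2019-01-06", "A1", "B1", "C3"),
        ("2019-01-07", "A1", "B1", "C1"),
    ],
)

# One pass: delete rows whose values also occur at the most recent earlier timestamp.
DELETE_PASS = """
DELETE FROM t
WHERE (TimeOfChange, FieldA, FieldB, FieldC) IN (
    SELECT t2.TimeOfChange, t2.FieldA, t2.FieldB, t2.FieldC
    FROM (
        SELECT tMain.TimeOfChange, tMain.FieldA, tMain.FieldB, tMain.FieldC,
               MAX(tPrev.TimeOfChange) AS prevTimeOfChange
        FROM t AS tMain
        LEFT JOIN t AS tPrev ON tMain.TimeOfChange > tPrev.TimeOfChange
        GROUP BY tMain.TimeOfChange, tMain.FieldA, tMain.FieldB, tMain.FieldC
    ) AS t2
    INNER JOIN t AS tPrev2
        ON t2.prevTimeOfChange = tPrev2.TimeOfChange
       AND t2.FieldA = tPrev2.FieldA
       AND t2.FieldB = tPrev2.FieldB
       AND t2.FieldC = tPrev2.FieldC
)
"""

# Repeat the pass until it no longer deletes anything.
deleted_per_pass = []
while True:
    deleted = conn.execute(DELETE_PASS).rowcount
    if deleted == 0:
        break
    deleted_per_pass.append(deleted)

remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(deleted_per_pass, remaining)  # [2, 1] 5
```

The first pass removes the 2019-01-03 and 2019-01-06 rows, and the second pass then removes 2019-01-07, which is exactly the cascading effect the questions above point at. Whether that cascade is wanted depends on what "previous row" should mean for your data.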

Delete duplicate elements in SQL

How to delete duplicate elements in SQL?
That means that in each column, each element should occur only once.
For example, I have a table like:
NAME1 NAME2 NAME3 NAME4
A1 A2 A3 A4
A1 B2 A3 A4
A1 C2 C3 B4
B1 C2 B3 C4
C1 B2 A3 B4
There are so many duplicate elements in each column and they are placed randomly.
I should convert it to the table like below:
NAME1 NAME2 NAME3 NAME4
A1 A2 A3 A4
B1 B2 B3 B4
C1 C2 C3 C4
Well, I finally found out a solution to my problem.
Selecting the distinct names of each column as derived tables, adding a row number to each, and then inner joining them on that row number works.
However, this only works because the number of distinct names is the same in every column. I am still trying to find out how to solve the problem when the numbers of distinct names differ between columns.
set @r1 = 0, @r2 = 0, @r3 = 0, @r4 = 0;
select A.n1, B.n2, C.n3, D.n4 from
(select *,
case when n1 is not null then (@r1:=@r1+1) end as Rownumber
from(
select distinct NAME1 n1
from MYTABLE)Tn1)A
inner join
(select *,
case when n2 is not null then (@r2:=@r2+1) end as Rownumber
from(
select distinct NAME2 n2
from MYTABLE)Tn2)B
on A.Rownumber = B.Rownumber
inner join
(select *,
case when n3 is not null then (@r3:=@r3+1) end as Rownumber
from(
select distinct NAME3 n3
from MYTABLE)Tn3)C
on A.Rownumber = C.Rownumber
inner join
(select *,
case when n4 is not null then (@r4:=@r4+1) end as Rownumber
from(
select distinct NAME4 n4
from MYTABLE)Tn4)D
on A.Rownumber = D.Rownumber;
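The same "number the distinct values of each column, then join on that number" idea can also be sketched with ROW_NUMBER() instead of user variables. This is a swapped-in technique, not the query above: it assumes window-function and CTE support (MySQL 8.0+, or SQLite 3.25+ as used here through Python's sqlite3), and the CTE names n1 to n4 are illustrative. Like the original, it relies on every column having the same number of distinct values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (NAME1 TEXT, NAME2 TEXT, NAME3 TEXT, NAME4 TEXT)")
conn.executemany(
    "INSERT INTO MYTABLE VALUES (?, ?, ?, ?)",
    [
        ("A1", "A2", "A3", "A4"),
        ("A1", "B2", "A3", "A4"),
        ("A1", "C2", "C3", "B4"),
        ("B1", "C2", "B3", "C4"),
        ("C1", "B2", "A3", "B4"),
    ],
)

# Each CTE numbers the distinct values of one column in sorted order;
# the final SELECT lines the four numbered lists up side by side.
result = conn.execute(
    """
    WITH
    n1 AS (SELECT NAME1 AS v, ROW_NUMBER() OVER (ORDER BY NAME1) AS rn
           FROM (SELECT DISTINCT NAME1 FROM MYTABLE) AS d1),
    n2 AS (SELECT NAME2 AS v, ROW_NUMBER() OVER (ORDER BY NAME2) AS rn
           FROM (SELECT DISTINCT NAME2 FROM MYTABLE) AS d2),
    n3 AS (SELECT NAME3 AS v, ROW_NUMBER() OVER (ORDER BY NAME3) AS rn
           FROM (SELECT DISTINCT NAME3 FROM MYTABLE) AS d3),
    n4 AS (SELECT NAME4 AS v, ROW_NUMBER() OVER (ORDER BY NAME4) AS rn
           FROM (SELECT DISTINCT NAME4 FROM MYTABLE) AS d4)
    SELECT n1.v, n2.v, n3.v, n4.v
    FROM n1
    JOIN n2 ON n2.rn = n1.rn
    JOIN n3 ON n3.rn = n1.rn
    JOIN n4 ON n4.rn = n1.rn
    ORDER BY n1.rn
    """
).fetchall()

for row in result:
    print(row)
# ('A1', 'A2', 'A3', 'A4')
# ('B1', 'B2', 'B3', 'B4')
# ('C1', 'C2', 'C3', 'C4')
```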

Find average of rows over name; display all rows

I have a table with the below values
name symbol current value
a a1 1
a a2 2
a a3 4
a a4 3
a a5 5
b b1 6
b b2 7
b b3 8
c c1 1
c c2 2
c c3 3
c c4 3
c c5 5
d d1 6
d d2 6
Required: find the average of the current value grouped by name, yet show all rows, i.e. the result should be like below:
name symbol current value Required
a a1 1 =current value/(sum of all 'a' current values)
a a2 2 =current value/(sum of all 'a' current values)
a a3 4 =current value/(sum of all 'a' current values)
a a4 3 =current value/(sum of all 'a' current values)
a a5 5 =current value/(sum of all 'a' current values)
b b1 6 = current value /(sum of all 'b' current values)
b b2 7 = current value /(sum of all 'b' current values)
b b3 8 = current value /(sum of all 'b' current values)
Similarly for all names
Join to a subquery which finds the per-name sums:
SELECT t1.*,
CASE WHEN t2.total > 0 THEN t1.current_value / t2.total ELSE 0 END AS share
FROM yourTable t1
INNER JOIN
(
SELECT name, SUM(current_value) AS total
FROM yourTable
GROUP BY name
) t2
ON t1.name = t2.name;
The CASE expression is necessary to protect against a possible divide by zero, which could happen if a given name happens to have all zero current values. In that case, I default the result to zero.
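The join-to-an-aggregate approach can be tried on the question's data. This is a sketch only: SQLite via Python's sqlite3 stands in for the asker's database, the column is named current_value as in the answer, it is declared REAL so the division is not integer division in SQLite, and total and share are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (name TEXT, symbol TEXT, current_value REAL)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("a", "a1", 1), ("a", "a2", 2), ("a", "a3", 4), ("a", "a4", 3), ("a", "a5", 5),
        ("b", "b1", 6), ("b", "b2", 7), ("b", "b3", 8),
        ("c", "c1", 1), ("c", "c2", 2), ("c", "c3", 3), ("c", "c4", 3), ("c", "c5", 5),
        ("d", "d1", 6), ("d", "d2", 6),
    ],
)

# Every row is kept; the joined subquery supplies the per-name sum to divide by.
shares = conn.execute(
    """
    SELECT t1.name, t1.symbol,
           CASE WHEN t2.total > 0 THEN t1.current_value / t2.total ELSE 0 END AS share
    FROM yourTable AS t1
    JOIN (SELECT name, SUM(current_value) AS total
          FROM yourTable
          GROUP BY name) AS t2
      ON t1.name = t2.name
    ORDER BY t1.name, t1.symbol
    """
).fetchall()

share_by_symbol = {symbol: share for _, symbol, share in shares}
print(share_by_symbol["d1"])  # 0.5, since the 'd' values sum to 12
```

All 15 input rows come back, each with its value divided by the sum for its name.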
try this logic :)
SELECT
*
FROM
table_listing_sample
WHERE
NAME = 'a'
OR NAME = 'b'
ORDER BY
id

Sql Query on Union on specific column

I have two tables:
T1: Schema(Bucket_Id,B_Id);
T2: Schema(B_Id,V_Id);
Relations:
R1: Bucket_Id->B_Id (one to many)
R2: B_Id->V_Id (one to many)
I want to have all the B_Id OR V_Id corresponding to a given Bucket_Id.
Can someone help me with this.
Thanks
Example:
Table T1
Bucket_Id B_Id
b1 B1
b1 B2
b2 B3
b2 B4
Table T2
B_Id V_Id
B1 V1
B1 V2
B3 V3
B3 V4
Expected Output
b1 B1
b1 V1
b1 V2
b1 B2
b2 B3
b2 V3
b2 V4
b2 B4
Try this
SELECT * FROM T1
UNION ALL
SELECT * FROM T2
Try this.
select * from T1
union all
select T1.Bucket_id ,V_Id
from T1 join T2 on T1.B_Id = T2.B_Id
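The second query can be checked against the example tables. This is a sketch assuming SQLite (through Python's sqlite3) as a stand-in for the asker's database; the output column alias member is an illustrative name, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (Bucket_Id TEXT, B_Id TEXT)")
conn.execute("CREATE TABLE T2 (B_Id TEXT, V_Id TEXT)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)",
    [("b1", "B1"), ("b1", "B2"), ("b2", "B3"), ("b2", "B4")],
)
conn.executemany(
    "INSERT INTO T2 VALUES (?, ?)",
    [("B1", "V1"), ("B1", "V2"), ("B3", "V3"), ("B3", "V4")],
)

# First branch: the B_Ids of each bucket.
# Second branch: the V_Ids reachable from each bucket through its B_Ids.
rows = conn.execute(
    """
    SELECT Bucket_Id, B_Id AS member FROM T1
    UNION ALL
    SELECT T1.Bucket_Id, T2.V_Id
    FROM T1 JOIN T2 ON T1.B_Id = T2.B_Id
    ORDER BY Bucket_Id, member
    """
).fetchall()

for row in rows:
    print(row)
```

This returns the same eight (Bucket_Id, id) pairs as the expected output, only in sorted order.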

distinct rows when looking at selected columns

I have a table with 6 columns that lists order lines.
The columns are Order, LineNumber, ProductName, Description, UpdateDate, Location.
What I want to do is look at the values of the first 4 columns (Order, LineNumber, ProductName, Description).
If the rows are not identical in these four columns, I want to return
Order, LineNumber, ProductName, Description and the update dates.
If they are identical, I want to
return just one of the rows (first or last).
Order LineNumber ProductName description UpdateDate Location
Order1 1 a1 b1 d1 n
Order1 1 a1 b1 d2 m
Order1 1 a3 b3 d5 L
Order2 1 a1 b1 d3 o
Order2 2 a2 b2 d4 m
I want the result to be:
Order LineNumber ProductName description UpdateDate Location
Order1 1 a1 b1 d1 n
Order1 1 a3 b3 d5 L
Order2 1 a1 b1 d3 o
Order2 2 a2 b2 d4 m
For Order1:
Line 1 is repeated 3 times.
In 2 of those 3 rows, ProductName a1 and description b1 are identical, so only one of these two rows will be returned.
In the remaining row, ProductName a3 and description b3 are unique, so this row will be returned as well.
For Order2:
All lines are unique in ProductName and description, so both lines will be returned.
Any help appreciated.
You can use a window function:
SELECT *
FROM
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Order], LineNumber, ProductName, [DESCRIPTION] ORDER BY UpdateDate) AS RowNum
FROM YourTable) DerivedTable
WHERE RowNum = 1
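The ROW_NUMBER() approach can be tried on the question's rows. This is a sketch assuming SQLite 3.25+ (through Python's sqlite3) as a stand-in for the asker's database; "Order" is double-quoted here instead of using square brackets, and the final ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE YourTable ("Order" TEXT, LineNumber INTEGER, ProductName TEXT, '
    "Description TEXT, UpdateDate TEXT, Location TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Order1", 1, "a1", "b1", "d1", "n"),
        ("Order1", 1, "a1", "b1", "d2", "m"),
        ("Order1", 1, "a3", "b3", "d5", "L"),
        ("Order2", 1, "a1", "b1", "d3", "o"),
        ("Order2", 2, "a2", "b2", "d4", "m"),
    ],
)

# Number the rows inside each (Order, LineNumber, ProductName, Description) group
# by UpdateDate and keep only the first row of every group.
result = conn.execute(
    """
    SELECT "Order", LineNumber, ProductName, Description, UpdateDate, Location
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY "Order", LineNumber, ProductName, Description
                   ORDER BY UpdateDate
               ) AS RowNum
        FROM YourTable
    ) AS DerivedTable
    WHERE RowNum = 1
    ORDER BY "Order", LineNumber, ProductName
    """
).fetchall()

for row in result:
    print(row)
# ('Order1', 1, 'a1', 'b1', 'd1', 'n')
# ('Order1', 1, 'a3', 'b3', 'd5', 'L')
# ('Order2', 1, 'a1', 'b1', 'd3', 'o')
# ('Order2', 2, 'a2', 'b2', 'd4', 'm')
```

Changing ORDER BY UpdateDate to ORDER BY UpdateDate DESC would keep the last row of each group instead of the first.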