Suppose I have a table that records changes to my database over time:
TimeOfChange FieldA FieldB FieldC
-------------------------------------
2019-01-01 A1 B1 C1 /*(R1)*/
2019-01-02 A2 B2 C1 /*(R2)*/
2019-01-03 A2 B2 C1 /*(R3)*/
2019-01-05 A1 B1 C2 /*(R4)*/
2019-01-07 A1 B1 C1 /*(R5)*/
My table has many rows where nothing significant changed, e.g. row (R3) is the same as (R2).
I would like to remove these rows. I have found many references on how to use a common table expression to remove duplicate rows from a table, so it is possible to remove rows that are duplicates when the TimeOfChange column is ignored. But this will remove (R5) as well, because it is the same as (R1). I only want to remove the rows that have the same A, B, C values as the previous row, when ordered by the TimeOfChange column. How do I do that?
Edit: You can assume that TimeOfChange values are all unique.
Assuming the TimeOfChange is unique, you can do:
delete
from data
where TimeOfChange in (
select TimeOfChange
from (
select d2.TimeOfChange
from data d1
join data d2
where d2.TimeOfChange in (
select min(x.TimeOfChange)
from data x
where x.TimeOfChange>d1.TimeOfChange
) and d1.FieldA=d2.FieldA and d1.FieldB=d2.FieldB and d1.FieldC=d2.FieldC
) as q
);
So you first determine which row is the "next" one for each row, and then check whether that "next" row has the same values as the "current" one. The matching "next" rows form the result set that you want to use in the DELETE. The outer select TimeOfChange from (...) as q is there to work around MySQL's restriction on reusing the DELETE's target table in a subquery.
You will probably get much better performance if you move the logic into a stored procedure and store the IDs of the rows to be deleted in a temp table.
See DB Fiddle
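To make the "next row" logic above easy to try out, here is a minimal sketch that runs the same idea against the question's sample data in an in-memory SQLite database via Python. The function name and the SQLite setup are assumptions for illustration only. SQLite does not have MySQL's restriction on the target table, so the extra derived-table wrapper is left out here.

```python
import sqlite3

# Sample rows from the question; ISO dates sort correctly as text.
ROWS = [
    ("2019-01-01", "A1", "B1", "C1"),  # R1
    ("2019-01-02", "A2", "B2", "C1"),  # R2
    ("2019-01-03", "A2", "B2", "C1"),  # R3: same values as R2
    ("2019-01-05", "A1", "B1", "C2"),  # R4
    ("2019-01-07", "A1", "B1", "C1"),  # R5: same as R1, but not adjacent
]


def remove_consecutive_duplicates(rows):
    """Delete rows whose A, B, C values equal those of the previous row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (TimeOfChange TEXT PRIMARY KEY, "
        "FieldA TEXT, FieldB TEXT, FieldC TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
    # d2 is the row that directly follows d1; it is deleted when it
    # carries the same A, B, C values as d1.
    conn.execute(
        """
        DELETE FROM data
        WHERE TimeOfChange IN (
            SELECT d2.TimeOfChange
            FROM data d1
            JOIN data d2
              ON d2.TimeOfChange = (
                     SELECT MIN(x.TimeOfChange)
                     FROM data x
                     WHERE x.TimeOfChange > d1.TimeOfChange
                 )
             AND d1.FieldA = d2.FieldA
             AND d1.FieldB = d2.FieldB
             AND d1.FieldC = d2.FieldC
        )
        """
    )
    remaining = conn.execute(
        "SELECT TimeOfChange FROM data ORDER BY TimeOfChange"
    ).fetchall()
    conn.close()
    return [r[0] for r in remaining]


print(remove_consecutive_duplicates(ROWS))
```

Only (R3) is removed; (R5) stays because its immediate predecessor (R4) differs.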
Presuming you really meant "when the same A, B, C occurred on the most recent prior day that had any data", this should be usable to identify the rows that need to be removed:
SELECT t2.TimeOfChange, t2.FieldA, t2.FieldB, t2.FieldC
FROM (
SELECT tMain.TimeOfChange, tMain.FieldA, tMain.FieldB, tMain.FieldC
, MAX(tPrev.TimeOfChange) AS prevTimeOfChange
FROM t AS tMain
LEFT JOIN t AS tPrev ON tMain.TimeOfChange > tPrev.TimeOfChange
GROUP BY tMain.TimeOfChange, tMain.FieldA, tMain.FieldB, tMain.FieldC
) AS t2
INNER JOIN t AS tPrev2
ON t2.prevTimeOfChange = tPrev2.TimeOfChange
AND t2.FieldA = tPrev2.FieldA
AND t2.FieldB = tPrev2.FieldB
AND t2.FieldC = tPrev2.FieldC
This can then be used in a DELETE with some indirection to force a temp table to be created.
DELETE td
FROM t AS td
WHERE (td.TimeOfChange, td.FieldA, td.FieldB, td.FieldC)
IN (SELECT * FROM ([the query above]) AS tt) -- Yes, you have to wrap the query from above in a select * so mysql will not reject it.
;
However, after getting this far, what happens when....
2019-01-01 A1 B1 C1
2019-01-02 A2 B2 C1
2019-01-03 A2 B2 C1
2019-01-04 A1 B1 C2
2019-01-05 A1 B1 C3
2019-01-05 A1 B1 C1
2019-01-06 A1 B1 C3
2019-01-07 A1 B1 C1
becomes
2019-01-01 A1 B1 C1
2019-01-02 A2 B2 C1
2019-01-04 A1 B1 C2
2019-01-05 A1 B1 C3
2019-01-05 A1 B1 C1
2019-01-07 A1 B1 C1
Does a second pass now need to be made to remove the 2019-01-07 entry?
Are you going to run the query repeatedly until no rows are affected?
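One way to check the second-pass concern is to run the delete twice and look at how many rows each pass removes. The sketch below is an assumption-laden illustration, not part of the answer above: it uses SQLite from Python, relies on the LAG window function (SQLite 3.25+; MySQL 8.0+ has it too), and uses made-up sample rows with unique TimeOfChange values, as the question's edit states. Under that uniqueness assumption a second pass should find nothing, because a row is only removed when it equals its immediate predecessor, so the row after it already differed from both.

```python
import sqlite3

# Hypothetical data with unique timestamps and runs of repeated values.
ROWS = [
    ("2019-01-01", "A1", "B1", "C1"),
    ("2019-01-02", "A2", "B2", "C1"),
    ("2019-01-03", "A2", "B2", "C1"),
    ("2019-01-04", "A2", "B2", "C1"),
    ("2019-01-05", "A1", "B1", "C1"),
    ("2019-01-06", "A1", "B1", "C1"),
]

# Compare each row with the previous row in TimeOfChange order.
# Rows with NULL fields never compare equal and are therefore kept.
DELETE_SQL = """
DELETE FROM data
WHERE TimeOfChange IN (
    SELECT TimeOfChange
    FROM (
        SELECT TimeOfChange, FieldA, FieldB, FieldC,
               LAG(FieldA) OVER (ORDER BY TimeOfChange) AS prevA,
               LAG(FieldB) OVER (ORDER BY TimeOfChange) AS prevB,
               LAG(FieldC) OVER (ORDER BY TimeOfChange) AS prevC
        FROM data
    ) AS q
    WHERE FieldA = prevA AND FieldB = prevB AND FieldC = prevC
)
"""


def run_passes(rows, passes=2):
    """Run the delete several times; return per-pass delete counts and survivors."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (TimeOfChange TEXT PRIMARY KEY, "
        "FieldA TEXT, FieldB TEXT, FieldC TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
    deleted = [conn.execute(DELETE_SQL).rowcount for _ in range(passes)]
    remaining = [
        r[0]
        for r in conn.execute("SELECT TimeOfChange FROM data ORDER BY TimeOfChange")
    ]
    conn.close()
    return deleted, remaining


print(run_passes(ROWS))
```

With duplicate TimeOfChange values, as in the example above, "previous row" is no longer well defined, and that is where repeated passes can start to matter.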
Related
Given this DB table:
A_column B_column
---------------
A1 1
A1 2
A1 2
A2 1
A2 1
A2 1
A3 2
A3 3
A3 4
A3 5
How do I write a SQL SELECT query that prints the number of unique values in B_column for each value in A_column, so that the output looks like this:
A1 2
A2 1
A3 4
I tried this, but it doesn't seem to work properly:
SELECT A_column, count(B_column) FROM table GROUP BY A_column
Use distinct:
SELECT A_column, count(distinct B_column) FROM table GROUP BY A_column
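To see the difference between the two queries, here is a small sketch that runs both against the question's data in an in-memory SQLite database. The table name t and the function name are placeholders chosen for this example.

```python
import sqlite3

# Rows from the question's table.
ROWS = [
    ("A1", 1), ("A1", 2), ("A1", 2),
    ("A2", 1), ("A2", 1), ("A2", 1),
    ("A3", 2), ("A3", 3), ("A3", 4), ("A3", 5),
]


def count_per_group(rows):
    """Return (plain counts, distinct counts) per A_column value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (A_column TEXT, B_column INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # COUNT(B_column) counts every non-NULL value, repeats included.
    plain = conn.execute(
        "SELECT A_column, COUNT(B_column) FROM t "
        "GROUP BY A_column ORDER BY A_column"
    ).fetchall()
    # COUNT(DISTINCT B_column) counts each value once per group.
    distinct = conn.execute(
        "SELECT A_column, COUNT(DISTINCT B_column) FROM t "
        "GROUP BY A_column ORDER BY A_column"
    ).fetchall()
    conn.close()
    return plain, distinct


plain, distinct = count_per_group(ROWS)
print(plain)     # counts all rows per group
print(distinct)  # counts unique B values per group
```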
I have a table with the below values
name symbol current value
a a1 1
a a2 2
a a3 4
a a4 3
a a5 5
b b1 6
b b2 7
b b3 8
c c1 1
c c2 2
c c3 3
c c4 3
c c5 5
d d1 6
d d2 6
Required: to find the ratio of each current value to the total for its name, while still showing all rows, i.e. the result should look like this:
name symbol current value Required
a a1 1 =current value/(sum of all 'a' current values)
a a2 2 =current value/(sum of all 'a' current values)
a a3 4 =current value/(sum of all 'a' current values)
a a4 3 =current value/(sum of all 'a' current values)
a a5 5 =current value/(sum of all 'a' current values)
b b1 6 = current value /(sum of all 'b' current values)
b b2 7 = current value /(sum of all 'b' current values)
b b3 8 = current value /(sum of all 'b' current values)
Similarly for all names
Join to a subquery which finds the per-name sums:
SELECT t1.*,
CASE WHEN t2.total > 0 THEN t1.current_value / t2.total ELSE 0 END AS share
FROM yourTable t1
INNER JOIN
(
SELECT name, SUM(current_value) AS total
FROM yourTable
GROUP BY name
) t2
ON t1.name = t2.name;
The CASE expression is necessary to protect against a possible divide by zero, which could happen if a given name happens to have all zero current values. In that case, I default the ratio to zero.
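As a quick check of the join-to-aggregate approach, the sketch below runs it on the question's data in an in-memory SQLite database. The column names total and share and the function name are assumptions made for this example. One SQLite-specific detail: integer division truncates there, so the value is multiplied by 1.0 first; MySQL's / operator does not need this.

```python
import sqlite3

# (name, symbol, current_value) rows from the question.
ROWS = [
    ("a", "a1", 1), ("a", "a2", 2), ("a", "a3", 4), ("a", "a4", 3), ("a", "a5", 5),
    ("b", "b1", 6), ("b", "b2", 7), ("b", "b3", 8),
    ("c", "c1", 1), ("c", "c2", 2), ("c", "c3", 3), ("c", "c4", 3), ("c", "c5", 5),
    ("d", "d1", 6), ("d", "d2", 6),
]


def shares_by_symbol(rows):
    """Return {symbol: current_value / sum of current_value for its name}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (name TEXT, symbol TEXT, current_value INTEGER)"
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t1.symbol,
               CASE WHEN t2.total > 0
                    THEN t1.current_value * 1.0 / t2.total  -- force float division
                    ELSE 0
               END AS share
        FROM yourTable t1
        INNER JOIN (
            SELECT name, SUM(current_value) AS total
            FROM yourTable
            GROUP BY name
        ) t2 ON t1.name = t2.name
        """
    ).fetchall()
    conn.close()
    return dict(result)


shares = shares_by_symbol(ROWS)
print(shares["d1"], shares["d2"])
```

Every input row appears in the result, and the shares within one name add up to 1.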
try this logic :)
SELECT
*
FROM
table_listing_sample
WHERE
NAME = 'a'
OR NAME = 'b'
ORDER BY
id
I have a table with 6 columns that lists order lines.
The columns are Order, LineNumber, ProductName, Description, UpdateDate, Location.
What I want to do is look at the values in the first 4 columns (Order, LineNumber, ProductName, Description).
If the rows are not identical in these four columns, I want to return
Order, LineNumber, ProductName, Description and the update dates.
If they are identical, I want to
return just one of the rows (first or last).
Order LineNumber ProductName description UpdateDate Location
Order1 1 a1 b1 d1 n
Order1 1 a1 b1 d2 m
Order1 1 a3 b3 d5 L
Order2 1 a1 b1 d3 o
Order2 2 a2 b2 d4 m
I want the result to be:
Order LineNumber ProductName description UpdateDate Location
Order1 1 a1 b1 d1 n
Order1 1 a3 b3 d5 L
Order2 1 a1 b1 d3 o
Order2 2 a2 b2 d4 m
For Order1:
Line 1 is repeated 3 times.
In 2 of the 3 rows, ProductName a1 and description b1 are identical, so only one of these two rows will be returned.
In 1 of the 3 rows, ProductName a3 and description b3 are unique, so this row will be returned as well.
For Order2:
All lines are unique in ProductName and description, so both lines will be returned.
Any help appreciated.
You can use a window function:
SELECT *
FROM
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Order], LineNumber, ProductName, [DESCRIPTION] ORDER BY UpdateDate) AS RowNum
FROM YourTable) DerivedTable
WHERE RowNum = 1
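Here is a small sketch of the same ROW_NUMBER() technique run against the question's sample rows, using an in-memory SQLite database from Python. It assumes SQLite 3.25 or later for window functions; the function name is made up for this example, and the reserved word Order is quoted with double quotes instead of the SQL Server brackets used above.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("Order1", 1, "a1", "b1", "d1", "n"),
    ("Order1", 1, "a1", "b1", "d2", "m"),
    ("Order1", 1, "a3", "b3", "d5", "L"),
    ("Order2", 1, "a1", "b1", "d3", "o"),
    ("Order2", 2, "a2", "b2", "d4", "m"),
]


def first_row_per_group(rows):
    """Keep the earliest row (by UpdateDate) for each combination of the
    first four columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE YourTable ("Order" TEXT, LineNumber INTEGER, '
        "ProductName TEXT, Description TEXT, UpdateDate TEXT, Location TEXT)"
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT "Order", LineNumber, ProductName, Description, UpdateDate, Location
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY "Order", LineNumber, ProductName, Description
                       ORDER BY UpdateDate
                   ) AS RowNum
            FROM YourTable
        ) AS DerivedTable
        WHERE RowNum = 1
        ORDER BY "Order", LineNumber, ProductName
        """
    ).fetchall()
    conn.close()
    return result


for row in first_row_per_group(ROWS):
    print(row)
```

Ordering the window by UpdateDate DESC instead would keep the last row of each group rather than the first.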
I have a complicated version of this problem (How to select columns with same set of values in mysql?) to deal with.
In a relation R(A,B,C), the problem is to figure out "A's with 4 or more common B's". FYI: "AB" is a candidate key.
All I was able to do is this
Query:
select * from
(select A, group_concat(B separator ', ') all_bs from R group by A having
(count(B))>3) p1
join
(select A, group_concat(B separator ', ') all_bs from R group by A having
(count(B))>3) p2
on p1.all_bs = p2.all_bs and p1.A <> p2.A;
Output:
Null Set
But, the answer is supposed to be something else.
Any idea how to deal with this?
Sample Data:
A B C
a1 b1 asdas
a1 b2 sdvsd
a1 b3 sdfs
a1 b4 evevr
a2 b1 jdjd
a2 b2 dkjlfnv
a2 b3 sdfs
a2 b4 evevr
a2 b5 adfgaf
a3 b1 sdfsdf
Expected Output
A A count
a1 a2 4
It should be something like this:
SELECT
first.A AS first_A,
second.A AS second_A,
COUNT(*) AS countSameBs
FROM
R first
JOIN
R second ON
first.B = second.B AND
first.A != second.A
GROUP BY
first_A, second_A
HAVING
countSameBs >= 4 AND
first_A < second_A
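To check the self-join idea against the sample data, here is a minimal sketch using an in-memory SQLite database from Python. The aliases f and s and the function name are assumptions for this example. The pair-ordering condition is placed in the join and the count is repeated in HAVING rather than referenced by alias, which keeps the statement portable beyond MySQL.

```python
import sqlite3

# Sample data from the question: (A, B, C).
ROWS = [
    ("a1", "b1", "asdas"), ("a1", "b2", "sdvsd"), ("a1", "b3", "sdfs"),
    ("a1", "b4", "evevr"),
    ("a2", "b1", "jdjd"), ("a2", "b2", "dkjlfnv"), ("a2", "b3", "sdfs"),
    ("a2", "b4", "evevr"), ("a2", "b5", "adfgaf"),
    ("a3", "b1", "sdfsdf"),
]


def pairs_with_common_bs(rows, minimum=4):
    """Return (A, A, count) for pairs of A values sharing at least `minimum` B values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R (A TEXT, B TEXT, C TEXT, PRIMARY KEY (A, B))")
    conn.executemany("INSERT INTO R VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT f.A, s.A, COUNT(*)
        FROM R AS f
        JOIN R AS s
          ON f.B = s.B      -- same B value on both sides
         AND f.A < s.A      -- each unordered pair only once
        GROUP BY f.A, s.A
        HAVING COUNT(*) >= ?
        """,
        (minimum,),
    ).fetchall()
    conn.close()
    return result


print(pairs_with_common_bs(ROWS))
```

Because (A, B) is a candidate key, each shared B value contributes exactly one joined row per pair, so COUNT(*) is the number of common B's.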
I have data in below structure.
A
A1 A2
B B1
C C1 C2 C3
This information is stored in two tables named group1 and group2.
group1 has the first-level and middle-level data.
group2 has the last-level and middle-level data.
i.e.
group1
group_name group_id
A 1
A1 2
B 3
C 4
C1 5
C2 6
group2
group2_name parent_id
A1 1
A2 2
B 1
B1 3
C 1
C1 4
C2 5
C3 6
Now I want to get the last level of information under group A.
My output should be
group2_name
A2
B1
C3
I can fetch the level 2 information using the query below.
select group2.group_name from group2
inner join
group1 on group1.group_id = group2.parent_id
where group1.group_name = 'A'
How can I get the above output?
Here is SQLFIDDLE Demo
Kindly help me.
You could use this:
select
group2.group_name
from
group2 left join group1
using(group_name)
where
group1.group_name is null
and group2.group_name like 'A%'
This returns all elements of table group2 that are not present in table group1.
Or (depending on how your database is structured) also this:
select
concat(left(group_name,1),
case when max(mid(group_name,2,length(group_name)-1)+0)>0 then
max(mid(group_name,2,length(group_name)-1)+0)
else '' end)
from group2
where group2.group_name like 'A%'
group by left(group_name,1)
Here I am grouping by the first character of the string and taking the maximum of the numeric part.
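The anti-join in the first query above can be tried on the question's data with the sketch below, which uses an in-memory SQLite database from Python. It assumes, as that query does, that the name column is called group_name in both tables; the function name and the optional prefix parameter are made up for this example. The join is written with an explicit ON, which is equivalent to USING (group_name) here.

```python
import sqlite3

GROUP1 = [("A", 1), ("A1", 2), ("B", 3), ("C", 4), ("C1", 5), ("C2", 6)]
GROUP2 = [
    ("A1", 1), ("A2", 2), ("B", 1), ("B1", 3),
    ("C", 1), ("C1", 4), ("C2", 5), ("C3", 6),
]


def leaf_groups(prefix=None):
    """Return group2 names that never appear in group1 (no children listed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group1 (group_name TEXT, group_id INTEGER)")
    conn.execute("CREATE TABLE group2 (group_name TEXT, parent_id INTEGER)")
    conn.executemany("INSERT INTO group1 VALUES (?, ?)", GROUP1)
    conn.executemany("INSERT INTO group2 VALUES (?, ?)", GROUP2)
    sql = """
        SELECT group2.group_name
        FROM group2
        LEFT JOIN group1 ON group1.group_name = group2.group_name
        WHERE group1.group_name IS NULL   -- no match in group1: a leaf
    """
    params = ()
    if prefix is not None:
        # Optional filter corresponding to the LIKE 'A%' condition.
        sql += " AND group2.group_name LIKE ?"
        params = (prefix + "%",)
    sql += " ORDER BY group2.group_name"
    names = [r[0] for r in conn.execute(sql, params)]
    conn.close()
    return names


print(leaf_groups())     # all leaves
print(leaf_groups("A"))  # only leaves whose name starts with A
```

On this data the unfiltered anti-join already yields the three names the question expects, while the LIKE 'A%' filter narrows the result to names starting with A.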