Background:
Given table t1 with fields A, B (and others):
DROP TEMPORARY TABLE IF EXISTS t1;
CREATE TEMPORARY TABLE t1 (ID INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, A varchar(255), B int, C varchar(40));
INSERT INTO t1 (A, B, C)
SELECT 'AA', 11, 100
UNION ALL
SELECT 'BB', 12, 200
UNION ALL
SELECT 'BB', 12, 201
UNION ALL
SELECT 'AA', 12, 300
UNION ALL
SELECT 'AA', 11, 101;
-- ID A B C
-- 1 AA 11 100
-- 2 BB 12 200
-- 3 BB 12 201
-- 4 AA 12 300
-- 5 AA 11 101
GOAL: For a given combination of A and B, examine how many rows there are in t1, and then list all those rows (to understand what is the same and what differs between those rows).
(Eventually, though beyond the scope of this question, I will be writing queries to process some of the older rows that are determined to be "obsolete" (replaced by the most recent row with the given A and B). It's not safe to do so for ALL combinations of A and B at this time. A definitive answer on "which combinations of A and B are safe to delete old versions of" is not available to me: this is a legacy table that has many GBs of external files associated with it, most of which are no longer relevant to anyone. All those files have been backed up; I need to make a conservative proposal as to which files to remove, and explain how I determined those files.)
I've made temp table t2 with all distinct combinations of A and B (plus an ID, and a count of how many rows of each combo):
DROP TEMPORARY TABLE IF EXISTS t2;
CREATE TEMPORARY TABLE t2 (ID INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, A varchar(255), B int)
SELECT COUNT(1) As Cnt, A, B FROM t1
GROUP BY A, B
ORDER BY Cnt DESC;
SELECT * FROM t2;
-- ID Cnt A B
-- 1 2 AA 11
-- 2 2 BB 12
-- 3 1 AA 12
The query that I am having trouble writing:
In the actual data, there are hundreds of rows for some combinations. I am most interested in the combinations that have a high count, so I attempt to dump the rows of t1, based on the first row of t2:
SELECT * FROM t1
WHERE A=
(SELECT A from t2 LIMIT 1 OFFSET 0) AND
B=
(SELECT B from t2 LIMIT 1 OFFSET 0);
This gives error:
Error Code: 1137. Can't reopen table: 't2'
I presume that I should refer to the row I want from t2:
(SELECT A, B from t2 LIMIT 1 OFFSET 0)
And then make a nested query that uses this row twice, in the two places where columns A and B are used. I am stuck on how to write this query. The basic idea in my head is:
SELECT * FROM t1
WHERE A=t3.A AND B=t3.B IN
(SELECT A, B from t2 LIMIT 1 OFFSET 0) AS t3;
(which is not valid SQL)
NOTE: "OFFSET 0" is there because then I will change to other offset values, to examine other A-B combos.
The goal is to see response:
-- ID A B C
-- 1 AA 11 100
-- 5 AA 11 101
Or maybe this can be done with a JOIN, but I'm not sure how to do a JOIN using just one row of t2.
You can do something like this:
SELECT t1.*
FROM t1
JOIN ( SELECT t2.A, t2.B
FROM t2
ORDER BY t2.A, t2.B
LIMIT 1 OFFSET 0
) t3
WHERE t3.A = t1.A
AND t3.B = t1.B
Without an ORDER BY clause, MySQL is free to return the rows in any order, so LIMIT 1 could pick an arbitrary row. Adding the ORDER BY makes the result deterministic.
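Since the question is mostly about the combinations with the highest count, a variant that orders the derived table by Cnt (with ID as a tie-breaker) may track that intent more closely than ordering by A and B. This is only a sketch: it assumes t2 has the ID and Cnt columns shown in the question, and the OFFSET value is the part to change between runs.

```sql
-- Sketch: pick the N-th most frequent (A, B) combination, then list its rows.
-- t2 is referenced only once, which avoids error 1137 on temporary tables.
SELECT t1.*
FROM t1
JOIN ( SELECT t2.A, t2.B
       FROM t2
       ORDER BY t2.Cnt DESC, t2.ID   -- ID breaks ties between equal counts
       LIMIT 1 OFFSET 0              -- change OFFSET to step through combos
     ) AS t3
  ON t3.A = t1.A
 AND t3.B = t1.B
ORDER BY t1.ID;
```

With the t2 contents shown in the question, OFFSET 0 selects the AA/11 combination, so the rows with ID 1 and 5 come back.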
Related
Given a table data as follows:
id a1 a2 a3
1 b 300 10
2 c 111 12
3 b 300 10
4 b 300 10
Is there a way to select ONLY THOSE ids of rows where the information stored in columns "a1", "a2", "a3" differs?
In this case, the output should be:
[1, 2] OR [2, 3] OR [2, 4]
It doesn't matter whether the representative id of the "same rows" is taken from the first, third or fourth one.
What I have tried:
SELECT id
FROM data
GROUP BY a1, a2, a3;
This of course won't work unless I disable ONLY_FULL_GROUP_BY mode; however, I'd rather not disable that feature and would turn to alternatives if they exist.
As a joke:
SELECT t1.id, t2.id
FROM data t1
JOIN data t2 ON (t1.a1, t1.a2, t1.a3) <> (t2.a1, t2.a2, t2.a3)
LIMIT 1
If some columns may be NULL then
SELECT t1.id, t2.id
FROM data t1
JOIN data t2 ON NOT ((t1.a1, t1.a2, t1.a3) <=> (t2.a1, t2.a2, t2.a3))
LIMIT 1
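For the goal stated in the question (one representative id per distinct combination), an aggregate keeps the query valid under the strict GROUP BY mode (ONLY_FULL_GROUP_BY in MySQL). A minimal sketch, assuming the table and column names from the question:

```sql
-- One representative id per distinct (a1, a2, a3) combination.
-- MIN(id) is an aggregate, so the strict GROUP BY mode accepts the query.
SELECT MIN(id) AS id
FROM data
GROUP BY a1, a2, a3;
```

For the sample rows this yields ids 1 and 2 (in no guaranteed order); MAX(id) would pick 4 and 2 instead.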
I've seen this, this, this, this and this but my question is different.
I have a Table1:
id c a b rc bid
1 12 4 6 35 4
2 12 4 6 67 7
3 12 4 6 88 8
4 23 4 7 49 3
5 23 5 8 59 8
Table2 also has the same columns but does not have bid column.
A row is considered a duplicate if it has the same values of columns c, a and b. So rows 1, 2 and 3 are considered duplicates because they have 12, 4 and 6.
I want to insert rows of Table1 to Table2, but only those rows that are not duplicates. Which means that rows 1, 2 and 3 won't get inserted to Table2. Only rows 4 and 5 will get inserted because they have no duplicates.
So Table2 will look like this after the inserts:
id c a b rc
1 23 4 7 49
2 23 5 8 59
I know I can find which combinations have duplicates using this query:
select distinct c,a,b,count(*) from Table1 group by c,a,b having count(*) > 1
But I am not able to figure out how to insert the non-duplicate rows into Table2, because the insertion requires specific columns to be specified.
Tried something like this which obviously doesn't work:
insert into Table2 (c, a, b, rc) select distinct c,a,b,count(*) from Table1 group by c,a,b having count(*) > 1
You can also use NOT IN with a subselect:
-- Table2 has no bid column, so bid is left out of the insert
INSERT INTO Table2 (c, a, b, rc)
SELECT c, a, b, rc
FROM Table1 t1
WHERE (c, a, b) NOT IN ( SELECT c, a, b
FROM Table1 t2
GROUP BY c, a, b
HAVING COUNT(*) > 1
);
You can use NOT EXISTS to exclude duplicate rows:
-- Table2 has no bid column, so bid is left out of the insert
INSERT INTO Table2 (c, a, b, rc)
SELECT
c, a, b, rc
FROM Table1 t1
WHERE NOT EXISTS(
SELECT 1
FROM Table1 t2
WHERE
t2.c = t1.c
AND t2.a = t1.a
AND t2.b = t1.b
HAVING COUNT(*) > 1
);
The HAVING COUNT(*) > 1 will check if there are duplicates.
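-- MIN(id) keeps the subquery valid under ONLY_FULL_GROUP_BY;
-- with count(*) = 1 it is simply the id of the group's only row
insert into Table2 (c, a, b, rc)
select c, a, b, rc from Table1
where id in (select min(id)
from Table1 group by c, a, b having count(*) = 1);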
There are many ways to do that, and you have already received several correct answers. Here is a query based on the approach you started with.
INSERT INTO Table2 (c, a, b, rc)
SELECT
c,
a,
b,
MIN(rc) -- each group has exactly one row, so MIN(rc) is that row's rc
FROM
Table1
GROUP BY c, a, b
HAVING COUNT(*) = 1;
newbie here to SQL. So I have two tables, let's take for example the two tables below.
Table A
set_num s_id s_val
100 3 AA
100 5 BB
200 3 AA
200 9 CC
Table B
s_id s_val phrase seq
1 DD 'hi' 'first'
3 AA 'hello' 'first'
6 EE 'goodnight' 'first'
5 BB 'world' 'second'
9 CC 'there' 'second'
4 FF 'bye' 'first'
I want to join Table A with Table B on two columns, like a composite key (s_id, s_val), and I want to return set_num from Table A and the concatenation of phrases in Table B (which we will call entire_phrase, concat(...) AS entire_phrase).
The concatenation should also follow a specific order, determined by the seq column in Table B for each phrase: "first" means the phrase needs to come first, and "second" means it comes next. I would like to do this with a SELECT query, but I'm not sure whether that is possible without it getting too complex. Can I do this in a SELECT, or does this call for another approach?
Expected Output:
set_num entire_phrase
100 'hello world'
200 'hello there'
And not
set_num entire_phrase
100 'world hello'
200 'there hello'
Any help/approach will be greatly appreciated!
You can do it like this:
-- aliases are written in one case throughout: MySQL treats table aliases
-- as case-sensitive on most Unix systems
select temp1.set_num, concat(phrase1,' ',phrase2) as entire_phrase
from (
(
select set_num, b.phrase as phrase1
from TableA as a
join TableB as b
on a.s_id = b.s_id
and a.s_val = b.s_val
and b.seq = 'first'
) as temp1
join
(
select set_num, b.phrase as phrase2
from TableA as a
join TableB as b
on a.s_id = b.s_id
and a.s_val = b.s_val
and b.seq = 'second'
) as temp2
on temp1.set_num = temp2.set_num
);
Running here: http://sqlfiddle.com/#!9/d63ac3/1
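If a set can contain more than two phrases, an aggregate may be easier to extend than one subquery per position. The following is a sketch rather than a tested drop-in: it assumes the TableA/TableB names used in the answer above and that seq only ever holds 'first' or 'second'.

```sql
-- Sketch: one row per set_num, phrases joined in seq order.
-- FIELD() maps 'first' -> 1 and 'second' -> 2, so the sort order is
-- stated explicitly instead of relying on alphabetical order.
SELECT a.set_num,
       GROUP_CONCAT(b.phrase
                    ORDER BY FIELD(b.seq, 'first', 'second')
                    SEPARATOR ' ') AS entire_phrase
FROM TableA AS a
JOIN TableB AS b
  ON a.s_id = b.s_id
 AND a.s_val = b.s_val
GROUP BY a.set_num;
```

Further seq values can be appended to the FIELD() list. Note that GROUP_CONCAT output is truncated at group_concat_max_len (1024 by default), which matters for long phrase lists.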
I have one log table and one view.
I would like to fetch the changed rows from the view by comparing it to the log table given an ID_NO.
The ID_NO is fixed between the two tables, whereas other columns can change.
In short, I would like to fetch the rows from Table1 which have one or more changed columns in comparison to Table2.
for example:
TABLE 1:
ID COL1 COL2 COL3
1 A B C
2 34 56 D
3 F XY 24
TABLE 2:
ID COL1 COL2 COL3
1 A B C
2 34 56 F
3 1 XY 24
The query should return the following from TABLE2:
ID COL1 COL2 COL3
2 34 56 F
3 1 XY 24
Please advise.
Many Thanks!
SELECT *
FROM one_view vw
WHERE EXISTS
(
SELECT 1
FROM log_table t
WHERE vw.id_no = t.id_no
)
;
A note after the question was updated:
SELECT *
FROM table_2 t1
WHERE EXISTS
(
SELECT 1
FROM table_1 t2
WHERE t1.id_no = t2.id_no
AND
(
t1.col1 <> t2.col1
OR t1.col2 <> t2.col2
OR t1.col3 <> t2.col3
)
)
;
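One caveat with <> is that a comparison involving NULL is never true, so a column changing from NULL to a value (or back) would go unnoticed. If the columns are nullable, which the question does not say, a NULL-safe variant could look like the sketch below. It reuses the table and column names from the query above and assumes id_no is unique in each table.

```sql
-- Sketch: same comparison, but NULL-safe.
-- a <=> b is TRUE when both sides are NULL, so NOT (a <=> b) also flags
-- a change between NULL and a real value, which <> would miss.
SELECT cur.*
FROM table_2 AS cur
JOIN table_1 AS prev
  ON prev.id_no = cur.id_no
WHERE NOT (cur.col1 <=> prev.col1)
   OR NOT (cur.col2 <=> prev.col2)
   OR NOT (cur.col3 <=> prev.col3);
```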
You could add a trigger to the changing table that inserts the ID into a second table, which is then used to identify the changed rows. Just comparing the values between tables might work but requires a lot of work. Getting the IDs of the changed rows might be easier.
Just in case you also want to have the old values, add the changed columns and values to the logging table.
Suppose this table:
ID ColA ColB
1 7 8
2 7 9
3 7 9
4 5 8
5 6 9
6 6 9
7 5 4
The PK is the ID column.
Now, I want to delete all duplicates of ColA and ColB in consecutive rows.
In this example rows 2,3 and 5,6 contain duplicates.
These shall be removed so that the row with the higher ID remains.
The output should be:
ID ColA ColB
1 7 8
3 7 9
4 5 8
6 6 9
7 5 4
How can this be done with MySQL?
Thanks,
Juergen
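-- Selects the lower ID of each consecutive duplicate pair (2 and 5 in the
-- example), because the row with the higher ID is the one to keep
SELECT
ID
FROM
MyTable m1
WHERE
0 < (SELECT
COUNT(*)
FROM
MyTable m2
WHERE
m2.ID = m1.ID + 1 AND
m2.ColA = m1.ColA AND
m2.ColB = m1.ColB)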
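and then you can feed those IDs to a
delete from MyTable where ID in ...
query. Note that MySQL rejects a subquery that reads from the same table the DELETE targets (error 1093), so the IDs may first need to be stored in a temporary table.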
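A multi-table DELETE with a self-join is another way to express the same idea in a single statement. This is a sketch under two assumptions: the table is called MyTable as in the query above, and "consecutive" means the IDs differ by exactly 1 (gaps in the ID sequence would need a different neighbour lookup).

```sql
-- Sketch: delete every row whose immediate successor (ID + 1) has the same
-- ColA and ColB, so the higher ID of each consecutive run survives.
DELETE m1
FROM MyTable AS m1
JOIN MyTable AS m2
  ON m2.ID   = m1.ID + 1
 AND m2.ColA = m1.ColA
 AND m2.ColB = m1.ColB;
```

On the sample data this removes IDs 2 and 5 and leaves 1, 3, 4, 6 and 7. Running the join as a SELECT m1.* first is a cheap way to review what would be deleted.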
-- MyTable stands in for the real table name
CREATE TEMPORARY TABLE duplicates (id int primary key);
INSERT INTO duplicates (id)
SELECT t1.id
FROM MyTable t1
JOIN MyTable t2 ON t2.id = t1.id + 1
WHERE t1.ColA = t2.ColA
AND t1.ColB = t2.ColB;
-- SELECT * FROM duplicates; --> are you happy with that? => delete
DELETE MyTable
FROM MyTable
JOIN duplicates ON MyTable.id = duplicates.id;
Depending on how many records you have, this might not be the most efficient:
-- MyTable stands in for the real table name; LIMIT 1 replaces MSSQL's TOP 1
SELECT (SELECT id FROM MyTable WHERE colA = m.colA AND colB = m.colB ORDER BY id DESC LIMIT 1) AS id, m.*
FROM (SELECT DISTINCT colA, colB
FROM MyTable) m;
I usually use MSSQL, so there might still be syntax errors, but the idea should be similar.
I've called the first table 'test'.
Firstly create a table that will hold all the identical combinations of ColA and ColB:
create temporary table tmpTable (ColA int, ColB int);
insert into tmpTable select ColA,ColB from test group by ColA, ColB;
Now, select the maximum id in the original table for each identical combination of ColA and ColB. Put this into a new table (called idsToKeep because these are the rows we do not want to delete):
create temporary table idsToKeep (ID int);
insert into idsToKeep select (select max(ID) from test where test.ColA=tmpTable.ColA and test.ColB=tmpTable.ColB) from tmpTable;
Finally, delete all the entries from the original table that are not in the idsToKeep table:
delete from test where ID <> all (select ID from idsToKeep);