I couldn't find or come up with a solution to this problem in MySQL.
I want to update two columns of a table with values that come from another table, picked at random.
Here's a tentative example:
update table1 a1,
(select col1, col2
from table2
ORDER BY RAND() limit 1) a2
set a1.col1 = a2.col1, a1.col2 = a2.col2
where a1.col3 is not null;
With this form, every row always gets the same value from table2.
table1 | table2
id col1 col2 | id col1 col2
1 aaa bbb | 1 xxx yyy
2 ccc ddd | 2 www ttt
| 3 uuu vvv
I want the values (col1, col2) from table2 to be assigned to table1 (col1 and col2) at random.
Without LIMIT 1, every row is still updated with the same record, as if table2 contained only one record.
That is, for each row being updated, a subquery should fetch a random record from the other table.
You can use join and row_number(), but multiple times:
update table1 t1 join
(select *, row_number() over (order by rand()) as seqnum
from table1 t1
) tt1
on tt1.id = t1.id join
(select *, row_number() over (order by rand()) as seqnum
from table2 t2
) t2
on t2.seqnum = tt1.seqnum
set t1.col1 = t2.col1,
t1.col2 = t2.col2;
This adds a sequence number defined randomly to the two tables and joins on that for the matching. The extra join is to implement the update.
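The pairing idea can also be sketched outside MySQL. The snippet below is only an illustration, assuming Python's `sqlite3` module as a stand-in database and the sample rows from the question; it shuffles the table2 rows in Python (which plays the role of the random sequence number) and pairs them with table1 rows by position.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'aaa', 'bbb'), (2, 'ccc', 'ddd');
    INSERT INTO table2 VALUES (1, 'xxx', 'yyy'), (2, 'www', 'ttt'), (3, 'uuu', 'vvv');
""")

# Rows to update, and candidate rows to copy from.
target_ids = [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]
donors = conn.execute("SELECT col1, col2 FROM table2").fetchall()

# Shuffling gives each donor row a random position (its "sequence number").
random.shuffle(donors)

# Pair the n-th target with the n-th shuffled donor; zip stops at the shorter list.
pairs = [(c1, c2, tid) for tid, (c1, c2) in zip(target_ids, donors)]
conn.executemany("UPDATE table1 SET col1 = ?, col2 = ? WHERE id = ?", pairs)
conn.commit()

updated = conn.execute("SELECT col1, col2 FROM table1 ORDER BY id").fetchall()
print(updated)
```

As with the join on sequence numbers, each table2 row is used at most once, and if table1 has more rows than table2 the surplus rows are left unchanged.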
Related
Suppose I have data containing two columns I am interested in. Ideally, the data in these is always in matching sets like this:
A 1
A 1
B 2
B 2
C 3
C 3
C 3
However, there might be bad data where the same value in one column has different values in the other column, like this:
D 4
D 5
or:
E 6
F 6
How do I isolate these bad rows, or at least show that some of them exist?
You can use exists:
select t.*
from t
where exists (select 1 from t t2 where t2.col1 = t.col1 and t2.col2 <> t.col2);
If you just want the col1 values that have non-matches, you can use aggregation:
select col1, min(col2), max(col2)
from t
group by col1
having min(col2) <> max(col2);
Using MIN and MAX as analytic functions we can try:
WITH cte AS (
SELECT t.*, MIN(col2) OVER (PARTITION BY col1) AS min_col2,
MAX(col2) OVER (PARTITION BY col1) AS max_col2
FROM yourTable t
)
SELECT col1, col2
FROM cte
WHERE min_col2 <> max_col2;
The above approach, while seemingly verbose, would return all offending rows.
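Note that the queries above compare col2 within each col1 group, which catches the D/4, D/5 case; the E/6, F/6 case needs the mirrored check as well. A minimal sketch, assuming Python's `sqlite3` as a stand-in database and the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("A", 1), ("A", 1), ("B", 2), ("B", 2),
    ("C", 3), ("C", 3), ("C", 3),
    ("D", 4), ("D", 5),   # same col1, different col2
    ("E", 6), ("F", 6),   # same col2, different col1
])

bad_rows = conn.execute("""
    SELECT DISTINCT t.col1, t.col2
    FROM t
    WHERE EXISTS (SELECT 1 FROM t t2
                  WHERE t2.col1 = t.col1 AND t2.col2 <> t.col2)
       OR EXISTS (SELECT 1 FROM t t3
                  WHERE t3.col2 = t.col2 AND t3.col1 <> t.col1)
    ORDER BY t.col1, t.col2
""").fetchall()

print(bad_rows)  # [('D', 4), ('D', 5), ('E', 6), ('F', 6)]
```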
I have 2 tables, table_1 and table_2. table_1 contains all the data that I need to copy into table_2.
table_1
column_2 | column_3
b1       | b1
b2       | b2
table_2
column_1 | column_2 | column_3 | column_4
1        | a1       | a1       | a
2        | a        | a        | a
2        | a        | a        | a
1        | a2       | a2       | a
2        | a        | a        | a
I need to put all data of table_1 to table_2 where column_1 is a specific number, for example, 1. However, I don't have any foreign key to join these two tables. The only relationship is that table_1 has n rows, table_2 also has n rows where column_1 = 1, and I want n rows in table_1 to be updated to these n rows in table_2.
My result would look like this:
column_1 | column_2 | column_3 | column_4
1        | b1       | b1       | a
2        | a        | a        | a
2        | a        | a        | a
1        | b2       | b2       | a
2        | a        | a        | a
Any help would be appreciated.
I think you should do this with a scripting language instead of SQL.
Get everything from table2 where column_1 = 1, ordered by column_2, into an array of objects like
[
{column_1: 1, column_2: a1, column_3: a1, column_4: a},
{column_1: 1, column_2: a2, column_3: a2, column_4: a}
]
Then get everything from table1, ordered by column_2, into another array of objects
[
{column_2: b1, column_3: b1},
{column_2: b2, column_3: b2}
]
Then, for every element from table1, update the matching table2 row, using its column_1, column_2, column_3 and column_4 values in the WHERE clause.
I can't think of any other way to do this. It really is a pain when the structure is like that.
It's unclear by what logic you would like which rows in table1 to update which in table2. I will assume you just want to go in order: row 1 to row 1, 2 to 2 etc.
What we can do is add a row_number() to each table, then join on that.
I'm not 100% on MySQL syntax but hopefully you should get the idea. See also here for further "update through join" syntaxes:
WITH t2 AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
FROM table_2
WHERE column_1 = 1
),
t1 AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
FROM table_1
)
UPDATE t2
SET
t2.column_2 = t1.column_2,
t2.column_3 = t1.column_3
INNER JOIN t1 ON t1.rn = t2.rn;
If you cannot do an update on a WITH table, then you must self-join table2. You haven't indicated the PK of that table, I will just use column PK:
-- MySQL multi-table UPDATE: the joins come before SET
UPDATE table_2
INNER JOIN (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
FROM table_2
WHERE column_1 = 1
) AS t2 ON t2.PK = table_2.PK
INNER JOIN (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
FROM table_1
) AS t1 ON t1.rn = t2.rn
SET
table_2.column_2 = t1.column_2,
table_2.column_3 = t1.column_3;
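For comparison, the same "row 1 to row 1, row 2 to row 2" pairing can be sketched from a script. This is only an illustration under assumptions: Python's `sqlite3` stands in for the real database, `pk` is a hypothetical primary key on table_2 (the question does not name one), and table_1 is read in insertion order via SQLite's `rowid`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (column_2 TEXT, column_3 TEXT);
    CREATE TABLE table_2 (pk INTEGER PRIMARY KEY,  -- hypothetical key column
                          column_1 INTEGER, column_2 TEXT,
                          column_3 TEXT, column_4 TEXT);
    INSERT INTO table_1 VALUES ('b1', 'b1'), ('b2', 'b2');
    INSERT INTO table_2 (column_1, column_2, column_3, column_4) VALUES
        (1, 'a1', 'a1', 'a'), (2, 'a', 'a', 'a'), (2, 'a', 'a', 'a'),
        (1, 'a2', 'a2', 'a'), (2, 'a', 'a', 'a');
""")

# Source rows in a fixed order, and the keys of the rows to overwrite.
sources = conn.execute(
    "SELECT column_2, column_3 FROM table_1 ORDER BY rowid").fetchall()
target_pks = [row[0] for row in conn.execute(
    "SELECT pk FROM table_2 WHERE column_1 = 1 ORDER BY pk")]

# The question states both sides have n rows; fail loudly if they do not.
if len(sources) != len(target_pks):
    raise ValueError("table_1 and the filtered table_2 rows differ in count")

# Pair by position: n-th source row goes to the n-th target row.
conn.executemany(
    "UPDATE table_2 SET column_2 = ?, column_3 = ? WHERE pk = ?",
    [(c2, c3, pk) for (c2, c3), pk in zip(sources, target_pks)])
conn.commit()

result = conn.execute(
    "SELECT column_1, column_2, column_3, column_4 "
    "FROM table_2 ORDER BY pk").fetchall()
print(result)
```

Whichever route is used, an explicit ordering on both sides is what makes the pairing reproducible.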
I have a table like this:
Table1
ID | Val | Val2 |
606541 |3175031503131004|3175032612900004|
606542 |3175031503131004|3175032612900004|
677315 |3175031503131004|3175032612980004|
222222 |1111111111111111|8888888888888888|
231233 |1111111111111111|3175032612900004|
111111 |9999992222211111|1111111111111111|
57 |3173012102121018|3173015101870020|
59 |3173012102121018|3173021107460002|
2 |900 |7000 |
4 |900 |7001 |
I have two conditions on the columns Val and Val2. A row should be shown only if:
the Val column value occurs at least twice (has duplicates) AND
the Val2 column value has no duplicates (is unique)
For example :
Sample 1
ID | Val | Val2 |
606541 |3175031503131004|3175032612900004|
606542 |3175031503131004|3175032612900004|
677315 |3175031503131004|3175032612980004|
False: even though the Val column
has two or more duplicates, the Val2 column
also has a duplicate value (ID 606541 and 606542)
Sample Expected 1 Result
No records
Sample 2
ID | Val | Val2 |
222222 |1111111111111111|8888888888888888|
231233 |1111111111111111|3175032612900004|
111111 |9999992222211111|1111111111111111|
True, because both conditions match:
the Val column has a duplicate value AND Val2 has unique values
Sample 2 Expected Result
ID | Val | Val2 |
222222 |1111111111111111|8888888888888888|
231233 |1111111111111111|3175032612900004|
Sample 3
ID | Val | Val2 |
606541 |3175031503131004|3175032612900004|
606542 |3175031503131004|3175032612900004|
677315 |3175031503131004|3175032612980004|
222222 |1111111111111111|8888888888888888|
231233 |1111111111111111|3175032612900004|
111111 |9999992222211111|1111111111111111|
Note: This is a false condition. Even though the rows with ID 606541, 606542 and
677315 share the same value in column Val (at least
two duplicates), their values in column Val2 are not unique (the condition would be true if ID 606541,
606542 and 677315 had 3 different values in Val2).
Note 2: The rows with ID 222222 and 231233 share a duplicate Val, but this is still false, because the
Val2 of ID 231233 has the same value as ID 606542 and 606541 (3175032612900004), so it does not meet
the second condition, which requires no duplicate values
Sample 3 Expected Result
No records
Now, back to the full Table1 from earlier: I tried to get the rows matching the two conditions with this query
SELECT
tb.* FROM table1 tb
WHERE
tb.Val2 IN (
SELECT ta.Val2
FROM (
SELECT
t.*
FROM
table1 t
WHERE
t.Val IN (
SELECT Val FROM table1
GROUP BY Val
HAVING count( Val ) > 1 )
) ta
GROUP BY
ta.Val2
HAVING
count( ta.Val2 ) = 1
)
The result
ID Val Val2
677315 3175031503131004 3175032612980004
222222 1111111111111111 8888888888888888
57 3173012102121018 3173015101870020
59 3173012102121018 3173021107460002
2 900 7000
4 900 7001
While I expected the result to be like this:
ID Val Val2
57 3173012102121018 3173015101870020
59 3173012102121018 3173021107460002
2 900 7000
4 900 7001
Is there something wrong with my query ?
Here is my DB Fiddle.
Excuse any mistakes, as this is my first answer on this forum.
Could you also try the query below? I agree with the window function answer, though.
SELECT t.*
FROM table1 t
WHERE t.val IN (SELECT val
FROM table1
GROUP BY val
HAVING COUNT(val) > 1
AND COUNT(val) = COUNT(DISTINCT val2)
)
AND t.val NOT IN (SELECT t.val
FROM table1 t
WHERE EXISTS (SELECT 1
FROM table1 tai
WHERE tai.id != t.id
AND tai.val2 = t.val2));
/*
The first part of the WHERE clause makes sure the val2 values are distinct for each repeated value in column val.
The second part (NOT IN) makes sure no val2 value is shared across different ids.
*/
-- reverse order query (not sure it gives the expected result)
SELECT t.*
FROM table2 t
WHERE t.val IN (SELECT val FROM table2 GROUP BY val HAVING COUNT(val) = 1)
AND t.val2 IN (SELECT t.val2
FROM table2 ta
WHERE EXISTS (SELECT 1
FROM table2 tai
WHERE tai.id != ta.id
AND tai.val = ta.val));
You have to use GROUP BY to find the val and val2 values that have duplicates, and INNER JOIN and LEFT JOIN to include or eliminate records according to the given conditions (as opposed to IN, NOT IN, etc., which might cause performance issues on large data).
Please find the query below:
select t1.* from table1 t1 left join
(select val from table1
where val2 in (select val2 from table1 group by val2 having count(id) > 1)
) t2
on t1.val = t2.val
inner join
(select val from table1 group by val having count(id) >1) t3
on t1.val = t3.val
where t2.val is null
Query for Reverse Condition:
select t1.* from table1 t1 inner join
(select val from table1 group by val having count(id) = 1)
t2
on t1.val = t2.val
inner join
(select val2 from table1 group by val2 having count(id) >1) t3
on t1.val2 = t3.val2
Please find fiddle for both queries here.
Can you try this and let me know the results? SQL fiddle
SELECT t1.id, t1.val, t1.val2 FROM table1 t1
JOIN (
select val from
(select id, val, val2 from table1 group by val2 having count(1) = 1) a
group by a.val having count(1) > 1
)t2 on t1.val = t2.val;
you can use group by :
select * from (select * from #table1 where Val2 in (select Val2 val from #table1 group by Val2 having COUNT(*) =1 )) select1
where select1.val in (select Val val from #table1 group by Val having COUNT(*) >1)
or you can use RANK :
select * from ( SELECT
i.id,
i.Val val,
RANK() OVER (PARTITION BY i.val ORDER BY i.id DESC) AS Rank1,
RANK() OVER (PARTITION BY i.val2 ORDER BY i.id DESC) AS Rank2
FROM #table1 AS i
) select1 where select1.Rank1 >1 or select1.Rank2 =2
You don't need group by or having. Sub-selects will do the job just fine.
SELECT * FROM MyTable a
WHERE (SELECT Count(*) FROM MyTable b WHERE a.val = b.val) >= 2
AND (SELECT Count(*) FROM MyTable c WHERE a.val2 = c.val2) = 1;
This looks at the table as if it was 3 identical tables, but it's just one. The first sub select
(SELECT Count(*) FROM MyTable b WHERE a.val = b.val)
returns a number containing how many occurrences of "Val" are in the table; if there are at least 2 we're good to go. The second sub select
(SELECT Count(*) FROM MyTable c WHERE a.val2 = c.val2)
returns a number containing how many occurrences of "Val2" are in the table; if it's 1 and the first sub select returns at least 2 then we print the record.
If you want a solution, I think this will help.
I took the
val2 values that have no duplicates and the
val values that occur more than once,
and joined them:
Select t.* from
table1 t
inner join
(Select val2 from table1 group by val2 having count(*) = 1) tv2 on t.val2 = tv2.val2
inner join
(Select val from table1 group by val having count(*) > 1) tv on t.val = tv.val;
You can do it with EXISTS and NOT EXISTS.
If you want only the column Val:
select t1.val from table1 t1
where not exists (
select 1 from table1
where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)
group by t1.val
having count(t1.val) > 1
If you want full rows:
select t1.* from table1 t1
where exists (select 1 from table1 where id <> t1.id and val = t1.val)
and not exists (
select 1 from table1
where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)
And one solution with window functions for MySQL 8.0+:
select t.id, t.val, t.val2
from (
select *, max(counter2) over (partition by val) countermax
from (
select *,
count(*) over (partition by val) counter,
count(*) over (partition by val2) counter2
from table1
) t
) t
where t.counter > 1 and t.countermax = 1
See the demo.
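As a quick check, the full-row EXISTS / NOT EXISTS query from this answer can be run against the question's sample rows. The sketch below assumes Python's `sqlite3` as a stand-in for MySQL and stores Val and Val2 as text; it is an illustration, not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT, val2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (606541, "3175031503131004", "3175032612900004"),
    (606542, "3175031503131004", "3175032612900004"),
    (677315, "3175031503131004", "3175032612980004"),
    (222222, "1111111111111111", "8888888888888888"),
    (231233, "1111111111111111", "3175032612900004"),
    (111111, "9999992222211111", "1111111111111111"),
    (57, "3173012102121018", "3173015101870020"),
    (59, "3173012102121018", "3173021107460002"),
    (2, "900", "7000"),
    (4, "900", "7001"),
])

matching_ids = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM table1 t1
    -- val occurs in at least one other row
    WHERE EXISTS (SELECT 1 FROM table1 WHERE id <> t1.id AND val = t1.val)
    -- no row sharing this val has a val2 that is duplicated anywhere
      AND NOT EXISTS (
          SELECT 1 FROM table1
          WHERE val = t1.val
            AND val2 IN (SELECT val2 FROM table1
                         GROUP BY val2 HAVING COUNT(*) > 1))
    ORDER BY t1.id
""")]

print(matching_ids)  # [2, 4, 57, 59]
```

These are the four rows the question lists as the expected result.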
Common Table Expressions may help readability and perhaps performance as well.
with dup as (select val, count(*) -- two or more of val
from table1
group by val
having count(*)>1)
select tb1.*
from table1 tb1
inner join dup
on dup.val = tb1.val
where not exists (select val2, count(*) -- Not exists is generally fast
from table1
where val = tb1.val
group by 1
having count(*) > 1)
Fiddle
I'm going through your dataset at the moment, and I feel like your final result is accurate when you compare the results to your original dataset. Your criteria used are:
Val is duplicated at least once
Val2 is unique
9999992222211111 is the only unique value in the Val list, so that's the only value I don't expect to see in the final result. For Val2, the only duplicated value is 3175032612900004, so I don't expect to see it in the final result.
What it sounds like you're trying to do is to apply the original conditions to your final result table (which is different from your original data table). If that's what you're after, you can go through the same process applied to the original table to your new table, in which you'll get the exact result you want.
I've taken that and included all of this in my fiddle below. You'll see two output queries, one with the result you're seeing, and one with the result you want. Let me know if this answers your question! =)
Here's my fiddle: fiddle
The answer to your query
Is there something wrong with my query ?
is in your Note 2 of Sample 3
Note 2: The rows with ID 222222 and 231233 share a duplicate Val, but this is still false, because the
Val2 of ID 231233 has the same value as ID 606542 and 606541 (3175032612900004), so it does not meet
the second condition, which requires no duplicate values
You are not eliminating the records where Val2 is duplicate with another record outside the set. So, all you need to do in your query is to add the below condition
AND tb.Val NOT IN (SELECT t.Val
FROM table1 t
WHERE t.Val2 IN (SELECT Val2 FROM table1 GROUP BY Val2 HAVING count( Val2 ) > 1 ))
I have added this condition to your query and see the expected results. See fiddle below
My Fiddle
The answer given by @Govind feels like a better rewrite of your requirements. It checks for duplicates in the Val column only when there are no duplicates in the Val2 column. A very neat and concise query.
Answer by Govind
Something like this?
SELECT *
FROM table1
WHERE val IN
(SELECT val
FROM table1
GROUP BY val
HAVING COUNT(*) > 1 AND COUNT(DISTINCT val2) = COUNT(*))
AND val NOT IN (SELECT t.val
FROM table1 t
INNER JOIN (SELECT val2
FROM table1
GROUP BY val2
HAVING COUNT(*) > 1) x
ON x.val2 = t.val2);
select val, count(*) from table1 group by val having count(*)>=2;
val count(*)
1111111111111111 2
3173012102121018 2
3175031503131004 3
900 2
Val column has at least two or more duplicate values - TRUE
select val2, count(*) from table1 group by val2 having count(*)>1;
val2 count(*)
3175032612900004 3
Val2 column has no duplicate value (unique) - FALSE
So ideally you should get no records found right?
I am trying to clean up records stored in a MySQL table. If a row contains %X%, I need to delete that row and the row immediately below it, regardless of content. E.g. (sorry if the table is insulting anyone's intelligence):
| 1 | leave alone
| 2 | Contains %X% - Delete
| 3 | This row should also be deleted
| 4 | leave alone
| 5 | Contains %X% - Delete
| 6 | This row should also be deleted
| 7 | leave alone
Is there a way to do this using only a couple of queries? Or am I going to have to execute a SELECT query first (using the %x% search parameter) then loop through those results and execute a DELETE...WHERE for each index returned + 1
This should work, although it's a bit clunky (you might want to check the LIKE argument, as it uses pattern matching; see comments):
DELETE FROM db.table
WHERE idcol IN
( SELECT idcol FROM db.table WHERE col LIKE '%X%')
OR idcol IN
( SELECT idcol+1 FROM db.table WHERE col LIKE '%X%')
Let's assume the table is named test and contains two columns named id and data.
We start with a SELECT that gives us the id of all rows that have a preceding row (highest id of all ids lower than id of our current row):
SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
That gives us the ids 3 and 6. Their preceding rows 2 and 5 contain %X%, so that's good.
Now let's get the ids of the rows that contain %X% and combine them with the previous ones, via UNION:
(SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
)
UNION
(
SELECT id FROM test WHERE data LIKE '%X%'
)
That gives us 3, 6, 2, 5 - nice!
Now, we can't delete from a table and select from the same table in a subquery in MySQL, so let's use a temporary table, store the ids that are to be deleted in there, and then read from that temporary table to delete from our original table:
CREATE TEMPORARY TABLE deleteids (id INT);
INSERT INTO deleteids
(SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
)
UNION
(
SELECT id FROM test WHERE data LIKE '%X%'
);
DELETE FROM test WHERE id in (SELECT * FROM deleteids);
... and we are left with the ids 1, 4 and 7 in our test table!
(And since the previous rows are selected using <, ORDER BY and LIMIT, this also works if the ids are not continuous.)
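The same "collect the ids first, then delete" idea can be sketched from a script. This is a minimal illustration, assuming Python's `sqlite3` as a stand-in for MySQL and the seven sample rows from the question; the "next row" is taken to be the smallest id greater than the matching row's id, so gaps in the ids are tolerated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [
    (1, "leave alone"),
    (2, "Contains %X% - Delete"),
    (3, "This row should also be deleted"),
    (4, "leave alone"),
    (5, "Contains %X% - Delete"),
    (6, "This row should also be deleted"),
    (7, "leave alone"),
])

# Step 1: collect matching ids plus the id of each row that follows a match.
ids_to_delete = [row[0] for row in conn.execute("""
    SELECT id FROM test WHERE data LIKE '%X%'
    UNION
    SELECT (SELECT MIN(t2.id) FROM test t2 WHERE t2.id > t1.id)
    FROM test t1
    WHERE t1.data LIKE '%X%'
""") if row[0] is not None]  # a match on the last row has no successor

# Step 2: delete the collected ids.
conn.executemany("DELETE FROM test WHERE id = ?", [(i,) for i in ids_to_delete])
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM test ORDER BY id")]
print(remaining)  # [1, 4, 7]
```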
You can do it all in a single DELETE statement:
Assuming the "row immediately after" is based on the order of your INT-based ID column, you can use MySQL variables to assign row numbers which accounts for gaps in your IDs:
DELETE a FROM tbl a
JOIN (
SELECT a.id, b.id AS nextid
FROM (
SELECT a.id, a.text, @rn:=@rn+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT @rn:=1) rn_init
ORDER BY a.id
) a
LEFT JOIN (
SELECT a.id, @rn2:=@rn2+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT @rn2:=0) rn_init
ORDER BY a.id
) b ON a.rownum = b.rownum
WHERE a.text LIKE '%X%'
) b ON a.id IN (b.id, b.nextid)
SQL Fiddle Demo (added additional data for example)
What this does is it first takes your data and ranks it based on your ID column, then we do an offset LEFT JOIN on an almost identical result set except that the rank column is behind by 1. This gets the rows and their immediate "next" rows side by side so that we can pull both of their id's at the same time in the parent DELETE statement:
SELECT a.id, a.text, b.id AS nextid, b.text AS nexttext
FROM (
SELECT a.id, a.text, @rn:=@rn+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT @rn:=1) rn_init
ORDER BY a.id
) a
LEFT JOIN (
SELECT a.id, a.text, @rn2:=@rn2+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT @rn2:=0) rn_init
ORDER BY a.id
) b ON a.rownum = b.rownum
WHERE a.text LIKE '%X%'
Yields:
ID | TEXT | NEXTID | NEXTTEXT
2 | Contains %X% - Delete | 3 | This row should also be deleted
5 | Contains %X% - Delete | 6 | This row should also be deleted
257 | Contains %X% - Delete | 3434 | This row should also be deleted
4000 | Contains %X% - Delete | 4005 | Contains %X% - Delete
4005 | Contains %X% - Delete | 6000 | Contains %X% - Delete
6000 | Contains %X% - Delete | 6534 | This row should also be deleted
We then JOIN-DELETE that entire statement on the condition that it deletes rows whose IDs are either the "subselected" ID or NEXTID.
There is no reasonable way of doing this in a single query. (It may be possible, but the query you end up having to use will be unreasonably complex, and will almost certainly not be portable to other SQL engines.)
Use the SELECT-then-DELETE approach you described in your question.
I have two tables
The first with only 5 rows
The second with 800 rows
I'm using this query:
SELECT *
FROM table1 t1
JOIN (SELECT * FROM table2 ORDER BY RAND() LIMIT 5) t2
But I'm getting 5 rows from the first table for each result of the second table.
I don't need a condition when joining, I just want 5 random results from the second table to join the 5 results from the first.
Example:
--------------------------------------------------------
|table1 (always with same order)| table2(random order) |
--------------------------------------------------------
item1 | item4
item2 | item2
item3 | item5
item4 | item1
item5 | item3
Do you mean UNION ?
SELECT * FROM table1
UNION SELECT * FROM table2 ORDER BY RAND() LIMIT 5;
Update: revised answer after modification of your question:
SELECT field1 FROM table1
UNION SELECT field2 FROM table2 ORDER BY RAND() LIMIT 5;
To my understanding, you just need one field from each table. If you need several ones, you can list them: field2, field2, ... as long as the number of fields is the same in both SELECTs.
Update 2: ok, I think I see what you mean now. Here is a (dirty) way to do it, I'm quite confident someone can come with a more elegant solution though:
SET @num1=0, @num2=0;
SELECT t1.field1, t2.field2
FROM (
SELECT field1, @num1:=@num1+1 AS num
FROM table1
) AS t1
INNER JOIN (
SELECT field2, @num2:=@num2+1 AS num
FROM (
SELECT field2
FROM table2
ORDER BY RAND()
LIMIT 5
) AS t
) AS t2
ON t1.num = t2.num;
Try using a subquery in the SELECT list. The subquery picks a random table2 id for each row of table1.
SELECT
id AS table1_id,
(
SELECT id FROM table2 ORDER BY RAND() LIMIT 1
) AS table2_id
FROM table1
The query result would be like this:
table1_id | table2_id
1         | 24
2         | 13
3         | 36
4         | 68
5         | 5
You may join with table2 to select other table2 columns:
SELECT table1.*, table2.*
FROM (
SELECT
id AS table1_id,
(
SELECT id FROM table2 ORDER BY RAND() LIMIT 1
) AS table2_id
FROM table1
) t
JOIN table1 on t.table1_id = table1.id
JOIN table2 on t.table2_id = table2.id