MySQL merging rows with equivalent ID - mysql

I have an issue where a mistake resulted in a database table having both emails and GUIDs mixed in the ID column.
The table looks like this (simplified):
GUID | Value | Date
cf#a | 21 | 2016
mf#b | 42 | 2015
mf#b | 21 | 2016
1aXd | 3 | 2016
a7vf | 9 | 2015
Here, for example, user cf#a and user 1aXd are the same person. The GUID-email combinations are stored in another table, and the emails are unique. The primary key is the ID (GUID) and the Date combined. My problem is: how do I update the table and merge the rows? The Value columns of two merging rows should be summed.
So the example table assuming (cf#a -> 1aXd and mf#b -> 8bga and ui#q -> a7vf) would become:
GUID | Value | Date
1aXd | 24 | 2016 <-- merged, Value = 21+3
8bga | 42 | 2015 <-- converted to GUID
8bga | 21 | 2016 <-- converted to GUID
<-- one row removed (merged with the first one)
a7vf | 9 | 2015 <-- untouched
Thank you for any help!
I could do this in C#, but I would rather learn how to do it with MySQL Workbench.

Use JOIN:
SELECT t1.Value + t2.Value
FROM t1
JOIN t2 USING (`GUID`)
If you want to update the values, you need something like this:
UPDATE t1
JOIN t2 USING (`GUID`)
SET t1.Value = t1.Value + t2.Value
Removing merged rows:
DELETE t2 FROM t2
JOIN t1 USING (`GUID`)
Update: if you have only one table.
Merge:
UPDATE t1
JOIN (
SELECT GUID, SUM(Value) as amount, COUNT(1) as cnt
FROM t1
GROUP BY `GUID`
HAVING cnt > 1
) t2 ON t2.GUID = t1.GUID
SET t1.Value = t2.amount;
Delete:
CREATE table t2 (
GUID varchar(255),
Value integer,
Date integer
);
INSERT INTO t2 (GUID, Value, Date)
SELECT GUID, Value, MAX(Date) FROM t1 GROUP BY GUID, Value;
Result will be in t2.
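For the mixed email/GUID case in the question, one option is to build the merged rows through the mapping table instead of updating in place. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names stats and id_map (with columns email and guid) are assumptions, not names from the question. The LEFT JOIN / COALESCE / GROUP BY query is the part that matters.

```python
import sqlite3

def merge_by_guid(conn):
    """Return the rows of stats with emails mapped to GUIDs and values summed."""
    query = """
        SELECT COALESCE(m.guid, s.id) AS guid,
               SUM(s.value)           AS value,
               s.date                 AS date
        FROM stats s
        LEFT JOIN id_map m ON m.email = s.id   -- matches only when id holds an email
        GROUP BY COALESCE(m.guid, s.id), s.date
        ORDER BY 1, 3
    """
    return conn.execute(query).fetchall()

# Hypothetical schema and the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (id TEXT, value INTEGER, date INTEGER, PRIMARY KEY (id, date));
    CREATE TABLE id_map (email TEXT PRIMARY KEY, guid TEXT);
    INSERT INTO stats VALUES ('cf#a', 21, 2016), ('mf#b', 42, 2015),
                             ('mf#b', 21, 2016), ('1aXd', 3, 2016), ('a7vf', 9, 2015);
    INSERT INTO id_map VALUES ('cf#a', '1aXd'), ('mf#b', '8bga'), ('ui#q', 'a7vf');
""")
merged = merge_by_guid(conn)
print(merged)
```

The same SELECT could feed an INSERT INTO ... SELECT into a fresh table, which can then replace the original once the result has been checked.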

Related

Remove continuous duplicated values with different IDs in MySQL

I know there is a ton of same questions about finding and removing duplicate values in mySQL but my question is a bit different:
I have a table with columns as ID, Timestamp and price. A script scrapes data from another webpage and saves it in the database every 10 seconds. Sometimes data ends up like this:
| id | timestamp | price |
|----|-----------|-------|
| 1 | 12:13 | 100 |
| 2 | 12:14 | 120 |
| 3 | 12:15 | 100 |
| 4 | 12:16 | 100 |
| 5 | 12:17 | 110 |
As you see there are 3 duplicated values and removing the price with ID = 4 will shrink the table without damaging data integrity. I need to remove continuous duplicated records except the first one (which has the lowest ID or Timestamp).
Is there an efficient way to do it? (There are about a million records.)
I edited my scraping script so it checks for duplicated price before adding it but I need to shrink and maintain my old data.
Since MySQL 8.0 you can use the window function LAG() in the following way:
delete tbl.* from tbl
join (
-- use lag(price) to get the value from the previous row
select id, lag(price) over (order by id) price from tbl
) l
-- join the rows whose price equals the previous row's price; these will be deleted
on tbl.id = l.id and tbl.price = l.price;
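The LAG() idea can be tried out on the sample data from the question. This is a hedged sketch using Python's built-in sqlite3 module rather than MySQL (window functions need SQLite 3.25 or later). SQLite has no multi-table DELETE ... JOIN, so the join is rewritten here as an IN subquery; the comparison against the previous row is the same.

```python
import sqlite3

def delete_consecutive_duplicates(conn):
    """Delete each row whose price equals the price of the row just before it (by id)."""
    conn.execute("""
        DELETE FROM tbl
        WHERE id IN (
            SELECT id
            FROM (SELECT id, price,
                         LAG(price) OVER (ORDER BY id) AS prev_price
                  FROM tbl)
            WHERE price = prev_price
        )
    """)
    return conn.execute("SELECT id, price FROM tbl ORDER BY id").fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, timestamp TEXT, price INTEGER);
    INSERT INTO tbl VALUES (1, '12:13', 100), (2, '12:14', 120), (3, '12:15', 100),
                           (4, '12:16', 100), (5, '12:17', 110);
""")
remaining = delete_consecutive_duplicates(conn)
print(remaining)
```

Only the row with id 4 is removed; id 3 stays because the row before it has a different price.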
I am just grouping by price and keeping only one record per group. The lowest id gets displayed. Hope the query below helps.
select id,timestamp,price from yourTable group by price having count(price)>0;
My query is based on @Tim Biegeleisen's.
-- delete records
DELETE
FROM yourTable t1
-- where an older row with the same price exists
WHERE EXISTS (SELECT 1
FROM yourTable t2
WHERE t2.price = t1.price
AND t2.id < t1.id
-- and no row with a different price lies between the two
AND NOT EXISTS (SELECT 1
FROM yourTable t3
WHERE t1.price <> t3.price
AND t3.id > t2.id
AND t3.id < t1.id));
It deletes each record for which an older record with the same price exists and no record with a different price lies between the two.
If the id column is not numeric and ascending, the timestamp column can be used for the comparison instead.

How to Delete Duplicate Rows Based on 3 Column Values and Length in MySQL

I wanted some help in regards to understanding how I can delete duplicate records from my database table. I have a table of 1 million records which has been collected over a 2 year period hence there is a number of records that need to be deleted as they have been added numerous times into the database.
The following is a query that I wrote based on the three columns that I am matching for duplicates, taking a count and I have also added a length of one of the columns as this will determine whether I delete all the records or just the duplicates.
SELECT
Ref_No,
End_Date,
Filename,
count(*) as cnt,
length(Ref_No)
FROM
master_table
GROUP BY
Ref_No,
End_Date,
Filename,
length(Ref_No)
HAVING
COUNT(*) > 1
;
This then gives me an output like the following:
Ref_No | End_Date | Filename | cnt | length(Ref_No)
05011384 | 2018-07-01 | File1 | 2 | 8
1234 | 2018-12-31 | File2 | 11 | 4
1000002975625 | 2018-12-31 | File3 | 13
123456789123456789 | 2019-02-06 | File3 | 18
Now I have a list of rules to follow based on the length column and this will determine whether I leave the records as they are with the duplicates, delete the duplicates or delete all the records and this is where I am stuck.
My rules are the following:
If length is between 0 and 4 - Keep all records with duplicates
If length is between 5 and 10 - Delete Duplicates, keep 1 record
If length equals 13 - Delete Duplicates, keep 1 record
If length is 11, 12, 14-30 - Delete all records
I would really appreciate it if someone could advise on how to go about completing this task.
Thanks.
I have managed to create a temporary table in which I add a unique id. The only thing is that I am running the query twice with the length part changed for my requirements.
INSERT INTO UniqueIDs
(
SELECT
T1.ID
FROM
master_table T1
LEFT JOIN
master_table T2
ON
(
T1.Ref_No = T2.Ref_No
AND
T1.End_Date = T2.End_Date
AND
T1.Filename = T2.Filename
AND
T1.ID > T2.ID
)
WHERE T2.ID IS NULL
AND
LENGTH(T1.Ref_No) BETWEEN 5 AND 10
)
;
I then just run the following delete to keep the unique ids in the table and remove the rest.
DELETE FROM master_table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
That's it.

SQL get max of columns where a row equals something

If I have Table with 3-columns:
Date | Name | Num
oct1 | Bob | 2
oct2 | Zayne | 1
oct1 | Test | 5
oct2 | Apple | 7
I want to retrieve the rows where Num is MAX,
WHERE Date = oct1 or Date = oct2
So I want result to be:
oct1 Test 5
oct2 Apple 7
MySQL is preferred, but a standard SQL answer is fine too. Thanks.
You can try the query below, which uses a correlated subquery:
select * from tablename a
where num in (select max(num) from tablename b where a.date=b.date)
and date in ('oct1', 'oct2')
It sounds like you want this query:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT Date, MAX(Num) AS max_num
FROM yourTable
WHERE Date IN ('oct1', 'oct2')
GROUP BY Date
) t2
ON t1.Date = t2.Date AND t1.Num = t2.max_num
WHERE t1.Date IN ('oct1', 'oct2');
By the way, you should seriously consider storing proper date data in an actual date or datetime column in MySQL. It appears you are just storing text right now, which would be hard to work with.
You can try a correlated subquery.
Schema (MySQL v5.7)
CREATE TABLE T(
Date VARCHAR(50),
Name VARCHAR(50),
Num INT
);
INSERT INTO T VALUES ('oct1','Bob',2);
INSERT INTO T VALUES ('oct2','Zayne',1);
INSERT INTO T VALUES ('oct1','Test',5);
INSERT INTO T VALUES ('oct2','Apple',7);
Query #1
SELECT *
FROM T t1
WHERE Num = (SELECT MAX(Num) FROM T tt WHERE t1.Date = tt.Date)
AND
t1.Date in ('oct1','oct2')
| Date | Name | Num |
| ---- | ----- | --- |
| oct1 | Test | 5 |
| oct2 | Apple | 7 |
As you were asking for a standard way to do this: all the answers given so far comply with the SQL standard. One more possible approach in standard SQL is to use a window function. MySQL only supports this as of version 8, however.
select date, name, num
from
(
select date, name, num, max(num) over (partition by date) as max_num
from mytable
) analyzed
where num = max_num
order by date;
This only reads the table once, which can (but not necessarily does) speed up the query.
You can use a correlated subquery, as below:
SELECT *
FROM T t1
WHERE Num = (SELECT MAX(Num) FROM T t2 WHERE t2.Date = t1.Date)
Result:
Date Name Num
oct1 Test 5
oct2 Apple 7

sql presto query to join 2 tables interatably

I need to do this in a SQL query. Let me know if this is possible.
I have a table which has a mapping like (table1)
num,value
2,'h'
3,'b'
5,'c'
Now I have another table which have these values (table2)
name, config
"test1",45
"test2",20
What I want is a SQL query that adds another column to table2 by checking whether each config value is divisible by table1.num and, if so, concatenating the matching table1.value entries.
so now after the sql query it should become
name, config, final
"test1",45, bc
"test2",20, hc
Please let me know if I can form a query for this
You can, by using a cross join, the mod function (https://dev.mysql.com/doc/refman/8.0/en/mathematical-functions.html#function_mod) and group_concat (https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html#function_group-concat):
select t2.name,t2.config,group_concat(t1.value separator '') final
from table1 t1
cross join table2 t2
where t2.config % t1.num = 0
group by t2.name,t2.config
+-------+--------+-------+
| name | config | final |
+-------+--------+-------+
| test1 | 45 | bc |
| test2 | 20 | hc |
+-------+--------+-------+
2 rows in set (0.00 sec)
The answer from P.Salmon should work for MySQL. If you are using Presto, then this would work:
SELECT t2.name,
t2.config,
array_join(array_agg(t1.value),'','') AS final
FROM table1 t1
CROSS JOIN table2 t2
WHERE t2.config % t1.num = 0
GROUP BY t2.name,
t2.config
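Both answers follow the same pattern: cross join the two tables, keep the pairs where config is divisible by num, then aggregate the letters per name. As a rough way to check that pattern on the sample data, the sketch below runs it with Python's built-in sqlite3 module, which is neither MySQL nor Presto; SQLite's two-argument group_concat stands in for the MySQL and Presto aggregates. The order in which an aggregate concatenates its inputs is not guaranteed unless it is requested explicitly, so the letters are sorted in Python here.

```python
import sqlite3

def divisor_labels(conn):
    """For each table2 row, collect table1.value for every num that divides config."""
    query = """
        SELECT t2.name, t2.config, group_concat(t1.value, '') AS final
        FROM table1 t1
        CROSS JOIN table2 t2
        WHERE t2.config % t1.num = 0
        GROUP BY t2.name, t2.config
        ORDER BY t2.name
    """
    # Sort the letters so the result does not depend on aggregation order.
    return [(name, config, "".join(sorted(final)))
            for name, config, final in conn.execute(query)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (num INTEGER, value TEXT);
    CREATE TABLE table2 (name TEXT, config INTEGER);
    INSERT INTO table1 VALUES (2, 'h'), (3, 'b'), (5, 'c');
    INSERT INTO table2 VALUES ('test1', 45), ('test2', 20);
""")
labels = divisor_labels(conn)
print(labels)
```

Because of the sorting step, test2 comes out as 'ch' rather than the 'hc' shown in the question; the set of letters is the same.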

Possible to create a mysql query that only displays things that are in descending order

To start things off, I want to make it clear that I'm not trying to order by descending order.
I am looking to order by something else, but then filter further by displaying values in a second column only as long as the value one row below is less than the current one. Once it finds that the next value is higher, it stops.
Example:
Ordered by column-------------------Descending Column
353215 20
535325 15
523532 10
666464 30
473460 20
If given that data, I would like it to only return 20, 15 and 10. Because now that 30 is higher than 10, we don't care about what's below it.
I've looked everywhere and can't find a solution.
EDIT: removed the big-number initialisation and added the counter to the IFNULL test, so it works in pure MySQL: ifnull(@prec,counter) instead of ifnull(@prec,999999).
If your starting table is t1 and the base request was:
select id,counter from t1 order by id;
Then with a mysql variable you can do the job:
SET @prec=NULL;
select * from (
select id,counter,@prec:= if(
ifnull(@prec,counter)>=counter,
counter,
-1) as prec
from t1 order by id
) t2 where prec<>-1;
Except that here I need 99999 as a max value for your column, and there may be a way to move the initialisation of @prec to NULL into the first query.
Here the prec column contains the counter value of the first row, then the counter value of each following row as long as it is not greater than that of the previous row, and -1 once this is no longer true.
Update
The outer select can be removed completely if the variable assignment is done in the WHERE clause:
SELECT @prec := NULL;
SELECT
id,
counter
FROM t1
WHERE
(@prec := IF(
IFNULL(@prec, counter) >= counter,
counter,
-1
)) IS NOT NULL
AND @prec <> -1
ORDER BY id;
regilero EDIT:
I can remove the first initialization query by using a one-row temporary table (left join) this way, though this may slow down the query:
(...)
FROM t1
LEFT JOIN (select @prec:=NULL as nullinit limit 1) as tmp1 ON tmp1.nullinit is null
(..)
As said by @Mike, using a simple UNION query or even:
(...)
FROM t1 , (select @prec:=NULL) tmp1
(...)
is better if you want to avoid the first query.
So at the end the nicest solution is:
SELECT NULL AS id, NULL AS counter FROM dual WHERE (@prec := NULL)
UNION
SELECT id, counter
FROM t1
WHERE (
@prec := IF(
IFNULL(@prec, counter) >= counter,
counter,
-1 )) IS NOT NULL
AND @prec <> -1
ORDER BY id;
+--------+---------+
| id | counter |
+--------+---------+
| 353215 | 20 |
| 523532 | 10 |
| 535325 | 15 |
+--------+---------+
EXPLAIN SELECT output:
+----+--------------+------------+------+---------------+------+---------+------+------+------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+---------------+------+---------+------+------+------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE |
| 2 | UNION | t1 | ALL | NULL | NULL | NULL | NULL | 6 | Using where |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using filesort |
+----+--------------+------------+------+---------------+------+---------+------+------+------------------+
You didn't find a solution because it is impossible.
SQL works only within a row, it can not look at rows above or below it.
You could write a stored procedure to do this, essentially looping one row at a time and calculating the logic.
It would probably be easier to write it in the frontend language, whatever it is you are using.
I'm afraid you can't do it in SQL. Relational databases were designed for different purpose so there is no abstraction like next or previous row. Do it outside the SQL in the 'wrapping' language.
I'm not sure whether these do what you want, and they're probably too slow anyway:
SELECT t1.col1, t1.col2
FROM tbl t1
WHERE t1.col2 = (SELECT MIN(t2.col2) FROM tbl t2 WHERE t2.col1 <= t1.col1)
Or
SELECT t1.col1, t1.col2
FROM tbl t1
INNER JOIN tbl t2 ON t2.col1 <= t1.col1
GROUP BY t1.col1, t1.col2
HAVING t1.col2 = MIN(t2.col2)
I guess you could maybe select them (in order) into a temporary table, that also has an auto-incrementing column, and then select from the temporary table, joining on to itself based on the auto-incrementing column (id), but where t1.id = t2.id + 1, and then use the where criteria (and appropriate order by and limit 1) to find the t1.id of the row where the descending column is greater in t2 than in t1. After which, you can select from the temporary table where the id is less than or equal to the id that you just found. It's not exactly pretty though! :)
It is actually possible, but the performance isn't easy to optimize. If Col1 is ordered and Col2 is the descending column:
First you create a self join of each row with the next row (note that this only works if the column value is unique, if not you need to join on unique values).
(Select Col1, (Select Min(Col2) as A2 from MyTable as B Where B.A2>A.Col1) As Col1FromNextRow From MyTable As A) As D
INNER JOIN
(Select Col1 As C1,Col2 From MyTable As C On C.C1=D.Col1FromNextRow)
Then you implement the "keep going until the first ascending value" bit:
Select Col2 FROM
(
(Select Col1, (Select Min(Col2) as A2 from MyTable as B Where B.A2>A.Col1) As Col1FromNextRow From MyTable As A) As D
INNER JOIN
(Select Col1 As C1,Col2 From MyTable As C On C.C1=D.Col1FromNextRow)
) As E
WHERE NOT EXISTS
(SELECT Col1 FROM MyTable As Z Where z.COL1<E.Col1 and Z.Col2 < E.Col2)
I don't have an environment to test this, so it probably has bugs. My apologies, but hopefully the idea is semi clear.
I would still try to do it outside of SQL.
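On engines with window functions (MySQL 8.0 and later, for example), the "keep going until the first ascending value" step can also be written without user variables or self joins. The sketch below is an assumption-laden illustration, not one of the answers above: it runs on Python's built-in sqlite3 module (SQLite 3.25 or later), and the table t1 with an explicit ordering column pos is hypothetical. Each row is flagged when its counter is higher than the previous row's, and only the rows before the first flag are kept.

```python
import sqlite3

def leading_descending_run(conn):
    """Return rows in pos order, stopping before the first row whose counter rises."""
    query = """
        WITH flagged AS (
            SELECT pos, counter,
                   CASE WHEN counter > LAG(counter) OVER (ORDER BY pos)
                        THEN 1 ELSE 0 END AS rose   -- first row has no predecessor, so 0
            FROM t1
        ),
        running AS (
            SELECT pos, counter,
                   SUM(rose) OVER (ORDER BY pos) AS rises_so_far
            FROM flagged
        )
        SELECT pos, counter FROM running WHERE rises_so_far = 0 ORDER BY pos
    """
    return conn.execute(query).fetchall()

# The "Descending Column" values from the question, with a hypothetical pos column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (pos INTEGER PRIMARY KEY, counter INTEGER);
    INSERT INTO t1 VALUES (1, 20), (2, 15), (3, 10), (4, 30), (5, 20);
""")
run = leading_descending_run(conn)
print(run)
```

Equal neighbouring values do not end the run in this version, which mirrors the >= comparison used in the variable-based answer.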