SQL normalized data INSERT WHERE NOT EXISTS ; ON DUPLICATE KEY UPDATE - mysql

I am using MySQL Workbench version 6.3.9 with MySQL 5.6.35.
I have the following tables:
EQUIPMENT
eID | coachID | eName
COACH
coachID | coachName
A SQL Fiddle is prepared at http://sqlfiddle.com/#!9/e333d/1
eID is the primary key. A coach can have several pieces of equipment, so the same coachID appears in multiple rows, but every eID is unique.
REQUIRED
I need to insert a row into the equipment table if it does not already exist. If it exists, do nothing.
Various posts online have pointed me towards two options:
a) INSERT...ON DUPLICATE KEY UPDATE...
b) INSERT...WHERE NOT EXISTS
PROBLEM
I have problems with both of these solutions. With the first (ON DUPLICATE KEY UPDATE), the query inserts the row as required but does not update the existing row; instead it creates a new entry. With the second (WHERE NOT EXISTS), I get an error: SYNTAX ERROR: 'WHERE' (WHERE) is not a valid input at this position.
The query doesn't need any joins. I listed both tables so that you can see how they are related; the insert I need only touches the equipment table.

You can insert through a derived (temporary) table and check that the same record does not already exist in the target table. Add LIMIT 1 to ensure that only one record is inserted. The query below inserts nothing, because a row with c_id 1 and eName 'small ball' already exists.
INSERT INTO `Equipment` (`c_id`, `eName`)
SELECT * FROM (SELECT '1', 'small ball') tmp
WHERE NOT EXISTS (
SELECT c_id FROM Equipment WHERE `c_id`='1' and `eName` = 'small ball'
) LIMIT 1;

The general NOT EXISTS pattern:
insert into table2 (....) -- destination table: list the target columns
select ....
from table1 t1 -- source of the data to check
where not exists (
select 1
from table2 t2
where t2.col = t1.col -- match source against destination so that rows already in table2 are skipped
)
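On option (a) from the question: ON DUPLICATE KEY UPDATE (like INSERT IGNORE) only reacts when the new row collides with a PRIMARY KEY or UNIQUE index. If eID is generated automatically and no unique index covers the coach/equipment pair, nothing collides and MySQL inserts a new row, which would explain the behaviour described. A minimal sketch, assuming the column names c_id and eName used in the answer above, that eName is a short VARCHAR, and a hypothetical index name:

```sql
-- Assumption: no unique key exists yet on (c_id, eName).
-- uq_coach_equipment is a made-up index name.
ALTER TABLE Equipment
    ADD UNIQUE KEY uq_coach_equipment (c_id, eName);

-- With the unique key in place, an existing pair is silently skipped ...
INSERT IGNORE INTO Equipment (c_id, eName)
VALUES (1, 'small ball');

-- ... or turned into a no-op update of the existing row.
INSERT INTO Equipment (c_id, eName)
VALUES (1, 'small ball')
ON DUPLICATE KEY UPDATE eName = eName;
```

The ALTER TABLE fails with a duplicate-entry error if the table already contains duplicate pairs, so those would have to be cleaned up first.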

Related

Remove duplicate records in mysql

I have a table called leads with duplicate records
Leads:
*account_id
*campaign_id
I want to remove every duplicated account_id row where campaign_id equals "51".
For example, if account_id = 1991 appears two times in the table then remove the one with campaign_id = "51" and keep the other one.
You could use a delete join:
DELETE t1
FROM yourTable t1
INNER JOIN yourTable t2
ON t2.account_id = t1.account_id AND
t2.campaign_id <> 51
WHERE
t1.campaign_id = 51;
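Before running a destructive statement like this, it can help to preview the rows it would remove. A sketch that expresses the same condition as a SELECT (yourTable is the placeholder name from the answer):

```sql
-- Lists each campaign_id = 51 row whose account_id also appears
-- under some other campaign_id; these are the rows the delete join targets.
SELECT t1.*
FROM yourTable t1
WHERE t1.campaign_id = 51
  AND EXISTS (
      SELECT 1
      FROM yourTable t2
      WHERE t2.account_id = t1.account_id
        AND t2.campaign_id <> 51
  );
```

Accounts that are duplicated only within campaign 51 are matched by neither statement, since both require a row with a different campaign_id.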
There is no problem with deleting from a table, provided that:
You use the correct syntax.
You have done a backup of the table BEFORE you do any deleting.
However, I would suggest a different method:
Create a new table based on the existing table:
CREATE TABLE mytable_new LIKE mytable;
Add unique constraint (or PRIMARY KEY) on column(s) you don't want to have duplicates:
ALTER TABLE mytable_new ADD UNIQUE(column1,[column2]);
Note: if you want to identify a combination of two (or more) columns as unique, place all the column names in the UNIQUE() separated by comma. Maybe in your case, the constraint would be UNIQUE(account_id, campaign_id).
Insert data from original table to new table:
INSERT IGNORE INTO mytable_new SELECT * FROM mytable;
Note: IGNORE skips any row that would violate the UNIQUE() constraint, so only the first row for each unique value is inserted. If an application also runs INSERT queries against this table, add IGNORE to those queries as well.
Check data consistency and once you're satisfied, rename both tables:
RENAME TABLE mytable TO mytable_old;
RENAME TABLE mytable_new TO mytable;
The best thing about this approach is that if you see anything wrong with the new table, you still have the original table.
Renaming the tables takes less than a second; the likely bottleneck is the INSERT IGNORE, which can take a while on a large table.

Delete Duplicates from large mysql Address DB

I know that deleting duplicates from MySQL is often discussed here, but none of the solutions works well in my case.
So, I have a DB with address data roughly like this:
ID; Anrede; Vorname; Nachname; Strasse; Hausnummer; PLZ; Ort; Nummer_Art; Vorwahl; Rufnummer
ID is the primary key and unique.
And I have entries like these, for example:
1;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Mobile;012345;67890
2;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Fixed;045678;877656
The different phone numbers are not the problem, because they are not relevant to me. I just want to delete rows that are duplicates in last name, street and zip code; in this case, either ID 1 or ID 2. Which of the two doesn't matter.
I tried it like this with DELETE:
DELETE db
FROM Import_Daten db,
Import_Daten dbl
WHERE db.id > dbl.id AND
db.Lastname = dbl.Lastname AND
db.Strasse = dbl.Strasse AND
db.PLZ = dbl.PLZ;
And insert into a copy table:
INSERT INTO Import_Daten_1
SELECT MIN(db.id),
db.Anrede,
db.Firstname,
db.Lastname,
db.Branche,
db.Strasse,
db.Hausnummer,
db.Ortsteil,
db.Land,
db.PLZ,
db.Ort,
db.Kontaktart,
db.Vorwahl,
db.Durchwahl
FROM Import_Daten db,
Import_Daten dbl
WHERE db.lastname = dbl.lastname AND
db.Strasse = dbl.Strasse And
db.PLZ = dbl.PLZ;
The complete table contains over 10 million rows, and the size is my actual problem. MySQL runs on a MAMP server on a MacBook with 1.5 GHz and 4 GB RAM, so it is not really fast. The SQL statements are run in phpMyAdmin. I currently have no other system available.
You can write a stored procedure that selects a different chunk of data on each pass (for example, rows whose row number lies between two values) and deletes only from that range. This way you delete the duplicates bit by bit.
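One way the chunked approach could look is sketched below. It walks the id range in fixed-size steps and, in each step, deletes rows that have an earlier row with the same Lastname, Strasse and PLZ (the columns from the question's own DELETE). The procedure name, the variable names and the chunk size are assumptions for illustration, and an index on the three compared columns would probably be needed for acceptable speed.

```sql
DELIMITER //

-- Hypothetical procedure: removes duplicates one id range at a time.
CREATE PROCEDURE delete_duplicates_in_chunks(IN chunk_size INT)
BEGIN
    DECLARE range_start BIGINT DEFAULT 0;
    DECLARE max_id BIGINT;

    SELECT MAX(id) INTO max_id FROM Import_Daten;

    -- If the table is empty, max_id is NULL and the loop body never runs.
    WHILE range_start <= max_id DO
        -- Delete rows in this id range that have an older twin anywhere in the table.
        -- The row with the smallest id of each group is never deleted.
        DELETE db
        FROM Import_Daten AS db
        JOIN Import_Daten AS dbl
            ON  dbl.Lastname = db.Lastname
            AND dbl.Strasse  = db.Strasse
            AND dbl.PLZ      = db.PLZ
            AND dbl.id       < db.id
        WHERE db.id >  range_start
          AND db.id <= range_start + chunk_size;

        SET range_start = range_start + chunk_size;
    END WHILE;
END //

DELIMITER ;

CALL delete_duplicates_in_chunks(100000);
```

DELIMITER is a command of the mysql command-line client, not SQL; in phpMyAdmin the delimiter is normally set in a separate field of the query form instead.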
A more efficient two-table solution can look like the following.
We store only the data we really need to delete, and only the fields that contain the duplicate information.
Let's assume we are looking for duplicate data in the Lastname, Branche and Hausnummer fields.
Drop the helper table if it is left over from a previous run
DROP TABLE IF EXISTS data_to_delete;
Create the helper table and populate it with the duplicate groups (I assume all fields have VARCHAR(255) type)
CREATE TABLE data_to_delete (
id BIGINT COMMENT 'this field will contain ID of row that we will not delete',
cnt INT,
Lastname VARCHAR(255),
Branche VARCHAR(255),
Hausnummer VARCHAR(255)
) AS SELECT
min(t1.id) AS id,
count(*) AS cnt,
t1.Lastname,
t1.Branche,
t1.Hausnummer
FROM Import_Daten AS t1
GROUP BY t1.Lastname, t1.Branche, t1.Hausnummer
HAVING count(*)>1 ;
Now let's delete duplicate data and leave only one record of all duplicate sets
DELETE Import_Daten
FROM Import_Daten LEFT JOIN data_to_delete
ON Import_Daten.Lastname=data_to_delete.Lastname
AND Import_Daten.Branche=data_to_delete.Branche
AND Import_Daten.Hausnummer = data_to_delete.Hausnummer
WHERE Import_Daten.id != data_to_delete.id;
DROP TABLE data_to_delete;
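To check the result, the grouping query can be run once more against the cleaned table. This is only a sketch that reuses the same three columns assumed in this answer:

```sql
-- Any row returned here is a group that still contains duplicates.
SELECT Lastname, Branche, Hausnummer, COUNT(*) AS cnt
FROM Import_Daten
GROUP BY Lastname, Branche, Hausnummer
HAVING COUNT(*) > 1;
```

Groups in which one of the three columns is NULL may still show up: GROUP BY puts NULLs into the same group, but the `=` comparisons in the DELETE join never match NULL, so such rows are left untouched.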
You can add a new column e.g. uq and make it UNIQUE.
ALTER TABLE Import_Daten
ADD COLUMN `uq` BINARY(16) NULL,
ADD UNIQUE INDEX `uq_UNIQUE` (`uq` ASC);
When this is done you can execute an UPDATE query like this
UPDATE IGNORE Import_Daten
SET
uq = UNHEX(
MD5(
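CONCAT_WS( -- the separator keeps ('ab','c') distinct from ('a','bc'); NULL fields are skipped instead of making the whole hash NULL
'|',
Import_Daten.Lastname,
Import_Daten.Strasse,
Import_Daten.PLZ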
)
)
)
WHERE
uq IS NULL;
After the update has run, the first row of each group holds the hash, and every duplicate still has uq = NULL, because UPDATE IGNORE skips rows that would violate the unique index. Those rows can then be removed.
Executing the query again changes nothing and reports:
0 row(s) affected, 1 warning(s): 1062 Duplicate entry...
For newly added rows, always create the uq hash, and consider using it as the primary key once all entries are unique.
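The removal step itself is not spelled out in the answer; a minimal sketch of it, assuming the uq column from above, could be:

```sql
-- Preview how many rows would be removed before deleting anything.
SELECT COUNT(*) AS rows_to_delete
FROM Import_Daten
WHERE uq IS NULL;

-- Remove every row that never received a hash.
DELETE FROM Import_Daten
WHERE uq IS NULL;
```

Rows whose hash could not be computed at all (for example because the hash expression evaluated to NULL) also have uq = NULL, so the preview count is worth checking against expectations first.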

MySQL insert in one of two tables based on condition for one table

Consider two tables that have timestamp and data columns. I need to construct an SQL statement that does the following:
Insert data (unique timestamp and data column) into one table if the timestamp value is not present in that table ("insert my data into table 1 for timestamp = '12:00 1999-01-01' only if that timestamp is not present in table 1...").
Otherwise, insert the very same data into a different table without any checks, overwriting if necessary ("... otherwise insert the same set of fields into table 2").
How could I achieve this in SQL? I could do it in a client, but that is much slower. I use MySQL.
Run the query for your 2nd bullet first, i.e. insert the data into table 2 if the timestamp is present in table 1:
insert into table2 (data, timestamp)
select 'myData', '12:00 1999-01-01'
from table1
where exists (
select 1 from table1
where timestamp = '12:00 1999-01-01'
)
limit 1
Then run the query for your 1st bullet, i.e. insert into table1 only if the data doesn't already exist
insert into table1 (data, timestamp)
select 'myData', '12:00 1999-01-01'
from dual -- DUAL instead of table1, so the insert also works while table1 is still empty
where not exists (
select 1 from table1
where timestamp = '12:00 1999-01-01'
)
limit 1
Run in this order, the two queries always insert exactly one row into exactly one table: if the row exists in table1, the NOT EXISTS condition of the 2nd query is false, and if it doesn't exist in table1, the EXISTS condition of the 1st query is false.
You may want to consider creating a unique constraint on table1 to prevent duplicates automatically, so that you can use INSERT IGNORE for your inserts into table1
alter table table1 add constraint myIndex unique (timestamp);
insert ignore into table1 (data,timestamp) values ('myData','12:00 1999-01-01');
A regular INSERT statement can insert records into one table only. You have 2 options:
Code the logic within the application
Create a stored procedure within MySQL and code the application logic there
No matter which route you choose, I would
Add a unique index on the timestamp column in both tables.
Attempt to insert the data into the 1st table. If the insert succeeds, everything is OK. If the timestamp exists, you will get an error (or a warning, depending on the MySQL configuration). Your solution handles the error (in MySQL, see DECLARE ... HANDLER ...).
Insert the data into the 2nd table using an INSERT INTO ... ON DUPLICATE KEY UPDATE ... statement, which inserts the data if the timestamp does not exist, or updates the record if it does.
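The stored-procedure route described in the steps above might be sketched as follows. The procedure name, parameter names and column types are assumptions, and both tables are assumed to have a unique index on their timestamp column already.

```sql
DELIMITER //

-- Hypothetical procedure: try table1 first, fall back to table2 on a duplicate.
CREATE PROCEDURE insert_reading(IN p_timestamp DATETIME, IN p_data VARCHAR(255))
BEGIN
    DECLARE is_duplicate BOOL DEFAULT FALSE;
    -- 1062 is MySQL's duplicate-key error; the handler records it and lets execution continue.
    DECLARE CONTINUE HANDLER FOR 1062 SET is_duplicate = TRUE;

    INSERT INTO table1 (`data`, `timestamp`)
    VALUES (p_data, p_timestamp);

    IF is_duplicate THEN
        -- The timestamp already exists in table1: write to table2, overwriting if needed.
        INSERT INTO table2 (`data`, `timestamp`)
        VALUES (p_data, p_timestamp)
        ON DUPLICATE KEY UPDATE `data` = VALUES(`data`);
    END IF;
END //

DELIMITER ;

CALL insert_reading('1999-01-01 12:00:00', 'myData');
```

Handling the error inside the procedure keeps the whole decision in a single round trip, which addresses the concern about a client-side solution being slower.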

Insert into master table when detail records present but missing master

I have two tables - one master, one detail (i.e. a one-to-many pair of tables). I'm importing data from a horrible schema and one feature of the data is that often I have some detail records but no master.
How would I go about inserting master records in these cases? I can locate the missing masters easily enough with this query:
select * from p_ltx_surgical_comp as c -- detail
left join p_ltx_surgical as s -- master
on c.fk_oid = s.fk_oid -- this is the key
where s.oid is null -- primary key, so null means no record exists
group by c.fk_oid; -- only show one value even if there are multiple detail records
Oh, and as an extra wrinkle, I only want to insert a single master even if there are multiple detail records.
You can start with this INSERT query:
INSERT INTO p_ltx_surgical (fk_oid)
SELECT DISTINCT c.fk_oid
FROM
p_ltx_surgical_comp AS c
LEFT JOIN p_ltx_surgical AS s
ON c.fk_oid = s.fk_oid
WHERE
s.oid IS NULL
and you can add more details to your table, for example:
INSERT INTO p_ltx_surgical (fk_oid, description, ...)
SELECT DISTINCT c.fk_oid, 'missing record', ...
FROM
...
Ah, I was so close... this seems to have worked:
insert into p_ltx_surgical (oid, fk_oid, ltx_surg_date)
select sp_getvdtablekey('p_ltx_surgical', 0), c.fk_oid, '1900-01-01' from
p_ltx_surgical_comp as c -- detail
left join p_ltx_surgical as s -- master
on c.fk_oid = s.fk_oid -- this is the key
where s.oid is null -- primary key, so null means no record exists
group by c.fk_oid; -- one master row per fk_oid even if there are multiple detail records

Delete statement in a same table

I need to write a DELETE statement whose condition is a correlated subquery on the same table.
MySQL won't let me run a DELETE directly while checking a condition against the same table in a correlated subquery.
I want to know whether using a temp table will affect MySQL's memory/performance.
Any help will be highly appreciated.
Thanks.
You can make MySQL create the temp table for you by wrapping your "where" query in an inline derived table in the FROM clause.
This original query will give you the dreaded "You can't specify target table for update in FROM clause":
DELETE FROM sametable
WHERE id IN (
SELECT id FROM sametable WHERE stuff=true
)
Rewritten to use an inline temp table, it becomes:
DELETE FROM sametable
WHERE id IN (
SELECT implicitTemp.id from (SELECT id FROM sametable WHERE stuff=true) implicitTemp
)
Your question is really not clear, but I would guess you have a correlated subquery and you're having trouble doing a SELECT from the same table that is locked by the DELETE. For instance to delete all but the most recent revision of a document:
DELETE FROM document_revisions d1 WHERE edit_date <
(SELECT MAX(edit_date) FROM document_revisions d2
WHERE d2.document_id = d1.document_id);
This is a problem for MySQL.
Many examples of these types of problems can be solved using MySQL multi-table delete syntax:
DELETE d1 FROM document_revisions d1 JOIN document_revisions d2
ON d1.document_id = d2.document_id AND d1.edit_date < d2.edit_date;
But these solutions are best designed on a case-by-case basis, so if you edit your question and be more specific about the problem you're trying to solve, perhaps we can help you.
In other cases you may be right, using a temp table is the simplest solution.
"can't directly run a delete statement and check a condition for the same table"
Sure you can. If you want to delete from table1 while checking the condition that col1 = 'somevalue', you could do this:
DELETE
FROM table1
WHERE col1 = 'somevalue'
EDIT
To delete using a correlated subquery, please see the following example:
create table project (id int);
create table emp_project (id int, project_id int);
insert into project values (1);
insert into project values (2);
insert into emp_project values (100, 1);
insert into emp_project values (200, 1);
/* Delete any project record that doesn't have associated emp_project records */
DELETE
FROM project
WHERE NOT EXISTS
(SELECT *
FROM emp_project e
WHERE e.project_id = project.id);
/* project 2 doesn't have any emp_project records, so it was deleted, now
we have 1 project record remaining */
SELECT * FROM project;
Result:
id
1
Create a temp table with the values you want to delete, then join it to the table while deleting. In this example I have a table "Games" with an ID column. I will delete ids greater than 3. I will gather the targets in a temp table first so I can report on them later.
-- MySQL has no table variables, so use a temporary table
CREATE TEMPORARY TABLE DeletedRows (ID int);
insert into
DeletedRows
(ID)
select
ID
from
Games
where
ID > 3;
DELETE
g
from
Games g
join
DeletedRows x
on x.ID = g.ID;
I have used a GROUP BY aggregate with a HAVING clause on the same table; the query looked like this:
DELETE
FROM TableName
WHERE id in
(select implicitTable.id
FROM (
SELECT id
FROM `TableName`
GROUP by id
HAVING count(id)>1
) as implicitTable
)
You mean something like:
DELETE FROM table WHERE someColumn = "someValue";
?
This is definitely possible, read about the DELETE syntax in the reference manual.
You can delete from the same table. The DELETE statement is as follows:
DELETE FROM table_name
WHERE some_column=some_value