How to avoid this kind of duplicate? - mysql

This is my table for a many-to-many relationship:
Related:
-id
-id_postA
-id_postB
I want this:
If, for example, there is a row with id_postA = 32 and id_postB = 67,
then the insertion of a row with id_postA = 67 AND id_postB = 32 must be ignored.

One option would be to create a unique index on both columns:
CREATE UNIQUE INDEX uk_related ON related (id_postA, id_postB);
And then prevent "duplicates by order inversion" using a trigger, ordering id_postA and id_postB on INSERT and UPDATE:
DELIMITER $$
CREATE TRIGGER order_uk_related
BEFORE INSERT -- Duplicate this trigger also for UPDATE
ON related -- As MySQL doesn't support INSERT OR UPDATE triggers
FOR EACH ROW
BEGIN
DECLARE low INT;
DECLARE high INT;
-- Store the smaller id in id_postA and the larger one in id_postB
SET low = LEAST(NEW.id_postA, NEW.id_postB);
SET high = GREATEST(NEW.id_postA, NEW.id_postB);
SET NEW.id_postA = low;
SET NEW.id_postB = high;
END$$
DELIMITER ;
As you can see in this SQLFiddle, the fourth insert will fail, because the trigger has already switched (2, 1) to (1, 2):
INSERT INTO related VALUES (1, NULL, NULL);
INSERT INTO related VALUES (2, NULL, NULL);
INSERT INTO related VALUES (3, 2, 1);
INSERT INTO related VALUES (4, 1, 2); -- fails: duplicate entry for uk_related
Function-based indexes
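In some other databases, you might be able to use a function-based index. Unfortunately, this is not possible in MySQL before 8.0.13 (Is it possible to have function-based index in MySQL?); MySQL 8.0.13 and later support functional key parts, where each expression is wrapped in its own parentheses, e.g. ON related ((LEAST(id_postA, id_postB)), (GREATEST(id_postA, id_postB))). If this were an Oracle question, you'd write: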
CREATE UNIQUE INDEX uk_related ON related (
LEAST(id_postA, id_postB),
GREATEST(id_postA, id_postB)
);
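The function-based index idea can be tried in SQLite, which (from version 3.9) supports unique indexes on expressions; its two-argument min()/max() play the role of LEAST()/GREATEST(). This is a sketch using Python's built-in sqlite3 module, not MySQL or Oracle syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE related (id INTEGER PRIMARY KEY, id_postA INTEGER, id_postB INTEGER)"
)
# Unique on the unordered pair: (32, 67) and (67, 32) produce the same index key.
conn.execute(
    "CREATE UNIQUE INDEX uk_related ON related "
    "(min(id_postA, id_postB), max(id_postA, id_postB))"
)

conn.execute("INSERT INTO related (id_postA, id_postB) VALUES (32, 67)")
try:
    conn.execute("INSERT INTO related (id_postA, id_postB) VALUES (67, 32)")
    reverse_rejected = False
except sqlite3.IntegrityError:
    reverse_rejected = True  # the reversed pair collides on the expression index

row_count = conn.execute("SELECT COUNT(*) FROM related").fetchone()[0]
print(reverse_rejected, row_count)  # True 1
```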

You can include a WHERE clause that filters out reversed pairs, for example:
INSERT INTO table_name (id_postA, id_postB)
SELECT col1, col2
FROM table_1
WHERE CONCAT(col1, '~', col2)
NOT IN (SELECT CONCAT(id_postB, '~', id_postA) FROM table_name);

If you always insert these with A < B, you won't have to worry about the reverse being inserted. This can be done with a simple sort, or a quick comparison before inserting.
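Normalizing the pair in application code before inserting can be sketched with Python's sqlite3 module as a stand-in for MySQL; the helper normalized() is hypothetical, and SQLite's INSERT OR IGNORE is used in place of MySQL's INSERT IGNORE:

```python
import sqlite3


def normalized(a: int, b: int) -> tuple[int, int]:
    """Return the pair ordered so that the smaller id comes first."""
    return (a, b) if a <= b else (b, a)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE related (id INTEGER PRIMARY KEY, id_postA INTEGER, id_postB INTEGER, "
    "UNIQUE (id_postA, id_postB))"
)

for a, b in [(32, 67), (67, 32), (5, 3)]:
    # (67, 32) becomes (32, 67) and is silently skipped by the UNIQUE constraint
    conn.execute(
        "INSERT OR IGNORE INTO related (id_postA, id_postB) VALUES (?, ?)",
        normalized(a, b),
    )

pairs = conn.execute("SELECT id_postA, id_postB FROM related ORDER BY id").fetchall()
print(pairs)  # [(32, 67), (3, 5)]
```

Because every row is stored as (smaller, larger), a plain two-column unique index is enough; no trigger is needed as long as all writers go through the same normalization.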
Join tables like this are by their very nature uni-directional. There is no automatic method for detecting the reverse join and blocking it with a simple UNIQUE index.
Normally what you'd do, though, is insert in pairs:
INSERT INTO related (id_postA, id_postB) VALUES (3,4),(4,3);
If this insert fails, then one or both of those links is already present.
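Inserting both directions as one atomic unit can be sketched like this, using Python's sqlite3 module instead of MySQL; the link() helper is hypothetical, and the transaction makes sure a failure leaves neither direction half-inserted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE related (id INTEGER PRIMARY KEY, id_postA INTEGER, id_postB INTEGER, "
    "UNIQUE (id_postA, id_postB))"
)


def link(conn, a, b):
    """Insert (a, b) and (b, a) together; return False if either already exists."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO related (id_postA, id_postB) VALUES (?, ?)",
                [(a, b), (b, a)],
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = link(conn, 3, 4)
second = link(conn, 4, 3)  # both directions are already present
count = conn.execute("SELECT COUNT(*) FROM related").fetchone()[0]
print(first, second, count)  # True False 2
```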

Related

How can I increase insert speed?

I need to import data from an external web service into my MySQL (5.7) database.
The problem is that I need to split the data into two tables. For example, I have these tables:
CREATE TABLE a (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100)
);
CREATE TABLE b (
id INT PRIMARY KEY AUTO_INCREMENT,
a_id INT,
name VARCHAR(100)
);
Now I have to insert multiple rows into table b for each row in table a (1:n).
Since I do not know the id of the row in table a before inserting it, the only way is to insert one row into table a, get the last id, and then insert all the related entries into table b.
But my database is very slow when I insert row by row: it takes more than 1 hour to insert about 35,000 rows into table a and 120,000 into table b. If I do a batch insert of about 1000 rows into table a (just for testing, without filling table b), it is incredibly faster (less than 3 minutes).
I guess there must be a way to speed up my import.
Thanks for your help
I presume you are working with a programming language driving your inserts. You need to be able to program this sequence of operations.
First, you need to use this sequence to put a row into a and dependent rows into b. It uses LAST_INSERT_ID() to handle a_id. That's faster and much more robust than querying the table to find the correct id value.
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
The trick is to capture the a.id value in the session variable @a_id, and then reuse it for each dependent INSERT. (I have turned you into an aristocrat to illustrate this, sorry :-)
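When the inserts are driven from a programming language, the driver can usually hand back the generated id directly: Python's DB-API (PEP 249) defines cursor.lastrowid as an optional extension, and many MySQL drivers provide it. A sketch using the stdlib sqlite3 module as a stand-in for MySQL; insert_parent_with_children() is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY AUTOINCREMENT, a_id INTEGER, name TEXT);
""")


def insert_parent_with_children(conn, parent_name, child_names):
    cur = conn.execute("INSERT INTO a (name) VALUES (?)", (parent_name,))
    a_id = cur.lastrowid  # plays the role of LAST_INSERT_ID() / @a_id
    conn.executemany(
        "INSERT INTO b (a_id, name) VALUES (?, ?)",
        [(a_id, child) for child in child_names],
    )
    return a_id


claus_id = insert_parent_with_children(conn, "Claus", ["von", "Bönnhoff"])
rows = conn.execute("SELECT a_id, name FROM b ORDER BY id").fetchall()
print(claus_id, rows)  # 1 [(1, 'von'), (1, 'Bönnhoff')]
```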
Second, keep this in mind: INSERTs are cheap, but transaction COMMITs are expensive. That's because InnoDB must make each committed transaction durable, which with the default setting (innodb_flush_log_at_trx_commit = 1) means flushing the redo log to disk on every COMMIT. Unless you manage your transactions explicitly, the DBMS uses a feature called "autocommit", in which it immediately commits each INSERT (or UPDATE or DELETE).
Fewer transactions get you better speed. Therefore, to improve bulk-loading performance, you want to bundle 100 or so INSERTs into a single transaction. (The exact number doesn't matter very much.) You can do something like this:
START TRANSACTION; /* start an insertion bundle */
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
INSERT INTO a (name) VALUES ('Oliver');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Jones');
... more INSERT operations ...
INSERT INTO a (name) VALUES ('Jeff');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Atwood');
COMMIT; /* commit the bundle */
START TRANSACTION; /* start the next bundle */
INSERT INTO a (name) VALUES ('Joel');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Spolsky');
... more INSERT operations ...
COMMIT; /* finish the bundle */
(All this, except LAST_INSERT_ID(), works on any SQL-based RDBMS. Each make of RDBMS has its own way of handling IDs.)
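The bundling idea can be sketched from application code with Python's sqlite3 module standing in for MySQL. Here bulk_load() and BATCH_SIZE are hypothetical, and a bundle is counted in parent records (each with its children) rather than in individual statements:

```python
import sqlite3

BATCH_SIZE = 100  # "100 or so" per transaction; the exact number matters little


def bulk_load(conn, records, batch_size=BATCH_SIZE):
    """records: iterable of (parent_name, [child_name, ...]). Returns number of commits."""
    commits = 0
    pending = 0
    for parent_name, child_names in records:
        cur = conn.execute("INSERT INTO a (name) VALUES (?)", (parent_name,))
        conn.executemany(
            "INSERT INTO b (a_id, name) VALUES (?, ?)",
            [(cur.lastrowid, child) for child in child_names],
        )
        pending += 1
        if pending >= batch_size:
            conn.commit()  # one COMMIT per bundle instead of one per row
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # finish the last, partial bundle
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY AUTOINCREMENT, a_id INTEGER, name TEXT);
""")
records = [(f"parent{i}", [f"child{i}a", f"child{i}b"]) for i in range(250)]
commits = bulk_load(conn, records)
counts = conn.execute("SELECT (SELECT COUNT(*) FROM a), (SELECT COUNT(*) FROM b)").fetchone()
print(commits, counts)  # 3 (250, 500)
```

Python's sqlite3 opens a transaction implicitly before the first INSERT, so every statement up to conn.commit() lands in the same bundle.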

Mysql Insert if not exist in two column

I looked into MySQL's duplicate key handling but can't figure it out.
I have a table like below:
id, series, chapter, path (can be unique)
I only want to insert data, never update it. Let's say I have data like this:
seri:Naruto, klasor:567 ==> If both of these exist in the table, then do not insert.
seri:Naruto, klasor:568 ==> If Naruto exists but 568 does not, then insert.
How can I achieve this?
The easiest way is to define a unique index on the two columns of that table:
ALTER TABLE yourtable ADD UNIQUE INDEX (seri,klasor);
You may also define a two-column primary key, which would work just as well.
Then use INSERT IGNORE to only add rows when they will not be duplicates:
INSERT IGNORE INTO yourtable (seri, klasor) VALUES ('Naruto',567);
INSERT IGNORE INTO yourtable (seri, klasor) VALUES ('Naruto',568);
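The same two-column unique index can be exercised with Python's sqlite3 module as a stand-in for MySQL. SQLite spells the statement INSERT OR IGNORE; it corresponds roughly to MySQL's INSERT IGNORE here, although MySQL's IGNORE also downgrades some non-key errors to warnings. The index name uq_seri_klasor is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, seri TEXT, klasor INTEGER, path TEXT)"
)
# The same seri may appear many times, but never twice with the same klasor.
conn.execute("CREATE UNIQUE INDEX uq_seri_klasor ON yourtable (seri, klasor)")

for seri, klasor in [("Naruto", 567), ("Naruto", 567), ("Naruto", 568)]:
    conn.execute(
        "INSERT OR IGNORE INTO yourtable (seri, klasor) VALUES (?, ?)", (seri, klasor)
    )

rows = conn.execute("SELECT seri, klasor FROM yourtable ORDER BY klasor").fetchall()
print(rows)  # [('Naruto', 567), ('Naruto', 568)]
```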
Edit: As per comments, you can't use UNIQUE INDEX which complicates things.
SET @seri='Naruto';
SET @klasor=567;
INSERT INTO yourtable (seri, klasor)
SELECT seri, klasor FROM (SELECT @seri AS seri, @klasor AS klasor) AS tmp
WHERE NOT EXISTS (SELECT 1 FROM yourtable WHERE seri=@seri AND klasor=@klasor);
You may use the above query with the two user-defined variables, or convert it to a single statement by replacing the variables with the actual values.
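The INSERT ... SELECT ... WHERE NOT EXISTS pattern without any unique index can be sketched with Python's sqlite3 module as a stand-in for MySQL; insert_if_absent() is a hypothetical helper, and SQLite accepts SELECT ... WHERE without a FROM clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, seri TEXT, klasor INTEGER, path TEXT)"
)  # deliberately no unique index

INSERT_IF_ABSENT = (
    "INSERT INTO yourtable (seri, klasor) "
    "SELECT ?, ? "
    "WHERE NOT EXISTS (SELECT 1 FROM yourtable WHERE seri = ? AND klasor = ?)"
)


def insert_if_absent(conn, seri, klasor):
    cur = conn.execute(INSERT_IF_ABSENT, (seri, klasor, seri, klasor))
    return cur.rowcount  # 1 if inserted, 0 if the pair already existed


results = [insert_if_absent(conn, "Naruto", k) for k in (567, 567, 568)]
print(results)  # [1, 0, 1]
```

Without a unique index, two concurrent sessions can both pass the NOT EXISTS check and insert the same pair, so this pattern is only safe when a single writer is active.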
A better way would be to use a stored procedure:
DELIMITER $$
CREATE PROCEDURE yourinsert (vseri VARCHAR(8), vklasor INT)
BEGIN
DECLARE i INT;
SELECT COUNT(*) INTO i FROM yourtable WHERE seri=vseri AND klasor=vklasor;
IF i=0 THEN
INSERT INTO yourtable (seri,klasor) VALUES (vseri, vklasor);
END IF;
END$$
DELIMITER ;
This would allow you to perform the INSERT using:
CALL yourinsert('Naruto',567);
INSERT INTO table_name (seri, klasor)
SELECT 'Naruto', 567 FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM table_name WHERE seri='Naruto' AND klasor=567);
Hope this helps.

Equivalent of MySQL ON DUPLICATE KEY UPDATE in Sql Server

I am trying to find an equivalent of the following MySQL query in SQL Server (2012):
INSERT INTO mytable (COL_A, COL_B, COL_C, COL_D)
VALUES ( 'VAL_A','VAL_B', 'VAL_C', 'VAL_D')
ON DUPLICATE KEY UPDATE COL_D= VALUES(COL_D);
Can anyone help?
PS. I have read that MERGE has a similar function, but I find its syntax very different.
You are basically looking for an Insert or Update pattern sometimes referred to as an Upsert.
I recommend this: Insert or Update pattern for Sql Server - Sam Saffron
For a procedure that deals with single rows, either of these transactions would work well:
Sam Saffron's First Solution (Adapted for this schema):
begin tran
if exists (
select *
from mytable with (updlock,serializable)
where col_a = @val_a
and col_b = @val_b
and col_c = @val_c
)
begin
update mytable
set col_d = @val_d
where col_a = @val_a
and col_b = @val_b
and col_c = @val_c;
end
else
begin
insert into mytable (col_a, col_b, col_c, col_d)
values (@val_a, @val_b, @val_c, @val_d);
end
commit tran
Sam Saffron's Second Solution (Adapted for this schema):
begin tran
update mytable with (serializable)
set col_d = @val_d
where col_a = @val_a
and col_b = @val_b
and col_c = @val_c;
if @@rowcount = 0
begin
insert into mytable (col_a, col_b, col_c, col_d)
values (@val_a, @val_b, @val_c, @val_d);
end
commit tran
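The update-first, insert-if-nothing-matched logic of the second solution can be sketched with Python's sqlite3 module; cursor.rowcount stands in for @@ROWCOUNT, and the upsert() helper is hypothetical. SQLite has no locking hints such as serializable, since it allows only one writer at a time:

```python
import sqlite3


def upsert(conn, val_a, val_b, val_c, val_d):
    """Update col_d for an existing (col_a, col_b, col_c) row, or insert a new row."""
    with conn:  # one transaction, like BEGIN TRAN ... COMMIT TRAN
        cur = conn.execute(
            "UPDATE mytable SET col_d = ? WHERE col_a = ? AND col_b = ? AND col_c = ?",
            (val_d, val_a, val_b, val_c),
        )
        if cur.rowcount == 0:  # the equivalent of IF @@ROWCOUNT = 0
            conn.execute(
                "INSERT INTO mytable (col_a, col_b, col_c, col_d) VALUES (?, ?, ?, ?)",
                (val_a, val_b, val_c, val_d),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col_a TEXT, col_b TEXT, col_c TEXT, col_d TEXT)")
upsert(conn, "VAL_A", "VAL_B", "VAL_C", "first")
upsert(conn, "VAL_A", "VAL_B", "VAL_C", "second")
rows = conn.execute("SELECT col_a, col_b, col_c, col_d FROM mytable").fetchall()
print(rows)  # [('VAL_A', 'VAL_B', 'VAL_C', 'second')]
```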
Even with a creative use of IGNORE_DUP_KEY, you'd still be stuck having to use an insert/update block or a merge statement.
A creative use of IGNORE_DUP_KEY - Paul White @Sql_Kiwi
update mytable
set col_d = 'val_d'
where col_a = 'val_a'
and col_b = 'val_b'
and col_c = 'val_c';
insert into mytable (col_a, col_b, col_c, col_d)
select 'val_a','val_b', 'val_c', 'val_d'
where not exists (select *
from mytable with (serializable)
where col_a = 'val_a'
and col_b = 'val_b'
and col_c = 'val_c'
);
The Merge answer provided by Spock should do what you want.
MERGE isn't necessarily recommended. I use it, but I'd never admit that to @AaronBertrand.
Use Caution with SQL Server's MERGE Statement - Aaron Bertrand
Can I optimize this merge statement - Aaron Bertrand
If you are using indexed views and MERGE, please read this! - Aaron Bertrand
An Interesting MERGE Bug - Paul White
UPSERT Race Condition With Merge
Try this...
I've added comments to try and explain what happens where in a SQL Merge statement.
Source : MSDN : Merge Statement
The MERGE statement differs from ON DUPLICATE KEY UPDATE in that you can tell it which columns to use for the match.
CREATE TABLE #mytable(COL_A VARCHAR(10), COL_B VARCHAR(10), COL_C VARCHAR(10), COL_D VARCHAR(10))
INSERT INTO #mytable VALUES('1','0.1', '0.2', '0.3'); --<These are the values we'll be updating
SELECT * FROM #mytable --< Starting values (1 row)
MERGE #mytable AS target --< This is the target we want to merge into
USING ( --< This is the source of your merge. Can be any SELECT statement
SELECT '1' AS VAL_A,'1.1' AS VAL_B, '1.2' AS VAL_C, '1.3' AS VAL_D --<These are the values we'll use for the update. (Assuming column COL_A = '1' = Primary Key)
UNION
SELECT '2' AS VAL_A,'2.1' AS VAL_B, '2.2' AS VAL_C, '2.3' AS VAL_D) --<These values will be inserted (because no COL_A = '2' exists)
AS source (VAL_A, VAL_B, VAL_C, VAL_D) --< Column Names of our virtual "Source" table
ON (target.COL_A = source.VAL_A) --< This is what we'll use to find a match "JOIN source on Target" using the Primary Key
WHEN MATCHED THEN --< This is what we'll do WHEN we find a match, in your example, UPDATE COL_D = VALUES(COL_D);
UPDATE SET
target.COL_B = source.VAL_B,
target.COL_C = source.VAL_C,
target.COL_D = source.VAL_D
WHEN NOT MATCHED THEN --< This is what we'll do when we didn't find a match
INSERT (COL_A, COL_B, COL_C, COL_D)
VALUES (source.VAL_A, source.VAL_B, source.VAL_C, source.VAL_D)
--OUTPUT deleted.*, $action, inserted.* --< Uncomment this if you want a summary of what was inserted or updated.
--INTO #Output --< Uncomment this if you want the results stored in another table. NOTE: the table must exist
;
SELECT * FROM #mytable --< Ending values (2 rows: 1 new, 1 updated)
Hope that helps
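For comparison with the MySQL statement in the question: SQLite (3.24 and later) has an INSERT ... ON CONFLICT ... DO UPDATE upsert whose excluded.col_d plays the role of MySQL's VALUES(COL_D). Unlike MERGE, the conflict target must be backed by a unique index. A sketch via Python's sqlite3 module, assuming (col_a, col_b, col_c) is the key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (col_a TEXT, col_b TEXT, col_c TEXT, col_d TEXT, "
    "UNIQUE (col_a, col_b, col_c))"  # assumed key that detects a duplicate
)

UPSERT = (
    "INSERT INTO mytable (col_a, col_b, col_c, col_d) VALUES (?, ?, ?, ?) "
    "ON CONFLICT (col_a, col_b, col_c) DO UPDATE SET col_d = excluded.col_d"
)
conn.execute(UPSERT, ("VAL_A", "VAL_B", "VAL_C", "VAL_D"))
conn.execute(UPSERT, ("VAL_A", "VAL_B", "VAL_C", "NEW_D"))  # updates col_d only
rows = conn.execute("SELECT * FROM mytable").fetchall()
print(rows)  # [('VAL_A', 'VAL_B', 'VAL_C', 'NEW_D')]
```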
You can simulate nearly identical behaviour using an INSTEAD OF trigger:
CREATE TRIGGER tMyTable ON MyTable
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
SELECT i.COL_A, i.COL_B, i.COL_C, i.COL_D,
CASE WHEN mt.COL_D IS NULL THEN 0 ELSE 1 END AS KeyExists
INTO #tmpMyTable
FROM INSERTED i
LEFT JOIN MyTable mt
ON i.COL_D = mt.COL_D;
INSERT INTO MyTable(COL_A, COL_B, COL_C, COL_D)
SELECT COL_A, COL_B, COL_C, COL_D
FROM #tmpMyTable
WHERE KeyExists = 0;
UPDATE mt
SET mt.COL_A = t.COL_A, mt.COL_B = t.COL_B, mt.COL_C = t.COL_C
FROM MyTable mt
INNER JOIN #tmpMyTable t
ON mt.COL_D = t.COL_D AND t.KeyExists = 1;
END;
SqlFiddle here
How it works
We first project all rows being inserted into a #temp table, using a LEFT OUTER JOIN on the key column(s) that define the duplication criterion (COL_D here) to note which of those rows ARE already in the underlying table.
We then need to repeat the actual work of an INSERT statement, by inserting those rows which are not already in the table (because of the INSTEAD OF, we have removed the responsibility of insertion from the engine and need to do this ourselves).
Finally, we update all non-key columns in the matched rows with the newly 'inserted' data.
Salient Points
It works under the covers, i.e. any insert into the table while the trigger is enabled will be subject to the trigger (e.g. Application ORM, other stored procedures etc). The caller will generally be UNAWARE that the INSTEAD OF trigger is in place.
There must be a key of some sort (natural or surrogate) to detect the duplicate criterion. I've assumed COL_D in this case, but it could be a composite key. (The key cannot be an IDENTITY column, for the obvious reason that the client wouldn't be inserting an identity value.)
The trigger works for both single and multiple row INSERTS
NB
The standard disclaimers about triggers apply, and more so with INSTEAD OF triggers, as they can cause surprising changes in the observable behaviour of SQL Server, such as this. Even well-intended INSTEAD OF triggers can cause hours of wasted effort and frustration for developers and DBAs who are not aware of their presence on your table.
This will affect ALL inserts into the table. Not just yours.
A stored procedure will save the day.
Here I assume that COL_A and COL_B are unique columns of type INT.
NB! I don't have a SQL Server instance at the moment, so I cannot guarantee the syntax is correct.
UPDATE! Here is a link to SQLFIDDLE
CREATE TABLE mytable
(
COL_A int UNIQUE,
COL_B int UNIQUE,
COL_C int,
COL_D int,
)
GO
INSERT INTO mytable (COL_A, COL_B, COL_C, COL_D)
VALUES (1,1,1,1),
(2,2,2,2),
(3,3,3,3),
(4,4,4,4);
GO
CREATE PROCEDURE updateDuplicate(@COL_A INT, @COL_B INT, @COL_C INT, @COL_D INT)
AS
BEGIN
DECLARE @ret INT
SELECT @ret = COUNT(*)
FROM mytable p
WHERE p.COL_A = @COL_A
AND p.COL_B = @COL_B
IF (@ret = 0)
INSERT INTO mytable (COL_A, COL_B, COL_C, COL_D)
VALUES ( @COL_A, @COL_B, @COL_C, @COL_D)
IF (@ret > 0)
UPDATE mytable SET COL_D = @COL_D WHERE col_A = @COL_A AND COL_B = @COL_B
END;
GO
Then call this procedure with the needed values instead of an UPDATE statement:
exec updateDuplicate 1, 1, 1, 2
GO
SELECT * from mytable
GO
There's no ON DUPLICATE KEY UPDATE equivalent in SQL Server, but you can use MERGE with WHEN MATCHED to get this done. Have a look here:
multiple operations using merge

Insert only when auto-increment id is not equal 6(for example)?

I have a table with 3 fields: Id(PK,AI), Name(varchar(36)), LName(varchar(36)).
I have to insert name and last name; Id is inserted automatically because of its constraints.
Is there a way to make the auto-increment id jump over a value when it reaches 6?
For instance, do this 7 times:
Insert Into table(Name, LName) Values ('name1', 'lname1') "And jump id to 7 if it is going to be 6"
It may sound silly, but I'm curious.
Also, jump over id 6 and never record it:
record only 1-5, 7, 8, 9 and so on.
What I want to achieve starts from a Union:
Select * From TableNames
Union All
Select * From TableNames_general
In TableNames_general I insert the first value myself, so that when the user sees the table for the first time, the record I inserted is displayed.
The problem comes when the user inserts a new record: if its Id is the same as the one I inserted, it will be duplicated. That is why, when the user inserts a record whose id already exists, I want to skip that id. I need distinct ids because of the relationships with child tables.
An identity column generates values for you, and it's best left that way. You can insert specific values into an identity column, but it's best to leave it alone and let it generate values for you.
Imagine you have inserted a value explicitly into an identity column, and later the identity column generates the same value for you: you will end up with duplicates.
If you want to supply your own values for that column, why bother with an identity column at all?
Well, this is not best practice, but you can jump to a specific number as follows:
MS SQL SERVER 2005 and Later
-- Create test table
CREATE TABLE ID_TEST(ID INT IDENTITY(1,1), VALUE INT)
GO
-- Insert values
INSERT INTO ID_TEST (VALUE) VALUES
(1),(2),(3)
GO
-- Set IDENTITY_INSERT on to insert values explicitly into the identity column
SET IDENTITY_INSERT ID_TEST ON;
INSERT INTO ID_TEST (ID, VALUE) VALUES
(6, 6),(8,8),(9,9)
GO
-- Set identity insert off
SET IDENTITY_INSERT ID_TEST OFF;
GO
-- First reseed the identity column to a value no larger than the smallest value in your table;
-- below I reseeded it to 0
DBCC CHECKIDENT ('ID_TEST', RESEED, 0);
-- Then run the same command without a reseed value; it resets the identity
-- to the highest value currently in the column
DBCC CHECKIDENT ('ID_TEST', RESEED);
GO
-- final insert
INSERT INTO ID_TEST (VALUE) VALUES
(10)
GO
-- now select data from table and see the gap
SELECT * FROM ID_TEST
If you query the database to get the last inserted ID, you can check whether you need to skip ahead, and pass the correct ID as a parameter in the insert query.
If you use MSSQL, you can do the following:
Before you insert, check the current ID; if it's 5, do the following:
Set IDENTITY_INSERT to ON
Insert your data with ID = 7
Set IDENTITY_INSERT to OFF
You might also get away with the following approach:
check the current ID;
if it's 5, run DBCC CHECKIDENT (Table, reseed, 6); this reseeds the table, so your next identity will be 7.
If you're checking the current identity just after an INSERT, you can use SELECT @@IDENTITY, or SELECT SCOPE_IDENTITY() for better results (as rcdmk pointed out in comments).
Otherwise you can just use SELECT MAX(Id) FROM Table.
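The "check the next id, and supply an explicit one if it is reserved" approach can be sketched with Python's sqlite3 module, where an INTEGER PRIMARY KEY stands in for the IDENTITY column; insert_name() and SKIPPED_IDS are hypothetical:

```python
import sqlite3

SKIPPED_IDS = {6}  # hypothetical set of ids that must never be assigned


def insert_name(conn, name, lname):
    (current_max,) = conn.execute("SELECT COALESCE(MAX(Id), 0) FROM TableNames").fetchone()
    next_id = current_max + 1
    while next_id in SKIPPED_IDS:
        next_id += 1  # jump over reserved values, e.g. 6 -> 7
    conn.execute(
        "INSERT INTO TableNames (Id, Name, LName) VALUES (?, ?, ?)",
        (next_id, name, lname),
    )
    return next_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableNames (Id INTEGER PRIMARY KEY, Name TEXT, LName TEXT)")
ids = [insert_name(conn, f"name{i}", f"lname{i}") for i in range(1, 8)]
print(ids)  # [1, 2, 3, 4, 5, 7, 8]
```

Reading MAX(Id) and then inserting is not safe with concurrent writers: two sessions can compute the same next_id, and one of them will hit a primary key violation.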
There's no direct way to influence the AUTO_INCREMENT to "skip" a particular value, or values on a particular condition.
I think you'd have to handle this in a trigger, but an AFTER INSERT trigger can't update the values of the row that was just inserted, and I don't think it can make any modifications to the table affected by the statement that fired the trigger.
A BEFORE INSERT trigger won't work either, because the value assigned to an AUTO_INCREMENT column is not available in a BEFORE INSERT trigger.
I don't believe there's a way to get SQL Server IDENTITY to "skip" a particular value either.
UPDATE
If you need "unique" id values between two tables, there's a rather ugly workaround with MySQL: roll your own auto_increment behavior using triggers and a separate table. Rather than defining your tables with AUTO_INCREMENT attribute, use a BEFORE INSERT trigger to obtain a value.
If an id value is supplied, and it's larger than the current maximum value from the auto_increment column in the dummy auto_increment_seq table, we'd need to either update that row, or insert a new one.
As a rough outline:
CREATE TABLE auto_increment_seq
(id INT NOT NULL PRIMARY KEY AUTO_INCREMENT) ENGINE=MyISAM;
DELIMITER $$
CREATE TRIGGER TableNames_bi
BEFORE INSERT ON TableNames
FOR EACH ROW
BEGIN
DECLARE li_new_id INT UNSIGNED;
DECLARE li_max_seq INT UNSIGNED;
IF ( NEW.id = 0 OR NEW.id IS NULL ) THEN
-- no id supplied: draw the next value from the shared sequence table
INSERT INTO auto_increment_seq (id) VALUES (NULL);
SELECT LAST_INSERT_ID() INTO li_new_id;
SET NEW.id = li_new_id;
ELSE
-- explicit id supplied: advance the sequence if the id is beyond its maximum
SELECT IFNULL(MAX(id), 0) INTO li_max_seq FROM auto_increment_seq;
IF ( NEW.id > li_max_seq ) THEN
INSERT INTO auto_increment_seq (id) VALUES (NEW.id);
END IF;
END IF;
END$$
CREATE TRIGGER TableNames_ai
AFTER INSERT ON TableNames
FOR EACH ROW BEGIN
DECLARE li_max_seq INT UNSIGNED;
SELECT IFNULL(MAX(id), 0) INTO li_max_seq FROM auto_increment_seq;
IF ( NEW.id > li_max_seq ) THEN
INSERT INTO auto_increment_seq (id) VALUES (NEW.id);
END IF;
END$$
DELIMITER ;
The id column in the table could be defined something like this:
TableNames
( id INT UNSIGNED NOT NULL DEFAULT 0 PRIMARY KEY
COMMENT 'populated from auto_increment_seq.id'
, ...
You could create an identical trigger for the other table as well, so the two tables are effectively sharing the same auto_increment sequence. (With less efficiency and concurrency than an Oracle SEQUENCE object would provide.)
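The shared-sequence idea can be sketched in Python's sqlite3 module, with one caveat: SQLite BEFORE triggers cannot assign to NEW, so this sketch draws ids in application code instead of a trigger. The helpers next_shared_id() and insert_row() are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auto_increment_seq (id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE TableNames (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TableNames_general (id INTEGER PRIMARY KEY, Name TEXT);
""")


def next_shared_id(conn):
    """Draw the next id from the shared dummy sequence table."""
    cur = conn.execute("INSERT INTO auto_increment_seq DEFAULT VALUES")
    return cur.lastrowid


def insert_row(conn, table, name):
    # table is one of the two hard-coded names above, never user input
    new_id = next_shared_id(conn)
    conn.execute(f"INSERT INTO {table} (id, Name) VALUES (?, ?)", (new_id, name))
    return new_id


ids = [
    insert_row(conn, "TableNames_general", "seed row"),
    insert_row(conn, "TableNames", "user row 1"),
    insert_row(conn, "TableNames", "user row 2"),
]
union = conn.execute(
    "SELECT id FROM TableNames UNION ALL SELECT id FROM TableNames_general ORDER BY id"
).fetchall()
print(ids, union)  # [1, 2, 3] [(1,), (2,), (3,)]
```

Because both tables draw from the same sequence, the UNION ALL in the question never shows two rows with the same id, as long as every insert goes through insert_row().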
IMPORTANT NOTES
This doesn't really ensure that the id values between the tables are actually kept unique. That would really require a query of the other table to see if the id value exists or not; and if running with InnoDB engine, in the context of some transaction isolation levels, we might be querying a stale (as in, consistent from the point in time at the start of the transaction) version of the other table.
And absent some additional (concurrency killing) locking, the approach outlined above is subject to a small window of opportunity for a "race" condition with concurrent inserts... the SELECT MAX() from the dummy seq table, followed by the INSERT, allows a small window for another transaction to also run a SELECT MAX(), and return the same value. The best we can hope for (I think) is for an error to be thrown due to a duplicate key exception.
This approach requires the dummy "seq" table to use the MyISAM engine, so we get an Oracle-like AUTONOMOUS TRANSACTION behavior. With a transactional engine, if inserts to the real tables are performed in the context of a REPEATABLE READ or SERIALIZABLE transaction isolation level, reads of MAX(id) from the seq table would be consistent with the snapshot taken at the beginning of the transaction, so we wouldn't see newly inserted (or updated) values.
We'd also really need to consider the edge case of an UPDATE of a row changing the id value; to handle that case, we'd need BEFORE/AFTER UPDATE triggers as well.

MYSQL trigger see if record exists first?

So I have a trigger that works on update. Totally works fine.
Insert in cars(date, id, parent_id) values (date, ford, 2)
What I need to do is check whether the parent_id already exists. If it does, do nothing; if it does not, run the insert statement.
Right now I have:
SET @myVar1 = (SELECT parent_id from cars where parent_id = NEW.id);
IF @myVar1 = NULL;
Insert in cars(date, id, parent_id) values (date, ford, 2);
ENDIF;
I keep getting a syntax error. What am I doing wrong?
The problem is on this line:
Insert in cars(date, id, parent_id) values (date, ford, 2);
The in should be INTO. That's the syntax error.
That said, you might be better served by an INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO statement rather than an on-update trigger. Be careful with REPLACE INTO, though: when a row with the same key exists, it deletes that row and inserts a new one, which can be dangerous (the danger can be somewhat mitigated by using transactions).
I don't know if this is what you really need, but you can try this:
SET @myVar1 = (SELECT parent_id from cars where parent_id = NEW.id);
IF (@myVar1 is NULL) then
Insert into cars(`date`, id, parent_id) values (CURDATE(), new.`name`, new.id);
END IF;
or
Insert into cars(`date`, id, parent_id) values (CURDATE(), new.`name`, new.id) on duplicate key update `date`=CURDATE();
In MySQL it must be END IF, not ENDIF.
new.`name` assumes that the value for the id field of cars comes from a name column in the trigger table.
You can use ON DUPLICATE KEY UPDATE if the cars table has a primary key or unique key, as mentioned above.
If you don't want to change an existing record, change the part after ON DUPLICATE KEY UPDATE to id=id (or any other column assigned to itself).
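The "insert only if no row with this parent_id exists" check inside a trigger can be sketched in SQLite through Python's sqlite3 module. SQLite trigger bodies have no IF statement, so the check is folded into INSERT ... SELECT ... WHERE NOT EXISTS; the trigger table makers and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE makers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cars (date TEXT, id TEXT, parent_id INTEGER);

    -- Hypothetical trigger table "makers": insert into cars only if no
    -- row with this parent_id exists yet.
    CREATE TRIGGER makers_au AFTER UPDATE ON makers
    FOR EACH ROW
    BEGIN
        INSERT INTO cars (date, id, parent_id)
        SELECT date('now'), NEW.name, NEW.id
        WHERE NOT EXISTS (SELECT 1 FROM cars WHERE parent_id = NEW.id);
    END;

    INSERT INTO makers (id, name) VALUES (2, 'ford');
""")
conn.execute("UPDATE makers SET name = 'Ford' WHERE id = 2")
conn.execute("UPDATE makers SET name = 'FORD' WHERE id = 2")  # cars row already exists
rows = conn.execute("SELECT id, parent_id FROM cars").fetchall()
print(rows)  # [('Ford', 2)]
```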