Prevent auto-increment on MySQL duplicate insert with multiple values - mysql

Is there an accepted practice for bulk inserting values that don't exist into a table without auto-incrementing when you attempt to insert a row that already exists?
There's a great answer for the single-row insert case at:
Prevent auto increment on MySQL duplicate insert
However, for insertion efficiency I'd like to insert a large number of rows with a single SQL command (i.e., INSERT INTO myBigTable VALUES (value1_row1, value2_row1), (value1_row2, value2_row2), ...).
ADDITIONAL INFO:
I would like to have all the IDs available, since my table has the potential of becoming extremely large. Changing the auto_increment column to a BIGINT would be a last resort. My insertion application will attempt to insert a large number of already-existing rows on a regular basis (think stock price updates), so I'll effectively have a large number of auto-incremented IDs skipped.
WHY I'M USING AUTO-INCREMENT:
I believe that for query speed, I should be using an integer index into my (very large) table as the primary key instead of a combination of string fields. I also believe that I should use auto_increment, so MySQL handles concurrency for me.

You can use the technique in the other question, using a UNION query:
INSERT INTO yourTable (col1, col2, ...)
SELECT $row1_value1, $row1_value2, ...
FROM DUAL
WHERE NOT EXISTS (
SELECT 1
FROM yourTable
WHERE unique_col = $row1_value1)
UNION ALL
SELECT $row2_value1, $row2_value2, ...
FROM DUAL
WHERE NOT EXISTS (
SELECT 1
FROM yourTable
WHERE unique_col = $row2_value1)
UNION ALL
SELECT $row3_value1, $row3_value2, ...
FROM DUAL
WHERE NOT EXISTS (
SELECT 1
FROM yourTable
WHERE unique_col = $row3_value1)
UNION ALL
SELECT $row4_value1, $row4_value2, ...
FROM DUAL
WHERE NOT EXISTS (
SELECT 1
FROM yourTable
WHERE unique_col = $row4_value1)
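Writing one branch per row by hand gets tedious for large batches. The following is a minimal sketch of generating the statement in Python, assuming a DB-API driver that uses `%s` placeholders (as MySQL drivers commonly do). The function name `build_insert_missing` and its parameters are hypothetical, not part of any library.

```python
def build_insert_missing(table, columns, unique_col, rows):
    """Build one INSERT ... SELECT ... UNION ALL statement and its parameters.

    Identifiers (table, columns, unique_col) are interpolated directly, so they
    must come from trusted code, never from user input. Row values are passed
    as query parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    unique_idx = columns.index(unique_col)
    placeholders = ", ".join(["%s"] * len(columns))
    # One branch per row: select the literal values only if the key is absent.
    branch = (
        f"SELECT {placeholders} FROM DUAL "
        f"WHERE NOT EXISTS (SELECT 1 FROM {table} WHERE {unique_col} = %s)"
    )
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        + " UNION ALL ".join([branch] * len(rows))
    )
    params = []
    for row in rows:
        params.extend(row)              # values for the SELECT list
        params.append(row[unique_idx])  # value for the NOT EXISTS check
    return sql, params


sql, params = build_insert_missing(
    "yourTable", ["col1", "col2"], "col1", [("a", 1), ("b", 2)]
)
```

The pair would then be passed to the driver as `cursor.execute(sql, params)`. Note that NOT EXISTS only checks rows already in the table, so a key that appears twice within the same batch would most likely still need to be de-duplicated beforehand.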

Related

Insert into multiple tables only if record doesn't exist in primary table

I don't seem to understand IF statements in SQL very well.
I have two tables, one called event_headers and one called event_records. Each event has a single entry in the event_headers table and at least one record in the event_records table.
I'm running a script in C# that reads SQL files that will insert into each table, but I'm running into a problem with duplicates. I can eliminate the duplicates in the event_headers table by using INSERT IGNORE. The trouble is that I want to skip inserting into the event_records table if there is already an entry in the event_headers table.
EXAMPLE:
INSERT INTO `event_headers` (`session_id`, [...] ) VALUES ('89131', [...] );
INSERT INTO `event_records` (`event_header_session_id`, [...] )
VALUES
('89131', [...] ),
('89131', [...] ),
('89191', [...] );
(In truth, I have a third table that also has records that get updated, but this illustrates the point).
I want to only run the INSERT statements if the event_headers.session_id does not exist.
You must check whether the first insertion actually inserted the row. You can do this with, for example, ROW_COUNT(), which returns the number of rows actually changed by the previous statement. The only catch is that you must use INSERT .. SELECT for the second insertion, because INSERT .. VALUES does not allow a WHERE clause:
INSERT IGNORE INTO main_table VALUES (...);
INSERT INTO slave_table
SELECT *
FROM ( SELECT ... UNION ALL SELECT ... ) slave_data
WHERE ROW_COUNT() > 0; -- 0 when the INSERT IGNORE above skipped a duplicate
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=0668b71ddcdc67180b3ed54acb562931
Both statements must be executed as a batch (in the same connection, without any other statement between them).
Only one row may be inserted into the main table per batch.
A stored procedure that checks for the row in the main table and inserts only when it is absent is preferable, though.
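The same "insert the records only if the header was new" check can also be done in the calling script. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for MySQL, so `INSERT OR IGNORE` replaces `INSERT IGNORE` and the cursor's `rowcount` plays the role of ROW_COUNT(). The `payload` column and the `insert_event` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_headers (session_id TEXT PRIMARY KEY);
    CREATE TABLE event_records (event_header_session_id TEXT, payload TEXT);
""")


def insert_event(conn, session_id, payloads):
    """Insert the header, then its records only if the header was new."""
    with conn:  # one transaction: both inserts commit or roll back together
        cur = conn.execute(
            "INSERT OR IGNORE INTO event_headers (session_id) VALUES (?)",
            (session_id,),
        )
        if cur.rowcount == 0:  # header already existed, so skip the records
            return False
        conn.executemany(
            "INSERT INTO event_records (event_header_session_id, payload) "
            "VALUES (?, ?)",
            [(session_id, p) for p in payloads],
        )
        return True


first = insert_event(conn, "89131", ["a", "b"])   # new header: records inserted
second = insert_event(conn, "89131", ["a", "b"])  # duplicate: nothing inserted
record_count = conn.execute("SELECT COUNT(*) FROM event_records").fetchone()[0]
```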
Instead of just using VALUES, use a select:
INSERT INTO `event_records` (`event_header_session_id`, col_a, col_b, col_c )
SELECT event_header_session_id, col_a, col_b, col_c
FROM (
SELECT NULL event_header_session_id, NULL col_a, NULL col_b, NULL col_c WHERE 0
UNION ALL
VALUES
ROW('89131', 1,2,3 ),
ROW('89131', 2,3,4 ),
ROW('89191', 3,4,5 )
) new_rows
WHERE NOT EXISTS (
SELECT 1
FROM event_headers
WHERE event_headers.session_id=new_rows.event_header_session_id
);
The SELECT NULL...UNION ALL is the most portable way I know to name the columns of a VALUES table constructor. On MariaDB, omit the ROWs.

Update with Subquery never completes

I'm currently working on a project with a MySQL database of more than 8 million rows. I have been given part of it to test some queries on. It has around 20 columns, of which the following are of use to me: First_Name, Last_Name, Address_Line1, Address_Line2, Address_Line3, RefundID
I have to create a unique but random RefundID for each row; that is not the problem. The problem is creating the same RefundID for rows whose First_Name, Last_Name, Address_Line1, Address_Line2 and Address_Line3 are the same.
This is my first real work related to MySQL with such large row count. So far I have created these queries:
-- Creating Temporary Table --
CREATE temporary table tempT (SELECT tt.First_Name, count(tt.Address_Line1) as
a1, count(tt.Address_Line2) as a2, count(tt.Address_Line3) as a3, tt.RefundID
FROM `tempTable` tt GROUP BY First_Name HAVING a1 >= 2 AND a2 >= 2 AND a3 >= 2);
-- Updating Rows with First_Name from tempT --
UPDATE `tempTable` SET RefundID = FLOOR(RAND()*POW(10,11))
WHERE First_Name IN (SELECT First_Name FROM tempT WHERE First_Name is not NULL);
This update query keeps running and never finishes; tempT has more than 30K rows. This query will then be run on the main DB with more than 800K rows.
Can someone help me out with this?
Regards
The solutions that seem obvious to me....
Don't use a random value - use a hash:
UPDATE yourtable
SET refundid = MD5(CONCAT_WS('|', 'some static salt', First_Name
, Last_Name, Address_Line1, Address_Line2, Address_Line3)); -- MD5() takes a single string; CONCAT_WS skips NULLs
The problem is that if you are using an integer value for the refundId, you have to truncate the hash, which raises the chance of a collision (hint: CONV(SUBSTR(MD5(...),1,16),16,10) yields a 64-bit value that fits an unsigned BIGINT). But you didn't say what the type of the field was, nor how strict the 'unique' requirement was. It does carry out the update in a single pass, though.
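To illustrate why a hash gives identical rows the same ID without any lookup, here is a minimal sketch in Python. The `refund_id` helper, the `|` separator and the sample names are assumptions for illustration, and the values are not guaranteed to match what MySQL would compute for the same row.

```python
import hashlib


def refund_id(first_name, last_name, addr1, addr2, addr3, salt="some static salt"):
    """Derive a deterministic 64-bit integer from the identifying fields."""
    # Treat missing values as empty strings and join with a separator that is
    # assumed not to occur in the data, so field boundaries stay unambiguous.
    fields = [salt, first_name, last_name, addr1, addr2, addr3]
    joined = "|".join("" if f is None else str(f) for f in fields)
    digest = hashlib.md5(joined.encode("utf-8")).hexdigest()
    # Same idea as CONV(SUBSTR(MD5(...), 1, 16), 16, 10): the first 16 hex
    # digits interpreted as an integer.
    return int(digest[:16], 16)


a = refund_id("Ann", "Lee", "1 Main St", None, None)
b = refund_id("Ann", "Lee", "1 Main St", None, None)  # same person, same ID
c = refund_id("Bob", "Lee", "1 Main St", None, None)  # different person
```

Because the ID depends only on the field values, every duplicate row gets the same value in a single pass; the trade-off is that uniqueness across different people is probabilistic rather than guaranteed.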
An alternate approach which creates a densely packed sequence of numbers is to create a temporary table with the unique values from the original table and a random value. Order by the random value and set a monotonically increasing refundId - then use this as a look up table or update the original table:
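-- Distinct identifying values plus a random sort key and an empty refundId
CREATE TEMPORARY TABLE temptable AS
SELECT d.*, RAND() AS randomvalue, CAST(NULL AS UNSIGNED) AS refundId
FROM (SELECT DISTINCT First_Name
, Last_Name, Address_Line1, Address_Line2, Address_Line3
FROM yourtable) d;
-- User variables are written @name in MySQL
SET @counter = -1;
UPDATE temptable t SET t.refundId = (@counter := @counter + 1)
ORDER BY t.randomvalue;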
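The lookup-table approach boils down to three steps: collect the distinct key combinations, put them in random order, and number them consecutively. A minimal sketch of that logic in Python, where `assign_dense_ids` and the sample rows are hypothetical:

```python
import random


def assign_dense_ids(rows, key_fields, seed=None):
    """Return {key tuple: id} with ids 0..n-1 assigned in random order."""
    # Distinct key tuples, like SELECT DISTINCT over the identifying columns.
    keys = list(dict.fromkeys(tuple(row[f] for f in key_fields) for row in rows))
    # Shuffling plays the role of ORDER BY randomvalue.
    random.Random(seed).shuffle(keys)
    # Consecutive numbering plays the role of the incrementing counter.
    return {key: i for i, key in enumerate(keys)}


rows = [
    {"First_Name": "Ann", "Last_Name": "Lee"},
    {"First_Name": "Ann", "Last_Name": "Lee"},  # duplicate of the first row
    {"First_Name": "Bob", "Last_Name": "Ray"},
]
lookup = assign_dense_ids(rows, ["First_Name", "Last_Name"], seed=1)
```

Unlike the hash approach, the IDs here are guaranteed unique and gap-free, at the cost of building and keeping the lookup table.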
There are other solutions too - but the more efficient ones rely on having multiple copies of the data and/or using a procedural language.
Try using the following:
UPDATE `tempTable` x SET RefundID = FLOOR(RAND()*POW(10,11))
WHERE exists (SELECT 1 FROM tempT y WHERE First_Name is not NULL and x.First_Name=y.First_Name);
In MySQL, it is often more efficient to use join with update than to filter through the where clause using a subquery. The following might perform better:
UPDATE `tempTable` join
(SELECT distinct First_Name
FROM tempT
WHERE First_Name is not NULL
) fn
on `tempTable`.First_Name = fn.First_Name
SET RefundID = FLOOR(RAND()*POW(10,11));

INSERT INTO with SubQuery MySQL

I have this Statement:
INSERT INTO qa_costpriceslog (item_code, invoice_code, item_costprice)
VALUES (1, 2, (SELECT item_costprice FROM qa_items WHERE item_code = 1));
I'm trying to insert a row that copies the item_costprice value from qa_items, but I get this error:
Error Code: 1136. Column count doesn't match value count at row 1
How can I solve this?
Use numeric literals with aliases inside a SELECT statement. No () are necessary around the SELECT component.
INSERT INTO qa_costpriceslog (item_code, invoice_code, item_costprice)
SELECT
/* Literal number values with column aliases */
1 AS item_code,
2 AS invoice_code,
item_costprice
FROM qa_items
WHERE item_code = 1;
Note that in context of an INSERT INTO...SELECT, the aliases are not actually necessary and you can just SELECT 1, 2, item_costprice, but in a normal SELECT you'll need the aliases to access the columns returned.
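As a quick check of the INSERT INTO...SELECT form, here is a sketch that runs it from Python against an in-memory SQLite database, used only as a stand-in for MySQL. The column types and the sample cost price are assumptions; the literals 1 and 2 are passed as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qa_items (item_code INTEGER PRIMARY KEY, item_costprice REAL);
    CREATE TABLE qa_costpriceslog (
        item_code INTEGER, invoice_code INTEGER, item_costprice REAL
    );
    INSERT INTO qa_items VALUES (1, 9.5);
""")

# Two bound literals plus one column copied from qa_items, no VALUES clause.
conn.execute(
    "INSERT INTO qa_costpriceslog (item_code, invoice_code, item_costprice) "
    "SELECT ?, ?, item_costprice FROM qa_items WHERE item_code = ?",
    (1, 2, 1),
)
logged = conn.execute("SELECT * FROM qa_costpriceslog").fetchall()
```

If no row in qa_items matches the WHERE clause, the SELECT returns nothing and no row is inserted, which differs from the VALUES form with a scalar subquery.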
You can simply do this, e.g.:
INSERT INTO modulesToSections (fk_moduleId, fk_sectionId, `order`) VALUES
((SELECT id FROM modules WHERE title="Top bar"),0,-100);
I was disappointed at the "all or nothing" answers. I needed (again) to INSERT some data and SELECT an id from an existing table.
INSERT INTO table1 (id_table2, name) VALUES ((SELECT id FROM table2 LIMIT 1), 'Example');
The sub-select in an INSERT query should be wrapped in parentheses, in addition to the commas used as delimiters.
For those having trouble with using a SELECT within an INSERT I recommend testing your SELECT independently first and ensuring that the correct number of columns match for both queries.
Your insert statement contains too many columns on the left-hand side or not enough columns on the right hand side. The part before the VALUES has 7 columns listed, but the second part after VALUES only has 3 columns returned: 1, 2, then the sub-query only returns 1 column.
EDIT: Well, it did before someone modified the query....
As a sidenote to the good answer of Michael Berkowski:
You can also dynamically add fields (or have them prepared if you're working with PHP scripts) like so:
INSERT INTO table_a(col1, col2, col3)
SELECT
col1,
col2,
CURRENT_TIMESTAMP()
FROM table_B
WHERE b.col1 = a.col1;
If you need to transfer without adding new data, you can use NULL as a placeholder.
If you have multiple string values you want to add, you can put them into a temporary table and then cross join it with the value you want.
-- Create temp table
CREATE TEMPORARY TABLE NewStrings (
NewString VARCHAR(50)
);
-- Populate temp table
INSERT INTO NewStrings (NewString) VALUES ('Hello'), ('World'), ('Hi');
-- Insert desired rows into permanent table
INSERT INTO PermanentTable (OtherID, NewString)
WITH OtherSelect AS (
SELECT OtherID AS OtherID FROM OtherTable WHERE OtherName = 'Other Name'
)
SELECT os.OtherID, ns.NewString
FROM OtherSelect os, NewStrings ns;
This way, you only have to define the strings in one place, and you only have to do the query in one place. If you used subqueries like I initially did and like Elendurwen and John suggest, you have to type the subquery into every row. But using temporary tables and a CTE in this way, you can write the query only once.

Execute INSERT if table is empty?

Is there a way to do an insert under a count condition, something like:
INSERT INTO my_table (colname) VALUES('foo') IF COUNT(my_table) < 1
Basically I want to insert a single default record if the table is currently empty. I'm using mysql.
Use SELECT instead of VALUES to be able to expand the query with a WHERE clause.
EXISTS is a better & faster test than COUNT
INSERT INTO my_table (colname)
SELECT 'foo'
WHERE NOT EXISTS (SELECT * FROM my_table)
One way would be to place a unique key on a column. Then execute a REPLACE:
REPLACE [LOW_PRIORITY | DELAYED]
[INTO] tbl_name [(col_name,...)]
{VALUES | VALUE} ({expr | DEFAULT},...),(...),...
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted
This is easier to read:
INSERT INTO my_table (colname)
SELECT 'foo' FROM DUAL
WHERE NOT EXISTS (SELECT * FROM my_table);
The lack of a VALUES clause is made up for by the SELECT FROM DUAL, which provides the values. The FROM DUAL is not always required, but it doesn't hurt to include it for those unusual configurations where it is required (like the installation of Percona I am using).
The NOT EXISTS is faster than doing a count which can be slow on a table with a large number of rows.
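The effect of the NOT EXISTS guard is easy to verify by running the statement twice. The sketch below does so from Python with the built-in sqlite3 module as a stand-in for MySQL (an assumption; SQLite has no DUAL, so the form without FROM is used).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (colname TEXT)")

seed_default = (
    "INSERT INTO my_table (colname) "
    "SELECT 'foo' WHERE NOT EXISTS (SELECT * FROM my_table)"
)
conn.execute(seed_default)  # table is empty, so the default row is inserted
conn.execute(seed_default)  # table is no longer empty, so nothing is inserted

row_count = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
```

After both executions the table holds exactly one row, so the statement is safe to run on every startup.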

mysql - union with creating demarcated field

I need to UNION two tables while adding a new field that is 1 for rows from the first table and 2 for rows from the second.
I tried
(
SELECT field, 1 AS tmp
FROM table1
)
UNION
(
SELECT field, 2 AS tmp
FROM table2
)
But in the result, the tmp field was full of "1".
How can this be implemented?
Your query should work fine. The only thing you should change is UNION should be UNION ALL to give better performance. Without the ALL it defaults to UNION DISTINCT which causes the rows to be compared for duplicates*, but the way you have constructed them guarantees that there cannot be duplicates so this extra check is a waste of time. Here is some test code I used to verify that what you are doing ought to work:
CREATE TABLE table1 (field NVARCHAR(100) NOT NULL);
INSERT INTO table1 (field) VALUES
('foo1'),
('bar1'),
('baz1');
CREATE TABLE table2 (field NVARCHAR(100) NOT NULL);
INSERT INTO table2 (field) VALUES
('foo2'),
('bar2'),
('baz2');
SELECT field, 1 AS tmp
FROM table1
UNION ALL
SELECT field, 2 AS tmp
FROM table2
Result:
'foo1', 1
'bar1', 1
'baz1', 1
'foo2', 2
'bar2', 2
'baz2', 2
If you only get rows where tmp was equal to 1, maybe your table2 was empty?
*See the documentation for UNION.
The default behavior for UNION is that duplicate rows are removed from the result. The optional DISTINCT keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements.
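The difference between the two forms shows up as soon as one table contains repeated values. The sketch below runs both variants from Python against an in-memory SQLite database, used here as a stand-in for MySQL (an assumption), with hypothetical sample data in which table1 holds the same value twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field TEXT NOT NULL);
    CREATE TABLE table2 (field TEXT NOT NULL);
    INSERT INTO table1 (field) VALUES ('foo'), ('foo');
    INSERT INTO table2 (field) VALUES ('foo');
""")

tagged = "SELECT field, 1 AS tmp FROM table1 {op} SELECT field, 2 AS tmp FROM table2"
union_rows = conn.execute(tagged.format(op="UNION")).fetchall()
union_all_rows = conn.execute(tagged.format(op="UNION ALL")).fetchall()
```

The tag column keeps rows from the two tables apart in both cases, but plain UNION also collapses the repeated 'foo' within table1, whereas UNION ALL returns all three rows. So besides being faster, UNION ALL is the safer choice when every source row should be kept.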
You are very close:
CREATE TABLE YourNewTable AS
SELECT field, 1 AS tmp
FROM table1
UNION ALL
SELECT field, 2 AS tmp
FROM table2