How can I increase insert speed? - mysql

I need to import data from an external web service into my MySQL (5.7) database.
The problem is that I need to split the data into two tables. For example, I have the tables
CREATE TABLE a (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100)
);
CREATE TABLE b (
id INT PRIMARY KEY AUTO_INCREMENT,
a_id INT,
name VARCHAR(100)
);
Now I have to insert multiple rows into table b for one row in table a (1:n)
As I do not know the id of table a before inserting it, the only way is to insert one row in table a, get the last id and then insert all connected entries to table b.
But my database is very slow when I insert row by row. It takes more than 1 hour to insert about 35000 rows into table a and 120000 into table b. If I do a batch insert of about 1000 rows at a time into table a (just for testing, without filling table b), it is incredibly faster (less than 3 minutes).
I guess there must be a way to speed up my import.
Thanks for your help

I presume you are working with a programming language driving your inserts. You need to be able to program this sequence of operations.
First, you need to use this sequence to put a row into a and dependent rows into b. It uses LAST_INSERT_ID() to handle a_id. That's faster and much more robust than querying the table to find the correct id value.
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
The trick is to capture the a.id value in the session variable @a_id, and then reuse it for each dependent INSERT. (I have turned you into an aristocrat to illustrate this, sorry :-)
Second, you should keep this in mind: INSERTs are cheap, but transaction COMMITs are expensive. That's because InnoDB, with its default durability settings, flushes its log to disk on every COMMIT. Unless you manage your transactions explicitly, the DBMS uses a feature called "autocommit" in which it immediately commits each INSERT (or UPDATE or DELETE).
Fewer transactions get you better speed. Therefore, to improve bulk-loading performance you want to bundle together 100 or so INSERTs into a single transaction. (The exact number doesn't matter very much.) You can do something like this:
START TRANSACTION; /* start an insertion bundle */
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
INSERT INTO a (name) VALUES ('Oliver');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Jones');
... more INSERT operations ...
INSERT INTO a (name) VALUES ('Jeff');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Atwood');
COMMIT; /* commit the bundle */
START TRANSACTION; /* start the next bundle */
INSERT INTO a (name) VALUES ('Joel');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Spolsky');
... more INSERT operations ...
COMMIT; /* finish the bundle */
(All this, except LAST_INSERT_ID(), works on any SQL-based RDBMS. Each make of RDBMS has its own way of handling IDs.)
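Seen from the driving program, the bundling idea might look like the following minimal sketch. It uses Python's standard-library sqlite3 module with an in-memory database purely so that it runs without a server; that choice, the helper name load_in_bundles, and the record format are assumptions for illustration, not part of the original setup. With a MySQL driver the structure should be similar, with the cursor's lastrowid playing the role of LAST_INSERT_ID().

```python
import sqlite3

def load_in_bundles(con, records, bundle_size=100):
    """Insert (a_name, [b_names]) records, committing once per bundle.

    Hypothetical helper: `records` and `bundle_size` are illustrative names.
    """
    cur = con.cursor()
    pending = 0
    for a_name, b_names in records:
        cur.execute("INSERT INTO a (name) VALUES (?)", (a_name,))
        a_id = cur.lastrowid  # plays the role of LAST_INSERT_ID()
        cur.executemany(
            "INSERT INTO b (a_id, name) VALUES (?, ?)",
            [(a_id, b_name) for b_name in b_names],
        )
        pending += 1
        if pending >= bundle_size:
            con.commit()  # one COMMIT per bundle, not per row
            pending = 0
    con.commit()  # commit the final, possibly partial, bundle

# SQLite in memory stands in for MySQL here (an assumption for runnability).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
con.execute(
    "CREATE TABLE b (id INTEGER PRIMARY KEY AUTOINCREMENT, a_id INTEGER, name TEXT)"
)

records = [
    ("Claus", ["von", "Bönnhoff"]),
    ("Oliver", ["Jones"]),
    ("Jeff", ["Atwood"]),
]
load_in_bundles(con, records, bundle_size=2)
```

The id is captured right after the parent INSERT and before the child rows are written, which mirrors the SET @a_id step above.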

Related

How to increment id without auto increment?

I have a table with a numeric id column whose values have meaning. Different types of accounts start from different ranges. E.g. Organisation 10000 <-> 100000, users 1000000 <-> 1kk. How can I properly increment ids on insert (with possible concurrency problems)?
If you were doing this in Oracle's table server, you would use different SEQUENCE objects for each type of account.
The MariaDB fork of MySQL has a similar kind of SEQUENCE object, as does PostgreSQL. So if you were using MariaDB you would do something like this.
CREATE SEQUENCE IF NOT EXISTS org_account_id MINVALUE=10000 MAXVALUE=999999;
CREATE SEQUENCE IF NOT EXISTS user_account_id MINVALUE=1000000;
Then to use a sequence in place of autoincrement you'll do something like this.
INSERT INTO tbl (id, col1, col2)
VALUES (NEXTVAL(user_account_id), something, something);
In MySQL you can emulate sequence objects with dummy tables containing autoincrement ids. It's a kludge. Create the following table (one for each sequence).
CREATE TABLE user_account_id (
sequence_id BIGINT NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`sequence_id`)
);
ALTER TABLE user_account_id AUTO_INCREMENT=1000000;
Then issue these queries one after the other to insert a row with a unique user id.
INSERT INTO user_account_id () VALUES ();
DELETE FROM user_account_id WHERE sequence_id < LAST_INSERT_ID();
SET @id:=LAST_INSERT_ID();
INSERT INTO tbl (id, col1, col2)
VALUES (@id, something, something);
After your insert into the dummy table, LAST_INSERT_ID() returns a unique id. The DELETE query merely keeps this dummy table from taking up too much space.
I recommend that you use a normal sequence-based bigint column. Then, on SELECT, add the base for the appropriate account type to the column.
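That last recommendation could be sketched roughly as below: the table keeps a plain auto-increment id, and the ranged id is derived from it when reading. The base values are taken from the ranges in the question; the names ACCOUNT_BASE, public_id, and stored_id are hypothetical.

```python
# Base offsets per account type, taken from the ranges in the question.
ACCOUNT_BASE = {"organisation": 10_000, "user": 1_000_000}

def public_id(account_type: str, raw_id: int) -> int:
    """Map a plain auto-increment id to the ranged id shown to callers."""
    return ACCOUNT_BASE[account_type] + raw_id

def stored_id(account_type: str, ranged_id: int) -> int:
    """Inverse mapping, for looking a row up by its ranged id."""
    return ranged_id - ACCOUNT_BASE[account_type]
```

This sketch does not check that a derived id stays below the upper end of its range; that check would have to be added if the ranges must never overlap.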
PreparedStatement ps = con.prepareStatement("insert into emp(emp_code,emp_name,join_date,designation,birth_date,gender,mobile) values((select max(emp_code)+1 from emp),?,?,?,?,?,?)");
This query may help, but note that max(emp_code)+1 is not safe when inserts run concurrently.

Mysql insert to main table with auto increment and link data in other table to the inserted main

I am not sure if the title explains it well, But I am sure that my explanation will explain it better:
I have a table called Tracks and a table called Flocks.
Each Track has many Flocks in it.
So when I insert a new track, a new ID is created with the AUTO_INCREMENT function, and in the same query, I want to insert the track's flocks as well. But in order to make these flocks belong to the track I just inserted, I have to set their track_id to the auto-incremented value.
I can do this in 3 queries, insert the Track, fetch the incremented ID, and then insert all flocks with the ID.
But I want to do this in one query, is that possible?
You need at least two queries unless you go for triggers or stored procedures:
Insert the track
use last_insert_id() as foreign key value to insert into flocks
example:
insert into track (name) values ('Trackname');
insert into flocks (trackid) select last_insert_id();
I normally group such tasks together in a stored procedure:
DELIMITER $$
create procedure createTrack ( p_trackname varchar(20) )
begin
insert into track (name) values (p_trackname);
insert into flocks (trackid) select last_insert_id();
end$$
DELIMITER ;
And then call it this way:
call createTrack("Trackname");

how to use primary key as custId number in second table

I have two tables, the first has an auto incrementing ID number, I want to use that as custId in the second table.
I am using an insert into the first table with all the basic info: name, address, etc. The second table holds only 3 things: custId, stocknum, and location. How can I write to these two tables more or less simultaneously, given that stockNum may have several values but is always attached to one custId? I hope this makes sense even without putting code in here.
You can't insert into multiple tables at the same time. You have two options. You either do two inserts
INSERT INTO table1 (col1, col2) VALUES ('value1',value2);
/* Gets the id of the new row and inserts into the other table */
INSERT INTO table2 (cust_id, stocknum, location) VALUES (LAST_INSERT_ID(), 'value3', 'value4');
Or you can use a post-insert trigger
CREATE TRIGGER table2_auto AFTER INSERT ON `table1`
FOR EACH ROW
BEGIN
INSERT INTO table2 (cust_id, stocknum, location) VALUES (NEW.id, 'value3', 'value4');
END
Hope this helps.
After you insert into the first table, the identity (auto-increment) field generates an ID.
Get this ID with LAST_INSERT_ID() in MySQL.
Then use this ID to store the values in the other table.

Insert only when auto-increment id is not equal 6(for example)?

I have a table with 3 fields: Id(PK,AI), Name(varchar(36)), LName(varchar(36)).
I have to insert the name and last name; Id is inserted automatically because of its constraints.
Is there a way to make the auto-increment id jump over a value when it reaches 6?
for instance do this 7 times:
Insert Into table(Name, LName) Values ('name1', 'lname1') "And jump id to 7 if it is going to be 6"
It may sound stupid to do this but I have the doubt.
Also Jump and do not record id 6.
record only, 1-5, 7,8,9 and so on
What I want to achieve starts from a Union:
Select * From TableNames
Union All
Select * From TableNames_general
In TableNames_general I assign its first value myself, so that when the user sees the table for the first time, the record I inserted is displayed.
The problem comes when the user inserts a new record: if the Id of the inserted record is the same as the one I have inserted, it will be duplicated. That is why, when the user inserts a record and the last insert id already exists, I want to skip that id. I must have different ids because of the relationships with child tables.
An identity column generates values for you, and it is best left that way. You have the ability to insert specific values into an identity column, but it is better to leave it alone and let it generate the values.
Imagine you have inserted a value explicitly into an identity column, and later the identity column generates the same value: you will end up with duplicates.
If you want to supply your own input for that column, then why bother with an identity column at all?
Well this is not the best practice but you can jump to a specific number by doing as follows:
MS SQL SERVER 2005 and Later
-- Create test table
CREATE TABLE ID_TEST(ID INT IDENTITY(1,1), VALUE INT)
GO
-- Insert values
INSERT INTO ID_TEST (VALUE) VALUES
(1),(2),(3)
GO
-- Set identity insert on to insert values explicitly in identity column
SET IDENTITY_INSERT ID_TEST ON;
INSERT INTO ID_TEST (ID, VALUE) VALUES
(6, 6),(8,8),(9,9)
GO
-- Set identity insert off
SET IDENTITY_INSERT ID_TEST OFF;
GO
-- First reseed the identity column to a value smaller than any in your table
-- (below I reseeded it to 0)
DBCC CHECKIDENT ('ID_TEST', RESEED, 0);
-- Execute the same command without any seed value; it will reset it to the
-- next highest identity value
DBCC CHECKIDENT ('ID_TEST', RESEED);
GO
-- final insert
INSERT INTO ID_TEST (VALUE) VALUES
(10)
GO
-- now select data from table and see the gap
SELECT * FROM ID_TEST
If you query the database to get the last inserted ID, then you can check if you need to increment it, by using a parameter in the query to set the correct ID.
If you use MSSQL, you can do the following:
Before you insert check for the current ID, if it's 5, then do the following:
Set IDENTITY_INSERT to ON
Insert your data with ID = 7
Set IDENTITY_INSERT to OFF
Also you might get away with the following scenario:
check for current ID
if it's 5, run DBCC CHECKIDENT (Table, reseed, 6), it will reseed the table and in this case your next identity will be 7
If you're checking for the current identity just after an INSERT, you can use SELECT @@IDENTITY, or SELECT SCOPE_IDENTITY() for better results (as rcdmk pointed out in comments)
Otherwise you can just use select: SELECT MAX(Id) FROM Table
There's no direct way to influence the AUTO_INCREMENT to "skip" a particular value, or values on a particular condition.
One option might seem to be an AFTER INSERT trigger, but an AFTER INSERT trigger can't update the values of the row that was just inserted, and I don't think it can make any modifications to the table affected by the statement that fired the trigger.
A BEFORE INSERT trigger won't work either, because the value assigned to an AUTO_INCREMENT column is not available in a BEFORE INSERT trigger.
I don't believe there's a way to get SQL Server IDENTITY to "skip" a particular value either.
UPDATE
If you need "unique" id values between two tables, there's a rather ugly workaround with MySQL: roll your own auto_increment behavior using triggers and a separate table. Rather than defining your tables with AUTO_INCREMENT attribute, use a BEFORE INSERT trigger to obtain a value.
If an id value is supplied, and it's larger than the current maximum value from the auto_increment column in the dummy auto_increment_seq table, we'd need to either update that row, or insert a new one.
As a rough outline:
CREATE TABLE auto_increment_seq
(id INT NOT NULL PRIMARY KEY AUTO_INCREMENT) ENGINE=MyISAM;
DELIMITER $$
CREATE TRIGGER TableNames_bi
BEFORE INSERT ON TableNames
FOR EACH ROW
BEGIN
DECLARE li_new_id INT UNSIGNED;
DECLARE li_max_seq INT UNSIGNED;
IF ( NEW.id = 0 OR NEW.id IS NULL ) THEN
INSERT INTO auto_increment_seq (id) VALUES (NULL);
SELECT LAST_INSERT_ID() INTO li_new_id;
SET NEW.id = li_new_id;
ELSE
SELECT MAX(id) INTO li_max_seq FROM auto_increment_seq;
IF ( NEW.id > li_max_seq ) THEN
INSERT INTO auto_increment_seq (id) VALUES (NEW.id);
END IF;
END IF;
END$$
CREATE TRIGGER TableNames_ai
AFTER INSERT ON TableNames
FOR EACH ROW BEGIN
DECLARE li_max_seq INT UNSIGNED;
SELECT MAX(id) INTO li_max_seq FROM auto_increment_seq;
IF ( NEW.id > li_max_seq ) THEN
INSERT INTO auto_increment_seq (id) VALUES (NEW.id);
END IF;
END$$
DELIMITER ;
The id column in the table could be defined something like this:
TableNames
( id INT UNSIGNED NOT NULL DEFAULT 0 PRIMARY KEY
COMMENT 'populated from auto_increment_seq.id'
, ...
You could create an identical trigger for the other table as well, so the two tables are effectively sharing the same auto_increment sequence. (With less efficiency and concurrency than an Oracle SEQUENCE object would provide.)
IMPORTANT NOTES
This doesn't really ensure that the id values between the tables are actually kept unique. That would really require a query of the other table to see if the id value exists or not; and if running with the InnoDB engine, in the context of some transaction isolation levels, we might be querying a stale (as in, consistent from the point in time at the start of the transaction) version of the other table.
And absent some additional (concurrency-killing) locking, the approach outlined above is subject to a small window of opportunity for a "race" condition with concurrent inserts: the SELECT MAX() from the dummy seq table, followed by the INSERT, allows a small window for another transaction to also run a SELECT MAX() and return the same value. The best we can hope for (I think) is for an error to be thrown due to a duplicate key exception.
This approach requires the dummy "seq" table to use the MyISAM engine, so we can get an Oracle-like AUTONOMOUS TRANSACTION behavior; if inserts to the real tables are performed in the context of a REPEATABLE READ or SERIALIZABLE transaction isolation level, reads of the MAX(id) from the seq table would be consistent from the snapshot at the beginning of the transaction, we wouldn't get the newly inserted (or updated) values.
We'd also really need to consider the edge case of an UPDATE of a row changing the id value; to handle that case, we'd need BEFORE/AFTER UPDATE triggers as well.
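The shared-sequence idea can also be handled in application code instead of triggers. The sketch below shows that variant, not the trigger approach outlined above: a hypothetical next_shared_id helper draws every id from one sequence table and uses it for either table. SQLite in memory stands in for MySQL only so the example runs as-is; the id_seq table, the name column, and the helper names are assumptions.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (an assumption for runnability).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE id_seq (id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE TableNames (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TableNames_general (id INTEGER PRIMARY KEY, name TEXT);
""")

def next_shared_id(con):
    """Allocate one id from the shared sequence table."""
    cur = con.execute("INSERT INTO id_seq DEFAULT VALUES")
    return cur.lastrowid

def insert_named(con, table, name):
    """Insert into one of the two known tables using a shared id."""
    if table not in ("TableNames", "TableNames_general"):
        raise ValueError("unexpected table name")
    new_id = next_shared_id(con)
    con.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (new_id, name))
    con.commit()
    return new_id

first = insert_named(con, "TableNames_general", "default row")
second = insert_named(con, "TableNames", "user row")
```

Because both tables take their ids from the same sequence, the UNION ALL of the two tables should not contain a repeated id, as long as every insert goes through the helper.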

How to avoid this kind of duplicate?

This is my table for many to many relationship:
Related:
-id
-id_postA
-id_postB
I want this:
If for example there is a row with id_postA = 32 and id_postB = 67
then it must ignore the insertion of a row with id_postA = 67 AND id_postB = 32.
One option would be to create a unique index on both columns:
CREATE UNIQUE INDEX uk_related ON related (id_postA, id_postB);
And then prevent "duplicates by order inversion" using a trigger, ordering id_postA and id_postB on INSERT and UPDATE:
CREATE TRIGGER order_uk_related
BEFORE INSERT -- Duplicate this trigger also for UPDATE
ON related -- As MySQL doesn't support INSERT OR UPDATE triggers
FOR EACH ROW
BEGIN
DECLARE low INT;
DECLARE high INT;
SET low = LEAST(NEW.id_postA, NEW.id_postB);
SET high = GREATEST(NEW.id_postA, NEW.id_postB);
SET NEW.id_postA = low;
SET NEW.id_postB = high;
END;
With this trigger in place, the fourth insert below will fail, as (2, 1) has already been switched to (1, 2) by the trigger:
INSERT INTO related VALUES (1, null, null);
INSERT INTO related VALUES (2, null, null);
INSERT INTO related VALUES (3, 2, 1);
INSERT INTO related VALUES (4, 1, 2);
Function-based indexes
In some other databases, you might be able to use a function-based index. Unfortunately, this is not possible in MySQL before version 8.0.13 (see "Is it possible to have function-based index in MySQL?"). If this were an Oracle question, you'd write:
CREATE UNIQUE INDEX uk_related ON related (
LEAST(id_postA, id_postB),
GREATEST(id_postA, id_postB)
);
You can include a WHERE clause like this, for example:
insert into table_name
(id_postA
,id_postB)
select
col1,
col2
from table_1
where (cast(col1 as varchar)+'~'+cast(col2 as varchar))
not in (select cast(id_postB as varchar)+'~'+cast(id_postA as varchar) from table_name)
If you always insert these with A < B, you won't have to worry about the reverse being inserted. This can be done with a simple sort, or a quick comparison before inserting.
Join tables like this are by their very nature uni-directional. There is no automatic method for detecting the reverse join and blocking it with a simple UNIQUE index.
Normally what you'd do, though, is insert in pairs:
INSERT INTO related (id_postA, id_postB) VALUES (3,4),(4,3);
If this insert fails, then one or both of those links is already present.
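For the alternative of always storing the smaller id first, a minimal sketch of the "quick comparison before inserting" could look like this. It assumes a unique index on (id_postA, id_postB) as suggested earlier; the helper name add_relation is hypothetical, and SQLite in memory stands in for MySQL only so the example runs as-is.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (an assumption for runnability).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE related (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        id_postA INTEGER NOT NULL,
        id_postB INTEGER NOT NULL,
        UNIQUE (id_postA, id_postB)
    )
""")

def add_relation(con, post_x, post_y):
    """Store the pair smaller-id-first; return False if it already exists."""
    low, high = sorted((post_x, post_y))
    try:
        con.execute(
            "INSERT INTO related (id_postA, id_postB) VALUES (?, ?)", (low, high)
        )
        con.commit()
        return True
    except sqlite3.IntegrityError:
        con.rollback()  # the unique index rejected the duplicate pair
        return False

print(add_relation(con, 32, 67))  # True
print(add_relation(con, 67, 32))  # False: same pair, reversed
```

Sorting in the application keeps the table free of triggers, at the cost of relying on every writer to go through the same helper.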