MySQL Bulk Insert Dependent on Another Table - mysql

I have a case where I'm doing two queries: query1 is a bulk INSERT ... ON DUPLICATE KEY UPDATE on table1. For query2, I want to do another bulk INSERT on table2 with some application data along with using the ids inserted/updated from query1. I know I can do this with an intermediate query, selecting the ids I need from table1 and then inserting them into table2 along with application data, but I really want to avoid the extra network back-and-forth of that query along with the db overhead. Is there any way I can either get the ids inserted/updated from query1 when running that, or do some kind of complex, but relatively less expensive INSERT ... SELECT FROM in query2 to avoid this?
As far as I know, getting ids added/modified returned from query1 is impossible without a separate query, and I can't think of a way to batch INSERT ... SELECT FROM where the insertion values for each row are dependent on the selected value, but I'd love to be proven wrong, or shown a way around either of those.

There is no way to get a set of IDs as a result of a bulk INSERT.
One option you have is indeed to run a SELECT query to get the IDs and use them in the second bulk INSERT. But that's a hassle.
Another option is to run the 2nd bulk INSERT into a temporary table, let's call it table3, then use INSERT INTO table2 ... SELECT FROM ... table1 JOIN table3 ...
With a similar use case we eventually found that this is the fastest option, given that you index table3 correctly.
Note that in this case you don't have a SELECT that you need to loop over in your code, which is nice.
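
For what it's worth, here is a minimal sketch of the staging-table idea. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the column names (natural_key, payload, app_data) and the helper insert_dependent_rows are made up for illustration, and query1 is assumed to have already run against table1.

```python
import sqlite3

def insert_dependent_rows(conn, staged_rows):
    """Stage (natural_key, app_data) pairs, then copy them into table2
    together with the matching ids from table1, in a single INSERT ... SELECT."""
    conn.execute("DELETE FROM table3")
    conn.executemany(
        "INSERT INTO table3 (natural_key, app_data) VALUES (?, ?)", staged_rows
    )
    conn.execute("""
        INSERT INTO table2 (table1_id, app_data)
        SELECT t1.id, t3.app_data
        FROM table1 AS t1
        JOIN table3 AS t3 ON t3.natural_key = t1.natural_key
    """)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, natural_key TEXT UNIQUE, payload TEXT);
    CREATE TABLE table2 (table1_id INTEGER, app_data TEXT);
    -- table3 is the temporary staging table, indexed on the join column
    CREATE TEMP TABLE table3 (natural_key TEXT, app_data TEXT);
    CREATE INDEX idx_table3_key ON table3 (natural_key);
""")
# Stand-in for the state of table1 after query1 (hypothetical sample data)
conn.executemany(
    "INSERT INTO table1 (natural_key, payload) VALUES (?, ?)",
    [("a", "x"), ("b", "y")],
)
insert_dependent_rows(conn, [("a", "app-a"), ("b", "app-b")])
result = conn.execute(
    "SELECT table1_id, app_data FROM table2 ORDER BY table1_id"
).fetchall()
```

The application never sees the ids: the join resolves them inside the database, which is why there is no result set to loop over.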

Related

Insert and Select query at the same time

I have two tables in MS Access
Table1 (name, carname)
Table2 (carname, agency)
I executed the following command
st.executeUpdate("insert into Table1 values('"+name+"','"+carname+"')");
and the value is inserted.
At the same time, if the carname in Table2 is the same as the one the user gave, it has to select the agency. How can I write the query? (These two steps have to be done at the same time.)
You can't do them "at the same time"; Access isn't capable of multi-threading. The only way to do it is to run the statements in succession, one after the other. If that's the case, just put a second INSERT statement into an If/Then/Else block and only run it if it fulfills your desired criteria. I honestly don't understand the question; are you trying to match up the last record inserted? If so, just select the Max(PrimaryKey) and INNER JOIN it back to the table, and that will give you the record you last inserted.
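
A rough sketch of "run them in succession", assuming the goal is to insert the row and then look up the agency for the same carname. It uses Python's sqlite3 as a stand-in for Access, and the helper name insert_and_find_agency and the sample data are hypothetical. The ? placeholders also avoid building the SQL by string concatenation as in the question.

```python
import sqlite3

def insert_and_find_agency(conn, name, carname):
    """Insert into Table1, then look up the agency for the same carname.
    Returns the agency, or None if Table2 has no matching carname."""
    conn.execute(
        "INSERT INTO Table1 (name, carname) VALUES (?, ?)", (name, carname)
    )
    row = conn.execute(
        "SELECT agency FROM Table2 WHERE carname = ?", (carname,)
    ).fetchone()
    conn.commit()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (name TEXT, carname TEXT);
    CREATE TABLE Table2 (carname TEXT, agency TEXT);
    INSERT INTO Table2 VALUES ('Civic', 'City Motors');
""")
found = insert_and_find_agency(conn, "Ann", "Civic")
missing = insert_and_find_agency(conn, "Bob", "Unknown")
```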

Insert query vs select query performance mysql

I have executed an insert query as follows -
Insert into tablename
select
query1 union query2
Now if I execute the select part of this insert query, it takes around 2-3 minutes. However, the entire insert script is taking more than 8 minutes. As far as I know, the insert and the corresponding select should take almost the same time to execute.
So is there any other factor that could impact the execution time of the insert?
It's not correct that an insert and the corresponding select take the same time, and they should not!
The select query just "reads" the data and transmits it; if you are trying the query in an application (like phpMyAdmin), it is very likely to limit the query result in order to paginate it, so the select is faster (as it doesn't select all the data).
The insert query must read that data, insert it into the table, update the primary key tree, update every other index on that table, update any materialized view that depends on that table (in databases that have them), fire any trigger on that table/column, etc., so the insert performs far more work than a select.
So it is normal that the insert is slower than the select; how much slower depends on your tables and db structure.
You could optimize the insert with some db-specific options, for example you could read here for mysql, if you are on DB2 you could create a temp file then cpyf that into the real one, and so on...

Which option is most efficient to insert data to mysql db?

I have a mysql db with several tables, let's call them Table1, Table2, etc. I have to make several calls to each of these tables.
Which is most efficient,
a) Collecting all queries for each table in one message, then executing them separately, e.g.:
INSERT INTO TABLE1 VALUES (A,B);
INSERT INTO TABLE1 VALUES (A,B);
...execute
INSERT INTO TABLE2 VALUES (A,B);
INSERT INTO TABLE2 VALUES (A,B);
...execute
b) Collecting ALL queries in one long message (not in order of table), then executing this query, e.g.:
INSERT INTO TABLE1 VALUES (A,B);
INSERT INTO TABLE2 VALUES (B,C);
INSERT INTO TABLE1 VALUES (B,A);
INSERT INTO TABLE3 VALUES (D,B);
c) Something else?
Currently I am doing it like option (b), but I am wondering if there is a better way.
(I am using jdbc to access the db, in a groovy script).
Thanks!
Third option - using prepared statements.
Since you haven't posted your code, this is a bit of a wild guess, but this blog post shows great performance improvements using the groovy Sql.withBatch method.
The code they show (which uses sqlite) is reproduced here for posterity:
import groovy.sql.Sql

Sql sql = Sql.newInstance("jdbc:sqlite:/home/ron/Desktop/test.db", "org.sqlite.JDBC")
sql.execute("create table dummyTable(number)")
// Queue the inserts on one statement and send them to the database as a batch
sql.withBatch { stmt ->
    100.times {
        stmt.addBatch("insert into dummyTable(number) values(${it})")
    }
    stmt.executeBatch()
}
which inserts the numbers 0 to 99 into the table dummyTable.
This will obviously need tweaking to work with your unknown code.
Rather than looking at which is more efficient, first consider whether the tables are large and whether you need concurrency.
If they are (millions of records), then you may want to run the inserts statement by statement and leave some time between statements, so that you do not lock the table for too long at a time.
If your tables aren't that large or concurrency is not a problem, then by all means do whichever. You should look at the slow query log to see which statements are faster.
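
As a rough illustration of the batching idea with a parameterized statement (rather than one SQL string per row), here is a sketch in Python's sqlite3, used only as a runnable stand-in for JDBC against MySQL. The helper insert_batched and the sample rows are hypothetical; table names are interpolated into the SQL, so they must come from trusted code, not user input.

```python
import sqlite3

def insert_batched(conn, rows_by_table):
    """Group rows per table and send each group as one parameterized batch,
    all inside a single transaction."""
    with conn:  # commits on success, rolls back if any batch fails
        for table, rows in rows_by_table.items():
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE Table1 (a, b); CREATE TABLE Table2 (a, b);")
insert_batched(conn, {
    "Table1": [("A", "B"), ("B", "A")],
    "Table2": [("B", "C")],
})
counts = {
    t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
    for t in ("Table1", "Table2")
}
```

This is closest to option (a) in the question: the statement for each table is prepared once and only the values change from row to row.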

MySQL performance: nested insert/duplicate key vs multiple updates

Does anyone know what would be more efficient and use less resources:
Method 1-- Using a single SELECT statement to get data from one table and then iterating through it to execute multiple UPDATEs on another table. E.G. (pseudo-code, execute() runs query):
Query1_resultset = execute("SELECT item_id, sum(views) as view_count FROM tableA WHERE condition=1 GROUP BY item_id");
while(Query1_resultset as row) {
execute("UPDATE tableB SET view_count=row.view_count WHERE id=row.item_id");
}
Method 2-- Use a single INSERT.. ON DUPLICATE KEY UPDATE statement with a nested SELECT statement. E.G.:
INSERT INTO tableB (id, view_count) SELECT item_id, SUM(views) as view_count FROM tableA WHERE condition=1 GROUP BY item_id ON DUPLICATE KEY UPDATE view_count=VALUES(view_count);
Note: ID on tableB is a primary key. There actually won't be any INSERTS because I know the key will exist. So it's all UPDATEs. Just using this statement to pass in a single query rather than multiple.
I'm really curious as to why either would be more efficient. Is it the number of queries that determines how quickly it will run? Where is the bottleneck?
I'm looking for something that will scale (the number of rows being updated grows daily).
Any ideas?
Thanks
It depends on your update/insert ratio. If you have lots of inserts and only a couple of updates, then the INSERT ... ON DUPLICATE KEY UPDATE statement will be faster.
If you mainly have updates, then you would be better off with an UPDATE statement and an insert as fallback (if there was no update). By the way, you could use the multi-table UPDATE syntax to do it with a single update instead of a select followed by an update. If you're doing both a SELECT and an UPDATE, then the INSERT will definitely be faster.
I think INSERT ... ON DUPLICATE KEY UPDATE is more efficient (otherwise, it wouldn't make much sense to add such an extension). By the way, your first example is not exactly the same as the second one: you neither use a transaction nor lock the table, so it's possible that the record returned by the SELECT will no longer exist by the time you execute the UPDATE.
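
To make the "single update instead of a select followed by an update" idea concrete, here is a sketch using Python's sqlite3 as a stand-in. SQLite has no MySQL-style multi-table UPDATE, so this uses a correlated subquery instead, which is a different syntax for the same set-based idea. The column name active replaces the question's placeholder condition, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (item_id INTEGER, views INTEGER, active INTEGER);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, view_count INTEGER);
    INSERT INTO tableA VALUES (1, 10, 1), (1, 5, 1), (2, 7, 1), (3, 100, 0);
    INSERT INTO tableB VALUES (1, 0), (2, 0), (3, 0);
""")

# One statement, one round trip: the per-item sums are computed and applied
# inside the database instead of being looped over in application code.
conn.execute("""
    UPDATE tableB
    SET view_count = (
        SELECT SUM(a.views) FROM tableA AS a
        WHERE a.item_id = tableB.id AND a.active = 1
    )
    WHERE id IN (SELECT item_id FROM tableA WHERE active = 1)
""")
conn.commit()
updated = conn.execute("SELECT id, view_count FROM tableB ORDER BY id").fetchall()
```

Because it is a single statement, it also sidesteps the race mentioned above, where a row read by the SELECT disappears before its UPDATE runs.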

How do I insert data into a mysql database using a procedure?

I have a single procedure that has two insert statements in it for two different tables. I must insert data into table1 before I can insert into table2. I'm using PHP to do the data collection. What I'd like to know is how to insert multiple rows into table2, which can have many rows associated with table1. How would I do this?
I want to only store the person in table1 just one time but table2 requires multiple rows. If these insert statements were in separate procedures, I wouldn't have a problem but I just don't know how I would insert more than one row into table2 without table1 rejecting a second duplicate record.
BEGIN
INSERT INTO user(name, address, city) VALUES(Name, Address, City);
INSERT INTO `order`(order_id, `desc`) VALUES(OrderNo, Description);
END
I'd suggest you do it separately, otherwise you'd need a complicated solution which is prone to error if something changes.
The complicated solution is:
join all orderno and descriptions with a separator. (orderno#description)
join all orders with a different separator. (orderno#description/orderno#description/...)
pass it to the procedure
in the procedure, split the string by order separator, then loop through each of them
for each order, split the string by the first separator, then insert into the appropriate columns
As you can see, this is bad.
I am sorry, but what's stopping you from inserting data into these (seemingly unrelated) tables in separate queries? If you don't like the idea of it failing halfway through, you can wrap it in a transaction. As far as I know, mysqli and PDO can both do that just fine.
To answer your question directly: INSERT's IGNORE modifier turns errors during insertion into warnings, so on an attempt to insert a duplicate row a warning is issued and the row is not inserted, but there is no error.
You could use the IGNORE keyword on the first statement.
http://dev.mysql.com/doc/refman/5.1/en/insert.html:
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
But somehow this seems rather inefficient to me, a "stabbed from behind through the chest in the eye" solution.
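
Putting the two suggestions together (separate statements inside a transaction, with the duplicate user insert ignored), a sketch might look like the following. It uses Python's sqlite3 as a stand-in for PHP and MySQL: INSERT OR IGNORE is SQLite's counterpart of MySQL's INSERT IGNORE, and the orders table with its user_name column, the primary key on user.name, and the helper save_user_with_orders are all assumptions made for the example.

```python
import sqlite3

def save_user_with_orders(conn, user, orders):
    """Store the user once (duplicates are skipped) and all of their orders,
    atomically: either everything is saved or nothing is."""
    with conn:  # one transaction; rolls back if any statement fails
        conn.execute(
            "INSERT OR IGNORE INTO user (name, address, city) VALUES (?, ?, ?)",
            user,
        )
        conn.executemany(
            "INSERT INTO orders (order_id, user_name, description) VALUES (?, ?, ?)",
            [(order_id, user[0], description) for order_id, description in orders],
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (name TEXT PRIMARY KEY, address TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_name TEXT, description TEXT);
""")
ann = ("Ann", "1 Main St", "Springfield")
save_user_with_orders(conn, ann, [(1, "books"), (2, "pens")])
save_user_with_orders(conn, ann, [(3, "paper")])  # same user again, new order
user_count = conn.execute("SELECT COUNT(*) FROM user").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Passing the orders as a list and batching them avoids the string-splitting workaround described earlier, at the cost of doing the work from application code rather than inside one stored procedure.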