I insert around 20,000 rows into my table on each load. Right now I am doing it one by one. From the MySQL website I learned that inserting multiple rows with a single INSERT query is faster.
Can I insert all 20,000 in a single query?
What will happen if there are errors within these 20,000 rows? How will MySQL handle that?
If you are inserting the rows from some other table, you can use the INSERT ... SELECT pattern to insert them.
However, if you are inserting the values using the INSERT ... VALUES pattern, you are limited by max_allowed_packet.
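For reference, you can check the server's current limit before building a very large statement (a minimal sketch, assuming a mysqli connection in $mysqli):

// Sketch: read max_allowed_packet so batches can be sized below the limit.
$result = $mysqli->query("SHOW VARIABLES LIKE 'max_allowed_packet'");
$row = $result->fetch_assoc();
$maxPacket = (int) $row['Value']; // limit in bytes for a single statement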
Also, from the docs:
To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.
Example:
INSERT INTO `table1` (`column1`, `column2`) VALUES ("d1", "d2"),
("d1", "d2"),
("d1", "d2"),
("d1", "d2"),
("d1", "d2");
What will happen if there are errors within these 20,000 rows?
If an error occurs while inserting the records, the statement is aborted; on a transactional engine such as InnoDB, none of the rows from that statement are kept.
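If you split the 20,000 rows across several smaller statements, you can still get all-or-nothing behavior by wrapping them in a transaction (a sketch, assuming InnoDB, a mysqli connection in $mysqli, and a hypothetical $batchSqls array of multi-row INSERT strings):

mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); // make mysqli throw on SQL errors
$mysqli->begin_transaction();
try {
    foreach ($batchSqls as $sql) {
        $mysqli->query($sql); // each $sql is one multi-row INSERT
    }
    $mysqli->commit();        // all batches become visible together
} catch (mysqli_sql_exception $e) {
    $mysqli->rollback();      // nothing from this load is kept
    throw $e;
}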
http://dev.mysql.com/doc/refman/5.5/en/insert.html
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas.
Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
You can use code to generate the insert VALUES section based on your data source.
Errors: if there are errors in the INSERT statement (including in any of the rows) the operation will be aborted.
Generating the query: this depends on your data source. For example, if you are getting data from an associative array in PHP, you'd do something like this:
$sql = "INSERT INTO tbl_name (a, b, c) VALUES ";
foreach($dataset as $row)
{
$sql .= "(" + $row['a'] + ", " + $row['a'] + ", " + $row['a'] + ")";
// OR
$sql .= "($row[a], $row[b], $row[c])";
}
Some more resources:
Optimize MySQL Queries – Fast Inserts With Multiple Rows
The fastest way to insert 100K records
Batch insert with SQL: INSERT INTO table (col1, ..., coln) VALUES (col1, ..., coln), (col1, ..., coln), ... Note that the statement length is limited to 1 MB by default; you can raise the max_allowed_packet parameter to support bigger single inserts.
Related
I have anywhere from one to many records that need to be entered into a table. What is the best way to do this in a query? Should I just make a loop and insert one record per iteration, or is there a better way?
From the MySQL manual
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas.
Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
Most of the time, you are not working in a MySQL client and you should batch inserts together using the appropriate API.
E.g. in JDBC:
con.setAutoCommit(false);
PreparedStatement prepStmt = con.prepareStatement("UPDATE DEPT SET MGRNO=? WHERE DEPTNO=?");
prepStmt.setString(1,mgrnum1);
prepStmt.setString(2,deptnum1);
prepStmt.addBatch();
prepStmt.setString(1,mgrnum2);
prepStmt.setString(2,deptnum2);
prepStmt.addBatch();
int[] numUpdates = prepStmt.executeBatch();
con.commit(); // make the batched changes permanent
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/tjvbtupd.htm
A LOAD DATA INFILE query is a much better option, but some hosts (GoDaddy, for example) restrict it on shared hosting, so only two options are left: insert a record on every iteration, or batch inserts. A batch insert has a length limitation; if your query exceeds the number of characters set in MySQL (max_allowed_packet), the query will fail. So I suggest inserting the data in chunks with batch inserts; this also minimizes the number of round trips to the database. Best of luck, guys!
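A minimal sketch of the chunked approach (the chunk size of 1,000, the $rows array, and the $mysqli connection are assumptions to adapt):

$chunkSize = 1000; // arbitrary; tune to your row size and max_allowed_packet
foreach (array_chunk($rows, $chunkSize) as $chunk) {
    $values = array();
    foreach ($chunk as $row) {
        // one quoted tuple per row; escaping keeps the statement valid and safe
        $values[] = "('" . $mysqli->real_escape_string($row['col1']) . "')";
    }
    $mysqli->query("INSERT INTO tbl_name (col1) VALUES " . implode(',', $values));
}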
INSERT INTO table (col1, col2) SELECT col1, col2 FROM table_2;
Please refer to the MySQL documentation on the INSERT statement.
MySQL allows you to insert multiple rows at once; see the INSERT manual:
INSERT INTO test_1 VALUES(24, 'B', '1990-12-07'), (25, 'C', '1990-12-08');
I am doing a bulk insert of 0.5 million tokens into the database with an INSERT IGNORE statement. Among those 0.5 million tokens there can be duplicates,
so if I insert them with INSERT IGNORE, there is no guarantee that all of the tokens end up in the database.
After doing the insertion I want to know how many tokens were actually inserted. Some people suggest using affected_rows to get the count of inserted (affected) rows, but affected_rows doesn't give the result of the statement I care about; it gives the result of the last SQL statement executed.
Please tell me the best way to get the count of inserted rows with an INSERT IGNORE statement.
Put SELECT ROW_COUNT(); just after the INSERT statement to get the number of rows inserted.
e.g.:
INSERT IGNORE INTO tbl (col1) VALUES (1), (2); SELECT ROW_COUNT();
Doing a single SQL INSERT IGNORE would work with affected_rows. I'm not sure, though, how that would turn out performance-wise, since it's 0.5 million rows to enter.
Anyhow, here's a solution I tried that works, with four values in a single INSERT:
<?php
$mysqli = new mysqli('127.0.0.1', 'root', '', 'test');
if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
}
$sql = "INSERT IGNORE INTO test1 (Name, Attribute, Val) VALUES ('ai', 'blue', '1j'),('ai1', 'white', '2j'),('ai2', 'black', '3j'),('ai1', 'green', '4j')";
$insert = $mysqli->query($sql);
printf("%d\n", $mysqli->affected_rows); // rows actually inserted; ignored duplicates don't count
?>
Consider the following table:
table_A (id (PK), value1, value2)
If I want to insert a set of data, for example (1,5), (1,3), (3,5), I could perform a query such as:
INSERT INTO table_A (value1, value2) VALUES (1,5), (1,3), (3,5)
which would work. However, I am told prepared statements would be better. Looking into prepared statements, it seems I would have to do something like this:
$stmt = $dbh->prepare("INSERT INTO table_A (value1, value2) VALUES (?, ?)");
$stmt->bindParam(1, $value1);
$stmt->bindParam(2, $value2);
//for each set of values
$value1 = 1;
$value2 = 5;
$stmt->execute();
My question is: how can a prepared statement be better (performance-wise) than the first method? One is a single query; the other involves several executions of the same query. Does the first query get compiled into three separate queries or something?
The prepared statement by itself is not going to be faster when you insert only once. However, if you need to run the same inserts multiple times, you will save on the time it takes to parse the query and prepare the query plan. The prepared statement insert will be parsed once, the plan for it will be cached, and then reused for all subsequent insertions. The statement with multiple embedded values, on the other hand, will need to be re-processed every time you run a new one, slowing the process down.
On the other hand, network roundtrips are slow as well. It may be slower to do an extra roundtrip than to parse and prepare a query plan, so you should profile before making a decision one way or the other.
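A common middle ground is to reuse one prepared statement inside a transaction, paying the parse cost once and committing once (a sketch, assuming a PDO connection in $dbh and the value pairs in a $pairs array):

$dbh->beginTransaction();
$stmt = $dbh->prepare("INSERT INTO table_A (value1, value2) VALUES (?, ?)");
foreach ($pairs as $pair) {    // e.g. array(array(1, 5), array(1, 3), array(3, 5))
    $stmt->execute($pair);     // parsed once, executed many times
}
$dbh->commit();                // one flush for the whole batch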
Is the database query faster if I insert multiple rows at once:
like
INSERT....
UNION
INSERT....
UNION
(I need to insert around 2,000-3,000 rows)
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas.
Example:
INSERT INTO tbl_name
(a,b,c)
VALUES
(1,2,3),
(4,5,6),
(7,8,9);
Source
If you have your data in a text-file, you can use LOAD DATA INFILE.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.
Optimizing INSERT Statements
You can find more tips on how to speed up your INSERT statements at the link above.
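For instance, loading a CSV could look like this (a sketch; the file path, delimiters, and column names are assumptions, the connection is a mysqli object in $mysqli, and the server's secure_file_priv setting must allow reading the file):

// Sketch: bulk-load every line of a CSV file in one statement.
$sql = <<<'SQL'
LOAD DATA INFILE '/tmp/rows.csv'
INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(a, b, c)
SQL;
$mysqli->query($sql);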
Just use a SELECT statement to read the values of the chosen columns from many rows and insert them into the columns of another table in one go. As an example, the columns "size" and "price" of the two tables "test_b" and "test_c" get filled with the columns "size" and "price" of table "test_a".
BEGIN;
INSERT INTO test_b (size, price)
SELECT size, price
FROM test_a;
INSERT INTO test_c (size, price)
SELECT size, price
FROM test_a;
COMMIT;
The code is wrapped in BEGIN and COMMIT so the two statements take effect together; if one of them fails, the transaction can be rolled back and nothing done up to that point is kept.
Here is a PHP solution ready for use with an n:m (many-to-many relationship) table:
// get data
$table_1 = get_table_1_rows();
$table_2_fk_id = 123;
// prepare first part of the query (before values)
$query = "INSERT INTO `table` (
`table_1_fk_id`,
`table_2_fk_id`,
`insert_date`
) VALUES ";
// loop over table 1 to collect one VALUES tuple per foreign key
$query_values = array();
foreach ($table_1 as $row) {
    $query_values[] = "(" . $row["table_1_pk_id"] . ", $table_2_fk_id, NOW())";
}
// implode the values with commas and execute the query
$db->query($query . implode(',', $query_values));
EDIT: After #john's comment I decided to enhance this answer with a more efficient solution that:
divides the query into multiple smaller queries
uses rtrim() to delete the last comma instead of implode()
// limit of query size (rows inserted per query)
$limit = 100;
$query_values = "";
$i = 0;
$table_1 = get_table_1_rows();
$table_2_fk_id = 123;
$last_key = array_key_last($table_1);

$query = "INSERT INTO `table` (
    `table_1_fk_id`,
    `table_2_fk_id`,
    `insert_date`
) VALUES ";

foreach ($table_1 as $key => $row) {
    $query_values .= "(" . $row["table_1_pk_id"] . ", $table_2_fk_id, NOW()),";
    // entire table parsed or row limit reached:
    // -> execute and purge query_values
    if ($key === $last_key || ++$i % $limit === 0) {
        $db->query($query . rtrim($query_values, ','));
        $query_values = "";
    }
}
// Insert into table product_cate (column names separated with commas)
$sql = "INSERT INTO product_cate (site_title, sub_title)
        VALUES ('$site_title', '$sub_title')";

// Insert into table menu
$sql = "INSERT INTO menu (menu_title, sub_menu)
        VALUES ('$menu_title', '$sub_menu')";

// Insert into table blog_post
$sql = "INSERT INTO blog_post (post_title, post_des, post_img)
        VALUES ('$post_title', '$post_des', '$post_img')";
If I insert multiple records with a loop that executes single-record inserts, the last insert id returned is, as expected, the last one. But if I do a multi-row INSERT statement:
INSERT INTO people (name,age)
VALUES ('William',25), ('Bart',15), ('Mary',12);
Let's say the three above are the first records inserted into the table. After the INSERT statement I expected the last insert id to be 3, but it returned 1: the first insert id generated by the statement in question.
So can someone please confirm whether this is the normal behavior of LAST_INSERT_ID() in the context of multi-row INSERT statements, so I can base my code on it?
Yes. This behavior of last_insert_id() is documented in the MySQL docs:
Important
If you insert multiple rows using a single INSERT statement, LAST_INSERT_ID() returns the value generated for the first inserted row only. The reason for this is to make it possible to reproduce easily the same INSERT statement against some other server.
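So with a multi-row INSERT you can still derive the whole id range yourself (a sketch, assuming a mysqli connection in $mysqli and innodb_autoinc_lock_mode 0 or 1, where one statement's auto-increment values are consecutive):

$mysqli->query("INSERT INTO people (name, age) VALUES ('William', 25), ('Bart', 15), ('Mary', 12)");
$firstId = $mysqli->insert_id;                    // id generated for 'William'
$lastId  = $firstId + $mysqli->affected_rows - 1; // id generated for 'Mary'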
This behavior is mentioned in the MySQL manual. It appears in the comments but is not challenged, so I'm guessing it's the expected behavior.
I think it's possible if your table has a unique auto-increment column (ID) and you don't require the ids to be returned by MySQL itself. It would cost you three more DB requests and some processing. It requires these steps:
Get the "before" MAX(ID) right before your insert:
SELECT MAX(id) AS before_max_id FROM table_name;
Build a multi-row INSERT ... VALUES query with your data and keep the data around:
INSERT INTO table_name
    (col1, col2)
VALUES
    ("value1-1", "value1-2"),
    ("value2-1", "value2-2"),
    ("value3-1", "value3-2")
ON DUPLICATE KEY UPDATE id = id; -- no-op update so duplicate rows are skipped
Get "After MAX(ID)" right after your insert:
SELECT MAX(id) AS after_max_id FROM table_name`
Get records with IDs between "Before MAX(ID)" and "After MAX(ID)" including:
SELECT * FROM table_name WHERE id>$before_max_id AND id<=$after_max_id`
Check the retrieved data against the data you inserted and remove any records that were not inserted by you. The remaining records have your IDs:
$intersection_array = array();
foreach ($after_collection as $after_item) {
    foreach ($input_collection as $input_item) {
        if ($after_item->compare_content($input_item)) {
            $intersection_array[] = $after_item;
        }
    }
}
This is just how an ordinary person would solve it in the real world, with parts of code. Thanks to the auto-increment bounds it fetches the smallest possible number of records to check against, so the check won't take much processing. This is not final copy-and-paste code; e.g., you have to write your own compare_content() function according to your needs.
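For illustration, a hypothetical compare_content() on the fetched-row class could look like this (the col1/col2 property names are made up; match them to your own columns):

// Hypothetical: true when this fetched row matches a row we attempted to insert.
public function compare_content($input_item)
{
    return $this->col1 === $input_item->col1
        && $this->col2 === $input_item->col2;
}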