I have a MySQL stored procedure that executes a lot of inserts.
Since performance was bad, I decided not to run an INSERT on every loop iteration. Instead, I build a string for the INSERT and, at the end of the procedure, run one big INSERT for all the values with a prepared statement.
The problem is that my statement will be bigger than a VARCHAR. It would have to be TEXT.
The question is: Can I do that? Or does a prepared statement have to be VARCHAR?
What is the maximum length of a prepared statement?
Use LOAD DATA. It is the best-performing option for bulk inserts in MySQL, and you can modify your input data while loading.
See http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html - this is the actual limit on a single query.
But your problem looks deeper. If you need to perform a very large number of inserts, prepare a single INSERT statement, then bind values to it in a loop and execute it once per row.
To speed this up you will need to turn autocommit off, but then you run into the next problem: http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html
InnoDB has a limit of 1023 concurrent transactions that have created undo records by modifying data.
and innodb_log_file_size
Both of these settings limit your transaction size, so basically you need to commit every Nth row (instead of every row). N is best determined programmatically.
UPDATE: check the comment from @BillKarwin - basically we can say that there is no limit on transaction size.
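If you still want to commit every N rows, here is a minimal sketch of the idea in Python. It is only an illustration: it uses the standard-library sqlite3 module as a stand-in for a MySQL driver (both follow the Python DB-API), and the table `items`, the function `insert_in_batches` and the batch size are made-up names and values, not anything from the question.

```python
import sqlite3

def insert_in_batches(conn, rows, batch_size):
    """Insert rows through one reusable statement, committing every batch_size rows."""
    cur = conn.cursor()
    commits = 0
    for i, row in enumerate(rows, start=1):
        # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
        cur.execute("INSERT INTO items (id, name) VALUES (?, ?)", row)
        if i % batch_size == 0:
            conn.commit()
            commits += 1
    conn.commit()  # commit the final, possibly partial, batch
    return commits + 1

# Hypothetical table and data, used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(n, f"name-{n}") for n in range(1, 2501)]
total_commits = insert_in_batches(conn, rows, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

With 2500 rows and a batch size of 1000 this commits three times instead of 2500 times. The right batch size for a real MySQL server is something to measure, not something this sketch can tell you.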
I have a MySQL table that keeps gaining new records every 5 seconds.
My questions are:
Can I run a query on this set of data that may take more than 5 seconds?
If a SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
What happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
I'll go over your questions and some of the comments you added later.
Can I run a query on this set of data that may take more than 5 seconds?
Can you? Yes. Should you? It depends. In a MySQL configuration I set up, any query taking longer than 3 seconds was considered slow and logged accordingly. In addition, you need to keep in mind the frequency of the queries you intend to run.
For example, if you try to run a 10 second query every 3 seconds, you can probably see how things won't end well. If you run a 10 second query every few hours or so, then it becomes more tolerable for the system.
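To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a steady state where each run takes the same time regardless of load (which is optimistic), and the function name `average_overlap` is just something I made up for the illustration.

```python
def average_overlap(query_seconds: float, interval_seconds: float) -> float:
    """Average number of copies of the query running at once in steady state."""
    return query_seconds / interval_seconds

# A 10 second query started every 3 seconds: several copies always overlap.
busy = average_overlap(10, 3)

# The same query started every 2 hours: the server is almost always idle.
rare = average_overlap(10, 2 * 3600)
```

In the first case more than three copies of the query are running at any moment, and in practice each one slows the others down, so the pile-up gets worse over time. In the second case the overlap is effectively zero.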
That being said, slow queries can often benefit from optimizations, such as not scanning the entire table (e.g. searching by primary key), and using the EXPLAIN keyword to get the database's query planner to tell you how it intends to work on that internally (e.g. is it using PKs, FKs, indices, or is it scanning all table rows?).
If a SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
"Affect" in what way? If you mean "prevent insert from actually inserting until the select has completed", that depends on the storage engine. For example, MyISAM and InnoDB are different, and that includes locking policies. For example, MyISAM tends to lock entire tables while InnoDB tends to lock specific rows. InnoDB is also ACID-compliant, which means it can provide certain integrity guarantees. You should read the docs on this for more details.
What happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
Part of "what happens" is determined by how the specific storage engine behaves. Regardless of what happens, the database is designed to answer application queries in a way that's consistent.
As an example, if the select statement were to lock an entire table, then the insert statement would have to wait until the select has completed and the lock has been released, meaning that the app would see the results prior to the insert's update.
I understand that locking the database can prevent messing up the SELECT statement.
It can also create a potentially unacceptable performance bottleneck, especially if, as you say, the system is inserting lots of rows every 5 seconds. How bad it gets depends on how often you run your queries and how efficiently they are built.
What is good practice when I need the data for calculations while that data will be updated within a short period?
My recommendation is to simply accept the fact that the calculations are based on a snapshot of the data at the specific point in time the calculation was requested and to let the database do its job of ensuring the consistency and integrity of said data. When the app requests data, it should trust that the database has done its best to provide the most up-to-date piece of consistent information (i.e. not providing a row where some columns have been updated, but others yet haven't).
With new rows coming in at the frequency you mentioned, reasonable users will understand that the results they're seeing are based on data available at the time of request.
All of your questions are related to table locking.
The answers all depend on how the database is configured.
Read : http://www.mysqltutorial.org/mysql-table-locking/
Performing a SELECT while an INSERT is running
If you want to run a SELECT while an INSERT is still executing, open a new connection for each check and close it afterwards. For example, if I insert a lot of records and want to find out with a SELECT whether the last record has been inserted, I have to open and close the connection inside the for or while loop.
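# Session 1: send the request to store data.
# The INSERT statement is running and takes a long time.
# Session 2: poll with a SELECT in a loop (assumes MySQL Connector/Python).
import time
import mysql.connector

while True:
    cnx = mysql.connector.connect(**db_config)  # db_config: placeholder for your connection settings
    cur = cnx.cursor()
    cur.execute(check_query)  # check_query: placeholder for your SELECT for the last record
    rows = cur.fetchall()
    cnx.close()
    if rows:
        break  # leave the loop once the record is visible
    time.sleep(1)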
I'm using MySQL 5.6. Let's say we have the following two tables:
Every DataSet has a huge number of child DataEntry records: 10000, 100000, or more. DataSet.md5sum and DataSet.version are updated, in one transaction, whenever its child DataEntry records are inserted or deleted. A DataSet.md5sum is calculated over the DataEntry.content values of all its children.
In this situation, what's the most efficient way to fetch consistent data from those two tables?
If I issue the following two distinct SELECTs, I think I might get inconsistent data due to concurrent INSERTs / UPDATEs:
SELECT md5sum, version FROM DataSet WHERE dataset_id = 1000
SELECT dataentry_id, content FROM DataEntry WHERE dataset_id = 1000 -- I think this result may be inconsistent with the md5sum fetched by the former query
I think I can get consistent data with one query as follows:
SELECT e.dataentry_id, e.content, s.md5sum, s.version
FROM DataSet s
INNER JOIN DataEntry e ON (s.dataset_id = e.dataset_id)
WHERE s.dataset_id = 1000
But it produces a redundant result set filled with 10000 or 100000 duplicated md5sums, so I guess it's not efficient (EDIT: my concerns are high network bandwidth and memory consumption).
I think using a pessimistic read / write lock (SELECT ... LOCK IN SHARE MODE / FOR UPDATE) would be another option, but it seems overkill. Are there any better approaches?
The join will ensure that the data returned is not affected by any updates that would have occurred between the two separate selects, since they are being executed as a single query.
When you say that md5sum and version are updated, do you mean the child table has a trigger on it for inserts and updates?
When you join the tables, you will get a "duplicate md5sum and version" because you are pulling the matching record for each item in the DataEntry table. It is perfectly fine and isn't going to be an efficiency issue. The alternative would be to use the two individual selects, but depending upon the frequency of inserts/updates, without a transaction, you run the very slight risk of getting data that may be slightly off.
I would just go with the join. You can run explain plans on your query from within mysql and look at how the query is executed and see any differences between the two approaches based upon your data and if you have any indexes, etc...
Perhaps it would be more beneficial to copy these groups of records into a staging table of sorts. Before processing, you could call a pre-processor function that takes a "snapshot" of the data about to be processed, putting a copy into a staging table. Then you could select just the version and md5sum alone, and then all of the records, as two different selects. Since these are copied into a separate staging table, you won't have to worry about immediate updates corrupting your processing session. You could set up timed jobs to do this or have it as an on-demand call. Again though, you would need to research the best approach given the hardware/network setup you are working with and any job scheduling software you have available.
Use this pattern:
START TRANSACTION;
SELECT ... FOR UPDATE; -- this locks the row
...
UPDATE ...
COMMIT;
(and check for errors after every statement, including COMMIT.)
"100000" is not "huge", but "BIGINT" is. Recommend INT UNSIGNED instead.
For an MD5, make sure you are not using utf8: CHAR(32) CHARACTER SET ascii. This goes for any other hex strings.
Or, use BINARY(16) for half the space. Then use UNHEX(md5...) when inserting, and HEX(...) when fetching.
You are concerned about bandwidth, etc. Please describe your client (PHP? Java? ...). Please explain how much (100K rows?) needs to be fetched to re-do the MD5.
Note that there is an MD5 function in MySQL. If each of your items had an MD5, you could take the MD5 of the concatenation of those and do it entirely in the server, with no bandwidth needed. (Be sure to increase group_concat_max_len.)
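To make the "MD5 of the concatenated MD5s" idea and the BINARY(16) storage point concrete, here is a small client-side sketch in Python. It only mirrors the arithmetic; the helper names `md5_hex` and `dataset_md5` and the sample contents are made up, and your real checksum definition may differ.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Hex MD5 digest: 32 ASCII characters."""
    return hashlib.md5(data).hexdigest()

def dataset_md5(entry_md5s):
    """MD5 of the concatenated per-entry hex digests, in the given order."""
    return md5_hex("".join(entry_md5s).encode("ascii"))

# Hypothetical entry contents standing in for DataEntry.content.
contents = [b"alpha", b"beta", b"gamma"]
entry_md5s = [md5_hex(c) for c in contents]
combined = dataset_md5(entry_md5s)

# Storage comparison: 32 hex characters versus 16 raw bytes.
raw = bytes.fromhex(combined)  # the 16 bytes a BINARY(16) column would hold
round_trip = raw.hex()         # back to hex (lowercase here; MySQL's HEX() returns uppercase)
```

The result depends on the order of the entries, so a server-side version built on GROUP_CONCAT would need an ORDER BY to be reproducible, and an empty SEPARATOR if you want it to match this exact concatenation.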
I want to execute the following two MySQL statements:
1) SELECT * FROM table1 WHERE field1=val1 FOR UPDATE;
2) UPDATE table1 SET field2=val2 WHERE field1=val1;
It is important that the second statement changes exactly the rows returned by the first statement (no more and no fewer). Therefore I execute the transactions with auto_commit=false and use the "for update" version of the select statement.
"For Update" locks all rows it returns, so they are in the original state when the second statement is executed. But what about insertions? Is it possible that another thread inserts a new row with field1=val1 in between, which then gets changed by the second statement?
And another question: Does it make a difference if the second statement doesn't change the rows itself but does something like the following?
3) INSERT INTO table2 (SELECT * FROM table1 WHERE field1=val1)
If (3) is in the same transaction as (1), is it then guaranteed that both selects return exactly the same rows?
edit:
I'm using InnoDB and I read some stuff about next key locking and gap locking.
As far as I understood it, while executing (1), InnoDB would not only lock the selected rows but also the accessed indices.
So am I right to say that this problem doesn't occur if I have an index over column "field1"? What if there is no index for it? Is it different then?
I can't speak to the MySQL case completely, but I rather doubt it. The only thing that SELECT FROM ... FOR UPDATE appears to do is prevent other transactions from modifying the given set of rows.
It doesn't restrict future statements - if another row is inserted (by another transaction) before that UPDATE statement runs, it'll be updated as well.
Needless to say, the third statement you've given will also fall prey to the same issue.
What are you attempting to actually do here? It's possible there's another way to accomplish this. For example, if there's some sort of 'insertedAt' timestamp, you could probably just add that as an extra condition.
I want to write a procedure that will handle the insert of data into 2 tables. If the insert should fail in either one then the whole procedure should fail. I've tried this many different ways and cannot get it to work. I've purposefully made my second insert fail but the data is inserted into the first table anyway.
I've tried to nest IF statements based on the rowcount but even though the data fails on the second insert, the data is still being inserted into the first table. I'm looking for a total number of 2 affected rows.
Can someone please show me how to handle multiple inserts and rollback if one of them fails? A short example would be nice.
If you are using InnoDB tables (or another transactional engine), you can use MySQL's transaction feature, which allows you to do exactly what you want.
Basically, you start the transaction,
run the queries, checking the result of each one.
If every result is OK, you call COMMIT;
otherwise you call ROLLBACK to void all the queries within the transaction.
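Since a short example was asked for, here is a minimal sketch of that commit-or-rollback flow, written in Python rather than as a stored procedure. It uses the standard-library sqlite3 module as a stand-in for a MySQL driver, and the tables `parent` and `child` and the function `insert_both` are made up for the illustration.

```python
import sqlite3

def insert_both(conn, parent_row, child_row):
    """Insert into both tables; undo everything if either insert fails."""
    try:
        cur = conn.cursor()
        # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
        cur.execute("INSERT INTO parent (id, name) VALUES (?, ?)", parent_row)
        cur.execute("INSERT INTO child (id, parent_id) VALUES (?, ?)", child_row)
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()  # voids the first insert as well
        return False

# Hypothetical schema, used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL)")
conn.commit()

ok = insert_both(conn, (1, "first"), (1, 1))
failed = insert_both(conn, (2, "second"), (2, None))  # NULL violates NOT NULL
parent_count = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
```

The second call fails on the child insert, so its parent row is rolled back too and only the row from the first call remains. In MySQL this only works on a transactional engine such as InnoDB.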
You can read an article about it, with examples, here.
HTH!
You could try turning autocommit off. It might be automatically committing your first insert even though you haven't explicitly committed the transaction that's been started:
SET autocommit = 0;
START TRANSACTION;
......
Does it depend on the number of values sets? Does it depend on the number of bytes in the INSERT statement?
You can insert an arbitrarily large number of records using the INSERT ... SELECT pattern, provided you have those records, or part of them, in other tables.
But if you are hard-coding the values using the INSERT ... VALUES pattern, then there is a limit on how long your statement can be: max_allowed_packet, which limits the length of SQL statements sent by the client to the database server. It affects every type of query, not only INSERT statements.
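One way to live with that limit is to split the VALUES list into several statements that each stay under a byte budget. Here is a rough sketch in Python. The function name `build_insert_batches`, the table `t` and the 1024-byte budget are made up for the example, and it assumes the row tuples are already rendered and safely escaped, which is your job in real code.

```python
def build_insert_batches(prefix, rendered_rows, max_bytes):
    """Yield INSERT statements, each at most max_bytes long when UTF-8 encoded."""
    prefix_size = len(prefix.encode("utf-8"))
    batch, size = [], prefix_size
    for row in rendered_rows:
        row_size = len(row.encode("utf-8")) + 1  # +1 for the separating comma
        if batch and size + row_size > max_bytes:
            yield prefix + ",".join(batch)
            batch, size = [], prefix_size
        batch.append(row)
        size += row_size
    if batch:
        yield prefix + ",".join(batch)

# Hypothetical table and values; the rows contain integers only.
prefix = "INSERT INTO t (a, b) VALUES "
rows = [f"({n},{n * 2})" for n in range(1000)]
statements = list(build_insert_batches(prefix, rows, max_bytes=1024))
```

Note that a single row larger than the budget is still emitted on its own, so this does not protect you from one oversized value. In practice you would pick a budget comfortably below the server's max_allowed_packet.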
Ideally, MySQL allows an unlimited number of rows in a single INSERT, but when a MySQL client or the mysqld server receives a packet bigger than max_allowed_packet bytes, it issues a Packet too large error and closes the connection.
To view the current value of the max_allowed_packet variable, execute the following command in MySQL:
show variables like 'max_allowed_packet';
Older standard MySQL installations have a default value of 1048576 bytes (1MB); newer versions ship with a larger default. It can be increased by setting a higher global value, which new connections then pick up.
This sets the value to 500MB for everyone (that's what GLOBAL means):
SET GLOBAL max_allowed_packet=524288000;
Check your change in a new terminal with a new connection:
show variables like 'max_allowed_packet';
Inserts that fit within the new packet size should now work without this error.
Query is limited by max_allowed_packet in general.
You will hit the max_allowed_packet limit and
error: 1390 Prepared statement contains too many placeholders.
You can put 65535 placeholders in one SQL statement. So if you have two columns per row, you can insert 32767 rows in one statement.
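The arithmetic above can be turned into a small chunking helper. This is only a sketch in Python: the 65535 figure is the one quoted in this answer, and the names `max_rows_per_statement` and `chunk_rows` are made up.

```python
MAX_PLACEHOLDERS = 65535  # limit quoted in the answer above

def max_rows_per_statement(columns: int) -> int:
    """Largest number of rows whose placeholders fit in one prepared statement."""
    return MAX_PLACEHOLDERS // columns

def chunk_rows(rows, columns):
    """Yield slices of rows small enough to bind in one prepared statement."""
    step = max_rows_per_statement(columns)
    for start in range(0, len(rows), step):
        yield rows[start:start + step]

# Hypothetical 50,000 two-column rows, as in the linked "50K+ records" question.
rows = [(n, n) for n in range(50_000)]
chunks = list(chunk_rows(rows, columns=2))
```

For 50,000 two-column rows this gives two chunks, one of 32767 rows and one of 17233, each of which can be bound to its own multi-row INSERT.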
Import of 50K+ Records in MySQL Gives General error: 1390 Prepared statement contains too many placeholders
Refer to http://forums.mysql.com/read.php?20,161869, which relates this to your MySQL configuration: max_allowed_packet, bulk_insert_buffer_size, key_buffer_size.
You can insert an infinite number of rows with one INSERT statement. For example, you could execute a stored procedure that has a loop executed a thousand times, each time running an INSERT query.
Or your INSERT could trip a trigger which itself performs an INSERT. Which trips another trigger. And so on.
No, it does not depend on the number of value sets. Nor does it depend on the number of bytes.
There is a limit to how deeply nested your parentheses may be, and a limit to how long your total statement is. Both of these are referenced, ironically, on thedailywtf.com. However, both of the means I mentioned above get around these limits.
I believe there's no defined number of rows you're limited to inserting per INSERT, but there may be some sort of maximum size for queries in general.
It is limited by max_allowed_packet.
You can set it when starting the server:
mysqld --max_allowed_packet=32M
The default is 16M for the mysql client; the server default depends on the version.
You can also set it in my.cnf in /etc/mysql/.