Speed of INSERT vs LOAD DATA LOCAL INFILE - mysql

I have many tens of thousands of rows of data that need to be inserted into a MySQL InnoDB table from a remote client. The client (Excel VBA over MySQL ODBC connector via ADO) can either generate a CSV and perform a LOAD DATA LOCAL INFILE, or else can prepare an enormous INSERT INTO ... VALUES (...), (...), ... statement and execute that.
The former requires some rather ugly hacks to overcome Excel's inability to output Unicode CSV natively (it only writes CSV in the system locale's default codepage, which in many cases is a single-byte character set and therefore quite limited); but the MySQL documentation suggests it could be 20 times faster than the latter approach (why?), which also "feels" as though it might be less stable due to the extremely long SQL command.
I have not yet been able to benchmark the two approaches, but I would be very interested to hear thoughts on likely performance/stability issues.
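For concreteness, a minimal sketch of the two candidate statements (the file path, table and column names are placeholders, and the character set / delimiter options are assumptions that would have to match the real CSV):
-- Option 1: bulk load from a client-side CSV
LOAD DATA LOCAL INFILE 'C:/temp/rows.csv'
INTO TABLE target_table
CHARACTER SET utf8
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(column1, column2, column3);
-- Option 2: one large multi-row INSERT built by the client
INSERT INTO target_table (column1, column2, column3)
VALUES ('a1', 'b1', 'c1'),
       ('a2', 'b2', 'c2'),
       ('a3', 'b3', 'c3');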

I'm thinking maybe a hybrid solution would work well here... As in...
First create a prepared statement for performance
PREPARE stmt1 FROM 'INSERT INTO table (column1, column2, ...) VALUES (?, ?, ...)';
Observe that the ? marks are actual syntax - you use a question mark wherever you intend to eventually use a value parsed from the CSV file.
Write a procedure or function that opens the .CSV file and enters a loop that reads the contents one row (one record) at a time, storing the values of the parsed columns in separate variables.
Then, within this loop, just after reading a record, set the user variables referenced by the prepared statement to the values of the current record, as in...
SET @a = 3;
SET @b = 4;
There should be as many SET statements as there are columns in the CSV file; if not, you have missed something. The order is extremely important: the variables in the USING clause must appear in the same order as the ? placeholders in the prepared statement, which in turn must match the columns of your INSERT statement column for column.
After setting all the parameters for the prepared statement, you then execute it.
EXECUTE stmt1 USING @a, @b;
This then is the end of the loop. Just after exiting the loop (after reaching the end of the CSV file), you must release the prepared statement, as in...
DEALLOCATE PREPARE stmt1;
Important things to keep in mind are ...
Make sure you prepare the INSERT statement before entering into the loop reading records, and make sure you DEALLOCATE the statement after exiting the loop.
Prepared statements allow the database to pre-compile and optimize the statement one time, then execute it multiple times with changing parameter values. This should result in a nice performance increase.
I am not certain about MySQL, but some databases also let you specify a number of rows to cache before a prepared statement actually executes across the network. If this is possible with MySQL, it allows you to tell the database that, although you call EXECUTE for every row read from the CSV, it should batch up the statements up to the specified number of rows and only then execute them across the network. This greatly increases performance, since the database may batch up 5 or 10 INSERTs and run them in a single round trip over the network instead of one per row.
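As far as I know MySQL does not expose such a row cache for prepared statements, but a similar effect can be approximated by wrapping a batch of EXECUTE calls in one transaction (a minimal sketch reusing stmt1 from above; with InnoDB the log is then flushed once per COMMIT rather than once per row):
START TRANSACTION;
SET @a = 3; SET @b = 4;
EXECUTE stmt1 USING @a, @b;
SET @a = 5; SET @b = 6;
EXECUTE stmt1 USING @a, @b;
-- ... repeat for each row in the batch ...
COMMIT;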
Hope this helps and is relevant. Good Luck!
Rodney

Related

Do the latest MySQL and Postgres prepare every query automatically?

I came across this in the sqlx docs:
On most databases, statements will actually be prepared behind the scenes whenever a query is executed. However, you may also explicitly prepare statements for reuse elsewhere with sqlx.DB.Prepare():
Although I can't find proof that databases actually prepare every query.
So is it true, should I use prepare manually?
MySQL and PostgreSQL definitely don't prepare every query. You can execute queries directly, without doing a prepare & execute sequence.
The Go code in the sqlx driver probably does this, but it's elective, done to make it simpler to code the Go interface when you pass args.
You don't need to use the Prepare() func manually, unless you want to reuse the prepared query, executing it multiple times with different args.
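To illustrate the difference at the SQL level (MySQL syntax; the statement and table names here are only placeholders): a direct query is parsed and executed in one step, while an explicit prepare lets you reuse the parsed statement with different arguments:
-- Direct execution: parsed and run in one go
SELECT id, name FROM users WHERE id = 42;
-- Explicit prepare/execute: parse once, run many times
PREPARE find_user FROM 'SELECT id, name FROM users WHERE id = ?';
SET @id = 42;
EXECUTE find_user USING @id;
SET @id = 43;
EXECUTE find_user USING @id;
DEALLOCATE PREPARE find_user;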

Firefox add-on: Populating sqlite database with a lot of data (around 80000 rows) not working with executeSimpleSQL()

The Firefox add-on that I am trying to code needs a big database.
I was advised not to load the database itself from the 'data' directory (using the addon-sdk to develop locally on my linux box).
So I decided to get the content from a csv file and insert it into the database that I created.
The thing is that the CSV has about 80 000 rows, and I get an error when I try to pass .executeSimpleSQL() the really long INSERT statement as a string:
('insert into table
values (row1val1,row1val2,row1val3),
(row2val1,row2val2,row2val3),
...
(row80000val1,row80000val2,row80000val3)')
Should I insert asynchronously? Use prepared statements?
Should I consider another approach, loading the database as an sqlite file directly?
You may be crossing some SQLite limits.
From the SQLite Implementation Limits page:
Maximum Length Of An SQL Statement
The maximum number of bytes in the text of an SQL statement is limited to SQLITE_MAX_SQL_LENGTH which defaults to 1000000. You can redefine this limit to be as large as the smaller of SQLITE_MAX_LENGTH and 1073741824.
If an SQL statement is limited to be a million bytes in length, then obviously you will not be able to insert multi-million byte strings by embedding them as literals inside of INSERT statements. But you should not do that anyway. Use host parameters for your data. Prepare short SQL statements like this:
INSERT INTO tab1 VALUES(?,?,?);
Then use the sqlite3_bind_XXXX() functions to bind your large string values to the SQL statement. The use of binding obviates the need to escape quote characters in the string, reducing the risk of SQL injection attacks. It also runs faster since the large string does not need to be parsed or copied as much.
The maximum length of an SQL statement can be lowered at run-time using the sqlite3_limit(db,SQLITE_LIMIT_SQL_LENGTH,size) interface.
You cannot use that many records in a single INSERT statement;
SQLite limits the number to its internal parameter SQLITE_LIMIT_COMPOUND_SELECT, which is 500 by default.
Just use multiple INSERT statements.
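A minimal sketch of that approach (table and column names are placeholders); wrapping the whole batch in a single transaction also avoids one disk sync per statement, which matters for 80 000 rows:
BEGIN TRANSACTION;
INSERT INTO mytable (col1, col2, col3) VALUES ('row1val1', 'row1val2', 'row1val3');
INSERT INTO mytable (col1, col2, col3) VALUES ('row2val1', 'row2val2', 'row2val3');
-- ... one statement per row, or batches of up to 500 value lists each ...
COMMIT;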

MySQL multiple statement blocks

As we know, MySQL supports Multiple Statement Queries, i.e., we can execute two or more statements separated by a semicolon with only one function call. This can be done using, for example, the PHP function mysqli_multi_query().
Now I have a question. I want to execute the following two statements in one call, but the call does not return for a very long time. I wonder whether these two statements will cause deadlocks.
If so, how should I resolve it?
update users set user_openid='' where user_openid='12345';
update users set user_openid='23456', user_fakeid='34567' where user_login='cifer';

How to insert data to mysql directly (not using sql queries)

I have a MySQL database that I use only for logging. It consists of several simple look-alike MyISAM tables. There is always one local (i.e. located on the same machine) client that only writes data to db and several remote clients that only read data.
What I need is to insert bulks of data from local client as fast as possible.
I have already tried many approaches to make this faster, such as reducing the number of INSERTs by increasing the length of the VALUES list, or using LOAD DATA .. INFILE, and some others.
Now it seems to me that I've come up against the limitation of parsing values from strings into their target data types (it doesn't matter whether this is done when parsing queries or a text file).
So the question is:
does MySQL provide some means of manipulating data directly for local clients (i.e. not using SQL)? Maybe there is some API that allows inserting data by simply passing a pointer.
Once again: I don't want to optimize the SQL code or invoke the same queries in a script as hd1 advised. What I want is to pass a buffer of data directly to the database engine. This means I don't want to invoke SQL at all. Is it possible?
Use mysql's LOAD DATA command:
Write the data to a file in CSV format, then execute this statement:
LOAD DATA INFILE 'somefile.csv' INTO TABLE mytable
For more info, see the documentation
Other than LOAD DATA INFILE, I'm not sure there is any other way to get data into MySQL without using SQL. If you want to avoid parsing multiple times, you should use a client library that supports parameter binding; the query can then be parsed and prepared once and executed multiple times with different data.
However, I highly doubt that parsing the query is your bottleneck. Is this a dedicated database server? What kind of hard disks are being used? Are they fast? Does your RAID controller have battery backed RAM? If so, you can optimize disk writes. Why aren't you using InnoDB instead of MyISAM?
With MySQL you can insert multiple tuples with one insert statement. I don't have an example, because I did this several years ago and don't have the source anymore.
Consider as mentioned to use one INSERT with multiple values:
INSERT INTO table_name (col1, col2) VALUES (1, 'A'), (2, 'B'), (3, 'C'), ( ... )
This means you only have to hit your database with one bigger query instead of several smaller ones. It's easier to carry the entire couch through the door once than to run back and forth with all the disassembled pieces of the couch, opening the door every time. :)
Apart from that, you can also run LOCK TABLES table_name WRITE before the INSERT and UNLOCK TABLES afterwards. That ensures nothing else is inserted in the meantime (see the LOCK TABLES documentation).
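A minimal sketch of that combination (table and column names are placeholders):
LOCK TABLES table_name WRITE;
INSERT INTO table_name (col1, col2) VALUES (1, 'A'), (2, 'B'), (3, 'C');
UNLOCK TABLES;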
INSERT INTO foo (foocol1, foocol2) VALUES ('foocol1val1', 'foocol2val1'), ('foocol1val2', 'foocol2val2') and so on should sort you out. More information and sample code can be found here. If you have further problems, do leave a comment.
UPDATE
If you don't want to use SQL, then try this shell script to do as many inserts as you want; put it in a file, say insertToDb.sh, and get on with your day/evening:
#!/bin/sh
mysql --user=me --password=foo dbname -h foo.example.com -e "insert into tablename (col1, col2) values ($1, $2);"
Invoke as sh insertToDb.sh col1value col2value. If I've still misunderstood your question, leave another comment.
After some investigation, I found no way of passing data directly to the MySQL database engine (without parsing it).
My aim was to speed up communication between the local client and the DB server as much as possible. The idea was that if the client is local, it could use some API functions to pass data to the DB engine, thus avoiding SQL (i.e. parsing it) and the values embedded in it. The closest solution was proposed by bobwienholt (using a prepared statement and binding parameters), but LOAD DATA .. INFILE turned out to be a bit faster in my case.
The best way to insert data in MS SQL without using INSERT INTO or UPDATE queries is just to use the MS SQL interface. Right-click on the table name and select "Edit Top 200 Rows". Then you will be able to add data to the database directly by typing into each cell. To enable searching, or to use SELECT or other SQL commands, just right-click on any of the 200 rows you have selected, go to Pane, then select SQL, and you can add an SQL command. Check it out. :D
Without using an INSERT statement, you can use "SQLite Studio" for inserting data into MySQL. It's free and open source, so you can download it and check it out.

Changing current mysql database in a procedure?

For our system we are using multiple databases with the same structure. For example, when we have 1000 customers, there will be 1000 databases. We've chosen to give each customer his own database, so we can delete all of his data at once without any hassle.
Now I have to update the database structure several times a year. So I began to write a stored procedure which loops through all schemas. But I got stuck with executing a dynamic USE statement.
My code is as follows:
DECLARE V_SCHEMA VARCHAR(100);
SET V_SCHEMA = 'SomeSchemaName';
SET #QUERYSTRING = CONCAT('USE ', V_SCHEMA);
PREPARE S FROM #QUERYSTRING;
EXECUTE S;
DEALLOCATE PREPARE S;
When I execute this code I get an error which says Error Code: 1295. This command is not supported in the prepared statement protocol yet. So I assume that I cannot change the active database in a procedure.
I have searched the internet, but the only thing I found was to build a string for each ALTER query and prepare/execute/deallocate it. I hope there is a better solution for this. I could write a shell script that loops through the schemas and executes a SQL file on them, but I prefer a stored procedure that takes care of this.
Does anyone know how to make this work?
Thank you for your help!
EDIT: I use the latest stable version of MySQL 5.6
If the set of databases is known in advance, you can write a CASE.
Otherwise, do not execute the USE statement via prepared statements; instead, build the other statements (SELECT, INSERT, UPDATE, ...) with fully qualified names - <database name> + '.' + <object name> - and execute those using prepared statements.
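For example, a minimal sketch of that pattern using the variables from the question (the users table and active column are made up; the point is only the schema-qualified name):
SET @QUERYSTRING = CONCAT('UPDATE `', V_SCHEMA, '`.users SET active = 1');
PREPARE S FROM @QUERYSTRING;
EXECUTE S;
DEALLOCATE PREPARE S;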
If you put your structure changes into a stored procedure in a temporary schema, you can do this within a Workbench SQL window.
You can build your iteration script using a query against the information_schema, e.g.
SELECT GROUP_CONCAT(CONCAT('USE ',schema_name,'; CALL tmp.Upgrade')
SEPARATOR ';\n') AS BldCode
FROM information_schema.schemata
WHERE schema_name NOT IN
('information_schema', 'performance_schema', 'mysql', 'sakila', 'world', 'tmp')
Since you cannot execute this as a prepared statement, you can copy the SQL result into a new SQL window, and run that.
Please note that the structure changes stored procedure would need to operate on the current schema, rather than specifying schemas.