How to quickly import a giant SQL script into MySQL?

I currently need to import a giant SQL script into MySQL. The script consists mainly of INSERT statements, but there are so many records that the file is around 80GB.
The machine has 8 CPUs and 20GB of memory. I ran something like:
mysql -h [host_ip_address] -u username -pxxxxxxxx -D databaseName < giant.sql
But the whole process takes several days, which is far too long. Are there any other options for importing the SQL file into the database?
Thanks so much.

I suggest you try LOAD DATA INFILE. It is extremely fast. I have not used it for loading to a remote server, but there is the mysqlimport utility. See a comparison of different approaches: https://dev.mysql.com/doc/refman/5.5/en/insert-speed.html.
You will also need to convert your SQL script to a format suitable for the LOAD DATA INFILE clause.

You can break the SQL file into several files (one per table) using a shell script, and then prepare another shell script that imports the files one by one. This should insert faster than importing everything in one go.
The reason is that the inserted records occupy memory in the single process and are not released. You can see this when importing the script: after 5 hours the query execution speed slows down.
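A minimal sketch of the splitting step, written in Python rather than shell. It cuts by size instead of by table, and it assumes every statement in the dump sits on its own line, so cutting at line boundaries never breaks a statement. The function name split_sql, the chunk file names and the 40MB default are made up for illustration.

```python
import os

def split_sql(path, out_dir, max_bytes=40 * 1024 * 1024):
    """Split a SQL script into numbered chunk files of roughly max_bytes each.

    Assumption: one statement per line, so a chunk never ends mid-statement.
    Returns the list of chunk paths in import order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    written = 0
    with open(path, "rb") as src:
        for line in src:
            # Start a new chunk for the first line or once the size limit is reached.
            if out is None or written >= max_bytes:
                if out is not None:
                    out.close()
                chunk_path = os.path.join(out_dir, "chunk_%05d.sql" % len(chunk_paths))
                out = open(chunk_path, "wb")
                chunk_paths.append(chunk_path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return chunk_paths
```

Each returned file can then be fed to the mysql client in turn. If the dump contains multi-line statements, the line-boundary assumption no longer holds and the cut points would need a real statement parser.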

Thanks for all your help, guys.
I took some of your advice and ran a comparison, and now it is time to post the results. The target was a single 15GB SQL script.
Overall, I tried:
1. Importing the data as a single SQL script with indexes. (It took days and I finally killed it. DO NOT TRY THIS YOURSELF, you will be pissed off.)
2. Importing the data as a single SQL script without indexes. (Same as above.)
3. Importing the data as split SQL scripts with indexes. (Taking the single SQL script as an example, I split the big file into small chunks of around 41MB each. Each chunk takes around 2m19.586s, total around ).
4. Importing the data as split SQL scripts without indexes. (Each chunk takes 2m9.326s.)
(Unfortunately I did not try the LOAD DATA method for this dataset.)
Conclusion:
If you have to import a giant SQL script into MySQL and do not want to use the LOAD DATA method, it is better to:
1. Divide it into small scripts;
2. Remove the indexes.
You can add the indexes back after importing. Cheers
Thanks @btilly @Hitesh Mundra

Put the following commands at the top of the giant.sql file:
SET AUTOCOMMIT = 0;
SET FOREIGN_KEY_CHECKS=0;
and the following at the end:
SET FOREIGN_KEY_CHECKS = 1;
COMMIT;
SET AUTOCOMMIT = 1;
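For an 80GB file, editing the script by hand is impractical, so the wrapping could be done by streaming the file into a new one. The following is only a sketch: the function name wrap_script and the output path are hypothetical, and it needs enough free disk space for a second copy of the script.

```python
import shutil

# The statements from the answer above, as raw bytes so the dump's own
# encoding is passed through untouched.
HEADER = b"SET AUTOCOMMIT = 0;\nSET FOREIGN_KEY_CHECKS = 0;\n"
# Leading newline guards against a source file that lacks a final newline.
FOOTER = b"\nSET FOREIGN_KEY_CHECKS = 1;\nCOMMIT;\nSET AUTOCOMMIT = 1;\n"

def wrap_script(src_path, dst_path):
    """Copy src_path to dst_path with HEADER prepended and FOOTER appended."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(HEADER)
        shutil.copyfileobj(src, dst)  # streams in blocks, never loads the whole file
        dst.write(FOOTER)
```

The wrapped file can then be imported with the same mysql command line as before.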

Related

import csv file with LOAD DATA LOCAL INFILE in symfony 1.4

I need to fill several tables from CSV files. I tried a loop that runs an INSERT for each row, but a file with 65,000 records takes more than 20 minutes.
I want to use the MySQL statement LOAD DATA LOCAL INFILE, but I received this message:
LOAD DATA LOCAL INFILE forbidden in C:\xampp\htdocs\myProject\apps\backend\modules\member\actions\actions.class.php on line 112
After a little research, I understand that one of PDO's security parameters (PDO::MYSQL_ATTR_LOCAL_INFILE) needs to be set to true.
In symfony2 you change it in your app's config.yml, but I can't find the equivalent in symfony 1.4.
Let me try to understand the question (or questions?!).
If you need to optimize the INSERT queries, you should batch them into a single INSERT query or a few of them, but definitely not run one per row. Besides, INSERT in MySQL will always be slow for a large amount of data, and the speed also depends on the indexing, the engine and the schema structure of the DB.
About the second question, take a look here, maybe it will help.
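On the first point, batching rows into a few multi-row INSERT statements could look roughly like the sketch below. It is written in Python purely for illustration (the question itself uses PHP), and the helper name batched_insert_sql is made up. It only builds the SQL text and the flat parameter list, using %s placeholders as accepted by drivers with the "format" parameter style such as MySQLdb. The table and column names are interpolated as-is, so they must come from trusted code, not user input.

```python
def batched_insert_sql(table, columns, rows, batch_size=1000):
    """Yield (sql, params) pairs, each inserting up to batch_size rows."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    prefix = "INSERT INTO %s (%s) VALUES " % (table, ", ".join(columns))
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield _build(prefix, row_placeholder, batch)
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield _build(prefix, row_placeholder, batch)

def _build(prefix, row_placeholder, batch):
    sql = prefix + ", ".join([row_placeholder] * len(batch))
    params = [value for row in batch for value in row]  # flatten row by row
    return sql, params
```

Each pair would be passed to a DB-API cursor as cursor.execute(sql, params). With 65,000 rows and the default batch size, that is 65 statements instead of 65,000.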

Big data migration from Oracle to MySQL

I received over 100GB of data with 67 million records from one of the retailers. My objective is to do some market-basket analysis and CLV. The data is a direct SQL dump of one table with 70 columns. I'm trying to find a way to extract information from this data, as managing it on a small laptop/desktop setup is becoming time-consuming. I considered the following options:
Parse the data and convert it to CSV format. The file size might come down to around 35-40GB, as more than half of the information in each record is column names. However, I may still have to use a db, as I can't use R or Excel with 66 million records.
Migrate the data to a MySQL db. Unfortunately I don't have the schema for the table, so I'm trying to recreate the schema by looking at the data. I may have to replace to_date() in the data dump with str_to_date() to match the MySQL format.
Is there a better way to handle this? All I need to do is extract the data from the SQL dump by running some queries. Hadoop etc. are options, but I don't have the infrastructure to set up a cluster. I'm considering MySQL as I have storage space and some memory to spare.
Suppose I go down the MySQL path, how would I import the data? I'm considering one of the following:
Use sed to replace to_date() with the appropriate str_to_date() inline. Note that I need to do this for a 100GB file. Then import the data using the mysql CLI.
Write a Python/Perl script that will read the file, convert the data and write to MySQL directly.
What would be faster? Thank you for your help.
In my opinion writing a script will be faster, because you are going to skip the sed part.
I think you need to set up a server on a separate PC and run the script from your laptop.
Also, use tail to quickly grab a part from the bottom of this large file, so you can test your script on that part before you run it on the whole 100GB file.
I decided to go down the MySQL path. I created the schema by looking at the data (I had to increase a few of the column sizes as there were unexpected variations in the data) and wrote a Python script using the MySQLdb module. The import completed in 4hr 40mins on my 2011 MacBook Pro with 8154 failures out of 67 million records. Those failures were mostly data issues. Both client and server are running on my MBP.
@kpopovbg, yes, writing a script was faster. Thank you.
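The to_date() to str_to_date() rewrite from the question could be folded into such a script. The sketch below is not the poster's actual code. It assumes each call has the shape to_date('value', 'format') with both arguments as quoted literals, and it maps only a small, assumed set of Oracle format tokens to MySQL specifiers. Any other token is left unchanged and would have to be added to the table.

```python
import re

# Assumed subset of Oracle date format tokens and their MySQL equivalents.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d", "HH24": "%H", "MI": "%i", "SS": "%s"}
# Longest tokens first so that HH24 and YYYY are matched before 2-letter tokens.
_TOKEN_RE = re.compile("|".join(sorted(_TOKENS, key=len, reverse=True)), re.IGNORECASE)
# to_date('<value>', '<format>') with quoted literals; '' inside the value is allowed.
_TO_DATE_RE = re.compile(r"to_date\(\s*('(?:[^']|'')*')\s*,\s*'([^']*)'\s*\)", re.IGNORECASE)

def convert_format(oracle_fmt):
    """Translate known Oracle tokens in a format string to MySQL specifiers."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0).upper()], oracle_fmt)

def convert_line(line):
    """Rewrite every to_date(...) call in one line of the dump."""
    def repl(m):
        return "str_to_date(%s, '%s')" % (m.group(1), convert_format(m.group(2)))
    return _TO_DATE_RE.sub(repl, line)
```

Because it is a plain regex, it would also rewrite the text to_date(...) if that happened to appear inside a string value, so it is worth testing on a slice of the file first, as suggested above.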

How to insert data to mysql directly (not using sql queries)

I have a MySQL database that I use only for logging. It consists of several simple look-alike MyISAM tables. There is always one local (i.e. located on the same machine) client that only writes data to db and several remote clients that only read data.
What I need is to insert bulk data from the local client as fast as possible.
I have already tried many approaches to make this faster, such as reducing the number of inserts by increasing the length of the values list, using LOAD DATA .. INFILE, and some others.
Now it seems to me that I've hit the limit of parsing values from a string into their target data type (it doesn't matter whether that happens when parsing queries or a text file).
So the question is:
Does MySQL provide some means of manipulating data directly for local clients (i.e. not using SQL)? Maybe there is some API that allows inserting data by simply passing a pointer.
Once again: I don't want to optimize SQL code or invoke the same queries in a script as hd1 advised. What I want is to pass a buffer of data directly to the database engine. This means I don't want to invoke SQL at all. Is it possible?
Use MySQL's LOAD DATA statement:
Write the data to a file in CSV format, then execute this SQL statement:
LOAD DATA INFILE 'somefile.csv' INTO TABLE mytable
For more info, see the documentation
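Note that when no FIELDS or LINES clause is given, as in the statement above, LOAD DATA expects tab-separated fields, newline-terminated lines, backslash escapes and \N for NULL, rather than comma-separated values. A sketch of writing rows in that default format from Python (the function names are made up for illustration):

```python
def format_field(value):
    """Render one value in LOAD DATA's default text format."""
    if value is None:
        return "\\N"  # the two characters \N are read back as NULL
    text = str(value)
    # Escape the backslash first, then the separator and line terminators.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def format_row(row):
    """Join the fields with tabs and terminate the line with a newline."""
    return "\t".join(format_field(v) for v in row) + "\n"
```

If the file really is comma-separated, the alternative is to keep the file as is and add a FIELDS TERMINATED BY ',' clause to the statement.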
Other than LOAD DATA INFILE, I'm not sure there is any other way to get data into MySQL without using SQL. If you want to avoid parsing multiple times, you should use a client library that supports parameter binding, so that the query can be parsed and prepared once and executed multiple times with different data.
However, I highly doubt that parsing the query is your bottleneck. Is this a dedicated database server? What kind of hard disks are being used? Are they fast? Does your RAID controller have battery backed RAM? If so, you can optimize disk writes. Why aren't you using InnoDB instead of MyISAM?
With MySQL you can insert multiple tuples with one insert statement. I don't have an example, because I did this several years ago and don't have the source anymore.
As mentioned, consider using one INSERT with multiple value lists:
INSERT INTO table_name (col1, col2) VALUES (1, 'A'), (2, 'B'), (3, 'C'), ( ... )
This way you only have to send the database one bigger query instead of several smaller ones. It's easier to take the entire couch through the door once than to run back and forth with all the disassembled pieces of the couch, opening the door every time. :)
Apart from that, you can also run LOCK TABLES table_name WRITE before the INSERT and UNLOCK TABLES afterwards. That ensures nothing else is inserted in the meantime.
Lock tables
INSERT into foo (foocol1, foocol2) VALUES ('foocol1val1', 'foocol2val1'),('foocol1val2','foocol2val2') and so on should sort you. More information and sample code will be found here. If you have further problems, do leave a comment.
UPDATE
If you don't want to use SQL, then try this shell script to do as many inserts as you want, put it in a file, say insertToDb.sh, and get on with your day/evening:
#!/bin/sh
mysql --user=me --password=foo dbname -h foo.example.com -e "insert into tablename (col1, col2) values ($1, $2);"
Invoke as sh insertToDb.sh col1value col2value. If I've still misunderstood your question, leave another comment.
After some investigation, I found no way of passing data directly to the MySQL database engine (without parsing it).
My aim was to speed up communication between the local client and the db server as much as possible. The idea was that a local client could use some API functions to pass data to the db engine, thus not using (i.e. parsing) SQL and the values in it. The closest solution was the one proposed by bobwienholt (using a prepared statement and binding parameters). But LOAD DATA .. INFILE turned out to be a bit faster in my case.
The best way to insert data on MS SQL without using insert into or update queries is to use the MS SQL interface directly. Right-click the table name and select "Edit top 200 rows". You can then add data to the database directly by typing into each cell. To enable searching or to use select or other SQL commands, right-click any of the 200 rows you have selected, go to Pane, then select SQL, and you can add an SQL command. Check it out. :D
To avoid insert statements, use "SQLite Studio" to insert data into MySQL. It's free and open source, so you can download it and check.

Importing 10 billion rows into mysql

I have a .csv file with 10 billion rows. I want to check that each row is unique. Is there an easy way to do this? I was thinking perhaps importing to mysql would allow me to find out uniqueness quickly. How can I upload this huge file to mysql? I have already tried row-by-row insert statements and also the 'LOAD DATA INFILE' command but both failed.
Thanks
I wouldn't use a database for this purpose, unless the data needed to end up in the database eventually. Assuming each row has the same formatting (so that you don't have "8.230" and "8.23", or extra spaces at the start/end of lines, for equal values), use a few of the text utilities included with most POSIX environments (Linux, Mac OS X), or available for Windows via GnuWin32 coreutils.
Here is the sequence of steps to do from your system shell. First, sort the file (this step is required):
sort ten.csv > ten_sorted.csv
Then find unique rows from sorted data:
uniq ten_sorted.csv > ten_uniq.csv
Now you can check to see how many rows there are in the final file:
wc ten_uniq.csv
Or you can just use pipes for combining the three steps with one command line:
sort ten.csv | uniq | wc
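If the goal is only to learn whether duplicates exist, a single pass over the already sorted file is enough, because equal rows end up next to each other. A minimal Python sketch of that check (the function name is made up; it compares raw lines, so the same formatting assumption as above applies):

```python
def count_adjacent_duplicates(lines):
    """Count lines that are identical to the line immediately before them.

    On sorted input this equals the number of redundant rows, i.e. the
    difference between the line counts before and after uniq.
    """
    duplicates = 0
    previous = None
    for line in lines:
        if line == previous:
            duplicates += 1
        previous = line
    return duplicates
```

It could be called on an open file object for ten_sorted.csv, which Python iterates line by line without loading the whole file. A result of 0 means every row is unique.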
Does the data have a unique identifier? Make this column the primary key in your MySQL table, and when you import the data, MySQL should throw an error if you have duplicates.
As for how to go about doing it, just read in the file row by row and do an insert for each row.
If you are importing from Excel or a similar program, see here for how to cleanse the CSV file before importing it into MySQL. Regarding the unique rows, as long as your table schema is right, MySQL should be able to take care of it.
EDIT:
Whether the source is Excel or not, LOAD DATA LOCAL INFILE appears to be the way to go.
10bn rows, and LOAD DATA LOCAL gives you an error? Are you sure there is no problem with the CSV file?
You have to split your data into separate small bite-size chunks. Use Big Dump.
http://www.ozerov.de/bigdump.php
If you do have 10 billion rows then you will struggle working with this data.
You would need to look at partitioning your database (ref here: about mysql partitioning)
However, even with that large number you would be requiring some serious hardware to cut through the work involved there.
Also, what would you do if a row was found to be non-unique? Would you want to continue importing the data? If you import the data, would you import the identical row or flag it as a duplicate? Would you stop processing?
This is the kind of job linux is "made for".
First you have to split the file into many smaller files:
split -l 100 filename
After this you have a few options with the two commands sort and uniq. I timed 8 different options on a file of 1 million IP addresses from an ad exchange log file, and found an almost 20x difference depending on whether LC_ALL=C is used. For example:
LC_ALL=C sort IP_1m_rows.txt > temp_file
LC_ALL=C uniq temp_file > IP_unique_rows.txt
real 0m1.283s
user 0m1.121s
sys 0m0.088s
Whereas the same without LC_ALL=C:
sort IP_1m_rows.txt > temp_file
uniq temp_file > IP_unique_rows.txt
real 0m24.596s
user 0m24.065s
sys 0m0.201s
Piping the command and using LC_ALL=C was 2x slower than the fastest:
LC_ALL=C sort IP_1m_rows.txt | uniq > IP_unique_rows.txt
real 0m3.532s
user 0m3.677s
sys 0m0.106s
Databases are not useful for one-off jobs like this, and flatfiles will get you surprisingly far even with more challenging / long-term objectives.

MySQL BinLog Statement Retrieval

I have seven 1G MySQL binlog files that I have to use to retrieve some "lost" information. I only need to get certain INSERT statements from the log (e.g. where the statement starts with "INSERT INTO table SET field1="). If I just run mysqlbinlog (even per database and using --short-form), I get a text file that is several hundred megabytes, which makes it almost impossible to then parse with any other program.
Is there a way to just retrieve certain sql statements from the log? I don't need any of the ancillary information (timestamps, autoincrement #s, etc.). I just need a list of sql statements that match a certain string. Ideally, I would like to have a text file that just lists those sql statements, such as:
INSERT INTO table SET field1='a';
INSERT INTO table SET field1='tommy';
INSERT INTO table SET field1='2';
I could get that by running mysqlbinlog to a text file and then parsing the results based upon a string, but the text file is way too big. It just times out any script I run and even makes it impossible to open in a text editor.
Thanks for your help in advance.
I never received an answer, but I will tell you what I did to get by.
1. Ran mysqlbinlog to a textfile
2. Created a PHP script that uses fgets to read each line of the log
3. While looping through each line, the script parses it using the stristr function
4. If the line matches the string I am looking for, it logs the line to a file
It takes a while to run mysqlbinlog and the PHP script, but it no longer times out. I originally used fread in PHP, but that reads the entire file into memory and caused the script to crash on large (1G) log files. Now, it takes several minutes to run (I also set the max_execution_time variable to be larger), but it works like a charm. fgets gets one line at a time, so it doesn't take up nearly as much memory.
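The same line-by-line approach, sketched in Python instead of PHP. It mirrors the fgets/stristr loop described above: one line is held in memory at a time and the match is a case-insensitive substring test. The function names and the file paths passed in are placeholders, not part of the original script.

```python
def filter_statements(lines, needle):
    """Yield the lines that contain needle, ignoring case (like PHP's stristr)."""
    needle = needle.lower()
    for line in lines:
        if needle in line.lower():
            yield line

def extract(log_path, out_path, needle):
    """Stream a mysqlbinlog text dump and write the matching lines to out_path."""
    # errors="replace" keeps the loop going if the dump contains bytes
    # that are not valid UTF-8.
    with open(log_path, "r", encoding="utf-8", errors="replace") as log, \
         open(out_path, "w", encoding="utf-8") as out:
        out.writelines(filter_statements(log, needle))
```

A call such as extract("binlog.txt", "inserts.sql", "INSERT INTO table SET field1=") would produce the kind of list shown in the question, assuming each statement occupies a single line in the dump.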