How can I INSERT 1 million entries to my MySQL DB? - mysql

I want to test the speed of my SQL queries (update queries) with a real "load" on my DB. I'm relatively new to databases, I'm writing more complex queries than I have before, and I'm getting scared by people talking about performance like "30 seconds for 3000 records to be updated". So I want a concrete experiment showing what my performance will be in production.
To achieve this, I want to add 10k, 100k, 1M, 10M records to my DB and then run my query.
My issue is, how can I do this? I have a "name" primary key field that must be unique, alphanumeric and <= 15 characters long. The other fields should be the same for all created entries (e.g. a "foo" field that I want to start at 10000).
If there's a way to do this and get approximately 1M entries (i.e. could be name collisions) that's fine. I'm just looking for a benchmarking dataset.
If there's a better way to benchmark my query, I'm all ears. I'm planning to simply execute and see how long the query says it takes.
Edit: It's worth noting that this is for a server and has nothing to do with "The Web" so I don't have access to PHP. I'm seeing some PHP scripts to populate, is there perhaps a way to have a perl script write out all these queries and then suck them in to the command line mysql tools?

I'm not sure of how to use just MySQL to accomplish this, but if you have access to PHP, then use this:
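<?php
// The mysql_* functions were removed in PHP 7, so this uses mysqli instead.
$start = time();
$interval = 10000000; // 10M
$con = mysqli_connect( 'server', 'user', 'pass', 'database' );
for ( $i = 0; $i < $interval; $i++ )
{
    // Placeholder statement: substitute your own table, fields and values.
    mysqli_query( $con, 'INSERT INTO `table` (fields) VALUES (values)' );
}
$endt = time();
$diff = ( $endt - $start );
// gmdate() renders the elapsed seconds as H:i:s (for runs under 24 hours);
// date( 'g:i:s' ) would apply the local timezone and a 12-hour clock.
print( "{$interval} queries took " . gmdate( 'H:i:s', $diff ) . " to execute." );
?>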

If you want to optimize queries, you should look into MySQL's EXPLAIN statement.
To populate your database, I would suggest writing your own little PHP script or checking out this one:
http://www.generatedata.com
Regarding your edit:
You could generate a big text file with Perl and then use the MySQL CLI to load the file into the table. For more info, please see:
http://dev.mysql.com/doc/refman/5.0/en/loading-tables.html

You just want to prepopulate your database so that you have something to run your queries against, and you are not benchmarking the initial insertion process?
In that case, just generate your input data as a tab-delimited file and use mysqlimport to populate your database.
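As a rough sketch of that approach (in Python rather than Perl, purely for illustration), the script below writes a tab-delimited file with unique alphanumeric names and a constant "foo" value of 10000. The function names, the six-character name length and the output file name are assumptions, not anything from the question.

```python
import itertools
import string

ALPHABET = string.ascii_lowercase + string.digits  # alphanumeric characters only


def unique_names(count, length=6):
    """Yield `count` distinct alphanumeric names of exactly `length` characters."""
    if count > len(ALPHABET) ** length:
        raise ValueError("length is too short to produce that many unique names")
    combos = itertools.product(ALPHABET, repeat=length)
    for combo in itertools.islice(combos, count):
        yield "".join(combo)


def write_rows(path, count, foo=10000):
    """Write `count` tab-delimited rows (name, foo) and return how many were written."""
    written = 0
    with open(path, "w", newline="\n") as out:
        for name in unique_names(count):
            out.write(f"{name}\t{foo}\n")
            written += 1
    return written
```

Calling write_rows("benchmark_table.txt", 1_000_000) would produce a file that something like `mysqlimport --local -u user -p database benchmark_table.txt` could load, since mysqlimport takes the table name from the file name without its extension. The table name here is hypothetical, and the server may need local file loading enabled.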

Related

How to process BIG (16GB + or 100M+ Rows) file and import into MySQL Database [duplicate]

How can I import a large (14 GB) MySQL dump file into a new MySQL database?
I've searched around, and only this solution helped me:
mysql -u root -p
set global net_buffer_length=1000000; -- Set network buffer length to a large byte number
set global max_allowed_packet=1000000000; -- Set maximum allowed packet size to a large byte number
SET foreign_key_checks = 0; -- Disable foreign key checking to avoid delays, errors and unwanted behaviour
-- Import your SQL dump file
source file.sql
SET foreign_key_checks = 1; -- Remember to re-enable foreign key checks when the procedure is complete!
The answer is found here.
Have you tried just using the mysql command line client directly?
mysql -u username -p -h hostname databasename < dump.sql
If you can't do that, there are any number of utilities you can find by Googling that help you import a large dump into MySQL, like BigDump
On a recent project we had the challenge of working with and manipulating a large collection of data. Our client provided us with 50 CSV files ranging from 30 MB to 350 MB in size, containing approximately 20 million rows and 15 columns of data in total. Our end goal was to import the data into a MySQL relational database and manipulate it there, to power a front-end PHP script that we also developed. Working with a dataset this large or larger is not the simplest of tasks, so we wanted to take a moment to share some of the things you should consider and know when working with large datasets like this.
Analyze Your Dataset Pre-Import
I can't stress this first step enough! Make sure that you take the time to analyze the data you are working with before importing it at all. Getting an understanding of what all of the data represents, which columns relate to which, and what type of manipulation you need to do will end up saving you time in the long run.
LOAD DATA INFILE is Your Friend
Importing large data files like the ones we worked with (and larger ones) can be tough if you try a regular CSV insert via a tool like phpMyAdmin. It will fail in many cases because your server won't be able to handle a file upload that large, due to upload size restrictions and server timeouts, and even if it does succeed, the process could take hours depending on your hardware. The LOAD DATA INFILE statement was created to handle these large datasets and will significantly reduce the time the import takes. Of note, it can be executed through phpMyAdmin, but you may still have file upload issues. In that case you can upload the files manually to your server and then execute the statement from phpMyAdmin (see their manual for more info) or via your SSH console (assuming you have your own server).
LOAD DATA INFILE '/mylargefile.csv' INTO TABLE temp_data FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
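If you generate the CSV yourself, its layout has to match the FIELDS and LINES clauses of the statement above. A minimal sketch, assuming Python's standard csv module and made-up sample rows: every field is comma-separated, wrapped in double quotes and terminated by "\n".

```python
import csv
import io


def write_csv(rows, out):
    """Write rows so every field is comma-separated, double-quoted and newline-terminated."""
    writer = csv.writer(out, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)


buffer = io.StringIO()
write_csv([(1, "alice"), (2, 'bob "the builder"')], buffer)
print(buffer.getvalue(), end="")
```

The csv module doubles any embedded double quote, which LOAD DATA reads back as a single quote character when ENCLOSED BY '"' is in effect. Fields containing backslashes would need extra care, because backslash is the default escape character for LOAD DATA.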
MYISAM vs InnoDB
Whether your database is large or small, it's always good to take a little time to consider which database engine you are going to use for your project. The two main engines you are going to read about are MYISAM and InnoDB, and each has its own benefits and drawbacks. In brief, the things to consider (in general) are as follows:
MYISAM
Lower Memory Usage
Allows for Full-Text Searching
Table Level Locking – Locks Entire Table on Write
Great for Read-Intensive Applications
InnoDB
Uses More Memory
No Full-Text Search Support before MySQL 5.6 (FULLTEXT indexes work on InnoDB from 5.6 onward)
Faster Performance
Row Level Locking – Locks Single Row on Write
Great for Read/Write Intensive Applications
Plan Your Design Carefully
Your database's design/structure is going to be a large factor in how it performs. Take your time when planning the different fields, and analyze the data to figure out the best field types, defaults and field lengths. You want to accommodate the right amounts of data and avoid varchar columns and overly large data types when the data doesn't warrant them. As an additional step after you are done with your database, you may want to see what MySQL suggests as field types for all of your different fields. On MySQL 5.7 and earlier you can do this with PROCEDURE ANALYSE, which was removed in MySQL 8.0 (ANALYZE TABLE is a different command that only refreshes key distribution statistics):
SELECT * FROM my_big_table PROCEDURE ANALYSE();
The result describes each column's contents along with a recommended data type and length. You don't necessarily need to follow the recommendations, as they are based solely on existing data, but they may help put you on the right track and get you thinking.
To Index or Not to Index
For a dataset as large as this it's very important to create proper indexes based on what you need to do with the data on the front-end, BUT if you plan to manipulate the data beforehand, refrain from placing too many indexes on it. Not only will they make your SQL table larger, but they will also slow down certain operations like adding columns, dropping columns and creating further indexes. With our dataset we needed to take the information we had just imported and break it into several different tables to create a relational structure, as well as split the information in certain columns into additional columns. We placed an index on the bare minimum of columns that we knew would help us with the manipulation. All in all, we took 1 large table consisting of 20 million rows of data and split its information into 6 different tables, with pieces of the main data in them along with newly created data based on the existing content. We did all of this by writing small PHP scripts to parse and move the data around.
Finding a Balance
A big part of working with large databases from a programming perspective is speed and efficiency. Getting all of the data into your database is great, but if the script you write to access the data is slow, what’s the point? When working with large datasets it’s extremely important that you take the time to understand all of the queries that your script is performing and to create indexes to help those queries where possible. One such way to analyze what your queries are doing is by executing the following SQL command:
EXPLAIN SELECT some_field FROM my_big_table WHERE another_field='MyCustomField';
By adding EXPLAIN to the start of your query, MySQL will report which indexes it considered, which one it actually used and how it used it. I labeled this point 'Finding a balance' because although indexes can help your script perform faster, they can just as easily make it run slower. You need to make sure you index what is needed and only what is needed. Every index consumes disk space and adds to the overhead of the table. Every time you change a row, MySQL has to update every index entry for that row, and the more indexes you have, the longer it will take. It all comes down to making smart indexes, writing efficient SQL queries and, most importantly, benchmarking as you go to understand what each of your queries is doing and how long it's taking to do it.
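To see the effect of an index on a query plan without a MySQL server at hand, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN and its output format differs, so treat this only as an illustration of the idea; the index name is an assumption.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan descriptions SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the human-readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_big_table (some_field TEXT, another_field TEXT)")
sql = "SELECT some_field FROM my_big_table WHERE another_field = ?"

before = query_plan(conn, sql, ("MyCustomField",))
conn.execute("CREATE INDEX idx_another_field ON my_big_table (another_field)")
after = query_plan(conn, sql, ("MyCustomField",))

print(before)  # reports a full table scan
print(after)   # reports a search that uses idx_another_field
```

On MySQL itself, the equivalent check is to run the EXPLAIN statement shown above before and after creating the index and compare the key and rows columns.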
Index On, Index Off
As we worked on the database and front-end script, both we and the client started to notice little things that needed changing and that required changes to the database. Some of these changes involved adding or removing columns and changing column types. As we had already set up a number of indexes on the data, making any of these changes required the server to do some serious work to keep the indexes in place and handle the modifications. On our small VPS server, some of the changes were taking upwards of 6 hours to complete, which was certainly not helpful for speedy development. The solution? Turn off indexes! Sometimes it's better to turn the indexes off, make your changes and then turn the indexes back on, especially if you have a lot of different changes to make. With the indexes off, the changes took a matter of seconds to minutes versus hours. When we were happy with our changes we simply turned our indexes back on. Re-indexing everything of course took quite some time, but it happened all at once, which took less time overall than maintaining the indexes through each change one by one. Here's how to do it:
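Disable Indexes: ALTER TABLE my_big_table DISABLE KEYS;
Enable Indexes: ALTER TABLE my_big_table ENABLE KEYS;
Note that DISABLE KEYS only affects nonunique indexes on MyISAM tables; it has no effect on InnoDB tables.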
Give MySQL a Tune-Up
Don’t neglect your server when it comes to making your database and script run quickly. Your hardware needs just as much attention and tuning as your database and script does. In particular it’s important to look at your MySQL configuration file to see what changes you can make to better enhance its performance.
Don’t be Afraid to Ask
Working with SQL can be challenging to begin with, and working with extremely large datasets only makes it that much harder. Don't be afraid to go to professionals who know what they are doing when it comes to large datasets. Ultimately you will end up with a superior product, quicker development and quicker front-end performance. When it comes to large databases, sometimes it takes a professional's experienced eye to find all the little caveats that could be slowing your database's performance.
I'm posting my finding because the responses I've seen didn't mention what I ran into, and apparently this would even defeat BigDump, so check it:
I was trying to load a 500 MB dump via the Linux command line and kept getting "MySQL server has gone away" errors. Settings in my.cnf didn't help. What turned out to be the cause: I was doing one big extended insert like:
insert into table (fields) values (a record, a record, a record, 500 meg of data);
I needed to format the file as separate inserts like this:
insert into table (fields) values (a record);
insert into table (fields) values (a record);
insert into table (fields) values (a record);
Etc.
And to generate the dump, I used something like this and it worked like a charm:
SELECT
id,
status,
email
FROM contacts
INTO OUTFILE '/tmp/contacts.sql'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES STARTING BY "INSERT INTO contacts (id,status,email) values ("
TERMINATED BY ');\n'
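A middle ground between one enormous extended insert and one insert per record is to cap the size of each statement. The sketch below is an assumption-laden illustration in Python, not the method used above: `sql_literal` only handles ints and strings, uses backslash escaping (valid under MySQL's default sql_mode), and is meant for trusted, self-generated data only; `max_bytes` is a hypothetical knob you would keep below the server's max_allowed_packet.

```python
def sql_literal(value):
    """Render an int or str as a SQL literal (sketch only: trusted data, default sql_mode)."""
    if isinstance(value, int):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def insert_statements(table, columns, rows, max_bytes=1_000_000):
    """Yield INSERT statements, starting a new one before max_bytes would be exceeded."""
    head = f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
    batch, size = [], len(head)
    for row in rows:
        tuple_sql = "(" + ", ".join(sql_literal(v) for v in row) + ")"
        needed = len(tuple_sql.encode("utf-8")) + 2  # +2 for the ", " separator
        if batch and size + needed > max_bytes:
            yield head + ", ".join(batch) + ";"
            batch, size = [], len(head)
        batch.append(tuple_sql)
        size += needed
    if batch:
        yield head + ", ".join(batch) + ";"


sample = [(1, "active", "a@example.com"), (2, "inactive", "b@example.com")]
for statement in insert_statements("contacts", ["id", "status", "email"], sample):
    print(statement)
```

A single row larger than max_bytes is still emitted as its own statement, so the cap is a target rather than a guarantee.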
Use the source command to import a large DB:
mysql -u username -p
> source sqldbfile.sql
This can import any large DB.
I have made a PHP script designed to import large database dumps generated by phpMyAdmin or mysqldump (from cPanel). It's called PETMI and you can download it here [project page] [gitlab page].
It works by splitting an .sql file into smaller files, each called a split, and processing the splits one at a time. Splits which fail to process can be imported manually by the user in phpMyAdmin. This is easy to program because in SQL dumps each command is on a new line. Some things in SQL dumps work in phpMyAdmin imports but not in mysqli_query, so those lines are stripped from the splits.
It has been tested with a 1 GB database. It has to be uploaded to an existing website. PETMI is open source and the sample code can be seen on GitLab.
A moderator asked me to provide some sample code. I'm on a phone so excuse the formatting.
Here is the code that creates the splits.
//gets the config page
if (isset($_POST['register']) && $_POST['register'])
{
echo " <img src=\"loading.gif\">";
$folder = "split/";
include ("config.php");
$fh = fopen("importme.sql", 'a') or die("can't open file");
$stringData = "-- --------------------------------------------------------";
fwrite($fh, $stringData);
fclose($fh);
$file2 = fopen("importme.sql","r");
//echo "<br><textarea class=\"mediumtext\" style=\"width: 500px; height: 200px;\">";
$danumber = "1";
while(! feof($file2)){
//echo fgets($file2)."<!-- <br /><hr color=\"red\" size=\"15\"> -->";
$oneline = fgets($file2); //this is fgets($file2) but formatted nicely
//echo "<br>$oneline";
$findme1 = '-- --------------------------------------------------------';
$pos1 = strpos($oneline, $findme1);
$findme2 = '-- Table structure for';
$pos2 = strpos($oneline, $findme2);
$findme3 = '-- Dumping data for';
$pos3 = strpos($oneline, $findme3);
$findme4 = '-- Indexes for dumped tables';
$pos4 = strpos($oneline, $findme4);
$findme5 = '-- AUTO_INCREMENT for dumped tables';
$pos5 = strpos($oneline, $findme5);
if ($pos1 === false && $pos2 === false && $pos3 === false && $pos4 === false && $pos5 === false) {
// setcookie("filenumber",$i);
// if ($danumber2 == ""){$danumber2 = "0";} else { $danumber2 = $danumber2 +1;}
$ourFileName = "split/sql-split-$danumber.sql";
// echo "writing danumber is $danumber";
$ourFileHandle = fopen($ourFileName, 'a') or die("can't edit file. chmod directory to 777");
$stringData = $oneline;
$stringData = preg_replace("/\/[*][!\d\sA-Za-z#_='+:,]*[*][\/][;]/", "", $stringData);
$stringData = preg_replace("/\/[*][!]*[\d A-Za-z`]*[*]\/[;]/", "", $stringData);
$stringData = preg_replace("/DROP TABLE IF EXISTS `[a-zA-Z]*`;/", "", $stringData);
$stringData = preg_replace("/LOCK TABLES `[a-zA-Z` ;]*/", "", $stringData);
$stringData = preg_replace("/UNLOCK TABLES;/", "", $stringData);
fwrite($ourFileHandle, $stringData);
fclose($ourFileHandle);
} else {
//write new file;
if ($danumber == ""){$danumber = "1";} else { $danumber = $danumber +1;}
$ourFileName = "split/sql-split-$danumber.sql";
//echo "$ourFileName has been written with the contents above.\n";
$ourFileName = "split/sql-split-$danumber.sql";
$ourFileHandle = fopen($ourFileName, 'a') or die("can't edit file. chmod directory to 777");
$stringData = "$oneline";
fwrite($ourFileHandle, $stringData);
fclose($ourFileHandle);
}
}
//echo "</textarea>";
fclose($file2);
} // closes the if (isset($_POST['register']) ...) block opened at the top
Here is the code that imports the split
<?php
ob_start();
// allows you to use cookies
include ("config.php");
//gets the config page
if (isset($_POST['register']))
{
echo "<div id=\"sel1\"><img src=\"loading.gif\"></div>";
// the above line checks to see if the html form has been submitted
$dbhost = $accesshost;
$dbuser = $username;
$dbpasswd = $password;
$dbname = $database;
$table_prefix = $dbprefix;
//the above lines set variables with the user submitted information
//none were left blank! We continue...
//echo "$importme";
echo "<hr>";
$importme = "$_GET[file]";
$importme = file_get_contents($importme);
//echo "<b>$importme</b><br><br>";
$sql = $importme;
$findme1 = '-- Indexes for dumped tables';
$pos1 = strpos($importme, $findme1);
$findme2 = '-- AUTO_INCREMENT for dumped tables';
$pos2 = strpos($importme, $findme2);
$dbhost = '';
#set_time_limit(0);
if($pos1 !== false){
$splitted = explode("-- Indexes for table", $importme);
// print_r($splitted);
for($i=0;$i<count($splitted);$i++){
$sql = $splitted[$i];
$sql = preg_replace("/[`][a-z`\s]*[-]{2}/", "", $sql);
// echo "<b>$sql</b><hr>";
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
}
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
} elseif($pos2 !== false){
$splitted = explode("-- AUTO_INCREMENT for table", $importme);
// print_r($splitted);
for($i=0;$i<count($splitted);$i++){
$sql = $splitted[$i];
$sql = preg_replace("/[`][a-z`\s]*[-]{2}/", "", $sql);
// echo "<b>$sql</b><hr>";
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
}
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
} else {
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
}
//echo 'done (', count($sql), ' queries).';
}
A simple solution is to run this command:
mysql -h yourhostname -u username -p databasename < yoursqlfile.sql
And if you want to import with progress bar, try this:
pv yoursqlfile.sql | mysql -uxxx -pxxxx databasename
I have found the SSH commands below robust for exporting/importing huge MySQL databases; I have been using them for years. Never rely on backups generated via control panels like cPanel WHM, CWP, OVIPanel, etc.; they may cause trouble, especially when you're switching between control panels. Always trust SSH.
[EXPORT]
$ mysqldump -u root -p example_database | gzip > example_database.sql.gz
[IMPORT]
$ gunzip < example_database.sql.gz | mysql -u root -p example_database
I have long tried to find a good solution to this question, and I finally think I have one. max_allowed_packet can be raised well above its default (the maximum is 1 GB), so go ahead and set max_allowed_packet=300M in my.cnf.
Now, doing mysql> source sql.file will not do anything better, because in the dump files the insert statements are broken into pieces of about 1 MB. So the insert count for my 45 GB file is roughly 45 GB / 1 MB.
To get around this I parse the SQL file with PHP and rebuild the insert statements at the size I want. In my case I have set the packet size to 100 MB, so I make the insert string a little smaller than that. On another machine I have a packet size of 300M and do inserts of 200M, and it works.
Since the total size of all tables is ~1.2 TB, I export database by database and table by table, so I have one SQL file per table. If yours is different you have to adjust the code accordingly.
<?php
global $destFile, $tableName;
function writeOutFile(&$arr)
{
echo " [count: " . count($arr) .']';
if(empty($arr))return;
global $destFile, $tableName;
$data='';
//~ print_r($arr);
foreach($arr as $i=>$v)
{
$v = str_replace(";\n", '', $v);
//~ $v = str_replace("),(", "),\n(", $v);
$line = ($i==0? $v: str_replace("INSERT INTO `$tableName` VALUES",',', $v));
$data .= $line;
}
$data .= ";\n";
file_put_contents($destFile, $data, FILE_APPEND);
}
$file = '/path/to/sql.file';
$tableName = 'tablename';
$destFile = 'localfile.name';
file_put_contents($destFile, null);
$loop=0;
$arr=[];
$fp = fopen($file, 'r');
while(!feof($fp))
{
$line = fgets($fp);
if(strpos($line, "INSERT INTO `")!==false)$arr[]=$line;
else
{writeOutFile($arr); file_put_contents($destFile, $line, FILE_APPEND);$arr=[];continue;}
$loop++;
if(count($arr)==95){writeOutFile($arr);$arr=[];}
echo "\nLine: $loop, ". count($arr);
}
?>
How this works for you will depend on your hardware, but all else being equal, this process speeds up my imports dramatically. I don't have any benchmarks to share; it's my working experience.
According to the MySQL documentation, none of the above works, so pay attention!
Say we want to upload test.sql into test_db.
Type this into the shell:
mysql --user=user_name --password=yourpassword test_db < d:/test.sql
If you import through phpMyAdmin instead, navigate to C:\wamp64\alias\phpmyadmin.conf and change from:
php_admin_value upload_max_filesize 128M
php_admin_value post_max_size 128M
to
php_admin_value upload_max_filesize 2048M
php_admin_value post_max_size 2048M
or more
:)

MySQL slow inserts

I have a MySql 8 database server running locally on a decent spec gaming laptop.
When I try to insert a number of records (500k) stored in an in-memory structure in Java via the following code it is extremely slow, we are talking about maybe 1000 records per minute at most.
I don't really know where to look for debug information or what metrics I should provide here to help answer this post, so if you have any guidance or additional information I can supply then please do let me know.
I've temporarily worked around this by saving the data to an in-memory H2 database but really I'd like to persist it and query it at leisure using the MySql workbench.
PreparedStatement ps = conn.prepareStatement(insertSQL);
ArrayList<String> stocks = new ArrayList<String>();
for (Spread spread : spreads) {
ps.setInt(1, spread.buyFeedId);
ps.setInt(2, spread.sellFeedId);
ps.setString(3, spread.stock);
ps.setString(4, spread.buyExchange);
ps.setString(5, spread.sellExchange);
ps.setTimestamp(6, new Timestamp(spread.spreadDateTime.toInstant().toEpochMilli()));
ps.setDouble(7, spread.buyPrice);
ps.setDouble(8, spread.sellPrice);
ps.setDouble(9, spread.diff);
ps.setInt(10, spread.askSize);
ps.setInt(11, spread.bidSize);
ps.execute();
}
In PHP we have the PDO library, which lets you bulk the inserts into one transaction like this:
$sql = 'my sql statement';
$Conn = DAO::getConnection();
$stmt = $Conn->prepare($sql);
$Conn->beginTransaction();
foreach($data as $row)
{
// now loop through each inner array to match bound values
foreach($row as $column => $value)
{
$stmt->bindValue(':' . $column, $value, PDOUtils::getPDOParam($value));
}
$stmt->execute();
}
$Conn->commit();
In your case, with 1000+ inserts, only one transaction would be needed. I'm not a Java developer, but there is surely an equivalent.
If you have all 500K records at hand and just want to insert them, don't insert row by row; use a bulk-load facility instead (the MySQL counterpart of a tool like BCP, the Bulk Copy Program).
Read about mysqlimport or the LOAD DATA INFILE statement and try implementing one of them.
Batch insert the rows. That is, build a single INSERT for each 100-1000 rows. This will run about 10 times as fast. Using a "transaction" is a less effective way of batching.
Here's one discussion of Java + Batch: https://www.viralpatel.net/batch-insert-in-java-jdbc/
Search this for other possible answers on this site:
site:stackoverflow.com java MySQL batch INSERT
Another speedup is to change to
innodb_flush_log_at_trx_commit = 2
That is a speed vs reliability tradeoff -- in favor of speed.
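The "single INSERT per 100-1000 rows" advice can be sketched as follows. This is an illustration in Python, not the asker's Java: it assumes a DB-API cursor whose driver uses %s placeholders (as common MySQL drivers for Python do), and the table and column names are made up. In JDBC the closest built-in counterpart is addBatch() with executeBatch() on the PreparedStatement.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def multi_row_insert(table, columns, rows):
    """Build one INSERT with a %s placeholder per value, plus the flat parameter list."""
    row_placeholders = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
           + ", ".join([row_placeholders] * len(rows)))
    params = [value for row in rows for value in row]
    return sql, params


def insert_in_batches(cursor, table, columns, rows, batch_size=500):
    """Execute one multi-row INSERT per batch; `cursor` is any DB-API cursor using %s."""
    for batch in chunked(rows, batch_size):
        sql, params = multi_row_insert(table, columns, batch)
        cursor.execute(sql, params)
```

With 500k rows and batch_size=500 this issues 1,000 statements instead of 500,000; committing once at the end, rather than after every statement, avoids a log flush per row.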

How to get data from big table row by row

I need to get all data from a MySQL table. What I've tried so far is:
my $query = $connection->prepare("select * from table");
$query->execute();
while (my @row = $query->fetchrow_array)
{
    print format_row(@row);
}
but there is always a but...
The table has about 600M rows, and apparently all results from the query are stored in memory after the execute() command. There is not enough memory for this :(
My question is:
Is there a way to use Perl DBI to get data from the table row by row? Something like this:
my $query = $connection->prepare("select * from table");
while (my @row = $query->fetchrow_array)
{
    #....do stuff
}
By the way, pagination is too slow :/
apparently all results from the query are stored in memory after the execute() command
That is the default behaviour of the mysql client library. You can disable it by using the mysql_use_result attribute on the database or statement handle.
Note that the read lock you'll have on the table will be held much longer while all the rows are being streamed to the client code. If that might be a concern you may want to use SQL_BUFFER_RESULT.
The fetchall_arrayref method takes two parameters, the second of which allows you to limit the number of rows fetched from the table at once.
The following code reads 1,000 rows from the table at a time and processes each one:
my $sth = $dbh->prepare("SELECT * FROM table");
$sth->execute;
while ( my $chunk = $sth->fetchall_arrayref( undef, 1000 ) ) {
    last unless @$chunk; # Empty array returned at end of table
    for my $row ( @$chunk ) {
        print format_row(@$row);
    }
}
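For comparison, the same chunked-fetch pattern in Python's DB-API looks roughly like this. It is a sketch that uses the built-in sqlite3 module and a throwaway table `t` as stand-ins so it can run anywhere. With MySQL, the driver may still buffer the whole result set on the client unless you ask for an unbuffered (streaming) cursor, which is the counterpart of mysql_use_result mentioned above.

```python
import sqlite3


def iter_rows(cursor, chunk_size=1000):
    """Yield rows one at a time while fetching them from the cursor in chunks."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:  # an empty list signals the end of the result set
            return
        yield from chunk


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)", [(f"row{i}",) for i in range(2500)])

cursor = conn.execute("SELECT id, name FROM t ORDER BY id")
total = sum(1 for _ in iter_rows(cursor, chunk_size=1000))
print(total)  # 2500
```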
When working with huge tables I build data packages with dynamically built SQL statements like
$sql = "SELECT * FROM table WHERE id>" . $lastid . " ORDER BY id LIMIT " . $packagesize;
The application dynamically fills in $lastid according to each package it processes.
If the table has an ID field id, it also has an index built on that field, so the performance is quite good.
It also limits database load if you take little rests between queries.
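The package-by-package idea above can be sketched like this, again using Python's built-in sqlite3 module and a throwaway table `t` as stand-ins for the real MySQL table. The function name and the assumption that ids are positive integers are mine.

```python
import sqlite3


def iter_by_id(conn, package_size=1000):
    """Yield all rows of table `t` in id order, one bounded query per package."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, name FROM t WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, package_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]  # resume after the highest id seen so far


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 26)])
ids = [row[0] for row in iter_by_id(conn, package_size=10)]
print(ids == list(range(1, 26)))  # True
```

Unlike LIMIT with a growing OFFSET, each query here starts from an index lookup on id, so later packages cost about the same as the first. Passing the values as bound parameters also avoids building the SQL by string concatenation.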

How can I import a large (14 GB) MySQL dump file into a new MySQL database?

How can I import a large (14 GB) MySQL dump file into a new MySQL database?
I've searched around, and only this solution helped me:
mysql -u root -p
set global net_buffer_length=1000000; --Set network buffer length to a large byte number
set global max_allowed_packet=1000000000; --Set maximum allowed packet size to a large byte number
SET foreign_key_checks = 0; --Disable foreign key checking to avoid delays,errors and unwanted behaviour
source file.sql --Import your sql dump file
SET foreign_key_checks = 1; --Remember to enable foreign key checks when procedure is complete!
The answer is found here.
Have you tried just using the mysql command line client directly?
mysql -u username -p -h hostname databasename < dump.sql
If you can't do that, there are any number of utilities you can find by Googling that help you import a large dump into MySQL, like BigDump
On a recent project we had the challenge of working with and manipulating a large collection of data. Our client provided us with a 50 CSV files ranging from 30 MB to 350 MB in size and all in all containing approximately 20 million rows of data and 15 columns of data. Our end goal was to import and manipulate the data into a MySQL relational database to be used to power a front-end PHP script that we also developed. Now, working with a dataset this large or larger is not the simplest of tasks and in working on it we wanted to take a moment to share some of the things you should consider and know when working with large datasets like this.
Analyze Your Dataset Pre-Import
I can’t stress this first step enough! Make sure that you take the time to analyze the data you are working with before importing it at all. Getting an understand of what all of the data represents, what columns related to what and what type of manipulation you need to will end up saving you time in the long run.
LOAD DATA INFILE is Your Friend
Importing large data files like the ones we worked with (and larger ones) can be tough to do if you go ahead and try a regular CSV insert via a tool like PHPMyAdmin. Not only will it fail in many cases because your server won’t be able to handle a file upload as large as some of your data files due to upload size restrictions and server timeouts, but even if it does succeed, the process could take hours depending our your hardware. The SQL function LOAD DATA INFILE was created to handle these large datasets and will significantly reduce the time it takes to handle the import process. Of note, this can be executed through PHPMyAdmin, but you may still have file upload issues. In that case you can upload the files manually to your server and then execute from PHPMyAdmin (see their manual for more info) or execute the command via your SSH console (assuming you have your own server)
LOAD DATA INFILE '/mylargefile.csv' INTO TABLE temp_data FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
MYISAM vs InnoDB
Large or small database it’s always good to take a little time to consider which database engine you are going to use for your project. The two main engines you are going to read about are MYISAM and InnoDB and each has their own benefits and drawbacks. In brief the things to consider (in general) are as follows:
MYISAM
Lower Memory Usage
Allows for Full-Text Searching
Table Level Locking – Locks Entire Table on Write
Great for Read-Intensive Applications
InnoDB
List item
Uses More Memory
No Full-Text Search Support
Faster Performance
Row Level Locking – Locks Single Row on Write
Great for Read/Write Intensive Applications
Plan Your Design Carefully
MySQL AnalyzeYour databases design/structure is going to be a large factor in how it performs. Take your time when it comes to planning out the different fields and analyze the data to figure out what the best field types, defaults and field length. You want to accommodate for the right amounts of data and try to avoid varchar columns and overly large data types when the data doesn’t warrant it. As an additional step after you are done with your database, you make want to see what MySQL suggests as field types for all of your different fields. You can do this by executing the following SQL command:
ANALYZE TABLE my_big_table
The result will be a description of each columns information along with a recommendation for what type of datatype it should be along with a proper length. Now you don’t necessarily need to follow the recommendations as they are based solely on existing data, but it may help put you on the right track and get you thinking
To Index or Not to Index
For a dataset as large as this it’s infinitely important to create proper indexes on your data based off of what you need to do with the data on the front-end, BUT if you plan to manipulate the data beforehand refrain from placing too many indexes on the data. Not only will it will make your SQL table larger, but it will also slow down certain operations like column additions, subtractions and additional indexing. With our dataset we needed to take the information we just imported and break it into several different tables to create a relational structure as well as take certain columns and split the information into additional columns. We placed an index on the bare minimum of columns that we knew would help us with the manipulation. All in all, we took 1 large table consisting of 20 million rows of data and split its information into 6 different tables with pieces of the main data in them along with newly created data based off the existing content. We did all of this by writing small PHP scripts to parse and move the data around.
Finding a Balance
A big part of working with large databases from a programming perspective is speed and efficiency. Getting all of the data into your database is great, but if the script you write to access the data is slow, what’s the point? When working with large datasets it’s extremely important that you take the time to understand all of the queries that your script is performing and to create indexes to help those queries where possible. One such way to analyze what your queries are doing is by executing the following SQL command:
EXPLAIN SELECT some_field FROM my_big_table WHERE another_field='MyCustomField';
By adding EXPLAIN to the start of your query MySQL will spit out information describing what indexes it tried to use, did use and how it used them. I labeled this point ‘Finding a balance’ because although indexes can help your script perform faster, they can just as easily make it run slower. You need to make sure you index what is needed and only what is needed. Every index consumes disk space and adds to the overhead of the table. Every time you make an edit to your table, you have to rebuild the index for that particular row and the more indexes you have on those rows, the longer it will take. It all comes down to making smart indexes, efficient SQL queries and most importantly benchmarking as you go to understand what each of your queries is doing and how long it’s taking to do it.
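To see the effect of an index on a query plan without a MySQL server at hand, here is a minimal sketch in Python. Note the swap: it uses SQLite's EXPLAIN QUERY PLAN from the standard sqlite3 module as a stand-in for MySQL's EXPLAIN, so the output format differs from what MySQL prints. The table and column names come from the query above; the index name idx_another_field and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_big_table (some_field TEXT, another_field TEXT)")
# Made-up sample rows, only so the planner has a table to reason about.
conn.executemany(
    "INSERT INTO my_big_table VALUES (?, ?)",
    [(f"value{i}", f"key{i % 100}") for i in range(10_000)],
)

QUERY = "SELECT some_field FROM my_big_table WHERE another_field = 'MyCustomField'"


def query_plan(connection, sql):
    """Return the plan steps SQLite reports for `sql` (SQLite, not MySQL, output)."""
    # Each row is a 4-tuple; the last column is the human-readable description.
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = query_plan(conn, QUERY)  # full table scan
conn.execute("CREATE INDEX idx_another_field ON my_big_table (another_field)")
plan_with_index = query_plan(conn, QUERY)     # search via idx_another_field

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of the whole table and the second a search that names the index, which is the same before/after comparison you would make with EXPLAIN in MySQL.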
Index On, Index Off
As we worked on the database and front-end script, both the client and us started to notice little things that needed changing and that required us to make changes to the database. Some of these changes involved adding/removing columns and changing the column types. As we had already setup a number of indexes on the data, making any of these changes required the server to do some serious work to keep the indexes in place and handle any modifications. On our small VPS server, some of the changes were taking upwards of 6 hours to complete…certainly not helpful to us being able to do speedy development. The solution? Turn off indexes! Sometimes it’s better to turn the indexes off, make your changes and then turn the indexes back on….especially if you have a lot of different changes to make. With the indexes off, the changes took a matter of seconds to minutes versus hours. When we were happy with our changes we simply turned our indexes back on. This of course took quite some time to re-index everything, but it was at least able to re-index everything all at once, reducing the overall time needed to make these changes one by one. Here’s how to do it:
Disable Indexes: ALTER TABLE my_big_table DISABLE KEYS;
Enable Indexes: ALTER TABLE my_big_table ENABLE KEYS;
Note that DISABLE KEYS only affects non-unique indexes on MyISAM tables.
Give MySQL a Tune-Up
Don’t neglect your server when it comes to making your database and script run quickly. Your hardware needs just as much attention and tuning as your database and script does. In particular it’s important to look at your MySQL configuration file to see what changes you can make to better enhance its performance.
Don’t be Afraid to Ask
Working with SQL can be challenging to begin with, and working with extremely large datasets only makes it that much harder. Don’t be afraid to go to professionals who know what they are doing when it comes to large datasets. Ultimately you will end up with a superior product, quicker development and quicker front-end performance. When it comes to large databases, sometimes it takes a professional’s experienced eyes to find all the little caveats that could be slowing your database’s performance.
I'm posting my finding because the few responses I've seen didn't mention what I ran into, and apparently this would even defeat BigDump, so check it out:
I was trying to load a 500 meg dump via the Linux command line and kept getting "MySQL server has gone away" errors. Settings in my.cnf didn't help. What turned out to be the problem: I was doing one big extended insert like:
insert into table (fields) values (a record), (a record), (a record), ... 500 meg of data;
I needed to format the file as separate inserts like this:
insert into table (fields) values (a record);
insert into table (fields) values (a record);
insert into table (fields) values (a record);
Etc.
And to generate the dump, I used something like this and it worked like a charm:
SELECT
id,
status,
email
FROM contacts
INTO OUTFILE '/tmp/contacts.sql'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES STARTING BY "INSERT INTO contacts (id,status,email) values ("
TERMINATED BY ');\n'
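If one insert per row turns out to be too slow and one giant extended insert is too big, a middle ground is to group rows into multi-row inserts that stay under a size budget. Here is a rough sketch in Python; batched_inserts, sql_literal and the max_bytes parameter are names I made up for illustration, and the quoting only handles numbers, NULL and plain strings, so treat it as a starting point rather than a safe general-purpose escaper.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (sketch: numbers, None, strings)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first, then single quotes, MySQL-style.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def batched_inserts(table, columns, rows, max_bytes=1_000_000):
    """Yield multi-row INSERT statements, starting a new one before max_bytes is exceeded.

    A single row larger than max_bytes is still emitted, alone in its own statement.
    """
    prefix = f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
    empty_size = len(prefix.encode("utf-8")) + 1  # +1 for the closing semicolon
    batch, size = [], empty_size
    for row in rows:
        tup = "(" + ", ".join(sql_literal(v) for v in row) + ")"
        cost = len(tup.encode("utf-8")) + 1  # +1 for the separating comma
        if batch and size + cost > max_bytes:
            yield prefix + ",".join(batch) + ";"
            batch, size = [], empty_size
        batch.append(tup)
        size += cost
    if batch:
        yield prefix + ",".join(batch) + ";"


if __name__ == "__main__":
    sample = [(1, "active", "a@example.com"), (2, "inactive", "b@example.com")]
    for statement in batched_inserts("contacts", ["id", "status", "email"], sample):
        print(statement)
```

Pick max_bytes comfortably below the server's max_allowed_packet, since that setting is what a too-large statement runs into.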
Use the source command to import a large DB:
mysql -u username -p
> source sqldbfile.sql
This can import even a very large DB.
I have made a PHP script which is designed to import large database dumps which have been generated by phpmyadmin or mysql dump (from cpanel) . It's called PETMI and you can download it here [project page] [gitlab page].
It works by splitting an .sql file into smaller files called splits and processing each split one at a time. Splits which fail to process can be processed manually by the user in phpmyadmin. This is easy to program because in sql dumps each command is on a new line. Some things in sql dumps work in phpmyadmin imports but not in mysqli_query, so those lines have been stripped from the splits.
It has been tested with a 1GB database. It has to be uploaded to an existing website. PETMI is open source and the sample code can be seen on Gitlab.
A moderator asked me to provide some sample code. I'm on a phone so excuse the formatting.
Here is the code that creates the splits.
//gets the config page
if (isset($_POST['register']) && $_POST['register'])
{
echo " <img src=\"loading.gif\">";
$folder = "split/";
include ("config.php");
$fh = fopen("importme.sql", 'a') or die("can't open file");
$stringData = "-- --------------------------------------------------------";
fwrite($fh, $stringData);
fclose($fh);
$file2 = fopen("importme.sql","r");
//echo "<br><textarea class=\"mediumtext\" style=\"width: 500px; height: 200px;\">";
$danumber = "1";
while(! feof($file2)){
//echo fgets($file2)."<!-- <br /><hr color=\"red\" size=\"15\"> -->";
$oneline = fgets($file2); //this is fgets($file2) but formatted nicely
//echo "<br>$oneline";
$findme1 = '-- --------------------------------------------------------';
$pos1 = strpos($oneline, $findme1);
$findme2 = '-- Table structure for';
$pos2 = strpos($oneline, $findme2);
$findme3 = '-- Dumping data for';
$pos3 = strpos($oneline, $findme3);
$findme4 = '-- Indexes for dumped tables';
$pos4 = strpos($oneline, $findme4);
$findme5 = '-- AUTO_INCREMENT for dumped tables';
$pos5 = strpos($oneline, $findme5);
if ($pos1 === false && $pos2 === false && $pos3 === false && $pos4 === false && $pos5 === false) {
// setcookie("filenumber",$i);
// if ($danumber2 == ""){$danumber2 = "0";} else { $danumber2 = $danumber2 +1;}
$ourFileName = "split/sql-split-$danumber.sql";
// echo "writing danumber is $danumber";
$ourFileHandle = fopen($ourFileName, 'a') or die("can't edit file. chmod directory to 777");
$stringData = $oneline;
$stringData = preg_replace("/\/[*][!\d\sA-Za-z#_='+:,]*[*][\/][;]/", "", $stringData);
$stringData = preg_replace("/\/[*][!]*[\d A-Za-z`]*[*]\/[;]/", "", $stringData);
$stringData = preg_replace("/DROP TABLE IF EXISTS `[a-zA-Z]*`;/", "", $stringData);
$stringData = preg_replace("/LOCK TABLES `[a-zA-Z` ;]*/", "", $stringData);
$stringData = preg_replace("/UNLOCK TABLES;/", "", $stringData);
fwrite($ourFileHandle, $stringData);
fclose($ourFileHandle);
} else {
//write new file;
$danumber = $danumber + 1; //move on to the next split file
$ourFileName = "split/sql-split-$danumber.sql";
$ourFileHandle = fopen($ourFileName, 'a') or die("can't edit file. chmod directory to 777");
$stringData = "$oneline";
fwrite($ourFileHandle, $stringData);
fclose($ourFileHandle);
}
}
//echo "</textarea>";
fclose($file2);
} //closes the if (isset($_POST['register'])) block
Here is the code that imports the split
<?php
ob_start();
// allows you to use cookies
include ("config.php");
//gets the config page
if (isset($_POST['register']))
{
echo "<div id=\"sel1\"><img src=\"loading.gif\"></div>";
// the above line checks to see if the html form has been submitted
$dbhost = $accesshost;
$dbuser = $username;
$dbpasswd = $password;
$dbname = $database;
$table_prefix = $dbprefix;
//the above lines set variables with the user submitted information
//none were left blank! We continue...
//echo "$importme";
echo "<hr>";
$importme = "$_GET[file]";
$importme = file_get_contents($importme);
//echo "<b>$importme</b><br><br>";
$sql = $importme;
$findme1 = '-- Indexes for dumped tables';
$pos1 = strpos($importme, $findme1);
$findme2 = '-- AUTO_INCREMENT for dumped tables';
$pos2 = strpos($importme, $findme2);
$dbhost = '';
#set_time_limit(0);
if($pos1 !== false){
$splitted = explode("-- Indexes for table", $importme);
// print_r($splitted);
for($i=0;$i<count($splitted);$i++){
$sql = $splitted[$i];
$sql = preg_replace("/[`][a-z`\s]*[-]{2}/", "", $sql);
// echo "<b>$sql</b><hr>";
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
}
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
} elseif($pos2 !== false){
$splitted = explode("-- AUTO_INCREMENT for table", $importme);
// print_r($splitted);
for($i=0;$i<count($splitted);$i++){
$sql = $splitted[$i];
$sql = preg_replace("/[`][a-z`\s]*[-]{2}/", "", $sql);
// echo "<b>$sql</b><hr>";
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
}
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
} else {
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
}
//echo 'done (', count($sql), ' queries).';
}
A simple solution is to run this command:
mysql -h yourhostname -u username -p databasename < yoursqlfile.sql
And if you want to import with progress bar, try this:
pv yoursqlfile.sql | mysql -uxxx -pxxxx databasename
I have found the SSH commands below robust for exporting and importing huge MySQL databases; I have been using them for years. Don't rely on backups generated via control panels like cPanel WHM, CWP, OVIPanel, etc.; they may cause trouble, especially when you're switching between control panels. Trust SSH.
[EXPORT]
$ mysqldump -u root -p example_database| gzip > example_database.sql.gz
[IMPORT]
$ gunzip < example_database.sql.gz | mysql -u root -p example_database
I have long tried to find a good solution to this question, and I think I finally have one. From what I understand, max_allowed_packet can be raised well beyond its default (the documented maximum is 1 GB), so go ahead and set max_allowed_packet=300M in my.cnf.
Running mysql> source sql.file on its own will not do anything better, because in the dump files the insert statements are broken into roughly 1 MB pieces. So my 45 GB file contains about 45 GB / 1 MB, i.e. roughly 45,000 insert statements.
To get around this I parse the SQL file with PHP and rebuild the insert statements at the size I want. In my case I have set the packet size to 100 MB, so I make each insert string a little smaller than that. On another machine I have the packet size at 300M and do inserts of 200M, and it works.
Since the total of all table sizes is ~1.2 TB, I export database by database, table by table, so I have one SQL file per table. If yours is different you have to adjust the code accordingly.
<?php
global $destFile, $tableName;
function writeOutFile(&$arr)
{
echo " [count: " . count($arr) .']';
if(empty($arr))return;
global $destFile, $tableName;
$data='';
//~ print_r($arr);
foreach($arr as $i=>$v)
{
$v = str_replace(";\n", '', $v);
//~ $v = str_replace("),(", "),\n(", $v);
$line = ($i==0? $v: str_replace("INSERT INTO `$tableName` VALUES",',', $v));
$data .= $line;
}
$data .= ";\n";
file_put_contents($destFile, $data, FILE_APPEND);
}
$file = '/path/to/sql.file';
$tableName = 'tablename';
$destFile = 'localfile.name';
file_put_contents($destFile, null);
$loop=0;
$arr=[];
$fp = fopen($file, 'r');
while(!feof($fp))
{
$line = fgets($fp);
if(strpos($line, "INSERT INTO `")!==false)$arr[]=$line;
else
{writeOutFile($arr); file_put_contents($destFile, $line, FILE_APPEND);$arr=[];continue;}
$loop++;
if(count($arr)==95){writeOutFile($arr);$arr=[];}
echo "\nLine: $loop, ". count($arr);
}
writeOutFile($arr); // flush any INSERT lines still waiting in the batch
fclose($fp);
?>
How well this works for you will depend on your hardware, but all other things staying the same, this process speeds up my imports substantially. I don't have any benchmarks to share; it's my working experience.
According to the MySQL documentation, none of these work! People, pay attention!
So, to upload test.sql into test_db,
type this into the shell:
mysql --user=user_name --password=yourpassword test_db < d:/test.sql
If you import through phpMyAdmin on WAMP instead, navigate to C:\wamp64\alias\phpmyadmin.conf and change from:
php_admin_value upload_max_filesize 128M
php_admin_value post_max_size 128M
to
php_admin_value upload_max_filesize 2048M
php_admin_value post_max_size 2048M
or more
:)

Updating the db 6000 times will take few minutes?

I am writing a test program with Ruby and ActiveRecord, and it reads a document
which is like 6000 words long. And then I just tally up the words by
recordWord = Word.find_by_s(word);
if (recordWord.nil?)
recordWord = Word.new
recordWord.s = word
end
if recordWord.count.nil?
recordWord.count = 1
else
recordWord.count += 1
end
recordWord.save
and so this part loops for 6000 times... and it takes a few minutes to
run at least using sqlite3. Is it normal? I was expecting it could run
within a couple seconds... can MySQL speed it up a lot?
With 6000 calls to write to the database, you're going to see speed issues. I would save the various tallies in memory and save to the database once at the end, not 6000 times along the way.
Take a look at AR:Extensions as well to handle the bulk insertions.
http://rubypond.com/articles/2008/06/18/bulk-insertion-of-data-with-activerecord/
I wrote up some quick code in perl that simply does:
1. Create the database
2. Insert a record that only contains a single integer
3. Retrieve the most recent record and verify that it returns what it inserted
And it does steps #2 and #3 6000 times. This is obviously a considerably lighter workload than having an entire object/relational bridge. For this trivial case with SQLite it still took 17 seconds to execute, so your desire to have it take "a couple of seconds" is not realistic on "traditional hardware."
Using the monitor I verified that it was primarily disk activity that was slowing it down. Based on that if for some reason you really do need the database to behave that quickly I suggest one of two options:
1. Do what people have suggested and find a way around the requirement
2. Try buying some solid state disks.
I think #1 is a good way to start :)
Code:
#!/usr/bin/perl
use warnings;
use strict;
use DBI;
my $dbh = DBI->connect('dbi:SQLite:dbname=/tmp/dbfile', '', '');
create_database($dbh);
insert_data($dbh);
sub insert_data {
my ($dbh) = @_;
my $insert_sql = "INSERT INTO test_table (test_data) values (?)";
my $retrieve_sql = "SELECT test_data FROM test_table WHERE test_data = ?";
my $insert_sth = $dbh->prepare($insert_sql);
my $retrieve_sth = $dbh->prepare($retrieve_sql);
my $i = 0;
while (++$i <= 6000) {
$insert_sth->execute(($i));
$retrieve_sth->execute(($i));
my $hash_ref = $retrieve_sth->fetchrow_hashref;
die "bad data!" unless $hash_ref->{'test_data'} == $i;
}
}
sub create_database {
my ($dbh) = @_;
my $status = $dbh->do("DROP TABLE test_table");
# report (but tolerate) a failed DROP, e.g. when the table does not exist yet
if (!defined $status) {
print "DROP TABLE failed";
}
my $create_statement = "CREATE TABLE test_table (id INTEGER PRIMARY KEY AUTOINCREMENT, \n";
$create_statement .= "test_data varchar(255)\n";
$create_statement .= ");";
$status = $dbh->do($create_statement);
# return error status if CREATE resulted in error
if (!defined $status) {
die "CREATE failed";
}
}
What kind of database connection are you using? Some databases allow you to connect 'directly' rather than using a TCP connection that goes through the network stack. In other words, if you're making an internet connection and sending data through that way, it can slow things down.
Another way to boost performance of a database connection is to group SQL statements together in a single command.
For example, build a single 6,000-line batch of SQL statements that looks like this
"update words set count = count + 1 where word = 'the';
update words set count = count + 1 where word = 'in';
...
update words set count = count + 1 where word = 'copacetic';"
and run it as a single command; performance will be a lot better. MySQL limits the 'packet size' (max_allowed_packet, 1 megabyte by default on older versions), but you can change that in the my.ini file to be larger if you want.
Since you're abstracting away your database calls through ActiveRecord, you don't have much control over how the commands are issued, so it can be difficult to optimize your code.
Another thing you could do would be to keep a count of words in memory, and then only insert the final total into the database, rather than doing an update every time you come across a word. That will probably cut down a lot on the number of writes, because doing an update every time you come across the word 'the' is a huge waste. Words have a 'long tail' distribution, and the most common words are hugely more common than more obscure words. Then the underlying SQL would look more like this:
"update words set count = 300 where word = 'the'
update words set count = 250 where word = 'in'
...
update words set count = 1 where word = 'copacetic'"
If you're worried about taking up too much memory, you could count words and periodically 'flush' them. So read a couple of megabytes of text, then spend a few seconds updating the totals, rather than updating each word every time you encounter it. If you want to improve performance even more, you should consider issuing SQL commands in batches directly.
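The count-then-flush idea could look roughly like this in Python. FlushingTally, flush_fn and limit are hypothetical names for this sketch; the limit is on the number of distinct pending words rather than on bytes of memory, and flush_fn is whatever you use to write a batch of totals to the database.

```python
from collections import Counter


class FlushingTally:
    """Count words in memory; hand the totals to flush_fn once `limit` distinct words pile up."""

    def __init__(self, flush_fn, limit=10_000):
        self.flush_fn = flush_fn  # called with a dict of {word: count}
        self.limit = limit
        self.pending = Counter()

    def add(self, word):
        self.pending[word] += 1
        if len(self.pending) >= self.limit:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(dict(self.pending))  # pass a copy, then start over
            self.pending.clear()


# Demo: collect the flushed batches in a list instead of writing to a database.
batches = []
tally = FlushingTally(batches.append, limit=2)
for word in "the in the the".split():
    tally.add(word)
tally.flush()  # write whatever is still pending at the end
print(batches)  # [{'the': 1, 'in': 1}, {'the': 2}]
```

Because a word can show up in more than one flush, the database side has to add each flushed count to the stored total (count = count + n) rather than overwrite it.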
Without knowing about Ruby and Sqlite, some general hints:
create a unique index on Word.s (you did not state whether you have one)
define a default for Word.count in the database ( DEFAULT 1 )
optimize assignment of count:
recordWord = Word.find_by_s(word);
if (recordWord.nil?)
recordWord = Word.new
recordWord.s = word
recordWord.count = 1
else
recordWord.count += 1
end
recordWord.save
Use BEGIN TRANSACTION before your updates then COMMIT at the end.
OK, I found some general rules:
1) use a hash to keep the count first, not the db
2) at the end, wrap all insert or updates in one transaction, so that it won't hit the db 6000 times.
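For anyone who wants to try the two rules, here is a minimal sketch. It is in Python with the standard sqlite3 module rather than Ruby and ActiveRecord, and the table layout (a words table with columns s and count) is assumed from the model in the question; save_word_counts is a made-up helper name.

```python
import sqlite3
from collections import Counter


def save_word_counts(conn, text):
    """Tally words in memory first, then write all totals in one transaction."""
    tallies = Counter(text.split())
    with conn:  # one commit on success, rollback if anything fails
        # Make sure every word has a row, starting at zero.
        conn.executemany(
            "INSERT OR IGNORE INTO words (s, count) VALUES (?, 0)",
            [(word,) for word in tallies],
        )
        # Add this document's totals to whatever is already stored.
        conn.executemany(
            "UPDATE words SET count = count + ? WHERE s = ?",
            [(n, word) for word, n in tallies.items()],
        )
    return tallies


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (s TEXT PRIMARY KEY, count INTEGER NOT NULL DEFAULT 0)"
)
save_word_counts(conn, "the cat sat in the hat in the house")
print(conn.execute("SELECT s, count FROM words ORDER BY s").fetchall())
```

The database sees two statements per distinct word, all inside a single commit, instead of a find plus a save with its own commit for every one of the 6000 words.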