PhpMyAdmin data import performance issues - mysql

Originally, my question was about phpMyAdmin's SQL section not working properly. As suggested in the comments, I realized that the amount of input was simply too much for it to handle. However, that didn't give me a valid solution for how to deal with files that contain (in my case) 35 thousand record lines in CSV format:
...
20120509,126,1590.6,0
20120509,127,1590.7,1
20120509,129,1590.7,6
...
The Import option in phpMyAdmin struggles just as much as the plain copy-paste input in the SQL section does. This time, same as before, it runs for 5 minutes until the max execution time is reached and then it stops. What is interesting, though, is that it adds around 6-7 thousand records to the table before stopping. So the input actually goes through and almost succeeds. I also tried halving the amount of data in the file; nothing changed, however.
There is clearly something wrong now. It is pretty annoying to have to massage the data in a PHP script when a simple data import won't work.

Change your php upload max size.
Do you know where your php.ini file is?
First of all, try putting this file into your web root:
phpinfo.php
( see http://php.net/manual/en/function.phpinfo.php )
containing:
<?php
phpinfo();
?>
Then navigate to http://www.yoursite.com/phpinfo.php
Look for "php.ini".
To upload large files you need to raise max_execution_time, post_max_size and upload_max_filesize.
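For example, the relevant php.ini directives look roughly like this (the values below are placeholders, not recommendations; set the upload and post limits comfortably above the size of your CSV, and restart the web server afterwards):
; example php.ini values only - adjust to your file size and server
upload_max_filesize = 64M
post_max_size = 64M
max_execution_time = 300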
Also, do you know where your error.log file is? It would hopefully give you a clue as to what is going wrong.
EDIT:
Here is the query I use for the file import:
$query = "LOAD DATA LOCAL INFILE '$file_name' INTO TABLE `$table_name`
          FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
          LINES TERMINATED BY '$nl'";
Where $file_name is the temporary file name from the PHP superglobal $_FILES, $table_name is the table already prepared for import, and $nl is a variable for the CSV line endings (defaults to Windows line endings, but I have an option to select Linux line endings).
The other thing is that the table ($table_name) in my script is prepared in advance by first scanning the csv to determine column types. After it determines appropriate column types, it creates the MySQL table to receive the data.
I suggest you try creating the MySQL table definition first, to match what's in the file (data types, character lengths, etc). Then try the above query and see how fast it runs. I don't know how much of a factor the MySQL table definition is on speed.
Also, I have no indexes defined in the table until AFTER the data is loaded. Indexes slow down data loading.
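For the four-column CSV shown in the question, a rough sketch of that approach could look like the following (the table name, column names and types are assumptions for illustration; adjust them to the real data):
-- table created up front to match the CSV columns: date, id, price, flag
CREATE TABLE price_ticks (
    trade_date DATE NOT NULL,
    tick_id    INT NOT NULL,
    price      DECIMAL(10,2) NOT NULL,
    flag       INT NOT NULL
);

-- bulk-load the 35k lines in one statement; LOCAL reads the file from the client side
LOAD DATA LOCAL INFILE '/path/to/data.csv'
INTO TABLE price_ticks
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

-- add indexes only after the load, since indexes slow down data loading
ALTER TABLE price_ticks ADD INDEX idx_trade_date (trade_date);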

Related

Importing and exporting TSVs with MySQL

I'm using a database with MySQL 5.7, and sometimes, data needs to be updated using a mixture of scripts and manual editing. Because people working with the database are usually not familiar with SQL, I'd like to export the data as a TSV, which then could be manipulated (for example with Python's pandas module) and then be imported back. I assume the standard way would be to directly connect to the database, but using TSVs has some upsides in this situation, I think. I've been reading the MySQL docs and some stackoverflow questions to find the best way to do this. I've found a couple of solutions, however, they all are somewhat inconvenient. I will list them below and explain my problems with them.
My question is: did I miss something, for example some helpful SQL commands or CLI options to help with this? Or are the solutions I found already the best when importing/exporting TSVs?
My example database looks like this:
Database: Export_test
Table: Sample
Field      Type       Null  Key
id         int(11)    NO    PRI
text_data  text       NO
optional   int(11)    YES
time       timestamp  NO
Example data:
INSERT INTO `Sample` VALUES (1,'first line\\\nsecond line',NULL,'2022-02-16 20:17:38');
The data contains an escaped newline, which caused a lot of problems for me when exporting.
Table: Reference
Field        Type     Null  Key
id           int(11)  NO    PRI
foreign_key  int(11)  NO    MUL
Example data:
INSERT INTO `Reference` VALUES (1,1);
foreign_key is referencing a Sample.id.
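For reference, the schema above corresponds roughly to the following definitions (the exact column attributes and the unnamed foreign key are guesses based on the field listing):
CREATE TABLE `Sample` (
    id        INT NOT NULL PRIMARY KEY,
    text_data TEXT NOT NULL,
    optional  INT NULL,
    `time`    TIMESTAMP NOT NULL
);

CREATE TABLE `Reference` (
    id          INT NOT NULL PRIMARY KEY,
    foreign_key INT NOT NULL,
    FOREIGN KEY (foreign_key) REFERENCES `Sample` (id)
);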
Note about encoding: As a caveat for people trying to do the same thing: if you want to export/import data, make sure that character sets and collations are set up correctly for connections. This caused me some headache, because although the data itself is utf8mb4, the client, server and connection character sets were latin1, which caused some loss of data in some instances.
Export
So, for exporting, I found basically three solutions, and they all behave somewhat differently:
A: SELECT stdout redirection
mysql Export_test -e "SELECT * FROM Sample;" > out.tsv
Output:
id text_data optional time
1 first line\\\nsecond line NULL 2022-02-16 21:26:13
Pros:
headers are added, which makes it easy to use with external programs
formatting works as intended
Cons:
NULL is used for null values; when importing, \N is required instead; as far as I know, this can't be configured for exports
Workaround: replace NULL values when editing the data
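If editing the file is to be avoided, one way to handle the NULL-versus-\N mismatch is on the import side instead: read the affected column into a user variable and map the literal string back to NULL. A sketch for the Sample table above (it would misfire if a real value happened to be the text 'NULL'):
-- the exported TSV from variant A has a header line and the word NULL for null values
LOAD DATA INFILE '/tmp/out.tsv'
REPLACE INTO TABLE Sample
IGNORE 1 LINES
(id, text_data, @optional, `time`)
SET optional = NULLIF(@optional, 'NULL');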
B: SELECT INTO OUTFILE
mysql Export_test -e "SELECT * FROM Sample INTO OUTFILE '/tmp/out.tsv';"
Output:
1 first line\\\
second line \N 2022-02-16 21:26:13
Pros:
\N is used for null data
Cons:
escaped linebreaks are not handled correctly
headers are missing
file writing permission issues
Workaround: fix linebreaks manually; add headers by hand or supply them in the script; use /tmp/ as output directory
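The missing header can also be produced by the query itself by UNIONing a constant row ahead of the data, so nothing has to be added by hand afterwards. A sketch (in practice the header row comes out first, although strictly speaking the order is not guaranteed without an ORDER BY):
SELECT 'id', 'text_data', 'optional', 'time'
UNION ALL
SELECT id, text_data, optional, `time`
FROM Sample
INTO OUTFILE '/tmp/out_with_header.tsv';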
C: mysqldump with --tab (performs SELECT INTO OUTFILE behind the scenes)
mysqldump --tab='/tmp/' --skip-tz-utc Export_test Sample
Output, pros and cons: same as export variant B
Something that should be noted: the output is only the same as B if --skip-tz-utc is used; otherwise, timestamps will be converted to UTC and will be off after importing the data.
Import
Something I didn't realize at first is that it's impossible to merely update existing data directly with LOAD DATA INFILE or mysqlimport, although that's something many GUI tools appear to be doing and other people have attempted. For me as a beginner, this wasn't immediately clear from the MySQL docs. A workaround appears to be creating an empty table, importing the data there, and then updating the actual table of interest via a join (sketched below). I also thought one could update individual columns with this, which again is not possible. If there are other ways to achieve this, I would really like to know.
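A sketch of that staging-table workaround for the Sample table (the staging table name is made up; the edited TSV is assumed to contain a header line):
-- 1. empty copy of the target table (indexes are copied, foreign keys are not)
CREATE TABLE Sample_staging LIKE Sample;

-- 2. load the edited TSV into the staging table
LOAD DATA INFILE '/tmp/Sample.tsv'
INTO TABLE Sample_staging
IGNORE 1 LINES;

-- 3. copy the values over to the real table via a join on the primary key
UPDATE Sample AS s
JOIN Sample_staging AS st USING (id)
SET s.text_data = st.text_data,
    s.optional  = st.optional,
    s.`time`    = st.`time`;

-- 4. clean up
DROP TABLE Sample_staging;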
As far as I could tell, there are two options, which do pretty much the same thing:
LOAD DATA INFILE:
mysql Export_test -e "SET FOREIGN_KEY_CHECKS = 0; LOAD DATA INFILE '/tmp/Sample.tsv' REPLACE INTO TABLE Sample IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS = 1;"
mysqlimport (performs LOAD DATA INFILE behind the scenes):
mysqlimport --replace Export_test /tmp/Sample.tsv
Notice: if there are foreign key constraints like in this example, SET FOREIGN_KEY_CHECKS = 0; needs to be performed (as far as I can tell, mysqlimport can't be directly used in these cases). Also, IGNORE 1 LINES or --ignore-lines can be used to skip the first line if the input TSV contains a header. For mysqlimport, the name of the input file without extension must be the name of the table. Again, file reading permissions can be an issue, and /tmp/ is used to avoid that.
Are there ways to make this process more convenient? Like, are there some options I can use to avoid the manual workarounds, or are there ways to use TSV importing to UPDATE entries without creating a temporary table?
What I ended up doing was using SELECT ... INTO OUTFILE for exporting, adding a header manually and also fixing the malformed lines by hand. After manipulating the data, I used LOAD DATA INFILE to update the data. In another case, I exported with the SELECT-to-stdout redirection, manipulated the data and then added a script which just created a file with a bunch of UPDATE ... WHERE statements containing the corresponding data. Then I ran the resulting .sql against my database. Is the latter maybe the best option in this case?
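For illustration, the generated .sql from that last approach would consist of statements along these lines (the values here are invented):
UPDATE Sample SET text_data = 'first line, corrected', optional = 3 WHERE id = 1;
UPDATE Sample SET text_data = 'another corrected row', optional = NULL WHERE id = 2;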
Exporting and importing is indeed sort of clunky in MySQL.
One problem is that it introduces a race condition. What if you export data to work on it, then someone modifies the data in the database, then you import your modified data, overwriting your friend's recent changes?
If you say, "no one is allowed to change data until you re-import the data," that could cause an unacceptably long time where clients are blocked, if the table is large.
The trend is that people want the database to minimize downtime, and ideally to have no downtime at all. Advancements in database tools are generally made with this priority in mind, not so much to accommodate your workflow of taking the data out of MySQL for transformations.
Also, what if the database is so large that the exported data itself becomes a problem? Where do you store a 500GB TSV file? Does pandas even work on such a large file?
What most people do is modify data while it remains in the database. They use in-place UPDATE statements to modify data. If they can't do this in one pass (there's a practical limit of 4GB for a binary log event, for example), then they UPDATE more modest-size subsets of rows, looping until they have transformed the data on all rows of a given table.
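A sketch of that batched approach against the Sample table (the transformation and batch size are placeholders; a script or stored procedure would advance the id range until the whole table is covered):
-- first batch
UPDATE Sample
SET text_data = TRIM(text_data)   -- placeholder transformation
WHERE id BETWEEN 1 AND 10000;

-- next batch, and so on until MAX(id) is reached
UPDATE Sample
SET text_data = TRIM(text_data)
WHERE id BETWEEN 10001 AND 20000;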

MySQL workbench table data export extremely slow

I just downloaded the newest version of MySQL Workbench (6.3.6) and attempted to export a remote table (on Google CloudSQL) to csv using the new "table data export" wizard. The table had about 600,000 rows and the final downloaded size was about 75MB. It took 7.5 hours.
I realize I can use Google Developer Console to perform this export (which I did, and took about 15 seconds), but it seems that something is wrong with MySQL Workbench. Could there be a configuration issue which is causing this to go so slowly?
I know this question is quite old, but I'm answering because I recently had this issue. I was trying to export 2 million+ rows and it had taken 2 days to complete only half of them. This was after trying several different ways of exporting. Then I found this:
SELECT *
FROM my_table
INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/my file.csv'
FIELDS ENCLOSED BY '"'
TERMINATED BY ';'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n';
And it completed in 80 seconds!
Please note: if you hit the secure_file_priv restriction, then write the file under the directory returned by:
SHOW VARIABLES LIKE "secure_file_priv"
Description:
Workbench is very slow exporting large datasets through the CSV export wizard, and disproportionately slow compared to a smaller set. However, this is something I've come across before with .NET.
How to repeat:
Get a table with 15k or so records or more, and export through the wizard. Note how long it takes and then export a subset of that data and see how the time taken does not correlate linearly with the amount of rows.
Suggested fix:
Something I've noticed when building CSV export applications is that the MS .NET framework can't deal with huge strings very well, and tends to perform badly as a result.
I found a solution though. Instead of building one huge string and writing it to the file all at once when the export is done, I get much better performance by generating only a few hundred rows of CSV at a time, writing them to the file, and flushing the buffer I have been writing the generated data to.
I'd recommend writing to a temp file, then renaming/moving it to the user's specified one when done. Writing to a temp file and then moving/renaming is the way Photoshop and some other applications save their data. And writing x rows at a time and then flushing is, in my experience, much faster than trying to get .NET to manage a 20MB string.
Try using an ETL tool such as Pentaho ETL
or
https://www.mycli.net/

import csv file with LOAD DATA LOCAL INFILE in symfony 1.4

I need to fill several tables from CSV files. I tried to use a loop that does an INSERT for each row, but a file with 65,000 records takes me more than 20 minutes.
I want to use the MySQL command LOAD DATA LOCAL INFILE, but I received this message :
LOAD DATA LOCAL INFILE forbidden in C:\xampp\htdocs\myProject\apps\backend\modules\member\actions\actions.class.php on line 112
After a little research, I understand that one of the PDO security parameters (PDO::MYSQL_ATTR_LOCAL_INFILE) needs to be set to true.
In Symfony 2 you would change it in your app's config.yml, but I can't find the equivalent in symfony 1.4.
Let me try to understand the question (or questions?!).
If you need to optimize the INSERT queries, you should probably batch them into a single INSERT query (or a few of them), but definitely not one per row, as sketched below. Besides, INSERT in MySQL will always be relatively slow for a large amount of data; it also depends on the indexing, storage engine and schema structure of the DB.
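A sketch of that batching, using a hypothetical table (names and columns are made up):
-- one statement, many rows: far fewer round trips than one INSERT per row
INSERT INTO member (first_name, last_name, email) VALUES
    ('Alice', 'Smith', 'alice@example.com'),
    ('Bob',   'Jones', 'bob@example.com'),
    ('Carol', 'Lee',   'carol@example.com');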
About the second question, take a look here, maybe it will help.

Importing Data MySQL

I have a huge dataset. What is the fastest way to upload the data into a MySQL database from PHP, and is there any way to verify whether all of the data was imported or not?
Any suggestion or hints will be greatly appreciate. Thanks.
If the data set is merely huge (it can be transferred within hours), it is not worth the effort of finding a particularly efficient way - any script should be able to do the job. I am assuming you are reading from some non-database format (e.g. plain text)? In that case, simply read and insert.
If you require careful processing before you insert the rows, you might want to consider creating real objects in memory and their sub-objects first and then mapping them to rows and tables - Object-Relational data source patterns will be valuable here. This will, however, be much slower, and I would not recommend it unless it's absolutely necessary, especially if you are doing it just once.
For very fast access, some people wrote a direct binary blob of objects on the disk and then read it directly into an array, but that is available in languages like C/C++; I am not sure if/how it can be used in a scripted language. Again, this is good for READING the data back into memory, not transferring to DB.
The easiest way to verify that the data has been transferred is to compare the COUNT(*) in the database with the number of items in your file. A more advanced way is to compute a hash (e.g. SHA-1) of the primary key sets, as sketched below.
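Both checks in SQL, assuming a hypothetical imported_table with primary key id (GROUP_CONCAT output is capped by group_concat_max_len, so raise it for large tables or hash the keys in chunks):
-- row count to compare against the number of records in the source file
SELECT COUNT(*) FROM imported_table;

-- fingerprint of the sorted primary keys; compare against the same hash
-- computed over the keys in the source file
SET SESSION group_concat_max_len = 32 * 1024 * 1024;
SELECT SHA1(GROUP_CONCAT(id ORDER BY id SEPARATOR ',')) AS pk_hash
FROM imported_table;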
I used LOAD DATA, which is the standard MySQL loader tool. It works fine and is faster, and there are many options.
You can use:
a data file named export_du_histo_complet.txt with multiple lines like this:
"xxxxxxx.corp.xxxxxx.com";"GXTGENCDE";"GXGCDE001";"M_MAG105";"TERMINE";"2013-06-27";"14:08:00";"14:08:00";"00:00:01";"795691"
an SQL file containing the following (because I use a Unix shell which calls the SQL file):
LOAD DATA INFILE '/home2/soron/EXPORT_HISTO/export_du_histo_complet.txt'
INTO TABLE du_histo
FIELDS
TERMINATED BY ';'
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES
STARTING BY ' '
TERMINATED BY '\n'
(server, sess, uproc, ug, etat, date_exploitation, debut_uproc, fin_uproc, duree, num_uproc)
I specified the table fields which I would import (my table has more columns).
Note that there is a MySQL bug, so you can't use a variable to specify your INFILE path.

MySQL BinLog Statement Retrieval

I have seven 1G MySQL binlog files that I have to use to retrieve some "lost" information. I only need to get certain INSERT statements from the log (ex. where the statement starts with "INSERT INTO table SET field1="). If I just run mysqlbinlog (even per database and using --short-form), I get a text file that is several hundred megabytes, which makes it almost impossible to then parse with any other program.
Is there a way to just retrieve certain sql statements from the log? I don't need any of the ancillary information (timestamps, autoincrement #s, etc.). I just need a list of sql statements that match a certain string. Ideally, I would like to have a text file that just lists those sql statements, such as:
INSERT INTO table SET field1='a';
INSERT INTO table SET field1='tommy';
INSERT INTO table SET field1='2';
I could get that by running mysqlbinlog to a text file and then parsing the results based upon a string, but the text file is way too big. It just times out any script I run and even makes it impossible to open in a text editor.
Thanks for your help in advance.
I never received an answer, but I will tell you what I did to get by.
1. Ran mysqlbinlog to a textfile
2. Created a PHP script that uses fgets to read each line of the log
3. While looping through each line, the script parses it using the stristr function
4. If the line matches the string I am looking for, it logs the line to a file
It takes a while to run mysqlbinlog and the PHP script, but it no longer times out. I originally used fread in PHP, but that reads the entire file into memory and caused the script to crash on large (1G) log files. Now, it takes several minutes to run (I also set the max_execution_time variable to be larger), but it works like a charm. fgets gets one line at a time, so it doesn't take up nearly as much memory.