Importing Data into MySQL

I have a huge dataset. What is the fastest way to upload the data into a MySQL database from PHP, and is there any way to verify that all of the data was imported?
Any suggestions or hints will be greatly appreciated. Thanks.

If the data set is merely huge (can be transferred within hours), it is not worth the effort of finding an especially efficient way - any script should be able to do the job. I am assuming you are reading from some non-db format (eg. plain text)? In that case, simply read and insert.
If you require careful processing before you insert the rows, you might want to consider first creating real objects (and their sub-objects) in memory and then mapping them to rows and tables - Object-Relational data source patterns will be valuable here. This will, however, be much slower, and I would not recommend it unless it's absolutely necessary, especially if you are only doing it once.
For very fast access, some people write a direct binary blob of the objects to disk and then read it directly back into an array, but that is available in languages like C/C++; I am not sure if/how it can be done in a scripted language. Again, this is good for READING the data back into memory, not for transferring it to the DB.
The easiest way to verify that the data has been transferred is to compare the count(*) of the table with the number of items in your file. A more advanced way is to compute a hash (eg. sha1) of the primary key sets.
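A minimal sketch of both checks, assuming a table named import_target with an integer primary key id (both names are invented for illustration):
-- row-count check: compare against the line count of the source file
SELECT COUNT(*) FROM import_target;
-- hash check: digest the ordered primary keys and compare it with the same
-- digest computed over the keys in the source file; GROUP_CONCAT output is
-- capped by group_concat_max_len, so raise it first for large tables
SET SESSION group_concat_max_len = 1000000;
SELECT SHA1(GROUP_CONCAT(id ORDER BY id SEPARATOR ',')) FROM import_target;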

I used LOAD DATA, which is the standard MySQL loader tool. It works fine and is fast, and there are many options.
You can use a data file named export_du_histo_complet.txt with multiple lines like this:
"xxxxxxx.corp.xxxxxx.com";"GXTGENCDE";"GXGCDE001";"M_MAG105";"TERMINE";"2013-06-27";"14:08:00";"14:08:00";"00:00:01";"795691"
and an SQL file with the following (because I use a Unix shell which calls the SQL file):
LOAD DATA INFILE '/home2/soron/EXPORT_HISTO/export_du_histo_complet.txt'
INTO TABLE du_histo
FIELDS
TERMINATED BY ';'
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES
STARTING BY ' '
TERMINATED BY '\n'
(server, sess, uproc, ug, etat, date_exploitation, debut_uproc, fin_uproc, duree, num_uproc)
I specified the table fields that I wanted to import (my table has more columns).
Note that there is a MySQL bug, so you can't use a variable to specify your INFILE.

Related

Importing and exporting TSVs with MySQL

I'm using a database with MySQL 5.7, and sometimes data needs to be updated using a mixture of scripts and manual editing. Because people working with the database are usually not familiar with SQL, I'd like to export the data as a TSV, which could then be manipulated (for example with Python's pandas module) and then imported back. I assume the standard way would be to directly connect to the database, but using TSVs has some upsides in this situation, I think. I've been reading the MySQL docs and some Stack Overflow questions to find the best way to do this. I've found a couple of solutions; however, they are all somewhat inconvenient. I will list them below and explain my problems with them.
My question is: did I miss something, for example some helpful SQL commands or CLI options to help with this? Or are the solutions I found already the best when importing/exporting TSVs?
My example database looks like this:
Database: Export_test
Table: Sample
Field      Type       Null  Key
id         int(11)    NO    PRI
text_data  text       NO
optional   int(11)    YES
time       timestamp  NO
Example data:
INSERT INTO `Sample` VALUES (1,'first line\\\nsecond line',NULL,'2022-02-16 20:17:38');
The data contains an escaped newline, which caused a lot of problems for me when exporting.
Table: Reference
Field        Type     Null  Key
id           int(11)  NO    PRI
foreign_key  int(11)  NO    MUL
Example data:
INSERT INTO `Reference` VALUES (1,1);
foreign_key is referencing a Sample.id.
Note about encoding: as a caveat for people trying to do the same thing: if you want to export/import data, make sure that character sets and collations are set up correctly for connections. This caused me some headache, because although the data itself is utf8mb4, the client, server and connection character sets were latin1, which caused some loss of data in some instances.
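For example, the effective connection character sets can be inspected and set explicitly (utf8mb4 here, to match the data):
SHOW VARIABLES LIKE 'character_set%';
SET NAMES utf8mb4;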
Export
So, for exporting, I found basically three solutions, and they all behave somewhat differently:
A: SELECT stdout redirection
mysql Export_test -e "SELECT * FROM Sample;" > out.tsv
Output:
id text_data optional time
1 first line\\\nsecond line NULL 2022-02-16 21:26:13
Pros:
headers are added, which makes it easy to use with external programs
formatting works as intended
Cons:
NULL is used for null values; when importing, \N is required instead; as far as I know, this can't be configured for exports
Workaround: replace NULL values when editing the data
B: SELECT INTO OUTFILE
mysql Export_test -e "SELECT * FROM Sample INTO OUTFILE '/tmp/out.tsv';"
Output:
1 first line\\\
second line \N 2022-02-16 21:26:13
Pros:
\N is used for null data
Cons:
escaped linebreaks are not handled correctly
headers are missing
file writing permission issues (the file is written by the MySQL server itself, and the secure_file_priv setting may restrict where it can write)
Workaround: fix linebreaks manually; add headers by hand or supply them in the script; use /tmp/ as output directory
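One way to avoid adding the headers by hand is to prepend them in the query itself - a sketch using the Sample table from above (strictly speaking, the row order of a UNION is not guaranteed, though in practice the header row comes first):
SELECT 'id', 'text_data', 'optional', 'time'
UNION ALL
SELECT id, text_data, optional, time FROM Sample
INTO OUTFILE '/tmp/out_with_header.tsv';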
C: mysqldump with --tab (performs SELECT INTO OUTFILE behind the scenes)
mysqldump --tab='/tmp/' --skip-tz-utc Export_test Sample
Output, pros and cons: same as export variant B
Something that should be noted: the output is only the same as B if --skip-tz-utc is used; otherwise, timestamps will be converted to UTC and will be off after importing the data.
Import
Something I didn't realize at first is that it's impossible to merely update data directly with LOAD DATA or mysqlimport, although that's something many GUI tools appear to be doing and other people have attempted. For me as a beginner, this wasn't immediately clear from the MySQL docs. A workaround appears to be creating an empty table, importing the data there and then updating the actual table of interest via a join. I also thought one could update individual columns with this, which again is not possible. If there are other ways to achieve this, I would really like to know.
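A sketch of that staging-table workaround for the Sample table (the staging table name is invented, and only text_data is updated here):
CREATE TABLE Sample_staging LIKE Sample;
LOAD DATA INFILE '/tmp/Sample.tsv' INTO TABLE Sample_staging IGNORE 1 LINES;
UPDATE Sample s
JOIN Sample_staging st ON st.id = s.id
SET s.text_data = st.text_data;
DROP TABLE Sample_staging;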
As far as I could tell, there are two options, which do pretty much the same thing:
LOAD DATA INFILE:
mysql Export_test -e "SET FOREIGN_KEY_CHECKS = 0; LOAD DATA INFILE '/tmp/Sample.tsv' REPLACE INTO TABLE Sample IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS = 1;"
mysqlimport (performs LOAD DATA behind the scenes):
mysqlimport --replace Export_test /tmp/Sample.tsv
Notice: if there are foreign key constraints like in this example, SET FOREIGN_KEY_CHECKS = 0; needs to be performed (as far as I can tell, mysqlimport can't be directly used in these cases). Also, IGNORE 1 LINES or --ignore-lines can be used to skip the first line if the input TSV contains a header. For mysqlimport, the name of the input file without extension must be the name of the table. Again, file reading permissions can be an issue, and /tmp/ is used to avoid that.
Are there ways to make this process more convenient? Like, are there some options I can use to avoid the manual workarounds, or are there ways to use TSV importing to UPDATE entries without creating a temporary table?
What I ended up doing was using SELECT ... INTO OUTFILE for exporting, adding a header manually and also fixing the malformed lines by hand. After manipulating the data, I used LOAD DATA INFILE to update the data. In another case, I exported with SELECT and stdout redirection, manipulated the data and then added a script which simply created a file with a bunch of UPDATE ... WHERE statements containing the corresponding data. Then I ran the resulting .sql against my database. Is the latter maybe the best option in this case?
Exporting and importing is indeed sort of clunky in MySQL.
One problem is that it introduces a race condition. What if you export data to work on it, then someone modifies the data in the database, then you import your modified data, overwriting your friend's recent changes?
If you say, "no one is allowed to change data until you re-import the data," that could cause an unacceptably long time where clients are blocked, if the table is large.
The trend is that people want the database to minimize downtime, and ideally to have no downtime at all. Advancements in database tools are generally made with this priority in mind, not so much to accommodate your workflow of taking the data out of MySQL for transformations.
Also, what if the database is large enough that the exported data itself becomes a problem? Where do you store a 500GB TSV file? Does pandas even work on a file that large?
What most people do is modify data while it remains in the database. They use in-place UPDATE statements to modify data. If they can't do this in one pass (there's a practical limit of 4GB for a binary log event, for example), then they UPDATE more modest-size subsets of rows, looping until they have transformed the data on all rows of a given table.
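A sketch of that chunked approach, assuming an integer primary key id on the table (the SET expression is just a placeholder transformation):
-- transform one bounded slice of rows at a time
UPDATE Sample
SET text_data = TRIM(text_data)
WHERE id BETWEEN 1 AND 10000;
-- then id BETWEEN 10001 AND 20000, and so on, until all rows are covered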

Informix LOAD FROM file with header

I'm using Informix LOAD FROM command to bulk insert data from CSV files to a DB table, like:
LOAD FROM "file.csv" DELIMITER ";" INSERT INTO table_name(col1, col2, col3)
The problem is, the first line of each CSV file contains column headers. Is there any way to tell Informix that the first row shall be ignored?
No; there isn't a way to tell the standard Informix LOAD statement to skip a header line. Note, too, that it won't remove quotes from around fields in CSV format or otherwise deal with things the way CSV format officially expects (though, since you have semicolon-separated values rather than comma-separated values, it is hard to know which rules are being followed; be leery of the treatment of backslashes too).
You might be able to use the Informix DB-Load utility (dbload) instead; it depends on whether your data is simply using ; in place of Informix's default | delimiter, or whether you have more of the semantics of CSV such as quotes around fields that need to be removed. If you want to get exotic, the Informix High-Performance Loader (HPL) can either handle it natively or be trained to handle it.
Alternatively, you could consider using my* SQLCMD program (it has been called sqlcmd a lot longer than Microsoft's johnny-come-lately of the same name) which allows you to specify:
LOAD FROM "file.csv" DELIMITER ";" SKIP 1 INSERT INTO table_name(col1, col2, col3);
SQLCMD also has an option FORMAT CSV (amongst other formats) that might, or might not, be relevant. It handles things like stripping quotes from around fields that the full CSV standard supports.
You'll need to have Informix ClientSDK and a C compiler (and the rest of a C development system) installed to build SQLCMD.
* Since SQLCMD is my program (I wrote it), any recommendation to use it is inherently biased; you were warned.
You could also consider an 'external table' (CREATE EXTERNAL TABLE), but I'm not sure it is any better than the LOAD statement either with the formats it supports or with the ability to skip the first row of data.
When I load CSV files using LOAD FROM into Informix, I usually load into a temporary table of all-character columns which I then work with. You just delete the header row. Basically you're just putting the whole file into a temp table, which is easier to work with.
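A sketch of that approach, with invented column names and sizes (the header row is removed by matching its known text in the first column):
-- all-character staging table: every row loads regardless of the real types
CREATE TEMP TABLE csv_staging (c1 CHAR(100), c2 CHAR(100), c3 CHAR(100));
LOAD FROM "file.csv" DELIMITER ";" INSERT INTO csv_staging;
-- delete the header row, then move the remaining rows into the real table
DELETE FROM csv_staging WHERE c1 = 'col1';
INSERT INTO table_name(col1, col2, col3) SELECT c1, c2, c3 FROM csv_staging;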

How can I turn a CSV file into a web-based SQL database table, without having any database already set up?

Currently, I have a CSV file with data in it. I want to turn it into a SQL table, so I can run SQL queries on it. I want the table to be within a web-based database that others in my organization can also access. What's the easiest way to go from CSV file to this end result? I would appreciate insight on setting up the database and table, giving others access, and getting the data inside. Preferably PostgreSQL, but MySQL is fine too.
How you create the table depends on the number of columns you have. If you have only a few, then do it manually:
CREATE TABLE <table name> (<column name> <column type, e.g. int or varchar(100)>, <etc.>)
If you have many columns you can open the csv file in Excel and get 'SQL Converter for Excel', which will build a create statement for you using your column headings (and autodetect variable types too).
Loading data from a csv is also pretty straightforward:
LOAD DATA INFILE <filepath, e.g. 'C:/Users/<username>/Desktop/test.csv'>
INTO TABLE <table name>
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; -- only use this line if you have column names included in the csv
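Filled in with invented table and column names, that might look like:
CREATE TABLE people (id INT, name VARCHAR(100), age INT);
LOAD DATA INFILE 'C:/Users/me/Desktop/test.csv'
INTO TABLE people
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;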
As for a web-based solution: https://cloud.google.com/products/cloud-sql/
That's a relatively open-ended question. A couple of noteworthy pointers off the top of my head:
MySQL allows you to store your data in different formats, one of them being CSV. That's a very straightforward solution if you're happy with it and don't mind a few limitations (see http://dev.mysql.com/doc/refman/5.0/en/csv-storage-engine.html).
Otherwise you can import your data into a table with a full-featured engine (see other answer(s) for details).
If you're happy with PostgreSQL and are looking for a fully web-based solution, have a look at Heroku.
There are a great many ways to make your data available through web services without accessing the back-end data store directly. Have a look at REST and SOAP for instance.
HTH

How to move data from one SQLite to MySQL with different designs?

The problem is:
I've got a SQLite database which is constantly being updated through a proprietary application.
I'm building an application which uses MySQL, and its database design is very different from the SQLite one.
I then have to copy data from SQLite to MySQL but it should be done very carefully as not everything should be moved, tables and fields have different names and sometimes data from one table goes to two tables (or the opposite).
In short, SQLite should behave as a client to MySQL inserting what is new and updating the old in an automated way. It doesn't need to be updating in real time; every X hours would be enough.
A Google search gave me this:
http://migratedb.sourceforge.net/
And asking a friend I got information about the Multisource plugin (Squirrel SQL) in this page:
http://squirrel-sql.sourceforge.net/index.php?page=plugins
I would like to know if there is a better way to solve the problem or if I will have to make a custom script myself.
Thank you!
I recommend a custom script for this:
If it's not a one-to-one conversion between the tables and fields, tools might not help there. In your question, you've said:
...and sometimes data from one table goes to two tables (or the opposite).
If you only want the differences, then you'll need to build the logic for that unless every record in the SQLite db has timestamps.
Are you going to be updating the MySQL db at all? If not, are you okay to completely delete the MySQL db and refresh it every X hours with all the data from SQLite?
Also, if you are comfortable with a scripting language (like php, python, perl, ruby, etc.), they all have APIs for both SQLite and MySQL; it would be easy enough to build your own script, which you can control and customise more easily based on program logic. Especially if you want to run "conversions" between the data from one to the other and not just simple mapping.
I hope I understand you correctly: you want to periodically flush the data stored in a SQLite DB to a MySQL DB. Right?
So this is how I would do it:
Create a cron job which starts the script every x minutes.
Export the data from SQLite into a CSV file.
Do a LOAD DATA INFILE and import the CSV data into MySQL.
Code example for LOAD DATA INFILE
LOAD DATA INFILE 'PATH_TO_EXPORTED_CSV'
REPLACE INTO TABLE your_table
FIELDS TERMINATED BY ';'
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@value_column1, @unimportant_value, @value_column2, @unimportant_value, @unimportant_value, @value_column3)
SET diff_mysql_column1 = @value_column1,
    diff_mysql_column2 = @value_column2,
    diff_mysql_column3 = @value_column3;
You can apply this to as many db tables as you want, and you can also change the variables like @value_column1.
I'm in a hurry, so that's it for now. Ask if something is unclear.
Greets, Michael

PhpMyAdmin data import performance issues

Originally, my question was related to the fact that PhpMyAdmin's SQL section wasn't working properly. As suggested in the comments, I realized that it was the amount of input that was impossible to handle. However, this didn't provide me with a valid solution for how to deal with files that have (in my case) 35 thousand record lines in CSV format:
...
20120509,126,1590.6,0
20120509,127,1590.7,1
20120509,129,1590.7,6
...
The Import option in PhpMyAdmin is struggling just as the basic copy-paste input in the SQL section does. This time, same as previously, it takes 5 minutes until the max execution time is reached and then it stops. What is interesting though, it adds like 6-7 thousand records into the table. So that means the input actually goes through and almost succeeds. I also tried halving the amount of data in the file. Nothing changed, however.
There is clearly something wrong now. It is pretty annoying to have to play with the data in a php script when a simple data import does not work.
Change your php upload max size.
Do you know where your php.ini file is?
First of all, try putting this file into your web root:
phpinfo.php
( see http://php.net/manual/en/function.phpinfo.php )
containing:
<?php
phpinfo();
?>
Then navigate to http://www.yoursite.com/phpinfo.php
Look for "php.ini".
To upload large files you need to increase max_execution_time, post_max_size and upload_max_filesize in php.ini.
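For example, in php.ini (the values are only illustrative; size them to your imports):
; raise the limits that cap large uploads and long-running imports
max_execution_time = 300
post_max_size = 64M
upload_max_filesize = 64M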
Also, do you know where your error.log file is? It would hopefully give you a clue as to what is going wrong.
EDIT:
Here is the query I use for the file import:
$query = "LOAD DATA LOCAL INFILE '$file_name' INTO TABLE `$table_name` FIELDS TERMINATED BY ',' OPTIONALLY
ENCLOSED BY '\"' LINES TERMINATED BY '$nl'";
Where $file_name is the temporary filename from php global variable $_FILES, $table_name is the table already prepared for import, and $nl is a variable for the csv line endings (default to windows line endings but I have an option to select linux line endings).
The other thing is that the table ($table_name) in my script is prepared in advance by first scanning the csv to determine column types. After it determines appropriate column types, it creates the MySQL table to receive the data.
I suggest you try creating the MySQL table definition first, to match what's in the file (data types, character lengths, etc). Then try the above query and see how fast it runs. I don't know how much of a factor the MySQL table definition is on speed.
Also, I have no indexes defined in the table until AFTER the data is loaded. Indexes slow down data loading.
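As an illustration of that last point, here is a sketch with invented table and column names, shaped after the CSV sample above:
-- create the table to match the CSV columns, with no secondary indexes yet
CREATE TABLE price_log (trade_date INT, tick INT, price DECIMAL(10,2), flag INT);
LOAD DATA LOCAL INFILE 'prices.csv' INTO TABLE price_log
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
-- add indexes only after the bulk load has finished
ALTER TABLE price_log ADD INDEX idx_trade_date (trade_date);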