I'm using the Informix LOAD FROM command to bulk insert data from CSV files into a DB table, like:
LOAD FROM "file.csv" DELIMITER ";" INSERT INTO table_name(col1, col2, col3)
The problem is, the first line of each CSV file contains column headers. Is there any way to tell Informix that the first row shall be ignored?
No; there isn't a way to tell the standard Informix LOAD statement to skip a header line. Note, too, that it won't remove quotes from around fields in CSV format or otherwise deal with things the way CSV format officially expects (though, since you have semicolon-separated values rather than comma-separated values, it is hard to know which rules are being followed; be leery of the treatment of backslashes too).
You might be able to use the Informix DB-Load utility (dbload) instead; it depends on whether your data is simply using ; in place of Informix's default | delimiter, or whether you have more of the semantics of CSV such as quotes around fields that need to be removed. If you want to get exotic, the Informix High-Performance Loader (HPL) can either handle it natively or be trained to handle it.
Alternatively, you could consider using my* SQLCMD program (it has been called sqlcmd a lot longer than Microsoft's johnny-come-lately of the same name) which allows you to specify:
LOAD FROM "file.csv" DELIMITER ";" SKIP 1 INSERT INTO table_name(col1, col2, col3);
SQLCMD also has an option FORMAT CSV (amongst other formats) that might, or might not, be relevant. It handles things like stripping quotes from around fields that the full CSV standard supports.
You'll need to have Informix ClientSDK and a C compiler (and the rest of a C development system) installed to build SQLCMD.
* Since SQLCMD is my program because I wrote it, any recommendation to use it is inherently biassed; you were warned.
You could also consider an 'external table' (CREATE EXTERNAL TABLE), but I'm not sure it is any better than the LOAD statement either with the formats it supports or with the ability to skip the first row of data.
When I load CSV files into Informix using LOAD FROM, I usually load them into a temporary table consisting entirely of character columns, which I then work with. You just delete the header row. Basically you're just putting the whole file into a temp table, which is easier to work with.
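If neither SQLCMD nor the temp-table route suits you, another option is to strip the header before LOAD ever sees the file. Here is a minimal Python sketch of that idea; the function name and file names are made up for illustration, and it assumes the header occupies exactly one physical line (no quoted newlines inside it).

```python
import shutil


def strip_header(src_path, dst_path, skip=1, encoding="utf-8"):
    """Copy src_path to dst_path, omitting the first `skip` lines."""
    # newline="" keeps the original line endings untouched
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for _ in range(skip):
            # readline() returns "" at end of file, so short files are harmless
            src.readline()
        # Stream the remainder without loading the whole file into memory
        shutil.copyfileobj(src, dst)
```

You would then point LOAD FROM at the copy (for example "file_noheader.csv") instead of the original file.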
I'm using a database with MySQL 5.7, and sometimes, data needs to be updated using a mixture of scripts and manual editing. Because people working with the database are usually not familiar with SQL, I'd like to export the data as a TSV, which then could be manipulated (for example with Python's pandas module) and then be imported back. I assume the standard way would be to directly connect to the database, but using TSVs has some upsides in this situation, I think. I've been reading the MySQL docs and some stackoverflow questions to find the best way to do this. I've found a couple of solutions, however, they all are somewhat inconvenient. I will list them below and explain my problems with them.
My question is: did I miss something, for example some helpful SQL commands or CLI options to help with this? Or are the solutions I found already the best when importing/exporting TSVs?
My example database looks like this:
Database: Export_test
Table: Sample
Field      Type       Null  Key
id         int(11)    NO    PRI
text_data  text       NO
optional   int(11)    YES
time       timestamp  NO
Example data:
INSERT INTO `Sample` VALUES (1,'first line\\\nsecond line',NULL,'2022-02-16 20:17:38');
The data contains an escaped newline, which caused a lot of problems for me when exporting.
Table: Reference
Field        Type     Null  Key
id           int(11)  NO    PRI
foreign_key  int(11)  NO    MUL
Example data:
INSERT INTO `Reference` VALUES (1,1);
foreign_key is referencing a Sample.id.
Note about encoding: As a caveat for people trying to do the same thing: if you want to export/import data, make sure that character sets and collations are set up correctly for connections. This caused me some headaches, because although the data itself is utf8mb4, the client, server and connection character sets were latin1, which caused some loss of data in some instances.
Export
So, for exporting, I found basically three solutions, and they all behave somewhat differently:
A: SELECT stdout redirection
mysql Export_test -e "SELECT * FROM Sample;" > out.tsv
Output:
id text_data optional time
1 first line\\\nsecond line NULL 2022-02-16 21:26:13
Pros:
headers are added, which makes it easy to use with external programs
formatting works as intended
Cons:
NULL is used for null values; when importing, \N is required instead; as far as I know, this can't be configured for exports
Workaround: replace NULL values when editing the data
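The NULL workaround for variant A can be scripted. The following is only a sketch under two assumptions: the file comes from mysql batch output (where tabs and newlines inside values are already escaped, so splitting on tabs is safe), and no text column legitimately contains the literal string NULL, since the two cases cannot be told apart in that output. The function name is hypothetical.

```python
def null_to_backslash_n(line):
    # Convert one tab-separated line: fields that are exactly NULL become \N,
    # which is the spelling LOAD DATA expects for SQL NULL.
    body = line.rstrip("\n")
    fields = body.split("\t")
    # Only whole-field matches are replaced; "NULLABLE" stays as it is
    converted = [r"\N" if field == "NULL" else field for field in fields]
    return "\t".join(converted) + "\n"
```

Applied to every line of out.tsv, this yields a file whose null markers match what the import side expects.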
B: SELECT INTO OUTFILE
mysql Export_test -e "SELECT * FROM Sample INTO OUTFILE '/tmp/out.tsv';"
Output:
1 first line\\\
second line \N 2022-02-16 21:26:13
Pros:
\N is used for null data
Cons:
escaped linebreaks are not handled correctly
headers are missing
file writing permission issues
Workaround: fix linebreaks manually; add headers by hand or supply them in the script; use /tmp/ as output directory
C: mysqldump with --tab (performs SELECT INTO OUTFILE behind the scenes)
mysqldump --tab='/tmp/' --skip-tz-utc Export_test Sample
Output, pros and cons: same as export variant B
Something that should be noted: the output is only the same as B, if --skip-tz-utc is used; otherwise, timestamps will be converted to UTC, and will be off after importing the data.
Import
Something I didn't realize at first is that it's impossible to merely update data directly with LOAD DATA or mysqlimport, although that's something many GUI tools appear to be doing and other people have attempted. For me as a beginner, this wasn't immediately clear from the MySQL docs. A workaround appears to be creating an empty table, importing the data there and then updating the actual table of interest via a join. I also thought one could update individual columns with this, which again is not possible. If there are some other ways to achieve this, I would really like to know.
As far as I could tell, there are two options, which do pretty much the same thing:
LOAD DATA:
mysql Export_test -e "SET FOREIGN_KEY_CHECKS = 0; LOAD DATA INFILE '/tmp/Sample.tsv' REPLACE INTO TABLE Sample IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS = 1;"
mysqlimport (performs LOAD DATA behind the scenes):
mysqlimport --replace Export_test /tmp/Sample.tsv
Notice: if there are foreign key constraints like in this example, SET FOREIGN_KEY_CHECKS = 0; needs to be performed (as far as I can tell, mysqlimport can't be directly used in these cases). Also, IGNORE 1 LINES or --ignore-lines can be used to skip the first line if the input TSV contains a header. For mysqlimport, the name of the input file without extension must be the name of the table. Again, file reading permissions can be an issue, and /tmp/ is used to avoid that.
Are there ways to make this process more convenient? Like, are there some options I can use to avoid the manual workarounds, or are there ways to use TSV importing to UPDATE entries without creating a temporary table?
What I ended up doing was using SELECT INTO OUTFILE for exporting, adding a header manually, and fixing the malformed lines by hand. After manipulating the data, I used LOAD DATA INFILE to update the data. In another case, I exported with SELECT and stdout redirection, manipulated the data, and then wrote a script that created a file with a bunch of UPDATE ... WHERE statements with the corresponding data. Then I ran the resulting .sql file against my database. Is the latter maybe the best option in this case?
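For reference, the script approach can look roughly like the sketch below. All function names are hypothetical, rows are assumed to be already parsed into dicts (with None standing for NULL), and the escaping assumes MySQL's default SQL mode, where backslash escapes are enabled. It is meant for producing a .sql file you can review before running; if you talk to the server directly, parameterized queries through a driver are the safer choice.

```python
def sql_literal(value):
    # None becomes the SQL keyword NULL, not the string 'NULL'
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    text = str(value)
    # Escape the backslash first so the escapes added below are not doubled
    text = text.replace("\\", "\\\\")
    text = text.replace("'", "''")
    text = text.replace("\n", "\\n").replace("\r", "\\r").replace("\0", "\\0")
    return "'" + text + "'"


def quote_identifier(name):
    # Backtick-quote a table or column name, doubling embedded backticks
    return "`" + name.replace("`", "``") + "`"


def update_statement(table, key_column, row):
    """Build one UPDATE ... WHERE statement for a row keyed by key_column."""
    assignments = ", ".join(
        f"{quote_identifier(col)} = {sql_literal(val)}"
        for col, val in row.items() if col != key_column)
    if not assignments:
        raise ValueError("row has no columns to update")
    return (f"UPDATE {quote_identifier(table)} SET {assignments} "
            f"WHERE {quote_identifier(key_column)} = {sql_literal(row[key_column])};")
```

Because each statement only touches the columns present in the row, this also covers the "update individual columns" case that LOAD DATA cannot handle.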
Exporting and importing is indeed sort of clunky in MySQL.
One problem is that it introduces a race condition. What if you export data to work on it, then someone modifies the data in the database, then you import your modified data, overwriting your friend's recent changes?
If you say, "no one is allowed to change data until you re-import the data," that could cause an unacceptably long time where clients are blocked, if the table is large.
The trend is that people want the database to minimize downtime, and ideally to have no downtime at all. Advancements in database tools are generally made with this priority in mind, not so much to accommodate your workflow of taking the data out of MySQL for transformations.
Also, what if the database is large enough that the exported data itself becomes a problem? Where do you store a 500GB TSV file? Does pandas even work on such a large file?
What most people do is modify data while it remains in the database. They use in-place UPDATE statements to modify data. If they can't do this in one pass (there's a practical limit of 4GB for a binary log event, for example), then they UPDATE more modest-size subsets of rows, looping until they have transformed the data on all rows of a given table.
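To illustrate the looping pattern, here is a rough sketch that walks the primary key in chunks and commits after each one. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; with MySQL you would use a MySQL driver and its placeholder syntax instead. The transformation (upper-casing text_data), the batch size and the function name are arbitrary assumptions, and ids are assumed to be positive integers.

```python
import sqlite3


def update_in_batches(conn, batch_size=1000):
    """Transform Sample.text_data in id-ordered chunks; return rows updated."""
    last_id = 0  # assumes primary keys are positive
    total = 0
    while True:
        # Fetch the next chunk of primary keys above the last processed id
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM Sample WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size))]
        if not ids:
            break
        # Update only that key range, so each statement stays modest in size
        cur = conn.execute(
            "UPDATE Sample SET text_data = UPPER(text_data) "
            "WHERE id > ? AND id <= ?",
            (last_id, ids[-1]))
        conn.commit()  # commit per chunk to keep transactions short
        total += cur.rowcount
        last_id = ids[-1]
    return total
```

Keeping each transaction small is the point: other clients are blocked only briefly per chunk rather than for the whole transformation.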
I have a huge dataset. What is the fastest way to upload the data into a MySQL database from PHP, and is there any way to verify that all the data was imported?
Any suggestions or hints will be greatly appreciated. Thanks.
If the data set is simply huge (can be transferred within hours), it is not worth the effort of finding an efficient way; any script should be able to do the job. I am assuming you are reading from some non-DB format (e.g. plain text)? In that case, simply read and insert.
If you require careful processing before you insert the rows, you might want to consider creating real objects in memory and their sub-objects first and then mapping them to rows and tables - Object-Relational data source patterns will be valuable here. This will, however, be much slower, and I would not recommend it unless it's absolutely necessary, especially if you are doing it just once.
For very fast access, some people wrote a direct binary blob of objects on the disk and then read it directly into an array, but that is available in languages like C/C++; I am not sure if/how it can be used in a scripted language. Again, this is good for READING the data back into memory, not transferring to DB.
The easiest way to verify that the data has been transferred is to compare the count(*) of the DB table with the number of items in your file. The more advanced way is to compute a hash (e.g. SHA-1) of the primary key sets on both sides and compare them.
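A sketch of the hash comparison, assuming you can obtain the primary keys from both the source file and the database as iterables. The function name is made up, and the keys are sorted in memory, so for a truly huge data set you would want to sort externally or hash per chunk instead.

```python
import hashlib


def key_set_fingerprint(keys):
    """Return (count, SHA-1 hex digest) for an iterable of primary keys."""
    digest = hashlib.sha1()
    count = 0
    # Sort the string forms so both sides hash the keys in the same order
    for key in sorted(str(k) for k in keys):
        digest.update(key.encode("utf-8"))
        digest.update(b"\n")  # separator, so "1","23" differs from "12","3"
        count += 1
    return count, digest.hexdigest()
```

If the fingerprint of the keys read from the file equals the fingerprint of the keys selected from the table, both the row count and the set of keys match.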
I used LOAD DATA, MySQL's standard bulk-loading statement. It works well and is fast, and it has many options.
You can use it like this:
A data file named export_du_histo_complet.txt with multiple lines like this:
"xxxxxxx.corp.xxxxxx.com";"GXTGENCDE";"GXGCDE001";"M_MAG105";"TERMINE";"2013-06-27";"14:08:00";"14:08:00";"00:00:01";"795691"
An SQL file containing the following (I use a Unix shell script that calls the SQL file):
LOAD DATA INFILE '/home2/soron/EXPORT_HISTO/export_du_histo_complet.txt'
INTO TABLE du_histo
FIELDS
TERMINATED BY ';'
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES
STARTING BY ' '
TERMINATED BY '\n'
(server, sess, uproc, ug, etat, date_exploitation, debut_uproc, fin_uproc, duree, num_uproc)
I specified the table fields that I wanted to import (my table has more columns).
Note that there is a MySQL bug, so you can't use a variable to specify your INFILE.
I haven't seen any data type that can store a file in SQL. Is there something like that? What I'm particularly talking about is that I want to insert source code into my table. What is the best method to do it? It can either be stored in my database as nicely formatted text, or better (what I actually want) be stored as a single file. Please note that I'm using MySQL.
It is best not to store a file in your SQL database, but to store a path to the file on the server (or any other UNC path) that your application can retrieve by itself and do with it whatever is necessary.
see this: https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database
and this:
Better way to store large files in a MySQL database?
and if you still want to store the file on the DB.. here is an example:
http://mirificampress.com/permalink/saving_a_file_into_mysql
If you can serialize the file, you can store it as binary and then deserialize it when needed.
http://dev.mysql.com/doc/refman/5.0/en/binary-varbinary.html
You can also use a BLOB (http://dev.mysql.com/doc/refman/5.0/en/blob.html) which has some differences. Normally I just store the file in the filesystem and a pointer in the DB, which makes serving it back via something like HTTP a bit easier and doesn't bloat up the Database.
Storing the file in a table only makes sense if you need to do searches in that code. In other cases, you should only store a file's URL.
If you want to store a text file, use the TEXT datatype. Since it is source code, you may consider using the ASCII character set to save space, but be aware that this will cause character set conversions during your queries, and this affects performance. Also, if it is ASCII you can use REGEXP for searches (that operator doesn't work with multi-byte charsets).
To load the file, if the file is on the same server as MySQL, you can use the LOAD_FILE() function within an INSERT.
I have a text file full of values like this:
The first line is a list of column names like this:
col_name_1, col_name_2, col_name_3 ......(600 columns)
and all the following lines have values like this:
1101,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1101,1,3.86,65,0.46418,65,0.57151...
What is the best way to import this into mysql?
Specifically how to come up with the proper CREATE TABLE command so that the data will load itself properly? What is the best generic data type which would take in all the above values like 1101 or 3.86 or 0.57151. I am not worried about the table being inefficient in terms of storage as I need this for a one time usage.
I have tried some of the suggestions in other related questions, like using phpMyAdmin (it crashes, I am guessing due to the large amount of data).
Please help!
Data in CSV files is not normalized; those 600 columns may be spread across a couple of related tables. This is the recommended way of treating those data. You can then use fgetcsv() to read CSV files line-by-line in PHP.
To make MySQL process the CSV, you can create a 600 column table (I think) and issue a LOAD DATA LOCAL INFILE statement (or perhaps use mysqlimport, not sure about that).
The most generic data type would have to be VARCHAR or TEXT for bigger values, but of course you would lose semantics when used on numbers, dates, etc.
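Writing a 600-column CREATE TABLE by hand is tedious, so it can be generated from the header line. The sketch below is one way to do that under stated assumptions: the function name and table name are invented, and the default column type is an arbitrary placeholder. With hundreds of columns MySQL's row-size limits constrain which types and lengths will be accepted, so expect to adjust it.

```python
import csv


def create_table_sql(header_line, table_name, column_type="VARCHAR(32)"):
    """Build a CREATE TABLE statement with one generic column per header name."""
    # Parse the header with the csv module so quoted names are handled
    names = next(csv.reader([header_line], skipinitialspace=True))
    columns = []
    for name in names:
        name = name.strip()
        if not name:
            continue  # ignore empty header cells
        # Backtick-quote the identifier, doubling any embedded backticks
        quoted = "`" + name.replace("`", "``") + "`"
        columns.append(f"  {quoted} {column_type}")
    table = "`" + table_name.replace("`", "``") + "`"
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"
```

After running the generated statement, the data itself can be loaded with LOAD DATA LOCAL INFILE, skipping the header line.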
I noticed that you included the phpmyadmin tag.
phpMyAdmin can handle this out of the box. It will decide "magically" which type to give each column, and will CREATE the table for you, as well as INSERT all the data. There is no need to worry about LOAD DATA INFILE, though that method can be safer if you want to know exactly what's going on without relying on phpMyAdmin's magic tooling.
Try convertcsvtomysql, just upload your csv file and then you can download and/or copy the mysql statement to create the table and insert rows.
I have a text file to be imported in a MySQL table. The columns of the files are comma delimited. I set up an appropriate table and I used the command:
load data LOCAL INFILE 'myfile.txt' into table mytable FIELDS TERMINATED BY ',';
The problem is, there are several spaces in the text file, before and after the data on each column, and it seems that the spaces are all imported in the tables (and that is not what I want). Is there a way to load the file without the empty spaces (other than processing each row of the text file before importing in MySQL)?
As far as I understand, there's no way to do this during the actual load of the data file dynamically (I've looked, as well).
It seems the best way to handle this is either to use the SET clause with the TRIM() function, reading the field into a user variable first (for example "(column1, @column2) SET column2 = TRIM(@column2)"), or to run an UPDATE on the string columns after loading, using the TRIM() function.
You can also create a stored procedure using prepared statements to run the TRIM function on all columns in a specified table, immediately after loading it.
You would essentially pass in the table name as a variable, and the stored procedure would use the information_schema database to determine which columns to update.
If you can use .NET, CsvReader is a great option (http://www.codeproject.com/KB/database/CsvReader.aspx). You can read data from a CSV and specify delimiter, trimming options, etc. In your case, you could choose to trim left and right spaces from each value. You can then either save the result to a new text file and import it into the database, or loop through the CsvReader object and insert each row into the database directly. The performance of CsvReader is impressive. Hope this helps.
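If .NET is not available, the same "trim, then save to a new file" idea can be sketched with Python's standard csv module. The function name and file names are hypothetical. Note that the writer quotes any field containing the delimiter or a double quote, so the subsequent LOAD DATA may need OPTIONALLY ENCLOSED BY '"'.

```python
import csv


def trim_csv(src_path, dst_path, delimiter=",", encoding="utf-8"):
    """Write a copy of src_path with whitespace stripped from every field."""
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        # skipinitialspace lets a quoted field follow spaces after the delimiter
        reader = csv.reader(src, delimiter=delimiter, skipinitialspace=True)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        for row in reader:
            writer.writerow([field.strip() for field in row])
```

The cleaned file can then be loaded with the original LOAD DATA LOCAL INFILE statement.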