I have a problem loading a CSV file into a MySQL database.
The CSV file looks like this:
stuID,stuName,degreeProg
6902101,A001,null
6902102,A002,null
6902103,A003,null
6902104,A004,null
6902105,A005,null
I have written a script like this:
LOAD DATA LOCAL INFILE 'demo.csv' INTO TABLE `table`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(`col1`, `col2`, `col3`)
What troubles me is that:
1) The third column in the file is null, but when loaded into the table it becomes the string 'null'.
2) At the end of the file there is an extra empty line, which also gets loaded, with NULL assigned to each column.
How should I write the script to deal with these two problems? (It is forbidden to modify the CSV file, and ideally the script should produce as few MySQL warnings as possible when it runs.)
1) One option is to have LOAD DATA assign the value of the third field (i.e. the string 'null') to a user-defined variable, and use the "SET col = expr" form to assign a value to the column `col3`.
As an example:
(`col1`, `col2`, @field3)
SET col3 = IF(@field3 = 'null', NULL, @field3)
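Put together with the statement from the question, the whole command would look something like this (a sketch, reusing the question's table and column names):
LOAD DATA LOCAL INFILE 'demo.csv' INTO TABLE `table`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(`col1`, `col2`, @field3)
SET col3 = IF(@field3 = 'null', NULL, @field3);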
2) There's no way to have MySQL LOAD DATA "skip" the last record in the file; ignoring the last line is better handled outside MySQL. For example, have LOAD DATA read from a named pipe, and have a separate concurrent process read the CSV file and write everything except the trailing empty line to that pipe.
If you could modify the CSV file, simply add FIELDS ENCLOSED BY '"' and change null to NULL (upper case) to get those fields to load as NULL. Alternatively, use \N to load NULL.
Also, obviously, delete the empty line at the end (which is most likely causing the warnings):
stuID,stuName,degreeProg
6902101,A001,\N
6902102,A002,\N
6902103,A003,\N
6902104,A004,\N
6902105,A005,\N
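With the file rewritten like that, the matching statement would just add the ENCLOSED BY clause (a sketch based on the question's original command):
LOAD DATA LOCAL INFILE 'demo.csv' INTO TABLE `table`
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(`col1`, `col2`, `col3`);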
Related
I am using the following command from the MySQL command line to try to import a csv file into one of my tables:
LOAD DATA INFILE 'file path' INTO TABLE table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(BASEID, BIGID, StartDate, EndDate)
Some of my rows have no value for EndDate which I want to be represented as NULL values in the database.
Unfortunately when I execute the above command I get the following error:
Incorrect date value: '' for column 'EndDate' at row 141
If I remove the rows with blank cells the command works, so it is clearly the blank values for EndDate which are causing the problem.
I have also tried changing my csv file so that the blank cells say NULL or \N, and I have tried the following command instead, but I still get the same error message:
LOAD DATA INFILE 'file path' INTO TABLE Table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(BASEID, BIGID, StartDate, @EndDate)
SET EndDate = nullif(@EndDate, ' ')
How can I load csv files which have some blank values? The suggested solutions I have seen on other posts don't seem to work, as outlined above.
Is the issue that the value for the end date is missing, or that the column itself is missing? These are not the same thing. For the former case, I think LOAD DATA should be able to handle this, assuming that the target column for the end date can tolerate missing/null values.
What I suspect here is that some of your input lines look like this:
1,1,'2020-10-03'
That is, there is no fourth column present at all. If that is the case, then the most prudent thing to do here might be to run a simple regex over your input CSV flat file to fix these missing-fourth-column edge cases. You may try:
Find: ^([^,]+,[^,]+,'[^,]+')$
Replace: $1,
This would turn the sample line above into:
1,1,'2020-10-03',
Now, the date value is still missing, but at least LOAD DATA should detect that the line has four columns, instead of just three.
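Once every line has four fields, the user-variable approach from the question should then work; note the empty string (not a single space) as the second argument of NULLIF. A sketch using the question's names:
LOAD DATA INFILE 'file path' INTO TABLE Table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(BASEID, BIGID, StartDate, @EndDate)
SET EndDate = NULLIF(@EndDate, '');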
I have a tab-delimited data file with many missing values and I need to import it into a table in MariaDB (10.4.5).
I used this command:
load data infile 'c:/path to file/file.txt' into table table_name fields terminated by '\t' lines terminated by '\n' ignore 1 rows;
But I get this error:
SQL Error (1366): Incorrect double value: '' for column db_name.table_name.col_name1 at row 10
When I examine the text data file, col_name1 at row 10 is a missing value - ie. nothing between the two tab delimiters.
I have spent hours trying to solve this issue and would appreciate any help: is there any way of importing the data, including the missing values (empty fields), into the MySQL table?
Do I need to pre-process the text file before using LOAD DATA INFILE? And if so, what would be the best way to pre-process?
You must do it during the import. Something like:
LOAD DATA INFILE 'c:/path to file/file.txt'
INTO TABLE table_name
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
-- the fields whose values are set directly,
-- and intermediate variables for values which must be processed,
-- positioned according to the CSV structure
(field1, field2, @variable3, field4, ...)
-- process the values in the variables and set field values
SET field3 = CASE WHEN @variable3 = '' THEN 0 ELSE @variable3 END;
(field1, field2, @variable3, field4, ...) is the destination of the data fields parsed from each line of the source CSV file.
I.e. the first value parsed from the source line currently being processed is assigned directly to the field field1 of the destination table. The same for the second value and field2.
The third parsed value is assigned to the user-defined variable @variable3.
The 4th parsed value is again assigned to a table field, and so on, if more data and code are present.
After the whole line is parsed according to the specification above, the next processing directive is executed: SET field3 = CASE WHEN @variable3 = '' THEN 0 ELSE @variable3 END.
It is simple: if @variable3 was assigned the empty string, then the value 0 is assigned to the field field3 of the record currently being parsed; otherwise the value parsed from the current line of the source file is assigned to this field without modification.
After both clauses are processed, the whole record (all fields which were assigned a value) is stored as one new row in the destination table in the common way (assigning defaults to non-listed fields, checks, triggers...).
After the record is stored, the next line from the CSV is read, parsed, processed, stored, then the next line... and so on, until the end of file or some error.
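If the column is nullable and the goal is to store NULL rather than 0 for missing values (an assumption about the desired outcome), NULLIF is a compact alternative:
SET field3 = NULLIF(@variable3, '');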
I have a text file with the following content (I have only shown the first few lines to illustrate). The data is in the form of key-value pairs.
FIELD_A="Peter Kibbon",FIELD_B=31,FIELD_C="SCIENCE"
FIELD_A="James Gray",FIELD_B=28,FIELD_C="ARTS"
FIELD_A="Michelle Fernado",FIELD_B=25,FIELD_C="SCIENCE"
I want to import this data into a MySQL database using the LOAD DATA INFILE syntax to speed up the process. Is there any way that I can specify something like a field prefix so that it reads only the "value" part of each field?
I do not want to use multiple INSERTs, parsing each line and each field, as this would slow down the process quite a bit.
If you know that all fields will be specified on each row and they are always in the same order, you can do something like this:
LOAD DATA INFILE 'your_file'
INTO TABLE table_name
FIELDS TERMINATED BY ','
(@col1_variable, @col2_variable, @col3_variable)
SET column1 = REPLACE(@col1_variable, 'FIELD_A=', ''),
column2 = REPLACE(@col2_variable, 'FIELD_B=', ''),
column3 = REPLACE(@col3_variable, 'FIELD_C=', '');
You load the content of the file in variables first, then operate on those variables and assign the result to your columns.
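One caveat: since the values themselves are quoted (e.g. FIELD_A="Peter Kibbon"), the REPLACE above leaves the double quotes in the stored strings. Stripping them as well could look like this (a sketch):
SET column1 = TRIM(BOTH '"' FROM REPLACE(@col1_variable, 'FIELD_A=', '')),
column2 = REPLACE(@col2_variable, 'FIELD_B=', ''),
column3 = TRIM(BOTH '"' FROM REPLACE(@col3_variable, 'FIELD_C=', ''));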
Read more about it in the MySQL LOAD DATA documentation.
I'm trying to use mysql's LOAD DATA LOCAL INFILE syntax to load a .csv file into an existing table. Here is one record from my .csv file (with headers):
PROD, PLANT,PORD, REVN,A_CPN, A_CREV,BRDI, DTE, LTME
100100128144,12T1,2070000,04,3DB18194ACAA,05_01,ALA13320004,20130807,171442
The issue is that I want 3 extra things done during import:
A RECORDID INT NOT NULL AUTO_INCREMENT PRIMARY KEY field should be incremented as each row gets inserted (this column and structure already exist within the MySQL table)
DTE and LTME should be concatenated and converted to MySQL DATETIME format and inserted into an existing MySQL column named TRANS_OCR
A CREATED TIMESTAMP field should be set to the current Unix timestamp on row insertion (this column and structure already exist within the MySQL table as well)
I'm trying to import this data into the mysql table with the following command:
LOAD DATA LOCAL INFILE 'myfile.csv' INTO TABLE seriallog
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(FLEX_PN, FLEX_PLANT, FLEX_ORDID, FLEX_REV, CUST_PN, CUST_REV, SERIALID)
SET CREATED = CURRENT_TIMESTAMP;
I think I have the CREATED column set properly, but the others are causing MySQL warnings to be issued:
Warning: Out of range value for column 'FLEX_PN' at row 1
Warning: Row 1 was truncated; it contained more data than there were input columns
Can someone help me with the syntax, the LOAD DATA LOCAL INFILE module is confusing to me...
Figured out the proper syntax to make this work:
sql = """LOAD DATA LOCAL INFILE %s INTO TABLE seriallog_dev
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\\n'
IGNORE 1 LINES
(FLEX_PN, FLEX_PLANT, FLEX_ORDID, FLEX_REV, CUST_PN, CUST_REV, SERIALID, @DTE, @LTME)
SET RECORDID = NULL,
TRANS_OCR = STR_TO_DATE(CONCAT(@DTE,'',@LTME), "%%Y%%m%%d%%H%%i%%s"),
CREATED = CURRENT_TIMESTAMP;"""
params = (file,)
self.db.query( sql, params )
Mind you, this is done with Python's MySQLdb module.
CAVEAT
The only issue with this solution is that for some reason my bulk insert only inserts the first 217 rows of data from my file. My total file size is 19KB so I can't imagine that it is too large for the mysql buffers... so what gives?
more info
Also, I just tried this syntax directly within the mysql-server CLI and it works for all 255 records. So obviously it is some problem with Python, the Python MySQLdb module, or the MySQL connection that the MySQLdb module makes...
DONE
I JUST figured out the problem: it had nothing to do with the LOAD DATA LOCAL INFILE command, but rather with the method I was using to convert my original .dbf file into the .csv before attempting the import. For some reason the MySQL import method was running on the .csv before the .dbf-to-.csv conversion method had finished, resulting in a partial data set being found in the .csv file and imported... sorry to waste everyone's time!
I am using the following statement to load data from a file into a table:
LOAD DATA LOCAL INFILE '/home/100000462733296__Stats'
INTO TABLE datapoints
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(uid1, uid2, status);
Now, if I want to enter a custom value into uid1, say 328383 without actually asking it to read it from a file, how would I do that? There are about 10 files and uid1 is the identifier for each of these files. I am looking for something like this:
LOAD DATA LOCAL INFILE '/home/100000462733296__Stats'
INTO TABLE datapoints
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(uid1="328383", uid2, status);
Any suggestions?
The SET clause can be used to supply values not derived from the input file:
LOAD DATA LOCAL INFILE '/home/100000462733296__Stats'
INTO TABLE datapoints
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(uid1, uid2, status)
SET uid1 = '328383';
It's not clear what the data type of uid1 is, but since you enclosed the value in double quotes I assumed it's a string-related data type; remove the single quotes if the data type is numeric.
There's more to read on what the SET functionality supports in the LOAD DATA documentation; it's a little more than halfway down the page.
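If reading the file's first field into uid1 and then overwriting it produces warnings, a variant is to read that field into a throwaway variable instead (a sketch, assuming each line really carries three fields):
LOAD DATA LOCAL INFILE '/home/100000462733296__Stats'
INTO TABLE datapoints
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(@dummy, uid2, status)
SET uid1 = '328383';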
You could use a Python interactive shell instead of the MySQL shell to interactively provide values for MySQL tables.
Install the Python interpreter from python.org (only needed if you are on Windows; otherwise you already have it) and the MySQL connector from http://sourceforge.net/projects/mysql-python/files/ (ah, I see you are on Linux/Unix; just install the MySQLdb package then).
After that, you type these three lines in the python shell:
import MySQLdb
connection = MySQLdb.connect(host="<hostname>", user="<user>", passwd="<pwd>", db="<db>")  # add port=<port> if needed
cursor = connection.cursor()
After that you can use the cursor.execute method to issue SQL statements, while retaining the full flexibility of Python to transform your data.
For example, for this specific query:
myfile = open("/home/100000462733296__Stats")
for line in myfile:
    uid1, uid2, status = line.split("|")
    status = status.strip()
    # parameterized query; the hard-coded uid1 replaces the value read from the file
    cursor.execute("INSERT INTO datapoints SET uid1='328383', uid2=%s, status=%s",
                   (uid2, status))
connection.commit()  # MySQLdb does not autocommit by default
voilà!
(maybe with a try: clause around the line.split line to avoid an exception on the last, empty line)
If you don't know it already, you can learn Python in under one hour with the tutorial at python.org; it is really worth it, even if the only thing you do with computers is import data into databases.
Two quick thoughts (one might be applicable :)):
1) Change the value of uid1 to 328383 in every line of the file.
2) Temporarily change the uid1 column in the table to be non-mandatory, load the contents of the file, then run a query that sets the value to 328383 in every row (see the sketch below). Finally, reset the column to mandatory.
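A sketch of the second approach (assuming uid1 is nullable for the duration of the load and the file's first field is discarded into a variable):
LOAD DATA LOCAL INFILE '/home/100000462733296__Stats'
INTO TABLE datapoints
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(@dummy, uid2, status);

UPDATE datapoints SET uid1 = '328383' WHERE uid1 IS NULL;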