I have a text file with the following content (only the first few lines are shown for illustration). Each line consists of key-value pairs.
FIELD_A="Peter Kibbon",FIELD_B=31,FIELD_C="SCIENCE"
FIELD_A="James Gray",FIELD_B=28,FIELD_C="ARTS"
FIELD_A="Michelle Fernado",FIELD_B=25,FIELD_C="SCIENCE"
I want to import this data into a MySQL database using the LOAD DATA INFILE syntax to speed up the process. Is there any way to specify something like a field prefix so that only the "value" part of each field is read?
I do not want to use MULTIPLE insert by parsing each line and each field, as this would slow down the process quite a bit.
If you know that all fields will be specified on each row and they are always in the same order, you can do something like this:
LOAD DATA INFILE 'your_file'
INTO TABLE table_name
FIELDS TERMINATED BY ','
(@col1_variable, @col2_variable, @col3_variable)
SET column1 = REPLACE(@col1_variable, 'FIELD_A=', ''),
column2 = REPLACE(@col2_variable, 'FIELD_B=', ''),
column3 = REPLACE(@col3_variable, 'FIELD_C=', '');
You load the content of the file in variables first, then operate on those variables and assign the result to your columns.
Read more about it here.
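If the number of fields is large, the statement can be generated rather than typed. The following is a minimal Python sketch under stated assumptions: the helper name build_load_statement and the variable names @v1, @v2, ... are hypothetical, the table and column names are trusted (no escaping is done), and the function only produces the SQL text without running it.

```python
def build_load_statement(path, table, fields):
    """Return a LOAD DATA statement that strips a 'KEY=' prefix from each field.

    fields is a list of (prefix, column) pairs in the order they appear
    in the file, for example [("FIELD_A", "column1"), ...].
    """
    # One user variable per field in the file: @v1, @v2, ...
    variables = [f"@v{i}" for i in range(1, len(fields) + 1)]
    # One SET assignment per target column.
    assignments = [
        f"{column} = REPLACE({var}, '{prefix}=', '')"
        for var, (prefix, column) in zip(variables, fields)
    ]
    return (
        f"LOAD DATA INFILE '{path}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ','\n"
        f"({', '.join(variables)})\n"
        "SET " + ",\n    ".join(assignments) + ";"
    )


statement = build_load_statement(
    "your_file",
    "table_name",
    [("FIELD_A", "column1"), ("FIELD_B", "column2"), ("FIELD_C", "column3")],
)
print(statement)
```

Note that REPLACE only removes the prefix, so quoted values such as "Peter Kibbon" keep their double quotes. Wrapping the expression in TRIM(BOTH '"' FROM ...) is one way to drop them, if that is what you want.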
I've got a process that creates a csv file that contains ONE set of values that I need to import into a field in a MySQL database table. This process creates a specific file name that identifies the values of the other fields in that table. For instance, the file name T001U020C075.csv would be broken down as follows:
T001 = Test 001
U020 = User 020
C075 = Channel 075
The file contains a single row of data separated by commas for all of the test results for that user on a specific channel and it might look something like:
12.555, 15.275, 18.333, 25.000 ... (there are hundreds, maybe thousands, of results per user, per channel).
What I'm looking to do is to import directly from the CSV file adding the field information from the file name so that it looks something like:
insert into results (test_no, user_id, channel_id, result) values (1, 20, 75, 12.555)
I've tried to use "Bulk Insert" but that seems to want to import all of the fields where each ROW is a record. Sure, I could go into each file and convert the row to a column and add the data from the file name into the columns preceding the results but that would be a very time consuming task as there are hundreds of files that have been created and need to be imported.
I've found several "import CSV" solutions but they all assume all of the data is in the file. Obviously, it's not...
The process that generated these files is unable to be modified (yes, I asked). Even if it could be modified, it would only provide the proper format going forward and what is needed is analysis of the historical data. And, the new format would take significantly more space.
I'm limited to using either MATLAB or MySQL Workbench to import the data.
Any help is appreciated.
Bob
A possible SQL approach to getting the data loaded into the table would be to run a statement like this:
LOAD DATA LOCAL INFILE '/dir/T001U020C075.csv'
INTO TABLE results
FIELDS TERMINATED BY '|'
LINES TERMINATED BY ','
( result )
SET test_no = '001'
, user_id = '020'
, channel_id = '075'
;
We need the comma to be the line separator. For the field separator, we can specify some character that is guaranteed not to appear in the file. That way LOAD DATA sees a single "field" on each "line".
(If there isn't a trailing comma at the end of the file, after the last value, we need to test to make sure we are getting the last value, that is, the last "line" as we're telling LOAD DATA to look at the file.)
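One way to run that test is to count the values in the file independently and compare the number with SELECT COUNT(*) on the loaded rows. A minimal Python sketch, assuming the file holds a single comma-separated row as described in the question; the function name count_values is hypothetical:

```python
def count_values(text):
    """Count the non-empty comma-separated values in the file content.

    A trailing comma or a trailing newline does not add to the count.
    """
    return sum(1 for piece in text.split(",") if piece.strip())


# Same count with or without a trailing comma after the last value.
print(count_values("12.555, 15.275, 18.333, 25.000\n"))
print(count_values("12.555, 15.275, 18.333, 25.000,\n"))
```

If the row count in the table for that test, user and channel is one less than this number, the last value was not picked up.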
We could use user-defined variables in place of the literals, but that leaves the part about parsing the filename. That's really ugly in SQL, but it could be done, assuming a consistent filename format...
-- parse filename components into user-defined variables
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(f.n,'T',-1),'U',1) AS t
, SUBSTRING_INDEX(SUBSTRING_INDEX(f.n,'U',-1),'C',1) AS u
, SUBSTRING_INDEX(f.n,'C',-1) AS c
, f.n AS n
FROM ( SELECT SUBSTRING_INDEX(SUBSTRING_INDEX( i.filename ,'/',-1),'.csv',1) AS n
FROM ( SELECT '/tmp/T001U020C075.csv' AS filename ) i
) f
INTO @ls_t
, @ls_u
, @ls_c
, @ls_n
;
While we're testing, we probably want to see the result of the parsing.
-- for debugging/testing
SELECT @ls_t
, @ls_u
, @ls_c
, @ls_n
;
And then the part about running of the actual LOAD DATA statement. We've got to specify the filename again. We need to make sure we're using the same filename ...
LOAD DATA LOCAL INFILE '/tmp/T001U020C075.csv'
INTO TABLE results
FIELDS TERMINATED BY '|'
LINES TERMINATED BY ','
( result )
SET test_no = @ls_t
, user_id = @ls_u
, channel_id = @ls_c
;
(The client will need read permission on the .csv file.)
Unfortunately, we can't wrap this in a procedure, because running a LOAD DATA statement is not allowed from a stored program.
Some would correctly point out that as a workaround, we could compile/build a user-defined function (UDF) to execute an external program, and a procedure could call that. Personally, I wouldn't do it. But it is an alternative we should mention, given the constraints.
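Another option is to generate one LOAD DATA statement per file on the client side and paste or run the resulting script. This steps outside the MATLAB/MySQL Workbench constraint in the question, so treat the following Python sketch only as an illustration of the logic. The function names are hypothetical, the filename format is assumed to be exactly T<digits>U<digits>C<digits>.csv, and the path is inserted without escaping.

```python
import re

# Matches names such as T001U020C075.csv at the end of a path.
FILENAME_RE = re.compile(r"T(\d+)U(\d+)C(\d+)\.csv$")


def parse_result_filename(path):
    """Return (test_no, user_id, channel_id) as integers."""
    match = FILENAME_RE.search(path)
    if match is None:
        raise ValueError(f"unexpected file name: {path!r}")
    return tuple(int(part) for part in match.groups())


def build_results_load(path):
    """Build the LOAD DATA statement for one result file."""
    test_no, user_id, channel_id = parse_result_filename(path)
    return (
        f"LOAD DATA LOCAL INFILE '{path}'\n"
        "INTO TABLE results\n"
        "FIELDS TERMINATED BY '|'\n"
        "LINES TERMINATED BY ','\n"
        "( result )\n"
        f"SET test_no = {test_no}\n"
        f", user_id = {user_id}\n"
        f", channel_id = {channel_id}\n"
        ";"
    )


print(build_results_load("/tmp/T001U020C075.csv"))
```

Because the filename appears only once as input, the parsed values and the file being loaded cannot drift apart.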
I have a large .csv file which I want to import into a MySQL database. I want to use the LOAD DATA INFILE statement on the basis of its speed.
Fields are terminated by -|-. Lines are terminated by |--. Currently I am using the following statement:
LOAD DATA LOCAL INFILE 'C:\\test.csv' INTO TABLE mytable FIELDS TERMINATED BY '-|-' LINES TERMINATED BY '|--'
Most rows look something like this: (Note that the strings are not enclosed by any characters.)
goodstring-|--|-goodstring-|-goodstring-|-goodstring|--
goodstring-|--|-goodstring-|-goodstring-|-|--
goodstring-|-goodstring-|-goodstring-|-goodstring-|-|--
goodstring is a string that does not contain - as a character. As you can see the second or last column might be empty. Rows like the above do not cause any problems. However the last column may contain - characters. There might be a row that looks something like this:
goodstring-|--|-goodstring-|-goodstring-|---|--
The string -- in the last column causes problems. MySQL detects six instead of five columns. It inserts a single - character into the fifth column and truncates the sixth. The correct DB row should be ("goodstring", NULL, "goodstring", "goodstring", "--").
A solution would be to tell MySQL to regard everything after the fourth field terminator as part of the fifth column (up until the line is terminated). Is this possible with LOAD DATA INFILE? Are there methods that yield the same result, do not require the source file to be edited and perform about as fast as LOAD DATA INFILE?
This is my solution:
LOAD DATA
LOCAL INFILE 'C:\\test.csv'
INTO TABLE mytable
FIELDS TERMINATED BY '-|-'
LINES TERMINATED BY '-\r\n'
(col1, col2, col3, col4, @col5, col6)
SET @col5 = (SELECT CASE WHEN col6 IS NOT NULL THEN CONCAT(@col5, '-') ELSE LEFT(@col5, LENGTH(@col5) - 2) END);
It will turn a row like this one:
goodstring-|--|-goodstring-|-goodstring-|-|--
Into this:
("goodstring", "", "goodstring", "goodstring", NULL)
And a bad row like this one:
goodstring-|--|-goodstring-|-goodstring-|---|--
Into this:
("goodstring", "", "goodstring", "goodstring", "")
I simply drop the last column after the import.
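For comparison, the rule asked for in the question (everything after the fourth field terminator belongs to the fifth column) is easy to express outside MySQL. The Python sketch below is not part of the solution above; it assumes each record sits on its own physical line ending in |--, and the name split_row is hypothetical. It leaves the source file untouched and could feed a cleaned copy to LOAD DATA INFILE.

```python
FIELD_SEP = "-|-"
LINE_END = "|--"


def split_row(raw, columns=5):
    """Split one record into at most `columns` fields.

    Only the first columns - 1 separators are honoured, so any further
    '-' or '|' characters stay inside the last field.
    Empty fields are returned as None.
    """
    row = raw.rstrip("\r\n")
    # Drop the line terminator before splitting the fields.
    if row.endswith(LINE_END):
        row = row[: -len(LINE_END)]
    fields = row.split(FIELD_SEP, columns - 1)
    return [field if field != "" else None for field in fields]


print(split_row("goodstring-|--|-goodstring-|-goodstring-|---|--"))
```

For the problem row from the question this yields "goodstring", None, "goodstring", "goodstring", "--", which is the row the question describes as correct.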
I have a problem loading a CSV file into a MySQL database.
The CSV file looks like this:
stuID,stuName,degreeProg
6902101,A001,null
6902102,A002,null
6902103,A003,null
6902104,A004,null
6902105,A005,null
I have written a script like this:
LOAD DATA LOCAL INFILE 'demo.csv' INTO TABLE `table`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(`col1`, `col2`, `col3`)
What troubles me is that:
the third column in the file is null, but when loaded into the table it becomes 'null' (the string)
at the end of the file there is an extra empty line, which is also loaded and assigned null
How should I write the script to deal with these two problems? (Modifying the CSV file is not allowed, and it would be good to reduce the warnings MySQL produces when running this script.)
1) One option is to have LOAD DATA assign the value of the third field (i.e. the string 'null') to a user-defined variable, and use the "SET col = expr" form to assign a value to the column `col3`.
As an example:
(`col1`, `col2`, @field3)
SET col3 = IF(@field3='null',NULL,@field3)
2) There's no way to have MySQL LOAD DATA "skip" the last record in the file. To have MySQL ignore the last line, that would be better handled outside MySQL. For example, have MySQL LOAD DATA read from a named pipe, and have a separate concurrent process read the CSV file and write to that named pipe.
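The separate process can be very small. Here is a minimal Python sketch of such a filter, assuming the only unwanted lines are empty or whitespace-only ones; the name copy_data_lines is hypothetical. It works on any pair of text streams, so the target could be a regular file or a file object opened on a named pipe (on POSIX systems one can be created with os.mkfifo).

```python
import io


def copy_data_lines(source, target):
    """Copy lines from source to target, skipping empty or blank lines.

    Line endings of the kept lines are written back unchanged.
    Returns the number of lines written.
    """
    written = 0
    for line in source:
        if line.strip():
            target.write(line)
            written += 1
    return written


# Demonstration with in-memory streams instead of real files.
source = io.StringIO("stuID,stuName,degreeProg\r\n6902101,A001,null\r\n\r\n")
target = io.StringIO()
print(copy_data_lines(source, target))
print(repr(target.getvalue()))
```

With real files, opening both with newline="" keeps the \r\n line endings intact, so LINES TERMINATED BY '\r\n' still matches.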
If you could modify the CSV file, simply add FIELDS ENCLOSED BY '"' and change null to NULL (upper case) to get them to load as NULL. Alternatively, use \N to load in NULL.
Also, obviously, delete the empty line at the end (which is most likely causing the warnings):
stuID,stuName,degreeProg
6902101,A001,\N
6902102,A002,\N
6902103,A003,\N
6902104,A004,\N
6902105,A005,\N
I have a mysql database with a single table, that includes an autoincrement ID, a string and two numbers. I want to populate this database with many strings, coming from a text file, with all numbers initially reset to 0.
Is there a way to do it quickly? I thought of creating a script that generates many INSERT statements, but that seems somewhat primitive and slow. Especially since mysql is on remote site.
Yes, use LOAD DATA INFILE (docs are here). Example:
LOAD DATA INFILE 'csvfile'
INTO TABLE table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 0 LINES
(cola, colb, colc)
SET cold = 0,
cole = 0;
Notice the set line - here is where you set a default value.
Depending on your field separator change the line FIELDS TERMINATED BY ','
The other answers only respond to half of your question. For the other half (zeroing numeric columns):
Either:
Set the default value of your number columns to 0,
In your text file, simply delete the numeric values,
This will cause the field to be read by LOAD DATA INFILE as null, and the default value will be assigned, which you have set to 0.
Or:
Once you have your data in the table, issue a MySQL command to zero the fields, like
UPDATE table SET first_numeric_column_name = 0, second_numeric_column_name = 0 WHERE 1;
And to sum everything up, use LOAD DATA INFILE.
If you have access to the server's file system, you can use LOAD DATA.
If you don't want to fight with the syntax, the easiest way (if on Windows) is to use HeidiSQL.
It has a friendly wizard for this purpose.
Maybe I can help you with the right syntax if you post a sample line from the text file.
I recommend you to use SB Data Generator by Softbuilder (which I work for), download and install the free trial.
First, create a connection to your MySQL database then go to “Tools -> Virtual Data” and import your test data (the file must be in CSV format).
After importing the test data, you will be able to preview them and query them in the same window without inserting them into the database.
Now, if you want to insert test data into your database, go to “Tools -> Data Generation” and select "generate data from virtual data".
I have created a database and a table. I have also created all the fields I will be needing. I have created 46 fields including one that is my ID for the row. The CSV doesn't contain the ID field, nor does it contain the headers for the columns. I am new to all of this but have been trying to figure this out. I'm not on here being lazy asking for the answer, but looking for directions.
I'm trying to figure out how to import the CSV but have it start importing data starting at the 2nd field, since I'm hoping the auto_increment will fill in the ID field, which is the first field I created.
I tried these instructions with no luck. Can anyone offer some insight?
The column names of your CSV file must match those of your table
Browse to your required .csv file
Select CSV using LOAD DATA options
Check box 'ON' for Replace table data with file
In Fields terminated by box, type ,
In Fields enclosed by box, "
In Fields escaped by box, \
In Lines terminated by box, auto
In Column names box, type column name separated by , like column1,column2,column3
Check box ON for Use LOCAL keyword.
Edit:
The CSV file is 32.4kb
The first row of my CSV is:
Test Advertiser,23906032166,119938,287898,,585639051,287898 - Engager - 300x250,88793551,Running,295046551,301624551,2/1/2010,8/2/2010,Active,,Guaranteed,Publisher test,Maintainer test,example-site.com,,All,All,,Interest: Dental; custom geo zones: City,300x250,-,CPM,$37.49 ,"4,415","3,246",3,0,$165.52 ,$121.69 ,"2,895",805,0,0,$30.18 ,$37.49 ,0,$0.00 ,IMPRESSIONBASED,NA,USD
You can have MySQL set values for certain columns during import. If your id field is set to auto increment, you can set it to null during import and MySQL will then assign incrementing values to it. Try putting something like this in the SQL tab in phpMyAdmin:
LOAD DATA INFILE 'path/to/file.csv' INTO TABLE your_table FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' SET id=null;
Please look at this page and see if it has what you are looking for. Should be all you need since you are dealing with just one table. MYSQL LOAD DATA INFILE
So for example you might do something like this:
LOAD DATA INFILE 'filepath' INTO TABLE tablename FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (column2, column3, column4);
That should give you an idea. There are of course more options that can be added as seen in the above link.
Be sure to use LOAD DATA LOCAL INFILE if the import file is local. :)
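One thing to watch with the sample row from the question: some values are quoted and contain commas ("4,415", "3,246"). Splitting on ',' alone breaks those values apart, which is why the FIELDS clause probably also needs OPTIONALLY ENCLOSED BY '"'. The short Python sketch below only illustrates the difference on an excerpt of that row; the variable names are arbitrary.

```python
import csv

# Excerpt from the first row of the CSV shown in the question.
sample = '$37.49 ,"4,415","3,246",3'

# Splitting on commas alone cuts the quoted numbers in two.
naive = sample.split(",")

# A quote-aware parser keeps each quoted value together.
parsed = next(csv.reader([sample]))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split produces six pieces from four values, so every column after the first quoted value would be shifted.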