Importing data to MySQL

I have an Excel file which contains five columns:
Chapter ID int fk
Subject ID int fk
Title varchar(100)
Description mediumtext
Recap mediumtext
The Description and Recap columns contain HTML syntax. I am currently exporting this data to CSV and then trying to import it into MySQL. However, I receive an error saying the "file could not be read".
I am guessing this is because of the commas, quotes, etc. present in the HTML syntax.
Does anyone know how else the data can be imported into MySQL? I do not want to alter the HTML in the columns.

Have you tried "LOAD DATA INFILE"? Look at the docs here: http://dev.mysql.com/doc/refman/5.0/en/load-data.html
Example:
LOAD DATA INFILE 'c:/nodes.csv' INTO TABLE friends_of_friends
FIELDS TERMINATED BY ';' ENCLOSED BY '' ESCAPED BY '\\';
The ESCAPED BY '\\' clause defines \ as the escape character. Two backslashes are needed because the backslash is also the standard MySQL escape character.
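If you generate the CSV yourself before loading, you can make it match those options exactly. A Python sketch (the file name, column values, and HTML bodies are made up for illustration; `QUOTE_NONE` plus `escapechar` mirrors `FIELDS TERMINATED BY ';' ESCAPED BY '\\'` with no enclosure):

```python
import csv

# Rows matching the question's columns; the last HTML body contains
# the ';' delimiter on purpose, to show the escaping at work.
rows = [
    (1, 2, "Intro", "<p>Hello &amp; welcome</p>", "<p>Recap; more text</p>"),
]

with open("nodes.csv", "w", newline="") as f:
    # No enclosure, ';' delimiter, '\' escape -- matching the
    # LOAD DATA options in the answer above.
    writer = csv.writer(f, delimiter=";", quoting=csv.QUOTE_NONE,
                        escapechar="\\")
    writer.writerows(rows)

with open("nodes.csv", newline="") as f:
    content = f.read()
print(content)
```

Any `;` inside a field comes out as `\;`, which LOAD DATA with `ESCAPED BY '\\'` turns back into a literal semicolon.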

Related

How to load HTML character codes data correctly into My Sql database?

I receive a data file from the client as part of an ETL process, and we load it into a MySQL database using the LOAD DATA functionality with CHARACTER SET utf8.
LOAD DATA LOCAL INFILE '${filePath}'
INTO TABLE test_staging
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(${testcolumns}) SET
first_name = #first_name;
Data from client
1|&amp;quot;test&amp;quot;|&amp;quot;name&amp;quot;|2
2|&amp;quot;asdf&amp;quot;|asdf&amp;amp;test|2
3|fun|value|2
When I load the above data, the entity codes are inserted into the database as literal strings instead of being converted to the characters they encode.
Database Data
id first_name last_name
1 &amp;quot;test&amp;quot; &amp;quot;name&amp;quot;
2 &amp;quot;asdf&amp;quot; asdf&amp;amp;test
3 fun value
I tried changing the CHARACTER SET value from utf8 to latin1, but the result is the same.
I also tried replacing the special characters while loading the data, but the problem is that the file can contain any kind of HTML entity; I cannot keep adding a REPLACE for every one of them.
LOAD DATA LOCAL INFILE '${filePath}'
INTO TABLE test_staging
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(${testcolumns}) SET
first_name = REPLACE(REPLACE(REPLACE(first_name,'&amp;#39;','\''),'&amp;quot;','"'),'&amp;amp;','&amp;');
Is there any character set (or other LOAD DATA option) that decodes the HTML entities and loads the data correctly?
Expected Database Data
id first_name last_name
1 "test" "name"
2 "asdf" asdf&test
3 fun value
Any help is appreciated... Thanks
The problem you are facing is not about character sets. It happens because the software your client uses intentionally converts HTML special characters to their entity codes.
It is probably possible to convert them back using MySQL, though I couldn't find a quick solution. Since you are handling this data with ETL anyway, the better option seems to be to run an external tool over the file before you insert the data into the database. One of these, for example:
cat input-with-specialchars.html | recode html..ascii
xmlstarlet unesc
perl -MHTML::Entities -pe 'decode_entities($_);'
etc.
or something else, depending on which tools are available on your system or which ones you can afford to install.
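If a scripting language is available in the ETL pipeline, the same decoding can be done in a short preprocessing pass before LOAD DATA runs. A Python sketch using `html.unescape`, which handles all named and numeric entities (the file names and sample rows are placeholders for the real feed):

```python
import html

# Sample lines in the client's delivered format (entities intact);
# the file names stand in for the real ETL paths.
sample = ('1|&quot;test&quot;|&quot;name&quot;|2\n'
          '2|&quot;asdf&quot;|asdf&amp;test|2\n'
          '3|fun|value|2\n')
with open("client-feed.txt", "w", encoding="utf-8") as f:
    f.write(sample)

# Decode every HTML entity (named and numeric) before the file is
# handed to LOAD DATA INFILE.
with open("client-feed.txt", encoding="utf-8") as src, \
     open("client-feed.decoded.txt", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(html.unescape(line))

with open("client-feed.decoded.txt", encoding="utf-8") as f:
    decoded = f.read()
print(decoded)
```

The decoded file then loads with the existing LOAD DATA statement, with no REPLACE chain needed.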

MySQL bulk load

I'm trying to load CSV files into a MySQL table.
Delimiter: , (comma)
In the source data, a few of the field values are enclosed in double quotes, and inside those double quotes we have the , delimiter. There are also a few records where \ is part of the field data and needs to be escaped.
By default \ is treated as the escape character, and when I specify " as the escape character, " gets escaped instead. As we have multiple special characters inside the same file, we need to escape several of them.
Any suggestions?
Eg:
id name location
1 A "Location , name here"
2 B "Different Location"
3 C Another Location
4 D Location \ with escape character
LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES;
I think it's not possible. From the LOAD DATA reference:
Any of the field- or line-handling options can specify an empty string (''). If not empty, the FIELDS [OPTIONALLY] ENCLOSED BY and FIELDS ESCAPED BY values must be a single character.
Only a single character is supported for the ESCAPED BY option.
My suggestion is to use a programming language (e.g. PHP, C#, etc.) to open and process the file line by line with a regexp.
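As a sketch of that approach (Python here in place of PHP/C#): the csv module already understands quoted fields with embedded commas, so one pass can parse the file and rewrite it with backslashes doubled, which MySQL's default `ESCAPED BY '\\'` then reads back as literal backslashes. The file names and sample data below are made up for illustration:

```python
import csv

# Made-up sample matching the question: comma-delimited, some fields
# quoted with embedded commas, one field containing a backslash.
raw = '''id,name,location
1,A,"Location , name here"
2,B,"Different Location"
3,C,Another Location
4,D,Location \\ with escape character
'''

with open("data.csv", "w", newline="") as f:
    f.write(raw)

# Parse with the csv module (which handles quoted commas), then
# rewrite every field fully quoted with backslashes doubled, so
# MySQL's default escape handling restores them as single backslashes.
with open("data.csv", newline="") as src, \
     open("data.clean.csv", "w", newline="") as dst:
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in csv.reader(src):
        writer.writerow([field.replace("\\", "\\\\") for field in row])

with open("data.clean.csv", newline="") as f:
    content = f.read()
print(content)
```

The cleaned file can then be loaded with the LOAD DATA statement above, since every field is enclosed in quotes and the backslashes survive the default escape processing.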

Load data from text file to DB

Data:
1|\N|"First\Line"
2|\N|"Second\Line"
3|100|\N
\N represents NULL in MySQL and MariaDB.
I'm trying to load the above data into a table named ID_OPR using the LOAD DATA LOCAL INFILE method.
Table structure:
CREATE TABLE ID_OPR (
idnt decimal(4),
age decimal(3),
comment varchar(100)
);
My code looks like below:
LOAD DATA LOCAL INFILE <DATA FILE LOCATION> INTO TABLE <TABLE_NAME> FIELDS TERMINATED BY '|' ESCAPED BY '' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
The problem with this code is that it aborts with the error Incorrect decimal value: '\\N' for column <column name>.
Question:
How can I load this data so that the second decimal column gets NULL values, without losing the \ (backslash) in the third string column?
I'm trying this in MariaDB, which behaves like MySQL in most cases.
Update:
The error I mentioned actually appears as a warning, and the data does get loaded into the table. The catch is with the text data.
For example, the third record above is loaded with the literal string \N in the string column, but I want it to be NULL.
Is there any way to make the loader recognize this null value? Something like DECODE in Oracle?
You can't have it both ways: either \ is an escape character or it is not. From the MySQL docs:
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
So, I'd suggest a consistently formatted input file, however that was generated:
use \\ if you want to keep the backslash in the strings
make \ an escape character in your load command
OR
make strings always, not optionally, enclosed in quotes
leave escape character empty, as is
use NULL for nulls, not \N
BTW, this also explains the warnings you saw when loading \N into your decimal field.
Handle the nulls with blanks; that should fix it.
1||"First\Line"
2||"Second\Line"
3|100|
That's how nulls are handled in CSVs and TSVs. And don't expect the decimal column to become NULL from a blank; it stays 0. Use INT or BIGINT instead if you need NULL there. You can forget about ESCAPED BY: as long as the string data is enclosed in "", that deals with the escaping problem.
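The blanking step can be automated with a small preprocessing pass. A naive Python sketch (it splits on every |, so it assumes the delimiter never occurs inside a quoted field; the sample is inlined where the real file would be read):

```python
# The original file, with \N markers for NULL (sample inlined here;
# in practice this would be read from the real data file).
sample = '1|\\N|"First\\Line"\n2|\\N|"Second\\Line"\n3|100|\\N\n'

def blank_nulls(line):
    # Blank out fields that consist of exactly \N; backslashes inside
    # other fields (e.g. "First\Line") are left untouched.
    fields = line.rstrip("\n").split("|")
    return "|".join("" if f == "\\N" else f for f in fields) + "\n"

cleaned = "".join(blank_nulls(line) for line in sample.splitlines(keepends=True))
print(cleaned)
```

The output matches the blank-for-null format shown above and loads cleanly with an empty ESCAPED BY.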
We need three text files and one batch file for the load (note: this answer uses Oracle SQL*Loader):
Suppose your file location is 'D:\loaddata'
and your text file is 'D:\loaddata\abc.txt'.
1. D:\loaddata\abc.bad -- empty
2. D:\loaddata\abc.log -- empty
3. D:\loaddata\abc.ctl
a. Control file below for fixed positions (no separator):
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
(
a_column POSITION (1:7) char,
b_column POSITION (8:10) char,
c_column POSITION (11:12) char,
d_column POSITION (13:13) char,
f_column POSITION (14:20) char
)
b. Control file below for a comma separator:
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(a_column,
b_column,
c_column,
d_column,
e_column,
f_column
)
4. D:\loaddata\abc.bat -- write the line below:
sqlldr db_user/db_password@your_tns control=D:\loaddata\abc.ctl log=D:\loaddata\abc.log
After double-clicking the D:\loaddata\abc.bat file, your data will be loaded into the desired Oracle table. If anything goes wrong, check your D:\loaddata\abc.bad and D:\loaddata\abc.log files.

Exporting a table with blob and utf8 string fields from MySQL to MS SQL Server 2014

I have a table with binary(32), blob and varchar utf-8 fields.
To move data from one MySQL server to another, I export it via CSV:
select * INTO OUTFILE '$tmp_fname'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\\r\\n'
from mytable
and then
load data local infile '" . $mysqli->real_escape_string($glb) . "' ignore into table mytable_temp
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\\n'
I tried the same with BULK INSERT in MS SQL, and for simple types it works (I get another table with int and char(44) columns in it). But in this case I get import errors.
Some details: I need the export-import to be automated (that's why I use CSV), both servers can communicate only via HTTP (PHP scripts), and the tables have millions of rows.
So here are the questions:
How should blob field data be formatted in the CSV so that MS SQL can import it?
How can I export a utf8 string for MS SQL? I tried CONVERT(myfield USING utf16); is that what I need?
I also tried exporting the data in utf16 and specifying DATAFILETYPE = 'widechar' in BULK INSERT, but it throws an error on the first int value. Can it actually not read widechar?
It's strange that nobody among the professionals knew the answer; here is what worked.
Blob and binary fields should be exported as HEX(field_name) and then imported into MS SQL as-is.
By the way, the most flexible way is to use a format file, since with the exact CSV in front of you, you can see where quotes appear and where they do not:
format file description
To export utf8 and other non-ANSI strings from MySQL, use HEX(CONVERT(str_field_name USING utf16le)): you get all the bytes exactly as they are. Then bulk-import them into an intermediate MS SQL table, and merge or insert into the target table converting to nvarchar: CAST(source.str_field_name AS NVARCHAR(any-length-you-need)). I spent about an hour before realizing that MS SQL needs exactly little-endian.
Don't try SELECT ... INTO OUTFILE with utf16le encoding; just leave it at the default, since once all strings are cast to hex, the output is pure ANSI anyway.
BULK INSERT somehow refused to import a widechar (utf16le) CSV, as well as utf16be. So the hex-to-binary solution may not be the fastest, but it is universal.
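The hex round-trip can be sanity-checked outside either database. A Python sketch of the same idea: hexlify the UTF-16LE bytes (what HEX(CONVERT(... USING utf16le)) produces on the MySQL side), then decode them back the way NVARCHAR (UTF-16LE internally) would on the MS SQL side. The sample string is arbitrary:

```python
import binascii

# What HEX(CONVERT(str_field USING utf16le)) yields on the MySQL side:
original = "Привет, world"      # any non-ANSI string
hex_payload = binascii.hexlify(original.encode("utf-16-le")).decode("ascii").upper()

# What MS SQL recovers once the hex is turned back into binary and
# cast to NVARCHAR (NVARCHAR is UTF-16LE internally):
roundtrip = binascii.unhexlify(hex_payload).decode("utf-16-le")
print(roundtrip)
```

Because the payload is plain hex digits, the CSV in between stays pure ASCII regardless of the string content.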

Handling escaped field separators with MySQL's LOAD DATA INFILE

I am using LOAD DATA INFILE to import twenty |-delimited .dat files into a MySQL table. Some of the | field terminators, however, are escaped with a backslash. The second field in the rows below is an example:
1001604|EMERITUS CORP\WA\|SC 13G|1996-02-13|edgar/data/1001604/0000728757-96-000006.txt
1001604|EMERITUS CORP\WA\|SC 13G|1996-02-14|edgar/data/1001604/0000903949-96-000038.txt
I get an error because the last field ends up clashing with the DATE type declared for the next-to-last field. I can open the .dat files and escape the escapes, but is there a better way?
I could use a stream editor to double all the backslashes, but this seems like a bad idea. Can I safely change the FIELDS ESCAPED BY option to something other than "\", or is that a bad idea? Thanks!
Here is my LOAD DATA INFILE command:
LOAD DATA INFILE 'C:/users/richard/research/data/edgar/masterfiles/master_1996.dat'
INTO TABLE edgar.master
FIELDS TERMINATED BY '|'
IGNORE 1 LINES;
Adding ESCAPED BY '' to my FIELDS clause allows the query to complete without error. I will update this if I find that it caused a silent failure.
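One way to watch for that silent-fail risk: with ESCAPED BY '', every | is a hard separator and every backslash is plain data, so the main hazard is a row splitting into the wrong number of fields. A quick audit sketch in Python (the sample rows and the expected field count of 5 come from the data above):

```python
# With FIELDS ESCAPED BY '' every '|' is a plain separator, so a
# field-count check is enough to spot rows that would load wrong.
# Sample rows from the .dat file shown in the question:
sample = (
    "1001604|EMERITUS CORP\\WA\\|SC 13G|1996-02-13|edgar/data/1001604/0000728757-96-000006.txt\n"
    "1001604|EMERITUS CORP\\WA\\|SC 13G|1996-02-14|edgar/data/1001604/0000903949-96-000038.txt\n"
)

EXPECTED_FIELDS = 5
bad_lines = [num for num, line in enumerate(sample.splitlines(), start=1)
             if line.count("|") != EXPECTED_FIELDS - 1]
print(bad_lines)   # an empty list means every row splits into 5 fields
```

Running this over each of the twenty files before the load would flag any line where a stray | (escaped or not) changes the field count.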