MySQL treating eszett character as "ß" during LOAD DATA operation

I am trying to import a small data set of Berlin street addresses using MySQL's LOAD DATA statement. The problem is that after the import runs, all of the beautiful ß characters in the German street names have turned into the two-character sequence ß.
Here's the create-table statement I used for this table:
CREATE TABLE `subway_distances` (
`STN` varchar(255) DEFAULT NULL,
`HNR` int(9) DEFAULT NULL,
`Lat` decimal(36,15) DEFAULT NULL,
`Lon` decimal(36,15) DEFAULT NULL,
`Distance` decimal(45,20) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
... and here is my MySQL shell code:
charset utf8;
TRUNCATE TABLE subway_distances;
LOAD DATA LOCAL INFILE '/path/to/output.csv'
INTO TABLE berlin.subway_distances
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';
SELECT * FROM subway_distances LIMIT 0,10;
I have looked at output.csv in vim, and the eszett character appears to be fine there.
I am assuming that I simply need a different encoding declaration in MySQL, but I'm not sure where to start.
I am also assuming that collation doesn't matter yet, since I'm not comparing values -- just purely trying to get a valid import.

I found an answer to this relatively quickly. It looks like I just need to specify the CHARACTER SET value in my LOAD DATA statement. So the new statement looks like this:
LOAD DATA LOCAL INFILE '/path/to/output.csv'
INTO TABLE berlin.subway_distances
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';

Related

LOAD DATA INFILE the entire file into a field

I am storing the contents of text files in a table
CREATE TABLE Pages
(
ID int(11) unsigned NOT NULL,
Text mediumtext COMPRESSED,
PRIMARY KEY(ID)
) ENGINE=ARIA DEFAULT CHARSET=utf8 COLLATE utf8_general_ci ROW_FORMAT=DYNAMIC
I try to INSERT each file's contents directly via LOAD DATA INFILE
LOAD DATA INFILE 'file.txt' INTO TABLE table
FIELDS TERMINATED BY '\0' LINES TERMINATED BY '' (Text)
SET ID=$id
The problem is that if I use TERMINATED BY '', which would be ideal, it gives the error
You can't use fixed rowlength with BLOBs; please use 'fields terminated by'
I used '\0' assuming the null character does not exist in the text file. Although it works, is there a more standard way to do so?

Importing geometry data into MySQL using LOAD DATA LOCAL INFILE

I am trying to import CSV data into MySQL using the LOAD DATA LOCAL INFILE syntax. This is normally a fairly simple task, but in this case the data includes a geometry field that is tripping me up.
When I try to run the import, I'm getting errors like this:
SQLSTATE[HY000]: General error: 4079 Illegal parameter data type longblob for operation 'st_geometryfromwkb'
The records in my CSV file look like this:
'Somewhere', -0.574823, 51.150771, '0x0101000000000000000000F03F000000000000F0BF'
So I have a location name, lat/long coords and a geometry field in binary WKB format. (The example above is a simple geometry that translates to POINT(1 -1); the real data has complex polygons, but the content isn't relevant; the issue is the same with this simple example.)
My table looks like this:
CREATE TABLE IF NOT EXISTS `mapping` (
`id` int AUTO_INCREMENT PRIMARY KEY,
`location` varchar(80) DEFAULT NULL,
`longitude` double DEFAULT NULL,
`latitude` double DEFAULT NULL,
`geom` geometry NOT NULL,
INDEX mapping_by_location (location),
SPATIAL KEY `mapping_by_geom` (`geom`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
And my import query looks like this:
LOAD DATA LOCAL INFILE '{$file}'
REPLACE INTO TABLE `mapping`
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
(@col1, @col2, @col3, @col4)
SET
`location` = @col1,
`latitude` = @col2,
`longitude` = @col3,
`geom` = GeomFromWKB(@col4);
As stated, with this import query, I am getting the Illegal parameter data type error shown at the top of this question.
However, the query works if I replace the final line with a hard-coded geometry string, like this:
`geom` = GeomFromWKB(0x0101000000000000000000F03F000000000000F0BF);
Obviously this isn't any good, as I need the field to load from the CSV rather than from a hard-coded value in the query, but it does work, whereas loading the same value from the CSV in @col4 does not.
I have tried a bunch of variations on this query - with and without the call to GeomFromWKB(), with both X'...' and 0x... notations for the hex value; nothing seems to work.
Can anyone give me some help please?
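One thing worth checking outside of MySQL is what the fourth CSV field actually contains. The sketch below, in Python, treats the field as the text it is in the file (leading space, single quotes, and a literal "0x" prefix) and decodes it by hand as a WKB point. The function name parse_wkb_point is made up for this illustration, and it only handles the Point type; it is not a suggested replacement for the import query.

```python
import struct


def parse_wkb_point(field: str) -> tuple:
    """Decode a hex-encoded WKB Point as it appears in the CSV text."""
    # The CSV field arrives as text, including whitespace, quotes and "0x".
    text = field.strip().strip("'\"")
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)

    # Byte 0 is the byte order flag: 1 = little endian, 0 = big endian.
    order = "<" if raw[0] == 1 else ">"

    # Bytes 1-4 hold the geometry type; 1 means Point.
    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles WKB Points")

    # Bytes 5-20 hold the X and Y coordinates as two doubles.
    x, y = struct.unpack_from(order + "dd", raw, 5)
    return x, y


sample = " '0x0101000000000000000000F03F000000000000F0BF'"
print(parse_wkb_point(sample))
```

The point of the exercise is that the value read from the file is a character string that merely looks like a hex literal, whereas 0x0101... typed directly into a query is a binary literal, which may be why the two behave differently.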

MySql Import strange behaviour

I am trying to import a csv file that is delimited by tabs.
Here is my query
LOAD DATA LOCAL INFILE 'c:/news.csv'
INTO TABLE news
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r'
(url, storyid, title, date, details, category, author);
What happens is that only the first column (url) is loaded.
The rest show NULL. I have tried LINES TERMINATED BY '\n' as well, with the same result.
Any advice?
Table structure for table `news`
--
CREATE TABLE IF NOT EXISTS `news` (
`url` varchar(62) DEFAULT NULL,
`storyid` int(15) DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`date` date DEFAULT NULL,
`details` longtext,
`category` varchar(255) DEFAULT NULL,
`author` varchar(110) DEFAULT NULL
)
It depends on the exact format of your .csv file but for Windows .csv format I always use
LINES TERMINATED BY '\r\n'
also (again depending on the data) try
FIELDS ESCAPED BY '\\' TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '\"'
If you're unsure of the exact nature of the data sometimes it is better to view it in hexadecimal to see how the lines are really terminated. I use Hexedit - http://www.hexedit.com/
Hope this helps.
Dermot
Like I said in the comments, you can use '\r\n' for a new line.
However your csv file contains only 1 column, namely a full line of text.
That is probably also why only the first table column is filled and the rest is null.
LOAD DATA LOCAL INFILE 'c:/news.csv'
INTO TABLE news
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
(url, storyid, title, date, details, category, author)
This worked.
It turned out that even though the file looks tab separated, it is comma separated. Dermot was right that you need to view it in hexadecimal to see how it is really delimited.
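If a hex editor isn't handy, the same check can be sketched in a few lines of Python. The function name describe_delimiters and the sample bytes are invented for this illustration; the idea is simply to look at the raw bytes rather than at how an editor renders them.

```python
def describe_delimiters(sample: bytes) -> dict:
    """Report the line terminator and candidate field separators in raw bytes."""
    if b"\r\n" in sample:
        line_terminator = "\\r\\n"
    elif b"\n" in sample:
        line_terminator = "\\n"
    elif b"\r" in sample:
        line_terminator = "\\r"
    else:
        line_terminator = None
    return {
        "line_terminator": line_terminator,
        "tabs": sample.count(b"\t"),
        "commas": sample.count(b","),
        # Space-separated hex of the first 16 bytes (needs Python 3.8+).
        "hex_preview": sample[:16].hex(" "),
    }


# Hypothetical two-line sample standing in for the start of the real file.
sample = b"http://example.com/a,101,Title\r\nhttp://example.com/b,102,Other\r\n"
report = describe_delimiters(sample)
print(report)
```

For a real file, the sample could be read with open(path, "rb").read(4096), so that no newline translation or decoding happens before the bytes are counted.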

MySQL how to specify string position with LOAD DATA INFILE

I have ASCII files with a static number of characters for each line with no delimiters. I'd like to use LOAD DATA INFILE to import into my table.
Example of file:
USALALABAMA
USARARKANSAS
USFLFLORIDA
The structure for this table:
country Char(2)
state Char(2)
name Varchar(70)
CREATE TABLE `states` (
`country` char(2) COLLATE latin1_general_ci NOT NULL,
`state` char(2) COLLATE latin1_general_ci NOT NULL,
`name` varchar(70) COLLATE latin1_general_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
Is it possible to specify a start and end position for each column?
According to the documentation, you can load a fixed format file without using a temporary table.
If the FIELDS TERMINATED BY and FIELDS ENCLOSED BY values are both empty (''), a fixed-row (nondelimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. For TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.
The positions are derived from the column definitions, which in your case match the structure of the file. So you just need to do:
LOAD DATA INFILE 'your_file' INTO TABLE your_table
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\r\n'
SET name = trim(name);
First create a temporary table and load all of the lines into it. Then you can copy the data from the temporary table into the main table, splitting each line into fields with SUBSTRING.
Something like this:
CREATE TEMPORARY TABLE tmp_lines
(countrystring TEXT);
LOAD DATA INFILE 'yourfilegoeshere' INTO TABLE tmp_lines
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\r\n';
INSERT INTO main_table SELECT SUBSTRING(countrystring,1,2), SUBSTRING(countrystring,3, 2), SUBSTRING(countrystring,5) from tmp_lines;
Another way to do this is just assigning a variable and splitting it direct in your load.
LOAD DATA INFILE 'yourfilegoeshere' INTO TABLE main_table
LINES TERMINATED BY '\r\n' (@_var)
SET
field1=TRIM(SUBSTR(@_var FROM 1 FOR 2)),
field2=TRIM(SUBSTR(@_var FROM 3 FOR 2)),
field3=TRIM(SUBSTR(@_var FROM 5 FOR 70));
Just be sure not to specify any field separator; otherwise you will have to use more variables. Note that I'm using TRIM to clean the data in the same statement.
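For checking the positions before running the import, the same split can be sketched in Python. The widths (2, 2, up to 70) come from the table definition in the question; the names StateRow and split_state_line are made up for this example.

```python
from typing import NamedTuple


class StateRow(NamedTuple):
    country: str
    state: str
    name: str


def split_state_line(line: str) -> StateRow:
    """Split one fixed-width line into country (2), state (2), name (up to 70)."""
    line = line.rstrip("\r\n")
    return StateRow(
        country=line[0:2],        # same as SUBSTR(... FROM 1 FOR 2)
        state=line[2:4],          # same as SUBSTR(... FROM 3 FOR 2)
        name=line[4:74].strip(),  # same as TRIM(SUBSTR(... FROM 5 FOR 70))
    )


for raw in ["USALALABAMA", "USARARKANSAS", "USFLFLORIDA"]:
    print(split_state_line(raw))
```

Note that SQL's SUBSTR positions are 1-based while Python slices are 0-based, which is why position 5 in the statement above corresponds to index 4 here.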

SQL Load Data Special Characters

I'm trying to import data into SQL from a CSV in phpMyAdmin, so it may be a phpMyAdmin problem. The problem I'm having is that some of the columns contain special characters. For instance:
"Adán, Antonio"
Ends up as just "Ad".
The column structure is as follows:
CREATE TABLE IF NOT EXISTS `players` (
`player_name` varchar(255) COLLATE utf8 NOT NULL,
`player_nation` varchar(255) CHARACTER SET utf8 NOT NULL,
`player_club` varchar(255) CHARACTER SET utf8 NOT NULL,
`player_position` varchar(255) CHARACTER SET utf8 NOT NULL,
`player_age` tinyint(2) NOT NULL,
`player_dob` varchar(32) CHARACTER SET utf8 NOT NULL,
`player_based` varchar(255) CHARACTER SET utf8 NOT NULL,
`player_id` int(10) unsigned NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8;
I'm guessing it's something to do with the character set, but mysql.com just suggests altering the table to character set utf8, which it already is.
Any ideas how else I can prevent this?
UPDATE
Inserting into the database is fine, so I'm guessing it's not to do with my table structure. It seems to be specifically to do with importing from a CSV.
This is the LOAD DATA query, as generated by phpMyAdmin:
LOAD DATA LOCAL INFILE 'C:\\Windows\\Temp\\php21E4.tmp' INTO TABLE `players` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\r\n'
SOLUTION
I discovered the problem. My CSV was encoded with a Western Europe charset. After faffing around for a bit to convert it to UTF8 it imported just fine. Not an SQL problem at all.
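For anyone hitting the same thing, here is a rough Python sketch of both the symptom and the conversion. It assumes the "Western Europe" charset was Windows-1252, which is a guess since the exact encoding isn't stated above, and the helper name to_utf8 is made up. The cut-off at "Ad" is consistent with the á byte being the first one that isn't valid UTF-8.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from a legacy single-byte charset to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# The name as it would sit in a cp1252-encoded CSV (assumed encoding).
legacy = "Adán, Antonio".encode("cp1252")

# Find the first byte that is not valid when the data is read as UTF-8.
try:
    legacy.decode("utf-8")
    first_bad = None
except UnicodeDecodeError as exc:
    first_bad = exc.start

# Everything before that byte is all that survives a UTF-8 read.
survives = legacy[:first_bad]
converted = to_utf8(legacy)

print(survives)
print(converted.decode("utf-8"))
```

Converting the whole file this way before importing should have the same effect as re-saving it as UTF-8 from an editor.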
Can you track where the truncation is happening? It's quite possible that the issue isn't with your DB at all.
Try a simple insert into your DB table from the command line with the special characters and see if it succeeds.
Then try logging the various steps in the import to track where the issue occurs.