Copy special characters into AWS Redshift

I'm unable to load rows containing special characters into AWS Redshift.
I'm getting this error: String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence: c8 4d (error 4)
The string causing the problem is: Crème (the è).
For a temporary fix, I am using:
copy dev.table (a, b, c, d)
from 's3://test-bucket/redshift_data_source/test_data.csv'
CREDENTIALS 'aws_access_key_id=xxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxx'
CSV delimiter ',' IGNOREHEADER 1 COMPUPDATE OFF acceptinvchars;
acceptinvchars lets the rows load into VARCHAR columns, but it replaces the invalid characters with ?. How can I load them as-is?

The best solution seems to be to convert your source data to UTF-8. It is currently saved using some other encoding.
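For example, the re-encoding could be done with iconv before uploading to S3 (a sketch; it assumes the source file is actually Windows-1252/Latin-1 and that a Unix-like shell is available, and the output file name is made up):
# Re-encode the CSV from Windows-1252 to UTF-8 before uploading to S3
iconv -f WINDOWS-1252 -t UTF-8 test_data.csv > test_data_utf8.csv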

Related

What is the correct character set to use for '\xE7'

I am trying to load data from a file into a MySQL table using
load data infile 'outfile2.txt'
into table parsed_ingredients
fields terminated by ","
lines terminated by "\n";
All appears fine until I reach a line where this error is thrown:
ERROR 1366 (HY000): Incorrect string value: '\xE7ao' for column 'ingredient' at row 5875
I looked at the line, and the offending character is ç (c-cedilla).
I have tried all sorts of combinations of encodings:
utf8
latin1
utf8mb4
They all lead to the same error. What is the correct encoding to use, and what needs to be set to it?
Any of cp1250, cp1256, dec8, latin1, latin2, or latin5 maps the single byte E7 to ç. Use latin1.
Apparently your client is using E7, so you need to tell MySQL that fact. Do that in the LOAD DATA statement by adding this between lines 2 and 3:
CHARACTER SET latin1
Meanwhile, the column in the table can be latin1 or utf8 or utf8mb4; it does not matter. (Or, at least, it does not matter to this Question.) The E7 will be translated to the suitable encoding for the table column. For utf8/utf8mb4, expect to find hex C3A7 in the table.
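Putting it together, the statement from the question becomes (a sketch; only the CHARACTER SET clause is added):
load data infile 'outfile2.txt'
into table parsed_ingredients
character set latin1
fields terminated by ","
lines terminated by "\n";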

Special character data migration

I have to migrate a database from Oracle to MySQL containing billions of rows. My strategy is to create the schema, export the data from Oracle as CSV, and load that data into MySQL. I have created a CSV file with fields enclosed by double quotes (") and terminated by commas (,). Now the problem is that the CSV file contains special characters which will not import into MySQL.
I am using the command:
LOAD DATA LOCAL INFILE 'C:/Users/NNCP4659/export.csv' INTO TABLE dbk_address_master
CHARACTER SET utf8
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(id, country_code, address, city_id, latitude, longitude,
@is_active, google_address, old_address, building_number, street_name, created_by)
SET is_active = CAST(@is_active AS SIGNED);
My data is like:
4113973,"CHE","167 Bernerstrasse Süd","57066","47.3943271","8.4865849",1,"Bernerstrasse Süd 167, 8048 Zürich,
Switzerland","167 Bernerstrasse Süd","Y","167","Bernerstrasse Süd","migration"
And the error is:
ERROR 1300 (HY000): Invalid utf8 character string: '"167
Bernerstrasse S'
167 Bernerstrasse S looks like the truncation of 167 Bernerstrasse Süd at the first non-utf8 character.
You have specified that the incoming data is utf8 via
LOAD DATA ... CHARACTER SET utf8 ...
I conclude that the incoming file is not encoded correctly. It is probably latin1, in which case the hex would be FC. Assuming this is the case, you should switch to
LOAD DATA ... CHARACTER SET latin1 ...
It does not matter if the CHARACTER SET in the target column is not latin1; MySQL will transcode it in flight.
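Concretely, the statement from the question becomes (a sketch; only the CHARACTER SET clause changes):
LOAD DATA LOCAL INFILE 'C:/Users/NNCP4659/export.csv' INTO TABLE dbk_address_master
CHARACTER SET latin1
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(id, country_code, address, city_id, latitude, longitude,
@is_active, google_address, old_address, building_number, street_name, created_by)
SET is_active = CAST(@is_active AS SIGNED);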
(Alternatively, you could change the incoming data to have utf8 (hex: C3BC), but that may be more hassle.)
Reference: "truncated" in Trouble with UTF-8 characters; what I see is not what I stored
(As for how to check the hex, or do SHOW CREATE TABLE, we need to know what OS you are using and what tools you have available.)
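If a MySQL client is available, one way to check (a sketch, assuming some rows have already been loaded):
-- Show the table definition, including its character set
SHOW CREATE TABLE dbk_address_master;
-- Inspect the raw bytes that were actually stored
SELECT address, HEX(address) FROM dbk_address_master WHERE address LIKE '%Bernerstrasse%';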

Unicode Error while loading data from csv to Greenplum

I have a csv file and need to load it to Greenplum DB.
My code looks like this:
CREATE TABLE usr_wrk.CAR(
brand varchar(255),
model varchar(255),
ID INTEGER
);
COPY usr_wrk.CAR FROM '...Car.csv' DELIMITER ',' CSV HEADER
But I get this error:
[22025] ERROR: invalid Unicode escape: Unicode escapes must be full-length: \uXXXX or \UXXXXXXXX.
Rows of the csv file look, for example, like:
Jaguar,XJ,1
Or
Citroen,C4,91
I replaced all non-Latin words, and there are no NULL or empty values, but the error still appears. Does anybody have thoughts on this?
P.S.
I don't have admin rights and can only create, drop, and manage tables in this schema.
You might try one of the following:
copy usr_wrk.car from '.../Car.csv' DELIMITER ',' ESCAPE as 'OFF' NULL as '' CSV HEADER;
OR
copy usr_wrk.car from '.../Car.csv' DELIMITER ',' ESCAPE as '\' NULL as '' CSV HEADER;
The default escape is a double quote for CSV format. Turning it off or setting it to the default TEXT-format escape (a backslash) may get you around this. You could also remove the CSV header from the file and declare it as a TEXT file with a comma delimiter, which avoids having to specify the ESCAPE character; see the sketch below.
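That TEXT-format variant might look like this (a sketch, assuming the header row has first been removed from Car.csv; the elided path is as in the question):
copy usr_wrk.car from '.../Car.csv' with delimiter ',';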
Are you sure there are no special characters around the car names? I'm thinking specifically of umlauts or grave accents, which would make the data multibyte and trigger that error.
You might try running head Car.csv | od -c | more and seeing if any multibyte characters show up in your file (this assumes you are on a Linux system).
If it is possible for you to do, you might try using the GPLOAD utility to load the file. You can specify the ENCODING of the data file as 'LATIN1' which may get you past the UTF error you are hitting.
Hope this helps.
Jim

MySQL Invalid UTF8 character string when importing csv table

I want to import a .csv file into a MySQL database with:
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;
The .csv file looks like:
But I am getting the following error and I cannot explain why:
Error Code: 1300. Invalid utf8 character string: 'M'
Any suggestions?
Nothing else I tried worked for me, including ensuring that my .csv was saved with UTF-8 encoding.
This worked:
When using LOAD DATA LOCAL INFILE, set CHARACTER SET latin1 instead of CHARACTER SET utf8mb4 as shown in https://dzone.com/articles/mysql-57-utf8mb4-and-the-load-data-infile
Here is a full example that worked for me:
TRUNCATE homestead_daily.answers;
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE homestead_daily.questions;
SET FOREIGN_KEY_CHECKS = 1;
LOAD DATA LOCAL INFILE 'C:/Users/me/Desktop/questions.csv' INTO TABLE homestead_daily.questions
CHARACTER SET latin1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(type, question, created_at, updated_at);
SELECT * FROM homestead_daily.questions;
See what the settings for the export were. Look for "UTF-8".
The truncated text ('M' from 'München') suggests the data is not encoded as utf8mb4. Outside MySQL, look for "UTF-8". (Inside MySQL, utf8 and utf8mb4 work equally well for all European character sets, so the ü itself should not be a problem.)
If the file was exported as "cp1252" (or any of a number of encodings), the byte for ü would not be valid utf8mb4, leading to truncation.
If this analysis is correct, there are two solutions:
Plan A: Export as UTF-8.
Plan B: Import as latin1. (You do not need to change the column/table definition, just the LOAD DATA.)
Just open the csv file in your text editor (like Notepad++), change the file encoding to UTF-8, and then import your csv file.
It's complaining about 'M', but I think that comes from 'München', and the actual problematic character is the next one, the umlaut 'ü'.
One simple way to test would be to try loading a file with just the first 2 rows and see if that works. Then add the 3rd row, try again, and see if that fails.
If you can't or don't want to replace these special characters in your data, then you'll need to start investigating the character sets configured in your CSV file, database, table, columns, tools etc...
Are you using MySQL 5.7 or above? Then something simple to try would be to change to character set utf8mb4 in your load data command.
See How MySQL 5.7 Handles 'utf8mb4' and the Load Data Infile for a similar issue.
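That change would be (a sketch of the statement from the question, with only the character set swapped):
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8mb4
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;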
Also see:
import geonames allCountries.txt into MySQL 5.7 using LOAD INFILE - ERROR 1300 (HY000)
Trouble with utf8 characters; what I see is not what I stored
“Incorrect string value” when trying to insert UTF-8 into MySQL via JDBC?

Insert UTF-8 txt file into MySQL results in Error Code 1366

I am trying to insert a delimited txt file into MySQL, but it seems there is something wrong with the encoding. I get Error code 1366: Incorrect string value in MySQL when I try to insert. When I open the txt file, the line that caused the error looks like this:
Any idea how can I insert this data?
You need to escape special characters. Something like \' for a quote, and so on.
Somehow hex 9C got into the file. In latin1, that is œ.
Plan A:
Edit the file to remove it. The text around there is all in uppercase English, so I suspect it was not supposed to be œ.
Plan B:
Use the charset option to set the incoming stream to latin1. Then deal with œ later (or not).
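Plan B applied to a LOAD DATA import might look like this (a sketch; the file name, table name, and delimiters are placeholders, since the question does not show its import statement):
LOAD DATA LOCAL INFILE 'data.txt'   -- placeholder file name
INTO TABLE my_table                 -- placeholder table name
CHARACTER SET latin1
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';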