ERROR 1366 (22007): Incorrect string value: '\x92t' - mysql

Server version: 10.8.3-MariaDB
Server charset: UTF-8 Unicode (utf8mb4)
Engine: InnoDB
I'm getting an error trying to import into a blank db (db is already created, just trying to import now):
ERROR 1366 (22007) at line 19669: Incorrect string value: '\x92t' for column glen_wazzup.nuke_bbsearch_wordlist.word_text at row 1
The SQL:
CREATE TABLE `nuke_bbsearch_wordlist` (
  `word_text` varchar(50) binary NOT NULL default '',
  `word_id` mediumint(8) unsigned NOT NULL auto_increment,
  `word_common` tinyint(1) unsigned NOT NULL default '0',
  PRIMARY KEY (`word_text`),
  KEY `word_id` (`word_id`)
) ENGINE=InnoDB AUTO_INCREMENT=18719;
Line 19669 (error line):
INSERT INTO `nuke_bbsearch_wordlist` VALUES (0x6469646e9274, 6895, 0);
From my reading, this has something to do with character encoding: the character is an apostrophe, and the wires are getting crossed somewhere. I've read that you can use an ALTER statement, but this is a raw SQL import file that can't be imported yet, so I'm not sure how (or exactly "what") to change in the file so that it'll import?

0x6469646e9274 decodes to "didn’t" -- note that the apostrophe is not the ASCII char, but hex 92 if encoded in latin1 (and several other character sets), or E2 80 99 if encoded in utf8 or utf8mb4.
On the other hand, you have stated "Server charset: UTF-8 Unicode (utf8mb4)", but 0x92 is not valid UTF-8.
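You can check both facts from the client itself; a quick sketch (the hex literal is the one from the failing INSERT):
SELECT CONVERT(0x6469646e9274 USING latin1);
-- → didn’t  (0x92 is the right single quote in MySQL's latin1, which is really cp1252)
-- the same bytes fail validation as utf8mb4, which is exactly this 1366 error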
You are trying to import? How? From what? From mysqldump? From a CSV file? You have an INSERT statement; does that come from a dump?
In any case, it would probably be correct to state that the file is in "Character set latin1".
The collation is not important.
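Since the INSERTs use hex literals, the column charset is what matters, not the connection charset. One low-effort route may be to import under a latin1 default and convert afterwards; a sketch, assuming the dump file is named dump.sql and that its CREATE TABLE statements don't name a charset themselves (the one above doesn't):
-- from the mysql client, connected to the server:
ALTER DATABASE glen_wazzup CHARACTER SET latin1;  -- tables in the dump inherit this
SOURCE dump.sql;
-- once imported, re-encode to utf8mb4, e.g.:
-- ALTER TABLE nuke_bbsearch_wordlist CONVERT TO CHARACTER SET utf8mb4;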

The solution may be as easy as converting your import source file from ISO-8859-1 to UTF-8 encoding.
To do the conversion on Linux, you can run recode l1..u8 <filename >filename.out (if installed) or iconv -f ISO-8859-1 -t UTF-8 -o filename.out filename, and then import filename.out into MySQL.
However, the source encoding may be different from ISO-8859-1 (e.g. it may be ISO-8859-2), so you may want to try multiple source encodings, and check which output file looks right (e.g. by looking at non-ASCII characters in filename.out).

Related

MySQL Multilingual Encoding | Error Code: 1366. Incorrect string value: '\xCE\x09DIS'

I am trying to set up a database to store string data that is in multiple languages and includes Chinese characters, among many others.
Steps I have taken so far:
I have created a schema which uses utf8mb4 character set and utf8mb4_unicode_ci collation.
I have created a table which includes CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; at the end of the CREATE statement.
I am attempting to LOAD DATA INFILE from a CSV file with CHARACTER SET utf8mb4 specified in the LOAD statement.
However, I am receiving an error Error Code: 1366. Incorrect string value: '\xCE\x09DIS' for column 'company_name' at row 43630.
Did it successfully parse 43629 rows? Then croak on that row? It may actually be garbage in the file.
Do you know what that company name should be? What does the rest of the line say?
Do you have another example? Remove that one line and run the LOAD again.
CE can be interpreted by any 1-byte charset, but not necessarily in a meaningful way.
09 is the "tab" character in virtually all charsets; is it reasonable to have a tab in a company name??
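You can reproduce this interpretation question server-side from the reported bytes; a small sketch (X'CE09444953' is the '\xCE\x09DIS' from the error message):
SELECT CONVERT(X'CE09444953' USING latin1) AS as_latin1,
       CONVERT(X'CE09444953' USING greek)  AS as_greek;
-- every single-byte charset renders 0xCE as some letter or symbol,
-- but none of them explains the embedded tab (0x09)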

Store bullet point unicode characters in Mysql and UTF8

I am reading data from a CSV text file using Coldfusion and inserting it into a table. The database is UTF8, and the table is UTF8.
This string •Detroit Diesel series-60 engine keeps getting stored in the Description field as
â€¢Detroit Diesel series-60 engine. (This is what I get from the database, not displayed in the browser.)
I can manually insert the string into a new record from the command line, and the characters are correctly preserved. UTF8 must support the bullet character. What can I be doing wrong?
Datasource connection string:
this.datasources["blabla"] = {
    class: 'org.gjt.mm.mysql.Driver'
    , connectionString: 'jdbc:mysql://localhost:3306/blabla?useUnicode=true&characterEncoding=UTF-8&jdbcCompliantTruncation=true&allowMultiQueries=false&useLegacyDatetimeCode=true'
    , username: 'nottellingyou'
    , password: "encrypted:zzzzzzz"
};
CREATE TABLE output, minus several columns
CREATE TABLE `autos` (
`VIN` varchar(30) NOT NULL,
`Description` text,
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8
In addition, I've run
ALTER TABLE blabla.autos
MODIFY description TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Full code of import file here: https://gist.github.com/mborn319/c40573d6a58f88ec6bf373efbbf92f29
CSV file here. See line 7: http://pastebin.com/fM7fFtXD
In my CFML script, I tried dumping the data per the suggestions from @Leigh and @Rick James. I then saw that the characters were garbled BEFORE insertion into MySQL. Based on this, I realized I needed to specify the charset when reading the file.
<cffile
    action="read"
    file="#settings.csvfile#"
    variable="autodata"
    charset="utf-8">
Result: •Detroit Diesel series-60 engine. This can now insert correctly into the database.
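One way to confirm the fix at the database level, independent of how any client renders the text, is to compare raw bytes; a sketch against the autos table above:
SELECT VIN, HEX(LEFT(Description, 3)) FROM autos LIMIT 5;
-- a correctly stored utf8 bullet begins E280A2;
-- the doubly-encoded â€¢ form begins C3A2E282ACC2A2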

My flat files should be UCS-2, but I can't import into MySQL database

I have twenty pipe-delimited text files that I would like to convert into a MySQL database. The manual that came with the data says:
Owing to the difficulty of displaying data for characters outside of
standard Latin Character Sets, all data is displayed using Unicode
(UCS-2) character encoding. All CSV files are structured using
commercial standards with the preferred format being pipe delimiter
(“|”) and carriage return + line feed (CRLF) as row terminators.
I am using MySQL Workbench 6.2.5 on Win 8.1, but the manual provides example SQL Server scripts to create the twenty tables. Here's one.
/****** Object: Table [dbo].[tbl_Company_Profile_Stocks] Script Date:
12/12/2007 08:42:05 ******/
CREATE TABLE [dbo].[tbl_Company_Profile_Stocks](
    [BoardID] [int] NULL,
    [BoardName] [nvarchar](255) NULL,
    [ClientCompanyID] [int] NULL,
    [Ticker] [nvarchar](255) NULL,
    [ISIN] [nvarchar](255) NULL,
    [OrgVisible] [nvarchar](255) NULL
)
Which I adjust as follows for MySQL.
/****** Object: Table dbo.tbl_Company_Profile_Stocks Script Date:
12/12/2007 08:42:05 ******/
CREATE TABLE dbo.tbl_Company_Profile_Stocks
(
    BoardID int NULL,
    BoardName varchar(255) NULL,
    ClientCompanyID int NULL,
    Ticker varchar(255) NULL,
    ISIN varchar(255) NULL,
    OrgVisible varchar(255) NULL
);
Because the manual says that the flat files are UCS-2, I set the dbo schema to UCS-2 default collation when I create it. This works fine AFAIK. It is the LOAD DATA INFILE that fails. Because the data are pipe-delimited with CRLF line endings, I try the following.
LOAD DATA LOCAL INFILE 'C:/Users/Richard/Dropbox/Research/BoardEx_data/unzipped/Company_Profile_Stocks20100416.csv'
INTO TABLE dbo.tbl_company_profile_stocks
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
But in this case no rows are imported and the message is 0 row(s) affected Records: 0 Deleted: 0 Skipped: 0 Warnings: 0. So I try \n line endings instead. This imports something, but my integer values become zeros and the text becomes very widely spaced. The message is 14121 row(s) affected, 64 warning(s): 1366 Incorrect integer value: <snip> Records: 14121 Deleted: 0 Skipped: 0 Warnings: 28257.
If I open the flat text file in Sublime Text 3, the Encoding Helper package suggests that the file has UTF-16 LE with BOM encoding. If I repeat the above with UTF-16 default collation when I create the dbo schema, then my results are the same.
How can I fix this? Encoding drives me crazy!
Probably the main problem is that the LOAD DATA needs this clause (see reference):
CHARACTER SET ucs2
In case that does not suffice, ...
Can you get a hex dump of a little of the csv file? I want to make sure it is really ucs2. (ucs2 is very rare. Usually text is transferred in utf8.) If it looks readable when you paste text into this forum, then it is probably utf8 instead.
There is no "dbo" ("database owner"), only database, in MySQL.
Please provide SHOW CREATE TABLE tbl_Company_Profile_Stocks
(just a recommendation) Don't prefix table names with "tbl_"; it does more to clutter than to clarify.
Provide a PRIMARY KEY for the table.
@Rick James had the correct answer (i.e., set the encoding for LOAD DATA with the CHARACTER SET option). But in my case this didn't work, because MySQL doesn't support loading UCS-2 data files.
Note
It is not possible to load data files that use the ucs2 character set.
Here are a few approaches that work. In the end I went with SQLite rather than MySQL, but the last solution should work with MySQL, or any other DB that accepts flat files.
SQLiteStudio
SQLiteStudio was the easiest solution in this case. I prefer command line solutions, but the SQLiteStudio GUI accepts UCS-2 encoding and any delimiter. This keeps the data in UCS-2.
Convert to ASCII in Windows command line
The easiest conversion to ASCII is in the Windows command line with TYPE.
for %%f in (*.csv) do (
    echo %%~nf
    type "%%~nf.csv" > "%%~nf.txt"
)
This may cause problems with special characters. In my case it left in single and double quotes that caused some problems with the SQLite import. This is the crudest approach.
Convert to ASCII in Python
import codecs
import glob
import os
for fileOld in glob.glob('*.csv'):
    print('Reading: %s' % fileOld)
    fileNew = os.path.join('converted', fileOld)
    # the source files are UTF-16 LE; errors='ignore' silently drops
    # anything (including the BOM) that has no ASCII equivalent
    with codecs.open(fileOld, 'r', encoding='utf-16le') as old, \
         codecs.open(fileNew, 'w', encoding='ascii', errors='ignore') as new:
        print('Writing: %s' % fileNew)
        for line in old:
            # strip the single and double quotes that broke the SQLite import
            new.write(line.replace("'", '').replace('"', ''))
This is the most extensible approach and would allow you more precisely control which data you convert or retain.
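If you stay with MySQL, the converted files can then be loaded with the character set stated explicitly; a sketch reusing the LOAD from the question (the converted output is plain ASCII, which latin1 and utf8mb4 both accept):
LOAD DATA LOCAL INFILE 'converted/Company_Profile_Stocks20100416.csv'
INTO TABLE tbl_company_profile_stocks
CHARACTER SET latin1
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;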

Change default charset to utf-8: mysql

I am developing an app in which I am using a MySQL database. The database contains certain characters which cannot be encoded on the client side, and those values come back null.
For example, a string containing a special character is returned as null on the client side.
I found that the default charset for the db was latin1, so I changed it to utf-8, including all tables and the individual columns of those tables. Also, in my PDO constructor I have specified the charset as utf-8:
$db = new PDO("mysql:dbname=$dbname;host=$dbhost;charset=utf8",$dbname,$dbhost);
I also configured the response headers to use the utf-8 charset. But the characters are still not encoded correctly; I am still getting a null string wherever a special character is present.
I tried changing the my.ini configuration by setting the default charset, but that gives me an error in my connection file at the PDO construct.
It's urgent for me to fix this! Can someone help?
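A first diagnostic that usually narrows this down (a sketch, not a full fix) is to check what charset the session actually negotiates:
SHOW VARIABLES LIKE 'character_set_%';
-- character_set_client, character_set_connection and character_set_results
-- should all report utf8 for the PDO connection string above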

How to fix Incorrect string value when trying to convert from Latin1 to UTF8 error in mysql?

Some of my mysql database tables have been accidentally created as latin1 instead of utf8. I am now trying to fix the issue by changing the columns to their binary type, then converting them to utf8, then changing them back to their original type. The problem is that I get the following error when I try to do this:
ERROR 1366 (HY000) at line 524: Incorrect string value: '\xB4s whi...' for column 'sName' at row 73
How can I keep this from happening and convert my columns/tables to utf8?
\xB4 is the "acute accent" character in the Latin1 codepage, and must be re-encoded as a 2-byte character in UTF-8. What you want to do is alter the encoding WITHOUT changing to binary first. This will let the server re-encode the characters correctly.
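In practice the conversion looks like this; a sketch in which the table name and column length are assumptions (sName is the column from the error message):
ALTER TABLE mytable MODIFY sName VARCHAR(255) CHARACTER SET utf8;
-- or convert all text columns of the table in one statement:
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8;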