Utf8 character set encoding error while data insertion using perl script - mysql

I am having trouble inserting Bulgarian-language strings into MySQL from a Perl script. If I insert manually with a query it works fine, but from Perl the string is converted into unknown characters.
I have tried the steps below to resolve the issue, but still no luck.
Set the utf8 character set on the database connection:
$dbh->do("set character set utf8");
$dbh->do('SET NAMES utf8');
$dbh->{'mysql_enable_utf8'} = 1;
I have also set the default character set to utf8 in my.cnf.
I am still getting unknown characters.
Can anyone suggest how to resolve this issue?
Thanks

See if these help:
use utf8;
use open ':std', ':encoding(UTF-8)';
It's not just MySQL that could be screwing things up -- the original bytes could be mis-encoded; the output could be improperly rendered; etc.
If you have some data stored, let's check to see if they are 'correct'. Do something like
SELECT col, HEX(col) FROM tbl WHERE ...
ДЖ should come out as hex D094D096 if it is correctly encoded in utf8. Note that Cyrillic mostly has D0xx hex for its characters.
I have more discussion here.
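The HEX() check above can be reproduced outside MySQL. The following is a minimal Python sketch, not part of the original answer; the helper names utf8_hex and double_encoded_hex are made up for illustration. It prints the hex that correctly encoded UTF-8 should give, and what the same text looks like if it was wrongly treated as latin1 and encoded a second time (a common cause of "unknown characters").

```python
def utf8_hex(text: str) -> str:
    """Hex of the UTF-8 bytes, formatted like MySQL's HEX()."""
    return text.encode("utf-8").hex().upper()


def double_encoded_hex(text: str) -> str:
    """Hex you would see if UTF-8 bytes were misread as latin1 and re-encoded."""
    misread = text.encode("utf-8").decode("latin-1")
    return misread.encode("utf-8").hex().upper()


print(utf8_hex("ДЖ"))            # D094D096
print(double_encoded_hex("ДЖ"))  # C390C294C390C296
```

If HEX(col) resembles the second output, the data was probably double-encoded on the way in rather than stored correctly.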

Related

Arabic in MySQL shows as? [duplicate]

I have a MySQL database using the utf8_general_ci collation.
I connect to the same database from PHP, using UTF-8 page and file encoding, with no problem,
but when connecting to MySQL from C# I get letters like this: غزة
I edited the connection string to look like this:
server=localhost;password=root;User Id=root;Persist Security Info=True;database=mydatabase;Character Set=utf8
but the problem is the same.
Server=myServerAddress;Database=myDataBase;Uid=myUsername;Pwd=myPassword; CharSet=utf8;
Note! Use lower case value utf8 and not upper case UTF8 as this will fail.
See http://www.connectionstrings.com/mysql
Could you try:
Server=localhost;Port=3306;Database=xxx;Uid=x xx;Pwd=xxxx;charset=utf8;
Edit: I got a new idea:
//To encode a string to UTF8 encoding
string source = "hello world";
byte [] UTF8encodes = UTF8Encoding.UTF8.GetBytes(source);
//get the string from UTF8 encoding
string plainText = UTF8Encoding.UTF8.GetString(UTF8encodes);
Good luck.
More info about this technique: http://social.msdn.microsoft.com/forums/en-us/csharpgeneral/thread/BF68DDD8-3D95-4478-B84A-6570A2E20AE5
You might need to use the "utf8mb4" character set for the column in order to support 4 byte characters like this: "λ𝛌 "
The utf8 charset only supports 1-3 bytes per character and thus can't support all unicode characters.
See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html for more details.
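To see why the 3-byte limit matters, here is a small Python sketch (an illustration only, with a hypothetical helper utf8_len) that counts the UTF-8 bytes of a few characters. The last one is the bold mathematical lambda from the example above, written as an escape so the code does not depend on the file's encoding.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# 'a', Greek lambda, euro sign, MATHEMATICAL BOLD SMALL LAMDA (U+1D6CC)
for ch in ("a", "λ", "€", "\U0001D6CC"):
    print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")
```

Characters that need 4 bytes do not fit in a utf8 column and require utf8mb4.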
CHARSET should be uppercase
Server=localhost;Port=3306;Database=xxx;Uid=x xx;Pwd=xxxx;CHARSET=utf8;
Just in case someone comes here later.
I needed to create a Seed method using MySQL with EF6 to load a SQL file. After running it, I got weird characters in the database, like ? replacing é, ó, á.
SOLUTION:
Make sure to read the file using the right charset: UTF-8 in my case.
var path = System.AppDomain.CurrentDomain.BaseDirectory;
var sql = System.IO.File.ReadAllText(path + "../../Migrations/SeedData/scripts/sectores.sql", Encoding.UTF8);
And then M.Shakeri's reminder:
CHARSET=utf8 in the connection string in web.config, with CHARSET in uppercase and utf8 in lowercase.
Hope it helps.
R.
One thing I found, but haven't had the opportunity to really browse, is the set of collation charts available here: http://www.collation-charts.org/mysql60/
This will show you which characters are part of a given MySQL collation so you can pick the best option for your dataset.
Setting the charset in the connection string refers to the charset of the queries sent to the server. It does not affect the results returned from the server.
https://dev.mysql.com/doc/connectors/en/connector-net-connection-options.html
One way I have found to specify the charset from the client is to run this after opening the connection.
set character_set_results='utf8';
this worked for me:
"datasource=xxx;port=3306;username=xxx;password=xxx;database=xxx;charset=utf8mb4"

MYSQL not recognizing some special characters

Why won't MySQL recognize é and a lot of other characters, including the em dash (—)? This is driving me nuts. I keep getting errors like Incorrect string value: '\xE9' for column
I am using MySQL 5.5.6; my tables are InnoDB and use the default utf8 collation.
I don't know if this is important, but I am doing a bulk insert from a CSV file that contains special characters, and my fields are of type TEXT.
I had a similar problem trying to SELECT ... WHERE table_col LIKE "%–%" (long dash). It turned out it wasn't working because the .php file that was sending the query wasn't in UTF-8 but in ANSI. Converting it to UTF-8 did the trick!
Your problem sounds like one I have dealt with in the past, and I concur with Synchro that the client connection settings may be where you need to look. You probably need to specify UTF8 character set when starting the connection.
I use PDO, and initiate the connection with this:
$this->dbConn = new PDO("mysql:host=$this->host;dbname=$this->dbname", $this->user, $this->pass, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
Before I started using PDO, I used this:
mysql_query("SET NAMES 'utf8'");
See http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
Just make sure the CSV file is in UTF8 and not the default ANSI. To do this open the csv file in notepad and using the save as option, ensure the encoding is in UTF8.
It's probably down to your PHP MySQL client's connection settings. Rob Allen's post can probably sort you out.
Rather than using a SET NAMES utf8 query, which the PHP docs explicitly warn against, there is a built-in function to do this for you in the mysqli extension: $mysqli->set_charset('utf8');.
An alternative explanation for bad characters if you're already doing this is that MySQL's utf8 charset isn't actually proper UTF-8... It only supports up to 3-byte characters and there are some increasingly common ones that use 4, specifically Emojis. Fortunately MySQL has a fix for this as of version 5.5.3: use the utf8mb4 charset instead.
On a related note, the sort order in the default utf8 charset (with the utf8_general_ci collation) has a number of problems that may affect you in, for example, German. The fix here is to use the utf8mb4_unicode_ci collation, which provides a more accurate, though slightly slower collation.

Select MySQL rows with Japanese characters

Would anyone know of a reliable method (with mySQL or otherwise) to select rows in a database that contain Japanese characters? I have a lot of rows in my database, some of which only have alphanumeric characters, some of which have Japanese characters.
Rules to follow when you have problems with character sets:
1. When creating the database, use utf8 encoding:
CREATE DATABASE _test DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
2. Make sure all text fields (varchar and text) use UTF-8:
CREATE TABLE _test.test (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE = MyISAM;
3. When you make a connection, run this before you query or update the database:
SET NAMES utf8;
4. With phpMyAdmin, choose UTF-8 when you log in.
5. Set the web page encoding to UTF-8 so that all POST/GET data arrives in UTF-8 (otherwise you will have to convert it, which is painful). PHP code (first line in the PHP file, or at least before any output):
header('Content-Type: text/html; charset=UTF-8');
6. Make sure all your queries are written in UTF-8. If using PHP:
6.1. If PHP supports code in UTF-8, just write your files in UTF-8.
6.2. If PHP is compiled without UTF-8 support, convert your strings to UTF-8 like this:
$str = mb_convert_encoding($str, 'UTF-8', '<put your file encoding here>');
$query = 'SELECT * FROM test WHERE name = "' . $str . '"';
That should make it work.
Following on from NickSoft's helpful answer, I had to set the encoding on the DB connection to get it to work.
&characterEncoding=UTF8
Then the SET NAMES utf8; seemed to be redundant.
As teneff stated, just use SELECT.
When installing MySQL, use UTF-8 as charset. Then, choosing utf8_general_ci as collation should do the work.
As Frosty stated, just use SELECT.
Look up the lowest and highest valued Japanese characters in the Unicode charts at http://www.unicode.org/roadmaps/bmp/ and use REGEXP. It may use several different regions of characters to get the whole Japanese character set. As long as you use the UTF-8 charset and utf8_general_ci collation, you should be able to use a REGEXP '[a-gk-nt-z]' where a-g represents one range of Unicode characters from the charts, k-n represents another range, etc.
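If filtering outside MySQL is acceptable, the range idea can be sketched in Python. This is an illustration under stated assumptions, not the answer's method: the three ranges below (Hiragana, Katakana, CJK Unified Ideographs) are a common approximation rather than the complete Japanese repertoire, and has_japanese is a hypothetical helper name.

```python
import re

# Assumed ranges: Hiragana U+3040-309F, Katakana U+30A0-30FF,
# CJK Unified Ideographs U+4E00-9FFF (kanji, shared with Chinese).
JAPANESE = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def has_japanese(text: str) -> bool:
    """True if the text contains at least one character in the assumed ranges."""
    return JAPANESE.search(text) is not None


rows = ["abc123", "カタカナ", "東京 Tokyo"]
print([row for row in rows if has_japanese(row)])
```

Because kanji are shared with Chinese, this flags Chinese text as well; narrowing it further would need language-level heuristics.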
The number of Japanese characters is limited. You can search for them using
SELECT ... LIKE '%カ%'
Alternatively, you can use the character's hexadecimal representation (here, the UTF-8 bytes of U+30AB) -
SELECT ... LIKE CONCAT('%', _utf8 0xE382AB, '%')
You may find this UTF-8 Japanese subset useful:
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=12448
This assumes you're using the UTF-8 character set for fields, queries, results...

How to fix "Incorrect string value" errors?

After noticing that an application tended to discard random emails due to incorrect string value errors, I went through and switched many text columns to the utf8 column charset and the default column collation (utf8_general_ci) so that it would accept them. This fixed most of the errors, and made the application stop getting SQL errors when it hit non-Latin emails, too.
Despite this, some of the emails still cause the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)
The contents column is of the MEDIUMTEXT datatype and uses the utf8 column charset and the utf8_general_ci column collation. There are no flags that I can toggle on this column.
Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:
What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
How can I fix it?
What are the likely effects of such a fix?
One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.
UPDATE to the below answer:
At the time the question was asked, "UTF8" in MySQL meant utf8mb3. In the meantime, utf8mb4 was added, but to my knowledge MySQL's "UTF8" was not switched to mean utf8mb4.
That means you need to write "utf8mb4" specifically if you mean it (and you should use utf8mb4).
I'll keep this note here instead of just editing the answer, to make clear there is still a difference when saying "UTF8".
Original
I would not suggest Richie's answer, because it screws up the data inside the database. You would not fix your problem, only try to "hide" it, and you would lose the ability to perform essential database operations on the damaged data.
If you encounter this error, either the data you are sending is not UTF-8 encoded, or your connection is not UTF-8. First, verify that the data source (a file, ...) really is UTF-8.
Then check your database connection. You should do this after connecting:
SET NAMES 'utf8mb4';
SET CHARACTER SET utf8mb4;
Next, verify that the tables where the data is stored have the utf8mb4 character set:
SELECT
`tables`.`TABLE_NAME`,
`collations`.`character_set_name`
FROM
`information_schema`.`TABLES` AS `tables`,
`information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
`tables`.`table_schema` = DATABASE()
AND `collations`.`collation_name` = `tables`.`table_collation`
;
Last, check your database settings:
mysql> show variables like '%colla%';
mysql> show variables like '%charac%';
If source, transport and destination are utf8mb4, your problem is gone;)
MySQL’s utf-8 types are not actually proper utf-8 – it only uses up to three bytes per character and supports only the Basic Multilingual Plane (i.e. no Emoji, no astral plane, etc.).
If you need to store values from higher Unicode planes, you need the utf8mb4 encodings.
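Whether a given string falls outside the Basic Multilingual Plane can be checked before it reaches the database. A minimal Python sketch follows; needs_utf8mb4 is a hypothetical helper name and not part of any MySQL client library.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies above U+FFFF, i.e. outside the BMP."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("ДЖ"))             # Cyrillic is inside the BMP
print(needs_utf8mb4("ok \U0001F600"))  # an emoji is outside the BMP
```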
The table and fields have the wrong encoding; however, you can convert them to UTF-8.
ALTER TABLE logtest CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest CHANGE title title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci;
"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:
>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
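The session above is Python 2. In Python 3 the same check has to be run on a bytes object; a minimal sketch, with is_valid_utf8 as a hypothetical helper name:

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"\xE4\xC5\xCC\xC9\xD3\xD8"))  # the bytes from the error message
print(is_valid_utf8("ДЖ".encode("utf-8")))
```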
If you're looking for a way to avoid decoding errors within the database, a single-byte encoding is the most permissive choice. MySQL's latin1, which is its variant of cp1252 (aka "Windows-1252" aka "Windows Western European"), accepts every byte value as a valid character.
Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.
The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.
Refer to this page http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details of the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other arguments on the web discussing these two.
First check if your default_character_set_name is utf8.
SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";
If the result is not utf8, you must convert your database. First, save a dump.
To change the character set encoding to UTF-8 for all of the tables in the specified database, type the following command at the command line. Replace DBNAME with the database name:
mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME
To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace DBNAME with the database name:
ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;
You can now retry writing utf8 characters into your database. This solution helped me when I tried to upload a 200,000-row CSV file into my database.
Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.
ALTER TABLE table_name MODIFY COLUMN column_name VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
In general, this happens when you insert strings into columns with an incompatible encoding/collation.
I got this error when I had TRIGGERs, which for some reason inherit the server's collation.
MySQL's default (at least on Ubuntu) is latin1 with Swedish collation.
Even though I had the database and all tables set to UTF-8, I still had to set it in my.cnf:
/etc/mysql/my.cnf :
[mysqld]
character-set-server=utf8
default-character-set=utf8
This should list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And some of the variables listed by this should also show utf8-* (no latin1 or other encoding):
show variables like 'char%';
I got a similar error (Incorrect string value: '\xD0\xBE\xD0\xB2...' for 'content' at row 1). I tried changing the character set of the column to utf8mb4, and after that the error changed to 'Data too long for column 'content' at row 1'.
It turned out that MySQL was showing me the wrong error. I changed the character set of the column back to utf8 and changed the type of the column to MEDIUMTEXT. After that the error disappeared.
I hope it helps someone.
By the way, MariaDB in the same case (I tested the same INSERT there) just truncated the text without an error.
That error means that either you have the string with incorrect encoding (e.g. you're trying to enter ISO-8859-1 encoded string into UTF-8 encoded column), or the column does not support the data you're trying to enter.
In practice, the latter problem is caused by MySQL's utf8 implementation, which only supports Unicode characters that need 1-3 bytes when represented in UTF-8. See "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details. The trick is to use the character set utf8mb4 instead of utf8, which doesn't actually support all of UTF-8 despite the name. The former is the correct character set to use for all UTF-8 strings.
In my case, Incorrect string value: '\xCC\x88'..., the problem was that an o-umlaut was in its decomposed state. This question-and-answer helped me understand the difference between o¨ and ö. In PHP, the fix for me was to use PHP's Normalizer library. E.g., Normalizer::normalize('o¨', Normalizer::FORM_C).
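The same normalization can be sketched in Python with the standard unicodedata module (an illustration alongside the PHP fix, not a replacement for it). It shows that the decomposed form carries the \xCC\x88 bytes from the error message, while the composed (NFC) form does not.

```python
import unicodedata

decomposed = "o\u0308"  # 'o' followed by COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)

print(decomposed.encode("utf-8"))  # b'o\xcc\x88'
print(composed.encode("utf-8"))    # b'\xc3\xb6'
print(len(decomposed), len(composed))
```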
The solution for me, when running into this Incorrect string value: '\xF8' for column error using Scriptcase, was to make sure that my database is set up for utf8_general_ci and so are my field collations. Then, when I import a CSV file, I load the CSV into UE Studio, save it formatted as UTF-8, and voilà, it works like a charm: 29,000 records in there with no errors. Previously I was trying to import an Excel-created CSV.
I have tried all of the above solutions (which all bring valid points), but nothing was working for me.
Until I found that my MySQL table field mappings in C# was using an incorrect type: MySqlDbType.Blob . I changed it to MySqlDbType.Text and now I can write all the UTF8 symbols I want!
p.s. My MySQL table field is of the "LongText" type. However, when I autogenerated the field mappings using MyGeneration software, it automatically set the field type as MySqlDbType.Blob in C#.
Interestingly, I have been using the MySqlDbType.Blob type with UTF8 characters for many months with no trouble, until one day I tried writing a string with some specific characters in it.
Hope this helps someone who is struggling to find a reason for the error.
If you happen to process the value with some string function before saving, make sure the function can properly handle multibyte characters. String functions that cannot do that and are, say, attempting to truncate might split one of the single multibyte characters in the middle, and that can cause such string error situations.
In PHP for instance, you would need to switch from substr to mb_substr.
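The byte-truncation pitfall can be shown in a few lines of Python. This is a sketch under the assumption that the limit is expressed in bytes (as a column size might be); truncate_utf8 is a hypothetical helper name.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    raw = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so the only possible damage is an incomplete
    # sequence at the very end; errors="ignore" drops exactly that.
    return raw.decode("utf-8", errors="ignore")


naive = "aé".encode("utf-8")[:2]   # b'a\xc3': the 'é' is cut in half
print(naive)
print(truncate_utf8("aé", 2))      # a
print(truncate_utf8("日本", 4))     # 日
```

Sending the naive slice to a utf8 column is the kind of input that triggers "Incorrect string value".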
I added binary before the column name, and that solved the charset error.
insert into tableA values(binary stringcolname1);
Hi, I also got this error when using my online database on a GoDaddy server.
I think it runs MySQL version 5.1 or later. When I did the same on my localhost server (version 5.7) it was fine. After that I created the table on the local server and copied it to the online server using mysql yog. I think the problem is with the character set.
To fix this error I upgraded my MySQL database to utf8mb4 which supports the full Unicode character set by following this detailed tutorial. I suggest going through it carefully, because there are quite a few gotchas (e.g. the index keys can become too large due to the new encodings after which you have to modify field types).
There are good answers in here. I'm just adding mine since I ran into the same error, but it turned out to be a completely different problem. (Maybe the same on the surface, but with a different root cause.)
For me the error happened for the following field:
@Column(nullable = false, columnDefinition = "VARCHAR(255)")
private URI consulUri;
This ends up being stored in the database as a binary serialization of the URI class. This didn't raise any flags with unit testing (using H2) or CI/integration testing (using MariaDB4j), it blew up in our production-like setup. (Though, once the problem was understood, it was easy enough to see the wrong value in the MariaDB4j instance; it just didn't blow up the test.) The solution was to build a custom type mapper:
package redacted;

import javax.persistence.AttributeConverter;
import java.net.URI;
import java.net.URISyntaxException;

import static java.lang.String.format;

public class UriConverter implements AttributeConverter<URI, String> {
    @Override
    public String convertToDatabaseColumn(URI attribute) {
        return attribute.toString();
    }

    @Override
    public URI convertToEntityAttribute(String field) {
        try {
            return new URI(field);
        }
        catch (URISyntaxException e) {
            throw new RuntimeException(format("could not convert database field to URI: %s", field));
        }
    }
}
Used as follows:
@Column(nullable = false, columnDefinition = "VARCHAR(255)")
@Convert(converter = UriConverter.class)
private URI consulUri;
As far as Hibernate is involved, it seems it has a bunch of provided type mappers, including for java.net.URL, but not for java.net.URI (which is what we needed here).
In my case, the problem was solved by changing the MySQL column encoding to 'binary' (the data type is automatically changed to VARBINARY). I probably won't be able to filter or search on that column, but I don't need to.
In my case, I first saw '???' on my website. I checked MySQL's character set, which was latin, so I changed it to utf-8 and restarted my project. Then I got the same error as you. Then I found that I had forgotten to change the database's charset; I changed it to utf-8 and, boom, it worked.
I tried almost every steps mentioned here. None worked. Downloaded mariadb. It worked. I know this is not a solution yet this might help somebody to identify the problem quickly or give a temporary solution.
Server version: 10.2.10-MariaDB - MariaDB Server
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
I had a table with a varbinary column that I wanted to convert to utf8mb4 varchar. Unfortunately some of the existing data was invalid UTF-8 and the ALTER query returned Incorrect string value for various rows.
I tried every suggestion I could find regarding cast / convert / char_length = length etc., but nothing in SQL detected the erroneous values, other than the ALTER query returning bad rows one by one. I would love a pure SQL solution to remove the bad values. Sadly, this solution is not pretty.
I ended up select *'ing the entire table into PHP, where the erroneous rows could be detected en-masse by:
if (empty(htmlspecialchars($row['whatever'])))
The problem can also be caused by the client if its charset is not set to utf8mb4. Even if every database, table and column is set to utf8mb4, you will still get an error, for instance in PyCharm.
For Python, set the charset of the connection in the MySQL Connector connect method:
import mysql.connector

mydb = mysql.connector.connect(
    host="IP or Host",
    user="<user>",
    passwd="<password>",
    database="<yourDB>",
    # set charset to utf8mb4 to support emojis
    charset='utf8mb4'
)
I know I'm late to the ball, but someone else might come across the problem I had and be happy to read my workaround.
I came across this problem with French characters. It turned out that the text I was copying encoded the accents on some characters as two characters and on others as single characters.
I couldn't find how to set my table to accept the strings, so I ended up changing the diacritics in my text import.
Here is a list of them as double characters, so you can search for them in your texts:
ùòìàè
áéíóú
ûôêâî
ç
1 - You have to declare the UTF-8 encoding property on your connection. http://php.net/manual/en/mysqli.set-charset.php.
2 - If you are using the mysql command line to execute a script, you have to use the flag, like:
Cmd: C:\wamp64\bin\mysql\mysql5.7.14\bin\mysql.exe -h localhost -u root -P 3306 --default-character-set=utf8 omega_empresa_parametros_336 < C:\wamp64\www\PontoEletronico\PE10002Corporacao\BancoDeDadosModelo\omega_empresa_parametros.sql

How can I process data to avoid MySQL "incorrect string value" error?

I am trying to use a Rake task to migrate some legacy data from MS Access to MySQL. I'm working on Windows XP, using Ruby 1.8.6.
I have the encoding for Rails set as "utf8" in database.yml.
Also, the default character set for MySQL is utf8.
99% of the data is coming in fine, but every now and then I'll get a column value that gives me an error something like this:
Mysql::Error: Incorrect string value: '\x92 Comm...' for column 'name'
at row 1:
INSERT INTO `organizations` ( [...] )
VALUES('Lawyers’ Committee', [...] )
It looks as though the thing that's giving MySQL trouble is the apostrophe immediately after the "s" in the word "Lawyers".
Here's another one...
Mysql::Error: Incorrect string value: '\x99 aoc' for column 'department'
at row 1:
INSERT INTO `addresses`
[...]
'TRInfo™ aoc'
[....]
Looks like it's choking on the "TM" after "TRInfo".
Is there any Ruby or Rails method that I can run the data through to cleanse from it any characters that MySQL will choke on?
Ideally, it would be great to replace them with more palatable characters -- replace the apostrophe with a single quote and the TM symbol with the string "(TM)".
Or, if I could somehow configure MySQL to store those characters as-is without errors that would be great too.
It looks like your input data is not in utf-8.
I did a little investigating, and the styled quote used in Lawyers’ is encoded as \x92 in the Windows-1252 encoding but is nonsense in UTF-8 (when I decoded it and encoded it as UTF-8, I got \xe2\x80\x99).
Thus you will need to convert the input strings from windows-1252 to utf-8 (or to unicode).
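The conversion can be sketched as follows. The question asks for Ruby, so treat this Python version purely as an illustration of the idea; cp1252_to_utf8 and to_ascii_friendly are hypothetical helper names, and the replacements in the second function are the ones the question says it would prefer.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Reinterpret Windows-1252 bytes and re-encode them as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


def to_ascii_friendly(raw: bytes) -> str:
    """Optional clean-up: swap the curly apostrophe and the TM sign for ASCII."""
    text = raw.decode("cp1252")
    return text.replace("\u2019", "'").replace("\u2122", "(TM)")


print(cp1252_to_utf8(b"Lawyers\x92 Committee"))  # b'Lawyers\xe2\x80\x99 Committee'
print(to_ascii_friendly(b"TRInfo\x99 aoc"))      # TRInfo(TM) aoc
```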
I had the same problem when putting contents of UTF-16 encoded files - which usually store one character per 16bit block - into mysql tables with java. The problem was that the UTF-16 encoded string contained so called surrogate pairs. It means two consecutive 16bit UTF-16 blocks encode one special character but cannot be translated into a corresponding UTF-8 encoding individually. See wikipedia for further explanation.
The solution was to simply replace these characters with spaces. This is the character range you might want to strip out of your string: U+D800–U+DFFF
In general, this happens when you insert strings into columns with an incompatible encoding/collation.
I got this error when I had TRIGGERs, which for some reason inherit the server's collation.
MySQL's default (at least on Ubuntu) is latin1 with Swedish collation.
Even though I had the database and all tables set to UTF-8, I still had to set it in my.cnf:
/etc/mysql/my.cnf :
[mysqld]
character-set-server=utf8
default-character-set=utf8
This should list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And some of the variables listed by this should also show utf8-* (no latin1 or other encoding):
show variables like 'char%';
It looks like your old database is in one string format (utf8?) and your Rails is expecting something else. If your input is in utf8, have you tried configuring Rails to support it?
I encountered the same problem today.
After many attempts, I found the cause and finally fixed it.
Applications store data using the default MySQL character set and collation (latin1, latin1_swedish_ci) unless told otherwise, so you need to specify the character set and collation as utf8/utf8_general_ci when you create your database or table.
e.g.:
$sql = "CREATE TABLE " . $table_name . " (
id mediumint(9) NOT NULL AUTO_INCREMENT,
bookname varchar(128) NOT NULL,
author varchar(64) NOT NULL,
PRIMARY KEY (id),
KEY (bookname)
)CHARACTER SET utf8 COLLATE utf8_general_ci;";
Reference:
《mysql create table problem? SOLVED!!!!!!!!!!!》
http://forums.mysql.com/read.php?121,193883,193883
《10.1.5. Configuring the Character Set and Collation for Applications》
http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html
Hoping this can help you.
Adding binary before weirdcolumn solves the problem.
In my case, I have an update trigger on tableA that inserts data into another table.
There are some special characters in the column weirdcolumn, and the update failed with the message: "ERROR 1366 (HY000): Incorrect string value: '\xE7....'"
After digging into it a lot, I found the solution: add binary before the string column name, or use cast(weirdcolumn as binary);
Hope this helps.
I had the same issue importing data from SQL Server to MySQL using PHP.
My solution was to use utf8_encode() when inserting into MySQL and utf8_decode() when retrieving from MySQL to display in the browser.
Here is my full code, which works well.
//For string values
$Gro2=(is_null($row["GrpNm"]))?"NULL":"\"".mysql_escape_string(utf8_encode($row["GrpNm"]))."\"";
$sqlMy ="INSERT INTO `tbl_name` VALUES ($Gro2)";
Please note: For new projects use
mysqli_escape_string()