So I have some Unicode (Arabic) text data stored in a Mongoid model and I want to insert it into a MySQL database. I had to use gsub to escape single quotes, as they were causing SQL insertion errors.
text = model.text.squish().gsub("'", %q(\\\'))
db_con.query("insert into table (text) values ('#{text}')")
Now my problem is that when I view the data in phpMyAdmin, this is what I see:
اليوم.. ملايين الهوات٠تودع "واتساب"
للأبد
I tried adding force_encoding('UTF-8') but that didn't change anything. I also tried escaping with str.dump, but that transformed the data into Unicode code points like u{243} when viewed in phpMyAdmin. How can this be fixed?
Fixed it by executing this query before insertion "SET CHARACTER SET 'UTF8'"
text = model.text.squish().gsub("'", %q(\\\'))
db_con.query("SET CHARACTER SET 'UTF8'")
db_con.query("insert into table (text) values ('#{text}')")
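The garbled output above is consistent with UTF-8 bytes being interpreted in a single-byte character set. Here is a minimal Python sketch of that effect, assuming the connection was treated as latin1 (which MySQL handles as Windows-1252); the variable names are illustrative only and not part of the original code.

```python
# Assumption: the client sent UTF-8 bytes, but the connection was declared
# as latin1, which MySQL treats as Windows-1252 (cp1252).
original = "اليوم"

# The bytes that actually travel over the wire.
utf8_bytes = original.encode("utf-8")

# What a viewer shows when those bytes are read as cp1252.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# The damage is reversible as long as no byte was dropped along the way.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered == original)
```

Telling the server the real connection character set, as the SET CHARACTER SET query does, avoids the misinterpretation in the first place.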
Related
I need to add a record to our MySQL database (via Omeka) that includes an invalid unicode character (this one)
The error message I get via Omeka is:
Mysqli statement execute error : Incorrect string value: '\xF0\xAA\xA8\xA7\xE7\x94...' for column 'text' at row 1
The database field is longtext with collation utf8_unicode_ci. There are already a lot of records in this table and I'm not quite sure what I should change without affecting the other data already in it. Suggestions?
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
Meanwhile, the text for that row in that column is probably truncated or the whole row is missing.
As best as I can tell, F0AAA8A7 is the UTF-8 encoding of Unicode code point U+2AA27, which lies in the CJK Unified Ideographs Extension C block (Chinese characters), not Emoji. Both need utf8mb4, because MySQL's utf8 stores at most 3 bytes per character.
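The byte sequence from the error message can be checked with a short Python sketch. It decodes the four bytes and tests whether the character fits in 3 bytes; the names are illustrative.

```python
# The four bytes reported in the "Incorrect string value" error.
raw = bytes.fromhex("F0AAA8A7")

# Decode them as UTF-8 to get the single character they represent.
ch = raw.decode("utf-8")
code_point = ord(ch)
print(f"U+{code_point:X}")  # U+2AA27

# MySQL's utf8 (utf8mb3) stores at most 3 bytes per character,
# so anything longer requires utf8mb4.
needs_utf8mb4 = len(raw) > 3
print(needs_utf8mb4)  # True
```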
Following on from this question MySQL database contains quotes encoded and unencoded and it's breaking javascript
I am executing this MySQL query:
DELETE FROM `example` WHERE `name` = ''12345''
However, it fails because the value in the database is '12345'. It seems that old data in the database has a mixture of encoded and unencoded quotes. Is it safe to update all the encoded quotes to a plain ' in the database?
In most cases (yours included), store text without any "encoding". That is, do not store htmlentities, store the actual characters, do not store unicode 'codes', store the actual characters, etc.
Do likewise for anything you need to compare to what is in the database.
You will, however, have to escape strings when building SQL statements. Otherwise, you can't get quotes (in text) inside quotes (that are part of the SQL syntax).
That is, you will end up with this SQL when searching for that Irishman:
... WHERE `name` = 'O\'Brian'
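A rough Python sketch of both points: decode entity-encoded text back to real characters before storing it, and let the driver handle quoting through placeholders. Two assumptions are made here: the encoded quote is taken to be the HTML entity &#039; (the original does not say which form was used), and sqlite3 stands in for MySQL only so the example runs without a server.

```python
import html
import sqlite3

# Assumption: the "encoded" quote in the old data is the HTML entity &#039;.
stored = "&#039;12345&#039;"
plain = html.unescape(stored)  # the actual characters: '12345'

# sqlite3 is a stand-in for MySQL here; MySQL drivers offer the same
# placeholder mechanism, usually with %s instead of ?.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (name TEXT)")

# Placeholders pass the value separately, so no manual escaping is needed.
conn.execute("INSERT INTO example (name) VALUES (?)", ("O'Brian",))
conn.execute("INSERT INTO example (name) VALUES (?)", (plain,))

deleted = conn.execute(
    "DELETE FROM example WHERE name = ?", ("'12345'",)
).rowcount
remaining = [row[0] for row in conn.execute("SELECT name FROM example")]
print(deleted, remaining)
```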
I'm doing this directly in the mysql client. I want to do the following:
INSERT INTO MYTABLE VALUES(1,12,'\u5c40\u5c42');
So it would insert the two unicode characters. I'd like to do this without using some other programming language if possible, I'd like to just paste my insert statements right into mysql client.
What's the type of your table data? char or varchar? Your issue isn't quite clear, are you getting an error from that line? You might be experiencing: http://dev.hubspot.com/bid/7049/MySQL-and-Unicode-Three-Gotchas.
EDIT:
Quite a bit of information is within these three pages that should be able to help:
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html
http://dev.mysql.com/doc/refman/5.5/en/string-syntax.html
http://dev.mysql.com/doc/refman/5.5/en/charset-literal.html
but I also saw this :
INSERT INTO mytable VALUES (1, 12, _ucs2 0x5C405C42);
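MySQL string literals do not interpret \u escapes, so one option is a hex literal with a character set introducer. The Python sketch below only computes the hex digits for the two characters and builds such a statement; the statement text is an illustration and has not been run against a server.

```python
# The two characters from the question, written with Python \u escapes.
text = "\u5c40\u5c42"

# UCS-2 / UTF-16 big-endian bytes, suitable for a _ucs2 hex literal.
ucs2_hex = text.encode("utf-16-be").hex().upper()
print(ucs2_hex)  # 5C405C42

# UTF-8 bytes, suitable for a _utf8 hex literal instead.
utf8_hex = text.encode("utf-8").hex().upper()
print(utf8_hex)  # E5B180E5B182

# Illustrative statement built from the hex digits.
statement = f"INSERT INTO MYTABLE VALUES (1, 12, _ucs2 0x{ucs2_hex});"
print(statement)
```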
Using the mysql console, I can just paste your unicode characters into an insert command and mysql accepts it. Here's a little test I did using your data:
CREATE TABLE xx (col1 varchar(20));
insert into xx values ('局层');
select * from xx;
+---------+
| col1 |
+---------+
| 局层 |
+---------+
My db uses default encoding (latin1).
Have you tried just pasting them in? What error, if any, do you get?
It will depend on what you are programming with, or whether you are dealing with the database directly. Are you trying to do a straight insert from Query Browser or some other tool, or are you doing this via a web app? If the latter, what language is the web app using? If it is a .NET app, you can try setting the character set in the connection string, i.e. charset=utf8
If you are doing something with php then take a look at this link http://randomchaos.com/documents/?source=php_and_unicode
You could also go and set the default character set on the database to UTF-8. Not sure how this will impact the current data so be sure to backup everything before making changes to the database.
By using MySQL Workbench
Alter the table of that column you want to insert unicode into.
Change Collation of that column to utf8-default collation.
Apply the setting and you are good to go to insert unicode.
In my case, I needed to insert Arabic characters into MySql server through C# form application. The only way that worked for me is as follows:
First: In your code, specify character set in the connection string as follows:
MySqlConnection mysqlConn = new MySqlConnection("Server= serverName; Port=3306; Database= dbName; Uid = userName; Pwd=password; charset=utf8");
Second: In phpMyAdmin console, click on your database name link, then head to "operations" tab and go to "Collation" at the bottom and select "utf8_unicode_ci" and check the options below and finally click on "Go"
this worked for me
$con=mysqli_connect("localhost","my_user","my_password","my_db");
// Change character set to utf8
mysqli_set_charset($con,"utf8");
I have a php script that inserts values into mySQL table
INSERT INTO stories (title) VALUES('$_REQUEST[title]');
I checked the values of my request variables before going into the table and it's fine.
But when I add title=john to the table for example,
I get something like this:
title = "[][][][]john"
and when I extract the value, it's a newline then john.
I have my columns set to utf-8, I tried swedish character set as well.
Note: I don't get this error when inserting values from the phpMyAdmin commandline
You need {} around array notation with a quoted key when it is used inside a double-quoted string.
$q="INSERT INTO stories(title) VALUES('{$_REQUEST['title']}')";
BTW, it would be better, when checking your $_REQUEST vars to store the sanitized versions in new variables, and to be sure to escape them with real_escape_string()
A SET NAMES <encoding> query must be executed every time you connect to your database, where <encoding> is your HTML page encoding in MySQL dialect (utf8 for UTF-8).
Very simple rule.
You need to check the character set of the database, the server, and the client.
Note that it's not a swedish character set, it's a swedish collation.
I am trying to use a Rake task to migrate some legacy data from MS Access to MySQL. I'm working on Windows XP, using Ruby 1.8.6.
I have the encoding for Rails set as "utf8" in database.yml.
Also, the default character set for MySQL is utf8.
99% of the data is coming in fine, but every now and then I'll get a column value that gives me an error something like this:
Mysql::Error: Incorrect string value: '\x92 Comm...' for column 'name'
at row 1:
INSERT INTO `organizations` ( [...] )
VALUES('Lawyers’ Committee', [...] )
It looks as though the thing that's giving MySQL trouble is the apostrophe immediately after the "s" in the word "Lawyers".
Here's another one...
Mysql::Error: Incorrect string value: '\x99 aoc' for column 'department'
at row 1:
INSERT INTO `addresses`
[...]
'TRInfo™ aoc'
[....]
Looks like it's choking on the "TM" after "TRInfo".
Is there any Ruby or Rails method that I can run the data through to cleanse from it any characters that MySQL will choke on?
Ideally, it would be great to replace them with more palatable characters -- replace the apostrophe with a single quote and the TM symbol with the string "(TM)".
Or, if I could somehow configure MySQL to store those characters as-is without errors that would be great too.
It looks like your input data is not in utf-8.
I did a little investigating and the styled quote used in Lawyer's is encoded as \x92 in the Windows-1252 encoding, but would be nonsense for utf-8 (when I decoded it and encoded it into utf8, I got \xe2\x80\x99).
Thus you will need to convert the input strings from windows-1252 to utf-8 (or to unicode).
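The conversion can be sketched in Python as follows, assuming the Access export really is Windows-1252. The function names and the fallback table are hypothetical; the fallback covers only the two characters mentioned in the question.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Reinterpret Windows-1252 bytes and re-encode them as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


# Optional ASCII fallbacks for the two characters from the question.
ASCII_FALLBACKS = str.maketrans({"\u2019": "'", "\u2122": "(TM)"})


def to_plain_ascii_punctuation(raw: bytes) -> str:
    """Decode Windows-1252 bytes and replace known symbols with ASCII."""
    return raw.decode("cp1252").translate(ASCII_FALLBACKS)


print(cp1252_to_utf8(b"\x92"))  # b'\xe2\x80\x99'
print(to_plain_ascii_punctuation(b"TRInfo\x99 aoc"))  # TRInfo(TM) aoc
```

Converting keeps the original characters; the fallback table is only needed if the target column cannot store them.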
I had the same problem when putting contents of UTF-16 encoded files - which usually store one character per 16bit block - into mysql tables with java. The problem was that the UTF-16 encoded string contained so called surrogate pairs. It means two consecutive 16bit UTF-16 blocks encode one special character but cannot be translated into a corresponding UTF-8 encoding individually. See wikipedia for further explanation.
The solution was to simply replace these characters with spaces. This is the character range you might want to strip out of your string: U+D800–U+DFFF
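In Python terms, the characters that UTF-16 stores as surrogate pairs are the ones above U+FFFF, and those are also the ones that take 4 bytes in UTF-8. The sketch below replaces them with spaces as described; the helper name is hypothetical, and it assumes the target column is limited to 3-byte utf8 (with utf8mb4 the characters could be stored as they are).

```python
import re

# Characters outside the Basic Multilingual Plane: these become surrogate
# pairs in UTF-16 and 4-byte sequences in UTF-8.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def strip_non_bmp(text: str, replacement: str = " ") -> str:
    """Replace every character above U+FFFF with the given replacement."""
    return _NON_BMP.sub(replacement, text)


sample = "a\U0002AA27b"

# One such character occupies two 16-bit units in UTF-16 ...
utf16_units = len(sample[1].encode("utf-16-be")) // 2
# ... and four bytes in UTF-8.
utf8_len = len(sample[1].encode("utf-8"))

print(utf16_units, utf8_len)
print(strip_non_bmp(sample))
```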
In general, this happens when you insert strings to columns with incompatible encoding/collation.
I got this error when I had TRIGGERs, which inherit server's collation for some reason.
And mysql's default is (at least on Ubuntu) latin-1 with swedish collation.
Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:
/etc/mysql/my.cnf :
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
And this must list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And the variables listed by this should also show utf8 (no latin1 or other encoding):
show variables like 'char%';
It looks like your old database is in one string format (utf8?) and your Rails is expecting something else. If your input is in utf8, have you tried configuring Rails to support it?
I encountered the same problem today.
After trying many times, I found the reason and finally fixed it.
Applications store data using the default MySQL character set and collation (latin1, latin1_swedish_ci) unless told otherwise, so you need to specify the character set and collation as utf8/utf8_general_ci when you create your database or table.
e.g.:
$sql = "CREATE TABLE " . $table_name . " (
id mediumint(9) NOT NULL AUTO_INCREMENT,
bookname varchar(128) NOT NULL,
author varchar(64) NOT NULL,
PRIMARY KEY (id),
KEY (bookname)
)CHARACTER SET utf8 COLLATE utf8_general_ci;";
Reference:
《mysql create table problem? SOLVED!!!!!!!!!!!》
http://forums.mysql.com/read.php?121,193883,193883
《10.1.5. Configuring the Character Set and Collation for Applications》
http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html
Hoping this can help you.
Adding binary before the weirdcolumn solves the problem.
In my case, I have an update trigger on tableA to insert data into other table.
There are some special characters in column weirdcolumn, and the update failed with message: "ERROR 1366 (HY000): Incorrect string value: '\xE7....'"
After digging in a lot, I found the solution: add binary before the string column name, or use cast(weirdcolumn as binary).
Hope this can help.
I had the same issue importing data from SQL Server to MySQL using PHP.
My solution was to use utf8_encode() when inserting into MySQL and utf8_decode() when retrieving from MySQL to display in the browser.
Here is my full code, which works well.
//For string values
$Gro2=(is_null($row["GrpNm"]))?"NULL":"\"".mysql_escape_string(utf8_encode($row["GrpNm"]))."\"";
$sqlMy ="INSERT INTO `tbl_name` VALUES ($Gro2)";
Please note: For new projects use
mysqli_escape_string()