String of Chinese characters longer than MySQL field - mysql

I am developing a web app using MySQL and PHP. Some of the users will be Chinese, so I need to test my program with Chinese characters. As Chinese characters take up more bytes than the usual ASCII characters, it can happen that the MySQL field is too short for a string of Chinese characters.
I tried to limit the input length to half the size of the MySQL field, as in the example below:
<input name='field_name' maxlength='5'> in HTML
field_name VARCHAR(10) in MySQL (all my fields use the utf8 character set with the utf8_unicode_ci collation)
Nevertheless, the string '好好好好好', which has 5 characters, gets truncated.
That is an important issue because truncated strings end with the replacement symbol "�", and jQuery AJAX calls (JSON) then fail with an error.
That is why I would like to know how to validate these inputs on both the client and server sides, so that data integrity is not affected in any situation (or so that the string is at least displayed, even if it is truncated).
Thanks!

UTF-8 uses one byte for an English (ASCII) character, but three bytes for a Chinese character, so the 5-character string above occupies 15 bytes.
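A minimal sketch of a server-side check in PHP, assuming the VARCHAR(10) column from the question; the field and variable names are only illustrative. It shows the byte/character difference and rejects over-long input before it reaches MySQL, instead of letting the value be truncated mid-character:

$value = $_POST['field_name'] ?? '';

// strlen() counts bytes, mb_strlen() counts characters:
// '好好好好好' is 5 characters but 15 bytes in UTF-8.
$bytes = strlen($value);              // 15 for the example string
$chars = mb_strlen($value, 'UTF-8');  // 5 for the example string

// Reject (or shorten with mb_substr()) before inserting,
// so MySQL never cuts the value in the middle of a character.
if ($chars > 10) {
    http_response_code(400);
    exit('field_name is too long');
}

Whatever is enforced in the browser, a check like this on the server is still needed, since a maxlength attribute alone can be bypassed.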

Related

AngularJS not showing accented characters

In my MySQL database I have international strings with accented characters, such as ñ or é.
I can retrieve values from the DB with Angular services and show them on the Views through Controllers, but whenever there is an accented character in a string, the whole string does not show up.
I have tried with $sce, with ng-bind-html, but always the same result. Strings with accented characters do not show.
Am I missing something?
Ok, found the issue. It was not related to Angular but to PHP reading the response from the MySQL server with the wrong character set.
In PHP, after opening a connection $mysqli, you should tell it to use the same CHARSET/COLLATION as your database/table is using:
$mysqli->set_charset('utf8');
Then special chars will be interpreted perfectly.
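For context, a minimal sketch of the full connection flow; the host, credentials, and table name below are placeholders, not taken from the question:

$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');
$mysqli->set_charset('utf8');   // must match the table's CHARSET/COLLATION

$result = $mysqli->query("SELECT name FROM people WHERE id = 1");
$row = $result->fetch_assoc();
echo $row['name'];              // 'ñ' and 'é' now arrive intact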

MySQL: Inserting Traditional & Simplified Chinese in the same 'cell'

newbie here!
I have source data that contains both simplified and traditional Chinese in the same 'cell' (sorry, newbie using Excel speak here!), which I'm trying to load into MySQL using LOAD DATA INFILE.
The offending text is "到达广州新冶酒吧!一杯芝華士 嘈雜的音樂 行行色色的男女". It's got both simplified Chinese ("广") and traditional Chinese ("華").
When I load it into MySQL, I get the following error:
Error Code: 1366. Incorrect string value: '\xF0\xA3\x8E\xB4\xE8\x83...' for column 'Description' at row 2
The collation of the database is UTF-8 default collation, and the input file is also UTF-8 encoded.
Is there any way I can either:
a) Make SQL accept this row of data (ideal), or
b) Get SQL to skip inserting this line of data?
Thanks! Do let me know if you need further detail.
Kevin
If 😼 was tripping it up, that's because 😼 is not in the Basic Multilingual Plane of Unicode; it's in the Supplementary Multilingual Plane, which is above U+FFFF and takes up 4 bytes in UTF-8 instead of 3. Fully conformant Unicode implementations treat them no differently, but MySQL charset utf8 doesn't accept characters above U+FFFF. If you have a recent version of MySQL, you can ALTER TABLE to use utf8mb4 which properly handles all Unicode characters. There are some catches to changing, as MySQL allocates 4 bytes per character instead of 3; see http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html for the details.
This issue is a duplicate of "Inserting UTF-8 encoded string into UTF-8 encoded mysql table fails with 'Incorrect string value'".
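A sketch of the utf8mb4 fix described above, driven from PHP; the connection details and table name are placeholders (the question only names the Description column):

$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');

// Convert the table so 4-byte characters (above U+FFFF) are accepted.
$mysqli->query("ALTER TABLE mytable
                CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci");

// The connection must also use utf8mb4, otherwise the same error comes back.
$mysqli->set_charset('utf8mb4');

// LOAD DATA INFILE can declare the file's character set explicitly.
$mysqli->query("LOAD DATA INFILE '/tmp/source.csv'
                INTO TABLE mytable
                CHARACTER SET utf8mb4
                FIELDS TERMINATED BY ','");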

Spanish characters in SQL select

I'm working on a Spanish language website where some text is stored in a MS SQL 2008 database table.
The text is stored in the db table with characters such as á, í and ñ.
When I retrieve the data, the characters don't display on the page.
This is probably a very simple fix but please educate me.
You must use Unicode instead of ANSI strings and functions, and must choose a web page encoding that covers the required character set. Some searches on those terms will yield all you need. Look up code page 1252 and ISO 8859 as well in case you get stuck (those are examples, not answers).
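A minimal sketch of what that means in PHP, assuming the Microsoft sqlsrv driver and an NVARCHAR column; every name here (database, table, column) is illustrative:

header('Content-Type: text/html; charset=utf-8');   // the page-encoding part

$conn = sqlsrv_connect('localhost', array(
    'Database'     => 'mydb',
    'CharacterSet' => 'UTF-8',   // return strings as UTF-8 instead of ANSI
));

// The column should be NVARCHAR and Unicode literals prefixed with N'...'.
$stmt = sqlsrv_query($conn, "SELECT texto FROM textos WHERE texto LIKE N'%ñ%'");
while ($row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC)) {
    echo $row['texto'];          // á, í and ñ now display correctly
}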

MySQL Query to Identify bad characters?

We have some tables that were set to the Latin character set instead of UTF-8, which allowed bad characters to be entered into them; the usual culprit is people copy/pasting from Word or Outlook, which copies those nasty hidden characters...
Is there any query we can use to identify these characters to clean them?
Thanks,
I assume that your connection character set was set to UTF8 when you filled the data in.
MySQL replaces unconvertible characters with ? (question marks):
SELECT CONVERT('тест' USING latin1);
----
????
The problem is distinguishing legitimate question marks from illegitimate ones.
Usually, the question marks in the beginning of a word are a bad sign, so this:
SELECT *
FROM mytable
WHERE myfield RLIKE '\\?[[:alnum:]]'
should give a good start.
You're probably noticing something like this 'bug'. The 'bad characters' are most likely stray UTF-8 bytes (e.g. \x80). You might be able to identify them using a query like
SELECT bar FROM foo WHERE LOCATE(UNHEX('80'), bar) != 0
From that linked bug, they recommend using type BLOB to store text from windows files:
Use BLOB (with additional encoding field) instead of TEXT if you need to store windows files (even text files). Better than 3-byte UTF-8 and multi-tier encoding overhead.
Take a look at this Q/A (it's all about your client encoding, a.k.a. SET NAMES).
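As an application-side complement to the queries above, a hedged sketch in PHP; the table and column names are placeholders. It pulls the raw stored bytes and flags rows that are not valid UTF-8, which catches the Word/Outlook cp1252 bytes such as \x93:

$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');
$mysqli->set_charset('utf8');    // i.e. SET NAMES utf8

// CONVERT(... USING binary) returns the bytes exactly as stored.
$result = $mysqli->query(
    "SELECT id, CONVERT(myfield USING binary) AS raw_bytes FROM mytable"
);
while ($row = $result->fetch_assoc()) {
    if (!mb_check_encoding($row['raw_bytes'], 'UTF-8')) {
        printf("Suspect row %s: %s\n", $row['id'], bin2hex($row['raw_bytes']));
    }
}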

Escape characters in MySQL, in Ruby

I have a couple of escaped characters in user-entered fields that I can't figure out.
I know they are the "smart" single and double quotes, but I don't know how to search for them in mysql.
The characters, when output from Ruby, look like \222, \223, \224, etc.
irb> "\222".length => 1
So - do you know how to search for these in mysql? When I look in mysql, they look like '?'.
I'd like to find all records that have this character in the text field. I tried
mysql> select id from table where field LIKE '%\222%'
but that did not work.
Some more information - after doing a mysqldump, this is how one of the characters is represented - '\\xE2\\x80\\x99'. It's the smart single quote.
Ultimately, I'm building an RTF file and the characters are coming out completely wrong, so I'm trying to replace them with 'dumb' quotes for now. I was able to do a gsub(/\222/, "'").
Thanks.
I don't quite understand your problem but here is some info for you:
First, there are no escaped characters in the database, because every character is stored as is, with no escaping.
They don't "look like ?"; that is just wrong terminal settings. A SET NAMES query should always be executed first, to match the client encoding.
You have to determine the character set and use it at every stage: in the database, in the mysql client, and in Ruby.
You should distinguish Ruby's string representation from the character itself.
To enter a character in a MySQL query you can use the CHAR() function, but only in the terminal; in Ruby, just use the character itself.
Smart quotes are multi-byte characters in Unicode. You have to determine your encoding first.
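A hedged sketch of how you might hunt the smart quotes down on the MySQL side (shown via PHP, since the query is the relevant part); the table and column names are placeholders. The Ruby escapes \222, \223 and \224 are the octal cp1252 bytes 0x92-0x94, while the '\\xE2\\x80\\x99' from the mysqldump is the UTF-8 right single quote U+2019:

$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');
$mysqli->set_charset('utf8');   // the SET NAMES step mentioned above

// Find rows containing the UTF-8 smart single quote (bytes E2 80 99).
$result = $mysqli->query(
    "SELECT id FROM mytable
     WHERE myfield LIKE CONCAT('%', UNHEX('E28099'), '%')"
);
while ($row = $result->fetch_assoc()) {
    echo $row['id'], "\n";
}

// Swapping it for a plain apostrophe, if that is the goal:
$mysqli->query(
    "UPDATE mytable
     SET myfield = REPLACE(myfield, UNHEX('E28099'), '''')"
);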