I'm working on a Spanish language website where some text is stored in a MS SQL 2008 database table.
The text is stored in the db table with characters such as á, í and ñ.
When I retrieve the data, the characters don't display on the page.
This is probably a very simple fix but please educate me.
You must use Unicode instead of ANSI strings and functions, and must choose a web page encoding that has the required character set. Some searches on those terms will yield all you need. Look up content type 1252 and 8859 as well in case you get stuck (examples, not answers).
Related
I am developing a web app using MySQL and PHP. Some of the users will be Chinese so that I need to test my program with Chinese characters. As Chinese characters are longer that the usual ASCII characters, it happens that the MySQL field may be shorter than the string with Chinese characters.
I tried to limit the lenth in the input two times smaller than the MySQL field as in the example below:
<input name='field_name' maxlength='5'> in HTML
field_name VARCHAR(10) in MySQL (all my field are encoded with utf8_unicode_ci)
Nevertheless, the string '好好好好好‘ that have 5 characters would be truncated.
That is an important issue because truncated strings ends with the symbol "�" and jQuery ajax calls (json) reject an error.
That is why I would like to know how to secure on client and server sides these inputs so that the integrity of data would not be impacted in any situation (or at least be displayed even it is truncated).
Thanks!
The UTF 8 use one character for English character,
But UTF 8 use three character for Chinese character
My SQL Server 2008 R2 database has string columns (nvarchar). And some of the old data is showing Ascii. I need to show it to the user in my site and I prefer to convert the data in the database to Unicode. Is there a quick way to do this? are there downsides that I should be aware of?
Examples to my issue:
In the database, I see special chars instead of regulars chars. I have a name of a user which is supposed to be Amédée, and instead it shows Am?d??.
In other cases I see " instead of Quotation mark ("), or the chars &# instead of the word "and".
We have some tables that were set with the Latin character set instead of UTF-8 and it allowed bad characters to be entered into the tables, the usual culprit is people copy / pasting from Word or Outlook which copys those nasty hidden characters...
Is there any query we can use to identify these characters to clean them?
Thanks,
I assume that your connection chacater set was set to UTF8 when you filled the data in.
MySQL replaces unconvertable characters with ? (question marks):
SELECT CONVERT('тест' USING latin1);
----
????
The problem is distinguishing legitimate question marks from illegitimate ones.
Usually, the question marks in the beginning of a word are a bad sign, so this:
SELECT *
FROM mytable
WHERE myfield RLIKE '\\?[[:alnum:]]'
should give a good start.
You're probably noticing something like this 'bug'. The 'bad characters' are most likely UTF-8 control characters (eg \x80). You might be able to identify them using a query like
SELECT bar FROM foo WHERE bar LIKE LOCATE(UNHEX(80), bar)!=0
From that linked bug, they recommend using type BLOB to store text from windows files:
Use BLOB (with additional encoding field) instead of TEXT if you need to store windows files (even text files). Better than 3-byte UTF-8 and multi-tier encoding overhead.
Take a look at this Q/A (it's all about your client encoding aka SET NAMES )
I'm working on a site that has users from other countries. For the most part we get English text but sometimes people use special characters like Chinese symbols or the E with the accent. These symbols are displaying as "?" when shown on the site.
The site has a UTF-8 charset declaration and the SQL Server database field is Nvarchar. I did a test by going to Google translate and having it translate "Good morning" into Japanese. When I copied the resulting Kanji to my site and saved it myself it worked fine.
What could be causing this issue? I'm guessing it's because the text is being entered in a charset that is not UTF-8. Will accept-charset="UTF-8" resolve the issue? If not what can I do? Even if there is no way to fix existing bad data can I prevent this issue in the future?
SQL Server 7.0 and SQL Server 2000 use
a different Unicode encoding (UCS-2)
and do not recognize UTF-8 as valid
character data.
See the following knowledge base article for dealign with storing/retreieving utf-8 data in a MS SQL Server database: http://support.microsoft.com/kb/232580
I have a couple escaped characters in user-entered fields that I can't figure out.
I know they are the "smart" single and double quotes, but I don't know how to search for them in mysql.
The characters in ruby, when output from Ruby look like \222, \223, \224 etc
irb> "\222".length => 1
So - do you know how to search for these in mysql? When I look in mysql, they look like '?'.
I'd like to find all records that have this character in the text field. I tried
mysql> select id from table where field LIKE '%\222%'
but that did not work.
Some more information - after doing a mysqldump, this is how one of the characters is represented - '\\xE2\\x80\\x99'. It's the smart single quote.
Ultimately, I'm building an RTF file and the characters are coming out completely wrong, so I'm trying to replace them with 'dumb' quotes for now. I was able to do a gsub(/\222\, "'").
Thanks.
I don't quite understand your problem but here is some info for you:
First, there are no escaped characters in the database. Because every character being stored as is, with no escaping.
they don't "look ilke ?". I's just wrong terminal settings. SET NAMES query always should be executed first, to match client encoding.
you have to determine character set and use it on every stage - in the database, in the mysql client, in ruby.
you should distinguish ruby strings representation from character itself.
To enter character in the mysql query, you can use char function. But in terminal only. In ruby just use the character itself.
smart quotes looks like 2-byte encoded in the unicode. You have to determine your encoding first.