Characters entered by foreign users showing as "?" - html

I'm working on a site that has users from other countries. For the most part we get English text, but sometimes people use special characters such as Chinese characters or an E with an accent (é). These characters are displayed as "?" on the site.
The site has a UTF-8 charset declaration and the SQL Server database field is NVARCHAR. As a test, I went to Google Translate and had it translate "Good morning" into Japanese. When I copied the resulting Japanese text to my site and saved it myself, it worked fine.
What could be causing this issue? I'm guessing the text is being submitted in a charset other than UTF-8. Will accept-charset="UTF-8" on the form resolve the issue? If not, what can I do? Even if there is no way to fix the existing bad data, can I prevent this issue in the future?

SQL Server 7.0 and SQL Server 2000 use a different Unicode encoding (UCS-2) and do not recognize UTF-8 as valid character data.
See the following knowledge base article on storing and retrieving UTF-8 data in an MS SQL Server database: http://support.microsoft.com/kb/232580
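The "?" symptom itself often means the text passed through some stage whose character set could not represent it, for example a VARCHAR column or parameter, or a string literal written without the N prefix, which SQL Server converts to the database's code page. The Java sketch below is only an illustration (not SQL Server code, and the class and method names are made up): it reproduces that loss using windows-1252 as a stand-in for a non-Unicode code page, and contrasts the UTF-16LE bytes behind NVARCHAR with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode text into a (possibly narrower) charset and decode it again.
    // String.getBytes(Charset) silently substitutes the charset's replacement
    // byte, '?' (0x3F), for any character the charset cannot represent.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    // Render bytes as uppercase hex, e.g. {0xC3, 0xA9} -> "C3A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Japanese "ohayou" (U+304A U+306F U+3088 U+3046) has no windows-1252 mapping.
        System.out.println(roundTrip("\u304A\u306F\u3088\u3046", "windows-1252")); // ????
        // e-acute (U+00E9) exists in windows-1252, so it survives the round trip.
        System.out.println(roundTrip("caf\u00E9", "windows-1252").equals("caf\u00E9")); // true
        // The same e-acute as UTF-16LE (the UCS-2 layout behind NVARCHAR) vs UTF-8.
        System.out.println(hex("\u00E9".getBytes(StandardCharsets.UTF_16LE))); // E900
        System.out.println(hex("\u00E9".getBytes(StandardCharsets.UTF_8)));    // C3A9
    }
}
```

If the Japanese text survives when you save it yourself but not when users submit it, a lossy stage like this on the submission path (form encoding, request decoding, or the SQL parameter type) is a reasonable place to look.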

Related

tinybutstrong not showing special characters from mysql

I'm trying to load data from a MySQL DB from a varchar(35) / utf8_swedish_ci field through TBS (TinyButStrong) and PHP, using the MySQL data merge example. The data loads fine if the field contains only ASCII characters, but as soon as I add a single Scandinavian special character such as ö or ä, the field's contents vanish entirely, while the other fields in the row display correctly.
My understanding is that the latest versions of TBS use UTF-8 automatically (I have 3.9.0 for PHP 5), so I assumed it would work out of the box. To be safe, I even set the encoding when loading the template:
$TBS->LoadTemplate('mysql.html','UTF-8');
but to no avail.
Could someone please advise what is causing this?
For UTF-8 processing to work, every element of the chain must be UTF-8.
Make sure your template is UTF-8: check that the file is saved as UTF-8 and that it contains the HTML element <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Make sure all your PHP scripts are saved as UTF-8 and not ANSI.
Also make sure your MySQL connection is set to receive UTF-8 queries and to return UTF-8 data. This can be done, for example, by running the SQL statement SET NAMES 'UTF8' right after connecting.
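One way to act on "every element of the chain" is to check whether a given byte sequence (a template file, or a value as it arrives from the database) really is well-formed UTF-8. The Java sketch below uses a strict CharsetDecoder; the class name is hypothetical. It also shows why a latin1 connection is a plausible culprit: ö sent as latin1 is the single byte F6, which is never valid UTF-8, while its UTF-8 form is C3 B6.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the bytes are well-formed UTF-8. REPORT (rather
    // than REPLACE) makes the decoder throw on the first bad sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00F6".getBytes(StandardCharsets.UTF_8);        // o-umlaut as C3 B6
        byte[] latin1 = "\u00F6".getBytes(StandardCharsets.ISO_8859_1); // o-umlaut as F6
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: a lone F6 is not UTF-8
    }
}
```

To check the template itself, you could pass Files.readAllBytes(Paths.get("mysql.html")) to isValidUtf8, using the file name from the question.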

Character Encoding error when copying double quotes from word or other source

I am using JSP/servlets and a MySQL database. I have an input field, "Introduction". The problem: when a user copy-pastes a paragraph from Word, the double-quote character " is stored as ? in my table, but only when the character is copied from Word or some other source. Also, if a user copies two paragraphs with blank lines in between, a garbage character ends up in my SQL table and the JS that loads the introduction into my JSP page fails. I have also attached a screenshot of this. How can I resolve this?
Microsoft, in its infinite wisdom, decided to use typographic ("curly") double quotes: a left version and a right version. That should be fixable, since those quotes do exist in Unicode (U+201C and U+201D) and therefore in UTF-8.
However, the data from your copy was probably not copied in UTF-8 encoding. Since it is unclear how that is being done, we can't give you complete details on fixing it.
The "best" plan is to establish UTF-8 at every stage: data, client, server, database, table, column, etc.
The quick-and-dirty fix is to replace the curly quotes with ASCII quotes.
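The quick-and-dirty fix can be sketched in Java as below; the class and method names are hypothetical. The decodeByte helper also hints at where the "?" likely comes from: in windows-1252 (the usual Windows "ANSI" code page) Word's curly quotes are the bytes 0x93 and 0x94, but a decoder that assumes ISO-8859-1 reads the same bytes as invisible C1 control characters, which a later conversion may then turn into "?".

```java
import java.nio.charset.Charset;

class SmartQuotes {
    // Replace Word's typographic quotes with plain ASCII ones:
    // U+2018/U+2019 are single quotes, U+201C/U+201D are double quotes.
    static String toAsciiQuotes(String s) {
        return s.replace('\u2018', '\'')
                .replace('\u2019', '\'')
                .replace('\u201C', '"')
                .replace('\u201D', '"');
    }

    // Decode a single byte with the named charset, to compare how charsets read it.
    static String decodeByte(int b, String charsetName) {
        return new String(new byte[] {(byte) b}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String pasted = "\u201CHello\u201D, it\u2019s me";
        System.out.println(toAsciiQuotes(pasted)); // "Hello", it's me
        System.out.println(decodeByte(0x93, "windows-1252").equals("\u201C")); // true
        System.out.println(decodeByte(0x93, "ISO-8859-1").equals("\u0093"));   // true
    }
}
```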

mysql - How to save ñ

Whenever I try to save ñ it becomes ? in the MySQL database. From what I've read, I should change my JSP charset to UTF-8, but for various reasons I have to stick to ISO-8859-1. My database table encoding is latin1. How can I fix this? Please help.
Go to your database administration tool (MySQL Workbench, for example), set the engine to InnoDB and the collation to utf8 / utf8_general_ci.
You state in your question that you require an ISO-8859-1 (latin1) backend and a Unicode (UTF-8) frontend. This setup is crazy, because the character set on the frontend is much larger than the one allowed in the database. The sanest thing would be to use the same encoding throughout the software stack, but using Unicode just for storage would also make sense.
As you should know, a string is a human concept for a sequence of characters. In computer programs it can still be viewed as a sequence of characters, but once it is stored or transmitted it is really a pair: a stream of bytes and an encoding.
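To make the "bytes plus encoding" idea concrete, here is a small Java illustration (class and method names are hypothetical): the same ñ (U+00F1) becomes different bytes under UTF-8 and ISO-8859-1, and reading bytes with the wrong scheme yields either mojibake or the U+FFFD replacement character rather than an error.

```java
import java.nio.charset.StandardCharsets;

class SameTextDifferentBytes {
    // Render bytes as uppercase hex, e.g. {0xF1} -> "F1".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode as UTF-8, then (wrongly) decode as ISO-8859-1.
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Encode as ISO-8859-1, then (wrongly) decode as UTF-8. Bytes that are
    // not valid UTF-8 become the replacement character U+FFFD.
    static String latin1ReadAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String enye = "\u00F1"; // n-tilde, U+00F1
        System.out.println(hex(enye.getBytes(StandardCharsets.UTF_8)));      // C3B1
        System.out.println(hex(enye.getBytes(StandardCharsets.ISO_8859_1))); // F1
        System.out.println(utf8ReadAsLatin1(enye).equals("\u00C3\u00B1"));  // true: mojibake
        System.out.println(latin1ReadAsUtf8(enye).equals("\uFFFD"));        // true
    }
}
```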
Once you understand that passing a String is really passing bytes and a scheme, let's see who sends what:
Browser to HTTP server (usually the same encoding as the form page, so UTF-8. The scheme is specified via Content-Type. If it is missing, the server will pick one based on its own strategy, for example defaulting to ISO-8859-1 or to a configuration parameter)
HTTP Server to Java program (it's Java to Java, so the encoding doesn't matter since we pass String objects)
Java client to MySQL server (the Connector/J documentation is quite convoluted - it uses the character_set_server system variable, possibly overridden by the characterEncoding connection parameter)
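The first hop is where a missing or mismatched charset most often bites. As a hedged Java illustration (names hypothetical, with URLDecoder standing in for the server's own form parsing), a browser on a UTF-8 page submits ñ as the percent-encoded bytes %C3%B1; decoding them as ISO-8859-1, a common server default, produces two wrong characters instead of one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecoding {
    // Decode one application/x-www-form-urlencoded value with the given charset.
    static String decodeParam(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String submitted = "%C3%B1"; // n-tilde as sent by a browser on a UTF-8 page
        System.out.println(decodeParam(submitted, "UTF-8").equals("\u00F1"));            // true
        System.out.println(decodeParam(submitted, "ISO-8859-1").equals("\u00C3\u00B1")); // true (mojibake)
    }
}
```

In a servlet, calling request.setCharacterEncoding("UTF-8") before the first getParameter call tells the container which charset to use for POSTed form data, and Connector/J's characterEncoding parameter (for example ?characterEncoding=UTF-8 on the JDBC URL) covers the last hop; verify both against your container and driver versions.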
To understand where the problem lies, first make sure that the column really is stored as latin1:
SELECT character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = :DATABASE
AND table_name = :TABLE
AND column_name = :COLUMN;
Then write the Java string you get from the request to a log file:
logger.info(request.getParameter("word"));
And finally see what actually is in the column:
SELECT HEX(:column) FROM :table
At this point you'll have enough information to understand the problem. If it's really a question mark (and not a replacement character), MySQL is likely trying to transcode a character from a larger set (let's say Unicode) to a narrower one that doesn't contain it. The strange thing here is that ñ belongs to both ISO-8859-1 (0xF1, decimal 241) and Unicode (U+00F1), so it seems there's a third charset (maybe a code page?) involved in the round trip.
More information would help (operating system, HTTP server, MySQL version).
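To compare the logged value with what HEX() returns, it helps to compute the bytes you would expect. The Java sketch below does that and gives an illustrative reading of three likely outcomes for a single ñ in a latin1 column; the class, method names and messages are hypothetical. It uses Java's ISO-8859-1 as an approximation of MySQL's latin1 (which is really closer to windows-1252, though both map ñ to F1).

```java
import java.nio.charset.Charset;
import java.util.Locale;

class HexDiagnosis {
    // Render bytes as uppercase hex, matching the style of MySQL's HEX().
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hex that HEX(column) should show if the value was stored intact.
    static String expectedHex(String value, String columnCharset) {
        return hex(value.getBytes(Charset.forName(columnCharset)));
    }

    // Illustrative reading of HEX() output for a single n-tilde in a latin1 column.
    static String explainLatin1(String hexFromMysql) {
        switch (hexFromMysql.toUpperCase(Locale.ROOT)) {
            case "F1":
                return "stored correctly";
            case "C3B1":
                return "UTF-8 bytes stored as two latin1 characters";
            case "3F":
                return "real question mark: lost in a charset conversion";
            default:
                return "unexpected: check the connection charset";
        }
    }

    public static void main(String[] args) {
        String logged = "\u00F1"; // the value written to the log
        System.out.println(expectedHex(logged, "ISO-8859-1")); // F1
        System.out.println(explainLatin1("c3b1"));
    }
}
```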
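Change your database and table encoding to UTF-8.
This command changes the database's default character set (it applies to tables created afterwards; existing tables keep their own setting):
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
And this one converts an existing table, including the data already in it: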
ALTER TABLE db_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
Change your table collation to utf8_spanish_ci, where ñ is not equal to n; if you want both characters to compare as equal, use utf8_general_ci instead.
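The difference between the two collations can be approximated in Java as below. This is a rough stand-in, not MySQL's collation algorithm, and all names are hypothetical: foldGeneral strips accents and case (so ñ compares equal to n, as with utf8_general_ci), while foldSpanish keeps ñ as a distinct letter (as with utf8_spanish_ci).

```java
import java.text.Normalizer;
import java.util.Locale;

class CollationSketch {
    // Rough stand-in for utf8_general_ci: decompose (NFD), drop combining
    // marks, lower-case. So n-tilde folds to plain n.
    static String foldGeneral(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Rough stand-in for utf8_spanish_ci: the same folding, except that
    // n-tilde (either case) is kept as its own letter. NFC first, so a
    // decomposed n + combining tilde is recognized too.
    static String foldSpanish(String s) {
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder();
        composed.codePoints().forEach(cp -> {
            if (cp == 0x00F1 || cp == 0x00D1) {
                sb.append('\u00F1');
            } else {
                sb.append(foldGeneral(new String(Character.toChars(cp))));
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(foldGeneral("MU\u00D1OZ").equals(foldGeneral("munoz"))); // true
        System.out.println(foldSpanish("MU\u00D1OZ").equals(foldSpanish("munoz"))); // false
    }
}
```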
I tried several combinations, but this works for me:
VARCHAR(255) BINARY CHARACTER SET utf8 COLLATE utf8_bin
When the data is retrieved in dbForge Express, it shows as:
NIÑA
but in the application it shows as:
NIÑA
I had the same problem. It turned out not to be an issue of UTF-8 or any other charset. I imported my data from Windows ANSI, and all my Ñ and ñ were put in the database exactly as they should be. For example, last names showed in the database as last_name = "MUÑOZ". I was able to select normally from the database with the query Select * from database where last_name LIKE "%muñoz%", and phpMyAdmin showed me the results fine: it selected all "MUÑOZ" and "MUNOZ" without a problem. So phpMyAdmin does show all my Ñ and ñ without any problems.
The problem was the program itself. All the characters I mentioned showed up as you describe, with the funky "MU�OZ" question mark. I had followed all the advice everywhere: I set my headers correctly and tried every charset available. I even used Google Fonts and whatever other font was available to display those last names correctly, but with no success.
Then I remembered an old program that was able to do the trick back and forth transparently, and I peeked into its code to figure it out: the problem was the database itself, which held my special characters as-is. Remember, I uploaded using Windows ANSI encoding, and phpMyAdmin did as expected and uploaded everything as instructed.
The old program fixed this problem by translating Ñ to its HTML numeric character reference, &#209; (see the chart here: https://www.compart.com/en/unicode/U+00D1), a conversion done back and forth between MySQL and the app.
So you just need to change the database strings containing Ñ and ñ to their corresponding character references (&#209; and &#241;) so that they display correctly in a browser using a UTF-8 charset.
In my case, I solved my issues by replacing every Ñ and ñ with its corresponding reference in all the last names in my database.
UPDATE table_name
SET
last_name = REPLACE(last_name,
'MUÑOZ',
'MU&#209;OZ');
Now I'm able to display, browse, and even search all my last names correctly, with the accents and tildes proper to the Spanish language. I hope this helps. It was a pain to figure out, but an old program solved the problem. Best regards and happy coding!
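The conversion the old program performed can be sketched in Java as follows (names hypothetical, and not necessarily that program's exact logic): every non-ASCII code point becomes a decimal numeric character reference, so Ñ (U+00D1) becomes &#209;. Note that storing HTML references in the database ties the data to HTML output.

```java
class NumericEntities {
    // Replace each non-ASCII code point with a decimal numeric character
    // reference, e.g. U+00D1 -> "&#209;". Iterating by code point means a
    // character outside the BMP yields one reference, not two.
    // Note: this does not escape '&', '<' or '>'; real HTML output needs that too.
    static String toNcr(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNcr("MU\u00D1OZ")); // MU&#209;OZ
        System.out.println(toNcr("ni\u00F1a"));  // ni&#241;a
    }
}
```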

Spanish characters in SQL select

I'm working on a Spanish language website where some text is stored in a MS SQL 2008 database table.
The text is stored in the db table with characters such as á, í and ñ.
When I retrieve the data, the characters don't display on the page.
This is probably a very simple fix but please educate me.
You must use Unicode instead of ANSI strings and functions, and you must choose a web page encoding that contains the required characters. Some searches on those terms will yield all you need. Look up code page 1252 (windows-1252) and ISO-8859-1 as well in case you get stuck (examples, not answers).
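Whether a page encoding "has the required character set" can be tested directly with Java's CharsetEncoder.canEncode. In the sketch below (names hypothetical), á, í and ñ fit in windows-1252 (code page 1252) and in ISO-8859-1, whereas a character such as Polish ę does not fit in windows-1252; UTF-8 covers all of them.

```java
import java.nio.charset.Charset;

class PageEncodingCheck {
    // True if every character of text can be represented in the named charset.
    static boolean fits(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String spanish = "\u00E1\u00ED\u00F1"; // a-acute, i-acute, n-tilde
        System.out.println(fits(spanish, "windows-1252"));  // true
        System.out.println(fits(spanish, "ISO-8859-1"));    // true
        System.out.println(fits("\u0119", "windows-1252")); // false (Polish e-ogonek)
        System.out.println(fits("\u0119", "UTF-8"));        // true
    }
}
```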

how to show special characters on my site?

I am currently working on a site that connects to a DB and brings back information. Some of this information has special characters because it is in Polish. For example, in the database I have ę, but e is printed on my web page. I already added the meta tag
<meta charset="ISO-8859-2">
but it doesn't work. It only works if I write &#281;, which is not practical and needs a lot of work. My question: has anybody managed to get a character like ę and print it as is?
Thanks.
Make sure that:
the data really is in ISO-8859-2
the data isn't being corrupted by the configuration of the database
the HTTP headers aren't claiming the data is encoded a different way
whatever you are using to pull the data out of the database isn't transcoding it
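The first and third checks come down to the same thing: a byte only means a letter once a charset is agreed on. In the Java sketch below (names hypothetical), ę stored as ISO-8859-2 is the single byte EA; read back as ISO-8859-2 it is ę again, but if any stage labels it ISO-8859-1 it becomes ê.

```java
import java.nio.charset.Charset;

class PolishBytes {
    static byte[] encodeAs(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    static String decodeAs(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] stored = encodeAs("\u0119", "ISO-8859-2"); // e-ogonek -> single byte EA
        System.out.println(decodeAs(stored, "ISO-8859-2").equals("\u0119")); // true
        System.out.println(decodeAs(stored, "ISO-8859-1").equals("\u00EA")); // true: e-circumflex
        // With UTF-8 at both ends, the letter survives as the two bytes C4 99.
        System.out.println(decodeAs(encodeAs("\u0119", "UTF-8"), "UTF-8").equals("\u0119")); // true
    }
}
```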
You should also ditch ISO-8859-2 (it is a legacy encoding) and move to UTF-8.
Use a numeric character reference: &#NNNN; where NNNN is the character's Unicode code point in decimal, or &#xHHHH; with the code point in hexadecimal (ę is &#281; or &#x119;).