I am trying to learn about internationalization in websites, so I am experimenting with the MySQL config file, field collations and the HTML header character set.
I basically have a form where I type some unicode characters in a text field, store it in a database, then output it back to the browser.
First scenario: HTML=>utf8 MySQL=>UTF8, that worked OK. However, when I viewed the database in phpMyAdmin, the field contained weird characters.
Second scenario: I configured the VARCHAR in the database to be Latin1 by choosing a swedish_ci collation. The HTML remained utf8. I entered a unicode string in the form. Yet, the browser still displays the correct characters that I entered!!!
To make things more complicated for me to comprehend, I downloaded the MySQL world database, which is a database of all countries and cities of the world. The tables are Latin1 encoded. When I try to display them in a utf8 HTML page, it displays weird characters for non-English characters. It works OK when my HTML character set is ISO-8859-1.
If you are using UTF-8, you should always use it everywhere: UTF-8 for the HTML, the utf8 charset in MySQL, and utf8_general_ci or utf8_unicode_ci for the MySQL collation. Your PHP document must also be saved as UTF-8 without a BOM.
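The third scenario (Latin1 data shown on a utf8 page) can be reproduced without a database. The following is a minimal Python sketch, assuming the server hands the Latin1 bytes to the browser unchanged; the city name is just a sample value.

```python
# A city name as a latin1 table would hold it: one byte per character.
stored = "São Paulo".encode("latin-1")

# Page declared as ISO-8859-1: every byte maps to the intended character.
as_latin1 = stored.decode("latin-1")

# Page declared as UTF-8: the byte 0xE3 is not a valid UTF-8 sequence here,
# so a browser-like decoder substitutes U+FFFD (the replacement character).
as_utf8 = stored.decode("utf-8", errors="replace")

print(stored.hex())
print(as_latin1)
print(as_utf8)
```

The same bytes are correct or garbled depending only on the charset the page declares, which is why the declaration has to match what the connection actually delivers.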
I have a website form written in Perl that saves user input in multiple languages to a MySQL database. While it has worked perfectly, saving and displaying all characters without problems, in phpMyAdmin the characters always displayed with errors. However, I ignored this since the website was displaying characters OK.
Now I've just recently moved the website to a VPS and the database has seemingly enforced utf8mb4 encoding on the data, so it is now displaying character errors on the site. I'm not an expert and find the whole encoding area quite confusing. My question is, how can I:
a) determine how my data is actually encoded in my table?
b) convert it correctly to utf8mb4 so it displays correctly in phpMyAdmin and my website?
All HTML pages use the charset=utf8 declaration. MySQL connection uses mysql_enable_utf8 => 1. The table in my original database was set to utf8_general_ci collation. The original database collation (I just noticed) was set to latin1_swedish_ci. The new database AND table collation is utf8mb4_general_ci. Thanks in advance.
SHOW CREATE TABLE will tell you the default CHARACTER SET for the table. For any column(s) that overrode the default, the column will specify what it is set to.
However, there could be garbage in the column. Many users have encountered this problem when they stored utf8 bytes into a latin1 column. This leads to "Mojibake" or "double encoding".
The only way to tell what is actually stored there is to SELECT HEX(col). A Western European accented character will be
one byte for a latin1 character stored in a latin1 column.
2 bytes for a utf8 character, whether it was stored as 1 utf8 character or misread as 2 latin1 characters.
several bytes for "double encoding", where the utf8 bytes were converted to utf8 a second time (for example C383C2A9 for é).
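To make the three cases concrete, here is a small Python sketch that produces the hex you would expect from SELECT HEX(col) for the single character é. It only models the byte conversions; Python's "latin-1" codec stands in for MySQL's latin1 here, which is an approximation.

```python
ch = "é"

# Case 1: latin1 character in a latin1 column, one byte.
latin1_hex = ch.encode("latin-1").hex().upper()

# Case 2: utf8 character, two bytes (the same bytes whether the column is
# utf8 or the bytes were put into a latin1 column unconverted).
utf8_bytes = ch.encode("utf-8")
utf8_hex = utf8_bytes.hex().upper()

# Case 3: double encoding. The two utf8 bytes are misread as two latin1
# characters and each of those is converted to utf8 again.
double_hex = utf8_bytes.decode("latin-1").encode("utf-8").hex().upper()

print(latin1_hex)
print(utf8_hex)
print(double_hex)
```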
More discussion: Trouble with UTF-8 characters; what I see is not what I stored
So, I have a database and I use Navicat. We have a simple PHP website which is a few years old and we've upgraded the site to UTF8.
We have 'activities' on the site which handle UTF8 special characters perfectly, but we also have 'comments' on the site and curly single quotes and other special characters show me a �.
The database was converted to UTF8 via:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
When I look at both databases in Navicat, I can see both are UTF8 and utf8_general_ci.
When I open the 'activities' table in design view, I can see the cell is a mediumText and is set up with UTF8. When I open the 'comments' table in design view, the cell that isn't working is a Blob and it doesn't have any character encoding info.
We're doing a pretty basic SELECT and then displaying via $variable[column].
Does anyone know why the 'activities' would work perfectly with UTF8 and the 'comments' would have issues? We're not doing anything super fancy to either of them.
I have tried converting the Blob to a text field, but when I do that the database then escapes itself when it's outputting to the page, so as soon as there is a single quote in the text it cuts off.
I have tried things like utf8_encode, stripslashes, mysql_real_escape_string, htmlentities, htmlspecialchars, but I'm not sure any of them would help anyway.
Thanks!
Blob means binary large object. Raw binary data has no character encoding attached to it, so CONVERT TO CHARACTER SET leaves the stored bytes untouched.
So you have latin1 (or whatever) data in a blob, and you display it and treat it as if it were utf-8 data.
You need to manually convert the data using PHP or whatever.
Here is a good article from the performanceblog that describes what you can do:
http://www.mysqlperformanceblog.com/2013/10/16/utf8-data-on-latin1-tables-converting-to-utf8-without-downtime-or-double-encoding/
If you have problems running your queries, use the console instead of phpMyAdmin, and don't forget to set the connection encoding through SET NAMES:
master> ALTER TABLE t CONVERT TO CHARACTER SET utf8, CHANGE comment comment TEXT;
master> SET NAMES utf8;
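The manual conversion mentioned above can be sketched in a few lines. This is only an illustration in Python rather than PHP, and it assumes the blob bytes were written by a Windows-1252 (cp1252) client, which is where curly quotes usually come from; the sample comment is hypothetical.

```python
# Hypothetical blob content: a curly apostrophe stored as the cp1252 byte 0x92.
raw = b"It\x92s a test"

# What the page shows today: the byte is invalid as UTF-8, so it becomes U+FFFD.
shown_as_utf8 = raw.decode("utf-8", errors="replace")

# Manual conversion: decode with the encoding the bytes were really written in
# (assumed cp1252 here), then re-encode as UTF-8 before storing or printing.
fixed_text = raw.decode("cp1252")
fixed_bytes = fixed_text.encode("utf-8")

print(shown_as_utf8)
print(fixed_text)
print(fixed_bytes.hex())
```

If the bytes turn out to be in some other encoding, only the codec name in the decode call changes; checking a few rows with SELECT HEX(col) first is the safer way to find out.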
I want to insert a 4-byte character into the table via phpMyAdmin (phpMyAdmin version is 5.5.33).
I set the server connection collation to utf8mb4_general_ci;
The database uses utf8mb4 encoding;
The table and column use utf8mb4 encoding;
I tried to insert the 𩸽 symbol and it succeeded without any errors! But in the table this symbol is displayed as ????.
Can someone help, please?
I would recommend checking the encoding used by the web application, because your problem is not the data itself but the program that is printing it. If your PHP administration tool, or the web server (most probably Apache) hosting it, is not using your character encoding, you won't see your character. Most of these applications use UTF-8, so make sure the page encoding is UTF-8 as well. Note that MySQL's utf8 charset (with utf8_general_ci) stores at most 3 bytes per character, so a 4-byte character such as 𩸽 needs the column and the connection to stay on utf8mb4.
Your question is most probably related to this one: How to display UTF-8 characters in phpMyAdmin?
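As a quick check of why this particular symbol is special, the sketch below measures its UTF-8 length in Python. It assumes the character in the question is U+29E3D; the 3-byte limit is that of MySQL's utf8 (utf8mb3) charset.

```python
# The symbol from the question, written as an escape (assumed to be U+29E3D).
ch = "\U00029E3D"

encoded = ch.encode("utf-8")
byte_count = len(encoded)

# MySQL's utf8 (utf8mb3) holds at most 3 bytes per character; utf8mb4 holds 4.
fits_in_utf8mb3 = byte_count <= 3
fits_in_utf8mb4 = byte_count <= 4

print(byte_count, encoded.hex())
print(fits_in_utf8mb3, fits_in_utf8mb4)
```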
I'm retrieving photos info from Flickr into my site.
For each photo, the URL, title, and tags are saved in MySQL database.
I read a lot about the most suitable character set for such values and I found it's mostly a choice between utf8 and latin1.
Some titles and tags include symbols like the copyright (or similar ones).
Will I be OK with Latin1 character set?
Always use utf-8, everywhere, in your database, text editor, html metas, charset headers....
This is the only advice I can give you.
Avoid utf-8 in your database if the stored text is 100% ASCII. ASCII characters still take one byte each in utf-8, but MySQL reserves up to three bytes per character for utf8 index keys, which could mean a significant performance hit if the text is indexed.
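Whether Latin1 is enough for the Flickr titles and tags depends on which characters actually occur in them. A rough way to check a sample is sketched below in Python; it uses the cp1252 codec as a stand-in for MySQL's latin1 (which is based on cp1252), and the sample titles are made up.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character can be stored in a cp1252-style latin1 column."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical titles of the kind a photo site might return.
samples = ["Sunset © 2010", "Café del Mar", "東京タワー", "Привет"]
results = {title: fits_latin1(title) for title in samples}

for title, ok in results.items():
    print(title, ok)
```

The copyright sign and Western European accents fit, but anything in Japanese, Russian and so on does not, which is the practical argument for utf8 on user-supplied text.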
I am using UTF-8 encoding on my website. Lately I have been storing Chinese/Spanish/Russian names in my MySQL tables and then printing them with PHP on a page generated with a charset of UTF-8. The page works fine and I see all the letters correctly. However, I just realized that my table is set to the latin1_swedish_ci collation. How is it possible that even though I stored these names as latin1_swedish_ci, serving them on my site as UTF-8 still shows them correctly?
Thanks!
Joel
Because the MySQL connection is still using latin1, the server does no conversion in either direction.
You should treat this data as UTF-8 bytes stored in a latin1 environment.
So, to prove it,
show variables like '%char%';
The above should show that most of the settings are latin1.
apply
set names utf8;
And you will see all the UTF-8 text become double encoded (garbled).
You may be interested in this thread:
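The effect can be simulated outside MySQL. The following Python sketch follows the bytes through both situations; it is a simplified model that assumes the server passes bytes through untouched when connection and column are both latin1, and converts latin1 to utf8 once SET NAMES utf8 is applied. The name is a sample value.

```python
name = "Muñoz"

# The browser submits UTF-8 bytes. With a latin1 connection and a latin1
# column, the server stores them as is (each byte counted as one character).
stored = name.encode("utf-8")

# Same latin1 connection on read: bytes come back unchanged, and the UTF-8
# page decodes them correctly.
shown_before = stored.decode("utf-8")

# After SET NAMES utf8 the server converts the "latin1" column to utf8,
# so every stored byte is re-encoded: this is the double encoding.
sent_after = stored.decode("latin-1").encode("utf-8")
shown_after = sent_after.decode("utf-8")

print(shown_before)
print(shown_after)
```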
A script to change all tables and fields to the utf-8-bin collation in MYSQL
That thread deals with ways of repairing the same thing you have done.
Note that when a utf8 varchar is used in a primary key, each character counts as three bytes against the key length limit, so the maximum length for a single primary key column is varchar(333).
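The 333 figure is simple division. The sketch below assumes a 1000-byte key length limit, which is the value that yields 333 for 3-byte utf8; other storage engines and row formats have different limits, so treat the numbers as illustrative.

```python
def max_indexed_chars(key_limit_bytes: int, max_bytes_per_char: int) -> int:
    """Longest VARCHAR (in characters) that fits in one index key."""
    return key_limit_bytes // max_bytes_per_char

KEY_LIMIT = 1000  # assumed key length limit in bytes

latin1_max = max_indexed_chars(KEY_LIMIT, 1)   # 1 byte per character
utf8_max = max_indexed_chars(KEY_LIMIT, 3)     # MySQL utf8: up to 3 bytes
utf8mb4_max = max_indexed_chars(KEY_LIMIT, 4)  # utf8mb4: up to 4 bytes

print(latin1_max, utf8_max, utf8mb4_max)
```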