Varchar Encoding in MySQL - mysql

I am coming to MySQL from MS SQL. My understanding is that, in order to store international characters, I need to declare the field as varchar with the UTF-8 character set.
I am using Sequel Pro to develop a MySQL database. When I manually enter international characters into a field in my table, they are not stored correctly and turn into question marks (?????).
Could someone please point me in the right direction?

A simple example:
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
Source: http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html
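The question marks usually mean that, somewhere between the client and the column, the text passed through a character set that cannot represent it. Here is a minimal Python sketch of that effect, assuming the lossy step is a conversion to latin1. The function name store_as is made up for illustration and is not part of MySQL or Sequel Pro.

```python
def store_as(text: str, charset: str) -> str:
    """Simulate writing text through a charset and reading it back.

    Characters the charset cannot represent are replaced with '?',
    similar to the substitution seen in the question.
    """
    raw = text.encode(charset, errors="replace")
    return raw.decode(charset)


sample = "Привет"  # six Cyrillic letters, none of which exist in latin1
latin1_result = store_as(sample, "latin-1")
utf8_result = store_as(sample, "utf-8")

print(latin1_result)
print(utf8_result)
```

If the text survives the utf-8 round trip but not the latin1 one, the column (or the connection) is most likely not using UTF-8 yet.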

Related

Strange character appearing in MySQL database field

I've imported some data from an old FileMaker program into a mysql database and I've noticed some strange characters. What is the best way to get rid of all these?
It seems the data you imported into your MySQL DB uses a different character set than the one you chose when you built the table in MySQL.
My guess is that those "strange characters" are emoji. MySQL's utf8 charset stores at most three bytes per character, so it cannot hold them, but utf8mb4 can.
The short answer is: find out which charset the data was originally created in, and build your new table with the same charset.
MySQL Workbench
The charset can be changed easily in MySQL Workbench: right-click the schema, then the table in question, and click the wrench icon.
SQL Syntax
Or via something similar to:
ALTER TABLE YourTableName CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE YourTableName MODIFY name TEXT CHARACTER SET utf8mb4;
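The difference between utf8 and utf8mb4 comes down to byte length: MySQL's utf8 (also called utf8mb3) holds characters of up to three bytes, while emoji need four bytes in UTF-8. Here is a small Python sketch to find which characters in a string would need utf8mb4. The helper name needs_utf8mb4 is hypothetical.

```python
from typing import List


def needs_utf8mb4(text: str) -> List[str]:
    """Return the characters whose UTF-8 encoding is longer than 3 bytes."""
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]


plain = needs_utf8mb4("Shûsuke")            # accented Latin letters fit in 2 bytes
with_emoji = needs_utf8mb4("ok \U0001F600")  # U+1F600 is a grinning-face emoji

print(plain)
print(with_emoji)
```

Running something like this over a sample of the imported data is a quick way to tell whether emoji (or other 4-byte characters) are really the cause.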
I believe your varchar length is being exceeded.
For example, if the field is varchar(10) and you insert "STRING OVER 10 CHARACTERS", the value does not fit, which may leave garbage in the string.

Data from MySQL database doesn't display on website

I started my project with a UTF8 database for development. When the project was nearly complete, my professor gave me a database in Latin1. Because of this charset conflict my project does not work properly; as a result, my server-side code cannot access the database.
I tried several times to convert my professor's database to UTF8, but it failed.
I am still facing the issue. Kindly assist me in this regard.
Thank you!
You may try a charset converter such as MySQL charset converter.
This PHP script can be used to convert your MySQL database tables from latin1-to-UTF8 while retaining the integrity of all internal multibyte characters.
You could try this:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
Replace charset_name with latin1.
You can also change columns individually:
ALTER TABLE TableName MODIFY ColumnName ColumnType CHARACTER SET latin1;

Migrating from SQL server to MySQL using pentaho unicode issue

I have a problem migrating data from SQL Server to MySQL. I have nvarchar columns in SQL Server and am exporting them to a Unicode text file. But when I import the column into a utf-8 table in MySQL I get a duplicate-value error: MySQL sees no difference between 'Kaneko, Shûsuke' and 'Kaneko, Shusuke'. I am trying to get these values into a unique column.
What's wrong?
Must I use another charset in MySQL?
I also tried converting the text file to utf8 before importing it into MySQL, but I still get the same error.
The problem seems to be in how your MySQL table was created. First run SHOW CREATE TABLE at the mysql prompt and look at the table structure. Have you used the right charset and collation? You can read more in the MySQL docs.
Many collations are not only case insensitive but also partly accent insensitive, so á = a. (I originally also wrote ñ = n; as Joni Salonen points out, that is incorrect.)
So we can use a binary collation, but it has its own drawback. A binary collation compares your strings exactly as strcmp() in C would, so any difference between characters (even just case or diacritics) makes the strings unequal. The downside is that the sort order is not natural.
An example of unnatural sort order (as in a binary collation): A, B, a, b. A natural sort order would be, for example: A, a, B, b (lowercase and uppercase variants of the same letter are sorted next to each other).
The practical advantage of a binary collation is its speed, since string comparison is very simple and fast. In general, indexes with a binary collation might not produce the expected sort order, but they can be useful for exact matches.
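The two sort orders described above can be reproduced with a few lines of Python. This is only an illustration of the idea: comparing raw bytes stands in for a binary collation, and a case-folded key stands in for a "natural" one. It is not how MySQL implements collations.

```python
words = ["b", "A", "a", "B"]

# Binary order: compare the raw UTF-8 bytes, as strcmp() would.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# "Natural" order: compare case-insensitively first, then break ties
# by the original string so that uppercase comes before lowercase.
natural_order = sorted(words, key=lambda w: (w.casefold(), w))

print(binary_order)
print(natural_order)
```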
Use a binary collation for the specific column (possibly your best bet).
For example:
DROP TABLE IF EXISTS cc;
CREATE TABLE cc ( c CHAR(100) PRIMARY KEY ) DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO cc VALUES ( 'Kaneko, Shûsuke' );
INSERT INTO cc VALUES ( 'Kaneko, Shusuke' );
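To see why the two names collide under an accent-insensitive collation but not under a binary one, here is a rough Python sketch. It is only an approximation: fold is a hypothetical helper that strips combining marks and case, which is not how MySQL actually implements its collations.

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate an accent- and case-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


a = "Kaneko, Shûsuke"
b = "Kaneko, Shusuke"

# Roughly what an accent-insensitive *_ci collation sees: a duplicate key.
same_when_folded = fold(a) == fold(b)

# Roughly what utf8_bin sees: two different byte strings.
same_as_bytes = a.encode("utf-8") == b.encode("utf-8")

print(same_when_folded, same_as_bytes)
```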

store polish characters mysql

I am trying to save characters like "ą", "ć", "ł" but they are saved in the database as question marks (I save them using phpMyAdmin).
The database and table's collation is utf8_bin.
Try changing the collation to:
utf8_unicode_ci
or
utf8_polish_ci
You can refer to: http://mysql.rjweb.org/doc.php/charcoll
You can also try altering the specific column with:
ALTER TABLE tbl MODIFY COLUMN txt TEXT CHARACTER SET utf8;
I've searched a lot and finally, I got a solution with this:
ALTER TABLE tableName MODIFY COLUMN columnName VARCHAR(64) CHARACTER SET `binary`;
You can change VARCHAR(64) to match your needs. I hope this helps someone. Note that I needed to store not only Polish characters but French and Spanish ones as well, so the solutions above might work only for Polish characters.
You can also change the column from varchar to nvarchar. Then, when inserting values into the DB, remember to add N before the literal, as follows: N'ŁÓDŹ' (in persistence frameworks you should have some kind of NString representation).
From the documentation:
Nvarchar stores UNICODE data. If you have requirements to store UNICODE or multilingual data, nvarchar is the choice. Varchar stores ASCII data and should be your data type of choice for normal use. Regarding memory usage, nvarchar uses 2 bytes per character, whereas varchar uses 1.

Is this a safe way to convert MySQL tables from latin1 to utf-8?

I need to change all the tables in one of my databases from latin1 to utf-8 (with utf8_bin collation).
I have dumped the database, created a test database from it, and run the following without any errors or warnings for each table:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin;
Is it safe for me to repeat this on the real database? The data seems fine by inspection...
There are 3 different cases to consider:
The values are indeed encoded using Latin1
This is the consistent case: declared charset and content encoding match. This was the only case I covered in my initial answer.
Use the command you suggested:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin
Note that CONVERT TO CHARACTER SET only appeared in MySQL 4.1.2, so anyone using a database installed before 2005 had to use an export/import trick. This is why there are so many legacy scripts and documents on the Internet doing it the old way.
The values are already encoded using utf8
In this case, you don't want MySQL to convert any data; you only need to change the column's metadata.
For this, you have to change the type to BLOB first, then to TEXT utf8 for each column, so that there are no value conversions:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
This is the recommended way, and it is explicitly documented in the ALTER TABLE syntax documentation.
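Why the BLOB step matters can be shown with plain byte handling. The sketch below is Python rather than SQL and only imitates what the server does, under the assumption that the column is declared latin1 but actually holds UTF-8 bytes. The variable names are illustrative, and Python's latin-1 codec is used as a stand-in for MySQL's latin1.

```python
original = "Łódź"
stored = original.encode("utf-8")  # the bytes really sitting in the column

# CONVERT TO on a column labelled latin1: each byte is read as a latin1
# character and re-encoded as UTF-8, so the text ends up encoded twice.
converted = stored.decode("latin-1").encode("utf-8")

# BLOB trick: the bytes are left untouched and only relabelled as utf8.
relabelled = stored

double_encoded_text = converted.decode("utf-8")
relabelled_text = relabelled.decode("utf-8")

print(repr(double_encoded_text))
print(relabelled_text)
print(len(stored), len(converted))
```

The relabelled value reads back as the original text, while the converted one is longer and no longer matches it.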
The values use a different encoding
The default encoding was Latin1 for several years on some Linux distributions. In this case, you have to use a combination of the two techniques:
Fix the table metadata, using the BLOB type trick.
Convert the values using CONVERT TO.
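Which of the three cases applies depends on the bytes actually stored. As a rough heuristic (an assumption of this sketch, not something MySQL provides), one can take a column's raw bytes and test whether they decode as UTF-8. The helper guess_case is hypothetical, and it can be fooled, since some non-UTF-8 byte sequences happen to be valid UTF-8.

```python
def guess_case(raw: bytes) -> str:
    """Guess how a raw column value is encoded (heuristic only)."""
    if raw.isascii():
        # Pure ASCII is byte-identical in latin1 and utf8: nothing to decide.
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so latin1 or some other encoding.
        return "not utf8"
    return "probably utf8"


ascii_case = guess_case(b"plain text")
utf8_case = guess_case("café".encode("utf-8"))
latin1_case = guess_case("café".encode("latin-1"))

print(ascii_case, utf8_case, latin1_case)
```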
A straightforward conversion will potentially break any strings with non-ASCII (non-7-bit) characters.
If you don't have any of those (i.e. all of your text is English), you'll usually be fine.
If you have any of those, however, you need to convert all char/varchar/text fields to blob in a first pass, and then convert them to utf8 in a second pass.
See this article for detailed procedures:
http://codex.wordpress.org/Converting_Database_Character_Sets
I've done this a few times on production databases in the past (converting from the old standard encoding swedish to latin1), and when MySQL encounters a character that cannot be translated to the target encoding, it aborts the conversion and leaves the table unchanged. Therefore, I'd consider the ALTER TABLE statement safe.