Strange character appearing in MySQL database field - mysql

I've imported some data from an old FileMaker program into a mysql database and I've noticed some strange characters. What is the best way to get rid of all these?

It seems the data you imported into your MySQL DB was created with a different character set (and collation) than the one you set when you built the table in MySQL.
My guess would be that those "strange characters" are emoji, which, for example, aren't representable in the utf8 charset but are in utf8mb4.
The short answer is: find out what charset the data was originally created in, and build your new table with the same charset.
MySQL Workbench
The charset can be changed easily in MySQL Workbench: right-click the schema, then the table in question, and click the wrench icon.
SQL Syntax
Or via something similar to:
ALTER TABLE YourTableName CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE YourTableName MODIFY name TEXT CHARACTER SET utf8mb4;
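To find out which charset and collation a table and its columns currently use, something along these lines works (YourSchemaName and YourTableName are placeholders):
SHOW CREATE TABLE YourTableName;
-- or per column, via the information_schema
SELECT column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = 'YourSchemaName' AND table_name = 'YourTableName';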

I believe your VARCHAR lengths are overflowing.
For example: if a field is VARCHAR(10) and you insert "STRING OVER 10 CHARACTERS", the value gets truncated, which can produce garbage characters.
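If you suspect truncation rather than a charset mismatch, a quick check along these lines (table and column names are placeholders) flags values that would overflow a VARCHAR(10) column:
-- rows whose value would not fit in VARCHAR(10)
SELECT id, CHAR_LENGTH(name) AS len
FROM YourTableName
WHERE CHAR_LENGTH(name) > 10;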

Related

How to have a column specific character encoding in Ruby on Rails and MySQL?

I have a new table I am adding to a large Ruby on Rails project. Our database currently uses the UTF8 character encoding. However, I need to support emojis in one column of this new table.
In order to do that, that column needs to use the utf8mb4 encoding, since MySQL's base utf8 encoding doesn't support emojis.
I can change things on the mysql side by executing this SQL:
execute "ALTER TABLE table_name CHANGE column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
The problem is that this alone doesn't fix the issue; I actually have to change the 'encoding' option in the database.yml file from utf8 to utf8mb4.
The problem with that is that when my test run loads the database schema, it suddenly fails, because characters in utf8mb4 use more bytes than they do in utf8, and we have existing indexes that are now too long for the utf8mb4 byte limit.
I don't really want to edit all of our existing indexes that this breaks; our database is pretty large.
Is there no way in Rails to use a character encoding at the table or column level? Do I have to update the database.yml file and therefore force my entire database to use utf8mb4?
Either change VARCHAR(255) to VARCHAR(191) for any column being indexed, to get around the index-prefix limit (see the sketch after this answer), or
update to MySQL 5.7, which does not have the issue.
If you need to discuss more, please provide SHOW CREATE TABLE and the specific connection parameters, perhaps found in application_controller.rb.
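A sketch of the first option, using a hypothetical indexed column: the point of 191 is that 191 characters x 4 bytes per utf8mb4 character = 764 bytes, which stays under InnoDB's 767-byte index-prefix limit, while 255 x 4 = 1020 bytes would not.
-- column_name is a placeholder for the indexed column
ALTER TABLE table_name MODIFY column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;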

Data from MySQL database doesn't display on website

I started my project with a UTF8 database for development. At the time of project completion, my professor gave me a database of type Latin1. Because of the conflicting types my project is not working properly; as a result my server-side code can't access the database.
I tried several conversions of my professor's database into UTF8, but they failed.
I'm still facing the issue. Kindly assist me in this regard.
Thank you!
You may try a charset converter like MySQL charset converter.
This PHP script can be used to convert your MySQL database tables from latin1 to UTF8 while retaining the integrity of all internal multibyte characters.
You could try this:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
Replace charset_name with latin1.
You can also change columns individually:
ALTER TABLE TableName MODIFY ColumnName ColumnType CHARACTER SET latin1;
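Note that the table charset is only half the story: the connection charset has to match what your PHP code sends and expects, or correctly stored data will still display wrong. If the tables stay latin1, a one-line sketch of making that explicit after connecting:
SET NAMES latin1; -- declare the charset the client sends and expects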

The dreaded � character and displaying info from database in UTF8

So, I have a database and I use Navicat. We have a simple PHP website which is a few years old and we've upgraded the site to UTF8.
We have 'activities' on the site which handle UTF8 special characters perfectly, but we also have 'comments' on the site, where curly single quotes and other special characters show up as �.
The database was converted to UTF8 via:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
When I look at both databases in Navicat, I can see both are UTF8 and utf8_general_ci.
When I design the 'activities' table I can see the column is a MEDIUMTEXT set up with UTF8. When I design the 'comments' table, the column that isn't working is a BLOB and it doesn't have any character-encoding info.
We're doing a pretty basic SELECT and then displaying via $variable['column'].
Does anyone know why the 'activities' would work perfectly with UTF8 and the 'comments' would have issues? We're not doing anything super fancy to either of them.
I have tried converting the BLOB to a text field, but when I do that the output gets escaped on its way to the page, so as soon as there is a single quote in the text it cuts off.
I have tried things like utf8_encode, stripslashes, mysql_real_escape_string, htmlentities, and htmlspecialchars, but I'm not sure any of them would help anyway.
Thanks!
BLOB means Binary Large OBject, and raw binary data does not carry any character encoding.
So you have latin1 (or whatever) data in a BLOB, and you are displaying and treating it like utf-8 data.
You need to manually convert the data, using PHP or otherwise.
Here is a good article from the MySQL Performance Blog that describes what you can do:
http://www.mysqlperformanceblog.com/2013/10/16/utf8-data-on-latin1-tables-converting-to-utf8-without-downtime-or-double-encoding/
If you have problems firing your queries, use the console instead of phpMyAdmin, and don't forget to set the connection encoding via SET NAMES:
master> ALTER TABLE t CONVERT TO CHARACTER SET utf8, CHANGE comment comment TEXT;
master> SET NAMES utf8;
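The statement above covers the case where the bytes in the BLOB are already utf-8 and only need to be relabeled. If the bytes turn out to be latin1 instead, converting straight to a utf8 column would mangle them; a sketch with hypothetical table and column names that relabels first and converts second:
ALTER TABLE comments MODIFY comment TEXT CHARACTER SET latin1; -- BLOB to TEXT only labels the bytes, no conversion
ALTER TABLE comments MODIFY comment TEXT CHARACTER SET utf8; -- TEXT to TEXT transcodes latin1 to utf-8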

Converting data from LATIN1 to UTF8 in MySQL

I'm trying to migrate an MSSQL database to MySQL. Using MySQL Workbench I moved the schema and data over, but I'm having problems converting the character encoding. During the migration I had the tool put text into BLOBs when there were problems with the encoding.
I believe I've confirmed that the data now in MySQL is latin1_swedish_ci. To simplify the problem, I'm looking at ® symbols in one of the columns.
I wanted to convert the BLOBS to VARCHAR or TEXT with UTF8 encoding. I'm running this SQL command on one of the columns:
ALTER TABLE bookdetails MODIFY BookName VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Instead of converting the ® symbols, it just removes them, which is not what I want. What am I doing wrong? Not that reading half the internet trying to find a solution isn't fun, but three days in and I think my eyes are about to give out.
MySQL Workbench has a UI that is relatively simple to navigate. If you need to change the collation of tables or schemas, you can right-click them in the Object Browser and choose Alter Table or Alter Schema; there you can change the data types and set the collation to whatever you want.
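As for why the ® symbols disappear: ® is byte 0xAE in latin1, which is not a valid byte sequence in utf-8, so a direct BLOB-to-utf8 conversion discards it. Assuming the BLOB bytes really are latin1, a two-step sketch that labels them first and converts second:
ALTER TABLE bookdetails MODIFY BookName VARCHAR(50) CHARACTER SET latin1; -- relabel the raw bytes, no conversion yet
ALTER TABLE bookdetails MODIFY BookName VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci; -- now transcode latin1 to utf8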

Is this a safe way to convert MySQL tables from latin1 to utf-8?

I need to change all the tables in one of my databases from latin1 to utf-8 (with utf8_bin collation).
I have dumped the database, created a test database from it, and run the following without any errors or warnings for each table:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin
Is it safe for me to repeat this on the real database? The data seems fine by inspection...
There are 3 different cases to consider:
The values are indeed encoded using Latin1
This is the consistent case: declared charset and content encoding match. This was the only case I covered in my initial answer.
Use the command you suggested:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin
Note that the CONVERT TO CHARACTER SET command only appeared in MySQL 4.1.2, so anyone using a database installed before 2005 had to use an export/import trick. This is why there are so many legacy scripts and documents on the Internet doing it the old way.
The values are already encoded using utf8
In this case, you don't want MySQL to convert any data; you only need to change the column's metadata.
For this, you have to change the type to BLOB first, then to TEXT with the utf8 charset, for each column, so that no values are converted:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
This is the recommended way, and it is explicitly documented in the ALTER TABLE syntax documentation.
The values are encoded using some other charset
The default encoding was Latin1 for several years on some Linux distributions. In this case, you have to use a combination of the two techniques:
Fix the table meta-data, using the BLOB type trick
Convert the values using CONVERT TO, as sketched below.
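Putting the two together, a minimal sketch assuming a hypothetical column c1 whose declared charset is wrong while its bytes are actually latin1:
ALTER TABLE t1 CHANGE c1 c1 BLOB; -- drop the wrong charset label, bytes untouched
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET latin1; -- relabel with the real encoding
ALTER TABLE t1 CONVERT TO CHARSET utf8 COLLATE utf8_bin; -- now convert the values for real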
A straightforward conversion will potentially break any strings with non-ASCII characters.
If you don't have any of those (i.e. all of your text is English), you'll usually be fine.
If you have any of those, however, you need to convert all CHAR/VARCHAR/TEXT fields to BLOB in an initial run, and then convert them to utf8 in a subsequent run.
See this article for detailed procedures:
http://codex.wordpress.org/Converting_Database_Character_Sets
I've done this a few times on production databases in the past (converting from the old default swedish encoding to latin1), and when MySQL encounters a character that cannot be translated to the target encoding, it aborts the conversion and leaves the data unchanged. Therefore, I'd deem the ALTER TABLE statement safe.