Convert database font from MS SQL to mysql utf8? - mysql

I have an old database on a Windows dedicated server, and I have now bought a new Linux dedicated server with PHP and MySQL.
I plan to use PHP to pull the data out of MS SQL Server row by row and put it into the MySQL database.
But the problem is that MySQL uses utf8_unicode_ci, and I don't know which charset MS SQL Server uses.
Thanks for the help.

Have you tried just running your code? Odds are it'll "Just work".
Caveats below:
You may run into issues with your data (although this is highly unlikely), because what you're referring to as a character set is actually a collation. A collation defines how strings are compared; for example, it can treat the string "ABCDEFGH" as equal to "abcdefgh". The "_ci" part of utf8_unicode_ci means it's case-insensitive.
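Check the collation of the SQL Server database. If it's "SQL_Latin1_General_CP1_CI_AS" (the common default), comparisons there are case-insensitive (CI) and accent-sensitive (AS). MySQL's utf8_unicode_ci is case-insensitive as well, but it is also accent-insensitive, so values that differ only by accents (e.g. "resume" and "résumé") compare as equal in MySQL. That mostly matters for unique keys and WHERE clauses.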
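As a rough model of the two comparison rules involved, a case-insensitive, accent-sensitive collation (the _CI_AS suffix) versus a case- and accent-insensitive one such as utf8_unicode_ci, here is a Python approximation. It is only a sketch under stated assumptions: real collations use full Unicode Collation Algorithm weights, and the helper names below are hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    # Decompose (NFD) so "é" becomes "e" + COMBINING ACUTE ACCENT, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_ci_as(a: str, b: str) -> bool:
    # Case-insensitive, accent-sensitive: a rough stand-in for a *_CI_AS collation.
    return unicodedata.normalize("NFC", a).casefold() == unicodedata.normalize("NFC", b).casefold()


def equal_ci_ai(a: str, b: str) -> bool:
    # Case- and accent-insensitive: a rough stand-in for utf8_unicode_ci.
    return strip_accents(a).casefold() == strip_accents(b).casefold()


if __name__ == "__main__":
    print(equal_ci_as("ABCDEFGH", "abcdefgh"))  # True
    print(equal_ci_as("resume", "résumé"))      # False
    print(equal_ci_ai("resume", "résumé"))      # True
```

The last line is the case to watch during a migration: two rows that were distinct under an accent-sensitive collation can collide in a unique index under an accent-insensitive one.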

SQL Server stores character data for the non-Unicode types (char, varchar, text, etc.) in an 8-bit "extended ASCII" code page determined by the collation (for example, Windows-1252 for SQL_Latin1_General_CP1_CI_AS), and for the Unicode types (nchar, nvarchar, ntext, etc.) in UTF-16 (described as UCS-2 in older versions). There is plenty of material online on this frequently asked topic.
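Here is what that means at the byte level for one accented character, and how a row-by-row copy could transcode varchar bytes for MySQL. This Python sketch assumes the source column uses code page 1252 (as with SQL_Latin1_General_CP1_CI_AS); the function name is hypothetical, and in practice the database driver may already hand you decoded strings. In PHP, iconv() or mb_convert_encoding() plays the same role.

```python
def varchar_bytes_to_utf8(raw: bytes, source_code_page: str = "cp1252") -> bytes:
    # Decode bytes from the SQL Server code page, then re-encode them as UTF-8 for MySQL.
    return raw.decode(source_code_page).encode("utf-8")


text = "é"
print(text.encode("cp1252"))     # b'\xe9'      varchar under a code page 1252 collation
print(text.encode("utf-16-le"))  # b'\xe9\x00'  nvarchar (UTF-16, little-endian)
print(text.encode("utf-8"))      # b'\xc3\xa9'  MySQL utf8 / utf8mb4
print(varchar_bytes_to_utf8(b"caf\xe9"))  # b'caf\xc3\xa9'
```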

Related

Migrating from Latin1 SQL Server to utf8mb4 MySQL Incorrect String Error Problems

Final Update
I was able to easily migrate the data with Talend. No errors, and it worked perfectly the first time with no special settings. This shows what an utter piece of garbage the MySQL Workbench Migration tool is. While the learning curve of Talend is rough (it's not intuitive at all), it appears to be one of the best data migration solutions out there. I recommend using it. Note I never figured out why the migration failed (as seen below). I'm just walking away from the utter garbage Oracle has pushed on the community. Oh, and Talend migrated the data to utf8mb4/utf8_general_ci without a hitch.
Please note there are updates at the bottom.
We have to migrate an export from TrackerRMS (which luckily doesn't have FK constraints, but the data is a total mess) to MySQL. Restoring the backup of the TrackerRMS data to SQL Server is cake; no issues. The problem is copying the data from SQL Server to MySQL.
MySQL Workbench Migration can handle all but 4 of the tables; but those 4 tables are the key problem. They have crazy content in their fields which causes the migration tool to choke. I attempted to export the data as .sql from HeidiSQL and it chokes as well.
The source table problem fields are NVARCHAR(MAX) and SQL_Latin1_General_CP1_CI_AS collation.
Note I've tried changing the collation of the source SQL Server table columns to Latin1_General_100_BIN2_UTF8 and Latin1_General_100_CI_AI_SC_UTF8 and there is no effect.
The errors are:
ERROR: `Backup_EmpowerAssociates`.`BACKUP_documents`:Inserting Data: Incorrect string value: '\xF0\x9F\x93\x8A x...' for column 'filepath' at row 13
ERROR: `Backup_EmpowerAssociates`.`BACKUP_activities`:Inserting Data: Incorrect string value: '\xF0\x9F\x91\x80' for column 'subject' at row 42
ERROR: `Backup_EmpowerAssociates`.`BACKUP_resourcehistory`:Inserting Data: Incorrect string value: '\xF0\x9D\x91\x82(\xF0...' for column 'jobdescription' at row 80
This tells me the source data contains 4-byte UTF-8 characters, which are beyond what MySQL's original 3-byte utf8 (utf8mb3) can store. Note that the destination database in MySQL uses utf8mb4 with utf8mb4_unicode_ci collation, and its defaults are set accordingly. No connection settings override this.
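The byte sequences in those errors can be decoded directly. The Python sketch below (the helper name is hypothetical) confirms they are 4-byte UTF-8 characters outside the Basic Multilingual Plane, namely U+1F4CA (📊), U+1F440 (👀) and U+1D442 (a mathematical italic capital O), which fit in utf8mb4 but not in a 3-byte utf8 column.

```python
def needs_utf8mb4(text: str) -> bool:
    # Characters above U+FFFF take 4 bytes in UTF-8; MySQL's legacy utf8 (utf8mb3) stores at most 3.
    return any(ord(ch) > 0xFFFF for ch in text)


# The byte sequences from the error messages decode to supplementary characters.
for raw in (b"\xF0\x9F\x93\x8A", b"\xF0\x9F\x91\x80", b"\xF0\x9D\x91\x82"):
    ch = raw.decode("utf-8")
    # Prints the code point, the UTF-8 byte count, and whether utf8mb4 is required.
    print(f"U+{ord(ch):X}", len(raw), needs_utf8mb4(ch))
```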
When migrating I use Microsoft SQL Server and ODBC (native) for localhost (SQL Server) with default options. I've also tried turning ANSI off, but it has no impact. Note the ODBC configuration for SQL Server has no charset or collation settings or options. For target, I use the localhost stored connection which I use for general access.
Note the MySQL Workbench migration tool defines the receiving table columns (for the above problem columns) as LONGTEXT CHARACTER SET 'utf8mb4'.
Could the issue be that the migration proxy (ODBC?) is somehow converting the data to utf8 (even though I don't have that selected)? But if that were the case, wouldn't the 4-byte characters already be gone by the time the data reaches the utf8mb4 columns, so the inserts wouldn't fail?
Note I tried creating and adjusting the destination MySQL table (by adjusting the SQL in the migration tool) as CHARSET latin1 and latin1_general_ci collation. Same issue.
Migration simply does not work (this is with the SQL Server source collation being SQL_Latin1_General_CP1_CI_AS). I've also tried it with the driver's UTF-8 option both on and off, with no effect.
Does anyone with migration experience recognize this issue, or have recommendations on how to resolve the problem? I'm fine with scrubbing the source data in SQL Server before I migrate - I just don't know the best method to do that (or if it's necessary).
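If scrubbing turns out to be necessary, one option is to replace supplementary (4-byte) characters before loading. This is a minimal Python sketch, assuming the rows are already fetched as Python strings; the function name and the choice of U+FFFD as the replacement are illustrative. It is lossy, so it only makes sense if a utf8mb4 target cannot be made to work.

```python
import re

# Matches any code point outside the Basic Multilingual Plane (4 bytes in UTF-8).
SUPPLEMENTARY = re.compile("[\U00010000-\U0010FFFF]")


def scrub_supplementary(text: str, replacement: str = "\uFFFD") -> str:
    # Replace each 4-byte character so the value fits a 3-byte utf8 column.
    return SUPPLEMENTARY.sub(replacement, text)


cleaned = scrub_supplementary("\U0001F4CA x.pdf")
print(repr(cleaned))  # the chart emoji is replaced by U+FFFD
```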
Thanks!
===
UPDATE 1
This is very strange; using the technique below to show values that won't convert, this is the result:
SELECT filepath, CONVERT(varchar,filepath) FROM BACKUP_documents WHERE filepath <> CONVERT(varchar, Filepath);
Why on earth is the data being truncated upon convert with a simple filename at the "c" in documents?
Here's a capture that might also help resolve this issue.
But the strange part is MSSQL is showing normal text (without special characters) as being non-ASCII. I'm wondering if the folks at TrackerRMS are running code written in another country/language and it's messing up the data, but it's something that's not visible?
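One thing worth checking about the UPDATE 1 query: in SQL Server, CONVERT(varchar, ...) without an explicit length defaults to varchar(30), so any value longer than 30 characters differs from its converted form even if it is plain ASCII. The Python sketch below models that with a hypothetical helper (approximating the conversion as code page 1252 with '?' substitution plus truncation) and shows a more direct check that lists the actual non-ASCII characters in a value. The sample path is made up.

```python
DEFAULT_VARCHAR_LENGTH = 30  # length SQL Server uses for CONVERT(varchar, ...) when none is given


def simulate_convert_varchar(value: str, length: int = DEFAULT_VARCHAR_LENGTH) -> str:
    # Approximation: unmappable characters become '?' (SQL Server may use best-fit
    # mappings for some characters), then the result is cut to `length` characters.
    in_code_page = value.encode("cp1252", errors="replace").decode("cp1252")
    return in_code_page[:length]


def non_ascii_chars(value: str) -> list[tuple[int, str]]:
    # Report (position, code point) for every character outside 7-bit ASCII.
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(value) if ord(ch) > 0x7F]


path = "C:\\Users\\example\\Documents\\Quarterly Report.docx"  # hypothetical value
print(simulate_convert_varchar(path) != path)  # True: truncation alone makes it "differ"
print(non_ascii_chars(path))                   # []: yet it contains no non-ASCII text
```

In T-SQL terms, giving the target type a length (for example varchar(max)) avoids the truncation.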
UPDATE 2
So to make things clear, here's what one of the characters that is messing up the data looks like.

MySQL utf8_bin collation equivalent for Azure SQL database

I am trying to migrate a MySQL application to Azure.
The pricing for Azure's MySQL database seems to be quite a bit higher than the "SQL Database" option, so I decided to go for the "SQL Database" option.
The last step of the resource setup is to choose a collation.
In MySQL I use utf8_bin, but that collation does not seem to be valid for "SQL Database".
Is there an equivalent collation?
I need to store UTF characters with case-sensitive and accent-sensitive comparison, and I almost never sort strings.
I did some research online but couldn't find any information about Azure's collations.
Edit:
After additional research, I've come across 'Latin1_General_BIN2', which should do the job. I'm not sure that 'Latin' can handle all UTF-8 characters (e.g. ʖ, ޖ, etc.), and I have not yet fully grasped the difference between BIN and BIN2 collations.
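A rough way to picture binary collations such as Latin1_General_BIN2 (SQL Server) and utf8_bin (MySQL): strings are compared by code point rather than by linguistic rules, which makes comparisons case- and accent-sensitive. The Python sketch below is an illustration under that assumption, not either engine's implementation, and it sticks to characters in the Basic Multilingual Plane. It also shows that ordering by UTF-8 bytes matches code-point order.

```python
words = ["b", "é", "B", "a", "e", "A"]

# Code-point order: uppercase ASCII, then lowercase ASCII, then accented Latin-1 letters.
by_code_point = sorted(words, key=lambda s: [ord(ch) for ch in s])

# Ordering by UTF-8 bytes gives the same result, because UTF-8 preserves code-point order.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_code_point)                   # ['A', 'B', 'a', 'b', 'e', 'é']
print(by_code_point == by_utf8_bytes)  # True
print("a" == "A", "e" == "é")          # False False: case- and accent-sensitive
```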
That collation is not UTF-8 capable. As of now, the existing collations in SQL Server and Azure SQL DB are non-Unicode, with Unicode (UTF-16) enabled through the NCHAR and NVARCHAR (and SQL_VARIANT) data types.
That being said, we are now running a private preview of UTF8 support in SQL Server and Azure SQL DB, so I'd like to further discuss with you.
Will you be at Ignite? If so please look for me in the SQL Server booth. If not, can you please send me an email to utf8team#microsoft.com?
Thank you!

SQL Server displays ??? instead of Unicode characters

I run SQL Server 2008 Express on Windows 8.1. After inserting some Unicode characters into the database, the characters are not displayed properly (the data itself has no problem; it is just displayed improperly). Inserting the values manually using Management Studio leads to the same result.
I typed the information in Persian and installed the Persian language pack for Windows 8.1; however, this did not fix anything.
Am I missing something in the SQL Server options? The same configuration on Windows 7 has no problems.
Check that your field is Unicode (nchar or nvarchar).
Check the collation of your database.
Make sure your Unicode literals have the N prefix.
UPDATE: check this, possible duplicate question
Changing type of fields from VARCHAR to NVARCHAR fixes the problem but it does not recover data stored in bad format.
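The effect can be reproduced in a few lines of Python, as an illustration rather than SQL Server's actual code path. Text that passes through a non-Unicode code page (a VARCHAR column, or a literal without the N prefix, under a Latin1 collation assumed here to use code page 1252) loses every character the code page cannot represent, and the '?' left behind cannot be turned back into the original letters.

```python
persian = "سلام"  # "salam": four Arabic-script letters

# Path through a non-Unicode code page: each unmappable character becomes '?'.
through_varchar = persian.encode("cp1252", errors="replace").decode("cp1252")

# Path through a Unicode type (NVARCHAR stores UTF-16): the text round-trips intact.
through_nvarchar = persian.encode("utf-16-le").decode("utf-16-le")

print(through_varchar)              # ????
print(through_nvarchar == persian)  # True
```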

Transfer old 3.23.49 MySQL database to 5.0.51 MySQL database - Encoding in ANSI and UTF-8

I want to transfer a 3.23.49 MySQL database to a 5.0.51 MySQL database. I have exported the SQL file and I'm ready to import it. I looked at the SQL file, and Notepad++ shows me that the file is encoded in ANSI. I looked at the values, and some of them are in ANSI and some of them are in UTF-8. What is the best way to proceed?
1. Should I change the encoding within Notepad++?
2. Should I use ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;?
3. Should I use iconv?
4. Do I have to look through each table and make the necessary changes?
5. What are the settings for the import? MYSQL323 compatibility mode and encoding latin1?
6. Do I have to be aware of anything if the PHP scripts use another encoding?
Thank you for your hints!
If the problem is to import a utf8-encoded mysql dump, the solution is usually to add --default-character-set=utf8 to mysql options:
mysql --default-character-set=utf8 -Ddbname -uuser -p < dump.sql
UPD1: In case the dump file is corrupted, I would try to export the database again, table by table, so that the dump results in a correctly UTF-8-encoded file.
I converted a MySQL 4.0 database (which also had no notion of character encoding yet) to MySQL 5.0 four years ago, so BTDT (been there, done that).
But first of all, there is no "ANSI" character encoding; that is a misconception and a misnomer that caught on from the early versions of Windows (there are ANSI escape sequences, but they have nothing to do with character encoding). You are almost certainly looking at Windows‑1252-encoded text. You should convert that text to UTF‑8, as that gives you the best chance of keeping all the characters used intact (UTF‑8 is a Unicode encoding, and Unicode contains all characters that can be encoded with Windows-125x, though not always at the same code points).
I had used both the iconv and recode programs (on the Debian GNU/Linux system that the MySQL server ran on) to convert Windows‑1252-encoded text of a MySQL export (created by phpMyAdmin) to UTF‑8. Use whatever program or combination of programs works best for you.
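Since the question mentions a dump in which some values are Windows-1252 and others already UTF-8, converting the whole file with iconv would double-encode the UTF-8 parts. A common workaround, sketched here in Python as an assumption rather than what was done above, is to decode each line as UTF-8 and fall back to Windows-1252 only where that fails. It works per line, so a single line mixing both encodings would still need manual attention, and it can misfire on the rare Windows-1252 byte sequences that happen to be valid UTF-8. The function names are hypothetical.

```python
def decode_mixed_line(raw: bytes) -> str:
    # Valid UTF-8 is kept as-is; anything else is assumed to be Windows-1252.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" covers the five byte values cp1252 leaves undefined.
        return raw.decode("cp1252", errors="replace")


def convert_dump(src_path: str, dst_path: str) -> None:
    # Read bytes line by line so each line is decoded independently; newline=""
    # keeps the original line endings when writing the UTF-8 output.
    with open(src_path, "rb") as src, open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for raw_line in src:
            dst.write(decode_mixed_line(raw_line))


print(decode_mixed_line(b"Caf\xc3\xa9\n"), end="")  # UTF-8 input, prints Café
print(decode_mixed_line(b"Caf\xe9\n"), end="")      # Windows-1252 input, prints Café
```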
As to your questions:
1. You can try, but it might not work. In particular, you might have trouble opening a large database dump with Notepad++ or another text editor.
2. Depends. ALTER TABLE … CONVERT TO … does more than just convert encodings: it also changes the table's default character set, and it may change column data types (for example, TEXT to MEDIUMTEXT) so that the converted columns can still hold as many characters as before.
3. See the paragraph above.
4. Yes. You should set the character encoding of every table and every text field that you are importing data into to utf8 (use whatever utf8_… collation fits your purpose or data best). ALTER TABLE … CONVERT TO … does that. (But see 2.)
5. I don't think MYSQL323 matters here, as your export would contain only CREATE, INSERT and ALTER statements. But check the manual first (the "?" icon next to the setting in phpMyAdmin). latin1 means "Windows-1252" in MySQL 5.0, so that setting might work, in which case you must skip the manual conversion before the import.
6. I don't think so; PHP is not yet Unicode-aware. What matters is how the data is processed by the PHP script. Usually the Content-Type header field for text resources generated from that data should end with ; charset=UTF-8.
On an additional note, you should not be using MySQL 5.0.x anymore. The current stable version is MySQL 5.5.18. "Per the MySQL Support Lifecycle policy, active support for MySQL 5.0 ended on December 31, 2009. MySQL 5.0 is now in the Extended support phase." MySQL 5.0.0 Alpha having been released on 2003-12-22, Extended Support is expected to end 8 full years after that, on 2011‑12‑31 (this year).

MySQL - refusing to run set names?

Quick question as I have never run into this before.
On a webhost I am running the query:
SET NAMES 'utf8'
This is returning the following error:
Error: Unknown system variable 'NAMES'
I haven't run across this before. I get similar errors when trying to specify CURRENT_TIMESTAMP as a default column value as well as setting the collation of a table.
The MySQL queries I am running have worked on literally hundreds of hosting accounts before this one. When I contacted the host, I was fobbed off with the claim that it was probably my code.
Is it likely that this is a dodgy MySQL install? The host says they are running MySQL 5.
SET NAMES has been available since MySQL 4.1, which brought large-scale changes to character set handling and introduced UTF-8 support. I'm quite sure you have a MySQL version older than 4.1 in front of you. Try a
SELECT VERSION();
as a1ex07 has recommended.
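The same check can be done programmatically, for example before a setup script issues SET NAMES. A minimal Python sketch; the helper name is hypothetical, and it only parses the string that SELECT VERSION() returns, ignoring suffixes such as "-log". The version strings in the loop are sample inputs.

```python
import re


def supports_set_names(version: str) -> bool:
    # SET NAMES arrived with MySQL 4.1; compare the leading major.minor numbers.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (4, 1)


for v in ("3.23.49", "4.0.27", "4.1.22", "5.0.51a-log"):
    print(v, supports_set_names(v))
```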
Older versions of MySQL can only handle 8-bit character data. They can still store UTF-8 data as byte sequences, but they are not aware of it. There are several drawbacks to storing UTF-8 in MySQL older than 4.1. For example, string lengths can exceed the given column limits even though the number of characters should fit. Also, the modern string comparison functions do not exist (these correctly compare special characters and different ways of writing them, e.g. "ß" vs. "ss" in German).
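The column-length drawback follows from UTF-8 being a variable-width encoding: a pre-4.1 server counts bytes, not characters, so a value with the right number of characters can still be too long. A small Python illustration; the CHAR(5) limit is a hypothetical example.

```python
COLUMN_LIMIT = 5  # hypothetical CHAR(5) column, measured in bytes by a pre-4.1 server

word = "Größe"  # German for "size": 5 characters
encoded = word.encode("utf-8")

print(len(word))                     # 5 characters
print(len(encoded))                  # 7 bytes: "ö" and "ß" take 2 bytes each
print(len(encoded) <= COLUMN_LIMIT)  # False: it no longer fits
```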