I have a live MySQL database with UTF-8 data in it. I'm trying to migrate it to SQL Server 2005.
All table fields and all data are in UTF-8.
The destination database will have SQL_Latin1_General_CP1_CI_AS collation.
So far, I have created an ODBC connection to my MySQL database and run the following:
EXEC master.dbo.sp_addlinkedserver @server = N'MYSQL2', @srvproduct=N'MySQL', @provider=N'MSDASQL', @provstr=N'DRIVER={MySQL ODBC 5.1 Driver}; SERVER=SERVERHOST; DATABASE=mysqldb; USER=mysqluser;PASSWORD=PASSWORD; OPTION=3;CHARSET=utf8;';
select * into mssql.dbo.members from openquery(MYSQL2, 'select * from mysqldb.members');
The default collation of my SQL Server instance is Greek_CI_AS.
The destination database has SQL_Latin1_General_CP1_CI_AS collation, and I created another one with Greek_CI_AS collation just to see whether that was the problem.
The import went well and all table fields in SQL Server were created correctly, BUT the stored data was encoded in ISO-8859-7 (which I presume comes from the Greek_CI_AS collation).
This is a problem, since our current site has UTF-8 pages and it should stay that way.
If I use Firefox and MDB2 to fetch the data, I get garbled characters. If I change View -> Encoding to ISO-8859-7, I see the data correctly, but that is not a solution.
So how can I do this migration and end up with the correct encoding on the destination SQL Server?
Any help appreciated.
Thanks.
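A likely explanation for what Firefox shows is the code page of the column. A varchar column under a Greek collation stores text in Windows code page 1253, and its Greek letters largely share byte values with ISO-8859-7. A UTF-8 page therefore renders those bytes as garbage until they are transcoded, or until the data is stored as NVARCHAR. The following is a minimal Python model. The function name is hypothetical, and no database is involved.

```python
def greek_varchar_on_utf8_page(text):
    """Model a Greek_CI_AS varchar value rendered on a UTF-8 web page."""
    # Bytes as stored in a varchar column using code page 1253 (Greek collations).
    raw = text.encode("cp1253")
    # A UTF-8 page decodes those bytes as UTF-8: invalid sequences become U+FFFD.
    shown = raw.decode("utf-8", errors="replace")
    # Fix on the way out: decode with the column's code page, re-encode as UTF-8.
    transcoded = raw.decode("cp1253").encode("utf-8")
    return shown, transcoded


shown, transcoded = greek_varchar_on_utf8_page("Ελλάδα")
```

Here `shown` contains replacement characters, while `transcoded` is valid UTF-8 that a UTF-8 page displays correctly.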
You need to tell us which data types you're using on the MS SQL side.
Are you using VARCHAR or NVARCHAR?
You need to use NVARCHAR if you want to store Unicode data.
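The column type matters because a VARCHAR column can only hold characters that exist in its collation's code page. For SQL_Latin1_General_CP1_CI_AS that is code page 1252. NVARCHAR stores UTF-16 instead. The following is a minimal Python model. It assumes unmappable characters become "?"; SQL Server may instead apply best-fit mappings for some characters.

```python
def store(text, column_type, codepage="cp1252"):
    """Roughly model what survives storing text in a SQL Server column."""
    if column_type == "varchar":
        # Non-Unicode: only characters in the collation's code page survive.
        return text.encode(codepage, errors="replace").decode(codepage)
    if column_type == "nvarchar":
        # Unicode: stored as UTF-16, so everything round-trips.
        return text.encode("utf-16-le").decode("utf-16-le")
    raise ValueError("unsupported column type: " + column_type)


greek = "Ελλάδα"
latin1_varchar = store(greek, "varchar")            # every letter becomes "?"
greek_varchar = store(greek, "varchar", "cp1253")   # survives under a Greek code page
unicode_column = store(greek, "nvarchar")           # survives under any collation
```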
Final Update
I was able to easily migrate the data with Talend. No errors, and it worked perfectly the first time with no special settings. This shows what an utter piece of garbage the MySQL Workbench Migration tool is. While the learning curve of Talend is rough (it's not intuitive at all), it appears to be one of the best data migration solutions out there. I recommend using it. Note I never figured out why the migration failed (as seen below). I'm just walking away from the utter garbage Oracle has pushed on the community. Oh, and Talend migrated the data to utf8mb4/utf8_general_ci without a hitch.
Please note there are updates at the bottom.
We have to migrate an export from TrackerRMS (which luckily doesn't have FK constraints, but the data is a total mess) to MySQL. Restoring the backup of the TrackerRMS data to SQL Server is cake; no issues. The problem is copying the data from SQL Server to MySQL.
MySQL Workbench Migration can handle all but 4 of the tables, but those 4 tables are the key problem. They have crazy content in their fields, which causes the migration tool to choke. I attempted to export the data as .sql from HeidiSQL and it choked as well.
The problem fields in the source tables are NVARCHAR(MAX) with SQL_Latin1_General_CP1_CI_AS collation.
Note that I've tried changing the collation of the source SQL Server table columns to Latin1_General_100_BIN2_UTF8 and Latin1_General_100_CI_AI_SC_UTF8, and it had no effect.
The errors are:
ERROR: `Backup_EmpowerAssociates`.`BACKUP_documents`:Inserting Data: Incorrect string value: '\xF0\x9F\x93\x8A x...' for column 'filepath' at row 13
ERROR: `Backup_EmpowerAssociates`.`BACKUP_activities`:Inserting Data: Incorrect string value: '\xF0\x9F\x91\x80' for column 'subject' at row 42
ERROR: `Backup_EmpowerAssociates`.`BACKUP_resourcehistory`:Inserting Data: Incorrect string value: '\xF0\x9D\x91\x82(\xF0...' for column 'jobdescription' at row 80
This tells me the source data contains 4-byte characters, which is beyond what MySQL's legacy utf8 (utf8mb3) can store. Note that the destination database in MySQL is utf8mb4 with utf8mb4_unicode_ci collation, and its defaults are set accordingly. No connection settings override this.
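Decoding the byte sequences in the three error messages confirms this. Each one is a complete 4-byte UTF-8 sequence for a code point above U+FFFF (an emoji or a mathematical letter), and only utf8mb4 can hold those. The "(\xF0..." tail of the third message is cut off, so only its first character is decoded here. A short Python check:

```python
import unicodedata

# First complete character from each MySQL error message above.
ERROR_BYTES = {
    "BACKUP_documents.filepath": b"\xF0\x9F\x93\x8A",
    "BACKUP_activities.subject": b"\xF0\x9F\x91\x80",
    "BACKUP_resourcehistory.jobdescription": b"\xF0\x9D\x91\x82",
}


def describe(raw):
    """Decode one UTF-8 character and report its code point, name and byte length."""
    ch = raw.decode("utf-8")
    return {
        "code_point": ord(ch),
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8_bytes": len(raw),
        # MySQL's utf8 (utf8mb3) stores at most 3 bytes per character.
        "needs_utf8mb4": ord(ch) > 0xFFFF,
    }


for column, raw in ERROR_BYTES.items():
    info = describe(raw)
    print(column, "U+%04X" % info["code_point"], info["name"], info["needs_utf8mb4"])
```

The columns are already utf8mb4, so it may be worth checking the character set of the connection the migration tool opens. One commonly reported cause of "Incorrect string value" is a 3-byte utf8 connection, which rejects these same characters.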
When migrating, I use Microsoft SQL Server and ODBC (native) for localhost (SQL Server) with default options. I've also tried turning ANSI off, but it had no impact. Note that the ODBC configuration for SQL Server has no charset or collation settings or options. For the target, I use the stored localhost connection that I use for general access.
Note the MySQL Workbench migration tool defines the receiving table columns (for the above problem columns) as LONGTEXT CHARACTER SET 'utf8mb4'.
Could the issue be that the migration proxy (ODBC?) is somehow converting the data to utf8, even though I don't have that selected? But if that were the case, shouldn't the incoming data insert without errors, since a utf8mb4 column accepts anything utf8 can hold (4 bytes vs. fewer)?
Note that I also tried creating the destination MySQL table (by editing the SQL in the migration tool) with CHARSET latin1 and latin1_general_ci collation. Same issue.
The migration simply does not want to work (this is with the SQL Server source being SQL_Latin1_General_CP1_CI_AS). I've tried it with UTF8 both on and off for the driver. No effect.
Does anyone with migration experience recognize this issue, or have recommendations on how to resolve the problem? I'm fine with scrubbing the source data in SQL Server before I migrate - I just don't know the best method to do that (or if it's necessary).
Thanks!
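Scrubbing may turn out to be necessary, for example because some hop in the pipeline cannot carry utf8mb4. In that case the characters to target are exactly those outside the Basic Multilingual Plane, U+10000 to U+10FFFF, because those are the ones that need 4 bytes in UTF-8. The following is a minimal Python sketch for a script-based export. The function names and the replacement choice are assumptions. Scrubbing is lossy, so fixing the connection character set is preferable where possible.

```python
import re

# Every code point that needs 4 bytes in UTF-8 (outside the BMP).
NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def needs_utf8mb4(text):
    """True if the text contains characters MySQL's 3-byte utf8 cannot store."""
    return NON_BMP.search(text) is not None


def scrub(text, replacement="\uFFFD"):
    """Replace each 4-byte character with a BMP-only placeholder (lossy)."""
    return NON_BMP.sub(replacement, text)


# Hypothetical row, for illustration only.
row = {"subject": "Looking into it \U0001F440", "filepath": "plain.docx"}
flagged = [column for column, value in row.items() if needs_utf8mb4(value)]
cleaned = {column: scrub(value, "") for column, value in row.items()}
```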
===
UPDATE 1
This is very strange. Using the query below to show values that won't convert, I get this result:
SELECT filepath, CONVERT(varchar,filepath) FROM BACKUP_documents WHERE filepath <> CONVERT(varchar, Filepath);
Why on earth is the data being truncated by CONVERT, with a simple filename cut off at the "c" in "documents"?
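The truncation may have a simple cause. When CONVERT (or CAST) targets varchar with no length, SQL Server uses a length of 30. Any value longer than 30 characters therefore differs from its converted copy and gets reported, even when it is plain ASCII. Specifying varchar(max) avoids that. The Python sketch below models both checks. Treating the conversion as a code page 1252 round trip, with "?" for unmappable characters, is an approximation, since SQL Server may best-fit some characters. The sample path is made up.

```python
def convert_varchar_default(text):
    """Model CONVERT(varchar, x): with no length given, the result is varchar(30)."""
    return text[:30]


def lossy_in_codepage(text, codepage="cp1252"):
    """Model CONVERT(varchar(max), x): True if characters are lost in the code page."""
    return text.encode(codepage, errors="replace").decode(codepage) != text


path = "\\\\fileserver\\shared\\documents\\report.pdf"  # hypothetical, plain ASCII

# Comparison as written in the query: flags the row only because of truncation.
flagged_by_default_length = convert_varchar_default(path) != path
# Comparison with an explicit length: flags only genuinely unconvertible values.
flagged_by_codepage = lossy_in_codepage(path)
flagged_emoji = lossy_in_codepage("Sales \U0001F4CA.xlsx")
```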
Here's a capture that might also help resolve this issue.
But the strange part is that SQL Server reports normal text (with no special characters) as non-ASCII. I'm wondering whether the folks at TrackerRMS run code written for another country/language that is messing up the data in some way that isn't visible.
UPDATE 2
So to make things clear, here's what one of the characters that is messing up the data looks like.
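Characters that "aren't visible" are often format or control characters, such as a zero-width space (U+200B) or a byte order mark (U+FEFF). Listing every non-ASCII code point with its Unicode name and general category makes them visible. The following is a minimal Python sketch; the function name and sample string are hypothetical.

```python
import unicodedata


def non_ascii_report(text):
    """Return (index, code point, category, name) for each non-ASCII character."""
    report = []
    for index, ch in enumerate(text):
        if ord(ch) > 0x7F:
            report.append((
                index,
                "U+%04X" % ord(ch),
                unicodedata.category(ch),  # "Cf" = invisible format character
                unicodedata.name(ch, "<unnamed>"),
            ))
    return report


sample = "My docu\u200bments\\notes.txt"  # looks like plain ASCII when displayed
for entry in non_ascii_report(sample):
    print(entry)
```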
SQL Server Migration Assistant for MySQL (Chinese display error)
I am using SQL Server Migration Assistant for MySQL to migrate from MySQL to SQL Server.
After the data migration completes, Chinese text is displayed as "??".
Migration tool
Our data is in four languages:
zh_TW
zh_CN
pt_PT
en
I finally solved the problem completely.
When linking to the SQL Server data, I set charset=utf8, but a few Simplified Chinese characters were still displayed as "?".
In the end, we set the field type to nvarchar(max). I knew from the start that changing the field type could solve the problem, but because we were migrating data we could not change it, and generating the SQL statements with Symfony's schema update did not support nvarchar(max) either. So we modified the source code that generates the SQL statements so that the text type compiles to nvarchar(max).
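The fix described above, making the schema generator emit nvarchar(max) for text columns, amounts to a type-mapping change. The sketch below is a hypothetical, framework-free Python illustration of such a mapping. It is not Symfony's or Doctrine's actual API, and the generic type names are assumptions.

```python
# Hypothetical mapping from generic schema types to SQL Server column types.
# Unicode (N...) types are used so that zh_TW, zh_CN, pt_PT and en text all fit.
TYPE_MAP = {
    "string": "NVARCHAR({length})",
    "text": "NVARCHAR(MAX)",
}


def column_ddl(name, generic_type, length=255):
    """Render one column definition for a CREATE TABLE statement."""
    template = TYPE_MAP[generic_type]
    return "[{}] {}".format(name, template.format(length=length))


def create_table(table, columns):
    """columns: list of (name, generic_type) pairs."""
    body = ",\n    ".join(column_ddl(name, kind) for name, kind in columns)
    return "CREATE TABLE [{}] (\n    {}\n);".format(table, body)


print(create_table("article", [("title", "string"), ("body", "text")]))
```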
I have an old database on a Windows dedicated server, and I have now bought a new Linux dedicated server with PHP and MySQL.
I plan to use PHP to pull the data out of MS SQL Server row by row and insert it into the MySQL database.
The problem is that MySQL uses utf8_unicode_ci, and I don't know which character set MS SQL Server uses.
Thanks for any help.
Have you tried just running your code? Odds are it'll "Just work".
Caveats below:
You may run into issues in your data (although this is highly unlikely) because the character set you're referring to is actually a collation. That is, it defines the string "ABCDEFGH" to be equal to "abcdefgh". The "_ci" part of utf8_unicode_ci means it's case insensitive.
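Note that the collations differ. MySQL's _ci collations, including utf8_unicode_ci, are case-insensitive, and utf8_unicode_ci is also accent-insensitive. SQL Server's common default, SQL_Latin1_General_CP1_CI_AS, is case-insensitive (CI) but accent-sensitive (AS). You should check the collation of the SQL Server database. If it's "SQL_Latin1_General_CP1_CI_AS", you should be good, as long as you keep in mind that values differing only in accents (e.g. "resume" and "résumé") will compare as equal in MySQL.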
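The two kinds of comparison can be approximated in Python with case folding and accent stripping. CI_AS is case-insensitive and accent-sensitive. CI_AI is case-insensitive and accent-insensitive, which is roughly how utf8_unicode_ci behaves. This is only a rough model: real collations use their own weight tables and treat some characters (ligatures, ß, etc.) differently.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (é -> e + U+0301 -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_ci_as(a, b):
    """Case-insensitive, accent-sensitive (like ..._CI_AS in SQL Server)."""
    return unicodedata.normalize("NFC", a).casefold() == unicodedata.normalize("NFC", b).casefold()


def equal_ci_ai(a, b):
    """Case-insensitive, accent-insensitive (roughly like utf8_unicode_ci)."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()
```

Pairs for which `equal_ci_ai` is true but `equal_ci_as` is false are the ones that could collide in a MySQL unique index after migration.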
SQL Server stores non-Unicode character types (char, varchar, text) in an 8-bit code page determined by the column's collation (for example, code page 1252 for SQL_Latin1_General_CP1_CI_AS). It stores Unicode types (nchar, nvarchar, ntext) as UCS-2/UTF-16. There is plenty of material online on this frequently asked topic.
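For the row-by-row PHP plan, each value therefore has to be converted to UTF-8 before it goes into MySQL, and the source encoding depends on the column type. The Python sketch below shows the conversion on raw bytes. It assumes a code page 1252 collation for varchar columns, so check your actual collation. In practice the database driver may already hand you decoded strings.

```python
NON_UNICODE = {"char", "varchar", "text"}
UNICODE = {"nchar", "nvarchar", "ntext"}


def to_utf8(raw, sql_type, codepage="cp1252"):
    """Convert raw SQL Server column bytes to UTF-8 for a utf8 MySQL column."""
    kind = sql_type.lower()
    if kind in NON_UNICODE:
        text = raw.decode(codepage)       # 8-bit code page from the collation
    elif kind in UNICODE:
        text = raw.decode("utf-16-le")    # UCS-2/UTF-16, little-endian
    else:
        raise ValueError("not a character type: " + sql_type)
    return text.encode("utf-8")
```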
I am moving a table from MySQL to SQL Server 2008 that holds a mixture of languages in one column, e.g. English, Français, Ελλάδα.
When I do this, I either get the Greek characters displayed as ????? or I lose the French/Spanish accents.
I have set my columns up as nvarchar for Unicode and played around with the collations, but I cannot seem to figure this one out.
It turns out my problem was with the actual insert script I was running. When inserting into an NVARCHAR field, you need to add an "N" prefix to the string literal, i.e. myColumn = N'Ελλάδα'.
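In T-SQL, a string literal without N is a varchar literal in the code page of the database's default collation. Characters outside that code page are lost before the value ever reaches the nvarchar column. Depending on the code page, Greek can turn into "?", or accented letters can be reduced (SQL Server may best-fit them to unaccented ones, which this sketch does not model). The following is a minimal Python model, with a hypothetical function name and "?" as the replacement character:

```python
def insert_literal(value, n_prefix, db_codepage="cp1252"):
    """Model what an NVARCHAR column receives from a T-SQL string literal."""
    if not n_prefix:
        # 'abc' is varchar: it passes through the database's default code page.
        value = value.encode(db_codepage, errors="replace").decode(db_codepage)
    # N'abc' is already Unicode, and the NVARCHAR column keeps it intact.
    return value


without_n = insert_literal("Ελλάδα", n_prefix=False)  # "??????"
with_n = insert_literal("Ελλάδα", n_prefix=True)      # "Ελλάδα"
```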
What SQL commands are you using?
When inserting into MS SQL you need to use the COLLATE keyword for each column that has a special collation.
You should ensure that characters from MySQL are correctly converted from whatever encoding you use there (probably UTF-8?) to the UCS-2 (Unicode) encoding used by SQL Server.
See, for example, "Insert UTF8 data into a SQL Server 2008".
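The conversion itself is a decode from UTF-8 followed by an encode to UTF-16. The little-endian form is how NVARCHAR data is laid out. Characters above U+FFFF become surrogate pairs of two 16-bit units. SQL Server 2008 counts each pair as two characters, because supplementary-character (_SC) collations only arrived in SQL Server 2012. A short Python sketch:

```python
def utf8_to_nvarchar_bytes(raw_utf8):
    """Re-encode UTF-8 bytes (as sent by MySQL) as UTF-16LE (NVARCHAR layout)."""
    utf16 = raw_utf8.decode("utf-8").encode("utf-16-le")
    code_units = len(utf16) // 2  # 16-bit units; a surrogate pair counts as 2
    return utf16, code_units


greek_bytes, greek_units = utf8_to_nvarchar_bytes("Ελλάδα".encode("utf-8"))
emoji_bytes, emoji_units = utf8_to_nvarchar_bytes("\U0001F4CA".encode("utf-8"))
```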
I wish to develop a client-server application in VB.NET, and I want to store some fields in Unicode. Following the MySQL documentation, I declared those fields as varchar with the UTF-8 character set.
I can insert data using the MySQL Connector command object, but when I try to display the data in a DataGridView, some junk characters appear.
What am I missing?
I don't know VB.NET, but you should be able to set the encoding of the connection from your application to MySQL when setting up the connection. Is that part set to UTF-8 as well?
Alternatively you can try issuing the following MySQL command after you connect:
SET NAMES utf8
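The "junk" is usually mojibake. The UTF-8 bytes stored by MySQL are being interpreted in a single-byte character set somewhere between the server and the grid, which is what a mismatched connection character set would produce. The following is a minimal Python illustration. Latin-1 serves as the wrong charset here because it maps every byte, so the damage can be reversed exactly; MySQL's own latin1 is actually closer to Windows-1252.

```python
def mojibake(text):
    """Show UTF-8 text as a client decoding the bytes as Latin-1 would."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


shown = mojibake("café")   # "cafÃ©"
restored = repair(shown)   # "café"
```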