SSIS: Convert English BlobColumn to String ending up with Chinese characters - MySQL

I am transferring data from MS SQL to MySQL. The transfer works, but I am having trouble with a BlobColumn. I am doing the transfer with a script component and building the insert statement in code. I have several blob columns that are 'text' columns in MySQL. I am converting them like this:
Replace(System.Text.Encoding.Unicode.GetString(Row.link_desc.GetBlobData(0, Convert.ToInt32(Row.link_desc.Length))), "'","\'")
It transfers the contents, but they come out as Chinese characters after the transfer. I assume this has something to do with the encoding, but I am not sure what.

Sounds to me like the data coming in may be ASCII and your encoding is Unicode. Try:
Replace(System.Text.Encoding.ASCII.GetString(Row.link_desc.GetBlobData(0, Convert.ToInt32(Row.link_desc.Length))), "'","\'")
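That would also explain the symptom: when single-byte text is decoded with Encoding.Unicode (UTF-16), each pair of bytes is read as one 16-bit code unit, and those values often land in the CJK block, which is why the output looks Chinese. As a minimal C# sketch of the script-component conversion, assuming the source column really is ASCII or UTF-8 (Row.link_desc comes from the question; the rest is illustrative only):

// Inside the script component's ProcessInputRow method (sketch, not the poster's exact code).
// Pull the raw bytes out of the blob column once.
byte[] raw = Row.link_desc.GetBlobData(0, Convert.ToInt32(Row.link_desc.Length));

// Decode with the encoding that matches how the source column is actually stored:
// Encoding.ASCII for plain ASCII, Encoding.UTF8 or Encoding.GetEncoding(1252) otherwise.
string text = System.Text.Encoding.ASCII.GetString(raw);

// Escape single quotes before embedding the value in the hand-built MySQL INSERT.
string escaped = text.Replace("'", "\\'");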

Related

UTF-8 encoded MS Access table

I have an MS Access table which is encoded in the UTF-8 charset, and the characters appear like this:
Participació en comissió
If I UTF-8 decode this text I get the correct text:
Participació en comissió
How can I utf-8 decode several Access table columns? I would like to end up with the same MS Access database but with the columns converted (utf-8 decoded). I cannot figure out an easy way to do this conversion.
Thanks in advance.
--
More clarifications:
So how did you decode the text that you have in the question?
I simply put the sentence into an online UTF-8 decoder, but it crashes when there is a lot of text. FYI, the Access table comes from an MS SQL Server database with Modern_Spanish_CI_AS collation and a varchar(MAX) field. Maybe there is a way to perform the conversion while exporting the table from MS SQL Server?
While searching for a solution I found this post, which has a function to decode UTF-8 fields right from MS SQL Server. I tested it and it works perfectly, although it is quite slow. Hope this helps someone else with the same problem.
Open a new query editor and copy & paste the function provided in this link:
Convert text value in SQL Server from UTF8 to ISO 8859-1
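If it is acceptable to run the fix outside SQL Server, the transformation is the same idea shown above: re-encode the garbled string with the single-byte charset it was mis-read as, then decode the resulting bytes as UTF-8. A minimal C# sketch (the names and the ISO-8859-1 assumption are illustrative, not taken from the linked post):

using System;
using System.Text;

// Sketch: undo "UTF-8 bytes read as ISO-8859-1" mojibake.
string garbled = "Participació en comissió";
byte[] utf8Bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(garbled);   // recover the original UTF-8 bytes
string repaired = Encoding.UTF8.GetString(utf8Bytes);                      // decode them properly
Console.WriteLine(repaired);   // Participació en comissió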

How to export text data with Hebrew and special characters from SAS to ACCESS?

I have a SAS table that contains hundreds of thousands of rows and several text fields, and I need to import this table into an Access database.
The fields contain names in Hebrew characters and special characters such as commas, colons, brackets, quotes, double quotes and any other character you can think of.
I've tried exporting the table as a CSV file and importing it into my ACCESS database and encountered 2 issues:
Access does not recognize the Hebrew characters
Every time there is a special character that is also defined as a delimiter in the access import query, the data is read incorrectly.
Any ideas?
I'm using SAS 9.2 and Access 2010 on Windows XP. I'll probably be upgrading to Windows 7 and SAS 9.4 soon so I can have integrated connectivity between Access and SAS. Does anyone know if that solves these problems?
Thanks.
Okay folks, I found the answer, and it's really simple.
Instead of exporting to a CSV file and then to Access, there is an option to export data directly from SAS to an Access database (somehow I missed it before...).
It seems to work well. It keeps the Hebrew characters and doesn't mess up the data. The SAS table and the Access table are not linked, but that's not an issue in my current application.
Code used:
PROC EXPORT DATA=lib.table
    OUTTABLE= "table1"
    DBMS=ACCESS REPLACE;
    DATABASE= "L:\test.accdb";
RUN;

Problems importing Excel data into MySQL via CSV

I have 12 Excel files, each one with lots of data organized in 2 fields (columns): id and text.
Each Excel file uses a different language for the text field: Spanish, Italian, French, English, German, Arabic, Japanese, Russian, Korean, Chinese, Japanese and Portuguese.
The id field is a combination of letters and numbers.
I need to import every excel into a different MySQL table, so one table per language.
I'm trying to do it the following way:
- Save the Excel file as a CSV file
- Import that CSV into phpMyAdmin
The problem is that I'm running into all sorts of problems and can't import the files properly, probably because of encoding issues.
For example, with the Arabic one, I set everything to UTF-8 (the database table field and the CSV file), but when I do the import I get weird characters instead of the normal Arabic ones (if I copy them in manually, they show fine).
Another problem is that some texts contain commas, and since the CSV file also uses commas to separate fields, the imported texts get truncated at the first comma.
Also, when saving as CSV, the characters get messed up (as with the Chinese one), and I can't find an option to tell Excel what encoding to use for the CSV file.
Is there any "protocol" or "rule" I can follow to make sure I do this the right way? Something that works for every language? I'm trying to pay attention to the character encoding, but even then I still get weird results.
Maybe I should try a different method instead of CSV files?
Any advice would be much appreciated.
OK, how did I solve all my issues? FORGET ABOUT EXCEL!!!
I uploaded the Excel files to Google Docs spreadsheets, downloaded them as CSV, and all the characters were perfect.
Then I just imported each one into the corresponding fields of the tables, using a utf8_general_ci collation, and now everything is stored perfectly in the database.
One standard thing to do in a CSV file is to enclose fields containing commas in double quotes. So
ABC, johnny can't come out, can he?, newfield
becomes
ABC, "johnny can't come out, can he?", newfield
I believe Excel does this if you choose to save as file type CSV. A problem you'll have is that CSV is ANSI-only. I think you need to use the "Unicode Text" save-as option and live with the tab delimiters or convert them to commas. The Unicode text option also quotes comma-containing values. (checked using Excel 2007)
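If you end up generating the CSV yourself rather than letting Excel write it, that quoting rule is easy to apply. A small C# sketch, purely illustrative and not part of the original answer:

using System;

// Quote a CSV field when it contains a delimiter, a quote, or a newline,
// doubling any embedded quotes per the usual CSV convention.
static string QuoteCsvField(string field)
{
    if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        return field;
    return "\"" + field.Replace("\"", "\"\"") + "\"";
}

Console.WriteLine(string.Join(",",
    QuoteCsvField("ABC"),
    QuoteCsvField("johnny can't come out, can he?"),
    QuoteCsvField("newfield")));
// ABC,"johnny can't come out, can he?",newfield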
EDIT: Add specific directions
In Excel 2007 (the specifics may be different for other versions of Excel)
Choose "Save As"
In the "Save as type:" field, select "Unicode Text"
You'll get a Unicode file. UCS-2 Little Endian, specifically.

Importing csv file to mssql db with bulk insert changes special characters "æøå" to some unknown encoding

I am trying to import a CSV file into an MSSQL database with the BULK INSERT method. The problem is that it contains special characters (the Norwegian letters æ, ø and å), and after the insert has run they are replaced by characters whose encoding I don't recognize.
To be more specific, ø is replaced with °, å is replaced with Õ and æ is replaced with µ.
I also tried converting the file to UTF-8 before inserting, but I understand that the BULK INSERT method doesn't support this. The respective UTF-8 encodings of æøå then ended up as something like +©.
I have also tried the import wizard, but since I have an identity column, the wizard just inserts a 0 for every record, rendering the import useless for copying.
Does anyone know how I can set the encoding when running BULK INSERT, since it otherwise works perfectly with the identity column? I am using MS SQL Server Management Studio 2008.
I think there are two ways:
Specify CODEPAGE = RAW in your BULK INSERT command (see MSDN).
Create a format file and specify a collation for each column.

Problem with charset

I have a MySQL database in UTF-8 format, but the characters inside the database are ISO-8859-1 (ISO-8859-1 strings are stored in UTF-8). I've tried recode, but it only converted e.g. ü to ü. Does anybody out there have a solution?
If you tried to store ISO-8859-1 characters in a database which is set to UTF-8, you just managed to corrupt your "special characters" -- MySQL will retrieve the bytes from the database and try to assemble them as UTF-8 rather than ISO-8859-1. The only way to read the data correctly is to use a script which does something like:
ResultSet rs = ...
byte[] b = rs.getBytes( COLUMN_NAME );
String s = new String( b, "ISO-8859-1" );
This ensures you get the bytes (which, from what you said, came from an ISO-8859-1 string) so you can assemble them back into an ISO-8859-1 string.
There is another question as well: what do you use to "view" the strings in the database? Could it be that your console simply doesn't have the right charset to display those characters, rather than the characters being stored wrongly?
NOTE: Updated the above after the last comment
I just went through this. The biggest part of my solution was exporting the database to .csv and doing a Find / Replace on the characters in question. The character at issue may look like a space, but copy it directly from the cell as your Find parameter.
Once this is done - and missing this is what took me all morning:
Save the file as CSV (MS-DOS)
Excellent post on the issue
Source of MS-DOS idea