I have a CSV file that contains Unicode characters (specifically Hindi) that I need to import using SSIS. When I set the connection to Unicode, SSIS cannot find the CRLF row delimiters. If I uncheck the Unicode checkbox, it finds the CRLF delimiters just fine.
How can I correctly import this data?
Rather than ticking the "Unicode" checkbox beside the "Locale" drop-down, leave the checkbox blank and pick the "65001 (UTF-8)" option from the "Code page" drop-down instead. I discovered this just today after wasting some 30 minutes trying various combinations of encodings; it may work in your case as well.
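Note that code page 65001 only works if the file really is UTF-8. If yours turns out to be in some other encoding, one option is to re-encode it first; here's a minimal Python sketch (the filenames and the utf-16 source encoding are assumptions, adjust them to your file):

# Re-encode a CSV to UTF-8 so SSIS can read it with code page 65001.
# newline="" preserves the CRLF row delimiters during the round trip.
with open("hindi_data.csv", "r", encoding="utf-16", newline="") as src:
    text = src.read()

with open("hindi_data_utf8.csv", "w", encoding="utf-8", newline="") as dst:
    dst.write(text)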
In Azure Data Studio there's a setting called queryEditor.results.saveAsCsv.delimiter that lets the user choose the default delimiter when exporting query results to a CSV file. However, I'm struggling to find a way to set this property to a tab. I tried \t, but it seems the property only accepts a single character, so it considers only the \. I searched a lot and couldn't find any solution. Any ideas?
Make sure you're placing the escaped character in double quotes:
"queryEditor.results.saveAsCsv.delimiter": "\t",
I don't think it's limited to a single character. I wanted my CSV exports to have one record per line, so my line separator needs to be CRLF on a Windows machine. It's set to:
"queryEditor.results.saveAsCsv.lineSeperator": "\r\n",
When I use the import feature of phpMyAdmin, it doesn't import non-ASCII characters such as ä, ö, ü, and õ, and it drops the rest of the word after them.
When I open the CSV file with Notepad it displays the non-ASCII characters normally, but when I try to import it, it doesn't work.
Entering those missing characters manually works and MySQL saves them just as it should. Any thoughts?
MySQL will do this when it encounters a character that is invalid under the current character set.
You're not mentioning what tool you are using to import the data, but you should be able to specify a character set when importing. If that character set matches the database's, everything will be fine. Also, make sure the file is actually encoded in that character set.
If your import tool doesn't offer the option of selecting the character set, you could try phpMyAdmin which does.
Make sure you know what the encoding of your CSV file is - it should be UTF-8. Then, before you import, run SET NAMES utf8 so the connection uses the same character set, and it should work fine.
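If you're not sure about the file's encoding, a quick way to verify is to try decoding it as UTF-8. A minimal Python sketch (the filename is just an example):

# Quick check: does the file decode cleanly as UTF-8?
with open("import.csv", "rb") as f:
    raw = f.read()

try:
    raw.decode("utf-8")
    print("File is valid UTF-8")
except UnicodeDecodeError as e:
    print(f"Not valid UTF-8 at byte offset {e.start}: {raw[e.start]:#x}")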
My flat file's fields are tab delimited (\t) with a quotation mark (") text qualifier, and each row is linefeed (LF) separated.
My SSIS package works great when
no fields are text qualified
any field EXCEPT the last column is text qualified
When the last column is text qualified, my package errors out saying it couldn't find the delimiter for my last column... any ideas?
In a programmer's life, 3 problems (that often take hours to track down) are certain: permissions, case sensitivity, and line endings.
In my case, it is line endings. When a CRLF sits directly against the text qualifier ("), SSIS apparently doesn't interpret the text qualifier correctly, but it does see the line break.
Here's what my setup looked like when I was having issues, and here's what it looked like after changing the column delimiter (original screenshots omitted).
The official answer here, then, is to change the line endings. The unfortunate side effect is changing a package that works on all the other files - leading to a need to convert files with CRLF to LF before they hit this package (see the sketch below), or to unsightly workarounds as seen here.
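If you go the pre-conversion route, the normalization itself is easy to script. A minimal Python sketch (filenames are just examples); working on raw bytes avoids disturbing the encoding:

# Convert CRLF line endings to LF before the file hits the SSIS package.
with open("input.csv", "rb") as src:
    data = src.read()

with open("input_lf.csv", "wb") as dst:
    dst.write(data.replace(b"\r\n", b"\n"))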
Use this application in your SSIS Execute Process Task:
http://www.softsea.com/review/U2WIN.html
and run it against your flat file folder.
Try setting TextQualified = False on your last column to see if it helps.
I'm creating an SSIS package to get a .csv file onto my local server and transfer it to FTP.
When I get my CSV onto the FTP server and open it in Excel, my data gets shifted over into other columns. Is there any internal setting I need to change?
I also tried a different text qualifier, but that still did not work.
It sounds like there may be hidden characters in your data set. If you are using commas, you may want to consider a less common character for the delimiter, such as a pipe (|). An address, for instance, may naturally contain commas; a pipe showing up in an address field is probably a typo and is far less likely. Things that shift data cells are often tab characters and CRLF. You can also open your data set in a text editor like Notepad++ and choose the "Show all Characters" option under the "View -> Show Symbols" menu to see what the exact character is (or script the search, as sketched below). If it's rampant in your data set, you can use the replace function within the Derived Column Task to scrub the data as it comes out of the data source.
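If Notepad++ isn't handy, the same hunt for hidden characters can be scripted. A minimal Python sketch (the filename and the set of suspect characters are assumptions; extend them to taste):

# Report tab, stray CR, and pipe characters that commonly shift cells.
SUSPECTS = {"\t": "TAB", "\r": "CR", "|": "PIPE"}

with open("data.csv", "r", encoding="utf-8", newline="") as f:
    for line_no, line in enumerate(f, start=1):
        line = line.rstrip("\r\n")  # drop the legitimate line terminator
        for ch, name in SUSPECTS.items():
            col = line.find(ch)
            if col != -1:
                print(f"line {line_no}, col {col + 1}: found {name}")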
I have 12 Excel files, each one with lots of data organized in 2 fields (columns): id and text.
Each Excel file uses a different language for the text field: Spanish, Italian, French, English, German, Arabic, Japanese, Russian, Korean, Chinese, Japanese, and Portuguese.
The id field is a combination of letters and numbers.
I need to import each Excel file into a different MySQL table, so one table per language.
I'm trying to do it the following way:
- Save the Excel file as a CSV file
- Import that CSV in phpMyAdmin
The problem is that I'm running into all sorts of issues and can't import them properly, probably because of encoding problems.
For example, with the Arabic file, I set everything to UTF-8 (the database table field and the CSV file), but when I do the import I get weird characters instead of the proper Arabic ones (if I copy them in manually, they display fine).
Another problem is that some texts contain commas, and since the CSV file also uses commas to separate fields, the imported texts get truncated wherever there's a comma.
Also, when saving as CSV, the characters get messed up (as with the Chinese file), and I can't find an option to tell Excel what encoding to use for the CSV file.
Is there any "protocol" or "rule" that I can follow to make sure that I do it the right way? Something that works for each different language? I'm trying to pay attention to the character encoding, but even with that I still get weird stuff.
Maybe I should try a different method instead of CSV files?
Any advice would be much appreciated.
OK, how did I solve all my issues? FORGET ABOUT EXCEL!!!
I uploaded the Excel files to Google Docs spreadsheets, downloaded them as CSV, and all the characters were perfect.
Then I just imported them into the corresponding fields of the tables, using the utf8_general_ci collation, and now everything is stored perfectly in the database.
One standard thing to do in a CSV is to enclose fields containing commas with double quotes. So
ABC, johnny can't come out, can he?, newfield
becomes
ABC, "johnny can't come out, can he?", newfield
I believe Excel does this if you choose to save as file type CSV. A problem you'll have is that Excel's CSV output is ANSI-only. I think you need to use the "Unicode Text" save-as option and either live with the tab delimiters or convert them to commas. The "Unicode Text" option also quotes comma-containing values (checked using Excel 2007).
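If you end up producing the CSV yourself instead, note that this quoting is exactly what a standard CSV writer does for you. A minimal Python sketch (the field values echo the example above; the output filename is just an example):

import csv

# csv.writer quotes a field automatically when it contains the delimiter.
with open("out.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["ABC", "johnny can't come out, can he?", "newfield"])
# out.csv now contains: ABC,"johnny can't come out, can he?",newfield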
EDIT: Add specific directions
In Excel 2007 (the specifics may be different for other versions of Excel)
Choose "Save As"
In the "Save as type:" field, select "Unicode Text"
You'll get a Unicode file. UCS-2 Little Endian, specifically.
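If you need UTF-8 with commas rather than UTF-16 with tabs, the saved file can be converted afterwards. A minimal Python sketch (filenames are just examples; Python's utf-16 codec consumes the byte-order mark that Excel writes):

import csv

# Rewrite Excel's "Unicode Text" output (UTF-16, tab-delimited)
# as a UTF-8, comma-delimited CSV.
with open("export.txt", "r", encoding="utf-16", newline="") as src, \
     open("export.csv", "w", encoding="utf-8", newline="") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst)
    writer.writerows(reader)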