Importing CSV in Talend: double-quote-delimited column is ignored

I have a CSV file with a double quote delimited timestamp and an email field, e.g.
Timestamp,Email
"2017-01-01 00:00:01",abc#email.com
"2017-01-01 00:02:31",sampleaddress#email2.com
I have defined a metadata source for the CSV file, and it was correctly able to identify and type the two columns. When I execute the package, however, it treats the timestamp column as though it doesn't exist (usually I get an error like 'Unparseable date: "abc#email.com"').
I have tried altering the tFileInputDelimited component with a number of settings, including the escape and text enclosure options, and importing the timestamp as both a date and a string (if I import it as a string, the timestamp field contains the email address and the email field is blank), but I am unable to get the import to recognise the double-quote-delimited timestamp column.
I'm assuming that I have done something that is causing it to escape the whole timestamp value, but I can't think of what that might be.

If you really want to keep the double quotes around the timestamp in your input file, try this date pattern:
"\"yyyy-MM-dd HH:mm:ss\""
This way, you specify that the double quotes (\") are expected in the input string.

If you can alter the input data, you should either enable quotes for all fields or for none.
If this is not an option, you could also read the file with tFileInputFullRow, remove the quotes with a string replace, for example, and process the data afterwards with tDenormalize into column data.

If you are using metadata, then:
Ensure that the component is referring to the repository (Component -> Property Type = Repository)
Modify the metadata to change the text enclosure character to "\""

Related

Unable to load csv file into Snowflake

I am getting the below error when I try to load a CSV from my system into a Snowflake table:
Unable to copy files into table.
Numeric value '"4' is not recognized File '#EMPP/ui1591621834308/snow.csv', line 2, character 25 Row 1, column "EMPP"["SALARY":5] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
You appear to be loading your CSV with the file format option of FIELD_OPTIONALLY_ENCLOSED_BY='"' specified.
This option will allow reading any fields properly quoted with the " character, and even support such fields carrying the delimiter character as well as the " character if properly escaped. Some examples that could be considered valid:
CSV FORM | ACTUAL DATA
------------------------
abc | abc
"abc" | abc
"a,bc" | a,bc
"a,""bc""" | a,"bc"
In particular, notice that the final example follows the specified rule:
When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:
A ""B"" C
If your CSV file carries quote marks within the data but is not actually quoting the fields (and delimiters and newlines do not appear within data fields), you can remove the FIELD_OPTIONALLY_ENCLOSED_BY option from your file format definition and just read the file as plain comma-delimited fields.
If your CSV does use quoting, ensure that whatever is producing the CSV files is using a valid CSV format writer and not simple string munging, and recreate it with the quotes properly escaped. If the above data example is to be considered valid in quoted form, it must instead appear within the file as "4" or 4.
The error message is saying that you have a value in your file that contains a "4, which is being loaded into a table column that expects a number. Since that isn't a number, it fails. This appears to be happening in the very first row of your file, so you could open it up and take a look at the value. If it's just one record, you can add ON_ERROR = 'CONTINUE' to your command so that it skips it and moves on.
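As a rough sketch of how these options fit together (the stage name and file format name below are placeholders, not taken from the original post):

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';  -- drop this option if the fields are not actually quoted

COPY INTO EMPP
  FROM @my_stage/snow.csv
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  ON_ERROR = 'CONTINUE';  -- skips bad records instead of failing the whole load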

Losing data in random fields when importing from a file into a table using phpMyAdmin

I have an Access DB. I exported the tables to XLSX, then saved them as .ods using OpenOffice because I found out that phpMyAdmin/MySQL no longer supports Excel files. I have my MySQL database formatted exactly as it should be to accept the data. I import and everything seems fine except one little detail.
In some fields, the value is NULL instead of the value it should have according to the .ods file. Some rows show the same value for that field correctly, some show NULL.
Also, the "faulty" rows have some fields that show the value 0 for fields that where empty in the imported file (instead of NULL). Default value for those fields in mySQL is NULL. Each row has many fields like that and all of the same data type (tinyint). Some appear correctly NULL and some have the value 0....
I can't see a pattern on all these.
Any help is appreciated.
Check that imported strings have ("") quotes and NULLs do not, and that all fields are separated appropriately, usually with a "," comma and the record/row delimited by a ";" semicolon. The best way to check what MySQL is looking for is to export some existing data to the same format and compare it against what you are trying to import. One little missed quote and the deal is off. Be consistent in the use of either double (") or single (') quotes; the backtick (`) character is not used, as far as I know. If you are pushing your data through an application that applies "smart quotes", as MS Word or OpenOffice can, this too can cause issues. Add the word NULL, either with or without quotes, in your CSV import where appropriate.
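A minimal way to produce such a reference export (the table name and output path here are placeholders) is MySQL's SELECT ... INTO OUTFILE:

SELECT *
INTO OUTFILE '/tmp/reference_sample.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM my_table
LIMIT 10;

Note that the server's secure_file_priv setting restricts where the file may be written.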

Using Excel to create a CSV file with special characters and then importing it into a DB using SSIS

Take this XLS file
I then save this XLS file as CSV and then open it up with a text editor. This is what I see:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
I see that the double quote character in column C was stored as AB""C, the column value was enclosed with quotations and the double quote character in the data was replaced with 2 double quote characters to indicate that the quote is occurring within the data and not terminating the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
I am a little surprised that not all of the column values are enclosed in quotes, but even this seems reasonably OK if I assume that Excel only adds text qualifiers when special characters like a comma or a double quote character exist in the data.
Now I try to use SQL Server to import the csv file. Note that I specify a double quote character as the Text Qualifier character.
And a comma as the Column delimiter character. However, note that SSIS imports column 3 incorrectly, e.g., not translating the two consecutive double quote characters as a single occurrence of a double quote character.
What do I have to do to get Excel and SSIS to get along?
Generally people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.
I find that if I modify the file from this
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
...to this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"
i.e., removing the two consecutive quotes in column C's value, the data is loaded properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C is not terminating that column value? Is it because the following character is not a comma column delimiter or a row delimiter (CRLF)? And why does Excel export it this way?
According to Wikipedia, here are a couple of traits of a CSV file:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file that could contain ANY special characters used as column delimiters, text delimiters or row delimiters in the data? There's no reason that it can't work using the approach specified in Wikipedia, which is what I thought the old MS DTS packages used to do...
Update:
If I use Notepad to change the input file to
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"
Excel reads it just fine
but SSIS returns
The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Conclusion:
Just like the error message says in your update...
The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Confirmed bug in Microsoft Connect. I encourage everyone reading this to click on this aforementioned link and place your vote to have them fix this stinker. This is in the top 10 of the most egregious bugs I have encountered.
Do you need to use a comma delimiter?
I used a pipe delimiter with no Text qualifier and it worked fine. Here is my output from the text file.
1|ABC|AB"C|D,E|F|03|3,2
You have 3 options in my opinion.
Read the data into a stage table.
Run any update queries you need on the columns
Now select your data from the stage table and output it to a flat file (a SQL sketch of this option follows below).
OR
Use pipes as your delimiters.
OR
Do all of this in a C# application and build it in code.
You could send the row to a script in SSIS and parse and build the file you want there as well.
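For the staging-table route, a minimal sketch might look like this (the StageRaw table and RawLine column are illustrative names, not from the original post):

-- Bulk-load each raw line of the CSV into a one-column stage table with no text qualifier,
-- then collapse the doubled quotes that Excel writes for embedded double quote characters.
UPDATE StageRaw
SET RawLine = REPLACE(RawLine, '""', '"');

-- Select the cleaned rows back out, either into the real table or into a new flat file.
SELECT RawLine
FROM StageRaw;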
Using text qualifiers and "character" delimited fields is problematic for sure.
Have Fun!

How to handle single and double quotes in an XLSX spreadsheet when converted to CSV for phpMyAdmin import

My client is providing me with an XLSX spreadsheet that, in some columns, can have single and/or double quotes. I'm opening it up in LibreOffice and saving it as a CSV. Then I try to import it in phpMyAdmin, but every time the import gets tripped up on a line with either single or double quotes, depending on which I indicate to use for escaping.
When saving the XLSX as a CSV I select UTF-8 for encoding (it's defaulting to Windows-1252), comma for column delimiter, leave "Save cell content as shown" checked. For "Text delimiter" and "Quote all text cells", I've tried both options each (single and double quotes for delimiter and checked/unchecked for Quote).
Then in phpMyAdmin, for the import I leave UTF-8 selected, columns enclosed with a double quote (or single quote, matching what I selected in LibreOffice), and for columns escaped with I've tried backslash, double quote, and single quote.
In ALL cases I keep getting the error "Invalid column count in CSV input on line n." The line number depends on what I selected for column escape/delimiter (single or double quote). If I selected double quote as delimiter, I get the error on the first line that has a column with an unescaped single quote in it, and vice versa for single quote delimiters.
How can I get this spreadsheet imported with both single and double quotes in the cells?
Okay, after some more research and "fiddling around" I figured it out.
In my situation, for the import I selected CSV using LOAD DATA, used double quotes for "Columns enclosed with", and cleared the "Columns escaped with" option.
LOAD DATA apparently tells phpMyAdmin to allow MySQL to handle the file directly. I'm not sure why this would affect my issue if I can specify delimiters, etc for the "regular" CSV import selection, but it seems to have worked for me!
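For reference, a minimal sketch of roughly what that import corresponds to (the table and file names are placeholders) is a MySQL LOAD DATA statement with the escape character cleared:

LOAD DATA LOCAL INFILE 'export.csv'
INTO TABLE my_table
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY ''
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

With ENCLOSED BY '"' and an empty ESCAPED BY, MySQL still treats a doubled "" inside a quoted field as a single quote character, which matches how LibreOffice writes the CSV.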
Hope this helps someone else out.

OpenOffice - CSV export: is there a default escape character?

As far as I can see, when it comes to saving a file as a CSV file, OpenOffice encloses all strings in quote characters.
So is there any need for an escape character?
and related to this question:
Does OpenOffice have a default escape character?
I'm also wondering if there is a way to choose the escape character when saving from OpenOffice as CSV. phpMyAdmin was not accepting a 9,000-line, 50+ column spreadsheet in .ods format, and there doesn't seem to be a way to choose the escape character when saving as CSV.
So I had to save as CSV, open it in Word, and use some find/replace tricks to change the escape character to \ (backslash). The default is to use double quotes to escape double quotes, and phpMyAdmin won't accept that format.
To properly convert the file to use \ (backslash) to escape double quotes, you have to do this:
Pick a placeholder character string, e.g. 'abcdefg', that does not occur anywhere in the CSV.
Find/replace """ (three double quotes in a row) with the placeholder. This is to prevent possibly incorrect results in the next step.
Find/replace "" (two quotes in a row, representing one quote that should be escaped) with \" (backslash double-quote). If you did this without first find/replacing """, it's conceivable you could get a result like "\" instead of \"". Better safe than sorry.
Find/replace the placeholder string with \"" (backslash double-quote double-quote).
That will work, unless you happen to have more than one double quote in a row in your original text fields, which would result in as many as five double quotes in a row in the CSV exported from the .ods or .xlsx file (two double quotes for each escaped double quote, plus another double quote if it's at the end of the field).
Escaping in quotes makes life easier for tools parsing the CSV file.
In a recent version of LibreOffice (3.4.4), the CSV export was not handled correctly by phpMyAdmin. Since LibreOffice doesn't provide an escape character, phpMyAdmin's default "CSV" import option "Columns escaped with:" didn't work well. The data was always inconsistent.
However, using the option CSV using LOAD DATA did work, but only if the value in the "Columns escaped with" option was removed. I presume phpMyAdmin uses the default MySQL LOAD DATA command, and thus control is passed to MySQL for data processing. In my scenario it resulted in an accurate data import.