I am loading a flat file into a SQL database. The flat file is comma delimited. Some of the column values contain a comma without being enclosed in double quotes (e.g. HPPV,TYRE). When I try to use comma as the text qualifier, I get a message saying the column delimiter and text qualifier cannot be the same.
I want to somehow use comma as the text qualifier so that the value HPPV,TYRE is kept as one single entity (HPPVTYRE or HPPV TYRE) instead of spilling over into the next column.
Is there any way to use comma as the text qualifier when it is already the column delimiter?
No, I don't think so but I searched and found this article that might help.
I have read that the delimiter is designed to separate two values in the data file, and that the text qualifier is designed to accommodate scenarios where the delimiter appears in the actual data. But what if the text qualifier could potentially be in the data too?
Example: pipe delimited data, with " as the text qualifier:
"ABC"|"DEF"|"GHI"
Now what if the text qualifier is inside the data:
"ABC"|"DE"F"|"GHI"
Now we're back to square one: we added a text qualifier because our delimiter could be inside the data, but now have the same problem with the text qualifier.
2 questions:
1) What is the point of using a text qualifier if its use is easily broken, just as with the delimiter?
2) It seems to me that the only reliable way to delimit data is to use a delimiter with multiple characters (example: |~| ) and pray that those characters are never in the data. Is this correct?
Thank you in advance.
I think your questions could be answered together:
The purpose of a text qualifier is to escape inline characters that are the same as the column delimiter. In your case the delimiter is the pipe (vertical bar), so the text qualifier is there to protect any pipe characters inside your text. If the data can also contain the qualifier character itself, you will hit a parsing issue again; to avoid that, choose a qualifier that is different from any character you need to escape.
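As a rough illustration of the "DE"F" case from the question, here is a minimal Python sketch using the standard csv module. It relies on the CSV convention of doubling a qualifier that appears inside a qualified value; whether a particular loading tool honours that convention is an assumption you would need to check. The helper names write_row and read_row are made up for this example.

```python
import csv
import io


def write_row(fields):
    """Encode one row as pipe-delimited text, qualifying every field with "."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="|",
        quotechar='"',
        quoting=csv.QUOTE_ALL,  # always wrap fields in the qualifier
        lineterminator="\n",
    )
    writer.writerow(fields)  # embedded qualifiers are doubled by default
    return buf.getvalue()


def read_row(line):
    """Decode one pipe-delimited line, undoing the doubled qualifiers."""
    return next(csv.reader([line], delimiter="|", quotechar='"'))


encoded = write_row(["ABC", 'DE"F', "GHI"])
decoded = read_row(encoded)
print(encoded, decoded)
```

The middle value is written as "DE""F", so the qualifier inside the data no longer ends the field early.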
Is there a way to remove all non-numeric characters from an nvarchar text string in MySQL?
I kind of got it working with the following select statement, but I am having trouble writing the UPDATE statement so that I can use it in a MySQL trigger every time a row is inserted or updated. I need to remove the dash "-" and any spaces and leave only the numeric characters.
Here is a sample of the table structure.
http://sqlfiddle.com/#!2/90668/7/0
The field that contains all the dashes and the spaces is 'locationNumber'.
Thank you in Advance.
I'm trying to get SAP BusinessObjects Data Services Designer 12.2.3.1 to read a CSV file that contains rows like:
"00501","P",0,0,"Nassau-Suffolk, NY","SUFFOLK"
The results I'm getting with column delimiter set to Comma, however, read that line as seven columns rather than six:
"00501" "P" 0 0 "Nassau-Suffolk NY" "SUFFOLK"
What additional options do I need in order to read the file as-is, without external preprocessing? (If this isn't possible, please say so and I'll stop getting grey matter all over this nice brick wall. Thanks!)
Solution to load data with double quote:
The solution was to set the Text delimiter to ".
Text: Denotes the start and end of a text string. All characters (including those specified as column delimiters) between the first and second occurrence of this character is a single text string. The treatment of the row characters is defined by the "Row within text string" setting.
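Outside Data Services, the same idea can be checked with a small Python sketch. It only shows what a text delimiter of " does to the sample row from the question; it is not the Data Services implementation, and the function name parse_line is made up for this example.

```python
import csv


def parse_line(line):
    """Split one comma-delimited line, treating " as the text delimiter."""
    return next(csv.reader([line], delimiter=",", quotechar='"'))


sample = '"00501","P",0,0,"Nassau-Suffolk, NY","SUFFOLK"'
row = parse_line(sample)
print(len(row), row)
```

With the text delimiter set, the comma inside "Nassau-Suffolk, NY" stays within one field and the row has six columns instead of seven.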
Has anyone ever written peoplecode to read a CSV file that uses semi-colons as a delimiter and comma as a decimal divider?
I need to read an Italian CSV file that uses those characters instead of the normal comma and decimal point.
What was your experience? Anything to watch out for?
Two options, one using file layout and the other without.
Option A) Using file layout: set the following properties on the file layout
Definition Delimiter = "semicolon"
FieldType for the numeric field having comma as decimal divider = "character"
After reading the field, replace comma with a period and use value(&new_str) on the new string to convert it to number
Option B) Without file layout:
Open the input file in your code.
Loop through each line.
Use split to fetch the field values, e.g.
&ret_arr = split(&str_line,";");
&ret_arr array will be populated with field values, access them using &ret_arr[1],..[2] etc.
Replace comma from that numeric field and use value(&new_str) for conversion.
Above was my experience (long back), nothing else to watch out for. Hope this helps!
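For readers who want to try the logic of Option B outside PeopleCode, here is a rough Python sketch of the same steps: split on the semicolon, then swap the decimal comma for a period before converting. The function name parse_italian_line and the sample values are made up, and the sketch assumes the numeric fields carry no thousands separators and that no field contains a semicolon.

```python
from decimal import Decimal


def parse_italian_line(line, numeric_indexes):
    """Split a semicolon-delimited line and convert decimal-comma fields."""
    fields = line.rstrip("\r\n").split(";")
    for i in numeric_indexes:
        # Replace the decimal comma with a period, then convert to a number.
        fields[i] = Decimal(fields[i].replace(",", "."))
    return fields


print(parse_italian_line("Rossi;1234,56;Milano", [1]))
```

Decimal is used here rather than float so that amounts keep their exact decimal value; that is a choice made for this sketch, not something the answer above prescribes.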
If you are using a file layout, it probably won't read the commas as decimal separators, although you can tell it to use the semicolon as the field separator. Your other option is to read all fields in as text, replace the comma with a period in the number fields, and then call value() on the string to convert it to a number.
From the CSV spec (RFC 4180), Spaces are considered part of a field and should not be ignored. Obviously if the field contains double quotes it should retain the spaces inside the quotes.
My question is, what about spaces outside of the double quotes? The only way I can see this happening is if the tool that generated the CSV didn't do it properly.
Example: one, "two" ,three
Should the space before and after "two" be included?
That cell is invalid - to properly code that row it should be:
one," ""two"" ",three
Double quotes inside a quoted cell must also be escaped (as a doubled double quote), since the double quote is the escape character. If you don't want to preserve the quotes around two, the problem with the row is the spaces before and after the quotes: under RFC 4180 a quoted cell must start and end with the double quote, and an unquoted cell may not contain double quotes. Quotes around a cell with nothing to escape are permitted; RFC 4180 only requires them when the cell contains commas, double quotes, or line breaks.
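To see the doubled-quote encoding in practice, here is a small Python sketch using the standard csv module, whose default dialect follows the same convention. The helper names encode_row and decode_row are made up for this example.

```python
import csv
import io


def encode_row(fields):
    """Write one row with the csv module's default (minimal) quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


def decode_row(line):
    """Read one row back, undoing the doubled quotes."""
    return next(csv.reader([line]))


# The middle cell keeps its spaces and its literal quotes around two.
line = encode_row(["one", ' "two" ', "three"])
print(line)
print(decode_row(line))
```

Only the middle cell is quoted, because it is the only one containing a double quote, and reading the line back returns the spaces and quotes unchanged.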
If I were in your case, I would err on the side of leniency.
I dealt with this using BULK INSERT and BCP format files, where accounting for both the quote and the comma delimiting is tricky. Where the delimiter could vary, say a , " delimiter in some places, we used the lowest common delimiter (the comma in your example) and then stripped out what wasn't needed, such as all the double quotes.
But it could also be that your source data was only comma delimited and this was the actual contents of that field. Either way, I would toss out the quotes when loading the field, in whatever method was appropriate.
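The lenient approach described above could look roughly like this in Python. It is only a sketch of the idea, not the BULK INSERT or BCP setup itself, and the function name split_lenient is made up. It assumes no field contains the delimiter inside quotes; a value such as "Nassau-Suffolk, NY" would be split in two.

```python
def split_lenient(line, delimiter=","):
    """Split on the lowest common delimiter, then drop spaces and outer quotes."""
    cells = line.rstrip("\r\n").split(delimiter)
    # Trim surrounding whitespace first, then any double quotes at the edges.
    return [cell.strip().strip('"') for cell in cells]


print(split_lenient('one, "two" ,three'))
```

For the row in the question this gives three clean values, with the stray spaces and the quotes around two thrown away.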