CSV with Semicolon as Delimiter

Has anyone ever written PeopleCode to read a CSV file that uses semicolons as the delimiter and a comma as the decimal separator?
I need to read an Italian CSV file that uses those characters instead of the usual comma delimiter and period decimal point.
What was your experience? Anything to watch out for?

Two options: one using a file layout and the other without.
Option A) Using a file layout: set the following file layout properties:
Definition Delimiter = "semicolon"
FieldType for the numeric field that uses a comma as the decimal separator = "character"
After reading the field, replace the comma with a period and use Value(&new_str) on the new string to convert it to a number.
Option B) Without a file layout:
Open the input file in your code.
Loop through each line.
Use Split to fetch the field values, e.g.
&ret_arr = Split(&str_line, ";");
The &ret_arr array will be populated with the field values; access them using &ret_arr[1], &ret_arr[2], etc.
Replace the comma in the numeric field with a period and use Value(&new_str) for the conversion.
That was my experience (a long time ago); there was nothing else to watch out for. Hope this helps!
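For comparison, here is a minimal Python sketch of the same idea as Option B (it is not PeopleCode): split each line on ";" and turn the decimal comma into a period before converting. It assumes the Italian file may also use "." as a thousands separator (e.g. 1.234,56), which a plain comma-to-period replace would get wrong, so it strips periods first. The function names and sample data are hypothetical.

```python
import csv
import io
from decimal import Decimal


def parse_italian_number(text: str) -> Decimal:
    """Convert '1.234,56'-style text to Decimal('1234.56').

    Assumption: '.' only ever appears as a thousands separator
    and ',' only as the decimal separator.
    """
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def read_semicolon_csv(data: str, numeric_column: int) -> list[list[object]]:
    rows = []
    # delimiter=";" also lets the csv module honour quoted fields that contain ";"
    for fields in csv.reader(io.StringIO(data), delimiter=";"):
        row: list[object] = list(fields)
        row[numeric_column] = parse_italian_number(fields[numeric_column])
        rows.append(row)
    return rows


sample = "Rossi;1.234,56\nBianchi;7,5\n"
print(read_semicolon_csv(sample, numeric_column=1))
# [['Rossi', Decimal('1234.56')], ['Bianchi', Decimal('7.5')]]
```

Decimal is used instead of float so that amounts like 1234.56 are kept exactly; if your file never uses thousands separators, the first replace is simply a no-op.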

If you are using a file layout, it probably won't read the commas as decimal separators, although you can tell it to use the semicolon as the field separator. Your other option is to read all fields in as text, replace the comma with a period in the number fields, and then call Value() on the string to convert it to a number.

Related

How do I replace a comma with a hyphen within double quotes in JSON?

I am processing a CSV file in Power Automate. Here is one record:
server2,usa,"rebooted,by citrix",25,good
Because "rebooted,by citrix" is data from a single field, splitting on commas leaves the resulting array misaligned.
I want to replace the comma within the double quotes with a hyphen. The expected output is: server2,usa,"rebooted-by citrix",25,good
PrasadAthalye has a nice approach for this.
You could first split on the " character instead. Every other item in the result (the second, fourth, and so on) is text that was inside the quotes, so replace the , character in those items, append all the items to a new array, and join it back together with ". After that you should be able to apply your normal split.
https://powerusers.microsoft.com/t5/Building-Flows/Setting-up-specific-expression-to-remove-comma-inside-strings/m-p/646040/highlight/true#M86288
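The split-on-quote approach can be sketched in Python to show the logic (this is not a Power Automate expression; the function name is hypothetical). After splitting on ", the odd-indexed pieces are exactly the text that sat between quotes, so only those get their commas replaced:

```python
def replace_commas_in_quotes(record: str, replacement: str = "-") -> str:
    parts = record.split('"')
    # Odd-indexed parts (1, 3, 5, ...) were inside double quotes
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace(",", replacement)
    # Put the quote characters back so the record keeps its original shape
    return '"'.join(parts)


line = 'server2,usa,"rebooted,by citrix",25,good'
fixed = replace_commas_in_quotes(line)
print(fixed)             # server2,usa,"rebooted-by citrix",25,good
print(fixed.split(","))  # ['server2', 'usa', '"rebooted-by citrix"', '25', 'good']
```

In a flow, the same steps map onto a split on ", a loop that handles every other item, and a join before the normal comma split.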

Read a list of CSV files in Talend with ; inside fields

I have a list of CSV files that I receive every month for ETL into a database. They are in a folder. My data has ; in many columns as well. For example, the location column contains values like New York; USA, which I want to appear in a single column instead of being split into several columns. How do I specify the delimiter then?
I think you cannot have the field separator inside the field content unless you enclose those values in double quotes (""). For example:
blabla;"New York; USA";blabla
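To illustrate what quoting buys you, here is how a quote-aware CSV parser (Python's standard csv module, used only as an example; it is not part of Talend) handles that line: the ; inside the quoted value stays in the field.

```python
import csv
import io

line = 'blabla;"New York; USA";blabla\n'
# quotechar defaults to '"', so the ';' inside the quoted field is not a delimiter
fields = next(csv.reader(io.StringIO(line), delimiter=";"))
print(fields)  # ['blabla', 'New York; USA', 'blabla']
```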
Another solution is to change the field delimiter to a more specific (and unused) character.
I'm afraid there is no better solution.
Regards,
TRF
As TRF mentioned, you can't have the delimiter as part of the non-delimiting text in your file.
My workaround for that would be the following:
1) Read the file with a tFileInputFullRow (https://help.talend.com/display/TalendComponentsReferenceGuide54EN/tFileInputFullRow)
2) Use a tReplace to replace the ; with some other character, say -, in the problem cells (in your case, replace "New York;USA" with "New York-USA"). You can also use the regex option in the tReplace component to make it a generic rule.
3) Save that output into another file
4) Now read the new file using ; as the delimiter
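The regex idea behind steps 2 to 4 can be sketched in Python (this is not Talend code, and it works in memory instead of writing an intermediate file). It rests on a hypothetical rule that you would need to confirm against your data: a real delimiter is never followed by a space, while a ; inside a value such as "New York; USA" always is.

```python
import re

# Hypothetical rule: a ';' followed by a space belongs to the value, not the layout
IN_VALUE_SEMICOLON = re.compile(r";(?= )")


def clean_row(row: str) -> list[str]:
    # Step 2: replace the in-value ';' with '-'
    fixed = IN_VALUE_SEMICOLON.sub("-", row)
    # Step 4: split on ';' as the real delimiter
    return fixed.split(";")


print(clean_row("1;New York; USA;42"))  # ['1', 'New York- USA', '42']
```

If your data breaks that rule, the same structure works with whatever pattern does identify the problem cells; the regex is the only part that changes.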
References:
1) tReplace: https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide521EN/18.16+tReplace
2) Regex: https://docs.oracle.com/javase/tutorial/essential/regex/

How to use comma as both column delimiter and text qualifier

I am loading a flat file into a SQL database. The flat file is comma delimited. Some of the column values contain a comma without being enclosed in double quotes (e.g. HPPV,TYRE). When I try to use a comma as the text qualifier, I get a message saying the column delimiter and text qualifier cannot be the same.
I want to somehow use a comma as the text qualifier so that the value HPPV,TYRE is kept as a single entity (HPPVTYRE or HPPV TYRE) instead of spilling over into the next column.
Is there any way to use a comma as the text qualifier when it is already the column delimiter?
No, I don't think so, but I searched and found this article that might help.

How can I strip dollar signs ($) from a field in Hive?

I'm trying to import a CSV into Hive. I have a column that holds a dollar value and is reported in the CSV as '$123,244.00'. I would like to convert this value into a float in Hive.
So I've loaded the CSV into a temporary table, treating that column as a string. Next I want to load it into the final table, and in the process convert that string into a float or decimal.
Any suggestions on the best way to go about doing this?
This should work (price and tmp_table are placeholder names for your string column and your temporary table):
select float(regexp_replace(substr(price, 2), ',', '')) from tmp_table;
You need to remove any commas as well as the dollar sign. You may find this link helpful as well: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-NumericTypes
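The same cleaning logic is shown below in Python, outside Hive, for checking the transformation on sample values. It strips both the dollar sign and the thousands-separator commas with one character class. It returns a Decimal rather than a float, on the assumption that exact money values matter (the question mentions decimal as an option). The function name is hypothetical.

```python
import re
from decimal import Decimal


def parse_dollar_amount(text: str) -> Decimal:
    # Remove '$' and ',' in one pass, then parse what is left
    return Decimal(re.sub(r"[$,]", "", text))


print(parse_dollar_amount("$123,244.00"))  # 123244.00
```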

CSV with comma or semicolon?

How is a CSV file built in general? With commas or semicolons?
Any advice on which one to use?
In Windows, it depends on the "List separator" setting in the "Regional and Language Options" customize screen. This is the character Windows applications expect as the CSV separator.
Of course, this only affects Windows applications; for example, Excel will not automatically split data into columns if the file does not use that separator. All applications that use the Windows regional settings behave this way.
If you are writing a program for Windows whose CSV output will be imported into other applications, and you know that the list separator on your target machines is a comma, then go for it. Otherwise I prefer ;, since it causes fewer problems with decimal points and digit grouping, and it rarely appears in text.
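A small Python illustration of the decimal-point argument, using the standard csv module (the sample rows are made up): a number written with a decimal comma has to be quoted when the delimiter is a comma, but not when it is a semicolon.

```python
import csv
import io


def to_csv(rows: list[list[str]], delimiter: str) -> str:
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output easy to read; the default is "\r\n"
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = [["item", "price"], ["coffee", "3,14"]]
print(to_csv(rows, ","))  # the decimal comma forces quoting: coffee,"3,14"
print(to_csv(rows, ";"))  # no quoting needed: coffee;3,14
```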
CSV is a standard format, outlined in RFC 4180 (in 2005), so there IS no lack of a standard. https://www.ietf.org/rfc/rfc4180.txt
And even before that, the C in CSV has always stood for Comma, not for semiColon :(
It's a pity Microsoft keeps ignoring that and is still sticking to the monstrosity they turned it into decades ago (yes, I admit, that was before the RFC was created).
One record per line, unless a newline occurs within quoted text (see below).
COMMA as column separator. Never a semicolon.
PERIOD as decimal point in numbers. Never a comma.
Text containing commas, periods and/or newlines is enclosed in "double quotation marks".
Quotation marks inside the text are escaped by doubling them, but only if the text is enclosed in double quotation marks. These examples represent the same three fields:
1,"this text contains ""quotation marks""",3
1,this text contains "quotation marks",3
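You can check the two example lines with Python's csv module (default dialect), which reads both as the same three fields. Note that this reflects Python's lenient parser: RFC 4180 itself says double quotes may not appear inside unquoted fields, so stricter parsers may reject the second line.

```python
import csv
import io

quoted = '1,"this text contains ""quotation marks""",3\n'
unquoted = '1,this text contains "quotation marks",3\n'

a = next(csv.reader(io.StringIO(quoted)))
b = next(csv.reader(io.StringIO(unquoted)))
print(a)       # ['1', 'this text contains "quotation marks"', '3']
print(a == b)  # True
```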
The standard does not cover date and time values; personally, I try to stick to ISO 8601 format to avoid day/month/year vs. month/day/year confusion.
I'd say stick to commas, as they're widely recognized and understood. Be sure to quote your values and escape your quotes, though.
ID,NAME,AGE
"23434","Norris, Chuck","24"
"34343","Bond, James ""master""","57"
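One way to produce exactly that output is Python's csv module with QUOTE_ALL, which quotes every value and doubles embedded quotes automatically. The header is written directly so it stays unquoted, as in the example above.

```python
import csv
import io

rows = [
    ["23434", "Norris, Chuck", "24"],
    ["34343", 'Bond, James "master"', "57"],
]

buffer = io.StringIO()
buffer.write("ID,NAME,AGE\n")  # unquoted header, matching the example
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(rows)
print(buffer.getvalue())
```

This prints the three lines shown above, including "Bond, James ""master""".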
Also relevant, specifically to Excel: look at this answer and this other one, which suggest inserting a line at the beginning of the CSV with
"sep=,"
to tell Excel which separator to expect.
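A minimal Python sketch of writing such a hint line, here for a semicolon-delimited file with decimal commas (sample data made up). The sep= line is an Excel convention; most other CSV readers will treat it as an ordinary first row, so only add it when the file is meant for Excel.

```python
import csv
import io

buffer = io.StringIO()
# Excel-specific hint line telling it the delimiter is ';'
buffer.write("sep=;\n")
csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(
    [["city", "temperature"], ["Roma", "21,5"]]
)
print(buffer.getvalue())
```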
1) Change the file format to .CSV (semicolon delimited)
To achieve this, we need to temporarily change the separator settings in the Excel Options:
Go to File -> Options -> Advanced -> Editing section.
Uncheck the “Use system separators” setting and put a comma in the “Decimal Separator” field.
Now save the file in .CSV format, and it will be saved in semicolon-delimited format.
The comma was originally intended as the separator; however, since the comma is often used as a decimal point, it isn't such a good separator, hence others such as the semicolon are used, mostly depending on the country.
http://en.wikipedia.org/wiki/Comma-separated_values#Lack_of_a_standard
CSV stands for Comma-Separated Values. Generally the delimiter is a comma, but I have seen many other characters used as delimiters; they are just not used as frequently.
As for advising you on what to use, we need to know your application. Is the file specific to your application/program, or does this need to work with other programs?
To make Excel use a semicolon instead of a comma as the default CSV separator, go to Region -> Additional settings -> Numbers tab -> List separator and type ; instead of the default ,
Just to say something in favor of the semicolon: in a lot of countries, the comma, not the period, is used as the decimal separator. These are mostly continental European countries and their former colonies, which make up about half of the world; the other half follows the UK convention. So using commas in a CSV that contains numbers creates a lot of headaches, because Excel refuses to recognize the comma as the delimiter.
Likewise, my country, Viet Nam, follows France's convention, while our partner in Hong Kong uses the UK convention, so commas make CSV unusable between us. We use \t or ; instead for international use, even though that is still not "standard" according to the CSV specification.
The best way is to write it yourself to a text file with a .csv extension:
Sub ExportToCSV()
    Dim i As Long
    Dim fileName As String
    Dim pathfile As String
    Dim fs As Object
    Dim stream As Object
    Dim ws As Worksheet
    ' Use one sheet for both the loop test and the exported values
    Set ws = ThisWorkbook.Worksheets(1)
    Set fs = CreateObject("Scripting.FileSystemObject")
    fileName = Format(Now(), "ddmmyyHHmmss")
    pathfile = "D:\1\" & fileName & ".csv"
    ' CreateTextFile(path, overwrite:=False, unicode:=True) raises error 58 if the file exists
    On Error Resume Next
    Set stream = fs.CreateTextFile(pathfile, False, True)
    If Err.Number <> 0 Then
        If Err.Number = 58 Then MsgBox "File already exists" Else MsgBox Err.Description
        'Your code here
        Exit Sub
    End If
    On Error GoTo 0
    ' Data starts in row 15; stop at the first empty cell in column A
    i = 15
    Do Until IsEmpty(ws.Cells(i, 1).Value)
        ' Column A, then column F with its decimal point turned into a decimal comma
        stream.WriteLine ws.Cells(i, 1).Value & ";" & Replace(CStr(ws.Cells(i, 6).Value), ".", ",")
        i = i + 1
    Loop
    stream.Close
End Sub