CSV with comma or semicolon? - csv

How is a CSV file built in general? With commas or semicolons?
Any advice on which one to use?

In Windows it is dependent on the "Regional and Language Options" customize screen where you find a List separator. This is the char Windows applications expect to be the CSV separator.
Of course this only has an effect in Windows applications. For example, Excel will not automatically split data into columns if the file does not use the separator mentioned above. All applications that use the Windows regional settings behave this way.
If you are writing a program for Windows whose CSV output will be imported into other applications, and you know that the list separator set on your target machines is a comma (,), then go for it. Otherwise I prefer the semicolon (;), since it causes fewer problems with decimal points and digit grouping and does not appear in much text.

CSV is a standard format, outlined in RFC 4180 (in 2005), so there IS no lack of a standard. https://www.ietf.org/rfc/rfc4180.txt
And even before that, the C in CSV has always stood for Comma, not for semiColon :(
It's a pity Microsoft keeps ignoring that and is still sticking to the monstrosity they turned it into decades ago (yes, I admit, that was before the RFC was created).
One record per line, unless a newline occurs within quoted text (see below).
COMMA as column separator. Never a semicolon.
PERIOD as decimal point in numbers. Never a comma.
Text containing commas, periods and/or newlines enclosed in "double quotation marks".
Quotation marks inside the text are escaped by doubling them, but only if the text is enclosed in double quotation marks. These examples represent the same three fields:
1,"this text contains ""quotation marks""",3
1,this text contains "quotation marks",3
The standard does not cover date and time values. Personally, I try to stick to the ISO 8601 format to avoid day/month/year versus month/day/year confusion.
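As a minimal sketch of the quoting rules listed above, Python's standard csv module can be used to write and read one record. Python is chosen here only for illustration and is not part of the original answer; the helper names write_row and read_row are hypothetical. By default the module uses a comma as separator, quotes a field only when it needs to, and escapes embedded quotation marks by doubling them.

```python
import csv
import io


def write_row(fields):
    """Serialize one record using the csv module's default dialect."""
    buffer = io.StringIO()
    # Defaults: comma delimiter, '"' as quote character, quotes escaped by doubling.
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()


def read_row(line):
    """Parse one record back into a list of strings."""
    return next(csv.reader(io.StringIO(line)))


line = write_row([1, 'this text contains "quotation marks"', 3])
fields = read_row(line)
print(line, end="")
print(fields)
```

Note that the reader returns every field as a string; converting numbers or dates is left to the caller.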

I'd say stick to comma as it's widely recognized and understood. Be sure to quote your values and escape your quotes though.
ID,NAME,AGE
"23434","Norris, Chuck","24"
"34343","Bond, James ""master""","57"

Also relevant, but specific to Excel: look at this answer and this other one, which suggest inserting a line at the beginning of the CSV with
"sep=,"
to tell Excel which separator to expect.
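A rough sketch of producing such a file, assuming Python's csv module and a hypothetical helper named to_excel_csv. The hint line is written here without surrounding quotes, which is an assumption about what Excel accepts; other CSV consumers will most likely treat that first line as ordinary data.

```python
import csv
import io


def to_excel_csv(rows, sep=";"):
    """Return CSV text whose first line is a 'sep=' hint for Excel."""
    buffer = io.StringIO()
    # Hint line first, then the records using the same separator.
    buffer.write(f"sep={sep}\r\n")
    csv.writer(buffer, delimiter=sep).writerows(rows)
    return buffer.getvalue()


text = to_excel_csv([["ID", "NAME"], ["23434", "Norris, Chuck"]])
print(text, end="")
```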

Change the file format to .CSV (semicolon delimited)
To achieve the desired result we need to temporarily change the delimiter setting in the Excel Options:
Move to File -> Options -> Advanced -> Editing Section
Uncheck the “Use system separators” setting and put a comma in the “Decimal Separator” field.
Now save the file in the .CSV format and it will be saved in the semicolon delimited format.

Initially it was meant to be a comma. However, as the comma is often used as a decimal point, it is not such a good separator, hence others like the semicolon are used, mostly depending on the country.
http://en.wikipedia.org/wiki/Comma-separated_values#Lack_of_a_standard

CSV stands for comma-separated values. Generally the delimiter is a comma, but I have seen many other characters used as delimiters. They are just not as frequently used.
As for advising you on what to use, we need to know your application. Is the file specific to your application/program, or does this need to work with other programs?

To change comma to semicolon as the default Excel separator for CSV - go to Region -> Additional Settings -> Numbers tab -> List separator
and type ; instead of the default ,

Just to have some say about the semicolon: in a lot of countries, a comma is used as the decimal mark, not a period. That is mostly the EU and its former colonies, which make up half of the world; the other half follows the UK standard (how the hell is the UK so big O_O). This in turn makes using a comma for data that includes numbers a big headache, because Excel refuses to recognize it as the delimiter.
Likewise, my country, Viet Nam, follows France's standard, while our partner Hong Kong uses the UK standard, so the comma makes CSV unusable. We use \t or ; instead for international use, but that is still not "standard" per the CSV document.

The best way is to write it to a text file with a .csv extension:
Sub ExportToCSV()
    Dim i As Long
    Dim fileName As String
    Dim pathfile As String
    Dim fs As Object
    Dim stream As Object
    Dim ws As Worksheet

    Set fs = CreateObject("Scripting.FileSystemObject")
    Set ws = ThisWorkbook.Worksheets(1)
    fileName = Format(Now(), "ddmmyyHHmmss")
    pathfile = "D:\1\" & fileName & ".csv"

    ' CreateTextFile(path, overwrite, unicode): with overwrite = False,
    ' error 58 is raised if the file already exists.
    On Error Resume Next
    Set stream = fs.CreateTextFile(pathfile, False, True)
    If Err.Number <> 0 Then
        If Err.Number = 58 Then MsgBox "File already Exists" Else MsgBox Err.Description
        'Your code here
        Exit Sub
    End If
    On Error GoTo 0

    i = 15 ' first row to export
    Do Until IsEmpty(ws.Cells(i, 1).Value)
        ' Columns 1 and 6, separated by a semicolon; decimal point replaced by a comma
        stream.WriteLine ws.Cells(i, 1).Value & ";" & Replace(ws.Cells(i, 6).Value, ".", ",")
        i = i + 1
    Loop
    stream.Close
End Sub

Related

Using InStr in MS Access 2010 with delimited text

I am trying to find when a key (Rejects.ID) is mentioned within Roles.Referenced.
InStr normally works for this, but both fields range from 2-4 characters. There are some instances where the characters of ID are found within Referenced, for example where ID is 34 and Referenced is 1234.
Referenced is delimited by semicolons, except before the first entry and after the last one. I can find 99% of the entries by padding semicolons before and after ID, which works for most:
InStr(Roles.Referenced,(";" & Rejects.ID & ";"))
Other than adding leading and trailing semicolons, is there a way I can find all instances of ID in Referenced?
Thank you,
JF
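The only solution I could conceive is testing all possible scenarios: in the string surrounded by delimiters, at the beginning of the string followed by a delimiter, at the end of the string preceded by a delimiter, or as the only entry. Including the delimiter in the LEFT and RIGHT comparisons matters; without it, an ID of 34 would still match a Referenced value ending in 1234.
InStr(Roles.Referenced,(";" & Rejects.ID & ";")) > 0
OR LEFT (Roles.Referenced,LEN(Rejects.ID)+1)=Rejects.ID & ";"
OR RIGHT(Roles.Referenced,LEN(Rejects.ID)+1)=";" & Rejects.ID
OR Roles.Referenced=CStr(Rejects.ID)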
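The underlying idea, matching whole delimited entries rather than substrings, can be sketched outside Access as follows. This is an illustrative Python version, not Access SQL, and the function name is_referenced is hypothetical. Splitting on the semicolon covers the first, middle, last and single-entry cases at once.

```python
def is_referenced(referenced, key):
    """Return True if key appears as a whole entry in a ';'-delimited string."""
    entries = referenced.split(";")
    # Compare whole entries, so key 34 does not match the entry 1234.
    return str(key) in entries


print(is_referenced("12;34;56", 34))
print(is_referenced("1234;56", 34))
```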

dealing with strange characters with access, asp and CSVs

I have a problem: I have to create a CSV file with an ASP Classic page, taking the data from an MS Access database. It is all really simple, but in the final file I have tons of strange characters that appear as squares (the "unknown character" square). I must get rid of those characters, but I really don't know how. Do you have any ideas?
This is how something looks in the file: M�NSTERSTRA�E. Of course, I don't really know which characters cause the problem, and there are really a lot of them.
And this is how I write the CSV:
dim fs,f,d
set fs = Server.CreateObject("Scripting.FileSystemObject")
set f = fs.OpenTextFile(Server.MapPath("clienti.csv"), 2, true,true)
d = ""
do while not rs1.EOF
d = ""
For Each fField in RS1.Fields
f.Write(d)
f.Write(" ")
temp = RS1(fField.Name)
if len(trim(temp)) > 0 then
f.Write(trim(temp))
end if
d = ";"
Next
f.WriteLine("")
rs1.movenext
loop
f.Close
set f = Nothing
set fs = Nothing
I can't simply replace all the characters, because I don't know them before I extract all the clients. I need some workaround for this.
The � means that your browser doesn't recognize that character, so it shows a substitute. One example is the "smart quotes" (curly ones) that some applications, like MS Word, substitute for straight quotes. The default character encoding is ISO-8859-1.
If you don't want those to show up, you have two choices: you can delete them, or you can try to find the appropriate substitution.
Either way, first you have to identify all the characters that result in �. To do this, you'll have to go through each character and compare it to this list: http://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html
Once you identify the bad characters, you can either delete them or, once you figure out what they should be, change them accordingly. For instance, the smart quotes are coded as 147 and 148, so you can change both of those to straight quotes ("). If you do a search, you'll probably find some code that does most, if not all, of this for you.
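A small sketch of the substitution approach, written in Python rather than ASP for illustration. It assumes the raw bytes are Windows-1252, where codes 147 and 148 are the opening and closing curly quotes; the function name straighten_quotes is hypothetical, and only the two quote characters mentioned above are handled.

```python
def straighten_quotes(raw: bytes) -> str:
    """Decode Windows-1252 bytes and replace curly double quotes with straight ones."""
    # errors="replace" keeps going if a byte has no Windows-1252 meaning.
    text = raw.decode("cp1252", errors="replace")
    # Bytes 147 and 148 decode to U+201C and U+201D respectively.
    return text.replace("\u201c", '"').replace("\u201d", '"')


sample = bytes([147]) + b"hi" + bytes([148])
print(straighten_quotes(sample))
```

Other characters can be added to the same chain of replacements once they have been identified.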

How can I read a CSV with quoted commas as a flat file in BusinessObjects Data Services Designer?

I'm trying to get SAP BusinessObjects Data Services Designer 12.2.3.1 to read a CSV file that contains rows like:
"00501","P",0,0,"Nassau-Suffolk, NY","SUFFOLK"
The results I'm getting with column delimiter set to Comma, however, read that line as seven columns rather than six:
"00501" "P" 0 0 "Nassau-Suffolk NY" "SUFFOLK"
What additional options do I need in order to read the file as-is, without external preprocessing? (If this isn't possible, please say so and I'll stop getting grey matter all over this nice brick wall. Thanks!)
Solution to load data with double quote:
The solution was to set the Text delimiter to ".
Text: Denotes the start and end of a text string. All characters (including those specified as column delimiters) between the first and second occurrence of this character is a single text string. The treatment of the row characters is defined by the "Row within text string" setting.
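The effect of a text delimiter can be illustrated outside Data Services with Python's csv module (an illustration only; the Designer setting itself is as described above, and the helper name parse is hypothetical). With the quote character honoured the sample line yields six fields; with quote handling switched off the embedded comma splits one field in two.

```python
import csv
import io

LINE = '"00501","P",0,0,"Nassau-Suffolk, NY","SUFFOLK"'


def parse(line, honour_quotes=True):
    """Split one comma-delimited line, optionally treating '"' as a text delimiter."""
    quoting = csv.QUOTE_MINIMAL if honour_quotes else csv.QUOTE_NONE
    reader = csv.reader(io.StringIO(line), delimiter=",", quotechar='"', quoting=quoting)
    return next(reader)


with_text_delimiter = parse(LINE)
without_text_delimiter = parse(LINE, honour_quotes=False)
print(len(with_text_delimiter), with_text_delimiter)
print(len(without_text_delimiter), without_text_delimiter)
```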

MS SQL Backslash preceding new line/line feed removes line break

I'm working in classical ASP and am trying to simply insert some user input into my MS SQL 2008 database. This is something I do basically every day, but I don't think I've ever experienced this bug before.
The bug I am getting is that, if a user ends a line of text in backslash, and starts a new line below, both the backslash and line break are lost after the data is stored in the DB.
If I try the following statement, hardcoded in an ASP file:
UPDATE TBLarticle_text SET Introduction = 'Text on first line \" & vbCrLf & " text on second line' WHERE ArticleGuid = 28
The resulting data is without the backslash or the line break. The string is correct if stored in a variable and printed on the page.
Here is the example user input (normally from a form, but it's not really relevant). The input:
Text on first line \
text on second line
... is stored as:
Text on first line text on second line
I don't see any issues if the backslash is followed by anything other than a line break.
I know this is old, but I just came across an MS KB article that discusses this: http://support.microsoft.com/kb/164291/en-us. The long and short of it is that the three characters \<CR><LF> together form some weird escape sequence, so you need to replace all occurrences of \<CR><LF> with \\<CR><LF><CR><LF> when inserting or updating. The first backslash in \\<CR><LF><CR><LF> escapes the weird escape sequence, i.e. the next three characters \<CR><LF>, and the additional <CR><LF> puts the line break back in place. Annoying, yes.
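The replacement described in this answer can be sketched as a small string transformation. The version below is in Python purely for illustration (the original context is ASP), and the function name protect_backslash_newline is hypothetical. It only rewrites the text; whether SQL Server then stores the value as intended follows from the answer above and is not verified here.

```python
def protect_backslash_newline(text: str) -> str:
    """Replace backslash + CR LF with double backslash + CR LF + CR LF."""
    # "\\\r\n" is one backslash followed by CR LF;
    # "\\\\\r\n\r\n" is two backslashes followed by CR LF twice.
    return text.replace("\\\r\n", "\\\\\r\n\r\n")


original = "Text on first line \\\r\n text on second line"
print(repr(protect_backslash_newline(original)))
```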
I am seeing a similar issue. I would say the best way to work around this is to not let ASP pass such strings to SQL Server. You can clean this up by simply replacing that character sequence and injecting a space at the end of such a line (which a user is unlikely to ever notice):
sql = REPLACE(sql, "\" & vbCrLf, "\ " & vbCrLf)
You don't have to be using ASP or VBScript to observe this behavior:
CREATE TABLE #floobar(i INT, x VARCHAR(255));
INSERT #floobar SELECT 1, 'foo \
bar';
INSERT #floobar SELECT 2, 'foo \\
bar';
INSERT #floobar SELECT 3, 'foo
bar';
SELECT * FROM #floobar;
I don't know that you're ever going to get Microsoft to stop treating this character sequence as special, so the "fix" is going to be to work around it. You may also have to watch out for non-traditional line breaks, e.g. CHR(10) or CHR(13) on their own, or also vbTab (Chr(9)).

CSV with Semi-Colon as delimiter

Has anyone ever written PeopleCode to read a CSV file that uses semicolons as the delimiter and a comma as the decimal separator?
I need to read an Italian CSV file that uses those characters instead of the normal comma and decimal point.
What was your experience? Anything to watch out for?
Two options, one using file layout and the other without.
Option A) Using file layout: Consider below properties of filelayout
Definition Delimiter = "semicolon"
FieldType for the numeric field having comma as decimal divider = "character"
After reading the field, replace comma with a period and use value(&new_str) on the new string to convert it to number
Option B) Without file layout:
Open the input file in your code.
Loop through each line.
Use split to fetch field values- e.g.
&ret_arr = split(&str_line,";");
&ret_arr array will be populated with field values, access them using &ret_arr[1],..[2] etc.
Replace comma from that numeric field and use value(&new_str) for conversion.
Above was my experience (long back), nothing else to watch out for. Hope this helps!
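Option B can be sketched in a few lines. The example below uses Python rather than PeopleCode, only to illustrate the split-then-convert idea; the function name parse_line and the sample data are hypothetical. Note that Python lists are indexed from 0, unlike the 1-based &ret_arr above, and that a plain split does not handle quoted fields containing semicolons.

```python
def parse_line(line, numeric_columns):
    """Split a ';'-delimited line and convert decimal-comma fields to numbers."""
    fields = line.split(";")
    for index in numeric_columns:
        # Replace the decimal comma with a period before converting.
        fields[index] = float(fields[index].replace(",", "."))
    return fields


row = parse_line("Rossi;12,50;3", numeric_columns=[1])
print(row)
```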
If you are using a file layout it probably won't read the commas in as decimal separator although you can tell it to use the semi-colon as the separator. Your other option is to read all fields in as text and then do a replace on the number fields to replace the comma with a period and then do a value() on the string to convert it to a number.