Is there a way to include commas in CSV columns without breaking the formatting? - csv

I've got a two-column CSV with a name and a number. Some people's names contain commas, for example Joe Blow, CFA. This comma breaks the CSV format, since it's interpreted as a new column.
I've read up and the most common prescription seems to be replacing that character, or replacing the delimiter, with a new value (e.g. this|that|the, other).
I'd really like to keep the comma separator (I know Excel supports other delimiters, but other interpreters may not). I'd also like to keep the comma in the name, as Joe Blow| CFA looks pretty silly.
Is there a way to include commas in CSV columns without breaking the formatting, for example by escaping them?

To encode a field containing comma (,) or double-quote (") characters, enclose the field in double-quotes:
field1,"field, 2",field3, ...
Literal double-quote characters are typically represented by a pair of double-quotes (""). For example, a field exclusively containing one double-quote character is encoded as """".
For example:
Sheet: |Hello, World!|You "matter" to us.|
CSV: "Hello, World!","You ""matter"" to us."
More examples (sheet → csv):
regular_value → regular_value
Fresh, brown "eggs" → "Fresh, brown ""eggs"""
" → """"
"," → ""","""
,,," → ",,,"""
,"", → ","""","
""" → """"""""
See Wikipedia.
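If you generate the file from code, most CSV libraries apply these rules for you. A minimal sketch in Python, assuming the standard csv module with its default (minimal) quoting; to_csv_line is a hypothetical helper name used only for illustration:

```python
import csv
import io

def to_csv_line(fields):
    """Encode one row with the csv module's default (minimal) quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")

print(to_csv_line(["regular_value"]))          # regular_value
print(to_csv_line(["Joe Blow, CFA", "42"]))    # "Joe Blow, CFA",42
print(to_csv_line(['Fresh, brown "eggs"']))    # "Fresh, brown ""eggs"""
print(to_csv_line(['"']))                      # """"
```

Only fields that need it get quoted, so plain values stay unchanged.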

I found that some applications, like Numbers on the Mac, ignore the double quote if there is a space before it.
a, "b,c" doesn't work, while a,"b,c" works.

The problem with the CSV format is that there isn't one spec. There are several accepted methods, with no way of telling which one should be used when generating or interpreting a file. I discussed all the methods to escape characters (newlines in that case, but the same basic premise) in another post. Basically it comes down to using a CSV generation/escaping process that suits the intended users, and hoping the rest don't mind.
Reference spec document.

If you want to do what you described, you can use quotes. Something like this:
$name = "Joe Blow, CFA.";
$arr[] = "\"".$name."\"";
Now you can use a comma in your name variable.

You need to quote those values.
Here is a more detailed spec.

In addition to the points in other answers: one thing to note if you are using quotes in Excel is the placement of your spaces. If you have a line of code like this:
print '%s, "%s", "%s", "%s"' % (value_1, value_2, value_3, value_4)
Excel will treat the initial quote as a literal quote instead of using it to escape commas. Your code will need to change to
print '%s,"%s","%s","%s"' % (value_1, value_2, value_3, value_4)
It was this subtlety that brought me here.
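Python's own csv reader behaves the same way by default, which makes the effect easy to reproduce. A small sketch, assuming the standard csv module; skipinitialspace is the reader option that tolerates a space after the delimiter:

```python
import csv

line = 'a, "b,c"'

# Default: the quote is not the first character of the field,
# so it is kept as a literal and the inner comma splits the field.
strict = next(csv.reader([line]))
print(strict)   # ['a', ' "b', 'c"']

# skipinitialspace=True drops whitespace right after the delimiter,
# so the quote is seen at the start of the field again.
lenient = next(csv.reader([line], skipinitialspace=True))
print(lenient)  # ['a', 'b,c']
```

That option only helps when you control the reader; for files that others will open, leaving the space out is the safer choice.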

In JavaScript you can use template literals (template strings) to wrap each value in quotes, e.g.:
`"${item}"`

CSV files can actually use different delimiters; the comma is just the default.
For Excel, you can use a sep= line to specify the delimiter you want for your CSV file.
Just add the line sep=; as the very first line in your CSV file if you want your delimiter to be a semicolon. You can change it to any other character. Other parsers may not recognise this line and may treat it as data.

This isn't a perfect solution, but you can replace every comma with ‚ (a single low quotation mark). It looks very similar to a comma and visually serves the same purpose. No quotes are required.
In JS this would be:
stringVal.replaceAll(',', '‚')
You will need to be very careful in cases where you compare that data directly, since the replaced text no longer matches the original.

Depending on your language, there may be a to_json method available. That will escape many things that break CSVs.

I faced the same problem and quoting the , did not help. Eventually, I replaced the , with +, finished the processing, saved the output into an outfile and replaced the + with ,. This may seem ugly but it worked for me.

May not be what is needed here but it's a very old question and the answer may help others. A tip I find useful with importing into Excel with a different separator is to open the file in a text editor and add a first line like:
sep=|
where | is the separator you wish Excel to use.
Alternatively you can change the default separator in Windows but a bit long-winded:
Control Panel>Clock & region>Region>Formats>Additional>Numbers>List separator [change from comma to your preferred alternative]. That means Excel will also default to exporting CSVs using the chosen separator.

You could encode your values, for example in PHP base64_encode($str) / base64_decode($str)
IMO this is simpler than doubling up quotes, etc.
https://www.php.net/manual/en/function.base64-encode.php
The encoded values will never contain a comma so every comma in your CSV will be a separator.
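The same idea as a rough Python sketch, assuming the standard base64 module; encode_field and decode_field are hypothetical helper names. The Base64 alphabet has no comma, so a plain split on commas is safe, at the cost of a file that is no longer human-readable:

```python
import base64

def encode_field(text):
    """Base64-encode a field so it cannot contain a comma."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode_field(token):
    """Reverse encode_field."""
    return base64.b64decode(token).decode("utf-8")

values = ["Joe Blow, CFA", "42"]
row = ",".join(encode_field(v) for v in values)
print(row)

# Every comma in the row is now a separator, so split() is enough.
decoded = [decode_field(t) for t in row.split(",")]
print(decoded)  # ['Joe Blow, CFA', '42']
```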

You can set the Text Qualifier field in your Flat File Connection Manager to ". This should wrap your data in quotes and split only on commas that are outside the quotes.

First, if the item value contains a double-quote character ("), replace it with two double-quote characters (""):
item = item.ToString().Replace("""", """""")
Finally, wrap item value:
ON LEFT: With double quote character (")
ON RIGHT: With double quote character (") and comma character (,)
csv += """" & item.ToString() & ""","

Plain double quotes did not work for me; escaping them as \" did. To place a literal double quote inside the value, write \"\".
You can build formulas this way, for example:
fprintf(strout, "\"=if(C3=1,\"\"\"\",B3)\"\n");
will write this line to the CSV:
"=if(C3=1,"""",B3)"
which Excel reads as the formula:
=IF(C3=1,"",B3)

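A C# method for escaping delimiter characters and quotes in column text. It should cover the common cases that mangle a CSV.
private string EscapeDelimiter(string field)
{
    // Quote the field if it contains the delimiter, a double quote, or a line break.
    if (field.Contains(yourEscapeCharacter) || field.Contains("\"")
        || field.Contains("\n") || field.Contains("\r"))
    {
        // Double any embedded quotes, then wrap the whole field in quotes.
        field = field.Replace("\"", "\"\"");
        field = $"\"{field}\"";
    }
    return field;
}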

Related

spark csv writer - escape string without using quotes

I am trying to escape the delimiter character when it appears inside the data. Is there a way to do it by passing option parameters? I can do it from a UDF, but I am hoping it is possible using options.
val df = Seq((8, "test,me\nand your", "other")).toDF("number", "test", "t")
df.coalesce(1).write.mode("overwrite").format("csv").option("quote", "\u0000").option("delimiter", ",").option("escape", "\\").save("testcsv1")
But the escape is not working. The output file is written as
8,test,me
and your,other
I want the output file to be written as.
8,test\,me\\nand your,other
I'm not certain, but I think if you had your sequence as
Seq((8, "test\\,me\\\\nand your", "other"))
and did not specify a custom escape character, it would behave as you are expecting and give you 8,test\,me\\nand your,other as the output. This is because \\ acts simply as the character '\' rather than an escape, so they are printed where you want and the n immediately after is not interpreted as part of a newline character.
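For comparison, this is what backslash-escaping without quotes looks like in a writer that supports it. The sketch below uses Python's standard csv module, not Spark, purely to illustrate the output format; it makes no claim about which Spark options produce the same result:

```python
import csv
import io

row = [8, "test,me", "other"]

buf = io.StringIO()
# QUOTE_NONE disables quoting; the escapechar is then required
# so that delimiters inside a field can still be written.
writer = csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\",
                    lineterminator="\n")
writer.writerow(row)
line = buf.getvalue().rstrip("\n")
print(line)    # 8,test\,me,other

# Reading back with the same settings restores the original field.
parsed = next(csv.reader([line], quoting=csv.QUOTE_NONE, escapechar="\\"))
print(parsed)  # ['8', 'test,me', 'other']
```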

How to disable neo4j-import quotation checking

I am trying to import a large CSV dataset into neo4j using the neo4j-import tool. Quotation is not used anywhere in the data, so I get parsing errors when using --quote " --quote ' --quote ´ and the like. Even choosing very rare Unicode characters doesn't help with this multi-gigabyte CSV, because it also contains Arabic letters, math symbols and everything else you can imagine.
So: Is there a way to disable the quotation checking completely?
Perhaps it would be useful to have the import tool able to accept character configuration values specifying ASCII codes. If so then you could specify --quote \0 and no character would match. That would also be useful for specifying other special characters in general I'd guess.
You need to make sure the CSV file uses quotation marks, since they allow the tool to reliably determine when strings end.
Any string in your data file might contain the delimiter character (a comma, by default). Even if there were a way to turn off quotation checking, the tool would treat every delimiter character as the end of a field. Therefore, any string field that happened to contain the delimiter character would be terminated prematurely, causing errors.

Insert HTML special characters in (i18n) yml

I need to insert a special character in the HTML translation file.
The character is a non-breaking space; I need it to work around another problem.
But it is not working. The code for that character is displayed literally in the subject of the email.
For this I insert these lines:
pt.yml
subjects:
...
release_auto_pause_triggered_html: "%{project_name} %{release_name} - pausa automática&nbsp';disparada"
release_mailer.rb
subject = t('subjects.release_auto_pause_triggered_html', project_name: @project.name, release_name: @release.name).html_safe
But the subject of the email that gets sent shows the entity code literally: pausa automática&nbsp';disparada
I added the ' only so that the entity would be visible in this post; otherwise it would not show up here.
I need it to look like this: "pausa automática disparada"
Where am I going wrong?
I think I managed to do it. Try this: "pausa automática\xA0disparada"
Source:
Using single-quoted scalars, you may express any value that does not contain special characters. No escaping occurs for single quoted scalars except that a pair of adjacent quotes '' is replaced with a lone single quote '.
Double-quoted is the most powerful style and the only style that can express any scalar value. Double-quoted scalars allow escaping. Using escaping sequences \x** and \u**, you may express any ASCII or Unicode character.
And here I found the necessary code
My output is:
# YML
title: "Title \xA0 aaa"
# console
I18n.t('title')
=> "Title   aaa"
It should work this way. Try to check for typos.
In your example
"pausa automática&nbsp';disparada"
there is extra ' between &nbsp and ;.

csv parsing, quotes as characters

I have a csv file that contains fields such as this:
""#33CCFF"
I would imagine that should be the text value:
"#33CCFF
But both Excel and OpenOffice Calc will display:
#33CCFF"
What rule of csv am I missing?
When Excel parses the value, it does not first remove the outer quotes, to then proceed reading what's in between them. Even if it would, what would it do with the remaining " in front of #? It can not show this as you expect "#33CCFF, because for it to appear like that, the double quote should have been escaped by duplicating it. (That might be the 'csv' rule you are missing.)
Excel reads the value from left to right, interpreting "" as a single ", it then reads on, and finds an unexpected double quote at the end, 'panics' and simply displays it.
The person or application that created the CSV file made the mistake of adding encapsulation without escaping the encapsulation character inside the values, so your data is now malformed. With encapsulation, the value should be """#33CCFF"; without encapsulation, it should be "#33CCFF.
This might clarify some things
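Python's csv module reads the field the same left-to-right way, so it can be used to check what a given encoding means. A small sketch, assuming the standard csv module in its default non-strict mode; parse_field and encode_field are hypothetical helper names:

```python
import csv
import io

def parse_field(raw):
    """Parse a one-field CSV line and return the field's text."""
    return next(csv.reader([raw]))[0]

def encode_field(value):
    """Encode a single value as a one-field CSV line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([value])
    return buf.getvalue().rstrip("\n")

# Correct encoding of the intended value: quote doubled, field wrapped.
print(encode_field('"#33CCFF'))     # """#33CCFF"
print(parse_field('"""#33CCFF"'))   # "#33CCFF

# The malformed field from the question: the leading "" is consumed
# and the trailing quote is left over as a literal character.
print(parse_field('""#33CCFF"'))    # #33CCFF"
```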

How does Comma separated files work? ...if the text has commas in it

I never understood this.
Wikipedia has the info you want:
Fields with embedded commas must be enclosed within double-quote characters.
For more than you ever want to know about CSV: RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files.
This article is quite complete.
Fields that contain a special character (comma, newline, or double quote), must be enclosed in double quotes.
There is no real standard for what people call CSV files.
Microsoft refers to CSV as character separated values.
This is because the separator character changes depending on the locale's decimal character:
German: 1,2
English: 1.2
But I agree that most of the time " or ' is used to enclose text elements.
But either all strings are enclosed in " or none.
CSV is very far from standardised. The nearest approach to it is this RFC, which explains how commas and other special characters should be handled.
You have 2 options:
Quote the field (use " character e.g. "..., ...")
Change your delimiter from comma "," to something else (e.g. ";")
See this webpage with an introduction and overview of the CSV (comma separated values) format for more.
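The second option (a different delimiter) can be sketched in Python as follows, assuming the standard csv module and a semicolon as the alternative separator; whoever reads the file has to be told about the delimiter:

```python
import csv
import io

rows = [["Joe Blow, CFA", "42"], ["Jane Doe", "7"]]

buf = io.StringIO()
# With ';' as the delimiter, a comma inside a field is ordinary text
# and the field does not need to be quoted.
csv.writer(buf, delimiter=";", lineterminator="\n").writerows(rows)
text = buf.getvalue()
print(text)
# Joe Blow, CFA;42
# Jane Doe;7

# The reader must be given the same delimiter to recover the fields.
parsed = list(csv.reader(io.StringIO(text), delimiter=";"))
print(parsed)  # [['Joe Blow, CFA', '42'], ['Jane Doe', '7']]
```

A field that contains the new delimiter would still be quoted by the writer, so changing the delimiter moves the problem rather than removing it.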