How to properly escape a double quote in CSV? - csv

I have a line like this in my CSV:
"Samsung U600 24"","10000003409","1","10000003427"
Quote next to 24 is used to express inches, while the quote just next to that quote closes the field. I'm reading the line with fgetcsv but the parser makes a mistake and reads the value as:
Samsung U600 24",10000003409"
I tried putting a backslash before the inches quote, but then I just get a backslash in the name:
Samsung U600 24\"
Is there a way to properly escape this in the CSV, so that the value would be Samsung U600 24" , or do I have to regex it in the processor?

Use 2 quotes:
"Samsung U600 24"""

It is not only double quotes: you may also need to deal with single quotes ('), backslashes (\) and NUL (the NULL byte).
Use fputcsv() to write and fgetcsv() to read; they take care of the quoting for you.
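To see the doubling rule and the "let the library do it" advice side by side, here is a minimal sketch in Python using the standard csv module. Python is only my choice for illustration; the question itself is about PHP's fgetcsv.

```python
import csv
import io

# The row from the question, with the inch mark escaped by doubling it.
line = '"Samsung U600 24""","10000003409","1","10000003427"'

# Reading: the doubled quote collapses back into one literal quote.
row = next(csv.reader([line]))
print(row[0])  # Samsung U600 24"

# Writing: the writer doubles embedded quotes on its own.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
written = buffer.getvalue()
print(written)
```

Note that the writer only quotes the fields that need it, so the numeric columns come back unquoted.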

I have written this in Java:
public class CSVUtil {

    /** Quotes a value for CSV output, doubling any embedded double quotes. */
    public static String addQuote(String pValue) {
        if (pValue == null) {
            return null;
        }
        // Escape embedded double quotes by doubling them.
        if (pValue.contains("\"")) {
            pValue = pValue.replace("\"", "\"\"");
        }
        // Wrap the value in quotes if it contains a special character.
        if (pValue.contains(",")
                || pValue.contains("\n")
                || pValue.contains("'")
                || pValue.contains("\\")
                || pValue.contains("\"")) {
            return "\"" + pValue + "\"";
        }
        return pValue;
    }

    public static void main(String[] args) {
        // Each line prints the raw value, a separator, and the quoted value.
        System.out.println("ab\nc" + "|||" + CSVUtil.addQuote("ab\nc"));
        System.out.println("a,bc" + "|||" + CSVUtil.addQuote("a,bc"));
        System.out.println("a,\"bc" + "|||" + CSVUtil.addQuote("a,\"bc"));
        System.out.println("a,\"\"bc" + "|||" + CSVUtil.addQuote("a,\"\"bc"));
        System.out.println("\"a,\"\"bc\"" + "|||" + CSVUtil.addQuote("\"a,\"\"bc\""));
        System.out.println("\"a,\"\"bc" + "|||" + CSVUtil.addQuote("\"a,\"\"bc"));
        System.out.println("a,\"\"bc\"" + "|||" + CSVUtil.addQuote("a,\"\"bc\""));
    }
}

Since no one has mentioned the way I usually do it, I'll just type this down. When there's a tricky string, I don't even bother escaping it.
What I do is just base64_encode and base64_decode, that is, encode the value to Base64 before writing the CSV line and when I want to read it, decode.
For your example assuming it's PHP:
$csvLine = [base64_encode('Samsung U600 24"'),"10000003409","1","10000003427"];
And when I want to take the value, I do the opposite.
$value = base64_decode($csvLine[0]);
I just don't like to go through the pain.
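For comparison, the same Base64 round trip can be sketched in Python. The helper names encode_field and decode_field are made up for this example. The trick works because the Base64 alphabet contains no commas or quotes, so the encoded field never needs CSV escaping.

```python
import base64
import csv
import io

def encode_field(text):
    # Base64 output uses only letters, digits, '+', '/' and '='.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode_field(encoded):
    return base64.b64decode(encoded).decode("utf-8")

# Write one row with the tricky first field encoded.
buffer = io.StringIO()
csv.writer(buffer).writerow(
    [encode_field('Samsung U600 24"'), "10000003409", "1", "10000003427"]
)

# Read it back and decode the first field.
first = next(csv.reader([buffer.getvalue().strip()]))[0]
restored = decode_field(first)
print(restored)
```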

I know this is an old post, but here's how I solved it (along with converting null values to empty string) in C# using an extension method.
Create a static class with something like the following:
/// <summary>
/// Wraps value in quotes if necessary and converts nulls to empty string
/// </summary>
/// <param name="value"></param>
/// <returns>String ready for use in CSV output</returns>
public static string Q(this string value)
{
    if (value == null)
    {
        return string.Empty;
    }
    if (value.Contains(",") || value.Contains("\"") || value.Contains("'") || value.Contains("\\"))
    {
        // Double any embedded quotes before wrapping the value in quotes
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
    return value;
}
Then for each string you're writing to CSV, instead of:
stringBuilder.Append( WhateverVariable );
You just do:
stringBuilder.Append( WhateverVariable.Q() );

If a value contains a comma, a newline character or a double quote, then the value must be enclosed in double quotes, and any double quote inside it must be doubled. E.g.: "Newline char in this field \n".
You can use the online tool below to escape double quotes and commas.
https://www.freeformatter.com/csv-escape.html#ad-output
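If you would rather do it in code than in an online tool, the rule above fits in a few lines. This is a minimal sketch in Python; csv_field is a made-up helper name, and it also treats a carriage return as a reason to quote, which is my assumption on top of the rule as stated.

```python
def csv_field(value):
    """Return value formatted as a single CSV field."""
    if value is None:
        return ""
    # Quote only when the value contains a character that would confuse a parser.
    if any(ch in value for ch in (",", '"', "\n", "\r")):
        # Embedded double quotes are escaped by doubling them.
        return '"' + value.replace('"', '""') + '"'
    return value

fields = ['Samsung U600 24"', "10000003409", "1", "10000003427"]
print(",".join(csv_field(f) for f in fields))
```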

Related

How to use variable in JMESPath expression?

The expression works perfectly as below:
jmespath.search(currentStats, 'Items[?Name == Annie]')
But I want to make my filtered key as a variable.
I have tried
var name = "Annie"
jmespath.search(JSONdata, 'Items[?Name == %s]' %name;)
Which does not work.
Many thanks in advance.
There's no built-in way in jmespath or the search function to template values into the query string, but you can safely embed JSON literals with backticks, however your language allows it.
var name = "Annie";
var result = jmespath.search(JSONdata, 'Items[?Name == `' + JSON.stringify(name).replace(/`/g, '\\`') + '`]');
We need to convert the string to JSON, escape any backticks in that string and then wrap it in backticks. Let's wrap that into a function to make it a bit nicer to read:
function jmespath_escape(item) {
    // The /g flag escapes every backtick, not just the first one
    return '`' + JSON.stringify(item).replace(/`/g, '\\`') + '`';
}
var name = "Annie";
var result = jmespath.search(JSONdata, 'Items[?Name == ' + jmespath_escape(name) + ']');
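The same idea carries over to Python, where the question's `%s` attempt seems to come from. This is a sketch only: jmespath_literal is a made-up helper name, and the snippet just builds the expression string so that it runs without the jmespath package installed.

```python
import json

def jmespath_literal(value):
    """Render a Python value as a backtick-delimited JMESPath JSON literal."""
    # json.dumps adds the surrounding quotes and escapes the string;
    # any backticks inside must be escaped so they don't end the literal.
    return "`" + json.dumps(value).replace("`", "\\`") + "`"

name = "Annie"
expression = "Items[?Name == " + jmespath_literal(name) + "]"
print(expression)  # Items[?Name == `"Annie"`]

# With the jmespath package installed, the call would be:
#   jmespath.search(expression, data)
```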

Parsing a String That's Kind of JSON

I have a set of strings that's JSONish, but totally JSON uncompliant. It's also kind of CSV, but values themselves sometimes have commas.
The strings look like this:
ATTRIBUTE: Value of this attribute, ATTRIBUTE2: Another value, but this one has a comma in it, ATTRIBUTE3:, another value...
The only two patterns I can see that would mostly work are that the attribute names are in caps and followed by a : and space. After the first attribute, the pattern is , name-in-caps : space.
The data is stored in Redshift, so I was going to see if I can use regex to resolve this, but my regex knowledge is limited. Where would I start?
If not, I'll resort to python hacking.
What you're describing would be something like:
^([A-Z\d]+?): (.*?), ([A-Z\d]+?): (.*?), ([A-Z\d]+?): (.*)$
Though this answer assumes that your third attribute value doesn't really start with a comma, and that your attribute names may contain numbers.
If we take this apart:
[A-Z\d] Capital letters and digits
+?: As many as needed, up to the first :
 (.*?), A space, then as many characters as needed up to a comma and a space
^ and $ The beginning and the end of the string, respectively
And the rest is a repetition of that pattern.
The ( ) just mark your capture groups; in this case they don't directly affect the match.
Here's a working example
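If you do end up falling back to Python, a variant of the same pattern avoids hard-coding the number of attributes: split only at commas that are followed by an all-caps name and a colon. A rough sketch, assuming attribute names consist of capitals, digits and underscores (parse_attributes is a made-up name):

```python
import re

def parse_attributes(line):
    """Split 'NAME: value, NAME2: value, ...' into a dict."""
    # The lookahead keeps the attribute name out of the separator,
    # so commas inside values are left alone.
    parts = re.split(r",\s*(?=[A-Z][A-Z0-9_]*:)", line)
    result = {}
    for part in parts:
        name, _, value = part.partition(":")
        result[name.strip()] = value.strip()
    return result

line = ("ATTRIBUTE: Value of this attribute, "
        "ATTRIBUTE2: Another value, but this one has a comma in it, "
        "ATTRIBUTE3:, another value")
parsed = parse_attributes(line)
for key, value in parsed.items():
    print(key, "=>", value)
```

It will still misfire if a value itself contains something like ", FOO:", so treat it as a starting point.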
Often regex is not the right tool to use when it seems like it is.
Read this thoughtful post for details: https://softwareengineering.stackexchange.com/questions/223634/what-is-meant-by-now-you-have-two-problems
When a simpler scheme will do, use it! Here is one scheme that would successfully parse the structure as long as colons only occur between attributes and values, and not in them:
Code
static void Main(string[] args)
{
string data = "ATTRIBUTE: Value of this attribute,ATTRIBUTE2: Another value, but this one has a comma in it,ATTRIBUTE3:, another value,value1,ATTRIBUTE4:end of file";
Console.WriteLine();
Console.WriteLine("As an String");
Console.WriteLine();
Console.WriteLine(data);
string[] arr = data.Split(new[] { ":" }, StringSplitOptions.None);
Dictionary<string, string> attributeNameToValue = new Dictionary<string, string>();
Console.WriteLine();
Console.WriteLine("As an Array Split on ':'");
Console.WriteLine();
Console.WriteLine("{\"" + String.Join("\",\"", arr) + "\"}");
string currentAttribute = null;
string currentValue = null;
for (int i = 0; i < arr.Length; i++)
{
if (i == 0)
{
// The first element only has the first attribute name
currentAttribute = arr[i].Trim();
}
else if (i == arr.Length - 1)
{
// The last element only has the final value
attributeNameToValue[currentAttribute] = arr[i].Trim();
}
else
{
int indexOfLastComma = arr[i].LastIndexOf(",");
currentValue = arr[i].Substring(0, indexOfLastComma).Trim();
string nextAttribute = arr[i].Substring(indexOfLastComma + 1).Trim();
attributeNameToValue[currentAttribute] = currentValue;
currentAttribute = nextAttribute;
}
}
Console.WriteLine();
Console.WriteLine("As a Dictionary");
Console.WriteLine();
foreach (string key in attributeNameToValue.Keys)
{
Console.WriteLine(key + " : " + attributeNameToValue[key]);
}
}
Output:
As an String
ATTRIBUTE: Value of this attribute,ATTRIBUTE2: Another value, but this one has a comma in it,ATTRIBUTE3:, another value,value1,ATTRIBUTE4:end of file
As an Array Split on ':'
{"ATTRIBUTE"," Value of this attribute,ATTRIBUTE2"," Another value, but this one has a comma in it,ATTRIBUTE3",", another value,value1,ATTRIBUTE4","end of file"}
As a Dictionary
ATTRIBUTE : Value of this attribute
ATTRIBUTE2 : Another value, but this one has a comma in it
ATTRIBUTE3 : , another value,value1
ATTRIBUTE4 : end of file

OpenCsv reading file with escaped separator

I am using opencsv 2.3 and it does not appear to be dealing with escape characters as I expect. I need to be able to handle an escaped separator in a CSV file that does not use quoting characters.
Sample test code:
CSVReader reader = new CSVReader(new FileReader("D:/Temp/test.csv"), ',', '"', '\\');
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
for (String string : nextLine) {
System.out.println("Field [" + string + "].");
}
}
and the csv file:
first field,second\,field
and the output:
Field [first field].
Field [second].
Field [field].
Note that if I change the csv to
first field,"second\,field"
then I get the output I am after:
Field [first field].
Field [second,field].
However, in my case I do not have the option of modifying the source CSV.
Unfortunately it looks like opencsv does not support escaping of separator characters unless they're in quotes. The following method (taken from opencsv's source) is called when an escape character is encountered.
protected boolean isNextCharacterEscapable(String nextLine, boolean inQuotes, int i) {
return inQuotes // we are in quotes, therefore there can be escaped quotes in here.
&& nextLine.length() > (i + 1) // there is indeed another character to check.
&& (nextLine.charAt(i + 1) == quotechar || nextLine.charAt(i + 1) == this.escape);
}
As you can see, this method only returns true if the character following the escape character is a quote character or another escape character, and only inside quotes. You could patch the library to do this, but in its current form, it won't let you do what you're trying to do.
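If patching the library is not an option, the splitting logic for this unquoted format is small enough to write by hand. Here is a sketch of the idea in Python; it is not opencsv code, split_escaped is a made-up name, and it assumes the file never uses quote characters.

```python
def split_escaped(line, sep=",", escape="\\"):
    """Split line on sep, treating escape + any character as a literal character."""
    fields = []
    current = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            # Keep the escaped character and drop the escape itself.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields

for field in split_escaped("first field,second\\,field"):
    print("Field [" + field + "].")
```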

How to parse XML which contains special characters in an element attribute value, and how to replace those characters with spaces and parse it in Java

I am trying to parse an XML file in Java using DOM or SAX. The problem is that, while parsing, if my XML contains special characters such as < > " in attribute values, the parser throws a ParserException.
For example, the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<abc>
<check name="bike" value="apache <nice model"/>
<check name="car" value="tata sumo "style" />
</abc>
In this example, the check element has a value attribute that contains < or ". The parser treats this as invalid and throws a parser exception.
So my problem is: before passing the XML file to the parser, I need to detect those special characters (< or > or ") in attribute values and replace them with spaces.
E.g., if the XML contains <:
<check name="bike" value="apache <nice model"/>
replace it with a space:
<check name="bike" value="apache nice model"/>
Please give me suggestions on how this can be done. Can we do it using XSD?
Thanks in advance.
What about replacing those symbols with entities?
&apos; is an apostrophe: '
&amp; is an ampersand: &
&quot; is a quotation mark: "
&lt; is a less-than symbol: <
&gt; is a greater-than symbol: >
One could argue whether it really is XML. One rule is that XML must be well-formed. That means tags must be opened and closed, and certain characters are not allowed in some places (in particular, < in attribute values).
If you can't correct this at the source, that is, produce well-formed XML, then I guess you need to first do a simple search and replace as @Visher suggests and then treat it as XML, or come up with your own parser.
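If you do control the code that writes the file, escaping at the source is only a few lines. A small sketch in Python using the standard library's xml.sax.saxutils.escape; attr_escape is a made-up helper name, and the document is rebuilt by hand just to show the round trip.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def attr_escape(value):
    # escape() handles &, < and >; the extra mapping covers double quotes.
    return escape(value, {'"': "&quot;"})

bike = attr_escape("apache <nice model")
car = attr_escape('tata sumo "style')

doc = ('<abc><check name="bike" value="%s"/>'
       '<check name="car" value="%s"/></abc>') % (bike, car)

# The parser now accepts the document and hands back the original text.
values = [check.get("value") for check in ET.fromstring(doc)]
print(values)
```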
This code works pretty well (it replaces '<' and '>' inside quotation marks with entities):
public static void main(String[] args)
{
    // Characters to replace and the entities that stand in for them.
    char[] characters = new char[]{'<', '>'};
    String[] entities = new String[]{"&lt;", "&gt;"};
    String text = "<check name=\"bike\" value=\"apache <nice model\"/> ";
    StringBuilder sb = new StringBuilder();
    boolean insideQuotation = false;
    for (int i = 0; i < text.length(); i++)
    {
        char character = text.charAt(i);
        if (insideQuotation)
        {
            // Look up the current character in the replacement table.
            int index = -1;
            for (int x = 0; x < characters.length; x++)
            {
                if (characters[x] == character)
                {
                    index = x;
                    break;
                }
            }
            if (index != -1)
                sb.append(entities[index]);
            else
                sb.append(character);
            // A quote ends the attribute value.
            if (character == '"')
                insideQuotation = false;
        }
        else
        {
            // A quote starts an attribute value.
            if (character == '"')
                insideQuotation = true;
            sb.append(character);
        }
    }
    System.out.println(sb.toString());
}
There will be a problem if there are additional quotation marks inside the quoted value.
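The same repair can be written more compactly with a regular expression. This is a sketch in Python rather than Java, with a made-up function name, and it shares the limitation just mentioned: a stray quote inside a value throws off the pairing.

```python
import re
import xml.etree.ElementTree as ET

def escape_angle_brackets_in_quotes(text):
    """Replace < and > with entities, but only inside double-quoted runs."""
    def fix(match):
        return match.group(0).replace("<", "&lt;").replace(">", "&gt;")
    # Each match is one "..." run, i.e. one attribute value.
    return re.sub(r'"[^"]*"', fix, text)

broken = '<check name="bike" value="apache <nice model"/>'
repaired = escape_angle_brackets_in_quotes(broken)
print(repaired)

# The repaired element parses, and the attribute keeps its original text.
value = ET.fromstring(repaired).get("value")
print(value)
```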

How do I escape special characters in MySQL?

For example:
select * from tablename where fields like "%string "hi" %";
Error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'hi" "' at line 1
How do I build this query?
The information provided in this answer can lead to insecure programming practices.
The information provided here depends highly on MySQL configuration, including (but not limited to) the program version, the database client and character-encoding used.
See http://dev.mysql.com/doc/refman/5.0/en/string-literals.html
MySQL recognizes the following escape sequences.
\0 An ASCII NUL (0x00) character.
\' A single quote (“'”) character.
\" A double quote (“"”) character.
\b A backspace character.
\n A newline (linefeed) character.
\r A carriage return character.
\t A tab character.
\Z ASCII 26 (Control-Z). See note following the table.
\\ A backslash (“\”) character.
\% A “%” character. See note following the table.
\_ A “_” character. See note following the table.
So you need
select * from tablename where fields like "%string \"hi\" %";
Although as Bill Karwin notes below, using double quotes for string delimiters isn't standard SQL, so it's good practice to use single quotes. This simplifies things:
select * from tablename where fields like '%string "hi" %';
I've developed my own MySQL escape method in Java (if useful for anyone).
See class code below.
Warning: wrong if NO_BACKSLASH_ESCAPES SQL mode is enabled.
private static final HashMap<String,String> sqlTokens;
private static Pattern sqlTokenPattern;
static
{
//MySQL escape sequences: http://dev.mysql.com/doc/refman/5.1/en/string-syntax.html
String[][] search_regex_replacement = new String[][]
{
//search string search regex sql replacement regex
{ "\u0000" , "\\x00" , "\\\\0" },
{ "'" , "'" , "\\\\'" },
{ "\"" , "\"" , "\\\\\"" },
{ "\b" , "\\x08" , "\\\\b" },
{ "\n" , "\\n" , "\\\\n" },
{ "\r" , "\\r" , "\\\\r" },
{ "\t" , "\\t" , "\\\\t" },
{ "\u001A" , "\\x1A" , "\\\\Z" },
{ "\\" , "\\\\" , "\\\\\\\\" }
};
sqlTokens = new HashMap<String,String>();
String patternStr = "";
for (String[] srr : search_regex_replacement)
{
sqlTokens.put(srr[0], srr[2]);
patternStr += (patternStr.isEmpty() ? "" : "|") + srr[1];
}
sqlTokenPattern = Pattern.compile('(' + patternStr + ')');
}
public static String escape(String s)
{
Matcher matcher = sqlTokenPattern.matcher(s);
StringBuffer sb = new StringBuffer();
while(matcher.find())
{
matcher.appendReplacement(sb, sqlTokens.get(matcher.group(1)));
}
matcher.appendTail(sb);
return sb.toString();
}
You should use single-quotes for string delimiters. The single-quote is the standard SQL string delimiter, and double-quotes are identifier delimiters (so you can use special words or characters in the names of tables or columns).
In MySQL, double-quotes work (nonstandardly) as a string delimiter by default (unless you set ANSI SQL mode). If you ever use another brand of SQL database, you'll benefit from getting into the habit of using quotes standardly.
Another handy benefit of using single-quotes is that the literal double-quote characters within your string don't need to be escaped:
select * from tablename where fields like '%string "hi" %';
MySQL has the string function QUOTE(), and it should solve the problem.
For strings like that, for me the most comfortable way to do it is doubling the ' or ", as explained in the MySQL manual:
There are several ways to include quote characters within a string:
A “'” inside a string quoted with “'” may be written as “''”.
A “"” inside a string quoted with “"” may be written as “""”.
Precede the quote character by an escape character (“\”).
A “'” inside a string quoted with “"” needs no special treatment and need not be doubled or escaped. In the same way, “"” inside a string quoted with “'” needs no special treatment.
It is from http://dev.mysql.com/doc/refman/5.0/en/string-literals.html.
You can use mysql_real_escape_string. mysql_real_escape_string() does not escape % and _, so you should escape MySQL wildcards (% and _) separately.
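To make the wildcard point concrete, here is a sketch in Python. It uses the standard library's sqlite3 module as a stand-in so the snippet is self-contained; that is my substitution, not MySQL. The escape_like helper is a made-up name. Quote characters are handled by parameter binding, and only the LIKE wildcards are escaped by hand.

```python
import sqlite3

def escape_like(term, escape="\\"):
    """Escape the LIKE wildcards % and _ so they match literally."""
    # The escape character itself must be handled first.
    for ch in (escape, "%", "_"):
        term = term.replace(ch, escape + ch)
    return term

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (fields TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    [('a string "hi" there',), ("100% sure",), ("1000 sure",)],
)

def search(term):
    # The value is bound as a parameter, so quotes in it need no escaping.
    pattern = "%" + escape_like(term) + "%"
    sql = "SELECT fields FROM tablename WHERE fields LIKE ? ESCAPE '\\'"
    return [row[0] for row in conn.execute(sql, (pattern,))]

print(search('string "hi"'))
print(search("100%"))
```

With a MySQL driver the placeholder syntax and the ESCAPE clause may differ, so check your driver's documentation before copying this.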
To test inserting double quotes in MySQL from the terminal, you can do the following:
TableName(Name, DString) -> schema
insert into TableName values("Name","My QQDoubleQuotedStringQQ");
After inserting the value, you can update it in the database to contain double quotes or single quotes:
update TableName set DString = replace(DString, "QQ", "\"");
If you're using a variable when searching in a string, mysql_real_escape_string() is good for you. Just my suggestion:
$char = "and way's 'hihi'";
$myvar = mysql_real_escape_string($char);
$sql = "select * from tablename where fields like '%string $myvar %'";