Space Before Quote in CSV Field - csv

From the CSV spec (RFC 4180), spaces are considered part of a field and should not be ignored. Obviously, if the field contains double quotes, it should retain the spaces inside the quotes.
My question is, what about spaces outside of the double quotes? The only way I can see this happening is if the tool that generated the CSV didn't do it properly.
Example: one, "two" ,three
Should the space before and after "two" be included?

That cell is invalid - to properly code that row it should be:
one," ""two"" ",three
Double quotes must also be escaped (as a double-double quote) since they are used as the escape sequence. If you don't want to preserve the quotes around two, the row is still invalid because of the spaces before and after the quotes: RFC 4180 allows any field to be enclosed in double quotes, whether or not it contains commas or quotes, but the enclosing quotes must be the first and last characters of the field.
If I were in your case, I would err on the side of leniency.
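As a rough illustration of what leniency can look like, here is a sketch using Python's standard csv module (chosen only for illustration; other parsers may treat the malformed row differently). With default settings it keeps the spaces and the quotes of the malformed field as literal text, which is the same value the correctly encoded row decodes to.

```python
import csv
import io

def parse_line(line, **options):
    """Parse a single CSV line and return its list of fields."""
    return next(csv.reader(io.StringIO(line), **options))

def encode_row(fields):
    """Encode a list of fields as one CSV line, without the line terminator."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue().rstrip("\r\n")

malformed = 'one, "two" ,three'        # the row from the question
well_formed = 'one," ""two"" ",three'  # the properly encoded row

# Default settings: a quote that is not the first character of a field
# is treated as ordinary data, so the spaces and quotes are preserved.
lenient_fields = parse_line(malformed)
decoded_fields = parse_line(well_formed)

# Writing the same values back produces the properly encoded row.
encoded = encode_row(["one", ' "two" ', "three"])

print(lenient_fields)
print(decoded_fields)
print(encoded)
```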

I dealt with this using BULK INSERT and BCP format files, where it is tricky to account for both quote and comma delimiting. Where there could be variation, say with a , " delimiter, we used the lowest common delimiter (the comma in your example) and then stripped out what wasn't needed, such as all the double quotes.
But it could also be that your source data was only comma delimited and this was the actual contents of that field. Either way, I would toss out the quotes when loading the field, in whatever method was appropriate.

Related

How do I replace comma with hyphen within the double quotes in json

I am processing a CSV file in powerautomate. Here is one record.
server2,usa,"rebooted,by citrix",25,good
Because "rebooted,by citrix" is data from a single field, splitting on commas produces an array whose elements no longer line up with the columns.
I want to replace the comma within double quotes with hyphen. The expected output should be like server2,usa,"rebooted-by citrix",25,good
PrasadAthalye has a nice solution approach for this.
You could first split on the " character instead. Per item you could replace the , character and append the results to a new array. After that you should be able to apply your normal split.
https://powerusers.microsoft.com/t5/Building-Flows/Setting-up-specific-expression-to-remove-comma-inside-strings/m-p/646040/highlight/true#M86288
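The logic of that approach, independent of Power Automate, can be sketched in Python as follows. The function name replace_commas_in_quotes is hypothetical, and the sketch assumes quotes are balanced and never escaped inside a field. After splitting on the " character, the odd-numbered pieces are the ones that were inside quotes, so only those have their commas replaced.

```python
def replace_commas_in_quotes(line, replacement="-"):
    """Replace commas that appear inside double-quoted sections of a line."""
    pieces = line.split('"')
    fixed = []
    for index, piece in enumerate(pieces):
        # Pieces at odd indexes were between a pair of double quotes.
        if index % 2 == 1:
            piece = piece.replace(",", replacement)
        fixed.append(piece)
    return '"'.join(fixed)

record = 'server2,usa,"rebooted,by citrix",25,good'
cleaned = replace_commas_in_quotes(record)
fields = cleaned.split(",")

print(cleaned)
print(fields)
```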

Removing double quotes from CSV but leaving qualifiers (using Notepad++)

I have a comma delimited text file where one of the columns (appropriately) has text encased with double quotes. There are also many instances of double quotes within the content of this particular column. I've used the following to remove many of the double quotes, replacing them with single quotes (excluding any double quotes next to a comma).
(?<!^)(?<![,])"(?![,])(?!$)
How do I isolate/replace the double quote after [fine,] without removing the "good" double quotes?
column1,"he's doing 'fine," says Tom, but nothing specific. Blah, blah, blah", column3
Here is another example of "good" double quotes that I don't want to remove (where the first two columns are blank/empty)
,,"This is text I need",
Assuming that double quotes only occur in one column, I suggest a two-step approach. First change all double quotes in the file to single quotes, using a simple replace all. Next change the first and last single quotes on each line back to double quotes. This can be done in one regex: replace ^([^\r\n']*)'(.*)'([^\r\n']*)$ with \1"\2"\3.
If single quotes occur in other columns and these should not be altered, then a three-step approach can be used. Choose a character that does not occur anywhere in the text. Change all double quotes to that character; I will use ! as an example. As above, change the first and last ! on each line to double quotes. This can be done in one regex: replace ^([^\r\n!]*)!(.*)!([^\r\n!]*)$ with \1"\2"\3. Finally change all the remaining ! to single quotes. If you cannot find an unused character, then you can use a longer string that is not in the file instead, perhaps something like _<<abc>>_ instead of the !.
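For readers who want to try the three-step idea outside Notepad++, here is a minimal Python sketch. The function name keep_outer_quotes and the choice of the NUL character as placeholder are assumptions for illustration; the function handles one line at a time and assumes the placeholder never occurs in the data.

```python
import re

PLACEHOLDER = "\x00"  # assumed not to occur anywhere in the data

# First placeholder, anything in between, last placeholder on the line.
OUTER_PAIR = re.compile(r"^([^\x00]*)\x00(.*)\x00([^\x00]*)$")

def keep_outer_quotes(line):
    """Keep the first and last double quote of a line; turn the rest into single quotes."""
    # Step 1: change every double quote to the placeholder.
    line = line.replace('"', PLACEHOLDER)
    # Step 2: change the first and last placeholder back to double quotes.
    line = OUTER_PAIR.sub(r'\1"\2"\3', line)
    # Step 3: change the remaining placeholders to single quotes.
    return line.replace(PLACEHOLDER, "'")

example_1 = 'column1,"he\'s doing \'fine," says Tom, but nothing specific. Blah, blah, blah", column3'
example_2 = ',,"This is text I need",'

result_1 = keep_outer_quotes(example_1)
result_2 = keep_outer_quotes(example_2)

print(result_1)
print(result_2)
```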
Struggled with this a bit, but based on your question, there might be a possible solution. If you only have one column which has unescaped quotes or commas, you might be able to count the commas in front of that column and the commas after that column then strip all the quotes and commas between them. If you have multiple columns with unescaped characters, this might be harder.
Not familiar with Notepad++, but reading other answers I assume there is a way to use regex. If so, you can use this one:
(?<!^|",)"(?!,"|$)

Data is getting wrapped in the double quotes for Hive's CHAR data types if there is trailing space in it [duplicate]

I am using the following code to export my data frame to csv:
data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')
Note that I use delimiter="\t", as I don't want to add additional quotation marks around each field. However, when I checked the output csv file, there are still some fields which are enclosed by quotation marks. e.g.
abcdABCDAAbbcd ....
1234_3456ABCD ...
"-12345678AbCd" ...
It seems that the quotation mark appears when the leading character of a field is "-". Why is this happening and is there a way to avoid this? Thanks!
You aren't using all the options provided by the CSV writer. It has a quoteMode parameter which takes one of four values (descriptions from the org.apache.commons.csv documentation):
ALL - quotes all fields
MINIMAL (default) - quotes fields which contain special characters such as a delimiter, quotes character or any of the characters in line separator
NON_NUMERIC - quotes all non-numeric fields
NONE - never quotes fields
If you want to avoid quoting, the last option looks like a good choice, doesn't it?

SQL injection when single quotes are escaped with two single quotes

Is there any way to perform a SQL injection when single quotes are escaped by two single quotes? I know the MySQL server is using this specific technique to prevent against an attack. I'm trying to log in as a specific user but all of the common injections I've tried for the password have not worked successfully (i.e. ' or '1'='1, ' or ' 1=1, etc.).
No, and yes.
There's no way for an unsafe value to "break out" of a literal that is enclosed in single quotes, if the value being supplied is "escaped" by preceding each single quote with an additional single quote.
That is, assuming that your statement is guaranteeing that string literals are enclosed in quotes, as part of the "static" SQL text.
example perl-ish/php-ish
$sql = "... WHERE t.foo = '" . $safe_value . "' ... ";
                          ^                   ^
I've underscored here that the single quotes enclosing the literal are part of the SQL text. If $safe_value has been "escaped" by preceding each single quote in the "unsafe" value with another single quote to make it "safe"...
$unsafe_value    $safe_value
-------------    ------------
I'm going        I''m going
'she''s'         ''she''''s''
1'='1 --         1''=''1 --
As long as the escaping is handled properly, and we guarantee that potentially unsafe values are run through the escaping, then including single quotes in data values is not a viable way to "break out" of a literal within the SQL text.
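The escaping described above can be sketched in a few lines of Python. The helper names escape_single_quotes and build_where_clause are hypothetical; the sketch only reproduces the doubling rule and the table of values above, and it assumes the enclosing quotes are always part of the static SQL text.

```python
def escape_single_quotes(unsafe_value):
    """Precede every single quote with an additional single quote."""
    return unsafe_value.replace("'", "''")

def build_where_clause(unsafe_value):
    """Embed the escaped value in a literal whose quotes are static SQL text."""
    return "WHERE t.foo = '" + escape_single_quotes(unsafe_value) + "'"

examples = ["I'm going", "'she''s'", "1'='1 --"]
escaped = [escape_single_quotes(value) for value in examples]
clause = build_where_clause("' or '1'='1")

for unsafe_value, safe_value in zip(examples, escaped):
    print(unsafe_value, "->", safe_value)
print(clause)
```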
That's the "no" part of the answer.
The "yes" part of the answer.
One of the biggest problems is making sure this is done EVERYWHERE, and that a mistake has not been made somewhere, assuming that a potentially unsafe string is "safe", and is not escaped. (For example, assuming that values pulled from a database table are "safe", and not escaping them before including them in SQL text.)
Also, the single quote trick is not the only avenue for SQL injection. The code could still be vulnerable.
First, the code could be vulnerable if we're not careful about other parts of the statement, such as the single quotes enclosing string literals. It could also be vulnerable if, for example, the code were to run the $sql through some other function before it gets submitted to the database:
$sql = some_other_function($sql);
The return from some_other_function could potentially return SQL text that was in fact vulnerable. (As a ridiculous example, some_other_function might replace all occurrences of two consecutive single quotes with a single single quote. DOH!)
Also, with the vast number of possible unicode characters, if we're ever running through a characterset translation, there's also a possibility that some unicode character could get mapped to a single quote character. I don't have any specific example of that, but dollars to donuts that somewhere, in that plethora of multibyte encodings, there's some unicode character somewhere that will get translated to a single quote in some target.
There's a default character in the target for unmapped characters in the source, and that's usually a question mark (or a white question mark in a black diamond.) It would be a huge problem if the default character in the target (for unmapped characters in the source) was a single quote.
Bottom line: escaping unsafe strings by replacing single quotes with two single quotes goes a long way towards mitigating SQL injection vulnerabilities. But in and of itself, it doesn't guarantee that code is not vulnerable in some other way.
If the input accepts Unicode and is implicitly converted to ASCII in the database (not as uncommon as it sounds), then an attacker can simply substitute ʻ or ʼ (0x02BB or 0x02BC) in place of the single tick to get around the escaping mechanism, and the implicit conversion will map those characters to single ticks (at least that's the case in SQL Server).

Check spaces in mysql field

select name from emp_profile;
Result:
tom#rj6.com
In the above result, how can I determine whether there are trailing spaces in it or not?
RTRIM() removes trailing spaces.
If RTRIM(name) varies from name, there are trailing spaces in the field.
Related functions are LTRIM() (trims starting spaces) and TRIM() (both sides)
As a side note, I would recommend removing trailing spaces (and other invalid data) during input time on application level, not in the database.
If name is a char field, it will not have trailing spaces; as far as I can ascertain, varchars do keep trailing spaces.
An easy way to check for trailing whitespace is to compare the length of the value against the length of its trimmed form, for example LENGTH(name) against LENGTH(RTRIM(name)).