Formatting MySQL output as valid CSV or XLSX

I have a query whose output I format and dump into a CSV file.
This is the code I'm using:
(query.....)
INTO OUTFILE "/tmp/dump.csv"
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
However, when I open the CSV in Google Sheets or Excel, the columns are broken up into hundreds of smaller ones.
When I open the CSV in a plain text editor, I see that the column values themselves contain quotes (single and double), commas, and line breaks.
Only the double quotes are escaped.
Even though the double quotes are escaped, they are dropped when Google Sheets and Excel interpret the file.
I tried manually editing the CSV entries, escaping the commas and such, but no luck. The commas still break the columns. However, in a couple of instances they didn't break the column, and I can't figure out why.
So my question is: how do I correctly format the output to accommodate these characters and dump it into a CSV, or even an XLSX (in case CSV can't handle situations like these)?
For context, I'm operating in a WordPress environment. If there is a solution in PHP, that can work too.
EDIT:
Here is a sample line from the CSV:
"1369","Blaze Pannier Mounts for KTM Duke 200 & 390","HTA.04.740.80200/B","<strong>Product Description</strong><span data-sheets-value=\"[null,2,"SW Motech brings you the Blaze Pannier Brackets for the Duke 200 & 390. "]\" data-sheets-userformat=\"[null,null,15293,[null,0],11]\">SW Motech brings you the Blaze Pannier Brackets for the Duke 200 & 390.</span>"," <strong>What's in the box? </strong><span data-sheets-value=\"[null,2,"2 Quick Lock SupportsnMounting materialnMounting Instructions"]\" data-sheets-userformat=\"[null,null,15293,[null,0],null,[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],null,0,1,0,null,[null,2,0],"calibri,arial,sans,sans-serif",11]\">2 Quick Lock SupportsMounting materialMounting Instructions</span> ","Installation Instructions"

From RFC 4180:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
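Any double quotes inside fields enclosed with double quotes need to be escaped with another double quote, not with a backslash (\") as in the sample line above, which Excel and Google Sheets do not treat as an escape. So, given the three fields abc, ab"c and ", the expected formatting would be abc,"ab""c","""".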
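The rule is small enough to implement by hand. Below is a minimal Python sketch of RFC 4180 field escaping (the names `escape_field` and `format_row` are hypothetical, not from any library): a field is enclosed in double quotes only when it contains a comma, a double quote or a line break, and every embedded double quote is doubled. Python's built-in csv module does the same thing with its default settings, so in practice you would normally use that instead.

```python
from typing import Iterable


def escape_field(value: str) -> str:
    """Quote one CSV field following RFC 4180."""
    # A field must be enclosed if it contains the delimiter, a double quote
    # or a line break; otherwise it can be written as-is.
    needs_quotes = any(ch in value for ch in (',', '"', '\r', '\n'))
    # Embedded double quotes are escaped by doubling them (not with a backslash).
    escaped = value.replace('"', '""')
    return f'"{escaped}"' if needs_quotes else escaped


def format_row(fields: Iterable[str]) -> str:
    """Escape each field and join them with commas (no line terminator)."""
    return ",".join(escape_field(field) for field in fields)


print(format_row(['abc', 'ab"c', '"']))  # abc,"ab""c",""""
```

Applied to the sample row, each HTML value (which contains commas and quotes) would come out enclosed in quotes with every " doubled, so a spreadsheet sees it as a single column.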

Related

Exporting data from SSRS to a .csv file adds lots of quotation marks how do I get just one set?

I have a report which is just a simple SELECT statement that generates a list of columns full of data. I want this data to be exported as a CSV file with each datum enclosed in " quotation marks. I have created a table and used this as my expression:
=""""+Fields!Activity_Code.Value+""""
When I run the report inside ReportBuilder 3.0, I get exactly what I'm looking for:
No headers, and each datum has quotation marks. Perfect.
But when I export to CSV and open the file in Notepad, I see this:
The headers are in there where they shouldn't be, and each datum has 3 quotation marks on each side. What am I doing wrong?
This is perfectly normal.
When CSV fields contain a separator or double quotes, the fields are enclosed in double quotes and any quotes inside the fields are escaped with another quote.
Example - the fields:
123
"27" monitor"
456
become:
123,"""27"" monitor""",456
or:
"123","""27"" monitor""","456"
A CSV reader/parser should handle this correctly when reading the data (or you could provide a parameter telling the parser that the fields are quoted).
On the other hand, if you just want your fields to be quoted inside the CSV (and not visible after opening the file), you can tell the CSV generator to quote the fields (or in this case do nothing, since the generator seems to be adding quotes already).
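To check that a conforming parser undoes the doubling, here is a short sketch that uses Python's standard csv module as a stand-in for "a CSV reader/parser" (SSRS itself is not involved; `parse_line` is a hypothetical helper name). Both encodings from the example read back as the same three values:

```python
import csv
import io


def parse_line(text: str) -> list:
    """Parse one CSV record with Python's default (RFC 4180-style) dialect."""
    # newline='' leaves line endings untouched, as the csv docs recommend.
    return next(csv.reader(io.StringIO(text, newline='')))


minimal = '123,"""27"" monitor""",456\n'
all_quoted = '"123","""27"" monitor""","456"\n'

print(parse_line(minimal))     # ['123', '"27" monitor"', '456']
print(parse_line(all_quoted))  # ['123', '"27" monitor"', '456']
```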

Prevent LOAD DATA INFILE from escaping double double quotes

I have CSV data like the following:
"E12 98003";1085894;"HELLA";"8GS007949261";"";1
"5 3/4"";652493;"HELLA";"9HD140976001";"";1
Some fields are enclosed in double quotes. The problem is that, as you can see in the second line, the data in the first column contains a double quotation mark at the end as part of the data.
I tried something along the lines of:
LOAD DATA INFILE 'file.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ';' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
but it will use the quotation mark in the data to escape the field enclosing quotation mark. I also tried ESCAPED BY '' and ESCAPED BY '\\' with no success.
Is there a way to stop the LOAD DATA INFILE command from escaping the double double quotation marks?
Or should I parse the csv and put double quotation marks when there is only one?
I am parsing the files anyway using PowerShell to change the encoding to UTF-8. Is there some way to fix this quickly there? My PowerShell code:
function Convert-FileToUTF8 {
param([string]$infile,
[string]$outfile,
[System.Int32]$encodingCode)
$encoding = [System.Text.Encoding]::GetEncoding($encodingCode)
$text = [System.IO.File]::ReadAllText($infile, $encoding)
[System.IO.File]::WriteAllText($outfile, $text)
}
OK, I solved it by using a .NET regular expression to fix the CSV. It is costly, but not too much.
I added
$text = [regex]::Replace($text, "(?m)(?<!^)(?<!\;)""(?!\;)(?!\r?$)", '""');
just before the last line of the function, and it seems to work. Since I am a novice with regular expressions, this could probably be improved.
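For readers not using PowerShell, the same repair can be sketched in Python with `re.sub`. This is a port of the regex above (the PowerShell-escaped `""` in the pattern is a single `"`); like the original, it is a heuristic that assumes `;` as the delimiter and no `;` or line breaks inside the data. `fix_inner_quotes` is a hypothetical name.

```python
import re

# A quote that is not at the start of a line, not next to the ';' delimiter
# on either side, and not at the end of a line is treated as data.
INNER_QUOTE = re.compile(r'(?<!^)(?<!;)"(?!;)(?!\r?$)', re.MULTILINE)


def fix_inner_quotes(text: str) -> str:
    """Double every data quote so the text becomes valid RFC 4180 CSV."""
    return INNER_QUOTE.sub('""', text)


sample = (
    '"E12 98003";1085894;"HELLA";"8GS007949261";"";1\r\n'
    '"5 3/4"";652493;"HELLA";"9HD140976001";"";1\r\n'
)
print(fix_inner_quotes(sample).splitlines()[1])
# "5 3/4""";652493;"HELLA";"9HD140976001";"";1
```

The first line is left unchanged, and the second now parses with a first field of 5 3/4".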
The main problem is that the input data constitutes invalid CSV syntax, as stated in RFC 4180, section 2, item 7:
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
But in your PowerShell script you could try to fix this issue with an extra line, using the Replace method on $text once you have its value:
$text = $text.Replace('"";', '""";')
This should be enough, as the loader will deal well with unescaped double quotes if they appear elsewhere in the data, as stated on mysql.com (my highlight):
If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence.
Of course, if the badly formatted CSV has data that contains ";, then you still have a problem. But it is very hard to determine whether such an occurrence terminates the data or should be seen as part of the data, even for humans :-)
Another thing to pay attention to as found on mysql.com:
If the input values are not necessarily enclosed within quotation marks, use OPTIONALLY before the ENCLOSED BY keywords.
In addition: importing CSV files into MySQL with the values enclosed in quotes works fine when using the ENCLOSED BY option, UNLESS the enclosed field is the last field in a row AND you used Excel to create the CSV file. Excel omits the field separator after the last field in a row. MySQL doesn't mind... unless the last field is enclosed in quotes. Then the import terminates at that line.
Examples:
This works fine: ...;value2;value3 (no trailing separator)
This also works fine ...;"value 2";value3 (value enclosed in quotes)
This also works fine ...;value 2;"value3"; (last field value enclosed in quotes and trailing separator)
But this breaks the import: ...;value2;"value 3" (last field value enclosed in quotes and no trailing separator)
Took me some time to figure this out; hope sharing this saves somebody else that time.

MySQL importing CSV file with phpmyadmin without cell quotes

I have a huge CSV file with the data like this:
ID~Name~Location~Price~Rating
02~Foxtrot~Scotland~~9
08~Alpha~Iceland~9.90~4
32~ForestLane~Germany~14.35~
The issue is that when importing with phpMyAdmin, it asks for "Columns enclosed with:" and "Columns escaped with:". The trouble is that this CSV doesn't have quotes around the cells.
If I leave this blank, it gives the error: Invalid parameter for CSV import: Columns escaped with
Is there a way to import without having quotes on the CSV?
I can reproduce this behavior. I'll bring it up on the phpMyAdmin development discussion list, but in the meantime, you can work around it by using some nonsense character for "Columns escaped with" and leaving "Columns enclosed with" blank. Pick a character your data doesn't contain, say " or £, and use that for "Columns escaped with". For instance, I have a data set where I know £ doesn't exist, so I can use that for the "Columns escaped with" character; if you don't have any escaped characters, you can enter any character there.
I'll update if I can provide any more useful information, but certainly that workaround should allow you to import your data.
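Outside phpMyAdmin, the underlying point is that a parser for this file needs quoting switched off entirely rather than a dummy enclosing character. As an illustration only, a minimal Python sketch with `csv.QUOTE_NONE` reads the sample rows and keeps empty cells as empty strings (`read_tilde_file` is a hypothetical helper name):

```python
import csv
import io


def read_tilde_file(text: str) -> list:
    """Read '~'-delimited text in which no cell is ever quoted."""
    # QUOTE_NONE means '"' has no special meaning, so no enclosing
    # character (and no escape character) is needed for this data.
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='~',
                        quoting=csv.QUOTE_NONE)
    return list(reader)


data = (
    "ID~Name~Location~Price~Rating\n"
    "02~Foxtrot~Scotland~~9\n"
    "08~Alpha~Iceland~9.90~4\n"
    "32~ForestLane~Germany~14.35~\n"
)

header, *rows = read_tilde_file(data)
for row in rows:
    print(dict(zip(header, row)))
```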

error with MySQL load data infile field with double quotes

I have .csv file data like this:
"UPRR 38 PAN AM "M"","1"
I loaded the data into a table with two columns (a and b) using the command below:
LOAD DATA LOCAL INFILE 'E:\monthly_data.csv'
INTO TABLE test_data_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
But when I select from the table, it gives the unexpected results shown below.
a contains:
UPRR 38 PAN AM "M","1
... and b is NULL.
Thanks
You can replace all the instances of "double quote double quote" in your file, either:
A. by opening the files and using find and replace, or
B. by writing a script that opens the files and replaces the extra quote that is messing it up.
You have this:
ENCLOSED BY '"'
Thus " is not a regular character any more. It's a special character that has a special meaning: it highlights the start and end of a column value. If you want to type a " that does not behave that way you need to escape it. The RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files document explains how to do that:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote
a,b
"UPRR 38 PAN AM ""M""","1"
As they say, garbage in, garbage out ;-)
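Rather than fixing quotes by hand, the file can be produced by a CSV writer that applies this rule automatically. A minimal Python sketch, assuming the two values from the question and the comma delimiter from its LOAD DATA command; `csv.QUOTE_ALL` mirrors the question's style of quoting every field, and the writer's default `\r\n` line terminator matches `LINES TERMINATED BY '\r\n'` (`to_csv_line` is a hypothetical name):

```python
import csv
import io


def to_csv_line(values):
    """Write one record, quoting every field and doubling embedded quotes."""
    buf = io.StringIO(newline='')
    # QUOTE_ALL encloses every field; doublequote=True (the default) turns
    # each embedded " into "".
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(values)
    return buf.getvalue()


line = to_csv_line(['UPRR 38 PAN AM "M"', '1'])
print(repr(line))  # '"UPRR 38 PAN AM ""M""","1"\r\n'
```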

Using Excel to create a CSV file with special characters and then Importing it into a db using SSIS

Take this XLS file
I then save this XLS file as CSV and then open it up with a text editor. This is what I see:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
I see that the double quote character in column C was stored as AB""C: the column value was enclosed in quotes, and the double quote character in the data was replaced with 2 double quote characters to indicate that the quote occurs within the data and does not terminate the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
I am a little surprised that not all of the column values are enclosed in quotes, but even this seems reasonable if I assume that Excel only adds quotes when special characters like a comma or a double quote character exist in the data.
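The line Excel produced here matches what Python's csv module writes with its default dialect (comma delimiter, `"` as quote character, doubled quotes, `QUOTE_MINIMAL`). As a quick cross-check, a sketch that writes the same seven values reproduces Excel's line exactly and parses it back to the original values:

```python
import csv
import io

values = ['1', 'ABC', 'AB"C', 'D,E', 'F', '03', '3,2']

buf = io.StringIO(newline='')
# Defaults: delimiter ',', quotechar '"', doublequote=True, QUOTE_MINIMAL,
# i.e. only fields containing a comma, a quote or a line break are quoted.
csv.writer(buf).writerow(values)
excel_line = buf.getvalue()
print(repr(excel_line))  # '1,ABC,"AB""C","D,E",F,03,"3,2"\r\n'

# Reading it back yields the original values, including the lone quote.
assert next(csv.reader(io.StringIO(excel_line, newline=''))) == values
```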
Now I try to use SQL Server to import the CSV file. Note that I specify a double quote character as the Text Qualifier character
and a comma as the Column delimiter character. However, note that SSIS imports column 3 incorrectly, i.e., it does not translate the two consecutive double quote characters into a single occurrence of a double quote character.
What do I have to do to get Excel and SSIS to get along?
Generally, people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.
I find that if I modify the file from this
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
...to this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"
i.e., removing one of the two consecutive quotes in column C's value, the data loads properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C is not terminating that column value? Is it because the following character is not a comma column delimiter or a row delimiter (CRLF)? And why does Excel export it this way?
According to Wikipedia, here are a couple of traits of a CSV file:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file whose data can contain ANY of the special characters used as column delimiters, text delimiters or row delimiters? There's no reason it can't work using the approach specified in Wikipedia, which is what I thought the old MS DTS packages used to do...
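For reference, a parser that follows both of the quoted rules reads the Wikipedia example as two records, keeping the CRLF inside the second field of the first record. A short sketch, using Python's csv module purely as an example of such a parser (it says nothing about SSIS itself):

```python
import csv
import io

text = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx\r\n'

# newline='' stops Python from translating line endings, so the CRLF inside
# the quoted field survives as data rather than ending the record.
records = list(csv.reader(io.StringIO(text, newline='')))
print(records)  # [['aaa', 'b\r\nbb', 'ccc'], ['zzz', 'yyy', 'xxx']]
```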
Update:
If I use Notepad to change the input file to
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"
Excel reads it just fine,
but SSIS returns:
The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Conclusion:
Just like the error message says in your update...
The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
This is a confirmed bug on Microsoft Connect. I encourage everyone reading this to vote there to have them fix this stinker. This is in the top 10 of the most egregious bugs I have encountered.
Do you need to use a comma delimiter?
I used a pipe delimiter with no text qualifier and it worked fine. Here is my output from the text file:
1|ABC|AB"C|D,E|F|03|3,2
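The pipe approach only works while the data never contains the delimiter or a line break, since there is no text qualifier to protect them. A minimal Python sketch of such a writer (the name `pipe_line` is hypothetical) that refuses rows it cannot represent, instead of silently producing a broken file:

```python
def pipe_line(fields, delimiter='|'):
    """Join fields with no text qualifier; fail if a field would break the row."""
    for field in fields:
        if delimiter in field or '\r' in field or '\n' in field:
            raise ValueError(f'cannot write {field!r} without a text qualifier')
    return delimiter.join(fields)


print(pipe_line(['1', 'ABC', 'AB"C', 'D,E', 'F', '03', '3,2']))
# 1|ABC|AB"C|D,E|F|03|3,2
```

The check makes the trade-off explicit: quotes and commas pass through untouched, but a single | or line break in the data means a different delimiter (or a qualifier) is needed.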
You have 3 options in my opinion.
Read the data into a stage table.
Run any update queries you need on the columns
Now select your data from the stage table and output it to a flat file.
OR
Use pipes as your delimiters.
OR
Do all of this in a C# application and build it in code.
You could send the row to a script in SSIS and parse and build the file you want there as well.
Using text qualifiers and "character" delimited fields is problematic for sure.
Have Fun!