CSV using windows line break as a delimiter - mysql

I'm trying to build a .CSV file out of information in a database. Unfortunately I can't do this directly from a SQL query, and the rest of the site uses Classic ASP.
The information in the database often contains line breaks. Users enter this information with an HTML <textarea> field on a website. Line breaks show up in the resulting web page when the information is pulled from the database. The long-term goal is to use the .CSV created here to change information, then upload the file to update the database. Because of that, I'm looking for a way to preserve the line breaks, or maybe replace them with <br/> tags.
The problem I'm running into is that Excel treats the line break character as a row delimiter. Is it better to replace the break character with <br/>? If so, how do I find the character to replace in my Replace()? Is there a way to escape these characters and avoid the Replace()? I've already got each cell surrounded by double quotes to escape other quotes.

I was able to replace the line break character with a <br/>, which works just as well when the information is displayed on the site. I ended up using:
writethis = Replace(myString, vbCrLf, "<br/>")
This replaced the line break with a <br/> and stopped the row from ending prematurely.
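
The Replace() call above only catches vbCrLf. As a rough sketch of the same idea in Python (not the Classic ASP used on the site), the snippet below also handles lone CR and LF characters and shows the reverse step for the later upload. The function names breaks_to_br and br_to_breaks are made up for this illustration, and it assumes the stored text never contains a literal <br/> of its own.

```python
import re

# Matches Windows (\r\n), lone CR (\r), and Unix (\n) line breaks.
# \r\n is listed first so it is replaced as one unit, not two.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def breaks_to_br(text):
    """Replace every line break with a <br/> tag for the CSV export."""
    return LINE_BREAK.sub("<br/>", text)


def br_to_breaks(text, newline="\r\n"):
    """Reverse step for the upload: turn <br/> tags back into line breaks."""
    return text.replace("<br/>", newline)


original = "First line\r\nSecond line\nThird line"
exported = breaks_to_br(original)
print(exported)  # First line<br/>Second line<br/>Third line
restored = br_to_breaks(exported)
```

Note that the round trip normalizes every break to the chosen newline, so mixed line endings in the original text are not preserved exactly.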

Related

Azure Data Studio set tab as delimiter when exporting a csv

In Azure Data Studio there's a setting called queryEditor.results.saveAsCsv.delimiter that allows the user to choose the default delimiter when exporting query results to a CSV file. However, I'm struggling to find a way to set this property to use a tab as the delimiter. I tried \t, but it seems that the property only accepts a single character, so it considers only the \. I searched a lot and couldn't find any solution. Any ideas?
Make sure you're placing the escaped character in double quotes:
"queryEditor.results.saveAsCsv.delimiter": "\t",
I don't think it accepts only one character. I wanted my CSV exports to have one record per line, so my line separator needs to be CRLF on a Windows machine. It's set to:
"queryEditor.results.saveAsCsv.lineSeperator": "\r\n",

SSIS Text Qualifier not working on last column

My flat files fields are tab delimited (\t) with a quotation mark (") text qualifier, and each row is linefeed (LF) separated.
My SSIS package works great when:
no fields are text qualified
any field EXCEPT the last column is text qualified
When the last column is text qualified my package errors out saying it couldn't find the delimiter for my last column ... any ideas?
In a programmer's life, 3 problems (that often take hours to track down) are certain: permissions, case sensitivity, and line endings.
In my case, it is line endings. When a CRLF is pressed against the text qualifier ("), SSIS apparently doesn't interpret the text qualifier correctly, but does see the line break.
Here's what my setup looked like when I was having issues:
Here's what my setup looked like after changing the column delimiter:
The official answer here then is to change the line endings. The unfortunate side effect of that is to change a package that works on all the other files - leading to a need to convert files with CRLF to LF before hitting this package, or ending up with unsightly workarounds as seen here.
Use this application in your SSIS Execute task:
http://www.softsea.com/review/U2WIN.html
and put it in the flat file folder.
Try setting TextQualified = 'False' on your last column to see if it helps.
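
The CRLF-to-LF conversion mentioned in the answer above could be done as a pre-processing step before the file reaches the package. Here is a minimal Python sketch of that step, assuming the files are small enough to read into memory. The function names and the sample bytes are hypothetical and not part of SSIS.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF; lone LF bytes are untouched."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write an LF-only copy of src_path to dst_path."""
    # Binary mode so Python does not translate line endings on its own.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(crlf_to_lf(data))


# Tab-delimited sample with a text-qualified last column and CRLF endings.
sample = b'1\t"abc"\t"last"\r\n2\t"def"\t"last"\r\n'
print(crlf_to_lf(sample))
```

Working on bytes rather than text avoids having to know the file's character encoding, which matters only for single-byte-compatible encodings such as ASCII or UTF-8.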

Exporting Database with HTML from PHPMYADMIN

I've been trying to export data from a Virtuemart installation into an Excel file, so that it can be easily imported into Magento. The problem I'm having is that any fields containing HTML are causing line breaks and breaking the formatting of the file.
I've tried using semicolon as the delimiter as well as tab, but that didn't seem to address the issue because the odd line breaks were still there.
Is removing the line breaks and praying for it to work the only way around this?
Thanks!
It's not clear whether the problem comes from unescaped commas or newlines in your CSV file, but either way there should be a means to properly escape them so they don't affect your import.
I'm also not quite clear what programs you're using in what ways; you've tagged this as phpMyAdmin and in the title ask about an export from phpMyAdmin, but reference Virtuemart and Magento in the post, so I'm guessing you're using phpMyAdmin to do the import/export of the database used by those other ecommerce programs.
Can I strongly suggest using the SQL file type instead?
Within phpMyAdmin, you can select custom values for "Lines terminated with" on both import and export for CSV files. Perhaps you can leverage that to make, for instance, § your line termination value. Incidentally, my understanding is that as long as each field is properly escaped ("Columns enclosed with" and/or "Columns escaped with"), an extra newline or comma in your content shouldn't matter to your import/export. Open up the exported file in a text editor and look at a few of the entries to make sure they're properly escaped and perhaps post a few lines that fail as an example here (obscuring any sensitive information, of course).
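
To illustrate the point about proper escaping, here is a small Python sketch. Python's csv module stands in for the export and import tools, which is an assumption made purely for illustration, and the sample row is invented. It writes a field containing HTML, a comma, a newline, and a double quote with every column enclosed in quotes, then reads it back.

```python
import csv
import io

# Hypothetical product row: the description holds HTML, a comma,
# an embedded newline, and a double quote.
rows = [
    ["id", "description"],
    ["1", '<p>First line,\nsecond line with a "quote"</p>'],
]

# Enclose every column in double quotes and end each record with CRLF.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
writer.writerows(rows)
exported = buffer.getvalue()
print(exported)

# newline="" keeps the embedded line break intact while reading back.
parsed = list(csv.reader(io.StringIO(exported, newline="")))
```

If the round trip returns the original rows, the embedded newline and comma were carried inside the quoted field rather than being treated as record or column separators.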

How to tell what type of newline is being used in a txt file?

I have a txt file which contains quoted, comma-delimited text, and I am trying to figure out what type of newline is being used.
The reason is that I am trying to import it into MySQL server using LOCAL INFILE, and obviously I need to tell it the correct LINES TERMINATED BY.
When I use either \n or \r\n, it imports exactly half of the records, skipping a line each time.
But when I use \r, it imports exactly double, giving me as many rows with all values null as there are records.
When I open the file in Notepad, there is no space between lines; however, if I open it in a browser, there is a blank line between each line, as though there is a paragraph break there somewhere. Likewise, if I choose "Open with > Excel", it does not split into columns and has a blank line between each row. The only way to open it properly in Excel is to use "Get external data > From text" and choose the comma delimiter.
I provide a couple of lines below, copied and pasted exactly, and obviously it would be great if someone could let me know the correct settings to use for importing. But it would be even better if there was a way for me to quickly know what type of newline any particular file is using (there is also a blank line at the very end of the file, as with the other rows).
"Item No.","Description","Description 2","Customers Price","Home stock","Brand Name","Expected date for delivery","Item Group No.","Item Group Name","Item Product Link (Web)","Item Picture Link (Web)","EAN/UPC","Weight","UNSPSC code","Product type code","Warranty"
"/5PL0006","Drum Unit","DK-23","127.00","32","Kyocera","04/11/2013","800002","Drums","http://uk.product.com/product.aspx?id=%2f5PL0006","http://s.product.eu/products/2_PICTURE-TAKEN.JPG","5711045360824","0.30","44103109","","3M"
"/DK24","DK-24 Drum Unit FS-3750","","119.00","8","Dell","08/11/2013","800002","Drums","http://uk.product.com/product.aspx?id=%2fDK24","http://s.product.eu/products/2_PICTURE-TAKEN.JPG","5711045360718","0.20","44103109","","3M"

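One quick way to see which newline a file uses is to read it as raw bytes and count each kind of line ending. The following Python sketch does that. The function names and the sample size are arbitrary choices for this illustration, and it assumes a single-byte-compatible encoding such as ASCII or UTF-8.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CR bytes, and lone LF bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the bytes that appear on their own.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


def inspect_file(path: str, sample_size: int = 65536) -> dict:
    """Count line endings in the first sample_size bytes of a file."""
    # Binary mode so Python does not translate the line endings.
    # A CRLF pair split at the sample boundary would be counted as a lone CR.
    with open(path, "rb") as handle:
        return count_line_endings(handle.read(sample_size))


print(count_line_endings(b'"a","b"\r\n"c","d"\r\n'))
# {'CRLF': 2, 'CR': 0, 'LF': 0}
```

A file with consistent endings shows a nonzero count in only one bucket. Nonzero counts in more than one bucket point to mixed or doubled endings, which no single LINES TERMINATED BY value will match cleanly.
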
Using Excel to create a CSV file with special characters and then Importing it into a db using SSIS

Take this XLS file
I then save this XLS file as CSV and then open it up with a text editor. This is what I see:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
I see that the double quote character in column C was stored as AB""C, the column value was enclosed with quotations and the double quote character in the data was replaced with 2 double quote characters to indicate that the quote is occurring within the data and not terminating the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
I am a little surprised that not all of the column values are enclosed in quotes, but even this seems reasonable when I assume that Excel only adds text qualifiers when special characters like a comma or a double quote character exist in the data.
Now I try to use SQL Server to import the CSV file. Note that I specify a double quote character as the text qualifier character
and a comma character as the column delimiter character. However, note that SSIS imports column 3 incorrectly, i.e., it does not translate the two consecutive double quote characters into a single occurrence of a double quote character.
What do I have to do to get Excel and SSIS to get along?
Generally people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.
I find that if I modify the file from this
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
...to this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"
i.e., replacing the two consecutive quotes in column C's value with a single one, the data is loaded properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C is not terminating that column value? Is it because the following characters are not a comma column delimiter or a row delimiter (CRLF)? And why does Excel export it this way?
According to Wikipedia, here are a couple of traits of a CSV file:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file that could contain ANY special characters used as column delimiters, text delimiters or row delimiters in the data? There's no reason that it can't work using the approach specified in Wikipedia, which is what I thought the old MS DTS packages used to do...
Update:
If I use Notepad to change the input file to
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"
Excel reads it just fine
but SSIS returns
The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Conclusion:
Just like the error message says in your update...
The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Confirmed bug in Microsoft Connect. I encourage everyone reading this to click on this aforementioned link and place your vote to have them fix this stinker. This is in the top 10 of the most egregious bugs I have encountered.
Do you need to use a comma delimiter?
I used a pipe delimiter with no text qualifier and it worked fine. Here is my output from the text file.
1|ABC|AB"C|D,E|F|03|3,2
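
The pipe-delimited line above could be produced from the Excel-style CSV with a small conversion step. This is only a sketch in Python, not something SSIS or Excel provides. The function name csv_to_pipe is hypothetical, and the approach only works when no field contains a pipe or a line break, so the code refuses such input instead of writing a broken file.

```python
import csv
import io


def csv_to_pipe(csv_text: str, delimiter: str = "|") -> str:
    """Convert quoted, comma-separated text to unqualified delimited text."""
    # The csv reader undoes the quoting: "AB""C" becomes AB"C.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    lines = []
    for row in reader:
        for field in row:
            # Without a text qualifier these characters would corrupt the output.
            if delimiter in field or "\n" in field or "\r" in field:
                raise ValueError(f"field {field!r} cannot be written unqualified")
        lines.append(delimiter.join(row))
    return "\r\n".join(lines) + "\r\n"


source = (
    "Col1,Col2,Col3,Col4,Col5,Col6,Col7\r\n"
    '1,ABC,"AB""C","D,E",F,03,"3,2"\r\n'
)
print(csv_to_pipe(source))
```

The trade-off is the one raised in the question: a rarer delimiter avoids the text qualifier problem only as long as the delimiter never appears in the data.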
You have 3 options in my opinion.
Read the data into a stage table.
Run any update queries you need on the columns
Now select your data from the stage table and output it to a flat file.
OR
Use pipes as your delimiters.
OR
Do all of this in a C# application and build it in code.
You could send the row to a script in SSIS and parse and build the file you want there as well.
Using text qualifiers and "character" delimited fields is problematic for sure.
Have Fun!