Loading a comma-separated .csv file into a database - sql-server-2008

I was trying to load a .csv file into my database. It is a comma-delimited file, and one of the columns contains a comma (,) inside the data, for example Texas,Houston. Can someone help me get rid of the comma in between? The package I created treats the value after the comma as a new column, which is wrong. Can any of you help me with this? I get the error in the Flat File Source itself. I thought of using a Derived Column, but the package fails at the source before it gets that far.

Well, some "comma"-delimited files use ,"something or other", when the value is a string and only ,numeric_value, when it is a number. If your file is like this, you can preprocess it: change ," to some rare character, and ", to another rare character, then replace any , that occurs between the two rare characters. Or you can count the commas in each line, and if there are more than the expected number of delimiters, process those exceptions manually.
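The second suggestion (count the commas and set aside lines that have the wrong number) can be sketched in Python as follows. The function name and the expected column count are illustrative assumptions; the sketch only flags suspect lines, it does not repair them, and it also flags lines with too few commas.

```python
def split_good_and_bad(lines, expected_columns):
    """Separate lines whose comma count matches the expected number of columns.

    A line with N columns should contain exactly N - 1 commas; any other count
    (too many or too few) is set aside for manual processing. Quoting is ignored.
    """
    good, bad = [], []
    for line_number, line in enumerate(lines, start=1):
        if line.count(",") == expected_columns - 1:
            good.append(line)
        else:
            bad.append((line_number, line))
    return good, bad


rows = [
    "1,Dallas,100",
    "2,Texas,Houston,200",  # extra comma inside the city value
    "3,Austin,300",
]
good, bad = split_good_and_bad(rows, expected_columns=3)
print(good)  # ['1,Dallas,100', '3,Austin,300']
print(bad)   # [(2, '2,Texas,Houston,200')]
```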

Related

structure in getMetadata activity for csv file dataset ignores an extra comma in the file's first row

I am using a reference CSV file with just the correct number and name of columns and want to compare its structure with that of incoming CSV files before proceeding to use Copy Data to import the incoming CSV data into Azure SQL. Files arriving in blob storage trigger this pipeline.
The need to validate the structure has arisen due to random files arriving with a trailing comma in the header row which causes a failure in the copy data pipeline as it sees the trailing comma as an extra column.
I have set up a getMetadata for both the reference file & the incoming files. Using an If Condition, I compare schemas.
The problem I have is that the output of getMetadata is ignoring the trailing comma.
I have tried both the 'column count' and 'structure' arguments. The problem is the same either way: getMetadata does not see the trailing comma as an issue.
Any help is appreciated.
I tried with extra commas in the header of a CSV file. It is not ignoring them; it reads those extra commas as columns too.
Please check the screenshots below.
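Outside Data Factory, the structural check itself is simple to express. The Python sketch below is not a getMetadata configuration; it only illustrates the comparison, using hypothetical file contents. It parses the header row of a reference file and of an incoming file, so a trailing comma shows up as one extra, empty column name and the headers no longer match.

```python
import csv
import io

def header_matches(reference_csv, incoming_csv):
    """Return (ok, reference_columns, incoming_columns) for two CSV texts.

    A trailing comma in the header parses as an extra, empty column name,
    so it makes the comparison fail instead of being silently ignored.
    """
    reference = next(csv.reader(io.StringIO(reference_csv)))
    incoming = next(csv.reader(io.StringIO(incoming_csv)))
    return reference == incoming, reference, incoming


ok, ref, inc = header_matches("Id,Name,City\n", "Id,Name,City,\n1,Ann,Leeds,\n")
print(ok)   # False
print(inc)  # ['Id', 'Name', 'City', '']
```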

csv data with comma values throws error while processing the file through the BizTalk flatfile Disassembler

I'm going to pick up a CSV file in BizTalk, and after some processing I want to update two or more different systems with it.
To read the CSV file, I'm using the default Flat File Disassembler to break it up and construct XML with the help of a generated schema. This works with consistent data; however, if the data contains a comma (other than the delimiters), BizTalk fails!
Is there any other way to do this without using a custom pipeline component?
I'm expecting a simple configuration setting within the Flat File Disassembler component!
So, here's the deal. BizTalk is not failing. Well, it is, but that is the expected and correct behavior.
What you have is an invalid CSV file. The CSV specification disallows the comma in field data unless a wrap character is used. Either way, both are reserved characters.
To accept the comma in field data, you must choose a wrap character and set that in the Wrap Character property in the Flat File Schema.
This is valid:
1/1/01,"Smith, John", $5000
This is not:
1/1/01,Smith, John, $5000
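Python's standard csv module applies the same convention, which makes the difference between the two lines easy to see: with the double quote as the wrap character, the valid line yields three fields and the invalid one yields four. This is only an illustration of the rule, not BizTalk's parser.

```python
import csv

valid = '1/1/01,"Smith, John", $5000'
invalid = '1/1/01,Smith, John, $5000'

# csv.reader accepts any iterable of lines; '"' is its default wrap (quote) character.
print(next(csv.reader([valid])))    # ['1/1/01', 'Smith, John', ' $5000']
print(next(csv.reader([invalid])))  # ['1/1/01', 'Smith', ' John', ' $5000']
```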
Since your schema definition has ',' as the delimiter, the Flat File Disassembler will treat data containing a comma as two fields and will fail because of the mismatch in columns.
You have a few options:
Either add a new field to the schema, if you know the , in the data will only ever be present in one particular field.
Or change the delimiter in the flat file from , to | (pipe) or some other character, so that the data does not conflict with the delimiter.
Or, as you mentioned, manipulate the flat file in a custom pipeline component, which should be a last resort if the two options above are not feasible.
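If the comma can only ever appear in one known field, the first option amounts to accepting one extra column and gluing the two pieces back together afterwards. A Python sketch of that repair step, assuming the problem field sits at a known zero-based index (the name `rejoin_field` is illustrative, not a BizTalk feature):

```python
def rejoin_field(fields, index, expected_count):
    """Rejoin a value that an unquoted comma split into two fields.

    Assumes the stray comma can only occur in the field at `index`.
    Rows that already have the expected number of fields are returned unchanged.
    """
    extra = len(fields) - expected_count
    if extra == 0:
        return list(fields)
    if extra != 1:
        raise ValueError(f"expected {expected_count} fields, got {len(fields)}")
    merged = fields[index] + "," + fields[index + 1]
    return fields[:index] + [merged] + fields[index + 2:]


print(rejoin_field("1/1/01,Smith, John, $5000".split(","), 1, 3))
# ['1/1/01', 'Smith, John', ' $5000']
```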

How can I quickly reformat a CSV file into SQL format in Vim?

I have a CSV file that I need to turn into a SQL file for ingestion into MySQL. I am looking for a way to add the text delimiters (single quotes) to the text, but not to the numbers, booleans, etc. I am finding it difficult because some of the text that I need to enclose in single quotes contains commas itself, which makes it hard to key on the commas for search and replace. Here is an example line I am working with:
1239,1998-08-26,'Severe Storm(s)','Texas,Val Verde,"DEL RIO, PARKS",'No',25,"412,007.74"
This is a FEMA data file with 131246 lines that I got from data.gov and am trying to get into a MySQL database. As you can see, I need to insert a single quote after Texas and before Val Verde, so I tried:
s/,/','/3
But that only replaced the first occurrence of the comma on the first three lines of the file. Once I get past that, I will need to find a way to deal with "DEL RIO, PARKS", as that has a comma that I do not want to place a single quote around.
So, is there a "nice" way to manipulate this data to get it from plain CSV to a proper SQL format?
Thanks
CSV files are notoriously dicey to parse. Different programs export CSV in different ways, possibly including oddities like new lines embedded within a quoted field or different ways of representing quotes within a quoted field. You're better off using a tool specifically suited to parsing CSV: Perl, Python, Ruby, and Java all have CSV parsing libraries, or there are command-line programs such as csvtool or ffe.
If you use a scripting language's CSV library, you may also be able to leverage the language's SQL import as well. That's overkill for a one-off, but if you're importing a lot of data this way, or if you're transforming data, it may be worthwhile.
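As an illustration of the library route, Python's csv module can do the quote-aware splitting, and a few more lines turn each row into a MySQL INSERT statement. The table name `disasters` is hypothetical, and the rule for what counts as a number (a plain integer or decimal, optionally negative) is an assumption: dates, booleans such as No, and a value like "412,007.74" (which contains a comma) are all quoted as text. The sample row is an assumed reconstruction of the question's line before any hand-editing.

```python
import csv
import io
import re

# Assumption: only plain integers and decimals count as numbers.
NUMBER = re.compile(r"-?\d+(\.\d+)?")

def sql_literal(value):
    """Leave plain numbers bare; wrap everything else in single quotes."""
    if NUMBER.fullmatch(value):
        return value
    # Standard SQL escapes an embedded single quote by doubling it. MySQL also
    # treats backslashes as escapes by default, which this sketch ignores.
    return "'" + value.replace("'", "''") + "'"

def csv_to_inserts(csv_text, table):
    """Turn each CSV record into one INSERT statement for `table`."""
    statements = []
    for row in csv.reader(io.StringIO(csv_text)):
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {table} VALUES ({values});")
    return statements


# Assumed original form of the question's row, before any hand-editing.
line = '1239,1998-08-26,Severe Storm(s),Texas,Val Verde,"DEL RIO, PARKS",No,25,"412,007.74"\n'
for statement in csv_to_inserts(line, "disasters"):
    print(statement)
```

For that row it prints one statement in which 1239 and 25 stay bare, while the date, the names, No, and 412,007.74 are single-quoted, and DEL RIO, PARKS stays a single value.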
I think that I would also want to do some troubleshooting to find out why the CSV import into MySQL failed.
I would take an approach like this:
:%s/,\("[^"]*"\|[^,"]*\)/,'\1'/g
:%s/^\("[^"]*"\|[^,"]*\)/'\1'/g
In words: look for a comma followed by either a double-quoted run of characters or (\|) a run of characters containing no comma or double quote, and wrap that run in single quotes.
Next, do the same for the first column in each row, anchored at the start of the line (^) instead of after a comma.
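To see exactly what the two substitutions produce, here they are transcribed into Python's re syntax (Vim's \( \) and \| become ( ) and |; the function name is illustrative). Note that every field gets single quotes, numbers included, and a double-quoted field keeps its double quotes inside the new single quotes.

```python
import re

# One field: either a double-quoted run, or a run with no comma or double quote.
FIELD = r'("[^"]*"|[^,"]*)'

def vim_style_quote(line):
    # :%s/,\("[^"]*"\|[^,"]*\)/,'\1'/g  -> every field that follows a comma
    line = re.sub("," + FIELD, r",'\1'", line)
    # :%s/^\("[^"]*"\|[^,"]*\)/'\1'/g   -> the first field on the line
    return re.sub("^" + FIELD, r"'\1'", line)


print(vim_style_quote('1239,Texas,"DEL RIO, PARKS",25'))
# '1239','Texas','"DEL RIO, PARKS"','25'
```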
Try the csv plugin. It can convert the data into other formats, and its help includes an example of how to convert the data for importing into a database.
Just to bring this to a close, I ended up using @Eric Andres's idea, which was MySQL's LOAD DATA option:
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE MYTABLE FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
The initial .csv file still took a little massaging, but not as much as if I had done it by hand.
When I commented that LOAD DATA had truncated my file, I was incorrect. I was treating the file as a typical .sql file and assumed the "ID" column I had added would auto-increment. This turned out not to be the case. I had to write a quick script that prepended an ID to the front of each line. After that, the LOAD DATA command worked for all lines in my file. In other words, all of the data has to be in the file before the load, or the load will not work.
Thanks again to all who replied, and to @Eric Andres for his idea, which I ultimately used.
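The poster's ID-prepending script isn't shown. A minimal Python version might look like this, assuming IDs simply count up from 1 in file order, that the file has no header row, and that no record spans multiple lines (the function name and file names are placeholders). Opening both files with newline="" keeps the \r\n line endings that the LOAD DATA statement expects.

```python
def prepend_ids(lines, start=1):
    """Yield each line with a sequential ID and a comma in front of it."""
    for number, line in enumerate(lines, start=start):
        yield f"{number},{line}"


# Typical use on a file (paths are placeholders):
# with open("in.csv", newline="") as src, open("out.csv", "w", newline="") as dst:
#     dst.writelines(prepend_ids(src))

print(list(prepend_ids(["a,b\r\n", "c,d\r\n"])))
# ['1,a,b\r\n', '2,c,d\r\n']
```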

Using Excel to create a CSV file with special characters and then Importing it into a db using SSIS

Take this XLS file
I then save this XLS file as CSV and then open it up with a text editor. This is what I see:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
I see that the double quote character in column C was stored as AB""C: the column value was enclosed in quotes, and the double quote character in the data was replaced with 2 double quote characters to indicate that the quote occurs within the data and does not terminate the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
I am a little surprised that not all of the column values are enclosed in quotes, but even this seems reasonable if I assume that Excel only adds text qualifiers when special characters like a comma or a double quote character exist in the data.
Now I try to use SQL Server to import the CSV file. Note that I specify a double quote character as the Text Qualifier character
and a comma as the Column delimiter character. However, note that SSIS imports column 3 incorrectly, i.e., it does not translate the two consecutive double quote characters into a single occurrence of a double quote character.
What do I have to do to get Excel and SSIS to get along?
Generally, people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.
I find that if I modify the file from this
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
...to this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"
i.e., replacing the two consecutive quotes in column C's value with a single one, the data loads properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C is not terminating that column value? Is it because the following character is not a comma column delimiter or a row delimiter (CRLF)? And why does Excel export it this way?
According to Wikipedia, here are a couple of traits of a CSV file:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file that could contain ANY of the special characters used as column delimiters, text delimiters, or row delimiters in the data? There's no reason that it can't work using the approach specified in Wikipedia, which is what I thought the old MS DTS packages used to do...
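The two rules quoted from Wikipedia are what a conforming parser implements. Python's csv module, used here only as a reference point and not as a fix for SSIS, reads both the Wikipedia sample and the Excel export from the start of the question as intended, collapsing each doubled quote into one:

```python
import csv

wikipedia = '"aaa","b""bb","ccc"'
excel_row = '1,ABC,"AB""C","D,E",F,03,"3,2"'

print(next(csv.reader([wikipedia])))  # ['aaa', 'b"bb', 'ccc']
print(next(csv.reader([excel_row])))  # ['1', 'ABC', 'AB"C', 'D,E', 'F', '03', '3,2']
```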
Update:
If I use Notepad to change the input file to
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"
Excel reads it just fine
but SSIS returns
The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Conclusion:
Just like the error message says in your update...
The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
This is a confirmed bug on Microsoft Connect. I encourage everyone reading this to click on the aforementioned link and vote to have them fix this stinker. It is in the top 10 of the most egregious bugs I have encountered.
Do you need to use a comma delimiter?
I used a pipe delimiter with no text qualifier and it worked fine. Here is my output from the text file:
1|ABC|AB"C|D,E|F|03|3,2
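The same pipe-delimited output can be produced from Excel's CSV with a short Python script: parse with a CSV-aware reader, then join with pipes and no text qualifier. This sketch (function name illustrative) assumes no value contains a pipe or a line break, and refuses rows that break that assumption rather than writing a corrupt file.

```python
import csv
import io

def comma_csv_to_pipe(csv_text):
    """Re-emit quoted, comma-delimited CSV as pipe-delimited text with no text qualifier."""
    out_lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        for field in row:
            # Without a qualifier there is no way to protect these characters.
            if "|" in field or "\n" in field or "\r" in field:
                raise ValueError(f"cannot write without a text qualifier: {field!r}")
        out_lines.append("|".join(row))
    return "\n".join(out_lines)


print(comma_csv_to_pipe('1,ABC,"AB""C","D,E",F,03,"3,2"\n'))
# 1|ABC|AB"C|D,E|F|03|3,2
```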
You have 3 options in my opinion.
Read the data into a stage table.
Run any update queries you need on the columns
Now select your data from the stage table and output it to a flat file.
OR
Use pipes as your delimiters.
OR
Do all of this in a C# application and build it in code.
You could send the row to a script in SSIS and parse and build the file you want there as well.
Using text qualifiers and "character" delimited fields is problematic for sure.
Have Fun!

How to deal with a string with a comma in it from a CSV when we have to read the data using LoadRunner?

When I use LoadRunner, it can read data from a CSV file. As we know, a CSV file is separated by commas.
The question is: if a parameter value in the CSV contains a comma itself, the string will be split into several segments. That is not what I want.
How can we get the original data with the comma in it?
When data has a comma, use an escape character to store the data in the parameter.
For example, if the name is 'Smith, John', it can be stored as Smith\, John in the Loadrunner data file.
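That LoadRunner's data file honors Smith\, John exactly this way is taken from this answer and not verified here. The general backslash-escape convention it relies on is easy to demonstrate with Python's csv module, where `escapechar` makes the next character literal:

```python
import csv

line = r"Smith\, John,42"
print(next(csv.reader([line], escapechar="\\")))  # ['Smith, John', '42']
```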
When you save a file in Excel that has commas in the actual cell data, the whole cell value is enclosed in a pair of " characters. It also seems that cells with a space in them are enclosed in " characters.
Example
ColA,ColB,"ColC with a , inside",ColD,ColE
More info on CSV file format: http://www.parse-o-matic.com/parse/pskb/CSV-File-Format.htm
The answer to the question is that perhaps the easiest way to deal with the , separator is to change it to a ; character. Many CSV readers also accept ; as a separator.
Then the example would be:
ColA;ColB;"ColC with a , inside";ColD;ColE
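A CSV-aware reader only needs to be told which separator is in use. With Python's csv module as an example, both versions of the line give the same five values:

```python
import csv

comma_line = 'ColA,ColB,"ColC with a , inside",ColD,ColE'
semicolon_line = 'ColA;ColB;"ColC with a , inside";ColD;ColE'

expected = ["ColA", "ColB", "ColC with a , inside", "ColD", "ColE"]
print(next(csv.reader([comma_line])) == expected)                     # True
print(next(csv.reader([semicolon_line], delimiter=";")) == expected)  # True
```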
Maybe the right way is to use C functions to read data from the file (for example fopen/fread)? Once you have read it, you will be able to use strchr to find the first quote character and the second quote character. Everything in that interval is one value, and it doesn't matter if there is a comma inside.
For documentation about fopen, fread, and strchr, you could refer to the HP or C function references.
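The idea in this answer, that everything between an opening and a closing quote is one value regardless of commas, can be sketched in Python as a small state machine. Instead of calling strchr to jump to the next quote, it flips an "inside quotes" flag at each quote character, which has the same effect. The function name is illustrative, and it assumes values never contain escaped or doubled quotes (a value like "AB""C" would come out as ABC).

```python
def split_quoted(line, sep=","):
    """Split on sep, treating everything between a pair of double quotes as one value.

    Assumes quoted values never contain an escaped or doubled quote.
    """
    fields, current, in_quotes = [], [], False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # opening or closing quote; not kept in the value
        elif ch == sep and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_quoted('ColA,ColB,"ColC with a , inside",ColD,ColE'))
# ['ColA', 'ColB', 'ColC with a , inside', 'ColD', 'ColE']
```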
Hope this will help you.
Assuming you are reading from a data file for the parameters, just use a custom separator. Comma is the default, but you can define it to be whatever you want. Whenever I have a comma in the variable data, I tend to use a pipe symbol, '|', to distinguish the columns of data in the data file.
Examine your parameter dialog carefully and you will see where to make the change.