Import CSV file - SSIS

I need to pull data from a CSV file into a SQL Server table. Which control flow task should I use? Is it Flat File? What is the correct method to pull the data?
The problem is I have used the Flat File task for pulling in the CSV file. But the CSV file I have contains a heading as the first row, the column names on the third row, and data starting from the fifth row.
Another problem is that the column names are repeated after every 1000 data rows, i.e., the header appears on more than one row throughout the file. Is it possible to pull this data? If so, how?

While Valentino's suggestion should work, I suggest that you first work with the provider of the file to get them to supply the data in a better format. When we get files like this we almost always push back and ask for properly formatted data, and we get it about 90% of the time. It will save you work if they fix their own dreck. In our case, the customers providing the data are paying for our programming services, and once they understand how substantially a bad format increases their cost, they are usually more than willing to accommodate our needs.

I believe you'll first have to transform your file into a proper CSV file so that the SSIS Flat File Source component (Data Flow) can read it. If the source system cannot produce a real CSV file, we usually create custom .NET applications for the cleanup/conversion task.
An Execute Process task (Control Flow) that executes the custom app can then be called prior to the Data Flow.
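The cleanup step described above can be written in any language; the answer mentions custom .NET applications, but as an illustration here is a minimal Python sketch, assuming the layout described in the question (title on line 1, column names on line 3, data from line 5, header rows repeating inside the data). The function name and paths are placeholders:

```python
import csv

def clean_csv(src_path, dst_path):
    """Rewrite a messy export as a plain CSV: keep the column-name
    row (line 3), drop the title and blank lines, and drop the
    header rows that repeat inside the data."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        rows = list(csv.reader(src))
        header = rows[2]              # column names are on the third line
        writer = csv.writer(dst)
        writer.writerow(header)
        for row in rows[4:]:          # data starts on the fifth line
            if row and row != header: # skip blanks and repeated headers
                writer.writerow(row)
```

The resulting file has one header row followed by data rows only, which the Flat File Source can read directly.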

Related

SSIS empty Excel columns causing error

Using Microsoft Visual Studio Community 2015.
Goal of the project:
- create a "*\temp\email" directory
- start a program to extract all emails that include .xls attachments into the previously created folder
- use a Foreach Loop to cycle through each file in the folder, process it, and shift it to a SQL table.
The problem I am running into is caused by either a blank Excel document (which is occasionally sent from a remote location) or by some of the original .xls reports containing only 5 columns instead of the 6 I have mapped. Is there any way to separate the files that have the correct columns from those that do not match?
** As long as these two problems do not occur, I can run the SSIS package and everything runs without issue.
Control Flow:
File System Task (creates directory) ---> Execute Process Task (xls extraction) --> Foreach Loop (Data Flow Task "email2Sql")
Data Flow:
Excel Source (uses an expression on ExcelFilePath, @[User::filepath]), DelayValidation == true
(The columns are initially set to F1-F6 and are mapped to, for example, a, b, c, d, e, f. The older files that get mixed in only include a, b, c, d, e.) This is where I want to be able to separate the .xls files.
Conditional Split transformation (the column names are not in row 1; this helps remove "null" values)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but for the first post I tried to include anything that I thought may be relevant.
There are some tools out there that would allow you to open the Excel doc and read it. However, I think the simplest thing to do is to use SSIS out of the box:
1 - add a File System Task after the data flow that reads the file.
2 - make the precedence constraint from the data flow to the File System Task "Failure". This will cause the task to fire only when the data flow task fails.
3 - set the File System Task to move the "bad" files to another folder.
This will allow you to loop through all the files and move the failed ones. Ultimately, the package will end in failure. If you don't want that behavior you can change the ForceExecutionResult property to be success. However, it might be good to know that there were problems with some files so that they can be addressed.
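The steps above are pure SSIS, but the same "move the bad files aside" idea can be sketched outside SSIS as a pre-check. This hypothetical Python version (shown with CSV files for simplicity, since reading .xls requires a third-party library) moves any file whose first row does not have the expected six columns into a separate folder:

```python
import csv
import shutil
from pathlib import Path

EXPECTED_COLUMNS = 6  # the package maps columns F1-F6

def separate_bad_files(src_dir, bad_dir):
    """Move files whose first row does not have the expected number
    of columns into a 'bad' folder, mirroring the SSIS failure-path
    approach. Empty files are treated as bad as well."""
    bad = Path(bad_dir)
    bad.mkdir(exist_ok=True)
    for path in list(Path(src_dir).glob("*.csv")):
        with open(path, newline="") as f:
            first = next(csv.reader(f), None)
        if first is None or len(first) != EXPECTED_COLUMNS:
            shutil.move(str(path), str(bad / path.name))
```

After the pre-check runs, the Foreach Loop only ever sees files with the expected column layout.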

How to do File System Task in SSIS depending on Result of Data Flow

I'm writing a (what I thought to be a) simple SSIS package to import data from a CSV file into a SQL table.
On the Control Flow task I have a Data Flow Task. In that Data Flow Task I have
a Flat File Source "step",
followed by a Data Conversion "step",
followed by an OLE DB Destination "step".
What I want to do is to move the source CSV file to a "Completed" folder or to a "Failed" folder based on the results of the Data Flow Task.
I see that I can't add a File System step inside the Data Flow Task, but I have to do it in the Control Flow tab.
My question is: how do I do a simple thing like assigning a value to a variable (I saw how to create variables and assign them values in the bottom pane of Data Tools (2012)) depending on whether the "step" succeeds or fails?
Thanks!
(You can tell by my question that I'm an SSIS rookie - and don't assume I can write a C# script, please)
I have used VB or C# scripts to accomplish this myself. Since you do not want to use scripts, I would recommend using different paths for the project flow: have your Success path lead to moving the file to "Completed" and your Failure path lead to moving the file to "Failed". This keeps it simple and accomplishes what you are looking for.

Adding static files to Talend jobs

I'm using Talend Open Studio for Big Data and I have a job where I use tFileInputDelimited to load a CSV file and use it as a lookup with a tMap.
Currently the file is loaded from the disk using an absolute path (C:\work\jobs\lookup.csv) and everything works fine locally.
The issue is that when I deploy the job, it obviously doesn't take the lookup.csv file with it.
Which begs a question:
Is there any way to "bundle" this file (lookup.csv) into the job so I can later deploy them together?
With static data such as this, your best bet is to hard-code the data into the job using a tFixedFlowInput instead.
As an example, if you want to use a list of country names with their ISO2 and ISO3 codes, you might have these in a CSV that you'd normally access with a tFileInputDelimited. However, to save bundling this CSV with every build (which could be done with Ant/Maven), you can just hard-code this data into a tFixedFlowInput.
You then just need to make sure your schema is set up as the same as your delimited file would have been (so in this case we have 3 columns: Country_Name, ISO2 and ISO3).

SSIS Excel File Null Data Microsoft BI

I'm new to SSIS. When I try to load data from an Excel file and there is another data flow task in the same package, it just fills the table with null data, e.g., dim_Alarm(null, null, null, null). However, when I add a new package where the data flow task is alone, the data is loaded.
Look at the Connection Manager used by the Excel Source in the data flow that is returning null data. There is probably some difference - perhaps a typo? - between the one that returns null data and the one that loads the data from the file.
It is unlikely that the presence or absence of the other data flows is causing this problem, unless they are hitting the same Excel file, or they are hitting the same database table dim_Alarm. It is much more likely that there is some small difference between the data flow that loads nulls and the data flow that works (in the empty package).
You can also add a Data Viewer to the data flow that isn't behaving as you expect. The Data Viewer goes on one of the arrows between transformations in the data flow. When you run the package in BIDS, the Data Viewer will show you the data that flows through that point. If the data is missing, you may be able to see where it got lost. Is there data coming out of the Excel Source, but after the next transformation there is no more data? Then that is where the problem is.

Iterate Through Rows of CSV File and Assign Particular Row Value to Package Variable

I am currently using SSIS 2008 and am fairly new to it. I have a programming background with some Java, VBA, and VB.NET.
I have a connection to a csv file that contains a list of URLs.
There are about a thousand rows in the file, and for each row I want to assign the URL to a package variable that will be used to see whether the most current link has already been downloaded and updated or not.
I've set up a Foreach Loop Container that is intended to loop through each row of the CSV file.
However, I cannot figure out how to "look at" each row. Once I can do that I know it will be no problem to assign the URL to the variable but I am stuck mid-way. Does anyone have any suggestions?
You want to do something to each row from a given source. That's usually a data-flow type of activity. Drop a Data Flow Task onto your Control Flow. Inside that data flow, add a Flat File Source. In the Flat File Connection Manager, click New and fill out the details for your file. I assume it's just one data element (the URL) per line. Click OK through the dialogs and you should have a working data source.
Great, now all you need to do is that "something" to the data coming in, which in your case is "see if the most current link has already been downloaded and updated or not." I'm not sure exactly what that translates to, but whatever you attach (Lookup task, Script task, etc.) to the output of the Flat File Source will perform that operation for every row flowing through it.
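For illustration only, the per-row "something" the data flow performs can be sketched in plain Python. Here the function name and the "already downloaded" set are placeholders for whatever check the package actually does; the sketch assumes one URL per line, as above:

```python
import csv

def urls_to_process(csv_path, already_downloaded):
    """Yield each URL from a single-column CSV that is not yet in
    the already_downloaded set -- the per-row check the data flow
    would perform for every row flowing through it."""
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if row and row[0] not in already_downloaded:
                yield row[0]
```

Each yielded URL corresponds to one iteration's worth of work; in SSIS, the same filtering would live in whatever Lookup or Script component you attach to the Flat File Source output.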