I am trying to use the SSIS Flat File Destination, but cannot come up with a workaround for getting the output file to have two columns with the same name.
I have a requirement for the output file to have the column headers:
first1, last1, email, shortname, email
Whenever I try to map the source data, I get error messages like "This column name already exists" and "There is more than one data source column with the name 'email'".
What's the best workaround?
Thanks
Assuming I understand the problem correctly, you need to have the same column name in the output file twice. It doesn't matter whether it's the same data or not; just the header needs to be repeated.
It's a little hokey, but in your connection manager, uncheck "Column Names in the first data row" and redefine the columns as email and email1. This will allow you to connect the columns to the right places in the file.
In your Flat File Destination, you have the ability to define Header row(s). It's very limited; you can't put useful things in there like dynamic checksums, but in your case, paste in first1, last1, email, shortname, email and run the package. Data will be extracted to the correct columns, and a header row will be prepended to the file with all the "right" field names.
Two downsides to this approach. First, the connection manager becomes output-only, as it would otherwise attempt to read the manual header row in from the file as data. Second, any changes to the layout will not be kept in sync with the manual header row.
I have a flat file where rows have IDs in the form of a GUID. What I need is to redirect the error output to a table that will hold the error row ID, error column, and error code. The problem is that I can map only the "Flat File Source Error Output Column", which appears to be some sort of concatenation of the other columns. Is there a way I can get the ID column value of the error row? The best solution I could find is to add a counter that gives the row number, but that's not exactly what I need, as the IDs are strings in my case.
Nope. You get 3* columns from the Flat File Source component's error output: Flat File Source Error Output Column, ErrorCode, and ErrorColumn.
A Source component defines the columns that all row buffers "downstream" of that point will contain. It is responsible for adding rows and then filling the columns in that new row buffer.
The Flat File Source component has a contract that describes how it should consume the source data - this many columns, this delimiter (or this many characters) etc.
What happens, though, when something overflows a length, the data type is incompatible, or not all of the delimiters are present? The design decision is either to put incomplete rows into the pipeline (but then how do you determine which columns get populated - fill left to right? what about type mismatches?) or to treat the row as an error. Normally this blows up the data flow, but if you add an Error output path, you can see which row failed.
And the row is the atomic unit the flat file is using as input.
Read line -> Parse -> Write to Output [or Error] buffer {loop}
You could then use a Script Component to try to parse the GUID out of "Flat File Source Error Output Column", but then you have to hope the value is actually in the row. It could be that a column has an embedded delimiter that wasn't escaped, someone transferred the file using the wrong encoding/line endings, etc.
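For what it's worth, a rough sketch of that parsing (all of these are assumptions: a Script Component attached to the error output, the GUID being the first comma-delimited field of the failed row, ASCII encoding, and a hypothetical output column named ErrorRowId added to the component; this goes inside the generated ScriptMain class):

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // "Flat File Source Error Output Column" is DT_TEXT, so it arrives as a blob.
    byte[] raw = Row.FlatFileSourceErrorOutputColumn.GetBlobData(
        0, (int)Row.FlatFileSourceErrorOutputColumn.Length);
    string line = System.Text.Encoding.ASCII.GetString(raw);

    // Assume the GUID is the first comma-delimited field of the failed row.
    string firstField = line.Split(',')[0];

    // Guard against rows too mangled to contain a usable GUID.
    Guid id;
    Row.ErrorRowId = Guid.TryParse(firstField, out id) ? id.ToString() : string.Empty;
}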
*The Flat File Source component does expose a property in the Advanced Editor's Component Properties tab, FileNameColumnName, and that too will show up in the Error output path, but that is the only source component I am aware of with this behaviour.
Despite searching both Google and the documentation I can't figure this out:
I have a CSV file that has a header line, like this:
ID,Name,Street,City
1,John Doe,Main Street,Some City
2,Jane Doe,Sideroad,Other City
Importing into FileMaker works well, except for two things:
It imports the header line as a data set, so I get one row that has an ID of "ID", a Name of "Name", etc.
It assigns the items to fields by order, including the default primary key, created date, etc. I have to manually re-assign them, which works but seems like work that could be avoided.
I would like it to understand that the header line is not a data set and that it could use the field names from the header line and match them to the field names in my FileMaker table.
How do I do that? Where is it explained?
When you import records, you have the option to select a record in the source file that contains field names (usually the first row). See #4 here.
Once you have done that, you will get the option to map the fields automatically by matching names.
If you're doing this periodically, it's best to script the action. A script will remember your choices, so you only need to do this once.
I'm working on a flow where I get CSV files. I want to put the records into different directories based on the first field in the CSV record.
For example, the CSV file would look like this:
country,firstname,lastname,ssn,mob_num
US,xxxx,xxxxx,xxxxx,xxxx
UK,xxxx,xxxxx,xxxxx,xxxx
US,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
I want to get the value of the first field, i.e. country, and put each record into the matching directory: US records go to the US directory, UK records go to the UK directory, and so on.
The flow that I have right now is:
GetFile ----> SplitText(line split count = 1 & header line count = 1) ----> ExtractText(line = (.+)) ----> PutFile(Directory = \tmp\data\${line:getDelimitedField(1)}). I need the header to be replicated across all the split files for a different purpose, so I keep it.
The thing is, the incoming CSV file gets split into multiple flow files with the header successfully. However, the regex I have given in the ExtractText processor evaluates against the split flow file's CSV header instead of the record, so instead of getting US or UK in the "line" attribute, I always get "country", and all the files go to \tmp\data\country. Help me resolve this.
I believe getDelimitedField only works off a single line and is likely not moving past the newline in your split file.
I would advocate a slightly different approach, in which you alter your ExtractText to find the country code through a regular expression and avoid the need to include the whole contents of the file as an attribute.
Using a regex of ^.*\n+(\w+) will match the header line, then capture the first run of word characters on the next line (the country code, up to the comma) into capture group 1 of the attribute you specify (e.g. country.1).
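For example, given a split flow file whose content is:

country,firstname,lastname,ssn,mob_num
US,xxxx,xxxxx,xxxxx,xxxx

the ^.* consumes the header line, \n+ consumes the line break, and (\w+) captures US, so an ExtractText dynamic property named country would produce a country.1 attribute of US. PutFile could then reference ${country.1} in its Directory property instead of the getDelimitedField call.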
I have created a template that should get the value you are looking for available at https://github.com/apiri/nifi-review-collateral/blob/master/stackoverflow/42022249/Extract_Country_From_Splits.xml
I have CSV files in a folder, each containing eid, ename, country fields; the file names are like test1_20120116_034512, test1_20120116_035512, test1_20120116_035812, etc. My requirement is to take the latest file based on the timestamp and modified date, which I have done. Now I want to import the extracted file name into the destination table.
My destination table contains fields like:
filepath, filename, eid, ename, country
I have posted about this before on this site and got an answer for extracting the file name; now I want to load the extracted file name into the destination table:
Import most recent csv file to sql server in ssis
My destination table should have output like:
C:/source test1_20120116_035812 1234 tester USA
In your Data Flow task, add a Derived Column transformation. The value of CurrentFile will be the fully qualified path to the file. As you only want the file name, I would use a REPLACE on that with the base folder and then strip the remaining slash. This does not strip the file extension, but you can add yet another call to REPLACE to substitute an empty string for it.
Derived Column Name: filename
Derived Column: <add as new column>
Expression: REPLACE(REPLACE(@[User::CurrentFile], @[User::RootFolder], ""), "\\", "")
The above expects it to look like
CurrentFile = "C:\source\test1_20120116_035812.csv"
RootFolder = "C:\source"
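If you also want to drop the extension, a sketch of the extra REPLACE mentioned above (the literal ".csv" is an assumption about your file names):

Expression: REPLACE(REPLACE(REPLACE(@[User::CurrentFile], @[User::RootFolder], ""), "\\", ""), ".csv", "")

which would yield test1_20120116_035812.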
Edit
I believe you've done something in your approach that I did not do. You should see a warning about possible truncation, but given the values discussed in this and the preceding question, I don't believe the 4k limit on expressions will be of concern.
(Screenshots: displaying the derived column configuration, and demonstrating that the derived column does work.)
I will give you a +1 for providing an approach I wasn't aware of, but you'll still need to add a derived column to match your provided format (base path name)
The full path is provided from the custom properties. Use the above REPLACE expression to remove the path info, except use the column [FileName] instead of @[User::CurrentFile].
I tried to get the file name through the procedure Billinkc has given, but it's throwing an error stating that the filename column failed because of a truncation error.
Anyhow, I tried a different approach to load the file name into the table.
Steps I have used:
1. Right-click on the Flat File Source and click Show Advanced Editor for Flat File Source.
2. Select the Component Properties tab.
3. In the Custom Properties section, there is a property FileNameColumnName.
I assigned a column name to that property, like FileNameColumnName ----> FileName. That's it; I am able to get the file name into my destination table.
We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order changes (don't ask!). Currently, if this happens, our packages fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
E.g., using a trivial example, the original file has the following content:
OurReference,Client,Amount
235,ClientA,20000.00
236,ClientB,30000.00
The output from the flat file source is :
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to :
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is :
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that copes with this problem - i.e. one that produces output based on the column names instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one of which parses each row as a single column. That one can read the first row, tell whether it's OK, and abort if not.
If you want to do the work, you can take it a step further and make that single-column connection manager the only one, and use a Script Component in your data flow to parse each row and assign the values to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so all the columns you need will have to be added to the Script Component's output. Whether they can be found and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. Columns which are missing you could default, or you could throw an exception.
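As a minimal sketch of that Script Component (assumptions: a single input column named Line holding the raw row, output columns OurReference, Client, and Amount already defined on the component, and a naive comma split with no embedded delimiters; this sits inside the generated ScriptMain class):

// using System.Collections.Generic; goes at the top of the file.

// Maps column names found in the header row to their position in the file.
private Dictionary<string, int> ordinals;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string[] fields = Row.Line.Split(',');

    if (ordinals == null)
    {
        // First row is the header: record where each named column actually lives.
        ordinals = new Dictionary<string, int>();
        for (int i = 0; i < fields.Length; i++)
        {
            ordinals[fields[i].Trim()] = i;
        }
        return; // the header row itself would need filtering downstream
    }

    // Populate outputs by name, so an inserted column merely shifts the lookup.
    // A missing name throws here, which matches the "throw an exception" option.
    Row.OurReference = fields[ordinals["OurReference"]];
    Row.Client = fields[ordinals["Client"]];
    Row.Amount = fields[ordinals["Amount"]];
}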
A final possibility is to use the SSIS object model to modify the package before running it, to alter the connection manager - or even to write the entire package dynamically using the object model, based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
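A minimal sketch of driving that from the runtime side (the package path and connection manager name are made-up placeholders; a real generator would also rewrite the flat file columns, which is considerably more involved):

using Microsoft.SqlServer.Dts.Runtime;

class PackagePatcher
{
    static void Main()
    {
        Application app = new Application();
        Package pkg = app.LoadPackage(@"C:\packages\ImportClients.dtsx", null);

        // Repoint the flat file connection manager before executing the package.
        pkg.Connections["FlatFileSource"].ConnectionString = @"C:\drop\clients.csv";

        DTSExecResult result = pkg.Execute();
        System.Console.WriteLine(result);
    }
}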
The best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate the data in Business Intelligence Development Studio itself.
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check "Column names in the first data row".
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string sRow = Row.Column0;
    string sManipulated = string.Empty;

    // Split the raw row on the delimiter and pad each field to a fixed width.
    string[] columns = sRow.Split(',');
    foreach (string column in columns)
    {
        sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
    }

    /* Note: for the sake of demonstration I am padding to 15 chars. */
    Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
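For example, with the sample rows above, 235,ClientA,20000.00 would come out as three 15-character fields: "235" followed by 12 spaces, "ClientA" followed by 8, and "20000.00" followed by 7.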
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need each field to be the same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, fields, and field sizes.
Use the file name to dynamically create the source and destination connection managers.
Use the field name and corresponding field size to decide the padding, as in the sketch below. I'm not sure if you need this much flexibility. If you have any questions, please respond.
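An illustrative sketch of that metadata-driven padding (the widths and the sample row are invented placeholders; in practice they would come from the control table and the file):

using System;
using System.Linq;

class PaddingDemo
{
    static void Main()
    {
        // Hypothetical widths, one per field, as loaded from the control table.
        int[] widths = { 10, 25, 15 };

        string row = "1234,tester,USA";
        string[] values = row.Split(',');

        // Pad each value to the width registered for its position.
        string fixedWidthRow = string.Concat(
            values.Select((value, i) => value.PadRight(widths[i], ' ')));

        Console.WriteLine(fixedWidthRow);
    }
}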