SSIS package to insert flat file data into two different tables

I want to insert flat file data into two different SQL tables. The usual fields coming from the flat file should be inserted into the regular table, but an additional field should be inserted into the other table on the basis of an indicator field.
The other issue is that the additional field cannot be inserted directly, because there is no column mapping for it.
e.g.:
    1234 056 Y Tushar
    5678 065 N
So 1234 056 should be inserted into the regular table, but the indicator Y tells us that Tushar should be inserted into the other table.
However, Tushar cannot be inserted into that table directly, as it does not have a column corresponding to 1234.
For indicator N, the row should simply be inserted into the base table as usual.
What I did was use a Conditional Split and then an OLE DB Command, but it is inserting multiple records into the table.

If you put a Multicast task right after your flat file source, you can create extra copies of your data set. You can then use one copy to insert into Regular Table, and put your Conditional Split on the second copy.
Your data flow would then look like this:
In my Flat File Source I defined four columns:
The Multicast doesn't need any configuration, and I assume the Regular Table destination isn't giving you trouble. So next, you'd create the indicator check with a Conditional Split task, checking for a value of Y.
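For instance, assuming the indicator column is named Indicator in your Flat File Source (yours may be named differently), the condition would be something like:

    Indicator == "Y"

Rows meeting this condition flow to the Other Table destination; the default output can simply be left unattached, since every row is already being written to Regular Table by the first Multicast copy.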
Then just map whichever available columns you want to insert into Other Table. I chose the second column (I called mine Seq) and the Name column. You may have these named differently.

Related

How to guess the schema in Mysqlinput on the fly in Talend

I've built a job that copies data from MySQL table a to MySQL table b.
The table columns are the same, except that sometimes a new column can be added to table a.
I want to copy all the columns from a to b, but only those that exist in table b. I was able to put a specific select statement in the query, listing the columns that exist in table b, like:
    select column1, column2, column3 ... from a
The issue is that if I add a new column to b that matches a, the Talend job schema in Mysqlinput has to be changed as well, because I work with the built-in schema type.
Is there a way to force the schema columns while the job is running?
If you are using a subscription version of Talend, you can use the dynamic column type. You can define a single column for your input of type "Dynamic" and map it to a column of the same type in your output component. This will dynamically get columns from table a and map them to the same columns in table b. Here's an example.
If you are using Talend Open Studio, things get a little trickier as Talend expects a list of columns for the input and output components that need to be defined at design time.
Here's a solution I put together to work around this limitation.
The idea is to list all of table a's columns that are present in table b, convert them to a comma-separated list of columns (in my example id,Theme,name) and store it in a global variable COLUMN_LIST. A second output of the tMap builds the same list of columns, but this time putting single quotes between the columns (so they can be used as parameters to the CONCAT function later) and adding single quotes at the beginning and the end, like so: "'", id,"','",Theme,"','",name,"'", storing the result in a global variable CONCAT_LIST.
In the next subjob, I query table a using the CONCAT function, giving it the list of columns to be concatenated (CONCAT_LIST), thus retrieving each record as a single column, like so: 'value1','value2', ...etc.
Then, at last, I execute an INSERT query against table b, specifying the list of columns given by the global variable COLUMN_LIST, and the values to be inserted as the single string resulting from the CONCAT function (row6.values).
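To make this concrete, here is roughly what the generated queries look like with the example columns id, Theme and name (a sketch; COLUMN_LIST, CONCAT_LIST and row6.values are the names used above, and the table names a and b are placeholders):

    -- Second subjob: read each record of table a as one pre-quoted string.
    -- This SELECT is built from the CONCAT_LIST global variable.
    SELECT CONCAT("'", id, "','", Theme, "','", name, "'") AS vals
    FROM a;
    -- Each row comes back as a single column: 'value1','value2','value3'

    -- Final step: insert into table b, using the COLUMN_LIST global variable
    -- for the column names and the concatenated string for the values.
    INSERT INTO b (id, Theme, name)
    VALUES ('value1','value2','value3');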
This solution is generic: if you replace the table names with context variables, you can use it to copy data from any MySQL table to another.

How to insert data coming from 2 different csv files into one table using the conditional split transformation?

I have 3 csv files: 1 teacher file and 2 student files. I have to insert the teacher data into one table, and the data for students who got more than 50 marks into another table from the 2 csv files. Please explain how to use the conditional split transformation on the 2 student files to put the data into one table.
Are you sure you want to use the Conditional Split? You need to combine the student flat files into one table, right? If so, what you want to use is a Merge Join transformation.
You can read more about how to use the Merge Join here.
Not sure if I have understood the question correctly. My assumptions:
Teacher data is moved from CSV to table 1 with no conditions.
Student files (CSV) contain only unique records.
Records where the student achieved a score greater than or equal to 50 are inserted into table 2.
If the above assumptions are correct, the simplest way will be to use a loop container to loop through the student files, and have one data flow which does as follows:
Reads the student file
Passes the rows to the Conditional Split
Writes to the destination table
The Conditional Split task allows one to configure conditions and outputs based on those conditions.
If the file contains a column called StudentScore, then in the Conditional Split the first condition should be set as in the attached screen. Please note that because StudentScore is set to a string in the source file, it has to be converted to an integer, hence the (DT_I4) cast; if it is set to be an integer in the source file, this conversion is redundant.
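The condition itself is along these lines (assuming the column is named StudentScore, as above):

    (DT_I4)StudentScore >= 50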
I have also given the output a name, StudentScore; this output will then be linked to the destination. I hope this helps.

SSIS - Reuse OLE DB source when matching a fact against a lookup table twice

I am pretty new to SSIS and BI in general, so first of all sorry if this is a newbie question.
I have my source data for the fact table in a CSV, so I want to match the IDs against the surrogate keys in lookup tables.
The data structure in the CSV is like this:
    ... userId, OriginStationId, DestinyStationId,..
What I am trying to accomplish is to match the data against my lookup table. So what I am doing is:
Reading Lookup data using OLE DB Source
Reading my csv file
Sorting both inputs by the same field
Doing a left join by Id, in order to get the SK
This way, if there is no match (i.e. I can't find the surrogate key), I can redirect that row to a rejected CSV and handle it later.
Something like this (sorry for the Spanish!):
I am doing this for each dimension, so I can handle each one with different error codes.
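In SQL terms, each of these match steps does roughly the following (a sketch; FactSource stands in for the CSV data, and the dimension and column names DimStation, StationId and StationSK are hypothetical):

    -- Match source ids against a dimension's surrogate keys.
    SELECT f.userId, f.OriginStationId, d.StationSK
    FROM FactSource f
    LEFT JOIN DimStation d
           ON d.StationId = f.OriginStationId;
    -- Rows where d.StationSK IS NULL found no match, and are the ones
    -- that get redirected to the rejected CSV.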
Since OriginStationId and DestinyStationId are two values from the same dimension (they both match against the same lookup table), I wanted to know if there's a way to avoid reading the data from that table twice (I mean, not using two OLE DB sources to read the same data from the same table).
I tried adding a second output to the Sort, but I am not allowed to. The same goes for adding another output from the OLE DB Source.
I see there's a "cache option"; is that the best way to go? (Although it would imply creating another OLE DB source anyway, right?)
The third option I thought of was joining by the two fields, but since there is only one field in the lookup table (the same field), I am getting an error when I try to map both columns from my CSV against the same column in my lookup table:
    There are columns missing with the sort order 2 to 2
What is the best way to go about this?
Or am I thinking about this incorrectly?
If something is not clear, let me know and I'll update my question.
Any time you wish you could have multiple outputs from a component that only allows one, all you have to do is follow that component with the Multicast component, whose sole purpose is to split a Data Flow stream into multiple outputs.
Gonzalo, I have just used this article on how to derive columns when building a data warehouse: How to Populate a Fact Table using SSIS (part 1).
Using this I built a simple package that reads a CSV file with two columns that are used to derive separate values from the same CodeTable. The CodeTable has two fields Id and Description.
The Data Flow has two "Lookup" tasks. The first one joins the attribute Lookup1 against the Description to derive its Id. The second joins the attribute Lookup2 against the Description to derive a different Id.
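In SQL terms, the two Lookups together are equivalent to something like this (a sketch; FactSource is a stand-in for the CSV data):

    -- Derive two different ids from the same CodeTable by joining on it twice.
    SELECT f.Lookup1,
           f.Lookup2,
           c1.Id AS CodeTableFirstId,
           c2.Id AS CodeTableSecondId
    FROM FactSource f
    LEFT JOIN CodeTable c1 ON c1.Description = f.Lookup1
    LEFT JOIN CodeTable c2 ON c2.Description = f.Lookup2;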
Here is the Data Flow:-
Note the "Data Conversion" was required to convert the string attributes from the CSV file into "Unicode string [DT_WSTR]" so they could be joined to the nvarchar(50) description attribute in the table.
Here is the Data Conversion:-
Here is the first Lookup (the second one joins "Copy of Lookup2" to the Description):-
Here is the Data Viewer output with the two derived Ids, CodeTableFirstId and CodeTableSecondId:-
Hopefully I understand your problem and this is of use to you.
Cheers John

How to skip irregular header information of a Flat File in SSIS?

I have a file like the one seen below (just an example):
    kwqif h;wehf uhfeqi f ef
    fekjfnkenfekfh ijferihfq eiuh qfe iwhuq fbweq
    fjqlbflkjqfh iufhquwhfe hued liuwfe
    jewbkfb flkeb l jdqj jvfqjwv yjwfvjyvdfe
    enjkfne khef kurehf2 kuh fkuwh lwefglu
    gjghjgyuhhh jhkvv vytvgyvyv vygvyvv
    gldw nbb ouyyu buyuy bjbuy
    ID Name Address
    1 Andrew UK
    2 John US
    3 Kate AUS
I want to dynamically skip the header information and load the flat file to the DB, like below:
    ID Name Address
    1 Andrew UK
    2 John US
    3 Kate AUS
The header information may vary from file to file (it is not a fixed number of rows).
Any help would be appreciated. Thanks in advance.
The generic SSIS components cannot meet this requirement. You need to code for this, e.g. in an SSIS Script task.
I would code that script to read through the file looking for that header row ID Name Address, and then write that line and the rest of the file out to a new file.
Then I would load that new file using the SSIS Flat File Source component.
You might be able to avoid a script task if you'd prefer not to use one. I'll offer a few ideas here as it's not entirely clear which will be best from your example data. To some extent it's down to personal preference anyway, and also the different ideas might help other people in future:
Convert ID and ignore failures: Set the file source so that it expects however many columns you're forced into having by the header rows, and simply pull everything in as string data. In the data flow - immediately after the source component - add a data conversion component or conditional split component. Try to convert the first column (with the ID) into a number. Add a row count component and set the error output of the data conversion or conditional split to be redirected to that row count rather than causing a failure. Send the rest of the data on its way through the rest of your data flow.
This should mean you only get the rows which have a numeric value in the ID column - but if there's any chance you might get real failures (i.e. the file comes in with invalid ID values on rows you otherwise would want to load), then this might be a bad idea. You could drop your failed rows into a table where you can check for anything unexpected going on.
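For the conditional split variant of this option, the condition can be as simple as the following (a sketch, assuming the ID arrives as a string column named Col1 and that IDs are non-negative); any row whose value fails the cast is routed to the error output you redirected:

    (DT_I4)Col1 >= 0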
Check for known header values/header value attributes: If your header rows have other identifying features then you could avoid relying on the error output by simply setting up the conditional split to check for various different things: exact string matches if the header rows always start with certain values, strings over a certain length if you know they're always much longer than the ID column can ever be, etc.
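For example, a header-detection condition to split out and discard the junk rows might look like this (a sketch; Col1 is a hypothetical name for the first column, and the values need adjusting to your files):

    SUBSTRING(Col1, 1, 2) == "ID" || LEN(Col1) > 10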
Check for configurable header values: You could also put a list of unacceptable ID values into a table, and then do a lookup onto this table, throwing out the rows which match the lookup - then if you need to update the list of header values, you just have to update the table and not your whole SSIS package.
Check for acceptable ID values: You could set up a table like the above, but populate it with numbers - not great if you have no idea how many rows might be coming in or if the IDs are actually unique each time, but if you're only loading in a few rows each time and they always start at 1, you could chuck the numbers 1 - 100 into a table and throw away any rows you load which don't match when doing a lookup onto this table.
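Setting up such a table is a one-off piece of SQL (a sketch; dbo.ValidIds is a hypothetical name):

    -- Hypothetical lookup table of acceptable ID values.
    CREATE TABLE dbo.ValidIds (Id int PRIMARY KEY);

    -- Populate it with the numbers 1 - 100.
    INSERT INTO dbo.ValidIds (Id)
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects;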
Staging table: This is probably the way I'd deal with it if I didn't want to use a script component, but in part that's because I tend to implement initial staging tables like this anyway, and I'm comfortable working in SQL - so your mileage may vary.
Pick up the file in a data flow and drop it into a staging table as-is. Set your staging table data types to all be large strings which you know will hold the file data - you can always add a derived column which truncates things or set the destination to ignore truncation if you think there's a risk of sometimes getting abnormally large values. In a separate data flow which runs after that, use SQL to pick up the rows where ID is numeric, and carry on with the rest of your processing.
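The second data flow's source query might then look something like this (a sketch; the staging table and column names are hypothetical, and TRY_CAST requires SQL Server 2012 or later):

    -- Keep only the rows where the first column holds a numeric ID;
    -- the header line and the junk rows above it all fail the cast.
    SELECT TRY_CAST(Col1 AS int) AS ID,
           Col2                  AS Name,
           Col3                  AS Address
    FROM dbo.StagingRaw
    WHERE TRY_CAST(Col1 AS int) IS NOT NULL;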
This has the added bonus that you can just pick up the columns which you know will have data you care about in (i.e. columns 1 through 3), you can do any conversions you need to do in SQL rather than in SSIS, and you can make sure your columns have sensible names to be used in SSIS.

SSIS to import data from Excel into multiple tables

I have an Excel sheet (input) where each row needs to be saved in one of three SQL Server tables, based on the record type (column 1) of the row.
Example:
If the Record type is EMP, the whole row should go to the Employee table.
If the Record type is CUS, the whole row should go to the Customer table.
I am trying to use a Multicast, but I'm not sure how to split the data from the Multicast to the destination tables. Do I need any other component in between?
Any idea would be appreciated.
A Conditional Split component sounds like just what you need. A Conditional Split uses expressions you define to route each input row to one output. In your case, your Conditional Split would define three outputs, each of which would be attached to a SQL destination.
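As a sketch, assuming the record type column comes through as RecordType (your column name may differ; the third output follows the same pattern for whatever your third record type is):

    Output Name    Condition
    Employee       RecordType == "EMP"
    Customer       RecordType == "CUS"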
In comparison, the Multicast Component you're currently using sends each input row to all outputs. This component would be useful if you were trying to save a copy of each row to all three SQL destinations.