SSIS Design pattern for Multiple XLSB File sheets to multiple tables

There are xlsb files, and each of them consists of a varying number of sheets (up to 9). All the sheets have different structures. The requirement is to load each specifically named sheet's data into a corresponding table. A given named sheet may or may not appear in every xlsb file.
Example:
1st xlsb file consists of 9 sheets (sh1, sh2, ..., sh9)
2nd xlsb file consists of 6 sheets (sh1, sh5, sh6, sh7, sh9, sh2)
3rd xlsb file consists of 3 sheets (sh5, sh7, sh9)
The idea is that all sh9 sheets are collected in one table called Table_sh9, and all sh5 sheets in Table_sh5.
What SSIS design pattern can be followed for this?

To keep this dynamic, I recommend you do this with a Script Task. You can look into OpenXML and ClosedXML to read the data.
https://closedxml.codeplex.com/
I would read the header to determine which table you are loading into, then create a DataReader from the input and feed that DataReader to SqlBulkCopy.
I have a similar solution that automatically creates the tables for me, and I use the DataStreams library -> https://www.csvreader.com/
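For illustration, a minimal sketch of that approach might look like the following (assuming the workbooks can be opened by ClosedXML, which targets .xlsx, so .xlsb files may first need converting or a reader that handles the binary format; the destination table names and connection string are assumptions):

using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using ClosedXML.Excel;

public static class SheetLoader
{
    public static void LoadWorkbook(string path, string connectionString)
    {
        using (var workbook = new XLWorkbook(path))
        {
            foreach (var sheet in workbook.Worksheets)
            {
                // Skip empty sheets; the first used row is treated as the header.
                if (sheet.FirstRowUsed() == null)
                    continue;

                // Build a DataTable whose columns come from the header row.
                var table = new DataTable(sheet.Name);
                foreach (var cell in sheet.FirstRowUsed().CellsUsed())
                    table.Columns.Add(cell.GetString());

                foreach (var row in sheet.RowsUsed().Skip(1))
                {
                    var values = new object[table.Columns.Count];
                    for (int i = 0; i < table.Columns.Count; i++)
                        values[i] = row.Cell(i + 1).GetString();
                    table.Rows.Add(values);
                }

                // Route each sheet to its matching table, e.g. sh9 -> Table_sh9 (assumed naming).
                using (var bulk = new SqlBulkCopy(connectionString))
                {
                    bulk.DestinationTableName = "Table_" + sheet.Name;
                    bulk.WriteToServer(table);
                }
            }
        }
    }
}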

Related

When creating a Copy Activity in an Azure Data Factory pipeline, how do I map a CSV sheet with 5 columns to a CSV sheet with 20 columns?

So I have an input CSV sheet that I want to copy into an output CSV sheet. The output CSV sheet has all the columns in the input sheet, plus a bunch of other columns. (I will be copying data into those from other input sheets later.)
When I run the pipeline containing my Copy Activity, the only columns present in the new output sheet are the 5 columns from the input sheet, I assume because those are the only ones in the mapping. I've also tried creating 15 "Additional Columns" in the "source" section of the Copy Activity (trying out things like "test", \"test\", test, #test, #pipeline().DataFactory, $$FILEPATH, etc.), but when I debug the pipeline, go back to my container, and look at the output sheet, still only the 5 columns from the input sheet are present.
How do I get the output sheet to contain columns that are not present in the input sheet? Do I need to create an ARM template?
I am doing this entirely via the Azure Portal, btw.
This will be much easier to design in ADF's data flows instead, by creating Derived Columns to append to your output sink schema.
This works fine on my side. Are there any differences?

Metadata map for importing CSV data into IPTC XMP images using Bridge

Let's say I have 100 scanned Tif files. I also have a CSV of the metadata for those 100 Tif files. Each file is named with its unique identifier, which is also column 1 of the csv.
First: How do I find a map that tells me what columns should be named what, in order to stay within the IPTC standard using XMP? (I've googled for most of the day and have found nothing)
Second: How can I merge the metadata in the CSV to each corresponding image?
I'm basically creating a spreadsheet with all 50,000 images in an archival collection, and plan to use the CSV to create the metadata for the images once they're scanned.
Thanks!
To know where to put your metadata, I'd suggest looking at the IPTC Photo Metadata Standard page. Without knowing more about your data, it's hard for someone else to say what data should go where.
As for embedding your data into your files from a CSV file, I'd suggest exiftool. Change the header of each column to the name of the tag to write to, and make the first column the path/filename of each file; your command would then be as simple as
exiftool -csv=file.csv /path/to/files
See exiftool FAQ #26 for more details.
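For illustration, a CSV laid out for that command might look like the following (the tag columns, names, and file paths shown here are hypothetical examples; the first column must be named SourceFile and hold the path to each image):
SourceFile,Title,Creator,Description
scans/0001.tif,Item 0001,Jane Doe,Scanned archival print
scans/0002.tif,Item 0002,Jane Doe,Scanned archival print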

How to Get Data from CSV File and Send them to Excel Using Pentaho?

I have a tabular CSV file that has seven columns and contains the following data:
ID,Gender,PatientPrefix,PatientFirstName,PatientLastName,PatientSuffix,PatientPrefName
2 ,M ,Mr ,Lawrence ,Harry , ,Larry
I am new to Pentaho and I want to design a transformation that moves the data (the values of the 7 columns) to an empty Excel sheet. The Excel sheet has different column names, but should carry the same data, as shown:
prefix_name,first_name,middle_name,last_name,maiden_name,suffix_name,Gender,ID
I tried to design a transformation using the following series of steps, but at the end it gives me errors that I could not interpret.
What is the proper design to move the data from the csv file to the excel sheet in this case? Any ideas to solve this problem?
As @Brian.D.Myers mentioned in the comments, you can use the Select values step. But here is a step-by-step explanation of how to do it.
Select all the fields from CSV file input step.
Configure the Select values step as follows.
In the Content tab of the Excel writer step, click the Get fields button and fill in the fields. Alternatively, you can use the Excel output step as well.
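For illustration, the rename mapping in the Select values step would look roughly like this (the pairing below is inferred from the column names; where PatientPrefName should go, and which source fields feed middle_name or maiden_name, depends on your data):
PatientPrefix    -> prefix_name
PatientFirstName -> first_name
PatientLastName  -> last_name
PatientSuffix    -> suffix_name
Gender           -> Gender
ID               -> ID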

2 csv file export to excel file using ssis

I'm trying to read the data from 2 CSV files and export it into a new Excel file, but I'm not able to export the data in the Excel destination. While mapping the columns, there are 4 columns in the input columns, but only 1 column (F1) shows up in the available output columns. Please let me know how to resolve this issue.
If I understand the question correctly, you are unable to map columns to a 'new' Excel file.
If this is the case, the metadata for the mappings is probably the issue.
Try first creating a new Excel file with the column headings and column types you want, then map to this.
Alternatively, right-click on the Excel destination, use the 'Show Advanced Editor' option, and then adjust the columns under 'External Columns' on the 'Input and Output Properties' tab.
You may then need to set the ValidateExternalMetadata option to false for the Excel destination component in order to allow the creation of new files from scratch.
Open an Excel sheet and, on the 1st row, give the column headings. Columns A, B, C and D should have the same names as your source columns. After doing so, save the Excel file and close the workbook. Go back to SSIS and open the Excel destination mappings. You should be able to map them now.

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
Eg. Using a trivial example, the original file has the following content :
OurReference,Client,Amount
235,ClientA,20000.00
236,ClientB,30000.00
The output from the flat file source is :
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to :
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is :
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem - ie which produces output based on the column names, instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one of which reads the whole row as a single column. That one can read the first row, tell whether it's OK or not, and abort.
If you want to do the work, you can take it a step further and make that single-column connection manager the only one, and use a Script Component in your flow to parse the row and assign the values to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so all the columns you need will have to be added to the Script Component's output. Whether they can be found and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. Columns which are missing you could default, or you could throw an exception.
A final possibility is to use the SSIS object model to modify the package before running it, to alter the connection manager - or even to write the entire package dynamically using the object model, based on an inspection of the input file. I have done quite a bit of package generation in C# using templates, then adding information based on metadata I obtained from master files describing the mainframe files.
The best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate the data within Business Intelligence Development Studio itself.
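As a rough sketch of such a pre-check (the file path and expected column list are assumptions taken from the example above), a small script run before the data flow could read just the header line and abort if the order has changed:

using System;
using System.IO;
using System.Linq;

public static class HeaderCheck
{
    public static bool HeaderMatches(string filePath, string[] expectedColumns)
    {
        // Read only the first line of the CSV and compare it, in order,
        // against the columns the package was built for.
        string headerLine = File.ReadLines(filePath).FirstOrDefault() ?? string.Empty;
        string[] actualColumns = headerLine.Split(',').Select(c => c.Trim()).ToArray();
        return actualColumns.SequenceEqual(expectedColumns, StringComparer.OrdinalIgnoreCase);
    }
}

// Example: fail fast if the layout changed.
// bool ok = HeaderCheck.HeaderMatches(@"C:\in\clients.csv",
//     new[] { "OurReference", "Client", "Amount" });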
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check the "Column names in the first data row" option.
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // The whole CSV line arrives in a single column (Column0).
    string sRow = Row.Column0;
    string sManipulated = string.Empty;

    // Split the line on commas and pad each value to a fixed width.
    string[] columns = sRow.Split(',');
    foreach (string column in columns)
    {
        sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
    }
    /* Note: for the sake of demonstration I am padding to 15 chars. */

    // Write the fixed-width line back out through the same column.
    Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need to have each field be the same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, the fields, and the field sizes.
Use the file name to dynamically create the source and destination connection managers.
Use the field name and the corresponding field size to decide the padding (a sketch of this is below). I'm not sure if you need this much flexibility. If you have any questions, please respond.
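As a hedged sketch of that metadata-driven idea (the lookup shape is an assumption; in practice the field names and widths would be loaded from the configuration table keyed on the incoming file name):

using System.Collections.Generic;
using System.Text;

public static class PaddingHelper
{
    // fieldSizes: ordered field name/width pairs loaded from the configuration
    // table for the current file, replacing the hard-coded 15-character padding.
    public static string PadRow(string rawRow, IList<KeyValuePair<string, int>> fieldSizes)
    {
        string[] values = rawRow.Split(',');
        var padded = new StringBuilder();
        for (int i = 0; i < fieldSizes.Count && i < values.Length; i++)
        {
            // Pad each value to the width configured for its field.
            padded.Append(values[i].PadRight(fieldSizes[i].Value, ' '));
        }
        return padded.ToString();
    }
}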