Indirection in SSIS - sql-server-2008

Is it possible to perform any sort of indirection in SSIS?
I have a series of jobs that perform an FTP pull and loop through the files before running another DTSX package on them. Currently this incurs a lot of repeated cruft for pulling down the files and for logging.
Is there any way of redesigning this so I only need one package rather than 6?

Based on your comment:
Effectively the 6 packages are really 2 x 3. The 1st for each "group" does the FTP pull-down and XML parsing to place the data into flat tables. The 2nd then transforms and loads that data.
Instead of downloading files with one package and inserting the data into tables with another, you can do both in a single package.
Here is a link containing an example that downloads files from FTP and saves them to local disk.
Here is a link containing an example that loops through CSV files in a given folder and inserts the data into a database.
Since you are using XML files, here is a link that shows how to loop through XML files.
You can effectively combine the above examples into a single package by placing the control flow tasks one after the other.
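If you need more control over the FTP step than the built-in FTP Task offers (retries, unusual servers, and so on), a Script Task can do the download in C#. This is a minimal sketch; the server address, credentials, and paths are placeholders, not values from your environment:

```csharp
using System.IO;
using System.Net;

public static class FtpDownloader
{
    // Downloads one remote file to a local path. All names here are
    // hypothetical; in a Script Task you would pass them in from variables.
    public static void DownloadFile(string ftpUri, string user,
                                    string password, string localPath)
    {
        var request = (FtpWebRequest)WebRequest.Create(ftpUri);
        request.Method = WebRequestMethods.Ftp.DownloadFile;
        request.Credentials = new NetworkCredential(user, password);

        using (var response = (FtpWebResponse)request.GetResponse())
        using (Stream ftpStream = response.GetResponseStream())
        using (FileStream fileStream = File.Create(localPath))
        {
            // Manual buffer copy keeps this compatible with .NET 2.0/3.5,
            // which is what SQL Server 2008 SSIS script tasks run on.
            var buffer = new byte[8192];
            int read;
            while ((read = ftpStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                fileStream.Write(buffer, 0, read);
            }
        }
    }
}
```

Called, for example, as FtpDownloader.DownloadFile("ftp://example.com/inbox/data.xml", "user", "pass", @"C:\staging\data.xml"). The built-in FTP Task remains the simpler option whenever it covers your needs.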
Let me know if this is not what you are looking for.

Related

How to load a directory of different files (Excel and CSV) into multiple tables in a database with Talend?

I need to load a directory of different files (Excel and CSV), with no relation between them, into multiple tables in a database; every file must be loaded into its own table without any transformation.
I tried to do this using tFileList ==> tFileInputExcel ==> tMysqlOutput, but it doesn't work because I would need a lot of outputs.
Your question is not very clear, but it seems like you want something generic enough to work with just one flow for all your files.
You might be able to accomplish that using dynamic schemas. See here for further guidance: https://www.talendforge.org/forum/viewtopic.php?id=21887. You will probably need at least 2 flows, one for the CSV files and one for the XLS files; you can filter the files for each flow by their extension in the tFileList component.
But if you are new to Talend, I encourage you to avoid this approach: dynamic schemas can be very hard to understand and use. Instead, I would recommend one flow for each file type.

Load txt file into SQL Server database

I am still learning SQL Server.
The scenario is that I have a lot of .txt files named in the format DIAGNOSIS.YYMMDDHHSS.txt, where only the YYMMDDHHSS part differs from file to file. They are all saved in the folder Z:\diagnosis.
How could I write a stored procedure to load every .txt file matching DIAGNOSIS.YYMMDDHHSS.txt from Z:\diagnosis? Each file must be loaded only once.
Thank you
I would not do it using a stored proc; I would use SSIS, which has a Foreach Loop container that can enumerate the files for you. Once a file has been loaded, move it to an archive location so it doesn't get processed on the next run. Alternatively, you could create a table storing the names of the files that were successfully processed and have the loop skip any file in that table, but then the set of files to loop through just keeps growing; it is better to move processed files to a different location if you can.
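As a sketch of that archive step, a Script Task at the end of the loop could move the file like this (the paths are placeholders; in a real package the source path would come from the Foreach Loop's file-name variable):

```csharp
using System;
using System.IO;

public static class FileArchiver
{
    // Moves a processed file into an archive folder so the next run skips it.
    public static void ArchiveProcessedFile(string sourcePath, string archiveFolder)
    {
        Directory.CreateDirectory(archiveFolder); // no-op if it already exists

        string destination = Path.Combine(archiveFolder, Path.GetFileName(sourcePath));

        // Guard against a name collision by timestamping the archived copy.
        if (File.Exists(destination))
        {
            string stamped = Path.GetFileNameWithoutExtension(sourcePath)
                + "." + DateTime.Now.ToString("yyyyMMddHHmmss")
                + Path.GetExtension(sourcePath);
            destination = Path.Combine(archiveFolder, stamped);
        }

        File.Move(sourcePath, destination);
    }
}
```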
Personally, I would also put the file data into a staging table before loading it to the final table. We use two of them: one for the raw data and one for the cleaned data. We then transform into staging tables that match the relational tables in production, to make sure the data will meet the needs there before touching production, and we send records that can't be inserted for one reason or another to an exception table. Working in the health care environment, you will want to make sure your process meets any government regulations for the storage of patient records in your country (see HIPAA in the US). You may have to load directly to production or severely limit access to the staging tables and files.

How to make SSIS choose data source depending on parameter?

I have an SSIS data flow task that reads a CSV file with certain fields, tweaks it a little and inserts results into a table. The source file name is a package parameter. All is good and fine there.
Now, I need to process a slightly different kind of CSV file with an extra field. This extra field can be safely ignored, so the processing is essentially the same; the only difference is in the column mapping of the data source.
I could, of course, create a copy of the whole package and tweak the data source to match the second file format. However, this "solution" seems like terrible duplication: if there are any changes in the course of processing, I will have to do them twice. I'd rather pass another parameter to the package that would tell it what kind of file to process.
The trouble is, I don't know how to make SSIS read from one data source or another depending on a parameter, hence the question.
I would duplicate the Connection Manager (CSV definition) and Data Flow in the SSIS package and tweak them for the new file format. Then I would use the parameter you described to enable one Data Flow and disable the other.
In essence, SSIS doesn't work with variable metadata. If this is going to be a recurring pattern, I would deal with it upstream of SSIS by building a VB / C# command-line app to shred the files into SQL tables.
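A bare-bones sketch of such an app, using SqlBulkCopy for the load. The connection string, column names, and destination table are all assumptions for illustration:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class CsvShredder
{
    // Usage: CsvShredder.exe somefile.csv (no argument checking in this sketch).
    static void Main(string[] args)
    {
        // Placeholder connection string and schema; adjust to your environment.
        const string connectionString =
            "Server=.;Database=Staging;Integrated Security=true";

        var table = new DataTable();
        table.Columns.Add("Field1", typeof(string));
        table.Columns.Add("Field2", typeof(string));
        table.Columns.Add("Field3", typeof(string));

        foreach (string line in File.ReadAllLines(args[0]))
        {
            string[] parts = line.Split(',');
            DataRow row = table.NewRow();

            // Map only the columns we know about; any extra trailing
            // fields in the newer file format are simply ignored.
            for (int i = 0; i < table.Columns.Count && i < parts.Length; i++)
            {
                row[i] = parts[i];
            }
            table.Rows.Add(row);
        }

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.StagingTable"; // hypothetical
            bulk.WriteToServer(table);
        }
    }
}
```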
You could make the connection manager push all the data into one column, then use a script transformation component to parse it out to the output columns, depending on the number of fields in the row.
You can split each row on the delimiter into a string array (I googled for help when I needed to do this); the length of the array then tells you which type of file you have been given.
Your mapping to the destination can then remain the same, and there is no need to duplicate any components.
I had to do something similar myself once. Although the files I was using were meant to always be in the same format, it could change depending on the version of the system sending the file, and handling it in a script transformation this way let me absorb minor variations in the format. If the files are 99% the same, this works fine; if they were radically different, you would be better off using a separate file connection manager.
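Roughly, the per-row logic in the script component looks like this. The input column Line and the output columns are names you would define on the component yourself (illustrative, not generated defaults; the outputs are defined as strings here for simplicity), while Input0Buffer is the class SSIS generates for you:

```csharp
// Inside the script component's ScriptMain class.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Each incoming row carries the whole CSV line in a single column.
    string[] fields = Row.Line.Split(',');

    // The first fields are common to both file formats, so the mapping
    // to the destination never changes.
    Row.CustomerId = fields[0];
    Row.Amount = fields[1];

    // fields.Length reveals which format arrived; the newer format has an
    // extra trailing field that we simply never map, so it is ignored.
}
```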

Best practice to organize a 200+ tables import project

This is a purely organizational question about SSIS project best practice for medium-sized imports.
I have a source database which is continuously being enriched with new data, and a staging database into which I occasionally load the data from the source so I can work on a copy of it while migrating the current system. I am currently using an SSIS Visual Studio project to import this data.
My issue is that I have realised the design of my project is not really optimal, and I would now like to move it to SQL Server so I can schedule the import instead of running the Visual Studio project manually. That means the project needs to be cleaned up and optimized.
So basically, for each table, the process is simple: truncate table, extract from source and load into destination. And I have about 200 tables. Extractions cannot be parallelized as the source database only accepts one connection at a time. So how would you design such a project?
I read in the Microsoft documentation that they recommend one Data Flow per package, but managing 200 different packages seems quite impossible, especially since I will have to chain them for the scheduled import. On the other hand, a single package with 200 Data Flows seems unmanageable too...
Edit 21/11:
The first approach I wanted to use when starting this project was to extract my tables automatically by iterating over a list of table names. This could have worked out well if my source and destination tables had all had the same schema object names, but since the source and destination databases are from different vendors (BTrieve and Oracle), they also have different naming restrictions. For example, BTrieve does not reserve names and allows names longer than 30 characters, neither of which is true of Oracle. So that is how I ended up manually creating 200 data flows with semi-automatic column mapping (most of it was automatic).
When generating the CREATE TABLE queries for the destination database, I created a reusable C# library containing the methods to generate the new schema object names, just in case the methodology could be automated. If there were a custom package-generation tool that could use an external .NET library, this might do the trick.
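For illustration, the core of such a name-mapping method might look like the sketch below. The reserved-word sample and the truncation rule are simplified assumptions, not the actual library:

```csharp
using System;
using System.Collections.Generic;

public static class OracleNameMapper
{
    // A token sample of Oracle reserved words; the real list is much longer.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "DATE", "LEVEL", "SIZE", "USER", "COMMENT" };

    // Converts a BTrieve object name into one Oracle will accept:
    // at most 30 characters and not a reserved word.
    public static string ToOracleName(string btrieveName)
    {
        string name = btrieveName.ToUpperInvariant();

        if (name.Length > 30)
        {
            name = name.Substring(0, 30); // naive truncation; a real mapper
                                          // should also deduplicate results
        }

        if (Reserved.Contains(name))
        {
            name += "_X"; // reserved words are short, so this stays under 30
        }

        return name;
    }
}
```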
Have you looked into BIDS Helper's BIML (Business Intelligence Markup Language) as a package generation tool? I've used it to create multiple packages that all follow the same basic truncate-extract-load pattern. If you need slightly more cleverness than what's built into BIML, there's BimlScript, which adds the ability to embed C# code into the processing.
From your problem description, I believe you'd be able to write one BIML file and have that generate two hundred individual packages. You could probably use it to generate one package with two hundred data flow tasks, but I've never tried pushing SSIS that hard.
You can basically create 10 child packages, each containing 20 Data Flow Tasks, and a master package which triggers the child packages. Using parent-to-child configuration, create a single XML configuration file. Define precedence constraints in the master package so the child packages execute serially. This way, maintainability will be better than with 200 packages or a single package with 200 Data Flow Tasks.
The following link may be useful to you:
Single SSIS Package for Staging Process
Hope this helps!

Load XML Using SSIS

I have an ETL-type requirement for SQL Server 2005. I am new to SSIS, but I believe it will be the right tool for the job.
The project is related to a loyalty card reward system. Each month, partners in the scheme send one or more XML files detailing the qualifying transactions from the previous month. Each XML file can contain up to 10,000 records. The format of the XML is very simple: 4 "header" elements, then a repeating sequence containing the record elements. The key record elements are card_number, partner_id and points_awarded.
The process is currently running in production, but it was developed as a C# app which runs an insert for each record individually. It is very slow, taking over 8 hours to process a 10,000-record file. By using SSIS I am hoping to improve performance and maintainability.
What I need to do:
Collect the file
Validate against XSD
Business rule validation on the records. For each record I need to ensure that a valid partner_id and card_number have been supplied; to do this I need to execute a lookup against the partner and card tables. Any "bad" records should be stripped out and written to a response XML file, in the same format as the request XML with the addition of an error_code element. The "good" records need to be imported into a single table.
I have points 1 and 2 working OK, and I have also created an XSLT to transform the XML into a flat format ready for insert. For point 3 I had started down the road of using a Foreach Loop container on the control flow surface to loop over each XML node, with an Execute SQL task for the lookup. However, this would require a database call for each lookup, plus file system calls to write out the XML files for the "bad" and "good" records.
I believe that better performance could be achieved by using the Lookup control on the data flow surface. Unfortunately, I have no experience of working with the data flow surface.
Does anyone have a suggestion as to the best way to solve this? I searched the web for examples of SSIS packages that do something similar to what I need, but found none. Are there any out there?
Thanks
Rob.
SSIS is frequently used to load data warehouses, so your requirement is nothing new. Take a look at this question/answer, to get you started with tutorials etc.
A Foreach loop in the control flow is used to loop through files in a directory, tables in a database, etc. The data flow is where records move through transformations from a source (your XML file) to a destination (your tables).
You do need a lookup, in one of its many flavours. Google "ssis loading data warehouse dimensions"; this will eventually show you several techniques for using the Lookup transformation efficiently.
To flatten the XML (if it is simple enough), I would simply use the XML source in the data flow; the XML task is for heavier stuff.
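The Lookup transformation itself needs no code, but if you ever route the rows in a script component instead (for example, to set the error_code inline), the pattern is roughly the one below. Note that SSIS 2005 script components are VB.NET-only, so this C# form applies to 2008 and later. The outputs GoodRecords and ErrorRecords, the column names, and the way validCards gets populated are all assumptions; the DirectRowTo* methods are generated by SSIS from the output names you define:

```csharp
using System.Collections.Generic;

// Fragment of a script component's ScriptMain class with two synchronous
// outputs in the same exclusion group.
private HashSet<string> validCards;

public override void PreExecute()
{
    base.PreExecute();
    validCards = new HashSet<string>();
    // In a real package you would fill this set from the card table here,
    // e.g. through an ADO.NET connection manager; omitted for brevity.
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (validCards.Contains(Row.CardNumber))
    {
        Row.DirectRowToGoodRecords();    // heads for the import table
    }
    else
    {
        Row.ErrorCode = "INVALID_CARD";  // extra column on the error path
        Row.DirectRowToErrorRecords();   // heads for the response XML
    }
}
```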