Output multiple rows from a script component per single input row - SSIS

I'm pretty inexperienced with SSIS, though I have a lot of experience with SQL, C#, and other technologies.
I am converting a task I have written as a stand-alone C# console app into an SSIS package.
I have an OLE DB source that runs a SQL command to collect certain data from the database, which I then feed into a Script Component transform. I use the input fields as parameters to an OAuth-based RESTful web service, which requires a lot of custom C# code. The web service returns an XML response that includes many rows that must be output for each input row.
My understanding of how the script transform works is that it's more or less one row in, one row out.
I have several questions here really.
Is it a good idea to use the input source this way? Or is there a better way to feed input rows into my web service?
Is a Script Component transform the correct tool to use here? I can't use the normal Web Service task because the web service is not SOAP or WCF based, and requires OAuth in the request. (Or is there a way to use the Web Service component this way?)
How can I output more than one row for every input row?
Does SSIS support a way to take the XML results (which contain multiple rows) and map them to rows of the output buffer in the script transform? I know there's an XML source, but that's not really it. I'm thinking of something that takes XML input and spits out rows of data.
UPDATE:
Data from the Web Service looks like this (extra cruft elided):
<user>
  <item>
    <col1>1</col1>
    <col2>2</col2>
    <col3>3</col3>
  </item>
  <item>
    <col1>1</col1>
    <col2>2</col2>
    <col3>3</col3>
  </item>
  ...
</user>
Essentially, the SQL data source returns a dataset of users. The users dataset is fed into the script and used as parameters for the web service calls. The web service calls return a set of XML results, which contain multiple "rows" of data that must be output from the script.
In the above data, the outputs of the script would be multiple rows of col1, col2, and col3 for each user supplied in the input source. I need a way to extract those elements and put them into columns in the output buffer for each row of XML data. Or a way to simply make the XML the output of the script and feed that output into another component that parses the XML into rows (like an XML source does, but obviously you can't put an XML source in the middle of a data flow). Something like the sketch below is what I have in mind.
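Conceptually, the extraction half would look something like this inside the script (XDocument is from System.Xml.Linq; responseXml and the column names are just illustrative):

using System.Xml.Linq;

// responseXml is the string the web service returned for one user.
XDocument doc = XDocument.Parse(responseXml);
foreach (XElement item in doc.Root.Elements("item"))
{
    // Each <item> should become one row in the output buffer.
    string col1 = (string)item.Element("col1");
    string col2 = (string)item.Element("col2");
    string col3 = (string)item.Element("col3");
    // ... add a row to the output buffer here ...
}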

Answering what I can
Is it a good idea to use the input source this way? Or is there a better way to feed input rows into my web service?
It depends, but generally, if your data is in a database, an OLE DB or ADO.NET source is your preferred component for injecting it into the pipeline. Better? It depends on your needs, but is there a reason you think it wouldn't be advisable? Nice benefits of using a data flow are built-in buffering, parallelism, logging, configuration, etc. I'm assuming that or some other reason is leading you to move your .NET app into an Integration Services package, so I would think if you're moving into this space, go whole hog.
Is a script component transform the correct tool to use here?
Definitely. The built-in web-service stuff is less-than-industrial-strength. You're already familiar with .NET so you're well positioned to take maximum advantage of that component.
How can output more than one row for every input row?
Yes. Your assumption of 1:1 input:output is only the default behaviour. By default, a script component is synchronous, so as you've observed, every input row has one output row. But by changing your script component into an asynchronous component, you can have a billion rows transformed into a single output row, or have one source row generate N output rows. I had to do the latter for a Bill of Materials type problem: I'd receive a parent id and have to look up all the child rows associated with the parent. Anyway, the linked MSDN article describes how to make it async.
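For illustration, a rough sketch of the asynchronous pattern (this assumes an output named Output0 with its SynchronousInputID set to None in the component editor; the column names and the GetItemsForUser web-service helper are made up):

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // One input row arrives; emit as many output rows as the
    // web service response contains. GetItemsForUser stands in
    // for your OAuth call plus XML parsing.
    foreach (var item in GetItemsForUser(Row.UserId))
    {
        Output0Buffer.AddRow();
        Output0Buffer.Col1 = item.Col1;
        Output0Buffer.Col2 = item.Col2;
        Output0Buffer.Col3 = item.Col3;
    }
}

public override void Input0_ProcessInput(Input0Buffer Buffer)
{
    while (Buffer.NextRow())
    {
        Input0_ProcessInputRow(Buffer);
    }
    // With an asynchronous output you signal end-of-data yourself.
    if (Buffer.EndOfRowset())
    {
        Output0Buffer.SetEndOfRowset();
    }
}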
Does SSIS support a way to take the XML results
I don't understand well enough what you're asking to address this. Dummy up some examples for this dummy and I'll see if it clicks.

Related

How to make SSIS choose a data source depending on a parameter?

I have an SSIS data flow task that reads a CSV file with certain fields, tweaks it a little and inserts results into a table. The source file name is a package parameter. All is good and fine there.
Now, I need to process a slightly different kind of CSV file with an extra field. This extra field can be safely ignored, so the processing is essentially the same. The only difference is in the column mapping of the data source.
I could, of course, create a copy of the whole package and tweak the data source to match the second file format. However, this "solution" seems like terrible duplication: if there are any changes in the course of processing, I will have to do them twice. I'd rather pass another parameter to the package that would tell it what kind of file to process.
The trouble is, I don't know how to make SSIS read from one data source or another depending on parameter, hence the question.
I would duplicate the Connection Manager (CSV definition) and Data Flow in the SSIS package and tweak them for the new file format. Then I would use the parameter you described to enable/disable either Data Flow.
In essence, SSIS doesn't work with variable metadata. If this is going to be a recurring pattern, I would deal with it upstream of SSIS, building a VB/C# command-line app to shred the files into SQL tables.
You could make the connection manager push all the data into one column, then use a script transformation component to parse the data out to the output, depending on the number of fields in the row.
You can split the data on the delimiter into, say, a string array (I googled for help when I needed to do this). The size of the array then tells you which type of file has been connected.
Then, your mapping to the destination can remain the same. No need to duplicate any components either.
I had to do something similar myself once: although the files I was using were meant to always be in the same format, it could change depending on the version of the system sending the file, so handling it in a script transformation this way let me cope with minor variations in the file format (see the sketch below). If the files are 99% the same, that's fine; if they were radically different, you would be better off using a separate file connection manager.
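As a rough sketch of that approach (assuming the flat file connection manager delivers each whole line in a single column called Line, and the output column names are illustrative):

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // The connection manager hands us the entire line in one column;
    // split it ourselves instead of relying on fixed column mappings.
    string[] fields = Row.Line.Split(',');

    Row.CustomerId = fields[0];
    Row.Amount = fields[1];
    Row.OrderDate = fields[2];

    // The newer file format carries one extra trailing field;
    // fields.Length tells us which variant this is, and since the
    // extra field can be safely ignored, we simply don't map it.
}

The destination mapping then stays the same for both file formats, which is the point.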

Moving a database from FileMaker Pro 7 to MySQL

So I have this FileMaker Pro 7 database. As my senior project, I'm supposed to migrate the database to a MySQL database and then give it a PHP-based interface in 3NF...
The company allows us $200 tops to spend on the project, but if I pay for something, it has to work. However, I am having trouble finding a way to migrate the database. Any suggestions?
I have found "FmPro Migrator" (http://www.fmpromigrator.com); would the trial version be enough for us? In the worst case, we will start from the beginning and throw away the whole database the company has.
I can also download FileMaker Pro 12 and use the trial version for a month for free. Would I be able to convert the DB using FMP 12?
I am totally lost... open to any free suggestions...
Also: this is a non-profit company I'm doing the project for.
If I had to do it, I'd look at the design of the FileMaker DB and create something similar in MySQL. Then I would export the FileMaker data to text and import it somehow. The details depend on foreign key values and such.
The PHP interface would be done separately.
MySQL Data Conversion:
Yes, if your database is small enough, the demo version of FmPro Migrator will convert the database and also build you a PHP web application - at no cost.
Here are the limitations of the demo version:
5 fields
5 scripts
5 layouts
PHP Web Application:
Most people don't realize it, but there is a wealth of FileMaker metadata available in XML format for performing these types of conversions. This XML info is available either through copying the layout via the clipboard or by reading it from the Database Design Report XML file. I have found the clipboard data to be the most reliable source of this info.
FmPro Migrator is able to parse the XML and convert it into the PHP web application.
Each object on a layout is represented in XML, along with style and position info. This info can be used to create form files representing the same look as the original layout. In fact, it can be difficult to see the difference between the web application and the original database if you get all of the object properties implemented. This can be helpful for situations in which companies don't want to have to retrain their employees. They want the web application to look and work the same as the original desktop application.
I have done a few of these conversions recently into the CakePHP framework. Here are a few techniques I used:
Auto-Enter Calculation Fields - Stored calculation fields are calculated and stored when the model saves a record to the database.
Unstored Calculation Fields - Unstored Calculation fields are calculated in real-time within the form controller - but only for fields actually displayed on the form. This prevents unnecessarily calculating these values if they aren't being used on a form, improving performance.
Global Fields - A global field in FileMaker is used like a global variable in programming languages. It is important to know that each FileMaker user gets their own private copy of global field data. There is no equivalent feature in MySQL or other SQL database servers, but this functionality can easily be simulated using SESSION variables. Therefore each web user will still get their own private SESSION data, simulating the same functionality originally present in the FileMaker database. I structure these globals in the model data array as if they were retrieved from the model, meaning that converted scripts and fields on forms can reference them easily. Just before the record gets written into the database, the results are saved into SESSION variables for persistence.
Global Variables in Scripts - Global variables within FileMaker scripts match up very well with the use of PHP SESSION variables, if you want to implement the same functionality.
Vector Graphic Objects - FileMaker layouts frequently include rectangle, oval and line objects. These objects can be replaced with the RaphaelJS library, providing high-quality, resolution-independent graphics.
Value Lists - Custom and field-based value lists are implemented in a centralized location within the AppController.php file. Therefore, changing the definition of a value list within the AppController automatically changes the menu throughout the whole application.

Dynamic JSON file vs API

I am designing a system with 30,000 objects or so and can't decide between two approaches: either have a JSON file precomputed for each one and fetch the data by pointing to the file's URL (I think Twitter does something similar), or have a PHP/Perl/whatever script that produces the JSON object on the fly when requested, say from a database, and sends it back. Is one more suited than the other? I guess if it takes a long time to generate the JSON data, it is better to have the JSON files already built. What if generating is as quick as accessing a database? Although I suppose one would have a dedicated table in the database specifically for that. The data doesn't change very often, so updating is not a constant thing. In that respect the data is static for all intents and purposes.
Anyways, any thoughts would be much appreciated!
Alex
You might want to try MongoDB, which retrieves the objects as JSON and is highly scalable and easy to set up.

SQL DB-driven web application architecture question

I'm building a medium-sized business web application; data is being saved in a MySQL database.
I'm trying to think of a way of adding certain selectable "widgets" to that application (e.g. a currency widget, which will show user-specified currencies when the web app is visible) but am having a hard time deciding how to save the widget data and settings per user, since the widgets do not have a common base.
For example, the currency widget's settings are totally different from, say, a weather widget's.
One will require a list of desired currencies, and one would require the weather's target location.
I thought of solving the above by keeping all of a widget's settings data encoded in the "widgetData" column of a DB table which contains the userId, widgetId and widgetData.
I chose JSON as my way of encoding, and each time a user tries to load their page, I have to decode their settings and hand the user the desired data based on those settings.
The same is true for saving the widgets' actual data, which does not have a common base either.
Hopefully I can solve this with a NoSQL data structure next time, but that is not an option for the current project.
The Entity-Attribute-Value (EAV) database model would be very useful to you in this scenario.
It's much more flexible than JSON or XML or other types of formats because it works within your standard SQL data storage, albeit in a different manner.
I voted up the EAV solution because this is one of the valid reasons for using it, but don't fall in love with it. An advantage of EAV is that it is database-native to the extent that you can query it in plain SQL (find me all widgets missing some setting, then add it), while most engines do not have JSON support.
On the other hand, if you want/need to query within a column which contains structured data, XML is a better option than JSON (right now): http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html#function_extractvalue
If your widgets are rendered via Javascript in the browser, then your solution is perfectly fine. Your widgetData remains a JSON string, in Javascript you use JSON.parse() to turn it into an object and render it, and JSON.stringify() to turn it back into a string before posting it back to your server.

Load XML Using SSIS

I have an ETL-type requirement for SQL Server 2005. I am new to SSIS, but I believe it will be the right tool for the job.
The project is related to a loyalty card reward system. Each month, partners in the scheme send one or more XML files detailing the qualifying transactions from the previous month. Each XML file can contain up to 10,000 records. The format of the XML is very simple: 4 "header" elements, then a repeating sequence containing the record elements. The key record elements are card_number, partner_id and points_awarded.
The process is currently running in production, but it was developed as a C# app which runs an insert for each record individually. It is very slow, taking over 8 hours to process a 10,000-record file. By using SSIS I am hoping to improve performance and maintainability.
What I need to do:
1. Collect the file
2. Validate against XSD
3. Business rule validation on the records. For each record I need to ensure that a valid partner_id and card_number have been supplied. To do this I need to execute a lookup against the partner and card tables. Any "bad" records should be stripped out and written to a response XML file. This is the same format as the request XML, with the addition of an error_code element. The "good" records need to be imported into a single table.
I have points 1 and 2 working OK. I have also created an XSLT to transform the XML into a flat format ready for insert. For point 3, I had started down the road of using a ForEach Loop container on the control flow surface to loop over each XML node, with an Execute SQL task for the lookup. However, this would require a call to the database for each lookup, plus a call to the file system to write out the XML files for the "bad" and "good" records.
I believe that better performance could be achieved by using the Lookup component on the data flow surface. Unfortunately, I have no experience working with the data flow surface.
Does anyone have a suggestion as to the best way to solve the problem? I searched the web for examples of SSIS packages that do something similar to what I need but found none - are there any out there?
Thanks
Rob.
SSIS is frequently used to load data warehouses, so your requirement is nothing new. Take a look at this question/answer to get you started with tutorials etc.
The ForEach loop in the control flow is used to loop through files in a directory, tables in a DB, etc. The data flow is where records fly through transformations from a source (your XML file) to a destination (tables).
You do need a lookup, in one of its many flavours. Google "ssis loading data warehouse dimensions"; this will eventually show you several techniques for using the Lookup transformation efficiently.
To flatten the XML (if it's simple enough), I would simply use the XML source in the data flow; the XML task is for heavier stuff.
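If any of the validation does end up inside a script component rather than a Lookup transformation, you can still avoid a database call per record by caching the reference keys once in PreExecute. A rough sketch, with an assumed ADO.NET connection manager named PartnerDb, assumed table/column names, and two synchronous outputs (GoodRecords and BadRecords) in the same exclusion group:

using System.Collections.Generic;
using System.Data.SqlClient;

private HashSet<string> validPartners;

public override void PreExecute()
{
    base.PreExecute();
    validPartners = new HashSet<string>();

    // Load every valid partner id once, up front,
    // instead of querying the database per record.
    var conn = (SqlConnection)Connections.PartnerDb.AcquireConnection(null);
    using (var cmd = new SqlCommand("SELECT partner_id FROM partner", conn))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            validPartners.Add(reader.GetString(0));
        }
    }
    Connections.PartnerDb.ReleaseConnection(conn);
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Route each record to the good or bad output; the bad output
    // feeds whatever writes the error_code response file.
    if (validPartners.Contains(Row.PartnerId))
        Row.DirectRowToGoodRecords();
    else
        Row.DirectRowToBadRecords();
}

The same pattern works for the card_number check with a second HashSet.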