SSIS: GUIDs as automatically generated IDs from the XML Source component

Is it possible to set up the XML Source component so that it generates GUID values, instead of the default int values, for the automatically generated _id column?

I would send each output from the XML Source to a SQL table (using an OLE DB Destination), and add a GUID column to each table.
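A minimal T-SQL sketch of that approach, assuming a destination table named dbo.XmlOutput (the table, column, and constraint names here are hypothetical): let SQL Server generate the GUID on insert rather than SSIS.

    -- Add a GUID column that SQL Server populates automatically on insert
    ALTER TABLE dbo.XmlOutput
        ADD RowGuid uniqueidentifier NOT NULL
            CONSTRAINT DF_XmlOutput_RowGuid DEFAULT NEWID();

If insert fragmentation is a concern, NEWSEQUENTIALID() can be used as the default instead.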
Actually, I would forget about GUIDs because they are painful in general and with SSIS in particular; SSIS provides essentially no support for GUIDs.

The way to solve my problem was to write a custom component. No other alternatives were available.


No columns returned SSIS

I am implementing an SSIS package and am currently trying to do the following:
Truncate the destination table.
Fetch the data by executing a stored procedure and insert it into the destination table.
I have created an Execute SQL Task to address step 1 and a data flow with an OLE DB source and OLE DB destination to address step 2. It has been working successfully so far, but it isn't working for one of my stored procedures that uses temp tables.
When I edit the OLE DB source and click the Preview button, I get the error "No column returned".
I know that SSIS has an issue with generating columns when executing stored procedures that depend on temp tables. I have converted the stored proc to use table variables, and it is now able to return columns in SSIS when I do a preview. The only downside is that the stored procedure takes much longer to execute: 1 hour 15 minutes, compared to 15 minutes using temp tables.
I did see a suggestion to use SET FMTONLY before executing the stored procedure as an alternative to switching to table variables, but that didn't seem to work, as I got syntax or permission-denied errors.
Could somebody tell me a solution to my problem that does not compromise performance?
Sounds like you've already read all the approaches to using Temp tables in SSIS, including the IF 1=0... trick? If you haven't seen that one yet, google it.
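In case it helps, a sketch of that trick (the column names and types are hypothetical placeholders): the IF 1 = 0 branch never runs, but it lets the driver infer the result-set metadata without the proc having to create its temp tables first.

    IF 1 = 0
    BEGIN
        -- Dead code at run time; shapes the metadata at design time.
        SELECT CAST(NULL AS int)         AS Id,
               CAST(NULL AS varchar(50)) AS Name;
    END;

    -- ... real work with #temp tables follows, ending in a SELECT
    -- whose columns match the dummy SELECT above.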
You say that using table variables causes your stored procedure to take about five times longer than using temp tables. The most likely reason is that you are indexing your temp tables but not your table variables. If you didn't know that table variables can be indexed: they can. You might try that.
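A sketch of an indexed table variable, with hypothetical columns; before SQL Server 2014, the indexes have to come in through PRIMARY KEY or UNIQUE constraints, while 2014 and later also accept inline INDEX definitions:

    DECLARE @Results TABLE
    (
        Id   int         NOT NULL PRIMARY KEY, -- clustered index
        Name varchar(50) NOT NULL UNIQUE       -- nonclustered index
    );

    -- SQL Server 2014+ only:
    -- DECLARE @Results TABLE (Id int NOT NULL, INDEX IX_Id CLUSTERED (Id));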
Finally, a solution that you haven't mentioned is that you can replace your temporary table with a real table that gets truncated when you're done using it.
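A sketch of that last approach, with hypothetical names; because the work table is a permanent object, SQL Server can always supply its metadata to the driver:

    -- Inside the proc, instead of SELECT ... INTO #Results:
    TRUNCATE TABLE dbo.Work_Results;

    INSERT INTO dbo.Work_Results (Id, Name)
    SELECT Id, Name
    FROM   dbo.SourceTable;

    SELECT Id, Name
    FROM   dbo.Work_Results;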
Short comment:
Try EXEC ... WITH RESULT SETS and specify the metadata yourself for a proc with temp tables, or use the Script Component as a source and specify the output columns yourself.
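A sketch of the first suggestion as it would appear in the OLE DB source's SQL command, with a hypothetical proc name and column list:

    EXEC dbo.uspLoadData
    WITH RESULT SETS
    (
        (
            Id   int         NOT NULL,
            Name varchar(50) NOT NULL
        )
    );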
Long comment:
Technically speaking, it is the driver/database you are using in SSIS that decides the behavior when working with temp tables.
Metadata is an important factor when using SSIS's pipeline components. By metadata, I mean the names of the columns, their data types etc that a pipeline component uses. When designing a data flow, someone/something should provide this metadata to the components that require it.
In most cases, SSIS automatically retrieves the metadata. Components that do not connect to an external data source, like Conditional Split, get their metadata from the other components they are connected to. For the pipeline components that do connect to an external data source (OLE DB Source, OLE DB Destination, Lookup, etc.), SSIS provides a mechanism to get this metadata without human involvement: the driver connects to the database and retrieves the metadata of the output. If the driver/database is capable of returning the metadata, that metadata is used. If it is not, you get the errors you are seeing. The rest of my comments assume that you are using a SQL Server database.
When working with a SQL Server database in SSIS, we typically use the native client drivers provided by Microsoft. When trying to get the metadata, these drivers try to do so without actually executing the SQL statement (actual execution can have side effects, and it might take minutes or hours; you don't want side effects and long waits at package design time). So to get the metadata, the driver relies on the metadata of the actual objects used in the SQL command. If the command uses a physical table or view, SQL Server already has the metadata available and can supply it to the driver. If it is a temp table, SQL Server does not have the metadata until it actually creates the temp table. With the FMTONLY option, you can arrange for the temp tables to be created while avoiding any heavy processing or side effects, and thus retrieve the metadata without penalty. The native client drivers shipped since 2012 rely on newer functionality to retrieve metadata than the pre-2012 drivers did: on SQL Server 2012 and later, the driver uses the sp_describe_first_result_set proc. So whether you can get metadata or not is determined by the abilities of sp_describe_first_result_set.
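You can run that same check yourself to see what the driver sees; a sketch, using a hypothetical proc name:

    EXEC sp_describe_first_result_set
         @tsql = N'EXEC dbo.uspLoadData';

    -- For a proc that builds its output in a #temp table, this call is
    -- where the metadata error surfaces at design time.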
So while SSIS can get the metadata automatically in most cases (because of the driver/database), in some cases it cannot (again because of the driver/database). In the second scenario, some other process (typically a human) has to help the driver infer the metadata or provide the metadata to the component directly.
To help the driver on SQL Server 2012 and later, you can use the WITH RESULT SETS clause to specify the output metadata, as sketched above. When this clause is present, the driver uses it and doesn't try to query the metadata from system objects, thus avoiding the error you would otherwise get. If you are using the drivers that came with SQL Server 2008, you can use SET FMTONLY. This option is at the driver/database level.
Another option is to use a Script Component as the source and specify the columns/metadata yourself in its Output section. SSIS will not try to retrieve metadata from the data source in this case; it relies on the definitions you provide.
As you can see, both options involve a human (or some other process) specifying the metadata instead of SSIS retrieving it automatically. I would prefer the first option when working with SQL Server and the second when working with databases like MySQL.

Mondrian - Fact Table Data as XML

I am evaluating a Mondrian-Saiku solution for a client.
After analyzing their current database schemas, I realize that what constitutes their 'fact table data' is currently stored as XML. The XML documents themselves are stored as blobs in a MySQL table. Think of it like this: the table holds all the transactions of the company; the details of each transaction are stored in their own XML document; and each XML string is stored as one of the field values in a given transaction row.
This presents a slight dilemma since the Mondrian XML schema requires the explicit use of column names.
Short of extracting and transferring the XML data to new tables (not realistic for my purposes, due to the size of the data and dependencies from other systems), is there any way I can work with my client's existing setup for the purposes of a Mondrian-Saiku implementation?
You need to expose the data in a traditional tabular way. What is the database here? Can you create a database view that does some XML processing on the XML in the blob and exposes the columns?
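Since the question mentions MySQL, here is a hedged sketch of such a view; the table, column, and XPath names are all hypothetical, and ExtractValue is evaluated per row, so it may be slow on a large transaction table:

    CREATE VIEW fact_transactions AS
    SELECT
        t.id,
        ExtractValue(CONVERT(t.details USING utf8),
                     '/transaction/amount')  AS amount,
        ExtractValue(CONVERT(t.details USING utf8),
                     '/transaction/product') AS product
    FROM transactions AS t;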
Alternatively, maybe something like Composite or JBoss Teiid can help here. These tools allow you to expose virtually anything as a standard-looking table. It may not be quick enough, though!

Convert database table structure to XSD format

Is there any way I can convert a table structure in a MySQL or Oracle database to XSD (XML Schema Definition) format?
Use XMLSpy:
http://williamjxj.wordpress.com/2011/05/25/1004/
Yes, but it's fairly complicated. You'll want to run the query SHOW CREATE TABLE <tablename>, and it will return the full table creation statement (in tidy CREATE TABLE syntax).
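For a hypothetical MySQL table, the statement and its output look something like this:

    SHOW CREATE TABLE customers;

    -- CREATE TABLE `customers` (
    --   `id` int(11) NOT NULL AUTO_INCREMENT,
    --   `name` varchar(100) DEFAULT NULL,
    --   `created_at` datetime DEFAULT NULL,
    --   PRIMARY KEY (`id`)
    -- ) ENGINE=InnoDB DEFAULT CHARSET=utf8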
Then you'll want to parse each line of the CREATE TABLE syntax using your language. Thankfully, the fields are neatly separated by newlines.
The types should be fairly easy to map to XSD types.
Where it gets complicated is when you're parsing foreign key relationships - then you'll need to define custom types in your XSD and reference them accordingly.
It really comes down to your implementation. If you're looking for a portable data format that you can easily import/export from your database then there are a number of other solutions.

OLE DB to get BlobColumn Data in SSIS Dataflow

When I use an ADO.NET source in a data flow to read a blob column and pass it to a Script Component for further validation, the Script Component validates each column and generates master/child error records: a master record for each row and a child record for each error column. This works fine.
As I need to parameterize my source, I can't use ADO.NET and instead need to use the OLE DB source, which supports parameters. When I use the OLE DB source, the Script Component doesn't recognize the blob data being passed to it and reports data type problems, i.e., converting non-Unicode to Unicode.
How can this be done?
Can you confirm what your source database is (SQL Server, Oracle, etc.)?
I had the same problem using the 'Oracle OLEDB Provider for Oracle' data source. The provider seems to convert every varchar into an nvarchar. I solved this by adding a Data Conversion component and explicitly converting all nvarchar columns back to varchar there.
The new columns are included in the output of this component, so you can link them to the fields on your spreadsheet.
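If you can change the source query instead, an alternative (not what the answer above used, and the names here are hypothetical) is to cast the columns back to non-Unicode in the OLE DB source query itself, so the pipeline metadata comes out as DT_STR rather than DT_WSTR:

    SELECT CAST(doc_name AS varchar(200)) AS doc_name,
           blob_col
    FROM   source_table
    WHERE  load_id = ?;  -- OLE DB source parameter marker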

At run time, how do I verify that the database schema matches my objects?

I have data access objects that have been generated by SqlMetal; however, the database is created by running a SQL script.
Is there an easy way to verify that all table and column names and types match the attributes on the classes that SqlMetal created?
I guess the easiest way to do this would be to have some kind of version number hidden in a config table in your schema. Then, at run time, check the version number returned.
That's much easier than doing a full scan. Set the version number in your SQL script and somewhere in your data access object.
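A minimal T-SQL sketch of that idea, with hypothetical names; the deployment script bumps the value, and at start-up the data access layer compares the stored value with its own compiled-in constant:

    -- In the deployment script:
    CREATE TABLE dbo.SchemaVersion (Version int NOT NULL);
    INSERT INTO dbo.SchemaVersion (Version) VALUES (42);

    -- At run time, from the data access layer:
    SELECT Version FROM dbo.SchemaVersion;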