SSIS ETL: Execute a MySQL Stored Procedure on the Destination Server

I am working on an SSIS ETL package and I want to know whether it is possible to execute a MySQL stored procedure.
Here is what I want to do: extract information from a SQL Server database with an SSIS ETL and send it to a MySQL database through a stored procedure.
Here is what I have done so far: I get my data from the SQL Server database and transform it.
Here is where I am stuck: I don't know how to execute an existing stored procedure on the MySQL database (my destination).
Here is my ETL (data flow) diagram: [diagram not available]
I have also installed an OLE DB provider on the server and added my destination (the MySQL database), but I don't know what I need to do in my ETL to execute the stored procedure.
I can provide more information if necessary.
Thanks in advance

I am not sure whether you mean that you want to execute a stored procedure for every row that enters SSIS through your Multicast, passing each row in memory. If that is the case, it may be possible with SSIS, but I have never done it.
Why not just send the rows from the Multicast to two separate tables in MySQL, then go back to the Control Flow tab and add two Execute SQL Tasks, one for each stored procedure? Change each stored procedure to run on the tables instead of on individual rows.
You would probably also get a performance boost from switching from row-by-row processing to a set-based operation.
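A minimal sketch of the staging-table pattern, assuming Python's built-in sqlite3 module as a stand-in for the MySQL destination. SQLite has no stored procedures, so the procedure body is represented by a single INSERT ... SELECT; the table names staging_orders and orders are hypothetical. The point is the shape: one bulk load, then one set-based statement, rather than one procedure call per row.

```python
import sqlite3


def load_via_staging(conn, rows):
    """Bulk-load rows into a staging table, then process them with one set-based statement.

    Hypothetical schema; SQLite stands in for the MySQL destination.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS staging_orders (order_id INTEGER, amount REAL)")
    cur.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)")
    cur.execute("DELETE FROM staging_orders")  # SQLite has no TRUNCATE TABLE
    # Data-flow step: bulk insert every incoming row into the staging table.
    cur.executemany("INSERT INTO staging_orders (order_id, amount) VALUES (?, ?)", rows)
    # Execute SQL Task step: one set-based statement replaces a per-row procedure call.
    # Rows whose order_id already exists in orders are skipped.
    cur.execute(
        "INSERT INTO orders (order_id, amount) "
        "SELECT s.order_id, s.amount FROM staging_orders AS s "
        "WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.order_id = s.order_id)"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_via_staging(conn, [(1, 10.0), (2, 5.5)])
    load_via_staging(conn, [(2, 5.5), (3, 7.25)])
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 3
```

In the SSIS version, the bulk load would be the data-flow destination and the INSERT ... SELECT would sit inside the MySQL stored procedure called from an Execute SQL Task.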

Related

MySQL Stored Procedure in SSIS destination

Is it possible to use a MySQL stored procedure as the destination of a Data Flow Task?
SQL Server is the source and a MySQL DB is the destination, and I would like to use a stored procedure to normalize the data. Currently the process uses an ADO NET Destination with an ADO.NET connection manager to insert data into a single table, and it works.
Well, yes, you can use it, but take care with the MySQL connection and the parameters you pass to the stored procedure.
Your current setup and the stored-procedure approach will both work the same way.
You can't use a stored procedure as a destination, but you can use a staging table as a destination and then call a stored procedure that normalizes your data as it sends it to the destination tables.
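A sketch of the "staging table, then normalize" step under stated assumptions: Python's sqlite3 stands in for MySQL, the SQL script plays the role of the normalizing stored procedure, and the tables staging, customers and orders are hypothetical (the caller creates and fills staging). In MySQL the same statements would live in the procedure body, with INSERT IGNORE in place of SQLite's INSERT OR IGNORE.

```python
import sqlite3


def normalize_staging(conn):
    """Split a flat staging table into customers and orders (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS customers (
            customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT UNIQUE
        );
        CREATE TABLE IF NOT EXISTS orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            amount REAL
        );
        -- Add customers that are not known yet (UNIQUE on name skips duplicates).
        INSERT OR IGNORE INTO customers (name)
            SELECT DISTINCT customer_name FROM staging;
        -- Resolve each staged row to its customer key.
        INSERT INTO orders (order_id, customer_id, amount)
            SELECT s.order_id, c.customer_id, s.amount
            FROM staging AS s JOIN customers AS c ON c.name = s.customer_name;
        -- Empty the staging table for the next load.
        DELETE FROM staging;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (order_id INTEGER, customer_name TEXT, amount REAL)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)",
                     [(1, "Ann", 10.0), (2, "Bob", 4.0), (3, "Ann", 2.5)])
    normalize_staging(conn)
    print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```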

No columns returned SSIS

I am implementing an SSIS package and am currently trying to do the following:
1. Truncate the destination table.
2. Fetch the data by executing the stored procedure and insert it into the destination table.
I have created an Execute SQL Task to address step 1, and a data flow with an OLE DB source and an OLE DB destination to address step 2. It has been working so far, but it isn't working for one of my stored procedures, which uses temp tables.
When I edit the OLE DB source and click the Preview button, I get a "no column returned" error.
I know that SSIS has an issue generating columns for stored procedures that depend on temp tables. I have converted the stored procedure to use table variables, and SSIS now returns columns when I do a preview. The only downside is that the stored procedure takes much longer to execute: 1 hour 15 minutes, compared with 15 minutes when using temp tables.
I did see a suggestion to use SET FMTONLY before executing the stored procedure as an alternative to switching to table variables, but that didn't work: I get either a syntax error or a permission-denied error.
Could somebody suggest a solution to my problem that does not compromise performance?
It sounds like you've already read about all the approaches to using temp tables in SSIS, including the IF 1=0 ... trick? If you haven't seen that one yet, google it.
You say that using table variables makes your stored procedure take about 5 times longer than using temp tables. The most likely reason is that you are indexing your temp tables but not your table variables. Table variables can be indexed too (through PRIMARY KEY and UNIQUE constraints, and, from SQL Server 2014, with inline INDEX definitions), so you might try that.
Finally, a solution that you haven't mentioned is that you can replace your temporary table with a real table that gets truncated when you're done using it.
Short comment:
Try EXEC ... WITH RESULT SETS and specify the metadata yourself for a procedure that uses temp tables, or use a Script Component as the source and specify the output columns yourself.
Long comment:
Technically speaking, it is the driver/database you are using in SSIS that decides the behavior when working with temp tables.
Metadata is an important factor when using SSIS pipeline components. By metadata, I mean the names of the columns, their data types, etc. that a pipeline component uses. When designing a data flow, someone or something has to provide this metadata to the components that require it.
In most cases, SSIS retrieves the metadata automatically. Components that do not connect to an external data source, like Conditional Split, get their metadata from the components they are connected to. For pipeline components that do connect to an external data source (like OLE DB Source, OLE DB Destination, Lookup, etc.), SSIS provides a mechanism to get this metadata without human involvement: the driver connects to the database and retrieves the metadata of the output. If the driver/database can return the metadata, that metadata is used. If it cannot, you get errors like the one you are seeing. The rest of my comments assume that you are using a SQL Server database.
When working with a SQL Server database in SSIS, we typically use the native client drivers provided by Microsoft. These drivers try to get the metadata without actually executing the SQL statement (actual execution can have side effects and might take seconds, minutes or hours, and you don't want side effects or long waits at package design time). So, to get the metadata, the driver relies on the metadata of the objects used in the SQL command. If the command uses a physical table or view, SQL Server already has that metadata and can supply it to the driver. If it uses a temp table, SQL Server does not have the metadata until the temp table has been created. With the FMTONLY option, you can arrange the code so that the temp tables are created while heavy processing and side effects are skipped, and the metadata can then be retrieved without penalty. From SQL Server 2012 onward, the native client drivers use newer functionality to retrieve metadata: they call the sp_describe_first_result_set procedure. So whether you can get the metadata is determined by what sp_describe_first_result_set is able to describe.
So, while SSIS can usually get the metadata automatically (because of the driver/database), in some cases it cannot (again because of the driver/database). In those cases, some other process (typically a human) has to help the driver infer the metadata or provide it to the component directly.
To help the driver in SQL Server 2012 and later, you can use the WITH RESULT SETS clause to specify the output metadata. When this clause is present, the driver uses it and does not try to query the metadata from system objects, which avoids the error you would otherwise get. If you are using the drivers that came with SQL Server 2008, you can use FMTONLY. This option is at the driver/database level.
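To make the WITH RESULT SETS clause concrete, here is a small Python helper that assembles the command text you would paste into the OLE DB source (SQL Server 2012 or later). The procedure name dbo.usp_GetOrders and its columns are hypothetical; the column list must match what the procedure actually returns, otherwise the statement fails at run time.

```python
def exec_with_result_sets(proc_name, columns):
    """Build 'EXEC <proc> WITH RESULT SETS ((...))' for SQL Server 2012+.

    columns is a list of (column_name, sql_type) pairs describing the
    procedure's first result set; names are bracket-quoted like QUOTENAME().
    """
    col_defs = ", ".join(
        "[" + name.replace("]", "]]") + "] " + sql_type for name, sql_type in columns
    )
    return f"EXEC {proc_name} WITH RESULT SETS (({col_defs}));"


if __name__ == "__main__":
    print(exec_with_result_sets("dbo.usp_GetOrders",
                                [("OrderId", "INT"), ("Amount", "DECIMAL(10, 2)")]))
    # EXEC dbo.usp_GetOrders WITH RESULT SETS (([OrderId] INT, [Amount] DECIMAL(10, 2)));
```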
Another option is to use a Script Component as the source and define the columns and their metadata under its output columns. SSIS will not try to retrieve metadata from the data source in this case; it relies on the definitions you provided in the output section of the Script Component.
As you can see, both options involve a human (or some other process) specifying the metadata instead of SSIS retrieving it automatically. I would prefer the first option when working with SQL Server and the second when working with databases like MySQL.

Automating tasks on more than one SQL Server 2008 database

We host multiple SQL Server 2008 databases provided by another group. Every so often, they provide a backup of a new version of one of the databases, and we run through a routine of deleting the old one, restoring the new one, and then going into the newly restored database and adding an existing SQL login as a user in that database and assigning it a standard role that exists in all of these databases.
The routine is the same, except that each database has a different name and different logical and OS names for its data and log files. My inclination was to set up an auxiliary database with a table defining the set of names associated with each database, and then create a stored procedure accepting the name of the database to be replaced and the name of the backup file as parameters. The SP would look up the associated logical and OS file names and then do the work.
This would require building the commands as strings and then exec'ing them, which is fine. However, the stored procedure, after restoring a database, would then have to USE it before it would be able to add the SQL login to the database as a user and assign it to the database role. A stored procedure can't do this.
What alternative is there for creating an automated procedure with the pieces filled in dynamically and that can operate cross-database like this?
I came up with my own solution.
Create a job to do the work, specifying that the job should be run out of the master database, and defining one Transact-SQL step for it that contains the code to be executed.
In a utility database created just for the purpose of hosting objects to be used by the job, create a table meant to contain at most one row, whose data will be the parameters for the job.
In that database, create a stored procedure that can be called with the parameters that should be stored for use by the job (including the name of the database to be replaced). The SP should validate the parameters, report any errors, and, if successful, write them to the parameter table and start the job using msdb..sp_start_job.
In the job, for any statement where the job needs to reference the database to be replaced, build the statement as a string and EXECUTE it.
For any statement that needs to run in the database that has been re-created, double its single quotes and pass it as an argument to sp_executesql called through that database's name (@dbName..sp_executesql), so that it runs IN THAT DATABASE; then use EXECUTE to run the whole thing. sp_executesql requires a Unicode string, hence the N prefix:
SET @statement = QUOTENAME(@dbName) + '..sp_executesql N''[statement to execute in database @dbName]''';
EXEC (@statement);
Configure the job to write output to a log file.
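The quoting in the steps above is easy to get wrong, so here is a Python sketch of the text the job's final EXEC should end up running. The database SalesDb, login app_login, user app_user and role app_role are hypothetical. Each single quote inside the inner statement is doubled once, because it sits inside the N'...' literal passed to sp_executesql; if you build this string in T-SQL, as the job does, the outer literal needs one more level of doubling.

```python
def quote_tsql_literal(text):
    """Return text as an N'...' T-SQL literal, doubling embedded single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def quote_tsql_name(name):
    """Bracket-quote an identifier, like QUOTENAME(name) in T-SQL."""
    return "[" + name.replace("]", "]]") + "]"


def build_exec_in_db(db_name, inner_statement):
    """Build EXEC [db]..sp_executesql N'...' so inner_statement runs in db_name."""
    return f"EXEC {quote_tsql_name(db_name)}..sp_executesql {quote_tsql_literal(inner_statement)};"


if __name__ == "__main__":
    # SQL Server 2008 era syntax: create the user, then add it to the role.
    inner = ("CREATE USER [app_user] FOR LOGIN [app_login]; "
             "EXEC sp_addrolemember 'app_role', 'app_user';")
    print(build_exec_in_db("SalesDb", inner))
```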

Querying MySQL and MSSQL databases at the same time

I'm getting data from an MSSQL DB ("A") and inserting it into a MySQL DB ("B") based on the creation date stored in the MSSQL DB. I'm doing it with simple logic, but there's got to be a faster and more efficient way of doing this. Here is the sequence of steps involved:
1. Create one connection for the MSSQL DB and one connection for the MySQL DB.
2. Grab all of the data from A that meets the date range criterion provided.
3. Check to see which of the data obtained are not present in B.
4. Insert these new data into B.
As you can imagine, step 2 is basically a loop, which can easily max out the time limit on the server, and I feel there must be a much faster way of doing this, ideally while the first query is being made. Can anyone point me in the right direction? Can you make "one" connection to both DBs and do something like the following?
SELECT * FROM A.some_table_in_A.some_column WHERE
"it doesn't exist in" B.some_table_in_B.some_column
A linked server might suit this
A linked server allows for access to distributed, heterogeneous queries against OLE DB data sources. After a linked server is created, distributed queries can be run against this server, and queries can join tables from more than one data source. If the linked server is defined as an instance of SQL Server, remote stored procedures can be executed.
Check out this HOWTO as well
If I understand your question right, you're just trying to move things in the MSSQL DB into the MySQL DB. I'm also assuming there is some sort of filter criteria you're using to do the migration. If this is correct, you might try using a stored procedure in MSSQL that can do the querying of the MySQL database with a distributed query. You can then use that stored procedure to do the loops or checks on the database side and the front end server will only need to make one connection.
If the MySQL database has a primary key defined, you can at least skip step 3 ("Check to see which of the data obtained are not present in B"). Use INSERT IGNORE INTO... and it will attempt to insert all the records, silently skipping over ones where a record with the primary key already exists.
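A minimal illustration of the INSERT IGNORE idea, using Python's sqlite3 module, whose INSERT OR IGNORE behaves the same way for primary-key conflicts. The table b_table and its columns are hypothetical; against MySQL the statement would read INSERT IGNORE INTO ... with the driver's own placeholder style. Note that MySQL's INSERT IGNORE also turns some other errors (such as invalid values) into warnings, so it hides more than duplicate keys.

```python
import sqlite3


def copy_new_rows(conn, rows):
    """Insert (id, created) rows into b_table, silently skipping ids that already exist."""
    conn.executemany("INSERT OR IGNORE INTO b_table (id, created) VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE b_table (id INTEGER PRIMARY KEY, created TEXT)")
    copy_new_rows(conn, [(1, "2020-01-01"), (2, "2020-01-02")])
    copy_new_rows(conn, [(2, "2020-01-02"), (3, "2020-01-03")])  # id 2 is skipped
    print(conn.execute("SELECT COUNT(*) FROM b_table").fetchone()[0])  # 3
```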

Grouping SQL queries

Sometimes an application requires quite a few SQL queries before it can do anything useful. I was wondering if there is a way to send those as a batch to the database, to avoid the overhead of going back and forth between the client and the server?
In case there is no standard way to do it, I'm using the Python bindings for MySQL.
PS: I know the MySQL bindings have an executemany() function, but that's only for the same query executed many times with different parameters, right?
This process works best for inserts:
Make all your SQL queries into stored procedures. These will become the child stored procedures.
Create a master stored procedure that runs all the other stored procedures.
Modify the master stored procedure to accept the values required by the child stored procedures.
Modify the master stored procedure to accept commands, and use "if" statements to decide which child stored procedures to run.
If you need data returned from the database, use one stored procedure at a time.
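For the original question about cutting round trips, a client-side option is to send several statements as one script. This sketch uses Python's sqlite3 executescript() as a stand-in because it is in the standard library; MySQL drivers have their own multi-statement options, which differ between drivers, so check your driver's documentation. As in the answer above, this suits writes: executescript() does not return result sets, so queries that return data still run one at a time.

```python
import sqlite3


def run_batch(conn, statements):
    """Run several SQL statements in one executescript() call, all-or-nothing."""
    body = "\n".join(s.strip().rstrip(";") + ";" for s in statements)
    try:
        # executescript() commits any pending transaction first, then runs the
        # whole script; the explicit BEGIN/COMMIT makes the batch atomic.
        conn.executescript("BEGIN;\n" + body + "\nCOMMIT;")
    except sqlite3.Error:
        conn.rollback()  # undo the statements that ran before the failure
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_batch(conn, ["CREATE TABLE t (x INTEGER)",
                     "INSERT INTO t VALUES (1)",
                     "INSERT INTO t VALUES (2);"])
    print(conn.execute("SELECT SUM(x) FROM t").fetchone()[0])  # 3
```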