I want to run some setup SQL before the content of my report is processed and then, at the end, run some cleanup SQL - e.g. some ALTER statements at the beginning and statements that revert the ALTER at the end.
These should run per report, and users will be accessing the reports via the report server's web URL. Can these statements be configured in the report definition (.rdl) file using BIDS, or can I configure this on the SSRS server side or in the underlying database? And how?
First I should say that you may not have the best process if you need to ALTER a table back and forth for a query, but I know that crazy stuff is sometimes necessary.
You can add DDL statements to your dataset query.
Here's a query for a dataset I have that creates a temp table and runs some other processing before SELECTing the data needed.
CREATE TABLE #TEMP_CENSUS(
    GEO_DATA GEOMETRY NOT NULL,
    VALUE DECIMAL(12, 4) NOT NULL DEFAULT 0,
    NAME NVARCHAR(50) NULL,
    GEO NVARCHAR(250) NULL ) ON [PRIMARY]

INSERT INTO #TEMP_CENSUS(GEO_DATA, VALUE, NAME)
EXEC dbo.CreateHeatMap 20, 25, ...
Unfortunately, you also want operations to run after your data is selected. For the reverting ALTER statements, you would create another dataset, using the same data source, that contains those statements.
In your data source, check the Use Single Transaction box so that the datasets are executed in order (as they appear in the dataset list): your first dataset will ALTER the tables you need and then SELECT your data, and the second query will run afterwards to un-ALTER (re/de-ALTER?) the tables. You may need to add a SELECT of some sort to the second dataset query so it returns some data and SSRS doesn't freak out - I haven't had to run any DDL without returning data (yet).
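As a minimal sketch (the table and column names here are hypothetical, not from the question), the two dataset queries could look like this:

-- Dataset 1: run the setup DDL, then return the report data
ALTER TABLE dbo.SomeTable ADD TempFlag BIT NULL;  -- hypothetical setup statement

SELECT Col1, Col2
FROM dbo.SomeTable;

-- Dataset 2: revert the DDL and return a dummy row so SSRS has something to bind to
ALTER TABLE dbo.SomeTable DROP COLUMN TempFlag;

SELECT 1 AS Done;

With Use Single Transaction checked, dataset 1 runs before dataset 2 because of their order in the dataset list.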
I want to be able to update a specific column of a table using data from another table. Here's what the two tables look like, along with the DB type and the SSIS component used to read each table (by the way, both ID and Code are unique).
Table1(ID, Code, Description) [T-SQL DB accessed using ADO NET Source component]
Table2(..., Code, Description,...) [MySQL DB accessed using ODBC Source component]
I want to update the column Table1.Description using Table2.Description, matching the rows on Code first (because Table1.Code corresponds to Table2.Code).
What I tried:
Doing a Merge Join transformation on the Code column, but I couldn't figure out how to reinsert the result, because Table1 has relationships and I can't simply drop the table and replace it with the new one.
Using a Lookup transformation, but since the two tables are not of the same type it didn't let me create the lookup table's connection manager (which in my case would be MySQL).
I'm still new to SSIS, but any ideas or help would be greatly appreciated.
My solution is based on @Akina's comments. Although a linked server would definitely have fit, my requirement was to build an SSIS package to take care of migrating some old data.
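For comparison only, a rough sketch of what the linked-server route could look like (the linked server name MYSQL_LINKED and the MySQL schema name are hypothetical); the SSIS package described below is what I actually used:

UPDATE T1
SET T1.Description = T2.Description
FROM dbo.Table1 AS T1
INNER JOIN OPENQUERY(MYSQL_LINKED,
    'SELECT Code, Description FROM mydb.Table2') AS T2
    ON T1.Code = T2.Code;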
The first and last tasks are Execute SQL Tasks, while Migrate ICDDx is the Data Flow Task (DFT) that transfers the data to a staging table created by the first SQL task.
Here are the SQL commands that get executed during Create Staging Table:
DROP TABLE IF EXISTS [tempdb].[##stagedICDDx];
CREATE TABLE ##stagedICDDx (
    ID INT NOT NULL,
    Code VARCHAR(15) NOT NULL,
    Description NVARCHAR(500) NOT NULL,
    ........
);
And here's the SQL command (based on @Akina's comment) for transferring from staged to final (inside Transfer Staged):
UPDATE [MyDB].[dbo].[ICDDx]
SET [ICDDx].[Description] = [##stagedICDDx].[Description]
FROM [dbo].[##stagedICDDx]
WHERE [ICDDx].[Code]=[##stagedICDDx].[Code]
GO
Here's the DFT used (both the T-SQL and MySQL sources return sorted output using ORDER BY Code, so I didn't have to insert Sort components before the Merge Join):
Note: By the way, you have to set up the connection manager to retain/reuse the same connection (RetainSameConnection = True) so that the global temporary table doesn't get deleted before we transfer data into it. If all goes well, then after the Transfer Staged SQL Task the connection is closed and the global temporary table is deleted.
I have a simple data flow in SSIS (defined in Visual Studio 2013) which uses SQL to extract data from table A on one SQL Server instance and then add it to table B on another SQL Server instance.
What is the best-practice pattern for truncating the data in table B? A truncate statement like this:
TRUNCATE TABLE B
after the SELECT statement for table A - especially when you have a fairly big table to 'transmit'?
One thing I have done in cases like that is to create two copies of the same table, plus a view that has the name of the 'current' table and points to one copy or the other.
The SSIS package then determines which table is in use and sets the connection for the table to populate to the other one.
Then an Execute SQL Task truncates the table not currently in use. You may also want to drop any indexes at this point.
Then a data flow populates the table not currently in use.
Then recreate any indexes you dropped.
Finally, an Execute SQL Task drops and recreates the view so it uses the table you just populated instead of the other one.
Total down time of the table being referenced? Generally less than a second for the drop and create view no matter how long it takes to populate the table.
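A rough sketch of that final swap, with hypothetical object names (TableB_1 and TableB_2 as the two copies, TableB as the view everything else references):

-- Truncate and repopulate the copy that is NOT currently behind the view
TRUNCATE TABLE dbo.TableB_2;
-- (the data flow loads dbo.TableB_2 here; indexes are recreated afterwards)

-- Swap the view to the freshly loaded copy
IF OBJECT_ID('dbo.TableB', 'V') IS NOT NULL
    DROP VIEW dbo.TableB;
GO
CREATE VIEW dbo.TableB
AS
SELECT * FROM dbo.TableB_2;
GO

In an Execute SQL Task the drop and create would run as separate statements rather than with the GO batch separator.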
I am not a DBA, but I do work for a small company as the IT person. I have to replicate a database from staging to production. I have created an SSIS package to do this, but it takes hours to run. This isn't a large data-warehouse-type project either; it's a pretty straightforward upsert. I'm assuming that I am the weak link in how I designed it.
Here's my procedure:
Truncate staging tables (EXECUTE SQL TASK)
Pull data from a development table into staging (Data Flow Task)
Run a data flow task
OLE DB Source
Conditional Split Transformation (Condition used: [!]ISNULL(is_new_flag))
If new insert, if existing update
The data flow task is mimicked a few times to change tables/values, but the flow is the same. I've read several things, from OLE DB components being slow to updates being slow, and have tried a few things but haven't gotten it to run very quickly.
I'm not sure what other details to give, but I can give anything that's asked for.
Sample package using SSIS 2008 R2 that inserts or updates using batch operations:
Here is a sample package written in SSIS 2008 R2 that illustrates how to perform inserts and updates between two databases using batch operations.
Using the OLE DB Command transformation will slow down the update operations in your package because it does not perform batch operations; every row is updated individually.
The sample uses two databases named Source and Destination. In my example, both databases reside on the same server, but the logic can still be applied to databases residing on different servers and locations.
I created a table named dbo.SourceTable in my source database Source.
CREATE TABLE [dbo].[SourceTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL,
    [IsActive] [bit] NULL
)
I also created two tables named dbo.DestinationTable and dbo.StagingTable in my destination database Destination.
CREATE TABLE [dbo].[DestinationTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL
)
GO

CREATE TABLE [dbo].[StagingTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL
)
GO
I inserted about 1.4 million rows into the table dbo.SourceTable, with unique values in the RowNumber column. The tables dbo.DestinationTable and dbo.StagingTable were empty to begin with. All the rows in dbo.SourceTable initially have the flag IsActive set to false.
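For reference, a sketch of one way such test rows could be generated (this is not necessarily how the original data was loaded):

;WITH Numbers AS (
    SELECT TOP (1400000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNumber
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.SourceTable (RowNumber, CreatedOn, ModifiedOn, IsActive)
SELECT RowNumber, GETDATE(), GETDATE(), 0
FROM Numbers;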
I created an SSIS package with two OLE DB connection managers, connecting to the Source and Destination databases respectively, and designed the Control Flow as described below.
The first Execute SQL Task executes the statement TRUNCATE TABLE dbo.StagingTable against the destination database to empty the staging table.
The next section explains how the Data Flow Task is configured.
The second Execute SQL Task executes the SQL statement given below, which updates data in dbo.DestinationTable using the data available in dbo.StagingTable, assuming that there is a unique key matching rows between those two tables. In this case, the unique key is the column RowNumber.
Script to update:
UPDATE D
SET D.CreatedOn = S.CreatedOn
, D.ModifiedOn = S.ModifiedOn
FROM dbo.DestinationTable D
INNER JOIN dbo.StagingTable S
ON D.RowNumber = S.RowNumber
I designed the Data Flow Task as described below.
OLE DB Source reads data from dbo.SourceTable using the SQL command SELECT RowNumber, CreatedOn, ModifiedOn FROM Source.dbo.SourceTable WHERE IsActive = 1
Lookup transformation is used to check if the RowNumber value already exists in the table dbo.DestinationTable
If the record does not exist, it will be redirected to the OLE DB Destination named Insert into destination table, which inserts the row into dbo.DestinationTable.
If the record exists, it will be redirected to the OLE DB Destination named Insert into staging table, which inserts the row into dbo.StagingTable. The data in the staging table will then be used by the second Execute SQL Task to perform the batch update.
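The Lookup transformation in a setup like this would typically be set to redirect rows with no match to its no-match output and to use a query along these lines (the exact configuration is not shown here, so treat this as a sketch):

SELECT RowNumber
FROM dbo.DestinationTable;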
To activate some rows for the OLE DB Source to pick up, I ran the query below:
UPDATE dbo.SourceTable
SET IsActive = 1
WHERE (RowNumber % 9 = 1)
OR (RowNumber % 9 = 2)
The first execution of the package went as follows: all the rows were directed to the destination table because it was empty. The execution of the package on my machine took about 3 seconds.
I then ran the row count query again to check the row counts in all three tables.
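The row count query itself isn't shown above; a simple version of it might look like this:

SELECT 'SourceTable' AS TableName, COUNT(*) AS TotalRows FROM Source.dbo.SourceTable
UNION ALL
SELECT 'DestinationTable', COUNT(*) FROM Destination.dbo.DestinationTable
UNION ALL
SELECT 'StagingTable', COUNT(*) FROM Destination.dbo.StagingTable;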
To activate a few more rows for the OLE DB Source, I ran the query below:
UPDATE dbo.SourceTable
SET IsActive = 1
WHERE (RowNumber % 9 = 3)
OR (RowNumber % 9 = 5)
OR (RowNumber % 9 = 6)
OR (RowNumber % 9 = 7)
The second execution of the package went as follows: the 314,268 rows that had previously been inserted during the first execution were redirected to the staging table, and 628,766 new rows were inserted directly into the destination table. The execution of the package on my machine took about 12 seconds. Those 314,268 rows in the destination table were then updated by the second Execute SQL Task using the staging table data.
I ran the row count query again to check the row counts in all three tables.
I hope that gives you an idea to implement your solution.
The two things I'd look at are your inserts (ensure you are using either the "Table or View - fast load" or "Table name or view name variable - fast load") and your updates.
As you have correctly determined, the update logic is usually where performance falls down, and that is due to the OLE DB Command component firing singleton updates for each row flowing through it. The usual approach to overcome this is to write all the updates to a staging table, much as your insert logic does, and then follow your Data Flow Task with an Execute SQL Task that performs a bulk UPDATE.
If you are of a mind to acquire third-party tools, Pragmatic Works offers an Upsert destination.
Here is a chunk of the SQL I'm using in a Perl-based web application. I have a number of requests, each request has a number of accessions, and each accession has a status. This chunk of code updates the table for every accession_analysis that shares all these fields, for each accession in a request.
UPDATE accession_analysis
SET analysis_id = ? ,
reference_id = ? ,
status = ? ,
extra_parameters = ?
WHERE analysis_id = ?
AND reference_id = ?
AND status = ?
AND extra_parameters = ?
AND accession_id IN (
    SELECT accession_id
    FROM accessions
    WHERE request_id = ?
)
I have changed the tables so that there's a status table for accession_analysis, so when I update, I now update both accession_analysis and accession_analysis_status, which has status, status_text, and the id of the accession_analysis (a NOT NULL auto_increment column).
I have no strong idea of how to modify this code to allow that. My first pass grabbed all the accessions and looped through them, filtered on all the fields, then updated. I didn't like that because it meant many connections with short SQL commands, which I understood to be bad, but I can't help thinking that the only way to really do this is to go back to a loop in Perl holding two simpler SQL statements.
Is there a way to do this in SQL that, with my relative SQL inexperience, I'm just not seeing?
The answer depends on which DBMS you're using. The easiest way is to create a trigger on one table that provides the logic of updating the other table. (For any DB newbies - a trigger is procedural code attached to a table at the DBMS (not application) layer that runs in response to an insert, update or delete on the table.) A similar, slightly less desirable method is to put the logic in a stored procedure and execute that instead of the update statement you're now using.
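To illustrate the trigger idea in MySQL (which the question appears to be using), something along these lines could keep the status table in sync; the column names are guesses based on the question, not the actual schema:

DELIMITER //
CREATE TRIGGER accession_analysis_status_sync
AFTER UPDATE ON accession_analysis
FOR EACH ROW
BEGIN
    -- Column names here are assumptions; adjust them to the real schema
    UPDATE accession_analysis_status
    SET status = NEW.status
    WHERE accession_analysis_id = NEW.id;
END//
DELIMITER ;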
If the DBMS you're using doesn't support either of these mechanisms, then there isn't a good way to do what you're after while guaranteeing transactional integrity. However, if the problem you're solving can tolerate a timing difference between the two tables' updates (i.e., the data in one of the tables is only used at predetermined times, like reporting or some type of batched operation), you could write to one table (live) and create a separate process that runs when needed (later) to update the second table using data from the first. The correctness of allowing data to be updated at different times becomes a large and immovable design assumption, however.
If this is mostly about connection speed, then one option you have is to write a stored procedure that handles the "double update or insert" transparently. See the manual for stored procedures:
http://dev.mysql.com/doc/refman/5.5/en/create-procedure.html
Otherwise, you probably cannot do it in one statement; see the MySQL INSERT syntax:
http://dev.mysql.com/doc/refman/5.5/en/insert.html
The UPDATE syntax allows for multi-table updates (not in combination with INSERT, though):
http://dev.mysql.com/doc/refman/5.5/en/update.html
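For instance, a multi-table UPDATE along these lines would touch both tables in a single statement (the join column accession_analysis_id is an assumption about your schema, not something taken from the question):

UPDATE accession_analysis aa
JOIN accession_analysis_status s
    ON s.accession_analysis_id = aa.id
SET aa.status = ?,
    s.status = ?,
    s.status_text = ?
WHERE aa.analysis_id = ?;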
Each table needs its own INSERT / UPDATE in the query.
In fact, even if you create a view by JOINing multiple tables, when you INSERT into the view, you can only INSERT with fields belonging to one of the tables at a time.
The modifications made by the INSERT statement cannot affect more than one of the base tables referenced in the FROM clause of the view. For example, an INSERT into a multitable view must use a column_list that references only columns from one base table. For more information about updatable views, see CREATE VIEW.
Inserting data into multiple tables through an sql view (MySQL)
INSERT (SQL Server)
The same is true of UPDATE:
The modifications made by the UPDATE statement cannot affect more than one of the base tables referenced in the FROM clause of the view. For more information on updatable views, see CREATE VIEW.
However, you can have multiple INSERTs or UPDATEs per query or stored procedure.
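As a sketch of that last point, a stored procedure could wrap the two UPDATEs in a single call (the procedure name, parameters, and column names here are illustrative, not taken from the question's schema):

DELIMITER //
CREATE PROCEDURE update_analysis_and_status (
    IN p_analysis_id INT,
    IN p_status VARCHAR(50),
    IN p_status_text TEXT
)
BEGIN
    UPDATE accession_analysis
    SET status = p_status
    WHERE analysis_id = p_analysis_id;

    UPDATE accession_analysis_status
    SET status = p_status,
        status_text = p_status_text
    WHERE accession_analysis_id = p_analysis_id;
END//
DELIMITER ;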
I need to create a SQL Server job.
Step 1:
Insert a row into the TaskToProcess table and return ProcessID (PK and identity).
Step 2:
Retrieve the ProcessID generated in step 1, pass the value to an SSIS package, and execute the SSIS package.
Is this possible in a SQL Server job?
Please help me on this
Thanks in advance.
There is no built-in method of passing variable values between job steps. However, there are a couple of workarounds.
One option would be to store the value in a table at the end of step 1 and query it back from the database in step 2.
It sounds like you are generating ProcessID by inserting into a table and returning the SCOPE_IDENTITY() of the inserted row. If job step 1 is the only process inserting into this table, you can retrieve the last inserted value in job step 2 using the IDENT_CURRENT('<tablename>') function.
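A sketch of what that could look like (the TaskToProcess column used here is purely illustrative):

-- Job step 1: create the row; ProcessID is generated by the identity column
INSERT INTO dbo.TaskToProcess (RequestedOn)  -- hypothetical column
VALUES (GETDATE());

-- Job step 2: recover the last identity value and hand it to the SSIS package,
-- e.g. via a dtexec /SET option on a package variable
DECLARE @ProcessID INT = IDENT_CURRENT('dbo.TaskToProcess');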
EDIT
If multiple processes could insert into your process control table, the best solution is probably to refactor steps 1 and 2 into a single step, possibly with a controlling SSIS master package (or other equivalent technology) which can pass the variables between steps.
Similar to Ed Harper's answer, but with some details found in the "Variables in Job Steps" MSDN forum thread:
For the job environment, some flavor of Process-Keyed Tables (using the job_id) or Global Temporary Tables seems most useful. Of course, I realize that you might not want to have something left 'globally' available. If necessary, you could also look into encrypting or obfuscating the value that you store. Be sure to delete the row once you have used it.
Process-keyed tables are described in the article "How to Share Data between Stored Procedures".
Another suggestion, from the "Send parameters to SQL server agent jobs/job steps" MSDN forum thread, is to create a table to hold the parameters, such as:
CREATE TABLE SQLAgentJobParms
(
    job_id             uniqueidentifier,
    execution_instance int,
    parameter_name     nvarchar(100),
    parameter_value    nvarchar(100),
    used_datetime      datetime NULL
);
Your calling stored procedure would take the parameters passed to it and insert them into SQLAgentJobParms. After that, it could use EXEC sp_start_job. And, as already noted, the job steps would select from SQLAgentJobParms to get the necessary values.
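A minimal sketch of that flow (the job name, parameter name, and values are placeholders):

DECLARE @job_id uniqueidentifier =
    (SELECT job_id FROM msdb.dbo.sysjobs WHERE name = N'MyJob');

-- Caller: record the parameter, then start the job
INSERT INTO dbo.SQLAgentJobParms (job_id, execution_instance, parameter_name, parameter_value)
VALUES (@job_id, 1, N'ProcessID', N'12345');

EXEC msdb.dbo.sp_start_job @job_name = N'MyJob';

-- Inside a job step: read the parameter back (and delete the row once it has been used)
SELECT parameter_value
FROM dbo.SQLAgentJobParms
WHERE parameter_name = N'ProcessID';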