Get output of tMSSqlRow in Talend

I would like to get the number of rows affected / deleted / updated by a tMSSqlRow.
Here is how the job works:
the file used contains many SQL statements like DELETE ... INSERT ... UPDATE ...
each statement is separated by ";"
Now I would like to get the result of each statement (x rows updated, the way results are displayed in Management Studio).
When I go to the "Advanced settings" tab of tMSSqlRow, I select "Propagate QUERY's recordset" and select a column I created before (Object Type).
On execution, I get this error:
The executeQuery method must return a result set.
So how can I get the result of each statement and insert it (for example) into a database / file?

The option "Propagate QUERY's recordset" must be used in combination with a tParseRecordSet in order to extract info from the returned recordset. However, that is not sufficient: you must explicitly write the query so that it returns the number of records updated/deleted.
Here's what I did:
My tJDBCRow (same as tMSSqlRow) query looks like this (notice how I had to add 'SET NOCOUNT ON' before the update query, and 'SELECT @@ROWCOUNT' after).
tParseRecordSet then retrieves the number of lines from the resultset column (nbLines is the alias of my rowcount).
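The pattern here (run the DML first, then SELECT the affected-row count as a one-row result set) can be sketched outside Talend. Below is a minimal Python/sqlite3 stand-in, where sqlite's changes() plays the role of SQL Server's @@ROWCOUNT; the table and data are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, active INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 1), (2, 1), (3, 0)])

# Run the DML statement first...
conn.execute("UPDATE users SET active = 0 WHERE active = 1")

# ...then SELECT the affected-row count as a one-row result set,
# the way 'SELECT @@ROWCOUNT AS nbLines' does on SQL Server.
nb_lines = conn.execute("SELECT changes() AS nbLines").fetchone()[0]
print(nb_lines)  # 2 rows were updated
```

A downstream consumer (tParseRecordSet in the Talend case) then only has to read the nbLines column out of that single-row result set.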

If you need the number of rows affected, a better option is the tMSSqlOutput component, which can update, insert, or delete rows. After execution, the component provides global variables that show how many rows were affected by the operation.
((Integer)globalMap.get("tMSSqlOutput_1_NB_LINE"))
((Integer)globalMap.get("tMSSqlOutput_1_NB_LINE_UPDATED"))
((Integer)globalMap.get("tMSSqlOutput_1_NB_LINE_INSERTED"))
((Integer)globalMap.get("tMSSqlOutput_1_NB_LINE_DELETED"))

Related

Can SqlAlchemy's array_agg function accept more than one column?

I want to return arrays with data from the entire row (all columns), not just a single column. I can do this with a raw SQL statement in PostgreSQL,
SELECT
array_agg(users.*)
FROM users
WHERE
l_name LIKE 'Br%'
GROUP BY f_name;
but when I try to do it with SqlAlchemy, I'm getting
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'InstrumentedAttribute'
For example, when I execute this query, it works fine
query: Query[User] = session.query(array_agg(self.user.f_name))
But with this I get arrays of rows with only one column value in them (in this example, the first name of a user) whereas I want the entire row (all columns for a user).
I've tried explicitly listing multiple columns, but to no avail. For example I've tried this:
query: Query[User] = session.query(array_agg((self.user.f_name, self.user.l_name)))
But it doesn't work. I get the above error message.
You could use Python argument unpacking to create one array_agg per column:
example = [func.array_agg(column) for column in self.example.__table__.columns]
query = self.dbsession.query(*example)
and afterwards join the results.
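A rough standard-library stand-in for the same idea (sqlite3's group_concat in place of Postgres array_agg; the table, columns, and data are made up): build one aggregate expression per column, then combine them in a single query, which is what the list comprehension plus `*`-unpacking does on the SQLAlchemy side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (f_name TEXT, l_name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Bob", "Brown"), ("Bea", "Brown")])

columns = ["f_name", "l_name"]
# One aggregate per column, analogous to
# [func.array_agg(c) for c in Model.__table__.columns]
aggs = ", ".join(f"group_concat({c})" for c in columns)

row = conn.execute(f"SELECT {aggs} FROM users GROUP BY l_name").fetchone()
print(row)  # one aggregated value per column, e.g. all f_names, all l_names
```

Zipping the per-column arrays back together afterwards recovers the original rows, which is the "join results" step mentioned above.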

How to compare two tables' row counts: if the counts match then OK, if not, restart the SSIS package

I have made an SSIS package in which I built the data flow for incremental data. The source and destination server IPs are different. Below you can find the flow diagrams of my package:
Control flow diagram
Data flow diagram
The package is working fine.
In the Execute SQL Task: it controls the log table and starts the incremental task.
The query I used is:
insert into audit_log (
Packagename,
process_date,
start_datetime,
end_datetime,
Record_processed,
status
)values('CRM-TO-TRANSORGDB',null,GETDATE(),null,null,null);
select MAX(ID) as ID,MAX(process_date) as proc_date from audit_log where Packagename ='CRM-TO-TRANSORGDB' ;
I store the ID and proc_date in variables.
In the Execute SQL Task 1: it just updates the log table.
UPDATE audit_log
SET
process_date=?,
end_datetime = GETDATE(),
status='SUCCESS',
record_processed=?
WHERE (packagename = 'CRM-TO-TRANSORGDB') AND ID=? ;
This is the query we have used to update the log table.
In the Data Flow I simply fetch all the records and put them into the destination table.
That is all I have done.
But my questions are:
1) How do I compare the total row counts of the source table and the destination table in the SSIS package?
2) If they don't match, how do I restart my task automatically?
@Thomas, as per your instructions I have done the following:
1) I made Execute SQL Tasks for the source and the destination.
2) I added the Execute Package Task and added the condition for the counts not matching,
with the expression row_count_src != row_count_dest.
In Source_table_count I used the query below:
select count(SubOrderID) as row_count_src from fact_suborder_journey
WHERE Suborderdate between '2016-06-01' and GETDATE()-1 ;
In dest_table_count I used the query below:
select count(SubOrderID) as row_count_dest from fact_suborder_journey
WHERE Suborderdate between '2016-06-01' and GETDATE()-1 ;
I added the two variables as Int64 in the SSIS package and mapped them in the result set; below you can find a picture of what I have done.
But after doing all this I am getting this error:
[Execute SQL Task] Error: An error occurred while assigning a value to variable "row_count_src": "The type of the value being assigned to variable "User::row_count_src" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object."
I haven't tested this completely, but you might be able to do something like this. This creates a loop of your package and will execute as long as your count variables differ from each other.
What have I done?
First I have a Data Flow Task which moves data from source to destination.
Then I have an Execute SQL Task which counts all rows from TableA and maps the result to variable count1 (the source table).
Then I have an Execute SQL Task which counts all rows from TableB and maps the result to variable count2 (the destination table).
Then I create an Execute Package Task which references the package itself, and I make a precedence constraint with an expression saying count1 != count2.
Because if they are different you want to restart the task. If they are equal, the final Execute Package Task will never be executed.
Hope that helps.
If I understand your challenge correctly...
In the Data Flow Task, use a Row Count transformation between source and destination to capture the rows written to the destination. This will be stored in a variable.
In the control flow, get the max row count available from the log table and store that in a variable.
Create an Execute Package Task that executes this same package, and put a precedence constraint before it that checks whether the variable from step 1 <> the variable from step 2.
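The control flow described in these answers boils down to a retry loop. As a hedged sketch, here is the same logic in plain Python with sqlite3 standing in for the two servers; move_increment plays the role of the Data Flow Task and row_count the role of the two Execute SQL Tasks (all names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER)")
conn.execute("CREATE TABLE dest (id INTEGER)")
conn.executemany("INSERT INTO source VALUES (?)", [(i,) for i in range(5)])

def move_increment():
    # Stand-in for the Data Flow Task: copy rows not yet in dest.
    conn.execute("INSERT INTO dest SELECT id FROM source "
                 "WHERE id NOT IN (SELECT id FROM dest)")

def row_count(table):
    # Stand-in for the Execute SQL Tasks that fill count1 / count2.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

move_increment()
# Precedence constraint: re-run the "package" while count1 != count2.
while row_count("source") != row_count("dest"):
    move_increment()
```

In SSIS the `while` is realized by the Execute Package Task calling the package itself, guarded by the count1 != count2 expression, so the recursion stops as soon as the counts match.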

Find out Total Number of Rows affected by SQL Command Variable in Data Flow Task

I have a SQL command from a variable (in general it is a SELECT statement) as the source in a Data Flow Task.
The destination is a .csv file.
Problem: even when no rows are returned by the SQL command variable, the .csv file is still generated, just without records. I don't want to generate the file if the SELECT statement (from the SQL command variable) returns no records.
Please advise.
A simple procedure:
you could count the rows with a query before the export, using an Execute SQL Task; if the number of rows is greater than 0 then proceed with the export.
The following is a possible solution:
use a query like SELECT COUNT(*) AS MYCOUNT FROM ...
use a package variable (myVariable, to associate with MYCOUNT) to contain the number of rows
set Result Set = Single Row in the SQL Task Editor
map the variable in the Result Set tab of the SQL Task Editor (MYCOUNT - myVariable)
use two arrows from the Execute SQL Task; for each arrow choose Evaluation operation: Expression, with Expression: myVariable > 0 (first arrow) and myVariable == 0 (second arrow), and choose Logical OR; in this way you have a bifurcation
connect the export to the arrow with myVariable > 0
connect the other arrow to another possible task, for example one that warns you by email that there are no rows
For counting rows you can also use the Row Count task (present in the latest SSIS versions); the Row Count transformation counts rows as they pass through a data flow and stores the final count in a variable.
I hope it helps.
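The same gate can be sketched in a few lines of Python, with sqlite3 and csv as stand-ins for the SSIS source and flat-file destination (the table, filter, and file name are made up): count first, export only when the count is greater than zero.

```python
import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

query = "SELECT id, amount FROM orders WHERE amount > 100"

# Step 1: the Execute SQL Task equivalent - a COUNT over the same filter.
mycount = conn.execute(f"SELECT COUNT(*) AS MYCOUNT FROM ({query})").fetchone()[0]

# Step 2: the precedence-constraint equivalent - export only if MYCOUNT > 0.
if mycount > 0:
    with open("export.csv", "w", newline="") as f:
        csv.writer(f).writerows(conn.execute(query))
else:
    print("no rows - file not generated")
```

With an empty table the count is 0, so the file is never created, which is exactly the behavior the question asks for.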

Number of rows incorrectly retrieved with executeUpdate() statement

I want to print the number of rows retrieved from an executeUpdate() statement in JDBC (MySQL database). The code I have till now is this:
int rows=0;
//constructor sets this up like opening connection, etc...
String buildSelectQuery = buildSelectQueryForCode();
stmt = connection.createStatement();
rows= stmt.executeUpdate(buildSelectQuery); <---- mismatch
System.out.println(rows);
Where buildSelectQuery is CREATE VIEW viewName AS (SELECT * FROM tableName WHERE gen-congruence>1). There is a getRows method as well in the class:
public String getRows(){
return Integer.toString(rows);
}
Now, this query should ideally pull out over 2000 records, and it does in the view (in the database), but getRows (which is called in the GUI) prints out an incorrect number of rows (0) and I have no idea why. Is there another way to set up the result set? Am I doing something wrong? Please help me.
Your query is creating a view, not selecting from it, so no rows are returned: executeUpdate returns the update count of a DML statement, and for DDL such as CREATE VIEW it returns 0. You need to update rows when rows are actually read, i.e. run a SELECT against the view and count the results.
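The behavior is easy to reproduce. Here is a minimal Python/sqlite3 analogue (column names and data are invented for the sketch): the CREATE VIEW statement reports no rows, just like executeUpdate() returning 0 for DDL in JDBC, and the real count has to come from a SELECT against the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (gen INTEGER, congruence INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(5, 1), (7, 2), (1, 3)])

# DDL: creating the view touches no rows, which is why an update count
# taken from this statement is 0.
conn.execute("CREATE VIEW v AS SELECT * FROM t WHERE gen - congruence > 1")

# To know how many rows the view holds, SELECT from it and count.
rows = conn.execute("SELECT COUNT(*) FROM v").fetchone()[0]
print(rows)  # 2: the two rows where gen - congruence > 1
```

In the JDBC code from the question, the fix is the same shape: after executing the CREATE VIEW, issue a second query such as SELECT COUNT(*) FROM viewName via executeQuery and read the count from its ResultSet.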

Query not working in execute SQL task in the ssis package

This query works fine in the query window of SQL Server 2005, but throws an error when I run it in an Execute SQL Task in the SSIS package.
declare @VarExpiredDays int
Select @VarExpiredDays= Value1 From dbo.Configuration(nolock) where Type=11
DECLARE @VarENDDateTime datetime,@VarStartDateTime datetime
SET @VarStartDateTime= GETDATE()- @VarExpiredDays
SET @VarENDDateTime=GETDATE();
select @VarStartDateTime
select @VarENDDateTime
SELECT * FROM
(SELECT CONVERT(Varchar(11),@VarStartDateTime,106) AS VarStartDateTime) A,
(SELECT CONVERT(Varchar(11),@VarENDDateTime,106) AS VarENDDateTime) B
What is the issue here?
Your intention is to retrieve the values of start and end and assign those into SSIS variables.
As @Diego noted above, those two SELECTs are going to cause trouble. With the Execute SQL Task, your resultset options are None, Single Row, Full resultset and XML. Discarding the XML option because I don't want to deal with it, and None because we want rows back, our options are Single or Full. We could use Full, but then we'd need to return values of the same data type and the processing gets much more complicated.
By process of elimination, that leads us to using a resultset of Single Row.
Query aka SQLStatement
I corrected the supplied query by simply removing the two aforementioned SELECTs. The final select can be simplified to the following (no need to put them into derived tables):
SELECT
CONVERT(Varchar(11),@VarStartDateTime,106) AS VarStartDateTime
, CONVERT(Varchar(11),@VarENDDateTime,106) AS VarENDDateTime
Full query used below
declare @VarExpiredDays int
-- I HARDCODED THIS
Select @VarExpiredDays= 10
DECLARE @VarENDDateTime datetime,@VarStartDateTime datetime
SET @VarStartDateTime= GETDATE()- @VarExpiredDays
SET @VarENDDateTime=GETDATE();
/*
select @VarStartDateTime
select @VarENDDateTime
*/
SELECT * FROM
(SELECT CONVERT(Varchar(11),@VarStartDateTime,106) AS VarStartDateTime) A,
(SELECT CONVERT(Varchar(11),@VarENDDateTime,106) AS VarENDDateTime) B
Verify the Execute SQL Task runs as expected. At this point, it simply becomes a matter of wiring up the outputs to SSIS variables. As you can see in the results window below, I created two package level variables StartDateText and EndDateText of type String with default values of an empty string. You can see in the Locals window they have values assigned that correspond to @VarExpiredDays = 10 in the supplied source query.
Getting there is simply a matter of configuring the Result Set tab of the Execute SQL Task. The hardest part of this is ensuring you have a correct mapping between source system type and SSIS type. With an OLE DB connection, the Result Name has no bearing on what the column is called in the query. It is simply a matter of referencing columns by their ordinal position (0 based counting).
Final thought, I find it better to keep things in their base type, like a datetime data type and let the interface format it into a pretty, localized value.
You have more than one output type: you have two variables and one query.
You need to select only one in the "resultset" property.
Are you mapping these to the output parameters?
select @VarStartDateTime
select @VarENDDateTime