I want to compare two row counts for two tables from two different connexions.
I've tried to get the number of rows for each different table by executing
Select count(*) as count1 from Table1
and
Select count(*) as count2 from Table2
in two different Execute SQL Script steps, as in the following screenshot, but I have no idea how to proceed.
Especially, I want to get the two different counts and compare them then branch with success/failure on whether they're equal or not, respectively.
How can I achieve this ?
It's pretty easy. There is a step called Evaluate rows number in a table. This both gets a row count from a table and tests it against a value. The value can come from a variable in the Job (note Job, not Transform).
So all you need to do is create a variable with a Set variables task, get the row count from one of your tables and then execute the Evaluate rows task. The following job will do just that.
The transform to get the row count for the other table is very simple. Just execute a SELECT COUNT(*) FROM {tblname} in a Table input step and flow the output to a Set variables step in a transform. Be sure to mark the variable as valid in the parent job.
You can also execute an SQL against a connection with the JavaScript step which would avoid creating the transform, but I prefer to avoid scripting when possible.
You can use Table Compare step, everything you need is here, regards
Related
Can i use another column in InList clause?
Example,
i have created a variable and below is the formula.
IF [query1.column1] inList ([query2.column2]) then SUM([query1.amountColumn])
Else 0
OR is it possible to put variable after inList in formula?
If not possible -- is there any other alternative to this?
I see two possible approaches. I will to use the eFashion universe for both solutions.
Solution #1
Here are my 2 queries to begin...
Run your queries. Click on the columns you want to compare, [query1].[column1] and [query2].[column2] in your case; [Query 1].[Month] and [Query 2].[Month] for me. Right-click and merge them. They must be dimensions and of the same data type.
Now create a variable based on [Query 2].[Month Name] which you can filter on to eliminate the results from Query 1 that do not match up to anything in Query 2.
[UV Month Name]=[Query 2].[Month Name]
The key here is you need to change the Qualification to "Detail" and set the Associated Dimension to what we just merged by clicking three dots to the right. Choose [Month Name] not from either query, but the merged dimension.
Now build out your table with whatever object you want from Query 1 and add in the variable we just created.
Now add a filter on that variable to only show row where it is not null.
And you are done.
Pros
Works when limiting query (query2) has a relatively large number of values (compare to Cons for Solution #2).
Cons
More complicated to set up
May run into universe or performance issues related to query being filtered (query1).
Solution #2
Building upon Solution #1, I duplicated Query 1 and renamed it Query 3. Now you can choose "Results from another query" to get the [query1].[column1] InList ([query2].[column2]) logic you want.
If you take this approach then you don't need to do the merge, variable, and filter. The results of the query are filter before being returned by the report.
Pros
Simple
Cons
The number of values coming from your second query must be relatively small. It varies by database or maybe even your universe. I have found if it is over 1,000 values I get an error when I run the query that it is "too complex".
I have data from two different source locations that need to be combined into one. I am assuming I would want to do this with a merge or a merge join, but I am unsure of what exactly I need to do.
Table 1 has the same fields as Table 2 but the data is different which is why I would like to combine them into one destination table. I am trying to do this with SSIS, but I have never had to merge data before.
The other issue that i have is that some of the data is duplicated between the two. How would I only keep 1 of the duplicated records?
Instead of making an entirely new table which will need to be updated again every time Table 1 or 2 changes, you could use a combination of views and UNIONs. In other words create a view that is the result of a UNION query between your two tables. To get rid of duplicates you could group by whatever column uniquely identifies each record.
Here is a UNION query using Group By to remove duplicates:
SELECT
MAX (ID) AS ID,
NAME,
MAX (going)
FROM
(
SELECT
ID :: VARCHAR,
NAME,
going
FROM
facebook_events
UNION
SELECT
ID :: VARCHAR,
NAME,
going
FROM
events
) AS merged_events
GROUP BY
NAME
(Postgres not SSIS, but same concept)
Instead of Merge and Sort , Use union all Sort. because Merge transform need two sorted input and performance will be decreased
1)Give Source1 & Source2 as input to UnionALL Transformation
2) Give Output of UnionALL transfromation to Sort transformation and check remove duplicate keys.
This sounds like a pretty classic merge. Create your source and destination connections. Put in a Data Flow task. Put both sources into the Data Flow. Make sure the sources are both sorted and connect them to a Merge. You can either add in a Sort transformation between the connection and the Merge or sort them using a query when you pull them in. It's easier to do it with a query if that's possible in your situation. Put a Sort transformation after the Merge and check the "Remove rows with duplicate sort values" box. That will take care of any duplicates you have. Connect the Sort transformation to the data destination.
You can do this without SSIS, too.
I want to create this query with ssis component:
SELECT TOP 1 Date_Execution
FROM Action
WHERE Action_Ref=1
ORDER BY Date_Execution
Can I do that without using script component?
Update
This query must be included in a flow task. I need it to calculate a field of a table I'm creating.
Use the Aggregate transform. If you insist on starting with the full table contents then use a Conditional Split transform to get those rows that meet your WHERE clause, then use the Aggregate transform to get the minimum Date_Execution from the remaining rows.
Or just be reasonable about it and use a SQL execute task. The database server will be much more efficient than SSIS at getting this result.
I have joined 5 tables and done transformation on these tables. Now I got a single table at the end. Now I want to perform sql query on this single table to filter records. But I don't know how to perform simple sql query on this table. I have attached a snap shot which shows the resulting table. How I get this resulting data set as the source? I want to populate my destination after filter out this data.
I am using SSIS 2008.
Click here to see the Table on which I want to perform a simple sql query
SELECT * FROM `first_table`
where `some_column` =
(
SELECT `*`
FROM second_table
WHERE
`some_column2`='something'
LIMIT 1
)
Try this code This will help. You can even use this to connect all those four tables with each other.
From the image you posted, it looks like you have a set of data in the dataflow you're trying to query against. You need to do one of two things at this point. Either you insert the data into a table in the database and use another data flow to query it, or you use use a conditional split (or multicast and conditional splits) to filter the rows down further from there.
Without more detail about what you're actually trying to accomplish, these are the recommendations I can determine.
You could send the rows into a record set destination, but you aren't able to query it like a regular table and you'd need some C#/VB skills to access it to do more than a FOR EACH loop.
Assuming your sql query that you want to run against the resulting table is simple, you can use a script component task. By simple, I mean, if it is of this nature:
SELECT * FROM T WHERE a = 'zz' and b = 'XX' etc.
However, if your query has self joins, then you would be better of dumping the outcome of joining those 5 tables in to a physical table, and go from there.
It appears that query is going to be real straight-forward; in that case using a script component would be helpful.
A separate question: It's advisable to do the sorting at the database level. You are using 5 sort tasks in your solution. Can you please elucidate the reason?
I need to do a count on the items in a joined result set where a condition is true. I thus have a "from join where where" type of expression. This expression must end with a select or groupby. I do not need the column data actually and figure it is thus faster not to select it:
count = (from e in dc.entries select new {}).Count();
I have 2 questions:
Is there a faster way to do this in terms of the db load?
I have to duplicate my entire copy of the query. Is there a way to structure my query where I can have it one place for both counts and for getting say a list with all fields?
Thanks.
Please pay especial attention:
The query is a join and not a simple table thus I must use a select statement.
I will need 2 different query bodies because I do not need to load all the actual fields for the count but will for the list.
I assume when I use the select query it is filling up with data when I use query.Count vs Table.Count. Look forward to those who understand what I'm asking for possible better ways to do this and some detailed knowledge of what actually happens. I need to pull out the logging to look into this deeper.
Queryable.Count
The query behavior that occurs as a
result of executing an expression tree
that represents calling
Count(IQueryable)
depends on the implementation of the
type of the source parameter. The
expected behavior is that it counts
the number of items in source.
In fact, if you use LinqToSql or LinqToEntities, Queryable.Count() is sent into the database. No columns are loaded to memory. Check the generated sql to confirm.
I assume when I use the select query it is filling up with data when I use query.Count vs Table.Count
This is not true. Check the generated sql to confirm.
I have to duplicate my entire copy of the query. Is there a way to structure my query where I can have it one place for both counts and for getting say a list with all fields
If you need both the count and the list, get the list and count it.
If you need the count sometimes and other times you need the list... write a method that returns the complex IQueryable, and sometimes call .Count() and other times call .ToList();
I do not need the column data actually and figure it is thus faster not to select it.
This is basically false in your scenario. It can be true in a scenario where an index covers the result columns, but you don't have any result columns.
In your scenario, whatever index is chosen by the query optimizer, that index can be used to make the count.
Sum up: Query optimizer will perform the optimization you desire.
//you can put a where condition here
var queryEntries = from e in dc.entries select e;
//Get count
queryEntries.Count();
//Loop through Entries, so you basically returned all entries
foreach(entry en in queryEntries)
{}