Can I use another column in an InList clause?
For example, I have created a variable with the formula below.
IF [query1.column1] inList ([query2.column2]) then SUM([query1.amountColumn])
Else 0
Or is it possible to put a variable after InList in the formula?
If not, is there an alternative?
I see two possible approaches. I will use the eFashion universe for both solutions.
Solution #1
Here are my 2 queries to begin...
Run your queries. Click on the columns you want to compare, [query1].[column1] and [query2].[column2] in your case; [Query 1].[Month] and [Query 2].[Month] for me. Right-click and merge them. They must be dimensions and of the same data type.
Now create a variable based on [Query 2].[Month Name] which you can filter on to eliminate the results from Query 1 that do not match up to anything in Query 2.
[UV Month Name]=[Query 2].[Month Name]
The key here is that you need to change the Qualification to "Detail" and set the Associated Dimension to the dimension we just merged, by clicking the three dots to the right. Choose [Month Name] not from either query, but from the merged dimension.
Now build out your table with whatever object you want from Query 1 and add in the variable we just created.
Now add a filter on that variable to show only the rows where it is not null.
And you are done.
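The effect of the merge, the detail variable, and the not-null filter can be sketched outside Web Intelligence. This is only an illustration of the logic in plain Python, not something WebI runs; the sample rows and the field names (`month`, `amount`, `month_name`) are made up for the example.

```python
# Hypothetical sample rows standing in for the two query results.
query1 = [
    {"month": 1, "amount": 100},
    {"month": 2, "amount": 250},
    {"month": 3, "amount": 75},
]
query2 = [
    {"month": 1, "month_name": "January"},
    {"month": 3, "month_name": "March"},
]

# "Merging" on Month: look up Query 2 rows by the shared key.
month_name_by_month = {row["month"]: row["month_name"] for row in query2}

# The detail variable is null (None) wherever Query 2 has no matching month.
table = [
    {**row, "uv_month_name": month_name_by_month.get(row["month"])}
    for row in query1
]

# The report filter: keep only rows where the variable is not null.
filtered = [row for row in table if row["uv_month_name"] is not None]
total = sum(row["amount"] for row in filtered)
```

Here month 2 drops out because it has no counterpart in the second result set, which is the same outcome the not-null filter gives in the report.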
Pros
Works when limiting query (query2) has a relatively large number of values (compare to Cons for Solution #2).
Cons
More complicated to set up
May run into universe or performance issues related to query being filtered (query1).
Solution #2
Building upon Solution #1, I duplicated Query 1 and renamed it Query 3. Now you can choose "Results from another query" to get the [query1].[column1] InList ([query2].[column2]) logic you want.
If you take this approach, you don't need the merge, the variable, or the filter. The results of the query are filtered before being returned to the report.
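In SQL terms, the effect is similar to filtering one query by the values another query returns. The sketch below shows that idea as a subquery in SQLite via Python; it is not necessarily the SQL that Web Intelligence generates, and the table names `sales` and `promo` are hypothetical stand-ins for Query 1 and Query 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sales stands in for Query 1, promo for Query 2 (both hypothetical).
conn.executescript("""
    CREATE TABLE sales (month INTEGER, amount INTEGER);
    CREATE TABLE promo (month INTEGER);
    INSERT INTO sales VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO promo VALUES (1), (3);
""")

# Only sales rows whose month appears in promo are returned at all.
rows = conn.execute(
    "SELECT month, amount FROM sales "
    "WHERE month IN (SELECT month FROM promo) "
    "ORDER BY month"
).fetchall()
conn.close()
```

The filtering happens in the database, so the report never sees the non-matching rows.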
Pros
Simple
Cons
The number of values coming from your second query must be relatively small. It varies by database or maybe even your universe. I have found if it is over 1,000 values I get an error when I run the query that it is "too complex".
Related
I am attempting to combine two groupings (sums), EPL and POL, and relabel them as something, say "Other GL". The current output is this. I've attempted adding a formula in the criteria, but it is not working. I have also attempted adding another column in the design view with a formula alone.
The best way to "combine" data rows for grouping (i.e. sums) is to create a preliminary query which reassigns the individual source rows to a common value. Then use that query as the source for the other query(ies). (Such a preliminary query could be either a nested query -a.k.a. subquery-, or a saved query. I personally prefer saved queries since they can be edited and viewed using the standard Access Query Designer, whereas subqueries can only be edited as SQL text.)
Without other database schema or SQL statement to work with, all I can show is a SQL snippet showing the altered selection:
SELECT iif(Claims2.Grouping = 'EPL' Or Claims2.Grouping = 'POL', 'Other GL', Claims2.Grouping) As AltGrouping, ...
FROM Claims2
For what it's worth, the same iif() expression could also be inserted directly into your query as a "calculated field": within the query designer, just copy and paste it into the Field cell in place of Grouping. But a saved query that adjusts labels ahead of the final queries can be reused and makes later queries simpler.
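To see the two-step idea end to end, here is a minimal sketch using SQLite from Python rather than Access. The inner SELECT plays the role of the preliminary query and the outer one does the sum. The `Amount` column and the sample rows are assumptions, and `CASE` is used where Access would use iif().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Amount and the sample rows are hypothetical; only Claims2.Grouping is from the question.
conn.executescript("""
    CREATE TABLE Claims2 ("Grouping" TEXT, Amount INTEGER);
    INSERT INTO Claims2 VALUES ('EPL', 10), ('POL', 20), ('AUTO', 5), ('EPL', 30);
""")

totals = conn.execute("""
    SELECT AltGrouping, SUM(Amount)
    FROM (
        -- Preliminary query: relabel EPL and POL to a common value.
        SELECT CASE WHEN "Grouping" IN ('EPL', 'POL') THEN 'Other GL'
                    ELSE "Grouping" END AS AltGrouping,
               Amount
        FROM Claims2
    )
    GROUP BY AltGrouping
    ORDER BY AltGrouping
""").fetchall()
conn.close()
```

The EPL and POL rows end up in a single "Other GL" total, while every other grouping keeps its own label.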
I have a job in Talend that is designed to bring together some data from different databases: one is a MySQL database and the other an MSSQL database.
What I want to do is match a selection of loan numbers from the MySQL database (about 82,000 loan numbers) to the corresponding information we have housed in the MSSQL database.
However, the tables in MSSQL to which I am joining the data from MySQL are much larger (~ 2 million rows), are quite wide, and thus cost much more time to query. Ideally I could perform an inner join between the two tables based on the loan number, but since they are in different databases this is not possible. The inner join that is performed inside a tMap occurs after the Lookup input has already returned its data set, which is quite large (especially since this particular MSSQL query will execute a user-defined function for each loan number).
Is there any way to create a global variable out of the output from the MySQL query (namely, the loan numbers selected by the MySQL query) and use that global variable as an IN clause in the MSSQL query?
This should be possible. I'm not working in MySQL but I have something roughly equivalent here that I think you should be able to adapt to your needs.
I've never actually answered a Stack Overflow question, and while I was typing this the page started telling me I need at least 10 reputation to post more than 2 pictures/links here. I think I need 4 pics, so I'm just going to write it out in words here and post the whole thing, complete with illustrations, on my blog in case you need more info (quite likely, I should think!).
As you can see, I've got some data coming out of the table and getting filtered by tFilterRow_1 to only show the rows I'm interested in.
The next step is to limit it to just the field I want to use in the variable. I've used tMap_3 rather than a tFilterColumns because the field I'm using is a string and I wanted to be able to concatenate single quotes around it, but if you're using an integer you might not need to do that. And of course, if you have a lot of repeated values, you might also want to get a tUniqRow in there to remove the duplicates.
The next step is the one that does the magic. I've got a list like this:
'A1'
'A2'
'B1'
'B2'
etc, and I want to turn it into 'A1','A2','B1','B2' so I can slot it into my where clause. For this, I've used tAggregateRow_1, selecting "list" as the aggregate function to use.
Next up, we want to take this list and put it into a context variable (I've already created the context variable in the metadata - you know how to do that, right?). Use another tMap component, feeding into a tContextLoad component. tContextLoad always has two columns in its schema, so map the output of tAggregateRow_1 to the "value" column and enter the name of the variable in the "key" column. In this example, my context variable is called MyList.
Now your list is loaded as a text string and stored in the context variable ready for retrieval. So open up a new input and embed the variable in the sql code like this
"SELECT distinct MY_COLUMN
from MY_SECOND_TABLE where the_selected_row in ("+
context.MyList+")"
It should be as easy as that, and when I whipped it up it worked first time, but let me know if you have any trouble and I'll see what I can do.
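For anyone who wants to check the string handling outside Talend, the quoting, de-duplication, and "list" aggregation steps can be sketched in a few lines of Python. This is only an illustration of what the components produce; `build_in_list` is a hypothetical helper, and doubling embedded single quotes is an extra precaution that the steps above do not include.

```python
def build_in_list(values):
    """Return a de-duplicated, single-quoted, comma-separated list for an IN clause."""
    # Drop repeats while keeping the original order (the de-duplication step).
    unique = list(dict.fromkeys(values))
    # Wrap each value in single quotes (the tMap step); doubling any embedded
    # quote is an added assumption, not part of the original job.
    quoted = ["'" + v.replace("'", "''") + "'" for v in unique]
    # Join everything with commas (the "list" aggregate function).
    return ",".join(quoted)


my_list = build_in_list(["A1", "A2", "B1", "A2", "B2"])
query = (
    "SELECT DISTINCT MY_COLUMN FROM MY_SECOND_TABLE "
    "WHERE the_selected_row IN (" + my_list + ")"
)
```

With tens of thousands of values the generated statement gets very long, so it may be worth testing it against the target database before relying on it.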
I have two tables and from that I am generating a query. I have columns in my query with field type yes/no. These are basically columns to determine the race of a person. The user enters information in yes/no for various races. I want another calculated column in the query which checks for all other race columns and calculates values in it.
I have to check a few conditions on the values in these columns.
For example:
1) If Hispanic is chosen, the new column should say "Hispanic" (no matter what other options are selected. This is like a trump card)
2) If more than one is selected, then new column entry should say "multi"
3) If none of the options are selected, it should say "unknown"
4) If exactly one of them is selected, then that race should be displayed
Can anyone help me with this? I am new to Access
I can't code it for you, but I can point you in the right direction. What you want to do is take all the tests you explained above and put them in a coded format:
iif ( condition, value_if_true, value_if_false )
Since you have a lot of possible outputs, I'd use something like a Case statement, where you can test for all the possibilities.
Follow this link if you need any info on how to code both type of statements (iif and case).
Once you have tried something like this, you can come back with a specific question if you run into a problem in the process.
Good luck with your database.
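As a starting point, the four rules from the question can be written out in order of priority. The sketch below uses Python rather than Access, purely to show the order of the tests; the function name and the race column names are hypothetical, and the same sequence would translate to nested iif() calls.

```python
def classify_race(flags):
    """flags maps a race name to True/False, mirroring the yes/no columns."""
    # Rule 1: Hispanic overrides everything else.
    if flags.get("Hispanic"):
        return "Hispanic"
    selected = [name for name, checked in flags.items() if checked]
    # Rule 2: more than one box ticked.
    if len(selected) > 1:
        return "multi"
    # Rule 3: nothing ticked.
    if not selected:
        return "unknown"
    # Rule 4: exactly one box ticked.
    return selected[0]


example = classify_race({"Hispanic": False, "White": False, "Asian": True})
```

Testing Hispanic first matters: if the "more than one" check ran first, a Hispanic row with a second box ticked would be labelled "multi".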
I need to do a count on the items in a joined result set where a condition is true. I thus have a "from join where where" type of expression. This expression must end with a select or groupby. I do not need the column data actually and figure it is thus faster not to select it:
count = (from e in dc.entries select new {}).Count();
I have 2 questions:
Is there a faster way to do this in terms of the db load?
I have to duplicate my entire copy of the query. Is there a way to structure my query where I can have it one place for both counts and for getting say a list with all fields?
Thanks.
Please pay special attention:
The query is a join and not a simple table thus I must use a select statement.
I will need 2 different query bodies because I do not need to load all the actual fields for the count but will for the list.
I assume when I use the select query it is filling up with data when I use query.Count vs Table.Count. Look forward to those who understand what I'm asking for possible better ways to do this and some detailed knowledge of what actually happens. I need to pull out the logging to look into this deeper.
Queryable.Count
The query behavior that occurs as a result of executing an expression tree that represents calling Count<TSource>(IQueryable<TSource>) depends on the implementation of the type of the source parameter. The expected behavior is that it counts the number of items in source.
In fact, if you use LinqToSql or LinqToEntities, Queryable.Count() is sent into the database. No columns are loaded to memory. Check the generated sql to confirm.
I assume when I use the select query it is filling up with data when I use query.Count vs Table.Count
This is not true. Check the generated sql to confirm.
I have to duplicate my entire copy of the query. Is there a way to structure my query where I can have it one place for both counts and for getting say a list with all fields
If you need both the count and the list, get the list and count it.
If you need the count sometimes and other times you need the list... write a method that returns the complex IQueryable, and sometimes call .Count() and other times call .ToList();
I do not need the column data actually and figure it is thus faster not to select it.
This is basically false in your scenario. It can be true in a scenario where an index covers the result columns, but you don't have any result columns.
In your scenario, whatever index is chosen by the query optimizer, that index can be used to make the count.
Sum up: Query optimizer will perform the optimization you desire.
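The "define the query once, then either count it or list it" idea can be sketched outside LINQ as well. The following is a rough analogue in Python with sqlite3, not LINQ to SQL; the schema, the sample rows, and the names `BASE_QUERY`, `count_entries`, and `list_entries` are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: entries joined to authors, filtered on a condition.
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, active INTEGER);
    CREATE TABLE entries (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 1), (2, 0);
    INSERT INTO entries VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# The join and the condition live in exactly one place.
BASE_QUERY = (
    "SELECT e.id, e.title FROM entries e "
    "JOIN authors a ON a.id = e.author_id "
    "WHERE a.active = 1"
)


def count_entries(conn):
    # The database computes the count; only that single value comes back.
    return conn.execute("SELECT COUNT(*) FROM (" + BASE_QUERY + ")").fetchone()[0]


def list_entries(conn):
    # The same query body, this time returning the rows themselves.
    return conn.execute(BASE_QUERY + " ORDER BY e.id").fetchall()


entry_count = count_entries(conn)
entry_rows = list_entries(conn)
conn.close()
```

In both cases the query body is written once, which is the role the method returning an IQueryable plays in the C# version.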
// You can add a where condition here.
var queryEntries = from e in dc.entries select e;
// Get the count; this executes the query as a count.
int count = queryEntries.Count();
// Loop through the entries; this executes the query again and returns every entry.
foreach (entry en in queryEntries)
{
    // use en here
}
This is quite a strange problem, wasn't quite sure how to title it. The issue I have is some data rows in an SSIS task which need to be modified depending on other rows.
Name    Location    IsMultiple
Bob     England
Jim     Wales
John    Scotland
Jane    England
A simplified dataset, with some names, their locations, and a column 'IsMultiple' which needs to be updated to show which rows share locations. (Bob and Jane's rows would be flagged 'True' in the example above).
In my situation there is much more complex logic involved, so solutions using sql would not be suitable.
My initial thought was to use an asynchronous script task, take in all the data rows, parse them, and then output them all after the very last row has been input. The only way I could think of doing this was to call row creation in the PostExecute phase, which did not work.
Is there a better way to go about this?
A couple of options come to mind for SSIS solutions. With both options you would need the data sorted by location. If you can do this in your SQL source, that would be best. Otherwise, you have the Sort component.
With sorted data as your input you can use a Script component that compares the values of adjacent rows to see if multiple locations exist.
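The adjacent-row comparison can be sketched as follows. This is plain Python rather than SSIS script code, meant only to show the logic; the sample rows come from the question and the field names are made up.

```python
rows = [
    {"name": "Bob", "location": "England"},
    {"name": "Jim", "location": "Wales"},
    {"name": "John", "location": "Scotland"},
    {"name": "Jane", "location": "England"},
]

# Stands in for sorting at the source or with the Sort component.
rows.sort(key=lambda r: r["location"])

for i, row in enumerate(rows):
    # Once sorted, a row shares its location only with its direct neighbours.
    same_as_prev = i > 0 and rows[i - 1]["location"] == row["location"]
    same_as_next = i + 1 < len(rows) and rows[i + 1]["location"] == row["location"]
    row["is_multiple"] = same_as_prev or same_as_next

flagged = sorted(r["name"] for r in rows if r["is_multiple"])
```

Note that the first row of a group can only be flagged by looking at the row after it, so a streaming implementation would have to hold each row back until the next one arrives.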
Another option would be to split your data path into two. Do this by adding a Multicast component. The first path would be the main path that you currently have. In the second path, add an Aggregate transformation after the Multicast component. Edit the Aggregate and select Location with a Group By operation. Select (*) with a Count all operation. The output will be rows with counts by location.
After the Aggregate, add a Merge Join component and select your first and second data paths as inputs. Your join keys should be the Location column from each path. All the columns from path 1 should be outputs, along with the count from path 2.
In a derived column, modify the isMultiple column with an expression that expresses "If count is greater than 1 then true else false".
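The Multicast, Aggregate, Merge Join, and Derived Column steps amount to counting rows per location and joining that count back onto each row. A minimal sketch of that logic in Python (not SSIS; the field names are assumptions):

```python
from collections import Counter

rows = [
    {"name": "Bob", "location": "England"},
    {"name": "Jim", "location": "Wales"},
    {"name": "John", "location": "Scotland"},
    {"name": "Jane", "location": "England"},
]

# Aggregate path: group by location and count all rows.
counts = Counter(r["location"] for r in rows)

# Merge Join plus Derived Column: attach the count and turn it into a flag.
result = [{**r, "is_multiple": counts[r["location"]] > 1} for r in rows]
flags_by_name = {r["name"]: r["is_multiple"] for r in result}
```

Unlike the adjacent-row comparison, this version does not depend on row order, although the Merge Join component itself still needs sorted inputs.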
If possible, I might recommend doing it with pure SQL in a SQL task on your control flow prior to your data flow. A simple UPDATE query where you GROUP BY location and do a HAVING COUNT for everything greater than 1 should be able to do this. But if this is a simplified version this may not be feasible.
If the data isn't available until after the data flow is done you could place the SQL task after your data flow on your control flow.
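The UPDATE described above might look roughly like the statement below. It is run here against SQLite from Python just to show the shape of the query; the table name `people` is hypothetical, and the exact syntax for your database may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the rows from the question.
conn.executescript("""
    CREATE TABLE people (Name TEXT, Location TEXT, IsMultiple INTEGER DEFAULT 0);
    INSERT INTO people (Name, Location) VALUES
        ('Bob', 'England'), ('Jim', 'Wales'), ('John', 'Scotland'), ('Jane', 'England');
""")

# Flag every row whose location occurs more than once.
conn.execute("""
    UPDATE people
    SET IsMultiple = 1
    WHERE Location IN (
        SELECT Location FROM people GROUP BY Location HAVING COUNT(*) > 1
    )
""")

flagged = [
    row[0]
    for row in conn.execute("SELECT Name FROM people WHERE IsMultiple = 1 ORDER BY Name")
]
conn.close()
```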