I'm creating a report that has an unusual BoxPlot chart. I need to calculate the values for "Low Box" and "High Box" using all of the data for a certain column. The methodology for calculating these values is not that complicated, but I cannot disclose it.
Basically, I want to create a custom aggregate function. I understand how to create a VB function, but how do I make it take in a series of data instead of a single value? I know there is a Max function already, but for the sake of example, how would one implement a Max function?
Thanks for your help.
"can not disclose it." implies high value, which implies that you are using a recent version of SSRS, so this link should be of value for you. (The blog article also includes how you might implement this in 2005, but doesn't focus on it.)
Essentially create a custom function that gets called for every row of the data, taking in values from that row. That method or another related method can return your aggregate. 2008 includes Group Variables should help with a convenient place to store that.
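For the Max example, a rough sketch of that per-row approach using embedded report code (Report Properties > Code, which is VB) could look like the following; the member and function names are purely illustrative:

' Running maximum, accumulated one row at a time.
Private currentMax As Decimal = Decimal.MinValue

' Call once per detail row, e.g. in a detail cell or a hidden textbox:
'   =Code.AddValue(Fields!MyValue.Value)
Public Function AddValue(ByVal value As Decimal) As Decimal
    If value > currentMax Then
        currentMax = value
    End If
    Return value
End Function

' Read the result where the aggregate is needed, e.g. =Code.GetMax()
Public Function GetMax() As Decimal
    Return currentMax
End Function

The caveat with this pattern is evaluation order: the textbox that calls GetMax needs to be processed after all the detail rows (a group or report footer is the usual place), and the accumulator needs to be reset or keyed per group if you want more than one aggregate per report, which is where the group variables come in.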
Another approach, but much harder I think, would be to implement a custom data provider wrapping your query.
I am using the following expression to tier the sales figures.
=Sum(IIF(Fields!InitialValue.Value >= 500000 And Fields!InitialValue.Value < 1000000, Fields!InitialValue.Value, Nothing))
Basically, I just change the greater than and less than values for each cell. We have 4 tiers.
From what I understand, the IIF statement will go through each line and evaluate it before returning anything.
I am also averaging the size of each new account, so I have 8 cells that evaluate the data each time. I will also need to add how many accounts are in each tier, which means 12 passes at the same data. It takes some time to generate this report.
Is this the most efficient method?
Thanks in advance for all your help!
From what I can tell, there are two ways you could make this more efficient. The one I would use is to add a column to your query that labels each row by tier; that way the data is already classified when it reaches SSRS and never needs to be evaluated there. My theory is that SSRS is not as smart as a query optimizer. The other way, which may or may not speed things up, is to add a calculated field to your dataset that does essentially the same thing; I believe SSRS would then calculate it once per row and that is it. A sketch of the calculated-field option follows.
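A minimal sketch of such a calculated field as an SSRS expression. The 500,000 and 1,000,000 boundaries come from your expression; the remaining tier boundaries are placeholders to replace with your real ones:

=Switch(Fields!InitialValue.Value < 500000, "Tier 1",
        Fields!InitialValue.Value < 1000000, "Tier 2",
        Fields!InitialValue.Value < 2000000, "Tier 3",
        True, "Tier 4")

With each row carrying a tier label, the tablix can group or filter on that single field for the sums, averages, and counts instead of re-evaluating the ranges in every cell.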
Each returned transaction I am to report on is stored with a return reason code and a description of the return reason code. I built a tablix with two columns - one for return codes and another for descriptions. This works just peachy. The report owner is upset that a long list of codes will split pages - sigh. I was told to display them side-by-side.
I am new to T-SQL and SSRS and its idiosyncrasies, and I have minimal support from our DBAs. Two tables, filtered to display codes that meet a criterion, sounded simple enough.
My research:
MSDN's support network, the Operators in Expressions page, and various help topics. I also found SO posts regarding split functions in T-SQL and similar, as well as one specifically asking about comparisons and varchar. I found sites with helpful information like ResultData and Network Steve. I haven't found what I think I'm looking for.
My problem:
The return reason code is a varchar that always consists of the letter 'R' and two numeric digits (R00 to R99). It appears I can't run a comparison operator on an entire varchar that is alphanumeric; it doesn't recognize IIF((Fields!... <= R17),True,False). Additionally, the company will not allow the warehouse or its functions to be edited, so I cannot create my own.
My solution ideas:
Add each Rnn code to the tablix filter, individually. This means ~50 filters per tablix and seems a sloppy or inefficient way of handling this
Separate the varchar string into its alpha and numeric components and compare the latter using standard operators. This sounds like the cleanest method, but I'm unsure how to accomplish it in an expression or within SSRS
Forgo the two-table idea and create one table with four columns (code, description, code, description). This still leaves me with how to set a limit on the number of rows that can be created before 'spilling over' to the other side
I appreciate being pointed to any resources or any offered input to the issue and my (not so?)logical approach to it.
You can achieve your second option as follows:
CInt(Fields!ReturnCode.Value.Substring(1,2))
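For example, the comparison from your question could then be written as follows (the R17 cut-off and field name are taken from the posts above):

=IIF(CInt(Fields!ReturnCode.Value.Substring(1, 2)) <= 17, True, False)

Substring(1, 2) skips the leading 'R' and takes the two digits, and CInt converts them to a number, so the standard comparison operators work in an expression or a tablix filter.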
I recently asked a question about many-to-many relationships and how they can be used to calculate intersections that got answered pretty fine. Now, there is another nice-to-have requirement for our cube to extend that to more data. The general question remains: How many orders contain both product x and y?
However, the measure groups are now much larger, currently about 1.4 billion rows. I tried to implement that using the method described in the other post, with several hidden cross-referenced measure groups. However, this is simply too much for our hardware; the cube is reaching sizes close to 0.5 TB, and queries take several minutes to complete.
Now I would like to try another option: Can I access our relational database in a calculated measure? It seems I can, using UDFs as described in this article. I could write a function in C# that queries our relational database and returns all the orders that contain the products chosen by the user. But in order to do that, I need to supply the UDF with all the dimensional data the user has selected. I also need the UDF to return the calculated value so it can be output as the result of the calculated member. Is that possible? If yes, how? The example Microsoft provides only includes a small deterministic string function as the UDF.
Here are my own results:
It seems to be possible, though with limitations. The class Microsoft.AnalysisServices.AdomdServer.Context can provide you with the CurrentMember of each hierarchy; however, this does not work with Excel-style subselects. It either contains a single member or the All member.
Another option is to get the MDX query using the DMV SELECT * FROM $System.DISCOVER_SESSIONS. There is a column on that view which contains the last MDX query for a given session. However, in order not to overwrite your own last query, you must not use the current connection but open a new one. The session ID can be obtained through Microsoft.AnalysisServices.AdomdServer.Context.CurrentConnection.SessionID.
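A rough sketch of that DMV lookup from inside a server-side UDF, written here in VB.NET (a C# version would be analogous). The connection string is an assumption, and I'm assuming SESSION_LAST_COMMAND is the column that holds the last statement:

Imports Microsoft.AnalysisServices.AdomdServer
Imports AdomdClient = Microsoft.AnalysisServices.AdomdClient

Public Class MdxQueryHelper
    ' Reads the current session's last MDX statement from DISCOVER_SESSIONS,
    ' using a separate client connection so this DMV query does not
    ' overwrite the statement we are trying to read.
    Public Shared Function GetLastMdxQuery() As String
        Dim sessionId As String = Context.CurrentConnection.SessionID
        Using conn As New AdomdClient.AdomdConnection("Data Source=localhost;Initial Catalog=MyCube")
            conn.Open()
            Using cmd As AdomdClient.AdomdCommand = conn.CreateCommand()
                cmd.CommandText = "SELECT SESSION_LAST_COMMAND FROM $System.DISCOVER_SESSIONS " & _
                                  "WHERE SESSION_ID = '" & sessionId & "'"
                Using rdr As AdomdClient.AdomdDataReader = cmd.ExecuteReader()
                    If rdr.Read() Then
                        Return rdr.GetString(0)
                    End If
                End Using
            End Using
        End Using
        Return String.Empty
    End Function
End Class

From there the UDF can parse the query text for the members the user selected and run the relational query against them.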
The second approach is OK for our use case. It does not allow you to handle axes, since the UDF has cell scope but you don't know which cell you are in. If anyone knows anything about that last bit, please tell me. Thanks!
I'm hoping you can point me in the right direction.
I'm trying to generate a control chart (http://en.wikipedia.org/wiki/Control_chart) using SQL Server 2008. Creating a basic control chart is easy enough. I'd just calculate the mean and standard deviations and then plot them.
The complex bit (for me at least) is that I would like the chart to reset the mean and the control limits when a step change is identified.
Currently I'm only interested in a really simple method of identifying a step change, 5 points appearing consecutively above or below the mean. There are more complex ways of identifying them (http://en.wikipedia.org/wiki/Western_Electric_rules) but I just want to get this off the ground first.
The process I have sort of worked out is:
Aggregate and order by month and year, apply row numbers.
Calculate overall mean
Identify if each data item is higher, lower or the same as the mean, tag with +1, -1 or 0.
Identify when there are 5 consecutive data items which are above or below the mean (currently using a cursor).
Recalculate the mean if 5 points are above or 5 points are below the mean.
Repeat until end of table.
Is this sort of process possible in SQL Server? It feels like I may need a recursive UDF, but recursion is a bit beyond me!
A nudge in the right direction would be much appreciated!
Cheers
Ok, I ended up just using WHILE loops to iterate through. I won't post full code but the steps were:
Set up a user defined table data type in order to pass data into a stored procedure parameter.
Wrote an accompanying stored procedure that uses row numbers and WHILE loops to iterate along each data value in the input table, then uses the current row number to do set-based processing on a subset of the input data (to check whether the following 5 points are above/below the mean, and to recalculate the mean and standard deviations when that flag is tripped).
It outputs a table with the original values, row numbers, months, mean values, and upper and lower control limits.
I've also got one up and running that works based on full Nelson rules and will also state which test the data has failed.
Currently it's only used by me as I develop it further, so I've set up an Excel sheet with some VBA to dynamically construct a SQL string, which it passes to a pivot table as the command text. That way you can repeatedly ping the USP with different data sets and also change a few of the other parameters for how the procedure runs (such as adjusting the control limits and the like).
Ultimately I want to be able to pass the resulting data to Business Objects reports and dashboards that we're working on.
I need something like a T-SQL IN statement to filter records in a conditional split based on an array variable (or something similar)
I need to have a list of items that a column can be filtered on.
As Filip has indicated, there is no IN operator in the expression language. I did come up with some options though as I thought this sounded like an interesting problem.
My long analysis is on my blog: Filter list in SSIS
Conditional split
If you can transform your list of values into a delimited string, then you can use FINDSTRING and the current value to determine whether it's in the list. This provided the best throughput for my testing scenario. (FINDSTRING(@[User::MyListStr], [MyColumn], 1)) > 0
Script task
I had assumed using a List in a script task to determine membership would provide the best performance but I was wrong. Row.IsInList = MyListObj.Contains(Row.MyColumn);
Lookup/Cache Connection Manager
The third approach I had come up with was dumping the list into a Cache Connection Manager and then using that in a Lookup. I thought this was the easiest to conceptualize and maintain, but the performance was lacking.
Conclusion
For this problem domain, the FINDSTRING approach was the most efficient, by a considerable margin. The other approaches consistently averaged throughputs within 7 rows per millisecond of each other. I did find it interesting that the standard deviation of the FINDSTRING approach fluctuated so much. While this box is older and slower, there was not a considerable amount of activity going on during the package executions.
There is no IN operator among the SSIS expression operators, and no similar operator either, so you can't do that with built-in expressions and the built-in Conditional Split. But you can do one of the following:
use a Script Transformation to check whether the column's value is in the variable array, adding an additional column (a flag) with value 1 if it is and 0 if not, then use the Conditional Split on that flag (a sketch of this is shown below the list), or
better yet, put the values in a database table and then use a Lookup or Merge Join to check whether each row exists
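A minimal sketch of the first option as an SSIS Script Component, written in VB (the variable, column, and output names are illustrative; Input0Buffer and UserComponent are the designer-generated defaults):

Imports System.Collections.Generic

Public Class ScriptMain
    Inherits UserComponent

    ' Allowed values, loaded once per execution from a package variable.
    Private allowedValues As List(Of String)

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        ' Assumes a read-only string variable User::MyListStr such as "A;B;C".
        allowedValues = New List(Of String)(Me.Variables.MyListStr.Split(";"c))
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' IsInList is an added output column (DT_I4): 1 = in the list, 0 = not.
        Row.IsInList = If(allowedValues.Contains(Row.MyColumn), 1, 0)
    End Sub
End Class

The Conditional Split downstream then only needs the expression IsInList == 1.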