SSRS report with SSAS as the data source: performance bottleneck

I've searched the web and Stack Overflow for a long time, but nothing I found helped.
I created an SSRS report that uses a cube as its data source. In SSMS the cube query is fast, taking at most about 4 seconds, but the report shows the results after about 1 minute.
I used SQL Server Profiler on the cube and found that Serialize Results is the most time-consuming event for my report. I haven't found any way to reduce the time spent in the Serialize Results events, and I don't know how to improve my server utilization, because despite this slow report, resource utilization (CPU, network, memory) is at most 5 percent!
By the way, my report returns about 300 thousand rows in every execution (ROWS_RETURNED ~ 300000).
MDX query:
SELECT NON EMPTY { [Measures].Members } ON COLUMNS,
NONEMPTY (
    NONEMPTY( STRTOSET(@BankSelect) )
    * NONEMPTY( STRTOSET(@BranchSelect) )
    * NONEMPTY( STRTOSET(@DimTimeShamsiFullDate) )
    * NONEMPTY( STRTOSET(@BranchTypeSelect) )
    * NONEMPTY( STRTOSET(@StateSelect) )
    * NONEMPTY( STRTOSET(@CitySelect) * NONEMPTY( STRTOSET(@RowTypeSelect) ) )
    * NONEMPTY( STRTOSET(@MoneyStatusSelect) )
    * NONEMPTY( STRTOSET(@MoneyTypeSelect) )
    * NONEMPTY( STRTOSET(@MoneyUnitNameSelect) )
) ON ROWS
FROM ( SELECT STRTOSET(@DateInFrom) ON COLUMNS FROM [SinapDW] )
CELL PROPERTIES VALUE
Any help is appreciated :)
Thank you.

If your problem is network serialization or general performance, I would suggest having a look at the performance whitepapers on how to optimize the network packets SSAS sends. These documents provide insight into how to optimize resource usage for SSAS and how to tune its internal settings based on your workloads.
SQL SSAS 2008R2
SQL SSAS 2012 & 2014

Related

Cube vs. Micro Cube in BusinessObjects

I am having trouble understanding cubes and microcubes in a BusinessObjects environment.
Although I have tried to find answers online, I have not found one that gives an overall explanation.
Besides a description of the functionality, I would like to know where the cube and the microcube are located: on the server or in the browser?
How many cubes/microcubes are there? One microcube per report, one microcube per session, or something else?
Furthermore, can anyone explain the difference between aggregation at the database level and aggregation at the report level? (When defining a measure, there are two possibilities: to define aggregation at the report level and/or at the database level.) Although there are some answers online, they are too general, so I would appreciate a simple explanation with an example.
And finally, is it possible to color tables in the data foundation layer? (Since I have a lot of tables in a universe, it would be very helpful if I could color fact and dimension tables.)
It helps to understand the two-pass nature of querying traditional (non-OLAP) data in BO. When you refresh a report in BO, the report engine constructs a SQL statement based on the objects (results and conditions) that are specified in the query.
The SQL statement is sent to the database, and the report engine then retrieves the data that is returned from the database. This data becomes the microcube -- it is nothing more than a tabular representation of the data that was received from the database. If you could look at it visually, it would look identical to what you would get by running the SQL statement in a traditional SQL tool (such as TOAD or SQL*Plus).
The microcube then becomes the source for the report presentation. You could, for example, create a table in a report by dragging in dimensions and measures from the object list. As you drag in these objects, the table recalculates and redisplays as appropriate. This is all done by retrieving and calculating the data from the microcube, not from the source data. That is, as you are interacting with a report tab, you are not initiating a refresh from the database -- all of the calculation is done from the microcube.
For example, let's say you have created a new query with two dimensions (State, Region) and one measure (Sales). The SQL might look like this:
select state_nm,region_nm,sum(sales_amt)
from all_sales
group by state_nm,region_nm
Note that there is a sum() applied to sales_amt. This is causing the database to perform the initial aggregation on this field.
The microcube that is created after the refresh may look like:
AL North 30000
AL South 40000
AR North 5000
AR South 10000
Now you create a table in your report, and select just State and Sales. The report engine uses the data in the microcube to display the result:
AL 70000
AR 15000
In this case, the BO report engine has performed a sum() aggregation on Sales Amt. This is report-side aggregation.
Then you remove State. Again, the report engine aggregates the table from the microcube, not the database:
85000
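Conceptually, this report-side aggregation is the equivalent of running a query like the following against the microcube's rows (illustrative only; the microcube lives in memory and is not actually queried with SQL, and the table name here is hypothetical):
SELECT state_nm, SUM(sales_amt)
FROM microcube_rows   -- hypothetical: the tabular data held in the microcube
GROUP BY state_nm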
The microcube is stored with the report file. If you are working with a WebI report via InfoView, then the microcube is loaded into the server's memory. When you save the report, a physical file is created on the server (with a .wid extension); the microcube is stored in that file. If you open the report again, the microcube is again loaded into the server's memory and is available for interaction.
If you are using Rich Client, then the same behavior applies, it's just using your workstation and local storage.
The term "cube" is generally used to describe data sources in an OLAP environment (SSAS or Essbase, for example). This is external to BO -- BO would be querying data from an OLAP cube.
Regarding the aggregation:
Database aggregation is performed by your RDBMS on the source data, before it is transferred to the client application (e.g. Web Intelligence). You can apply database aggregation by using a function such as SUM() or COUNT() in the SELECT statement of your measure (in the business layer of your universe); see the sketch after this list. It only makes sense for measure objects, not dimensions.
Changes the data retrieved from the database
Can have a positive impact on performance due to a smaller dataset being returned
Leverages the database's aggregation performance
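As a minimal sketch (reusing the hypothetical all_sales table from the earlier example), a measure defined with database aggregation would have a SELECT such as:
SUM(all_sales.sales_amt)
so the generated query aggregates in the database before any rows reach the microcube.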
Projection aggregation, or report aggregation, is the aggregation performed by the client application (e.g. Web Intelligence) after retrieving the data from the database, and thus happens in memory.
Happens on the fly as the dimensions against which the measure is projected change (hence "projection aggregation")
The result set retrieved from the database remains the same
Regarding the table colours: have a look at the tutorial Apply color to tables that share the same information. This should show you how to configure the colours for the tables in the data foundation.

Improving Column Grouping Speed in Microsoft Reporting Services

I am building a large application that uses column grouping in my reports. Unfortunately, the performance is pitifully slow, and my customer is complaining about it. As an example, if they run a report for a 24-hour period, it takes ~10 minutes to return (~800 display pages of data). If they run it for a month, it may never return!
The query itself for a 24 hour period returns in ~20 seconds. The balance of the time is pivoting and generating the report.
Do you have any suggestions as to what I could do?
Thanks!
Check to make sure you are sorting on the report side as opposed to the query side. This can speed things up. Sorting groups or sorting by aggregate values is much simpler in the report than in the query and is frequently more efficient also.
Take a look at these tips for speeding up performance.
Troubleshooting Reports: Report Performance
Reporting Services has three stages in creating a report:
Data retrieval (executing the queries to return the dataset results)
Processing (grouping and aggregating the returned data according to the report layout)
Rendering (generating the selected output, e.g. HTML, Excel)
If your query returns data in ~20 seconds but the report takes 10 minutes, then the main issue is the speed of report processing (rendering is rarely a performance bottleneck), as you rightly assumed. The best way to improve performance is to offload as much of the aggregation as possible to the source database by rewriting your query to do the aggregating there. The database platform is usually much faster at aggregating data than Reporting Services. Ideally you want to return the minimum amount of data required for the report so that it has as little processing to do as possible.
If you have access to the ReportServer database, run this query to confirm where the bottleneck is:
SELECT ItemPath, Format, TimeStart, TimeEnd, TimeDataRetrieval, TimeProcessing, TimeRendering, [Status], ByteCount, [RowCount]
FROM ExecutionLog3
WHERE ItemPath LIKE '%My Report Name%'
TimeDataRetrieval, TimeProcessing and TimeRendering should give you a clear idea of where the problem lies. If the issue is with TimeProcessing then try and rewrite the query to reduce the data for the report and also review the possible design issues to see if any of them apply.
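As a rough extension of the query above (a sketch that assumes the standard ExecutionLog3 view and the same columns), you can flag the dominant stage per execution:
SELECT ItemPath, TimeStart, TimeDataRetrieval, TimeProcessing, TimeRendering,
       CASE
           WHEN TimeProcessing >= TimeDataRetrieval AND TimeProcessing >= TimeRendering THEN 'Processing'
           WHEN TimeDataRetrieval >= TimeRendering THEN 'Data retrieval'
           ELSE 'Rendering'
       END AS DominantStage
FROM ExecutionLog3
WHERE ItemPath LIKE '%My Report Name%'
ORDER BY TimeStart DESC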

Unioning in parallel... will SQL Server do it?

If I want to union data from multiple tables located on different drives, will SQL Server pull the data in parallel? Are there any related settings or hints I should know about?
The UNION should run in parallel, at least since SQL Server 2005.
It doesn't make a difference whether the tables are located on different drives or the same drive. In the modern world, disks can be virtual or have multiple read heads. The distinction between one drive and more than one drive is less and less relevant.
If you have MAXDOP set to 1, then there will only be one thread.
Do note that UNION is going to be much slower than UNION ALL.
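For illustration (a sketch reusing the hypothetical table1/table2 from the answer further down): UNION must de-duplicate the combined rows, which adds a sort or hash step, while UNION ALL simply concatenates the row sets.
-- Slower: duplicates are removed, so the engine sorts or hashes the combined set
SELECT pk_id, value FROM table1
UNION
SELECT pk_id, value FROM table2;
-- Faster: all rows are kept, no de-duplication step
SELECT pk_id, value FROM table1
UNION ALL
SELECT pk_id, value FROM table2;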
Brandon . . . let me respond here. You seem to be thinking in terms of older-style architectures. These definitely still exist. However, modern disks have multiple read heads and multiple platters. Often, the issue with returning data is the bandwidth at the controller level, not the speed of the read. You also have multiple levels of caching and read-ahead (sometimes at both the file-system and database levels). You are often better off letting the database engine manage this complexity.
For instance, the machine that I'm working on right now is really a virtual machine. The disk I use is a partition on an EMC box. The processors are some set of processors in a big box.
My understanding of multi-threading in SQL Server is that we should leave it to the query optimiser - queries will be run in parallel when optimal.
You can limit the number of threads by using the MAXDOP hint (see What is the purpose for using OPTION(MAXDOP 1) in SQL Server?).
The default behaviour is to run in parallel when possible and optimal.
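For illustration, the MAXDOP hint mentioned above is applied per statement; a sketch (table and column names hypothetical):
SELECT pk_id, value
FROM table1
OPTION (MAXDOP 1);  -- force a serial plan for this statement only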
I wouldn't count on the data being returned in a specific order based solely on the order of your UNIONed queries.
For me, when I have to do something like that, I always wrap the entire query in a subselect just to handle the sorting, like the following:
SELECT pk_id, value FROM (
    SELECT pk_id, value FROM table1
    UNION
    SELECT pk_id, value FROM table2
) AS t
ORDER BY pk_id, value
That way you're never surprised by what you get back.

How to use for each loop to help load large dataset

I'm trying to load a large dataset from SQL Server 2008 in SSIS, but it's too slow for Visual Studio to load everything at once, so I decided to use a Foreach Loop to load just part of the table each time.
E.g. if there are 10 million records, I would like to load 1 million at a time and run the loop 10 times to complete processing.
This is just my "brain design", and I have no idea how to implement it with the Foreach Loop container. Is there any other approach to dealing with a large dataset?
So many variables to cover and I have a meeting in 5 minutes.
You say it's slow. What is slow? Without knowing that, you could be spending forever chasing the wrong rabbit.
SSIS took the crown in 2008 for ETL processing speed by loading 1 TB in 30 minutes. Sure, they tweaked the ever-loving bejesus out of the system to get it to do so, but they lay out in detail what steps they took.
10M rows, while sounding large, is nothing I'd consider taxing to SSIS. To start, look at your destination object (assuming OLE DB). If it doesn't have the Fast Load option checked, you are issuing 10M single insert statements, which is going to swamp your transaction log. Also look at the number of rows in your commit size: 0 means all-or-nothing, which may or may not be the right decision based on your recoverability needs, but do realize the implication it has for your transaction log (it's going to eat quite a bit of space).
What transformation(s) are you applying to the data in the pipeline? There are transforms that will kill your throughput (sort, aggregation, etc.).
Create a baseline package, all it does is read N rows of data from the source location and performs a row count. This would be critical to understanding the best theoretical throughput you could expect given your hardware.
Running a package in Visual Studio/BIDS/SSDT is slower, sometimes by an order of magnitude, than invoking it through SQL Agent/dtexec, because the designer wraps the execution in a debugger.
I'll amend this answer as I have time but those are some initial thoughts. I'll post on using foreach loop task to process discrete chunks of data after the meeting.
The best way in my opinion is to functionally partition your data. A date column is in most cases appropriate to do this. Let's take an order date as an example.
For that column, find the best denominator; for example, each year of your order dates produces about a million rows.
Instead of a Foreach Loop container, use a For Loop container.
To make this loop work, you'll have to find the minimum and maximum year of all order dates in your source data. These can be retrieved with SQL statements that save their scalar results into SSIS variables (see the sketch after these steps).
Next, set up your for loop container to loop between the minimum and maximum year that you stored in variables earlier, adding one year per iteration.
Lastly, to actually retrieve your data, you'll have to build your source SQL statement as an expression in a variable, with a WHERE clause that filters on the current year produced by your For Loop container:
"SELECT * FROM transactions WHERE YEAR(OrderDate) = " + (DT_WSTR, 4)@[User::ForLoopCurrentYear]
Now you can use this variable in a data flow source to retrieve your partitioned data.
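A minimal sketch of the scalar query for the earlier min/max step (it reuses the hypothetical transactions table and OrderDate column; map the two result columns to SSIS variables such as User::MinYear and User::MaxYear on the Execute SQL Task's Result Set page):
-- Single-row result; map MinYear/MaxYear to SSIS variables in the Execute SQL Task
SELECT MIN(YEAR(OrderDate)) AS MinYear,
       MAX(YEAR(OrderDate)) AS MaxYear
FROM transactions;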
Edit:
A different solution, using a Foreach Loop container, would be to retrieve your partition keys with an Execute SQL Task and save that result set in an SSIS variable of type Object:
SELECT YEAR(OrderDate) FROM transactions GROUP BY YEAR(OrderDate)
With a Foreach Loop container you can loop through the object using the ADO enumerator and use the same method as above to inject the current year into your source SQL statement.

SSRS Performance

I have created an SSRS report that retrieves 55,000 records using a stored procedure. When I execute the stored procedure it takes just 3 seconds, but when I run the SSRS report it takes more than one minute. How can I solve this problem?
The additional time could be due to Reporting Services rendering the report in addition to querying the data. For example, if 55,000 rows are returned for the report and the report server then has to group, sort, and/or filter those rows to render it, that could take additional time.
I would have a look at the way the data is being grouped and filtered in the report, then review your stored procedure to see whether you could offload some of that processing to the SQL code, perhaps using some parameters. Aim to reduce the number of rows returned to the report to the minimum needed to render it, and preferably avoid doing the grouping and filtering in the report itself.
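As a hedged sketch of that idea (the procedure, table, and column names are hypothetical), a stored procedure that filters and groups in SQL so the report only receives the rows it actually renders:
CREATE PROCEDURE dbo.GetReportData
    @RegionID int
AS
BEGIN
    SET NOCOUNT ON;
    -- Filter and group in the database, not in the report
    SELECT RegionID, ProductName, SUM(Amount) AS TotalAmount
    FROM dbo.Sales
    WHERE RegionID = @RegionID
    GROUP BY RegionID, ProductName;
END;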
I had this problem because of parameter sniffing in my stored procedure. When I ran the procedure in SQL Server Management Studio I had recreated it, so it used a new execution plan (and the call was very fast), but my reports still used the old, bad plan (built for a different sequence of parameters), and their load time was much longer than in SSMS.
In the ReportServer database you will find a table called ExecutionLog. Look up the catalog ID of your report and check the latest execution instance; this will give you the breakdown of the time taken for data retrieval, processing, rendering, etc.
Use the SSRS Performance Dashboard reports to debug your issues.
Archaic question, but because issues like this keep recurring: my "quick and dirty" solution to improve SSRS, which works perfectly in large enterprise environments (I render reports that can have 100,000+ lines daily), is to properly set the InteractiveSize of the page (for example, setting it to A4 size, 21 cm). When InteractiveSize is set to 0, all results are rendered as a single page, and this literally kills SSRS performance. In cases like that, queries that take a few seconds against your database can take forever to render (or cause an out-of-memory exception unless you have plenty of spare hardware on your SSRS server).
So, for queries or stored procedures that execute reasonably fast when called directly but retrieve a large number of rows, set InteractiveSize and you won't need to bother with other, more sophisticated solutions.
I had a similar problem: a query that returns 4,000 rows and runs in 1 second on its own was taking so long in SSRS that it timed out.
It turned out that the issue was caused by the way SSRS was handling a multi-valued parameter. Interestingly, if the user selected multiple values, the report rendered quickly (~1 second), but if only a single value was selected, the report took several minutes to render.
Here is the original query that was taking more than 100x longer to render than it should:
SELECT ...
FROM ...
WHERE filename IN (@file);
-- @file is an SSRS multi-value parameter passed directly to the query
I suspect the issue was that SSRS was bringing in the entire source table (over 1 million rows) and then performing a client-side filter.
To fix this, I ended up passing the parameter into the query through an expression, so that I could control the filter myself. That is, in the "DataSet Properties" window, on the "Parameters" screen, I replaced the parameter value with this expression:
=JOIN(Parameters!file.Value,",")
... (I also gave the result a new name: filelist) and then I updated the query to look like this:
SELECT ...
FROM ...
WHERE ',' + @filelist + ',' LIKE '%,' + FILENAME + ',%';
-- @filelist is passed to the query as the following expression:
-- =JOIN(Parameters!file.Value,",")
I would guess that moving the query to a stored procedure would also be an effective way to alleviate the problem (because SSRS basically does the same JOIN before passing a multi-value parameter to a stored procedure). But in my case it was a little simpler to handle it all within the report.
Finally, I should note that the LIKE operator is maybe not the most efficient way to filter on a list of items, but it's certainly much simpler to code than a split-string approach, and the report now runs in about a second, so splitting the string didn't seem worth the extra effort.
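For reference, the split-string approach mentioned above could look like the following sketch (it assumes SQL Server 2016+ for STRING_SPLIT and reuses the same @filelist expression):
SELECT ...
FROM ...
WHERE FILENAME IN (SELECT value FROM STRING_SPLIT(@filelist, ','));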
Obviously getting the report running correctly (i.e. taking the same order of magnitude of time to select the data as SSMS) would be preferable, but as a workaround, would your report support execution snapshots (i.e. no parameters, or parameter defaults stored in the report)?
This would allow a scheduled snapshot of the data to be retrieved and stored beforehand, meaning SSRS only needs to process and render the report when the user opens it. It should reduce the wait to a few seconds (depending on what processing the report requires; YMMV, so test to see whether you get a performance improvement).
Go to the report's properties in Report Manager, select Execution, change to "Render this report from a report execution snapshot", and specify your schedule.
The primary solution to speeding up SSRS reports is to cache them. If you do this (either by preloading the cache at 7:30 am, for instance, or by caching the reports on first hit), you will see massive gains in load speed.
Please note that I do this daily and professionally and am not simply waxing poetic on SSRS.
Caching in SSRS
http://msdn.microsoft.com/en-us/library/ms155927.aspx
Pre-loading the Cache
http://msdn.microsoft.com/en-us/library/ms155876.aspx
If you do not like the initial report run taking long and your data is static (i.e. a daily general ledger or the like, meaning the data is relatively static over the day), you may increase the cache lifespan.
Finally, you may also opt to have business managers receive these reports via email subscriptions instead, which will send them a point-in-time Excel report that they may find easier and more systematic.
You can also use parameters in SSRS to allow for easy filtering by the user and faster queries. In the query builder, type IN(@SSN) under the Filter column for the field you wish to parameterize; you will then find the parameter created in the Parameters folder just above Data Sources in the upper left of your BIDS GUI.
If you do not see the data source section in SSRS, hit CTRL+ALT+D.
See a nearly identical question here: Performance Issues with SSRS
A few things can be done to improve the performance of the report, as below:
1. Enable caching in Report Manager and set a time period to refresh the cache.
2. Apply indexing to the backend database tables that are used as sources for the report. Although your stored procedure already takes very little time to return the data, indexing can further improve performance at the backend level (see the index sketch after this list).
3. Use shared datasets instead of embedded datasets in the report, and apply caching to these datasets as well.
4. If possible, set the parameters to load default values.
5. Try to reduce the data selected by the stored procedure, e.g. if the report contains historical data that is of no use, add a filter to exclude that data.
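A sketch for item 2 (the table, column, and index names are hypothetical), indexing the column the stored procedure filters on:
CREATE NONCLUSTERED INDEX IX_Sales_OrderDate
ON dbo.Sales (OrderDate)
INCLUDE (CustomerID, Amount);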
I experienced the same issue. Query ran in SQL just fine but was slow as anything in SSRS. Are you using an SSRS parameter in your dataset? I've found that if you use the report parameter directly in certain queries, there is a huge performance hit.
Instead, if you have a report parameter called @reportParam, then in the dataset simply do the following:
declare @reportParamLocal int
set @reportParamLocal = @reportParam
select * from [Table] A where A.field = @reportParamLocal
It's a little strange. I don't quite know why it works but it does for me.
One quick thing you may want to look at is whether elements on your report could be slowing down the execution.
For example, I have found massive execution-time differences when converting between datetimes. Do any elements on the report use the CDate function? If so, you may want to consider doing your formatting at the SQL level.
Conversions in general can cause a massive slowdown, so take the time to look into your dataset and see what may be the problem.
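A minimal sketch of pushing the conversion into the dataset query (the table and column names are hypothetical), so the report can display the value directly instead of calling CDate:
SELECT OrderID,
       CONVERT(varchar(10), OrderDate, 120) AS OrderDateText  -- yyyy-MM-dd, formatted in SQL
FROM dbo.Orders;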
This is a bit of a mix of the answers above, but do your best to get the data back from your stored procedure in the simplest and most finished format. I do all my sorting, grouping and filtering up on the server. The server is built for this and I just let reporting services do the pretty display work.
I had the same issue ... it was indeed the rendering time, but more specifically it was because the SORT was being done in SSRS. Try moving your sort to the stored procedure and removing any SORT from the SSRS report. On 55K rows, this will improve things dramatically.
Further to @RomanBadiornyi's answer, try adding
OPTION (RECOMPILE)
to the end of your main query in the stored procedure.
This will ensure the query is recompiled for different parameters each time, in case different parameters need a different execution plan.
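As a sketch of where the hint goes (the table and column names are hypothetical, and @RegionID stands in for a parameter of the stored procedure), it is appended to the statement whose plan varies with the parameters:
-- Inside the stored procedure; @RegionID is one of the procedure's parameters
SELECT RegionID, ProductName, SUM(Amount) AS TotalAmount
FROM dbo.Sales
WHERE RegionID = @RegionID
GROUP BY RegionID, ProductName
OPTION (RECOMPILE);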