I'm trying to populate dimension tables on a regular basis and I've thought of two ways of getting distinct values for my dimension:
Using an Aggregate transformation and then using the "Group by" operation.
Using a Sort transformation while removing duplicates.
I'm not sure which one is better (more efficient), or which one is adopted more widely in the industry.
I tried to perform some tests using dummy data, but I can't quite get a solid answer.
P.S. Using SELECT DISTINCT from the source is not an option here.
My first choice would always be to correct this in the source query if possible. I realise that isn't always an option, but for the sake of completeness for future readers: whenever a DISTINCT seems necessary, I first check whether there's actually a problem in the source query that is creating the duplicates and needs resolving.
My second choice would be a DISTINCT - if it were possible - because this is one of those cases where it will probably be quicker to resolve in SQL than in SSIS; but I realise that's not an option for you.
From that point, you're getting into a situation where you might need to try out the remaining options. Aside from using an Aggregate or Sort in SSIS, you could also dump the results into a staging table, and then have a separate data flow which does use a DISTINCT in its source query. Aggregate and Sort are both blocking transformations in SSIS, so using a staging table might end up being faster - but which is fastest for you will depend on a number of factors, including the nature of your data and the nature of your infrastructure. You might also want to keep in mind what else is running in parallel if you use the SSIS options, as they can be memory-hungry.
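If you go the staging-table route, a minimal sketch would be to land the raw rows in one data flow, then de-duplicate in the source query of a second data flow (the staging table and column names here are hypothetical):

-- Hypothetical staging table loaded by the first data flow
CREATE TABLE stg.CustomerSource (
    CustomerCode varchar(20)  NOT NULL,
    CustomerName varchar(100) NOT NULL
);

-- Source query for the second data flow that loads the dimension;
-- the DISTINCT runs in the database engine instead of a blocking SSIS transformation
SELECT DISTINCT CustomerCode, CustomerName
FROM stg.CustomerSource;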
If your data is (or can be) sorted in your source or source query, then there's also a clever idea in the link below for creating "semi-blocking" versions of Aggregate and Sort using script tasks:
http://social.technet.microsoft.com/wiki/contents/articles/30703.ssis-implementing-a-faster-distinct-sort-or-aggregate-transformation.aspx
This is a simple question even though the title sounds complicated.
Let's say I'm storing data from a bunch of applications into one central database/ data warehouse. This is data at a pretty fine level -- say, daily summaries of various metrics.
HOWEVER, I know in the front-end I will be frequently displaying weekly and monthly aggregates of this data as well.
One idea would be to have a scripting language do this for me after querying the SQL database - but that seems like it could be horribly inefficient.
The second idea would be to have views in the database that represent business weeks and months -- this might be the best way to do it.
But my final idea is -- couldn't a SQL client simply run a query that aggregates all the daily data into weeks (or months) and store the results in a separate table? The advantage is that it would reduce query time for any user, since all the query work is done before a website is even loaded or a button is pushed. Even with a view, I assume the aggregation would have to be calculated as soon as the view was queried.
The only downside to aggregating the weekly/monthly data perhaps once a day (instead of every time the website is loaded) is that it won't be fully up to date and may reflect inconsistencies.
I'm not really an expert when it comes to this bigger picture stuff -- anyone have any thoughts? thanks
It depends on the user experience you're trying to create.
Is the user base expecting to watch monthly aggregates with one finger on the F5 key while this month's statistics come in? To cover this scenario, you might want a view whose criteria present a window that is always relative to getdate(). Keep in mind that good indexing strategies and query design should reduce the impact of this sort of approach to nearly nothing.
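A minimal sketch of such a getdate()-relative view (table and column names are hypothetical, and DATEFROMPARTS assumes SQL Server 2012 or later):

CREATE VIEW dbo.vCurrentMonthMetrics
AS
SELECT MetricName, SUM(MetricValue) AS MonthToDateValue
FROM dbo.DailyMetricSummary
-- the window is always relative to "now": first day of the current month onwards
WHERE MetricDate >= DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1)
GROUP BY MetricName;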
Is the user expecting informational data that doesn't include today's figures? You might see better performance from a nightly job that does the aggregation into a new table.
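A minimal sketch of that nightly job (table and column names are hypothetical, and the week-start calculation depends on your DATEFIRST setting):

-- Rebuild the weekly summary once a night, e.g. from a SQL Agent job step
TRUNCATE TABLE dbo.WeeklyMetricSummary;

INSERT INTO dbo.WeeklyMetricSummary (WeekStartDate, MetricName, MetricValue)
SELECT DATEADD(DAY, 1 - DATEPART(WEEKDAY, MetricDate), MetricDate) AS WeekStartDate,
       MetricName,
       SUM(MetricValue)
FROM dbo.DailyMetricSummary
GROUP BY DATEADD(DAY, 1 - DATEPART(WEEKDAY, MetricDate), MetricDate), MetricName;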
Of all the scenarios, though, I would not recommend aggregating manually in application code. Down that road lie unexpected bugs and exceptions that could have been handled with a good SQL statement. Aggregation is a core feature of every DBMS; let the database handle it and spend your effort on the rest of your application.
What is the most efficient way to automate both creation and deployment of simple SSRS reports from one underlying query?
An example query might look like
SELECT Name, ID, Date FROM Errorlog
The query could contain quite a few columns and anywhere from 1 to 1 million rows.
The business purpose behind this question is that I have a sizable number of report queries that need to go out as SSRS reports. I also need the capacity to turn any query I write instantly (or within a matter of seconds) into a simple SSRS report. Unfortunately, doing it through BIDS manually (using toolbox items and creating datasets) is cumbersome, slow and unnecessarily repetitive. The only things I am concerned with are making sure interactive page height/width is zero (to allow scrolling) and that columns are autosized.
How would you accomplish this in a way that is smooth and repeatable?
Let me start by saying that I don't think SSRS will be very good at this. It may be troublesome on two points in particular.
First, the number of rows may become a problem. One million results is typically a bit much for Reporting Services 2008 (though it does depend on the context a bit); it's much better at displaying either aggregated data or a limited number of data rows (up to a few thousand - though again, depending on context).
Second, a dynamic number of columns being returned by the SQL side will be a problem. There are only two ways around this that I know of:
Have a denormalized data set with a fixed number of columns, and one or more columns that contain the grouping. Then use a matrix to generate columns dynamically in SSRS (see the sketch after these two options). This does have a considerable performance impact.
Generate the RDL dynamically. There's information on the schema to do this, and if you create a good starting point it's very possible. After generating the RDL you'll have to execute it - how to do that depends on your specific setup.
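For the first option, a sketch of what the denormalized dataset might look like (table and column names are hypothetical); a matrix with a column group on MetricName then generates the columns at render time:

-- One row per (entity, metric); the fixed columns stay fixed,
-- and MetricName drives the dynamic columns in the SSRS matrix
SELECT ErrorDate,
       MetricName,    -- matrix column group
       MetricValue
FROM dbo.ErrorlogMetrics;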
Bottom line is that I wouldn't recommend using SSRS for the task you describe. Consider other technologies that may be better suited to this task, e.g. SSIS packages, or perhaps a custom-made or third-party tool.
If I were you, I'd use 'Access Data Projects', which have a wizard for creating reports that are then easy to upsize to Reporting Services. Right-click Import on a solution full of RDL and it prompts for an MS Access file.
You can easily turn a couple of columns into a report using an Access wizard and then upsize it to SSRS. I've done it hundreds upon hundreds of times this way.
Please flog me if I haven't searched thoroughly enough...
I am wondering what would be better for performance:
Collect, aggregate and sort my data using SQL (WHERE, GROUP BY and ORDER BY clauses in the dataset query)
or
just collect the 'naked' data and group, sort and filter it in the report (filters on the dataset, parameters and aggregation in the report).
Would using stored procedures be beneficial to performance?
Greetings,
Henro
Well, SSRS is a tool to display results and it is optimized for that. Although it can perform aggregations, filters and a lot more, that isn't its primary goal, so it isn't optimized for it. When you perform the aggregation, filtering and data manipulation in the dataset query, you are using the database engine, which is optimized for exactly that, so you will most likely get better performance this way. As for stored procedures versus plain SQL, there is no inherent performance benefit in either one (I prefer plain SQL only because it gives me more flexibility).
In terms of performance, SQL Server is optimized for that sort of thing;
Under certain circumstances, a stored procedure can significantly increase performance as it precompiles the query plan... in this case, unless the report is being called quite often, I don't know if you'll notice the difference. I do prefer to keep my SQL out of the report, though.
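As a sketch of that split (procedure, table and column names are hypothetical): the filtering, grouping and sorting live in a stored procedure, and the report dataset just calls it.

CREATE PROCEDURE dbo.rpt_SalesByRegion
    @Year int
AS
BEGIN
    SET NOCOUNT ON;

    -- All of the WHERE / GROUP BY / ORDER BY work happens in the engine;
    -- SSRS only has to display the result
    SELECT Region, SUM(Amount) AS TotalAmount
    FROM dbo.Sales
    WHERE OrderYear = @Year
    GROUP BY Region
    ORDER BY Region;
END;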
I have created an SSRS report that retrieves 55,000 records using a stored procedure. When I execute the stored procedure it takes just 3 seconds, but when I run the SSRS report it takes more than one minute. How can I solve this problem?
The additional time could be due to Reporting Services rendering the report in addition to querying the data. For example if you have 55,000 rows returned for the report and the report server then has to group, sort and/or filter those rows to render the report then that could take additional time.
I would have a look at the way the data is being grouped and filtered in the report, then review your stored procedure to see if you could offload some of that processing to the SQL code, maybe using some parameters. Aim to reduce the number of rows returned to the report to the minimum needed to render it, and preferably avoid doing the grouping and filtering in the report itself.
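A sketch of what offloading that work might look like (procedure, table and column names are hypothetical): the grouping and filtering the report was doing over 55,000 rows move behind parameters in the stored procedure.

ALTER PROCEDURE dbo.rpt_GetOrders
    @Region    varchar(50),
    @StartDate date
AS
BEGIN
    SET NOCOUNT ON;

    -- Return only the rows (and grain) the report actually renders
    SELECT Region, OrderDate, COUNT(*) AS OrderCount, SUM(Amount) AS TotalAmount
    FROM dbo.Orders
    WHERE Region = @Region
      AND OrderDate >= @StartDate
    GROUP BY Region, OrderDate
    ORDER BY OrderDate;
END;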
I had this problem because of parameter sniffing in my stored procedure. When I ran the stored procedure in SQL Server Management Studio it was recompiled with a new execution plan (and the call was very fast), but my reports kept using the old, bad plan (compiled for a different set of parameters) and the load time was much longer than in SSMS.
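If you suspect the same thing, one quick check (the procedure name here is hypothetical) is to throw the cached plan away and see whether the report speeds up:

-- Marks the procedure for recompilation; a fresh plan is built on its next execution
EXEC sp_recompile N'dbo.rpt_GetOrders';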
In the ReportServer database you will find a table called ExecutionLog. Look up the catalog ID of your report and check the latest execution instance; this will give you the break-up of the times taken - for data retrieval, for processing, for rendering, etc.
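A sketch of that lookup, assuming the default ReportServer database name and a hypothetical report name:

USE ReportServer;  -- your report server database may be named differently

SELECT TOP (10)
       c.Name,
       e.TimeStart,
       e.TimeDataRetrieval,  -- ms spent running the dataset queries
       e.TimeProcessing,     -- ms spent grouping/sorting/filtering inside SSRS
       e.TimeRendering,      -- ms spent producing the output
       e.[RowCount],
       e.Status
FROM dbo.ExecutionLog AS e
JOIN dbo.Catalog AS c ON c.ItemID = e.ReportID
WHERE c.Name = 'MyReport'    -- hypothetical report name
ORDER BY e.TimeStart DESC;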
Use the SSRS Performance Dashboard reports to debug your issues.
An archaic question, but because things like this keep recurring: my "quick and dirty" solution to improve SSRS, which works perfectly in large enterprise environments (I am rendering reports that can have 100,000+ lines daily), is to properly set the InteractiveSize of the page (for example, setting it to A4 size, 21 cm). When InteractiveSize is set to 0, all results are rendered as a single page, and this literally kills the performance of SSRS. In cases like that, queries that take a few seconds on your DB can take forever to render (or cause an out-of-memory exception unless you have tons of redundant H/W on your SSRS server).
So, for queries/SPs that execute reasonably fast when called directly and retrieve a large number of rows, set InteractiveSize and you won't need to bother with other, more sophisticated solutions.
I had a similar problem: a query that returns 4,000 rows and runs in 1 second on its own was taking so long in SSRS that it timed out.
It turned out that the issue was caused by the way SSRS was handling a multi-valued parameter. Interestingly, if the user selected multiple values, the report rendered quickly (~1 second), but if only a single value was selected, the report took several minutes to render.
Here is the original query that was taking more than 100x longer to render than it should:
SELECT ...
FROM ...
WHERE filename IN (@file);
-- @file is an SSRS multi-value parameter passed directly to the query
I suspect the issue was that SSRS was bringing in the entire source table (over 1 million rows) and then performing a client-side filter.
To fix this, I ended up passing the parameter into the query through an expression, so that I could control the filter myself. That is, in the "DataSet Properties" window, on the "Parameters" screen, I replaced the parameter value with this expression:
=JOIN(Parameters!file.Value,",")
... (I also gave the result a new name: filelist) and then I updated the query to look like this:
SELECT ...
FROM ...
WHERE ',' + @filelist + ',' LIKE '%,' + FILENAME + ',%';
-- @filelist is passed to the query as the following expression:
-- =JOIN(Parameters!file.Value,",")
I would guess that moving the query to a stored procedure would also be an effective way to alleviate the problem (because SSRS basically does the same JOIN before passing a multi-value parameter to a stored procedure). But in my case it was a little simpler to handle it all within the report.
Finally, I should note that the LIKE operator is maybe not the most efficient way to filter on a list of items, but it's certainly much simpler to code than a split-string approach, and the report now runs in about a second, so splitting the string didn't seem worth the extra effort.
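For reference, the split-string alternative I decided against would look something like this on SQL Server 2016 or later (table and column names other than FILENAME are hypothetical; older versions would need a user-defined splitter function):

SELECT f.FILENAME, f.ErrorCount
FROM dbo.FileLog AS f
WHERE f.FILENAME IN (SELECT value FROM STRING_SPLIT(@filelist, ','));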
Obviously getting the report running correctly (i.e. taking the same order of magnitude of time to select the data as SSMS) would be preferable but as a work around, would your report support execution snapshots (i.e. no parameters, or parameter defaults stored in the report)?
This will allow a scheduled snapshot of the data to be retrieved and stored beforehand, meaning SSRS only needs to process and render the report when the user opens it. Should reduce the wait down to a few seconds (depending on what processing the report requires. YMMV, test to see if you get a performance improvement).
Go to the report's properties tab in Report manager, select Execution, change to Render this report from a report execution snapshot, specify your schedule.
The primary solution to speeding up SSRS reports is to cache them. If one does this (by preloading the cache at 7:30 am, for instance, or by caching the reports on first hit), one will find massive gains in load speed.
Please note that I do this daily and professionally and am not simply waxing poetic on SSRS.
Caching in SSRS
http://msdn.microsoft.com/en-us/library/ms155927.aspx
Pre-loading the Cache
http://msdn.microsoft.com/en-us/library/ms155876.aspx
If you do not like the initial report taking long and your data is relatively static over the day (a daily general ledger or the like), you may increase the cache lifespan.
Finally, you may also opt for business managers to receive these reports via email subscriptions instead, which will send them a point-in-time Excel report that they may find easier and more systematic.
You can also use parameters in SSRS to allow for easy parsing by the user and faster queries. In the query builder, type IN(@SSN) under the Filter column for the field you wish to parameterize; you will then find the parameter created in the Parameters folder just above Data Sources in the upper left of your BIDS GUI.
(If you do not see the data source section in SSRS, hit CTRL+ALT+D.)
See a nearly identical question here: Performance Issues with SSRS
A few things can be done to improve the performance of the report:
1. Enable caching on the report manager and set a time period to refresh the cache.
2. Apply indexing on the backend database tables that are used as sources for the report. Although your stored procedure already returns the data quickly, indexing can further improve performance at the backend level (see the sketch after this list).
3. Use shared datasets instead of using embedded datasets in the report and apply caching on all these datasets as well.
4. If possible, set the parameters to load default values.
5. Try to reduce the data that is selected by the stored procedure, e.g. if the report contains historical data which is of no use, a filter can be added to exclude that data.
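As a sketch of point 2 (table, column and index names are hypothetical), a covering index on the columns the report's stored procedure filters on and returns:

-- Supports a report query that filters on OrderDate and returns Region and Amount
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    INCLUDE (Region, Amount);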
I experienced the same issue. Query ran in SQL just fine but was slow as anything in SSRS. Are you using an SSRS parameter in your dataset? I've found that if you use the report parameter directly in certain queries, there is a huge performance hit.
Instead, if you have a report parameter called @reportParam, simply do the following in the dataset query:
declare @reportParamLocal int
set @reportParamLocal = @reportParam
-- filter on the local variable so the cached plan is not sniffed for the report parameter's value
select * from Table A where A.field = @reportParamLocal
It's a little strange. I don't quite know why it works but it does for me.
One quick thing you may want to look at is whether elements on your report could be slowing down the execution.
For example, I have found massive execution time differences when converting between datetimes. Do any elements on the report use the CDate function? If so, you may want to consider doing your formatting at the SQL level.
Conversions in general can cause a massive slow down so take the time to look into your dataset and see what may be the problem.
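A sketch of pushing that conversion into the dataset query instead of a CDate/format expression per cell (table and column names are hypothetical):

-- Return the datetime already formatted as yyyy-mm-dd,
-- so the report does not have to convert it for every row
SELECT OrderID,
       CONVERT(varchar(10), OrderDate, 120) AS OrderDateText
FROM dbo.Orders;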
This is a bit of a mix of the answers above, but do your best to get the data back from your stored procedure in the simplest and most finished format. I do all my sorting, grouping and filtering up on the server. The server is built for this and I just let reporting services do the pretty display work.
I had the same issue ... it was indeed the rendering time but more specifically, it was because of the SORT being in SSRS. Try moving your sort to the stored procedure and removing any SORT from the SSRS report. On 55K rows, this will improve it dramatically.
Further to @RomanBadiornyi's answer, try adding
OPTION (RECOMPILE)
to the end of your main query in the stored procedure.
This will ensure the query is recompiled for different parameters each time, in case different parameters need a different execution plan.
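In other words (procedure, table and column names are hypothetical), the hint goes on the main SELECT inside the procedure:

ALTER PROCEDURE dbo.rpt_GetOrders
    @Region varchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderDate, Amount
    FROM dbo.Orders
    WHERE Region = @Region
    OPTION (RECOMPILE);  -- a fresh plan per execution, so each parameter set gets its own plan
END;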