I'm showing a table with only the actuals from week x, while below it I'm showing a graph with the same data but over the last 13 periods (weeks x-13 through x).
Performance-wise, would you use a separate dataset for each object, or one dataset and filter the table down to the last week?
Thanks!
If the query is pretty intensive, it will obviously be quicker to call it once and filter in the report. The same is true if the connection is slow, or if the server performance is awful.
However, if the client machine (or the report server) runs like a dog, then you might fare better running two separate queries.
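For example, a single shared dataset along these lines (a sketch -- the Actuals table, its columns, and the @CurrentWeek parameter are all made up) can feed both objects; the chart uses all 13 weeks, while the table gets a filter such as WeekNumber = @CurrentWeek:

SELECT WeekNumber, Category, SUM(ActualValue) AS ActualValue
FROM Actuals
WHERE WeekNumber BETWEEN @CurrentWeek - 12 AND @CurrentWeek
GROUP BY WeekNumber, Category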
I'm working on my first SSRS report and I haven't been able to find general guidelines as to how to create reports. Specifically, I would like to know what the general approach is when aggregate data is needed on a report. For example, let's say I need to show the following in my report:
Pancakes    34
Eggs        56
Bacon       73
I have several more rows like the above that need to show aggregate data. I'm currently grouping the whole row by type, and in each cell I'm showing a count as follows: [Count(Status)].
My report is already taking 45+ seconds to run. Is it generally preferable to do aggregation like this in the query? Or does this depend on the amount of data being returned? Any pointers are greatly appreciated. Thanks!
As with all SQL answers: it depends.
But generally, do your aggregation in SQL. SQL Server is much better at performing aggregation than the report layer. Bringing back fewer rows also reduces the data transfer and the amount of data SSRS needs to process. Usually you would only do the aggregation at the report layer if other constraints make doing it in the SQL query impossible, or if doing so would make the report more difficult to maintain in the future. (There's certainly something to be said for sacrificing a bit of performance in the name of maintainability.) One such case is when you need to display all of the detail data and returning two datasets is either too complicated or actually slows down the performance of the report.
As a side note, if your report is taking 45+ seconds to run then likely your SQL is not optimized very well or your report is doing a lot of complicated calculations. The more work you can put back on the SQL server the better your performance will be. SQL Server is made for crunching numbers and doing aggregations so certainly let it do what it does best when you can.
YMMV, so always do performance testing for different methods to see what works best.
I am having problems understanding cubes and microcubes in a BusinessObjects environment.
Although I have tried to find answers online, I did not find any that give an overall explanation.
Besides a description of the functionality, I would like to know where the cube and the microcube are located: on the server or in the browser?
How many cubes/microcubes are there? One microcube per report, one per session, or something else?
Furthermore, can anyone explain the difference between aggregation at the database level and aggregation at the report level? (When defining a measure, there are two possibilities: aggregation at the database level and/or at the report level.) Although there are some answers online, they are too general, so I would appreciate a simple explanation with an example.
And finally, is it possible to color tables in the data foundation layer? (Since I have a lot of tables in a universe, it would be very helpful if I could color fact and dimension tables differently.)
It helps to understand the two-pass nature of querying traditional (non-OLAP) data in BO. When you refresh a report in BO, the report engine constructs a SQL statement based on the objects (results and conditions) that are specified in the query.
The SQL statement is sent to the database, and the report engine then retrieves the data that is returned from the database. This data becomes the microcube -- it is nothing more than a tabular representation of the data that was received from the database. If you could look at it visually, it would look identical to what you would get by running the SQL statement in a traditional SQL tool (such as TOAD or SQL*Plus).
The microcube then becomes the source for the report presentation. You could, for example, create a table in a report by dragging in dimensions and measures from the object list. As you drag in these objects, the table recalculates and redisplays as appropriate. This is all done by retrieving and calculating the data from the microcube, not from the source data. That is, as you are interacting with a report tab, you are not initiating a refresh from the database -- all of the calculation is done from the microcube.
For example, let's say you have created a new query with two dimensions (State, Region) and one measure (Sales). The SQL might look like this:
select state_nm,region_nm,sum(sales_amt)
from all_sales
group by state_nm,region_nm
Note that there is a sum() applied to sales_amt. This is causing the database to perform the initial aggregation on this field.
The microcube that is created after the refresh may look like:
AL North 30000
AL South 40000
AR North 5000
AR South 10000
Now you create a table in your report, and select just State and Sales. The report engine uses the data in the microcube to display the result:
AL 70000
AR 15000
In this case, the BO report engine has performed a sum() aggregation on Sales Amt. This is report-side aggregation.
Then you remove State. Again, the report engine aggregates the table from the microcube, not the database:
85000
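Conceptually, those two report-side aggregations are what you would get by querying the microcube itself instead of the database (a sketch only -- "microcube" is not a real table you can query; it just illustrates what the report engine computes in memory):

select state_nm, sum(sales_amt) from microcube group by state_nm
select sum(sales_amt) from microcube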
The microcube is stored with the report file. If you are working with a WebI report via InfoView, then the microcube is loaded into the server's memory. When you save the report, a physical file is created on the server (with a .wid extension); the microcube is stored in that file. If you open the report again, the microcube is again loaded into the server's memory and is available for interaction.
If you are using Rich Client, then the same behavior applies; it just uses your workstation and local storage.
The term "cube" is generally used to describe data sources in an OLAP environment (SSAS or Essbase, for example). This is external to BO -- BO would be querying data from an OLAP cube.
Regarding the aggregation:
Database aggregation is performed by your RDBMS on the source data, before it is transferred to the client application (e.g. Web Intelligence). You apply database aggregation by using a function such as SUM() or COUNT() in the SELECT statement of your measure (in the business layer of your universe). It only makes sense for measure objects, not dimensions.
Changes the data retrieved from the database
Can have a positive impact on performance due to a small dataset being returned
Leverages the database aggregate performance
Projection aggregation or report aggregation is the aggregation performed by the client application (e.g. Web Intelligence) after retrieving the data from the database, thus in-memory.
Happens on the fly as the dimensions against which the measure is projected change (hence "projection aggregation")
The result set retrieved from the database remains the same
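For example, the Sales measure from the query above could be defined in the business layer with a SELECT of (a sketch, reusing the hypothetical all_sales table):

sum(all_sales.sales_amt)

That SUM() is the database aggregation; the measure's projection function (e.g. Sum) is what Web Intelligence applies when the measure is displayed against fewer dimensions in a report block.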
Regarding the table colours: have a look at the tutorial Apply color to tables that share the same information. This should show you how to configure the colours for the tables in the data foundation.
I am building a large application where I am using column grouping in my reports. Unfortunately, the performance is pitifully slow and my customer is complaining about it. As an example, if they run a report for a 24-hour period, it takes ~10 minutes to return (~800 display pages of data). If they run it for a month, it may never return!
The query itself for a 24-hour period returns in ~20 seconds. The balance of the time is spent pivoting and generating the report.
Do you have any suggestions as what I could do?
Thanks!
Check to make sure you are sorting on the report side as opposed to the query side. This can speed things up. Sorting groups or sorting by aggregate values is much simpler in the report than in the query and is frequently more efficient also.
Take a look at these tips for speeding up performance.
Troubleshooting Reports: Report Performance
Reporting Services has three stages in creating a report:
Data retrieval (executing the queries to return the dataset results)
Processing (grouping and aggregating the returned data according to the report layout)
Rendering (generating the selected output, e.g. HTML, Excel)
If your query returns data in ~20 seconds but the report takes 10 minutes, then the main issue is the speed of the report processing (rendering is rarely a performance bottleneck), as you rightly assumed. The best way to improve performance is to offload as much of the aggregation as possible to the source database by rewriting your query to do the aggregation there. The database platform is usually much faster at aggregating data than Reporting Services. Ideally you want to return the minimum amount of data required for the report, so that it has as little processing to do as possible.
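For example, if the matrix groups rows by device and columns by hour, a query along these lines (a sketch -- the table and column names are made up, since the actual schema isn't shown) returns one pre-aggregated row per row/column combination, so the report only has to pivot rather than aggregate:

SELECT DeviceId,
       DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0) AS EventHour,
       COUNT(*) AS EventCount
FROM dbo.Events
WHERE EventTime >= @StartTime AND EventTime < @EndTime
GROUP BY DeviceId, DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0)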
If you have access to the ReportServer database, run this query to confirm where the bottleneck is:
SELECT ItemPath, Format, TimeStart, TimeEnd, TimeDataRetrieval, TimeProcessing, TimeRendering, [Status], ByteCount, [RowCount]
FROM ExecutionLog3
WHERE ItemPath LIKE '%My Report Name%'
TimeDataRetrieval, TimeProcessing and TimeRendering should give you a clear idea of where the problem lies. If the issue is with TimeProcessing, then try to rewrite the query to reduce the data sent to the report, and also review the possible design issues to see if any of them apply.
I don't know if this is the right place to ask a question like this, but here it goes:
I have an intranet-like Rails 3 application managing about 20k users that are organized in a nested set (preordered tree - http://en.wikipedia.org/wiki/Nested_set_model).
Those users enter stats (just plain numeric values). Entered stats are assigned to a category (we call it a Pointer) and a week number.
That data is further processed and computed into Results.
Some are computed from user activity plus the result of some other category... etc.
What a user enters isn't always the same as what he sees in the reports.
Those computations can be very tricky; some categories have very specific formulae.
But the rest is just "give me the sum of all entered values for this category, for this user, for this week/month/year".
The problem is that those stats also need to be summed over the subtree of users under a selected user (so it basically returns the sum of all values for all users under that user, including the user itself).
This app has been in production for 2 years and is doing its job pretty well... but with more and more users it's also pretty slow when it comes to server-expensive reports, like "give me a list of all users under myself and their statistics: one line summed over their sub-group and one line for their personal stats". Of course, users want (and need) their reports to be as current as possible; 5 minutes to reflect newly entered data is too much for them. And this specific report is their favorite :/
To stay realtime, we cannot run those heavy SQL queries directly... that would kill the server. So I'm computing them only once via a background process, and the frontend just reads the results.
Those SQL queries are hard to optimize, and I'm glad I've moved away from that approach... (Caching is not an option; see below.)
The current app goes like this:
Frontend: when a user enters new data, it is saved to a simple MySQL table, like [user_id, pointer_id, date, value], and a request is also inserted into a queue.
Backend: a calc_daemon process checks the queue every 5 seconds for new "recompute requests". We pop the requests and determine what else needs to be recomputed along with them (pointers have dependencies... the simplest case: when you change week stats, month and year stats must be recomputed...). It does the recomputation the easy way: we select the data with customized, per-pointer SQL generated by their classes.
The computed results are then written back to MySQL, but into partitioned tables (one table per year). One row in such a table looks like [user_id, pointer_id, month_value, w1_value, w2_value, w3_value, w4_value]. This way each table has ~500k records (I've basically reduced the number of records 5x).
When the frontend needs those results, it does simple sums over the partitioned data, with 2 joins (because of the nested-set conditions).
The problem is that those simple SQL queries with sums, GROUP BY and the join on the subtree can take around 200ms each... just for a few records... and we need to run a lot of them... According to EXPLAIN they are optimized about as well as they can be, but they are just too heavy.
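For illustration, one of those subtree sums might look roughly like this (a sketch only -- the users table with nested-set lft/rgt columns and the results_2013 partition with its columns are assumptions based on the description above):

SELECT sub.id AS user_id, SUM(r.month_value) AS total
FROM users u
JOIN users sub ON sub.lft BETWEEN u.lft AND u.rgt
JOIN results_2013 r ON r.user_id = sub.id
WHERE u.id = ? AND r.pointer_id = ?
GROUP BY sub.id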
So... The QUESTION:
Can I rewrite this to use Redis (or another fast key-value store) and see any benefit while I'm using Ruby and Rails? As I see it, if I rewrite it to use Redis, I'll have to run many more queries against it than I do against MySQL and then perform the sums manually in Ruby... so performance could suffer considerably... I'm not really sure I could even express all the queries I have now in Redis... Loading the users in Rails and then asking "Redis, give me the sum for users 1, 2, 3, 4, 5..." doesn't seem like the right idea... But maybe there is some feature in Redis that could make this simpler?
Also, the tree structure needs to stay a nested set, i.e. I cannot keep one entry in Redis with a list of all child ids for a user (something like children_for_user_10: [1,2,3]), because the tree structure changes frequently... That's also the reason why I can't store those subtree sums in the partitioned tables: when the tree changes, I would have to recompute everything. That's why I compute those sums in realtime.
Or would you suggest rewriting this app in a different language (Java?) and computing the results in memory instead? :) (I've tried an SOA approach, but it failed because one way or another I ended up with XXX megabytes of data in Ruby... especially when generating the reports... and the GC just kills it...) (And a side effect is that generating one report blocks the whole Rails app :/ )
Suggestions are welcome.
Redis would be faster since it is an in-memory database, but can you fit all of that data in memory? Iterating over Redis keys is not recommended (as noted in the comments), so I wouldn't use it to store the raw data. However, Redis is often used for storing the results of sums (e.g. logging counts of events); it has a fast INCR command, for example.
I'm guessing that you would get a sufficient speed improvement by using a stored procedure, or a faster language than Ruby (e.g. inline C or Go), to do the recalculation. Are you doing GROUP BYs in the recalculation? Is it possible to change the GROUP BYs to code that orders the result set and then manually checks when the 'group' changes? For example, if you are looping by user and grouping by week inside the loop, change that to ordering by user and week, and keep variables for the current and previous values of user and week, as well as variables for the running sums.
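As a sketch of that idea (the stats table and its columns are assumptions), instead of something like

select user_id, week, sum(value)
from stats
group by user_id, week

you would run

select user_id, week, value
from stats
order by user_id, week

and accumulate the totals in the calling code, resetting the running sum whenever user_id or week differs from the previous row.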
This assumes the bottleneck is the recalculation; you don't really mention which part is too slow.
We have a report that users can run that needs to select records from 5 different services. Right now, I am using UNION to combine all the tables in one query, but sometimes it was just too much for the server and it crashed!
I optimized bits and pieces of the query (WHERE clauses and table joins) and there haven't been any crashes since, but the report still takes a long time to load (i.e. the query is very slow).
The question is: will MySQL perform faster if I create 5 temp tables for the different service types and then select from all of them? Or is there a better approach?
I could, of course, just use 5 separate SELECTs and then combine them in the code (PHP), but I imagine this would make the report load even more slowly...
Any ideas?
Usually the limiting factor in speed is the database, not PHP. I'd suggest running separate queries and letting PHP do the combining, and seeing if that is faster. If you're not storing all the data in arrays or doing other heavy processing, I suspect the PHP way is much faster.
(This was actually meant as a comment, but I don't have those rights yet.)