Arithmetic overflow HOLAP/ROLAP but not MOLAP - ssis

I have a problem with processing mdx queries after create ROLAP/HOLAP cube, if I create MOLAP cube everything works fine (processing time about 0-2000 miliseconds), but when I change cube structure to ROLAP/HOLAP, my mdx queries invokes very long time (20min + and they never end), or then they (Rolap/Holap cubes) throws the arithmetic overflow error. In my data warehouse I have about 20 milions (for US bilions) records. I use Visual Studio 2013 Data Tools and Microsoft SQL Server 2014.
Here is error which I get:
arithmetic overflow error converting expression to data type int 22003
I will be very grateful for help!

MOLAP:
Data is preprocessed and stored in cube. So whenever an MDX query runs, it picks up aggregated data directly from cube. The complexity of MDX has no effect on the relational source of data underneath. Bottom line: It is fast and independent of data source(once the cube is processed).
ROLAP:
Whenever an MDX query runs, it is in turn converted to SQL, crudely speaking, in the background and data is fetched from the relational data source. Since data is fetched real-time, these are obviously slow(As slow as a standard optimized SQL query. It can never match up to MOLAP in terms of speed, unless the MDX is very simple one. Also, since it is closely tied with the relational database, it generally comes with limitations associated with the relational database beneath(data type overflow, being one of them). Bottom line: It is real time but comes at the cost of being slow and dependent on the underlying relational data source.
HOLAP
It tries to combine best of both worlds but still has the same limitations of ROLAP.
Bottom most line!! Thus your queries are slow and error prone if you happen to use ROLPA/HOLAP.

Related

Database (OLTP) and Reporting

I am working on a trading platform that has reporting as a big portion of its business.
The set up is the following:
SQL OLTP database (about 200 tables) – rather small in number of records. (20,000 records the biggest table – but keeps growing every week)
For reporting services, SQL views are being used to query the Live Transaction Database. Imagine the result set of the views a de-normalized one, in the spirit of a data warehouse approach. Then these data sets are passed to a third party Reporting platform (like Tableau, Power Bi or SiSense), which take these data sets and throws them into Cubes (probably some columnar structure, like mono db, hadoop, etc). From there the Reports are getting generated.
Current challenges.
The SQL views (about 8). Are huge and very hard to maintain. To give you an example, one of the views outputs 100 fields. But each of these fields are calculated fields with complicated CASE statements, nested IF statements, inline Functions, and what not, which makes this view as big as 700 lines of sql code. I inherited these from anther employee and now, sadly, I have to maintain them.
Because the data grows weekly by several hundreds records (through migration and transactions) and the number of fields in the views also grow (a few every week), the cube building takes longer and longer. To give you an example, few months ago we set up the cube re-built ever 10 minutes to refresh the data (which was taking 5 minutes). Currently takes 12-15 minutes to build, so we set it up every 30 minutes. As you can imagine, this will get worse as data and the number of fields keep growing; and we kind of need the data as current as possible.
The only good thing is that once the cube is built, the reports load fast because they are being pulled form the 3rd party platform, so no concerns here.
What I have in mind
I would like to get rid of the views so I could ease the process of maintenanace and also keep at minimum the duration of the cube re-built.
Options:
to build a data warehouse. And then build SSIS packages to populate this structure with the live transactional data. The de-normalized structure would probably look very similar the views mentioned above. The draw back here is that I don’t really feel like I simplify much, actualy adding one more layer, which is the data migration from the OLTP to the OLAP (datawarehouse). And I would still have to re-build the cube.
To turn the current views into SQL Indexed Views (materialized views), but in their current state, I simply cannot do it because of the agregate and inline functions used a lot across the entire view.
Another option I red about is to built a ODS (Operational Data Store – which would be a databse that would contain the necessary tables similar to the sql views I have now – and refresh it constantly) Maybe using triggers, or, Transaction logs? But I am not sure what involves to built such thing and how hard is to maintain.
Question:
What approach should I take?
Do any of the 3 above make any sense?
Of course, I am interested in other ideas or suggestions, as well.
Thank you!
from my experience your best approach will be 1. It is costly, but will give you better benefits . Create a ROLAP DWH(i recommend Kimball's "The data warehouse toolkit" for best practices and design patterns), if you have the oportunity use a columnar data store(like amazon redshift, or sap sybase iq) and all the case statements ,nested ifs and all operations that you mentioned, would be applied on ETL time, so in the ROLAP everything is precalculated and optimized to consumption. And dont forget about aplying indexes(depending on the relying technology you use) . Some database vendors already published "indexing best practices" for ROLAP so they will tell you which type of index aply depending of the type of table(dimension) and data type for example.

Where to Aggregate Using Microsoft Reporting Services?

I'm working on my first SSRS report and I haven't been able to find general guidelines as to how to create reports. Specifically, I would like to know what the general approach is when aggregate data is needed on a report. For example, let's say I need to show the following in my report:
Pancakes ---34
Eggs----------56
Bacon--------73
I have a several more rows like the above that need to show aggregate data. I'm currently grouping the whole row by type and then on each cell I'm showing a count as follows: [Count(Status)].
My report is already taking 45+ seconds to run. Is it generally preferable to do aggregation like this in the query? Or does this depend on the amount of data being returned? Any pointers are greatly appreciated. Thanks!
As with all SQL answers: it depends.
But generally do your aggregation in SQL. SQL server is much better at performing aggregation than the report layer. Also bringing back less rows will reduce your data transfer and the amount of data which SSRS needs to process. Usually you would only want to do the aggregation at the report layer if there are other constraints which make doing it in the SQL query impossible or if doing so will make the report more difficult to maintain in the future. (There's certainly something to be said for sacrificing a bit of performance in the name of maintainability.) One case would be when you need to display all of the data and returning two datasets is either too complicated or actually slows down the performance of the report.
As a side note, if your report is taking 45+ seconds to run then likely your SQL is not optimized very well or your report is doing a lot of complicated calculations. The more work you can put back on the SQL server the better your performance will be. SQL Server is made for crunching numbers and doing aggregations so certainly let it do what it does best when you can.
YMMV, so always do performance testing for different methods to see what works best.

Cube vs. Micro Cube in BusinessObjects

I am having problems understanding cubes and microcubes in BusinessObjects environment.
Although I have tried to find answers online, I did not find an answers that give overall explanations.
Beside description of the functionality, I would like to know where is cube and where is micro cube located: on server or in browser?
How many cubes/microcubes are there? One microcube per report or one micro cube per session, or sthg else?
Furthermore, can anyone explain the difference in aggregations on database level as opossed to aggregation on report level (when defining a measure, there are two possibilities - to define aggregation on report and/or aggregation level). Although there are some answers online, they are too general. Therefore, I would appreciate a simple explanation with an example.
And finaly, is it possible to color tables in data foundation layer? (since I have a lot of tables in a universe, it would be very helpful if I could color fact and dimensional tables).
It helps to understand the two-pass nature of querying traditional (non-OLAP) data in BO. When you refresh a report in BO, the report engine constructs a SQL statement based on the objects (results and conditions) that are specified in the query.
The SQL statement is sent to the database, and the report engine then retrieves the data that is returned from the database. This data becomes the microcube -- it is nothing more than a tabular representation of the data that was received from the database. If you could look at it visually, it would look identical to what you would get by running the SQL statement in a traditional SQL tool (such as TOAD or SQL*Plus).
The microcube then becomes the source for the report presentation. Your could, for example, create a table in a report by dragging in dimensions and measures from the object list. As you are dragging in these objects, the table will recalculate and redisplay as appropriate. This is all done by retrieving and calculating the data from the microcube, not from the source data. That is, as you are interacting with a report tab, you are not initiating a refresh from the database -- all of the calculation is done from the microcube.
For example, let's say you have created a new query with two dimensions (State, Region) and one measure (Sales). The SQL might look like this:
select state_nm,region_nm,sum(sales_amt)
from all_sales
group by state_nm,region_nm
Note that there is a sum() applied to sales_amt. This is causing the database to perform the initial aggregation on this field.
The microcube that is created after the refresh may look like:
AL North 30000
AL South 40000
AR North 5000
AR South 10000
Now you create a table in your report, and select just State and Sales. The report engine uses the data in the microcube to display the result:
AL 70000
AR 5000
In this case, the BO report engine has performed a sum() aggregation on Sales Amt. This is report-side aggregation.
Then you remove State. Again, the report engine aggregates the table from the microcube, not the database:
75000
The microcube is stored with the report file. If you are working with a WebI report via InfoView, then the microcube is loaded into the server's memory. When you save the report, a physical file is created on the server (with a .wid extension); the microcube is stored in that file. If you open the report again, the microcube is again loaded into the server's memory and is available for interaction.
If you are using Rich Client, then the same behavior applies, it's just using your workstation and local storage.
The term "cube" is generally used to describe data sources in an OLAP environment (SSAS or Essbase, for example). This is external to BO -- BO would be querying data from an OLAP cube.
Regarding the aggregation:
Database aggregation is performed by your RDBMS on the source data, before it is transferred to the client application (e.g. Web Intelligence). You can apply database aggregation by using a statement such as SUM() or COUNT() in the select statement of your measure (in the business layer of your universe). It only makes sense for measures objects, not dimensions.
Changes the data retrieved from the database
Can have a positive impact on performance due to a small dataset being returned
Leverages the database aggregate performance
.
Projection aggregation or report aggregation is the aggregation performed by the client application (e.g. Web Intelligence) after retrieving the data from the database, thus in-memory.
Happens on the fly as the dimensions change to which the measure is projected (hence projection aggregation)
The result set retrieved from the database remains the same
Regarding the table colours: have a look at the tutorial Apply color to tables that share the same information. This should show you how to configure the colours for the tables in the data foundation.

Looking for approach to limit TempDb space needed. Does a VIEW have any place here?

We have a process that executes the following SQL pattern on a very large amount of data:
INSERT INTO Target
SELECT
Col1,Col2,Col33,Col44, (...) Col30
,SUM(Col31)
,SUM(Col32)
FROM
SOURCE
GROUP BY
1,2,3,4 ...30
Because of the large numbers of rows in the Source table and the large number of columns in the Group by clause, this results in very high TEMPDB space usage and on one occasion we ran out of space.
Given that the Business requirements dictate grouping by an usual number of columns, we need to find a way to reduce the amount of TEMPDB space that we use without effecting the performance of the resulting Target table, which serves as our main reporting table.
We were thinking to get the the Reporting Months in the SourceTable and then create a CURSOR in which we individually loop and execute the above SQL only WHERE Source.ReportingMonth = #CurrentReportingMonth from the Cursor and do this until all of the ReportingMonths have been processed. Unfortunately, historical data is allowed to change and so all of the data for each month must be examined each time we process a monthly cycle. The data in each month is approximately teh same volume.
When we told our DBA that this was our intent, his response was "I think that is a very good start, however, if the resultant table is basically used for reporting and not futher aggregation, we would probably be better off just replacing the table with a view since there is very little aggregation actually performed."
My thought is that a view still results in the execution of the SQL and that by converting the SQL to a view, performance could be impacted because the SQL may be executed many times by reports that need the data that used to be stored permanently in a persistent physical table. In SQL Server, are their views that can be persisted for performance reasons to avoid having to execute the SQL multiple times? If we only have one process that runs monthly to populate the Target table, is there any advantage to turning the Target table into a view of the Source table?
Q1: Is our cursoring idea a reasonable approach to pursue to solve the TEMPDB space issue?
Q2: Does a VIEW have any place here?
Note: we also suggested that the users identify if there is data that is "old enough" to be archived.

SSRS Performance

I Have created an SSRS Report for retrieving 55000 records using a Stored Procedure. When
executing from the Stored Proc it is taking just 3 seconds but when executing from SSRS report it is taking more than one minute. How can I solve this problem?
The additional time could be due to Reporting Services rendering the report in addition to querying the data. For example if you have 55,000 rows returned for the report and the report server then has to group, sort and/or filter those rows to render the report then that could take additional time.
I would have a look at the way the data is being grouped and filtered in the report, then review your stored procedure to see if you could offload some of that processing to the SQL code, maybe using some parameters. Try and aim to reduce the the amount of rows returned to the report to be the minimum needed to render the report and preferably try to avoid doing the grouping and filtering in the report itself.
I had such problem because of parameters sniffing in my SP. In SQL Management Studio when I run my SP, I recreated it with new execution plan (and call was very fast), but my reports used old bad plan (for another sequence of parameters) and load time was much longer than in SQL MS.
in the ReportServerDB you will find a table called ExecutionLog. you got to look up the catalog id of your report and check the latest execution instance. this can tell you the break-up of the times taken - for data retrieval, for processing, for rendering etc.
Use the SSRS Performance Dashboard reports to debug your issues.
Archaic question but because things like that are kinda recurring, my "quick and dirty" solution to improve SSRS, which works perfectly on large enterprise environments (I am rendering reports that can have up to 100.000+ lines daily) is to properly set the InteractiveSize of the page (for example, setting it to A4 size -21 cm ). When InteractiveSize is set to 0, then all results are going to be rendered as single page and this literally kills the performance of SSRS. In cases like that, queries that can take a few seconds on your DB can take forever to render (or cause an out of memory exception unless you have tons of redundant H/W on your SSRS server).
So, in cases of queries/ SP's that execute reasonably fast on direct call and retrieve large number of rows, set InteractiveSize and you won't need to bother with other, more sophisticated solutions.
I had a similar problem: a query that returns 4,000 rows and runs in 1 second on it's own was taking so long in SSRS that it timed out.
It turned out that the issue was caused by the way SSRS was handling a multi-valued parameter. Interestingly, if the user selected multiple values, the report rendered quickly (~1 second), but if only a single value was selected, the report took several minutes to render.
Here is the original query that was taking more than 100x longer to render than it should:
SELECT ...
FROM ...
WHERE filename IN (#file);
-- #file is an SSRS multi-value parameter passed directly to the query
I suspect the issue was that SSRS was bringing in the entire source table (over 1 million rows) and then performing a client-side filter.
To fix this, I ended up passing the parameter into the query through an expression, so that I could control the filter myself. That is, in the "DataSet Properties" window, on the "Parameters" screen, I replaced the parameter value with this expression:
=JOIN(Parameters!file.Value,",")
... (I also gave the result a new name: filelist) and then I updated the query to look like this:
SELECT ...
FROM ...
WHERE ',' + #filelist + ',' LIKE '%,' + FILENAME + ',%';
-- #filelist is passed to the query as the following expression:
-- =JOIN(Parameters!file.Value,",")
I would guess that moving the query to a stored procedure would also be an effective way to alleviate the problem (because SSRS basically does the same JOIN before passing a multi-value parameter to a stored procedure). But in my case it was a little simpler to handle it all within the report.
Finally, I should note that the LIKE operator is maybe not the most efficient way to filter on a list of items, but it's certainly much simpler to code than a split-string approach, and the report now runs in about a second, so splitting the string didn't seem worth the extra effort.
Obviously getting the report running correctly (i.e. taking the same order of magnitude of time to select the data as SSMS) would be preferable but as a work around, would your report support execution snapshots (i.e. no parameters, or parameter defaults stored in the report)?
This will allow a scheduled snapshot of the data to be retrieved and stored beforehand, meaning SSRS only needs to process and render the report when the user opens it. Should reduce the wait down to a few seconds (depending on what processing the report requires. YMMV, test to see if you get a performance improvement).
Go to the report's properties tab in Report manager, select Execution, change to Render this report from a report execution snapshot, specify your schedule.
The primary solution to speeding SSRS reports is to cache the reports. If one does this (either my preloading the cache at 7:30 am for instance) or caches the reports on-hit, one will find massive gains in load speed.
Please note that I do this daily and professionally and am not simply waxing poetic on SSRS
Caching in SSRS
http://msdn.microsoft.com/en-us/library/ms155927.aspx
Pre-loading the Cache
http://msdn.microsoft.com/en-us/library/ms155876.aspx
If you do not like initial reports taking long and your data is static i.e. a daily general ledger or the like, meaning the data is relatively static over the day, you may increase the cache life-span.
Finally, you may also opt for business managers to instead receive these reports via email subscriptions, which will send them a point in time Excel report which they may find easier and more systematic.
You can also use parameters in SSRS to allow for easy parsing by the user and faster queries. In the query builder type IN(#SSN) under the Filter column that you wish to parameterize, you will then find it created in the parameter folder just above data sources in the upper left of your BIDS GUI.
[If you do not see the data source section in SSRS, hit CTRL+ALT+D.
See a nearly identical question here: Performance Issuses with SSRS
Few things can be done to improve the performance of the report which are as below:
1. Enable caching on the report manager and set a time period to refresh the cache.
2. Apply indexing on all the backend database tables that are used as a source in the report, although your stored procedure is already taking very less time in rendering the data but still applying indexing can further improve the performance at the backend level.
3. Use shared datasets instead of using embedded datasets in the report and apply caching on all these datasets as well.
4. If possible, set the parameters to load default values.
5. Try to reduce the data that is selected by the stored procedure, e.g. if the report contains historical data which is of no use, a filter can be added to exclude that data.
I experienced the same issue. Query ran in SQL just fine but was slow as anything in SSRS. Are you using an SSRS parameter in your dataset? I've found that if you use the report parameter directly in certain queries, there is a huge performance hit.
Instead, if you have a report parameter called #reportParam, in the dataset, simply do the following:
declare #reportParamLocal int
set #reportParamLocal = #reportParam
select * from Table A where A.field = #reportParam
It's a little strange. I don't quite know why it works but it does for me.
One quick thing you may want to look at is whether elements on your report could be slowing down the execution.
For example i have found massive execution time differences when converting between dateTimes. Do any elements on report use the CDate function? If so you may want to consider doing your formatting at the sql level.
Conversions in general can cause a massive slow down so take the time to look into your dataset and see what may be the problem.
This is a bit of a mix of the answers above, but do your best to get the data back from your stored procedure in the simplest and most finished format. I do all my sorting, grouping and filtering up on the server. The server is built for this and I just let reporting services do the pretty display work.
I had the same issue ... it was indeed the rendering time but more specifically, it was because of the SORT being in SSRS. Try moving your sort to the stored procedure and removing any SORT from the SSRS report. On 55K rows, this will improve it dramatically.
Further to #RomanBadiornyi's answer, try adding
OPTION (RECOMPILE)
to the end of your main query in the stored procedure.
This will ensure the query is recompiled for different parameters each time, in case different parameters need a different execution plan.