Cube vs. Micro Cube in BusinessObjects

I am having problems understanding cubes and microcubes in the BusinessObjects environment.
Although I have tried to find answers online, I did not find any that give an overall explanation.
Besides a description of the functionality, I would like to know where the cube and the microcube are located: on the server or in the browser?
How many cubes/microcubes are there? One microcube per report, one microcube per session, or something else?
Furthermore, can anyone explain the difference between aggregation at the database level and aggregation at the report level (when defining a measure, there are two possibilities: to define aggregation at the database level and/or at the report level)? Although there are some answers online, they are too general, so I would appreciate a simple explanation with an example.
And finally, is it possible to color tables in the data foundation layer? (Since I have a lot of tables in a universe, it would be very helpful if I could color fact and dimension tables.)

It helps to understand the two-pass nature of querying traditional (non-OLAP) data in BO. When you refresh a report in BO, the report engine constructs a SQL statement based on the objects (results and conditions) that are specified in the query.
The SQL statement is sent to the database, and the report engine then retrieves the data that is returned from the database. This data becomes the microcube -- it is nothing more than a tabular representation of the data that was received from the database. If you could look at it visually, it would look identical to what you would get by running the SQL statement in a traditional SQL tool (such as TOAD or SQL*Plus).
The microcube then becomes the source for the report presentation. You could, for example, create a table in a report by dragging in dimensions and measures from the object list. As you are dragging in these objects, the table will recalculate and redisplay as appropriate. This is all done by retrieving and calculating the data from the microcube, not from the source data. That is, as you are interacting with a report tab, you are not initiating a refresh from the database -- all of the calculation is done from the microcube.
For example, let's say you have created a new query with two dimensions (State, Region) and one measure (Sales). The SQL might look like this:
select state_nm,region_nm,sum(sales_amt)
from all_sales
group by state_nm,region_nm
Note that there is a sum() applied to sales_amt. This is causing the database to perform the initial aggregation on this field.
The microcube that is created after the refresh may look like:
AL North 30000
AL South 40000
AR North 5000
AR South 10000
Now you create a table in your report, and select just State and Sales. The report engine uses the data in the microcube to display the result:
AL 70000
AR 15000
In this case, the BO report engine has performed a sum() aggregation on Sales Amt. This is report-side aggregation.
Then you remove State. Again, the report engine aggregates the table from the microcube, not the database:
85000
The microcube is stored with the report file. If you are working with a WebI report via InfoView, then the microcube is loaded into the server's memory. When you save the report, a physical file is created on the server (with a .wid extension); the microcube is stored in that file. If you open the report again, the microcube is again loaded into the server's memory and is available for interaction.
If you are using Rich Client, the same behavior applies; it just uses your workstation and local storage.
The term "cube" is generally used to describe data sources in an OLAP environment (SSAS or Essbase, for example). This is external to BO -- BO would be querying data from an OLAP cube.

Regarding the aggregation:
Database aggregation is performed by your RDBMS on the source data, before it is transferred to the client application (e.g. Web Intelligence). You apply database aggregation by using a function such as SUM() or COUNT() in the SELECT of your measure (in the business layer of your universe). It only makes sense for measure objects, not dimensions.
- Changes the data retrieved from the database
- Can have a positive impact on performance due to a smaller dataset being returned
- Leverages the database's aggregation performance
Projection aggregation (or report aggregation) is the aggregation performed by the client application (e.g. Web Intelligence) after retrieving the data from the database, thus in memory.
- Happens on the fly as the dimensions to which the measure is projected change (hence "projection aggregation")
- The result set retrieved from the database remains the same
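To make the distinction concrete, here is a minimal sketch reusing the all_sales example from above; the measure definitions are illustrative, not taken from a real universe.
-- Database aggregation: the measure's SELECT in the universe business layer
-- contains the aggregate (e.g. the Sales measure is defined as sum(all_sales.sales_amt)),
-- so the generated query aggregates inside the RDBMS:
select state_nm, region_nm, sum(sales_amt)
from all_sales
group by state_nm, region_nm;

-- Without database aggregation the measure would be plain all_sales.sales_amt,
-- and every detail row would be shipped into the microcube:
select state_nm, region_nm, sales_amt
from all_sales;

-- In both cases the projection (report-side) aggregation is what rolls the
-- microcube rows up further when you drop dimensions from the report block.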
Regarding the table colours: have a look at the tutorial Apply color to tables that share the same information. This should show you how to configure the colours for the tables in the data foundation.

Related

Analyzing multiple JSON with Tableau

I'm beginning to use Tableau and I have a project involving multiple website logs stored as JSON. I have one log for each day for about a month, each weighing about 500-600 MB.
Is it possible to open (and join) multiple JSON files in Tableau? If yes, how? I can load them in parallel, but not join them.
EDIT: I can load multiple JSON files and define their relationship, so this is OK. I still have the memory issue:
I'm worried that by joining them all, I will not have enough memory to make it work. Are the loaded files stored in RAM or in an internal DB?
What would be the best way to do this? Should I merge all the JSON first, or load the files into a database and use a connector to Tableau? If so, what would be a good choice of DB?
I'm aware some of these questions are opinion-based, but I have no clue about this and I really need some guidelines to get started.
For this volume of data, you probably want to preprocess, filter, aggregate and index it ahead of time - either using a database, something like Parquet and Spark and/or Tableau extracts.
If you use extracts, you probably want to filter and aggregate them for specific purposes. Just be aware that if you aggregate the data when you make the extract, you need to be careful that any further aggregations you perform in the visualization are well defined. Additive functions like SUM(), MIN() and MAX() are safe: sums of partial sums are still correct sums. But averages of averages and count distincts of count distincts often are not.
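As a hedged illustration of why only additive functions re-aggregate safely, the sketch below uses a hypothetical pre-aggregated daily_summary table (all names invented for the example):
-- Hypothetical pre-aggregated extract: one row per day
-- daily_summary(log_date, total_hits, avg_response_ms, distinct_users)

-- Safe: a sum of daily sums equals the overall sum.
select sum(total_hits) as month_hits
from daily_summary;

-- Not safe: an average of daily averages weights every day equally,
-- regardless of how many requests each day actually had.
select avg(avg_response_ms) as misleading_avg
from daily_summary;

-- Not safe: summing daily distinct users double-counts users
-- who appear on more than one day.
select sum(distinct_users) as inflated_distinct_users
from daily_summary;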
Tableau sends a query to the database and then renders a visualization based on the query result set. The volume of data returned depends on the query, which depends on what you specify in Tableau. Tableau caches results, and you can also create an extract which serves as a persistent, potentially filtered and aggregated, cache. See this related Stack Overflow answer.
For text files and extracts, Tableau loads them into memory via its Data Engine process today -- to be replaced by a new in-memory database called Hyper in the future. The concept is the same, though: Tableau sends the data source a query which returns a result set. For data of the size you are talking about, you might want to test using some sort of database if the volume exceeds what comfortably fits in memory.
The JSON driver is very convenient for exploring JSON data, and I would definitely start there. You can avoid an entire ETL step if that serves your needs. But at high volume of data, you might need to move to some sort of external data source to handle production loads. FYI, the UNION feature with Tableau's JSON driver is not (yet) available as of version 10.1.
I think the answer which nobody gave is: no, you cannot join two JSON files in Tableau. Please correct me if I'm wrong.
I believe we can join two JSON tables in Tableau.
First extract the column names from the JSON data, for example with Hive's get_json_object():
select
    get_json_object(json_column, '$.Attribute1') as Attribute1,
    get_json_object(json_column, '$.Attribute2') as Attribute2
from table_name;
Perform the above for each of the required tables and then join them.
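A minimal sketch of the join step, assuming two hypothetical log tables (logs_day1, logs_day2) that both expose a user_id attribute in their JSON; all names are invented for illustration:
-- Extract the attributes from each day's log, then join on the shared key.
select a.user_id, a.page, b.page as page_next_day
from (
    select get_json_object(json_column, '$.user_id') as user_id,
           get_json_object(json_column, '$.page')    as page
    from logs_day1
) a
join (
    select get_json_object(json_column, '$.user_id') as user_id,
           get_json_object(json_column, '$.page')    as page
    from logs_day2
) b
on a.user_id = b.user_id;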

Database (OLTP) and Reporting

I am working on a trading platform that has reporting as a big portion of its business.
The set up is the following:
SQL OLTP database (about 200 tables) – rather small in number of records (20,000 records in the biggest table, but growing every week).
For reporting, SQL views are used to query the live transaction database. Imagine the result set of the views as a de-normalized one, in the spirit of a data warehouse approach. These data sets are then passed to a third-party reporting platform (like Tableau, Power BI or SiSense), which takes them and throws them into cubes (probably some columnar structure, like MongoDB, Hadoop, etc.). From there the reports are generated.
Current challenges.
The SQL views (about 8) are huge and very hard to maintain. To give you an example, one of the views outputs 100 fields, but each of these fields is a calculated field with complicated CASE statements, nested IFs, inline functions, and what not, which makes this view as big as 700 lines of SQL code. I inherited these from another employee and now, sadly, I have to maintain them.
Because the data grows weekly by several hundred records (through migration and transactions) and the number of fields in the views also grows (a few every week), the cube build takes longer and longer. To give you an example, a few months ago we set the cube to rebuild every 10 minutes to refresh the data (the build was taking 5 minutes). It currently takes 12-15 minutes to build, so we set it to every 30 minutes. As you can imagine, this will get worse as the data and the number of fields keep growing, and we need the data as current as possible.
The only good thing is that once the cube is built, the reports load fast because they are pulled from the third-party platform, so no concerns there.
What I have in mind
I would like to get rid of the views so I can ease maintenance and keep the duration of the cube rebuild to a minimum.
Options:
To build a data warehouse, and then build SSIS packages to populate this structure with the live transactional data. The de-normalized structure would probably look very similar to the views mentioned above. The drawback here is that I don't really feel like I'm simplifying much; I'm actually adding one more layer, which is the data migration from the OLTP to the OLAP (data warehouse). And I would still have to rebuild the cube.
To turn the current views into SQL indexed views (materialized views), but in their current state I simply cannot do it because of the aggregates and inline functions used heavily across the entire views (see the indexed-view sketch after these options).
Another option I read about is to build an ODS (Operational Data Store), which would be a database containing the necessary tables, similar to the SQL views I have now, and refresh it constantly. Maybe using triggers, or transaction logs? But I am not sure what building such a thing involves and how hard it would be to maintain.
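Regarding option 2, here is a minimal sketch, with invented table and column names, of what SQL Server requires before a view can be indexed; the SCHEMABINDING and COUNT_BIG(*) requirements, and the restrictions on non-deterministic or scalar user-defined functions, are usually what rule out views like the ones described above:
-- Hypothetical tables/columns, for illustration only.
CREATE VIEW dbo.vw_TradeSummary
WITH SCHEMABINDING
AS
SELECT
    t.TraderId,
    t.TradeDate,
    SUM(ISNULL(t.Amount, 0)) AS TotalAmount,  -- aggregates over non-nullable expressions are allowed...
    COUNT_BIG(*)             AS RowCnt        -- ...but COUNT_BIG(*) is mandatory when GROUP BY is used
FROM dbo.Trades AS t
GROUP BY t.TraderId, t.TradeDate;
GO
-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vw_TradeSummary
    ON dbo.vw_TradeSummary (TraderId, TradeDate);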
Question:
What approach should I take?
Do any of the 3 above make any sense?
Of course, I am interested in other ideas or suggestions, as well.
Thank you!
From my experience, your best approach will be option 1. It is costly, but it will give you better benefits. Create a ROLAP DWH (I recommend Kimball's "The Data Warehouse Toolkit" for best practices and design patterns); if you have the opportunity, use a columnar data store (like Amazon Redshift or SAP Sybase IQ). All the CASE statements, nested IFs and other operations you mentioned would be applied at ETL time, so in the ROLAP layer everything is precalculated and optimized for consumption. And don't forget about applying indexes (depending on the underlying technology you use). Some database vendors have published indexing best practices for ROLAP, so they will tell you which type of index to apply depending on the type of table (dimension) and data type, for example.
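As a rough sketch of what "precalculated at ETL time" looks like in practice (all names are hypothetical), the CASE logic that currently lives in the view runs once per load and lands as a plain column in the fact table:
-- Hypothetical star-schema fact table; the derived column is computed during ETL,
-- not at query time.
CREATE TABLE dbo.FactTrade (
    TradeKey   BIGINT        NOT NULL PRIMARY KEY,
    DateKey    INT           NOT NULL,
    TraderKey  INT           NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL,
    RiskBucket VARCHAR(10)   NOT NULL   -- precomputed; was a long CASE expression in the view
);

-- ETL step (e.g. inside an SSIS data flow or a staging-to-fact INSERT):
INSERT INTO dbo.FactTrade (TradeKey, DateKey, TraderKey, Amount, RiskBucket)
SELECT s.TradeId,
       CONVERT(INT, CONVERT(CHAR(8), s.TradeDate, 112)),  -- yyyymmdd surrogate date key
       s.TraderId,
       s.Amount,
       CASE WHEN s.Amount >= 100000 THEN 'HIGH'
            WHEN s.Amount >= 10000  THEN 'MEDIUM'
            ELSE 'LOW' END
FROM staging.Trades AS s;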

Arithmetic overflow HOLAP/ROLAP but not MOLAP

I have a problem with processing MDX queries after creating a ROLAP/HOLAP cube. If I create a MOLAP cube everything works fine (processing time about 0-2000 milliseconds), but when I change the cube storage to ROLAP/HOLAP, my MDX queries run for a very long time (20+ minutes and they never finish), or the ROLAP/HOLAP cubes throw an arithmetic overflow error. In my data warehouse I have about 20 billion records (US billions). I use Visual Studio 2013 Data Tools and Microsoft SQL Server 2014.
Here is error which I get:
arithmetic overflow error converting expression to data type int 22003
I will be very grateful for help!
MOLAP:
Data is preprocessed and stored in the cube. So whenever an MDX query runs, it picks up aggregated data directly from the cube. The complexity of the MDX has no effect on the relational source of data underneath. Bottom line: it is fast and independent of the data source (once the cube is processed).
ROLAP:
Whenever an MDX query runs, it is, crudely speaking, converted to SQL in the background and the data is fetched from the relational data source. Since the data is fetched in real time, these queries are obviously slow (as slow as a standard optimized SQL query); ROLAP can never match MOLAP in terms of speed unless the MDX is a very simple one. Also, since it is closely tied to the relational database, it generally comes with the limitations of the relational database beneath (data type overflow being one of them). Bottom line: it is real time, but at the cost of being slow and dependent on the underlying relational data source.
HOLAP
It tries to combine best of both worlds but still has the same limitations of ROLAP.
Bottom-most line: your queries will be slow and error-prone if you use ROLAP/HOLAP.
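For the arithmetic overflow specifically, a common cause in ROLAP/HOLAP is that SSAS pushes SUM()/COUNT() down to SQL Server, where an INT column overflows past 2,147,483,647. A hedged workaround, assuming a hypothetical FactSales table, is to widen the columns in a view that the measure group is bound to, so the aggregation happens in BIGINT/DECIMAL:
-- Hypothetical fact table; overflow happens when SUM() of an INT column
-- exceeds the INT range (about 2.1 billion).
CREATE VIEW dbo.vw_FactSales
AS
SELECT
    DateKey,
    ProductKey,
    CAST(Quantity    AS BIGINT)        AS Quantity,     -- widen before SSAS aggregates
    CAST(SalesAmount AS DECIMAL(19,4)) AS SalesAmount
FROM dbo.FactSales;
-- Bind the ROLAP measure group to this view (or change the measure's source
-- data type to BigInt) instead of the raw table.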

Automate creation and deployment of SSRS report from single table query

What is the most efficient way to automate both creation and deployment of simple SSRS reports from one underlying query?
An example query might look like
SELECT Name, ID, Date FROM Errorlog
The query could contain quite a few columns and anywhere from 1 to 1 million rows.
The business purpose behind this question is that I have a sizable number of report queries that need to go out as SSRS reports. I also need the ability to turn any query I write into a simple SSRS report instantly (or within a matter of seconds). Unfortunately, doing it manually through BIDS (using toolbox items and creating datasets) is cumbersome, slow and unnecessarily repetitive. The only things I am concerned with are making sure the interactive page height/width is zero (to allow scrolling) and that columns are autosized.
How would you accomplish this in a way that is smooth and repeatable?
Let me start by saying that I don't think SSRS will be very good at this. It may be troublesome on two points specifically.
First, the number of rows may become a problem. One million results is typically a bit much for Reporting Services 2008 (though it does depend on the context a bit); it's much better at displaying either aggregated data or a limited number of data rows (up to a few thousand, though again depending on context).
Second, a dynamic number of columns being returned by the SQL side will be a problem. There are only two ways around this that I know of:
Have a denormalized data set with a fixed number of columns, and one or more columns that contain the grouping. Then use a matrix to generate columns dynamically in SSRS (see the sketch after these two options). This does have a considerable performance impact.
Generate the RDL dynamically. There's information on the schema to do this, and if you create a good starting point it's very possible. After generating the RDL you'll have to execute it - how to do that depends on your specific setup.
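A hedged sketch of the first option, reusing the Name/ID/Date columns from the Errorlog example in the question: instead of returning a variable set of columns, the query returns a fixed (ID, ColumnName, ColumnValue) shape, and an SSRS matrix pivots ColumnName into dynamic columns.
-- Unpivot the report columns into name/value pairs with a common data type.
SELECT ID, ColumnName, ColumnValue
FROM (
    SELECT ID,
           CAST([Name] AS VARCHAR(100))                                AS [Name],
           CAST(CONVERT(VARCHAR(23), [Date], 121) AS VARCHAR(100))     AS [Date]
    FROM dbo.Errorlog
) AS src
UNPIVOT (
    ColumnValue FOR ColumnName IN ([Name], [Date])
) AS up;
-- In the report, a matrix with a column group on ColumnName and a detail
-- value of ColumnValue renders one physical column per distinct ColumnName.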
Bottom line is that I wouldn't recommend using SSRS for the task you describe. Consider other technologies that may be better suited to this task, e.g. SSIS packages, or perhaps another custom-made or third-party tool.
If I were you, I'd utilize 'Access Data Projects', which have a wizard for creating reports that are then easy to upsize to Reporting Services. Right-click 'Import' into a solution full of RDLs, and it prompts for an MS Access file.
You can easily make a couple of columns into a report using an Access wizard, and then upsize it to SSRS. I've done it hundreds upon hundreds of times like this.

SSRS Performance

I have created an SSRS report that retrieves 55,000 records using a stored procedure. When executing the stored procedure directly it takes just 3 seconds, but when executing it from the SSRS report it takes more than one minute. How can I solve this problem?
The additional time could be due to Reporting Services rendering the report in addition to querying the data. For example if you have 55,000 rows returned for the report and the report server then has to group, sort and/or filter those rows to render the report then that could take additional time.
I would have a look at the way the data is being grouped and filtered in the report, then review your stored procedure to see if you could offload some of that processing to the SQL code, maybe using some parameters. Aim to reduce the number of rows returned to the report to the minimum needed to render it, and preferably avoid doing the grouping and filtering in the report itself.
I had such a problem because of parameter sniffing in my stored procedure. When I ran the SP in SQL Server Management Studio it was recompiled with a new execution plan (and the call was very fast), but my reports used an old, bad plan (cached for another sequence of parameters) and the load time was much longer than in SSMS.
In the ReportServer database you will find a table called ExecutionLog. Look up the catalog ID of your report and check the latest execution instance; this will tell you the breakdown of the times taken for data retrieval, processing, rendering, etc.
Use the SSRS Performance Dashboard reports to debug your issues.
Archaic question, but because things like this keep recurring: my quick-and-dirty solution to improve SSRS, which works perfectly in large enterprise environments (I render reports that can have 100,000+ lines daily), is to properly set the InteractiveSize of the page (for example, setting it to A4 size, 21 cm). When InteractiveSize is set to 0, all results are rendered as a single page, and this literally kills the performance of SSRS. In cases like that, queries that take a few seconds on your DB can take forever to render (or cause an out-of-memory exception unless you have tons of redundant H/W on your SSRS server).
So, for queries/SPs that execute reasonably fast when called directly and retrieve a large number of rows, set InteractiveSize and you won't need to bother with other, more sophisticated solutions.
I had a similar problem: a query that returns 4,000 rows and runs in 1 second on its own was taking so long in SSRS that it timed out.
It turned out that the issue was caused by the way SSRS was handling a multi-valued parameter. Interestingly, if the user selected multiple values, the report rendered quickly (~1 second), but if only a single value was selected, the report took several minutes to render.
Here is the original query that was taking more than 100x longer to render than it should:
SELECT ...
FROM ...
WHERE filename IN (@file);
-- @file is an SSRS multi-value parameter passed directly to the query
I suspect the issue was that SSRS was bringing in the entire source table (over 1 million rows) and then performing a client-side filter.
To fix this, I ended up passing the parameter into the query through an expression, so that I could control the filter myself. That is, in the "DataSet Properties" window, on the "Parameters" screen, I replaced the parameter value with this expression:
=JOIN(Parameters!file.Value,",")
... (I also gave the result a new name: filelist) and then I updated the query to look like this:
SELECT ...
FROM ...
WHERE ',' + @filelist + ',' LIKE '%,' + FILENAME + ',%';
-- @filelist is passed to the query as the following expression:
-- =JOIN(Parameters!file.Value,",")
I would guess that moving the query to a stored procedure would also be an effective way to alleviate the problem (because SSRS basically does the same JOIN before passing a multi-value parameter to a stored procedure). But in my case it was a little simpler to handle it all within the report.
Finally, I should note that the LIKE operator is maybe not the most efficient way to filter on a list of items, but it's certainly much simpler to code than a split-string approach, and the report now runs in about a second, so splitting the string didn't seem worth the extra effort.
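For completeness, a hedged sketch of the split-string alternative, assuming SQL Server 2016+ where STRING_SPLIT() is available (column and parameter names follow the example above):
-- @filelist is still the joined SSRS expression =JOIN(Parameters!file.Value,",")
SELECT ...
FROM ...
WHERE filename IN (
    SELECT LTRIM(RTRIM(value))
    FROM STRING_SPLIT(@filelist, ',')
);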
Obviously getting the report running correctly (i.e. taking the same order of magnitude of time to select the data as SSMS) would be preferable, but as a workaround, would your report support execution snapshots (i.e. no parameters, or parameter defaults stored in the report)?
This allows a scheduled snapshot of the data to be retrieved and stored beforehand, meaning SSRS only needs to process and render the report when the user opens it. It should reduce the wait to a few seconds (depending on what processing the report requires; YMMV, so test to see if you get a performance improvement).
Go to the report's properties tab in Report Manager, select Execution, change to "Render this report from a report execution snapshot", and specify your schedule.
The primary solution to speeding up SSRS reports is to cache them. If one does this (either by preloading the cache at 7:30 am, for instance, or by caching the reports on hit), one will find massive gains in load speed.
Please note that I do this daily and professionally and am not simply waxing poetic about SSRS.
Caching in SSRS
http://msdn.microsoft.com/en-us/library/ms155927.aspx
Pre-loading the Cache
http://msdn.microsoft.com/en-us/library/ms155876.aspx
If you do not like initial reports taking long and your data is static (i.e. a daily general ledger or the like, meaning the data is relatively static over the day), you may increase the cache lifespan.
Finally, you may also opt for business managers to instead receive these reports via email subscriptions, which will send them a point in time Excel report which they may find easier and more systematic.
You can also use parameters in SSRS to allow for easy filtering by the user and faster queries. In the query builder, type IN(@SSN) under the Filter column that you wish to parameterize; you will then find the parameter created in the Parameters folder just above Data Sources in the upper left of your BIDS GUI.
If you do not see the data source section in SSRS, hit CTRL+ALT+D.
See a nearly identical question here: Performance Issuses with SSRS
A few things can be done to improve the performance of the report:
1. Enable caching on the report manager and set a time period to refresh the cache.
2. Apply indexing on all the backend database tables that are used as a source in the report. Although your stored procedure already returns the data quickly, indexing can further improve performance at the backend level.
3. Use shared datasets instead of using embedded datasets in the report and apply caching on all these datasets as well.
4. If possible, set the parameters to load default values.
5. Try to reduce the data that is selected by the stored procedure; e.g. if the report contains historical data which is of no use, a filter can be added to exclude that data (see the sketch after this list).
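A minimal sketch of tip 5, with a hypothetical procedure, table and parameter: pushing the filter into the stored procedure so historical rows never reach SSRS.
-- Hypothetical procedure: only rows newer than @StartDate are returned to the report.
CREATE PROCEDURE dbo.usp_ReportData
    @StartDate DATE
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Name, ID, [Date]
    FROM dbo.Errorlog
    WHERE [Date] >= @StartDate;   -- historical rows never leave the database
END;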
I experienced the same issue: the query ran fine in SQL but was slow as anything in SSRS. Are you using an SSRS parameter in your dataset? I've found that if you use the report parameter directly in certain queries, there is a huge performance hit.
Instead, if you have a report parameter called @reportParam, simply do the following in the dataset:
declare @reportParamLocal int
set @reportParamLocal = @reportParam
select * from TableA A where A.field = @reportParamLocal
It's a little strange. I don't quite know why it works but it does for me.
One quick thing you may want to look at is whether elements on your report could be slowing down the execution.
For example, I have found massive execution-time differences when converting between DateTimes. Do any elements on the report use the CDate function? If so, you may want to consider doing your formatting at the SQL level.
Conversions in general can cause a massive slow down so take the time to look into your dataset and see what may be the problem.
This is a bit of a mix of the answers above, but do your best to get the data back from your stored procedure in the simplest and most finished format. I do all my sorting, grouping and filtering up on the server. The server is built for this and I just let reporting services do the pretty display work.
I had the same issue ... it was indeed the rendering time but more specifically, it was because of the SORT being in SSRS. Try moving your sort to the stored procedure and removing any SORT from the SSRS report. On 55K rows, this will improve it dramatically.
Further to @RomanBadiornyi's answer, try adding
OPTION (RECOMPILE)
to the end of your main query in the stored procedure.
This will ensure the query is recompiled for different parameters each time, in case different parameters need a different execution plan.
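As a minimal sketch (reusing the hypothetical dbo.usp_ReportData procedure from the earlier example), the hint goes at the end of the statement whose plan should be recompiled:
-- OPTION (RECOMPILE) applies to this statement only, so SQL Server builds
-- a fresh plan for the actual parameter values on every execution.
ALTER PROCEDURE dbo.usp_ReportData
    @StartDate DATE
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Name, ID, [Date]
    FROM dbo.Errorlog
    WHERE [Date] >= @StartDate
    OPTION (RECOMPILE);
END;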