Reporting vs. analysis with a star schema data warehouse

I'm working on a BI project where I'll use Pentaho.
My DW will be modeled as a star schema.
I know that for the analysis part we need to start from this star schema (relational DW) and design the cube, e.g. with Schema Workbench. Thanks to that, the analysis tool can run multidimensional queries.
For the reporting part, does the reporting tool also need to know about the cube, or can I just run normal queries against the star schema relational DW?
Is that a good or a bad idea?
Thanks for your help.

Cubes are for OLAP, i.e. interactive analysis with pivot tables.
Your reporting tool, assuming it's not an OLAP tool, should just talk directly to the data warehouse.
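A typical report really is just plain SQL over the star schema: join the fact table to its dimensions and aggregate. A minimal sketch using SQLite and hypothetical table names (your fact and dimension tables will of course differ):

```python
import sqlite3

# Hypothetical star schema: one fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date    VALUES (10, 2024, 1), (11, 2024, 2);
INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 10, 25.0);
""")

# A typical report: revenue per product per month -- straight SQL, no cube needed.
rows = conn.execute("""
    SELECT p.name, d.year, d.month, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    GROUP BY p.name, d.year, d.month
    ORDER BY p.name, d.year, d.month
""").fetchall()
print(rows)
```

The cube adds value for ad hoc slicing and dicing; for fixed reports like this, querying the star directly is perfectly normal.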

Related

Web-based business intelligence: dc.js or OLAP for a web application dashboard?

I have a MySQL database. I want to perform multidimensional analysis on this database and build a web-based dashboard.
I'm a little confused between using a classic OLAP server like Mondrian or SSAS and using dc.js (= d3 + crossfilter), which provides very nice visualizations.
Can dc.js be considered an OLAP server and replace one? Is there a way to combine an OLAP server and dc.js?
The final objective is to build a web application for browsing the data in a multidimensional way.
Thanks for your help.
dc.js and OLAP are not comparable.
dc.js takes care of the visualisation, but you need to provide the data (as JSON or CSV), so you still need something to extract/aggregate the data you want to visualise.
What you use to generate that data is specific to your case: a thin layer on top of a database might be enough, or maybe not, and it could work better with a more complete data warehouse (OLAP).
In any case, dc is great if you know what dimensions and graphs you want in your dashboard and can code them, but if you want something your users can use to build their own dashboards and queries, other solutions (e.g. Metabase) are probably better suited.
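The "thin layer" mentioned above is often nothing more than a grouped aggregation that emits the key/value records dc.js and crossfilter expect. A minimal sketch with made-up data (the field names are illustrative, not from any real schema):

```python
import json
from collections import defaultdict

# Hypothetical raw rows, as they might come back from a SQL query.
raw = [
    {"region": "EU", "product": "Widget", "amount": 100.0},
    {"region": "EU", "product": "Gadget", "amount": 40.0},
    {"region": "US", "product": "Widget", "amount": 75.0},
]

# Aggregate by region: the kind of server-side "thin layer" that feeds a chart.
totals = defaultdict(float)
for row in raw:
    totals[row["region"]] += row["amount"]

# Serialize in the key/value shape commonly bound to dc.js charts.
payload = json.dumps([{"key": k, "value": v} for k, v in sorted(totals.items())])
print(payload)
```

For small-to-medium data you can even skip this and ship the raw rows, letting crossfilter do the grouping in the browser; the trade-off is payload size versus server load.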

Which database can be used to store processed data from an NLP engine?

I am looking at taking unstructured data in the form of files, processing it and storing it in a database for retrieval.
The data will be in natural language and the queries to get information will also be in natural language.
Ex: the data could be "Roses are red" and the query could be "What is the color of a rose?"
I have looked at several NLP systems, focusing on open-source information extraction and relation extraction systems, and the following seems apt and easy for a quick start:
https://www.npmjs.com/package/mitie
This can give data in the form of (word, type) pairs. It also gives a relation as a result of running the processing (see the example on the site).
I want to know if SQL is a good database for saving this information. For retrieving the information, I will also need to convert the natural language query into some kind of (word, meaning) pairs,
and to use SQL I will have to write a layer that converts natural language into SQL queries.
Please suggest any open-source databases that work well in this situation. I'm open to suggestions for databases that work with open-source information extraction and relation extraction systems other than MITIE.
SQL won't be an appropriate choice for your problem. You can use NLP or rules to extract relationships and then store them in a triple store or a graph database. There are many good open-source graph databases, such as Neo4j and Apache Titan. You can search for triple stores; I'd suggest Apache Jena as a good choice. After storing your data, you can query your graph using any of the graph query languages, such as Gremlin or Cypher (analogous to SQL). Note that the heart of your system would be a knowledge graph.
You may also set up a Lucene/Solr-based search system on your unstructured data, which may help answer your queries in conjunction with the graph database. All of these (NLP, IR, graph DBs/triple stores, etc.) would coexist to solve your problem.
It would be like an ensemble; no silver bullets :) To start with, though, look at graph DBs and triple stores.
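To make the triple-store idea concrete, here is a toy in-memory sketch in Python: store (subject, predicate, object) triples and answer questions by pattern matching. This is illustration only; real systems such as Apache Jena or Neo4j add indexing, persistence, inference, and a proper query language.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
triples = set()

def add(subject, predicate, obj):
    triples.add((subject, predicate, obj))

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Store the relation extracted from "Roses are red" ...
add("rose", "has_color", "red")

# ... then answer "What is the color of a rose?" as a pattern query.
answers = query(subject="rose", predicate="has_color")
print(answers)
```

The NLP layer's job on the query side is exactly this translation: turning "What is the color of a rose?" into the pattern `(rose, has_color, ?)`.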

Convert an RDB to an ontology

I have a Microsoft Access relational database which I am trying to convert into an ontology for Activity Based Intelligence purposes. I am new to ontologies and was wondering how this would be done. Thanks!
Learn about ontologies, then have at it.
http://doc.utwente.nl/50826/1/thesis_Guizzardi.pdf

How to convert a dimensional DB model to a data-mining-friendly layout?

My problem is that I have a dimensional-model DB for an NFL league: Players, Teams, and Leagues are the dimension tables, and Match is the fact table relating them. If I need to query the stats of a player in a particular match, or over a range of matches, it takes a painstaking SQL query with lots of joins to convert the machine-readable, ID-based tables into a human-readable, name-based version. Analysis of that data is also very painful. As a solution, I'd like to transform the DB into an analysis-friendly version: for example, the Player table would have one row per player with the related stats, and the same for Teams.
The question is: is there any framework, method, or schema that might guide me in designing an analysis-friendly DB layout? Also, is SQL still favorable, or would a non-SQL DB be better for this problem?
I know it sounds like a very general question, but I just want to hear some expertise on the topic. Any help or suggestions are very welcome.
I was in a team faced with a similar situation about 13 years ago. We used a tool called "PowerPlay", a Business Intelligence tool from Cognos. This tool was very friendly to the data analysts, with drill down capabilities, and all sorts of name based searching.
If I recall correctly (it's been a while), the BI tool stored the data in its own format (a data cube), but it had its own tool for automatically discovering the structure of an SQL-based data source. That automatic tool really struggled with the OLTP database, which was SQL (Oracle) and a real mess... a terrible relational design.
So what I ended up doing was building a star schema to collect and organize the same data, but more compatible with a multidimensional view of the data. I then built the ETL stuff to load the star from the OLTP database. The BI tool cut through the star schema like a hot knife through butter.
And the analysts didn't have to mess with ID fields at all.
It sounds like your starting place is like the star schema I had to build. So I would suggest that there are BI tools out there that you can lay on top of your star and that will provide precisely the kind of analyst friendly environment you are looking for. Cognos is only one of many vendors of BI tools.
A few caveats: if you go this way, you have to make an effort to ensure your name fields "make sense" if they are going to provide meaningful guidance to the analysts trying to drill down or search. Original data sources sometimes treat name fields as more or less meaningless, where errors don't matter much. The same goes for column names: names that DBAs like are often gibberish to data analysts. You may also have to flatten any hierarchical groupings in your dimension tables, though you may have done this already; it depends on what your BI tool needs.
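Whatever BI tool sits on top, the join-heavy ID-to-name translation the question complains about can be written once as a database view, so analysts only ever see names. A minimal sketch with hypothetical NFL-style tables (SQLite used purely for illustration):

```python
import sqlite3

# Hypothetical dimension tables plus a stats fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE team   (team_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE match_stats (player_id INTEGER, team_id INTEGER, yards INTEGER);
INSERT INTO player VALUES (1, 'A. Smith');
INSERT INTO team   VALUES (7, 'Hawks');
INSERT INTO match_stats VALUES (1, 7, 120);

-- The ID-to-name joins are written once, in the view definition.
CREATE VIEW player_match AS
SELECT p.name AS player, t.name AS team, m.yards
FROM match_stats m
JOIN player p ON p.player_id = m.player_id
JOIN team   t ON t.team_id   = m.team_id;
""")

# Analysts query the flat, name-based view and never touch ID fields.
rows = conn.execute("SELECT player, team, yards FROM player_match").fetchall()
print(rows)
```

The same idea scales up as materialized views or pre-flattened "reporting" tables refreshed by an ETL job, which is essentially what the star-schema-plus-BI-tool approach in the answer above formalizes.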
Hope this helps, even if it's a little generic.

Advice on Prototype for a Business Intelligence System

Our organisation lacks any data mining or analytical tools, so I'm trying to persuade them to implement a Business Intelligence solution using Microsoft SQL Server 2008 R2. They've asked for a prototype, so they can get a better idea of what Business Intelligence can do for them.
I'm assuming that the prototype will consist of:
A subset of data from a critical application
Integration Services (SSIS): Used to clean the data subset?
Analysis Services (SSAS): Used to create and maintain a dimensional model based on that data subset? Data Mining?
Reporting Services (SSRS): Used to create, maintain and update a 'dashboard' of reports.
I want to show how a Business Intelligence solution with data mining and analytic capabilities can help their organisation perform better.
As this is the first time I've done this, I'd value advice from other people on whether this prototype is realistic or not. And does anyone know of any easily-accessible real-life examples that I can show them?
My thoughts…
Don't underestimate the size (in terms of time) of a new DWH project.
Start with something that is not very complex and is well understood in terms of business rules.
The biggest problem we have had with new DWH projects/pilots (we are a DWH consultancy, so we have a number of clients) is getting management support. Often the DWH is sponsored by someone in IT with no real board-level support, so the project takes a long time to progress and it is difficult to get resources for it.
The best projects we have found are ones that have levels of support in three areas: Management (board level), IT and Business (preferably someone with good understanding of the business rules involved).
Have a look at Ralph Kimball’s Data Warehouse Toolkit which goes through different styles of DWH design for different industries. It is very good!
The tools I expect you would use (I've added a couple of technologies here):
SSIS (ETL tool) is used to Extract (from source systems), Transform (data into the appropriate form to load), and Load (into dimension and fact tables).
SSAS (OLAP tool) is used to create/process an OLAP cube. Warning: there is quite a large learning curve on this tool!!
SSRS (reporting tool) is used to create static and dynamic reports/dashboards.
MS Excel. There are free data mining add-ins which, when connected to an OLAP cube, allow very interesting DM to be performed.
Windows SharePoint Services (WSS) (comes free with Windows Server operating systems) to deploy your SSRS reports onto.
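The Extract/Transform/Load cycle that SSIS performs can be sketched in a few lines of plain Python to show what conceptually happens: names get replaced by surrogate keys in a dimension table, and typed facts reference those keys. The data and field names here are invented for illustration:

```python
# Extract: rows as they might arrive from a source system (all strings).
source = [
    {"customer": "Acme", "date": "2024-01-15", "amount": "199.99"},
    {"customer": "Acme", "date": "2024-02-03", "amount": "49.50"},
    {"customer": "Beta", "date": "2024-01-20", "amount": "320.00"},
]

# Transform: build a customer dimension, assigning surrogate keys.
dim_customer = {}
for row in source:
    dim_customer.setdefault(row["customer"], len(dim_customer) + 1)

# Load: fact rows reference the surrogate key, with values cast to real types.
fact_sales = [
    (dim_customer[r["customer"]], r["date"], float(r["amount"]))
    for r in source
]
print(dim_customer)
print(fact_sales)
```

A real SSIS package adds incremental loads, error handling, and slowly changing dimension logic on top of this basic pattern.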
This is a good prototype scope in terms of technologies (if you are from an MS background), but the spread of technologies is very large, and for one person coming to them cold it is perhaps unrealistic.
The most critical thing is to get the DWH star schema design correct (and there will be a number of solutions that are “correct” for any set of data), otherwise your OLAP cube design will be flawed and very difficult to build.
I would get a local DWH consultant to validate your design before you have built it. Make sure you give them a very tight scope of not changing much otherwise most consultants will “tinker” with things to make them “better”.
Good Luck!
It's been two years since the question was posted, and of course there have been updates in the world of Business Intelligence. We now have a couple of great tools for prototyping in the Microsoft Business Intelligence world:
- Power Query (self-service ETL)
- Power Pivot
Hope this helps someone just getting started with building prototypes.