If SSAS in MOLAP mode stores the data itself, what is the purpose of SSIS, and why do we need a data warehouse and the ETL process of SSIS?
I have a SQL Server OLTP database. I am using SSIS to transfer my SQL Server data from OLTP database to a Data Warehouse database that contains fact and dimension tables.
After that, I want to create cubes using SSAS from the data warehouse data.
I know that MOLAP stores data. Do I still need a data warehouse with fact and dimension tables?
Isn't it better to avoid creating a data warehouse and create cubes directly from the OLTP database?
This might be a candidate for "Too Broad" but I'll give it a go.
Why would I want to store my data 3 times?
I have my data in my OLTP (online transaction processing) system, so why would I want to move that data into a completely new structure (a data warehouse) and then move it again into an OLAP system?
Let's start simple. You have only one system of record and it's not amazingly busy. Maybe you can get away with an abstraction layer (views in the database or named queries in SSAS) and skip the data warehouse.
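As a sketch of that abstraction layer, a view can present OLTP tables in a cube-friendly shape so SSAS sources them directly, with no warehouse in between (the table and column names here are hypothetical):

```sql
-- Hypothetical example: expose OLTP order data in a dimensional shape
-- via a view, so SSAS can source it directly without a warehouse.
CREATE VIEW dbo.vw_FactOrders
AS
SELECT
    o.OrderId,
    o.CustomerId,                            -- dimension key straight from OLTP
    CAST(o.OrderDate AS date)  AS OrderDateKey,
    o.Quantity * o.UnitPrice   AS SalesAmount
FROM dbo.Orders AS o;
```

The view becomes the cube's data source, and any renaming or light transformation lives in one place.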
So, you build out your cubes, dimensions and people start using it and they love it.
"You know what'd be great? If we could correlate our Blats to the Foos and Bars we already have in there." Now you need to integrate your simple app with data from a completely unrelated app. Customer id 10 in your app is customer id {ECA67697-1200-49E2-BF00-7A13A549F57D} in the CRM app. Now what? You're going to need to present a single view of the customer to your users or they will not use the tool.
Maybe you rule with an iron fist and say "No, you can't have that data in the cube," and your users go along with it.
"Do people's buying habits change after having a child?" We can't answer that because our application only stores the current version of a customer. Once they have a child, they've always had a child so you can't cleanly identify patterns before or after an event.
"What were our sales like last year" We can't answer that because we only keep a rolling 12 weeks of data in the app to make it manageable.
"The data in the cubes is stale, can you refresh it?" Egads, it's the middle of the day. The SSAS processing takes table locks and would essentially bring our app down until it's done processing.
Need I go on with these scenarios?
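That "after having a child" scenario is exactly what a Type 2 slowly changing dimension in a warehouse solves: every version of the customer is kept with validity dates, even though the source app only stores the current row. A minimal sketch, with illustrative table and column names:

```sql
-- Hypothetical Type 2 slowly changing dimension: each change to a
-- customer closes the old row and inserts a new one, so facts can be
-- analyzed "before" and "after" an event like having a child.
CREATE TABLE dbo.DimCustomer
(
    CustomerKey  int IDENTITY(1,1) PRIMARY KEY,      -- surrogate key
    CustomerId   int  NOT NULL,                      -- business key from OLTP
    HasChildren  bit  NOT NULL,
    ValidFrom    date NOT NULL,
    ValidTo      date NOT NULL DEFAULT '9999-12-31',
    IsCurrent    bit  NOT NULL DEFAULT 1
);

-- When the attribute changes, expire the current version...
UPDATE dbo.DimCustomer
SET ValidTo = '2024-03-14', IsCurrent = 0
WHERE CustomerId = 10 AND IsCurrent = 1;

-- ...and insert the new one; older facts still join to the old row.
INSERT INTO dbo.DimCustomer (CustomerId, HasChildren, ValidFrom)
VALUES (10, 1, '2024-03-15');
```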
Summary
The data warehouse serves as an integration point for diverse systems. It has conformed dimensions (everyone has a common definition of what a thing is). The data in the warehouse may exceed the lifetime of the data in the source systems. Business needs might drive the tracking of data that the source application does not support. The data in the DW supports business analysis, while your OLTP system supports the application itself.
SSIS is just a tool for moving data. There are plenty out there, some better, some worse.
So No, generally speaking, it is not better to avoid creating a DW and build your cubes based on your OLTP database.
Related
I am working on a project to create a data warehouse. I have been using a third-party tool to create OLAP cubes, but the problem is that it creates a separate staging area for each OLAP cube, while most of my cubes share the same source of data. The company decided to build a single data warehouse from which the cubes will source their data.
I will be extracting data from different sources and storing it in a database (the staging area); then I transform this data into appropriate dimension and fact tables and store them in a separate database called the data warehouse, and then I create individual cubes sourcing data from the warehouse.
My concern is: can I use different databases for the staging area and the data warehouse, even though they will be on the same server?
Also, what about my data marts: do I need to have all data marts in the same warehouse, or can I put them in different databases? I want to understand the logical and physical separations, and best practices.
It sounds like you've been prescribed something along the lines of the Corporate Information Factory, Bill Inmon's data warehouse architecture.
http://www.inmoncif.com/library/cif/
Read EDW as your single data warehouse and Departmental Datamarts as your individual cubes.
You could normalise (and perform much of the transformation) as you load the first data warehouse and use this as your centralised data.
When loading your cubes you could choose from several methods of presenting the data for load. Staging to a new single database as you have described, creating views on top of the central data warehouse to read data from or creating a separate staging area for each cube.
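The "views on top of the central data warehouse" option can be as simple as one view per cube-facing table; a sketch, with made-up schema and table names:

```sql
-- Hypothetical view layer: each cube reads from views over the central
-- EDW rather than from its own copied staging database.
CREATE VIEW sales_mart.vw_FactSales
AS
SELECT s.SalesKey, s.DateKey, s.ProductKey, s.Amount
FROM edw.FactSales AS s
WHERE s.DateKey >= 20200101;   -- each mart can scope its own slice
```

This avoids copying data per cube while still letting each department shape and filter what it sees.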
Keep in mind that the purpose of separate cubes may be to isolate departments from each other, enabling rapid, focused development for each one, while the purpose of the central data warehouse may be to reconcile disparate data into one agreed dataset before it is used for many reporting purposes.
Speak with the people who decided on your architecture to see what they had in mind or the problems they wanted to solve.
For designing a data warehouse:
First, understand the purpose of the warehouse (what kinds of reports will be produced from the system).
Choose a schema (star or snowflake).
Create the dimension tables.
Create a fact table where all the transactional data will be stored.
I would like to get some advice on our BI architecture, which is pretty complex to maintain.
I work at an e-commerce company; our production environment runs on a LAMP stack (Linux, Apache, MySQL, PHP).
Every night:
data from our prod DB (MySQL) are extracted with Talend, then inserted into another MySQL database named DWH, for data warehouse
data from this DWH are then extracted by Jedox/Palo to load OLAP cubes, which are used from Excel with a plugin for reporting
data from this DWH are also accessed by Access/Excel one-shot reports, but this does not work very well
Each time we need to modify an existing workflow or create a new report, there is a lot of steps and different technologies to use, which leads us to a pretty complicated platform.
What can I do to simplify this process?
You should be able to load the Palo OLAP tables with Talend as part of the data warehouse load, using the Palo components provided, which should cut down on doing ETL in a separate process from your main ETL workflow.
Roberto Machetto's blog has some interesting tutorials on how to do this and I'm sure there's plenty more examples across the internet and of course you could ask here for any specific issues you're having.
Once all of your data is properly loaded into the data warehouse and any OLAP cubes, your users should be able to run any bespoke or ad hoc queries against the data as it's stored. If you're seeing cases where users don't have access to the proper data for their analysis, that should be resolved in the initial data warehouse/OLAP cube load through a proper understanding of dimensional modelling.
It's a little difficult to give advice about what to do or not do; it depends on your final target or objective. What I can recommend is to separate your data into stages before delivering your OLAP cubes.
For example, you could create fact and dimension tables in this DWH database you already have, separating your data by subject, which would make report building much easier, since anyone could group the data as needed. You could have a fact table just for sales, another for churn, another for new customers, and so on...
Take a look at fact and dimension tables, or dimensional modeling in general; it will make your daily work a lot easier.
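As a sketch of that subject-by-subject separation, each business process gets its own fact table while sharing the same conformed dimensions (all names here are illustrative):

```sql
-- Hypothetical subject-area fact tables sharing conformed dimensions.
CREATE TABLE dw.DimDate     (DateKey int PRIMARY KEY, FullDate date NOT NULL);
CREATE TABLE dw.DimCustomer (CustomerKey int PRIMARY KEY, CustomerName varchar(100));

CREATE TABLE dw.FactSales
(
    DateKey     int NOT NULL REFERENCES dw.DimDate,
    CustomerKey int NOT NULL REFERENCES dw.DimCustomer,
    SalesAmount decimal(18,2) NOT NULL
);

CREATE TABLE dw.FactChurn
(
    DateKey     int NOT NULL REFERENCES dw.DimDate,
    CustomerKey int NOT NULL REFERENCES dw.DimCustomer,
    ChurnFlag   bit NOT NULL
);
```

Because both facts reference the same dimension tables, reports can combine sales and churn by customer or date without any extra reconciliation.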
Some links:
http://en.wikipedia.org/wiki/Dimensional_modeling
http://www.learndatamodeling.com/ddm.php
http://pubapi.cryptsy.com/api.php?method=marketdatav2
I would like to synchronize market data on a continuous basis (e.g. from Cryptsy and other exchanges). I would like to show the latest buy/sell prices from the orders on these exchanges, recorded regularly as a historical time series.
What backend database should I use to store the retrieved data, and to render or plot any parameter of it as historical time series data?
I'd suggest you look at a database tuned for handling time series data. The one that springs to mind is InfluxDB. This question has a more general take on time series databases.
I think we need more detail about the requirement.
The question just says it needs to sync time series data. What is the scenario? What are the data source and destination?
Option 1.
If it is just a data synchronization issue between two databases, the easiest solution is the CouchDB NoSQL family (CouchDB, Couchbase, Cloudant).
They are all based on CouchDB, and they provide data-center-level replication (XDCR), so you can replicate the data to another CouchDB instance in a different data center, or even to CouchDB on mobile devices.
I hope this is useful to you.
Option 2.
Another approach is data integration. You can sync data using an ETL batch job: a batch worker copies the data to the destination periodically. This is the most common way to replicate data to another destination. There are many tools that support ETL, like Pentaho ETL, Spring Integration, and Apache Camel.
If you provide a more detailed scenario, I can help in more detail.
Enjoy
-Terry
I think MongoDB is a good choice. Here is why:
You can easily scale out, and thus store a tremendous amount of data. With an appropriate shard key, you might even be able to position the shards close to the exchange they follow to improve speed, if that becomes a concern.
Replica sets offer automatic failover, covering availability, which could otherwise be an issue.
Using the TTL feature, data can be automatically deleted after its TTL expires, effectively creating a round-robin database.
Both the aggregation and map/reduce frameworks will be helpful.
There are some free classes at MongoDB University which will help you avoid the most common pitfalls.
I'm working on SSRS; actually, I'm new to this. We have an OLTP database in which we have created a stored procedure for each report. These stored procedures are used to create DataSets in the BI solution to run the reports.
Now we have been asked to adopt an SSIS (ETL) process and the data warehouse concept, and all reports will then run through these two approaches.
So my doubt is:
1) As I understand SSIS, we have to create a new database and new tables for each report. Through packages (which include the ETL process) we will insert all the data into these tables and finally fetch the report data from these tables directly.
This approach speeds up data retrieval because the data is already calculated for every report, and we would not need to design a data warehouse.
Am I right?
2) Do we really need to run all reports through the SSIS and data warehouse approach? That is, how can I judge which reports need to go through SSIS and the data warehouse, and which can continue running against the OLTP system?
3) Any good article links for SSIS and data warehouse concepts?
4) Do I have to create SSIS packages before designing the data warehouse?
Thanks
1) I'm not sure you want a table per report. I guess you might end up with this if none of your reports used the same fields. When I hear data warehouse, I think dimensional model/star schema. The benefit of a star schema is that it simplifies the data model and reduces the number of joins you have to go through to get the data you need, optimizing for data retrieval.
2) The answer to this question depends on your goals. Many companies with a data warehouse try to do all non-real-time reporting out of their data warehouse or an ODS to reduce the load on the production OLTP system. If optimized reliability and speed of report delivery is the goal, then test query speeds, data integrity, and accuracy and decide if a data warehouse with ETL provides a better experience (and if that justifies the monitoring and maintenance required for a data warehouse).
3) For data warehouse concepts, try the Kimball Group. For SSIS, start with MSDN and make sure to visit the SSIS Package Essentials page.
4) You should design your data warehouse before you build SSIS packages. You might have to make a few tweaks as you get into the ETL process, but you generally know what you want to end up with (your DW design) and use SSIS to get the data to that desired end state.
I've gotten confused by the warehousing process... I'm in the process of building a data mart, but the part I don't really understand relates to cubes. I've read some tutorials about SSAS, but I don't see how I can use this data in other applications. What I need is the following:
A warehouse (data mart) that contains all data needed for analysis (drill-down and aggregated data, like daily revenue and YTD revenue)
A .NET web service that can take this data so that many different apps can use it
The part I don't understand is cubes. I see that many people use SSAS to build cubes. What are these cubes in SSAS? Are they objects? Are they tables where data is stored? How can my web service access the data in the cubes?
Are there any alternatives to SSAS? Would it be practical to just build cubes in a data mart and load them during ETL process?
Cubes are pre-aggregated stores of data, in a format that makes reporting much more efficient than is possible in a relational database. In SSAS you have several choices for how your data is ultimately stored, but generally it is stored in files in the OS file system. Cubes can be queried similarly to SQL (using a specialized query language called MDX) or by several other methods, depending on your version. You can set up connections to the data for your web service using the appropriate drivers from Microsoft.
I am unsure what you mean by data mart. Are you referring to relational tables in a star schema format? If so, these are generally precursors to the actual cube. You will not get as much benefit from a reporting standpoint from these relational sources as you would from a cube, since a cube stores the aggregates of each node (or tuple) within the dimensional space defined by your star schema tables.
To explain: if I have a relational store (even in star schema format) and I want sales dollars for a particular location on a particular date, I have to run a query against a very large sales fact table and join the location and date dimension tables (which may also be very large). If I want the same data from a cube, I define my cube filters, and the query pulls that single tuple from the cube and returns it much more quickly.
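For comparison, the relational version of that location/date query looks something like this (table and column names are illustrative); every execution pays for the joins and the fact-table scan that the cube's pre-aggregation avoids:

```sql
-- Hypothetical star-schema query: sales for one location on one date.
-- The joins and the fact-table scan happen at query time, which is the
-- cost the cube avoids by storing the aggregate ahead of time.
SELECT SUM(f.SalesAmount) AS Sales
FROM dw.FactSales   AS f
JOIN dw.DimLocation AS l ON l.LocationKey = f.LocationKey
JOIN dw.DimDate     AS d ON d.DateKey     = f.DateKey
WHERE l.LocationName = 'Seattle'
  AND d.FullDate     = '2015-06-01';
```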
There are many alternatives to SSAS, but each would be a form of a cube if you are using a data warehouse. If you have a large data set, a properly designed cube will outperform a relational data mart for multidimensional queries.