Refreshing a reporting database - SSIS

We currently have an OLTP SQL Server 2005 database for our project. We are planning to build a separate, de-normalized reporting database so that we can take the load off our OLTP DB. I'm not quite sure which is the best approach to sync these databases. We are not looking for a real-time system, though. Is SSIS a good option? I'm completely new to SSIS, so I'm not sure about the feasibility. Kindly provide your inputs.

Everyone has their own opinion of SSIS, but I have used it for years for data marts and in my current environment, which is a full BI installation. I personally love its capabilities for moving data, and it still holds the world record for moving 1.13 terabytes in under 30 minutes.
As for setup, we use log shipping from our transactional DB to populate a second box, then use SSIS to de-normalize and warehouse the data. The community around SSIS is also very large, and there are tons of free training and helpful resources online.

We build our data warehouse using SSIS, from which we run reports. It's a big learning curve and the errors it throws aren't particularly useful, and it helps to be good at SQL rather than treating it as a row-by-row transfer - what I mean is that you should be creating set-based queries in SQL command tasks rather than using lots of SSIS components and data flow tasks.
Understand that every warehouse is different and you need to decide how to do it best. This link may give you some good ideas.
How we implement ours (we have a Postgres backend and use the PGNP provider; making use of linked servers could make your life easier):
First of all you need to have a timestamp column in each table so you can tell when it was last changed.
Then write a query that selects the data that has changed since you last ran the package (using an audit table would help) and get that data into a staging table. We run this as a data flow task because (using Postgres) we don't have any other choice, although you may be able to make use of a normal reference to another database (dbname.schemaname.tablename or something like that) or use a linked server query. Either way the idea is the same: you end up with the data that has changed since your last run.
We then update (based on ID) the data that already exists, then insert the new data (by left joining the staging table to the warehouse table to find out what doesn't already exist).
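A minimal T-SQL sketch of that pattern - the table and column names (EtlAudit, stg.Jobs, dwh.Jobs, SourceDb, ModifiedAt) are hypothetical, just to show the shape of the load:

    -- Hypothetical names throughout; adapt to your own schema.
    DECLARE @LastRun datetime;
    SELECT @LastRun = ISNULL(MAX(LastRunAt), '19000101')
    FROM   dbo.EtlAudit
    WHERE  PackageName = 'LoadJobs';

    -- 1) Stage only the rows changed since the last run
    --    (in SSIS this would be the data flow's source query, or a linked-server query)
    INSERT INTO stg.Jobs (JobId, CustomerId, JobDate, Status, ModifiedAt)
    SELECT JobId, CustomerId, JobDate, Status, ModifiedAt
    FROM   SourceDb.dbo.Jobs
    WHERE  ModifiedAt > @LastRun;

    -- 2) Update rows that already exist in the warehouse
    UPDATE d
    SET    d.CustomerId = s.CustomerId,
           d.JobDate    = s.JobDate,
           d.Status     = s.Status
    FROM   dwh.Jobs AS d
    JOIN   stg.Jobs AS s ON s.JobId = d.JobId;

    -- 3) Insert rows that are new (the left join finds what is missing)
    INSERT INTO dwh.Jobs (JobId, CustomerId, JobDate, Status)
    SELECT s.JobId, s.CustomerId, s.JobDate, s.Status
    FROM   stg.Jobs AS s
    LEFT JOIN dwh.Jobs AS d ON d.JobId = s.JobId
    WHERE  d.JobId IS NULL;

    -- 4) Record this run so the next execution only picks up newer changes
    INSERT INTO dbo.EtlAudit (PackageName, LastRunAt)
    VALUES ('LoadJobs', GETDATE());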
So now we have one de-normalized table that shows, in this case, jobs per day. From this we calculate other tables based on aggregated values from this one.
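For those aggregate tables, a set-based query over the de-normalized table is usually all that's needed; a sketch, again with illustrative names:

    -- Rebuild a jobs-per-day summary from the de-normalized table
    TRUNCATE TABLE dwh.JobsPerDay;

    INSERT INTO dwh.JobsPerDay (JobDate, JobCount)
    SELECT JobDate, COUNT(*) AS JobCount
    FROM   dwh.Jobs
    GROUP BY JobDate;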
Hope that helps, here are some good links that I found useful:
Choosing .Net or SSIS
SSIS Talk
Package Configurations
Improving the Performance of the Data Flow
Transformations
Custom Logging / Good Blog

Related

SQL Server & Active Directory

What’s the best practice for integrating SQL Server with Active Directory (AD)?
NB. I’m using SQL Server 2016
Crux of the issue: I'm using SSRS 2016 and have several reports that need to be filtered based on the user accessing the reports. Originally I created a table of users that would need to access the reports. Then in the report builder I passed the UserID as a parameter within the query so that the resulting dataset would be limited to the data the user needed to see.
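For context, a hedged sketch of the kind of dataset query described here, assuming a hypothetical ReportUsers table and the @UserID parameter mapped to the SSRS built-in User!UserID field (the data tables are illustrative):

    -- Dataset query filtered by the user running the report
    SELECT o.OrderId, o.Region, o.Amount
    FROM   dbo.Orders      AS o
    JOIN   dbo.ReportUsers AS u ON u.Region = o.Region
    WHERE  u.LoginName = @UserID;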
The problem this created is that the User table would have to be maintained, and Active Directories are dynamic. Now that I have some time to develop a better option, I’d like to link the LDAP data with SQL Server.
I’m wondering what the best practice for doing this is.
One way I pursued this was through an SSIS package ADO.Net connection. Then convert the data. Then load it into a table. Then schedule a job to run the package however often I needed it. This was problematic because for whatever reason I couldn’t get the data conversion process to work.
The second way I’ve been approaching this is to create a linked server instance for the AD. My research has indicated that I’ll need to create a function that overcomes the string limitation of the xp_sprintf Function. Then leverage temp tables and loop through LDAP data to get around the 1000 record limitation from the AD. I've been able to accomplish all this.
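For reference, the linked-server route usually looks something like the sketch below; the LDAP path is a placeholder for your own domain, and the 1000-row page limit still has to be worked around by filtering or looping as described:

    -- One-time setup: linked server over the ADSI OLE DB provider
    EXEC master.dbo.sp_addlinkedserver
         @server     = 'ADSI',
         @srvproduct = 'Active Directory Service Interfaces',
         @provider   = 'ADSDSOObject',
         @datasrc    = 'adsdatasource';

    -- Query users from AD (LDAP path is a placeholder)
    SELECT *
    FROM OPENQUERY(ADSI,
        'SELECT sAMAccountName, displayName, mail
         FROM   ''LDAP://DC=yourdomain,DC=com''
         WHERE  objectCategory = ''person'' AND objectClass = ''user''');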
At this point though, there appears to be some other issues.
This ultimately increases the code necessary in the views for my reports which may make it harder for other database users to update if & when the time comes. To the point that I'd need to abandon the views and create stored procedures for the reports to pull from.
This also means every report access generates queries against LDAP as well as SQL Server, increasing the transaction count.
So to resolve that I could wrap the original query of the LDAP data to create a table and then create a job to run that stored procedure every so often.
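The snapshot approach in that last paragraph could be as simple as a stored procedure that reloads a local table from the linked-server query, scheduled with SQL Server Agent; a rough sketch with hypothetical names (AdUsers, RefreshAdUsers, and the LDAP path are assumptions):

    CREATE PROCEDURE dbo.RefreshAdUsers
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Local snapshot table the report views read from
        TRUNCATE TABLE dbo.AdUsers;

        INSERT INTO dbo.AdUsers (LoginName, DisplayName, Email)
        SELECT sAMAccountName, displayName, mail
        FROM OPENQUERY(ADSI,
            'SELECT sAMAccountName, displayName, mail
             FROM   ''LDAP://DC=yourdomain,DC=com''
             WHERE  objectCategory = ''person'' AND objectClass = ''user''');
    END;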
Either option solves the problem of maintaining the users table which is good, but it isn't perfect because AD changes can take place at any time.
Which option is better here?
If the SSIS package is the better route, I’m curious as to why that is the better route. I’m not opposed to going back and figuring out what it is I’m missing on the SSIS package to make it work.
Are there additional options I should consider if I want to get the most up-to-date Active Directory listing?
Thanks.

Merge MySQL and SQL Server Data Sources for Reporting

We have data stored for our customers in MySQL (Web App) and other data stored in SQL Server (billing data) and now we have a need to report on this data inside our customer-facing application.
Does anyone have experience merging these two data sources? Is there an effective way to do this?
Are there existing solutions, preferably OSS, that can aggregate the data sources and allow them to be queried as though they were one (this would be ideal)?
Otherwise, without asking for the "best" solution, what is optimal in this situation? Should we merge the separate sources into one database nightly? This is the only thing I can think of off the bat, and am wondering (hoping) whether other, more elegant or robust solutions exist.
Ideally we'd be able to query the data in real-time, rather than working off of a daily upload or whatever.
If you want to write queries across the two databases, you could link MySQL to SQL Server - something like this:
http://coresystems.ch/en/about-us/newsroom/category/blog/how-add-linked-server-connection-mysql-mssql/
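The general shape of that, assuming the MySQL ODBC driver is installed and a system DSN (here called 'MySqlDsn') points at the MySQL server - the table and column names are illustrative:

    -- Linked server over ODBC (MSDASQL) pointing at a MySQL DSN
    EXEC master.dbo.sp_addlinkedserver
         @server     = 'MYSQL',
         @srvproduct = 'MySQL',
         @provider   = 'MSDASQL',
         @datasrc    = 'MySqlDsn';

    -- Join billing data in SQL Server to customer data in MySQL
    SELECT b.InvoiceId, b.Amount, c.email
    FROM   dbo.Billing AS b
    JOIN   OPENQUERY(MYSQL, 'SELECT customer_id, email FROM webapp.customers') AS c
           ON c.customer_id = b.CustomerId;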
If you don't mind using a third-party reporting engine, you can give DBxtra a spin. It lets you combine different databases in a single query to produce a report, and it even lets you do so graphically, so you don't have to write the query yourself.

Multiple tables import using a single dataflow in SSIS

I have 10 tables I am importing to another SQL Server database using SSIS.
Do I have to create 10 different Dataflow tasks or can I proceed with one Dataflow task and add the 10 tables to it?
I have tried to use a single dataflow task but it is only allowing for a single table.
Do all the source tables share one common schema?
Do all the destination tables share one common schema (which doesn't have to be the same as the common schema for the source tables)?
If the answer to both questions is "yes", then you can in fact write a single Data Flow Task (whose connection managers are parameterized) and put it in a Foreach Loop container.
If the answer to either (or both) of those questions is "no", then you'll have to have separate sources and destinations. You might want to investigate Business Intelligence Markup Language as a way to generate those data flows automatically, although it's probably overkill for "only" ten tables.
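If the tables do share a schema, the Foreach Loop is typically driven by a small metadata query whose result set is mapped to an SSIS variable, which in turn feeds an expression on the source and destination; a hedged sketch of such a driver query (the table names are placeholders):

    -- Driver query for a Foreach Loop (ADO enumerator); the TableName column is
    -- mapped to an SSIS variable used in expressions such as
    -- "SELECT * FROM " + @[User::TableName] on the OLE DB source.
    SELECT name AS TableName
    FROM   sys.tables
    WHERE  name IN ('Customers', 'Orders', 'OrderLines');   -- your ten tables here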
The answer depends upon you and your best practices and how many developers you will have working on projects at the same time.
It is entirely possible to put more than one set of tables in a single dataflow. You can simply add additional sources and destinations to your dataflow. However, this is almost never a good idea as it adds to the maintenance effort later in the lifecycle of your project. It makes it more difficult to find and debug errors. It makes the entire project more complex.
If you are working alone and you will be building and maintaining this project's full lifecycle by yourself, then by all means do whatever you feel most comfortable with.
If you are in a group that may all maintain this project, I would suggest that you, at a minimum, break out the dataflow to different tables into different dataflow tasks.
If you are in a larger group and for more flexibility in maintenance, I would suggest that each dataflow be broken out into a different package (assuming 2008 or below. I have not played with the 2012 project models yet, so won't comment on them here), so that each can be worked on by different developers simultaneously. (I would actually recommend coding this way even if you are the only one on the project, but that is just the style I have developed over my career.)

How to keep table data the same in Oracle and SQL Server

I am trying to build a database in SQL Server that replicates the exact data present in tables in an Oracle production database. The database in SQL Server will be used for reporting and analysis. I want every new or updated row in the Oracle tables to be present in the SQL Server tables within about an hour. Does SQL Server Integration Services help with this? Is there any tool that does this, i.e. makes sure that the data present in the Oracle tables and the SQL Server tables is always the same (neglecting the one-hour lag)?
There are two things you could look into: replication and SSIS. SQL Server replication allows you to replicate data from Oracle to MSSQL so that would be one way to handle the data copy. On the other hand, if you plan on doing data transformations, mappings etc. then you might want to use SSIS because it's a full ETL tool.
One important question is how you can identify new data in Oracle, because that may determine at least the first part of your solution. And you then have to decide what transformations are necessary once you've copied the data into SQL Server; perhaps you will need to run some stored procedures to clean the data and put it into reporting tables. Since your reporting system is a different platform from the source, you will need to handle data type transformations at some point, whatever solution you choose.
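As an illustration of both points, a common pattern is a high-watermark query on the Oracle side followed by explicit type conversions when landing the data in SQL Server; a rough sketch, assuming a LAST_UPDATED column exists on the Oracle table and the staging/reporting table names are hypothetical:

    -- Source query for the SSIS data flow (run against Oracle; ? is the package's
    -- last-run timestamp parameter):
    --   SELECT order_id, customer_id, order_date, amount, last_updated
    --   FROM   orders
    --   WHERE  last_updated > ?
    --
    -- On the SQL Server side, convert the Oracle types explicitly when loading the
    -- staged rows into the reporting table:
    INSERT INTO rpt.Orders (OrderId, CustomerId, OrderDate, Amount)
    SELECT CAST(order_id    AS int),
           CAST(customer_id AS int),
           CAST(order_date  AS datetime2(0)),
           CAST(amount      AS decimal(18, 2))
    FROM   stg.Orders;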
Your question is quite general, and it isn't really possible to say what you should do without a lot more detail about your environment, your requirements, your resources and so on. I suggest that you try to break down your task into smaller ones, and then you should be able to ask more specific questions.

Migration strategies for SQL 2000 to SQL 2008

I've perused the threads here on migration from SQL 2000 to SQL 2008 but haven't really run into my question, so here we go with another one.
I'm building a strategy to move specific SQL 2000 databases to a new SQL 2008 R2 instance. My question is about the best method for transferring the schema and data. One way I know of is the quick 'n' dirty detach-copy-attach method, which should work so long as I've done my homework with regard to compatibility, code and such.
What if, though, I wrote the schema and logins via script and then copied the data via SSIS? I'm thinking of trying that so I can more easily integrate some of my test cases into the package (error handling and whatnot). What would I be setting myself up for if I did this?
Since you are moving the data between servers or instances, I would recommend moving the data via data flows. If you don't expect to run the code more than once, then you can let the wizard generate your code for this move. However, when I did this 2+ years ago, the code the wizard generated combined many "create table" commands into a single Execute SQL Task and created a few data flow tasks with multiple sources and destinations in them to insert data into the destination. This was good for getting up and running, but it was inadequate when I wanted to refresh the tables one more time after I modified the schema of the new target tables. If you expect to run the refresh more than once, then you may want to take the time to create the target schema first and then manually create the data flows.
Once you have moved the data, then you can enable full-text search on the new server. I don't believe you will need to have this enabled on your first load.
One reason I recommend against the detach-attach method for migration is that you bring all the dirty laundry from the 2000 database to the 2008 R2 database. If you had too lax security on the 2000 server or many ancient users that shouldn't exist, it could be easier to clean this up by starting from scratch. If you use the detach-attach method, then you have to worry about users.
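If you do end up attaching or restoring the old databases, the "dirty laundry" mentioned above often shows up as orphaned users; a quick check on the 2008 R2 side would be something like:

    -- List database users whose SID no longer maps to a server login
    EXEC sp_change_users_login 'Report';

    -- Or compare SIDs directly against the server principals
    SELECT dp.name AS OrphanedUser
    FROM   sys.database_principals AS dp
    LEFT JOIN sys.server_principals AS sp ON sp.sid = dp.sid
    WHERE  dp.type IN ('S', 'U')
      AND  sp.sid IS NULL
      AND  dp.name NOT IN ('dbo', 'guest', 'INFORMATION_SCHEMA', 'sys');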