SQL JOB to Update all tables - sql-server-2008

I am using Microsoft SQL Server 2008 R2.I have copied database A(myproduction database) to database B(Myreportin database) by creating SSIS package.Both databases are in same server.I want to run a job so that If any change(data modifications like inserting new rows or updating values of any row in any table) take place in database A that will also take place in my B database and sql job will run and acomplish the changing automatically.I don't want that in database B table will be dropped and recreated (as its not our business rule )instead only the change will take place.
Can any one help me please.Thanks in Advance.

I would suggest that you investigate using replication. Specifically, transactional replication if you need constant updates. Here's a bit from MSDN:
Transactional replication typically starts with a snapshot of the publication database objects and data. As soon as the initial snapshot is taken, subsequent data changes and schema modifications made at the Publisher are usually delivered to the Subscriber as they occur (in near real time). The data changes are applied to the Subscriber in the same order and within the same transaction boundaries as they occurred at the Publisher; therefore, within a publication, transactional consistency is guaranteed.
If you don't need constant updating (that comes at a price in performance, of course), you can consider the alternatives of merge replication or snapshot replication. Here's a page to start examining those alternatives.

Related

SSIS to insert non-matching data on non-linked server

This is regarding SQL Server 2008 R2 and SSIS.
I need to update dozens of history tables on one server with new data from production tables on another server.
The two servers are not, and will not be, linked.
Some of the history tables have 100's of millions of rows and some of the production tables have dozens of millions of rows.
I currently have a process in place for each table that uses the following data flow components:
OLEDB Source task to pull the appropriate production data.
Lookup task to check if the production data's key already exists in the history table and using the "Redirect to error output" -
Transfer the missing data to the OLEDB Destination history table.
The process is too slow for the large tables. There has to be a better way. Can someone help?
I know if the servers were linked a single set based query could accomplish the task easily and efficiently, but the servers are not linked.
Segment your problem into smaller problems. That's the only way you're going to solve this.
Let's examine the problems.
You're inserting and/or updating existing data. At a database level, rows are packed into pages. Rarely is it an exact fit and there's usually some amount of free space left in a page. When you update a row, pretend the Name field went from "bob" to "Robert Michael Stuckenschneider III". That row needs more room to live and while there's some room left on the page, there's not enough. Other rows might get shuffled down to the next page just to give this one some elbow room. That's going to cause lots of disk activity. Yes, it's inevitable given that you are adding more data but it's important to understand how your data is going to grow and ensure your database itself is ready for that growth. Maybe, you have some non-clustered indexes on a target table. Disabling/dropping them should improve insert/update performance. If you still have your database and log set to grow at 10% or 1MB or whatever the default values are, the storage engine is going to spend all of its time trying to grow files and won't have time to actually write data. Take away: ensure your system is poised to receive lots of data. Work with your DBA, LAN and SAN team(s)
You have tens of millions of rows in your OLTP system and hundreds of millions in your archive system. Starting with the OLTP data, you need to identify what does not exist in your historical system. Given your data volumes, I would plan for this package to have a hiccup in processing and needs to be "restartable." I would have a package that has a data flow with only the business keys selected from the OLTP that are used to make a match against the target table. Write those keys into a table that lives on the OLTP server (ToBeTransfered). Have a second package that uses a subset of those keys (N rows) joined back to the original table as the Source. It's wired directly to the Destination so no lookup required. That fat data row flows on over the network only one time. Then have an Execute SQL Task go in and delete the batch you just sent to the Archive server. This batching method can allow you to run the second package on multiple servers. The SSIS team describes it better in their paper: We loaded 1TB in 30 minutes
Ensure the Lookup is a Query of the form SELECT key1, key2 FROM MyTable Better yet, can you provide a filter to the lookup? WHERE ProcessingYear = 2013 as there's no need to waste cache on 2012 if the OLTP only contains 2013 data.
You might need to modify your PacketSize on your Connection Manager and have a network person set up Jumbo frames.
Look at your queries. Are you getting good plans? Are your tables over-indexed? Remember, each index is going to result in an increase in the number of writes performed. If you can dump them and recreate after the processing is completed, you'll think your SAN admins bought you some FusionIO drives. I know I did when I dropped 14 NC indexes from a billion row table that only had 10 total columns.
If you're still having performance issues, establish a theoretical baseline (under ideal conditions that will never occur in the real world, I can push 1GB from A to B in N units of time) and work your way from there to what your actual is. You must have a limiting factor (IO, CPU, Memory or Network). Find the culprit and throw more money at it or restructure the solution until it's no longer the lagging metric.
Step 1. Incremental bulk import of appropriate proudction data to new server.
Ref: Importing Data from a Single Client (or Stream) into a Non-Empty Table
http://msdn.microsoft.com/en-us/library/ms177445(v=sql.105).aspx
Step 2. Use Merge Statement to identify new/existing records and operate on them.
I realize that it will take a significant amount of disk space on the new server, but the process would run faster.

Copying data from PostgreSQL to MySQL

I currently have a PostgreSQL database, because one of the pieces of software we're using only supports this particular database engine. I then have a query which summarizes and splits the data from the app into a more useful format.
In my MySQL database, I have a table which contains an identical schema to the output of the query described above.
What I would like to develop is an hourly cron job which will run the query against the PostgreSQL database, then insert the results into the MySQL database. During the hour period, I don't expect to ever see more than 10,000 new rows (and that's a stretch) which would need to be transferred.
Both databases are on separate physical servers, continents apart from one another. The MySQL instance runs on Amazon RDS - so we don't have a lot of control over the machine itself. The PostgreSQL instance runs on a VM on one of our servers, giving us complete control.
The duplication is, unfortunately, necessary because the PostgreSQL database only acts as a collector for the information, while the MySQL database has an application running on it which needs the data. For simplicity, we're wanting to do the move/merge and delete from PostgreSQL hourly to keep things clean.
To be clear - I'm a network/sysadmin guy - not a DBA. I don't really understand all of the intricacies necessary in converting one format to the other. What I do know is that the data being transferred consists of 1xVARCHAR, 1xDATETIME and 6xBIGINT columns.
The closest guess I have for an approach is to use some scripting language to make the query, convert results into an internal data structure, then split it back out to MySQL again.
In doing so, are there any particular good or bad practices I should be wary of when writing the script? Or - any documentation that I should look at which might be useful for doing this kind of conversion? I've found plenty of scheduling jobs which look very manageable and well-documented, but the ongoing nature of this script (hourly run) seems less common and/or less documented.
Open to any suggestions.
Use the same database system on both ends and use replication
If your remote end was also PostgreSQL, you could use streaming replication with hot standby to keep the remote end in sync with the local one transparently and automatically.
If the local end and remote end were both MySQL, you could do something similar using MySQL's various replication features like binlog replication.
Sync using an external script
There's nothing wrong with using an external script. In fact, even if you use DBI-Link or similar (see below) you probably have to use an external script (or psql) from a cron job to initiate repliation, unless you're going to use PgAgent to do it.
Either accumulate rows in a queue table maintained by a trigger procedure, or make sure you can write a query that always reliably selects only the new rows. Then connect to the target database and INSERT the new rows.
If the rows to be copied are too big to comfortably fit in memory you can use a cursor and read the rows with FETCH, which can be helpful if the rows to be copied are too big to comfortably fit in memory.
I'd do the work in this order:
Connect to PostgreSQL
Connect to MySQL
Begin a PostgreSQL transaction
Begin a MySQL transaction. If your MySQL is using MyISAM, go and fix it now.
Read the rows from PostgreSQL, possibly via a cursor or with DELETE FROM queue_table RETURNING *
Insert them into MySQL
DELETE any rows from the queue table in PostgreSQL if you haven't already.
COMMIT the MySQL transaction.
If the MySQL COMMIT succeeded, COMMIT the PostgreSQL transaction. If it failed, ROLLBACK the PostgreSQL transaction and try the whole thing again.
The PostgreSQL COMMIT is incredibly unlikely to fail because it's a local database, but if you need perfect reliability you can use two-phase commit on the PostgreSQL side, where you:
PREPARE TRANSACTION in PostgreSQL
COMMIT in MySQL
then either COMMIT PREPARED or ROLLBACK PREPARED in PostgreSQL depending on the outcome of the MySQL commit.
This is likely too complicated for your needs, but is the only way to be totally sure the change happens on both databases or neither, never just one.
BTW, seriously, if your MySQL is using MyISAM table storage, you should probably remedy that. It's vulnerable to data loss on crash, and it can't be transactionally updated. Convert to InnoDB.
Use DBI-Link in PostgreSQL
Maybe it's because I'm comfortable with PostgreSQL, but I'd do this using a PostgreSQL function that used DBI-link via PL/Perlu to do the job.
When replication should take place, I'd run a PL/PgSQL or PL/Perl procedure that uses DBI-Link to connect to the MySQL database and insert the data in the queue table.
Many examples exist for DBI-Link, so I won't repeat them here. This is a common use case.
Use a trigger to queue changes and DBI-link to sync
If you only want to copy new rows and your table is append-only, you could write a trigger procedure that appends all newly INSERTed rows into a separate queue table with the same definition as the main table. When you want to sync, your sync procedure can then in a single transaction LOCK TABLE the_queue_table IN EXCLUSIVE MODE;, copy the data, and DELETE FROM the_queue_table;. This guarantees that no rows will be lost, though it only works for INSERT-only tables. Handling UPDATE and DELETE on the target table is possible, but much more complicated.
Add MySQL to PostgreSQL with a foreign data wrapper
Alternately, for PostgreSQL 9.1 and above, I might consider using the MySQL Foreign Data Wrapper, ODBC FDW or JDBC FDW to allow PostgreSQL to see the remote MySQL table as if it were a local table. Then I could just use a writable CTE to copy the data.
WITH moved_rows AS (
DELETE FROM queue_table RETURNING *
)
INSERT INTO mysql_table
SELECT * FROM moved_rows;
In short you have two scenarios:
1) Make destination pull the data from source into its own structure
2) Make source push out the data from its structure to destination
I'd rather try the second one, look around and find a way to create postgresql trigger or some special "virtual" table, or maybe pl/pgsql function - then instead of external script, you'll be able to execute the procedure by executing some query from cron, or possibly from inside postgres, there are some possibilities of operation scheduling.
I'd choose 2nd scenario, because postgres is much more flexible, and manipulating data some special, DIY ways - you will simply have more possibilities.
External script probably isn't a good solution, e.g. because you will need to treat binary data with special care, or convert dates&times from DATE to VARCHAR and then to DATE again. Inside external script, various text-stored data will be probably just strings, and you will need to quote it too.

Can I safely change data in a replicated SQL Server table on the target machine?

I replicate a table from one SQL Server 2008 instance to another which already works fine. I'd also like to set a field in the replicated table on the target instance of the SQL Servers, but I'm not quite sure whether this is allowed (even though it seems to work).
Reason behind this: Replication from server A to B, processing of rows on server B and then setting (e.g. a flag such as "processed") when the row has been processed. This information is not available on server A and can only be set on server B.
A more cumbersome way would involve a separate table on server B which would have to keep entries of IDs in the replicated table that have already been processed, but maybe this is not necessary?
With Transactional Replication, by default the Subscribers should be treated as read-only. This is because if data has been changed at a Subscriber - whether it be inserts, updated, or deletes - it can not only cause data consistency errors, but a reinitialization could wipe the Subscriber data out as article #pre_creation_cmd is set to drop by default.
If you're going to be updating data at the Subscribers then I would suggest using Updatable Subscriptions for Transactional Replication, Peer to Peer Replication, or Merge Replication.

Best way for incremental load in ssis

I am getting 600,000 rows daily from my source and I need to dump them into the SQL Server destination, which would be an incremental load.
Now, as the destination table size is likely to be increase day by day which would be the best approach for the incremental load. I have few options in my mind:
Lookup Task
Merge Join
SCD
etc..
Please suggest me the best option which will perform well in incremental load.
Look at Andy Leonard's excellent Stairway to Integration Services series or Todd McDermid's videos on how to use the free SSIS Dimension Merge SCD component Both will address how to do it right far better than I could enumerate in this box.
Merge join is a huge performance problem as it requires sorting of all records upfront and should not be used for this.
We process many multimillion record files daily and generally place them in a staging table and do a hash compare to the data in our Change data tracking tables to see if the data is different from what is on prod and then only load the new ones or ones that are different. Because we do the comparison outside of our production database, we have very little impact on prod becasue uinstead of checking millions of records against prod, we are only dealing with the 247 that it actually needs to have. In fact for our busiest server, all this processing happens on a separate server except for the last step that goes to prod.
if you only need to insert them, it doesnt actually matter.
if you need to check something like, if exists, update else insert, I suggest creating a oleDbSource where you query your 600.000 rows and check if they exist with a lookup task on the existing datasource. Since the existing datasource is (or tend to be) HUGE, be careful with the way you configure the caching mode. i would go with partial cache with some memory limit ordered by the ID you are looking up (this detais is very important based on the way the caching works)

Can I mirror changes in a database with homebrew code?

I have several mysql databases and tables that need to be "listened to". I need to know what data changes and send the changes to remote servers that have local mirrors of the database.
How can I mirror changes in the mysql databases? I was thinking of setting up mysql triggers that write all changes to another table. This table has the database name, table name, and all of the columns. I'd then write custom code to transfer the changes and install them periodically on the remote mirrors. Will this accomplish my need?
Your plan is 100% correct.
That extra table is called an "audit" or "history" table (there are subtle distinctions but you shouldn't much care - but you now have the "official" terms which you can use to do further research).
If the main table has columns A, B, C, then the audit would have 3 more: A, B, C, Operation, Changed_By, Change_DateTime (names are subject to your tastes and coding standards).
"Operation" column stores whether the change was an insert, delete, old value of update or new value of update (frequently it's 3 characters wide and the operations as "INS"/"DEL"/"U_D" and "U_I", but there are other approaches).
The data in the audit table is populated via a trigger on the main table.
Then make sure there's an index on Change_DateTime column.
And to find a list of changes, you keep track of when you last polled, and then simply do
SELECT * FROM Table_Audit WHERE Change_DateTime > 'LAST_POLL_TIME'
You can tell MySQL to create an incremental backup from a specific point in time. The data contains only the changes to the database since that time.
You have to turn on binary logging and then you can use the mysqlbinlog command to export the changes since a given timestamp. See the Point-in-Time (Incremental) Recovery section of the manual as well as the documentation for mysqlbinlog. Specifically, you will want the --start-datetime parameter.
Once you have the exported log in text format, you can execute it on another database instance.
As soon as you step outside the mechanisms of the DBMS to accomplish an inherently DB oriented task like mirroring, you've violated most of the properties of a DB that distinguish it from an ordinary file.
In particular, the mechanism you propose violates the atomicity, consistency, isolation, and durability that MySQL is built to ensure. For example, incomplete playback of the log on the mirrors will leave your mirrors in a state inconsistent with the parent DB. What you propose can only approximate mirroring, thus you should prefer DBMS intrinsic mechanisms unless you don't care if the mirrors accurately reflect the state of the parent.