We have a performance issue with our current transactional replication setup on SQL Server 2008.
When a new snapshot is created and applied to the subscriber, network utilization on the publisher and the distributor jumps to 99%, and disk queue lengths climb to 30.
This is causing application timeouts.
Is there any way we can throttle the replicated data being sent over?
Can we restrict the number of rows being replicated?
Are there any switches which can be set on/off to accomplish this?
Thanks!
There is an alternative way to deal with this situation.
When setting up transactional replication on a table with millions of rows, the initial snapshot can take a long time to deliver to the subscriber.
Since SQL Server 2005, there has been an option to create the tables on both the publisher and the subscriber, populate the data on both sides, and then set up replication on top of it.
When you add the subscription with EXEC sp_addsubscription, set @sync_type = 'replication support only'.
Reference article http://www.mssqltips.com/tip.asp?tip=1117
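A minimal sketch of that subscription call (publication, server, and database names here are placeholders, not from the original question):

```sql
-- Hypothetical example: add a subscription without generating or applying
-- a snapshot. Assumes the schema and data already exist on the subscriber.
EXEC sp_addsubscription
    @publication       = N'MyPublication',     -- placeholder name
    @subscriber        = N'SubscriberServer',  -- placeholder name
    @destination_db    = N'SubscriberDB',      -- placeholder name
    @subscription_type = N'Push',
    @sync_type         = N'replication support only';
```

With this sync type, replication assumes the subscriber is already in sync, so no snapshot is pushed over the network at all.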
Our DBA has made us break down DML code to run in batches of 50,000 rows at a time, with a couple of minutes in between. He adjusts the batch size from time to time, but this way our replicated databases stay healthy.
For batching, everything goes into temp tables with a new column (call it Ordinal) populated by ROW_NUMBER(), plus a BatchID computed as Ordinal / 50000. A loop then walks the BatchID values and updates the target table batch by batch. It is harder on the developers, but easier on the DBAs, and there is no need to pay for more infrastructure.
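The batching scheme described above could look roughly like this (table and column names are invented for illustration; only the 50,000-row batch size and the pause between batches come from the answer):

```sql
-- Hypothetical sketch of the batching approach: stage rows with a batch
-- number, then apply them to the target 50,000 at a time.
SELECT s.*,
       (ROW_NUMBER() OVER (ORDER BY s.KeyCol) - 1) / 50000 AS BatchID
INTO   #Staged
FROM   dbo.SourceTable AS s;             -- placeholder source table

DECLARE @batch INT = 0, @maxBatch INT;
SELECT @maxBatch = MAX(BatchID) FROM #Staged;

WHILE @batch <= @maxBatch
BEGIN
    UPDATE t
    SET    t.SomeCol = b.SomeCol
    FROM   dbo.TargetTable AS t          -- placeholder target table
    JOIN   #Staged AS b ON b.KeyCol = t.KeyCol
    WHERE  b.BatchID = @batch;

    SET @batch += 1;
    WAITFOR DELAY '00:02:00';            -- pause between batches
END
```

Each batch is a short transaction, so the log reader and distribution agent get a steady trickle of commands instead of one enormous transaction.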
I just joined my new office as a database administrator. Here we are using SQL Server merge replication. It surprised me that three of the major replication system tables have grown huge:
MSmerge_contents
MSmerge_genhistory
MSmerge_tombstone
MSmerge_contents has grown to 64 GB, with the row count heading toward 64 billion, and this is happening because the subscription expiration period is set to None.
Now I want to clean up this table. We are using the simple recovery model, and when I ran a DELETE against this table, everything got stuck. I have no downtime window in which to stop or pause the replication process.
Can anyone help me figure out how to shrink this table or delete half of its data?
You should not delete directly from the merge system tables; that is not supported.
Instead, the proper way to cleanup the metadata from the Merge system tables is to set your Subscription expiration to something other than None, the default is 14 days. Metadata cleanup will be run when the Merge Agent runs, it executes sp_mergemetadataretentioncleanup. More information on subscription expiration and metadata cleanup can be found in How Merge Replication Manages Subscription Expiration and Metadata Cleanup.
However, since you most likely have a lot of metadata that needs to be cleaned up, I would gradually reduce the retention period. An explanation of this approach can be found here:
https://blogs.technet.microsoft.com/claudia_silva/2009/06/22/replication-infinite-retention-period-causing-performance-issues/
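Stepping the retention down gradually, rather than jumping straight from None to 14 days, keeps each cleanup pass to a manageable size. A hedged sketch (the publication name and the intermediate values are placeholders):

```sql
-- Hypothetical example: lower the retention period in stages so that
-- sp_mergemetadataretentioncleanup deletes a manageable slice each time.
EXEC sp_changemergepublication
    @publication = N'MyMergePublication',  -- placeholder name
    @property    = N'retention',
    @value       = N'90';                  -- days; later step down to 60, 30, 14

-- Let the Merge Agent run (it calls the cleanup proc), or invoke it directly,
-- then repeat with a smaller value:
-- EXEC sp_mergemetadataretentioncleanup;
```

Because cleanup runs in its own transactions, this avoids the single giant DELETE that got stuck under the simple recovery model.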
Hope this helps.
This is regarding SQL Server 2008 R2 and SSIS.
I need to update dozens of history tables on one server with new data from production tables on another server.
The two servers are not, and will not be, linked.
Some of the history tables have hundreds of millions of rows, and some of the production tables have tens of millions of rows.
I currently have a process in place for each table that uses the following data flow components:
An OLE DB Source to pull the appropriate production data.
A Lookup to check whether each production row's key already exists in the history table, using "Redirect to error output" so that only the missing rows continue on.
An OLE DB Destination to write the missing rows to the history table.
The process is too slow for the large tables. There has to be a better way. Can someone help?
I know if the servers were linked a single set based query could accomplish the task easily and efficiently, but the servers are not linked.
Segment your problem into smaller problems. That's the only way you're going to solve this.
Let's examine the problems.
You're inserting and/or updating existing data. At a database level, rows are packed into pages. Rarely is it an exact fit, and there's usually some amount of free space left in a page. When you update a row, pretend the Name field went from "bob" to "Robert Michael Stuckenschneider III". That row needs more room to live, and while there's some room left on the page, there's not enough. Other rows might get shuffled down to the next page just to give this one some elbow room. That's going to cause lots of disk activity. Yes, it's inevitable given that you are adding more data, but it's important to understand how your data is going to grow and ensure your database itself is ready for that growth.

Maybe you have some non-clustered indexes on a target table. Disabling/dropping them should improve insert/update performance. If you still have your database and log set to grow at 10% or 1MB or whatever the default values are, the storage engine is going to spend all of its time trying to grow files and won't have time to actually write data. Takeaway: ensure your system is poised to receive lots of data. Work with your DBA, LAN and SAN team(s).
You have tens of millions of rows in your OLTP system and hundreds of millions in your archive system. Starting with the OLTP data, you need to identify what does not exist in your historical system. Given your data volumes, I would plan for this package to have a hiccup in processing, so it needs to be "restartable." I would have a package with a data flow that selects only the business keys from the OLTP that are used to make a match against the target table, and writes those keys into a table that lives on the OLTP server (ToBeTransfered).

Have a second package that uses a subset of those keys (N rows) joined back to the original table as the Source. It's wired directly to the Destination, so no Lookup is required, and the fat data row flows over the network only one time. Then have an Execute SQL Task delete the batch you just sent to the archive server. This batching method also lets you run the second package on multiple servers. The SSIS team describes it better in their paper: We Loaded 1TB in 30 Minutes.
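The key-staging step described above could be sketched like this on the OLTP side (table, column, and batch-size values here are assumptions; only the ToBeTransfered table name comes from the answer):

```sql
-- Hypothetical sketch: stage the business keys still missing from the
-- archive, then feed one batch of full rows to the SSIS Source.
-- 1) Populated once by the first package:
-- CREATE TABLE dbo.ToBeTransfered (BusinessKey INT PRIMARY KEY);

-- 2) Source query for the second package: N keys joined back to the
--    original table, so the fat row crosses the network only once.
SELECT TOP (50000) o.*
FROM   dbo.OltpTable AS o             -- placeholder table
JOIN   dbo.ToBeTransfered AS k
  ON   k.BusinessKey = o.BusinessKey
ORDER  BY k.BusinessKey;

-- 3) After the Destination succeeds, remove the batch just sent:
DELETE k
FROM   dbo.ToBeTransfered AS k
WHERE  k.BusinessKey IN (SELECT TOP (50000) BusinessKey
                         FROM dbo.ToBeTransfered
                         ORDER BY BusinessKey);
```

Ordering both the SELECT and the DELETE by the key makes the batch deterministic, which is what makes the package safely restartable after a failure.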
Ensure the Lookup uses a query of the form SELECT key1, key2 FROM MyTable. Better yet, can you provide a filter to the Lookup? WHERE ProcessingYear = 2013, as there's no need to waste cache on 2012 if the OLTP only contains 2013 data.
You might need to modify your PacketSize on your Connection Manager and have a network person set up Jumbo frames.
Look at your queries. Are you getting good plans? Are your tables over-indexed? Remember, each index is going to result in an increase in the number of writes performed. If you can dump them and recreate after the processing is completed, you'll think your SAN admins bought you some FusionIO drives. I know I did when I dropped 14 NC indexes from a billion row table that only had 10 total columns.
If you're still having performance issues, establish a theoretical baseline (under ideal conditions that will never occur in the real world, I can push 1GB from A to B in N units of time) and work your way from there to what your actual is. You must have a limiting factor (IO, CPU, Memory or Network). Find the culprit and throw more money at it or restructure the solution until it's no longer the lagging metric.
Step 1. Incremental bulk import of the appropriate production data to the new server.
Ref: Importing Data from a Single Client (or Stream) into a Non-Empty Table
http://msdn.microsoft.com/en-us/library/ms177445(v=sql.105).aspx
Step 2. Use a MERGE statement to identify new/existing records and operate on them.
I realize that it will take a significant amount of disk space on the new server, but the process would run faster.
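Step 2 might look roughly like this, assuming the production data from step 1 has landed in a staging table on the new server (all table and column names are placeholders):

```sql
-- Hypothetical sketch: upsert staged production rows into the history table.
MERGE dbo.HistoryTable AS tgt            -- placeholder target table
USING dbo.StagedProduction AS src        -- placeholder staging table
   ON tgt.KeyCol = src.KeyCol
WHEN MATCHED AND tgt.SomeCol <> src.SomeCol THEN
    UPDATE SET tgt.SomeCol = src.SomeCol
WHEN NOT MATCHED BY TARGET THEN
    INSERT (KeyCol, SomeCol)
    VALUES (src.KeyCol, src.SomeCol);
```

Because both tables now live on the same server, this is the single set-based statement the asker wanted, without needing a linked server.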
I have a table in a SQL Server database in which I record each user's latest activity time. Can somebody please confirm that SQL Server will automatically handle the scenario where multiple update requests are received simultaneously for different users? I am expecting 25-50 concurrent update requests on this table, but each request updates a different row. Do I need something extra, like connection pooling?
Yes, SQL Server will handle this scenario.
It is an RDBMS, and it is designed for scenarios like this one.
When you insert/update/delete a row, SQL Server locks the table/row/page to guarantee that you can do what you want. The lock is released when you are done inserting/updating/deleting the row.
Check this Link
And introduction-to-locking-in-sql-server
But there are a few things you should do:
1 - Make sure whatever you do is done quickly. Because of locking, if you hold a transaction open for too long, other requests against the same table may be blocked until you are done, and that can lead to a timeout.
2 - Always use a transaction.
3 - Make sure to adjust the fill factor of your indexes. Check Fill Factor on MSDN.
4 - Adjust the Isolation level according to what you want.
5 - Get rid of unused indexes to speed up your insert/update.
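Points 1 and 2 together amount to something like this (table and column names are invented for illustration):

```sql
-- Hypothetical sketch: a short, explicit transaction that touches one row
-- and commits immediately, so the row lock is held only briefly.
BEGIN TRANSACTION;

UPDATE dbo.UserActivity               -- placeholder table
SET    LastActivity = SYSDATETIME()
WHERE  UserId = @UserId;              -- each request updates a different row

COMMIT TRANSACTION;
```

Since every request targets a different row, these single-row updates take row locks that do not conflict with each other, and 25-50 concurrent requests are well within what SQL Server handles routinely.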
Connection pooling is not really related to your question. Connection pooling is a technique that avoids the extra overhead of creating a new connection to the database every time you send a request. In C# and other languages that use ADO.NET, this is done automatically. Check this out: SQL Server Connection Pooling.
Other links that may be useful:
best-practices-for-inserting-updating-large-amount-of-data-in-sql-2008
Speed Up Insert Performance
I am using Microsoft SQL Server 2008 R2. I have copied database A (my production database) to database B (my reporting database) by creating an SSIS package. Both databases are on the same server. I want to run a job so that any change in database A (data modifications such as inserting new rows or updating values of existing rows in any table) is also applied to database B, with the SQL job running and propagating the changes automatically. I don't want tables in database B to be dropped and recreated (as that is against our business rules); instead, only the changes should be applied.
Can anyone help me, please? Thanks in advance.
I would suggest that you investigate using replication. Specifically, transactional replication if you need constant updates. Here's a bit from MSDN:
Transactional replication typically starts with a snapshot of the publication database objects and data. As soon as the initial snapshot is taken, subsequent data changes and schema modifications made at the Publisher are usually delivered to the Subscriber as they occur (in near real time). The data changes are applied to the Subscriber in the same order and within the same transaction boundaries as they occurred at the Publisher; therefore, within a publication, transactional consistency is guaranteed.
If you don't need constant updating (that comes at a price in performance, of course), you can consider the alternatives of merge replication or snapshot replication. Here's a page to start examining those alternatives.
Currently we have 3 slave databases, but almost always one of them is extremely slow compared to the others (it can lag an hour behind the master database).
Has anyone met a similar problem? What may be the cause?
I'd guess some other process is running on the same host as the slow replica, and it's hogging resources.
Try running "top" (or use Nagios or Cacti or something) to monitor system performance on the three replica hosts and see if there are any trends you can observe: CPU utilization pegged by another process besides mysqld, or I/O constantly saturated, that sort of thing.
update: Read the following two articles by MySQL performance expert Peter Zaitsev:
Managing Slave Lag with MySQL Replication
Fighting MySQL Replication Lag
The author points out that replication is single-threaded, and the replica executes queries sequentially, instead of in parallel as they were executed on the master. So if you have a few replicated queries that are very long-running, they can "hold up the queue."
He suggests the remedy is to simplify long-running SQL queries so they run more quickly. For example:
If you have an UPDATE that affects millions of rows, break it up into multiple UPDATEs that act on a subset of the rows.
If you have complex SELECT statements incorporated into your UPDATE or INSERT queries, separate the SELECT into its own statement, generate a set of literal values in application code, and then run your UPDATE or INSERT on these. Of course the SELECT won't be replicated, the replica will only see the UPDATE/INSERT with literal values.
If you have a long-running batch job running, it could be blocking other updates from executing on the replica. You can put some sleeps into the batch job, or even write the batch job to check the replication lag at intervals and sleep if needed.
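For the first suggestion, MySQL's UPDATE ... LIMIT makes the batching straightforward (the table, filter, and batch size below are made up for illustration):

```sql
-- Hypothetical sketch: replace one huge UPDATE with repeated small ones so
-- the single-threaded replica never chews on a multi-minute statement.
-- Repeat this from application code (or a stored procedure) until
-- ROW_COUNT() returns 0:
UPDATE orders                  -- placeholder table
SET    archived = 1
WHERE  archived = 0
  AND  created_at < '2012-01-01'
LIMIT  10000;
```

Each statement replicates as its own short event, so other updates in the replica's queue get a chance to run between batches.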
Are all the slave servers located in the same place? In my case, one of the slave servers was hosted in another location, and it turned out to be a network issue.