SQL Server 2008 on a separate machine is very slow - sql-server-2008

We have an application that gets its data from an MS SQL database residing on a server at the headquarters, over an internet connection.
The problem is that the data is pretty large, which makes queries take a long time to transfer it from the headquarters to the client.
Does SQL Server provide any method to speed up queries when the application is geographically separated from the database (e.g., compressing the data before it starts to transfer)?

The following quotes from a similar question might provide you with some pointers.
There is never a reason to return a large result to start with. For those particular scenarios that need large amounts of data shipped to the client, there are better solutions than T-SQL queries (log shipping, Service Broker, replication).
and
You have a couple options that I am aware of:
Use a third-party tool like SQLNitro.
Move to Windows Server 2008 and SQL Server 2008, where they have made various TCP stack improvements, as outlined here: Appendix A: A Closer Look - Examining the Enhancements in Windows Server 2008 and SQL Server 2008 That Enable Faster Performance
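To make the Service Broker suggestion from the first quote concrete: instead of streaming a large result set back over the WAN, the server queues messages that the remote side picks up asynchronously. A minimal sketch; every object name below is hypothetical, and the routing/security setup between the two machines is omitted:

    -- One-time setup (all names are made up for illustration)
    CREATE MESSAGE TYPE LargeResultMsg VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT LargeResultContract (LargeResultMsg SENT BY INITIATOR);
    CREATE QUEUE dbo.LargeResultQueue;
    CREATE SERVICE LargeResultService ON QUEUE dbo.LargeResultQueue (LargeResultContract);

    -- Sender: queue one message rather than returning a huge result set
    DECLARE @h UNIQUEIDENTIFIER, @payload XML = N'<rows>...</rows>';
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE LargeResultService
        TO SERVICE 'LargeResultService'
        ON CONTRACT LargeResultContract
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @h MESSAGE TYPE LargeResultMsg (@payload);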

Performance is determined not only by bandwidth; latency is also a very strong factor here. Try to minimize the number of requests. You can combine several queries into one batch so that a single round trip returns two (or more) result sets, as sketched below.
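For instance, the two statements below travel to the server as one batch and come back as two result sets in a single round trip (the table and column names are made up):

    -- Hypothetical tables; executed as ONE batch = ONE round trip
    SELECT OrderID, OrderDate, TotalDue
    FROM   dbo.Orders
    WHERE  OrderDate >= '20100101';

    SELECT CustomerID, CustomerName
    FROM   dbo.Customers
    WHERE  Region = N'EMEA';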

Related

Replication from MySQL to SQL Server

I have a system in which data is written constantly. It runs on MySQL. I also have a second system that runs on SQL Server and uses some parameters from the first database.
Question: how is it possible (is it even possible) to constantly transfer values from one database (MySQL) to the other (SQL Server)? Switching to a single database is not an option. As I understand it, the alternative would be to write a program, for example in Delphi, that transfers values from one database to the other.
You have a number of options.
SQL Server can access another database using ODBC, so you could set up SQL Server to obtain the information it needs directly from tables held in MySQL (see the linked-server sketch after these options).
MySQL supports replication using log files, so you could configure MySQL replication (which does not have to cover all tables) to write the relevant transactions to a log file. You would then need to process that log file (which you could do in (almost) real time, as standard MySQL replication does) to identify what needs to be written to the MS SQL Server. Typically this would produce a set of statements to run against the MS SQL Server. Any number of languages could be used to process the log file and issue the updates.
You could have a scheduled task that reads the required parameters from MySQL and posts them to MS SQL, but this would leave a window during which the two may be out of sync. Given that parsing log files and posting the updates may turn out to be troublesome, you may still want to implement this as a fallback.
If the SQL Server and the MySQL server are on the same network, the external-tables method is likely to be the simplest and lowest-maintenance option, but depending on the amount of data involved, the overhead of the external connection and queries could affect the overall performance of the queries made against the MS SQL Server.
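A sketch of the first option, assuming a MySQL ODBC DSN named MySqlDsn has already been created on the SQL Server machine (all the other names here are hypothetical too):

    -- Register the MySQL server via the ODBC provider (MSDASQL)
    EXEC sp_addlinkedserver
         @server     = N'MYSQL_LINK',
         @srvproduct = N'MySQL',
         @provider   = N'MSDASQL',
         @datasrc    = N'MySqlDsn';

    EXEC sp_addlinkedsrvlogin
         @rmtsrvname  = N'MYSQL_LINK',
         @useself     = 'FALSE',
         @rmtuser     = N'reader',
         @rmtpassword = N'secret';

    -- The inner query runs on MySQL, so only its result crosses the wire
    SELECT *
    FROM OPENQUERY(MYSQL_LINK, 'SELECT param_name, param_value FROM params');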

Large Local Data Analysis: SQL Server Express vs MS Access

What is better for large, local data analysis: MS Access or SQL Server Express?
To paint the picture of my constraints/needs:
I do Cisco telephony analysis for a large corporation. Historically, I have imported data sets via T-SQL into Excel to manipulate the data locally. I do not have server space/rights to use our corporate SQL Servers for my work, so everything must be done locally.
The analysis consists of merging several data sets together before the analysis begins. Each data set will typically contain 200k-900k records. Most analysis is ad hoc, and requirements change frequently.
Lately, my data sets have begun to exceed 1M rows, and the Excel version I am supplied with cannot handle volumes above 1.3M records. The processing time to merge several data sets this large is becoming excruciating; simple functions like INDEX/MATCH take 15 minutes to complete.
I need to find a better way of performing analysis and cannot decide between MS Access and SQL Server Express.
My concern with Access is that it will not have the capacity for what I need and I am worried about database corruption.
My concern with SQL Server is that I am unsure about using it in this manner. I need to determine standard deviations, averages, counts, etc., based on aggregated data. I use SQL as an analyst (data retrieval) and have very little experience with creating or managing a SQL Server database. I am also concerned about the creation time for ad hoc reports, though I am unsure whether this is a valid concern.
Which one should I use in place of Excel for my needs?
If I were in your position I would use SQL Server Express Edition to store the data and perform the more complex data manipulation, and I would use Access with ODBC linked tables into the SQL Server database for "exploring", e.g.,
creating ad-hoc queries to get a better understanding of the data, with the option of exporting those queries to Excel for further crunching, and
building reports quickly without getting into the whole SQL Server Reporting Services (SSRS) thing.
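On the statistics concern: standard deviation, average and count are all built-in T-SQL aggregates, so the ad hoc queries stay short. A sketch against a made-up call-detail table:

    -- Hypothetical table and columns; STDEV/AVG/COUNT are built in
    SELECT AgentID,
           COUNT(*)                            AS Calls,
           AVG(CAST(DurationSeconds AS FLOAT)) AS AvgDuration,
           STDEV(DurationSeconds)              AS StdevDuration
    FROM   dbo.CallDetail
    GROUP BY AgentID;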
The maximum database size for Access is 2 GB; for SQL Server Express it is 10 GB (as of the 2008 R2 release). You'd want to use SQL Server for many other reasons as well. Access is an atavism - an evolutionary throwback.

Which are the RDBMS that minimize the server roundtrips? Which RDBMS are better (in this area) than MS SQL?

IMPORTANT NOTE: I received many answers, and I thank you all. But all the answers are more comments than answers. My question relates to the number of roundtrips per RDBMS. An experienced person told me that MySQL has fewer roundtrips than Firebird. I would like the answers to stay in that area. I agree that this is not the first thing to consider, and there are many others (application design, network settings, protocol settings...), but I'd still like to receive an answer to my question, not a comment. By the way, I found the comments all very useful. Thanks.
When the latency is high ("when pinging the server takes time") the server roundtrips make the difference.
Now I don't want to focus on the roundtrips created in programming, but the roundtrips that occur "under the hood" in the DB engine+Protocol+DataAccessLayer.
I have been told that Firebird has more roundtrips than MySQL. But this is the only information I have.
I am currently supporting MS SQL but I'd like to change RDBMS, so to make a wise choice I would like to include this point in "my RDBMS comparison feature matrix" to understand which is the best RDBMS to choose as an alternative to MS SQL.
So the bold sentence above would make me prefer MySQL to Firebird (for the roundtrip aspect, not in general), but can anyone add more information?
And where does MS SQL sit? Is anyone able to "rank" the roundtrip performance of the main RDBMSs, or at least:
MS SQL, MySQL, PostgreSQL, Firebird (I am not interested in Oracle since it is not free, and if I have to change I would change to a free RDBMS).
Anyway, MySQL (as mentioned several times on Stack Overflow) has an unclear future and a license that is not 100% free. So my final choice will probably fall on PostgreSQL or Firebird.
Additional info:
somehow you can answer my question by making a simple list like:
MSSQL:3;
MySQL:1;
Firebird:2;
PostgreSQL:2
(where 1 is good, 2 average, 3 bad). Of course, if you can post some links where the roundtrips per RDBMS are compared, that would be great.
Update:
I use Delphi and I plan to use Devart DAC (UniDAC), so essentially the "same" data access component is used in each case; if there are significant roundtrip differences, they are due to the different RDBMS used.
Further update:
I have a 2-tier application (inserting a middle tier is not an option), so by choosing an RDBMS that is optimized "roundtrip-wise" I have a chance to further improve the application's performance. This kind of "optimization" is like "buy a faster internet connection", "put more memory on the server" or "upgrade the server CPUs". Those "optimizations" are important too.
Why are you concentrating on roundtrips? Normally they shouldn't affect your performance unless you have a very slow and unreliable network. For example, the difference between ODBC and OLE DB drivers for any database is nearly an order of magnitude in favor of OLE DB.
If you access either MySQL or Firebird using ODBC instead of OLE DB/ADO.NET drivers, you incur an overhead several orders of magnitude greater than the roundtrips you might save.
How your application is coded, and how and when data is accessed and transferred, have a much greater impact in slow-connection or high-latency situations than the DB network protocol itself. Some database protocols can be tuned to work better in uncommon scenarios, e.g. by increasing or decreasing the data packet size.
You may also encounter slowdowns at the TCP/IP layer itself, which could require TCP/IP tuning as well.
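For SQL Server specifically, the packet-size knob mentioned above is exposed as a server option (and can be overridden per connection via a Packet Size entry in the connection string). A sketch of the server-side setting; the value here is only an example:

    -- 'network packet size (B)' is an advanced option; 4096 bytes is
    -- the default, and larger packets can help bulk transfers
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'network packet size (B)', 8192;
    RECONFIGURE;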
Until v2.1, Firebird certainly created more traffic than MS SQL Server. I have a friend who developed an MSSQL client/server application here in Brazil where the DB is hosted in a datacenter. The client apps have run from many stores, connecting directly to the server over VPN/internet on end-user broadband connections (1 Mbps, mostly), for 5+ years with no trouble. The distances involved range from a few hundred to thousands of kilometers from the datacenter.
After v2.1, I can't say whether this remains true, because I haven't made a fair comparison since, and Firebird's remote protocol has been changed to optimize network traffic on slow connections. More on the FirebirdSQL site.
Can't say for Postgres or MySQL, since I haven't used either.
I can't give roundtrip details, but I was in a very similar situation a while back when I was trying to find alternatives to MS SQL due to budgeting. Four others and I spent some time comparing MySQL, Postgres, and Firebird.
Having worked with MySQL for a long time, we quickly ruled it out for most of our larger projects. The decision came down to Postgres versus Firebird. One thing apparent right from the start was the lack of popular support/documentation for Firebird in contrast to Postgres. Our bench tests always had Postgres either on top of or level with Firebird, never under. In terms of features, Postgres again answered our needs, while Firebird had us coming up with creative workarounds.
Below is a feature comparison chart. I'll admit it is now a bit dated, but it is still very helpful:
Here is also a long forum thread discussing the differences.
Good luck!
Sometimes the "roundtrips" are also in the protocol or data access layer, not the "DB engine".
I will not rank the client-server DBMSs by roundtrips. There are plenty of options that can make one DBMS the best (ask SQL Server to use the default cursor) and another the worst (create an Oracle cursor with nested datasets).
What you are probably looking for is a general approach oriented toward traffic minimization and letting the client work independently of the server. That is what middle-tier data access libraries provide.
So, if your application is that sensitive to traffic optimization, look at libraries such as DataAbstract, kbmMW or ThinDAC.

ETL mechanisms for MySQL to SQL Server over WAN

I'm looking for some feedback on mechanisms to batch data from MySQL Community Server 5.1.32 on an external host down to an internal SQL Server 2005 Enterprise machine over VPN. The external box accumulates data throughout business hours (about 100 MB per day), which then needs to be transferred internationally across a WAN connection (quality not yet determined, but it's not going to be super fast) to an internal corporate environment before some BI work is performed. This should just be change sets making their way down each night.
I'm interested in thoughts on the ETL mechanisms people have successfully used in similar scenarios before. SSIS seems like a potential candidate; can anyone comment on its suitability for this scenario? Alternatively, other thoughts on how to do this in a cost-conscious way would be most appreciated. Thanks!
It depends on the use you have of the data received from the external machine.
If you must have the data for the next morning's calculations, or you do not have confidence in your network, you would prefer to loosely couple the two systems and put some message queuing between them, so that if something fails during the night (the DBs, the network links, anything that would be a pain for you to recover) you can still start every morning with some data.
If the data retrieval is not subject to a high degree of criticality, any solution is good :)
Regarding SSIS: it's just a great ETL framework (yes, there's a subtlety :)). But I don't see it as part of the data transfer; rather, it belongs in the ETL stage, once your data has been received or while it is still waiting in the message-queuing system.
First, if you are going to do this, have a good way to easily see what has changed since the last run. Every table should have a last-updated date or timestamp column that changes whenever the record is updated (MySQL's TIMESTAMP columns can do this automatically). This is far better than comparing every single field.
If you had SQL Server in both locations I would recommend replication; is it possible to use SQL Server instead of MySQL? If not, then SSIS is your best bet.
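On the change-tracking point: MySQL does have this. A TIMESTAMP column can update itself on every write, giving the nightly job a cheap change filter. A sketch with made-up table and column names:

    -- MySQL dialect
    ALTER TABLE daily_activity
        ADD COLUMN last_updated TIMESTAMP NOT NULL
            DEFAULT CURRENT_TIMESTAMP
            ON UPDATE CURRENT_TIMESTAMP;

    -- Nightly change set: everything touched since the last run
    SELECT *
    FROM   daily_activity
    WHERE  last_updated > '2010-06-01 02:00:00';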
In terms of actually getting your data from MySQL into SQL Server, SSIS can import the data using a number of methods. One would be to connect directly to your MySQL source (via an OLE DB connection or similar); alternatively, you could do a daily export from MySQL to a flat file and pick it up using an FTP task. Once you have the data, SSIS can perform the required transforms before loading the processed data into SQL Server.
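A sketch of the flat-file leg, with hypothetical paths and names: MySQL writes the change set out, the FTP task moves the file, and BULK INSERT (or an SSIS Flat File Source) loads it:

    -- MySQL side: dump the day's change set to a delimited file
    SELECT *
    FROM   daily_activity
    WHERE  last_updated > '2010-06-01 02:00:00'
    INTO OUTFILE '/tmp/changes.csv'
        FIELDS TERMINATED BY ',' ENCLOSED BY '"'
        LINES TERMINATED BY '\n';

    -- SQL Server side, after the FTP task delivers the file
    BULK INSERT dbo.StagingActivity
    FROM 'D:\etl\changes.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');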

What are viable options for data synchronization and transformation in sql server 2008?

Our company needs to synchronize two SQL Server 2008 databases on two different servers.
The database schemas are about 50% different so transformation is needed.
Synchronization needs to be done in real time.
Synchronization needs to be bi-directional.
What are some good practices used for this purpose?
We have analyzed the following solutions, and they didn't work for us:
Microsoft Sync Framework. This option doesn't work because of the amount of time required to set up the framework (specifically the metadata tables/triggers/sprocs that the framework uses). It is also a newer framework, so documentation and examples are scarce, and the product might not be as stable as some other solutions.
SQL Server Integration Services. This solution has a learning curve and possible roadblocks. It also seems like overkill for this purpose alone.
Any help is greatly appreciated.
SQL Server Replication.
Create views in both databases to emulate the 50% of the other side that is different. These, along with the tables that still match, will serve as your publication source. For the tables that match, just set up two-way replication.
For the tables that do not match, use the emulation views as the publication sources. Add a "SourceID" column to their base tables to identify which server each row was originally created on, and then set up replication filters to ensure that no server ever receives a row that it originally created. Publish these views to the other server(s) one-way only. You may need to make these indexed views in order for it to work (sorry, I can't remember). A sketch of the SourceID idea follows.
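A minimal sketch of the SourceID idea (all names are hypothetical, and the actual publication and filter setup through the replication wizards or sp_addarticle is omitted):

    -- Tag every locally created row with this server's ID
    ALTER TABLE dbo.Orders
        ADD SourceID TINYINT NOT NULL
        CONSTRAINT DF_Orders_SourceID DEFAULT (1);   -- 1 = ServerA

    -- View reshaping ServerA's schema into what ServerB expects;
    -- SCHEMABINDING is required if it must become an indexed view
    CREATE VIEW dbo.Orders_ForServerB
    WITH SCHEMABINDING
    AS
    SELECT o.OrderID, o.OrderDate, o.TotalDue, o.SourceID
    FROM   dbo.Orders AS o;

    -- Conceptually, the publication's row filter would then be:
    --   WHERE SourceID <> 2   -- never send ServerB its own rows back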
For the tables that do not match use the emulations views as the publication sources. Add a "SourceID" to their base tables to identify what server they were originally created on, and then setup replication filters to insure that no server ever receives a row that it originally created. Publish these views to the other server(s) one-way only. You may need to make these as indexed views in order for it to work (sorry, I can't remember).