I have a system in which data is written constantly. It works on MySQL, I also have a second system that runs on SQL Server and uses some parameters from the first base.
Question: how is it possible (is this even possible) to constantly transfer values from one base (MySQL) to another (SQL Server)? The option to switch to one base is not an option. As I understand it, it will be necessary to write a program for example in Delphi which will transfer values from the other database to another.
You have a number of options.
SQL Server can access another database using ODBC, so you could setup SQL server to obtain the information it needs directly from tables that are held in MySQL.
MySQL supports replication using log files, so you could configure MySQL replication (which does not have to be on all tables) to write relevant transactions to a log file. You would then need to process that log file (which you could do in (almost) real time as the standard MySQL replication does) to identify what needs to be written to the MS SQL Server. Typically this would produce a set of statements to be run against the MS SQL server. You have any number of languages you could use to process the log file and issue the updates.
You could have a scheduled task that reads the required parameters from MySQL and posts it to MS SQL, but this would leave a period of time where the two may not be in sync. Given that you may have an issue with parsing log files and posting the updates you may still want to implement this as a fall back if you are processing log files.
If the SQL Server and the MySQL server are on the same network the external tables method is likely to be simplest and lowest maintenance, but depending on the amount of data involved you may find the overhead of the external connection and queries could affect the overall performace of the queries made against the MS SQL Server.
Related
I have a Transaction database with 10,000+ entries inserted on daily basis.
My client's requirement is that we allow him to download reports from his own server, for this we make a same copy of Transaction database to his server.
But now problem is how do we move data at a specific time to his server which takes latest data entry?
There are at least a couple of options in SQL Server.
If you can connect to your customer's database, Change data capture with SSIS is one option. CDC collects all changes in a queryable store which SSIS then reads and pushes to your target. You can be as selective as you want on what to move over since you write the ETL process in SSIS. One downside to CDC is it's in enterprise edition only. See detailed instructions at https://technet.microsoft.com/en-us/library/bb895315(v=sql.105).aspx
Transactional replication is another option which is available in both enterprise and standard editions. This has been around along time and used by a lot of organizations to do exactly what you described - incrementally move data to another database. Not as flexible as CDC but you can still apply filters to what rows/columns get moved. Not needed enterprise edition is helpful for many customers. Lots of detail about the technology here https://msdn.microsoft.com/en-us/library/ms151198(v=sql.105).aspx but highly encourage you to check out Kendra Little's most excellent article that covers trans repl and compares it with CDC http://www.brentozar.com/archive/2013/09/transactional-replication-change-tracking-data-capture/
If you can't connect directly to the customer database, CDC with SSIS still works but the output target will be some flat file which then gets transferred to the customer and loaded using another SSIS package or some other bulk load job (TSQL, BCP, etc...). Do be careful with how the flat file gets moved since anybody can see its contents.
I'd avoid any manual methods like creating triggers or running some (usually expensive) query to find the changed rows. Apart from the maintenance efforts, you're very likely to encounter tough performance issues.
What is better for large, local data analysis; MS Access or SQL Server Express?
To paint the picture of my constraints/needs:
I do Cisco telephony analysis for a large corporation. I historically have imported data sets from TSQL into excel to manipulate the data locally. I do not have server space/rights to use our corporate SQL Servers for my work so everything must be done locally.
The analysis consists of merging several data sets together before beginning analysis. Each data set will typically contain 200k-900k records. Most analysis is adhoc and requirements change frequently.
Lately, my data sets have begun to exceed 1m rows and the Excel version I am supplied with is unable to support volume above 1.3m records. The processing time to merge several data sets this large is becoming excruciating. Simple functions like Index/Match take 15 minutes to complete.
I need to find a better way of performing analysis and cannot decide between MS Access and SQL Server Express.
My concern with Access is that it will not have the capacity for what I need and I am worried about database corruption.
My concern with SQL Server is that I am unsure of using it in this manner. I need to determine standard deviations, averages, counts, etc, based on aggregated data. I use SQL as an analyst (data retrieval) and have very little experience with creating/managing a SQL SQL Server database. I am also concerned with the creation time for adhoc reports. I am unsure if this is a valid concern.
Which one should I use in place of excel for my needs?
If I were in your position I would use SQL Server Express Edition to store the data and perform the more complex data manipulation, and I would use Access with ODBC linked tables into the SQL Server database for "exploring", e.g.,
creating ad-hoc queries to get a better understanding of the data, with the option of exporting those queries to Excel for further crunching, and
building reports quickly without getting into the whole SQL Server Reporting Services (SSRS) thing.
I believe the maximum database size for Access is 1 GB. SQL Server Express is 10 GB. You'd want to use SQL Server for many other reasons as well. Access is an atavism - an evolutionary throwback.
I haven't got a lot of ETL experience but I haven't found the answer to my question either, although I guess it may be a no-brainer if you've worked with it. We're currently looking into creating a simple data warehouse (simple as in "copy most columns from most tables" and not OLAP-style) and it seems we're leaning towards SQL Server (2008) for a few reasons.
SSIS seems to be the tool for this kind of tasks when it comes to SQL Server, but I can't find anything about how it is affecting the source database cache, if at all, when loading data. Some of our installations are very sensitive performance-wise when it comes to having a usage-style-cache.
But if SSIS runs a "select *"-ish query and the cache is altered, then the performance for the users may degrade to unacceptable levels until it is rebuild from those queries again.
So my question is, does SSIS (or is there a way to avoid) affect the database cache when loading data from a SQL Server database?
Part of the problem is also that the source database could be both an Oracle or SQL Server database, so if there is a way to avoid the cache-affecting part for Oracle, that would be good input as well. (I guess the Attunity connector is the way to go?)
(Some additional info: We have considered plain files as well, but then export-import probably takes longer time than SSIS-transfer? I also guess change data capture is something we'll also look into, so if that is relevant to this question, feel free to include possible issues/benefits.)
Any other relevant suggestions are also welcome!
Thanks!
Tackling the SQL Server side:
First off, SSIS doesn't do anything special to avoid the buffer pool, or the plan cache.
Simple test (on a NON-production instance!):
Create a new SSIS package with a single connection manaager, and a single data flow containing one OLE DB Source, pointing to a table, similar to:
Clear the buffer pool, from SSMS: DBCC DROPCLEANBUFFERS
Verify that the cache has been cleared using the glorified dm_os_buffer_descriptors query at the top of this page: I get this:
Run the package
Re-run the query from step (2), and note that the data pages for the table (BOM_PIECE in my example) have been loaded into the cache:
Note that most SSIS components allow you to provide your own query, so if you have a way to avoid the buffer pool (I don't know that this is possible - I'd defer to someone who knows more about it), you could insert that into a query. So in the above example, instead of selecting Table or view in the OLE DB Source, you would select SQL command, or SQL command from variable if your command requires dynamic text.
Finally, I can imagine why you want to eliminate the cache load - but are you sure you want to do this? SQL Server is fairly good at managing memory, and what you're doing is swapping memory load for disk I/O load, which (depending on your use case) may have a negative impact on other users. This question has a discussion on SQL Server caching.
Read this article about Attunity regarding reading data from oracle
What do you mean "affect the database cache when loading data from a SQL Server database". SQL Server does not cache data, it caches execution plans. The fact that you are using SSIS wont affect your Server (other than the overhead of reading the data of course). Just use a propper transaction isolation level.
Also, read about the fast load property on SSIS components
About change data capture, I don't see how it can replace SSIS. You can use CDC to select the rows that will be loaded, but it wont do the loading for you.
I am working with a client who is syncing between SQL Server and MySQL containing the exact same schema and data. We want to centralize that data into one database. Other then performance and maintainability issues, what else is bad about the original design?
You can create a linked server instance in SQL Server, with the MySQL instance.
Despite being completely proprietary, one of the nice connectivity features offered in SQL Server is the ability to query other servers through a Linked Server. Essentially, a linked server is a method of directly querying another RDBMS; this often happens through the use of an ODBC driver installed on the server.
Refer This article : step-by-step process SQL Server Linked Server to MySQL.
Providing you grant the MySQL user you connect on behalf of proper permissions, you can write to the MySQL instance accouding to you. So you can update stored procedures to do an additional step to insert records into MySQL.
Much easier solution is to use commercial application - Omega Sync from Spectral Core
Omega Sync can compare and synchronize both database schema and table data. You can even synchronize data of heterogeneous databases (for example, compare your local SQL Server database with a MySQL replica on your web site - and synchronize all the differences in just a few minutes).
on the otherhand I think you've already mentioned what possible problems you may encounter when synchronizing 2 db at the same time aside from this two I think it would be the resources. since there are different RDBMS working for the application they would also have a separate resources for each, like when I update a particular record of a user it still needs to check on which resource does it really exist, but I love to hear more from other people out there this is really an interesting topic to discuss. ;)
I'm setting up a SQL Server 2008 server on a production server, which way is the best to backup this data? Should I use replication and then backup that server? Should I just use a simple command-line script and export the data? Which replication method should i use?
The server is going to be pretty loaded so I need an efficent method.
I have access to multiple computers that I can use.
A very simple yet good solution is to run a full backup using sqlcmd (formerly osql) locally, then copy the BAK file over the network to a NAS or other store. It's sub-optimal in terms of network/disk usage, but it's very safe because every backup is independent and given that the process is very simple it is also very robust.
Moreover, this even works in Express editions.
The "best" backup solutions depends upon your recovery criteria.
If you need immediate access to the data in the event of a failure, a three server database mirroring scenario (live, mirror and witness) would seem to fit - although your application may need to be adapted to make use of automatic failover. "Log shipping" may produce similar results (although without automatic failover, or need for a witness).
If, however, there's some wiggle room in the recovery time, regular scheduled backups of the database (e.g., via SQL Agent) and it's transaction logs will allow you to do point-in-time restores. The frequency of backups would be determined by database size, how frequently the data is updated, and how far you are willing to rollback the database in the event of complete failure (unless you can extract a transaction log backup out of a failed server, you can only recover to the latest backup)
If you're looking to simply rollback to known-good states after, say, user error, you can make use of database snapshots as a lightweight "backup" scenario - but these are useless in the event of server failure. They're near instantaneous to create, and only take up room when the data changed - but incur a slight performance overhead.
Of course, these aren't the only backup solutions, nor are they mutually exclusive - just the ones that came to mind.