Does a SSIS load FROM SQL Server affect database cache? - sql-server-2008

I haven't got a lot of ETL experience but I haven't found the answer to my question either, although I guess it may be a no-brainer if you've worked with it. We're currently looking into creating a simple data warehouse (simple as in "copy most columns from most tables" and not OLAP-style) and it seems we're leaning towards SQL Server (2008) for a few reasons.
SSIS seems to be the tool for this kind of task when it comes to SQL Server, but I can't find anything about how it affects the source database cache, if at all, when loading data. Some of our installations are very sensitive performance-wise and depend on a cache that reflects actual usage.
If SSIS runs a "select *"-ish query and the cache is altered, then performance for the users may degrade to unacceptable levels until the cache is rebuilt from their queries again.
So my question is: does SSIS affect the database cache when loading data from a SQL Server database, and if so, is there a way to avoid it?
Part of the problem is also that the source database could be either an Oracle or a SQL Server database, so if there is a way to avoid the cache-affecting part for Oracle, that would be good input as well. (I guess the Attunity connector is the way to go?)
(Some additional info: We have considered plain files as well, but then export-import probably takes longer than an SSIS transfer? I guess change data capture is something we'll also look into, so if that is relevant to this question, feel free to include possible issues/benefits.)
Any other relevant suggestions are also welcome!
Thanks!

Tackling the SQL Server side:
First off, SSIS doesn't do anything special to avoid the buffer pool, or the plan cache.
Simple test (on a NON-production instance!):
1. Create a new SSIS package with a single connection manager and a single data flow containing one OLE DB Source, pointing to a table.
2. Clear the buffer pool, from SSMS: DBCC DROPCLEANBUFFERS
3. Verify that the cache has been cleared using the glorified dm_os_buffer_descriptors query at the top of this page.
4. Run the package.
5. Re-run the query from step 3, and note that the data pages for the table (BOM_PIECE in my example) have been loaded into the cache.
Note that most SSIS components allow you to provide your own query, so if you have a way to avoid the buffer pool (I don't know that this is possible - I'd defer to someone who knows more about it), you could insert that into a query. So in the above example, instead of selecting Table or view in the OLE DB Source, you would select SQL command, or SQL command from variable if your command requires dynamic text.
Finally, I can imagine why you want to eliminate the cache load - but are you sure you want to do this? SQL Server is fairly good at managing memory, and what you're doing is swapping memory load for disk I/O load, which (depending on your use case) may have a negative impact on other users. This question has a discussion on SQL Server caching.

Read this article about Attunity regarding reading data from Oracle.
What do you mean by "affect the database cache when loading data from a SQL Server database"? SQL Server caches execution plans in the plan cache and data pages in the buffer pool. Using SSIS won't affect your server in any special way (other than the overhead of reading the data, of course). Just use a proper transaction isolation level.
Also, read about the fast load property on SSIS components
About change data capture, I don't see how it can replace SSIS. You can use CDC to select the rows that will be loaded, but it won't do the loading for you.

Related

Replication from MySQL to SQL Server

I have a system in which data is written constantly. It runs on MySQL. I also have a second system that runs on SQL Server and uses some parameters from the first database.
Question: how is it possible (is it even possible) to constantly transfer values from one database (MySQL) to another (SQL Server)? Switching to a single database is not an option. As I understand it, I will need to write a program, for example in Delphi, that transfers the values from one database to the other.
You have a number of options.
SQL Server can access another database using ODBC, so you could set up SQL Server to obtain the information it needs directly from tables that are held in MySQL.
MySQL supports replication using log files, so you could configure MySQL replication (which does not have to be on all tables) to write relevant transactions to a log file. You would then need to process that log file (which you could do in (almost) real time as the standard MySQL replication does) to identify what needs to be written to the MS SQL Server. Typically this would produce a set of statements to be run against the MS SQL server. You have any number of languages you could use to process the log file and issue the updates.
You could have a scheduled task that reads the required parameters from MySQL and posts them to MS SQL, but this would leave a period of time where the two may not be in sync. Given that you may have issues with parsing log files and posting the updates, you may still want to implement this as a fallback if you are processing log files.
If the SQL Server and the MySQL server are on the same network, the external tables method is likely to be the simplest and lowest maintenance, but depending on the amount of data involved you may find that the overhead of the external connection and queries affects the overall performance of the queries made against the MS SQL Server.
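The scheduled-task option could look roughly like the sketch below. It is only an illustration under assumptions that are not in the question: the table name `params`, its columns, and the `updated_at` change marker are hypothetical, and two in-memory SQLite connections stand in for the MySQL source and the SQL Server target so that the example runs on its own. A real job would use the appropriate drivers and the target's own upsert statement (for SQL Server, something like MERGE).

```python
import sqlite3


def sync_changed_rows(source, target, last_synced):
    """Copy rows changed after `last_synced` from source to target.

    Returns the new high-water mark to pass to the next scheduled run.
    """
    rows = source.execute(
        "SELECT id, name, value, updated_at FROM params "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_synced,),
    ).fetchall()
    for row_id, name, value, updated_at in rows:
        # Upsert, so re-running after a partial failure is harmless.
        # INSERT OR REPLACE is SQLite syntax used here as a stand-in.
        target.execute(
            "INSERT OR REPLACE INTO params (id, name, value, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (row_id, name, value, updated_at),
        )
        last_synced = max(last_synced, updated_at)
    target.commit()
    return last_synced


def make_db():
    """Create an in-memory stand-in database with the hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE params ("
        "id INTEGER PRIMARY KEY, name TEXT, value REAL, updated_at INTEGER)"
    )
    return conn
```

Each run picks up only rows whose marker is newer than the previous high-water mark, which keeps the load on the source small. One caveat of this design: with a strict "greater than" comparison, rows written later with the same marker value as the last synced row would be missed, so a real implementation would need a monotonically increasing marker or a small overlap window.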

MySQL DB replication hook to clean local cache

I have an app whose MySQL DB is a slave of a remote master DB, and I use memcache to cache some of the DB data.
My slave DB gets updated whenever there are updates in the master DB. So in my application I want to know when my local (slave) DB is updated, so that I can invalidate the related cached data and display the fresh data I got from the master.
Is there any way to run some program when the slave MySQL DB is updated? I would then filter the query and work out whether I need to clean the cache or not.
Thanks
First of all you are looking for solution similar to what Facebook did in their db architecture (As I remember they patched MySQL for this).
You can build your own solution based on one of these techniques:
Parse the replication log on the slave side, and remove the cache entry when you see an update of that data in the log.
Load a UDF (user-defined function) for memcached, and attach a trigger on the replica side (it will call the UDF's remove function) to the tables of interest inside MySQL.
Please note that this configuration is complicated to support and maintain. If you can tolerate stale data in the cache, a small TTL may help you.
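The small-TTL fallback can be illustrated with a minimal in-process sketch. This is not memcached itself (memcached takes an expiry time when a value is stored); the class name `TTLCache` and the `load` callback are hypothetical names chosen for the example, and the injectable clock exists only to make the behaviour easy to check.

```python
import time


class TTLCache:
    """Read-through cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, load):
        """Return the cached value, calling load(key) if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        # Missing or stale: reload from the database and restart the TTL.
        value = load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

With this approach nothing has to watch the replication stream: a row updated on the master is served stale for at most `ttl_seconds` after it was cached, which is the trade-off the answer describes.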
As Kirugan says, it's as simple as writing your own SQL parser, and ensuring that you also provide an indexed lookup keyed to the underlying data for anything you insert into the cache, then cross-referencing the datasets for any DML you apply to the database. Of course, this will be a lot simpler if you create a simplified, abstract syntax to represent the DML, but you thereby lose the flexibility of SQL and, of course, have to re-implement any legacy code using your new syntax. Apart from fixing the existing code, it should only take a year or two to get this working right. Basing your syntax on MySQL's handler API rather than SQL will probably save a lot of pain later in the project.
Of course, if you need full cache consistency then you need to ensure that a logical transaction now spans all the relevant datacentres which will have something of an adverse impact on your performance (certainly much slower than just referencing the master directly).
For a company like Facebook, with hundreds of thousands of servers and terabytes of data (and no requirement for cache consistency), such an approach to solving the problem leads to massive savings. If you only have 2 servers, a better solution would be to switch to multi-master replication, possibly add another database node, optimize the storage (e.g. switching to SSDs / adding fast bcache), make sure you have session affinity to the DBMS from the application (but not sticky sessions), and spend some time tuning your DBMS, particularly its cache performance.

Classic ASP + MSAccess extremely slow on IIS7.5

I migrated my classic ASP site from IIS6 to a much more powerful server with Windows Server 2008 R2 and IIS7.5, but it actually runs even slower.
Every simple call to the MSAccess database is taking forever. Many times the request is dropped because of Session timeout (120 seconds).
Any idea what can cause the problem and how to solve it?
Thank You.
Before blaming Access and moving to SQL Server Express or another database, you need to make sure you know where the slowdown occurs.
From what you are mentioning, it looks like at least some of the queries don't even work (IIS times out after 120s).
Access databases, especially if they are accessed locally by a handful of concurrent users, are fast.
Moving to another database may or may not solve the problem, but it will probably be a lot more work than solving your issue with your current Access database.
That said, if your website needs to serve lots of concurrent users (say more than 50 at a time), you may need to look into moving to a full database server like MySQL, SQL Server Express or PostgreSQL, for instance.
A few things to make sure you double check:
Corrupted database. Make sure you run Compact and Repair as a regular maintenance measure (make a backup first).
Incorrect filesystem rights.
Make sure that your IIS process has read/write rights to the folder where the database is located so that it is able to create the lock file (.ldb or .laccdb depending on whether you are using .mdb or the new .accdb database format).
A related issue is that the IIS process must be able to create temporary files in the temporary folder, for instance %SystemDrive%\Windows\ServiceProfiles\NetworkService\AppData\Local\Temp.
Bad queries. Open the database with Access and run the queries to check how long they really take and if they return any errors.
If there are data integrity issues, the query may return unexpected results that have strange side effects on the code in your ASP page.
Check your IIS logs for errors. Also check the OS Event Log.
Make sure there are no other errors that could incorrectly cause the behaviour.
Make sure you profile your ASP code to find out exactly which queries and parts of your code are slow and which are fine.
Once you have solved your issues, improve performance by keeping the database open to avoid the lock file being created/deleted all the time (this can have a huge impact on performance).
A good reference with more detailed information on some of the topics above: Using Classic ASP with Microsoft Access Databases on IIS

Mysql with Node.js: Does it make sense to have node.js save/load stuff to/from the database all the time?

So I have a small game in node.js (only the server, of course) which has map data and player accounts stored in a MySQL database. Right now it is built to minimize the number of queries: it loads data from the database, keeps it in JavaScript objects/arrays or whatever seems appropriate, and only writes to the database when needed.
Now I was thinking: is this really worth it? In many cases it would be a lot better (in the sense that data would be safer and WAY more up-to-date) to store hardly any data in the server and just load it from the database when needed (and write it back whenever it changes).
My question is: is it efficient/safe/recommendable to have the server read from and write to the database often, rather than keeping data from the database in JavaScript variables in the server?
Additional info:
-The nodejs server and my mysql server are on the same machine and a query usually takes less than 1ms or maybe 3ms for big queries like loading room data.
-I am using a module simply called mysql.
-If needed I will include extra info, just ask in a comment.
Really depends on your Use-Case. Generally speaking, I would not add another layer of caching in node.js but handle that in your db with a bigger cache and optimized queries.

MySQL and Hibernate Simultaneous read write

I have a web application which has the following parts:
1. Commentators continuously do match commentary through a browser-based tool. The comments are inserted into the DB using Hibernate.
2. Lots of users access a URL to read the commentary. Hibernate reads data from the table being updated by the commentators in step #1.
3. There are some stored procedures as well, which are set to run every hour. A few of them access the same table (used in steps #1 and #2) for reading and writing/updating.
Now my problem is that whenever the site has 100+ concurrent users watching a particular match commentary, my MySQL goes down. It shows lots of queries stuck in the processlist. Many of them are in the "Copying to temp table" state. This forces JBoss to restart frequently.
I am using transactions in Hibernate for both reading and writing. Please help, because I lose big matches because of these crashes.
You have a performance problem. It is difficult to give solutions which always work. What you can consider to do is:
1) Revise the HQL (Hibernate) statements. The best way to do this is to log the generated SQL with <property name="show_sql">true</property> in the config file (or with a tool like log4jdbc if you want to see the actual parameters) and analyse the output. There you can see which SQL requests occur most often. In many cases a better strategy for reading and writing DB data can significantly reduce the database traffic. Also check that you have good indexes on your table.
2) Consider using a second-level cache. (Normally Hibernate only uses the first-level cache, which is of no use in your case because it is bound to one session.) Then at least the requests for reading the current commentary can be served by the cache and don't need to go to the database. (Pay attention: the cache might interfere with the stored procedures. Check whether the cache product you would like to use supports MySQL stored procedures. In the worst case you have to remove the stored procedures for the critical tables and let your application server do the job so that it goes through the cache.)
3) If only a few tables are heavily used, you can consider caching them in your application. That's more work, but you can tailor it exactly to the demands of your application, so it might be faster than a general second-level cache.
4) If nothing helps and the traffic is really too heavy then perhaps you have to invest in more hardware.
Good luck ;-)
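As a rough illustration of option 3, here is a language-neutral sketch written in Python rather than Java. Everything in it is an assumption made for the example: the class name `CommentaryCache` and the two callbacks `fetch_comments` and `insert_comment` are hypothetical stand-ins for whatever Hibernate calls the application already makes.

```python
import threading


class CommentaryCache:
    """Per-match read-through cache that is invalidated on every write."""

    def __init__(self, fetch_comments, insert_comment):
        self._fetch = fetch_comments    # callable: match_id -> iterable of comments
        self._insert = insert_comment   # callable: (match_id, text) -> None
        self._lock = threading.Lock()
        self._by_match = {}             # match_id -> tuple of comments

    def read(self, match_id):
        # Holding the lock during the fetch keeps the sketch simple and
        # prevents a stale result from being cached after a concurrent write.
        with self._lock:
            cached = self._by_match.get(match_id)
            if cached is None:
                cached = tuple(self._fetch(match_id))
                self._by_match[match_id] = cached
            return cached

    def write(self, match_id, text):
        with self._lock:
            self._insert(match_id, text)
            # Drop the cached list so the next read reloads it once.
            self._by_match.pop(match_id, None)
```

With many readers and few writers, most requests are served from memory and the database sees roughly one read per new comment per match. As with the second-level cache, anything that changes the table behind the application's back, such as the hourly stored procedures, bypasses this cache, so those entries would need explicit invalidation or a short expiry.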