Is it possible to add an external cache provider like Ehcache or Memcached to a MySQL database? I am hoping this would improve MySQL's performance. Is this possible to do?
Not "add to MySQL" per se as far as I know. You can easily use Memcache(d) for instance in your system if you want: not to be rude about it, but basically just 'use' it.
Now:
You have code that requests data from your database.
With Memcache:
You have code that checks if the data is in the cache. If it isn't, request it from the database and add it to the cache, then return.
In this case you didn't add it to MySQL itself, but it does help you get better results (if you do it right, of course). For SQL-level caching there is the database's own query cache, but that only works for queries that are exactly identical, byte for byte.
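To make the read path concrete, here is a minimal cache-aside sketch in Python, assuming a local Memcached instance and the pymemcache client; the key scheme, the 60-second TTL and the get_user_from_db helper are illustrative, not part of the original answer.

```python
import json
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))   # assumes Memcached on the default local port
CACHE_TTL = 60                         # seconds; tune to how stale the data may be

def get_user(db_conn, user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit: no database round trip

    row = get_user_from_db(db_conn, user_id)      # hypothetical helper that runs the SELECT
    cache.set(key, json.dumps(row).encode("utf-8"), expire=CACHE_TTL)
    return row
```

The only real decision here is the TTL: it bounds how stale a cached row can get if you never invalidate it explicitly.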
At work I need to revamp a web application that must always accept a large number of concurrent connections. Until now I have been using JSON to get the data, but now I want to call the DB directly and get the data from there. As far as I know, using a cache is the best option for my site, but initially there will often be concurrent access to the DB. Any advice on how to handle this situation? I want the site to always serve up-to-date data.
Thanks.
Following are my suggestions:
If you want to use a cache, you have to automate your cache-clearing process so it runs whenever the particular data you cache is updated (see the sketch after this list). In practice this is only feasible if your data is updated infrequently.
If your budget allows, put your DB in a cluster (write to the master, read from the master and the slaves).
In the worst case, ensure your DB is properly indexed.
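For the first suggestion, a hedged sketch of what an automated cache clear can look like in Python with Memcached: every code path that writes a row also deletes the cache key derived from it, so the next read repopulates the cache. The table, the key scheme and the pymemcache usage are assumptions for illustration; db_conn is assumed to be a PyMySQL-style connection.

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def update_score(db_conn, match_id, score):
    # 1. Write to the database first.
    with db_conn.cursor() as cur:
        cur.execute("UPDATE matches SET score = %s WHERE id = %s", (score, match_id))
    db_conn.commit()

    # 2. Invalidate the cache entry for that row; the next reader refetches fresh data.
    cache.delete(f"match:{match_id}")
```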
I have an app whose MySQL DB is a slave of another, remote master DB, and I use memcache to cache some of the DB data.
My slave DB gets updated whenever there are updates on the master DB. So in my application I want to know when my local (slave) DB has been updated, in order to invalidate the related cached data and display the fresh data received from the master.
Is there any way to run some program when the slave MySQL DB is updated? I would then filter the query and work out whether I need to clear the cache or not.
Thanks
First of all, you are looking for a solution similar to what Facebook did in their DB architecture (as I remember, they patched MySQL for this).
You can build your own solution based on one of these techniques:
Parse the replication (binary) log on the slave side and remove the cache entry whenever you see an update to the relevant data in the log (sketched below).
Load the memcached UDFs (user-defined functions) into MySQL and attach triggers on the replica side to the tables you are interested in; the triggers call the UDF's remove function.
Please note that this configuration is complicated to support and maintain. If you can tolerate stale data in the cache, maybe a small TTL will help you.
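A hedged sketch of the first technique, using the python-mysql-replication package (pymysqlreplication) to tail the replica's binary log and delete cache keys when rows change; it assumes row-based binlog format, a primary-key column named id, and cache keys of the form table:id, none of which come from the original answer.

```python
from pymemcache.client.base import Client
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent,
)

cache = Client(("localhost", 11211))

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=100,                      # must be unique among replication clients
    only_schemas=["appdb"],             # only watch the schema whose data we cache
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,                      # wait for new events instead of exiting
    resume_stream=True,
)

for event in stream:
    for row in event.rows:
        # Updates carry the new values under "after_values"; inserts/deletes use "values".
        values = row.get("after_values", row.get("values", {}))
        pk = values.get("id")
        if pk is not None:
            cache.delete(f"{event.table}:{pk}")   # drop the stale entry; next read refetches
```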
As Kirugan says, it's as simple as writing your own SQL parser, and ensuring that you also provide an indexed lookup keyed to the underlying data for anything you insert into the cache, then cross-reference the datasets for any DML you apply to the database. Of course, this will be a lot simpler if you create a simplified, abstract syntax to represent the DML, though you thereby lose the flexibility of SQL and, of course, have to re-implement any legacy code using your new syntax. Apart from fixing the existing code, it should only take a year or two to get this working right. Basing your syntax on MySQL's handler API rather than SQL will probably save a lot of pain later in the project.
Of course, if you need full cache consistency then you need to ensure that a logical transaction now spans all the relevant datacentres, which will have something of an adverse impact on your performance (certainly much slower than just referencing the master directly).
For a company like Facebook, with hundreds of thousands of servers and terabytes of data (and no requirement for cache consistency), such an approach to solving the problem leads to massive savings. If you only have 2 servers, a better solution would be to switch to multi-master replication, possibly add another database node, optimize the storage (e.g. switching to SSDs / adding fast bcache), make sure you have session affinity to the DBMS from the application (but not sticky sessions), and spend some time tuning your DBMS, particularly its cache performance.
I have a web application which has the following parts:
Commentators continuously provide match commentary through a browser-based tool. The comments are inserted into the DB using Hibernate.
Lots of users access a URL to read the commentary. Hibernate reads data from the table being updated by the commentators in step #1.
There are also some stored procedures that are set to run every hour. A few of them access the same table (used in steps #1 and #2) for reading and writing/updating.
Now my problem is, whenever the site has 100+ concurrent users watching a particular match commentary, my MySQL goes down. It shows lots of queries stuck in the processlist, many of them in the "Copying to temp table" state. This causes JBoss to restart frequently.
I am using transactions in Hibernate for both reading and writing. Please help, because I lose big matches because of these crashes.
You have a performance problem. It is difficult to give solutions that always work, but here is what you can consider doing:
1) Revise the HQL (Hibernate) statements. For this, it is best to log the SQL with <property name="show_sql">true</property> in the config file (or even a tool like log4jdbc if you want to see the actual parameters) and analyse the output. There you can see which SQL requests occur most often. In many cases a better strategy for reading and writing DB data can significantly reduce database traffic. Also check that you have good indexes on your tables.
2) Consider using a second-level cache. (Normally Hibernate only uses the first-level cache, which is of no use in your case because it is bound to one session.) Then at least the requests for reading the current commentary can be served from the cache and don't need to go to the database. (Pay attention: the cache might interfere with the stored procedures. Check whether the cache product you want to use supports MySQL stored procedures; in the worst case you may have to remove the stored procedures for the critical tables and let your application server do that work, so it goes through the cache.)
3) If only a few tables are heavily used, you can consider caching them in your application (a rough sketch follows this list). That's more work, but you can tailor it exactly to the demands of your application, so it might be faster than a general second-level cache.
4) If nothing helps and the traffic is really too heavy then perhaps you have to invest in more hardware.
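As a rough illustration of point 3 (shown in Python rather than Java, purely to keep the sketch short), here is a tiny in-process micro-cache that lets 100+ concurrent readers of the same match hit the database only once every couple of seconds; the load_commentary_from_db helper and the 2-second TTL are assumptions to adapt.

```python
import threading
import time

_lock = threading.Lock()
_cache = {}          # match_id -> (expires_at, rows)
TTL_SECONDS = 2.0    # commentary may be up to 2 s stale; tune to taste

def get_commentary(db_conn, match_id):
    now = time.monotonic()
    with _lock:
        entry = _cache.get(match_id)
        if entry is not None and entry[0] > now:
            return entry[1]                            # fresh enough: serve from memory

    rows = load_commentary_from_db(db_conn, match_id)  # hypothetical DB read
    with _lock:
        _cache[match_id] = (now + TTL_SECONDS, rows)
    return rows
```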
Good luck ;-)
I haven't got a lot of ETL experience but I haven't found the answer to my question either, although I guess it may be a no-brainer if you've worked with it. We're currently looking into creating a simple data warehouse (simple as in "copy most columns from most tables" and not OLAP-style) and it seems we're leaning towards SQL Server (2008) for a few reasons.
SSIS seems to be the tool for this kind of task when it comes to SQL Server, but I can't find anything about how it affects the source database's cache, if at all, when loading data. Some of our installations are very performance-sensitive and depend on the cache reflecting the users' actual usage patterns.
But if SSIS runs a "SELECT *"-ish query and the cache is altered, the performance for the users may degrade to unacceptable levels until it has been rebuilt by their own queries again.
So my question is: does SSIS affect the database cache when loading data from a SQL Server database, and is there a way to avoid that?
Part of the problem is also that the source database could be either an Oracle or a SQL Server database, so if there is a way to avoid the cache-affecting part for Oracle, that would be good input as well. (I guess the Attunity connector is the way to go?)
(Some additional info: we have considered plain files as well, but an export-import probably takes longer than an SSIS transfer? I also guess change data capture is something we'll look into, so if that is relevant to this question, feel free to include possible issues/benefits.)
Any other relevant suggestions are also welcome!
Thanks!
Tackling the SQL Server side:
First off, SSIS doesn't do anything special to avoid the buffer pool, or the plan cache.
Simple test (on a NON-production instance!):
Create a new SSIS package with a single connection manager and a single data flow containing one OLE DB Source pointing to a table.
Clear the buffer pool, from SSMS: DBCC DROPCLEANBUFFERS
Verify that the cache has been cleared using the glorified dm_os_buffer_descriptors query (a scripted version appears after these steps).
Run the package.
Re-run the query from step (2), and note that the data pages for the table (BOM_PIECE in my example) have now been loaded into the cache.
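If you prefer to script that check instead of running it by hand in SSMS, here is a hedged sketch in Python with pyodbc; the connection string is a placeholder and the query is a simplified form of the usual dm_os_buffer_descriptors query (in-row and row-overflow allocation units only), so treat it as illustrative rather than authoritative.

```python
import pyodbc

# Adjust the connection string for your instance and authentication method.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=SourceDb;Trusted_Connection=yes;",
    autocommit=True,   # simplest way to run DBCC outside an explicit transaction
)
cur = conn.cursor()

# Step 1: clear the buffer pool (requires sysadmin; NON-production only!).
cur.execute("DBCC DROPCLEANBUFFERS;")

# Steps 2 and 4: count cached data pages per table in the current database.
cur.execute("""
    SELECT OBJECT_NAME(p.object_id) AS table_name, COUNT(*) AS cached_pages
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au ON bd.allocation_unit_id = au.allocation_unit_id
    JOIN sys.partitions AS p ON au.container_id = p.hobt_id AND au.type IN (1, 3)
    WHERE bd.database_id = DB_ID()
    GROUP BY p.object_id
    ORDER BY cached_pages DESC;
""")
for table_name, cached_pages in cur.fetchall():
    print(table_name, cached_pages)
```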
Note that most SSIS components allow you to provide your own query, so if you have a way to avoid the buffer pool (I don't know that this is possible - I'd defer to someone who knows more about it), you could insert that into a query. So in the above example, instead of selecting Table or view in the OLE DB Source, you would select SQL command, or SQL command from variable if your command requires dynamic text.
Finally, I can imagine why you want to eliminate the cache load - but are you sure you want to do this? SQL Server is fairly good at managing memory, and what you're doing is swapping memory load for disk I/O load, which (depending on your use case) may have a negative impact on other users. This question has a discussion on SQL Server caching.
Read this article about Attunity regarding reading data from Oracle.
Regarding "affect the database cache when loading data from a SQL Server database": SQL Server keeps data pages in its buffer pool (and execution plans in the plan cache), so a large read through SSIS will populate and evict pages in that cache just like any other large query. Beyond the overhead of reading the data itself, the fact that you are using SSIS doesn't change anything on the server; just use a proper transaction isolation level.
Also, read about the fast-load option on SSIS destination components.
About change data capture: I don't see how it can replace SSIS. You can use CDC to select the rows that will be loaded, but it won't do the loading for you.
I'm in a situation where I need to make a call to a stored procedure from Rails. I can do it, but it either breaks the MySQL connection, or is a pseudo hack that requires weird changes to the stored procs. Plus the pseudo hack can't return large sets of data.
Right now my solution is to use system() and call the mysql command line directly. I'm thinking that a less sad solution would be to open my own MySQL connection independent of Active Record's connection.
I don't know of any reasons why this would be bad. But I also don't know the innards of MySQL well enough to know it's 100% safe.
It would solve my problem neatly, in that the controller actions that need to call a stored proc would open a fresh database connection, make the call, and close it. I might sacrifice some performance, but if it works, that's good enough. It also solves the issue of multiple users in the same process (we use Mongrel, currently) in edge Rails, where it's now finally thread-safe, since the hack requires two SQL queries and I don't think I can guarantee I'm using the same database connection via ActiveRecord.
So, is this a bad idea and/or dangerous?
Ruby on Rails generally eschews stored procedures, or implementing any other business logic in the database. One might say that you're not following "the Rails way" by calling a stored proc in the first place.
But if you must call the stored proc, IMO opening a second connection from Ruby must be preferable to shelling out with system(). The latter method would open a second connection to MySQL anyway, plus it would incur the overhead of forking a process to run the mysql client.
You should check out "Enterprise Recipes with Ruby and Rails" by Maik Schmidt. It has a chapter on calling stored procedures from Rails.
MySQL can handle more than one connection per request, though it will increase the load on the database server. You should open the second connection in a 'lazy' manner, only when you are sure you need it on a given request.
Anyway, if performance were important in this application, you wouldn't be using Rails! >:-)
(joking!)
Considering how firmly RoR is intertwined with its own view of DBMS usage, you probably should open a second connection to the database for any interaction it doesn't manage for you, just for separation-of-concerns purposes if nothing else. From your description it also sounds like the simplest approach, which is usually a strong positive sign.
Applications in other languages (PHP, for example) open multiple connections regularly. That doesn't make it desirable, but at least it demonstrates that MySQL won't object.
We've since tried the latest mysql gem from github and even that doesn't solve the problem.
We've patched the mysql adapter in Rails and that actually does work. All it does is make sure the MySQL connection has no more results before continuing on.
I'm not accepting this answer yet, because I don't feel 100% sure that the fix is a good one. We haven't done quite enough testing. But I wanted to put it out there for anyone else looking at this question.
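For anyone wondering what "make sure the MySQL connection has no more results" means in practice, here is the same idea sketched in Python with PyMySQL rather than Ruby (the procedure name, argument and connection details are placeholders): after CALLing a stored procedure you have to drain every result set it produced before the connection can safely be reused.

```python
import pymysql

# Dedicated, short-lived connection used only for the stored-procedure call.
conn = pymysql.connect(host="127.0.0.1", user="app", password="secret", database="appdb")
try:
    with conn.cursor() as cur:
        cur.callproc("get_report", (2024,))   # placeholder procedure and argument
        results = [cur.fetchall()]
        # A CALL can return several result sets; drain them all so the connection
        # is not left in a "commands out of sync" state for the next query.
        while cur.nextset():
            results.append(cur.fetchall())
finally:
    conn.close()
```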