For a project, we are working with an external partner, and we need access to their MySQL database. The problem is that they can't grant us direct access: their database is hosted in a managed environment where they have very few configuration options, and they don't want to give us access to all of their data. So the solution they came up with is the FEDERATED storage engine.
We now have one table for each table in their database. The problem is that the amount of data we receive is huge and will keep growing, which means a lot of inserts are performed on our database. The ideal solution for us would be to intercept all incoming MySQL traffic, process it, and then store it in bulk. We have also thought about using something like Redis to store the data.
Additionally, we plan to receive more data from other partners, who will potentially deliver it in different ways. Using Redis would allow us to keep all of our data in one place.
Copying the data to Redis after it has been stored in the MySQL database is not an option: we simply can't handle that many inserts, and we need the data as fast as possible.
TL;DR
Is there a way to pretend to be a MySQL server so that we can directly process the data received via the FEDERATED storage engine?
We have also thought about using the BLACKHOLE engine in combination with binary logging on our side: incoming data would then only be written to the binary log and wouldn't be stored in the tables. But performance would still be limited by disk I/O.
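To make the BLACKHOLE-plus-binlog idea concrete, here is a rough sketch of the consumer side we have in mind: a small process that uses the python-mysql-replication package to connect to our own server as if it were a replica and bulk-loads the logged row events into Redis. It assumes ROW-format binary logging and that inserts into BLACKHOLE tables actually end up in the binlog (something we would still have to verify for our MySQL version); all names, credentials and the serialization below are placeholders.

```python
# Sketch only: read our own binary log as a fake replica and push the inserted
# rows into Redis in batches. Assumes binlog_format=ROW and a "repl" account
# with replication privileges; every name here is a placeholder.
import redis
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import WriteRowsEvent

MYSQL_SETTINGS = {"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "secret"}
BATCH_SIZE = 1000

r = redis.Redis(host="127.0.0.1", port=6379, db=0)
stream = BinLogStreamReader(
    connection_settings=MYSQL_SETTINGS,
    server_id=100,                 # any server id not used by a real replica
    only_events=[WriteRowsEvent],  # we only care about the incoming inserts
    blocking=True,                 # wait for new events instead of exiting
    resume_stream=True,
)

pipe = r.pipeline(transaction=False)
pending = 0
for event in stream:
    for row in event.rows:
        # One Redis list per source table; repr() stands in for real serialization.
        pipe.rpush(f"{event.schema}:{event.table}", repr(row["values"]))
        pending += 1
    if pending >= BATCH_SIZE:
        pipe.execute()  # flush one bulk batch (a real consumer would also flush on a timer)
        pending = 0
```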
Related
I'm working on a typical web application that does some CRUD and saves changes into a MySQL database. A Redis cache layer has now been added on top of the MySQL DB to improve performance.
I have taken some measures to prevent data inconsistency when updating/deleting data, but I can't guarantee consistency 100%, so I think I need some mechanism to monitor for inconsistencies.
I could write a script that scans the data in both Redis and MySQL and diffs them, but I'd like to know whether there are any other easy ways, tools, or best practices to automatically detect data inconsistencies between the Redis cache and the database.
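The scan-and-diff script I have in mind would be something small like the sketch below, where the users table, the email column and the user:<id> key scheme are just placeholders for whatever the cache actually mirrors:

```python
# Sketch of a consistency scan: walk the MySQL rows and compare each one with
# its cached counterpart in Redis. Table, column and key names are placeholders.
import pymysql
import redis

db = pymysql.connect(host="127.0.0.1", user="app", password="secret", database="appdb")
r = redis.Redis(host="127.0.0.1", port=6379, db=0)

mismatches = []
with db.cursor() as cur:
    cur.execute("SELECT id, email FROM users")
    for row_id, email in cur.fetchall():
        cached = r.get(f"user:{row_id}")
        if cached is None:
            continue  # a plain cache miss is not an inconsistency
        if cached.decode("utf-8") != email:
            mismatches.append((row_id, email, cached.decode("utf-8")))

for row_id, db_value, cache_value in mismatches:
    print(f"user {row_id}: db={db_value!r} cache={cache_value!r}")
```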
I have an application where I need to maintain an audit log of the operations performed on a collection. I am currently using MongoDB for storage, which has worked well so far.
Now, for the audit log, I am thinking of using a MySQL database, for the following reasons:
1. Using Mongo's audit filter degrades performance.
2. Storage will be huge if I also keep the logs in MongoDB, which will affect the replication of nodes in the cluster.
The logs are not viewed very often in the application, so I'm thinking of storing them outside the main storage. I'm unsure about using MongoDB together with MySQL; is this the right choice for the future?
Also, is MySQL a good choice for storing an audit log, or is there another database that would serve me better for storage and conditional queries later on?
Moving to a completely different database system just for this purpose is not guaranteed to improve performance.
My first attempt at separation would be to create a new database within your current database system and write the audit log there, or even to use a plain text file.
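Concretely, that first attempt could look roughly like the sketch below; the audit schema, table and column names are purely illustrative, and the JSON column assumes MySQL 5.7+ (use TEXT on older versions).

```python
# Sketch: keep audit rows in their own schema on the same MySQL server so they
# don't bloat the main data set. All names here are illustrative only.
import json
import pymysql

db = pymysql.connect(host="127.0.0.1", user="app", password="secret", autocommit=True)

with db.cursor() as cur:
    cur.execute("CREATE DATABASE IF NOT EXISTS audit")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS audit.operation_log (
            id         BIGINT AUTO_INCREMENT PRIMARY KEY,
            collection VARCHAR(64) NOT NULL,
            operation  VARCHAR(16) NOT NULL,
            payload    JSON        NULL,              -- MySQL 5.7+; use TEXT otherwise
            created_at TIMESTAMP   DEFAULT CURRENT_TIMESTAMP,
            KEY idx_collection_time (collection, created_at)
        )
    """)

def write_audit(collection, operation, payload):
    # Called from the application after each write to the main (MongoDB) store.
    with db.cursor() as cur:
        cur.execute(
            "INSERT INTO audit.operation_log (collection, operation, payload) VALUES (%s, %s, %s)",
            (collection, operation, json.dumps(payload)),
        )

write_audit("orders", "update", {"order_id": 42, "status": "shipped"})
```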
Give your feedback.
I have an app whose MySQL DB is a slave of another, remote master DB, and I use memcache to cache some of the DB data.
My slave DB gets updated whenever there are updates on the master DB. In my application I want to know when my local (slave) DB has been updated, so I can invalidate the related cached data and display the fresh data received from the master.
Is there any way to run some program when the slave MySQL DB is updated? I would then filter the query and work out whether I need to clear the cache or not.
Thanks
First of all, you are looking for a solution similar to what Facebook did in their DB architecture (as I recall, they patched MySQL for this).
You can build your own solution based on one of these techniques:
Parse the replication log on the slave side and remove the cache entry when you see an update to that data in the log (see the sketch after this list).
Load the memcached UDF (user-defined function) into MySQL and attach triggers on the replica side to the tables you are interested in; the triggers call the UDF's remove function.
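A rough sketch of the first technique, assuming ROW-based binary logs on the slave plus the python-mysql-replication and pymemcache packages; the appdb.users filter and the user:<id> key scheme are placeholders for however your cache is actually keyed:

```python
# Sketch only: tail the slave's replication stream and drop the matching
# memcached keys when rows change, so the next read repopulates the cache.
from pymemcache.client.base import Client
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import DeleteRowsEvent, UpdateRowsEvent

mc = Client(("127.0.0.1", 11211))
stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "secret"},
    server_id=101,                                   # unique fake replica id
    only_events=[UpdateRowsEvent, DeleteRowsEvent],  # here we only react to updates and deletes
    blocking=True,
)

for event in stream:
    if (event.schema, event.table) != ("appdb", "users"):  # only the tables you cache
        continue
    for row in event.rows:
        # UPDATE rows carry before/after images, DELETE rows only "values".
        values = row.get("before_values") or row["values"]
        mc.delete(f"user:{values['id']}")
```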
Please note that such a configuration is complicated to support and maintain. If you can tolerate stale data in the cache, a small TTL may help you.
As Kirugan says, it's as simple as writing your own SQL parser, ensuring that you also provide an indexed lookup keyed to the underlying data for anything you insert into the cache, and then cross-referencing the datasets for any DML you apply to the database. Of course, this will be a lot simpler if you create a simplified, abstract syntax to represent the DML, thereby losing the flexibility of SQL and, of course, having to re-implement any legacy code using your new syntax. Apart from fixing the existing code, it should only take a year or two to get this working right. Basing your syntax on MySQL's handler API rather than SQL will probably save a lot of pain later in the project.
Of course, if you need full cache consistency then you need to ensure that a logical transaction now spans all the relevant datacentres which will have something of an adverse impact on your performance (certainly much slower than just referencing the master directly).
For a company like Facebook, with hundreds of thousands of servers and terabytes of data (and no requirement for cache consistency), such an approach to solving the problem leads to massive savings. If you only have two servers, a better solution would be to switch to multi-master replication, possibly add another database node, optimize the storage (e.g. switching to SSDs / adding fast bcache), make sure you have session affinity to the DBMS from the application (but not sticky sessions), and spend some time tuning your DBMS, particularly its cache performance.
I'm about to create my first cloud-based app using PHP and MySQL. I'm in limbo and can't figure out whether I should use one database that stores everything, or set up a dedicated database for each user who signs up for my service.
The application itself will be recording hundreds of thousands of records on a daily basis, so one big shared database could get heavy and could also lead to performance issues when querying. On the other hand, a dedicated database for each user could lead to maintenance problems (e.g. backing up and keeping track of each database instance).
Could anyone please advise me on the best approach? Or is there a better way of doing this?
I'm developing a site that is heavily dynamic and uses a MySQL database constantly. My question is - should I worry about the load on the database?
For example, a part of the site has a live chat which uses AJAX to contact the database every second for each user. Depending on how many users are connected, that's a lot of queries!
Is this something a MySQL database can handle, or am I pushing it? Thanks.
You are indeed pushing it. Depending on your server and the number of online users, MySQL will only be able to keep up to a certain point.
MySQL and other database management systems are data storage systems, and here you are not actually storing data: you are just passing data between clients through MySQL, which is not efficient.
To speed things up, you can use MySQL MEMORY tables for the instant messages and keep offline messages in a separate MyISAM or InnoDB table (which actually stores the data).
But the best way to build a chat infrastructure is a backend application that keeps all the messages in memory and, past some limit, writes the messages that were not delivered to MySQL as offline messages. This is very much like using MySQL MEMORY tables, but it gives you more control over the data. The problem is that you need to implement sensible, efficient data structures with good memory management, which is a hard task if you are not building a commercial product and unnecessary if you are not planning to sell the chat system, so I recommend using MySQL MEMORY tables as described.
Update
MySQL MEMORY tables are volatile (they are emptied on a service/server restart), so don't use them for permanent storage; use them only to hold instant messages for a short time.
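As a rough sketch of that layout (placeholder names throughout, MySQL access via pymysql): a MEMORY table for in-flight instant messages, an InnoDB copy for offline ones, and a helper that periodically parks undelivered messages in the durable table.

```python
# Sketch: MEMORY table for instant messages, InnoDB table for offline messages,
# and a periodic job that moves stale rows from the former to the latter.
import pymysql

db = pymysql.connect(host="127.0.0.1", user="chat", password="secret",
                     database="chatdb", autocommit=True)

with db.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS live_messages (
            id        BIGINT AUTO_INCREMENT PRIMARY KEY,
            from_user INT          NOT NULL,
            to_user   INT          NOT NULL,
            body      VARCHAR(255) NOT NULL,          -- MEMORY tables can't hold TEXT/BLOB
            sent_at   TIMESTAMP    DEFAULT CURRENT_TIMESTAMP,
            KEY idx_to_user (to_user)
        ) ENGINE=MEMORY
    """)
    cur.execute("CREATE TABLE IF NOT EXISTS offline_messages LIKE live_messages")
    cur.execute("ALTER TABLE offline_messages ENGINE=InnoDB")

def park_undelivered(cur):
    # Move messages older than 60 seconds into durable storage; for a sketch we
    # ignore the small window between the copy and the delete.
    cur.execute("""
        INSERT INTO offline_messages
        SELECT * FROM live_messages
        WHERE sent_at < NOW() - INTERVAL 60 SECOND
    """)
    cur.execute("DELETE FROM live_messages WHERE sent_at < NOW() - INTERVAL 60 SECOND")

with db.cursor() as cur:
    park_undelivered(cur)
```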