I have an application where I need to maintain an audit log of the operations performed on a collection. I am currently using MongoDB for storage, which has worked well so far.
Now, for the audit log, I am thinking of using a MySQL database, for these reasons:
1. Using MongoDB's built-in audit filter degrades performance.
2. Storage will be huge if I also store the logs in MongoDB, which will impact replication between the nodes in the cluster.
The logs do not need to be viewed very often in the application, so I am thinking of storing them outside the main storage. I am unsure about combining MongoDB with MySQL; is this the right choice from a long-term perspective?
Also, is MySQL a good choice for storing the audit log, or is there another database that would help with storage and conditional queries later?
There is no guarantee that performance will improve just because you move to a completely different database system for this purpose.
My first attempt at separation would be to create a new database in your current database system and forward the logs there, or even to use a plain text file.
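As a rough illustration of the "plain text file" option, here is a minimal sketch that appends one JSON object per line and filters entries later. The function names and the entry fields (`collection`, `operation`, `user`, `document_id`) are assumptions made for this example, not something from the question.

```python
import json
from datetime import datetime, timezone


def append_audit_entry(path, collection, operation, user, document_id):
    """Append one audit entry as a single JSON line (hypothetical field names)."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "collection": collection,
        "operation": operation,
        "user": user,
        "document_id": document_id,
    }
    # Append mode keeps existing entries; one line per entry keeps reads simple.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def find_entries(path, **conditions):
    """Return entries whose fields equal all given conditions (full file scan)."""
    matches = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            entry = json.loads(line)
            if all(entry.get(key) == value for key, value in conditions.items()):
                matches.append(entry)
    return matches
```

A call such as `find_entries("audit.log", operation="delete")` scans the whole file, which is probably acceptable only while the log is read rarely, as described in the question.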
Please share your feedback.
Related
How to handle transaction issues in an environment where MongoDB and MySQL databases work together?
I want to use MongoDB for scalability and MySQL for transactions. (Transactions are used in the inventory management system, but product information is stored in a MySQL database.)
There is good news: As of Mongo version 4.2, multiple document ACID transactions are now fully supported:
For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions.
As a general comment on your question, there is nothing wrong with having more than one data store in your architecture. However, keep in mind that if a business operation requires both Mongo and MySQL for a single logical transaction/unit, there would probably be no way to make it atomic. If you fall into this category, you need to rethink your database design and stick with one database for each business operation.
For a project we are working with several external partners. For the project we need access to one partner's MySQL database. The problem is, they can't provide that. Their database is hosted in a managed environment where they don't have many configuration options, and they don't want to give us access to all of their data. So the solution they came up with is the federated storage engine.
We now have one table for each table of their database. The problem is that the amount of data we get is huge and will increase further in the future. That means a lot of inserts are performed on our database. The optimal solution for us would be to intercept all incoming MySQL traffic, process it and then store it in bulk. We also thought about using something like Redis to store the data.
Additionally, we plan to get more data from different partners. They will potentially provide the data in different ways, so using Redis would allow us to have all our data in one place.
Copying the data to Redis after it is stored in the MySQL database is not an option. We just can't handle that many inserts, and we need the data as fast as possible.
TL;DR
Is there a way to pretend to be a MySQL server so we can directly process data received via the federated storage engine?
We also thought about using the blackhole engine in combination with binary logging on our side. So incoming data would only be written to the binary log and wouldn't be stored in the database. But then performance would still be limited by Disk I/O.
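The "process it and then store it in bulk" part can be sketched independently of how the rows arrive. The class below is a hypothetical in-memory buffer, not a MySQL protocol implementation: it collects rows and hands them to a caller-supplied `flush` function in batches. The name `BatchBuffer` and the `max_rows` parameter are assumptions for illustration.

```python
class BatchBuffer:
    """Collect rows in memory and pass them on in batches (illustrative only)."""

    def __init__(self, flush, max_rows=1000):
        self._flush = flush        # callable that receives a list of rows
        self._max_rows = max_rows  # batch size that triggers an automatic flush
        self._rows = []

    def add(self, row):
        self._rows.append(row)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def flush(self):
        # Swap the list out first so the buffer is empty even if flushing fails.
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)
```

The `flush` callable could wrap a bulk insert into any store. Rows that are still in the buffer are lost if the process dies, so a real setup would need to decide how much unflushed data it can afford to lose.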
Right now I'm trying to choose the most appropriate approach for implementing an Audit Trail for my entities with an AWS RDS MySQL database.
I have to log all entity changes, including the initiator (user) who made them. One of the main criteria is performance.
Hibernate Envers looks like the easiest and most complete solution and can be integrated very quickly. However, I'm worried about a possible performance slowdown after introducing Envers. I saw a few posts where developers prefer an Audit Trail approach based on database triggers.
The main issue with triggers is how to get the initiator (user) who made the changes.
Based on your experience, could you please suggest an approach for Java/Spring/Hibernate/MySQL (AWS) to implement an Audit Trail for historical changes?
Also, is there any solution for an Audit Trail within the AWS RDS MySQL database infrastructure?
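To make the trigger question concrete, here is a minimal sketch of one way a trigger can learn the initiator: the application writes the current user into a small context table in the same transaction as the change, and the trigger reads it. The sketch uses SQLite through Python's `sqlite3` module purely as a stand-in so that it runs without a server. It is not MySQL code, and all table, trigger and function names are hypothetical. On MySQL the same idea is often described with a session variable set before the DML, but that variant is not shown or tested here.

```python
import sqlite3


def create_schema(conn):
    """Create a demo table, a context table and an audit trigger (SQLite stand-in)."""
    conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE audit_context (user_id TEXT NOT NULL);
        CREATE TABLE product_audit (
            product_id INTEGER,
            old_name   TEXT,
            new_name   TEXT,
            changed_by TEXT,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TRIGGER product_audit_update AFTER UPDATE ON product
        BEGIN
            INSERT INTO product_audit (product_id, old_name, new_name, changed_by)
            VALUES (OLD.id, OLD.name, NEW.name,
                    (SELECT user_id FROM audit_context LIMIT 1));
        END;
        """
    )


def update_product_name(conn, user_id, product_id, new_name):
    """Record the initiator, then run the update, in a single transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("DELETE FROM audit_context")
        conn.execute("INSERT INTO audit_context (user_id) VALUES (?)", (user_id,))
        conn.execute(
            "UPDATE product SET name = ? WHERE id = ?", (new_name, product_id)
        )
```

A shared context table like this is only safe when writers are serialized; with concurrent sessions the initiator would have to be kept per session or per connection instead.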
Understand that speculation about performance without concrete evidence to support one's theory is analogous to premature optimization of code. It's almost always a waste of time.
From a simple database point of view, yes, as a table grows past a certain size its performance will degrade, but typically this mainly impacts queries and affects inserts/updates less, provided the table is properly indexed and the queries are properly formed.
But many databases support partitioning as a means to control performance concerns, particularly on larger tables. This typically involves separating a table's data across a set of boundaries defined by a partition scheme you create. You simply define what is the most relevant data and you try and store this partition on your fastest drives/storage and the less relevant, typically older, data is stored on your slower drives/storage.
You can also elect to store database tables in differing schemas/tablespaces by specifying the Envers property org.hibernate.envers.default_schema. If your database supports putting schemas in different database files on the file system, you can help increase performance by ensuring that reads/writes on your entity tables do not impact reads/writes on your audit tables.
I can't speak to MySQL's support for any of these things, but I do know that MSSQL/Oracle supports partitioning very easily and Oracle for sure allows the separation of schemas across differing database files.
I have an app whose MySQL DB is a slave of a remote master DB, and I use memcache to cache some of the DB data.
My slave DB is updated whenever there are updates in the master DB. So in my application I want to know when my local (slave) DB is updated, so that I can invalidate the related cached data and display the fresh data I got from the master.
Is there any way to run some program when the slave MySQL DB is updated? I would then filter the query and work out whether I need to clear the cache or not.
Thanks
First of all, you are looking for a solution similar to what Facebook did in their DB architecture (as I remember, they patched MySQL for this).
You can build your own solution based on one of these techniques:
Parse the replication log on the slave side and remove the cache entry when you see an update of the data in the log.
Load a UDF (user-defined function) for memcached and attach a trigger on the replica side (it will call the UDF's remove function) to the tables of interest inside MySQL.
Please note that this configuration is complicated to support and maintain. If you can tolerate stale data in the cache, a small TTL may be enough.
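The first technique could look roughly like the sketch below once the replication log has already been parsed into row-change events. The event shape (`table`, `pk`), the `table:pk` key scheme and the plain dict standing in for memcached are all assumptions for illustration. Parsing the actual log is not shown.

```python
def cache_key(table, primary_key):
    """Hypothetical key scheme: one cache entry per table row."""
    return f"{table}:{primary_key}"


def invalidate_from_events(cache, events, watched_tables):
    """Drop cache entries for changed rows in watched tables.

    `cache` is any dict-like store (a stand-in for memcached here) and
    `events` is an iterable of {"table": ..., "pk": ...} dicts.
    Returns the list of keys that were actually removed.
    """
    removed = []
    for event in events:
        if event["table"] not in watched_tables:
            continue  # change does not affect anything we cache
        key = cache_key(event["table"], event["pk"])
        if key in cache:
            del cache[key]
            removed.append(key)
    return removed
```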
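The small-TTL alternative can be sketched with a tiny in-process cache. The `TTLCache` class is a hypothetical illustration of the expiry idea, not a memcached client. With memcached itself, the expiry time is normally passed when the value is stored.

```python
import time


class TTLCache:
    """Minimal cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller reloads from the database.
            del self._data[key]
            return default
        return value
```

With this approach, cached data can be stale for at most the TTL after the replica changes, which is the trade-off the answer describes.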
As Kirugan says, it's as simple as writing your own SQL parser and ensuring that you also provide an indexed lookup, keyed to the underlying data, for anything you insert into the cache, then cross-referencing the datasets for any DML you apply to the database. Of course, this will be a lot simpler if you create a simplified, abstract syntax to represent the DML, though you thereby lose the flexibility of SQL and, of course, have to re-implement any legacy code using your new syntax. Apart from fixing the existing code, it should only take a year or two to get this working right. Basing your syntax on MySQL's handler API rather than SQL will probably save a lot of pain later in the project.
Of course, if you need full cache consistency then you need to ensure that a logical transaction now spans all the relevant datacentres which will have something of an adverse impact on your performance (certainly much slower than just referencing the master directly).
For a company like Facebook, with hundreds of thousands of servers and terabytes of data (and no requirement for cache consistency), such an approach to solving the problem leads to massive savings. If you only have 2 servers, a better solution would be to switch to multi-master replication, possibly add another database node, optimize the storage (e.g. switching to SSDs / adding fast bcache), make sure you have session affinity to the DBMS from the application (but not sticky sessions), and spend some time tuning your DBMS, particularly its cache performance.
Currently I have a system that is based solely on Solr, which means that I store all data in Solr (using SolrJ) with no other datastore involved. The problem is that I am now experiencing some performance issues. I thought it might make sense to store the data in MySQL and then synchronize it with Solr, e.g. with the DataImportHandler. That way the read operations would go to the Solr index and the main write operations to MySQL, with Solr write operations happening only occasionally, during synchronization.
The thing is that I expect hundreds of millions of documents to be stored, and I don't really know whether the MySQL/Solr combination makes sense.
Is there another better solution? Maybe Master-Solr for writing and Solr-slaves for reading?
Update: What I forgot to say is, that also in case of a schema.xml change, the "storing data in MySQL" solution could be useful in my opinion, because then I can re-commit all the data without caring about Solr's self-stored data.
It's not preferable to use the same Solr instance for both reading and writing, as the activity on Solr during writing (with commit and optimize) would heavily impact the read operations.
A master-slave configuration would be a nicer approach, with the master primarily for writes and the slaves for read-only purposes.
The slaves are periodically refreshed with the contents of the master, so there would be some delay.
You can always scale by adding multiple slaves.
Using MySQL as a persistent store with master-slave Solr would be the best approach.
MySQL provides a stable data store and would guard you against index corruption or other issues that would result in data loss.
Using the DataImportHandler you can do this easily with incremental updates, but there would be more of a time lag before the latest data appears on the slaves.
With this you can also use index swapping for full refreshes.
In case the index grows too huge to be maintainable and it impacts performance, you may want to look at Solr shards.
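The incremental-update idea mentioned above can be sketched as "select only the rows changed since the last sync and re-index those". The code below is not the DataImportHandler itself. It uses SQLite as a stand-in for MySQL and a plain dict as a stand-in for the Solr index, and the `documents` table with an integer `updated_at` column is an assumption made for this example.

```python
import sqlite3


def delta_import(conn, index, last_sync):
    """Copy rows changed after `last_sync` into `index`; return the new watermark.

    Assumes a table documents(id, title, updated_at) where updated_at is an
    integer timestamp that is bumped on every write.
    """
    rows = conn.execute(
        "SELECT id, title, updated_at FROM documents "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    for doc_id, title, updated_at in rows:
        index[doc_id] = {"id": doc_id, "title": title}  # add or overwrite
        last_sync = max(last_sync, updated_at)
    return last_sync
```

Each run only touches rows newer than the stored watermark, which is what keeps periodic syncs cheap. Deleted rows are not handled in this sketch and would need their own mechanism.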
I also thought about the same issue: storing everything in Solr, or storing in MySQL and indexing in Solr.
I decided to go the second way: store with MySQL and index in Solr.
The reason: handling data (reading and writing) is much better in MySQL than in Solr. Also, data import/export from/to MySQL is supported out of the box by lots of tools.
Next point: backup. There are many more established ways of backing up a MySQL DB than a Solr index.
Of course, for full-text search, Solr is much better than MySQL. So I decided that each system should do the work it knows best.
For your information: I'm talking about a medium-sized index, 4 GB for a few million documents.
//Edit: don't forget that some features, like highlighting, require stored data in Lucene (not only indexed). If you need this, you have to store the documents in Solr as well. An alternative could be implementing those features on the client side. (I did it this way.)
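A client-side replacement for highlighting might look like the sketch below: the text comes from the primary store (MySQL in this answer) and the matched query terms are wrapped in markers after the search. This is a naive, hypothetical illustration. It does whole-word, case-insensitive matching only, does not reproduce Solr's analysis (stemming, synonyms and so on), and does not escape HTML.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap whole-word, case-insensitive matches of `terms` in pre/post markers."""
    terms = [term for term in terms if term]
    if not terms:
        return text
    # Longest terms first so longer alternatives are tried before shorter ones.
    alternatives = "|".join(
        re.escape(term) for term in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda match: f"{pre}{match.group(0)}{post}", text)
```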