We have a PostgreSQL database that is used in production and a MySQL database for reporting, which copies data from the production server. The reporting database is denormalized, so it is not a simple replica of the production schema; data from production is copied onto the reporting server using complex queries that transform it into the desired format.
My question is about industry-standard guidelines for a practice like this. I would like to create indexes on the production server designed to make the complex extraction queries faster. However, I would also like some quantifiable method of measuring the pros and cons of creating such indexes on the production server with respect to their effect on insert/update performance (for example, something along the lines of the sketch below).
If anybody can provide input based on experience, or point me in the direction of write-ups on this, I would be grateful. I have tried Google but haven't found anything that addresses an issue of this kind.
Any input would be much appreciated.
Thanks in advance.
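To make the question concrete, this is the kind of measurement I had in mind: a before/after micro-benchmark of insert throughput into a scratch copy of a production table, with and without the extra index the reporting extraction would use. This is only a rough sketch using psycopg2; the table, column and index names are made up:

```python
import time
import psycopg2  # assumes psycopg2 is installed; the DSN below is a placeholder


def time_inserts(dsn, with_index, rows=10_000):
    """Measure bulk-insert time into a scratch table, optionally carrying
    the extra index intended for the reporting extraction queries."""
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("DROP TABLE IF EXISTS orders_bench")
        # Hypothetical shape of a production table
        cur.execute("""
            CREATE TABLE orders_bench (
                id          bigserial PRIMARY KEY,
                customer_id integer       NOT NULL,
                created_at  timestamptz   NOT NULL DEFAULT now(),
                total       numeric(10,2)
            )
        """)
        if with_index:
            # Hypothetical composite index the reporting extraction query would use
            cur.execute("CREATE INDEX orders_bench_report_idx ON orders_bench (customer_id, created_at)")
        start = time.perf_counter()
        for i in range(rows):
            cur.execute(
                "INSERT INTO orders_bench (customer_id, total) VALUES (%s, %s)",
                (i % 500, i * 1.5),
            )
        elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


if __name__ == "__main__":
    dsn = "dbname=prod_copy user=bench"  # placeholder connection string
    print("without index:", time_inserts(dsn, with_index=False))
    print("with index:   ", time_inserts(dsn, with_index=True))
```

Running EXPLAIN (ANALYZE) on the actual extraction query before and after adding the index would give the other half of the trade-off.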
Related
I was asked to create a small environment showcasing the benefits of a NoSQL-SQL hybrid over a SQL-only database. My background is mostly Admin/DevOps, so I have basic knowledge of databases, but I've never done something like this.
I thought of creating one VM hosting a MySQL or PostgreSQL instance, populating it with Sakila or another free sample database as a starting point, and a second VM with Mongo/Redis, but I don't know what to do from that point.
How can I integrate those databases? How can I run tests, and what should I test -- query response times? Is this even a good strategy?
Any help would be appreciated.
Try to build two data models for a Twitter data feed: one in relational SQL tables and another as a single JSON document.
Use MySQL as the relational DB and MongoDB as the NoSQL DB.
The main performance difference will show up when you execute SQL over table joins in MySQL against MongoDB queries where no joins are needed.
This is just one idea, but there are many other advantages.
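A minimal sketch of what that comparison could look like in practice, assuming pymysql and pymongo and a made-up tweets/users schema (column and collection names are illustrative only):

```python
import pymysql
from pymongo import MongoClient

# --- Relational model: tweets joined to users at query time ---
mysql = pymysql.connect(host="localhost", user="bench", password="secret", database="twitter")
with mysql.cursor() as cur:
    cur.execute(
        """
        SELECT u.screen_name, t.body, t.created_at
        FROM tweets t
        JOIN users u ON u.id = t.user_id
        WHERE u.screen_name = %s
        """,
        ("alice",),
    )
    rows = cur.fetchall()

# --- Document model: the user and their tweets live in one document, no join ---
mongo = MongoClient("mongodb://localhost:27017")
doc = mongo.twitter.users.find_one({"screen_name": "alice"}, {"tweets": 1})
```

Timing both paths (e.g. with time.perf_counter) as the tweet volume grows is the simplest way to make the difference visible.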
I provide a solution that handles operations for brick and mortar shops. My next step is to provide analytics for my customers.
As I am in the starting phase, I am hoping to find a free way to do it myself instead of using third-party solutions. I am not expecting massive scale at this point, but I would like to get it done right instead of running queries off the production database.
For performance reasons, I am thinking I should run the analytics queries against separate tables in the same database, with a cron job running every night to replicate data from the production tables to the analytics tables (sketched below).
Is that the proper way to do this?
The other option I have in mind is to run the analytics from a different database (as opposed to just different tables). I am using Amazon RDS with MySQL, if that makes a difference.
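As an illustration of the cron-driven copy I have in mind (all table names are made up, and I'm assuming the analytics tables live in the same MySQL instance):

```python
import pymysql


def refresh_analytics():
    """Rebuild a denormalized analytics table from the production tables.
    Meant to be run nightly from cron; all names are illustrative."""
    conn = pymysql.connect(host="localhost", user="etl", password="secret", database="shop")
    try:
        with conn.cursor() as cur:
            cur.execute("TRUNCATE TABLE analytics_daily_sales")
            cur.execute(
                """
                INSERT INTO analytics_daily_sales (shop_id, day, order_count, revenue)
                SELECT o.shop_id, DATE(o.created_at), COUNT(*), SUM(o.total)
                FROM orders o
                GROUP BY o.shop_id, DATE(o.created_at)
                """
            )
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    refresh_analytics()
```

A crontab entry such as `0 3 * * * python refresh_analytics.py` would run it nightly.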
It depends on how much analytics you want to provide.
I am a DWH manager and would start off with a small (free) BI (Business Intelligence) solution.
Your production DB and analytics DB should always be separate.
Take a look at Pentaho Data Integration (Community Edition). It's a free ETL tool that will help you move data from your production database to your analytics database, and it can also perform transformations.
Check out free reporting software like Jaspersoft to provide a reporting platform for your customers (if that's what you want; otherwise just use Excel).
BI never wants to throw away data. If you think the data in your analytics DB is going to grow large (2 TB+), don't use MySQL but rather PostgreSQL; MySQL does not handle big data well.
If you are really serious about this, read "The Data Warehouse Toolkit" by Ralph Kimball. That will give you some basic data-warehouse knowledge.
Amazon RDS provides something called a Read Replica, which automatically performs replication and is optimised for reads.
I like this solution for its convenience; the downside is its price tag.
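For completeness, a read replica can also be created programmatically; a minimal sketch with boto3 (the instance identifiers, region and instance class are placeholders):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # region is an assumption

# Create a read-only replica of the production instance for analytics/reporting.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="shop-analytics-replica",    # placeholder name
    SourceDBInstanceIdentifier="shop-production",     # placeholder name
    DBInstanceClass="db.t3.medium",
)
```

The analytics app then points its read connections at the replica's endpoint while the production instance keeps serving writes.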
I'm looking for a possible solution for the following problem.
First, the situation I'm in:
I have two databases, one Oracle DB and one MySQL DB. Although they have a lot of similarities, they are not identical. A lot of tables exist in both the Oracle DB and the MySQL DB, but the Oracle tables are often more extensive and contain more columns.
The situation with the databases can't be changed, so I have to deal with that.
Now I'm looking for the following:
I want to synchronise data from Oracle to MySQL and vice versa. This has to happen in real time, or as close to real time as possible: when changes are made in one DB, they have to be synced to the other DB as quickly as possible.
Also not every table has to be in sync, so the solution must offer a way of selecting which tables have to be synced and which not.
Because the databases are not identical, I don't think replication is an option. But what is?
I hope you guys can help me with finding a way of doing this or a tool which does exactly what I need. Maybe you know some good papers/articles I can use?
Thanks!
Thanks for the comments.
I did some further research on ETL and EAI.
I found out that what I am looking for is an ETL tool.
I read your question and your answer. I have worked with Oracle, SQL, ETL and data warehouses, and here are my suggestions:
Using a ready-made ETL tool is fine. But if your application is big enough that you need a tailor-made ETL process, I suggest building it yourself.
If your transactional database is on Oracle, you can set up triggers on the key tables that in turn invoke an external procedure written in C, C++ or Java.
The reason for using an external procedure is to be able to communicate with both databases -- Oracle and MySQL -- at the same time.
You can read more about Oracle External Procedures here.
If not through ExtProc, you can develop a separate application in Java or .Net that would extract data from the first database, transform it according to your business rules and load it into your warehouse.
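A rough sketch of that extract-transform-load loop. The answer suggests Java or .Net; Python with cx_Oracle and pymysql is used here only to illustrate the shape, and all table, column and connection names are made up:

```python
import datetime

import cx_Oracle   # Oracle source
import pymysql     # MySQL warehouse target


def last_successful_run():
    # Placeholder for real bookkeeping (e.g. a watermark table); here, "yesterday".
    return datetime.datetime.now() - datetime.timedelta(days=1)


def sync_customers():
    """Extract changed rows from Oracle, apply a trivial transformation,
    and upsert them into MySQL. All names are placeholders."""
    ora = cx_Oracle.connect("etl/secret@prod-oracle/ORCL")
    my = pymysql.connect(host="reporting-mysql", user="etl", password="secret", database="reporting")

    ora_cur = ora.cursor()
    ora_cur.execute(
        "SELECT id, first_name, last_name, email FROM customers WHERE updated_at > :since",
        since=last_successful_run(),
    )

    with my.cursor() as my_cur:
        for cid, first, last, email in ora_cur:
            full_name = f"{first} {last}".strip()   # the "business rule" transformation
            my_cur.execute(
                """
                INSERT INTO customers (id, full_name, email)
                VALUES (%s, %s, %s)
                ON DUPLICATE KEY UPDATE full_name = VALUES(full_name), email = VALUES(email)
                """,
                (cid, full_name, email),
            )
    my.commit()
    ora.close()
    my.close()


if __name__ == "__main__":
    sync_customers()
```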
Whichever approach you choose, you will have greater control over the ETL process if you implement your own tool rather than going with a ready-made one.
I have a production database server running MySQL 5.1, and we need to build an app for reporting that will fetch data from the production database server. Since reporting queries over the entire database may slow it down, we are planning to switch to NoSQL. The whole system runs on the AWS stack, and we plan to use DynamoDB. Kindly suggest ways to sync data from the production MySQL server to the NoSQL database server.
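So far the only thing I have sketched is a simple batch copy along these lines (pymysql and boto3, with made-up table and key names), but I'm not sure it is the right approach:

```python
from decimal import Decimal

import boto3
import pymysql

mysql = pymysql.connect(host="prod-mysql", user="report", password="secret",
                        database="shop", cursorclass=pymysql.cursors.DictCursor)
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("orders_reporting")   # hypothetical DynamoDB table

with mysql.cursor() as cur:
    cur.execute("SELECT id, customer_id, total, created_at FROM orders")
    with table.batch_writer() as batch:
        for row in cur:
            batch.put_item(Item={
                "id": row["id"],
                "customer_id": row["customer_id"],
                # DynamoDB expects Decimal rather than float for numbers
                "total": Decimal(str(row["total"])),
                "created_at": row["created_at"].isoformat(),
            })
```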
Just remember the simple fact that any NoSQL database is essentially a document database; it's really difficult to automatically convert a typical relational database in MySQL to a good document design.
In NoSQL you have a single collection of documents, and each document will probably contain data that would be in related rows in multiple tables. The advantage of a NoSQL redesign is that most data access is simpler and faster without requiring you to write complex join statements.
If you automatically convert each MySQL table into a corresponding NoSQL collection, you won't really be taking advantage of a NoSQL DB: you'll end up loading many more documents, and thus making many more calls to the database than needed, losing the simplicity and speed of a NoSQL DB.
Perhaps a better approach is to look at how your applications use the MySQL database and go from there. You might then consider writing a simple utility script yourself, since you know your MySQL database design well.
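For example, instead of mirroring the users and orders tables as separate collections, a single document per user that embeds the related rows might look like this (a sketch with pymongo; every field name is invented):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.shop

# One document per user, with the rows that used to live in a separate
# orders table embedded directly -- no join needed to read a user's history.
db.users.insert_one({
    "_id": 42,
    "name": "Alice",
    "email": "alice@example.com",
    "orders": [
        {"order_id": 1001, "total": 19.99, "items": ["mug", "poster"]},
        {"order_id": 1002, "total": 5.50, "items": ["sticker"]},
    ],
})

# A single lookup returns the user together with all of their orders.
alice = db.users.find_one({"_id": 42})
```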
As the data in a NoSQL database like MongoDB, Riak or CouchDB has a very different structure from a relational database like MySQL, the only way to migrate/synchronise the data is to write a job that reads from MySQL with SELECT queries and loads the results into the NoSQL database, as stated on the MongoDB website:
Migrate the data from the database to MongoDB, probably simply by writing a bunch of SELECT * FROM statements against the database and then loading the data into your MongoDB model using the language of your choice.
Depending on the quantity of data, this could take a while to process.
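A minimal version of such a job might look like this (pymysql and pymongo, with illustrative table and collection names), pulling the related rows for each user and embedding them in one document:

```python
import pymysql
from pymongo import MongoClient

mysql = pymysql.connect(host="localhost", user="etl", password="secret",
                        database="shop", cursorclass=pymysql.cursors.DictCursor)
mongo = MongoClient("mongodb://localhost:27017").shop

with mysql.cursor() as cur:
    cur.execute("SELECT * FROM users")
    for user in cur.fetchall():
        # Pull the related rows and embed them in the user document.
        with mysql.cursor() as order_cur:
            order_cur.execute("SELECT id, total FROM orders WHERE user_id = %s", (user["id"],))
            user["orders"] = order_cur.fetchall()
        mongo.users.replace_one({"_id": user["id"]}, {**user, "_id": user["id"]}, upsert=True)
```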
If you have any other questions, don't hesitate to ask.
I'm not sure if this fits Stack Overflow exactly, but since I'm looking for code rather than a tool, I think it does.
I'm looking for a way to replicate/synchronize different database systems -- in this case MySQL and MongoDB. We run both for different purposes. We started with a MySQL database and added MongoDB later for special applications. There is data we would like to have in both databases, with constraints in MySQL and DBRefs in MongoDB. For example: we need a user record in MySQL, but also in MongoDB, for references between tables and objects respectively. At the moment we have a cron job which dumps the MySQL data and imports it into MongoDB. Although it works quite well, that's not the solution we would like to have.
I think for the moment one-way replication would be enough -- MySQL -> MongoDB. The important part is that the replication works in "real time", much like MySQL master -> slave replication.
Are there already any solutions for this problem or ideas anyone of how to achieve this?
Thanks!
SymmetricDS is open-source, Java-based, web-enabled, database-independent data synchronization/replication software that might do the trick with a few tweaks. It has an extension point called IDataLoaderFilter which you could use to implement a MongodbDataLoader.
This would help with one-way database replication. It might be a little more difficult to synchronize from MongoDB -> relational database, but the SymmetricDS team would be very helpful in trying to find a solution.
What you're looking for is called EAI (Enterprise Application Integration). There are a lot of commercial tools around, but under the provided link you'll also find a couple of OSS solutions. The basis of EAI is that you have data sources and data sinks, and the EAI framework offers tools to build custom pumps between the two.
I suggest either using a DB trigger to start the synchronization or sending a trigger signal from your applications. Note that there is no one-size-fits-all solution, since synchronization can become arbitrarily complex (for example, how do you make sure that all rows are copied?).
As far as I can see, you need to develop some sort of "control program" that has drivers for each DBMS and runs as a daemon. The daemon should react to a trigger or use a very small recheck interval to keep the DBs synchronized.
Technically, you could set up a process which parses the binary log of the MySQL server and replicates the relevant SQL statements. I've never done such a thing with a different kind of database as the slave, but maybe it is worth a shot?
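A sketch of what tailing the binlog could look like with the python-mysql-replication library, assuming row-based binary logging is enabled on the MySQL server (table, column and connection settings are illustrative):

```python
from pymongo import MongoClient
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent,
    UpdateRowsEvent,
    WriteRowsEvent,
)

mongo = MongoClient("mongodb://localhost:27017").shop

stream = BinLogStreamReader(
    connection_settings={"host": "localhost", "port": 3306, "user": "repl", "passwd": "secret"},
    server_id=100,                     # must be unique among replication clients
    only_tables=["users"],             # only sync the tables we care about
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,                     # keep waiting for new events, like a slave would
    resume_stream=True,
)

for event in stream:
    for row in event.rows:
        if isinstance(event, DeleteRowsEvent):
            mongo.users.delete_one({"_id": row["values"]["id"]})
        else:
            values = row["after_values"] if isinstance(event, UpdateRowsEvent) else row["values"]
            mongo.users.replace_one({"_id": values["id"]}, values, upsert=True)
```

This keeps MongoDB close to real time without touching the application code, at the cost of operating one more long-running process.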