How does MySQL Cluster (NDB) compare against MongoDB? It seems that while both NDB and Mongo support scaling out over commodity machine nodes, NDB also provides relational capabilities such as JOINs, transactions, etc.
Therefore, under what situations would one choose Mongo over NDB?
Even though MySQL Cluster (NDB) is a shared-nothing approach that scales a relational database across commodity machines, it comes with limitations and performance impacts. You can read the full details at the links below, but some of the more important features have not been supported in NDB, such as foreign keys (which were not available before MySQL Cluster 7.3). That may make you question why you would cluster an RDBMS in the first place if you have to give up some of the features you were expecting to leverage.
18.1.5.1 Differences Between the NDB and InnoDB Storage Engines
What are the limitations of implementing MySQL NDB Cluster?
I come from a relational background, and things like MongoDB did not initially click with me, but after tinkering with it for a few weeks, I was surprised at how much is possible while not being subject to traditional schema guidelines and transactional overhead that comes with relational databases. If you really want true, horizontal scalability and are willing to give up the luxury of joins and foreign keys, you should seriously consider using Mongo or something similar that falls under the NoSQL category.
If you want to keep your SQL/relational database structure, then go with NDB.
If you want to build data that is a little more hierarchical in structure, you should go with MongoDB.
It seems like most large companies that have to shard their databases choose MySQL over PostgreSQL. What are the major advantages that MySQL has over PostgreSQL when it comes to distributed databases? I don't see any major downside to Postgres that would prevent a successful implementation of sharding at the application level, but the sheer number of companies that choose MySQL over Postgres is giving me pause and making me wonder if I'm missing something.
PARTITIONing involves a single server; Sharding involves many servers. They solve (or fail to solve) different problems. Partitioning provides very few use cases to justify its existence; sharding provides write scaling at the cost of complexity.
MySQL has no built-in sharding capability. There are third-party packages that assist with this, but there is still a large burden on the DBA. (See Spider and various proxy servers.)
So, I see no reason why Postgres (or any other RDBMS) could not be sharded. After all, you do most of the work; the RDBMS sits on multiple machines not realizing that there are siblings with other chunks of the data.
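To make "you do most of the work" concrete, here is a minimal sketch of application-level shard routing, assuming a simple hash-modulo scheme. The shard names and the `shard_for`, `put` and `get` helpers are hypothetical, and the dictionaries stand in for connections to separate database servers.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate database server.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a sharding key (e.g. a user id) to one shard, deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# In-memory stand-ins for the per-shard databases.
stores = {name: {} for name in SHARDS}


def put(key: str, row: dict) -> None:
    # The application, not the RDBMS, decides which server receives the row.
    stores[shard_for(key)][key] = row


def get(key: str):
    # Reads must be routed with the same function as writes.
    return stores[shard_for(key)].get(key)


put("user:42", {"name": "Ada"})
```

Note that plain modulo routing like this forces most keys to move when the number of shards changes, which is part of the operational burden mentioned above.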
(Disclaimer: I am very familiar with MySQL, and not familiar with Postgres.)
I am designing a transportation system in which I need to store the location of the vehicles at least once or twice a minute. I want to find out which database is the better choice (MySQL or MariaDB) for this case in terms of performance and scalability. How much would it be worth to switch to a NoSQL database such as MongoDB or similar?
If you want to use features provided by NoSQL, you may choose MariaDB. It has a Cassandra storage engine, and you can use dynamic columns to store NoSQL-like data inside a MySQL-compatible engine.
In terms of scaling
NoSQL’s simpler data models can make the process easier, and many have been built with scaling functionality from the start. That is a generalization, so seek expert advice if you encounter this situation.
In terms of performance
NoSQL’s simpler denormalized store allows you to retrieve all information about a specific item in a single request. There’s no need for related JOINs or complex SQL queries.
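As a rough illustration of the difference, the sketch below fetches the same vehicle record two ways: through a JOIN over normalized tables, and through one lookup of a denormalized document. It uses Python's built-in `sqlite3` and a plain dictionary as stand-ins; the table names, the `vehicle:1` key and the sample values are all made up for the example.

```python
import json
import sqlite3

# Relational layout: the vehicle and its position live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicles (id INTEGER PRIMARY KEY, plate TEXT);
    CREATE TABLE positions (vehicle_id INTEGER, lat REAL, lon REAL);
    INSERT INTO vehicles VALUES (1, 'ABC-123');
    INSERT INTO positions VALUES (1, 52.52, 13.40);
""")
joined = conn.execute(
    "SELECT v.plate, p.lat, p.lon "
    "FROM vehicles v JOIN positions p ON p.vehicle_id = v.id "
    "WHERE v.id = ?",
    (1,),
).fetchone()

# Denormalized layout: everything about the vehicle is stored under one key.
documents = {
    "vehicle:1": json.dumps(
        {"plate": "ABC-123", "position": {"lat": 52.52, "lon": 13.40}}
    )
}
doc = json.loads(documents["vehicle:1"])  # one lookup, no JOIN
```

The trade-off is that the denormalized copy has to be kept up to date by the application whenever the underlying facts change.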
Where do you need NoSQL?
unrelated, indeterminate or evolving data requirements
speed and scalability are imperative
Where do you need MySQL?
logically related, discrete data requirements which can be identified up-front
data integrity is essential
EDIT:
You may check this link. It explains RDBMS vs NoSQL very well.
Right now I'm trying to choose the most appropriate approach for implementing an audit trail for my entities with an AWS RDS MySQL database.
I have to log all entity changes, including the initiator (user) who made them. One of the main criteria is performance.
Hibernate Envers looks like the easiest and most complete solution and can be integrated very quickly. Right now I'm worried about a possible performance slowdown after introducing Envers. I saw a few posts where developers prefer an audit trail approach based on database triggers.
The main issue with triggers is how to get the initiator (user) who made the changes.
Based on your experience, could you please suggest an approach for Java/Spring/Hibernate/MySQL (AWS) to implement an audit trail for historical changes?
Also, is there any solution for an audit trail within the AWS RDS MySQL infrastructure itself?
Understand that speculation about performance without concrete evidence to support one's theory is analogous to premature optimization of code. It's almost always a waste of time.
From a simple database point of view, as a table grows past a certain size, yes, its performance will degrade, but typically this mainly impacts queries and has less effect on inserts/updates if the table is properly indexed and the queries are properly formed.
But many databases support partitioning as a means to control performance concerns, particularly on larger tables. This typically involves separating a table's data across a set of boundaries defined by a partition scheme you create. You simply define what is the most relevant data and you try and store this partition on your fastest drives/storage and the less relevant, typically older, data is stored on your slower drives/storage.
You can also elect to store database tables in differing schemas/tablespaces by specifying the Envers property org.hibernate.envers.default_schema. If your database supports putting schemas in different database files on the file system, you can help performance by keeping reads/writes on your entity tables from competing with reads/writes on your audit tables.
I can't speak to MySQL's support for any of these things, but I do know that MSSQL and Oracle support partitioning very easily, and Oracle for sure allows the separation of schemas across differing database files.
Having studied relational databases, document stores, graph databases, and column-oriented databases, I concluded that something like Cassandra best fits my needs. In particular, the ability to add columns on the fly and the lack of a strict schema requirement seal the deal for me. This seems to nicely bridge the gap between a rather novel graph DB and a time-tested RDBMS.
But I am concerned about running Cassandra on a single node. Like many others, I can only start with a small amount of data, so more than one node to start with is just not practical. Based on another excellent SO question (Why don't you start off with a "single & small" Cassandra server as you usually do it with MySQL?), I concluded that Cassandra can indeed be run just fine as a single node, as long as one is willing to give up benefits like availability, which come from a multi-node setup.
There also seem to be ways of implementing dynamic adding of fields in an RDBMS for instance as discussed here on SO: How to design a database for User Defined Fields? This would, to some extent, mimic schemaless-ness.
So I would now like to understand how Cassandra and MySQL compare with regard to features and performance on a single-node setup. What would you advise someone in my situation: start with a simple RDBMS with the plan/intent to switch to Cassandra later on? Or start with Cassandra?
In a single node setup of Cassandra, many of the advantages of Cassandra are lost, so the main reason for doing that would be if you intended to expand to multiple nodes in the future. Performance would tend to favor RDBMS in most applications when using a single node since RDBMS is designed for that environment and can assume all data is local.
The strengths of Cassandra are scalability and availability. You can add nodes to increase capacity and having multiple nodes means you can deal with hardware failures and not have downtime. These strengths come at the cost of more difficult schema design since access is based primarily on consistent hashing. It also means you don't have full SQL available and often must rely on denormalization techniques to support fast access to data. Cassandra is also weak for ACID transactions since it is inherently difficult to coordinate atomic actions on multiple nodes.
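For readers unfamiliar with consistent hashing, the following is a generic sketch of a hash ring with virtual nodes. It is not Cassandra's actual partitioner; the `HashRing` class, the node names and the choice of SHA-256 for tokens are assumptions made for illustration. It shows the property that matters for scaling: when a node is added, only the keys that now belong to the new node change owner.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Hash a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._tokens = []  # sorted ring positions
        self._owners = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node claims several positions to spread load evenly.
        for i in range(self._vnodes):
            token = _token(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owners[token] = node

    def node_for(self, key: str) -> str:
        # A key belongs to the first position clockwise from its own token.
        idx = bisect.bisect_right(self._tokens, _token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


keys = [f"row-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Because the row key determines placement, queries that cannot supply that key have no efficient path to the data, which is why schema design is driven by access patterns.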
RDBMS by contrast is a more mature technology. ACID transactions are no problem. Schema design is much simpler since you can add efficient indexes to any column to optimize queries, and you have joins available so that redundant data can be largely eliminated. By eliminating redundant data it is much easier to keep your data consistent, since there are not multiple copies of data that need to be updated when someone changes their address for example. But you run the risk of running out of space on a single machine to store all your data. And if you get a disk crash you will have downtime and need backups to restore the data, while Cassandra can often easily repair the data on a node that is out of sync. There is also no easy way to scale an RDBMS to handle higher transaction rates other than buying a faster machine.
There are a lot of other differences, but those are the major ones. Neither one is better than the other, but each one may be better suited to certain applications. So it really depends on the requirements of your use case which one will be a better fit.
Simple question: could I conceivably use Redis instead of MySQL for all sorts of web applications, such as social networks, geo-location services, etc.?
Nothing is impossible in IT. But some things might get extremely complicated.
Using key-value storage for things like full-text search might be extremely painful.
Also, as far as I can see, it lacks support for large, clustered databases: on MySQL you have no problems if your database grows beyond hundreds of GB, and on Redis... well, it will require more effort :-)
So use it for what it was developed for: storing simple things which just need to be retrieved by id.
ACID compliance is a must if data integrity is important. Medical records and financial transactions are examples. Most NoSQL solutions, including Redis, are fast because they trade ACID properties for speed.
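A small sketch of what atomicity buys you, using Python's built-in `sqlite3` as a stand-in for any ACID-compliant RDBMS. The `accounts` table and the `transfer` helper are hypothetical. The second transfer fails halfway through, and the database rolls back the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if the source would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # overdraft: whole transfer undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Without a transaction, the application itself would have to detect the failed debit and undo the credit.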
Sometimes data is simply more convenient to represent using a relational database and the queries are simpler.
Also, thanks to foreign key relationships and constraints in relational databases, your data is more likely to be correct. Keeping data in sync in NoSQL solutions is more difficult.
So, no, I don't think we can talk about a full replacement. They are different tools for different jobs. I wouldn't trade my hammer for a screwdriver.
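Here is a minimal sketch of a foreign key constraint doing that job, again with `sqlite3` standing in for a relational database and with made-up `users` and `posts` tables. The database itself refuses a row that points at a user who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id), "
    "body TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'ada')")
conn.execute("INSERT INTO posts VALUES (1, 1, 'hello')")  # valid reference

try:
    # No user with id 999 exists, so the database rejects the orphan row.
    conn.execute("INSERT INTO posts VALUES (2, 999, 'orphan')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
```

In a key-value store the equivalent check has to live in application code, and every writer has to remember to run it.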