Is MySQL Cluster a NoSQL technology, or is it another way to use a relational database?
MySQL Cluster uses MySQL Servers as API nodes to provide SQL access/a relational view to the data. The data itself is stored in the data nodes - which are separate processes. The fastest way to access the data is through the C++ API (NDB API) - in fact that is how the MySQL Server gets to the data.
There are a number of NoSQL access methods for getting to the data (that avoid going through the MySQL Server/relational view), including REST, Java, JPA, LDAP and, most recently, the Memcached key-value store API.
It is another way to use the database: it spreads the data across multiple machines and allows a simplified concurrent-master setup. This comes at a cost, in that your indexes cannot exceed the amount of RAM available to hold them. To your application, it looks no different from regular MySQL.
Perhaps take a look at "Can MySQL Cluster handle a terabyte database".
Related
For a project, we are working with an external partner, and we need access to their MySQL database. The problem is that they can't grant it: their database is hosted in a managed environment where they have few configuration options, and they don't want to give us access to all of their data. The solution they came up with is the FEDERATED storage engine.
We now have one table for each table of their database. The problem is that the amount of data we receive is huge and will keep increasing, which means a lot of inserts are performed on our database. The optimal solution for us would be to intercept all incoming MySQL traffic, process it, and then store it in bulk. We also thought about using something like Redis to store the data.
Additionally, we plan to get more data from different partners, who will potentially provide it in different ways. Using Redis would allow us to have all our data in one place.
Copying the data to Redis after it is stored in the MySQL database is not an option. We just can't handle that many inserts, and we need the data as fast as possible.
TL;DR
Is there a way to pretend to be a MySQL server so that we can directly process data received via the FEDERATED storage engine?
We also thought about using the blackhole engine in combination with binary logging on our side. So incoming data would only be written to the binary log and wouldn't be stored in the database. But then performance would still be limited by Disk I/O.
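The "process it and then store it in bulk" idea can be sketched independently of how the rows arrive. The following is only a rough illustration and does not address impersonating a MySQL server: the `BatchWriter` class, the `events` table and its columns are hypothetical names, and an in-memory `sqlite3` database stands in for whatever store the data would really go to.

```python
import sqlite3


class BatchWriter:
    """Buffer rows in memory and write each full batch in one bulk statement."""

    def __init__(self, conn, batch_size=1000):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []

    def add(self, row):
        # Collect the row; only touch the database once the batch is full.
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One transaction and one executemany call per batch.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO events (partner, payload) VALUES (?, ?)",
                self.buffer,
            )
        self.buffer.clear()


# Assumption: sqlite3 in memory stands in for the real target store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (partner TEXT, payload TEXT)")

writer = BatchWriter(conn, batch_size=3)
for i in range(7):
    writer.add(("partner_a", f"row {i}"))
writer.flush()  # write the remaining partial batch

stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

A larger batch size means fewer round trips but more rows held in memory (and lost on a crash) before they are written, so the value is a trade-off rather than a fixed recommendation.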
I am creating a messaging app. My users are stored in a MySQL database and my messages are stored in Google Datastore, a NoSQL database. However, I was wondering what the drawbacks would be of keeping my messages in a MySQL database as well, since I am fetching the message and the user simultaneously.
Are there performance drawbacks?
Generally, using different databases should not affect anything if your backend architecture is well defined. A database only stores the data you manipulate. I assume you use MySQL for authentication and store the message data in Google Datastore. Performance drawbacks come from the bandwidth of your server.
I suggest you use the same database to store all your data; it will be more stable and easier to manage.
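To illustrate what keeping everything in one relational database would look like, here is a minimal sketch in which a message and its sender are fetched with a single JOIN. The `users` and `messages` tables and their columns are made up for the example, and `sqlite3` is used only as a stand-in so the snippet runs without a MySQL server; the SQL itself is plain enough to carry over.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    sender_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL
);
INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob');
INSERT INTO messages (id, sender_id, body) VALUES
    (1, 1, 'hello'), (2, 2, 'hi there'), (3, 1, 'how are you?');
""")

# One round trip returns each message together with its sender's name.
rows = conn.execute("""
    SELECT m.id, u.name, m.body
    FROM messages AS m
    JOIN users AS u ON u.id = m.sender_id
    ORDER BY m.id
""").fetchall()
```

With the data split across two stores, the same result needs one call to each system plus merging in application code.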
I have to design a piece of software on a three-layer architecture:
A process that periodically polls a data source, such as an FTP server, and injects the data into a database
A database
Spark for the processing of the data
My data is simple and perfectly suitable for being stored in a single RDBMS table, or I could store it in Cassandra. Periodically, I would need Spark to run some machine learning algorithms on the whole data set.
Which database better suits my use case? In detail, I do not need to scale across multiple nodes, and I think the main underlying questions are:
Is simple querying (SELECT) faster on Cassandra or MySQL on a simple table?
Does the Spark connector for Cassandra benefit from any Cassandra features that make it faster than a SQL connector?
You can use MySQL if the data size is less than 2 TB. SELECT queries on a MySQL table will be more flexible than in Cassandra.
You should use Cassandra when your data storage requirement exceeds a single machine. Cassandra needs careful data modeling for each lookup or SELECT scenario.
You can use the approach suggested below for MySQL and Spark integration:
How to work with MySQL and Apache Spark?
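As a rough sketch of what that integration usually involves: Spark can read a MySQL table through its generic JDBC data source. The helper below only builds the option dictionary, so it runs without Spark installed; the host, database, table and credentials are placeholders, `mysql_jdbc_options` is a hypothetical helper name, and the driver class assumes MySQL Connector/J 8 or later is on the classpath.

```python
def mysql_jdbc_options(host, database, table, user, password, port=3306):
    """Build the options for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",  # assumes Connector/J 8+
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Placeholder connection details, not taken from the question.
options = mysql_jdbc_options(
    "db.example.internal", "appdb", "measurements", "spark_reader", "change-me"
)

# With PySpark installed and the Connector/J jar available, the read would be:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("mysql-read").getOrCreate()
# df = spark.read.format("jdbc").options(**options).load()
```

Since the question rules out multi-node scaling, a plain single-partition JDBC read like this is likely enough; partitioned reads only start to matter when the table is large.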
It all depends on the data: size, integrity, scale, flexible schema, sharding, etc.
Use MySQL if:
Data size is small (in single-digit TBs)
Strong consistency (Atomicity, Consistency, Isolation & Durability) is required
Use Cassandra if:
Data size is huge and horizontal scalability is required
Eventual consistency (Basically Available, Soft state, Eventual consistency) is acceptable
Flexible schema
Distributed application.
Have a look at this benchmarking article and this pdf
I think it's better to use a SQL database such as MySQL. Cassandra should only be used if you need to scale your data to much larger volumes and across many datacenters. The Java Cassandra JDBC driver is just a normal driver for connecting to Cassandra; it doesn't have any special advantages over other database drivers.
I have a 2 GB DigitalOcean VPS with 2 CPUs to host a social network app written in Java. Right now my app stores its data in Cassandra, but Cassandra is a new technology and not as reliable as MySQL, which has been around for years, and I don't have much experience managing Cassandra as a DBA. So I want to change my primary data source back to MySQL. However, some of the data is schemaless; for example, there are lists specific to each user that are easily stored in Cassandra. For this type of data, I would use Cassandra as the primary database.
To sum up, I would replicate all of my data in both databases. Data would be written to both databases but read from whichever returns it fastest. This would help if the entire Cassandra cluster goes down, since I could serve from MySQL, or vice versa. Is this usually done, and is it recommended?
(Right now I have a single 2 GB VPS that would host my app as well as the databases)
Normally, people do not run two separate database systems just to protect against data loss and to allow recovery; it is better to rely on the replication or mirroring of a single database system. Both databases are good and provide adequate replication solutions, so it is better to choose one of them.
I have a production database server running MySQL 5.1, and we now need to build a reporting app that fetches data from it. Since reporting queries that scan the entire database may slow it down, we are planning to switch to NoSQL. The whole system runs on the AWS stack, so we plan to use DynamoDB. Kindly suggest ways to sync data from the production MySQL server to the NoSQL database.
Just remember that many NoSQL databases are essentially document databases; it's really difficult to automatically convert a typical relational MySQL database into a good document design.
In a document database you typically have a single collection of documents, and each document will probably contain data that would sit in related rows across multiple tables. The advantage of a NoSQL redesign is that most data access is simpler and faster, without requiring you to write complex join statements.
If you automatically convert each MySQL table to a corresponding NoSQL collection, you really won't be taking advantage of a NoSQL DB. You'll end up loading many more documents, and thus making many more calls to the database than needed, losing the simplicity and speed of a NoSQL DB.
Perhaps a better approach is to look at how your applications use the MySQL database and go from there. You might then consider writing a simple utility script, knowing full well your MySQL database design.
Because data in a NoSQL database like MongoDB, Riak or CouchDB has a very different structure from data in a relational database like MySQL, the only way to migrate or synchronise it is to write a job that reads the data from MySQL with SELECT queries and writes it to the NoSQL database, as stated on the MongoDB website:
Migrate the data from the database to MongoDB, probably simply by writing a bunch of SELECT * FROM statements against the database and then loading the data into your MongoDB model using the language of your choice.
Depending on the quantity of your data, this could take a while to process.
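A minimal sketch of such a job, under loudly stated assumptions: the `customers` and `orders` tables are invented for illustration, `sqlite3` stands in for the MySQL source so the snippet is runnable, and the result is just a list of Python dicts that you would then hand to whatever client your NoSQL database provides. The point it tries to show is that related rows get embedded in one document rather than each table becoming its own collection.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL source database.
src = sqlite3.connect(":memory:")
src.row_factory = sqlite3.Row
src.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.5), (12, 2, 9.99);
""")


def build_customer_documents(conn):
    """Turn customer rows plus their order rows into one document each."""
    docs = {}
    for row in conn.execute("SELECT * FROM customers ORDER BY id"):
        docs[row["id"]] = {"_id": row["id"], "name": row["name"], "orders": []}
    for row in conn.execute("SELECT * FROM orders ORDER BY id"):
        # Embed each order inside the document of the customer it belongs to.
        docs[row["customer_id"]]["orders"].append(
            {"order_id": row["id"], "total": row["total"]}
        )
    return list(docs.values())


documents = build_customer_documents(src)
```

For a large source database you would probably page through the tables instead of holding every document in memory at once.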
If you have any other questions, don't hesitate to ask.