Library for sharding MySQL in Java

Soon, we are going to shard our MySQL database to achieve horizontal scaling. Our technology stack is based on Spring and Hibernate.
However, I haven't been able to find any alternative to Hibernate that supports sharding at the application level.
I read about Hibernate Shards, but it is no longer maintained, so it would not be suitable to use in production.
Moreover, with companies like Facebook, Twitter, and Digg using MySQL sharding, I am surprised that there is no GA Hibernate alternative for sharding.
I would appreciate it if someone could suggest a persistence framework in Java that supports sharding out of the box.
Thanks in advance!!!!

Disclaimer: I work for ScaleBase, a provider of a complete MySQL scale-out solution, an "automatic sharding machine" if you like.
I'm a believer in a solution that lives outside the code. This way the entire ecosystem, including ad-hoc and administration queries from the MySQL command line and utilities like mysqldump, is also "aware" of the sharding.
This was the main disadvantage of Hibernate Shards, or of any sharding framework that is limited to the persistence layer inside the application code.
We have some good resources about key-based sharding (hash, range, list) and data distribution on our site: http://www.scalebase.com/products/database-sharding/ http://www.scalebase.com/resources/webinars/ - (search for "WEBINAR – 10.23.12: Benefits of Automatic Data Distribution")
I also invite you to look at my blog: http://database-scalability.blogspot.com
Hope I helped.
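To make the three key-based strategies mentioned above (hash, range, list) concrete, here is a minimal sketch of how a router might pick a shard from a key. It is written in Python for brevity and is not how ScaleBase or Hibernate Shards implement routing; the shard names, the range boundaries, and the country mapping are all made-up assumptions.

```python
import bisect
import zlib

# Hypothetical shard names; in practice each would map to a MySQL connection.
SHARDS = ["shard0", "shard1", "shard2"]

def hash_shard(key: str) -> str:
    """Hash sharding: a stable hash of the key, modulo the number of shards."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]

# Range sharding: ids below 1000 go to shard0, 1000..1999 to shard1, the rest to shard2.
RANGE_BOUNDS = [1000, 2000]  # assumed boundaries, for illustration only

def range_shard(user_id: int) -> str:
    """Range sharding: find which interval of the key space the id falls into."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, user_id)]

# List sharding: an explicit mapping from key values to shards.
LIST_MAP = {"US": "shard0", "CA": "shard0", "DE": "shard1", "FR": "shard1"}

def list_shard(country: str) -> str:
    """List sharding: look the value up; unlisted values go to a default shard."""
    return LIST_MAP.get(country, "shard2")
```

Hash sharding spreads keys evenly but makes range scans touch every shard, while range and list sharding keep related rows together at the risk of hot spots.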

Related

AWS RDS MySQL or Postgres - performance-wise and cost-wise?

I want to host a Django application on AWS and use AWS RDS for the database. The application is a blog-like system.
I can't decide which RDS engine to choose, MySQL or Postgres, in terms of both price and performance under the AWS pricing policy.
This is a broad question and the answer may be opinionated, so I'll try to keep it short and summarize what I have read:
MySQL would be very good for any CMS site, as it works very well with them, and MyISAM tables are quite nice here.
From what I have read, PostgreSQL does better than MySQL at:
Multi-application databases
Advanced data modelling
Advanced data modelling means that PostgreSQL is far more mature at doing complex data modelling than MySQL is. It has a very mature extensible type system, a wide range of procedural languages, and a great deal of flexibility in how these languages can be plugged into existing queries.
If that wasn't enough, you can essentially build your data model in PostgreSQL based not only on what information you are storing but also on what information is commonly derived from it. This makes things like non-first-normal-form designs actually sane to use where they are needed. Add collections and multiple inheritance in table structure and you have a very sophisticated data modelling platform. This blog would help you understand it better.
Besides the content management system market, MySQL's other major market is in applications where data is not expected to be exposed to more than one writing application at a time. This leads to a significant difference in handling data validation, etc.
In PostgreSQL validation is always equally strict. If the app expects special error treatment it had better call functions or casts to handle this explicitly.
MySQL, however, places the application in charge of defining the data validation rules. So while PostgreSQL allows the relational and object-relational interface to serve as a public API, in MySQL that interface is essentially intended to be a private API for a single application. This is a huge difference, not readily understood by many people trying to make the choice, and it leads to major differences in application design.
MySQL is a data storage and reporting solution for your application.
PostgreSQL is a data centralization, modelling, and reporting solution for your organization. The two are remarkably different.
Now for the second question, pricing: as you can see from the MySQL pricing page and the PostgreSQL pricing page, MySQL is a bit cheaper than PostgreSQL. After reading this answer you can make an informed decision about which would be best for you.
Hope this Helps!
I'm gonna offer you a 3rd option: Aurora - try it. It's cheaper than those 2 and is MySQL compatible.
This article may be of help to you when deciding.
For a simple blog-like thing I'd go with MySQL (or the Aurora MySQL-compatible version)
For data-critical and highly relational solutions I might also consider Postgres (Aurora)

MongoDB for small datasets

Are there any benefits to using MongoDB for a Node.js application rather than a traditional SQL database such as MySQL, if I'm not planning to have large (>1000 item) collections and am already comfortable with SQL?
MongoDB is a schema-less, document-based database. This means you can insert a JSON object with other objects nested inside it. This can make development easier, especially for prototyping.
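As a rough illustration of what "a JSON object with other nested objects" looks like, here is a sketch in plain Python that builds such a document and round-trips it through JSON. It does not talk to MongoDB; the field names are invented, and with a driver such as PyMongo you would hand the same dict to something like `collection.insert_one(post)`.

```python
import json

# A blog post with its author, tags, and comments nested inside one document.
# In a relational schema this would typically be spread over several tables.
post = {
    "title": "Hello",
    "author": {"name": "Ann", "email": "ann@example.com"},
    "tags": ["intro", "meta"],
    "comments": [{"user": "bob", "text": "Nice post"}],
}

# Schema-less: another document in the same collection may have different fields.
draft = {"title": "Untitled", "draft": True}

# The nested structure survives a JSON round trip unchanged.
serialized = json.dumps(post)
restored = json.loads(serialized)
```

The flip side is that nothing stops `draft` from omitting `author`, so the application has to cope with missing fields itself.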
For a small project, why not? For a larger project you should do more research. Large or small, doesn't hurt to do the research anyway. You want to consider how your application uses the database (reads vs writes) and how MongoDB scales horizontally, and how it handles failures.
There's a thing called the CAP theorem, which describes the trade-offs distributed databases make among consistency, availability, and partition tolerance. MongoDB is usually classified as CP. This visual guide shows the relationships between different databases. What is most important to you and your application?
Something else to consider is that many NoSQL databases are not fully ACID compliant. If you're using MySQL with InnoDB, that can be something significant to give up, depending on your application. Transactions, for example, are something you might not want to lose.
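To show what a transaction buys you, here is a small sketch of an all-or-nothing update. It uses Python's built-in `sqlite3` module as a stand-in for MySQL with InnoDB (an assumption made only so the example runs without a server); the `accounts` table and the `transfer` function are hypothetical.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so both updates are undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Without transactions, the application itself would have to detect and undo the half-finished second transfer.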
Lots of pros and cons. Best thing to ask yourself is: What am I gaining? What am I giving up? There are many things, and it really depends on your use-case.
There are lots of reasons to stick with a simple dbms for a small-scale application. One of them is the widespread availability of cheap hosting services providing MySQL. Another is ease of deployment and maintenance.
Of course, if you're trying to learn to use MongoDB, go for it!

OpenStack Nova switching to Cassandra -- pros and cons?

OpenStack Nova is currently using MySQL (powered by SQLAlchemy) as its db backend. What would be the pros and cons of switching to Cassandra?
OpenStack uses MySQL as a backend for persisting service schema and the state of various artifacts (nodes, roles, networks, security groups, etc.). The transactional load on the persistence store is not very intensive, so NoSQL is a good option in general. Here are some pros/cons:
PROS:
persistence store high availability out of the box
live horizontal scalability
better multi-tenancy, given the large schematic scope and scalability of Cassandra
enablement for analytics: sitting on a NoSQL store it becomes more straightforward to introduce analytics functionality within openstack
CONS:
code redesign: OpenStack's code is centered on the relational database model. Migrating to NoSQL would require a substantial redesign of all OpenStack projects, as well as introducing an indexing model within Cassandra to relate data. Changes like this often require time, thinking and stability
more complex administration/maintenance than MySQL
potential for data conflicts: Cassandra has an eventually consistent model, although, given the low-concurrency transactional use of OpenStack, this should not be much of a problem at first sight
performance, although again, as OpenStack is not really "transactional" and has its own performance issues (Python-based code and services), this should not be much of a problem either.
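The "data conflicts" point in the list above can be sketched as follows. Cassandra reconciles concurrent writes by timestamp (last write wins); the toy merge below imitates that idea on plain dictionaries. The replica contents, timestamps, and the `lww_merge` helper are invented for illustration and are not Cassandra's or Nova's actual code.

```python
def lww_merge(a, b):
    """Merge two replica states of the form {key: (timestamp, value)}.

    For each key the write with the newest timestamp wins; the older
    write is silently discarded.
    """
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Two replicas accepted different writes for the same instance record.
replica_a = {"vm-1": (10, "ACTIVE")}
replica_b = {"vm-1": (12, "STOPPED"), "vm-2": (5, "BUILD")}

converged = lww_merge(replica_a, replica_b)
```

Both replicas end up agreeing, but the "ACTIVE" write is lost without any error, which is the kind of conflict a relational transaction would have surfaced instead.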

how to apply sharding + replication to MySQL relational db

I wanted to know, what are the different techniques for sharding and replication that can be applied to MySQL or any other relational database?
Are there any guidelines/rules that I should be aware of?
Basically, I want to create a custom MySQL (or other relational DB) setup that supports sharding and replication. Most of the sites I see explain some technology or service that takes care of sharding/replication. I want to understand the concepts and apply them myself to a regular MySQL database.
Sharding is not provided for MySQL "out of the box".
ScaleBase (disclaimer: I work there) makes a complete scale-out solution, an "automatic sharding machine" if you like. The application or any other client tool (mysql, mysqldump, phpMyAdmin...) connects to the ScaleBase controller, which looks and feels like a MySQL server but is a proxy to a grid of "shards". It automates command routing, parallelizes cross-DB queries, and merges the results, so you wouldn't feel any difference: the result looks just like one that came from a single DB. ORDER, GROUP, LIMIT, and aggregate functions are supported! The routing and parallelizing are done inside the "controller" according to the command and its parameters. From experience with our customers, not only did we get great performance improvements with parallel queries, we also improved maintenance: think about creating an index or adding a column to a table. These are also parallelized and run much faster. All with no changes, or pretty minimal changes, to the code.
Also - I invite you to look at my blog about the subject: http://database-scalability.blogspot.com/.
PS - I forgot to mention, that ScaleBase also completes the solution for replication with a frontend that manages automatic failover and Read/Write splitting over replicated databases.
Hope I helped
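Since the question asks about the concepts, here is a minimal sketch of the "parallelize, then merge" idea described above for a query with ORDER BY and LIMIT. It is an illustration under assumptions, not ScaleBase's implementation: the shards are in-memory lists, and `query_shard` stands in for sending the same SQL to each MySQL shard.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical shard contents; each list stands in for one MySQL database.
SHARD_ROWS = {
    "shard0": [{"id": 1, "score": 50}, {"id": 4, "score": 90}],
    "shard1": [{"id": 2, "score": 70}, {"id": 5, "score": 20}],
    "shard2": [{"id": 3, "score": 85}],
}

def query_shard(shard, limit):
    """Stand-in for 'SELECT ... ORDER BY score DESC LIMIT n' run on one shard."""
    rows = sorted(SHARD_ROWS[shard], key=lambda r: r["score"], reverse=True)
    return rows[:limit]

def top_n(limit):
    """Scatter the query to all shards in parallel, then merge the sorted results."""
    with ThreadPoolExecutor(max_workers=len(SHARD_ROWS)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, limit), SHARD_ROWS))
    # Each partial result is already sorted descending, so a k-way merge is
    # enough; the global LIMIT is applied once more after merging.
    merged = heapq.merge(*partials, key=lambda r: r["score"], reverse=True)
    return list(islice(merged, limit))
```

Each shard must return its own top `limit` rows, because any of them could belong in the global top `limit`; aggregates such as COUNT or SUM can be combined in a similar way by summing the per-shard results.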

MongoDB vs MySQL

I used to build Ruby on Rails apps with MySQL.
MongoDB is becoming more and more popular, and I am now starting to give it a try.
The problem is, I don't know the underlying theory of how MongoDB works (I am using the mongoid gem, if it matters).
So I would like a performance comparison between MySQL + ActiveRecord and models generated by the mongoid gem. Could anyone help me figure it out?
The article entitled: What the heck are you actually using NoSQL for? does a very good job at presenting the pros and cons of using NoSQL.
Edit: Also read this blog post: http://blog.fatalmind.com/2011/05/13/choosing-nosql-for-the-right-reason/
Re-edit: I found some recent material (published in 2014) on this topic that I consider to be relevant: What’s left of NoSQL?
I don't know much of the underlying theory. But this is the advice I got: only use MongoDB if you run it across multiple servers; that's when it'll shine. As far as I understand, the NoSQL movement appeared in no small part due to the pain of load-balancing relational databases across multiple servers. So if you're hosting your application on no more than one server, MySQL would be the preferred choice.
The good people over at the Doctrine project recently wrote a quite useful blog post on the subject.
From what I have read so far... here is my take on it.
Standard SQL trades lower performance for feature richness... i.e. it allows you to do Joins and Transactions across data sets (tables/collections if you will) among other things.
This allows an application developer to push some of the application complexity into the database layer. The advantage is that the application does not have to worry about data integrity and the rest of the ACID properties, because it can depend on proven technology.
The lack of extreme scalability works for pretty much all projects as long as one can manage to keep the application working within expected time limits, which may sometimes result in having to purchase high performance/expensive relational database systems.
On the other hand, MongoDB deliberately excludes much of the inherent complexity associated with relational databases, thereby allowing for better scalable performance.
This approach forces the application developer to re-architect the application to work around the lack of relational features... which in and of itself is a good thing, but the effort involved is generally only worth it if you have the scalability requirements. Please note that with MongoDB, depending on the data's requirements w.r.t. ACID properties, the application will have to step up and handle them as necessary.