Future potential of switching from MySQL to Cassandra (NoSQL) [closed] - mysql

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I am planning on eventually switching my website's database system from MySQL to NoSQL (in this case Cassandra).
From what I have understood so far, Cassandra has no such thing as a join; it relies instead on larger records that work more efficiently. I am by no standard an expert in NoSQL at the moment. I actually understand very little about it and am very confused about how a lot of it works...
One of my goals for my web project is to switch to Python and Cassandra for a more advanced and speedier solution as my website is beginning to grow and I want to be able to scale it easily with additional servers.
Right now I am in the process of designing a new feature for my website: the ability to take files and create folders out of them. So far this is what I was originally using: How to join/subquery a second table (a question I just asked).
Then people suggested normalizing the data into a 3-table system: one table for folders, one for folders/files, and one for files. @egrunin answered my question and even gave me the info for NoSQL, but I really wanted to check it with a second source just to make sure that this is the right approach.
Also are there any conversion tools for SQL to NoSQL?
So my ultimate goal is to design this folder/file system in the database (along with other features that I am adding) so that when I switch from SQL to NoSQL I will be ready and the conversion of all of my data will be a lot easier.
Any tutorials, guides, and information on converting SQL to NoSQL, on Cassandra, or on how NoSQL works would be much appreciated. So far the Cassandra documentation has left me very confused.
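For reference, the 3-table layout described in the question (folders, files, and a folder/file link table) can be sketched roughly as follows. This is only a minimal illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and every table name, column name and sample row is a hypothetical choice, not something taken from the original posts.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table/column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE folders (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE files   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Link table: one row per (folder, file) membership.
CREATE TABLE folder_files (
    folder_id INTEGER NOT NULL REFERENCES folders(id),
    file_id   INTEGER NOT NULL REFERENCES files(id),
    PRIMARY KEY (folder_id, file_id)
);
""")
conn.executemany("INSERT INTO folders VALUES (?, ?)", [(1, "photos"), (2, "docs")])
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [(10, "cat.jpg"), (11, "dog.jpg"), (12, "cv.pdf")])
conn.executemany("INSERT INTO folder_files VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 12)])


def files_in_folder(folder_name):
    """Return the file names in one folder, resolved through the link table."""
    rows = conn.execute(
        """SELECT f.name FROM files AS f
           JOIN folder_files AS ff ON ff.file_id = f.id
           JOIN folders AS d ON d.id = ff.folder_id
           WHERE d.name = ? ORDER BY f.name""",
        (folder_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(files_in_folder("photos"))
```

The two joins in `files_in_folder` are exactly the operation that Cassandra does not provide, so a later migration would likely have to fold the link table into a larger per-folder record.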

At Couchbase we've recently done a webinar series about the transition from RDBMS to NoSQL. It's obviously through the lens of JSON documents, but a lot of the lessons will apply to any distributed database.
http://www.couchbase.com/webinars

MasterGberry:
One of my goals for my web project is to switch to Python and Cassandra for a more advanced and speedier solution as my website is beginning to grow and I want to be able to scale it easily with additional servers.
This is something that you need to clearly quantify before switching to Cassandra.
MySQL can do amazing things and so can Cassandra, but a switch to Cassandra usually should not be driven just by wanting things to be faster, because they might not be faster, at least not in the areas where you are used to MySQL doing well (column-level numerical aggregates on well-defined, tabular data).
I am by no means discouraging the transition, but I am warning about the expectations.
This might be a good read:
http://itsecrets.wordpress.com/2012/01/12/jumping-from-mysql-to-cassandra-a-success-story/

Actually, you can use a tool like playOrm to support joins, but only on partitions, not on entire tables. So if you partition by month or account, you can grab the partition for account 4536 and query into it, joining it with something else (either another smaller table or a partition from another table).
This is very useful if you have a system with lots of clients where each client is really independent of the others, as you can keep all of a client's information self-contained in that client's partitions of all tables.
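To illustrate the idea of joining inside one partition only, here is a plain-Python sketch. It does not use playOrm's API at all; the data layout, the `join_in_partition` helper, the "orders" and "payments" tables, and every account ID other than 4536 are hypothetical.

```python
from collections import defaultdict

# Each "table" is stored as partitions keyed by account ID (hypothetical data).
orders = defaultdict(list)
payments = defaultdict(list)

orders[4536] += [{"order_id": 1, "item": "disk"}, {"order_id": 2, "item": "ram"}]
orders[7001] += [{"order_id": 3, "item": "cpu"}]
payments[4536] += [{"order_id": 1, "amount": 80}, {"order_id": 2, "amount": 40}]
payments[7001] += [{"order_id": 3, "amount": 300}]


def join_in_partition(account_id):
    """Hash-join orders and payments, touching only one account's partitions."""
    by_order = {p["order_id"]: p for p in payments[account_id]}
    return [
        {**o, "amount": by_order[o["order_id"]]["amount"]}
        for o in orders[account_id]
        if o["order_id"] in by_order
    ]


print(join_in_partition(4536))
```

Because the join never looks outside the account's own partitions, its cost depends on the size of one client's data rather than on the size of the whole table.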
later,
Dean

Cassandra isn't really meant to be the main storage for an application. One of its main purposes is storing sequential data and pulling it all back with a key lookup. One example is logging. Interestingly, the row keys are not sorted, but the column names are. So a logging schema would have one row key for every minute and then create a new column for each log entry, with a sequential timestamp as the name of the column. That is just one example, of course; chat history is another.
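The logging layout described above can be modelled in a few lines of plain Python. This is only a toy simulation of the wide-row idea (one row key per minute, column names kept sorted), not Cassandra client code; the `log` and `read_minute` helpers and the sample entries are made up for illustration.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime

# Row key -> list of (column name, value), kept sorted by column name.
rows = defaultdict(list)


def log(ts, message):
    """Store an entry: the row key is the minute, the column name the full timestamp."""
    row_key = ts.strftime("%Y-%m-%dT%H:%M")
    insort(rows[row_key], (ts.isoformat(), message))


def read_minute(row_key):
    """One key lookup returns the whole minute, already in time order."""
    return [message for _, message in rows.get(row_key, [])]


# Entries may arrive out of order; the sorted column names fix that.
log(datetime(2012, 1, 12, 9, 30, 45), "user logged out")
log(datetime(2012, 1, 12, 9, 30, 5), "user logged in")
log(datetime(2012, 1, 12, 9, 31, 0), "cron started")

print(read_minute("2012-01-12T09:30"))
```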

Related

Planning for database scaling and schema changes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm doing research before I create my social network database and I've found a lot of questions/resources pertaining to graph and key-value databases for social networks. I understand there are a TON of different options and ways to implement the DB. I also understand that what the big companies do is complex and way above what I currently need (1b+ users). I also know each of the big companies have revamped their databases to account for the insane scaling they go through.
I don't know how the network will grow, and I don't believe I can accurately create a model that will scale to 1m users (due to unknowns such as how people will use it, how often people post, comment, etc.). But I can at least try to create a database that will be easiest to scale when (if) the need arises.
Do most companies create a database to handle up to 1k users, then once they grow, they revamp it for 10k users, then 100k, etc? If they do, at each of these arbitrary numbers (because of the unknowns listed above), do companies typically change a few tables/nodes/etc, or do they completely recreate the database to take advantage of new technologies (such as moving from SQL to graph)?
I want to pick the best solution, but I'm finding the decision between graph, key-value, SQL, and others very difficult--especially with no data to know which relationships/data are most important. I believe I can create a solid system using a graph that can support up to 10k users, but I'm worried about potentially having to completely recreate the database as the system grows. Is this something to worry about now to avoid issues, or an implement-now-and-adapt-later type of problem?
Going further, if I do need to plan on complete DB restructures, does it typically make sense to use a Multi-Model NoSQL DBMS (such as OrientDB or ArangoDB)?
I personally think you are asking premature questions.
Seriously, even with a bad model, a database can handle 10k users.
You think about scaling, but the hardest problem is not scaling, it is to come to the point where you need to scale.
I'm sure everybody wants 1bn users, but then you are already dreaming about having a social network with 200 times more users than GitHub itself (GitHub has ~5 million users).
Also, even if you think ahead, you will definitely refactor again and again over the years, and you will have more than one persistence layer, be sure of it.
Code, and code well: stay lean, remain able to change quickly, deploy, show it to users, then refactor, test, deploy and show it to users again in the same day. These are the things you need to do now, rather than asking questions about a problem you don't have yet. You definitely have a lot of other problems to solve now ;-)
UPDATE
Based on your comment, you might need to consider that there are questions we simply cannot answer, because we don't know your exact requirements.
I have a simple app which uses 4 persistence layers, and this app is not yet online. I'll give you my "why" for using each one and for which use case:
Neo4j
It is the core of the application data. I use it because I love it, I know it very well (it is my job) and, as the concept of the app is quite new and can evolve rapidly, having a schemaless db cuts out a lot of refactoring. Building the app also keeps bringing up new use cases, which makes Neo4j a good choice when you need to add features without breaking what has already been done.
MySQL
I use it for user accounts and profiles. Why? Because the framework I use already has a lot of bundles integrating this kind of stuff in a couple of lines of code, the bundles are well maintained, and if I used Neo4j for it (currently), I would have to reinvent the wheel. Also, all the modules I use evolve in stability and compatibility with the framework.
Of course the MySQL data is coupled (minimally) with the Neo4j data. But I know that this kind of data will not evolve that much, so MySQL is a good choice, and in case I have to refactor some points, this will not be a huge pain.
Redis
I use Redis for storing analytics data. Redis is quite flexible and I can easily create new keys and add data on top of it.
RabbitMQ
I use a lot of message queues. Why? For testing refactoring. I can easily process messages with multiple consumers, which lets me test multiple database layers, changes, and new features while the app is running.
You will refactor! Just try to keep it as simple as possible.

Database modeling troubleshooting [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Hi, I recently came across a situation where I was asked to optimize the data model for one of our clients, for their already developed and running product. The main reason for this exercise is that the product suffers from slow performance due to too many locks and too many slow-running queries. I am not a DBA, but from a first look at the data model and some tracing of queries, I realized that the whole data model suffers from improper design and storage. The database is MySQL 5.6 and we are running the InnoDB engine on it.
I want to know whether there is any tool out there which can analyze the whole data model and point out possible issues, including data structure definitions, indexes and other things.
I tried lots of profiling tools, including MySQL Workbench, MySQL Enterprise Monitor (paid version) and Jet Profiler, but they all seem to be limited to identifying slow queries only. What I am interested in is a tool which can analyze the existing data model and report problems with it, along with possible solutions.
You can not look at the data model in isolation. You need to consider the data model together with the requirements and the actual data access/update patterns.
I recommend you identify the top X slowest queries and perform a root-cause analysis on each.
Make sure you focus on the parts of the application that matter, i.e. the performance problems that negatively affect the usefulness of the application.
And by data access/update patterns I mean for example:
High vs low number of concurrent accesses
Mostly reads or updates?
Single-record reads vs reading a large number of records at once
Is access evenly spread out during the day, or in bulk at certain times?
Random access (every record is likely to be selected or updated) vs mostly recent records
Are all tables equally used, or are some used more than others?
Are all columns of all tables read at once? Or are there clusters of columns that are used together?
What tables are frequently used together?
The slow queries are the most important to look at. Show us a few of them, together with SHOW CREATE TABLE, EXPLAIN, and how big the tables are.
Also, how many queries per second are you running?
SHOW VARIABLES LIKE '%buffer%';
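As an illustration of the kind of evidence asked for above, the sketch below looks at a query plan before and after adding an index. It uses SQLite's `EXPLAIN QUERY PLAN` through Python's built-in sqlite3 module purely as a stand-in; MySQL's `EXPLAIN` output has a different format. The `orders` table, its columns and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return [row[-1] for row in rows]  # the last column is the description


before = plan(QUERY)  # no usable index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(QUERY)   # the new index can now be used for the lookup

print("before:", before)
print("after: ", after)
```

Comparing the two plans for each slow query is usually a quicker route to the root cause than inspecting the schema on its own.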
There are no such tools as those you are looking for, so I guess you'll have to do your homework, proposing another data model that "follows the rules".
You could begin by getting familiar with the first three normal forms.
You could also try to detect SQL antipatterns (there are books talking about these) in your database. This should give you some leads to work on.

When to use MongoDB [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm writing an application that doesn't necessarily need scaling abilities, as it won't be collecting large amounts of data at the beginning. (However, if I'm lucky, it potentially could down the road.)
I will be running my web server and database on the same box (for now).
That being said, I am looking for performance and efficiency.
The main part of my application will be loading blog articles. Using an RDBMS (MySQL) I will make 6 queries (2 of the queries being joins), just to load a single blog article page.
select blog
select blog_album
select blog_tags
select blog_notes
select blog_comments (join with users)
select blog_author_participants (join with users)
However, with MongoDB I can de-normalize and flatten 6 tables into just 2 tables/collections and reduce my queries to potentially just one query:
users
blogs
->blog_album
->blog_tags
->blog_notes
->blog_comments
->blog_author_participants
Now, going with the MongoDB schema, there will be some data redundancy. However, hard drive space is cheaper than CPU/servers.
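A rough sketch of what one such flattened "blogs" document might look like, written as a plain Python dict rather than with any MongoDB driver. The nested field names follow the list above; every other field and value is invented for illustration.

```python
import json

# One denormalized "blogs" document; field values are made up for illustration.
blog = {
    "_id": 1,
    "title": "Hello world",
    "blog_album": [{"photo": "a.jpg"}],
    "blog_tags": ["mysql", "mongodb"],
    "blog_notes": ["draft reviewed"],
    # User names are copied into the document: this is the redundancy
    # mentioned above, and every copy must be updated if a user renames.
    "blog_comments": [{"user_id": 7, "user_name": "ann", "text": "Nice post"}],
    "blog_author_participants": [{"user_id": 3, "user_name": "bob"}],
}


def page_data(doc):
    """Everything the article page needs comes from the single document."""
    return {
        "title": doc["title"],
        "tags": doc["blog_tags"],
        "comments": [(c["user_name"], c["text"]) for c in doc["blog_comments"]],
        "authors": [a["user_name"] for a in doc["blog_author_participants"]],
    }


print(json.dumps(page_data(blog), indent=2))
```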
1.) Would this be a good scenario to use MongoDB?
2.) Do you only benefit in performance using MongoDB when scaling beyond a single server?
3.) Are there any durability risks using MongoDB? I hear that there is potential for loss of data while performing inserts, as inserts are written to memory first, then to the database.
4.) Should this stop me from using MongoDB in production?
You would use MongoDB when you have a use case that matches its strengths.
Do you need a schema-less document store? Nope, you have a stable schema.
Do you need automatic sharding? Nope, you don't have extraordinary data needs or budget for horizontally scaling hardware.
Do you need map/reduce data processing? Not for something like a blog.
So why are you even considering it?
However, with MongoDB I can de-normalize and flatten 6 tables into just 2 tables/collections and reduce my queries to potentially just one query
But you can easily query MySQL for 6 tables worth of information related to a single blog post with a single properly crafted SQL statement.
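One possible shape for such a statement is sketched below, covering only three of the six tables to keep it short. It runs against SQLite through Python's built-in sqlite3 module as a stand-in, so the exact syntax (for example the separator argument of `GROUP_CONCAT` and the `||` string concatenation) differs in MySQL. All column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE blog_tags (blog_id INTEGER, tag TEXT);
CREATE TABLE blog_comments (blog_id INTEGER, user_id INTEGER, body TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO blog VALUES (1, 'Hello world');
INSERT INTO blog_tags VALUES (1, 'mysql'), (1, 'nosql');
INSERT INTO blog_comments VALUES (1, 1, 'Nice'), (1, 2, 'Thanks');
""")

# Correlated subqueries aggregate each child table separately, so the
# one-to-many tables do not multiply each other's rows.
SQL = """
SELECT b.title,
       (SELECT GROUP_CONCAT(t.tag, ',')
          FROM blog_tags AS t WHERE t.blog_id = b.id) AS tags,
       (SELECT GROUP_CONCAT(u.name || ': ' || c.body, ' | ')
          FROM blog_comments AS c
          JOIN users AS u ON u.id = c.user_id
         WHERE c.blog_id = b.id) AS comments
  FROM blog AS b
 WHERE b.id = ?
"""

title, tags, comments = conn.execute(SQL, (1,)).fetchone()
print(title, tags, comments)
```

The whole page then costs a single round trip to the database, at the price of having to split the concatenated strings in application code.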
however hard drive space is cheaper than CPU/servers.
If performance and scaling is a priority then you are going to be concerned with having enough RAM to fit everything into main memory and enough CPU cores to run queries. An enterprise grade RAID 10 array is a requirement, don't get me wrong, but as soon as your database software (MongoDB or MySQL) needs to scan an index that can't fit into main memory you'll be in for a world of pain assuming a large active database. :)
I like MongoDB, but its big strengths in my mind are map/reduce and its document orientation. You require neither of those features. MySQL is time-tested in large-scale deployments and supports partitioning (but I would argue that your database would have to be on the order of 50-100 GB before you could realize a substantial gain from partitioning vs a single server, plus passive backup, with tons (64 GB+) of RAM). I would also argue that if performance is truly a concern then MySQL would be preferable, as you would have supreme control over your indexes.
That's not to say that MongoDB isn't high performance, but its place probably isn't serving blogs. Your concern with inserts is valid as well. MongoDB is not an ACID system. Google transactions in both systems and compare.
Here is a good explanation: http://mod.erni.st/nosql-if-only-it-was-that-easy/
The last paragraph summarizes it:
What am I going to build my next app on? Probably Postgres. Will I use NoSQL? Maybe. I might also use Hadoop and Hive. I might keep everything in flat files. Maybe I’ll start hacking on Maglev. I’ll use whatever is best for the job. If I need reporting, I won’t be using any NoSQL. If I need caching, I’ll probably use Tokyo Tyrant. If I need ACIDity, I won’t use NoSQL. If I need a ton of counters, I’ll use Redis. If I need transactions, I’ll use Postgres. If I have a ton of a single type of documents, I’ll probably use Mongo. If I need to write 1 billion objects a day, I’d probably use Voldemort. If I need full text search, I’d probably use Solr. If I need full text search of volatile data, I’d probably use Sphinx.
NoSQL vs. RDBMS: Apples and Oranges?
I would advise you to read up a little on what NoSQL is and what it does before you decide whether you can use it. You can't take a normal database and turn it into a NoSQL thing just like that. The way you work with the data is completely different.
NoSQL definitely has its uses. But it's definitely not the answer for everything. The main advantage of NoSQL is the easily changeable data model.
Advantages of using MongoDB (as per Moshe Kaplan's DZone article):
Schema-less design
Scalability in managing terabytes of data
Rapid replica sets with a high-availability feature
Sharding enables linear, scale-out growth without running out of budget
Support for high write loads
Use of data locality for query processing
MongoDB meets the Consistency and Partition tolerance requirements of the CAP theorem (Consistency, Availability and Partition tolerance)
Related SE questions:
What are the advantages of using a schema-free database like MongoDB compared to a relational database?
When to Redis? When to MongoDB?
I can't speak to the performance considerations, but for me, the first consideration of whether you want to use a SQL-DB vs MongoDB is the structure of the data you want to store.
MongoDB is "schema-less" in the sense that you don't need to know what "tables" and "columns" you want beforehand. It is very flexible. So, if you don't know what information you want to store in your "blogs" Collection for example, or if different blog posts may store different information, then MongoDB allows this flexibility. Whereas with SQL relational databases, you have to know your schema upfront.
But it sounds like you already know what information you want to store, in which case I might just stick with a SQL relational database. I don't think performance is the first consideration in your case - you're not building a real-time application where one or two milliseconds matter all that much.

SQL (MySQL) vs NoSQL (CouchDB) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am in the middle of designing a highly-scalable application which must store a lot of data. Just for example it will store lots about users and then things like a lot of their messages, comments etc. I have always used MySQL before but now I am minded to try something new like couchdb or similar which is not SQL.
Does anyone have any thoughts or guidance on this?
Here's a quote from a recent blog post from Dare Obasanjo.
SQL databases are like automatic transmission and NoSQL databases are like manual transmission. Once you switch to NoSQL, you become responsible for a lot of work that the system takes care of automatically in a relational database system. Similar to what happens when you pick manual over automatic transmission. Secondly, NoSQL allows you to eke more performance out of the system by eliminating a lot of integrity checks done by relational databases from the database tier. Again, this is similar to how you can get more performance out of your car by driving a manual transmission versus an automatic transmission vehicle.
However the most notable similarity is that just like most of us can’t really take advantage of the benefits of a manual transmission vehicle because the majority of our driving is sitting in traffic on the way to and from work, there is a similar harsh reality in that most sites aren’t at Google or Facebook’s scale and thus have no need for a Bigtable or Cassandra.
To which I can add only that switching from MySQL, where you have at least some experience, to CouchDB, where you have no experience, means you will have to deal with a whole new set of problems and learn different concepts and best practices. While by itself this is wonderful (I am playing at home with MongoDB and like it a lot), it will be a cost that you need to calculate when estimating the work for that project, and brings unknown risks while promising unknown benefits. It will be very hard to judge if you can do the project on time and with the quality you want/need to be successful, if it's based on a technology you don't know.
Now, if you have on the team an expert in the NoSQL field, then by all means take a good look at it. But without any expertise on the team, don't jump on NoSQL for a new commercial project.
Update: Just to throw some gasoline in the open fire you started, here are two interesting articles from people on the SQL camp. :-)
I Can't Wait for NoSQL to Die (original article is gone, here's a copy)
Fighting The NoSQL Mindset, Though This Isn't an anti-NoSQL Piece
Update: Well here is an interesting article about NoSQL
Making Sense of NoSQL
It seems that the only real solutions today revolve around scaling out or sharding. Most modern databases (NoSQL as well as NewSQL) support horizontal scaling out of the box, at the database layer, without the application needing its own sharding code.
Unfortunately, for good old trusted MySQL, sharding is not provided "out of the box". ScaleBase (disclaimer: I work there) is the maker of a complete scale-out solution, an "automatic sharding machine" if you like. ScaleBase analyzes your data and SQL stream, splits the data across DB nodes, and aggregates at runtime, so you won’t have to!
And it's a free download.
Don't get me wrong, NoSQLs are great, they're new, new is more choice and choice is always good!! But choosing NoSQL comes with a price, make sure you can pay it...
You can see here some more data about MySQL, NoSQL...: http://www.scalebase.com/extreme-scalability-with-mongodb-and-mysql-part-1-auto-sharding
Hope that helped.
One of the best options is MongoDB (a NoSQL database), which supports scalability. It stores large amounts of data ("big data") as documents, unlike the rows and tables of SQL. It is fast and supports sharding of the data. It uses replica sets, which keep copies of the data on multiple servers with one primary server as the base, to guard against data loss. It is language independent and flexible to use.

Should we be converting to PostgreSQL from MySQL? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Now that MySQL is in Oracle's hands, do you think it's a good idea to switch to using PostgreSQL for new applications instead? (Also what do you think about converting existing applications?)
I've used both DB systems before and while PostgreSQL is great for its licensing terms and standards compliance, MySQL is definitely easier to get up and running quickly. (I make this as a personal observation, I know you might disagree...)
Edit:
I should clarify... I don't want this to be a MySQL/PostgreSQL is better than PostgreSQL/MySQL debate. I like both DB systems and am happy using both (and really for the complexity of most of the applications I'm working on, it's much of a muchness). I'm just in a position where I'm trying to look forward and consider the stability of my technology base before committing myself to a particular course. If you have gone through a similar process and have some kind of migration plan in mind I would like to hear from you regarding what that is and why you decided on it.
Installing is a one-time job... kind of. It depends, of course, but PostgreSQL isn't much harder to install than MySQL, if harder at all. It's the day-to-day cost of ownership that matters. As a developer I prefer PostgreSQL over MySQL, as the latter behaves differently from version to version (they're still playing catch-up to the SQL standard and probably always will be). Also, MySQL is a pain to administer sometimes. What does it matter if it takes ten minutes more to install if you must wait for hours when adding a column to a table or doing other trivial tasks? Finally, I think the MySQL environment was too turbulent even before the Oracle takeover, with Oracle already owning InnoDB, and with MariaDB. I think it is a general mess. So yes, I'd migrate, but for other reasons.
If you actually prefer MySQL over PostgreSQL I'd lay out a migration plan just to be ready if need arises, as a kind of lazy proactiveness ...
Look at it this way: regardless of what Oracle says, the fact remains that they could decide to do Something Bad with MySQL at any time. Maybe they will, and maybe they won't, but why take the risk (for new projects, at least) when you can just use PostgreSQL?
Given the choice, I'd just as soon go with Postgres myself. It seems to be a very stable project upon which to base my own work. Long history, under active development, good documentation, etc.
Since you've indicated that you're happy working with either one, I say go with Postgres for new projects and don't worry about converting existing projects unless and until Oracle does something with MySQL that gives you cause for concern.
I am no fan of Oracle, but the company has come forward with a 10 point commitment to existing MySQL customers.
So at least as of now, I don't see any cause for worry. Any database migration will require some effort and cost in terms of time and money. So if I were you, I'd hold on for a while before doing anything as drastic as a database migration.
Even if MySQL does go south, there's MariaDB, which was started by the founder of MySQL. It's a drop-in replacement and has some quite exciting new features.
http://askmonty.org/wiki/index.php/MariaDB
I've been giving it a go in my development environment and I've been liking it so far.
See the article:
Save MySQL by letting Oracle keep it GPL
This answers your question amongst other things.
Good lord.
O.k. so let's just get it in the open. I am not a MySQL fan. I think it's broken. However, I am biased (http://www.commandprompt.com/). That said, here are the benefits of PostgreSQL.
PostgreSQL scales farther than MySQL. MySQL does really well if you have a limited number of CPUs. If you get above 4, PostgreSQL will just go farther, longer.
PostgreSQL's license allows it to never be bought. You don't have to worry about a single entity taking it over. At present there are at least a dozen actively supporting companies, including Red Hat, PgExperts, Command Prompt, OmniTI, EnterpriseDB, Fujitsu and Oracle (yep).
PostgreSQL's feature set is remarkable. Just look at it.
However, and this is the most important point: do what your business requires. MySQL is a decent database when used for its purpose.