Database alternative to MySQL made for millions of TABLES

My understanding is that MySQL is intended to have tables with millions of rows. I am looking for a database system designed to have millions of relational tables. Am I correct in my understanding that the way MySQL queries data makes it inefficient for that sort of implementation? It is for a long-term, user-driven project, so extensibility is a must.
Thanks!
EDIT:
Due to the immediately negative reaction, I'll explain myself. "Millions" of tables would be an issue if the project lived to accumulate a strong user base in time. It would implement an edit system similar to that on Stack Overflow; I considered a variety of solutions, and decided the one I liked best was one using a relational table for each offshoot of edits. I assumed there was some database framework designed for that sort of thing. Is this really considered "bad" architecture? Why is it not just an abnormal type of architecture? What is "wrong" with doing something that way?

You could always look towards a NoSQL DB:
From: http://nosql-database.org/
"NoSQL DEFINITION: Next Generation Databases mostly addressing some of
the points: being non-relational, distributed, open-source and
horizontally scalable."
Edit: Scalable is what I was shooting for..
Suggestion:
http://www.mongodb.org/
Edit: Interesting idea about data versioning:
Ways to implement data versioning in MongoDB
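For comparison, here is a minimal sketch of how an edit history with offshoots is often kept in one table rather than in a table per offshoot. It uses Python's built-in sqlite3 module only so that it is self-contained; the table name `revisions`, its columns, and the helper functions are assumptions made for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE revisions (
        id        INTEGER PRIMARY KEY,
        post_id   INTEGER NOT NULL,                 -- which post the edit belongs to
        parent_id INTEGER REFERENCES revisions(id), -- NULL for the first revision
        body      TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_revisions_post ON revisions(post_id)")


def add_revision(post_id, body, parent_id=None):
    """Store one edit as a row; an offshoot is just a new row with the same parent."""
    cur = conn.execute(
        "INSERT INTO revisions (post_id, parent_id, body) VALUES (?, ?, ?)",
        (post_id, parent_id, body),
    )
    return cur.lastrowid


def history(revision_id):
    """Follow parent links from a revision back to the original, newest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, parent_id, body, depth) AS (
            SELECT id, parent_id, body, 0 FROM revisions WHERE id = ?
            UNION ALL
            SELECT r.id, r.parent_id, r.body, c.depth + 1
            FROM revisions r JOIN chain c ON r.id = c.parent_id
        )
        SELECT body FROM chain ORDER BY depth
        """,
        (revision_id,),
    ).fetchall()
    return [body for (body,) in rows]
```

In this layout every new offshoot adds rows rather than tables, so the number of tables stays fixed while the row count grows, which is the shape MySQL and other relational databases are built for.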

Related

How to handle ever changing database structure

I am working on my masters thesis. For my implementation I have some MySQL tables.
With every iteration my table structure will differ (adding, removing columns etc). I was wondering what the best way is to handle the ever changing structure, without changing old code too much.
I read that Facebook has a version control system where they can specify exactly what kind of code/feature is available and for what user. As far as I know that must mean that they manage many different database structures at once. How does their old code work alongside their new code with respect to their database? Do they do a lot of testing? Did they abandon MySQL altogether?
Personally I like FriendFeed's solution a lot. However, I am wondering if it is too much for me.
Why would anyone try to use a relational database for non-relational data?
Forget about FriendFeed and take a look at NoSQL solutions. They are schemaless, they support horizontal scalability much better than any RDBMS, and most of them are free/open source.
I can recommend MongoDB. It's very fast and written in C++, but it is not ACID compliant.
Also, you could try RavenDB. It's not as fast as MongoDB, and inserts are very slow compared to Mongo, but it is ACID compliant. It is written in .NET.

Use NoSql? And if yes how?

I have read and heard a lot (podcasts, Stack Overflow questions...) about NoSQL databases and I am really curious to use them, but...
Although I have read a lot of things like how-to-sql-or-nosql or what-scalability-problems-have-you-solved-using-a-nosql-data-store, I am still not certain which kind of DB to use.
The problem is: for a (school) project we (my project group) need to implement quite a big database (that should serve a REST server, probably written in Erlang, with lots of clients).
We are quite good at designing data models for relational databases, so we started to do that.
Then I played around with some NoSQL databases and was really impressed by the performance.
So: is it a good idea to use a NoSQL database? Our data model has lots of relations and the queries would have lots of joins (or at least use joined views).
In some places I read that this means I should go with a relational database, and in other places I read that I could easily redesign it NoSQL-style to lose this overhead of relations.
Should I use NoSQL and, if yes, which of the systems would you suggest?
Are things like HandlerSocket for MySQL an option?
And how can I easily redesign a relational data model NoSQL-style?
The answer to your question is: it totally depends on your data and requirements. In a real-world project you would analyze the benefits of various NoSQL databases (HBase, Cassandra, MongoDB, CouchDB, Riak, ...) for your specific project. Then you would weigh these against the benefits of a classical RDBMS like MySQL.
In a school project like yours a NoSQL-Database is mainly a decision of taste as your project will probably never benefit from typical NoSQL-advantages like schemalessness or sharding.
A redesign of a relational data model can be a very tricky task, as you have to wrap your mind around the different database model of the chosen NoSQL database. Joins are not necessarily a problem if your business data fits the database model of your chosen NoSQL database. Sometimes join-intensive relational models are a lot easier to implement in some NoSQL databases (e.g. a document-oriented database like MongoDB).
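To make the document-oriented point concrete, here is a small sketch in plain Python of folding the rows a join would fetch into one nested document. The blog-style data, the field names, and the `to_document` helper are hypothetical; no particular database's API is used, only dictionaries of the kind a document store would hold.

```python
import json

# Relational shape: three "tables" linked by ids (hypothetical sample data).
users = [{"id": 1, "name": "alice"}]
posts = [{"id": 10, "user_id": 1, "title": "Hello"}]
comments = [
    {"id": 100, "post_id": 10, "text": "Nice"},
    {"id": 101, "post_id": 10, "text": "Thanks"},
]


def to_document(post):
    """Embed the author and the comments inside the post itself."""
    author = next(u for u in users if u["id"] == post["user_id"])
    return {
        "_id": post["id"],
        "title": post["title"],
        "author": {"name": author["name"]},
        "comments": [c["text"] for c in comments if c["post_id"] == post["id"]],
    }


doc = to_document(posts[0])
print(json.dumps(doc, indent=2))
```

Reading a post with its author and comments then becomes a single lookup by `_id`. The trade-off is that data shared between documents, such as the author's name, is duplicated and has to be kept in sync by the application.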
If you really want to try out NoSQL go with MongoDB as it is very well documented for a first entry.
Since you are a German speaker (greetings to Switzerland from Berlin), I recommend the following book in German, which explains the main reasons for using a NoSQL database and the main steps to get started with the most popular ones: NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken
Please keep in mind that you are not required to use just 1 data storage engine. You can use SQL and noSQL solutions in parallel.
Just remember to document your database/noSQL structures properly.
-daniel
If you want to do joins in nosql, you could use playOrm which does joins on partitions. In this way, you can have a 1 trillion row table and 1 partition of that table may only be 100,000 rows and you can join that partition with another one. playOrm also then gives you all the familiar hibernate relationships as well.

How to create pseudo document oriented model?

Currently, I am using Rails with MySQL as the backend. Unfortunately, my application's data has grown in ways that were not expected or foreseen when it started. Now I am facing a lot of performance issues as the number of entries in the database increases, and ActiveRecord is taking a hit from the innumerable queries that are fired as a result of relying on the relational logic.
I have come to a point where I feel I am paying a penalty for enjoying the advantages of a proper relational model. Since speed has become the priority, I did some research on document-oriented models like MongoDB and found that they offer speed at the expense of relational features.
My question here is, how to slowly migrate from Relational model to document model. Perhaps, I will store my temporary schemas or the tables returned and dump them as a bulk document on the fly instead of setting up a proper document-oriented DB (at least during the initial phase). Space is not an issue for me. All I care now is time. But then, I cannot do that in one single sweep. I would like to know how to approach this problem, any links/references where this kind of problem has been solved before would be much appreciated.
I would highly recommend against migrating to a document db unless your data is better suited to such a database.
Migrating for speed reasons would generally be a bad idea, and you should instead look for slow queries in your existing AR based system and optimise them.
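One frequent source of "innumerable queries" in ORM code is the N+1 pattern: one query for a list, then one more query per item. The sketch below reproduces it with Python's sqlite3 module so it can be run anywhere; the `authors` and `posts` tables and the `run` helper are made up for illustration. In ActiveRecord the usual remedy is eager loading with `includes`, but whether that is the actual problem here is an assumption until the slow queries have been measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX idx_posts_author ON posts(author_id);
""")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(i, f"author{i}") for i in range(1, 51)],
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(i, (i % 50) + 1, f"post{i}") for i in range(1, 201)],
)

counter = {"queries": 0}  # how many statements the helpers below have sent


def run(sql, params=()):
    """Execute a query and count it."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    """One query for the authors, then one query per author."""
    result = {}
    for author_id, name in run("SELECT id, name FROM authors"):
        rows = run("SELECT title FROM posts WHERE author_id = ?", (author_id,))
        result[name] = sorted(title for (title,) in rows)
    return result


def titles_joined():
    """The same data fetched with a single JOIN."""
    result = {}
    sql = "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id"
    for name, title in run(sql):
        result.setdefault(name, []).append(title)
    return {name: sorted(titles) for name, titles in result.items()}
```

Both functions return the same dictionary, but the first sends one query per author plus one for the list, while the second sends a single query. Over a network connection to a database server, that round-trip count tends to matter far more than the choice of storage engine.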

What database works well with 200+GB of data?

I've been using MySQL (with InnoDB, on Amazon RDS) because it's sort of the universal default, but it's been ridiculously under-performing, and tweaking it only delays the inevitable.
The data is mostly relatively short blobs (<1 kB each) of information about hundreds of millions of URLs. There is (or should be, MySQL cannot seem to handle it) a very high volume of inserts / updates / retrievals but few complex queries. That is not because complex queries wouldn't be useful, but because MySQL is so slow that it's far faster to get the data out, process it locally, and cache the results somewhere.
I can keep tweaking MySQL and throwing more hardware at it, but it seems increasingly futile.
So what are the options? SQL/relational model/etc. optional - anything will do as long as it's fast, networked, and language-independent.
Have you done any sort of end-to-end profiling of your application and MySQL database? To provide better advice it would also be good to understand what improvements you have tried to implement, and your database structure. You haven't given a lot of information on how your MySQL database is configured either. It provides a lot of options for tuning.
You should pick up a copy of High Performance MySQL if you haven't already to learn more about the product.
There is no point in doing anything until you know what your problem is. NoSQL solutions can offer performance benefits but you have provided little evidence that MySQL is incapable of servicing your needs.
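As a small illustration of what finding the problem can look like, the sketch below asks the query planner how it would execute a lookup before and after an index is added. It uses SQLite through Python's sqlite3 module purely so it is self-contained; the `urls` table and the `plan` helper are hypothetical. MySQL has its own `EXPLAIN` statement and a slow query log for the same purpose, with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, payload BLOB)")
conn.executemany(
    "INSERT INTO urls (url, payload) VALUES (?, ?)",
    [(f"http://example.com/{i}", b"x" * 100) for i in range(10_000)],
)


def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


lookup = "SELECT payload FROM urls WHERE url = ?"
before = plan(lookup, ("http://example.com/42",))

conn.execute("CREATE INDEX idx_urls_url ON urls(url)")
after = plan(lookup, ("http://example.com/42",))

print("before:", before)
print("after: ", after)
```

Without the index the plan reports a scan of the whole table; with it, the plan reports a search using `idx_urls_url`. If the hot queries on the real system turn out to be full scans, that is a schema problem that switching engines would not fix by itself.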
Well "Fast, networked and language-independent" + "few complex queries" brings to mind the various NoSQL solutions. To name a few:
MongoDB
CouchDB
Cassandra
And if that's not fast enough, there is always the wicked-fast Redis, which is my personal favorite at the moment. :) It is not a database per se, but it's good enough for most scenarios.
I am sure other people can list more NoSQL databases...
and there is always http://nosql-database.org/ .
Generally speaking, databases in this category are better and faster in your scenario because they have relaxed constraints, and so frequent inserts/updates/retrievals are easier and faster. But that requires you to think harder about your data model, and it is generally not possible to run SQL-style complex queries directly. Instead you write more pre-computed data or use a more denormalized design to make up for the lack of complex queries.
But since complex queries are a minor concern in your case, I think NoSQL solutions are ideal for you.
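Here is a rough sketch of what "write more pre-computed data" can mean in practice. A plain Python dict stands in for the key-value store, and the names `store`, `pages_per_host`, `put` and `count_for_host` are invented for the example; a real store such as Redis would use its own commands for the same idea.

```python
from collections import defaultdict
from urllib.parse import urlparse

store = {}                         # url -> blob, stand-in for a key-value store
pages_per_host = defaultdict(int)  # pre-computed aggregate, maintained on write


def put(url, blob):
    """Insert or update a blob and keep the per-host counter in step."""
    if url not in store:
        pages_per_host[urlparse(url).netloc] += 1
    store[url] = blob


def count_for_host(host):
    """Answer what would be a GROUP BY query in SQL with a single lookup."""
    return pages_per_host.get(host, 0)
```

The cost moves from read time to write time: every insert does a little extra work so that the question "how many URLs are stored for this host?" never needs a scan.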
With the information you've given about your application's data and workload, it is almost impossible to determine whether the problem really is MySQL itself or something else. You seem to assume that you can throw any workload at a relational engine and it should handle it. Therefore the suggestions made by other commenters about analyzing the performance more carefully are valid in my opinion. Without more data (transactions per second etc.) any further analysis regarding other suitable engines is also futile.
I'm not sure I agree with the advice to jump ship on traditional databases. A relational database might not be the most efficient tool, but it is the one that is FAR more widely understood and used, and I strongly doubt you have a problem that can't be handled by an efficiently set up relational database.
Obvious answers are Oracle, SQL Server, etc., but it might just be that your database structure isn't right. I don't know much about MySQL but I do know it's used in some pretty big projects (eBay being noteworthy).

Where to find a good reference when choosing a database?

Two others and I are working on a project at the university.
In the project we are making a prototype of an MMORPG.
We have decided to use PostgreSQL as our database. The other databases we considered were MS SQL-server and MySQL.
Does somebody have a good reference which will justify our choice? (preferably written during the last year)
Someone recently recommended me wikivs.com: MySQL vs. PostgreSQL - it is a quite detailed comparison of those two, and might be of help to you.
The most often mentioned difference between MySQL and PostgreSQL concerns your read/write ratio. If you read a lot more than you write, MySQL is usually faster; but if you do a lot of heavy updates to a table, as often as other threads have to read it, then the default locking in MySQL is not the best, and PostgreSQL can be a better choice, performance-wise.
In other words, PostgreSQL scales better with respect to DB writes.
That's why it's usually said that MySQL is best for web apps, while PostgreSQL is more 'enterprisey'.
Of course, the picture is not so simple:
InnoDB tables on MySQL have very different performance behaviour.
At the load levels where PostgreSQL's better locks overtake MySQL's, other parts of your platform could be the bottlenecks.
PostgreSQL does comply better with standards, so it can be easier to replace later.
In the end, the choice has so many variables that no matter which way you go, you'll find some important issue that makes it the right choice.
Go with something that someone in your team has actual experience of using in production. All databases have issues which frequent users are aware of.
I cannot stress enough that someone in the team needs PRODUCTION experience of using it. Not using it for their homework, or to keep their list of CDs in.
All of these databases have their advantages and disadvantages. Which is better is dependent on:
Your team's experience
Your exact requirements
Your current environment, e.g. what is your app written in and what will it be hosted on?
SQL Server's main problem is the cost, unless you use the Express edition, which has performance limitations. However, it's very easy to use and has a number of good tools.
There is a comparison of the different SQL Server editions at:
http://www.microsoft.com/sql/prodinfo/features/compare-features.mspx
You could then compare these with MySQL and PostgreSQL.
If the purpose of this comparison is a theoretical one for your essay, then you can reference web pages such as the Microsoft link and compare performance, cost, etc.
PostgreSQL has a page of case studies that you can quote and link to.
Really, any of the above would have worked for you. I personally like PostgreSQL. One solid advantage it has over MSSQL (even assuming you can get it for "free") is that PostgreSQL is non-proprietary. If you're going to introduce a dependency into your project (and re-inventing an RDBMS would be crazy), you don't want it to be a black box.