Performance Analysis of CouchDB - mysql

I am developing a Discussion Forum for my University. For this to manipulate the data i m using CouchDB as database.
I m finding difficulty in designing the structure of my db, in order to maximize the performance of my db.
I want to discuss what is the good practice of designing a document database.
Either we should make only one database as SQL and make 'n' no. of documents in the database.
Or we can make more no of database in order to flatten my db structure.This also reduce the more no. of documents to be developed.

The questions you need to ask are simply this: "How do you want to get data out of your database?"
Database design hinges around the queries to be made, not what is available to be stored.
This is especially important for Document DBs like Couch, since, while it does have a flexible schema, it does not have flexible indexing. By that I mean that because of the granularity of the data, it's quite like that later on, when you need to ask a question that it was not designed to answer, answering that question may well be very expensive. It's much, much cheaper to design your views and other constructs early, when there is little data in the data base rather than later after you have thousands or millions of rows.
RDBMS's, since they tend to have a finer granularity of data, tend to be more nimble to new queries and such later in life. Document DBs, not so much.
So think through your use cases up front, and design around those, and design those early on, it's much less painless now than later.

It's hard to tell the right way to approach modeling your data since you don't give much information. Generally though you want to keep as much data as possible in one database as this allows you to index it together (indexes cannot span more than one database).
Also, since there is no schema enforcement in the database, you can create different types of records in each database. For example, there is nothing wrong with have both user information and forum entries in the same database.
Last, you will most likely want to keep messages and their replies in different records. This is an old but still relevant discussion on this topic: http://www.cmlenz.net/archives/2007/10/couchdb-joins
Cheers.

Related

Is mongo appropriate to use alongside MySQL?

I can't discuss things in great detail due to an NDA, but I'm hoping an overview of the system being built can help you in aiding me in making a decision concerning our databases.
I'm building an app that will help vendors compete to gain clientele by making strategic offers based on records of inventory/purchase from the storefronts.
One side of the app is for the store owners to see presented offers, network, etc. I've got that going with a standard php/MySQL setup.
My question is concerning the records of inventory. We are talking millions of records here nearly immediately. The sample data I'm using is roll up of four of their managers (they have dozens) over the course of a year or two and it had over 500k rows with about 30 or more columns. When we get scores of stores with all of their managers it will be massive, at least compared to anything I've worked with as of yet.
The vendors will have a side of the product in which they can search through these records and make competitive offers based off of it.
Is the sheer size a good reason to use something like mongo? Or is it more a matter of how the data is laid out / what it consists of? Or some other element that I'm not considering?
And, if not mongo/nosql, then is there some other methodology or technology that such large data stores would benefit from me using (sharding, amazon cloud database, etc).
Thanks
Answers ...
Q: Is the sheer size a good reason to use something like mongo?
A: I think so. Mongo was built from the ground up to scale in a massive way. You have replica sets and sharding that can help you scale. They also have features to make sure your data gets stored in the appropriately geographically distributed data centers.
Q: Or is it more a matter of how the data is laid out / what it consists of?
A: Mongo is a document database and you're right, the data models will be different. You have to think of data in a denormalized way instead of normalized. Just like any technology, there are pros and cons to storing things as documents.
Some pros: Schema management is a breeze. Data more naturally fits objects in your application. Don't have to pay the price of complicated/slow joins.
Some cons: Schemas can be inconsistent - you have to manage it. Data is repeated, which is not managed means it can become inconsistent.
In general I think Mongo would be a good choice to deal with that scale. Mongo has a new aggregation framework that brings a lot of SQL concepts to queries on documents. Easier to make complex queries. Also Mongo has map/reduce to run any kind of query you might have.
After using Mongo daily for about a year, I've really enjoyed the support around it as a product and the general ease of setting it up and working with it.

XML or MYSQL.Which should be used for storing connected data?

i am writing code for friend list and messaging system for my college website.I need to store interconnected data.. need to search them ...It has about 3500 records..So which way I proceed MYSQL or XML ..which is fastest..which is best ?why?
I'm going to use one of my professor's favorite answers here: "it depends."
XML and MySQL have very different applications. If you need to be doing lots of simultaneous queries for all sorts of sophisticated things, MySQL is your clear winner. Sometimes MySQL can be hard to use in some applications because you must first create a database schema in which to fit your data. It sounds like though, that you have many records with the same structure, and it would be easy enough to throw them into a database. With a SQL based database engine like MySQL, you can also construct queries using the standard SQL language. Database optimizations can also help to increase the performance of these types of queries, for example, you can used indexes and keys. If your data needs to be updated regularly, than MySQL will likely provide better performance as it will not have to rewrite the XML file. If you need your application to scale to many simultaneous connections of sophisticated queries, you are definitely going to want to go with some sort of SQL solution.
Depending upon your application though, sometimes there are other ways to store and access your data. I for one once needed to create a persistent data structure on the disk which could be accessed very quickly, but never updated. For that, I used cdb. There are also other database systems out there like the Berkeley database, and some No-SQL solutions such as couchdb and mongodb. I posed a somewhat interesting question here on stackoverflow on the use of No-SQL solutions a little while back which you may find interesting as well.
This is really just a sampling of different considerations you may want to make when you are choosing how you want to store your data. Think about questions like: How frequently will things be queried? or updated? What will your queries look like? What kinds of applications do you need to access your information from? etc.

mysql v mongodb - best solution for a complex user focussed site?

I've spent a few days researching the pros and cons of mysql against nosql solutions (specifically mongodb) for my project.
The project needs to be able to eventually scale to handle tens of thousands of simultaneous users - millions of users in total. The site is heavily user focussed and will interact with the database as much if not more than a site like facebook - it is very relational, all functionality is dependant on the relation to the user and their relationship with other users. It's also data heavy - lots of files, images, audio, messaging, personal news feed etc.
I like the look on mongodb a lot, I like the way it works, and I like how it scales - but can't get my head around how this would work for a site such as I describe. Would all interactions for a specific user have to be stored in a single document?
I am however very comfortable using mysql and like the relational aspect of it. I am just worried without a lot of work there will be scalability issues with this project - although perhaps with memcached and sharding this won't be an issue?
I'd like to know from those with experience with the two databases on large projects, out of mysql and mongodb which is the right tool for this particular job?
If the data is highly relational, use a relational database. If it's not, don't. NoSQL is great, don't get me wrong, but it's not suited to all tasks. It may be suited to your task, but the only way to find out is for you to build some tests for your specific usecase. Add a bunch of dummy data (millions if not hundreds of millions of rows). And then load test it.
As far as scaling, that's more of a component of how you build your application than the backend you choose. Do you have a solid schema? Do you have a strong cache layer with write-through caching? Do you access the backend as efficiently as possible (queries and such)? Can you shard based upon your application?
Those are the kind of questions which are appropriate here. Not "which will scale for me better". And not "which is the right tool". Both can do the job fine. Which is best is up to you...
Obviously, there's no silver bullet here. However, I would like to challenge this one assumption you've made:
... it is very relational, all functionality is dependant on the relation to the user and their relationship with other users...
OK, I'd like you to picture having 100M users in a relational database and start building this model. Let's try something simple, grab the names of a user's friends.
How do you get a user's friends? Well you go to the users_friends table. If each user has even just 10 friends, that table contains a billion rows. If users have a more reasonable 100 friends, you now have 10B rows.
So now you have a user and a list of their friends IDs. How do we get their friend's names? Well you go through the list of 100 IDs and pull down each of the friends. Perfect.
So now, if you want to show one user the names of all of their friends, all you have to do is join the 100M record table to the 10B record table. This is not a simple task. Scaling joins becomes exponentially harder and more expensive as the dataset grows.
So, to make this easier, you're probably going to run a for loop and manually collect the records for each friend. You have to do this because the friends are scattered across multiple servers, so each "lookup" has to be done individually.
Already you've broken your "relational model".
What about the friends list? Is keeping a table of 10B records really practical? Why not just keep a list of friend IDs with each user? Why do an extra query.
If you notice the pattern here, we've basically broken down the "very relational" model into something that's effectively key-value lookups. Of course, the key-value model will scale much better. And so, MongoDB seems like a good fit here.
Don't get me wrong, there are lots of good uses for relational databases. But when you're talking about handling millions of individual key-value style requests, you probably want to look at a NoSQL database.
There is no law that you have to build an application with exactly one database. It is often common practice having dedicated backends for particular tasks. E.g. in the context of a Facebook-like application it may make sense to work with a graph-database for storing relations between users - every database has its pros and cons and only would fools implement large backends with only a RDBMS or only a NoSQL db just because they don't know better.

Database structure - is mySQL the right choice?

We are currently planning the database structure of a quite complex e-commerce web app that has flexibility as its main cornerstone.
Our app features a large amount of data (products) and we have run into a slight headache trying to keep performance high without compromizing normalization rules in the database, or leaving our highly beloved flexibility concept behind when integrating product options (also widely known as product attributes or parameters).
Based on various references and sources available, we have made up lists on pros and cons of all major and well known database patterns to solve this. After comparing these, we have come up with two final alternatives:
EAV (Entity-attribute-value model) :
Pros: Database is used for all sorting.
Cons: All related queries will include a number of joins between multiple tables in order to complete the collection of data.
SLOB (Serialized LOB, also known as Facade?) :
Pros: Very flexible. Keeping the number of necessary joins low compared to a EAV design pattern. Easy to update/add/remove data from each product but hard to keep data integrity without additional tables.
Cons: All sorting will be done by the application instead of the database. Will use lots of performance (memory?) when big datasets is processed by a large number of users.
Our main questions:
Which pattern/structure would you use, or maybe even a different solution?
Is there better databases besides mySQL available nowadays to accomplish what we want?
Thanks a lot!
Reference: How to design a product table for many kinds of product where each product has many parameters
Why limit yourself to one model? It's very possible that you'll be better off with two different models where each one meets a specific goal very well.
Assuming, as is often the case, that the two don't have to be absolutely and instantaneously in sync, you might easily end up with much better overall performance. What kind of hard requirements would you have on synchronization? Milliseconds up to a minute?
Udi Dahan has some good information on command query responsibility separation (CQRS) that's relevant. See also a couple of other articles. InfoQ also has very relevant video of Greg Young from QCon08.
EDIT: Here's another video (by Udi Dahan) that discusses, among other things, the benefits of multiple models.
MySQL performs very well even for very large datasets. I use it at a financial services SaaS company and it has always worked well. I have also use SQL Server and Oracle for very large applications and MySQL performs no better or worse on whole. My focus is more the business layer, though, and you may get more detailed opinions from people closer to the DB.
When selecting a pattern, keep in mind that it's much more straightforward to scale the application tier than the data tier (easy and cheap to add application servers). Performing many joins for common operations can cause a real performance bottleneck.
I would suggest you prototype both approaches so that you can both get more familiar with each of them, and benchmark their performance in your specific environment.
Additionally, you may want to look into alternatives to SQL that attempt to achieve a pattern similar to the ones you outline. A friend at a very large, well-known Internet company is starting to use Project Voldemort. He prefers it over similar efforts mostly due to the very active community.
from your solution, it seems you don't want to use a relational model, so perhaps it's better not to use a relational database, take a look at these alternatives: http://nosql-database.org/ btw SQLServer has nice SLOB features in the form of xml fields (can be indexed an queried through XQuery)

do document-oriented databases have integrity?

I'm coming from a MySQL background, and I'm interested in document-oriented databases, specifically CouchDB. One of the things I'm interested in is data integrity. How do document-oriented databases handle this? For instance, in RDBMSes, there are ways to prevent duplication of records, or guaranteeing that if you have one bit of information, you will have another, or else none at all.
I guess more broadly, my question is, what types of problems are RDBMSes cut out for, compared to problems that DODBes are used for? I looked on some of the other stackoverflow questions for an explanation, but didn't find any good ones.
Also, with my databases at work, I do a lot of reporting, with summing and averaging values, and historical trending. Is this something appropriate for document-oriented databases?
Most of the document-databases have only support very limited integrity or no integrity checks. They rely on the application to ensure that the data is correct. I can tell you how it is in CouchDB.
To the second part. I think RDBMS do very well at reporting and analyzing data. The fact that you can run complex queries on the data with joins, aggregations, functions etc make RDBMS a very powerful reporting-tool.
Document-databases do really well for storing the 'live' application-data. It very easy to store an retrieve object-graph into document-databases. The schema-free design makes it easy to extends the model for new application features. However this only works if you can split your application-data into nice documents. Otherwise you loose a lot of the elegance.
If you want to do mostly reporting, I would prefer a RDBMS. When to store lots of flat, simple records it very easy to do reporting on it. The tooling etc. is perfect for reporting. However when want to do reporting on complex structured data, you probably still better of with another database desgine than a RDBMS.
However this doesn't mean you need to limit yourself to RDBMS. You could combine the two technologies. Imagine a blog-software. You store the 'live' application data like blog-posts and comments into the documentdatabase. Data for reporting like click- and login-statistics is stored in a RDBMS. See also Rob Conerys post.