Session storage preferences in node.js - mysql

I have a node.js application that uses a MySQL database. I wanted to know what would be a good place for storing the sessions?
My application is actually a final project for one of my courses, but it could be a real world application later, as we are re-writing a software that is currently used by the university. I can use MySQL for session store, but I want to make my application using the most reliable or best practice in my situation.
I have read many posts/answers/forums, and the opinion is divided. Using another technology like Memcached/MemcacheDB or Redis, just for session store, would it be a recommended approach? Or should I just stick to MySQL, and later deal with scaling if the server load increases?
Even if the application is later used in real world, it would only be used by the undergraduate university students and faculties, so the users are sort of limited.
As of now, I'm leaning towards MySQL for the session store.

I am replying under the assumption that you are using MySQL throughout the whole application.
If the application will be used in the context of your university possibly it will not have scaling issues. SQL databases are not bad, they are able to handle quite a lot of data efficiently, you just need to be careful in the first place and to create efficient queries. Be careful with the joins because can really kill the server. You need to analyze quite a lot your application. For example, why do you think that you will have scaling/performance issues on the sessions and not in another place of your application? Do a bit of load testing, get some metrics and try to understand if you need it or no.
If you are a student though and you don't have prior experience with redis, I would go with redis because it is good to work with a new technology and gain a bit more of experience :)

Related

MongoDB, MySQL or something third for my project?

Ok guys.
I've begun developing a little sparetime project that might become big someday. Before I really get started, I want to be certain that I'm starting with the right setup. So I come to you.
I'm making a service, which will work mostly as a todolist/project planner.
In this system there will be an amount of users and an amount of tasks. Each task can be assigned to multiple users, and each user can have multiple tasks (many to many relation).
Until now I was planning to use MySQL, but a friend of mine, who is part of the project, sugested MongoDB instead. He tells me that it would increase performance and be more scaleable.
On the other hand I'm thinking that in order to either get all tasks assigned to a specific user, or all users assigned to a specifik task, one would need to use joins, which MongoDB doesnt have (or have in a cumbersome way as far as I have understood).
Now my question to you is "Which DB system would you suggest. MySQL or MongoDB or a third option? And why?"
Thank you for your time and your assistance.
Morten
We use MySQL at IGN to store person relationships (many-to-many like your use case), and have about 5M records in the relationship table. We have 4 MySQL servers in a cluster and the reads are distributed across 3 MySQL slaves. BTW you can always denormalize to optimize reads and penalizing writes among other things based on the read/write heavyness of your system.
We use the DAO pattern with Spring, so its fairly easy for us to swap DB providers through configuration (and by writing a Mongo/MySQL DAO Implementation as applicable). We have moved activities (like in Social Media) to Mongo almost a year ago but the person relationships are living happily in MySQL.
The comment to your post by Jonas says it all,
If need be, you can always scale later.
This.
I am very much of the mindset that If you don't have scaling problems, don't worry too much (if at all) about scaling problems. Why not use what is easiest, smartest and cleanest to deliver the features clients pay for (in my case at least!) This approach saves a lot of time and energy and is the proper one for 9 projects out of 10.
Learning a technology because it scales is great. Being tied to an unlearned technology and unknown technology because it scales in an upcoming project, is not as great. There are many other factors than scalability, when using 3rd party stuff.
MySQL would seem to be a good choice MySQL being more mature and having loads of client libraries, ORM's and other timesaving technologies. MySQL can handle millions (billions if you have the ram) of rows. I have yet to encounter a project it could not handle, and I have seen some pretty impressive datasets!
Of course, when you will need performance, sure maybe you will find yourself ripping out orm and sql generating code to replace with your own hand tweaked queries, but that day is way down the line and chances are, that day will never even come.
Mongodb, although it is real cool I am sorry to say may well bring you issues having nothing to do with scaling.
My 2 cents, happy coding!
MySQL
Either would likely work for your purposes, but your database seems relatively rigid in its structure, something which SQL deals well with. As such, I would recommend MySQL. A many-to-many relationship is relatively easy to implement and access, as well.
You may take a tiny bit of a performance hit, but in my experience, this is generally not extremely noticeable with smaller scale applications (i.e. databases with less than millions of entries). I do agree with #Jonas Elfström's comment, however: you should have an abstraction layer between your application and the database, so that should scaling become an issue, you can address it without too many problems.
Stick with a relational database, it can handle many to many relationships and is fully featured for backup and recovery, high availability and importantly you will find that every developer you need is familiar with it. There are plenty of documented methods for scaling a relational database.
Pick an open source databases either MySQL or Postgres dependant upon which your team is most familiar with and how it integrates into the rest of your infrastructure stack.
Make sure you design your data model correctly most importantly the relationships between the entities.
Good luck!

MySQL stored procedures use them or not to use them

We are at the beginning of a new project, and we are really wondering if we should use stored procedures in MySQL or not.
We would use the stored procedures only to insert and update business model entities. There are several tables which represent a model entity, and we would abstract it in those stored procedures insert/update.
On the other hand, we can call insert and update from the Model layer but not in MySQL but in PHP.
In your experience, Which is the best option? advantages and disadvantages of both approaches. Which is the fastest one in terms of high performance?
PS: It is is a web project with mostly read and high performance is the most important requisite.
Unlike actual programming language code, they:
not portable (every db has its own version of PL/SQL. Sometimes different versions of the same database are incompatible - I've seen it!)
not easily testable - you need a real (dev) database instance to test them and thus unit testing their code as part of a build is virtually impossible
not easily updatable/releasable - you must drop/create them, ie modify the production db to release them
do not have library support (why write code when someone else has)
are not easily integratable with other technologies (try calling a web service from them)
they use a language about as primitive as Fortran and thus are inelegant and laborious to get useful coding done, so it is difficult to express business logic, even though typically that is what their primary purpose is
do not offer debugging/tracing/message-logging etc (some dbs may support this - I haven't seen it though)
lack a decent IDE to help with syntax and linking to other existing procedures (eg like Eclipse does for java)
people skilled in coding them are rarer and more expensive than app coders
their "high performance" is a myth, because they execute on the database server they usually increase the db server load, so using them will usually reduce your maximum transaction throughput
inability to efficiently share constants (normally solved by creating a table and questing it from within your procedure - very inefficient)
etc.
If you have a very database-specific action (eg an in-transaction action to maintain db integrity), or keep your procedures very atomic and simple, perhaps you might consider them.
Caution is advised when specifying "high performance" up front. It often leads to poor choices at the expense of good design and it will bite you much sooner than you think.
Use stored procedures at your own peril (from someone who's been there and never wants to go back). My recommendation is to avoid them like the plague.
Unlike programming code, they:
render SQL injection attacks almost
impossible (unless you are are
constructing and executing dynamic
SQL from within your procedures)
require far less data to be sent over
the IPC as part of the callout
enable the database to far better
cache plans and result sets (this is
admittedly not so effective with
MySQL due to its internal caching
structures)
are easily testable in isolation
(i.e. not as part of JUnit tests)
are portable in the sense that they
allow you to use db-specific
features, abstracted away behind a
procedure name (in code you are stuck
with generic SQL-type stuff)
are almost never slower than SQL
called from code
but, as Bohemian says, there are plenty of cons as well (this is just by way of offering another perspectve). You'll have to perhaps benchmark before you decide what's best for you.
As for performances, they have the potential to be really performant in a future MySQL version (under SQL Server or Oracle, they are a real treat!). Yet, for all the rest... They totally blow up competition. I'll summarize:
Security: You can give your app the EXECUTE right only, everything is fine. Your SP will insert update select ..., with no possible leak of any sort. It means global control over your model, and an enforced data security.
Security 2: I know it's rare, but sometimes php code leaks out from the server (i.e. becomes visible to public). If it includes your queries, possible attackers know your model. This is pretty odd but I wanted to signal it anyway
Task force: yes, creating efficient SQL SPs requires some specific resources, sometimes more expensive. But if you think you don't need these resources just because you're integrating your queries in your client... you're going to have serious problems. I'd mention the analogy of web development: it's good to separate the view from the rest because your designer can work on their own technology while the programmers can focus on programming the business layer.
Encapsulating business layer: using stored procedures totally isolates the business where it belongs: the damn database.
Quickly testable: one command line under your shell and your code is tested.
Independence from the client technology: if tomorrow you'd like to switch from php to something else, no problem. Ok, just storing these SQL in a separate file would do the trick too, that's right. Also, good point in the comments about if you decide to switch sql engines, you'd have a lot of work to do. You have to have a good reason to do that anyway, because for big projects and big companies, that rarely happens (due to the cost and HR management mostly)
Enforcing agile 3+-tier developments: if your database is not on the same server than your client code, you may have different servers but only one for the database. In that case, you don't have to upgrade any of your php servers when you need to change the SQL related code.
Ok, I think that's the most important thing I had to say on the subject. I developed in both spirits (SP vs client) and I really, really love the SP style one. I just wished Mysql had a real IDE for them because right now it's kind of a pain in the ass limited.
Stored procedures are good to use because they keep your queries organized and allow you to perform a batch at once. Stored procedures are normally quick in execution because they are pre-compiled, unlike queries that are compiled on every run. This has significant impact in situations where database is on a remote server; if queries are in a PHP script, there are multiple communication between the application and the database server - the query is send, executed, and result thrown back. However, if using stored procedures, it only need to send a small CALL statement instead of big, complicated queries.
It might take a while to adapt to programming a stored procedure because they have their own language and syntaxes. But once you are used to it, you'll see that your code is really clean.
In terms of performance, it might not be any significant gain if you use stored procedures or not.
I will let know my opinion, despite my toughts possibly are not directly related to the question.:
As in many issues, reply about using Stored Procedures or an application-layer driven solution relies on questions that will drive the overall effort:
What you want to get.
Are you trying to do either batch operations or on-line operations? are they completely transactional? how recurrent are those operations? how heavy is the awaited workload for the database?
What you have in order to get it.
What kind of database technology you have? What kind of infrastucture? Is your team fully trained in the database technology? Is your team better capable of building a database-aegnostic solution?
Time for get it.
No secrets about that.
Architecture.
Is your solution required to be distributed onto several locations? is your solution required to use remote communications? is your solution working on several database servers, or possibly using a cluster-based architecture?
Mainteinance.
How much is the application required to change? do you have personal specifically trained for maintain the solution?
Change Management.
Do you see your database technology will change at a short, middle, long time? do you see will be required to migrate the solution frequently?
Cost
How much will cost to implement that solution using one or another strategy?
The overall of those points will drive the answer. So you have to care each of this points when making a decision about using or not any strategy. There are cases where using of stored procedures are better than application-layer managed queries, and others when, conducting queries and using an application-layer based solution is best.
Using of stored procedures tends to be more addequate when:
Your database technology isn't provided to change at a short time.
Your database technology can handle parallelized operations, table partitions or anything else strategy for divide the workload onto several processors, memory and resources (clustering, grid).
Your database technology is fully integrated with the stored proceduce definition language, that is, support is inside the database engine.
You have a development team who aren't afraid about using a procedural language (3rd. Generation language) for getting a result.
Operations you wanna achieve are built-in or supported inside the database (Exporting to XML data, managing data integrity and coherence appropiately with triggers, scheduled operations, etc).
Portability isn't an important issue and you do not whatch a technology change at a short time into your organization, even, it is not desirable. Generally, portability is seen like a milestone by the application-driven and layered-oriented developers. From my point of view, portability isn't an issue when your application isn't required to be deployed for several platforms, less when there are no reasons for making a technology change, or the effort for migrating all the organizational data is higher than the benefit for making a change. What you can win by using an application-layer driven approach (portability) you can loose in performance and value obtained from your database (Why to spend thousands of dollars for to get a Ferrari that you'll drive no more than 60 milles/hr?).
Performance is an issue. First: In several cases, you can achieve better results by using a single stored procedure call than multiple requests for data from another application. Moreover, some characteristics you need to perform may be built-in your database and its use less expensive in terms of workload. When you use an application-layer driven solution you have to take in account the cost associated to make database connections, making calls to the database, network traffic, data wrapping (i.e., using either Java or .NET, there is an implicit cost when using JDBC/ADO.NET calls as you have to wrap your data into objects that represents the database data, so instantiation has an associated cost in terms of processing, memory, and network when data comes from and goes to outside).
Using of application-layer driven solutions tends to be more addequate when:
Portability is an important issue.
Application will be deployed onto several locations with only one or few database repositories.
Your application will use heavy business-oriented rules, that need to be agnostic of the underlying database technology.
You have in mind to do change technology providers based on market tendencies and budget.
Your database isn't fully integrated with the stored procedure language that calls to the database.
Your database capabilities are limited and your requirement goes beyond what you can achieve with your database technology.
Your application can support the penalty inherent to external calls, is more transactional-based with business-specific rules and has to abstract the database model onto a business model for the users.
Parallelizing database operations isn't important, moreover, your database has not parallelization capabilities.
You have a development team which is not well-trained onto the database technology and is better productive by using an application-driven based technology.
Hope this may help to anyone asking himself/herself what is better to use.
I would recommend you don't use stored procedures:
Their language in MySQL is very crappy
There is no way to send arrays, lists, or other types of data structure into a stored procedure
A stored procedure cannot ever change its interface; MySQL permits neither named nor optional parameters
It makes deploying new versions of your application more complicated - say you have 10x application servers and 2 databases, which do you update first?
Your developers all need to learn and understand the stored procedure language - which is very crap (as I mentioned before)
Instead, I recommend to create a layer / library and put all your queries in there
You can
Update this library and ship it on your app servers with your app
Have rich data types, such as arrays, structures etc passed around
Unit test this library, instead of the stored procedures.
On performance:
Using stored procedures will decrease the performance of your application developers, which is the main thing you care about.
It is extremely difficult to identify performance problems within a complicated stored procedure (it is much easier for plain queries)
You can submit a query batch in a single chunk over the wire (if CLIENT_MULTI_STATEMENTS flag is enabled), which means you don't get any more latency without stored procedures.
Application-side code generally scales better than database-side code
If your database is complex and not a forum type with responses, but true warehousing SP will definitely benefit. You can out all your business logic in there and not a single developer is going to care about it, they just call your SP's. I have been doing this joining over 15 tables is not fun, and you cannot explain this to a new developer.
Developers also don't have access to a DB, great! Leave that up to database designers and maintainers. If you also decide that the table structure is going to get changed, you can hide this behind your interface. n-Tier, remember??
High performance and relational DB's is not something that goes together, not even with MySQL InnoDB is slow, MyISAM should be thrown out of the window by now. If you need performance with a web-app, you need proper cache, memcache or others.
in your case, because you mentioned 'Web' I would not use stored procedures, if it was data warehouse I would definitely consider it (we use SP's for our warehouse).
Tip:
Since you mentioned Web-project, ever though about nosql sort of solution? Also, you need a fast DB, why not use PostgreSQL? (trying to advocate here...)
I used to use MySql and my understanding of sql was poor at best, I spent a fair amount of time using Sql Server, I have a clear separation of a data layer and an application layer, I currently look after a server with 0.5 terabytes.
I have felt frustrated at times not using an ORM as development is really quick with stored procedures it is much slower. I think much of our work could have been sped up by using an ORM.
When your application reaches critical mass, the ORM performance will suffer, a well written stored procedure, will give you your results faster.
As an example of performance I collect 10 different types of data in an application, then convert that to XML, which I process in the stored procedure, I have one call to the database rather than 10.
Sql is really good at dealing with sets of data, one thing that gets me frustrated is when I see someone getting data from sql in a raw form and using application code to loop over the results and format and group them, this really is bad practice.
My advice is to learn and understand sql enough and your applications will really benefit.
Lots of info here to confuse people, software development is a evolutionary. What we did 20 years ago isn't best practice now. Back in the day with classic client server you wouldnt dream of anything but SPs.
It is absolutely horses for courses, if you are a big organisation with you will use multi tier, and probably SPs but you will care little about them because a dedicated team will be sorting them out.
The opposite which is where I find myself trying to quickly knock up a web app solution, that fleshes out business requirements, it was super fast to leave the developer (remote to me) to knock up the pages and SQL queries and I define the DB structure.
However complexity is growing and without an easy way to provide APIs, I am staring to use SPs to contain the business logic. I think it is working well and sensible, I control this because I can build logic and provide a simple result set for my offshore developer to build a front end around.
Should I find my software a phenomenal success, then more separation of concerns will occur and different implementations of n teir will come about but for now SPs are perfect.
You should know all the tool sets available to you and match them is wise to start with. Unless you are building an enterprise system to start with then fast and simple is best.
I would recommend that you stay away from DB specific Stored Procedures.
I've been through a lot of projects where they suddently want to switch DB platform and the code inside a SP is usually not very portable = extra work and possible errors.
Stored Procedure development also requires the developer to have access directly to the SQL-engine, where as a normal connection can be changed by anyone in the project with code-access only.
Regarding your Model/layer/tier idea: yes, stick with that.
Website calls Business layer (BL)
BL calls Data layer (DL)
DL calls whatever storage (SQL, XML, Webservice, Sockets, Textfiles etc.)
This way you can maintain the logic level between tiers. IF and ONLY IF the DL calls seems to be very slow, you can start to fiddle around with Stored Procedures, but maintain the original none-SP code somewhere, if you suddently need to transfer the DB to a whole new platform. With all the Cloud-hosting in the business, you never know whats going to be the next DB platform...
I keep a close eye on Amazon AWS of the very same reason.
I think there is a lot of misinformation floating around about database stored queries.
I would recommend using MySQL Stored Procedures if you're doing many static queries for data manipulation. Especially if you're moving things from one table to another (i.e. moving from a live table to a historical table for whatever reason). There are drawbacks of course in that you'll have to keep a separate log of changes to them (you could in theory make a table that just holds changes to the stored procedures that the DBA's update). If you have many different applications interfacing with the database, especially if say you have a desktop program written in C# and a web program in PHP, it might be more beneficial to have some of your procedures stored in the database as they are platform independent.
This website has some interesting information on it you may find useful.
https://www.sitepoint.com/stored-procedures-mysql-php/
As always, build in a sandbox first, and test.
Try to update 100,000,000 records on a live system from a framework, and let me know how it goes. For small apps, SPs are not a must, but for large serious systems, they are a real asset.

MongoDB vs MySQL

I used to build Ruby on Rails apps with MySQL.
MongoDB currently become more and more famous and I am now starting to give it a try.
The problem is, I don't know the underlying theory of how MongoDB is working (am using mongoid gem if it matter)
So I would like to have a comparison on the performance between using MySQL+ActiveRecord and model generated by mongoid gem, could anyone help me to figure it out?
The article entitled: What the heck are you actually using NoSQL for? does a very good job at presenting the pros and cons of using NoSQL.
Edit: Also read http://blog.fatalmind.com/2011/05/13/choosing-nosql-for-the-right-reason/ blog post too
Re-edit: I found some recent material (published in 2014) on this topic that I consider to be relevant: What’s left of NoSQL?
I don't know much of the underlying theory. But this is the advice I got: only use MongoDB if you run it across multiple servers; that's when it'll shine. As far as I understand, the NoSQL movement appeared in no small part due to the pain of load-balancing relational databases across multiple servers. So if you're hosting your application on no more than one server, MySQL would be the preferred choice.
The good people over at the Doctrine project recently wrote a quite useful blog post on the subject.
From what I have read so far... here is my take on it.
Standard SQL trades lower performance for feature richness... i.e. it allows you to do Joins and Transactions across data sets (tables/collections if you will) among other things.
This allows a application developer to push some of the application complexity into the database layer. This has it's advantages of not having to worry about data integrity and the rest of the ACID properties by the application by depending upon proven technology.
The lack of extreme scalability works for pretty much all projects as long as one can manage to keep the application working within expected time limits, which may sometimes result in having to purchase high performance/expensive relational database systems.
On the other hand, Mongo DB, deliberately excludes much of the inherent complexity associated with relational databases, there by allowing for better scalable performance.
This approach forces the application developer to re-architect the application to work around the lack of relational features... which in and itself is a good thing, but the effort involved is generally only worth it if you have the scalability requirements. Please note that with MongoDB depending upon the data requirements w.r.t ACID properties, the application will have to step up and handle as necessary.

What database systems should a startup company consider?

Right now I'm developing the prototype of a web application that aggregates large number of text entries from a large number of users. This data must be frequently displayed back and often updated. At the moment I store the content inside a MySQL database and use NHibernate ORM layer to interact with the DB. I've got a table defined for users, roles, submissions, tags, notifications and etc. I like this solution because it works well and my code looks nice and sane, but I'm also worried about how MySQL will perform once the size of our database reaches a significant number. I feel that it may struggle performing join operations fast enough.
This has made me think about non-relational database system such as MongoDB, CouchDB, Cassandra or Hadoop. Unfortunately I have no experience with either. I've read some good reviews on MongoDB and it looks interesting. I'm happy to spend the time and learn if one turns out to be the way to go. I'd much appreciate any one offering points or issues to consider when going with none relational dbms?
The other answers here have focused mainly on the technical aspects, but I think there are important points to be made that focus on the startup company aspect of things:
Availabililty of talent. MySQL is very common and you will probably find it easier (and more importantly, cheaper) to find developers for it, compared to the more rarified database systems. This larger developer base will also mean more tutorials, a more active support community, etc.
Ease of development. Again, because MySQL is so common, you will find it is the db of choice for a great many systems / services. This common ground may make any external integration a little easier.
You are preparing for a situation that may never exist, and is manageable if it does. Very few businesses (nevermind startups) come close to MySQL's limits, and with all due respect (and I am just guessing here); the likelihood that your startup will ever hit the sort of data throughput to cripple a properly structured, well resourced MySQL db is almost zero.
Basically, don't spend your time ( == money) worrying about which db to use, as MySQL can handle a lot of data, is well proven and well supported.
Going back to the technical side of things... Something that will have a far greater impact on the speed of your app than choice of db, is how efficiently data can be cached. An effective cache can have dramatic effects on reducing db load and speeding up the general responsivness of an app. I would spend your time investigating caching solutions and making sure you are developing your app in such a way that it can make the best use of those solutions.
FYI, my caching solution of choice is memcached.
So far no one has mentioned PostgreSQL as alternative to MySQL on the relational side. Be aware that MySQL libs are pure GPL, not LGPL. That might force you to release your code if you link to them, although maybe someone with more legal experience could tell you better the implications. On the other side, linking to a MySQL library is not the same that just connecting to the server and issue commands, you can do that with closed source.
PostreSQL is usually the best free replacement of Oracle and the BSD license should be more business friendly.
Since you prefer a non relational database, consider that the transition will be more dramatic. If you ever need to customize your database, you should also consider the license type factor.
There are three things that really have a deep impact on which one is your best database choice and you do not mention:
The size of your data or if you need to store files within your database.
A huge number of reads and very few (even restricted) writes. In that case more than a database you need a directory such as LDAP
The importance of of data distribution and/or replication. Most relational databases can be more or less well replicated, but because of their concept/design do not handle data distribution as well... but will you handle as much data that does not fit into one server or have access rights that needs special separate/extra servers?
However most people will go for a non relational database just because they do not like learning SQL
What do you think is a significant amount of data? MySQL, and basically most relational database engines, can handle rather large amount of data, with proper indexes and sane database schema.
Why don't you try how MySQL behaves with bigger data amount in your setup? Make some scripts that generate realistic data to MySQL test database and and generate some load on the system and see if it is fast enough.
Only when it is not fast enough, first start considering optimizing the database and changing to different database engine.
Be careful with NHibernate, it is easy to make a solution that is nice and easy to code with, but has bad performance with large amount of data. For example whether to use lazy or eager fetching with associations should be carefully considered. I don't mean that you shouldn't use NHibernate, but make sure that you understand how NHibernate works, for example what "n + 1 selects" -problem means.
Measure, don't assume.
Relational databases and NoSQL databases can both scale enormously, if the application is written right in each case, and if the system it runs on is properly tuned.
So, if you have a use case for NoSQL, code to it. Or, if you're more comfortable with relational, code to that. Then, measure how well it performs and how it scales, and if it's OK, go with it, if not, analyse why.
Only once you understand your performance problem should you go searching for exotic technology, unless you're comfortable with that technology or want to try it for some other reason.
I'd suggest you try out each db and pick the one that makes it easiest to develop your application. Go to http://try.mongodb.org to try MongoDB with a simple tutorial. Don't worry as much about speed since at the beginning developer time is more valuable than the CPU time.
I know that many MongoDB users have been able to ditch their ORM and their caching layer. Mongo's data model is much closer to the objects you work with than relational tables, so you can usually just directly store your objects as-is, even if they contain lists of nested objects, such as a blog post with comments. Also, because mongo is fast enough for most sites as-is, you can avoid dealing the complexities of caching and generally deliver a more real-time site. For example, Wordnik.com reported 250,000 reads/sec and 100,000 inserts/sec with a 1.2TB / 5 billion object DB.
There are a few ways to connect to MongoDB from .Net, but I don't have enough experience with that platform to know which is best:
Norm: http://wiki.github.com/atheken/NoRM/
MongoDB-CSharp: http://github.com/samus/mongodb-csharp
Simple-MongoDB: http://code.google.com/p/simple-mongodb/
Disclaimer: I work for 10gen on MongoDB so I am a bit biased.

Which is the Best database for Rails application?

I am developing a Rails application that will access a lot of RSS feeds or crawl sites for data (mostly news). It will be something like Google News but with a different approach, so I'll store a lot of news (or news summaries), classify them in different categories and use ranking and recommendation techniques.
Should I go with MySQL?
Is it worthwhile using IBM DB2
purexml to store the doucuments?
Also Ruby search implementations
(Ferret, Ultrasphinx and others) are
not needed If I choose DB2. Is that correct?
What are the advantages of
PostreSQL in this?
Does it makes sense to use Couch DB in
this scenario?
I'd like to choose the best option but without over-complicating the solution. So I discarded the idea to use two different storage solutions (one for the news documents and other for the rest of the data). I'm also considering only "free" options, so I didn't look at Oracle or MS SQL Server.
purexml is heavier than SQL, so you pay more for your roundtrip between webserver and DB. If you plan to have lots of users, I'd avoid it, your better off letting your webserver cache the requests, thus avoiding creating xml(rss) everytime, if that is what you are thinking about.
I'd go with MySQL because its really good at serving and its totally free, well PostgreSQL is too, but haven't used it so I can't say.
CouchDB could make sense, but not if you plan on doing OLAP (Offline Analysis) of your data, a normal RDBMS will be better at it.
Admitting firstly that I generally don't like mysql, I will say that there has been writing on this topic regarding postgres:
http://oldmoe.blogspot.com/2008/08/101-reasons-why-postgresql-is-better.html
This is always my choice when I need a pure relational database. I don't know whether a document database would be more appropriate for your application without knowing more about it. It does sound like it's something you should at least investigate.
MySQL is probably one of the best options out there; light, easy to install and maintain, multiplatform and free. On top of that there are some good free client tools.
Something to think about; because of the nature of your system you will probably have some tables that will grow quite a lot very quickly so you might want to think about performance.
Thus, MySQL supports vertical partitioning but only from V 5.1.
It sounds to me the application you will build can easily become a large-scale web app. I would suggest PostgreSQL, for it has been known for its reliability.
You can check out the following link -- Bob Ippolito from MochiMedia tells us why they ditched MySQL for PostgreSQL. Although the posts are more than 3 years old, the issues MySQL 5.1 has recently tend to prove that they are still relevant.
http://bob.pythonmac.org/archives/category/sql/mysql/
MySQL is good in production. I haven't used PostgreSQL for rails, but it's a good solution as well.
In the dev and test environments I'd start out with SQLite (default), and perhaps migrate to your target DB in the test environment as you move closer to completion.