Using MongoDB and MySQL in unison - mysql

Some parts of my web app would work very well with a RDBMS, such as user and URL handling - I want to normalize users, emails, hosts (ie stackoverflow.com), and urls (ie https://stackoverflow.com/questions/ask) so that updating things in one place update things in all places and to minimize redundancy.
But some parts of my web app would very well with a document-based database, like Mongo, because they have a lot of components that would work more efficiently as embedded objects.
Would it make sense to use MySQL for the relational objects and Mongo for the document objects, or would it be not worth the hassle to have to manage two types of databases? I know that Mongo has references, but I get the idea that it is not really designed and optimized for references.
Thanks!
PS: I read this: Using combination of MySQL and MongoDB and it scratches the edge of what I am asking, but it is really a completely different question.

We use Mongo and MySQL in unision. Yes there is additional maintenance involved but it is about using the right tool for the right job. We use Mongo for a more real-time scenario where we need fast reads and writes and can do without persisting data for long periods of time. MySQL for everything else.
That being said, your needs may be unique and you need to figure out the right tool for the job.

I recently built a system using MySql for as the RDBMS managing users and blogging and MongoDB for searchable attributes. It works well however keeping data in sync, especially user Id's etc requires a bit of work. It is a case of basically choosing the right tool for the job.

Related

SQLite3 database per customer

Scenario:
Building a commercial app consisting in an RESTful backend with symfony2 and a frontend in AngularJS
This app will never be used by many customers (if I get to sell 100 that would be fantastic. Hopefully much more, but in any case will be massive)
I want to have a multi tenant structure for the database with one schema per customer (they store sensitive information for their customers)
I'm aware of problem when updating schemas but I will have to live with it.
Today I have a MySQL demo database that I will clone each time a new customer purchase the app.
There is no relationship between my customers, so I don't need to communicate with multiple shards for any query
For one customer, they can be using the app from several devices at the time, but there won't be massive write operations in the db
My question
Trying to set some functional tests for the backend API I read about having a dedicated sqlite database for loading testing data, which seems to be good idea.
However I wonder if it's also a good idea to switch from MySQL to SQLite3 database as my main database support for the application, and if it's a common practice to have one dedicated SQLite3 database PER CLIENT. I've never used SQLite and I have no idea if the process of updating a schema and replicate the changes in all the databases is done in the same way as for other RDBMS
Is this a correct scenario for SQLite?
Any suggestion (aka tutorial) in how to achieve this?
[I wonder] if it's a common practice to have one dedicated SQLite3 database PER CLIENT
Only if the database is deployed along with the application, like on a phone. Otherwise I've never heard of such a thing.
I've never used SQLite and I have no idea if the process of updating a schema and replicate the changes in all the databases is done in the same way as for other RDBMS
SQLite is a SQL database and responds to ALTER TABLE and the like. As for updating all the schemas, you'll have to re-run the update for all schemas.
Schema synching is usually handled by an outside utility, usually your ORM will have something. Some are server agnostic, some only support specific servers. There are also dedicated database change management tools such as Sqitch.
However I wonder if it's also a good idea to switch from MySQL to SQLite3 database as my main database support for the application, and
SQLite's main advantage is not requiring you to install and run a server. That makes sense for quick projects or where you have to deploy the database, like a phone app. For server based application there's no problem having a database server. SQLite's very restricted set of SQL features becomes a disadvantage. It will also likely run slower than a server database for anything but the simplest queries.
Trying to set some functional tests for the backend API I read about having a dedicated sqlite database for loading testing data, which seems to be good idea.
Under no circumstances should you test with a different database than the production database. Databases do not all implement SQL the same, MySQL is particularly bad about this, and your tests will not reflect reality. Running a MySQL instance for testing is not much work.
This separate schema thing claims three advantages...
Extensibility (you can add fields whenever you like)
Security (a query cannot accidentally show data for the wrong tenant)
Parallel Scaling (you can potentially split each schema onto a different server)
What they're proposing is equivalent to having a separate, customized copy of the code for every tenant. You wouldn't do that, it's obviously a maintenance nightmare. Code at least has the advantage of version control systems with branching and merging. I know only of one database management tool that supports branching, Sqitch.
Let's imagine you've made a custom change to tenant 5's schema. Now you have a general schema change you'd like to apply to all of them. What if the change to 5 conflicts with this? What if the change to 5 requires special data migration different from everybody else? Now let's imagine you've made custom changes to ten schemas. A hundred. A thousand? Nightmare.
Different schemas will require different queries. The application will have to know which schema each tenant is using, there will have to be some sort of schema version map you'll need to maintain. And every different possible query for every different possible schema will have to be maintained in the application code. Nightmare.
Yes, putting each tenant in a separate schema is more secure, but that only protects against writing bad queries or including a query builder (which is a bad idea anyway). There are better ways mitigate the problem such as the view filter suggested in the docs. There are many other ways an attacker can access tenant data that this doesn't address: gain a database connection, gain access to the filesystem, sniff network traffic. I don't see the small security gain being worth the maintenance nightmare.
As for scaling, the article is ten years out of date. There are far, far better ways to achieve parallel scaling then to coarsely put schemas on different servers. There are entire databases dedicated to this idea. Fortunately, you don't need any of this! Scaling won't be a problem for you until you have tens of thousands to millions of tenants. The idea of front loading your design with a schema maintenance nightmare for a hypothetical big parallel scaling problem is putting the cart so far before the horse, it's already at the pub having a pint.
If you want to use a relational database I would recommend PostgreSQL. It has a very rich SQL implementation, its fast and scales well, and it has something that renders this whole idea of separate schemas moot: a built in JSON type. This can be used to implement the "extensibility" mentioned in the article. Each table can have a meta column using the JSON type that you can throw any extra data into you like. The application does not need special queries, the meta column is always there. PostgreSQL's JSON operators make working with the meta data very easy and efficient.
You could also look into a NoSQL database. There are plenty to choose from and many support custom schemas and parallel scaling. However, it's likely you will have to change your choice of framework to use one that supports NoSQL.

MongoDB, Mysql and relationships

I'm creating an online chat.
Context (if needed):
So far I was using PHP/MySQL and AJAX to do the job but this is not a healthy solution as I'm stuck with a "pull" type application with concerns about scalability.
I read about the "push" method alternatives and it seems that my choices are limited and exclude PHP.
Websockets could be a very interesting option if it was integrated in every browser but that's not the case (and it seems that for most of those implementing it, it is disabled by default).
Long polling would also be a candidate but it involves other issues like the number of concurrent open connections that may kill your web app too.
This is why, against my will, I think that my only viable option is to use server-side javascript (node.js + now.js would be my choice then).
This said, I may need to rethink the use of a database too.
I need to keep stored data of each users and link these users to their submitted messages.
In case of a chat engine driven by a push system, would MySQL still be a valuable choice then?
I read about NoSQL data management and it seems that MongoDB would be a good addition to node.js.
My two questions:
Is there a reason I'm better off moving to a NoSQL system (which I need to learn from scratch) instead of MySQL (which I know already) in case of a real time web app?
Let's say that in MySQL:
I have a table called user (user_id_p, username)
I have a table called messages (message_id, message, user_id_f)
I want to make a single query to get all the messages associated with the username "omgtheykilledkenny".
Simple enough but how can I achieve that with MongoDB and its collections philosophy?
Thank you for your help.
Working with node.js/MongoDB is cool because Mongo's document structure is already JSONish, so you don't have to convert your queries to JSON. If you already know JavaScript, you have a headstart learning MongoDB. Mongo does scale for writes and reads pretty easily, the speed is pretty awesome, although I've seen some MySQL benchmarks on a single system that compare well to Mongo--it really shines when you start needing multiple boxes.
Assuming you have a separate messages collection, and you already know the id of the user you could just do: db.messages.find({user_id:ObjectId(...)});
Update: If you don't know the user id, then you need to do two queries, yes (unless you use an embedded array as recommended in the other answer--I would advise against that for this sort of use case, though, because you'll end up querying the entire document/list of messages even to display just a subset). Depending on your use case, obviously, if you have the username, you could also keep the user id handy, for situations like this. If it's client input giving the username that wouldn't work.
Update2: If you have unique usernames, you could make the username the _id for the users collection to avoid this issue. Most people would probably advise against this, and it has some definite drawbacks, such as making it harder to change a username.
You can't perform joins in MongoDB, so you can't achieve your second requirement. The Mongo way so do this would be either to nest messages within the user collection:
{ username: 'abc', messages: [...]}
Or use refId's, which is a kind of half-way house between joins and nested documents:
http://uk3.php.net/manual/en/class.mongodbref.php
In terms of switching from MySQL to Mongo, you don't necessarily need to ditch MySQL entirely. There are use cases where one is more appropriate than the other. You could use both for different parts of the system if it's appropriate to do so. Personally, I've used MySQL for a lot of things in the past, and I'm using MongoDB for a big project at the moment. I found the move very easy to make, because it's so easy to use the MongoDB driver, and the MongoDB site is very good for documentation on the whole.
You can convert to and from JSON with json_encode and json_decode from the front end, and you query and insert/update with arrays with MongoDB's PHP driver, so it's arguably more intuitive and easier to use than MySQL. It's just a question of getting used to it.

How do I move from a relational database and do I need to?

I'm currently building a CRM for a niche industry that can be tailored for the end user. We currently are using a base table for leads and then basically a hacked together combination of tables as a key/value store for any custom fields that might be added.
Each lead in the CRM also has tables for various owners, permissions, access logs, scheduled tasks, and generated contracts/documents.
Currently, everything is stored in a MySQL database but it just seems really messy. I'm thinking of using a NoSQL solution like MongoDB and then using Redis to create any relationships between tables.
Example: I'm thinking of storing the leads and users in a MongoDB database and then using Redis for storing what users a certain lead belongs to etc.
Just looking for some general advice on what the best solution is for this situation as I haven't dealt with NoSQL before.
mysql is a great tool
storages is not some stuff that works automagically. For being working well - the great developer needs to implement great schema and great code.
What I wanted to say: if you have some issues now - it doesn't mean that they will disappear once you moved to another database (paradigm, ideology, etc). Even if they moved away - the new ones will come. Nothing happens without hard working.
And especially to big volumes of data and high loaded projects - there is no general advice and silver bullet.
ps:
I'm thinking of using a NoSQL solution like MongoDB and then using Redis to create any relationships between tables.
MongoDB can maintain relations. This is one more argument to not move to the databases you are not well experienced in.

Using combination of MySQL and MongoDB

Does it make sense to use a combination of MySQL and MongoDB. What im trying to do basically is use MySQl as a "raw data backup" type thing where all the data is being stored there but not being read from there.
The Data is also stored at the same time in MongoDB and the reads happen only from mongoDB because I dont have to do joins and stuff.
For example assume in building NetFlix
in mysql i have a table for Comments and Movies. Then when a comment is made In mySQL i just add it to the table, and in MongoDB i update the movies document to hold this new comment.
And then when i want to get movies and comments i just grab the document from mongoDb.
My main concern is because of how "new" mongodb is compared to MySQL. In the case where something unexpected happens in Mongo, we have a MySQL backup where we can quickly get the app fallback to mysql and memcached.
On paper it may sound like a good idea, but there are a lot of things you will have to take into account. This will make your application way more complex than you may think. I'll give you some examples.
Two different systems
You'll be dealing with two different systems, each with its own behavior. These different behaviors will make it quite hard to keep everything synchronized.
What will happen when a write in MongoDB fails, but succeeds in MySQL?
Or the other way around, when a column constraint in MySQL is violated, for example?
What if a deadlock occurs in MySQL?
What if your schema changes? One migration is painful, but you'll have to do two migrations.
You'd have to deal with some of these scenarios in your application code. Which brings me to the next point.
Two data access layers
Your application needs to interact with two external systems, so you'll need to write two data access layers.
These layers both have to be tested.
Both have to be maintained.
The rest of your application needs to communicate with both layers.
Abstracting away both layers will introduce another layer, which will further increase complexity.
Chance of cascading failure
Should MongoDB fail, the application will fall back to MySQL and memcached. But at this point memcached will be empty. So each request right after MongoDB fails will hit the database. If you have a high-traffic site, this can easily take down MySQL as well.
Word of advice
Identify all possible ways in which you think 'something unexpected' can happen with MongoDB. Then use the most simple solution for each individual case. For example, if it's data loss you're worried about, use replication. If it's data corruption, use delayed replication.

What database systems should a startup company consider?

Right now I'm developing the prototype of a web application that aggregates large number of text entries from a large number of users. This data must be frequently displayed back and often updated. At the moment I store the content inside a MySQL database and use NHibernate ORM layer to interact with the DB. I've got a table defined for users, roles, submissions, tags, notifications and etc. I like this solution because it works well and my code looks nice and sane, but I'm also worried about how MySQL will perform once the size of our database reaches a significant number. I feel that it may struggle performing join operations fast enough.
This has made me think about non-relational database system such as MongoDB, CouchDB, Cassandra or Hadoop. Unfortunately I have no experience with either. I've read some good reviews on MongoDB and it looks interesting. I'm happy to spend the time and learn if one turns out to be the way to go. I'd much appreciate any one offering points or issues to consider when going with none relational dbms?
The other answers here have focused mainly on the technical aspects, but I think there are important points to be made that focus on the startup company aspect of things:
Availabililty of talent. MySQL is very common and you will probably find it easier (and more importantly, cheaper) to find developers for it, compared to the more rarified database systems. This larger developer base will also mean more tutorials, a more active support community, etc.
Ease of development. Again, because MySQL is so common, you will find it is the db of choice for a great many systems / services. This common ground may make any external integration a little easier.
You are preparing for a situation that may never exist, and is manageable if it does. Very few businesses (nevermind startups) come close to MySQL's limits, and with all due respect (and I am just guessing here); the likelihood that your startup will ever hit the sort of data throughput to cripple a properly structured, well resourced MySQL db is almost zero.
Basically, don't spend your time ( == money) worrying about which db to use, as MySQL can handle a lot of data, is well proven and well supported.
Going back to the technical side of things... Something that will have a far greater impact on the speed of your app than choice of db, is how efficiently data can be cached. An effective cache can have dramatic effects on reducing db load and speeding up the general responsivness of an app. I would spend your time investigating caching solutions and making sure you are developing your app in such a way that it can make the best use of those solutions.
FYI, my caching solution of choice is memcached.
So far no one has mentioned PostgreSQL as alternative to MySQL on the relational side. Be aware that MySQL libs are pure GPL, not LGPL. That might force you to release your code if you link to them, although maybe someone with more legal experience could tell you better the implications. On the other side, linking to a MySQL library is not the same that just connecting to the server and issue commands, you can do that with closed source.
PostreSQL is usually the best free replacement of Oracle and the BSD license should be more business friendly.
Since you prefer a non relational database, consider that the transition will be more dramatic. If you ever need to customize your database, you should also consider the license type factor.
There are three things that really have a deep impact on which one is your best database choice and you do not mention:
The size of your data or if you need to store files within your database.
A huge number of reads and very few (even restricted) writes. In that case more than a database you need a directory such as LDAP
The importance of of data distribution and/or replication. Most relational databases can be more or less well replicated, but because of their concept/design do not handle data distribution as well... but will you handle as much data that does not fit into one server or have access rights that needs special separate/extra servers?
However most people will go for a non relational database just because they do not like learning SQL
What do you think is a significant amount of data? MySQL, and basically most relational database engines, can handle rather large amount of data, with proper indexes and sane database schema.
Why don't you try how MySQL behaves with bigger data amount in your setup? Make some scripts that generate realistic data to MySQL test database and and generate some load on the system and see if it is fast enough.
Only when it is not fast enough, first start considering optimizing the database and changing to different database engine.
Be careful with NHibernate, it is easy to make a solution that is nice and easy to code with, but has bad performance with large amount of data. For example whether to use lazy or eager fetching with associations should be carefully considered. I don't mean that you shouldn't use NHibernate, but make sure that you understand how NHibernate works, for example what "n + 1 selects" -problem means.
Measure, don't assume.
Relational databases and NoSQL databases can both scale enormously, if the application is written right in each case, and if the system it runs on is properly tuned.
So, if you have a use case for NoSQL, code to it. Or, if you're more comfortable with relational, code to that. Then, measure how well it performs and how it scales, and if it's OK, go with it, if not, analyse why.
Only once you understand your performance problem should you go searching for exotic technology, unless you're comfortable with that technology or want to try it for some other reason.
I'd suggest you try out each db and pick the one that makes it easiest to develop your application. Go to http://try.mongodb.org to try MongoDB with a simple tutorial. Don't worry as much about speed since at the beginning developer time is more valuable than the CPU time.
I know that many MongoDB users have been able to ditch their ORM and their caching layer. Mongo's data model is much closer to the objects you work with than relational tables, so you can usually just directly store your objects as-is, even if they contain lists of nested objects, such as a blog post with comments. Also, because mongo is fast enough for most sites as-is, you can avoid dealing the complexities of caching and generally deliver a more real-time site. For example, Wordnik.com reported 250,000 reads/sec and 100,000 inserts/sec with a 1.2TB / 5 billion object DB.
There are a few ways to connect to MongoDB from .Net, but I don't have enough experience with that platform to know which is best:
Norm: http://wiki.github.com/atheken/NoRM/
MongoDB-CSharp: http://github.com/samus/mongodb-csharp
Simple-MongoDB: http://code.google.com/p/simple-mongodb/
Disclaimer: I work for 10gen on MongoDB so I am a bit biased.