MongoDB as a cache for frequent joins and queries from MySQL

I'm thinking about doing the following and need suggestions on whether it makes sense to approach it this way. Since I can run queries in MongoDB, and MongoDB is wicked fast at them because the hot spots of the data are cached in memory, I was thinking of storing data I would normally join across tables in MySQL in MongoDB instead. While I'm using memcached to store simple query results (for example, a movie description page), for bigger stuff that requires more real-time/on-demand queries I was thinking about storing it in MongoDB. For example, the view count for movies and who saw them, and doing analysis on that.
Hopefully I explained it clearly.
More info:
We don't want to keep writing to our MySQL server on every rating, like, etc. MongoDB seemed like a good option to store the ratings, views, etc. of movies and then later be able to do processing on that data, whereas with memcached the data is not persisted and we're unable to run queries.
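Roughly the kind of write I have in mind (just a sketch in Node.js with the official mongodb driver; the collection and field names like movie_stats, views and ratings are made up):

```
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const stats = client.db("appdb").collection("movie_stats");

// On each view/rating we only touch MongoDB, not MySQL:
await stats.updateOne(
  { movieId: 42 },
  {
    $inc: { views: 1 },                          // bump the view counter
    $push: { ratings: { userId: 7, stars: 4 } }  // keep the raw rating for later analysis
  },
  { upsert: true }                               // create the document on first write
);

await client.close();
```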
Thanks,
Faisal

Memory caching alone is not a good reason to go with MongoDB. Any properly configured RDBMS will cache frequently used data in memory.
What aspect of MySQL is currently limiting your performance? Do you have enough RAM in your server? Are your disks fast enough? Do you have a low latency cache device like an SSD configured appropriately?

There's nothing wrong with using both solutions in your application. As a matter of fact, I'm using MySQL to store user sessions as opposed to cookies. In addition, I have another project that utilizes MySQL, but for certain parts of my application I will be using MongoDB. Why? It's wicked fast and I hate writing join queries. It's so much easier to pop data in/out of Mongo as opposed to having to do join queries in MySQL.
e.g.
When saving tags for a particular user, it's so gosh darn easy to save/modify/delete the tag that's stored in MongoDB. With MySQL, I would've had to write a query that JOINs multiple tables. For data such as user account, password, city, state, I saved everything into MySQL.
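For illustration only (the collection and field names below are made up), the tags can just live as an array inside the user's document, whereas in MySQL the same thing would typically mean a users table, a tags table and a join table:

```
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const users = client.db("appdb").collection("users");
const userId = 42; // whatever identifies the user

// Add a tag (no duplicates thanks to $addToSet) - one document, no JOINs.
await users.updateOne({ userId }, { $addToSet: { tags: "sci-fi" } });

// Remove it again.
await users.updateOne({ userId }, { $pull: { tags: "sci-fi" } });

// Read the user's tags back.
const user = await users.findOne({ userId }, { projection: { tags: 1 } });

await client.close();
```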

What you're talking about is the idea of having normalized and denormalized data. Using MongoDB as a denormalized data store for your normalized SQL data is fine. Using MongoDB as your only data store for certain kinds of data is also fine. Just make sure it's clear in the system design where the true data is and where the denormalized data is.
Normalized data is true fact. Denormalized data is gossip - you aren't sure if it's up to date or not.

Related

On Node MySQL vs JSON

I'm currently using MySQL to store JSON information; I fetch it from MySQL and parse it in my application. I would like to get rid of MySQL, but first I would like to know whether that is wise.
Would it be efficient to move to storing the data in a data folder containing .json files that hold the data I need? It would be my app's coordinate data, per user, for users who want to track themselves on a map. Will that cause any issues? I don't need to "query", but what about larger data, say 50K lines for example? The same amount will be in MySQL too. The amount doesn't change, but will any problems appear when moving from reading from SQL to reading from JSON files?
It's difficult to answer all these questions in one, but I'll address some of them:
There are dedicated NoSQL databases that are very good at the type of data storage you're talking about: MongoDB, CouchDB etc. It might be worth checking these out. They are very good at dealing with JSON data. Querying and parsing are very simple in Node.js.
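For example (a sketch only, using the official Node.js driver; the database, collection and field names are invented), the coordinate blobs can go straight in as documents and come back out already parsed:

```
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const points = client.db("tracker").collection("coordinates");

// Store a JSON blob as-is; the driver hands it back as a parsed object later.
await points.insertOne({ userId: 100, lat: 60.17, lng: 24.94, at: new Date() });

// Fetch every coordinate document for one user, oldest first.
const track = await points.find({ userId: 100 }).sort({ at: 1 }).toArray();
console.log(track.length, "points");

await client.close();
```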
You can store JSON in MySQL (or other RDBMS systems); I've done it in several projects with good results. As of MySQL 5.7.8 there is a dedicated JSON type. Queries can actually work surprisingly well - I've queried tables with tens of millions of JSON entries pretty quickly.
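A sketch of what that can look like from Node.js with the mysql2 package (the table and column names are placeholders); the last query also covers the "all JSON for one user" case mentioned below:

```
import mysql from "mysql2/promise";

const conn = await mysql.createConnection({
  host: "localhost", user: "app", password: "secret", database: "appdb",
});

// A plain table with a native JSON column (MySQL 5.7.8+).
await conn.query(`
  CREATE TABLE IF NOT EXISTS coordinates (
    id      BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    doc     JSON NOT NULL,
    INDEX (user_id)
  )
`);

// Store a JSON blob.
await conn.execute(
  "INSERT INTO coordinates (user_id, doc) VALUES (?, ?)",
  [100, JSON.stringify({ lat: 60.17, lng: 24.94 })]
);

// "Give me all JSON for user id 100", with a condition on the JSON itself.
const [rows] = await conn.execute(
  "SELECT doc FROM coordinates WHERE user_id = ? AND JSON_EXTRACT(doc, '$.lat') > 60",
  [100]
);

await conn.end();
```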
Make sure you consider backup and restore scenarios, what happens in the event of a data loss. Using MySQL or a NoSQL database will simplify this for you. Either way make sure you have this covered!
I wouldn't call 50K lines big data! I've dealt with databases with tens of millions of rows, and even that still wouldn't be called big data.
I would probably not recommend storing your data in files. I've worked in telematics before, and we stored millions of JSON blobs in relational databases with very few problems. Later on we planned to move to a NoSQL database for these, but the relational database worked surprisingly well, especially because you can adopt a hybrid approach of using relational queries and including JSON data in the results (to be parsed by clients).
You might not need the ability to query, but it's very useful to be able to ask, for example, "Give me all JSON for user id 100". An RDBMS or NoSQL system would make this very easy.

How to store user activity history

I'm being told this question is subjective, but hey ho.
Am I best storing user activity in a table in a MySQL database or in an XML file? The aim is for the data to be printed on their account page.
I'm worried that I will either end up with a huge/slow database or many, many XML files on the server (one for each user).
Thanks
Use a DB of some sort. Files may have issues regarding I/O, locking, concurrent access and so on.
If you do use files, prefer JSON over XML.
For an RDBMS, MySQL is fine.
I would suggest using a NoSQL store; my choice would be Redis.
Store it in a table. If you're storing billions of records you'll want to investigate partitioning or sharding, but those are problems you should tackle if and only if you will be hitting limits.
Test any design you have by simulating enough user activity to represent a year or two worth of vigorous use. If it holds up, you're okay. If not you'll have specific problems to address.
Remember in tables of this sort having indexes is important for retrieval speed, but too many indexes can slow down inserts. There's a balance here between too much and too little indexing you'll have to find.
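As a sketch only (the table, column and index names here are hypothetical), this is the sort of schema it ends up being, with one index matched to the account-page lookup:

```
import mysql from "mysql2/promise";

const conn = await mysql.createConnection({ host: "localhost", user: "app", database: "appdb" });

// One narrow, append-only row per activity event.
await conn.query(`
  CREATE TABLE IF NOT EXISTS user_activity (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id    INT NOT NULL,
    action     VARCHAR(64) NOT NULL,
    detail     VARCHAR(255),
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_user_time (user_id, created_at)  -- serves "this user's recent activity"
  )
`);

// The account-page query the index exists for.
const [rows] = await conn.execute(
  "SELECT action, detail, created_at FROM user_activity WHERE user_id = ? ORDER BY created_at DESC LIMIT 50",
  [100]
);

await conn.end();
```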
XML files are often extremely expensive to append to unless you do something like what Adium did with their reverse XML parser built to append to XML logs efficiently.
I suggest it should be in the DB.
1) It is much easier to maintain a database table for log information than separate log files, and it doesn't put much load on the server.
2) With an RDBMS you can query the user log history, which would be hard with XML files.
3) Proper indexing will help with faster data retrieval.
4) XML reads/writes cost more I/O operations.

Loading large data sets into MySQL tables

I'd like to start tinkering with large government data sets—in particular, I want to work with campaign contribution records and lobbying disclosure records. The Sunlight Foundation and the Center for Responsive Politics offer cleaned-up versions of these data sets for download.
I want to load these data sets into MySQL tables, since MySQL is the database management system I'm most familiar with.
I have two questions:
What is the best method for loading these large CSV files into MySQL tables?
Is there a better method for loading these data sets into a database and running queries? Should I consider a different database platform? I'm open to alternatives, but I don't know where to start. For now, I'm only going to perform local queries, but at some point I'd like to build a public web app.
To answer your first question, try MySQL's LOAD DATA INFILE command. It is normally quite fast for this type of data load.
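For example (the file path, table name and CSV layout below are placeholders, and you can just as well run the same statement from the mysql command-line client), assuming the CSV sits somewhere the server is allowed to read, such as the secure_file_priv directory:

```
import mysql from "mysql2/promise";

const conn = await mysql.createConnection({ host: "localhost", user: "app", database: "politics" });

// Bulk-load a CSV straight into an existing table; much faster than row-by-row INSERTs.
await conn.query(`
  LOAD DATA INFILE '/var/lib/mysql-files/contributions.csv'
  INTO TABLE contributions
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\\n'
  IGNORE 1 LINES
`);

await conn.end();
```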
To answer your second question, properly indexed MySQL tables have no problem with 10s of millions of rows. Especially if you are only performing reads after your import.

Using combination of MySQL and MongoDB

Does it make sense to use a combination of MySQL and MongoDB? What I'm trying to do, basically, is use MySQL as a "raw data backup" type thing where all the data is stored there but not read from there.
The data is also stored at the same time in MongoDB, and the reads happen only from MongoDB because I don't have to do joins and stuff.
For example, assume we're building Netflix:
In MySQL I have tables for Comments and Movies. Then when a comment is made, in MySQL I just add it to the table, and in MongoDB I update the movie's document to hold this new comment.
And then when I want to get movies and comments, I just grab the document from MongoDB, roughly like the sketch below.
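Something like this (made-up table/collection names, error handling left out):

```
import mysql from "mysql2/promise";
import { MongoClient } from "mongodb";

const conn = await mysql.createConnection({ host: "localhost", user: "app", database: "appdb" });
const mongo = new MongoClient("mongodb://localhost:27017");
await mongo.connect();
const movies = mongo.db("appdb").collection("movies");

// Write path: raw row into MySQL, denormalized copy into the movie's document.
await conn.execute(
  "INSERT INTO comments (movie_id, user_id, body) VALUES (?, ?, ?)",
  [42, 7, "Great film"]
);
await movies.updateOne(
  { movieId: 42 },
  { $push: { comments: { userId: 7, body: "Great film", at: new Date() } } },
  { upsert: true }
);

// Read path: one document fetch, no JOIN.
const movie = await movies.findOne({ movieId: 42 });

await conn.end();
await mongo.close();
```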
My main concern is how "new" MongoDB is compared to MySQL. In case something unexpected happens in Mongo, we have a MySQL backup so we can quickly fall back to MySQL and memcached in the app.
On paper it may sound like a good idea, but there are a lot of things you will have to take into account. This will make your application way more complex than you may think. I'll give you some examples.
Two different systems
You'll be dealing with two different systems, each with its own behavior. These different behaviors will make it quite hard to keep everything synchronized.
What will happen when a write in MongoDB fails, but succeeds in MySQL?
Or the other way around, when a column constraint in MySQL is violated, for example?
What if a deadlock occurs in MySQL?
What if your schema changes? One migration is painful, but you'll have to do two migrations.
You'd have to deal with some of these scenarios in your application code. Which brings me to the next point.
Two data access layers
Your application needs to interact with two external systems, so you'll need to write two data access layers.
These layers both have to be tested.
Both have to be maintained.
The rest of your application needs to communicate with both layers.
Abstracting away both layers will introduce another layer, which will further increase complexity.
Chance of cascading failure
Should MongoDB fail, the application will fall back to MySQL and memcached. But at this point memcached will be empty. So each request right after MongoDB fails will hit the database. If you have a high-traffic site, this can easily take down MySQL as well.
Word of advice
Identify all possible ways in which you think 'something unexpected' can happen with MongoDB. Then use the simplest solution for each individual case. For example, if it's data loss you're worried about, use replication. If it's data corruption, use delayed replication.

Database design with millions of entries

Suppose there is a messaging system. This system has millions of entries to be sent and reported, and the count is growing by 100K every hour. Two services access the DB: one is the sender, one is the reporter. So what would you suggest in order to get maximum performance? How could the DB be designed?
Also, what open-source database would you suggest among MySQL, PostgreSQL, MongoDB, etc. to handle this high-volume DB?
Thanks
You've not really provided much information on your requirements other than a few comments about expected data volumes. Simple storage of large volumes of data has no real intrinsic value; it's the ability to access that data which gives the real value. So knowing how you expect to retrieve information from the database is more important than how much data you want to store.
Do these messages really require a document DB like MongoDB, or are they structured enough to use a straight RDBMS like PostgreSQL or MySQL? Do you need full-text search capability? How often are queries executed against this message data, and what type of queries are they? Are you trying to write your own Twitter?
If those are your current data volumes, look to using db replication for resilience. Consider partitioning your message table, perhaps by date posted. Use master/slave (or even multi-master/multi-slave) as Konerak has suggested. Look at the possibilities of an archive table for older messages that are less likely to be queried, but which are then still available. Look at what a commercial database like Oracle can offer you. Get in a professional to help tune the db for performance, rather than simply asking for free advice on sites like SO.
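As a sketch only (the table, column and partition names plus the dates are invented), date-based range partitioning of a message table in MySQL might look like this; old partitions can then be dropped or moved to an archive table cheaply:

```
import mysql from "mysql2/promise";

const conn = await mysql.createConnection({ host: "localhost", user: "app", database: "msgdb" });

// Hypothetical message table partitioned by month of posting. The primary key
// includes posted_at because MySQL requires partition columns in every unique key.
await conn.query(`
  CREATE TABLE messages (
    id        BIGINT NOT NULL AUTO_INCREMENT,
    body      TEXT,
    posted_at DATETIME NOT NULL,
    PRIMARY KEY (id, posted_at)
  )
  PARTITION BY RANGE (TO_DAYS(posted_at)) (
    PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
  )
`);

await conn.end();
```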
Consider your hardware as well... multiple load-balanced servers to help with the volumes (we have 14 dedicated servers purely for accepting new messages, and three high-performance servers tuned for querying the data).