How to store user activity history - mysql

I'm being told this question is subjective, but hey ho.
Am I better off storing user activity in a table in a MySQL database or in an XML file? The aim is for the data to be printed on each user's account page.
I'm worried that I will end up with either a huge, slow database or a great many XML files on the server (one for each user).
Thanks

Use a database of some sort. Files can run into problems with I/O, locking, concurrent access and so on.
If you do use files, prefer JSON over XML.
For an RDBMS, MySQL is fine.
I would suggest using a NoSQL store; my choice would be Redis.

Store it in a table. If you're storing billions of records you'll want to investigate partitioning or sharding, but those are problems you should tackle if and only if you will be hitting limits.
Test any design you have by simulating enough user activity to represent a year or two's worth of vigorous use. If it holds up, you're okay. If not, you'll have specific problems to address.
Remember in tables of this sort having indexes is important for retrieval speed, but too many indexes can slow down inserts. There's a balance here between too much and too little indexing you'll have to find.
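As a rough illustration of the table-plus-index advice, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table, column and function names (user_activity, record_activity, recent_activity) are made up for the example. The single composite index on (user_id, created_at) is one plausible choice for an account-page query, not the only one.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_activity (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        action     TEXT NOT NULL,
        created_at TEXT NOT NULL  -- ISO 8601 strings sort chronologically
    )
    """
)
# One composite index that serves "latest activity for this user".
# Every extra index would add work to each INSERT.
conn.execute(
    "CREATE INDEX idx_activity_user_time ON user_activity (user_id, created_at)"
)


def record_activity(user_id, action, created_at):
    """Append one activity row for a user."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO user_activity (user_id, action, created_at) VALUES (?, ?, ?)",
            (user_id, action, created_at),
        )


def recent_activity(user_id, limit=20):
    """Return the newest (action, created_at) rows for one user."""
    return conn.execute(
        "SELECT action, created_at FROM user_activity "
        "WHERE user_id = ? ORDER BY created_at DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


record_activity(1, "login", "2024-01-01T10:00:00")
record_activity(1, "edit_profile", "2024-01-01T10:05:00")
record_activity(2, "login", "2024-01-01T11:00:00")
print(recent_activity(1))
```

To follow the testing advice above, the same record_activity function could be called in a loop to load a year or two of simulated rows before timing recent_activity.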
XML files are often extremely expensive to append to unless you do something like what Adium did with their reverse XML parser built to append to XML logs efficiently.

I suggest it should be in the database.
1) It is much easier to maintain a database table for log information than separate log files, and it puts little load on the server.
2) With an RDBMS you can query a user's log history directly, which would be hard to do with XML files.
3) Proper indexing will help with faster data retrieval.
4) Reading and writing XML costs more I/O operations.

Related

On Node MySQL vs JSON

I'm currently using MySQL to store JSON data, which I fetch from MySQL and parse in my application. I would like to get rid of MySQL, but first I would like to know: is that wise?
Would it be efficient to store the data in a data folder of .json files instead? The files would hold my app's coordinate data for each user who wants to track themselves on a map. Will that cause any issues? I don't need to query the data, but what about large amounts, for example 50K lines? The same amount would be in MySQL too. The amount doesn't change, but will any problems appear when moving from reading from SQL to reading from JSON files?
It's difficult to answer all these questions in one, but I'll address some of them:
There are dedicated NoSQL databases that are very good at the type of data storage you're talking about: MongoDB, CouchDB etc. It might be worth checking these out. They are very good at dealing with JSON data. Querying and parsing are very simple in Node.js.
You can store JSON in MySQL (or another RDBMS); I've done it in several projects with good results. As of MySQL 5.7.8 there is a dedicated JSON type. Queries can actually work surprisingly well: I've queried tables with tens of millions of JSON entries pretty quickly.
Make sure you consider backup and restore scenarios, what happens in the event of a data loss. Using MySQL or a NoSQL database will simplify this for you. Either way make sure you have this covered!
I wouldn't call 50K lines big data! I have dealt with databases with tens of millions of rows, and even that wouldn't be called big data.
I would probably not recommend storing your data in files. I've worked in telematics before, where we stored millions of JSON blobs in relational databases with very few problems. Later on we planned to move to a NoSQL database for these, but the relational database worked surprisingly well, especially because you can adopt a hybrid approach of using relational queries and including JSON data in the results (to be parsed by clients).
You might not need the ability to query, but it's very useful to get for example "Give me all JSON for user id 100". An RDBMS or NoSQL system would make this very easy.
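To make the "give me all JSON for user id 100" case concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL and stores each JSON blob as text; the table and function names (track_points, save_point, points_for_user) and the sample coordinates are invented for illustration. In MySQL 5.7.8 or later the payload column could use the dedicated JSON type instead.

```python
import json
import sqlite3

# Assumption: sqlite3 stands in for MySQL; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE track_points (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        payload TEXT NOT NULL  -- one JSON document per row
    )
    """
)
conn.execute("CREATE INDEX idx_track_points_user ON track_points (user_id)")


def save_point(user_id, point):
    """Serialize one coordinate dict to JSON and store it for the user."""
    with conn:
        conn.execute(
            "INSERT INTO track_points (user_id, payload) VALUES (?, ?)",
            (user_id, json.dumps(point)),
        )


def points_for_user(user_id):
    """Return every stored JSON document for one user, oldest first."""
    rows = conn.execute(
        "SELECT payload FROM track_points WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return [json.loads(payload) for (payload,) in rows]


save_point(100, {"lat": 60.17, "lon": 24.94})
save_point(100, {"lat": 60.18, "lon": 24.95})
save_point(7, {"lat": 51.5, "lon": -0.12})
print(points_for_user(100))
```

With one file per user the same lookup is a single file read, but the database version also handles concurrent writers and backups without extra code.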

XML or MySQL for User Database?

Might seem a strange question but would there be a performance benefit in using XML for a database rather than MySQL and tables?
To put this into context, I will be creating a website that has user profiles. I know more XML than MySQL, and I know most people use MySQL as standard, but I was wondering if anyone could throw some pennies this way about how the two compare, and whether this suggestion is as outrageous as it could be to anyone who understands big O notation...
The bigger the XML file, the more memory you use, because you'll have to load the entire file into RAM while your script runs.
Say an average MySQL database is about 4 MB. Turn that into a 4 MB XML file, which has to be loaded from disk into RAM on every page view. With about 25 visitors at any given moment that's already 100 MB gone (25 × 4 MB), and if they flick through a lot of pages it quickly adds up to a gigabyte of RAM.
Not to mention you'll add about 1 second to page load every time, if not longer.
Not to mention the continuous disk load from reading and writing changed values, and the concurrency issues when two visitors want to update the same XML file.
You don't have these problems with an SQL server.
MySQL has indexes, and it's optimized for the binary values you will be storing. All you have with an XML file is a plain file, and any optimizations (caching, indexing, anything you can think of) will be up to you to implement.
XML is a great format for transport, and everybody speaks it, but you do not want to use it for storage.
And if you already know XML but not yet MySQL, I would say you're ahead of the game. You'll probably find writing SQL queries and fetching the results more straightforward than working with XML data.
As far as I can see, there are several XML database solutions available. These appear in a simple Google search:
http://exist-db.org/exist/index.xml;jsessionid=1dowedwdr9hsanbcvdcom8aka
http://basex.org/
http://www.oracle.com/technetwork/database/features/xmldb/index.html
http://www.sedna.org/
So all that matters here is the speed of development. If you're mostly familiar with XML, then using one of those could shorten your development time.
However, there are plenty of ORM products for relational databases, depending on the programming language, that take on most of the development effort and make it easy to use a database for a website. So if you don't have any specific needs for your website, you might go with any of the options above.
It depends on the structure of your database. This question can't be answered definitively without knowing anything about your data. Any comparison of XML versus a relational database depends heavily on which data you choose and what type of operations you plan.
For example, suppose you want to store, index, and query more than a million rows, and each row has many of the same fields. That's a simple, fixed structure, and it's the same for all records. It's a perfect fit for a relational database and can be stored in a single table. Relational databases handle such fixed records very efficiently.
Well, there are two main questions here.
First, if you're going to use a database, you have a choice between an XML database and a relational database. The choice depends primarily on the nature of your data (especially its complexity, but also the way in which it is used).
Then you have the choice between using a database and using a simple file (for example an XML file). That choice depends primarily on the quantity of data and the transaction throughput.
Since you haven't told us much about the nature of the data or its quantity or the throughput requirements, it's hard to advise you specifically on either question.

What database should I use to store timestamped pictures (or generally BLOBS)?

I need a database that can store a large number of BLOBS. The BLOBs would be picture files and would also have a timestamp and a few basic fields (size, metrics, ids of objects in other databases, things like that), but the main purpose of the database is to store the pictures.
We would like to be able to keep the data in the database for a while, on the order of a few months. With data coming in maybe every few minutes, the number of BLOBs stored can grow quite quickly.
For now (development phase) we will be using MySQL for this. I was wondering if MySQL is a good direction to go, in terms of:
Being able to store binary data efficiently
Scalability
Maintenance requirements.
Thanks,
MySQL is a good database and can handle large data sets. However, there is a great benefit in making your whole database fit into RAM: in that case all database-related activity will be much faster. By putting large and seldom-accessed objects into your database, you're making this harder.
So, I think a combined approach is the best:
Save only the metadata in the database, and save the files on disk as-is. It's better to hash the directories if you're talking about hundreds of thousands of files, then save each file under the name of an index field in your database. For example, a directory structure like this:
00/00001.jpg
00/00002.jpg
00/00003.jpg
....
....
10/10234.jpg
10/10235.jpg
In this case, your directories won't have too many files, and accessing the files is fast and easy. Of course, if your database server is distributed or redundant, things get more interesting. Any such approach may or may not be warranted, depending on the load, redundancy and failover requirements, etc.
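The directory layout above can be derived from the database id with a few lines of code. This is a minimal sketch in Python; the function name sharded_path and its parameters are hypothetical, and it assumes the directory is simply the first two digits of the zero-padded five-digit id, as in the listing.

```python
from pathlib import PurePosixPath


def sharded_path(image_id: int, width: int = 5, prefix_len: int = 2,
                 ext: str = "jpg") -> str:
    """Map a numeric id to '<prefix>/<zero-padded id>.<ext>'.

    Assumption: the directory is the first `prefix_len` digits of the id
    after zero-padding it to `width` digits.
    """
    name = f"{image_id:0{width}d}"
    return str(PurePosixPath(name[:prefix_len]) / f"{name}.{ext}")


print(sharded_path(1))      # 00/00001.jpg
print(sharded_path(10234))  # 10/10234.jpg
```

With these defaults and five-digit ids, each directory holds at most 1,000 files. The database row then only needs the id (plus the timestamp and other metadata), since the path can be recomputed from it.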
I suggest storing the images on the hard disk and keeping the image metadata, perhaps including the filename, in MySQL. That way your script can easily pick the file up from your local hard drive.
Hard disks and most modern operating systems are really good at reading and storing files, so I believe MySQL is not going to solve anything here.

MongoDB as a cache for frequent joins and queries from MySQL

I'm thinking about doing the following and would like suggestions on whether this approach makes sense. I can run queries in MongoDB, and MongoDB is wicked fast at them because the hot spots of the data are cached in memory. So I was thinking of storing in MongoDB the data I would normally get with a join in MySQL. I am already using memcached to store simple query results (for example a movie description page), but for bigger stuff that requires more real-time, on-demand queries I was thinking about MongoDB: for example, the view count for movies and who saw them, and doing analysis on that.
Hopefully I explained it clearly.
more info:
We don't want to keep writing to our MySQL server on every rating, like, etc. MongoDB seemed like a good option for storing the ratings and views of movies and then processing that data later, whereas with memcached the data is not persisted and we're unable to run queries.
Thanks,
Faisal
Memory caching alone is not a good reason to go with MongoDB. Any properly configured RDBMS will cache frequently used data in memory.
What aspect of MySQL is currently limiting your performance? Do you have enough RAM in your server? Are your disks fast enough? Do you have a low latency cache device like an SSD configured appropriately?
There's nothing wrong with using both solutions in your application. As a matter of fact, I'm using MySQL to store user sessions as opposed to cookies. In addition, I have another project that uses MySQL, but for certain parts of the application I will be using MongoDB. Why? It's wicked fast and I hate writing join queries. It's so much easier to pop data in and out of Mongo as opposed to having to do join queries in MySQL.
For example:
When saving tags for a particular user, it's so gosh darn easy to save, modify or delete a tag that's stored in MongoDB. With MySQL, I would have had to write a query that JOINs multiple tables. For data such as user account, password, city and state, I saved everything in MySQL.
What you're talking about is the idea of having normalized and denormalized data. Using MongoDB as a denormalized data store for your normalized SQL data is fine. Using MongoDB as your only data store for certain kinds of data is also fine. Just make sure that it's clear in the system design where the true data is and where the denormalized data is.
Normalized data is true fact. Denormalized data is gossip: you aren't sure whether it's up to date or not.
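One way to picture the "true fact versus gossip" split is a normalized table that is the source of truth, plus a denormalized copy that can always be rebuilt from it. The sketch below is only an illustration under loud assumptions: sqlite3 stands in for MySQL, a plain in-memory Counter stands in for the MongoDB collection, and the names movie_views, record_view and rebuild_view_counts are invented.

```python
import sqlite3
from collections import Counter

# Assumption: sqlite3 plays the normalized SQL store (the true data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie_views (movie_id INTEGER NOT NULL, user_id INTEGER NOT NULL)"
)

# Assumption: this Counter plays the denormalized store (e.g. a MongoDB
# collection of per-movie counts). It is "gossip": fast, possibly stale.
view_counts = Counter()


def record_view(movie_id, user_id):
    """Write the fact to the source of truth, then update the fast copy."""
    with conn:
        conn.execute(
            "INSERT INTO movie_views (movie_id, user_id) VALUES (?, ?)",
            (movie_id, user_id),
        )
    view_counts[movie_id] += 1


def rebuild_view_counts():
    """Throw away the denormalized copy and recompute it from the true data."""
    view_counts.clear()
    query = "SELECT movie_id, COUNT(*) FROM movie_views GROUP BY movie_id"
    for movie_id, n in conn.execute(query):
        view_counts[movie_id] = n


record_view(1, 10)
record_view(1, 11)
record_view(2, 10)
print(view_counts[1])  # 2
```

If the fast copy ever drifts, rebuild_view_counts restores it, which is only possible because the design says clearly which store holds the true data.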

Database design with millions of entries

Suppose there is a messaging system. This system has millions of entries to be sent and reported on, and the count is growing by 100K every hour. Two services access the DB: one is the sender, the other is the reporter. What would you suggest in order to get maximum performance? How should the DB be designed?
Also, which open source database would you suggest among MySQL, PostgreSQL, MongoDB etc. to handle this high-volume data?
Thanks
You've not really provided much information on your requirements other than a few comments about expected data volumes. Simple storage of large volumes of data has no real intrinsic value; it's the ability to access that data which gives the real value. So knowing how you expect to retrieve information from the database is more important than how much data you want to store.
Do these messages really require a document DB like MongoDB, or are they structured enough to use a straight RDBMS like PostgreSQL or MySQL? Do you need full text search capability? How often are queries executed against this message data, and of what type? Are you trying to write your own Twitter?
If those are your current data volumes, look to using db replication for resilience. Consider partitioning your message table, perhaps by date posted. Use master/slave (or even multi-master/multi-slave) as Konerak has suggested. Look at the possibilities of an archive table for older messages that are less likely to be queried, but which are then still available. Look at what a commercial database like Oracle can offer you. Get in a professional to help tune the db for performance, rather than simply asking for free advice on sites like SO.
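The archive-table idea above can be sketched as a periodic job that moves old rows out of the hot table in one transaction. This is a minimal illustration, assuming Python's built-in sqlite3 as a stand-in for MySQL or PostgreSQL; the table names (messages, messages_archive), the function archive_older_than and the sample dates are all hypothetical. Native date partitioning is a separate database feature and is not shown here.

```python
import sqlite3

# Assumption: sqlite3 stands in for the real RDBMS; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY,
        body      TEXT NOT NULL,
        posted_at TEXT NOT NULL  -- ISO 8601 dates compare correctly as text
    );
    CREATE TABLE messages_archive (
        id        INTEGER PRIMARY KEY,
        body      TEXT NOT NULL,
        posted_at TEXT NOT NULL
    );
    CREATE INDEX idx_messages_posted_at ON messages (posted_at);
    """
)


def archive_older_than(cutoff):
    """Move rows posted before `cutoff` to the archive; return rows moved."""
    with conn:  # copy and delete commit together, or not at all
        conn.execute(
            "INSERT INTO messages_archive (id, body, posted_at) "
            "SELECT id, body, posted_at FROM messages WHERE posted_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM messages WHERE posted_at < ?", (cutoff,))
    return cur.rowcount


with conn:
    conn.executemany(
        "INSERT INTO messages (body, posted_at) VALUES (?, ?)",
        [("old", "2023-01-15"), ("older", "2023-02-01"), ("new", "2024-03-01")],
    )

moved = archive_older_than("2024-01-01")
print(moved)  # 2
```

The sender and reporter services then work against a small, recent messages table, while older rows remain queryable in messages_archive.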
Consider your hardware as well... multiple load balanced servers to help with the volumes (we have 14 dedicated servers purely for accepting new messages, and three high performance servers tuned for querying the data).