Newsfeed with MongoDB for user actions - MySQL

I'm building a system where users can interact with various activities (buddies, events, etc.). I'm thinking of using MongoDB for this, and have done a decent amount of work already. I also read the question below, which is similar to what I'm thinking:
Constructing a personalized Facebook-like Newsfeed: SQL, MongoDB?
My concern is this: say a user is interacting with an event of some sort, and the data for that event currently lives in MySQL. If the user later updates part of the event (specifically the title), is there a quick way to update every newsfeed item that contains the title? I'm storing the title inside a JSON message for the activity stream.
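To make that concrete, here's the sort of one-shot update I'm hoping is possible (a pymongo sketch; the event_id field and the data.title path are just assumptions about my own schema):

```python
# Hypothetical sketch: when an event's title changes in MySQL, patch every
# denormalized copy of it in the Mongo activity stream in one statement.
from pymongo import MongoClient

feed = MongoClient()["app"]["activities"]

def on_event_title_change(event_id, new_title):
    # update_many touches all feed items that reference this event,
    # instead of rebuilding each one individually
    feed.update_many(
        {"event_id": event_id},
        {"$set": {"data.title": new_title}},
    )
```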

Related

Database Design Best Practice for webchat

I'm designing a webchat that uses Pusher to handle the async calls. It's for a game/tool I'm designing. There can be any number of games, each game has channels (for Pusher), and each channel opens a new session after being inactive for 6 hours.
The messages need to be stored by session for later access. So what would the best database design be? Should I store the messages individually, or as something like JSON in a MySQL text field?
Currently there's a table for games, one for game sessions, one for channels, one for chat sessions (linked 1:1 to game sessions), and one for the messages (stored as JSON in a text field). But I'm not sure that's the best way to store what will amount to anywhere from 100 to 2,000 messages in a single session, and storing each message in its own row sounds impractical.
Edit: In terms of what I plan to do with the messages, I just need them archived so users can view them by session later.
So does that mean the best way would be to store each message in its own row with a unique ID, and just filter by session when loading it back?
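For concreteness, this is what I mean by message-per-row (a runnable sketch using Python's sqlite3 as a stand-in for MySQL; table and column names are just illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE messages (
        id          INTEGER PRIMARY KEY,
        session_id  INTEGER NOT NULL,
        sender_id   INTEGER NOT NULL,
        sent_at     TEXT    NOT NULL,
        body        TEXT    NOT NULL
    )
""")
# the composite index makes "load a session" a single ordered range scan
db.execute("CREATE INDEX idx_messages_session ON messages (session_id, id)")

def load_session(session_id):
    # 100-2,000 rows per session is trivial for any relational engine
    return db.execute(
        "SELECT sender_id, sent_at, body FROM messages"
        " WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
```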

How did Facebook or Twitter implement their subscription system

I'm working on an SNS-like mobile app project, where users upload their content and see updates from their subscribed topics or friends on their homepage.
I store user content in MySQL and build each user's homepage by first querying who and what the user subscribes to, then querying the content table with a 'WHERE userid IN (...) OR topic IN (...)' clause.
I suspect this will become quite slow as the content table piles up or when a user subscribes to tons of users or topics. Our newly released app is already gaining thousands of new users each week, and more over time, so scalability has to be a concern for us right now.
So I wonder how Facebook or Twitter handle this subscription problem given their staggering number of users. Do they maintain a list for each user? I tried to search, but all I found was how to interact with Facebook or Twitter rather than how they actually implement this feature.
I noticed that on Facebook you see only new updates in your feed, not history. That means subscribing to a new user won't dump lots of outdated content into your feed, the way it would with my current method.
How does Facebook design its database, and how does it dispatch new content to subscribed users?
My backend is currently PHP+MySQL, and I don't mind introducing other backend technologies such as Redis or JMS if that's the way it should be done.
Sounds like you guys are still at a pretty early stage. There are N ways to solve this, all depending on what level of DAUs you think you'll hit in the near term, how much money you have to spend on hardware, how much time you have to build it, etc.
You can try an interim table that queues up newly introduced items, with metadata on what each entails (which topic, the friend user_id list, etc.). Then use a queue-consumer system like RabbitMQ or Gearman to manage consumption of this growing list and figure out who should process each item. Build the queue-consumer program in Scala or on a J2EE stack like Maven/Tomcat, something that can persist. If you really want to stick with PHP, build a PHP REST API that lives in php5-fpm's memory, is managed by the FastCGI process manager, sits behind a proxy like nginx, and is triggered by curl calls at an appropriate interval from a cron-executed script.
[EDIT] - It's probably better not to use a DB for a queueing system; use a cache server like Redis instead. It outperforms a DB in many ways, and it can persist to disk (look up RDB and AOF). It's not very fault tolerant: if a job fails suddenly, you might lose its record, though most likely you won't care about these crash edge cases. Also look up php-resque!
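As a rough illustration of that queue shape (the original suggestion is PHP with php-resque; this is a minimal Python sketch with redis-py, and the queue name and payload are made up):

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def enqueue_new_item(item_id, topic_id):
    # producers push newly created content onto the list
    r.lpush("feed:queue", json.dumps({"item": item_id, "topic": topic_id}))

def consume_forever():
    while True:
        # BRPOP blocks until a job arrives; returns (queue_name, payload)
        _, raw = r.brpop("feed:queue")
        job = json.loads(raw)
        dispatch(job)

def dispatch(job):
    # hypothetical: look up subscribers for job["topic"] and notify them
    print("would fan out", job)
```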
To prepare the SNS to push content out efficiently, I'm assuming you're already de-normalizing the tables. I'd imagine a "user_topic" table mapping each topic to the users who subscribed to it. Create another table, "notification_meta", describing where users prefer to receive notifications (SMS/push/email/in-app) and the metadata needed to push to those channels (mobile client approval keys for APNS/GCM, email addresses, user auth tokens). Use JSON blobs for those two fields in notification_meta, so each user has a single row. This saves I/O hits on the DB.
Use user_id as the primary key for "notification_meta" and user_id + topic_id as the PK for "user_topic". DO NOT add an auto-increment "id" field to either; it's useless in this use case (it takes up space, CPU, index memory, etc.). With both fields in the PK, queries on user_topic can be served entirely from memory, and the only disk hit is on "notification_meta" during the JOIN.
So if a user subscribes to 2 topics, there will be two entries in "user_topic", and each user will always have a single row in "notification_meta".
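In schema terms, the layout above looks roughly like this (a Python sketch with sqlite3 standing in for MySQL; the JSON column names are my own):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user_topic (
        user_id  INTEGER NOT NULL,
        topic_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, topic_id)   -- composite PK, no surrogate id
    );
    CREATE TABLE notification_meta (
        user_id       INTEGER PRIMARY KEY,
        channels      TEXT NOT NULL,  -- JSON: preferred channels (SMS/push/email/in-app)
        channel_keys  TEXT NOT NULL   -- JSON: APNS/GCM keys, email address, auth tokens
    );
""")

# everyone to notify about a topic, plus their delivery preferences,
# in one join driven entirely by the primary keys above
subscribers = db.execute("""
    SELECT ut.user_id, nm.channels, nm.channel_keys
    FROM user_topic ut
    JOIN notification_meta nm ON nm.user_id = ut.user_id
    WHERE ut.topic_id = ?
""", (42,)).fetchall()
```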
There are more ways to scale, like dynamically creating a new table for each new topic, sharding to different MySQL instances based on user_id, partitioning, etc. There are N ways to scale, especially in MySQL. Good luck!

Why should you fan out when building an Activity Feed

I'm looking into the logistics of building an Activity Feed, similar to Facebook's News Feed or Twitter's timeline.
There are tons of answers here on Stack Overflow, on Quora, and in other articles I've found on Google that describe fanning out on read or write. It all makes sense: you record all the activity in one main activity table/collection, and then at some point write a copy of that data to separate, appropriate tables for each user.
What I don't completely understand is why there is a need for a fan-out at all. That is, why record the activity on individual user feeds? Is there a reason you can't just use one activity table/collection? It would have appropriate indexes and the acting user's ID, and when someone wants to see their activity stream, you'd just query it for the users the current user is following.
I understand this may not be as efficient, since activities outnumber actual objects in the database several times over. That is, there might be 100 posts in a database but over 1,000 actions on those posts, so queries on the activity table/collection may be slow once row counts get pretty high.
But wouldn't this work? Can't you just scale the database so it handles queries more efficiently? Is there really a need for fanning out?
It's not always necessary to fan out; the decision depends on many factors.
For example, Twitter does both, while Facebook follows fan-out-on-load.
As you can imagine, Facebook's activity stream is much more complex than Twitter's. FB needs to apply a lot of filters and privacy settings on a per-user/per-group basis, so it makes sense for them to pull and build the stream on the fly. Their TAO graph infrastructure (a graph layer on top of MySQL, plus caching) makes it easy for them to build and fetch feeds quite fast for each user.
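To make the write-side option concrete, here's a toy fan-out-on-write sketch using Redis lists (key names and the follower lookup are assumptions, not how Twitter actually implements it):

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def publish_activity(actor_id, activity):
    payload = json.dumps(activity)
    # write amplification: one copy per follower...
    for follower_id in r.smembers(f"followers:{actor_id}"):
        key = f"feed:{follower_id}"
        r.lpush(key, payload)
        r.ltrim(key, 0, 999)  # cap each feed; older items fall off

def read_feed(user_id, count=20):
    # ...and the payoff: a feed page is one O(count) list read, no joins
    return [json.loads(x) for x in r.lrange(f"feed:{user_id}", 0, count - 1)]
```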

What database/technology to use for a notification system on a node.js site?

I'm looking to implement notifications within my node.js application. I currently use MySQL for relational data (users, submissions, comments, etc.) and MongoDB for page views only.
To build a notification system, does it make more sense (from a performance standpoint) to use MongoDB vs. MySQL?
Also, what's the convention for showing new notifications to users? At first I was thinking I'd have a notification icon: the user clicks it, and an AJAX call fetches all their new notifications. But I want to show the user that the icon is actually worth clicking (either with a different color or a bubble showing the number of new notifications, like Google Plus does).
I could do it when the user logs in, but then the user would only see new notifications after logging out and back in (because the count would be saved in their session). Should I poll for updates? I'm not sure that's the recommended method, as it seems like overkill just to show a single digit (or more, depending on the number of notifications).
If you're using Node, then you can 'push' notifications to a connected user via WebSockets. The linked document is an example of one well-known WebSocket engine that has good performance and good documentation. That way your application can send notifications to any user, any set of users, or everyone, based on simple queries that you set up.
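As a rough sketch of the push pattern (the answer assumes Node; this stand-in uses Python's websockets package, version 13+, and the identify-on-connect step is a placeholder for real authentication):

```python
import asyncio
import json
import websockets

connected = {}  # user_id -> open connection

async def handler(ws):
    user_id = await ws.recv()          # naive identification, no auth
    connected[user_id] = ws
    try:
        async for _ in ws:             # keep the socket open; ignore input
            pass
    finally:
        connected.pop(user_id, None)

async def notify(user_id, notification):
    # called by your app when something happens; no client polling involved
    ws = connected.get(user_id)
    if ws is not None:
        await ws.send(json.dumps(notification))

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()         # serve forever

if __name__ == "__main__":
    asyncio.run(main())
```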
Data storage is a different question. Generally MySQL performs poorly at high scale, and Mongo generally has a quicker read-query response, but it depends on the data structure you want to use. If your data is a simple key-value structure with no real need for relational features, then a memory store such as Redis might be the most suitable.
This answer has more information relevant to your question if you want to follow up and investigate more.

Multi-room chat logging in Rails/MySQL app

I'm coding a browser game application in Rails. It's chat-based, and I'm currently using MySQL for the databases. However, I'm running into a problem when it comes to chat logging for the games.
The application's goals dictate that there will be multiple rooms at any given time in which people are playing the chat-based game. Conversation will be pretty much constant, and a number of other actions, such as private messages and game actions, must be logged as well. Players who join a game after other players must be able to see the chat log from before they joined, and finished games must be available for review.
My first thought was to create a table named for the game identifier when the game starts, and store everything there. Then when someone joins the game, we could just parse it back to them. Once the game had been over for a certain time, a script would take the table's contents, parse them into an XML object, store that in a database for game review, and delete the table to keep things running lean.
I created a model called Message, with a matching table whose columns are identical to those I want in the per-game tables: id, timestamp, sender, target (for PMs and actions), message type, and content. Then I set the initializer for the Message object to set the table name to 'game_#{game_id}'. Rails, however, is throwing tantrums: I get an 'undefined method has_key?' error when I try to initialize the object. This makes me suspect the approach is a bit un-Rails-ian, and that it possibly defeats the purpose of working in Rails to pass up the excellent object/DB management features it has.
I've considered alternatives, such as temporarily keeping all the messages in the main Messages table and just querying them by game ID, but I'm unsure whether a MySQL table is up to the task of speedily serving this data while accepting constant writes, especially if we get a dozen or more games going at once, each averaging a message or two per second. It was suggested to me that a NoSQL solution like a MongoDB capped collection for the temporary storage would be an excellent option from a performance standpoint, but that would still waste all the ActiveRecord goodness Rails offers.
Is there a reliable and relatively fast way to meet these constraints: messages that can be quickly stored and fetched while the game is ongoing, then moved to some low-overhead form for later review? Would any of the above ideas be workable, or is there a whole separate option I've overlooked?
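For reference, the capped-collection suggestion from above might look something like this in pymongo (collection names and the size cap are invented for illustration):

```python
from pymongo import MongoClient

db = MongoClient()["game"]

# create once; a capped collection preserves insertion order and
# recycles its own space, so old messages age out automatically
if "live_messages" not in db.list_collection_names():
    db.create_collection("live_messages", capped=True, size=50 * 1024 * 1024)

def log_message(game_id, sender, target, kind, content):
    db.live_messages.insert_one({
        "game_id": game_id, "sender": sender,
        "target": target, "type": kind, "content": content,
    })

def replay(game_id):
    # late joiners read the backlog with a single scan in insertion order
    return list(db.live_messages.find({"game_id": game_id}, {"_id": 0}))

def archive(game_id):
    # after the game ends, collapse the log into one low-overhead document
    db.archives.insert_one({"game_id": game_id, "messages": replay(game_id)})
```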