I'm designing a webchat that uses Pusher to handle the async calls. It's for a game/tool I'm building. There can be any number of games, and each game has channels (for Pusher). Each channel opens a new session after being inactive for 6 hours.
The messages need to be stored by session so they can be accessed later. So what would the best database design be? Should I store the messages individually, or as something like JSON in a MySQL text field?
Currently there's a table for games, one for game sessions, one for channels, one for chat sessions (linked 1:1 to game sessions), and one for the messages (with the messages stored as JSON in a text field). But I'm not sure that's the best way to store what will amount to anywhere from 100 to 2,000 messages in a single session, and storing each message in its own row sounds impractical.
Edit: in terms of what I plan to do with it, I just need the messages archived so users can view them by session again later.
So does that mean the best approach is to store each message in its own row with a unique ID, and just filter by session when loading it back?
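For illustration, a row-per-message table keyed by chat session might look something like this (a minimal sketch with hypothetical names, not the asker's actual schema); loading a session's archive is then a single indexed query:

```sql
CREATE TABLE chat_messages (
    id              BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    chat_session_id BIGINT UNSIGNED NOT NULL,   -- FK to the chat sessions table
    sender_id       BIGINT UNSIGNED NOT NULL,
    body            TEXT NOT NULL,
    created_at      DATETIME NOT NULL,
    KEY idx_session_time (chat_session_id, created_at)
);

-- Load an archived session for viewing, in order:
SELECT sender_id, body, created_at
FROM chat_messages
WHERE chat_session_id = 123
ORDER BY created_at;
```

With the composite index, reading back 100 to 2,000 rows for one session is a trivial query, and archiving needs no extra step.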
Related
If my users are stored in some other database, but I am building posts in my SQL database, should I create another users table?
If I did, I would be duplicating all of my users and would have to keep that copy in sync with the other database; on the other hand, my posts tables could save space by referring to a foreign key instead of the full ID string each time.
What is the recommendation: create another users table, or just pass the user IDs into the query?
If you have a service that stores and provides information about users, then other services that need that information should communicate with the User service to get it. That is, presumably, the reason the User service exists in the first place.
Depending on the volatility of the users list and requirements for changes there to be respected in the Posts service you might consider some short-term caching in the Posts service, but I certainly wouldn't persist another copy of the user list there.
There are 3 obvious solutions.
The simplest, cleanest and fastest is to use foreign keys and joins between your "posts" database and your "users" database. In this case, when you show a list of posts, you can get both the post and user data in a single query, and there's no need to keep things up to date.
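As a minimal sketch of this first option, assuming both tables live in the same MySQL server and posts carries an integer foreign key (hypothetical schema and column names):

```sql
-- Fetch a page of posts together with their authors in one query.
SELECT p.id, p.title, p.created_at, u.display_name
FROM posts AS p
JOIN users AS u ON u.id = p.user_id
ORDER BY p.created_at DESC
LIMIT 20;
```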
The next option is to store a copy of the user data alongside your posts. This leads to entertaining failure modes - data in the user database may get out of sync. However, this is a fairly common strategy when using 3rd party authentication systems (e.g. logging on with your Google/Facebook/Github/Stack Exchange credentials). The way to make this work is to minimize the amount of data you duplicate, and have it be safe if it's out of date. For instance, a user's display name is probably okay; current bank account balance is probably not.
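A sketch of this second option, where the posts table carries a small, safe-to-be-stale copy of the user data (names here are hypothetical):

```sql
CREATE TABLE posts (
    id                  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    external_user_id    VARCHAR(64)  NOT NULL,  -- ID string from the external user store
    author_display_name VARCHAR(100) NOT NULL,  -- duplicated copy; harmless if slightly stale
    title               VARCHAR(255) NOT NULL,
    body                TEXT         NOT NULL,
    created_at          DATETIME     NOT NULL
);
```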
The final option is to store the primary key for users in your posts database, and to retrieve the user data at run time. This is less likely to lead to bugs with data getting out of sync, but it can cause performance problems - retrieving user details for 1000 posts one by one is obviously much slower than retrieving everything through a joined query.
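A sketch of this third option: posts only store the external user key, and user details are fetched at read time in one batched lookup rather than one query per post (hypothetical names, and assuming the user store can answer an IN query or an equivalent batch call):

```sql
-- 1) Load a page of posts, which only carry the external user key.
SELECT id, external_user_id, title
FROM posts
ORDER BY created_at DESC
LIMIT 20;

-- 2) Fetch all the referenced users in a single round trip,
--    instead of issuing one lookup per post.
SELECT id, display_name
FROM users
WHERE id IN ('u17', 'u42', 'u99');
```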
The choice then is "do I have a service which combines post and user data and my UI retrieves everything from that service, or do I let the UI retrieve posts, and then users for each post". That's mostly down to the application usage, and whether you can use asynchronous calls to retrieve user information. If at all possible (assuming you're building a web application), the simplest option might be to return the posts and user IDs and use Ajax requests to retrieve the user data as needed.
The CQRS approach (common to microservice architectures) provides some structure for this.
I'm working on an SNS-like mobile app project, where users upload their content and can see updates from their subscribed topics or friends on their homepage.
I store user content in MySQL, and build each user's homepage by first querying who and what the user has subscribed to, then querying the content table with a 'WHERE userid IN (....) OR topic IN (....)' clause.
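Roughly, the current approach looks like this (table and column names are illustrative, not the actual schema):

```sql
-- 1) Find what this user follows.
SELECT followed_user_id FROM user_follows WHERE follower_id = 42;
SELECT topic_id FROM topic_subscriptions WHERE user_id = 42;

-- 2) Pull recent content matching either list.
SELECT *
FROM contents
WHERE userid IN (7, 9, 15) OR topic IN (3, 8)
ORDER BY created_at DESC
LIMIT 50;
```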
I suspect this will become quite slow when the content table piles up, or when a user subscribes to tons of users or topics. Our newly released app is already gaining thousands of new users each week, and more over time, so scalability has to be a concern for us right now.
So I wonder how Facebook or Twitter handle this subscription problem at their scale. Do they maintain a list for each user? I tried to search, but all I found was how to interact with Facebook or Twitter rather than how they actually implement this feature.
I noticed that on Facebook you only see updates, not history, in your feed. That means subscribing to a new user won't dump lots of outdated content into your feed the way my current method would.
How does Facebook design their database, and how do they dispatch new content to subscribed users?
My backend is currently PHP+MySQL, and I don't mind introducing other backend technologies such as Redis or JMS and stuff if that's the way it should be done.
Sounds like you guys are still at a pretty early stage. There are any number of ways to solve this, all depending on what level of DAUs you think you'll hit in the near term, how much money you have to spend on hardware, how much time you have to build it, etc.
You can try an interim table that queues up newly introduced items, along with metadata about what each entails (which topic, the list of friend user_ids, etc.). Then use a queue-consumer system like RabbitMQ/Gearman to manage consumption of this growing list and figure out who should process it. Build the queue-consumer program in Scala or a J2EE stack like Maven/Tomcat, something that can persist. If you really want to stick with PHP, build a PHP REST API that can live in php5-fpm's memory, managed by the FastCGI process manager, sitting behind a proxy like nginx and triggered by curl calls at an appropriate interval from a cron-executed script.
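For illustration, the interim queue table might look something like this (purely hypothetical names; see the edit below about preferring Redis over a DB-backed queue):

```sql
-- Newly published items waiting to be fanned out to subscribers.
CREATE TABLE fanout_queue (
    id             BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    content_id     BIGINT UNSIGNED NOT NULL,
    topic_id       BIGINT UNSIGNED NULL,     -- set when published to a topic
    author_user_id BIGINT UNSIGNED NOT NULL,
    created_at     DATETIME NOT NULL,
    processed_at   DATETIME NULL,            -- the consumer marks the row once fanned out
    KEY idx_unprocessed (processed_at, created_at)
);
```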
[EDIT] - It's probably better not to use a DB for a queueing system; use a cache server like Redis instead. It outperforms a DB in many ways and it can persist to disk (look up RDB and AOF). It's not very fault tolerant if a job dies suddenly, so you might lose a job record, but most likely you won't care about those crash edge cases. Also look up php-resque!
To prep for the notifications to go out efficiently, I'm assuming you're already denormalizing the tables. I'd imagine a "user_topic" table mapping each topic to the users who subscribed to it. Create another table, "notification_metadata", describing where users prefer receiving notifications (SMS/push/email/in-app notification) and the metadata needed to push to those channels (mobile client approval keys for APNS/GCM, email addresses, user auth tokens). Use JSON blobs for the two fields in notification_metadata, so each user will have a single row. This saves I/O hits on the DB.
Use user_id as the primary key for "notification_metadata" and (user_id, topic_id) as the primary key for "user_topic". DO NOT add an auto-increment "id" field to either; it's pretty useless in this use case (it takes up space, CPU, index memory, etc.). With both fields in the PK, queries on user_topic can be answered from the index alone, and the only disk hit is on "notification_metadata" during the JOIN.
So if a user subscribes to 2 topics, there'll be two entries in "user_topic", and each user will always have a single row in "notification_metadata".
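A sketch of those two tables under the assumptions above (column names are illustrative; the JSON blobs are stored as plain TEXT here):

```sql
-- One row per (user, topic) subscription; composite PK, no surrogate id.
CREATE TABLE user_topic (
    user_id  BIGINT UNSIGNED NOT NULL,
    topic_id BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, topic_id)
);

-- One row per user; the two JSON blobs keep the row count (and I/O) down.
CREATE TABLE notification_metadata (
    user_id         BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    channel_prefs   TEXT NOT NULL,  -- JSON: preferred channels (SMS / push / email / in-app)
    delivery_tokens TEXT NOT NULL   -- JSON: APNS/GCM keys, email addresses, auth tokens
);
```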
There are more ways to scale, like dynamically creating a new table for each new topic, sharding to different MySQL instances based on user_id, partitioning, and so on. There are N ways to scale, especially in MySQL. Good luck!
I'm building a system where users can interact with various activities (buddies, events, etc.). I'm thinking of using MongoDB for this, and have done a decent amount of work already. I also read the following, which is similar to what I'm thinking:
Constructing a personalized Facebook-like Newsfeed: SQL, MongoDB?
My concern is this: say a user is interacting with an event of some sort, and currently the data for the event is in MySQL. If the user later updates part of the event (specifically the title), is there a quicker way to update the newsfeed items that contain that title? I'm storing the title in a JSON message for the activity stream...
I'm developing an app using socket.io where users send and receive data with the other users present in their channels/rooms. I need your suggestion for storing the data that is passed from a user to a channel, so that when someone enters that channel they can get the data from that particular channel.
So how should I save the data for a particular channel?
I had planned to store the data in a MySQL database, with channel id, channel name, and channel message columns.
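As a rough sketch (hypothetical names), a slightly normalized version of that plan stores the channel name once and keeps each message in its own row:

```sql
CREATE TABLE channels (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE channel_messages (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    channel_id BIGINT UNSIGNED NOT NULL,   -- FK to channels
    sender_id  BIGINT UNSIGNED NOT NULL,
    body       TEXT NOT NULL,
    created_at DATETIME NOT NULL,
    KEY idx_channel_time (channel_id, created_at)
);

-- Replay a channel's history when someone joins:
SELECT sender_id, body, created_at
FROM channel_messages
WHERE channel_id = 7
ORDER BY created_at;
```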
But won't it become a problem if the number of users increases and every message is inserted as a new row into the database?
Please suggest the best way to handle this.
Until you have thousands of simultaneous users, it hardly matters. Just use whatever you are most comfortable with. When you get thousands of users you can change the architecture, if necessary.
I'm coding a browser game application in Rails. It's chat-based, and I'm currently using MySQL for the databases. However, I'm running into a problem when it comes to chat logging for the games.
The application goals dictate that there will be multiple rooms at any given time in which people are playing the chat-based game. Conversation will be pretty much constant, and there are a number of other actions, such as private messages and game actions, which must be logged as well. Players who join the game after other players must be able to see the chat log from before they joined, and games must be available to review.
My first thought was to, on game start, create a table that matches the game identifier and store everything there. Then when someone joins the game, we could just parse it back to them. Once the game had been over for a certain time, the script would take that table's content, parse it into an XML object, and store this in a database for game review, deleting the table to keep things running lean.
I created a model called Message, with a matching table with identical columns to those I want to store in the game tables - id, timestamp, sender, target (for PMs and actions), type of message, and content. Then I set the initializer for the Message object to set the table name to 'game_#{game_id}'. Rails, however, is throwing tantrums - I get an undefined method has_key? error when I try to initialize the object. It occurs to me, based on this, that the method I'm using may be a bit un-Rails-ian, and that it possibly defeats the purpose of working in Rails to pass up the excellent object/db management features it has.
I've considered other alternatives, such as temporarily keeping all the messages in the main Messages table and just querying them by game ID, but I'm unsure if a MySQL table is up to the task of speedily serving up this data while accepting constant writes, especially in the event that we get a dozen or more games going at once averaging a message or two per second. It was suggested to me that a noSQL solution like a MongoDB capped collection for the temporary storage would be an excellent option from a performance standpoint, but that would still waste all that ActiveRecord goodness that Rails offers.
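A minimal sketch of that single-table alternative, with hypothetical column names and a composite index so per-game reads stay fast under constant writes:

```sql
CREATE TABLE messages (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    game_id    BIGINT UNSIGNED NOT NULL,
    sender_id  BIGINT UNSIGNED NOT NULL,
    target_id  BIGINT UNSIGNED NULL,     -- for PMs and game actions
    msg_type   VARCHAR(20) NOT NULL,     -- e.g. chat / pm / action
    content    TEXT NOT NULL,
    created_at DATETIME NOT NULL,
    KEY idx_game_time (game_id, created_at)
);

-- Catch up a late joiner or replay a finished game:
SELECT sender_id, target_id, msg_type, content, created_at
FROM messages
WHERE game_id = 42
ORDER BY created_at;
```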
Is there a reliable and relatively fast way to meet these constraints: messages that can be stored and fetched quickly while the game is ongoing, then archived in some low-overhead form for review? Would any of the above ideas be workable, or is there a whole separate option I've overlooked?