I am building a small e-mail like messaging app for a project, where a user would send out a message to another with information like meeting times and such, and I'm wondering about how to store all the messages exchanged.
My issues are:
Should I store all messages between all users in one database table (where it would be expensive to get the messages of each user when they log in)? Or should each user have a personal table for his/her messages (which would mean too many tables)?
I also need to store the events that the user accepts. Again, should these be in one table for all users or a separate table for each (I need to retrieve these quite often)?
I've searched on the site for other similar questions but most seem to focus on real-time messaging or on specific implementation technologies.
Thanks for the help!
A table per user is a bad idea. It means every query will be different and for every new user you will need to modify the database. This is hard to build, hard to maintain, and inefficient for your database too.
So, just store it in one table. A couple of million rows won't be a problem if you have proper indexes (and proper hardware).
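As a rough illustration of the single-table approach, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database you end up with. The table name, column names, and the `inbox` helper are assumptions made up for this example, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id           INTEGER PRIMARY KEY,
    sender_id    INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL,
    sent_at      TEXT    NOT NULL,  -- ISO-8601 timestamp
    body         TEXT    NOT NULL
);
-- Inbox lookups filter by recipient and sort by time, so index both columns.
CREATE INDEX idx_messages_recipient ON messages (recipient_id, sent_at);
""")

conn.executemany(
    "INSERT INTO messages (sender_id, recipient_id, sent_at, body) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2024-01-10T09:00:00", "Meeting at 10?"),
        (3, 2, "2024-01-11T14:30:00", "Lunch tomorrow"),
        (2, 1, "2024-01-11T15:00:00", "10 works for me"),
    ],
)


def inbox(conn, user_id, limit=50):
    """Return the newest messages addressed to user_id (hypothetical helper)."""
    return conn.execute(
        "SELECT sender_id, sent_at, body FROM messages "
        "WHERE recipient_id = ? ORDER BY sent_at DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


inbox_for_user_2 = inbox(conn, 2)
```

Every user runs the same query with a different parameter, which is the main point: one table, one index, no per-user schema changes.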
If you are worried about performance, you can delete very old messages, or move them to an 'archive' table. If a user wants to view recent messages (from the past year or so), they can get them from the normal table, and older messages can be fetched from the archive. It's usually acceptable that digging into the archives is a bit slower, so it's probably okay if that table grows very large.
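The archive idea could look roughly like this. It is only a sketch using sqlite3; the `messages_archive` table, the `archive_older_than` function, and the cutoff date are all hypothetical names and values chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY, sender_id INTEGER, recipient_id INTEGER,
    sent_at TEXT, body TEXT
);
-- Same layout as messages, so rows can be copied over unchanged.
CREATE TABLE messages_archive (
    id INTEGER PRIMARY KEY, sender_id INTEGER, recipient_id INTEGER,
    sent_at TEXT, body TEXT
);
INSERT INTO messages VALUES
    (1, 1, 2, '2020-03-01T08:00:00', 'old note'),
    (2, 1, 2, '2024-03-01T08:00:00', 'recent note');
""")


def archive_older_than(conn, cutoff):
    """Move messages sent before cutoff into the archive table.

    Both statements run in one transaction, so a failure cannot leave
    a message copied but not deleted (or the other way round).
    """
    with conn:
        conn.execute(
            "INSERT INTO messages_archive "
            "SELECT * FROM messages WHERE sent_at < ?",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM messages WHERE sent_at < ?", (cutoff,)
        ).rowcount
    return deleted


moved = archive_older_than(conn, "2023-01-01T00:00:00")
```

You would run something like this from a scheduled job, so the normal table only ever holds the recent messages.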
That said, you already mentioned e-mail. I'd seriously consider inspecting the possibilities of actual e-mail and the post boxes that come with it. There are many existing implementations, and it's a powerful protocol that has survived since the dawn of the internet, so maybe you shouldn't reinvent the wheel.
E-mail can have headers (custom headers too), and multiple parts, so even if a 'normal' e-mail won't suffice, you can still use e-mail as a transport layer for custom types of messages.
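To make the point about custom headers and multiple parts concrete, here is a small sketch using Python's standard email package. The addresses, the `X-App-Message-Type` header name, and the JSON payload are invented for illustration; your application would define its own.

```python
import json
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Project meeting"
# Hypothetical custom header telling the receiving app how to treat this mail.
msg["X-App-Message-Type"] = "meeting-invite"

# First part: plain text that any ordinary mail client can display.
msg.set_content("Meeting on 2024-05-02 at 10:00 in room 4.")

# Second part: machine-readable details for the custom application.
payload = {"start": "2024-05-02T10:00:00", "room": "4"}
msg.add_attachment(
    json.dumps(payload).encode("utf-8"),
    maintype="application",
    subtype="json",
    filename="invite.json",
)

# The receiving side can recover the structured part again.
attachments = list(msg.iter_attachments())
decoded = json.loads(attachments[0].get_content())
```

A normal mail client still shows the text part, while your own client can look at the header and the JSON part.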
I want an admin account to send an announcement to all users in the db.
Right now, for user-to-user messages, I store one row in my Message table per message, with senderId, receiverId, etc.
My problem is: can I treat an announcement the same way as a user-to-user message, and if so, would it be wise to save it into the message table n times for the n users in the db every time there is an announcement?
So I want to see if there is a cleaner approach to this.
It depends on how much time/effort you want to invest in this.
Separate table for announcements: You won't be able to reuse your current messaging system, but you will have maximum flexibility (special GUI features for announcements, they won't get mixed up with normal messages, etc.)
Modify your current messaging system to support multi-recipient and/or broadcast messages. With this you can reuse most of your current GUI with some backend modifications.
Do the simplest possible thing and send a message to everyone. This is very easy to implement. The obvious downside is that you will have a lot of copied messages in your DB, which may or may not be a problem.
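For the second option, one possible way to support broadcasts without copying is to store the announcement once and treat a NULL receiver as "everyone". This is only a sketch under that assumption; the NULL convention, the table layout, and the `inbox` function are made up for the example and are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id          INTEGER PRIMARY KEY,
    sender_id   INTEGER NOT NULL,
    receiver_id INTEGER,            -- NULL marks a broadcast (assumed convention)
    body        TEXT    NOT NULL
);
INSERT INTO message (sender_id, receiver_id, body) VALUES
    (1,  2,    'hi 2'),
    (99, NULL, 'Maintenance tonight'),   -- one row, visible to all users
    (2,  3,    'hi 3');
""")


def inbox(conn, user_id):
    """Return direct messages for user_id plus every broadcast, oldest first."""
    rows = conn.execute(
        "SELECT body FROM message "
        "WHERE receiver_id = ? OR receiver_id IS NULL ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The trade-off is that per-user state for an announcement (read, deleted) no longer fits in the message row and would need a separate table, which is where the third option is simpler.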
I am working on a project where I am going to make a community.
The problem I encounter is the following:
There is going to be 8 different types of pages on the site like "Store", "Event" and "Blog".
Some pages will be owned by a company others will be owned by a user.
Some pages should be able to write messages to users and to other pages (pages that are allowed to send messages to each other).
Users are going to be able to:
1. Follow some page types but not others
2. Like some page types but not others
3. Write a public message on some page types profile page but not others
4. Write personal messages to some page types but not others
5. Every user who is an administrator of a page gets an extra inbox where he/she can read and respond to the page's messages
I have been trying out a lot of different approaches to make this work, but however I do it I get a LOT of tables where I need to make a LOT of joins, which makes me worried about the performance later on.
The best solution I have come up with is to make the following tables:
Users
Pages
Stores
Events
Blogs
And set all page types (users, stores, events, blogs) to own a page.
Then all pages are allowed to message each other.
Store_likes
Event_followers
Blog_followers
to control which page types are allowed to be followed or liked
Inboxes
That represents which page inbox belongs to which users
Is this the best way of doing it, or does anyone have a better solution?
Any input is greatly appreciated!
/ Elias
You are asking a very general modeling question here, which is difficult to answer. But in general, my recommendation is to first create a model. This amounts to deciding what the entities are, what the primary keys, unique keys, and properties of those entities are, and what relationships exist between the entities. For each property, decide whether it is required or optional. For relations between entities, decide the cardinality of the relation and of the inverse relation. Create the model without worrying about performance at first. Then implement the model in the database. Your application is very standard. Don't worry about lots of tables and joins. Most likely it will be OK.
I'm working on a SNS like mobile app project, where users upload their contents and can see updates of their subscribed topic or friends on their homepage.
I store user contents in mysql, and query the user specific homepage data by simply querying out first who and what the user subscribed and then query the content table filtering out using the 'where userid IN (....) or topic IN (....)' clause.
I suspect this would become quite slow when the content table piles up or when a user subscribes to tons of users or topics. Our newly released app is already starting to get thousands of new users each week, and more over time. Scalability must be a concern for us right now.
So I wonder how Facebook or Twitter handle this subscribing problem with their amazing number of users. Do they handle a list for each user? I tried to search, but all I got is how to interact with Facebook or Twitter rather than how they actually implement this feature.
I noticed that on Facebook you see only updates rather than history in your feed. That means subscribing to a new user won't dump lots of outdated content into your feed, as it would with my current method.
How does Facebook design its database, and how does it dispatch new content to subscribed users?
My backend is currently PHP+MySQL, and I don't mind introducing other backend technologies such as Redis or JMS and stuff if that's the way it should be done.
Sounds like you guys are still in a pretty early stage. There are N-number of ways to solve this, all depending on which stage of DAUs you think you'll hit in the near term, how much money you have to spend on hardware, time in your hands to build it, etc.
You can try an interim table that queues up newly introduced items along with meta-data describing each one (which topic, the list of friend user_ids, etc.). Then use a queue-consumer system like RabbitMQ or Gearman to manage the consumption of this growing list and figure out who should process each item. Build the queue-consumer program in something that can run as a persistent process, such as Scala or a Java application on a server like Tomcat. If you really want to stick with PHP, build a PHP REST API that lives in php5-fpm's memory, is managed by the FastCGI process manager, sits behind a proxy like nginx, and is called via curl at an appropriate interval from a cron-executed script.
[EDIT] - It's probably better not to use a DB as a queueing system. Use a cache server like Redis instead; it outperforms a DB in many ways and it can persist to disk (look up RDB and AOF). It's not very fault tolerant, though: if a job fails all of a sudden, you might lose the job record. Most likely you won't care about these crash edge cases. Also look up php-resque!
To prep for the SNS to go out efficiently, I'm assuming you're already de-normalizing the tables. I'd imagine a "user_topic" table with each topic mapped to the users who subscribed to it. Create another table, "notification_meta", describing where users prefer receiving notifications (SMS/push/email/in-app notification) and the meta-data needed to push to those channels (mobile client approval keys for APNS/GCM, email addresses, user auth-tokens). Use JSON blobs for those two fields in notification_meta, so each user will have a single row. This saves I/O hits on the DB.
Use user_id as your primary key for "notification_meta" and user_id + topic_id as PK for "user_topic". DO NOT add an auto-increment "id" field for either, it's pretty useless in this use case (takes up space, CPU, index memory, etc). If both fields are in the PK, queries on user_topic will be all from memory, and the only disk hit is on "notification_meta" during the JOIN.
So if a user subscribes to 2 topics, there'll be two entries in "user_topic", and each user will always have a single row in "notification_meta".
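A rough sketch of those two tables, written against sqlite3 so it can be run anywhere. The column names `preferences` and `channel_data`, the sample values, and the extra index on `topic_id` are assumptions added for this example (the index is there so that looking up subscribers by topic does not scan the whole table).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_topic (
    user_id  INTEGER NOT NULL,
    topic_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, topic_id)      -- composite key, no auto-increment id
);
-- Assumed extra index for "who subscribed to topic X" lookups.
CREATE INDEX idx_user_topic_topic ON user_topic (topic_id, user_id);

CREATE TABLE notification_meta (
    user_id      INTEGER PRIMARY KEY,    -- exactly one row per user
    preferences  TEXT NOT NULL,          -- JSON blob: preferred channels
    channel_data TEXT NOT NULL           -- JSON blob: tokens, addresses, ...
);
""")

conn.executemany("INSERT INTO user_topic VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])
conn.executemany(
    "INSERT INTO notification_meta VALUES (?, ?, ?)",
    [
        (1, json.dumps(["email"]), json.dumps({"email": "u1@example.com"})),
        (2, json.dumps(["push"]), json.dumps({"push_token": "token-2"})),
    ],
)


def subscribers_for_topic(conn, topic_id):
    """Return (user_id, preferred channels) for everyone subscribed to topic_id."""
    rows = conn.execute(
        "SELECT ut.user_id, nm.preferences FROM user_topic AS ut "
        "JOIN notification_meta AS nm ON nm.user_id = ut.user_id "
        "WHERE ut.topic_id = ? ORDER BY ut.user_id",
        (topic_id,),
    )
    return [(user_id, json.loads(prefs)) for user_id, prefs in rows]


recipients = subscribers_for_topic(conn, 10)
```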
There are more ways to scale, like dynamically creating a new table for each new topic, sharding to different MySQL instances based on user_id, partitioning, etc. There's N-ways to scale, especially in MySQL. Good luck!
I'm looking into the logistics of building an Activity Feed, similar to that of Facebook, or Twitter's timeline.
There are tons of answers here on Stack Overflow and on Quora, and other articles I've found on Google, that describe fanning out on read or write. It all makes sense. You record all the activity in one main activity table/collection, and then at some point, write a copy of that data to separate, appropriate tables for each user.
What I don't completely understand is why there is a need for a fanout. That is, why is there a need to record the activity on individual user feeds? Is there a reason why you can't just use one activity table/collection? It would have appropriate indexes, and have the acting user's ID. And then, when someone wants to see their activity stream, just query the activity stream for users that the current user is following.
I understand that this may not be as efficient, since activities outnumber actual objects in the database a few times over. That is, there might be 100 posts in a database but over 1,000 actions on posts, so queries may be slow on the activity table/collection when row numbers get pretty high.
But wouldn't this work? Can't you just scale the database so it can handle queries more efficiently? Is there really a need for fanning out?
It's not always necessary to fan out; the decision depends on many factors.
For example, Twitter does both, but Facebook follows fan-out-on-load.
As you can imagine, Facebook's activity stream is much more complex than Twitter's. FB needs to apply a lot of filters and privacy settings on a per-user/per-group basis, hence it makes sense for them to pull and build the stream on the fly. Their TAO graph infrastructure (a graph layer on top of MySQL plus caching) makes it easy for them to build and fetch the feeds quite fast for each user.
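To show the difference between the two strategies in the smallest possible form, here is an in-memory sketch in plain Python. It is a toy model, not how Twitter or Facebook implement it; the `follows` data, the function names, and the tuple layout are all made up for the example.

```python
from collections import defaultdict

# follower -> set of users they follow (sample data)
follows = {"alice": {"bob", "carol"}, "dave": {"bob"}}

activities = []            # single global activity log: (seq, actor, text)
feeds = defaultdict(list)  # per-user materialized feeds, filled at write time


def publish(actor, text):
    """Record an activity once, then copy it to each follower's feed."""
    entry = (len(activities), actor, text)
    activities.append(entry)
    # Fan-out on write: extra work and storage now, cheap reads later.
    for follower, followees in follows.items():
        if actor in followees:
            feeds[follower].append(entry)


def feed_on_read(user):
    """Fan-out on read (on load): filter the global log at request time."""
    followees = follows.get(user, set())
    return [entry for entry in activities if entry[1] in followees]


def feed_on_write(user):
    """Read the precomputed feed; no filtering needed at request time."""
    return list(feeds[user])


publish("bob", "posted a photo")
publish("carol", "liked a post")
publish("eve", "joined")
```

Both functions return the same feed here. The difference is where the cost lands: `feed_on_read` touches the whole log on every request, while `publish` does one write per follower, which gets expensive for accounts with very many followers.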
I have a kind of silly question. I have a small community website, and I'm thinking of making specific pages that can be viewed only by members who have permission. So I suppose I will add each member ID to the database, and when a member tries to access the page, I will first check whether the member is logged in and then check whether the user ID exists in the database table of users who have permission to view that content. Now I'm just wondering: if the database grows, won't it take a long time to check everything before loading the page?
Premature optimization is the root of all evil (Donald Knuth)
You can easily handle several millions of users with a single database, so that won't be a problem until your community is huge. When you reach that step, you can switch to more scalable DB solutions like Cassandra.
That said, take Brad Christie's comment into account, and use a reasonable identity management system that won't thrash your database unnecessarily.
"a long time" is subjective and depends on many factors. For a small community website, you will likely not run into any issues with the method you've described. Still, it is considered best practice, and will speed up queries significantly, if you make use of proper indexes. Columns that will be queried against, such as the user ID, should be indexed. Not using an index means that MySQL has to read every record in your table and check to see if it matches your criteria.
This article may be of use to you:
http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm
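You can see the effect of an index for yourself with a quick experiment. The sketch below uses SQLite (through Python's sqlite3 module) rather than MySQL, so the exact plan wording differs from MySQL's EXPLAIN, but the idea is the same. The `page_permissions` table, its columns, and the index name are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_permissions (user_id INTEGER NOT NULL, "
    "page_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO page_permissions VALUES (?, ?)",
    [(user_id, 1) for user_id in range(1000)],
)

QUERY = "SELECT 1 FROM page_permissions WHERE user_id = ? AND page_id = ?"


def query_plan(conn):
    """Return SQLite's description of how it would execute QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42, 1)).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return " | ".join(row[3] for row in rows)


plan_before = query_plan(conn)   # no index yet: full table scan

conn.execute(
    "CREATE INDEX idx_perm_user_page ON page_permissions (user_id, page_id)"
)
plan_after = query_plan(conn)    # index available: direct lookup

has_permission = conn.execute(QUERY, (42, 1)).fetchone() is not None
```

Printing `plan_before` and `plan_after` shows the query switching from scanning the table to searching the index.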
Also, if you are concerned about how your site will perform when your dataset grows, consider populating it with a bunch of dummy data and running a few tests. This site will help you generate a bunch of data to put in your database.
http://www.generatedata.com/#about
Lastly, if pages are not specific to a particular person or small group of people, consider using more general buckets for access control. For example, if only admins can view a page, tie that page to an "admin" permission and note which users are admins. Then you can do a quick check to see what type or types of user a particular person is, and decide whether to show them the page. This type of system is typically referred to as an Access Control List (ACL).
http://en.wikipedia.org/wiki/Access_control_list
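The bucket idea can be sketched in a few lines. This is a simplified in-memory version with made-up user IDs, role names, and page paths; in a real site the two mappings would live in database tables, and treating unlisted pages as public is an assumption of this example.

```python
# user_id -> set of roles the user holds (sample data)
user_roles = {
    1: {"admin", "member"},
    2: {"member"},
}

# page path -> role required to view it (sample data)
page_required_role = {
    "/admin/dashboard": "admin",
    "/members/forum": "member",
}


def can_view(user_id, page):
    """Return True if the user holds the role the page requires."""
    required = page_required_role.get(page)
    if required is None:
        return True  # assumption: pages without a required role are public
    return required in user_roles.get(user_id, set())
```

With this layout, adding a new restricted page means adding one row for the page, not one row per permitted user.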