How to make a custom CMS go multi-user - mysql

I have made a custom CMS and I'd like to make it multi-client while duplicating as little as possible. I don't mean multi-user as in different people in the same organisation accessing the same program; I mean multi-client as in different organisations each using their own access to the same program as though it were an independent application.
I understand the principle of sharing functions, and I imagine I'd need to put all the functions I've created into a shared folder in a parent directory.
I think I have got my head around at least the way the code works, but the database (SQL) structure seems like the biggest challenge.
How is this typically accomplished?
My tables are fairly basic, and after doing some reading I can see that it's normal to simply add a 'client_id' or 'app_id' column (or something like this) to every table and row. This way there is no duplication of databases; however, all the clients' data then ends up mixed together in the same tables. The problem, it seems, is that if this program were to grow very large with many clients, the data multiplies and the system slows down for everybody. I'm not at that stage yet, though, so should I not worry that far ahead and cross that bridge when I come to it, since for now the speed sacrifice would be negligible?
Is it possible to somehow keep databases separate without doubling up on work if I change the structure of a table in the future or add extra fields etc?
I understand this might be difficult to answer without knowing the way I've structured my tables, but they are quite simple, like:
unique_id | title | modified_date       | content
xx        | hello | 0000-00-00 00:00:00 | i am content
The best I can think so far is that this would then become:
client_id | unique_id | title | modified_date       | content
xx        | xx        | hello | 0000-00-00 00:00:00 | i am content
Like I said, I can see this running into some problems down the track, mostly with bloat, but right now I don't see another way - perhaps you have another way of looking at this. Thanks.

Keep it as a single database with the client_id column added. If it gets large with many clients, partition the tables by LIST: http://dev.mysql.com/doc/refman/5.5/en/partitioning-list.html
Horizontal partitioning allows you to have one logical table be sub-divided so when your SQL includes "... WHERE client_id = 1", it will only ever have to read the index(es) or partition that contain "client_id = 1" data. Other partitions get ignored, almost as if you have a separate table for each client_id.
DISCLAIMER: I haven't used partitioning in MySQL myself. I'm just familiar with the concept from Oracle. Be sure your MySQL storage engine supports partitioning: http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
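For illustration, a minimal sketch of what LIST partitioning could look like on a content table (column names are taken from the question; the partition layout is a made-up example, and note that in MySQL every unique key on a partitioned table must include the partitioning column, which is why client_id leads the primary key):

CREATE TABLE content (
    client_id     INT NOT NULL,
    unique_id     INT NOT NULL,
    title         VARCHAR(255),
    modified_date DATETIME,
    content       TEXT,
    PRIMARY KEY (client_id, unique_id)
)
PARTITION BY LIST (client_id) (
    PARTITION p_client1 VALUES IN (1),
    PARTITION p_client2 VALUES IN (2),
    PARTITION p_others  VALUES IN (3, 4, 5)
);

A query with WHERE client_id = 1 should then only touch partition p_client1 (partition pruning), which is the behaviour described above. The partition lists do need maintaining as clients are added.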

Your best bet is to use a separate database for each client's data and retain all shared information (users, etc.) in one central database. Then when a client grows they can be moved off to another database server without affecting anyone else.

I have a similar situation with my web app: many users sharing one database, where most of the tables have a client identifier in them. It's not a CMS, but it's similar enough: the users are performing CRUD operations on their data.
There are pros and cons, but I wouldn't worry about performance overly. Since you will have to re-create your existing unique indexes to contain the client-id, you should see no great difference in performance: your look-ups now have an additional predicate, which appears in the indexes.
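As a hedged sketch of what that index change might look like (the table, the 'slug' column and the index names are hypothetical, not taken from the question):

-- hypothetical example: a value unique per site becomes unique per client
ALTER TABLE content
    DROP INDEX uq_slug,
    ADD UNIQUE INDEX uq_client_slug (client_id, slug);

-- look-ups now carry the extra client_id predicate, which the index covers
SELECT * FROM content WHERE client_id = 1 AND slug = 'hello';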
As George3 said, if you have significant volumes from one or a few clients, horizontal partitioning could be worth pursuing. But premature optimization and all that: I would wait and see if it becomes an issue first.
Managing multiple databases, or multiple versions of tables for different clients doesn't scale well, and is a maintenance nightmare. Just get the security on the content right.

Related

MySQL Normalize or Denormalize

I'm building a PHP app to prefill third party PDF account forms with client data, and am getting stuck on the database design.
The current form has about 70 fields, which seems like far too many to set up as individual columns, especially as some (i.e. company/trust information) are not relevant depending on the type of account the client requires.
I've tried to normalize, but it seems like there would be a lot of joins, and it would also require several subqueries for things like multiple addresses.
It also means a ton of extra queries when updating, to check whether rows exist and decide whether the script needs to do an INSERT, a DELETE or an UPDATE; whereas if it were all in one row, it would basically just be an UPDATE each time.
Not sure if this helps but here is a list of most of the fields:
id, account_type, account_phone, account_email, account_designation, account_adviser, account_source, account_complete,
account_residential_unit_number, account_residential_street_number, account_residential_street_name, account_residential_street_type, account_residential_suburb, account_residential_state, account_residential_postcode,
account_postal_unit_number, account_postal_street_number, account_postal_street_name, account_postal_street_type, account_postal_suburb, account_postal_state, account_postal_postcode,
individual_1_title, individual_1_firstname, individual_1_middlename, individual_1_lastname, individual_1_dob, individual_1_occupation, individual_1_email, individual_1_phone,
individual_1_unit_number, individual_1_street_number, individual_1_street_name, individual_1_street_type, individual_1_suburb, individual_1_state, individual_1_postcode,
individual_2_title, individual_2_firstname, individual_2_middlename, individual_2_lastname, individual_2_dob, individual_2_occupation, individual_2_email, individual_2_phone,
individual_2_unit_number, individual_2_street_number, individual_2_street_name, individual_2_street_type, individual_2_suburb, individual_2_state, individual_2_postcode,
company_name, company_date,
company_unit_number, company_street_number, company_street_name, company_street_type, company_suburb, company_state, company_postcode,
trust_name, trust_date,
settlement_bank, settlement_account, settlement_bsb
The most this will need to handle is around 200,000 applications, and once the data is in the database, it won't change very often, if at all - not sure if that is relevant?
So really I just wanted to figure out the smartest way to design this, even if it's just a name or topic to research further.
Generally speaking you can divide databases into two broad categories:
OLTP Systems
Online Transaction Processing systems are normally write-intensive, i.e. a lot of updates compared to reads. This is typically the day-to-day application used by business users of all scopes (data capture, admin, etc.). These databases are usually normalized to the extreme and then selectively denormalized where performance gains are needed.
OLAP/DSS system:
Online Analytical Processing systems are normally large, data-warehouse-like systems used to support analytic activities such as data mining, data cubes, etc. Typically the information is used by a more limited set of users than in OLTP. These databases are normally heavily denormalised.
Go read here for a short description of these and the main differences.
OLTP VS OLAP
Regarding your INSERT/UPDATE/DELETE point, read about MySQL's INSERT ... ON DUPLICATE KEY UPDATE statement, which will resolve that issue for you easily. It is called a MERGE operation in most other database systems.
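A minimal sketch of that statement (the table and column names are placeholders based on the field list above):

INSERT INTO account (id, account_type, account_phone)
VALUES (123, 'individual', '0400 000 000')
ON DUPLICATE KEY UPDATE
    account_type  = VALUES(account_type),
    account_phone = VALUES(account_phone);

If a row with id 123 already exists it is updated in place; otherwise it is inserted, so the application never has to check for existence first.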
Now I don't understand why you are worried about JOINs. I have had tables with hundreds of millions (500,000,000+) of rows that I joined with other, similarly large tables, and the queries ran very fast. So designing a database to eliminate joins is NOT a good idea.
My suggestion is:
If designing an OLTP system, normalise as much as possible, then denormalise to increase performance where needed. For an OLAP system, look at star schemas etc. and don't even bother normalizing first. By the way, most OLAP systems use an OLTP system as a data source.
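To make the OLTP advice concrete, here is a hedged sketch of how the form fields above might normalise; the table and column names are my own guesses, not a prescribed design:

CREATE TABLE account (
    account_id   INT AUTO_INCREMENT PRIMARY KEY,
    account_type VARCHAR(30),
    phone        VARCHAR(20),
    email        VARCHAR(255)
);

CREATE TABLE individual (
    individual_id INT AUTO_INCREMENT PRIMARY KEY,
    account_id    INT NOT NULL,
    title         VARCHAR(10),
    firstname     VARCHAR(100),
    lastname      VARCHAR(100),
    dob           DATE,
    FOREIGN KEY (account_id) REFERENCES account (account_id)
);

-- one row per address, instead of repeating the address columns per owner
CREATE TABLE address (
    address_id    INT AUTO_INCREMENT PRIMARY KEY,
    account_id    INT NOT NULL,
    address_type  ENUM('residential','postal','company') NOT NULL,
    unit_number   VARCHAR(10),
    street_number VARCHAR(10),
    street_name   VARCHAR(100),
    suburb        VARCHAR(100),
    state         VARCHAR(10),
    postcode      VARCHAR(10),
    FOREIGN KEY (account_id) REFERENCES account (account_id)
);

The irrelevant company/trust sections then simply have no rows, rather than leaving dozens of columns NULL.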
Usually I normalise and then denormalise for performance. However
If I didn't have too much validation to do, e.g. valid addresses, duplicated individuals,
And I didn't want to reuse parts of the data for another version of the form, e.g. selecting an existing individual's name and address,
And I didn't want to analyse it, e.g. find all mentions of Fred Bloggs,
And my users were happy entering all of this in one form (I wouldn't be),
Then I'd go denormalised from the get-go.
The thing is, if you normalise, then denormalising later if required is fairly trivial and low-risk; normalising denormalised data usually means de-duplication, which is likely to be really painful both data- and design-wise.
Normalize your input, de-normalize the output. Meaning, for reporting, extract your data into a de-normalized format like Mongo and use that for querying. Or create rollups of some sort. I have found that, with large datasets, extracting the reporting data from the input data gives the best efficiency.
I find denormalized data extremely painful to work with at a very basic level. What if I want a tally of the number of people who live in Georgia? In your denormalized structure I would have to count rows where individual_1_state = 'GA' OR individual_2_state = 'GA'.
This is not too bad I guess, but to anyone who has seen the ease of querying that normalization provides, it is quite painful.
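As a rough illustration (column names assumed from the field list, and the normalised version assumes an address table like the sketch earlier in this thread), the two tallies might look like this:

-- denormalized: every column that might hold a state has to be checked
SELECT COUNT(*)
FROM account
WHERE individual_1_state = 'GA'
   OR individual_2_state = 'GA';

-- normalized: one column, one predicate
SELECT COUNT(DISTINCT account_id)
FROM address
WHERE state = 'GA';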
Normalization establishes the foundation for more and more complex queries. Without it, you will find it increasingly difficult to implement richer data analysis.
Normalization also provides the basis for integrity and consistency in your database. If you have all the occurrences of a particular thing (state abbreviations, say) in one place (one column), you can easily check and constrain those values to disallow nonexistent codes.
The rationale for normalization goes on and on, but I hope I hit a few no brainers.
This is a no-brainer - all you have now is a noun soup that you have shoved into a single table-storage shoebox and glued an ID to the beginning of each row.
Create some kind of schema. If this is more like OLAP -- and you decide on a star schema -- it will have dimensions in 2NF-5NF and facts in 2NF-6NF. For OLTP (or other warehouse models) aim for BCNF-6NF.
I would argue that you do not even have 1NF here; gluing that ID onto the beginning does not count as preventing duplicates. Therefore, you cannot de-normalize from this point even if you wanted to :) -- OK, maybe you could put a comma-separated list somewhere to make things definitely not in 1NF.
Joins are what relational databases do, so do not worry about that.

Creating a MySQL Database Schema for large data set

I'm struggling to find the best way to build out a structure that will work for my project. The answer may be simple but I'm struggling due to the massive number of columns or tables, depending on how it's set up.
We have several tools, each that can be run for many customers. Each tool has a series of questions that populate a database of answers. After the tool is run, we populate another series of data that is the output of the tool. We have roughly 10 tools, all populating a spreadsheet of 1500 data points. Here's where I struggle... each tool can be run multiple times, and many tools share the same data point. My next project is to build an application that can begin data entry for a tool, but allow import of data that shares the same datapoint for a tool that has already been run.
A simple example:
Tool 1 - company, numberofusers, numberoflocations, cost
Tool 2 - company, numberofusers, totalstorage, employeepayrate
So if the same company completed tool 1, I need to be able to populate "numberofusers" (or offer to populate) when they complete tool 2 since it already exists.
I think what it boils down to is, would it be better to create a structure that has 1500 tables, 1 for each data element with additional data around each data element, or to create a single massive table - something like...
customerID(FK), EventID(fk), ToolID(fk), numberofusers, numberoflocations, cost, total storage, employee pay,.....(1500)
If I go this route and have one large table I'm not sure how that will impact performance. Likewise - how difficult it will be to maintain 1500 tables.
Another dimension is that it would be nice to have a description of each field:
numberofusers,title,description,active(bool). I assume this is only possible if each element is in its own table?
Thoughts? Suggestions? Sorry for the lengthy question, new here.
Build a main table with all the common data: company, # users, .. other stuff. Give each row a unique id.
Build a table for each unique tool with the company id from above and any data unique to that implementation. Give each table a primary (unique) key over 'tool use' and 'company'.
This covers the common data in one place, identifies each 'customer' and provides for multiple uses of a given tool for each customer. Every use and customer is trackable and distinct.
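A hedged sketch of that layout (all names invented for illustration, with two of the roughly ten tools shown):

CREATE TABLE customer (
    customer_id     INT AUTO_INCREMENT PRIMARY KEY,
    company         VARCHAR(255) NOT NULL,
    number_of_users INT
);

CREATE TABLE tool1_run (
    run_id              INT AUTO_INCREMENT PRIMARY KEY,
    customer_id         INT NOT NULL,
    number_of_locations INT,
    cost                DECIMAL(12,2),
    run_date            DATETIME,
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
);

CREATE TABLE tool2_run (
    run_id            INT AUTO_INCREMENT PRIMARY KEY,
    customer_id       INT NOT NULL,
    total_storage     BIGINT,
    employee_pay_rate DECIMAL(10,2),
    run_date          DATETIME,
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
);

Pre-filling Tool 2 with a shared data point such as the number of users then becomes a lookup against the customer table rather than against a previous tool run.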
More about normalization here.
I agree with etherbubunny on normalization, but with larger datasets there are performance considerations that quickly become important. Joins, which are often required in normalized databases to display human-readable information, can be performance killers on even medium-sized tables, which is why a lot of data-warehouse models use de-normalized datasets for reporting. This is essentially pre-building the joined reporting data into new tables, with heavy use of indexing, archiving and partitioning.
In many cases smart use of partitioning on its own can also effectively help reduce the size of the datasets being queried. This usually takes quite a bit of maintenance unless certain parameters remain fixed though.
Ultimately in your case (and most others) I highly recommend building it the way you are able to maintain and understand what is going on and then performing regular performance checks via slow query logs, explain, and performance monitoring tools like percona's tool set. This will give you insight into what is really happening and give you some data to come back here or the MySQL forums with. We can always speculate here but ultimately the real data and your setup will be the driving force behind what is right for you.

mysql table with 40+ columns

I have 40+ columns in my table and I have to add a few more fields like current city, hometown, school, work, uni, college...
This user data will be pulled for many matching users who are mutual friends (joining the friend table with the other user's friends to see mutual friends), who are not blocked, and who are not already friends with the user.
The above request is a little complex, so I thought it would be a good idea to put the extra data in the same user table for fast access, rather than adding more joins, which would slow the query down further. But I wanted to get your suggestions on this.
My friend told me to add the extra fields, which won't be searched on, as serialized data in one field.
ERD Diagram:
My current table: http://i.stack.imgur.com/KMwxb.png
If i join into more tables: http://i.stack.imgur.com/xhAxE.png
Some Suggestions
Nothing is wrong with this table and its columns.
Follow this approach: MySQL: Optimize table with lots of columns - which serializes the extra, non-searchable fields into one field.
Create another table and put most of the data there (this makes the joins harder, as I already have 3 or more tables to join to pull the records for users, e.g. friends, user, checking mutual friends).
As usual - it depends.
Firstly, there is a maximum number of columns MySQL can support, and you don't really want to get there.
Secondly, there is a performance impact when inserting or updating if you have lots of columns with an index (though I'm not sure if this matters on modern hardware).
Thirdly, large tables are often a dumping ground for all data that seems related to the core entity; this rapidly makes the design unclear. For instance, the design you present shows 3 different "status" type fields (status, is_admin, and fb_account_verified) - I suspect there's some business logic that should link those together (an admin must be a verified user, for instance), but your design doesn't support that.
This may or may not be a problem - it's more a conceptual, architecture/design question than a performance/will it work thing. However, in such cases, you may consider creating tables to reflect the related information about the account, even if it doesn't have a x-to-many relationship. So, you might create "user_profile", "user_credentials", "user_fb", "user_activity", all linked by user_id.
This makes it neater, and if you have to add more facebook-related fields, they won't dangle at the end of the table. It won't make your database faster or more scalable, though. The cost of the joins is likely to be negligible.
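As a hedged sketch of that split (the table and column names here are invented examples, not a prescribed layout):

CREATE TABLE user (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    status  TINYINT NOT NULL DEFAULT 0
);

CREATE TABLE user_credentials (
    user_id       INT PRIMARY KEY,
    email         VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    FOREIGN KEY (user_id) REFERENCES user (user_id)
);

CREATE TABLE user_fb (
    user_id             INT PRIMARY KEY,
    fb_id               VARCHAR(64) UNIQUE,
    fb_account_verified TINYINT(1) NOT NULL DEFAULT 0,
    FOREIGN KEY (user_id) REFERENCES user (user_id)
);

Each side table is a one-to-one extension of the core user table, joined on user_id only when that group of fields is actually needed.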
Whatever you do, option 2 - serializing "rarely used fields" into a single text field - is a terrible idea. You can't validate the data (so dates could be invalid, numbers might be text, not-nulls might be missing), and any use in a "where" clause becomes very slow.
A popular alternative is "Entity/Attribute/Value" or "Key/Value" stores. This solution has some benefits - you can store your data in a relational database even if your schema changes or is unknown at design time. However, they also have drawbacks: it's hard to validate the data at the database level (data type and nullability), it's hard to make meaningful links to other tables using foreign key relationships, and querying the data can become very complicated - imagine finding all records where the status is 1 and the facebook_id is null and the registration date is greater than yesterday.
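For illustration only, a rough sketch of a key/value layout and of the query described above (all names are hypothetical):

CREATE TABLE user_attribute (
    user_id    INT NOT NULL,
    attr_key   VARCHAR(64) NOT NULL,
    attr_value VARCHAR(255),
    PRIMARY KEY (user_id, attr_key)
);

-- "status = 1 AND facebook_id IS NULL AND registered after yesterday" becomes:
SELECT s.user_id
FROM user_attribute s
JOIN user_attribute r ON r.user_id = s.user_id AND r.attr_key = 'registration_date'
LEFT JOIN user_attribute f ON f.user_id = s.user_id AND f.attr_key = 'facebook_id'
WHERE s.attr_key = 'status' AND s.attr_value = '1'
  AND f.attr_value IS NULL
  -- everything is text, so the date has to be cast by hand: the validation problem above
  AND CAST(r.attr_value AS DATE) > DATE_SUB(CURDATE(), INTERVAL 1 DAY);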
Given that you appear to know the schema of your data, I'd say "key/value" is not a good choice.
I would advise running some tests. Try it both ways and benchmark it. Nobody will be able to give you a definitive answer because you have not shared your hardware configuration, sample data, sample queries, how you plan on using the data, etc. Here is some information that you may want to consider.
Use The Database as it was intended
A relational database is designed specifically to handle data. Use it as such. When written correctly, joins across a well-written schema will perform well. You can use EXPLAIN to optimize queries. You can log SLOW queries and improve their performance. Databases have been around for years; if putting everything into a single table improved performance, don't you think that would be all the buzz on the internet and everyone would be doing it?
Engine Types
How will inserts be affected as the row count grows? Are you using MyISAM or InnoDB? You will most likely want to use InnoDB so you get row-level locking rather than table-level locking. Make sure you are using the correct engine type for your tables, and understand the pros and cons of each. The wrong engine type can kill performance.
Enhancing Performance using Partitions
Find ways to enhance performance. For example, as your datasets grow you could partition the data. Data partitioning will improve the performance of a large dataset by keeping slices of the data in separate partitions, allowing you to run queries on parts of a large dataset instead of all of the information.
Use correct column types
Consider using UUID Primary Keys for portability and future growth. If you use proper column types, it will improve performance of your data.
Do not serialize data
Using serialized data is the worst way to go. When you use serialized fields, you are basically using the database as a file management system. It will save and retrieve the "file", but then your code will be responsible for unserializing, searching, sorting, etc. I just spent a year trying to unravel a mess like that. It's not what a database was intended to be used for. Anyone advising you to do that is not only giving you bad advice, they do not know what they are doing. There are very few circumstances where you would use serialized data in a database.
Conclusion
In the end, you have to make the final decision. Just make sure you are well informed and educated on the pros and cons of how you store data. The last piece of advice I would give is to find out what heavy users of mysql are doing. Do you think they store data in a single table? Or do they build a relational model and use it the way it was designed to be used?
When you say "I am going to put everything into a single table", you are saying that you know more about performance and can make better choices for optimization in your code than the team of developers that constantly work on MySQL to make it what it is today. Consider weighing your knowledge against the cumulative knowledge of the MySQL team and the DBAs, companies, and members of the database community who use it every day.
At a certain point you should look at the "short row model", also known as entity-key-value stores, as well as the traditional "long row model".
If you look at the schema used by WordPress you will see that there is a table wp_posts with 23 columns and a related table wp_postmeta with 4 columns (meta_id, post_id, meta_key, meta_value). The meta table is a "short row model" table that allows WordPress to have an infinite collection of attributes for a post.
Neither the "long row model" nor the "short row model" is the best model; often the best choice is a combination of the two. As #nevillek pointed out, searching and validating "short rows" is not easy, and fetching data can involve pivoting, which is annoyingly awkward in MySQL and Oracle.
The "long row model" is easier to validate, relate and fetch, but it can be very inflexible and inefficient when the data is sparse. Some rows may have only a few of the values non-null. Also you can't add new columns without modifying the schema, which could force a system outage, depending on your architecture.
I recently worked on a financial services system that had over 700 possible facts for each instrument, most of which had fewer than 20 facts. This could have been built by setting up dozens of tables, each for a particular asset class, or as a table with 700 columns, but we chose to use a combination of a table with about 20 columns containing the most popular facts and a 4-column table which contained the other facts. This design was efficient but difficult to access, so we built a few table functions in PL/SQL to assist with this.
I have a general comment for you,
Think about it: if you put much more than 10-12 columns in a table, even when it seems to make sense, I suspect you are going to pay the price in the short, medium and long term.
Your 3-table approach seems better than the 1-table approach, but consider splitting those into 5-6 tables rather than 3 while you still can.
Move currently, currently_position and currently_link from the user table, and work from the user profile, into a new table called USERWORKPROFILE keyed by your primary key.
Move locale information from the user profile into a new USERPROFILELOCALE table, because it is generic in nature.
And yes, all your generic attributes in all the tables should be INT and not VARCHAR.
For instance, City needs to move out to a new table called LIST_OF_CITIES with cityid.
And your attribute city should change from varchar to int and point to cityid in LIST_OF_CITIES.
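A hedged sketch of that lookup-table suggestion, showing only the target shape (migrating the existing VARCHAR data would need a backfill step first, and the user table name is an assumption):

CREATE TABLE LIST_OF_CITIES (
    cityid    INT AUTO_INCREMENT PRIMARY KEY,
    city_name VARCHAR(100) NOT NULL UNIQUE
);

ALTER TABLE user
    ADD COLUMN cityid INT,
    ADD FOREIGN KEY (cityid) REFERENCES LIST_OF_CITIES (cityid);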
Do not worry about performance issues; the more tables you have, the better the performance, because you are handing the work over to the database engine instead of taking it all into your own hands.

Is it good practice to automatically create tables in a database when a user registers?

I'm making my first site with Django, and I'm having a database design problem.
I need to store some of the users history, and I don't know whether it's better to create a table like this for each user every time one signs up:
table: $USERNAME$
id | some_data | some_more | even_more
or have one massive table from the start, with everyone's data in:
table: user_history
id | username | some_data | some_more | even_more
I know how to do the second one, just declare it in my Django models. If I should do the first one, how can I in Django?
The first one organises the data more hierarchically but could potentially create a lot of tables depending on the popularity of the service (is this a bad thing?)
The second one seems to more suit Django's design philosophies (from what I've seen so far), and would be easier to run comparative searches between users, but could get huge (number of users * average items in history). Can MySQL handle, say, 1 billion records? (I won't get that, but it's good to plan ahead)
Definitely the second format is the way you want to go. MySQL is pretty good at handling large numbers of rows (assuming they're indexed and cached as appropriate, of course). For example, all versions of all pages on Wikipedia are stored on one table in their database, and that works absolutely fine.
I just don't know what Django is, but I'm sure it's not good practice to create a table per user for logging (or almost anything, for that matter).
Best regards.
You should definitely store all users in one table, one row per user. It's the only way you can filter out data using a WHERE clause. And I'm not sure if MySQL can handle 1 billion records, but I've never found the records limit as a limiting factor. I wouldn't worry about the records limit for now.
You see, every high-load project started as something that was simply well designed. A well-designed system has better prospects of being improved to handle huge loads.
Also keep in mind that even the genius guys at Twitter/FB/etc. did not know what issues they would experience after a while, and you will not know either. Solving load/scalability challenges, and predicting them, is a sort of rocket science.
So the best you can do now is to start with the most normalized DB and textbook solutions, and solve the bottlenecks as they appear.
When creating a relational database, you would only want to create a new table if it contains significantly different data than the original table. In this case, all of the tables will be pretty much the same, so you would only want 1 table for all users.
If you want to break it down even further, you may not want to store all the users' actions in the user table. You may want to have 1 table for user information and another for user history, i.e.:
table: User
Id | UserName | Password | other data
table: User_history
Id | some_data | timestamp
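A rough sketch of that split in MySQL DDL, with a foreign key and an index for history look-ups (column names are illustrative, not prescriptive):

CREATE TABLE user (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(150) NOT NULL UNIQUE,
    password VARCHAR(255) NOT NULL
);

CREATE TABLE user_history (
    id        BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id   INT NOT NULL,
    some_data VARCHAR(255),
    created   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    KEY ix_user_created (user_id, created),
    FOREIGN KEY (user_id) REFERENCES user (id)
);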
There's no need to be worried about the speed of your database as long as you define proper indexes on the fields you plan to search. Using those indexes will definitely speed up your response time as more records are put into your table. The database I work on has several tables with 30,000,000+ records and there's no slow-down.
Definitely DO NOT create a TABLE per user. Create a row per user, and possibly a row per user and smaller tables if some data can be factored.
Definitely stick with one table for all users; consider that complicated queries may require extra resources when run across multiple tables instead of just one.
Run some tests; regarding resources, I am sure you will find that one table works best.
Everyone has pointed out that the second option is the way to go, I'll add my +1 to that.
About the first option, in Django, you create tables by declaring subclasses of django.models.Model and then when you run the management command syncdb it will look at all the models and create missing tables for all "managed" models. It might be possible to invoke this behavior at run time, but it isn't the way things are done.

Private messaging system, large single table versus many small tables

I'm considering a design for a private messaging system and I need some input here, basically I have several questions regarding this. I've read most of the related questions and they've given me some thought already.
All of the basic messaging systems I've thus far looked into use a single table for all of the users' messages. With indexes etc this approach would seem fine.
What I wanted to know is if there would be any benefit to splitting the user messages into separate tables. So when a new user is created a new table is created (either in the same or a dedicated message database) which stores all of the messages - sent and received - for that user.
What are the pitfalls/benefits to approaching things that way?
I'm writing in PHP; would the code required be particularly more cumbersome than for the first, large-table option?
Would the eventual result, with a large number of smaller tables, be a more robust, trouble-free design than one large table?
In the event of large numbers of concurrent users, how would the performance of the server compare when dealing with one large table versus many small ones?
Any help with those questions or other input would be appreciated. I'm currently working through a smaller scale design for my test site before rewriting the PM module and would like to optimise it. My poor human brain handles separate table far more easily, but the same isn't necessarily so for a computer.
You'll just get headaches from moving to numerous small tables. Databases are made for handling lots of data; let them do their thing.
You'll likely end up using dynamic table names in queries (SELECT * FROM $username WHERE ...), making smart features like stored procedures and possibly parameterized queries a lot trickier if not outright impossible. Usually a really bad idea.
Try rewriting SELECT * FROM messages WHERE authorID = 1 ORDER BY date_posted DESC, but where "messages" is anywhere between 1 and 30,000 different tables. Keeping your table relations monogamous will keep them bidirectional, way more useful.
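For comparison, the single-table version only needs an index shaped like that query; a minimal sketch (column names taken from the query above, the rest assumed):

CREATE TABLE messages (
    message_id  BIGINT AUTO_INCREMENT PRIMARY KEY,
    authorID    INT NOT NULL,
    recipientID INT NOT NULL,
    body        TEXT,
    date_posted DATETIME NOT NULL,
    KEY ix_author_date (authorID, date_posted)
);

-- the query from above can read ix_author_date in reverse order, no table-per-user needed
SELECT * FROM messages WHERE authorID = 1 ORDER BY date_posted DESC;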
If you think table size will really be a problem, set up an "archived messages" clone table and periodically move old & not-unread messages there where they won't get in the way. Also note how most forum software with private messaging allows for limiting user inbox sizes. There are a few ways to solve the problem while keeping things sane.
I'm agreeing with #MarkR here - in that initially the one table for messages is definitely the way to proceed. As time progresses and should you end up with a very large table then you can consider how to partition the table to best proceed. That's counter to the way I'd normally advise design, but we're talking about one table which is fairly simple - not a huge enterprise system.
A very long time ago (before SQL databases were available) I built a system that stored private and public messages, and I can confirm that once you split a message base's logical entity into more than one, everything¹ becomes a lot more complicated; and I doubt that a file per user is the right approach - the overheads will be massive compared to the benefit.
Avoid auto-increment² - using natural keys is very important for future scalability. Designing well to ensure that you can insert and retrieve without locking will be of more benefit.
¹ Indexing, threading, searching, purging/archiving.
² Natural keys are better if you can find one for your data as the autoincremented ID does not describe the data at all and databases are good at locating based on the primary key, so a natural primary key can improve things. Autoincrement can cause problems with a distributed database; it also leaks data when presented externally (to see the number of users registered just create a new account and check your user ID). If you can't find a natural key then a UUID (or GUID) may still be a better option - providing that the database has good support for this as a primary key. See When to use an auto-incremented primary key and when not to
Creating one table per user certainly won't scale well when there are a large number of users with a small number of messages. The way MySQL handles table opening/closing, very large numbers of tables (> 10k, say) become quite inefficient, especially at server startup and shutdown, as well as trying to backup non-transactional tables.
However, the way you've worded your question sounds like a case of premature optimisation. Make it work first, then fix performance problems. This is always the right way to do things.
Partitioning / sharding will become necessary once your scale gets high enough. But there are a lot of other things to worry about in the mean time. Sort them out first :)
One table is the right way to go from an RDBMS PoV. I recommend you use it until you know better.
Splitting large amounts of data into smaller sets makes sense if you're trying to avoid locking issues: for example, locking the messages table while doing big selects or updating huge amounts of data at once. In that case long-running queries could block the whole table and everyone would have to wait. You should ask yourself whether this is going to happen in your case. At least to me it looks like a messaging system won't have such issues, because all information is pushed into the table, or retrieved from it, in rather small sets. If this is a user-centric application, then getting all messages for a single user is quite easy and fast to do, and the same goes for creating new messages for one particular user - unless you have a really huge number of users/messages in your system.
Splitting data into multiple tables also has some drawbacks - you will need some kind of management system or logic for how you split everything. Giving each user a separate table could soon grow into hundreds or thousands of tables, which is, in my opinion, not that nice. So you would probably need some other criteria for how to split the data, and if you want the splitting logic to be dynamic and easily adjustable, you would probably also need to store it in the DB somehow. As you can see, the complexity grows...
An advantage of such data sharding could be scalability - you could easily put different sets of data on different machines once a single machine is not able to handle the whole load.
It depends how your message system works.
Are there concurrency issues?
Does it need to be scalable as the application accommodates more customers?
A single-table design will work perfectly for a small, one-message-at-a-time, single-user system.
However, if you are considering a multi-user, concurrent messaging system, the tables should be split.
The data model for a real-time application is recommended to be "normalized" (split tables) due to "locking & latching" and data redundancy issues.
Locking policy varies by database vendor. If you have tables that the application updates and selects from concurrently, locking issues (page level, row level or table level, depending on the vendor) arise. Some bad DB and app designs lock the table completely so messages never go through.
The redundancy issue is clearer: if you use only one table, some information (like the user - I guess one user could send multiple messages) is redundant.
Try googling "normalization" and "locking".