I am working on a project using MySQL and PHP. I will have many (hundreds to thousands, possibly) users, and each user will have many (several thousand) entries relating to him/her. I was initially thinking of sticking all of the entries into one table, and having one of the columns be the user ID which the entry corresponds to, but this table would become huge, and likely hard to manage. I'd need to query the table frequently to get the entries which correspond to a particular user ID, and this may take a while. However, I would rarely need to query data that doesn't share a user ID.
I am now thinking about making a table for every user ID (something like "table1" for userID one, for example), and then just querying the individual tables. However, having thousands of tables sounds like a bad idea as well.
Which would you recommend? Or is there a better solution I haven't thought of? (I hope my question made sense!)
The only valid way of doing that is having everything in one table. MySQL is not made for such extreme usage.
I suggest you keep all the entries in one table, each entry having a UserID. And don't forget to put an index on that field.
It may seem reasonable to think about multiple tables, but if you do it that way, your queries will actually take more time and the data will use more disk space, because each table creates additional overhead.
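To make the single-table suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name entries, the columns and the index name are my own assumptions, not something from the question.

```python
import sqlite3

# SQLite stands in for MySQL here; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " body TEXT NOT NULL)"
)
# The index on user_id is what keeps per-user lookups fast.
conn.execute("CREATE INDEX idx_entries_user_id ON entries (user_id)")

# 100 users with 50 entries each, all stored in the same table.
rows = [(user_id, f"entry {n}") for user_id in range(1, 101) for n in range(50)]
conn.executemany("INSERT INTO entries (user_id, body) VALUES (?, ?)", rows)


def entries_for_user(user_id):
    """Return all entries that belong to one user."""
    cur = conn.execute(
        "SELECT id, body FROM entries WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return cur.fetchall()


user_42 = entries_for_user(42)
total_rows = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

The query for one user only touches that user's rows through the index, no matter how many other users share the table.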
Go with one table; it is the only valid way. Splitting is not an option: you will just create data fragmentation and make things hard for yourself when you want to, for example, do a backup.
Just an additional comment: I have seen 20GB tables many times, but I have never seen a database with more than 100 tables.
In a news feed app, our customer requires us to store the ids of all articles read by a user.
We decided to create a single table for this, but from a performance point of view, which of the following is the better approach:
Have one row per user with two fields, a user_id and article_ids, then each time a user reads an article, append the id to the article_ids text using update and concat (we might end up with a huge amount of data in one column).
Have many rows with two columns, user_id and article_id, then each time a user reads an article, insert the article_id along with the user_id as a new record (we might end up with too many rows).
Or if there is a better way, any suggestions are very welcome.
With the second approach, you can keep track of other things which your client might ask for going forward.
First open/read/visit time.
Total number of opens/reads/visits.
Last open/read/visit time.
In this approach, you can add an index on article_id later on if required.
Note: As @Arjan said in his answer, with proper indexing there is no such thing as too many rows.
Many records, one for each user_id and article_id combination. That's much easier to update (just insert a row, no need to apply logic) and also allows you to get information about articles when you want to list which ones a user has read. You can use a join and retrieve the correct information from the database at once, instead of having to convert a string to ids and then go back to the database to get the additional data.
With proper indexes there's not really such a thing as too many rows.
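A rough sketch of the one-row-per-read design, again with Python's sqlite3 standing in for MySQL. The table names articles and article_reads and the read_at column are assumptions of mine. INSERT OR IGNORE is SQLite syntax; in MySQL the corresponding statement would be INSERT IGNORE.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE article_reads (
    user_id    INTEGER NOT NULL,
    article_id INTEGER NOT NULL REFERENCES articles(id),
    read_at    TEXT NOT NULL,
    PRIMARY KEY (user_id, article_id)  -- also serves as the lookup index
);
""")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "Indexes"), (2, "Joins"), (3, "Partitions")],
)


def mark_read(user_id, article_id, read_at):
    """Record a read; a repeated read of the same article is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO article_reads (user_id, article_id, read_at)"
        " VALUES (?, ?, ?)",
        (user_id, article_id, read_at),
    )


def titles_read_by(user_id):
    """Fetch the titles a user has read with a single join."""
    cur = conn.execute(
        "SELECT a.title FROM article_reads r"
        " JOIN articles a ON a.id = r.article_id"
        " WHERE r.user_id = ? ORDER BY a.id",
        (user_id,),
    )
    return [title for (title,) in cur]


mark_read(7, 1, "2024-05-01 09:00:00")
mark_read(7, 3, "2024-05-01 09:30:00")
mark_read(7, 1, "2024-05-02 08:00:00")  # duplicate, ignored
mark_read(8, 2, "2024-05-02 10:00:00")

titles_for_7 = titles_read_by(7)
read_rows = conn.execute("SELECT COUNT(*) FROM article_reads").fetchone()[0]
```

Marking an article as read is a plain insert, and listing what a user has read needs no string parsing on the application side.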
Try to split them as much as possible. Your performance will increase a lot, because you only have to pick small pieces of your database. If you go for the first option, you have to split the text on certain characters to get the information you want. That is more challenging to program, and if a user has a bad internet connection, the application would be very slow.
Let's say I would like to store votes on polls in a MySQL database.
As far as I know I have two options:
1. Create one table (let's say votes) with fields like poll_id, user_id, selected_option_id, vote_date and so on.
2. Create a new database for votes (let's say votes_base) and for each poll add a table to this database (a table which contains the id of the poll in its name), let's say poll[id of the poll].
The problem with the first option is that the table will become big very soon. Let's say I have 1000 polls and each poll has 1000 votes: that's already a million records in the table. I don't know how much that will cost in performance.
The problem with the second option is that I'm not sure it is the correct solution from a programming-rules point of view. But I'm sure that with this option it will be (much?) faster to find all votes for a given poll.
Or maybe there is a better option?
Your first option is the better option. It is structurally more sound. Millions of rows in a table are no problem for MySQL. A new table per poll is an antipattern.
EDIT for first comment:
Even for a billion or more votes, MySQL should cope. Indexes are the key here. What is the difference between one database with 100 copies of the same table, and one table with 100 times the rows?
Technically, the second option works as well. Sometimes it might be even better. But we frequently see this:
Instead of one table, users, with 10 columns
Make 100 tables, users_uk, users_us, ... depending on where the users are from.
Great, no? Works, yes? Well it does, until you want to select all the male users, or join the users table onto another table. You'll have a huge UNION coming, and you won't even know the tables beforehand.
One big users table, with the appropriate indexes, is better. If it gets too big for your liking (or your disk), you can start with PARTITIONING: you still have the benefit of one table, but the partitions can be stored in different locations.
Now, with your polls, these kinds of queries might not happen. In that case, one big InnoDB table or 1000s of small tables might both work, but the first option is a lot easier to program, and has no drawbacks compared with the second option. Why choose the second option?
The first option is better, no doubt. Just be sure to define INDEXes on the fields you will use to search data (such as poll_id, for sure) and you will not experience performance issues. MySQL is a DBMS perfectly capable of handling that many rows. Do not worry.
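For illustration, a small sketch of the first option with an index on poll_id. The column names come from the question; the index name, the sample data and the use of Python's sqlite3 instead of MySQL are assumptions made so the example is runnable.

```python
import sqlite3

# SQLite stands in for MySQL; the index name and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes ("
    " id INTEGER PRIMARY KEY,"
    " poll_id INTEGER NOT NULL,"
    " user_id INTEGER NOT NULL,"
    " selected_option_id INTEGER NOT NULL,"
    " vote_date TEXT NOT NULL)"
)
# Index the field used to search, as suggested above.
conn.execute("CREATE INDEX idx_votes_poll_id ON votes (poll_id)")

# (poll_id, user_id, selected_option_id)
sample_votes = [
    (1, 1, 1), (1, 2, 1), (1, 3, 2), (1, 4, 1), (1, 5, 2),
    (2, 1, 3), (2, 2, 3),
]
conn.executemany(
    "INSERT INTO votes (poll_id, user_id, selected_option_id, vote_date)"
    " VALUES (?, ?, ?, '2024-05-01')",
    sample_votes,
)


def results_for_poll(poll_id):
    """Count the votes per option for a single poll."""
    cur = conn.execute(
        "SELECT selected_option_id, COUNT(*) FROM votes"
        " WHERE poll_id = ?"
        " GROUP BY selected_option_id"
        " ORDER BY selected_option_id",
        (poll_id,),
    )
    return cur.fetchall()


poll_1 = results_for_poll(1)
poll_2 = results_for_poll(2)
```

All polls share one table and one query; only the bound poll_id changes.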
The first option is better. And you can archive old rows after a while, if you are not going to use them often.
I have already seen a few forum threads with this question, but they do not answer one thing I want to know. I'll first explain my situation:
I have a system where every action by multiple users is logged to the database (e.g. User1 logged in, User2 logged in, User1 entered User management, User2 changed password, etc.). So I would be expecting 100 to 200 entries per user per day. Right now, I'm doing it in a single table, and to view it I just have to filter by UserID.
My question is, which is more efficient? Should I use one single table or create a table per user?
I am worried that if I use a single table, the system might have some difficulty filtering thousands of entries. I've read some pros and cons using multiple tables and a single table especially concerning updating the table(s).
I also want to know which one saves more space: multiple tables or a single table?
As long as you use indexes on the fields you're selecting from, you shouldn't have any speed problems (although indexes slow writes, so too many are a bad thing). A table with a few thousand entries is nothing to MySQL (or any other database engine).
The overhead of creating thousands of tables is much worse -- say you want to make a change to the fields in your user table -- now you'd have to change thousands of tables.
A table we regularly search for a single record at work has about 150,000 rows, and because the field we search on is indexed, the search time is a very small fraction of a second.
If you're selecting those records without using the primary key, create an index on the field you use to select like this:
CREATE INDEX my_column_name ON my_table(my_column_name);
That's the most basic form. To learn more about it, check here
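If you want to see the effect of that statement, here is a hedged sketch that runs the same CREATE INDEX against SQLite (via Python's sqlite3, used here only because it needs no server) and compares the query plan before and after. The extra payload column and the sample rows are made up. MySQL has its own EXPLAIN with a different output format, so treat this as an illustration of the idea rather than of MySQL's output.

```python
import sqlite3

# SQLite stands in for MySQL; the payload column and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY,"
    " my_column_name TEXT,"
    " payload TEXT)"
)
conn.executemany(
    "INSERT INTO my_table (my_column_name, payload) VALUES (?, ?)",
    [(f"key{n}", "x") for n in range(1000)],
)

query = "SELECT id, payload FROM my_table WHERE my_column_name = ?"


def plan(sql, params):
    """Return SQLite's query plan as one string (4th column is the detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


# Without an index the whole table is scanned.
plan_before = plan(query, ("key500",))

# Same statement as in the answer above.
conn.execute("CREATE INDEX my_column_name ON my_table(my_column_name)")

# With the index the lookup becomes an index search.
plan_after = plan(query, ("key500",))
```

Before the index the plan reports a full scan of my_table; afterwards it reports a search using the my_column_name index.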
I would go with a single table. With an index on userId, you should be able to scale easily to millions of rows with little issue.
A table per user might be more efficient, but it's generally poor design. The problem with a table per user is it makes it difficult to answer other kinds of questions like "who was in user management yesterday?" or "how many people have changed their passwords?"
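To show why those cross-user questions stay easy with a single table, here is a small sketch using Python's sqlite3 in place of MySQL. The table name activity_log, the action labels and the sample rows are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; table, actions and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_log ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " action TEXT NOT NULL,"
    " logged_at TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_log_user_id ON activity_log (user_id)")
conn.executemany(
    "INSERT INTO activity_log (user_id, action, logged_at) VALUES (?, ?, ?)",
    [
        (1, "login", "2024-05-01 09:00:00"),
        (2, "login", "2024-05-01 09:05:00"),
        (1, "user_management", "2024-05-01 09:10:00"),
        (2, "change_password", "2024-05-01 09:20:00"),
        (3, "user_management", "2024-05-02 10:00:00"),
        (3, "change_password", "2024-05-02 10:05:00"),
        (2, "change_password", "2024-05-03 11:00:00"),
    ],
)


def users_with_action_on(action, day):
    """Who performed a given action on a given day (YYYY-MM-DD)?"""
    cur = conn.execute(
        "SELECT DISTINCT user_id FROM activity_log"
        " WHERE action = ? AND date(logged_at) = ?"
        " ORDER BY user_id",
        (action, day),
    )
    return [user_id for (user_id,) in cur]


def count_users_with_action(action):
    """How many different users have ever performed a given action?"""
    cur = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM activity_log WHERE action = ?",
        (action,),
    )
    return cur.fetchone()[0]


in_user_management = users_with_action_on("user_management", "2024-05-01")
password_changers = count_users_with_action("change_password")
```

With one table per user, each of these would need a query over every user's table.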
As for storage space used - I would say a table per user would probably use a little more space, but the difference between the two options should be quite small.
I would go with just 1 table. I certainly wouldn't want to create a new table every time a user is added to the system. The number of entries you mention for each day is really not that much data.
Also, create an index on the user column of your table to improve query times.
Definitely a single table. Having tables created dynamically for entities that are created by the application does not scale. Also, you would need to build your queries with variable table names, which makes things difficult to debug and maintain.
If you have an index on the user id you use for filtering, it's not a big deal for a db to work through millions of rows.
Any database worth its salt will handle a single table containing all that user information without breaking a sweat. A single table is definitely the right way to do it.
If you used multiple tables, you'd need to create a new table every time a new user registered. You'd need to create a new statement object for each user you queried. It would be a complete mess.
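A short sketch of the statement problem, using Python's sqlite3 as a stand-in for MySQL (the table name logs and the data are invented). Placeholders in a prepared statement bind values, not table names, so with a single table one fixed statement serves every user, while per-user tables force you to build the SQL text yourself.

```python
import sqlite3

# SQLite stands in for MySQL; table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " message TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_logs_user_id ON logs (user_id)")
conn.executemany(
    "INSERT INTO logs (user_id, message) VALUES (?, ?)",
    [(1, "logged in"), (2, "logged in"), (1, "changed password")],
)

# Single table: one fixed statement, the user id is bound as a parameter.
SELECT_FOR_USER = "SELECT message FROM logs WHERE user_id = ? ORDER BY id"
messages_user_1 = [m for (m,) in conn.execute(SELECT_FOR_USER, (1,))]

# Table per user: a placeholder cannot stand for the table name, so the
# statement below is rejected and the name would have to be pasted into
# the SQL string instead.
try:
    conn.execute("SELECT message FROM ?", ("logs_user_1",))
    table_placeholder_accepted = True
except sqlite3.OperationalError:
    table_placeholder_accepted = False
```

Pasting table names into SQL strings is what makes the per-user design harder to debug and easier to get wrong.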
I would go for the single table as well. You might want to go for multiple tables when you want to serve multiple customers with different sets of users (multi-tenancy).
Otherwise if you go for multiple tables, take a look at this refactoring tool: http://www.liquibase.org/. You can do schema modifications on the fly.
I guess that if you use proper indexing, the single-table solution can perform well enough (and the maintenance will be much simpler).
A single table keeps prepared statements in PHP simple: the table name is fixed, and only the values coming from $_POST or $_GET need to be bound. I think a single table will be fine for small to medium platforms. In summary, a few tables are better than many.
However, multiple tables will not cause much havoc either. But a single table is best.
I want to create a new table for each new user on the web site, and I assume that there will be many users. I am sure that search performance will be good, but what about maintenance?
It is MySQL, which has no limit on the number of tables.
Thanks a lot.
Actually tables are stored in a table too. So in this case you would move searching in a table of users to searching in the system tables for a table.
Performance AND maintainability will suffer badly.
This is not a good idea:
The maximum number of tables is unlimited, but the table cache is finite in size and opening tables is expensive. In MyISAM, closing a table throws its keycache away. Performance will suck.
When you need to change the schema, you will need to do one ALTER TABLE per user, which will be an unnecessary pain.
Searching for things not tied to a particular user will involve a horrible UNION query across all or many users' tables.
It will be difficult to construct foreign key constraints correctly, as you won't have a single table with all the user ids in it any more.
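The schema-change point can be sketched like this, with Python's sqlite3 standing in for MySQL. The table names, the new created_at column and the figure of 200 users are all assumptions for the sake of the example.

```python
import sqlite3

# SQLite stands in for MySQL; names and the user count are hypothetical.
conn = sqlite3.connect(":memory:")
user_ids = range(1, 201)

# Design A: one table per user.
for uid in user_ids:
    conn.execute(
        f"CREATE TABLE entries_user_{uid} (id INTEGER PRIMARY KEY, body TEXT)"
    )

# Design B: one shared table with a user_id column.
conn.execute(
    "CREATE TABLE entries ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " body TEXT)"
)


def add_created_at_to_per_user_tables():
    """Design A: the same ALTER TABLE has to run once per user."""
    statements = 0
    for uid in user_ids:
        conn.execute(f"ALTER TABLE entries_user_{uid} ADD COLUMN created_at TEXT")
        statements += 1
    return statements


def add_created_at_to_single_table():
    """Design B: one ALTER TABLE covers every user."""
    conn.execute("ALTER TABLE entries ADD COLUMN created_at TEXT")
    return 1


per_user_statements = add_created_at_to_per_user_tables()
single_table_statements = add_created_at_to_single_table()
```

The count of statements grows with every sign-up in the per-user design, and any one of them failing halfway leaves the tables out of sync.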
Why are you sure that performance will be good? Have you tested it?
Why would you possibly want to do this? Just have one table for each thing that needs a table, and add a "user" column. Having a bunch of tables vs a bunch of rows isn't going to make your performance better.
To give you a direct answer to your question: maintenance will lower your enthusiasm at the same rate that new users sign up for your site.
Not sure what language / framework you are using for your web site, but at this stage it is best to look up some small examples in it. Our guess is that in every example you find, every new user gets one record in a table, not a table in the database.
I would go with option 1 (a table called tasks with a user_id foreign key) in the short run, assuming that a task can't have more than one user. If it can, you'll need a join table. Look into setting up an actual foreign key as well, since this enforces referential integrity in the data itself.
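Here is a minimal sketch of what an actual foreign key buys you, using Python's sqlite3 in place of MySQL. The column names other than user_id and the sample rows are assumptions. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; in MySQL, enforcement depends on the storage engine (InnoDB enforces them).

```python
import sqlite3

# SQLite stands in for MySQL; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE tasks (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    title   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO tasks (user_id, title) VALUES (1, 'write report')")

# A task pointing at a user that does not exist is rejected by the database.
try:
    conn.execute("INSERT INTO tasks (user_id, title) VALUES (999, 'orphan task')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

task_count = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
```

If a task can belong to several users, the user_id column would move out of tasks into a separate join table with one row per (task, user) pair.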
I am working on a game that I am going to open to the public to have on their websites.
The game stores lots of information (about 300 rows) per website and spends a lot of time updating values within this MySQL database.
Is it better (faster/more efficient) to add a new table for every website, or to just have 1000s of rows in one table and add a column "website_id" or similar?
It is better to add more rows in the same table and create an index on the table. An index on the website_id would probably help a lot.
Definitely go for rows with a website_id. Definitely index the website_id and it should be much more efficient -- I have about 50-60 sites running on a similar concept and it is blazing fast.
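As a rough sketch of the rows-with-website_id layout, here is a runnable example with Python's sqlite3 standing in for MySQL. The table game_state, its columns and the sample data are my own invention. I use a composite primary key starting with website_id, which doubles as the index on website_id; a separate index would work too.

```python
import sqlite3

# SQLite stands in for MySQL; table, columns and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_state ("
    " website_id INTEGER NOT NULL,"
    " item_name TEXT NOT NULL,"
    " amount INTEGER NOT NULL,"
    " PRIMARY KEY (website_id, item_name))"
)

# 5 websites with about 300 rows each, all in one table.
rows = [(site, f"item{n}", 0) for site in range(1, 6) for n in range(300)]
conn.executemany(
    "INSERT INTO game_state (website_id, item_name, amount) VALUES (?, ?, ?)",
    rows,
)


def increment(website_id, item_name):
    """Update one value for one website; returns the number of rows changed."""
    cur = conn.execute(
        "UPDATE game_state SET amount = amount + 1"
        " WHERE website_id = ? AND item_name = ?",
        (website_id, item_name),
    )
    return cur.rowcount


changed = increment(3, "item42")
site_3_total = conn.execute(
    "SELECT SUM(amount) FROM game_state WHERE website_id = 3"
).fetchone()[0]
other_sites_total = conn.execute(
    "SELECT SUM(amount) FROM game_state WHERE website_id <> 3"
).fetchone()[0]
```

Each update is narrowed to one website's rows by the key, so other websites' data is never touched.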
Could you perhaps give us some more information pertaining to the kind of information you're storing and how you are going to manipulate this data?
If a single table is going to have 300+ entries for what should be a single entity, you may have a flaw in your physical database design. Ideally, you want to keep your tables from having too many entries, but this is unavoidable at times.
You might also want to look into partitioning as a means of keeping your tables smaller, if you are going to have as much data as you think.
At a glance though, it appears that using a website_id would make sense. Create a table indexing just these, and all of your additional tables can use a field referencing the website_id where required.