I'm building a web application that uses a database (MySQL, in particular).
Each user will have their own table in each database, and each database corresponds to a category. For example:
Database 1 (Music Playlist) - Tables: User1, User2, User3
Database 2 (Wall posts) - Tables: User1, User2, User3
Database 3 (Wall replies) - Tables: User1_post1, User1_post2, User3_post1
Let's say I have 100,000 users. Considering how many tables there would be in total, is this wise? Is there a maximum limit on the number of tables? Will this affect performance?
I'm taking a course right now and I just learned about JOINing tables. Is that a better approach? Will it make a difference in performance?
Relational databases are designed to handle large amounts of data. Tables with many millions of rows are not uncommon; there are examples with billions of rows as well. So, you don't have to worry about 100,000 users, as long as you understand how to structure your database. Two key ideas are indexes and partitions.
You do have a problem with your structure, however. You do not want a separate table for each user. You want a single table with a column specifying the user. Although the tables will have hundreds of thousands or millions of rows, you do not need to worry. Databases are designed for this type of volume.
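For example, instead of a wall-posts table per user, a single shared table with an indexed user column might look like this (table and column names are just illustrative, not from the question):

CREATE TABLE wall_posts (
    post_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,  -- which user the post belongs to
    body TEXT,
    created_at DATETIME NOT NULL,
    INDEX idx_user (user_id)  -- lets MySQL fetch one user's rows without scanning the whole table
);

-- One user's posts, no matter how many users exist:
SELECT * FROM wall_posts WHERE user_id = 42;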
Related
I am creating a database for a project. In this project we will handle many different companies. We have two options for the database design:
Create a common table for all companies and save all the information in a single table, say company_daily_records, which will hold every company's data. Suppose a company has 100,000 records and we have 1,000 companies; then company_daily_records will have 100,000 * 1,000 records.
Create a separate table for each company, so there will be 1,000 company_daily_records tables, each with 100,000 records.
Which design will give better performance?
Also, which database and SQL dialect should we prefer?
1) If you create a separate database for each company, your records will be well organized. But if your project deals with all companies at the same time, you will have to switch connections frequently.
2) If you create one database for all companies, you just have to add an additional 'company' table listing all the companies; its key can then be used as a foreign key in, e.g., an 'employee' table to separate the employees of each company.
But the records are less organized, since everything for every company is mixed in the same tables.
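A rough sketch of option 2 (table and column names are only illustrative):

CREATE TABLE company (
    company_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE employee (
    employee_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    company_id INT UNSIGNED NOT NULL,  -- separates the employees of each company
    name VARCHAR(100) NOT NULL,
    FOREIGN KEY (company_id) REFERENCES company (company_id)
);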
Since you mention the daily records can run into the billions, I suggest you go with separate databases; that will surely save searching and query time, which is the most important aspect.
I think you can use MySQL to manage your records.
Thank you.
I would not suggest creating a table for each company, because:
How do you know what/how many companies there will be?
When you have a new company, you would need to create a new table in the database and update your application code manually. It could be automated, but that is not an easy task.
Because you are at an early stage now, it is fine to go with the traditional relational approach: a company table and a company_record table. You can worry about performance later, when it becomes a problem or when you have spare time for optimization.
Don't design the schema for a large dataset until you have some thoughts on how the data will be inserted and queried.
You need to avoid scanning 100 million (10 crore) rows to get an answer; it will be painfully slow. That implies indexing.
Many NoSQL products give you little or no secondary indexing, or you have to build the indexes yourself. You would be better off with a real RDBMS doing such heavy lifting for you.
If you split by company into tables or databases or partitions or shards:
Today you have 1000 tables (etc), tomorrow you have 1123.
Any operation that goes across companies will be difficult and slow.
Working with 1000 tables/dbs/partitions, or especially shards, has inefficiencies.
I vote for a single 'large' (but not 'huge') table with a SMALLINT UNSIGNED (2-byte) column for company_id.
Since you are into the "Data Warehouse" realm, Summary Tables come to mind.
Will you be deleting "old" data? That is another thing to worry about in large tables.
Inserting 1000 rows per day is no problem. (1000/second would be another story.)
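A minimal sketch of such a table, assuming one row per company per day (the non-key columns are my assumption; adapt them to your actual daily records):

CREATE TABLE company_daily_records (
    company_id SMALLINT UNSIGNED NOT NULL,  -- 2 bytes, enough for up to 65,535 companies
    record_date DATE NOT NULL,
    amount DECIMAL(12,2),
    PRIMARY KEY (company_id, record_date)  -- queries for one company read only that company's rows
);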
In my project, there are 20 million users of two types: 10 million of the first type and 10 million of the second. These users have access rights to other tables and use them. I am using a MySQL database. My question is: will putting both types of users in one table with 20 million rows affect database performance? Will it be slower, or do 20 million records not affect performance for a DBMS?
If there is an index on type, then the number of records won't matter much, though your hardware configuration is a different matter altogether.
One more point to consider: do you query both types in one statement or not? If not, go for different tables; if yes, it is better to have them in one table to save a join.
Also, consider your schema as a whole (which is not provided here).
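A sketch of the index the first point assumes (table and column names are illustrative):

CREATE INDEX idx_type ON users (type);

-- Note: with only two distinct type values, the optimizer may still choose a full scan,
-- so a composite index matching your real WHERE clauses, e.g.
-- CREATE INDEX idx_type_created ON users (type, created_at);
-- is usually more useful.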
20 million rows is well within the capability of MySQL. But you need to be careful when forming your SQL queries, as inefficient queries can lead to slow performance.
If you are using Laravel's Eloquent then that is mostly taken care of.
Also, you might want to read about MySQL tuning.
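One habit that helps: run EXPLAIN on a query to see whether MySQL uses an index or scans the whole table (the query here is just an example):

EXPLAIN SELECT * FROM users WHERE user_id = 12345;

-- In the output, type = const/ref with a small rows estimate means an index is used;
-- type = ALL means a full table scan.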
I am creating a test site for many users to take many quizzes. I want to store the results in a table. Each user can take up to 5,000 quizzes. My question is: would it be better to make a table for each user and store his results in his own table (QuizID, Score), OR to store ALL the results in ONE table (UserID, QuizID, Score)?
Example
5,000 result rows PER table * 1,000 user tables
VS
1 table with 5,000,000 rows for the same 1,000 users.
Also, is there a limit to ROWs or TABLEs a DB can hold?
There is a limit to how much data a table can store. On modern operating systems, this is measured in terabytes (see the MySQL documentation).
There are numerous reasons why you do not want to have multiple tables:
SQL databases are optimized for large tables, not for large numbers of tables. In fact, having large numbers of tables can introduce inefficiencies, because of partially filled data pages.
5,000,000 rows is not very big for a database. If it ever becomes a problem, partitioning can be used to improve efficiency.
Certain types of queries are a nightmare when you are dealing with hundreds or thousands of tables. A simple question such as "What is the average number of quizzes per user?" becomes a large effort (see the sketch after this list).
Adding a new user requires adding new tables, rather than just inserting rows in existing tables.
Maintaining the database -- such as adding a column or an index -- becomes an ordeal, rather than a simple statement.
You lose the ability to refer to each user/quiz combination for foreign key purposes. You may not be thinking about it now, but perhaps a user starts taking the same quiz multiple times.
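With a single results table, the question from the third point above is one statement (assuming the shared table is called results, following the question's (UserID, QuizID, Score) layout):

SELECT AVG(quiz_count)
FROM (
    SELECT UserID, COUNT(*) AS quiz_count
    FROM results
    GROUP BY UserID
) AS per_user;

-- With one table per user, the same answer would require querying
-- thousands of tables and combining the counts in application code.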
There are certain specialized circumstances where dividing the data among multiple tables might be a reasonable alternative. One example is security requirements, where you are simply not allowed to mix different users' data. Another would be different replication requirements on different subsets of the data. Even in these cases, it is unlikely that you would have thousands of different tables with the same structure.
Ideally you should take this approach:
A question table with all the questions, with primary key question_id.
A user table with the user details.
A results table with a one-to-many relationship, holding user id, quiz id, and answer.
You are worrying about having many rows in one table, but consider that some users will take at most 10-15 quizzes. With per-user tables, you would end up creating a whole table for only 10 rows.
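A rough sketch of those three tables (the types are my assumption; the columns follow the list above):

CREATE TABLE question (
    question_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    question_text TEXT NOT NULL
);

CREATE TABLE user (
    user_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE user_answer (
    user_id INT UNSIGNED NOT NULL,
    quiz_id INT UNSIGNED NOT NULL,
    answer VARCHAR(255),
    PRIMARY KEY (user_id, quiz_id)  -- assumes one answer per user per quiz
);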
After reading about foreign keys and references in MySQL, this question came to my mind.
If you run an email marketing service and let your users create contact lists that hold tons of records, thousands, probably a million or two, would it be a better idea to have one table for each user's contact list? Or one big table that holds all the contacts ever added to the website, with a reference to the list they belong to?
If you have 50,000 users, and each of them has 2 contact lists of 10,000 emails each, you have 1 million emails in a single table. Wouldn't it be too intensive to run a select query on that table like...
SELECT * FROM name_of_the_big_table
WHERE belonging_contact_list = id_of_the_contact_list
If you have 50,000 users, and each of them has 2 contact lists of 10,000 emails each, you have 1 million emails in a single table. Wouldn't it be too intensive to run a select query on that table like...
No
You need to consider the design and implementation of your MySQL environment and queries, but handling the scale you mention is not inherently a problem.
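For the query in the question, an index on the list column keeps that lookup fast even with millions of rows (names taken from the question's placeholders):

CREATE INDEX idx_list ON name_of_the_big_table (belonging_contact_list);

-- MySQL then reads only the ~10,000 rows of one list, not all 1,000,000 rows.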
A million records isn't a big number for a database; they're meant to handle such volumes. You're better off with less fragmentation by keeping all contacts in one table; this way there will be no repetition. I would prefer the latter option.
Assuming I have an application where I expect no more than 50,000 users: I have 7 tables in my DB schema. Is it a good idea to replicate all the tables for every user? In my case, the number of tables would be roughly 50,000 * 7 = 350,000.
Is it in any way better than 7 monolithic tables?
NO, I would not recommend creating a table schema per user in the same database.
MySQL should handle the load perfectly well, given the correct indexes on the tables.
What you're proposing is horizontal partitioning. http://en.wikipedia.org/wiki/Partition_%28database%29
i.e. you take all the rows in what would (logically) be one table, and you spread those rows across multiple tables, according to some partitioning scheme (in this case, one table per user).
In general, for starter applications of this nature, only consider partitioning (along with a host of other options) when performance starts to suffer.
No, it's definitely not, for two reasons:
The database design should not change just because you store more data in it.
Accessing different tables is more expensive and more complicated than accessing specific data in a single table.
Just imagine if you wanted statistics for a value from each user. You would have to gather data from 50,000 tables using 50,000 separate queries, instead of running a single query against one table.
A table can easily contain millions of records, but I'm not sure that the database can even handle 350,000 tables.
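For instance, a per-user average over one shared table is a single query (table and column names are made up for the example):

SELECT user_id, AVG(score) AS avg_score
FROM results
GROUP BY user_id;

-- Against 50,000 per-user tables, the same statistic needs 50,000 queries
-- (or an enormous UNION) plus code to combine the results.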
Certainly not. A DB is typically optimized for having many rows in a given table. Try rethinking your DB schema so that each of your tables has a field holding the user's ID, or use an associative table that holds the user ID and the key of the particular entry in the data table.