Many tables with fewer rows vs ONE table with MANY rows - MySQL

I am creating a test site where many users will take many quizzes. I want to store the results in a table. Each user can take up to 5,000 quizzes. My question is: would it be better to make a table for each user and store his results in his own table (QuizID, Score), or would it be better to store ALL the results in ONE table (UserID, QuizID, Score)?
Example
5,000 quiz results PER table * 1,000 user tables
VS
1 table with 5,000,000 rows for the same 1,000 users.
Also, is there a limit to the number of ROWs or TABLEs a DB can hold?

There is a limit to how much data a table can store. On modern operating systems, this is measured in Terabytes (see the documentation).
There are numerous reasons why you do not want to have multiple tables:
SQL databases are optimized for large tables, not for large numbers of tables. In fact, having large numbers of tables can introduce inefficiencies, because of partially filled data pages.
5,000,000 rows is not very big for a database. If the table ever does grow too large, partitioning can be used to improve efficiency.
Certain types of queries become a nightmare when you are dealing with hundreds or thousands of tables. A simple question such as "What is the average number of quizzes per user?" becomes a large effort (see the sketch after this list).
Adding a new user requires adding new tables, rather than just inserting rows in existing tables.
Maintaining the database -- such as adding a column or an index -- becomes an ordeal, rather than a simple statement.
You lose the ability to refer to each user/quiz combination for foreign key purposes. You may not be thinking about it now, but perhaps a user starts taking the same quiz multiple times.
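To make that "average quizzes per user" point concrete, here is what the question looks like against a single results table; the table name quiz_results is illustrative, and the columns follow the (UserID, QuizID, Score) layout from the question:

```sql
-- One query against one table; with per-user tables this would instead
-- require querying (and unioning) a thousand tables.
SELECT AVG(quiz_count) AS avg_quizzes_per_user
FROM (
    SELECT UserID, COUNT(*) AS quiz_count
    FROM quiz_results
    GROUP BY UserID
) AS per_user;
```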
There are certain specialized circumstances where dividing the data among multiple tables might be a reasonable alternative. One example is security requirements, where you simply are not allowed to mix different users' data. Another would be different replication requirements on different subsets of the data. Even in these cases, it is unlikely that you would have thousands of different tables with the same structure.

Ideally you should take this approach:
A question table with all the questions, with question ID as the primary key.
A user table with user details.
A table with a one-to-many relationship, holding user ID, quiz ID, and answer.
You are worrying about many rows in a table, but consider that some users will take at most 10-15 quizzes. You would end up creating an entire table for 10 rows.
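A minimal sketch of that three-table layout; the table and column names below are illustrative, not prescribed:

```sql
CREATE TABLE questions (
    question_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    body        TEXT NOT NULL
);

CREATE TABLE users (
    user_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL
);

-- The one-to-many table: one row per user per quiz taken.
CREATE TABLE user_answers (
    user_id INT UNSIGNED NOT NULL,
    quiz_id INT UNSIGNED NOT NULL,
    answer  VARCHAR(255),
    PRIMARY KEY (user_id, quiz_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);
```

A user who takes only 10 quizzes then costs 10 rows, not an entire table.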

Related

Best database design for efficient analysis on millions of records

I have a basic question about database design.
I have a lot of files which I have to read and insert into a database. Each file has a few thousand lines, and each line has about 30 fields (of these types: smallint, int, bigint, varchar, JSON). Of course I use multiple threads along with bulk inserting in order to increase insert speed (in the end I will have 30-40 million records).
After inserting I want to run some sophisticated analysis, and performance is important to me.
Now that I can parse each line's fields and am ready to insert, I have 3 approaches:
1- One big table:
In this case I can create one big table with 30 columns and store all of the files' fields in it. So there is a table of huge size on which I want to run a lot of analysis.
2- A fairly large table (A) and some little tables (B)
In this case I can create some little tables consisting of the columns whose values repeat across many records when separated from the other columns. These little tables then have only a few hundred or thousand records instead of 30 million. In the fairly large table (A), I omit the columns that I moved into the other tables and store a foreign key in their place. In the end I have a table (A) with 20 columns and 30 million records, and some tables (B) with 2-3 columns and 100-50,000 records each. To analyze table A, I then have to use some joins, for example in SELECT statements.
3- Just a fairly large table
In this case I can create a fairly large table like table A in the case above (with 20 columns), but instead of using foreign keys I use a mapping between source columns and destination values (something like foreign keys, but with a small difference). For example, take 3 columns c1, c2, c3 that in case 2 I would put in another table B and access via a foreign key. Now I assign a specific number to each distinct (c1, c2, c3) combination at insert time and keep the relation between the record and its assigned number in the application code. So this table is exactly like table A in case 2, but there is no need to use a join in SELECT statements (a sketch contrasting cases 2 and 3 follows after the question).
While insert time is important, the analysis time I will face later matters more to me, so I want to know your opinion about which of these cases is better. I would also be glad to see other solutions.
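For concreteness, here is a minimal sketch of case 2; the lookup-table name b_lookup and the columns c1-c3 are illustrative placeholders, not an actual schema:

```sql
-- Case 2: repeated (c1, c2, c3) combinations move to a small lookup
-- table, and the big table stores a foreign key to them.
CREATE TABLE b_lookup (
    b_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    c1   VARCHAR(50),
    c2   VARCHAR(50),
    c3   VARCHAR(50),
    UNIQUE KEY uniq_combo (c1, c2, c3)
);

CREATE TABLE a_big (
    id   BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    b_id INT UNSIGNED NOT NULL,
    -- ... the remaining ~19 data columns ...
    FOREIGN KEY (b_id) REFERENCES b_lookup(b_id)
);

-- Case 3 is the same layout minus the FOREIGN KEY constraint, with the
-- id-to-combination mapping maintained in application code instead.
```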
From a design perspective, 30 to 40 million rows is not that bad a number. Performance fully depends on how you design your DB.
If you are using SQL Server, then you could consider putting the large table in a separate database filegroup. I have worked on a similar case where we had around 1.8 billion records in a single table.
For the analysis, if you are not going to look at the entire data set in one shot, you could consider partitioning the data horizontally. You could use a partition scheme based on your needs; one option would be to split the data into yearly partitions, which will help if your analysis is limited to a year's worth of data (just an example; see the sketch after this answer).
The major consideration will be denormalization/normalization based on your needs, and of course nonclustered/clustered indexing of the data. Again, this will depend on what sort of analysis queries you will be using.
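Since the thread is about MySQL, here is a hedged example of the yearly-partition idea in MySQL syntax; the table measurements and its columns are illustrative. Note that MySQL requires the partitioning column to be part of every unique key, hence the composite primary key:

```sql
CREATE TABLE measurements (
    id          BIGINT UNSIGNED NOT NULL,
    recorded_at DATE NOT NULL,
    value       INT,
    PRIMARY KEY (id, recorded_at)
)
PARTITION BY RANGE (YEAR(recorded_at)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- A query filtered to one year touches only that year's partition:
SELECT COUNT(*)
FROM measurements
WHERE recorded_at >= '2023-01-01' AND recorded_at < '2024-01-01';
```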
A single thread can INSERT one row at a time and finish 40M rows in a day or two. With LOAD DATA, you can do it in perhaps an hour or less.
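For illustration, a typical LOAD DATA invocation; the file path, table name, and delimiters below are placeholders, not taken from the question:

```sql
LOAD DATA INFILE '/tmp/rows.csv'
INTO TABLE raw_data
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip a header row, if the file has one
```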
But is loading the real question? For grouping, summing, etc., the question is about SELECTs. For "analytics", the question is not one of table structure. Have a single table for the raw data, plus one or more "summary tables" to make the SELECTs really fast for your typical queries (sketched after this answer).
Until you give more details about the data, I cannot give more details about a custom solution.
Partitioning (vertical or horizontal) is unlikely to help much in MySQL. (Again, details needed.)
Normalization shrinks the data, which leads to faster processing. But, it sounds like the dataset is so small that it will all fit in RAM?? (I assume your #2 is 'normalization'?)
Beware of over-normalization.
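A minimal sketch of the summary-table idea, assuming the raw table has a created_at timestamp and a numeric value column (both names are illustrative):

```sql
CREATE TABLE daily_summary (
    day       DATE NOT NULL PRIMARY KEY,
    row_count INT UNSIGNED NOT NULL,
    total     BIGINT NOT NULL
);

-- Roll up the raw data periodically (or incrementally, for new days only):
INSERT INTO daily_summary (day, row_count, total)
SELECT DATE(created_at), COUNT(*), SUM(value)
FROM raw_data
GROUP BY DATE(created_at)
ON DUPLICATE KEY UPDATE
    row_count = VALUES(row_count),
    total     = VALUES(total);

-- Analytics then read the small summary table, not 40M raw rows:
SELECT day, total FROM daily_summary WHERE day >= '2023-01-01';
```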

SQL Table MAX Limit

I'm building a web application that uses a database (MySQL in particular).
Each user will have their own table in each database, and each database corresponds to a category. For example:
Database 1 (Music Playlist) - Tables: User1, User2, User3
Database 2 (Wall posts) - Tables: User1, User2, User3
Database 3 (Wall replies) - Tables: User1_post1, User1_post2, User3_post1
Let's say I have 100,000 users now. Considering how many tables there would be in total, is this wise? Is there a maximum limit on the number of tables? Will this affect performance?
I'm taking a course right now and I just realized that tables can be JOINed. Is that a better idea? Will it make a difference in performance?
Relational databases are designed to handle large amounts of data. Tables with many millions of rows are not uncommon; there are examples with billions of rows as well. So, you don't have to worry about 100,000 users, as long as you understand how to structure your database. Two key ideas are indexes and partitions.
You do have a problem with your structure, however. You do not want a separate table for each user; you want a single table with a column specifying the user. Although that table will have hundreds of thousands or millions of rows, you do not need to worry: databases are designed for this type of volume.
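A minimal sketch of that single-table layout, using the "wall posts" category from the question; the table and column names are illustrative:

```sql
CREATE TABLE wall_posts (
    post_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    body    TEXT NOT NULL,
    INDEX idx_user (user_id)
);

-- One user's posts become a simple indexed lookup instead of a
-- dedicated table:
SELECT post_id, body FROM wall_posts WHERE user_id = 42;
```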

Database Optimisation through denormalization and smaller rows

Do tables with many columns take more time than tables with fewer columns during SELECT or UPDATE queries? (The row count is the same, and I will update/select the same number of columns in both cases.)
Example: I have a database that stores user details and their last-active timestamp. On my website, I only need to show active users and their names.
Say one table named userinfo has the following columns: (id, f_name, l_name, email, mobile, verified_status). Is it a good idea to store the last-active time in the same table? Or is it better to make a separate table (say, user_active) to store the last-activity timestamp?
The reason I am asking: if I make two tables, the userinfo table will only be accessed during new signups (to INSERT a new user row), and I will use the user_active table (the table with fewer columns) to UPDATE the timestamp and SELECT active users frequently.
But the cost I have to pay for creating two tables is data duplication, since the user_active table's columns would be (id, f_name, timestamp).
The answer to your question is that, to a close approximation, having more columns in a table does not really take more time than having fewer columns for accessing a single row. This may seem counter-intuitive, but you need to understand how data is stored in databases.
Rows of a table are stored on data pages. The cost of a query is highly dependent on the number of pages that need to be read and written during the course of the query. Parsing the row from the data page is usually not a significant performance issue.
Now, wider rows do have a very slight performance disadvantage, because more data would (presumably) be returned to the user. This is a very minor consideration for rows that fit on a single page.
On a more complicated query, wider rows have a larger performance disadvantage, because more data pages need to be read and written for a given number of rows. For a single row, though, one page is being read and written -- assuming you have an index to find that row (which seems very likely in this case).
As for the rest of your question: the structure of your second table is not correct. You would not (normally) include f_name in two tables -- that is data redundancy and causes all sorts of other problems. There is a legitimate question whether you should store a table of all activity and use that table for display purposes, but that is not the question you are asking.
Finally, for the data volumes you are talking about, having a few extra columns would make no noticeable difference on any reasonable transaction volume. Use one table if you have one attribute per entity and no compelling reason to do otherwise.
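A minimal sketch of the one-table approach, assuming a last_active column is added to the question's userinfo table; the column and index names are illustrative:

```sql
ALTER TABLE userinfo
    ADD COLUMN last_active TIMESTAMP NULL,
    ADD INDEX idx_last_active (last_active);

-- Names of users active in the last 15 minutes:
SELECT f_name, l_name
FROM userinfo
WHERE last_active >= NOW() - INTERVAL 15 MINUTE;
```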
When returning and parsing a single row, the number of columns is unlikely to make a noticeable difference. However, searching and scanning tables with smaller rows is faster than tables with larger rows.
When searching using an index, MySQL traverses a B-tree, so rows would have to be significantly larger (and much more numerous) before any speed penalty is noticeable.
Scanning is a different matter. When scanning, it's reading through all of the data for all of the rows, so there's a 1-to-1 performance penalty for larger rows. Yet, with proper indexes, you shouldn't be doing much scanning.
However, in this case, keep the timestamp together with the user info, because they will be queried together and there is a 1-to-1 relationship, and a table with larger rows is still going to be faster than a join.
Only denormalize for optimization when performance becomes an actual problem and you can't resolve it any other way (adding an index, improving hardware, etc.).

MySQL indexing - optional search criteria

"How many indexes should I use?" This question has been asked generally multiple times, I know. But I'm asking for an answer specific to my table structure and querying purposes.
I have a table with about 60 columns. I'm writing an SDK which has a function to fetch data based on optional search criteria. There are 10 columns for which the user can optionally pass in values (so the user might want all entries for a certain username and clientTimestamp, or all entries for a certain userID, etc). So potentially, we could be looking up data based on up to 10 columns.
This table will run INSERTS almost as often as SELECTS, and the table will usually have somewhere around 200-300K rows. Each row contains a significant amount of data (probably close to 0.5 MB).
Would it be a good or bad idea to have 10 indexes on this table?
A simple guide that may help you make a decision:
1. Index columns that have high selectivity.
2. Try normalizing your table (you mentioned username and userID columns; if this is not the user table, there is no need to store the name here).
3. Unless your system is completely abstract, a few of the parameters will be used more often than others. First of all, make sure you have indexes that support fast result retrieval for those parameters (see the sketch below).
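As an illustration of point 3, composite indexes covering the search patterns the question mentions; the table name entries is hypothetical, and the column names come from the question:

```sql
CREATE INDEX idx_user_time ON entries (username, clientTimestamp);
CREATE INDEX idx_userid    ON entries (userID);

-- Thanks to the leftmost-prefix rule, idx_user_time serves both
-- "all entries for a username" and
-- "all entries for a username and clientTimestamp" lookups.
```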

MySql DB Design question

Assuming I have an application where I expect no more than 50,000 users, and I have 7 tables in my DB schema: is it a good idea to replicate all the tables for every user? In that case, the number of tables would roughly be 50,000 * 7 = 350,000.
Is it in any way better than 7 monolithic tables?
NO, I would not recommend creating a table schema per user in the same database.
MySQL should handle the load perfectly well, given the correct indexes on the tables.
What you're proposing is horizontal partitioning: http://en.wikipedia.org/wiki/Partition_%28database%29
I.e., you take all the rows in what would (logically) be one table, and you spread those rows across multiple tables according to some partitioning scheme (in this case, one table per user).
In general, for starter applications of this nature, only consider partitioning (along with a host of other options) when performance starts to suffer.
No, it's definitely not, for two reasons:
The database design should not change just because you store more data in it.
Accessing different tables is more expensive and more complicated than accessing specific data in a single table.
Just imagine if you wanted statistics for a value from each user. You would have to gather data from 50,000 tables using 50,000 separate queries, instead of running a single query against one table.
A table can easily contain millions of records, but I'm not sure the database can even hold 350,000 tables.
Certainly not. A DB is typically optimized for having many rows in a given table. Try to rethink your DB schema to have a field in each of your tables that holds the user's ID, or an associative table that holds the user ID and the key of the particular entry in the data table (both options are sketched below).
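A minimal sketch of both options; the table and column names (playlists, playlist_owners) are illustrative:

```sql
-- Option 1: a user_id column directly on the data table.
CREATE TABLE playlists (
    playlist_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id     INT UNSIGNED NOT NULL,
    title       VARCHAR(200) NOT NULL,
    INDEX idx_user (user_id)
);

-- Option 2: an associative table mapping users to rows in a
-- user-agnostic data table.
CREATE TABLE playlist_owners (
    user_id     INT UNSIGNED NOT NULL,
    playlist_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, playlist_id)
);
```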
There's a decent intro to DB design here.