I have one table called Table1 with around 20 columns. Half of these columns are string values and the rest are integers. My question is simple: is it better to keep all the columns in one table, or to distribute them across 2, 3 or even 4 tables? If I split them, I'd have to join them back using LEFT JOIN.
What's the best choice?
Thanks
The question of "best" depends on how the table is being used, so there is no single answer. I can say that 20 columns is not a lot, and many perfectly reasonable tables have more than 20 columns of mixed types.
First observation: If you are asking such a question, you have some knowledge of SQL but not in-depth knowledge. One table is almost certainly the way to go.
What might change this advice? If many of the integer columns are NULL -- say 90% of the records have all of them as NULL -- then those NULL values are probably just wasting space on the data page. By moving those columns to another table and storing a row there only when values actually exist, you would reduce the size of the data.
The same is true of the string values, but with a caveat: whereas the integers occupy at least 4 bytes each, variable-length strings might be even smaller (it depends on exactly how the database stores them).
Another consideration is how the data is typically used. If queries usually touch just a handful of columns, then storing each column in a separate table could be beneficial. To be honest, though, the overhead of the key column generally overwhelms any savings, and such a data structure is really lousy for updates, inserts, and deletes.
However, this becomes quite practical in a columnar database such as ParAccel, Amazon Redshift, or Vertica. Such databases have built-in support for this type of splitting, and it can have some remarkable effects on performance.
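To make the vertical split described above concrete, here is a minimal sketch using SQLite (the table and column names are made up for illustration): the mostly-NULL integer columns move to a side table that only has rows for records with values, and a LEFT JOIN reassembles the original wide row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Main table keeps the columns that are always populated.
cur.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, name TEXT)")

# Mostly-NULL integer columns live in a side table; a row exists
# only for the minority of records that actually have values.
cur.execute("""CREATE TABLE main_extra (
    main_id INTEGER PRIMARY KEY REFERENCES main(id),
    int_a INTEGER, int_b INTEGER)""")

cur.execute("INSERT INTO main VALUES (1, 'has extras'), (2, 'no extras')")
cur.execute("INSERT INTO main_extra VALUES (1, 10, 20)")

# A LEFT JOIN reassembles the wide row; missing extras come back as NULL.
rows = cur.execute("""SELECT m.id, m.name, e.int_a, e.int_b
                      FROM main m LEFT JOIN main_extra e ON e.main_id = m.id
                      ORDER BY m.id""").fetchall()
print(rows)  # [(1, 'has extras', 10, 20), (2, 'no extras', None, None)]
```

Note the trade-off the answer mentions: every full read now pays for the join, so this only helps when the side columns really are sparse.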
Answering this with an example for a users table -
1) `users` - id, name, dob, city, zipcode etc.
2) `users_products` - id, user_id(FK), product_name, product_validity,...
3) `users_billing_details` - id, user_id(FK to `users`), billing_name, billing_address..
4) `users_friends` - id, user_id(FK to `users`), friend_id(FK to same table `users`)
Hence, if an entity has many relations, use a MANY-to-MANY relationship; if it has few, go with using the same table. It all depends on your structure and requirements.
SUGGESTION - Many-to-Many makes your data structure more flexible.
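As a sketch of the layout above in actual DDL (the column types are assumed here, since the answer only lists column names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT, dob TEXT, city TEXT, zipcode TEXT);
CREATE TABLE users_products (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    product_name TEXT, product_validity TEXT);
CREATE TABLE users_billing_details (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    billing_name TEXT, billing_address TEXT);
-- self-referencing many-to-many: both sides point back at users
CREATE TABLE users_friends (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    friend_id INTEGER REFERENCES users(id));
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

The users_friends table is the many-to-many case: two foreign keys into the same users table, one row per friendship.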
You can have 20 columns in 1 table. Nothing wrong with that. But then are you sure you are designing the structure properly?
Could some of these data change significantly in the future?
Is the table trying to encapsulate a single activity or entity?
Does the table have a singular meaning with respect to the domain or does it encapsulate multiple entities?
Could the structure be simplified into smaller tables having singular meaning for each table and then "Relationships" added via primary key/foreign keys?
These are some of the questions to take into consideration while designing a database.
If you can answer these questions, you will know yourself whether you should have a single table or multiple tables.
I created a table which has 30 columns.
CREATE TABLE "SETTINGS" (
    "column1" INTEGER PRIMARY KEY,
    ...
    "column30"
)
However, I could group them and create different tables which have foreign keys to the primary table. Which is the best way to go? Or is the number of columns small enough that either way amounts to the same thing?
It depends on the data and on the queries you run most often.
Best for one big table
If you almost always need to extract all the columns
If you need to update many fields at the same time
If all, or nearly all, of the fields have non-null values
Best for many little tables
If the data are "sparse", i.e. many columns have no value: you can split them into different tables and create a record in a child table only when non-null values exist
If you extract only a few related fields at a time
If you update only related fields at a time
Better names for each column (for example, instead of domicile_address and residence_address you can have a single address column in each of two tables, residences and domiciles)
The problem is that, in general, either solution can work depending on the situation. A usage analysis must be done to choose the right one.
If they really are named column1, column2....column30 then that's a fairly good indicator that your data is not normalized.
Following the rules of normalization should always be your starting point. There are sometimes reasons for breaking the rules - but that comes after you know what your data should look like.
Regarding breaking the rules.....there are 2 potential scenarios which may apply here (after you've worked out the correct structure and done an appropriate level of testing with an appropriate volume of data):
Joins are expensive. Holding parent/child relations in a single table can improve query performance where you are routinely selecting parent and child together and retrieving individual rows.
Unless you are using fixed-width MyISAM tables, updating records can change their size, which means they have to be relocated in the table data file. This is expensive and can have a knock-on effect on retrieval.
I have seen a few questions related to this, but I felt they weren't exactly the same situation. This is also not a normalization question.
Let's assume we have a product with properties such as name, description, price, last_update_date and stock_amount.
Let's assume there will never be 2 different prices or stock amounts etc. for these products, and that we don't have to keep historical data.
From a performance point of view, would it be better to keep all of this data in a single table, or to divide it into separate tables? Such as:
products -> id, name, last_update_date, stock_amount, price
product_info -> id, products_id, description
I know the data is not divided very logically here, but that is beside the point right now.
I can think of 2 arguments:
If you separate the data into 2 tables, then to update the description, for example, one would need to find the products_id first and then update the data, which may cost more. On the other hand, the products table's storage footprint would be much smaller. Does this help efficiency when finding a product, for example by name? Or, since we would have an index on 'name', does it not matter how big the table is on disk?
If everything were in one table, we wouldn't need to work with separate tables, and this may increase efficiency?
What do you think? and what do you base your opinion on? Links and benchmark results are welcome.
Thanks!
If everything is a 1-to-1 mapping, there's no strong reason not to keep it all in one table. You should still have an ID column, so that if you have other data that's 1-to-many or many-to-many, you can refer to the products by ID in those tables.
However, one benefit of splitting it up into different tables can be improved concurrency. If everything is in one table, then an update to that table will lock the entire row (or the entire table if you use MyISAM). If you split it into multiple tables, then an update to one of those tables won't interfere with queries that use the other tables.
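A minimal sketch of the products / product_info split from the question (SQLite, with illustrative values), showing both points raised: an update to the description has to go through products_id, and a full read needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY, name TEXT,
    last_update_date TEXT, stock_amount INTEGER, price REAL);
CREATE TABLE product_info (
    id INTEGER PRIMARY KEY,
    products_id INTEGER REFERENCES products(id),
    description TEXT);
INSERT INTO products VALUES (1, 'widget', '2024-01-01', 5, 9.99);
INSERT INTO product_info VALUES (1, 1, 'old description');
""")

# Updating the description now goes through products_id ...
cur.execute("""UPDATE product_info SET description = 'new description'
               WHERE products_id = (SELECT id FROM products WHERE name = 'widget')""")

# ... and reading a full product needs a join.
row = cur.execute("""SELECT p.name, i.description
                     FROM products p JOIN product_info i ON i.products_id = p.id
                     WHERE p.id = 1""").fetchone()
print(row)  # ('widget', 'new description')
```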
I think efficiency is better with a single table. Two tables may be useful for further scalability.
Imagine a hypothetical database storing products. Each product can have 100 attributes, although any given product will only have values set for ~50 of them. I can see three ways to store this data:
A single table with 100 columns,
A single table with very few columns (say, the 10 that have a value for every product), and another table with columns (product_id, attribute, value), i.e. an EAV data store.
A separate table for every column. So the core products table might have 2 columns, and there would be 98 other tables, each with two columns (product_id, value).
Setting aside the shades of grey between these extremes, which is best to use from a pure efficiency standpoint? I assume it depends on the types of queries being run, i.e. whether most queries ask for several attributes of one product, or for the value of a single attribute across several products. How does this affect the efficiency?
Assume this is a MySQL database using InnoDB, and all tables have appropriate foreign keys, and an index on the product_id. Imagine that the attribute names and values are strings, and are not indexed.
In a general sense, I am asking whether accessing a really big table takes more or less time than a query with many joins.
I found a similar question here: Best to have hundreds of columns or split into multiple tables?
The difference is, that question is asking about a specific case, and doesn't really tell me about efficiency in the general case. Other similar questions are all talking about the best way to organize the data, I just want to know how the different organizational systems impact the speed of queries.
In a general sense, I am asking whether accessing a really big table takes more or less time than a query with many joins.
JOIN will be slower.
However, if you usually query only a specific subset of columns, and this subset is "vertically partitioned" into its own separate table, then querying such a "lean" table is typically quicker than querying the "fat" table with all the columns.
But this is a very specific and fragile situation (easy to break as the system evolves), and you should test very carefully before going down that path. Your default starting position should be one table.
In general, the more tables you have, the more normalised, and hence more correct, your design is (i.e. reduced redundancy of data).
If you later find you have problems with reporting on this data, then that may be the moment to consider creating denormalised values to improve any specific performance issues. Adding denormalised values later is much less painful than normalising an existing badly designed database.
In most cases, EAV is a querying and maintenance nightmare.
An outline design would be to have a table for Products, a table for Attributes, and a ProductAttributes table that contained the ProductID and the AttributeID of the relevant entries.
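A minimal sketch of that outline design in SQLite; the Value column on the linking table is an assumption, since the answer doesn't say where attribute values live.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Attributes (AttributeID INTEGER PRIMARY KEY, Name TEXT);
-- one row per (product, attribute) pair that actually has a value
CREATE TABLE ProductAttributes (
    ProductID INTEGER REFERENCES Products(ProductID),
    AttributeID INTEGER REFERENCES Attributes(AttributeID),
    Value TEXT,
    PRIMARY KEY (ProductID, AttributeID));
INSERT INTO Products VALUES (1, 'widget');
INSERT INTO Attributes VALUES (1, 'colour'), (2, 'weight');
INSERT INTO ProductAttributes VALUES (1, 1, 'red'), (1, 2, '3kg');
""")

# Reading all attributes of one product takes a join through both tables.
rows = conn.execute("""
    SELECT a.Name, pa.Value
    FROM ProductAttributes pa
    JOIN Attributes a ON a.AttributeID = pa.AttributeID
    WHERE pa.ProductID = 1
    ORDER BY a.Name""").fetchall()
print(rows)  # [('colour', 'red'), ('weight', '3kg')]
```

Unlike free-text EAV, the Attributes table constrains attribute names to a known set, which mitigates some of the maintenance problems the answer warns about.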
As you mentioned, it strictly depends on the queries that will be executed against this data. As you know, joins are expensive for the database; I can't imagine making 50-60 joins for a simple read. In my humble opinion it would be madness. :) The best thing you can do is load some test data and check your specific queries in a tool such as the Estimated Execution Plan in Management Studio. A similar tool exists for MySQL (EXPLAIN).
I would tend to advise you to avoid creating so many tables; it is likely to cause problems in the future. Maybe it is possible to move rarely used data into separate tables, or to use complex types? For string data you can try non-clustered indexes.
Ok, I am creating a game. I have one table where I save a lot of information about a member, so it has many fields. How many fields is it normal to have in one table? Does it matter? Maybe I should split that info into two, three or four tables? What do you think?
Normalize the Database
If you feel you have too many columns, you probably have repeating groups, which suggests you should normalize the database. See an example here: Description of the database normalization basics
Hard MySQL Limits
MySQL 5.5 Column Count Limit
Every table has a maximum row size of 65,535 bytes.
There is a hard limit of 4096 columns per table.
Splitting of data into tables should generally not be dictated by the number of columns, but by the nature of the data. The process of splitting a large table into smaller ones is called normalization.
The only other reason I can think of to split a table is, if you may need data in clusters, i.e. you often need columns A-D together or columns E-L, but never all columns or columns D-F, then you can split the table into two tables, one containing columns A-D and the primary key, the other one containing columns E-L and the primary key.
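That clustered split can be sketched like this in SQLite, with made-up columns a-f standing in for A-L; both tables share the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- columns A-D, which are queried together
CREATE TABLE t_main (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, d TEXT);
-- columns E-L, queried together; shares the same primary key
CREATE TABLE t_side (id INTEGER PRIMARY KEY REFERENCES t_main(id),
                     e TEXT, f TEXT);
INSERT INTO t_main VALUES (1, 'a', 'b', 'c', 'd');
INSERT INTO t_side VALUES (1, 'e', 'f');
""")

# Queries that need only the A-D cluster never touch t_side ...
ad = conn.execute("SELECT a, d FROM t_main WHERE id = 1").fetchone()
# ... and the rare query needing both clusters joins on the shared key.
full = conn.execute("""SELECT m.a, s.e FROM t_main m
                       JOIN t_side s ON s.id = m.id""").fetchone()
print(ad, full)
```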
Speaking about limits, MySQL says it's 4096 (source).
Yet I haven't seen tables that big; even huge data-mining tables don't come close.
You shouldn't be concerned about the column count as long as your database is normalized. As soon as you spot the same data being stored twice (for example, a player table might have a player_type column storing some fixed set of values), it's worth moving such data to a separate table instead of duplicating the information in other tables, which also reduces the column count (but that's only a "side effect" of normalization, not something that drives it).
I've never personally encountered one with more than 500 columns in it, but short of the maximum sizes there's no reason to have any fewer than the design demands. Beware of doing SELECT * FROM it though.
"Information about a member" -- umm, always difficult, but I always separate identifiable information into another table and use a salt key to link the 2 together. That way it is not as easy to "hijack" usernames, passwords etc. And you can always use the salt as a session variable rather than username/password/userId or whatever.
Typically I store only an ID, salt and joining date in 1 table. As I said, the rest I try to "hide" so that it cannot be "linked/hijacked".
Hope this helps.
How many fields are possible in one table?
Is it a good way to maintain 150 fields in one table,
OR
to maintain relationships with other tables?
Thanks
Bharanikumar
In the vast majority of cases, having 150 columns in a single table is symptomatic of a poorly normalized database.
You might want to read this and re-evaluate your db design.
To put it in your terms, go with "maintain relationship with other tables"
http://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html
If you have a business need to have 150 columns, then it's a "good way". I've never seen such a business need, but that doesn't mean one doesn't exist. I have seen very wide tables used in OLAP-type cases, so if that's what you're doing, there's a good chance that you're on the right track. If you're using this table for more OLTP functionality, then you're probably going down the wrong road. Perhaps if you provided a bit more info about what you're trying to accomplish, we could provide some real advice (instead of "do that" or "do it a different way").
150 bit-type fields might be OK, but you also have to consider the maximum record length your database will allow you to store. With varchar fields, most databases will let you create a table that would, in theory, violate that maximum if all the fields were filled to their maximum length. However, it won't let you actually add records which are too long. This is the kind of trap that can lie dormant for years until someone puts just one character too many into a potential insert, and then it blows up; such a problem generally takes a long time to find and fix. It is best to avoid ever designing a table where the total length of the columns is bigger than the maximum record size in bytes.
Less wide tables can also tend to be faster to query.
Additionally, 150 columns is usually a sign that you really need to look at the design and see if a related table would be better. For instance, if you have phone1, phone2, phone3, then you need a related phone table.
If you genuinely need all 150 columns, consider which are likely to be queried together most often. Put those in the parent table. Then add the less often queried columns (or columns related only to a particular function) to the related table. There is no reason not to have a 1-1 relationship between tables; just use the id from the parent table as the PK in the child table as well as the FK to the parent table.
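The phone1/phone2/phone3 point above can be sketched as a related phone table (SQLite, with illustrative names and numbers): one row per phone number instead of a fixed set of mostly-empty columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
-- one row per phone number replaces phone1, phone2, phone3
CREATE TABLE phone (
    id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(id),
    phone_type TEXT, number TEXT);
INSERT INTO person VALUES (1, 'Alice');
INSERT INTO phone (person_id, phone_type, number) VALUES
    (1, 'home', '555-0001'), (1, 'work', '555-0002'), (1, 'mobile', '555-0003');
""")

# Any number of phones per person, and no empty phoneN columns to maintain.
nums = conn.execute("""SELECT number FROM phone
                       WHERE person_id = 1 ORDER BY id""").fetchall()
print([n[0] for n in nums])  # ['555-0001', '555-0002', '555-0003']
```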