If a table has multiple sources (each source contains a different type of data), is it best practice to split that table into multiple tables (with the necessary foreign keys), or do you nevertheless fit it into one table?
Simplified example:
I want to make a table containing client info.
There are 2 sources of data, all CSV files:
static data, which almost never changes (e.g. start of the relationship, headquarters, etc.)
revenue data, which changes monthly
In this case do you go for one table (e.g. t_client) containing both static and revenue data, which you then update monthly? Or do you make multiple tables: one with the static data (e.g. t_client_info) and one with the variable info (t_client_revenue, update monthly) and link them?
I just want to know what the best practice is.
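A minimal sketch of the two-table option, using Python's built-in sqlite3 module. The table names t_client_info and t_client_revenue come from the question; the column names and sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE t_client_info (
    client_id      INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    relation_start TEXT,               -- hypothetical column
    headquarters   TEXT                -- hypothetical column
);
CREATE TABLE t_client_revenue (
    client_id INTEGER NOT NULL REFERENCES t_client_info(client_id),
    month     TEXT NOT NULL,           -- e.g. '2024-01'
    revenue   REAL NOT NULL,
    PRIMARY KEY (client_id, month)
);
""")

# Static data: loaded once, rarely touched afterwards.
conn.executemany(
    "INSERT INTO t_client_info VALUES (?, ?, ?, ?)",
    [(1, "Acme", "2015-03-01", "Berlin"), (2, "Globex", "2019-07-15", "Oslo")],
)

# Monthly data: each monthly file only appends rows here.
conn.executemany(
    "INSERT INTO t_client_revenue VALUES (?, ?, ?)",
    [(1, "2024-01", 1000.0), (1, "2024-02", 1200.0), (2, "2024-01", 500.0)],
)

# Join the two tables when both kinds of data are needed together.
rows = conn.execute("""
    SELECT i.name, r.month, r.revenue
    FROM t_client_info AS i
    JOIN t_client_revenue AS r ON r.client_id = i.client_id
    ORDER BY i.name, r.month
""").fetchall()
print(rows)
```

With this layout the monthly import never rewrites the static columns, and the revenue history is kept rather than overwritten.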
Related
I am developing an application for use within my department at work. The application needs to access a list of people that is supplied by my company.
This is kind of a mess, though, as my company does not maintain very clean data and that situation cannot be changed. I will use sample table names and such here for brevity.
Here is the current workflow for how I get this data into my database:
I receive a monthly report, in Excel (XLSX) format from my company.
Convert the report to CSV.
Delete all items from the current [people] table
Import the CSV data into the [people] table
I can not change this process to simple UPDATE or INSERT statements due to several factors:
There is a lot of duplicate data in the reports I receive
Each person can be listed in the report multiple times, with slight variations in the data in each column (i.e. they may have several entries for each person, but with a different address in each row).
My company uses two "IDs" to identify the person, but they recycle these IDs. If a person is deleted from their records, for example, they may take that person's ID and assign it to a new person.
Without making the entire row the PK, is there any way to salvage this situation to create a true table that can be accurately referenced from another?
I am currently making a very simple database but haven't made one in a while.
My issue is that I have one table, Drinks, which has a column (the technical term slips my mind) called ingredients. This column will be populated from two other tables, volume and ingred. I have split these tables up because there are many drinks that use the same ingredients but different volumes of them. So my question is: what kind of query/relationship should I have to get the column correctly populated?
In your main table (Drinks) you need a reference to the split-out table. For example, instead of an ingredients field you need an ingredients_id field, and the same for the other split-out tables.
Then you can use a one-to-many query to populate your data as you wish.
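One possible reading of this, sketched with Python's sqlite3 module. Because a drink has several ingredients and an ingredient appears in several drinks, the sketch keeps the references in a linking table that also carries the volume. All table and column names here (drink, ingredient, drink_ingredient, volume_ml) are hypothetical, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drink (
    drink_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE ingredient (
    ingredient_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
-- One row per (drink, ingredient) pair; the volume lives on the link.
CREATE TABLE drink_ingredient (
    drink_id      INTEGER NOT NULL REFERENCES drink(drink_id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(ingredient_id),
    volume_ml     INTEGER NOT NULL,
    PRIMARY KEY (drink_id, ingredient_id)
);
""")

conn.executemany("INSERT INTO drink VALUES (?, ?)",
                 [(1, "Mojito"), (2, "Daiquiri")])
conn.executemany("INSERT INTO ingredient VALUES (?, ?)",
                 [(1, "rum"), (2, "lime juice")])
conn.executemany("INSERT INTO drink_ingredient VALUES (?, ?, ?)",
                 [(1, 1, 50), (1, 2, 25), (2, 1, 60), (2, 2, 30)])

# The "ingredients column" becomes a query instead of stored text.
mojito = conn.execute("""
    SELECT i.name, di.volume_ml
    FROM drink AS d
    JOIN drink_ingredient AS di ON di.drink_id = d.drink_id
    JOIN ingredient AS i ON i.ingredient_id = di.ingredient_id
    WHERE d.name = ?
    ORDER BY i.name
""", ("Mojito",)).fetchall()
print(mojito)
```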
In relational databases, if we want to create a database for a football tournament for example, we consider the tournament as the mini-world (the unit for which we want to create a database and collect data). Therefore, we may create tables such as matches, teams, and so on. And, we don't create a table called tournament since we have only THE TOURNAMENT for which we are doing all this.
In practice, that's what I used to do. But, what if I want to save in my database some attributes about the tournament, such as its name, the date and the country in which it takes place... What can I do? Is it a good practice to create a table tournament that has only one record? And if yes, what about foreign keys? Is it good in this case to add the ID of the tournament as a foreign key in the tables matches, teams...? If not, what can be the best practice?
Why do I want to store the tournament information in the database? Because I want to create a webpage that reads only dynamic data. I don't want to add that information (tournament name, date...) as static data on the web page.
I am also thinking about the benefit from the possibility of future evolution of the product. Later on, I may have more than one tournament and having the tournament table part of the database will allow a smooth integration of more tournaments without modification of the metadata.
Yes, it is typical to use a row to store relevant single values. (Frequently this is done for parameter settings). But you don't need an id in this row for the tournament or foreign keys to it in other tables until you have multiple tournaments.
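A small sketch of such a single-row table in Python's sqlite3. The only_row guard column is an optional assumption added here to stop a second row from being inserted by accident; it is not a tournament ID and nothing references it. The column names and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE tournament (
    only_row   INTEGER PRIMARY KEY CHECK (only_row = 1),  -- guard, not an ID
    name       TEXT NOT NULL,
    start_date TEXT,
    country    TEXT
)
""")
conn.execute(
    "INSERT INTO tournament VALUES (1, 'Example Cup', '2024-06-01', 'Exampleland')"
)

# Any attempt to add a second row violates the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO tournament VALUES (2, 'Other Cup', '2025-06-01', 'Elsewhere')"
    )
    second_row_rejected = False
except sqlite3.IntegrityError:
    second_row_rejected = True

# The web page can read the values without any key lookup.
name, start_date, country = conn.execute(
    "SELECT name, start_date, country FROM tournament"
).fetchone()
print(name, start_date, country, second_row_rejected)
```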
Yes, this helps extend to multiple tournaments. It also helps in extending to a "temporal"/historical version of the database where we timestamp each row by when it held so that we can query about the state that was current at a given time. (This typically involves further normalization to have separate tables for columns that change together but possibly at different times from other column sets.)
In moving to multiple tournaments, as with any schema change, it is helpful to redefine the names of old tables as views of new tables. Unfortunately updates through views are typically poorly supported by SQL DBMSs so in that respect it can be useful to have a multiple-tournament-capable design right from the beginning.
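To illustrate the view idea, here is a hedged sketch in Python's sqlite3. It assumes the old single-tournament schema had a table called matches with home_team and away_team columns; those names, and the new table name tournament_match, are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tournament (
    tournament_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE tournament_match (
    match_id      INTEGER PRIMARY KEY,
    tournament_id INTEGER NOT NULL REFERENCES tournament(tournament_id),
    home_team     TEXT NOT NULL,
    away_team     TEXT NOT NULL
);
-- The old single-tournament name survives as a view over the new table.
CREATE VIEW matches AS
    SELECT match_id, home_team, away_team
    FROM tournament_match
    WHERE tournament_id = 1;
""")

conn.executemany("INSERT INTO tournament VALUES (?, ?)",
                 [(1, "First Cup"), (2, "Second Cup")])
conn.executemany("INSERT INTO tournament_match VALUES (?, ?, ?, ?)",
                 [(1, 1, "Reds", "Blues"),
                  (2, 1, "Greens", "Reds"),
                  (3, 2, "Blues", "Greens")])

# Queries written against the old table name keep working unchanged.
old_style_rows = conn.execute(
    "SELECT home_team, away_team FROM matches ORDER BY match_id"
).fetchall()
print(old_style_rows)
```

Reads go through the view transparently. In SQLite a view like this is read-only unless INSTEAD OF triggers are added, which is one instance of the limited update support mentioned above.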
I am running an e-commerce site with multiple stores, each store having its own products. I currently have a table called product-store, which lists all product IDs (referencing the product names and descriptions from a different table), prices, etc., and their corresponding store IDs. This table could have the same product repeated multiple times if multiple stores carry it.
I am mooting the idea of having a separate table for each store(product-store1, product-store2) rather than having all stores in one product-store table. I could be adding 100 stores and hence 100 tables like this. The structure of each table is the same but the reason why I am thinking of doing this is for better encapsulation of data from the other stores. However this would also mean identifying the corresponding table first for the store and then fetching the data.
I need help in assessing if this is a right approach and how I can measure the two approaches.
There are very few good reasons for splitting a table into multiple tables. Here are reasons not to do it:
SQL databases are optimized for large tables, but not for lots of small tables with the same structure. (With small tables, you end up with lots of partially filled data pages.)
Maintenance is a nightmare. Adding a column, changing a data type, and so on has to be repeated many times.
A simple query such as "How many stores sell a single product?" becomes problematic.
You cannot have a foreign key relationship into this table, for instance, to have a history of prices or discounts on the product in each store.
A single table is almost always the best way to go.
I guess it also depends on if the products might be shared across different stores. I would not go the way of creating x tables for x stores, but a general structure to be able to hold all the information.
If so, you could set up at least three tables:
product (holds all the generic products information, shop independent)
store (information about the stores)
store_product (links the products to the stores)
This way you can add as many products / stores to your system without having to change database structure (which is bad anyways).
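A minimal sketch of that three-table layout in Python's sqlite3. The table names product, store and store_product follow the list above; the columns and sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE store (
    store_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
-- Store-specific facts (here: price) live on the link row.
CREATE TABLE store_product (
    store_id   INTEGER NOT NULL REFERENCES store(store_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    price      REAL NOT NULL,
    PRIMARY KEY (store_id, product_id)
);
""")

conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1, "Kettle"), (2, "Toaster")])
conn.executemany("INSERT INTO store VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO store_product VALUES (?, ?, ?)",
                 [(1, 1, 19.99), (2, 1, 21.50), (1, 2, 35.00)])

# "How many stores sell this product?" is a single query.
stores_selling_kettle = conn.execute(
    "SELECT COUNT(*) FROM store_product WHERE product_id = ?", (1,)
).fetchone()[0]

# One store's catalogue is just a filtered subset of the same table.
south_catalogue = conn.execute("""
    SELECT p.name, sp.price
    FROM store_product AS sp
    JOIN product AS p ON p.product_id = sp.product_id
    WHERE sp.store_id = ?
    ORDER BY p.name
""", (2,)).fetchall()
print(stores_selling_kettle, south_catalogue)
```

Adding a hundredth store is then one INSERT into store plus its store_product rows, with no schema change.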
To answer some of your assumptions:
Encapsulating data from different stores is a matter of selecting a subset of the data rather than choosing a different table.
Whenever you need some additional information (not thought of in the beginning) for either stores or products, it's easier to add it by referencing a new table to stores/products than to multiply those changes by the number of stores.
I have a database with multiple tables.
These tables are all related to a single name, for example:
Table 1 contains the name of the person, joined date, position, salary, etc.
Table 2 contains the name of the person, current projects, finished, assigned, etc.
Table 3 contains the name of the person, time sheets, in, out, etc.
Table 4 contains the name of the person, personal details, skill set, previous experience, etc.
All tables contain more than 50000 names and their details.
So my question is this: all tables contain information related to a name, say Jose20856, and this name is the unique index of all 4 tables. When I search for Jose20856, all four tables will give a result and output it to a front-end software/HTML.
So do I need to keep multiple tables, or combine them into a single table?
If so
CASE 1
Single table -> what are the advantages? Will the result be faster? What about the system resource usage?
CASE 2
Multiple tables -> what are the advantages? Will the result be faster? What about the system resource usage?
As I am new to MySQL I would like to have your valuable opinion to move ahead
You can combine these into a single table but only if it makes sense. It's hard to tell if the relationships in your tables are one-to-one or one-to-many but seem to be one-to-many. e.g. A single employee from table 1 should be able to have multiple projects, skills, time sheets in the other tables. These are all one-to-many relationships.
So, keep the multiple table design. You also should consider using an integer-based primary key for the employee rather than the name. Use this pkey as the fkey in your other tables and you'll see performance improvement. (Also consider the amount of work you need to do if and when you want to change the name. You have to change all the names in all the tables. If you use a surrogate key, the int pkey, as suggested above, you only have to update a single row.)
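A short sketch of the surrogate-key suggestion in Python's sqlite3. The employee and timesheet tables and their columns are hypothetical stand-ins for the asker's tables 1 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,   -- surrogate key
    name        TEXT NOT NULL UNIQUE   -- still unique, but no longer the key
);
CREATE TABLE timesheet (
    timesheet_id INTEGER PRIMARY KEY,
    employee_id  INTEGER NOT NULL REFERENCES employee(employee_id),
    clock_in     TEXT,
    clock_out    TEXT
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Jose20856')")
conn.executemany(
    "INSERT INTO timesheet (employee_id, clock_in, clock_out) VALUES (?, ?, ?)",
    [(1, "2024-01-02 09:00", "2024-01-02 17:00"),
     (1, "2024-01-03 09:00", "2024-01-03 17:30")],
)

# Renaming touches exactly one row; the child rows keep pointing at the id.
cur = conn.execute(
    "UPDATE employee SET name = ? WHERE employee_id = ?", ("Jose_renamed", 1)
)
renamed_rows = cur.rowcount

names = [row[0] for row in conn.execute("""
    SELECT e.name
    FROM timesheet AS t
    JOIN employee AS e ON e.employee_id = t.employee_id
""")]
print(renamed_rows, names)
```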
Read on the web about database normalization.
E.g. http://en.wikipedia.org/wiki/Database_normalization
I think you can even add more tables to it. It all depends on the data and the relations.
Table1 = users incl. userdata
Table2 = Projects (if multiple users work on the same project)
Table3 = Linking user to projects (if multiple users work on the same project)
Table4 = Time spent? Contains the links to the user and to the project.
I think your table 4 can be merged into table 1 because it also contains data specific to one user.
There is probably more you can do, but as already stated, it all depends on the data and the relations.
What we're talking about here is vertical table partitioning (as opposed to horizontal table partitioning). It is a valid database design pattern, which can be useful in these cases:
There are too many columns to fit into one table. That's pretty obvious.
There are columns which are accessed relatively often, and some that are accessed relatively rarely. For example, if you very often need to display the columns joined date, position, salary, and only very rarely the columns personal details, skill set, previous experience, then it makes sense to move the latter to a separate table, as it will (probably) improve performance when accessing the most commonly used ones. In MySQL this is especially true for TEXT and BLOB columns, since they're stored apart from the rest of the fields, so accessing them takes more time.
There are NULLable columns where the majority of rows are NULL. Once again, if a column is mostly NULL, moving it to a separate table will let you reduce the size of your 'main' table and improve performance. The new table should not allow NULL values and should have entries only for rows where the value is set. This way you reduce the amount of storage/memory resources used as well.
MySQL specific: you might want to move some of your columns from an InnoDB table to a MyISAM table, so that you can use full-text indexing while still being able to use some of the features InnoDB provides. It's not a good design generally speaking, though; it's better to use a full-text search engine like Sphinx.
Last but not least. I'd suggest using a numeric field as a key joining all these tables, not a string.
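The rarely-used and mostly-NULL cases above can be sketched like this in Python's sqlite3 (SQLite rather than MySQL, so it shows only the table layout, not MySQL's storage behaviour). The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Frequently read columns stay in the main table.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    position    TEXT NOT NULL,
    salary      REAL NOT NULL
);
-- Rarely read, often missing data moves out; no NULLs are stored here.
CREATE TABLE employee_profile (
    employee_id         INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    previous_experience TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Ana", "Engineer", 5000.0),
                  (2, "Ben", "Analyst", 4000.0)])
# Only employees that actually have a value get a profile row.
conn.execute("INSERT INTO employee_profile VALUES (1, '5 years at ExampleCorp')")

# The common query never touches the profile table.
common = conn.execute(
    "SELECT name, position, salary FROM employee ORDER BY employee_id"
).fetchall()

# The rare query joins it in; missing profiles come back as None.
full = conn.execute("""
    SELECT e.name, p.previous_experience
    FROM employee AS e
    LEFT JOIN employee_profile AS p ON p.employee_id = e.employee_id
    ORDER BY e.employee_id
""").fetchall()
print(common, full)
```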
Additional reading about MySQL partitioning (a bit outdated, since MySQL 5.5 added some new features)