We are currently developing our own e-commerce solution. As part of our research we have been examining the ZenCart database schema and found that data is quite frequently duplicated between tables where it would seem that a foreign key would have been sufficient to link the two or more tables in question. For example:
Given that there is a table "Products" with the following columns:
PRODUCT_ID
PRODUCT_NAME
PRODUCT_PRICE
PRODUCT_SKU
Then, if there is a Sales_Item table, a product (and all its constituent columns) may be referenced by simply doing something like:
SALES_ITEM_ID
Products_PRODUCT_ID  // This is the foreign key that relates a specific product to a sale item.
SALE_TIME
REST_OF_SALE_SPECIFIC_DATA
...
However, it seems that instead the Sales table COPIES many of the field values defined in the Products table, so it in fact looks as follows:
SALES_ITEM_ID
PRODUCT_ID
PRODUCT_NAME
PRODUCT_PRICE
PRODUCT_SKU
SALE_TIME
My question is: which approach would generally be considered best practice when attempting to build a scalable, efficient solution? Using foreign keys means data is not duplicated, but the caveat is that database- or application-level JOINs would be needed in order to query the entire dataset. That being said, for some reason the foreign key approach seems cleaner and more correct somehow.
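To make the comparison concrete, the two alternatives would look something like this in MySQL DDL (the types and names are only an approximation, not the actual ZenCart schema):

-- Foreign-key approach: the sale item only references the product row
CREATE TABLE products (
    product_id    INT NOT NULL PRIMARY KEY,
    product_name  VARCHAR(255) NOT NULL,
    product_price DECIMAL(10,2) NOT NULL,
    product_sku   VARCHAR(64) NOT NULL
);

CREATE TABLE sales_item (
    sales_item_id INT NOT NULL PRIMARY KEY,
    product_id    INT NOT NULL,
    sale_time     DATETIME NOT NULL,
    FOREIGN KEY (product_id) REFERENCES products (product_id)
);

-- Copying approach (what ZenCart appears to do): product fields are repeated on each sale item
CREATE TABLE sales_item_copied (
    sales_item_id INT NOT NULL PRIMARY KEY,
    product_id    INT NOT NULL,
    product_name  VARCHAR(255) NOT NULL,
    product_price DECIMAL(10,2) NOT NULL,
    product_sku   VARCHAR(64) NOT NULL,
    sale_time     DATETIME NOT NULL
);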
For this example, I'm trying to build a system that will allow output from multiple sources, but these sources are not yet built. The output "module" will be one component, and each source will be its own component to be built and expanded upon later.
Here's an example I designed in MySQL Workbench:
The goal is to make my output module display data from the output table while being easily expanded upon later as more sources are built. I also want to minimize schema updates when adding new sources. Currently, I will have to add a new table per source, then add a foreign key to the output table.
Is there a better way to do this? I don't know how I feel about these NULL-able foreign keys because the JOIN query will contain IFNULLs and will get unruly quickly.
Thoughts?
EDIT 1: Clarification
I will be displaying a grid using data in the output table. The output table will contain general data for all items in the grid and will basically act as an aggregator for the output_source_X tables:
output(id, when_added, is_approved, when_approved, sort_order, ...)
The output_source_X tables will contain additional data specific to a source. For example, let's say one of the output source tables is for Facebook posts, so this table will contain columns specific to the Facebook API:
output_source_facebook(id, from, message, place, updated_time, ...)
Another may be Twitter, so the columns are specific for Twitter:
output_source_twitter(id, coordinates, favorited, truncated, text, ...)
A third output source table could be Instagram, so the output_source_instagram table will contain columns specific to Instagram.
There will be a one-to-one foreign key relationship with the output table and ONLY ONE of the output_source_X tables, depending on whether the output item is a Facebook, Twitter, Instagram, etc. post, hence the NULL-able foreign keys.
output table
------------
foreign key (source_id_facebook) references output_source_facebook(id)
foreign key (source_id_twitter) references output_source_twitter(id)
foreign key (source_id_instagram) references output_source_instagram(id)
I guess my concern is that this is not as modular as I'd like it to be because I'd like to add other sources as well without having to update the schema much. Currently, this requires me to join output_source_X on the output table using whatever foreign key is not null.
This design is almost certainly bad in a few ways.
It's not that clear what your design is representing but a straightforward one would be something like:
// source [id] has ...
source(id,message,...)
// output [id] is approved when [approved]=1 and ...
output(id,approved,...)
// output [output_id] has [source_id] as a source
output_source(output_id,source_id)
foreign key (output_id) references output(id)
foreign key (source_id) references source(id)
Maybe you have different subtypes of outputs and/or sources? Based on sources and/or outputs? Maybe each source is restricted to feeding particular outputs? Are "outputs" and "sources" actually kinds of outputs and sources, so that this is not info on how outputs are sourced but info on what kinds of output-source pairings are permitted?
Please give us statements parameterized by column names for the basic statements you want to make about your application, i.e. for the application relationships you are interested in. (E.g. like the code comments above.) (You could do it for the diagrammed design, but that would probably be overly complicated and not really reflect what you are trying to model.)
Re your EDIT:
There will be a one-to-one foreign key relationship with the output table and ONLY ONE of the output_source_X tables, depending on whether the output item is a Facebook, Twitter, Instagram, etc. post, hence the NULL-able foreign keys.
You have a case of multiple disjoint subtypes of a supertype.
Your situation is a lot like that of this question, except that where they have a subtype discriminator/tag column indicating which subtype table applies, you have a set of columns where the non-NULL one indicates which subtype table applies. See Erwin Smout's & my answers. Also this answer.
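As a sketch only (MySQL syntax, with the columns abbreviated from your edit), one common way to model disjoint subtypes is to put a discriminator on the supertype and have each subtype table share its primary key, rather than giving the supertype one NULL-able foreign key per subtype:

-- supertype: one row per output item, with a tag saying which subtype table holds the rest
CREATE TABLE output (
    id          INT NOT NULL PRIMARY KEY,
    source_type VARCHAR(20) NOT NULL,      -- e.g. 'facebook', 'twitter', 'instagram'
    when_added  DATETIME NOT NULL,
    is_approved TINYINT(1) NOT NULL DEFAULT 0
);

-- subtype tables: the primary key is also a foreign key to the supertype
CREATE TABLE output_source_facebook (
    id      INT NOT NULL PRIMARY KEY,
    message TEXT,
    FOREIGN KEY (id) REFERENCES output (id)
);

CREATE TABLE output_source_twitter (
    id   INT NOT NULL PRIMARY KEY,
    text TEXT,
    FOREIGN KEY (id) REFERENCES output (id)
);

Adding a new source then means adding one new subtype table; the output table and the existing joins do not change.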
Please give us statements parameterized by column names for the basic statements you want to make about your application, and you will find straightforward statements (as above). And if you give the statements for your current design you will find them complex. See also this.
I guess my concern is that this is not as modular as I'd like it to be because I'd like to add other sources as well without having to update the schema much.
Your structure is not reducing schema changes compared to proper subtype designs.
Anyway, DDL is there for that. You can genericize subtypes to avoid DDL only by giving up the DBMS's management of integrity. That would only be relevant or reasonable after evaluating DDL vs DML performance tradeoffs. Search regarding EAV (usually an anti-pattern).
(Only after you have shown that creating & deleting new tables is infeasible, and that the corresponding horribly integrity- and concurrency-challenged, mega-joining, table-and-metadata-encoded-in-a-table, information-equivalent EAV design is feasible, should you consider using EAV.)
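For reference, a generic EAV table of the kind being warned against looks roughly like this (the names here are invented for the sketch); every attribute becomes a row, and typing and integrity checking move into application code:

CREATE TABLE output_attribute (
    output_id       INT NOT NULL,
    attribute_name  VARCHAR(64) NOT NULL,    -- e.g. 'message', 'coordinates', 'favorited'
    attribute_value TEXT,                     -- everything stored as text; no per-attribute type or FK checks
    PRIMARY KEY (output_id, attribute_name)
);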
Quick question about DB design! In this example there are users and schedules. Each user can have many schedules and each schedule can belong to many users.
I have two tables, 'user' and 'schedule', that each have a unique identifier/primary key (user_id and schedule_id): these tables have a many-to-many relationship.
This is where I am unsure/inexperienced: In order to connect them together and adhere to good db design, I want to create a link table that has two columns, user_id and schedule_id. I plan to make these both primary keys (therefore a composite key). However, do I also add two foreign keys, one on user_id linked to the 'user' table and one on schedule_id linked to the 'schedule' table?
TLDR: I plan to use a composite key in a 2-column 'link' table that connects two tables. Should I also make those columns foreign keys?
PKs and FKs serve different purposes. In a link table, you need the PK to preserve uniqueness of the data. However, if you do not also create the FKs then you may end up with data integrity problems because the ID could be deleted from the original table and not the link table.
Sometimes people think they can get away without the FKs because they will enforce data integrity through the application. Almost always this is because they find it annoying when the constraints won't let them do something they want to do. Of course that is the purpose of the constraint, to prevent users and developers from doing things they should not. Data integrity must be preserved through the database; it is too important to risk letting the application handle it. I have seen a lot of data from hundreds of databases and the ones with the worst data are invariably the ones where the devs thought they could manage stuff like table relationships through the application. There are always holes when you do this and eventually they come back to bite you and then they can be very difficult to fix properly.
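As a minimal MySQL sketch (assuming user(user_id) and schedule(schedule_id) are the existing primary keys), the link table could look like this:

CREATE TABLE user_schedule (
    user_id     INT NOT NULL,
    schedule_id INT NOT NULL,
    PRIMARY KEY (user_id, schedule_id),                           -- composite key keeps each pairing unique
    FOREIGN KEY (user_id) REFERENCES user (user_id),              -- FKs keep orphaned rows out of the link table
    FOREIGN KEY (schedule_id) REFERENCES schedule (schedule_id)
);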
I have a MySql database containing data about users of an application. This application is in production already, however improvements are added every day. The last improvement I've made changed the way data is collected and inserted into the database.
Just to be clearer, my database is composed of 5 tables containing user data and 1 table to relate all the tables, through foreign keys. These 5 foreign keys, together, form my Unique Index for this "Main Table" I have.
The issue is that one of these tables containing user data changed its format, and I want to remove all the data older than the modification I made in my application (just from this table; the other ones I need to keep untouched). However, this dataset has foreign keys in the main table, and I can't just drop these rows from the main table because the other information I have there is important. I tried to change the value of the foreign key for this table specifically, but then, obviously, I have a problem related to duplicated index values.
Reading on the internet, I found a solution to my problem using "INSERT ... ON DUPLICATE KEY UPDATE ...", but I'm not inserting data, just updating it. I have an idea of how to write a PHP program to update my database, but is there an easier solution? Is it possible to avoid these problems using just MySQL syntax?
It might be worth looking at the link below:
http://www.kavoir.com/2009/05/mysql-insert-if-doesnt-exist-otherwise-update-the-existing-row.html
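The pattern that article describes is roughly this (the table and column names are invented for illustration; it relies on user_id having a PRIMARY KEY or UNIQUE index):

-- inserts the row if the key is new, otherwise updates the existing row in place
INSERT INTO user_data (user_id, some_column)
VALUES (42, 'new value')
ON DUPLICATE KEY UPDATE some_column = VALUES(some_column);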
In the initial 'version' of the application that I'm working on, a design consideration wasn't taken into account - no one thought of it.
However, it seems that the original one-to-many relation needs to be refactored into a many-to-many. My question is how best to do this? I'm using MySQL for persistence.
Populating the relationship table will only be a one-time effort. I'd rather go with a simple query or a stored-procedure approach (I'm not well versed in the latter) than write Java/JDBC-based logic to do it (I know I can and it's not too difficult, but that's not what I want).
So here's an example of the relation:
|VirtualWhiteBoard| -1------*- |Post|
A virtual white board can have many posts. The new functionality is: one post should belong to multiple white boards if the user chooses to 'duplicate' the current white board (not thought of before).
The schema looks like this:
VirtualWhiteBoard (wallName, projectName,dateOfCreation,..., Primary_Key(wallName, projectName));
Post(post_id, wallName,postData,..., Primary_Key(post_id), Foreign_Key(wallName, projectName));
The virtual white board has a composite primary key (wallName, projectName) and each post has a post_id as primary key
Question: take the primary keys from VirtualWhiteBoard and Post and add them to the new relation 'has_Post':
|VirtualWhiteBoard| -1------*- |has_Post| -*------1- |Post|
To keep the previous relationships intact and then drop the foreign key column of wallName in Post.
How best to achieve this? Would a query suffice or stored procedures would be required?
(Although I can do this in the 'application', I'd prefer to do it this way, since such refactorings are bound to arise and I don't want unnecessary Java code lying around that'll need to be maintained; I'd personally prefer to have this skill too. :)
Create your has_Post table with two columns post_id and wallName and populate it with this query:
INSERT INTO has_Post(post_id, wallName) SELECT post_id, wallName FROM Post
Then delete the wallName column from the Post table.
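The surrounding DDL might look roughly like this (column types are guesses; note that a foreign key back to VirtualWhiteBoard would also need projectName, since its primary key is composite):

CREATE TABLE has_Post (
    post_id  INT NOT NULL,
    wallName VARCHAR(100) NOT NULL,
    PRIMARY KEY (post_id, wallName),                 -- one row per (post, wall) pairing
    FOREIGN KEY (post_id) REFERENCES Post (post_id)
);

-- after populating has_Post with the INSERT ... SELECT above:
ALTER TABLE Post DROP COLUMN wallName;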
I am attempting to build a database for inventory control using a large number of tables and enforced relationships, and I just ran into the 32-relationship (index) limit for an Access table (using Access 2007).
Just to clarify: the problem isn't that the Employees table has 32 explicit indexes. Rather, the problem is the limitation on the number of times the Employee table can be referenced in FOREIGN KEY constraints. For example:
CREATE TABLE Employees (employee_number INTEGER NOT NULL UNIQUE)
;
CREATE TABLE Table01 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table02 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table03 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
...
CREATE TABLE Table30 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table31 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table32 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
An exception is thrown on the last line above: "Could not create index; too many indexes defined."
What options do I have to work around this limitation?
I've heard that creating a duplicate table with a 1:1 relationship is one method. I'm new to database design, so please correct me if I'm wrong, but given a table Employees with 31 indexes, I would create a table Employees2 (with one field?) with a 1:1 relationship to Employees and relationships to this new table from any remaining relations in which EmployeeID is a foreign key. What's the best way to ensure the second table is populated alongside the first?
Is there another approach?
Based on the lack of information available, it seems this may be a rare problem with a properly-designed database, or the solution is common knowledge. Forgive the noob!
Update: Immediate consensus is that my design is borked or far too ambitious. This could very well be the case. However, I'd rather have a general design discussion within a separate question, so for the sake of argument, can someone answer this one? If the answer is simply "Don't ever do that" I'll have to accept it.
I've run into this limitation a number of times with my apps. And I can assure the other posters that my apps are very well designed.
One problem is that Access creates indexes for relationships and lookup fields that aren't viewable in the main index property box, but they are accessible via the DAO collections. These indexes are frequently duplicates of indexes you have created yourself.
I have a tool consisting of several forms you import into your BE MDB that allows you to remove the duplicate indexes. As I haven't yet made this available on my website please email me for it.
I'd suggest just not defining all the relationships/indexes rather than implementing a 1:1 relationship to get around it. Neither solution is optimal, but the latter is going to create a much higher maintenance burden and greater potential for data anomalies.
I am not going to decry the design as quickly as some of the others, but it does have me intrigued. Could you list the fields of the Employee table that are foreign keys? There is a good likelihood that some normalization is in order, and maybe some of the smart people on SO could make design suggestions to work around the issue.
It is hard for me to believe that an Employee table would need 32 indexes; if it actually does, you should consider migrating to at least SQL Express.
... I would create a table Employees2 (with one field?) with a 1:1 relationship to Employees and relationships to this new table from any remaining relations in which EmployeeID is a foreign key.
That is workable. Presumably your main table might have an Autonumber field as the primary key, or you generate an index number. Your Employees2 table obviously must echo that.
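A rough sketch of that workaround, in the same DDL style as the question (Table33 is just an invented example of a further referencing table):

CREATE TABLE Employees2 (
    employee_number INTEGER NOT NULL,
    CONSTRAINT pk_employees2 PRIMARY KEY (employee_number),
    CONSTRAINT fk_employees2 FOREIGN KEY (employee_number) REFERENCES Employees (employee_number)
)
;
CREATE TABLE Table33 (employee_number INTEGER NOT NULL REFERENCES Employees2 (employee_number))
;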
What's the best way to ensure the second table is populated alongside the first?
That depends somewhat on how you are adding records. But in general, of course you must comply with the rules for integrity. This usually comes down to appending to tables in the correct order and ensuring each record is saved before trying to add a related record elsewhere.
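With the sketch above, the order of operations would look something like this (values invented): the Employees row is saved first, then its Employees2 companion, then any rows that reference Employees2:

INSERT INTO Employees (employee_number) VALUES (1001)
;
INSERT INTO Employees2 (employee_number) VALUES (1001)
;
INSERT INTO Table33 (employee_number) VALUES (1001)
;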