Lets consider one case, we have services at three levels
Rose
-red rose
- red rose by A dealer
- red rose by B dealer
-yellow rose
- yellow rose by A dealer
- yellow rose by B dealer
So what would be the ideal scenario for creating database relations between tables:
I want to create db structure for product and service/flavours. for example lets say rose is my product then red rose, yellow rose, black rose are my services and further red rose by A dealer and red rose by B dealer is further my services...in this scenario how I will make collections in mongodb.
Your data seems relational. MongoDB is not a relational database. That means you can't create relations in your data. If you want to use MongoDB you'll have to create the relations between the data yourself. If it's worth it, that's up to you. Most people choose MongoDB for speed, but when your data is very dependent on relationships, MongoDB might not help you at all.
So if you would create this strucuture in MongoDB, you would have three different collections: roses, flavors and dealers. If you want to get a rose for a certain flavor and a certain dealer, you'll have to execute 3 separate queries, and then join the 3 results together, where in mysql you can do 1 query where you get 1 result. Note that this difference gets bigger with every layer of relationships.
It could still be possible that doing the separate MongoDB queries is faster in the end, if your collections are really large, and maybe if you need to do complex filtering on it. You should just remember that it's not a free lunch, you'll have a completely different setup compared to a relational database and you should really think about it.
For speed, you can also consider adding a cache layer on top of your relational database, which you could in fact use MongoDB for.
Related
(edited 1/5 10:22hr. added some explanation about my notation. And added some additional information I received)
I am doing a course on database design and currently we're doing ERD's and designing db's in MySQL worksbench. Think 1st, 2nd and 3rd NF, creating schema, tables, constraints, etc.
Most of it is pretty clear to me.
However there's one aspect where things remain unclear: the X:X to 1:many relationship vs the X:X to 0:many relationship (meaning: whatever to 0:many, vs whatever to 1:many, etc).
In some cases it's obvious, in others not so much. Whenever it's unclear to me, it's mostly something like this:
Example :
an artist has 1 to many paintings. A painting has 1-and only 1 artist.
Relationship:
|artist| 1:1 -------- 1:many |painting|
the same in another notation
|artist| ||------------ 1< |painting|
This seems fair, but....Then there's the thought: I could be a new artist, not having produced a painting yet.
Or: I could be entering a new artist into a artist table, not yet having entered his paintings yet (which could lead to a practical issue).
Another example:
A workshop has 1 to many participants. A participant enters 0-to-many workshops.
Relationship:
|workshop| many:0 ------- 1:many |participant|
Okay. However: a workshop could have 0 participants (no one want to participate, probably leading to cancellation).
Or: I could be entering a new workshop into a table, not having added any participants yet.
Another example:
An event is held at 1 only 1 location. A location had 1 to many events.
Relationship: |event| many:1 -------- 1:1 |location|
However, maybe you're entering a new (future) location, and there have not been events there yet.
Long shorty short: I am having a hard time establishing the minimal cardinality in cases like above.
Also, when I'm designing a db and get Workbench to forward engineer the SQL for creating the tables (based on my ERD), there doesn't seem to be any difference between a X to 1/many vs a X to 0/many variant. Which makes me wonder: what's the actual (practical) effect or implication of doing one or the other? Maybe the implications (further down the road) make for an easier choice?
Can anyone explain this matter in a simple (fool-proof) way?
Looking forward to understanding!
Addition 1/5:
I've talked about my question/issue with a teacher. He agreed with me that certain minimum cardinalities could lead to a deadlock:
one table cannot be inserted without there being a occurence in the other, and vice versa.
He explained to me that the ERD diagram is a logical model, not perse a fysical model. In other words, the ERD's minimum cardinality is not neccessarily for technical implementation.
Well, if that is the case, I understand his point. Usually an artist has at least one painting. A workshop normally has at least one participant. A location usually has at least one event. So on a logical level, that seems fine.
On a technical/implementation level, it is another deal. You should be able to enter a artist, workshop or location without there already being occurrences in another table.
My question now is:
is this true? Is a ERD a logical model, not a technical model?
and if that is so, WHAT is the reason for adding the minimum cardinality? It seems of little use.
Let's continue with your artist::paintings relationship, I think that's the clearest. When we say 1 to many, we often mean 0/1 to 0/many, but not all the 4 permutations work or are meaningful.
How does "I can have no artists with no paintings" sound? That's the zero-to-zero permutation. It does not sound wrong, but it's a degenerate case that is of no use to us.
1 artist to 0 paintings is OK, as matbailie described, and presents no problems.
1 artist to many paintings is OK, and is the main use case.
0 artists to many paintings is just not correct and should not be supported in the model. A painting must have an artist.
Because cases 1 & 4 don't work, it is not really correct to say 0/1 to 0/many. It is more correct to characterize the cardinality as 1 to 0/many, which encompasses cases 2 & 3 above.
You might say, 'but there are cases where we don't know the artist so we should leave an opportunity to have paintings with no artists'. This statement feels to me like you are leaving the realm of the ERD and entering into physical design. From a design standpoint you could just as easily say there is an artist we just don't know who, so let's create an UNKNOWN ARTIST record and connect those paintings to it.
Your second example (workshops::participants) looks more like a 0/many to 0/many. If you run through the same 4 permutations, they all look credible, though the 0-to-0 case still seems kind of ludicrous.
Your last example is another 1 to 0/many because events without locations cannot be held. When you get to the physical level you can talk about the best way to handle virtual events.
So, none of your examples seem to show a true zero-to-many (which is more accurately stated 0/1 to 0/many). I'm thinking they are pretty rare, if they exist at all. It would have to be associated with some kind of optional activity where if you did enroll in it there were a constrained set of choices.
But... A participant can be in 0-N workshops. So that needs to be many-to-many.
In general "zero" is a degenerate case of "1" or "N"; it is rarely worth worrying about.
I like to start by identifying the "entities" in the model. Your participant, workshop, painting, event, artist, location are excellent examples of such. Then I look for "relations" between obvious pairs.
I say there are only 3 cases. And I like to remember that the two "entities" are manifested as two "database tables":
1:1 -- At which point I ask why the two entities (database tables) are not merged together. The two tables share the same unique key.
1:many -- Represented by the id of the "1" as a column in the "many" table.
Many:many -- This needs a link table between the two tables. This linking table has two ids.
"Id" means a unique key for a table. It is usually a number, but that is not a requirement.
For "0"...
1:0 or many:0 -- You may need a LEFT JOIN to provide NULLs when the entry on the "0" side is missing
Many:many -- If either id is non-existent (the zero case), then there are no rows for that relationship.
Then comes defining the INDEXes for efficient access. And, optionally, FOREIGN KEYs for integrity. The indexes that represent how two entities are related are prime candidates for FKs. Other INDEXes should be added to optimize WHERE clauses in SQL queries.
In all cases, the id/FK/index may be "composite" -- meaning that it is two or more columns that are used for a single id/FK/index.
I’m developing a database. I’d appreciate some help restructuring 2 to 3 tables so the database is both compliant with the first 3 normal forms; and practical to use and to expand on / add to in the future. I want to invest time now to reduce effort / and confusion later.
PREAMBLE
Please be aware that I'm both a nube, and an amateur, though I have a certain amount of experience and skill and an abundance of enthusiasm!
BACKGROUND TO PROJECT
I am writing a small (though ambitious!) web application (using PHP and AJAX to a MySQL database). It is essentially an inventory management system, for recording and viewing the current location of each individual piece of equipment, and its maintenance history. If relevant, transactions will be very low (probably less than 100 a day, but with a possibility of simultaneous connections / operations). Row count will also be very low (maybe a few thousand).
It will deal with many completely different categories of equipment, eg bikes and lamps (to take random examples). Each unit of equipment will have its details or specifications recorded in the database. For a bike, an important specification might be frame colour, whereas a lamp it might require information regarding lampshade material.
Since the categories of equipment have so little in common, I think the most logical way to store the information is 1 table per category. That way, each category can have columns specific to that category.
I intend to store a list of categories in a separate table. Each category will have an id which is unique to that category. (Depending on the final design, this may function as a lookup table and / or as a table to run queries against.) There are likely to be very few categories (perhaps 10 to 20), unless the system is particulary succesful and it expands.
A list of bikes will be held in the bikes table.
Each bike will have an id which is unique to that bike (eg bike 0001).
But the same id will exist in the lamp table (ie lamp 0001).
With my application, I want the user to select (from a dropdown list) the category type (eg bike).
They will then enter the object's numeric id (eg 0001).
The combination of these two ids is sufficient information to uniquely identify an object.
Images:
Current Table Design
Proposed Additional Table
PROBLEM
My gut feeling is that there should be an “overarching table” that encompasses every single article of equipment no matter what category it comes from. This would be far simpler to query against than god knows how many mini tables. But when I try to construct it, it seems like it will break various normal forms. Eg introducing redundancy, possibility of inconsistency, referential integrity problems etc. It also begins to look like a domain table.
Perhaps the overarching table should be a query or view rather than an entity?
Could you please have a look at the screenshots and let me know your opinion. Thanks.
For various reasons, I’d prefer to use surrogate keys rather than natural keys if possible. Ideally, I’d prefer to have that surrogate key in a single column.
Currently, the bike (or lamp) table uses just the first column as its primary key. Should I expand this to a composite key including the Equipment_Category_ID column too? Then make the Equipment_Article table into a view joining on these two columns (iteratively for each equipment category). Optionally Bike_ID and Lamp_ID columns could be renamed to something generic like Equipment_Article_ID. This might make the query simpler, but is there a risk of losing specificity? It would / could still be qualified by the table name.
Speaking of redundancy, the Equipment_Category_ID in the current lamp or bike tables seems a bit redundant (if every item / row in that table has the same value in that column).
It all still sounds messy! But surely this must be very common problem for eg online electronics stores, rental shops, etc. Hopefully someone will say oh that old chestnut! Fingers crossed! Sorry for not being concise, but I couldn't work out what bits to leave out. Most of it seems relevant, if a bit chatty. Thanks in advance.
UPDATE 27/03/2014 (Reply to #ElliotSchmelliot)
Hi Elliot.
Thanks for you reply and for pointing me in the right direction. I studied OOP (in Java) but wasn't aware that something similar was possible in SQL. I read the link you sent with interest, and the rest of the site/book looks like a great resource.
Does MySQL InnoDB Support Specialization & Generalization?
Unfortunately, after 3 hours searching and reading, I still can't find the answer to this question. Keywords I'm searching with include: MySQL + (inheritance | EER | specialization | generalization | parent | child | class | subclass). The only positive result I found is here: http://en.wikipedia.org/wiki/Enhanced_entity%E2%80%93relationship_model. It mentions MySQL Workbench.
Possible Redundancy of Equipment_Category (Table 3)
Yes and No. Because this is a lookup table, it currently has a function. However because every item in the Lamp or the Bike table is of the same category, the column itself may be redundant; and if it is then the Equipment_Category table may be redundant... unless it is required elsewhere. I had intended to use it as the RowSource / OptionList for a webform dropdown. Would it not also be handy to have Equipment_Category as a column in the proposed Equipment parent table. Without it, how would one return a list of all Equipment_Names for the Lamp category (ignoring distinct for the moment).
Implementation
I have no way of knowing what new categories of equipment may need to be added in future, so I’ll have to limit attributes included in the superclass / parent to those I am 100% sure would be common to all (or allow nulls I suppose); sacrificing duplication in many child tables for increased flexibility and hopefully simpler maintenance in the long run. This is particulary important as we will not have professional IT support for this project.
Changes really do have to be automated. So I like the idea of the stored procedure. And the CreateBike example sounds familiar (in principle if not in syntax) to creating an instance of a class in Java.
Lots to think about and to teach myself! If you have any other comments, suggestions etc, they'd be most welcome. And, could you let me know what software you used to create your UML diagram. Its styling is much better than those that I've used.
Cheers!
You sound very interested in this project, which is always awesome to see!
I have a few suggestions for your database schema:
You have individual tables for each Equipment entity i.e. Bike or Lamp. Yet you also have an Equipment_Category table, purely for identifying a row in the Bike table as a Bike or a row in the Lamp table as a Lamp. This seems a bit redundant. I would assume that each row of data in the Bike table represents a Bike, so why even bother with the category table?
You mentioned that your "gut" feeling is telling you to go for an overarching table for all Equipment. Are you familiar with the practice of generalization and specialization in database design? What you are looking for here is specialization (also called "top-down".) I think it would be a great idea to have an overarching or "parent" table that represents Equipment. Then, each sub-entity such as Bike or Lamp would be a child table of Equipment. A parent table only has the fields that all child tables share.
With these suggestions in mind, here is how I might alter your schema:
In the above schema, everything starts as Equipment. However, each Equipment can be specialized into Lamp, Bike, etc. The Equipment entity has all of the common fields. Lamp and Bike each have fields specific to their own type. When creating an entity, you first create the Equipment, then you create the specialized entity. For example, say we are adding the "BMX 200 Ultra" bike. We first create a record in the Equipment table with the generic information (equipmentName, dateOfPurchase, etc.) Then we create the specialized record, in this case a Bike record with any additional bike-specific fields (wheelType, frameColor, etc.) When creating the specialized entities, we need to make sure to link them back to the parent. This is why both the Lamp and Bike entities have a foreign key for equipmentID.
An easy and effective way to add specialized entities is to create a stored procedure. For example, lets say we have a stored procedure called CreateBike that takes in parameters bikeName, dateOfPurchase, wheelType, and frameColor. The stored procedure knows we are creating a Bike, and therefore can easily create the Equipment record, insert the generic equipment data, create the bike record, insert the specialized bike data, and maintain the foreign key relationship.
Using specialization will make your transactional life very simple. For example, if you want all Equipment purchased before 1/1/14, no joins are needed. If you want all Bikes with a frameColor of blue, no joins are needed. If you want all Lamps made of felt, no joins are needed. The only time you will need to join a specialized table back to the Equipment table is if you want data both from the parent entity and the specialized entity. For example, show all Lamps that use 100 Watt bulbs and are named "Super Lamp."
Hope this helps and best of luck!
Edit
Specialization and Generalization, as mentioned in your provided source, is part of an Enhanced Entity Relationship (EER) which helps define a conceptual data model for your schema. As such, it does not need to be "supported" per say, it is more of a design technique. Therefore any database schema naturally supports specialization and generalization as long as the designer implements it.
As far as your Equipment_Category table goes, I see where you are coming from. It would indeed make it easy to have a dropdown of all categories. However, you could simply have a static table (only contains Strings that represent each category) to help with this population, and still keep your Equipment tables separate. You mentioned there will only be around 10-20 categories, so I see no reason to have a bridge between Equipment and Equipment_Category. The fewer joins the better. Another option would be to include an "equipmentCategory" field in the Equipment table instead of having a whole table for it. Then you could simply query for all unique equipmentCategory values.
I agree that you will want to keep your Equipment table to guaranteed common values between all children. Definitely. If things get too complicated and you need more defined entities, you could always break entities up again down the road. For example maybe half of your Bike entities are RoadBikes and the other half are MountainBikes. You could always continue the specialization break down to better get at those unique fields.
Stored Procedures are great for automating common queries. On top of that, parametrization provides an extra level of defense against security threats such as SQL injections.
I use SQL Server. The diagram I created is straight out of SQL Server Management Studio (SSMS). You can simply expand a database, right click on the Database Diagrams folder, and create a new diagram with your selected tables. SSMS does the rest for you. If you don't have access to SSMS I might suggest trying out Microsoft Visio or if you have access to it, Visual Paradigm.
I am a data base admin and developer in MySQL. I have been a couple of years working with MySQL. I recently adquire and study O'Reilly High Performance MySQL 2nd Edition to improve my skills on MySQL advanced features, high performance and scalability, because I have often been frustated by the lack of advance knowledge of MySQL I had (and in a big part, I still have).
Currently, I am working on a ambicious web project. In this project, we will have quite content and users from the begining. I am the designer of the data base and this data base must be very fast (some inserts but mostly and more important READS).
I want here to discuss about these requirements:
There will be several kind of items
The items have some fields and relations in common
The items also have some fields and relations special that make them differents each other
Those items will have to be listed all together ordered or filtered by common fields or relations
The items will have to be also listed only by type (for examble item_specialA)
I have some basic design doubts, and I would like you to help me decide and learn which design aproach would be better for a high performance MySQL data base.
Classical aproach
The following diagram shows the clasical aproach which is the first you may think about with the mind thinking in database: Database diagram
Centralized aproach
But maybe we can improve it with some or pseudo object oriented paradigm centralicing the common items and the relations on one common item table. It would also be useful for listing all kind of items: Database diagram
Advantages and disadvantages of each one?
Which aproach would you choose or which changes would you apply seeing the requirements seen before?
Thanks all in advance!!
What you have are two distinct data mapping strategies. That you called "classical" is "one table per concrete class" in other sources, and that you called "centralized" is "one table per class" (Mapping Objects to Relational Databases: O/R Mapping In Detail). They both have their advantages and disadvantages (follow the link above). The queries in the first strategy will be faster (you will need to join only 2 tables vs 3 in the second strategy).
I think that you should explore classic supertype/subtype pattern. Here are some examples from the SO.
If you're looking mostly for speed, consider selective use of MyISAM tables, use a centralized "object" table, and just one additional table with correct indexes on this form:
object_type | object_id | property_name | property_value
user | 1 | photos | true
city | 2 | photos | true
user | 5 | single | true
city | 2 | metro | true
city | 3 | population | 135000
and so on... lookups on primary keys or indexed keys (object_type, object_id, property_name) for example will be blazing fast. Also, you reduce the need to end with 457 tables as new properties appear.
It isn't exactly a well-designed nor perfectly-normalized database and, if you are looking for a long-term big site, you should consider caching, or at least using a denormalized paradigm, denormalized mysql tables like this one, redis, or maybe MongoDB.
I am a professional designer that has done some databases. I would like some feedback on this on any big mistakes I am making in the table configurations and how the PK and FK relate.
The blue boxes represent data that will come from another database.
Click here to see database design
Click here to see New Design Changed the product sizes and color table
In keeping with what Gilbert Le Blanc described, you could make this more scalable and efficient as follows:
A. Anytime you find yourself adding columns for items which represent possible user choices, consider whether they should actually be modelled as ROWS in a new table. This is referred to as "Normalization" (there's more to it than that, but for this purpose, it should cover what I'm trying to say . . .), and is key to proper database design. If you fail to normalize properly, you will experience extensive pain and regret down the road. Imagin one of your suppliers introduces a new color 6 months after you go live with your database. You will have to re-code your data-access routines just to add that color to whatever front-end presentation you are creating.
B. You MIGHT want to combine some of your Category/Sub-Category/Class Structure into one or two tables. While I don't have a concrete suggestion without knowing more about the retail biz, it seems like there may be any number of heirarchies, depending upon the product. In theory, you could actually get way with a SINGLE table for this:
**tblCategories**
CategoryID Int PK
ParentCategoryID Int FK on tblCategories CategoryID
CategoryName
Records with a ParentCategoryID > 0 are sub-categories.
I am going to attempt attaching an image (I have not done that here on SO before) of what I have just described. Caveats:
I Am working in SQL Server, so things might look a tad different to you.
I have over-simplified the model for the purpose of this example. But it does illustrate the relationships I am describing.
THere may be others with better suggestions for modelling the Product/Categories. The concept I have presented can be challenging to keep straight in your head, but makes use of recursive relationships to create a very flexible/scalable table structure.
I think you are on the right track. However, there are still some areas for (potentially) significant improvement in your normalization. I say potentially because I don't know enough about the sports apparel business, sizing, and the like. However, some observations:
A. I see the same entities representated in several different tables, i.e. Nike, Adidas, Etc. While I understand that one vendor may have several different brands, your table structure could make this more clear. If "Nike" is the vendor, then possible Brands of Nike might be Nike, Converse, whatever other brands Nike provides. If this is what your table does, then forgive.
B. You apparel sizing table might have some potential for additional normalization, or maybe not. Seems complex, and again, I don't know enough about the relatiosnhips represented there. I DO see what appears to be repetition of data in fields which might be better represented as rows in other tables.
C. An example of what I describe in B. is to be had with the footwear sizing. THIS can be normalized more effectively. Note that I have rather arbitrarily placed the FK for GenderCategory in tblFootwear_Sizing_Index, it MIGHT belong in tblFootwearSizes. Again, don't know enough about the footwear industry. But beyond that quibble, you will find the following arrangement more effective and manageable:
There are other areas in your model which might lend themselves to simimlar restructuring. However, as I said, it becomes hard for me to see given my lack of knowledge of your industry. I STILL think you might want to re-examine the many flavors of "Category" and "Class" Further, you most definitely should find some more descriptive names for some of those Category/Class Tables (or any table, really). Think "ProductCategory", "GenderCategory", "FootwearCategory", Etc. Also, don't be too afraid of longer table names, if the make it easier for you (or more importantly, your successor four years form now) to discern what is going on in your code. It may be more cumbersome to type now, but 6 months after you go live, and you are trying to figure out why one of your queries is not returning as expected, you will be glad you did. After all, you can always alias the table names in general use.
I strongly recommend checking out some info on database normalization, then try to apply it to your model. Getting the back-end db model right from jump can make or break your application. Here is one of many articles I got back by googling "Database Normalization":
http://databases.about.com/od/specificproducts/a/3nf.htm
This article is focused on the Third Normal Form (3NF), but provides links to 1NF and 2NF, which are pre-requisites for 3NF.
You should always strive for a minimuim of 3NF in a database design.
Hope that helps, and I would love to hear how you progress on this.
You have 2 footwear size tables.
Taking the apparel size table as an example, you get more flexibility if you make the size one of the columns.
apparelSizeId Size Sort order
1 M 1
1 L 2
1 XL 3
2 S 1
2 M 2
2 L 3
With this type of table design, it's easier to add new sizes.
You can also combine a lot of your size and style tables into one table, although it does make the design harder for business types to understand.
I have a site with users that I want users to be able to identify their ethnicities. What's the best way to model this if there is only 1 level of hierarchy?
Solution 1 (single table):
Ethnicity
- Id
- Parent Id
- Name
Solution 2 (two tables):
Ethnicity Group
- Id
- Name
Ethnicity
- Id
- Ethnicity Group Id
- Name
I will be using this so that users can search for other users based on ethnicity. Which of the 2 approaches will work better for me? Is there another approach I have not considered? I'm using MySQL.
Well there is such a thing as an Ethnicity Group in the real world, so you do need two tables, not one. The real world has three levels (the top-most would be Race), but I understand that may not be necessary here. If you squash the three levels into two, you have to be careful, and lay them all out properly at the beginning. However, they will be vulnerable to people saying they want the real thing, and you may have to change it, or change the structure to fit more in ... much more work later).
If you do it correctly, as per real world, that problem is eliminated. Let me know if you want Race, and I will change the model.
The tables are far too small, and the keys are too meaningful, to add Id-iot columns to them; leave them as pure Relational keys, otherwise you will lose the power of the Relational engine. If you really want narrow keys, use a CHAR(2) EthnicityCode, rather than a NUMERIC(10,0) or a meaningless number.
Link to Ethnicity Data Model (plus the answer to your other question)
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
If there is nothing like an "ethnicity group" in the real world, I'd suggest you don't introduce one in your data model.
All the queries you can do with the second one you can also do with the first one, because you can just select FROM ethnicity AS e1 JOIN ethnicity AS es ON (e2.ethnicity_id = e1.parent_id).
I don't want to be awkward, but what are you going to do with people of mixed descent? I think that the best that you can hope for is a simple single-level enumeration like the kind of thing you get on census forms (e.g. 'Black', 'White', 'Asian', 'Hispanic' etc). It's not ideal, but it allows people to fairly easily self-identify. Concepts like race and ethnicity are wooly enough without trying to create additional (largely meaningless) hierarchies on top of them, so my gut feeling is to keep it simple.