I'm designing a database for a club. There is a "member" entity that practices activities (sports, etc.). These activities have different categories according to the age of the member. Each activity in its respective category will have schedules. In turn, the members will have their own schedules, because, for example they could attend only 2 days of an activity that takes place 4 days within a week.
I solved it like this:
https://ibb.co/dsJkHz
My question is: Is this a viable method for solving this problem? I seems a bit complicated and I don't think it's ideal/optimized for performance. I'm sure there must be another way. Thanks!
Performance will be fine as long as you have indexes on the id columns that will be searched or joined on. The other option is enums in some places to limit the joins which you can get away with if there are 4 categories that will not change or rarely ever change but it looks like you're trying to follow best practices and plan for the future.
Your schema is based on categories and activities wanting to use a set list but that list can grow or names can change so they have their own table and from what I can tell the other tables are required for a one to many relationship so even though other options are available they seem like a bad route to take.
Related
I'm struggling with a database design issue, and it's kind of a long winded one:
My website will have an unlimited number of organizations users they can join, subgroups under those organizations, and finally specific profiles for those subgroups. Subgroups within the same organization will be able to borrow and make changes to profiles from each other. Users will generate the organizations, the subgroups, and profiles.
I can draw it out, make the flow sensible on paper. When it comes to actually putting it to either SQL I'm lost. The majority of the help guides out there assumes static groups so a simple primary and foreign key set up can refer back to the right information. Mine has too much dynamic information for most of these to outright work as I understand it.
Most writers say stay away from dynamically generated tables, but that's where my instinct takes me. Another idea I had was 3 massive tables one for all Organizations, Groups, and Profiles.
So is there a better way to go about this? Or are there any good documents I should read up on to help me translate from drawing to actual code?
I have some experience with both SQL and MongoDB if that helps explain things.
I don't know about MongoDB(NoSQL), but from the SQL standpoint, here is my opinion.
As far as your schema goes, Most of the time when your "instinct" indicates that :- Only a "Dynamic Tables" solution is your best bet, for some problem that you are working on.
Remember there is a high chance that, that very problem can be solved by multiple static tables with different relationships. (By Static I mean the ones which you have created yourself as a developer.)
Also I'd like to mention that, I too myself in my initial days always thought of problem solving the similar way, but then I started understanding the principles and how exactly the databases work.
Back To Your Problem:-
If your organisation hirerchy consists of three major types of objects/levels, viz. Organizations, Groups, and Profiles then I'd suggest that you go with the 3 tables with correct relationships, which any SQL engine is quiet efficient at handling, in comparison to creating tables at runtime.
Now if the hierarchy is dynamic like say, An organisation can contain many groups which in turn shall contain profiles which again shall/can contain other organisations and so on.... Then you may want to look at Recursive structure with SQL(Recursion). (Just do a google search there are a lot of articles about that.)
Imagine the following categories: bars, places to eat, shops, etc...
Each of these categories will have some common fields, such as
id, name, address and geolocation (Lat and Lng position).
I am strongly in doubt whether I should create one table that combines these different categories, or whether I should split it up into separate tables (1 table per category).
My thought is that retrieving places based on category and geolocation for each category table separately would be faster in both retrieval and update, certainly when the number of places per category increases.
From this approach, I would go for 1 table per category.
BUT there is a supplementary requirement. Every place will have an owner (user), and one user could own multiple places. So this means that I would either:
Need a many-to-many table that connects the user table with the central giant table;
Need one many-to-many table for each of the categories.
Second BUT the situation is, that when the user logs in, all places that this user owns will be returned from a single query, ie. the id + name, assuming that a user could be the owner of multiple places.
From this point of view, the second option seems a very bad idea, because it appears that I would need to create queries to scan through each of the tables.
I realize that I can use indexes to speed up long table scans, yet I am still uncertain about the performance when the number of places would increase dramatically, and there are currently about 8 different categories.
What do you consider the most optimal solution, based on the options that I proposed (or do you see a better option I'm missing out on?).
I should point out that the web application will not often mix up categories, although bars will be able to create events as well.
The answer to this question is invaluable to me, because it will define the fundaments for the further development of my application.
Let me know if anything in my question is not clear to you.
Thank you.
When the data is that similar, a single table is usually best. Consider also that in the future you may add a new category - it would be much easier to just update a category field to include the new category than build a new table, new queries, and modify all the existing code.
As far as speed goes, indexes will make it irrelevant. Add a category field, index it, and speed won't be an issue.
I’m developing a database. I’d appreciate some help restructuring 2 to 3 tables so the database is both compliant with the first 3 normal forms; and practical to use and to expand on / add to in the future. I want to invest time now to reduce effort / and confusion later.
PREAMBLE
Please be aware that I'm both a nube, and an amateur, though I have a certain amount of experience and skill and an abundance of enthusiasm!
BACKGROUND TO PROJECT
I am writing a small (though ambitious!) web application (using PHP and AJAX to a MySQL database). It is essentially an inventory management system, for recording and viewing the current location of each individual piece of equipment, and its maintenance history. If relevant, transactions will be very low (probably less than 100 a day, but with a possibility of simultaneous connections / operations). Row count will also be very low (maybe a few thousand).
It will deal with many completely different categories of equipment, eg bikes and lamps (to take random examples). Each unit of equipment will have its details or specifications recorded in the database. For a bike, an important specification might be frame colour, whereas a lamp it might require information regarding lampshade material.
Since the categories of equipment have so little in common, I think the most logical way to store the information is 1 table per category. That way, each category can have columns specific to that category.
I intend to store a list of categories in a separate table. Each category will have an id which is unique to that category. (Depending on the final design, this may function as a lookup table and / or as a table to run queries against.) There are likely to be very few categories (perhaps 10 to 20), unless the system is particulary succesful and it expands.
A list of bikes will be held in the bikes table.
Each bike will have an id which is unique to that bike (eg bike 0001).
But the same id will exist in the lamp table (ie lamp 0001).
With my application, I want the user to select (from a dropdown list) the category type (eg bike).
They will then enter the object's numeric id (eg 0001).
The combination of these two ids is sufficient information to uniquely identify an object.
Images:
Current Table Design
Proposed Additional Table
PROBLEM
My gut feeling is that there should be an “overarching table” that encompasses every single article of equipment no matter what category it comes from. This would be far simpler to query against than god knows how many mini tables. But when I try to construct it, it seems like it will break various normal forms. Eg introducing redundancy, possibility of inconsistency, referential integrity problems etc. It also begins to look like a domain table.
Perhaps the overarching table should be a query or view rather than an entity?
Could you please have a look at the screenshots and let me know your opinion. Thanks.
For various reasons, I’d prefer to use surrogate keys rather than natural keys if possible. Ideally, I’d prefer to have that surrogate key in a single column.
Currently, the bike (or lamp) table uses just the first column as its primary key. Should I expand this to a composite key including the Equipment_Category_ID column too? Then make the Equipment_Article table into a view joining on these two columns (iteratively for each equipment category). Optionally Bike_ID and Lamp_ID columns could be renamed to something generic like Equipment_Article_ID. This might make the query simpler, but is there a risk of losing specificity? It would / could still be qualified by the table name.
Speaking of redundancy, the Equipment_Category_ID in the current lamp or bike tables seems a bit redundant (if every item / row in that table has the same value in that column).
It all still sounds messy! But surely this must be very common problem for eg online electronics stores, rental shops, etc. Hopefully someone will say oh that old chestnut! Fingers crossed! Sorry for not being concise, but I couldn't work out what bits to leave out. Most of it seems relevant, if a bit chatty. Thanks in advance.
UPDATE 27/03/2014 (Reply to #ElliotSchmelliot)
Hi Elliot.
Thanks for you reply and for pointing me in the right direction. I studied OOP (in Java) but wasn't aware that something similar was possible in SQL. I read the link you sent with interest, and the rest of the site/book looks like a great resource.
Does MySQL InnoDB Support Specialization & Generalization?
Unfortunately, after 3 hours searching and reading, I still can't find the answer to this question. Keywords I'm searching with include: MySQL + (inheritance | EER | specialization | generalization | parent | child | class | subclass). The only positive result I found is here: http://en.wikipedia.org/wiki/Enhanced_entity%E2%80%93relationship_model. It mentions MySQL Workbench.
Possible Redundancy of Equipment_Category (Table 3)
Yes and No. Because this is a lookup table, it currently has a function. However because every item in the Lamp or the Bike table is of the same category, the column itself may be redundant; and if it is then the Equipment_Category table may be redundant... unless it is required elsewhere. I had intended to use it as the RowSource / OptionList for a webform dropdown. Would it not also be handy to have Equipment_Category as a column in the proposed Equipment parent table. Without it, how would one return a list of all Equipment_Names for the Lamp category (ignoring distinct for the moment).
Implementation
I have no way of knowing what new categories of equipment may need to be added in future, so I’ll have to limit attributes included in the superclass / parent to those I am 100% sure would be common to all (or allow nulls I suppose); sacrificing duplication in many child tables for increased flexibility and hopefully simpler maintenance in the long run. This is particulary important as we will not have professional IT support for this project.
Changes really do have to be automated. So I like the idea of the stored procedure. And the CreateBike example sounds familiar (in principle if not in syntax) to creating an instance of a class in Java.
Lots to think about and to teach myself! If you have any other comments, suggestions etc, they'd be most welcome. And, could you let me know what software you used to create your UML diagram. Its styling is much better than those that I've used.
Cheers!
You sound very interested in this project, which is always awesome to see!
I have a few suggestions for your database schema:
You have individual tables for each Equipment entity i.e. Bike or Lamp. Yet you also have an Equipment_Category table, purely for identifying a row in the Bike table as a Bike or a row in the Lamp table as a Lamp. This seems a bit redundant. I would assume that each row of data in the Bike table represents a Bike, so why even bother with the category table?
You mentioned that your "gut" feeling is telling you to go for an overarching table for all Equipment. Are you familiar with the practice of generalization and specialization in database design? What you are looking for here is specialization (also called "top-down".) I think it would be a great idea to have an overarching or "parent" table that represents Equipment. Then, each sub-entity such as Bike or Lamp would be a child table of Equipment. A parent table only has the fields that all child tables share.
With these suggestions in mind, here is how I might alter your schema:
In the above schema, everything starts as Equipment. However, each Equipment can be specialized into Lamp, Bike, etc. The Equipment entity has all of the common fields. Lamp and Bike each have fields specific to their own type. When creating an entity, you first create the Equipment, then you create the specialized entity. For example, say we are adding the "BMX 200 Ultra" bike. We first create a record in the Equipment table with the generic information (equipmentName, dateOfPurchase, etc.) Then we create the specialized record, in this case a Bike record with any additional bike-specific fields (wheelType, frameColor, etc.) When creating the specialized entities, we need to make sure to link them back to the parent. This is why both the Lamp and Bike entities have a foreign key for equipmentID.
An easy and effective way to add specialized entities is to create a stored procedure. For example, lets say we have a stored procedure called CreateBike that takes in parameters bikeName, dateOfPurchase, wheelType, and frameColor. The stored procedure knows we are creating a Bike, and therefore can easily create the Equipment record, insert the generic equipment data, create the bike record, insert the specialized bike data, and maintain the foreign key relationship.
Using specialization will make your transactional life very simple. For example, if you want all Equipment purchased before 1/1/14, no joins are needed. If you want all Bikes with a frameColor of blue, no joins are needed. If you want all Lamps made of felt, no joins are needed. The only time you will need to join a specialized table back to the Equipment table is if you want data both from the parent entity and the specialized entity. For example, show all Lamps that use 100 Watt bulbs and are named "Super Lamp."
Hope this helps and best of luck!
Edit
Specialization and Generalization, as mentioned in your provided source, is part of an Enhanced Entity Relationship (EER) which helps define a conceptual data model for your schema. As such, it does not need to be "supported" per say, it is more of a design technique. Therefore any database schema naturally supports specialization and generalization as long as the designer implements it.
As far as your Equipment_Category table goes, I see where you are coming from. It would indeed make it easy to have a dropdown of all categories. However, you could simply have a static table (only contains Strings that represent each category) to help with this population, and still keep your Equipment tables separate. You mentioned there will only be around 10-20 categories, so I see no reason to have a bridge between Equipment and Equipment_Category. The fewer joins the better. Another option would be to include an "equipmentCategory" field in the Equipment table instead of having a whole table for it. Then you could simply query for all unique equipmentCategory values.
I agree that you will want to keep your Equipment table to guaranteed common values between all children. Definitely. If things get too complicated and you need more defined entities, you could always break entities up again down the road. For example maybe half of your Bike entities are RoadBikes and the other half are MountainBikes. You could always continue the specialization break down to better get at those unique fields.
Stored Procedures are great for automating common queries. On top of that, parametrization provides an extra level of defense against security threats such as SQL injections.
I use SQL Server. The diagram I created is straight out of SQL Server Management Studio (SSMS). You can simply expand a database, right click on the Database Diagrams folder, and create a new diagram with your selected tables. SSMS does the rest for you. If you don't have access to SSMS I might suggest trying out Microsoft Visio or if you have access to it, Visual Paradigm.
I'm creating an order system to to keep track of orders. There's about 60 products or so, each with their own prices. The system isn't very complicated though, just need to be able to submit how many of each product the person orders.
My question is, is it more efficient to have an 'orders' table, with columns representing each product, and numeric values representing how many of each they ordered.., example:
orders
id
product_a
product_b
product_c
etc...
OR
should I break it into different tables, with a many-to-many table to join them. something like this maybe:
customers
id
name
email
orders
id
customer_id
products
id
product
orders_products
order_id
product_id
I would break it out apart like you show in your second sample. This will make your application much more scalable and will still be quite efficient.
Always build for future features and expansion in mind. A shortcut here or there always seems to bite you later when you have to re-architect and refactor the whole thing. Look up normalization and why you want to separate every independent element in a relational DB.
I am often asked “why make it a separate table, when this way is simpler?” Then remind them that their “oh, there are no other of this type of thing we will use” then later having them ask for a feature that necessitates many-to-many, not realizing they painted you into a corner by not considering future features. People who do not understand data structures tend not to be able to realize this and are pretty bad at specifying system requirements. This usually happens when the DB starts getting big and they realize they want to be able to look at only subset of data. A flat DB means adding columns to handle a ton of different desires, while a many-to-many join table can do it with a few lines of code.
I'd also use the second way. If the db is simple as you say, the difference might not be much in terms of speed and such. But the second way is more efficient and easy to reuse/enhance in case you get new ideas and add to your application.
Should you go for the 1st case, how will you keep track of the prices and discounts you gave to your customers for each product? And even if you have no plans to track it now, this is quite common thing, so might have request for such change.
With normalized schema all you have to do is add a couple of fields.
In my database I currently have two tables that are almost identical except for one field.
For a quick explanation, with my project, each year businesses submit to me a list of suppliers that they sale to, and also purchase things from. Since this is done on an annual basis, I have a table called sales and one called purchases.
So in the sales table, I would have the fields like: BusinessID, year, PurchaserID, etc. And the complete opposite would be in the purchases table, except that there would be a SellerID.
So basically both tables are exactly the same field wise except for the PurchaserID/SellerID. I inherited this system, so I did not design the DB this way. I'm debating combing the two tables into one table called suppliers and just adding a type field to distinguish between whether they are selling to, or purchasing from.
Does this sound like a good idea? Is there something I'm missing in regards to why this wouldn't be a good idea?
Do what works for you.
The textbook answer is normalize. If you normalized you would probably have 2 tables, one with both your buyers and sellers as companies. And a transactions table telling who bought what from who.
If it ain't broke, don't fix it. Leave them separate.
Since the system is already built, I would only consider this if you find yourself doing a lot of queries across the two tables, like big nasty UNION queries. Joining the two tables in one makes queries like "show me all sellers or purchasers who sold/bought between these dates..." much easier.
But it sounds like these two groups are treated very differently from the business rule perspective, so its probably not worth the trouble to make application changes at this point. (Every query would have to have a "WHERE Type = 1" or something like that).
If you'd have asked this during the db design phase, my answer might be different.
Normalization would say "yes".
How many applications are affected by this change? That would affect the decision.
Definitely one table. And I wouldn't call it supplier since this does not reflect the meaning of the table. Something like busibess_partner or something better than that might be more appropriate. Instead of purchase_id and seller_id, then be more generic like business_partner_id, and yes, add a field to distinguish.
Not one table. They are different entities that have a similar structure. There's nothing to be gained by consolidating them. (Nothing lost, either, except lucidity; but that's critical IMHO).
"Normalization" doesn't include looking for tables with similar schemas, and merging them.
A database is always a limited model of your business objective. If it doesn't make sense for you business, ignore those who say you should add complexity to your data model by creating a new companies table (though you probably already have something similar). If you really want to get into the "perfect model" game, just start abstracting everything away into an "entities" table and pretty soon you will have a completely unmanageable database.
Normalization would dictate that you NOT combine the two fields, unless the foreign keys actually point to the same table. A key rule to keep in mind is that each column in a table should only mean one thing. Adding a second field that explains what the first field means breaks this rule.
If your queries are getting to be a mess because you are always joining the two tables, you could create a view.
Also, the number of records in the table is almost completely irrelevant. Always optimize for performance after you have the system in place. If it killing your application to have all the records in one table, set a clustered index on a column that partitions your table in a meaningful way.
You must take into consideration the number of records on both tables. if they are to big it could have a big inpact on queries that have multiple joins to customers and suppliers.
Example: Who sold computers to us and to whom did we sell them to.
From a completely different point of view. I tend to consider logic over technology. To me the decision is not whether the data is similar in shape or fields, but whether it makes sense mixing them. That is as much to say that whether the technical answer might be normalize, my answer would be: does it make sense to you (business logic) to have both together?
Another answer talks about merging both and changing naming conventions. To me that is a logic decision: you are saying that you don't work with buyers and sellers, but with business partners. If that is your case, then do it.
You might also consider what your use of the tables would be. If they are of one unique logic type (business partner) you will surely have queries that need to access both buyers and sellers. Else, if all your queries are separate, that might be an indication that they are not the same, and should not be held together. Pushing them together will imply a lot of extra checks and cpu time spent differing from what were separate entities.
There is a long used metaphor about interfaces that might apply here. Just because a fire gun and a camera both shoot, that does not mean they share an interface, unless you like playing Russian roulette.
From a logical view, there seems to be no difference between the reported transactions, it is just a difference in who reports it to you. It should be a single table with SellerID, BuyerID, and (if you need it) ReporterID(s) (and perhaps additional transaction information).
This is how it should be. Now, how to make the transition? Making a script that uses the two old tables to fill a new table should be an easy exercise, but then you also need to change all the queries that use the information. This is likely a lot of work, and might not be worth the effort.
Since none of the experts reporting in are willing to answer your question, the simple answer is: query1 UNION query2
EX.
SELECT * FROM table1 UNION SELECT * FROM table2 assuming table1 and table2 have the same structure/heading titles