I have a question about a database that I'm currently trying to design. I'd also like to mention that I am a beginner, so please keep that in mind.
The DB I'm designing will be used by an application I'm developing for a school project. The application is a fitness app with a number of features, but the main focus is creating diets based on information provided by the user. All the data needed to generate such a diet, such as foods and their nutrients, will be taken from the DB.
The way I want to construct this diet is that it will be a sample diet for a period of 7 days. On each day there will be 5 meals, and each meal will contain a number of products, e.g. 100g chicken breast, 100g brown rice, 100g broccoli.
I've made an ERD diagram to help me model this, which you can see at the following link:
http://imgur.com/0ivcM5x
As you can see from the picture, I've created a Food table which will hold all sorts of food items that can be used to create the meals, and this is where I get stuck. In my model I have broken it down into separate tables, but I don't know whether that is correct and will work.
I'm also not sure how to create that "MealTable". So far I've got meal_id as PK and food_id as an FK, but will I be able to create meals with multiple food items, or will it be 1 meal to 1 item from the "food" table?
Similarly with "DietTable" and "dayOfTheWeek": I'm taking the same approach, with diet_id as PK and day_id as FK, but will that allow me to have multiple "day" instances within the same diet?
I know these questions are not very specific, but I'm just trying to understand how to model this and whether the approach is correct. Will it work, and are there other ways to model similar problems?
Any help or suggestions would be appreciated.
I think your ERD diagram looks pretty good, but I noticed a couple things:
It looks like your MealTable will have more than one row per meal, so I would rename it to MealFood to list all the food items in each meal. If you need to add fields to each meal (such as the name of the meal), then make a separate table called Meal that has just one row per meal.
I would remove day_id from DietTable and instead add a diet_id column to the dayOfTheWeek table. If you pair this with a mealOfTheDay column, you can create a unique key across (diet_id, day_id, mealOfTheDay) to make sure each diet has only one meal in each meal slot of the day.
Example dayOfTheWeek table:
| diet_id | day_id | mealOfTheDay | meal_id |
---------------------------------------------
|    1    |   1    |      1       |   999   |
|    1    |   1    |      2       |   642   |
|    1    |   1    |      3       |   242   |
|    1    |   1    |      4       |   298   |
|    1    |   1    |      5       |   322   |
|    1    |   1    |      5       |   111   |  <- unique key will not allow this row,
                                                 as it is the second fifth meal on
                                                 the same day for diet 1
Also, try to keep your naming consistent. Some of your table names end with Table and some don't.
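A minimal runnable sketch of the two suggestions above, using SQLite via Python. The table names (Meal, MealFood, dayOfTheWeek) follow the discussion, but the exact columns are assumptions, not the OP's actual diagram:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Meal (meal_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Food (food_id INTEGER PRIMARY KEY, name TEXT);
-- one row per food item per meal, so a meal can hold many foods
CREATE TABLE MealFood (
    meal_id INTEGER REFERENCES Meal(meal_id),
    food_id INTEGER REFERENCES Food(food_id),
    grams   INTEGER,
    PRIMARY KEY (meal_id, food_id)
);
-- one row per meal slot per day per diet
CREATE TABLE dayOfTheWeek (
    diet_id      INTEGER,
    day_id       INTEGER,
    mealOfTheDay INTEGER,
    meal_id      INTEGER REFERENCES Meal(meal_id),
    UNIQUE (diet_id, day_id, mealOfTheDay)
);
""")
cur.execute("INSERT INTO dayOfTheWeek VALUES (1, 1, 5, 322)")
try:
    # a second "fifth meal" on the same day for diet 1
    cur.execute("INSERT INTO dayOfTheWeek VALUES (1, 1, 5, 111)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the unique key refuses the duplicate slot
```

The unique key does exactly what the example table shows: the duplicate fifth meal is rejected at the database level rather than in application code.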
I've been looking at your database model, and there are some things that are missing and others that (in my opinion) need to change. I've already designed a similar model, not for fitness, but to track meal protein for PKU disease. Your final model is, I think, really close to what I've designed.
Here are the list of tables :
FOOD : As you've done, except that in my case I use it to describe the protein content per 100g of the food.
FOOD_UNIT : mass units (mg, g, ..., Kg), volume (ml, cl, ..., L)
FOOD_CATEGORY : Vegetables, fruits, ...
CALENDAR : Dates, day by day. I think this would be more usable for you than a simple day-of-week ID and a meal ID. If someone starts the diet on Monday and someone else on Friday, they would eat exactly the same meals, when in fact the first person may need more or less food than the other. As it stands, you have a big problem here.
MEAL_PERIOD : Morning, ..., Evening. Using a time here helps specify whether a meal was taken in the early morning or close to lunch.
PERSON : Who you are editing the meal for. Several people can have different needs in protein, fat, ...
MEAL : I think you know what to put here (primary key on person_id, calendar_id, meal_period_id). Add each food and its quantity to this table. Don't forget the foreign key on food_unit_id, and in your case your diet_id.
Can't draw a schema now, but normally you have all the elements to draw a cool diagram that will normally fit your needs.
The first is the sectors table, which has an id and a sector name, like this:
id | sector
---+-----------
 1 | Government
 2 | Education
The second is the employee table, like this (simplified):
id | name
---+-----
 1 | sam
 2 | tom
Finally, I have a sectorMap table (used to join the two tables above together), like this:
id | sectorid | employeeid
---+----------+-----------
 1 | 1        | 2
 2 | 1        | 1
 3 | 2        | 2
So in this instance, once I join everything together and view the sectors for each employee, it would show that tom has two sectors (government, education) and sam only has one (government)... hope that makes sense.
My question is: within my application, the user can change these sectors by selecting from a multiple-selection dropdown in HTML. I initially thought an "on duplicate key update" expression would work, but since there are multiple rows of data, I actually need to delete all rows in the sectorMap table that contain the selected employee's id but do not reflect the new selection. What would be the best way to go about that?
For instance, let's say I open the application and see that tom has two sectors tied to him (government, education) and I only want him to have one (government). When I deselect education and click GO, the application sends a list to the server containing ('government'). How can I formulate an expression to delete the education sector row from the sectorMap table that contains his id?
Your answer is in your question.
When you deselect education, you get the data ('government') back, right?
So just invert your query: select the records which are NOT in ('government'); those are the education rows.
Those are the records you can delete.
Hope this helps. Thanks :)
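A runnable sketch of that inverted delete, using SQLite via Python. Column and table names follow the question (sectorMap, sectorid, employeeid); the assumption here is that the form posts back sector ids rather than names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sectorMap (id INTEGER PRIMARY KEY, sectorid INT, employeeid INT)")
cur.executemany("INSERT INTO sectorMap (sectorid, employeeid) VALUES (?, ?)",
                [(1, 2), (1, 1), (2, 2)])

employee_id = 2      # tom
keep_sectors = [1]   # user left only "Government" selected

# delete every mapping for this employee that is NOT in the new selection
placeholders = ",".join("?" * len(keep_sectors))
cur.execute(
    f"DELETE FROM sectorMap WHERE employeeid = ? "
    f"AND sectorid NOT IN ({placeholders})",
    [employee_id, *keep_sectors])

rows = cur.execute("SELECT sectorid FROM sectorMap WHERE employeeid = 2").fetchall()
```

After the delete, tom keeps only the Government mapping; sam's rows are untouched because the delete is scoped to the employee id.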
Concise explanation
There is a row in the database which shows the current state of 'Umbrella', forged from the Model 'Product'.
You want to access the complete history of what you deem to be relevant changes to Umbrella, involving related models, quickly and painlessly.
The problem is that PaperTrail doesn't deliver once the events table is tens of thousands of rows long; you can't truncate it, as it contains important history, and its performance is woeful because it has to parse thousands of lines of YAML to find the 'relevant' changes.
Background reading done, still no idea what the problem is called
This seems like something basic to me, but I see no mention of others tackling it beyond using PaperTrail, so I don't know what it's commonly called, if anything. "Ruby on Rails what-is vs what-was architecture without PaperTrail" was the best title I could think of. Am I creating a one-to-many relationship between models and time?
I've read "Design Patterns in Ruby" (2007), which references the Gang of Four's design patterns: no mention of this problem.
I've tried the "paper_trail" gem, but it doesn't quite solve it.
The problem
Assuming you have Products, Companies and Categories, and
Product: id, name, price, barcode, (also company_id and category_id)
Company: id, name, registered_company_number
Category: id, name, some_immutable_field
Company has many Products
Category has many Products
And you need to see history of each Product, including changes on itself such as price, changes to which company it belongs to, changes to company name, same thing for categories, such as:
date  | event      | company name | cmp id | cat name | cat id | name     | price
------|------------|--------------|--------|----------|--------|----------|------
jan11 | created    | megacorp     | 1      | outdoors | 101    | umbrella | 10
feb11 | cat change | megacorp     | 1      | fashion  | 102    | umbrella | 10
mar11 | cat rename | megacorp     | 1      | vogue    | 102    | umbrella | 10
apr11 | cmp rename | megacorp inc | 1      | vogue    | 102    | umbrella | 10
may11 | cmp change | ultra & sons | 2      | vogue    | 102    | umbrella | 12
jul11 | cmp change | megacorp     | 1      | vogue    | 102    | umbrella | 12
Note that whilst umbrella was with ultra & sons, megacorp inc changed its name back to megacorp, but we don't show that in this history, as it's not relevant to this product. (The name change of company 1 happens in jun11 but is not shown.)
This can be accomplished with PaperTrail, but the code to do it is either very complex, long, and procedural; or, if written 'elegantly' the way PaperTrail intended, very, very slow, as it makes many DB calls against what is currently a very bloated events table.
Why paper trail is not the right solution here
PaperTrail stores all changes as YAML; the database table is polymorphic and stores data from many different models. This table, and thus this gem, seems suited to identifying who made which changes... but using it for the history I need makes it a god table that stores everything about what was, with far too much responsibility.
The history I am after does not care about all changes to an object, only certain fields. (We still need to record all the small changes, just not include them in the history of products, so we can't simply stop recording them: PaperTrail has its regular duty of identifying who did what, and cannot be optimised solely for this purpose.) Pulling this information requires getting all records where item_type is Product and item_id is the currently viewed product_id, then parsing the YAML and checking whether the changes are interesting (did a field change, and is it a field whose changes we want to see?). Then doing the same for every category and company the product has been associated with in its lifetime, but keeping only the changes which occur in the windows during which the product was associated with said category/company.
PaperTrail can be turned off quite easily... so if one of your devs were to disable it somewhere as an optimisation while some operations run, but forget to turn it back on, no history gets recorded. And because PaperTrail is more of a man-on-the-loop than a man-in-the-loop, you might not notice it isn't running (and then have to write overly complex code to catch all the possible scenarios with holey data). A solution which enforces the saving of history is required.
Half baked solution
Conceptually, I think the models should be split between that which persists and that which changes. I am surprised this is not baked into Rails from the ground up, but there are some issues with it:
Product: id, barcode
Product_period: id, name, price, product_id, start_date, (also company_id and category_id)
Company: id, registered_company_number
Company_period: id, name, company_id, start_date
Category: id, some_immutable_field
Category_period: id, name, category_id, start_date
Every time the price of the product, or the company_id of the product changes, a new row is added to product_period which records the beginning of a new era where the umbrella now costs $11, along with the start_date (well, time) that this auspicious period begins.
Thus in the Product model, calls to things which are immutable, or where we only care about the most recent value, remain as they are; whereas things which change, and which we care about, get methods that appear to outside users (or existing code) to operate on the Product model, but in fact fetch the latest values from the most recent product_period row for this product.
This solves the problem superficially, but it's a little long-winded, and it still has the problem that you have to poke around through company_period and category_period selecting relevant entries (i.e. where the company/category changed during a time when the product was associated with it) rather than something more elegant.
At least MySQL will run faster, there is more freedom to create indexes, and there are no longer thousands of YAML parses bogging it down.
In the quest to write more readable code, are these improvements sufficient? What do other people do? Does this have a name? Is there a more elegant solution, or just a quagmire of trade-offs?
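The "most recent period row" lookup described above can be sketched quickly with SQLite via Python. Table and column names follow the half-baked solution; the data is made up to match the umbrella example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, barcode TEXT);
CREATE TABLE product_period (
    id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES product(id),
    name TEXT, price INTEGER, start_date TEXT
);
""")
cur.execute("INSERT INTO product VALUES (1, 'UMB-001')")
cur.executemany(
    "INSERT INTO product_period (product_id, name, price, start_date) VALUES (?,?,?,?)",
    [(1, 'umbrella', 10, '2011-01-01'),   # era 1: costs 10
     (1, 'umbrella', 12, '2011-05-01')])  # era 2: price change starts a new row

# what a Product#price accessor would run under the hood:
price, = cur.execute("""
    SELECT price FROM product_period
    WHERE product_id = ?
    ORDER BY start_date DESC LIMIT 1
""", (1,)).fetchone()
```

Because each era is a plain indexed row, "current price" is one cheap query instead of a YAML parse, and the full price history is just the same query without the LIMIT.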
There are a bunch of other versioning and history gems for Rails (I contributed to the first one, 10 years ago!): find them here, https://www.ruby-toolbox.com/categories/Active_Record_Versioning
They all have different storage methods, like the one you suggest above, and some are configurable. I also don't agree with the polymorphic god table for all uses, but it's not too slow if you have decent indexes.
My line of business is something like a used car/motorbike dealership:
My "stock" table contains cars, so there are no two identical products, as each automobile belongs to a different owner.
Sometimes an owner has two vehicles that he wants to sell separately, but also wants to sell together, e.g.:
Owner has a car and a motorcycle:
+----+------------+
| id | Stock      |
+----+------------+
| 1  | car        |
| 2  | motorcycle |
+----+------------+
If he wants to advertise or sell them in two ways, the first would be the car for U$10.000 and the motorbike for U$5.000.
But he also has the option to sell both together for a lower price (car + bike for U$12.000), e.g.:
+----+-----------+------------+-----------+
| id | id_poster | Stock      | Price     |
+----+-----------+------------+-----------+
| 1  | 1         | car        | U$ 10.000 |
| 2  | 2         | motorcycle | U$ 5.000  |
| 1  | 3         | car        | U$ 12.000 |
| 2  | 3         | motorcycle | U$ 12.000 |
+----+-----------+------------+-----------+
Is this the best way to do this?
My structure already does it this way (which I believe to be the best way); I'm using a foreign key and an n:m relationship. See my structure:
Ok, so if I'm understanding the question right, you're wondering if using a junction table is right. It's still difficult to tell from just your table structures. The poster table just has a price, and the stock table just has a title and description. It's not clear from those fields just what they're supposed to represent or how they're supposed to be used.
If you truly have a many-to-many relationship between stock and poster entities -- that is, a given stock can have 0, 1 or more poster, and a poster can have 0, 1 or more stock -- then you're fine. A junction table is the best way to represent a true many-to-many relationship.
However, I don't understand why you would want to store a price in poster like that. Why would one price need to be associated with multiple titles and descriptions? That would mean if you changed it in one spot that it would change for all related stock. Maybe that's what you want (say, if your site were offering both A1 and A0 size posters, or different paper weights with a single, flat price across the site regardless of the poster produced). However, there just aren't enough fields in your tables currently to see what you're trying to model or accomplish.
So: Is a junction table the best way to model a many-to-many relationship? Yes, absolutely. Are your data entities in a many-to-many relationship? I have no idea. There isn't enough information to be able to tell.
A price, in and of itself, may be one-to-one (each item has once price), one-to-many (each item has multiple prices, such as multiple currencies), or -- if you use a price category or type system like with paper sizes -- then each item has multiple price categories, and each price category applies to multiple items.
So, if you can tell me why a stock has multiple prices, or why a single poster price might apply to multiple stock, then I can tell you if using a junction table is correct in your situation.
Having seen your edit that includes your business rules, this is exactly the correct structure to use. One car can be in many postings, and one posting may have many cars. That's a classic many-to-many, and using a junction table is absolutely correct.
It's not clear how the examples relate to your diagram because you use different terminology, but I think it's safe to say: if you want to store something like "this entity consists of orange, apple and pear", then the DB design you show is the correct way to do it. You'd have one poster entry, three entries in poster_has_stock pointing to the same poster, and three elements in stock.
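The many-to-many reads described in these answers can be sketched with SQLite via Python. The table names (poster, stock, poster_has_stock) follow the diagram under discussion; the column lists are assumptions filled in from the example data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE stock  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE poster (id INTEGER PRIMARY KEY, price REAL);
-- junction table: composite primary key covers both directions
CREATE TABLE poster_has_stock (
    poster_id INTEGER REFERENCES poster(id),
    stock_id  INTEGER REFERENCES stock(id),
    PRIMARY KEY (poster_id, stock_id)
);
""")
cur.executemany("INSERT INTO stock VALUES (?, ?)",
                [(1, 'car'), (2, 'motorcycle')])
cur.executemany("INSERT INTO poster VALUES (?, ?)",
                [(1, 10000.0), (2, 5000.0), (3, 12000.0)])
cur.executemany("INSERT INTO poster_has_stock VALUES (?, ?)",
                [(1, 1), (2, 2), (3, 1), (3, 2)])

# everything included in the combined car + motorcycle posting (poster 3)
items = [r[0] for r in cur.execute("""
    SELECT s.title FROM stock s
    JOIN poster_has_stock phs ON phs.stock_id = s.id
    WHERE phs.poster_id = 3
    ORDER BY s.id
""")]
```

One car sits in two postings (its solo ad and the bundle) and the bundle posting holds two stock items, which is exactly the classic many-to-many the junction table exists for.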
The structure you're using is the best solution in your case; no need to change it. Just two minor changes are needed:
1. Remove the two indexes fk_poster_has_stock_stock1_idx and fk_poster_has_stock_poster_idx, because those columns are already covered by the primary key.
2. The stock_price field should use the DECIMAL data type (more precise).
You can read more about the DECIMAL data type here.
I think your solution is nearly perfect. You may add an "id" to the "poster_has_stock" table, and of course change the price type (as noted above).
But you might consider a second option, with stock_id in the poster table.
WHY?
- There should be no poster with no stock connected to it.
- In most cases offers will be one stock <=> one poster.
This still allows you to add as many dependent stocks to a poster as you want.
You can also add poster_special_price DECIMAL(9,2) to the poster table.
This makes it easy to show:
- the price for a stock item;
- the special price for a stock item with its dependencies.
This will also be easier to manage in the controller (create, update): you will be creating the poster together with its stock, so no transactions are needed when adding a new poster.
You may consider a new table that creates a relationship between the stock items, such as:
stock_component
---------------
parent_stock_id
child_stock_id
child_qty
In this way, you can link many children to one parent in the style of a bill of materials; the rest of your links can then continue to relate simply to the stock_id of the appropriate parent.
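A runnable sketch of that bill-of-materials idea, using SQLite via Python. The stock_component columns are as proposed above; the bundle data is made up, and the recursive query would also walk deeper nestings if they existed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE stock (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE stock_component (
    parent_stock_id INTEGER REFERENCES stock(id),
    child_stock_id  INTEGER REFERENCES stock(id),
    child_qty       INTEGER
);
""")
cur.executemany("INSERT INTO stock VALUES (?, ?)",
                [(1, 'car + bike bundle'), (2, 'car'), (3, 'motorcycle')])
cur.executemany("INSERT INTO stock_component VALUES (?, ?, ?)",
                [(1, 2, 1), (1, 3, 1)])  # the bundle contains one car, one bike

# walk the component tree under stock item 1
children = [r[0] for r in cur.execute("""
    WITH RECURSIVE parts(id) AS (
        SELECT child_stock_id FROM stock_component WHERE parent_stock_id = 1
        UNION ALL
        SELECT c.child_stock_id FROM stock_component c
        JOIN parts p ON c.parent_stock_id = p.id
    )
    SELECT s.title FROM parts JOIN stock s ON s.id = parts.id
    ORDER BY s.id
""")]
```

The recursive CTE is what makes this structure pay off: a bundle of bundles still unrolls to its leaf items with the same query.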
Let's say I'm storing play-by-play info for sports: basketball, football, and baseball. The data basically fits the same model:
| play_id | play_type_id | play_description_id | player1_id | player2_id | player3_id |
Those are the basic columns each sport would share, but there would be several more. Some columns would only be used by certain sports; for example, player3_id would be used by football for who made a tackle, but never by basketball. There wouldn't be a lot of these limited-use columns, but some.
Each game can have anywhere from 300 to 1000 rows (high estimate), so this table could eventually grow into the billions.
My questions are:
Should I just start off with different tables for each sport, even though there'd be about a 90% overlap of columns?
At what point should I look into partitioning the table? How would I do this? I'm thinking of archiving all the plays from the 2012 season (whether it be a sports specific table or all-inclusive).
Sorry my post isn't more concise. This is all a hypothetical case; I'm just trying to figure out the disadvantages of having one massive table. Obviously performance is a consideration, but at what point does the table's size warrant dividing it? Because this isn't a real project, it's hard to determine the advantages of a table like this. So again, sorry if this is a stupid question.
EDIT/ADDITIONAL QUESTION:
On a somewhat related note, I haven't used NoSQL DBs before, but is that something I should consider for a project like this? Let's say there'd be a high volume of reads and return time would be crucial, but it also needs to be able to run complex queries like "how many ground balls has playerA hit to second base, off playerB, in night games, during 2002-2013?"
I would separate it into multiple tables. That way it is more flexible.
And if you want to compute statistics, you'll be able to run more complex queries than if you had only one table.
It could look like this
Table PLAYER
ID | FIRSTNAME | LASTNAME | DATE_OF_BIRTH
-----------------------------------------
1 | michael | Jordan | 12.5.65
Table SPORT
ID | NAME | DESCRIPTION
------------------------------------------
1 | Basketball | Best sport in the world
2 | Golf | Nice sport too
Table PLAYER_SPORT
SPORT_ID | PLAYER_ID | PLAYER_POSITION_ID
--------------------------------------------
1 | 1 | 1 /* Michael Jordan play Basketball */
2 | 1 | NULL /* Michael Jordan play also Golf */
Table PLAYER_POSITION
ID | POSITION | DESCRIPTION | SPORT_ID
-------------------------------------------
1 | Middlefield | Any description.. | 1
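The statistics queries this split enables can be sketched with SQLite via Python. Table and column names follow the tables above; the join answers "which sports does a player play?":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE player (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE sport  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE player_sport (sport_id INT, player_id INT, player_position_id INT);
""")
cur.execute("INSERT INTO player VALUES (1, 'Michael', 'Jordan')")
cur.executemany("INSERT INTO sport VALUES (?, ?)",
                [(1, 'Basketball'), (2, 'Golf')])
cur.executemany("INSERT INTO player_sport VALUES (?, ?, ?)",
                [(1, 1, 1),      # Michael Jordan plays basketball
                 (2, 1, None)])  # Michael Jordan also plays golf

sports = [r[0] for r in cur.execute("""
    SELECT sp.name FROM player_sport ps
    JOIN sport sp ON sp.id = ps.sport_id
    WHERE ps.player_id = 1
    ORDER BY sp.id
""")]
```

The same join pattern extends to positions, plays, and seasons: each new relation is another small table rather than another nullable column on a giant one.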
As far as your table structure is concerned, best practice is to have another table mapping play_id to player_id. There is no need for the columns player1_id, player2_id, player3_id; just make a new table with play_id and player_id columns.
Should I just start off with different tables for each sport, even
though there'd be about a 90% overlap of columns?
I don't think that would help you much: the growth-rate problem of a single table will recur for the per-sport tables, so this kind of distribution only delays it rather than solving it. You would also lose integrity and consistency by violating normal forms.
At what point should I look into partitioning the table? How would I
do this? I'm thinking of archiving all the plays from the 2012 season
(whether it be a sports specific table or all-inclusive).
You need to use logical database partitioning.
I think a range partition on a match-date field would be helpful.
Documentation on MySQL partitioning can be found here.
Recommending NoSQL would need more information about your application; by the way, NoSQL comes with its own pros and cons. Have a look at the linked post; it may help.
Trying to summarize in as few words as possible:
I am trying to create a system that tracks the various products an individual can sell and the commission percentage they earn on each particular item. I am thinking about creating reference integers for each product, called "levels", which relate to a commission percentage in a lookup table, instead of a single inline value. Is this overkill, though, or are there benefits over just placing the value inline on each record?
My gut tells me there are advantages to design 1 below, but the more I think about it the less sure I am what they are. If I need to update all individuals selling product X to level Y, indexes and replaces make that easy and fast in both designs. With design 2, I can dynamically set any "earn" to whatever percentage I come up with (0.58988439) for a product, whereas in design 1 I would have to create that "level" first.
Note: the product does not relate to the earn directly (one sales rep can earn 50% on the same product for which another sales rep earns only 40%).
Reference Examples:
Design 1 - two tables
table 1
ID | seller_id | product_id | level
-----------------------------------------------
1 | 11111 | 123A | 2
2 | 11111 | 15J1 | 6
3 | 22222 | 123A | 3
table 2
ID | level | earn
--------------------------
1 | 1 | .60
2 | 2 | .55
3 | 3 | .50
4 | 4 | .45
5 | 5 | .40
6 | 6 | .35
Design 2 - one table
ID | seller_id | product_id | earn
-----------------------------------------------
1 | 11111 | 123A | .55
2 | 11111 | 15J1 | .35
3 | 22222 | 123A | .45
(where earn is decimal based, commission percentage)
Update 1 - 7/9/13
It should also be noted that a rep's commission level can change at any time. For this, we plan to use status, start, and end dates with ranges for eligible commission levels/earn. For example, a rep may earn at Level 2 (55%) from Jan 1 to Feb 1. This would be noted in both designs above. Then, to find what level or percentage a rep was earning at any given time: select * from table where (... agent information) AND start <= :date AND (end > :date OR end IS NULL)
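That date-range lookup can be sketched with SQLite via Python. The table names (seller_level, level_earn) and sample data are assumptions modeled on design 1 above, with start_date/end_date columns standing in for the question's start/end:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE seller_level (
    seller_id INT, product_id TEXT, level INT,
    start_date TEXT, end_date TEXT   -- NULL end_date = still current
);
CREATE TABLE level_earn (level INTEGER PRIMARY KEY, earn REAL);
""")
cur.executemany("INSERT INTO seller_level VALUES (?,?,?,?,?)",
                [(11111, '123A', 2, '2013-01-01', '2013-02-01'),
                 (11111, '123A', 3, '2013-02-01', None)])
cur.executemany("INSERT INTO level_earn VALUES (?, ?)",
                [(2, 0.55), (3, 0.50)])

# what was rep 11111 earning on product 123A on Jan 15?
earn, = cur.execute("""
    SELECT le.earn FROM seller_level sl
    JOIN level_earn le ON le.level = sl.level
    WHERE sl.seller_id = 11111 AND sl.product_id = '123A'
      AND sl.start_date <= :d AND (sl.end_date > :d OR sl.end_date IS NULL)
""", {"d": "2013-01-15"}).fetchone()
```

The half-open ranges (start inclusive, end exclusive, NULL meaning "current") guarantee exactly one matching row per date, so the query never returns two overlapping levels.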
Does level mean anything to the business?
For instance, I could imagine a situation where levels are the unit of management. Perhaps there is a rush for sales one quarter and the rates for each level change. Or there is reporting by level. In these situations it would make sense to have a separate "level" table.
Another situation would be different levels for different prices of the product: perhaps the more you sell it for, the higher the commission. Or commissions could be based on thresholds, so someone who has sold enough this year suddenly gets a higher commission.
In other words, there could be lots of rules around commission that go beyond the raw percentage. In that case, a "rule" table would be a necessary part of the data model (and "levels" are a particular type of rule).
On the other hand, if you don't have any such rules and the commission is always based on the person and product, then storing the percentage in the table makes a lot of sense. It is simple and understandable. It also has good performance when accessing the percentage -- which presumably happens much more often than changing it.
First of all, using id values to reference a lookup table has nothing to do with normalization per se. Your design #2 shown above is just as normalized. Lots of people have this misunderstanding about normalization.
One advantage to using a lookup table (design #1) is that you can change what is earned by level 6 (for example), and by updating one row in the lookup table, you implicitly affect all rows that reference that level.
Whereas in design #2, you would have to update every row to apply the same change. Not only does this mean updating many rows (which has performance implications), but it opens the possibility that your UPDATE might not match exactly the rows that need updating, so some rows could end up with the wrong value for what should be the same earning level.
Again, using a lookup table can be a good idea in many cases, it's just not correct to call it normalization.
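The one-row-update advantage described above can be shown concretely with SQLite via Python. Table names (seller_product, level_earn) are assumptions modeled on design 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE seller_product (id INTEGER PRIMARY KEY, seller_id INT,
                             product_id TEXT, level INT);
CREATE TABLE level_earn (level INTEGER PRIMARY KEY, earn REAL);
""")
cur.executemany(
    "INSERT INTO seller_product (seller_id, product_id, level) VALUES (?,?,?)",
    [(11111, '15J1', 6), (33333, '15J1', 6)])  # two reps at level 6
cur.execute("INSERT INTO level_earn VALUES (6, 0.35)")

# raise level 6 from 35% to 40%: exactly one row is touched
cur.execute("UPDATE level_earn SET earn = 0.40 WHERE level = 6")

earns = [r[0] for r in cur.execute("""
    SELECT le.earn FROM seller_product sp
    JOIN level_earn le ON le.level = sp.level
""")]
```

Every seller row referencing level 6 now sees the new rate through the join, with no risk of a mass UPDATE missing some rows, which is the trade-off design 2 takes on.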