Forming my ERD, couple of issues with it - mysql

Okay, my partners and myself created these a while ago. We are going to be transferring this into SQL through Visual Basic soon, but I want to make sure everything is ready to go. Two major complaints that we were not able to fix was...
"By having Transaction and Product directly connected you are unable to allow multiple products on the same order (customer can't order both a latte and cappuccino on the same order)."
"Membership table: Not sure what the data in DiscountTypeTotal means - do you have multiple pieces of data in the same field? (then your table isn't on 1NF).
It looks like you need to allow each member to have multiple discounts - so you need another table to capture that."
How do we effectively correct these? How else would we connect Transaction with Products? I understand that the customer can only purchase one item per transactions, so would we have products and another table for multiple items? Allowing multiple customers to have discounts, I am lost. Any help would be appreciated.

Related

What do I gain from specifying relationships in my database?

I am building a project in MS Access 2010. I have previous experience in Oracle. I am reading about MS Access and keep seeing references to table relationships. It looks like a convenient way to assist the average person in data entry and validation and for query building, but I write queries exclusively in SQL mode and enforce data entry for users with forms that have their own validation rules.
Is it really necessary to enforce relationships? It doesn't seem like it really gains me anything at an advanced level, and might actually cause problems for me or someone else who eventually takes over maintenance from me later. I've never used them before and I'm not really seeing a benefit to starting now. Can anyone shed some light on that?
You say you have previous experience of Oracle. Did you never define Foreign Key constraints in Oracle? If you did, then that is what you are doing when you define relationships in Access. You can use it for enforcing referential integrity (not allowing you to delete a parent record if child records still exist) or, if you use the cascade delete option, for automatically deleting child records if you delete a parent record. It's a useful backup to cover coding errors where you might have forgotten about possible child records that would otherwise be orphaned if you did not have the relationship (FK) defined.
From a person just querying data, then the relationships are not that important. However from an application point of view, they are VERY helpful if not outright important.
For example, you might have a customer’s table, and then say an orders table. The business rule is that you can’t create an order unless you first have a customer. So if you freely write some SQL to add an order without a customer, your update/insert query will NOT work. And if you need to delete a customer, then all orders for that customer can/will automatic delete for you without having to write a complex delete SQL statement. You might for example want to delete all customers older than 5 or 10 years (so they are inactive). When you delete those customers, then you want all orders also deleted. (This is a VERY difficult query to write if you have to delete the child records for each customer. with enforced relations, then all child records will automatic delete for you (enforced cascade delete)).
And it also important from a reporting point of view. If you write a query to display all customers this month and their billing totals then you get one total result. However if you decide that you do NOT want to display/include customers, you might hit just the orders table and get a total amount that way. The problem is without RI, then you might (by accident or even just some user launching the orders form) have entered order information (with a total amount) but NO customer.
Now what happens is when you run the two different reports/quires, you find the total is DIFFERENT! In a complex application as to “why” the two reports are different can take days, or with lots of data even a week to figure out why two reports on monthly sales do NOT agree with each other. If you enforce the business rule that no orders can be entered into the system UNLESS they have a customer, then you eliminate such errors in reporting. You can “say” that you are perfect user of SQL, but with lots of code, lots of forms for data entry, how can you EVER be sure that orders are NEVER entered without a customer. The user during data entry may forget to enter the customer in that order form. And even if you write code in that order form to ENSURE that customer must be selected, maybe YOU during the writing of some SQL by accident insert an order record into the system without a customer. However your monthly customer total report query “assumes” that you have a customer record that you THEN join in the order totals data.
However some reports must just run on the orders data (a monthly summary total does not need to include customers). The problem now is somewhere in the system you have an order record with total data that does not have a customer. The result is different reports and quires on sales total now don’t agree. This is an outright nightmare.
So some bug or error in the application code might occur and result in what is supposed to be relational data now having “orphaned” records. Perhaps your business rules allow entering of orders without a customer assigned, but then your monthly sales report will have to show that fact, or any query that hits the orders table and does not include customers will have to “check” for the possibility in those queries that no customer record yet exits.
The above is only a SIMPLE scratching of the surface of the GAZILLION issues that crop up. So while you might be just creating simple quires on the data, the problem is that data correctly related in the system? The old saying about garbage in = garbage out rings true here.
At the end of the day when you’re SQL quires pulls data with MULTIPLE tables, then you HAVE to make assumptions about that data and its relational integrity (RI). So when you write that query to display customers and their order totals, you ASSUME and drop in the customers table, and then relational join in the orders table. However if orders exist without a customer record, then your query not going to produce the correct values. And worse a report that hits the orders table will now produce different results.
If you enforce RI then no matter what, you cannot enter an order by accident or force without FIRST having created a customer record. If you don’t enforce such rules, then your data will produce incorrect results.
And a typical complex application will have 40 or 70 related tables. And EVERY ONE of those tables is going to have assumptions made as to if parent (or child) record are “assumed” to have been created correctly based on your set of business assumptions.
You might have a tour booking system. Customers might phone up, put down a deposit but NOT yet be booked to a particular tour. If you allow this setup, then your query on customers this month and their booked tour will have to take this into account. However maybe the business rules are that any customer in the system that puts money down MUST ALSO be booked to a tour (and thus you query to grab that information will take this rule into account).
If every query you always made never was to include data from more than one table, then you likely don’t benefit much from enforcing relational data. However the instant you start bundling queries with multiple tables, then you MUST know the assumptions being made about that data before you can write a query. So do you allow customers with a deposit in the system without a tour booking or not? This rule will decide how you must write that query. If RI is enforced, then you can query on a customer “booking” that and you KNOW that it will be attached to a tour event. And same goes if any booking + deposit does not need a booking – but you HAVE to know the assumption made about that data.
So based on assumptions made about the data is the ONLY practical way to create a query to pull data based on those assumptions. And if you enforce RI, then you at least know the data MUST be related and setup based on those assumptions.
At the end of the day? Anyone creating a data base that models a business application and rules without enforcing RI is building a ship without rudder and without a compass.
And exporting data from each table is a NON issue. However if that data is a mess and has orphaned records, then you only wind up exporting a incorrect data model to another database and all of the issues and problems remain.
If you are building queries purely in SQL mode, defining the relationships probably doesn't make any difference for you. The only thing that might be useful is that if you built something, then didn't look at it again for a few months, you would be able to quickly re-acquaint yourself with the relationships conceptually.
For anyone using the access query builder, defining the relationships allows you to quickly add tables to the query while Access automatically builds the proper (GIGO) relationships for the query JOIN. Again, if you are writing in SQL, you probably already do this, so not much help for you in query building.
Bottom Line - it's more of a graphical tool to streamline the query process, at least until you try exporting the tables to a "real" RDBMS, as someone else already mentioned.
If its not a requirement for you use case then you don't necessarily have use this. A use case where this could be "required" is in an Order Based Scenario.
Lets say you have a Database that creates and tracks Orders. Each Order can have multiple Lines that are tied to the same Order. But for Normalization purposes, most people would separate these into two separate tables. OrderHead and OrderDetail. You would want to enforce Referential Integrity here to ensure that there is never a child record in OrderDetail that doesn't link back to a Parent Order.
I'm sure that you could prevent things like that without it, but it mainly just enforces it.
Relationships helps in preserving data integrity and I agree with your point that if user is entering from access form, probability of errors due to integrity is lesser. But in future if user is moving from MS Access to pure RDBMS, this relationship will definitely will be helpful.
Though objective of relationship is not for migration at later point-in-time, for your case that is one valid reason I could think-of.
Other than that, for MS Access with its own forms relationship may not add specific values.

Relationship database design - object specific many to many, do I solve with self join table or new table

Being new to relational database design, I am trying to clarify one piece of information to properly design this database. Although I am using Filemaker as the platform, I believe this is a universal question.
Using the logic of ideally having all one to many relationships, and using separate tables or join tables to solve these.
I have a database with multiple products, made by multiple brands, in multiple product categories. I also want this to be as scale-able as possible when it comes to reporting, being able to slice and dice the data in as many ways as possible since the needs of the users are constantly changing.
So when I ask the question "Does each Brand have multiple products" I get a yes, and "Does each product have multiple brands" the answer is no. So this is a one to many relationship, but it also seems that a self-join table might give me everything that I need.
This methodology also seems to go down a rabbit hole for other "product related" information such as product category, each product is tied to one product category, but only one product category is related to a product.
So I see 2 possibilities, make three tables and join them with primary and foreign keys, one for Brand, one for Product Category, and one for Products.
Or the second possibility is to create one table that has the brand and product category and product info all in one table (since they are all product related) and simply do self-joins and other query based tables to give me the future reporting requirements that will be changing over time.
I am looking for input from experiences that might point me in the right direction.
Thanks in advance!
Could you ever want to store additional information about a brand (company URL, phone number, etc.) or about a product category (description, etc.)?
If the answer is yes, you definitely want to use three tables. If you don't, you'll be repeating all that information for every single item that belongs to the same brand or same category.
If the answer is no, there is still an advantage to using three tables - it will prevent typos or other spelling inconsistencies from getting into your database. For example, it would prevent you from writing a brand as "Coca Cola" for some items and as "Coca-Cola" for other items. These inconsistencies get harder and harder to find and correct as your database grows. By having each brand only listed once in it's own table, it will always be written the same way.
The disadvantage of multiple tables is the SQL for your queries is more complicated. There's definitely a tradeoff, but when in doubt, normalize into multiple tables. You'll learn when it's better to de-normalize with more experience.
I am not sure where do you see a room for a self-join here. It seems to me you are saying: I have a table of products; each product has one brand and one (?) category. If that's the case then you need either three tables:
Brands -< Products >- Categories
or - in Filemaker only - you can replace either or both the Brands and the Categories tables with a value list (assuming you won't be renaming brands/categories and at the expense of some reporting capabilities). So really it depends on what type of information you want to get out in the end.
If you truly want your solution to be scalable you need to parse and partition your data now. Otherwise you will be faced with the re-structuring of the solution down the road when the solution grows in size. You will also be faced with parsing and relocating the data to new tables. Since you've also included the SQL and MySQL tags if you plan on connecting Filemaker to an external data source then you will definitely need to up your game structurally.
Building everything in one table is essentially using Filemaker to do Excel work and it won't cut it if you are connecting to SQL, MySQL, etc.
Self join tables are a great tool. However, they should really only be used for calculating small data points and should not be used as pivot points or foundations for your reporting features. It can grow out of control as time goes on and you need to keep your backend clean.
Use summary and sub-summary reporting features to slice product based data.
For retail and general product management solutions, whether it's Filemaker/SQL/or whatever the "Brand" or "Vendor" is it's own table. Then you would have a "Products" table (the match key being the "Brand ID").
The "Product Category" field should be a field in the "Products" table. You can manage the category values by building a standard value list or building a value list based on a "Product Category" table. The second scenario is better for long term administration.

Database model for a 24/7 Staff roster at a casino

We presently use a pen/paper based roster to manage table games staff at the casino. Each row is an employee, each column is a 20 minute block of time and each cell represents what table the employee is assigned to, or alternatively they've been assigned to a break. The start and end time of shifts for employees vary as do the games/skills they can deal. We need to keep a copy of the rosters for 7 years, with paper this is fairly easy, I'm wanting to develop a digital application and am having difficulty how to store the data in a database for archiving.
I'm fairly new to working with databases, I think I understand how to model the data for a graph database like neo4j, but I had difficulty when it came to working with time. I've tried to learn about RDBMS databases like MySQL, below is how I think the data should be modelled. Please point out if I'm going in the wrong direction or if a different database type would be more appropriate, it would be greatly appreciated!
Basic Data
Here is some basic data to work with before we factor in scheduling/time.
Employee
- ID Number
- Name
- Skills (Blackjack, Baccarat, Roulette, etc)
Table
- ID Number
- Skill/Type (Can only be one skill)
It may be better to store the roster data as a file like JSON instead? Time sensitive data wouldn't be so much of a problem then. The benefit of going digital with a database would be queries, these could help assist time consuming tasks where human error is common.
Possible Queries
Note: Staff that are on shift are either on a break or on the floor (assigned to a table), Skills have a major or minor type based on difficulty to learn.
What staff have been on the floor for 80 minutes or more? (They are due for a break)
What open tables can I assign this employee to based on their skillset?
I need an employee that has Baccarat skill but is not already been assigned to a Baccarat table.
What employee(s) was on this table during this period of time?
Where was this employee at this point in time?
Who is on shift right now?
How many staff on shift can deal Blackjack?
How many staff have 3 major skills?
What staff have had the Baccarat skill for at least 3 months?
These queries could also be sorted by alphabetical order or time, skill etc.
I'm pretty sure I know how to perform these queries with cypher for neo4j provided I model the data right. I'm not as knowledgeable with SQL queries, I've read it can get a bit complicated depending on the query and structure.
----------------------------------------------------------------------------------------
MYSQL Specific
An employee table could contain properties such as their ID number and Name, but am I right that for their skills and shifts these would be separate tables that reference the employee by a unique integer(I think this is called a foreign key?).
Another table could store the gaming Tables, these would have their own ID and reference a skill/gametype with a foreign key.
To record data like the pen/paper roster, each day could have a table with columns starting from 0000 increasing by 20 in value going all the way to 2340? Prior to the time columns I could have one for staff where each employee is represented with their foreign key, the time columns would then have foreign keys to the assigned gaming Tables, the row data is bound to have many cells that aren't populated since the employee shift won't be 24/7. If I'm using foreign keys to reference gaming Tables I now have a problem when the employee is on break? Unless I treat say the first gaming Table entry as a break?
I may need to further complicate things though, management will over time try different gaming Table layouts, some of the gaming Tables can be converted from say Blackjack to Baccarat. this is bound to happen quite a bit over 7 years, would I want to be creating new gaming Table entries or add a column to use a foreign key and refer to a new table that stores the history of game types during periods of time? Employees will also learn to deal new games during their career, very rarely they may also have the skill removed.
----------------------------------------------------------------------------------------
Neo4j Specific
With this data would I have an Employee and a Table node that have "isA" relationship edges mapping to actual employees or tables?
I imagine with the skills for the two types I would be best with a Skill node and establish relationships like so?: Blackjack->isA->Skill, Employee->hasSkill->Blackjack, Table->typeIs->Blackjack?
TIME
I find difficulty when I want this database to now work with a timeline. I've come across the following suggestions for connecting nodes with time:
Unix Epoch seems to be a common recommendation?
Connecting nodes to a year/month/day graph?
Lucene timeline? (I don't know much about this or how to work with it, have seen some mention it)
And some cases with how time and data relate:
Staff have varied days and start/end times from week to week, this could be shift node with properties {shiftStart,shiftEnd,actualStart,actualEnd}, staff may arrive late or get sick during shift. Would this be the right way to link each shift to an employee? Employee(node)->Shifts(groupNode)->Shift(node)
Tables and Staff may have skill data modified, with archived data this could be an issue, I think the solution is to have time property on the relationship to the skill?
We open and close tables throughout the day, each table has open/close times for each day, this could change in a month depending on what management wants, in addition the times are not strict, for various reasons a manager may open or close tables during the shift. The open/closed status of a table node may only be relevant for queries during the shift, which confuses me as I'd want this for queries but for archiving with time it might not make sense?
It's with queries that I have trouble deciding when to use a node or add a property to a node. For an Employee they have a name and ID number, if I wanted to find an employee by their ID number would it be better to have that as a node of it's own? It would be more direct right, instead of going through all employees for that unique ID number.
I've also come across labels just recently, I can understand that those would be useful for typing employee and table nodes rather than grouping them under a node. With the shifts for an employee I think should continue to be grouped with a shifts node, If I were to do cypher queries for employees working shifts through a time period a label might be appropriate, however should it be applied to individual shift nodes or the shifts group node that links back to the employee? I might need to add a property to individual shift nodes or the relationship to the shifts group node? I'm not sure if there should be a shifts group node, I'm assuming that reducing the edges connecting to the employee node would be optimal for queries.
----------------------------------------------------------------------------------------
If there are any great resources I can learn about database development that'd be great, there is so much information and options out there it's difficult to know what to begin with. Thanks for your time :)
Thanks for spending the time to put a quality question together. Your requirements are great and your specifications of your system are very detailed. I was able to translate your specs into a graph data model for Neo4j. See below.
Above you'll see a fairly explanatory graph data model. In case you are unfamiliar with this, I suggest reading Graph Databases: http://graphdatabases.com/ -- This website you can get a free digital PDF copy of the book but in case you want to buy a hard copy you can find it on Amazon.
Let's break down the graph model in the image. At the top you'll see a time indexing structure that is (Year)->(Month)->(Day)->(Hour), which I have abbreviated as Y M D H. The ellipses indicate that the graph is continuing, but for the sake of space on the screen I've only showed a sub-graph.
This time index gives you a way to generate time series or ask certain questions on your data model that are time specific. Very useful.
The bottom portion of the image contains your enterprise data model for your casino. The nodes represent your business objects:
Game
Table
Employee
Skill
What's great about graph databases is that you can look at this image and semantically understand the language of your question by jumping from one node to another by their relationships.
Here is a Cypher query you can use to ask your questions about the data model. You can just tweak it slightly to match your questions.
MATCH (employee:Employee)-[:HAS_SKILL]->(skill:Skill),
(employee)<-[:DEALS]-(game:Game)-[:LOCATION]->(table:Table),
(game)-[:BEGINS]->(hour:H)<-[*]-(day:D)<-[*]-(month:M)<-[*]-(year:Y)
WHERE skill.type = "Blackjack" AND
day.day = 17 AND
month.month = 1 AND
year.year = 2014
RETURN employee, skill, game, table
The above query finds the sub-graph for all employees who have the skill Blackjack and their table and location on a specific date (1/17/14).
To do this in SQL would be very difficult. The next thing you need to think about is importing your data into a Neo4j database. If you're curious on how to do that please look at other questions here on SO and if you need more help, feel free to post another question or reach out to me on Twitter #kennybastani.
Cheers,
Kenny

Proper way to model user groups

So I have this application that I'm drawing up and I start to think about my users. Well, My initial thought was to create a table for each group type. I've been thinking this over though and I'm not sure that this is the best way.
Example:
// Users
Users [id, name, email, age, etc]
// User Groups
Player [id, years playing, etc]
Ref [id, certified, etc]
Manufacturer Rep [id, years employed, etc]
So everyone would be making an account, but each user would have a different group. They can also be in multiple different groups. Each group has it's own list of different columns. So what is the best way to do this? Lets say I have 5 groups. Do I need 8 tables + a relational table connecting each one to the user table?
I just want to be sure that this is the best way to organize it before I build it.
Edit:
A player would have columns regarding the gear that they use to play, the teams they've played with, events they've gone to.
A ref would have info regarding the certifications they have and the events they've reffed.
Manufacturer reps would have info regarding their position within the company they rep.
A parent would have information regarding how long they've been involved with the sport, perhaps relations with the users they are parent of.
Just as an example.
Edit 2:
**Player Table
id
user id
started date
stopped date
rank
**Ref Table
id
user id
started date
stopped date
is certified
certified by
verified
**Photographer / Videographer / News Reporter Table
id
user id
started date
stopped date
worked under name
website / channel link
about
verified
**Tournament / Big Game Rep Table
id
user id
started date
stopped date
position
tourney id
verified
**Store / Field / Manufacturer Rep Table
id
user id
started date
stopped date
position
store / field / man. id
verified
This is what I planned out so far. I'm still new to this so I could be doing it completely wrong. And it's only five groups. It was more until I condensed it some.
Although I find it weird having so many entities which are different from each other, but I will ignore this and get to the question.
It depends on the group criteria you need, in the case you described where each group has its own columns and information I guess your design is a good one, especially if you need the information in a readable form in the database. If you need all groups in a single table you will have to save the group relevant information in a kind of object, either a blob, XML string or any other form, but then you will lose the ability to filter on these criteria using the database.
In a relational Database I would do it using the design you described.
The design of your tables greatly depends on the requirements of your software.
E.g. your description of users led me in a wrong direction, I was at first thinking about a "normal" user of a software. Basically name, login-information and stuff like that. This I would never split over different tables as it really makes tasks like login, session handling, ... really complicated.
Another point which surprised me, was that you want to store the equipment in columns of those user's tables. Usually the relationship between a person and his equipment is not 1 to 1 and in most cases the amount of different equipment varies. Thus you usually have a relationship between users and their equipment (1:n). Thus you would design an equipment table and there refer to the owner's user id.
But after you have an idea of which data you have in your application and which relationships exist between your data, the design of the tables and so on is rather straitforward.
The good news is, that your data model and database design will develop over time. Try to start with a basic model, covering the majority of your use cases. Then slowly add more use cases / aspects.
As long as you are in the stage of planning and early implementation phasis, it is rather easy to change your database design.

Interesting Database Architecture Scenario

I am currently in the process of rolling a custom order-processing system. My current structure is pretty standard, invoices, purchase orders, and items are all kept in separate tables. Items know which form(s) they are on by keeping track of the form's id, but forms don't know what items are in them (in the database). This was all well and good until I had a new requirement added to the mix: stocking orders.
The way a stocking order works is that sometimes a customer places an order for more units than what is in stock, so we want to place an order with our supplier for enough units to fulfill the order and replenish our stock. However, we often have to build these orders up as the minimums are pretty high, so one stocking order is usually comprised of several customer orders (sometimes for the same item) PLUS a few line items that are NOT connected to an order and are just for stocking purposes.
This presents a problem with my current architecture because I now need to keep track of what comes in from the stocking orders as often suppliers ship partial orders, where items have been allocated, and which incoming items are for stock.
My initial idea was to create a new database table that mostly mimics the items table, but is kind of like an aggregate (but not calculated) table that only keeps track of the items and their corresponding metadata (how many units received, how many for stock, etc) for only the stocking orders. I would have to keep the two tables in synch if something changed from one of them (like a quantity).
Is this overkill, and maybe there's a better way to do it? Database architecture is definitely not my forte, so I was hoping that someone could either tell me that this is an ok way to do things or that there is a better, more correct way to do it.
Thanks so much!
For what it's worth, what I'm using: VB, .NET 4.0, MySQL 5.0
Also, if you want clarification on anything, please ask! I will be closely monitoring this question.
Visit databaseanswers.com. Navigate to "free data models", and look for "Inventory Management". You should find some good examples.
you didnt mention them but you will need tables for:
SUPPLIERS, ORDERS, and INVENTORY
also, the base tables you mention 'knowing about' - these probably need associative style many to many tables which tell you things like which items are on which order, and which suppliers supply which items, lead times, costs etc.
it would be helpful to see your actual schema.
I use a single Documents table, with a DocType field. Client documents (Order, Invoice, ProForma, Delivery, Credit Notes) are mixed with Suppliers documents (PO, Reception).
This makes it quite easy to calculate Client backorders, Supplier backorders, etc...
I am just a bit unhappy because I have different tables for SUPPLIERS and CLIENTS, and therefore the DOCUMENTS table has both a SupplierId field and a ClientId field, which is a bit bad. If I had to redo it I might consider a single Companies table containing both Clients and Suppliers.
I use a PK (DocId) that's autoincrement, and a unique key (DocNum) that's like XYY00000, with
x= doc type
YY= year
00000 = increment.
This is because a doc can be saved but is only at validation time it receives a DocNum.
To track backorders (Supplier or Client), you will need to have a Grouping field in the DocDetails table, so that if you have an Order line 12345, you copy that Link field to every Detail line related to it (invoice, Delivery).
Hope I am not to confusing. The thing works well, after 3 years and over 50,000 docs.
This approach also implies that you will have a stock holding which is allocated for orders - without individual item tracking its a bit tricky to manage this. Consider, customer A orders
3 pink widgets
1 blue widget
But you only have 1 pink widget in stock - you order 3 pink widgets, and 1 blue.
Customer B orders
2 pink widgets
But you've still only got 1 in stock - you order another pink widget
3 pink widgets arrive from the first supplier order. What are you going to do? You can either reserve all of them for customer A's order and wait for the blue and red widget to arrive, or you can fulfill customer B's order.
What if the lead time on a pink widget is 3 days and for a blue widget it's 3 weeks? Do you ship partial orders to your customers? Do you have a limit on the amount of stock you will hold?
Just keeping a new table of backorders is not going to suffice.
This stuff gets scary complicated really quickly. You certainly need to spend a lot more time analysing the problem.