Which of two relational diagrams is normal? - relational-database

Transaction is an object that may describe a recurring transaction (an instance "says" things such as "we receive a $10 payment every month from this PayPal user").
Organization is a customer that may pay us and in return receive some services. Currently one organization can receive at most one service, but this may change in the future.
PricingPlan is something like "Gold $20/month".
Purchase links a pricing plan with a transaction.
Note that we cannot (or can, but only with difficulty) add new fields to the Transaction model, because it is part of a reusable module. The other three models are under our complete control.
I need to make a relational DB structure from these four models.
I've drawn two possible sets of relations between the models. Which of the two, the top one or the bottom one, is more normalized?
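Transaction cannot gain new columns. So anything that links it to the rest of the model has to live in a table we control. The sketch below is a rough illustration only, not a verdict on either diagram, since neither is reproduced here. It is a minimal SQLite sketch in Python with hypothetical column names, in which Purchase carries all the foreign keys:

```python
import sqlite3

# Hypothetical columns; only the four table names come from the question.
SCHEMA = """
CREATE TABLE "Transaction" (          -- owned by the reusable module: never altered
    id     INTEGER PRIMARY KEY,
    amount NUMERIC NOT NULL,
    period TEXT    NOT NULL           -- e.g. 'monthly'
);
CREATE TABLE Organization (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE PricingPlan (
    id    INTEGER PRIMARY KEY,
    name  TEXT    NOT NULL,           -- e.g. 'Gold'
    price NUMERIC NOT NULL            -- e.g. 20 (per month)
);
-- Purchase holds every link, so Transaction needs no new column.
CREATE TABLE Purchase (
    id              INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL REFERENCES Organization(id),
    pricing_plan_id INTEGER NOT NULL REFERENCES PricingPlan(id),
    transaction_id  INTEGER NOT NULL UNIQUE REFERENCES "Transaction"(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")      # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.execute('INSERT INTO "Transaction" VALUES (?, ?, ?)', (1, 20, "monthly"))
conn.execute("INSERT INTO Organization VALUES (?, ?)", (1, "Acme"))
conn.execute("INSERT INTO PricingPlan VALUES (?, ?, ?)", (1, "Gold", 20))
conn.execute("INSERT INTO Purchase VALUES (?, ?, ?, ?)", (1, 1, 1, 1))

def plan_for(conn, organization_id):
    """Name of the pricing plan an organization bought, or None."""
    row = conn.execute(
        "SELECT p.name FROM Purchase pu "
        "JOIN PricingPlan p ON p.id = pu.pricing_plan_id "
        "WHERE pu.organization_id = ?", (organization_id,)).fetchone()
    return row[0] if row else None
```

`organization_id` is left non-unique. That way, a future move to several services per organization needs no schema change. The trade-off is that the current one-service rule is not enforced by this sketch.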

Related

ER Diagram, Physical Data Model Relations

I am trying to create a very simple supermarket management database.
I think I am having trouble with how relations work between entities. I am using PowerDesigner to create the ERD and then generate everything else from it (LDM, PDM, OOM). Is this a bad idea?
Now for my main problem. It involves these three tables:
Employee (Cashier)
Customer
Orders (Receipt)
The way I did it is:
The customer gathers the products he wants to buy and presents them to the employee, then the employee gets the order for the customer from the machine, so:
There is a relation between Customer and Employee (many-to-many): each customer can request_order from one or more Employees, and each Employee can get_order for one or more Customers.
There is a relation between Employee and Orders (one-to-many): each Employee can get one or more Orders, and each Order is fetched by one Employee.
The problem is that if I want to know which customer a specific order belongs to, I can't.
How do I fix this? How can I get the specific order that a customer made?
I am still very new to this, so sorry for any obvious mistakes.
I am sticking to the Relational Database context that you have tagged.
Data Modelling is an iterative process. Much more definition is needed before the data model can be complete. Rather than answering only the specifics you asked about, which would be limited to one iteration (one increment), allow me to provide something more complete, several iterations further along.
If it is useful, please discuss this data model, and progress it to fulfil all your requirements.
Of course, it is too small as an inline graphic; it is also available as a PDF: Supermarket Data Model.
The Standard for Relational Data Modelling since 1983 is IDEF1X. For those unfamiliar with the Standard, refer to the short IDEF1X Introduction.
I am using PowerDesigner to create the ERD and then generate everything from it (LDM, PDM, OOM). Is this a bad idea?
PowerDesigner is great. Just ignore the Oracle-specific nonsense, it pushes you into considering the physical far too early.
Skip the ERD; it is brain-dead in the context of the Relational paradigm, and surpassed by IDEF1X, which is specific to that paradigm.
Use the Entity Level display for ERD equivalence.
For small projects you can ignore the academic distinctions {CDM; LDM; PDM; OOM, etc}.
There is actually just one model: it is "conceptual" at the beginning, and you just progress to "logical", and last, when the "logical" is stable, to the "physical".
Understand that the whole process is Logical.
Unfortunately, in PD you have to have separate "models" or files for each.
Now for my main problem It's between these 3 tables:
I have solved that issue, and exposed others.
each customer can request_order from one or more Employee and each Employee can get_order to one or more Customer
Yes, but that is the overall result. In each shopping or presentation instance:
a customer can request_order from one Employee (Cashier)
an Employee can get_order from one Customer
The problem is if I want to know the customer related to that specific order... I can't. How do I fix this?
Solved: each Order is identified by (CustomerId, DateTime), i.e. by the Customer who created the Order.
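Here is a minimal SQLite sketch in Python that illustrates the compound key. The `OrderItem` table, `ProductId` and the sample rows are hypothetical. The point is that the Customer is part of every Order's key, so "which customer made this order" needs no extra lookup. Child rows then inherit that key through a compound foreign key:

```python
import sqlite3

SCHEMA = """
CREATE TABLE Customer (
    CustomerId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE Employee (
    EmployeeId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
-- "Order" is a reserved word in SQL, hence the quotes.
CREATE TABLE "Order" (
    CustomerId INTEGER NOT NULL REFERENCES Customer(CustomerId),
    DateTime   TEXT    NOT NULL,                                 -- ISO-8601 timestamp
    EmployeeId INTEGER NOT NULL REFERENCES Employee(EmployeeId), -- the cashier
    PRIMARY KEY (CustomerId, DateTime)
);
-- Hypothetical child table (Product table omitted for brevity).
CREATE TABLE OrderItem (
    CustomerId INTEGER NOT NULL,
    DateTime   TEXT    NOT NULL,
    ProductId  INTEGER NOT NULL,
    Quantity   INTEGER NOT NULL CHECK (Quantity > 0),
    PRIMARY KEY (CustomerId, DateTime, ProductId),
    FOREIGN KEY (CustomerId, DateTime) REFERENCES "Order"(CustomerId, DateTime)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Customer VALUES (?, ?)", (1, "Alice"))
conn.execute("INSERT INTO Employee VALUES (?, ?)", (7, "Bob"))
conn.execute('INSERT INTO "Order" VALUES (?, ?, ?)', (1, "2014-01-17T10:05:00", 7))
conn.execute("INSERT INTO OrderItem VALUES (?, ?, ?, ?)", (1, "2014-01-17T10:05:00", 42, 2))

def customers_served_by(conn, employee_id):
    """Names of the customers whose orders this cashier handled, oldest first."""
    return [r[0] for r in conn.execute(
        'SELECT c.Name FROM "Order" o JOIN Customer c USING (CustomerId) '
        "WHERE o.EmployeeId = ? ORDER BY o.DateTime", (employee_id,))]
```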
Note
Do not mix Process elements (eg. Get_Order) with Data elements (eg. the data model). The two areas are separate, and governed by quite different science. Here we are solving the Data; only the Data; and nothing but the Data. After that, the Process Model is easy.
RecordIds are anti-Relational. They are certainly not needed in a Relational database. Read my other Answers for detailed explanations.
Relational Keys (aka Compound Keys or Composite Keys) are standard fare in a Relational database. They provide far more integrity than a RecordId-based file ever can.
You need to be more precise (state the exact sequence) in defining how an Order is created.
Please feel free to comment or ask specific questions.

Database Design for sub columns or many to many relations

Suppose I have a list of theatres, and each theatre has several classes of tickets, e.g. Rs.120, Rs.100, etc. These classes apply to the morning, noon and night shows, so every class of ticket is available for every show (a many-to-many relationship). I need to model this in a database, but I have a problem modelling the classes and show timings: my current design makes the database redundant.
Input Excel data
A good rule of thumb: when you hit redundant data, make a new table.
Here is how I would break it down, though you could break it down further (see also the term normalization):
Tables:
theater_tbl
ticket_tbl
classes_tbl
Relate each ticket to a class; each theater may sell one or more tickets of any given class.
Information such as the theater's address goes in theater_tbl.
Ticket pricing would go in the ticket table under the type of ticket, unless I misunderstand what a class of ticket is; if a class is really a price level, the pricing should go in classes_tbl instead.
The time of day a ticket relates to should go in the ticket table.
This should get you started. To go further, you could break down show times into another table, and relate classes/tickets to those show times.
It's hard to draw out without a solid example.
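Below is a minimal SQLite sketch of that breakdown. It assumes that classes are price levels, so the price lives on the class, and that show times get their own table. The table and column names are illustrative, not from the question. Each class and each show is stored once, and a junction table resolves the many-to-many:

```python
import sqlite3

SCHEMA = """
CREATE TABLE theater (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT
);
CREATE TABLE ticket_class (           -- one row per class, stored once
    id         INTEGER PRIMARY KEY,
    theater_id INTEGER NOT NULL REFERENCES theater(id),
    price_rs   INTEGER NOT NULL       -- e.g. 120, 100
);
CREATE TABLE show_time (              -- one row per show, stored once
    id         INTEGER PRIMARY KEY,
    theater_id INTEGER NOT NULL REFERENCES theater(id),
    label      TEXT NOT NULL          -- 'morning', 'noon', 'night'
);
-- Junction table: resolves the many-to-many between shows and classes.
CREATE TABLE show_class (
    show_time_id INTEGER NOT NULL REFERENCES show_time(id),
    class_id     INTEGER NOT NULL REFERENCES ticket_class(id),
    PRIMARY KEY (show_time_id, class_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO theater VALUES (1, 'Central', NULL)")
conn.executemany("INSERT INTO ticket_class (theater_id, price_rs) VALUES (1, ?)",
                 [(120,), (100,)])
conn.executemany("INSERT INTO show_time (theater_id, label) VALUES (1, ?)",
                 [("morning",), ("noon",), ("night",)])
# Every class applies to every show of the same theater: generate the pairs.
conn.execute("""
    INSERT INTO show_class (show_time_id, class_id)
    SELECT s.id, c.id FROM show_time s JOIN ticket_class c ON c.theater_id = s.theater_id
""")

def prices_for_show(conn, label):
    """Ticket prices on offer for the show with this label, highest first."""
    return [r[0] for r in conn.execute("""
        SELECT c.price_rs FROM show_class sc
        JOIN show_time s    ON s.id = sc.show_time_id
        JOIN ticket_class c ON c.id = sc.class_id
        WHERE s.label = ? ORDER BY c.price_rs DESC""", (label,))]
```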

Database design & normalization

I'm creating a messaging system for an e-learning platform, and there are some design concerns that I'd like some feedback on.
First of all, it is important for me and my system to be highly modifiable in the future. As such, maintaining a fairly high normalization across my tables is important.
On to how my system will work:
All members (students or teachers) are part of a virtual classroom.
Teachers can create tasks and exercises in these classrooms and assign them to one or multiple students (member_task table not illustrated).
A student can request help for a specific task or exercise by sending a message to the teachers of the classroom.
Messages sent by students are sent to all the teachers. They cannot address a message to a specific teacher.
Messages sent by teachers can be addressed to one or more students.
Students cannot send messages to other students.
Messages behave like chat, meaning that a private conversation starts between a student and all teachers when they send a message.
Here's the ER diagram I made:
So my question is: are these tables normalized properly for my purpose? Is there anything that can be done to reduce redundancy of data across my tables? And out of curiosity, are they in BCNF?
Another question: I don't intend to ever implement delete features anywhere in my system, only "archiving", where the classroom/task/member/message/whatever is simply hidden/deactivated. So is there any reason to actually use FKs?
EDIT: Also, a friend brought to my attention that the Conversations table might be redundant, and it kinda feels so. Thoughts?
Thanks.
In response to your emphasis on "modifiability", which I'm taking to mean with respect to application and schema evolution, I'm actually going to suggest a fairly extreme solution. Before that, some notes on aspects you've mentioned. First, foreign keys represent meaningful constraints in your data. They should always be defined and enforced; they are not there just for cascading deletes. Second, the Conversations table is arguably redundant. It would make sense if you had a notion of a chat "session" that corresponded to a Conversation. Otherwise, you just have a bunch of messages spread through time. The Conversations table could also enable a many-to-many relation between messages and tasks/exercises, for example if you wanted chats that simultaneously covered multiple exercises.
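Foreign keys still earn their keep when nothing is ever deleted. In the minimal SQLite sketch below, with hypothetical tables, "archiving" is an UPDATE that never breaks a reference. Meanwhile, the FK still rejects a message from a sender that does not exist. Note that SQLite only enforces FKs after `PRAGMA foreign_keys = ON`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE member (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    archived INTEGER NOT NULL DEFAULT 0    -- soft delete: hide, never remove
);
CREATE TABLE message (
    id        INTEGER PRIMARY KEY,
    sender_id INTEGER NOT NULL REFERENCES member(id),
    body      TEXT NOT NULL
);
""")
conn.execute("INSERT INTO member (id, name) VALUES (1, 'student')")
conn.execute("INSERT INTO message (sender_id, body) VALUES (1, 'help with task 3')")

# Archiving is an UPDATE, so the FK is never violated and the history survives.
conn.execute("UPDATE member SET archived = 1 WHERE id = 1")

def try_insert_orphan(conn) -> bool:
    """Return True if the database rejected a message from a non-existent sender."""
    try:
        conn.execute("INSERT INTO message (sender_id, body) VALUES (99, 'ghost')")
    except sqlite3.IntegrityError:
        return True
    return False
```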
Now for the extreme suggestion. You could use 6NF. In particular, you might look at its incarnation in anchor modeling. The most notable difference in this approach is that each attribute is modeled as a separate table. 6NF supports temporal databases (supported in anchor modeling via "historized" attributes/ties). This means that situations like a student being associated with a task now but not later won't cause all their messages to disappear. Most relevant to you, all schema modifications are non-destructive and additive, so no old code breaks when you make a change.
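The sketch below makes "each attribute is a separate table" concrete. It is a minimal SQLite sketch loosely in the spirit of anchor modeling. It has an anchor table that holds only identity, plus a historized attribute table with one row per change. The names are hypothetical and do not follow the anchor modeling tooling's own naming conventions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Anchor: identity only, no attributes.
CREATE TABLE message (id INTEGER PRIMARY KEY);
-- Historized attribute: one table per attribute, one row per change.
CREATE TABLE message_body (
    message_id INTEGER NOT NULL REFERENCES message(id),
    body       TEXT    NOT NULL,
    changed_at TEXT    NOT NULL,        -- ISO-8601, sorts chronologically
    PRIMARY KEY (message_id, changed_at)
);
""")

def body_as_of(conn, message_id, ts):
    """Value of the attribute as it stood at time ts, or None if not yet set."""
    row = conn.execute(
        "SELECT body FROM message_body WHERE message_id = ? AND changed_at <= ? "
        "ORDER BY changed_at DESC LIMIT 1", (message_id, ts)).fetchone()
    return row[0] if row else None

def current_body(conn, message_id):
    row = conn.execute(
        "SELECT body FROM message_body WHERE message_id = ? "
        "ORDER BY changed_at DESC LIMIT 1", (message_id,)).fetchone()
    return row[0] if row else None

conn.execute("INSERT INTO message VALUES (1)")
conn.executemany("INSERT INTO message_body VALUES (1, ?, ?)", [
    ("Need help with exercise 2", "2014-01-17T09:00:00"),
    ("Need help with exercise 3", "2014-01-17T09:05:00"),   # an edit, not an overwrite
])

# Schema evolution is additive: a new attribute is a new table; nothing existing changes.
conn.execute("""CREATE TABLE message_priority (
    message_id INTEGER NOT NULL REFERENCES message(id),
    priority   INTEGER NOT NULL,
    changed_at TEXT    NOT NULL,
    PRIMARY KEY (message_id, changed_at))""")
```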
There are downsides. First, it's a bit weird, and in particular anchor modeling (somewhat gratuitously?) introduces a bunch of new terms. Second, it produces weird queries for most relational databases which they may not optimize well. This can sometimes be resolved with materialized views. Third, at the physical level, every attribute is effectively nullable. Finally, the tooling and support, while present, is pretty young. In particular, for MySQL, you may only be "inspired by" what's provided on the anchor modeling site.
As far as the actual database model goes, it would look roughly similar. Anchor modeling uses the term "anchor" for roughly the same thing as an entity, and "tie" for roughly the same thing as a relation. For simplicity, dropping the Conversation relation (and thus directly connecting Message to Task), the image would be similar: you'd have an anchor for Classroom, Member, Message, and Task, and a tie replacing Recipient that you might call ReceivedMessage, representing the relation of "member received message message". The attributes on your entities would be attribute nodes. Making the message attribute on the Message anchor historized would allow messages to be edited if desired, and would support a history of revisions.
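In the same vein, a tie is essentially a table of anchor keys. Here is a minimal sketch of the hypothetical ReceivedMessage tie, with bare anchors for Member and Message and their attributes omitted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE member  (id INTEGER PRIMARY KEY);   -- anchor
CREATE TABLE message (id INTEGER PRIMARY KEY);   -- anchor
-- Tie: "member received message"; replaces the Recipient entity.
CREATE TABLE received_message (
    member_id  INTEGER NOT NULL REFERENCES member(id),
    message_id INTEGER NOT NULL REFERENCES message(id),
    PRIMARY KEY (member_id, message_id)
);
""")
conn.executemany("INSERT INTO member VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO message VALUES (?)", [(10,), (11,)])
# A teacher's message 10 goes to students 1 and 2; message 11 only to student 1.
conn.executemany("INSERT INTO received_message VALUES (?, ?)", [(1, 10), (2, 10), (1, 11)])

def inbox(conn, member_id):
    """IDs of the messages this member received."""
    return [r[0] for r in conn.execute(
        "SELECT message_id FROM received_message WHERE member_id = ? ORDER BY message_id",
        (member_id,))]
```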
One concern I have is that I don't see a Users table holding all the students' and teachers' info (login, email, system id, role, etc.), but I assume there is something similar in your system?
Now, looking at the Members table: students usually change classes every semester or so, and you don't want last semester's students to receive new messages. I would suggest the following:
Members
=============
PK member_id
FK class_id
FK user_id
--------------
join_date
leave_date
active
role
The last two fields might be redundant:
active: an alternative solution if you want to avoid using dates. This becomes false when a user stops being a member of this class. Since there is no delete feature, the Members entry has to be preserved for archive purposes (and as a historical log).
role: depends on how you set up the Users table and roles in your system. If a user entry has role field(s), then this is not needed. However, this field allows the same user to assume different roles in different classes. Example: a 3rd-year student who was a member of this class 2 years ago is now working as a TA/LA (teaching/lab assistant) for the same class. This depends on how the institution works... in my BSc we had the "rule" that anyone with a grade > 8.5/10 in Java could volunteer to run workshops for other students (using the uni's labs). Finally, if this field is used as a bitmask or a constant, it allows roles to be extended (future-proof).
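Here is a minimal SQLite sketch of the Members table above. It assumes hypothetical role bit values and ISO-date text columns. It derives "active" from the dates instead of storing a flag. It also shows the same user holding different roles in the same class over time:

```python
import sqlite3

# Hypothetical role bits; new roles can be added later without a schema change.
STUDENT, TEACHER, ASSISTANT = 1, 2, 4

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users   (user_id  INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE classes (class_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE members (
    member_id  INTEGER PRIMARY KEY,
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    join_date  TEXT    NOT NULL,          -- ISO dates compare correctly as text
    leave_date TEXT,                      -- NULL while still a member
    role       INTEGER NOT NULL           -- bitmask of STUDENT/TEACHER/ASSISTANT
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ann@example.edu"), (2, "ben@example.edu"), (3, "cat@example.edu")])
conn.execute("INSERT INTO classes VALUES (1, 'Java 101')")
conn.executemany(
    "INSERT INTO members (class_id, user_id, join_date, leave_date, role) "
    "VALUES (1, ?, ?, ?, ?)",
    [(1, "2012-09-01", "2013-01-31", STUDENT),   # earlier student: kept, not deleted
     (1, "2014-09-01", None, ASSISTANT),         # same user, now a TA in the same class
     (2, "2014-09-01", None, STUDENT),
     (3, "2010-09-01", None, TEACHER)])

def recipients(conn, class_id, on_date, role_mask):
    """Users who are members of class_id on on_date and hold any role in role_mask."""
    return [r[0] for r in conn.execute("""
        SELECT DISTINCT user_id FROM members
        WHERE class_id = ? AND join_date <= ?
          AND (leave_date IS NULL OR leave_date > ?)
          AND (role & ?) != 0
        ORDER BY user_id""", (class_id, on_date, on_date, role_mask))]
```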
As for FKs, I always suggest using them for data consistency. Things can get really ugly really fast without FKs. The limitations they impose can be worked around, and they are usually needed: what is the purpose of archiving a message with a sender_id if the sender has been deleted by accident? Also note that some systems index FK columns automatically (MySQL's InnoDB, for example, creates the index if one does not exist), which can improve the performance of queries and joins.
Hope the above helps and doesn't confuse things :)

How to avoid paying for the same item multiple times

I'm trying to design a database for an e-commerce and order-fulfilment platform. The company currently has a distribution centre for fulfilling orders, but they want to extend this to use their stores for part of the fulfilment process. I have designed a database for "internet sales" and "store sales", but I am stuck on the fulfilment of internet orders, and I wonder if any of you can help me with this.
Scenario: when the customer places an order and the distribution centre doesn't have the item in stock to ship to the customer, the item needs to be taken from one of the stores and then sent to the customer.
But the problem is that I can't just take an item from a store and send it to the customer: because the item hasn't been sold in the store, the store's stock records won't be updated. If I put the item through the cash register, the item is removed from the stock table, but then there are two transactions for the same item: one from the internet, and the other from the store.
I guess my question is, how do I go about processing internet orders, and avoid having two transactions on the same item?
Any helpful pointers on this issue are greatly appreciated.
Update: here's what I have done so far, after advice from Jo Douglass:
Database Design Here
Sorry, I can't post images because I don't have enough points. Please note that the above database design isn't complete.
It sounds like you have a Transaction entity, and you have or are planning some logic which ensures that when one of these is created for an Item, your system knows to deplete the stock level for the relevant location (either a store or the distribution centre).
You could use an entity which shows an Item being transferred from one location (a store) to another (a distribution centre), and then create some logic which works very similarly to your existing logic - depleting the stock level in the starting location, and increasing the stock level in your destination location. Then when you carry out the last part of the process (sending the item to the customer), you'll have a Transaction showing that and depleting the distribution centre's stock level. Depending on the rest of your model, you might carry this out via a change to the Transaction entity, or by creating a new entity altogether.
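Below is a minimal sketch of the transfer idea, in Python with SQLite. The `stock_transfer` and `sale` tables are hypothetical. The point is that moving stock and selling it are separate records, and each is applied atomically. As a result, only one sales transaction exists per item:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (
    location TEXT    NOT NULL,           -- e.g. 'DC' or 'store-12'
    item     TEXT    NOT NULL,
    qty      INTEGER NOT NULL CHECK (qty >= 0),
    PRIMARY KEY (location, item)
);
-- Hypothetical transfer entity: moving goods is recorded, but it is not a sale.
CREATE TABLE stock_transfer (
    id            INTEGER PRIMARY KEY,
    item          TEXT    NOT NULL,
    from_location TEXT    NOT NULL,
    to_location   TEXT    NOT NULL,
    qty           INTEGER NOT NULL CHECK (qty > 0)
);
CREATE TABLE sale (                      -- the one and only sales transaction
    id       INTEGER PRIMARY KEY,
    item     TEXT    NOT NULL,
    location TEXT    NOT NULL,           -- where the stock left from
    qty      INTEGER NOT NULL CHECK (qty > 0)
);
""")
conn.executemany("INSERT INTO stock VALUES (?, ?, ?)",
                 [("DC", "kettle", 0), ("store-12", "kettle", 3)])

def move_stock(conn, item, src, dst, qty):
    """Record a transfer and adjust both stock levels in one atomic unit.
    Assumes a stock row already exists for the item at both locations."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("INSERT INTO stock_transfer (item, from_location, to_location, qty) "
                     "VALUES (?, ?, ?, ?)", (item, src, dst, qty))
        conn.execute("UPDATE stock SET qty = qty - ? WHERE location = ? AND item = ?",
                     (qty, src, item))
        conn.execute("UPDATE stock SET qty = qty + ? WHERE location = ? AND item = ?",
                     (qty, dst, item))

def sell(conn, item, location, qty):
    with conn:
        conn.execute("INSERT INTO sale (item, location, qty) VALUES (?, ?, ?)",
                     (item, location, qty))
        conn.execute("UPDATE stock SET qty = qty - ? WHERE location = ? AND item = ?",
                     (qty, location, item))

move_stock(conn, "kettle", "store-12", "DC", 1)   # store stock -> DC, no sale recorded
sell(conn, "kettle", "DC", 1)                     # the single sales transaction
```

If a transfer would drive a stock level negative, the CHECK constraint fails. The whole transfer, including its `stock_transfer` row, is then rolled back.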
Alternatively, if that doesn't really model what's happening in the business very well, then maybe you just need to modify your logic (and possibly your model - hard to tell without seeing your existing model). Rather than only being able to create store transactions via use of the cash register, perhaps you simply need to be able to create a store transaction that's been kicked off via the Internet.
One idea is to go ahead and treat the item as sold from the store (through an online transaction) and credit the store's account with the sale price. The distributor has probably already received the wholesale price from the store so it's happy, the store gets credit for the sale (with at least some part of the shipping charges) so it's happy, and you don't have to create new transaction codes or any other modification to the existing database.
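The alternative can be sketched the same way: the sale is recorded once at the store but tagged as triggered online. This sketch assumes a hypothetical `store_sale` table, and assumes each internet order line is fulfilled by a single store. Under those assumptions, a UNIQUE constraint makes a second sale for the same order line impossible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical: a store sale rung up at the till or triggered by a web order.
CREATE TABLE store_sale (
    id        INTEGER PRIMARY KEY,
    store     TEXT    NOT NULL,
    item      TEXT    NOT NULL,
    qty       INTEGER NOT NULL CHECK (qty > 0),
    channel   TEXT    NOT NULL CHECK (channel IN ('till', 'online')),
    web_order TEXT,                               -- the internet order being fulfilled
    CHECK ((channel = 'online') = (web_order IS NOT NULL)),
    UNIQUE (web_order, item)                      -- NULLs are distinct, so till sales are unaffected
);
""")

def record_sale(conn, store, item, qty, web_order=None):
    channel = "online" if web_order is not None else "till"
    conn.execute(
        "INSERT INTO store_sale (store, item, qty, channel, web_order) VALUES (?, ?, ?, ?, ?)",
        (store, item, qty, channel, web_order))

record_sale(conn, "store-12", "kettle", 1, web_order="WEB-1001")   # fulfils the web order
record_sale(conn, "store-12", "kettle", 1)                         # ordinary till sale
```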

Database model for a 24/7 Staff roster at a casino

We presently use a pen-and-paper roster to manage table games staff at the casino. Each row is an employee, each column is a 20-minute block of time, and each cell records which table the employee is assigned to, or that they've been assigned a break. Employees' shift start and end times vary, as do the games (skills) they can deal. We need to keep the rosters for 7 years; with paper this is fairly easy. I want to develop a digital application and am having difficulty working out how to store the data in a database for archiving.
I'm fairly new to working with databases. I think I understand how to model the data for a graph database like Neo4j, but I had difficulty when it came to working with time. I've tried to learn about relational databases (RDBMSs) like MySQL; below is how I think the data should be modelled. If I'm going in the wrong direction, or if a different database type would be more appropriate, please point it out; it would be greatly appreciated!
Basic Data
Here is some basic data to work with before we factor in scheduling/time.
Employee
- ID Number
- Name
- Skills (Blackjack, Baccarat, Roulette, etc)
Table
- ID Number
- Skill/Type (Can only be one skill)
Would it be better to store the roster data as a file, such as JSON, instead? Time-sensitive data wouldn't be so much of a problem then. The benefit of going digital with a database would be queries, which could help with time-consuming tasks where human error is common.
Possible Queries
Note: Staff that are on shift are either on a break or on the floor (assigned to a table), Skills have a major or minor type based on difficulty to learn.
What staff have been on the floor for 80 minutes or more? (They are due for a break)
What open tables can I assign this employee to based on their skillset?
I need an employee who has the Baccarat skill but has not already been assigned to a Baccarat table.
Which employee(s) were on this table during this period of time?
Where was this employee at this point in time?
Who is on shift right now?
How many staff on shift can deal Blackjack?
How many staff have 3 major skills?
What staff have had the Baccarat skill for at least 3 months?
These queries could also be sorted by alphabetical order or time, skill etc.
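Several of these queries become straightforward if each stint is stored as a row with a start and end time. A stint here means a table assignment or a break. This replaces storing each stint as a cell in a 20-minute column. Below is a minimal SQLite sketch, assuming a hypothetical `assignment` table with half-open [start, end) ranges. It covers "Where was this employee at this point in time?" and "What employee(s) were on this table during this period of time?":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical: one row per stint, instead of one column per 20-minute block.
CREATE TABLE assignment (
    employee_id INTEGER NOT NULL,
    activity    TEXT    NOT NULL CHECK (activity IN ('table', 'break')),
    table_id    INTEGER,                         -- NULL exactly when on a break
    starts_at   TEXT    NOT NULL,                -- half-open range [starts_at, ends_at)
    ends_at     TEXT    NOT NULL,
    CHECK ((activity = 'table') = (table_id IS NOT NULL)),
    CHECK (starts_at < ends_at)
);
""")
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?, ?, ?)", [
    (1, "table", 5,    "2014-01-17 10:00", "2014-01-17 11:20"),
    (1, "break", None, "2014-01-17 11:20", "2014-01-17 11:40"),
    (2, "table", 5,    "2014-01-17 11:20", "2014-01-17 12:40"),
])

def where_was(conn, employee_id, at):
    """('table', table_id) or ('break', None); None if off shift at that instant."""
    return conn.execute(
        "SELECT activity, table_id FROM assignment "
        "WHERE employee_id = ? AND starts_at <= ? AND ? < ends_at",
        (employee_id, at, at)).fetchone()

def who_was_on(conn, table_id, period_start, period_end):
    """Employees whose stint at table_id overlaps [period_start, period_end)."""
    return [r[0] for r in conn.execute(
        "SELECT DISTINCT employee_id FROM assignment "
        "WHERE table_id = ? AND starts_at < ? AND ends_at > ? ORDER BY employee_id",
        (table_id, period_end, period_start))]
```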
I'm pretty sure I know how to perform these queries with Cypher in Neo4j, provided I model the data right. I'm not as knowledgeable with SQL queries; I've read they can get a bit complicated depending on the query and structure.
----------------------------------------------------------------------------------------
MySQL Specific
An employee table could contain properties such as their ID number and name, but am I right that their skills and shifts would be separate tables that reference the employee by a unique integer (I think this is called a foreign key)?
Another table could store the gaming Tables; these would have their own ID and reference a skill/game type with a foreign key.
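Below is a minimal SQLite sketch of that layout. `employee_skill` is the many-to-many between employees and skills. Adding dates to it also answers "What staff have had the Baccarat skill for at least 3 months?". The table names, columns and sample data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE skill (
    id       INTEGER PRIMARY KEY,
    name     TEXT    NOT NULL UNIQUE,          -- 'Blackjack', 'Baccarat', ...
    is_major INTEGER NOT NULL                  -- 1 = major, 0 = minor
);
-- Many-to-many between employees and skills, with history.
CREATE TABLE employee_skill (
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    skill_id    INTEGER NOT NULL REFERENCES skill(id),
    acquired_on TEXT    NOT NULL,               -- ISO date
    removed_on  TEXT,                           -- NULL while the skill is held
    PRIMARY KEY (employee_id, skill_id, acquired_on)
);
-- A gaming table points at exactly one skill/game type.
CREATE TABLE gaming_table (
    id       INTEGER PRIMARY KEY,
    skill_id INTEGER NOT NULL REFERENCES skill(id)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ana"), (2, "Raj")])
conn.executemany("INSERT INTO skill VALUES (?, ?, ?)",
                 [(1, "Blackjack", 0), (2, "Baccarat", 1), (3, "Roulette", 1)])
conn.executemany("INSERT INTO employee_skill VALUES (?, ?, ?, ?)", [
    (1, 1, "2012-05-01", None),
    (1, 2, "2013-06-01", None),
    (2, 2, "2014-03-01", None),
])

def held_skill_since(conn, skill_name, months, today):
    """Employees who currently hold skill_name and acquired it at least `months` ago."""
    return [r[0] for r in conn.execute("""
        SELECT e.name FROM employee e
        JOIN employee_skill es ON es.employee_id = e.id
        JOIN skill s ON s.id = es.skill_id
        WHERE s.name = ? AND es.removed_on IS NULL
          AND es.acquired_on <= date(?, ?)
        ORDER BY e.name""", (skill_name, today, f"-{months} months"))]
```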
To record data like the pen-and-paper roster, could each day have a table with one column per 20-minute block, from 0000 through 2340? Before the time columns I could have one for staff, where each employee is represented by their foreign key; the time columns would then hold foreign keys to the assigned gaming Tables. The rows are bound to have many empty cells, since an employee's shift won't be 24/7. But if I'm using foreign keys to reference gaming Tables, how do I record that an employee is on break? Should I treat, say, the first gaming Table entry as a break?
I may need to complicate things further, though. Over time, management will try different gaming Table layouts, and some gaming Tables can be converted from, say, Blackjack to Baccarat. This is bound to happen quite a bit over 7 years. Would I want to create new gaming Table entries, or add a foreign-key column referring to a new table that stores the history of game types over periods of time? Employees will also learn to deal new games during their career, and very rarely they may have a skill removed.
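One common way to keep a gaming Table's identity while its game changes is a history table with validity dates. Below is a minimal SQLite sketch under that assumption, using hypothetical names. The same pattern would work for employees gaining or losing skills:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE gaming_table (id INTEGER PRIMARY KEY);   -- physical table keeps its identity
-- Hypothetical history table: which game a table offered, and when.
CREATE TABLE table_game (
    table_id   INTEGER NOT NULL REFERENCES gaming_table(id),
    game       TEXT    NOT NULL,
    valid_from TEXT    NOT NULL,                 -- ISO date, inclusive
    valid_to   TEXT,                             -- exclusive; NULL = current
    PRIMARY KEY (table_id, valid_from)
);
""")
conn.execute("INSERT INTO gaming_table VALUES (5)")
conn.executemany("INSERT INTO table_game VALUES (?, ?, ?, ?)", [
    (5, "Blackjack", "2013-01-01", "2014-02-01"),
    (5, "Baccarat",  "2014-02-01", None),
])

def game_at(conn, table_id, on_date):
    """Game offered at table_id on on_date, or None if no record covers that date."""
    row = conn.execute("""
        SELECT game FROM table_game
        WHERE table_id = ? AND valid_from <= ?
          AND (valid_to IS NULL OR ? < valid_to)""",
        (table_id, on_date, on_date)).fetchone()
    return row[0] if row else None
```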
----------------------------------------------------------------------------------------
Neo4j Specific
With this data would I have an Employee and a Table node that have "isA" relationship edges mapping to actual employees or tables?
I imagine with the skills for the two types I would be best with a Skill node and establish relationships like so?: Blackjack->isA->Skill, Employee->hasSkill->Blackjack, Table->typeIs->Blackjack?
TIME
I find difficulty when I want this database to now work with a timeline. I've come across the following suggestions for connecting nodes with time:
Unix Epoch seems to be a common recommendation?
Connecting nodes to a year/month/day graph?
Lucene timeline? (I don't know much about this or how to work with it, have seen some mention it)
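Whichever store is chosen, it helps to have one consistent mapping between instants and the paper roster's 20-minute columns. Below is a small Python sketch. It assumes the casino's UTC offset is a whole number of 20-minute steps, which is true for whole-hour offsets. Under that assumption, epoch-aligned blocks line up with local ones:

```python
from datetime import datetime, timezone

BLOCK_MINUTES = 20
BLOCKS_PER_DAY = 24 * 60 // BLOCK_MINUTES    # 72 columns: 0000, 0020, ..., 2340

def block_label(dt: datetime) -> str:
    """Paper-roster column label for the 20-minute block containing dt (wall-clock time)."""
    minutes = dt.hour * 60 + dt.minute
    start = minutes - minutes % BLOCK_MINUTES
    return f"{start // 60:02d}{start % 60:02d}"

def block_start_epoch(epoch_seconds: int) -> int:
    """Round a Unix timestamp (seconds) down to the start of its 20-minute block."""
    return epoch_seconds - epoch_seconds % (BLOCK_MINUTES * 60)

def block_start_utc(epoch_seconds: int) -> datetime:
    """Start of the block as an aware UTC datetime."""
    return datetime.fromtimestamp(block_start_epoch(epoch_seconds), tz=timezone.utc)
```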
And some cases with how time and data relate:
Staff work different days and start/end times from week to week. This could be a shift node with properties {shiftStart, shiftEnd, actualStart, actualEnd}, since staff may arrive late or get sick during a shift. Would this be the right way to link each shift to an employee? Employee(node)->Shifts(groupNode)->Shift(node)
Tables and staff may have their skill data modified, which could be an issue for archived data; I think the solution is to put a time property on the relationship to the skill?
We open and close tables throughout the day. Each table has open/close times for each day, which could change within a month depending on what management wants; in addition, the times are not strict, and for various reasons a manager may open or close tables during the shift. The open/closed status of a table node may only be relevant for queries during the shift, which confuses me: I'd want it for queries, but for archiving with time it might not make sense?
It's with queries that I have trouble deciding when to use a node and when to add a property to a node. An Employee has a name and an ID number; if I wanted to find an employee by their ID number, would it be better to have that as a node of its own? It would be more direct, right, instead of going through all employees for that unique ID number?
I've also just come across labels. I can see that they would be useful for typing employee and table nodes, rather than grouping them under a node. I think an employee's shifts should still be grouped under a shifts node. If I were to run Cypher queries for employees working shifts during a time period, a label might be appropriate, but should it be applied to the individual shift nodes or to the shifts group node that links back to the employee? Might I need to add a property to the individual shift nodes, or to the relationship to the shifts group node? I'm not sure there should be a shifts group node at all; I'm assuming that reducing the number of edges connected to the employee node would be optimal for queries.
----------------------------------------------------------------------------------------
If there are any great resources for learning about database development, I'd love to hear about them; there is so much information and so many options out there that it's difficult to know where to begin. Thanks for your time :)
Thanks for spending the time to put a quality question together. Your requirements are great and your specifications of your system are very detailed. I was able to translate your specs into a graph data model for Neo4j. See below.
Above you'll see a fairly self-explanatory graph data model. In case you are unfamiliar with this, I suggest reading Graph Databases: http://graphdatabases.com/ (on this website you can get a free digital PDF copy of the book, and if you want to buy a hard copy you can find it on Amazon).
Let's break down the graph model in the image. At the top you'll see a time-indexing structure, (Year)->(Month)->(Day)->(Hour), which I have abbreviated as Y M D H. The ellipses indicate that the graph continues, but to save space on the screen I've only shown a sub-graph.
This time index gives you a way to generate time series or ask certain questions on your data model that are time specific. Very useful.
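If you populate the Y M D H index yourself, the nodes and edges are easy to generate before import. Below is a hypothetical Python helper that yields (parent, child) key pairs for a date range. The key format is an assumption, and how you load the pairs into Neo4j is up to you:

```python
from datetime import date, timedelta
from typing import Iterator, Tuple

def time_tree_edges(start: date, end: date) -> Iterator[Tuple[str, str]]:
    """Yield (parent, child) node keys for a Year->Month->Day->Hour tree, start..end inclusive."""
    seen = set()
    d = start
    while d <= end:
        year = f"Y{d.year}"
        month = f"M{d.year}-{d.month:02d}"
        day = f"D{d.isoformat()}"
        for edge in ((year, month), (month, day)):
            if edge not in seen:          # emit each Year->Month and Month->Day edge once
                seen.add(edge)
                yield edge
        for h in range(24):
            yield (day, f"H{d.isoformat()}T{h:02d}")
        d += timedelta(days=1)
```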
The bottom portion of the image contains your enterprise data model for your casino. The nodes represent your business objects:
Game
Table
Employee
Skill
What's great about graph databases is that you can look at this image and semantically understand the language of your question by jumping from one node to another by their relationships.
Here is a Cypher query you can use to ask your questions about the data model. You can just tweak it slightly to match your questions.
MATCH (employee:Employee)-[:HAS_SKILL]->(skill:Skill),
(employee)<-[:DEALS]-(game:Game)-[:LOCATION]->(table:Table),
(game)-[:BEGINS]->(hour:H)<-[*]-(day:D)<-[*]-(month:M)<-[*]-(year:Y)
WHERE skill.type = "Blackjack" AND
day.day = 17 AND
month.month = 1 AND
year.year = 2014
RETURN employee, skill, game, table
The above query finds the sub-graph of all employees who have the Blackjack skill, together with the games they dealt and the tables those games were located at, on a specific date (1/17/14).
To do this in SQL would be very difficult. The next thing you need to think about is importing your data into a Neo4j database. If you're curious about how to do that, please look at other questions here on SO, and if you need more help, feel free to post another question or reach out to me on Twitter @kennybastani.
Cheers,
Kenny