I'm a little stumped on whether i can make this process of changing addresses easier. I'll explain the situation:
Basically I have three entities, Students, Addresses, StudentsAddresses. Students have many addresses, since they can change alot and rapidly (especially foster kids / homeless kids). So ill be changing them a lot. However based on each address I Want a user to attach (enter it via the UI) the price it would cost to pick that student up via bus service. So my initial thought was, ok, let me attach a column onto my join table 'StudentsAddresses' called 'dailyPrice', this is the cost for each day a student is picked up, and another column called 'adjustmentPrice', which is an additional cost for whatever special circumstance that requires extra work to pick up a student. Is my thinking going to cause me problems the more students I have in the future? Will it get harder to manage?
Another option I thought about, was creating a new Table called Pricing. And another join-type table called StudentsAddressesPricing
StudentsAddressPricing has three columns,
studentId
addressId
pricingId
each field connects the three together. So if i ever needed Students, with their addresses, and the pricing, i would query this table and eager load Students, Addresses, and Pricing. Does this approach seem much cleaner since i've abstracted pricing out a bit? Trying to determine the best way to go about this without having to many headaches in the future incase I wan't to add more attributes pricing related, or address related.
And then I even thought, hey what if pricing is just different for one day? How would I even consider that. Would I need a different kind of entity to handle that? Is doing alot of joins going to hurt my application performance?
Just looking for some insight on how others would do it, and criticism on why im off the ball.
The main question you should ask yourself is: on what does the price depend?
If the price is determined by the address, you might as well add it to addresses. If the price also depends on the student (e.g., depending on their financial situation), it would make sense to put it into studentsaddresses.
In other words: The table where the price is stored should have foreign keys to everything outside the table that determines the price. If that makes it fit into one of the existing tables, keep it there.
Related
If I have a list of theatres and in each theatre there are several classes of tickets eg. Rs.120, Rs.100 etc. These classes will apply for morning, noon and night shows. So all the classes of tickets will be available for all the shows(Many to Many Relationship) I need to model this as a database. I have a problem in modelling the classes and show timings. This makes the data base redundant.
Input Excel data
A good rule of thumb, is when you hit redundant data, make a new table.
Here is how I would break it down, though you could break it down further (also see a term normalize):
Tables:
theater_tbl
ticket_tbl
classes_tbl
relate each ticket to a class, and each theater may sell one or more tickets of any given class.
Information like address of theater go with theater_tbl
Ticket pricing would go in the ticket table under the type of ticket, unless I misunderstand what a class of ticket is, then the pricing should go there.
The time of day a ticket relates to should go in the ticket table.
This should get you started. To go further, you could break down show times into another table, and relate classes/tickets to those show times.
Its hard to draw out without a solid example.
We presently use a pen/paper based roster to manage table games staff at the casino. Each row is an employee, each column is a 20 minute block of time and each cell represents what table the employee is assigned to, or alternatively they've been assigned to a break. The start and end time of shifts for employees vary as do the games/skills they can deal. We need to keep a copy of the rosters for 7 years, with paper this is fairly easy, I'm wanting to develop a digital application and am having difficulty how to store the data in a database for archiving.
I'm fairly new to working with databases, I think I understand how to model the data for a graph database like neo4j, but I had difficulty when it came to working with time. I've tried to learn about RDBMS databases like MySQL, below is how I think the data should be modelled. Please point out if I'm going in the wrong direction or if a different database type would be more appropriate, it would be greatly appreciated!
Basic Data
Here is some basic data to work with before we factor in scheduling/time.
Employee
- ID Number
- Name
- Skills (Blackjack, Baccarat, Roulette, etc)
Table
- ID Number
- Skill/Type (Can only be one skill)
It may be better to store the roster data as a file like JSON instead? Time sensitive data wouldn't be so much of a problem then. The benefit of going digital with a database would be queries, these could help assist time consuming tasks where human error is common.
Possible Queries
Note: Staff that are on shift are either on a break or on the floor (assigned to a table), Skills have a major or minor type based on difficulty to learn.
What staff have been on the floor for 80 minutes or more? (They are due for a break)
What open tables can I assign this employee to based on their skillset?
I need an employee that has Baccarat skill but is not already been assigned to a Baccarat table.
What employee(s) was on this table during this period of time?
Where was this employee at this point in time?
Who is on shift right now?
How many staff on shift can deal Blackjack?
How many staff have 3 major skills?
What staff have had the Baccarat skill for at least 3 months?
These queries could also be sorted by alphabetical order or time, skill etc.
I'm pretty sure I know how to perform these queries with cypher for neo4j provided I model the data right. I'm not as knowledgeable with SQL queries, I've read it can get a bit complicated depending on the query and structure.
----------------------------------------------------------------------------------------
MYSQL Specific
An employee table could contain properties such as their ID number and Name, but am I right that for their skills and shifts these would be separate tables that reference the employee by a unique integer(I think this is called a foreign key?).
Another table could store the gaming Tables, these would have their own ID and reference a skill/gametype with a foreign key.
To record data like the pen/paper roster, each day could have a table with columns starting from 0000 increasing by 20 in value going all the way to 2340? Prior to the time columns I could have one for staff where each employee is represented with their foreign key, the time columns would then have foreign keys to the assigned gaming Tables, the row data is bound to have many cells that aren't populated since the employee shift won't be 24/7. If I'm using foreign keys to reference gaming Tables I now have a problem when the employee is on break? Unless I treat say the first gaming Table entry as a break?
I may need to further complicate things though, management will over time try different gaming Table layouts, some of the gaming Tables can be converted from say Blackjack to Baccarat. this is bound to happen quite a bit over 7 years, would I want to be creating new gaming Table entries or add a column to use a foreign key and refer to a new table that stores the history of game types during periods of time? Employees will also learn to deal new games during their career, very rarely they may also have the skill removed.
----------------------------------------------------------------------------------------
Neo4j Specific
With this data would I have an Employee and a Table node that have "isA" relationship edges mapping to actual employees or tables?
I imagine with the skills for the two types I would be best with a Skill node and establish relationships like so?: Blackjack->isA->Skill, Employee->hasSkill->Blackjack, Table->typeIs->Blackjack?
TIME
I find difficulty when I want this database to now work with a timeline. I've come across the following suggestions for connecting nodes with time:
Unix Epoch seems to be a common recommendation?
Connecting nodes to a year/month/day graph?
Lucene timeline? (I don't know much about this or how to work with it, have seen some mention it)
And some cases with how time and data relate:
Staff have varied days and start/end times from week to week, this could be shift node with properties {shiftStart,shiftEnd,actualStart,actualEnd}, staff may arrive late or get sick during shift. Would this be the right way to link each shift to an employee? Employee(node)->Shifts(groupNode)->Shift(node)
Tables and Staff may have skill data modified, with archived data this could be an issue, I think the solution is to have time property on the relationship to the skill?
We open and close tables throughout the day, each table has open/close times for each day, this could change in a month depending on what management wants, in addition the times are not strict, for various reasons a manager may open or close tables during the shift. The open/closed status of a table node may only be relevant for queries during the shift, which confuses me as I'd want this for queries but for archiving with time it might not make sense?
It's with queries that I have trouble deciding when to use a node or add a property to a node. For an Employee they have a name and ID number, if I wanted to find an employee by their ID number would it be better to have that as a node of it's own? It would be more direct right, instead of going through all employees for that unique ID number.
I've also come across labels just recently, I can understand that those would be useful for typing employee and table nodes rather than grouping them under a node. With the shifts for an employee I think should continue to be grouped with a shifts node, If I were to do cypher queries for employees working shifts through a time period a label might be appropriate, however should it be applied to individual shift nodes or the shifts group node that links back to the employee? I might need to add a property to individual shift nodes or the relationship to the shifts group node? I'm not sure if there should be a shifts group node, I'm assuming that reducing the edges connecting to the employee node would be optimal for queries.
----------------------------------------------------------------------------------------
If there are any great resources I can learn about database development that'd be great, there is so much information and options out there it's difficult to know what to begin with. Thanks for your time :)
Thanks for spending the time to put a quality question together. Your requirements are great and your specifications of your system are very detailed. I was able to translate your specs into a graph data model for Neo4j. See below.
Above you'll see a fairly explanatory graph data model. In case you are unfamiliar with this, I suggest reading Graph Databases: http://graphdatabases.com/ -- This website you can get a free digital PDF copy of the book but in case you want to buy a hard copy you can find it on Amazon.
Let's break down the graph model in the image. At the top you'll see a time indexing structure that is (Year)->(Month)->(Day)->(Hour), which I have abbreviated as Y M D H. The ellipses indicate that the graph is continuing, but for the sake of space on the screen I've only showed a sub-graph.
This time index gives you a way to generate time series or ask certain questions on your data model that are time specific. Very useful.
The bottom portion of the image contains your enterprise data model for your casino. The nodes represent your business objects:
Game
Table
Employee
Skill
What's great about graph databases is that you can look at this image and semantically understand the language of your question by jumping from one node to another by their relationships.
Here is a Cypher query you can use to ask your questions about the data model. You can just tweak it slightly to match your questions.
MATCH (employee:Employee)-[:HAS_SKILL]->(skill:Skill),
(employee)<-[:DEALS]-(game:Game)-[:LOCATION]->(table:Table),
(game)-[:BEGINS]->(hour:H)<-[*]-(day:D)<-[*]-(month:M)<-[*]-(year:Y)
WHERE skill.type = "Blackjack" AND
day.day = 17 AND
month.month = 1 AND
year.year = 2014
RETURN employee, skill, game, table
The above query finds the sub-graph for all employees who have the skill Blackjack and their table and location on a specific date (1/17/14).
To do this in SQL would be very difficult. The next thing you need to think about is importing your data into a Neo4j database. If you're curious on how to do that please look at other questions here on SO and if you need more help, feel free to post another question or reach out to me on Twitter #kennybastani.
Cheers,
Kenny
I'm setting up a database to run practice management software for lawsuits. When adding people associated with the suit, some of them will be repeat parties (eg lawyers for the firm) and some will be one-off parties (witnesses, etc). Looking for input on whether to make 1 "case users" table with values for a user id as well as the rest of the info for the one-off parties, or make 2 tables, one being "case users-firm" with 2 columns for the case and the user id, and another "case users-other" with the one-off party information.
It's pretty common to have a "Persons" table, filled with things common to all people like first name, last name, and a primary key. Then store that key everywhere you might want a person. Who knows? Your lawyer might be a witness. No need to duplicate the entry, when they are in fact the same person.
I don't see why you would want to have two tables, especially if they are going to have mostly the same fields. On the other hand, if you want to keep a lot more info for the attorneys than for the witnesses (or the other way around) two tables could be beneficial...
Just one other point to add. For privacy and data protection it is best not to store, encourage your users to store, or even set up a framework for storing, any more personal data than you actually need for your purposes. ( In the UK, think 'Data Protection' law ). So unless you are planning on evolving a kind of legal-profession facebook, I would keep the tables separate on this occasion.
currently Im working on a project that, at first glance, will require many tables in a database. Most of the tables are fairly straightforward however I do have an issue. One of the tables will be a list of members for the website, things like username, password, contact info, bio, education, etc will be included. This is a simple design, however, there is also a need for each member to have their availability entered and store in the database as well. Availability is defined as a date and time range. Like available on 4/5/2011 from 1pm to 6pm EST, or NOT available every friday after 8pm EST. For a single user, this could be a table on its own, but for many users, Im not sure how to go about organizing the data in a manageable fashion. First thought would be to have code to create a table for each user, but that could mean alot of tables in the database in addition to the few I have for other site functions. Logically i could use the username appended to Avail_ or something for the table name ie: Avail_UserBob and then query that as needed. But im curious if anyone can think of a better option than having the potential of hundreds of tables in a single database.
edit
So general agreement would be to have a table for members, unique key being ID for instance. Then have a second table for availability (date, start time, end time, boolean for available or not, and id of member this applies to). Django might sound nice and work well, but i dont have the time to spend learning another framework while working on this project. The 2 table method seems plausable but Im worried about the extra coding required for features that will utilize the availability times to A) build a calender like page to add, edit, or remove entered values, and B) match availabilities with entries from another table that lists games. While I might have more coding, I can live with that as long as the database is sound, functional, and not so messy. Thanks for the input guys.
Not to sound like a troll, but you should take a look into using a web framework to build most of this for you. I'd suggest taking a look at Django. With it you can define the type of fields you wish to store (and how they relate) and Django builds all the SQL statements to make it so. You get a nice admin interface for free so staff can login and add/edit/etc.
You also don't have to worry about building the login/auth/change password, etc. forms. all that session stuff is taken care of by Django. You get to focus on what makes your project/app unique.
And it allow you to build your project really, really fast.
djangoproject.org
I don't have any other framework suggestions that meet your needs. I do... but I think Django will fit the bill.
Create a table to store users. Use its primary key as foreign key in other tables.
The databases are written to hold many many rows in a table. There are not optimized for table creation. So it is not a good idea to create a new table for each user. Instead give each user an unique identifier and put the availability in a separate table. Provide an additional flag to make an entry valid or invalid.
Create a table of users; then create a table of availabilities per user. Don't try to cram availabilities into the user table: that will guarantee giant grief for you later on; and you'll find you have to create an availabilities table then.
Google database normalization to get an idea why.
Take it as truth from one who has suffered such self-inflicted grief :-)
I'm planning a database who has a couple of tables who contain plenty of address information, city, zip code, email address, phone #, fax #, and so on (about 11 columns worth of it), a table is an organizations table containing (up to) 2 addresses (legal contacts and contacts they should actually be used), plus every user has the same information tied to him.
We are going to have to run some geolocation stuff on those addresses too (like every address that's within X Kilometers from another address).
I have a bunch of options, each with its own problem:
I could put all the information inside every table but that would make for tables with a very large amount of columns which I'd have problems indexing, and if I change my address format it'll take a while to fix it.
I could put all the information inside an array and serialize it, then store the serialized information in one field, same problem with the previous method with a little less columns and much less availability through mysql queries
I could create a separate table with address information and link it to the other tables either by
putting an address_id column in the users and organizations table
putting a related_id and related_table columns in the addresses table
That should keep stuff tidier, but it might create some unforeseen problems with excessive joining or whatever.
Personally I think that solution 3.2 is the best, but I'm not too confident about it, so I'm asking for opinions.
Option 2 is definitely out as it would put the filtering logic into your codes instead of letting the DBMS handle them.
Option 1 or 3 will depend on your need.
if you need fast access to all the data, and you usually access both addresses along with the organization information, then you might consider option 1. But this will make it difficult to query out (i.e. slow) if the table get too big in mysql.
option 3 is good provided you index the tables correctly.