I am creating a database which needs to allow booking a resource from a start time to an end time on a particular day. For example, I have 11 badminton courts. These courts can be booked for 1 hour (the duration can vary), and in a day each court takes 18 bookings, from 6 am until 12 midnight (assuming each booking is for one hour). The price of a booking also varies: for example, morning charges are higher than daytime charges, and weekend charges are higher than weekday charges.
Now my question is: is it advisable to pre-populate slots and then book them for users depending on availability? In this case, for the above example, if I need to store slots for the next month then I will have to store 11*18*30 = 5940 records in advance without any real bookings. Every midnight I will need to run a script to create slots. If the number of clubs increases, this number can become huge. Is this a good design for such systems? If not, then what are the better designs in these scenarios?
club name | court | date       | start_time | end_time | status    | charge
a         | c1    | 20/04/2015 | 6:00       | 7:00     | available |
a         | c1    | 20/04/2015 | 7:00       | 8:00     | available |
.
.
.
a         | c1    | 20/04/2015 | 23:00      | 24:00    | available |
.
.
a         | c11   | 20/04/2015 | 23:00      | 24:00    | available |
Now my question is: is it advisable to pre-populate slots and then book them for users depending on availability? In this case, for the above example, if I need to store slots for the next month then I will have to store 11x18x30 = 5940 records in advance without any real bookings. Every midnight I will need to run a script to create slots. If the number of clubs increases, this number can become huge.
Yes. That is a horrible method, for the reasons you have stated, plus many more.
The storage of non-facts is absurd.
The storage of masses of non-facts cannot be justified.
If the need to write simple code is an issue, deal with that squarely, and elevate your coding skills, such that it isn't an issue (instead of downgrading the database to a primitive filing system, in order to meet your coding skills).
Notice that what you are suggesting is a calendar for each court (which is not unreasonable as a visualisation, or as a result set), in which most of the slots will be empty (available).
Is this good design for such systems?
No, it is horrible.
It is not a design. It is an implementation without a design.
If not, then what are the better designs in these scenarios?
We use databases. And given its unequalled position, and your platform, specifically a Relational Database.
We store only those Facts that you need, about the real world that you need to engage with. We need to get away from visualising the thing we need for the job we have to do (thousands of calendars, partially empty) and think of the data, as data, and only as data. Including all the rules and constraints.
Following that, the determination of Facts, or the absence of a Fact, is dead easy. I can give you the Relational Database that you will need, but you have to be able to write SQL code, in order to use the database effectively.
Data Model
Try this:
Resource Reservation Data Model
That is an IDEF1X data model. IDEF1X is the Standard for modelling Relational Databases. Please be advised that every little tick, notch, and mark; the crow's foot; the solid vs dashed lines; the square vs round corners; means something very specific and important. Refer to the IDEF1X Notation. If you do not understand the Notation, you will not be able to understand or work the model.
I have included:
Storage of Facts (Reservations) only. The non-fact or absence of a Fact (Availability) is easy enough to determine.
club_resource_slot.duration in the Key to allow any duration, rather than assuming one hour, which may change. It is required in any case, because it delimits the time slot.
resource_code, rather than court number. This allows any club resource (as well as a court number) to be reserved, rather than only a badminton or squash court. You may have meeting rooms in the future.
Joel's reply re the rate table is quite correct in terms of answering that specific question. I have given a simpler form in the context of the rest of the model (less Normalised, easier to code).
If you would like the Predicates, please ask.
Code/General
You seem to have problems with some aspects of coding, which I will address first:
But the problem with this approach is that if I need to find the availability of a court based on game, location, date and time slot, then I will have to load this rate table for all the clubs and then look into the actual booking table to see if someone has already booked the slots. Wouldn't it be a better approach to keep the slots in advance and, when someone books, just change the status to booked? That way the query will be performed entirely in the DB without doing any computation in memory.
The existence of the rate table, or not, does not create an issue. That can be accomplished via a join. The steps described are not necessary.
Note that you do not need to "load this whole table" as a matter of course, but you may have to load one table or other in order to populate your drop-downs, etc.
When someone books a court, simply INSERT INTO reservation VALUES (...)
When someone cancels a reservation, simply DELETE FROM reservation WHERE ... (identifying the row by its key)
Code/Data Model
Printing your matrix of Reserved slots should be obvious, it is simple.
Printing your matrix of Available or Available plus Reserved (your calendar visual) requires Projection. If you do not understand this technique, please read this Answer. Once you understand that, the code is as simple as [1].
You need to be able to code Subqueries and Derived tables.
Determination of whether a slot is Reserved or Available requires a simple query. I will supply some example code to assist you. "Game" isn't specified, and I will assume location means club.
IF (
    SELECT COUNT(*)                 -- resources/courts reserved
        FROM  reservation
        WHERE club_code = $club_code
        AND   date_time = $date_time
    ) = 0
    THEN PRINT "All courts available"
ELSE IF (
    SELECT COUNT(*)                 -- resources/courts that exist
        FROM  club_resource_slot
        WHERE club_code = $club_code
        AND   date_time = $date_time
    ) = (
    SELECT COUNT(*)                 -- resources/courts reserved
        FROM  reservation
        WHERE club_code = $club_code
        AND   date_time = $date_time
    )
    THEN PRINT "All courts reserved"
ELSE PRINT "Some courts available"
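The idea of deriving Availability from stored Reservations can also be tried out in a small runnable sketch. This is only an illustration using Python's built-in sqlite3 module. The table and column names below (club_resource_slot, reservation, club_code, resource_code, date, start_hour) are assumptions that loosely follow the names in the code above; they are not the actual data model, and the sample rows are made up. Available resources are the slots that exist minus the slots that are reserved, so nothing is pre-populated per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the slots a resource offers on any day
    -- (one row per slot, NOT one row per calendar date).
    CREATE TABLE club_resource_slot (
        club_code     TEXT    NOT NULL,
        resource_code TEXT    NOT NULL,
        start_hour    INTEGER NOT NULL CHECK (start_hour BETWEEN 0 AND 23),
        duration      INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (club_code, resource_code, start_hour)
    );
    -- Only facts are stored: one row per actual reservation.
    CREATE TABLE reservation (
        club_code     TEXT    NOT NULL,
        resource_code TEXT    NOT NULL,
        date          TEXT    NOT NULL,
        start_hour    INTEGER NOT NULL,
        PRIMARY KEY (club_code, resource_code, date, start_hour),
        FOREIGN KEY (club_code, resource_code, start_hour)
            REFERENCES club_resource_slot (club_code, resource_code, start_hour)
    );
""")

# Two courts, 18 one-hour slots each (6:00 to 24:00).
slots = [("a", court, hour, 1) for court in ("c1", "c2") for hour in range(6, 24)]
conn.executemany("INSERT INTO club_resource_slot VALUES (?, ?, ?, ?)", slots)

# Three made-up reservations on one day.
conn.executemany(
    "INSERT INTO reservation VALUES (?, ?, ?, ?)",
    [("a", "c1", "2015-04-20", 6), ("a", "c2", "2015-04-20", 6), ("a", "c1", "2015-04-20", 7)],
)

def available(conn, club_code, date, start_hour):
    """Resources whose slot exists but has no reservation on that date."""
    rows = conn.execute(
        """
        SELECT s.resource_code
        FROM club_resource_slot AS s
        LEFT JOIN reservation AS r
               ON  r.club_code     = s.club_code
               AND r.resource_code = s.resource_code
               AND r.start_hour    = s.start_hour
               AND r.date          = ?
        WHERE s.club_code  = ?
          AND s.start_hour = ?
          AND r.resource_code IS NULL
        ORDER BY s.resource_code
        """,
        (date, club_code, start_hour),
    ).fetchall()
    return [row[0] for row in rows]

print(available(conn, "a", "2015-04-20", 6))  # no court free
print(available(conn, "a", "2015-04-20", 7))  # only c2 free
print(available(conn, "a", "2015-04-20", 8))  # both free
```

The month of bookings costs three rows here, not 5940, and the query runs entirely in the database.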
Please feel free to comment or ask questions.
Assuming that each booking is for one hour (that is, if someone wants two hours on the court, they're taking two bookings of one hour each), it seems to me the most efficient storage mechanism would be a table Booking with columns Court, Date, and Hour (and additional columns for the person who booked, payment status, etc.). You would insert one record each time a court was booked for an hour.
This table would be sparsely populated, in that there would only be records for the booked hourly units, not for the available ones. No data would be pre-generated; you would only create records when a booking occurred.
To produce a daily or weekly calendar, your application would retrieve the booked hours from the database and join this with its knowledge of your hours (6am to midnight) to produce a visualization of court availability.
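As a rough sketch of that last step, assuming the booked hours have already been fetched from the Booking table as (court, date, hour) tuples, the calendar can be built in the application like this. The court names and the sample bookings are illustrative only.

```python
from datetime import date

OPENING_HOUR, CLOSING_HOUR = 6, 24          # 6 am to midnight, as in the question
COURTS = [f"c{n}" for n in range(1, 12)]    # 11 courts; the names are illustrative

def daily_calendar(booked, day):
    """Return {court: {hour: 'booked' | 'available'}} for one day.

    `booked` is a set of (court, date, hour) tuples, standing in for the
    rows retrieved from the Booking table.
    """
    return {
        court: {
            hour: "booked" if (court, day, hour) in booked else "available"
            for hour in range(OPENING_HOUR, CLOSING_HOUR)
        }
        for court in COURTS
    }

# Two made-up bookings for court c1.
booked = {("c1", date(2015, 4, 20), 6), ("c1", date(2015, 4, 20), 7)}
calendar = daily_calendar(booked, date(2015, 4, 20))
print(calendar["c1"][6], calendar["c1"][8])
```

Only the two booked hours exist as data; the other 196 cells of the day's grid are computed when the calendar is drawn.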
It is probably much more efficient from a data maintenance perspective to have a table with courts (1 record per court) and a table with bookings (1 record per booking).
The BOOKING record should have a foreign key to the COURT, a booking start date/time, and a booking end date/time. It would also have information about who made the booking, which could be a foreign key to a CUSTOMER table or it might be a fill-in name, etc., depending on how your business works.
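With start and end date/times on each booking, the application has to reject bookings that overlap an existing one for the same court. The following is a minimal sketch using Python's sqlite3 module; the table layout, the column names (court_id, start_at, end_at, customer) and the helper book() are assumptions for illustration, not part of the answer above. Two intervals overlap when each one starts before the other ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE court (court_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE booking (
        booking_id INTEGER PRIMARY KEY,
        court_id   INTEGER NOT NULL REFERENCES court (court_id),
        start_at   TEXT    NOT NULL,   -- zero-padded ISO 8601, e.g. '2015-04-20T06:00'
        end_at     TEXT    NOT NULL,
        customer   TEXT    NOT NULL,   -- a fill-in name in this sketch
        CHECK (start_at < end_at)
    );
    INSERT INTO court (court_id, name) VALUES (1, 'Court 1');
""")

def book(conn, court_id, start_at, end_at, customer):
    """Insert a booking unless it overlaps an existing one; return True on success."""
    with conn:  # commits on success, rolls back if an exception is raised
        clash = conn.execute(
            "SELECT 1 FROM booking"
            " WHERE court_id = ? AND start_at < ? AND end_at > ? LIMIT 1",
            (court_id, end_at, start_at),
        ).fetchone()
        if clash:
            return False
        conn.execute(
            "INSERT INTO booking (court_id, start_at, end_at, customer)"
            " VALUES (?, ?, ?, ?)",
            (court_id, start_at, end_at, customer),
        )
        return True

first = book(conn, 1, "2015-04-20T06:00", "2015-04-20T07:00", "ann")
overlapping = book(conn, 1, "2015-04-20T06:30", "2015-04-20T07:30", "bob")
adjacent = book(conn, 1, "2015-04-20T07:00", "2015-04-20T08:00", "bob")
print(first, overlapping, adjacent)
```

The comparison relies on zero-padded ISO 8601 strings sorting in chronological order. In a multi-user system the check and the insert would also need to be protected against concurrent sessions, which this sketch does not attempt.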
I've got an annoying design issue when designing a database and its models. Essentially, the database has clients and customers, who should be able to make appointments with each other. The clients should have their availability (on a general weekly basis) stored in the database, and this needs to be added to the appointment model. The solution does not require or want precise hours for the availability, just one value for each day, ranging from "not available", to "maybe available", to "available". The only solution I've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
So here's some of what I got so far:
Client model:
ClientId
Service,
Fee
Customer-that-uses-Client model:
CustomerId
ServiceNeed
Availability-model:
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
And finally, appointment model:
AppointmentId
ClientID
CustomerID
StartDate
Hours
Problem: is there any way I can redesign the availability model so that it needs fewer fields and still stores each day with a value (1-3) depending on the client's availability? It would also be really good if the appointment model didn't need to reference all that data from the availability model.
Problem
Answering the narrow question is easy. However, noting the Relational Database tag, there are a few problems in your model, that render it somewhat less than Relational.
Eg. the data content in each logical row needs to be unique. (Uniqueness on the Record id, which is physical, system-generated, and not from the data, cannot provide row uniqueness.) The Primary Key must be "made up from the data", which is of course the only way to make the data row unique.
Eg. values such as Day of availability and AvailabilityType are not constrained, and they need to be.
Relational Data Model
With the issues fixed, the answer looks like this:
Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model or data modelling.
Content
In the Relational Model, there is a large emphasis on constraining the data, such that the database as a whole contains only valid data.
The only solution I've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
Yes. What you had was a repeating attribute (they are named Monday..Sunday, which may not look like a repeating attribute, but it is one, no less than a CSV list). That breaks Codd's Second Normal Form.
The solution is to place the single element in a subordinate table ProviderAvailable.
Day of availability and AvailabilityType are now constrained to a set of values.
The rows in Provider (sorry, the use of "Client" in this context grates on me) and Customer are now unique, due to addition of a Name. The users will not use an internal number to identify such entities, they will use a name, usually a ShortName.
Once the model is tightened up, and all the columns are defined, if Name (not a combination of LastName, FirstName, Initial) is unique, you can eliminate the RecordId, and elevate the Name AK to the PK.
Not Modelled
You have not asked, and I have not modelled these items, but I suspect they will come up as you progress in the development.
A Provider (Client) provides 1 Service. There may be more than 1 in future.
A Customer, seeking 1 Service, can make an Appointment with any Provider (who may or may not provide that Service). You may want to constrain each Appointment to a Provider who provides the sought Service.
As per my comment. It depends on how tight you want this Availability/Reservation system to be. Right now, there is nothing to prevent more than one Customer reserving one Provider on a particular Day, ie. a double-booking.
Normalize that availability table: instead of
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
go with
ClientID (PK/FK)
weekday integer value (0-6 or maybe 1-7) (PK)
availability integer value 1-3
This table has a compound primary key, made of (ClientID, weekday) because each client may have either zero or one entry for each of the seven weekdays.
In this table, you might have these rows:
43 2 3 (on Tuesdays = 2, client 43 is Available =3)
43 3 2 (on Wednesdays = 3, client 43 is MaybeAvailable =2)
If the row is missing, it means the client is unavailable. An availability value of 1 also means that.
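A minimal sketch of this normalized table, using Python's sqlite3 module so that it can be run as-is. The table name client_availability and the helper availability() are assumptions for illustration; weekdays are numbered 1-7 with 1 = Monday, to match the example rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client_availability (
        client_id    INTEGER NOT NULL,
        weekday      INTEGER NOT NULL CHECK (weekday BETWEEN 1 AND 7),
        availability INTEGER NOT NULL CHECK (availability BETWEEN 1 AND 3),
        PRIMARY KEY (client_id, weekday)   -- zero or one row per client per weekday
    );
    INSERT INTO client_availability VALUES (43, 2, 3);  -- Tuesday: Available
    INSERT INTO client_availability VALUES (43, 3, 2);  -- Wednesday: MaybeAvailable
""")

def availability(conn, client_id, weekday):
    """Return 1-3; a missing row is treated as 1 (not available)."""
    row = conn.execute(
        "SELECT availability FROM client_availability"
        " WHERE client_id = ? AND weekday = ?",
        (client_id, weekday),
    ).fetchone()
    return row[0] if row else 1

print(availability(conn, 43, 2))  # stored row for Tuesday
print(availability(conn, 43, 5))  # no row for Friday, so not available
```

The compound primary key rejects a second row for the same client and weekday, and the CHECK constraints reject values outside 1-7 and 1-3.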
I'm working on the Product Catalog module of an Invoicing application.
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalog.
How can I store this "usage recency/frequency" in the database?
I'm thinking about adding a new field recency which would be increased by 1 every time the product was used, and decreased by 1/(count of all products), when an other product is used. Then use this recency field for ordering, but it doesn't seem to me the best solution.
Can you help me with what the best practice is for this kind of problem?
Solution for the recency calculation:
Create a new column in the products table, named last_used_on for example. Its data type should be TIMESTAMP (the MySQL representation for the Unix-time).
Advantages:
Timestamps contain both date and time parts.
It makes possible VERY precise calculations and comparisons in regard to dates and times.
It lets you format the saved values in the date-time format of your choice.
You can convert from any date-time format into a timestamp.
In regard to your autocomplete fields, it allows you to filter the products list as you wish. For example, to display all products used since [date-time]. Or to fetch all products used between [date-time-1] and [date-time-2]. Or get the products used only on Mondays, at 1:37:12 PM, in the last two years, two months and three days (so flexible timestamps are).
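A small sketch of how such a last_used_on column could drive the autocomplete. It uses Python's sqlite3 module so it runs anywhere; SQLite has no TIMESTAMP type like MySQL's, so the value is stored here as an integer Unix time. The helpers mark_used() and suggest(), the sample products and the timestamps are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        last_used_on INTEGER            -- Unix time; NULL until the first use
    );
    INSERT INTO products (id, name) VALUES (1, 'Paper A4'), (2, 'Paper A3'), (3, 'Pen');
""")

def mark_used(conn, product_id, used_at):
    """Record that a product was just put on an invoice."""
    conn.execute(
        "UPDATE products SET last_used_on = ? WHERE id = ?", (used_at, product_id)
    )

def suggest(conn, prefix, limit=10):
    """Names starting with `prefix`, most recently used first, never-used last."""
    rows = conn.execute(
        """
        SELECT name
        FROM products
        WHERE name LIKE ? || '%'
        ORDER BY last_used_on IS NULL, last_used_on DESC, name
        LIMIT ?
        """,
        (prefix, limit),
    ).fetchall()
    return [row[0] for row in rows]

mark_used(conn, 2, 1_700_000_000)   # 'Paper A3' used first
mark_used(conn, 1, 1_700_000_500)   # 'Paper A4' used later
print(suggest(conn, "Pa"))
print(suggest(conn, "P"))
```

In a real application the timestamp would come from the database clock or from time.time(); it is passed in explicitly here only to keep the example deterministic.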
Solution for the usage rate calculation:
Well, actually, you are not speaking about a frequency calculation, but about a rate - even though one can argue that frequency is a rate, too.
Frequency implies using the time as the reference unit and it's measured in Hertz (Hz = [1/second]). For example, let's say you want to query how many times a product was used in the last year.
A rate, on the other hand, is a comparison, a relation between two related units. Like for example the exchange rate USD/EUR - they are both currencies. If the comparison takes place between two terms of the same type, then the result is a number without measurement units: a percentage. Like: 50 apples / 273 apples = 0.1832 = 18.32%
That said, I suppose you tried to calculate the usage rate: the number of usages of a product in relation with the number of usages of all products. Like, for a product: usage rate of the product = 17 usages of the product / 112 total usages = 0.1517... = 15.17%. And in the autocomplete you'd want to display the products with a usage rate bigger than a given percentage (like 9% for example).
This is easy to implement. In the products table add a column usages of type int or bigint and simply increment its value each time a product is used. And then, when you want to fetch the most used products, just apply a filter like in this sql statement:
SELECT
    id,
    name,
    (usages*100) / (SELECT sum(usages) as total_usages FROM products) as usage_rate
FROM products
GROUP BY id
HAVING usage_rate > 9
ORDER BY usage_rate DESC;
Here's a little study case:
In the end, recency, frequency and rate are three different things.
Good luck.
To allow for future flexibility, I'd suggest the following additional (*) table to store the entire history of product usage by all users:
Name: product_usage
Columns:
id - internal surrogate auto-incrementing primary key
product_id (int) - foreign key to product identifier
user_id (int) - foreign key to user identifier
timestamp (datetime) - date/time the product was used
This would allow the query to be fine tuned as necessary. E.g. you may decide to only order by past usage for the logged in user. Or perhaps total usage within a particular timeframe would be more relevant. Such a table may also have a dual purpose of auditing - e.g. to report on the most popular or unpopular products amongst all users.
(*) assuming something similar doesn't already exist in your database schema
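A sketch of how such a product_usage table could be queried, written with Python's sqlite3 module so it is runnable as-is. The column names follow the list above; the helper recent_products() and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_usage (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id INTEGER NOT NULL,
        user_id    INTEGER NOT NULL,
        timestamp  TEXT    NOT NULL      -- ISO 8601 date/time the product was used
    );
""")

# Made-up usage history: (product_id, user_id, timestamp).
usages = [
    (1, 10, "2024-01-01T09:00:00"),
    (2, 10, "2024-01-02T09:00:00"),
    (1, 10, "2024-01-03T09:00:00"),
    (3, 11, "2024-01-04T09:00:00"),
]
conn.executemany(
    "INSERT INTO product_usage (product_id, user_id, timestamp) VALUES (?, ?, ?)",
    usages,
)

def recent_products(conn, user_id, since, limit=10):
    """(product_id, uses, last_used) for one user since `since`, most recent first."""
    return conn.execute(
        """
        SELECT product_id, COUNT(*) AS uses, MAX(timestamp) AS last_used
        FROM product_usage
        WHERE user_id = ? AND timestamp >= ?
        GROUP BY product_id
        ORDER BY last_used DESC
        LIMIT ?
        """,
        (user_id, since, limit),
    ).fetchall()

for row in recent_products(conn, 10, "2024-01-01T00:00:00"):
    print(row)
```

Ordering by uses instead of last_used, or dropping the user_id filter, gives the other variants mentioned above without any change to the stored data.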
Your problem is related to many other web-scale search applications, such as e.g. showing spell corrections, related searches, or "trending" topics. You recognized correctly that both recency and frequency are important criteria in determining "popular" suggestions. In practice, it is desirable to compromise between the two: Recency alone will suffer from random fluctuations; but you also don't want to use only frequency, since some products might have been purchased a lot in the past, but their popularity is declining (or they might have gone out of stock or replaced by successor models).
A very simple but effective implementation that is typically used in these scenarios is exponential smoothing. First of all, most of the time it suffices to update popularities at fixed intervals (say, once each day). Set a decay parameter α (say, 0.95) that tells you how much yesterday's orders count compared to today's. Similarly, orders from two days ago will be worth α*α ≈ 0.9 times as much as today's, and so on. To estimate this parameter, note that the value decays to one half after log(0.5)/log(α) days (about 14 days for α = 0.95).
The implementation only requires a single additional field per product, orders_decayed. Then, all you have to do is to update this value each night with the total daily orders:
orders_decayed = α * orders_decayed + (1-α) * orders_today.
You can sort your applicable suggestions according to this value.
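The update rule and the half-life estimate can be checked with a few lines of Python. This is only a sketch of the formula above; the function names and the sample numbers are illustrative.

```python
import math

ALPHA = 0.95  # decay parameter from the text

def update_decayed(orders_decayed, orders_today, alpha=ALPHA):
    """One nightly update of the smoothed order count."""
    return alpha * orders_decayed + (1 - alpha) * orders_today

def half_life_days(alpha=ALPHA):
    """Number of daily updates after which an old value has decayed to one half."""
    return math.log(0.5) / math.log(alpha)

# A product whose smoothed value is 100 and which then stops selling entirely.
value = 100.0
for _ in range(14):
    value = update_decayed(value, 0)

print(round(half_life_days(), 2))
print(round(value, 2))
```

After 14 nights without orders the value has dropped to a little under half of where it started, which matches the half-life of roughly 13.5 days for α = 0.95.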
To have an individual user experience, you should not rely on a field in the product table, but rather on the history of the user.
The occurrences of the product in past invoices created by the user would be a good starting point. The advantage is that you don't need to add fields or tables for this functionality. You simply rely on data that is already present anyway.
Since it is an auto-complete field, maybe past usage is not really relevant. Display n search results as the user types. If you feel that results are better if you include recency in the calculation of the order, go with it.
Now, the implementation may differ depending on how and when the product should be displayed, and on whether it has to be a user-specific usage frequency or an application-wide (overall) one. But in both cases I would suggest having a history table, which you can later use for other analysis.
You could design your history table with at least the columns below:
Id | ProductId | LastUsed (timestamp) | UserId
Now you can create a view which queries this table for a specific time range (something like product frequency over the last week, last month or last year) and gives you the highest-selling products for that time range.
The same can be used for a user-specific frequency by adding an additional condition to filter by UserId.
I'm thinking about adding a new field recency which would be increased by 1 every time the product was used, and decreased by 1/(count of all products), when an other product is used. Then use this recency field for ordering, but it doesn't seem to me the best solution.
Yes, it is not good practice to add a column for this and update it every time. Imagine this product is a much-awaited one that people love to buy. If 1000 people, or maybe more, request this product at the same time, then every request updates the same row. To maintain concurrency, the database has to lock that specific row for each update, which is definitely going to hurt your database and application performance. Instead, you can simply insert a new row.
The other possible solution is, you could use your existing invoice table as it will definitely have all product and user specific information and create a view to get frequently used product as I mentioned above.
Please note that this is another option to achieve what you are expecting. But I would personally recommend having a history table instead.
The scenario
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalogue.
your suggested solution
How can I store this "usage recency/frequency" in the database?
If it is a web application, don't store it in a database on your server. Each user has different choices.
Store it in the user's browser as a cookie or in localStorage, because it will improve the user experience.
If you still want to store it in a MySQL table,
do the following:
Create a column recency as said in the question.
Each time the item is used, increase the count by 1 as said in the question.
Don't decrease it when other items get used.
To get the most used item,
query
SELECT * FROM products WHERE recency = (SELECT MAX(recency) FROM products);
Side note
Go for the database only if you want to show the most used products independently of the user.
As you aren't certain which measure to choose, and it's rather a user-experience problem, I advise you to have a number of measures and give the user an option to choose the one he/she prefers. For example, the set of available measures could include the most popular products of the last week, last month, last 3 months, last year, and overall. For the sake of performance I'd prefer to store those statistics in a separate table which is refreshed by a scheduled job running every 3 hours, for example.
I am working on a money expiration tracking problem at the moment (originally it is not money, but I have used it as a more convenient example).
A user can earn money from a platform for some mysterious reason and spend it on buying stuff (products, gifts etc.).
I am looking for an algorithm (an SQL query in the best case) to find the current balance of a user.
The events of spending and earning money are stored in different database (MySQL) tables (let's say user_earned and user_spent). So in the normal case, I would simply sum the user's totals from user_earned and subtract the spent money (total of user_spent).
BUT! There is a condition: earned money expires in 2 years if it is not used.
That means that if a user has not used their money, or has used just a part of it, the rest will expire. If a user spends money, it is taken from the oldest unexpired earning record, so that the balance (bonus) is calculated in the user's favor.
These are 5 scenarios with events in time, to have a better understanding on the case:
Both tables (user_earned and user_spent) have timestamps for date tracking.
I did something similar in one of my projects.
Looks like you need an additional table spends_covering with columns
spend_id, earned_id, sum
So for each spend record you need to insert one or many rows into spends_covering to mark the 'used' money.
Then the balance is just the sum of the unused amounts of the earnings that are less than 2 years old.
select sum(sub.earned_sum - sub.spent_sum) as balance
from
    (select e.sum as earned_sum,
            coalesce(sum(sc.sum), 0) as spent_sum  -- 0 when nothing has been spent from this earning
     from earned e
     left join spends_covering sc on e.earned_id = sc.earned_id
     where e.date BETWEEN ...
     group by e.earned_id
     having earned_sum > spent_sum) sub
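How the rows of spends_covering get filled is the part that carries the "oldest unexpired money first" rule. Below is a minimal in-memory sketch of that allocation in Python, assuming earnings and spends are given as (id, date, amount) tuples. The function names, the 730-day approximation of "2 years" and the sample data are assumptions for illustration, not taken from the answer.

```python
from datetime import date, timedelta

EXPIRY = timedelta(days=730)  # "2 years", approximated as 730 days in this sketch

def cover_spends(earned, spent):
    """Allocate each spend to the oldest unexpired earnings first (FIFO).

    earned: list of (earned_id, date, amount)
    spent:  list of (spend_id, date, amount)
    Returns rows shaped like spends_covering: (spend_id, earned_id, amount).
    """
    remaining = {earned_id: amount for earned_id, _, amount in earned}
    oldest_first = sorted(earned, key=lambda e: e[1])
    covering = []
    for spend_id, spend_date, amount in sorted(spent, key=lambda s: s[1]):
        for earned_id, earned_date, _ in oldest_first:
            if amount == 0:
                break
            usable = earned_date <= spend_date < earned_date + EXPIRY
            if not usable or remaining[earned_id] == 0:
                continue
            used = min(amount, remaining[earned_id])
            remaining[earned_id] -= used
            amount -= used
            covering.append((spend_id, earned_id, used))
        if amount:
            raise ValueError(f"spend {spend_id} exceeds the unexpired balance")
    return covering

def balance(earned, covering, today):
    """Sum of the unspent part of every earning that has not expired by `today`."""
    used = {}
    for _, earned_id, amount in covering:
        used[earned_id] = used.get(earned_id, 0) + amount
    return sum(
        amount - used.get(earned_id, 0)
        for earned_id, earned_date, amount in earned
        if earned_date <= today < earned_date + EXPIRY
    )

# Made-up data: two earnings and one spend that uses up the older earning first.
earned = [(1, date(2020, 1, 1), 100), (2, date(2021, 6, 1), 50)]
spent = [(1, date(2021, 7, 1), 120)]
covering = cover_spends(earned, spent)
print(covering)
print(balance(earned, covering, date(2021, 8, 1)))
print(balance(earned, covering, date(2023, 7, 1)))
```

In the database version, cover_spends would run inside the transaction that records the spend, inserting its result into spends_covering; the balance query then only has to read.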
It may be worth it to have two tables -- one (or more) with all the historical details, one with just the current balances for each 'user'. Be sure to use transactions to keep the two in sync.
We presently use a pen/paper based roster to manage table games staff at the casino. Each row is an employee, each column is a 20 minute block of time and each cell represents what table the employee is assigned to, or alternatively they've been assigned to a break. The start and end time of shifts for employees vary as do the games/skills they can deal. We need to keep a copy of the rosters for 7 years, with paper this is fairly easy, I'm wanting to develop a digital application and am having difficulty how to store the data in a database for archiving.
I'm fairly new to working with databases, I think I understand how to model the data for a graph database like neo4j, but I had difficulty when it came to working with time. I've tried to learn about RDBMS databases like MySQL, below is how I think the data should be modelled. Please point out if I'm going in the wrong direction or if a different database type would be more appropriate, it would be greatly appreciated!
Basic Data
Here is some basic data to work with before we factor in scheduling/time.
Employee
- ID Number
- Name
- Skills (Blackjack, Baccarat, Roulette, etc)
Table
- ID Number
- Skill/Type (Can only be one skill)
It may be better to store the roster data as a file like JSON instead? Time sensitive data wouldn't be so much of a problem then. The benefit of going digital with a database would be queries, these could help assist time consuming tasks where human error is common.
Possible Queries
Note: Staff that are on shift are either on a break or on the floor (assigned to a table), Skills have a major or minor type based on difficulty to learn.
What staff have been on the floor for 80 minutes or more? (They are due for a break)
What open tables can I assign this employee to based on their skillset?
I need an employee that has the Baccarat skill but has not already been assigned to a Baccarat table.
What employee(s) was on this table during this period of time?
Where was this employee at this point in time?
Who is on shift right now?
How many staff on shift can deal Blackjack?
How many staff have 3 major skills?
What staff have had the Baccarat skill for at least 3 months?
These queries could also be sorted by alphabetical order or time, skill etc.
I'm pretty sure I know how to perform these queries with cypher for neo4j provided I model the data right. I'm not as knowledgeable with SQL queries, I've read it can get a bit complicated depending on the query and structure.
----------------------------------------------------------------------------------------
MYSQL Specific
An employee table could contain properties such as their ID number and Name, but am I right that for their skills and shifts these would be separate tables that reference the employee by a unique integer(I think this is called a foreign key?).
Another table could store the gaming Tables, these would have their own ID and reference a skill/gametype with a foreign key.
To record data like the pen/paper roster, each day could have a table with columns starting from 0000 increasing by 20 in value going all the way to 2340? Prior to the time columns I could have one for staff where each employee is represented with their foreign key, the time columns would then have foreign keys to the assigned gaming Tables, the row data is bound to have many cells that aren't populated since the employee shift won't be 24/7. If I'm using foreign keys to reference gaming Tables I now have a problem when the employee is on break? Unless I treat say the first gaming Table entry as a break?
I may need to further complicate things though, management will over time try different gaming Table layouts, some of the gaming Tables can be converted from say Blackjack to Baccarat. this is bound to happen quite a bit over 7 years, would I want to be creating new gaming Table entries or add a column to use a foreign key and refer to a new table that stores the history of game types during periods of time? Employees will also learn to deal new games during their career, very rarely they may also have the skill removed.
----------------------------------------------------------------------------------------
Neo4j Specific
With this data would I have an Employee and a Table node that have "isA" relationship edges mapping to actual employees or tables?
I imagine with the skills for the two types I would be best with a Skill node and establish relationships like so?: Blackjack->isA->Skill, Employee->hasSkill->Blackjack, Table->typeIs->Blackjack?
TIME
I find difficulty when I want this database to now work with a timeline. I've come across the following suggestions for connecting nodes with time:
Unix Epoch seems to be a common recommendation?
Connecting nodes to a year/month/day graph?
Lucene timeline? (I don't know much about this or how to work with it, have seen some mention it)
And some cases with how time and data relate:
Staff have varied days and start/end times from week to week, this could be shift node with properties {shiftStart,shiftEnd,actualStart,actualEnd}, staff may arrive late or get sick during shift. Would this be the right way to link each shift to an employee? Employee(node)->Shifts(groupNode)->Shift(node)
Tables and Staff may have skill data modified, with archived data this could be an issue, I think the solution is to have time property on the relationship to the skill?
We open and close tables throughout the day, each table has open/close times for each day, this could change in a month depending on what management wants, in addition the times are not strict, for various reasons a manager may open or close tables during the shift. The open/closed status of a table node may only be relevant for queries during the shift, which confuses me as I'd want this for queries but for archiving with time it might not make sense?
It's with queries that I have trouble deciding when to use a node or add a property to a node. For an Employee they have a name and ID number, if I wanted to find an employee by their ID number would it be better to have that as a node of its own? It would be more direct right, instead of going through all employees for that unique ID number.
I've also come across labels just recently, I can understand that those would be useful for typing employee and table nodes rather than grouping them under a node. With the shifts for an employee I think should continue to be grouped with a shifts node, If I were to do cypher queries for employees working shifts through a time period a label might be appropriate, however should it be applied to individual shift nodes or the shifts group node that links back to the employee? I might need to add a property to individual shift nodes or the relationship to the shifts group node? I'm not sure if there should be a shifts group node, I'm assuming that reducing the edges connecting to the employee node would be optimal for queries.
----------------------------------------------------------------------------------------
If there are any great resources I can learn about database development that'd be great, there is so much information and options out there it's difficult to know what to begin with. Thanks for your time :)
Thanks for spending the time to put a quality question together. Your requirements are great and your specifications of your system are very detailed. I was able to translate your specs into a graph data model for Neo4j. See below.
Above you'll see a fairly self-explanatory graph data model. In case you are unfamiliar with this, I suggest reading Graph Databases: http://graphdatabases.com/ -- from this website you can get a free digital PDF copy of the book, but in case you want to buy a hard copy you can find it on Amazon.
Let's break down the graph model in the image. At the top you'll see a time indexing structure that is (Year)->(Month)->(Day)->(Hour), which I have abbreviated as Y M D H. The ellipses indicate that the graph is continuing, but for the sake of space on the screen I've only shown a sub-graph.
This time index gives you a way to generate time series or ask certain questions on your data model that are time specific. Very useful.
The bottom portion of the image contains your enterprise data model for your casino. The nodes represent your business objects:
Game
Table
Employee
Skill
What's great about graph databases is that you can look at this image and semantically understand the language of your question by jumping from one node to another by their relationships.
Here is a Cypher query you can use to ask your questions about the data model. You can just tweak it slightly to match your questions.
MATCH (employee:Employee)-[:HAS_SKILL]->(skill:Skill),
      (employee)<-[:DEALS]-(game:Game)-[:LOCATION]->(table:Table),
      (game)-[:BEGINS]->(hour:H)<-[*]-(day:D)<-[*]-(month:M)<-[*]-(year:Y)
WHERE skill.type = "Blackjack" AND
      day.day = 17 AND
      month.month = 1 AND
      year.year = 2014
RETURN employee, skill, game, table
The above query finds the sub-graph for all employees who have the skill Blackjack, together with the games they deal and the tables those games are located at, on a specific date (1/17/14).
To do this in SQL would be very difficult. The next thing you need to think about is importing your data into a Neo4j database. If you're curious about how to do that, please look at other questions here on SO, and if you need more help, feel free to post another question or reach out to me on Twitter #kennybastani.
Cheers,
Kenny
Assume a simple database for hotel reservations with three tables.
Table 1: Reservations
This table contains a check-in and check-out date as well as a reference to one or more rooms and a coupon if applicable.
Table 2: Rooms
This table holds the data of all the hotel rooms with prices per night and number of beds.
Table 3: Coupons
This table holds the data of all the coupons.
Option #1:
If you want to get an overview of the reservations for a particular month with the total cost of each reservation, you'd have to fetch the reservations, the rooms for each reservation, and the coupon (if one is present).
With this data, you can then calculate the total amount for the reservation.
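For illustration, here is roughly what Option #1 looks like as a single query. This is a sketch only: it uses Python's built-in sqlite3 module so that it runs on its own, and the schema is assumed (a `reservation_rooms` link table for the "one or more rooms" reference, a percentage discount in `coupons.discount_pct`, and all other column names are guesses, not the real tables).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rooms (id INTEGER PRIMARY KEY, price_per_night INTEGER, beds INTEGER);
CREATE TABLE coupons (id INTEGER PRIMARY KEY, discount_pct INTEGER);
CREATE TABLE reservations (
    id INTEGER PRIMARY KEY,
    check_in TEXT,
    check_out TEXT,
    coupon_id INTEGER REFERENCES coupons(id)
);
CREATE TABLE reservation_rooms (
    reservation_id INTEGER REFERENCES reservations(id),
    room_id INTEGER REFERENCES rooms(id)
);
INSERT INTO rooms VALUES (1, 40, 2), (2, 60, 3);
INSERT INTO coupons VALUES (1, 10);
INSERT INTO reservations VALUES (1, '2011-03-01', '2011-03-03', NULL),
                                (2, '2011-03-05', '2011-03-06', 1);
INSERT INTO reservation_rooms VALUES (1, 1), (2, 1), (2, 2);
""")

# nights * sum of nightly room prices, reduced by the coupon percentage if any
MONTHLY_TOTALS = """
SELECT r.id,
       CAST(julianday(r.check_out) - julianday(r.check_in) AS INTEGER)
         * SUM(rm.price_per_night)
         * (100 - COALESCE(c.discount_pct, 0)) / 100.0 AS total
FROM reservations r
JOIN reservation_rooms rr ON rr.reservation_id = r.id
JOIN rooms rm ON rm.id = rr.room_id
LEFT JOIN coupons c ON c.id = r.coupon_id
WHERE r.check_in >= ? AND r.check_in < ?
GROUP BY r.id
ORDER BY r.id
"""

totals = conn.execute(MONTHLY_TOTALS, ("2011-03-01", "2011-04-01")).fetchall()
print(totals)
```

Note that the total is derived from whatever price is in the rooms table at the moment the query runs.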
Option #2:
However, there is also another option, which is to store the total cost and discount in the reservation table so that it is much easier to fetch these calculations. The downside is that your data becomes much more dependent and much less flexible to work with. What I mean is that you have to manually update the total cost and discount of the reservation table every time you change a room or a coupon that is linked to a reservation.
Which is generally recommended: performance (option #2) or data independence (option #1)?
UPDATE:
It is a MySQL database with over 500 000 rows (reservations) at this point, and it is growing rapidly. I want to optimize database performance at an early stage to make sure that the UX remains fast and responsive.
Let me start to answer this with a story. (Somewhat simplified.)
2011-01-01 I reserve a room for two nights, 2011-03-01 and 2011-03-02. You don't tell me which room I'll get. (Because you don't know yet which room I'll get.) You tell me it will cost $40 per night. I have no coupons. You enter my reservation into your computer, even though you're already fully reserved for both those nights. In fact, you already have one person on the waiting list for both those nights. (Overbooking is a normal thing, not an abnormal thing.)
2011-01-15 You raise the rates for every room by $5.
2011-02-01 I call again to make sure you still have my reservation. You confirm that I have a reservation for two nights, 2011-03-01 and 2011-03-02, at $40. (Not $45, your current rate. That wasn't our deal. Our deal was $40 a night.)
2011-02-12 One person calls and cancels their reservation for 2011-03-01 and 2011-03-02. You still don't yet have a room you know for certain that I'll be able to check in to. The other person from the waiting list now has a room; I'm still on the waiting list.
2011-02-15 One person calls and cancels their reservation for 2011-03-01 and 2011-03-02. Now I have a room.
2011-03-01 I check in with a coupon.
You can store the "current" or "default" price with each room, or with each class of rooms, but you need to store the price we agreed to with my reservation.
Reservations don't reserve rooms; they reserve potential rooms. You don't know who will leave early, who will leave late, who will cancel, and so on. (Based on my experience, once in a while a room will be sealed with crime scene tape. You don't know how long that will last, either.)
You can have more reservations than room-nights.
Coupons can presumably appear at any time before check out.
If you want to get an overview of the reservations for a particular month with the total cost of each reservation, you'd have to fetch the reservations, the rooms for each reservation, and the coupon (if one is present).
I don't think so. The price you agreed to should be in the reservation itself. Specific rooms can't reasonably be assigned until the last minute. If there's one coupon per reservation, that might need to be stored with the reservation, too.
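To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module (the question uses MySQL; SQLite just keeps the example self-contained). The table and column names (`room_classes`, `agreed_rate`, `coupon_code`) and the `reserve` helper are my own assumptions, not the asker's schema. It replays the first two steps of the story: a booking quoted at $40, followed by a $5 rate increase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE room_classes (id INTEGER PRIMARY KEY, name TEXT, current_rate INTEGER);
CREATE TABLE reservations (
    id INTEGER PRIMARY KEY,
    guest TEXT,
    room_class_id INTEGER REFERENCES room_classes(id),  -- a class, not a specific room
    first_night TEXT,
    nights INTEGER,
    agreed_rate INTEGER,  -- copied from current_rate when the booking is made
    coupon_code TEXT      -- may stay NULL until check-out
);
INSERT INTO room_classes VALUES (1, 'standard', 40);
""")


def reserve(guest, room_class_id, first_night, nights):
    """Book against a room class and freeze the rate quoted today."""
    (rate,) = conn.execute(
        "SELECT current_rate FROM room_classes WHERE id = ?", (room_class_id,)
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO reservations "
        "(guest, room_class_id, first_night, nights, agreed_rate) "
        "VALUES (?, ?, ?, ?, ?)",
        (guest, room_class_id, first_night, nights, rate),
    )
    return cur.lastrowid


booking_id = reserve("me", 1, "2011-03-01", 2)  # 2011-01-01: quoted $40 a night
conn.execute("UPDATE room_classes SET current_rate = current_rate + 5")  # 2011-01-15

agreed, nights = conn.execute(
    "SELECT agreed_rate, nights FROM reservations WHERE id = ?", (booking_id,)
).fetchone()
current = conn.execute("SELECT current_rate FROM room_classes WHERE id = 1").fetchone()[0]
print(agreed, current, agreed * nights)
```

The reservation keeps the $40 rate after the class rate moves to $45, and the monthly overview can be read from the reservations table alone.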
The only reporting problem is in making sure your reports clearly report how much expected revenue should be ignored due to overbooking.
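One way such a report could separate the two figures is sketched below in plain Python. It rests on assumptions of mine: a fixed capacity of `ROOMS_AVAILABLE` rooms per night, reservations listed in booking order, and earlier bookings getting rooms first, so anything beyond capacity counts as waiting-list revenue. The function name and the sample data are made up.

```python
from collections import defaultdict
from datetime import date, timedelta

ROOMS_AVAILABLE = 2  # assumed capacity per night

# (guest, first_night, nights, agreed_rate), listed in booking order
reservations = [
    ("a", date(2011, 3, 1), 2, 40),
    ("b", date(2011, 3, 1), 1, 40),
    ("c", date(2011, 3, 1), 2, 45),
]


def revenue_by_night(reservations, capacity):
    """Split each night's expected revenue into (covered, overbooked)."""
    per_night = defaultdict(list)
    for _guest, first_night, nights, rate in reservations:
        for offset in range(nights):
            per_night[first_night + timedelta(days=offset)].append(rate)
    report = {}
    for night, rates in sorted(per_night.items()):
        # Assumption: the first `capacity` bookings get rooms; the rest wait.
        report[night] = (sum(rates[:capacity]), sum(rates[capacity:]))
    return report


report = revenue_by_night(reservations, ROOMS_AVAILABLE)
for night, (covered, overbooked) in report.items():
    print(night, covered, overbooked)
```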
The answer depends on the size of your database. For a small database, option #1 is better, but for a huge database, option #2 is better. So if you could say how many rows you have in the table and which database you use (Oracle, SQL Server, etc.), you will get a more precise answer.
You can add a table that holds the rooms' historical prices and the reason for each change.
Table 2 only records the latest price.
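A possible shape for such a history table, sketched with Python's built-in sqlite3 module so it runs on its own. The table name `room_price_history`, its columns, the `price_on` helper, and the sample rows (including the reasons) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE room_price_history (
    room_id INTEGER,
    valid_from TEXT,          -- ISO date on which this price takes effect
    price_per_night INTEGER,
    reason TEXT,
    PRIMARY KEY (room_id, valid_from)
);
INSERT INTO room_price_history VALUES
    (1, '2010-06-01', 40, 'initial rate'),
    (1, '2011-01-15', 45, 'rate increase');
""")


def price_on(room_id, day):
    """Return the price in effect for a room on an ISO date, or None."""
    row = conn.execute(
        """SELECT price_per_night FROM room_price_history
           WHERE room_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC LIMIT 1""",
        (room_id, day),
    ).fetchone()
    return row[0] if row else None


print(price_on(1, "2011-01-01"), price_on(1, "2011-02-01"))
```

The most recent row per room plays the role of the latest price in Table 2, while older rows let you look up what a room cost on the day a reservation was made.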