MySQL - move from flat table to first normal form - mysql

I am building an application that will allow a user to record weekly activity over a 6 week period. Each week has 3 benchmarks to record against, here is an example:
Week 1
+------------+-----------+------------+-----------+
| Day | Minutes | Location | Miles |
+------------+-----------+------------+-----------+
| Monday | | | |
+------------+-----------+------------+-----------+
| Tuesday | | | |
+------------+-----------+------------+-----------+
| Wednesday | | | |
+------------+-----------+------------+-----------+
| Thursday | | | |
+------------+-----------+------------+-----------+
| Friday | | | |
+------------+-----------+------------+-----------+
| Saturday | | | |
+------------+-----------+------------+-----------+
| Sunday | | | |
+------------+-----------+------------+-----------+
This is repeated for each week up to 6.
In my flat table I have the following:
UserID | Username | Week 1 Day 1 Minutes | Week 1 Day 1 Location | Week 1 Day 1 Miles | Week 1 Day 2 Minutes | Week 1 Day 2 Location | Week 1 Day 2 Miles ETC...
X 7 for a week and then X 6 for the 6 weeks.
I am trying to figure out where my eliminations are, and what my separate tables would be. So far I have the following:
User Table
+------------+-----------+
| UserID | Username |
+------------+-----------+
| | |
+------------+-----------+
Activity Table
+------------+-----------+------------+-----------+------------+-----------+
| UserID | WeekID | Day | Minutes | Location | Miles |
+------------+-----------+------------+-----------+------------+-----------+
| | | | | | |
+------------+-----------+------------+-----------+------------+-----------+
Weeks Table
+------------+-----------+------------+
| UserID | WeekID | Week_No |
+------------+-----------+------------+
| | | |
+------------+-----------+------------+
I think I am getting along the right lines, but the Weeks Table doesn't seem right and I am not sure what the relationships are - I don't think I need UserID in each table, and I'm not sure what the PKs should be.
Any comments on this schema, or an efficient way to achieve the first normal form given the application requirements would be much appreciated, many thanks.
EDIT:
Thanks very much for all the answers, great stuff.
I think having a Location Table would be beneficial as I could standardize locations (could provide a list to choose from) and if I need to query based on location, I'll have consistent location names.
Revised the schema to this:
User Table - UserID PK
+------------+-----------+
| UserID | Username |
+------------+-----------+
| | |
+------------+-----------+
Activity Table - ActivityID PK
+------------+-----------+------------+-----------+------------+-------------+-----------+
| ActivityID | UserID | Week_No | Day | Minutes | LocationID | Miles |
+------------+-----------+------------+-----------+------------+-------------+-----------+
| | | | | | | |
+------------+-----------+------------+-----------+------------+-------------+-----------+
Location Table - LocationID PK
+------------+---------------+
| LocationID | Location_Name |
+------------+---------------+
| | |
+------------+---------------+
2nd EDIT:
I now have a question on 2NF and 3NF on this topic:
MySQL - moving from 1st Normal Form to 2nd and 3rd Normal Forms

Add a Location table and change Location to LocationID (PK). The Weeks table does not need UserID in it. You can find what weeks a user has by querying the Activity table.
I only see the need for a Week table if Week_No changes by user, which doesn't seem to make too much sense. Otherwise, you can just replace WeekID with WeekNo in the Activity table, and delete the Weeks table.

In the Weeks table your primary key should be WeekID. I'm not sure you need a Weeks table, though, as you don't seem to be storing anything in it apart from the week in which the activity took place, which could be stored in the Activity table instead. So I would get rid of it, add Week_No to the Activity table, and have an ActivityID in the Activity table as the primary key, with UserID as a foreign key.
I don't want to tell you too much, just enough to get you on your way, as you seem to want to normalise this fully on your own.

PKs:
User tbl: UserID
Weeks tbl: WeekId
Activity tbl: (UserID,WeekID)
And you don't need a UserId in Weeks Tbl

I don't think you need a week table.
An activity table with userid, weekno, day (enum), minutes, location and miles is enough for what you mentioned...
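For illustration, here is a minimal sketch of the revised three-table schema from the question's edit. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The lowercase table and column names, the CHECK constraints, the UNIQUE key on (user_id, week_no, day) and the sample rows are assumptions, not something the thread specifies.

```python
import sqlite3

# SQLite stands in for MySQL here; names follow the revised schema in the question.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE location (
    location_id   INTEGER PRIMARY KEY,
    location_name TEXT NOT NULL UNIQUE
);
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES user(user_id),
    week_no     INTEGER NOT NULL CHECK (week_no BETWEEN 1 AND 6),
    day         TEXT    NOT NULL CHECK (day IN ('Monday','Tuesday','Wednesday',
                                                'Thursday','Friday','Saturday','Sunday')),
    minutes     INTEGER,
    location_id INTEGER REFERENCES location(location_id),
    miles       REAL,
    UNIQUE (user_id, week_no, day)  -- assumption: one entry per user per day
);
""")

# Made-up sample rows, only to exercise the schema.
conn.execute("INSERT INTO user (user_id, username) VALUES (1, 'alice')")
conn.execute("INSERT INTO location (location_id, location_name) VALUES (1, 'Park')")
conn.executemany(
    "INSERT INTO activity (user_id, week_no, day, minutes, location_id, miles) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, "Monday", 30, 1, 3.1), (1, 2, "Tuesday", 45, 1, 4.0)],
)

# No Weeks table needed: the weeks a user has recorded come from the activity table.
weeks = [row[0] for row in conn.execute(
    "SELECT DISTINCT week_no FROM activity WHERE user_id = ? ORDER BY week_no", (1,)
)]
print(weeks)
```

The UNIQUE key does the job that a composite primary key would otherwise do, while ActivityID stays a simple surrogate key.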

Related

Mysql query for hotel room availability

I am trying to create an online room reservation system for a small hotel. One of the tables of the database is supposed to hold the bookings. It has an autonumber field, customer data fields, two date fields for arrival and departure, and a number field for the number of rooms booked.
A search page submits the arrival and departure dates to a result page which is then supposed to tell the customer how many rooms are available within the period if any. This is where it all goes wrong.
I just can't get an accurate count of the number of rooms already booked within the period requested.
guest | arrive | depart |booked
Smith | 2002-06-11 | 2002-06-18 | 1
Jones | 2002-06-12 | 2002-06-14 | 2
Brown | 2002-06-13 | 2002-06-16 | 1
White | 2002-06-15 | 2002-06-17 | 2
If the hotel has 9 rooms, here is a day-by-day listing of the number of available rooms.
I want the result like this.
date | available | status
2002-06-10 | 9 | Hotel is empty
2002-06-11 | 8 | Smith checks in
2002-06-12 | 6 | Jones checks in
2002-06-13 | 5 | Brown checks in
2002-06-14 | 7 | Jones checks out
2002-06-15 | 5 | White checks in
2002-06-16 | 6 | Brown checks out
2002-06-17 | 8 | White checks out
2002-06-18 | 9 | Smith checks out
Please help me to find a solution
A calendar table isn't strictly necessary for problems of this nature, but they can help to conceptualise the problem in a quick and easy manner. So I have a calendar table holding dates from 1900 until 4000 and something...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(booking_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,guest VARCHAR(12) NOT NULL
,arrive DATE NOT NULL
,depart DATE NOT NULL
,booked INT NOT NULL
,UNIQUE KEY(guest,arrive)
);
INSERT INTO my_table (guest,arrive,depart,booked) VALUES
('Smith','2002-06-11','2002-06-18',1),
('Jones','2002-06-12','2002-06-14',2),
('Brown','2002-06-13','2002-06-16',1),
('White','2002-06-15','2002-06-17',2);
SELECT x.dt
, 9 - COALESCE(SUM(booked),0) available
FROM calendar x
LEFT
JOIN my_table y
ON x.dt >= y.arrive AND x.dt < y.depart
WHERE x.dt BETWEEN '2002-06-10' AND '2002-06-20'
GROUP
BY dt;
+------------+-----------+
| dt | available |
+------------+-----------+
| 2002-06-10 | 9 |
| 2002-06-11 | 8 |
| 2002-06-12 | 6 |
| 2002-06-13 | 5 |
| 2002-06-14 | 7 |
| 2002-06-15 | 5 |
| 2002-06-16 | 6 |
| 2002-06-17 | 8 |
| 2002-06-18 | 9 |
| 2002-06-19 | 9 |
| 2002-06-20 | 9 |
+------------+-----------+
11 rows in set (0.00 sec)
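The answer assumes a calendar table already exists. As a rough sketch of how one could be filled and queried, the snippet below uses Python's built-in sqlite3 module in place of MySQL, builds a one-month calendar with a recursive CTE, and runs the same LEFT JOIN. The June 2002 range and the fixed room count of 9 are taken from the question; everything else is illustrative.

```python
import sqlite3

TOTAL_ROOMS = 9  # from the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (dt TEXT PRIMARY KEY);

-- Fill the calendar with every day of June 2002 (a real table would span years).
WITH RECURSIVE d(dt) AS (
    SELECT '2002-06-01'
    UNION ALL
    SELECT date(dt, '+1 day') FROM d WHERE dt < '2002-06-30'
)
INSERT INTO calendar (dt) SELECT dt FROM d;

CREATE TABLE my_table (
    booking_id INTEGER PRIMARY KEY,
    guest  TEXT NOT NULL,
    arrive TEXT NOT NULL,
    depart TEXT NOT NULL,
    booked INTEGER NOT NULL
);
INSERT INTO my_table (guest, arrive, depart, booked) VALUES
    ('Smith', '2002-06-11', '2002-06-18', 1),
    ('Jones', '2002-06-12', '2002-06-14', 2),
    ('Brown', '2002-06-13', '2002-06-16', 1),
    ('White', '2002-06-15', '2002-06-17', 2);
""")

# A room counts as occupied from the arrival date up to, but not including,
# the departure date, exactly as in the answer's join condition.
available = dict(conn.execute("""
    SELECT x.dt, ? - COALESCE(SUM(y.booked), 0) AS available
    FROM calendar x
    LEFT JOIN my_table y ON x.dt >= y.arrive AND x.dt < y.depart
    WHERE x.dt BETWEEN '2002-06-10' AND '2002-06-20'
    GROUP BY x.dt
    ORDER BY x.dt
""", (TOTAL_ROOMS,)))

for day, rooms in available.items():
    print(day, rooms)
```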

Database design for 150 million records p.a. with categories and sub categories

I need some help with a MySQL database design. The MySQL database should handle about 150 million records a year. I want to use the MyISAM engine.
The data structure:
Car brand (>500 brands)
Every car brand has 30+ car models
Every car model has the same 5 values, some models have additional values
Every value has exactly 3 fields:
timestamp
quality
actual value
The car brand can have some values with the same fields
The values are tracked every 5 minutes -> 105120 records a year
About the data:
The field quality should always be 'good', but when it's not I need to know.
The field timestamp is usually the same, but at least one value has a different timestamp
Deviation: 1-60 seconds
If a value has a deviating timestamp, it always has a deviating timestamp
Sometimes I don't get data because the source server is down.
How I want to use the data
Visualisations in chart(time and actual value) with a selection of values
Aggregation of some values for every brand
My Questions:
I thought it's a good idea to split the data into different tables, so I put every brand in an extra table. To find the table by car brand name I created an index table. Is this a good practice?
Is it better to create tables for every car model (about 1500 tables)?
Should I store the quality (if it is not 'good') and the deviation of the timestamp in a separate table?
Any other suggestions?
Example:
Table: car_brand
| car_brand | tablename | Address |
|-----------|-----------|-------------|
| BMW | bmw_table | the address |
| ... | ... | ... |
Table: bmw_table (105120*30+ car models = more than 3,2 million records per year)
| car_model | timestamp_usage | quality_usage | usage | timestamp_fuel_consumed | quality_fuel_consumed |fuel_consumed | timestamp_kilometer | quality_kilometer | kilometer | timestamp_revenue | quality_revenue | revenue | ... |
|-------------|---------------------|---------------|-------|-------------------------|----------------|--------------|-------------------------|-------------------|-----------|---------------------|-----------------|---------|-----|
| Z4 | 2015-12-12 12:12:12 | good | 5% | 2015-12-12 12:12:12 | good | 10.6 | 2015-12-12 12:11:54 | good | 120 | null | null | null | ... |
| Z4 | 2015-12-12 12:17:12 | good | 6% | 2015-12-12 12:17:12 | good | 12.6 | 2015-12-12 12:16:54 | good | 125 | null | null | null | ... |
| brand_value | null |null | null | null | null | null | null | null | null | 2015-12-12 12:17:12 | good | 1000 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
And the other brand tables..
Edit: Queries and quality added
Possible Queries
Note: I assume that the table bmw_table has an extra column that is called car_brand and the table name is simple_table instead of bmw_table to reduce complexity.
SELECT car_brand, sum(revenue), avg(`usage`)
FROM simple_table
WHERE timestamp_usage>='2015-10-01 00:00:00' AND timestamp_usage<='2015-10-31 23:59:59'
GROUP BY car_brand;
SELECT timestamp_usage,`usage`,revenue,fuel_consumed,kilometer
FROM simple_table
WHERE timestamp_usage>='2015-10-01 00:00:00' AND timestamp_usage<='2015-10-31 23:59:59';
Quality Values
I collect the data from an OPC Server, so the quality field contains one of the following values:
bad
badConfigurationError
badNotConnected
badDeviceFailure
badSensorFailure
badLastKnownValue
badCommFailure
badOutOfService
badWaitingForInitialData
uncertain
uncertainLastUsableValue
uncertainSensorNotAccurate
uncertainEUExceeded
uncertainSubNormal
good
goodLocalOverride
Thanks in advance!
Droider
Do not have a separate table per brand. There is no advantage, only unnecessary complexity. Nor one table per model. In general, if two tables look the same, the data should be combined into a single table. In your example, that one table would have brand and model as columns.
Indexes are your friend for performance. Let's see the queries you will perform, so we can discuss the optimal indexes.
What will you do if the data quality is not 'good'? Simply display "good" or "not good"?
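To make the single-table idea concrete, here is one possible sketch, run against SQLite through Python's sqlite3 module rather than MySQL. The narrow layout (one row per brand, model, metric and timestamp), the table name `reading`, the index, and all sample rows are assumptions made for illustration; the answer itself only says that brand and model should be columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical layout: one row per tracked value instead of one table per brand.
CREATE TABLE reading (
    brand   TEXT NOT NULL,
    model   TEXT NOT NULL,
    metric  TEXT NOT NULL,                 -- e.g. 'usage', 'fuel_consumed', 'kilometer'
    ts      TEXT NOT NULL,                 -- 'YYYY-MM-DD HH:MM:SS'
    quality TEXT NOT NULL DEFAULT 'good',
    value   REAL,
    PRIMARY KEY (brand, model, metric, ts)
);
-- Supports "one metric over a time range" queries across all brands.
CREATE INDEX idx_reading_metric_ts ON reading (metric, ts);
""")

# Made-up sample data.
rows = [
    ("BMW", "Z4", "usage", "2015-10-12 12:12:12", "good", 5.0),
    ("BMW", "Z4", "usage", "2015-10-12 12:17:12", "good", 6.0),
    ("Audi", "A4", "usage", "2015-10-12 12:12:12", "badSensorFailure", 10.0),
    ("BMW", "Z4", "usage", "2015-11-01 00:00:00", "good", 99.0),  # outside October
]
conn.executemany("INSERT INTO reading VALUES (?, ?, ?, ?, ?, ?)", rows)

# Aggregation per brand for October 2015, using a half-open time range.
avg_usage = dict(conn.execute("""
    SELECT brand, AVG(value)
    FROM reading
    WHERE metric = 'usage'
      AND ts >= '2015-10-01 00:00:00' AND ts < '2015-11-01 00:00:00'
    GROUP BY brand
"""))

# Readings whose quality is anything other than 'good' stay easy to find.
not_good = conn.execute(
    "SELECT COUNT(*) FROM reading WHERE quality <> 'good'"
).fetchone()[0]
print(avg_usage, not_good)
```

Whether a narrow or a wide row layout is better depends on the queries, which is why the answer asks to see them first.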

Calculate the fullness of an apartment using SQL expression

I have a database which looks like this:
Reservations Table:
-------------------------------------------------
id | room_id | start | end |
1 | 1 | 2015-05-13 | 2015-05-16 |
2 | 1 | 2015-05-18 | 2015-05-20 |
3 | 1 | 2015-05-21 | 2015-05-24 |
-------------------------------------------------
Apartment Table:
---------------------------------------
id | room_id | name |
1 | 1 | test apartment |
---------------------------------------
Meaning that in month 05 (May), which has 31 days, the database holds 3 reservations giving us 8 days of usage. (31 - 8) / 31 = 23 / 31 = 0.742, so 74.2% is the percentage of emptiness and 25.8% is the percentage of usage. How can I do all of that in SQL (MySQL)?
This is my proposal:
SELECT SUM(DAY(`end`)-DAY(`start`))/EXTRACT(DAY FROM LAST_DAY(`start`)) FROM `apt`;
LAST_DAY function gives as output the date of last day of the month.
Check this
http://sqlfiddle.com/#!9/7c53b/2/0
Not the most efficient query but will get the job done.
select
sum(a.days)*100/(SELECT DAY(LAST_DAY(min(start))) from test1)
as usePercent,
100-(sum(a.days)*100/(SELECT DAY(LAST_DAY(min(start))) from test1))
as emptyPercent
FROM
(select DATEDIFF(end,start) as days from test1) a
What I did is first get the date differences and sum them. Then a nested query uses the day(last_day()) function to get the number of days in the month. The percentages are then calculated using your logic.
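The same calculation can be checked end to end with a small sketch. It runs on SQLite via Python's sqlite3 module, so `julianday()` differences stand in for MySQL's DATEDIFF and Python's `calendar.monthrange` stands in for DAY(LAST_DAY(...)). The column names `start_date` and `end_date` are renamed on purpose (END is a keyword in SQLite), and the sketch assumes no reservation crosses a month boundary.

```python
import calendar
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    "id INTEGER PRIMARY KEY, room_id INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany("INSERT INTO reservations VALUES (?, ?, ?, ?)", [
    (1, 1, "2015-05-13", "2015-05-16"),
    (2, 1, "2015-05-18", "2015-05-20"),
    (3, 1, "2015-05-21", "2015-05-24"),
])

year, month, room_id = 2015, 5, 1
days_in_month = calendar.monthrange(year, month)[1]
month_start = f"{year:04d}-{month:02d}-01"

# Sum the length of every reservation that lies inside the month.
used_days = conn.execute("""
    SELECT COALESCE(SUM(julianday(end_date) - julianday(start_date)), 0)
    FROM reservations
    WHERE room_id = ?
      AND start_date >= ?
      AND end_date <= date(?, '+1 month')
""", (room_id, month_start, month_start)).fetchone()[0]

use_percent = 100.0 * used_days / days_in_month
empty_percent = 100.0 - use_percent
print(round(use_percent, 1), round(empty_percent, 1))
```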

Different value counts on same column using LIKE

I have a database like below
+------------+---------------------------------------+--------+
| sender | subject | day |
+------------+---------------------------------------+--------+
| Darshana | Re: [Dev] [Platform] Build error | Monday |
| Dushan A | (MOLDOVADEVDEV-49) GREG Startup Error | Monday |
+------------+---------------------------------------+--------+
I want to get the following result from the above table. It should check whether the subject contains the given word and, if so, add one to that word's column for the given day.
|Day | "Dev" | "startup"|
+---------+------------+----------+
| Monday | 1 | 2 |
| Friday | 0 | 3 |
I thought of using the DECODE function but I couldn't get the expected result.
You can do this with conditional aggregation:
select day, sum(subject like '%Dev%') as Dev,
sum(subject like '%startup%') as startup
from `table` t -- `table` is a placeholder, use your real table name
group by day;
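A runnable sketch of the same conditional aggregation follows, using SQLite through Python's sqlite3 module. It spells the condition out as SUM(CASE WHEN ... THEN 1 ELSE 0 END), which is the portable form of summing a boolean. The table name `mail` and the Friday row are made up. Note that LIKE is case-insensitive for ASCII in SQLite (in MySQL it depends on the column collation), so 'MOLDOVADEVDEV' also counts as a match for '%Dev%'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (sender TEXT, subject TEXT, day TEXT)")
conn.executemany("INSERT INTO mail VALUES (?, ?, ?)", [
    ("Darshana", "Re: [Dev] [Platform] Build error", "Monday"),
    ("Dushan A", "(MOLDOVADEVDEV-49) GREG Startup Error", "Monday"),
    ("Darshana", "Server startup failed", "Friday"),  # made-up extra row
])

# One output column per search word; each row contributes 1 or 0 to each column.
counts = conn.execute("""
    SELECT day,
           SUM(CASE WHEN subject LIKE '%Dev%'     THEN 1 ELSE 0 END) AS dev,
           SUM(CASE WHEN subject LIKE '%startup%' THEN 1 ELSE 0 END) AS startup
    FROM mail
    GROUP BY day
    ORDER BY day
""").fetchall()

for row in counts:
    print(row)
```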

Troubles conceptualizing a query

I have a 'Course' table and an 'Event' table.
I would like to have all the courses that actually take place, i.e. they are not cancelled by an event.
I have done this with a simple query for all the courses plus a script analysis (basically some loops), but this takes longer than I think it should. I believe what I want is possible in a single query, without loops.
Here are the details :
'Course' c has the fields 'date', 'duration' and a many-to-many relation with the 'Grade' table
'Event' e has the fields 'begin', 'end', 'break' and a many-to-many relation with the 'Grade' table
A course is cancelled by an event if they occur at the same time and if the event is a break (e.break = 1)
A course is cancelled by an event if all the grades of the course are in the events that occur at the same time (many events can occur, so I have to sum up the grades of these events and compare them to the grades of the course). This is the part I'm doing with a loop, and I have some trouble conceptualizing it.
Any help is welcome,
Thanks in advance,
PS : I'm using mysql
EDIT : Tables details
-Course
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| date | datetime | NO | | NULL | |
| duration | time | NO | | NULL | |
| type | int(11) | NO | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
+-------+---------------------+----------+------+
| id | date | duration | type |
+-------+---------------------+----------+------+
| 1 | 2013-12-10 10:00:00 | 02:00:00 | 0 |
| 2 | 2013-12-11 10:00:00 | 02:00:00 | 0 |
+-------+---------------------+----------+------+
-Event
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| begin | datetime | NO | | NULL | |
| end | datetime | YES | | NULL | |
| break | tinyint(1) | NO | | NULL | |
+-------------+-------------+------+-----+---------+----------------+
+----+---------------------+---------------------+-------+
| id | begin | end | break |
+----+---------------------+---------------------+-------+
| 1 | 2013-12-10 00:00:00 | 2013-12-11 23:59:00 | 1 |
+----+---------------------+---------------------+-------+
-course_grade
+-----------+----------+
| course_id | grade_id |
+-----------+----------+
| 1 | 66 |
| 2 | 65 |
| 2 | 66 |
+-----------+----------+
-event_grade
+----------+----------+
| grade_id | event_id |
+----------+----------+
| 66 | 1 |
+----------+----------+
So here, only the course 2 should appear, because course 1 has only one grade, and this grade has an event.
I like riddles, and this is a nice one; I think it has many solutions.
As you say 'Any help is welcome', I'll give an answer although it's not the full solution (and it does not fit into a comment).
I don't know whether you just want (A) the naked statement (over and out), or (B) to understand how to get to the solution. I'll take (B).
I start with 'what would I change' before getting to the solution:
You are mixing date, datetime, start, end and duration. Try to use only one logic (if it is your model, of course), i.e.
an event/course has a start and an end time (or start/duration)
duration should (IMHO) not be a time
try to find a smallest time slice for events/courses (are there 1-second events, or is a granularity of 5' (i.e. 10:00, 10:05, 10:10 and so on) a valid compromise?)
My solution, a pragmatic one, not an academic one
(sounds funny, but it works well in a similar problem I had, see annotation)
Create a table (T_TIME_OF_DAY) holding all times from 00:00, 00:05, .. 23:55
Create a table (T_DAYS) covering a valid and useful range (a year?)
Take the Cartesian product of them, call it points in time (i.e. select date, time from T_DAYS, T_TIME_OF_DAY with no condition). Days x times gives 300*24*12 ~ 100,000 rows if you need to look at a whole year (and 5' are OK for you). That's not much and no problem.
The next step is to join the courses that fit your points in time (and the rows are down to <<100,000)
If you next join these with your events (again using points in time), you should get what you want.
Simplified, in quarters of a day:
12 12 12 12 12 12 12 12
08 09 10 11 12 13 14 15
|...|...|...|...|...|...|...|...
grade 65 (C).............2..................
grade 66 (C).........1...2..................
grade 65 (E)................................
grade 66 (e)........1111..................
(annotation: I use this logic to calculate the availability of services with regard to their downtimes per month / year, and can reuse the data already prepared in time slices for display)
(second answer, because it is a totally different and more standard approach)
I made an SQLFiddle for you
So what to do? This is one solution:
step one (in mind) select courses, grades (let's call them C)
step two (in mind) select events, grades (let's call them E)
and - tada -
select all from C where there are no rows in E that have the same grade and the same date (somehow) and eventtype='break'
so your solution:
select
  id, date start_time, addtime(c.date, c.duration) end_time, grade_id
from Course c join course_grade cg on c.id=cg.course_id
where not exists (
  -- addtime() adds the TIME duration to the DATETIME; date+duration would be a numeric addition
  select grade_id, begin start_time, end end_time
  from event_grade eg join event e on eg.event_id=e.id
  where
    eg.grade_id=cg.grade_id
    and e.break=1
    and
    (
      (e.begin<=c.date and e.end>=addtime(c.date, c.duration))
      or e.begin between c.date and addtime(c.date, c.duration)
      or e.end between c.date and addtime(c.date, c.duration)
    )
)
I paid no attention to optimization here.
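The query above returns one row per course and grade. If the goal is one row per course that still takes place, one possible reading of the rules is: a course takes place when at least one of its grades is not covered by an overlapping break event. The sketch below tries that reading on the question's sample data, using SQLite through Python's sqlite3 module. The columns are renamed to `starts_at`, `ends_at` and `is_break` (BEGIN and END are keywords in SQLite), and the course end time is stored directly instead of being derived from the duration; both are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (id INTEGER PRIMARY KEY, starts_at TEXT, ends_at TEXT);
CREATE TABLE event  (id INTEGER PRIMARY KEY, starts_at TEXT, ends_at TEXT, is_break INTEGER);
CREATE TABLE course_grade (course_id INTEGER, grade_id INTEGER);
CREATE TABLE event_grade  (grade_id INTEGER, event_id INTEGER);

-- Sample data from the question (course end = date + duration of 2 hours).
INSERT INTO course VALUES
    (1, '2013-12-10 10:00:00', '2013-12-10 12:00:00'),
    (2, '2013-12-11 10:00:00', '2013-12-11 12:00:00');
INSERT INTO event VALUES (1, '2013-12-10 00:00:00', '2013-12-11 23:59:00', 1);
INSERT INTO course_grade VALUES (1, 66), (2, 65), (2, 66);
INSERT INTO event_grade VALUES (66, 1);
""")

# Keep a course if some grade of it has no overlapping break event.
# Two intervals overlap when each one starts before the other ends.
taking_place = [row[0] for row in conn.execute("""
    SELECT c.id
    FROM course c
    WHERE EXISTS (
        SELECT 1
        FROM course_grade cg
        WHERE cg.course_id = c.id
          AND NOT EXISTS (
              SELECT 1
              FROM event_grade eg
              JOIN event e ON e.id = eg.event_id
              WHERE eg.grade_id = cg.grade_id
                AND e.is_break = 1
                AND e.starts_at < c.ends_at
                AND e.ends_at   > c.starts_at
          )
    )
    ORDER BY c.id
""")]
print(taking_place)
```

On the sample data this keeps only course 2, which matches the result the question expects.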