I have a 'Course' table and an 'Event' table.
I would like to select all the courses that actually take place, i.e. those that are not cancelled by an event.
I currently do this with a simple query for all the courses followed by analysis in a script (basically some loops), but this takes longer than I think it should. I believe the same result is possible in a single query with no loops, which should be faster.
Here are the details :
'Course' c has the fields 'date' and 'duration' and a many-to-many relation with the 'Grade' table
'Event' e has the fields 'begin', 'end' and 'break' and a many-to-many relation with the 'Grade' table
A course is cancelled by an event if they occur at the same time and if the event is a break (e.break = 1)
A course is cancelled if all the grades of the course are covered by the events that occur at the same time (several events can occur at once, so I have to combine the grades of these events and compare them to the grades of the course). This is the part I'm doing with a loop; I have some trouble conceptualizing it.
Any help is welcome,
Thanks in advance,
PS: I'm using MySQL
EDIT : Tables details
-Course
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| date | datetime | NO | | NULL | |
| duration | time | NO | | NULL | |
| type | int(11) | NO | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
+-------+---------------------+----------+------+
| id | date | duration | type |
+-------+---------------------+----------+------+
| 1 | 2013-12-10 10:00:00 | 02:00:00 | 0 |
| 2 | 2013-12-11 10:00:00 | 02:00:00 | 0 |
+-------+---------------------+----------+------+
-Event
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| begin | datetime | NO | | NULL | |
| end | datetime | YES | | NULL | |
| break | tinyint(1) | NO | | NULL | |
+-------------+-------------+------+-----+---------+----------------+
+----+---------------------+---------------------+-------+
| id | begin | end | break |
+----+---------------------+---------------------+-------+
| 1 | 2013-12-10 00:00:00 | 2013-12-11 23:59:00 | 1 |
+----+---------------------+---------------------+-------+
-course_grade
+-----------+----------+
| course_id | grade_id |
+-----------+----------+
| 1 | 66 |
| 2 | 65 |
| 2 | 66 |
+-----------+----------+
-event_grade
+----------+----------+
| grade_id | event_id |
+----------+----------+
| 66 | 1 |
+----------+----------+
So here, only course 2 should appear, because course 1 has only one grade, and this grade is covered by an event.
I like riddles, and this is a nice one; it has many solutions, I think.
As you say 'Any help is welcome', I'll give an answer although it's not the full solution (and it does not fit into a comment).
I don't know if you just want (A) the naked statement (over and out), or if you want (B) to understand how to get to the solution. I'll take (B).
I'll start with 'what would I change' before getting to the solution:
you are mixing date, datetime, start, end and duration; try to use only one logic (if it is your model, of course), i.e.
an event/course has a start and an end time (or a start and a duration)
duration should (IMHO) not be a time
try to find a smallest time slice for events/courses (are there 1-second events, or is a granularity of 5 minutes, i.e. 10:00, 10:05, 10:10 and so on, a valid compromise?)
My solution, a pragmatic one, not an academic one
(sounds funny, but it works well in a similar problem I had, see the annotation)
Create a table (T_TIME_OF_DAY) holding all times from 00:00, 00:05, .. 23:55
Create a table (T_DAYS) covering a valid and useful range (a year?)
Take the Cartesian product of them, call it points in time (i.e. select date, time from T_DAYS, T_TIME_OF_DAY with no condition): days x times gives 300*24*12 ~ 100,000 rows if you need to look at a whole year (and 5 minutes is OK for you). That's not much and no problem.
The next step is to join the courses that fit your points in time (and the rows are down to <<100,000)
If you next join these with your events (again using the points in time) you should get what you want.
Simplified quarters of a day:
            12  12  12  12  12  12  12  12
            08  09  10  11  12  13  14  15
            |...|...|...|...|...|...|...|...
grade 65 (C).............2..................
grade 66 (C).........1...2..................
grade 65 (E)................................
grade 66 (E)........1111....................
(annotation: I use this logic to calculate the availability of services with regard to their downtimes per month/year, and could reuse the data already prepared in time slices for display)
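The points-in-time idea can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, the table names T_DAYS and T_TIME_OF_DAY come from the text, and the column names, the one-week range and the explicit end_time column are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_DAYS (day TEXT PRIMARY KEY);
    CREATE TABLE T_TIME_OF_DAY (tod TEXT PRIMARY KEY);
    -- Assumption: the course is stored with an explicit start and end time.
    CREATE TABLE course (id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT);
""")

# One week of days (hypothetical range) and every 5-minute slot of a day.
first_day = date(2013, 12, 9)
conn.executemany(
    "INSERT INTO T_DAYS VALUES (?)",
    [((first_day + timedelta(days=i)).isoformat(),) for i in range(7)],
)
conn.executemany(
    "INSERT INTO T_TIME_OF_DAY VALUES (?)",
    [(f"{m // 60:02d}:{m % 60:02d}:00",) for m in range(0, 24 * 60, 5)],
)

# Course 1 from the question: 2013-12-10 10:00, lasting two hours.
conn.execute(
    "INSERT INTO course VALUES (1, '2013-12-10 10:00:00', '2013-12-10 12:00:00')"
)

# Cartesian product of days and times: all points in time.
points = conn.execute("SELECT COUNT(*) FROM T_DAYS, T_TIME_OF_DAY").fetchone()[0]

# Points in time covered by each course (start inclusive, end exclusive).
course_slots = conn.execute("""
    SELECT c.id, COUNT(*)
    FROM T_DAYS d
    CROSS JOIN T_TIME_OF_DAY t
    JOIN course c
      ON d.day || ' ' || t.tod >= c.start_time
     AND d.day || ' ' || t.tod <  c.end_time
    GROUP BY c.id
""").fetchall()

print(points, course_slots)
```

Events would be joined to the same points in time in the same way, after which courses and events can be compared slot by slot.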
(second answer, because it is a totally different and more standard approach)
I made an SQLFiddle for you
So what to do:
Here is a solution:
step one (in mind): select courses and their grades (let's call them C)
step two (in mind): select events and their grades (let's call them E)
and - tada -
select everything from C where there are no rows in E that have the same grade, overlap in time (somehow) and are a break (break = 1)
So your solution:
select
  c.id, c.date as start_time, addtime(c.date, c.duration) as end_time, cg.grade_id
from Course c join course_grade cg on c.id=cg.course_id
where not exists (
  select 1
  from event_grade eg join event e on eg.event_id=e.id
  where
    eg.grade_id=cg.grade_id
    and e.break=1
    and
    (
      -- addtime() adds the TIME duration to the DATETIME start;
      -- a plain date+duration is numeric addition and breaks across midnight
      (e.begin<=c.date and e.end>=addtime(c.date, c.duration))
      or e.begin between c.date and addtime(c.date, c.duration)
      or e.end between c.date and addtime(c.date, c.duration)
    )
)
I paid no attention to optimization here.
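The query above returns one row per course grade that is not covered by a break. If the goal is one row per course that still takes place (at least one of its grades is not covered by an overlapping break event), a variation could look like the sketch below. It runs against the sample data from the question using Python's built-in sqlite3 module as a stand-in for MySQL, so the end-time arithmetic uses SQLite's datetime() instead of MySQL's ADDTIME(); the course_span name is hypothetical, and treating only break events as cancelling follows the query above rather than anything confirmed by the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (id INTEGER PRIMARY KEY, date TEXT, duration TEXT);
    CREATE TABLE event (id INTEGER PRIMARY KEY, `begin` TEXT, `end` TEXT, `break` INTEGER);
    CREATE TABLE course_grade (course_id INTEGER, grade_id INTEGER);
    CREATE TABLE event_grade (grade_id INTEGER, event_id INTEGER);

    INSERT INTO course VALUES (1, '2013-12-10 10:00:00', '02:00:00'),
                              (2, '2013-12-11 10:00:00', '02:00:00');
    INSERT INTO event VALUES (1, '2013-12-10 00:00:00', '2013-12-11 23:59:00', 1);
    INSERT INTO course_grade VALUES (1, 66), (2, 65), (2, 66);
    INSERT INTO event_grade VALUES (66, 1);
""")

QUERY = """
WITH course_span AS (
    -- Turn date + 'HH:MM:SS' duration into an explicit end time (SQLite syntax).
    SELECT id,
           date AS start_time,
           datetime(date,
                    '+' || CAST(substr(duration, 1, 2) AS INTEGER) || ' hours',
                    '+' || CAST(substr(duration, 4, 2) AS INTEGER) || ' minutes') AS end_time
    FROM course
)
SELECT DISTINCT c.id
FROM course_span c
JOIN course_grade cg ON cg.course_id = c.id
WHERE NOT EXISTS (
    -- A break event for the same grade that overlaps the course in time.
    SELECT 1
    FROM event e
    JOIN event_grade eg ON eg.event_id = e.id
    WHERE eg.grade_id = cg.grade_id
      AND e.`break` = 1
      AND e.`begin` < c.end_time
      AND e.`end` > c.start_time
)
ORDER BY c.id
"""

# Course ids that keep at least one grade free of overlapping breaks.
taking_place = [row[0] for row in conn.execute(QUERY)]
print(taking_place)
```

With the sample data only course 2 remains, because grade 65 has no event, while the single grade of course 1 is covered.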
Related
I am trying to create an online room reservation system for a small hotel. One of the tables of the database is supposed to hold the bookings. It has an autonumber field, customer data fields, two date fields for arrival and departure, and a number field for the number of rooms booked.
A search page submits the arrival and departure dates to a result page which is then supposed to tell the customer how many rooms are available within the period if any. This is where it all goes wrong.
I just can't get an accurate count of the number of rooms already booked within the period requested.
guest | arrive | depart |booked
Smith | 2002-06-11 | 2002-06-18 | 1
Jones | 2002-06-12 | 2002-06-14 | 2
Brown | 2002-06-13 | 2002-06-16 | 1
White | 2002-06-15 | 2002-06-17 | 2
If the hotel has 9 rooms, here is a day-by-day listing of the number of available rooms.
I want the result like this.
date available status
2002-06-10 | 9 | Hotel is empty
2002-06-11 | 8 | Smith checks in
2002-06-12 | 6 | Jones checks in
2002-06-13 | 5 | Brown checks in
2002-06-14 | 7 | Jones checks out
2002-06-15 | 5 | White checks in
2002-06-16 | 6 | Brown checks out
2002-06-17 | 8 | White checks out
2002-06-18 | 9 | Smith checks out
Please help me to find a solution
A calendar table isn't strictly necessary for problems of this nature, but it can help to conceptualise the problem in a quick and easy manner. So I have a calendar table holding dates from 1900 until 4000 and something...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(booking_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,guest VARCHAR(12) NOT NULL
,arrive DATE NOT NULL
,depart DATE NOT NULL
,booked INT NOT NULL
,UNIQUE KEY(guest,arrive)
);
INSERT INTO my_table (guest,arrive,depart,booked) VALUES
('Smith','2002-06-11','2002-06-18',1),
('Jones','2002-06-12','2002-06-14',2),
('Brown','2002-06-13','2002-06-16',1),
('White','2002-06-15','2002-06-17',2);
SELECT x.dt
, 9 - COALESCE(SUM(booked),0) available
FROM calendar x
LEFT
JOIN my_table y
ON x.dt >= y.arrive AND x.dt < y.depart
WHERE x.dt BETWEEN '2002-06-10' AND '2002-06-20'
GROUP
BY dt;
+------------+-----------+
| dt | available |
+------------+-----------+
| 2002-06-10 | 9 |
| 2002-06-11 | 8 |
| 2002-06-12 | 6 |
| 2002-06-13 | 5 |
| 2002-06-14 | 7 |
| 2002-06-15 | 5 |
| 2002-06-16 | 6 |
| 2002-06-17 | 8 |
| 2002-06-18 | 9 |
| 2002-06-19 | 9 |
| 2002-06-20 | 9 |
+------------+-----------+
11 rows in set (0.00 sec)
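The calendar table itself is not shown above. As a rough sketch of how such a table could be filled and used, the following runs the same availability query with Python's built-in sqlite3 module standing in for MySQL. The single dt column matches the query; the June 2002 range and the way the table is populated are assumptions made for the example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (dt TEXT PRIMARY KEY);
    CREATE TABLE my_table (
        booking_id INTEGER PRIMARY KEY AUTOINCREMENT,
        guest  TEXT NOT NULL,
        arrive TEXT NOT NULL,
        depart TEXT NOT NULL,
        booked INTEGER NOT NULL,
        UNIQUE (guest, arrive)
    );
    INSERT INTO my_table (guest, arrive, depart, booked) VALUES
        ('Smith', '2002-06-11', '2002-06-18', 1),
        ('Jones', '2002-06-12', '2002-06-14', 2),
        ('Brown', '2002-06-13', '2002-06-16', 1),
        ('White', '2002-06-15', '2002-06-17', 2);
""")

# Fill the calendar with one row per day of June 2002 (hypothetical range).
first = date(2002, 6, 1)
conn.executemany(
    "INSERT INTO calendar (dt) VALUES (?)",
    [((first + timedelta(days=i)).isoformat(),) for i in range(30)],
)

# A room counts as occupied from the arrival date up to, but not including,
# the departure date.
availability = conn.execute("""
    SELECT x.dt, 9 - COALESCE(SUM(y.booked), 0) AS available
    FROM calendar x
    LEFT JOIN my_table y ON x.dt >= y.arrive AND x.dt < y.depart
    WHERE x.dt BETWEEN '2002-06-10' AND '2002-06-18'
    GROUP BY x.dt
    ORDER BY x.dt
""").fetchall()

for dt, available in availability:
    print(dt, available)
```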
I need some help with a MySQL database design. The MySQL database should handle about 150 million records a year. I want to use the MyISAM engine.
The data structure:
Car brand (>500 brands)
Every car brand has 30+ car models
Every car model has the same 5 values; some models have additional values
Every value has exactly 3 fields:
timestamp
quality
actual value
The car brand can have some values with the same fields
The values are tracked every 5 minutes -> 105120 records a year
About the data:
The field quality should be always 'good' but when it's not I need to know.
The field timestamp is usually the same for all values, but at least one value has a different timestamp
Deviation: 1-60 seconds
If a value has a deviating timestamp, its timestamp always deviates
Sometimes I don't get data because the source server is down.
How I want to use the data
Visualisations in chart(time and actual value) with a selection of values
Aggregation of some values for every brand
My Questions:
I thought it's a good idea to split the data into different tables, so I put every brand in an extra table. To find the table by car brand name I created an index table. Is this a good practice?
Is it better to create tables for every car model (about 1500 tables)?
Should I store the quality (if it is not 'good') and the deviation of the timestamp in a separate table?
Any other suggestions?
Example:
Table: car_brand
| car_brand | tablename | Address |
|-----------|-----------|-------------|
| BMW | bmw_table | the address |
| ... | ... | ... |
Table: bmw_table (105120*30+ car models = more than 3,2 million records per year)
| car_model | timestamp_usage | quality_usage | usage | timestamp_fuel_consumed | quality_fuel_consumed | fuel_consumed | timestamp_kilometer | quality_kilometer | kilometer | timestamp_revenue | quality_revenue | revenue | ... |
|-------------|---------------------|---------------|-------|-------------------------|----------------|--------------|-------------------------|-------------------|-----------|---------------------|-----------------|---------|-----|
| Z4 | 2015-12-12 12:12:12 | good | 5% | 2015-12-12 12:12:12 | good | 10.6 | 2015-12-12 12:11:54 | good | 120 | null | null | null | ... |
| Z4 | 2015-12-12 12:17:12 | good | 6% | 2015-12-12 12:17:12 | good | 12.6 | 2015-12-12 12:16:54 | good | 125 | null | null | null | ... |
| brand_value | null |null | null | null | null | null | null | null | null | 2015-12-12 12:17:12 | good | 1000 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
And the other brand tables..
Edit: Queries and quality added
Possible Queries
Note: I assume that the table bmw_table has an extra column that is called car_brand and the table name is simple_table instead of bmw_table to reduce complexity.
SELECT car_brand, SUM(revenue), AVG(`usage`)
FROM simple_table
WHERE timestamp_usage >= '2015-10-01 00:00:00' AND timestamp_usage <= '2015-10-31 23:59:59'
GROUP BY car_brand;
SELECT timestamp_usage, `usage`, revenue, fuel_consumed, kilometer
FROM simple_table
WHERE timestamp_usage >= '2015-10-01 00:00:00' AND timestamp_usage <= '2015-10-31 23:59:59';
Quality Values
I collect the data from an OPC Server so the quality field contains one of the following values:
bad
badConfigurationError
badNotConnected
badDeviceFailure
badSensorFailure
badLastKnownValue
badCommFailure
badOutOfService
badWaitingForInitialData
uncertain
uncertainLastUsableValue
uncertainSensorNotAccurate
uncertainEUExceeded
uncertainSubNormal
good
goodLocalOverride
Thanks in advance!
Droider
Do not have a separate table per brand. There is no advantage, only unnecessary complexity. Nor one table per model. In general, if two tables look the same, the data should be combined into a single table. In your example, that one table would have brand and model as columns.
Indexes are your friend for performance. Let's see the queries you will perform, so we can discuss the optimal indexes.
What will you do if the data quality is not 'good'? Simply display "good" or "not good"?
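To make the single-table suggestion concrete, here is a small sketch. It is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, keeps just a few of the question's columns, and the index name, the index columns and the sample rows are hypothetical. Which index is actually best depends on the real queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table for every brand and model (columns trimmed for the sketch).
    CREATE TABLE simple_table (
        car_brand       TEXT NOT NULL,
        car_model       TEXT NOT NULL,
        timestamp_usage TEXT NOT NULL,
        `usage`         REAL,
        revenue         REAL
    );
    -- Hypothetical index supporting the time-range filter and the grouping.
    CREATE INDEX idx_usage_time ON simple_table (timestamp_usage, car_brand);

    INSERT INTO simple_table VALUES
        ('BMW',  'Z4', '2015-10-05 12:00:00', 5.0, 100.0),
        ('BMW',  'X5', '2015-10-06 12:00:00', 7.0, 200.0),
        ('Audi', 'A4', '2015-10-07 12:00:00', 4.0,  50.0),
        ('BMW',  'Z4', '2015-11-01 00:00:00', 9.0, 999.0);
""")

# Aggregate October 2015 per brand; the half-open range avoids the 23:59:59 edge.
per_brand = conn.execute("""
    SELECT car_brand, SUM(revenue), AVG(`usage`)
    FROM simple_table
    WHERE timestamp_usage >= '2015-10-01 00:00:00'
      AND timestamp_usage <  '2015-11-01 00:00:00'
    GROUP BY car_brand
    ORDER BY car_brand
""").fetchall()

print(per_brand)
```

Because brand and model are ordinary columns, the aggregation per brand needs no lookup table and no dynamically chosen table name.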
I'm developing a wardrobe application that uses a database table called "entrances".
The program is used to organize a normal wardrobe storage where the storage can have different amount of numbers/slots to hang clothes on. When a customer comes up to the merchant, the merchant scans the customer's bar code and will then get a free number from the system to hang the customer's clothes on. But there can of course only be one entry for each number.
My entrances db could look something like:
ID | wardrobeNo | storeID | customerBarcode | deliveredTime | collectedTime
---+------------+---------+-----------------+---------------+--------------
1 | 1 | 1 | XX | 20:12:55 | NULL
2 | 2 | 1 | XA | 20:44:44 | NULL
3 | 1 | 2 | XZ | 20:55:55 | NULL
4 | 2 | 2 | XC | 22:22:22 | NULL
Later that day the same entries do still exist in the DB but they will now have a collected time if the clothes have been collected from the wardrobe on some of the numbers before people went home.
ID | wardrobeNo | storeID | customerBarcode | deliveredTime | collectedTime
---+------------+---------+-----------------+---------------+--------------
1 | 1 | 1 | XX | 20:12:55 | 23:23:23
2 | 2 | 1 | XA | 20:44:44 | NULL
3 | 1 | 2 | XZ | 20:55:55 | 22:23:23
4 | 2 | 2 | XC | 22:22:22 | NULL
I will then be able to see the occupied numbers with:
SELECT * FROM entrances WHERE storeID = x AND deliveredTime IS NOT NULL AND collectedTime IS NULL
What I'm wondering about is how I would be able to lock these 'wardrobeNo' values while the merchant is handling payment, so that another merchant does not place an order on the same 'wardrobeNo'... just like a restaurant that would link orders to tables.
Is this even a good way to tackle the problem or is there something a lot smarter? Or should I consider thinking about this problem in another way.
Hope it makes sense..
Updated: Instead of taking care of maintaining a sequence yourself, use MySQL's auto_increment in combination with a scheduled alter table command at midnight:
CREATE TABLE idTable (
idKey INT(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (idKey)
)
And at midnight:
TRUNCATE TABLE idTable;
ALTER TABLE idTable AUTO_INCREMENT = 1;
Then simply add a new record to idTable prior to adding a row to your wardrobe table and use the inserted ID (via mysql_insert_id()) to get a daily unique ID.
I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
On the other hand, I have another table that contains a set of ranges used to classify the calculation results. The ranges table looks like:
+----------+-------+-----+--------+
| id_level | start | end | status |
+----------+-------+-----+--------+
| 1        | 0     | 75  | Danger |
| 2        | 76    | 90  | Alert  |
| 3        | 91    | 100 | Good   |
+----------+-------+-----+--------+
I need a query that adds the corresponding 'status' column to each value when doing the calculations. Currently, I can do that by adding the following field to the calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works OK, but when I have a lot of rows (more than 200K), the calculation query becomes slow.
My question is: is there some way to find that 'status' value without that subquery?
Has anyone worked on something similar before?
Thanks
Yes, you are looking for a subquery and join:
select s.*, r.status
from (select s.*
from <your query here>
) s left outer join
ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested select. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.
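A small runnable sketch of the range join, using the sample values from the question and Python's built-in sqlite3 module as a stand-in for MySQL. The table name results is hypothetical (it stands for the calculation query), and the extra row with result 75.5 is invented to show what the LEFT OUTER JOIN does with a value that falls between two ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (id_process INTEGER, id_region INTEGER, type INTEGER, result REAL);
    CREATE TABLE ranges (id_level INTEGER, start REAL, `end` REAL, status TEXT);

    INSERT INTO results VALUES
        (1, 4, 1, 65.2174), (1, 5, 1, 78.7419), (1, 6, 1, 95.2308),
        (1, 4, 1, 25.0),    (1, 7, 1, 100.0),
        (1, 8, 1, 75.5);   -- hypothetical extra row that falls between two ranges
    INSERT INTO ranges VALUES
        (1, 0, 75, 'Danger'), (2, 76, 90, 'Alert'), (3, 91, 100, 'Good');
""")

# Each result row is matched to the range that contains it; rows that match
# no range are kept with a NULL status because of the LEFT OUTER JOIN.
classified = conn.execute("""
    SELECT s.id_region, s.result, r.status
    FROM results s
    LEFT OUTER JOIN ranges r ON s.result BETWEEN r.start AND r.`end`
    ORDER BY s.rowid
""").fetchall()

statuses = [status for _, _, status in classified]
print(statuses)
```

Since the sample ranges use integer bounds (0-75, 76-90, 91-100), a fractional result such as 75.5 matches none of them and comes back without a status.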
For a personal project I'm working on right now I want to make a line graph of game prices on Steam, Impulse, EA Origins, and several other sites over time. At the moment I've modified a script used by SteamCalculator.com to record the current price (sale price if applicable) for every game in every possible country code for each of these sites. I also have a column for the date on which the price was stored. My current tables look something like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
+----------+------+------+------+------+------+------+------------+
At the moment each country is updated separately (there's a for loop going through the countries), although if it would simplify it then this could be modified to temporarily store new prices to an array then update an entire row at a time. I'll likely be doing this eventually, anyway, for performance reasons.
Now my issue is determining how to best update this table if one of the prices changes. For instance, let's suppose that on 8/22/2011 the game 112233 goes on sale in America for $4.99, Austria for 3.99€, and the other prices remain the same. I would need the table to look like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 112233 | 499 | 399 | 999 | NULL | 899 | 699 | 2011-8-22 |
+----------+------+------+------+------+------+------+------------+
I don't want to create a new row EVERY time the price is checked, otherwise I'll end up having millions of rows of repeated prices day after day. I also don't want to create a new row per changed price like so:
THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id | us | at | au | de | no | uk | date |
+----------+------+------+------+------+------+------+------------+
| 112233 | 999 | 899 | 999 | NULL | 899 | 699 | 2011-8-21 |
| 123456 | 1999 | 999 | 1999 | 999 | 999 | 999 | 2011-8-20 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 112233 | 499 | 899 | 999 | NULL | 899 | 699 | 2011-8-22 |
| 112233 | 499 | 399 | 999 | NULL | 899 | 699 | 2011-8-22 |
+----------+------+------+------+------+------+------+------------+
I can prevent the first problem but not the second by making each (steam_id, <country>) a unique index then adding ON DUPLICATE KEY UPDATE to every database query. This will only add a row if the price is different, however it will add a new row for each country which changes. It also does not allow the same price for a single game for two different days (for instance, suppose game 112233 goes off sale later and returns to $9.99) so this is clearly an awful option.
I can prevent the second problem but not the first by making (steam_id, date) a unique index then adding ON DUPLICATE KEY UPDATE to every query. Every single day when the script is run the date has changed, so it will create a new row. This method ends up with hundreds of lines of the same prices from day to day.
How can I tell MySQL to create a new row if (and only if) any of the prices has changed since the latest date?
UPDATE -
At the recommendation of people in this thread I have changed the schema of my database to facilitate adding new country codes in the future and avoid the issue of needing to update entire rows at a time. The new schema looks something like:
+----------+------+---------+------------+
| steam_id | cc | price | date |
+----------+------+---------+------------+
| 112233 | us | 999 | 2011-8-21 |
| 123456 | uk | 699 | 2011-8-20 |
| ... | ... | ... | ... |
+----------+------+---------+------------+
On top of this new schema I have discovered that I can use the following SQL query to grab the price from the most recent update:
SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1
At this point my question boils down to this:
Is it possible to (using only SQL rather than application logic) insert a row only if a condition is true? For instance:
INSERT INTO `steam_prices` (...) VALUES (...) IF price<>(SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1)
From the MySQL manual I can not find any way to do this. I have only found that you can ignore or update if a unique index is the same. However if I made the price a unique index (allowing me to update the date if it was the same) then I would not be able to recognize when a game went on sale and then returned to its original price. For instance:
+----------+------+---------+------------+
| steam_id | cc | price | date |
+----------+------+---------+------------+
| 112233 | us | 999 | 2011-8-20 |
| 112233 | us | 499 | 2011-8-21 |
| 112233 | us | 999 | 2011-8-22 |
| ... | ... | ... | ... |
+----------+------+---------+------------+
Also, after just finding and reading MySQL Conditional INSERT, I created and tried the following query:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`update`,
`price`
)
SELECT '7870', 'us', NOW(), 999
FROM `steam_prices`
WHERE
`price`<>999
AND `update` IN (
SELECT `update`
FROM `steam_prices`
ORDER BY `update`
ASC LIMIT 1
)
The idea was to insert the row '7870', 'us', NOW(), 999 if (and only if) the price of the most recent update wasn't 999. When I ran this I got the following error:
1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Any ideas?
You will probably find this easier if you simply change your schema to something like:
steam_id integer
country varchar(2)
date date
price float
primary key (steam_id,country,date)
(with other appropriate indexes) and then only worrying about each country in turn.
In other words, your for loop has a unique ID/country combo so it can simply query the latest-date record for that combo and add a new row if it's different.
That will make your selections a little more complicated but I believe it's a better solution, especially if there's any chance at all that more countries may be added in future (it won't break the schema in that case).
First, I suggest you store your data in a form that is less hard-coded per country:
+----------+--------------+------------+-------+
| steam_id | country_code | date | price |
+----------+--------------+------------+-------+
| 112233 | us | 2011-08-20 | 12.45 |
| 112233 | uk | 2011-08-20 | 12.46 |
| 112233 | de | 2011-08-20 | 12.47 |
| 112233 | at | 2011-08-20 | 12.48 |
| 112233 | us | 2011-08-21 | 12.49 |
| ...... | .. | .......... | ..... |
+----------+--------------+------------+-------+
From here, you place a primary key on the first three columns...
Now for your question about not creating extra rows... That is what a simple transaction + application logic is great at.
Start a transaction
Run a select to see if the record in question is there
If not, insert one
Was there a problem with that approach?
Hope this helps.
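The transaction plus application logic described above might look roughly like this. It is a sketch under assumptions: Python's built-in sqlite3 module stands in for MySQL, the function name record_price is hypothetical, and prices are stored as integer cents as in the question rather than as decimals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE steam_prices (
        steam_id     INTEGER NOT NULL,
        country_code TEXT    NOT NULL,
        date         TEXT    NOT NULL,
        price        INTEGER NOT NULL,
        PRIMARY KEY (steam_id, country_code, date)
    )
""")


def record_price(conn, steam_id, country_code, day, price):
    """Insert a row only if the price differs from the latest stored one.

    Returns True when a row was inserted, False when nothing changed.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        latest = conn.execute(
            "SELECT price FROM steam_prices "
            "WHERE steam_id = ? AND country_code = ? "
            "ORDER BY date DESC LIMIT 1",
            (steam_id, country_code),
        ).fetchone()
        if latest is not None and latest[0] == price:
            return False
        conn.execute(
            "INSERT INTO steam_prices (steam_id, country_code, date, price) "
            "VALUES (?, ?, ?, ?)",
            (steam_id, country_code, day, price),
        )
        return True


# The game goes on sale and later returns to its original price.
inserted = [
    record_price(conn, 112233, "us", "2011-08-20", 999),
    record_price(conn, 112233, "us", "2011-08-21", 999),
    record_price(conn, 112233, "us", "2011-08-22", 499),
    record_price(conn, 112233, "us", "2011-08-23", 999),
]
row_count = conn.execute("SELECT COUNT(*) FROM steam_prices").fetchone()[0]
print(inserted, row_count)
```

The unchanged price on the second day is skipped, while the return to the original price is recorded as a new row.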
After experimentation, and with some help from MySQL Conditional INSERT and http://www.artfulsoftware.com/infotree/queries.php#101, I found a query that worked:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`price`,
`update`
)
SELECT 7870, 'us', 999, NOW()
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`cc`=p2.`cc` AND p1.`update` < p2.`update`
WHERE
p2.`steam_id` IS NULL
AND p1.`steam_id`=7870
AND p1.`cc`='us'
AND (
p1.`price`<>999
)
The answer is to first return all rows for which there is no later timestamp. This is done with a within-group aggregate: you join the table with itself only on rows where the other row's timestamp is later. If a row fails to join (no later timestamp exists) then you know that row contains the latest timestamp. These rows will have a NULL id in the joined table (failed to join).
After you have selected all rows with the latest timestamp, grab only those rows where the steam_id is the steam_id you're looking for and where the price is different from the new price that you're entering. If there are no rows with a different price for that game at this point then the price has not changed since the last update, so an empty set is returned. When an empty set is returned the SELECT yields no rows and nothing is inserted. If the SELECT does return a row (a different price was found) then that row is 7870, 'us', 999, NOW(), which is inserted into our table.
EDIT - I actually found a mistake with the above query a little while later and I have since revised it. The query above will insert a new row if the price has changed since the last update, but it will not insert a row if there are currently no prices in the database for that item.
To resolve this I had to take advantage of the DUAL table (which always contains one row), then use an OR in the WHERE clause to test for a different price OR an empty set:
INSERT INTO `steam_prices`(
`steam_id`,
`cc`,
`price`,
`update`
)
SELECT 12345, 'us', 999, NOW()
FROM DUAL
WHERE
NOT EXISTS (
SELECT `steam_id`
FROM `steam_prices`
WHERE `steam_id`=12345 AND `cc`='us'
)
OR
EXISTS (
SELECT p1.`steam_id`
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`cc`=p2.`cc` AND p1.`update` < p2.`update`
WHERE
p2.`steam_id` IS NULL
AND p1.`steam_id`=12345
AND p1.`cc`='us'
AND (
p1.`price`<>999
)
)
It's very long, it's very ugly, and it's very complicated. But it works exactly as advertised. If there is no price in the database for a certain steam_id then it inserts a new row. If there is already a price then it checks the price with the most recent update and, if different, inserts a new row.
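For comparison, the same rule (insert unless the latest row for that game and country already has this price) can also be written with a single NOT EXISTS. The sketch below is not the query above but a shorter variation, run with Python's built-in sqlite3 module as a stand-in for MySQL. SQLite allows a SELECT without FROM, so DUAL is dropped here; in MySQL the FROM DUAL would presumably stay. The timestamp is passed as a parameter instead of NOW() to keep the example deterministic, and the helper name insert_if_changed is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices (steam_id INTEGER, cc TEXT, price INTEGER, `update` TEXT)"
)

CONDITIONAL_INSERT = """
INSERT INTO steam_prices (steam_id, cc, price, `update`)
SELECT :steam_id, :cc, :price, :now
WHERE NOT EXISTS (
    -- The most recent row for this game and country already has this price.
    SELECT 1
    FROM steam_prices p
    WHERE p.steam_id = :steam_id
      AND p.cc = :cc
      AND p.price = :price
      AND p.`update` = (SELECT MAX(`update`)
                        FROM steam_prices
                        WHERE steam_id = :steam_id AND cc = :cc)
)
"""


def insert_if_changed(steam_id, cc, price, now):
    # Runs the single statement; the database decides whether a row is added.
    conn.execute(
        CONDITIONAL_INSERT,
        {"steam_id": steam_id, "cc": cc, "price": price, "now": now},
    )


insert_if_changed(7870, "us", 999, "2011-08-20 00:00:00")  # first price: inserted
insert_if_changed(7870, "uk", 699, "2011-08-21 00:00:00")  # other country: inserted
insert_if_changed(7870, "us", 999, "2011-08-22 00:00:00")  # unchanged: skipped
insert_if_changed(7870, "us", 499, "2011-08-23 00:00:00")  # sale: inserted
insert_if_changed(7870, "us", 999, "2011-08-24 00:00:00")  # back to 999: inserted

history = conn.execute(
    "SELECT cc, price FROM steam_prices ORDER BY `update`"
).fetchall()
print(history)
```

An empty table also satisfies the NOT EXISTS condition, so the first price for a game is inserted without a separate branch.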