I'm having trouble making my query fast enough for production.
The query currently takes 12 sec to return the result set, and it crashes the production server, which is resource-restricted.
I need to get all the enregistrement records that are the last one of each periode (a date stored as YYYYMM).
After getting these records, I want to sum the field named in I.sum_field as a total field.
When I comment out the CASE part, the query takes approx. 5 sec (+/- 500 ms).
Here is the query:
SELECT
I.libelle,
E1.periode,
E1.created_at,
CASE WHEN I.sum_field = 'fat' THEN SUM(E1.Fat)
WHEN I.sum_field = 'etp' THEN SUM(E1.Etp)
WHEN I.sum_field = 'nb_ident' THEN COUNT(*)
WHEN I.sum_field = 'cdi_actif' THEN SUM(E1.cdi_actif)
END AS total
FROM
indicateur_motif IM
INNER JOIN indicateur I
ON IM.indicateur_id = I.id
INNER JOIN `position` P
ON IM.motif_id = P.id
INNER JOIN enregistrement E1
ON P.id = E1.position_id
INNER JOIN
( SELECT
MAX(id) AS id,
MAX(created_at) AS created_at
FROM
enregistrement
WHERE
(etat_mouvement_id IN (1,3,4))
AND (periode >= '201410' AND periode <= '201512')
AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
GROUP BY
salarie_id,
periode ) E2
ON E1.id = E2.id
AND E1.created_at = E2.created_at
WHERE
I.formule_id = 1
GROUP BY
I.id,
E1.periode
ORDER BY
I.position,
E1.periode
Here is the EXPLAIN result :
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- -------------- ------ ---------------------------------------------- ---------------------------------------------- ------- ------------------ ------ ----------------------------------------------------
1 PRIMARY I ALL PRIMARY (NULL) (NULL) (NULL) 21 Using where; Using temporary; Using filesort
1 PRIMARY IM ref indicateur_motif_indicateur_id_motif_id_unique indicateur_motif_indicateur_id_motif_id_unique 4 orhase.I.id 2 Using index
1 PRIMARY P eq_ref PRIMARY PRIMARY 4 orhase.IM.motif_id 1 Using index
1 PRIMARY <derived2> ALL (NULL) (NULL) (NULL) (NULL) 165352 Using where; Using join buffer (Block Nested Loop)
1 PRIMARY e1 eq_ref PRIMARY PRIMARY 4 e2.id 1 Using where
2 DERIVED enregistrement index sp sp 771 (NULL) 165352 Using where
Here is a sample of the resultset :
libelle periode created_at total
------------------------------------------ ------- ------------------- ---------
CDI actifs fin de période 201410 2014-10-01 00:00:00 4689
CDI actifs fin de période 201411 2015-01-29 08:12:03 4674
CDI actifs fin de période 201412 2015-01-29 08:12:03 4660
CDI actifs fin de période 201501 2015-01-29 08:12:04 4444
CDI actifs fin de période 201502 2015-01-29 08:12:04 4222
CDI actifs fin de période 201503 2015-01-29 08:12:04 4195
CDI actifs fin de période 201504 2015-01-29 08:12:04 4176
CDI actifs fin de période 201505 2015-01-29 08:12:04 4155
CDI actifs fin de période 201506 2015-01-29 08:12:04 4136
CDI actifs fin de période 201507 2015-01-29 08:12:04 4121
CDI actifs fin de période 201508 2015-01-29 08:12:04 4080
CDI actifs fin de période 201509 2015-01-29 08:12:04 4061
CDI actifs fin de période 201510 2015-01-29 08:12:04 4036
CDI actifs fin de période 201511 2015-01-29 08:12:04 4001
CDI actifs fin de période 201512 2015-01-29 08:12:04 3976
ETP fin de période CDI stock 201410 2014-10-01 00:00:00 4259.16
ETP fin de période CDI stock 201411 2015-01-29 08:12:03 4241.91
ETP fin de période CDI stock 201412 2015-01-29 08:12:03 4222.12
ETP fin de période CDI stock 201501 2015-01-29 08:12:04 4028.07
I just have no idea where to put a new index to bring this execution time down. I've already put one on enregistrement, called sp:
ALTER TABLE enregistrement ADD INDEX sp(salarie_id, periode);
This one brought the execution time down from 16 sec to 12 sec.
Any ideas?
Thanks.
I don't know if this will help, but what is your CASE doing? You are summing totally different fields and counting another into a single "total". I suspect you might actually want these as their own columns.
That being said, what indexes do you have? Your EXPLAIN shows some, but I would try to add the following if they are NOT already available:
table index
indicateur ( formule_id, id, position )
indicateur_motif ( indicateur_id, motif_id )
`position` ( id )
enregistrement ( position_id, id, created_at ) <-- for the JOIN portion
enregistrement ( etat_mouvement_id, periode, created_at, salarie_id, id ) <-- for sub-select query
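As a rough sketch, the indexes from the list above that the EXPLAIN output does not already show could be added as follows. The index names (idx_...) are made up for illustration. `position` (id) and indicateur_motif (indicateur_id, motif_id) are left out because the EXPLAIN output already reports a PRIMARY key and a unique index on those columns.

```sql
-- Hypothetical index names; the column lists come from the table above.
ALTER TABLE indicateur
    ADD INDEX idx_indicateur_formule (formule_id, id, `position`);

-- Intended for the JOIN from indicateur_motif to enregistrement.
ALTER TABLE enregistrement
    ADD INDEX idx_enreg_position (position_id, id, created_at);

-- Intended for the filtering and grouping done by the sub-select.
ALTER TABLE enregistrement
    ADD INDEX idx_enreg_etat_periode
        (etat_mouvement_id, periode, created_at, salarie_id, id);
```

Re-running EXPLAIN afterwards shows whether the optimizer actually picks any of them; if it does not, they only add write overhead and can be dropped again.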
Also, from your joins, you are not really using anything from the `position` table. Yes, you join from motif to position, and from position to enregistrement, but since
IM.motif_id = P.id and P.id = E1.position_id
you could join directly on
IM.motif_id = E1.position_id
and remove the `position` table from the query. Here is a slightly revised version of your query. I removed the position reference and also changed the GROUP BY of the inner query so that it might better match an available index on the columns periode and salarie_id.
SELECT
I.libelle,
E1.periode,
E1.created_at,
CASE WHEN I.sum_field = 'fat' THEN SUM(E1.Fat)
WHEN I.sum_field = 'etp' THEN SUM(E1.Etp)
WHEN I.sum_field = 'nb_ident' THEN COUNT(*)
WHEN I.sum_field = 'cdi_actif' THEN SUM(E1.cdi_actif)
END AS total
FROM
indicateur I
JOIN indicateur_motif IM
ON I.id = IM.indicateur_id
INNER JOIN enregistrement E1
ON IM.motif_id = E1.position_id
INNER JOIN
( SELECT
MAX(id) AS id,
MAX(created_at) AS created_at
FROM
enregistrement
WHERE
etat_mouvement_id IN (1,3,4)
AND periode >= '201410'
AND periode <= '201512'
AND created_at <= '2015-02-03'
GROUP BY
periode,
salarie_id ) E2
ON E1.id = E2.id
AND E1.created_at = E2.created_at
WHERE
I.formule_id = 1
GROUP BY
I.id,
E1.periode
ORDER BY
I.position,
E1.periode
I don't know what your tables look like, but this query:
SELECT MAX(id) AS id, MAX(created_at) AS created_at
FROM enregistrement
WHERE (etat_mouvement_id IN (1,3,4))
AND (periode >= '201410' AND periode <= '201512')
AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
GROUP BY salarie_id, periode
is very expensive. If you want to try to fix this solely through indexes, adding indexes to the id and created_at columns might be a good start. The other suggestion I might make is to run this query in a separate transaction and insert the results into a temp table. That should at least free up some of the required resources by turning it into a simple join rather than a very complex search operation in the middle of your query. If that doesn't work, you could also try running all of the selects and joins without the sums, inserting those results into a temp table, and then selecting and summing the results from there.
That said, without seeing your tables, the number of rows in each, the data in each column, what kind of hardware you're running, or having any idea what your prod environment looks like in terms of usage, it's really hard to say exactly where the problem might be. I'm pretty sure MySQL has no built-in tool for this yet, but profiling the query using something like Jet Profiler might be worthwhile if this is business critical. Seeing exactly where the resource pressure is coming from would be the first thing I would want to do if I were writing a query that is crashing production servers.
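The temp-table suggestion could look roughly like the sketch below. It assumes the tables and columns from the question; tmp_last_enregistrement is a hypothetical name, and whether this is actually faster depends on the data, so it is something to measure rather than a guaranteed fix.

```sql
-- Step 1: materialise the "last record per salarie_id and periode" lookup once.
-- tmp_last_enregistrement is a hypothetical name.
CREATE TEMPORARY TABLE tmp_last_enregistrement AS
SELECT MAX(id)         AS id,
       MAX(created_at) AS created_at
FROM enregistrement
WHERE etat_mouvement_id IN (1, 3, 4)
  AND periode >= '201410' AND periode <= '201512'
  AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
GROUP BY salarie_id, periode;

-- Step 2: index the temp table so the join back can use lookups.
ALTER TABLE tmp_last_enregistrement ADD INDEX (id);

-- Step 3: the main query now joins a plain table instead of a derived one.
SELECT I.libelle,
       E1.periode,
       E1.created_at,
       CASE WHEN I.sum_field = 'fat'       THEN SUM(E1.Fat)
            WHEN I.sum_field = 'etp'       THEN SUM(E1.Etp)
            WHEN I.sum_field = 'nb_ident'  THEN COUNT(*)
            WHEN I.sum_field = 'cdi_actif' THEN SUM(E1.cdi_actif)
       END AS total
FROM indicateur_motif IM
INNER JOIN indicateur I      ON IM.indicateur_id = I.id
INNER JOIN `position` P      ON IM.motif_id = P.id
INNER JOIN enregistrement E1 ON P.id = E1.position_id
INNER JOIN tmp_last_enregistrement E2
        ON E1.id = E2.id
       AND E1.created_at = E2.created_at
WHERE I.formule_id = 1
GROUP BY I.id, E1.periode
ORDER BY I.position, E1.periode;

-- Step 4: clean up (the table would also vanish when the session ends).
DROP TEMPORARY TABLE tmp_last_enregistrement;
```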
Your slowness is coming from your sub-select on enregistrement. Both seem to be table-scanning what looks like all the records. The IN is also not helping.
Try creating indexes on the following table fields and let me know:
enregistrement.etat_mouvement_id
enregistrement.periode
enregistrement.created_at
Here it is. I reduced the execution time from 12s to 6.8s with this query :
SELECT I.libelle, e1.periode,
CASE WHEN I.sum_field = 'fat' THEN SUM(E1.Fat)
WHEN I.sum_field = 'etp' THEN SUM(E1.Etp)
WHEN I.sum_field = 'nb_ident' THEN COUNT(*)
WHEN I.sum_field = 'cdi_actif' THEN SUM(E1.cdi_actif) END AS 'total'
FROM indicateur_motif IM
INNER JOIN indicateur I ON IM.indicateur_id = I.id
INNER JOIN enregistrement e1 ON IM.motif_id = e1.position_id
INNER JOIN
(
SELECT MAX(created_at) AS createdat, salarie_id, periode
FROM enregistrement
WHERE (etat_mouvement_id IN (1,3,4))
AND (periode >= '201410' AND periode <= '201512')
AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
GROUP BY salarie_id, periode
) e2 ON (e1.created_at = e2.createdat AND e1.salarie_id = e2.salarie_id AND e1.periode = e2.periode)
WHERE I.formule_id = 1
GROUP BY I.id, e1.periode
ORDER BY I.position, e1.periode
Just for information, this subquery :
SELECT MAX(created_at) AS createdat, salarie_id, periode
FROM enregistrement
WHERE (etat_mouvement_id IN (1,3,4))
AND (periode >= '201410' AND periode <= '201512')
AND created_at <= DATE_FORMAT('2015-02-03', '%Y-%m-%d %H:%i:%s')
GROUP BY salarie_id, periode
only takes 0.003 s to execute, thanks to my sp index:
ALTER TABLE enregistrement ADD INDEX sp(salarie_id, periode);
@DRapp: You were right about my JOINs. I removed position from the joins and corrected the query. As for the total field, I do want the values in a single column, so that I don't have to add conditions to my code logic.
I tried @DRapp's indexes and query proposal; they either slowed my query down or changed nothing.
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- -------------- ------ ---------------------------------------------- ---------------------------------------------- ------- --------------------------------- ------ ----------------------------------------------------
1 PRIMARY <derived2> ALL (NULL) (NULL) (NULL) (NULL) 165352 Using temporary; Using filesort
1 PRIMARY e1 ref sp sp 771 e2.salarie_id,e2.periode 1 Using where
1 PRIMARY I ALL PRIMARY (NULL) (NULL) (NULL) 21 Using where; Using join buffer (Block Nested Loop)
1 PRIMARY IM eq_ref indicateur_motif_indicateur_id_motif_id_unique indicateur_motif_indicateur_id_motif_id_unique 8 orhase.I.id,orhase.e1.position_id 1 Using index
2 DERIVED enregistrement index sp sp 771 (NULL) 165352 Using where
With this EXPLAIN result, I want to fix the first line, which shows Using temporary; Using filesort. The solution would be to index the GROUP BY columns, but I don't know if it's possible to create a composite index on these two fields, because they come from different tables. What would be a better or alternative solution?
Thanks all for your answers :)
I would like assistance with adding a subquery to the query below. As I understand it, this is the method I need to use to get the scan_type from the last record in each group rather than the first, because the MySQL server is running 5.7.
I have tried doing this, but I don't understand how to fit the subquery into the current query; my attempts cause the query to error.
Currently I am able to get the date/time stamp by using MAX which gives me the last record for the person's attendance, but I am having trouble getting the related "scan_type". Apart from this, the remainder of the query returns all of the expected results.
Below is the current query:
SELECT A.attendance_sessions_id, A.person_id, A.scan_type, A.absence_type, MAX(A.date_time), B.name, B.student_level
FROM `attendance_record` A
LEFT JOIN `person` B ON A.person_id = B.student_no
WHERE A.scan_type IS NULL
OR A.scan_type <> 'evac_scan'
OR A.scan_type NOT LIKE 'evac_%'
GROUP BY A.attendance_sessions_id, A.person_id
Below is the current output of the above query:
attendance_sessions_id  person_id  scan_type  absence_type  MAX(A.date_time)     name   student_level
----------------------  ---------  ---------  ------------  -------------------  -----  -------------
1                       65         scan_in    NULL          2022-02-06 12:59:48  Chris  Year 1
Expecting scan_type = "scan_out"
attendance_record table:
attendance_record_id  attendance_sessions_id  person_id  scan_type  absence_type  date_time
--------------------  ----------------------  ---------  ---------  ------------  -------------------
4                     1                       65         scan_in    NULL          2022-02-05 20:13:17
5                     1                       65         scan_out   NULL          2022-02-05 20:14:39
6                     1                       65         scan_in    NULL          2022-02-06 12:06:45
7                     1                       65         evac_scan  NULL          2022-02-06 12:53:01
8                     1                       65         scan_out   NULL          2022-02-06 12:59:48
person table:
person_id  student_no  name   student_level
---------  ----------  -----  -------------
9          65          Chris  Year 1
attendance_sessions table:
attenance_sessions_id  session_name        session_date_time
---------------------  ------------------  -------------------
1                      February Weekend 1  2022-02-05 00:01:00
For some time now, only_full_group_by has been the default (at least in MySQL 8+). It would be good to change this query so that it is handled correctly, also in the future.
SELECT
x.attendance_sessions_id,
x.person_id,
A.scan_type,
A.absence_type,
x.max_date_time,
B.name,
B.student_level
FROM (
SELECT
A.attendance_sessions_id,
A.person_id,
-- A.scan_type,
-- A.absence_type,
MAX(A.date_time) AS max_date_time
-- B.name,
-- B.student_level
FROM `attendance_record` A
-- LEFT JOIN `person` B ON A.person_id = B.student_no
WHERE A.scan_type IS NULL
OR A.scan_type <> 'evac_scan'
OR A.scan_type NOT LIKE 'evac_%'
GROUP BY
A.attendance_sessions_id,
A.person_id
) x
INNER JOIN `attendance_record` A ON A.attendance_sessions_id = x.attendance_sessions_id
AND A.person_id = x.person_id
AND A.date_time = x.max_date_time
LEFT JOIN `person` B ON B.student_no = A.person_id
Removed some columns (commented out with --) because of the only_full_group_by setting, and removed the LEFT JOIN from the sub-query because the person table is no longer used there.
Turned the query into a sub-query and moved all the removed fields to the outer query, which also includes a JOIN back to attendance_record to fetch the row with the MAX date_time.
NOTE: When there are multiple records with the same date_time for one attendance_sessions_id,person_id, this query will not produce correct results.
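One way around the tie problem in the NOTE is to pick exactly one row per group with a correlated subquery. This is only a sketch: it assumes attendance_record_id is unique and can serve as a tie-breaker (the sample data suggests so, but the question does not state it). It uses no window functions, so it should also run on MySQL 5.7.

```sql
SELECT A.attendance_sessions_id,
       A.person_id,
       A.scan_type,
       A.absence_type,
       A.date_time,
       B.name,
       B.student_level
FROM attendance_record A
LEFT JOIN person B ON B.student_no = A.person_id
WHERE A.attendance_record_id = (
        -- Latest qualifying row of this session/person; ties on date_time
        -- are broken by the highest attendance_record_id (an assumption).
        SELECT A2.attendance_record_id
        FROM attendance_record A2
        WHERE A2.attendance_sessions_id = A.attendance_sessions_id
          AND A2.person_id = A.person_id
          AND (A2.scan_type IS NULL
               OR A2.scan_type <> 'evac_scan'
               OR A2.scan_type NOT LIKE 'evac_%')
        ORDER BY A2.date_time DESC, A2.attendance_record_id DESC
        LIMIT 1
      );
```

With the sample rows from the question, this should return the single row with attendance_record_id 8, whose scan_type is scan_out.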
I recently posted this on a different Stack Exchange site but believe this to be the more appropriate place for it.
OK, the title seems a bit confusing, but I am struggling to put into words what I need this query to do, so it is best to explain it. I have 3 tables in my database (using MySQL Workbench), but for this query I'm only trying to use one. The table named service_data has the following columns:
Services_ID|Service_Type|Day|Time|Customer_ID(FK)
1001 |SERVICE1 |Mon|0950|1
1002 |SERVICE2 |Tue|1032|65
1003 |SERVICE3 |Wed|0859|4
The table contains approx. 200 records. My aim is to group the timings together, which I have managed to achieve by doing this:
select
case
WHEN (Delivery_Time between '08:00:00' and '09:00:00') then '0800-0900'
WHEN (Delivery_Time between '09:00:00' and '10:00:00') then '0900-1000'
WHEN (Delivery_Time between '10:00:00' and '11:00:00') then '1000-1100'
WHEN (Delivery_Time between '11:00:00' and '12:00:00') then '1100-1200'
WHEN (Delivery_Time between '12:00:00' and '13:00:00') then '1200-1300'
WHEN (Delivery_Time between '13:00:00' and '14:00:00') then '1300-1400'
WHEN (Delivery_Time between '14:00:00' and '15:00:00') then '1400-1500'
WHEN (Delivery_Time between '15:00:00' and '16:00:00') then '1500-1600'
WHEN (Delivery_Time between '16:00:00' and '17:00:00') then '1600-1700'
WHEN (Delivery_Time between '17:00:00' and '18:00:00') then '1700-1800'
WHEN (Delivery_Time between '18:00:00' and '19:00:00') then '1800-1900'
WHEN (Delivery_Time between '19:00:00' and '20:00:00') then '1900-2000'
WHEN (Delivery_Time between '20:00:00' and '21:00:00') then '2000-2100'
else 'Outside Opening Hours'
end as `Time Period`,
count(0) as 'count'
from service_data
group by `Time Period`
order by count desc
limit 20;
Which produces the below result:
TimePeriod Count
1700-1800 24
1500-1600 21
1200-1300 19
1400-1500 19
1800-1900 17
1100-1200 17
1300-1400 16
1600-1700 16
1000-1100 16
1900-2000 12
0800-0900 12
0900-1000 11
What I am now trying to do is split the count up so that there are 4 columns labelled SERVICE1, SERVICE2, SERVICE3 and SERVICE4 (the values within the Service_Type column), hopefully so it looks something like this:
TimePeriod|SERVICE1|SERVICE2|SERVICE3|SERVICE4
1700-1800 | 6 | 7 | 10 | 1
1500-1600 | 5 | 9 | 1 | 6
1200-1300 | 0 | 4 | 2 | 13
Is this possible? I'm sure it must be, but I have been pulling my hair out trying to work it out; SQL isn't my first language! Any help would be appreciated.
My second issue is:
I would like a second query that does all of the above and also links the results to a customer_data table, whose primary key customer_id is a foreign key in service_data. It should link the customer_id to the quadrant (a column in the customer_data table with the values NE, SE, SW, NW, depending on coordinates) and group the count a second time by quadrant as well as by service, so it looks like this:
TimePeriod| SERVICE1 | SERVICE2 | SERVICE3 | SERVICE4 |
-----------|NE|SE|SW|NW|NE|SE|SW|NW|NE|SE|SW|NW|NE|SE|SW|NW|
1700-1800 |2 |1 | 0| 3|4 | 0| 0|3 |2 |5 |2 |1 |0 |1 | 0| 0|
Again, is this possible, or am I asking too much? I was wondering if I could use the SUM(IF) function in some way to achieve all this.
Here's something to get you started, although I do agree with @Strawberry that this needs a programming language to do your last step.
This is not in any way optimised for performance or elegance, but I have tested it with your data as given above.
Here's my CREATE TABLE statement:
CREATE TABLE `service_data` (
`services_id` int(11) NOT NULL,
`service_type` varchar(45) DEFAULT NULL,
`day` varchar(45) DEFAULT NULL,
`time` time DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
PRIMARY KEY (`services_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
First get your time ranges. I have knocked a second off the to-time so that a time that is bang on the hour doesn't get double counted.
create view spans as
select time(concat(hour(time),':00:00')) as fromtime,
time(concat(hour(addtime(time,'00:59:59')),':00:00')) as totime
from service_data
Now look up the time range for each row of data.
create view withspans as
select * from service_data s1
join spans s2
on time between fromtime and totime
Now summarise the data as input to the pivot.
create view summary as
select fromtime, totime, service_type, count(*) as spancount
from withspans
group by fromtime, totime, service_type
Now do the pivot via derived tables.
select w.fromtime, w.totime,
s1.spancount as service1,
s2.spancount as service2,
s3.spancount as service3,
s4.spancount as service4
from summary w
left join (select * from summary where service_type = 'SERVICE1') s1
on s1.fromtime=w.fromtime and s1.totime=w.totime
left join (select * from summary where service_type = 'SERVICE2') s2
on s2.fromtime=w.fromtime and s2.totime=w.totime
left join (select * from summary where service_type = 'SERVICE3') s3
on s3.fromtime=w.fromtime and s3.totime=w.totime
left join (select * from summary where service_type = 'SERVICE4') s4
on s4.fromtime=w.fromtime and s4.totime=w.totime
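For comparison, the SUM(IF) idea raised in the question can be written as conditional aggregation, which pivots in a single pass without views. This is a different technique from the derived-table pivot above and only a sketch: it assumes the service_data definition from the CREATE TABLE statement (with a TIME column named `time`), and the output column names are made up.

```sql
SELECT
    -- Label each row with its hour bucket, e.g. 17:25:00 -> '1700-1800'.
    CONCAT(LPAD(HOUR(`time`), 2, '0'), '00-',
           LPAD(HOUR(`time`) + 1, 2, '0'), '00') AS time_period,
    -- One counter per service type: add 1 only when the row matches.
    SUM(CASE WHEN service_type = 'SERVICE1' THEN 1 ELSE 0 END) AS service1,
    SUM(CASE WHEN service_type = 'SERVICE2' THEN 1 ELSE 0 END) AS service2,
    SUM(CASE WHEN service_type = 'SERVICE3' THEN 1 ELSE 0 END) AS service3,
    SUM(CASE WHEN service_type = 'SERVICE4' THEN 1 ELSE 0 END) AS service4,
    COUNT(*) AS total
FROM service_data
GROUP BY time_period
ORDER BY total DESC;

-- The quadrant breakdown follows the same pattern, assuming customer_data
-- has the columns customer_id and quadrant as described in the question.
SELECT
    CONCAT(LPAD(HOUR(s.`time`), 2, '0'), '00-',
           LPAD(HOUR(s.`time`) + 1, 2, '0'), '00') AS time_period,
    SUM(CASE WHEN s.service_type = 'SERVICE1' AND c.quadrant = 'NE' THEN 1 ELSE 0 END) AS service1_ne,
    SUM(CASE WHEN s.service_type = 'SERVICE1' AND c.quadrant = 'SE' THEN 1 ELSE 0 END) AS service1_se
    -- ... one such line per service/quadrant combination
FROM service_data s
JOIN customer_data c ON c.customer_id = s.customer_id
GROUP BY time_period;
```

The two-level header from the question (service on top, quadrant below) is a presentation matter; SQL only returns the sixteen flat columns.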
I have a problem with a MySQL query.
Here are my three tables:
employees:
id initials deleted
------ -------- ---------
1 TV 0
2 AH 0
3 JE 0
4 MA 0
5 MJ 0
6 CE 0
7 KB 1
8 KL 1
Schedule :
id place start end deleted
------ -------------- ------------------- ------------------- ---------
1 Somewhere 2018-05-11 16:48:17 2018-05-15 16:48:26 0
2 Somewhere else 2018-05-12 16:48:50 2018-05-14 16:48:55 1
3 Here 2018-05-13 00:00:00 2018-05-13 00:00:00 0
4 Not here 2018-05-18 16:49:42 2018-05-16 16:49:48 0
And schedule_link :
id id_employee id_schedule
------ ----------- -------------
1 1 1
2 1 2
3 4 3
4 5 4
I would like a query that returns everything scheduled for today's date for each employee. Even if an employee has no matching records, I would like the query to return their initials with NULL in the other columns.
Here is my current query:
SELECT
`employees`.`id`,
`employees`.`initials`,
`schedule`.`place`,
`schedule`.`start`,
`schedule`.`end`,
`schedule`.`deleted`
FROM
`employees`
LEFT JOIN `schedule_link`
ON (
`employees`.`id` = `schedule_link`.`id_employee`
)
LEFT JOIN `schedule`
ON (
`schedule_link`.`id_schedule` = `schedule`.`id`
)
WHERE (
`employees`.`deleted` = '0'
AND `schedule`.`deleted` = '0'
)
AND (
DATE(CURDATE()) BETWEEN CAST(schedule.start AS DATE)
AND CAST(schedule.end AS DATE)
)
This returns me the following data:
id initials place start end deleted
------ -------- --------- ------------------- ------------------- ---------
1 TV Somewhere 2018-05-11 16:48:17 2018-05-15 16:48:26 0
4 MA Here 2018-05-13 00:00:00 2018-05-13 00:00:00 0
It's correct, but what I want is the following result:
id initials place START END deleted
------ -------- --------- ------------------- ------------------- ---------
1 TV Somewhere 2018-05-11 16:48:17 2018-05-15 16:48:26 0
4 MA Here 2018-05-13 00:00:00 2018-05-13 00:00:00 0
2 AH NULL NULL NULL 0
3 JE NULL NULL NULL 0
5 MJ NULL NULL NULL 0
6 CE NULL NULL NULL 0
Is it possible to obtain this result with a single query?
Thank you in advance for your help.
You need to put the conditions on all but the first table in the on clause. Your where clause is turning the outer join into an inner join.
I have some other suggestions:
SELECT e.id, e.initials, s.place, s.start, s.end, s.deleted
FROM employees e LEFT JOIN
schedule_link sl
ON e.id = sl.id_employee LEFT JOIN
schedule s
ON sl.id_schedule = s.id AND s.deleted = 0 AND
CURDATE() BETWEEN CAST(s.start AS DATE) AND CAST(s.end AS DATE)
WHERE e.deleted = 0;
Notes:
Table aliases make the query easier to write and to read.
Backticks just make the query harder to read and write.
Don't use start and end as column names (i.e., rename them if you can). They are keywords (although not reserved), so they have other purposes in a SQL statement.
I am guessing that deleted is numeric. Don't use single quotes for the comparison (unless the column is really a string).
CURDATE() is already a date. No need for conversion.
I don't recommend using BETWEEN with dates, because of the possibility of a lingering time component. However, you are using explicit conversions, so the code unambiguously does what you want (at the risk perhaps of not using an available index).
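To see why the placement of the condition matters, here is a minimal, self-contained sketch. The demo_employee and demo_shift tables are hypothetical and much smaller than the real schema; they only illustrate the WHERE versus ON behaviour.

```sql
-- Hypothetical demo tables, unrelated to the real schema.
CREATE TABLE demo_employee (id INT PRIMARY KEY, initials VARCHAR(4));
CREATE TABLE demo_shift (id INT PRIMARY KEY, id_employee INT, deleted TINYINT);

INSERT INTO demo_employee VALUES (1, 'TV'), (2, 'AH'), (3, 'JE');
INSERT INTO demo_shift VALUES (10, 1, 0),   -- TV has an active shift
                              (11, 2, 1);   -- AH has a deleted shift, JE has none

-- Condition in WHERE: rows where s.deleted is 1 or NULL are thrown away
-- after the join, so only TV is returned (the outer join acts like an inner join).
SELECT e.initials, s.id AS shift_id
FROM demo_employee e
LEFT JOIN demo_shift s ON s.id_employee = e.id
WHERE s.deleted = 0;

-- Condition in ON: it only decides which shifts match, so all three
-- employees are returned, AH and JE with a NULL shift_id.
SELECT e.initials, s.id AS shift_id
FROM demo_employee e
LEFT JOIN demo_shift s ON s.id_employee = e.id AND s.deleted = 0;
```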
EDIT:
I see. Because the date condition is in the third table, not the second, you are getting duplicate rows. I think this will fix your problem:
SELECT e.id, e.initials, ss.place, ss.start, ss.end, ss.deleted
FROM employees e LEFT JOIN
(SELECT sl.id_employee, s.*
FROM schedule_link sl JOIN
schedule s
ON sl.id_schedule = s.id AND s.deleted = 0
WHERE CURDATE() BETWEEN CAST(s.start AS DATE) AND CAST(s.end AS DATE)
) ss
ON e.id = ss.id_employee
WHERE e.deleted = 0;
This will include every employee with no match on the time frame exactly once. You will still get every record from schedule if there are multiple matches.
You can actually express this without a subquery:
SELECT e.id, e.initials, s.place, s.start, s.end, s.deleted
FROM employees e LEFT JOIN
(schedule_link sl JOIN
schedule s
ON sl.id_schedule = s.id AND s.deleted = 0 AND
CURDATE() BETWEEN CAST(s.start AS DATE) AND CAST(s.end AS DATE)
)
ON e.id = sl.id_employee
WHERE e.deleted = 0;
I find the subquery version easier to follow.
My general take (untested, since I am posting from my phone at the moment):
You should use a RIGHT JOIN for the second join in your query.
You will also need an additional WHERE clause, i.e. WHERE employees.id IS NOT NULL.
Don't forget to guard all fields from the schedule table against missing links, e.g. IF(schedule_link.id IS NULL, NULL, schedule.myfield).
Please note that this should be done for every field from the schedule table that you want to show in this query.
I hope these guidelines are of some help to you.
I've got the following tables
menu_supp
menu_id supp_id supp_weight supp_load_follow
1 29 10.00 1
1 31 20.00 2
supps
supp_id user_id supp_name
29 1 Test supp 1
31 1 Test supp 2
supps_prop
supp_id supp_dry_w supp_price supp_date
29 95.00 125.00 2015-10-25
29 94.00 124.00 2015-11-06
29 94.00 128.00 2015-11-12
31 25.00 200.00 2015-06-25
Now I've got this query:
SELECT s.supp_id, s.supp_name, ms.supp_weight, sp.supp_price, sp.supp_dry_weight
FROM menu_supp ms
LEFT JOIN supps s ON ms.supp_id = s.supp_id
LEFT JOIN supps_prop sp ON ms.supp_id = sp.supp_id
WHERE menu_id = 1
GROUP BY s.supp_id
ORDER BY ms.supp_load_follow ASC
Which gives me this result:
supp_id supp_name supp_weight supp_price supp_dry_weight
29 Test supp 1 10.00 125.00 95.00
31 Test supp 2 20.00 200.00 25.00
For supp 29 it returns the oldest value, whereas it should take the value that applies on the current date. How can I achieve that?
If the supp_date is unique for a supp_id then you can use the following to get the value for the latest date:-
SELECT s.supp_id, s.supp_name, ms.supp_weight, sp.supp_price, sp.supp_dry_weight
FROM menu_supp ms
LEFT JOIN supps s
ON ms.supp_id = s.supp_id
LEFT JOIN
(
SELECT supp_id, MAX(supp_date) AS max_supp_date
FROM supps_prop
GROUP BY supp_id
) sub0
ON ms.supp_id = sub0.supp_id
LEFT OUTER JOIN supps_prop sp
ON sub0.supp_id = sp.supp_id
AND sub0.max_supp_date = sp.supp_date
WHERE menu_id = 1
ORDER BY ms.supp_load_follow ASC
This gets the max supp_date for each supp_id and joins that back to the supps_prop table to get the other fields from it.
EDIT - Coping with either the highest date, or the lowest date after today is a bit more complicated.
I would suggest having 2 sub queries. One to get the highest date for each supp_id and one to get the lowest date on or after today for each supp_id. If the 2nd is found then use that, if not use the first. Not tested but:-
SELECT s.supp_id, s.supp_name, ms.supp_weight, COALESCE(sp1.supp_price, sp0.supp_price), COALESCE(sp1.supp_dry_weight, sp0.supp_dry_weight)
FROM menu_supp ms
LEFT JOIN supps s
ON ms.supp_id = s.supp_id
LEFT JOIN
(
SELECT supp_id, MAX(supp_date) AS max_supp_date
FROM supps_prop
GROUP BY supp_id
) sub0
ON ms.supp_id = sub0.supp_id
LEFT OUTER JOIN supps_prop sp0
ON sub0.supp_id = sp0.supp_id
AND sub0.max_supp_date = sp0.supp_date
LEFT JOIN
(
SELECT supp_id, MIN(supp_date) AS min_supp_date
FROM supps_prop
WHERE supp_date >= CURDATE()
GROUP BY supp_id
) sub1
ON ms.supp_id = sub1.supp_id
LEFT OUTER JOIN supps_prop sp1
ON sub1.supp_id = sp1.supp_id
AND sub1.min_supp_date = sp1.supp_date
WHERE menu_id = 1
ORDER BY ms.supp_load_follow ASC
EDIT - An explanation of GROUP BY, etc:-
GROUP BY is used with aggregate functions; these are functions that give a value over a range of rows which share common field values. For example, SUM would be used to add up the values of a field over multiple rows, often for a shared value (i.e. maybe the SUM of order values for a customer id). The shared value field is the one given in the GROUP BY clause.
In standard SQL, all the non-aggregate fields returned by the SELECT statement must be mentioned in the GROUP BY clause. This makes logical sense: if they are not mentioned, then the values for a group of rows could be different, and then there is the problem of which one to choose.
However there are times when this can be a bit too restrictive. For example if you are grouping by a customer id then the customer name is directly related to this customer id. MySQL does allow you to return non aggregate fields in the SELECT statement that are not specified in the GROUP BY clause, but if the values vary over the rows that are grouped together then which value is chosen is not specified; it could be from any of the rows, and indeed there is no reason that it might not change in the future or when using a different storage engine.
Sometimes GROUP BY is abused to return unique rows, in the way that DISTINCT is meant to be used.
In your original query
SELECT s.supp_id, s.supp_name, ms.supp_weight, sp.supp_price, sp.supp_dry_weight
FROM menu_supp ms
LEFT JOIN supps s ON ms.supp_id = s.supp_id
LEFT JOIN supps_prop sp ON ms.supp_id = sp.supp_id
WHERE menu_id = 1
GROUP BY s.supp_id
ORDER BY ms.supp_load_follow ASC
you are using GROUP BY s.supp_id. While s.supp_name is dependent on this, ms.supp_weight and sp.supp_price are not. There could be numerous values of each of these for any s.supp_id. MySQL has just used the value from one of the grouped rows for these and doesn't really care which row it chose to use.
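A small illustration of the difference, run against the supps_prop table from the question (the output column names are made up). The first query is deterministic because every selected column is either grouped or aggregated; the second spells out the "any row will do" behaviour using ANY_VALUE(), which is available from MySQL 5.7 onwards.

```sql
-- Deterministic: supp_id is grouped, everything else is aggregated.
SELECT supp_id,
       COUNT(*)        AS prop_rows,
       MAX(supp_date)  AS latest_date,
       MAX(supp_price) AS highest_price
FROM supps_prop
GROUP BY supp_id;

-- Explicitly arbitrary: the price may come from any row of the group.
SELECT supp_id,
       ANY_VALUE(supp_price) AS some_price
FROM supps_prop
GROUP BY supp_id;
```

With the sample data, the first query should give 3 rows, 2015-11-12 and 128.00 for supp 29, and 1 row, 2015-06-25 and 200.00 for supp 31. Note that latest_date and highest_price are computed independently, so in general they need not come from the same row; that is why the answers here join back to supps_prop on the date.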
Here is your query without the group by and using inner joins. It appears to me that no supp_id would be inserted into menu_supp that is not already defined in supps. I suppose it would be possible to have no entry in supps_prop but that looks doubtful also. If I am wrong, simply change it back.
SELECT s.supp_id, s.supp_name, ms.supp_weight, sp.supp_price,
sp.supp_dry_w, sp.supp_date
FROM menu_supp ms
JOIN supps s
ON s.supp_id = ms.supp_id
JOIN supps_prop sp
ON sp.supp_id = ms.supp_id
WHERE menu_id = 1
ORDER BY ms.supp_load_follow;
I've also added the date to make it easier to follow. The results are all four possible rows:
supp_id supp_name supp_weight supp_price supp_dry_w supp_date
------- --------- ----------- ---------- ---------- ---------
29 Test supp 1 10.00 125.00 95.00 2015-10-25
29 Test supp 1 10.00 124.00 94.00 2015-11-06
29 Test supp 1 10.00 128.00 94.00 2015-11-12
31 Test supp 2 20.00 200.00 25.00 2015-06-25
Obviously, you only want to join with the prop information contained in the row with the current or most recent date. That date is the largest value still in the past, which can be found like this:
SELECT s.supp_id, s.supp_name, ms.supp_weight, sp.supp_price,
sp.supp_dry_w, sp.supp_date
FROM menu_supp ms
JOIN supps s
ON s.supp_id = ms.supp_id
JOIN supps_prop sp
ON sp.supp_id = ms.supp_id
and sp.supp_date =(
select Max( supp_date )
from supps_prop
where supp_id = ms.supp_id
and supp_date <= NOW() )
WHERE menu_id = 1
ORDER BY ms.supp_load_follow;
Don't let the subquery concern you. Since the combination of supp_id and supp_date is the most obvious PK for the prop table, those fields should already be indexed, making this an impressively fast query.
See it in action at sqlfiddle.
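If supps_prop turns out to have no index on those two columns, one could be added along these lines. The index name idx_supp_date is hypothetical, and a plain index is used here because the question does not say whether (supp_id, supp_date) is guaranteed unique or whether the table already has a primary key.

```sql
-- Hypothetical index name. If (supp_id, supp_date) is guaranteed unique and
-- the table has no primary key yet, ADD PRIMARY KEY (supp_id, supp_date)
-- would be the alternative described above.
ALTER TABLE supps_prop
    ADD INDEX idx_supp_date (supp_id, supp_date);

-- Check how the correlated MAX(supp_date) lookup is resolved afterwards.
EXPLAIN
SELECT MAX(supp_date)
FROM supps_prop
WHERE supp_id = 29
  AND supp_date <= NOW();
```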
If I have a table like the following, how can I select, for each serial_number, only the contract_type with the later expiry date?
serial_number contract_type expiry_date
abc001 SPRT 2011-05-31 00:00:00
abc001 HOMD 2013-05-31 00:00:00
abc002 SPRT 2012-10-14 00:00:00
abc002 HOMD 2011-10-14 00:00:00
abc003 SPRT 2014-05-31 00:00:00
abc003 HOMD 2011-05-31 00:00:00
................
1) If it makes the query simpler, I could assume that each serial_number (SN) has two and only two contract_types in the table.
2) The actual situation is: SN and contract_type form the primary key, and I'm only looking for the contract_types 'SPRT' and 'HOMD'.
The final result set I need is:
SNs with only the 'SPRT' or 'HOMD' contract_type
if an SN has both 'SPRT' and 'HOMD', I only need the SN's record with the later expiry date (if they have the same expiry date, only pick one)
Could anyone give me the query? The actual case might be too complicated to handle in one query, but how about the first, simplified case?
SELECT t.serial_number, t.contract_type, t.expiry_date
FROM YourTable t
INNER JOIN (SELECT serial_number, MAX(expiry_date) AS MaxDate
FROM YourTable
WHERE contract_type IN ('SPRT', 'HOMD')
GROUP BY serial_number) q
ON t.serial_number = q.serial_number
AND t.expiry_date = q.MaxDate
WHERE t.contract_type IN ('SPRT', 'HOMD')
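The join on MAX(expiry_date) returns both rows when 'SPRT' and 'HOMD' expire on the same date, while the question asks for only one in that case. A possible variant, sketched here under the assumption that the database supports window functions (for example MySQL 8.0 or later), numbers the rows per serial_number and keeps the first. Breaking ties by contract_type is an arbitrary choice made for this example.

```sql
SELECT serial_number, contract_type, expiry_date
FROM (
    SELECT t.serial_number,
           t.contract_type,
           t.expiry_date,
           -- Latest expiry first; contract_type breaks ties so that
           -- exactly one row per serial_number gets rn = 1.
           ROW_NUMBER() OVER (
               PARTITION BY t.serial_number
               ORDER BY t.expiry_date DESC, t.contract_type
           ) AS rn
    FROM YourTable t
    WHERE t.contract_type IN ('SPRT', 'HOMD')
) ranked
WHERE rn = 1;
```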