MySql Grouping with Most Recent Values


I've got quite a long query that collects data from various sources to create a report on jobs being checked.
Just to add a bit of context:
I have a table of 'jobs'. Each of these jobs is linked to a certain area and location and given a complexity rating depending on how difficult the job is. Each job is checked by a supervisor, scored out of 10 and then entered onto the system. Depending on the score it achieves it is given a complexity rating and an interval to be checked again. i.e. lower score means checked more often than higher score.
I'm writing a query that collects each job by ID, name, etc, and then gets the last time it was checked, the next time it's due to be checked as well as the current score, who entered it onto the system and actual supervisor who checked the job.
Unfortunately, the query I have returns the least recent values rather than the most recent ones. The exception is the "Last_Checked" field (see below), where the MAX() function works; for the other fields MAX() isn't applicable.
Here's a breakdown of the tables and query:
Table 1 : Jobs
Job_ID | Job_Name | Job_Area | Job_Location | Job_Complexity
1 | MyJob | 1 | 1 | 2
2 | AnothJob | 1 | 2 | 1
Table 2 : Areas
Area_ID | Area_Name
1 | Area
Table 3 : Locations
Location_ID | Location_Area | Location_Name
1 | 1 | MyLocation1
2 | 1 | MyLocation2
Table 4 : Complexity
Complexity_ID | Complexity_Label | Complexity_Interval_Days
1 | Very Difficult | 25
2 | Difficult | 35
Table 5 : Users
User_ID | User_FirstName | User_LastName
1 | Jane | Doe
Table 6 : Supervisors
Supervisor_ID | Supervisor_FirstName | Supervisor_LastName
1 | John | Doe
2 | Barry | Sheen
Table 7 : Checks
Check_ID | Check_Job_ID | Check_Date | Check_Score | Check_User | Check_Supervisor
1 | 1 | 27-03-17 | 8 | 1 | 1
2 | 1 | 28-03-17 | 5 | 1 | 2
3 | 1 | 29-03-17 | 6 | 1 | 2
Current Query
SELECT
j.Job_ID,
a.Area_Name,
l.Location_Name,
j.Job_Name,
MAX(c.Checked_Date) as Last_Checked,
Date_Add(MAX(c.Checked_Date), interval r.Complexity_TimePeriod day) as Due_Date,
Datediff(Date_Add(MAX(c.Checked_Date), interval r.Complexity_TimePeriod day), Now()) as Due_Days,
c.Check_Score as Current_Score,
CONCAT(u.User_FirstName, ' ', u.User_LastName) as Entered_By,
CONCAT(s.Supervisor_FirstName, ' ', s.Supervisor_LastName) as Supervisor,
r.Complexity_Level
from Jobs_active j
left join pdc_admin.admin_areas a
on a.Area_ID = j.Job_area
left join pdc_admin.admin_Locations l
on l.Location_ID = j.Job_Location
left join Jobs_Checks c
on c.Check_Job_ID = j.Job_ID
left join pdc_admin.admin_users u
on u.user_id = c.Check_Person
left join Jobs_Complexity_config r
on r.Complexity_ID = j.Job_Complexity
left join admin_Supervisors s on
s.Supervisor_ID = c.Check_Supervisor
group by j.Job_ID
What I'd like to get from this would be something like so:
Job_ID | Area_Name | Location_Name | Job_Name | Last_Checked | Due_Date | Due_Days | Current_Score | Entered_By | Supervisor | Complexity_Level
1 | Area | MyLocation1 | MyJob | 29-03-17 | 03-05-17 | 35 | 6 | Jane Doe | Barry Sheen | Difficult
As you can see, the results show the latest fields (i.e. score/supervisor who checked) but don't show any more than 1 row per job. In essence, I am after the latest information about each job without showing anything about the previous times it has been checked.
Information overload... all help is appreciated thank-you!
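One common way to get this kind of result is the "latest row per group" pattern. When a query groups by Job_ID but selects non-aggregated columns such as Check_Score, MySQL (with ONLY_FULL_GROUP_BY disabled) is free to return those values from any row in the group, which is why only the MAX() column looks right. The sketch below is not tested against the real schema: it uses the simplified table and column names from the breakdown above (Jobs, Checks, Complexity_Interval_Days and so on), and it assumes Check_Date is a DATE column with at most one check per job per date. If two checks share the latest date, a job would appear twice.

```sql
-- Sketch only: names follow the simplified tables above, not the real schema.
SELECT
    j.Job_ID,
    a.Area_Name,
    l.Location_Name,
    j.Job_Name,
    c.Check_Date AS Last_Checked,
    DATE_ADD(c.Check_Date, INTERVAL r.Complexity_Interval_Days DAY) AS Due_Date,
    DATEDIFF(DATE_ADD(c.Check_Date, INTERVAL r.Complexity_Interval_Days DAY), NOW()) AS Due_Days,
    c.Check_Score AS Current_Score,
    CONCAT(u.User_FirstName, ' ', u.User_LastName) AS Entered_By,
    CONCAT(s.Supervisor_FirstName, ' ', s.Supervisor_LastName) AS Supervisor,
    r.Complexity_Label
FROM Jobs AS j
LEFT JOIN Areas AS a      ON a.Area_ID = j.Job_Area
LEFT JOIN Locations AS l  ON l.Location_ID = j.Job_Location
LEFT JOIN Complexity AS r ON r.Complexity_ID = j.Job_Complexity
-- Step 1: find the most recent check date for each job.
LEFT JOIN (
    SELECT Check_Job_ID, MAX(Check_Date) AS Last_Checked
    FROM Checks
    GROUP BY Check_Job_ID
) AS latest ON latest.Check_Job_ID = j.Job_ID
-- Step 2: join back to Checks to fetch the full row for that date.
LEFT JOIN Checks AS c
       ON c.Check_Job_ID = latest.Check_Job_ID
      AND c.Check_Date   = latest.Last_Checked
LEFT JOIN Users AS u       ON u.User_ID = c.Check_User
LEFT JOIN Supervisors AS s ON s.Supervisor_ID = c.Check_Supervisor
ORDER BY j.Job_ID;
```

Because the score, user and supervisor all come from the single joined Checks row, no GROUP BY is needed in the outer query, and jobs that have never been checked still appear with NULLs in the check columns.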

Related

MySQL - Retrieve the max value of an associated column within a LEFT JOIN with a different perimeter than the WHERE clause of the main query

I'm using MySQL 5.6 and have a select query with a LEFT JOIN, but I need to retrieve the max of an associated column (email_nb) with a different "perimeter" of constraints.
Let's take an example: let me state that it is a mere example with only 5 rows but it should work also when I have thousands... (I'm stating this since there is a LIMIT clause in my query)
Table 'query_results'
+-----------------------------+------------+--------------+
| query_result_id | query_id | author |
+-----------------------------+------------+--------------+
| 2 | 1 | john |
| 3 | 1 | eric |
| 7 | 3 | martha |
| 9 | 4 | john |
| 10 | 1 | john |
+-----------------------------+------------+--------------+
Table 'customers_emails'
+-------------------+-----------------+--------------+-----------+-------------+------------------------
| customer_email_id | query_result_id | customer_id | author | email_nb | days_since_sending
+-------------------+-----------------+--------------+-----------+-------------+------------------------
| 5 | 2 | 12 | john | 2 | 150
| 12 | 3 | 7 | eric | 4 | 90
| 27 | 3 | 12 | eric | 2 | 86
| 40 | 9 | 15 | john | 9 | 87
| 42 | 2 | 12 | john | 7 | 23
| 51 | 10 | 12 | john | 3 | 89
+-------------------+-----------------+--------------+-----------+-------------+-----------------------
Notes:
you can have a query_result where the author appears in NO row at all in any of the customers_emails, hence the LEFT JOIN I'm using.
You can see author is by design kind of duplicated as it's both on the first table and the second table each time associated with a query_result_id. It's important to note.
email_nb is an integer between 0 and 10
there is a LIMIT clause as I need to retrieve a set number of records
Today my query aims at retrieving query_results with a certain number of conditions. The specificity is that I make sure to retrieve query_results with an author who does not appear in any customer_email_id where the days_since_sending would be less than 60 days. This means I check days_since_sending not only within the records for this query, but across all customers_emails, thanks to the NOT IN subquery (see below).
This is my current query for customer_id = 12 and query_id = 1
SELECT
qr.query_result_id,
qr.author
FROM
query_results qr
LEFT JOIN
customers_emails ce
ON
qr.author = ce.author
WHERE
qr.query_id = 1 AND
qr.author IS NOT NULL
AND qr.author NOT IN (
SELECT recipient
FROM customers_emails
WHERE
customer_id = 12 AND
days_since_sending >= 60
)
# we don't take by coincidence/bad luck 2 query results with the same author
GROUP BY
qr.author
ORDER BY
qr.query_result_id ASC
LIMIT
20
This is the expected output:
+-----------------------------+------------+--------------+
| query_result_id | author | email_nb |
+-----------------------------+------------+--------------+
| 10 | john | 7 |
| 3 | eric | 2 |
+-----------------------------+------------+--------------+
My challenge/difficulty today:
Notice on the 2nd line Eric is tied to email_nb 2 and not the max of all Eric's emails which could have been 4 if we had taken the max of email_nb across ALL messages to author=eric. but we stay within the limit of customer_id = 12 so there's only one left with email_nb = 2
Also notice that on the first line, the email_nb associated with query_result = 10 is 7, and not 3, which could have been the case as 3 is what appears in table customers_emails on the last line.
Indeed, for emails to 'john' I had the choice between email_nb 2, 7 and 3, but I take the highest, so it's 7, even if that email is from more than 60 days ago. This is very important and is the part I don't know how to do, because the perimeters are different: today I retrieve all the query_results where the author has NOT been sent an email in the past 60 days (see the NOT IN subquery), BUT the column must hold the max email_nb sent to john by customer_id=12 for query_id=1 EVEN if it was sent more than 60 days ago.
In other words, I don't want the max(email_nb) computed under the same WHERE clauses (such as days_since_sending >= 60) or within the same LIMIT and GROUP BY as my current query. What I need is the maximum value of email_nb for customer_id=12 AND query_id=1 sent to john across ALL records in the customers_emails table!
If there is no associated row in customers_emails at all (meaning no email has ever been sent by this customer for this query), then email_nb should be something like NULL.
This means I do NOT want this output:
+-----------------------------+------------+--------------+
| query_result_id | author | email_nb |
+-----------------------------+------------+--------------+
| 10 | john | 3 |
| 3 | eric | 2 |
+-----------------------------+------------+--------------+
How to achieve this in MySQL 5.6 ?

Since your question was a bit confusing, this is what I came up with.
select
max(q.query_result_id) as query_result_id,q.author,max(email_nb) as email_nb
from query_results q
left join customers_emails c on q.author=c.author
where customer_id=12 and query_id=1
group by q.author;

I think the best thing to do in a situation like this is break it down into smaller queries and then combine them together.
The first thing you want to do is this:
The specificity is that I make sure to retrieve query_results with an author who does not appear in any customer_email_id where the days_since_sending would be less than 60 days
This might look something like this:
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
This will get you the list of authors (with duplicates removed) that haven't had an email in the last 60 days that appear for the given query ID. Your next requirement is the following:
I need to have in the column the max email_nb sent to john by customer_id=12 and query_id=1 EVEN if it was sent more than 60 days ago
This query could look like this:
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
That gets you the maximum email_nb for each author/query_result combination, not taking into consideration the date at all.
The only thing left to do is reduce the set of results from the second query down to only the authors that appear in the first query. There are a few different methods for doing that. For example, you could INNER JOIN the two queries by author:
SELECT b.* FROM (
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
) b INNER JOIN (
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
) a ON a.author = b.author
You could use an IN clause instead:
SELECT b.* FROM (
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
) b
WHERE b.author IN (
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
)
There are most likely ways to speed this query up or shorten it, but if you need to do that, you now at least have a working query to compare the results against.

MySQL add avg of count by id to existing select with id

I'm not even sure what the title of this question should be, but let's start with my data.
I have a table of users who have taken a few lessons while belonging to a particular training center.
lesson table
id | lesson_id | user_id | has_completed
----------------------------------------
1 | asdf3314 | 2 | 1
2 | d13saf12 | 2 | 1
3 | a33adff5 | 2 | 0
4 | a33adff5 | 1 | 1
5 | d13saf12 | 1 | 0
user table
id | center_id | ...
----------------------------------------
1 | 20 | ...
2 | 30 | ...
training center table
id | center_name | ...
----------------------------------------
20 | learn.co | ...
30 | teach.co | ...
I've written a small chunk but am now stuck, as I don't know how to proceed. This statement counts the completed lessons per user, then computes the average of those counts for a center ID. If two users belong to a center and have completed 3 lessons and 2 lessons, it finds the average of 3 and 2 and returns that.
SELECT
FLOOR(AVG(a.total)) AS avg_completion
FROM
(SELECT
user_id,
user.center_id,
count(user_id) AS total
FROM lesson
LEFT JOIN user ON user.id = user_id
WHERE has_completed = 1 AND center_id = 2
GROUP BY user_id) AS a;
The question I have is how to loop through the training centers table and append the average from a select statement like the one above to each center that is queried. I can't seem to pass the center ID down to the subquery, so there must be a fundamentally different way to achieve the same query while also looping through training centers.
An example of desired result:
center.id | avg_completion | ...training center table
-----------------------------------------------------
20 | 2 | ...

Your main query needs to select a.center_id and then use GROUP BY center_id. You can then join it with the training_center table.
SELECT c.*, x.avg_completion
FROM training_center AS c
JOIN (
SELECT
a.center_id,
FLOOR(AVG(a.total)) AS avg_completion
FROM (
-- completed lessons per user, keeping the user's center
SELECT
user_id,
user.center_id,
count(*) AS total
FROM lesson
JOIN user ON user.id = user_id
WHERE has_completed = 1
GROUP BY user_id, user.center_id) AS a
GROUP BY a.center_id) AS x
ON x.center_id = c.id

If I understand correctly:
select u.center_id, count(*) as num_users,
sum(l.has_completed) as num_completed,
avg(l.has_completed) as completed_ratio
from lesson l join
user u
on l.user_id = u.id
group by u.center_id

MySQL SUM previous row by date column using Union

I am hoping I am just stumped because it's the end of the work day on a Monday, and that someone here can give me a hand.
Basically I have 2 tables that have invoice information and a table that has payment information. Using the following I get the first part of my display.
SELECT d.id, i.id as invid, i.company_id, d.total, created, adjustment FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
WHERE company_id = '69350'
UNION
SELECT id, 0, comp_id, amount_paid, uploaded_date, 'paid' FROM tbl_finance_invoice_paid_items
WHERE comp_id = '69350'
ORDER BY created
What I want to do is:
Create a new column called "Balance" that adds total to the previous total by the created column regardless of how the rest of the table is sorted.
To give a quick example, my current output is something like:
id | invid | company_id | total | created | adjustment
12 | 16 | 1 | 40 | 01/01/16| 0
100| 0 | 1 | 10 | 01/05/16| 0
50 | 20 | 1 | 50 | 05/01/16| 0
What my goal is would be:
id | invid | company_id | total | created | adjustment | balance |Notes
12 | 16 | 1 | 40 | 01/01/16| 0 | 40 | 0 + 40
100| 0 | 1 | 10 | 01/05/16| 1 | 50 | 40 + 10
50 | 20 | 1 | 50 | 05/01/16| 0 | 100 | 50 + 50
And regardless of sorting by id, invid, total, created, etc, the balance would always be tied to the created date.
So if I added a "Where adjustment = '1'" to my sql, I would get:
100| 0 | 1 | 10 | 01/05/16| 1 | 50 | 40 + 10

Since the OP confirmed my understanding in comments, I'm basing my answer on the following assumption:
The running total would be tied to the order of created_date. The
running total would only be affected by company id as a filtering
criterion, all other filters should be disregarded for that
calculation.
Since the running total may have a different ORDER BY and different filtering criteria than the rest of the query, the running total calculation has to be placed in a subquery.
The other assumption I have to make is that there cannot be more than one invoice with the same created date for a single customer id, since the original query in the OP does not have any group by or summing either.
I prefer to use the approach suggested by @OMG Ponies in this post on SO, where he initialises the MySQL variable holding the running total in a subquery, so there is no need to initialise the variable in a separate SET statement.
SELECT d.id, i.id as invid, i.company_id, rt.total, rt.cumulative_sum, rt.created, adjustment
FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
LEFT JOIN
(SELECT d.total, created, @running_total := @running_total + d.total AS cumulative_sum
FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
JOIN (SELECT @running_total := 0) r -- no join condition, so this produces a Cartesian join
WHERE company_id = '69350'
ORDER BY created) rt
ON i.created=rt.created -- this is also an assumption, I do not know which original table holds the created field
WHERE company_id = '69350' and adjustment=1
ORDER BY d.id
If you need to take the amounts from the tbl_finance_invoice_paid_items into account as well, then you need to add that to the subquery.
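As a rough illustration of folding the payments into the running total, here is a sketch that swaps the user-variable technique for a window function. That is a different approach from the one above and it assumes MySQL 8.0 or later, which the question does not confirm. The inner UNION ALL is taken from the question's own query; the aliases t and b and the column name balance are made up for the example. Rows that share the same created value receive the same balance, because the default window frame includes all peers.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
SELECT b.*
FROM (
    SELECT
        t.id, t.invid, t.company_id, t.total, t.created, t.adjustment,
        -- running total over invoices and payments, ordered by date only
        SUM(t.total) OVER (ORDER BY t.created) AS balance
    FROM (
        SELECT d.id, i.id AS invid, i.company_id, d.total, created, adjustment
        FROM tbl_finance_invoices AS i
        LEFT JOIN tbl_finance_invoice_details AS d ON d.invoice_id = i.id
        WHERE company_id = '69350'
        UNION ALL
        SELECT id, 0, comp_id, amount_paid, uploaded_date, 'paid'
        FROM tbl_finance_invoice_paid_items
        WHERE comp_id = '69350'
    ) AS t
) AS b
-- Filters and sorting applied out here no longer change the balance.
WHERE b.adjustment = '1'
ORDER BY b.id;
```

The balance is computed in the middle query before the outer WHERE runs, so filtering on adjustment or re-sorting by id leaves each row's balance tied to the created date.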

MYSQL to update values based on where clause and Join

I am very new to MYSQL, and need to know how to update a table based on average data, and also data in another table.
I have a list of grades out of 10 for pupils
user | score | average grade | Band
-----------------------------------
1 | 4 | 3.5 |
2 | 2 | 2 |
4 | 9 | 9 |
1 | 3 | 3 |
1 | 3.5 | |
2 | 2 | |
I want to update Band with a scale of A, B or C to indicate whether their average score is 0-3, 3-6, or 6-10.
Band A = 0-4
Band B = 4-7
Band C = 7-10
Sometimes there is a delay between a user registering for a test and the score being inputted (as in case of row 5) I want the band to be visible. So this is the final goal result.
user | score | average grade | Band
------------------------------------
1 | 4 | 3.5 | A
2 | 2 | 2 |
4 | 9 | 9 | C
1 | 3 | 3 | A
2 | NULL | 3.5 | A
2 | NULL | 2 |
Also, I want the band to be updated only if the user has paid a fee, so I have a separate table with this data.
User | Paid
-----------
1 | 1
2 | 0
3 | 0
4 | 1
So if a user hasn't paid, then average grade is updated, but Band remains empty (or unchanged if populated)
At present I have a score table and a user table. The user table is a view which calculates the average grade
The only way I can think of doing this without the view, is to have a cron job which runs every 10 minutes that inserts the Bands and average grade into grade table.

Ok firstly there is absolutely no reason why you should store average grade or bands in this table.
You can always calculate it as needed, or if you needed to cache it it could be stored in a separate table that has a single entry per user. As it stands, you are repeating the same information in many records. So actually you don't need an update at all, just drop both of those columns and use a better select statement.
To make things a little bit easier, you could add a band table to keep track of bands.
CREATE TABLE bands (
band varchar(2),
maxval int,
minval int
);
INSERT INTO bands (band,minval,maxval)
VALUES
('A',0,3),
('B',4,6),
('C',7,10);
You should probably index that but it's so small it might not matter. It lets you modify the bands in the future anyway. You could, of course, not use that table and use if statements instead, but I like this way, because then you can do:
SELECT sub.user,sub.average_score,bands.band
FROM (
SELECT u.user, avg(u.score) as average_score
FROM user_tests as u
GROUP BY u.user ) as sub
LEFT JOIN user_paid ON user_paid.user = sub.user
LEFT JOIN bands
ON sub.average_score >= bands.minval
AND sub.average_score <= bands.maxval
AND user_paid.paid = 1
whenever you need average score and band.
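One thing to watch with integer bounds such as 0-3 and 4-6 is that a fractional average like 3.5 falls between two bands and matches none. A possible variation, sketched here under assumptions, is to store half-open ranges (lower bound inclusive, upper bound exclusive) using the band limits from the question (A = 0-4, B = 4-7, C = 7-10). The table name bands_half_open is hypothetical, and the top bound of 10.01 is just one way to keep a perfect 10 inside band C.

```sql
-- Sketch only: bands_half_open is a made-up name for this example.
CREATE TABLE bands_half_open (
    band   VARCHAR(2),
    minval DECIMAL(4,2),  -- inclusive lower bound
    maxval DECIMAL(4,2)   -- exclusive upper bound
);

INSERT INTO bands_half_open (band, minval, maxval)
VALUES
    ('A', 0, 4),
    ('B', 4, 7),
    ('C', 7, 10.01);      -- slightly above 10 so an average of 10 still maps to C

SELECT sub.user, sub.average_score, b.band
FROM (
    SELECT u.user, AVG(u.score) AS average_score
    FROM user_tests AS u
    GROUP BY u.user
) AS sub
LEFT JOIN user_paid ON user_paid.user = sub.user
LEFT JOIN bands_half_open AS b
       ON sub.average_score >= b.minval
      AND sub.average_score <  b.maxval   -- half-open: no gaps, no overlaps
      AND user_paid.paid = 1;
```

With these ranges an average of 3.5 lands in band A and an average of exactly 4 lands in band B, so every value from 0 to 10 maps to exactly one band.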
Alternatively, if you want to keep those values stored somewhere, add the columns to your user paid table (make it user_info or something) and use this statement (which is really just wrapping the previous statement inside an update)
UPDATE user_info
INNER JOIN (
SELECT sub.user,sub.average_score,bands.band
FROM (
SELECT u.user, avg(u.score) as average_score
FROM user_tests as u
GROUP BY u.user ) as sub
LEFT JOIN user_info ON user_info.user = sub.user
LEFT JOIN bands
ON sub.average_score >= bands.minval
AND sub.average_score <= bands.maxval
AND user_info.paid = 1
) as upselect ON upselect.user = user_info.user
SET user_info.average_score = upselect.average_score,
user_info.band = upselect.band

Select where sum less than value

I am working on a PHP/MySQL timesheet system, and the report I want to create selects all employees who have worked less than the required amount of time between two dates.
The employee's time is stored in hours and minutes (INT), but I am only concerned with the hours.
The employee table looks like:
ID | Name
1 | George
2 | Fred
The timesheet_entry table:
ID | employeeID | hour | date
1 | 1 | 2 | 2013-07-25
2 | 2 | 4 | 2013-07-25
3 | 1 | 3 | 2013-07-25
So if I SELECT employees who have worked less than 5 hours (PHP variable hrsLimit) on 2013-07-25, it should return 2 Fred, as George has worked a total of 5 hours on that date.
I have a HTML form so the user can set the variables for the query.
I have tried:
SELECT employeeid,
employeename
FROM employee
JOIN timesheet_entry tse
ON tse.tse_employeeid = employeeid
AND Sum(tse.hour) < $hrslimit
I have not worried about the date yet.
The confusing bit here is that we are joining two tables. Perhaps I should select the hours and put the SUM clause at the end in a WHERE instead?

You need to group data and then place SUM condition in the HAVING part of the query.
select employee.id,
employee.Name,
Date,
sum(`hour`)
from timesheet_entry
join employee on timesheet_entry.employeeID=employee.ID
group by timesheet_entry.employeeID,date
having sum(`hour`)<$hrslimit
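If the report should total hours over a date range rather than per day, and should also list employees with no entries at all in that range, a variation along these lines might work. It is a sketch only: the two date literals and the limit of 5 stand in for the values coming from the HTML form, and the column names follow the employee and timesheet_entry tables shown in the question. In real PHP code those values should be bound as prepared-statement parameters rather than interpolated into the SQL string.

```sql
-- Sketch only: literal dates and the limit of 5 are placeholders for form input.
SELECT
    e.ID,
    e.Name,
    COALESCE(SUM(t.hour), 0) AS total_hours   -- 0 when there are no entries
FROM employee AS e
LEFT JOIN timesheet_entry AS t
       ON t.employeeID = e.ID
      AND t.date BETWEEN '2013-07-25' AND '2013-07-25'  -- range filter stays in ON
GROUP BY e.ID, e.Name
HAVING total_hours < 5;
```

Keeping the date condition in the ON clause rather than in WHERE is what preserves employees without matching rows. With the sample data this returns only Fred (4 hours), since George's entries add up to exactly 5.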
