Select most recent record grouped by 3 columns - mysql

I am trying to return the price of the most recent record for each combination of ItemNum and FeeSched; Customer can be dropped from the result. I am having trouble seeing how to do that in a reasonable way.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer  Price  ItemNum  FeeSched  Date
5         70.75  01202    12        12-06-2017
5         70.80  01202    12        06-07-2016
5         70.80  01202    12        07-21-2017
5         70.80  01202    12        10-26-2016
5         82.63  02144    61        12-06-2017
5         84.46  02144    61        06-07-2016
5         84.46  02144    61        07-21-2017
5         84.46  02144    61        10-26-2016
I don't have access to create temporary tables or views, and there is no such thing as a #variable in C-tree, but in most ways it acts like MySQL. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY I get an error.
I could run the query again only selecting ItemNum, FeeSched, Date and then doing an INNER JOIN, but with the query taking a minute to run each time, it seems there is a better way that maybe I don't know.
Here is my query I am running, it isn't really that complicated of a query other than the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0

The first three paragraphs make this sound quite simple, although the full query made it a little harder to follow.
However you produced the data posted above, once you have it in that shape it is easy to retrieve "the most recent record grouped by ItemNum and FeeSched".
How to:
First, sort the whole result set by Date DESC.
Second, select the fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregate functions.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
How it works:
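When your data is grouped and you select columns without aggregate functions, MySQL (with ONLY_FULL_GROUP_BY disabled) returns their values from a single row of each group. In practice that is often the first row it encounters, so after sorting by Date DESC it tends to be "the most recent record". Be aware that this is not guaranteed: MySQL documents the chosen row as nondeterministic, sorting beforehand is not guaranteed to influence which row is picked, and since MySQL 5.7 the default ONLY_FULL_GROUP_BY mode rejects the query outright. That is probably the same error you already get when Price is not in the GROUP BY.
Let me know if you run into any problems or errors with this approach.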

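You can also try this:
SELECT Price, ItemNum, FeeSched, Date FROM table WHERE (ItemNum, FeeSched, Date) IN (SELECT ItemNum, FeeSched, MAX(Date) FROM table GROUP BY ItemNum, FeeSched);
The inner query returns the maximum date for each ItemNum and FeeSched pair, and the IN condition keeps only the rows whose date is the maximum of their own group. Comparing Date alone would also match rows whose date happens to be the maximum of a different group. MySQL accepts this row-constructor form of IN; whether C-tree does is something you would need to test.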
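For reference, here is a minimal runnable sketch of the "join back to the per-group maximum" pattern, using Python's bundled sqlite3 module and the sample rows from the question. The table name prices and the column name PriceDate are made up for illustration, and the dates are stored as ISO text (YYYY-MM-DD) so that MAX() compares them chronologically; text in MM-DD-YYYY form, as produced by TO_CHAR(pdate, 'MM-DD-YYYY') in the original query, would sort by month first. Whether C-tree accepts the same SQL is an assumption.

```python
import sqlite3

# Hypothetical table and column names; the rows are the question's sample data
# with the dates rewritten as ISO strings.
LATEST_SQL = """
SELECT p.ItemNum, p.FeeSched, p.Price, p.PriceDate
FROM prices AS p
INNER JOIN (
    SELECT ItemNum, FeeSched, MAX(PriceDate) AS newest
    FROM prices
    GROUP BY ItemNum, FeeSched
) AS n
  ON  n.ItemNum  = p.ItemNum
  AND n.FeeSched = p.FeeSched
  AND n.newest   = p.PriceDate
ORDER BY p.ItemNum, p.FeeSched
"""

def latest_prices(conn):
    """Return one row per (ItemNum, FeeSched): the price on the newest date."""
    return conn.execute(LATEST_SQL).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (Customer INTEGER, Price REAL, ItemNum TEXT, "
    "FeeSched INTEGER, PriceDate TEXT)"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?, ?)",
    [
        (5, 70.75, "01202", 12, "2017-12-06"),
        (5, 70.80, "01202", 12, "2016-06-07"),
        (5, 70.80, "01202", 12, "2017-07-21"),
        (5, 70.80, "01202", 12, "2016-10-26"),
        (5, 82.63, "02144", 61, "2017-12-06"),
        (5, 84.46, "02144", 61, "2016-06-07"),
        (5, 84.46, "02144", 61, "2017-07-21"),
        (5, 84.46, "02144", 61, "2016-10-26"),
    ],
)

for row in latest_prices(conn):
    print(row)
```

If two rows in a group share the newest date, this pattern returns both of them, so a tie-breaker column may be needed on real data.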

Related

SQL: Select records based on comparison of two most recent associated records

Let's say we have a person table and survey table. survey is a set of attributes collected from a person at some point in time. Let's say survey has columns address and marriage_status
How do I select all persons whose address or marriage status has changed in the last survey?
Here's how I would write it if MySQL were able to magically interpret my intention:
SELECT *
FROM person
JOIN
(SELECT *
FROM survey
GROUP BY survey.person_id
ORDER BY survey.timestamp DESC
LIMIT 2 EACH) -- of course this part doesn't actually work. Trying to get last 2 records per person
surveys
ON surveys.person_id = person.id
WHERE surveys[0].address != surveys[1].address
OR surveys[0].marriage_status != surveys[1].marriage_status;
Or:
SELECT *
FROM person
JOIN
(SELECT MOST RECENT survey FOR EACH person) latest_survey
ON latest_survey.person_id = person.id
JOIN
(SELECT SECOND MOST RECENT survey FOR EACH person) previous_survey
ON previous_survey.person_id = person.id
WHERE latest_survey.address != previous_survey.address
OR latest_survey.marriage_status != previous_survey.marriage_status;
This seems like a relatively straightforward query, but it's driving me crazy. I suspect I have tunnel vision and I'm not approaching this the right way.
EDIT: I am on MySQL v5. Based on the first couple answers, it seems like this might be the time to migrate to v8 (among other reasons)
So here's how I ended up doing it. It's a little long, but I think it's pretty straightforward? This felt amazing to get working.
(Note that underscores are used as prefixes in table aliases to help keep track of subquery depth)
SELECT person.*
FROM person
JOIN (
-- Join full survey data against each 'most recent' survey timestamp
SELECT s1.*
FROM survey s1
JOIN (
-- get most recent timestamp for each person
SELECT _s1.person_id, MAX(_s1.timestamp) timestamp
FROM survey _s1
GROUP BY person_id
) latest_surveys
ON latest_surveys.person_id = s1.person_id and latest_surveys.timestamp = s1.timestamp
) latest
ON latest.person_id = person.id
JOIN (
-- Join full survey data against each 'SECOND most recent' survey timestamp
select s2.*
from survey s2
JOIN (
-- to get SECOND most recent survey timestamp, do similar query, but exclude latest timestamp
SELECT _s2.person_id, MAX(_s2.timestamp) timestamp
FROM survey _s2
JOIN (
-- get most recent timestamp for each person (again)
SELECT __s2.person_id, MAX(__s2.timestamp) timestamp
FROM survey __s2
GROUP BY person_id
) _latest_surveys
-- Note the *NOT* equal here
ON _latest_surveys.person_id = _s2.person_id and _latest_surveys.timestamp != _s2.timestamp
GROUP BY _s2.person_id
) previous_surveys
ON previous_surveys.person_id = s2.person_id and previous_surveys.timestamp = s2.timestamp
) previous
ON previous.person_id = person.id
WHERE latest.address != previous.address
OR latest.marriage_status != previous.marriage_status;
Analytic functions make your question much more tractable. If you are not yet using MySQL 8+, then now would be a good time to upgrade. Assuming you are using MySQL 8+, we can try:
WITH cte AS (
SELECT p.id, s.address, s.marriage_status, ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY s.timestamp DESC) rn
FROM person p
INNER JOIN survey s ON p.id = s.person_id
)
SELECT id
FROM cte
GROUP BY id
HAVING
MAX(CASE WHEN rn = 1 THEN address END) <> MAX(CASE WHEN rn = 2 THEN address END) OR
MAX(CASE WHEN rn = 1 THEN marriage_status END) <> MAX(CASE WHEN rn = 2 THEN marriage_status END);
The above query uses a pivot trick to isolate the latest, and second latest, addresses and marriage statuses for each person. It retains person id values for those whose latest and second latest addresses or marriage statuses are not identical.
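To see the pivot trick in action, here is a small sketch that runs the same idea against an in-memory SQLite database from Python (window functions need SQLite 3.25 or newer, which is an assumption about your Python build). The sample surveys and the column name ts, used in place of timestamp, are invented for illustration.

```python
import sqlite3

# rn = 1 is the latest survey of a person, rn = 2 the one before it.
CHANGED_SQL = """
WITH ranked AS (
    SELECT s.person_id, s.address, s.marriage_status,
           ROW_NUMBER() OVER (PARTITION BY s.person_id ORDER BY s.ts DESC) AS rn
    FROM survey AS s
)
SELECT person_id
FROM ranked
WHERE rn <= 2
GROUP BY person_id
HAVING MAX(CASE WHEN rn = 1 THEN address END)
       <> MAX(CASE WHEN rn = 2 THEN address END)
    OR MAX(CASE WHEN rn = 1 THEN marriage_status END)
       <> MAX(CASE WHEN rn = 2 THEN marriage_status END)
ORDER BY person_id
"""

def changed_people(conn):
    """Return ids of people whose last two surveys differ."""
    return [row[0] for row in conn.execute(CHANGED_SQL)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (person_id INTEGER, ts TEXT, address TEXT, marriage_status TEXT)"
)
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "A St", "single"),   # person 1: address changes
        (1, "2021-01-01", "B St", "single"),
        (2, "2020-01-01", "A St", "single"),   # person 2: status changes
        (2, "2021-01-01", "A St", "married"),
        (3, "2020-01-01", "A St", "single"),   # person 3: nothing changes
        (3, "2021-01-01", "A St", "single"),
        (4, "2021-01-01", "A St", "single"),   # person 4: only one survey
        (5, "2019-01-01", "A St", "single"),   # person 5: changed earlier,
        (5, "2020-01-01", "B St", "single"),   # but the last two surveys match
        (5, "2021-01-01", "B St", "single"),
    ],
)

print(changed_people(conn))
```

A person with a single survey has no rn = 2 row, so the comparison evaluates to NULL and that person is left out, which seems to be the behaviour the question wants.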
This might be how you can achieve that:
SELECT *
FROM person
JOIN (
SELECT *,
MAX(survey_date) latest_survey,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(survey_date ORDER BY person_id, survey_date ASC),',',-2),',',1) previous_survey,
SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-1) curadd,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-2),',',1) prevadd,
SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-1) curms,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-2),',',1) prevms
FROM survey GROUP BY person_id
HAVING curadd != prevadd OR curms != prevms) A
ON person.id=A.person_id;
This uses GROUP_CONCAT and SUBSTRING_INDEX to combine the values into one string, split them apart again, and compare the pieces at the end. There are other ways to do this without those functions; your second example, for instance, can be made to work, but I expect it to turn into a very long query. Since you are not on MySQL 8+, this query is much shorter, but its performance is a concern, especially on a large table. Note also that it splits on commas, so it misbehaves if an address itself contains a comma, and that GROUP_CONCAT truncates its result at group_concat_max_len (1024 bytes by default).
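The question does not say, but I hope you have at least MySQL 8 or something similar, so that you can use common table expressions. They can simplify a complex query like this one.
The tricky part is getting survey records #1 and #2 for each person. I would do it this way; see the definitions of cte1 and cte2. Note that this assumes survey.id increases with time, so that the highest id is the most recent survey.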
WITH
cte1 AS (
SELECT MAX(x1.id) AS id, x1.person_id
FROM survey x1
GROUP BY x1.person_id),
cte2 AS (
SELECT MAX(x2.id) AS id, x2.person_id
FROM survey x2
JOIN cte1 ON cte1.person_id = x2.person_id
AND cte1.id > x2.id
GROUP BY x2.person_id)
SELECT
p.*,
s1.address, s2.address address2,
s1.marriage_status, s2.marriage_status marriage_status2
FROM person AS p
JOIN (
cte1 JOIN survey s1 ON s1.id = cte1.id
) ON cte1.person_id = p.id
JOIN (
cte2 JOIN survey s2 ON s2.id = cte2.id
) ON cte2.person_id = p.id
WHERE
(s1.address <> s2.address)
OR (s1.marriage_status <> s2.marriage_status)
https://www.db-fiddle.com/f/hLwdHiZin4MkdUZ4aBz67H/2
Update: Thanks to Ian, I replaced MIN with MAX to get the most recent records.

MYSQL select max date from joined tables

I have 2 tables which I want to join and retrieve some specific data. These are my tables.
tbl_user (reg_id, l_name, f_name, status)
tbl_payments (pay_id, reg_id, mem_plan, from_date, to_date, bill_no, payed_date)
What I need to do is select and view the users who have due payments. To do that I want to take the users with status = 0 from tbl_user, join the two tables, and apply these conditions: to_date is not after the current date, the difference between the current date and to_date is less than 31 days, and only the row with the maximum to_date is kept for each user.
What I have so far gives me a result that satisfies these conditions, except that it doesn't filter by MAX(to_date). This is my query.
SELECT
A.reg_id,
A.f_name,
A.l_name,
B.mem_plan,
B.from_date,
Max(B.to_date) AS to_date,
B.bill_no,
B.payed_date
FROM
tbl_user A,
tbl_payments B
WHERE
A.status = 0
AND A.reg_id = B.reg_id
AND Date(Now()) >= Date(B.to_date)
AND Datediff(Date(Now()), Date(b.to_date)) < 31
GROUP BY
a.reg_id, b.mem_plan, b.from_date, b.bill_no, b.payed_date;
I'm not very familiar with MySQL, so could someone please tell me what I did wrong, or whether this query is not up to standard?
Here are some sample data to work on.
tbl_user ( [M1111,Jon, Doe,0], [M1112,Jane,Doe,1],[M1113,Jony,Doe,0] )
tbl_payment ( [1,M1111,Monthly,2018-05-14,2018-06-14,b123,2018-05-14],[2,M1112,3Months,2018-02-03,2018-05-03,b112,2018-02-03],[3,M1113,Monthly,2018-06-14,2018-07-14,b158,2018-06-14],[4,M1111,Monthly,2018-06-15,2018-07-15,b345,2018-06-15],[5,M1113,Monthly,2018-06-06,2018-07-06,b158,2018-06-06],[6,M1111,Monthly,2018-07-05,2018-08-05,b345,2018-07-05] )
Assuming the current date is 2018-07-17, the expected result should be this:
[M1111,Jon,Doe,Monthly,2018-06-15,2018-07-15,b345,2018-06-15],[M1113,Jony,Doe,Monthly,2018-06-14,2018-07-14,b158,2018-06-14]
Instead of that, my query gives me this.
[M1111,Jon,Doe,Monthly,2018-06-15,2018-07-15,b345,2018-06-15],[M1113,Jony,Doe,Monthly,2018-06-06,2018-07-06,b158,2018-06-06],
[M1113,Jony,Doe,Monthly,2018-06-14,2018-07-14,b158,2018-06-14]
I wrote another query which gives me exactly the result set I want, but I'm not sure whether it's up to standard. If someone can simplify it or make it better, I would appreciate it very much.
SELECT A.reg_id,A.f_name,A.l_name,D.mem_plan,D.from_date,D.to_date,D.bill_no,D.payed_date
FROM tbl_user A
JOIN (SELECT B.reg_id,B.mem_plan,B.from_date,B.to_date,B.bill_no,B.payed_date
FROM tbl_payments B
JOIN (
SELECT reg_id, MAX(to_date) as to_date
FROM tbl_payments
WHERE DATE(NOW()) >= DATE(to_date) AND DATEDIFF(DATE(NOW()), DATE(to_date))<31
GROUP BY reg_id) C
ON B.reg_id = C.reg_id AND B.to_date= C.to_date) D
ON A.reg_id = D.reg_id
WHERE A.status=0;
I believe HAVING won't work here, and that your second query is about as good as it gets. I've condensed it a little here:
SELECT A.reg_id,f_name,l_name,mem_plan,from_date,to_date,bill_no,payed_date
FROM tbl_user A
JOIN tbl_payments B ON A.reg_id = B.reg_id
JOIN (
SELECT reg_id, MAX(to_date) as max_to_date
FROM tbl_payments
WHERE DATE(NOW()) >= DATE(to_date) AND DATEDIFF(DATE(NOW()), DATE(to_date))<31
GROUP BY reg_id
) C ON B.reg_id = C.reg_id AND B.to_date= C.max_to_date
WHERE A.status=0;
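As a sanity check, the following sketch loads the sample data from the question into an in-memory SQLite database and runs the same join against the per-user MAX(to_date). It is only an approximation of the MySQL query: the current date is passed in as the parameter :today (fixed at 2018-07-17, as in the question) and DATEDIFF is replaced by a julianday() subtraction, both of which are SQLite-specific substitutions rather than part of the original.

```python
import sqlite3

DUE_SQL = """
SELECT u.reg_id, u.f_name, u.l_name,
       p.mem_plan, p.from_date, p.to_date, p.bill_no, p.payed_date
FROM tbl_user AS u
JOIN tbl_payments AS p ON p.reg_id = u.reg_id
JOIN (
    -- latest qualifying to_date per user
    SELECT reg_id, MAX(to_date) AS max_to_date
    FROM tbl_payments
    WHERE to_date <= :today
      AND julianday(:today) - julianday(to_date) < 31
    GROUP BY reg_id
) AS latest
  ON latest.reg_id = p.reg_id AND latest.max_to_date = p.to_date
WHERE u.status = 0
ORDER BY u.reg_id
"""

def due_payments(conn, today):
    """Return the most recent expired payment (within 31 days) per active user."""
    return conn.execute(DUE_SQL, {"today": today}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_user (reg_id TEXT, f_name TEXT, l_name TEXT, status INTEGER)")
conn.execute(
    "CREATE TABLE tbl_payments (pay_id INTEGER, reg_id TEXT, mem_plan TEXT, "
    "from_date TEXT, to_date TEXT, bill_no TEXT, payed_date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_user (reg_id, f_name, l_name, status) VALUES (?, ?, ?, ?)",
    [("M1111", "Jon", "Doe", 0), ("M1112", "Jane", "Doe", 1), ("M1113", "Jony", "Doe", 0)],
)
conn.executemany(
    "INSERT INTO tbl_payments VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "M1111", "Monthly", "2018-05-14", "2018-06-14", "b123", "2018-05-14"),
        (2, "M1112", "3Months", "2018-02-03", "2018-05-03", "b112", "2018-02-03"),
        (3, "M1113", "Monthly", "2018-06-14", "2018-07-14", "b158", "2018-06-14"),
        (4, "M1111", "Monthly", "2018-06-15", "2018-07-15", "b345", "2018-06-15"),
        (5, "M1113", "Monthly", "2018-06-06", "2018-07-06", "b158", "2018-06-06"),
        (6, "M1111", "Monthly", "2018-07-05", "2018-08-05", "b345", "2018-07-05"),
    ],
)

for row in due_payments(conn, "2018-07-17"):
    print(row)
```

With this data the query returns one row each for M1111 and M1113, matching the expected result listed in the question.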

finding change between records in MySQL

I have a table where I store the number of barrels held in each of many tanks. I store values every night at midnight, and at the beginning and end of any operator-initiated transfer.
What I want to return is the number of barrels difference since the previous event record for that specific tank. I have the correct ID for the self join to get the previous record number, however the barrels is incorrect.
Here is what I currently have.
SELECT
inventory.id,
MAX(inventory2.id) AS id2,
inventory.tankname,
inventory.barrels,
inventory.eventstamp,
inventory2.barrels
FROM
inventory
LEFT JOIN
inventory inventory2 ON inventory2.tankname = inventory.tankname AND inventory2.eventstamp < inventory.eventstamp
GROUP BY
inventory.id,
inventory.tankname,
inventory.barrels,
inventory.eventstamp
ORDER BY
inventory.tankname,
inventory.eventstamp
That returns the following
Just use correlated subqueries:
SELECT i.*,
(SELECT i2.id
FROM inventory i2
WHERE i2.tankname = i.tankname AND
i2.eventstamp < i.eventstamp
ORDER BY i2.eventstamp DESC
LIMIT 1
) as prev_id,
(SELECT i2.barrels
FROM inventory i2
WHERE i2.tankname = i.tankname AND
i2.eventstamp < i.eventstamp
ORDER BY i2.eventstamp DESC
LIMIT 1
) as prev_barrels
FROM inventory i
ORDER BY i.tankname, i.eventstamp;
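Your query doesn't work because inventory2.barrels is in the SELECT but is neither in the GROUP BY nor aggregated, so MySQL returns the value from an arbitrary row of each group rather than from the row with MAX(inventory2.id). That shouldn't be allowed in any database; it is unfortunate that MySQL does allow it when the ONLY_FULL_GROUP_BY mode is disabled.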
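Since the goal is the difference in barrels, the correlated subquery can be subtracted directly. Here is a small sketch of that, run against SQLite from Python; the tank names and readings are invented sample data, and the barrels_change alias is just an illustrative name.

```python
import sqlite3

# For each reading, look up the barrels value of the closest earlier reading
# of the same tank and subtract it. The first reading of a tank has no
# predecessor, so its change is NULL (None in Python).
DELTA_SQL = """
SELECT i.id, i.tankname, i.eventstamp, i.barrels,
       i.barrels - (
           SELECT i2.barrels
           FROM inventory AS i2
           WHERE i2.tankname = i.tankname
             AND i2.eventstamp < i.eventstamp
           ORDER BY i2.eventstamp DESC
           LIMIT 1
       ) AS barrels_change
FROM inventory AS i
ORDER BY i.tankname, i.eventstamp
"""

def barrel_changes(conn):
    """Return every reading together with its change since the previous one."""
    return conn.execute(DELTA_SQL).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (id INTEGER PRIMARY KEY, tankname TEXT, "
    "barrels INTEGER, eventstamp TEXT)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [
        (1, "T1", 100, "2024-01-01 00:00:00"),
        (2, "T2", 500, "2024-01-01 00:00:00"),
        (3, "T1", 80, "2024-01-01 09:30:00"),
        (4, "T1", 120, "2024-01-02 00:00:00"),
        (5, "T2", 450, "2024-01-02 00:00:00"),
    ],
)

for row in barrel_changes(conn):
    print(row)
```

On a large table an index on (tankname, eventstamp) is likely to matter, since the subquery runs once per row.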

MySQL GROUP BY grouping by lowest field value

I'm trying to fetch the lowest price per day per hotel, but I get multiple results.
I first try to fetch the lowest amount with the MIN() function, then inner join.
When I later try to group by outside the subquery, it just groups by the lowest id.
The SQL itself:
SELECT mt.id, mt.amount, mt.fk_hotel, mt.start_date
FROM price mt
INNER JOIN
(
SELECT price.id, MIN(price.amount) minAmount
FROM price
WHERE 1=1 AND price.start_date >= '2014-10-08' AND price.start_date <= '2014-10-10' AND price.active = 1 AND price.max_people = 2
GROUP BY id
) t
ON mt.id = t.id AND mt.amount = t.minAmount
ORDER BY mt.fk_hotel, mt.amount;
And the results looks like this:
http://jsfiddle.net/63mg3b2j/
I want to group by the start date and fk_hotel so that it groups by the lowest amount value, can anybody help me? Am I being clear?
Edit: I also need a field fk_room from the corresponding row, so i can inner join
Try this:
SELECT MIN(mt.amount) AS min_amount, mt.fk_hotel, mt.start_date
FROM price mt
WHERE
mt.active = 1 AND
mt.max_people = 2 AND
mt.start_date >= '2014-10-08' AND mt.start_date <= '2014-10-10'
GROUP BY mt.fk_hotel, mt.start_date
ORDER BY mt.fk_hotel, min_amount;
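First build a derived table with the minimum value in the top row using ORDER BY, then GROUP BY to get the required result. Be aware that this relies on MySQL returning the first row of each group, which is not guaranteed, and that it is rejected when ONLY_FULL_GROUP_BY is enabled:
SELECT mt.id, mt.amount, mt.fk_hotel, mt.start_date
FROM
(SELECT id, amount, fk_hotel, start_date
FROM price
WHERE start_date >= '2014-10-08' AND start_date <= '2014-10-10'
AND active = 1 AND max_people = 2
ORDER BY amount ASC) AS mt
GROUP BY mt.fk_hotel, mt.start_date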
In the end I still had to go with a subquery, because I needed some additional foreign key fields from the corresponding row to inner join other tables. It isn't a great solution, because it fetches too much; the rest is filtered out programmatically.
The most annoying thing is that when I use MIN() or MAX() and also select other fields from the table, those fields come from the first row found in the group rather than from the row holding the minimum, which is incorrect, so I have to inner join a subquery to get the other fields. I could add the fields to the GROUP BY, but there were too many of them. Maybe I'm missing something. The amount of data doesn't grow over time, so I guess it works for me. This is the final SQL I came up with, for future reference:
SELECT mt.*, roomtype.name roomname, hotel.name hotelname
FROM booking.price mt
INNER JOIN roomtype ON roomtype.id = mt.fk_roomtype
INNER JOIN hotel ON hotel.id = mt.fk_hotel
INNER JOIN(
SELECT price.id, MIN(price.amount) minAmount
FROM booking.price WHERE 1=1 AND price.start_date >= '2014-10-22' AND price.start_date <= '2014-10-31' AND price.max_people = 2 AND price.active = 1
GROUP BY id
) t
ON mt.id = t.id AND mt.amount = t.minAmount
ORDER BY mt.start_date, mt.amount
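For comparison, here is a sketch of a variant that groups the subquery by hotel and day instead of by id, then joins back to pick up the remaining columns of the cheapest row (including the room foreign key). It runs against SQLite from Python with invented sample rows; the parameter names :first_day and :last_day are illustrative, and this is one possible approach rather than the query used above.

```python
import sqlite3

CHEAPEST_SQL = """
SELECT p.id, p.fk_hotel, p.fk_roomtype, p.start_date, p.amount
FROM price AS p
JOIN (
    -- lowest amount per hotel per day among the rows that qualify
    SELECT fk_hotel, start_date, MIN(amount) AS min_amount
    FROM price
    WHERE active = 1 AND max_people = 2
      AND start_date BETWEEN :first_day AND :last_day
    GROUP BY fk_hotel, start_date
) AS t
  ON  t.fk_hotel   = p.fk_hotel
  AND t.start_date = p.start_date
  AND t.min_amount = p.amount
-- repeat the row filters so a non-qualifying row with the same amount is not matched
WHERE p.active = 1 AND p.max_people = 2
ORDER BY p.fk_hotel, p.start_date
"""

def cheapest_per_hotel_day(conn, first_day, last_day):
    """Return the full price row holding the minimum amount per hotel and day."""
    params = {"first_day": first_day, "last_day": last_day}
    return conn.execute(CHEAPEST_SQL, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price (id INTEGER PRIMARY KEY, fk_hotel INTEGER, fk_roomtype INTEGER, "
    "start_date TEXT, amount REAL, active INTEGER, max_people INTEGER)"
)
conn.executemany(
    "INSERT INTO price VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, "2014-10-08", 120.0, 1, 2),
        (2, 1, 11, "2014-10-08", 95.0, 1, 2),
        (3, 1, 10, "2014-10-09", 110.0, 1, 2),
        (4, 1, 11, "2014-10-09", 130.0, 1, 2),
        (5, 2, 20, "2014-10-08", 80.0, 1, 2),
        (6, 2, 21, "2014-10-08", 60.0, 0, 2),  # inactive, ignored
        (7, 2, 20, "2014-10-11", 50.0, 1, 2),  # outside the date range
    ],
)

for row in cheapest_per_hotel_day(conn, "2014-10-08", "2014-10-10"):
    print(row)
```

If two rooms tie for the lowest amount on the same day, both rows come back, so a tie-breaker may still be needed.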

Query output differs from the expected output

Below query is doing what I need:
SELECT assign.from_uid, assign.aid, assign.message, curriculum.asset,
curriculum.title, curriculum.description
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum
ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13 AND assign.status = 1
GROUP BY assign.from_uid, assign.to_uid, assign.nid
ORDER BY assign.created DESC
Now I need to get the total number of rows in the result. For example, if it displays 5 rows, the output should look like my expected output below. The query I tried is given below.
SELECT count(description) FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13 AND assign.status = 1
GROUP BY assign.from_uid, assign.to_uid, assign.nid
ORDER BY assign.created DESC
My expected o/p:
count(*)
---------
5
My current o/p:
count(*)
---------
6
2
5
6
6
The easiest solution would be to place your initial GROUP BY query in a subselect and then select the number of rows retrieved from that subselect.
SQL Statement
SELECT COUNT(*)
FROM (
SELECT assign.from_uid
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13
AND assign.status = 1
GROUP BY
assign.from_uid
, assign.to_uid
, assign.nid
) q
Edit - why doesn't the original query return the required result?
Your original query had already prepared everything needed to get the correct result.
Your query without grouping returns a result set of 25 records (6+2+5+6+6).
From these 25 records, you have 5 unique combinations of from_uid, to_uid, nid.
What you want to count is not how many records each combination has (as you did in your example) but how many unique (distinct, anyone?) combinations there are.
One solution to this is the subselect I presented, but the following equivalent statement using a DISTINCT clause might be more comprehensible.
SELECT COUNT(*)
FROM (
SELECT DISTINCT assign.from_uid
, assign.to_uid
, assign.nid
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13
AND assign.status = 1
) q
Note that my personal preference goes to the GROUP BY solution.
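The difference between "rows per group" and "number of groups" can be reproduced with a small sketch. It uses SQLite from Python and made-up data arranged so that the join yields 25 rows in 5 groups (6+2+5+6+6), mirroring the numbers above; the uid and nid values themselves are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE assignment (aid INTEGER PRIMARY KEY, from_uid INTEGER,
                             to_uid INTEGER, nid INTEGER, status INTEGER);
    CREATE TABLE curriculum_topics_assets (asset INTEGER, description TEXT);
    """
)

# Five assignments for user 13; each asset appears in several topic rows.
topic_rows = {101: 6, 102: 2, 103: 5, 104: 6, 105: 6}
for i, (nid, n) in enumerate(topic_rows.items(), start=1):
    conn.execute("INSERT INTO assignment VALUES (?, ?, 13, ?, 1)", (i, 20 + i, nid))
    conn.executemany(
        "INSERT INTO curriculum_topics_assets VALUES (?, ?)",
        [(nid, f"topic {k}") for k in range(n)],
    )

# COUNT with GROUP BY: one count per group, so five result rows.
PER_GROUP_SQL = """
SELECT COUNT(description)
FROM assignment AS a
JOIN curriculum_topics_assets AS c ON a.nid = c.asset
WHERE a.to_uid = 13 AND a.status = 1
GROUP BY a.from_uid, a.to_uid, a.nid
"""

# Wrapping the grouped query counts the groups themselves: a single row.
TOTAL_SQL = """
SELECT COUNT(*)
FROM (
    SELECT a.from_uid
    FROM assignment AS a
    JOIN curriculum_topics_assets AS c ON a.nid = c.asset
    WHERE a.to_uid = 13 AND a.status = 1
    GROUP BY a.from_uid, a.to_uid, a.nid
) AS q
"""

per_group = [row[0] for row in conn.execute(PER_GROUP_SQL)]
total = conn.execute(TOTAL_SQL).fetchone()[0]
print(per_group, total)
```

The first query prints the five per-group counts, while the wrapped query prints the single number of groups.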
To get the number of rows for a query do:
SELECT COUNT(*) as RowCount FROM (--insert other query here--) s
In your example:
SELECT COUNT(*) as RowCount FROM (SELECT a.from_uid
FROM assignment a
INNER JOIN curriculum_topics_assets c ON a.nid = c.asset
WHERE a.to_uid = 13
AND a.status = 1
GROUP BY a.from_uid, a.to_uid, a.nid
) s
Note that I dropped the parts that have no effect on the number of rows, to make the query run slightly faster.
You should use COUNT(*) instead of count(description). Look at: http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/