MySQL: getting the data from the row with the max value of another column for each group - mysql

The table looks like this:
group_id | time | data | others...
1 2020-11-30 12:00:00 13
1 2020-11-30 13:00:00 15
2 2020-11-30 12:30:00 254
3 2021-02-21 18:00:00 25565
...
I want to get the "data" at the latest "time" (which can differ between groups) for each of the specified groups,
so I wrote something like
SELECT group_id, max(time), data
FROM table
where group_id between (...)
(and others = ...)
group by group_id
but the data returned was not the value at that time (it actually seems to come from the latest record).
How can I get the data at the maximum time for each group?

Since you have a primary key on (group_id, time), you can use a standard exclusion join instead of a GROUP BY, because the records are unambiguous.
To obtain the group_id and MAX(time) together with the corresponding other values, LEFT JOIN table AS b ON a.group_id = b.group_id AND a.time < b.time. Then remove the undesired entries by filtering the left join results with WHERE b.group_id IS NULL.
Exclusion Join
DB-Fiddle
SELECT t1.group_id, t1.`time`, t1.`data`
FROM `table` AS t1
LEFT JOIN `table` AS t2
ON t2.group_id = t1.`group_id`
AND t1.`time` < t2.`time`
WHERE t1.group_id BETWEEN ...
AND t1.others = ...
AND t2.group_id IS NULL;
Result:
| group_id | time | data |
| -------- | ------------------- | ----- |
| 1 | 2020-11-30 13:00:00 | 15 |
| 2 | 2020-11-30 12:30:00 | 254 |
| 3 | 2021-02-21 18:00:00 | 25565 |
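The exclusion join can be checked against the sample rows with a small script. This is only a sketch: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name readings is an assumption, since the question does not name the table.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " group_id INTEGER, time TEXT, data INTEGER,"
    " PRIMARY KEY (group_id, time))"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2020-11-30 12:00:00", 13),
        (1, "2020-11-30 13:00:00", 15),
        (2, "2020-11-30 12:30:00", 254),
        (3, "2021-02-21 18:00:00", 25565),
    ],
)

# Exclusion join: keep a row only when no row of the same group has a later time.
latest_rows = conn.execute(
    """
    SELECT t1.group_id, t1.time, t1.data
    FROM readings AS t1
    LEFT JOIN readings AS t2
      ON t2.group_id = t1.group_id
     AND t1.time < t2.time
    WHERE t2.group_id IS NULL
    ORDER BY t1.group_id
    """
).fetchall()

for row in latest_rows:
    print(row)
```

The ISO-style timestamps are stored as text here, which still sorts chronologically.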
Alternatively, since (group_id, time) is your primary key, you can use a compound IN subquery criterion with MAX(time).
Compound IN Subquery
DB-Fiddle
SELECT t1.group_id, t1.`time`, t1.`data`
FROM `table` AS t1
WHERE t1.others = ....
AND (t1.group_id, t1.`time`) IN(
SELECT group_id, MAX(`time`)
FROM `table`
WHERE group_id BETWEEN ....
GROUP BY group_id
);
Result:
| group_id | time | data |
| -------- | ------------------- | ----- |
| 1 | 2020-11-30 13:00:00 | 15 |
| 2 | 2020-11-30 12:30:00 | 254 |
| 3 | 2021-02-21 18:00:00 | 25565 |
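A comparable sketch for the compound IN variant, again with sqlite3 standing in for MySQL and a hypothetical table name readings. The BETWEEN bounds (1 and 2) are made up for illustration because the original elides them. Row values in IN need SQLite 3.15 or newer.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " group_id INTEGER, time TEXT, data INTEGER,"
    " PRIMARY KEY (group_id, time))"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2020-11-30 12:00:00", 13),
        (1, "2020-11-30 13:00:00", 15),
        (2, "2020-11-30 12:30:00", 254),
        (3, "2021-02-21 18:00:00", 25565),
    ],
)

# The subquery yields one (group_id, MAX(time)) pair per group; the outer
# query keeps only the rows whose (group_id, time) matches one of those pairs.
rows_in_range = conn.execute(
    """
    SELECT t1.group_id, t1.time, t1.data
    FROM readings AS t1
    WHERE (t1.group_id, t1.time) IN (
        SELECT group_id, MAX(time)
        FROM readings
        WHERE group_id BETWEEN ? AND ?  -- illustrative bounds
        GROUP BY group_id
    )
    ORDER BY t1.group_id
    """,
    (1, 2),
).fetchall()

for row in rows_in_range:
    print(row)
```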

I found an answer to this myself, using a JOIN:
with tmp as (
SELECT group_id, max(time) as time FROM table
WHERE group_id between (...)
(and others = ...)
group by group_id
)
SELECT r.data FROM table as r
INNER JOIN tmp ON r.group_id = tmp.group_id and r.time = tmp.time
where r.group_id between (...)
(and others = ...)

Related

MySQL - Return Latest Date and Total Sum from two rows in a column for multiple entries

For every ID_Number, there is a bill_date and then two types of bills that happen. I want to return the latest date (max date) for each ID number and then add together the two types of bill amounts. So, based on the table below, it should return:
| 1 | 201604 | 10.00 | |
| 2 | 201701 | 28.00 | |
tbl_charges
+-----------+-----------+-----------+--------+
| ID_Number | Bill_Date | Bill_Type | Amount |
+-----------+-----------+-----------+--------+
| 1 | 201601 | A | 5.00 |
| 1 | 201601 | B | 7.00 |
| 1 | 201604 | A | 4.00 |
| 1 | 201604 | B | 6.00 |
| 2 | 201701 | A | 15.00 |
| 2 | 201701 | B | 13.00 |
+-----------+-----------+-----------+--------+
Then, if possible, I want to be able to do this in a join in another query, using ID_Number as the column for the join. Would that change the query here?
Note: I am initially only wanting to run the query for about 200 distinct ID_Numbers out of about 10 million. I will be adding an 'IN' clause for those IDs. When I do the join for the final product, I will need to know how to get those latest dates out of all the other join possibilities. (ie, how do I get ID_Number 1 to join with 201604 and not 201601?)
I would use NOT EXISTS and GROUP BY
select t1.id_number, max(t1.bill_date), sum(t1.amount)
from tbl_charges t1
where not exists (
select 1
from tbl_charges t2
where t1.id_number = t2.id_number and
t1.bill_date < t2.bill_date
)
group by t1.id_number
The NOT EXISTS filters out the irrelevant rows and the GROUP BY does the sum.
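As a quick check, the NOT EXISTS plus GROUP BY approach can be run against the sample tbl_charges rows. This sketch uses Python's sqlite3 as a stand-in for MySQL; the column types are assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_charges ("
    " id_number INTEGER, bill_date INTEGER, bill_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO tbl_charges VALUES (?, ?, ?, ?)",
    [
        (1, 201601, "A", 5.00),
        (1, 201601, "B", 7.00),
        (1, 201604, "A", 4.00),
        (1, 201604, "B", 6.00),
        (2, 201701, "A", 15.00),
        (2, 201701, "B", 13.00),
    ],
)

# NOT EXISTS drops every row that has a later bill_date for the same id;
# GROUP BY then sums the amounts that remain on the latest date.
latest_totals = conn.execute(
    """
    SELECT t1.id_number, MAX(t1.bill_date), SUM(t1.amount)
    FROM tbl_charges t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM tbl_charges t2
        WHERE t1.id_number = t2.id_number
          AND t1.bill_date < t2.bill_date
    )
    GROUP BY t1.id_number
    ORDER BY t1.id_number
    """
).fetchall()

for row in latest_totals:
    print(row)
```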
I would be inclined to filter in the where:
select id_number, sum(c.amount)
from tbl_charges c
where c.bill_date = (select max(c2.bill_date)
from tbl_charges c2
where c2.id_number = c.id_number and c2.bill_type = c.bill_type
)
group by id_number;
Or, another fun way is to use in with tuples:
select id_number, sum(c.amount)
from tbl_charges c
where (c.id_number, c.bill_type, c.bill_date) in
(select c2.id_number, c2.bill_type, max(c2.bill_date)
from tbl_charges c2
group by c2.id_number, c2.bill_type
)
group by id_number;

mySQL largest value by unique entry

I have tried to solve the following problem for the last couple of hours and could not find anything that pointed me in the right direction on Google or Stackoverflow. I believe that this could be a similar problem, but I did not really understand what the author wanted to achieve, hence I am trying it with my own concrete example:
I have a table that basically tracks prices of different products over time:
+------------+--------+----------+
| Product_id | Price | Time |
+------------+--------+----------+
| 1 | 1.30 | 13:00:00 |
| 1 | 1.10 | 13:30:00 |
| 1 | 1.50 | 14:00:00 |
| 1 | 1.60 | 14:30:00 |
| 2 | 2.10 | 13:00:00 |
| 2 | 2.50 | 13:30:00 |
| 2 | 1.90 | 14:00:00 |
| 2 | 2.00 | 14:30:00 |
| 3 | 1.45 | 13:00:00 |
| 3 | 1.15 | 13:30:00 |
| 3 | 1.50 | 14:00:00 |
| 3 | 1.55 | 14:30:00 |
+------------+--------+----------+
I would now like to query the table so that the rows with max. Price for each product are returned:
+------------+--------+----------+
| Product_id | Price | Time |
+------------+--------+----------+
| 1 | 1.60 | 14:30:00 |
| 2 | 2.50 | 13:30:00 |
| 3 | 1.55 | 14:30:00 |
+------------+--------+----------+
Also, in case of duplicates, i.e. if there is a max. Price at two different points in time, it should only return one row, preferably the one with the smallest value of time.
I have tried MAX() and GREATEST(), but could not achieve the desired outcome to show the wanted values for each product. Efficiency of the query is not the most important factor, but I have about 500 different products with several million rows of data, hence splitting the table by unique product did not seem like an appropriate solution.
Group the data by product id, pick the max price, and take the min time at that price:
select t1.product_id, t1.price, min(t1.time) as time
from your_table t1
inner join (
select Product_id, max(price) as price
from your_table
group by Product_id
) t2 on t1.Product_id = t2.Product_id and t1.price = t2.price
group by t1.product_id, t1.price
Sql Fiddle Example:
http://sqlfiddle.com/#!9/020c3/9
http://sqlfiddle.com/#!9/020c3/1
SELECT p.*
FROM prices p
LEFT JOIN prices p1
ON p.product_id = p1.product_id
AND p.time<p1.time
WHERE p1.product_id IS NULL
If you need the maximum price instead, you can use:
http://sqlfiddle.com/#!9/020c3/6
SELECT p.*
FROM prices p
LEFT JOIN prices p1
ON p.product_id = p1.product_id
AND p.price<p1.price
WHERE p1.product_id IS NULL;
And the last approach, since I didn't get the goal from the beginning:
http://sqlfiddle.com/#!9/ace04/2
SELECT p.*
FROM prices p
LEFT JOIN prices p1
ON p.product_id = p1.product_id
AND (
p.price<p1.price
OR (p.price=p1.price AND p.time>p1.time) -- on a price tie, keep the earliest time
)
WHERE p1.product_id IS NULL;
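A small sketch of the tie-breaking exclusion join, run with Python's sqlite3 as a stand-in for MySQL. The table name prices comes from the answer; the rows are a reduced sample with one tied maximum added as an assumption, and the tie is resolved toward the earliest time, as the question prefers.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product_id INTEGER, price REAL, time TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, 1.30, "13:00:00"),
        (1, 1.60, "14:00:00"),  # tied maximum, earlier time
        (1, 1.60, "14:30:00"),  # tied maximum, later time
        (2, 2.50, "13:30:00"),
        (2, 2.00, "14:30:00"),
    ],
)

# A row is excluded when another row of the same product has a higher price,
# or the same price at an earlier time.
max_price_rows = conn.execute(
    """
    SELECT p.product_id, p.price, p.time
    FROM prices p
    LEFT JOIN prices p1
      ON p.product_id = p1.product_id
     AND (
          p.price < p1.price
          OR (p.price = p1.price AND p.time > p1.time)
     )
    WHERE p1.product_id IS NULL
    ORDER BY p.product_id
    """
).fetchall()

for row in max_price_rows:
    print(row)
```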
This solution is assuming the existence of an additional my_table.id column that needs to be used in case there are duplicate values for (Product_id, price, time) in your table. id is assumed to be a unique value in the table.
SELECT *
FROM my_table t1
WHERE NOT EXISTS (
SELECT *
FROM my_table t2
WHERE t1.Product_id = t2.Product_id
AND ((t1.price < t2.price) OR
(t1.price = t2.price AND t1.time > t2.time) OR
(t1.price = t2.price AND t1.time = t2.time AND t1.id > t2.id))
)
Alternatively, the predicate on price and time could also be expressed using a row value expression predicate (not sure if it's more readable, as t1 and t2 columns are mixed in each row value expression):
SELECT *
FROM my_table t1
WHERE NOT EXISTS (
SELECT *
FROM my_table t2
WHERE t1.Product_id = t2.Product_id
AND (t1.price, t2.time, t2.id) < (t2.price, t1.time, t1.id)
)
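To see that the expanded predicate and the row value form select the same rows, both can be run side by side. This sketch uses Python's sqlite3 (row value comparisons need SQLite 3.15 or newer) as a stand-in for MySQL; the sample rows, including the exact duplicate, are assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY, Product_id INTEGER, price REAL, time TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1.60, "14:00:00"),
        (2, 1, 1.60, "14:00:00"),  # exact duplicate of id 1 apart from id
        (3, 1, 1.30, "13:00:00"),
        (4, 2, 2.50, "13:30:00"),
    ],
)

# Expanded form: higher price wins, then earlier time, then lower id.
EXPANDED = """
    SELECT t1.id, t1.Product_id, t1.price, t1.time
    FROM my_table t1
    WHERE NOT EXISTS (
        SELECT * FROM my_table t2
        WHERE t1.Product_id = t2.Product_id
          AND ((t1.price < t2.price) OR
               (t1.price = t2.price AND t1.time > t2.time) OR
               (t1.price = t2.price AND t1.time = t2.time AND t1.id > t2.id))
    )
    ORDER BY t1.Product_id
"""

# Row value form: the same ordering as one lexicographic comparison.
ROW_VALUE = """
    SELECT t1.id, t1.Product_id, t1.price, t1.time
    FROM my_table t1
    WHERE NOT EXISTS (
        SELECT * FROM my_table t2
        WHERE t1.Product_id = t2.Product_id
          AND (t1.price, t2.time, t2.id) < (t2.price, t1.time, t1.id)
    )
    ORDER BY t1.Product_id
"""

expanded_rows = conn.execute(EXPANDED).fetchall()
row_value_rows = conn.execute(ROW_VALUE).fetchall()
print(expanded_rows)
print(row_value_rows)
```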

How to update foreign key with the most recent associated record's id in MySQL

What is the most performant way to generate the latest_entry_id on checks table from the entries with the same user_id, with the newest start_date that is prior to create_date of the check?
Before:
checks table
id | user_id | create_date | latest_entry_id
------------------------------------------------------
1 | 1 | 2012-01-01 | NULL
2 | 2 | 2012-01-01 | NULL
entries table
id | user_id | start_date
-------------------------------------
1 | 1 | 2012-02-01
2 | 1 | 2011-01-01
3 | 2 | 2011-09-01
4 | 2 | 2011-10-01
After:
checks table
id | user_id | create_date | latest_entry_id
------------------------------------------------------
1 | 1 | 2012-01-01 | 2
2 | 2 | 2012-01-01 | 4
I think this can make it work:
UPDATE checks c
INNER JOIN (
SELECT e.user_id,
MAX(e.id) AS ID
FROM entries e
INNER JOIN checks c1 ON e.user_id = c1.user_id
AND e.start_date < c1.create_date
GROUP BY e.user_id
) a ON a.user_id = c.user_id
SET c.latest_entry_id = a.id;
sqlfiddle demo
p.s. Your second row in your expected results is not consistent with your requirements. latest_entry_id should be 4, not 3.
The best query I came up with is this
Update checks INNER JOIN
(
SELECT checks.id AS c_id, MAX(entries.start_date) AS max_start_date
FROM checks LEFT OUTER JOIN entries ON checks.user_id = entries.user_id
WHERE entries.start_date < checks.create_date
GROUP BY checks.id
) AS tmp
ON checks.id = tmp.c_id
LEFT JOIN entries
ON tmp.max_start_date = entries.start_date AND checks.user_id = entries.user_id
SET checks.latest_entry_id = entries.id
This query is performant and avoids running one subquery per check. If you know of another performant way to do this update to get the same results, I would like to hear your way.
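The expected "After" state can be verified with a small script. Note the swap: MySQL's multi-table UPDATE ... JOIN syntax is not available in SQLite, so this sketch uses Python's sqlite3 with a correlated subquery (one lookup per check, which is exactly what the answers above try to avoid). It is meant only to confirm the target result, not to stand in for the performant form.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checks ("
    " id INTEGER PRIMARY KEY, user_id INTEGER,"
    " create_date TEXT, latest_entry_id INTEGER)"
)
conn.execute(
    "CREATE TABLE entries ("
    " id INTEGER PRIMARY KEY, user_id INTEGER, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO checks VALUES (?, ?, ?, NULL)",
    [(1, 1, "2012-01-01"), (2, 2, "2012-01-01")],
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        (1, 1, "2012-02-01"),
        (2, 1, "2011-01-01"),
        (3, 2, "2011-09-01"),
        (4, 2, "2011-10-01"),
    ],
)

# For each check, pick the entry of the same user with the newest start_date
# that is still before the check's create_date.
conn.execute(
    """
    UPDATE checks
    SET latest_entry_id = (
        SELECT e.id
        FROM entries e
        WHERE e.user_id = checks.user_id
          AND e.start_date < checks.create_date
        ORDER BY e.start_date DESC
        LIMIT 1
    )
    """
)

updated_checks = conn.execute(
    "SELECT id, latest_entry_id FROM checks ORDER BY id"
).fetchall()
print(updated_checks)
```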

unable to join two tables, one with multiple rows

I have two tables that I am attempting to join in MySQL:
reviews:
| review_id | comment | reviewer_id | user_id |
-----------------------------------------------------------
| 1 | some text. | 501 | 100 |
| 2 | lorem ipsum | 606 | 100 |
| 3 | blah blah. | 798 | 120 |
| 4 | foo bar! | 798 | 133 |
-----------------------------------------------------------
review_status:
| review_id | status | timestamp |
----------------------------------------
| 1 | 10 | 1364507521 |
| 1 | 101 | 1364508057 |
| 2 | 100 | 1364509033 |
| 1 | 150 | 1364509149 |
| 2 | 120 | 1364509283 |
| 2 | 122 | 1364855948 |
| 3 | 120 | 1364509283 |
| 3 | 122 | 1364855948 |
| 1 | 110 | 1364855945 |
| 4 | 100 | 1364509283 |
| 4 | 115 | 1364855948 |
| 4 | 210 | 1364855945 |
----------------------------------------
What I WANT is a result that looks something like this:
result
| review_id | comment | reviewer_id | user_id | status | timestamp |
--------------------------------------------------------------------------
| 1 | some text. | 501 | 100 | 200 | 1364855945 |
| 2 | lorem ipsum | 606 | 120 | 122 | 1364855948 |
--------------------------------------------------------------------------
I'm after: 1) The newest entry from the review_status table 2) A certain range of status codes (100 - 199 in this case) 3) And multiple user_id's from the review table.
This is currently my query, that I can't get to work for the life of me:
SELECT r.review_id, r.comment, r.reviewer_id, r.user_id
FROM reviews AS r
INNER JOIN
(SELECT s.status, max(s.timestamp)
FROM review_status AS s
WHERE s.status < 200
AND s.status > 99;
GROUP BY s.review_id) AS r_s
ON r.review_id = r_s.review_id
WHERE r.user_id IN (100,120);
Any help is greatly appreciated! Thanks.
You have a few issues with your current query:
The subquery does not return review_id, so you cannot use it in the join.
There is an extra semicolon in the subquery.
I might suggest rewriting the query to use the following:
SELECT r.review_id, r.comment, r.reviewer_id, r.user_id,
rs.status, rs.timestamp
FROM reviews AS r
INNER JOIN review_status rs
ON r.review_id = rs.review_id
INNER JOIN
(
SELECT s.review_id, max(s.timestamp) MaxDate
FROM review_status AS s
WHERE s.status < 200
AND s.status > 99
GROUP BY s.review_id
) AS r_s
ON rs.review_id = r_s.review_id
AND rs.timestamp = r_s.MaxDate
WHERE r.user_id IN (100,120)
and rs.status < 200
AND rs.status > 99
See SQL Fiddle with Demo.
The main reason for writing the query this way is that your current query groups by review_id but returns the status. MySQL uses an extension to the GROUP BY clause that allows items in the select list that are neither named in the GROUP BY nor wrapped in an aggregate function, but this can cause unexpected results. (see MySQL Extensions to GROUP BY)
From the MySQL Docs:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. ... You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.
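The join-on-maximum query above can be run against the sample tables from the question. This is a sketch with Python's sqlite3 standing in for MySQL; column types are assumptions. With the data as posted, user_id IN (100, 120) matches reviews 1, 2 and 3.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews ("
    " review_id INTEGER, comment TEXT, reviewer_id INTEGER, user_id INTEGER)"
)
conn.execute(
    "CREATE TABLE review_status ("
    " review_id INTEGER, status INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?)",
    [
        (1, "some text.", 501, 100),
        (2, "lorem ipsum", 606, 100),
        (3, "blah blah.", 798, 120),
        (4, "foo bar!", 798, 133),
    ],
)
conn.executemany(
    "INSERT INTO review_status VALUES (?, ?, ?)",
    [
        (1, 10, 1364507521), (1, 101, 1364508057), (2, 100, 1364509033),
        (1, 150, 1364509149), (2, 120, 1364509283), (2, 122, 1364855948),
        (3, 120, 1364509283), (3, 122, 1364855948), (1, 110, 1364855945),
        (4, 100, 1364509283), (4, 115, 1364855948), (4, 210, 1364855945),
    ],
)

# The derived table finds the newest in-range timestamp per review; joining
# back to review_status recovers the status that belongs to that timestamp.
newest_status = conn.execute(
    """
    SELECT r.review_id, r.user_id, rs.status, rs.timestamp
    FROM reviews AS r
    INNER JOIN review_status rs
      ON r.review_id = rs.review_id
    INNER JOIN (
        SELECT s.review_id, MAX(s.timestamp) AS MaxDate
        FROM review_status AS s
        WHERE s.status < 200 AND s.status > 99
        GROUP BY s.review_id
    ) AS r_s
      ON rs.review_id = r_s.review_id
     AND rs.timestamp = r_s.MaxDate
    WHERE r.user_id IN (100, 120)
      AND rs.status < 200 AND rs.status > 99
    ORDER BY r.review_id
    """
).fetchall()

for row in newest_status:
    print(row)
```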
Try this:
SELECT r.*, r_s.*
FROM review_status r_s LEFT JOIN reviews r
ON r.review_id = r_s.review_id
WHERE r.user_id IN (100, 120)
ORDER BY r_s.timestamp DESC;
SELECT r.review_id, r.comment, r.reviewer_id, r.user_id, tt.status, tt.timestamp
FROM (
SELECT rs2.review_id, rs2.status, rs2.timestamp
FROM (
SELECT rs.review_id, MAX(rs.timestamp) as mts
FROM reviews rr
JOIN review_status AS rs ON rs.review_id = rr.review_id
WHERE rs.status < 200 AND rs.status > 99
AND rr.user_id IN (100,120)
GROUP BY rs.review_id
) as t
JOIN review_status rs2 ON rs2.review_id = t.review_id AND rs2.timestamp = t.mts
GROUP BY rs2.review_id #remove duplicate statuses with the same timestamp
) as tt
JOIN reviews as r ON r.review_id = tt.review_id
The user_id and status filters have to be in the innermost query to avoid selecting and join-ing the entire statuses table every time.
Here's my attempt with one JOIN and one correlated sub-query:
SELECT r.*, rs.*
FROM Reviews AS r
INNER JOIN Review_status AS rs ON r.review_id = rs.review_id
WHERE rs.status BETWEEN 100 AND 199 AND
r.user_id IN (100,120) AND
rs.timestamp = (SELECT MAX(timestamp) FROM Review_status
WHERE review_id = r.review_id)
ORDER BY r.review_id;
Its SQL Fiddle: http://sqlfiddle.com/#!2/02f18/6

MySQL Inner Join two tables on the maximum values of the second table

I have two tables "listings" and "bids". I have a query that will return all the listings that a particular user has bid on, where the end_date is past. What I need to do though is limit the results even further to find only the listings where the user has bid, and is the last bidder. Here is the query I have so far...
SELECT listings.end_date, listings.user_id, listings.title, listings.auc_fp, listings.id, listings.auc_image1
FROM listings INNER JOIN bids ON listings.id = bids.listing_id
WHERE bids.user_id=$userid
AND listings.end_date < NOW()
ORDER BY list_ts DESC
I'm not good with subqueries and I'm assuming I'm going to need one here to find all the user's bids in the "bids" table where the bid_ts (bid timestamp) is the latest timestamp for that corresponding listing_id in the bids table. The columns in my bids table are: listing_id, user_id, bid, bid_ts.
+------------+---------+------+---------------------+
| listing_id | user_id | bids | bid_ts |
+------------+---------+------+---------------------+
| 1 | 10 | 100 | 2012-11-16 00:54:03 |
| 1 | 11 | 101 | 2012-11-16 00:54:04 |
| 2 | 10 | 33 | 2012-11-16 00:54:03 |
| 2 | 11 | 34 | 2012-11-16 00:54:04 |
| 2 | 12 | 35 | 2012-11-16 00:54:05 |
+------------+---------+------+---------------------+
Thanks for any help
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT listing_ID, user_ID, MAX(bid_ts) maxDate
FROM tableName
GROUP BY listing_ID, user_ID
) b ON a.listing_ID = b.listing_ID AND
a.user_ID = b.user_ID AND
a.bid_ts = b.maxDate
SQLFiddle Demo
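One possible adaptation for the "last bidder" requirement: take the latest bid_ts per listing only (not per listing and user), join back to bids, and then filter on the user. The sketch below is an assumption-laden illustration, not the answer's own query: it uses Python's sqlite3 as a stand-in for MySQL, the listings rows and titles are made up, the bid amount column is called bid as in the question's prose, and the helper name won_listings and the fixed "now" value are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER, title TEXT, end_date TEXT)")
conn.execute(
    "CREATE TABLE bids ("
    " listing_id INTEGER, user_id INTEGER, bid INTEGER, bid_ts TEXT)"
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [  # hypothetical listings; only the bids rows come from the question
        (1, "Lamp", "2012-11-17 00:00:00"),
        (2, "Chair", "2012-11-17 00:00:00"),
    ],
)
conn.executemany(
    "INSERT INTO bids VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2012-11-16 00:54:03"),
        (1, 11, 101, "2012-11-16 00:54:04"),
        (2, 10, 33, "2012-11-16 00:54:03"),
        (2, 11, 34, "2012-11-16 00:54:04"),
        (2, 12, 35, "2012-11-16 00:54:05"),
    ],
)


def won_listings(conn, user_id, now):
    """Return ended listings where user_id placed the most recent bid."""
    return conn.execute(
        """
        SELECT l.id, l.title
        FROM listings l
        INNER JOIN bids b ON b.listing_id = l.id
        INNER JOIN (
            SELECT listing_id, MAX(bid_ts) AS max_ts
            FROM bids
            GROUP BY listing_id          -- latest bid per listing, any user
        ) lb ON lb.listing_id = b.listing_id
            AND lb.max_ts = b.bid_ts
        WHERE b.user_id = ?
          AND l.end_date < ?
        ORDER BY l.id
        """,
        (user_id, now),
    ).fetchall()


now = "2013-01-01 00:00:00"  # fixed value so the example is deterministic
print(won_listings(conn, 11, now))
print(won_listings(conn, 12, now))
print(won_listings(conn, 10, now))
```

User 10 bid on both listings but was outbid on each, so no rows come back for that user.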