Specifying MAX group function in MySQL for versioned data [duplicate] - mysql

I have the following MySQL table:
[orders]
===
order_id BIGINT UNSIGNED AUTO_INCREMENT (primary key),
order_ref_id VARCHAR(36) NOT NULL,
order_version BIGINT NOT NULL,
order_name VARCHAR(100) NOT NULL
...
lots of other fields
The orders are "versioned" meaning multiple orders can have the same identical order_ref_id but will have different versions (version 0, version 1, version 2, etc.).
I want to write a query that returns all order fields for an order based on its order_ref_id and its MAX version, so something like this:
SELECT
*
FROM
orders
WHERE
order_ref_id = '12345'
AND
MAX(order_version)
In other words: given the order_ref_id, give me the single orders record that has the highest version number. Hence, if the following rows exist in the table:
order_id | order_ref_id | order_version | order_name
======================================================================
1 | 12345 | 0 | "Hello"
2 | 12345 | 1 | "Goodbye"
3 | 12345 | 2 | "Wazzup"
I want a query that will return:
3 | 12345 | 2 | "Wazzup"
However the above query is invalid, and yields:
ERROR 1111 (HY000): Invalid use of group function
I know typical MAX() examples would have me writing something like:
SELECT
MAX(order_version)
FROM
orders
WHERE
order_ref_id = '12345';
But that just gives me order_version, not all (*) the fields. Can anyone help nudge me across the finish line here?

You could join the table to a subquery that computes the max version for each order_ref_id, then filter on the order_ref_id you want:
select o.* from orders o
inner join (
SELECT order_ref_id
, MAX(order_version) max_ver
FROM orders
group by order_ref_id
) t on t.order_ref_id = o.order_ref_id
and t.max_ver = o.order_version
where o.order_ref_id = '12345'
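The join-on-max approach can be tried end to end with the sample rows from the question. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the extra row with order_ref_id 67890 is hypothetical and is there to show that other orders are not returned.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " order_ref_id TEXT NOT NULL,"
    " order_version INTEGER NOT NULL,"
    " order_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "12345", 0, "Hello"),
        (2, "12345", 1, "Goodbye"),
        (3, "12345", 2, "Wazzup"),
        (4, "67890", 0, "Other"),  # hypothetical extra order
    ],
)

# The subquery finds the highest version for the requested order;
# the join pulls back the full row that carries that version.
latest_sql = """
SELECT o.*
FROM orders o
INNER JOIN (
    SELECT order_ref_id, MAX(order_version) AS max_ver
    FROM orders
    WHERE order_ref_id = ?
    GROUP BY order_ref_id
) t ON t.order_ref_id = o.order_ref_id
   AND t.max_ver = o.order_version
"""
latest = conn.execute(latest_sql, ("12345",)).fetchall()
print(latest)
```

Filtering inside the subquery keeps the aggregate limited to the one order being looked up; without the filter the same query returns the latest version of every order.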

Related

How to get distinct id based on the order of increasing times of ID in MYSQL

How can I get the distinct ids from a list of ids, ordered by the number of times each one appears (most frequent first)?
For example, input: 3,1,1,2,2,2
Here id 2 appears 3 times, id 1 appears 2 times, and id 3 appears once,
so my expected output is 2,1,3.
How can I get this with a single query in MySQL?
select id, COUNT(id) as cnt from your_table
group by id
order by cnt desc
Here's a simple query that also returns the count, so you can check that the order is correct.
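To check the ordering against the example input, here is a small sketch that loads 3,1,1,2,2,2 into an in-memory SQLite table (used as a stand-in for MySQL, which is an assumption) and sorts the groups by count. Note the DESC: the expected output 2,1,3 lists the most frequent id first. The secondary sort on id is an addition to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO your_table (id) VALUES (?)",
    [(value,) for value in (3, 1, 1, 2, 2, 2)],
)

# GROUP BY already yields one row per id, so DISTINCT is not needed.
count_rows = conn.execute(
    "SELECT id, COUNT(id) AS cnt "
    "FROM your_table "
    "GROUP BY id "
    "ORDER BY cnt DESC, id"
).fetchall()

ordered_ids = [row[0] for row in count_rows]
print(count_rows)
print(ordered_ids)
```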
First, consider how you obtained this input:
3,1,1,2,2,2
How you de-duplicate it depends on where it comes from:
User input
Query output
If it is user input, MySQL cannot access the value directly unless it is stored as data. In that case, some PHP or other application code will be sending the data to MySQL. Assuming PHP, what I would do is:
<?php
$csv = "3,1,1,2,2,2";
$arr = explode(",", $csv);
$arr = array_unique($arr);
?>
Now you will have unique values.
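De-duplicating alone does not order the values by how often they appear, which is what the question asks for. If the input really is handled in application code, a frequency count does both jobs. The following is a sketch in Python rather than the answer's PHP, shown only to illustrate the idea:

```python
from collections import Counter

csv = "3,1,1,2,2,2"

# Count how many times each value appears in the comma-separated input.
counts = Counter(csv.split(","))

# most_common() sorts by count, highest first; keep only the values.
ids_by_frequency = [value for value, _ in counts.most_common()]
print(ids_by_frequency)
```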
If it is query output, you just need the DISTINCT keyword:
SELECT DISTINCT `id` FROM `table` WHERE `SomeCondition`='Value';
You can also use GROUP BY, but DISTINCT is faster IMHO. (See: What's faster, SELECT DISTINCT or GROUP BY in MySQL?)
Suppose we have two tables:
1) student: Fields are as follows:
a) id: INTEGER AUTO_INCREMENT PRIMARY KEY
b) name: VARCHAR
Sample Data:
student
id | name
----------
1 | A
2 | B
3 | C
2) marks: Fields are as follows:
a) id: INTEGER AUTO_INCREMENT PRIMARY KEY
b) sid: INTEGER FOREIGN KEY (refers to id field from student table)
c) subject: VARCHAR
d) marks: INTEGER
Sample Data:
marks:
id | sid | subject | marks
--------------------------
1 | 1 | s1 | 40
2 | 2 | s2 | 50
3 | 2 | s1 | 60
4 | 2 | s2 | 70
5 | 3 | s1 | 80
Use the query below to get the distinct student ids, ordered by their number of referring records in descending order:
SELECT `student`.`id`, COUNT(*) AS `total`
FROM `student`
INNER JOIN `marks` ON (`student`.`id` = `marks`.`sid`)
GROUP BY `student`.`id`
ORDER BY `total` DESC
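Run against the sample student and marks rows above, the join-and-count query can be checked like this. The sketch uses an in-memory SQLite database as a stand-in for MySQL (an assumption), and adds a secondary sort on id, which is not in the answer, so that students with equal counts come back in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE marks ("
    " id INTEGER PRIMARY KEY,"
    " sid INTEGER REFERENCES student(id),"
    " subject TEXT,"
    " marks INTEGER)"
)
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?, ?)",
    [(1, 1, "s1", 40), (2, 2, "s2", 50), (3, 2, "s1", 60),
     (4, 2, "s2", 70), (5, 3, "s1", 80)],
)

# One output row per student, with the number of marks rows that refer to it.
totals = conn.execute(
    "SELECT student.id, COUNT(*) AS total "
    "FROM student "
    "INNER JOIN marks ON student.id = marks.sid "
    "GROUP BY student.id "
    "ORDER BY total DESC, student.id"
).fetchall()
print(totals)
```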
You can use group by to get unique ids.
SQL Query:
select id from table group by id;

MySQL selecting latest occurrences of a GROUP BY'd field [duplicate]

I have a table that has a non-unique id, id, a status, status (which I'm trying to get), and a timestamp of when it was inserted, inserted. I need to select the most recent occurrence of each id ordered by the id. I have the following table ordered by inserted:
id | status | inserted
------------------------------------
4 | approved | 2016-08-09 15:51:52
5 | denied | 2016-08-09 15:52:36
5 | pending | 2016-08-09 15:55:05
The results I need are:
id | status | inserted
------------------------------------
4 | approved | 2016-08-09 15:51:52
5 | pending | 2016-08-09 15:55:05
I have the following SELECT:
SELECT * FROM table
GROUP BY id
ORDER BY inserted
and I'm getting these results:
id | status | inserted
------------------------------------
4 | approved | 2016-08-09 15:51:52
5 | denied | 2016-08-09 15:52:36
The solution is probably an easy one, but I've racked my brain on this long enough trying things such as inner selects and whatnot. Any help would be appreciated.
EDIT:
I had to use the third option from the linked duplicate question to get the results I expected (the one with the LEFT JOIN). I assume it was because I was using a DATETIME type, but I'm unsure.
select t.*
from
<T> t inner join
(select id, max(inserted) as max_inserted from <T> group by id) as m
on m.id = t.id and m.max_inserted = t.inserted
or
select
id,
(
select status from <T> t2
where t2.id = t.id and t2.inserted = max_inserted
) as status,
max(inserted) as max_inserted
from <T>
group by id
You can try searching for "mysql latest per group" or something like that for alternatives more specific to MySQL.
If you want the most recent record for each id, then don't use group by. Instead:
select t.*
from table t
where t.inserted = (select max(t2.inserted) from table t2 where t2.id = t.id);
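As a quick check, the correlated-subquery form can be run against the three sample rows from the question. This sketch uses in-memory SQLite in place of MySQL (an assumption), and the table name status_log is hypothetical, chosen because table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# status_log is a hypothetical name for the question's unnamed table.
conn.execute("CREATE TABLE status_log (id INTEGER, status TEXT, inserted TEXT)")
conn.executemany(
    "INSERT INTO status_log VALUES (?, ?, ?)",
    [
        (4, "approved", "2016-08-09 15:51:52"),
        (5, "denied", "2016-08-09 15:52:36"),
        (5, "pending", "2016-08-09 15:55:05"),
    ],
)

# Keep a row only if its timestamp equals the newest timestamp for its id.
latest_per_id = conn.execute(
    "SELECT t.id, t.status, t.inserted "
    "FROM status_log t "
    "WHERE t.inserted = ("
    "    SELECT MAX(t2.inserted) FROM status_log t2 WHERE t2.id = t.id"
    ") "
    "ORDER BY t.id"
).fetchall()
print(latest_per_id)
```

If two rows for the same id share the same newest timestamp, both are returned, so a tie-breaker column is needed when exactly one row per id is required.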

MySQL group/order behaves differently in 5.7

I have a table that looks like this:
id | text | language_id | other_id | dateCreated
1 | something | 1 | 5 | 2015-01-02
2 | something | 1 | 5 | 2015-01-01
3 | something | 2 | 5 | 2015-01-01
4 | something | 2 | 6 | 2015-01-01
I want to get the latest row for each language_id among the rows that have other_id 5.
My query looks like this:
SELECT * FROM (
SELECT *
FROM tbl
WHERE other_id = 5
ORDER BY dateCreated DESC
) AS r
GROUP BY r.language_id
With MySQL 5.6 I get 2 rows with ID 1 and 3, which is what I want.
With MySQL 5.7.10 I get 2 rows with IDs 2 and 3 and it seems to me that the ORDER BY in the subquery is ignored.
Any ideas what the problem might be?
You should go with the query below:
SELECT
*
FROM tbl
INNER JOIN
(
SELECT
other_id,
language_id,
MAX(dateCreated) max_date_created
FROM tbl
WHERE other_id = 5
GROUP BY language_id
) AS t
ON tbl.language_id = t.language_id AND tbl.other_id = t.other_id AND
tbl.dateCreated = t.max_date_created
Using GROUP BY without an aggregate function on the remaining columns returns an arbitrary row from each group. You should not rely on which row GROUP BY returns; MySQL does not guarantee it.
Quoting from this post
In a nutshell, MySQL allows omitting some columns from the GROUP BY,
for performance purposes, however this works only if the omitted
columns all have the same value (within a grouping), otherwise, the
value returned by the query are indeed indeterminate, as properly
guessed by others in this post. To be sure adding an ORDER BY clause
would not re-introduce any form of deterministic behavior.
Although not at the core of the issue, this example shows how using *
rather than an explicit enumeration of desired columns is often a bad
idea.
Excerpt from MySQL 5.0 documentation:
When using this feature, all rows in each group should have the same
values for the columns that are omitted from the GROUP BY part. The
server is free to return any value from the group, so the results are
indeterminate unless all values are the same.
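A deterministic version of the query, joining back on MAX(dateCreated), can be verified against the four sample rows. This is a sketch on in-memory SQLite rather than MySQL 5.7 (an assumption), so it shows the logic of the query, not the server-specific behaviour discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl ("
    " id INTEGER PRIMARY KEY,"
    " text TEXT,"
    " language_id INTEGER,"
    " other_id INTEGER,"
    " dateCreated TEXT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "something", 1, 5, "2015-01-02"),
        (2, "something", 1, 5, "2015-01-01"),
        (3, "something", 2, 5, "2015-01-01"),
        (4, "something", 2, 6, "2015-01-01"),
    ],
)

# The subquery picks the newest date per language among other_id = 5 rows;
# the outer filter keeps rows with a different other_id from matching by date.
latest_rows = conn.execute(
    "SELECT tbl.id, tbl.language_id, tbl.dateCreated "
    "FROM tbl "
    "INNER JOIN ("
    "    SELECT language_id, MAX(dateCreated) AS max_date_created "
    "    FROM tbl "
    "    WHERE other_id = 5 "
    "    GROUP BY language_id"
    ") AS t ON tbl.language_id = t.language_id "
    "     AND tbl.dateCreated = t.max_date_created "
    "WHERE tbl.other_id = 5 "
    "ORDER BY tbl.id"
).fetchall()
print(latest_rows)
```

The result no longer depends on the order in which the server happens to read rows, because every selected column comes from a row identified by an explicit condition.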

MySQL How can I add values of a column together and remove the duplicate rows?

Good day,
I have a MySQL table which has some duplicate rows that have to be removed while adding a value from one column in the duplicated rows to the original.
The problem was caused when another column had the wrong values and that is now fixed but it left the balances split among different rows which have to be added together. The newer rows that were added must then be removed.
In this example, the userid column determines if they are duplicates (or triplicates). userid 6 is duplicated and userid 3 is triplicated.
As an example for userid 3 it has to add up all balances from rows 3, 11 and 13 and has to put that total into row 3 and then remove rows 11 and 13. The balance columns of both of those have to be added together into the original, lower ID row and the newer, higher ID rows must be removed.
ID | balance | userid
---------------------
1 | 10 | 1
2 | 15 | 2
3 | 300 | 3
4 | 80 | 4
5 | 0 | 5
6 | 65 | 6
7 | 178 | 7
8 | 201 | 8
9 | 92 | 9
10 | 0 | 10
11 | 140 | 3
12 | 46 | 6
13 | 30 | 3
I hope that is clear enough and that I have provided enough info. Thanks =)
Two steps.
1. Update:
UPDATE
tableX AS t
JOIN
( SELECT userid
, MIN(id) AS min_id
, SUM(balance) AS sum_balance
FROM tableX
GROUP BY userid
) AS c
ON t.userid = c.userid
SET
t.balance = CASE WHEN t.id = c.min_id
THEN c.sum_balance
ELSE 0
END ;
2. Remove the extra rows:
DELETE t
FROM
tableX AS t
JOIN
( SELECT userid
, MIN(id) AS min_id
FROM tableX
GROUP BY userid
) AS c
ON t.userid = c.userid
AND t.id > c.min_id
WHERE
t.balance = 0 ;
Once you have this solved, it would be good to add a UNIQUE constraint on userid as it seems you want to be storing the balance for each user here. That will avoid any duplicates in the future. You could also remove the (useless?) id column.
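The consolidate-then-delete idea, followed by the UNIQUE constraint, can be rehearsed on the sample data before touching a real table. The sketch below runs on in-memory SQLite, which is an assumption: SQLite does not accept MySQL's UPDATE ... JOIN or DELETE t FROM ... JOIN syntax, so the two steps are rewritten with correlated subqueries, and the extra rows are deleted directly instead of being zeroed first. The index name ux_tablex_userid is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableX ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL,"
    " userid INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO tableX (id, balance, userid) VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 15, 2), (3, 300, 3), (4, 80, 4), (5, 0, 5),
     (6, 65, 6), (7, 178, 7), (8, 201, 8), (9, 92, 9), (10, 0, 10),
     (11, 140, 3), (12, 46, 6), (13, 30, 3)],
)

with conn:  # commit all steps together, or roll back if one fails
    # Step 1: write each user's total into that user's lowest-id row.
    conn.execute(
        "UPDATE tableX "
        "SET balance = (SELECT SUM(t2.balance) FROM tableX t2 "
        "               WHERE t2.userid = tableX.userid) "
        "WHERE id = (SELECT MIN(t3.id) FROM tableX t3 "
        "            WHERE t3.userid = tableX.userid)"
    )
    # Step 2: remove every row that is not the lowest-id row for its user.
    conn.execute(
        "DELETE FROM tableX "
        "WHERE id > (SELECT MIN(t2.id) FROM tableX t2 "
        "            WHERE t2.userid = tableX.userid)"
    )
    # Step 3: prevent new duplicates (hypothetical index name).
    conn.execute("CREATE UNIQUE INDEX ux_tablex_userid ON tableX (userid)")

balances = dict(conn.execute("SELECT id, balance FROM tableX ORDER BY id"))
print(balances)
```

Running the steps inside one transaction means a failure partway through leaves the original balances intact.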
SELECT SUM(balance)
FROM your_table
GROUP BY userid
Should work, but the comment saying fix the table is really the best approach.
You can create a table with the same structure and transfer the data to it with this query
insert into newPriceTable(id, userid, balance)
select u.id, p.userid, sum(balance) as summation
from price p
join (
select userid, min(id) as id from price group by userid
) u ON p.userid = u.userid
group by p.userid
Play around the query here: http://sqlfiddle.com/#!2/4bb58/2
I work mainly in MSSQL, but you should be able to convert the syntax.
Using GROUP BY userid you can SUM() the balance, then join that back to your main table to update the balance across all the duplicates. Finally, you can use RANK() to order the duplicate userids and keep only the earliest row for each.
I'd select all this into a new table and, if it looks good, deprecate your old table and rename the new one.
http://sqlfiddle.com/#!3/068ee/2

Group by - Overriding default behaviour of deciding row under each group in result

Extending further from this question Query to find top rated article in each category -
Consider the same table -
id | category_id | rating
---+-------------+-------
1 | 1 | 10
2 | 1 | 8
3 | 2 | 7
4 | 3 | 5
5 | 3 | 2
6 | 3 | 6
There is a table articles, with fields id, rating (an integer from 1-10), and category_id (an integer representing the category it belongs to). I have the same goal: to get the top-rated article in each category (this should be the result):
Desired Result
id | category_id | rating
---+-------------+-------
1 | 1 | 10
3 | 2 | 7
6 | 3 | 6
Extension of original question
But, running the following query -
SELECT id, category_id, max( rating ) AS max_rating
FROM `articles`
GROUP BY category_id
results in the following, where everything except the id field is as desired (for category 3, id 4 is returned instead of id 6). I know how to do this with a subquery, as answered in the same question (Using subquery).
id | category_id | max_rating
---+-------------+-----------
1 | 1 | 10
3 | 2 | 7
4 | 3 | 6
In generic terms
Excluding the grouped column (category_id) and the evaluated columns (columns returning results of aggregate function like SUM(), MAX() etc. - in this case max_rating), the values returned in the other fields are simply the first row under every grouped result set (grouped by category_id in this case). E.g. the record with id =1 is the first one in the table under category_id 1 (id 1 and 2 under category_id 1) so it is returned.
I am just wondering: is it not possible to override this default behavior and return rows based on a condition? If MySQL can perform a calculation over every group (MAX(), COUNT(), etc.), why can't it return the row corresponding to the maximum rating? Is it not possible to do this in a single query without a subquery? This looks to me like a frequent requirement.
Update
I could not figure out what I want from Naktibalda's solution too. And just to mention again, I know how to do this using a subquery, as again answered by OMG Ponies.
Use:
SELECT x.id,
x.category_id,
x.rating
FROM YOUR_TABLE x
JOIN (SELECT t.category_id,
MAX(t.rating) AS max_rating
FROM YOUR_TABLE t
GROUP BY t.category_id) y ON y.category_id = x.category_id
AND y.max_rating = x.rating
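Since the question asks whether this can be done without a subquery, one commonly used alternative is a LEFT JOIN of the table to itself that keeps only rows for which no higher-rated row exists in the same category. This is a different technique from the join on MAX() shown above, sketched here on in-memory SQLite as a stand-in for MySQL (an assumption), with the sample articles data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, category_id INTEGER, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 8), (3, 2, 7), (4, 3, 5), (5, 3, 2), (6, 3, 6)],
)

# For each article a, look for an article b in the same category with a
# higher rating. If none exists (b.id IS NULL), a is the top-rated one.
top_rated = conn.execute(
    "SELECT a.id, a.category_id, a.rating "
    "FROM articles a "
    "LEFT JOIN articles b "
    "  ON b.category_id = a.category_id AND b.rating > a.rating "
    "WHERE b.id IS NULL "
    "ORDER BY a.category_id"
).fetchall()
print(top_rated)
```

As with the MAX() join, articles tied for the top rating in a category are all returned.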