Can I get the full rows when using group by multiple columns? - mysql

If the date, item, and category are the same in the table,
I'd like to treat those rows as one group and return the first n groups with all of their rows (e.g. if n is 3, the equivalent of LIMIT 0, 3 applied to groups rather than rows).
------------------------------------------
id | date | item | category | ...
------------------------------------------
101 | 20220201| pencil | stationery | ... <---
------------------------------------------ | treat as same result
105 | 20220201| pencil | stationery | ... <---
------------------------------------------
120 | 20220214| desk | furniture | ...
------------------------------------------
125 | 20220219| tongs | utensil | ... <---
------------------------------------------ | treat as same
129 | 20220219| tongs | utensil | ... <---
------------------------------------------
130 | 20220222| tongs | utensil | ...
expected results (if n is 3)
-----------------------------------------------
id | date | item | category | ... rank
-----------------------------------------------
101 | 20220201| pencil | stationery | ... 1
-----------------------------------------------
105 | 20220201| pencil | stationery | ... 1
-----------------------------------------------
120 | 20220214| desk | furniture | ... 2
-----------------------------------------------
125 | 20220219| tongs | utensil | ... 3
-----------------------------------------------
129 | 20220219| tongs | utensil | ... 3
The problem is that I have to return all the rows of each group as well.
If I had only one column to group by, I could compare the id value with the original table, but I don't know what to do with multiple columns.
Is there any way to solve this problem?
For reference, I tried a user variable to compare each row with the previous values,
but I couldn't use it because the query was too slow.
SELECT
*,
IF(@prev_date=date and @prev_item=item and @prev_category=category,@ranking, @ranking:=@ranking+1) AS sameRow,
@prev_item:=item,
@prev_date:= date,
@prev_category:=category,
@ranking
FROM ( SELECT ...
I'm using MySQL 8.0, and the id values are not consecutive because I have to ORDER BY before grouping.

If I understand correctly, you can use the dense_rank window function and order by the columns that define a group.
If the date column determines the order, I would put it first.
SELECT *
FROM (
SELECT *,dense_rank() OVER(ORDER BY date, item, category) rnk
FROM T
) t1
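To return only the first n groups, the ranked result can be filtered on rnk. Below is a minimal sketch that runs the same dense_rank query through Python's sqlite3 module as a stand-in for MySQL 8.0 (it assumes an SQLite build with window functions, 3.25 or later). The table name T and the sample rows come from the question; the `WHERE rnk <= ?` filter and the final ORDER BY are my additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, date TEXT, item TEXT, category TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", [
    (101, "20220201", "pencil", "stationery"),
    (105, "20220201", "pencil", "stationery"),
    (120, "20220214", "desk", "furniture"),
    (125, "20220219", "tongs", "utensil"),
    (129, "20220219", "tongs", "utensil"),
    (130, "20220222", "tongs", "utensil"),
])

n = 3  # number of groups to keep, not number of rows
rows = conn.execute("""
    SELECT id, date, item, category, rnk
    FROM (
        -- rows with equal (date, item, category) share one rank, with no gaps
        SELECT *, DENSE_RANK() OVER (ORDER BY date, item, category) AS rnk
        FROM T
    ) t1
    WHERE rnk <= ?          -- assumption: keep the first n groups
    ORDER BY rnk, id
""", (n,)).fetchall()

for row in rows:
    print(row)
```

Because dense_rank leaves no gaps, `rnk <= n` keeps exactly n groups no matter how many rows each group holds.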

Window functions come in very handy in this situation. But for those of us still using MySQL 5.7, where functions such as row_number don't exist, we have to either resort to using a user variable and resetting the value every time before the main statement, or defining the user variable directly in the statement.
method 1
set @row_id=0; -- remember to reset the row_id to 0 every time before the main query below
select id,date,item,category,rank from testtb join
(
select date,item,category, (@row_id:=@row_id+1) as rank
from
(select date,item,category from testtb group by date,item,category) t1
) t2
using(date,item,category);
method 2
select id,date,item,category,rank from testtb join
(
select date,item,category, (@row_id:=@row_id+1) as rank
from
(select date,item,category from testtb group by date,item,category) t1, (select @row_id := 0) as n
) t2
using(date,item,category);

Related

Select corresponding non-aggregated column after group by statement in MySQL

I have a temporary table I've derived from a much larger table.
+-----+----------+---------+
| id | phone | attempt |
+-----+----------+---------+
| 1 | 12345678 | 15 |
| 2 | 87654321 | 0 |
| 4 | 12345678 | 16 |
| 5 | 12345678 | 14 |
| 10 | 87654321 | 1 |
| 11 | 87654321 | 2 |
+-----+----------+---------+
I need to find the id (unique) corresponding to the highest attempt made on each phone number. Phone and attempt are not unique.
SELECT id, MAX(attempt) FROM temp2 GROUP BY phone
The above query does not return the id for the corresponding max attempt.
Try this:
select
t.*
from temp2 t
inner join (
select phone, max(attempt) attempt
from temp2
group by phone
) t2 on t.phone = t2.phone
and t.attempt = t2.attempt;
It will return rows with max attempts for a given number.
Note that this will return multiple ids for a phone if several of its rows share that phone's maximum attempt value.
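Here is a small sketch of that join, run through Python's sqlite3 module as a stand-in for MySQL, using the sample rows from the question. The extra row with id 12 is hypothetical; I added it only to show the tie behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp2 (id INTEGER PRIMARY KEY, phone TEXT, attempt INTEGER)")
conn.executemany("INSERT INTO temp2 VALUES (?, ?, ?)", [
    (1, "12345678", 15), (2, "87654321", 0), (4, "12345678", 16),
    (5, "12345678", 14), (10, "87654321", 1), (11, "87654321", 2),
])

# Join every row to the per-phone maximum; only rows at the maximum survive.
query = """
    SELECT t.id, t.phone, t.attempt
    FROM temp2 t
    INNER JOIN (
        SELECT phone, MAX(attempt) AS attempt
        FROM temp2
        GROUP BY phone
    ) t2 ON t.phone = t2.phone AND t.attempt = t2.attempt
    ORDER BY t.id
"""
best = conn.execute(query).fetchall()
print(best)

# Hypothetical extra row that ties the maximum for the second phone.
conn.execute("INSERT INTO temp2 VALUES (12, '87654321', 2)")
with_tie = conn.execute(query).fetchall()
print(with_tie)
```

If only one id per phone is wanted, some tie-breaker (for example the lowest id) has to be chosen explicitly.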
As an alternative to the answer given by @GurV, you could also solve this using a correlated subquery:
SELECT t1.*
FROM temp2 t1
WHERE t1.attempt = (SELECT MAX(t2.attempt) FROM temp2 t2 WHERE t2.phone = t1.phone)
This has the advantage of being a bit less verbose. But I would probably go with the join option because it will scale better for large data sets.

Retrieve distinct values without reducing number of results

I'm writing a MySQL request for retrieving data from a list of questions.
The table looks like this :
-----------------------------------------------------
| id | answer_name | rating | question_id | answers |
-----------------------------------------------------
Where several rows can have the same answer_name value, since several questions can be asked about the same answer.
Now, for retrieving the data I use a LIMIT clause which is calculated from ratings and the total number of rows.
For example, if I wanna get the data between 80% and 100% of rating, and there are 100 rows, I would use ORDER BY rating LIMIT 80, 20.
My problem is the following: I need to retrieve rows with distinct values for the answer_name column, but a GROUP BY clause reduces the number of result rows through aggregation. The LIMIT offset, which was calculated from the full row count, can then point past the end of the grouped result, so the top percentages return nothing.
Does anyone know if there is a way to keep the number of results the same and still retrieve distinct results for the answer_name column?
EDIT :
Here are some sample rows and expected output :
game_data table :
-----------------------------------------------------
| id | answer_name | rating | question_id | answers |
|----|-------------|--------|-------------|---------|
| 1 | A. Merkel | 40 | 1 | [1,2,3] |
| 2 | A. Merkel | 45 | 2 | [2,3,4] |
| 3 | B. Clinton | 55 | 1 | [2,5,8] |
| 4 | B. Clinton | 50 | 2 | [3,5,8] |
| 5 | L. Messi | 17 | 4 | [7,8,9] |
| 6 | L. Messi | 18 | 5 | [7,8,9] |
| 7 | L. Messi | 25 | 6 | [7,8,9] |
| 8 | D. Beckham | 21 | 4 | [6,7,8] |
| 9 | D. Beckham | 52 | 5 | [6,7,8] |
| 10 | D. Beckham | 41 | 6 | [6,7,8] |
-----------------------------------------------------
Where answers is an array of ids referring to another table.
Let's say I wanna retrieve the 50% to 80% of the table, ordered by rating.
SELECT id FROM game_data GROUP BY answer_name ORDER BY rating LIMIT 5, 3
Here the problem is that GROUP BY answer_name reduces the number of rows, so instead of returning 3 results the query returns an empty set.
Also, I want the row selected for each group in the GROUP BY clause to be chosen randomly.
Using group by like this goes against pretty much every instinct, but you said you want random values, so it's good enough.
select * from (
select q.*, @rank := @rank + 1 as rank
from (
select * from game_data
group by answer_name
order by rating desc
) q, (select @rank := 0) qq
) qqq
where rank between (@rank * .5) and (@rank * .8)
How does it work? First (in the innermost query) we group by your answer_name, to get your distinct results, and we order it by the rating as required.
Then in the query wrapping around that one, we give those results a ranking from 1 to however many rows are in the result. Once this level of the query completes, we know our best answer is answer 1, and our 'worst' answer is the last value of our @rank variable.
Then we get to the outermost query. We can use that @rank variable to determine our percentages, which we use to filter the where clause.
In all likelihood this will give you the same results each time you run the same query, but the values chosen are indeterminate - so it could change. If you want truly random (ie changes with each execution) that's a different kettle of fish altogether.
(note, this bit: , (select @rank := 0) qq is purely to initialise the variable)
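For comparison, the same "rank the distinct answers, then keep a percentage band" idea can be sketched with window functions instead of a user variable. This is not the query above: it is an alternative that assumes MySQL 8.0-style window functions, run here through Python's sqlite3 module (SQLite 3.25 or later) as a stand-in. It also assumes each answer_name is represented by its highest rating, which makes the result deterministic rather than random; the names per_answer, ranked, and total are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_data (id INTEGER PRIMARY KEY, answer_name TEXT, rating INTEGER)")
conn.executemany("INSERT INTO game_data VALUES (?, ?, ?)", [
    (1, "A. Merkel", 40), (2, "A. Merkel", 45),
    (3, "B. Clinton", 55), (4, "B. Clinton", 50),
    (5, "L. Messi", 17), (6, "L. Messi", 18), (7, "L. Messi", 25),
    (8, "D. Beckham", 21), (9, "D. Beckham", 52), (10, "D. Beckham", 41),
])

rows = conn.execute("""
    WITH per_answer AS (
        -- assumption: one row per answer_name, represented by its best rating
        SELECT answer_name, MAX(rating) AS rating
        FROM game_data
        GROUP BY answer_name
    ),
    ranked AS (
        SELECT answer_name, rating,
               ROW_NUMBER() OVER (ORDER BY rating DESC) AS rnk,
               COUNT(*) OVER () AS total      -- number of distinct answers
        FROM per_answer
    )
    SELECT answer_name, rating
    FROM ranked
    WHERE rnk BETWEEN total * 0.5 AND total * 0.8
    ORDER BY rnk
""").fetchall()
print(rows)
```

The percentage band is computed against the number of distinct answers, not the raw row count, which is what keeps the band from falling off the end of the grouped result.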
It's as simple as this:
group by id instead of answer_name, because id is unique, so GROUP BY will not collapse any rows.
SELECT * FROM game_data GROUP BY id ORDER BY rating

sql - Why doesn't MAX() of SUM() work?

I am trying to understand why the SQL command MAX(SUM(col)) gives an error. I have the two tables below:
+--------+--------+---------+-------+
| pname | rollno | address | score |
+--------+--------+---------+-------+
| A | 1 | CCU | 1234 |
| B | 2 | CCU | 2134 |
| C | 3 | MMA | 4321 |
| D | 4 | MMA | 1122 |
| E | 5 | CCU | 1212 |
+--------+--------+---------+-------+
Personnel Table
+--------+-------+----------+
| rollno | marks | sub |
+--------+-------+----------+
| 1 | 90 | SUB1 |
| 1 | 88 | SUB2 |
| 2 | 89 | SUB1 |
| 2 | 95 | SUB2 |
| 3 | 99 | SUB1 |
| 3 | 99 | SUB2 |
| 4 | 82 | SUB1 |
| 4 | 79 | SUB2 |
| 5 | 92 | SUB1 |
| 5 | 75 | SUB2 |
+--------+-------+----------+
Results Table
Essentially I have a details table and a results table. I want to find the name and marks of the candidate who has got the highest score in SUB1 and SUB2 combined. Basically the person with the highest aggregate marks.
I can find the summation of SUB1 and SUB2 for all candidates using the following query-:
select p.pname, sum(r.marks) from personel p,
result r where p.rollno=r.rollno group by p.pname;
It gives the following output-:
+--------+--------------+
| pname | sum(r.marks) |
+--------+--------------+
| A | 178 |
| B | 167 |
| C | 184 |
| D | 198 |
| E | 161 |
+--------+--------------+
This is fine but I need the output to be only D | 198 as he is the highest scorer. Now when I modify query like the following it fails-:
select p.pname, max(sum(r.marks)) from personel p,
result r where p.rollno=r.rollno group by p.pname;
In MySQL I get the error "Invalid use of group function".
Now searching on SO I did get my correct answer which uses derived tables. I get my answer by using the following query-:
SELECT
pname, MAX(max_sum)
FROM
(SELECT
p.pname AS pname, SUM(r.marks) AS max_sum
FROM
personel p, result r
WHERE
p.rollno = r.rollno
GROUP BY p.pname) a;
But my question is Why doesn't MAX(SUM(col)) work ?
I don't understand why max can't compute the value returned by SUM(). Now an answer on SO stated that since SUM() returns only a single value so MAX() find its meaningless to compute the value of one value, but I have tested the following query -:
select max(foo) from a;
on the Table "a" which has only one row with only one column called foo that holds an integer value. So if MAX() can't compute single values then how did this work ?
Can someone explain to me how the query processor executes the query and why I get the error of invalid group function ? From the readability point of view using MAX(SUM(col)) is perfect but it doesn't work out that way. I want to know why.
Are MAX and SUM never to be used together? I am asking because I have seen queries like MAX(COUNT(col)). I don't understand how that works and not this.
An aggregate function requires an argument that provides a value for each row in the group. The result of another aggregate function doesn't do that: it is a single value for the whole group.
It's not very sensical anyway. Suppose MySQL accepted MAX(SUM(col)) -- what would it mean? Well, the SUM(col) yields the sum of all non-NULL values of column col over all rows in the relevant group, which is a single number. You could take the MAX() of that to be that same number, but what would be the point?
Your approach using a subquery is different, at least in principle, because it aggregates twice. The inner aggregation, in which you perform the SUM(), computes a separate sum for each value of p.pname. The outer query then computes the maximum across all rows returned by the subquery (because you do not specify a GROUP BY in the outer query). If that's what you want, that's how you need to specify it.
The error is 1111: invalid use of group function. As for why specifically MySQL has this problem I can really only say it is part of the underlying engine itself. SELECT MAX(2) does work (in spite of a lack of a GROUP BY) but SELECT MAX(SUM(2)) does not work.
This error will occur when grouping/aggregating functions such as MAX are used in the wrong spot such as in a WHERE clause. SELECT SUM(MAX(2)) also does not work.
You can imagine that MySQL attempts to aggregate both simultaneously rather than doing things in an order of operations, i.e. it does not SUM first and then get the MAX. This is why you need to do the queries as separate steps.
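The two-step idea can be checked with a short script. This is only a sketch using Python's sqlite3 module as a stand-in for MySQL: SQLite also rejects nested aggregates, though with its own error message rather than MySQL's error 1111. It uses just the Results rows as listed in the question; the alias per_student is mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (rollno INTEGER, marks INTEGER, sub TEXT)")
conn.executemany("INSERT INTO result VALUES (?, ?, ?)", [
    (1, 90, "SUB1"), (1, 88, "SUB2"), (2, 89, "SUB1"), (2, 95, "SUB2"),
    (3, 99, "SUB1"), (3, 99, "SUB2"), (4, 82, "SUB1"), (4, 79, "SUB2"),
    (5, 92, "SUB1"), (5, 75, "SUB2"),
])

# Step 0: nesting one aggregate directly inside another is rejected.
try:
    conn.execute("SELECT MAX(SUM(marks)) FROM result GROUP BY rollno").fetchall()
    nested_ok = True
except sqlite3.Error as exc:
    nested_ok = False
    print("nested aggregate rejected:", exc)

# Step 1 (inner query): one SUM per rollno.
# Step 2 (outer query): MAX over the rows produced by step 1.
top_total = conn.execute("""
    SELECT MAX(total)
    FROM (
        SELECT rollno, SUM(marks) AS total
        FROM result
        GROUP BY rollno
    ) per_student
""").fetchone()[0]
print("highest aggregate:", top_total)
```

The derived table gives the outer MAX() one row per student to aggregate over, which is exactly what the nested form lacks.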
Try something like this:
select max(rs.marksums) maxsum from
(
select p.pname, sum(r.marks) marksums from personel p,
result r where p.rollno=r.rollno group by p.pname
) rs
with temp_table (name, max_marks) as
(select p.pname, sum(r.marks) from personel p, result r where p.rollno = r.rollno group by p.pname)
select * from temp_table where max_marks = (select max(max_marks) from temp_table);
I haven't run this, so treat it as untested. Note that WITH requires MySQL 8.0 or later.

can i use GROUP_CONCAT to update table?

can i use GROUP_CONCAT to update table? I have 2 tables
id | label
------------------------------
1 | ravi,rames,raja
------------------------------
2 | ravi
------------------------------
3 | ravi,raja
------------------------------
4 | null
------------------------------
5 | null
------------------------------
6 | rames
------------------------------
and
id | values
------------------------------
12 | raja
------------------------------
13 | rames
------------------------------
14 | ravi
------------------------------
And i want the result like following table--
id | label
------------------------------
1 | 12,13,14
------------------------------
2 | 14
------------------------------
3 | 14,12
------------------------------
4 | null
------------------------------
5 | null
------------------------------
6 | 13
------------------------------
but by using the following query -
SELECT `table1`.`id`, GROUP_CONCAT(`table2`.`id` ORDER BY `table2`.`id`) AS label
FROM `table1`
JOIN `table2` ON FIND_IN_SET(`table2`.`values`, `table1`.`nos`)
GROUP BY `table1`.`id`;
Im getting-
id | label
------------------------------
1 | 12,13,14
------------------------------
2 | 14
------------------------------
3 | 12,14
------------------------------
6 | 13
------------------------------
I want to keep the null value. otherwise the order of rows will be broken. please help.
sorry for the large font :(
You just need a LEFT JOIN to preserve the nulls:
SELECT `table1`.`id`, GROUP_CONCAT(`table2`.`id` ORDER BY `table2`.`id`) AS label
FROM `table1`
LEFT JOIN `table2` ON FIND_IN_SET(`table2`.`values`, `table1`.`nos`)
GROUP BY `table1`.`id`;
However, I recommend against updating a table to include comma-separated values in a column. It forces you to use FIND_IN_SET() when querying it, and breaks the ability to index the column, affecting the performance of your queries. The more sustainable action would be to normalize table1 so that it doesn't include a comma-separated column.
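To see the difference between the two joins, here is a rough sketch using Python's sqlite3 module as a stand-in for MySQL. SQLite has no FIND_IN_SET(), so a comma-padded LIKE is substituted for it here; that substitution is my assumption and not part of the answer. The column name nos is taken from the query in the question, and the order of ids inside each label is not guaranteed in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, nos TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, "values" TEXT);
INSERT INTO table1 VALUES (1, 'ravi,rames,raja'), (2, 'ravi'), (3, 'ravi,raja'),
                          (4, NULL), (5, NULL), (6, 'rames');
INSERT INTO table2 VALUES (12, 'raja'), (13, 'rames'), (14, 'ravi');
""")

# Assumption: the comma-padded LIKE below stands in for MySQL's FIND_IN_SET().
QUERY = """
SELECT t1.id, GROUP_CONCAT(t2.id) AS label
FROM table1 t1
{join} table2 t2
  ON (',' || t1.nos || ',') LIKE ('%,' || t2."values" || ',%')
GROUP BY t1.id
ORDER BY t1.id
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print([row[0] for row in inner_rows])  # ids that survive the inner join
print(left_rows)                       # all ids, NULL label where nothing matched
```

With the inner join, rows 4 and 5 disappear because they match nothing; with the LEFT JOIN they stay and their label is NULL.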
Update:
To use GROUP_CONCAT() in an UPDATE statement, you would use a syntax like the following. Substitute your correct table and column names, and in your case, you probably want to replace the entire JOIN subquery with your SELECT statement.
UPDATE
tbl_to_update
JOIN (SELECT id, GROUP_CONCAT(concatcolumn) AS label FROM tbl GROUP BY id) tbl_concat
ON tbl_to_update.id = tbl_concat.id
SET tbl_to_update.column_to_update = tbl_concat.label
WHERE <where condition>
So in your case:
UPDATE
table1
INNER JOIN (SELECT t1.id, GROUP_CONCAT(t2.id ORDER BY t2.id) AS label
FROM table1 t1 JOIN table2 t2 ON FIND_IN_SET(t2.`values`, t1.nos)
GROUP BY t1.id) tbl_concat
ON table1.id = tbl_concat.id
SET table1.nos = tbl_concat.label

Using SQL to get distinct rows, but also the whole row for those

Ok, so it's easier to give an example, and hopefully someone has a solution:
I have table that holds bids:
ID | companyID | userID | contractID | bidAmount | dateAdded
Below is an example set of rows that could be in the table:
ID | companyID | userID | contractID | bidAmount | dateAdded
--------------------------------------------------------------
10 | 2 | 1 | 94 | 1.50 | 1309933407
9 | 2 | 1 | 95 | 1.99 | 1309933397
8 | 2 | 1 | 96 | 1.99 | 1309933394
11 | 103 | 1210 | 96 | 1.98 | 1309947237
12 | 2 | 1 | 96 | 1.97 | 1309947252
Ok, so what I would like to do is get all the info (like by using * in a normal select statement) for the lowest bid on each unique contractID.
So I would need the following rows:
ID = 10 (for contractID = 94)
ID = 9 (for contractID = 95)
ID = 12 (for contractID = 96)
I want to ignore all the others. I thought about using DISTINCT, but i haven't been able to get it to return all the columns, only the column I'm using for distinct.
Does anyone have any suggestions?
Thanks,
Jeff
select *
from mytable main
where bidAmount = (
select min(bidAmount)
from mytable
where contractID = main.contractID)
Note that this will return multiple rows if there is more than one record sharing the same minimum bid.
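As a quick check, here is that correlated subquery run against the sample rows from the question, using Python's sqlite3 module as a stand-in for the real database. The table name bids is an assumption (the answer calls it mytable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bids (
        ID INTEGER PRIMARY KEY, companyID INTEGER, userID INTEGER,
        contractID INTEGER, bidAmount REAL, dateAdded INTEGER
    )
""")
conn.executemany("INSERT INTO bids VALUES (?, ?, ?, ?, ?, ?)", [
    (10, 2, 1, 94, 1.50, 1309933407),
    (9, 2, 1, 95, 1.99, 1309933397),
    (8, 2, 1, 96, 1.99, 1309933394),
    (11, 103, 1210, 96, 1.98, 1309947237),
    (12, 2, 1, 96, 1.97, 1309947252),
])

# For each row, keep it only if its bid equals the minimum bid of its contract.
lowest = conn.execute("""
    SELECT *
    FROM bids main
    WHERE bidAmount = (
        SELECT MIN(bidAmount)
        FROM bids
        WHERE contractID = main.contractID
    )
    ORDER BY contractID
""").fetchall()

for row in lowest:
    print(row)
```

This returns the full rows for IDs 10, 9, and 12, matching the rows listed in the question.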
Didn't test it but it should be possible with this query although it might not be really fast:
SELECT * FROM bids WHERE ID IN (
SELECT ID FROM bids GROUP BY contractID ORDER BY MIN(bidAmount) ASC
)
This would be the query for MySQL, maybe you need to adjust it for another db.
You could use a subquery to find the lowest rowid per contractid:
select *
from YourTable
where id in
(
select min(id)
from YourTable
group by
ContractID
)
The problem is that distinct does not return a specific row - it returns distinct values, which (by definition) could occur on multiple rows.
Subqueries are your answer, and somewhere in the suggestions above is probably the answer. Your subquery needs to return the ids of the rows with the minimum bid value. Then you can select * from the rows with those ids.