sql - Why doesn't MAX() of SUM() work? - mysql

I am trying to understand why the SQL command of MAX(SUM(col)) gives the a syntax error. I have the two tables as below-:
+--------+--------+---------+-------+
| pname | rollno | address | score |
+--------+--------+---------+-------+
| A | 1 | CCU | 1234 |
| B | 2 | CCU | 2134 |
| C | 3 | MMA | 4321 |
| D | 4 | MMA | 1122 |
| E | 5 | CCU | 1212 |
+--------+--------+---------+-------+
Personnel Table
+--------+-------+----------+
| rollno | marks | sub |
+--------+-------+----------+
| 1 | 90 | SUB1 |
| 1 | 88 | SUB2 |
| 2 | 89 | SUB1 |
| 2 | 95 | SUB2 |
| 3 | 99 | SUB1 |
| 3 | 99 | SUB2 |
| 4 | 82 | SUB1 |
| 4 | 79 | SUB2 |
| 5 | 92 | SUB1 |
| 5 | 75 | SUB2 |
+--------+-------+----------+
Results Table
Essentially I have a details table and a results table. I want to find the name and marks of the candidate who has got the highest score in SUB1 and SUB2 combined. Basically the person with the highest aggregate marks.
I can find the summation of SUB1 and SUB2 for all candidates using the following query-:
select p.pname, sum(r.marks) from personel p,
result r where p.rollno=r.rollno group by p.pname;
It gives the following output-:
+--------+--------------+
| pname | sum(r.marks) |
+--------+--------------+
| A | 178 |
| B | 167 |
| C | 184 |
| D | 198 |
| E | 161 |
+--------+--------------+
This is fine but I need the output to be only D | 198 as he is the highest scorer. Now when I modify query like the following it fails-:
select p.pname, max(sum(r.marks)) from personel p,
result r where p.rollno=r.rollno group by p.pname;
In MySQL I get the error of Invaild Group Function.
Now searching on SO I did get my correct answer which uses derived tables. I get my answer by using the following query-:
SELECT
pname, MAX(max_sum)
FROM
(SELECT
p.pname AS pname, SUM(r.marks) AS max_sum
FROM
personel p, result r
WHERE
p.rollno = r.rollno
GROUP BY p.pname) a;
But my question is Why doesn't MAX(SUM(col)) work ?
I don't understand why max can't compute the value returned by SUM(). Now an answer on SO stated that since SUM() returns only a single value so MAX() find its meaningless to compute the value of one value, but I have tested the following query -:
select max(foo) from a;
on the Table "a" which has only one row with only one column called foo that holds an integer value. So if MAX() can't compute single values then how did this work ?
Can someone explain to me how the query processor executes the query and why I get the error of invalid group function ? From the readability point of view using MAX(SUM(col)) is perfect but it doesn't work out that way. I want to know why.
Are MAX and SUM never to be used together? I am asking because I have seen queries like MAX(COUNT(col)). I don't understand how that works and not this.

Aggregate functions require an argument that provides a value for each row in the group. Other aggregate functions don't do that.
It's not very sensical anyway. Suppose MySQL accepted MAX(SUM(col)) -- what would it mean? Well, the SUM(col) yields the sum of all non-NULL values of column col over all rows in the relevant group, which is a single number. You could take the MAX() of that to be that same number, but what would be the point?
Your approach using a subquery is different, at least in principle, because it aggregates twice. The inner aggregation, in which you perform the SUM(), computes a separate sum for each value of p.pname. The outer query then computes the maximum across all rows returned by the subquery (because you do not specify a GROUP BY in the outer query). If that's what you want, that's how you need to specify it.

The error is 1111: invalid use of group function. As for why specifically MySQL has this problem I can really only say it is part of the underlying engine itself. SELECT MAX(2) does work (in spite of a lack of a GROUP BY) but SELECT MAX(SUM(2)) does not work.
This error will occur when grouping/aggregating functions such as MAX are used in the wrong spot such as in a WHERE clause. SELECT SUM(MAX(2)) also does not work.
You can imagine that MySQL attempts to aggregate both simultaneously rather than doing things in an order of operations, i.e. it does not SUM first and then get the MAX. This is why you need to do the queries as separate steps.

Try something like this:
select max(rs.marksums) maxsum from
(
select p.pname, sum(r.marks) marksums from personel p,
result r where p.rollno=r.rollno group by p.pname
) rs

with temp_table (name, max_marks) as
(select name, sum(marks) from personel p,result r, where p.rollno = r.rollno group by p.name)
select *from temp_table where max_marks = (select max(max_marks) from temp_table);
I didn't run this. But try this one. Hope it will work :)

Related

Inner Join 2 tables multiple result

i have 2 tables, one for the products and one for the sizes.
they have a relation with fk and the problem is that when I'm using the inner join I cannot use the "group by" in order to don't have repeated results.
this is the code:
SELECT
sneakers.sneaker_id,
sneakers.sneaker_name,
sneakers.gender,
sneakers.description,
sneakers.price,
sizes.size,
brand_names.brand_name
FROM sneakers
INNER JOIN sizes ON sneaker_fk = sneaker_id
if i try to use the GROUP BY sneaker_fk it will give me this response:
Error
SQL query: Documentation
SELECT sneakers.sneaker_id,sneakers.sneaker_name, sneakers.gender, sneakers.description,
sneakers.price, sizes.size, brand_names.brand_name FROM sneakers
INNER JOIN sizes ON sneaker_fk = sneaker_id
INNER JOIN brand_names ON brand_name_fk = brand_name_id
GROUP BY sneaker_fk LIMIT 0, 30
MySQL said: Documentation
#1055 - Expression #6 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'sneakerstore.sizes.size' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
What am I doing wrong??
Do you have any better solution to display One item with all the related sizes without having multiple results?
I hope you can help me as fast as possible!
thanks in advance
Use MySQL's Group_Concat() function
The MySQL GROUP_CONCAT() function returns a string with concatenated non-NULL value from a group.
Excerpt from Using GROUP_CONCAT with joined tables
GROUP_CONCAT works with joins as well.
Let's say we have a table courses:
| id | name |
+—-+—————+
| 1 | Ruby 101 |
+—-+—————+
| 2 | TDD for Poets |
+—-+—————+'
We also have a second table bookings:
| id | course_id |
+—-+———–+
| 7 | 1 |
+—-+———–+
| 8 | 1 |
+—-+———–+
| 9 | 1 |
+—-+———–+
| 10 | 2 |
+—-+———–+
| 11 | 2 |
+—-+———–+
SELECT courses.name, GROUP_CONCAT(bookings.id)
FROM bookings
INNER JOIN courses ON courses.id == bookings.course_id
GROUP BY bookings.course_id;
The result set looks like this:
| courses.name | GROUP_CONCAT(bookings.id) |
+—————+—————————+
| Ruby 101 | 7,8,9 |
+—————+—————————+
| TDD for Poets | 10,11 |
+—————+—————————+

SQL sorting does not follow group by statement, always uses primary key

I have a SQL database with a table called staff, having following columns:
workerID (Prim.key), name, department, salary
I am supposed to find the workers with the highest salary per department and used the following statement:
select staff.workerID, staff.name, staff.department, max(staff.salary) AS biggest
from staff
group by staff.department
I get one worker shown from each department, but they are NOT the workers with the highest salary, BUT the biggest salary value is shown, even though the worker does not get that salary.
The person shown is the worker with the "lowest" workerID per department.
So, there is some sorting going on using the primary key, even though it is not mentioned in the group by statement.
Can someone explain, what is going on and maybe how to sort correctly.
Explanation for what is going on:
You are performing a GROUP BY on staff.department, however your SELECT list contains 2 non-grouping columns staff.workerID, staff.name. In standard sql this is a syntax error, however MySql allows it so the query writers have to make sure that they handle such situations themselves.
Reference: http://dev.mysql.com/doc/refman/5.0/en/group-by-handling.html
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause.
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Starting with MySQL 5.1 the non-standard feature can be disabled by setting the ONLY_FULL_GROUP_BY flag in sql_mode: http://dev.mysql.com/doc/refman/5.6/en/sql-mode.html#sqlmode_only_full_group_by
How to fix:
select staff.workerID, staff.name, staff.department, staff.salary
from staff
join (
select staff.department, max(staff.salary) AS biggest
from staff
group by staff.department
) t
on t.department = staff.department and t.biggest = staff.salary
In the inner query, fetch department and its highest salary using GROUP BY. Then in the outer query join those results with the main table which would give you the desired results.
This is the usual case group by with a aggregate function does not guarantee proper row corresponding to the aggregate function. Now there are many ways to do it and the usual practice is a sub-query and join. But if the table is big then performance wise it kills, so the other approach is to use left join
So lets say we have the table
+----------+------+-------------+--------+
| workerid | name | department | salary |
+----------+------+-------------+--------+
| 1 | abc | computer | 400 |
| 2 | cdf | electronics | 200 |
| 3 | gfd | computer | 400 |
| 4 | wer | physics | 300 |
| 5 | hgt | computer | 700 |
| 6 | juy | electronics | 100 |
| 7 | wer | physics | 400 |
| 8 | qwe | computer | 200 |
| 9 | iop | electronics | 800 |
| 10 | kli | physics | 800 |
| 11 | qsq | computer | 600 |
| 12 | asd | electronics | 300 |
+----------+------+-------------+--------+
SO we can get the data as
select st.* from staff st
left join staff st1 on st1.department = st.department
and st.salary < st1.salary
where
st1.workerid is null
The above will give you as
+----------+------+-------------+--------+
| workerid | name | department | salary |
+----------+------+-------------+--------+
| 5 | hgt | computer | 700 |
| 9 | iop | electronics | 800 |
| 10 | kli | physics | 800 |
+----------+------+-------------+--------+
My favorite solution to this problem uses LEFT JOIN:
SELECT m.workerID, m.name, m.department, m.salary
FROM staff m # 'm' from 'maximum'
LEFT JOIN staff o # 'o' from 'other'
ON m.department = o.department # match rows by department
AND m.salary < o.salary # match each row in `m` with the rows from `o` having bigger salary
WHERE o.salary IS NULL # no bigger salary exists in `o`, i.e. `m`.`salary` is the maximum of its dept.
;
This query selects all the workers that have the biggest salary from their department; i.e. if two or more workers have the same salary and it is the bigger in their department then all these workers are selected.
Try this:
SELECT s.workerID, s.name, s.department, s.salary
FROM staff s
INNER JOIN (SELECT s.department, MAX(s.salary) AS biggest
FROM staff s GROUP BY s.department
) AS B ON s.department = B.department AND s.salary = B.biggest;
OR
SELECT s.workerID, s.name, s.department, s.salary
FROM (SELECT s.workerID, s.name, s.department, s.salary
FROM staff s
ORDER BY s.department, s.salary DESC
) AS s
GROUP BY s.department;

What is SQL to select a property and the max number of occurrences of a related property?

I have a table like this:
Table: p
+----------------+
| id | w_id |
+---------+------+
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 6 | 5 |
| 6 | 8 |
| 6 | 10 |
| 6 | 10 |
| 7 | 8 |
| 7 | 10 |
+----------------+
What is the best SQL to get the following result? :
+-----------------------------+
| id | most_used_w_id |
+---------+-------------------+
| 5 | 8 |
| 6 | 10 |
| 7 | 8 |
+-----------------------------+
In other words, to get, per id, the most frequent related w_id.
Note that on the example above, id 7 is related to 8 once and to 10 once.
So, either (7, 8) or (7, 10) will do as result. If it is not possible to
pick up one, then both (7, 8) and (7, 10) on result set will be ok.
I have come up with something like:
select counters2.p_id as id, counters2.w_id as most_used_w_id
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters2
join (
select p_id, max(count_of_w_ids) as max_counter_for_w_ids
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters
group by p_id
) as p_max
on p_max.p_id = counters2.p_id
and p_max.max_counter_for_w_ids = counters2.count_of_w_ids
;
but I am not sure at all whether this is the best way to do it. And I had to repeat the same sub-query two times.
Any better solution?
Try to use User defined variables
select id,w_id
FROM
( select T.*,
if(#id<>id,1,0) as row,
#id:=id FROM
(
select id,W_id, Count(*) as cnt FROM p Group by ID,W_id
) as T,(SELECT #id:=0) as T1
ORDER BY id,cnt DESC
) as T2
WHERE Row=1
SQLFiddle demo
Formal SQL
In fact - your solution is correct in terms of normal SQL. Why? Because you have to stick with joining values from original data to grouped data. Thus, your query can not be simplified. MySQL allows to mix non-group columns and group function, but that's totally unreliable, so I will not recommend you to rely on that effect.
MySQL
Since you're using MySQL, you can use variables. I'm not a big fan of them, but for your case they may be used to simplify things:
SELECT
c.*,
IF(#id!=id, #i:=1, #i:=#i+1) AS num,
#id:=id AS gid
FROM
(SELECT id, w_id, COUNT(w_id) AS w_count
FROM t
GROUP BY id, w_id
ORDER BY id DESC, w_count DESC) AS c
CROSS JOIN (SELECT #i:=-1, #id:=-1) AS init
HAVING
num=1;
So for your data result will look like:
+------+------+---------+------+------+
| id | w_id | w_count | num | gid |
+------+------+---------+------+------+
| 7 | 8 | 1 | 1 | 7 |
| 6 | 10 | 2 | 1 | 6 |
| 5 | 8 | 3 | 1 | 5 |
+------+------+---------+------+------+
Thus, you've found your id and corresponding w_id. The idea is - to count rows and enumerate them, paying attention to the fact, that we're ordering them in subquery. So we need only first row (because it will represent data with highest count).
This may be replaced with single GROUP BY id - but, again, server is free to choose any row in that case (it will work because it will take first row, but documentation says nothing about that for common case).
One little nice thing about this is - you can select, for example, 2-nd by frequency or 3-rd, it's very flexible.
Performance
To increase performance, you can create index on (id, w_id) - obviously, it will be used for ordering and grouping records. But variables and HAVING, however, will produce line-by-line scan for set, derived by internal GROUP BY. It isn't such bad as it was with full scan of original data, but still it isn't good thing about doing this with variables. On the other hand, doing that with JOIN & subquery like in your query won't be much different, because of creating temporery table for subquery result set too.
But to be certain, you'll have to test. And keep in mind - you already have valid solution, which, by the way, isn't bound to DBMS-specific stuff and is good in terms of common SQL.
Try this query
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having max(ccc)
here is the sqlfidddle link
You can also use this code if you do not want to rely on the first record of non-grouping columns
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having ccc=max(ccc);

Groupwise maximum

I have a table from which I am trying to retrieve the latest position for each security:
The Table:
My query to create the table: SELECT id, security, buy_date FROM positions WHERE client_id = 4
+-------+----------+------------+
| id | security | buy_date |
+-------+----------+------------+
| 26 | PCS | 2012-02-08 |
| 27 | PCS | 2013-01-19 |
| 28 | RDN | 2012-04-17 |
| 29 | RDN | 2012-05-19 |
| 30 | RDN | 2012-08-18 |
| 31 | RDN | 2012-09-19 |
| 32 | HK | 2012-09-25 |
| 33 | HK | 2012-11-13 |
| 34 | HK | 2013-01-19 |
| 35 | SGI | 2013-01-17 |
| 36 | SGI | 2013-02-16 |
| 18084 | KERX | 2013-02-20 |
| 18249 | KERX | 0000-00-00 |
+-------+----------+------------+
I have been messing with versions of queries based on this page, but I cannot seem to get the result I'm looking for.
Here is what I've been trying:
SELECT t1.id, t1.security, t1.buy_date
FROM positions t1
WHERE buy_date = (SELECT MAX(t2.buy_date)
FROM positions t2
WHERE t1.security = t2.security)
But this just returns me:
+-------+----------+------------+
| id | security | buy_date |
+-------+----------+------------+
| 27 | PCS | 2013-01-19 |
+-------+----------+------------+
I'm trying to get the maximum/latest buy date for each security, so the results would have one row for each security with the most recent buy date. Any help is greatly appreciated.
EDIT: The position's id must be returned with the max buy date.
You can use this query. You can achieve results in 75% less time. I checked with more data set. Sub-Queries takes more time.
SELECT p1.id,
p1.security,
p1.buy_date
FROM positions p1
left join
positions p2
on p1.security = p2.security
and p1.buy_date < p2.buy_date
where
p2.id is null;
SQL-Fiddle link
You can use a subquery to get the result:
SELECT p1.id,
p1.security,
p1.buy_date
FROM positions p1
inner join
(
SELECT MAX(buy_date) MaxDate, security
FROM positions
group by security
) p2
on p1.buy_date = p2.MaxDate
and p1.security = p2.security
See SQL Fiddle with Demo
Or you can use the following in with a WHERE clause:
SELECT t1.id, t1.security, t1.buy_date
FROM positions t1
WHERE buy_date = (SELECT MAX(t2.buy_date)
FROM positions t2
WHERE t1.security = t2.security
group by t2.security)
See SQL Fiddle with Demo
This is done with a simple group by. You want to group by the securities and get the max of buy_date. The SQL:
SELECT security, max(buy_date)
from positions
group by security
Note, this is faster than bluefeet's answer but does not display the ID.
The answer by #bluefeet has two more ways to get the results you want - and the first will probably be more efficient than your query.
What I don't understand is why you say that your query doesn't work. It seems pretty fine and returns the expected result. Tested at SQL-Fiddle
SELECT t1.id, t1.security, t1.buy_date
FROM positions t1
WHERE buy_date = ( SELECT MAX(t2.buy_date)
FROM positions t2
WHERE t1.security = t2.security ) ;
If the problems appears when you add the client_id = 4 condition, then it's because you add it only in one WHERE clause while you have to add it in both:
SELECT t1.id, t1.security, t1.buy_date
FROM positions t1
WHERE client_id = 4
AND buy_date = ( SELECT MAX(t2.buy_date)
FROM positions t2
WHERE client_id = 4
AND t1.security = t2.security ) ;
select security, max(buy_date) group by security from positions;
is all you need to get max buy date for each security (when you say out loud what you want from a query and you include the phrase "for each x", you probably want a group by on x)
When you use a group by, all columns in your select must either be columns that have been grouped by or aggregates, so if, for example, you wanted to include id, you'd probably have to use a subquery similar to what you had before, since there doesn't seem to be any aggregate you can reasonably use on the ids, and another group by would give you too many rows.

Distinct Help

Alright so I have a table, in this table are two columns with ID's. I want to make one of the columns distinct, and once it is distinct to select all of those from the second column of a certain ID.
Originally I tried:
select distinct inp_kll_id from kb3_inv_plt where inp_plt_id = 581;
However this does the where clause first, and then returns distinct values.
Alternatively:
select * from (select distinct(inp_kll_id) from kb3_inv_plt) as inp_kll_id where inp_plt_id = 581;
However this cannot find the column inp_plt_id because distinct only returns the column, not the whole table.
Any suggestions?
Edit:
Each kll_id may have one or more plt_id. I would like unique kll_id's for a certain kb3_inv_plt id.
| inp_kll_id | inp_plt_id |
| 1941 | 41383 |
| 1942 | 41276 |
| 1942 | 38005 |
| 1942 | 39052 |
| 1942 | 40611 |
| 1943 | 5868 |
| 1943 | 4914 |
| 1943 | 39511 |
| 1944 | 39511 |
| 1944 | 41276 |
| 1944 | 40593 |
| 1944 | 26555 |
If you do mean, by "make distinct", "pick only inp_kll_ids that happen just once" (not the SQL semantics for Distinct), this should work:
select inp_kll_id
from kb3_inv_plt
group by inp_kll_id
having count(*)=1 and inp_plt_id = 581;
Get all the distinct first (alias 'a' in my following example) and then join it back to the table with the specified criteria (alias 'b' in my following example).
SELECT *
FROM (
SELECT
DISTINCT inp_kll_id
FROM kb3_inv_plt
) a
LEFT JOIN kb3_inv_plt b
ON a.inp_kll_id = b.inp_kll_id
WHERE b.inp_plt_id = 581
in this table are two columns with
ID's. I want to make one of the
columns distinct, and once it is
distinct to select all of those from
the second column of a certain ID.
SELECT distinct tableX.ID2
FROM tableX
WHERE tableX.ID1 = 581
I think your understanding of distinct may be different from how it works. This will indeed apply the where clause first, and then get a distinct list of unique entries of tableX.ID2, which is exactly what you ask for in the first part of your question.
By making a row distinct, you're ensuring no other rows are exactly the same. You aren't making a column distinct. Let's say your table has this data:
ID1 ID2
10 4
10 3
10 7
4 6
When you select distinct ID1,ID2 - you get the same as select * because the rows are already distinct.
Can you add information to clear up what you are trying to do?