Group-wise Maximum of a Certain Column - mysql

I've got the table:
SELECT * FROM shop;
+---------+--------+------
| article | dealer | price
+---------+--------+------
| 0001 | A | 3.45
| 0001 | B | 3.99
| 0002 | A | 10.99
| 0003 | B | 1.45
| 0003 | C | 1.69
| 0003 | D | 1.25
| 0004 | D | 19.95
+---------+--------+------
7 rows in set (0.20 sec)
And I want to get - for each article - the dealer or dealers with the most expensive price.
Could anyone tell me why this doesn’t work?
SELECT article, dealer, MAX(price) FROM shop GROUP BY(article);
For this query, I get the following result set:
+---------+--------+------------+
| article | dealer | MAX(price) |
+---------+--------+------------+
| 0001 | A | 3.99 |
| 0002 | A | 10.99 |
| 0003 | B | 1.69 |
| 0004 | D | 19.95 |
+---------+--------+------------+
4 rows in set (0.03 sec)
Although the max prices are correct, I got the wrong dealers for some articles.

Judging by your question, you have already read the article about the group-wise maximum of a certain column, but you don't understand why the query you tried does not work as you expect.
Let's imagine a query like this:
SELECT article, dealer, MAX(price), MIN(price)
FROM shop
GROUP BY article
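Which dealer would you expect in each row? A single row now carries both the highest and the lowest price, which usually belong to different dealers, so the dealer column cannot be tied to either aggregate.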
I think this answers your question.

Standard SQL would reject your query because you cannot SELECT non-aggregate fields that are not part of the GROUP BY clause in an aggregate query.
You're using a MySQL extension of SQL described here:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.

This does not work because, once you use GROUP BY, you cannot rely on the individual fields of the original rows (except for the field you are grouping on). The correct way to do this is to use an inner/nested query to select the dealer, such as this (I haven't tested it, so it might be slightly off):
SELECT article, MAX(price) as maxPrice,
(SELECT dealer FROM shop AS s2 WHERE s2.article = s1.article AND s2.price = maxPrice) AS expensiveDealer
FROM shop AS s1 GROUP BY(article);
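For reference, a correlated subquery in the WHERE clause is one common way to keep every dealer that ties for the top price. The sketch below is only an illustration: it runs the SQL through Python's built-in sqlite3 module instead of MySQL (an assumption, made so the example is self-contained), and the function name dealers_with_max_price is made up.

```python
import sqlite3

# Rows copied from the shop table in the question.
SHOP_ROWS = [
    ("0001", "A", 3.45), ("0001", "B", 3.99), ("0002", "A", 10.99),
    ("0003", "B", 1.45), ("0003", "C", 1.69), ("0003", "D", 1.25),
    ("0004", "D", 19.95),
]


def dealers_with_max_price(rows=SHOP_ROWS):
    """Return (article, dealer, price) for the most expensive offer(s) per article."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE shop (article TEXT, dealer TEXT, price REAL)")
    conn.executemany("INSERT INTO shop VALUES (?, ?, ?)", rows)
    query = """
        SELECT s1.article, s1.dealer, s1.price
        FROM shop AS s1
        WHERE s1.price = (SELECT MAX(s2.price)      -- highest price ...
                          FROM shop AS s2
                          WHERE s2.article = s1.article)  -- ... for this article
        ORDER BY s1.article, s1.dealer
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in dealers_with_max_price():
    print(row)
```

Because the filter compares each row against its own article's maximum, two dealers sharing the top price would both be returned.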

Here you go:
SELECT article, dealer, price
FROM (SELECT article, dealer, price
FROM shop
ORDER BY price DESC) AS h
GROUP BY article
This solution doesn't even require a MAX() function. :)
Note: This solution doesn't work with ONLY_FULL_GROUP_BY active and only works in MySQL. This solution is to a certain extent unsupported due to lack of documentation confirming this behavior. It works well for me and has always worked well for me however.
This method still works on the latest MySQL on sqlfiddle.

I just stumbled upon this question and wonder why no one has suggested joining the table with itself, as described in certain tutorials (see links below).
So I'd suggest the following solution:
SELECT A.*
FROM shop AS A
LEFT JOIN shop AS B
  ON A.article = B.article
 AND A.price < B.price
WHERE B.price IS NULL;
The magic is obvious: join the table with itself and link any records in it to any other record having a higher price. From those, grab only those having NO linked record with a higher price (for these records are the ones with the highest price).
In my experience, this solution also performs best.
This part of the MySQL documentation and/or this very interesting article by Jan Kneschke might be helpful. Enjoy!
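To see the self-join described above in action, here is a small sketch. It uses Python's built-in sqlite3 module rather than MySQL (an assumption made for the sake of a runnable example), and the helper name top_priced_rows is hypothetical.

```python
import sqlite3

# Rows copied from the shop table in the question.
SHOP_ROWS = [
    ("0001", "A", 3.45), ("0001", "B", 3.99), ("0002", "A", 10.99),
    ("0003", "B", 1.45), ("0003", "C", 1.69), ("0003", "D", 1.25),
    ("0004", "D", 19.95),
]


def top_priced_rows(rows=SHOP_ROWS):
    """Keep only rows for which no row of the same article has a higher price."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE shop (article TEXT, dealer TEXT, price REAL)")
    conn.executemany("INSERT INTO shop VALUES (?, ?, ?)", rows)
    query = """
        SELECT a.article, a.dealer, a.price
        FROM shop AS a
        LEFT JOIN shop AS b
          ON a.article = b.article
         AND a.price < b.price        -- b is a more expensive offer
        WHERE b.price IS NULL         -- no such offer exists for a
        ORDER BY a.article, a.dealer
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in top_priced_rows():
    print(row)
```

Ties survive the filter: if two dealers share the highest price, neither row finds a more expensive partner, so both are kept.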

Related

Need validation that interpretation for a Grouping Query is correct

I am running the following query. At first it appears to give the subtotals for customers and to show, by date, each customer's payment amounts only if the total of all that customer's payments is greater than $90,000.
SELECT
Customername,
Date(paymentDate),
CONCAT('$', Round(SUM(amount),2)) AS 'High $ Paying Customers'
FROM Payments
JOIN Customers
On payments.customernumber = customers.customernumber
Group by customername, Date(paymentDate) WITH ROLLUP
having sum(amount)> 90000;
But upon looking at the records for Dragon Souveniers, Ltd. and Euro+ Shopping Channel, it is actually showing the payment dates whose amounts are individually over $90,000, as well as the subtotal for that customer as a rollup. For all other customers, the individual payment dates are not reported in the result set; only their sum is, if it is over $90,000. For example, Anna's Decorations has 4 payment records and none of them is over $90,000, but its sum is reported as the total in the rollup row. Is this the correct interpretation?
The HAVING clause works correctly: it filters out every row whose total is not above 90000, and it applies the same test to the rollup totals.
When using GROUP BY ... WITH ROLLUP, you can detect the generated rollup lines by using the GROUPING() function.
You should write the condition in such a way that the rows you want to keep are not filtered out.
Simple example:
select a, sum(a), grouping(a<3)
from (select 1 as a
      union
      select 2
      union
      select 3) x
group by a<3 with rollup;
output:
+---+--------+---------------+
| a | sum(a) | grouping(a<3) |
+---+--------+---------------+
| 3 | 3 | 0 |
| 1 | 3 | 0 |
| 1 | 6 | 1 |
+---+--------+---------------+
This shows that the last line (with grouping(a<3) == 1) is the rollup line, containing the total across both a<3 groups.
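The interpretation in the question can be checked on a tiny data set. The sketch below is only an emulation: SQLite (used here through Python's sqlite3 module) has no WITH ROLLUP, so the per-customer subtotal rows are built with UNION ALL, an is_total flag plays the role of GROUPING(), and the grand-total row is left out. The amounts and dates are invented for illustration.

```python
import sqlite3

# Hypothetical payments: no single payment by Anna's Decorations exceeds 90000,
# while one payment by Dragon Souveniers, Ltd. does.
PAYMENTS = [
    ("Anna's Decorations", "2004-01-10", 30000.0),
    ("Anna's Decorations", "2004-02-10", 30000.0),
    ("Anna's Decorations", "2004-03-10", 30000.0),
    ("Anna's Decorations", "2004-04-10", 30000.0),
    ("Dragon Souveniers, Ltd.", "2004-01-05", 95000.0),
    ("Dragon Souveniers, Ltd.", "2004-02-05", 10000.0),
]


def rows_over_threshold(threshold=90000):
    """Apply the same threshold to per-date rows and to per-customer subtotals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (customername TEXT, paymentdate TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", PAYMENTS)
    query = """
        SELECT customername, paydate, total, is_total
        FROM (
            -- detail rows: one per customer and payment date
            SELECT customername, paymentdate AS paydate,
                   SUM(amount) AS total, 0 AS is_total
            FROM payments
            GROUP BY customername, paymentdate
            UNION ALL
            -- emulated rollup rows: one subtotal per customer
            SELECT customername, NULL, SUM(amount), 1
            FROM payments
            GROUP BY customername
        ) AS r
        WHERE total > ?   -- plays the role of HAVING SUM(amount) > 90000
        ORDER BY customername, is_total, paydate
    """
    result = conn.execute(query, (threshold,)).fetchall()
    conn.close()
    return result


for row in rows_over_threshold():
    print(row)
```

With this data, Anna's Decorations keeps only its subtotal row, while Dragon Souveniers, Ltd. keeps both the one large payment date and its subtotal, which matches the behaviour described in the question.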

How do I sum a column, and join it to another table based on a condition in SQL?

I have two tables in SQL, one that contains product_id, products_name, department_name, and product_sales and one that has department_id, department_name, and over_head_costs.
I want to be able to find the sum of all sales (grouped by department_name in table 1) and subtract the over_head_costs from table 2 so that I know how profitable a department is. Then I want to output the information like:
department_id, department_name, over_head_costs, product/department sales, total_profit.
I've been searching for like 2-3 hours. I've messed around with joins (which I'm pretty sure is how to solve this) and found the SUM function, which achieves summing (but not by department) and honestly, even if I'd seen the solution I wouldn't know it. I'm just really struggling to understand SQL.
SELECT SUM(products.product_sales), department_id, departments.department_name, over_head_costs
FROM products, departments
WHERE products.department_name = departments.department_name;
This is my most recent query and the closest I've gotten, except it only returns one department (I currently have 3).
This is roughly what I’d like it to look like:
Table 1 (products):
ID  ITEM      DEPARTMENT  SALES
1   Hammer    Tools       40
2   Nails     Tools       40
3   Keyboard  Computer    80
Table 2 (departments):
ID  DEPARTMENT  COST
1   Tools       20
2   Computer    30
Output:
ID  DEPARTMENT  COST  SALES  PROFIT
1   Tools       20    80     60
2   Computer    30    80     50
I'm not really sure what else to try. I think I'm just not understanding how joins and such work. Any help would be greatly appreciated.
You can use SUM with GROUP BY in a subquery, then join the result to the departments table.
Query 1:
SELECT d.*,
t1.SALES,
(t1.SALES - d.COST) AS PROFIT
FROM (
SELECT DEPARTMENT,SUM(SALES) SALES
FROM products
GROUP BY DEPARTMENT
) t1 JOIN departments d on d.DEPARTMENT = t1.DEPARTMENT
Results:
| DEPARTMENT | COST | SALES | PROFIT |
|------------|------|-------|--------|
| Tools | 20 | 80 | 60 |
| Computer | 30 | 80 | 50 |
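The aggregate-then-join approach above can be tried end to end with the sample data from the question. This is a minimal sketch that assumes SQLite (via Python's sqlite3 module) in place of MySQL; the function name department_profit and the lowercase column names are illustrative choices.

```python
import sqlite3


def department_profit():
    """Sum sales per department first, then join the totals to departments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER, item TEXT, department TEXT, sales INTEGER)"
    )
    conn.execute(
        "CREATE TABLE departments (id INTEGER, department TEXT, cost INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [(1, "Hammer", "Tools", 40), (2, "Nails", "Tools", 40),
         (3, "Keyboard", "Computer", 80)],
    )
    conn.executemany(
        "INSERT INTO departments VALUES (?, ?, ?)",
        [(1, "Tools", 20), (2, "Computer", 30)],
    )
    query = """
        SELECT d.id, d.department, d.cost, t1.sales,
               t1.sales - d.cost AS profit
        FROM (SELECT department, SUM(sales) AS sales   -- one row per department
              FROM products
              GROUP BY department) AS t1
        JOIN departments AS d ON d.department = t1.department
        ORDER BY d.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in department_profit():
    print(row)
```

Grouping inside the subquery is what yields one row per department; the original attempt had no GROUP BY, so SUM collapsed everything into a single row.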

Limit the count using GROUP BY

I want to limit the count to 5 using COUNT(*) and GROUP BY, but it returns all the rows. Consider a table named tbhits:
tbhits
id | uname
------------------------
101 | john
101 | james
101 | henry
101 | paul
101 | jacob
101 | jaden
101 | steve
101 | lucas
102 | marie
SELECT id,COUNT(*) as 'hits' FROM tbhits GROUP BY id
returns
id | hits
--------------------
101 | 8
102 | 1
But I want the group by to limit maximum count to 5.
Say I have 1000 rows: I don't want to count them all. If there are more than 5 rows, just display 5+.
I tried using LIMIT 5, but it does not work:
SELECT id,COUNT(*) as 'hits' FROM tbhits GROUP BY id LIMIT 5
I also tried a WHERE clause:
SELECT id,COUNT(*) as 'hits' FROM tbhits WHERE id = 101 GROUP BY id LIMIT 5
but it still returns hits as 8.
id | hits
--------------------
101 | 8
Any help is greatly appreciated.
LIMIT is intended to limit the number of rows you get back from your query, not the value of an aggregate. I suggest you use the COUNT function as follows:
SELECT id, CASE WHEN COUNT(*) < 6 then COUNT(*) ELSE '5+' END as 'hits'
FROM tbhits
GROUP BY id
More details about selecting the minimum of two numbers here, and here goes the sqlfiddle (consider providing it yourself next time).
Note that I went for 6 instead of '5+' on my first suggestion, because you should not, in my opinion, mix data types. But putting 6 is not a good solution either, because someone not aware of the trick will not notice it ('5+', at least, is explicit)
As far as performance is concerned, AFAIK you should not expect MySQL to do the optimization itself.
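As a quick check of the capped-count idea, here is a sketch using the tbhits data from the question. It runs on SQLite through Python's sqlite3 module (an assumption; the thread is about MySQL), and it casts the count to text so the hits column holds a single data type, which is one possible way to address the type-mixing concern. The function name capped_hits is made up.

```python
import sqlite3

# Rows copied from the tbhits table in the question.
TBHITS = [
    (101, "john"), (101, "james"), (101, "henry"), (101, "paul"),
    (101, "jacob"), (101, "jaden"), (101, "steve"), (101, "lucas"),
    (102, "marie"),
]


def capped_hits(cap=5):
    """Return (id, hits), where hits is the count as text or '<cap>+' above the cap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbhits (id INTEGER, uname TEXT)")
    conn.executemany("INSERT INTO tbhits VALUES (?, ?)", TBHITS)
    query = """
        SELECT id,
               CASE WHEN COUNT(*) > :cap
                    THEN CAST(:cap AS TEXT) || '+'   -- e.g. '5+'
                    ELSE CAST(COUNT(*) AS TEXT)      -- keep the column textual
               END AS hits
        FROM tbhits
        GROUP BY id
        ORDER BY id
    """
    result = conn.execute(query, {"cap": cap}).fetchall()
    conn.close()
    return result


for row in capped_hits():
    print(row)
```

Note that this only changes what is displayed; all rows in each group are still counted.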
LIMIT on a GROUP BY query won't actually limit the counts; it limits the number of rows that are output.
Try using an IF expression to compare the count result:
SELECT id,if(COUNT(*)>5,'5+',COUNT(*)) as 'hits'
FROM tbhits
GROUP BY id
Output:
id | hits
--------------------
101 | 5+
102 | 1
Regarding performance, AFAIK GROUP BY will always slow the query down, and there is no direct way to limit counts in the GROUP BY clause. You will have to use either an IF or a CASE expression if you want a solution in MySQL. Otherwise, do it in PHP itself.
You should also have a look at GROUP BY optimization.
As was already said LIMIT applies last in this case, thus after the grouping. What you want to do is modify the value that is selected once the grouping is done.
This will appropriately output "5+" if you have more than 5 records for your table.
SELECT id,
IF(COUNT(*)>5,"5+",COUNT(*)) AS 'count'
FROM Whatever GROUP BY id
See the SQL Fiddle here:
http://sqlfiddle.com/#!2/e381e/4
Try with this?
SELECT id,COUNT(*) as 'hits' FROM tbhits GROUP BY id
HAVING hits >= 5

how to sort 2 different column in sql

Can you show me how to sort by a user-defined column and a fixed column name in SQL? I need to display the rows ordered by highest transaction count and by outlet id, but I only get them ordered by highest transaction count; the outlet ids are not grouped together.
Pardon me, my English is not very good.
here is the problem
outlet id | revenue code | total transaction | total amount
6837 | 014 | 326 | 39158.94
6821 | 408 | 291 | 48786.50
6814 | 014 | 285 | 74159.76
6837 | 452 | 282 | 8846.80
and here is my sql
SELECT
outletid,
revcode,
count(receiptnumbe) as Transactions,
sum(amount) as total
FROM
user_payment
WHERE
date = (SELECT MAX(date) FROM user_payment GROUP BY date desc LIMIT 0, 1)
GROUP BY
outletid, revcode
ORDER BY Transactions desc
I need it to look like this, sorted by outlet id and highest transactions:
outlet id | revenue code | total transaction | total amount
6837 | 014 | 326 | 39158.94
6837 | 452 | 282 | 8846.80
6821 | 408 | 291 | 48786.50
6814 | 014 | 285 | 74159.76
Is this what you want?
ORDER BY OutletId, Transactions desc
EDIT:
If I understand correctly, you want it sorted by the outlet that has the most total transactions. Then by transactions within that group. To do that, you need to summarize again at the outlet level and join back the results:
select outor.*
from (SELECT up.outletid, up.revcode, count(up.receiptnumbe) as Transactions,
sum(up.amount) as total
FROM user_payment up
WHERE date = (SELECT MAX(date) FROM user_payment)
GROUP BY outletid, revcode
) outor join
(SELECT up.outletid, count(up.receiptnumbe) as Transactions,
sum(up.amount) as total
FROM user_payment up
WHERE date = (SELECT MAX(date) FROM user_payment)
GROUP BY outletid
) o
on outor.outletid = o.outletid
order by o.Transactions desc, outor.outletid, outor.Transactions desc;
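To illustrate the two-level ordering above, the following sketch builds a handful of made-up receipts and runs essentially the same query shape on SQLite via Python's sqlite3 module (an assumption, chosen so the example runs anywhere). The receipt counts are invented and much smaller than in the question; only the table and column names come from the thread.

```python
import sqlite3


def sorted_by_outlet_total():
    """Order detail rows by each outlet's total transactions, then within the outlet."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_payment "
        "(outletid INTEGER, revcode TEXT, receiptnumbe INTEGER, amount REAL, date TEXT)"
    )
    # Hypothetical receipts on the latest date, as (outletid, revcode, count).
    latest = [(6837, "014", 3), (6837, "452", 2), (6821, "408", 4), (6814, "014", 1)]
    rows, receipt = [], 0
    for outlet, revcode, count in latest:
        for _ in range(count):
            receipt += 1
            rows.append((outlet, revcode, receipt, 10.0, "2024-01-02"))
    # One older receipt that the MAX(date) filter should exclude.
    rows.append((6814, "014", receipt + 1, 10.0, "2024-01-01"))
    conn.executemany("INSERT INTO user_payment VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT detail.outletid, detail.revcode, detail.transactions
        FROM (SELECT outletid, revcode, COUNT(receiptnumbe) AS transactions
              FROM user_payment
              WHERE date = (SELECT MAX(date) FROM user_payment)
              GROUP BY outletid, revcode) AS detail
        JOIN (SELECT outletid, COUNT(receiptnumbe) AS transactions
              FROM user_payment
              WHERE date = (SELECT MAX(date) FROM user_payment)
              GROUP BY outletid) AS per_outlet
          ON detail.outletid = per_outlet.outletid
        ORDER BY per_outlet.transactions DESC,  -- busiest outlet first
                 detail.outletid,
                 detail.transactions DESC       -- then busiest revcode within it
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in sorted_by_outlet_total():
    print(row)
```

Outlet 6837 comes first because its two revenue codes add up to 5 transactions, even though the single largest row belongs to outlet 6821.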
1) The first thing to do is make sure that you are sorting the fields the way you want to. Do you want them sorted numerically or alphabetically?
See Sorting Lexical and Numeric
Count should be numerical, but you should check outletid.
If you have access to the tables, you could change the field to a number type for it to be sorted numerically, or to a string type for it to be sorted alphabetically.
You might have to use CAST or CONVERT. See Oracle Cast Documentation.
2) If you want the whole table sorted by outlet id and number of transactions, you might consider removing the GROUP BY clause.
3) The third thing I would look at, even if this did work, is renaming any columns or tables whose names are reserved words. I noticed transaction highlighted in blue.
Once these things are checked, Melon's comment should work.
Good question. Feel free to comment so I can follow up.

Counting distinct values for multiple months

Got a little problem here. I can't for the life of me, figure out how to do this.
pid | firstlast | lastvisit | zip
---------------------------------------
435 | 2001-01-17 | 2012-01-21 | 46530
567 | 2001-01-18 | 2012-01-21 | 46530
532 | 2001-01-19 | 2012-01-22 | 46535
536 | 2001-01-19 | 2012-01-23 | 46535
539 | 2001-01-20 | 2012-01-27 | 46521
Here is my SQL query:
SELECT DISTINCT zip, COUNT(zip) AS totalzip FROM production WHERE MONTH(lastvisit) = "1" GROUP BY zip ORDER BY totalzip DESC;
Output:
Jan:
zip | totalzip
---------------------
46530 | 2
46535 | 2
46521 | 1
Feb:
zip | totalzip
---------------------
46530 | 1
46521 | 4
49112 | 3
This is great for the 1st month, but I need this for the entire year. I could run this query 12 times, but two problems occur. I have over 300 zip codes for the entire year. In some months a zip code is not present, so its count is 0 (but MySQL doesn't output the "zero data"). Also, when I order by totalzip, the order changes from month to month, which does not allow me to paste the results into a spreadsheet. I can order by zip code, but again the "zero" zip codes are not present, so the list changes from month to month.
Any thoughts or suggestions would be much appreciated!
You can make this work with subqueries:
select
a.*, count(c.zip) as totalZip
from
(select
monthVisit, zip
from
(select distinct last_day(lastVisit) as monthVisit from production) as m,
(select distinct zip from production) as z
) as a
left join (select
last_day(lastVisit) as monthVisit, zip
from production) as c
on a.monthVisit=c.monthVisit and a.zip=c.zip
group by
a.monthVisit, a.zip
This should give you the count of zips for each month you have, including zeros.
Let me explain how this works:
First, I defined a subquery that builds all the possible combinations of zips and months (the a subquery), and then I left joined this with a second subquery that returns the actual zip and month of each row (the c subquery). Using a left join keeps the combinations from the a subquery that have no matching rows, so they are counted as zero.
Hope this helps.
Note: The last_day() function returns the last day of the month of a given date; e.g.: last_day('2012-07-17')='2012-07-31'
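Here is a runnable sketch of the month-by-zip grid described above. It assumes SQLite through Python's sqlite3 module, which has no last_day(), so strftime('%Y-%m', ...) is swapped in as the month key. The first five rows come from the question (the firstlast column is omitted); the February row and the function name monthly_zip_counts are invented for illustration.

```python
import sqlite3

# (pid, lastvisit, zip): five rows from the question plus one made-up
# February visit so that a second month exists.
PRODUCTION = [
    (435, "2012-01-21", "46530"),
    (567, "2012-01-21", "46530"),
    (532, "2012-01-22", "46535"),
    (536, "2012-01-23", "46535"),
    (539, "2012-01-27", "46521"),
    (600, "2012-02-03", "46521"),  # hypothetical row
]


def monthly_zip_counts():
    """Count visits for every (month, zip) pair, including pairs with no rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE production (pid INTEGER, lastvisit TEXT, zip TEXT)")
    conn.executemany("INSERT INTO production VALUES (?, ?, ?)", PRODUCTION)
    query = """
        SELECT grid.month, grid.zip, COUNT(p.zip) AS totalzip
        FROM (SELECT m.month, z.zip            -- every month paired with every zip
              FROM (SELECT DISTINCT strftime('%Y-%m', lastvisit) AS month
                    FROM production) AS m
              CROSS JOIN (SELECT DISTINCT zip FROM production) AS z) AS grid
        LEFT JOIN production AS p
          ON strftime('%Y-%m', p.lastvisit) = grid.month
         AND p.zip = grid.zip
        GROUP BY grid.month, grid.zip
        ORDER BY grid.month, grid.zip
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in monthly_zip_counts():
    print(row)
```

COUNT(p.zip) skips the NULLs produced by the left join, so empty month and zip combinations come out as 0, and ordering by month and zip keeps the row order stable for pasting into a spreadsheet.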
If you have a zipcode table (you should), you could join it with your data table (a left join), which would bring even the zero-count zipcodes.
The first part of your question is solved with additional grouping. Try something like this:
SELECT DISTINCT zip, YEAR(lastvisit), MONTH(lastvisit), COUNT(zip) AS totalzip
FROM production
GROUP BY zip, YEAR(lastvisit), MONTH(lastvisit)
ORDER BY totalzip DESC;
To add in the "zero" summaries when no data is present, I typically do a left join with a complete list. (This is also stated by @Alfabravo above.) So the final query looks a bit like:
SELECT zipMap.zip, YEAR(p.lastvisit), MONTH(p.lastvisit), COUNT(p.zip) AS totalzip
FROM (SELECT DISTINCT zip FROM production) AS zipMap
LEFT JOIN production AS p ON p.zip = zipMap.zip
GROUP BY zipMap.zip, YEAR(p.lastvisit), MONTH(p.lastvisit)
ORDER BY totalzip DESC;