SQL: get A with max B for every distinct C - mysql

In my example, I have a table containing info about different venues, with columns for city, venue_name, and capacity. I need to select the city and venue_name for the venue with the highest capacity within each city. So if I have data:
city | venue | capacity
LA | venue1 | 10000
LA | venue2 | 20000
NY | venue3 | 1000
NY | venue4 | 500
... the query should return:
LA | venue2
NY | venue3
Can anybody give me advice on how to accomplish this query in SQL? I've gotten tangled up in joins and nested queries :P. Thanks!

select t.city, t.venue
from tbl t
join (select city, max(capacity) as max_capacity from tbl group by city) v
on t.city = v.city
and t.capacity = v.max_capacity

One way to do this is with not exists:
select i.*
from info i
where not exists (select 1
from info i2
where i2.city = i.city and i2.capacity > i.capacity);

The common approach is to join the table back to itself using a subquery with max:
select y.city, y.venue_name
from yourtable y
join (select city, max(capacity) maxcapacity
from yourtable
group by city
) t on y.city = t.city and y.capacity = t.maxcapacity
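To see the join-on-max pattern run end to end, here is a minimal sketch using Python's built-in sqlite3 module on the four sample rows from the question. The table name `venues` and the helper `largest_venue_per_city` are made up for illustration; the SQL has the same shape as the query above and should carry over to MySQL unchanged.

```python
import sqlite3


def largest_venue_per_city(conn):
    """Return (city, venue_name) for the highest-capacity venue in each city."""
    sql = """
        SELECT y.city, y.venue_name
        FROM venues y
        JOIN (SELECT city, MAX(capacity) AS maxcapacity
              FROM venues
              GROUP BY city) t
          ON y.city = t.city AND y.capacity = t.maxcapacity
        ORDER BY y.city
    """
    return conn.execute(sql).fetchall()


# In-memory database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE venues (city TEXT, venue_name TEXT, capacity INTEGER)")
conn.executemany(
    "INSERT INTO venues VALUES (?, ?, ?)",
    [("LA", "venue1", 10000), ("LA", "venue2", 20000),
     ("NY", "venue3", 1000), ("NY", "venue4", 500)],
)

print(largest_venue_per_city(conn))  # [('LA', 'venue2'), ('NY', 'venue3')]
```

If two venues in a city tie for the top capacity, this query returns both of them.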

In SQL Server you can use an OUTER APPLY to order those values and bring the top row back to your main query.
http://www.codeproject.com/Articles/607246/Making-OUTER-and-CROSS-APPLY-work-for-you
Another alternative would be to use the ROW_NUMBER() function. http://msdn.microsoft.com/en-us/library/ms186734.aspx
SELECT
v.city,
Ranked.Venue,
Ranked.Capacity
FROM Venues v WITH (NOLOCK)
Outer Apply
(
SELECT TOP 1
Venue, Capacity
FROM Venues Ranked WITH (NOLOCK)
WHERE v.City = Ranked.City
ORDER BY Capacity DESC
) as Ranked
GROUP BY
v.city,
Ranked.Venue,
Ranked.Capacity

Related

Get the max value in a specific group of rows

I have these two tables:
popular_song
song_name | rate | country_id
------------------------------
Tic Tac | 10 | 1
Titanic | 2 | 1
Love Boat | 8 | 2
Battery | 9 | 2
country
country_id | country
--------------------------
1 | United States
2 | Germany
What I'd like to achieve is to get the most popular song in each country, e.g.:
song_name | rate | country
--------------------------
Tic Tac | 10 | United States
Battery | 9 | Germany
I've tried this query:
SELECT MAX(rate), song_name, country
FROM popular_song ps JOIN country cnt
ON ps.country_id = cnt.country_id
GROUP BY country
But this doesn't work. I've tried looking at questions like "Order by before group by" but didn't find an answer.
Which mysql query could achieve this result?
You can join the popular_song table to a subquery that finds the maximum rate per country:
SELECT ps.*,cnt.country
FROM popular_song ps
JOIN (SELECT MAX(rate) rate, country_id FROM popular_song GROUP BY country_id) t1
ON(ps.country_id = t1.country_id and ps.rate= t1.rate)
JOIN country cnt
ON ps.country_id = cnt.country_id
There is a trick that you can use with substring_index() and group_concat():
SELECT MAX(rate),
substring_index(group_concat(song_name order by rate desc separator '|'), '|', 1) as song,
country
FROM popular_song ps JOIN
country cnt
ON ps.country_id = cnt.country_id
GROUP BY country;
EDIT:
If you have big tables and lots of songs per country, I would suggest the not exists approach:
select rate, song_name, country
from popular_song ps join
country cnt
on ps.country_id = cnt.country_id
where not exists (select 1
from popular_song ps2
where ps2.country_id = ps.country_id and ps2.rate > ps.rate
);
Along with an index on popular_song(country_id, rate). I recommended the group_concat() approach because the OP already had a query with a group by, so the trick is the easiest to plug into such a query.
Here is another way I've learned from @Gordon Linoff. That question is worth reading too.
SELECT ps.*, cnt.country
FROM
(SELECT popular_song.*,
@rownum := if(@c = country_id, @rownum + 1, if(@c := country_id, 1, 1)) as row_number
FROM popular_song,
(SELECT @c := '', @rownum := 0) r
order by country_id, rate desc) as ps
LEFT JOIN country cnt
ON ps.country_id = cnt.country_id
WHERE ps.row_number = 1
This is a way of emulating the row_number() over (partition by ...) window function in MySQL versions that lack window functions.
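On engines that do support window functions (MySQL 8.0 and later, SQLite 3.25 and later), the user-variable emulation is not needed. A minimal sketch, run here through Python's sqlite3 module and assuming the bundled SQLite is recent enough; the alias `rn` and the helper name `top_song_per_country` are illustrative choices, not part of the original answer.

```python
import sqlite3


def top_song_per_country(conn):
    """Return (song_name, rate, country) for the top-rated song per country."""
    sql = """
        SELECT song_name, rate, country
        FROM (SELECT ps.song_name, ps.rate, cnt.country,
                     ROW_NUMBER() OVER (PARTITION BY ps.country_id
                                        ORDER BY ps.rate DESC) AS rn
              FROM popular_song ps
              JOIN country cnt ON ps.country_id = cnt.country_id) ranked
        WHERE rn = 1
        ORDER BY country
    """
    return conn.execute(sql).fetchall()


# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE popular_song (song_name TEXT, rate INTEGER, country_id INTEGER);
    CREATE TABLE country (country_id INTEGER, country TEXT);
    INSERT INTO popular_song VALUES
        ('Tic Tac', 10, 1), ('Titanic', 2, 1),
        ('Love Boat', 8, 2), ('Battery', 9, 2);
    INSERT INTO country VALUES (1, 'United States'), (2, 'Germany');
""")

print(top_song_per_country(conn))
# [('Battery', 9, 'Germany'), ('Tic Tac', 10, 'United States')]
```

Unlike the join-on-max and NOT EXISTS queries, ROW_NUMBER() yields exactly one row per country even when ratings tie; which of the tied rows wins is arbitrary unless the ORDER BY inside the window is extended with a tie-breaker.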
You can do this with NOT EXISTS like this:
SELECT rate, song_name, cnt.country
FROM popular_song ps JOIN country cnt
ON ps.country_id = cnt.country_id
WHERE NOT EXISTS
(SELECT * FROM popular_song
WHERE ps.country_id = country_id AND rate > ps.rate)
The question does not specify whether two songs should be returned for a country when their ratings are equal. The above query will return several records per country if the top rating is not unique within that country.
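The tie behaviour is easy to check. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and adds a hypothetical fifth song ('Anthem', rate 9, Germany) that is not in the question's data, so that Germany has two top-rated songs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE popular_song (song_name TEXT, rate INTEGER, country_id INTEGER);
    CREATE TABLE country (country_id INTEGER, country TEXT);
    INSERT INTO popular_song VALUES
        ('Tic Tac', 10, 1), ('Titanic', 2, 1),
        ('Love Boat', 8, 2), ('Battery', 9, 2),
        ('Anthem', 9, 2);  -- hypothetical row that creates a tie in Germany
    INSERT INTO country VALUES (1, 'United States'), (2, 'Germany');
""")

# Keep a song only if no other song in the same country has a higher rate.
rows = conn.execute("""
    SELECT ps.rate, ps.song_name, cnt.country
    FROM popular_song ps
    JOIN country cnt ON ps.country_id = cnt.country_id
    WHERE NOT EXISTS (SELECT 1
                      FROM popular_song ps2
                      WHERE ps2.country_id = ps.country_id
                        AND ps2.rate > ps.rate)
    ORDER BY cnt.country, ps.song_name
""").fetchall()

print(rows)
# [(9, 'Anthem', 'Germany'), (9, 'Battery', 'Germany'), (10, 'Tic Tac', 'United States')]
```

Both tied songs come back for Germany, so a tie-breaking rule is needed if exactly one row per country is required.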

making an unique column value after order by mysql

So, my problem is that I have a list of customers (the table now has around 100k records) with income for each customer. When I group it by country I get around 60 countries with the sum of income. Then I need to order it by income DESC; my query looks something like this:
SELECT s2.i,s1.year,s1.short_c,s1.country,s1.uges FROM
(SELECT u.year,k.short_c,s.country, IFNULL(ROUND(SUM(u.income)),0) as uges
FROM im_income u,im_contact k,td_countries s
WHERE u.year=2012
AND u.customer_id=k.id
AND k.kat='K'
AND k.short_c=s.short_c
GROUP BY k.short_c, u.year
ORDER BY u.year ASC,uges DESC) s1
CROSS JOIN
(SELECT @i:=@i+1 as i FROM (SELECT @i:= 0) AS i) s2
I know that this CROSS JOIN is wrong since it is not giving me what I need, but is there a way to make a unique id after ORDER BY? I need to order the countries by income DESC and then assign each one an id that represents its rank number.
Result looks like this now:
+-+----+-------+---------+------+
|i|year|short_c|country |uges |
+-+----+-------+---------+------+
|1|2012|USA |United S.|123456|
+-+----+-------+---------+------+
|1|2012|RU |Russia |23456 |
+-+----+-------+---------+------+
And I would want it in this way, but to assign after order by the unique i value:
+-+----+-------+---------+------+
|i|year|short_c|country |uges |
+-+----+-------+---------+------+
|1|2012|USA |United S.|123456|
+-+----+-------+---------+------+
|2|2012|RU |Russia |23456 |
+-+----+-------+---------+------+
|3| | | | |
+-+----+-------+---------+------+
Any help would be appreciated.
I think this is what you are looking for:
SELECT @i := @i + 1 as i, s1.year, s1.short_c, s1.country, s1.uges
FROM (SELECT u.year,
k.short_c,
s.country,
IFNULL(ROUND(SUM(u.income)),0) as uges
FROM im_income u join
im_contact k
on u.customer_id = k.id join
td_countries s
on k.short_c = s.short_c
WHERE u.year = 2012 AND k.kat = 'K'
GROUP BY k.short_c, u.year
) s1
CROSS JOIN
(SELECT @i:= 0) const
ORDER BY year, uges desc;
The variable evaluation occurs when the results are being "output", after the order by.
I also fixed your join syntax. You should learn to use the explicit join rather than implicit joins in the where clause.
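Where window functions are available (MySQL 8.0 and later, SQLite 3.25 and later), the rank can be computed without user variables. The following is a sketch only: it runs through Python's sqlite3 module, and `income_by_country` is a hypothetical table standing in for the already-aggregated subquery `s1`. The first two rows come from the question's output; the third is invented to make the ordering visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE income_by_country (year INTEGER, short_c TEXT,
                                    country TEXT, uges INTEGER);
    INSERT INTO income_by_country VALUES
        (2012, 'RU', 'Russia', 23456),
        (2012, 'USA', 'United S.', 123456),
        (2012, 'DE', 'Germany', 3456);  -- hypothetical third row
""")

# Number the rows in the same order the answer sorts them: year, then uges DESC.
ranked = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY year, uges DESC) AS i,
           year, short_c, country, uges
    FROM income_by_country
    ORDER BY i
""").fetchall()

for row in ranked:
    print(row)
# (1, 2012, 'USA', 'United S.', 123456)
# (2, 2012, 'RU', 'Russia', 23456)
# (3, 2012, 'DE', 'Germany', 3456)
```

Because the numbering is tied to the window's own ORDER BY, it does not depend on when the engine happens to evaluate a variable assignment.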

Identifying groups in Group By

I am running a complicated group by statement and I get all my results in their respective groups. But I want to create a custom column with their "group id". Essentially all the items that are grouped together would share an ID.
This is what I get:
partID | Description
-------+---------+--
11000 | "Oven"
12000 | "Oven"
13000 | "Stove"
13020 | "Stove"
12012 | "Grill"
This is what I want:
partID | Description | GroupID
-------+-------------+----------
11000 | "Oven" | 1
12000 | "Oven" | 1
13000 | "Stove" | 2
13020 | "Stove" | 2
12012 | "Grill" | 3
"GroupID" does not exist as data in any of the tables, it would be a custom generated column (alias) that would be associated to that group's key,id,index, whatever it would be called.
How would I go about doing this?
I think this is the query that returns the five rows:
select partId, Description
from part p;
Here is one way (using standard SQL) to get the groups:
select partId, Description,
(select count(distinct Description)
from part p2
where p2.Description <= p.Description
) as GroupId
from part p;
This is using a correlated subquery. The subquery finds all the description values less than or equal to the current one and counts the distinct values. Note that this gives a different set of values from the ones in the OP: they are assigned alphabetically rather than by first encounter in the data. If that is important, the OP should add that to the question. Based on the question, the particular ordering did not seem important.
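A quick way to see the alphabetical assignment is to run the correlated subquery on the five sample rows. This sketch uses Python's sqlite3 module as a stand-in for MySQL; the `ORDER BY partId` is added only to make the printed output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (partId INTEGER, Description TEXT);
    INSERT INTO part VALUES
        (11000, 'Oven'), (12000, 'Oven'),
        (13000, 'Stove'), (13020, 'Stove'),
        (12012, 'Grill');
""")

# GroupId = number of distinct descriptions sorting at or before this one.
groups = conn.execute("""
    SELECT partId, Description,
           (SELECT COUNT(DISTINCT Description)
            FROM part p2
            WHERE p2.Description <= p.Description) AS GroupId
    FROM part p
    ORDER BY partId
""").fetchall()

for row in groups:
    print(row)
# (11000, 'Oven', 2)
# (12000, 'Oven', 2)
# (12012, 'Grill', 1)
# (13000, 'Stove', 3)
# (13020, 'Stove', 3)
```

Grill gets 1, Oven 2 and Stove 3, which differs from the numbering in the question but still gives every group a stable, shared id.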
Here's one way to get it:
SELECT p.partID, p.Description, b.groupID
FROM (
SELECT Description, @rn := @rn + 1 AS groupID
FROM (
SELECT distinct description
FROM part, (SELECT @rn := 0) c
) a
) b
INNER JOIN part p ON p.description = b.description;
This assigns a different groupID to each description, and then joins the original table on that description.
Based on your comments in response to Gordon's answer, I think what you need is a derived table to generate your groupids, like so:
select
t1.description,
@cntr := @cntr + 1 as GroupID
FROM
(select distinct table1.description from table1) t1
cross join
(select @cntr := 0) t2
which will give you:
DESCRIPTION GROUPID
Oven 1
Stove 2
Grill 3
Then you can use that in your original query, joining on description:
select
t1.partid,
t1.description,
t2.GroupID
from
table1 t1
inner join
(
select
t1.description,
@cntr := @cntr + 1 as GroupID
FROM
(select distinct table1.description from table1) t1
cross join
(select @cntr := 0) t2
) t2
on t1.description = t2.description
SELECT partID, Description, @s := @s + 1 GroupID
FROM part, (SELECT @s := 0) AS s
GROUP BY Description

SQL query with multiple tables, possible to apply group by only to count(*)?

I am trying to list bookjobs info for jobtype 'N' where the publisher has a creditcode of 'C'. Then I want to add a count of the total number of POs (purchase orders, from table pos) for each row of the previous query's output. Can you use group by to apply only to that count and not to the rest of the query? Do I have to use a join? My attempts thus far have been unsuccessful.
These are the tables I am working with:
bookjobs:
+--------+---------+----------+
| job_id | cust_id | jobtype |
+--------+---------+----------+
publishers:
+---------+------------+------------+
| cust_id | name | creditcode |
+---------+------------+------------+
pos:
+--------+-------+------------+-----------+
| job_id | po_id | po_date | vendor_id |
+--------+-------+------------+-----------+
This is what I came up with, although it is wrong (count is not grouped by job_id):
select b.*, (select count(*) from pos o) as count
from bookjobs b, publishers p, pos o
where b.cust_id=p.cust_id
and b.job_id=o.job_id
and b.jobtype='N'
and p.creditcode='C';
I believe I need to have the count grouped by job_id, but not the rest of the query. Is this possible or do I need to use a join? I tried a few joins but couldn't get anything to work. Any help appreciated.
Try this sql
select b.*, (select count(*) from pos where job_id=o.job_id) as count
from bookjobs b, publishers p, pos o
where b.cust_id=p.cust_id
and b.job_id=o.job_id
and b.jobtype='N'
and p.creditcode='C';
Based on what you describe, I would assume that your original query would return duplicate rows. You can fix this by pre-aggregating the pos table and then joining it in:
select b.*, o.cnt
from bookjobs b join
publishers p
on b.cust_id = p.cust_id join
(select job_id, count(*) as cnt
from pos o
group by job_id
) o
on b.job_id = o.job_id
where b.jobtype = 'N' and p.creditcode = 'C';
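The pre-aggregation approach can be tried on a small data set. Everything below is a sketch: the rows are invented for illustration (the question shows only column names), and Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookjobs (job_id INTEGER, cust_id TEXT, jobtype TEXT);
    CREATE TABLE publishers (cust_id TEXT, name TEXT, creditcode TEXT);
    CREATE TABLE pos (job_id INTEGER, po_id TEXT, po_date TEXT, vendor_id TEXT);
    -- Hypothetical sample rows.
    INSERT INTO bookjobs VALUES (1, 'A01', 'N'), (2, 'A01', 'R'),
                                (3, 'B02', 'N'), (4, 'A01', 'N');
    INSERT INTO publishers VALUES ('A01', 'Art Books', 'C'),
                                  ('B02', 'Bible Inc', 'N');
    INSERT INTO pos VALUES (1, 'AAA', '1990-01-01', 'V1'),
                           (1, 'BBB', '1990-01-02', 'V2'),
                           (2, 'CCC', '1990-02-01', 'V1'),
                           (3, 'DDD', '1990-03-01', 'V2');
""")

# Count purchase orders per job first, then join that one-row-per-job result.
jobs = conn.execute("""
    SELECT b.job_id, b.cust_id, b.jobtype, o.cnt
    FROM bookjobs b
    JOIN publishers p ON b.cust_id = p.cust_id
    JOIN (SELECT job_id, COUNT(*) AS cnt
          FROM pos
          GROUP BY job_id) o ON b.job_id = o.job_id
    WHERE b.jobtype = 'N' AND p.creditcode = 'C'
""").fetchall()

print(jobs)  # [(1, 'A01', 'N', 2)]
```

Job 1 appears once with its count of 2, rather than once per purchase order. A job with no purchase orders (job 4 here) drops out of the inner join; switching to a LEFT JOIN with COALESCE(o.cnt, 0) would keep it with a count of zero.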

MySQL: Group by date proximity?

I wrote this query, it does almost what I want:
SELECT * FROM
(
SELECT COUNT(*) as cnt,
lat,
lon,
elev,
GROUP_CONCAT(CONCAT(usaf,'-',wban))
FROM `ISH-HISTORY_HASPOS`
GROUP BY lat,lon,elev
) AS x WHERE cnt >=1;
output:
+-----+--------+----------+--------+-------------------------------------------------+
| cnt | lat | lon | elev | GROUP_CONCAT(CONCAT(usaf,'-',wban)) |
+-----+--------+----------+--------+-------------------------------------------------+
| 4 | 30.478 | -87.187 | 36 | 722220-13899,722221-13899,722223-13899,999999-13899 |
| 4 | 36.134 | -80.222 | 295.7 | 723190-93807,723191-93807,723193-93807,999999-93807 |
| 5 | 37.087 | -84.077 | 369.1 | 723290-03849,723291-03849,723293-03849,724243-03849,999999-03849 |
| 5 | 38.417 | -113.017 | 1534.1 | 745200-23176,745201-23176,999999-23176,724757-23176,724797-23176 |
| 4 | 40.217 | -76.851 | 105.8 | 999999-14751,725110-14751,725111-14751,725118-14751 |
+-----+--------+----------+--------+-------------------------------------------------+
This returns a concatenated list of stations that are located at identical coordinates. However, I am only interested in concatenating stations with adjoining date ranges. The table that I select from (ISH-HISTORY_HASPOS) has two datetime columns : 'begin' and 'end'. I need the values for these two columns to be within 3 days of each other to satisfy the GROUP_CONCAT conditions.
Edit: In order for a station to be included in the final result's GROUP_CONCAT it must satisfy the following conditions:
1. It must be co-located with another station in the list (group by lat,lon,elev).
2. Its end time must be within 3 days of another station's begin time OR its begin time must be within 3 days of another station's end time. When I say "another station", I am referring to stations that are co-located (meet the conditions for #1).
I figure that I will have to use a subquery but I can't seem to figure out how to do it. Some help would be greatly appreciated! Either a query or a stored procedure would be great but a php solution would also be acceptable.
Here is a dump of the table that I am querying:sql dump
The results should look the same as my example, but non-adjoining items (date-wise) should not be there.
A solution could be to use a subquery that computes the list of stations within 3 days of each other, and to add this subquery as a where clause to the main query.
The subquery consists of a Cartesian product that lists all possible station pairs, with a first condition that keeps just the first half of the resulting matrix and two conditions that specify the time constraints. I just guessed these latter conditions, as I don't really know the unit of measure of the begin and end fields.
The resulting query could be this:
SELECT * FROM (
SELECT COUNT(*) AS cnt,
lat,
lon,
elev,
GROUP_CONCAT(CONCAT(usaf, '-', wban))
FROM `ISH-HISTORY_HASPOS`
WHERE id IN (
SELECT DISTINCT t1.id
FROM `ISH-HISTORY_HASPOS` t1
INNER JOIN `ISH-HISTORY_HASPOS` t2
ON t1.lon = t2.lon
AND t1.lat = t2.lat
AND t1.elev = t2.elev
WHERE t1.id < t2.id
AND abs(t1.begin - t2.end) < 259200
AND abs(t1.end - t2.begin) < 259200
UNION
SELECT DISTINCT t2.id
FROM `ISH-HISTORY_HASPOS` t1
INNER JOIN `ISH-HISTORY_HASPOS` t2
ON t1.lon = t2.lon
AND t1.lat = t2.lat
AND t1.elev = t2.elev
WHERE t1.id < t2.id
AND abs(t1.begin - t2.end) < 259200
AND abs(t1.end - t2.begin) < 259200
)
GROUP BY lat, lon, elev
) AS x WHERE cnt >= 1;
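As a cross-check of the idea, here is a small sketch that applies the two conditions from the question literally (co-located, and end within 3 days of another station's begin OR begin within 3 days of another station's end). It is not the answer's query: it uses an EXISTS filter, runs on SQLite through Python's sqlite3 module, and the table `stations`, the columns `begin_dt` and `end_dt`, and all three rows are hypothetical. SQLite's julianday() does the date arithmetic here; in MySQL something like DATEDIFF or TIMESTAMPDIFF would take its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stations (id INTEGER PRIMARY KEY, usaf TEXT, wban TEXT,
                           lat REAL, lon REAL, elev REAL,
                           begin_dt TEXT, end_dt TEXT);
    -- Hypothetical rows: 1 and 2 adjoin (one day apart), 3 is isolated in time.
    INSERT INTO stations VALUES
        (1, '722220', '13899', 30.478, -87.187, 36, '2000-01-01', '2005-06-30'),
        (2, '722221', '13899', 30.478, -87.187, 36, '2005-07-01', '2010-12-31'),
        (3, '999999', '13899', 30.478, -87.187, 36, '1980-01-01', '1985-01-01');
""")

# Keep a station only if some other co-located station adjoins it in time.
adjoining = conn.execute("""
    SELECT s.lat, s.lon, s.elev, COUNT(*) AS cnt,
           GROUP_CONCAT(s.usaf || '-' || s.wban) AS station_list
    FROM stations s
    WHERE EXISTS (
        SELECT 1
        FROM stations o
        WHERE o.id <> s.id
          AND o.lat = s.lat AND o.lon = s.lon AND o.elev = s.elev
          AND (ABS(julianday(s.end_dt) - julianday(o.begin_dt)) <= 3
               OR ABS(julianday(s.begin_dt) - julianday(o.end_dt)) <= 3))
    GROUP BY s.lat, s.lon, s.elev
""").fetchall()

print(adjoining)
```

With these rows the group contains stations 722220-13899 and 722221-13899, while 999999-13899 is filtered out because neither of its dates falls within 3 days of a co-located station's dates.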
I only have access and knowledge of SQL Server so I can't get your data to work and I don't know if MySQL has the equivalent functionality but here is a verbal description of what you need to do.
You need a recursive statement (WITH CTE in SQL Server) to join the table to itself on lat, lon, elev and begin BETWEEN end -3 AND end +3. You will need to be careful not to get caught in an infinite loop - I suggest building a comma-separated list of the IDs you have visited and checking this as you go. It's painful, but keep this list in ID order because it is what you will need to group on at the end. You also need to keep track of your depth and the original id.
Something like ...
WITH cte(id, idlist, lat, lon, elev, starts, ends, depth)
AS (
SELECT id, CAST(id AS varchar), lat, lon, elev, starts, ends, 0
FROM `ISH-HISTORY_HASPOS`
UNION ALL
SELECT i.id, FunctionToManagetheList(cte.idlist, i.id), i.lat, i.lon, i.elev, i.starts, i.ends, cte.depth + 1
FROM `ISH-HISTORY_HASPOS` i
INNER JOIN
cte ON i.lat=cte.lat AND
i.lon=cte.lon AND
i.elev=cte.elev AND
i.starts BETWEEN cte.ends - 3 AND cte.ends + 3 AND
NOT FunctionToCheckIfTheIDisintheList(i.id, cte.idlist)
)
SELECT stuffyouneed
FROM `ISH-HISTORY_HASPOS` i
INNER JOIN
(SELECT id, MAX(depth) AS MaxDepth
FROM cte
GROUP BY id) cte1 ON i.id=cte1.id
INNER JOIN
cte cte2 ON cte1.id=cte2.id AND cte1.MaxDepth=cte2.depth
GROUP BY cte2.idlist