SQL Aggregate Fraction - MySQL

I have a table with City and ComplaintType.
I am trying to create a normalization column computed as follows:
(pseudo) (number of complaints of a particular type in a particular city) / (number of all complaints in that city)
I currently have the following SQL:
SELECT City AS city_name, ComplaintType AS complaint_type,
count(*) / (SELECT count(City) FROM data GROUP BY City) AS complaint_frac,
count(*) AS count_freq,
(SELECT count(City) FROM data GROUP BY City) AS count_city
FROM data
GROUP BY City, ComplaintType
ORDER BY complaint_frac DESC
This gives me the following table:
The total number of complaints in a city (count_city) is incorrect. However, when I run the count_city subquery on its own, the counts are correct and give the following output:
How do I correctly associate count_city with the count of each complaint type per city, so that I can compute the correct fraction?
Cold hard numbers example:
Bronx & Hot Water = 79690
Bronx (total complaints) = 579363
complaint_frac = 79690 / 579363 = 0.13754761695

Correlate the subquery with your main table, so that it counts only the rows of the current group's city:
SELECT City AS city_name, ComplaintType AS complaint_type,
count(*) / (SELECT count(d1.City) FROM data d1 WHERE d1.City = d2.City) AS complaint_frac,
count(*) AS count_freq,
(SELECT count(d1.City) FROM data d1 WHERE d1.City = d2.City GROUP BY d1.City) AS count_city
FROM data d2
GROUP BY City, ComplaintType
ORDER BY complaint_frac DESC
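The sketch below checks the correlated version without a MySQL server. It runs the same query through Python's built-in sqlite3 module on a few hypothetical rows. The CAST to REAL is needed only in SQLite, where `/` between integers truncates; MySQL's `/` already returns a decimal.

```python
import sqlite3

# Hypothetical sample rows standing in for the real "data" table.
rows = [
    ("Bronx", "Hot Water"), ("Bronx", "Hot Water"),
    ("Bronx", "Noise"), ("Bronx", "Heating"),
    ("Queens", "Noise"), ("Queens", "Noise"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (City TEXT, ComplaintType TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# Each subquery is correlated through d1.City = d2.City, so it counts only
# the rows of the city in the current outer group. CAST(... AS REAL) avoids
# SQLite's integer division (MySQL does not need it).
query = """
SELECT d2.City AS city_name, d2.ComplaintType AS complaint_type,
       CAST(count(*) AS REAL)
         / (SELECT count(*) FROM data d1 WHERE d1.City = d2.City) AS complaint_frac,
       count(*) AS count_freq,
       (SELECT count(*) FROM data d1 WHERE d1.City = d2.City) AS count_city
FROM data d2
GROUP BY d2.City, d2.ComplaintType
ORDER BY complaint_frac DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```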

You don't need subqueries for this, at least in MySQL 8+; window functions do the work:
SELECT City AS city_name, ComplaintType AS complaint_type,
count(*) / sum(count(*)) over (partition by city) as complaint_frac,
count(*) as count_freq,
sum(count(*)) over (partition by city) as count_city
FROM data
GROUP BY City, ComplaintType
ORDER BY complaint_frac DESC
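Window functions are evaluated after GROUP BY. `sum(count(*)) over (partition by city)` therefore adds up the per-type counts within each city, which gives the city total on every row. The sketch below runs the same query on hypothetical rows with Python's sqlite3. It assumes SQLite 3.25 or later, the first release with window functions. The CAST to REAL is again only an SQLite concern.

```python
import sqlite3

# Window functions require SQLite 3.25+ (MySQL needs 8.0+).
assert sqlite3.sqlite_version_info >= (3, 25, 0)

# Hypothetical rows standing in for the real "data" table.
rows = ([("Bronx", "Hot Water")] * 3 + [("Bronx", "Noise")]
        + [("Queens", "Noise")] * 2 + [("Queens", "Heating")] * 2)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (City TEXT, ComplaintType TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# count(*) is the per-(City, ComplaintType) count; the window sum over
# PARTITION BY City adds those counts back up to the city total.
query = """
SELECT City AS city_name, ComplaintType AS complaint_type,
       CAST(count(*) AS REAL) / sum(count(*)) OVER (PARTITION BY City) AS complaint_frac,
       count(*) AS count_freq,
       sum(count(*)) OVER (PARTITION BY City) AS count_city
FROM data
GROUP BY City, ComplaintType
ORDER BY complaint_frac DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```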

Related

How do I return only one row with MAX() aggregate?

SELECT Name, MAX(Population) as Population
FROM County
GROUP BY Name;
Question: This just returned 16 rows. I want to return only the ONE county with the highest population, as a single row showing that county's name and population, not the highest population for every county. How do I fix this?
Thanks in advance.
Order by population and take the first row:
SELECT Name,
Population
FROM County
ORDER BY Population DESC LIMIT 1
Alternatively, compare against a MAX() subquery; unlike LIMIT 1, this returns every county tied for the highest population:
with data (Population, Name) as (
Select 12345, 'c1' union all
Select 1234500001, 'c2' union all
Select 1234500002, 'c3' union all
Select 12346, 'c4' union all
Select 1234600001, 'c4' union all
Select 1234600002, 'c4')
SELECT Name, Population
FROM data
where Population = (SELECT MAX(Population) FROM data)
;
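The two answers differ only when counties tie. The sketch below shows this in Python's sqlite3 with hypothetical County rows in which two counties share the maximum. It also shows why population values should not be quoted: quoted numbers are strings, and MAX() on strings compares character by character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Population is declared INTEGER so MAX() and ORDER BY compare numbers.
conn.execute("CREATE TABLE County (Name TEXT, Population INTEGER)")
# Hypothetical rows; c2 and c3 tie for the largest population.
conn.executemany(
    "INSERT INTO County VALUES (?, ?)",
    [("c1", 12345), ("c2", 500000), ("c3", 500000), ("c4", 9876)],
)

# ORDER BY ... LIMIT 1: always exactly one row; which tied county wins is unspecified.
top_one = conn.execute(
    "SELECT Name, Population FROM County ORDER BY Population DESC LIMIT 1"
).fetchall()

# WHERE Population = (SELECT MAX(...)): every county sharing the maximum.
all_max = conn.execute(
    "SELECT Name, Population FROM County "
    "WHERE Population = (SELECT MAX(Population) FROM County) ORDER BY Name"
).fetchall()

# Quoted numbers are text: '9' sorts after '10' character by character.
text_max = conn.execute(
    "SELECT MAX(v) FROM (SELECT '9' AS v UNION ALL SELECT '10')"
).fetchone()[0]

print(top_one, all_max, text_max)
```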

How can I group by, but select only the top 10 under each group by sum?

I have what I think is a complicated question... the query below is usually exported into Excel:
select
shop_type,
shop_name,
company,
sum(amount) as amount,
sum(counts) as count
from df
group by shop_type,shop_name,company
company has only two values, A and B. The problem is that when I run the above and put the result into a pivot table, I can't, because it is so large (e.g. 5 million rows). In the pivot, my rows are shop_type and shop_name (with shop_name nested under shop_type) and my columns are company; I then sort by grand total (largest to smallest), while also sorting shop_name from largest to smallest under each shop_type.
How can I run the above but select only the top 10 shop_names (those with the largest share of amount) under each shop_type for both companies (i.e. by total)?
Use window functions:
select df.*
from (select shop_type, shop_name, company,
sum(amount) as amount, sum(counts) as count,
row_number() over (partition by company, shop_type order by sum(amount) desc) as seqnum
from df
group by shop_type, shop_name, company
) df
where seqnum <= 10;
Note: Your question is also tagged Hive. The above is standard SQL, but Hive might be quirky about mixing aggregation and window functions. If so, you can use one more level of subqueries:
select df.*
from (select df.*,
row_number() over (partition by company, shop_type order by amount desc) as seqnum
from (select shop_type, shop_name, company,
sum(amount) as amount, sum(counts) as count
from df
group by shop_type, shop_name, company
) df
) df
where seqnum <= 10;
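The nested form runs as-is in Python's sqlite3, as the sketch below shows. The innermost query aggregates, the middle level numbers shops within each (company, shop_type), and the outer level keeps the top N. The rows are hypothetical, and N is 2 instead of 10 to keep the sample small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df (shop_type TEXT, shop_name TEXT, company TEXT,"
    " amount INTEGER, counts INTEGER)"
)
# Hypothetical rows; shop s1 has two rows for company A that must be summed first.
conn.executemany("INSERT INTO df VALUES (?, ?, ?, ?, ?)", [
    ("food", "s1", "A", 10, 1), ("food", "s1", "A", 5, 1),
    ("food", "s2", "A", 20, 1), ("food", "s3", "A", 1, 1),
    ("food", "s1", "B", 7, 1),
    ("tech", "t1", "A", 3, 1), ("tech", "t2", "A", 4, 1), ("tech", "t3", "A", 5, 1),
])

TOP_N = 2  # the question wants 10

# Aggregate first, then rank the aggregated rows, then filter on the rank.
query = """
SELECT *
FROM (SELECT g.*,
             ROW_NUMBER() OVER (PARTITION BY company, shop_type
                                ORDER BY amount DESC) AS seqnum
      FROM (SELECT shop_type, shop_name, company,
                   SUM(amount) AS amount, SUM(counts) AS count
            FROM df
            GROUP BY shop_type, shop_name, company) g) ranked
WHERE seqnum <= ?
ORDER BY company, shop_type, seqnum
"""
result = conn.execute(query, (TOP_N,)).fetchall()
for row in result:
    print(row)
```

To rank by the combined total of both companies, which is what the question's "(i.e. total)" suggests, group by shop_type, shop_name only and partition by shop_type alone.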

MySQL: get the id with the maximum likes, grouped by a column and ordered by sum(likes)

I need to retrieve data from a view. The view has columns such as country, location_id, content_id, content_url, content_likes and .... I need to retrieve the location_id which has max(content_likes), grouped by country and ordered by sum(content_likes) desc.
Right now I am getting the correct data per country, but the id I get is whichever row the query happens to return by default. Instead, I need the id that has the maximum likes.
My current query is:
select * from <view_name> group by country order by sum(content_likes) desc;
Data from the view:
Result:
You seem to want something like:
select country, location_id, sum(content_likes)
from view_name vn
group by country, location_id
having location_id = (select vn2.location_id
from view_name vn2
where vn2.country = vn.country
group by vn2.location_id
order by sum(vn2.content_likes) desc
limit 1
);
You can try this (apologies for any syntax errors):
select cc.*
from
(select aa.location_id, aa.country /*[, add fields if you need them...]*/,
bb.sum_content_likes,
rank() over (partition by aa.country order by aa.content_likes desc) ranking_field
from <view_name> aa
join (select country, sum(content_likes) sum_content_likes
from <view_name>
group by country) bb
on aa.country=bb.country) cc
where ranking_field=1
order by cc.sum_content_likes desc
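This returns only the location with the most likes in each country (or several, if they tie, since rank() gives tied rows the same rank), ordered by each country's total likes. rank() requires MySQL 8.0 or later.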
EDIT:
With your new example, perhaps you can do simply this:
SELECT *
FROM <<view>> aa
JOIN(SELECT country, MAX(content_likes) max_likes
FROM <<view>>
GROUP BY country) bb
ON aa.country=bb.country
AND aa.content_likes=bb.max_likes
It is a two-pass query, but it gives your example result. It can return more than one row per country if several locations tie for the maximum number of likes in that country.
Hope this helps.
My issue was resolved with the SQL below:
select t.location_id,t.content_likes from
(select country,max(content_likes) as mcl,sum(content_likes) as scl from test_eresh
group by country) x,
test_eresh t
where t.country = x.country
and t.content_likes = x.mcl
order by x.scl desc
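The sketch below runs the same idea in Python's sqlite3, written with an explicit JOIN instead of the comma join. One derived table computes each country's max and total likes, and joining on the max picks the winning location(s). The rows are hypothetical. The UK rows deliberately tie, which shows the multiple-rows-per-country behaviour mentioned above. The extra `t.location_id` sort key is only there to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_eresh (country TEXT, location_id INTEGER, content_likes INTEGER)"
)
conn.executemany("INSERT INTO test_eresh VALUES (?, ?, ?)", [
    ("IN", 1, 10), ("IN", 2, 40), ("IN", 3, 25),  # total 75, max at location 2
    ("US", 4, 30), ("US", 5, 5),                  # total 35, max at location 4
    ("UK", 6, 20), ("UK", 7, 20),                 # total 40, tie at the max
])

# x holds one row per country with its max (mcl) and total (scl) likes;
# joining on content_likes = mcl keeps the location(s) holding the max.
query = """
SELECT t.location_id, t.content_likes
FROM (SELECT country, MAX(content_likes) AS mcl, SUM(content_likes) AS scl
      FROM test_eresh
      GROUP BY country) x
JOIN test_eresh t
  ON t.country = x.country AND t.content_likes = x.mcl
ORDER BY x.scl DESC, t.location_id
"""
result = conn.execute(query).fetchall()
print(result)
```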

How to sum top results?

I'm wondering how to sum the results of a query.
I want to know how many people live in total in the three biggest cities in Norway. I'm using MySQL with the world.sql sample database in MySQL Workbench.
This is the closest I've gotten
SELECT population
FROM city
WHERE CountryCode = 'NOR'
ORDER BY population DESC
LIMIT 3
There are a few problems here: this gives me three rows instead of one, and LIMIT limits how many rows are returned, not how many are used in a calculation.
Any ideas?
You would use a subquery:
SELECT SUM(population)
FROM (SELECT population
FROM city
WHERE CountryCode = 'NOR'
ORDER BY population DESC
LIMIT 3
) cp
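The order of evaluation is what makes the subquery necessary. In a single query, SUM() collapses all matching rows into one row before LIMIT is applied, so LIMIT 3 has nothing left to trim. The sketch below shows both forms in Python's sqlite3. The city names and populations are made up, not the world.sql values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (Name TEXT, CountryCode TEXT, Population INTEGER)")
# Made-up figures for illustration only.
conn.executemany("INSERT INTO city VALUES (?, ?, ?)", [
    ("A", "NOR", 500), ("B", "NOR", 300), ("C", "NOR", 200), ("D", "NOR", 100),
    ("E", "SWE", 900),
])

# Subquery: pick the three largest Norwegian cities first, then sum them.
total = conn.execute("""
    SELECT SUM(population)
    FROM (SELECT population FROM city
          WHERE CountryCode = 'NOR'
          ORDER BY population DESC
          LIMIT 3) cp
""").fetchone()[0]

# Single query: SUM() runs over all NOR rows; LIMIT only trims the one-row result.
naive = conn.execute(
    "SELECT SUM(population) FROM city WHERE CountryCode = 'NOR' LIMIT 3"
).fetchone()[0]

print(total, naive)
```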
Simply sum the result:
select sum(population) from (SELECT population
FROM city
WHERE CountryCode = 'NOR'
ORDER BY population DESC
LIMIT 3) t1
select sum(population) from (SELECT population FROM city WHERE
CountryCode = 'NOR' ORDER BY population DESC LIMIT 3) temp
Read up on subqueries.
Make your current query a subquery and take the sum from it.
SELECT SUM(population) FROM (
SELECT population
FROM city
WHERE CountryCode = 'NOR'
ORDER BY population DESC
LIMIT 3) p
Your query now acts as a virtual (derived) table, from which you can write a SELECT query to get the sum.

Count, excluding the max and min occurrences

For example, if we go to w3schools and run:
SELECT City, count(City) as Occurrences
FROM Customers
GROUP by City
ORDER BY count(City) DESC
But what I really want is to exclude the cities with the max and min occurrences. With the current values (6 and 1) hard-coded, that would look like:
SELECT City, count(City) as Occurrences
FROM Customers
GROUP by City
HAVING count(City) != 6 AND count(City) != 1
ORDER BY count(City) DESC
What would be the way to get desired output without hard-coding 6 and 1?
You can try this:
select c1.city, c1.cnt from (
select city, count(*) cnt from customers c
group by city
) c1 inner join
(select max(cnt) max_cnt, min(cnt) min_Cnt from (
select city, count(*) cnt from customers c
group by city
) t) c2
on c1.cnt!=c2.max_cnt and c1.cnt!=c2.min_cnt
;
I used this approach because MySQL (before version 8.0) has no window functions such as OVER (PARTITION BY ...), which might otherwise be useful here.
Another approach could use row numbers and ordering, but I prefer this one.
Try this:
SELECT City, count(City) as Occurences FROM Customers,
(SELECT MAX(Occur) AS Ma,MIN(Occur) AS Mi FROM (SELECT City, count(City) as Occur
FROM Customers GROUP BY City) AS c) AS T
GROUP BY City, T.Ma, T.Mi HAVING Occurences != T.Ma AND Occurences != T.Mi ORDER BY Occurences DESC
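Both answers follow the same steps: count per city once, take the max and min of those counts, and keep only the cities whose count equals neither. The sketch below uses the first answer's shape, including the derived-table alias that MySQL requires, and runs it in Python's sqlite3 on hypothetical Customers rows. Every city tied at the max or the min is dropped, not just one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT)")
# Hypothetical rows: London 3 (max), Paris 2, Madrid 2, Berlin 1 (min).
cities = ["London"] * 3 + ["Paris"] * 2 + ["Madrid"] * 2 + ["Berlin"]
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(f"cust{i}", c) for i, c in enumerate(cities)],
)

# c1: per-city counts; c2: the single row holding the max and min of those counts.
query = """
SELECT c1.City, c1.cnt
FROM (SELECT City, COUNT(*) AS cnt FROM Customers GROUP BY City) c1
CROSS JOIN (SELECT MAX(cnt) AS max_cnt, MIN(cnt) AS min_cnt
            FROM (SELECT City, COUNT(*) AS cnt FROM Customers GROUP BY City) t) c2
WHERE c1.cnt <> c2.max_cnt AND c1.cnt <> c2.min_cnt
ORDER BY c1.cnt DESC, c1.City
"""
result = conn.execute(query).fetchall()
print(result)
```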