SUM subquery with GROUP BY inside of it - mysql

I have a kinda peculiar query to run. I need to SUM the population value of different prefectures under each region as a column, and return it to the main query. For example this query:
SELECT region_en
, population AS temppop
FROM prefectures
WHERE region_id = 12
GROUP
BY region_en
returns this table:
Karditsa 129541
Larissa 279305
Magnesia 206995
Sporades 13798
Trikala 138047
All of the above belong to the same region id (12), and I need to get the SUM of all those populations in the same query. I tried the query below, but it is not working: instead of the expected sum, 767686, I get 95960750:
SELECT SUM(b.cases) as cases
, COALESCE(SUM(o.poptemp), 0) as pop
FROM prefectures AS b
LEFT
JOIN
( SELECT region_en
, population AS poptemp
FROM prefectures
WHERE region_id = 12
GROUP
BY region_en
) AS o
ON o.region_en = b.region_en
WHERE b.region_id = '12'
Basically I need the total Cases per region, as well as the sum of all people living under it.

You seem to be looking for a window sum. Your original query is not a valid aggregation query, which makes things a little unclear.
If there is just one row per region_en, then no need to aggregate:
SELECT region_en, population, SUM(population) OVER() as region_population
FROM prefectures
WHERE region_id = 12
You can get the same result for all regions at once like so:
SELECT region_en, population,
SUM(population) OVER(PARTITION BY region_id) AS region_population
FROM prefectures
If there really are several rows per region_en:
SELECT region_en, SUM(population) AS population,
SUM(SUM(population)) OVER(PARTITION BY region_id) as region_population
FROM prefectures
GROUP BY region_id, region_en
Note that window functions are available in MySQL 8.0 and later only. In earlier versions, you would phrase the query as:
SELECT region_en, population,
(SELECT SUM(p1.population) FROM prefectures p1 WHERE p1.region_id = p.region_id) AS region_population
FROM prefectures p
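To check that the window sum and the correlated subquery agree, here is a minimal sketch. It assumes Python's bundled sqlite3 module as a stand-in for MySQL (SQLite needs version 3.25 or later for window functions), uses the five prefectures from the question, and adds one hypothetical row (region 99, "Elsewhere") that is not in the original data.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prefectures (region_id INTEGER, region_en TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO prefectures VALUES (?, ?, ?)",
    [
        (12, "Karditsa", 129541),
        (12, "Larissa", 279305),
        (12, "Magnesia", 206995),
        (12, "Sporades", 13798),
        (12, "Trikala", 138047),
        (99, "Elsewhere", 1000),  # hypothetical row from another region
    ],
)

# Window-function version (MySQL 8.0+ syntax, also accepted by SQLite 3.25+).
window_rows = conn.execute(
    """
    SELECT region_en, population,
           SUM(population) OVER (PARTITION BY region_id) AS region_population
    FROM prefectures
    ORDER BY region_id, region_en
    """
).fetchall()

# Correlated-subquery version for servers without window functions.
subquery_rows = conn.execute(
    """
    SELECT region_en, population,
           (SELECT SUM(p1.population)
            FROM prefectures p1
            WHERE p1.region_id = p.region_id) AS region_population
    FROM prefectures p
    ORDER BY region_id, region_en
    """
).fetchall()

for row in window_rows:
    print(row)
```

Both result sets carry the regional total on every row, so region 12 shows 767686 next to each of its prefectures.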

Actually, I used a different approach.
Select SUM(o.cas) as cases, SUM(o.pop) as population FROM
(SELECT a.population as pop, SUM(a.cases) as cas FROM prefectures a WHERE a.region_id = 12 GROUP BY a.region_en) AS o
This brings back the correct values and it is superfast as well.
In the end you get 2 columns:
cases population
2515 767686
which are the correct values.

Related

Retrieving top company for each quarter and corresponding revenue

Company_name Quarter Year Revenue
TCS Q1 2001 50
CTS Q2 2010 60
ZOHO Q2 2007 70
CTS Q4 2015 90
This is my sample table where I store the names of the companies, quarters of the years, years and revenue for each year per a certain quarter.
I want to find the company with top revenue for each quarter, regardless of the year, and display its revenue too.
In the above case the resultant output should be something like this:
QUARTER COMPANY_NAME REVENUE
Q1 TCS 50
Q2 ZOHO 70
Q4 CTS 90
Here's what I've tried:
SELECT DISTINCT(C1.QUARTER),
C1.REVENUE
FROM COMPANY_REVENUE C1,
COMPANY_REVENUE C2
WHERE C1.REVENUE = GREATEST(C1.REVENUE, C2.REVENUE);
There are a couple of problems in your query, among them:
DISTINCT applies to whole rows, not to a single field, so DISTINCT(C1.QUARTER) does not deduplicate on the quarter alone,
the self join should be explicit and, more importantly, it needs a matching condition in an ON clause (e.g. SELECT ... FROM tab1 JOIN tab2 ON tab1.field = tab2.field WHERE ...)
Though probably you could solve your problem in another way.
Approach for MySQL 8.0
One way of computing values over partitions (here you want to partition by quarter only) is to use window functions. In this case you can use ROW_NUMBER, which numbers the rows of each partition in descending order of revenue. Since you want the highest revenue for each quarter, select the rows whose row number equals 1.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER(
PARTITION BY Quarter
ORDER BY Revenue DESC
) AS rn
FROM tab
)
SELECT Quarter,
Company_name,
Revenue
FROM cte
WHERE rn = 1
Check the demo here.
Approach for MySQL 5.7
In this case you can use an aggregate function. Since you want the maximum "Revenue" for each "Quarter", first select the maximum value per "Quarter" in a subquery, then join back to the original table on two conditions:
table's quarter matches subquery quarter,
table's revenue matches subquery max revenue
SELECT tab.Quarter,
tab.Company_name,
tab.Revenue
FROM tab
INNER JOIN (SELECT Quarter,
MAX(Revenue) AS Revenue
FROM tab
GROUP BY Quarter ) max_revenues
ON tab.Quarter = max_revenues.Quarter
AND tab.Revenue = max_revenues.Revenue
Check the demo here.
Note: the second solution finds, for each quarter, all companies that reach the maximum revenue for that quarter, so if two or more companies tie for the maximum, all of them are returned. This won't happen with the first solution, because ROW_NUMBER assigns the value 1 to exactly one row per quarter (which of the tied rows gets it is arbitrary unless the ORDER BY breaks the tie).
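The difference on ties can be seen in a small sketch. It assumes Python's bundled sqlite3 module as a stand-in for MySQL (SQLite 3.25+ for ROW_NUMBER), reuses the sample rows from the question, and adds one hypothetical row (ACME, Q4, 90) to create a tie that the original data does not contain.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (Company_name TEXT, Quarter TEXT, Year INTEGER, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("TCS", "Q1", 2001, 50),
        ("CTS", "Q2", 2010, 60),
        ("ZOHO", "Q2", 2007, 70),
        ("CTS", "Q4", 2015, 90),
        ("ACME", "Q4", 2012, 90),  # hypothetical row that ties with CTS in Q4
    ],
)

# First solution: ROW_NUMBER keeps exactly one row per quarter.
top_one = conn.execute(
    """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Quarter ORDER BY Revenue DESC) AS rn
        FROM tab
    )
    SELECT Quarter, Company_name, Revenue
    FROM cte
    WHERE rn = 1
    ORDER BY Quarter
    """
).fetchall()

# Second solution: joining back on MAX(Revenue) keeps every tied row.
top_all = conn.execute(
    """
    SELECT tab.Quarter, tab.Company_name, tab.Revenue
    FROM tab
    INNER JOIN (SELECT Quarter, MAX(Revenue) AS Revenue
                FROM tab
                GROUP BY Quarter) max_revenues
            ON tab.Quarter = max_revenues.Quarter
           AND tab.Revenue = max_revenues.Revenue
    ORDER BY tab.Quarter, tab.Company_name
    """
).fetchall()

print(len(top_one), "rows with ROW_NUMBER")
print(len(top_all), "rows with the join on MAX(Revenue)")
```

With the tied row present, the ROW_NUMBER query returns one row for Q4 while the join returns both ACME and CTS.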
You can just use a cte:
with x as (
select Quarter, max(Revenue) as Revenue
from test
group by Quarter
)
select t.Company_name, x.Quarter, x.Revenue
from x
join test t
on x.Revenue = t.Revenue
and t.Quarter = x.Quarter;
see db<>fiddle.
First I select the max Revenue grouped by Quarter, then I join back to the table on the returned max(Revenue). As @lemon pointed out in the comments, that is not enough: when the same revenue value also occurs in a different quarter, the join returns extra rows, as shown in this db<>fiddle.
So that's why I also join on the quarter, so that each row is matched only against the maximum of its own quarter.
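The effect of leaving out the quarter condition can be reproduced with a short sketch. It assumes Python's bundled sqlite3 module as a stand-in for MySQL and a table named test; the row (ACME, Q4, 70) is hypothetical and was added so that Q2's maximum revenue also appears in another quarter.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (Company_name TEXT, Quarter TEXT, Year INTEGER, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        ("TCS", "Q1", 2001, 50),
        ("CTS", "Q2", 2010, 60),
        ("ZOHO", "Q2", 2007, 70),
        ("CTS", "Q4", 2015, 90),
        ("ACME", "Q4", 2012, 70),  # hypothetical: same value as Q2's maximum
    ],
)

MAX_PER_QUARTER = """
    WITH x AS (
        SELECT Quarter, MAX(Revenue) AS Revenue
        FROM test
        GROUP BY Quarter
    )
    SELECT t.Company_name, x.Quarter, x.Revenue
    FROM x
    JOIN test t ON {condition}
    ORDER BY x.Quarter, t.Company_name
"""

# Join on revenue only: ACME's Q4 row matches Q2's maximum by accident.
revenue_only = conn.execute(
    MAX_PER_QUARTER.format(condition="x.Revenue = t.Revenue")
).fetchall()

# Join on revenue and quarter: each row is compared with its own quarter's maximum.
revenue_and_quarter = conn.execute(
    MAX_PER_QUARTER.format(
        condition="x.Revenue = t.Revenue AND t.Quarter = x.Quarter"
    )
).fetchall()

print(revenue_only)
print(revenue_and_quarter)
```

In the revenue-only result, ACME is reported under Q2 even though its row belongs to Q4, which is the extra row the second join condition removes.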
But if you're using a version of MySQL that doesn't support CTEs, you can use a subquery instead:
select t.Company_name, x.Quarter, x.Revenue
from
(
select Quarter, max(Revenue) as Revenue
from test
group by Quarter
) x
join test t
on x.Quarter = t.Quarter
and x.Revenue = t.Revenue;
Try this,
SELECT quarter, company_name,max(revenue) FROM table_name GROUP BY quarter

How to query GHTorrent's (SQL-like language) for most common languages per country

Based on this question, How to query GHTorrent's (SQL-like language) for country/city/users number/repositories number?, and the first query here https://ghtorrent.org/gcloud.html, I am trying to write an SQL query that gets the most common coding language per country, and ideally per month/year, from the GHTorrent BigQuery database. I have tried to edit this answer's code https://stackoverflow.com/a/65460166/10624798/, but I fail to get the join right. My ideal outcome would look something like this:
country  Year  Month  Language  Number of commits  total_bytes
US  2016  Jan  Python  10000  46789390
CH  2016  Jan  Java  20000  5679304
Basically, I am not very good at creating SQL queries.
I checked the two example queries that you linked and found that their common column is project_id, so I modified the second example to also return the project_id and the created_at date of the commits. As you suggested, I then formatted created_at to extract the year and the month, and grouped by them.
Then I joined the two examples in a CTE and selected only the columns that are needed.
Finally I used ROW_NUMBER to keep, for every country/year/month, only the language with the largest total bytes.
WITH ltb as(
select pl3.lang, sum(pl3.size) as total_bytes, pl3.project_id
from (
select pl2.bytes as size, pl2.language as lang, pl2.project_id
from (
select pl.language as lang, max(pl.created_at) as latest, pl.project_id as project_id
from `ghtorrent-bq.ght.project_languages` pl
join `ghtorrent-bq.ght.projects` p on p.id = pl.project_id
where p.deleted is false
and p.forked_from is null
group by lang, project_id
) pl1 join `ghtorrent-bq.ght.project_languages` pl2 on pl1.project_id = pl2.project_id
and pl1.latest = pl2.created_at
and pl1.lang = pl2.language
) pl3
group by pl3.lang, pl3.project_id
order by total_bytes desc
), fprt as(
SELECT country_code, count(*) AS NoOfCommits, c.project_id,
FORMAT_TIMESTAMP("%m", c.created_at)
AS formattedmonth,FORMAT_TIMESTAMP("%b", c.created_at)
AS formattedmonthname, FORMAT_TIMESTAMP("%Y", c.created_at)
AS formattedyear,
FROM `ghtorrent-bq.ght.commits` AS c
JOIN `ghtorrent-bq.ght.users` AS u
ON c.Committer_Id = u.id
WHERE NOT u.fake and country_code is not null
GROUP BY country_code, c.project_id, formattedmonth, formattedyear, formattedmonthname
ORDER BY NoOfCommits DESC
), almst as(
SELECT country_code,formattedmonth, formattedmonthname, formattedyear, lang, NoOfCommits, total_bytes FROM fprt JOIN ltb
on ltb.project_id=fprt.project_id
where country_code is not null
)
SELECT country_code, formattedyear as year, formattedmonthname as month, lang, NoOfCommits, total_bytes
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY country_code, formattedyear, formattedmonth ORDER BY total_bytes DESC) rn
FROM almst
) t
WHERE rn = 1
ORDER BY formattedyear asc, formattedmonth asc

Finding Max of Max mysql

I am using a table called covid_vaccinations.
Briefly, the table tracks every country's vaccination progress day by day, from Feb xx, 2020 to Jan XX, 2022.
The country names are stored in the 'location' column.
The countries (location) are also categorized by the 'continent' column.
To find the people who are fully vaccinated in Asia, I used the query below:
SELECT continent,location, MAX(people_fully_vaccinated)
FROM covid_vaccinations
WHERE continent LIKE '%ASIA%'
GROUP BY continent, location
ORDER BY 3 DESC;
I used MAX() since the <people_fully_vaccinated> column holds cumulative totals.
The query above gave me the result I wanted, see <image 1>
HERE IS MY QUESTION:
If I just want to get the GREATEST result of people_fully_vaccinated, how should I write the query?
I tried below, and it gave me the same result as <image 1>
SELECT location, MAX(peep_f_vacc_asia)
FROM (
SELECT location, MAX(people_fully_vaccinated) as peep_f_vacc_asia
FROM covid_vaccinations
WHERE continent LIKE '%ASIA%'
GROUP BY continent,location
) A
GROUP BY location
ORDER BY 2 DESC;
The desired result I want to see would be only a single row, China (which has the greatest number of people_fully_vaccinated)
Thank you so much guys...
You might be able to get away with just using a LIMIT query. A slight modification of your first query:
SELECT continent, location, MAX(people_fully_vaccinated)
FROM covid_vaccinations
WHERE continent LIKE '%ASIA%'
GROUP BY continent, location
ORDER BY 3 DESC
LIMIT 1;
But this only works if no two locations tie for the maximum number of fully vaccinated people. If you do have to worry about ties, and you are using MySQL 8+, then you can use RANK as follows:
WITH cte AS (
SELECT continent, location, MAX(people_fully_vaccinated) AS max_fv,
RANK() OVER (ORDER BY MAX(people_fully_vaccinated) DESC) rnk
FROM covid_vaccinations
WHERE continent LIKE '%ASIA%'
GROUP BY continent, location
)
SELECT continent, location, max_fv
FROM cte
WHERE rnk = 1;
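A minimal sketch of the difference between the two queries when there is a tie. It assumes Python's bundled sqlite3 module as a stand-in for MySQL (SQLite 3.25+ for RANK); the column name day and every row below are hypothetical and do not come from the real covid_vaccinations data.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid_vaccinations ("
    "continent TEXT, location TEXT, day TEXT, people_fully_vaccinated INTEGER)"
)
conn.executemany(
    "INSERT INTO covid_vaccinations VALUES (?, ?, ?, ?)",
    [
        # All rows are hypothetical; values are cumulative per location.
        ("Asia", "Aland", "2021-01-01", 100),
        ("Asia", "Aland", "2021-01-02", 300),
        ("Asia", "Borduria", "2021-01-01", 200),
        ("Asia", "Borduria", "2021-01-02", 300),  # ties with Aland
        ("Asia", "Cascara", "2021-01-02", 150),
        ("Europe", "Dinotopia", "2021-01-02", 900),
    ],
)

# LIMIT 1 returns a single row, even though two locations share the maximum.
limit_rows = conn.execute(
    """
    SELECT continent, location, MAX(people_fully_vaccinated)
    FROM covid_vaccinations
    WHERE continent LIKE '%ASIA%'
    GROUP BY continent, location
    ORDER BY 3 DESC
    LIMIT 1
    """
).fetchall()

# RANK gives every tied location rank 1, so both are returned.
rank_rows = conn.execute(
    """
    WITH cte AS (
        SELECT continent, location, MAX(people_fully_vaccinated) AS max_fv,
               RANK() OVER (ORDER BY MAX(people_fully_vaccinated) DESC) AS rnk
        FROM covid_vaccinations
        WHERE continent LIKE '%ASIA%'
        GROUP BY continent, location
    )
    SELECT continent, location, max_fv
    FROM cte
    WHERE rnk = 1
    ORDER BY location
    """
).fetchall()

print(limit_rows)
print(rank_rows)
```

Which of the tied locations LIMIT 1 returns is not guaranteed, so only the RANK version is safe when ties matter.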

MySQL GROUP BY - get SUM of few grouped values

I have a simple db where I have users and every user have 'country', for ex:
Dmitry - US
Ann - US
John - UK
Roman - Japan
Mila - China
Jane - Australia
I want to get the user count of every country, BUT I need the counts for only the TOP 3 countries (US, UK, Japan for example), and the user counts of all other countries should be summed together as "Rest". How to do this?
So in my example this should give me this result from SQL:
US = 2
UK = 1
Japan = 1
Rest = 2
If I run a regular query:
SELECT count(userid) FROM users GROUP BY country
I will get results for every country, but I need only TOP 3 and all others count as "Rest" in results. Thanks!
P.S.: I tried to create SQLFiddle for this, but their website is down right now and I can't use it.
You can group by country and use ROW_NUMBER() window function to rank the countries based on the number of times they appear.
Then add another level of aggregation based on the ranking position of each country:
SELECT CASE WHEN rn <= 3 THEN country ELSE 'Rest' END country,
SUM(counter) counter
FROM (
SELECT country, COUNT(*) counter,
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) rn
FROM users
GROUP BY country
) t
GROUP BY 1;
Note that in case of ties the countries returned as the top 3 may be arbitrarily chosen, so you could add another condition in the ORDER BY clause of ROW_NUMBER(), like:
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, country)
which would return different but consistent results.
See the demo.
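For reference, a runnable sketch of the same query on the six users from the question. It assumes Python's bundled sqlite3 module as a stand-in for MySQL (SQLite 3.25+ for ROW_NUMBER); the userid values and the aliases bucket and total are made up for this illustration. It uses the tie-breaking ORDER BY, so the four countries with one user each are ranked alphabetically.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Dmitry", "US"),
        (2, "Ann", "US"),
        (3, "John", "UK"),
        (4, "Roman", "Japan"),
        (5, "Mila", "China"),
        (6, "Jane", "Australia"),
    ],
)

rows = conn.execute(
    """
    SELECT CASE WHEN rn <= 3 THEN country ELSE 'Rest' END AS bucket,
           SUM(counter) AS total
    FROM (
        SELECT country, COUNT(*) AS counter,
               -- ties on the count are broken alphabetically by country
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, country) AS rn
        FROM users
        GROUP BY country
    ) t
    GROUP BY bucket
    """
).fetchall()

totals = dict(rows)
print(totals)
```

Because of the alphabetical tie-break, the top 3 here are US, Australia and China rather than the US, UK and Japan of the question's example; the "Rest" total is 2 either way.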

MYSQL Sum results of a calculation

I am building a query in mysql 5.0 to calculate a student semester grade. The initial table (studentItemGrades) contains the list of assignments etc which will be used to calculate the final grade. Each assignment has a PossibleScore, Grade and Weight. The calculation should group all similarly weighted items, and provide the SUM(GRADE)/SUM(POSSIBLESCORE) based on a date range of when the assignment was due. The problem I am encountering is the final summation of all the individual weighted grades. For example, the results currently produce the following:
CourseScheduleID sDBID AssignedDate DueDate Weight WeightedGrade
1 519 2010-08-26 2010-08-30 10 0.0783333333333333
1 519 2010-09-01 2010-09-03 20 0.176
1 519 2010-09-01 2010-09-10 70 0.574
from the query:
SELECT CourseScheduleID, sDBID, AssignedDate, DueDate, Weight,
((SUM(Grade)/SUM(PossibleScore))*(Weight/100)) AS WeightedGrade
FROM studentItemGrades
WHERE DueDate>='2010-08-23'
AND DueDate<='2010-09-10'
AND CourseScheduleID=1
AND sDBID=519
AND Status>0
GROUP BY Weight
The question: How do I now SUM the three results in the WeightedGrade output? And by the way, this is part of a much larger query for calculating all grades for all courses on a particular campus.
Thanks in advance for your help.
You can use a subquery, like so:
SELECT SUM(WeightedGrade) FROM
(
SELECT CourseScheduleID, sDBID, AssignedDate, DueDate, Weight,
((SUM(Grade)/SUM(PossibleScore))*(Weight/100)) AS WeightedGrade
FROM studentItemGrades
WHERE DueDate>='2010-08-23'
AND DueDate<='2010-09-10'
AND CourseScheduleID=1
AND sDBID=519
AND Status>0
GROUP BY Weight
) t1
In order to sum the three results, you need to query the results of this select with another select that has a GROUP BY. This can be done in a single SQL statement by using a subquery.
SELECT sq.CourseScheduleID, sq.sDBID, SUM(sq.WeightedGrade) as FinalGrade
FROM
(
SELECT CourseScheduleID, sDBID, AssignedDate, DueDate, Weight,
((SUM(Grade)/SUM(PossibleScore))*(Weight/100)) AS WeightedGrade
FROM studentItemGrades WHERE DueDate>='2010-08-23' AND DueDate<='2010-09-10'
AND CourseScheduleID=1 AND sDBID=519 AND Status>0 GROUP BY Weight
) AS sq
GROUP BY sq.CourseScheduleID, sq.sDBID
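A small sketch of the subquery approach, assuming Python's bundled sqlite3 module as a stand-in for MySQL. The individual assignment rows are hypothetical: they were chosen so that the three per-weight results match the WeightedGrade values shown in the question (47/60, 88/100 and 82/100). The numeric columns are declared REAL because SQLite, unlike MySQL, does integer division on integers.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE studentItemGrades ("
    "CourseScheduleID INTEGER, sDBID INTEGER, DueDate TEXT, "
    "Weight REAL, Grade REAL, PossibleScore REAL, Status INTEGER)"
)
conn.executemany(
    "INSERT INTO studentItemGrades VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Hypothetical assignments, grouped by weight.
        (1, 519, "2010-08-30", 10, 47, 60, 1),
        (1, 519, "2010-09-03", 20, 40, 50, 1),
        (1, 519, "2010-09-03", 20, 48, 50, 1),
        (1, 519, "2010-09-10", 70, 82, 100, 1),
        (1, 519, "2010-09-20", 70, 0, 100, 1),  # outside the date range
    ],
)

# Inner query: one weighted grade per Weight. Outer query: sum them up.
final_grade = conn.execute(
    """
    SELECT SUM(WeightedGrade)
    FROM (
        SELECT Weight,
               (SUM(Grade) / SUM(PossibleScore)) * (Weight / 100) AS WeightedGrade
        FROM studentItemGrades
        WHERE DueDate >= '2010-08-23'
          AND DueDate <= '2010-09-10'
          AND CourseScheduleID = 1
          AND sDBID = 519
          AND Status > 0
        GROUP BY Weight
    ) t1
    """
).fetchone()[0]

print(final_grade)
```

The printed value is the sum of the three weighted grades from the question, roughly 0.0783 + 0.176 + 0.574.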