SUM For Distinct Rows - mysql

Given the following table structures:
countries: id, name
regions: id, country_id, name, population
cities: id, region_id, name
...and this query...
SELECT c.name AS country, COUNT(DISTINCT r.id) AS regions, COUNT(s.id) AS cities
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN cities AS s ON s.region_id = r.id
GROUP BY c.id
How would I add a SUM of the regions.population value to calculate the country's population? I need to only use the value of each region once when summing, but the un-grouped result has multiple rows for each region (the number of cities in that region).
Example data:
mysql> SELECT * FROM countries;
+----+-----------+
| id | name |
+----+-----------+
| 1 | country 1 |
| 2 | country 2 |
+----+-----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM regions;
+----+------------+-----------------------+------------+
| id | country_id | name | population |
+----+------------+-----------------------+------------+
| 11 | 1 | region 1 in country 1 | 10 |
| 12 | 1 | region 2 in country 1 | 15 |
| 21 | 2 | region 1 in country 2 | 25 |
+----+------------+-----------------------+------------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM cities;
+-----+-----------+---------------------------------+
| id | region_id | name |
+-----+-----------+---------------------------------+
| 111 | 11 | City 1 in region 1 in country 1 |
| 112 | 11 | City 2 in region 1 in country 1 |
| 121 | 12 | City 1 in region 2 in country 1 |
| 211 | 21 | City 1 in region 1 in country 2 |
+-----+-----------+---------------------------------+
4 rows in set (0.00 sec)
Desired output with example data:
+-----------+---------+--------+------------+
| country | regions | cities | population |
+-----------+---------+--------+------------+
| country 1 | 2 | 3 | 25 |
| country 2 | 1 | 1 | 25 |
+-----------+---------+--------+------------+
I prefer a solution that doesn't require changing the JOIN logic.
The accepted solution for this post seems to be in the neighborhood of what I'm looking for, but I haven't been able to figure out how to apply it to my issue.
MY SOLUTION
SELECT c.id AS country_id,
c.name AS country,
COUNT(x.region_id) AS regions,
SUM(x.population) AS population,
SUM(x.cities) AS cities
FROM countries AS c
LEFT JOIN (
SELECT r.country_id,
r.id AS region_id,
r.population AS population,
COUNT(s.id) AS cities
FROM regions AS r
LEFT JOIN cities AS s ON s.region_id = r.id
GROUP BY r.country_id, r.id, r.population
) AS x ON x.country_id = c.id
GROUP BY c.id, c.name
Note: My actual query is much more complex and has nothing to do with countries, regions, or cities. This is a minimal example to illustrate my issue.

First of all, the other post you reference is not the same situation. In that case, the joins are like [A -> B and A -> C], so the weighted average (which is what that calculation does) is correct. In your case the joins are like [A -> B -> C], so you need a different approach.
The simplest solution that comes to mind right away does involve a subquery, but not a complex one:
SELECT
c.name AS country,
COUNT(r.id) AS regions,
SUM(s.city_count) AS cities,
SUM(r.population) as population
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN
(select region_id, count(*) as city_count
from cities
group by region_id) AS s
ON s.region_id = r.id
GROUP BY c.id
The reason this works is that it resolves the cities to one row per region before joining to the region, thus eliminating the cross join situation.
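To see the effect concretely, here is a minimal sketch that loads the example data from the question and compares the naive SUM with the pre-aggregated version. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the queries use only standard SQL, but that equivalence is an assumption, not something tested on MySQL here.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data is the question's example data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT, population INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);
INSERT INTO countries VALUES (1, 'country 1'), (2, 'country 2');
INSERT INTO regions VALUES (11, 1, 'region 1 in country 1', 10),
                           (12, 1, 'region 2 in country 1', 15),
                           (21, 2, 'region 1 in country 2', 25);
INSERT INTO cities VALUES (111, 11, 'City 1 in region 1 in country 1'),
                          (112, 11, 'City 2 in region 1 in country 1'),
                          (121, 12, 'City 1 in region 2 in country 1'),
                          (211, 21, 'City 1 in region 1 in country 2');
""")

# Naive: region 11 appears once per city, so its population is added twice.
naive = conn.execute("""
    SELECT c.name, SUM(r.population)
    FROM countries AS c
    JOIN regions AS r ON r.country_id = c.id
    JOIN cities AS s ON s.region_id = r.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Pre-aggregated: cities are reduced to one row per region before the join.
fixed = conn.execute("""
    SELECT c.name, COUNT(r.id), SUM(s.city_count), SUM(r.population)
    FROM countries AS c
    JOIN regions AS r ON r.country_id = c.id
    JOIN (SELECT region_id, COUNT(*) AS city_count
          FROM cities
          GROUP BY region_id) AS s ON s.region_id = r.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(naive)  # country 1 is inflated to 35
print(fixed)  # matches the desired output in the question
```

Note that the inner JOIN to the subquery still drops regions that have no cities at all; a LEFT JOIN would be needed to keep them.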

How about leaving the rest as it is and just adding one more join for the population?
SELECT c.name AS country,
COUNT(distinct r.id) AS regions,
COUNT(s.id) AS cities,
pop_regs.sum as total_population
FROM countries AS c
LEFT JOIN regions AS r ON r.country_id = c.id
LEFT JOIN cities AS s ON s.region_id = r.id
left join
(
select country_id, sum(population) as sum
from regions
group by country_id
) pop_regs on pop_regs.country_id = c.id
GROUP BY c.id, c.name

To start, you should know that the question you linked to, and its solution, are a little different from your own question and its solution. That's why you cannot use only JOINs without subqueries.
Tables :
Countries :
===========================
| id | name |
===========================
| 1 | country 1 |
---------------------------
| 2 | country 2 |
---------------------------
| 3 | country 3 |
---------------------------
| 4 | country 4 |
---------------------------
Regions :
=============================================
| id |country_id| name |population|
=============================================
| 1 | 1 | c1 - r1 | 10 |
---------------------------------------------
| 2 | 1 | c1 - r2 | 15 |
---------------------------------------------
| 3 | 1 | c1 - r3 | 15 |
---------------------------------------------
| 4 | 2 | c2 - r1 | 25 |
---------------------------------------------
| 5 | 3 | c3 - r1 | 13 |
---------------------------------------------
Cities :
========================================
| id | region_id | name |
========================================
| 1 | 1 | city 1 |
----------------------------------------
| 2 | 1 | city 2 |
----------------------------------------
| 3 | 2 | city 3 |
----------------------------------------
| 4 | 2 | city 4 |
----------------------------------------
| 5 | 2 | city 5 |
----------------------------------------
| 6 | 3 | city 6 |
----------------------------------------
| 7 | 3 | city 7 |
----------------------------------------
| 8 | 4 | city 8 |
----------------------------------------
| 9 | 4 | city 9 |
----------------------------------------
| 10 | 4 | city 10 |
----------------------------------------
As a simple method, you can join the countries table with a subquery that joins the regions and cities tables, so that you effectively work with two tables: countries, and regions with a cities count column:
SQL :
SELECT
r.id AS id,
r.country_id AS country_id,
r.name AS name,
r.population AS population,
COUNT(s.region_id) AS cities
FROM regions r
/* we use LEFT JOIN rather than a plain JOIN so that regions without cities are also returned */
LEFT JOIN cities s
ON r.id = s.region_id
GROUP BY r.id
Data :
==================================================================
| id | country_id | name | population | cities |
==================================================================
| 1 | 1 | c1 - r1 | 10 | 2 |
------------------------------------------------------------------
| 2 | 1 | c1 - r2 | 15 | 3 |
------------------------------------------------------------------
| 3 | 1 | c1 - r3 | 15 | 2 |
------------------------------------------------------------------
| 4 | 2 | c2 - r1 | 25 | 3 |
------------------------------------------------------------------
| 5 | 3 | c3 - r1 | 13 | 0 |
------------------------------------------------------------------
Then you run your normal query on top of it, which gives you this code :
SQL :
SELECT
c.name AS country,
COUNT(r.country_id) AS regions,
/* ifnull is used here to show 0 instead of null */
SUM(IFNULL(r.cities, 0)) AS cities,
SUM(IFNULL(r.population, 0)) AS population
FROM countries c
/* we use LEFT JOIN rather than a plain JOIN so that countries without regions are also returned */
LEFT JOIN (
SELECT
/* we don't need regions.id and regions.name */
r.country_id AS country_id,
r.population AS population,
COUNT(s.region_id) AS cities
FROM regions r
LEFT JOIN cities s
ON r.id = s.region_id
GROUP BY r.id
) r
ON c.id = r.country_id
GROUP BY c.id
And this result :
=====================================================
| country | regions | cities | population |
=====================================================
| country 1 | 3 | 7 | 40 |
-----------------------------------------------------
| country 2 | 1 | 3 | 25 |
-----------------------------------------------------
| country 3 | 1 | 0 | 13 |
-----------------------------------------------------
| country 4 | 0 | 0 | 0 |
-----------------------------------------------------
For comparison, using only JOIN drops countries without regions as well as countries whose regions have no cities :
=====================================================
| country | regions | cities | population |
=====================================================
| country 1 | 3 | 7 | 40 |
-----------------------------------------------------
| country 2 | 1 | 3 | 25 |
-----------------------------------------------------
For your exact example (with data mentioned in your question), you will get :
=====================================================
| country | regions | cities | population |
=====================================================
| country 1 | 2 | 3 | 25 |
-----------------------------------------------------
| country 2 | 1 | 1 | 25 |
-----------------------------------------------------
I hope all that can help you to get what you want.
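As a quick check of the two result tables above, here is a small sketch that runs the same query once with LEFT JOINs and once with plain JOINs on the data from this answer. It assumes SQLite (Python's sqlite3 module) in place of MySQL, drops the name columns of regions and cities for brevity, and lists every selected column in GROUP BY; the QUERY template and variable names are made up for the illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data is this answer's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER, population INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, region_id INTEGER);
INSERT INTO countries VALUES (1, 'country 1'), (2, 'country 2'),
                             (3, 'country 3'), (4, 'country 4');
INSERT INTO regions VALUES (1, 1, 10), (2, 1, 15), (3, 1, 15), (4, 2, 25), (5, 3, 13);
""")
# Cities 1..10 with the region_id values from the Cities table above.
region_of_city = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
conn.executemany("INSERT INTO cities VALUES (?, ?)",
                 list(enumerate(region_of_city, start=1)))

# Hypothetical template: {outer} and {inner} are swapped between LEFT and INNER.
QUERY = """
    SELECT c.name,
           COUNT(r.country_id),
           SUM(IFNULL(r.cities, 0)),
           SUM(IFNULL(r.population, 0))
    FROM countries c
    {outer} JOIN (
        SELECT r.country_id, r.population, COUNT(s.region_id) AS cities
        FROM regions r
        {inner} JOIN cities s ON r.id = s.region_id
        GROUP BY r.id, r.country_id, r.population
    ) r ON c.id = r.country_id
    GROUP BY c.id, c.name
    ORDER BY c.id
"""

left_rows = conn.execute(QUERY.format(outer="LEFT", inner="LEFT")).fetchall()
inner_rows = conn.execute(QUERY.format(outer="INNER", inner="INNER")).fetchall()

print(left_rows)   # all four countries, with zeros where nothing matched
print(inner_rows)  # only the countries that have regions with cities
```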

I have tested this in SQL with the following query, against the same tables you provided:
select regioncount.name as country,regioncount.regions, citycount.cities,regioncount.population from
(SELECT c.name,c.id,COUNT(r.id) AS regions ,SUM(r.population) as population
FROM countries AS c
JOIN regions AS r on c.id = r.country_id GROUP BY c.id,c.name) as regioncount
join
(SELECT
r.country_id,
COUNT(s.id) AS cities
FROM regions AS r
JOIN cities AS s on r.id =s.region_id GROUP BY r.country_id) as citycount on citycount.country_id = regioncount.id
and I got the result you want:
+-----------+---------+--------+------------+
| country | regions | cities | population |
+-----------+---------+--------+------------+
| country 1 | 2 | 3 | 25 |
| country 2 | 1 | 1 | 25 |
+-----------+---------+--------+------------+

Use LEFT OUTER JOIN instead of INNER JOIN (or plain JOIN): with INNER JOIN, a country that has no regions does not appear in the result, and in the same way a region that has no cities is not counted.
Try this:
SELECT c.name AS country, r.regions, r.population, r.cities
FROM countries AS c
LEFT OUTER JOIN (SELECT r.country_id,
COUNT(r.id) AS regions,
SUM(r.population) AS population,
SUM(ci.cities) AS cities
FROM regions AS r
LEFT OUTER JOIN (SELECT s.region_id, COUNT(s.id) AS cities
FROM cities AS s
GROUP BY s.region_id
) AS ci ON r.id = ci.region_id
GROUP BY r.country_id
) AS r ON c.id = r.country_id;
Check the SQL FIDDLE DEMO
OUTPUT
| COUNTRY | REGIONS | POPULATION | CITIES |
|---------|---------|------------|--------|
| usa | 3 | 16 | 4 |
| germany | 2 | 5 | 1 |

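Here's another way of doing it, if you don't want to introduce or change a JOIN or add a subquery. Be aware that the expression inside SUM(DISTINCT ...) reduces algebraically to r.population, so regions of the same country that have equal populations are counted only once: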
SELECT
c.name AS country,
COUNT(distinct r.id) AS regions,
COUNT(s.id) AS cities,
SUM(DISTINCT(((((r.id*r.id) + (r.population*r.id)))-(r.id*r.id))/r.id)) as total_population
FROM
countries AS c
JOIN regions AS r ON r.country_id = c.id
LEFT JOIN cities AS s ON s.region_id = r.id
GROUP
BY c.id
http://sqlfiddle.com/#!2/3dd8ba/22/0
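A caveat worth checking before relying on this: the arithmetic inside SUM(DISTINCT ...) cancels out to r.population, so the query behaves like SUM(DISTINCT r.population). The sketch below, assuming SQLite (Python's sqlite3 module) instead of MySQL and a small hypothetical data set in which two regions share a population of 15, shows the resulting undercount.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER, population INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, region_id INTEGER);
-- Hypothetical data: regions 2 and 3 of country 1 share the population 15.
INSERT INTO regions VALUES (1, 1, 10), (2, 1, 15), (3, 1, 15);
INSERT INTO cities VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# The expression from the answer, next to a plain SUM(DISTINCT population).
hack, distinct_sum = conn.execute("""
    SELECT SUM(DISTINCT (((r.id*r.id) + (r.population*r.id)) - (r.id*r.id)) / r.id),
           SUM(DISTINCT r.population)
    FROM regions AS r
    LEFT JOIN cities AS s ON s.region_id = r.id
    GROUP BY r.country_id
""").fetchone()

# The real population of country 1, taken straight from the regions table.
true_total = conn.execute(
    "SELECT SUM(population) FROM regions WHERE country_id = 1"
).fetchone()[0]

print(hack, distinct_sum, true_total)
```

With this data both DISTINCT sums come out lower than the real total, because the second 15 is discarded as a duplicate.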

Your problem is quite common. You join all tables that have something to do with the data you want to see, and then you start thinking about how to get to that data. When it comes to different aggregations as in your case, this is not easy to achieve.
So better join what you are actually interested in. In your case: countries and (aggregated) region/city data per country. This keeps the query straight-forward and easy to maintain.
select
c.name as country,
r.regions,
r.population,
r.cities
from countries as c
join
(
select
country_id,
count(*) as regions,
sum(population) as population,
sum((select count(*) from cities where cities.region_id = regions.id)) as cities
from regions
group by country_id
) as r on r.country_id = c.id;

Related

MySQL get the sum with left joins

I've three tables.
REGIONS
CUISINE
BANNERS
If I run this query
SELECT SUM(fee) FROM BANNERS;
Output will be 10,000
If I run this query
SELECT SUM(fee) FROM CUISINE;
Output will be 12,800
But if I run this query
SELECT REGIONS.name,
sum(BANNERS.fee) as banner_revenue,
sum(CUISINE.fee) as cuisine_revenue
FROM REGIONS
LEFT JOIN BANNERS ON REGIONS.id = BANNERS.region_id
LEFT JOIN CUISINE ON REGIONS.id = CUISINE.region_id
GROUP BY REGIONS.name;
Output is wrong. My desired output is
name | banner_revenue | cuisine_revenue
------------------------------------------
NY | 10,000 | 4,800
Paris | NULL | 8,000
London | NULL | NULL
Why could this happen?
If you run
SELECT *
FROM REGIONS
LEFT JOIN BANNERS
ON REGIONS.id = BANNERS.region_id
LEFT JOIN CUISINE
ON REGIONS.id = CUISINE.region_id;
you'll notice that for every region/banner pair all the cuisines are joined, thus "multiplying" the cuisines, i.e. their fees are multiplied as well.
Do the grouping in the derived tables and join them to get your desired result.
SELECT r.name,
sb.fee AS banner_revenue,
sc.fee AS cuisine_revenue
FROM REGIONS r
LEFT JOIN (SELECT sum(b.fee) fee,
b.region_id
FROM BANNERS b
GROUP BY b.region_id) sb
ON sb.region_id = r.id
LEFT JOIN (SELECT sum(c.fee) fee,
c.region_id
FROM CUISINE c
GROUP BY c.region_id) sc
ON sc.region_id = r.id;
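Here is a small sketch of that multiplication and of the derived-table fix. It assumes SQLite (Python's sqlite3 module) rather than MySQL, and the rows are an assumption too: they are reconstructed from the per-row fee listing that appears in this thread, since the original fiddle data is not shown.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are reconstructed
# from the fee listing in this thread (3 banners and 2 cuisines for NY).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE REGIONS (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE BANNERS (id INTEGER PRIMARY KEY, region_id INTEGER, fee INTEGER);
CREATE TABLE CUISINE (id INTEGER PRIMARY KEY, region_id INTEGER, fee INTEGER);
INSERT INTO REGIONS VALUES (1, 'NY'), (2, 'Paris'), (3, 'London');
INSERT INTO BANNERS VALUES (1, 1, 2000), (2, 1, 5000), (3, 1, 3000);
INSERT INTO CUISINE VALUES (1, 1, 2500), (2, 1, 2300), (3, 2, 8000);
""")

# Naive: NY yields 3 x 2 = 6 joined rows, so every fee is summed several times.
naive = conn.execute("""
    SELECT r.name, SUM(b.fee), SUM(c.fee)
    FROM REGIONS r
    LEFT JOIN BANNERS b ON r.id = b.region_id
    LEFT JOIN CUISINE c ON r.id = c.region_id
    GROUP BY r.id, r.name
    ORDER BY r.id
""").fetchall()

# Fixed: each child table is reduced to one row per region before joining.
fixed = conn.execute("""
    SELECT r.name, sb.fee, sc.fee
    FROM REGIONS r
    LEFT JOIN (SELECT region_id, SUM(fee) AS fee
               FROM BANNERS GROUP BY region_id) sb ON sb.region_id = r.id
    LEFT JOIN (SELECT region_id, SUM(fee) AS fee
               FROM CUISINE GROUP BY region_id) sc ON sc.region_id = r.id
    ORDER BY r.id
""").fetchall()

print(naive)
print(fixed)
```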
Consider the following:
SELECT r.name
, x.header
, x.fee
FROM REGIONS r
LEFT
JOIN
( SELECT 'banner' header, region_id, fee FROM BANNERS
UNION ALL
SELECT 'cuisine', region_id, fee FROM CUISINE
) x
ON x.region_id = r.id
ORDER
BY r.name;
+--------+---------+------+
| name | header | fee |
+--------+---------+------+
| London | NULL | NULL |
| NY | cuisine | 2500 |
| NY | cuisine | 2300 |
| NY | banner | 2000 |
| NY | banner | 5000 |
| NY | banner | 3000 |
| Paris | cuisine | 8000 |
+--------+---------+------+

How to find items without relation to MySQL table

I have a problem excluding items in my MySQL query. I want to get all animals that have no relation to, e.g., "Asia".
My tables look like that.
Table 'animals'
+----+--------------+
| id | name |
+----+--------------+
| 1 | Tiger |
| 2 | Lion |
| 3 | Spider |
| 4 | Bird |
+----+--------------+
Table 'continent'
+----+--------------+
| id | name |
+----+--------------+
| 1 | Europe |
| 2 | Asia |
| 3 | Africa |
+----+--------------+
Table 'relations'
+----+--------+-----------+
| id | animal | continent |
+----+--------+-----------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 3 |
| 5 | 3 | 3 |
| 6 | 4 | 2 |
+----+--------+-----------+
This is what my query looks like:
SELECT a.`id`,
a.`name`
FROM `animals` AS a
LEFT JOIN `relations` AS r
ON r.`animal` = a.`id`
WHERE r.`continent` != 2
ORDER BY a.`name` asc;
The problem is that this gives me the following result:
Lion
Spider
Tiger
The thing is that "Lion" has a relation to continent Asia (ID 2) and shouldn't be in the results. Can you please help me to solve this issue?
Use NOT EXISTS to show only these animals for which there is no relation to Asia continent:
select a.*
from animals a
where not exists (
select 1
from relations r
join continent c on
c.id = r.continent
where c.name = 'Asia'
and a.id = r.animal
)
It's because Lion also has relations to continents other than Asia.
What you want to do is :
SELECT a.id, a.name
FROM animals a
WHERE a.id NOT IN (
SELECT DISTINCT r.animal FROM relations r WHERE r.continent = 2
)
ORDER BY a.name ASC;
;)
You get Lion because Lion (id 2) also has rows in the relations table for continents other than Asia.
If you need the animals that are only on continents other than Asia, then:
SELECT a.`id`,
a.`name`
FROM `animals` AS a
LEFT JOIN `relations` AS r
ON r.`animal` = a.`id`
WHERE r.`continent` != 2
AND a.id not in (
select animal from relations where continent = 2
)
ORDER BY a.`name` asc;
One option uses an EXISTS clause:
SELECT
a.id, a.name
FROM animals a
WHERE NOT EXISTS (SELECT 1 FROM relations r INNER JOIN continent c
ON r.continent = c.id
WHERE a.id = r.animal AND c.name = 'Asia');
The idea here is that for each animal, we scan the relations table joined to continent searching for the same animal being assigned to the Asia continent. If we can't find that relationship, then retain that particular animal.
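The difference between the two approaches can be sketched on the tables from the question. This assumes SQLite (Python's sqlite3 module) as a stand-in for MySQL; the DISTINCT in the first query is added only to collapse Lion's two matching rows.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data is the question's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE continent (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE relations (id INTEGER PRIMARY KEY, animal INTEGER, continent INTEGER);
INSERT INTO animals VALUES (1, 'Tiger'), (2, 'Lion'), (3, 'Spider'), (4, 'Bird');
INSERT INTO continent VALUES (1, 'Europe'), (2, 'Asia'), (3, 'Africa');
INSERT INTO relations VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2),
                             (4, 2, 3), (5, 3, 3), (6, 4, 2);
""")

# Row-level filter: Lion survives through its Europe and Africa rows.
row_filter = conn.execute("""
    SELECT DISTINCT a.name
    FROM animals AS a
    LEFT JOIN relations AS r ON r.animal = a.id
    WHERE r.continent != 2
    ORDER BY a.name
""").fetchall()

# Animal-level filter: keep an animal only if no Asia relation exists for it.
not_exists = conn.execute("""
    SELECT a.id, a.name
    FROM animals a
    WHERE NOT EXISTS (SELECT 1
                      FROM relations r
                      INNER JOIN continent c ON r.continent = c.id
                      WHERE a.id = r.animal AND c.name = 'Asia')
    ORDER BY a.name
""").fetchall()

print(row_filter)
print(not_exists)
```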
select a.* from animals a where a.id not in (select animal from relations where continent = 2);

SQL - find average amount per person

I have two tables
class
| id | area | students |
| 1 | area1 | 2 |
| 2 | area1 | 28 |
| 3 | area1 | 22 |
| 4 | area2 | 4 |
deliveries
| id | kg | classid |
| 1 | 120 | 1 |
| 2 | 80 | 1 |
| 3 | 20 | 1 |
| 4 | 200 | 2 |
| 5 | 150 | 3 |
| 6 | 14 | 2 |
I need to calculate the average kg delivered per student in each area.
For area1 that should amount to (120+80+20+200+150+14)/(2+28+22) = 11.23
But I can't figure out how to write that query. I guess I have to use some kind of subquery to first sum out students in area1 (52), before I sum kg delivered and divide on students?
This is a little tricky, because the students should be counted separately from the classes:
select c.area, sum(d.kg) / max(area_students) as avg_kg_per_student
from class c join
deliveries d
on d.classid = c.id join
(select c2.area, sum(students) as area_students
from class c2
group by c2.area
) c2
on c2.area = c.area
group by c.area;
I think you cannot use average because you need to determine the denominator yourself:
SELECT sum(kg)/ studSum AS avg
FROM _class LEFT JOIN _deliveries ON _class.id=_deliveries.classid
left join (select area, sum(students) as studSum from _class group by area) subT
ON subT.area=_class.area
GROUP BY _class.area;
Here is a very readable approach: Get students per area and kg per area, then join the two.
select stu.area, stu.students, del.kg, del.kg / stu.students
from
(
select area, sum(students) as students
from class
group by area
) stu
join
(
select c.area, sum(d.kg) as kg
from class c
join deliveries d on d.classid = c.id
group by c.area
) del on del.area = stu.area;
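To check the arithmetic from the question, here is a sketch of the two-subquery approach on the sample data. It assumes SQLite (Python's sqlite3 module) in place of MySQL, joins on class.id (the column the class table actually has), and multiplies by 1.0 because SQLite would otherwise do integer division; that last detail is specific to the stand-in database.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data is the question's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class (id INTEGER PRIMARY KEY, area TEXT, students INTEGER);
CREATE TABLE deliveries (id INTEGER PRIMARY KEY, kg INTEGER, classid INTEGER);
INSERT INTO class VALUES (1, 'area1', 2), (2, 'area1', 28),
                         (3, 'area1', 22), (4, 'area2', 4);
INSERT INTO deliveries VALUES (1, 120, 1), (2, 80, 1), (3, 20, 1),
                              (4, 200, 2), (5, 150, 3), (6, 14, 2);
""")

rows = conn.execute("""
    SELECT stu.area, stu.students, del.kg,
           del.kg * 1.0 / stu.students AS kg_per_student  -- 1.0 avoids integer division in SQLite
    FROM (SELECT area, SUM(students) AS students
          FROM class
          GROUP BY area) stu
    JOIN (SELECT c.area, SUM(d.kg) AS kg
          FROM class c
          JOIN deliveries d ON d.classid = c.id
          GROUP BY c.area) del ON del.area = stu.area
    ORDER BY stu.area
""").fetchall()

for area, students, kg, ratio in rows:
    print(area, students, kg, round(ratio, 2))
```

Because both subqueries are joined with a plain JOIN, area2 (which has no deliveries) does not appear in the result; a LEFT JOIN from stu would keep it with a NULL ratio.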

LEFT JOIN does not return all the records from the left side table

SELECT d.mt_code,
d.dep_name,
d.service_name,
COUNT(*)
FROM DepartmentService AS d
LEFT JOIN tbl_outgoing AS t ON d.mt_code = t.depCode
WHERE d.service_type = 'MT'
AND t.smsc = "mobitelMT"
AND t.sendDate BETWEEN '2014-07-01' AND '2014-07-02'
GROUP BY d.mt_code
The DepartmentService table has details about departments that offer services. The tbl_outgoing table contains all the transactions made by customers for a particular service. The WHERE clause has two criteria that must be fulfilled: service_type = 'MT' and smsc = "mobitelMT". I want to get a report which shows all the departments with their transactions for a given period. I used a LEFT JOIN because I want to get all the departments. The SQL works fine and returns the result I want, except for one thing:
When there are no transactions for a particular service in a particular period, the department is left out as well. What I want is for the department to appear in the result set with the COUNT(*) column set to 0.
How can I do that?
The problem is that you are filtering on the joined table in the WHERE clause, which also filters out the department services that don't have a match in the join. Move that filtering into the join condition and leave only the filters on d in the WHERE clause. Counting a column of t (here t.id) instead of COUNT(*) then makes unmatched departments show 0:
SELECT d.mt_code,
d.dep_name,
d.service_name,
COUNT(t.id)
FROM DepartmentService AS d
LEFT JOIN tbl_outgoing AS t
ON d.mt_code = t.depCode
AND t.smsc = "mobitelMT"
AND t.sendDate BETWEEN '2014-07-01' AND '2014-07-02'
WHERE d.service_type = 'MT'
GROUP BY d.mt_code
To explain why this happens I'll walk you through what happens with your query and with my query, as dataset I'll use this:
states
____ _________
| id | state |
| 1 | Germany |
| 2 | Italy |
| 3 | Sweden |
|____|_________|
cities
____ ________ ___________ ____________
| id | city | state_fk | population |
| 1 | Berlin | 1 | 10 |
| 2 | Milan | 2 | 5 |
|____|________|___________|____________|
First I'll go through your query.
SELECT s.id, s.state, c.population, c.city
FROM states s
LEFT JOIN cities c
ON c.state_fk = s.id
WHERE c.population < 10
So let's go step by step: you select the three states and left join them with cities, ending up with:
____ _________ ____________ ________
| id | state | population | city |
| 1 | Germany | 10 | Berlin |
| 2 | Italy | 5 | Milan |
| 3 | Sweden | NULL | NULL |
|____|_________|____________|________|
Then you filter on the population using WHERE c.population < 10; at this point you're left with this:
____ _________ ____________ ________
| id | state | population | city |
| 2 | Italy | 5 | Milan |
|____|_________|____________|________|
You lose Germany because Berlin's population is 10, but you also lose Sweden, whose population is NULL. If you wanted to keep the NULLs, you should have said so in the query:
WHERE (c.population < 10 OR c.population IS NULL)
Which returns:
____ _________ ____________ ________
| id | state | population | city |
| 2 | Italy | 5 | Milan |
| 3 | Sweden | NULL | NULL |
|____|_________|____________|________|
Now my query:
SELECT s.id, s.state, c.population, c.city
FROM states s
LEFT JOIN cities c
ON c.state_fk = s.id
AND c.population < 10
Before joining the two, we filter the table cities (using the AND c.population < 10 condition after the ON), what remains is:
____ ________ ___________ ____________
| id | city | state_fk | population |
| 2 | Milan | 2 | 5 |
|____|________|___________|____________|
Because Milan is the only city with a population of less than 10. Now we can join the two tables:
____ _________ ____________ ________
| id | state | population | city |
| 1 | Germany | NULL | NULL |
| 2 | Italy | 5 | Milan |
| 3 | Sweden | NULL | NULL |
|____|_________|____________|________|
As you can see the data from the left table stays because the filtering condition was applied only to the cities table.
The result set depends on what you want to achieve. If, for example, you want to filter out Germany because Berlin's population is not less than 10, but keep Sweden, use the first approach with the added IS NULL condition. If you want to keep Germany as well, use the second approach and pre-filter the table on the right side of the LEFT JOIN.
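The walkthrough can be reproduced with a short sketch on the states/cities data. It assumes SQLite (Python's sqlite3 module) as a stand-in for MySQL; LEFT JOIN semantics for ON versus WHERE are standard SQL, but only SQLite is exercised here.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data is the answer's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT, state_fk INTEGER, population INTEGER);
INSERT INTO states VALUES (1, 'Germany'), (2, 'Italy'), (3, 'Sweden');
INSERT INTO cities VALUES (1, 'Berlin', 1, 10), (2, 'Milan', 2, 5);
""")

# Filter in WHERE: applied after the join, so NULL-extended rows are discarded too.
where_rows = conn.execute("""
    SELECT s.id, s.state, c.population, c.city
    FROM states s
    LEFT JOIN cities c ON c.state_fk = s.id
    WHERE c.population < 10
    ORDER BY s.id
""").fetchall()

# Filter in ON: restricts which cities can match, every state is kept.
on_rows = conn.execute("""
    SELECT s.id, s.state, c.population, c.city
    FROM states s
    LEFT JOIN cities c ON c.state_fk = s.id AND c.population < 10
    ORDER BY s.id
""").fetchall()

print(where_rows)
print(on_rows)
```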

Join on same column name

Hello there, I want to get data from two tables that share the same column name. My table structures are:
Table patients
---------------------------------------
| id | affiliate_id | somecolumn |
---------------------------------------
| 1 | 8 | abc |
---------------------------------------
| 2 | 8 | abc |
---------------------------------------
| 3 | 9 | abc |
---------------------------------------
Table Leads
---------------------------------------
| id | affiliate_id | someothern |
---------------------------------------
| 1 | 8 | xyz |
---------------------------------------
| 2 | 8 | xyz |
---------------------------------------
| 3 | 3 | xyz |
---------------------------------------
My requirement is to get COUNT(id) from both tables in a single query. I want a result like:
----------------------------------------------------
| affiliate_id | total_patients | total_leads |
----------------------------------------------------
| 8 | 2 | 2 |
----------------------------------------------------
| 9 | 1 | 0 |
----------------------------------------------------
| 3 | 0 | 1 |
----------------------------------------------------
I wrote following query
SELECT `p`.`affiliate_id`, COUNT(p.id) AS `total_patients`,
COUNT(cpl.id) AS `total_leads`
FROM `patients` AS `p`
INNER JOIN `leads` AS `cpl` ON p.affiliate_id =cpl.affiliate_id
GROUP BY `p`.`affiliate_id`
But I am not getting that result. This query returns only one affiliate, with the same number for total_patients and total_leads.
The problem is that you need to get a list of the distinct affiliate_id first and then join to your other tables to get the result:
select a.affiliate_id,
count(distinct p.id) total_patients,
count(distinct l.id) total_leads
from
(
select affiliate_id
from patients
union
select affiliate_id
from leads
) a
left join patients p
on a.affiliate_id = p.affiliate_id
left join leads l
on a.affiliate_id = l.affiliate_id
group by a.affiliate_id;
See SQL Fiddle with Demo
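For reference, a minimal sketch of the UNION approach on the tables from the question. It assumes SQLite (Python's sqlite3 module) rather than MySQL and leaves out the somecolumn/someothern columns; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data is the question's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, affiliate_id INTEGER);
CREATE TABLE leads (id INTEGER PRIMARY KEY, affiliate_id INTEGER);
INSERT INTO patients VALUES (1, 8), (2, 8), (3, 9);
INSERT INTO leads VALUES (1, 8), (2, 8), (3, 3);
""")

rows = conn.execute("""
    SELECT a.affiliate_id,
           COUNT(DISTINCT p.id) AS total_patients,
           COUNT(DISTINCT l.id) AS total_leads
    FROM (SELECT affiliate_id FROM patients
          UNION
          SELECT affiliate_id FROM leads) a
    LEFT JOIN patients p ON a.affiliate_id = p.affiliate_id
    LEFT JOIN leads l ON a.affiliate_id = l.affiliate_id
    GROUP BY a.affiliate_id
    ORDER BY a.affiliate_id
""").fetchall()

print(rows)
```

COUNT(DISTINCT ...) matters here: affiliate 8 produces 2 x 2 = 4 joined rows, and counting distinct ids brings each total back to 2.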
Two ways:
Select l.affiliate_id ,
count(distinct p.id) patientCount,
count(distinct l.id) LeadCOunt
From patients p Join leads l
On l.affiliate_id = p.Affiliate_id
Group By l.affiliate_id
or, (assuming affiliates are in their own table somewhere)
Select Affiliate_id,
(Select Count(*) From Patients
Where Affiliate_id = a.Affiliate_id) patientCount,
(Select Count(*) From Leads
Where Affiliate_id = a.Affiliate_id) LeadCount
From affiliates a