SQL max is non-deterministic? - mysql

I have two tables: cities and states. States has columns for state codes and full name. Cities contains columns for population, state code, and the city name. My goal is to create a table of the city in each state with the highest population.
This is my solution which seems to work in a test, but I've been told that using max() is non-deterministic and I should use a window function instead.
SELECT
s.name,
c.name,
max(c.population)
FROM cities AS c
LEFT JOIN states AS s
ON c.state_code = s.code
GROUP BY s.name
ORDER BY s.name;
What is wrong with using max here, and when would it give incorrect results?

In most databases your query would not even run, because you are selecting the non-aggregated column c.name without also using it in the GROUP BY clause.
For MySQL, the query would run only if ONLY_FULL_GROUP_BY mode is disabled, but it would still return wrong results, because MySQL is free to take the city name from any row in each state's group, not necessarily the row with the highest population.
For SQLite, your query is correct!
SQLite's bare-column feature makes sure that the city name you get in the results comes from the row that has the max population.
This is non-standard, but it is documented.
The only problem here is that if there are 2 or more cities with the same max population you will get only one of them in the results.
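To make the SQLite behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows (city3 and city4 tied at 50) are made up for illustration, and the schema is trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (code TEXT, name TEXT);
CREATE TABLE cities (name TEXT, population INTEGER, state_code TEXT);
INSERT INTO states VALUES ('s01', 'state1'), ('s02', 'state2');
INSERT INTO cities VALUES
    ('city1', 100, 's01'),
    ('city2',  10, 's01'),
    ('city3',  50, 's02'),
    ('city4',  50, 's02');
""")

# The query from the question: c.name is a "bare" column next to MAX().
rows = conn.execute("""
SELECT s.name, c.name, MAX(c.population)
FROM cities AS c
LEFT JOIN states AS s ON c.state_code = s.code
GROUP BY s.name
ORDER BY s.name
""").fetchall()

for row in rows:
    print(row)
```

For state1 the city name comes from the row holding the maximum, so city1 is reported with 100. For state2 only one of the two tied cities shows up, and which one is not something the query controls.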

You can find the city in each state with the max population and use it in a sub-query and join it with the tables.
Query
select s.name as state, c.name as city, c.population
from states s
join cities c
on c.state_code = s.code
join (
select state_code, max(population) as max_pop
from cities
group by state_code
) as p
on p.state_code = c.state_code
and p.max_pop = c.population;
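A quick way to check this approach, sketched with Python's sqlite3 module and made-up sample rows (city3 and city4 are tied). The ORDER BY is my addition so the output order is stable; it is not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (code TEXT, name TEXT);
CREATE TABLE cities (name TEXT, population INTEGER, state_code TEXT);
INSERT INTO states VALUES ('s01', 'state1'), ('s02', 'state2');
INSERT INTO cities VALUES
    ('city1', 100, 's01'),
    ('city2',  10, 's01'),
    ('city3',  50, 's02'),
    ('city4',  50, 's02');
""")

# Join each city to the per-state maximum; every city that matches it is kept.
rows = conn.execute("""
SELECT s.name AS state, c.name AS city, c.population
FROM states s
JOIN cities c ON c.state_code = s.code
JOIN (
    SELECT state_code, MAX(population) AS max_pop
    FROM cities
    GROUP BY state_code
) AS p ON p.state_code = c.state_code
      AND p.max_pop = c.population
ORDER BY s.name, c.name
""").fetchall()

for row in rows:
    print(row)
```

Unlike the bare-column version, this returns both tied cities for state2, and every selected column is well defined in any database.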

create table states(code varchar(50),name varchar(50));
create table cities(code varchar(50),name varchar(50),population int, state_code varchar(50));
insert into states values('s01','state1');
insert into cities values('c01','city1',100,'s01');
insert into cities values('c02','city2',10,'s01');
Query:
with cte as
(
SELECT
s.name state_name,
c.name city_name,
c.population,
row_number()over(partition by s.name order by c.population desc)rn
FROM cities AS c
LEFT JOIN states AS s
ON c.state_code = s.code
)
select state_name, city_name, population from cte where rn=1
Output:
state_name  city_name  population
state1      city1      100
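One thing to keep in mind with row_number() is ties: it keeps exactly one row per state, and without a tie-breaker in the ORDER BY the surviving row is arbitrary. The sketch below (Python's sqlite3 module, made-up rows, and assuming the bundled SQLite is 3.25 or newer so window functions are available) adds c.name as a tie-breaker and also shows rank(), which keeps all tied cities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (code TEXT, name TEXT);
CREATE TABLE cities (name TEXT, population INTEGER, state_code TEXT);
INSERT INTO states VALUES ('s01', 'state1'), ('s02', 'state2');
INSERT INTO cities VALUES
    ('city1', 100, 's01'),
    ('city2',  10, 's01'),
    ('city3',  50, 's02'),
    ('city4',  50, 's02');
""")

# rn: exactly one row per state, ties broken by city name (my assumption).
# rk: tied cities share rank 1, so all of them survive the filter.
cte = """
WITH ranked AS (
    SELECT s.name AS state_name,
           c.name AS city_name,
           c.population,
           ROW_NUMBER() OVER (PARTITION BY s.name
                              ORDER BY c.population DESC, c.name) AS rn,
           RANK() OVER (PARTITION BY s.name
                        ORDER BY c.population DESC) AS rk
    FROM cities AS c
    LEFT JOIN states AS s ON c.state_code = s.code
)
"""

one_per_state = conn.execute(
    cte + "SELECT state_name, city_name, population FROM ranked "
          "WHERE rn = 1 ORDER BY state_name"
).fetchall()

with_ties = conn.execute(
    cte + "SELECT state_name, city_name, population FROM ranked "
          "WHERE rk = 1 ORDER BY state_name, city_name"
).fetchall()

print(one_per_state)
print(with_ties)
```

Which of the two you want depends on whether a tie should produce one row or several.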

Related

SQL Query, number of city and continent

I have a problem with a query.
For every continent I have to find: the name of the continent, the number of cities and the number of countries. This is what I did:
SELECT co.continent, COUNT(*)
FROM Country co
JOIN City c ON c.countrycode = co.code
GROUP BY co.continent
UNION
SELECT COUNT(*)
FROM Country co2
WHERE co.continent = co2.continent ( <---- ??? )
GROUP BY co2.continent
But I don't know whether the part "WHERE co.continent = co2.continent" is legal, because the second query isn't a subquery of the first, is it? Is there another way to write this query?
UNION is not required; a single query with GROUP BY and COUNT aggregates gives the desired result. Because there can be multiple cities in the same country, each country appears once per city in the joined rows, so use COUNT(DISTINCT ...) to count each country only once.
SELECT co.continent, COUNT(*) cities, COUNT(DISTINCT co.code) countries
FROM Country co
JOIN City c ON c.countrycode = co.code
GROUP BY co.continent
The condition co.continent = co2.continent in the original UNION query is invalid. The queries in a UNION are independent of each other, so the second one cannot reference the alias co from the first.
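Here is a small sketch of the difference between COUNT(*) and COUNT(DISTINCT ...) after the join, using Python's sqlite3 module. The rows are invented, the tables are trimmed to the columns used, and the ORDER BY is only there to make the output order stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (code TEXT, continent TEXT);
CREATE TABLE City (name TEXT, countrycode TEXT);
INSERT INTO Country VALUES ('FR', 'Europe'), ('DE', 'Europe'), ('JP', 'Asia');
INSERT INTO City VALUES
    ('Paris', 'FR'), ('Lyon', 'FR'), ('Berlin', 'DE'), ('Tokyo', 'JP');
""")

# After the join there is one row per city, so COUNT(*) counts cities,
# while COUNT(DISTINCT co.code) collapses the repeated country codes.
rows = conn.execute("""
SELECT co.continent,
       COUNT(*) AS cities,
       COUNT(DISTINCT co.code) AS countries
FROM Country co
JOIN City c ON c.countrycode = co.code
GROUP BY co.continent
ORDER BY co.continent
""").fetchall()

for row in rows:
    print(row)
```

Europe has three cities spread over two countries here, which is exactly the case where a plain COUNT(*) would overstate the number of countries. Note that, because of the inner join, a country with no cities at all is not counted.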

Getting error in SQL query using INNER JOIN

I'm still learning MySQL.
This is the relational schema:
CUSTOMER (CustID, CustName, AnnualRevenue)
TRUCK (TruckNumber, DriverName)
CITY (CityName, Population)
SHIPMENT (ShipmentNumber, CustID, Weight, Year, TruckNumber, CityName)
Now, I have to formulate for these two queries:
Total weight of shipments per year for each city.
Drivers who drove shipments to London but not Paris.
These are the queries I have come up with:
1.
select sum(s.weight), s.year , c.city
from shipment s, city c
INNER JOIN CITY
on s.CityName = c.CityName
You are mixing the old comma-style join (which you should avoid, because the joining columns are not stated explicitly and it is confusing for others) with an explicit INNER JOIN:
FROM shipment s, city c
Every selected column that is not aggregated (year, city name) must appear in the GROUP BY clause. It is also better to give the aggregated column an alias (AS total_weight):
select sum(s.weight) AS total_weight, s.year, c.CityName
from shipment s
INNER JOIN CITY as c
on s.CityName = c.CityName
GROUP BY s.year, c.CityName
Try to solve the second query and come back if you have a problem.
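For the first query, a minimal check of the grouped totals with Python's sqlite3 module. The shipment rows are invented for illustration, and the ORDER BY is my addition to keep the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CITY (CityName TEXT, Population INTEGER);
CREATE TABLE SHIPMENT (ShipmentNumber INTEGER, CustID INTEGER, Weight INTEGER,
                       Year INTEGER, TruckNumber INTEGER, CityName TEXT);
INSERT INTO CITY VALUES ('London', 9000000), ('Paris', 2000000);
INSERT INTO SHIPMENT VALUES
    (1, 10, 100, 2020, 1, 'London'),
    (2, 10,  50, 2020, 2, 'London'),
    (3, 11,  70, 2021, 1, 'London'),
    (4, 11,  30, 2020, 2, 'Paris');
""")

# One output row per (year, city) pair; the weights inside each pair are summed.
rows = conn.execute("""
SELECT SUM(s.Weight) AS total_weight, s.Year, c.CityName
FROM SHIPMENT s
INNER JOIN CITY AS c ON s.CityName = c.CityName
GROUP BY s.Year, c.CityName
ORDER BY s.Year, c.CityName
""").fetchall()

for row in rows:
    print(row)
```

The two London shipments from 2020 collapse into a single row with a total of 150, while London 2021 and Paris 2020 stay separate.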

Simple inner join not working

I'm trying to query the sum of the populations of all cities where the CONTINENT is 'Asia'.
The two tables CITY and COUNTRY are as follows,
city - id, countrycode, name, population
country - code, name, continent, population
Here's my query
SELECT SUM(POPULATION) FROM COUNTRY CITY
JOIN ON COUNTRY.CODE = CITY.COUNTRYCODE
WHERE CONTINENT = "Asia";
This doesn't work. What am I doing wrong? I'm new to SQL.
It isn't working because the way you've written it CITY is being interpreted as a table alias for COUNTRY. Additionally, it looks like you've got a POPULATION column in each table so you need to disambiguate it. Let me rewrite the query for you:
SELECT SUM(CITY.POPULATION)
FROM COUNTRY
JOIN CITY
ON COUNTRY.CODE = CITY.COUNTRYCODE
WHERE COUNTRY.CONTINENT = "Asia";
I know the question was already answered, but I would like to offer an optimised solution. The solution below should reduce execution time and use fewer resources.
select sum(a.population) from city a
inner join(select * from country where continent = 'Asia') b
on a.countrycode=b.code;
To explain a bit further: as you can see, I apply the filter condition before performing the join, so less data has to be shuffled into the join and the query should take less time. You will not see a drastic difference on a small data set, but on a large one the improvement can be noticeable.
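Whichever form you pick, the two queries should return the same total. The sketch below checks that with Python's sqlite3 module on a few invented rows. It says nothing about speed: whether filtering in a derived table first is actually faster depends on the engine and its optimizer, so compare execution plans (for example with EXPLAIN) on your own data rather than assuming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT, name TEXT, continent TEXT, population INTEGER);
CREATE TABLE city (id INTEGER, countrycode TEXT, name TEXT, population INTEGER);
INSERT INTO country VALUES
    ('JP', 'Japan',  'Asia',   126),
    ('IN', 'India',  'Asia',   1380),
    ('FR', 'France', 'Europe', 67);
INSERT INTO city VALUES
    (1, 'JP', 'Tokyo',  14),
    (2, 'IN', 'Mumbai', 12),
    (3, 'FR', 'Paris',  2);
""")

# Plain join with the filter in WHERE.
join_total = conn.execute("""
SELECT SUM(city.population)
FROM country
JOIN city ON country.code = city.countrycode
WHERE country.continent = 'Asia'
""").fetchone()[0]

# Filter applied inside a derived table before the join.
prefiltered_total = conn.execute("""
SELECT SUM(a.population)
FROM city a
INNER JOIN (SELECT * FROM country WHERE continent = 'Asia') b
    ON a.countrycode = b.code
""").fetchone()[0]

print(join_total, prefiltered_total)
```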
The JOIN needs to go between the two table names:
SELECT SUM(CITY.POPULATION) FROM COUNTRY INNER JOIN CITY
ON COUNTRY.CODE = CITY.COUNTRYCODE
WHERE CONTINENT = "Asia";
MySQL JOIN syntax manual
SELECT SUM(CITY.POPULATION)
FROM COUNTRY
JOIN CITY
ON COUNTRY.CODE = CITY.COUNTRYCODE
WHERE CONTINENT = "Asia";
SELECT SUM(CITY.POPULATION)
FROM CITY
INNER JOIN COUNTRY ON CITY.COUNTRYCODE = COUNTRY.Code
where COUNTRY.CONTINENT = 'Asia';
Line 3 uses INNER JOIN because the two tables share a common column (the country code) to match on.
SELECT sum(city.population) FROM city LEFT JOIN country ON city.countrycode=country.code
WHERE country.continent='Asia'
You can run the following code using Oracle.
SELECT SUM(c.POPULATION)
FROM CITY c
INNER JOIN COUNTRY co ON c.CountryCode = co.Code
WHERE CONTINENT ='Asia' ;
select SUM(cty.POPULATION) from COUNTRY cntry, CITY cty where cty.COUNTRYCODE=cntry.CODE AND cntry.CONTINENT='Asia';
select sum(S.Population)
from City S
where S.CountryCode in (select Code
from Country C
where CONTINENT = 'Asia');

MySQL error 1242: Subquery returns more than 1 row

I'm working on some SQL homework, and I've come to a dead-end on this one question and I'm hoping someone can point out what exactly I'm doing wrong here.
SELECT Name,
(SELECT Name
FROM City
WHERE City.CountryCode = Country.Code) AS 'city',
(SELECT Population
FROM City
WHERE City.CountryCode = Country.Code) AS 'city_population'
FROM Country
WHERE Region IN ('Western Europe')
HAVING city_population > (SUM(Population) / COUNT(city))
ORDER BY Name, city;
What I'm trying to do here is retrieve from a database of global statistics a list of cities (from the City table) matched with their Country from that table, in which the country is in the region of Western Europe and the population of the city is greater than the average population of cities for its country, ordered by country and city name. The CountryCode and Code are the keys for the tables.
Can anyone tell me where I'm going wrong? I'm guessing MySQL is unhappy because my subqueries are returning more rows than the selector for country names does, but that's exactly what I want to do. I want multiple rows for a country value, one row for each city that meets the search criteria of having greater than average populations. The assignment also specifically forbids me from using joins to solve this problem.
A join should do it. You can join city on country code, and filter out cities that have a lower than average population
select
co.Name as CountryName,
ci.Name as CityName,
ci.Population as CityPopulation
from
Country co
inner join City ci
on ci.CountryCode = co.Code
where
co.Region in ('Western Europe')
and ci.Population >
(select sum(ca.Population) / count(*) from City ca
where ca.CountryCode = co.Code)
Additions:
Since you are not allowed to use joins, you could solve it in a couple of ways.
1) You can alter your query a little bit, but it won't return a row for each city. Instead it will return the list of cities as a single field. This is only a slight modification of your query. Note the GROUP_CONCAT function, which works like SUM, only it concatenates the values instead of summing them. Also note the ORDER BY inside each GROUP_CONCAT call, which makes sure the nth population matches the nth city name. The above-average filter has moved into the subselects, because the concatenated strings cannot be compared meaningfully in HAVING.
SELECT Name,
(SELECT GROUP_CONCAT(ci.Name ORDER BY ci.Name)
FROM City ci
WHERE ci.CountryCode = Country.Code
AND ci.Population > (SELECT AVG(ca.Population)
FROM City ca
WHERE ca.CountryCode = Country.Code)) AS city,
(SELECT GROUP_CONCAT(ci.Population ORDER BY ci.Name)
FROM City ci
WHERE ci.CountryCode = Country.Code
AND ci.Population > (SELECT AVG(ca.Population)
FROM City ca
WHERE ca.CountryCode = Country.Code)) AS city_population
FROM Country
WHERE Region IN ('Western Europe')
ORDER BY Name;
2) You can alter my query a little bit. Remove the join on Country, and instead use subselects in the filter and in the select list. The latter is only needed if you need the country name at all. If the country code is enough, you can select that from City.
select
(select Country.Name
from Country
where Country.Code = ci.CountryCode) as CountryName,
ci.CountryCode,
ci.Name as CityName,
ci.Population
from
City ci
where
-- Select only cities in these countries.
ci.CountryCode in
( select co.Code
from Country co
where co.Region in ('Western Europe'))
-- Select only cities of above average population.
-- This is the same subselect that existed in the join before,
-- except it matches on CountryCode of the other 'instance'
-- of the City table. Note, you will _need_ to use aliases (ca/ci)
-- here to make it work.
and ci.Population >
( select sum(ca.Population) / count(*)
from City ca
where ca.CountryCode = ci.CountryCode)
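As a sanity check, the join version and the join-free version should return the same cities. The sketch below runs both with Python's sqlite3 module on invented rows. It assumes the key columns are Country.Code and City.CountryCode, as stated in the question, and uses AVG instead of sum/count because SQLite would otherwise do integer division; the ORDER BY clauses are added only to make the two results comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Code TEXT, Name TEXT, Region TEXT);
CREATE TABLE City (Name TEXT, CountryCode TEXT, Population INTEGER);
INSERT INTO Country VALUES
    ('FRA', 'France',  'Western Europe'),
    ('DEU', 'Germany', 'Western Europe'),
    ('JPN', 'Japan',   'Eastern Asia');
INSERT INTO City VALUES
    ('Paris',   'FRA', 2000), ('Lyon',  'FRA', 500), ('Nice', 'FRA', 300),
    ('Berlin',  'DEU', 3500), ('Hamburg', 'DEU', 1800), ('Bonn', 'DEU', 300),
    ('Tokyo',   'JPN', 9000), ('Osaka', 'JPN', 2600);
""")

# Version with a join on Country.
join_rows = conn.execute("""
SELECT co.Name, ci.Name, ci.Population
FROM Country co
INNER JOIN City ci ON ci.CountryCode = co.Code
WHERE co.Region IN ('Western Europe')
  AND ci.Population > (SELECT AVG(ca.Population) FROM City ca
                       WHERE ca.CountryCode = co.Code)
ORDER BY co.Name, ci.Name
""").fetchall()

# Version without any join: subselects in the select list and in the filter.
nojoin_rows = conn.execute("""
SELECT (SELECT Country.Name FROM Country
        WHERE Country.Code = ci.CountryCode) AS CountryName,
       ci.Name AS CityName,
       ci.Population
FROM City ci
WHERE ci.CountryCode IN (SELECT co.Code FROM Country co
                         WHERE co.Region IN ('Western Europe'))
  AND ci.Population > (SELECT AVG(ca.Population) FROM City ca
                       WHERE ca.CountryCode = ci.CountryCode)
ORDER BY CountryName, CityName
""").fetchall()

print(join_rows)
print(nojoin_rows)
```

With these rows only Paris and Berlin are above their country's average, and the Japanese cities are filtered out by region in both versions.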
A subquery in the select part of the statement is expected to return only one value. Remember that the commas in your select list separate columns, and each column expects one value per row. To get a list of values back from a subquery (as if it were another table) and use it in the outer query, you have to put the subquery in the FROM part of your query. Note: this may not be proper code for your results; I was just addressing MySQL error 1242.
SELECT Name
FROM Country, (SELECT Name
FROM City
WHERE City.CountryCode = Country.Code) AS 'city',
(SELECT Population
FROM City
WHERE City.CountryCode = Country.Code) AS 'city_population'
WHERE Region IN ('Western Europe')
HAVING city_population > (SUM(Population) / COUNT(city))
ORDER BY Name, city;

Turning mysql subquery into Join

I'm new to MySQL and just started learning it. Last night I was trying to rewrite the following subquery on the country table of the world database as a join.
SELECT continent, NAME, population FROM country c WHERE
population = (SELECT MAX(population) FROM country c2
WHERE c.continent=c2.continent AND population > 0)
I tried the following query and several others with inner joins etc., but failed. With the following query the max population is as expected, but the continent and country name do not match it.
SELECT c.continent, c2.name, MAX(c2.population) AS pop FROM country c, country c2
WHERE c.continent = c2.continent GROUP BY continent
Please help, how can I get same result as the sub-query above.
Thanks in advance
You should get the MAX(population) with GROUP BY continent inside a subquery, then JOIN it with the table itself, like this:
SELECT c1.continent, c1.NAME, c1.population
FROM country c1
INNER JOIN
(
SELECT continent, MAX(population) AS Maxp
FROM country
WHERE population > 0
GROUP BY continent
) AS c2 ON c1.population = c2.maxp
AND c1.continent = c2.continent;
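To confirm that the join gives the same result as the original correlated subquery, here is a small sketch with Python's sqlite3 module. The rows are invented and the table is trimmed to three columns; the ORDER BY is added only so the two results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (name TEXT, continent TEXT, population INTEGER);
INSERT INTO country VALUES
    ('China',      'Asia',       1400),
    ('India',      'Asia',       1380),
    ('Germany',    'Europe',     83),
    ('France',     'Europe',     67),
    ('Antarctica', 'Antarctica', 0);
""")

# Original form: correlated subquery per row.
subquery_rows = conn.execute("""
SELECT continent, name, population
FROM country c
WHERE population = (SELECT MAX(population) FROM country c2
                    WHERE c.continent = c2.continent AND population > 0)
ORDER BY continent
""").fetchall()

# Rewritten form: join against the per-continent maximum.
join_rows = conn.execute("""
SELECT c1.continent, c1.name, c1.population
FROM country c1
INNER JOIN (
    SELECT continent, MAX(population) AS maxp
    FROM country
    WHERE population > 0
    GROUP BY continent
) AS c2 ON c1.population = c2.maxp
       AND c1.continent = c2.continent
ORDER BY c1.continent
""").fetchall()

print(subquery_rows)
print(join_rows)
```

The continent whose only entry has population 0 drops out of both results, because the population > 0 filter leaves no maximum to match against.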