I am trying to craft code that "Shows the countries that are big by area or big by population but not both". It should show the name, population and area.
The table the code references, world, contains name, area, and population.
This is my code so far:
SELECT name, population, area FROM world
WHERE area > 3000000 OR population > 250000000 OR name != LIKE '%United States%'
Anyone have any advice?
You can use XOR for this. It's true if exactly one of its operands is true.
SELECT name, population, area
FROM world
WHERE (area > 3000000 XOR population > 250000000)
AND name NOT LIKE '%United States%'
I also changed how the United States test is combined. name != LIKE isn't valid syntax, so it becomes name NOT LIKE. I assume you're trying to exclude the United States from the results, so it has to be combined with AND rather than OR.
Use the XOR (exclusive OR) operator:
SELECT name, population, area FROM world
WHERE area > 3000000 XOR population > 250000000
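To see the "one or the other but not both" logic on its own, here is a minimal sketch run through Python's built-in sqlite3 module. SQLite has no XOR operator, so it swaps in (a) <> (b): two boolean comparisons differ exactly when one of them is true, which is how XOR behaves here. The country names and numbers are invented for illustration, and big_but_not_both is a hypothetical helper.

```python
import sqlite3

# Hypothetical rows, invented for illustration (not the real SQLZoo data):
# (name, area, population)
ROWS = [
    ("Bigland", 5_000_000, 100_000_000),       # big by area only
    ("Crowdland", 500_000, 900_000_000),       # big by population only
    ("Hugeland", 9_000_000, 1_000_000_000),    # big by both
    ("Smallland", 1_000, 50_000),              # big by neither
]


def big_but_not_both(conn):
    # SQLite has no XOR; comparing the two boolean results with <> is true
    # exactly when one of them is true, matching MySQL's XOR for this query.
    sql = """
        SELECT name, population, area FROM world
        WHERE (area > 3000000) <> (population > 250000000)
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, area INTEGER, population INTEGER)")
conn.executemany("INSERT INTO world VALUES (?, ?, ?)", ROWS)
print(big_but_not_both(conn))  # ['Bigland', 'Crowdland']
```

Hugeland (big by both) and Smallland (big by neither) are both excluded, which is the "but not both" part.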
Related
I just started an exercise-style SQL tutorial, but I still haven't grasped the concept of correlated subqueries.
name, area, and continent are columns of the world table.
The task is to find the largest country (by area) in each continent, showing the continent, the name, and the area.
The draft work so far:
SELECT continent, name, population FROM world x
WHERE area >= ALL
(SELECT area FROM world y
WHERE y.continent=x.continent
AND population>0)
I've tried reading up on it on a few other blogs, but I still need to understand the logic behind correlated subqueries.
I assume the query you posted works and you just need clarification of what it does.
SELECT continent, name, population
FROM world x
WHERE area >= ALL (
SELECT area FROM world y
WHERE y.continent=x.continent
AND population>0
)
The query translates to
"Get the continent, name, and population of each country whose area is greater than or equal to the area of every country (with a population above 0) in the same continent".
The WHERE clause in the inner query links the two queries (here, by restricting the subquery to countries in the same continent). Without that WHERE, the query would return only the country with the largest area in the world.
You can think of a correlated subquery as a looping mechanism. This is not necessarily how it is implemented, but it describes what it does.
Consider data such as:
row continent area population
1 a 100 19
2 a 200 10
3 a 300 20
4 b 15 2000
The outer query loops through each row and, for each one, looks at all matching rows. So it takes record 1:
row continent area population
1 a 100 19
It then runs the subquery:
(SELECT w2.area
FROM world w2
WHERE w2.continent = w1.continent AND
w2.population > 0
)
And substitutes in the values from the outer table:
(SELECT w2.area
FROM world w2
WHERE w2.continent = 'a' AND
w2.population > 0
)
This returns the set (100, 200, 300).
Then it applies the condition:
where w1.area >= all (100, 200, 300)
(This isn't really valid SQL but it conveys the idea.)
Well, we know that w1.area = 100, so this condition is false.
The process is then repeated for each of the rows. For the "a" continent, the only row that meets the condition is the third one -- the one with the largest area.
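The loop described above can be sketched in Python over the four sample rows. This only models what the correlated subquery means, not how MySQL actually executes it, and largest_by_continent is a hypothetical name. There are no NULLs in this data, so Python's all() is a fair stand-in for SQL's ALL (both are true for an empty set).

```python
# The sample rows from above: (row, continent, area, population)
WORLD = [
    (1, "a", 100, 19),
    (2, "a", 200, 10),
    (3, "a", 300, 20),
    (4, "b", 15, 2000),
]


def largest_by_continent(rows):
    result = []
    for outer in rows:  # outer query (w1): one pass per row
        _, continent, area, _ = outer
        # Inner query (w2), re-run with w1.continent substituted in
        subquery = [a for _, c, a, p in rows if c == continent and p > 0]
        # area >= ALL (subquery): at least as large as every value in the set
        if all(area >= s for s in subquery):
            result.append(outer)
    return result


print([r[0] for r in largest_by_continent(WORLD)])  # [3, 4]
```

Row 4 is kept as well, because it is the only (and therefore largest) country in continent "b".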
I'm doing this question:
Which countries have a GDP greater than every country in Europe? [Give the name only.] (Some countries may have NULL gdp values)
The suggested answer from the website is:
SELECT name
FROM world
WHERE GDP >= ALL(SELECT GDP
FROM world
WHERE population>0 AND continent = 'Europe' )
Here it uses the ALL keyword and needs to handle NULL values with WHERE population > 0.
My solution is like this:
SELECT name
FROM world
WHERE GDP >= (SELECT MAX(GDP)
FROM world
WHERE continent = 'Europe')
I use the MAX function, and it seems that in this case we don't need to worry about NULL values.
Is my solution right? What's the trade-off of the two solutions?
Aggregate functions like MAX ignore NULL values, so the condition
... WHERE GDP >= (SELECT MAX(GDP) FROM world WHERE continent = 'Europe')
is true if GDP is greater than or equal to the GDP of every European country that has a non-NULL GDP.
And it is equivalent to a condition like:
GDP >= ALL(SELECT GDP FROM world WHERE continent = 'Europe' and GDP is not null)
So if this is what you want to achieve (and I would interpret the exercise that way), then your approach is correct.
There is very little documentation on the performance of the two approaches. The relevant page is in the MySQL docs:
MySQL rewrites IN, ALL, ANY, and SOME subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed.
MySQL enhances expressions of the following form with an expression involving MIN() or MAX(), unless NULL values or empty sets are involved:
value {ALL|ANY|SOME} {> | < | >= | <=} (uncorrelated subquery)
For example, this WHERE clause:
WHERE 5 > ALL (SELECT x FROM t)
might be treated by the optimizer like this:
WHERE 5 > (SELECT MAX(x) FROM t)
Therefore, in your particular case, the website uses ALL because of possible NULL values in the data. If there were no NULLs (and the subquery could never be empty), you could replace ALL with MAX as described above.
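Here is a minimal hand-written Python sketch of where the two forms differ. It doesn't use a database. ge_all mimics value >= ALL (subquery) under SQL's three-valued logic, and ge_max mimics value >= (SELECT MAX(...)). None stands for NULL/UNKNOWN. The GDP values are hypothetical. In a WHERE clause, both FALSE and UNKNOWN drop the row, so the interesting cases are the ones where the two functions disagree.

```python
from typing import Iterable, Optional


def ge_all(value: Optional[int], subquery: Iterable[Optional[int]]) -> Optional[bool]:
    """value >= ALL (subquery) with SQL three-valued logic; None means NULL/UNKNOWN."""
    unknown = False
    for s in subquery:
        if value is None or s is None:
            unknown = True   # any comparison involving NULL is UNKNOWN
        elif value < s:
            return False     # a single FALSE comparison makes ALL false
    return None if unknown else True  # an empty set gives TRUE


def ge_max(value: Optional[int], subquery: Iterable[Optional[int]]) -> Optional[bool]:
    """value >= (SELECT MAX(...)): MAX skips NULLs and is NULL for an empty set."""
    non_null = [s for s in subquery if s is not None]
    if value is None or not non_null:
        return None
    return value >= max(non_null)


europe = [100, 200, None]  # hypothetical European GDPs, one of them NULL
print(ge_all(300, europe), ge_max(300, europe))  # None True
print(ge_all(5, []), ge_max(5, []))              # True None
```

With a NULL in the set, ALL can never be TRUE, but MAX simply skips the NULL. With an empty set the difference runs the other way, which matches the "unless NULL values or empty sets are involved" caveat in the MySQL docs.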
I have a table with two columns, country and state, and I want to display them in the following format:
Country     State
----------- ---------
India       Delhi
            Bangalore
            Kolkata
            Mumbai
USA         California
            Florida
            Las Vegas
            Virginia
That is, "India" should appear only once in the Country column, and repeated values should be shown as blanks when I select country and state from the table.
Thanks in advance
Presentation is usually, if not always, better done outside of SQL, so I'd recommend doing this in your presentation layer. If it has to be done in the query, though, you can use session (user-defined) variables:
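-- Relies on @country being read before it is reassigned, row by row in sorted order; MySQL does not guarantee this evaluation order.
SELECT Country, State FROM (
SELECT IF(Country=@country, '', Country) Country, State, @country := Country
FROM (SELECT Country, State FROM Table1 ORDER BY Country, State) dummy1,
(SELECT @country:='') dummy2
) dummy3;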
An SQLfiddle to test with.
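Following the recommendation to do this in the presentation layer, here is a minimal Python sketch that blanks repeated country names. It assumes the rows have already been fetched as (country, state) pairs. Like the ORDER BY Country, State in the query, it sorts first, so Bangalore comes before Delhi rather than following the order in the question. blank_repeated_countries is a hypothetical helper.

```python
def blank_repeated_countries(rows):
    """Sort (country, state) pairs and blank a country that repeats the previous row's."""
    result = []
    previous = None
    for country, state in sorted(rows):
        result.append(("" if country == previous else country, state))
        previous = country
    return result


ROWS = [
    ("India", "Delhi"), ("India", "Bangalore"), ("India", "Kolkata"), ("India", "Mumbai"),
    ("USA", "California"), ("USA", "Florida"), ("USA", "Las Vegas"), ("USA", "Virginia"),
]

for country, state in blank_repeated_countries(ROWS):
    print(f"{country:<11} {state}")
```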
Just to show a (probably) better way, you can use this to get a list of states per country and process it further in your presentation layer:
SELECT Country, GROUP_CONCAT(State) FROM Table1 GROUP BY Country;
Another SQLfiddle.
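The GROUP_CONCAT variant can be tried with Python's sqlite3 module. SQLite also provides GROUP_CONCAT, with a comma as the default separator. SQLite doesn't guarantee the order of values inside each group, so this sketch splits and sorts them in Python, playing the role of the presentation layer. MySQL also accepts GROUP_CONCAT(State ORDER BY State) if you'd rather sort in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Country TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [
        ("India", "Delhi"), ("India", "Bangalore"), ("India", "Kolkata"), ("India", "Mumbai"),
        ("USA", "California"), ("USA", "Florida"), ("USA", "Las Vegas"), ("USA", "Virginia"),
    ],
)
rows = conn.execute(
    "SELECT Country, GROUP_CONCAT(State) FROM Table1 GROUP BY Country ORDER BY Country"
).fetchall()

# One entry per country; the comma-separated list is split and sorted here.
states_by_country = {country: sorted(states.split(",")) for country, states in rows}
for country, states in states_by_country.items():
    print(country, ", ".join(states))
```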
Use PL/SQL. Moreover, your table would be violating 5th normal form.
I have been intrigued by a problem on SQLZoo. It is a "greatest-n-per-group" problem. I would like to understand how the engine is operating.
A table called bbc contains the name, region of the world and population of each country:
bbc( name, region, population)
The given task is to select the most populous country of each region, showing its name, the region and population.
The solution provided is:
SELECT region, name, population FROM bbc x
WHERE population >= ALL
(SELECT population FROM bbc y
WHERE y.region=x.region
AND population>0)
1. Main Question. I am finding this a bit of a mind twister. I would like to understand how the engine processes this, because at first blush it seems there is some kind of co-dependence (x depending on y, and y depending on x). Does the engine follow some kind of recursion to produce the final selection? Or am I missing something, such that either x or y is actually fixed?
2. Secondary Question. Oddly, when I pull the "AND population>0" out of the parentheses and leave it on its own at the bottom, one of the regions (Europe / Russia) goes missing from the 8 results. Why? I don't understand that.
And indeed, when I try the query on the world database (available from the MySQL website on the same page as Sakila), the behavior is different:
With population > 0 out of the parentheses, I get 6 regions. Six is the right number in this database, because "SELECT continent FROM country GROUP BY continent" reveals seven continents, of which one is Antarctica, which includes 5 "countries", all with a 0 population.
So that seems right.
SELECT continent, `name`, population FROM country X
WHERE population >= ALL
(SELECT population FROM country Y
WHERE Y.`Continent` = X.`Continent`)
AND population>0
On the other hand, when I put "population > 0" back inside the parentheses as on SQLZoo, I also get 5 countries with a population of zero (the countries "belonging to Antarctica"). It doesn't matter whether I specify x.population or y.population; I still get zeroes.
continent      name                                          population
-------------  --------------------------------------------  ------------
Antarctica     Antarctica                                    0
Antarctica     French Southern territories                   0
Oceania        Australia                                     18886000
South America  Brazil                                        170115000
Antarctica     Bouvet Island                                 0
Asia           China                                         1277558000
Antarctica     Heard Island and McDonald Islands             0
Africa         Nigeria                                       111506000
Europe         Russian Federation                            146934000
Antarctica     South Georgia and the South Sandwich Islands  0
North America  United States                                 278357000
Very much looking for insights on these questions!
Wishing you all a beautiful week.
:)
Notes:
For reference, the problem is number 3a on this page:
http://old.sqlzoo.net/1a.htm?answer=1
A thread mentioning the "greatest-n-per-group" problem for the same query:
MySQL world database Trying to avoid subquery
The world database is available here: http://dev.mysql.com/doc/index-other.html
Main Question. I am finding this a bit of a mind twister. I would like to understand how the engine processes this, because at first blush it seems there is some kind of co-dependence (x depending on y, and y depending on x). Does the engine follow some kind of recursion to produce the final selection? Or am I missing something, such that either x or y is actually fixed?
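This isn't recursion. For each row of x, the inner query over y is (conceptually) run with x.region fixed to that row's value. Nothing in y depends on the outer query's result, so there is no circularity. See this from the MySQL docs. Their solution to the problem is equivalent to this: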
SELECT region, name, population FROM bbc x
WHERE population =
(SELECT max(population) FROM bbc y
WHERE y.region=x.region
)
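The MAX form can be tried directly in SQLite through Python's sqlite3 module. SQLite supports correlated scalar subqueries but not >= ALL. The rows are hypothetical and include one NULL population. MAX skips the NULL, and NULL = 140 is never true, so that row simply drops out.

```python
import sqlite3

# Hypothetical bbc rows: (name, region, population)
ROWS = [
    ("Alpha", "Europe", 80),
    ("Beta", "Europe", 140),
    ("Epsilon", "Europe", None),  # NULL population
    ("Gamma", "Asia", 1300),
    ("Delta", "Asia", 120),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany("INSERT INTO bbc VALUES (?, ?, ?)", ROWS)

# The inner query is re-evaluated per outer row, with x.region fixed.
sql = """
    SELECT region, name, population FROM bbc x
    WHERE population =
        (SELECT max(population) FROM bbc y WHERE y.region = x.region)
    ORDER BY region
"""
result = conn.execute(sql).fetchall()
print(result)  # [('Asia', 'Gamma', 1300), ('Europe', 'Beta', 140)]
```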
Secondary Question. Oddly, when I pull the "AND population>0" out of the parentheses and leave it on its own at the bottom, one of the regions (Europe / Russia) goes missing from the 8 results. Why? I don't understand that.
A slight change (as suggested by ypercube above) works:
SELECT region, name, population FROM bbc x
WHERE population >= ALL
(SELECT population FROM bbc y
WHERE y.region=x.region
AND population IS NOT NULL)
This query
SELECT region, name, population FROM bbc x
WHERE population is null
returns a row. I'm not sure why population should be nullable, and I didn't take a good look at the rest of the data, but otherwise the query should work fine without the > 0.
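Here is a minimal Python sketch of why moving population > 0 changes the results. It uses SQL's three-valued logic for >= ALL, with None standing for NULL/UNKNOWN. The rows are hypothetical and combine both situations from the question in one table. One is a NULL population in Europe, assumed here because Europe is the region that disappears. The other is zero-population Antarctica entries, as in the MySQL world database. filter_inside and filter_outside are made-up names for the two placements of the filter.

```python
def ge_all(value, values):
    """value >= ALL (values) under SQL three-valued logic; None means NULL/UNKNOWN."""
    unknown = False
    for v in values:
        if value is None or v is None:
            unknown = True
        elif value < v:
            return False
    return None if unknown else True  # TRUE for an empty set


def is_positive(p):
    # SQL `p > 0` is UNKNOWN (so not true) when p is NULL
    return p is not None and p > 0


def filter_inside(rows):
    """... WHERE population >= ALL (SELECT population ... AND population > 0)"""
    return [
        name
        for name, region, pop in rows
        if ge_all(pop, [p for _, r, p in rows if r == region and is_positive(p)]) is True
    ]


def filter_outside(rows):
    """... WHERE population >= ALL (SELECT population ...) AND population > 0"""
    return [
        name
        for name, region, pop in rows
        if ge_all(pop, [p for _, r, p in rows if r == region]) is True and is_positive(pop)
    ]


# Hypothetical rows: (name, region, population)
ROWS = [
    ("Russia", "Europe", 146_000_000),
    ("France", "Europe", 60_000_000),
    ("Microstate", "Europe", None),  # hypothetical NULL population
    ("China", "Asia", 1_300_000_000),
    ("Bouvet Island", "Antarctica", 0),
    ("Heard Island", "Antarctica", 0),
]

print(filter_inside(ROWS))   # ['Russia', 'China', 'Bouvet Island', 'Heard Island']
print(filter_outside(ROWS))  # ['China']
```

With the filter inside, Antarctica's subquery set is empty, and ALL over an empty set is true, which is where the zero-population rows come from. With the filter outside, Europe's set contains a NULL, so Russia's comparison is UNKNOWN rather than TRUE and the row is dropped.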
Also, this is different from the greatest-n-per-group problem. In that problem you seek to find the top N items instead of just the top one.
I don't understand the problem with returning multiple rows.
Here is my table, bbc:
name         region       area     population  gdp
Afghanistan  South Asia   652225   26000000
Albania      Europe       28728    3200000     6656000000
Algeria      Middle East  2400000  32900000    75012000000
Andorra      Europe       468      64000
Angola       Africa       1250000  14500000    14935000000
etc.
Question: List the name and region of countries in the regions containing 'India' and 'Iran'.
This is my statement:
select name from bbc where region = (select region from bbc where name='India' or name='Iran')
It returns:
sql: error: Subquery returns more than 1 row
What's wrong with my statement? The answer should be in the form of a SELECT statement within a SELECT statement.
thank you!
This is because you are trying to compare region to a set of values rather than a single value. Instead, try using IN:
select name
from bbc
where region in
(select region from bbc where name='India' or name='Iran')
Use slightly different syntax and it'll work:
SELECT name
FROM bbc
WHERE region IN
(
SELECT region FROM bbc WHERE name='India' OR name='Iran'
)
The only difference is that instead of equals (=), we use IN.
Your previous query failed because = compares one value with one other value, and you were accidentally comparing one value with multiple values (hence "Subquery returns more than 1 row"). IN instead checks whether region is among the results returned by the subquery.
select name,region from bbc where region IN (select region from bbc where name IN('India','Iran'))
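The IN version can be tried with Python's sqlite3 module. The rows are taken from the table in the question, plus India and Iran, whose regions here are assumptions made for illustration. Unlike MySQL, SQLite doesn't raise an error when = meets a multi-row subquery; it silently uses the first row. That makes it a poor place to reproduce the original error, so this sketch only shows the IN form.

```python
import sqlite3

ROWS = [
    ("Afghanistan", "South Asia"),
    ("Albania", "Europe"),
    ("Algeria", "Middle East"),
    ("Andorra", "Europe"),
    ("Angola", "Africa"),
    ("India", "South Asia"),   # region assumed for illustration
    ("Iran", "Middle East"),   # region assumed for illustration
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT)")
conn.executemany("INSERT INTO bbc VALUES (?, ?)", ROWS)

# The subquery returns two regions; IN tests membership in that set.
sql = """
    SELECT name, region FROM bbc
    WHERE region IN (SELECT region FROM bbc WHERE name IN ('India', 'Iran'))
    ORDER BY name
"""
result = conn.execute(sql).fetchall()
for name, region in result:
    print(name, region)
```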