How do I understand correlated queries? - mysql

I just started an SQL exercise-style tutorial, but I still haven't grasped the concept of correlated queries.
name, area and continent are fields of a table.
The task is to find the largest country (by area) in each continent, showing the continent, the name and the area.
My draft so far:
SELECT continent, name, population FROM world x
WHERE area >= ALL
(SELECT area FROM world y
WHERE y.continent=x.continent
AND population>0)
I've tried reading up on it on a few other blogs, but I still need to understand the logic behind correlated queries.

I assume the query you posted works, and you just need clarification of what it does.
SELECT continent, name, population
FROM world x
WHERE area >= ALL (
SELECT area FROM world y
WHERE y.continent=x.continent
AND population>0
)
The query translates to:
"Get the continent, name, and population of each country whose area is greater than or equal to the area of every country (with a population above 0) on the same continent."
The WHERE clause in the inner query links the two queries: here it restricts the inner query to countries on the same continent as the current outer row. Without the y.continent=x.continent condition, the query would return only the country with the largest area in the world.
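To see what the correlation condition does, here is a minimal sketch that runs both versions against a tiny made-up table. It uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite does not support the ALL operator, so the sketch swaps in the MAX form. That form returns the same rows here only because the made-up data has no NULL areas and every continent has a qualifying country. The country names and numbers are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?, ?)",
    [("A1", "a", 100, 19), ("A2", "a", 200, 10), ("A3", "a", 300, 20), ("B1", "b", 15, 2000)],
)

# Correlated: the inner query only sees rows on the outer row's continent.
correlated = conn.execute("""
    SELECT continent, name, area FROM world x
    WHERE area >= (SELECT MAX(area) FROM world y
                   WHERE y.continent = x.continent AND population > 0)
    ORDER BY continent
""").fetchall()

# Uncorrelated: without the link, the inner query spans the whole table.
uncorrelated = conn.execute("""
    SELECT continent, name, area FROM world x
    WHERE area >= (SELECT MAX(area) FROM world y WHERE population > 0)
""").fetchall()

print(correlated)    # [('a', 'A3', 300), ('b', 'B1', 15)]
print(uncorrelated)  # [('a', 'A3', 300)]
```

Dropping the continent condition turns "largest per continent" into "largest in the whole table".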

You can think of a correlated subquery as a looping mechanism. This is not necessarily how it is implemented, but it describes what it does.
Consider data such as:
row continent area population
1 a 100 19
2 a 200 10
3 a 300 20
4 b 15 2000
The outer query loops through each row and, for each one, looks at all the matching rows. So it first takes record 1:
row continent area population
1 a 100 19
It then runs the subquery:
(SELECT w2.area
FROM world w2
WHERE w2.continent = w1.continent AND
w2.population > 0
)
And substitutes in the values from the outer table:
(SELECT w2.area
FROM world w2
WHERE w2.continent = 'a' AND
w2.population > 0
)
This returns the set (100, 200, 300).
Then it applies the condition:
where w1.area >= all (100, 200, 300)
(This isn't really valid SQL but it conveys the idea.)
Well, we know that w1.area = 100, so this condition is false.
The process is then repeated for each of the rows. For the "a" continent, the only row that meets the condition is the third one -- the one with the largest area.
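The looping description above can be sketched in plain Python. This is only a model of the semantics; a real optimizer may evaluate the query quite differently. It reuses the four sample rows and ignores NULL handling. It relies on the fact that Python's all() over an empty list is True, just as ">= ALL" over an empty subquery result is true.

```python
# In-memory copy of the four sample rows above.
world = [
    {"row": 1, "continent": "a", "area": 100, "population": 19},
    {"row": 2, "continent": "a", "area": 200, "population": 10},
    {"row": 3, "continent": "a", "area": 300, "population": 20},
    {"row": 4, "continent": "b", "area": 15, "population": 2000},
]

def largest_per_continent(rows):
    """Model of: SELECT ... FROM world w1 WHERE w1.area >= ALL (correlated subquery)."""
    result = []
    for w1 in rows:  # outer query: one pass per row
        # Inner query, re-run with w1's continent substituted in.
        areas = [w2["area"] for w2 in rows
                 if w2["continent"] == w1["continent"] and w2["population"] > 0]
        # ">= ALL": keep w1 only if its area is >= every area in that set.
        if all(w1["area"] >= a for a in areas):
            result.append(w1)
    return result

for r in largest_per_continent(world):
    print(r["row"], r["continent"], r["area"])
# prints:
# 3 a 300
# 4 b 15
```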

Related

What are the MySQL queries for the following tasks? Are my solutions and line of thinking correct so far? Thank you very much

This is my very first question on Quora. Thanks for any recommendations, solutions and remarks. I was not given many specifics or much data; this is an assignment where one must use one's imagination to fill in particular data and variables. What counts is the correct logic, the approach and covering the possibilities. Your help is highly appreciated! Thank you!
Country and Continent table
Required Visual Result for Question 3
Questions:
1) Write a query that would select all countries with GDP of more than 1 000 000 000 USD.
2) Write a query that would return all countries in Europe (specifically) with GDP of more than 1 000 000 000 USD.
3) Write a query that lists all continents with GDP per continent (as the sum of the GDP of all countries). Each country belongs to one continent only.
For what the result should look like, please refer to the "Required Visual Result for Question 3" image.
My Solutions:
1) select * from countries where GDP > 1000000000;
2) select * from countries where continent_id = 2 and GDP > 1000000000;
3) select sum(GDP) from countries where continent_id = 4;
However, in 3) I can only get the GDP sum displayed, and I don't know how to show the continent's name on the left as well. If possible, please help me display the continent's name with the relevant GDP sum right next to it, on the right-hand side.
Welcome!
Your image shows the tables as country and continent, but your queries refer to countries. Which is it, please?
Assuming your image is correct and the queries are wrong, number 3 would be as below.
With number 3 you're currently only going to get the sum for the continent with the ID of 4:
select sum(GDP) from country where continent_id = 4;
So what you want to do is remove the WHERE and add GROUP BY continent_id, to get one result per continent:
select continent_id, sum(GDP) from country GROUP BY continent_id;
Now, to get the continent name included in your results, you can use the JOIN syntax.
In this instance you want all your records from country, and only the records from continent that match your join condition, which will be the continent_id from country and the id from the continent table.
SELECT
`continent`.`name`, SUM(`GDP`) AS `total_gdp`
FROM `country`
LEFT JOIN `continent`
ON `country`.`continent_id` = `continent`.`id`
GROUP BY `country`.`continent_id`, `continent`.`name`
ORDER BY `continent`.`name` ASC;
That should give you the results, one row per continent, as required.
I've qualified the columns with table names because it makes it clearer which table each column comes from.
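Here is a minimal sketch of the grouped join. It runs through Python's sqlite3 module as a stand-in for MySQL. The table layout follows the column names used in this thread (country.continent_id, continent.id, continent.name, GDP). The ids, country names and GDP figures are all made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE continent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT,
                          continent_id INTEGER, GDP INTEGER);
    INSERT INTO continent VALUES (1, 'Africa'), (2, 'Europe'), (3, 'Asia');
    -- Made-up GDP values, small on purpose so the sums are easy to check.
    INSERT INTO country VALUES
        (1, 'c1', 1, 10), (2, 'c2', 1, 20),
        (3, 'c3', 2, 30), (4, 'c4', 2, 40),
        (5, 'c5', 3, 50);
""")

# One output row per continent: its name plus the sum of its countries' GDP.
rows = conn.execute("""
    SELECT continent.name, SUM(country.GDP) AS total_gdp
    FROM country
    LEFT JOIN continent ON country.continent_id = continent.id
    GROUP BY country.continent_id, continent.name
    ORDER BY continent.name ASC
""").fetchall()

print(rows)  # [('Africa', 30), ('Asia', 50), ('Europe', 70)]
```

Listing continent.name in GROUP BY as well as the id keeps the query valid on servers that reject non-grouped columns in the select list.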

MySQL: Conditional MIN() with GROUP BY

I have this table called times where I record race information for a racing game:
race_id map name time
30509 desert Peter 12.68
30510 desert Jakob 10.72
30511 desert Peter 18.4
30512 jungle Peter 39.909
30513 jungle Peter 39.84
30514 desert Harry 16.129
30515 space Harry 774.765
30516 jungle Jonas 46.047
30517 city Jonas 23.54
30518 city Jonas 23.13
30519 desert Mike 22.9
30520 space Fred 174.244
I have two questions. How would I best go about:
Finding the lowest time (world record) on a given map?
I have tried this query:
SELECT *, MIN(time) FROM times WHERE map = 'desert';
This yields a seemingly arbitrary row (not the one holding the record), plus an added column called MIN(time) that does contain the correct lowest time.
Finding the lowest time on all maps, but only if it's done by a certain player (find all world records by given player)?
For this I have tried this query:
SELECT *, MIN(time) FROM times WHERE name = 'Peter' GROUP BY map;
This seems to return only the first row by the given name for each map, regardless of whether it's the lowest time or not.
I'm fairly new to SQL(MySQL), so I might be missing something obvious here. I've been looking around for quite a while now, and any help would be greatly appreciated. Thanks!
If you want the fastest time on a given map, you can just use ORDER BY and LIMIT:
select *
from times
where map = 'desert'
order by time limit 1
On the other hand, if you want all the map records held by a given user, it is a bit different. One option uses a correlated subquery for filtering:
select t.*
from times t
where
name = 'Peter'
and time = (select min(t1.time) from times t1 where t1.map = t.map)
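Both queries from this answer can be tried against the sample times table from the question. The sketch again uses Python's sqlite3 module as a stand-in for MySQL, and the column types are an assumption. The correlated version compares time to the per-map minimum, so if two players tie for a record, both rows are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE times (race_id INTEGER, map TEXT, name TEXT, time REAL)")
conn.executemany("INSERT INTO times VALUES (?, ?, ?, ?)", [
    (30509, "desert", "Peter", 12.68), (30510, "desert", "Jakob", 10.72),
    (30511, "desert", "Peter", 18.4), (30512, "jungle", "Peter", 39.909),
    (30513, "jungle", "Peter", 39.84), (30514, "desert", "Harry", 16.129),
    (30515, "space", "Harry", 774.765), (30516, "jungle", "Jonas", 46.047),
    (30517, "city", "Jonas", 23.54), (30518, "city", "Jonas", 23.13),
    (30519, "desert", "Mike", 22.9), (30520, "space", "Fred", 174.244),
])

def world_record(map_name):
    # Fastest single run on one map: sort ascending, keep the first row.
    return conn.execute(
        "SELECT * FROM times WHERE map = ? ORDER BY time LIMIT 1", (map_name,)
    ).fetchone()

def records_by(player):
    # Keep the player's rows whose time equals the minimum time on that row's map.
    return conn.execute(
        """SELECT t.* FROM times t
           WHERE t.name = ?
             AND t.time = (SELECT MIN(t1.time) FROM times t1 WHERE t1.map = t.map)
           ORDER BY t.race_id""",
        (player,),
    ).fetchall()

print(world_record("desert"))  # (30510, 'desert', 'Jakob', 10.72)
print(records_by("Peter"))     # [(30513, 'jungle', 'Peter', 39.84)]
```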
Finding the lowest time (world record) on a given map
SELECT `time`
FROM times
WHERE map = 'desert'
ORDER BY `time` ASC
LIMIT 1
Finding the lowest time on all maps, but only if it's done by a certain player (find all world records by given player)
SELECT t.map, t.`time`
FROM times t
JOIN (SELECT map, MIN(`time`) AS best
FROM times
GROUP BY map) m
ON m.map = t.map AND m.best = t.`time`
WHERE t.name = 'Peter'

MySQL get duplicate rows in subquery

I want to display all duplicate records from my table. The rows look like this:
uid planet degree
1 1 104
1 2 109
1 3 206
2 1 40
2 2 76
2 3 302
My subquery has many different OR conditions with different combinations, and I want to count every one of them that matches, but it only displays the first match of each planet and degree.
Query:
SELECT DISTINCT
p.uid,
(SELECT COUNT(*)
FROM Params AS p2
WHERE p2.uid = p.uid
AND(
(p2.planet = 1 AND p2.degree BETWEEN 320 - 10 AND 320 + 10) OR
(p2.planet = 7 AND p2.degree BETWEEN 316 - 10 AND 316 + 10)
...Some more OR statements...
)
) AS counts FROM Params AS p HAVING counts > 0 ORDER BY p.uid DESC
Any solution, folks?
So, the problem most people have with their counting/joining/subquery/grouping queries is that the base query isn't right. The following may seem like complete overkill for this question ;o)
base data
In this particular example, what you want as your base data is, first of all, this:
(uidA, planetA, uidB, planetB) for every combination of player A and player B planets. That one is quite simple (l is for left, r is for right):
SELECT l.uid, l.planet, r.uid, r.planet
FROM params l, params r
First step done.
filter data
Now you want to determine whether, for each row (meaning each pair of planets), the planets collide or almost collide. This is where the WHERE comes in.
WHERE ABS(l.degree-r.degree) < 10
would, for example, leave only those pairs of planets whose degrees differ by less than 10. More complex conditions are possible (your crazy conditional ...); for example, if the planets have different diameters, you can add extra terms. However, my advice would be to move the additional data that is currently in your query into tables.
For example, if every player's 1st planet has the same size, you could have a table with (planet_id, size). If every planet can have a different size, add the size to the params table as a column.
Then your WHERE clause could look like this:
WHERE ABS(l.degree-r.degree) < l.size+r.size
If, for example, two planets with sizes 5 and 10 should be at least 15 degrees apart, this condition finds all the pairs that aren't.
Let's assume you have a suitable condition, so at this point we have a list of (uidA, planetA, uidB, planetB) for planets that are colliding or close to colliding (whichever semantics you chose). The next step is to get the data you're actually interested in:
To limit uidA to a specific user_id (the currently logged-in user, for example):
add l.uid = <uid> to your WHERE.
To count, for every planet A, how many planets B threaten a collision:
add GROUP BY l.uid, l.planet,
replace r.uid, r.planet with count(*) as counts in your SELECT clause,
and then you can even filter with HAVING counts > 1 (HAVING is the WHERE for after you have grouped).
And of course, you can also add conditions to your WHERE to:
filter out certain players B that may not have planetary interactions with player A: r.uid NOT IN (1)
find only self-collisions: l.uid = r.uid
find only non-self collisions: l.uid <> r.uid
find only collisions with one specific planet: l.planet = 1
conclusion
A structured approach, where you start from the correct base data, then filter it appropriately, and then group it, is usually the best approach. If some of these concepts are unclear to you, please read up on them; there are manuals everywhere.
The final query could look something like this:
SELECT l.uid, l.planet, count(*) as counts
FROM params l, params r
WHERE [ collision-condition ]
GROUP BY l.uid, l.planet
HAVING counts > 0
If you want to collide with a non-planet object, you might want to make a "virtual table". Instead of FROM params l, params r, you write something like this (possibly with different fields; I just assume you add a size field that is used somehow):
FROM params l, (SELECT 240 as degree, 2 as planet, 5 as size) r
For multiple objects:
FROM params l, (SELECT 240 as degree, 2 as planet, 5 as size
UNION
SELECT 250 as degree, 3 as planet, 10 as size
UNION ...) r
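Here is a minimal runnable sketch of the self-join approach. It uses the sample rows from the question and Python's sqlite3 module as a stand-in for MySQL. It makes two labeled assumptions. First, the collision condition is simply a degree difference below 10, ignoring wrap-around at 360 degrees. Second, a planet is not counted as colliding with itself, which is why the sketch adds a NOT (...) exclusion that the final query above does not have. The fixed object at 300 degrees is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE params (uid INTEGER, planet INTEGER, degree INTEGER)")
conn.executemany("INSERT INTO params VALUES (?, ?, ?)", [
    (1, 1, 104), (1, 2, 109), (1, 3, 206),
    (2, 1, 40), (2, 2, 76), (2, 3, 302),
])

# Base data: every (left planet, right planet) pair; filter it; then group per left planet.
pairs = conn.execute("""
    SELECT l.uid, l.planet, COUNT(*) AS counts
    FROM params l, params r
    WHERE ABS(l.degree - r.degree) < 10                -- assumed collision condition
      AND NOT (l.uid = r.uid AND l.planet = r.planet)  -- skip a planet paired with itself
    GROUP BY l.uid, l.planet
    ORDER BY l.uid, l.planet
""").fetchall()

# "Virtual table": collide every planet with one fixed, hypothetical object at 300 degrees.
near_object = conn.execute("""
    SELECT l.uid, l.planet, COUNT(*) AS counts
    FROM params l, (SELECT 300 AS degree) r
    WHERE ABS(l.degree - r.degree) < 10
    GROUP BY l.uid, l.planet
""").fetchall()

print(pairs)        # [(1, 1, 1), (1, 2, 1)]
print(near_object)  # [(2, 3, 1)]
```

Without the self-pair exclusion, every planet would match itself (difference 0), so each count would be one higher.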

MAX or ALL in sql

I'm doing this question:
Which countries have a GDP greater than every country in Europe? [Give the name only.] (Some countries may have NULL gdp values)
The suggested answer from the website is:
SELECT name
FROM world
WHERE GDP >= ALL(SELECT GDP
FROM world
WHERE population>0 AND continent = 'Europe' )
Here it uses the ALL keyword, and it needs to take care of the NULL values using WHERE population > 0.
My solution is like this:
SELECT name
FROM world
WHERE GDP >= (SELECT MAX(GDP)
FROM world
WHERE continent = 'Europe')
I use the MAX function, and it seems that in this case we don't need to take care of NULL values.
Is my solution right? What's the trade-off of the two solutions?
Aggregate functions like MAX ignore NULL values, so the meaning of the condition
... WHERE GDP >= (SELECT MAX(GDP) FROM world WHERE continent = 'Europe')
is: it is true if GDP is greater than or equal to the GDP of every European country that has a GDP defined.
And, as long as at least one European country has a non-NULL GDP, it is equivalent to a condition like:
GDP >= ALL(SELECT GDP FROM world WHERE continent = 'Europe' and GDP is not null)
So if this is what you want to achieve (and I would interpret the exercise that way), your approach is correct.
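The NULL difference between the two forms can be modeled in a few lines of Python, with None standing in for SQL NULL/UNKNOWN. This is a hypothetical pair of helpers that mirrors the standard SQL rules, not MySQL code. For ALL, a single FALSE comparison makes the result false. Otherwise, any NULL makes it UNKNOWN, so WHERE drops the row, and an empty set makes it TRUE. MAX skips NULLs and returns NULL for an empty set. The GDP values are made up.

```python
from typing import Optional, Sequence

def ge_all(value: Optional[float], subquery: Sequence[Optional[float]]) -> Optional[bool]:
    """Three-valued `value >= ALL (subquery)`; None means NULL/UNKNOWN."""
    unknown = False
    for item in subquery:
        if value is None or item is None:
            unknown = True            # any comparison with NULL is UNKNOWN
        elif not value >= item:
            return False              # one definite FALSE makes ALL false
    return None if unknown else True  # an empty subquery gives TRUE

def ge_max(value: Optional[float], subquery: Sequence[Optional[float]]) -> Optional[bool]:
    """`value >= (SELECT MAX(...))`: MAX skips NULLs; MAX of no values is NULL."""
    non_null = [x for x in subquery if x is not None]
    if value is None or not non_null:
        return None
    return value >= max(non_null)

europe = [300, 500, None]   # hypothetical European GDPs, one of them NULL
print(ge_all(600, europe))  # None  (UNKNOWN, so WHERE drops the row)
print(ge_max(600, europe))  # True
print(ge_all(400, europe))  # False
```

Adding "GDP is not null" to the ALL subquery corresponds to calling ge_all with the None removed, which then agrees with ge_max except when the set is empty.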
There is only a little documentation regarding the performance of the two; the relevant page is in the MySQL docs:
MySQL rewrites IN, ALL, ANY, and SOME subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed.
MySQL enhances expressions of the following form with an expression involving MIN() or MAX(), unless NULL values or empty sets are involved:
value {ALL|ANY|SOME} {> | < | >= | <=} (uncorrelated subquery)
For example, this WHERE clause:
WHERE 5 > ALL (SELECT x FROM t)
might be treated by the optimizer like this:
WHERE 5 > (SELECT MAX(x) FROM t)
Therefore, in your particular case, the website uses ALL because of the possible NULL values in the data. If that were not the case, you could replace ALL with MAX as stated above.

Why doesn't this query work with such a condition?

I'm trying to solve some tasks from this tutorial: http://sqlzoo.net/wiki/SELECT_within_SELECT_Tutorial
For the last task (number 8) I wrote this query:
select name, continent from world a
where a.population >
(select 3*max(population) from world b
where b.continent = a.continent)
But this query doesn't return any rows, while almost the same query works (I just added an additional condition at the end of the subquery). What's the matter? Why doesn't the first one return rows, when the only difference is the condition on the country names?
select name, continent from world a
where a.population >
(select 3*max(population) from world b
where b.continent = a.continent and a.name <> b.name)
Let me translate what both queries do into English, so you can see the difference.
First query: get all countries whose population is more than 3 times that of the most populous country on the same continent.
Second query: get all countries whose population is more than 3 times that of the most populous country on the same continent, not counting the country itself.
In your first query, every country is part of its own comparison set, so its population is at most the continent's maximum and can never exceed 3 times that maximum. That is why the query returns 0 rows.
In the second query, the country itself is excluded, so a country whose population is more than 3 times that of every other country on its continent is returned.
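A small sketch makes the difference concrete. It runs both queries through Python's sqlite3 module, standing in for MySQL, on a made-up table. On continent X, Big has more than 3 times the population of every other country; on continent Y, no country does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany("INSERT INTO world VALUES (?, ?, ?)", [
    ("Big", "X", 1000), ("Mid", "X", 200), ("Small", "X", 100),  # hypothetical rows
    ("P", "Y", 500), ("Q", "Y", 400),
])

# First query: the subquery's MAX includes the outer country itself.
first = conn.execute("""
    SELECT name, continent FROM world a
    WHERE a.population > (SELECT 3 * MAX(population) FROM world b
                          WHERE b.continent = a.continent)
""").fetchall()

# Second query: the outer country is excluded from its own comparison set.
second = conn.execute("""
    SELECT name, continent FROM world a
    WHERE a.population > (SELECT 3 * MAX(population) FROM world b
                          WHERE b.continent = a.continent AND a.name <> b.name)
""").fetchall()

print(first)   # []
print(second)  # [('Big', 'X')]
```

For Big, the first query compares 1000 with 3 * 1000, while the second compares it with 3 * 200.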