Correlated subqueries using SQL

I just started coding in SQL. This is an example of a correlated subquery I am struggling with. Can anyone please explain in layman's terms what's going on here:
SELECT id
FROM flights AS f
WHERE distance > (SELECT AVG(distance)
FROM flights
WHERE carrier = f.carrier);

Your query is semantically identical to the following, which (personally) I find easier to read. (I suspect it's fractionally faster too)...
SELECT f.id
FROM flights f
JOIN
( SELECT carrier
, AVG(distance) avg_distance
FROM flights
GROUP
BY carrier
) x
ON x.carrier = f.carrier
AND f.distance > x.avg_distance;

Your query returns all flights whose distance is greater than the average distance for that flight's carrier.
An example:
id | flight   | carrier   | distance
---+----------+-----------+---------
1  | Brussels | Swiss     | 200
2  | New York | Swiss     | 2000
3  | Berlin   | Lufthansa | 300
4  | London   | Lufthansa | 400
The average distance is 1100 for Swiss and 350 for Lufthansa.
And your query returns:
2
4
Flight 2 belongs to carrier Swiss, and its distance of 2000 is greater than the Swiss average of 1100.
Flight 4 belongs to carrier Lufthansa, and its distance of 400 is greater than the Lufthansa average of 350.
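To check the explanation against the four example rows, here is a minimal sketch that runs the correlated subquery and a join-based equivalent through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption made for convenience), and the column types are guesses based on the example table.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (id INTEGER, flight TEXT, carrier TEXT, distance REAL)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        (1, "Brussels", "Swiss", 200),
        (2, "New York", "Swiss", 2000),
        (3, "Berlin", "Lufthansa", 300),
        (4, "London", "Lufthansa", 400),
    ],
)

# Correlated form: the inner query is re-evaluated for each outer row's carrier.
correlated = """
SELECT id FROM flights AS f
WHERE distance > (SELECT AVG(distance) FROM flights WHERE carrier = f.carrier)
ORDER BY id
"""

# Join form: compute every carrier's average once, then compare row by row.
joined = """
SELECT f.id FROM flights AS f
JOIN (SELECT carrier, AVG(distance) AS avg_distance
      FROM flights GROUP BY carrier) AS x
  ON x.carrier = f.carrier AND f.distance > x.avg_distance
ORDER BY f.id
"""

correlated_ids = [row[0] for row in conn.execute(correlated)]
joined_ids = [row[0] for row in conn.execute(joined)]
print(correlated_ids, joined_ids)
```

Both lists should contain the same ids, since each flight is compared only against the average of its own carrier.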

Related

How do I return a subquery more than 1 rows using having?

I have a query that returns the intended values, but only for 1 row. I need it to return at least 26 rows (one per town) based on the HAVING clause.
Town      | floor_area_sqm | resale_value
----------+----------------+-------------
toronto   | 30             | 4500
chicago   | 44             | 300
toronto   | 22             | 3000
sydney    | 54             | 3098
pittsburg | 102            | 2000
sydney    | 101            | 2000
pittsburg | 129            | 2000
SELECT town, floor_area_sqm, resale_price
FROM X.flat_prices as X
GROUP BY town
HAVING Min(floor_area_sqm) = (SELECT MIN(floor_area_sqm) FROM X.flat_prices HAVING MAX(resale_price));
By using the formula above I get this:
Town    | floor_area_sqm | resale_value
--------+----------------+-------------
chicago | 44             | 300
So the answer should show something like the following:
Town      | floor_area_sqm | resale_value
----------+----------------+-------------
chicago   | 44             | 300
toronto   | 22             | 3000
sydney    | 54             | 3098
pittsburg | 102            | 2000
It should pull the lowest sqm for the town with the highest resale value. I have 26 towns and a table of over 200k rows.
I would like to replicate this with the MAX sqm using the same formula. Is a join the way, or the only way, to do it?

Use a subquery to get the minimum sqm for each town. Join that with the table to get all the properties with that sqm. Then get the maximum resale value within each of these groups.
SELECT t1.town, t1.floor_area_sqm, MAX(t1.resale_value) AS resale_value
FROM flat_prices AS t1
JOIN (
SELECT town, MIN(floor_area_sqm) AS floor_area_sqm
FROM flat_prices
GROUP BY town
) AS t2 ON t1.town = t2.town AND t1.floor_area_sqm = t2.floor_area_sqm
GROUP BY t1.town, t1.floor_area_sqm
In MySQL 8.0 you can do it in one query with window functions, but I still haven't learned to use them.
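As a rough sketch of what the window-function version could look like, the query below ranks the flats inside each town with ROW_NUMBER() and keeps the first row. It is run through Python's sqlite3 module against the sample rows from the question; SQLite 3.25 or newer is assumed as a stand-in for MySQL 8.0, and the alias names (ranked, rn) are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flat_prices (town TEXT, floor_area_sqm INTEGER, resale_value INTEGER)"
)
conn.executemany(
    "INSERT INTO flat_prices VALUES (?, ?, ?)",
    [
        ("toronto", 30, 4500),
        ("chicago", 44, 300),
        ("toronto", 22, 3000),
        ("sydney", 54, 3098),
        ("pittsburg", 102, 2000),
        ("sydney", 101, 2000),
        ("pittsburg", 129, 2000),
    ],
)

# Within each town, order by smallest area first and break ties on the
# highest resale value, then keep only the top-ranked row.
windowed = """
SELECT town, floor_area_sqm, resale_value
FROM (
    SELECT town, floor_area_sqm, resale_value,
           ROW_NUMBER() OVER (
               PARTITION BY town
               ORDER BY floor_area_sqm ASC, resale_value DESC
           ) AS rn
    FROM flat_prices
) AS ranked
WHERE rn = 1
ORDER BY town
"""

smallest_flats = conn.execute(windowed).fetchall()
for row in smallest_flats:
    print(row)
```

Swapping ASC for DESC on floor_area_sqm would give the MAX sqm variant the question also asks about.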

This one goes for the highest resale value first, then chooses the row with the lowest sqm if several rows share that value.
select t1.town, min(t1.floor_area_sqm) as mi_sqm, t1.resale_value
from flat_prices t1
join
(select town, max(resale_value) as mx_value from flat_prices group by town) t2
on t1.town = t2.town and t1.resale_value = t2.mx_value
group by t1.town, t1.resale_value
;

How to avoid Group By working on every output?

I have a table like this:
LocationID CountryName CustomerAmount
C01 Australia 500
C02 Australia 200
C03 China 100
C04 China 200
C05 Japan 50
C06 Canada 120
I want to find the "number of customers in each country" AND the total number of customers.
Now I have the following query:
select countryName, sum(CustomerAmount)
from test
group by countryName;
I obviously got this output:
CountryName customerAmount
Australia 700
China 300
Japan 50
Canada 120
But I want the output like this
CountryName customerAmount totalAmount
Australia 700 1170
China 300 1170
Japan 50 1170
Canada 120 1170
My problem is how to put two sums of CustomerAmount side by side, where one is grouped by countryName and the other simply adds up every value in the CustomerAmount column.
Thank you in advance! Sorry if my wording is ambiguous.

One easy way is just to use a sub-query like
select countryName, sum(CustomerAmount) customerAmount,
(select Sum(customerAmount) from test) totalAmount
from test
group by countryName;
If you can use window functions (MySQL 8+), you can do
select countryName, sum(CustomerAmount) customerAmount,
sum(Sum(CustomerAmount)) over() totalAmount
from test
group by countryName;
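Note the nested SUM(): the inner one is the per-country aggregate, and the outer one is a window function that adds those per-country sums together.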
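Here is a small sketch that runs both variants on the sample table using Python's sqlite3 module, to show they produce the same rows. SQLite 3.25 or newer is assumed in place of MySQL 8, and the aliases country_total and grand_total are names chosen for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (LocationID TEXT, CountryName TEXT, CustomerAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("C01", "Australia", 500),
        ("C02", "Australia", 200),
        ("C03", "China", 100),
        ("C04", "China", 200),
        ("C05", "Japan", 50),
        ("C06", "Canada", 120),
    ],
)

# Variant 1: an uncorrelated scalar subquery supplies the grand total.
with_subquery = """
SELECT CountryName,
       SUM(CustomerAmount) AS country_total,
       (SELECT SUM(CustomerAmount) FROM test) AS grand_total
FROM test
GROUP BY CountryName
ORDER BY CountryName
"""

# Variant 2: the window SUM runs over the grouped rows, so it must wrap
# the per-group SUM rather than the raw column.
with_window = """
SELECT CountryName,
       SUM(CustomerAmount) AS country_total,
       SUM(SUM(CustomerAmount)) OVER () AS grand_total
FROM test
GROUP BY CountryName
ORDER BY CountryName
"""

subquery_rows = conn.execute(with_subquery).fetchall()
window_rows = conn.execute(with_window).fetchall()
for row in window_rows:
    print(row)
```

The rows come back sorted by country name here only because of the added ORDER BY, which makes the two result sets easy to compare.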

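SELECT countryName, SUM(CustomerAmount), SUM(SUM(CustomerAmount)) OVER()
FROM test
GROUP BY countryName;
I did not test this, but using the OVER clause should do what you are looking for. Because the query is grouped, the window function has to be applied to the grouped aggregate, hence SUM(SUM(CustomerAmount)).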

MySQL modifying order by rand() to other methods

I am trying to make a random selection from each group of rows, where the chance of picking a row is proportional to its weight. For example, I have a table (DemoTable) like this:
http://sqlfiddle.com/#!9/23470/3/0
Name   | State | Grade | Weight
-------+-------+-------+-------
John   | NY    | 100   | 1
Liam   | NY    | 90    | 2
Olivia | NY    | 90    | 3
Emma   | NY    | 80    | 4
James  | CA    | 10    | 1
Henry  | CA    | 20    | 1
Mia    | NJ    | 50    | 1
Ava    | NJ    | 30    | 4
For State = 'NY', there are four rows with grade array: [100, 90, 90, 80] and the weight [1, 2, 3, 4], respectively. So 80 has the largest chance to be picked while 100 has the least within its State group.
I made a query for it:
SELECT a.*,
(SELECT b.Grade FROM DemoTable b WHERE a.State = b.State
ORDER BY RAND() * -b.Weight LIMIT 1) AS 'random_val' FROM DemoTable a;
and it worked with the result:
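Name   | State | Grade | Weight | random_val
-------+-------+-------+--------+-----------
John   | NY    | 100   | 1      | 80
Liam   | NY    | 90    | 2      | 80
Olivia | NY    | 90    | 3      | 80
Emma   | NY    | 80    | 4      | 90
James  | CA    | 10    | 1      | 20
Henry  | CA    | 20    | 1      | 10
Mia    | NJ    | 50    | 1      | 30
Ava    | NJ    | 30    | 4      | 30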
However, I would like to know whether there is another method, such as a join or a union, instead of relying on ORDER BY RAND() alone.
Is there any other way to write my MySQL query that gives the same kind of result?
I've searched for a solution all day but couldn't find a proper way to do this, which is why I'm asking here.
I would sincerely appreciate any advice.

My first attempt using analytic functions, though I suspect yours is faster over larger datasets...
WITH
ranged AS
(
SELECT
*,
SUM(weight) OVER (PARTITION BY state ORDER BY id) - weight AS weight_range_lower,
SUM(weight) OVER (PARTITION BY state ORDER BY id) AS weight_range_upper,
SUM(weight) OVER (PARTITION BY state ) * rand() AS rand_threshold
FROM
DemoTable
)
SELECT
ranged.*,
lookup.grade AS random_grade
FROM
ranged
INNER JOIN
ranged AS lookup
ON lookup.state = ranged.state
AND lookup.weight_range_lower <= ranged.rand_threshold
AND lookup.weight_range_upper > ranged.rand_threshold
ORDER BY
ranged.id
Or, if you want all members of the same state to be given the same random_grade...
SELECT
*,
FIRST_VALUE(grade) OVER (PARTITION BY state ORDER BY weight * rand() DESC)
FROM
DemoTable
ORDER BY
id
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=133f9e86b013a477ac342d0295132dd5
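The first query above works by giving every row a half-open range of cumulative weight and then finding the range that contains a random threshold. The same idea can be sketched outside SQL in a few lines of Python. The function name pick_weighted and its parameters are hypothetical, and the threshold is passed in explicitly so that the behaviour can be checked with fixed values.

```python
import random
from itertools import accumulate


def pick_weighted(grades, weights, threshold):
    """Return the grade whose cumulative weight range [lower, upper)
    contains threshold. Assumes 0 <= threshold < sum(weights)."""
    uppers = accumulate(weights)  # running totals: the upper bound of each range
    for grade, upper in zip(grades, uppers):
        if threshold < upper:
            return grade
    raise ValueError("threshold must be below the total weight")


# The NY group from the question: ranges [0,1), [1,3), [3,6), [6,10).
ny_grades = [100, 90, 90, 80]
ny_weights = [1, 2, 3, 4]

low_pick = pick_weighted(ny_grades, ny_weights, 0.5)   # lands in [0, 1)
high_pick = pick_weighted(ny_grades, ny_weights, 6.5)  # lands in [6, 10)

# A real draw scales a uniform random number by the group's total weight.
random_pick = pick_weighted(
    ny_grades, ny_weights, random.random() * sum(ny_weights)
)
print(low_pick, high_pick, random_pick)
```

Because the last range is the widest, grade 80 is the most likely outcome for NY, which matches the intent described in the question.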

Finding the maximum in MYSQL but what if maximum value resides in two rows

I am trying to get the maximum area for each id in the below table:
id name area
1001 Land AA 0.55
1001 Land AB 0.55
1001 Land AC 0.25
1001 Land AD 0.1
1002 Land BA 1
1002 Land BB 0.8
1002 Land BC 0.4
1003 Land CA 0.65
1003 Land CB 0.22
1003 Land CC 0.22
But what if the data has two or more rows with the maximum value? For example, 1001 has 2 rows with its maximum value. When I use this query: SELECT id, name, max(area) FROM land GROUP BY id, I get:
id name max(area)
1001 Land AA 0.55
1002 Land BA 1
1003 Land CA 0.65
The desired result:
id name area
1001 Land AA 0.55
1001 Land AB 0.55
1002 Land BA 1
1003 Land CA 0.65
Thanks in advance. Please tell me if this question has a duplicate; I couldn't search for one because I don't know how to describe the problem or what search terms to use. Thanks again.

You need to do this with a join or other condition. Here is a typical approach:
select l.*
from land l join
(select id, max(area) as maxarea
from land l
group by id
) lid
on l.id = lid.id and l.area = lid.maxarea;
Another approach which is sometimes more efficient is to use not exists:
select l.*
from land l
where not exists (select 1 from land l2 where l2.id = l.id and l2.area > l.area);
That is, get all rows from land where there is no corresponding row with the same id and a bigger area.
Finally, your query:
SELECT id, name, max(area)
FROM land
GROUP BY id;
Does not do what you think it does. name doesn't come from the row with the maximum area. It comes from an arbitrary (technically "indeterminate") row for each id. This uses a MySQL GROUP BY extension, which is documented in the MySQL manual. In other databases, this would generate a syntax error. I would advise you to avoid using this extension until you understand what it is really doing. That is, be sure that all the columns not wrapped in aggregation functions in an aggregation query are also in the GROUP BY (in your query, name is not in the GROUP BY).
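To see that the join and NOT EXISTS approaches both keep tied maximums, here is a minimal sketch that runs them on the sample land table through Python's sqlite3 module. SQLite is used only as a convenient stand-in for MySQL, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE land (id INTEGER, name TEXT, area REAL)")
conn.executemany(
    "INSERT INTO land VALUES (?, ?, ?)",
    [
        (1001, "Land AA", 0.55), (1001, "Land AB", 0.55),
        (1001, "Land AC", 0.25), (1001, "Land AD", 0.1),
        (1002, "Land BA", 1), (1002, "Land BB", 0.8), (1002, "Land BC", 0.4),
        (1003, "Land CA", 0.65), (1003, "Land CB", 0.22), (1003, "Land CC", 0.22),
    ],
)

# Join each row to its id's maximum area; every tied row matches.
via_join = """
SELECT l.id, l.name, l.area
FROM land AS l
JOIN (SELECT id, MAX(area) AS maxarea FROM land GROUP BY id) AS lid
  ON l.id = lid.id AND l.area = lid.maxarea
ORDER BY l.id, l.name
"""

# Keep a row only if no row with the same id has a larger area.
via_not_exists = """
SELECT l.id, l.name, l.area
FROM land AS l
WHERE NOT EXISTS (
    SELECT 1 FROM land AS l2 WHERE l2.id = l.id AND l2.area > l.area
)
ORDER BY l.id, l.name
"""

join_rows = conn.execute(via_join).fetchall()
not_exists_rows = conn.execute(via_not_exists).fetchall()
for row in join_rows:
    print(row)
```

Both queries return two rows for id 1001, because Land AA and Land AB share the maximum area.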

How can I write a query as a value in a row?

I have two tables on MySQL (using phpMyAdmin), looking like the following:
Table 1:
Country Total Minutes
USA 100
USA 90
Canada 60
Mexico 80
UK 90
France 70
France 10
Germany 10
In Table 2, what I need to do is the following:
Region Total Minutes
North America USA+USA+Canada+Mexico Mins
Europe UK+France+France+Germany Mins
Is there a way to have a row be the result of a query?

You either need a region column in table 1:
SELECT region, SUM(`Total Minutes`)
FROM timespent
GROUP BY region;
Or a separate region <-> country table:
SELECT region, SUM(`Total Minutes`)
FROM myregions r
INNER JOIN timespent t USING (country)
GROUP BY r.region;
The regions table would look like this:
region | country
--------------+--------
North America | USA
North America | Mexico
If you can't change anything in your database, look at Andomar's solution :)

You could translate the countries to regions in a subquery. The outer query can then group by region:
select Region
, sum(TotalMinutes) as TotalMinutes
from (
select case country
when 'USA' then 'North America'
when 'France' then 'Europe'
end as Region
, TotalMinutes
from YourTable
) as SubQueryAlias
group by
Region
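The two answers can be compared side by side with a short sketch using Python's sqlite3 module. The country-to-region mapping is taken from Table 2 in the question. The column name total_minutes (in place of `Total Minutes`) and the use of SQLite instead of MySQL are assumptions made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timespent (country TEXT, total_minutes INTEGER)")
conn.executemany(
    "INSERT INTO timespent VALUES (?, ?)",
    [
        ("USA", 100), ("USA", 90), ("Canada", 60), ("Mexico", 80),
        ("UK", 90), ("France", 70), ("France", 10), ("Germany", 10),
    ],
)

# Approach 1: a separate lookup table mapping each country to its region.
conn.execute("CREATE TABLE myregions (region TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO myregions VALUES (?, ?)",
    [
        ("North America", "USA"), ("North America", "Canada"),
        ("North America", "Mexico"), ("Europe", "UK"),
        ("Europe", "France"), ("Europe", "Germany"),
    ],
)
by_lookup = """
SELECT r.region, SUM(t.total_minutes) AS total_minutes
FROM myregions AS r
INNER JOIN timespent AS t USING (country)
GROUP BY r.region
ORDER BY r.region
"""

# Approach 2: no schema change; map countries to regions with CASE.
by_case = """
SELECT region, SUM(total_minutes) AS total_minutes
FROM (
    SELECT CASE
             WHEN country IN ('USA', 'Canada', 'Mexico') THEN 'North America'
             WHEN country IN ('UK', 'France', 'Germany') THEN 'Europe'
           END AS region,
           total_minutes
    FROM timespent
) AS mapped
GROUP BY region
ORDER BY region
"""

lookup_rows = conn.execute(by_lookup).fetchall()
case_rows = conn.execute(by_case).fetchall()
print(lookup_rows, case_rows)
```

The lookup table is easier to maintain as countries are added, while the CASE version works when the database cannot be changed.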
