Why do we need nested Select statement? - mysql

SELECT A.country
,A.market
,SUM(COALESCE(A.sales_value,0)) AS local_currency_sales
FROM
(
SELECT country
,market
,local_currency_sales
FROM TABLE
) A
GROUP BY A.country
,A.market
The above is the pseudocode I am referring to. I am new to SQL and would like to know whether there is any need for a nested SELECT like the one above. I tried removing the nested SELECT and it threw an error: country must appear in the Group By Clause. Intuitively, I would expect the query below to work even with GROUP BY:
SELECT A.country
,A.market
,SUM(COALESCE(A.sales_value,0)) AS local_currency_sales
FROM TABLE A
GROUP BY A.country
,A.market

Your original post does not need a sub-query at all. In fact, the first query will not execute, because the outer query tries to SUM a column (sales_value) that the inner query does not return. The following adaptation might help explain how this works:
SELECT A.CName
,A.M
,SUM(A.Sales) AS local_currency_sales
FROM
(
SELECT country as CName
,market as M
,COALESCE(sales_value,0) as Sales
FROM TABLE
) A
GROUP BY A.CName
,A.M
Sub-queries are usually used to make a query easier to read or maintain. In this example the nested query evaluates the COALESCE function and aliases the column names.
It is important to note that the outer query can only access the columns returned by the inner query, and only through the aliases assigned to them.
In advanced scenarios you might use nested queries to manually force query optimisations. However, most database engines have very good query optimisations by default, so nested queries can easily get in the way of standard optimisations and, if done poorly, may negatively affect performance.
There are of course some types of expressions that cannot be used in ORDER BY and GROUP BY clauses. When you come across these, it might be necessary to nest the query so that you can group or sort by the results of those evaluations.
Window functions are a prime example of this requirement.
The specific queries where this becomes important are different for each RDBMS, and they each have their own workarounds or alternate implementations that may be more efficient than using a sub-query at all. The specifics are outside the scope of this post.
The simple form of your query that you have posted is all you need in this case:
SELECT A.country
,A.market
,SUM(COALESCE(A.sales_value,0)) AS local_currency_sales
FROM TABLE A
GROUP BY A.country
,A.market
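To see that the flat query and the corrected nested query return the same rows, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and a hypothetical table named sales with made-up rows in place of the placeholder TABLE.

```python
import sqlite3

# Hypothetical data; "sales" stands in for the placeholder TABLE in the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, market TEXT, sales_value REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("DE", "retail", 10.0),
        ("DE", "retail", None),  # NULL is turned into 0 by COALESCE
        ("DE", "online", 5.0),
        ("FR", "retail", 7.5),
    ],
)

# Flat form: aggregate directly over the table.
flat_sql = """
SELECT A.country, A.market, SUM(COALESCE(A.sales_value, 0)) AS local_currency_sales
FROM sales A
GROUP BY A.country, A.market
ORDER BY A.country, A.market
"""

# Nested form: the inner query applies COALESCE and assigns the aliases,
# and the outer query can only see those aliases.
nested_sql = """
SELECT A.CName, A.M, SUM(A.Sales) AS local_currency_sales
FROM (
    SELECT country AS CName, market AS M, COALESCE(sales_value, 0) AS Sales
    FROM sales
) A
GROUP BY A.CName, A.M
ORDER BY A.CName, A.M
"""

flat_rows = conn.execute(flat_sql).fetchall()
nested_rows = conn.execute(nested_sql).fetchall()
print(flat_rows == nested_rows)
```

Both forms group the same rows, so the nesting here only changes readability, not the result.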

Related

What is the meaning of this SQL syntax: HAVING 1?

My requirement is this: I have a table, and I need to group it by one of its fields and get the latest record in each group. I searched the Internet and found this approach:
SELECT
* FROM(
SELECT
*
FROM
record r
WHERE
r.id in (xx,xx,xx) HAVING 1
ORDER BY
r.time DESC
) a
GROUP BY
a.id
The result is correct, but I can't understand the meaning of "having 1" after the where clause. I hope someone can give me an answer. Thank you very much.
It does nothing, just like having true would. Presumably it is a placeholder where sometimes additional conditions are applied? But since there is no group by or use of aggregate functions in the subquery, any having conditions are going to be treated no differently than where conditions.
Normally you select rows and apply where conditions, then any grouping (explicit, or implicit as in select count(*)) occurs, and the having clause can specify further constraints after the grouping.
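The sequence described above (WHERE first, then grouping, then HAVING) can be sketched as follows. This is only an illustration: it assumes SQLite through Python's sqlite3 module rather than MySQL, and the orders table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 5), ("a", 20), ("b", 50), ("b", 1), ("c", 8)],
)

# WHERE drops individual rows before grouping (here the single row b/1).
# HAVING then drops whole groups after aggregation (here group c, total 8).
filter_sql = """
SELECT customer, SUM(amount) AS total
FROM orders
WHERE amount >= 5
GROUP BY customer
HAVING SUM(amount) > 10
ORDER BY customer
"""
group_totals = conn.execute(filter_sql).fetchall()
print(group_totals)
```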
Note that your query is not guaranteed to give the results you want; the order by in the subquery in theory has no effect on the outer query, and the optimizer may skip it. It is possible that the presence of having makes a difference to the optimizer, but that is not something you should rely on, and certainly not from one version of MySQL to another.
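One way to get the latest record per group without depending on the subquery's ORDER BY is to join each row to its group's maximum time. This is a sketch of a commonly used alternative, not the query from the question. It assumes SQLite through Python's sqlite3 module, a hypothetical note column, made-up rows, and that time values are unique within each id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER, time TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO record VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "old"),
        (1, "2024-03-01", "new"),
        (2, "2024-02-01", "only"),
        (3, "2024-05-01", "not requested"),
    ],
)

# The derived table m holds one row per id with its latest time;
# joining back on (id, time) picks the full matching record.
latest_sql = """
SELECT r.id, r.time, r.note
FROM record r
JOIN (
    SELECT id, MAX(time) AS latest_time
    FROM record
    WHERE id IN (1, 2)
    GROUP BY id
) m ON m.id = r.id AND m.latest_time = r.time
ORDER BY r.id
"""
latest_rows = conn.execute(latest_sql).fetchall()
print(latest_rows)
```

If two rows in a group share the same latest time, this form returns both, so a tie-breaking column would be needed in that case.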

Please breakdown this MySQL statement for me

I came up with this solution in my class by piecing together Internet knowledge. Please break it down for me; I would love to know how I made it work, specifically the t.s and the closing t.
SELECT
CourseType,
GPA,
NumberOfStudents * 100 / t.s AS `Percentage of Students`
FROM View1
CROSS JOIN
(
SELECT
SUM(NumberOfStudents) AS s
FROM View1) t;
Your query uses a subquery. A sub-query is a query that is done within another query. In your case, your subquery is:
(
SELECT
SUM(NumberOfStudents) AS s
FROM View1)
When you create subqueries, you need to give them an alias. An alias is just a name you give a subquery, so you can use it in the main query.
In your example, you named your subquery "t".
Fields can also have aliases. In your subquery, you created a field SUM(NumberOfStudents), and you named it s.
Going back to your question, you use the aliases to address fields inside the subquery. In your case, when you write NumberOfStudents * 100 / t.s you are basically saying:
"I want to multiply NumberOfStudents by 100 and divide the result by the field s from my subquery t".
The other concept that is important in your query is the Cross join. A cross join is the Cartesian product of two tables.
You can find a great and intuitive explanation of how a cross join works in the following link:
https://www.sqlshack.com/sql-cross-join-with-examples/#:~:text=The%20CROSS%20JOIN%20is%20used,also%20known%20as%20cartesian%20join.&text=The%20main%20idea%20of%20the,product%20of%20the%20joined%20tables.
In this case, the use is simpler than that. Your subquery returns only one row with one value, which is the sum of all students. Since a cross join pairs every row of one table with every row of the other, your cross join just provides a way to use the total number of students as a constant value when calculating the percentage of students in the main query.
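The effect of cross joining to a one-row subquery can be sketched like this. The example assumes SQLite through Python's sqlite3 module, a plain table named View1 standing in for the view, and made-up rows. It multiplies by 100.0 rather than 100 because SQLite would otherwise do integer division, which is a difference from MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table stands in for the view from the question; the rows are hypothetical.
conn.execute("CREATE TABLE View1 (CourseType TEXT, GPA REAL, NumberOfStudents INTEGER)")
conn.executemany(
    "INSERT INTO View1 VALUES (?, ?, ?)",
    [("Lecture", 3.0, 10), ("Lab", 3.5, 30), ("Seminar", 3.2, 60)],
)

# The subquery t has exactly one row (s = 100), so the cross join
# attaches that same total to every row of View1.
pct_sql = """
SELECT CourseType, GPA, NumberOfStudents * 100.0 / t.s AS pct
FROM View1
CROSS JOIN (
    SELECT SUM(NumberOfStudents) AS s
    FROM View1
) t
ORDER BY CourseType
"""
pct_rows = conn.execute(pct_sql).fetchall()
print(pct_rows)
```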
A better way to do this uses window functions:
SELECT v.CourseType, v.GPA,
v.NumberOfStudents * 100 / SUM(v.NumberOfStudents) OVER () AS Percentage_of_Students
FROM View1 v;
If you are learning SQL, you might as well learn the correct way to express logic.
Notes:
Use meaningful table aliases (abbreviations for the table/view names).
Qualify column references. This is less important in a query with only one table reference, but it is a good habit.
Window functions allow you to summarize data across multiple rows, without using an explicit JOIN.

SQL Query: Joining on a SUM()

I'm trying to run a query that sums the value of items and then JOIN on the value of that SUM.
So in the below code, the Contract_For is what I'm trying to Join on, but I'm not sure if that's possible.
SELECT `items_value`.`ContractId` as `Contract`,
`items_value`.`site` as `SiteID`,
SUM(`items_value`.`value`) as `Contract_For`,
`contractitemlists`.`Text` as `Contracted_Text`
FROM items_value
LEFT JOIN contractitemlists ON (`items_value`.`Contract_For`) = `contractitemlists`.`Ref`
WHERE `items_value`.`ContractID`='2';
When I've faced similar issues in the past, I've just created a view that holds the SUM, then joined to that in another view.
At the moment, the above sample is meant to work for just one dummy value, but it's intended to be a stored procedure, where the user selects the ContractID. The error I get at the moment is 'Unknown Column items_value.Contract_For'.
You cannot reference aliases or aggregate expressions from the SELECT clause anywhere except HAVING and ORDER BY*; you need to make the first "part" a subquery, and then JOIN to that.
It might be easier to understand, though a bit oversimplified and not precisely correct, if you look at it this way as far as order of evaluation goes...
FROM (Note: JOIN is only within a FROM)
WHERE
GROUP BY
SELECT
HAVING
ORDER BY
In actual implementation, "under the hood", most SQL implementations actually use information from each section to optimize other sections (like using some where conditions to reduce records JOINed in a FROM); but this is the conceptual order that must be adhered to.
*In some versions of MSSQL, you cannot use aliases from the SELECT in HAVING or ORDER BY either.
Your query needs to be something like this:
SELECT s.*
, `cil`.`Text` as `Contracted_Text`
FROM (
SELECT `iv`.`ContractId` as `Contract`
, `iv`.`site` as `SiteID`
, SUM(`iv`.`value`) as `Contract_For`
FROM items_value AS iv
WHERE `iv`.`ContractID`='2'
) AS s
LEFT JOIN contractitemlists AS cil ON `s`.`Contract_For` = cil.`Ref`
;
But as others have mentioned, the lack of a GROUP BY is something to be looked into; as in "what if there are multiple site values."
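A sketch of the subquery-then-JOIN shape with the GROUP BY added is shown below. It assumes SQLite through Python's sqlite3 module instead of MySQL, made-up rows, and that one sum per (ContractId, site) pair is what is wanted. That grouping choice is an assumption, not something the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items_value (ContractId INTEGER, site TEXT, value INTEGER)")
conn.execute('CREATE TABLE contractitemlists (Ref INTEGER, "Text" TEXT)')
conn.executemany(
    "INSERT INTO items_value VALUES (?, ?, ?)",
    [(2, "A", 3), (2, "A", 4), (2, "B", 5), (2, "C", 9), (1, "A", 7)],
)
conn.executemany(
    "INSERT INTO contractitemlists VALUES (?, ?)",
    [(7, "seven"), (5, "five")],
)

# The aggregate is computed inside the derived table s, so the outer
# query can join on its alias Contract_For like any ordinary column.
join_sql = """
SELECT s.Contract, s.SiteID, s.Contract_For, cil."Text" AS Contracted_Text
FROM (
    SELECT iv.ContractId AS Contract,
           iv.site AS SiteID,
           SUM(iv.value) AS Contract_For
    FROM items_value AS iv
    WHERE iv.ContractId = 2
    GROUP BY iv.ContractId, iv.site
) AS s
LEFT JOIN contractitemlists AS cil ON s.Contract_For = cil.Ref
ORDER BY s.SiteID
"""
joined_rows = conn.execute(join_sql).fetchall()
print(joined_rows)
```

Site C sums to 9, which has no matching Ref, so the LEFT JOIN keeps the row with a NULL Contracted_Text.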

How to avoid a 'where' clause affecting row ordering?

I have a case where I do a select from another select and the order of the returned rows is changed if I add a where clause.
Example:
SELECT t.id
FROM (
SELECT t.id
FROM table1 t
ORDER BY
t.viewsTotal ASC
LIMIT 20
OFFSET 0
) base
INNER JOIN table1 t ON base.id = t.id
LEFT JOIN table2 t2 ON t2.id = t.secondTableId
# WHERE t2.someBoolColumn = FALSE
;
Now, the order is the same for the inner select and the outer select, but if I uncomment the where condition, the outer select will change the ordering.
How can I prevent this from happening?
Let's assume the following for the given example:
I can not do one select.
I do not know what order has been applied to an inner select when doing an outer select. So, if I order from a joined table, I wouldn't know that I need to join it here.
More info on my use case
There is a query builder that provides the inner select, and it may order by a third table that is joined inside that inner select. If I wanted to apply the same order in the outer select, I would need to know which tables were joined, and with this poor query builder I do not have that knowledge.
tl;dr If you want a particular order in your result set, use ORDER BY.
The ordering of rows in a result set from any RDBMS server without an ORDER BY clause is formally unpredictable. Unpredictable is like random, except worse. Random ordering implies you'll get your rows in a different order every time you run the query. Truly random ordering, if it existed, would make it hard for simple unit tests to pass when your assumptions about ordering fail.
Unpredictable means you'll get them in the same order, until you don't. That means your unit tests will pass, and your system tests will pass, and your system will fail six months into production, if it depends on result set ordering.
Why is this so? A server's query planner is free to use any algorithm at its disposal to satisfy the queries you give it. These algorithms work differently for different types of table and different sizes of table. If you don't constrain the query planner by specifying the result set ordering, it may pick some algorithm that gives an ordering that appears strange to you the programmer.
Query planners have, literally, thousands of programmer years' worth of optimizations built in to them.
For people used to the procedural ways of thinking encouraged by all kinds of programming languages, it's sometimes hard to switch to the declarative / descriptive mode used by SQL. With SQL (at least clean SQL without stuff like SELECT @a := @a+1 and other hacks) you're simply describing the result set you want. The server generates results matching your specification.
I would suggest you not rely on the implicit ordering produced by SQL (because there is no implicit ordering, as per Bohemian's comment). Rather, you should use an ORDER BY clause and select one of the columns in your query to order the results by. That way you can ensure that the results are always presented in the same order regardless of the WHERE clauses.
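As a sketch of that advice applied to the query in the question: if the inner select can be made to return its sort key alongside the id, the outer query can repeat the ORDER BY explicitly. This assumes the inner query is allowed to expose viewsTotal (the question suggests that may not always be possible with the query builder), uses SQLite through Python's sqlite3 module, and uses made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, viewsTotal INTEGER, secondTableId INTEGER)"
)
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, someBoolColumn INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 50, 10), (2, 5, 11), (3, 20, 10), (4, 1, 12)],
)
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(10, 0), (11, 1), (12, 0)])

# The inner ORDER BY only decides which rows LIMIT keeps.
# The outer ORDER BY is what fixes the order of the final result.
ordered_sql = """
SELECT t.id
FROM (
    SELECT id, viewsTotal
    FROM table1
    ORDER BY viewsTotal ASC
    LIMIT 20 OFFSET 0
) base
INNER JOIN table1 t ON base.id = t.id
LEFT JOIN table2 t2 ON t2.id = t.secondTableId
WHERE t2.someBoolColumn = 0
ORDER BY base.viewsTotal ASC
"""
ordered_ids = [row[0] for row in conn.execute(ordered_sql)]
print(ordered_ids)
```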

mysql max query then JOIN?

I have followed the tutorial over at tizag for the MAX() mysql function and have written the query below, which does exactly what I need. The only trouble is I need to JOIN it to two more tables so I can work with all the rows I need.
$query = "SELECT idproducts, MAX(date) FROM results GROUP BY idproducts ORDER BY MAX(date) DESC";
I have this query below, which has the JOIN I need and works:
$query = ("SELECT *
FROM operators
JOIN products
ON operators.idoperators = products.idoperator JOIN results
ON products.idProducts = results.idproducts
ORDER BY drawndate DESC
LIMIT 20");
Could someone show me how to merge the top query with the JOIN element from my second query? I am new to PHP and MySQL, this being my first adventure into a computer language. I have read a lot and tried really hard to get those two queries to work, but I am at a brick wall. I cannot work out how to add the JOIN element to the first query :(
Could some kind person take pity on a newb and help me?
Try this query.
SELECT
*
FROM
operators
JOIN products
ON operators.idoperators = products.idoperator
JOIN
(
SELECT
idproducts,
MAX(date)
FROM results
GROUP BY idproducts
) AS t
ON products.idproducts = t.idproducts
ORDER BY drawndate DESC
LIMIT 20
JOINs function somewhat independently of aggregation functions; they just change the intermediate result-set upon which the aggregate functions operate. I like to point to the way the MySQL documentation is written, which uses the term 'table_reference' in the SELECT syntax and expands on what that means in the JOIN syntax. Basically, any simple query which has a table specified can expand that table to a complete JOIN clause and the query will operate the same basic way, just with a modified intermediate result-set.
I say "intermediate result-set" to hint at the mindset which helped me understand JOINs and aggregation. Understanding the order in which MySQL builds your final result is critical to knowing how to reliably get the results you want. Generally, it starts by looking at the first row of the first table you specify after 'FROM', and decides if it might match by looking at 'WHERE' clauses. If it is not immediately discardable, it attempts to JOIN that row to the first JOIN specified, and repeats the "will this be discarded by WHERE?" check. This repeats for all JOINs, which either add rows to your result set, or remove them, or leave just the one, as appropriate for your JOINs, WHEREs and data. This process builds what I am referring to when I say "intermediate result-set". Somewhere between starting and finishing your complete query, MySQL has in its memory a potentially massive table-like structure of data which it built using the process I just described. Only then does it begin to aggregate (GROUP) the results according to your criteria.
So for your query, it depends on what specifically you are going for (not entirely clear in OP). If you simply want the MAX(date) from the second query, you can simply add that expression to the SELECT clause and then add an aggregation spec to the end:
SELECT *, MAX(date)
FROM operators
...
GROUP BY idproducts
ORDER BY ...
Alternatively, you can add the JOIN section of the second query to the first.
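A sketch of joining the MAX() query as a derived table is shown below. The table and key names follow the question, but the name and pname columns, the latest_date alias, and the rows are hypothetical, and SQLite (through Python's sqlite3 module) stands in for MySQL. Giving MAX(date) an alias lets the outer query select and sort by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE operators (idoperators INTEGER, name TEXT)")
conn.execute("CREATE TABLE products (idProducts INTEGER, idoperator INTEGER, pname TEXT)")
conn.execute("CREATE TABLE results (idproducts INTEGER, date TEXT)")
conn.executemany("INSERT INTO operators VALUES (?, ?)", [(1, "Acme"), (2, "Beta")])
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)", [(10, 1, "Lotto"), (11, 2, "Bingo")]
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(10, "2023-01-01"), (10, "2023-06-01"), (11, "2023-03-01")],
)

# The derived table t is aggregated first (one row per product),
# then joined like an ordinary table.
max_join_sql = """
SELECT o.name, p.pname, t.latest_date
FROM operators o
JOIN products p ON o.idoperators = p.idoperator
JOIN (
    SELECT idproducts, MAX(date) AS latest_date
    FROM results
    GROUP BY idproducts
) AS t ON p.idProducts = t.idproducts
ORDER BY t.latest_date DESC
LIMIT 20
"""
latest_per_product = conn.execute(max_join_sql).fetchall()
print(latest_per_product)
```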