SQL Query: Joining on a SUM()

I'm trying to run a query that sums the value of items and then JOINs on the value of that SUM.
In the code below, Contract_For is the column I'm trying to join on, but I'm not sure that's possible.
SELECT `items_value`.`ContractId` as `Contract`,
`items_value`.`site` as `SiteID`,
SUM(`items_value`.`value`) as `Contract_For`,
`contractitemlists`.`Text` as `Contracted_Text`
FROM items_value
LEFT JOIN contractitemlists ON (`items_value`.`Contract_For`) = `contractitemlists`.`Ref`
WHERE `items_value`.`ContractID`='2';
When I've faced similar issues in the past, I've just created a view that holds the SUM and then joined to it in another view.
At the moment, the sample above is meant to work for just one dummy value, but it's intended to become a stored procedure where the user selects the ContractID. The error I currently get is 'Unknown column items_value.Contract_For'.

You cannot use aliases, or expressions that use aggregates, from the SELECT clause anywhere but HAVING and ORDER BY*; you need to make the first "part" a subquery and then JOIN to that.
It might be easier to understand, though a bit oversimplified and not precisely correct, if you look at it this way as far as order of evaluation goes...
FROM (Note: JOIN is only within a FROM)
WHERE
GROUP BY
SELECT
HAVING
ORDER BY
In actual implementation, "under the hood", most SQL engines use information from each section to optimize the others (for example, using some WHERE conditions to reduce the records JOINed in a FROM), but this is the conceptual order that must be adhered to.
*In some versions of MSSQL, you cannot use aliases from the SELECT in HAVING or ORDER BY either.
Your query needs to be something like this:
SELECT s.*
, `cil`.`Text` as `Contracted_Text`
FROM (
SELECT `iv`.`ContractId` as `Contract`
, `iv`.`site` as `SiteID`
, SUM(`iv`.`value`) as `Contract_For`
FROM items_value AS iv
WHERE `iv`.`ContractID`='2'
) AS s
LEFT JOIN contractitemlists AS cil ON `s`.`Contract_For` = cil.`Ref`
;
But as others have mentioned, the missing GROUP BY is something to look into: for example, what if there are multiple site values?
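To make the suggested fix concrete, here is a minimal sketch that runs the aggregate-then-join pattern against an in-memory SQLite database through Python's standard sqlite3 module, used only as a stand-in for MySQL. The table and column names follow the question, but the rows are invented, and the GROUP BY on ContractId and site is an assumption about the intended grouping so that each site gets its own SUM. The ? placeholder plays the role of the stored procedure's ContractID parameter.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items_value (ContractId INTEGER, site INTEGER, value INTEGER);
    CREATE TABLE contractitemlists (Ref INTEGER, Text TEXT);
    INSERT INTO items_value VALUES (2, 10, 40), (2, 10, 60), (2, 20, 50), (3, 10, 99);
    INSERT INTO contractitemlists VALUES (100, 'Hundred'), (50, 'Fifty');
""")

# Aggregate first inside a derived table, then JOIN on the aggregated alias.
query = """
    SELECT s.Contract, s.SiteID, s.Contract_For, cil.Text AS Contracted_Text
    FROM (
        SELECT iv.ContractId AS Contract,
               iv.site AS SiteID,
               SUM(iv.value) AS Contract_For
        FROM items_value AS iv
        WHERE iv.ContractId = ?
        GROUP BY iv.ContractId, iv.site   -- assumed grouping: one SUM per site
    ) AS s
    LEFT JOIN contractitemlists AS cil ON s.Contract_For = cil.Ref
    ORDER BY s.SiteID
"""
rows = conn.execute(query, (2,)).fetchall()
for row in rows:
    print(row)
# (2, 10, 100, 'Hundred')
# (2, 20, 50, 'Fifty')
```

Because the outer query only sees the derived table s, Contract_For is an ordinary column there, which is why the ON clause can use it.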

Related

Why do we need a nested SELECT statement?

SELECT A.country
,A.market
,SUM(COALESCE(A.sales_value,0)) AS local_currency_sales
FROM
(
SELECT country
,market
,local_currency_sales
FROM TABLE
) A
GROUP BY A.country
,A.market
The above is the pseudocode I am referring to. I am new to SQL and would like to ask why a nested SELECT like the one above is needed. When I tried removing the nested SELECT, it threw an error: country must appear in the GROUP BY clause. Intuitively, it seems to me that the following should work, even with the GROUP BY:
SELECT A.country
,A.market
,SUM(COALESCE(A.sales_value,0)) AS local_currency_sales
FROM TABLE A
GROUP BY A.country
,A.market

Your original query does not need a subquery at all; in fact, the first query will not execute, because it tries to SUM a column (sales_value) that the inner query does not return. The following adaptation might help explain how this works:
SELECT A.CName
,A.M
,SUM(A.Sales) AS local_currency_sales
FROM
(
SELECT country as CName
,market as M
,COALESCE(sales_value,0) as Sales
FROM TABLE
) A
GROUP BY A.CName
,A.M
Subqueries are usually used to make a query easier to read or maintain. In this example, you can see that the nested query evaluates the COALESCE function and aliases the column names.
It is important to note that the outer query can only access the columns returned by the inner query, AND only through the aliases that have been assigned to them.
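A minimal sketch of that visibility rule, again using Python's sqlite3 module as a stand-in for the real database. The table name sales is hypothetical (the question's TABLE is a placeholder) and the rows are invented. The first query works through the aliases; the second tries to reach the original column name through the derived table and fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "sales" with invented rows, including a NULL to exercise COALESCE.
conn.executescript("""
    CREATE TABLE sales (country TEXT, market TEXT, sales_value REAL);
    INSERT INTO sales VALUES ('UK', 'Retail', 10), ('UK', 'Retail', NULL), ('FR', 'Online', 5);
""")

nested = """
    SELECT A.CName, A.M, SUM(A.Sales) AS local_currency_sales
    FROM (SELECT country AS CName, market AS M, COALESCE(sales_value, 0) AS Sales
          FROM sales) A
    GROUP BY A.CName, A.M
    ORDER BY A.CName
"""
results = conn.execute(nested).fetchall()
print(results)  # [('FR', 'Online', 5.0), ('UK', 'Retail', 10.0)]

# The derived table A only exposes CName (here), so A.country is unknown.
try:
    conn.execute("SELECT A.country FROM (SELECT country AS CName FROM sales) A")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)
```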
In advanced scenarios you might use nested queries to force query optimisations manually. However, most database engines optimise queries very well by default, so it is important to recognise that nested queries can easily get in the way of those standard optimisations and, if done poorly, may hurt performance.
There are, of course, some types of expressions that cannot be used in ORDER BY and GROUP BY clauses. When you come across these, it might be necessary to nest the query so that you can group or sort by the results of those evaluations.
Window functions are a prime example of this requirement.
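As a hedged illustration with SQLite (3.25 or later, via Python's sqlite3 module) standing in for other engines: a window function cannot be used in WHERE, so picking the top market per country requires computing the rank in a nested query and filtering on it outside. The table and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with invented rows.
conn.executescript("""
    CREATE TABLE sales (country TEXT, market TEXT, sales_value REAL);
    INSERT INTO sales VALUES ('UK', 'Retail', 10), ('UK', 'Online', 30),
                             ('FR', 'Online', 5), ('FR', 'Retail', 8);
""")

# Direct attempt: a window function in WHERE is rejected.
try:
    conn.execute("""
        SELECT country, market FROM sales
        WHERE ROW_NUMBER() OVER (PARTITION BY country ORDER BY sales_value DESC) = 1
    """)
    direct_error = None
except sqlite3.OperationalError as exc:
    direct_error = str(exc)

# Nested version: rank in the inner query, filter on the rank in the outer query.
top_market = conn.execute("""
    SELECT country, market
    FROM (SELECT country, market,
                 ROW_NUMBER() OVER (PARTITION BY country ORDER BY sales_value DESC) AS rn
          FROM sales) ranked
    WHERE rn = 1
    ORDER BY country
""").fetchall()
print(top_market)  # [('FR', 'Retail'), ('UK', 'Online')]
```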
The specific queries where this becomes important are different for each RDBMS, and they each have their own workarounds or alternate implementations that may be more efficient than using a sub-query at all. The specifics are outside the scope of this post.
The simple form of your query that you have posted is all you need in this case:
SELECT A.country
,A.market
,SUM(COALESCE(A.sales_value,0)) AS local_currency_sales
FROM TABLE A
GROUP BY A.country
,A.market
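A quick sanity check of that claim, sketched with Python's sqlite3 module and an invented table named sales in place of the placeholder TABLE: the flat GROUP BY query and a pass-through nested version return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "sales" with invented rows.
conn.executescript("""
    CREATE TABLE sales (country TEXT, market TEXT, sales_value REAL);
    INSERT INTO sales VALUES ('UK', 'Retail', 10), ('UK', 'Retail', NULL), ('FR', 'Online', 5);
""")

# Flat form: aggregate directly over the table.
flat = conn.execute("""
    SELECT A.country, A.market, SUM(COALESCE(A.sales_value, 0)) AS local_currency_sales
    FROM sales A
    GROUP BY A.country, A.market
    ORDER BY A.country
""").fetchall()

# Nested form: the inner query just passes the columns through unchanged.
nested = conn.execute("""
    SELECT A.country, A.market, SUM(COALESCE(A.sales_value, 0)) AS local_currency_sales
    FROM (SELECT country, market, sales_value FROM sales) A
    GROUP BY A.country, A.market
    ORDER BY A.country
""").fetchall()

print(flat == nested)  # True
```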

Please break down this MySQL statement for me

I came up with this solution in my class by piecing together knowledge from the Internet. Please break it down for me; I would love to know how I made it work, specifically the t.s and the closing t.
SELECT
CourseType,
GPA,
NumberOfStudents * 100 / t.s AS `Percentage of Students`
FROM View1
CROSS JOIN
(
SELECT
SUM(NumberOfStudents) AS s
FROM View1) t;

Your query uses a subquery. A subquery is a query that runs within another query. In your case, your subquery is:
(
SELECT
SUM(NumberOfStudents) AS s
FROM View1)
When you use a subquery in the FROM clause like this, you need to give it an alias. An alias is just a name you give a subquery so you can refer to it in the main query.
In your example, you named your subquery "t".
Fields can also have aliases. In your subquery, you created a field SUM(NumberOfStudents) and named it s.
Going back to your question, you use the aliases to address fields inside the subquery. In your case, when you write NumberOfStudents * 100 / t.s, you are basically saying:
"I want to divide NumberOfStudents * 100 by the field s from my subquery t".
The other concept that is important in your query is the Cross join. A cross join is the Cartesian product of two tables.
You can find a great and intuitive explanation of how a cross join works in the following link:
https://www.sqlshack.com/sql-cross-join-with-examples/#:~:text=The%20CROSS%20JOIN%20is%20used,also%20known%20as%20cartesian%20join.&text=The%20main%20idea%20of%20the,product%20of%20the%20joined%20tables.
In this case, the use is simpler than that. Your subquery returns only one row, which holds the sum of all students. Since a cross join pairs every row of one table with every row of the other, your cross join simply provides a way to use the total number of students as a constant in the main query's percentage calculation.
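A minimal sketch of the same CROSS JOIN, run through Python's sqlite3 module with a plain table standing in for View1 and invented rows. One assumption to flag: SQLite divides integers with integer division, unlike MySQL's / operator, so the sketch multiplies by 100.0 to keep the fractional part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for View1 with invented rows (100 students in total).
conn.executescript("""
    CREATE TABLE View1 (CourseType TEXT, GPA REAL, NumberOfStudents INTEGER);
    INSERT INTO View1 VALUES ('Math', 3.5, 30), ('Art', 3.0, 10), ('CS', 3.8, 60);
""")

# The subquery t yields exactly one row, so CROSS JOIN attaches t.s to every row.
rows = conn.execute("""
    SELECT CourseType, GPA,
           NumberOfStudents * 100.0 / t.s AS "Percentage of Students"
    FROM View1
    CROSS JOIN (SELECT SUM(NumberOfStudents) AS s FROM View1) t
    ORDER BY CourseType
""").fetchall()
for row in rows:
    print(row)
```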

A better way to do this uses window functions (available in MySQL 8.0 and later):
SELECT v.CourseType, v.GPA,
v.NumberOfStudents * 100 / SUM(v.NumberOfStudents) OVER () AS Percentage_of_Students
FROM View1 v;
If you are learning SQL, you might as well learn the correct way to express logic.
Notes:
Use meaningful table aliases (abbreviations for the table/view names).
Qualify column references. This is less important in a query with only one table reference, but it is a good habit.
Window functions allow you to summarize data across multiple rows, without using an explicit JOIN.
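To check that the window-function version computes the same percentages as the CROSS JOIN, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later) with invented rows in a table named View1. As above, 100.0 is used because SQLite, unlike MySQL, truncates integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for View1 with invented rows.
conn.executescript("""
    CREATE TABLE View1 (CourseType TEXT, GPA REAL, NumberOfStudents INTEGER);
    INSERT INTO View1 VALUES ('Math', 3.5, 30), ('Art', 3.0, 10), ('CS', 3.8, 60);
""")

# SUM(...) OVER () is the total across all rows, computed without a JOIN.
windowed = conn.execute("""
    SELECT v.CourseType, v.GPA,
           v.NumberOfStudents * 100.0 / SUM(v.NumberOfStudents) OVER () AS Percentage_of_Students
    FROM View1 v
    ORDER BY v.CourseType
""").fetchall()

cross_joined = conn.execute("""
    SELECT v.CourseType, v.GPA, v.NumberOfStudents * 100.0 / t.s
    FROM View1 v
    CROSS JOIN (SELECT SUM(NumberOfStudents) AS s FROM View1) t
    ORDER BY v.CourseType
""").fetchall()

print(windowed == cross_joined)  # True
```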

Getting different results from group by and distinct

This is my first post here, since most of the time I have already found a suitable solution :)
However, this time nothing seems to help properly.
I'm trying to migrate information from a MySQL database to which I have only read-only access.
My problem is similar to this one: Group by doesn't give me the newest group
I also need to get the latest information out of some tables, but my tables have more than 300k entries, so checking whether the "time-attribute-value" equals the one in the subquery (as suggested in the first answer) would be too slow (once I tried "... WHERE EXISTS ..." and the server hung).
In addition, I can hardly find the important information (e.g. time) in a single attribute, and there is never a single primary key. Until now, I did it as suggested in the second answer, by joining with a subquery that contains the latest "time-attribute-entry" and some primary keys, but that gets me into a huge mess after using multiple joins and unions with the results.
Therefore, I would prefer to use HAVING, as here: Select entry with maximum value of column after grouping
But when I tried it out and looked for a good candidate for the "time-attribute", I noticed that these queries give me two different results (more = 39721, less = 37870):
SELECT COUNT(MATNR) AS MORE
FROM(
SELECT DISTINCT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG
FROM
FKT_LAB
) AS TEMP1;
SELECT COUNT(MATNR) AS LESS
FROM(
SELECT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG,
LAB_PDATUM
FROM
FKT_LAB
GROUP BY
LAB_MTKNR,
LAB_STG,
LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) AS TEMP2;
This happens even though both are applied to the same table and use GROUP BY / SELECT DISTINCT on the same columns.
Any ideas?
If nothing helps and I have to go back to my mess, I will use string variables as placeholders to tidy it up, but then I lose track of how many subqueries, joins and unions I have in one query... How many temporary tables will the server be able to cope with?

Your second query is not doing what you expect it to be doing. This is the query:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG, LAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) TEMP2;
The problem is the HAVING clause. You are mixing an unaggregated column (LAB_PDATUM) with an aggregated value (MAX(LAB_PDATUM)). What MySQL does is choose an arbitrary value of the column for each group and compare it to the max.
Often, the arbitrary value will not be the maximum value, so those groups get filtered out. The reference you give (although an accepted answer) is incorrect. I have put a comment there.
If you want the most recent value, here is a relatively easy way:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
max(LAB_PDATUM) as maxLAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
) TEMP2;
It does not, however, affect the outer count.
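A minimal sketch of the corrected query, run through Python's sqlite3 module with invented rows in FKT_LAB. Note that SQLite treats a bare column next to a single MAX() specially (it takes the value from the max row), so it would not reproduce MySQL's arbitrary-value behaviour; the sketch only shows that grouping with MAX(LAB_PDATUM) yields one row per group, the same count as SELECT DISTINCT on the grouping columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows: the first group has two exam dates, the second has one.
conn.executescript("""
    CREATE TABLE FKT_LAB (LAB_MTKNR INTEGER, LAB_STG TEXT, LAB_STGNR INTEGER, LAB_PDATUM TEXT);
    INSERT INTO FKT_LAB VALUES
        (1, 'INF', 1, '2020-01-10'),
        (1, 'INF', 1, '2021-03-05'),
        (2, 'MAT', 1, '2020-07-01');
""")

distinct_count = conn.execute("""
    SELECT COUNT(MATNR) FROM (
        SELECT DISTINCT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG
        FROM FKT_LAB) AS TEMP1
""").fetchone()[0]

# One row per group, carrying the latest date as an aggregate (no HAVING needed).
latest = conn.execute("""
    SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
           MAX(LAB_PDATUM) AS maxLAB_PDATUM
    FROM FKT_LAB
    GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
    ORDER BY MATNR
""").fetchall()

print(distinct_count, len(latest))  # 2 2
print(latest)
```

ISO-formatted date strings are used so that MAX() on text orders them chronologically.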

WHY don't aggregate functions work, unless using GROUP BY statement?

To calculate the price of invoices (whose invoice items are stored in a separate table linked to the invoices), I wrote this query:
SELECT `i`.`id`, SUM(ii.unit_price * ii.quantity) invoice_price
FROM (`invoice` i)
JOIN `invoiceitem` ii
ON `ii`.`invoice_id` = `i`.`id`
WHERE `i`.`user_id` = '$user_id'
But it returned only ONE row.
After some research, I found that I had to add GROUP BY i.id at the end of the query. With it, the results were as expected.
In my opinion, even without GROUP BY i.id nothing is lost, and it should work well!
Please tell me in a few simple sentences...
Why should I always add the extra GROUP BY i.id? What is lost without it? And, perhaps the most practical question, how can I remember that I have left out the GROUP BY?!

You have to include the GROUP BY because many IDs went into the sum. If you don't specify it, MySQL just picks one of them (often the first) and sums across the entire result set. GROUP BY tells MySQL to sum (or, generically, aggregate) for each grouped-by entity.
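The difference is easy to see in a small sketch using Python's sqlite3 module as a stand-in for MySQL, with invented invoices and items. The query text follows the question, except that a ? placeholder replaces the interpolated '$user_id'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented data: user 7 owns two invoices.
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE invoiceitem (invoice_id INTEGER, unit_price REAL, quantity INTEGER);
    INSERT INTO invoice VALUES (1, 7), (2, 7);
    INSERT INTO invoiceitem VALUES (1, 2.5, 4), (1, 1.0, 2), (2, 10.0, 1);
""")

base = """
    SELECT i.id, SUM(ii.unit_price * ii.quantity) AS invoice_price
    FROM invoice i
    JOIN invoiceitem ii ON ii.invoice_id = i.id
    WHERE i.user_id = ?
"""
# Without GROUP BY, all joined rows form one group: one row, grand total.
no_group = conn.execute(base, (7,)).fetchall()
# With GROUP BY, there is one row (and one SUM) per invoice.
per_invoice = conn.execute(base + " GROUP BY i.id ORDER BY i.id", (7,)).fetchall()

print(len(no_group), no_group[0][1])  # 1 22.0
print(per_invoice)                    # [(1, 12.0), (2, 10.0)]
```

The id shown in the single ungrouped row is not meaningful, since it comes from just one of the rows that went into the total.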

Why should I always use GROUP BY?
SUM() and others are Aggregate Functions. Their very nature requires that they be used in combination with GROUP BY.
What is lost without it?
From the documentation:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
In the end, there is nothing to remember, as these are GROUP BY aggregate functions. You will quickly tell that you have forgotten the GROUP BY when the result (incorrectly) covers the entire result set instead of your grouped subsets.

mysql max query then JOIN?

I have followed the tizag tutorial on the MySQL MAX() function and written the query below, which does exactly what I need. The only trouble is that I need to JOIN it to two more tables so I can work with all the rows I need.
$query = "SELECT idproducts, MAX(date) FROM results GROUP BY idproducts ORDER BY MAX(date) DESC";
I have this query below, which has the JOIN I need and works:
$query = ("SELECT *
FROM operators
JOIN products
ON operators.idoperators = products.idoperator JOIN results
ON products.idProducts = results.idproducts
ORDER BY drawndate DESC
LIMIT 20");
Could someone show me how to merge the top query with the JOIN element of my second query? I am new to PHP and MySQL, this being my first adventure into a computer language. I have read up and tried really hard to get those two queries to work, but I have hit a brick wall. I cannot work out how to add the JOIN element to the first query :(
Could some kind person take pity on a newb and help me?

Try this query.
SELECT
*
FROM
operators
JOIN products
ON operators.idoperators = products.idoperator
JOIN
(
SELECT
idproducts,
MAX(date) AS max_date
FROM results
GROUP BY idproducts
) AS t
ON products.idproducts = t.idproducts
ORDER BY drawndate DESC
LIMIT 20
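A runnable sketch of the same shape, using Python's sqlite3 module as a stand-in for MySQL. The schemas, the name columns and the rows are all hypothetical; only the key columns come from the question. Because it is not known which table holds drawndate, this sketch orders by the aggregated t.max_date instead, which matches the ORDER BY MAX(date) DESC of the asker's first query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas and invented rows.
conn.executescript("""
    CREATE TABLE operators (idoperators INTEGER, name TEXT);
    CREATE TABLE products (idProducts INTEGER, idoperator INTEGER, name TEXT);
    CREATE TABLE results (idproducts INTEGER, date TEXT);
    INSERT INTO operators VALUES (1, 'Op A'), (2, 'Op B');
    INSERT INTO products VALUES (10, 1, 'Widget'), (20, 2, 'Gadget');
    INSERT INTO results VALUES (10, '2023-01-01'), (10, '2023-05-01'), (20, '2023-03-15');
""")

rows = conn.execute("""
    SELECT o.name AS operator, p.name AS product, t.max_date
    FROM operators o
    JOIN products p ON o.idoperators = p.idoperator
    -- Aggregate results first: one row per product with its latest date.
    JOIN (SELECT idproducts, MAX(date) AS max_date
          FROM results
          GROUP BY idproducts) AS t
      ON p.idProducts = t.idproducts
    ORDER BY t.max_date DESC
    LIMIT 20
""").fetchall()
for row in rows:
    print(row)
```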

JOINs function somewhat independently of aggregate functions; they just change the intermediate result-set on which the aggregate functions operate. I like to point to the way the MySQL documentation is written: it uses the term 'table_reference' in the SELECT syntax and expands on what that means in the JOIN syntax. Basically, in any simple query that specifies a table, you can expand that table into a complete JOIN clause, and the query will operate in the same basic way, just with a modified intermediate result-set.
I say "intermediate result-set" to hint at the mindset that helped me understand JOINs and aggregation. Understanding the order in which MySQL builds your final result is critical to reliably getting the results you want. Generally, it starts by looking at the first row of the first table you specify after FROM and decides whether it might match by looking at the WHERE clauses. If the row is not immediately discardable, MySQL attempts to JOIN it to the first JOIN specified and repeats the "will this be discarded by WHERE?" check. This repeats for all JOINs, which either add rows to your result set, remove them, or leave just the one, as appropriate for your JOINs, WHEREs and data. This process builds what I am referring to as the "intermediate result-set". Somewhere between starting and finishing your complete query, MySQL holds in its memory a potentially massive table-like structure of data built by the process I just described. Only then does it begin to aggregate (GROUP) the results according to your criteria.
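A small illustration of that point, sketched with Python's sqlite3 module and invented products and results rows (the price column is hypothetical): the same SUM gives a different answer once a JOIN has multiplied the rows it runs over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; product 10 has two matching result rows.
conn.executescript("""
    CREATE TABLE products (idProducts INTEGER, price REAL);
    CREATE TABLE results (idproducts INTEGER, date TEXT);
    INSERT INTO products VALUES (10, 5.0), (20, 7.0);
    INSERT INTO results VALUES (10, '2023-01-01'), (10, '2023-05-01'), (20, '2023-03-15');
""")

# Aggregating products alone: the aggregate sees 2 rows.
alone = conn.execute("SELECT COUNT(*), SUM(price) FROM products").fetchone()

# After the JOIN, product 10 appears once per matching result row,
# so the aggregate runs over 3 intermediate rows instead of 2.
joined = conn.execute("""
    SELECT COUNT(*), SUM(p.price)
    FROM products p JOIN results r ON p.idProducts = r.idproducts
""").fetchone()

print(alone, joined)  # (2, 12.0) (3, 17.0)
```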
So for your query, it depends on what specifically you are going for (which is not entirely clear in the OP). If you just want MAX(date) in the second query, you can add that expression to its SELECT clause and then add an aggregation spec at the end:
SELECT *, MAX(date)
FROM operators
...
GROUP BY idproducts
ORDER BY ...
Alternatively, you can add the JOIN section of the second query to the first.
