LINQ to SQL does not generate ORDER BY when DISTINCT is used?

LINQ to SQL does not generate ORDER BY when DISTINCT is used? - linq-to-sql

The following basic LINQ to SQL statement does not result in the orderby working. As you can see in the T-SQL there is no orderby. Do you know why?
LINQ to SQL:
var results = (from stats in db.t_harvest_statistics
orderby stats.unit_number
select stats.unit_number).Distinct().ToList();
Above Results in following TSQL
SELECT
[Distinct1].[unit_number] AS [unit_number]
FROM ( SELECT DISTINCT
[Extent1].[unit_number] AS [unit_number]
FROM [dbo].[t_harvest_statistics] AS [Extent1]
) AS [Distinct1]

That is a limitation of SQL and Relational Algebra of where the ORDER BY is in relation to the DISTINCT.
The ORDER BY must be "further out" in the SQL (at the "top level") since it's a view operation. While one can write SQL that has the ORDER BY "further in", in relationship to a RA operation, it often results in Undefined Behavior (that sometimes works). In this light it makes sense that Linq2Sql is free to ignore the ORDER BY although, perhaps an exception would be better... it would be less subtle anyway ;-) (Actually, this same issue exists for any Linq provider that does not provide a "stricter" definition of Distinct.)
Remove the Distinct() and the Linq2Sql should once again generate the ORDER BY as expected. The solution is just to switch the order of operations so the ORDER BY is once again at the "top level".
This is covered by the article Use of Distinct and OrderBy in LINQ:
This behavior might appear strange. The problem is that the Distinct operator does not grant that it will maintain the original order of values. Applied to LINQ to SQL, this mean that a sort constraint can be ignored in the case of a query like queryA.
The solution is pretty s[i]mple: put the OrderBy operator after the Distinct one, like in the following queryB definition:
var queryB =
(from o in db.Orders
select o.Employee.LastName)
.Distinct().OrderBy( n => n );
Happy coding.

I got same issue while sorting year from transaction table.
try this
var results = (from stats in db.t_harvest_statistics
select stats.unit_number).Distinct().OrderBy(x =(Int16)x.unit_number).ToList();
after getting distinct value use orderby method

Related

Using "dynamic" variables in a MySQL query [duplicate]

I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?

You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.

You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.

Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.

Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);

This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/

You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.

What is the order of execution of "SELECT" To "ORDER BY"? [duplicate]

I am confused with the order of execution of this query, please explain me this.
I am confused with when the join is applied, function is called, a new column is added with the Case and when the serial number is added. Please explain the order of execution of all this.
select Row_number() OVER(ORDER BY (SELECT 1)) AS 'Serial Number',
EP.FirstName,Ep.LastName,[dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole) as RoleName,
(select top 1 convert(varchar(10),eventDate,103)from [3rdi_EventDates] where EventId=13) as EventDate,
(CASE [dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole)
WHEN '90 Day Client' THEN 'DC'
WHEN 'Association Client' THEN 'DC'
WHEN 'Autism Whisperer' THEN 'DC'
WHEN 'CampII' THEN 'AD'
WHEN 'Captain' THEN 'AD'
WHEN 'Chiropractic Assistant' THEN 'AD'
WHEN 'Coaches' THEN 'AD'
END) as Category from [3rdi_EventParticipants] as EP
inner join [3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId
where EP.EventId = 13
and userid in (
select distinct userid from userroles
--where roleid not in(6,7,61,64) and roleid not in(1,2))
where roleid not in(19, 20, 21, 22) and roleid not in(1,2))
This is the function which is called from the above query.
CREATE function [dbo].[GetBookingRoleName]
(
#UserId as integer,
#BookingId as integer
)
RETURNS varchar(20)
as
begin
declare #RoleName varchar(20)
if #BookingId = -1
Select Top 1 #RoleName=R.RoleName From UserRoles UR inner join Roles R on UR.RoleId=R.RoleId Where UR.UserId=#UserId and R.RoleId not in(1,2)
else
Select #RoleName= RoleName From Roles where RoleId = #BookingId
return #RoleName
end

Queries are generally processed in the follow order (SQL Server). I have no idea if other RDBMS's do it this way.
FROM [MyTable]
ON [MyCondition]
JOIN [MyJoinedTable]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
ORDER BY [...]

SQL is a declarative language. The result of a query must be what you would get if you evaluated as follows (from Microsoft):
Logical Processing Order of the SELECT statement
The following steps show the logical
processing order, or binding order,
for a SELECT statement. This order
determines when the objects defined in
one step are made available to the
clauses in subsequent steps. For
example, if the query processor can
bind to (access) the tables or views
defined in the FROM clause, these
objects and their columns are made
available to all subsequent steps.
Conversely, because the SELECT clause
is step 8, any column aliases or
derived columns defined in that clause
cannot be referenced by preceding
clauses. However, they can be
referenced by subsequent clauses such
as the ORDER BY clause. Note that the
actual physical execution of the
statement is determined by the query
processor and the order may vary from
this list.
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
The optimizer is free to choose any order it feels appropriate to produce the best execution time. Given any SQL query, is basically impossible to anybody to pretend it knows the execution order. If you add detailed information about the schema involved (exact tables and indexes definition) and the estimated cardinalities (size of data and selectivity of keys) then one can take a guess at the probable execution order.
Ultimately, the only correct 'order' is the one described ion the actual execution plan. See Displaying Execution Plans by Using SQL Server Profiler Event Classes and Displaying Graphical Execution Plans (SQL Server Management Studio).
A completely different thing though is how do queries, subqueries and expressions project themselves into 'validity'. For instance if you have an aliased expression in the SELECT projection list, can you use the alias in the WHERE clause? Like this:
SELECT a+b as c
FROM t
WHERE c=...;
Is the use of c alias valid in the where clause? The answer is NO. Queries form a syntax tree, and a lower branch of the tree cannot be reference something defined higher in the tree. This is not necessarily an order of 'execution', is more of a syntax parsing issue. It is equivalent to writing this code in C#:
void Select (int a, int b)
{
if (c = ...) then {...}
int c = a+b;
}
Just as in C# this code won't compile because the variable c is used before is defined, the SELECT above won't compile properly because the alias c is referenced lower in the tree than is actually defined.
Unfortunately, unlike the well known rules of C/C# language parsing, the SQL rules of how the query tree is built are somehow esoteric. There is a brief mention of them in Single SQL Statement Processing but a detailed discussion of how they are created, and what order is valid and what not, I don't know of any source. I'm not saying there aren't good sources, I'm sure some of the good SQL books out there cover this topic.
Note that the syntax tree order does not match the visual order of the SQL text. For example the ORDER BY clause is usually the last in the SQL text, but as a syntax tree it sits above everything else (it sorts the output of the SELECT, so it sits above the SELECTed columns so to speak) and as such is is valid to reference the c alias:
SELECT a+b as c
FROM t
ORDER BY c;

SQL query is not imperative but declarative, so you have no idea which the statement is executed first, but since SQL is evaluated by SQL query engines, most of the SQL engines follows similar process to obtain the results. You may have to understand how the query engine works internally to understand some SQL execution behavior.
Julia Evens has a great post explaining this, it is worth to check it out:
https://jvns.ca/blog/2019/10/03/sql-queries-don-t-start-with-select/

SQL is a declarative language, meaning that it tells the SQL engine what to do, not how. This is in contrast to an imperative language such as C, in which how to do something is clearly laid out.
This means that not all statements will execute as expected. Of particular note are boolean expressions, which may not evaluate from left-to-right as written. For example, the following code is not guaranteed to execute without a divide by zero error:
SELECT 'null' WHERE 1 = 1 OR 1 / 0 = 0
The reason for this is the query optimizer chooses the best (most efficient) way to execute a statement. This means that, for example, a value may be loaded and filtered before a transforming predicate is applied, causing an error. See the second link above for an example
See: here and here.

"Order of execution" is probably a bad mental model for SQL queries. Its hard to actually write a single query that would actually depend on order of execution (this is a good thing). Instead you should think of all join and where clauses happening simultaneously (almost like a template)
That said you could run display the Execution Plans which should give you insight into it.
However since its's not clear why you want to know the order of execution, I'm guessing your trying to get a mental model for this query so you can fix it in some way. This is how I would "translate" your query, although I've done well with this kind of analysis there's some grey area with how precise it is.
FROM AND WHERE CLAUSE
Give me all the Event Participants rows. from [3rdi_EventParticipants
Also give me all the Event Signup rows that match the Event Participants rows on SignUpID inner join 3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId
But Only for Event 13 EP.EventId = 13
And only if the user id has a record in the user roles table where the role id is not in 1,2,19,20,21,22
userid in (
select distinct userid from userroles
--where roleid not in(6,7,61,64) and roleid not in(1,2))
where roleid not in(19, 20, 21, 22) and roleid not in(1,2))
SELECT CLAUSE
For each of the rows give me a unique ID
Row_number() OVER(ORDER BY (SELECT 1)) AS 'Serial Number',
The participants First Name EP.FirstName
The participants Last Name Ep.LastName
The Booking Role name GetBookingRoleName
Go look in the Event Dates and find out what the first eventDate where the EventId = 13 that you find
(select top 1 convert(varchar(10),eventDate,103)from [3rdi_EventDates] where EventId=13) as EventDate
Finally translate the GetBookingRoleName in Category. I don't have a table for this so I'll map it manually (CASE [dbo].[GetBookingRoleName](ES.UserId,EP.BookingRole)
WHEN '90 Day Client' THEN 'DC'
WHEN 'Association Client' THEN 'DC'
WHEN 'Autism Whisperer' THEN 'DC'
WHEN 'CampII' THEN 'AD'
WHEN 'Captain' THEN 'AD'
WHEN 'Chiropractic Assistant' THEN 'AD'
WHEN 'Coaches' THEN 'AD'
END) as Category
So a couple of notes here. You're not ordering by anything when you select TOP. You should probably have na order by there. You could also just as easily put that in your from clause e.g.
from [3rdi_EventParticipants] as EP
inner join [3rdi_EventSignup] as ES on EP.SignUpId = ES.SignUpId,
(select top 1 convert(varchar(10),eventDate,103)
from [3rdi_EventDates] where EventId=13
Order by eventDate) dates

There is a logical order to evaluation of the query text, but the database engine can choose what order execute the query components based upon what is most optimal. The logical text parsing ordering is listed below. That is, for example, why you can't use an alias from SELECT clause in a WHERE clause. As far as the query parsing process is concerned, the alias doesn't exist yet.
FROM
ON
OUTER
WHERE
GROUP BY
CUBE | ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
See the Microsoft documentation (see "Logical Processing Order of the SELECT statement") for more information on this.

Simplified order for T-SQL -> SELECT statement:
1) FROM
2) Cartesian product
3) ON
4) Outer rows
5) WHERE
6) GROUP BY
7) HAVING
8) SELECT
9) Evaluation phase in SELECT
10) DISTINCT
11) ORDER BY
12) TOP
as I had done so far - same order was applicable in SQLite.
Source => SELECT (Transact-SQL)
... of course there are (rare) exceptions.

SQL Query: Joining on a SUM()

I'm trying to run a query that sums the value of items and then JOIN on the value of that SUM.
So in the below code, the Contract_For is what I'm trying to Join on, but I'm not sure if that's possible.
SELECT `items_value`.`ContractId` as `Contract`,
`items_value`.`site` as `SiteID`,
SUM(`items_value`.`value`) as `Contract_For`,
`contractitemlists`.`Text` as `Contracted_Text`
FROM items_value
LEFT JOIN contractitemlists ON (`items_value`.`Contract_For`) = `contractitemlists`.`Ref`;
WHERE `items_value`.`ContractID`='2';
When I've face similar issues in the past, I've just created a view that holds the SUM, then joined to that in another view.
At the moment, the above sample is meant to work for just one dummy value, but it's intended to be stored procedure, where the user selects the ContractID. The error I get at the moment is 'Unknown Column items_value.Contract_For

You cannot use aliases or aggregate using expressions from the SELECT clause anywhere but HAVING and ORDER BY*; you need to make the first "part" a subquery, and then JOIN to that.
It might be easier to understand, though a bit oversimplified and not precisely correct, if you look at it this way as far as order of evaluation goes...
FROM (Note: JOIN is only within a FROM)
WHERE
GROUP BY
SELECT
HAVING
ORDER BY
In actual implementation, "under the hood", most SQL implementations actually use information from each section to optimize other sections (like using some where conditions to reduce records JOINed in a FROM); but this is the conceptual order that must be adhered to.
*In some versions of MSSQL, you cannot use aliases from the SELECT in HAVING or ORDER BY either.
Your query needs to be something like this:
SELECT s.*
, `cil`.`Text` as `Contracted_Text`
FROM (
SELECT `iv`.`ContractId` as `Contract`
, `iv`.`site` as `SiteID`
, SUM(`iv`.`value`) as `Contract_For`
FROM items_value AS iv
WHERE `iv`.`ContractID`='2'
) AS s
LEFT JOIN contractitemlists AS cil ON `s`.`Contract_For` = cil.`Ref`
;
But as others have mentioned, the lack of a GROUP BY is something to be looked into; as in "what if there are multiple site values."

Sql syntax: select without from clause as subquery in select (subselect)

While editing some queries to add alternatives for columns without values, I accidentally wrote something like this (here is the simplyfied version):
SELECT id, (SELECT name) FROM t
To my surprise, MySQL didn't throw any error, but completed the query giving my expected results (the name column values).
I tried to find any documentation about it, but with no success.
Is this SQL standard or a MySQL specialty?
Can I be sure that the result of this syntax is really the column value from the same (outer) table? The extended version would be like this:
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
but the EXPLAIN reports No tables used in the Extra column for the former version, which I think is very nice.
Here's a simple fiddle on SqlFiddle (it keeps timing out for me, I hope you have better luck).
Clarification: I know about subqueries, but I always wrote subqueries (correlated or not) that implied a table to select from, hence causing an additional step in the execution plan; my question is about this syntax and the result it gives, that in MySQL seems to return the expected value without any.

What you within your first query is a correlated subquery which simply returns the name column from the table t. no actual subquery needs to run here (which is what your EXPLAIN is telling you).
In a SQL database query, a correlated subquery (also known as a
synchronized subquery) is a subquery (a query nested inside another
query) that uses values from the outer query.
https://en.wikipedia.org/wiki/Correlated_subquery
SELECT id, (SELECT name) FROM t
is the same as
SELECT id, (SELECT t.name) FROM t
Your 2nd query
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
Also contains correlated subquery but this one is actually running a query on table t to find records where t1.id = t2.id.

This is the default behavior for the SQL language and it is defined on the SQL ANSI 2011 over ISO/IEC 9075-1:2011(en) documentation. Unfortunately it is not open. This behavior is described on the section 4.11 SQL-Statements.
This behavior happens because the databases process the select comand without the from clause, therefore if it encounters:
select id, (select name) from some
It will try to find that name field as a column of the outer queries to process.
Fortunately I remember that some while ago I've answered someone here and find a valid available link to an SQL ANSI document that is online in FULL but it is for the SQL ANSI 99 and the section may not be the same one as the new document. I think, did not check, that it is around the section 4.30. Take a look. And I really recommend the reading (I did that back in the day).
Database Language SQL - ISO/IEC 9075-2:1999 (E)

It's not standard. In oracle,
select 1, (select 2)
from dual
Throws error, ORA-00923: FROM keyword not found where expected
How can you be sure of your results? Get a better understanding of what the query is supposed to acheive before you write it. Even the exetended version in the question does not make any sense.

mysql max query then JOIN?

I have followed the tutorial over at tizag for the MAX() mysql function and have written the query below, which does exactly what I need. The only trouble is I need to JOIN it to two more tables so I can work with all the rows I need.
$query = "SELECT idproducts, MAX(date) FROM results GROUP BY idproducts ORDER BY MAX(date) DESC";
I have this query below, which has the JOIN I need and works:
$query = ("SELECT *
FROM operators
JOIN products
ON operators.idoperators = products.idoperator JOIN results
ON products.idProducts = results.idproducts
ORDER BY drawndate DESC
LIMIT 20");
Could someone show me how to merge the top query with the JOIN element from my second query? I am new to php and mysql, this being my first adventure into a computer language I have read and tried real hard to get those two queries to work, but I am at a brick wall. I cannot work out how to add the JOIN element to the first query :(
Could some kind person take pity on a newb and help me?

Try this query.
SELECT
*
FROM
operators
JOIN products
ON operators.idoperators = products.idoperator
JOIN
(
SELECT
idproducts,
MAX(date)
FROM results
GROUP BY idproducts
) AS t
ON products.idproducts = t.idproducts
ORDER BY drawndate DESC
LIMIT 20

JOINs function somewhat independently of aggregation functions, they just change the intermediate result-set upon which the aggregate functions operate. I like to point to the way the MySQL documentation is written, which hints uses the term 'table_reference' in the SELECT syntax, and expands on what that means in JOIN syntax. Basically, any simple query which has a table specified can simply expand that table to a complete JOIN clause and the query will operate the same basic way, just with a modified intermediate result-set.
I say "intermediate result-set" to hint at the mindset which helped me understand JOINS and aggregation. Understanding the order in which MySQL builds your final result is critical to knowing how to reliably get the results you want. Generally, it starts by looking at the first row of the first table you specify after 'FROM', and decides if it might match by looking at 'WHERE' clauses. If it is not immediately discardable, it attempts to JOIN that row to the first JOIN specified, and repeats the "will this be discarded by WHERE?". This repeats for all JOINs, which either add rows to your results set, or remove them, or leaves just the one, as appropriate for your JOINs, WHEREs and data. This process builds what I am referring to when I say "intermediate result-set". Somewhere between starting and finishing your complete query, MySQL has in it's memory a potentially massive table-like structure of data which it built using the process I just described. Only then does it begin to aggregate (GROUP) the results according to your criteria.
So for your query, it depends on what specifically you are going for (not entirely clear in OP). If you simply want the MAX(date) from the second query, you can simply add that expression to the SELECT clause and then add an aggregation spec to the end:
SELECT *, MAX(date)
FROM operators
...
GROUP BY idproducts
ORDER BY ...
Alternatively, you can add the JOIN section of the second query to the first.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

LINQ to SQL does not generate ORDER BY when DISTINCT is used? - linq-to-sql

I got same issue while sorting year from transaction table. try this var results = (from stats in db.t_harvest_statistics select stats.unit_number).Distinct().OrderBy(x =(Int16)x.unit_number).ToList(); after getting distinct value use orderby method

Related

Using "dynamic" variables in a MySQL query [duplicate]

What is the order of execution of "SELECT" To "ORDER BY"? [duplicate]

SQL Query: Joining on a SUM()

Sql syntax: select without from clause as subquery in select (subselect)

mysql max query then JOIN?

Categories

Resources