I am trying to convert the MySQL query below to SQL Server.
SELECT
@a := @a + 1 serial_number,
a.id,
a.file_assign_count
FROM usermaster a,
workgroup_master b,
(
SELECT @a := 0
) AS c
WHERE a.wgroup = b.id
AND file_assign_count > 0
I understand that the := operator in MySQL assigns a value to a variable and returns that value immediately. But how can I simulate this behavior in SQL Server?
SQL Server 2005 and later support the standard ROW_NUMBER() function, so you can do it this way:
SELECT ROW_NUMBER() OVER (ORDER BY xxxxx) AS serial_number,
a.id,
a.file_assign_count
FROM usermaster a
JOIN workgroup_master b ON a.wgroup = b.id
WHERE file_assign_count > 0
Re your comments: I edited the above to show the OVER clause. The row-numbering only has any meaning if you define the sort order. Your original query didn't do this, which means it's up to the RDBMS what order the rows are returned in.
But when using ROW_NUMBER() you must be specific. Where I put xxxxx, you would put a column or expression to define the sort order. See explanation and examples in the documentation.
The subquery setting @a := 0 is only for initializing that variable, and it doesn't need to be joined into the query anyway. It's just a style that some developers use. This is not needed in SQL Server, because you don't need the user variable at all when you can use ROW_NUMBER() instead.
If the SQL Server database is returning two rows where your MySQL database returned one row, the data must be different, because neither ROW_NUMBER() nor the MySQL user variables would limit the number of rows returned.
P.S.: please use JOIN syntax. It has been standard since 1992.
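To try the ROW_NUMBER() version without a SQL Server instance, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample rows are invented, ORDER BY a.id is only my stand-in for the xxxxx placeholder, and it assumes the SQLite bundled with your Python is 3.25 or newer (needed for window functions).

```python
import sqlite3

# In-memory stand-in for the two tables from the question (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workgroup_master (id INTEGER PRIMARY KEY);
    CREATE TABLE usermaster (
        id INTEGER PRIMARY KEY,
        wgroup INTEGER,
        file_assign_count INTEGER
    );
    INSERT INTO workgroup_master (id) VALUES (1), (2);
    INSERT INTO usermaster (id, wgroup, file_assign_count) VALUES
        (10, 1, 3), (11, 1, 0), (12, 2, 7), (13, 2, 1);
""")

# ORDER BY a.id is an assumption; put whatever defines your sort order there.
rows = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY a.id) AS serial_number,
           a.id,
           a.file_assign_count
    FROM usermaster a
    JOIN workgroup_master b ON a.wgroup = b.id
    WHERE a.file_assign_count > 0
    ORDER BY serial_number
""").fetchall()

for row in rows:
    print(row)  # (1, 10, 3), then (2, 12, 7), then (3, 13, 1)
```

The row with file_assign_count = 0 is filtered out before numbering, so the serial numbers stay gapless.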
Related
I need some assistance building a SQL statement that will output the top 5 retired assets per client that can be put into a SQL View.
I have built the following SQL statement but it will not work within a view and need an alternative.
SET @row_number := 0;
SELECT DISTINCT NAME, RetiredDate, COMMENT,
@row_number:=CASE WHEN @client_ID=clientID THEN @row_number+1 ELSE 1 END AS num,
@client_ID:=clientID ClientID
FROM `retiredassets`
WHERE `retiredassets`.`ClientID` IN(SELECT clientID FROM `clients`)
HAVING num <=5
Does anyone have any suggestions for me? The above statement works flawlessly but cannot work within a SQL View.
In MySQL 8.0 this should be:
WITH cte AS (
SELECT r.NAME, r.RetiredDate, r.COMMENT,
ROW_NUMBER() OVER (PARTITION BY r.ClientID ORDER BY ...) AS num,
r.ClientID
FROM retiredassets r
JOIN clients USING (ClientID)
)
SELECT * FROM cte WHERE num <= 5;
I left ... because I don't know what your ordering is. Your original query doesn't specify an order, so you'll have to choose one.
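Here is a rough, runnable sketch of the same top-N-per-client pattern, using Python's sqlite3 as a stand-in for MySQL 8.0. The sample rows, the TOP_N value of 2, and ordering by RetiredDate DESC are all my assumptions for illustration; your real ordering goes where the ... is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (ClientID INTEGER PRIMARY KEY);
    CREATE TABLE retiredassets (
        NAME TEXT, RetiredDate TEXT, COMMENT TEXT, ClientID INTEGER
    );
    INSERT INTO clients (ClientID) VALUES (1), (2);
    INSERT INTO retiredassets (NAME, RetiredDate, COMMENT, ClientID) VALUES
        ('laptop-a', '2020-01-05', '', 1),
        ('laptop-b', '2020-03-01', '', 1),
        ('laptop-c', '2020-02-10', '', 1),
        ('phone-a',  '2019-12-31', '', 2);
""")

TOP_N = 2  # hypothetical; the question uses 5

# Number the rows within each client, then keep only the first TOP_N per client.
rows = conn.execute("""
    WITH cte AS (
        SELECT r.NAME, r.RetiredDate, r.COMMENT,
               ROW_NUMBER() OVER (
                   PARTITION BY r.ClientID
                   ORDER BY r.RetiredDate DESC   -- assumed ordering
               ) AS num,
               r.ClientID
        FROM retiredassets r
        JOIN clients c ON c.ClientID = r.ClientID
    )
    SELECT ClientID, NAME, num
    FROM cte
    WHERE num <= ?
    ORDER BY ClientID, num
""", (TOP_N,)).fetchall()

for row in rows:
    print(row)
```

Client 1 has three retired assets but only the two most recently retired ones come back; client 2 has one and keeps it.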
Re your comment.
If you can't upgrade to MySQL 8.0, and you can't use variables in a VIEW definition, then you can't use a VIEW. You will have to write out the full query when you use it, until you can upgrade to MySQL 8.0.
I was going through the answer to "How do you select every n-th row from mysql". I am not able to understand the initialisation in the following subquery.
SELECT
@row := @row +1 AS rownum, [column name]
FROM (
SELECT @row :=0) r, [table name]
How exactly the initialisation of
SELECT @row :=0
is working?
Is some kind of join happening between table ‘r’ and ‘table name’?
If I change above query as below, would there be any difference in the performance?
SET @row = 0;
SELECT
@row := @row +1 AS rownum, [column name] FROM [table name]
Please share your thoughts.
Using two statements, with the user-defined variable initialized in a separate statement, will give equivalent performance.
Instead of the SET statement, we could do
SELECT @row := 0
which would achieve the same result, assigning a value to the user-defined variable @row. The difference would be that MySQL needs to prepare a resultset to be returned to the client. We avoid that with the SET statement, which doesn't return a resultset.
With two separate statement executions, there's the overhead of sending an extra statement: parsing tokens, syntax check, semantic check, ... and returning the status to the client. It's a small amount of overhead. We aren't going to notice it onesie-twosie.
So performance will be equivalent.
I strongly recommend ditching the oldschool comma syntax for join operation, and using the JOIN keyword instead.
Consider the query:
SELECT t.foo
FROM r
CROSS
JOIN t
ORDER BY t.foo
What happens when the table r is guaranteed to contain exactly one row?
The query is equivalent to:
SELECT t.foo
FROM t
ORDER BY t.foo
We can use a SELECT query in place of a table or view. Consider for example:
SELECT v.foo
FROM ( SELECT t.foo
FROM t
) v
Also consider what happens with this query:
SELECT @foo := 0
There is no FROM clause (or Oracle-style FROM dual), so the query will return a single row. The expression in the SELECT list is evaluated... the constant value 0 is assigned to the user-defined variable @foo.
Consider this query:
SELECT 'bar'
FROM ( SELECT @foo := 0 ) r
Before the outer query runs, the SELECT inside the parens is executed. (MySQL calls it a "derived table" but more generically it's an inline view definition.) The net effect is that the constant 0 is assigned to the user-defined variable, and a single row is returned. So the outer query returns a single row.
If we understand that, we have what we need to understand what is happening here:
SELECT t.mycol
FROM ( SELECT @row := 0 ) r
CROSS
JOIN mytable t
ORDER
BY t.mycol
Inline view r is evaluated, the SELECT returns a single row, and the value 0 is assigned to user-defined variable @row. Since r is guaranteed to return a single row, we know that the Cartesian product (cross join) with mytable will result in one row for each row in mytable, effectively yielding just a copy of mytable.
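The "single-row inline view cross joined to a table gives back the table" part can be checked with a small sketch. This uses Python's sqlite3 as a stand-in, so there is no user-defined variable here: the inline view just returns a constant 0 under the made-up column name init, and the three sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (mycol TEXT);
    INSERT INTO mytable (mycol) VALUES ('b'), ('a'), ('c');
""")

# Plain query against the table.
plain = conn.execute(
    "SELECT t.mycol FROM mytable t ORDER BY t.mycol"
).fetchall()

# Same query, cross joined with an inline view that returns exactly one row.
crossed = conn.execute("""
    SELECT t.mycol
    FROM (SELECT 0 AS init) r
    CROSS JOIN mytable t
    ORDER BY t.mycol
""").fetchall()

print(plain)    # [('a',), ('b',), ('c',)]
print(crossed)  # same rows: one row in r times three rows in mytable
```

Both queries return the same three rows, which is why the initialization subquery in MySQL does not change the result set.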
To answer the question that wasn't asked:
The benefit of doing the initialization within the statement rather than a separate statement is that we now have a single statement that stands alone. It knocks out a dependency i.e. doesn't require a separate execution of a SET statement to assign the user defined variable. Which also cuts out a roundtrip to the database to prepare and execute a separate statement.
I want to calculate percentile_cont on this table.
In Oracle, the query would be
SELECT PERCENTILE_CONT(0.05) FROM sometable;
What would be its alternative in MariaDB/MySQL?
While MariaDB 10.3.3 has support for these functions in the form of window functions (see Lukasz Szozda's answer), you can emulate them using window functions in MySQL 8 as well:
SELECT DISTINCT first_value(matrix_value) OVER (
ORDER BY CASE WHEN p <= 0.05 THEN p END DESC /* NULLS LAST */
) x
FROM (
SELECT
matrix_value,
percent_rank() OVER (ORDER BY matrix_value) p
FROM some_table
) t;
I've blogged about this more in detail here.
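As a sanity check of the percent_rank() / first_value() emulation, here is a small sketch run through Python's sqlite3 (a stand-in for MySQL 8; it assumes the bundled SQLite is 3.25+ and, like MySQL, sorts NULLs last under DESC). The helper name percentile_by_rank and the ten sample values are mine. Note that it returns an actual value from the set and does not interpolate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (matrix_value INTEGER)")
conn.executemany(
    "INSERT INTO some_table (matrix_value) VALUES (?)",
    [(v,) for v in range(10, 101, 10)],  # 10, 20, ..., 100
)

def percentile_by_rank(conn, fraction):
    """Return the largest value whose percent_rank() is <= fraction."""
    row = conn.execute("""
        SELECT DISTINCT first_value(matrix_value) OVER (
            ORDER BY CASE WHEN p <= ? THEN p END DESC  -- NULLs sort last
        ) AS x
        FROM (
            SELECT matrix_value,
                   percent_rank() OVER (ORDER BY matrix_value) AS p
            FROM some_table
        ) AS t
    """, (fraction,)).fetchone()
    return row[0]

print(percentile_by_rank(conn, 0.05))  # 10
print(percentile_by_rank(conn, 0.5))   # 50
```

With ten rows, percent_rank() takes the values 0, 1/9, 2/9, and so on, so for 0.5 the last qualifying rank is 4/9, which belongs to the value 50.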
MariaDB 10.3.3 introduced PERCENTILE_CONT, PERCENTILE_DISC, and MEDIAN windowed functions.
PERCENTILE_CONT
PERCENTILE_CONT() (standing for continuous percentile) is an ordered set aggregate function which can also be used as a window function. It returns a value which corresponds to the given fraction in the sort order. If required, it will interpolate between adjacent input items.
SELECT name, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY star_rating)
OVER (PARTITION BY name) AS pc
FROM book_rating;
There is no built in function for this in either MariaDB or MySQL, so you have to solve this on the SQL level (or by adding a user defined function written in C ...)
This might help with coming up with a SQL solution:
http://rpbouman.blogspot.de/2008/07/calculating-nth-percentile-in-mysql.html
MariaDB 10.2 has windowing functions.
For MySQL / older MariaDB, and assuming you just want the Nth percentile for a single set of values.
This is best done from app code, but could be built into a stored routine.
Count the total number of rows: SELECT COUNT(*) FROM tbl.
Construct and execute a SELECT with LIMIT n,1 where n is computed as the percentile times the count, then filled into the query.
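The two steps above could look something like this from app code. It is only a sketch: it uses Python's sqlite3 as a stand-in database, the column name val and the helper name nth_percentile are invented, and clamping the offset to the last row is my own choice for the 100% edge case. MySQL's LIMIT n,1 is written here as LIMIT 1 OFFSET n.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (val INTEGER)")
conn.executemany(
    "INSERT INTO tbl (val) VALUES (?)",
    [(v,) for v in range(1, 101)],  # 1..100
)

def nth_percentile(conn, fraction):
    """Pick the row at offset fraction * COUNT(*) in sorted order (no interpolation)."""
    # Step 1: count the total number of rows.
    (total,) = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()
    if total == 0:
        return None
    # Step 2: compute n, clamp it to the last row, and fetch that single row.
    offset = min(int(fraction * total), total - 1)
    row = conn.execute(
        "SELECT val FROM tbl ORDER BY val LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()
    return row[0]

print(nth_percentile(conn, 0.5))  # 51
```

Whether offset n or n - 1 is the "right" row depends on which percentile definition you want, so treat the exact rounding as something to decide for your data.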
If you need to interpolate between two values, it gets messier. Do you need that, too?
SQL Server 2008 R2 - Query from 2014 SSMS but fails from code as well.
Strange - first reference to table B works, second fails with an 'Invalid object B' error. What am I doing wrong? GO's don't help.
WITH B as (SELECT BatchOutId, SettleMerchantCode, BatchDate, BatchStatusCode, BatchTransCnt, BatchTotAmt, BatchAdjustAmt, BatchAdjustCnt
FROM MAF01
GROUP BY BatchOutId, SettleMerchantCode, BatchDate, BatchStatusCode, BatchTransCnt, BatchTotAmt, BatchAdjustAmt, BatchAdjustCnt)
SELECT * FROM B ORDER BY BatchOutId DESC
SELECT * FROM B ORDER BY BatchOutId DESC
This is as expected.
CTEs are only in scope for the next statement. They are just named queries.
You would need to either
Repeat the definition of the CTE.
Move the definition out into a view or inline function.
Materialise the results into a temp table.
Depending on what you were expecting to happen.
A cte is only valid for one query, not an entire batch. So once you do the first SELECT * FROM B, that query is finished. The next query no longer has access to the cte that the first query used.
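The one-statement scope is easy to reproduce. The sketch below uses Python's sqlite3 rather than SQL Server, with a cut-down, made-up MAF01 table, so the error text differs from 'Invalid object B' but the behaviour is the same: the second SELECT cannot see B. It then shows the "materialise into a temp table" option, using a hypothetical name B_tmp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MAF01 (BatchOutId INTEGER, BatchTotAmt REAL);
    INSERT INTO MAF01 (BatchOutId, BatchTotAmt) VALUES
        (1, 10.0), (2, 20.0), (2, 20.0);
""")

# First statement: the CTE is in scope, so this works.
first = conn.execute("""
    WITH B AS (
        SELECT BatchOutId, BatchTotAmt
        FROM MAF01
        GROUP BY BatchOutId, BatchTotAmt
    )
    SELECT * FROM B ORDER BY BatchOutId DESC
""").fetchall()

# Second statement: B no longer exists, so this fails.
try:
    conn.execute("SELECT * FROM B ORDER BY BatchOutId DESC")
    second_error = None
except sqlite3.OperationalError as exc:
    second_error = str(exc)

# One workaround: materialise the result once, then query it repeatedly.
conn.execute("""
    CREATE TEMP TABLE B_tmp AS
    SELECT BatchOutId, BatchTotAmt
    FROM MAF01
    GROUP BY BatchOutId, BatchTotAmt
""")
again1 = conn.execute("SELECT * FROM B_tmp ORDER BY BatchOutId DESC").fetchall()
again2 = conn.execute("SELECT * FROM B_tmp ORDER BY BatchOutId DESC").fetchall()

print(first)
print(second_error)
print(again1 == again2 == first)
```

In SQL Server the temp table would be a #temp table, as mentioned in the other answers.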
I know this has to be a duplicate.
You can have multiple CTEs but only one statement.
The two SELECTs are two statements.
If you use a #temp table then you can have more than one statement.
What is the predefined order in which the clauses are executed in MySQL? Is some of it decided at run time, and is this order correct?
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
The actual execution of MySQL statements is a bit tricky. However, the standard does specify the order of interpretation of elements in the query. This is basically in the order that you specify, although I think HAVING and GROUP BY could come after SELECT:
FROM clause
WHERE clause
SELECT clause
GROUP BY clause
HAVING clause
ORDER BY clause
This is important for understanding how queries are parsed. You cannot use a column alias defined in a SELECT in the WHERE clause, for instance, because the WHERE is parsed before the SELECT. On the other hand, such an alias can be in the ORDER BY clause.
As for actual execution, that is really left up to the optimizer. For instance:
. . .
GROUP BY a, b, c
ORDER BY NULL
and
. . .
GROUP BY a, b, c
ORDER BY a, b, c
both have the effect of the ORDER BY not being executed at all -- and so not executed after the GROUP BY (in the first case, the effect is to remove sorting from the GROUP BY and in the second the effect is to do nothing more than the GROUP BY already does).
This is how you can get the rough idea about how mysql executes the select query
DROP TABLE if exists new_table;
CREATE TABLE `new_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`testdecimal` decimal(6,2) DEFAULT NULL,
PRIMARY KEY (`id`));
INSERT INTO `new_table` (`testdecimal`) VALUES ('1234.45');
INSERT INTO `new_table` (`testdecimal`) VALUES ('1234.45');
set @mysqlorder := '';
select @mysqlorder := CONCAT(@mysqlorder," SELECT ") from new_table,(select @mysqlorder := CONCAT(@mysqlorder," FROM ")) tt
JOIN (select @mysqlorder := CONCAT(@mysqlorder," JOIN1 ")) t on ((select @mysqlorder := CONCAT(@mysqlorder," ON1 ")) or rand() < 1)
JOIN (select @mysqlorder := CONCAT(@mysqlorder," JOIN2 ")) t2 on ((select @mysqlorder := CONCAT(@mysqlorder," ON2 ")) or rand() < 1)
where ((select @mysqlorder := CONCAT(@mysqlorder," WHERE ")) or IF(new_table.testdecimal = 1234.45,true,false))
group by (select @mysqlorder := CONCAT(@mysqlorder," GROUPBY ")),id
having (select @mysqlorder := CONCAT(@mysqlorder," HAVING "))
order by (select @mysqlorder := CONCAT(@mysqlorder," ORDERBY "));
select @mysqlorder;
And here is the output from above mysql query, hope you can figure out the mysql execution of a SELECT query :-
FROM JOIN1 JOIN2 WHERE ON2 ON1 ORDERBY GROUPBY SELECT WHERE
ON2 ON1 ORDERBY GROUPBY SELECT HAVING HAVING
It appears that the generalized pattern in Standard SQL for Logical Query Processing Phase is (at least from SQL-92 - starting on p.177) :
from clause
joined table
where clause
group by clause
having clause
query specification (ie. SELECT)
You can find and download newer Standard SQL Standardization documents from here:
https://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F
For MSSQL (since it tends to stay fairly close to standard in my experience) the Logical Query Processing Phase is generally:
FROM <left_table>
ON <join_condition>
<join_type> JOIN <right_table>
WHERE <where_condition>
GROUP BY <group_by_list>
WITH {CUBE | ROLLUP}
HAVING <having_condition>
SELECT
DISTINCT
ORDER BY <order_by_list>
<TOP_specification> <select_list>
From: Chapter 1 of Ben-Gan, Itzik, et al., Inside Microsoft SQL Server 2005: T-SQL Querying, (Microsoft Press)
Also: Ben-Gan, Itzik, et al., Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA), (Microsoft Press)
It should be noted that MySQL can be configured to operate closer to standard as well if desired by setting the SQL Mode (although probably only recommended for fringe cases):
https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html
For MySQL, I searched both MySQL and MariaDB documentation and could find nothing other than the few statements that Gordon Linoff mentioned in passing that were in the MySQL documentation for SELECT. They are:
If ORDER BY occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future version of MySQL.
If LIMIT occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future version of MySQL.
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
From MySQL JOIN Documentation: Natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard
A quick skim of SQL-92, from "from clause" to "query specification", showed that the logic can be conditional at times depending on how the query is written. I could not find anything on this in the MySQL or MariaDB documentation (not saying it is not there, I just could not find it), and other articles on MySQL's Logical Query Processing Phase conflicted in their order. So it seems that the best way MySQL gives to determine some sort of Logical Query Processing Phase (or at least the steps used for join optimization for the query plan) for a specific query is to trace the query execution by doing the following (from MySQL documentation "Tracing The Optimizer/Typical Usage"):
# Turn tracing on (it's off by default):
SET optimizer_trace="enabled=on";
SELECT ...; # your query here
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
# possibly more queries...
# When done with tracing, disable it:
SET optimizer_trace="enabled=off";
You can interpret the results by looking at MySQL documentation's Tracing Example.
Basically, it appears that you want to look for "join_optimization" (which says that the from/join statements are being evaluated for the specific query, the specific query being the one stated as "Select#"), then look for "condition_processing: condition", and then "clause_processing: clause". As it says in the MySQL documentation for General Trace Structure:
A trace follows closely the actual execution path: there is a join-preparation object, a join-optimization object, a join-execution object, for each JOIN... It is far from showing everything happening in the optimizer, but we plan to show more information in the future.
Interestingly enough, I found that running a query like the following in MySQL suggested that its apparent processing order for query optimization was FROM, WHERE, HAVING, ORDER BY, and then GROUP BY:
SELECT
a.id
, max(a.timestamp)
FROM database.table AS a
LEFT JOIN database.table2 AS b on a.id = b.id
WHERE a.id > 1
GROUP BY a.id HAVING max(a.timestamp) > 0
ORDER BY a.id
I am assuming that since "condition_processing" and "clause_processing" are within the "Select#" group that these are processed before SELECT - which lines up with SQL-99, but it is an assumption.
In terms of operators and variables, Zawodny, Jeremy D., et al., High Performance MySQL, 2nd Edition, (O'Reilly) states that:
The := assignment operator has lower precedence than any other operator, so you have to be careful to parenthesize explicitly.
I only mention this because, when a query that works with one or more variables does not execute as expected, the cause may be the precedence of the assignment operator rather than the order of the Logical Query Processing Phase.
I think the execution order is like this:
(7) SELECT
(8) DISTINCT <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) HAVING <having_condition>
(9) ORDER BY <order_by_condition>
(10) LIMIT <limit_number>[, <offset_number>]
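One way to convince yourself of the WHERE, GROUP BY, HAVING, ORDER BY, LIMIT part of that order is to look at the result of a query that uses all of them. The sketch below uses Python's sqlite3 rather than MySQL, with an invented orders table, and it only shows that the result is consistent with this logical order; it says nothing about what the optimizer physically does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders (customer, amount) VALUES
        ('ann', 5), ('ann', 50), ('ann', 60),
        ('bob', 70), ('bob', 1),
        ('cat', 80), ('cat', 90), ('cat', 95);
""")

base_query = """
    SELECT customer, COUNT(*) AS big_orders
    FROM orders
    WHERE amount >= 10            -- rows are filtered before grouping
    GROUP BY customer
    HAVING COUNT(*) >= 2          -- groups are filtered after counting
    ORDER BY big_orders DESC, customer
"""

all_groups = conn.execute(base_query).fetchall()
top_group = conn.execute(base_query + " LIMIT 1").fetchall()

print(all_groups)  # [('cat', 3), ('ann', 2)]
print(top_group)   # [('cat', 3)]
```

ann is counted as 2 rather than 3 because WHERE removed her small order before GROUP BY ran, and bob is dropped by HAVING because only one of his rows survived WHERE. The SELECT alias big_orders is usable in ORDER BY, and LIMIT trims the already sorted result.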