You are given a table, BST, containing two columns: N and P, where N represents the value of a node in a binary tree, and P is the parent of N.
Question - Write a query to find the node type of each node in the binary tree, ordered by the value of the node. Output one of the following for each node:
Root: if the node is the root node.
Leaf: if the node is a leaf node.
Inner: if the node is neither a root nor a leaf node.
Solution 1 - using dot (.) notation on the alias: B.N = P
SELECT N,
CASE
WHEN P IS NULL THEN 'Root'
WHEN (SELECT COUNT(*) FROM BST WHERE B.N=P)>0 THEN 'Inner'
ELSE 'Leaf'
END AS PLACE
FROM BST B
ORDER BY N;
Solution 2 - without the dot operator:
SELECT N,
CASE
WHEN P IS NULL THEN 'Root'
WHEN (SELECT COUNT(*) FROM BST WHERE N=P)>0 THEN 'Inner'
ELSE 'Leaf'
END AS PLACE
FROM BST
ORDER BY N;
My question is:
Why does Solution 1 generate the correct answer? Is it due to the . (dot)? If so, why didn't we also use the dot on P (B.N = P)?
Even if I modify Solution 2 to use (BST.N = P), it still gives the wrong answer. Why is that?
I am confused about the usage of the . (dot).
You use BST twice in your query. The . tells the DBMS which instance you are using; when it is omitted, the DBMS has to choose one implicitly.
The table that is implicitly chosen happens not to be the same throughout your query.
With more explicit aliases, your query is:
SELECT N,
CASE
WHEN P IS NULL THEN 'Root'
WHEN (SELECT COUNT(*) FROM BST InsideAlias WHERE OutsideAlias.N=InsideAlias.P)>0 THEN 'Inner'
ELSE 'Leaf'
END AS PLACE
FROM BST OutsideAlias
ORDER BY N;
When you remove the aliases, the implicitly chosen instance of BST is:
Inside the subquery SELECT COUNT(*) FROM BST InsideAlias : InsideAlias
In the rest of your query: OutsideAlias (InsideAlias is out of scope for the rest of the query anyway).
Which means:
(SELECT COUNT(*) FROM BST InsideAlias WHERE N=P)
is equivalent to
(SELECT COUNT(*) FROM BST InsideAlias WHERE InsideAlias.N=InsideAlias.P)
Therefore, you are getting the wrong results because it requires a node to be its own parent for the COUNT(*) to be greater than 0.
Instead, OutsideAlias.N=InsideAlias.P translates to: is my node the parent of some other node? Another way to do the test is with EXISTS (SELECT * FROM BST WHERE OutsideAlias.N = P), although that was not your question.
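To see the difference concretely, here is a minimal sketch that runs both solutions, plus the EXISTS variant, against a small tree. It uses Python's built-in sqlite3 module only because it needs no server; the table and column names match the question, but the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample tree (not from the question):
#        5
#      /   \
#     2     8
#    / \
#   1   3
ROWS = [(1, 2), (3, 2), (2, 5), (8, 5), (5, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BST (N INTEGER, P INTEGER)")
conn.executemany("INSERT INTO BST VALUES (?, ?)", ROWS)

# Solution 1: B.N is the outer row, P is the inner row, so the subquery is correlated.
SOLUTION_1 = """
SELECT N,
       CASE
         WHEN P IS NULL THEN 'Root'
         WHEN (SELECT COUNT(*) FROM BST WHERE B.N = P) > 0 THEN 'Inner'
         ELSE 'Leaf'
       END AS PLACE
FROM BST B
ORDER BY N
"""

# Solution 2: N and P both resolve to the inner BST, so the subquery is NOT correlated.
SOLUTION_2 = """
SELECT N,
       CASE
         WHEN P IS NULL THEN 'Root'
         WHEN (SELECT COUNT(*) FROM BST WHERE N = P) > 0 THEN 'Inner'
         ELSE 'Leaf'
       END AS PLACE
FROM BST
ORDER BY N
"""

# The same correlated test written with EXISTS instead of COUNT(*).
WITH_EXISTS = """
SELECT N,
       CASE
         WHEN P IS NULL THEN 'Root'
         WHEN EXISTS (SELECT * FROM BST WHERE OutsideAlias.N = P) THEN 'Inner'
         ELSE 'Leaf'
       END AS PLACE
FROM BST OutsideAlias
ORDER BY N
"""

solution_1 = conn.execute(SOLUTION_1).fetchall()
solution_2 = conn.execute(SOLUTION_2).fetchall()
with_exists = conn.execute(WITH_EXISTS).fetchall()

print(solution_1)  # [(1, 'Leaf'), (2, 'Inner'), (3, 'Leaf'), (5, 'Root'), (8, 'Leaf')]
print(solution_2)  # [(1, 'Leaf'), (2, 'Leaf'), (3, 'Leaf'), (5, 'Root'), (8, 'Leaf')]
```

Solution 2 labels node 2 as a leaf because no row in the tree is its own parent, so its COUNT(*) is always 0.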
This is about correlated subqueries. A correlated subquery is a subquery that refers to a column of a table from the outer query. How do we distinguish the subquery's table from the outer query's table? There are two cases.
The inner table and the outer table are different tables (with different names): in this case, we simply use their table names, since they are distinct.
The inner table and the outer table are the same table (same table name): in this case, to distinguish them, we need to give the outer table an alias. (If you give only the inner table an alias, e.g. i, and write i.n = n, it means innertable.n = innertable.n, NOT innertable.n = outertable.n.)
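The parenthetical point about aliasing only the inner table is easy to check. This minimal sketch (Python's sqlite3, hypothetical sample rows, alias names i and o chosen for illustration) shows that i.N = N compares the inner row with itself, so every non-root node comes out as Inner, while aliasing the outer table gives the intended result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BST (N INTEGER, P INTEGER)")
# Hypothetical tree: 5 is the root, 2 and 8 are its children, 1 and 3 are children of 2.
conn.executemany("INSERT INTO BST VALUES (?, ?)",
                 [(1, 2), (3, 2), (2, 5), (8, 5), (5, None)])

# Only the inner table is aliased: the unqualified N resolves to i.N,
# so the condition is i.N = i.N, which is true for every inner row.
INNER_ALIAS_ONLY = """
SELECT N,
       CASE
         WHEN P IS NULL THEN 'Root'
         WHEN (SELECT COUNT(*) FROM BST i WHERE i.N = N) > 0 THEN 'Inner'
         ELSE 'Leaf'
       END
FROM BST
ORDER BY N
"""

# The outer table is aliased: o.N names the outer row, so the test is correlated.
OUTER_ALIAS = """
SELECT N,
       CASE
         WHEN P IS NULL THEN 'Root'
         WHEN (SELECT COUNT(*) FROM BST WHERE o.N = P) > 0 THEN 'Inner'
         ELSE 'Leaf'
       END
FROM BST o
ORDER BY N
"""

inner_alias_only = conn.execute(INNER_ALIAS_ONLY).fetchall()
outer_alias = conn.execute(OUTER_ALIAS).fetchall()
print(inner_alias_only)  # every non-root node is 'Inner'
print(outer_alias)       # only node 2 is 'Inner'
```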
Therefore, to answer your first question, see the comments beside the query:
SELECT N,
CASE
WHEN P IS NULL THEN 'Root'
WHEN (SELECT COUNT(*) FROM BST -- this is the table name of the subquery table which does not need an alias
WHERE B.N=P /* B.N = P means this subquery counts the rows of the inner BST whose P equals the N column of the current row of the outer table, which is aliased as B */)>0 THEN 'Inner'
ELSE 'Leaf'
END AS PLACE
FROM BST B -- this is the table name of the main query which needs an alias so it can be distinguishable in the correlated subquery
ORDER BY N;
Before answering your second question: how do we make two tables with the same name distinguishable? We give one of them a different name, which calls for an alias. If you write BST.N = P in the subquery (your second question doesn't say where you would put the condition; from the context I presume you mean the subquery's WHERE clause), then BST refers to the inner table, so BST.N = P is the same as N = P: both columns come from the inner table. To fix the issue, give the outer table an alias and use that alias as the prefix for the outer table's columns that are used in the subquery.
It has to do with namespaces -- where columns and expressions "live".
A SQL query may include multiple namespaces where columns and expressions are named and can be accessible.
In your queries there are two namespaces:
one for the main query.
another for the inner scalar subquery.
In the first query, columns in the main query can be referenced by prefixing them with the namespace B, as in B.<column>, while columns in the inner namespace can be referenced using the namespace BST, as in BST.<column>.
If a column name (or expression name) does not explicitly include a namespace, then the closest (innermost) accessible one wins.
In your second query you don't specify a namespace, so the columns N and P in the expression N = P both reference the inner table, and the subquery is not correlated with the main one. In the first query, B.N references a column of the main query, so the expression B.N = P compares columns from different table instances, and the subquery is correlated.
I'm trying to add features to a preexisting application and I came across a MySQL view something like this:
SELECT
AVG(table_name.col1),
AVG(table_name.col2),
AVG(table_name.col3),
table_name.personID,
table_name.col4
FROM table_name
GROUP BY table_name.personID;
OK, so there are a few aggregate functions. You can select personID because you're grouping by it. But it is also selecting a column that is not in an aggregate function and is not part of the GROUP BY clause. How is this possible? Does it just pick a random value? The values definitely aren't unique per group.
Where I come from (MSSQL Server), that's an error. Can someone explain this behavior to me and why it's allowed in MySQL?
It's true that this feature permits some ambiguous queries, and silently returns a result set with an arbitrary value picked from that column. In practice, it tends to be the value from the row within the group that is physically stored first.
These queries aren't ambiguous if you only choose columns that are functionally dependent on the column(s) in the GROUP BY criteria. In other words, if there can be only one distinct value of the "ambiguous" column per value that defines the group, there's no problem. This query would be illegal in Microsoft SQL Server (and ANSI SQL), even though it cannot logically result in ambiguity:
SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID;
Also, MySQL has an SQL mode to make it behave per the standard: ONLY_FULL_GROUP_BY
FWIW, SQLite also permits these ambiguous GROUP BY clauses, but it chooses the value from the last row in the group.†
† At least in the version I tested. Being arbitrary means that either MySQL or SQLite could change its implementation in the future and behave differently. You should therefore not rely on the current behavior in ambiguous cases like this; it's better to rewrite your queries to be deterministic and unambiguous. That's why MySQL 5.7 now enables ONLY_FULL_GROUP_BY by default.
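Rewriting the view "to be deterministic" can be sketched in two standard ways: aggregate the extra column explicitly, or add it to GROUP BY. The sketch below uses Python's sqlite3 only because it is built in, and the sample rows are hypothetical (person 1 deliberately has two different col4 values). SQLite itself accepts bare columns, so this shows the deterministic forms rather than the ONLY_FULL_GROUP_BY error. If any value from the group is genuinely acceptable, MySQL 5.7 and later also provide ANY_VALUE() to say so explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: col4 is NOT functionally dependent on personID.
conn.execute("CREATE TABLE table_name (personID INTEGER, col1 REAL, col4 TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)",
                 [(1, 10.0, 'a'), (1, 20.0, 'b'), (2, 30.0, 'c')])

# Option A: aggregate the extra column, so the engine has no arbitrary choice to make.
AGGREGATED = """
SELECT personID, AVG(col1), MIN(col4)
FROM table_name
GROUP BY personID
ORDER BY personID
"""

# Option B: group by every non-aggregated column (one row per personID/col4 pair).
FULL_GROUP_BY = """
SELECT personID, col4, AVG(col1)
FROM table_name
GROUP BY personID, col4
ORDER BY personID, col4
"""

aggregated = conn.execute(AGGREGATED).fetchall()
full_group_by = conn.execute(FULL_GROUP_BY).fetchall()
print(aggregated)     # [(1, 15.0, 'a'), (2, 30.0, 'c')]
print(full_group_by)  # [(1, 'a', 10.0), (1, 'b', 20.0), (2, 'c', 30.0)]
```

Which option is right depends on what the view is meant to report; option B changes the row count whenever col4 varies within a person.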
I should have Googled for just a bit longer... It seems I found my answer.
MySQL extends the use of GROUP BY so that you can use nonaggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. For example, you do not need to group on customer.name in the following query. In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant.
Still, that just seems... wrong.
Let's say you have a query like this:
SELECT g, v
FROM t
GROUP BY g;
In this case, for each possible value for g, MySQL picks one of the corresponding values of v.
However, which one is chosen, depends on some circumstances.
I read somewhere that for each group of g, the first value of v is kept, in the order in which the records were inserted into table t.
This is quite ugly, because the records in a table should be treated as a set where the order of the elements should not matter. This is so "mysql-ish"...
If you want to determine which value for v to keep, you need to apply a subselect for t like this:
SELECT g, v
FROM (
SELECT *
FROM t
ORDER BY g, v DESC
) q
GROUP BY g;
This way you define the order in which the outer query processes the subquery's records, so you can trust which value of v it will pick for each value of g.
However, be very careful if you need WHERE conditions. If you add the WHERE condition to the subquery, it keeps this behaviour and always returns the value you expect:
SELECT g, v
FROM (
SELECT *
FROM t
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
GROUP BY g;
This is what you expect: the subselect filters and orders the table. It keeps the records where g has the given value, and the outer query returns that g and the first value of v.
However, if you add the same WHERE condition to the outer query then you get a non-deterministic result:
SELECT g, v
FROM (
SELECT *
FROM t
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g;
Surprisingly, you may get different values for v when executing the same query again and again which is... strange. The expected behaviour is to get all the records in the appropriate order from the subquery, filtering them in the outer query and then picking the same as it picked in the previous example. But it does not.
It picks a value for v seemingly at random. When I executed the same query repeatedly (~20 times), it returned different values for v, but the distribution was not uniform.
If instead of adding an outer WHERE, you specify a HAVING condition like this:
SELECT g, v
FROM (
SELECT *
FROM t1
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g
HAVING g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9';
Then you get a consistent behaviour again.
CONCLUSION
I suggest not relying on this technique at all. If you really need it, avoid WHERE conditions in the outer query: put the condition in the inner query if you can, or use a HAVING clause in the outer query.
I tested it with this data:
CREATE TABLE t1 (
v INT,
g VARCHAR(36)
);
INSERT INTO t1 VALUES (1, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
INSERT INTO t1 VALUES (2, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
in MySQL 5.6.41.
Maybe this is a bug that has been fixed in newer versions; please give feedback if you have experience with them.
SELECT *
FROM personel
WHERE p_id IN (SELECT MIN(dbo.personel.p_id)
               FROM personel
               GROUP BY dbo.personel.p_adi);
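The query above is a deterministic alternative to relying on bare columns: for each p_adi it keeps the whole row with the smallest p_id. A minimal sketch with Python's sqlite3 follows; SQLite has no dbo schema, so the table is plain personel, and the city column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data (the city column is invented for illustration).
conn.execute("CREATE TABLE personel (p_id INTEGER PRIMARY KEY, p_adi TEXT, city TEXT)")
conn.executemany("INSERT INTO personel VALUES (?, ?, ?)",
                 [(1, 'Ali', 'Ankara'), (2, 'Ali', 'Izmir'),
                  (3, 'Ayse', 'Bursa'), (4, 'Ayse', 'Konya')])

# For every p_adi, MIN(p_id) picks exactly one row, so the result is always the same.
rows = conn.execute("""
SELECT *
FROM personel
WHERE p_id IN (SELECT MIN(p_id) FROM personel GROUP BY p_adi)
ORDER BY p_id
""").fetchall()

print(rows)  # [(1, 'Ali', 'Ankara'), (3, 'Ayse', 'Bursa')]
```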
I have a table Product:
Start End
A M
M T
T F
I need to get all preceding values of the input value. For example, if the input is T, the query should return A, M. If the input is M, it should return A. For F, the output should be A, M, T. I have tried a self join but could not get the result.
Not much of an answer, but too long for a comment...
(MySQL 8.0 and its recursive CTEs aside) there's no recursion built into MySQL. Instead, options include:
joining the table to itself as often as could possibly be required
switching to another model (e.g. Nested Set)
handling the recursion at the application level (e.g. with a bit of PHP)
Here's an example of the first option:
http://sqlfiddle.com/#!9/355414/4
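For the "MySQL 8.0 aside" case, a recursive CTE can walk the chain backwards without a fixed number of self-joins. This is a sketch run through Python's sqlite3, which supports WITH RECURSIVE; the CTE name chain and the depth column are illustrative. Start and End are quoted because END is an SQL keyword (in MySQL you would use backticks instead of double quotes), and the query assumes the data contains no cycles, since a cycle would make it recurse forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Product ("Start" TEXT, "End" TEXT)')
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [('A', 'M'), ('M', 'T'), ('T', 'F')])

# Anchor: the row whose End is the input value.
# Recursive step: the row whose End equals the current row's Start.
PRECEDING = """
WITH RECURSIVE chain("Start", "End", depth) AS (
    SELECT "Start", "End", 1 FROM Product WHERE "End" = ?
    UNION ALL
    SELECT p."Start", p."End", c.depth + 1
    FROM Product p
    JOIN chain c ON p."End" = c."Start"
)
SELECT "Start" FROM chain ORDER BY depth DESC
"""

def preceding(value):
    """Return every value that precedes `value` in the chain, earliest first."""
    return [row[0] for row in conn.execute(PRECEDING, (value,))]

print(preceding('T'))  # ['A', 'M']
print(preceding('F'))  # ['A', 'M', 'T']
```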
You could add an ordering column to identify the preceding rows; e.g. a primary key such as id will also work, provided the ids follow the order of the chain. After that you can execute a query like:
SELECT GROUP_CONCAT(start ORDER BY id) AS preceding_val FROM table_name WHERE id <= (SELECT id FROM table_name WHERE end = your_value);
This gives you one row with all the preceding Start values, comma-separated, for your given End value. (Note the <=: the row whose End is the input also contributes its Start, e.g. M for input T.)
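A quick check of the id-based query, sketched with Python's sqlite3. The id column is the hypothetical ordering column suggested above, filled in chain order. SQLite only supports ORDER BY inside GROUP_CONCAT in recent versions, so this sketch leaves it out and compares the result as a set; in MySQL, GROUP_CONCAT(start ORDER BY id) fixes the order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ordering column id, assigned in chain order (A->M, M->T, T->F).
conn.execute('CREATE TABLE table_name (id INTEGER PRIMARY KEY, "start" TEXT, "end" TEXT)')
conn.executemany('INSERT INTO table_name ("start", "end") VALUES (?, ?)',
                 [('A', 'M'), ('M', 'T'), ('T', 'F')])

# <= keeps the row whose end is the input, because its start also precedes the input.
QUERY = '''
SELECT GROUP_CONCAT("start") AS preceding_val
FROM table_name
WHERE id <= (SELECT id FROM table_name WHERE "end" = ?)
'''

def preceding(value):
    """Comma-separated preceding values, or None if `value` is not an End value."""
    (joined,) = conn.execute(QUERY, (value,)).fetchone()
    return joined

print(preceding('T'))
```

With a plain < the query would return only A for input T and nothing for input M, which is why the comparison includes the matching row.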
SELECT N, IF(P IS NULL,'Root',IF((SELECT COUNT(*) FROM BST WHERE P=B.N)>0,'Inner','Leaf'))
FROM BST AS B
ORDER BY N;
Here N and P are the column names, where N is the node and P is its parent, and BST is the name of the table. The above query finds the node type of each node in BST, but I am not able to understand what P=B.N means.
First, let me start by saying I really hope these are not the actual names you are using. If they are, do your future self a huge favor and replace them with readable names that actually describe the data the columns and tables hold.
That being said, B.N is the N column in the row of the outer query, since it's using B as an alias to the table name.
In the WHERE clause of the subquery, you are comparing the value of P with the value of N from the main query. Conceptually, this subquery runs once for each row of the main query, so for each row you get the count of rows whose parent is that row's N, i.e. whether this node is the parent of some other node.
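The "runs once per row" description can be mirrored with a plain nested loop. This is only a conceptual sketch in Python with hypothetical rows; a real database may optimize the correlated subquery differently, but the result must be the same.

```python
# Hypothetical rows (N, P) for a small BST; None marks the root.
BST = [(1, 2), (3, 2), (2, 5), (8, 5), (5, None)]

def node_types(rows):
    """Emulate the correlated subquery: one inner scan per outer row."""
    result = []
    for outer_n, outer_p in sorted(rows, key=lambda r: r[0]):  # FROM BST AS B ... ORDER BY N
        if outer_p is None:                                      # IF(P IS NULL, 'Root', ...)
            result.append((outer_n, 'Root'))
            continue
        # (SELECT COUNT(*) FROM BST WHERE P = B.N): inner P against the OUTER row's N
        children = sum(1 for _inner_n, inner_p in rows if inner_p == outer_n)
        result.append((outer_n, 'Inner' if children > 0 else 'Leaf'))
    return result

print(node_types(BST))  # [(1, 'Leaf'), (2, 'Inner'), (3, 'Leaf'), (5, 'Root'), (8, 'Leaf')]
```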
In
WHERE P=B.N
P is the "parent" column of the BST in the innermost SELECT statement.
B.N refers to the N ("node") column of the BST table referenced in the outer SELECT statement.
The clause
FROM BST AS B
creates B as the alias for the outer BST.
"FROM BST AS B" defines B as the name to use for the table BST in this query, and N has to be a column in that table, so B.N means:
the value of column N in the current row of the outer BST.
For each value of column N from table B, first count how many records have a P equal to that N; if the count is greater than 0, output Inner, otherwise Leaf.
I have this query:
SELECT (@a:=@a+1) AS priority
FROM (SELECT t1.name FROM t1 LIMIT 100) x, (SELECT @a:=0) r
A few questions:
1 - What is the comma doing between the SELECTs? I have never seen a comma between commands, and I don't know what it means.
2 - Why is the second SELECT given a name?
3 - Why is the second SELECT inside brackets?
4 - Performance-wise: does it select the first 100 rows from t1 and then assign them a number? What is going on here?
It is performing a CROSS JOIN (a Cartesian product of the rows) but without the explicit syntax. The following 2 queries produce identical results:
SELECT *
FROM TableA, TableB
SELECT *
FROM TableA
CROSS JOIN TableB
The query in the question uses 2 "derived tables" instead. I would encourage you to use the explicit join syntax CROSS JOIN and never use just commas. The biggest issue with using just commas is you have no idea if the Cartesian product is deliberate or accidental.
Both "derived tables" have been given an alias, and that is a good thing (MySQL in fact requires every derived table to have an alias). How else would you reference some item of the first or second "derived table"? E.g. imagine they were both queries that had the column ID in them; you would then be able to reference x.ID or r.ID.
Regarding what the overall query is doing: first note that the second derived table is just a single row. So even though the syntax produces a CROSS JOIN, it does not expand the total number of rows, because 100 * 1 = 100. In effect, the subquery "r" adds a "placeholder" @a (initially zero) to every row. Because @a is available on each row, you can increment its value by 1 per row, and as a result that column produces a row number.
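The row-count argument (100 * 1 = 100) and the equivalence of the comma and CROSS JOIN can be checked directly. This sketch uses Python's sqlite3 with a hypothetical 150-row t1; SQLite has no @ user variables, so it only demonstrates the join part. On MySQL 8.0 and later, ROW_NUMBER() OVER () is the supported way to number rows, since assigning user variables inside expressions is deprecated there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical t1 with 150 rows, so that LIMIT 100 actually trims it.
conn.execute("CREATE TABLE t1 (name TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(f"name{i}",) for i in range(150)])

X = "(SELECT name FROM t1 LIMIT 100) x"

# Comma syntax and explicit CROSS JOIN: 100 rows times 1 row.
comma_count = conn.execute(f"SELECT COUNT(*) FROM {X}, (SELECT 0 AS a) r").fetchone()[0]
cross_count = conn.execute(f"SELECT COUNT(*) FROM {X} CROSS JOIN (SELECT 0 AS a) r").fetchone()[0]

# With a 3-row derived table the Cartesian product grows to 100 times 3.
three_row_count = conn.execute(
    f"SELECT COUNT(*) FROM {X} CROSS JOIN "
    "(SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2) r"
).fetchone()[0]

print(comma_count, cross_count, three_row_count)  # 100 100 300
```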
x and r are effectively anonymous views produced by the SELECT statements. If you imagine that instead of using SELECTs in brackets, you defined a view using the select statement and then referred to the view, the syntax would be clear.
The selects are given names so that you can refer to these names in WHERE conditions, joins or in the list of fields to select.
That is the syntax. You have to have brackets.
Yes, it selects 100 rows (without an ORDER BY, which 100 rows you get is not guaranteed). I am not sure what you mean by "assigns them a number".