I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.
Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.
Related
I came up with this solution in my class by piecing together Internet knowledge. Please break this down for me I would love to know how I made it work. Specifically the t.s and the closing t.
SELECT
CourseType,
GPA,
NumberOfStudents * 100 / t.s AS `Percentage of Students`
FROM View1
CROSS JOIN
(
SELECT
SUM(NumberOfStudents) AS s
FROM View1) t;
Your query uses a subquery. A sub-query is a query that is done within another query. In your case, your subquery is:
(
SELECT
SUM(NumberOfStudents) AS s
FROM View1)
When you create subqueries, you need to give them an alias. An alias is just a name you give a subquery, so you can use it in the main query.
In your example, you named your subquery "t".
Fields can also have aliases. in your subquery, you created a field SUM(NumberOfStudents), and you named it s.
Going back to your question, you use the aliases to address fields inside the subquery. in your case, when you do 100 / t.s you are basically saying:
"I want to divide 100 by the field s from my subquery t".
The other concept that is important in your query is the Cross join. A cross join is the Cartesian product of two tables.
You can find a great and intuitive explanation of how a cross join works in the following link:
https://www.sqlshack.com/sql-cross-join-with-examples/#:~:text=The%20CROSS%20JOIN%20is%20used,also%20known%20as%20cartesian%20join.&text=The%20main%20idea%20of%20the,product%20of%20the%20joined%20tables.
I this case, the use is simpler than that. your subquery should return only one value, which is the sum of all students. And since a cross join basically pairs every row of one table with every row from the other, your cross join just provides a way to use the number of students as a constant value for the calculation of the percentage of students in the main query.
A better way to do this uses window functions:
SELECT v.CourseType, v.GPA,
v.NumberOfStudents * 100 / SUM(v.NumberOfStudents) OVER () AS Percentage_of_Students
FROM View1 v;
If you are learning SQL, you might as well learn the correct way to express logic.
Notes:
Use meaningful table aliases (abbreviations for the table/view names).
Qualify column references. This is less important in a query with only one table reference, but it is a good habit.
Window functions allow you to summarize data across multiple rows, without using an explicit JOIN.
In case I only provide a single value, are the following sql statements equivalent (eg in terms of performance)?
SELECT * FROM mytable where lastname IN(:lastnames);
SELECT * FROM mytable where lastname = :lastname;
Background: I have service that should serve a list, and a service that serves a single result. Now I thought why creating two database query endpoints, if I could achieve the same thing with just one query (means: also a single result could be queried by using the IN clause).
i tried it on my mariaDB database on a small table with hundred of records and the query with IN is a bit slower than the first one (which is to be expected) but we are talking of 0.02 sec difference
Assuming your db's engine is optimised and would check if there is one value inside the IN parameters and "convert" it to an equal/do the correct operation it would still be technically longer than just a written equal.
Also see this about IN performance.
Use This Query.
It May Be Solve Your Problem.
SELECT * FROM mytable where lastname IN(SELECT lastname FROM mytable where lastname = :lastname);
While editing some queries to add alternatives for columns without values, I accidentally wrote something like this (here is the simplyfied version):
SELECT id, (SELECT name) FROM t
To my surprise, MySQL didn't throw any error, but completed the query giving my expected results (the name column values).
I tried to find any documentation about it, but with no success.
Is this SQL standard or a MySQL specialty?
Can I be sure that the result of this syntax is really the column value from the same (outer) table? The extended version would be like this:
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
but the EXPLAIN reports No tables used in the Extra column for the former version, which I think is very nice.
Here's a simple fiddle on SqlFiddle (it keeps timing out for me, I hope you have better luck).
Clarification: I know about subqueries, but I always wrote subqueries (correlated or not) that implied a table to select from, hence causing an additional step in the execution plan; my question is about this syntax and the result it gives, that in MySQL seems to return the expected value without any.
What you within your first query is a correlated subquery which simply returns the name column from the table t. no actual subquery needs to run here (which is what your EXPLAIN is telling you).
In a SQL database query, a correlated subquery (also known as a
synchronized subquery) is a subquery (a query nested inside another
query) that uses values from the outer query.
https://en.wikipedia.org/wiki/Correlated_subquery
SELECT id, (SELECT name) FROM t
is the same as
SELECT id, (SELECT t.name) FROM t
Your 2nd query
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
Also contains correlated subquery but this one is actually running a query on table t to find records where t1.id = t2.id.
This is the default behavior for the SQL language and it is defined on the SQL ANSI 2011 over ISO/IEC 9075-1:2011(en) documentation. Unfortunately it is not open. This behavior is described on the section 4.11 SQL-Statements.
This behavior happens because the databases process the select comand without the from clause, therefore if it encounters:
select id, (select name) from some
It will try to find that name field as a column of the outer queries to process.
Fortunately I remember that some while ago I've answered someone here and find a valid available link to an SQL ANSI document that is online in FULL but it is for the SQL ANSI 99 and the section may not be the same one as the new document. I think, did not check, that it is around the section 4.30. Take a look. And I really recommend the reading (I did that back in the day).
Database Language SQL - ISO/IEC 9075-2:1999 (E)
It's not standard. In oracle,
select 1, (select 2)
from dual
Throws error, ORA-00923: FROM keyword not found where expected
How can you be sure of your results? Get a better understanding of what the query is supposed to acheive before you write it. Even the exetended version in the question does not make any sense.
I was doing an assignment where I had to convert SQL queries into Relational Algebra queries. I got stuck in converting the group by clause.
Could anyone please tell me how the group by clause can be written in relational algebra?
e.g.:
SELECT job, sal
FROM emp
GROUP BY job
;
Thanks!
Noting you want to get the sum of salary, in Tutorial D:
SUMMARIZE emp BY { job } ADD ( SUM ( sal ) AS total_sal )
Note aggregation is not a relational operator, hence will not form part of a relational algebra.
As for HAVING, is it a historical anomaly. Before the SQL-92 Standard, it was not possible to write SELECT expressions in the FROM clause (a.k.a derived tables) i.e. you had to do all work in one SELECT expression. Because of SQL's rigid evaluation order, the aggregate value doesn't come into existence after the WHERE clause has been evaluated i.e. it was impossible apply restriction based on aggregated values. HAVING was introduced to address this problem.
But even with HAVING, SQL remained relationally incomplete as regards Codd's until derived tables had been introduced. Derived tables rendered HAVINGredundant but using HAVING is still popular (if Stackoverflow is anything to go by): folk still seem to like to use a single SELECT where possible and SQL's aforementioned rigidity as regards evaluations order (projection is performed last in a SELECT expression) makes derived table usage quite verbose when compared to HAVING.
First of all your query is wrong you cannot select something that you did not group unless you use aggregation. I assume you want to get sum of the sal.
job F sum(sal), job(emp).
Why would someone use a group by versus distinct when there are no aggregations done in the query?
Also, does someone know the group by versus distinct performance considerations in MySQL and SQL Server. I'm guessing that SQL Server has a better optimizer and they might be close to equivalent there, but in MySQL, I expect a significant performance advantage to distinct.
I'm interested in dba answers.
EDIT:
Bill's post is interesting, but not applicable. Let me be more specific...
select a, b, c
from table x
group by a, b,c
versus
select distinct a,b,c
from table x
GROUP BY maps groups of rows to one row, per distinct value in specific columns, which don't even necessarily have to be in the select-list.
SELECT b, c, d FROM table1 GROUP BY a;
This query is legal SQL (correction: only in MySQL; actually it's not standard SQL and not supported by other brands). MySQL accepts it, and it trusts that you know what you're doing, selecting b, c, and d in an unambiguous way because they're functional dependencies of a.
However, Microsoft SQL Server and other brands don't allow this query, because it can't determine the functional dependencies easily. edit: Instead, standard SQL requires you to follow the Single-Value Rule, i.e. every column in the select-list must either be named in the GROUP BY clause or else be an argument to a set function.
Whereas DISTINCT always looks at all columns in the select-list, and only those columns. It's a common misconception that DISTINCT allows you to specify the columns:
SELECT DISTINCT(a), b, c FROM table1;
Despite the parentheses making DISTINCT look like function call, it is not. It's a query option and a distinct value in any of the three fields of the select-list will lead to a distinct row in the query result. One of the expressions in this select-list has parentheses around it, but this won't affect the result.
A little (VERY little) empirical data from MS SQL Server, on a couple of random tables from our DB.
For the pattern:
SELECT col1, col2 FROM table GROUP BY col1, col2
and
SELECT DISTINCT col1, col2 FROM table
When there's no covering index for the query, both ways produced the following query plan:
|--Sort(DISTINCT ORDER BY:([table].[col1] ASC, [table].[col2] ASC))
|--Clustered Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]))
and when there was a covering index, both produced:
|--Stream Aggregate(GROUP BY:([table].[col1], [table].[col2]))
|--Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]), ORDERED FORWARD)
so from that very small sample SQL Server certainly treats both the same.
In MySQL I've found using a GROUP BY is often better in performance than DISTINCT.
Doing an "EXPLAIN SELECT DISTINCT" shows "Using where; Using temporary " MySQL will create a temporary table.
vs a "EXPLAIN SELECT a,b, c from T1, T2 where T2.A=T1.A GROUP BY a" just shows "Using where"
Both would generate the same query plan in MS SQL Server.... If you have MS SQL Server you could just enable the actual execution plan to see which one is better for your needs ...
Please have a look at those posts:
http://blog.sqlauthority.com/2007/03/29/sql-server-difference-between-distinct-and-group-by-distinct-vs-group-by/
http://www.sqlmag.com/Article/ArticleID/24282/sql_server_24282.html
If you really are looking for distinct values, the distinct makes the source code more readable (like if it's part of a stored procedure) If I'm writing ad-hoc queries I'll usually start with the group by, even if I have no aggregations because I'll often end up putting them on.