I want to retrieve account information using relational algebra, sorted in descending order of the balance attribute.
Table:
Account (id, account_number, branch_name, balance)
How can the ORDER BY clause be represented in relational algebra?
I don't think that is possible: relational algebra operates on sets, which have no order. This book on SQL and relational theory states that "ORDER BY isn’t actually part of the relational algebra". There is also a quite thorough answer to a similar question on SO.
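To make that concrete, here is a minimal Python sketch. The sample rows are invented, and modelling a relation as a set of tuples is only an illustration, not a formal treatment. Selection stays within sets, while sorting produces a list, which is no longer a relation.

```python
# A relation modelled as a set of tuples:
# (id, account_number, branch_name, balance). Sample data is made up.
account = {
    (1, "A-101", "Downtown", 500),
    (2, "A-215", "Mianus", 700),
    (3, "A-102", "Perryridge", 400),
}

# Relational selection: the result is again an unordered set.
selected = {row for row in account if row[3] >= 450}

# Ordering happens outside the algebra and yields a list, not a relation.
by_balance_desc = sorted(account, key=lambda row: row[3], reverse=True)
```

In other words, the ordering is a presentation step applied to the result of the algebraic expression.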
I have a question about aliases in SQL. Can I reuse a column alias elsewhere in the same query? For example:
Consider a table named xyz with columns a and b:
select (a/b) as temp , temp/5 from xyz
Is this possible in some way?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server, which is where nearly all of my SQL experience lies. You can, however, do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
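For anyone who wants to try the derived-table workaround without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. SQLite is a stand-in here, not the engine discussed above, and the sample values are made up. SQLite also rejects the alias when it is reused in the same SELECT list, and the derived table fixes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (a REAL, b REAL)")
conn.executemany("INSERT INTO xyz VALUES (?, ?)", [(10.0, 2.0), (30.0, 3.0)])

# Referencing the alias in the same SELECT list is rejected by SQLite.
try:
    conn.execute("SELECT (a / b) AS temp, temp / 5 FROM xyz")
    alias_reuse_ok = True
except sqlite3.OperationalError:
    alias_reuse_ok = False

# The derived table exposes temp as an ordinary column of T1,
# so the outer SELECT can use it freely.
rows = conn.execute(
    """
    SELECT temp, temp / 5
    FROM (SELECT (a / b) AS temp FROM xyz) AS T1
    ORDER BY temp
    """
).fetchall()
```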
You can use Oracle's WITH clause too. Similar constructs are available in other databases. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This can have a performance advantage, particularly in complex queries involving several nested subqueries, because the optimizer may evaluate the WITH subquery once and reuse the result wherever it is referenced.
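A rough sketch of the same idea, run through Python's sqlite3 module rather than Oracle (SQLite also supports WITH; the data is invented). The named subquery t is referenced twice in the main query, which is the situation where naming it once pays off in readability. Whether the engine computes it only once is up to the optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (a REAL, b REAL)")
conn.executemany("INSERT INTO xyz VALUES (?, ?)", [(10.0, 2.0), (30.0, 3.0)])

# t is defined once and used in both the outer query and the scalar subquery.
rows = conn.execute(
    """
    WITH t AS (SELECT a / b AS temp FROM xyz)
    SELECT temp
    FROM t
    WHERE temp > (SELECT AVG(temp) FROM t)
    """
).fetchall()
```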
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per @Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression; keep in mind the person who will inherit your code).
Note that the code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression), but then again it is not a real SQL product.
It's possible, I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.
I am in the process of teaching myself SQL queries. My question comes from w3resource.com SQL exercise 6 in the sub-query set of questions. To summarize the question, we want a query that displays the commission of the salesmen serving customers in Paris. Here are the two tables with their columns:
Salesman(salesman_id, name, city, commission)
Customer(customer_id, cust_name, city, grade, salesman_id)
Below are two queries I wrote that reach the solution in different ways. Is one of them more 'correct' than the other, taking into account performance, standards, etc.? Or are both equally fine? I ask because, as I move on to more complex queries, I imagine I should stick to one approach for performance/versatility reasons.
Thanks
select commission from Salesman
where salesman_id IN
(select salesman_id from Customer
where city = 'Paris')
select commission from Salesman
join Customer on (Salesman.salesman_id = Customer.salesman_id)
where Customer.city = 'Paris'
They're both fine.
In terms of style, both have their advantages, e.g.
Using the IN clause simplifies the main SQL query (from a readability point of view) since you don't have to do a join. If you have several other joins it's nice to reduce them as much as possible to keep track of what's going on.
A JOIN is needed if the key is composite rather than a single salesman_id.
They might generate different query plans depending on your database, but as with most code it's better to start by writing for readability and maintainability and to optimise only the bits you need to. For a query this simple there is unlikely to be any measurable difference.
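One difference worth checking for yourself: the two queries are not guaranteed to return the same rows. The JOIN yields one row per matching customer, so a salesman with two customers in Paris appears twice, while the IN version lists him once. A small sketch with Python's sqlite3 module and invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Salesman (salesman_id INTEGER PRIMARY KEY, name TEXT,
                           city TEXT, commission REAL);
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, cust_name TEXT,
                           city TEXT, grade INTEGER, salesman_id INTEGER);
    INSERT INTO Salesman VALUES (1, 'Ann', 'Paris', 0.15), (2, 'Bob', 'Rome', 0.12);
    -- Salesman 1 serves two customers in Paris.
    INSERT INTO Customer VALUES (10, 'C1', 'Paris', 100, 1),
                                (11, 'C2', 'Paris', 200, 1),
                                (12, 'C3', 'Rome', 300, 2);
    """
)

in_rows = conn.execute(
    """
    SELECT commission FROM Salesman
    WHERE salesman_id IN (SELECT salesman_id FROM Customer WHERE city = 'Paris')
    """
).fetchall()

join_rows = conn.execute(
    """
    SELECT commission FROM Salesman
    JOIN Customer ON Salesman.salesman_id = Customer.salesman_id
    WHERE Customer.city = 'Paris'
    """
).fetchall()
```

If duplicates are unwanted, adding DISTINCT to the JOIN version brings the two results back in line.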
In an Oracle book I read that when we SELECT from a join of two or more tables, the query runs faster if each column name is prefixed with its table name.
Eg:
SELECT table1.name, table1.dob.... instead of SELECT name, dob....
Is it the same way in MySQL?
EDIT
I know that this is good practice when there are identical field names. What I was wondering about is the performance point of view, even when there are no identical field names.
I dunno about performance, but it is a good practice, especially when joining tables. Joined tables could have for example identical field names, and the query will then fail. You can also use aliases if your table names are too long:
SELECT t1.name, t2.dob FROM table1 t1 JOIN table2 t2 ON ...
From the efficiency point of view, Oracle and MySQL compile the SQL to an internal representation before executing it, so I don't think there will be a significant difference in execution time: if the tables are not specified, the engine works them out from the field names. Any time difference will be at compilation time, where it resolves the table for each field.
In fact, I personally doubt the fact that Oracle executes faster if the table names are specified!
It's not just good practice when the field names are the same, it's good practice all the time. It's not about efficiency, it's about your query not breaking when someone else adds fields with overlapping names to one of the join'd tables, so your stuff works in six months' time, not just today.
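Here is a small sketch of that breakage, using Python's sqlite3 module as a stand-in (the question is about MySQL and Oracle; the id columns and the sample row are made up). The unqualified query works until someone adds a name column to the second table, while the qualified one keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, dob TEXT);
    INSERT INTO table1 VALUES (1, 'Ann');
    INSERT INTO table2 VALUES (1, '1990-01-01');
    """
)

unqualified = "SELECT name, dob FROM table1 JOIN table2 ON table1.id = table2.id"
qualified = "SELECT t1.name, t2.dob FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id"

before = conn.execute(unqualified).fetchall()

# Six months later, someone adds an overlapping column to table2.
conn.execute("ALTER TABLE table2 ADD COLUMN name TEXT")

try:
    conn.execute(unqualified)
    unqualified_still_works = True
except sqlite3.OperationalError:  # ambiguous column name
    unqualified_still_works = False

after = conn.execute(qualified).fetchall()
```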
I was doing an assignment where I had to convert SQL queries into Relational Algebra queries. I got stuck in converting the group by clause.
Could anyone please tell me how the group by clause can be written in relational algebra?
e.g.:
SELECT job, sal
FROM emp
GROUP BY job
;
Thanks!
Assuming you want the sum of the salaries, in Tutorial D:
SUMMARIZE emp BY { job } ADD ( SUM ( sal ) AS total_sal )
Note aggregation is not a relational operator, hence will not form part of a relational algebra.
As for HAVING, it is a historical anomaly. Before the SQL-92 Standard, it was not possible to write SELECT expressions in the FROM clause (a.k.a. derived tables), i.e. you had to do all the work in one SELECT expression. Because of SQL's rigid evaluation order, the aggregate value doesn't come into existence until after the WHERE clause has been evaluated, i.e. it was impossible to apply a restriction based on aggregated values. HAVING was introduced to address this problem.
But even with HAVING, SQL remained relationally incomplete in Codd's sense until derived tables were introduced. Derived tables rendered HAVING redundant, but HAVING is still popular (if Stack Overflow is anything to go by): folk still seem to like to use a single SELECT where possible, and SQL's aforementioned rigidity as regards evaluation order (projection is performed last in a SELECT expression) makes derived table usage quite verbose when compared to HAVING.
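To illustrate the redundancy, here is a sketch run through Python's sqlite3 module with invented emp rows and an arbitrary threshold of 2000. The HAVING query and the derived-table query return the same result, and the derived-table form is visibly the more verbose of the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER);
    INSERT INTO emp VALUES ('SMITH', 'CLERK', 800),
                           ('ADAMS', 'CLERK', 1100),
                           ('JONES', 'MANAGER', 2975);
    """
)

# Restriction on the aggregate, written with HAVING.
having_rows = conn.execute(
    "SELECT job, SUM(sal) AS total_sal FROM emp GROUP BY job HAVING SUM(sal) > 2000"
).fetchall()

# The same restriction as a plain WHERE over a derived table.
derived_rows = conn.execute(
    """
    SELECT job, total_sal
    FROM (SELECT job, SUM(sal) AS total_sal FROM emp GROUP BY job) AS totals
    WHERE total_sal > 2000
    """
).fetchall()
```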
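First of all, your query is wrong: you cannot select a column that you did not group by unless you aggregate it. I assume you want the sum of sal. Using F for the aggregate-function operator of the extended relational algebra, with the grouping attribute written in front of it:
job F sum(sal) (emp)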
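As a rough illustration of what such a grouping operator computes, here is a plain Python sketch over a relation modelled as a set of tuples. The helper name summarize_sum and the sample rows are made up for this example. Each row carries the employee name as well, because a set of bare (job, sal) pairs would silently merge two employees with the same job and salary.

```python
from collections import defaultdict

# emp as a set of (ename, job, sal) tuples; sample data only.
emp = {
    ("SMITH", "CLERK", 800),
    ("ADAMS", "CLERK", 1100),
    ("JONES", "MANAGER", 2975),
}

def summarize_sum(relation, group_index, value_index):
    """Group rows by one attribute and sum another; returns a set of pairs."""
    totals = defaultdict(int)
    for row in relation:
        totals[row[group_index]] += row[value_index]
    return set(totals.items())

total_sal_by_job = summarize_sum(emp, group_index=1, value_index=2)
```

The result is again a set of tuples, one per job, which is what SELECT job, SUM(sal) FROM emp GROUP BY job would return.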
So firstly, here's my query. (NOTE: I know SELECT * is bad practice; I just switched it in to make the query more readable.)
SELECT pcln_cities.*,COUNT(pcln_hotels.cityid) AS hotelcount
FROM pcln_cities
LEFT OUTER JOIN pcln_hotels ON pcln_hotels.cityid=pcln_cities.cityid
WHERE pcln_cities.state_name='California'
GROUP BY pcln_cities.cityid
ORDER BY hotelcount DESC
LIMIT 5
So I know that to solve things like that you add EXPLAIN to the beginning of the query but I'm not 100% sure how to read the results, so here they are:
(Screenshot of the EXPLAIN output: http://www.andrew-g-johnson.com/query-results.JPG)
Bonus points to an answer that tells me what to look for in the EXPLAIN results
EDIT The cities table has the following indexes (or is it indices?)
cityid
state_name
and I just added one with both as I thought it might help (it didn't)
The hotels table has the following indexes (or is it indices?)
cityid
Hmm, there's something not very right in your query.
You use an aggregate function (COUNT), but you group only by the id.
Normally, you should group by every column in your select list that is not inside an aggregate function.
As the query is written now, IMHO, the DBMS cannot reliably determine which values it should display for the columns that are not aggregated ...
It would be more correct if your query was written like:
select cityname, count(*)
from city inner join hotel on hotel.city_id = city.city_id
group by cityname
order by count(*) desc
If you do not have an index on the cityName, and you filter on cityname, it will improve performance if you put an index on that column.
In short: adding an index on columns that you regularly use for filtering or sorting may improve performance.
(That is simply put, of course; you can use it as a guideline, but every situation is different. Sometimes it can be helpful to add an index that spans multiple columns.
Also, remember that when you insert, update, or delete a record, the indexes need to be updated as well, so every index adds a slight cost to those operations.)
Another thing that could improve performance, is using an inner join instead of an outer join. I don't think that it is necessary to use an outer join here.
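Before swapping the join type, note that it can change the result, not just the speed. A sketch with Python's sqlite3 module (the city_name and hotelid columns and all rows are invented; the real tables surely have more columns): with the outer join, a California city without hotels is listed with a count of 0, and with the inner join it disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pcln_cities (cityid INTEGER PRIMARY KEY, city_name TEXT, state_name TEXT);
    CREATE TABLE pcln_hotels (hotelid INTEGER PRIMARY KEY, cityid INTEGER);
    INSERT INTO pcln_cities VALUES (1, 'Los Angeles', 'California'),
                                   (2, 'Fresno', 'California'),
                                   (3, 'Reno', 'Nevada');
    -- Fresno has no hotels in this sample.
    INSERT INTO pcln_hotels VALUES (100, 1), (101, 1), (102, 3);
    """
)

QUERY = """
    SELECT c.city_name, COUNT(h.cityid) AS hotelcount
    FROM pcln_cities c
    {join} pcln_hotels h ON h.cityid = c.cityid
    WHERE c.state_name = 'California'
    GROUP BY c.cityid, c.city_name
    ORDER BY hotelcount DESC
    LIMIT 5
"""

left_rows = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()
inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
```

For a top-5 list this only matters when fewer than five matching cities have hotels, but it is worth deciding deliberately.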
It looks like you don't have an index on pcln_cities.state_name, or pcln_cities.cityid? Try adding them.
Given that you've updated your question to say that you do have these indexes, I can only suggest that your database currently has a preponderance of cities in California, so the query optimizer decided it would be easier to do a table scan and throw out the non-California ones than to use the index to pick out the California ones.
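As for what to look for in a plan: the main thing is whether each table is read through an index or scanned in full. The sketch below uses SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 module, which is not MySQL's EXPLAIN and has a different output format; the index name idx_cities_state is made up. In MySQL's EXPLAIN output the analogous columns to check are type (ALL means a full table scan), key (the index actually chosen) and rows (the estimated number of rows examined).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pcln_cities (cityid INTEGER PRIMARY KEY, state_name TEXT)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

sql = "SELECT * FROM pcln_cities WHERE state_name = 'California'"

# Without an index on state_name the only option is a scan of the table.
plan_without_index = plan(sql)

conn.execute("CREATE INDEX idx_cities_state ON pcln_cities (state_name)")

# With the index in place the plan names it.
plan_with_index = plan(sql)
```

Printing the two lists side by side shows the change from a scan to an index search.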
Your query looks fine. Is there a chance that something else has a lock on a record that you need? Are the tables especially big? I doubt that data is the problem as there are not that many hotels...
I've run into similar issues with MySQL. After spending over a year tuning, patching, and thinking I was a SQL dummy, I switched to SQL Server Express. The exact same queries with the exact same data would run 2-5 orders of magnitude faster in SQL Server Express. MySQL seemed to have an especially difficult time with moderately complex queries (5+ tables). My impression is that the MySQL optimizer got worse after Sun bought the company...