Understanding an SQL Alias - mysql

I'm new to SQL.
I've read the book called 'Sams Teach Yourself Oracle PL/SQL in 10 minutes'.
I found it very interesting and easy to understand. There was some information on aliases but when I started doing exercises I came across an alias I don't know the purpose of.
Here is the cite http://www.sql-ex.ru/ and here is the database schema http://www.sql-ex.ru/help/select13.php#db_1, just in case. I'm working with the computer firm database i.e database number 1. The task is:
To find the makers producing PCs but not laptops.
Here is one of the solutions:
SELECT DISTINCT maker
FROM Product AS pcproduct
WHERE type = 'PC'
AND NOT EXISTS (SELECT maker
FROM Product
WHERE type = 'laptop'
AND maker = pcproduct.maker
);
The question is: Why do we need to alias product as pc_product and make the comparison 'maker = pc_product.maker' in the subquery?

Because in the inner query there are columns, which are named exactly the same as those in outer query (due to the fact that you use the same table there).
Since outer query columns are available in the inner query, there must be a distinction, which column you want, without alias, you'd write in the inner query maker = maker, which would be always true.

You are accessing the table twice, once in the main query, one in the sub query.
In the main query you say: Look at each record. Dismiss it, if the type doesn't equal 'PC'. Dismiss it, if you find a record in the table for the same maker with type 'laptop'.
In order to ask for the same maker, you must compare the maker of the main query's record with the records of the subquery. Both stem from the same table, so where product.maker = product.maker would be ambiguous. (Or rather the DBMS would assume you are talking about the subquery record, because the expression is inside the subquery. where product.maker = product.maker would hence be true, and you'd end up checking only whether there is at least one laptop in the table, regardless of the maker.)
So when dealing with the same table twice in a query, give at least on of them an alias in order to tell one record from the other.
Anyway for the given query I'd also qualify the other column in the expression for readability:
AND product.maker = pcproduct.maker
or even
FROM Product laptopproduct
WHERE type = 'laptop'
AND laptopproduct.maker = pcproduct.maker
On a sidenote: The query looks for makers that produce PCs, but no laptops. I'd prefer asking for this with aggregation:
select maker
from product
group by maker
having sum(type = 'PC') > 0
and sum(type = 'laptop') = 0;

That query can be understood as :
Gimme the maker's that have products of the 'PC' type, but where a
product of the 'laptop' type doesn't exist for that maker.
Including the tablename or alias name is sometimes needed when the same column names are used in more than 1 table.
So that the optimizer will know from which table the column is used.
It's not some smart AI that could guess that a criteria as
WHERE x = x
would actually mean
WHERE table1.x = table2.x
But more often, shorter alias names are used.
To increase readability and make the SQL more concise.
For example. The following two queries are equivalent.
Without aliases:
SELECT myawesometableone.id, mysecondevenmoreawesometable.id,
mysecondevenmoreawesometable.col1
FROM myawesometableone
JOIN mysecondevenmoreawesometable on mysecondevenmoreawesometable.one_id = myawesometableone.id
With aliases:
SELECT t1.id, t2.id, t2.col1
FROM myawesometableone AS t1
JOIN mysecondevenmoreawesometable AS t2 on t2.one_id = t1.id
Which SQL do you think looks better?
As for why that maker = pc_product.maker is used inside the EXISTS?
That's how the syntax for EXISTS works.
You establish a link between the query in the EXISTS and the outer query.
And in that case, that link is the "maker" column.

This doesn't detract from the other (correct) answers, but an easier to follow example might be:
SELECT DISTINCT pcproduct.maker
FROM Product AS pcproduct
WHERE pcproduct.type = 'PC'
AND NOT EXISTS (SELECT internalproduct.maker
FROM Product AS internalproduct
WHERE internalproduct.type = 'laptop'
AND internalproduct.maker = pcproduct.maker
);

Related

MYSQL - SubSelect when FK does and doesnt exists

Situation Overview
The current question is a problem about selecting values from two tables table A (material) and table B (MaterialRevision). However, The PK of table A might or Might not exist in Table B. When it doesnt exists, the query described in this question wont return the values of table A, but IT SHOULD. So basically here's whats happening :
The query is only returning values when A.id exists in B.id, when In fact, I need it to return values from A when A.id ALSO dont exist in B.id.
Problem:
Suppose two tables. Table Material and Table Material Revision.
Notice that the PK idMaterial is a FK in MaterialRevision.
Current "Mock" Tables
Query Objective
Obs: remember these two tables are a simplification of the real
tables.
For each Material, print the material variables and the last(MAX) RevisionDate from MaterialRevision. In case theres no RevisionDate, print BLANK ("") for the "last revision date".
What is wrongly happening
For each Material, print the material variables and the last(MAX) RevisionDate from MaterialRevision. In case theres no Revision for the Material, doesnt print the Material (SKIP).
Current Code
SELECT
Material.idMaterial,
Material.nextRevisionDate,
Material.obsolete,
lastRevisionDate
FROM Material,
(SELECT MaterialRevision.idMaterial, max(MaterialRevision.revisionDate) as "revisionDate" from MaterialRevision
GROUP BY MaterialRevision.idMaterial
) AS Revision
WHERE (Material.idMaterial = Revision.idMaterial AND Material.obsolete = 0)
References and Links used to reach the state described in this question
Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?
MySQL get last date records from multiple
MySQL - How to SELECT based on value of another SELECT
MySQL Query Select where id does not exist in the JOIN table
PS I hope this question is correctly understood since it took me a lot of time to build it. I researched a lot in stackoverflow and after
several failed attempts I had no option but to ask for help.
You should use JOIN :
SELECT m.idMaterial, m.nextRevisionDate, mr.revisionDate AS "lastRevisionDate"
FROM Material m
LEFT JOIN MaterialRevision AS mr ON mr.idMaterial = m.idMaterial AND mr.revisionDate = (
SELECT MAX(ch.revisionDate) from MaterialRevision ch
WHERE mr.idMaterial = ch.idMaterial)
WHERE m.obsolete = 0
Here is an explanation of what INNER JOIN, LEFT JOIN and RIGHT JOIN are. (You will love them if you often cross tables in your queries)
As m.obsolete will always be true, I ommited it in the SELECT clause
You should use the left outer join instead of using the cross product.
You're query should be something like this:
SELECT idMaterial, nextRevisionableDate, obsolete,
revisionDate AS lastRevisionDate
FROM Material
LEFT OUTER JOIN MaterialRevision AS mr On
Material.idMaterial = MaterialRevision.id
AND mr.revisionDate = (SELECT MAX(ch.revisionDate) from MaterialRevision ch
WHERE mr.idMaterial = ch.idMaterial)
WHERE obsolete = 0;
Here you can find some documentation about types of join.

SQL not exists to find some issues

I'm practicing some SQL (new to this),
I have the next tables:
screening_occapancy(idscreening,row,col,idclient)
screening(screeningid,idmovie,idtheater,screening_time)
Im trying to creating a query to search which clients watched all the movies in the "screening" table and show their ID(idclient).
this is what I written(which doesn't work):
select idclient from screening_occapancy p where not exists
(select screeningid from screening where screeningid=p.idscreening)
I know it's probably not that good so please try to explain also what am I doing wrong.
P.S My mission is to use not/exists while doing it...
Thanks!
Your query is basically fine, although the select distinct is unnecessary in the subquery:
select p.idclient
from screening_occapancy p
where not exists (select 1
from screening s
where s.screeningid = p.idscreening
);
Notes:
You can select anything in the exists subquery. Selecting a column is misleading.
Use table aliases and use them for all column references, particularly in a correlated subquery.
If you are designing the tables, I would advise you to give the primary key and foreign key the same name (screeningid or idscreening, but not both).
EDIT:
If you want clients who watched all movies, then I would approach this as:
select p.idclient
from screening_occapancy p
group by p.idclient
having count(distinct p.screening_occapancy p) = (select count(*) from screening);
Why don't you count the number of movies in the screening_table, load it into a variable and check the results of your query results against the variable?
load number of movies into variable (identified by idmovie):
SELECT count(DISTINCT(idmovie)) FROM screening INTO #number_of_movies;
check the results of your query against the variable:
SELECT A.idclient,
count(DISTINCT(idmovie)) AS number_of_movies_watched,
FROM screening_occapancy A
INNER JOIN screening B
ON(A.idscreening = B.screeningid)
GROUP BY A.idclient
HAVING number_of_movies_watched = #number_of_movies ;
If you want to find all clients, that attended all screenings, replace idmovie with screeningid.
Even someone relatively new to MySQL can get his head around this query. The "not exists"-approach is more difficult to understand.

Lazily evaluate MySQL view

I have some MySQL views which define a number of extra columns based on some relatively straightforward subqueries. The database is also multi-tenanted so each row has a company ID against it.
The problem I have is my views are evaluated for every row before being filtered by the company ID, giving huge performance issues. Is there any way to lazily evaluate the view so the 'where' clause in the outer query applies to the subqueries in the view. Or is there something similar to views that I can use to add the extra fields. I want to calculate them in SQL so the calculated fields can be used for filtering/searching/sorting/pagination.
I've taken a look at the MySQL docs that explain the algorithms available and am aware that the views can't be proccessed as a 'merge' since they contain subqueries.
view
create view companies_view as
select *,
(
select count(id) from company_user where company_user.company_id = companies.id
) as user_count,
(
select count(company_user.user_id)
from company_user join users on company_user.user_id = users.id
where company_user.company_id = companies.id
and users.active = 1
) as active_user_count,
(
select count(company_user.user_id)
from company_user join users on company_user.user_id = users.id
where company_user.company_id = companies.id
and users.active = 0
as inactive_user_count
from companies;
query
select * from companies_view where company_id = 123;
I want the subqueries in the view to be evaluated AFTER applying the 'where company_id = 123' from the main query scope. I can't hard code the company ID in the view since I want the view to be usable for any company ID.
You cannot change the order of evaluation, that is set by the MySQL server.
However, in this particular case you could rewrite the whole sql statement to use joins and conditional counts instead of subqueries:
select c.*,
count(u.id) as user_count,
count(if(u.active=1, 1, null)) as active_user_count,
count(if(u.active=0, 1, null)) as inactive_user_count
from companies c
left join company_user cu on c.id=cu.company_id
left join users u on cu.user_id = u.id
group by c.company_id, ...
If you have MySQL v5.7, then you may not need to add any further fields to the group by clause since the other fields in the companies table would be functionally dependent on the company_id. In earlier versions you may have to list all fields in the companies table (depends on the sql mode settings).
Another way to optimalise such query would be using denormalisation. Your users and company_user table probably have a lot more records than your companies table. You could add a user_count, an active_user_count, and an inactive_user_count field to the companies table, add after insert / update / delete triggers to the company_user table and an after update to the users table and update these 2 fields there. This way you would not need to do the joins and the conditional counts in the view.
It is possible to convince the optimizer to handle a view with scalar subqueries using the MERGE algorithm... you just have to beat the optimizer at its own game.
This will seem quite unorthodox to some, but it is a pattern I use with success in cases where this is needed.
Create a stored function to encapsulate each subquery, then reference the stored function in the view. The optimizer remains blissfully unaware that the functions will invoke the subqueries.
CREATE FUNCTION user_count (_cid INT) RETURNS INT
DETERMINISTIC
READS SQL DATA
RETURN (SELECT count(id) FROM company_user WHERE company_user.company_id = _cid);
Note that a stored function with a single statement does not need BEGIN/END or a change of DELIMITER.
Then in the view, replace the subquery with:
user_count(id) AS user_count,
And repeat the process for each subquery.
The optimizer will then process the view as a MERGE view, select the one appropriate row from the companies table based on the outer WHERE, invoke the functions, and... problem solved.

SQL query to select based on many-to-many relationship

This is really a two-part question, but in order not to mix things up, I'll divide into two actual questions. This one is about creating the correct SQL statement for selecting a row based on values in a many-to-many related table:
Now, the question is: what is the absolute simplest way of getting all resources where e.g metadata.category = subject AND where that category's corresponding metadata.value ='introduction'?
I'm sure this could be done in a lot of different ways, but I'm a novice in SQL, so please provide the simplest way possible... (If you could describe briefly what the statement means in plain English that would be great too. I have looked at introductions to SQL, but none of those I have found (for beginners) go into these many-to-many selections.)
The easiest way is to use the EXISTS clause. I'm more familiar with MSSQL but this should be close
SELECT *
FROM resources r
WHERE EXISTS (
SELECT *
FROM metadata_resources mr
INNER JOIN metadata m ON (mr.metadata_id = m.id)
WHERE mr.resource_id = r.id AND m.category = 'subject' AND m.value = 'introduction'
)
Translated into english it's 'return me all records where this subquery returns one or more rows, without returning the data for those rows'. This sub query is correlated to the outer query by the predicate mr.resource_id = r.id which uses the outer row as the predicate value.
I'm sure you can google around for more examples of the EXIST statement

Selecting multiple columns/fields in MySQL subquery

Basically, there is an attribute table and translation table - many translations for one attribute.
I need to select id and value from translation for each attribute in a specified language, even if there is no translation record in that language. Either I am missing some join technique or join (without involving language table) is not working here since the following do not return attributes with non-existing translations in the specified language.
select a.attribute, at.id, at.translation
from attribute a left join attributeTranslation at on a.id=at.attribute
where al.language=1;
So I am using subqueries like this, problem here is making two subqueries to the same table with the same parameters (feels like performance drain unless MySQL groups those, which I doubt since it makes you do many similar subqueries)
select attribute,
(select id from attributeTranslation where attribute=a.id and language=1),
(select translation from attributeTranslation where attribute=a.id and language=1),
from attribute a;
I would like to be able to get id and translation from one query, so I concat columns and get the id from string later, which is at least making single subquery but still not looking right.
select attribute,
(select concat(id,';',title)
from offerAttribute_language
where offerAttribute=a.id and _language=1
)
from offerAttribute a
So the question part.
Is there a way to get multiple columns from a single subquery or should I use two subqueries (MySQL is smart enough to group them?) or is joining the following way to go:
[[attribute to language] to translation] (joining 3 tables seems like a worse performance than subquery).
Yes, you can do this. The knack you need is the concept that there are two ways of getting tables out of the table server. One way is ..
FROM TABLE A
The other way is
FROM (SELECT col as name1, col2 as name2 FROM ...) B
Notice that the select clause and the parentheses around it are a table, a virtual table.
So, using your second code example (I am guessing at the columns you are hoping to retrieve here):
SELECT a.attr, b.id, b.trans, b.lang
FROM attribute a
JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, a.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
Notice that your real table attribute is the first table in this join, and that this virtual table I've called b is the second table.
This technique comes in especially handy when the virtual table is a summary table of some kind. e.g.
SELECT a.attr, b.id, b.trans, b.lang, c.langcount
FROM attribute a
JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
JOIN (
SELECT count(*) AS langcount, at.attribute
FROM attributeTranslation at
GROUP BY at.attribute
) c ON (a.id = c.attribute)
See how that goes? You've generated a virtual table c containing two columns, joined it to the other two, used one of the columns for the ON clause, and returned the other as a column in your result set.