how to properly use JOIN? - mysql

I have two tables and a foreign_key index table:
table xymply_locations
id
name
lat
lng
table xymply_categories
id
name
table xymply_categoryf_key
locid
catid
and I want to select the categories that are assigned to locid 1. How do I do this? I tried
SELECT *
FROM `xymply_categoryf_key`, xymply_categories
JOIN `xymply_categories` ON
xymply_categories.id = xymply_categoryf_key.catid
WHERE locid = 1;
but I get "Not unique table/alias: 'xymply_categories'" and I'm wondering why...?

You're mixing implicit (all tables listed in the FROM clause) and explicit JOIN styles in your code, hence the error.
SELECT xc.id, xc.name
FROM xymply_categories xc
INNER JOIN xymply_categoryf_key xck
ON xc.id = xck.catid
WHERE xck.locid = 1;
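If you want to try the aliased join without touching your real data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are invented for illustration, and SQLite is only assumed to behave like MySQL for this simple inner join.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xymply_categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE xymply_categoryf_key (locid INTEGER, catid INTEGER);
    INSERT INTO xymply_categories VALUES (1, 'cafe'), (2, 'bar'), (3, 'museum');
    INSERT INTO xymply_categoryf_key VALUES (1, 1), (1, 3), (2, 2);
""")

# Each table appears exactly once and has its own alias,
# so every column reference is unambiguous.
rows = conn.execute("""
    SELECT xc.id, xc.name
    FROM xymply_categories xc
    INNER JOIN xymply_categoryf_key xck ON xc.id = xck.catid
    WHERE xck.locid = 1
    ORDER BY xc.id
""").fetchall()

print(rows)  # [(1, 'cafe'), (3, 'museum')]
```

Only the two categories linked to locid 1 come back; the 'bar' row belongs to locid 2 and is left out.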

In your query, you're selecting from two tables. One of them is xymply_categoryf_key, the other is a JOIN of two instances of xymply_categories. Because the same table appears twice, when you write xymply_categories.id it is not clear which instance you mean: the one that is the first argument of JOIN, or the one that is the second. That's what "Not unique table/alias" means. If I understand correctly what you want to do, try this:
SELECT c.id, c.name FROM xymply_categories c, xymply_categoryf_key k WHERE c.id = k.catid AND k.locid = 1;
This was done without JOIN. MySQL treats the comma form with the condition
WHERE c.id = k.catid
as an inner join, so an explicit JOIN ... ON should not be expected to run faster; it is mainly easier to read. Also, note the use of k and c as aliases for the tables xymply_categoryf_key (k for key) and xymply_categories (c for categories). Aliases are how you avoid the "Not unique table/alias" problem that occurred before. If you really did need the same table twice, you would write e.g.
xymply_categories a JOIN xymply_categories b WHERE a.id = ...
So, whether or not you use JOIN, all you should do in your own query is list each table only once and add the aliases.

Because you are "joining" xymply_categories twice, the database wants an alias for each instance in order to know which one to use when selecting a column.
You can do joins several ways depending on what you want. A straight inner join (which appears to be what you want) can be
select * from xymply_categoryf_key a, xymply_categories b where a.catid = b.id
and a.locid = 1;
or you can also do an explicit inner join, as the first answer shows. Either of these gives you the records where there is matching info in both tables.

Related

Sql statement fetching information from two different tables

I have two tables: one is called participants_tb and the other allocation_tb. In participants_tb, the columns are participant_id, name, username.
In allocation_tb, the columns are allocation_id, sender_username, receiver_username, done. The column done holds one of these three numbers: 0, 1, 2.
I used this SQL statement to fetch my values:
SELECT *, COUNT(done) d
FROM participants_tb
JOIN allocation_tb ON (username=receiver_username)
WHERE done = 0 || done = 1
GROUP BY receiver_username
It worked very well. The problem is that I also want it to include participants that are in participants_tb but not in allocation_tb. I tried a left outer join, but it did not work as expected: because done in the WHERE clause belongs to allocation_tb, participants with no allocation rows are still filtered out.
You seem to want:
SELECT p.*, COUNT(a.done) as d
FROM participants_tb p LEFT JOIN
allocation_tb a
ON p.username = a.receiver_username AND
a.done IN (0, 1)
GROUP BY p.participant_id;
Notes:
The LEFT JOIN keeps all participants.
The GROUP BY needs to be on the first table.
You can use SELECT p.* with the GROUP BY -- assuming that the GROUP BY key is unique (or the primary key).
All columns should be qualified.
IN is an easier way to express your logic.
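To see why the filter belongs in the ON clause rather than in WHERE, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The three participants and their allocation rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants_tb (participant_id INTEGER PRIMARY KEY, name TEXT, username TEXT);
    CREATE TABLE allocation_tb (allocation_id INTEGER PRIMARY KEY, sender_username TEXT,
                                receiver_username TEXT, done INTEGER);
    INSERT INTO participants_tb VALUES (1, 'Ann', 'ann'), (2, 'Ben', 'ben'), (3, 'Cy', 'cy');
    INSERT INTO allocation_tb VALUES (1, 'ben', 'ann', 0), (2, 'cy', 'ann', 1), (3, 'ann', 'ben', 2);
""")

# Filter in the ON clause: unmatched participants survive with a count of 0.
in_on = conn.execute("""
    SELECT p.username, COUNT(a.done) AS d
    FROM participants_tb p
    LEFT JOIN allocation_tb a
           ON p.username = a.receiver_username AND a.done IN (0, 1)
    GROUP BY p.participant_id
    ORDER BY p.participant_id
""").fetchall()

# Filter in WHERE: the NULL rows produced by the LEFT JOIN are thrown away again.
in_where = conn.execute("""
    SELECT p.username, COUNT(a.done) AS d
    FROM participants_tb p
    LEFT JOIN allocation_tb a ON p.username = a.receiver_username
    WHERE a.done IN (0, 1)
    GROUP BY p.participant_id
    ORDER BY p.participant_id
""").fetchall()

print(in_on)     # [('ann', 2), ('ben', 0), ('cy', 0)]
print(in_where)  # [('ann', 2)]
```

With the condition in WHERE, the query behaves like an inner join, which is the behaviour described in the question.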

Use * in subquery

I have two tables: finaldata and finalqualification. finlpcqlfctin_qlifictins_id is the foreign key in finaldata, i.e. it is the primary key in the finalqualification table.
I want to get all the data from these two tables through a subquery in MySQL, but it is showing me an error. My query is:
SELECT result.name FROM (SELECT * FROM finalpocpassportdata f LEFT JOIN finalpocqualification q ON f.id=q.id)result
Quick hint here: if your subquery returns more than one column with the same name, it will fall over. We strongly GUID all our data with consistent field names, e.g. customerGUID, so I come across this quite regularly, especially when using subqueries with joins.
Your subquery returns more than one column:
SELECT result.name FROM (SELECT * FROM finaldata f LEFT JOIN finalqualification ON f.id=f.finlpcqlfctin_qlifictins_id)
You need to specify the name of the column you want, and you also have a problem with your aliases: the ON clause compares f with itself, and the derived table has no alias.
SELECT result.only_one_column FROM (SELECT only_one_column FROM finaldata f LEFT JOIN finalqualification ON f.id=finalqualification.finlpcqlfctin_qlifictins_id) result
This is your query:
SELECT result.name
FROM (SELECT *
FROM finalpocpassportdata f LEFT JOIN
finalpocqualification q
ON f.id = q.id
) result;
Obviously, this is a test query, because the subquery is unnecessary and adversely affects performance.
Equally obvious, the SELECT * has two columns with the name id. It might have other columns with the same name as well. But, if id are the only duplicates, then one simple solution is to use USING:
SELECT result.name
FROM (SELECT *
FROM finalpocpassportdata f LEFT JOIN
finalpocqualification q
USING (id)
) result;
(Although NATURAL JOIN is also an option, I strongly advise not to use it here or anywhere. It is just bugs waiting to happen.)
If there are other columns with the same name -- which can happen for very reasonable purposes -- then you can list the columns individually. That is sometimes a pain, so it can be easier to use * for one table and list the columns from the other:
SELECT . . .
FROM (SELECT f.*, q.name
. . .
In practice, I cannot see a need for this sort of subquery. There is little need to do a join as a subquery, because you can have many joins in a single FROM clause.
I don't think you need a subquery to get name:
SELECT name FROM finalpocpassportdata f LEFT JOIN
finalpocqualification q ON f.id=q.id
This query will also give the same result.

INNER JOIN execution/evaluation order

I was wondering how exactly inner joins work in MySQL.
If I do
SELECT * FROM A a
INNER JOIN B b ON a.row = b.row
INNER JOIN C c ON c.row2 = b.row2
WHERE name='Paul';
Does it do the joins first, then pick the ones where name = paul? Because when I do it this way, it is SUPER DUPER slow.
Is there a way to do something along the lines of:
SELECT * FROM (A a WHERE name='paul')
INNER JOIN B b ON a.row = b.row
INNER JOIN C c ON c.row2 = b.row2
When I try it that way, I just get an error.
Or alternatively, is it better to just have 3 separate queries, one for A, B and C? Example:
string query1 = "SELECT * FROM A WHERE name = 'paul'";
//send query, get data reader
string query2 = "SELECT * FROM b WHERE b = " + query1.b;
//send query, get data reader
string query3 = "SELECT * FROM C WHERE c = " + query1.c;
//send query, get data reader
Obviously this is just pseudo code, but I think it illustrates the point.
Which way is faster/recommended?
Edit
Table structure:
tblTimesheet
int timesheetID (primary key)
datetime date
varchar username
int projectID
string description
float hours
tblProjects
int projectID (primary key)
string project name
int clientID
tblClients
int clientID
string clientName
The join that I want is:
select * from tblTimesheet time
INNER JOIN tblProject proj on time.projectID = proj.projectID
INNER JOIN tblClient client on proj.clientID = client.clientID
WHERE username = 'paul';
something like that
You are probably missing an index on a key table; you can use the MySQL EXPLAIN keyword to help find out where your query is slow.
To answer another section of your question:
is there a way to do something along the lines:
SELECT * FROM (A a WHERE name='paul')
INNER JOIN B b ON a.row = b.row
INNER JOIN C c ON c.row2 = b.row2
You can use a subquery:
SELECT *
FROM (SELECT * FROM tblTimesheet WHERE username = 'Paul') AS time
INNER JOIN tblProject proj on time.projectID = proj.projectID
INNER JOIN tblClient client on proj.clientID = client.clientID
What this query is effectively doing is attempting to prefilter the rows the JOIN will operate on. Rather than join all the rows together and then filter those down, it only attempts to JOIN the rows from tblTimesheet where the username is 'Paul'.
However, the query optimizer should already be doing this so this query should perform similarly to your original query.
For more help with indexes, the understanding of which will aid you greatly in database development, start by looking at a tutorial like this one.
It's fantastically unlikely a join will be slower than three database hits. Reordering the clauses shouldn't have an impact either if MySQL's query optimizer is at all competent. Are the columns in the WHERE / ON clauses indexed?
I think you'll find the query optimiser will give you the best possible query most of the time. You need to look at the execution plan to find out why the query is slow - my guess is lack of indexes.
When MySQL looks in these tables, it will usually do it in the best way to get the best speed. A simple join like the one you've illustrated won't confuse the query optimiser, but missing indexes can force the database engine to scan tables instead of looking up values (i.e. it has to walk through the table row by row to match the criteria you specified).
An index lets the engine look up the matching rows directly instead of scanning every row, and will usually speed up queries.
What's the table structure here or is this all hypothetical?
The general rule of thumb with SQL is - try it and see!
Use your first query; the MySQL query optimizer should pick the fastest strategy.
If you want it to be faster, make sure that there is an index on the name column.
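To check the claim that the plain join and the prefiltered subquery give the same answer, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The column subset, the sample rows and the index name idx_timesheet_username are assumptions made for illustration, and it only compares results, not timings.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblClient (clientID INTEGER PRIMARY KEY, clientName TEXT);
    CREATE TABLE tblProject (projectID INTEGER PRIMARY KEY, projectName TEXT, clientID INTEGER);
    CREATE TABLE tblTimesheet (timesheetID INTEGER PRIMARY KEY, username TEXT,
                               projectID INTEGER, hours REAL);
    -- Index on the filtered column, as the answers suggest (hypothetical name).
    CREATE INDEX idx_timesheet_username ON tblTimesheet (username);
    INSERT INTO tblClient VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO tblProject VALUES (10, 'Site', 1), (20, 'App', 2);
    INSERT INTO tblTimesheet VALUES (1, 'paul', 10, 2.5), (2, 'mary', 20, 4.0), (3, 'paul', 20, 1.0);
""")

cols = "t.timesheetID, p.projectName, c.clientName"

# Plain three-table join with the filter in WHERE.
plain = conn.execute(f"""
    SELECT {cols}
    FROM tblTimesheet t
    INNER JOIN tblProject p ON t.projectID = p.projectID
    INNER JOIN tblClient c ON p.clientID = c.clientID
    WHERE t.username = 'paul'
    ORDER BY t.timesheetID
""").fetchall()

# Same join, but with the timesheet rows prefiltered in a derived table.
prefiltered = conn.execute(f"""
    SELECT {cols}
    FROM (SELECT * FROM tblTimesheet WHERE username = 'paul') AS t
    INNER JOIN tblProject p ON t.projectID = p.projectID
    INNER JOIN tblClient c ON p.clientID = c.clientID
    ORDER BY t.timesheetID
""").fetchall()

print(plain)                 # [(1, 'Site', 'Acme'), (3, 'App', 'Globex')]
print(plain == prefiltered)  # True
```

Both forms return the same rows, so the choice between them is about the execution plan, which is what EXPLAIN is for.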

Selecting multiple columns/fields in MySQL subquery

Basically, there is an attribute table and translation table - many translations for one attribute.
I need to select id and value from translation for each attribute in a specified language, even if there is no translation record in that language. Either I am missing some join technique or join (without involving language table) is not working here since the following do not return attributes with non-existing translations in the specified language.
select a.attribute, at.id, at.translation
from attribute a left join attributeTranslation at on a.id=at.attribute
where at.language=1;
So I am using subqueries like this, problem here is making two subqueries to the same table with the same parameters (feels like performance drain unless MySQL groups those, which I doubt since it makes you do many similar subqueries)
select attribute,
(select id from attributeTranslation where attribute=a.id and language=1),
(select translation from attributeTranslation where attribute=a.id and language=1)
from attribute a;
I would like to be able to get id and translation from one query, so I concat columns and get the id from string later, which is at least making single subquery but still not looking right.
select attribute,
(select concat(id,';',title)
from offerAttribute_language
where offerAttribute=a.id and _language=1
)
from offerAttribute a
So the question part.
Is there a way to get multiple columns from a single subquery or should I use two subqueries (MySQL is smart enough to group them?) or is joining the following way to go:
[[attribute to language] to translation] (joining 3 tables seems like a worse performance than subquery).
Yes, you can do this. The knack you need is the concept that there are two ways of getting tables out of the table server. One way is:
FROM TABLE A
The other way is:
FROM (SELECT col as name1, col2 as name2 FROM ...) B
Notice that the select clause and the parentheses around it are a table, a virtual table.
So, using your second code example (I am guessing at the columns you are hoping to retrieve here):
SELECT a.attr, b.id, b.trans, b.lang
FROM attribute a
JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
Notice that your real table attribute is the first table in this join, and that this virtual table I've called b is the second table.
This technique comes in especially handy when the virtual table is a summary table of some kind. e.g.
SELECT a.attr, b.id, b.trans, b.lang, c.langcount
FROM attribute a
JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
JOIN (
SELECT count(*) AS langcount, at.attribute
FROM attributeTranslation at
GROUP BY at.attribute
) c ON (a.id = c.attribute)
See how that goes? You've generated a virtual table c containing two columns, joined it to the other two, used one of the columns for the ON clause, and returned the other as a column in your result set.
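Applied to the original question, a derived table can hand back both id and translation at once. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, switches to LEFT JOIN so that attributes without a translation in the requested language are kept, and uses invented sample rows and a made-up alias tr.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, attribute TEXT);
    CREATE TABLE attributeTranslation (id INTEGER PRIMARY KEY, attribute INTEGER,
                                       language INTEGER, translation TEXT);
    INSERT INTO attribute VALUES (1, 'colour'), (2, 'size');
    INSERT INTO attributeTranslation VALUES (100, 1, 1, 'Farbe'), (101, 1, 2, 'couleur'),
                                            (102, 2, 2, 'taille');
""")

# The derived table b returns two columns for language 1 in a single pass;
# LEFT JOIN keeps attributes that have no row in b.
rows = conn.execute("""
    SELECT a.attribute, b.id, b.translation
    FROM attribute a
    LEFT JOIN (
        SELECT tr.id, tr.translation, tr.attribute
        FROM attributeTranslation tr
        WHERE tr.language = 1
    ) b ON a.id = b.attribute
    ORDER BY a.id
""").fetchall()

print(rows)  # [('colour', 100, 'Farbe'), ('size', None, None)]
```

The 'size' attribute has no language 1 translation, so it comes back with NULLs (None in Python) instead of disappearing.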

In what order are MySQL JOINs evaluated?

I have the following query:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123;
I have the following questions:
1. Is the USING syntax synonymous with ON syntax?
2. Are these joins evaluated left to right? In other words, does this query say: x = companies JOIN users; y = x JOIN jobs; z = y JOIN useraccounts;
3. If the answer to question 2 is yes, is it safe to assume that the companies table has companyid, userid and jobid columns?
4. I don't understand how the WHERE clause can be used to pick rows on the companies table when it is referring to the alias "j"
Any help would be appreciated!
1. USING (fieldname) is a shorthand way of saying ON table1.fieldname = table2.fieldname.
2. SQL doesn't define the 'order' in which JOINs are done because it is not the nature of the language. Obviously an order has to be specified in the statement, but an INNER JOIN can be considered commutative: you can list them in any order and you will get the same results.
That said, when constructing a SELECT ... JOIN, particularly one that includes LEFT JOINs, I've found it makes sense to regard each JOIN as joining a new table to the result of all the joins before it.
More rarely, the specified order can influence the behaviour of the query optimizer, due to the way it influences the heuristics.
3. No. The way the query is assembled, it requires that companies and users both have a companyid, jobs has a userid and a jobid, and useraccounts has a userid. However, only one of companies or users needs a userid for the JOIN to work.
4. The WHERE clause is filtering the whole result -- i.e. all JOINed columns -- using a column provided by the jobs table.
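A quick way to convince yourself that USING is shorthand for the matching ON condition, and that the WHERE on j.jobid filters the joined result, is the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the table layouts (including the accountid column) and rows are guesses made for illustration, with userid living in users rather than companies.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; schema and rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (companyid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (userid INTEGER PRIMARY KEY, companyid INTEGER);
    CREATE TABLE jobs (jobid INTEGER PRIMARY KEY, userid INTEGER);
    CREATE TABLE useraccounts (accountid INTEGER PRIMARY KEY, userid INTEGER);
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO users VALUES (10, 1), (20, 2);
    INSERT INTO jobs VALUES (123, 10), (456, 20);
    INSERT INTO useraccounts VALUES (1000, 10), (2000, 20);
""")

# The query from the question, written with USING.
with_using = conn.execute("""
    SELECT c.*
    FROM companies AS c
    JOIN users AS u USING (companyid)
    JOIN jobs AS j USING (userid)
    JOIN useraccounts AS us USING (userid)
    WHERE j.jobid = 123
""").fetchall()

# The same joins spelled out with ON.
with_on = conn.execute("""
    SELECT c.*
    FROM companies AS c
    JOIN users AS u ON u.companyid = c.companyid
    JOIN jobs AS j ON j.userid = u.userid
    JOIN useraccounts AS us ON us.userid = u.userid
    WHERE j.jobid = 123
""").fetchall()

print(with_using)  # [(1, 'Acme')]
print(with_on)     # [(1, 'Acme')]
```

Note that companies has no userid or jobid column here, yet the query still picks the right company, because the filter runs against the joined rows.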
I can't answer the bit about the USING syntax. That's weird. I've never seen it before, having always used an ON clause instead.
But what I can tell you is that the order of JOIN operations is determined dynamically by the query optimizer when it constructs its query plan, based on a system of optimization heuristics, some of which are:
Is the JOIN performed on a primary key field? If so, this gets high priority in the query plan.
Is the JOIN performed on a foreign key field? This also gets high priority.
Does an index exist on the joined field? If so, bump the priority.
Is a JOIN operation performed on a field in WHERE clause? Can the WHERE clause expression be evaluated by examining the index (rather than by performing a table scan)? This is a major optimization opportunity, so it gets a major priority bump.
What is the cardinality of the joined column? Columns with high cardinality give the optimizer more opportunities to discriminate against false matches (those that don't satisfy the WHERE clause or the ON clause), so high-cardinality joins are usually processed before low-cardinality joins.
How many actual rows are in the joined table? Joining against a table with only 100 values is going to create less of a data explosion than joining against a table with ten million rows.
Anyhow... the point is... there are a LOT of variables that go into the query execution plan. If you want to see how MySQL optimizes its queries, use the EXPLAIN syntax.
And here's a good article to read:
http://www.informit.com/articles/article.aspx?p=377652
ON EDIT:
To answer your 4th question: You aren't querying the "companies" table. You're querying the joined cross-product of ALL four tables in your FROM and USING clauses.
The "j.jobid" alias is just the fully-qualified name of one of the columns in that joined collection of tables.
In MySQL, it's often interesting to ask the query optimizer what it plans to do, with:
EXPLAIN SELECT [...]
See "7.2.1 Optimizing Queries with EXPLAIN"
Here is a more detailed answer on JOIN precedence. In your case, the JOINs are all commutative. Let's try one where they aren't.
Build schema:
CREATE TABLE users (
name text
);
CREATE TABLE orders (
order_id text,
user_name text
);
CREATE TABLE shipments (
order_id text,
fulfiller text
);
Add data:
INSERT INTO users VALUES ('Bob'), ('Mary');
INSERT INTO orders VALUES ('order1', 'Bob');
INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
Run query:
SELECT *
FROM users
LEFT OUTER JOIN orders
ON orders.user_name = users.name
JOIN shipments
ON shipments.order_id = orders.order_id
Result:
Only the Bob row is returned
Analysis:
In this query the LEFT OUTER JOIN was evaluated first and the JOIN was evaluated on the composite result of the LEFT OUTER JOIN.
Second query:
SELECT *
FROM users
LEFT OUTER JOIN (
orders
JOIN shipments
ON shipments.order_id = orders.order_id)
ON orders.user_name = users.name
Result:
One row for Bob (with the fulfillment data) and one row for Mary with NULLs for fulfillment data.
Analysis:
The parentheses changed the evaluation order.
Further MySQL documentation is at https://dev.mysql.com/doc/refman/5.5/en/nested-join-optimization.html
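The two results above can be reproduced with the sketch below, which uses Python's built-in sqlite3 module as a stand-in for MySQL. One assumption to flag: the parenthesised join is written as a derived table with the made-up alias os, which should be equivalent for this data but is not the exact syntax shown above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; same schema and data as above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT);
    CREATE TABLE orders (order_id TEXT, user_name TEXT);
    CREATE TABLE shipments (order_id TEXT, fulfiller TEXT);
    INSERT INTO users VALUES ('Bob'), ('Mary');
    INSERT INTO orders VALUES ('order1', 'Bob');
    INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
""")

# Flat form: the inner JOIN runs on the result of the LEFT JOIN,
# so Mary's NULL order_id never matches a shipment.
flat = conn.execute("""
    SELECT users.name, shipments.fulfiller
    FROM users
    LEFT OUTER JOIN orders ON orders.user_name = users.name
    JOIN shipments ON shipments.order_id = orders.order_id
    ORDER BY users.name
""").fetchall()

# Nested form: orders and shipments are joined first,
# then LEFT JOINed to users, so Mary is kept with NULLs.
nested = conn.execute("""
    SELECT users.name, os.fulfiller
    FROM users
    LEFT OUTER JOIN (
        SELECT orders.user_name, shipments.fulfiller
        FROM orders
        JOIN shipments ON shipments.order_id = orders.order_id
    ) AS os ON os.user_name = users.name
    ORDER BY users.name
""").fetchall()

print(flat)    # [('Bob', 'Fulfilling Mary')]
print(nested)  # [('Bob', 'Fulfilling Mary'), ('Mary', None)]
```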
SEE http://dev.mysql.com/doc/refman/5.0/en/join.html
AND start reading here:
Join Processing Changes in MySQL 5.0.12
Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.
These changes have five main aspects:
The way that MySQL determines the result columns of NATURAL or USING join operations (and thus the result of the entire FROM clause).
Expansion of SELECT * and SELECT tbl_name.* into a list of selected columns.
Resolution of column names in NATURAL or USING joins.
Transformation of NATURAL or USING joins into JOIN ... ON.
Resolution of column names in the ON condition of a JOIN ... ON.
I'm not sure about the ON vs USING part (though this website says they are the same).
As for the ordering question, it's entirely implementation (and probably query) specific. MySQL most likely picks an order when compiling the request. If you do want to enforce a particular order you would have to 'nest' your queries:
SELECT c.*
FROM companies AS c
JOIN (SELECT u.companyid
FROM users AS u
JOIN (SELECT j.userid
FROM jobs AS j
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123) AS ju USING(userid)
) AS cu USING(companyid);
As for part 4: the WHERE clause limits which rows from the jobs table are eligible to be JOINed on. So if there are rows which would join due to matching userids but don't have the correct jobid, they will be omitted.
1) Using is not exactly the same as on, but it is short hand where both tables have a column with the same name you are joining on... see: http://www.java2s.com/Tutorial/MySQL/0100__Table-Join/ThekeywordUSINGcanbeusedasareplacementfortheONkeywordduringthetableJoins.htm
It is more difficult to read in my opinion, so I'd go spelling out the joins.
3) It is not clear from this query, but I would guess it does not.
2) Assuming you are joining through the other tables (not all directly on companies) the order in this query does matter... see comparisons below:
Original:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123
What I think it is likely suggesting:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = u.userid
JOIN useraccounts AS us on us.userid = u.userid
WHERE j.jobid = 123
You could switch your lines joining jobs & useraccounts here.
What it would look like if everything joined on company:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = c.userid
JOIN useraccounts AS us on us.userid = c.userid
WHERE j.jobid = 123
This doesn't really make logical sense... unless each user has their own company.
4.) The magic of SQL is that you can choose to show only certain columns, but all of them are there for sorting and filtering...
if you returned
SELECT c.*, j.jobid....
you could clearly see what it was filtering on, but the database server doesn't care whether you output a column or not when filtering on it.