Optimizing JOIN with WHERE - mysql

Let's say I have 2 tables. The first table, table_1, contains each piece of posted content, including
title,
author name,
email,
city_name, etc.
The second table, table_2, provides a lookup for table_1. It has 2 columns,
city_id and
city_name.
For instance, city_id = 1 corresponds to New York, city_id = 2 corresponds to Chicago, and so on. The 'city' column in table_1 holds the city_id, which can easily be joined with table_2 to produce a readable city name.
Would the following statement be as efficient as using a WHERE with city_id? The reason is that I would be filtering results by city name, which is a string, and I don't want (or need?) to look up the matching ID number in table_2 for each input.
SELECT table1.city, table2.city_name
FROM table1
WHERE table2.city_name=(input city name)
JOIN table2.city_name ON table2.city_id = table1.city

Because the join is an inner join the following should lead to equivalent execution plans. (That is, they should exhibit the same performance characteristics -- write the SQL clearly and let the SQL engine do the dirty optimization work.)
As presented in the other answers:
SELECT table1.*, table2.city_name
FROM table1
JOIN table2 ON table1.city_id = table2.city_id
WHERE table2.city_name = (city_input);
And, as what I believe is the "optimized form" presented in the question:
SELECT table1.*, t2.city_name
FROM table1
JOIN (SELECT * FROM table2
WHERE table2.city_name = (city_input)) AS t2
ON table1.city_id = t2.city_id
This is because of the relational algebra (RA) model that SQL follows; at least under RA, the equality "selection" here can be moved across the join (an equijoin in RA) while keeping the same semantics. Of course, to be certain, run a query analysis. The most basic form is using EXPLAIN.
Happy coding.
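The equivalence claimed above can be checked on a toy data set. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows and the title/author columns are made up, and SQLite's EXPLAIN QUERY PLAN is used where MySQL would use EXPLAIN (the plan output format differs between engines).

```python
import sqlite3

# SQLite (Python standard library) is an assumption here, standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE table1 (title TEXT, author TEXT, city_id INTEGER);
    INSERT INTO table2 VALUES (1, 'New York'), (2, 'Chicago');
    INSERT INTO table1 VALUES
        ('Post A', 'Ann', 1), ('Post B', 'Bob', 2), ('Post C', 'Cy', 1);
""")

# Form 1: join first, then filter on the city name.
join_then_filter = """
    SELECT table1.title, table2.city_name
    FROM table1
    JOIN table2 ON table1.city_id = table2.city_id
    WHERE table2.city_name = ?
    ORDER BY table1.title
"""
# Form 2: filter the lookup table in a derived table, then join.
filter_then_join = """
    SELECT table1.title, t2.city_name
    FROM table1
    JOIN (SELECT * FROM table2 WHERE city_name = ?) AS t2
      ON table1.city_id = t2.city_id
    ORDER BY table1.title
"""
rows_a = conn.execute(join_then_filter, ("New York",)).fetchall()
rows_b = conn.execute(filter_then_join, ("New York",)).fetchall()
print(rows_a == rows_b, rows_a)

# SQLite's analogue of EXPLAIN; inspect it rather than relying on its exact text.
for row in conn.execute("EXPLAIN QUERY PLAN " + join_then_filter, ("New York",)):
    print(row)
```

Identical result sets do not prove identical execution plans, so on a real MySQL server the EXPLAIN output for both forms is still the thing to compare.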

I'm not exactly following what the question is, but I'll say that the proper way to handle a query in which you need to filter on a specific city name, rather than ID, would be like this:
SELECT table1.*, table2.city_name
FROM table1
JOIN table2 ON table1.city_id = table2.city_id
WHERE table2.city_name = (city_input);

Related

sql stored procedure with out duplicates name

How can I create a query to select all databases without duplicates?
For example (the old databases c, f, g are no longer in use; basically, if a name has an "eur" version as well as the original name, then only the "eur" version is relevant):
a
b
c
ceur
d
f
feur
g
geur
I need it to be like:
a
b
ceur
d
feur
geur
Many thanks...
SELECT DISTINCT is what you're looking for.
For instance, let's say you have a table that contains the following columns:
name, city, address, country.
You now wish to get the countries that have been stored, without duplicates. Multiple people might come from the same country, so the table would most likely contain duplicate entries of that country.
You achieve this by using SELECT DISTINCT.
Example:
SELECT DISTINCT country FROM table_name;
This retrieves the country column without duplicates. That way, you can see which countries are actually stored in that table.
If you have multiple databases (I don't know if that's what you were getting at), then you will need to perform a JOIN on the relevant tables, given you have access to them all. I would recommend doing a LEFT JOIN if you are to join more than just 1 extra table.
Example:
SELECT DISTINCT table_name.row_name, table_name.row_name2, table_name.row_name3
FROM table_name
LEFT JOIN table_name2 ON table_name.row_name = table_name2.row_name
LEFT JOIN table_name3 ON table_name2.row_name = table_name3.row_name
[...]
WHERE table_name.row_name = 'value';
Can you query information_schema.TABLES and distinct in the select, plus a predicate to filter out whatever you don't want?
You can do:
select t.*
from t
where name like '%eur'
union all
select t.*
from t
where name not like '%eur' and
not exists (select 1 from t t2 where t2.name = concat(t.name, 'eur'));
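To see what the UNION ALL approach returns on the names from the question, here is a small sketch. It assumes the names live in a one-column table t, and it runs on Python's built-in sqlite3 rather than MySQL, so string concatenation is written with || where MySQL would use CONCAT(t.name, 'eur').

```python
import sqlite3

# Assumption: the database names are stored as rows of a table t(name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(n,) for n in ["a", "b", "c", "ceur", "d", "f", "feur", "g", "geur"]],
)

# Keep every '...eur' name, plus every plain name that has no '...eur' twin.
# SQLite concatenates with ||; MySQL would use CONCAT(t.name, 'eur').
query = """
    SELECT t.name FROM t WHERE t.name LIKE '%eur'
    UNION ALL
    SELECT t.name FROM t
    WHERE t.name NOT LIKE '%eur'
      AND NOT EXISTS (SELECT 1 FROM t AS t2 WHERE t2.name = t.name || 'eur')
"""
names = sorted(row[0] for row in conn.execute(query))
print(names)
```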

How to do a join on 2 tables, but only return the data for one table?

I am not sure if this is possible. But is it possible to do a join on 2 tables, but return the data for only one of the tables. I want to join the two tables based on a condition, but I only want the data for one of the tables. Is this possible with SQL, if so how? After reading the docs, it seems that when you do a join you get the data for both tables. Thanks for any help!
You get data from both tables because join is based on "Cartesian Product" + "Selection". But after the join, you can do a "Projection" with desired columns.
SQL has an easy syntax for this:
Select t1.* --taking data just from one table
from one_table t1
inner join other_table t2
on t1.pk = t2.fk
You can choose the table through the alias: t1.* or t2.*. The symbol * means "all fields".
Also you can include where clause, order by or other join types like outer join or cross join.
A typical SQL query has multiple clauses.
The SELECT clause mentions the columns you want in your result set.
The FROM clause, which includes JOIN operations, mentions the tables from which you want to retrieve those columns.
The WHERE clause filters the result set.
The ORDER BY clause specifies the order in which the rows in your result set are presented.
There are a few other clauses like GROUP BY and LIMIT. You can read about those.
To do what you ask, select the columns you want, then mention the tables you want. Something like this.
SELECT t1.id, t1.name, t1.address
FROM t1
JOIN t2 ON t2.t1_id = t1.id
This gives you data from t1 from rows that match t2.
Pro tip: Avoid the use of SELECT *. Instead, mention the columns you want.
This would typically be done using exists (or in) if you prefer:
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.x = t1.y);
Although you can use join, it runs the risk of multiplying the number of rows in the result set -- if there are duplicate matches in table2. There is no danger of such duplicates using exists (or in). I also find the logic to be more natural.
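The row-multiplication risk is easy to reproduce. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL; the columns x, y come from the query above, while the label column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (y INTEGER, label TEXT);
    CREATE TABLE table2 (x INTEGER);
    INSERT INTO table1 VALUES (1, 'one'), (2, 'two');
    INSERT INTO table2 VALUES (1), (1), (1);  -- three matches for y = 1
""")

# JOIN: one output row per matching pair, so the t1 row is repeated.
join_rows = conn.execute("""
    SELECT t1.* FROM table1 t1 JOIN table2 t2 ON t2.x = t1.y
""").fetchall()

# EXISTS: each t1 row appears at most once, however many matches exist.
exists_rows = conn.execute("""
    SELECT t1.* FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.x = t1.y)
""").fetchall()

print(len(join_rows), len(exists_rows))
```

With this data the join returns the same table1 row three times, while the EXISTS version returns it once.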
When you join 2 tables, you use SELECT to choose the data you want.
If you want data from only one table, select columns from just that table:
SELECT b.title
FROM blog b
JOIN type t ON b.type_id=t.id;
If you want data from both tables, select columns from both:
SELECT b.title,t.type_name
FROM blog b
JOIN type t ON b.type_id=t.id;

Left Join to find difference between two tables with different id

I have two mysql tables: table_old and table_new, both with columns:
name,
surname,
birthdate
birthplace
plus others info columns. Both have an id column that doesn't match between tables. Some records are changed over time between tables, and now I need to have a third table with the records that are changed (added or deleted) between tables.
So I need to compare tables with name and surname and birthdate and birthplace.
I think that I have to use a Left Join, but I'm not sure about the syntax. Any help?
MySQL does not support a FULL OUTER JOIN operation, but we can simulate one using a union of two joins. A full outer join includes records from both tables which do not match anything in the other table:
SELECT t1.*, 'table one' AS missing
FROM table_old t1
LEFT JOIN table_new t2
ON t1.name = t2.name AND
t1.surname = t2.surname AND
t1.birthdate = t2.birthdate AND
t1.birthplace = t2.birthplace
WHERE t2.name IS NULL
UNION ALL
SELECT t2.*, 'table two'
FROM table_old t1
RIGHT JOIN table_new t2
ON t1.name = t2.name AND
t1.surname = t2.surname AND
t1.birthdate = t2.birthdate AND
t1.birthplace = t2.birthplace
WHERE t1.name IS NULL;
My criteria for claiming that a record is out of date is that any one of the fields does not match. That is, if three fields agree, but one does not, then I do not count it as a match.
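A small sketch of this comparison on made-up rows follows. It runs on Python's built-in sqlite3 rather than MySQL, and because older SQLite versions lack RIGHT JOIN, the second half is written as a LEFT JOIN with the tables swapped, which is an equivalent formulation. Only name and birthplace are selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_old (id INTEGER, name TEXT, surname TEXT,
                            birthdate TEXT, birthplace TEXT);
    CREATE TABLE table_new (id INTEGER, name TEXT, surname TEXT,
                            birthdate TEXT, birthplace TEXT);
    INSERT INTO table_old VALUES
        (1, 'Ann', 'Lee', '1990-01-01', 'Rome'),
        (2, 'Bob', 'Ray', '1985-05-05', 'Oslo');
    INSERT INTO table_new VALUES
        (10, 'Ann', 'Lee', '1990-01-01', 'Rome'),
        (11, 'Bob', 'Ray', '1985-05-05', 'Bergen'),
        (12, 'Cy', 'Fox', '2000-02-02', 'Lima');
""")

# A record matches only if all four fields agree; the ids are ignored.
match = """a.name = b.name AND a.surname = b.surname
           AND a.birthdate = b.birthdate AND a.birthplace = b.birthplace"""
query = f"""
    SELECT a.name, a.birthplace, 'table one' AS missing
    FROM table_old a LEFT JOIN table_new b ON {match}
    WHERE b.name IS NULL
    UNION ALL
    SELECT a.name, a.birthplace, 'table two'
    FROM table_new a LEFT JOIN table_old b ON {match}
    WHERE b.name IS NULL
"""
diff = sorted(conn.execute(query).fetchall())
print(diff)
```

Ann is unchanged and does not appear. Bob appears twice because his birthplace differs between the tables, and Cy appears once as a record that exists only in table_new.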
You might be able to refine my answer and make it easier to group together pairs of records which are logically the same, but that would take more work.

Selection SQL where two conditions must be true

What's the most efficient way to select rows that must satisfy two conditions met in the same column?
name | title
------------------
John | Associate
John | Manager
Fran | Manager
Fran | President
I'd like to do something like
select name
from table
where title = 'Associate'
and name in ( select *
from table
where title = 'Manager')
which should return
John
but that seems woefully inefficient, especially if the table itself is super big. You could also do
select name
from table a,
table b
where a.title = 'Associate'
and b.title = 'Manager'
and a.name = b.name
Is that the best one can do?
Your first query is not syntactically correct. It should be:
select name
from table
where title = 'Associate' and
name in (select name from table where title = 'Manager');
The second is better written as a join:
select a.name
from table a join
table b
on a.title = 'Associate' and b.title = 'Manager' and a.name = b.name;
The second is probably better in terms of taking advantage of indexes on the table.
You can also do this with a group by:
select name
from table a
group by name
having sum(title = 'Associate') > 0 and sum(title = 'Manager') > 0;
MySQL is not very good at optimizing group by. But if there are no indexes on the table, it might be faster than the join methods.
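For a quick sanity check that the three formulations agree, here is a sketch on the sample rows from the question. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table is named staff (a made-up name) because table is a reserved word. It says nothing about relative speed, which depends on the engine, the indexes and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, title TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?)", [
    ("John", "Associate"), ("John", "Manager"),
    ("Fran", "Manager"), ("Fran", "President"),
])

queries = {
    "in": """SELECT name FROM staff WHERE title = 'Associate'
             AND name IN (SELECT name FROM staff WHERE title = 'Manager')""",
    "join": """SELECT a.name FROM staff a JOIN staff b
               ON a.title = 'Associate' AND b.title = 'Manager'
               AND a.name = b.name""",
    # A comparison evaluates to 1 or 0, so SUM counts the matching rows.
    "group_by": """SELECT name FROM staff GROUP BY name
                   HAVING SUM(title = 'Associate') > 0
                      AND SUM(title = 'Manager') > 0""",
}
results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
print(results)
```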
I would have an index on your table via ( title, name ), then do a self-join. In this case, I am putting what would be less probable as the outer primary condition of the query where the manager records are considered first... ie. a company may have 5 managers and 100 associates vs looking for 100 associates that match the 5 managers.
select
t.name
from
table t
join table t2
on t2.title = 'Associate'
AND t.name = t2.name
where
t.title = 'Manager'
There's not a whole lot of data given as an example, but I'm assuming both Johns here are the same person with multiple titles? If that is the case, you would be better off making titles a child table of the employees table (if that's what this table is).
So instead you could have:
Employee
----
id
name
titles
----
id
titleName
employeeTitles
----
employeeId
titleId
If you can't do it this way, I would think another way to write your original query would be:
select t1.name
from table t1
inner join (
select distinct name
from table
where title = 'Manager'
) t2 on t1.name = t2.name
where t1.title = 'Associate'
You could also use group by name rather than distinct. Still, I think the normalized design above would be better all around (assuming my assumptions about your data are correct).
Using WHERE EXISTS (or NOT EXISTS) is a very effective alternative for this
select
name
from table1
where title = 'Associate'
and exists (
select 1 /* could be: select NULL, as nothing actually needs to be "returned" */
from table1 as t2
where t2.title = 'Manager'
and t2.name = table1.name /* correlated here */
)
;
Similar to using IN() it requires a subquery but "correlates" that subquery. However this subquery does NOT need to move any data (IN can require this).
Also, similar to IN() using EXISTS has no impact on the number of result rows. Joins can create unwanted row repetition (but of course sometimes the extra rows are needed).
This reference is for SQL Server, but it compares several relevant methods (ignore OUTER APPLY, which is SQL Server specific), including potential issues with NULLs when using IN() that do not affect EXISTS.
[NOT] EXISTS () should be one of the first methods to consider.
It depends on the MySQL version (MySQL 5.6 has a query rewrite feature that improves IN() subqueries) and on the table relationships.
There are at least 3 ways to get the result you're expecting. In my experience, INNER JOIN is faster than the others in general cases.
Try it yourself with your data.
IN () - what you wrote first.
Note that in older MySQL versions, IN() can produce a dependent subquery plan.
SELECT DISTINCT name
FROM table
WHERE title = 'Associate'
AND name IN (SELECT name FROM table WHERE title = 'Manager')
SELF JOIN - your 2nd
SELECT DISTINCT t1.name
FROM table t1 INNER JOIN table t2
ON t1.name = t2.name
WHERE t1.title = 'Associate' AND t2.title = 'Manager'
EXISTS - Semi JOIN
EXISTS is fast when tables have a 1:n relationship. This requires no DISTINCT or GROUP BY.
SELECT name
FROM table t1
WHERE t1.title = 'Associate'
AND EXISTS (SELECT 1 FROM table t2
WHERE t2.name = t1.name AND t2.title = 'Manager')

Inner Join SQL Syntax

I've never done an inner join SQL statement before, so I don't even know if this is the right thing to use, but here's my situation.
Table 1 Columns: id, course_id, unit, lesson
Table 2 Columns: id, course_id
Ultimately, I want to count the number of id's in each unit in Table 1 that are also in Table 2.
So, even though it doesn't work, maybe something like....
$sql = "SELECT table1.unit, COUNT( id ) as count, table2.id, FROM table1, table2, WHERE course_id=$im_course_id GROUP BY unit";
I'm sure the syntax of what I'm wanting to do is a complete fail. Any ideas on fixing it?
SELECT unit, COUNT( t1.id ) as count
FROM table1 as t1 inner JOIN table2 as t2
ON t1.id = t2.id
GROUP BY unit
hope this helps.
If I understand what you want (maybe you could post an example input and output?):
SELECT unit, COUNT( t1.id ) as count
FROM table1 as t1 JOIN table2 as t2
ON t1.id = t2.id
GROUP BY unit
Okay, so there are a few things going on here. First off, comma-separated joins are an older style that is generally discouraged in favor of explicit JOIN syntax. You should probably switch to explicitly writing inner join.
Now, whenever you have any sort of join, you also need on. You need to tell sql how it should match these two tables up. The on should come right after the join, like this:
Select *
From table1 inner join table2
on table1.id = table2.id
and table1.name = table2.name
You can join on as many things as you need by using and. This means that if the primary key of one table is several columns, you can easily create a one-to-one match between tables.
Lastly, you may be having issues because of other general syntax errors in your query. A comma is used to separate different pieces of information. So in your query,
SELECT table1.unit, COUNT( id ) as count, table2.id, FROM ...
The comma at the end of the select shouldn't be there. Instead this should read
SELECT table1.unit, COUNT( id ) as count, table2.id FROM ...
This is subtle, but the sql query cannot run with the extra comma.
Another issue is with the COUNT( id ) that you have. Sql doesn't know which id to count since table1 and table2 both have ids. So, you should use either count(table1.id) or count(table2.id)
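Putting these fixes together, a corrected query might look like the sketch below. It is run through Python's built-in sqlite3 rather than PHP and MySQL, the sample rows are made up, and it assumes the two tables should be matched on id as in the answers above. The course id is passed as a bound parameter instead of being interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, course_id INTEGER, unit INTEGER, lesson INTEGER);
    CREATE TABLE table2 (id INTEGER, course_id INTEGER);
    INSERT INTO table1 VALUES (1, 7, 1, 1), (2, 7, 1, 2), (3, 7, 2, 1), (4, 7, 2, 2);
    INSERT INTO table2 VALUES (1, 7), (2, 7), (4, 7);
""")

# Every column is qualified with its table alias, so nothing is ambiguous.
counts = conn.execute("""
    SELECT t1.unit, COUNT(t1.id) AS count
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.id = t2.id
    WHERE t1.course_id = ?
    GROUP BY t1.unit
    ORDER BY t1.unit
""", (7,)).fetchall()
print(counts)
```

Unit 1 has two ids that also appear in table2 and unit 2 has one, because id 3 is missing from table2.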