How to find items without relation to MySQL table - mysql

I'm having trouble excluding items in my MySQL query. I want to get all animals that have no relation to a given continent, e.g. "Asia".
My tables look like this.
Table 'animals'
+----+--------------+
| id | name |
+----+--------------+
| 1 | Tiger |
| 2 | Lion |
| 3 | Spider |
| 4 | Bird |
+----+--------------+
Table 'continent'
+----+--------------+
| id | name |
+----+--------------+
| 1 | Europe |
| 2 | Asia |
| 3 | Africa |
+----+--------------+
Table 'relations'
+----+--------+-----------+
| id | animal | continent |
+----+--------+-----------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 3 |
| 5 | 3 | 3 |
| 6 | 4 | 2 |
+----+--------+-----------+
This is what my query looks like:
SELECT a.`id`,
a.`name`
FROM `animals` AS a
LEFT JOIN `relations` AS r
ON r.`animal` = a.`id`
WHERE r.`continent` != 2
ORDER BY a.`name` asc;
The problem is that this gives me the following result:
Lion
Spider
Tiger
The thing is that "Lion" has a relation to continent Asia (ID 2) and shouldn't be in the results. Can you please help me to solve this issue?

Use NOT EXISTS to return only those animals that have no relation to the continent Asia:
select a.*
from animals a
where not exists (
select 1
from relations r
join continent c on
c.id = r.continent
where c.name = 'Asia'
and a.id = r.animal
)

That's because Lion also has relations with other continents that aren't Asia, and those rows pass your filter.
What you want to do is:
SELECT a.id, a.name
FROM animals a
WHERE a.id NOT IN (
SELECT DISTINCT r.animal FROM relations r WHERE r.continent = 2
)
ORDER BY a.name ASC;

You get Lion because Lion (id 2) also has rows in the relations table for continents other than Asia, and those rows pass the continent != 2 filter.
If you need the animals that are only on continents other than Asia, then:
SELECT a.`id`,
a.`name`
FROM `animals` AS a
LEFT JOIN `relations` AS r
ON r.`animal` = a.`id`
WHERE r.`continent` != 2
AND a.id not in (
select animal from relations where continent = 2
)
ORDER BY a.`name` asc;

One option uses an EXISTS clause:
SELECT
a.id, a.name
FROM animals a
WHERE NOT EXISTS (SELECT 1 FROM relations r INNER JOIN continent c
ON r.continent = c.id
WHERE a.id = r.animal AND c.name = 'Asia');
The idea here is that for each animal, we scan the relations table joined to continent searching for the same animal being assigned to the Asia continent. If we can't find that relationship, then retain that particular animal.
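To try the idea above without a MySQL server, here is a minimal sketch that runs the NOT EXISTS query against the question's sample data. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the query is plain SQL, but SQLite is not MySQL), and the variable names are made up for illustration. The question's LEFT JOIN filter is run next to it for comparison.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE continent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relations (id INTEGER PRIMARY KEY, animal INTEGER, continent INTEGER);
    INSERT INTO animals   VALUES (1,'Tiger'),(2,'Lion'),(3,'Spider'),(4,'Bird');
    INSERT INTO continent VALUES (1,'Europe'),(2,'Asia'),(3,'Africa');
    INSERT INTO relations VALUES (1,1,1),(2,2,1),(3,2,2),(4,2,3),(5,3,3),(6,4,2);
""")

# Keep an animal only if no relation row links it to Asia.
not_exists_sql = """
    SELECT a.name
    FROM animals a
    WHERE NOT EXISTS (
        SELECT 1
        FROM relations r
        JOIN continent c ON c.id = r.continent
        WHERE c.name = 'Asia' AND r.animal = a.id)
    ORDER BY a.name
"""

# The question's approach: filters rows, not animals, so Lion's
# Europe and Africa rows still get through.
left_join_sql = """
    SELECT a.name
    FROM animals a
    LEFT JOIN relations r ON r.animal = a.id
    WHERE r.continent != 2
    ORDER BY a.name
"""

kept = [row[0] for row in conn.execute(not_exists_sql)]
leaky = [row[0] for row in conn.execute(left_join_sql)]
print("NOT EXISTS:", kept)
print("LEFT JOIN filter:", leaky)
```

The row-level filter lets Lion through because it is evaluated once per relation row, while NOT EXISTS is evaluated once per animal.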

select a.* from animals a where a.id not in (select animal from relations where continent = 2);

Related

SQL-Query for a table with a foreign-key-field that references other foreign-key-fields

I have the following structure of tables in my database:
[table workers]
ID [PK] | worker | combined [FK]
--------+--------+--------------+
1 | John | 2
--------------------------------+
2 | Adam | 1
[table combined]
ID [PK] | name | helper [FK]
--------+----------------------+
1 | name1 | 1
2 | name2 | 2
[table helper]
ID [PK] | department [FK] | location [FK]
--------+-------------+-------------------
1 | 2 | 3
2 | 1 | 1
[table departments]
ID [PK] | department
--------+-------------+
1 | Development |
2 | Production |
[table location]
ID [PK] | department
--------+--------------+
1 | Paris |
2 | London |
3 | Berlin |
The table "workers" has a foreign-key field ("combined"). The table "combined" has a field "name" and a foreign-key field "helper", which in turn references a table with two foreign-key fields.
My question is now, what is the simplest SQL-Query to get the following table:
[table workers]
ID [PK] | worker | combined-Name| department | location
--------+--------+--------------+------------+-----------
1 | John | name2 | Development| Paris
--------------------------------+------------+-----------
2 | Adam | name1 | Production | Berlin
I already tried it with some LEFT JOINs but did not manage to get all the "clear names" into the table "workers".
This query would work:
SELECT w.ID, worker, c.name AS `combined-Name`, d.department, l.department as
location FROM workers w
LEFT JOIN combined c ON c.ID = w.combined
LEFT JOIN helper h ON h.ID = c.helper
LEFT JOIN departments d ON d.ID = h.department
LEFT JOIN location l ON l.ID = h.location
GROUP BY w.ID
I used the AS keyword to set the names to your preferred output.
This was tested locally using the provided structures and data.
It's basically 4 simple left joins; instead of selecting the IDs, I select the name columns of the foreign tables.
The alias on c.name is quoted because we need to escape the special character -
Use the following query:
select workers.worker, combined.name as `combined-name`, departments.department as department, location.department as location from workers
left join combined on workers.combined = combined.ID
left join helper on helper.ID = combined.helper
left join departments on departments.ID = helper.department
left join location on location.ID = helper.location

MySQL Select Combined Unique

Table: Contacts
id | name | has_this
------------------------
1 | Jeff | 0
2 | Terry | 1
3 | Tom | 0
4 | Henry | 1
Table: has_thing
id | owner | thing
---------------------
1 | Terry | stuff
2 | Tom | stuff
3 | Toby | stuff
I want a SELECT that will return
name | thing
-------------
Terry | stuff
Tom | stuff
Henry |
Toby | stuff
Basically, I think I want a JOIN, but any name that is in table 2 (has_thing) and not in table 1 should be included in the output, and any name in table 1 (Contacts) WHERE has_this=1 should be included as well.
SELECT name, MAX(thing) as thing
FROM (SELECT c.name, h.thing
FROM Contacts AS c
JOIN has_thing AS h ON c.name = h.owner
UNION
SELECT name, ''
FROM Contacts
WHERE has_this = 1) AS subquery
GROUP BY name
MAX(thing) ensures that we pick up the non-empty thing from the first query when the contact has has_this = 1.
You could also do it with LEFT JOIN:
SELECT c.name, IFNULL(h.thing, '') AS thing
FROM Contacts AS c
LEFT JOIN has_thing AS h ON c.name = h.owner
WHERE c.has_this = 1
OR h.owner IS NOT NULL
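Both queries above start from Contacts, so an owner that exists only in has_thing (Toby in the sample data) never reaches the output. A possible variant, sketched here under assumptions, takes every owner from has_thing and unions in the flagged contacts. It runs on Python's built-in sqlite3 as a stand-in for MySQL; the derived-table alias and the variable names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts  (id INTEGER PRIMARY KEY, name TEXT, has_this INTEGER);
    CREATE TABLE has_thing (id INTEGER PRIMARY KEY, owner TEXT, thing TEXT);
    INSERT INTO Contacts  VALUES (1,'Jeff',0),(2,'Terry',1),(3,'Tom',0),(4,'Henry',1);
    INSERT INTO has_thing VALUES (1,'Terry','stuff'),(2,'Tom','stuff'),(3,'Toby','stuff');
""")

sql = """
    SELECT name, MAX(thing) AS thing
    FROM (
        -- every owner, whether or not a matching contact exists
        SELECT owner AS name, thing FROM has_thing
        UNION
        -- flagged contacts, with an empty placeholder thing
        SELECT name, '' FROM Contacts WHERE has_this = 1
    ) AS merged
    GROUP BY name
    ORDER BY name
"""

result = conn.execute(sql).fetchall()
for name, thing in result:
    print(name, "|", thing)
```

MAX(thing) plays the same role as in the answer: when a name appears with both 'stuff' and the empty placeholder, the non-empty value wins.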

SUM For Distinct Rows

Given the following table structures:
countries: id, name
regions: id, country_id, name, population
cities: id, region_id, name
...and this query...
SELECT c.name AS country, COUNT(DISTINCT r.id) AS regions, COUNT(s.id) AS cities
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN cities AS s ON s.region_id = r.id
GROUP BY c.id
How would I add a SUM of the regions.population value to calculate the country's population? I need to only use the value of each region once when summing, but the un-grouped result has multiple rows for each region (the number of cities in that region).
Example data:
mysql> SELECT * FROM countries;
+----+-----------+
| id | name |
+----+-----------+
| 1 | country 1 |
| 2 | country 2 |
+----+-----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM regions;
+----+------------+-----------------------+------------+
| id | country_id | name | population |
+----+------------+-----------------------+------------+
| 11 | 1 | region 1 in country 1 | 10 |
| 12 | 1 | region 2 in country 1 | 15 |
| 21 | 2 | region 1 in country 2 | 25 |
+----+------------+-----------------------+------------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM cities;
+-----+-----------+---------------------------------+
| id | region_id | name |
+-----+-----------+---------------------------------+
| 111 | 11 | City 1 in region 1 in country 1 |
| 112 | 11 | City 2 in region 1 in country 1 |
| 121 | 12 | City 1 in region 2 in country 1 |
| 211 | 21 | City 1 in region 1 in country 2 |
+-----+-----------+---------------------------------+
4 rows in set (0.00 sec)
Desired output with example data:
+-----------+---------+--------+------------+
| country | regions | cities | population |
+-----------+---------+--------+------------+
| country 1 | 2 | 3 | 25 |
| country 2 | 1 | 1 | 25 |
+-----------+---------+--------+------------+
I prefer a solution that doesn't require changing the JOIN logic.
The accepted solution for this post seems to be in the neighborhood of what I'm looking for, but I haven't been able to figure out how to apply it to my issue.
MY SOLUTION
SELECT c.id AS country_id,
c.name AS country,
COUNT(x.region_id) AS regions,
SUM(x.population) AS population,
SUM(x.cities) AS cities
FROM countries AS c
LEFT JOIN (
SELECT r.country_id,
r.id AS region_id,
r.population AS population,
COUNT(s.id) AS cities
FROM regions AS r
LEFT JOIN cities AS s ON s.region_id = r.id
GROUP BY r.country_id, r.id, r.population
) AS x ON x.country_id = c.id
GROUP BY c.id, c.name
Note: My actual query is much more complex and has nothing to do with countries, regions, or cities. This is a minimal example to illustrate my issue.
First of all, the other post you reference is not the same situation. In that case, the joins are like [A -> B and A -> C], so the weighted average (which is what that calculation does) is correct. In your case the joins are like [A -> B -> C], so you need a different approach.
The simplest solution that comes to mind right away does involve a subquery, but not a complex one:
SELECT
c.name AS country,
COUNT(r.id) AS regions,
SUM(s.city_count) AS cities,
SUM(r.population) as population
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN
(select region_id, count(*) as city_count
from cities
group by region_id) AS s
ON s.region_id = r.id
GROUP BY c.id
The reason this works is that it resolves the cities to one row per region before joining to the region, thus eliminating the cross join situation.
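Here is a small sketch of that effect using the question's example data. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, with shortened city names and made-up variable names. The first query sums population over the plain three-table join, the second pre-aggregates cities to one row per region first.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE regions   (id INTEGER PRIMARY KEY, country_id INTEGER,
                            name TEXT, population INTEGER);
    CREATE TABLE cities    (id INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);
    INSERT INTO countries VALUES (1,'country 1'),(2,'country 2');
    INSERT INTO regions   VALUES (11,1,'region 1 in country 1',10),
                                 (12,1,'region 2 in country 1',15),
                                 (21,2,'region 1 in country 2',25);
    INSERT INTO cities    VALUES (111,11,'city 1'),(112,11,'city 2'),
                                 (121,12,'city 1'),(211,21,'city 1');
""")

# Plain join: each region row is repeated once per city, so its
# population is added once per city.
naive = conn.execute("""
    SELECT c.name, SUM(r.population)
    FROM countries AS c
    JOIN regions AS r ON r.country_id = c.id
    JOIN cities  AS s ON s.region_id = r.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# Cities collapsed to one row per region before the join.
fixed = conn.execute("""
    SELECT c.name, COUNT(r.id), SUM(s.city_count), SUM(r.population)
    FROM countries AS c
    JOIN regions AS r ON r.country_id = c.id
    JOIN (SELECT region_id, COUNT(*) AS city_count
          FROM cities
          GROUP BY region_id) AS s ON s.region_id = r.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print("naive:", naive)
print("fixed:", fixed)
```

With the plain join, region 11 contributes its population of 10 twice because it has two cities, which is where the inflated total for country 1 comes from.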
How about leaving the rest as it is and just adding one more join for the population:
SELECT c.name AS country,
COUNT(distinct r.id) AS regions,
COUNT(s.id) AS cities,
pop_regs.sum as total_population
FROM countries AS c
LEFT JOIN regions AS r ON r.country_id = c.id
LEFT JOIN cities AS s ON s.region_id = r.id
left join
(
select country_id, sum(population) as sum
from regions
group by country_id
) pop_regs on pop_regs.country_id = c.id
GROUP BY c.id, c.name
To start, you should know that the question and solution you mention differ somewhat from your question and its solution. That's why you cannot use only JOINs without sub-queries.
Tables :
Countries :
===========================
| id | name |
===========================
| 1 | country 1 |
---------------------------
| 2 | country 2 |
---------------------------
| 3 | country 3 |
---------------------------
| 4 | country 4 |
---------------------------
Regions :
=============================================
| id |country_id| name |population|
=============================================
| 1 | 1 | c1 - r1 | 10 |
---------------------------------------------
| 2 | 1 | c1 - r2 | 15 |
---------------------------------------------
| 3 | 1 | c1 - r3 | 15 |
---------------------------------------------
| 4 | 2 | c2 - r1 | 25 |
---------------------------------------------
| 5 | 3 | c3 - r1 | 13 |
---------------------------------------------
Cities :
========================================
| id | region_id | name |
========================================
| 1 | 1 | city 1 |
----------------------------------------
| 2 | 1 | city 2 |
----------------------------------------
| 3 | 2 | city 3 |
----------------------------------------
| 4 | 2 | city 4 |
----------------------------------------
| 5 | 2 | city 5 |
----------------------------------------
| 6 | 3 | city 6 |
----------------------------------------
| 7 | 3 | city 7 |
----------------------------------------
| 8 | 4 | city 8 |
----------------------------------------
| 9 | 4 | city 9 |
----------------------------------------
| 10 | 4 | city 10 |
----------------------------------------
As a simple method, you can join the countries table with a sub-query that joins the regions and cities tables, so that you end up joining just 2 tables: countries, and regions with a cities column:
SQL :
SELECT
r.id AS id,
r.country_id AS country_id,
r.name AS name,
r.population AS population,
COUNT(s.region_id) AS cities
FROM regions r
/* we use left join and not just join so that regions without cities are also returned */
LEFT JOIN cities s
ON r.id = s.region_id
GROUP BY r.id
Data :
==================================================================
| id | country_id | name | population | cities |
==================================================================
| 1 | 1 | c1 - r1 | 10 | 2 |
------------------------------------------------------------------
| 2 | 1 | c1 - r2 | 15 | 3 |
------------------------------------------------------------------
| 3 | 1 | c1 - r3 | 15 | 2 |
------------------------------------------------------------------
| 4 | 2 | c2 - r1 | 25 | 3 |
------------------------------------------------------------------
| 5 | 3 | c3 - r1 | 13 | 0 |
------------------------------------------------------------------
Then you run your normal query, which gives you this code:
SQL :
SELECT
c.name AS country,
COUNT(r.country_id) AS regions,
/* ifnull is used here to show 0 instead of null */
SUM(IFNULL(r.cities, 0)) AS cities,
SUM(IFNULL(r.population, 0)) AS population
FROM countries c
/* we use left join and not just join so that countries without regions are also returned */
LEFT JOIN (
SELECT
/* we don't need regions.id and regions.name */
r.country_id AS country_id,
r.population AS population,
COUNT(s.region_id) AS cities
FROM regions r
LEFT JOIN cities s
ON r.id = s.region_id
GROUP BY r.id
) r
ON c.id = r.country_id
GROUP BY c.id
And this result :
=====================================================
| country | regions | cities | population |
=====================================================
| country 1 | 3 | 7 | 40 |
-----------------------------------------------------
| country 2 | 1 | 3 | 25 |
-----------------------------------------------------
| country 3 | 1 | 0 | 13 |
-----------------------------------------------------
| country 4 | 0 | 0 | 0 |
-----------------------------------------------------
For comparison, using only JOIN removes countries without regions and countries whose regions have no cities:
=====================================================
| country | regions | cities | population |
=====================================================
| country 1 | 3 | 7 | 40 |
-----------------------------------------------------
| country 2 | 1 | 3 | 25 |
-----------------------------------------------------
For your exact example (with data mentioned in your question), you will get :
=====================================================
| country | regions | cities | population |
=====================================================
| country 1 | 2 | 3 | 25 |
-----------------------------------------------------
| country 2 | 1 | 1 | 25 |
-----------------------------------------------------
I hope all that can help you to get what you want.
I tested this query in SQL against the same tables you provided:
select regioncount.name as country,regioncount.regions, citycount.cities,regioncount.population from
(SELECT c.name,c.id,COUNT(r.id) AS regions ,SUM(r.population) as population
FROM countries AS c
JOIN regions AS r on c.id = r.country_id GROUP BY c.id,c.name) as regioncount
join
(SELECT
r.country_id,
COUNT(s.id) AS cities
FROM regions AS r
JOIN cities AS s on r.id =s.region_id GROUP BY r.country_id) as citycount on citycount.country_id = regioncount.id
and I got the result you want:
+-----------+---------+--------+------------+
| country | regions | cities | population |
+-----------+---------+--------+------------+
| country 1 | 2 | 3 | 25 |
| country 2 | 1 | 1 | 25 |
+-----------+---------+--------+------------+
Use LEFT OUTER JOIN instead of INNER JOIN: with INNER JOIN, a country that has no regions will not appear in the result, and likewise a region that has no cities will not be counted.
So use LEFT OUTER JOIN instead of INNER JOIN or JOIN.
Try this:
SELECT c.name AS country, r.regions, r.population, r.cities
FROM countries AS c
LEFT OUTER JOIN (SELECT r.country_id,
COUNT(r.id) AS regions,
SUM(r.population) AS population,
SUM(c.cities) AS cities
FROM regions AS r
LEFT OUTER JOIN (SELECT c.region_id, COUNT(c.id) AS cities
FROM cities AS c
GROUP BY c.region_id
) AS c ON r.id = c.region_id
GROUP BY r.country_id
) AS r ON c.id = r.country_id;
OUTPUT (from the SQL Fiddle demo, which uses its own sample data)
| COUNTRY | REGIONS | POPULATION | CITIES |
|---------|---------|------------|--------|
| usa | 3 | 16 | 4 |
| germany | 2 | 5 | 1 |
Here's another way of doing it, if you don't want to introduce or change a JOIN or a subquery:
SELECT
c.name AS country,
COUNT(distinct r.id) AS regions,
COUNT(s.id) AS cities,
SUM(DISTINCT(((((r.id*r.id) + (r.population*r.id)))-(r.id*r.id))/r.id)) as total_population
FROM
countries AS c
JOIN regions AS r ON r.country_id = c.id
LEFT JOIN cities AS s ON s.region_id = r.id
GROUP
BY c.id
http://sqlfiddle.com/#!2/3dd8ba/22/0
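A word of caution on the SUM(DISTINCT ...) expression above: algebraically it reduces to r.population, so DISTINCT deduplicates by population value and not by region. Two regions in one country that happen to share a population are then counted once. The sketch below illustrates this with hypothetical data (three regions, two of them with population 15); it assumes Python's built-in sqlite3 as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER, population INTEGER);
    CREATE TABLE cities  (id INTEGER PRIMARY KEY, region_id INTEGER);
    -- hypothetical data: regions 2 and 3 share the same population
    INSERT INTO regions VALUES (1,1,10),(2,1,15),(3,1,15);
    INSERT INTO cities  VALUES (1,1),(2,1),(3,2),(4,3);
""")

# Same arithmetic as the answer's expression; it simplifies to r.population.
trick = conn.execute("""
    SELECT SUM(DISTINCT (((r.id*r.id) + (r.population*r.id)) - (r.id*r.id)) / r.id)
    FROM regions AS r
    LEFT JOIN cities AS s ON s.region_id = r.id
    WHERE r.country_id = 1
""").fetchone()[0]

# Reference value: sum each region exactly once.
actual = conn.execute(
    "SELECT SUM(population) FROM regions WHERE country_id = 1"
).fetchone()[0]

print("SUM(DISTINCT ...):", trick)
print("true population:  ", actual)
```

The trick only gives the right total when every region within a country has a different population, which is hard to guarantee for real data.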
Your problem is quite common. You join all tables that have something to do with the data you want to see, and then you start thinking about how to get to that data. When it comes to different aggregations as in your case, this is not easy to achieve.
So better join what you are actually interested in. In your case: countries and (aggregated) region/city data per country. This keeps the query straight-forward and easy to maintain.
select
c.name as country,
r.regions,
r.population,
r.cities
from countries as c
join
(
select
country_id,
count(*) as regions,
sum(population) as population,
sum((select count(*) from cities where cities.region_id = regions.id)) as cities
from regions
group by country_id
) as r on r.country_id = c.id;

Most efficient way to SELECT one row in a one:many pair of tables in MySQL

Let's say I've got the following data in one-to-many tables city and person, respectively:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 1 | charles | 1 |
| 1 | chicago | 2 | celia | 1 |
| 1 | chicago | 3 | curtis | 1 |
| 1 | chicago | 4 | chauncey | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 3 | los angeles | 7 | louise | 3 |
| 3 | los angeles | 8 | lucy | 3 |
| 3 | los angeles | 9 | larry | 3 |
+---------+-------------+-----------+-------------+----------------+
9 rows in set (0.00 sec)
And I want to select a single record from person for each unique city using some particular logic. For example:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id
GROUP BY city_id ORDER BY person_name DESC
;
The implication here is that within each city, I want to get the lexicographically greatest value, e.g.:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 3 | curtis | 1 |
+---------+-------------+-----------+-------------+----------------+
The actual output I get, however, is:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 1 | charles | 1 |
+---------+-------------+-----------+-------------+----------------+
I understand that the reason for this discrepancy is that MySQL first performs the GROUP BY, then it does the ORDER BY. This is unfortunate for me, as I want the GROUP BY to have selection logic in which record it picks.
I can work around this by using some nested SELECT statements:
SELECT c.*, p.* FROM city c,
( SELECT p_inner.* FROM
( SELECT * FROM person ORDER BY person_city_id, person_name DESC ) p_inner
GROUP BY person_city_id ) p
WHERE c.city_id = p.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 3 | curtis | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
+---------+-------------+-----------+-------------+----------------+
This seems like it would be terribly inefficient when the person table grows arbitrarily large. I assume the inner SELECT statements don't know about outermost WHERE filters. Is this true?
What is the accepted best approach for doing what effectively is an ORDER BY before the GROUP BY?
The usual way to do this (in MySQL) is with a join of your table to itself.
First to get the greatest person_name per city (ie per person_city_id in the person table):
SELECT p.*
FROM person p
LEFT JOIN person p2
ON p.person_city_id = p2.person_city_id
AND p.person_name < p2.person_name
WHERE p2.person_name IS NULL
This joins person to itself within each person_city_id (your GROUP BY variable), and also pairs the tables up such that p2's person_name is greater than p's person_name.
Since it's a left join if there's a p.person_name for which there is no greater p2.person_name (within that same city), then the p2.person_name will be NULL. These are precisely the "greatest" person_names per city.
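To see the self-join in action on the question's data, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and it joins city with an inner JOIN purely to label the rows; the variable names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city   (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT,
                         person_city_id INTEGER);
    INSERT INTO city   VALUES (1,'chicago'),(2,'new york'),(3,'los angeles');
    INSERT INTO person VALUES (1,'charles',1),(2,'celia',1),(3,'curtis',1),
                              (4,'chauncey',1),(5,'nathan',2),(6,'luke',3),
                              (7,'louise',3),(8,'lucy',3),(9,'larry',3);
""")

# p2 looks for a "greater" name in the same city; when none exists,
# the LEFT JOIN yields NULL and p is the greatest name for that city.
greatest = conn.execute("""
    SELECT c.city_id, c.city_name, p.person_id, p.person_name
    FROM person p
    LEFT JOIN person p2
           ON p.person_city_id = p2.person_city_id
          AND p.person_name < p2.person_name
    JOIN city c ON c.city_id = p.person_city_id
    WHERE p2.person_name IS NULL
    ORDER BY c.city_id
""").fetchall()

for row in greatest:
    print(row)
```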
So to join your other information (from city) to it, just do another join:
SELECT c.*,p.*
FROM person p
LEFT JOIN person p2
ON p.person_city_id = p2.person_city_id
AND p.person_name < p2.person_name
LEFT JOIN city c -- add in city table
ON p.person_city_id = c.city_id -- add in city table
WHERE p2.person_name IS NULL -- ORDER BY c.city_id if you like
Your "solution" is not valid SQL but it works in MySQL. You can't be sure however if it will break with a future change in the query optimizer code. It could be slightly improved to have just 1 level of nesting (still not valid SQL):
--- Option 1 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
( SELECT *
FROM person
ORDER BY person_city_id
, person_name DESC
) AS p
ON c.city_id = p.person_city_id
GROUP BY p.person_city_id
Another way (valid SQL syntax, works in other DBMS, too) is to make a subquery to select the last name for every city and then join:
--- Option 2 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
( SELECT person_city_id
, MAX(person_name) AS person_name
FROM person
GROUP BY person_city_id
) AS pmax
ON c.city_id = pmax.person_city_id
JOIN
person AS p
ON p.person_city_id = pmax.person_city_id
AND p.person_name = pmax.person_name
Another way is the self join (of the table person), with the < trick that #mathematical_coffee describes.
--- Option 3 ---
see #mathematical-coffee's answer
Yet another way is to use a LIMIT 1 subquery for the join of city with person:
--- Option 4 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
person AS p
ON
p.person_id =
( SELECT person_id
FROM person AS pm
WHERE pm.person_city_id = c.city_id
ORDER BY person_name DESC
LIMIT 1
)
This will run a subquery (on table person) for every city, and it will be efficient if you have a (person_city_id, person_name) index for the InnoDB engine or a (person_city_id, person_name, person_id) index for the MyISAM engine.
There is one major difference between these options:
Options 2 and 3 will return all tied results (if you have two or more persons in a city with the same name that is alphabetically last, then both or all will be shown).
Options 1 and 4 will return one result per city, even if there are ties. You can choose which one by altering the ORDER BY clause.
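The tie behaviour can be checked with a small sketch. It uses hypothetical data in which two different people in one city share the alphabetically last name, and it assumes Python's built-in sqlite3 as a stand-in for MySQL. The extra person_id in the ORDER BY of the LIMIT 1 variant is an addition for this sketch, so that the surviving row is deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city   (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT,
                         person_city_id INTEGER);
    INSERT INTO city   VALUES (1,'chicago');
    -- hypothetical data: persons 1 and 2 tie for the last name
    INSERT INTO person VALUES (1,'zoe',1),(2,'zoe',1),(3,'adam',1);
""")

# Option 2: join back on MAX(person_name); every tied row matches.
max_join = conn.execute("""
    SELECT c.city_name, p.person_id, p.person_name
    FROM city AS c
    JOIN (SELECT person_city_id, MAX(person_name) AS person_name
          FROM person
          GROUP BY person_city_id) AS pmax
      ON c.city_id = pmax.person_city_id
    JOIN person AS p
      ON p.person_city_id = pmax.person_city_id
     AND p.person_name = pmax.person_name
    ORDER BY p.person_id
""").fetchall()

# Option 4: correlated LIMIT 1 subquery; exactly one row per city.
limit_one = conn.execute("""
    SELECT c.city_name, p.person_id, p.person_name
    FROM city AS c
    JOIN person AS p
      ON p.person_id = (SELECT pm.person_id
                        FROM person AS pm
                        WHERE pm.person_city_id = c.city_id
                        ORDER BY pm.person_name DESC, pm.person_id
                        LIMIT 1)
""").fetchall()

print("MAX join:", max_join)
print("LIMIT 1: ", limit_one)
```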
Which option is more efficient depends also on the distribution of your data, so the best way is to try them all, check their execution plans and find the best indexes that work for each one. An index on (person_city_id, person_name) will most likely be good for any of those queries.
By distribution I mean:
Do you have few cities with many persons per city? (I would think that options 2 and 4 would behave better in this case)
Or many cities with few persons per city? (option 3 may be better with such data).

Concatenating rows in relation to a JOIN

Suppose I have a cooking show:
cookingepisodes
id | date
---------------
1 | A
2 | B
3 | C
4 | D
…
This show reviews products in the categories listed in tests, which are linked to episodes by testitems:
tests
id | name
------------
1 | cutlery
2 | spices
testitems
id | episodeid | testid | name
------------------------------------
1 | 1 | 1 | Forks
2 | 2 | 1 | Knives
3 | 4 | 1 | Spoons
4 | 4 | 2 | Oregano
My desired output is this:
showid | testid | testname
4 | 1,2 | cutlery, spices
3 | NULL | NULL
2 | 1 | cutlery
1 | 1 | cutlery
I've tried using this query, and it works as long as I don't need to concatenate the results (when there are two tests on the same episode). Then the join will create multiple rows based on the number of
SELECT DISTINCT e.*, i.testid, t.name AS testname
FROM cookingepisodes AS e
LEFT OUTER JOIN testitems AS i ON i.episodeid = e.id
LEFT OUTER JOIN tests AS t ON i.testid = t.id
ORDER BY e.date DESC
I've also tried something like this, but I can't get it to work because of the outer block reference (e.id):
JOIN (
SELECT GROUP_CONCAT(DISTINCT testid)
FROM testitems
WHERE testitems.episodeid = e.id
) AS i
Any tips on how I can solve this without restructuring the database?
Try this one -
SELECT
ce.id showid,
GROUP_CONCAT(te.testid) testid,
GROUP_CONCAT(t.name) testname
FROM cookingepisodes ce
LEFT JOIN testitems te
ON te.episodeid = ce.id
LEFT JOIN tests t
ON t.id = te.testid
GROUP BY
ce.id
ORDER BY
ce.id DESC;
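A rough way to check the grouping logic against the question's sample data is sketched below. It assumes Python's built-in sqlite3 module as a stand-in for MySQL: SQLite also has a group_concat aggregate, but it does not guarantee the order of the concatenated values, so the sketch avoids relying on it. In MySQL, GROUP_CONCAT additionally accepts DISTINCT and its own ORDER BY if duplicates or ordering matter.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cookingepisodes (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE tests     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE testitems (id INTEGER PRIMARY KEY, episodeid INTEGER,
                            testid INTEGER, name TEXT);
    INSERT INTO cookingepisodes VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D');
    INSERT INTO tests     VALUES (1,'cutlery'),(2,'spices');
    INSERT INTO testitems VALUES (1,1,1,'Forks'),(2,2,1,'Knives'),
                                 (3,4,1,'Spoons'),(4,4,2,'Oregano');
""")

# One output row per episode; the LEFT JOINs keep episode 3,
# which has no test items, with NULL in both concatenated columns.
episodes = conn.execute("""
    SELECT ce.id AS showid,
           GROUP_CONCAT(te.testid) AS testid,
           GROUP_CONCAT(t.name)    AS testname
    FROM cookingepisodes ce
    LEFT JOIN testitems te ON te.episodeid = ce.id
    LEFT JOIN tests t      ON t.id = te.testid
    GROUP BY ce.id
    ORDER BY ce.id DESC
""").fetchall()

for row in episodes:
    print(row)
```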