I have a table of people who have a name, location (where they live), and a parent_id
(parents are stored on another table). So for example:
name | location | parent_id
--------+-----------+-----------
Joe | Chicago | 12
Sammy | Chicago | 13
Bob | SF | 13
Jim | New York | 13
Jane | Chicago | 14
Dave | Portland | 14
Al | Chicago | 15
Monica | Boston | 15
Debbie | New York | 15
Bill | Chicago | 16
Bruce | New York | 16
I need a count of how many people live in Chicago and have
siblings (share a parent_id) that live in New York. So for the example above,
the count would be 3.
name | location | parent_id
--------+-----------+-----------
Joe | Chicago | 12
Sammy | Chicago | 13 * sibling Jim lives in New York
Bob | SF | 13
Jim | New York | 13
Jane | Chicago | 14
Dave | Portland | 14
Al | Chicago | 15 * sibling Debbie lives in New York
Monica | Boston | 15
Debbie | New York | 15
Bill | Chicago | 16 * sibling Bruce lives in New York
Bruce | New York | 16
Can someone help me write the SQL to query this count?
Looks like Minh's answer works great, but here is another example using a Self Join.
SELECT Count(DISTINCT a.child_id)
FROM people a
JOIN people b ON a.parent_id = b.parent_id
WHERE a.location = 'Chicago' AND b.location = 'New York'
Should produce "3" for just the above table listed.
EDIT: Added a DISTINCT a.parent_id based on Lithis' suggestion.
EDIT2: As noted by Uueerdo, a child_id or some sort of unique id would really help in the case of 2 siblings who live in Chicago and 1 sibling who lives in New York. I have edited the original query to reflect this.
Since this is not truly an "answer" to your question, because there is no such child_id, I would defer to Uueerdo's answer, sorry!
SELECT COUNT(*)
FROM `people` AS p1
WHERE p1.`location` = 'Chicago'
AND p1.parent_id IN (
SELECT DISTINCT parent_id
FROM `people` AS p2
WHERE p2.`location` = 'New York'
)
;
Using Minh's answer as a base, this should be pretty fast: since the subquery is no longer "correlated", there is no risk of it being executed repeatedly, once for every row in people.
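To check the count against the sample rows, here is a minimal sketch that loads the table from the question into an in-memory SQLite database (an assumption on my part: SQLite stands in for MySQL, and the backticks are dropped) and runs the non-correlated IN query.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, location TEXT, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Joe", "Chicago", 12), ("Sammy", "Chicago", 13), ("Bob", "SF", 13),
        ("Jim", "New York", 13), ("Jane", "Chicago", 14), ("Dave", "Portland", 14),
        ("Al", "Chicago", 15), ("Monica", "Boston", 15), ("Debbie", "New York", 15),
        ("Bill", "Chicago", 16), ("Bruce", "New York", 16),
    ],
)

# The inner SELECT never references p1, so it is not correlated with the outer row.
in_count = conn.execute("""
    SELECT COUNT(*)
    FROM people AS p1
    WHERE p1.location = 'Chicago'
      AND p1.parent_id IN (
          SELECT parent_id FROM people AS p2 WHERE p2.location = 'New York'
      )
""").fetchone()[0]

print(in_count)
```

On the sample data this counts Sammy, Al and Bill.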
The correlated query is a very nice way to go and is very efficient. Avoid the use of distinct as it is an expensive operation. Group by is a nice alternative over the use of distinct. Understand the data and structure the query accordingly. Here is another option that is engine optimized...
select count(*)
from (select * from #t where Location = 'Chicago') ch
inner join (select * from #t where Location = 'New York') ny on ch.ParentID = ny.ParentID
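One thing to watch with the plain join: it counts (Chicago, New York) pairs, not Chicago people. The sketch below uses SQLite through Python as a stand-in for the real engine, lower-case column names, and a hypothetical extra sibling "Jack" who is not in the question's data, just to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, location TEXT, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Sammy", "Chicago", 13),
        ("Jim", "New York", 13),
        ("Jack", "New York", 13),  # hypothetical second New York sibling
        ("Al", "Chicago", 15),
        ("Debbie", "New York", 15),
    ],
)

# Join of the two filtered sets: one output row per matching pair.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT * FROM people WHERE location = 'Chicago') ch
    INNER JOIN (SELECT * FROM people WHERE location = 'New York') ny
        ON ch.parent_id = ny.parent_id
""").fetchone()[0]

# Membership test: one output row per Chicago person, however many siblings match.
people_count = conn.execute("""
    SELECT COUNT(*)
    FROM people ch
    WHERE ch.location = 'Chicago'
      AND ch.parent_id IN (SELECT parent_id FROM people WHERE location = 'New York')
""").fetchone()[0]

print(join_count, people_count)
```

Sammy is counted twice by the join because he pairs with both Jim and Jack, so the join form only matches the requested count when each family has at most one New York sibling.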
Maybe try this?
SELECT Count(*)
FROM table table1
WHERE table1.location= 'Chicago'
AND EXISTS (SELECT * FROM table table2
WHERE table1.parent_id= table2.parent_id
AND table2.location= 'New York')
Related
I want to add a new column to this example table, with one extra condition that I haven't been able to figure out so far: the column should hold the average holdings specific to each city.
Name | City | Holdings
Tom Jones | London | 42
Rick James| Paris | 36
Mike Tim | NY | 70
Milo James| London | 83
In this example London appears more than once, so both of its rows should get the value 62.5, the average of the holdings for that city (42 and 83).
Name | City | Holdings | City Avg. Holdings
Tom Jones | London | 42 | 62.5
Rick James| Paris | 36 | 36
Mike Tim | NY | 70 | 70
Milo James| London | 83 | 62.5
In MySQL 8.0, this is straightforward with window functions:
select t.*, avg(holdings) over(partition by city) avg_city_holdings
from mytable t
In earlier versions, you can join:
select t.*, a.avg_city_holdings
from mytable t
left join (select city, avg(holdings) avg_city_holdings from mytable group by city) a
on a.city = t.city
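As a quick check of the join version, here is a sketch run against SQLite from Python (an assumption: SQLite in place of MySQL, and the Holdings values taken from the expected-output table in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, city TEXT, holdings INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Tom Jones", "London", 42),
        ("Rick James", "Paris", 36),
        ("Mike Tim", "NY", 70),
        ("Milo James", "London", 83),
    ],
)

# Derived table "a" holds one average per city; joining it back keeps every row of t.
rows = conn.execute("""
    SELECT t.name, a.avg_city_holdings
    FROM mytable t
    LEFT JOIN (
        SELECT city, AVG(holdings) AS avg_city_holdings
        FROM mytable
        GROUP BY city
    ) a ON a.city = t.city
""").fetchall()

avg_by_name = dict(rows)
print(avg_by_name)
```

Both London rows end up with 62.5, while the single-row cities keep their own holdings as the average.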
I am working on a project where every user has an id and can have multiple regions as follows:
Regions Table
ID | State | etc..
-----------
122 | MD
122 | FL
122 | NY
122 | NJ
122 | CA
11 | NC
11 | SC
11 | GA
I would like to essentially write a query that will create a result set where every user ID only appears once and if the user ID is listed multiple times, it concatenates the column as follows...
ID | State
----------
122 | MD, FL, NY, NJ, CA
11 | NC, SC, GA
Is this possible? I appreciate any suggestions.
Thanks in advance!
You can use GROUP_CONCAT:
SELECT id, GROUP_CONCAT(state SEPARATOR ', ') AS state
FROM regions
GROUP BY id
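If you want to try the idea without a MySQL server, here is a small sketch using SQLite from Python. Note the assumption: SQLite spells the separator as a second argument, group_concat(state, ', '), rather than MySQL's SEPARATOR keyword, and neither engine promises the order of the concatenated values unless you ask for one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO regions VALUES (?, ?)",
    [
        (122, "MD"), (122, "FL"), (122, "NY"), (122, "NJ"), (122, "CA"),
        (11, "NC"), (11, "SC"), (11, "GA"),
    ],
)

# One output row per id, with all of that id's states joined into one string.
rows = conn.execute(
    "SELECT id, GROUP_CONCAT(state, ', ') FROM regions GROUP BY id"
).fetchall()

# Compare as sets because the concatenation order is not guaranteed.
states_by_id = {user_id: set(joined.split(", ")) for user_id, joined in rows}
print(rows)
```

In MySQL you can fix the order inside the call, for example GROUP_CONCAT(state ORDER BY state SEPARATOR ', ').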
I need to perform a relatively easy to explain but (given my somewhat limited skills) hard to write SQL query.
Assume we have a table similar to this one:
exam_no | name | surname | result | date
---------+------+---------+--------+------------
1 | John | Doe | PASS | 2012-01-01
1 | Ryan | Smith | FAIL | 2012-01-02 <--
1 | Ann | Evans | PASS | 2012-01-03
1 | Mary | Lee | FAIL | 2012-01-04
... | ... | ... | ... | ...
2 | John | Doe | FAIL | 2012-02-01 <--
2 | Ryan | Smith | FAIL | 2012-02-02
2 | Ann | Evans | FAIL | 2012-02-03
2 | Mary | Lee | PASS | 2012-02-04
... | ... | ... | ... | ...
3 | John | Doe | FAIL | 2012-03-01
3 | Ryan | Smith | FAIL | 2012-03-02
3 | Ann | Evans | PASS | 2012-03-03
3 | Mary | Lee | FAIL | 2012-03-04 <--
Note that exam_no and date aren't necessarily related as one might expect from the kind of example I chose.
Now, the query that I need to do is as follows:
From the latest exam (exam_no = 3) find all the students that have failed (John Doe, Ryan Smith and Mary Lee).
For each of these students find the date of the first of the batch of consecutively failing exams. Another way to put it would be: for each of these students find the date of the first failing exam that comes after their last passing exam. (Look at the arrows in the table).
The resulting table should be something like this:
name | surname | date_since_failing
------+---------+--------------------
John | Doe | 2012-02-01
Ryan | Smith | 2012-01-02
Mary | Lee | 2012-03-04
How can I perform such a query?
Thank you for your time.
You can take advantage of the fact that if someone passed the most recent exam, then they have not failed any exams since their most recent pass: therefore the problem reduces to finding the first exam failed since the most recent pass:
SELECT name, surname, MIN(date) date_since_fail
FROM results NATURAL LEFT JOIN (
SELECT name, surname, MAX(date) lastpass
FROM results
WHERE result = 'PASS'
GROUP BY name, surname
) t
WHERE result = 'FAIL' AND date > IFNULL(lastpass,0)
GROUP BY name, surname
See it on sqlfiddle.
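For anyone who wants to reproduce this locally, here is a sketch of the same idea on the listed rows, run in SQLite from Python. These are my assumptions, not part of the answer: the table is called results, the "..." rows are left out, the join is written with an explicit ON instead of NATURAL LEFT JOIN, and dates are stored as ISO strings so they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (exam_no INTEGER, name TEXT, surname TEXT, result TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Doe", "PASS", "2012-01-01"), (1, "Ryan", "Smith", "FAIL", "2012-01-02"),
        (1, "Ann", "Evans", "PASS", "2012-01-03"), (1, "Mary", "Lee", "FAIL", "2012-01-04"),
        (2, "John", "Doe", "FAIL", "2012-02-01"), (2, "Ryan", "Smith", "FAIL", "2012-02-02"),
        (2, "Ann", "Evans", "FAIL", "2012-02-03"), (2, "Mary", "Lee", "PASS", "2012-02-04"),
        (3, "John", "Doe", "FAIL", "2012-03-01"), (3, "Ryan", "Smith", "FAIL", "2012-03-02"),
        (3, "Ann", "Evans", "PASS", "2012-03-03"), (3, "Mary", "Lee", "FAIL", "2012-03-04"),
    ],
)

# t holds each student's most recent pass; students who never passed get NULL,
# which IFNULL turns into '' so that every one of their fail dates qualifies.
failing_since = conn.execute("""
    SELECT r.name, r.surname, MIN(r.date) AS date_since_failing
    FROM results r
    LEFT JOIN (
        SELECT name, surname, MAX(date) AS lastpass
        FROM results
        WHERE result = 'PASS'
        GROUP BY name, surname
    ) t ON t.name = r.name AND t.surname = r.surname
    WHERE r.result = 'FAIL' AND r.date > IFNULL(t.lastpass, '')
    GROUP BY r.name, r.surname
    ORDER BY r.name
""").fetchall()

print(failing_since)
```

Ann Evans drops out because she has no failed exam after her latest pass.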
I would use a subquery that fetches the last passed exam,
something like:
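SET @query_exam_no = 3;
SELECT
name,
surname,
-- students who never passed have last_passed_exam = NULL, so count all their exams
MIN(IF(last_passed_exam IS NULL OR date > last_passed_exam, date, NULL)) AS date_failing_since
FROM
exam_results
LEFT JOIN (
SELECT
name,
surname,
MAX(date) AS last_passed_exam
FROM exam_results
WHERE result = 'PASS'
GROUP BY name, surname
) AS last_passed_exams USING (name, surname)
GROUP BY name, surname
-- HAVING must come after GROUP BY; keep only students who failed the queried exam
HAVING
MAX(IF(exam_no = @query_exam_no, result, NULL)) = 'FAIL'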
This is enough:
select t.name,
t.surname,
t.date as 'date_since_failing'
from tablename t
inner join
(
select name,
surname,
max(exam_no) as exam_no
from tablename
group by name, surname
having min(result) = 'FAIL'
) aux on t.name = aux.name and t.surname = aux.surname and t.exam_no = aux.exam_no
The condition you are asking for is not needed; you can do it without it. Here is a working example.
select
e.name,
e.sur_name,
min(e.date) as `LastFailed`
from exams as e
where e.result = 'Fail'
group by e.name, e.sur_name
order by e.name
This produces the following result:
name | sur_name | LastFailed
------+----------+------------
Ann | Evans | 2012-02-03
John | Doe | 2012-02-01
Mary | Lee | 2012-01-04
Ryan | Smith | 2012-01-02
I have a UserComments table like this:
1 | Frank | hello world
2 | Jane | Hi there
3 | Frank | this is my comments
4 | Frank | I think I need some sleep
5 | Jason | I need to buy new MacBook
6 | Jane | Please invite my new Blackberry PIN
On the other hand, I have a FriendList table that contains:
1 | Jason
2 | Jane
Let's say my friends' IDs are always BETWEEN 1 AND 5.
Since Frank is not my friend, I should not be able to see his comments. How can I combine the tables to get a result like this (ORDER BY UserComments.ID DESC)?
1 | Jane | Please invite my new Blackberry PIN
2 | Jason | I need to buy new MacBook
3 | Jane | Hi there
thanks.
Try this:
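SELECT B.ID, B.UserName, B.Comment
FROM FriendList A
-- match friends by name, not by row ID (column names are assumed; the question gives none)
INNER JOIN UserComments B ON A.UserName = B.UserName
ORDER BY B.ID DESC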
Try this:
Select
*
from
user_comments inner join friend_List on (join criteria)
where user_Id = ? order by user_comments.id desc
Let's say I've got the following data in one-to-many tables city and person, respectively:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 1 | charles | 1 |
| 1 | chicago | 2 | celia | 1 |
| 1 | chicago | 3 | curtis | 1 |
| 1 | chicago | 4 | chauncey | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 3 | los angeles | 7 | louise | 3 |
| 3 | los angeles | 8 | lucy | 3 |
| 3 | los angeles | 9 | larry | 3 |
+---------+-------------+-----------+-------------+----------------+
9 rows in set (0.00 sec)
And I want to select a single record from person for each unique city using some particular logic. For example:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id
GROUP BY city_id ORDER BY person_name DESC
;
The implication here is that within each city, I want to get the lexicographically greatest value, e.g.:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 3 | curtis | 1 |
+---------+-------------+-----------+-------------+----------------+
The actual output I get, however, is:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 1 | charles | 1 |
+---------+-------------+-----------+-------------+----------------+
I understand that the reason for this discrepancy is that MySQL first performs the GROUP BY, then it does the ORDER BY. This is unfortunate for me, as I want the GROUP BY to have selection logic in which record it picks.
I can work around this by using some nested SELECT statements:
SELECT c.*, p.* FROM city c,
( SELECT p_inner.* FROM
( SELECT * FROM person ORDER BY person_city_id, person_name DESC ) p_inner
GROUP BY person_city_id ) p
WHERE c.city_id = p.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 3 | curtis | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
+---------+-------------+-----------+-------------+----------------+
This seems like it would be terribly inefficient when the person table grows arbitrarily large. I assume the inner SELECT statements don't know about outermost WHERE filters. Is this true?
What is the accepted best approach for doing what effectively is an ORDER BY before the GROUP BY?
The usual way to do this (in MySQL) is with a join of your table to itself.
First to get the greatest person_name per city (ie per person_city_id in the person table):
SELECT p.*
FROM person p
LEFT JOIN person p2
ON p.person_city_id = p2.person_city_id
AND p.person_name < p2.person_name
WHERE p2.person_name IS NULL
This joins person to itself within each person_city_id (your GROUP BY variable), and also pairs the tables up such that p2's person_name is greater than p's person_name.
Since it's a left join, if there's a p.person_name for which there is no greater p2.person_name (within that same city), then p2.person_name will be NULL. These are precisely the "greatest" person_names per city.
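To see the NULL filter at work on the question's data, here is a minimal sketch using SQLite from Python (SQLite is my stand-in for MySQL here; the query text itself is unchanged apart from the column list).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER, person_name TEXT, person_city_id INTEGER)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        (1, "charles", 1), (2, "celia", 1), (3, "curtis", 1), (4, "chauncey", 1),
        (5, "nathan", 2),
        (6, "luke", 3), (7, "louise", 3), (8, "lucy", 3), (9, "larry", 3),
    ],
)

# Each p row is paired with every same-city row whose name sorts after it.
# Rows with no such partner keep p2.* as NULL: those are the per-city maxima.
greatest = conn.execute("""
    SELECT p.person_city_id, p.person_id, p.person_name
    FROM person p
    LEFT JOIN person p2
        ON p.person_city_id = p2.person_city_id
       AND p.person_name < p2.person_name
    WHERE p2.person_name IS NULL
    ORDER BY p.person_city_id
""").fetchall()

print(greatest)
```

For Chicago, charles pairs with curtis and so gets filtered out, while curtis has nobody after him and survives.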
So to join your other information (from city) to it, just do another join:
SELECT c.*,p.*
FROM person p
LEFT JOIN person p2
ON p.person_city_id = p2.person_city_id
AND p.person_name < p2.person_name
LEFT JOIN city c -- add in city table
ON p.person_city_id = c.city_id -- add in city table
WHERE p2.person_name IS NULL -- ORDER BY c.city_id if you like
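Your "solution" is not valid SQL, but it works in MySQL as long as the ONLY_FULL_GROUP_BY SQL mode is off (that mode is enabled by default since MySQL 5.7.5, and the query is rejected when it is on). Even where it runs, you can't be sure it won't break with a future change in the query optimizer code. It could be slightly improved to have just 1 level of nesting (still not valid SQL):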
--- Option 1 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
( SELECT *
FROM person
ORDER BY person_city_id
, person_name DESC
) AS p
ON c.city_id = p.person_city_id
GROUP BY p.person_city_id
Another way (valid SQL syntax, works in other DBMS, too) is to make a subquery to select the last name for every city and then join:
--- Option 2 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
( SELECT person_city_id
, MAX(person_name) AS person_name
FROM person
GROUP BY person_city_id
) AS pmax
ON c.city_id = pmax.person_city_id
JOIN
person AS p
ON p.person_city_id = pmax.person_city_id
AND p.person_name = pmax.person_name
Another way is the self join (of the table person), with the < trick that @mathematical_coffee describes.
--- Option 3 ---
see @mathematical_coffee's answer
Yet another way is to use a LIMIT 1 subquery for the join of city with person:
--- Option 4 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
person AS p
ON
p.person_id =
( SELECT person_id
FROM person AS pm
WHERE pm.person_city_id = c.city_id
ORDER BY person_name DESC
LIMIT 1
)
This will run a subquery (on table person) for every city, and it will be efficient if you have a (person_city_id, person_name) index for the InnoDB engine or a (person_city_id, person_name, person_id) index for the MyISAM engine.
There is one major difference between these options:
Options 2 and 3 will return all tied results (if you have two or more persons in a city with the same name that is alphabetically last, then both or all will be shown).
Options 1 and 4 will return one result per city, even if there are ties. You can choose which one by altering the ORDER BY clause.
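The tie behaviour is easy to see with a tiny made-up data set. The sketch below runs in SQLite from Python and assumes hypothetical rows (two people named "zed" in one city) that are not in the question. It compares the Option 2 shape (MAX subquery) with the Option 4 shape (LIMIT 1 subquery), adding person_id to the ORDER BY as the tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_id INTEGER, city_name TEXT)")
conn.execute(
    "CREATE TABLE person (person_id INTEGER, person_name TEXT, person_city_id INTEGER)"
)
conn.execute("INSERT INTO city VALUES (1, 'chicago')")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "zed", 1), (2, "zed", 1), (3, "amy", 1)],  # hypothetical tie on "zed"
)

# Option 2 shape: join back on the per-city MAX name, so every tied row matches.
option2 = conn.execute("""
    SELECT p.person_id
    FROM city AS c
    JOIN (SELECT person_city_id, MAX(person_name) AS person_name
          FROM person GROUP BY person_city_id) AS pmax
        ON c.city_id = pmax.person_city_id
    JOIN person AS p
        ON p.person_city_id = pmax.person_city_id
       AND p.person_name = pmax.person_name
    ORDER BY p.person_id
""").fetchall()

# Option 4 shape: the subquery yields exactly one person_id per city;
# the extra "person_id ASC" decides which of the tied rows wins.
option4 = conn.execute("""
    SELECT p.person_id
    FROM city AS c
    JOIN person AS p
        ON p.person_id = (
            SELECT pm.person_id
            FROM person AS pm
            WHERE pm.person_city_id = c.city_id
            ORDER BY pm.person_name DESC, pm.person_id ASC
            LIMIT 1
        )
""").fetchall()

print(option2, option4)
```

Without the tie-breaker in the ORDER BY, Option 4 still returns a single row, but which of the tied rows you get is up to the engine.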
Which option is more efficient depends also on the distribution of your data, so the best way is to try them all, check their execution plans and find the best indexes that work for each one. An index on (person_city_id, person_name) will most likely be good for any of those queries.
By distribution I mean:
Do you have few cities with many persons per city? (I would think that options 2 and 4 would behave better in this case)
Or many cities with few persons per city? (option 3 may be better with such data).