Concatenating various field values for the same ID - mysql

I am working on a project where every user has an id and can have multiple regions as follows:
Regions Table
ID | State | etc..
-----------
122 | MD
122 | FL
122 | NY
122 | NJ
122 | CA
11 | NC
11 | SC
11 | GA
I would like to essentially write a query that will create a result set where every user ID only appears once and if the user ID is listed multiple times, it concatenates the column as follows...
ID | State
----------
122 | MD, FL, NY, NJ, CA
11 | NC, SC, GA
Is this possible? I appreciate any suggestions.
Thanks in advance!

You can use GROUP_CONCAT:
SELECT id, GROUP_CONCAT(state SEPARATOR ', ')
FROM regions
GROUP BY id
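To try the idea without a MySQL server, here is a minimal sketch that uses Python's bundled sqlite3 module as a stand-in (an assumption of this example, not part of the answer). SQLite spells the separator as a second argument, group_concat(state, ', '), where MySQL uses GROUP_CONCAT(state SEPARATOR ', '). The order of values inside each group is not guaranteed unless you request one.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (id INTEGER, state TEXT)")
conn.executemany("INSERT INTO regions VALUES (?, ?)", [
    (122, "MD"), (122, "FL"), (122, "NY"), (122, "NJ"), (122, "CA"),
    (11, "NC"), (11, "SC"), (11, "GA"),
])

# SQLite syntax: group_concat(expr, separator).
# MySQL equivalent: GROUP_CONCAT(state SEPARATOR ', ').
query = "SELECT id, group_concat(state, ', ') FROM regions GROUP BY id"
states_by_id = dict(conn.execute(query).fetchall())

# One row per id, with all of that id's states joined into one string.
for user_id, states in states_by_id.items():
    print(user_id, "|", states)
```

In MySQL you can fix the order inside each group with GROUP_CONCAT(state ORDER BY state SEPARATOR ', ').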

Create a new column from existing columns

I want to create a new column from this example table, but with one more condition that I haven't been able to figure out: I want an average holdings column that is specific to each city.
Name | City | Holdings
Tom Jones | London | 42
Rick James| Paris | 36
Mike Tim | NY | 70
Milo James| London | 83
So in this example table London has more than one instance and accordingly it will have a unique value of '62.5' indicating an average of holdings that's specific to the value in the city column.
Name | City | Holdings | City Avg. Holdings
Tom Jones | London | 42 | 62.5
Rick James| Paris | 36 | 36
Mike Tim | NY | 70 | 70
Milo James| London | 83 | 62.5
In MySQL 8.0, this is straightforward with window functions:
select t.*, avg(holdings) over(partition by city) avg_city_holdings
from mytable t
In earlier versions, you can join:
select t.*, a.avg_city_holdings
from mytable t
left join (select city, avg(holdings) avg_city_holdings from mytable group by city) a
on a.city = t.city
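As a quick check of the join version, here is a sketch that runs it against an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL, assumed only for this example). The holdings values are taken from the expected-result table in the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, city TEXT, holdings REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    ("Tom Jones", "London", 42), ("Rick James", "Paris", 36),
    ("Mike Tim", "NY", 70), ("Milo James", "London", 83),
])

# Derived table "a" holds one average per city; joining it back on city
# attaches that average to every row of the original table.
query = """
SELECT t.name, t.city, t.holdings, a.avg_city_holdings
FROM mytable t
LEFT JOIN (SELECT city, AVG(holdings) AS avg_city_holdings
           FROM mytable
           GROUP BY city) a
  ON a.city = t.city
"""
avg_by_name = {name: avg for name, _city, _holdings, avg in conn.execute(query)}
print(avg_by_name)
```

Both London rows get 62.5, while the single-row cities simply get their own holdings back.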

Selecting lowest integer value (non unique) for each table entry that shares the same column value

I have a single database table 'Accomodation' that lists information of hotel/B&B suites with the city they are located in, the price and an ID for each suite that functions as the primary key.
It looks similar to this:
id | city | Price
ams001 | Amsterdam | 160
ams011 | Amsterdam | 120
par004 | Paris | 90
par006 | Paris | 120
rom005 | Rome | 130
rom015 | Rome | 130
I want to list all information of the cheapest accommodation for each city; however, if two records share the same lowest price, I want to display both of them.
The result should look something like this:
ams011 | Amsterdam | 120
par004 | Paris | 90
rom005 | Rome | 130
rom015 | Rome | 130
I have tried using
SELECT * FROM accomodation
WHERE price IN (SELECT MIN(price) FROM accomodation GROUP BY city);
However this will produce a table like this
ams011 | Amsterdam | 120
par004 | Paris | 90
par006 | Paris | 120
rom005 | Rome | 130
rom015 | Rome | 130
Since the 120 price is the cheapest for Amsterdam it will show up at Paris too.
If I add a group by statement at the end, outside of the subquery like this:
SELECT * FROM accomodation
WHERE price IN (SELECT MIN(price) FROM accomodation GROUP BY city)
GROUP BY city;
It will fail to display lowest values that are identical and I'm left with a table like this:
ams011 | Amsterdam | 120
par004 | Paris | 90
rom015 | Rome | 130
First GROUP BY city to get the min price for each city and then join to the table:
SELECT a.*
FROM accomodation a INNER JOIN (
SELECT city, MIN(price) minprice
FROM accomodation
GROUP BY city
) g ON g.city = a.city AND g.minprice = a.price
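To see why matching on both city and price matters, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption of this example). It runs the price-only IN query from the question next to the join above, on the sample data.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accomodation (id TEXT PRIMARY KEY, city TEXT, price INTEGER)")
conn.executemany("INSERT INTO accomodation VALUES (?, ?, ?)", [
    ("ams001", "Amsterdam", 160), ("ams011", "Amsterdam", 120),
    ("par004", "Paris", 90), ("par006", "Paris", 120),
    ("rom005", "Rome", 130), ("rom015", "Rome", 130),
])

# Price-only match: any row whose price equals SOME city's minimum gets through.
naive_sql = """
SELECT id FROM accomodation
WHERE price IN (SELECT MIN(price) FROM accomodation GROUP BY city)
ORDER BY id
"""

# City-and-price match: a row must equal the minimum of ITS OWN city.
join_sql = """
SELECT a.id, a.city, a.price
FROM accomodation a
INNER JOIN (SELECT city, MIN(price) AS minprice
            FROM accomodation
            GROUP BY city) g
  ON g.city = a.city AND g.minprice = a.price
ORDER BY a.id
"""

naive_ids = [row[0] for row in conn.execute(naive_sql)]
cheapest = conn.execute(join_sql).fetchall()
print(naive_ids)
print(cheapest)
```

The price-only version still lets par006 through, while the join keeps both Rome rows and drops par006.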

Count SQL records based on sibling property

I have a table of people who have a name, location (where they live), and a parent_id
(parents are stored on another table). So for example:
name | location | parent_id
--------+-----------+-----------
Joe | Chicago | 12
Sammy | Chicago | 13
Bob | SF | 13
Jim | New York | 13
Jane | Chicago | 14
Dave | Portland | 14
Al | Chicago | 15
Monica | Boston | 15
Debbie | New York | 15
Bill | Chicago | 16
Bruce | New York | 16
I need a count of how many people live in Chicago and have
siblings (share a parent_id) that live in New York. So for the example above,
the count would be 3.
name | location | parent_id
--------+-----------+-----------
Joe | Chicago | 12
Sammy | Chicago | 13 * sibling Jim lives in New York
Bob | SF | 13
Jim | New York | 13
Jane | Chicago | 14
Dave | Portland | 14
Al | Chicago | 15 * sibling Debbie lives in New York
Monica | Boston | 15
Debbie | New York | 15
Bill | Chicago | 16 * sibling Bruce lives in New York
Bruce | New York | 16
Can someone help me write the SQL to query this count?
Looks like Minh's answer works great, but here is another example using a Self Join.
SELECT Count(DISTINCT a.child_id)
FROM people a
JOIN people b ON a.parent_id = b.parent_id
WHERE a.location = 'Chicago' AND b.location = 'New York'
Should produce "3" for just the above table listed.
EDIT: Added a DISTINCT a.parent_id based on Lithis' suggestion.
EDIT2: As noted by Uueerdo, a child_id or some sort of unique id would really help in the case of 2 siblings who live in Chicago and 1 sibling who lives in New York. I have edited the original query to reflect this.
Since this is not truly an "answer" to your question, because there is no such child_id, I would defer to Uueerdo's answer, sorry!
SELECT COUNT(*)
FROM `people` AS p1
WHERE p1.`location` = 'Chicago'
AND p1.parent_id IN (
SELECT DISTINCT parent_id
FROM `people` AS p2
WHERE p2.`location` = 'New York'
)
;
Using Minh's answer as a base, this should be pretty fast: since the subquery is no longer correlated, there is no risk of it being executed repeatedly, once for every row in people.
The correlated query is a very nice way to go and is very efficient. Avoid the use of distinct as it is an expensive operation. Group by is a nice alternative over the use of distinct. Understand the data and structure the query accordingly. Here is another option that is engine optimized...
select count(*)
from (select * from #t where Location = 'Chicago') ch
inner join (select * from #t where Location = 'New York') ny on ch.ParentID = ny.ParentID
Maybe try this?
SELECT Count(*)
FROM people table1
WHERE table1.location = 'Chicago'
AND EXISTS (SELECT * FROM people table2
WHERE table1.parent_id = table2.parent_id
AND table2.location = 'New York')
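The difference between the EXISTS/IN style and a plain self join only shows up when a Chicago resident has more than one sibling in New York. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption of this example) and adds one hypothetical row, Zed, that is not in the question's data.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, location TEXT, parent_id INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", [
    ("Joe", "Chicago", 12), ("Sammy", "Chicago", 13), ("Bob", "SF", 13),
    ("Jim", "New York", 13), ("Jane", "Chicago", 14), ("Dave", "Portland", 14),
    ("Al", "Chicago", 15), ("Monica", "Boston", 15), ("Debbie", "New York", 15),
    ("Bill", "Chicago", 16), ("Bruce", "New York", 16),
])

# EXISTS counts each Chicago row at most once.
exists_sql = """
SELECT COUNT(*) FROM people p1
WHERE p1.location = 'Chicago'
  AND EXISTS (SELECT 1 FROM people p2
              WHERE p2.parent_id = p1.parent_id
                AND p2.location = 'New York')
"""

# A plain self join counts one row per (Chicago, New York) sibling pair.
join_sql = """
SELECT COUNT(*) FROM people a
JOIN people b ON a.parent_id = b.parent_id
WHERE a.location = 'Chicago' AND b.location = 'New York'
"""

def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]

before = (count(exists_sql), count(join_sql))

# Hypothetical extra row: a second New York sibling for Sammy (parent_id 13).
conn.execute("INSERT INTO people VALUES ('Zed', 'New York', 13)")
after = (count(exists_sql), count(join_sql))

print(before, after)
```

On the original data both queries agree. With the extra sibling the join counts Sammy twice, which is why the self-join answer needs a DISTINCT on some unique per-person id.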

Use MySQL to create a string out of a subquery?

What I'm hoping to do is create a string out of a table WITHIN a query so that I may be able to place that string in another query I'm creating. Say, I have this for a table:
index | position | name
----------------------------------------
1 | member | John Smith
2 | chair | Mary Jones
3 | member | Mary Jones
4 | contact | Grace Adams
5 | director | Grace Adams
6 | member | Grace Adams
7 | treasurer | Bill McDonnell
8 | vice chair | Bill McDonnell
9 | member | Ishmael Rodriguez
I'm looking for the result as follows:
name | positions
----------------------------------------
John Smith | member
Mary Jones | chair,member
Grace Adams | contact,director,member
Bill McDonnell | treasurer,vice chair
Ishmael Rodriguez | member
I was hoping I could use some variant of CONCAT_WS() to get my result, like this...
SELECT
a.NAME,
CONCAT_WS(
',',
(
SELECT
position
FROM
TABLE
WHERE
NAME = a.NAME
)
)AS positions FROM ---------------
Obviously, this isn't working out for me. Any ideas?
Use GROUP_CONCAT:
SELECT name, GROUP_CONCAT(position) result
FROM tableName
GROUP BY name
ORDER BY MIN(`index`)
Use GROUP_CONCAT like so:
SELECT name, GROUP_CONCAT(position SEPARATOR ',')
FROM tableName
GROUP BY name

Most efficient way to SELECT one row in a one:many pair of tables in MySQL

Let's say I've got the following data in one-to-many tables city and person, respectively:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 1 | charles | 1 |
| 1 | chicago | 2 | celia | 1 |
| 1 | chicago | 3 | curtis | 1 |
| 1 | chicago | 4 | chauncey | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 3 | los angeles | 7 | louise | 3 |
| 3 | los angeles | 8 | lucy | 3 |
| 3 | los angeles | 9 | larry | 3 |
+---------+-------------+-----------+-------------+----------------+
9 rows in set (0.00 sec)
And I want to select a single record from person for each unique city using some particular logic. For example:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id
GROUP BY city_id ORDER BY person_name DESC
;
The implication here is that within each city, I want to get the lexicographically greatest value, e.g.:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 3 | curtis | 1 |
+---------+-------------+-----------+-------------+----------------+
The actual output I get, however, is:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 1 | charles | 1 |
+---------+-------------+-----------+-------------+----------------+
I understand that the reason for this discrepancy is that MySQL first performs the GROUP BY, then it does the ORDER BY. This is unfortunate for me, as I want the GROUP BY to have selection logic in which record it picks.
I can work around this by using some nested SELECT statements:
SELECT c.*, p.* FROM city c,
( SELECT p_inner.* FROM
( SELECT * FROM person ORDER BY person_city_id, person_name DESC ) p_inner
GROUP BY person_city_id ) p
WHERE c.city_id = p.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 3 | curtis | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
+---------+-------------+-----------+-------------+----------------+
This seems like it would be terribly inefficient when the person table grows arbitrarily large. I assume the inner SELECT statements don't know about outermost WHERE filters. Is this true?
What is the accepted best approach for doing what effectively is an ORDER BY before the GROUP BY?
The usual way to do this (in MySQL) is with a join of your table to itself.
First to get the greatest person_name per city (ie per person_city_id in the person table):
SELECT p.*
FROM person p
LEFT JOIN person p2
ON p.person_city_id = p2.person_city_id
AND p.person_name < p2.person_name
WHERE p2.person_name IS NULL
This joins person to itself within each person_city_id (your GROUP BY variable), and also pairs the tables up such that p2's person_name is greater than p's person_name.
Since it's a left join, if there's a p.person_name for which there is no greater p2.person_name (within that same city), then p2.person_name will be NULL. These rows are precisely the "greatest" person_names per city.
So to join your other information (from city) to it, just do another join:
SELECT c.*,p.*
FROM person p
LEFT JOIN person p2
ON p.person_city_id = p2.person_city_id
AND p.person_name < p2.person_name
LEFT JOIN city c -- add in city table
ON p.person_city_id = c.city_id -- add in city table
WHERE p2.person_name IS NULL -- ORDER BY c.city_id if you like
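The self-join above can be checked on the question's sample data with a short sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption of this example); the query itself is the same shape as the one above, with an ORDER BY added so the output is stable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT)")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
             "person_name TEXT, person_city_id INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)", [
    (1, "chicago"), (2, "new york"), (3, "los angeles"),
])
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    (1, "charles", 1), (2, "celia", 1), (3, "curtis", 1), (4, "chauncey", 1),
    (5, "nathan", 2),
    (6, "luke", 3), (7, "louise", 3), (8, "lucy", 3), (9, "larry", 3),
])

# p2 is "any person in the same city with a greater name".
# Rows of p with no such p2 (p2 columns are NULL) are the per-city maximum.
query = """
SELECT c.city_name, p.person_id, p.person_name
FROM person p
LEFT JOIN person p2
  ON p.person_city_id = p2.person_city_id
 AND p.person_name < p2.person_name
LEFT JOIN city c
  ON p.person_city_id = c.city_id
WHERE p2.person_name IS NULL
ORDER BY c.city_id
"""
greatest = conn.execute(query).fetchall()
print(greatest)
```

This returns curtis for chicago, nathan for new york, and luke for los angeles, matching the result the question asks for.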
Your "solution" is not valid SQL but it works in MySQL. You can't be sure however if it will break with a future change in the query optimizer code. It could be slightly improved to have just 1 level of nesting (still not valid SQL):
--- Option 1 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
( SELECT *
FROM person
ORDER BY person_city_id
, person_name DESC
) AS p
ON c.city_id = p.person_city_id
GROUP BY p.person_city_id
Another way (valid SQL syntax, works in other DBMS, too) is to make a subquery to select the last name for every city and then join:
--- Option 2 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
( SELECT person_city_id
, MAX(person_name) AS person_name
FROM person
GROUP BY person_city_id
) AS pmax
ON c.city_id = pmax.person_city_id
JOIN
person AS p
ON p.person_city_id = pmax.person_city_id
AND p.person_name = pmax.person_name
Another way is the self join (of the table person), with the < trick that @mathematical_coffee describes.
--- Option 3 ---
see @mathematical_coffee's answer
Yet another way is to use a LIMIT 1 subquery for the join of city with person:
--- Option 4 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
person AS p
ON
p.person_id =
( SELECT person_id
FROM person AS pm
WHERE pm.person_city_id = c.city_id
ORDER BY person_name DESC
LIMIT 1
)
This will run a subquery (on table person) for every city, and it will be efficient if you have a (person_city_id, person_name) index for the InnoDB engine or a (person_city_id, person_name, person_id) index for the MyISAM engine.
There is one major difference between these options:
Options 2 and 3 will return all tied results (if two or more persons in a city share the name that is alphabetically last, all of them will be shown).
Options 1 and 4 will return one result per city, even if there are ties. You can choose which one by altering the ORDER BY clause.
Which option is more efficient depends also on the distribution of your data, so the best way is to try them all, check their execution plans and find the best indexes that work for each one. An index on (person_city_id, person_name) will most likely be good for any of those queries.
By distribution I mean:
Do you have few cities with many persons per city? (I would think that options 2 and 4 would behave better in this case)
Or many cities with few persons per city? (option 3 may be better with such data).
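The tie behaviour described above can be seen with a small sketch that runs Option 2 and Option 4 side by side. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption of this example), and the data is hypothetical: two persons named curtis in the same city. The extra person_id in Option 4's ORDER BY is added here only to make the tie-break deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT)")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
             "person_name TEXT, person_city_id INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [(1, "chicago"), (2, "new york")])
# Hypothetical data: persons 2 and 3 tie for the greatest name in chicago.
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    (1, "charles", 1), (2, "curtis", 1), (3, "curtis", 1), (4, "nathan", 2),
])

# Option 2: join on the per-city MAX(person_name); every tied row matches.
option2_sql = """
SELECT c.city_name, p.person_id, p.person_name
FROM city AS c
JOIN (SELECT person_city_id, MAX(person_name) AS person_name
      FROM person
      GROUP BY person_city_id) AS pmax
  ON c.city_id = pmax.person_city_id
JOIN person AS p
  ON p.person_city_id = pmax.person_city_id
 AND p.person_name = pmax.person_name
ORDER BY c.city_id, p.person_id
"""

# Option 4: correlated LIMIT 1 subquery; exactly one row per city.
option4_sql = """
SELECT c.city_name, p.person_id, p.person_name
FROM city AS c
JOIN person AS p
  ON p.person_id = (SELECT pm.person_id
                    FROM person AS pm
                    WHERE pm.person_city_id = c.city_id
                    ORDER BY pm.person_name DESC, pm.person_id
                    LIMIT 1)
ORDER BY c.city_id
"""

option2_rows = conn.execute(option2_sql).fetchall()
option4_rows = conn.execute(option4_sql).fetchall()
print(option2_rows)
print(option4_rows)
```

Option 2 returns both curtis rows for chicago, while Option 4 returns only the one that wins the ORDER BY.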