I have inherited a table with information about groups of people, in which one field contains delimited data whose values match rows in another table.
id_group Name
-----------------------
1 2|4|5
2 3|4|6
3 1|2
And in another table I have a list of people who may belong to one or more groups
id_names Names
-----------------------
1 Jack
2 Joe
3 Fred
4 Mary
5 Bill
I would like to perform a select on the group data that returns a single field containing a comma- or space-delimited list of names. For the first group row above, that would be "Joe Mary Bill".
I have looked at using a function to split the delimited string, and also looked at sub queries, but concatenating the results of sub queries quickly becomes huge.
Thanks!
As implied by Strawberry's comment above, there is a way to do this, but it's so ugly. It's like finishing your expensive kitchen remodel using duct tape. You should feel resentment toward the person who designed the database this way.
SELECT g.id_group, GROUP_CONCAT(n.Names SEPARATOR ' ') AS Names
FROM groups AS g JOIN names AS n
ON FIND_IN_SET(n.id_names, REPLACE(g.Name, '|', ','))
GROUP BY g.id_group;
Output, tested on MySQL 5.6:
+----------+---------------+
| id_group | Names |
+----------+---------------+
| 1 | Joe Mary Bill |
| 2 | Fred Mary |
| 3 | Jack Joe |
+----------+---------------+
The complexity of this query, and the fact that it is forced to do a table scan and cannot be optimized with an index, should convince you of what is wrong with storing a list of ids in a delimited string.
The better solution is to create a third table, in which you store each individual member of the group on a row by itself. That is, multiple rows per group.
CREATE TABLE group_name (
id_group INT NOT NULL,
id_name INT NOT NULL,
PRIMARY KEY (id_group, id_name)
);
Then you can query in a simpler way, and you have an opportunity to create indexes to make the query very fast.
SELECT g.id_group, GROUP_CONCAT(n.Names SEPARATOR ' ') AS names
FROM groups AS g
JOIN group_name AS gn USING (id_group)
JOIN names AS n ON n.id_names = gn.id_name
GROUP BY g.id_group;
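Moving the existing delimited data into such a table could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the SQL dialect differs slightly; the helper split_members is a hypothetical name, and the sample rows are the ones from the question.

```python
import sqlite3

def split_members(delimited, sep="|"):
    """Turn a delimited id string such as '2|4|5' into a list of ints."""
    return [int(part) for part in delimited.split(sep) if part.strip()]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "groups" (id_group INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE names (id_names INTEGER PRIMARY KEY, Names TEXT);
    CREATE TABLE group_name (
        id_group INTEGER NOT NULL,
        id_name INTEGER NOT NULL,
        PRIMARY KEY (id_group, id_name)
    );
    INSERT INTO "groups" VALUES (1, '2|4|5'), (2, '3|4|6'), (3, '1|2');
    INSERT INTO names VALUES
        (1, 'Jack'), (2, 'Joe'), (3, 'Fred'), (4, 'Mary'), (5, 'Bill');
""")

# One (id_group, id_name) pair per member instead of one delimited string.
rows = conn.execute('SELECT id_group, Name FROM "groups"').fetchall()
pairs = [(gid, member) for gid, members in rows
         for member in split_members(members)]
conn.executemany("INSERT INTO group_name VALUES (?, ?)", pairs)

# The lookup is now a plain join; ids without a matching name drop out.
query = """
    SELECT gn.id_group, n.Names
    FROM group_name AS gn
    JOIN names AS n ON n.id_names = gn.id_name
    ORDER BY gn.id_group, n.id_names
"""
members_by_group = {}
for gid, name in conn.execute(query):
    members_by_group.setdefault(gid, []).append(name)
result = {gid: " ".join(names) for gid, names in members_by_group.items()}
print(result)
```

Id 6 in group 2 has no row in the names table, so it is silently skipped, just as in the FIND_IN_SET query. In a real migration you would probably want to report such orphans instead.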
Shadow is correct. Your primary problem is the bad design of relations in the database. Typically one models this kind of business problem as a so-called M:N (many-to-many) relation. To accomplish that you need 3 tables:
The first table is groups, which has a GroupId field with a primary key on it and a readable name field (e.g. 'group1' or whatever).
The second table is people, which looks exactly like the one you showed above. (Do not forget to put a primary key on the PeopleId field here as well.)
The third table is a bridge table called GroupMemberships. It has 2 fields, GroupId and PeopleId. This table connects the first two and represents the M:N relation: one group can have 1 to N members, and a person can be a member of 1 to M groups.
Finally, just join together the tables in the select and aggregate:
SELECT
g.Name,
GROUP_CONCAT(p.Name ORDER BY p.PeopleId DESC SEPARATOR ';') AS Members
FROM
Groups AS g
INNER JOIN GroupMemberships AS gm ON g.GroupId = gm.GroupId
INNER JOIN people AS p ON gm.PeopleId = p.PeopleId
GROUP BY g.Name;
I'm attempting to take an existing application and re-architect the schema to support new customer requests and fix several outstanding issues (mostly around our current schema being heavily denormalized). In doing so, I've reached an interesting problem which at first glance seems to have a simple solution, but I can't seem to find the function I'm looking for.
The application is a media organization tool.
Our Old Schema:
Our old schema had separate models for "Groups", "Subgroups", and "Videos". A Group could have many Subgroups (one-to-many) and a Subgroup could have many Videos (one-to-many).
There were certain fields that were shared among Groups, Subgroups, and Videos. For instance, the Google Analytics ID to be used when the Video was embedded on a page. Whenever we displayed the embed page we would first look if the value was set on the Video. If not, we checked its Subgroup. If not, we checked its Group. The query looked roughly like so (I wish this were the real query, but unfortunately our application was written over many years by many junior developers, so the truth is much more painful):
SELECT
v.id,
COALESCE(v.google_analytics_id, sg.google_analytics_id, g.google_analytics_id) as google_analytics_id
FROM
Videos v
LEFT JOIN Subgroups sg ON sg.id = v.subgroup_id
LEFT JOIN Groups g ON g.id = sg.group_id
Pretty straightforward. Now the issue we've run into is that customers want to be able to nest groups arbitrarily deep, and our schema clearly only allows for 2 levels (and, in fact, requires two levels, even if you only want one).
New Schema (First Pass):
As a first pass, I knew we'd want a basic tree structure for the Groups, so I came up with this:
CREATE TABLE Groups (
id INT PRIMARY KEY,
name VARCHAR(255),
parent_id INT,
ga_id VARCHAR(20)
)
We can then easily nest up to N levels deep with N joins like so:
SELECT
v.id,
COALESCE(v.ga_id, g1.ga_id, g2.ga_id, g3.ga_id, ...) as ga_id
FROM
Videos v
LEFT JOIN Groups g1 ON g1.id = v.group_id
LEFT JOIN Groups g2 ON g2.id = g1.parent_id
LEFT JOIN Groups g3 ON g3.id = g2.parent_id
...
There are obvious flaws with this approach: we don't know how many parents there will be, so we don't know how many times to JOIN, which forces us to implement a "max depth". Even with a max depth, if a person only has a single level of groups we still perform multiple JOINs, because our queries can't know how deep they need to go. MySQL offers recursive queries, but while investigating whether that was the right option I found a smarter schema that produced the same results.
New Schema (Take 2):
Looking into better ways to handle a tree structure, I learned about Adjacency Lists (my prior solution), Nested Sets, Materialized Paths, and Closure Tables. Other than Adjacency Lists (which depend on JOINs to grab the entire tree structure and so produce a single row with multiple columns per node on the tree), the other three solutions all return multiple rows for each node on the tree.
I ended up going with a Closure Table solution like so:
CREATE TABLE Groups (
id INT PRIMARY KEY,
name VARCHAR(255),
ga_id VARCHAR(20)
);
CREATE TABLE Group_Closure (
ancestor_id INT,
descendant_id INT,
PRIMARY KEY (ancestor_id, descendant_id)
);
Now given a Video I can get all of its parents like so:
SELECT
v.id,
v.ga_id,
g.id,
g.ga_id
FROM
Videos v
JOIN Group_Closure gc ON v.group_id = gc.descendant_id
JOIN Groups g ON g.id = gc.ancestor_id;
This returns each group in the hierarchy as a separate row:
+------+---------+------+---------+
| v.id | v.ga_id | g.id | g.ga_id |
+------+---------+------+---------+
| 1 | abc123 | 2 | new_val |
| 1 | abc123 | 1 | default |
| 2 | NULL | 4 | xyz987 |
| 2 | NULL | 3 | NULL |
| 2 | NULL | 1 | default |
| 3 | NULL | 3 | NULL |
| 3 | NULL | 1 | default |
+------+---------+------+---------+
What I wish to do now is somehow achieve the same result I would have expected from using COALESCE on multiple self-joined Group tables: a single value for ga_id based on whichever node is "lowest" in the tree
Because I have multiple rows per Video, I suspect that this can be accomplished using GROUP BY and some kind of aggregate function:
SELECT
v.id,
COALESCE(v.ga_id, FIRST_NON_NULL(g.ga_id))
FROM
Videos v
JOIN Group_Closure gc ON v.group_id = gc.descendant_id
JOIN Groups g ON g.id = gc.ancestor_id
GROUP BY v.id, v.ga_id;
Note that because (ancestor, descendant) is my primary key, I believe the order of the group closure table can be guaranteed to always come back the same - meaning if I put the lowest node first, it will be the first row in the resulting query... If my understanding of this is incorrect, please let me know.
If you were to stick with an adjacency list, you could use a recursive CTE. This one traverses up from each video id value until it finds a non-NULL ga_id:
WITH RECURSIVE CTE AS (
SELECT id, ga_id, group_id
FROM videos
UNION ALL
SELECT CTE.id, COALESCE(CTE.ga_id, g.ga_id), g.parent_id
FROM `groups` g
JOIN CTE ON g.id = CTE.group_id AND CTE.ga_id IS NULL
)
SELECT id, ga_id
FROM CTE
WHERE ga_id IS NOT NULL
For my attempt to reconstruct your data from your question, this yields:
id ga_id
1 abc123
2 xyz987
3 default
Demo on dbfiddle
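If you keep the closure table instead, one option is to store the distance between ancestor and descendant and pick the nearest ancestor that has a value. The sketch below is only an illustration under assumptions: the depth column is not part of the schema in the question, the closure table is assumed to hold a self-row (depth 0) for every group, and it runs on Python's built-in sqlite3 rather than MySQL. The sample rows are reconstructed from the result table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Groups" (id INTEGER PRIMARY KEY, name TEXT, ga_id TEXT);
    CREATE TABLE Group_Closure (
        ancestor_id INTEGER,
        descendant_id INTEGER,
        depth INTEGER,  -- assumed extra column: 0 = the group itself
        PRIMARY KEY (ancestor_id, descendant_id)
    );
    CREATE TABLE Videos (id INTEGER PRIMARY KEY, group_id INTEGER, ga_id TEXT);

    -- Tree reconstructed from the question: 1 is the root, 2 and 3 are
    -- children of 1, and 4 is a child of 3.
    INSERT INTO "Groups" VALUES
        (1, 'g1', 'default'), (2, 'g2', 'new_val'),
        (3, 'g3', NULL), (4, 'g4', 'xyz987');
    INSERT INTO Group_Closure VALUES
        (1, 1, 0), (2, 2, 0), (1, 2, 1), (3, 3, 0),
        (1, 3, 1), (4, 4, 0), (3, 4, 1), (1, 4, 2);
    INSERT INTO Videos VALUES (1, 2, 'abc123'), (2, 4, NULL), (3, 3, NULL);
""")

# For each video: use its own ga_id, else the ga_id of the closest
# ancestor group (smallest depth) that has a non-NULL value.
query = """
    SELECT v.id,
           COALESCE(v.ga_id, (
               SELECT g.ga_id
               FROM Group_Closure AS gc
               JOIN "Groups" AS g ON g.id = gc.ancestor_id
               WHERE gc.descendant_id = v.group_id
                 AND g.ga_id IS NOT NULL
               ORDER BY gc.depth
               LIMIT 1
           )) AS ga_id
    FROM Videos AS v
    ORDER BY v.id
"""
resolved = conn.execute(query).fetchall()
print(resolved)
```

The explicit ORDER BY on depth matters here: without it, the order in which closure rows come back is not something a query can rely on, whatever the primary key looks like.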
Would someone mind advising me on this table setup, please?
It's my first time designing a database. This will be a part of it.
It's a report writing application. Multiple engineers can be assigned to attend any job/report, and multiple engineers can author the report as well as attend.
Is this the best way to do this? I need to be able to search attendees and authors separately in the application.
Thanks very much for the assistance.
You have, I believe, two tables containing entities. The entities are employee and report.
These entities have two different many-to-many relationships: author and attendee.
So your tables are these
employee            report
--------            ------
employee_id (PK)    report_id (PK)
surname             title
givenname           releasedate
whatever            whatever
Then you have two many:many relationship tables with the same columns as each other. One is author and the other is attendee.
author / attendee
------
employee_id PK, FK to employee.employee_id
report_id PK, FK to report.report_id
Notice the compound (two-column) primary keys.
+---------------------+\   /+-------------+\   /+-----------------------+
|                     +-----+   author    +-----+                       |
|                     |/   \+-------------+/   \|                       |
|      employee       |                         |        report         |
|                     |                         |                       |
|                     |\   /+-------------+\   /|                       |
|                     +-----+  attendee   +-----+                       |
+---------------------+/   \+-------------+/   \+-----------------------+
\   /
----- means a many-to-many relationship
/   \
When you determine an employee is an attendee for a certain report, you insert a row into the attendee table with the correct employee and report.
If you want, for example, all authors for each report you can do this sort of thing:
SELECT r.title, r.releasedate,
GROUP_CONCAT(e.surname ORDER BY e.surname SEPARATOR ',') AS surnames
FROM report r
LEFT JOIN author a ON r.report_id = a.report_id
LEFT JOIN employee e ON a.employee_id = e.employee_id
GROUP BY r.title, r.releasedate
ORDER BY r.releasedate DESC
The LEFT JOIN operations allow your query to find reports that have no authors. Ordinary inner JOIN operations would suppress those rows from your result set.
There is a limitation with this strict E:R design. For many kinds of reports, (scientific papers for example) the order of authors is critically important. (You want to start an academic food fight? List the authors of a paper in the wrong order.)
So your author table might also contain an ordinal value.
author
------
employee_id PK, FK to employee.employee_id
report_id PK, FK to report.report_id
ordinal INT
and your report query would contain this line.
GROUP_CONCAT(e.surname ORDER BY a.ordinal SEPARATOR ',') AS surnames
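To illustrate searching authors and attendees separately with this layout, here is a minimal sketch. It runs on Python's built-in sqlite3 rather than MySQL, the employees and reports are made-up sample data, and the helper names reports_for and authors_in_order are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE report (report_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (
        employee_id INTEGER REFERENCES employee(employee_id),
        report_id INTEGER REFERENCES report(report_id),
        ordinal INTEGER,
        PRIMARY KEY (employee_id, report_id)
    );
    CREATE TABLE attendee (
        employee_id INTEGER REFERENCES employee(employee_id),
        report_id INTEGER REFERENCES report(report_id),
        PRIMARY KEY (employee_id, report_id)
    );
    -- Hypothetical sample data.
    INSERT INTO employee VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Patel');
    INSERT INTO report VALUES (1, 'Boiler inspection'), (2, 'Pump repair');
    INSERT INTO author VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
    INSERT INTO attendee VALUES (1, 1), (3, 1), (2, 2), (3, 2);
""")

def reports_for(conn, relationship, surname):
    """Titles of reports an employee is linked to as 'author' or 'attendee'."""
    # Table names cannot be bound as parameters, so whitelist them.
    if relationship not in ("author", "attendee"):
        raise ValueError("relationship must be 'author' or 'attendee'")
    sql = f"""
        SELECT r.title
        FROM report AS r
        JOIN {relationship} AS x ON x.report_id = r.report_id
        JOIN employee AS e ON e.employee_id = x.employee_id
        WHERE e.surname = ?
        ORDER BY r.report_id
    """
    return [title for (title,) in conn.execute(sql, (surname,))]

def authors_in_order(conn, report_id):
    """Comma-separated author surnames, ordered by the ordinal column."""
    sql = """
        SELECT e.surname
        FROM author AS a
        JOIN employee AS e ON e.employee_id = a.employee_id
        WHERE a.report_id = ?
        ORDER BY a.ordinal
    """
    return ",".join(name for (name,) in conn.execute(sql, (report_id,)))

print(reports_for(conn, "author", "Patel"))
print(reports_for(conn, "attendee", "Patel"))
print(authors_in_order(conn, 1))
```

Because the two relationships live in separate tables, each search touches only the table it needs, and the same employee can be an author of one report and only an attendee of another.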
I have two tables:
Table A:
id | name
Table B:
id | hash | owners_id
owners_id contains the ids from table A.
Example:
Table A:
id | name
1 | James
2 | Jonas
Table B:
id | hash | owners_id
1 | j28sj | 1,2
Expected result:
James | j28sj
Jonas | j28sj
because owners_id contains both ids.
I'm trying to write a query that selects all the names from table A associated with table B's owners_id column.
SELECT
A.name,
B.hash
FROM
A
left JOIN B ON
B.owners_id LIKE CONCAT('%', A.id, '%')
Note: This schema is poorly designed, and the query will return false matches if owners_id contains values like 1,11,111, because LIKE '%1%' also matches 11 and 111. Either create a separate table for the many-to-many relation, or pad the ids with leading zeros, like 001,011,111.
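The false match is easy to reproduce. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL; the third person (id 11) and the second hash are made-up rows added to trigger the problem. It also shows a common workaround that wraps both sides in the delimiter (in MySQL you would build the strings with CONCAT, since || is not string concatenation there by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, hash TEXT, owners_id TEXT);
    INSERT INTO A VALUES (1, 'James'), (2, 'Jonas');
    INSERT INTO B VALUES (1, 'j28sj', '1,2');
    -- Hypothetical extra rows: an owner whose id contains the digit 1.
    INSERT INTO A VALUES (11, 'Jane');
    INSERT INTO B VALUES (2, 'k93xd', '11');
""")

# Naive substring match: id 1 also "matches" the list '11'.
naive = conn.execute("""
    SELECT A.name, B.hash
    FROM A JOIN B ON B.owners_id LIKE '%' || A.id || '%'
    ORDER BY A.id, B.id
""").fetchall()

# Delimiter-wrapped match: compare ',1,' against ',11,' and so on,
# so only whole list entries can match.
wrapped = conn.execute("""
    SELECT A.name, B.hash
    FROM A JOIN B
      ON (',' || B.owners_id || ',') LIKE ('%,' || A.id || ',%')
    ORDER BY A.id, B.id
""").fetchall()

print(naive)
print(wrapped)
```

The wrapped version returns the right rows but still cannot use an index on owners_id, so it remains a stopgap rather than a substitute for a proper many-to-many table.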
There are a couple ways to do this. If you want to keep owners_id as a comma-separated string, it's a bit messy. You need to first parse the string into a list of integers to form the join condition:
SELECT A.name, B.hash FROM A
LEFT JOIN B
ON find_in_set(A.id,B.owners_id) <> 0;
You may want to consider letting owners_id be an integer foreign key to Table A, if you can change your schema.
Here's a working SQL fiddle:
http://sqlfiddle.com/#!9/320477/4
I have got a table names in MySQL with following columns ID, type, row, value
The composite primary key is ID, type, row
The purpose of this table is to save all names and professions of a specified person in multiple rows - one data per row.
For example: Commonly in Spain people have two first names and two last names, like José Anastacio Rojas Laguna.
In Germany, many people have one first name but two last names. Some also hold more than one professional title, for example someone who teaches at a university and works as a doctor in a hospital at the same time. In that case, the titles Prof. Dr. are placed in front of the name. For example: Prof. Dr. José Anastacio Rojas Laguna
In this case, I would store all these information in the table like this:
ID | type | row | value
1 | 0 | 1 | Prof.
1 | 0 | 2 | Dr.
1 | 1 | 1 | José
1 | 1 | 2 | Anastacio
1 | 2 | 1 | Rojas
1 | 2 | 2 | Laguna
Each person has exactly one unique ID. type defines the kind of name: 0 means profession, 1 means first name, and 2 means last name. row defines the position within that kind: 1 means 1st first name, 2 means 2nd first name, 3 means 3rd first name, and so on. The same applies to professions and last names.
Now I would like to find out how I can SELECT the ID of a specific person by passing only some of that person's names. How can I determine the ID from just a few values that all belong to the same ID?
This will return users that have the name José Laguna with the same ID:
select t1.id, t1.value, t2.value
from yourTable t1
join (select * from yourTable
where value = 'Laguna') t2
on t1.id = t2.id
where t1.value = 'José'
I use José here. You could use a variable such as @searchText instead.
SELECT *
FROM YourTable
WHERE ID IN (SELECT DISTINCT ID
FROM YourTable
WHERE value = 'José')
Or maybe use an IN if multiple parameters
WHERE value IN ('José', 'Laguna')
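Note that IN on its own matches people who have any one of the listed values. If every given name must be present, one common pattern is to group by ID and compare the number of distinct matches with the number of names searched for. The sketch below shows this on Python's built-in sqlite3 rather than MySQL; the second person (ID 2) is made-up data, and ids_matching_all is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (
        ID INTEGER, type INTEGER, "row" INTEGER, value TEXT,
        PRIMARY KEY (ID, type, "row")
    );
    INSERT INTO names VALUES
        (1, 0, 1, 'Prof.'), (1, 0, 2, 'Dr.'),
        (1, 1, 1, 'José'), (1, 1, 2, 'Anastacio'),
        (1, 2, 1, 'Rojas'), (1, 2, 2, 'Laguna');
    -- Hypothetical second person who shares one first name.
    INSERT INTO names VALUES (2, 1, 1, 'José'), (2, 2, 1, 'García');
""")

def ids_matching_all(conn, wanted):
    """IDs of people who have every value in `wanted` among their names."""
    wanted = sorted(set(wanted))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT ID
        FROM names
        WHERE value IN ({placeholders})
        GROUP BY ID
        HAVING COUNT(DISTINCT value) = ?
        ORDER BY ID
    """
    return [pid for (pid,) in conn.execute(sql, (*wanted, len(wanted)))]

# Plain IN: anyone with at least one of the values.
any_match = [pid for (pid,) in conn.execute(
    "SELECT DISTINCT ID FROM names WHERE value IN (?, ?) ORDER BY ID",
    ("José", "Laguna"))]

print(any_match)
print(ids_matching_all(conn, ["José", "Laguna"]))
```

This treats all name types alike. To require, say, José as a first name specifically, the type column would have to be part of the condition as well.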
So here's something using GROUP_CONCAT. Tested with your sample data and works.
It groups together all of the person's titles into a single column, their given name into another single column, and all their family names into a third column. It wraps each of those columns with commas to ensure finding a particular name is accurate.
The snippet below will find anyone who:
Has at least one given name of "José" and
Has at least one family name of "Rojas"
All you have to do to find a different user is change the WHERE clause.
SELECT n.ID,n.type,n.row,n.value
FROM names n
INNER JOIN (
SELECT ID
FROM (
SELECT ID
,CONCAT(',',GROUP_CONCAT((CASE WHEN type=0 THEN value ELSE NULL END) ORDER BY value ASC),',') AS titles
,CONCAT(',',GROUP_CONCAT((CASE WHEN type=1 THEN value ELSE NULL END) ORDER BY value ASC),',') AS givenNames
,CONCAT(',',GROUP_CONCAT((CASE WHEN type=2 THEN value ELSE NULL END) ORDER BY value ASC),',') AS familyNames
FROM `names`
GROUP BY ID
) grouped
WHERE grouped.givenNames LIKE '%,José,%' AND grouped.familyNames LIKE '%,Rojas,%'
) people ON n.ID = people.ID
Before edit, this may have not worked as intended. The extra commas ensure the name searched for is not found as a substring
Sorry if my question seems unclear, I'll try to explain.
I have a column in a row, for example /1/3/5/8/42/239/, let's say I would like to find a similar one where there is as many corresponding "ids" as possible.
Example:
| My Column |
#1 | /1/3/7/2/4/ |
#2 | /1/5/7/2/4/ |
#3 | /1/3/6/8/4/ |
Now, by running the query on #1 I would like to get row #2 as it's the most similar. Is there any way to do it or it's just my fantasy? Thanks for your time.
EDIT:
As suggested, I'm expanding my question. This column represents the favourite artists of a user on a music site. I search them like this: MyColumn LIKE '%/ID/%', and I remove an artist by replacing /ID/ with /.
Since you did not provide much info about your data, I have to fill the gaps with my guesses.
So you have a users table
users table
-----------
id
name
other_stuff
And you like to store which artists are favorites of a user. So you must have an artists table
artists table
-------------
id
name
other_stuff
And to relate you can add another table called favorites
favorites table
---------------
user_id
artist_id
In that table you add a record for every artist that a user likes.
Example data
users
id | name
1 | tom
2 | john
artists
id | name
1 | michael jackson
2 | madonna
3 | deep purple
favorites
user_id | artist_id
1 | 1
1 | 3
2 | 2
To select the favorites of user tom for instance you can do
select a.name
from artists a
join favorites f on f.artist_id = a.id
join users u on f.user_id = u.id
where u.name = 'tom'
And if you add proper indexing to your table then this is really fast!
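With a favorites table like this, the original "most similar row" question becomes a self-join that counts shared artists. The following is a rough sketch on Python's built-in sqlite3 rather than MySQL; it loads the three sample rows from the question (treating each row number as a user id, which is an assumption) and ranks the other users by overlap with user 1.

```python
import sqlite3

# The three example rows from the question, keyed by an assumed user id.
paths = {1: "/1/3/7/2/4/", 2: "/1/5/7/2/4/", 3: "/1/3/6/8/4/"}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE favorites (
        user_id INTEGER NOT NULL,
        artist_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, artist_id)
    )
""")

# Split each '/1/3/7/' style string into one row per favourite artist.
conn.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [(user_id, int(part))
     for user_id, path in paths.items()
     for part in path.split("/") if part],
)

def most_similar(conn, user_id):
    """Other users ranked by how many favourite artists they share."""
    sql = """
        SELECT other.user_id, COUNT(*) AS shared
        FROM favorites AS mine
        JOIN favorites AS other
          ON other.artist_id = mine.artist_id
         AND other.user_id <> mine.user_id
        WHERE mine.user_id = ?
        GROUP BY other.user_id
        ORDER BY shared DESC, other.user_id
    """
    return conn.execute(sql, (user_id,)).fetchall()

ranking = most_similar(conn, 1)
print(ranking)
```

Row #2 comes out on top because it shares four artists with row #1, while row #3 shares three, which matches the expectation in the question.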
Problem is you're storing this in a really, really awkward way.
I'm guessing you have to deal with an arbitrary number of values. You have two options:
Store the multiple IDs in a blob in JSON format. MySQL had no built-in JSON functions before version 5.7, but there are user-defined functions that will extract values for you, etc.
See: http://blog.ulf-wendel.de/2013/mysql-5-7-sql-functions-for-json-udf/
Alternatively, switch to PostgreSQL.
Add as many columns to your table as the maximum number of IDs you expect to have. So if /1/3/7/2/4/8/ is the longest entry, have 6 columns in your table. The reason this is bad: you'll have sparse columns that will unnecessarily slow your tables.
I'm sure you could write some horrific regex to accomplish the task, but I caution against using complex regexes on enormous tables.