SQL query that searches comma-delimited field - mysql

I have a student table which looks something like this:
id | name | school_descriptors
-------------------------------------------------------
1 | Rob | Comp Sci,Undergraduate,2020
2 | Tim | Business,MBA,2022
3 | Matt | Business,MBA,2022
4 | Jack | Law,Masters,2024
5 | Steph | Comp Sci,Masters,2022
The school_descriptors field is a single column that stores the Course, Qualification and Graduation year as a comma-delimited string. (It's terribly designed and I wish it could be split into separate fields, but it can't right now, as I am not the database owner.)
I want to provide an interface where teachers can quickly find students that match certain Course, Qualifications and Graduation years, and thus would like to create relevant queries.
Question 1: For example, I would like a teacher to be able to select from the UI: "Business", "MBA" and get returned students with ID 2 and 3. Specifically, an example question I have is: Find students who are in the Business Course and doing the MBA qualification:
SELECT * FROM student_table WHERE school_descriptors LIKE '%Business%' AND school_descriptors LIKE '%MBA%'
The query I have in mind is a basic LIKE query, but I can't help but think there is a more efficient query that can take advantage of the fact that the school_descriptor string is 1) always in a specific order (e.g. course, qualification, graduation), and 2) comma-delimited, and thus could be perhaps split. The table currently sits at ~5000 rows so relatively small but is expected to grow.
Related question 2: Find students who are in the Comp Sci Course and graduating after 2019:
Would it be possible to split the school_descriptors field and add a >2019 operand?
Many thanks!

In MySQL you can use the function SUBSTRING_INDEX() to split the column school_descriptors.
This works only if the positions of Course, Qualification and Graduation year are fixed.
select *,
substring_index(school_descriptors, ',', 1) Course,
substring_index(substring_index(school_descriptors, ',', 2), ',', -1) Qualification,
substring_index(school_descriptors, ',', -1) Graduation
from student_table
See the demo.
Results:
> id | name | school_descriptors | Course | Qualification | Graduation
> -: | :---- | :-------------------------- | :------- | :------------ | :---------
> 1 | Rob | Comp Sci,Undergraduate,2020 | Comp Sci | Undergraduate | 2020
> 2 | Tim | Business,MBA,2022 | Business | MBA | 2022
> 3 | Matt | Business,MBA,2022 | Business | MBA | 2022
> 4 | Jack | Law,Masters,2024 | Law | Masters | 2024
> 5 | Steph | Comp Sci,Masters,2022 | Comp Sci | Masters | 2022

select id, name,
substring_index(school_descriptors,',',1) as course,
substring_index(substring(school_descriptors,length(substring_index(school_descriptors,',',1))+2,200),',',1) as Qualifications,
substring_index(school_descriptors,',',-1) as year
from student;
output:
+------+-------+----------+----------------+------+
| id | name | course | Qualifications | year |
+------+-------+----------+----------------+------+
| 1 | Rob | Comp Sci | Undergraduate | 2020 |
| 2 | Tim | Business | MBA | 2022 |
| 3 | Matt | Business | MBA | 2022 |
| 4 | Jack | Law | Masters | 2024 |
| 5 | Steph | Comp Sci | Masters | 2022 |
+------+-------+----------+----------------+------+
A link to the docs, in case you want to know about SUBSTRING_INDEX()

Answer 1:
SELECT * FROM student_table WHERE school_descriptors REGEXP 'Business|MBA'
This query returns every record that contains Business OR MBA.
If you want only the students with both Business and MBA, you can try this:
SELECT * FROM student_table WHERE school_descriptors LIKE '%Business,MBA%'
Answer 2:
SELECT *
FROM student_table
WHERE
SUBSTRING_INDEX(school_descriptors, ',', 1) = 'Comp Sci'
AND
SUBSTRING_INDEX(school_descriptors, ',', -1) > 2019;
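
For anyone who wants to try the split-and-filter idea without a MySQL server, here is a minimal sketch in Python using an in-memory SQLite database. SQLite has no SUBSTRING_INDEX(), so the sketch registers a small Python function under that name; that helper, build_demo_db() and the two finder functions are illustrative names, not part of MySQL or of the answers above. It also assumes the year is always the last item and casts it to a number before comparing (in MySQL the equivalent cast would be CAST(... AS UNSIGNED)).

```python
import sqlite3


def substring_index(text, delim, count):
    """Rough emulation of MySQL's SUBSTRING_INDEX(str, delim, count)."""
    if text is None or delim is None or count is None:
        return None
    if count == 0 or delim == "":
        return ""
    parts = text.split(delim)
    # Positive count: keep everything left of the count-th delimiter.
    # Negative count: keep everything right of the count-th delimiter from the end.
    kept = parts[:count] if count > 0 else parts[count:]
    return delim.join(kept)


def build_demo_db():
    """Create the sample table from the question in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python helper to SQL (hypothetical stand-in for the MySQL built-in).
    conn.create_function("substring_index", 3, substring_index)
    conn.execute(
        "CREATE TABLE student_table (id INTEGER, name TEXT, school_descriptors TEXT)"
    )
    conn.executemany(
        "INSERT INTO student_table VALUES (?, ?, ?)",
        [
            (1, "Rob", "Comp Sci,Undergraduate,2020"),
            (2, "Tim", "Business,MBA,2022"),
            (3, "Matt", "Business,MBA,2022"),
            (4, "Jack", "Law,Masters,2024"),
            (5, "Steph", "Comp Sci,Masters,2022"),
        ],
    )
    return conn


def find_by_course_and_qualification(conn, course, qualification):
    """Question 1: match the first and second items exactly."""
    sql = """
        SELECT id FROM student_table
        WHERE substring_index(school_descriptors, ',', 1) = ?
          AND substring_index(substring_index(school_descriptors, ',', 2), ',', -1) = ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (course, qualification))]


def find_by_course_graduating_after(conn, course, year):
    """Question 2: match the course and compare the year numerically."""
    sql = """
        SELECT id FROM student_table
        WHERE substring_index(school_descriptors, ',', 1) = ?
          AND CAST(substring_index(school_descriptors, ',', -1) AS INTEGER) > ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (course, year))]
```

Matching each item exactly, rather than with LIKE '%...%', avoids false hits when one descriptor happens to be a substring of another.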

Related

How to output a list by the results of two columns

I have a database where I store each product submitted by the curators, along with whether it was approved. I need to generate a list showing each curator's score, ordered so that the curators with the most submitted (subm) and approved (appr) products come first. For that I need the approval rate ar, which is appr/subm, and then a second calculation for the curator score cs, which is appr*(ar*ar).
The final output should be as the following:
| Curator | subm | appr| ar | cs |
------------------------------------------------
| 1 | 21 | 20 | 95.24% | 18.14058957 |
| 4 | 13 | 12 | 92.31% | 10.22485207 |
| 2 | 10 | 7 | 70.00% | 3.43 |
| 3 | 2 | 2 |100.00% | 2 |
To get the values from the table I use
SELECT curator, SUM(prop) subm, SUM(date) appr
FROM control
GROUP BY curator
ORDER BY cs
But I need to add somewhere:
SUM(appr/subm) ar, SUM(appr*(ar*ar)) cs
But I don’t know how to do this.
It's probably simplest to use your existing query as a subquery:
SELECT *, appr/subm AS ar, appr*(appr/subm)*(appr/subm) AS cs
FROM (SELECT curator, SUM(prop) subm, SUM(date) appr
FROM control
GROUP BY curator) c
ORDER BY cs DESC
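
A runnable sketch of the subquery approach, using Python and an in-memory SQLite database. It assumes, as a guess about the asker's schema, that prop is 1 for every submitted product and date is 1 when the product was approved; curator_scores() is a made-up helper name. SQLite divides integers as integers, so the sketch multiplies by 1.0 first; MySQL's / operator already returns a decimal.

```python
import sqlite3


def curator_scores(submissions):
    """submissions maps curator id -> (submitted, approved) counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE control (curator INTEGER, prop INTEGER, "date" INTEGER)')
    for curator, (submitted, approved) in submissions.items():
        # Assumption: one row per submission, prop = 1, "date" = 1 if approved.
        rows = [(curator, 1, 1 if i < approved else 0) for i in range(submitted)]
        conn.executemany("INSERT INTO control VALUES (?, ?, ?)", rows)
    sql = """
        SELECT curator, subm, appr,
               1.0 * appr / subm AS ar,
               appr * (1.0 * appr / subm) * (1.0 * appr / subm) AS cs
        FROM (SELECT curator, SUM(prop) AS subm, SUM("date") AS appr
              FROM control
              GROUP BY curator) c
        ORDER BY cs DESC
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    # Counts taken from the expected output in the question.
    for row in curator_scores({1: (21, 20), 2: (10, 7), 3: (2, 2), 4: (13, 12)}):
        print(row)
```

The inner query does the per-curator sums once, and the outer query can then refer to subm and appr by name, which is not possible in the same SELECT list that defines them.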

SQL: how can I use GROUP BY to take an aggregate of an aggregate?

I have a query that groups by (column_a, column_b) and selects an aggregated value. I would like to then group by column_a and take an aggregate sum of the previously aggregated values.
Probably clearer with an example:
We have 3 tables: projects, devs, and contributors. Each project has many contributors, and each dev is a contributor to many projects:
+======== projects =========+ +====== devs =======+
+--------------+------------+ +--------+----------+
| project_name | project_id | | dev_id | dev_name |
+--------------+------------+ +--------+----------+
| parsalot | 1 | | 1 | Ally |
| vimplug | 2 | | 2 | Ben |
| gamify | 3 | | 3 | Chris |
+--------------+------------+ +--------+----------+
+==== contributors ===+
+------------+--------+
| project_id | dev_id |
+------------+--------+
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 3 |
+------------+--------+
I'm interested in how much work goes into each project. I could just count how many contributors each has, but I'd like to give more weight to contributions made by devs who aren't splitting their time over lots of other projects.
So vimplug is more actively developed than parsalot: each project has two contributors, but one of vimplug's (Ally) does nothing else, whereas parsalot's contributors are both splitting their time across other projects.
I've constructed a query that groups by (project, contributor) and calculates each contributor's "dedication" to the project:
SELECT
projects.project_name,
devs.dev_name,
1 / COUNT(contributions.project_id) as dedication
FROM
projects
JOIN
contributors USING (project_id)
JOIN
devs USING (dev_id)
JOIN
contributors contributions USING (dev_id)
GROUP BY projects.project_id , contributors.dev_id;
Which yields,
+--------------+----------+------------+
| project_name | dev_name | dedication |
+--------------+----------+------------+
| parsalot | Ben | 0.5000 |
| parsalot | Chris | 0.5000 |
| vimplug | Ally | 1.0000 |
| vimplug | Ben | 0.5000 |
| gamify | Chris | 0.5000 |
+--------------+----------+------------+
What I really want, though, is the total dedication for each project, i.e.
+--------------+------------------+
| project_name | total_dedication |
+--------------+------------------+
| gamify | 0.5000 |
| parsalot | 1.0000 |
| vimplug | 1.5000 |
+--------------+------------------+
I (naively) tried changing my select statement to
SELECT
projects.project_name,
SUM(1 / COUNT(contributions.project_id)) as total_dedication
but that doesn't work ("Invalid use of group function"). Is there a way I can do this without having to do a sub-select?
Just use a subquery:
select project_name, sum(dedication)
from (<your query here>) q
group by project_name;
You are close to the solution. Wrap your query in a subquery like this:
SELECT project_name,sum(dedication) as total_dedication FROM (SELECT
projects.project_name,
devs.dev_name,
1 / COUNT(contributions.project_id) as dedication
FROM
projects
JOIN
contributors USING (project_id)
JOIN
devs USING (dev_id)
JOIN
contributors contributions USING (dev_id)
GROUP BY projects.project_id , contributors.dev_id) as A GROUP BY project_name
Ivan,
You asked "Is there a way I can do this without having to do a sub-select" ... is there a reason you cannot sub-select?
Unfortunately, you'll need to use a sub-select, because you cannot combine aggregate functions (which would be the only way you'd be able to accomplish this). See: How to combine aggregate functions in MySQL?
So as the other answers have shown, you'll have to use a sub-query.
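
To check the sub-select approach against the sample data from the question, here is a small sketch in Python with an in-memory SQLite database. The function name total_dedication() is made up for the example, the devs table is left out because the names are not needed for the totals, and 1.0 is used instead of 1 because SQLite would otherwise do integer division.

```python
import sqlite3


def total_dedication():
    """Sum the per-(project, dev) dedication values for each project."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (project_name TEXT, project_id INTEGER)")
    conn.execute("CREATE TABLE contributors (project_id INTEGER, dev_id INTEGER)")
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?)",
        [("parsalot", 1), ("vimplug", 2), ("gamify", 3)],
    )
    conn.executemany(
        "INSERT INTO contributors VALUES (?, ?)",
        [(1, 2), (1, 3), (2, 1), (2, 2), (3, 3)],
    )
    sql = """
        SELECT project_name, SUM(dedication) AS total_dedication
        FROM (
            -- Inner aggregate: one row per (project, dev) with that dev's dedication.
            SELECT p.project_name,
                   c.dev_id,
                   1.0 / COUNT(allc.project_id) AS dedication
            FROM projects p
            JOIN contributors c ON c.project_id = p.project_id
            JOIN contributors allc ON allc.dev_id = c.dev_id
            GROUP BY p.project_id, c.dev_id
        ) q
        -- Outer aggregate: add up the inner results per project.
        GROUP BY project_name
        ORDER BY project_name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for name, total in total_dedication():
        print(name, total)
```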

Select query in multi primary key table

I'm making a database with fitness exercises and their equipment needed.
My database is designed like this
+-------+-----------------+
| id(pk)| equip(pk) |
+-------+-----------------+
| 1 | Barbell |
| 1 | Bench |
| 2 | Dumbbell |
| 2 | Bench |
| 3 | Barbell |
| 4 | Dumbbell |
| ... | ..(many rows).. |
+-------+-----------------+
The id stands for a certain exercise, and equip is the equipment needed for that exercise.
So for exercise 1 (id = 1) you need a Barbell and a Bench.
But for exercise 3 (id = 3) you only need a Barbell.
So if the user wants exercises that can be done with a Barbell and a Bench, ids 1 and 3 should be selected.
Current Query
SELECT * FROM( SELECT id, GROUP_CONCAT(equip SEPARATOR ', ') equip
FROM equip group by id ) as x
This gives the following result
+-------+-----------------+
| id(pk)| equip(pk) |
+-------+-----------------+
| 1 | Barbell, Bench |
| 2 | Dumbbell, Bench |
| 3 | Barbell |
| 4 | Dumbbell |
| ... | ..(many rows).. |
+-------+-----------------+
So if I want to search for Barbell and Bench, ids 1 and 3 should be selected.
Thank you very much :)
I think rather than explaining your expected outcome, just explain the business rules you're trying to implement and give us some insight into the environment you're working in.
Also, what do you mean by "barbell and bench are true"? Varchar fields cannot be true or false.
For instance, your last line talks about weights and support, which are not included in your data set and would probably help in answering the question. Because I don't have rep to comment, I had to create an answer, so here is my best shot without more information:
select * from (
SELECT id, GROUP_CONCAT(equip SEPARATOR ', ') concatenatedEquip
FROM equip GROUP BY id ) x
where (concatenatedEquip LIKE '%Barbell%' or concatenatedEquip
LIKE '%Bench%')
so this query would
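
If the rule is that an exercise qualifies only when every piece of equipment it needs is in the user's list (which is what returns ids 1 and 3 for Barbell and Bench), one possible sketch skips GROUP_CONCAT and counts the rows that fall outside the list. The code below is Python with an in-memory SQLite database; exercises_doable_with() is a hypothetical helper, and the table holds only the six sample rows shown in the question.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, "Barbell"), (1, "Bench"),
    (2, "Dumbbell"), (2, "Bench"),
    (3, "Barbell"),
    (4, "Dumbbell"),
]


def exercises_doable_with(available):
    """Return ids of exercises whose required equipment is all in `available`."""
    available = list(available)
    if not available:
        return []
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE equip (id INTEGER, equip TEXT, PRIMARY KEY (id, equip))")
    conn.executemany("INSERT INTO equip VALUES (?, ?)", SAMPLE_ROWS)
    # One "?" placeholder per available item keeps the values parameterized.
    placeholders = ", ".join("?" for _ in available)
    sql = f"""
        SELECT id
        FROM equip
        GROUP BY id
        -- Keep the exercise only if none of its rows is outside the list.
        HAVING SUM(CASE WHEN equip NOT IN ({placeholders}) THEN 1 ELSE 0 END) = 0
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, available)]
```

The same GROUP BY ... HAVING statement should carry over to MySQL unchanged, since it uses only standard SQL.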

How to condense a column like this?

I've tried finding something like this, but to no avail...
This is about a system of tables for a customer management system. In particular, I need to create a note history for each customer.
So, I have a table 'customers' with the columns customers.customer_ID, customers.lastname, customers.firstname, customers.postal_code, customers.city and customers.street;
and another table 'notes' with the columns notes.note_ID, notes.customer_ID, notes.subject, notes.description and notes.entered_on
Now I need to create a third table, search, which condenses much of the information above. It has the columns search.contact_ID, search.name, search.address and search.history. This is supposed to look like this:
contacts:
contact_ID | lastname | firstname | ...
------------+-----------+-----------+-----
1 | Doe | John | ...
2 | Dane | Jane | ...
note:
note_ID | contact_ID | subject | description | entered_on
--------+---------------+-----------------------+-----------------------+----------------
1 | 1 | call received | John Doe called us to | 2014-05-03
| | | ask for an offer |
2 | 1 | offer made | We called John Doe to | 2014-06-03
| | | submit our offer |
3 | 2 | advertisement call | We called Jane Dane to| 2014-06-03
| | | inform her of our |
| | | latest offer |
4 | 1 | offer accepted | John Doe called to | 2014-08-03
| | | accept our offer |
search:
contact_ID | name | address | history
------------+---------------+---------------------------------+-------------------
1 | Doe, John | 55 Main Street, 12345 Oldtown | 'On 2014-08-03 offer accepted: John Doe accepted our offer.
| | | On 2014-06-03 offer made: We called John Doe to submit our offer.
| | | On 2014-05-03 call received: John Doe called us to ask for an offer.'
2 | Dane, Jane | 111 Wall Street, 67890 Newtown | 'On 2014-06-03 advertisement call: We called Jane Dane to submit our offer.'
While I can deal with much of the rest, I have no idea how to generate the history information. My idea was as follows
WHILE
customers.customer_ID = note.customer_ID
AND
note.entered_on = GREATEST(note.entered_on)
DO
SET customers.note_history = CONCAT_WS(' | ', CONCAT_WS(': ',note.subject,note.description), customers.note_history);
But that one isn't necessarily chronological. Also how do I transform that into a statement compatible with the SELECT INTO used for the creation of the rest of the table?
Sounds like a case for a Group-By, along with GROUP_CONCAT
CREATE TABLE search (PRIMARY KEY(contact_ID))
SELECT contact_ID, CONCAT(lastname,', ',firstname) AS name, address,
GROUP_CONCAT(CONCAT('On ',entered_on,' ',subject,': ',description)
ORDER BY note_ID SEPARATOR "\n") AS history
FROM contacts LEFT JOIN note USING (contact_ID)
GROUP BY contact_ID
If you don't want to use CREATE TABLE ... SELECT ..., you can first create (or truncate!) the table and then use INSERT INTO ... SELECT ... instead.
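
The expected output in the question lists the newest note first. As a way to check that ordering outside MySQL, here is a sketch that swaps GROUP_CONCAT for plain client-side grouping: Python reads the notes from an in-memory SQLite table sorted by entered_on descending and joins them per contact. build_histories() is a made-up name, and the sketch assumes entered_on is stored as an ISO date string so that text order equals date order.

```python
import sqlite3


def build_histories(notes):
    """notes: iterable of (note_ID, contact_ID, subject, description, entered_on)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE note (note_ID INTEGER PRIMARY KEY, contact_ID INTEGER, "
        "subject TEXT, description TEXT, entered_on TEXT)"
    )
    conn.executemany("INSERT INTO note VALUES (?, ?, ?, ?, ?)", notes)
    rows = conn.execute(
        "SELECT contact_ID, entered_on, subject, description "
        "FROM note "
        "ORDER BY contact_ID, entered_on DESC, note_ID DESC"
    )
    lines = {}
    for contact_id, entered_on, subject, description in rows:
        # Same line format as the CONCAT(...) expression in the answer above.
        lines.setdefault(contact_id, []).append(
            f"On {entered_on} {subject}: {description}"
        )
    # One newline-separated history string per contact, newest note first.
    return {cid: "\n".join(entries) for cid, entries in lines.items()}
```

In MySQL itself the same ordering would come from changing the ORDER BY inside GROUP_CONCAT to entered_on DESC.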

MySQL Relational Division

I am having difficulty solving one exercise:
For which people is there a restaurant that serves ALL of their favorite beers?
(Yes, we actually have this in school :D)
I have got 2 Tables that can be used:
Table1: Favoritebeer (Name, Surname, beername)
Table2: OnStock (beername, restaurant, quantity)
My solution would be: OnStock % Favoritebeer
There is no such thing as DIVISION in MySQL. Any ideas how I could solve that? I found the following on Wikipedia: http://en.wikipedia.org/wiki/Relational_algebra#Division_.28.C3.B7.29 which is exactly what I need, but I am having difficulty translating it into SQL.
EDIT:
Here sample data: http://www.sqlfiddle.com/#!2/34e00
The result should be:
Bucher Rolf
Mastroyanni Pepe
Meier Hans
Meier Hanspeter
Meier Hansruedi
Müller Heinrich
Peters Peter
Zarro Darween
Give this a try:
SELECT DISTINCT fb1.name, fb1.surname FROM favoriteBeer fb1
JOIN stock s ON fb1.beerName = s.beerName
GROUP BY fb1.name, fb1.surname, s.restaurant
HAVING COUNT(*) = (
SELECT COUNT(*) FROM favoriteBeer fb2
WHERE fb1.name = fb2.name AND fb1.surname = fb2.surname
)
Output:
| NAME | SURNAME |
|-------------|-----------|
| Bucher | Rolf |
| Mastroyanni | Pepe |
| Meier | Hans |
| Meier | Hanspeter |
| Meier | Hansruedi |
| Müller | Heinrich |
| Peters | Peter |
| Zarro | Darween |
Fiddle here.
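
For experimenting with the counting approach offline, here is a sketch that runs the same query shape in Python against an in-memory SQLite database. The sample rows are invented for illustration and are not the data from the fiddle; people_fully_served() is a hypothetical helper name. Like the query above, it assumes neither table contains duplicate rows, since duplicates would inflate COUNT(*).

```python
import sqlite3

# Made-up sample data: (name, surname, beerName) and (beerName, restaurant, quantity).
SAMPLE_FAVORITES = [
    ("Meier", "Hans", "Pils"),
    ("Meier", "Hans", "Lager"),
    ("Bucher", "Rolf", "Stout"),
    ("Keller", "Anna", "Pils"),
    ("Keller", "Anna", "Porter"),
]
SAMPLE_STOCK = [
    ("Pils", "Adler", 10),
    ("Lager", "Adler", 5),
    ("Pils", "Krone", 3),
    ("Stout", "Krone", 4),
    ("Porter", "Sonne", 2),
]


def people_fully_served(favorites, stock):
    """Return people for whom some restaurant stocks all of their favorite beers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE favoriteBeer (name TEXT, surname TEXT, beerName TEXT)")
    conn.execute("CREATE TABLE stock (beerName TEXT, restaurant TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO favoriteBeer VALUES (?, ?, ?)", favorites)
    conn.executemany("INSERT INTO stock VALUES (?, ?, ?)", stock)
    sql = """
        SELECT DISTINCT fb1.name, fb1.surname
        FROM favoriteBeer fb1
        JOIN stock s ON fb1.beerName = s.beerName
        -- One group per (person, restaurant): COUNT(*) is how many of the
        -- person's favorites that restaurant stocks.
        GROUP BY fb1.name, fb1.surname, s.restaurant
        HAVING COUNT(*) = (
            SELECT COUNT(*) FROM favoriteBeer fb2
            WHERE fb2.name = fb1.name AND fb2.surname = fb1.surname
        )
        ORDER BY fb1.name, fb1.surname
    """
    return conn.execute(sql).fetchall()
```

In the sample, Anna Keller is left out because her two favorites are stocked by different restaurants, so no single (person, restaurant) group reaches her total of two.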