Table Architecture Difficulty with Query - mysql

I'm working on a practice problem with DDL as follows:
CREATE TABLE people (
id SMALLINT NOT NULL AUTO_INCREMENT,
first_name VARCHAR(50),
last_name VARCHAR(50),
PRIMARY KEY (id)
)
;
CREATE TABLE cd (
id SMALLINT NOT NULL AUTO_INCREMENT,
artist VARCHAR(50),
title VARCHAR(50),
PRIMARY KEY(id),
owner SMALLINT,
FOREIGN KEY (owner) REFERENCES people(id)
)
;
CREATE TABLE lend (
id SMALLINT NOT NULL AUTO_INCREMENT,
cd_id SMALLINT,
lend_to SMALLINT,
FOREIGN KEY (lend_to) REFERENCES people(id),
FOREIGN KEY (cd_id) REFERENCES cd(id),
lend_date DATE DEFAULT '0000-00-00',
PRIMARY KEY(id)
)
;
INSERT INTO people (id, first_name, last_name) VALUES
(1, 'Brett', 'CEO'),
(2, 'Jeff', 'President'),
(3, 'Beta', 'Media'),
(4, 'Casey', 'Content')
;
INSERT INTO cd (id, artist, title, owner) VALUES
(1, 'The xx', 'Coexist', 2),
(2, 'ACDC', 'High Voltage', 1),
(3, 'Bjork', 'Cocoon', 3),
(4, 'Ella Fitzgerald', 'Ella Sings Gershwin', 4),
(5, 'Fever Ray', 'Live in Lulea', 2),
(6, 'Tom Waits', 'Rain Dogs', 4),
(7, 'Howlin Wolf', 'Smokestack Lightning', 1),
(8, 'Tupac', 'Poetic Justice', 4)
;
INSERT INTO lend (id, cd_id, lend_to, lend_date) VALUES
(1, 2, 3, '2014/01/03'),
(2, 3, 1, '2014/04/02'),
(3, 7, 4, '2013/12/22'),
(4, 4, 2, '2014/01/03')
;
I want my query to show who each CD is lent to. I can get the ID from the lend table, but I want to display the full name of the borrower from the people table. Do I need to rework how the lend table connects to the people table, or can I just use some sort of CASE expression in the query? Below is my query so far; it returns l.lend_to, where I want to show CONCAT(p.first_name, ' ', p.last_name) for the person the CD is lent to.
SELECT /*cd.id,*/
CONCAT(p.first_name, ' ', p.last_name) 'CD OWNER',
cd.title,
l.lend_to,
p.id ,
(
CASE
WHEN l.lend_to IS NULL
THEN 'Not Lent'
ELSE DATE_FORMAT(l.lend_date, '%m-%d-%Y')
END
) 'LEND DATE',
(
CASE
WHEN l.lend_to IS NULL
THEN 'Not Lent'
ELSE TIMESTAMPDIFF(day, l.lend_date, NOW())
END
) 'DAYS LENT'
FROM
people p
LEFT JOIN cd cd
ON p.id = cd.owner
LEFT JOIN lend l
ON cd.id = l.cd_id
LEFT JOIN lend l1
on p.id = l1.lend_to
;

See if this query gives you the basic information you are looking for
select c.title as 'Title', c.artist as 'Artist', o.first_name as 'Owner',
l.lend_date as 'Lend Date', p.first_name as 'Lent To'
from cd c
left outer join people o on c.owner = o.id
left outer join lend l on c.id = l.cd_id
left outer join people p on l.lend_to = p.id
You can add CASE logic to refine the result, if this is what you are looking for.
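As a rough sketch of that refinement (assuming the schema from the question; the aliases o and b and the output column names are made up for illustration), the borrower's full name and a 'Not Lent' placeholder could be produced like this. It relies on MySQL's CONCAT returning NULL when any argument is NULL, so COALESCE can supply the fallback text.

```sql
-- One row per CD; CDs with no lend row get 'Not Lent'.
SELECT
    c.title,
    CONCAT(o.first_name, ' ', o.last_name)                       AS cd_owner,
    COALESCE(CONCAT(b.first_name, ' ', b.last_name), 'Not Lent') AS lent_to,
    COALESCE(DATE_FORMAT(l.lend_date, '%m-%d-%Y'), 'Not Lent')   AS lend_date,
    DATEDIFF(CURDATE(), l.lend_date)                             AS days_lent  -- NULL when not lent
FROM cd AS c
LEFT JOIN people AS o ON o.id = c.owner      -- o: the owner
LEFT JOIN lend   AS l ON l.cd_id = c.id
LEFT JOIN people AS b ON b.id = l.lend_to    -- b: the borrower
ORDER BY c.id;
```

Joining people twice under two aliases is what lets the owner and the borrower appear side by side without changing the table design.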

I've resolved the issue with a data architecture redesign. Take a look if interested.
http://sqlfiddle.com/#!2/b6158/3

Related

SQL to count staff working at time with join

I am new to SQL.
I want to count staff working at a particular time.
The data schema has a Person table and a Shifts table. They are joined by a StaffShifts table which has both a user_id field and a shift_id field.
Each staff member can have many shifts, and each shift can have many staff.
create table Person
(
user_id INT,
rank_id INT,
groupschedule_id INT,
personnum VARCHAR(6),
PRIMARY KEY (user_id)
);
INSERT INTO Person (user_id, rank_id, groupschedule_id, personnum)
VALUES
(1, 1, 1, 'ABC123'),
(2, 1, 2, 'DEF456'),
(3, 2, 3, 'GHI789'),
(4, 1, 1, 'JKL123'),
(5, 3, 2, 'NOP123'),
(6, 1, 3, 'RST789'),
(7, 2, 1, 'WXY789'),
(8, 1, 2, 'ABC432'),
(9, 1, 3, 'DEF789')
;
CREATE TABLE Groupschedule
(
groupschedule_id INT,
shortnm char(20)
);
INSERT INTO Groupschedule
VALUES
(1,'TEAM 1'),
(2,'TEAM 2'),
(3,'TEAM 3')
;
CREATE TABLE Shifts
(
shift_id INT PRIMARY KEY,
shift_start datetime,
shift_end datetime
);
INSERT INTO Shifts
VALUES
(1, '2021-03-08 06:45:00', '2021-03-08 15:00:00'),
(2, '2021-03-08 14:00:00', '2021-03-08 23:00:00'),
(3, '2021-03-08 23:00:00', '2021-04-09 07:00:00')
;
CREATE TABLE Osl
(
shift_id INT,
osl INT,
area char(10),
FOREIGN KEY (shift_id) REFERENCES Shifts(shift_id)
);
INSERT INTO Osl
VALUES
(1,3, 'EAST'),
(2,2, 'EAST'),
(3,2, 'EAST')
;
CREATE TABLE StaffShifts
( shift_id INT,
user_id INT,
FOREIGN KEY (shift_id) REFERENCES Shifts(shift_id),
FOREIGN KEY (user_id) REFERENCES Person(user_id)
);
WHAT'S BEEN TRIED
I tried to start with retrieving all the staff working at the time with:
SELECT shift_id FROM Shifts WHERE shift_start < '2021-03-08 11:00:00' AND shift_end > '2021-03-08 11:00:00'
INNER JOIN StaffShifts ON Person.user_id=StaffShifts.user_id
On a fiddle this results in an error that references the INNER JOIN but does not elaborate.
I have created a fiddle here https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=1572ed28005766d30a18521557aadb90
UPDATE
I have tried SQL statement:
SELECT shift_id FROM Shifts INNER JOIN StaffShifts ON Person.user_id=StaffShifts.user_id
WHERE shift_start < '2021-03-08 11:00:00' AND shift_end > '2021-03-08 11:00:00'
However, this produces an error: shift_id is ambiguous.
UPDATE
Since I really want to COUNT the users working a given shift, I am trying to return a list of users, using JOINs for the two one-to-many relationships:
SELECT Person.user_id
FROM Person
INNER JOIN StaffShifts ON Person.user_id=StaffShifts.user_id
INNER JOIN StaffShifts ON Shifts.shift_id=StaffShifts.shift_id
WHERE Shifts.shift_start < '2021-03-08 11:00:00'
AND Shifts.shift_end > '2021-03-08 11:00:00'
But this results in 'Not unique table/alias: 'StaffShifts''
Note I have not tried to use COUNT until I return a list of Persons.
You need to join three tables: Person, which is what you want to count; StaffShifts, which assigns a person to a shift and lets you reach Shifts; and Shifts, where you filter on the time range you want.
SELECT
COUNT(*)
FROM `Person`
INNER JOIN `StaffShifts` ON `Person`.`user_id` = `StaffShifts`.`user_id`
INNER JOIN `Shifts` ON `StaffShifts`.`shift_id` = `Shifts`.`shift_id`
WHERE
Shifts.shift_start < '2021-03-08 11:00:00'
AND Shifts.shift_end > '2021-03-08 14:00:00'
;
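The earlier attempts fail for reasons the error messages hint at: a JOIN has to come before WHERE, a column that exists in both joined tables (shift_id) has to be qualified with its table name, and the same table cannot be joined twice without distinct aliases. If only the head count at a single instant is needed, a variation along these lines might be enough. It is a sketch, assuming StaffShifts has been populated (the question shows no rows for it) and that the foreign key guarantees every user_id exists in Person, which is why Person is left out; the aliases and the staff_on_duty column name are illustrative.

```sql
-- Count distinct staff whose shift covers 11:00 on 2021-03-08.
SELECT COUNT(DISTINCT ss.user_id) AS staff_on_duty
FROM Shifts AS s
INNER JOIN StaffShifts AS ss ON ss.shift_id = s.shift_id
WHERE s.shift_start <= '2021-03-08 11:00:00'
  AND s.shift_end   >  '2021-03-08 11:00:00';
```

COUNT(DISTINCT ...) guards against a person being counted twice when two overlapping shifts both cover the instant.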

MySQL challenge using MIN and subquery

My intent is to return a date value based on a selected year as well as a minimum date value based on a dataset that includes the complete dataset across all years. The query always returns the minimum date value in 2017. I want it to return the minimum start_date from the whole dataset.
What I get for min_date_over_all_years is:
orgA 2017-10-09
orgB 2017-10-08
The required result for min_date_over_all_years is:
orgA 2015-10-10
orgB 2014-10-09
Please see the attached fiddle for the example:
http://sqlfiddle.com/#!9/c0f74/9
The schema is:
CREATE TABLE IF NOT EXISTS `project` (
`project_id` int(11) NOT NULL AUTO_INCREMENT,
`p_name` varchar(10) NOT NULL,
`start_date` DATE NOT NULL,
`organisation_id` int(11) NOT NULL,
PRIMARY KEY (`project_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `project` (`project_id`, `p_name`,
`start_date`, `organisation_id`)
VALUES
(1, 'testP1', '2017-10-09', 1),
(2, 'testP2', '2016-10-10', 1),
(3, 'testP3', '2015-10-10', 1),
(4, 'testP4', '2017-10-10', 2),
(5, 'testP5', '2014-10-10', 2),
(6, 'testP6', '2017-10-10', 1),
(7, 'testP7', '2016-10-10', 1),
(8, 'testP8', '2015-10-10', 1),
(9, 'testP9', '2017-10-08', 2),
(10, 'testP10', '2014-10-09', 2);
CREATE TABLE IF NOT EXISTS `organisation` (`organisation_id` int(11) NOT NULL AUTO_INCREMENT,
`org_name` varchar(10) NOT NULL,
PRIMARY KEY (`organisation_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `organisation` (`organisation_id`, `org_name`
)
VALUES
(1, 'orgA'),
(2, 'orgB');
And the query I have tried (along with simpler subquery and CASE versions) is:
SELECT o.org_name, MIN(p.start_date) AS min_date_2017, YEAR(p.start_date) AS year_selected,
(SELECT MIN(p.start_date) FROM project p2
INNER JOIN organisation o2 ON o2.organisation_id = p2.organisation_id
WHERE p2.organisation_id = o.organisation_id
GROUP BY o2.organisation_id) AS min_date_over_all_years
FROM organisation o
INNER JOIN project p on p.organisation_id = o.organisation_id
WHERE YEAR(p.start_date)=2017
GROUP BY o.organisation_id
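When a subquery is used as an expression in the SELECT list, it has to return a single row with a single column. Yours does, but it computes MIN(p.start_date), and p is the alias from the outer query, which is already filtered to 2017, so you get the 2017 minimum; it should be MIN(p2.start_date).
You don't need a separate query, though. Conditional aggregation gets both values in one pass: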
SELECT o.org_name,
MIN(IF(YEAR(p.start_date) = 2017, p.start_date, NULL)) AS min_date_2017,
2017 AS year_selected,
MIN(p.start_date) AS min_date_over_all_years
FROM organisation AS o
INNER JOIN project AS p ON p.organisation_id = o.organisation_id
GROUP BY o.organisation_id
You can also join with a subquery that gets the overall minimum date per organisation.
SELECT o.org_name, MIN(p.start_date) AS min_date_2017, YEAR(p.start_date) AS year_selected, overall.start_date AS min_date_over_all_years
FROM organisation o
INNER JOIN project p on p.organisation_id = o.organisation_id
INNER JOIN (
SELECT organisation_id, MIN(start_date) AS start_date
FROM project
GROUP BY organisation_id) AS overall ON o.organisation_id = overall.organisation_id
WHERE YEAR(p.start_date)=2017
GROUP BY o.organisation_id
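For comparison, the correlated subquery from the question can also be made to work. The sketch below assumes the cause of the unexpected result is that MIN(p.start_date) inside the subquery refers to the outer alias p (already restricted to 2017) rather than to p2. The date-range filter is an optional substitute for YEAR(...) = 2017, and the inner join to organisation is dropped because the correlation already ties the subquery to one organisation.

```sql
SELECT o.org_name,
       MIN(p.start_date) AS min_date_2017,
       2017              AS year_selected,
       -- Scalar subquery: one row, one column, aggregated over p2 only.
       (SELECT MIN(p2.start_date)
        FROM project AS p2
        WHERE p2.organisation_id = o.organisation_id) AS min_date_over_all_years
FROM organisation AS o
INNER JOIN project AS p ON p.organisation_id = o.organisation_id
WHERE p.start_date >= '2017-01-01'
  AND p.start_date <  '2018-01-01'
GROUP BY o.organisation_id, o.org_name;
```

With the sample rows from the question, min_date_over_all_years should come out as 2015-10-10 for orgA and 2014-10-09 for orgB.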

SQL subsets query

I am having trouble creating a query for an SQL table. The query should show the number of products that are in the "Clothes" category and are not accessories, for example products that are entered as T-shirts or sweatshirts.
Here are the tables that have been created:
CREATE DATABASE IF NOT EXISTS product_list;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS product_categories;
DROP TABLE IF EXISTS categories;
CREATE TABLE products (
product_id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(50) DEFAULT NULL,
active BOOL DEFAULT NULL
);
CREATE TABLE categories (
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50),
structure VARCHAR(50)
);
CREATE TABLE product_categories (
product_id INT,
category_id INT,
PRIMARY KEY(product_id, category_id)
);
INSERT INTO products VALUES
(NULL, "Blue Sweatshirt", false),
(NULL, "Short Sleeve T-Shirt", true),
(NULL, "White Vest", true),
(NULL, "Black Hairclip", true),
(NULL, "Knitted Hat", false),
(NULL, "Grey Sweatshirt", true),
(NULL, "Tartan Scarf", true);
INSERT INTO categories VALUES
(NULL, "Sweatshirts", "Clothes>Sweatshirts"),
(NULL, "T-Shirts", "Clothes>T-Shirts"),
(NULL, "Accessories", "Accessories"),
(NULL, "Winter", "Clothes>Winter"),
(NULL, "Vests", "Clothes>Vests");
INSERT INTO product_categories VALUES
(1, 1), (2, 2), (3, 5), (3, 4), (4, 3), (5, 3), (5, 4), (6, 1), (7, 3), (7, 4);
If I understand correctly, this is a set-within-sets query. You are looking for products that have at least one "clothes" category and no category that is not clothes. I approach this using group by and having because it is quite flexible:
select pc.product_id
from product_categories pc join
categories c
on pc.category_id = c.category_id
group by pc.product_id
having sum(case when c.structure like 'Clothes%' then 1 else 0 end) > 0 and
sum(case when c.structure not like 'Clothes%' then 1 else 0 end) = 0;
If you just want the count, then you can use this as a subquery and use count(*).
EDIT:
A small note. The question is now tagged with MySQL, which has a convenient shorthand for the having clause:
having sum(c.structure like 'Clothes%') > 0 and
sum(c.structure not like 'Clothes%') = 0;
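To get just the count, the grouped query can be wrapped as a derived table. This is a sketch using the MySQL shorthand above; the alias clothes_only and the column name clothes_only_products are illustrative.

```sql
SELECT COUNT(*) AS clothes_only_products
FROM (
    SELECT pc.product_id
    FROM product_categories AS pc
    INNER JOIN categories AS c ON c.category_id = pc.category_id
    GROUP BY pc.product_id
    -- In MySQL a comparison evaluates to 1 or 0, so SUM counts matches.
    HAVING SUM(c.structure LIKE 'Clothes%') > 0
       AND SUM(c.structure NOT LIKE 'Clothes%') = 0
) AS clothes_only;
```

Assuming the sample inserts ran against freshly created tables (so the auto-increment ids start at 1), products 1, 2, 3 and 6 qualify, giving a count of 4.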
Try this query:
select * from products a
join product_categories b on a.product_id=b.product_id
join categories c on b.category_id=c.category_id
where c.structure like 'Clothes%'

MySQL latest related record from more than one table

Assume a main "job" table and two corresponding "log" tables (one for server events and the other for user events, with quite different data stored in each).
What would be the best way to return a selection of "job" records and the latest corresponding log record (with multiple fields) from each of the two "log" tables (if there are any)?
Did get some inspiration from: MySQL Order before Group by
The following SQL would create some example tables/data...
CREATE TABLE job (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` tinytext NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE job_log_server (
`id` int(11) NOT NULL AUTO_INCREMENT,
`job_id` int(11) NOT NULL,
`event` tinytext NOT NULL,
`ip` tinytext NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (id),
KEY job_id (job_id)
);
CREATE TABLE job_log_user (
`id` int(11) NOT NULL AUTO_INCREMENT,
`job_id` int(11) NOT NULL,
`event` tinytext NOT NULL,
`user_id` int(11) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (id),
KEY job_id (job_id)
);
INSERT INTO job VALUES (1, 'Job A');
INSERT INTO job VALUES (2, 'Job B');
INSERT INTO job VALUES (3, 'Job C');
INSERT INTO job VALUES (4, 'Job D');
INSERT INTO job_log_server VALUES (1, 2, 'Job B Event 1', '127.0.0.1', '2000-01-01 00:00:01');
INSERT INTO job_log_server VALUES (2, 2, 'Job B Event 2', '127.0.0.1', '2000-01-01 00:00:02');
INSERT INTO job_log_server VALUES (3, 2, 'Job B Event 3*', '127.0.0.1', '2000-01-01 00:00:03');
INSERT INTO job_log_server VALUES (4, 3, 'Job C Event 1*', '127.0.0.1', '2000-01-01 00:00:04');
INSERT INTO job_log_user VALUES (1, 1, 'Job A Event 1', 5, '2000-01-01 00:00:01');
INSERT INTO job_log_user VALUES (2, 1, 'Job A Event 2*', 5, '2000-01-01 00:00:02');
INSERT INTO job_log_user VALUES (3, 2, 'Job B Event 1*', 5, '2000-01-01 00:00:03');
INSERT INTO job_log_user VALUES (4, 4, 'Job D Event 1', 5, '2000-01-01 00:00:04');
INSERT INTO job_log_user VALUES (5, 4, 'Job D Event 2', 5, '2000-01-01 00:00:05');
INSERT INTO job_log_user VALUES (6, 4, 'Job D Event 3*', 5, '2000-01-01 00:00:06');
One option (only returning 1 field from each table) would be to use nested sub-queries... but the ORDER BY will have to be done in separate queries to the GROUP BY (x2):
SELECT
*
FROM
(
SELECT
s2.*,
jlu.event AS user_event
FROM
(
SELECT
*
FROM
(
SELECT
j.id,
j.name,
jls.event AS server_event
FROM
job AS j
LEFT JOIN
job_log_server AS jls ON jls.job_id = j.id
ORDER BY
jls.created DESC
) AS s1
GROUP BY
s1.id
) AS s2
LEFT JOIN
job_log_user AS jlu ON jlu.job_id = s2.id
ORDER BY
jlu.created DESC
) AS s3
GROUP BY
s3.id;
Which actually seems to perform quite well... just not very easy to understand.
Or you could try to return and sort the log records in two separate sub-queries:
SELECT
j.id,
j.name,
jls2.event AS server_event,
jlu2.event AS user_event
FROM
job AS j
LEFT JOIN
(
SELECT
jls.job_id,
jls.event
FROM
job_log_server AS jls
ORDER BY
jls.created DESC
) AS jls2 ON jls2.job_id = j.id
LEFT JOIN
(
SELECT
jlu.job_id,
jlu.event
FROM
job_log_user AS jlu
ORDER BY
jlu.created DESC
) AS jlu2 ON jlu2.job_id = j.id
GROUP BY
j.id;
But this seems to take quite a bit longer to run... possibly because of the number of records it's adding to a temporary table, which are then mostly ignored (to keep this short-ish, I've not added any conditions to the job table, which would otherwise return only active jobs).
Not sure if I've missed anything obvious.
How about the following (SQL Fiddle)? It produces the same results as both of your queries.
SELECT j.id, j.name,
(
SELECT s.event
FROM job_log_server s
WHERE j.id = s.job_id
ORDER BY s.id DESC
LIMIT 1
)AS SERVER_EVENT,
(
SELECT u.event
FROM job_log_user u
WHERE j.id = u.job_id
ORDER BY u.id DESC
LIMIT 1
)AS USER_EVENT
FROM job j
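EDIT (SQL Fiddle): this version selects the id of the latest log row from each table and joins back on it, so other log columns can be returned as well: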
SELECT m.id, m.name, js.event AS SERVER_EVENT, ju.event AS USER_EVENT
FROM
(
SELECT j.id, j.name,
(
SELECT s.id
FROM job_log_server s
WHERE j.id = s.job_id
ORDER BY s.id DESC
LIMIT 1
)AS S_E,
(
SELECT u.id
FROM job_log_user u
WHERE j.id = u.job_id
ORDER BY u.id DESC
LIMIT 1
)AS U_E
FROM job j
) m
LEFT JOIN job_log_server js ON js.id = m.S_E
LEFT JOIN job_log_user ju ON ju.id = m.U_E
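On MySQL 8.0 or later, window functions offer another way to pick the latest row per job with all of its columns. The following is only a sketch under that version assumption; the rn column and the output aliases are illustrative, and it orders by created (with id as a tie-breaker) rather than by id alone.

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT j.id,
       j.name,
       s.event   AS server_event,
       s.ip      AS server_ip,
       s.created AS server_created,
       u.event   AS user_event,
       u.user_id AS user_id,
       u.created AS user_created
FROM job AS j
LEFT JOIN (
    -- Number each job's server log rows, newest first.
    SELECT jls.job_id, jls.event, jls.ip, jls.created,
           ROW_NUMBER() OVER (PARTITION BY jls.job_id
                              ORDER BY jls.created DESC, jls.id DESC) AS rn
    FROM job_log_server AS jls
) AS s ON s.job_id = j.id AND s.rn = 1
LEFT JOIN (
    -- Same for the user log rows.
    SELECT jlu.job_id, jlu.event, jlu.user_id, jlu.created,
           ROW_NUMBER() OVER (PARTITION BY jlu.job_id
                              ORDER BY jlu.created DESC, jlu.id DESC) AS rn
    FROM job_log_user AS jlu
) AS u ON u.job_id = j.id AND u.rn = 1
ORDER BY j.id;
```

With the sample data this should select the log rows whose event text is marked with an asterisk, and leave the server or user columns NULL for jobs that have no log row in that table.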

SQL "where IN" query in a many to many relation of 2 tables

This may be a relatively simple question, but I cannot find a solution to it. It involves two tables in a many-to-many relationship, so there is a third table between them. The schema is below:
CREATE TABLE `options` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `options` (`id`, `name`) VALUES
(1, 'something'),
(2, 'thing'),
(3, 'some option'),
(4, 'other thing'),
(5, 'vacuity'),
(6, 'etc');
CREATE TABLE `person` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `person` (`id`, `name`) VALUES
(1, 'ROBERT'),
(2, 'BOB'),
(3, 'FRANK'),
(4, 'JOHN'),
(5, 'PAULINE'),
(6, 'VERENA'),
(7, 'MARCEL'),
(8, 'PAULO'),
(9, 'SCHRODINGER');
CREATE TABLE `person_option_link` (
`person_id` int(11) NOT NULL,
`option_id` int(11) NOT NULL,
UNIQUE KEY `person_id` (`person_id`,`option_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `person_option_link` (`person_id`, `option_id`) VALUES
(1, 1),
(2, 1),
(2, 2),
(3, 2),
(3, 3),
(3, 4),
(3, 5),
(4, 1),
(4, 3),
(4, 6),
(5, 3),
(5, 4),
(5, 5),
(6, 1),
(7, 2),
(8, 3),
(9, 4),
(5, 6);
The idea is as follows: I would like to retrieve all people who have a link to option_id=1 AND option_id=3.
The expected result should be one person: John.
I tried something like the following, which doesn't work because it also returns people who have 1 OR 3:
SELECT *
FROM person p
LEFT JOIN person_option_link l ON p.id = l.person_id
WHERE l.option_id IN ( 1, 3 )
What is the best practice in this case?
//////// POST EDITED: I need to focus on another important point ////////
And what if we add a new condition with NOT IN, like this:
SELECT *
FROM person p
LEFT JOIN person_option_link l ON p.id = l.person_id
WHERE l.option_id IN ( 3, 4 )
AND l.option_id NOT IN ( 6 )
In this case, the result should be FRANK, because PAULINE, who also has 3 and 4, has option 6 and we don't want that.
Thanks!
This is a Relational Division Problem.
SELECT p.id, p.name
FROM person p
INNER JOIN person_option_link l
ON p.id = l.person_id
WHERE l.option_id IN ( 1, 3 )
GROUP BY p.id, p.name
HAVING COUNT(*) = 2
SQLFiddle Demo
If a unique constraint were not enforced on option_id for every person_id, the DISTINCT keyword would be required to count only unique option_id values:
SELECT p.id, p.name
FROM person p
INNER JOIN person_option_link l
ON p.id = l.person_id
WHERE l.option_id IN ( 1, 3 )
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT l.option_id) = 2
SQL of Relational Division
Use GROUP BY and COUNT:
SELECT p.id, p.name
FROM person p
LEFT JOIN person_option_link l ON p.id = l.person_id
WHERE l.option_id IN ( 1, 3 )
GROUP BY p.id, p.name
HAVING COUNT(Distinct l.option_id) = 2
I prefer using COUNT DISTINCT in case you could have the same option id multiple times.
Good luck.
It may not be the best option, but you could use a 'double join' to the person_option_link table:
SELECT *
FROM person AS p
JOIN person_option_link AS l1 ON p.id = l1.person_id AND l1.option_id = 1
JOIN person_option_link AS l2 ON p.id = l2.person_id AND l2.option_id = 3
This ensures that there is simultaneously a row with option ID of 1 and another with option ID of 3 for the given user.
The GROUP BY alternatives certainly work; they might well be quicker too (but you'd need to scrutinize query plans to be sure). The GROUP BY alternatives scale better to handle more values: for example, a list of the users with option IDs 2, 3, 5, 7, 11, 13, 17, 19 is fiddly with this variant but the GROUP BY variants work without structural changes to the query. You can also use the GROUP BY variants to select users with at least 4 of the 8 values which is substantially infeasible using this technique.
Using the GROUP BY does require a slight restatement (or rethinking) of the query, though, to:
How can I select people who have 2 of the option IDs in the set {1, 3}?
How can I select people who have 8 of the option IDs in the set {2, 3, 5, 7, 11, 13, 17, 19}?
How can I select people who have at least 4 of the option IDs in the set {2, 3, 5, 7, 11, 13, 17, 19}?
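The last restatement might look like the sketch below, which reuses the GROUP BY pattern and changes only the option list and the HAVING threshold. The sample data in the question only contains option IDs 1 to 6, so it is shown for its shape rather than for its result.

```sql
-- People who have at least 4 of the 8 listed option IDs.
SELECT p.id, p.name
FROM person AS p
INNER JOIN person_option_link AS l ON l.person_id = p.id
WHERE l.option_id IN (2, 3, 5, 7, 11, 13, 17, 19)
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT l.option_id) >= 4;
```

Changing >= 4 to = 8 gives the "all of them" form of the same query.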
For the "does not have these IDs" part of the question, simply add a WHERE clause:
WHERE person_id NOT IN
(
SELECT person_id
FROM person_option_link
WHERE option_id = 4
)
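Put together for the edited question (has options 3 and 4, but not 6), the double-join variant could look like this sketch. The excluded value 6 comes from the question rather than from the snippet above, and NOT IN is safe here only because person_id is declared NOT NULL.

```sql
SELECT p.id, p.name
FROM person AS p
JOIN person_option_link AS l1 ON p.id = l1.person_id AND l1.option_id = 3
JOIN person_option_link AS l2 ON p.id = l2.person_id AND l2.option_id = 4
WHERE p.id NOT IN
(
    -- Everyone who has the unwanted option.
    SELECT person_id
    FROM person_option_link
    WHERE option_id = 6
);
```

With the sample data this should return only FRANK: PAULINE also has options 3 and 4 but is removed by the NOT IN condition.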