MySQL challenge using MIN and subquery - mysql

My intent is to return, per organisation, the minimum start_date within a selected year together with the minimum start_date across the complete dataset (all years). The query always returns the minimum date value in 2017 for both columns. I want min_date_over_all_years to return the minimum start_date from the whole dataset.
What I get for min_date_over_all_years is:
orgA 2017-10-09
orgB 2017-10-08
The required result for min_date_over_all_years is:
orgA 2015-10-10
orgB 2014-10-09
Please see the attached fiddle for the example:
http://sqlfiddle.com/#!9/c0f74/9
The schema is:
CREATE TABLE IF NOT EXISTS `project` (
`project_id` int(11) NOT NULL AUTO_INCREMENT,
`p_name` varchar(10) NOT NULL,
`start_date` DATE NOT NULL,
`organisation_id` int(11) NOT NULL,
PRIMARY KEY (`project_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `project` (`project_id`, `p_name`,
`start_date`, `organisation_id`)
VALUES
(1, 'testP1', '2017-10-09', 1),
(2, 'testP2', '2016-10-10', 1),
(3, 'testP3', '2015-10-10', 1),
(4, 'testP4', '2017-10-10', 2),
(5, 'testP5', '2014-10-10', 2),
(6, 'testP6', '2017-10-10', 1),
(7, 'testP7', '2016-10-10', 1),
(8, 'testP8', '2015-10-10', 1),
(9, 'testP9', '2017-10-08', 2),
(10, 'testP10', '2014-10-09', 2);
CREATE TABLE IF NOT EXISTS `organisation` (`organisation_id` int(11) NOT NULL AUTO_INCREMENT,
`org_name` varchar(10) NOT NULL,
PRIMARY KEY (`organisation_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `organisation` (`organisation_id`, `org_name`
)
VALUES
(1, 'orgA'),
(2, 'orgB');
And the query I have tried (along with simpler subquery and CASE versions) is:
SELECT o.org_name, MIN(p.start_date) AS min_date_2017, YEAR(p.start_date) AS year_selected,
(SELECT MIN(p.start_date) FROM project p2
INNER JOIN organisation o2 ON o2.organisation_id = p2.organisation_id
WHERE p2.organisation_id = o.organisation_id
GROUP BY o2.organisation_id) AS min_date_over_all_years
FROM organisation o
INNER JOIN project p on p.organisation_id = o.organisation_id
WHERE YEAR(p.start_date)=2017
GROUP BY o.organisation_id

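You can't put a subquery that returns multiple rows in the SELECT list; when a subquery is being used as an expression, it has to return a single row with a single column. In your query the subquery also selects MIN(p.start_date), which refers to the outer alias p (already filtered to 2017) instead of p2, and that is why the 2017 minimum comes back.
You don't need a separate query: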
SELECT o.org_name,
MIN(IF(YEAR(p.start_date) = 2017, p.start_date, NULL)) AS min_date_2017,
2017 AS year_selected,
MIN(p.start_date) AS min_date_over_all_years
FROM organisation AS o
INNER JOIN project AS p ON p.organisation_id = o.organisation_id
GROUP BY o.organisation_id
You can also join with a subquery that gets the overall data.
SELECT o.org_name, MIN(p.start_date) AS min_date_2017, YEAR(p.start_date) AS year_selected, overall.start_date AS min_date_over_all_years
FROM organisation o
INNER JOIN project p on p.organisation_id = o.organisation_id
INNER JOIN (
SELECT organisation_id, MIN(start_date) AS start_date
FROM project
GROUP BY organisation_id) AS overall ON o.organisation_id = overall.organisation_id
WHERE YEAR(p.start_date)=2017
GROUP BY o.organisation_id
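As a quick check of the conditional-aggregation approach, here is a minimal sketch that loads the sample rows into an in-memory SQLite database from Python. It is an adaptation, not the MySQL query itself: SQLite has no IF() or YEAR(), so the sketch assumes CASE and strftime('%Y', ...) as stand-ins, and the function name min_dates is made up for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE organisation (organisation_id INTEGER PRIMARY KEY, org_name TEXT NOT NULL);
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    p_name TEXT NOT NULL,
    start_date TEXT NOT NULL,      -- ISO dates stored as text (SQLite has no DATE type)
    organisation_id INTEGER NOT NULL
);
INSERT INTO organisation VALUES (1, 'orgA'), (2, 'orgB');
INSERT INTO project VALUES
    (1, 'testP1', '2017-10-09', 1), (2, 'testP2', '2016-10-10', 1),
    (3, 'testP3', '2015-10-10', 1), (4, 'testP4', '2017-10-10', 2),
    (5, 'testP5', '2014-10-10', 2), (6, 'testP6', '2017-10-10', 1),
    (7, 'testP7', '2016-10-10', 1), (8, 'testP8', '2015-10-10', 1),
    (9, 'testP9', '2017-10-08', 2), (10, 'testP10', '2014-10-09', 2);
"""

QUERY = """
SELECT o.org_name,
       -- CASE yields NULL outside the chosen year, and MIN ignores NULLs
       MIN(CASE WHEN strftime('%Y', p.start_date) = ? THEN p.start_date END) AS min_date_in_year,
       MIN(p.start_date) AS min_date_over_all_years
FROM organisation AS o
INNER JOIN project AS p ON p.organisation_id = o.organisation_id
GROUP BY o.organisation_id, o.org_name
ORDER BY o.org_name
"""


def min_dates(year):
    """Return (org_name, min date in `year`, min date overall) for each organisation."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return con.execute(QUERY, (str(year),)).fetchall()
    finally:
        con.close()


for row in min_dates(2017):
    print(row)
# ('orgA', '2017-10-09', '2015-10-10')
# ('orgB', '2017-10-08', '2014-10-09')
```

Because there is no WHERE filter on the year, every project row reaches the second MIN, which is what lets one pass over the join produce both columns.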

Related

Matching two column values

I have 2 tables. 'user_cities' and 'visits'. I want to check if a user visited a city which he was not supposed to visit.
CREATE TABLE `user_cities` (
`id` INT NOT NULL AUTO_INCREMENT,
`user_id` INT,
`name` varchar(255),
`city_id` INT,
PRIMARY KEY (`id`)
);
CREATE TABLE `visits` (
`id` INT NOT NULL AUTO_INCREMENT,
`user_id` INT,
`visit_id` INT,
`city_id` INT,
PRIMARY KEY (`id`)
);
INSERT INTO `user_cities` VALUES
(1, 1, 'John', 35),
(2, 1, 'John', 36),
(3, 1, 'John', 37),
(4, 2, 'Michael', 38),
(5, 2, 'Michael', 39);
INSERT INTO `visits` VALUES
(1, 1, 1, 35),
(2, 1, 2, 36),
(3, 1, 3, 38),
(4, 2, 4, 38),
(5, 2, 5, 39);
http://sqlfiddle.com/#!9/68c658
Example: John must visit only 35, 36 and 37. Michael must visit 38 and 39. These are defined in 'user_cities'
However, John has visited 38 (visits id 3)
How can I query users that visited the wrong city?
If you're not familiar with SQL, this kind of problem is a bit trickier than it appears.
You want to select the information from the visits where that visit is not specified by user_cities. The usual way to do this is to use a LEFT JOIN to the user_cities table for the record that validates the visit, and then to find entries where the LEFT JOIN can't find a match.
So your query would be:
SELECT *
FROM visits
LEFT JOIN user_cities
ON(visits.user_id = user_cities.user_id AND
visits.city_id = user_cities.city_id)
WHERE user_cities.id IS NULL
Of course, you would probably also extend the query to give you the name of the user (and the city, but that's not in your example code).
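The anti-join described above can be checked against the sample data. This is only a sketch run through Python's sqlite3 module rather than MySQL, and the function name wrong_visits is an assumption made for the example; the LEFT JOIN ... IS NULL pattern itself is the same.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user_cities (id INTEGER PRIMARY KEY, user_id INT, name TEXT, city_id INT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, user_id INT, visit_id INT, city_id INT);
INSERT INTO user_cities VALUES
    (1, 1, 'John', 35), (2, 1, 'John', 36), (3, 1, 'John', 37),
    (4, 2, 'Michael', 38), (5, 2, 'Michael', 39);
INSERT INTO visits VALUES
    (1, 1, 1, 35), (2, 1, 2, 36), (3, 1, 3, 38), (4, 2, 4, 38), (5, 2, 5, 39);
"""

QUERY = """
SELECT v.id, v.user_id, v.city_id
FROM visits AS v
LEFT JOIN user_cities AS uc
       ON v.user_id = uc.user_id AND v.city_id = uc.city_id
-- unmatched visits get NULLs for every user_cities column
WHERE uc.id IS NULL
ORDER BY v.id
"""


def wrong_visits():
    """Return (visit row id, user_id, city_id) for visits with no matching user_cities row."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(wrong_visits())  # [(3, 1, 38)]
```

Testing uc.id (the primary key) for NULL is the safe choice here, since a primary key can only be NULL when the join found no match.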
You can try using left join
select a.user_id,a.city_id
FROM visits a left join user_cities b on a.user_id=b.user_id and
a.city_id=b.city_id
where b.user_id is null
OUTPUT:
1 38
The only way I can see to do this is via a full outer join (which MySQL doesn't even support, so we have to use a workaround):
SELECT COALESCE(user_id_1, user_id_2) AS user_id_matching
FROM
(
SELECT
uc.user_id AS user_id_1,
uc.city_id AS city_id_1,
v.user_id AS user_id_2,
v.city_id AS city_id_2
FROM user_cities uc
LEFT JOIN visits v
ON uc.user_id = v.user_id AND uc.city_id = v.city_id
UNION ALL
SELECT
uc.user_id AS user_id_1,
uc.city_id AS city_id_1,
v.user_id AS user_id_2,
v.city_id AS city_id_2
FROM user_cities uc
RIGHT JOIN visits v
ON uc.user_id = v.user_id AND uc.city_id = v.city_id
WHERE uc.user_id IS NULL
) t
GROUP BY COALESCE(user_id_1, user_id_2)
HAVING
COUNT(CASE WHEN user_id_1 IS NULL THEN 1 END) = 0 AND
COUNT(CASE WHEN user_id_2 IS NULL THEN 1 END) = 0;

filter all two tables to get all the data

I created a database for survey software. Using the two tables below, I want to get each driver's average score between two dates for a given place, and to return the drivers without an answer as NULL or 0. I tried:
SELECT
AVG(tbAnswers.averageScore)
FROM
tbDrivers
LEFT JOIN tbAnswers ON tbDrivers.driverId = tbAnswers.driverId
WHERE
tbDrivers.place = 'WDC'
GROUP BY
tbDrivers.driverId
But when I specify the date range, I no longer get the drivers without an answer.
SELECT AVG(tbAnswers.averageScore)
FROM tbDrivers LEFT JOIN tbAnswers ON tbDrivers.driverId = tbAnswers.driverId
WHERE tbDrivers.place = 'WDC'
AND answerDate BETWEEN '2018-11-28' AND '2018-12-03'
GROUP BY tbDrivers.driverId
Table structures:
CREATE TABLE `tbAnswers` (
`answerId` int(11) NOT NULL,
`answerDate` date NOT NULL,
`driverId` int(11) NOT NULL,
`score1` int(11) NOT NULL,
`score2` int(11) NOT NULL,
`score3` int(11) NOT NULL,
`averageScore` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tbAnswers` (`answerId`, `answerDate`, `driverId`, `score1`, `score2`, `score3`, `averageScore`) VALUES
(10, '2018-11-28', 1032, 0, 0, 0, 0),
(11, '2018-11-29', 1032, 9, 8, 3, 6.67),
(12, '2018-11-30', 1032, 0, 3, 2, 1.67),
(13, '2018-11-30', 1035, 10, 2, 10, 7.34),
(14, '2018-11-01', 1032, 5, 5, 5, 5),
(15, '2018-12-03', 1035, 5, 5, 7, 5.67);
CREATE TABLE `tbDrivers` (
`driverId` int(11) NOT NULL,
`nameSurname` varchar(32) NOT NULL,
`place` varchar(64) NOT NULL,
`plate` varchar(8) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tbDrivers` (`driverId`, `nameSurname`, `place`, `plate`) VALUES
(1032, 'Nick Oliver', 'WDC', 'B16186D'),
(1033, 'Nicholas Keller', 'WDC', 'ACG8095'),
(1034, 'Felipe Mendez', 'WDC', 'C26106E'),
(1035, 'Lowell Butler', 'WDC', '5123QK');
How can I solve this problem?
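The problem arises because some drivers have no records in the tbAnswers table, so answerDate is NULL for them after the LEFT JOIN and the date condition in WHERE removes those rows.
Either make an entry in tbAnswers, use the query given by Forpas above, or use this query: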
SELECT tbdrivers.driverid,
Avg(tbanswers.averagescore)
FROM tbdrivers
LEFT JOIN tbanswers
ON tbdrivers.driverid = tbanswers.driverid
WHERE tbdrivers.place = 'WDC'
-- parentheses are required: AND binds tighter than OR
AND (answerdate BETWEEN '2018-11-28' AND '2018-12-03'
OR answerdate IS NULL)
GROUP BY tbdrivers.driverid
Use your query which fetches the drivers that have at least 1 answer, UNION the drivers that have no answer:
(SELECT tbDrivers.driverId, AVG(tbAnswers.averageScore) AS avgscore
FROM tbDrivers LEFT JOIN tbAnswers ON tbDrivers.driverId = tbAnswers.driverId
WHERE tbDrivers.place = 'WDC'
AND answerDate BETWEEN '2018-11-28' AND '2018-12-03'
GROUP BY tbDrivers.driverId )
UNION
(SELECT t.driverId, NULL AS avgscore
FROM tbDrivers t
WHERE
NOT EXISTS (SELECT 1 FROM tbAnswers WHERE tbAnswers.driverId = t.driverId))
ORDER BY driverId
The result is:
driverId avgscore
1032 2.7800000111262
1033 (null)
1034 (null)
1035 6.505000114440918
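Another option, not taken from the answers above, is to move the date condition from WHERE into the ON clause of the LEFT JOIN, so that drivers without a matching answer keep their row with NULLs. The sketch below tries this on the sample data through Python's sqlite3 module; the function name avg_scores and its parameters are assumptions made for the example, and SQLite stands in for MySQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tbAnswers (answerId INTEGER PRIMARY KEY, answerDate TEXT NOT NULL,
                        driverId INTEGER NOT NULL, averageScore REAL NOT NULL);
CREATE TABLE tbDrivers (driverId INTEGER PRIMARY KEY, nameSurname TEXT NOT NULL,
                        place TEXT NOT NULL);
INSERT INTO tbAnswers VALUES
    (10, '2018-11-28', 1032, 0),    (11, '2018-11-29', 1032, 6.67),
    (12, '2018-11-30', 1032, 1.67), (13, '2018-11-30', 1035, 7.34),
    (14, '2018-11-01', 1032, 5),    (15, '2018-12-03', 1035, 5.67);
INSERT INTO tbDrivers VALUES
    (1032, 'Nick Oliver', 'WDC'),   (1033, 'Nicholas Keller', 'WDC'),
    (1034, 'Felipe Mendez', 'WDC'), (1035, 'Lowell Butler', 'WDC');
"""

QUERY = """
SELECT d.driverId, AVG(a.averageScore) AS avgscore
FROM tbDrivers AS d
LEFT JOIN tbAnswers AS a
       ON a.driverId = d.driverId
      -- filtering here (not in WHERE) keeps drivers with no answer in range
      AND a.answerDate BETWEEN ? AND ?
WHERE d.place = ?
GROUP BY d.driverId
ORDER BY d.driverId
"""


def avg_scores(place, date_from, date_to):
    """Return (driverId, average score or None) for every driver at `place`."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return con.execute(QUERY, (date_from, date_to, place)).fetchall()
    finally:
        con.close()


for driver_id, score in avg_scores("WDC", "2018-11-28", "2018-12-03"):
    print(driver_id, score)
```

Drivers 1033 and 1034 come back with None (NULL). Unlike a NOT EXISTS branch that only looks for drivers with no answers at all, this form also keeps a driver whose answers all fall outside the date range.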

Table Architecture Difficulty with Query

I'm working on a practice problem with DDL as follows:
CREATE TABLE people (
id SMALLINT NOT NULL AUTO_INCREMENT,
first_name VARCHAR(50),
last_name VARCHAR(50),
PRIMARY KEY (id)
)
;
CREATE TABLE cd (
id SMALLINT NOT NULL AUTO_INCREMENT,
artist VARCHAR(50),
title VARCHAR(50),
PRIMARY KEY(id),
owner SMALLINT,
FOREIGN KEY (owner) REFERENCES people(id)
)
;
CREATE TABLE lend (
id SMALLINT NOT NULL AUTO_INCREMENT,
cd_id SMALLINT,
lend_to SMALLINT,
FOREIGN KEY (lend_to) REFERENCES people(id),
FOREIGN KEY (cd_id) REFERENCES cd(id),
lend_date DATE DEFAULT '0000-00-00',
PRIMARY KEY(id)
)
;
INSERT INTO people (id, first_name, last_name) VALUES
(1, 'Brett', 'CEO'),
(2, 'Jeff', 'President'),
(3, 'Beta', 'Media'),
(4, 'Casey', 'Content')
;
INSERT INTO cd (id, artist, title, owner) VALUES
(1, 'The xx', 'Coexist', 2),
(2, 'ACDC', 'High Voltage', 1),
(3, 'Bjork', 'Cocoon', 3),
(4, 'Ella Fitzgerald', 'Ella Sings Gershwin', 4),
(5, 'Fever Ray', 'Live in Lulea', 2),
(6, 'Tom Waits', 'Rain Dogs', 4),
(7, 'Howlin Wolf', 'Smokestack Lightning', 1),
(8, 'Tupac', 'Poetic Justice', 4)
;
INSERT INTO lend (id, cd_id, lend_to, lend_date) VALUES
(1, 2, 3, '2014/01/03'),
(2, 3, 1, '2014/04/02'),
(3, 7, 4, '2013/12/22'),
(4, 4, 2, '2014/01/03')
;
I want my query to show who the CD is lent to. I can get the ID from the lend table, but want to display the full name of that individual from the people table. Do I need to rework the design of how the lend table connects to the people table, or just use some sort of CASE function in the query? Below is my query so far, where I'm getting l.lend_to and want to show CONCAT(p.first_name, ' ', p.last_name) for the person the CD is lent to.
SELECT /*cd.id,*/
CONCAT(p.first_name, ' ', p.last_name) 'CD OWNER',
cd.title,
l.lend_to,
p.id ,
(
CASE
WHEN l.lend_to IS NULL
THEN 'Not Lent'
ELSE DATE_FORMAT(l.lend_date, '%m-%d-%Y')
END
) 'LEND DATE',
(
CASE
WHEN l.lend_to IS NULL
THEN 'Not Lent'
ELSE TIMESTAMPDIFF(day, l.lend_date, NOW())
END
) 'DAYS LENT'
FROM
people p
LEFT JOIN cd cd
ON p.id = cd.owner
LEFT JOIN lend l
ON cd.id = l.cd_id
LEFT JOIN lend l1
on p.id = l1.lend_to
;
See if this query gives you the basic information you are looking for
select c.title as 'Title', c.artist as 'Artist', o.first_name as 'Owner',
l.lend_date as 'Lend Date', p.first_name as 'Lender'
from cd c
left outer join people o on c.owner = o.id
left outer join lend l on c.id = l.cd_id
left outer join people p on l.lend_to = p.id
You can add additional switch logic to refine the result, if this is what you are looking for.
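The key idea in the query above is joining the people table twice under different aliases, once for the owner and once for the person the CD is lent to, so no redesign is strictly needed. Here is a minimal sketch of that idea on the sample data, run through Python's sqlite3 module. It assumes SQLite's || operator in place of MySQL's CONCAT, stores the lend dates in ISO format, and uses made-up names (lending_report, the aliases o and b).

```python
import sqlite3

SCHEMA = """
CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE cd (id INTEGER PRIMARY KEY, artist TEXT, title TEXT,
                 owner INTEGER REFERENCES people(id));
CREATE TABLE lend (id INTEGER PRIMARY KEY, cd_id INTEGER REFERENCES cd(id),
                   lend_to INTEGER REFERENCES people(id), lend_date TEXT);
INSERT INTO people VALUES
    (1, 'Brett', 'CEO'), (2, 'Jeff', 'President'), (3, 'Beta', 'Media'), (4, 'Casey', 'Content');
INSERT INTO cd VALUES
    (1, 'The xx', 'Coexist', 2), (2, 'ACDC', 'High Voltage', 1), (3, 'Bjork', 'Cocoon', 3),
    (4, 'Ella Fitzgerald', 'Ella Sings Gershwin', 4), (5, 'Fever Ray', 'Live in Lulea', 2),
    (6, 'Tom Waits', 'Rain Dogs', 4), (7, 'Howlin Wolf', 'Smokestack Lightning', 1),
    (8, 'Tupac', 'Poetic Justice', 4);
INSERT INTO lend VALUES
    (1, 2, 3, '2014-01-03'), (2, 3, 1, '2014-04-02'),
    (3, 7, 4, '2013-12-22'), (4, 4, 2, '2014-01-03');
"""

QUERY = """
SELECT c.title,
       o.first_name || ' ' || o.last_name AS cd_owner,
       b.first_name || ' ' || b.last_name AS lent_to,   -- NULL when the CD is not lent
       l.lend_date
FROM cd AS c
LEFT JOIN people AS o ON o.id = c.owner      -- first use of people: the owner
LEFT JOIN lend   AS l ON l.cd_id = c.id
LEFT JOIN people AS b ON b.id = l.lend_to    -- second use of people: the borrower
ORDER BY c.id
"""


def lending_report():
    """Return (title, owner name, borrower name or None, lend date or None) per CD."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in lending_report():
    print(row)
```

CDs that are not lent out, such as Coexist, appear with None in the last two columns, which a CASE expression could turn into 'Not Lent' as in the question.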
I've resolved the issue with a data architecture redesign. Take a look if interested.
http://sqlfiddle.com/#!2/b6158/3

MySQL latest related record from more than one table

Assuming a main "job" table and two corresponding "log" tables (one for server events and the other for user events, with quite different data stored in each), what would be the best way to return a selection of "job" records and the latest corresponding log record (with multiple fields) from each of the two "log" tables (if there are any)?
I got some inspiration from: MySQL Order before Group by
The following SQL would create some example tables/data...
CREATE TABLE job (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` tinytext NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE job_log_server (
`id` int(11) NOT NULL AUTO_INCREMENT,
`job_id` int(11) NOT NULL,
`event` tinytext NOT NULL,
`ip` tinytext NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (id),
KEY job_id (job_id)
);
CREATE TABLE job_log_user (
`id` int(11) NOT NULL AUTO_INCREMENT,
`job_id` int(11) NOT NULL,
`event` tinytext NOT NULL,
`user_id` int(11) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (id),
KEY job_id (job_id)
);
INSERT INTO job VALUES (1, 'Job A');
INSERT INTO job VALUES (2, 'Job B');
INSERT INTO job VALUES (3, 'Job C');
INSERT INTO job VALUES (4, 'Job D');
INSERT INTO job_log_server VALUES (1, 2, 'Job B Event 1', '127.0.0.1', '2000-01-01 00:00:01');
INSERT INTO job_log_server VALUES (2, 2, 'Job B Event 2', '127.0.0.1', '2000-01-01 00:00:02');
INSERT INTO job_log_server VALUES (3, 2, 'Job B Event 3*', '127.0.0.1', '2000-01-01 00:00:03');
INSERT INTO job_log_server VALUES (4, 3, 'Job C Event 1*', '127.0.0.1', '2000-01-01 00:00:04');
INSERT INTO job_log_user VALUES (1, 1, 'Job A Event 1', 5, '2000-01-01 00:00:01');
INSERT INTO job_log_user VALUES (2, 1, 'Job A Event 2*', 5, '2000-01-01 00:00:02');
INSERT INTO job_log_user VALUES (3, 2, 'Job B Event 1*', 5, '2000-01-01 00:00:03');
INSERT INTO job_log_user VALUES (4, 4, 'Job D Event 1', 5, '2000-01-01 00:00:04');
INSERT INTO job_log_user VALUES (5, 4, 'Job D Event 2', 5, '2000-01-01 00:00:05');
INSERT INTO job_log_user VALUES (6, 4, 'Job D Event 3*', 5, '2000-01-01 00:00:06');
One option (only returning 1 field from each table) would be to use nested sub-queries... but the ORDER BY will have to be done in separate queries to the GROUP BY (x2):
SELECT
*
FROM
(
SELECT
s2.*,
jlu.event AS user_event
FROM
(
SELECT
*
FROM
(
SELECT
j.id,
j.name,
jls.event AS server_event
FROM
job AS j
LEFT JOIN
job_log_server AS jls ON jls.job_id = j.id
ORDER BY
jls.created DESC
) AS s1
GROUP BY
s1.id
) AS s2
LEFT JOIN
job_log_user AS jlu ON jlu.job_id = s2.id
ORDER BY
jlu.created DESC
) AS s3
GROUP BY
s3.id;
Which actually seems to perform quite well... just not very easy to understand.
Or you could try to return and sort the log records in two separate sub-queries:
SELECT
j.id,
j.name,
jls2.event AS server_event,
jlu2.event AS user_event
FROM
job AS j
LEFT JOIN
(
SELECT
jls.job_id,
jls.event
FROM
job_log_server AS jls
ORDER BY
jls.created DESC
) AS jls2 ON jls2.job_id = j.id
LEFT JOIN
(
SELECT
jlu.job_id,
jlu.event
FROM
job_log_user AS jlu
ORDER BY
jlu.created DESC
) AS jlu2 ON jlu2.job_id = j.id
GROUP BY
j.id;
But this seems to take quite a bit longer to run... possibly because of the number of records it's adding to a temporary table, which are then mostly ignored (to keep this short-ish, I've not added any conditions to the job table, which would otherwise be only returning active jobs).
Not sure if I've missed anything obvious.
How about the following? It produces the same results as both of your queries.
SELECT j.id, j.name,
(
SELECT s.event
FROM job_log_server s
WHERE j.id = s.job_id
ORDER BY s.id DESC
LIMIT 1
)AS SERVER_EVENT,
(
SELECT u.event
FROM job_log_user u
WHERE j.id = u.job_id
ORDER BY u.id DESC
LIMIT 1
)AS USER_EVENT
FROM job j
EDIT SQL Fiddle:
SELECT m.id, m.name, js.event AS SERVER_EVENT, ju.event AS USER_EVENT
FROM
(
SELECT j.id, j.name,
(
SELECT s.id
FROM job_log_server s
WHERE j.id = s.job_id
ORDER BY s.id DESC
LIMIT 1
)AS S_E,
(
SELECT u.id
FROM job_log_user u
WHERE j.id = u.job_id
ORDER BY u.id DESC
LIMIT 1
)AS U_E
FROM job j
) m
LEFT JOIN job_log_server js ON js.id = m.S_E
LEFT JOIN job_log_user ju ON ju.id = m.U_E
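The second form above (pick the latest log id per job, then join back for the full rows) can be tried on the sample data. This is a sketch run through Python's sqlite3 module, not MySQL. As an assumption of the example, it orders by created with id as a tie-breaker instead of by id alone, and the function name latest_logs is made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE job_log_server (id INTEGER PRIMARY KEY, job_id INTEGER NOT NULL,
                             event TEXT NOT NULL, ip TEXT NOT NULL, created TEXT NOT NULL);
CREATE TABLE job_log_user (id INTEGER PRIMARY KEY, job_id INTEGER NOT NULL,
                           event TEXT NOT NULL, user_id INTEGER NOT NULL, created TEXT NOT NULL);
INSERT INTO job VALUES (1, 'Job A'), (2, 'Job B'), (3, 'Job C'), (4, 'Job D');
INSERT INTO job_log_server VALUES
    (1, 2, 'Job B Event 1',  '127.0.0.1', '2000-01-01 00:00:01'),
    (2, 2, 'Job B Event 2',  '127.0.0.1', '2000-01-01 00:00:02'),
    (3, 2, 'Job B Event 3*', '127.0.0.1', '2000-01-01 00:00:03'),
    (4, 3, 'Job C Event 1*', '127.0.0.1', '2000-01-01 00:00:04');
INSERT INTO job_log_user VALUES
    (1, 1, 'Job A Event 1',  5, '2000-01-01 00:00:01'),
    (2, 1, 'Job A Event 2*', 5, '2000-01-01 00:00:02'),
    (3, 2, 'Job B Event 1*', 5, '2000-01-01 00:00:03'),
    (4, 4, 'Job D Event 1',  5, '2000-01-01 00:00:04'),
    (5, 4, 'Job D Event 2',  5, '2000-01-01 00:00:05'),
    (6, 4, 'Job D Event 3*', 5, '2000-01-01 00:00:06');
"""

QUERY = """
SELECT m.id, m.name, js.event, js.ip, ju.event, ju.user_id
FROM (
    SELECT j.id, j.name,
           -- correlated subqueries: id of the newest log row for this job, or NULL
           (SELECT s.id FROM job_log_server AS s WHERE s.job_id = j.id
            ORDER BY s.created DESC, s.id DESC LIMIT 1) AS s_id,
           (SELECT u.id FROM job_log_user AS u WHERE u.job_id = j.id
            ORDER BY u.created DESC, u.id DESC LIMIT 1) AS u_id
    FROM job AS j
) AS m
LEFT JOIN job_log_server AS js ON js.id = m.s_id
LEFT JOIN job_log_user   AS ju ON ju.id = m.u_id
ORDER BY m.id
"""


def latest_logs():
    """Return one row per job with the latest server and user log fields (None if absent)."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in latest_logs():
    print(row)
```

Each job keeps exactly one row, and the events marked with * in the sample data are the ones selected. Joining back on the primary key is what allows several fields to be returned per log table.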

SQL "where IN" query in a many to many relation of 2 tables

This may be a relatively simple question, but I cannot find a solution to it. It concerns two tables in a MANY TO MANY relation, so there's a third table between them. The schema is below:
CREATE TABLE `options` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `options` (`id`, `name`) VALUES
(1, 'something'),
(2, 'thing'),
(3, 'some option'),
(4, 'other thing'),
(5, 'vacuity'),
(6, 'etc');
CREATE TABLE `person` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `person` (`id`, `name`) VALUES
(1, 'ROBERT'),
(2, 'BOB'),
(3, 'FRANK'),
(4, 'JOHN'),
(5, 'PAULINE'),
(6, 'VERENA'),
(7, 'MARCEL'),
(8, 'PAULO'),
(9, 'SCHRODINGER');
CREATE TABLE `person_option_link` (
`person_id` int(11) NOT NULL,
`option_id` int(11) NOT NULL,
UNIQUE KEY `person_id` (`person_id`,`option_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `person_option_link` (`person_id`, `option_id`) VALUES
(1, 1),
(2, 1),
(2, 2),
(3, 2),
(3, 3),
(3, 4),
(3, 5),
(4, 1),
(4, 3),
(4, 6),
(5, 3),
(5, 4),
(5, 5),
(6, 1),
(7, 2),
(8, 3),
(9, 4),
(5, 6);
The idea is as follows: I would like to retrieve all people who have a link to both option_id=1 AND option_id=3.
The expected result should be one person: John.
But I tried something like this, which doesn't work because it also returns people who have 1 OR 3:
SELECT *
FROM person p
LEFT JOIN person_option_link l ON p.id = l.person_id
WHERE l.option_id IN ( 1, 3 )
What is the best practice in this case?
EDIT: I need to focus on another important point.
And what if we add a new condition with NOT IN? like:
SELECT *
FROM person p
LEFT JOIN person_option_link l ON p.id = l.person_id
WHERE l.option_id IN ( 3, 4 )
AND l.option_id NOT IN ( 6 )
In this case, the result should be FRANK, because PAULINE, who also has 3 and 4, has option 6 and we don't want that.
Thanks!
This is a Relational Division Problem.
SELECT p.id, p.name
FROM person p
INNER JOIN person_option_link l
ON p.id = l.person_id
WHERE l.option_id IN ( 1, 3 )
GROUP BY p.id, p.name
HAVING COUNT(*) = 2
If a unique constraint was not enforced on option_id for every person_id, the DISTINCT keyword is required so that only unique option_id values are counted:
SELECT p.id, p.name
FROM person p
INNER JOIN person_option_link l
ON p.id = l.person_id
WHERE l.option_id IN ( 1, 3 )
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT l.option_id) = 2
Use GROUP BY and COUNT:
SELECT p.id, p.name
FROM person p
LEFT JOIN person_option_link l ON p.id = l.person_id
WHERE l.option_id IN ( 1, 3 )
GROUP BY p.id, p.name
HAVING COUNT(Distinct l.option_id) = 2
I prefer using COUNT DISTINCT in case you could have the same option id multiple times.
Good luck.
It may not be the best option, but you could use a 'double join' to the person_option_link table:
SELECT *
FROM person AS p
JOIN person_option_link AS l1 ON p.id = l1.person_id AND l1.option_id = 1
JOIN person_option_link AS l2 ON p.id = l2.person_id AND l2.option_id = 3
This ensures that there is simultaneously a row with option ID of 1 and another with option ID of 3 for the given user.
The GROUP BY alternatives certainly work; they might well be quicker too (but you'd need to scrutinize query plans to be sure). The GROUP BY alternatives scale better to handle more values: for example, a list of the users with option IDs 2, 3, 5, 7, 11, 13, 17, 19 is fiddly with this variant but the GROUP BY variants work without structural changes to the query. You can also use the GROUP BY variants to select users with at least 4 of the 8 values which is substantially infeasible using this technique.
Using the GROUP BY does require a slight restatement (or rethinking) of the query, though, to:
How can I select people who have 2 of the option IDs in the set {1, 3}?
How can I select people who have 8 of the option IDs in the set {2, 3, 5, 7, 11, 13, 17, 19}?
How can I select people who have at least 4 of the option IDs in the set {2, 3, 5, 7, 11, 13, 17, 19}?
For the "does not have these IDs" part of the question, simply add a NOT IN condition on the person:
WHERE person_id NOT IN
(
SELECT person_id
FROM person_option_link
WHERE option_id = 4
)
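Putting the GROUP BY / HAVING COUNT(DISTINCT ...) approach together with the NOT IN exclusion, here is a minimal sketch on the sample data (with the INSERT written as valid SQL), run through Python's sqlite3 module. The helper people_with and its parameters are made up for the example, and the exclusion uses option 6 as in the question's edit.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person_option_link (
    person_id INTEGER NOT NULL,
    option_id INTEGER NOT NULL,
    UNIQUE (person_id, option_id)
);
INSERT INTO person VALUES
    (1, 'ROBERT'), (2, 'BOB'), (3, 'FRANK'), (4, 'JOHN'), (5, 'PAULINE'),
    (6, 'VERENA'), (7, 'MARCEL'), (8, 'PAULO'), (9, 'SCHRODINGER');
INSERT INTO person_option_link VALUES
    (1, 1), (2, 1), (2, 2), (3, 2), (3, 3), (3, 4), (3, 5), (4, 1), (4, 3),
    (4, 6), (5, 3), (5, 4), (5, 5), (6, 1), (7, 2), (8, 3), (9, 4), (5, 6);
"""


def _marks(values):
    """Build a '?, ?, ?' placeholder list matching the number of values."""
    return ", ".join("?" for _ in values)


def people_with(required, excluded=()):
    """Names of people linked to ALL `required` option ids and to NONE of `excluded`."""
    required = sorted(set(required))
    excluded = sorted(set(excluded))
    if not required:
        raise ValueError("at least one required option id is needed")

    sql = (
        "SELECT p.name FROM person AS p "
        "JOIN person_option_link AS l ON l.person_id = p.id "
        f"WHERE l.option_id IN ({_marks(required)}) "
    )
    params = list(required)
    if excluded:
        # drop anyone who has at least one of the unwanted options
        sql += (
            "AND p.id NOT IN (SELECT person_id FROM person_option_link "
            f"WHERE option_id IN ({_marks(excluded)})) "
        )
        params += excluded
    # a person qualifies only if every required option was matched
    sql += "GROUP BY p.id, p.name HAVING COUNT(DISTINCT l.option_id) = ? ORDER BY p.name"
    params.append(len(required))

    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return [name for (name,) in con.execute(sql, params)]
    finally:
        con.close()


print(people_with([1, 3]))        # ['JOHN']
print(people_with([3, 4], [6]))   # ['FRANK']
```

The HAVING count is derived from the length of the required list, so the same query shape works for any number of option IDs without structural changes.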