Select rows grouped by a column having max aggregate - mysql

Given the following data set, how would I find the email addresses that were references for the most ApplicationIDs that have an "Accepted" decision?
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref11#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
Fiddle of same: http://sqlfiddle.com/#!9/03bcf2/1
Initially, I was using LIMIT 1 and ORDER BY CountDecision DESC, like so:
SELECT er.email, COUNT(fd.Decision) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
ORDER BY CountDecision DESC
LIMIT 1
;
However, it occurred to me that several email addresses could tie for the most "Accepted" decisions, and all but one of them would be filtered out (is that the right phrasing?) by the LIMIT keyword.
I then tried a variation on the above query, replacing the ORDER BY and LIMIT lines with:
HAVING MAX(CountDecision)
But I realized that that's only half a statement: MAX(CountDecision) needs to be compared to something. I just don't know what.
Any pointers would be much appreciated. Thanks!
Note: this is for a homework assignment.
Update: To be clear, I'm trying to find the value and count of Emails from EmailReferences. However, I only want rows that have FinalDecision.Decision = 'Accepted' (on matching ApplicationIDs). Based on my data, the result should be:
Email | CountDecision
---------------+--------------
ref10#test.org | 2
ref11#test.org | 2

For example...
SELECT a.*
FROM
  ( SELECT x.email
         , COUNT(*) total
    FROM EmailReferences x
    JOIN FinalDecision y
      ON y.applicationid = x.applicationid
    WHERE y.decision = 'Accepted'
    GROUP BY x.email
  ) a
JOIN
  ( SELECT COUNT(*) total
    FROM EmailReferences x
    JOIN FinalDecision y
      ON y.applicationid = x.applicationid
    WHERE y.decision = 'Accepted'
    GROUP BY x.email
    ORDER BY total DESC
    LIMIT 1
  ) b
  ON b.total = a.total;
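The HAVING attempt from the question can also be completed directly: each group's count has to be compared against the maximum of all group counts. A minimal sketch, assuming the tables from the question; the aliases er2, fd2 and t and the column name total are only illustrative:

```sql
-- Keep every email whose accepted-count equals the highest accepted-count.
SELECT er.Email, COUNT(*) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.Email
HAVING COUNT(*) = (
    -- Highest number of accepted applications for any single email.
    SELECT MAX(t.total)
    FROM (
        SELECT COUNT(*) AS total
        FROM EmailReferences AS er2
        JOIN FinalDecision AS fd2 ON er2.ApplicationID = fd2.ApplicationID
        WHERE fd2.Decision = 'Accepted'
        GROUP BY er2.Email
    ) AS t
)
ORDER BY er.Email;
```

With the question's sample data this keeps both tied addresses, since each of them has two accepted applications.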

MySQL before version 8 lacks window functions; from version 8 on, this becomes easier. So for future reference, or for databases like MariaDB that already have window functions:
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref30#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
select email, CountDecision
from (
SELECT er.email, COUNT(fd.Decision) AS CountDecision
, max(COUNT(fd.Decision)) over() maxCountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
) d
where CountDecision = maxCountDecision
email | CountDecision
:------------- | ------------:
ref10#test.org | 2
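Where window functions are available, RANK() is another way to keep ties. This is a sketch under that assumption, not a tested alternative; the alias names ranked and rnk are made up for illustration:

```sql
-- Rank the emails by their accepted-count; tied counts share the same rank.
SELECT Email, CountDecision
FROM (
    SELECT er.Email,
           COUNT(*) AS CountDecision,
           RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM EmailReferences AS er
    JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
    WHERE fd.Decision = 'Accepted'
    GROUP BY er.Email
) AS ranked
WHERE rnk = 1;
```

Rows that tie for the highest count all receive rank 1, so none of them is dropped the way LIMIT 1 would drop them.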

Related

Check if a pair of records belong to multiple group IDs

I have a table that contains 2 IDs - UserID and GroupID. I need to pull a list of all UserIDs that "share" the same GroupID at least 4 times.
So, based on the following data set:
CREATE TABLE IF NOT EXISTS `tableA` (
`UserID` int(11) unsigned NOT NULL,
`GroupID` int(11) unsigned NOT NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `tableA` (`UserID`, `GroupID`) VALUES
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(1, 2),
(2, 2),
(3, 2),
(1, 3),
(2, 3),
(3, 3),
(1, 4),
(2, 4),
(3, 4),
(1, 5),
(3, 5);
I'm trying to generate the following result:
UserID A | UserID B | NumberOfOccurrences
---------+----------+--------------------
1 | 2 | 4
2 | 3 | 4
1 | 3 | 5
I've created an SQLFiddle for it. I've tried to achieve this via JOINs and sub-queries, but I'm not entirely sure how to properly proceed with something like this.
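Do a self join and GROUP BY each pair of users. Use HAVING to make sure the pair has at least 4 common GroupIDs, and select count(*) to get the NumberOfOccurrences column.
select a1.userid as UserIDA, a2.userid as UserIDB, count(*) as NumberOfOccurrences
from tableA a1
join tableA a2
on a1.GroupID = a2.GroupID and a1.userid < a2.userid
group by a1.userid, a2.userid
having count(*) >= 4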

How do I build a query to get the latest row per user where a third criteria is in a separate table?

I have three tables
CREATE TABLE `LineItems` (
`LineItemID` int NOT NULL,
`OrderID` int NOT NULL,
`ProductID` int NOT NULL
);
INSERT INTO `LineItems` (`LineItemID`, `OrderID`, `ProductID`) VALUES
(1, 1, 2),
(2, 1, 1),
(3, 2, 3),
(4, 2, 4),
(5, 3, 1),
(6, 4, 2),
(7, 5, 4),
(8, 5, 2),
(9, 5, 3),
(10, 6, 1),
(11, 6, 4),
(12, 7, 4),
(13, 7, 1),
(14, 7, 2),
(15, 8, 1),
(16, 9, 3),
(17, 9, 4),
(18, 10, 3);
CREATE TABLE `Orders` (
`OrderID` int NOT NULL,
`UserID` int NOT NULL,
`OrderDate` datetime NOT NULL
);
INSERT INTO `Orders` (`OrderID`, `UserID`, `OrderDate`) VALUES
(1, 21, '2021-05-01 00:00:00'),
(2, 21, '2021-05-03 00:00:00'),
(3, 24, '2021-05-06 00:00:00'),
(4, 23, '2021-05-12 00:00:00'),
(5, 21, '2021-05-14 00:00:00'),
(6, 22, '2021-05-16 00:00:00'),
(7, 23, '2021-05-20 00:00:00'),
(8, 21, '2021-05-22 00:00:00'),
(9, 24, '2021-05-23 00:00:00'),
(10, 23, '2021-05-26 00:00:00');
CREATE TABLE `Products` (
`ProductID` int NOT NULL,
`ProductTitle` VARCHAR(250) NOT NULL,
`ProductType` enum('doors','windows','flooring') NOT NULL
);
INSERT INTO `Products` (`ProductID`, `ProductTitle`, `ProductType`) VALUES
(1, 'French Doors','doors'),
(2, 'Sash Windows','windows'),
(3, 'Sliding Doors','doors'),
(4, 'Parquet Floor','flooring');
SQL Fiddle:
Orders - contains an order date and a user id
LineItems - Foreign key to the orders table, contains product ids that are in the order
Products - Contains details of the products (including if they are a door, window, or flooring)
I have figured out how to get the latest order per user with
SELECT O.* FROM Orders O LEFT JOIN Orders O2
ON O2.UserID=O.UserID AND O.OrderDate < O2.OrderDate
WHERE O2.OrderDate IS NULL;
This works fine and is included in the SQL fiddle, along with a query that returns a complete picture for reference.
I am trying to figure out how to get the latest order with flooring per user, but I'm not having any luck.
In the SQL fiddle linked above, the intended output for what I am after would be
OrderID | UserID | OrderDate
6 | 22 | 2021-05-16T00:00:00Z
5 | 21 | 2021-05-14T00:00:00Z
9 | 24 | 2021-05-23T00:00:00Z
7 | 23 | 2021-05-20T00:00:00Z
EDIT: To clarify, in the intended result, two rows (for users 21 and 23) are different than in the query that gets just latest order per user. This is because order IDs 8 and 10 (from the latest order per user query) do not include flooring. The intended query has to find the latest order with flooring from each user to return in the result set.
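You need to add the LineItems and Products tables to your query to find orders where flooring was purchased. The self-join must be restricted to flooring orders as well; otherwise a later order without flooring hides the user's latest flooring order:
SELECT DISTINCT O.*
FROM Orders O
INNER JOIN LineItems i
ON i.OrderID = O.OrderID
INNER JOIN Products p
ON p.ProductID = i.ProductID AND
p.ProductType = 'flooring'
LEFT JOIN (
SELECT O2.UserID, O2.OrderDate
FROM Orders O2
INNER JOIN LineItems i2
ON i2.OrderID = O2.OrderID
INNER JOIN Products p2
ON p2.ProductID = i2.ProductID
WHERE p2.ProductType = 'flooring'
) F
ON F.UserID = O.UserID AND
O.OrderDate < F.OrderDate
WHERE F.OrderDate IS NULL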
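On MySQL 8.0 or later (an assumption, since the question does not state a version), the "latest flooring order per user" can also be sketched with ROW_NUMBER(). The aliases t and rn are illustrative only:

```sql
-- Number each user's flooring orders from newest to oldest, then keep the newest.
SELECT OrderID, UserID, OrderDate
FROM (
    SELECT O.OrderID, O.UserID, O.OrderDate,
           ROW_NUMBER() OVER (PARTITION BY O.UserID
                              ORDER BY O.OrderDate DESC) AS rn
    FROM Orders O
    -- Only consider orders that contain at least one flooring product.
    WHERE EXISTS (
        SELECT 1
        FROM LineItems i
        JOIN Products p ON p.ProductID = i.ProductID
        WHERE i.OrderID = O.OrderID
          AND p.ProductType = 'flooring'
    )
) t
WHERE rn = 1;
```

For the sample data this should return orders 5, 6, 7 and 9. If a user had two flooring orders with the same OrderDate, ROW_NUMBER() would keep only one of them, chosen arbitrarily.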

MySQL update multiple rows based on id

In a table in MySQL I'm trying to update a specific field based on the id.
The table has more than 5,000 rows and many fields. One of the fields is "id", and another, which I want to update, is called "category". Right now "category" is NULL in every row, and I want to update all of them.
My backup MySQL file, which I want to use, contains only "id" and "category", like this:
INSERT INTO `products` (`id`, `category`) VALUES
(3, 1),
(4, 1),
(5, 2),
(6, 1),
(7, 5),
(8, 1),
(9, 6),
(10, 1),
...
(5000, 3);
I want to update the "category" field in my table according to the ids in this list, and because there are more than 5,000 rows I don't want to change each record manually.
Right now all the "category" fields in my table are NULL, and I want to fill them in using the file that I have.
The easiest way is to use a temporary table:
CREATE TEMPORARY TABLE temp_products (id int, category int ) ;
Then
INSERT INTO `temp_products` (`id`, `category`) VALUES
(3, 1),
(4, 1),
(5, 2),
(6, 1),
(7, 5),
(8, 1),
(9, 6),
(10, 1),
...
(5000, 3);
Now you just have to use an update with an inner join:
Update products p
INNER JOIN temp_products t_p ON t_p.id = p.id
SET p.category = t_p.category
If you want, you can add a WHERE clause:
Update products p
INNER JOIN temp_products t_p ON t_p.id = p.id
SET p.category = t_p.category
WHERE p.category IS NULL
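After the update it can be worth checking that nothing was missed. The two queries below are a hedged sketch that reuses the table names from this answer; the column alias still_null is invented for illustration:

```sql
-- ids that are in the import list but do not exist in products
-- (these rows could not be updated by the inner join)
SELECT t_p.id
FROM temp_products t_p
LEFT JOIN products p ON p.id = t_p.id
WHERE p.id IS NULL;

-- number of products that still have no category after the update
SELECT COUNT(*) AS still_null
FROM products
WHERE category IS NULL;
```

Both queries have to run in the same session as the CREATE TEMPORARY TABLE statement, because a temporary table is only visible to the session that created it.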
Perhaps a better solution for you would be:
First of all, create a temporary staging table (a regular table, dropped at the end):
CREATE TABLE `products_tmp` (
`id` INT NOT NULL,
`category` INT NOT NULL,
PRIMARY KEY (`id`)
);
After that, insert the data into the newly created table:
INSERT INTO `products_tmp` (`id`, `category`) VALUES
(3, 1),
(4, 1),
(5, 2),
(6, 1),
(7, 5),
(8, 1),
(9, 6),
(10, 1),
...
(5000, 3);
After this, you can update all rows in your original table:
UPDATE products p
JOIN products_tmp pt ON p.id = pt.id
SET p.category = pt.category;
After this, you can drop the temporary table:
drop table products_tmp;

Complex SQL query issue (MYSQL Workbench 6.3)

I am having trouble figuring out how to write this query.
Let me explain the situation.
So, the question,
I need to display all the player names who have scored a score greater than 99, who have played matches in all the same grounds where a certain player (e.g. pid = 1) has played and has scored a score greater than 99.
(They could have played in other grounds besides the one pid = 1 has played, but the minimum requirement being they must have played in all the same grounds as him).
I have a database, which consist of 3 tables; player, ground, matches. And following data.
create database test1;
use test1;
CREATE TABLE `player` (
`pid` int(11) NOT NULL,
`pname` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `ground` (
`gid` int(11) NOT NULL,
`gname` varchar(20) DEFAULT NULL,
`country` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `matches` (
`pid` int(11) DEFAULT NULL,
`gid` int(11) DEFAULT NULL,
`score` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `player` ADD PRIMARY KEY (`pid`);
ALTER TABLE `ground` ADD PRIMARY KEY (`gid`);
ALTER TABLE `matches`
ADD KEY `gid` (`gid`),
ADD KEY `pid` (`pid`);
INSERT INTO `player` (`pid`, `pname`) VALUES
(1, 'afridi'),
(2, 'kohli'),
(3, 'imam'),
(4, 'fawad'),
(5, 'baven'),
(6, 'awais');
INSERT INTO `ground` (`gid`, `gname`, `country`) VALUES
(1, 'Qaddafi', 'PK'),
(2, 'National', 'PK'),
(3, 'Eden Garden', 'IND'),
(4, 'Lords', 'ENG'),
(5, 'MCG', 'AUS'),
(6, 'Arbab Nayyaz', 'PK');
INSERT INTO `matches` (`pid`, `gid`, `score`) VALUES
(1, 2, 23),
(1, 1, 111),
(2, 3, 107),
(2, 5, 103),
(1, 3, 117),
(1, 4, 55),
(1, 5, 101),
(1, 6, 44),
(2, 6, 103),
(2, 4, 103),
(2, 2, 117),
(2, 1, 103),
(4, 1, 77),
(3, 1, 13),
(5, 2, 22),
(3, 2, 101),
(3, 3, 101),
(5, 1, 101),
(5, 4, 101),
(5, 5, 101),
(6, 1, 101),
(6, 2, 101),
(6, 3, 101),
(6, 4, 101),
(6, 5, 101),
(6, 4, 101);
A relatively simple database.
I've written the following query, which displays the names of 4 players. It displays all the players who have played in any of the same grounds as pid = 1. How do I display only those players who have played in all the same grounds as pid = 1?
select p.pname
from player p
join matches mn on mn.pid = p.pid
where (p.pid != 1) and (mn.score > 99) and exists (select m.gid from matches m where (m.pid = 1) and (mn.gid = m.gid))
group by pname;
According to the data provided in the tables,
Afridi (pid = 1) has scored a century in the following grounds: 1, 3, and 5.
Players (pid) 2, 3, 5, and 6 have each scored a century in at least one of grounds 1, 3, and 5.
These players have made centuries in other grounds as well, but this query displays all players who have played in any of the 3 grounds.
The players could have played in other grounds as well, but the minimum requirement is that they have played in all of grounds 1, 3, and 5.
So what I need is only those players who have played in all of the same grounds, i.e. grounds 1, 3, and 5.
From the data in the matches table, we can see that only 2 players have played in all the same grounds: pid = 2 and 6.
Any idea how to go about this?
I think this query should do what you want. It creates a table of grounds where the first player has played and made a century (g1), and joins that to the players who have also played at those grounds. If the number of different grounds that the other player has played at is the same as the number of different grounds that the first player has played at, they must have both played at the same set of grounds. Note there are a couple of places (in both subqueries) where you need to set the player id for comparison.
SELECT p.pname
FROM (SELECT gid, pid FROM matches WHERE pid=1 AND score >= 100) g1
LEFT JOIN matches m
ON m.gid = g1.gid AND m.pid != g1.pid
JOIN player p
ON p.pid = m.pid
GROUP BY m.pid
HAVING COUNT(DISTINCT m.gid) = (SELECT COUNT(DISTINCT gid) FROM matches WHERE pid=1 AND score >= 100)
ORDER BY m.pid
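A stricter reading of the question is that the other players must also have scored more than 99 at each of those grounds, not merely played there. Under that assumption, the classic "relational division" pattern with a double NOT EXISTS is one possible sketch (the alias ref is illustrative):

```sql
-- Players for whom there is no ground where pid = 1 scored a century
-- and they themselves did not.
SELECT p.pname
FROM player p
WHERE p.pid <> 1
  AND NOT EXISTS (
      SELECT 1
      FROM matches ref
      WHERE ref.pid = 1
        AND ref.score > 99
        AND NOT EXISTS (
            SELECT 1
            FROM matches m
            WHERE m.pid = p.pid
              AND m.gid = ref.gid
              AND m.score > 99
        )
  );
```

With the sample data this should list kohli and awais (pid 2 and 6). Note that if the reference player had no centuries at all, the condition would be vacuously true and every other player would be returned.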

SELECT data based on result of previous row in table

I have a database of students.
CREATE TABLE classlist
(`id` int, `studentid` int, `subjectid` int, `presentid` int)
;
CREATE TABLE student
(`id` int, `name` varchar(4))
;
CREATE TABLE subject
(`id` int, `name` varchar(4))
;
CREATE TABLE classStatus
(`id` int, `name` varchar(8))
;
INSERT INTO classlist
(`id`, `studentid`, `subjectid`, `presentid`)
VALUES
(1, 111, 1, 1),
(2, 222, 3, 0),
(3, 333, 2, 1),
(4, 111, 4, 1),
(5, 111, 1, 0),
(6, 222, 3, 0),
(7, 333, 2, 1),
(8, 111, 4, 1),
(9, 111, 2, 0),
(10, 111, 4, 1),
(11, 111, 1, 1),
(12, 333, 3, 1),
(13, 333, 2, 1),
(14, 333, 3, 1)
;
INSERT INTO student
(`id`, `name`)
VALUES
(111, 'John'),
(222, 'Kate'),
(333, 'Matt')
;
INSERT INTO subject
(`id`, `name`)
VALUES
(1, 'MATH'),
(2, 'ENG'),
(3, 'SCI'),
(4, 'GEO')
;
INSERT INTO classStatus
(`id`, `name`)
VALUES
(0, 'Absent'),
(1, 'Present')
;
And I have a query which shows how many times they have been present or absent.
SELECT
studentid,
students.name AS NAME,
SUM(presentid = 1) AS present,
SUM(presentid = 0) AS absent
FROM classlist
INNER JOIN student as students ON classlist.studentid=students.id
GROUP BY studentid, NAME
See this fiddle below.
http://sqlfiddle.com/#!2/fe0b0/1
Looking at this sample data, there seems to be a trend: after someone attends subjectid 4, they often don't come to the next class. How can I capture this in a query? I want to show ONLY data where the previous subjectid = 4. So in my sample data, the rows matching my criteria would be:
(5, 111, 1, 0),
(9, 111, 2, 0),
(11, 111, 1, 1),
as each of these rows is the next row for a studentid who had subjectid = 4.
My output would be
| STUDENTID | NAME | PRESENT | ABSENT|
| 111 | John | 1 | 2 |
To get the next class for a student, use a correlated subquery:
select cl.*,
(select min(cl2.id) from classlist cl2 where cl2.studentid = cl.studentid and cl2.id > cl.id) as nextcl
from classlist cl
Plugging this into your example query tells you who is present and absent for the next class:
SELECT students.id, students.name AS NAME,
SUM(cl.presentid = 1) AS present, SUM(cl.presentid = 0) AS absent,
sum(clnext.presentid = 1) as presentnext
FROM (select cl.*,
(select min(cl2.id) from classlist cl2 where cl2.studentid = cl.studentid and cl2.id > cl.id) as nextcl
from classlist cl
) cl INNER JOIN
student as students
ON cl.studentid = students.id left outer join
classlist clnext
on cl.nextcl = clnext.id
GROUP BY students.id, students.NAME
Add a where cl.subjectid = 4 to get the answer for subject 4.
I fixed the query.
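To get exactly the PRESENT / ABSENT layout from the question, the counts can be taken from the next class only. A minimal sketch built on the same correlated subquery; the aliases s and nxt are made up for illustration:

```sql
-- For every class with subjectid = 4, look up the same student's next class
-- (smallest later id) and count presence/absence in that next class.
SELECT s.id AS studentid,
       s.name,
       SUM(nxt.presentid = 1) AS present,
       SUM(nxt.presentid = 0) AS absent
FROM (SELECT cl.*,
             (SELECT MIN(cl2.id)
              FROM classlist cl2
              WHERE cl2.studentid = cl.studentid
                AND cl2.id > cl.id) AS nextcl
      FROM classlist cl
      WHERE cl.subjectid = 4
     ) cl
JOIN classlist nxt ON nxt.id = cl.nextcl
JOIN student s ON s.id = cl.studentid
GROUP BY s.id, s.name;
```

For the sample data the next classes are rows 5, 9 and 11, so this should give John 1 present and 2 absent. The inner join drops a subject 4 class that has no later class for the same student.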
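A quick and dirty solution could be to get the Classlist.Id for all lines where subjectid=4 (let's call them n), then select all the lines where Id = n+1. This only works if each student's next class always has the next consecutive id, which happens to be the case in the sample data.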