MySQL: Select first row with value in interval

With the following table:
CREATE TABLE table1 (`id` INT, `num` INT);
INSERT INTO table1 (`id`, `num`) VALUES
(1, 1),
(1, 5),
(1, 7),
(1, 12),
(1, 22),
(1, 23),
(1, 24),
(2, 1),
(2, 6);
How do I select one row for each num interval of 5 (i.e., the first row for [0,5), the first for [5,10), the first for [10,15), etc.) for a given id? Is this possible with a MySQL query, or must I process it in a programming language later?
For reference, the output I'd want for id=1:
(1, 1), (1,5), (1,12), (1,22)

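Here is a short query. For non-negative integers, ceiling((num + 1)/5) numbers the intervals [0,5), [5,10), [10,15), ... as 1, 2, 3, ..., so min(num) picks the first value in each: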
select min(num), ceiling((num + 1)/5)
from table1
where id = 1
group by ceiling((num + 1)/5);
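A runnable sketch of the same grouping idea, using Python's built-in `sqlite3` module instead of a MySQL server (an assumption made only so the example is self-contained). SQLite does not reliably provide `CEILING`, but dividing two integers truncates, so `num / 5` already gives the bucket index for non-negative `num`; in MySQL, `FLOOR(num / 5)` or `num DIV 5` would play that role. "First row" is taken to mean the row with the smallest `num` in each interval, and the helper name `first_per_interval` is hypothetical.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO table1 (id, num) VALUES (?, ?)",
    [(1, 1), (1, 5), (1, 7), (1, 12), (1, 22), (1, 23), (1, 24), (2, 1), (2, 6)],
)


def first_per_interval(conn, wanted_id):
    """Return (id, smallest num) for each [0,5), [5,10), ... interval of one id.

    Hypothetical helper. SQLite integer division truncates, so num / 5 maps
    0-4 -> 0, 5-9 -> 1, 10-14 -> 2, ... for non-negative num.
    """
    return conn.execute(
        """
        SELECT id, MIN(num)
        FROM table1
        WHERE id = ?
        GROUP BY id, num / 5
        ORDER BY MIN(num)
        """,
        (wanted_id,),
    ).fetchall()


print(first_per_interval(conn, 1))  # [(1, 1), (1, 5), (1, 12), (1, 22)]
```

If other columns of the "first" row were needed, the grouped result would have to be joined back to the table on (id, num).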

Related

Check if a pair of records belong to multiple group IDs

I have a table that contains 2 IDs, UserID and GroupID. I need to pull a list of all pairs of UserIDs that "share" the same GroupID at least 4 times.
So, based on the following data set:
CREATE TABLE IF NOT EXISTS `tableA` (
`UserID` int(11) unsigned NOT NULL,
`GroupID` int(11) unsigned NOT NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `tableA` (`UserID`, `GroupID`) VALUES
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(1, 2),
(2, 2),
(3, 2),
(1, 3),
(2, 3),
(3, 3),
(1, 4),
(2, 4),
(3, 4),
(1, 5),
(3, 5);
I'm trying to generate the following result:
UserID A | UserID B | NumberOfOccurrences
---------+----------+--------------------
1        | 2        | 4
2        | 3        | 4
1        | 3        | 5
I've created an SQLFiddle for it. I've tried to achieve this via JOINs and sub-queries, but I'm not entirely sure how to properly proceed with something like this.
Do a self join, GROUP BY the pair of users, and use HAVING to make sure they share at least 4 GroupIDs:
select a1.userid, a2.userid, count(*) as NumberOfOccurrences
from tablea a1
join tablea a2
on a1.GroupID = a2.GroupID and a1.userid < a2.userid
group by a1.userid, a2.userid
having count(*) >= 4;
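To check the self-join against the sample data, here is a small sketch that runs it through Python's `sqlite3` module in place of MySQL (an assumption for portability). The ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (UserID INTEGER NOT NULL, GroupID INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO tableA (UserID, GroupID) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1),
     (1, 2), (2, 2), (3, 2),
     (1, 3), (2, 3), (3, 3),
     (1, 4), (2, 4), (3, 4),
     (1, 5), (3, 5)],
)

# Each joined row pairs two different users who are in the same group.
# a1.UserID < a2.UserID drops self-pairs and keeps one copy of each pair,
# so COUNT(*) is the number of groups the two users share.
pairs = conn.execute(
    """
    SELECT a1.UserID, a2.UserID, COUNT(*) AS NumberOfOccurrences
    FROM tableA a1
    JOIN tableA a2
      ON a1.GroupID = a2.GroupID AND a1.UserID < a2.UserID
    GROUP BY a1.UserID, a2.UserID
    HAVING COUNT(*) >= 4
    ORDER BY a1.UserID, a2.UserID
    """
).fetchall()

print(pairs)  # [(1, 2, 4), (1, 3, 5), (2, 3, 4)]
```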

Lead window function in mysql to find sales

Given this table, I would like to know, for each day, how many different customers made a sale on both date t and date t+1.
-- create a table
CREATE TABLE sales_t(
id INTEGER PRIMARY KEY,
d_date date NOT NULL,
sale INT NOT NULL,
customer_n INT NOT NULL
);
-- insert some values
INSERT INTO sales_t VALUES (1, '2021-06-30', 12, 1);
INSERT INTO sales_t VALUES (2, '2021-06-30', 22, 5);
INSERT INTO sales_t VALUES (3, '2021-06-30', 111, 3);
INSERT INTO sales_t VALUES (4, '2021-07-01', 27, 1);
INSERT INTO sales_t VALUES (5, '2021-07-01', 90, 4);
INSERT INTO sales_t VALUES (6, '2021-07-01', 33, 3);
INSERT INTO sales_t VALUES (7, '2021-07-01', 332, 3);
The result for date 2021-06-30 is 2 because customer 1 and 3 made a sale in t and t+1.
Date       | sale_t_and_t+1
-----------+---------------
2021-06-30 | 2
2021-07-01 | 0
Use the LEAD() window function to get each customer's next sale date, turn it into a flag that is 1 when that date is the following day (0 or NULL otherwise), remove duplicate rows with DISTINCT, and aggregate per date:
SELECT d_date, COALESCE(SUM(flag), 0) `sale_t_and_t+1`
FROM (
SELECT DISTINCT d_date, customer_n,
LEAD(d_date) OVER (PARTITION BY customer_n ORDER BY d_date) = d_date + INTERVAL 1 DAY flag
FROM sales_t
) t
GROUP BY d_date;
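The same LEAD() approach can be tried without a MySQL server in the sketch below, which uses Python's `sqlite3` (an assumption for portability; window functions need SQLite 3.25 or newer). SQLite has no `INTERVAL` syntax, so `date(d_date, '+1 day')` stands in for `d_date + INTERVAL 1 DAY`, and the second sale for customer 3 on 2021-07-01 gets id 7 so the primary key stays unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_t (id INTEGER PRIMARY KEY, d_date TEXT NOT NULL,"
    " sale INTEGER NOT NULL, customer_n INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO sales_t VALUES (?, ?, ?, ?)",
    [(1, "2021-06-30", 12, 1), (2, "2021-06-30", 22, 5), (3, "2021-06-30", 111, 3),
     (4, "2021-07-01", 27, 1), (5, "2021-07-01", 90, 4), (6, "2021-07-01", 33, 3),
     (7, "2021-07-01", 332, 3)],
)

rows = conn.execute(
    """
    SELECT d_date, COALESCE(SUM(flag), 0) AS "sale_t_and_t+1"
    FROM (
        -- flag is 1 when the next sale date of the customer is exactly one
        -- day later, 0 for any other date, NULL when there is no next sale.
        SELECT DISTINCT d_date, customer_n,
               LEAD(d_date) OVER (PARTITION BY customer_n ORDER BY d_date)
                   = date(d_date, '+1 day') AS flag
        FROM sales_t
    ) t
    GROUP BY d_date
    ORDER BY d_date
    """
).fetchall()

print(rows)  # [('2021-06-30', 2), ('2021-07-01', 0)]
```

COALESCE turns a date whose flags are all NULL into 0 rather than NULL.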
See the demo.

MySQL update multiple rows based on id

In a table in MySQL I'm trying to update a specific field based on the id.
The table has more than 5,000 rows and many fields. One of the fields is "id", and the one I want to update is called "category"; right now it is NULL in every row, and I want to fill in all of them.
My backup MySQL file has only "id" and "category", like this:
INSERT INTO `products` (`id`, `category`) VALUES
(3, 1),
(4, 1),
(5, 2),
(6, 1),
(7, 5),
(8, 1),
(9, 6),
(10, 1),
...
(5000, 3);
I want to update the "category" field in my table according to the ids in this list, and because there are more than 5,000 rows I don't want to change each record manually.
Right now in my table all the "category" fields are NULL and I want to update or give new information to the "category" fields using the file that I have.
The easiest way is to use a temporary table:
CREATE TEMPORARY TABLE temp_products (id int, category int);
Then insert your data into it:
INSERT INTO `temp_products` (`id`, `category`) VALUES
(3, 1),
(4, 1),
(5, 2),
(6, 1),
(7, 5),
(8, 1),
(9, 6),
(10, 1),
...
(5000, 3);
Now you just have to use an UPDATE with an INNER JOIN:
Update products p
INNER JOIN temp_products t_p ON t_p.id = p.id
SET p.category = t_p.category;
If you want, you can add a WHERE clause:
Update products p
INNER JOIN temp_products t_p ON t_p.id = p.id
SET p.category = t_p.category
WHERE p.category IS NULL;
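SQLite (used here through Python's `sqlite3` only to keep the sketch self-contained) does not support `UPDATE ... JOIN`, so this sketch swaps in a correlated subquery with the same effect: each product takes the category of the temp row with the same id. The `products` table is a hypothetical stand-in with only `id` and `category`, and only the first few (id, category) pairs from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal products table: every category starts out NULL.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category INTEGER)")
conn.executemany("INSERT INTO products (id) VALUES (?)", [(i,) for i in range(1, 11)])

# Temporary table filled with (id, category) pairs from the backup file.
conn.execute("CREATE TEMPORARY TABLE temp_products (id INTEGER PRIMARY KEY, category INTEGER)")
conn.executemany(
    "INSERT INTO temp_products (id, category) VALUES (?, ?)",
    [(3, 1), (4, 1), (5, 2), (6, 1), (7, 5), (8, 1), (9, 6), (10, 1)],
)

# Correlated subquery instead of UPDATE ... JOIN. The outer WHERE mirrors the
# inner join (only ids present in temp_products) plus the optional NULL check.
conn.execute(
    """
    UPDATE products
    SET category = (SELECT t.category FROM temp_products t WHERE t.id = products.id)
    WHERE category IS NULL
      AND id IN (SELECT id FROM temp_products)
    """
)

result = dict(conn.execute("SELECT id, category FROM products ORDER BY id"))
print(result)  # {1: None, 2: None, 3: 1, 4: 1, 5: 2, 6: 1, 7: 5, 8: 1, 9: 6, 10: 1}
```

Without the `id IN (...)` condition, products missing from the temp table would be set to NULL by the subquery, which is why the join form in MySQL (or this filter) matters.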
Perhaps a better solution for you:
First, create a staging table (a regular table with id as its primary key):
CREATE TABLE `products_tmp` (
`id` INT NOT NULL,
`category` INT NOT NULL,
PRIMARY KEY (`id`)
);
After that, insert the data into it:
INSERT INTO `products_tmp` (`id`, `category`) VALUES
(3, 1),
(4, 1),
(5, 2),
(6, 1),
(7, 5),
(8, 1),
(9, 6),
(10, 1),
...
(5000, 3);
Then update the category column in your original table:
UPDATE products p
JOIN products_tmp pt ON p.id = pt.id
SET p.category = pt.category;
Finally, drop the staging table:
drop table products_tmp;

Select rows grouped by a column having max aggregate

Given the following data set, how would I find the email addresses that were references for the largest number of ApplicationIDs with an "Accepted" decision?
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref11#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
Fiddle of the same: http://sqlfiddle.com/#!9/03bcf2/1
Initially, I was using LIMIT 1 and ORDER BY CountDecision DESC, like so:
SELECT er.email, COUNT(fd.Decision) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
ORDER BY CountDecision DESC
LIMIT 1
;
However, it occurred to me that I could have multiple email addresses that referred different "most accepted" decisions (i.e., a tie, so to speak), and those would be filtered out (is that the right phrasing?) with the LIMIT keyword.
I then tried a variation on the above query, replacing the ORDER BY and LIMIT lines with:
HAVING MAX(CountDecision)
But I realized that that's only half a statement: MAX(CountDecision) needs to be compared to something. I just don't know what.
Any pointers would be much appreciated. Thanks!
Note: this is for a homework assignment.
Update: To be clear, I'm trying to find the value and count of Emails from EmailReferences. However, I only want rows that have FinalDecision.Decision = 'Accepted' (on matching ApplicationIDs). Based on my data, the result should be:
Email | CountDecision
---------------+--------------
ref10#test.org | 2
ref11#test.org | 2
For example...
SELECT a.*
FROM
( SELECT x.email
, COUNT(*) total
FROM emailreferences x
JOIN finaldecision y
ON y.applicationid = x.applicationid
WHERE y.decision = 'accepted'
GROUP
BY x.email
) a
JOIN
( SELECT COUNT(*) total
FROM emailreferences x
JOIN finaldecision y
ON y.applicationid = x.applicationid
WHERE y.decision = 'accepted'
GROUP
BY x.email
ORDER
BY total DESC
LIMIT 1
) b
ON b.total = a.total;
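A sketch of both the tie problem and the join-to-the-maximum fix, run against the question's data with Python's `sqlite3` instead of MySQL (an assumption for portability). SQLite's `=` is case-sensitive while MySQL's default collation is not, so the sketch compares against `'Accepted'` rather than `'accepted'`. `COUNTS` is a hypothetical helper string holding the per-email count query that both selects reuse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EmailReferences (ApplicationID INTEGER NOT NULL, Email TEXT NOT NULL,
                                  PRIMARY KEY (ApplicationID, Email));
    CREATE TABLE FinalDecision (ApplicationID INTEGER NOT NULL PRIMARY KEY,
                                Decision TEXT NOT NULL);
    INSERT INTO EmailReferences VALUES
      (1,'ref10#test.org'),(1,'ref11#test.org'),(1,'ref12#test.org'),
      (2,'ref20#test.org'),(2,'ref21#test.org'),(2,'ref22#test.org'),
      (3,'ref11#test.org'),(3,'ref31#test.org'),(3,'ref32#test.org'),
      (4,'ref40#test.org'),(4,'ref41#test.org'),(4,'ref42#test.org'),
      (5,'ref50#test.org'),(5,'ref51#test.org'),(5,'ref52#test.org'),
      (6,'ref60#test.org'),(6,'ref11#test.org'),(6,'ref62#test.org'),
      (7,'ref70#test.org'),(7,'ref71#test.org'),(7,'ref72#test.org'),
      (8,'ref10#test.org'),(8,'ref81#test.org'),(8,'ref82#test.org');
    INSERT INTO FinalDecision VALUES
      (1,'Accepted'),(2,'Denied'),(3,'Accepted'),(4,'Denied'),
      (5,'Denied'),(6,'Denied'),(7,'Denied'),(8,'Accepted');
    """
)

# Per-email count of accepted applications.
COUNTS = """
    SELECT x.Email AS email, COUNT(*) AS total
    FROM EmailReferences x
    JOIN FinalDecision y ON y.ApplicationID = x.ApplicationID
    WHERE y.Decision = 'Accepted'
    GROUP BY x.Email
"""

# ORDER BY ... LIMIT 1 keeps exactly one row, even when several emails tie.
top_one = conn.execute(COUNTS + " ORDER BY total DESC LIMIT 1").fetchall()

# Joining every count to the single maximum count keeps all tied emails.
ties = conn.execute(
    f"""
    SELECT a.email, a.total
    FROM ({COUNTS}) a
    JOIN (SELECT total FROM ({COUNTS}) ORDER BY total DESC LIMIT 1) b
      ON b.total = a.total
    ORDER BY a.email
    """
).fetchall()

print(len(top_one), ties)  # 1 [('ref10#test.org', 2), ('ref11#test.org', 2)]
```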
When this was written, MySQL still lacked window functions; they arrived in MySQL 8.0 and make this easier. So for future reference, or for databases like MariaDB that already have window functions:
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref30#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
select email, CountDecision
from (
SELECT er.email, COUNT(fd.Decision) AS CountDecision
, max(COUNT(fd.Decision)) over() maxCountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
) d
where CountDecision = maxCountDecision
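The window-function version can be checked the same way. This sketch uses Python's `sqlite3` (an assumption for portability, and assuming SQLite 3.25+ for window functions, where an aggregate such as `COUNT(...)` may appear inside `MAX(...) OVER ()` in a grouped query) and loads the question's data, in which application 3 references ref11.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EmailReferences (ApplicationID INTEGER NOT NULL, Email TEXT NOT NULL,
                                  PRIMARY KEY (ApplicationID, Email));
    CREATE TABLE FinalDecision (ApplicationID INTEGER NOT NULL PRIMARY KEY,
                                Decision TEXT NOT NULL);
    INSERT INTO EmailReferences VALUES
      (1,'ref10#test.org'),(1,'ref11#test.org'),(1,'ref12#test.org'),
      (2,'ref20#test.org'),(2,'ref21#test.org'),(2,'ref22#test.org'),
      (3,'ref11#test.org'),(3,'ref31#test.org'),(3,'ref32#test.org'),
      (4,'ref40#test.org'),(4,'ref41#test.org'),(4,'ref42#test.org'),
      (5,'ref50#test.org'),(5,'ref51#test.org'),(5,'ref52#test.org'),
      (6,'ref60#test.org'),(6,'ref11#test.org'),(6,'ref62#test.org'),
      (7,'ref70#test.org'),(7,'ref71#test.org'),(7,'ref72#test.org'),
      (8,'ref10#test.org'),(8,'ref81#test.org'),(8,'ref82#test.org');
    INSERT INTO FinalDecision VALUES
      (1,'Accepted'),(2,'Denied'),(3,'Accepted'),(4,'Denied'),
      (5,'Denied'),(6,'Denied'),(7,'Denied'),(8,'Accepted');
    """
)

# The window function is evaluated after GROUP BY, so every grouped row
# carries the overall maximum count next to its own count.
rows = conn.execute(
    """
    SELECT email, CountDecision
    FROM (
        SELECT er.Email AS email,
               COUNT(fd.Decision) AS CountDecision,
               MAX(COUNT(fd.Decision)) OVER () AS maxCountDecision
        FROM EmailReferences AS er
        JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
        WHERE fd.Decision = 'Accepted'
        GROUP BY er.Email
    ) d
    WHERE CountDecision = maxCountDecision
    ORDER BY email
    """
).fetchall()

print(rows)  # [('ref10#test.org', 2), ('ref11#test.org', 2)]
```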
email | CountDecision
:------------- | ------------:
ref10#test.org | 2
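dbfiddle here. Note that the fiddle's data has (3, 'ref30#test.org') where the question has (3, 'ref11#test.org'), which is why only ref10 is returned; with the question's data, ref11#test.org | 2 is returned as well.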

SELECT data based on result of previous row in table

I have a database of students.
CREATE TABLE classlist
(`id` int, `studentid` int, `subjectid` int, `presentid` int)
;
CREATE TABLE student
(`id` int, `name` varchar(4))
;
CREATE TABLE subject
(`id` int, `name` varchar(4))
;
CREATE TABLE classStatus
(`id` int, `name` varchar(8))
;
INSERT INTO classlist
(`id`, `studentid`, `subjectid`, `presentid`)
VALUES
(1, 111, 1, 1),
(2, 222, 3, 0),
(3, 333, 2, 1),
(4, 111, 4, 1),
(5, 111, 1, 0),
(6, 222, 3, 0),
(7, 333, 2, 1),
(8, 111, 4, 1),
(9, 111, 2, 0),
(10, 111, 4, 1),
(11, 111, 1, 1),
(12, 333, 3, 1),
(13, 333, 2, 1),
(14, 333, 3, 1)
;
INSERT INTO student
(`id`, `name`)
VALUES
(111, 'John'),
(222, 'Kate'),
(333, 'Matt')
;
INSERT INTO subject
(`id`, `name`)
VALUES
(1, 'MATH'),
(2, 'ENG'),
(3, 'SCI'),
(4, 'GEO')
;
INSERT INTO classStatus
(`id`, `name`)
VALUES
(0, 'Absent'),
(1, 'Present')
;
And I have a query which shows how many times they have been present or absent.
SELECT
studentid,
students.name AS NAME,
SUM(presentid = 1) AS present,
SUM(presentid = 0) AS absent
FROM classlist
INNER JOIN student as students ON classlist.studentid=students.id
GROUP BY studentid, NAME
See this fiddle below.
http://sqlfiddle.com/#!2/fe0b0/1
There seems to be a trend in this sample data: after someone attends subjectid 4, they often do not come to the next class. How can I capture this in a query? I want to show ONLY the rows whose previous row for the same student has subjectid = 4. In my sample data, the rows matching my criteria would be:
(5, 111, 1, 0),
(9, 111, 2, 0),
(11, 111, 1, 1),
because each of these rows is the next row for a studentid that had a row with subjectid = 4.
My output would be
| STUDENTID | NAME | PRESENT | ABSENT|
| 111 | John | 1 | 2 |
To get the next class for a student, use a correlated subquery:
select cl.*,
(select min(cl2.id) from classlist cl2 where cl2.studentid = cl.studentid and cl2.id > cl.id) as nextcl
from classlist cl
Plugging this into your query tells you who is present and absent for the next class:
SELECT students.id, students.name AS NAME,
SUM(cl.presentid = 1) AS present, SUM(cl.presentid = 0) AS absent,
sum(clnext.presentid = 1) as presentnext,
sum(clnext.presentid = 0) as absentnext
FROM (select cl.*,
(select min(cl2.id) from classlist cl2 where cl2.studentid = cl.studentid and cl2.id > cl.id) as nextcl
from classlist cl
) cl INNER JOIN
student as students
ON cl.studentid = students.id left outer join
classlist clnext
on cl.nextcl = clnext.id
GROUP BY students.id, students.NAME
Add where cl.subjectid = 4 (just before the GROUP BY) to get the answer for subject 4.
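A runnable sketch of the correlated-subquery approach with the subject-4 filter applied, using Python's `sqlite3` instead of MySQL (an assumption for portability). Here the present/absent columns count attendance at the class that follows each subject-4 class, which is the output the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classlist (id INTEGER, studentid INTEGER, subjectid INTEGER, presentid INTEGER)")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO classlist VALUES (?, ?, ?, ?)", [
    (1, 111, 1, 1), (2, 222, 3, 0), (3, 333, 2, 1), (4, 111, 4, 1),
    (5, 111, 1, 0), (6, 222, 3, 0), (7, 333, 2, 1), (8, 111, 4, 1),
    (9, 111, 2, 0), (10, 111, 4, 1), (11, 111, 1, 1), (12, 333, 3, 1),
    (13, 333, 2, 1), (14, 333, 3, 1),
])
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(111, "John"), (222, "Kate"), (333, "Matt")])

rows = conn.execute(
    """
    SELECT s.id, s.name,
           SUM(clnext.presentid = 1) AS present,
           SUM(clnext.presentid = 0) AS absent
    FROM (
        -- nextcl: id of the next class of the same student (NULL if none)
        SELECT cl.*,
               (SELECT MIN(cl2.id) FROM classlist cl2
                WHERE cl2.studentid = cl.studentid AND cl2.id > cl.id) AS nextcl
        FROM classlist cl
    ) cl
    JOIN student s ON s.id = cl.studentid
    LEFT JOIN classlist clnext ON clnext.id = cl.nextcl
    WHERE cl.subjectid = 4
    GROUP BY s.id, s.name
    """
).fetchall()

print(rows)  # [(111, 'John', 1, 2)]
```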
I fixed the query. The SQLFiddle is here.
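A quick and dirty solution could be to get the Classlist.Id for all rows where subjectid = 4 (let's call them n) and then select all the rows where Id = n+1. This assumes that a student's next class always has the next id; when several students' classes are interleaved in classlist, that is not guaranteed.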
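The id + 1 idea can be sketched the same way (Python's `sqlite3`, assumed for portability). The extra `nxt.studentid = prev.studentid` condition is an added guard, not part of the suggestion above; even with it, a next class whose id is not exactly n + 1 is simply missed. On the sample data it happens to give the same result as the correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classlist (id INTEGER, studentid INTEGER, subjectid INTEGER, presentid INTEGER)")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO classlist VALUES (?, ?, ?, ?)", [
    (1, 111, 1, 1), (2, 222, 3, 0), (3, 333, 2, 1), (4, 111, 4, 1),
    (5, 111, 1, 0), (6, 222, 3, 0), (7, 333, 2, 1), (8, 111, 4, 1),
    (9, 111, 2, 0), (10, 111, 4, 1), (11, 111, 1, 1), (12, 333, 3, 1),
    (13, 333, 2, 1), (14, 333, 3, 1),
])
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(111, "John"), (222, "Kate"), (333, "Matt")])

# prev: a subject-4 class (id n); nxt: the row with id n + 1.
# The studentid check is an added guard against pairing two different students.
quick = conn.execute(
    """
    SELECT s.id, s.name,
           SUM(nxt.presentid = 1) AS present,
           SUM(nxt.presentid = 0) AS absent
    FROM classlist prev
    JOIN classlist nxt ON nxt.id = prev.id + 1 AND nxt.studentid = prev.studentid
    JOIN student s ON s.id = nxt.studentid
    WHERE prev.subjectid = 4
    GROUP BY s.id, s.name
    """
).fetchall()

print(quick)  # [(111, 'John', 1, 2)]
```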