MySQL COUNT(*) not counting result rows - mysql

Simplified schema of m:n relation implementing a subscription model:
CREATE TABLE c (
id INT(11) PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(32)
) ENGINE=MyISAM CHARACTER SET=UTF8;
CREATE TABLE t (
id INT(11) PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(32)
) ENGINE=MyISAM CHARACTER SET=UTF8;
CREATE TABLE c2t (
id INT(11) PRIMARY KEY AUTO_INCREMENT,
cid INT(11) NOT NULL,
tid INT(11) NOT NULL,
dateStart DATE NULL,
dateEnd DATE NULL
) ENGINE=MyISAM CHARACTER SET=UTF8;
INSERT INTO c (name) VALUES ('mike'),('carl'),('suzy');
INSERT INTO t (name) VALUES ('plan1'),('plan2'),('plan3'),('plan4');
INSERT INTO c2t (cid, tid, dateStart, dateEnd) VALUES
(1, 1, '2014-01-01', '2014-07-31'),
(1, 2, '2014-08-01', '2015-07-31'),
(1, 1, '2015-08-01', null),
(1, 3, '2015-09-01', null),
(2, 1, '2014-01-01', '2015-07-31'),
(2, 2, '2015-08-01', '2015-09-30'),
(2, 3, '2015-09-30', null),
(3, 1, '2014-01-01', '2014-12-31'),
(3, 2, '2014-01-01', '2014-12-31'),
(3, 3, '2015-01-01', '2015-10-31'),
(3, 4, '2015-01-01', '2015-10-31');
I've developed a query to find the c's who have active subscriptions of t's:
SELECT c.*
FROM c
LEFT JOIN c2t ON c.id = c2t.cid
AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
GROUP BY c2t.cid
HAVING COUNT(c2t.id) > 0;
Result as expected:
id name
1 mike
2 carl
The problem arises when I try to count the result rows. The query is almost identical, I've just dropped in a COUNT(*):
SELECT COUNT(*)
FROM c
LEFT JOIN c2t ON c.id = c2t.cid
AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
GROUP BY c2t.cid
HAVING COUNT(c2t.id) > 0;
Result:
`COUNT(*)`
2
1
Expected result would be a single row containing the number of rows found (2). I can only assume that the GROUP BY is interfering, but have no idea how to work around. Explanations are most welcome.

Wrap everything in a subquery and use COUNT in the outer query:
SELECT COUNT(*)
FROM (
SELECT c.*
FROM c
LEFT JOIN c2t ON c.id = c2t.cid
AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
GROUP BY c2t.cid
HAVING COUNT(c2t.id) > 0
) AS sub

If the only thing you want returned is the number of c's who have active subscriptions, then you can simplify your query like this:
SELECT COUNT(DISTINCT c.id) AS cnt
FROM c
INNER JOIN c2t ON c.id = c2t.cid
AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
So, INNER JOIN is used in place of LEFT JOIN: there is no need to return c's with no matches in c2t, since these are not going to have any active subscriptions.
Also, there is no need to GROUP BY: the query returns just one row with the number of c's.
Finally, DISTINCT must be used in COUNT so as to avoid counting duplicate c.id values more than once.
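The difference between the per-group counts and the single total can be reproduced in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, a fixed date (`TODAY`, an assumption) in place of NOW(), and '9999-12-31' as an assumed open-ended end date, so it illustrates the behaviour rather than the exact MySQL setup from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE c2t (
    id INTEGER PRIMARY KEY,
    cid INTEGER NOT NULL,
    tid INTEGER NOT NULL,
    dateStart TEXT,
    dateEnd TEXT
);
INSERT INTO c (name) VALUES ('mike'), ('carl'), ('suzy');
INSERT INTO c2t (cid, tid, dateStart, dateEnd) VALUES
    (1, 1, '2014-01-01', '2014-07-31'),
    (1, 2, '2014-08-01', '2015-07-31'),
    (1, 1, '2015-08-01', NULL),
    (1, 3, '2015-09-01', NULL),
    (2, 1, '2014-01-01', '2015-07-31'),
    (2, 2, '2015-08-01', '2015-09-30'),
    (2, 3, '2015-09-30', NULL),
    (3, 1, '2014-01-01', '2014-12-31'),
    (3, 2, '2014-01-01', '2014-12-31'),
    (3, 3, '2015-01-01', '2015-10-31'),
    (3, 4, '2015-01-01', '2015-10-31');
""")

TODAY = "2016-01-01"  # assumed fixed date standing in for NOW()
PARAMS = {"today": TODAY}

# Shared FROM/JOIN part: customers joined to their currently active subscriptions.
ACTIVE_JOIN = """
FROM c
INNER JOIN c2t ON c.id = c2t.cid
    AND :today BETWEEN COALESCE(dateStart, '0000-00-00')
                   AND COALESCE(dateEnd, '9999-12-31')
"""

# With GROUP BY, COUNT(*) is evaluated once per group: one row per customer.
per_group = [row[0] for row in conn.execute(
    "SELECT COUNT(*) " + ACTIVE_JOIN + " GROUP BY c.id ORDER BY c.id", PARAMS)]

# Without GROUP BY, COUNT(DISTINCT c.id) yields a single total.
total = conn.execute(
    "SELECT COUNT(DISTINCT c.id) " + ACTIVE_JOIN, PARAMS).fetchone()[0]

# Wrapping the grouped query in a subquery counts the groups instead.
wrapped = conn.execute(
    "SELECT COUNT(*) FROM (SELECT c.id " + ACTIVE_JOIN + " GROUP BY c.id) AS sub",
    PARAMS).fetchone()[0]

print(per_group, total, wrapped)
```

Here `per_group` holds the number of active subscriptions for each customer, which is what the grouped COUNT(*) in the question returned, while `total` and `wrapped` both hold the number of customers.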

Related

Write a query to identify frequent posters

I'm trying to write a query that will find the user_id's of all users
that have created a minimum of two posts in a maximum of 1 hour.
Here's a light example of the data:
CREATE TABLE tbl_posts
(`id` int, `user_id` int, `created_date` datetime);
INSERT INTO tbl_posts
(`id`, `user_id`, `created_date`)
VALUES
(1, 1, '2021-07-01 09:00'),
(2, 2, '2021-07-01 10:15'), -- *
(3, 2, '2021-07-01 11:00'), -- * user posted twice within an hour.
(4, 3, '2021-07-01 13:00'),
(5, 3, '2021-07-01 15:00'),
(6, 3, '2021-07-01 18:00'),
(7, 4, '2021-07-01 11:00'),
(8, 4, '2021-07-02 11:30'),
(9, 4, '2021-07-03 12:30'), -- *
(10, 4, '2021-07-03 12:45'); -- * user posted twice within an hour.
http://sqlfiddle.com/#!9/0e7cba
The expected output of the query is
2, 4
This output is expected because users 2 and 4 have each posted at least twice in under an hour.
I don't know where to begin with this in MySQL. I can export the data and get a result procedurally in something like C or Python, but I'm sure this is accomplishable in MySQL and am curious to know how. Maybe I need a Window function?
Use EXISTS:
SELECT DISTINCT t1.user_id
FROM tbl_posts t1
WHERE EXISTS (
SELECT 1
FROM tbl_posts t2
WHERE t2.user_id = t1.user_id
AND t1.created_date < t2.created_date
AND TIMESTAMPDIFF(SECOND, t1.created_date, t2.created_date) <= 60 * 60
)
Or, if your version of MySQL is 8.0+, use the LEAD() window function:
SELECT user_id
FROM (
SELECT *, TIMESTAMPDIFF(
SECOND,
created_date,
LEAD(created_date) OVER (PARTITION BY user_id ORDER BY created_date)
) diff
FROM tbl_posts
) t
GROUP BY user_id
HAVING MIN(diff) <= 60 * 60
See the demo.
select distinct p.user_id from tbl_posts p
inner join tbl_posts p2 on p.user_id = p2.user_id
and p.created_date < p2.created_date
and DATE_ADD(p.created_date,interval 1 hour) >= p2.created_date
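As a quick check of the self-join idea, here is a minimal sketch run against the sample data. It uses Python's sqlite3 module instead of MySQL, so `datetime(x, '+1 hour')` stands in for `DATE_ADD(x, INTERVAL 1 HOUR)`, and the timestamps are stored with seconds so that plain text comparison works; both are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_posts (id INTEGER, user_id INTEGER, created_date TEXT);
INSERT INTO tbl_posts (id, user_id, created_date) VALUES
    (1, 1, '2021-07-01 09:00:00'),
    (2, 2, '2021-07-01 10:15:00'),
    (3, 2, '2021-07-01 11:00:00'),
    (4, 3, '2021-07-01 13:00:00'),
    (5, 3, '2021-07-01 15:00:00'),
    (6, 3, '2021-07-01 18:00:00'),
    (7, 4, '2021-07-01 11:00:00'),
    (8, 4, '2021-07-02 11:30:00'),
    (9, 4, '2021-07-03 12:30:00'),
    (10, 4, '2021-07-03 12:45:00');
""")

# Pair each post with any later post by the same user
# that falls within one hour of it.
rows = conn.execute("""
SELECT DISTINCT p.user_id
FROM tbl_posts p
INNER JOIN tbl_posts p2
    ON p.user_id = p2.user_id
   AND p.created_date < p2.created_date
   AND datetime(p.created_date, '+1 hour') >= p2.created_date
ORDER BY p.user_id
""").fetchall()

frequent_posters = [row[0] for row in rows]
print(frequent_posters)
```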

MySQL get previous and next record when order by date?

I have the following tables and data:
CREATE TABLE `jobs` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`product_id` INT(10) UNSIGNED NOT NULL,
`status_id` INT(10) UNSIGNED NOT NULL,
`start_dt` TIMESTAMP NOT NULL,
`end_dt` TIMESTAMP NOT NULL,
`refreshed_dt` TIMESTAMP AS ((`start_dt` + interval ((to_days(`end_dt`) - to_days(`start_dt`)) / 2) day)) STORED,
`job_title` VARCHAR(100) NOT NULL,
PRIMARY KEY (`id`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB;
CREATE TABLE `job_industry` (
`job_id` INT(10) UNSIGNED NOT NULL,
`industry_id` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`job_id`, `industry_id`),
INDEX `job_industry_industry_id_foreign` (`industry_id`),
CONSTRAINT `job_industry_industry_id_foreign` FOREIGN KEY (`industry_id`) REFERENCES `industries` (`id`),
CONSTRAINT `job_industry_job_id_foreign` FOREIGN KEY (`job_id`) REFERENCES `jobs` (`id`) ON DELETE CASCADE
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB;
INSERT INTO jobs (product_id, status_id, start_dt, end_dt, job_title)
VALUES (1, 4, "2019-07-28", "2019-08-28", "Financial Accountant"),
(1, 4, "2019-07-28", "2019-08-28", "Payroll Clerk"),
(3, 4, "2019-07-28", "2019-08-28", "Management Accountant"),
(1, 4, "2019-07-28", "2019-08-28", "Accounts Assistant"),
(1, 4, "2019-07-28", "2019-08-28", "Auditor");
INSERT INTO job_industry (job_id, industry_id)
VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 1);
I have the following query to return a paginated results set to return all jobs which are currently live and within the accountancy industry sector:
select jobs.id,
jobs.job_title
from jobs
inner join job_industry on job_industry.job_id = jobs.id
where job_industry.industry_id in (1)
and jobs.start_dt <= now()
and jobs.end_dt >= now()
and jobs.status_id = 4
group by jobs.id
order by CASE WHEN jobs.product_id = 3 AND jobs.refreshed_dt <= now() THEN
jobs.refreshed_dt
ELSE jobs.start_dt
END desc, jobs.id desc limit 10 offset 0
The ORDER BY clause in the above query uses product_id and refreshed_dt to rank some records higher: if product_id is 3, the record is considered a premium listing, and once it has reached the halfway point of its listing period we use refreshed_dt to bump it up the list. refreshed_dt is the midpoint between start_dt and end_dt. We want to list the newest listings first.
The above query give me the following result set:
id | job_title
----------------------
3 | Management Accountant <--- premium listing
5 | Auditor <--- previous
4 | Accounts Assistant <--- selected record
2 | Payroll Clerk <--- next
1 | Financial Accountant
Now if I select record id 4, how do I get the previous and next records?
I've checked the following post How to get next/previous record in MySQL? but that only works if you're ordering by id.
This is my attempt to get the previous record. It returns record id 5, which is correct; however, if there were other premium records then I feel this query would fail:
select MAX(jobs.id)
from jobs
inner join job_industry on job_industry.job_id = jobs.id
where job_industry.industry_id in (1)
and jobs.start_dt <= now()
and jobs.end_dt >= now()
and jobs.status_id = 4
and jobs.id > 4
order by CASE WHEN jobs.product_id = 3 AND jobs.refreshed_dt <= now() THEN
jobs.refreshed_dt
ELSE jobs.start_dt
END desc, jobs.id desc
And to get the next record I have the following, which returns record id 1, which is incorrect:
select MIN(jobs.id)
from jobs
inner join job_industry on job_industry.job_id = jobs.id
where job_industry.industry_id in (1)
and jobs.start_dt <= now()
and jobs.end_dt >= now()
and jobs.status_id = 4
and jobs.id < 4
order by CASE WHEN jobs.product_id = 3 AND jobs.refreshed_dt <= now() THEN
jobs.refreshed_dt
ELSE jobs.start_dt END desc, jobs.id desc
Some help to tackle this would be appreciated. Please note I've provided a minimal reproducible example above. Also, I'm using and limited to MySQL version 5.7.17.
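One possible way to approach this, sketched under several assumptions: compute the same sort key that the ORDER BY uses, then look for the nearest row whose (sort key, id) pair is greater (previous) or smaller (next) than that of the selected record. The sketch runs on Python's sqlite3 module rather than MySQL 5.7, uses a fixed `NOW` value, stores refreshed_dt by hand as an assumed midpoint, and leaves out the industry, status and date filters for brevity. The helper name `neighbour` is hypothetical. It only uses derived tables, not CTEs or window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    start_dt TEXT NOT NULL,
    refreshed_dt TEXT NOT NULL,  -- assumed midpoint, filled in by hand here
    job_title TEXT NOT NULL
);
INSERT INTO jobs (product_id, start_dt, refreshed_dt, job_title) VALUES
    (1, '2019-07-28', '2019-08-12', 'Financial Accountant'),
    (1, '2019-07-28', '2019-08-12', 'Payroll Clerk'),
    (3, '2019-07-28', '2019-08-12', 'Management Accountant'),
    (1, '2019-07-28', '2019-08-12', 'Accounts Assistant'),
    (1, '2019-07-28', '2019-08-12', 'Auditor');
""")

NOW = "2019-08-20"  # assumed fixed date standing in for now()

# Same expression as the ORDER BY in the listing query.
SORT_KEY = """
    SELECT id,
           CASE WHEN product_id = 3 AND refreshed_dt <= :now
                THEN refreshed_dt ELSE start_dt END AS sort_key
    FROM jobs
"""

def neighbour(job_id, direction):
    """Return the id shown just before ('previous') or after ('next') job_id."""
    # The list is sorted descending, so the previous row has the next
    # larger (sort_key, id) pair and the next row the next smaller one.
    op, order = (">", "ASC") if direction == "previous" else ("<", "DESC")
    sql = f"""
        SELECT r.id
        FROM ({SORT_KEY}) AS r
        JOIN ({SORT_KEY}) AS cur ON cur.id = :id
        WHERE r.sort_key {op} cur.sort_key
           OR (r.sort_key = cur.sort_key AND r.id {op} cur.id)
        ORDER BY r.sort_key {order}, r.id {order}
        LIMIT 1
    """
    row = conn.execute(sql, {"now": NOW, "id": job_id}).fetchone()
    return row[0] if row else None

previous_id = neighbour(4, "previous")
next_id = neighbour(4, "next")
print(previous_id, next_id)
```

Because the comparison includes the sort key and not only the id, a premium record with a later refreshed_dt is treated as coming before every non-premium record, whatever its id.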

Get time difference between all consecutive rows (latest one not printing)

I'm trying to retrieve all column data along with the time difference between all consecutive rows from the following table, where sender_id is 1 or 2 and recipient_id is 2 or 1.
CREATE TABLE records (
id INT(11) AUTO_INCREMENT,
send_date DATETIME NOT NULL,
content TEXT NOT NULL,
sender_id INT(11) NOT NULL,
recipient_id INT(11) NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO records (send_date, content, sender_id, recipient_id) VALUES
('2013-08-23 14:50:00', 'record 1/5', 1, 2),
('2013-08-23 14:51:00', 'record 2/5', 2, 1),
('2013-08-23 15:50:00', 'record 3/5', 2, 1),
('2013-08-23 15:50:13', 'record 4/5', 1, 2),
('2013-08-23 16:50:00', 'record 5/5', 1, 2);
The problem is that my SELECT query won't output the latest record because of the WHERE clause:
SELECT t1.content, DATE_FORMAT(t1.send_date, '%b, %D, %H:%i') AS 'pprint_date',
TIMESTAMPDIFF(MINUTE, t1.send_date, t2.send_date) AS 'duration'
FROM records t1, records t2
WHERE (t1.id = t2.id - 1) /*<= this subtraction excludes latest record*/
AND ((t1.sender_id = 1 AND t1.recipient_id = 2)
OR (t1.sender_id = 2 AND t1.recipient_id = 1))
ORDER BY t1.id ASC
How can I properly get the time difference between all consecutive records while still printing all of them?
I would use a correlated subquery:
select r.*,
(select r2.send_date
from records r2
where r2.sender_id in (1, 2) and r2.recipient_id in (1, 2) and
r2.send_date > r.send_date
order by r2.send_date asc
limit 1
) as next_send_date
from records r
where r.sender_id in (1, 2) and r.recipient_id in (1, 2);
You can get the duration (instead of the next time) by using TIMESTAMPDIFF(MINUTE, r.send_date, r2.send_date) in the subquery. I think the first version is easier for you to test with to see what is happening.
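A sketch of the duration variant on the sample data follows. It runs on Python's sqlite3 module rather than MySQL, so the difference of two `strftime('%s', ...)` values divided by 60 stands in for `TIMESTAMPDIFF(MINUTE, ...)`, and the filter is written as the explicit sender/recipient pairs from the question; both choices are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (
    id INTEGER PRIMARY KEY,
    send_date TEXT NOT NULL,
    content TEXT NOT NULL,
    sender_id INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL
);
INSERT INTO records (send_date, content, sender_id, recipient_id) VALUES
    ('2013-08-23 14:50:00', 'record 1/5', 1, 2),
    ('2013-08-23 14:51:00', 'record 2/5', 2, 1),
    ('2013-08-23 15:50:00', 'record 3/5', 2, 1),
    ('2013-08-23 15:50:13', 'record 4/5', 1, 2),
    ('2013-08-23 16:50:00', 'record 5/5', 1, 2);
""")

# For every record, the correlated subquery finds the next record in the
# conversation and returns the gap in whole minutes. The last record has
# no successor, so its duration is NULL, but the row itself is still returned.
rows = conn.execute("""
SELECT r.content,
       (SELECT (CAST(strftime('%s', r2.send_date) AS INTEGER)
              - CAST(strftime('%s', r.send_date) AS INTEGER)) / 60
        FROM records r2
        WHERE ((r2.sender_id = 1 AND r2.recipient_id = 2)
            OR (r2.sender_id = 2 AND r2.recipient_id = 1))
          AND r2.send_date > r.send_date
        ORDER BY r2.send_date ASC
        LIMIT 1) AS duration
FROM records r
WHERE (r.sender_id = 1 AND r.recipient_id = 2)
   OR (r.sender_id = 2 AND r.recipient_id = 1)
ORDER BY r.send_date
""").fetchall()

durations = [row[1] for row in rows]
print(durations)
```

All five records come back; only the duration of the last one is empty.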

MySQL Query for finding a "LAST" row, based on two fields

I have the following MySQL table to log the registration status changes of pupils:
CREATE TABLE `pupil_registration_statuses` (
`status_id` INT(11) NOT NULL AUTO_INCREMENT,
`status_pupil_id` INT(10) UNSIGNED NOT NULL,
`status_status_id` INT(10) UNSIGNED NOT NULL,
`status_effectivedate` DATE NOT NULL,
PRIMARY KEY (`status_id`),
INDEX `status_pupil_id` (`status_pupil_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Example data:
INSERT INTO `pupil_registration_statuses` (`status_id`, `status_pupil_id`, `status_status_id`, `status_effectivedate`) VALUES
(1, 123, 1, '2013-05-06'),
(2, 123, 2, '2014-03-15'),
(3, 123, 5, '2013-03-15'),
(4, 123, 6, '2013-05-06'),
(5, 234, 2, '2013-02-02'),
(6, 234, 4, '2013-04-17'),
(7, 345, 2, '2014-02-01'),
(8, 345, 3, '2013-06-01');
Statuses can be inserted out of date order, so the sequence of dates does not necessarily follow the sequence of IDs.
For example: status_id 1 might have a date of 2013-05-06, but status_id 3 might have a date of 2013-03-15.
status_id values are, however, sequential within any particular date. Thus, if a pupil's registration status changes multiple times on one day, the last row will reflect their status for that date.
It is necessary to find out a particular student's registration status on a particular date. The following query works for an individual pupil:
SELECT *
FROM pupil_registration_statuses
WHERE status_pupil_id = 123
AND status_effectivedate <= '2013-05-06'
ORDER BY status_effectivedate DESC, status_id DESC
LIMIT 1;
This returns the expected row of status_id = 4
However, I now need to issue a (single) query to return the status for all pupils on a particular date.
The following query is proposed, but doesn't obey the "last status_id in a day" requirement:
SELECT *
FROM pupil_registration_statuses prs
INNER JOIN (SELECT status_pupil_id, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
GROUP BY status_pupil_id) qprs ON prs.status_pupil_id = qprs.status_pupil_id AND prs.status_effectivedate = qprs.last_date;
This query, however, returns 2 rows for pupil 123.
EDIT
To clarify, if the input is the date '2013-05-06', I expect to get the rows 4 and 6 from the query.
http://sqlfiddle.com/#!2/68ee6/2
Is this what you're after?
SELECT a.*
FROM pupil_registration_statuses a
JOIN
( SELECT prs.status_pupil_id
, MAX(prs.status_id) max_status_id
FROM pupil_registration_statuses prs
JOIN
( SELECT status_pupil_id
, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
GROUP
BY status_pupil_id
) qprs
ON prs.status_pupil_id = qprs.status_pupil_id
AND prs.status_effectivedate = qprs.last_date
GROUP
BY prs.status_pupil_id
) b
ON b.max_status_id = a.status_id;
http://sqlfiddle.com/#!2/68ee6/7
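To check the expected result (rows 4 and 6 for '2013-05-06'), here is a small sketch of the two-step approach: find each pupil's latest effective date up to the cutoff, then take the highest status_id on that date. It uses Python's sqlite3 module as a stand-in for MySQL, and `CUTOFF` is just the date from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pupil_registration_statuses (
    status_id INTEGER PRIMARY KEY,
    status_pupil_id INTEGER NOT NULL,
    status_status_id INTEGER NOT NULL,
    status_effectivedate TEXT NOT NULL
);
INSERT INTO pupil_registration_statuses VALUES
    (1, 123, 1, '2013-05-06'),
    (2, 123, 2, '2014-03-15'),
    (3, 123, 5, '2013-03-15'),
    (4, 123, 6, '2013-05-06'),
    (5, 234, 2, '2013-02-02'),
    (6, 234, 4, '2013-04-17'),
    (7, 345, 2, '2014-02-01'),
    (8, 345, 3, '2013-06-01');
""")

CUTOFF = "2013-05-06"

rows = conn.execute("""
SELECT a.status_id
FROM pupil_registration_statuses a
JOIN (
    -- Step 2: highest status_id on each pupil's latest date.
    SELECT prs.status_pupil_id, MAX(prs.status_id) AS last_status_id
    FROM pupil_registration_statuses prs
    JOIN (
        -- Step 1: latest effective date per pupil, up to the cutoff.
        SELECT status_pupil_id, MAX(status_effectivedate) AS last_date
        FROM pupil_registration_statuses
        WHERE status_effectivedate <= :cutoff
        GROUP BY status_pupil_id
    ) qprs
      ON prs.status_pupil_id = qprs.status_pupil_id
     AND prs.status_effectivedate = qprs.last_date
    GROUP BY prs.status_pupil_id
) b ON b.last_status_id = a.status_id
ORDER BY a.status_id
""", {"cutoff": CUTOFF}).fetchall()

status_ids = [row[0] for row in rows]
print(status_ids)
```

Pupil 345 has no status on or before the cutoff, so no row is returned for that pupil.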
(Incidentally, there's an ugly and undocumented hack for this kind of problem which goes something like this:
SELECT x.* FROM (SELECT * FROM pupil_registration_statuses WHERE status_effectivedate <= '2013-05-06' ORDER BY status_pupil_id, status_effectivedate DESC, status_id DESC) x GROUP BY status_pupil_id;
...but I didn't tell you that! ;) )
If I understood right, you want to...
1) Get 1 row per person.
2) Get the status changes from the specific day you manually input.
3) Get the last status changes from within the specific day.
If that's right, you need the query you already have ordering by date and then by id, just with a distinct.
SELECT DISTINCT on status_pupil_id *
FROM pupil_registration_statuses
WHERE status_pupil_id = 123
AND status_effectivedate <= '2013-05-06'
ORDER BY status_effectivedate DESC, status_id DESC
I have changed where clause, please try it.
SELECT *
FROM pupil_registration_statuses prs
INNER JOIN (SELECT status_pupil_id, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE Datediff(status_effectivedate, '2013-05-06') <= 0
GROUP BY status_pupil_id) qprs ON prs.status_pupil_id = qprs.status_pupil_id AND prs.status_effectivedate = qprs.last_date;
EDIT
Try this
SELECT *
FROM
(
select status_pupil_id, max(status_id) as status_id from pupil_registration_statuses innr
-- where DATEDIFF('2013-05-06', status_effectivedate) >= 0
group by status_pupil_id
) as ca
inner join pupil_registration_statuses prs on prs.status_id = ca.status_id
where DATEDIFF('2013-05-06', prs.status_effectivedate) >= 0

MySQL query, MAX() + GROUP BY

Daft SQL question. I have a table like so ('pid' is auto-increment primary col)
CREATE TABLE theTable (
`pid` INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`cost` INT UNSIGNED NOT NULL,
`rid` INT NOT NULL
) Engine=InnoDB;
Actual table data:
INSERT INTO theTable (`pid`, `timestamp`, `cost`, `rid`)
VALUES
(1, '2011-04-14 01:05:07', 1122, 1),
(2, '2011-04-14 00:05:07', 2233, 1),
(3, '2011-04-14 01:05:41', 4455, 2),
(4, '2011-04-14 01:01:11', 5566, 2),
(5, '2011-04-14 01:06:06', 345, 1),
(6, '2011-04-13 22:06:06', 543, 2),
(7, '2011-04-14 01:14:14', 5435, 3),
(8, '2011-04-14 01:10:13', 6767, 3)
;
I want to get the PID of the latest row for each rid (1 result per unique RID). For the sample data, I'd like:
pid | MAX(timestamp) | rid
-----------------------------------
5 | 2011-04-14 01:06:06 | 1
3 | 2011-04-14 01:05:41 | 2
7 | 2011-04-14 01:14:14 | 3
I've tried running the following query:
SELECT MAX(timestamp),rid,pid FROM theTable GROUP BY rid
and I get:
max(timestamp) ; rid; pid
----------------------------
2011-04-14 01:06:06; 1 ; 1
2011-04-14 01:05:41; 2 ; 3
2011-04-14 01:14:14; 3 ; 7
The PID returned is always the first occurrence of PID for an RID (row / pid 1 is the first time rid 1 is used, row / pid 3 is the first time rid 2 is used, row / pid 7 is the first time rid 3 is used). Although the query returns the max timestamp for each rid, the pids are not the pids that go with those timestamps in the original table. What query would give me the results I'm looking for?
(Tested in PostgreSQL 9.something)
Identify the rid and timestamp.
select rid, max(timestamp) as ts
from test
group by rid;
1 2011-04-14 18:46:00
2 2011-04-14 14:59:00
Join to it.
select test.pid, test.cost, test.timestamp, test.rid
from test
inner join
(select rid, max(timestamp) as ts
from test
group by rid) maxt
on (test.rid = maxt.rid and test.timestamp = maxt.ts)
select *
from (
select `pid`, `timestamp`, `cost`, `rid`
from theTable
order by `timestamp` desc
) as mynewtable
group by mynewtable.`rid`
order by mynewtable.`timestamp`
Hope I helped !
SELECT t.pid, t.cost, t.timestamp, t.rid
FROM test as t
JOIN (
SELECT rid, max(timestamp) AS maxtimestamp
FROM test GROUP BY rid
) AS tmax
ON t.rid = tmax.rid and t.timestamp = tmax.maxtimestamp
I created an index on rid and timestamp.
SELECT test.pid, test.cost, test.timestamp, test.rid
FROM theTable AS test
LEFT JOIN theTable maxt
ON maxt.rid = test.rid
AND maxt.timestamp > test.timestamp
WHERE maxt.rid IS NULL
Showing rows 0 - 2 (3 total, Query took 0.0104 sec)
This method will select all the desired values from theTable (test), left joining itself (maxt) on all timestamps higher than the one on test with the same rid. When the timestamp is already the highest one on test there are no matches on maxt - which is what we are looking for - values on maxt become NULL. Now we use the WHERE clause maxt.rid IS NULL or any other column on maxt.
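The anti-join described above can be tried on the question's sample data with the following sketch. It uses Python's sqlite3 module as a stand-in for MySQL, which is an assumption of this example; the query itself is the same LEFT JOIN ... IS NULL pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE theTable (
    pid INTEGER PRIMARY KEY,
    timestamp TEXT,
    cost INTEGER NOT NULL,
    rid INTEGER NOT NULL
);
INSERT INTO theTable (pid, timestamp, cost, rid) VALUES
    (1, '2011-04-14 01:05:07', 1122, 1),
    (2, '2011-04-14 00:05:07', 2233, 1),
    (3, '2011-04-14 01:05:41', 4455, 2),
    (4, '2011-04-14 01:01:11', 5566, 2),
    (5, '2011-04-14 01:06:06', 345, 1),
    (6, '2011-04-13 22:06:06', 543, 2),
    (7, '2011-04-14 01:14:14', 5435, 3),
    (8, '2011-04-14 01:10:13', 6767, 3);
""")

# Join each row to every newer row with the same rid. Rows that find no
# newer partner (maxt.rid IS NULL) are the latest ones for their rid.
rows = conn.execute("""
SELECT test.pid, test.rid
FROM theTable AS test
LEFT JOIN theTable AS maxt
       ON maxt.rid = test.rid
      AND maxt.timestamp > test.timestamp
WHERE maxt.rid IS NULL
ORDER BY test.rid
""").fetchall()

latest = [(pid, rid) for pid, rid in rows]
print(latest)
```

If two rows of the same rid shared the maximum timestamp, both would be returned, since neither has a strictly newer partner.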
You could also have subqueries like that:
SELECT ( SELECT MIN(t2.pid)
FROM test t2
WHERE t2.rid = t.rid
AND t2.timestamp = maxtimestamp
) AS pid
, MAX(t.timestamp) AS maxtimestamp
, t.rid
FROM test t
GROUP BY t.rid
But this way, you'll need one more subquery if you want cost included in the shown columns, etc.
So the GROUP BY and JOIN approach is the better solution.
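If you want to avoid a JOIN, and pid always increases with timestamp within each rid (which is not the case in the sample data above), you can use: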
SELECT pid, rid FROM theTable t1 WHERE t1.pid IN ( SELECT MAX(t2.pid) FROM theTable t2 GROUP BY t2.rid);
Try:
select pid,cost, timestamp, rid from theTable order by timestamp DESC limit 2;