MySQL query, MAX() + GROUP BY

Daft SQL question. I have a table like so ('pid' is auto-increment primary col)
CREATE TABLE theTable (
`pid` INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`cost` INT UNSIGNED NOT NULL,
`rid` INT NOT NULL
) Engine=InnoDB;
Actual table data:
INSERT INTO theTable (`pid`, `timestamp`, `cost`, `rid`)
VALUES
(1, '2011-04-14 01:05:07', 1122, 1),
(2, '2011-04-14 00:05:07', 2233, 1),
(3, '2011-04-14 01:05:41', 4455, 2),
(4, '2011-04-14 01:01:11', 5566, 2),
(5, '2011-04-14 01:06:06', 345, 1),
(6, '2011-04-13 22:06:06', 543, 2),
(7, '2011-04-14 01:14:14', 5435, 3),
(8, '2011-04-14 01:10:13', 6767, 3)
;
I want to get the PID of the latest row for each rid (1 result per unique RID). For the sample data, I'd like:
pid | MAX(timestamp) | rid
-----------------------------------
5 | 2011-04-14 01:06:06 | 1
3 | 2011-04-14 01:05:41 | 2
7 | 2011-04-14 01:14:14 | 3
I've tried running the following query:
SELECT MAX(timestamp),rid,pid FROM theTable GROUP BY rid
and I get:
max(timestamp) ; rid; pid
----------------------------
2011-04-14 01:06:06; 1 ; 1
2011-04-14 01:05:41; 2 ; 3
2011-04-14 01:14:14; 3 ; 7
The PID returned is always the first occurrence of a PID for each RID (row / pid 1 is the first time rid 1 is used, row / pid 3 is the first time rid 2 is used, row / pid 7 is the first time rid 3 is used). Although the query returns the max timestamp for each rid, the pids are not the ones that belong to those timestamps in the original table. What query would give me the results I'm looking for?

(Tested in PostgreSQL 9.something)
Identify the rid and timestamp.
select rid, max(timestamp) as ts
from test
group by rid;
1 2011-04-14 18:46:00
2 2011-04-14 14:59:00
Join to it.
select test.pid, test.cost, test.timestamp, test.rid
from test
inner join
(select rid, max(timestamp) as ts
from test
group by rid) maxt
on (test.rid = maxt.rid and test.timestamp = maxt.ts)
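A quick way to check the two-step approach above against the question's sample data is to run it in a throwaway database. The sketch below is only an illustration: it assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL or PostgreSQL, and it uses the question's table name theTable rather than test.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the servers used in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theTable (pid INTEGER PRIMARY KEY, timestamp TEXT, "
    "cost INTEGER NOT NULL, rid INTEGER NOT NULL)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO theTable (pid, timestamp, cost, rid) VALUES (?, ?, ?, ?)",
    [
        (1, "2011-04-14 01:05:07", 1122, 1),
        (2, "2011-04-14 00:05:07", 2233, 1),
        (3, "2011-04-14 01:05:41", 4455, 2),
        (4, "2011-04-14 01:01:11", 5566, 2),
        (5, "2011-04-14 01:06:06", 345, 1),
        (6, "2011-04-13 22:06:06", 543, 2),
        (7, "2011-04-14 01:14:14", 5435, 3),
        (8, "2011-04-14 01:10:13", 6767, 3),
    ],
)

# Step 1 (inner query): latest timestamp per rid.
# Step 2 (outer query): join back to the table to fetch the matching row.
latest_rows = conn.execute(
    """
    SELECT t.pid, t.timestamp, t.rid
    FROM theTable AS t
    INNER JOIN (
        SELECT rid, MAX(timestamp) AS ts
        FROM theTable
        GROUP BY rid
    ) AS maxt
        ON t.rid = maxt.rid AND t.timestamp = maxt.ts
    ORDER BY t.rid
    """
).fetchall()

for row in latest_rows:
    print(row)
```

With the sample data this yields pids 5, 3 and 7, which is the result the question asks for. If two rows of one rid shared the same latest timestamp, both would be returned.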

select *
from (
select `pid`, `timestamp`, `cost`, `rid`
from theTable
order by `timestamp` desc
) as mynewtable
group by mynewtable.`rid`
order by mynewtable.`timestamp`
Hope I helped !

SELECT t.pid, t.cost, t.timestamp, t.rid
FROM test AS t
JOIN (
SELECT rid, MAX(timestamp) AS maxtimestamp
FROM test GROUP BY rid
) AS tmax
ON t.rid = tmax.rid AND t.timestamp = tmax.maxtimestamp

I created an index on rid and timestamp.
SELECT test.pid, test.cost, test.timestamp, test.rid
FROM theTable AS test
LEFT JOIN theTable maxt
ON maxt.rid = test.rid
AND maxt.timestamp > test.timestamp
WHERE maxt.rid IS NULL
Showing rows 0 - 2 (3 total, Query took 0.0104 sec)
This method selects all the desired values from theTable (aliased as test), left joining it to itself (aliased as maxt) on rows with the same rid and a higher timestamp. When a row in test already holds the highest timestamp for its rid, nothing in maxt matches, so every maxt column is NULL for that row, and those are the rows we are looking for. The WHERE clause keeps them by testing maxt.rid IS NULL (any other NOT NULL column of maxt would work as well).
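The self-join described above can be tried on the question's sample data. This is a minimal sketch under stated assumptions: Python's sqlite3 module with an in-memory SQLite database replaces MySQL, and the index name idx_rid_timestamp is hypothetical (the answer only says that an index on rid and timestamp was created).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theTable (pid INTEGER PRIMARY KEY, timestamp TEXT, "
    "cost INTEGER NOT NULL, rid INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO theTable (pid, timestamp, cost, rid) VALUES (?, ?, ?, ?)",
    [
        (1, "2011-04-14 01:05:07", 1122, 1),
        (2, "2011-04-14 00:05:07", 2233, 1),
        (3, "2011-04-14 01:05:41", 4455, 2),
        (4, "2011-04-14 01:01:11", 5566, 2),
        (5, "2011-04-14 01:06:06", 345, 1),
        (6, "2011-04-13 22:06:06", 543, 2),
        (7, "2011-04-14 01:14:14", 5435, 3),
        (8, "2011-04-14 01:10:13", 6767, 3),
    ],
)
# Hypothetical index name; it covers the two columns used in the join.
conn.execute("CREATE INDEX idx_rid_timestamp ON theTable (rid, timestamp)")

# Keep only rows for which no newer row with the same rid exists:
# the LEFT JOIN finds newer rows, and IS NULL keeps the unmatched ones.
anti_join_rows = conn.execute(
    """
    SELECT test.pid, test.cost, test.timestamp, test.rid
    FROM theTable AS test
    LEFT JOIN theTable AS maxt
        ON maxt.rid = test.rid
       AND maxt.timestamp > test.timestamp
    WHERE maxt.rid IS NULL
    ORDER BY test.rid
    """
).fetchall()

for row in anti_join_rows:
    print(row)
```

The three rows that come back are pids 5, 3 and 7, matching the row count reported with the query.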

You could also have subqueries like that:
SELECT ( SELECT MIN(t2.pid)
FROM test t2
WHERE t2.rid = t.rid
AND t2.timestamp = maxtimestamp
) AS pid
, MAX(t.timestamp) AS maxtimestamp
, t.rid
FROM test t
GROUP BY t.rid
But this way, you'll need one more subquery for each extra column, such as cost, that you want in the result.
So the GROUP BY and JOIN approach is the better solution.

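If you want to avoid a JOIN, you can use the following, provided that the highest pid for each rid is also its most recent row (in the sample data this fails for rid 2, where pid 6 has the oldest timestamp):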
SELECT pid, rid FROM theTable t1 WHERE t1.pid IN ( SELECT MAX(t2.pid) FROM theTable t2 GROUP BY t2.rid);

Try:
select pid,cost, timestamp, rid from theTable order by timestamp DESC limit 2;

Related

Finding the entry with the most occurrences per group

I have the following (simplified) Schema.
CREATE TABLE TEST_Appointment(
Appointment_id INT AUTO_INCREMENT PRIMARY KEY,
Property_No INT NOT NULL,
Property_Type varchar(10) NOT NULL
);
INSERT INTO TEST_Appointment(Property_No, Property_Type) VALUES
(1, 'House'),
(1, 'House'),
(1, 'House'),
(2, 'Flat'),
(2, 'Flat'),
(3, 'Flat'),
(4, 'House'),
(5, 'House'),
(6, 'Studio');
I am trying to write a query to get the properties that have the most appointments in each property type group. An example output would be:
Property_No | Property_Type | Number of Appointments
-----------------------------------------------------
1 | House | 3
2 | Flat | 2
6 | Studio | 1
I have the following query to get the number of appointments per property but I am not sure how to go from there
SELECT Property_No, Property_Type, COUNT(*)
from TEST_Appointment
GROUP BY Property_Type, Property_No;
If you are running MySQL 8.0, you can use aggregation and window functions:
select *
from (
select property_no, property_type, count(*) no_appointments,
rank() over(partition by property_type order by count(*) desc) rn
from test_appointment
group by property_no, property_type
) t
where rn = 1
In earlier versions, one option uses a having clause and a row-limiting correlated subquery:
select property_no, property_type, count(*) no_appointments
from test_appointment t
group by property_no, property_type
having count(*) = (
select count(*)
from test_appointment t1
where t1.property_type = t.property_type
group by t1.property_no
order by count(*) desc
limit 1
)
Note that both queries allow ties, if any.
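The tie behaviour can be seen by running the HAVING variant on the question's data. The following is a rough sketch, assuming Python's sqlite3 module with an in-memory SQLite database in place of MySQL. The helper top_properties and the extra appointment inserted for property 3 are hypothetical and exist only to create a tie.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TEST_Appointment ("
    "Appointment_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Property_No INT NOT NULL, Property_Type VARCHAR(10) NOT NULL)"
)
conn.executemany(
    "INSERT INTO TEST_Appointment (Property_No, Property_Type) VALUES (?, ?)",
    [(1, "House"), (1, "House"), (1, "House"), (2, "Flat"), (2, "Flat"),
     (3, "Flat"), (4, "House"), (5, "House"), (6, "Studio")],
)

# Pre-8.0 variant from the answer: keep each property whose appointment
# count equals the highest count found within its property type.
TOP_PER_TYPE = """
    SELECT property_no, property_type, COUNT(*) AS no_appointments
    FROM test_appointment AS t
    GROUP BY property_no, property_type
    HAVING COUNT(*) = (
        SELECT COUNT(*)
        FROM test_appointment AS t1
        WHERE t1.property_type = t.property_type
        GROUP BY t1.property_no
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )
    ORDER BY property_no
"""


def top_properties(connection):
    """Hypothetical helper: run the query and return all rows."""
    return connection.execute(TOP_PER_TYPE).fetchall()


before_tie = top_properties(conn)

# Hypothetical extra row: property 3 now has two 'Flat' appointments,
# the same number as property 2.
conn.execute(
    "INSERT INTO TEST_Appointment (Property_No, Property_Type) VALUES (3, 'Flat')"
)
after_tie = top_properties(conn)

print(before_tie)
print(after_tie)
```

Before the extra row, the query returns one property per type. Afterwards, properties 2 and 3 are both reported for 'Flat', because they share the highest count.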

Can you get a different column from a row with a MIN or MAX value?

I'm building an application with millions of rows, so I'm trying to avoid JOIN whenever possible. I have a table like this:
ID category value_1 value_2
1 1 2.2432 5.4321
2 2 6.5423 5.1203
3 1 8.8324 7.4938
4 2 0.4823 9.8244
5 2 7.2456 3.1278
6 1 1.9348 4.4421
I'm trying to retrieve value_1 from the row with the lowest ID and value_2 from the row with the highest ID while grouped by category, like this:
category value_1 value_2
1 2.2432 4.4421
2 6.5423 3.1278
Is this possible in an effective way while avoiding constructs like string operations and JOIN?
Thank you!
Try this:
SELECT
category,
(
SELECT t2.value1
FROM table1 t2
WHERE t2.id = MIN(t1.id)
) as value1,
(
SELECT t3.value2
FROM table1 t3
WHERE t3.id = MAX(t1.id)
) as value2
FROM
table1 t1
GROUP BY
category
;
Create and fill table:
CREATE TABLE `table1` (
`id` INT NOT NULL,
`category` INT NULL,
`value1` DOUBLE NULL,
`value2` DOUBLE NULL,
PRIMARY KEY (`id`)
);
INSERT INTO table1 VALUES
(1, 1, 2.2432, 5.4321),
(2, 2, 6.5423, 5.1203),
(3, 1, 8.8324, 7.4938),
(4, 2, 0.4823, 9.8244),
(5, 2, 7.2456, 3.1278),
(6, 1, 1.9348, 4.4421);
Output:
1 2.2432 4.4421
2 6.5423 3.1278
One approach which avoids joins is to use ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY ID) rn_min,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY ID DESC) rn_max
FROM yourTable
)
SELECT
category,
MAX(CASE WHEN rn_min = 1 THEN value_1 END) AS value_1,
MAX(CASE WHEN rn_max = 1 THEN value_2 END) AS value_2
FROM cte
GROUP BY
category;
Edit:
The above query should benefit from the following index:
CREATE INDEX idx ON yourTable (category, ID);
This should substantially speed up the row number operations.

get rows from a table where value of field x is maximum

I have two tables myTable and myTable2 in a mysql database:
CREATE TABLE myTable (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
number INT,
version INT,
date DATE
) ENGINE MyISAM;
INSERT INTO myTable
(`id`, `number`, `version`, `date`)
VALUES
(1, '123', '1', '2016-01-12'),
(2, '123', '2', '2016-01-13'),
(3, '124', '1', '2016-01-14'),
(4, '124', '2', '2016-01-15'),
(5, '124', '3', '2016-01-16'),
(6, '125', '1', '2016-01-17')
;
CREATE TABLE myTable2 (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
myTable_id INT
) ENGINE MyISAM;
INSERT INTO myTable2
(`id`, `myTable_id`)
VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 2),
(5, 3),
(6, 3),
(7, 4),
(8, 4),
(9, 4),
(10, 5),
(11, 6)
;
The field myTable2.myTable_id is a foreign key of myTable.Id.
I would like to get all the rows from myTable where myTable2.myTable_id = myTable.Id and the value of the field version in myTable is the maximum for every corresponding value for the field number in myTable.
I tried something like this:
SELECT
*
FROM
myTable,
myTable2
WHERE
myTable.version = (SELECT MAX(myTable.version) FROM myTable)
But the above query does not return the correct data. The correct query should output this:
Id number version date
2 123 2 2016-01-13
5 124 3 2016-01-16
6 125 1 2016-01-17
Please help!
One way to do this is to get the max version for each number in myTable in a derived table and join with that:
SELECT DISTINCT
m.*
FROM
myTable m
JOIN
myTable2 m2 ON m.id = m2.myTable_id
JOIN
(
SELECT number, MAX(version) AS max_version
FROM myTable
GROUP BY number
) AS derived_table
ON m.number = derived_table.number
AND m.version = derived_table.max_version
With your sample data this produces a result like this:
id number version date
6 125 1 2016-01-17
5 124 3 2016-01-16
2 123 2 2016-01-13
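To see why the DISTINCT matters in the derived-table query, it can be run on the sample data with and without it. This is a minimal sketch, assuming Python's sqlite3 module with an in-memory SQLite database instead of MySQL. The ORDER BY m.id is an addition to make the output order deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, number INT, "
    "version INT, date TEXT)"
)
conn.execute(
    "CREATE TABLE myTable2 (id INTEGER PRIMARY KEY, myTable_id INT)"
)
conn.executemany(
    "INSERT INTO myTable (id, number, version, date) VALUES (?, ?, ?, ?)",
    [
        (1, 123, 1, "2016-01-12"),
        (2, 123, 2, "2016-01-13"),
        (3, 124, 1, "2016-01-14"),
        (4, 124, 2, "2016-01-15"),
        (5, 124, 3, "2016-01-16"),
        (6, 125, 1, "2016-01-17"),
    ],
)
conn.executemany(
    "INSERT INTO myTable2 (id, myTable_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3),
     (7, 4), (8, 4), (9, 4), (10, 5), (11, 6)],
)

# Shared FROM clause: join to myTable2 on the foreign key, then to a
# derived table holding the highest version per number.
FROM_CLAUSE = """
    FROM myTable m
    JOIN myTable2 m2 ON m.id = m2.myTable_id
    JOIN (
        SELECT number, MAX(version) AS max_version
        FROM myTable
        GROUP BY number
    ) AS derived_table
        ON m.number = derived_table.number
       AND m.version = derived_table.max_version
    ORDER BY m.id
"""

with_distinct = conn.execute("SELECT DISTINCT m.*" + FROM_CLAUSE).fetchall()
without_distinct = conn.execute("SELECT m.*" + FROM_CLAUSE).fetchall()

print(with_distinct)
print(len(without_distinct))
```

Without DISTINCT the row with id 2 appears twice, because myTable2 references it from two rows. With DISTINCT each qualifying myTable row is returned once.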
Your query is logically wrong. Here is the correct one:
SELECT DISTINCT
myTable.*
FROM
myTable,
myTable2
WHERE
(myTable.version, myTable.number) IN
(SELECT MAX(version), number FROM myTable GROUP BY number)
AND myTable.id = myTable2.myTable_id
Here is the sqlfiddle http://sqlfiddle.com/#!9/74a67/4/0
This is the query posted for the previous edited question
SELECT * FROM myTable
inner join myTable2 on myTable.id = myTable2.mytable_id
WHERE (version, number) in
(SELECT MAX(version), number FROM myTable group by number)
Try this solution, which simply uses a subquery:
# Selecting desired result..
SELECT t1.id, t1.number, t1.version, t1.date
FROM myTable As t1 JOIN
# subquery to select the max version and its corresponding
# number from myTable
(SELECT number, max(version) As max_ver FROM myTable
GROUP BY number
) As t2 ON t1.number = t2.number and t1.version = t2.max_ver
# Now checking for foreign key..
WHERE t1.id IN (SELECT mytable_id FROM myTable2);
Was it helpful?

MySQL COUNT(*) not counting result rows

Simplified schema of m:n relation implementing a subscription model:
CREATE TABLE c (
id INT(11) PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(32)
) ENGINE=MyISAM CHARACTER SET=UTF8;
CREATE TABLE t (
id INT(11) PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(32)
) ENGINE=MyISAM CHARACTER SET=UTF8;
CREATE TABLE c2t (
id INT(11) PRIMARY KEY AUTO_INCREMENT,
cid INT(11) NOT NULL,
tid INT(11) NOT NULL,
dateStart DATE NULL,
dateEnd DATE NULL
) ENGINE=MyISAM CHARACTER SET=UTF8;
INSERT INTO c (name) VALUES ('mike'),('carl'),('suzy');
INSERT INTO t (name) VALUES ('plan1'),('plan2'),('plan3'),('plan4');
INSERT INTO c2t (cid, tid, dateStart, dateEnd) VALUES
(1, 1, '2014-01-01', '2014-07-31'),
(1, 2, '2014-08-01', '2015-07-31'),
(1, 1, '2015-08-01', null),
(1, 3, '2015-09-01', null),
(2, 1, '2014-01-01', '2015-07-31'),
(2, 2, '2015-08-01', '2015-09-30'),
(2, 3, '2015-09-30', null),
(3, 1, '2014-01-01', '2014-12-31'),
(3, 2, '2014-01-01', '2014-12-31'),
(3, 3, '2015-01-01', '2015-10-31'),
(3, 4, '2015-01-01', '2015-10-31');
I've developed a query to find the c's who have active subscriptions of t's:
SELECT c.*
FROM c
LEFT JOIN c2t ON c.id = c2t.cid
AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
GROUP BY c2t.cid
HAVING COUNT(c2t.id) > 0;
Result as expected:
id name
1 mike
2 carl
The problem arises when I try to count the result rows. The query is almost identical, I've just dropped in a COUNT(*):
SELECT COUNT(*)
FROM c
LEFT JOIN c2t ON c.id = c2t.cid
AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
GROUP BY c2t.cid
HAVING COUNT(c2t.id) > 0;
Result:
`COUNT(*)`
2
1
The expected result would be a single row containing the number of rows found (2). I can only assume that the GROUP BY is interfering, but I have no idea how to work around it. Explanations are most welcome.
Wrap everything in a subquery and use COUNT in the outer query:
SELECT COUNT(*)
FROM (
SELECT c.*
FROM c
LEFT JOIN c2t ON c.id = c2t.cid
AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
GROUP BY c2t.cid
HAVING COUNT(c2t.id) > 0
) AS sub
If the only thing you want returned is the number of c's who have active subscriptions, then you can simplify your query like this:
SELECT COUNT(DISTINCT c.id) AS cnt
FROM c
INNER JOIN c2t ON c.id = c2t.cid
AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
So, INNER JOIN is used in place of LEFT JOIN: there is no need to return c's with no matches in c2t, since these are not going to have any active subscriptions.
Also, there is no need to GROUP BY: the query returns just one row with the number of c's.
Finally, DISTINCT must be used in COUNT so as to avoid counting duplicate c.id values more than once.
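The effect of DISTINCT here can be checked on the sample data. The sketch below makes several assumptions: Python's sqlite3 module with an in-memory SQLite database replaces MySQL, NOW() is replaced by a fixed, hypothetical reference date bound as :as_of so the result is reproducible, and open-ended subscriptions are capped with '9999-12-31' instead of DATE_ADD. The table t is omitted because the count does not need it.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c (id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(32))"
)
conn.execute(
    "CREATE TABLE c2t (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "cid INT NOT NULL, tid INT NOT NULL, dateStart TEXT, dateEnd TEXT)"
)
conn.executemany("INSERT INTO c (name) VALUES (?)",
                 [("mike",), ("carl",), ("suzy",)])
conn.executemany(
    "INSERT INTO c2t (cid, tid, dateStart, dateEnd) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2014-01-01", "2014-07-31"),
        (1, 2, "2014-08-01", "2015-07-31"),
        (1, 1, "2015-08-01", None),
        (1, 3, "2015-09-01", None),
        (2, 1, "2014-01-01", "2015-07-31"),
        (2, 2, "2015-08-01", "2015-09-30"),
        (2, 3, "2015-09-30", None),
        (3, 1, "2014-01-01", "2014-12-31"),
        (3, 2, "2014-01-01", "2014-12-31"),
        (3, 3, "2015-01-01", "2015-10-31"),
        (3, 4, "2015-01-01", "2015-10-31"),
    ],
)

# INNER JOIN keeps only customers with at least one active subscription.
# ISO date strings compare correctly as plain text.
ACTIVE_JOIN = """
    FROM c
    INNER JOIN c2t ON c.id = c2t.cid
        AND :as_of BETWEEN COALESCE(dateStart, '0000-00-00')
                       AND COALESCE(dateEnd, '9999-12-31')
"""
params = {"as_of": "2016-01-01"}  # hypothetical reference date

distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT c.id)" + ACTIVE_JOIN, params
).fetchone()[0]
row_count = conn.execute(
    "SELECT COUNT(*)" + ACTIVE_JOIN, params
).fetchone()[0]

print(distinct_count, row_count)
```

On the reference date mike has two open-ended subscriptions and carl has one, so COUNT(*) counts three joined rows while COUNT(DISTINCT c.id) gives the two customers.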

MySQL Query for finding a "LAST" row, based on two fields

I have the following MySQL table to log the registration status changes of pupils:
CREATE TABLE `pupil_registration_statuses` (
`status_id` INT(11) NOT NULL AUTO_INCREMENT,
`status_pupil_id` INT(10) UNSIGNED NOT NULL,
`status_status_id` INT(10) UNSIGNED NOT NULL,
`status_effectivedate` DATE NOT NULL,
PRIMARY KEY (`status_id`),
INDEX `status_pupil_id` (`status_pupil_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Example data:
INSERT INTO `pupil_registration_statuses` (`status_id`, `status_pupil_id`, `status_status_id`, `status_effectivedate`) VALUES
(1, 123, 1, '2013-05-06'),
(2, 123, 2, '2014-03-15'),
(3, 123, 5, '2013-03-15'),
(4, 123, 6, '2013-05-06'),
(5, 234, 2, '2013-02-02'),
(6, 234, 4, '2013-04-17'),
(7, 345, 2, '2014-02-01'),
(8, 345, 3, '2013-06-01');
It is possible for statuses to be inserted out of order, so the sequence of dates does not necessarily follow the sequence of IDs.
For example: status_id 1 might have a date of 2013-05-06, but status_id 3 might have a date of 2013-03-15.
status_id values are, however, sequential within any particular date. Thus if a pupil's registration status changes multiple times on one day, the last row will reflect their status for that date.
It is necessary to find out a particular student's registration status on a particular date. The following query works for an individual pupil:
SELECT *
FROM pupil_registration_statuses
WHERE status_pupil_id = 123
AND status_effectivedate <= '2013-05-06'
ORDER BY status_effectivedate DESC, status_id DESC
LIMIT 1;
This returns the expected row of status_id = 4
However, I now need to issue a (single) query to return the status for all pupils on a particular date.
The following query is proposed, but doesn't obey the "last status_id in a day" requirement:
SELECT *
FROM pupil_registration_statuses prs
INNER JOIN (SELECT status_pupil_id, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
GROUP BY status_pupil_id) qprs ON prs.status_pupil_id = qprs.status_pupil_id AND prs.status_effectivedate = qprs.last_date;
This query, however, returns 2 rows for pupil 123.
EDIT
To clarify, if the input is the date '2013-05-06', I expect to get the rows 4 and 6 from the query.
http://sqlfiddle.com/#!2/68ee6/2
Is this what you're after?
SELECT a.*
FROM pupil_registration_statuses a
JOIN
( SELECT prs.status_pupil_id
, MAX(prs.status_id) max_status_id
FROM pupil_registration_statuses prs
JOIN
( SELECT status_pupil_id
, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
GROUP
BY status_pupil_id
) qprs
ON prs.status_pupil_id = qprs.status_pupil_id
AND prs.status_effectivedate = qprs.last_date
GROUP
BY prs.status_pupil_id
) b
ON b.max_status_id = a.status_id;
http://sqlfiddle.com/#!2/68ee6/7
(Incidentally, there's an ugly and undocumented hack for this kind of problem which goes something like this:
SELECT x.* FROM (SELECT * FROM prs WHERE status_effectivedate <= '2013-05-06' ORDER BY status_pupil_id, status_effectivedate DESC, status_id)x GROUP BY status_pupil_id;
...but I didn't tell you that! ;) )
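One more way to apply the "latest date, then highest status_id" rule to every pupil is to reuse the question's single-pupil query as a correlated subquery. This is only a sketch, not a method taken from the answers: it assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL, and the cutoff date is bound as a parameter.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pupil_registration_statuses ("
    "status_id INTEGER PRIMARY KEY, status_pupil_id INT NOT NULL, "
    "status_status_id INT NOT NULL, status_effectivedate TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO pupil_registration_statuses VALUES (?, ?, ?, ?)",
    [
        (1, 123, 1, "2013-05-06"),
        (2, 123, 2, "2014-03-15"),
        (3, 123, 5, "2013-03-15"),
        (4, 123, 6, "2013-05-06"),
        (5, 234, 2, "2013-02-02"),
        (6, 234, 4, "2013-04-17"),
        (7, 345, 2, "2014-02-01"),
        (8, 345, 3, "2013-06-01"),
    ],
)

# For each row, the subquery picks that pupil's winning status_id:
# latest date on or before the cutoff, ties broken by highest status_id.
# A row is kept only if it is that winner.
status_rows = conn.execute(
    """
    SELECT a.*
    FROM pupil_registration_statuses AS a
    WHERE a.status_id = (
        SELECT b.status_id
        FROM pupil_registration_statuses AS b
        WHERE b.status_pupil_id = a.status_pupil_id
          AND b.status_effectivedate <= ?
        ORDER BY b.status_effectivedate DESC, b.status_id DESC
        LIMIT 1
    )
    ORDER BY a.status_id
    """,
    ("2013-05-06",),
).fetchall()

for row in status_rows:
    print(row)
```

For the cutoff '2013-05-06' this returns status_id 4 for pupil 123 and status_id 6 for pupil 234. Pupil 345 has no status on or before that date and is therefore absent.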
If I understood right, you want to...
1) Get 1 row per person.
2) Get the status changes from the specific day you manually input.
3) Get the last status changes from within the specific day.
If that's right, you need the query you already have, ordering by date and then by id, just with a DISTINCT ON (PostgreSQL syntax; MySQL has no DISTINCT ON) and without the filter on a single pupil.
SELECT DISTINCT ON (status_pupil_id) *
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
ORDER BY status_pupil_id, status_effectivedate DESC, status_id DESC
I have changed the WHERE clause; please try it.
SELECT *
FROM pupil_registration_statuses prs
INNER JOIN (SELECT status_pupil_id, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE Datediff(status_effectivedate, '2013-05-06') <= 0
GROUP BY status_pupil_id) qprs ON prs.status_pupil_id = qprs.status_pupil_id AND prs.status_effectivedate = qprs.last_date;
EDIT
Try this:
SELECT *
FROM
(
select status_pupil_id, max(status_id) as status_id from pupil_registration_statuses innr
-- where Datediff('2013-05-06', status_effectivedate) >= 0
group by status_pupil_id
) as ca
inner join pupil_registration_statuses prs on prs.status_id = ca.status_id
where Datediff('2013-05-06', prs.status_effectivedate) >= 0