One row per group with multiple column sorting - mysql

I would like to return one row per group, where that row is chosen by sorting on multiple columns. Treading lightly here in the land of greatest-n-per-group to avoid a duplicate question.
SCHEMA:
CREATE TABLE logs (
id INT NOT NULL,
ip_address INT NOT NULL,
status INT NOT NULL,
PRIMARY KEY (id)
);
DATA:
INSERT INTO logs (id, ip_address, status)
VALUES ('1', 19216800, 1),
('2', 19216801, 2),
('3', 19216800, 2),
('4', 19216803, 0),
('5', 19216804, 0),
('6', 19216803, 0),
('7', 19216804, 1);
CURRENT QUERY:
SELECT *
FROM logs
ORDER BY ip_address, status=1 DESC, id DESC
Note: sorting by status=1 effectively turns the status column into a boolean. The tie breaker after status=1 is id. This query currently returns the correct row for each ip_address first and then a bunch of other rows I don't want for that ip_address.
CURRENT OUTPUT:
1, 19216800, 1
3, 19216800, 2
2, 19216801, 2
6, 19216803, 0
4, 19216803, 0
7, 19216804, 1
5, 19216804, 0
WANTED OUTPUT:
1, 19216800, 1
2, 19216801, 2
6, 19216803, 0
7, 19216804, 1
Today my workaround is to filter in PHP with if ($lastIP == $row['ip_address']) continue;. But I would like to move this logic to MySQL.

Try this -
SELECT MIN(id), ip_address, status
FROM logs
GROUP BY ip_address, status

Since there are already hundreds of solutions for greatest-n-per-group problems in MySQL, I'm going to start answering these questions with CTE syntax with window functions, since that is now available in MySQL 8.0.3.
WITH sorted AS (
SELECT id, ip_address, status,
ROW_NUMBER() OVER (PARTITION BY ip_address ORDER BY status = 1 DESC, id DESC) AS rn
FROM logs
)
SELECT * FROM sorted WHERE rn = 1;

Here is a different way to think about the problem. You want to find the "best" row for each ip_address. In other words, you want to select the rows for which no better row exists.
This solution works for MySQL versions before 8.0. In other words, it works with the version you already have installed with RHEL 7. You can extend this technique easily for an arbitrary number of sort columns.
SELECT a.*
FROM (SELECT * FROM logs) a
LEFT JOIN (SELECT * FROM logs) b
ON (b.ip_address = a.ip_address AND (b.status=1) > (a.status=1))
OR (b.ip_address = a.ip_address AND (b.status=1) = (a.status=1) AND b.id > a.id)
WHERE b.id IS NULL
ORDER BY a.ip_address
If you have more columns to sort by, keep adding OR clauses to handle the tie breaks and select the "best" row for each ip_address. Regardless of how complicated your subquery is or how many "ORDER BY" conditions you have, you will only need one LEFT JOIN with this technique.
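As a rough check of the "no better row exists" idea, here is a minimal sketch that runs the same LEFT JOIN against the question's sample data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and folds the two OR branches under a single ip_address condition; the variable names are illustrative only.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs ("
    "id INTEGER PRIMARY KEY, ip_address INTEGER NOT NULL, status INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO logs (id, ip_address, status) VALUES (?, ?, ?)",
    [
        (1, 19216800, 1), (2, 19216801, 2), (3, 19216800, 2),
        (4, 19216803, 0), (5, 19216804, 0), (6, 19216803, 0),
        (7, 19216804, 1),
    ],
)

# A row "a" survives only if no row "b" for the same ip_address beats it:
# first on status = 1, then on the higher id as tie breaker.
query = """
SELECT a.id, a.ip_address, a.status
FROM logs a
LEFT JOIN logs b
  ON b.ip_address = a.ip_address
 AND ((b.status = 1) > (a.status = 1)
      OR ((b.status = 1) = (a.status = 1) AND b.id > a.id))
WHERE b.id IS NULL
ORDER BY a.ip_address
"""
best_rows = conn.execute(query).fetchall()
for row in best_rows:
    print(row)
```

On the sample data this keeps ids 1, 2, 6 and 7, which matches the wanted output in the question.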

Try this:
SELECT
l.`ip_address` , l.`status`
FROM
`logs` l
GROUP BY l.`ip_address`
ORDER BY l.`status` = 1 DESC

Related

Retrieve latest model belonging to a set of users from a MySQL table

I need to retrieve the latest model in a relationship, from a collection of records that belong to a set of users. The best I've come up with is:
SELECT
`answers`.*,
answers.created_at,
`questions`.`survey_id` AS `laravel_through_key`
FROM
`answers`
INNER JOIN
`questions` ON `questions`.`id` = `answers`.`question_id`
WHERE
`questions`.`id` IN (4, 5, 6)
AND `user_id` IN (1 , 2, 3)
group by user_id, question_id
ORDER BY `created_at` DESC
The tables are:
questions
id, text
answers
id, user_id (belongs to a user), question_id (belongs to a question)
users
id, name
For a set of users with IDs 1, 2, 3, I want to retrieve the set of answers to questions with IDs 4, 5, 6, but I only want each user's most recent answer for each question. In other words, there should only be a single answer for each user/question combination. I thought using GROUP BY would do the trick, but I don't get the most recent answer. I'm using Laravel, but the issue is more a SQL one than a Laravel-specific problem.
In MySQL 8 or later you can use window functions to find greatest n per group:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id, question_id ORDER BY created_at DESC) AS rn
FROM answers
WHERE user_id IN (1 , 2, 3) AND question_id IN (4, 5, 6)
)
SELECT *
FROM cte
WHERE rn = 1
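To see what the ROW_NUMBER() approach returns, here is a small sketch with made-up rows. It assumes SQLite 3.25 or later (which supports window functions) as a stand-in for MySQL 8, and the sample data and the simplified answers table are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8 (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, question_id INTEGER, created_at TEXT)"
)
# Hypothetical sample rows; ISO-formatted timestamps sort correctly as text.
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?, ?)",
    [
        (1, 1, 4, "2024-01-01 10:00:00"),
        (2, 1, 4, "2024-01-02 10:00:00"),  # newer answer by user 1 to question 4
        (3, 2, 4, "2024-01-01 09:00:00"),
        (4, 2, 5, "2024-01-03 09:00:00"),  # newer answer by user 2 to question 5
        (5, 2, 5, "2024-01-01 09:00:00"),
        (6, 9, 4, "2024-01-05 09:00:00"),  # user outside the requested set
    ],
)

# Number the answers within each (user, question) pair, newest first,
# then keep only the first row of each pair.
query = """
WITH cte AS (
    SELECT id, user_id, question_id, created_at,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, question_id
               ORDER BY created_at DESC
           ) AS rn
    FROM answers
    WHERE user_id IN (1, 2, 3) AND question_id IN (4, 5, 6)
)
SELECT id, user_id, question_id
FROM cte
WHERE rn = 1
ORDER BY user_id, question_id
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Each user/question pair contributes exactly one row, the one with the latest created_at.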

Group by bucket (with NULL values)

I have the following tables:
entries (id, title, text, duplicate_bucket_id)
duplicate_buckets (id, comment)
So every entry can be in a duplicate bucket. Now I want to get all entries without the duplicates:
SELECT MIN(id) FROM entries GROUP BY duplicate_bucket_id
The problem with this query is that it also groups all the entries without a duplicate_bucket_id to only one entry with NULL.
So I need something like this
(SELECT MIN(id) FROM entries WHERE duplicate_bucket_id IS NOT NULL GROUP BY duplicate_bucket_id)
UNION
(SELECT id FROM entries WHERE duplicate_bucket_id IS NULL)
This query gives me the correct result, but ActiveRecord can't use UNIONs.
Alternatively, I can use this query with a subquery:
SELECT * FROM entries WHERE duplicate_bucket_id IS NULL OR id IN
(SELECT MIN(id) FROM entries WHERE duplicate_bucket_id IS NOT NULL GROUP BY duplicate_bucket_id )
In this query, I must place additional WHERE clauses both inside and outside of the subquery. So the query gets quite complicated, and I don't know yet how to use the Ransack gem with such a query...
The query would be simple if every "entry" were in a "duplicate_bucket", including buckets of size 1 (I could use *SELECT * FROM entries GROUP BY duplicate_bucket_id*). But I want to avoid putting an entry in a duplicate_bucket if the entry doesn't have a duplicate. Is there a simple query (no unions, no subqueries) to get all entries without their duplicates?
Dataset
entries(id, title, text, duplicate_bucket_id)
1, 'My title', 'Bla bla', 1
2, 'Hello', 'Jaha', 1
3, 'Test', 'Bla bla', 1
4, 'Foo', 'Bla', NULL
5, 'Bar1', '', 2
6, 'Bar2', '', 2
duplicate_buckets (id, comment)
1, 'This bucket has 3 entries'
2, 'Bar1 and Bar2 are duplicates!'
Result
1, 'My title', 'Bla bla', 1
4, 'Foo', 'Bla', NULL
5, 'Bar1', '', 2
ANSI/ISO SQL:
select *
from entries as e1
where not exists (select null from entries as e2 where e2.duplicate_bucket_id = e1.duplicate_bucket_id and e2.id < e1.id)
;
MySQL Terrible, Horrible, No Good, Very Bad syntax
select *
from entries
group by coalesce(-duplicate_bucket_id,id)
;
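The NOT EXISTS variant relies on the fact that a comparison with NULL is never true, so entries without a bucket never find a "smaller" sibling and are all kept. Here is a minimal sketch of that behaviour on the question's dataset, assuming SQLite as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "id INTEGER PRIMARY KEY, title TEXT, text TEXT, duplicate_bucket_id INTEGER)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        (1, "My title", "Bla bla", 1),
        (2, "Hello", "Jaha", 1),
        (3, "Test", "Bla bla", 1),
        (4, "Foo", "Bla", None),  # no bucket: NULL never equals NULL
        (5, "Bar1", "", 2),
        (6, "Bar2", "", 2),
    ],
)

# Keep an entry only if no entry in the same bucket has a smaller id.
query = """
SELECT e1.id
FROM entries AS e1
WHERE NOT EXISTS (
    SELECT NULL
    FROM entries AS e2
    WHERE e2.duplicate_bucket_id = e1.duplicate_bucket_id
      AND e2.id < e1.id
)
ORDER BY e1.id
"""
kept_ids = [row[0] for row in conn.execute(query)]
print(kept_ids)
```

The kept ids are 1, 4 and 5, matching the Result listed in the question.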

How come RAND() is messing up in SQL subquery?

My goal is to select a random business and then with that business' id get all of their advertisements. I am getting unexpected results from my query. The number of advertisement rows returned is always what I assume is the value of "SELECT id FROM Business ORDER BY RAND() LIMIT 1". I have 3 businesses and only 1 business that has advertisement rows (5 of them) yet it always displays between 1-3 of the 5 advertisements for the same business.
SELECT * FROM Advertisement WHERE business_id=(SELECT id FROM Business ORDER BY RAND() LIMIT 1) ORDER BY priority
Business TABLE:
Advertisement TABLE:
Data for Advertisement and Business tables:
INSERT INTO `Advertisement` (`id`, `business_id`, `image_url`, `link_url`, `priority`) VALUES
(1, 1, 'http://i64.tinypic.com/2w4ehqw.png', 'https://www.dennys.com/food/burgers-sandwiches/spicy-sriracha-burger/', 1),
(2, 1, 'http://i65.tinypic.com/zuk1w1.png', 'https://www.dennys.com/food/burgers-sandwiches/prime-rib-philly-melt/', 2),
(3, 1, 'http://i64.tinypic.com/8yul3t.png', 'https://www.dennys.com/food/burgers-sandwiches/cali-club-sandwich/', 3),
(4, 1, 'http://i64.tinypic.com/o8fj9e.png', 'https://www.dennys.com/food/burgers-sandwiches/bacon-slamburger/', 4),
(5, 1, 'http://i68.tinypic.com/mwyuiv.png', 'https://www.dennys.com/food/burgers-sandwiches/the-superbird/', 5);
INSERT INTO `Business` (`id`, `name`) VALUES
(1, 'Test Dennys'),
(2, 'Test Business 2'),
(3, 'Test Business 3');
You're assuming your query does something it doesn't do.
(SELECT id FROM Business ORDER BY RAND() LIMIT 1) isn't materialized at the beginning of the query. It's evaluated for each row... so for each row, we're testing whether that business_id matches the result of a newly-executed instance of the subquery. More thorough test data (more than one business included) should reveal this.
You need to materialize the result into a derived table, then join to it.
SELECT a.*
FROM Advertisement a
JOIN (
SELECT (SELECT id
FROM Business
ORDER BY RAND()
LIMIT 1) AS business_id
) b ON b.business_id = a.business_id;
The ( SELECT ... ) x construct creates a temporary table that exists only for the duration of the query and uses the alias x. Such tables can be joined just like real tables.
MySQL calls this a Subquery in the FROM Clause.
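The shape of the derived-table join can be tried out with the sketch below. It assumes SQLite as a stand-in, so it uses RANDOM() instead of MySQL's RAND(), and SQLite's subquery evaluation rules differ from MySQL's; the sketch therefore only illustrates the join itself, not the original bug. The tables are trimmed to the columns the query needs.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Business (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Advertisement (
    id INTEGER PRIMARY KEY, business_id INTEGER, priority INTEGER
);
INSERT INTO Business VALUES
    (1, 'Test Dennys'), (2, 'Test Business 2'), (3, 'Test Business 3');
INSERT INTO Advertisement VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4), (5, 1, 5);
""")

# Pick one random business in a derived table, then join the ads to it.
query = """
SELECT a.id
FROM Advertisement a
JOIN (
    SELECT id AS business_id
    FROM Business
    ORDER BY RANDOM()
    LIMIT 1
) b ON b.business_id = a.business_id
ORDER BY a.priority
"""
# Run the query several times: each run picks a single business, so the
# result is either all 5 ads (business 1) or none (business 2 or 3).
row_counts = [len(conn.execute(query).fetchall()) for _ in range(20)]
print(row_counts)
```

Because the random business is chosen once per query, a run never returns a partial set of advertisements.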
Try following query
SELECT * FROM Advertisement WHERE business_id = (select floor(1 + rand()* (select count(*) from Business)));
To pick the random business first, select the whole row with SELECT * in a derived table instead of selecting the id directly, and then read the id from that result:
SELECT * FROM Advertisement WHERE business_id=(SELECT ID FROM (SELECT * FROM Business ORDER BY RAND() LIMIT 1) as table1)
With your example data, you only get results when the random pick is business 1.

Order results in which it was read in. Using in()

This is my query in mysql:
SELECT school_id, first_name, last_name, email, blog_username, comment_username
FROM table
WHERE user_id IN (100, 3,72) ;
The results show the three user_ids in ascending order. How can I make the results come back in the order in which the ids were listed?
So instead of 3, 72, 100 I want the results to be 100, 3, 72.
Select school_id, first_name, last_name, email, blog_username, comment_username
From table
Where user_id IN ( 100, 3, 72 )
Order By Case
When user_id = 100 Then 1
When user_id = 3 Then 2
When user_id = 72 Then 3
End Asc
Additional explanation:
What is being sought is the ability to order the rows in a custom manner. Said another way, we need to impose a custom ordering on a set of values that do not follow a standard ordering. A CASE expression can be used to do just that. Another way to accomplish the same thing would be:
Select school_id, first_name, last_name, email, blog_username, comment_username
From table
Join (
Select 100 As user_id, 1 As Sort
Union All Select 3, 2
Union All Select 72, 3
) As Seq
On Seq.user_id = table.user_id
Order By Seq.Sort
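When the id list comes from application code, the CASE expression can be generated rather than typed by hand. The following is a minimal sketch under a few assumptions: it uses SQLite as a stand-in for MySQL, a hypothetical users table (the question's table is literally named table), and made-up rows.

```python
import sqlite3

# The ids in the order the caller wants them back.
wanted_order = [100, 3, 72]

# In-memory SQLite database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(3, "Ann"), (5, "Dee"), (72, "Bob"), (100, "Cy")],
)

# Build "IN (?, ?, ?)" and "CASE user_id WHEN ? THEN ? ... END" with
# placeholders only, so the values are bound rather than concatenated.
in_placeholders = ", ".join("?" for _ in wanted_order)
case_arms = " ".join("WHEN ? THEN ?" for _ in wanted_order)
query = (
    f"SELECT user_id FROM users WHERE user_id IN ({in_placeholders}) "
    f"ORDER BY CASE user_id {case_arms} END"
)

# Parameters: the ids for IN, then (id, position) pairs for the CASE arms.
params = list(wanted_order)
for position, user_id in enumerate(wanted_order, start=1):
    params.extend([user_id, position])

ordered_ids = [row[0] for row in conn.execute(query, params)]
print(ordered_ids)
```

The rows come back as 100, 3, 72, and user 5 is filtered out by the IN clause.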
MySQL has a function FIELD() which is suited for this:
ORDER BY FIELD(user_id, 100, 3, 72 )
http://dev.mysql.com/doc/refman/5.6/en/string-functions.html#function_field
and in general (method is designed for strings but it will work just fine with numbers)
ORDER BY FIELD(field_name, value_1, value_2, ... )
However
It makes your SQL less portable
Ordering like this is much less efficient
Whole code
SELECT school_id, first_name, last_name, email, blog_username, comment_username
FROM table
WHERE user_id IN (100, 3, 72)
ORDER BY FIELD(user_id, 100, 3, 72)
That said, I would consider changing the rest of your code so you do not have to use this method.

How can I find which two rows have timestamps closest to each other?

I'm building a web application for location-based check ins, sort of like a local 4square, but based on RFID tags.
Anyway, each check-in is stored in a MySQL table with a userID and the time of the check-in as a DATETIME column.
Now I'd like to show which users have the closest check-in times between different stations.
Explanation: Let's say user A checked in at 21:43:12 and then again at 21:43:19. He moved between stations in 7 seconds.
There are thousands of check-ins in the database, how do I write SQL to select the users with the two closest check-in times?
Try this:
select
a.id,
b.id,
abs(a.rfid-b.rfid)
from
table1 a,
table1 b
where
a.userID=b.userID
and a.id < b.id
-- and any other conditions to make it a single user
group by
a.id,
b.id,
a.rfid,
b.rfid
order by
abs(a.rfid-b.rfid) asc
limit 1
A really fast solution would involve some precalculation, such as storing the difference between the current and the previous check-in.
In that case you could select what you need quickly (as long as that column is covered by an index).
Without precalculation, you end up with expensive queries that operate over something close to a Cartesian product of the table with itself.
What have you tried? Have you looked at DATEDIFF
http://msdn.microsoft.com/en-us/library/ms189794.aspx
Cheers
--Jocke
First, you want an index on the user and then the timestamp.
Second, you need to use correlated sub-queries to find "the next timestamp".
Then you use GROUP BY to find the smallest interval per user.
SELECT
a.user_id,
MIN(TIMEDIFF(b.timestamp, a.timestamp)) AS min_duration
FROM
checkin AS a
INNER JOIN
checkin AS b
ON b.user_id = a.user_id
AND b.timestamp = (SELECT MIN(timestamp)
FROM checkin
WHERE user_id = a.user_id
AND timestamp > a.timestamp)
GROUP BY
a.user_id
ORDER BY
min_duration
LIMIT
1
If you want to allow for multiple users with the same min_duration, I recommend storing the results (without the LIMIT 1) in a temporary table, then searching that table for all users that share the minimum duration.
Depending on the volume of data, this could be slow. One optimisation would be to cache the results of the TIMEDIFF(). Every time a new checkin is recorded, also calculate and store the duration since the last checkin, maybe using triggers. Having this pre-calculated makes the query simpler and the values indexable.
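To make the "next timestamp" join concrete, here is a small sketch on made-up check-ins. It assumes SQLite as a stand-in for MySQL, so the interval is computed from strftime('%s', ...) instead of TIMEDIFF(), and the column is named ts; the table contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkin (user_id INTEGER, ts TEXT)")
conn.execute("CREATE INDEX idx_checkin_user_ts ON checkin (user_id, ts)")
# Hypothetical check-ins; user 1 moves between two stations in 7 seconds.
conn.executemany(
    "INSERT INTO checkin VALUES (?, ?)",
    [
        (1, "2024-01-01 21:43:12"),
        (1, "2024-01-01 21:43:19"),
        (1, "2024-01-01 21:50:00"),
        (2, "2024-01-01 21:00:00"),
        (2, "2024-01-01 21:00:30"),
    ],
)

# For each check-in "a", join the same user's next check-in "b", then take
# the smallest gap per user and finally the smallest gap overall.
query = """
SELECT a.user_id,
       MIN(CAST(strftime('%s', b.ts) AS INTEGER)
           - CAST(strftime('%s', a.ts) AS INTEGER)) AS min_seconds
FROM checkin AS a
JOIN checkin AS b
  ON b.user_id = a.user_id
 AND b.ts = (SELECT MIN(ts)
             FROM checkin
             WHERE user_id = a.user_id
               AND ts > a.ts)
GROUP BY a.user_id
ORDER BY min_seconds
LIMIT 1
"""
closest = conn.execute(query).fetchone()
print(closest)
```

With this data the query returns user 1 with a gap of 7 seconds, ahead of user 2's 30-second gap.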
I figure you only want to compute the difference between two check-ins if they are two consecutive check-ins of the same person.
create table test (
id int,
person_id int,
checkin datetime);
insert into test (id, person_id, checkin) values (1, 1, now());
insert into test (id, person_id, checkin) values (2, 1, now());
insert into test (id, person_id, checkin) values (3, 2, now());
insert into test (id, person_id, checkin) values (4, 2, now());
insert into test (id, person_id, checkin) values (5, 1, now());
insert into test (id, person_id, checkin) values (6, 2, now());
insert into test (id, person_id, checkin) values (7, 1, now());
select * from (
select a.*,
(select timestampdiff(second, b.checkin, a.checkin)
from test b where b.person_id = a.person_id
and b.checkin < a.checkin
order by b.checkin desc
limit 1
) diff
from test a
where a.person_id = 1
order by a.person_id, a.checkin
) tt
where diff is not null
order by diff asc;
SELECT a.*, b.*
FROM table_name AS a
JOIN table_name AS b
ON a.id != b.id
ORDER BY ABS(TIMESTAMPDIFF(SECOND, a.checkin, b.checkin)) ASC
LIMIT 1
Should do it. Might be a bit laggy as mentioned.