Select most occurring value in MySQL

I'm looking for a way to select the most frequently occurring value, e.g. the user who posted most in each thread:
SELECT MOST_OCCURRING(user_id) FROM thread_posts GROUP BY thread_id
Is there a good way to do this?

If you want the top user on a per-thread basis, I think you can use a nested query: group by thread in the outer query and by user in the inner one:
SELECT tp.thread_id,
       (SELECT p.user_id
          FROM thread_posts p
         WHERE p.thread_id = tp.thread_id  -- correlate on the outer row's thread via table aliases
         GROUP BY p.user_id
         ORDER BY COUNT(*) DESC
         LIMIT 1) AS topUser
  FROM thread_posts tp
 GROUP BY tp.thread_id
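A minimal runnable sketch of the correlated-subquery approach, assuming Python's built-in sqlite3 module as a stand-in for MySQL (the SQL used here is valid in both). The `id` column, the function name `top_poster_per_thread` and the sample rows are hypothetical:

```python
import sqlite3

def top_poster_per_thread(conn):
    # For each thread, the correlated subquery groups that thread's posts by
    # user, sorts by post count and keeps the first user. Ties are broken
    # arbitrarily by the engine.
    sql = """
        SELECT tp.thread_id,
               (SELECT p.user_id
                  FROM thread_posts p
                 WHERE p.thread_id = tp.thread_id
                 GROUP BY p.user_id
                 ORDER BY COUNT(*) DESC
                 LIMIT 1) AS top_user
          FROM thread_posts tp
         GROUP BY tp.thread_id
         ORDER BY tp.thread_id
    """
    return conn.execute(sql).fetchall()

# Hypothetical data: thread 1 -> user 10 posts twice, user 20 once;
# thread 2 -> user 30 posts three times, user 20 once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread_posts (id INTEGER PRIMARY KEY, thread_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO thread_posts (thread_id, user_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 20), (2, 30), (2, 20), (2, 30), (2, 30)],
)
print(top_poster_per_thread(conn))  # [(1, 10), (2, 30)]
```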

This tabulates the occurrences of user_id per thread:
SELECT thread_id, user_id, COUNT(*) as postings
FROM thread_posts
GROUP BY thread_id, user_id
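But you only want the top user for each thread. A HAVING postings = MAX(postings) filter cannot express this: in a single grouped query it would nest one aggregate inside another (MAX(COUNT(*))), which is invalid, and over a derived table it would compare every row with one maximum taken across all threads. Instead, join the per-user counts with the per-thread maximum:
SELECT c.thread_id, c.user_id, c.postings
FROM (
  SELECT thread_id, user_id, COUNT(*) AS postings
  FROM thread_posts
  GROUP BY thread_id, user_id
) AS c  -- MySQL requires an alias on every derived table
JOIN (
  SELECT thread_id, MAX(postings) AS max_postings
  FROM (
    SELECT thread_id, user_id, COUNT(*) AS postings
    FROM thread_posts
    GROUP BY thread_id, user_id
  ) AS c2
  GROUP BY thread_id
) AS m ON m.thread_id = c.thread_id
      AND c.postings = m.max_postings
All users tied for the top count in a thread are returned.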
The HAVING keyword is usually used with an aggregation to keep only the grouped output rows that satisfy the conditions in the HAVING clause.
The HAVING clause differs from the WHERE clause: WHERE filters the input rows of a query before grouping, whereas HAVING filters the grouped output.
Since the HAVING clause filters the grouped output of a query, it must appear after the GROUP BY clause and before ORDER BY.
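The count-then-compare-with-the-thread-maximum idea can be tried with a small sketch. It assumes Python's sqlite3 module as a stand-in for MySQL and uses made-up rows in which two users tie in thread 1; the constant name `TOP_POSTERS_SQL` is hypothetical:

```python
import sqlite3

# Count posts per (thread, user), then keep the rows whose count equals the
# maximum count for that thread. Every user tied for the top spot is kept.
TOP_POSTERS_SQL = """
    SELECT c.thread_id, c.user_id, c.postings
      FROM (SELECT thread_id, user_id, COUNT(*) AS postings
              FROM thread_posts
             GROUP BY thread_id, user_id) AS c
      JOIN (SELECT thread_id, MAX(postings) AS max_postings
              FROM (SELECT thread_id, user_id, COUNT(*) AS postings
                      FROM thread_posts
                     GROUP BY thread_id, user_id) AS c2
             GROUP BY thread_id) AS m
        ON m.thread_id = c.thread_id AND c.postings = m.max_postings
     ORDER BY c.thread_id, c.user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread_posts (thread_id INTEGER, user_id INTEGER)")
# Hypothetical data: users 10 and 20 tie in thread 1; user 30 leads thread 2.
conn.executemany(
    "INSERT INTO thread_posts VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 10), (1, 20), (2, 30), (2, 30), (2, 20)],
)
rows = conn.execute(TOP_POSTERS_SQL).fetchall()
print(rows)  # [(1, 10, 2), (1, 20, 2), (2, 30, 2)]
```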

There are numerous examples if you check the questions under the "greatest-n-per-group" tag. But in this case, you don't define how you want to handle ties: what if two or more users have the same count?
-- MySQL 5.x user-variable ranking. The evaluation order of these assignments
-- is not guaranteed, and MySQL 8.0 deprecates assigning variables inside expressions.
SELECT DISTINCT
       tp.thread_id,
       tp.user_id
  FROM THREAD_POSTS tp
  JOIN (SELECT t.thread_id,
               t.user_id,
               COUNT(t.user_id) AS occurrence,
               CASE
                 WHEN @thread != t.thread_id THEN @rownum := 1
                 ELSE @rownum := @rownum + 1
               END AS rnk,  -- not "rank": RANK is a reserved word in MySQL 8.0
               @thread := t.thread_id
          FROM THREAD_POSTS t
          JOIN (SELECT @rownum := 0, @thread := -1) r
         GROUP BY t.thread_id, t.user_id
         ORDER BY t.thread_id, occurrence DESC) x ON x.thread_id = tp.thread_id
                                                 AND x.user_id = tp.user_id
                                                 AND x.rnk = 1
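On MySQL 8.0 or later, window functions can replace the user variables and make the tie handling explicit. This is a swapped-in technique, not the answer's method; the sketch assumes Python's sqlite3 module (SQLite 3.25+ for window functions) as a stand-in, and the names `top_users`, `rn`, `rk` and the sample rows are hypothetical:

```python
import sqlite3

# Window functions (MySQL 8.0+, SQLite 3.25+) number the per-user counts
# inside each thread, replacing the user-variable trick.
RANKED_SQL = """
    SELECT thread_id, user_id, cnt,
           ROW_NUMBER() OVER (PARTITION BY thread_id
                              ORDER BY cnt DESC, user_id) AS rn,
           RANK()       OVER (PARTITION BY thread_id
                              ORDER BY cnt DESC) AS rk
      FROM (SELECT thread_id, user_id, COUNT(*) AS cnt
              FROM thread_posts
             GROUP BY thread_id, user_id) AS c
"""

def top_users(conn, keep_ties=False):
    # rk = 1 keeps every user tied for the top count in a thread;
    # rn = 1 keeps exactly one, the lowest user_id among those tied.
    column = "rk" if keep_ties else "rn"
    sql = (f"SELECT thread_id, user_id FROM ({RANKED_SQL}) AS r "
           f"WHERE {column} = 1 ORDER BY thread_id, user_id")
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread_posts (thread_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO thread_posts VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 10), (1, 20), (2, 30), (2, 30), (2, 20)],
)
print(top_users(conn))                  # [(1, 10), (2, 30)]
print(top_users(conn, keep_ties=True))  # [(1, 10), (1, 20), (2, 30)]
```

ROW_NUMBER() settles ties by picking one user deterministically, while RANK() returns every tied user; which one is right depends on how ties should be reported.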

Related

How do I write an SQL statement that applies ORDER BY before DISTINCT?

For example, I have an order table with the columns order_id, user_id, create_time, city_id.
Now I want to get each user's most recent order, so basically what I want to do is:
select distinct(`order`.user_id), city_id
from `order`
where city_id != 0
order by create_time desc
But as far as I know, DISTINCT runs before ORDER BY, so only one row per user_id is left by the time ORDER BY runs. How do I make ORDER BY run first?
Have a sub-query that returns each user's most recent create_time. JOIN with that result.
select o1.user_id, o1.city_id
from `order` o1  -- ORDER is a reserved word, so the table name must be quoted
join (select user_id, max(create_time) as newest_create_time
      from `order`
      where city_id != 0
      group by user_id) o2
  on o1.user_id = o2.user_id and o1.create_time = o2.newest_create_time
where o1.city_id != 0
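A runnable check of the join-with-MAX approach, assuming Python's sqlite3 module as a stand-in for MySQL and hypothetical rows. If a user has two qualifying orders with the same newest create_time, both rows are returned:

```python
import sqlite3

# "order" is a reserved word in MySQL and a keyword in SQLite, so it is
# quoted with backticks (SQLite accepts them for MySQL compatibility).
NEWEST_ORDER_SQL = """
    SELECT o1.user_id, o1.city_id
      FROM `order` o1
      JOIN (SELECT user_id, MAX(create_time) AS newest_create_time
              FROM `order`
             WHERE city_id != 0
             GROUP BY user_id) o2
        ON o1.user_id = o2.user_id AND o1.create_time = o2.newest_create_time
     WHERE o1.city_id != 0
     ORDER BY o1.user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `order` (order_id INTEGER, user_id INTEGER, create_time TEXT, city_id INTEGER)")
# Hypothetical rows; ISO-8601 strings sort chronologically, so MAX() works.
conn.executemany(
    "INSERT INTO `order` VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-01-01 10:00:00", 5),
        (2, 1, "2024-02-01 10:00:00", 7),
        (3, 1, "2024-03-01 10:00:00", 0),  # excluded: city_id = 0
        (4, 2, "2024-01-15 09:30:00", 9),
    ],
)
print(conn.execute(NEWEST_ORDER_SQL).fetchall())  # [(1, 7), (2, 9)]
```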

MySQL - SUM() of column in child table (1:n) in a subquery/outer join using LIMIT

I have two tables, PRODUCT and ACTIVITY. Each product can have multiple activities (1:n). Each activity has an (INT) action column. I need to query all the products, each with the SUM(activity.action) of its 10 most recent activities.
My first attempt was to use a sub-query:
select p.*, (select sum(a.action) from activity a
where a.product_id = p.product_id
order by a.timestamp desc limit 10) recent
from product p
However, the result was incorrect. I realized that the sub-query wasn't using the LIMIT and was returning the SUM() of all ACTIVITY records matching the product_id.
My second attempt was to follow the advice here and wrap the sub-query in another select query:
select p.*, (select sum(temp.action) as recent
from (select a.action from activity a
where a.product_id = p.product_id
order by a.timestamp desc limit 10) temp)
from product p
However, I got the error message
Error Code: 1054. Unknown column 'p.product_id' in 'where clause'. I found a related question here and realized that MySQL doesn't let a subquery nested two levels deep refer to an outer alias. I didn't quite follow the answer to that question.
I also tried an outer join
select p.*, sum(temp.action) as recent
from product p
left join
(select a.product_id, a.action from activity a
where a.product_id = p.product_id
order by a.timestamp desc limit 10) temp
on p.product_id= temp.product_id
Again, I ran into the same issues:
The LIMIT is not enforced
The alias is not recognized
How do I fix this?
1- Get distinct products from your product table
2- Get the ten most recent activities for each product
3- Get sums from (2)
4- Join
Take a look at "Using LIMIT within GROUP BY to get N results per group?". It sounds similar to what you need for step 2.
EDIT
I modified the query slightly and tested it on a small dataset. The previous version did not work because the where clause was in the wrong place. See below.
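select t.product_id, sum(t.action) from
(
-- steps 1-3: number each product's activities newest-first with user
-- variables (MySQL 5.x idiom), then keep the first 10 and sum them;
-- LEFT JOIN this result to product (step 4) to keep products with no activity
select product_id, action, timestamp,
@row_num := if(@prev = product_id, @row_num + 1, 1) as row_num, @prev := product_id
from activity
join (select @prev := null, @row_num := 0) as vars
order by product_id, timestamp desc
) as t
where t.row_num <= 10
group by t.product_id;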
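On MySQL 8.0 or later, ROW_NUMBER() removes the need for the user variables, and the join back to product (step 4) can be folded into the same query. This is a swapped-in technique, sketched with Python's sqlite3 module (SQLite 3.25+) as a stand-in; `recent_action_sums`, its `n` parameter and the sample data are hypothetical:

```python
import sqlite3

def recent_action_sums(conn, n=10):
    # ROW_NUMBER() (MySQL 8.0+, SQLite 3.25+) numbers each product's
    # activities newest-first; only the first n join back to product.
    # The LEFT JOIN keeps products without activity, reported as 0.
    sql = """
        SELECT p.product_id, COALESCE(SUM(r.action), 0) AS recent
          FROM product p
          LEFT JOIN (SELECT product_id, action,
                            ROW_NUMBER() OVER (PARTITION BY product_id
                                               ORDER BY timestamp DESC) AS rn
                       FROM activity) r
            ON r.product_id = p.product_id AND r.rn <= ?
         GROUP BY p.product_id
         ORDER BY p.product_id
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE activity (product_id INTEGER, action INTEGER, timestamp INTEGER)")
conn.executemany("INSERT INTO product VALUES (?)", [(1,), (2,), (3,)])
# Hypothetical activity; product 3 has none.
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [(1, 5, 1), (1, 3, 2), (1, 10, 3), (2, 4, 1)],
)
print(recent_action_sums(conn, n=2))  # [(1, 13), (2, 4), (3, 0)]
print(recent_action_sums(conn))       # [(1, 18), (2, 4), (3, 0)]
```

Putting the `rn <= n` condition in the ON clause rather than in WHERE is what keeps product 3 in the output.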

How do I select a row with max count doing a group by

I have a table posts with columns (id, user_name, thread_id).
A user can submit multiple posts for a thread; thread-to-post is one-to-many.
I need to find out who submitted the most posts per thread. So the result would be
MAX(COUNT), user_name, thread_id, with only one row per thread_id.
The table is huge, so I want the query to be as optimized as possible.
You can try the GROUP BY and HAVING clauses:
select t.user_name, t.thread_id , count(*) as max_count
from tbl t
group by t.user_name, t.thread_id
having count(*) = ( select count(*) as ttl
from tbl
where thread_id = t.thread_id
group by user_name
order by ttl desc
limit 1 )
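The correlated HAVING query can be exercised with a small sketch, assuming Python's sqlite3 module as a stand-in for MySQL and hypothetical rows. It shows that tied users are all returned, so a thread can produce more than one row:

```python
import sqlite3

# The HAVING subquery is correlated on t.thread_id, so each (user, thread)
# count is compared with the top count in the same thread.
MAX_PER_THREAD_SQL = """
    SELECT t.user_name, t.thread_id, COUNT(*) AS max_count
      FROM tbl t
     GROUP BY t.user_name, t.thread_id
    HAVING COUNT(*) = (SELECT COUNT(*) AS ttl
                         FROM tbl
                        WHERE thread_id = t.thread_id
                        GROUP BY user_name
                        ORDER BY ttl DESC
                        LIMIT 1)
     ORDER BY t.thread_id, t.user_name
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, user_name TEXT, thread_id INTEGER)")
# Hypothetical posts: alice and bob tie in thread 1; carol leads thread 2.
conn.executemany(
    "INSERT INTO tbl (user_name, thread_id) VALUES (?, ?)",
    [("alice", 1), ("bob", 1), ("alice", 1), ("bob", 1),
     ("carol", 2), ("carol", 2), ("carol", 2), ("alice", 2)],
)
print(conn.execute(MAX_PER_THREAD_SQL).fetchall())
# [('alice', 1, 2), ('bob', 1, 2), ('carol', 2, 3)]
```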
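A variant compares each count with a single, uncorrelated maximum. Note that this is the highest count over all threads, so it returns only the user/thread pairs that tie for the overall maximum, not one row per thread:
select user_name, thread_id, count(*) as max
from tbl t
group by user_name, thread_id
having count(*) = (
select count(*) as cnt /* highest posts-per-user-per-thread count in the whole table */
from tbl
group by user_name, thread_id
order by cnt desc
limit 1
)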
An easy workaround for systems that don't support LIMIT (it has the same whole-table maximum) is:
select user_name, thread_id, count(*) as max
from tbl t
group by user_name, thread_id
having count(*) = (
select max(cnt) from (
select count(*) as cnt /* posts per user per thread */
from tbl
group by user_name, thread_id
) m
)
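To see the effect of the uncorrelated maximum, this sketch runs the LIMIT-free query on hypothetical rows with Python's sqlite3 module standing in for MySQL. Thread 1, whose leaders have only two posts each, disappears from the result:

```python
import sqlite3

# Uncorrelated variant: the subquery returns one maximum taken over every
# (user, thread) pair, so only pairs that reach that global maximum survive.
GLOBAL_MAX_SQL = """
    SELECT user_name, thread_id, COUNT(*) AS max_posts
      FROM tbl t
     GROUP BY user_name, thread_id
    HAVING COUNT(*) = (SELECT MAX(cnt)
                         FROM (SELECT COUNT(*) AS cnt
                                 FROM tbl
                                GROUP BY user_name, thread_id) m)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, user_name TEXT, thread_id INTEGER)")
# Hypothetical posts: alice and bob have 2 each in thread 1; carol has 3 in thread 2.
conn.executemany(
    "INSERT INTO tbl (user_name, thread_id) VALUES (?, ?)",
    [("alice", 1), ("bob", 1), ("alice", 1), ("bob", 1),
     ("carol", 2), ("carol", 2), ("carol", 2), ("alice", 2)],
)
print(conn.execute(GLOBAL_MAX_SQL).fetchall())  # [('carol', 2, 3)]
```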
Suppose you have a table posts with fields id, user_name & thread_id.
If you want to find which user has posted the most on a specific thread, along with their total number of posts, you can use this MySQL query:
SELECT user_name, thread_id, count(thread_id)
FROM posts WHERE thread_id=1 GROUP BY user_name
ORDER BY count(thread_id) DESC LIMIT 1;
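It returns only one row, for the single thread given in the WHERE clause; if several users tie, one of them is picked arbitrarily.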

MySQL - if row is duplicate, return only the first one

I have a MySQL table "results" which has the following fields:
id (PK, AI), user_id (FK), date, score, time
I want to query this table so that it returns the rows sorted by score (descending) and then by time (ascending). So something like:
SELECT * FROM results ORDER BY score DESC, time ASC;
However, after this sorting, if more than one row has the same user_id, I only want to include the highest-ranked one.
How would I do this?
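You can do this with NOT EXISTS, keeping a row only if no other row for the same user ranks higher (higher score, then lower time, then lower id as a final tie-breaker):
SELECT *
FROM results r
WHERE NOT EXISTS (SELECT 1 FROM results r2
                  WHERE r2.user_id = r.user_id
                    AND (r2.score > r.score
                         OR (r2.score = r.score AND r2.time < r.time)
                         OR (r2.score = r.score AND r2.time = r.time AND r2.id < r.id)))
ORDER BY score DESC, time ASC;
An index on results(user_id, score, time) should help the correlated lookup.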
My suggestion: SELECT user_id, MAX(score) AS score FROM results GROUP BY user_id ORDER BY score DESC;
Select the highest score per user_id via MAX() and GROUP BY, then order the records by score descending. Adding a bare time column to this query would return an arbitrary value from each group (or fail under ONLY_FULL_GROUP_BY).
EDIT: If you also need the time for that score, and each user has only one entry with their top score, you can use a subselect to get it:
SELECT user_id, MAX(score) AS score, (
SELECT max(time)
FROM results AS r2
WHERE r2.user_id = r1.user_id
AND r2.score = max(r1.score)
) AS time
FROM results AS r1
GROUP BY user_id
ORDER BY score DESC;
I've managed to get something working at the moment.
SELECT user_id, score, time
FROM results T
WHERE T.score = (
SELECT MAX(T2.score)
FROM results T2
WHERE T2.user_id = T.user_id
)
ORDER BY score DESC, time ASC;
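On MySQL 8.0 or later, a window function can return exactly one row per user in the requested order. This is a swapped-in technique rather than any answer's method; the sketch assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in and hypothetical rows, with the date column left out. Unlike a score = MAX(score) filter, it keeps a single row even when a user has two rows with the same top score:

```python
import sqlite3

# ROW_NUMBER() (MySQL 8.0+, SQLite 3.25+) ranks each user's rows by
# score DESC, time ASC, id; keeping rn = 1 gives exactly one row per user.
BEST_PER_USER_SQL = """
    SELECT id, user_id, score, time
      FROM (SELECT r.*,
                   ROW_NUMBER() OVER (PARTITION BY user_id
                                      ORDER BY score DESC, time ASC, id) AS rn
              FROM results r) AS ranked
     WHERE rn = 1
     ORDER BY score DESC, time ASC
"""

conn = sqlite3.connect(":memory:")
# Simplified hypothetical table: the date column is omitted.
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [(1, 1, 90, 30), (2, 1, 95, 40), (3, 1, 95, 35),
     (4, 2, 80, 20), (5, 2, 80, 25), (6, 3, 95, 50)],
)
print(conn.execute(BEST_PER_USER_SQL).fetchall())
# [(3, 1, 95, 35), (6, 3, 95, 50), (4, 2, 80, 20)]
```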

How to give rank in query according to points

The query below works exactly as I need: every user gets a unique rank (users with the same points should not get the same rank).
SELECT
id,
first_name,
email,
(SELECT
rank
FROM ( SELECT
@rownum:=@rownum+1 rank,
u.id AS user_id,
points
FROM
user_master u, (SELECT @rownum:=0) r
ORDER BY
points
DESC) AS tmp
WHERE
user_id = um.id) AS Rank,
registered_date AS registered,
um.points as Points
FROM
user_master um
ORDER BY
um.id ASC
Now I want to create a view from this query, but it gives me the error message:
View's SELECT contains a subquery in the FROM clause
I also tried first creating a view of user ranks, to combine two different views. The query below ranks users correctly, but when I try to create a view from it:
SELECT
@rownum:=@rownum+1 rank,
id AS user_id,
points
FROM
user_master u, (SELECT @rownum:=0) r
ORDER BY
points
DESC
it gives me the error message:
View's SELECT contains a variable or parameter
Is there any other way to apply a rank in this query? The rank must be unique even when points are equal.
Give this a go:
create view test_view as
select t.id, t.first_name, t.email,
       -- rank = 1 + users with more points + users with equal points and a lower id
       (select sum(case when t1.points > t.points then 1
                        when t1.points = t.points and t1.id < t.id then 1
                        else 0 end)
          from user_master t1) + 1 as `rank`,  -- quoted: RANK is reserved in MySQL 8.0
       t.registered_date as registered,
       t.points as Points
from user_master t
order by points desc;
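The correlated-count ranking can be checked with a small sketch, assuming Python's sqlite3 module as a stand-in for MySQL and a trimmed-down, hypothetical user_master table (email and registered_date omitted). Ids 1 and 3 tie on points, and the lower id gets the better rank:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_master (id INTEGER PRIMARY KEY, first_name TEXT, points INTEGER)")
# Hypothetical users: ids 1 and 3 tie on 50 points.
conn.executemany(
    "INSERT INTO user_master VALUES (?, ?, ?)",
    [(1, "a", 50), (2, "b", 70), (3, "c", 50), (4, "d", 10)],
)
# Rank = 1 + users with more points + users with equal points and a lower id,
# so ties are broken by id and every rank is unique. The view uses neither
# user variables nor a subquery in FROM, which the errors above rejected.
conn.execute("""
    CREATE VIEW test_view AS
    SELECT t.id, t.first_name, t.points,
           (SELECT SUM(CASE WHEN t1.points > t.points THEN 1
                            WHEN t1.points = t.points AND t1.id < t.id THEN 1
                            ELSE 0 END)
              FROM user_master t1) + 1 AS `rank`
      FROM user_master t
""")
print(conn.execute("SELECT id, `rank` FROM test_view ORDER BY id").fetchall())
# [(1, 2), (2, 1), (3, 3), (4, 4)]
```

The correlated subquery scans user_master once per row, so this is quadratic; on MySQL 8.0 ROW_NUMBER() OVER (ORDER BY points DESC, id) would give the same unique ranks.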