How to deduplicate mysql rows but keep max view - mysql

I have MySQL rows like this
id | title | desc | view
1 | i'm a title | i'm a desc | 0
2 | i'm a title | i'm a desc | 0
3 | i'm a title | i'm a desc | 5
4 | i'm a title | i'm a desc | 0
5 | i'm a title | i'm a desc | 0
6 | i'm a title | i'm a desc | 3
8 | i'm a title | i'm a desc | 0
And I would like to keep only
3 | i'm a title | i'm a desc | 5
because this record has the max view and the others are duplicates.

If your data is not too big, you can use delete like this:
delete t from yourtable t join
(select title, `desc`, max(view) as maxview
from yourtable t
group by title, `desc`
) tt
on t.title = tt.title and
t.`desc` = tt.`desc` and
t.view < tt.maxview;
Note: if there are multiple rows with the same maximum number of views, this will keep all of them. Also, desc is a lousy name for a column because it is a SQL (and MySQL) reserved word.
EDIT:
If you have a large amount of data, often it is faster to do the truncate/re-insert approach:
create table temp_t as
select t.*
from yourtable t join
(select title, `desc`, max(view) as maxview
from yourtable t
group by title, `desc`
) tt
on t.title = tt.title and
t.`desc` = tt.`desc` and
t.view = tt.maxview;
truncate table yourtable;
insert into yourtable
select *
from temp_t;
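The rebuild approach can be tried end to end on a toy table. This is a minimal sketch rather than the answerer's exact code: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, so DELETE FROM replaces TRUNCATE TABLE (which SQLite lacks) and the desc column is quoted with double quotes instead of backticks. Row 9 is invented test data that ties with row 3, and the extra WHERE clause (not in the original queries) keeps only the lowest id among tied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE yourtable (id INTEGER PRIMARY KEY, title TEXT, "desc" TEXT, view INTEGER)'
)
views = {1: 0, 2: 0, 3: 5, 4: 0, 5: 0, 6: 3, 8: 0, 9: 5}  # id 9 is a hypothetical tie
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [(i, "i'm a title", "i'm a desc", v) for i, v in views.items()],
)

conn.executescript("""
    -- Copy the rows to keep: max view per (title, desc), lowest id on ties.
    CREATE TABLE temp_t AS
    SELECT t.*
    FROM yourtable t
    JOIN (SELECT title, "desc", MAX(view) AS maxview
          FROM yourtable
          GROUP BY title, "desc") tt
      ON t.title = tt.title AND t."desc" = tt."desc" AND t.view = tt.maxview
    WHERE t.id = (SELECT MIN(t2.id)
                  FROM yourtable t2
                  WHERE t2.title = t.title
                    AND t2."desc" = t."desc"
                    AND t2.view = t.view);
    -- Empty the original table and reload it from the copy.
    DELETE FROM yourtable;
    INSERT INTO yourtable SELECT * FROM temp_t;
    DROP TABLE temp_t;
""")

remaining = conn.execute("SELECT id, view FROM yourtable").fetchall()
print(remaining)
```

Without the tie-break, both id 3 and id 9 would survive, which is the behaviour the note above describes.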

I could not understand what the specific question is. Possible solutions follow:
1) Use UPDATE instead of INSERT in MySQL. Just write UPDATE your_table_name SET view=view+1 with a WHERE clause that selects the row to update
2) Or, if you are using PHP, run a cron job that deletes the duplicate rows with the lower values
3) If INSERT is necessary, use ON DUPLICATE KEY UPDATE. Refer to the documentation: http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

Related

GROUP_CONCAT ORDER BY With LIMIT Unknown column 'p.id' in 'where clause' [duplicate]

This question already has answers here:
GROUP_CONCAT with limit
(7 answers)
Closed 9 months ago.
Here is my table:
+----+---------+------------+----------+
| id | message | projectID | noteType |
+----+---------+------------+----------+
| 1 | 1 | 125 | update |
| 2 | 2 | 125 | update |
| 3 | 3 | 125 | update |
| 4 | 4 | 125 | update |
| 5 | 5 | 125 | update |
| 6 | 6 | 125 | update |
My query using the suggestion below:
SELECT `p`.`id`, `proName`, `p`.`proType`, `p`.`priority`,
`p`.`busSegment`, `p`.`portfolio`, `p`.`description`,
(SELECT group_concat('<li>', `message`, '</li>') AS temp FROM (SELECT
projectID, message FROM notes where projectID = p.id AND noteType =
'update' ORDER BY id DESC LIMIT 3) three_messages GROUP BY projectID) as
updates
FROM `projects` as `p`
WHERE `p`.`id` = 125
Error:
Error Code: 1054. Unknown column 'p.id' in 'where clause'
All records are returned. For some reason, the LIMIT 3 is not working.
The suggested query works by itself:
SELECT group_concat('<li>', `message`, '</li>') AS updates
FROM (
SELECT projectID, message
FROM notes where projectID = 125 AND noteType = 'update'
ORDER BY id DESC
LIMIT 3
) three_messages
GROUP BY projectID;
+---------------------+
| updates |
+---------------------+
| 6,5,4 |
+---------------------+
I'm guessing you have more than just project id = 125 in your real query, and for each one you want 3 results. In that case you may need to do some kind of ranking. I am doing it here with the row_number() function (available in MySQL 8.0 and later).
Here is the fiddle: https://www.db-fiddle.com/f/uqSzoth466RX86c5dCkfDR/0
with t as(select a.*,
row_number() over
(partition by project_id order by message desc) as rn
from mytable a
where note_type = 'update')
SELECT project_id,
group_concat('<li>', `message`, '</li>' ORDER BY id DESC) AS updates
from t where rn <=3 group by project_id;
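On MySQL versions before 8.0, which lack row_number(), a similar per-project ranking is often done with a correlated COUNT(*) subquery. The following is a minimal sketch under stated assumptions: Python's built-in sqlite3 stands in for MySQL, the table and column names follow the question's notes table, project 126 is invented test data, and the <li> string is assembled in Python because the ORDER BY and SEPARATOR options of group_concat are MySQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, message TEXT, projectID INTEGER, noteType TEXT)"
)
rows = [(i, str(i), 125, "update") for i in range(1, 7)]
rows += [(7, "7", 126, "update"), (8, "8", 126, "comment")]  # hypothetical second project
conn.executemany("INSERT INTO notes VALUES (?, ?, ?, ?)", rows)

# Keep an update note only if fewer than 3 newer update notes exist
# for the same project, i.e. it is among the 3 newest.
top3 = conn.execute("""
    SELECT n.projectID, n.message
    FROM notes n
    WHERE n.noteType = 'update'
      AND (SELECT COUNT(*)
           FROM notes n2
           WHERE n2.projectID = n.projectID
             AND n2.noteType = 'update'
             AND n2.id > n.id) < 3
    ORDER BY n.projectID, n.id DESC
""").fetchall()

# Concatenate the list items per project, newest first, with no separator.
updates = {}
for project_id, message in top3:
    updates[project_id] = updates.get(project_id, "") + f"<li>{message}</li>"
print(updates)
```

The correlated subquery runs once per row, so on a large notes table it would likely need an index on (projectID, noteType, id) to stay reasonable.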
LIMIT applies to the rows after grouping.
You can use it in a subquery and group the results of that subquery:
SELECT group_concat('<li>', `message`, '</li>' ORDER BY id DESC) AS updates
FROM (
SELECT id, projectID, message
FROM notes where projectID = 125 AND noteType = 'update'
ORDER BY id DESC
LIMIT 3
) three_messages
GROUP BY projectID
It does look to me like you want:
group_concat('<li>', `message`, '</li>' ORDER BY id DESC SEPARATOR '')
though, to not put commas between the list items.
I also think it is a bad idea to use id for ordering; if you want newest notes, use a timestamp. Imagine if your database was hacked and you had to restore from a backup. After the restore, you start your system again and notes get added. Then you go through your application logs and recover some number of notes that were lost; if you are imputing order to ids, you have to increase all the ids added after the restore to make room for the recovered ones. It's much better just to be able to insert with a timestamp.

How to get lowest value from posts with the same ID [duplicate]

This question already has answers here:
Retrieving the last record in each group - MySQL
(33 answers)
Closed 2 years ago.
My sample table:
ID | Post_id | Score
1 | 1 | 33
2 | 1 | 43
3 | 1 | 27
4 | 1 | 66
I want to get rows with the lowest value (Score). In this case it is:
ID | Post_id | Score
3 | 1 | 27
My query:
SELECT * FROM table WHERE post_id = '1' GROUP BY post_id ORDER BY Score ASC
But that doesn't work because it returns me: Score: 33
How to fix it? What if I have thousands of rows and want post_id to be unique for the lowest values?
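You must use a subquery that selects the minimum score for each post_id, and join on both post_id and score (joining on the score alone would also match rows from other posts that happen to have the same score):
SELECT a.* FROM records a
JOIN
( SELECT post_id, MIN(score) as min_score
FROM records GROUP BY post_id
) b
ON a.post_id = b.post_id AND a.score = b.min_score;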
Output
| id | post_id | score |
| --- | ------- | ----- |
| 3 | 1 | 27 |
| 5 | 2 | 20 |
View on DB Fiddle
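To see why the join condition needs post_id as well as the score, here is a small sketch that runs both variants. It assumes Python's built-in sqlite3 as a stand-in for MySQL; the rows for post_id 2 are invented, chosen so that post 2 contains a score of 27 that is not its own minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, post_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, 1, 33), (2, 1, 43), (3, 1, 27), (4, 1, 66),
     (5, 2, 20), (6, 2, 27)],  # post_id 2 rows are hypothetical
)

MIN_PER_POST = "SELECT post_id, MIN(score) AS min_score FROM records GROUP BY post_id"

# Joining on the score alone: row 6 matches post 1's minimum of 27.
score_only = [r[0] for r in conn.execute(f"""
    SELECT a.id FROM records a
    JOIN ({MIN_PER_POST}) b ON a.score = b.min_score
    ORDER BY a.id
""")]

# Joining on post_id and score: one minimum row per post.
post_and_score = [r[0] for r in conn.execute(f"""
    SELECT a.id FROM records a
    JOIN ({MIN_PER_POST}) b ON a.post_id = b.post_id AND a.score = b.min_score
    ORDER BY a.id
""")]

print(score_only, post_and_score)
```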
For a single id, just remove the group by and use limit:
SELECT *
FROM table
WHERE post_id = 1
ORDER BY Score ASC
LIMIT 1;
I assume that post_id is a number. Compare numbers to numbers, not to strings.
EDIT:
If you want this per post_id, then just use a correlated subquery:
select t.*
from t
where t.score = (select min(t2.score) from t t2 where t2.post_id = t.post_id);
If you may have multiple rows with the lowest score, you can do it with a sub-query :
SELECT *
FROM test
WHERE post_id = 1
AND score = (
SELECT MIN(score)
FROM test
WHERE post_id = 1
)
Fiddle : https://www.db-fiddle.com/f/3ppntnA77HFpKRU82h32Gv/1
If you're using MySQL v8.0 or later, you can use the ROW_NUMBER() function to rank the rows. That way you can choose the row with the lowest score per post_id and return everything from it:
select
sq.id, sq.post_id, sq.score
from
(select id, post_id, score
, row_number() over (partition by post_id order by score) RowNo
from test) sq
where sq.RowNo = 1
Here is a Fiddle to test the code: https://www.db-fiddle.com/#&togetherjs=8dHSCs50Iq
I also included another post_id besides your sample data, to demonstrate how it handles multiple post_ids.
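The below should do the trick. Note that an aggregate such as min() cannot be used directly in a WHERE clause, so the minimum is computed in a subquery:
Select
id,
score,
Post_id
from
table
where
score = (select min(score) from table);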

Different ORDER BY direction for MySql query results

I am trying to do some ordering on a mysql query that I can't figure out.
id | status | created_at
------------------------
1 | open | 1348778070
2 | closed | 1348711241
3 | open | 1348839204
4 | closed | 1348738073
5 | banned | 1348238422
How do I order the above table so that the 'open' records come first, in ASC order, and the non-open records come second, in DESC order? In other words, how do I get a dynamic second-level ordering direction based on some condition?
I have tried a UNION of two SELECT queries with ordering within them, which doesn't work because UNION by default produces an unordered set of rows.
Also I've tried a pseudo column that subtracts the created_at timestamp from a large number, for the closed status records, so I can just ORDER BY ASC to get the result as per below...
SELECT table.*, (table.created_at) as tmp_order FROM table
WHERE table.status = 'open'
UNION
SELECT table.*, (999999999 - table.created_at) as tmp_order FROM table
WHERE table.status = 'closed'
ORDER BY tmp_order ASC
This works but I feel there has to be a better way. Ideally a solution would not include a random big number as above
UPDATED
SELECT *
FROM tmp_order
ORDER BY FIELD(status, 'open') DESC,
CASE
WHEN status = 'open'
THEN created_at
ELSE (999999999 - created_at)
END
or
SELECT *
FROM tmp_order
ORDER BY FIELD(status, 'open') DESC,
CASE
WHEN status = 'open'
THEN created_at END,
CASE
WHEN status <> 'open'
THEN created_at END DESC
Output:
| ID | STATUS | CREATED_AT |
----------------------------
| 1 | open | 1348778070 |
| 3 | open | 1348839204 |
| 4 | closed | 1348738073 |
| 2 | closed | 1348711241 |
| 5 | banned | 1348238422 |
Here is SQLFiddle demo.
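If the only goal is to drop the arbitrary large constant, negating the timestamp gives the same descending order for the non-open rows. A minimal sketch, assuming Python's built-in sqlite3 as a stand-in for MySQL, a made-up table name tickets, and a signed numeric created_at column; the comparison status <> 'open' replaces FIELD(), which SQLite does not have, and evaluates to 0 for open rows so they sort first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, created_at INTEGER)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [(1, "open", 1348778070), (2, "closed", 1348711241), (3, "open", 1348839204),
     (4, "closed", 1348738073), (5, "banned", 1348238422)],
)

# First key: open rows (0) before everything else (1).
# Second key: created_at ascending for open rows, and negated for the
# rest, which sorts them by created_at descending.
ordered = [row[0] for row in conn.execute("""
    SELECT id
    FROM tickets
    ORDER BY status <> 'open',
             CASE WHEN status = 'open' THEN created_at ELSE -created_at END
""")]
print(ordered)
```

The resulting id order matches the output table shown above.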
Try:
SELECT id, status,
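IF(status = 'open', created_at, 999999999 - created_at) as tmp_order
FROM table
-- open rows first; a plain ORDER BY status would sort 'banned' and 'closed' ahead of 'open'
ORDER BY status = 'open' DESC, tmp_order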
This is how I would solve it:
SELECT
id, status, created_at
FROM
yourtable
ORDER BY
status DESC,
CASE WHEN status='open' THEN created_at END,
CASE WHEN status='closed' THEN created_at END DESC
In your case you can probably do it in one query, but a general approach for any two different and unrelated orderings is to use two unioned subqueries, each with its own ordering. Be aware that MySQL does not guarantee the row order of a UNION, and it may ignore an ORDER BY inside a subquery that has no LIMIT, so verify the output on your version:
SELECT * FROM (
SELECT *
FROM table
WHERE table.status = 'open'
ORDER BY created_at) x
UNION ALL
SELECT * FROM (
SELECT *
FROM table
WHERE table.status = 'closed'
ORDER BY created_at DESC) y

MySQL Group By Consecutive Rows

I have a feed application whose results I am trying to group when they are consecutive.
My table looks like this:
postid | posttype | target | action | date | title | content
1 | userid | NULL | upgrade | 0000-01-00 00:00:00 | Upgraded 1 | exmple
1 | userid | NULL | upgrade | 0000-01-00 00:00:01 | Upgraded 2 | exmple
1 | userid | NULL | downgrade | 0000-01-00 00:00:02 | Downgraded | exmple
1 | userid | NULL | upgrade | 0000-01-00 00:00:03 | Upgraded | exmple
What I would like the outcome to be is:
postid | posttype | target | action | date | title | content
1 | userid | NULL | upgrade | 0000-01-00 00:00:01 | Upgrade 1 | exmple,exmple
1 | userid | NULL | downgrade | 0000-01-00 00:00:02 | Downgraded | exmple
1 | userid | NULL | upgrade | 0000-01-00 00:00:03 | Upgraded | exmple
So as you can see, because Upgraded 1 and Upgraded 2 were sent consecutively, they are grouped together. The "Action" table is a reference, and should be used for the consecutive grouping as well as the postid & posttype.
I looked around on SO but didn't see anything quite like mine. Thanks in advance for any help.
Here's another version that works with MySQL variables and doesn't require nesting three levels deep. The inner query pre-sorts the records by postID and date and assigns them a sequence number that increases any time the post ID, type and/or action changes. From that, it's a simple group by, with no comparing of record version T to T2 to T3. What if you wanted 4 or 5 criteria? Would you have to nest even more entries, or just add 2 more @sql variables to the comparison test?
Your call on which is more efficient...
select
PreQuery.postID,
PreQuery.PostType,
PreQuery.Target,
PreQuery.Action,
PreQuery.Title,
min( PreQuery.Date ) as FirstActionDate,
max( PreQuery.Date ) as LastActionDate,
count(*) as ActionEntries,
group_concat( PreQuery.content ) as Content
from
( select
t.*,
@lastSeq := if( t.action = @lastAction
AND t.postID = @lastPostID
AND t.postType = @lastPostType, @lastSeq, @lastSeq +1 ) as ActionSeq,
@lastAction := t.action,
@lastPostID := t.postID,
@lastPostType := t.PostType
from
t,
( select @lastAction := ' ',
@lastPostID := 0,
@lastPostType := ' ',
@lastSeq := 0 ) sqlVars
order by
t.postid,
t.date ) PreQuery
group by
PreQuery.postID,
PreQuery.ActionSeq,
PreQuery.PostType,
PreQuery.Action
Here's my link to SQLFiddle sample
For the title, you might want to adjust the line...
group_concat( distinct PreQuery.Title ) as Titles,
At least this will give the DISTINCT titles concatenated. Getting the one title associated with the max date per group is much tougher without nesting this entire query one more level.
There is no primary key in your table so for my example I used date. You should create an auto increment value and use that instead of the date in my example.
This is a solution (view on SQL Fiddle):
SELECT
postid,
posttype,
target,
action,
COALESCE((
SELECT MAX(date) -- MAX keeps the subquery to a single value when several later rows qualify
FROM t t2
WHERE t2.postid = t.postid
AND t2.posttype = t.posttype
AND t2.action = t.action
AND t2.date > t.date
AND NOT EXISTS (
SELECT TRUE
FROM t t3
WHERE t3.date > t.date
AND t3.date < t2.date
AND (t3.postid != t.postid OR t3.posttype != t.posttype OR t3.action != t.action)
)
), t.date) AS group_criterion,
MAX(title),
GROUP_CONCAT(content)
FROM t
GROUP BY 1,2,3,4,5
ORDER BY group_criterion
It basically reads:
For each row create a group criterion and in the end group by it.
This criterion is the highest date among the rows that follow the current one and have the same postid, posttype and action, provided that no row with a different postid, posttype or action lies between them.
In other words, the group criterion is the highest occurring date in a group of consecutive entries.
If you use proper indexes it shouldn't be terribly slow but if you have a lot of rows you should think of caching this information.
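When the result set is small, the consecutive grouping can also be done in application code after a plain query ordered by date. This is a different technique from the SQL above, sketched in Python with itertools.groupby, which starts a new group every time the key changes. The hard-coded tuples stand in for fetched rows, the dates are simplified to integers, and taking the first title and the last date of each run is an assumption based on the desired output in the question.

```python
from itertools import groupby

# (postid, posttype, action, date, title, content), already sorted by date
rows = [
    (1, "userid", "upgrade", 0, "Upgraded 1", "exmple"),
    (1, "userid", "upgrade", 1, "Upgraded 2", "exmple"),
    (1, "userid", "downgrade", 2, "Downgraded", "exmple"),
    (1, "userid", "upgrade", 3, "Upgraded", "exmple"),
]


def group_consecutive(rows):
    """Collapse runs of adjacent rows sharing postid, posttype and action."""
    grouped = []
    for key, run in groupby(rows, key=lambda r: (r[0], r[1], r[2])):
        run = list(run)
        grouped.append({
            "postid": key[0],
            "posttype": key[1],
            "action": key[2],
            "date": run[-1][3],    # date of the last row in the run
            "title": run[0][4],    # title of the first row in the run
            "content": ",".join(r[5] for r in run),
        })
    return grouped


feed = group_consecutive(rows)
for entry in feed:
    print(entry)
```

Because groupby only merges adjacent items, the final upgrade stays separate from the first two, which is the behaviour a regular GROUP BY cannot give.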

Sort data before using GROUP BY?

I have read that grouping happens before ordering, is there any way that I can order first before grouping without having to wrap my whole query around another query just to do this?
Let's say I have this data:
id | user_id | date_recorded
1 | 1 | 2011-11-07
2 | 1 | 2011-11-05
3 | 1 | 2011-11-06
4 | 2 | 2011-11-03
5 | 2 | 2011-11-06
Normally, I'd have to do this query in order to get what I want:
SELECT
*
FROM (
SELECT * FROM table ORDER BY date_recorded DESC
) t1
GROUP BY t1.user_id
But I'm wondering if there's a better solution.
Your question is somewhat unclear but I have a suspicion what you really want is not any GROUP aggregates at all, but rather ordering by date first, then user ID:
SELECT
id,
user_id,
date_recorded
FROM tbl
ORDER BY date_recorded DESC, user_id ASC
Here would be the result. Note the reordering by date_recorded compared with your original example:
id | user_id | date_recorded
1 | 1 | 2011-11-07
3 | 1 | 2011-11-06
5 | 2 | 2011-11-06
2 | 1 | 2011-11-05
4 | 2 | 2011-11-03
Update
To retrieve the full latest record per user_id, a JOIN is needed. The subquery (mx) locates the latest date_recorded per user_id, and that result is joined to the full table to retrieve the remaining columns.
SELECT
mx.user_id,
mx.maxdate,
t.id
FROM (
SELECT
user_id,
MAX(date_recorded) AS maxdate
FROM tbl
GROUP BY user_id
) mx JOIN tbl t ON mx.user_id = t.user_id AND mx.maxdate = t.date_recorded
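The join against the per-user maximum can be checked on the sample data from the question. A minimal sketch, assuming Python's built-in sqlite3 as a stand-in for MySQL and dates stored as ISO-formatted text, so that MAX() compares them correctly as strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, user_id INTEGER, date_recorded TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 1, "2011-11-07"), (2, 1, "2011-11-05"), (3, 1, "2011-11-06"),
     (4, 2, "2011-11-03"), (5, 2, "2011-11-06")],
)

# mx holds one row per user with that user's latest date; joining it
# back to tbl on both columns recovers the full latest record.
latest = conn.execute("""
    SELECT t.id, t.user_id, t.date_recorded
    FROM tbl t
    JOIN (SELECT user_id, MAX(date_recorded) AS maxdate
          FROM tbl
          GROUP BY user_id) mx
      ON mx.user_id = t.user_id AND mx.maxdate = t.date_recorded
    ORDER BY t.user_id
""").fetchall()
print(latest)
```

If a user has two records on the same latest date, both rows are returned; a further tie-break (for example on id) would be needed to keep exactly one.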
I am just using this technique:
"Apply the ordering before the GROUP BY by putting an ORDER BY clause inside GROUP_CONCAT"
SELECT SUBSTRING_INDEX(group_concat(cast(id as char)
ORDER BY date_recorded desc),',',1),
user_id,
SUBSTRING_INDEX(group_concat(cast(`date_recorded` as char)
ORDER BY `date_recorded` desc),',',1)
FROM data
GROUP BY user_id