Removing duplicate rows with ROW_NUMBER issue

I have used the following to remove duplicates, but I am not getting the behaviour I expect.
INNER JOIN (SELECT
SLA.*,ROW_NUMBER() OVER (PARTITION BY SLA.WorkItemDimKey ORDER BY SLA.UpdatedBatchId DESC) AS seqnum
FROM dbo.SLAInstanceInformationFactvw SLA) SLA_Remove_Duplicates
ON WorkItemDimvw.WorkItemDimKey = SLA_Remove_Duplicates.WorkItemDimKey
AND SLA_Remove_Duplicates.seqnum =1
I need ORDER BY ... DESC to get the duplicate with the largest value of UpdatedBatchId. This works well for items that have duplicates; however, items that do not have duplicates are also excluded.
If I change the order to ASC (the default), it returns all single items plus the filtered duplicates, but I get the wrong duplicate (the one with the lowest UpdatedBatchId).
I would greatly appreciate any help.
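The numbering pattern itself can be checked in isolation. Below is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for SQL Server, and a hypothetical two-column table named sla in place of dbo.SLAInstanceInformationFactvw. It suggests that ROW_NUMBER() assigns 1 to the only row of a single-row partition whatever the sort direction, so if single items go missing, the cause is probably somewhere else in the full query, which is not shown here.

```python
import sqlite3

# Hypothetical sample data: work item 1 has two SLA rows, work item 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sla (WorkItemDimKey INTEGER, UpdatedBatchId INTEGER);
    INSERT INTO sla VALUES (1, 10), (1, 20), (2, 5);
""")

# Number the rows of each work item, newest batch first, and keep row 1 only.
latest_rows = conn.execute("""
    SELECT WorkItemDimKey, UpdatedBatchId
    FROM (
        SELECT sla.*,
               ROW_NUMBER() OVER (
                   PARTITION BY WorkItemDimKey
                   ORDER BY UpdatedBatchId DESC
               ) AS seqnum
        FROM sla
    ) AS numbered
    WHERE seqnum = 1
    ORDER BY WorkItemDimKey
""").fetchall()

print(latest_rows)  # [(1, 20), (2, 5)]
```

The single-row work item (key 2) survives the seqnum = 1 filter alongside the newest row of the duplicated one.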

Related

MySQL MAX Function mixes rows

I have the query SELECT id, MAX(value) FROM table1 and it returns the correct value, but it takes the first id of the table instead of the one corresponding to the value returned (id is primary key).
I've already seen solutions, but they all needed a WHERE clause, which I can't use in my case.
I believe what you're trying to do is return the id of the row with the max value. Is that right?
I'm curious why you can't use a WHERE clause?
But OK, even with that constraint this can be solved. I'm going to assume that your table is unique on id (if not, you should really talk to whoever built it and ask why).
SELECT id, value
FROM table1
ORDER BY value DESC
LIMIT 1
This will sort your table by value descending (greatest -> least), and then only show the first row (i.e., the row with the largest "value").
If your table is not unique on id, you can still group by id and get the same result:
SELECT id, max(value) as max_value
FROM table1
GROUP BY id
ORDER BY max_value DESC
LIMIT 1
First, to answer why your query is behaving in the way you observe: I suspect you are running without sql_mode = only_full_group_by as your query would likely generate an error otherwise. As you've noticed, this can lead to somewhat odd results.
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.
In this case, since you have no GROUP BY clause, the entire table is effectively the group.
To get one id associated with the largest value in the table, you can select all the rows, order by the value (descending), and then limit to the first result. There is no need for the aggregation operator (or a WHERE clause):
SELECT id, value FROM table1 ORDER BY value DESC LIMIT 1
Note that if there are multiple ids with the (same) max value, this only returns one of them. In the comments, @RaymondNijland points out that this may give a different id each time you run it (the value will always be the maximum), and you can make it deterministic by ordering by id as well:
SELECT id, value FROM table1 ORDER BY value DESC, id ASC LIMIT 1
Likewise, if there are for some reason multiple values for the same ID, it will still return that ID if one of its rows happens to be the max value -- thankfully this doesn't apply in this case, as you mentioned that id is the primary key.
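The tie-breaking behaviour described above can be tried with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and some made-up rows in table1, where ids 2 and 3 share the maximum value.

```python
import sqlite3

# Made-up data: ids 2 and 3 tie for the largest value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO table1 VALUES (1, 7), (2, 42), (3, 42), (4, 3);
""")

# Sort by value (largest first), break ties by the smallest id, keep one row.
top_row = conn.execute(
    "SELECT id, value FROM table1 ORDER BY value DESC, id ASC LIMIT 1"
).fetchone()

print(top_row)  # (2, 42)
```

Without the id ASC tie-breaker, either of the two tied ids could legitimately come back.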
I think you forgot a GROUP BY clause:
SELECT id, MAX(value) FROM table1 GROUP BY id
EDIT: To get what you need, you could do:
SELECT id, MAX(value)
FROM table1
GROUP BY id
HAVING MAX(value) = (SELECT MAX(value) FROM table1)
This could give you multiple results if multiple ids share the max value. In that case you could add "LIMIT 1" to get only one result, but which row you get would be arbitrary.

Mysql DISTINCT with more than one column (remove duplicates)

My table is called training_session.
I am trying to print some information from my data, but I do not want any duplicates. I still get them somehow; can someone tell me what I am doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
Both have a similar goal, but they serve different purposes.
You use DISTINCT if you want to show a set of columns without repeating any combination of values. That means you don't care about calculations or aggregate functions. DISTINCT will return different results as you add more columns to your SELECT (if the table has many columns), because uniqueness is judged on all selected columns together.
You use GROUP BY if you want one row per distinct value of certain selected columns and you use group functions to calculate the data related to each group. In short, use GROUP BY when you want to use group functions.
The group functions you can use are listed at this link:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" record of each athlete; I'll assume the current scenario, where there is no ID column.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This is a subquery in the SELECT list (a correlated scalar subquery). With the help of LIMIT 1, it returns only the topmost record. Such a subquery must return at most one row, or else MySQL raises an error.
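The subquery-per-athlete approach above can be tried end to end with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and a hypothetical session_date column filling the "[latest column to sort]" placeholder; the real column name is not given in the question.

```python
import sqlite3

# session_date is a hypothetical column; the question does not name one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE training_session (
        athlete_id INTEGER, duration INTEGER, session_date TEXT
    );
    INSERT INTO training_session VALUES
        (1, 30, '2020-01-01'),
        (1, 45, '2020-02-01'),
        (2, 60, '2020-01-15');
""")

# For each athlete, the inner query picks the duration of the newest session.
last_durations = conn.execute("""
    SELECT a.athlete_id,
           (SELECT b.duration
            FROM training_session AS b
            WHERE b.athlete_id = a.athlete_id
            ORDER BY b.session_date DESC
            LIMIT 1) AS last_duration
    FROM training_session AS a
    GROUP BY a.athlete_id
    ORDER BY a.athlete_id
""").fetchall()

print(last_durations)  # [(1, 45), (2, 60)]
```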
MySQL's DISTINCT clause is used to filter duplicate rows out of the result set.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), DISTINCT applies to the combination of both columns. Each athlete_id/duration pair in your result is already unique, hence the results you're getting. In other words, the query is working correctly.
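A small sketch makes the difference visible. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and a few made-up rows. It also runs the first query from the question: there, AND is a logical operator, so the select list is a single true/false expression and not two columns (shown under SQLite; the assumption is that MySQL evaluates it the same way).

```python
import sqlite3

# Made-up rows: athlete 1 has a repeated (1, 30) session.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER);
    INSERT INTO training_session VALUES (1, 30), (1, 30), (1, 45), (2, 30);
""")

# One column: one row per distinct athlete.
ids = conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id"
).fetchall()

# Two columns: one row per distinct (athlete_id, duration) pair.
pairs = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session "
    "ORDER BY athlete_id, duration"
).fetchall()

# 'athlete_id AND duration' is one boolean expression, not two columns.
anded = conn.execute(
    "SELECT DISTINCT athlete_id AND duration FROM training_session"
).fetchall()

print(ids)    # [(1,), (2,)]
print(pairs)  # [(1, 30), (1, 45), (2, 30)]
print(anded)  # [(1,)]
```

Only the exact repeat (1, 30) is removed by the two-column DISTINCT; athlete 1 still appears twice because the durations differ.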

Getting duplicates from SQL query?

I'm using this sql query, but I get duplicates. How can I make sure that only one post per inst_id is fetched? I guess it's my group by main_posts.c_p_id that makes it happen, but I get an error if I don't have it since I'm sorting by it. Basically I'm trying to sort and put posts that have a match in main_posts.c_p_id first.
SELECT main_posts.c_p_id, insts.inst_id, insts.inst_title
FROM insts
LEFT JOIN inst_posts
ON inst_posts.instp_inst_id = insts.inst_id
LEFT JOIN main_posts
ON main_posts.c_id = insts.instp_c_id
GROUP BY insts.inst_id, main_posts.c_p_id
ORDER BY main_posts.c_p_id DESC, insts.inst_title ASC
How can I make sure that only one post per inst_id is fetched?
You haven't provided any sample data, but since you mentioned that you are getting duplicate rows, you could use DISTINCT to return only distinct rows.
SELECT DISTINCT main_posts.c_p_id, insts.inst_id, insts.inst_title
FROM insts
LEFT JOIN inst_posts
ON inst_posts.instp_inst_id = insts.inst_id
LEFT JOIN main_posts
ON main_posts.c_id = insts.instp_c_id
GROUP BY insts.inst_id, main_posts.c_p_id
ORDER BY main_posts.c_p_id DESC, insts.inst_title ASC
I guess it's my group by main_posts.c_p_id that makes it happen,
Probably.
but I get an error if I don't have it since I'm sorting by it.
No, you are getting the error because you have listed the main_posts.c_p_id column in GROUP BY. ORDER BY only orders the records; it doesn't matter whether you select this column or not.

Order by Date not working as expected in MySql

I have a mysql query
select count(*) as TotalCount,
pd.Product_Modified_Date,
psc.Product_Subcategory_Name,
pd.Product_Image_URL
from product_subcategory psc
inner join product_details pd on psc.Product_Subcategory_ID = pd.Product_Subcategory_Reference_ID
where pd.Product_Status = 0 and
psc.Product_Subcategory_Status = 0
group by psc.Product_Subcategory_Name
order by pd.Product_Modified_Date desc
My product_details table has new image URLs, but I cannot get them with the above query.
How can I do it?
You are grouping by one column, Product_Subcategory_Name, but you have other columns Product_Image_URL and Product_Modified_Date in your select-list.
If you have cases where the group has multiple rows (which you do, since the count is 14 or more in each group), MySQL can only present one value for the Product_Image_URL. So it picks some row in the group, and uses the value in that row. The URL value for all other rows in the group is ignored.
To fix this, you must group by all columns in your select-list that are not part of an aggregate function. Any column you don't want to use to form a new group must go into an aggregate function.
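As an illustration of that rule, here is a minimal sketch assuming Python's built-in sqlite3 module as a stand-in for MySQL and a single, simplified, hypothetical table that merges the two tables from the question. The modified date goes into MAX(), so every selected column is either grouped or aggregated.

```python
import sqlite3

# Simplified, hypothetical stand-in for the joined product tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        Product_Subcategory_Name TEXT,
        Product_Modified_Date TEXT,
        Product_Image_URL TEXT
    );
    INSERT INTO products VALUES
        ('Shoes', '2020-01-01', 'a.jpg'),
        ('Shoes', '2020-03-01', 'b.jpg'),
        ('Hats',  '2020-02-01', 'c.jpg');
""")

# Every selected column is either in GROUP BY or inside an aggregate.
summary = conn.execute("""
    SELECT Product_Subcategory_Name,
           COUNT(*) AS TotalCount,
           MAX(Product_Modified_Date) AS LatestModified
    FROM products
    GROUP BY Product_Subcategory_Name
    ORDER BY LatestModified DESC
""").fetchall()

print(summary)  # [('Shoes', 2, '2020-03-01'), ('Hats', 1, '2020-02-01')]
```

This only covers the count and the latest date. Fetching the image URL that belongs to that latest row needs a further step, such as joining the result back to the table on name and date, which is left out of this sketch.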
Roland Bouman wrote an excellent blog detailing how to use GROUP BY properly: http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html
Combining GROUP BY and ORDER BY is problematic, and your problem is most likely covered in another question on Stack Exchange: MySQL wrong results with GROUP BY and ORDER BY

How to select a random row with a group by clause?

I have the following table
SQLFiddle
What I'm attempting to do is select three random images while making sure that no two images have the same object. I tried a GROUP BY along with an ORDER BY rand(), but that fails: it always gives me cat1.jpg, dog1.jpg, box1.jpg (all images whose path ends with 1, never the others).
The fiddle includes the query I ran and how it is not working.
What you need is a random aggregate function. Current RDBMSs usually do not provide one.
A similar question has been asked.
So the basic idea is to shuffle the rows, then group by, and then select the first row of every group. If we modify one of the answers provided at the link, we get this:
select object_id, name, image_path
from
(SELECT images.image_path AS image_path, objects.id AS object_id, objects.name
FROM objects LEFT JOIN images ON images.object_id = objects.id
ORDER BY RAND()) as z
group by z.object_id, z.name
You can't get a random image this way, as MySQL always returns that data based on the time of insert (first come, first served), i.e. internal order.
But you can get a random result using following approach (fiddle):
SELECT images.image_path AS image_path, objects.name
FROM objects
LEFT JOIN
(
SELECT object_id,
SUBSTRING_INDEX(GROUP_CONCAT(image_path order by rand()), ',', 1) AS image_path
FROM images
GROUP BY object_id
) as images
ON images.object_id = objects.id
GROUP BY objects.name
If there's a restrictive WHERE condition on the objects table, you might get better performance when you join first and then apply the GROUP_CONCAT.
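For comparison, the same "one random image per object" idea can be sketched with a window function, a different technique from the GROUP_CONCAT trick above. The sketch assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and made-up rows matching the file names mentioned in the question; SQLite spells the function RANDOM(), where MySQL uses RAND().

```python
import sqlite3

# Made-up data matching the file names mentioned in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE images (object_id INTEGER, image_path TEXT);
    INSERT INTO objects VALUES (1, 'cat'), (2, 'dog'), (3, 'box');
    INSERT INTO images VALUES
        (1, 'cat1.jpg'), (1, 'cat2.jpg'),
        (2, 'dog1.jpg'), (2, 'dog2.jpg'),
        (3, 'box1.jpg'), (3, 'box2.jpg');
""")

# Shuffle the images inside each object, then keep the first of each shuffle.
picks = conn.execute("""
    SELECT name, image_path
    FROM (
        SELECT o.name AS name,
               i.image_path AS image_path,
               ROW_NUMBER() OVER (
                   PARTITION BY o.id ORDER BY RANDOM()
               ) AS rn
        FROM objects AS o
        JOIN images AS i ON i.object_id = o.id
    ) AS shuffled
    WHERE rn = 1
    ORDER BY name
""").fetchall()

print(picks)  # one (name, image_path) row per object; the paths vary per run
```

With more than three objects, adding an outer ORDER BY RANDOM() LIMIT 3 would pick three of them at random.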
I think this should do:
ORDER BY random()
LIMIT 1