How to detect duplicated rows in SQL with where condition - mysql

I have a table with this sample data:
place_id email
----------------------------
3 uno#uno.com
3 dos#dos.com
4 tres#tres.com
5 uno#uno.com
6 uno#uno.com
3 dos#dos.com
4 tres#tres.com
I want to show the emails that appear in different places. I tried this query:
select email, count(email)
from table
group by email
having count(email) > 1
The problem is that this also counts rows duplicated within the same place, and I need only the emails that appear in different places. For example, it should show only the email "uno#uno.com", which is in places 3, 5 and 6, and not "dos#dos.com", which is repeated in the same place.
Thanks.

If you only want the emails, you can use aggregation:
select email
from t
group by email
having min(place) <> max(place);
If you want the places as well in a unique list, you can do:
select distinct place, email
from t
where exists (select 1
from t t2
where t2.email = t.email and t2.place <> t.place
);
And, although you can use window functions, the solution is not as obvious:
select distinct place, email
from (select t.*,
min(t.place) over (partition by t.email) as min_place,
max(t.place) over (partition by t.email) as max_place
from t
) t
where min_place <> max_place;
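As a quick check, the aggregation and EXISTS queries from this answer can be run against the sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL, and the table name t and column name place follow the answer's naming, not the question's.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (place INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (3, "uno#uno.com"), (3, "dos#dos.com"), (4, "tres#tres.com"),
        (5, "uno#uno.com"), (6, "uno#uno.com"),
        (3, "dos#dos.com"), (4, "tres#tres.com"),
    ],
)

# Emails only: an email spans several places exactly when min <> max.
emails = [row[0] for row in conn.execute(
    "SELECT email FROM t GROUP BY email HAVING MIN(place) <> MAX(place)"
)]

# Places as well: keep a row if the same email exists in another place.
pairs = conn.execute(
    """
    SELECT DISTINCT place, email
    FROM t
    WHERE EXISTS (SELECT 1 FROM t t2
                  WHERE t2.email = t.email AND t2.place <> t.place)
    ORDER BY place
    """
).fetchall()

print(emails)
print(pairs)
```

Only "uno#uno.com" survives both queries, because "dos#dos.com" and "tres#tres.com" are repeated within a single place.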

You could use a windowed COUNT(DISTINCT ...), where the RDBMS supports DISTINCT inside window aggregates (Oracle does, for example):
SELECT *
FROM (SELECT t.*, COUNT(DISTINCT place_id) OVER(PARTITION BY email) AS cnt
FROM tab t) sub
WHERE cnt > 1
SQL Server/MariaDB/MySQL 8.0/PostgreSQL:
SELECT *
FROM (SELECT *, COUNT(*) OVER(PARTITION BY email) AS cnt
FROM (SELECT DISTINCT place_id, email FROM tab) s
)sub
WHERE cnt > 1;

You can use a simple GROUP BY with a HAVING clause to keep only the (place_id, email) pairs that occur exactly once:
select place_id, email
from table t
group by place_id, email
having count(*) = 1;

Thanks a lot. I tried Gordon Linoff's first solution and it worked. But I have a small problem: I also have a "where" clause, and this:
select email, count(email) from data where place = 2 or place=3 or place=4 group by email having min(place) <> max(place)
show the same results as:
select email, count(email) from data where place = 2 or place=3 group by email having min(place) <> max(place)
Because the conditions are combined with OR, I don't know how to fix the first query so that it shows only the items that are in all of these places, not just in two of them.
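One possible way to require all of the listed places is to count the distinct places per email and compare that count with the number of places asked for. The sketch below is an assumption about what is wanted, not something confirmed in the thread; it runs on SQLite through Python's sqlite3 module, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (place INTEGER, email TEXT)")
# Hypothetical rows: uno is in places 2, 3 and 4; dos only in 2 and 3;
# tres appears twice, but only in place 2.
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        (2, "uno#uno.com"), (3, "uno#uno.com"), (4, "uno#uno.com"),
        (2, "dos#dos.com"), (3, "dos#dos.com"),
        (2, "tres#tres.com"), (2, "tres#tres.com"),
    ],
)

# min <> max only proves "at least two different places".
in_two_or_more = [row[0] for row in conn.execute(
    """
    SELECT email FROM data
    WHERE place IN (2, 3, 4)
    GROUP BY email
    HAVING MIN(place) <> MAX(place)
    ORDER BY email
    """
)]

# Counting distinct places proves "every one of the three places".
in_all_three = [row[0] for row in conn.execute(
    """
    SELECT email FROM data
    WHERE place IN (2, 3, 4)
    GROUP BY email
    HAVING COUNT(DISTINCT place) = 3
    ORDER BY email
    """
)]

print(in_two_or_more)
print(in_all_three)
```

The constant in the HAVING clause has to match the number of places in the IN list, so both need to change together.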

Related

In SQL or mySQL, how can I find a key which has the highest sum in another column while the key appears in two column?

Suppose I have a video chatting app that records the username of two users and the length of the call, the table is data of all the calls.
A person can appear in both user1 and user2. For example, in the table David appears in both user1 and user2. Using the data that we have on the table, how can I write a SQL query that finds the user who has the longest total call length? In this case, David has the longest total call length, which is 50 minutes.
You can use a LEAST/GREATEST trick here:
SELECT user, SUM(length) AS total_length
FROM
(
SELECT LEAST(User1, User2) AS user, length
FROM yourTable
UNION ALL
SELECT GREATEST(User1, User2), length
FROM yourTable
) t
GROUP BY
user
ORDER BY
SUM(length) DESC
with dat as
(
Select 'Jhony' User1, 'Jennifer' User2, 23 Call_Length union all
Select 'David','Michael',10 union all
Select 'Lisa','David',40 union all
Select 'Lisa','Jennifer',5
)
Select top 1 sum(a.call_length+coalesce(b.call_length,0)),a.user1,b.user2 from
dat a
left join dat b on a.user1=b.user2
group by a.user1,b.user2
order by sum(a.call_length+coalesce(b.call_length,0)) desc
Another way, which might look dirtier because MySQL before 8.0 does not support window functions the way other RDBMSs do, but which always gives the exact result, including ties where several users share the highest total length: combine the rows for both user columns, calculate each user's total length, and in the outer query compare that total against the maximum of the same aggregation instead of using LIMIT.
SELECT Caller, SUM(length) TotalLength
FROM
(
SELECT User1 AS Caller, length FROM calls UNION ALL
SELECT User2, length FROM calls
) a
GROUP BY Caller
HAVING SUM(length) = (
SELECT MAX(TotalLength)
FROM
(
SELECT Caller, SUM(length) TotalLength
FROM
(
SELECT User1 AS Caller, length FROM calls UNION ALL
SELECT User2, length FROM calls
) a
GROUP BY Caller
) a
)
I would combine the call times for each user (user1 and user2), then group by user, and take the top record by call time.
Select user, sum(calltime) as calltime
From
(Select user1 as user, calltime
from tbl
Union all
Select user2, calltime
from tbl
) t
Group by user
Order by calltime desc
Limit 1;
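The UNION ALL approach can be tried on the four sample calls that appear in the CTE of an earlier answer. This is a sketch on SQLite via Python's sqlite3 module, not on MySQL; the table name tbl and the column names follow the query above, and the output alias is renamed to caller and total only to keep the names unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (user1 TEXT, user2 TEXT, calltime INTEGER)")
# Sample calls taken from the CTE in the earlier answer.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("Jhony", "Jennifer", 23),
        ("David", "Michael", 10),
        ("Lisa", "David", 40),
        ("Lisa", "Jennifer", 5),
    ],
)

# Stack both user columns into one, then total the call time per person.
top_caller = conn.execute(
    """
    SELECT caller, SUM(calltime) AS total
    FROM (SELECT user1 AS caller, calltime FROM tbl
          UNION ALL
          SELECT user2, calltime FROM tbl) t
    GROUP BY caller
    ORDER BY total DESC
    LIMIT 1
    """
).fetchone()

print(top_caller)  # ('David', 50)
```

With LIMIT 1, only one row comes back even if two users tie for the highest total; the HAVING-based query above is the variant that keeps ties.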

sql return most prevalent column value

I'm a beginner at SQL, how do I get a query which returns the most prevalent column value? Probably there is an answer somewhere but I don't know how to google it.
For example in the user_id column the query should return the value 1 because this is the most prevalent number.
One approach is to do a GROUP BY aggregation and then apply a LIMIT trick:
SELECT user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
ORDER BY COUNT(*) DESC
LIMIT 1;
If you want something more complex, then you would be getting into the realm of rank functionality. MySQL before 8.0 has no built-in rank functions, so such queries can be tricky to write there.
SELECT top 1 user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
ORDER BY COUNT(*) DESC
Have a common table expression that counts each user_id. Select user_id where the count is the max count. Will return both user_id's in case of a tie.
with cte as
(
SELECT user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
)
select user_id
from cte
where cnt = (select max(cnt) from cte)
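The difference between the LIMIT 1 query and the CTE query only shows up when there is a tie. The sketch below uses made-up user_id values in which 1 and 2 both occur twice; it runs on SQLite through Python's sqlite3 module, and the table name your_table is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (user_id INTEGER)")
# Hypothetical data: user_id 1 and 2 tie with two rows each.
conn.executemany(
    "INSERT INTO your_table VALUES (?)", [(1,), (1,), (2,), (2,), (3,)]
)

# CTE version: every user_id whose count equals the maximum count.
tied = [row[0] for row in conn.execute(
    """
    WITH cte AS (
        SELECT user_id, COUNT(*) AS cnt
        FROM your_table
        GROUP BY user_id
    )
    SELECT user_id FROM cte
    WHERE cnt = (SELECT MAX(cnt) FROM cte)
    ORDER BY user_id
    """
)]

# LIMIT version: exactly one row, whichever tied value sorts first.
top_one = conn.execute(
    """
    SELECT user_id, COUNT(*) AS cnt
    FROM your_table
    GROUP BY user_id
    ORDER BY COUNT(*) DESC
    LIMIT 1
    """
).fetchall()

print(tied)
print(top_one)
```

Which of the tied values the LIMIT query returns is not specified, so it should not be relied on.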

Grouping users by group ids in mysql, exclude specified userid from the results?

There is a small application that I've been tasked with, which deals with getting the latest posts in a group. In the sample below, there is a MySQL table formatted as such:
groupid userid date_updated
1 1 [date]
1 2 [date]
2 1 [date]
2 2 [date]
2 3 [date]
...
How do I write an SQL statement so that the results come out like this (assuming I give a userid of 1, for example):
groupid userid date
1 2 [date]
2 2 [date]
2 3 [date]
These are all ordered by date. As you may have noticed, the results do not include the provided userid (the requirement is to get only users other than the supplied user ID). In other words, show only users other than the specified user, in groups that the specified user is part of.
Is it possible to do this in a single SQL statement?
Use a select query with a where clause:
select * from table where userid != '1'
Try the following solution.
select
tbl.*
from
tbl INNER JOIN
(select groupid, userid, max(date_updated)
from tbl
group by groupid, userid) tbl2
USING(groupid, userid)
ORDER BY tbl.date_updated;
You can use this
SELECT tbl.* FROM (SELECT * FROM tablename ORDER BY date DESC) as tbl GROUP BY tbl.groupid
I managed to find a possible answer to my question here with this SQL statement:
SELECT a.groupid, a.userid, a.date_updated
FROM group_participants a
WHERE a.groupid IN (
SELECT DISTINCT b.groupid FROM group_participants b WHERE b.userid = 1
)
AND a.userid <> 1
GROUP BY a.userid
ORDER by a.date_updated DESC
Thank you, guys. The SQL statements you posted gave me an idea. I don't know if the statement above can still be optimized, but it gave me the correct answer.
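The IN-subquery idea can be checked against the sample table from the question. The sketch below is a variant, not the poster's exact statement: it leaves out GROUP BY a.userid, because the expected output in the question lists userid 2 once per group. It runs on SQLite via Python's sqlite3 module, the dates are invented, and the extra group 3 row is added only to show that unrelated groups are excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_participants "
    "(groupid INTEGER, userid INTEGER, date_updated TEXT)"
)
# Rows from the question with hypothetical dates, plus one group (3)
# that user 1 does not belong to.
conn.executemany(
    "INSERT INTO group_participants VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-01"), (1, 2, "2020-01-02"),
        (2, 1, "2020-01-03"), (2, 2, "2020-01-04"), (2, 3, "2020-01-05"),
        (3, 4, "2020-01-06"),
    ],
)

given_user = 1
others = conn.execute(
    """
    SELECT a.groupid, a.userid, a.date_updated
    FROM group_participants a
    WHERE a.groupid IN (SELECT b.groupid
                        FROM group_participants b
                        WHERE b.userid = ?)
      AND a.userid <> ?
    ORDER BY a.date_updated DESC
    """,
    (given_user, given_user),
).fetchall()

for row in others:
    print(row)
```

Passing the user ID as a bound parameter, rather than writing 1 into the SQL text, keeps the same statement reusable for any user.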

Maximum Count of Distinct Values in SQL

Please forgive me if this has been answered, but I could not find it using the search tool or a basic Google query.
I am trying to return a value that indicates the maximum number of rows that any distinct value in a column has.
For example, I'd like to use something like
SELECT MAX(COUNT(DISTINCT person_id)) AS MAX_NUM_PERS_ROW
FROM mytable
and if the person with most rows in the table had 5 rows, the value returned would be 5...
Any and all help is appreciated!
You can do this with nested aggregation:
select max(cnt)
from (select person_id, count(*) as cnt
from mytable
group by person_id
) p;
If you actually want the person, you can also do:
select person_id, count(*) as cnt
from mytable
group by person_id
order by count(*) desc
limit 1;
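Both queries can be tried on a small made-up table in which one person has 5 rows, matching the example in the question. This is a sketch on SQLite through Python's sqlite3 module; the person_id values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (person_id INTEGER)")
# Hypothetical data: person 10 has 5 rows, person 20 has 2, person 30 has 1.
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [(10,)] * 5 + [(20,)] * 2 + [(30,)],
)

# Nested aggregation: count rows per person, then take the largest count.
max_rows = conn.execute(
    """
    SELECT MAX(cnt)
    FROM (SELECT person_id, COUNT(*) AS cnt
          FROM mytable
          GROUP BY person_id) p
    """
).fetchone()[0]

# Same counts, but sorted so the person with the most rows comes first.
top_person = conn.execute(
    """
    SELECT person_id, COUNT(*) AS cnt
    FROM mytable
    GROUP BY person_id
    ORDER BY COUNT(*) DESC
    LIMIT 1
    """
).fetchone()

print(max_rows)    # 5
print(top_person)  # (10, 5)
```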

group_concat in SQL Server 2008 [duplicate]

Possible Duplicate:
Combine multiple results in a subquery into a single comma-separated value
Concat groups in SQL Server
I want to be able to get the duplicates removed:
SELECT Count(Data) as Cnt, Id
FROM [db].[dbo].[View_myView]
Group By Data
HAVING Count(Data) > 1
In MySQL it was as simple as this:
SELECT Count(Data) AS Cnt, group_concat(Id)
FROM View_myView
Group By Data
Having Cnt > 1
Does anyone know of a solution? Examples are a plus!
In SQL Server as of version 2005 and newer, you can use a CTE (Common Table Expression) with the ROW_NUMBER function to eliminate duplicates:
;WITH LastPerUser AS
(
SELECT
ID, UserID, ClassID, SchoolID, Created,
ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY Created DESC) AS 'RowNum'
FROM dbo.YourTable
)
SELECT
ID, UserID, ClassID, SchoolID, Created
FROM LastPerUser
WHERE RowNum = 1
This CTE "partitions" your data by UserID, and for each partition, the ROW_NUMBER function hands out sequential numbers, starting at 1 and ordered by Created DESC - so the latest row gets RowNum = 1 (for each UserID) which is what I select from the CTE in the SELECT statement after it.
Using the same CTE, you can also easily delete duplicates:
;WITH LastPerUser AS
(
SELECT
ID, UserID, ClassID, SchoolID, Created,
ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY Created DESC) AS 'RowNum'
FROM dbo.YourTable
)
DELETE t
FROM dbo.YourTable t
INNER JOIN LastPerUser cte ON t.ID = cte.ID
WHERE cte.RowNum > 1
Same principle applies: you "group" (or partition) your data by some criteria, you consecutively number all the rows for each data partition, and those with values larger than 1 for the "partitioned row number" are weeded out by the DELETE.
Just use distinct to remove duplicates. It sounds like you were using group_concat to join duplicates without actually wanting to use its value. In that case, MySQL also has a distinct you could have been using:
SELECT DISTINCT Count(Data) as Cnt, Id
FROM [db].[dbo].[View_myView]
GROUP BY Id
HAVING Count(Data) > 1
Also, every non-aggregated column in the SELECT list has to appear in the GROUP BY clause; the original query selects Id but groups by Data, so I think you meant to group by Id. I corrected it in the example above.
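For reference, the output the question is after (each duplicated Data value with its count and the list of matching Ids) can be illustrated with an engine that has a group_concat aggregate. The sketch below uses SQLite through Python's sqlite3 module purely to show the target result; it is not a SQL Server 2008 solution, since that version has no such aggregate, and the table name view_data and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE view_data (Id INTEGER, Data TEXT)")
# Hypothetical rows: 'a' occurs twice, 'b' once, 'c' three times.
conn.executemany(
    "INSERT INTO view_data VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c"), (6, "c")],
)

# One row per duplicated Data value, with the Ids joined by commas.
dupes = conn.execute(
    """
    SELECT Data, COUNT(*) AS Cnt, group_concat(Id) AS Ids
    FROM view_data
    GROUP BY Data
    HAVING COUNT(*) > 1
    ORDER BY Data
    """
).fetchall()

for data, cnt, ids in dupes:
    # The order of Ids inside the joined string is not guaranteed,
    # so sort them before displaying.
    print(data, cnt, sorted(ids.split(",")))
```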