how to find duplicate count without counting original - mysql

I need to count the number of duplicate emails in a mysql database, but without counting the first one (considered the original). In this table, the query result should be the single value "3" (2 duplicate x#q.com plus 1 duplicate f#q.com).
TABLE
ID | Name | Email
1 | Mike | x#q.com
2 | Peter | p#q.com
3 | Mike | x#q.com
4 | Mike | x#q.com
5 | Frank | f#q.com
6 | Jim | f#q.com
My current query produces not one number, but multiple rows, one per email address regardless of how many duplicates of this email are in the table:
SELECT value, count(lds1.leadid) FROM leads_form_element lds1 LEFT JOIN leads lds2 ON lds1.leadID = lds2.leadID
WHERE lds2.typesID = "31" AND lds1.formElementID = '97'
GROUP BY lds1.value HAVING ( COUNT(lds1.value) > 1 )

It's not one query so I'm not sure if it would work in your case, but you could do one query to select the total number of rows, a second query to select distinct email addresses, and subtract the two. This would give you the total number of duplicates...
select count(*) from someTable;
select count(distinct Email) from someTable;
In fact, I don't know if this will work, but you could try doing it all in one query:
select (count(*)-(count(distinct Email))) from someTable
Like I said, untested, but let me know if it works for you.

Try doing a group by in a sub query and then summing up. Something like:
select sum(tot)
from
(
select email, count(1)-1 as tot
from table
group by email
having count(1) > 1
)

Related

Combine MySQL SELECT COUNTs

I have 3 MySQL queries I'd like to combine as 1
SELECT pic FROM active,
SELECT pic FROM deleted,
SELECT alt_pic FROM active where alt_pic!=''
I've managed to get the first 2 as one
SELECT pic FROM active UNION SELECT pic FROM deleted
I think I've partially gotten through combining all 3 except I don't know where exactly to insert the 3rd query
SELECT COUNT(*) FROM (SELECT pic FROM active UNION SELECT pic FROM deleted)t
I am just studying for fun. If I am somehow breaking convention or introduce some security risks, please don't get mad :)
Edit 1: Newbie doesn't know, thanks Mureinik and Strawberry for pointing it out :)
alt_pic is just a very optional field, my table active has about 300+ rows but only 8 alt_pic fields filled
active
ID | name | pic | alt_pic
1 | Peter | pic5.jpg | alt1.jpg
2 | Mark | pic4.jpg | NULL
3 | John | pic3.jpg | alt2.jpg
deleted
ID | pic
1 | pic2.jpg
2 | pic1.jpg
The result I'd like to have is
pic_count | alt_pic_count
5 | 2
I'd perform an aggregate query on each table, and then, since they both return just one row, cross join them. Note you don't need a third query with a condition on alt_pic - since count ignores nulls, you could apply it directly to that column.
SELECT pcount + dcount AS pic_count, acount AS alt_pic_count
FROM (SELECT COUNT(pic) AS pcount, COUNT(alt_pic) AS acount
FROM active) a
CROSS JOIN (SELECT COUNT(*) AS dcount
FROM deleted) d

MySQL COUNT(DISTINCT) giving wrong values with GROUP BY

I have a table that contains custom user analytics data. I was able to pull the number of unique users with a query:
SELECT COUNT(DISTINCT(user_id)) AS 'unique_users'
FROM `events`
WHERE client_id = 123
And this will return 16728
This table also has a column of type DATETIME that I would like to group the counts by. However, if I add a GROUP BY to the end of it, everything groups properly it seems except the totals don't match. My new query is this:
SELECT COUNT(DISTINCT(user_id)) AS 'unique_users', DATE(server_stamp) AS 'date'
FROM `events`
WHERE client_id = 123
GROUP BY DATE(server_stamp)
Now I get the following values:
|-----------------------------|
| unique_users | date |
|---------------|-------------|
| 2650 | 2019-08-26 |
| 3486 | 2019-08-27 |
| 3475 | 2019-08-28 |
| 3631 | 2019-08-29 |
| 3492 | 2019-08-30 |
|-----------------------------|
Totaling to 16734. I tried using a sub query to get the distinct users then count and group in the main query but no luck there. Any help in this would be greatly appreciated. Let me know if there is further information to help diagnosis.
A user, who is connected with events on multiple days (e.g. session starts before midnight and ends afterwards), will occur the number of these days times in the new query. This is due to the fact, that the first query performs the DISTINCT over all rows at once while the second just removes duplicates inside each groups. Identical values in different groups will stay untouched.
So if you have a combination of DISTINCT in the select clause and a GROUP BY, the GROUP BY will be executed before the DISTINCT. Thus without any restrictions you cannot assume, that the COUNT(DISTINCT user_id) of the first query and the sum over the COUNT(DISTINCT user_id) of all groups is the same.
Xandor is absolutely correct. If a user logged on 2 different days, There is no way your 2nd query can remove them. If you need data grouped by date, You can try below query -
SELECT COUNT(user_id) AS 'unique_users', DATE(MIN_DATE) AS 'date'
FROM (SELECT user_id, MIN(DATE(server_stamp)) MIN_DATE -- Might be MAX
FROM `events`'
WHERE client_id = 123
GROUP BY user_id) X
GROUP BY DATE(server_stamp);

ORDER BY does not work if COUNT is used

I have a table with following content
loan_application
+----+---------+
| id | user_id |
+----+---------+
| 1 | 10 |
| 2 | 10 |
| 3 | 10 |
+----+---------+
I want to fetch 3rd record only if there are 3 records available, in this case i want id 3 and total count must be 3, here is what i expect
+--------------+----+
| COUNT(la.id) | id |
+--------------+----+
| 3 | 3 |
+--------------+----+
Here is the query i tried.
SELECT COUNT(la.id), la.id FROM loan_application la HAVING COUNT(la.id) = 3 ORDER BY la.id DESC;
However this gives me following result
+--------------+----+
| COUNT(la.id) | id |
+--------------+----+
| 3 | 1 |
+--------------+----+
The problem is that it returns id 1 even if i use order by id descending, whereas i am expecting the id to have value of 3, where am i going wrong ?
Thanks.
In your case u can use this query:
SELECT COUNT(la.id), max(la.id) FROM loan_application la
GROUP BY user_id
I try your table in my db MySQL
When you have a group by function (in this instance count()) in the select list without a group by clause, then mysql will return a single record only with the function applied to the whole table.
Mysql under certain configuration settings allow you to include fields in the select loist which are not in the group by clause, nor are aggregated. Mysql pretty much picks up the 1st value it encounters while scanning the data as a value for such fields, in your case the value 1 for id.
If you want to fetch the record where id=count of records within the table, then I would use the following query:
select *
from loan_application
join (select count(*) as numrows from loan_application) t
where id=t.numrows and t.numrows=3
However, this implies that the values within the id field are continuous and there are no gaps.
You are selecting la.id along with an aggregated function (COUNT). So after iterating the first record the la.id is selected but the count goes on. So in this case you will get the first la.id not the last. In order to get the last la.id you need to use the max function on that field.
Here's the updated query:
SELECT
COUNT(la.id),
MAX(la.id)
FROM
loan_application la
GROUP BY user_id
HAVING
COUNT(la.id) = 3
N:B: You are using COUNT without a GROUP BY Function. So this particular aggregated function is applied to the whole table.

How do I select row based on the number of times it appears in the table?

I have my table: call it tblA THis table has three rows, id, sub-id, and visibility
sub-id is the primary key (it defines taxonomies for id). I'm trying to build a query that selects every id that appears less than three times.
here is an example query/result
select * from tbla where id = 188002;
+--------+--------+-------------+
| sub-id | id | visibility |
+--------+--------+-------------+
| 284922 | 188002 | 2 |
| 284923 | 188002 | 2 |
| 284924 | 188002 | 0 |
+--------+--------+-------------+
From what i've seen here and here it looks like I need to join the table on...itself. I dont really understand what that accomplishes.
If anyone has insight into this, it is appreciated. I will continue to research it and update this topic with any additional information I come across.
Thanks
SELECT id
FROM tbla
GROUP BY id
HAVING COUNT(*) < 3
If you want to select all columns from the table, you will have to use #Joe's query in a sub-select:
SELECT * FROM tbla a
WHERE a.id IN (SELECT DISTINCT b.id
FROM tbla b
GROUP BY b.id
HAVING COUNT(*) < 3)
This query first selects all id's that have fewer than 3 duplicates.
The distinct eliminates duplicates, the query works the same without, but slightly slower.
Next it selects all rows that have an id that meets the criteria in the sub-select i.e. that have fewer than 3 duplicate id's.
The reason that you cannot do this in one go is that the group by heaps all rows with the same id together into one super-row (for want of a better metafor) .
You cannot separate out the columns that are not in the group by clause.
The outer select solves this.

How to filter duplicates within row using Distinct/group by with JOINS

For simplicity, I will give a quick example of what i am trying to achieve:
Table 1 - Members
ID | Name
--------------------
1 | John
2 | Mike
3 | Sam
Table 1 - Member_Selections
ID | planID
--------------------
1 | 1
1 | 2
1 | 1
2 | 2
2 | 3
3 | 2
3 | 1
Table 3 - Selection_Details
planID | Cost
--------------------
1 | 5
2 | 10
3 | 12
When i run my query, I want to return the sum of the all member selections grouped by member. The issue I face however (e.g. table 2 data) is that some members may have duplicate information within the system by mistake. While we do our best to filter this data up front, sometimes it slips through the cracks so when I make the necessary calls to the system to pull information, I also want to filter this data.
the results SHOULD show:
Results Table
ID | Name | Total_Cost
-----------------------------
1 | John | 15
2 | Mike | 22
3 | Sam | 15
but instead have John as $20 because he has plan ID #1 inserted twice by mistake.
My query is currently:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
Adding DISTINCT s.planID filters the results incorrectly as it will only show a single PlanID 1 sold (even though members 1 and 3 bought it).
Any help is appreciated.
EDIT
There is also another table I forgot to mention which is the agent table (the agent who sold the plans to members).
the final group by statement groups ALL items sold by the agent ID (which turns the final results into a single row).
Perhaps the simplest solution is to put a unique composite key on the member_selections table:
alter table member_selections add unique key ms_key (ID, planID);
which would prevent any records from being added where the unique combo of ID/planID already exist elsewhere in the table. That'd allow only a single (1,1)
comment followup:
just saw your comment about the 'alter ignore...'. That's work fine, but you'd still be left with the bad duplicates in the table. I'd suggest doing the unique key, then manually cleaning up the table. The query I put in the comments should find all the duplicates for you, which you can then weed out by hand. once the table's clean, there'll be no need for the duplicate-handling version of the query.
Use UNIQUE keys to prevent accidental duplicate entries. This will eliminate the problem at the source, instead of when it starts to show symptoms. It also makes later queries easier, because you can count on having a consistent database.
What about:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN
(select distinct ID, PlanID from member_selections) s
USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
By the way, is there a reason you don't have a primary key on member_selections that will prevent these duplicates from happening in the first place?
You can add a group by clause into the inner query, which groups by all three columns, basically returning only unique rows. (I also changed 'premium' to 'cost' to match your example tables, and dropped the agent part)
SELECT
sq.ID,
sq.name,
SUM(sq.Cost) AS total_cost
FROM
(
SELECT
m.id,
m.name,
g.Cost
FROM
members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
GROUP BY
m.ID,
m.NAME,
g.Cost
) sq
group by
sq.ID,
sq.NAME