MYSQL Finding count of duplicate value - mysql

In the query below, the table has a column called lead_id, and I want to find the count of duplicate lead_id values.
In my result, if there are 10 unique lead ids I must get 10,
but it must be grouped by created_time, i.e. if there are 2 unique lead_id values for today's date then the result would be 2.
select
t.created_time,
t.timecreated,
sum(t.suggested_pending_cnt),
sum(t.suggested_dropped_cnt)
from
(select
date_format(timecreated, '%d-%b-%Y') created_time,
timecreated,
case
when source = 2 then 1
else 0
end suggested_pending_cnt,
case
when (source = 2 && directory_status = 4) then 1
else 0
end suggested_dropped_cnt
from
mg_lead_suggested_listing) t
group by t.created_time
order by t.timecreated desc
limit 10

It looks like you want the count of unique ids for a particular date, not the duplicates.
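A minimal sketch of counting unique ids per day, assuming the table has a lead_id column as the question describes. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere; the sample rows are invented, and in MySQL the day would typically be extracted with DATE(timecreated) or the question's date_format() call.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mg_lead_suggested_listing (lead_id INTEGER, timecreated TEXT)"
)
# Invented sample data: lead 101 appears twice on the same day.
conn.executemany(
    "INSERT INTO mg_lead_suggested_listing VALUES (?, ?)",
    [
        (101, "2014-03-01 09:00:00"),
        (101, "2014-03-01 10:30:00"),
        (102, "2014-03-01 11:00:00"),
        (103, "2014-03-02 08:15:00"),
    ],
)

# COUNT(DISTINCT ...) counts each lead_id once per group (here, per day).
unique_per_day = conn.execute(
    """
    SELECT date(timecreated) AS created_day,
           COUNT(DISTINCT lead_id) AS unique_leads
    FROM mg_lead_suggested_listing
    GROUP BY created_day
    ORDER BY created_day DESC
    """
).fetchall()

print(unique_per_day)
```

Grouping by the day alone (rather than also selecting the raw timestamp) keeps every selected column either grouped or aggregated.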

Following is the query to count duplicate records with first_name and last_name in a table.
mysql> SELECT COUNT(*) as repetitions, last_name, first_name
-> FROM person_tbl
-> GROUP BY last_name, first_name
-> HAVING repetitions > 1;
This query will return a list of all the duplicate records in person_tbl table. In general, to identify sets of values that are duplicated, do the following:
Determine which columns contain the values that may be duplicated.
List those columns in the column selection list, along with COUNT(*).
List the columns in the GROUP BY clause as well.
Add a HAVING clause that eliminates unique values by requiring group counts to be greater than one.
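The steps above can be tried with a small sketch. It runs the same GROUP BY / HAVING pattern through Python's sqlite3 module (a stand-in for MySQL, chosen only so the example is runnable); the rows in person_tbl are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_tbl (last_name TEXT, first_name TEXT)")
# Invented sample data: only "John Smith" occurs more than once.
conn.executemany(
    "INSERT INTO person_tbl VALUES (?, ?)",
    [
        ("Smith", "John"),
        ("Smith", "John"),
        ("Smith", "Anna"),
        ("Doe", "Jane"),
    ],
)

# Group on the columns that may be duplicated and keep groups with count > 1.
duplicates = conn.execute(
    """
    SELECT COUNT(*) AS repetitions, last_name, first_name
    FROM person_tbl
    GROUP BY last_name, first_name
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)
```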

Related

Mysql match exact set of result in a single table

I am having trouble with generating result set. This is what my 'user_roles' table looks like,
id user_id role_id
1 1 1
2 1 2
3 2 1
4 3 1
5 3 2
6 3 3
... ... ...
I want the result below, where the user has exactly both roles, i.e. 1 and 2. I do not want users having any roles other than 1 and 2.
id user_id role_id
1 1 1
2 1 2
... ... ...
I have tried so far,
SELECT
*
FROM
`user_roles`
WHERE `role_id` IN (1,2)
HAVING COUNT(id) = 2
But, it returns null.
Why your query doesn't work
HAVING applies after GROUP BY and your query doesn't have one. When a query contains HAVING or aggregate (GROUP BY) functions but no GROUP BY clause, a single group containing all the selected rows is created.
Before applying HAVING, your query selects the rows having id in 1..5 (i.e. 5 rows). A single group is created from them, COUNT(id) returns 5 and the HAVING condition doesn't match. That's why the query doesn't return anything.
In order to correctly count the number of roles of each user it needs to group the records by user_id:
SELECT `user_id`
FROM `user_roles`
WHERE `role_id` IN (1, 2)
GROUP BY `user_id`
HAVING COUNT(`id`) = 2
This way, the WHERE clause selects the user having the roles 1 or 2 (but it ignores other roles), the GROUP BY clause allows the function COUNT(id) to count the number of selected roles for each user and the HAVING clause keeps only those users having both roles (1 and 2). The SELECT clause is not allowed to contain * because for the columns that are not in the GROUP BY clause, MySQL is free to pick any value it finds in the corresponding column and it may return different results on different executions of the query.
However, the query above doesn't return the values you want. It completely ignores the roles that are not 1 or 2, so it will also return the user having user_id = 3.
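A short sketch that reproduces this behaviour with the sample rows from the question. Python's sqlite3 module is used here as a stand-in for MySQL so the example is runnable; the ORDER BY is added only to make the printed output stable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_roles (id INTEGER PRIMARY KEY, user_id INTEGER, role_id INTEGER)"
)
# The six sample rows shown in the question.
conn.executemany(
    "INSERT INTO user_roles VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 3, 2), (6, 3, 3)],
)

# WHERE drops role 3 before grouping, so user 3 looks like it has exactly two roles.
users = conn.execute(
    """
    SELECT user_id
    FROM user_roles
    WHERE role_id IN (1, 2)
    GROUP BY user_id
    HAVING COUNT(id) = 2
    ORDER BY user_id
    """
).fetchall()

print(users)
```

User 3 is returned alongside user 1, which is exactly the problem described above.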
A query that works
A query that returns only the users having exactly the roles 1 and 2 is:
SELECT `user_id`
FROM `user_roles`
GROUP BY `user_id`
HAVING COUNT(`role_id`) = 2 AND GROUP_CONCAT(`role_id`) = '1,2'
The condition COUNT(role_id) = 2 is not needed. In theory it should improve the execution speed (because counting is faster than string concatenation) but in real life it might have no impact whatsoever. The MySQL engine knows better.
Update
@martin-schneider asks in a comment:
is the order of GROUP_CONCAT(role_id) deterministic? or could it be that the result is '2,1'?
It's a very good question that has the answer in the documentation of function GROUP_CONCAT():
To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword.
The complete query is:
SELECT `user_id`
FROM `user_roles`
GROUP BY `user_id`
HAVING COUNT(`role_id`) = 2
AND GROUP_CONCAT(`role_id` ORDER BY `role_id` ASC SEPARATOR ',') = '1,2'
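The earlier query omitted ORDER BY and SEPARATOR. The comma is the default separator, so SEPARATOR can safely be left out, but GROUP_CONCAT() does not guarantee any particular order of the concatenated values unless ORDER BY is given. The explicit ORDER BY in the complete query is therefore the reliable form; ASC is its default direction.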
Important to notice
There is a limit for the length of the value computed by the GROUP_CONCAT() function. Its result is truncated to the value stored in the system variable group_concat_max_len whose default value is 1024.
This value can be increased using the SET MySQL statement before running the query:
SET group_concat_max_len = 1000000
However, for this particular query the default limit of 1024 characters is more than enough.
You could aggregate by user_id and use HAVING:
SELECT *
FROM `user_roles`
WHERE `user_id` IN (SELECT user_id
FROM `user_roles`
GROUP BY user_id
HAVING SUM(role_id IN (1,2)) = 2
AND SUM(role_id NOT IN (1,2)) = 0);
LiveDemo*
*SQLFiddle does not respond so SQL Server equivalent
Note:
I assumed that user_id, role_id are unique and not null.
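A minimal sketch of this conditional-sum approach on the question's sample rows. It is run through Python's sqlite3 module as a stand-in for MySQL; both engines evaluate `role_id IN (1,2)` to 1 or 0, which is what lets SUM() act as a conditional count.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_roles (id INTEGER PRIMARY KEY, user_id INTEGER, role_id INTEGER)"
)
# The six sample rows shown in the question.
conn.executemany(
    "INSERT INTO user_roles VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 3, 2), (6, 3, 3)],
)

# Keep users with both wanted roles (sum = 2) and no other role (sum = 0).
exact_rows = conn.execute(
    """
    SELECT id, user_id, role_id
    FROM user_roles
    WHERE user_id IN (SELECT user_id
                      FROM user_roles
                      GROUP BY user_id
                      HAVING SUM(role_id IN (1, 2)) = 2
                         AND SUM(role_id NOT IN (1, 2)) = 0)
    ORDER BY id
    """
).fetchall()

print(exact_rows)
```

Only user 1 survives: user 2 is missing role 2, and user 3 has the extra role 3.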

MYSQL to order before grouping by

I have the following:
user_id date_created project_id
3 10/10/2013 1
3 09/10/2013 1
5 10/10/2013 1
8 10/10/2013 1
10 10/10/2013 1
3 08/10/2013 1
The end result i want is:
user_id date_created project_id
3 10/10/2013 1
5 10/10/2013 1
8 10/10/2013 1
10 10/10/2013 1
Context:
I have this thing called an influence, and a user can have many influences for a project.
I want to get a list of the latest influence from a user on a project.
I have tried:
select * from influences
where project_id = 1
group by user_id
ORDER BY created_at DESC
but of course this does not first order each user's rows by created_at and then order the full list. It simply collapses each user's rows into one and orders the resulting list.
The Laravel Eloquent version of the answer provided is:
return Influence::select( "user_id", "influence", DB::raw( "MAX(created_at) as created_at" ) )
->where( "project_id", "=", $projectID )
->groupBy( "user_id", "project_id" )->get();
You don't want to order before group by, because given the structure of your query, it won't necessarily do what you want.
If you want the most recently created influence, then get it explicitly:
select i.*
from influences i join
(select user_id, max(created_at) as maxca
from influences i
where project_id = 1
group by user_id
) iu
on iu.user_id = i.user_id and iu.maxca = i.created_at
where i.project_id = 1;
Your intention is to use a MySQL extension that the documentation explicitly warns against using. You want to include columns in the select that are not in the group by. As the documentation says:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
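The join against the per-user maximum from this answer can be sketched as follows. Python's sqlite3 module stands in for MySQL so the example runs as-is; the rows mirror the question's sample data, with dates rewritten in ISO format (an assumption, so that string comparison orders them correctly in SQLite).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influences (user_id INTEGER, created_at TEXT, project_id INTEGER)"
)
# The question's sample rows, with dates in ISO format.
conn.executemany(
    "INSERT INTO influences VALUES (?, ?, ?)",
    [
        (3, "2013-10-10", 1),
        (3, "2013-10-09", 1),
        (5, "2013-10-10", 1),
        (8, "2013-10-10", 1),
        (10, "2013-10-10", 1),
        (3, "2013-10-08", 1),
    ],
)

# The subquery finds each user's latest created_at; the join fetches that full row.
latest = conn.execute(
    """
    SELECT i.user_id, i.created_at, i.project_id
    FROM influences i
    JOIN (SELECT user_id, MAX(created_at) AS maxca
          FROM influences
          WHERE project_id = 1
          GROUP BY user_id) iu
      ON iu.user_id = i.user_id AND iu.maxca = i.created_at
    WHERE i.project_id = 1
    ORDER BY i.user_id
    """
).fetchall()

print(latest)
```

Because every selected column comes from a real row matched by the join, nothing depends on which row the server happens to pick within a group.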
Use this:
SELECT user_id, project_id, MAX(date_created) as latest
FROM influences
WHERE project_id = 1
GROUP BY user_id, project_id
How it works: MySQL selects all the rows that match the WHERE conditions and sorts them by user_id then, for each user_id by project_id. From each set of rows having the same user_id and project_id it will produce a single row in the final result set.
You can use in the SELECT clause the columns used in the GROUP BY clause (user_id and project_id); their values are unambiguous: all the rows from each group have the same user_id and project_id.
You can also use aggregate functions. Each of them uses one column from all the rows in the group to compute a single value. The most recent created_at is, of course, MAX(created_at).
If you select a column that is neither included in the GROUP BY clause, nor passed to an aggregate function (like created_at you have in your query), MySQL has no hint how to compute that value. The standard SQL forbids it (the query is not valid) but MySQL allows it. It will simply pick a value from that column but there is no way to make it pick it from a specific row because this is, in fact, undefined behaviour.
You can omit the project_id from the GROUP BY clause because the WHERE clause will make all the rows having the same project_id. This will coincidentally make the result correct even if project_id does not appear in a GROUP BY clause and it's not computed using an aggregate function.
I recommend keeping project_id in the GROUP BY clause. It doesn't affect the result or the query speed, and it allows you to loosen the filtering conditions (e.g. use WHERE project_id IN (1, 2)) and still get the correct result (this doesn't happen if you remove it from GROUP BY).
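A small sketch of why keeping project_id in the GROUP BY pays off once the filter is loosened. It uses Python's sqlite3 module as a stand-in for MySQL; the rows are invented, with user 3 active in two projects, and the column is named created_at as in the question's query.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influences (user_id INTEGER, created_at TEXT, project_id INTEGER)"
)
# Invented sample data: user 3 has influences in projects 1 and 2.
conn.executemany(
    "INSERT INTO influences VALUES (?, ?, ?)",
    [
        (3, "2013-10-10", 1),
        (3, "2013-10-09", 1),
        (3, "2013-10-08", 2),
        (5, "2013-10-10", 1),
    ],
)

# Grouping by both columns yields one latest row per (user, project) pair.
latest_per_pair = conn.execute(
    """
    SELECT user_id, project_id, MAX(created_at) AS latest
    FROM influences
    WHERE project_id IN (1, 2)
    GROUP BY user_id, project_id
    ORDER BY user_id, project_id
    """
).fetchall()

print(latest_per_pair)
```

With project_id in the GROUP BY, user 3 gets a separate, well-defined row for each project instead of a single row with an arbitrary project_id.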

Deselect duplicates in sql

Database ::
id subid
1 1
2 1
3 2
4 3
The id column is automatically incremented and primary
subid is any number.
I want to arrange subid in decreasing order, remove rows which have a
duplicate subid, and get the remaining rows.
Result i want ::
id subid
4 3
3 2
2 1
select max(id), subid from table group by subid order by subid desc
Use
order by (col) ASC or DESC
to sort in ascending or descending order.
Also use
group by (col)
or
distinct
to drop duplicate rows.
An explanation for @Geri of the use of GROUP BY.
With the accepted answer of:
SELECT * FROM table GROUP BY subid ORDER BY subid DESC
the query selects all columns grouped by subid. In this case there is just 1 other column (although in other queries there could be many other columns). GROUP BY is designed to work with aggregate functions, for example MAX. With MAX(id) MySQL will find all the rows for each subid and then find the maximum value of id for each subid and return that.
So for subid 2 there is only a single id of 3 so that is obviously the max value of the id for that subid and will be returned, similarly for subid of 3 the only id is 4. For subid of 1 the possible values of id are 1 and 2. As 2 is the highest value it will be returned.
Without an aggregate function MySQL is free to choose any value of id, and which one it chooses is not defined and may change.
So for subid of 1 it might return the id of 1 or the id of 2.
Most flavours of SQL will issue an error when a column in the SELECT clause is neither the result of an aggregate function nor mentioned in the GROUP BY clause. By default, MySQL before 5.7.5 does not raise an error in this situation, and there are logical reasons for allowing it, for example extra columns which are entirely dependent on a column that is in the GROUP BY clause, such as grouping by a unique user id and pulling back the user's name. This behaviour is defined in the SQL standards, although exact implementation is patchy (and of course it becomes a non-issue if these directly related columns are added to the GROUP BY clause).
MySQL can be set up to raise an error in this situation, bringing it more into line with other flavours of SQL, by running with the only_full_group_by SQL mode enabled:
http://dev.mysql.com/doc/refman/5.0/en/sql-mode.html#sqlmode_only_full_group_by
The answer by @franglais of:
select max(id), subid from table group by subid order by subid desc
avoids the issue of which value of id to be returned being undefined, as it is specifying that for each subid returned the maximum id value for that subid will be returned.
SELECT MAX(id), subid FROM table GROUP BY subid ORDER BY subid DESC
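A runnable sketch of the MAX(id) query on the question's four rows. Python's sqlite3 module stands in for MySQL, and the table is called items here (a hypothetical name, since `table` is a reserved word and the real name is not given).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# "items" is a hypothetical table name; the question does not name its table.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, subid INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3)],
)

# MAX(id) makes the id returned for each subid well defined.
deduped = conn.execute(
    """
    SELECT MAX(id), subid
    FROM items
    GROUP BY subid
    ORDER BY subid DESC
    """
).fetchall()

print(deduped)
```

For subid 1 the ids 1 and 2 are both candidates, and MAX(id) deterministically picks 2, matching the result the question asks for.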

Multiple counting in 1 sql statement

Let's say I have a table with a column of ages.
Here is the list of ages
1
2
3
1
1
3
I want the SQL to count how many of age 1s, how many of 2s and 3s.
The code:
Select count(age) as age1 where age = '1';
Select count(age) as age2 where age = '2';
Select count(age) as age3 where age = '3';
Should work but would there be a way to just display it all using only 1 line of code?
This is an instance where the GROUP BY clause really shines:
SELECT age, COUNT(age)
FROM table_name
GROUP BY age
Just an additional tip:
You shouldn't use single quotes here in your query:
WHERE age = '1';
This is because age is an INT column, and integer literals are written without quotes. MySQL will implicitly convert the quoted string to the correct data type for you, and it's a negligible amount of overhead here. But imagine you were doing a JOIN of two tables with millions of rows; then the overhead introduced would be something to consider.
Try this if the count is limited to three ages. Using aggregate functions without a GROUP BY results in a single row. You can use SUM() with a condition: in MySQL the condition evaluates to a boolean (1 or 0), so the sum is the count of rows matching your criteria.
Select SUM(age = '1') as age1,
SUM(age = '2') as age2,
SUM(age = '3') as age3
from table
SELECT SUM(CASE WHEN age = 1 THEN 1 ELSE 0 END) AS age1,
SUM(CASE WHEN age = 2 THEN 1 ELSE 0 END) AS age2,
SUM(CASE WHEN age = 3 THEN 1 ELSE 0 END) AS age3
FROM YourTable
If your query should return only one column (age in this case, you can use Count+groupby):
SELECT age, Count(1) as qty
FROM [yourTable]
GROUP BY age
Remember you must include any additional column in your group by condition.
Select age as Age_Group, count(age) as Total_count from table1 group by age;
select age, count(age) from SomeTable group by age
http://sqlfiddle.com/#!2/b40da/2
The group by clause works like this:
When using aggregate functions, like the count function, without a group by clause, the function will apply to the entire dataset determined by the from and where clauses. A count will for instance count the number of rows in the result set, and a sum over a specific column will sum all the rows in the result set.
What the group by clause allows us to do is to divide the result set determined by the from and where clauses into partitions, so that the aggregate functions no longer apply to the result set as a whole, but rather within each partition of the result set.
When you specify a column to group by, what you are saying is something like "for each distinct value of column x in the result set, create a partition containing any row in the result set with this particular value in column x". Then, instead of yielding one result covering the entire resultset, aggregate functions will yield one result for each distinct value of column x in the result set.
With your example input of:
1
2
3
1
1
3
let's analyze the above query. As always, we should look at the from clause and the where clause first. The from clause tells us that we are selecting from SomeTable and only this, and the lack of a where clause tells us that we are selecting from the full contents of SomeTable.
Next, we'll look at the group by clause. It's present, and it groups by the age column, which is the only column in our example. The presence of the group by clause changes our dataset completely! Instead of selecting from the entire row set of SomeTable, we are now selecting from a set of partitions, one for each distinct value of the age-column in our original result set (which was every row in SomeTable).
At last, we'll look at the select-clause. Now, since we are selecting from partitions and not regular rows, the select-clause has fewer options for what it can contain, actually it only has 2: The column that it is grouped by, or an aggregate function.
Now, in our example we only have one column, but consider that we had another column, like here:
http://sqlfiddle.com/#!2/d5479/2
Now, imagine that in our data set we have two rows, both with age='1', but with different values in the other column. If we were to include this other column in a query that is grouped by the age-column (which we now know will return one row for each partition over the age-column), which value should be presented in the result? It makes no sense to include any column other than the one you grouped by. (I'll leave multiple columns in the group by clause out of this; in my experience one usually just wants one.)
But back to our select-clause: knowing our dataset has the distinct values {1, 2, 3} in the age-column, we should expect to get 3 rows in our result set. The first thing to be selected is the age-column, which will yield the values [1, 2, 3] in the three rows. Next in the select-list is an aggregate function, count(age), which we now know will count the number of rows in each partition. So, for the row in the result where age='1', it will count the number of rows with age='1', for the row where age='2' it will count the number of rows where age='2', and so on.
The result would look something like this:
age count(age)
1 3
2 1
3 2
(of course you are free to override the name of the second column in the result, with the as-operator..)
And that concludes today's lesson.
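The two styles discussed in this thread, one row per age via GROUP BY and a single row of conditional sums, can be compared in a short sketch. It runs through Python's sqlite3 module as a stand-in for MySQL; the table name people is hypothetical, and the six ages are the ones from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# "people" is a hypothetical table name; the question does not name its table.
conn.execute("CREATE TABLE people (age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [(1,), (2,), (3,), (1,), (1,), (3,)],
)

# One result row per distinct age (one partition per value).
per_age = conn.execute(
    "SELECT age, COUNT(age) AS qty FROM people GROUP BY age ORDER BY age"
).fetchall()

# One result row in total, with one conditional count per column.
one_row = conn.execute(
    """
    SELECT SUM(CASE WHEN age = 1 THEN 1 ELSE 0 END) AS age1,
           SUM(CASE WHEN age = 2 THEN 1 ELSE 0 END) AS age2,
           SUM(CASE WHEN age = 3 THEN 1 ELSE 0 END) AS age3
    FROM people
    """
).fetchone()

print(per_age)
print(one_row)
```

The GROUP BY form adapts to whatever ages are present, while the conditional-sum form needs one hand-written column per age.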

Does this MySQL query always return the expected result?

I wrote a query as follows:
SELECT COUNT(*) AS count, email
FROM sometable
GROUP BY email
ORDER BY count DESC
LIMIT 4
I am interested in seeing the four most duplicated email entries in the table. So far, it seems to return exactly what I want:
count email
12 very-duplicated@email.com
2 duped-twice@email.com
2 also-twice@email.com
1 single@email.com
When I don't use LIMIT, I get the same result (albeit with many more rows having a count = 1). What I'm wondering about is the LIMIT. In the future, when the numbers change, will my query above still return the four most used emails? or does the query need to scan the entire database to remain accurate?
(note: I am not trying to prevent duplicates, I'm trying to see the most frequently used email.)
I'm not sure. But if you're concerned, you could apply a limit to a subquery:
select *
from
(
SELECT COUNT(*) AS count, email
FROM sometable
GROUP BY email
ORDER BY count DESC
) t
limit 4
Alternatively, you could do something like this to see all duplicated email addresses (may return more or fewer than 4):
SELECT COUNT(*) AS count, email
FROM sometable
GROUP BY email
having COUNT(email) > 1
ORDER BY count DESC
Well, first of all, the query does not return only duplicate entries. Look at the 4th row, which has count = 1, meaning that email occurs only once in the table. To list only duplicated records you need to modify your query as follows:
SELECT COUNT(*) AS count, email
FROM sometable
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY count DESC
LIMIT 4
This will always return the 4 most duplicated entries in your table, in the order specified.
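A brief sketch illustrating that LIMIT is applied after grouping and ordering, so the top entries are taken from the complete set of counts. Python's sqlite3 module stands in for MySQL; the addresses are invented, the alias is cnt rather than count, and a secondary sort on email is added (an assumption beyond the original query) so that ties come back in a stable order.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (email TEXT)")
# Invented sample data: a@ appears 3 times, b@ twice, c@ once.
conn.executemany(
    "INSERT INTO sometable VALUES (?)",
    [
        ("a@example.com",),
        ("b@example.com",),
        ("a@example.com",),
        ("c@example.com",),
        ("b@example.com",),
        ("a@example.com",),
    ],
)

# Groups are counted and sorted first; LIMIT then trims the sorted list.
top_duplicates = conn.execute(
    """
    SELECT COUNT(*) AS cnt, email
    FROM sometable
    GROUP BY email
    HAVING COUNT(*) > 1
    ORDER BY cnt DESC, email
    LIMIT 4
    """
).fetchall()

print(top_duplicates)
```

Fewer than four rows come back here because only two addresses are actually duplicated.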