MySQL: count of duplicate records with a condition

I'm trying to get a count of duplicate data, but my query doesn't work correctly.
Every user should have one dev_id, and when another user has the same dev_id I want to know about it.
Example table:
dev_id user_id
------------------
111 1
111 1
222 2
111 2
333 3
Expected result:
user_id qu
------------------
1 1
2 1
3 0
This is my query
SELECT t1.user_id,
(SELECT Count(DISTINCT t2.dev_id)
FROM reports t2
WHERE t2.user_id != t1.user_id
AND t2.dev_id = t1.dev_id
) AS qu
FROM reports t1
GROUP BY t1.user_id

You can get results by doing:
select r.user_id, count(*) - 1
from reports r
group by r.user_id;
Is this the calculation that you want?

OK, let's start simple.
First you need to get the unique user_id/dev_id combinations:
select distinct dev_id,user_id from reports
The result will be:
dev_id user_id
------------------
111 1
222 2
111 2
333 3
After that, get the number of different user_ids per dev_id (minus one, so that a dev_id used by a single user gives 0):
select dev_id,c from (
SELECT
dev_id,
count(*)-1 AS c
FROM
(select distinct user_id,dev_id from reports) as fixed_reports
GROUP BY dev_id
) as counts
The result of that query will be:
dev_id c
-----------------
111 1
222 0
333 0
Now you need to show the users that have such a dev_id. To do that, join this dev_id list with the result from step 1 (which shows which user_id/dev_id pairs exist):
select distinct fixed_reports2.user_id,counts.c from (
SELECT
dev_id,
count(*)-1 AS c
FROM
(select distinct user_id,dev_id from reports) as fixed_reports
GROUP BY dev_id
) as counts
join
(select distinct user_id,dev_id from reports) as fixed_reports2
on fixed_reports2.dev_id=counts.dev_id
where counts.c>0 and counts.c is not null
DISTINCT is needed here to skip identical rows.
The result of the inner query should be:
dev_id c
-----------------
111 1
And for the whole query:
user_id c
------------------
1 1
2 1
If you are sure you also need the rows with c=0, do a LEFT JOIN from fixed_reports2 to the large query. That way you get all rows, and the rows where c is NULL are the ones that should show 0 (this can be changed with a CASE/WHEN expression).
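The steps above can be tried end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, loads the sample rows from the question, and sums the per-device counts for each user (summing is an assumption about how a user with several dev_ids should be reported). Dropping the c>0 filter is what keeps user 3 in the output.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (dev_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO reports (dev_id, user_id) VALUES (?, ?)",
    [(111, 1), (111, 1), (222, 2), (111, 2), (333, 3)],
)

query = """
SELECT pairs.user_id, SUM(counts.c) AS qu
FROM (SELECT DISTINCT user_id, dev_id FROM reports) AS pairs
JOIN (
    -- number of *other* users per dev_id
    SELECT dev_id, COUNT(*) - 1 AS c
    FROM (SELECT DISTINCT user_id, dev_id FROM reports) AS fixed_reports
    GROUP BY dev_id
) AS counts ON counts.dev_id = pairs.dev_id
GROUP BY pairs.user_id
ORDER BY pairs.user_id
"""
shared_counts = conn.execute(query).fetchall()
print(shared_counts)
```

For the sample data this gives one row per user, with 1 for users 1 and 2 and 0 for user 3.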

I think the following SQL query should solve your problem:
SELECT t1.user_id, t1.dev_id, count(t2.user_id) as qu
FROM (Select Distinct * from reports) t1
Left Join (Select Distinct * from reports) t2
on t1.user_id != t2.user_id and t2.dev_id = t1.dev_id
group by t1.user_Id, t1.dev_id
SQL Fiddle Link

SELECT user_id, (COUNT(user_id) -1) as qu
FROM reports
GROUP BY user_id
This gives the desired result in your case, although it can be improved a lot.
Cheers,

Your query is broken and would not run on many systems. The problem is that the group with user_id of 2 has two different dev_ids. If you run the "broken query" below you can see that the min() and max() are distinct but the subquery only sees one of those values which is randomly chosen. The last query is corrected by adding dev_id to the groupings which shows you where the "missing" row went in the counts.
SELECT -- broken query
t1.user_id, min(t1.dev_id), max(t1.dev_id),
(select distinct t1.dev_id from reports) as should_have_errored,
(SELECT Count(DISTINCT t2.dev_id)
FROM reports t2
WHERE t2.user_id != t1.user_id
AND t2.dev_id = t1.dev_id
) AS qu
FROM reports t1
GROUP BY t1.user_id;
-- On SQL Server that query returns an error
-- Msg 8120, Level 16, State 1, Line 7
-- Column 'reports.dev_id' is invalid in the select list because it is
-- not contained in either an aggregate function or the GROUP BY clause.
SELECT -- query that duplicates your original query
t1.user_id,
(SELECT Count(DISTINCT t2.dev_id)
FROM reports t2
WHERE t2.user_id != t1.user_id
AND t2.dev_id = max(t1.dev_id) /* <-- see here */
) AS qu
FROM reports t1
GROUP BY t1.user_id;
SELECT t1.user_id, t1.dev_id, -- fixed query
(SELECT Count(DISTINCT t2.dev_id)
FROM reports t2
WHERE t2.user_id != t1.user_id
AND t2.dev_id = t1.dev_id
) AS qu
FROM reports t1
GROUP BY t1.user_id, t1.dev_id
http://sqlfiddle.com/#!9/6576e3/20
Here are some queries that might be useful:
Which dev_ids have multiple user_ids associated with them?
select dev_id
from reports
group by dev_id
having count(distinct user_id) > 1
Which other user_ids share a dev_id with this user_id?
select distinct r1.user_id
from reports r1
where r1.user_id <> ? and exists (
select 1
from reports r2
where r2.dev_id = r1.dev_id and r2.user_id = ?
)
Or really that's just equivalent to an inner join which also makes it easy to list everybody at once. Note that each pair will be listed twice:
select r1.user_id, r1.dev_id, r2.user_id as common_user_id
from
reports r1 inner join reports r2
on r2.dev_id = r1.dev_id
where
r1.user_id <> r2.user_id
order by
r1.user_id, r1.dev_id, r2.user_id
And since you've got duplicate rows in your table you'd need to make it select distinct to get unique rows.
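A quick way to check these helper queries against the question's sample rows is sketched below. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (dev_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO reports (dev_id, user_id) VALUES (?, ?)",
    [(111, 1), (111, 1), (222, 2), (111, 2), (333, 3)],
)

# dev_ids that are associated with more than one user_id
shared_devs = conn.execute("""
    SELECT dev_id
    FROM reports
    GROUP BY dev_id
    HAVING COUNT(DISTINCT user_id) > 1
""").fetchall()

# every (user, dev_id, other user) combination; DISTINCT removes the
# repeats caused by the duplicate rows in the table
pairs = conn.execute("""
    SELECT DISTINCT r1.user_id, r1.dev_id, r2.user_id AS common_user_id
    FROM reports r1
    INNER JOIN reports r2 ON r2.dev_id = r1.dev_id
    WHERE r1.user_id <> r2.user_id
    ORDER BY r1.user_id, r1.dev_id, r2.user_id
""").fetchall()

print(shared_devs)
print(pairs)
```

Only dev_id 111 is shared, and the pair of users 1 and 2 appears once in each direction.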

Try
SELECT
user_id,
SUM(qu) AS qu
FROM (
SELECT
user_id,
count(*)-1 AS qu
FROM
reports
GROUP BY user_id, dev_id
) AS r
GROUP BY user_id
No need to do a join if all the data you need is in one table.
Edit: changed the group by to dev_id instead of user_id
Edit2: I think you need both dev_id and user_id in the group by clause.
Edit3: Added a subquery to get the desired result. This might be a little cumbersome, perhaps someone has a way to improve this?

Related

Selecting Counts from Different Tables with a Subquery

I'm new to MySQL, and I'd like some help in setting up a MySQL query to pull some data from a few tables (~100,000 rows) in a particular output format.
This problem involves three SQL tables:
allusers : This one contains user information. The columns of interest are userid and vip
table1 and table2 contain data, but they also have a userid column, which matches the userid column in allusers.
What I'd like to do:
I'd like to create a query which searches through allusers, finds the userid of those that are VIP, and then count the number of records in each of table1 and table2 grouped by the userid. So, my desired output is:
userid | Count in Table1 | Count in Table2
1 | 5 | 21
5 | 16 | 31
8 | 21 | 12
What I've done so far:
I've created this statement:
SELECT userid, count(1)
FROM table1
WHERE userid IN (SELECT userid FROM allusers WHERE vip IS NOT NULL)
GROUP BY userid
This gets me close to what I want. But now, I want to add another column with the respective counts from table2
I also tried using joins like this:
select A.userid, count(T1.userid), count(T2.userid) from allusers A
left join table1 T1 on T1.userid = A.userid
left join table2 T2 on T2.userid = A.userid
where A.vip is not null
group by A.userid
However, this query took a very long time and I had to kill the query. I'm assuming this is because using Joins for such large tables is very inefficient.
Similar Questions
This one is looking for a similar result as I am, but doesn't need nearly as much filtering with subqueries
This one sums up the counts across tables, while I need the counts separated into columns
Could someone help me set up the query to generate the data I need?
Thanks!
You need to pre-aggregate first, then join, otherwise the results will not be what you expect if a user has several rows in both table1 and table2. Besides, pre-aggregation is usually more efficient than outer aggregation in a situation such as yours.
Consider:
select a.userid, t1.cnt cnt1, t2.cnt cnt2
from allusers a
left join (select userid, count(*) cnt from table1 group by userid) t1
on t1.userid = a.userid
left join (select userid, count(*) cnt from table2 group by userid) t2
on t2.userid = a.userid
where a.vip is not null
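To see why joining first and counting afterwards goes wrong, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The rows are made up for illustration, and COALESCE is added (an assumption about the desired output) so that users with no rows show 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE allusers (userid INTEGER, vip TEXT);
    CREATE TABLE table1 (userid INTEGER);
    CREATE TABLE table2 (userid INTEGER);
""")
# Hypothetical data: user 1 has 2 rows in table1 and 3 rows in table2.
conn.executemany("INSERT INTO allusers VALUES (?, ?)",
                 [(1, "gold"), (2, None), (5, "gold")])
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (1,), (2,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(1,), (1,), (1,), (5,)])

# Join first, count later: the two joins multiply each other's rows.
naive = conn.execute("""
    SELECT a.userid, COUNT(t1.userid), COUNT(t2.userid)
    FROM allusers a
    LEFT JOIN table1 t1 ON t1.userid = a.userid
    LEFT JOIN table2 t2 ON t2.userid = a.userid
    WHERE a.vip IS NOT NULL
    GROUP BY a.userid
    ORDER BY a.userid
""").fetchall()

# Pre-aggregate each table, then join one row per user.
preagg = conn.execute("""
    SELECT a.userid, COALESCE(t1.cnt, 0), COALESCE(t2.cnt, 0)
    FROM allusers a
    LEFT JOIN (SELECT userid, COUNT(*) AS cnt FROM table1 GROUP BY userid) t1
        ON t1.userid = a.userid
    LEFT JOIN (SELECT userid, COUNT(*) AS cnt FROM table2 GROUP BY userid) t2
        ON t2.userid = a.userid
    WHERE a.vip IS NOT NULL
    ORDER BY a.userid
""").fetchall()

print(naive)
print(preagg)
```

With this data the join-first query reports 6 and 6 for user 1 (2 x 3 joined rows), while the pre-aggregated query reports the real counts, 2 and 3.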
This is a case where I would recommend correlated subqueries:
select a.userid,
(select count(*) from table1 t1 where t1.userid = a.userid) as cnt1,
(select count(*) from table2 t2 where t2.userid = a.userid) as cnt2
from allusers a
where a.vip is not null;
The reason I recommend this approach is that you are filtering the allusers table. That means the pre-aggregation approach may be doing additional, unnecessary work for users that are filtered out anyway.

skip row if user_id contains a specific code

I have these rows
user_id code
1 9103
1 9103
1 9001
2 9103
3 9103
3 9104
4 9103
4 9103
4 9001
I want to get only the user_ids that have no row with code 9001, so only 2 and 3.
I tried with DISTINCT, but without luck:
Select distinct v.code, user_id from mytable as v
where v.code not in ( Select v2.code from mytable as v2
where v2.code=9001)
Group by the user and then keep only those groups that have no row matching the condition:
select user_id
from your_table
group by user_id
having sum(code = 9001) = 0
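A minimal sketch of this grouping approach on the question's rows, using Python's built-in sqlite3 module as a stand-in for MySQL (both evaluate code = 9001 to 1 or 0). The CASE form is included as an alternative for databases that cannot sum a comparison directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user_id INTEGER, code INTEGER)")
conn.executemany(
    "INSERT INTO mytable (user_id, code) VALUES (?, ?)",
    [(1, 9103), (1, 9103), (1, 9001), (2, 9103), (3, 9103),
     (3, 9104), (4, 9103), (4, 9103), (4, 9001)],
)

# Boolean sum: counts the 9001 rows per user and keeps users with none.
boolean_sum = conn.execute("""
    SELECT user_id
    FROM mytable
    GROUP BY user_id
    HAVING SUM(code = 9001) = 0
    ORDER BY user_id
""").fetchall()

# Same idea written with CASE, which does not rely on boolean-to-integer conversion.
case_sum = conn.execute("""
    SELECT user_id
    FROM mytable
    GROUP BY user_id
    HAVING SUM(CASE WHEN code = 9001 THEN 1 ELSE 0 END) = 0
    ORDER BY user_id
""").fetchall()

print(boolean_sum, case_sum)
```

Both queries return users 2 and 3 for the sample data.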
There are multiple methods to get the results you need.
NOT EXISTS (ALL DBMS)
SELECT
DISTINCT
t.user_id
FROM
Table1 AS t
WHERE
NOT EXISTS (
SELECT
1
FROM
Table1 AS t2
WHERE
t2.user_id = t.user_id
AND t2.code = 9001
)
NOT IN (ALL DBMS)
SELECT
DISTINCT
Table1.user_id
FROM
Table1
WHERE
user_id NOT IN (
SELECT
user_id
FROM
Table1
WHERE
code = 9001
)
RIGHT JOIN / LEFT JOIN (ALL DBMS but for example SQLite does not support RIGHT JOIN)
SELECT
DISTINCT
Table1.user_id
FROM (
SELECT
user_id
FROM
Table1
WHERE
code = 9001
) AS Table1_filter
RIGHT JOIN
Table1
ON
Table1_filter.user_id = Table1.user_id
WHERE
Table1_filter.user_id IS NULL
;
SELECT
DISTINCT
Table1.user_id
FROM
Table1
LEFT JOIN (
SELECT
user_id
FROM
Table1
WHERE
code = 9001
) AS Table1_filter
ON
Table1_filter.user_id = Table1.user_id
WHERE
Table1_filter.user_id IS NULL
;
Conditional SUM (@juergen d's answer) (MySQL and SQLite; other DBMSs need SUM(CASE WHEN code = 9001 THEN 1 ELSE 0 END))
SELECT
Table1.user_id
FROM
Table1
GROUP BY
Table1.user_id
HAVING
SUM(Table1.code = 9001) = 0
Variation on @juergen d's answer with GROUP_CONCAT (MySQL and SQLite only)
Also possible with
... HAVING FIND_IN_SET('9001', GROUP_CONCAT(Table1.code)) = 0 (MySQL Only)
SELECT
Table1.user_id
FROM
Table1
GROUP BY
Table1.user_id
HAVING
GROUP_CONCAT(Table1.code) NOT LIKE '%9001%'
P.S. GROUP_CONCAT(Table1.code) NOT LIKE '%9001%' can give wrong results depending on the data, because a longer code such as 19001 also matches '%9001%'. Using FIND_IN_SET('9001', GROUP_CONCAT(Table1.code)) = 0 is the safer option.
see demo http://sqlfiddle.com/#!9/fc6f6b/34

Select something from mysql database and order it by count (where)

I have a problem with selecting something from my database. Here is the SQL statement:
SELECT name
FROM table1
JOIN table2
ON table1.id=table2.advid
GROUP BY advid
ORDER BY COUNT(table2.likes) ASC
This outputs the names ordered from the lowest to the highest count of table2.likes.
The problem is that table2.likes contain both likes and dislikes. Likes are marked with 1, and dislikes are marked with 2 in the table.
Currently, if there is...
...written in the table, the query will count both likes and dislikes, so the result would be 6. I need this result to be zero, which means that when counting, dislikes have to be subtracted from the number of likes. That also means this part of the statement: ORDER BY COUNT(table2.likes) ASC would have to be changed, but I don't know how.
Use conditional aggregation with SUM():
SELECT name
FROM table1 t1 JOIN
table2 t2
ON t1.id = t2.advid
GROUP BY name
ORDER BY SUM(CASE WHEN t2.likes = 1 THEN 1 ELSE -1 END) ASC;
Note: I changed the GROUP BY to be by name. The GROUP BY columns should match the columns you are selecting.
Use a case expression to count 1 for likes and -1 for dislikes. It is considered good style and less error-prone not to join and then aggregate, but to join the already aggregated data instead.
select t1.name, t2.sumlikes
from table1 t1
join
(
select advid, sum(case when likes = 1 then 1 else -1 end) as sumlikes
from table2
group by advid
) t2 on t2.advid = t1.id
order by sumlikes;
If you want to list names without like entries, too, then turn the join into a left outer join and select coalesce(t2.sumlikes, 0) instead.
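The aggregate-then-join variant, including the left outer join with COALESCE, can be sketched as follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the names and rows are hypothetical, and the alias net_likes is chosen here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (advid INTEGER, likes INTEGER);
""")
# Hypothetical data: likes = 1 is a like, likes = 2 is a dislike.
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, 1), (1, 1), (1, 2), (2, 2), (2, 2)])

ranking = conn.execute("""
    SELECT t1.name, COALESCE(t2.sumlikes, 0) AS net_likes
    FROM table1 t1
    LEFT JOIN (
        -- +1 for each like, -1 for each dislike, per advid
        SELECT advid,
               SUM(CASE WHEN likes = 1 THEN 1 ELSE -1 END) AS sumlikes
        FROM table2
        GROUP BY advid
    ) t2 ON t2.advid = t1.id
    ORDER BY net_likes, t1.name
""").fetchall()

print(ranking)
```

Here "gamma" has no entries in table2 and is listed with 0, between the net-negative "beta" and the net-positive "alpha".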

SELECT with a COUNT of another SELECT

I have a table in SQL that is a list of users checking in to a website. It looks much like this:
id | date | status
------------------
Status can be 0 for not checking in, 1 for checked in, 2 for covered, and 3 for absent.
I'm trying to build one single query that lists all rows with status = 0, but also has a COUNT on how many rows have status = 3 on each particular id.
Is this possible?
MySQL VERSION
Just join a subquery that computes the count per id.
SELECT t.*, COALESCE(t1.status_3_count, 0) as status_3_count
FROM yourtable t
LEFT JOIN
( SELECT id, SUM(status=3) as status_3_count
FROM yourtable
GROUP BY id
) t1 ON t1.id = t.id
WHERE t.status = 0
Note: this does a boolean sum (in effect a count).
In MySQL the expression status=3 evaluates to 1 (true) or 0 (false), so summing those values returns the count of status = 3 rows for each id.
SQL SERVER VERSION
SELECT id, SUM(CASE WHEN status = 3 THEN 1 ELSE 0 END) as status_3_count
FROM yourtable
GROUP BY id
or just use a WHERE status = 3 and a COUNT(id)
Try a dependent subquery:
SELECT t1.*,
( SELECT count(*)
FROM sometable t2
WHERE t2.id = t1.id
AND t2.status = 3
) As somecolumnname
FROM sometable t1
WHERE t1.status=0
You can use a join for this. Write one query that will get all rows with a status zero:
SELECT *
FROM myTable
WHERE status = 0;
Then, write a subquery to get counts for the status of 3 for each id, by grouping by id:
SELECT id, COUNT(*) AS numStatus3
FROM myTable
WHERE status = 3
GROUP BY id;
Since you want all the rows from the first table (at least that's what I am picturing), you can use a LEFT JOIN with the second table like this:
SELECT m.id, m.status, IFNULL(t.numStatus3, 0)
FROM myTable m
LEFT JOIN (SELECT id, COUNT(*) AS numStatus3
FROM myTable
WHERE status = 3
GROUP BY id) t ON m.id = t.id
WHERE m.status = 0;
The above will only show the count for rows containing an id that has status 0. Hopefully this is what you are looking for. If it is not, please post some sample data and expected results and I will help you try to reach it. Here is an SQL Fiddle example.
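Since the question has no sample data, here is a sketch of the LEFT JOIN approach on made-up rows, using Python's built-in sqlite3 module as a stand-in for MySQL. The table name checkins and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkins (id INTEGER, date TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO checkins (id, date, status) VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 0), (1, "2024-01-02", 3), (1, "2024-01-03", 3),
        (2, "2024-01-01", 0), (2, "2024-01-02", 1),
        (3, "2024-01-01", 3),
    ],
)

# All status = 0 rows, each with the number of status = 3 rows for the same id.
result = conn.execute("""
    SELECT m.id, m.date, m.status, COALESCE(t.numStatus3, 0) AS numStatus3
    FROM checkins m
    LEFT JOIN (
        SELECT id, COUNT(*) AS numStatus3
        FROM checkins
        WHERE status = 3
        GROUP BY id
    ) t ON m.id = t.id
    WHERE m.status = 0
    ORDER BY m.id, m.date
""").fetchall()

print(result)
```

Id 1 has two absences, id 2 has none (so COALESCE turns the missing count into 0), and id 3 does not appear because it has no status = 0 row.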

select unique data from a table with similar id data field

I am trying to retrieve unique values from the table above (order_status_data2). I would like to get the most recent order with the following fields: id,order_id and status_id. High id field value signifies the most recent item i.e.
4 - 56 - 4
8 - 52 - 6
7 - 6 - 2
9 - 8 - 2
etc.
I have tried the following query, but I am not getting the desired result, especially for the status_id field:
select max(id) as id, order_id, status_id from order_status_data2 group by order_id
This is the result I am getting:
How would I formulate the query to get the desired results?
SELECT o.id, o.order_id, o.status_id
FROM order_status_data2 o
JOIN (SELECT order_id, MAX(id) maxid
FROM order_status_data2
GROUP BY order_id) m
ON o.order_id = m.order_id AND o.id = m.maxid
SQL Fiddle
In your query, you didn't put any constraints on status_id, so it picked it from an arbitrary row in the group. Selecting max(id) doesn't make it choose status_id from the row that happens to have that value, you need a join to select a specific row for all the non-aggregated columns.
Like so:
select d.*
from order_status_data2 d
join (select max(id) mxid from order_status_data2 group by order_id) s
on d.id = s.mxid
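A small sketch of this join against the per-order maximum id, using Python's built-in sqlite3 module as a stand-in for MySQL. The original table is not shown in the question, so the rows below are hypothetical, chosen only to be consistent with the desired output listed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_status_data2 (id INTEGER, order_id INTEGER, status_id INTEGER)"
)
# Hypothetical history: several status rows per order, higher id = more recent.
conn.executemany(
    "INSERT INTO order_status_data2 VALUES (?, ?, ?)",
    [(1, 56, 1), (2, 52, 1), (3, 6, 1), (4, 56, 4), (5, 8, 1),
     (6, 52, 3), (7, 6, 2), (8, 52, 6), (9, 8, 2)],
)

# Find the highest id per order, then join back to fetch that exact row.
latest = conn.execute("""
    SELECT o.id, o.order_id, o.status_id
    FROM order_status_data2 o
    JOIN (
        SELECT order_id, MAX(id) AS maxid
        FROM order_status_data2
        GROUP BY order_id
    ) m ON o.order_id = m.order_id AND o.id = m.maxid
    ORDER BY o.id
""").fetchall()

print(latest)
```

Each order appears once, with the status_id taken from its most recent row rather than from an arbitrary row in the group.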
Try this query:
SELECT id, order_id, status_id
FROM order_status_data2
WHERE id IN
(
SELECT max(id) FROM order_status_data2 GROUP BY order_id
)
ORDER BY status_id
You can refer to this SQL Fiddle link, which uses your example.