Finding users with at least one of every item - mysql

For example, I have the following table, called Information:
user_id | item
-------------------------
45 | camera
36 | smartphone
23 | camera
1 | glucose monitor
3 | smartwatch
2 | smartphone
7 | smartphone
2 | camera
2 | glucose monitor
2 | smartwatch
How can I check which user_id has at least one of every item?
The set of items is not static and may be different every time. In this example, however, there are 4 unique items: camera, smartphone, smartwatch, glucose monitor
Expected Result:
Because user_id : 2 has at least one of every item, the result will be:
user_id
2
Here is what I have attempted so far; however, if the list of items changes from 4 unique items to 3 unique items, I don't think it works anymore.
SELECT *
FROM Information
GROUP BY Information.user_id
having count(DISTINCT item) >= 4

One approach would be to aggregate by user_id, and then assert that the user's distinct item count matches the total distinct item count from the entire table.
SELECT
user_id
FROM Information
GROUP BY
user_id
HAVING
COUNT(DISTINCT item) = (SELECT COUNT(DISTINCT item) FROM Information);
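
To see the count-matching idea run end to end, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database and runs the same HAVING query. Using Python's sqlite3 module instead of MySQL is an assumption made only so the example is self-contained; the SQL is meant to carry over unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Information (user_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO Information (user_id, item) VALUES (?, ?)",
    [
        (45, "camera"), (36, "smartphone"), (23, "camera"),
        (1, "glucose monitor"), (3, "smartwatch"), (2, "smartphone"),
        (7, "smartphone"), (2, "camera"), (2, "glucose monitor"),
        (2, "smartwatch"),
    ],
)

# Keep users whose distinct item count equals the table-wide distinct item count.
query = """
    SELECT user_id
    FROM Information
    GROUP BY user_id
    HAVING COUNT(DISTINCT item) = (SELECT COUNT(DISTINCT item) FROM Information)
"""
users_with_every_item = [row[0] for row in conn.execute(query)]
print(users_with_every_item)
```

Because the total is computed by the subquery at run time, nothing has to change when the set of items grows or shrinks.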

You can try joining each user's distinct item count to the total distinct item count
SELECT t1.user_id
FROM (
SELECT user_id,COUNT(DISTINCT item) cnt
FROM T
GROUP BY user_id
) t1 JOIN (SELECT COUNT(DISTINCT item) cnt FROM T) t2
WHERE t1.cnt = t2.cnt
or use EXISTS
Query 1:
SELECT t1.user_id
FROM (
SELECT user_id,COUNT(DISTINCT item) cnt
FROM T
GROUP BY user_id
) t1
WHERE exists(
SELECT 1
FROM T tt
HAVING COUNT(DISTINCT tt.item) = t1.cnt
)
Results:
| user_id |
|---------|
| 2 |

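One more way of solving this problem is to use a CTE and the DENSE_RANK window function (this needs MySQL 8.0 or later).
This may also give better performance on MySQL, though that depends on the data and indexes. DENSE_RANK numbers each user's distinct items in order, so a user's highest rank equals the number of distinct items that user has. I then count the distinct items in the whole table and pick the users whose rank reaches that count.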
With Main as (
Select user_id
,item
,Dense_Rank () over (
Partition by user_id
Order by item
) as Dense_item
From information
)
Select Distinct
user_id
From Main
Where
Dense_item = (
Select
Count(Distinct item)
from
information);

Related

Delete duplicate items from a table

Hi, I have to delete duplicates from my table, where the same item appears more than once for a user.
Example table
Id | User | item | count
1 | max | coco | 2
2 | max | nut | 4
3 | max | image| 1
4 | max | coco | 4
How can I write an SQL query that deletes all the duplicates, given that there are a lot of users?
I tried to find the duplicates with:
SELECT id, user, item, COUNT(id) AS licznik
FROM Users
GROUP BY user, item
HAVING licznik > 1;
If you don't care which row remains, you can use a query such as the following to keep the row with the minimum id:
delete u
from users u join
(select user, item, min(id) as min_id
from users u
group by user, item
having count(*) > 1
) ui
using (user, item)
where u.id > min_id;
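
As a quick check of the "keep the minimum id" rule, here is a minimal sketch against an in-memory SQLite database loaded with the question's sample rows. SQLite has no multi-table DELETE, so the sketch swaps in a NOT IN subquery instead of the join above; that swap is an assumption made for portability. MySQL normally rejects a subquery that selects directly from the table being deleted from, which is why the answer joins to a derived table.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, user TEXT, item TEXT, count INTEGER)"
)
conn.executemany(
    "INSERT INTO users (id, user, item, count) VALUES (?, ?, ?, ?)",
    [
        (1, "max", "coco", 2),
        (2, "max", "nut", 4),
        (3, "max", "image", 1),
        (4, "max", "coco", 4),
    ],
)

# Delete every row that is not the lowest id within its (user, item) group.
conn.execute(
    """
    DELETE FROM users
    WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY user, item)
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
print(remaining_ids)
```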
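Try using a common table expression (CTE) that computes a row number over the duplicate columns (a window function), and then delete the rows whose row number is greater than 1. I don't have a MySQL database to check whether it works, but on Microsoft SQL Server the query below works like a charm. The bracketed [User] identifier is SQL Server syntax, and as far as I know MySQL does not allow deleting through a CTE, so there you would likely have to join the CTE back to Users on Id instead.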
;with cte as (
select
row_number() over (partition by [User], item order by Id) as rn
from Users
)
delete c from cte c where rn > 1

sum of count(*) for all rows in MySQL

I'm stuck with sum() query where I want the sum of count(*) values in all rows with group by.
Here is the query:
select
u.user_type as user,
u.count,
sum(u.count)
FROM
(
select
DISTINCT
user_type,
count(*) as count
FROM
users
where
(user_type = "driver" OR user_type = "passenger")
GROUP BY
user_type
) u;
Current Output:
----------------------------------
| user | count | sum |
----------------------------------
| driver | 58 | 90 |
----------------------------------
Expected Output:
----------------------------------
| user | count | sum |
----------------------------------
| driver | 58 | 90 |
| passenger | 32 | 90 |
----------------------------------
If I remove sum(u.count) from query then output is looks like:
--------------------------
| user | count |
--------------------------
| driver | 58 |
| passenger | 32 |
--------------------------
You need a subquery:
SELECT user_type,
Count(*) AS count,
(SELECT COUNT(*)
FROM users
WHERE user_type IN ("driver","passenger" )) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
GROUP BY user_type ;
Note that you don't need DISTINCT here.
OR
SELECT user_type,
Count(*) AS count,
c.sum
FROM users
CROSS JOIN (
SELECT COUNT(*) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
) as c
WHERE user_type IN ("driver","passenger" )
GROUP BY user_type ;
You can use WITH ROLLUP modifier:
select coalesce(user_type, 'total') as user, count(*) as count
from users
where user_type in ('driver', 'passenger')
group by user_type with rollup
This will return the same information but in a different format:
user | count
----------|------
driver | 32
passenger | 58
total | 90
In MySQL 8 you can use COUNT() as window function:
select distinct
user_type,
count(*) over (partition by user_type) as count,
count(*) over () as sum
from users
where user_type in ('driver', 'passenger');
Result:
user_type | count | sum
----------|-------|----
driver | 32 | 90
passenger | 58 | 90
or use CTE (Common Table Expressions):
with cte as (
select user_type, count(*) as count
from users
where user_type in ('driver', 'passenger')
group by user_type
)
select user_type, count, (select sum(count) from cte) as sum
from cte
I would be tempted to ask: are you sure you need this at the DB level?
Unless you are working purely in the database layer, any processing of these results will be built into an application layer and will presumably require some form of looping through the results.
It could be easier, simpler, and more readable to run
SELECT user_type,
COUNT(*) AS count
FROM users
WHERE user_type IN ("driver", "passenger")
GROUP BY user_type
.. and simply add up the total count in the application layer
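
A minimal sketch of that application-layer total, using Python's sqlite3 module with an in-memory database. The sample rows (3 drivers, 2 passengers, 1 admin) and the table layout are hypothetical, chosen only to make the example runnable.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_type TEXT)")
# Hypothetical sample rows: 3 drivers, 2 passengers, 1 admin.
conn.executemany(
    "INSERT INTO users (user_type) VALUES (?)",
    [("driver",)] * 3 + [("passenger",)] * 2 + [("admin",)],
)

# The plain grouped query: one row per user type, no total column.
counts = conn.execute(
    """
    SELECT user_type, COUNT(*) AS count
    FROM users
    WHERE user_type IN ('driver', 'passenger')
    GROUP BY user_type
    ORDER BY user_type
    """
).fetchall()

# Add up the per-type counts in the application and attach the total to each row.
total = sum(count for _, count in counts)
report = [(user_type, count, total) for user_type, count in counts]
print(report)
```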
As pointed out by Juan in another answer, the DISTINCT is redundant as the GROUP BY ensures that each resultant row is different
Like Juan, I also prefer an IN here, rather than OR condition, for the user_type as I find it more readable. It also reduces the likelihood of confusion if combining further AND conditions in the future
As an aside, I would consider moving the names of the user types, "driver" and "passenger" into a separate user_types table and referencing them by an ID column from your users table
N.B. If you absolutely do need this at the DB level, I would advocate using one of Paul's excellent options, or the CROSS JOIN approach proffered by Tom Mac, and by Juan as his second suggested solution
Try this. Inline view gets the overall total :
SELECT a.user_type,
count(*) AS count,
b.sum
FROM users a
JOIN (SELECT COUNT(*) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
) b ON TRUE
WHERE a.user_type IN ("driver","passenger" )
GROUP BY a.user_type;
You could simply combine SUM() OVER() with COUNT(*):
SELECT user_type, COUNT(*) AS cnt, SUM(COUNT(*)) OVER() AS total
FROM users WHERE user_type IN ('driver', 'passenger') GROUP BY user_type;
Output:
+------------+------+-------+
| user_type | cnt | total |
+------------+------+-------+
| passenger | 58 | 90 |
| driver | 32 | 90 |
+------------+------+-------+
Add a group by clause at the end for user-type, e.g:
select
u.user_type as user,
u.count,
sum(u.count)
FROM
(
select
DISTINCT
user_type,
count(*) as count
FROM
users
where
(user_type = "driver" OR user_type = "passenger")
GROUP BY
user_type
) u GROUP BY u.user_type;
Tom Mac explained his answer properly. Here is another way you can do it.
I checked the query performance and found no difference with 1000 records.
select user_type,Countuser,(SELECT COUNT(*)
FROM users
WHERE user_type IN ('driver','passenger') )as sum from (
select user_type,count(*) as Countuser from users a
where a.user_type='driver'
group by a.user_type
union
select user_type,count(*) as Countuser from users b
where b.user_type='passenger'
group by b.user_type
)c
group by user_type,Countuser
Try this:
WITH SUB_Q AS (
SELECT USER_TYPE, COUNT(*) AS CNT
FROM USERS
WHERE USER_TYPE = 'passenger' OR USER_TYPE = 'driver'
GROUP BY USER_TYPE
),
SUB_Q2 AS (
SELECT SUM(CNT) AS SUM_OF_COUNT
FROM SUB_Q
)
SELECT A.USER_TYPE, A.CNT AS COUNT, B.SUM_OF_COUNT AS SUM
FROM SUB_Q A JOIN SUB_Q2 B ON (TRUE);
I used the PostgreSQL dialect, but you can easily change it to a subquery.
select
u.user_type as user,
u.count,
sum(u.count)
FROM users group by user

Get count of multiple table records with group by function

I have 2 tables : priority_list and priority_list_delete
I want to get the following data in a single row:
1) Sum of these 2 table records
2) Count of individual table records
3) Count of priority_list_delete table records category wise
This is what I have done so far:
SELECT
(SELECT COUNT(*) FROM priority_list)+(SELECT COUNT(*) from
priority_list_delete) as tot_count,
(SELECT COUNT(*) FROM priority_list) as prior_cnt,
(SELECT COUNT(*) FROM priority_list_delete) as prior_del_cnt
The above query returns the count of the tables but when I merge the below query with the above one, it throws an error:
(SELECT category, COUNT(*) FROM priority_list_delete group by category)
I guess there is a syntax error that I am unable to sort out. Moreover, I have no idea how to get the record counts per category, with the category names as the column names.
Example format:
tot_count| prior_cnt| prior_del_cnt| ST | OBC
---------|----------|--------------|------|------
920 | 893 | 27 | 64 | 100
Here ST and OBC are the categories.
Any help would be appreciated.
Thanks in advance.
I think your exact desired output might be tough to do, because the number of category columns is dynamic. But we can try reporting categories across rows:
SELECT category, cnt
FROM
(
SELECT category, COUNT(*) AS cnt, 0 AS pos
FROM priority_list_delete
GROUP BY category
UNION ALL
SELECT 'prior_cnt', COUNT(*), 1 FROM priority_list
UNION ALL
SELECT 'prior_del_cnt', COUNT(*), 2 FROM priority_list_delete
UNION ALL
SELECT 'tot_count', (SELECT COUNT(*) FROM priority_list) +
(SELECT COUNT(*) FROM priority_list_delete), 3
) t
ORDER BY pos, category;
This would give an output looking something like:
category | cnt
--------------|-----
OBC | 100
ST | 64
prior_cnt | 893
prior_del_cnt | 27
tot_count | 920
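
If the single-row shape from the question is still wanted, one option is to pivot the category rows in the application instead of in SQL. The sketch below is only an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory database, and the table layouts and sample rows (4 rows in priority_list; 2 ST and 1 OBC in priority_list_delete) are hypothetical.

```python
import sqlite3

# In-memory SQLite database with hypothetical table layouts and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE priority_list (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE priority_list_delete (id INTEGER PRIMARY KEY, category TEXT)"
)
conn.executemany(
    "INSERT INTO priority_list (id) VALUES (?)", [(i,) for i in range(1, 5)]
)
conn.executemany(
    "INSERT INTO priority_list_delete (category) VALUES (?)",
    [("ST",), ("ST",), ("OBC",)],
)

prior_cnt = conn.execute("SELECT COUNT(*) FROM priority_list").fetchone()[0]
category_counts = conn.execute(
    """
    SELECT category, COUNT(*)
    FROM priority_list_delete
    GROUP BY category
    ORDER BY category
    """
).fetchall()
# Every deleted row belongs to exactly one group, so the group counts sum to the table count.
prior_del_cnt = sum(cnt for _, cnt in category_counts)

# Build the single "row": fixed columns first, then one key per category found.
summary = {
    "tot_count": prior_cnt + prior_del_cnt,
    "prior_cnt": prior_cnt,
    "prior_del_cnt": prior_del_cnt,
}
for category, cnt in category_counts:
    summary[category] = cnt
print(summary)
```

Because the category keys come from the data, new categories show up as new keys without the query text changing.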

ORDER BY before GROUP BY for leaderboard

I have a table with all players results:
id |result| user_id
------------------------
1 | 130 | 5C382072
2 | 145 | 5C382072
3 | 130 | 8QHDTz7w
4 | 166 | 6155B6D0
5 | 100 | DFSA3444
A smaller result is better. I need to write a query for a leaderboard.
Each player must appear once in leaderboard with his best result. If 2 players have equal results, the one with smaller id should appear first.
So I'm expecting this output:
id |result| user_id
------------------------
5 | 100 | DFSA3444
1 | 130 | 5C382072
3 | 130 | 8QHDTz7w
4 | 166 | 6155B6D0
I can't get the desired result, because grouping by user_id happens before ordering by result, id.
My code:
SELECT id, MIN(result), user_id
FROM results
GROUP BY user_id
ORDER BY result, id
It outputs something close to the desired result, but the id field is not tied to the row with the user's smallest result; it can be any id from the group with the same user_id. Because of that, ordering by id does not work at all.
EDIT:
What I didn't mention before is that I need to handle situations where a user has identical results.
I came up with two solutions that I don't like. :)
1) A bit slow and ugly:
SELECT t1.*
FROM (SELECT * FROM results WHERE results_status=1) t1
LEFT JOIN (SELECT * FROM results WHERE results_status=1) t2
ON (t1.user_id = t2.user_id AND (t1.result > t2.result OR (t1.result = t2.result AND t1.id > t2.id)))
WHERE t2.id IS NULL
ORDER BY result, id
2) Ten times slower but more clear:
SELECT *
FROM results t1
WHERE id = (
SELECT id
FROM results
WHERE user_id = t1.user_id AND results_status=1
ORDER BY result, id
LIMIT 1
)
ORDER BY result, id
I'm stuck. :(
This should get you close. Avoid MySQL's lenient (errant) GROUP BY syntax, which lets you form a GROUP BY clause without naming unaggregated columns from the SELECT list. Use standard SQL's GROUP BY syntax instead.
select t.user_id, m.min_result, min(t.id) id
from results t
inner join (select user_id, min(result) min_result
from results
group by user_id) m
on t.user_id = m.user_id
and t.result = m.min_result
group by t.user_id, m.min_result
Edit: I think you need a subquery:
SELECT a.id, a.result, a.user_id FROM results a
WHERE (a.user_id, a.result) IN (SELECT b.user_id, MIN(b.result) FROM results b
GROUP BY b.user_id)
ORDER BY a.result, a.id
This matches each id to the correct user_id and orders the leaderboard by result and then id, but it returns more than one row for a user who has the same best score more than once.
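
To check the join-on-minimum approach against the expected leaderboard, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database. SQLite stands in for MySQL here as an assumption, and the aliases first_id and best_result are made up for the example. It adds the ORDER BY that the leaderboard needs and takes MIN(id) so that a user with repeated best results still appears once.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, result INTEGER, user_id TEXT)"
)
conn.executemany(
    "INSERT INTO results (id, result, user_id) VALUES (?, ?, ?)",
    [
        (1, 130, "5C382072"),
        (2, 145, "5C382072"),
        (3, 130, "8QHDTz7w"),
        (4, 166, "6155B6D0"),
        (5, 100, "DFSA3444"),
    ],
)

# Find each user's best result, then pick the lowest id among rows with that result.
query = """
    SELECT MIN(t.id) AS first_id, m.min_result AS best_result, t.user_id
    FROM results t
    JOIN (SELECT user_id, MIN(result) AS min_result
          FROM results
          GROUP BY user_id) m
      ON t.user_id = m.user_id
     AND t.result = m.min_result
    GROUP BY t.user_id, m.min_result
    ORDER BY best_result, first_id
"""
leaderboard = conn.execute(query).fetchall()
for row in leaderboard:
    print(row)
```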

SQL Query - Not in a set of already in-use items

I am trying to select jobs that are not currently assigned to a user.
Users table: id | name
Jobs: id | name
Assigned: id | user_id | job_id | date_assigned
I want to select all the jobs that are not currently taken. Example:
Users:
id | name
--------------
1 | Chris
2 | Steve
Jobs
id | name
---------------
1 | Sweep
2 | Skids
3 | Mop
Assigned
id | user_id | job_id | date_assigned
-------------------------------------------------
1 | 1 | 1 | 2012-01-01
2 | 1 | 2 | 2012-01-02
3 | 2 | 3 | 2012-01-05
No two people can be assigned the same job. So the query would return
[1, Sweep]
Since no one is working on it since Chris got moved to Skids a day later.
So far:
SELECT
*
FROM
jobs
WHERE
id
NOT IN
(
SELECT
DISTINCT(job_id)
FROM
assigned
ORDER BY
date_assigned
DESC
)
However, this query returns no rows on the same data set, because it does not account for the sweep job being open now that nobody is currently working on it.
SELECT a.*
FROM jobs a
LEFT JOIN
(
SELECT a.job_id
FROM assigned a
INNER JOIN
(
SELECT MAX(id) AS maxid
FROM assigned
GROUP BY user_id
) b ON a.id = b.maxid
) b ON a.id = b.job_id
WHERE b.job_id IS NULL
This gets the most recent job per user. Once we have a list of those jobs, we select all jobs that aren't on that list.
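
Here is a minimal sketch of that query run against the question's sample data, using an in-memory SQLite database through Python's sqlite3 module (an assumption made so the example is self-contained). The aliases j, cur and latest are renamed from the answer for readability. Like the answer, it treats the highest assigned.id per user as that user's current assignment.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE assigned (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "job_id INTEGER, date_assigned TEXT)"
)
conn.executemany(
    "INSERT INTO jobs (id, name) VALUES (?, ?)",
    [(1, "Sweep"), (2, "Skids"), (3, "Mop")],
)
conn.executemany(
    "INSERT INTO assigned (id, user_id, job_id, date_assigned) VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2012-01-01"),
        (2, 1, 2, "2012-01-02"),
        (3, 2, 3, "2012-01-05"),
    ],
)

# Current jobs = the assignment with the highest id per user.
# Open jobs = jobs with no match among the current jobs.
query = """
    SELECT j.id, j.name
    FROM jobs j
    LEFT JOIN (
        SELECT a.job_id
        FROM assigned a
        JOIN (SELECT MAX(id) AS maxid FROM assigned GROUP BY user_id) latest
          ON a.id = latest.maxid
    ) cur ON j.id = cur.job_id
    WHERE cur.job_id IS NULL
"""
open_jobs = conn.execute(query).fetchall()
print(open_jobs)
```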
You can try this variant:
select * from jobs
where id not in (
select job_id from (
select user_id, job_id, max(date_assigned)
from assigned
group by user_id, job_id) t);
I think you might want:
SELECT *
FROM jobs
WHERE id NOT IN (SELECT job_id
from assigned
where user_id is not null
)
This assumes that re-assigning someone changes the user id on the original assignment. Does this happen? By the way, I also simplified the subquery.
First you need to be looking at a list of only current job assignments. Ordering isn't enough. The way you have it set up, you need a distinct subset of job assignments from Assigned that are the most recent assignments.
So you want a grouping subquery something like
select user_id, max(date_assigned) last_assigned from assigned group by user_id
Join it back to assigned to recover each user's current job_id, put it all together, and you get
select id, name from jobs
where id not in (
select a.job_id from assigned a
join (
select user_id, max(date_assigned) last_assigned from assigned
group by user_id
) latest on latest.user_id = a.user_id and latest.last_assigned = a.date_assigned
)
As an extra feature, you could pass up the value of "last_assigned" and it would tell you how long a job has been idle for.