Count and Distinct SQL query - mysql

I have a table of Books, that contains the following columns:
Book_Id User_Id
001 1
002 2
001 1
004 2
005 3
006 3
007 2
008 2
009 1
Where :
Book_Id - identifier of a book that a user read; User_Id - identifier
of a reader/user.
Let's assume that User1 read books three times, but 2 of them were same, so the user 1 read 2 distinct books (001 and 009). User 2 read 4 distinct books, while user 3 read 2 distinct books.
In overall, there are 2 users that read 2 distinct books, and 1 user that read 4 distinct books.
The output expected is as below:
Distinct_Books_Count --- User_Count
2 2
4 1
I tried the following:
SELECT COUNT(DISTINCT Book_Id), COUNT(User_Id) FROM Books GROUP BY
User_Id
But I receive the following table:
Distinct_Books_Count User_Count
2 3
4 4
2 2
So any alternative solution or changes?

I call this a "histogram of histograms" query. You can do it using two group bys:
SELECT num_books, COUNT(*)
FROM (SELECT b.User_Id, COUNT(DISTINCT Book_Id) as num_books
FROM Books b
GROUP BY User_Id
) b
GROUP BY num_books
ORDER BY num_books;

Related

Select from two tables but only for the maximum values in one of the columns

I have the following two tables. I want to select all the users with user_type 1, as well as the details for their most recent note. I want to order the results by the note time_created.
users table:
user_id name user_type
1 Joe 1
2 Fred 1
3 Sam 0
4 Dave 0
notes table:
notes_id user_id note time_created
1 1 Joe 1 2019-06-05 13:45:00
2 1 Joe 2 2019-06-05 13:46:00
2 1 Joe 3 2019-06-05 13:47:00
3 2 Fred 1 2019-06-05 13:45:00
4 2 Fred 2 2019-06-05 13:46:00
5 3 Sam 1 2019-06-05 13:45:00
6 4 Dave 1 2019-06-05 13:45:00
So the result of the query would be
user_id name note time_created
1 Joe Joe 3 2019-06-05 13:47:00
2 Fred Fred 2 2019-06-05 13:45:00
This is my effort so far
SELECT users.user_id, users.name, notes.note, notes.time_created FROM notes
INNER JOIN users ON users.id=prospect_notes.subject_id
WHERE users.user_type = 1 ORDER BY notes_time_created;
It is pretty good, but it returns several rows per user. I just want the one row containing the maximum time created.
You can use a correlated subquery:
SELECT u.user_id, u.name, n.note, n.time_created
FROM users u JOIN
notes n
ON u.id = n.subject_id
WHERE u.user_type = 1 AND
n.notes_time_created = (SELECT MAX( n2.notes_time_created )
FROM notes n2
WHERE n2.subject_id = n.subject_id
)
ORDER BY n.notes_time_created;

SQL query that finds all the column, that is in multiple rows, but have different value in another column?

If I have a "SALES" table, with columns of SaleID, Product#, and CustomerName. and a PRODUCTS table with two columns product_ID and Name. The contains 5 differnt products. In the SALES table populates when a sale is made.
How would I query customer_name with only Product_ID of 1 and 2?
sales table
SALES_ID PRODUCT_ID CUSTOMER_NAME
1 1 DAVE
2 2 DAVE
3 3 DAVE
4 1 TOM
5 2 TOM
6 1 JANE
7 1 MIKE
8 1 MIKE
9 3 MIKE
10 4 MARY
I would like a table result to be
SALES_ID PRODUCT_ID CUSTOMER_NAME
1 1 TOM
2 2 TOM
Select s.CustomerName from SALES s
INNER JOIN PRODUCTS p ON s.Product#=p.Product#
WHERE p.Product# =1
INTERSECT
Select s.CustomerName from SALES s
INNER JOIN PRODUCTS p ON s.Product#=p.Product#
WHERE p.Product# =2

MYSQL Stuck Generating temp table (massive query)

I have 4 tables (1 to many):
Dont say anything about that "email" relation. It is how my developer boss built it years ago.
EMPLOYEES (+-50 results)
------------------------------------------------
id name
1 EmpName 1
2 EmpName 2
CUSTOMERS (+50k results)
------------------------------------------------
id name email employee_assigned
1 John john#doe.com 12
2 Donald donald#duck.com 6
INTERESTS_CATEGORIES (+650k results)
------------------------------------------------
id customer_email category_id
1 john#doe.com 97
2 john#doe.com 13
3 donald#duck.com 56
4 donald#duck.com 126
5 donald#duck.com 45
INTERESTS_PRODUCTS (+650k results)
------------------------------------------------
id customer_email product_id
1 john#doe.com 78
2 john#doe.com 23
3 donald#duck.com 19
4 donald#duck.com 56
5 donald#duck.com 45
So I need to filter the customers by their assigned employee and their interests.
And here is the query:
SELECT
*
FROM
(
SELECT
customers.id AS 'id',
customers.name AS 'first_name',
customers.email,
employees.id AS 'employee_id'
FROM
customers,
employees
WHERE
employees.id = 2
AND
customers.employee_assigned = employees.id
) AS myCustomers
LEFT JOIN interests_categories
ON interests_categories.customer_email = myCustomers.email
LEFT JOIN interests_products
ON interests_categories.customer_email = myCustomers.email
WHERE
(
interests_categories.category_id = 20
OR
interests_categories.category_id = 21
)
GROUP BY myCustomers.email
So, the problem:
If the employee has a low number of assigned customers (like 3) query
is successfull.
If the employee has a medium-high number of assigned customers (over 100) query stucks.
I execute SHOW PROCESSLIST and it is stucked "Generating temp table".
Anyone has idea? :(
Thank you.
Check the indexes on your tables and try this:
SELECT
c.id AS 'id',
c.name AS 'first_name',
c.email,
e.id AS 'employee_id'
ic.*,
ip.*
FROM customers c
JOIN employees e
ON c.employee_assigned = e.id
LEFT JOIN interests_categories ic
ON ic.customer_email = c.email
LEFT JOIN interests_products ip
ON ic.customer_email = c.email
WHERE
(
ic.category_id IN (20,21)
AND e.id = 2
)
GROUP BY myCustomers.email
Incidentally, a less dumb design might look like as follows. If it was me, I'd start with this, and provide properly representative CREATE and INSERT statements accordingly. Also, I'm curious about where category_id comes from - because that's potentially an area for further optimization.
EMPLOYEES
------------------------------------------------
employee_id name
6 EmpName 1
12 EmpName 2
CUSTOMERS
------------------------------------------------
customer_id name email employee_assigned
1 John john#doe.com 12
2 Donald donald#duck.com 6
INTERESTS_CATEGORIES
------------------------------------------------
customer_id category_id
1 97
1 13
2 56
2 126
2 45
INTERESTS_PRODUCTS
------------------------------------------------
customer_id product_id
1 78
1 23
2 19
2 56
2 45

Add total of 3 rows for specific id

I have three tables:
Students
-------------------------------------------------------------
studentId first last gender weight
-------------------------------------------------------------
1 John Doe m 185
2 John Doe2 m 130
3 John Doe3 m 250
Lifts
-------------------
liftId name
-------------------
1 Bench Press
2 Power Clean
3 Parallel Squat
4 Deadlift
5 Shoulder Press
StudentLifts
------------------------------------------------
studentLiftId studentId liftId weight
------------------------------------------------
1 1 1 185
2 2 3 130
3 3 1 190
4 1 2 120
5 2 1 155
6 3 2 145
7 1 1 135
8 1 1 205
9 2 3 200
10 1 3 150
11 2 2 110
12 3 3 250
I would like to have four top lists:
Bench Press
Parallel Squat
Power Clean
Total of the above 3
I can successfully grab a top list for each specific lift using the following query:
SELECT s.studentId, s.first, s.last, s.gender, s.weight, l.name, sl.weight
FROM Students s
LEFT JOIN (
SELECT *
FROM StudentLifts
ORDER BY weight DESC
) sl ON sl.studentId = s.studentId
LEFT JOIN Lifts l ON l.liftId = sl.liftId
WHERE l.name = 'Bench Press'
AND s.gender = 'm'
AND s.weight > 170
GROUP BY s.studentId
ORDER BY sl.weight DESC
However, I am stuck on how to add the highest total of each lift for each student. How can I first find the highest total for each student in each lift, and then add them up to get a total of all three lifts?
Edit
The result set that I am looking for would be something like:
-------------------------------------------------
studentId first last weight
-------------------------------------------------
3 John Doe3 585
1 John Doe 475
2 John Doe2 465
I also forgot to mention that I would actually like two lists, one for students above 170 and one for students below 170.
SELECT -- join student a total weight to the student table
A.studentId,
A.first,
A.last,
C.totalWeight
FROM
Student A,
(
SELECT -- for each studet add the max weights
sum(B.maxWeight) as totalWeight,
B.studentID
FROM (
SELECT -- for each (student,lift) select the max weight
max(weight) as maxWeight,
studentId,
liftID
FROM
StudentLifts
GROUP BY
studentId,
liftID
) B
GROUP BY
studentId
) C
WHERE
A.studentID = C.studentId
-- AND A.weight >= 170
-- AND A.weight < 170
-- pick one here to generate on of the two lists.

MySQL get top average entries

I am trying to write a mysql query to return the top 3 courses that have the highest average course rating. I have two tables, Ratings and Courses.
The Ratings table:
courseId rating
1 6
2 2
1 4
2 5
3 3
4 0
6 0
The Courses Table:
courseId cnum cname
1 100 name1
2 112 name2
3 230 name3
4 319 name4
5 122 name5
6 320 name6
I need to return the top 3 courses that have the highest average rating. Any ideas how I could do this? Thanks
SELECT Courses.*
FROM Courses NATURAL JOIN (
SELECT courseId, AVG(rating) avg_rating
FROM Ratings
GROUP BY courseId
ORDER BY avg_rating DESC
LIMIT 3
) t
See it on sqlfiddle.