Grouping data into ranges - mysql

I have a query that returns the counts from a database. Sample output of the query:
23
14
94
42
23
12
The query:
SELECT COUNT(*)
FROM `submissions`
INNER JOIN `events`
ON `submissions`.event_id = `events`.id
WHERE events.user_id IN (
SELECT id
FROM `users`
WHERE users.created_at IS NOT NULL
)
GROUP BY `events`.id
Is there a way to easily take the output and split it into pre-defined ranges of values (0-100, 101-200, etc), indicating the number of rows that fall into a particular range?

Use a CASE expression in the SELECT clause. The ranges must not overlap, and the alias is quoted because RANGE is a reserved word in MySQL 8.0.
SELECT `events`.id,
CASE WHEN COUNT(`events`.id) BETWEEN 0 AND 100 THEN '0 - 100'
WHEN COUNT(`events`.id) BETWEEN 101 AND 200 THEN '101 - 200'
END AS `range`
FROM `submissions`
INNER JOIN `events`
ON `submissions`.event_id = `events`.id
WHERE events.user_id IN (
SELECT id
FROM `users`
WHERE users.created_at IS NOT NULL
)
GROUP BY `events`.id

Use a conditional count built on the SUM() aggregate.
If you need your ranges in columns:
SELECT SUM(CASE WHEN n BETWEEN 0 AND 100 THEN 1 ELSE 0 END) AS `0-100`,
SUM(CASE WHEN n BETWEEN 101 AND 200 THEN 1 ELSE 0 END) AS `101-200`
-- , add other ranges here
FROM (
SELECT COUNT(*) n
FROM submissions s JOIN events e
ON s.event_id = e.id JOIN users u
ON e.user_id = u.id
WHERE u.created_at IS NOT NULL
GROUP BY e.id
) q
Sample output
+-------+---------+
| 0-100 | 101-200 |
+-------+---------+
| 2 | 3 |
+-------+---------+
1 row in set (0.01 sec)
If you'd rather have it as a set you can do
SELECT CONCAT(r.min, '-', r.max) `range`,
SUM(n BETWEEN r.min AND r.max) count
FROM (
SELECT COUNT(*) n
FROM submissions s JOIN events e
ON s.event_id = e.id JOIN users u
ON e.user_id = u.id
WHERE u.created_at IS NOT NULL
GROUP BY e.id
) q CROSS JOIN (
SELECT 0 min, 100 max
UNION ALL
SELECT 101, 200
-- add other ranges here
) r
GROUP BY r.min, r.max
Sample output
+---------+-------+
| range | count |
+---------+-------+
| 0-100 | 2 |
| 101-200 | 3 |
+---------+-------+
2 rows in set (0.01 sec)
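The "as a set" variant above can be tried out without a MySQL server. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table `q` holds hypothetical per-event counts (the question's sample values plus two made-up larger ones), and the column names `lo`, `hi`, `bucket` and `cnt` are illustrative.

```python
import sqlite3

# Hypothetical per-event counts: the question's sample values plus two
# made-up larger ones so that the second range is not empty.
counts = [23, 14, 94, 42, 23, 12, 150, 101]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q (n INTEGER)")
conn.executemany("INSERT INTO q (n) VALUES (?)", [(n,) for n in counts])

# Cross join the counts with an inline table of ranges and sum the
# boolean "n falls in this range" per range.
sql = """
SELECT r.lo || '-' || r.hi AS bucket,
       SUM(q.n BETWEEN r.lo AND r.hi) AS cnt
FROM q
CROSS JOIN (
    SELECT 0 AS lo, 100 AS hi
    UNION ALL
    SELECT 101, 200
) AS r
GROUP BY r.lo, r.hi
ORDER BY r.lo
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('0-100', 6), ('101-200', 2)]
```

In MySQL the string concatenation would be written with CONCAT() instead of `||`, as in the answer above.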

Related

get the total count but exclude certain condition

Hello I had this table:
id | user_id | status
1 | 34 | x
2 | 35 | x
3 | 42 | x
4 | 42 | y
My goal is to count the users with status x, but to exclude any user who also has a row with status y. So instead of 3, the count should be 2, because the user in the 3rd row also has the 4th row with status y.
SELECT * FROM logs
WHERE status = 'x'
AND user_id NOT IN (SELECT user_id FROM logs WHERE status = 'y')
GROUP BY user_id;
We can try the following aggregation approach:
SELECT COUNT(*) AS cnt
FROM
(
SELECT user_id
FROM logs
GROUP BY user_id
HAVING MIN(status) = MAX(status) AND
MIN(status) = 'x'
) t;
The above logic counts only users whose records all have status x: if MIN(status) and MAX(status) are both 'x', the user has no row with any other status.
You can also do it this way, with only a small change to your SQL:
SELECT COUNT(*) FROM (
SELECT user_id FROM logs WHERE status = 'x' AND user_id NOT IN
(SELECT user_id FROM logs WHERE status = 'y')
GROUP BY user_id
) AS t
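You can also use a self-join, written as an anti-join (LEFT JOIN plus IS NULL), so that a user is kept only when no matching y row exists:
SELECT
COUNT(DISTINCT t1.user_id) AS `cnt`
FROM
`logs` AS t1
LEFT JOIN `logs` AS t2
ON t2.user_id = t1.user_id AND t2.`status` = 'y'
WHERE
t1.`status` = 'x'
AND t2.user_id IS NULL;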
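To check the expected count of 2 against the sample rows from the question, here is a small sketch. It assumes Python's built-in sqlite3 module in place of MySQL and uses a NOT EXISTS variant of the same idea; the table name `logs` follows the question, and the aliases `l` and `y` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER, user_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [(1, 34, "x"), (2, 35, "x"), (3, 42, "x"), (4, 42, "y")],
)

# Count users that have an 'x' row and no 'y' row at all.
sql = """
SELECT COUNT(DISTINCT l.user_id)
FROM logs AS l
WHERE l.status = 'x'
  AND NOT EXISTS (
      SELECT 1
      FROM logs AS y
      WHERE y.user_id = l.user_id AND y.status = 'y'
  )
"""
cnt = conn.execute(sql).fetchone()[0]
print(cnt)  # 2 (users 34 and 35; user 42 is excluded)
```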

Sum hours value, count and display based on hours using SQL

I have 2 tables which are Teacher and Activities.
CREATE TABLE teacher (
TeacherId INT, BranchId VARCHAR(5));
INSERT INTO teacher VALUES
("1121","A"),
("1132","A"),
("1141","A"),
("2120","B"),
("2122","B");
CREATE TABLE activities (
ID INT, TeacherID INT, Hours INT);
INSERT INTO activities VALUES
(1,1121,2),
(2,1121,1),
(3,1132,1),
(4,1141,NULL),
(5,2120,NULL),
(6,2122,NULL);
NULL indicates no activities and will be converted to 0 in the output table. I want a query that totals the hours and counts, per branch, how many teachers fall under each hours value, as in the following table:
+-----------+------------+------------+
| Hours | A | B |
+-----------+------------+------------+
| 0 | 1 | 2 |
| 1 | 1 | 0 |
| 2 | 0 | 0 |
| 3 | 1 | 0 |
+-----------+------------+------------+
Edit: Sorry, I don't know how to explain this precisely, but here is the fiddle I received from another member: https://www.db-fiddle.com/f/mmtuZquKyUqdhPvTFN9qaF/1
Edit: One last modification is needed: sum the hours, then count them by branch ID and teacher ID in the output.
Expected output here (red text): https://drive.google.com/file/d/1wyZ_aX5hz_7I1Ncf5sXLpstYk6FT8PMg/view?usp=sharing
We can handle this via the use of a calendar table of hours joined to an aggregation subquery:
SELECT
t1.Hours,
SUM(CASE WHEN t2.BranchId = 'A' THEN t2.cnt ELSE 0 END) AS A,
SUM(CASE WHEN t2.BranchId = 'B' THEN t2.cnt ELSE 0 END) AS B
FROM (SELECT 0 AS Hours UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) t1
LEFT JOIN
(
SELECT t.BranchId, COALESCE(a.Hours, 0) AS Hours, COUNT(*) AS cnt
FROM Teacher t
LEFT JOIN Activities a ON a.TeacherId = t.TeacherId
GROUP BY t.BranchId, COALESCE(a.Hours, 0)
) t2
ON t1.Hours = t2.Hours
GROUP BY
t1.Hours
ORDER BY
t1.Hours
This is basically a JOIN and aggregation . . . but you need to start with all the hours you want:
SELECT h.Hours,
COALESCE(SUM(t.BranchId = 'A'), 0) AS A,
COALESCE(SUM(t.BranchId = 'B'), 0) AS B
FROM (SELECT 0 AS Hours UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
) h LEFT JOIN
activities a
ON h.hours = COALESCE(a.hours, 0) LEFT JOIN
teacher t
ON t.TeacherId = a.TeacherId
GROUP BY h.Hours
ORDER BY h.Hours;
Here is a db<>fiddle.
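The expected table in the question (one teacher in branch A at 3 hours) suggests that hours should be summed per teacher before counting. The following sketch does that under stated assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, and the alias `Total` and the table aliases `te`, `a`, `h`, `t` are illustrative names that do not appear in the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (TeacherId INTEGER, BranchId TEXT)")
conn.execute("CREATE TABLE activities (ID INTEGER, TeacherID INTEGER, Hours INTEGER)")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?)",
    [(1121, "A"), (1132, "A"), (1141, "A"), (2120, "B"), (2122, "B")],
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?)",
    [(1, 1121, 2), (2, 1121, 1), (3, 1132, 1),
     (4, 1141, None), (5, 2120, None), (6, 2122, None)],
)

# Step 1 (inner query): total hours per teacher, NULL treated as 0.
# Step 2 (outer query): for each hours value, count teachers per branch.
sql = """
SELECT h.Hours,
       COALESCE(SUM(t.BranchId = 'A'), 0) AS A,
       COALESCE(SUM(t.BranchId = 'B'), 0) AS B
FROM (SELECT 0 AS Hours UNION ALL SELECT 1
      UNION ALL SELECT 2 UNION ALL SELECT 3) AS h
LEFT JOIN (
    SELECT te.TeacherId, te.BranchId, COALESCE(SUM(a.Hours), 0) AS Total
    FROM teacher AS te
    LEFT JOIN activities AS a ON a.TeacherID = te.TeacherId
    GROUP BY te.TeacherId, te.BranchId
) AS t ON t.Total = h.Hours
GROUP BY h.Hours
ORDER BY h.Hours
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [(0, 1, 2), (1, 1, 0), (2, 0, 0), (3, 1, 0)]
```

The list of hours values (0 to 3) is hard-coded here, as in the answers above; a longer range would need more rows in that inline table.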

MySQL multiple SubQueries on same table

I have a table votes that records which user voted for which movie. It also shows how many movies a user has voted for.
id_film | id_user | voting
----------------------------
1 | 1 | 7
1 | 33 | 5
3 | 1 | 9
4 | 7 | 7
4 | 2 | 8
4 | 1 | 6
6 | 1 | 6
... | ... | ...
I want to get a list of id_film's which are related to id_user's in this way:
Get all id_film's from a specific id_user like
SELECT id_film FROM votes WHERE id_user = 1
Grab every id_user which is related
SELECT DISTINCT v.user FROM votes v WHERE id_film IN ( id_film's )
Then SELECT id_film's FROM votes v WHERE user IN ( "user list from previous query" ) except id_film's from first query.
This was my first attempt:
SELECT id_film, film.title, film.originaltitle, COUNT(*)
FROM votes v
INNER JOIN film ON v.id_film = film.id
WHERE user IN
(
SELECT DISTINCT v.user
FROM votes v
WHERE id_film IN
(
SELECT id_film
FROM votes v
WHERE user = 1
)
)
AND
id_film NOT IN
(
SELECT id_film
FROM votes v
WHERE user = 1
)
GROUP BY id_film
It doesn't work. MySQL took too long for a result and I restarted XAMPP.
So I tried another SELECT, this time with JOINS:
SELECT DISTINCT v.id_film AS vFilm, v1.user AS v1User, v2.id_film AS v2Film
FROM votes v
LEFT OUTER JOIN votes v1 ON v1.id_film = v.id_film
LEFT OUTER JOIN votes v2 ON v1.user = v2.user
WHERE v.user = 1
AND v1.user != 1
AND v2.id_film NOT
IN
(
SELECT id_film
FROM votes
WHERE user = 1
)
GROUP BY v2.id_film
This doesn't work either, but when I tried it without the NOT IN condition at the end, it worked! (It took approx. 13 sec.) :-(
Here is the working query.
SELECT DISTINCT v2.id_film AS v2Film
FROM votes v
LEFT OUTER JOIN votes v1 ON v1.id_film = v.id_film
LEFT OUTER JOIN votes v2 ON v1.user = v2.user
WHERE v.user = 1
AND v1.user != 1
With Output
v2Film
---------
1
13
14
58
4
...
But this query doesn't exclude the id_film values from the first query.
I know that because user 1 already voted for id_film 1, and it still appears in the output.
So, am I totally wrong with my logic or is my code too complex for this?
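One possible way to express the three steps in a single statement is a double self-join with a NOT EXISTS filter for the films the user has already voted for. This is only a sketch under stated assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, it uses the column name id_user from the table listing (the queries above use `user`), the last four data rows are made up so that related users have other films, and the aliases `v`, `v1`, `v2`, `mine` and `voters` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (id_film INTEGER, id_user INTEGER, voting INTEGER)")
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [
        # Rows from the question's listing.
        (1, 1, 7), (1, 33, 5), (3, 1, 9), (4, 7, 7),
        (4, 2, 8), (4, 1, 6), (6, 1, 6),
        # Hypothetical extra rows for illustration.
        (13, 33, 8), (14, 7, 4), (14, 2, 9), (58, 99, 5),
    ],
)

# v  : the films user 1 voted for
# v1 : other users who voted for one of those films
# v2 : every film those users voted for
# NOT EXISTS drops the films user 1 has already voted for.
sql = """
SELECT v2.id_film, COUNT(DISTINCT v2.id_user) AS voters
FROM votes AS v
JOIN votes AS v1 ON v1.id_film = v.id_film AND v1.id_user <> v.id_user
JOIN votes AS v2 ON v2.id_user = v1.id_user
WHERE v.id_user = 1
  AND NOT EXISTS (
      SELECT 1
      FROM votes AS mine
      WHERE mine.id_user = v.id_user AND mine.id_film = v2.id_film
  )
GROUP BY v2.id_film
ORDER BY v2.id_film
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [(13, 1), (14, 2)]
```

Film 58 is absent because its only voter (user 99) shares no film with user 1. On a large table the running time would likely depend on indexes covering id_user and id_film; that has not been measured here.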

MySQL INNER JOIN from second table (TOP10)

$stmt = $conn->prepare('SELECT a.*, c.*, SUM(a.money+b.RESULT) AS ARESULT
FROM users a
INNER JOIN bankaccounts c
ON a.id = c.owner
INNER JOIN
(
SELECT owner, SUM(amount) AS RESULT
FROM bankaccounts
GROUP BY owner
) b ON a.id = b.owner
ORDER BY ARESULT DESC LIMIT 10');
What's the problem? It wrongly shows only one record. I want to list at most 10 records, like a TOP 10 of the richest users ranked by [money + (the total amount in all of their bank accounts)].
Let's say I have 2 tables.
Table: users
ID | username | money
1 | richman | 500
2 | richman2 | 600
Table: bankaccounts
ID | owner | amount
65 | 1 | 50
68 | 1 | 50
29 | 2 | 400
So it would list:
richman2 1000$
richman 600$
Try using subqueries:
$stmt = $conn->prepare('SELECT a.*,
IFNULL((SELECT SUM(amount) FROM bankaccounts b WHERE b.owner=a.id),0) AS BANK_MONEY,
(IFNULL(a.money,0) + IFNULL((SELECT SUM(amount) FROM bankaccounts c WHERE c.owner=a.id),0)) AS ARESULT
FROM users a
ORDER BY ARESULT DESC LIMIT 0, 10');
EDIT: Added a field for bank account totals
EDIT2: Added IFNULL to SQL statement in case user is not in BankAccounts table
Try this:
SELECT a.*, (a.money + b.RESULT) AS ARESULT
FROM users a
INNER JOIN (SELECT owner, SUM(amount) AS RESULT
FROM bankaccounts
GROUP BY owner
) b ON a.id = b.owner
ORDER BY ARESULT DESC
LIMIT 10
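The original query most likely returns a single row because SUM(a.money+b.RESULT) in the outer SELECT has no GROUP BY, so all joined rows collapse into one aggregate row; the per-owner total is already computed in the derived table, so a plain addition is enough. A sketch of that, under stated assumptions: it uses Python's built-in sqlite3 module rather than MySQL with PDO, switches to a LEFT JOIN so that users without any bank account are still listed, and adds a made-up third user `nobank` to show that case. The aliases `u`, `b`, `total` and `wealth` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, money INTEGER)")
conn.execute("CREATE TABLE bankaccounts (id INTEGER, owner INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    # The third user is hypothetical: a user with no bank account.
    [(1, "richman", 500), (2, "richman2", 600), (3, "nobank", 50)],
)
conn.executemany(
    "INSERT INTO bankaccounts VALUES (?, ?, ?)",
    [(65, 1, 50), (68, 1, 50), (29, 2, 400)],
)

# Sum the accounts per owner once, then add that total to the user's money.
sql = """
SELECT u.username, u.money + COALESCE(b.total, 0) AS wealth
FROM users AS u
LEFT JOIN (
    SELECT owner, SUM(amount) AS total
    FROM bankaccounts
    GROUP BY owner
) AS b ON b.owner = u.id
ORDER BY wealth DESC
LIMIT 10
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('richman2', 1000), ('richman', 600), ('nobank', 50)]
```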

Unknown column in mysql subquery

I am trying to get the avg of an item so I am using a subquery.
Update: I should have been clearer initially, but i want the avg to be for the last 5 items only
First I started with
SELECT
y.id
FROM (
SELECT *
FROM (
SELECT *
FROM products
WHERE itemid=1
) x
ORDER BY id DESC
LIMIT 15
) y;
Which runs but is fairly useless as it just shows me the ids.
I then added in the below
SELECT
y.id,
(SELECT AVG(deposit) FROM (SELECT deposit FROM products WHERE id < y.id ORDER BY id DESC LIMIT 5)z) AVGDEPOSIT
FROM (
SELECT *
FROM (
SELECT *
FROM products
WHERE itemid=1
) x
ORDER BY id DESC
LIMIT 15
) y;
When I do this I get the error Unknown column 'y.id' in 'where clause', upon further reading here I believe this is because when the queries go down to the next level they need to be joined?
So I tried the below (with the unneeded subquery removed):
SELECT
y.id,
(SELECT AVG(deposit) FROM (
SELECT deposit
FROM products
INNER JOIN y as yy ON products.id = yy.id
WHERE id < yy.id
ORDER BY id DESC
LIMIT 5)z
) AVGDEPOSIT
FROM (
SELECT *
FROM products
WHERE itemid=1
ORDER BY id DESC
LIMIT 15
) y;
But I get Table 'test.y' doesn't exist. Am I on the right track here? What do I need to change to get what I am after here?
The example can be found here in sqlfiddle.
CREATE TABLE products
(`id` int, `itemid` int, `deposit` int);
INSERT INTO products
(`id`, `itemid`, `deposit`)
VALUES
(1, 1, 50),
(2, 1, 75),
(3, 1, 90),
(4, 1, 80),
(5, 1, 100),
(6, 1, 75),
(7, 1, 75),
(8, 1, 90),
(9, 1, 90),
(10, 1, 100);
Given my data in this example, my expected result is below, where there is a column next to each ID that has the avg of the previous 5 deposits.
id | AVGDEPOSIT
10 | 86 (deposit value of (id9+id8+id7+id6+id5)/5) to get the AVG
9 | 84
8 | 84
7 | 84
6 | 79
5 | 73.75
I'm not a MySQL expert (in MS SQL this could be done more easily), and your question is a bit unclear to me, but it looks like you're trying to get the average of the previous 5 items.
If your ids have no gaps, it's easy:
select
p.id,
(
select avg(t.deposit)
from products as t
where t.itemid = 1 and t.id >= p.id - 5 and t.id < p.id
) as avgdeposit
from products as p
where p.itemid = 1
order by p.id desc
limit 15
If not, then I tried to write the query like this:
select
p.id,
(
select avg(t.deposit)
from (
select tt.deposit
from products as tt
where tt.itemid = 1 and tt.id < p.id
order by tt.id desc
limit 5
) as t
) as avgdeposit
from products as p
where p.itemid = 1
order by p.id desc
limit 15
But I got the exception Unknown column 'p.id' in 'where clause'. It looks like MySQL cannot resolve a reference to an outer column from inside a derived table (a subquery in FROM) that is nested two levels down.
But you can get 5 previous items with offset, like this:
select
p.id,
(
select avg(t.deposit)
from products as t
where t.itemid = 1 and t.id > coalesce(p.prev_id, -1) and t.id < p.id
) as avgdeposit
from
(
select
p.id,
(
select tt.id
from products as tt
where tt.itemid = 1 and tt.id <= p.id
order by tt.id desc
limit 1 offset 6
) as prev_id
from products as p
where p.itemid = 1
order by p.id desc
limit 15
) as p
sql fiddle demo
This is my solution. It is easy to understand how it works, but at the same time it can't be optimized much since I'm using some string functions, and it's far from standard SQL. If you only need to return a few records, it could be still fine.
This query will return, for every ID, a comma-separated list of the previous IDs, in descending order:
SELECT p1.id, p1.itemid, GROUP_CONCAT(p2.id ORDER BY p2.id DESC) previous_ids
FROM
products p1 LEFT JOIN products p2
ON p1.itemid=p2.itemid AND p1.id>p2.id
GROUP BY
p1.id, p1.itemid
ORDER BY
p1.itemid ASC, p1.id DESC
and it will return something like this:
| ID | ITEMID | PREVIOUS_IDS |
|----|--------|-------------------|
| 10 | 1 | 9,8,7,6,5,4,3,2,1 |
| 9 | 1 | 8,7,6,5,4,3,2,1 |
| 8 | 1 | 7,6,5,4,3,2,1 |
| 7 | 1 | 6,5,4,3,2,1 |
| 6 | 1 | 5,4,3,2,1 |
| 5 | 1 | 4,3,2,1 |
| 4 | 1 | 3,2,1 |
| 3 | 1 | 2,1 |
| 2 | 1 | 1 |
| 1 | 1 | (null) |
Then we can join the result of this query with the products table itself, and in the join condition we can use FIND_IN_SET(src, csvalues), which returns the position of the src string inside the comma-separated values:
ON FIND_IN_SET(id, previous_ids) BETWEEN 1 AND 5
and the final query looks like this:
SELECT
list_previous.id,
AVG(products.deposit)
FROM (
SELECT p1.id, p1.itemid, GROUP_CONCAT(p2.id ORDER BY p2.id DESC) previous_ids
FROM
products p1 INNER JOIN products p2
ON p1.itemid=p2.itemid AND p1.id>p2.id
GROUP BY
p1.id, p1.itemid
) list_previous LEFT JOIN products
ON list_previous.itemid=products.itemid
AND FIND_IN_SET(products.id, previous_ids) BETWEEN 1 AND 5
GROUP BY
list_previous.id
ORDER BY
id DESC
Please see fiddle here. I won't recommend using this trick for big tables, but for small sets of data it is fine.
This is maybe not the simplest solution, but it does the job, is an interesting variation and, in my opinion, is transparent. I simulate the analytic functions that I know from Oracle.
As we do not assume the ids to be consecutive, row counting is simulated by incrementing @rn for each row. Next, the products table including the row number is joined with itself, and only the rows up to 5 positions before each row (row-number difference 1 to 5) are used to build the average.
select p2id, avg(deposit), group_concat(p1id order by p1id desc), group_concat(deposit order by p1id desc)
from ( select p2.id p2id, p1.rn p1rn, p1.deposit, p2.rn p2rn, p1.id p1id
from (select p.*,@rn1:=@rn1+1 as rn from products p,(select @rn1 := 0) r) p1
, (select p.*,@rn2:=@rn2+1 as rn from products p,(select @rn2 := 0) r) p2 ) r
where p2rn-p1rn between 1 and 5
group by p2id
order by p2id desc
;
Result:
+------+--------------+---------------------------------------+------------------------------------------+
| p2id | avg(deposit) | group_concat(p1id order by p1id desc) | group_concat(deposit order by p1id desc) |
+------+--------------+---------------------------------------+------------------------------------------+
| 10 | 86.0000 | 9,8,7,6,5 | 90,90,75,75,100 |
| 9 | 84.0000 | 8,7,6,5,4 | 90,75,75,100,80 |
| 8 | 84.0000 | 7,6,5,4,3 | 75,75,100,80,90 |
| 7 | 84.0000 | 6,5,4,3,2 | 75,100,80,90,75 |
| 6 | 79.0000 | 5,4,3,2,1 | 100,80,90,75,50 |
| 5 | 73.7500 | 4,3,2,1 | 80,90,75,50 |
| 4 | 71.6667 | 3,2,1 | 90,75,50 |
| 3 | 62.5000 | 2,1 | 75,50 |
| 2 | 50.0000 | 1 | 50 |
+------+--------------+---------------------------------------+------------------------------------------+
SQL Fiddle Demo: http://sqlfiddle.com/#!2/c13bc/129
I want to thank this answer on how to simulate analytical functions in mysql: MySQL get row position in ORDER BY
It looks like you just want:
SELECT
id,
(SELECT AVG(deposit)
FROM (
SELECT deposit
FROM products
ORDER BY id DESC
LIMIT 5) last5
) avgdeposit
FROM products
The inner query gets the last 5 rows added to product, the query that wraps that gets the average for their deposits.
I'm going to simplify your query a bit so I can explain it.
SELECT
y.id,
(
SELECT AVG(deposit) FROM
(
SELECT deposit
FROM products
LIMIT 5
) z
) AVGDEPOSIT
FROM
(
SELECT *
FROM
(
SELECT *
FROM products
) x
LIMIT 15
) y;
My guess would be that you just need to insert some AS keywords in there. I'm sure someone else will come up with something more elegant, but for now you can try it out.
SELECT
y.id,
(
SELECT AVG(deposit) FROM
(
SELECT deposit
FROM products
LIMIT 5
) z
) AS AVGDEPOSIT
FROM
(
SELECT *
FROM
(
SELECT *
FROM products
) AS x
LIMIT 15
) y;
Here's one way to do it in MySQL:
SELECT p.id
, ( SELECT AVG(deposit)
FROM ( SELECT @rownum:=@rownum+1 rn, deposit, id
FROM ( SELECT @rownum:=0 ) r
, products
ORDER BY id ) t
WHERE rn BETWEEN p.rn-5 AND p.rn-1 ) avgdeposit
FROM ( SELECT @rownum1:=@rownum1+1 rn, id
FROM ( SELECT @rownum1:=0 ) r
, products
ORDER BY id ) p
WHERE p.rn >= 5
ORDER BY p.rn DESC;
It's a shame that MySQL, as of this writing, doesn't support the WITH clause or window functions (both were later added in MySQL 8.0). Having both greatly simplifies the query to the following:
WITH tbl AS (
SELECT id, deposit, ROW_NUMBER() OVER(ORDER BY id) rn
FROM products
)
SELECT id
, ( SELECT AVG(deposit)
FROM tbl
WHERE rn BETWEEN t.rn-5 AND t.rn-1 )
FROM tbl t
WHERE rn >= 5
ORDER BY rn DESC;
The latter query runs fine in Postgres.
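Where window functions are available (MySQL 8.0 and later, among others), the "previous five rows" can also be expressed directly as a window frame, without a row-number self-join. The following is a sketch under stated assumptions: it runs on Python's built-in sqlite3 module rather than MySQL and needs an SQLite build with window-function support (3.25 or newer); the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, itemid INTEGER, deposit INTEGER)")
deposits = [50, 75, 90, 80, 100, 75, 75, 90, 90, 100]
conn.executemany(
    "INSERT INTO products VALUES (?, 1, ?)",
    list(enumerate(deposits, start=1)),  # ids 1..10, all with itemid 1
)

# For each row, average the deposits of up to 5 rows directly before it
# (by id, within the same itemid). The current row is not part of the frame.
sql = """
SELECT id,
       AVG(deposit) OVER (
           PARTITION BY itemid
           ORDER BY id
           ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
       ) AS avgdeposit
FROM products
WHERE itemid = 1
ORDER BY id DESC
"""
rows = conn.execute(sql).fetchall()
for row in rows[:6]:
    print(row)
# (10, 86.0), (9, 84.0), (8, 84.0), (7, 84.0), (6, 79.0), (5, 73.75)
```

Because the frame is defined in rows rather than id values, gaps in id do not matter. The first row (id 1) has an empty frame, so its average is NULL.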
2 possible solutions here
Firstly using user variables to add a sequence number. Do this twice, and join the second set to the first where the sequence number is between the id - 1 and the id - 5. Then just use AVG. No correlated sub queries.
SELECT Sub3.id, Sub3.itemid, Sub3.deposit, AVG(Sub4.deposit)
FROM
(
SELECT Sub1.id, Sub1.itemid, Sub1.deposit, @Seq:=@Seq+1 AS Sequence
FROM
(
SELECT id, itemid, deposit
FROM products
ORDER BY id DESC
) Sub1
CROSS JOIN
(
SELECT @Seq:=0
) Sub2
) Sub3
LEFT OUTER JOIN
(
SELECT Sub1.id, Sub1.itemid, Sub1.deposit, @Seq1:=@Seq1+1 AS Sequence
FROM
(
SELECT id, itemid, deposit
FROM products
ORDER BY id DESC
) Sub1
CROSS JOIN
(
SELECT @Seq1:=0
) Sub2
) Sub4
ON Sub4.Sequence BETWEEN Sub3.Sequence + 1 AND Sub3.Sequence + 5
GROUP BY Sub3.id, Sub3.itemid, Sub3.deposit
ORDER BY Sub3.id DESC
The second one is cruder and uses a correlated subquery (which is likely to perform poorly as the amount of data increases). It does a normal select, but the last column is a subquery that refers to the id in the main select. It assumes the ids have no gaps.
SELECT id, itemid, deposit, (SELECT AVG(P2.deposit) FROM products P2 WHERE P2.id BETWEEN P1.id - 5 AND P1.id - 1) AS avgdeposit
FROM products P1
ORDER BY id DESC
Is this what you are after?
SELECT m.id
, AVG(d.deposit)
FROM products m
, products d
WHERE d.id < m.id
AND d.id >= m.id - 5
GROUP BY m.id
ORDER BY m.id DESC
;
But it can't be that simple. Firstly, the table presumably contains more than one itemid (hence your WHERE clause). Secondly, the ids are unlikely to be sequential and without gaps within an itemid. Thirdly, you probably want something that runs across all itemids rather than one itemid at a time. So here it is.
SELECT itemid
, m_id as id
, AVG(d.deposit) as deposit
FROM (
SELECT itemid
, m_id
, d_id
, d.deposit
, @seq := (CASE WHEN m_id = d_id THEN 0 ELSE @seq + 1 END) seq
FROM (
SELECT m.itemid
, m.id m_id
, d.id d_id
, d.deposit
FROM products m
, products d
WHERE m.itemid = d.itemid
AND d.id <= m.id
ORDER BY m.id DESC
, d.id DESC) d
, (SELECT @seq := 0) s
) d
WHERE seq BETWEEN 1 AND 5
GROUP BY itemid
, m_id
ORDER BY itemid
, m_id DESC
;