Group MySQL into buckets based on table Max / Min - mysql

I am writing a MySQL query that takes a table, splits it into equal-width buckets over a given column, and returns the count of rows in each bucket. This isn't the same as 10 buckets of equal count: I expect the number of records in each bucket to vary, but the buckets should split the column's range evenly.
I have data as follows:
User | Followers
----------------
User 1 | 100
User 2 | 1000
User 3 | 1300
User 4 | 2000
User 5 | 10000
I would like to split the data into 5 equal-sized "follower" buckets, i.e. buckets of 2000 followers each. The output would be as follows:
Bucket | Count
-----------------------
1.(0 - 2000) | 3
2.(2000 - 4000) | 1
3.(4000 - 6000) | 0
4.(6000 - 8000) | 0
5.(8000 - 10000)| 1
So far I've tried the following:
SELECT (followers)%(bucket_size),COUNT(*) FROM (SELECT (ROUND((MAX(followers)/MIN(followers))/10,0)) as bucket_size FROM users
WHERE followers > 0) as a
INNER JOIN users
GROUP BY (followers)%(bucket_size)
But this returns a row for every distinct value.

You can use aggregation, mapping each row to its bucket number with integer division:
select 1 + (t.followers - 1) div b.bucket_size bucket, count(*) no_users
from mytable t
cross join (select 2000 bucket_size) b
group by 1 + (t.followers - 1) div b.bucket_size
On the other hand, if you want to also return empty buckets, as shown in your desired results, it is a bit different. You can use an inline query to list the buckets, then bring the table with a left join:
select n.bucket, count(t.followers) cnt
from (select 2000 bucket_size) b
cross join (select 1 bucket union all select 2 union all select 3 union all select 4 union all select 5) n
left join mytable t on (t.followers - 1) div b.bucket_size = n.bucket - 1
group by n.bucket
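To check the bucket arithmetic against the sample data, here is a minimal sketch that runs an equivalent query through Python's built-in sqlite3 module. It is not MySQL: SQLite's `/` on integers stands in for MySQL's `DIV`, and the function name `bucket_counts` is made up for illustration. It also assumes a slightly different boundary convention than the queries above: lower bounds are inclusive and the maximum value is clamped into the last bucket, which is what the expected output in the question implies. The bucket size is derived from MAX(followers) instead of being hard-coded, and it is assumed to be at least 1.

```python
import sqlite3


def bucket_counts(followers, n_buckets=5):
    """Count rows per equal-width bucket, keeping empty buckets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (followers INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?)", [(f,) for f in followers])
    query = """
        WITH RECURSIVE
          -- bucket width derived from the data (integer division)
          size(bucket_size) AS (SELECT MAX(followers) / :n FROM users),
          -- generate bucket numbers 1..n so empty buckets still appear
          buckets(bucket) AS (
            SELECT 1
            UNION ALL
            SELECT bucket + 1 FROM buckets WHERE bucket < :n
          )
        SELECT b.bucket, COUNT(u.followers)
        FROM buckets AS b
        CROSS JOIN size AS s
        LEFT JOIN users AS u
          -- scalar MIN clamps the maximum value into the last bucket
          ON MIN(u.followers / s.bucket_size + 1, :n) = b.bucket
        GROUP BY b.bucket
        ORDER BY b.bucket
    """
    return conn.execute(query, {"n": n_buckets}).fetchall()


if __name__ == "__main__":
    for bucket, count in bucket_counts([100, 1000, 1300, 2000, 10000]):
        print(bucket, count)
```

In MySQL 8 the same shape should carry over with `DIV` and `LEAST(...)` in place of `/` and the two-argument `MIN(...)`, but that translation is untested here.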

If having empty buckets is not important, here is a simple and readable solution:
select bucket as Bucket,
count(*) as Count
from (
select case when followers between 0 and 1999 then '(0-2000)'
when followers between 2000 and 3999 then '(2000-4000)'
when followers between 4000 and 5999 then '(4000-6000)'
when followers between 6000 and 7999 then '(6000-8000)'
when followers between 8000 and 10000 then '(8000-10000)'
end as bucket
from users
) buckets
group by bucket

Related

How to select a row from table1 if the row id isn't present in table2 more than x times

accounts as a1 | team_logs as tl1
--------------------------------------------------------
id Name counter | id team_id user_id account_id
1 Account 1 2 | 1 1 100 1
2 Account 2 2 | 2 2 200 1
3 Account 3 0 | 3 3 300 2
... | 4 2 200 2
This is an account review app. Based on the two tables above, I need a query that outputs one account from the a1 table, given the tl1 records and the following rules:
1. A team member requests an account; once an account is assigned to them, a log entry is written to tl1 recording the assigned account_id.
2. An account can be assigned to a given team only once.
3. An account can be assigned to x teams (the example above has only 3 teams).
4. A record can be reviewed x times (in the example above, 3 times).
I had a project with only 3 teams where each team's logs were stored in their own table, and this query worked:
Example for Team1
SELECT `a1`.*
FROM `accounts` AS `a1`
LEFT JOIN `team1_logs` AS `tl1` ON tl1.account_id = a1.id
WHERE (tl1.account_id IS NULL)
AND (a1.counter < '3')
ORDER BY RAND()
LIMIT 1
a1 has a counter column whose value is the number of times a row has been shown to teams. My project can now house x teams; teams are dynamic, so creating a log table for each team isn't an option.
So, with the tables above, if I want each account to be reviewed (assigned to a team member) 3 times:
Account 1 can be reviewed 1 more time by any team other than 1 and 2
Account 2 can be reviewed 1 more time by any team other than 2 and 3
What would my new query need to look like to get the next available record, based on criteria 1-4 above?
The data in Table 2 is more than enough; you don't need any other data to write the query.
team_id is a query input (since we need to output an account to the team member).
Answer
Assuming that I am a team member of team 1
SELECT DISTINCT a.*
FROM accounts AS a
LEFT JOIN (
SELECT account_id, team_id FROM team_logs) AS tl1 ON a.id = tl1.account_id
WHERE a.id NOT IN (
SELECT account_id FROM team_logs WHERE team_id =1)
AND a.counter < 3
ORDER BY a.id ASC
If you just want to see which teams are not allowed to review the account again, join with a subquery that uses GROUP_CONCAT to get the list of teams that have reviewed it.
SELECT a.*, 3 - counter AS remaining_reviews, IFNULL(tl.already_reviewed, '') AS already_reviewed
FROM accounts AS a
LEFT JOIN (
SELECT account_id, GROUP_CONCAT(team_id ORDER BY team_id) AS already_reviewed
FROM team_logs
GROUP BY account_id) AS tl ON a.id = tl.account_id
WHERE a.counter < 3
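As a rough check of the "next available account" rule against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. It uses NOT EXISTS instead of NOT IN, which is a swapped-in but equivalent formulation for this data. The function name `next_account_for_team` and the `max_reviews` parameter are assumptions made for illustration, and the lowest id is taken as "first available".

```python
import sqlite3


def next_account_for_team(team_id, max_reviews=3):
    """Return the id of the first account this team may still review, or None."""
    conn = sqlite3.connect(":memory:")
    # Sample rows copied from the question's two tables.
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, counter INTEGER);
        CREATE TABLE team_logs (id INTEGER PRIMARY KEY, team_id INTEGER,
                                user_id INTEGER, account_id INTEGER);
        INSERT INTO accounts VALUES
            (1, 'Account 1', 2), (2, 'Account 2', 2), (3, 'Account 3', 0);
        INSERT INTO team_logs VALUES
            (1, 1, 100, 1), (2, 2, 200, 1), (3, 3, 300, 2), (4, 2, 200, 2);
    """)
    row = conn.execute(
        """
        SELECT a.id
        FROM accounts AS a
        WHERE a.counter < ?                      -- still has reviews left
          AND NOT EXISTS (                       -- never assigned to this team
                SELECT 1 FROM team_logs AS tl
                WHERE tl.account_id = a.id AND tl.team_id = ?)
        ORDER BY a.id
        LIMIT 1
        """,
        (max_reviews, team_id),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    for team in (1, 2, 3):
        print(team, next_account_for_team(team))
```

Swapping `ORDER BY a.id` for a random ordering would mirror the ORDER BY RAND() in the original per-team query.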

Get 1st and 2nd highest value rows in case of similar values

I have a table with the columns : id, status, value.
id status value
-- ------ -----
1 10 100
2 10 100
3 10 60
4 11 20
5 11 15
6 12 100
7 12 50
8 12 50
I would like to get the id and value of the first and second highest valued rows, from each status group. My table should have the following columns:
status, id of the first highest value, first highest value, id of second highest value, second highest value.
I should get:
status 1stID 1stValue 2ndID 2ndValue
------ ----- -------- ----- --------
10 1/2 100 2/1 100
11 4 20 5 15
12 6 100 7/8 50
I tried all kinds of solutions, but I couldn't find one that handles tied firsts (two rows sharing the highest value in their status group) or tied seconds.
For example, in case of two rows sharing the highest value in their status group, this not-so-elegant query will return two rows with the same status, different 1sts and same 2nd:
SELECT 2nds.status, 1sts.id AS "1stID",1sts.value AS "1stValue",
2nds.id AS "2ndID",2nds.value AS "2ndValue"
FROM
(SELECT v.* FROM
(SELECT status, MAX(value) AS "SecMaxValue" FROM table o
WHERE value < (SELECT MAX(value) FROM table
WHERE status = o.status)
GROUP BY status) AS m
INNER JOIN table v
ON v.status = m.status AND v.value = m.SecMaxValue) AS 2nds
INNER JOIN
(SELECT v.* FROM
(SELECT status, MAX(value) AS maxValue FROM table
GROUP BY status) AS m
INNER JOIN table v
ON v.status = m.status AND v.value = m.MaxValue) AS 1sts
ON 1sts.status = 2nds.status ;
This query will give me:
status 1stID 1stValue 2ndID 2ndValue
------ ----- -------- ----- --------
10 1 100 3 60
10 2 100 3 60
11 4 20 5 15
12 6 100 7 50
12 6 100 8 50
To conclude, I would like a solution in which:
a. if two rows share the highest value, the query puts the details of one of them in the 1st columns and the details of the other in the 2nd columns (no matter which);
b. if two rows share the second-highest value, it puts the highest row in the 1st columns and one of the second-highest rows in the 2nd columns.
Is there a way to change the query above? Does someone have a nicer solution?
I came across several 1st-and-2nd queries but they had the same problem, for example this solution: Finding the highest n values of each group in MySQL. It does not deliver 1st and 2nd in the same row, but the main problem is that it returns only one of the firsts.
Thanks
After spending a lot of time on this, I finally found a solution to the above problem. Please try it out:
select 1st.status as Status,
SUBSTRING_INDEX(1st.id,'/',1) as 1stID,
1st.value as 1stValue,
(case when locate('/',1st.id) > 0 then SUBSTRING_INDEX(1st.id,'/',-1)
else 2nd.id
end) as 2ndID,
(case when locate('/',1st.id) > 0 then 1st.value
else 2nd.value
end) as 2ndValue
from
(
(select status, SUBSTRING_INDEX(Group_concat(id separator '/'),'/',2) as id,value
from t1
where (status,value) in (select status,value
from t1
group by status
having max(value))
group by status) 1st
inner join
(select status,id,value
from t1
where (status,value) not in (select status,value
from t1
group by status
having max(value))
group by status,value
order by status,value desc) 2nd
on 1st.status = 2nd.status)
group by 1st.status;
Just replace t1 with your table name. Note that the query selects columns that are not in the GROUP BY, so it only runs with ONLY_FULL_GROUP_BY disabled.
If you have any doubt(s), feel free to ask.
Hope it helps!
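On engines with window functions (MySQL 8.0 and later, or SQLite 3.25 and later), a different technique from the query above is to number the rows inside each status group and pivot the first two into one row. The sketch below runs that idea through Python's built-in sqlite3 module on the question's sample data. The function name `top_two_per_status` is hypothetical, and ties are broken by the lower id, which the question allows ("no matter which").

```python
import sqlite3

# (id, status, value) rows copied from the question.
ROWS = [
    (1, 10, 100), (2, 10, 100), (3, 10, 60),
    (4, 11, 20), (5, 11, 15),
    (6, 12, 100), (7, 12, 50), (8, 12, 50),
]


def top_two_per_status(rows=ROWS):
    """Return (status, 1stID, 1stValue, 2ndID, 2ndValue) per status group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, status INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    query = """
        WITH ranked AS (
            SELECT id, status, value,
                   -- ROW_NUMBER gives distinct positions even when values tie
                   ROW_NUMBER() OVER (PARTITION BY status
                                      ORDER BY value DESC, id) AS rn
            FROM t1
        )
        SELECT status,
               MAX(CASE WHEN rn = 1 THEN id END)    AS first_id,
               MAX(CASE WHEN rn = 1 THEN value END) AS first_value,
               MAX(CASE WHEN rn = 2 THEN id END)    AS second_id,
               MAX(CASE WHEN rn = 2 THEN value END) AS second_value
        FROM ranked
        WHERE rn <= 2
        GROUP BY status
        ORDER BY status
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in top_two_per_status():
        print(row)
```

A group with a single row would come back with NULL in the two 2nd columns, since no row has rn = 2.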

Select sum of zero if no records in second table?

I did some research and learned about the COALESCE(sum(num), 0) function. The issue is the example I found only related to using one table.
I am calculating a sum from a second table, and if there are no records for an item in the second table, I still want it to show up in my query and have a sum of zero.
SELECT note.user, note.product, note.noteID, note.note, COALESCE(sum(noteTable.Score), 0) as points
FROM note, noteTable
WHERE note.user <> 3 AND note.noteID = noteTable.noteID
I am only receiving results if there is an entry in the second table, noteTable. If there are no scores added for a note, I still want it to show up in the result with a points value of zero.
Table Examples:
Note
user | product | noteID |note
3 1 1 Great
3 2 2 Awesome
4 1 3 Sweet
NoteTable
noteID | score
1 5
The query should show me this:
user | noteID | sum(points)
3 1 5
3 2 0
4 3 0
But I am only getting this:
user | noteID | sum(points)
3 1 5
http://sqlfiddle.com/#!9/aae812/2
SELECT
note.user,
note.product,
note.noteID, note.note,
COALESCE(sum(noteTable.Score),0) as points
FROM note
LEFT JOIN noteTable
ON note.noteID = noteTable.noteID
WHERE note.user <> 3
You should probably also add:
GROUP BY note.noteid
if you expect a SUM for every note, i.e. you want more than one record back.
First, learn to use proper JOIN syntax and table aliases. The answer to your question is SUM() along with COALESCE():
SELECT n.user, n.product, n.noteID, n.note,
COALESCE(sum(nt.Score), 0) as points
FROM note n LEFT JOIN
noteTable nt
ON n.noteID = nt.noteID
WHERE n.user <> 3
GROUP BY n.user, n.product, n.noteID, n.note;
You also need a GROUP BY.
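The effect of LEFT JOIN plus COALESCE can be checked on the question's sample rows with a minimal sketch using Python's built-in sqlite3 module (a stand-in for MySQL; the function name `points_per_note` and its `skip_user` parameter are made up for illustration). The user filter is left optional because the expected output in the question lists user 3, while the queries above exclude that user.

```python
import sqlite3


def points_per_note(skip_user=None):
    """Return (user, noteID, points) for every note, 0 when no scores exist."""
    conn = sqlite3.connect(":memory:")
    # Sample rows copied from the question.
    conn.executescript("""
        CREATE TABLE note (user INTEGER, product INTEGER, noteID INTEGER, note TEXT);
        CREATE TABLE noteTable (noteID INTEGER, score INTEGER);
        INSERT INTO note VALUES (3, 1, 1, 'Great'), (3, 2, 2, 'Awesome'), (4, 1, 3, 'Sweet');
        INSERT INTO noteTable VALUES (1, 5);
    """)
    query = """
        SELECT n.user, n.noteID, COALESCE(SUM(nt.score), 0) AS points
        FROM note AS n
        LEFT JOIN noteTable AS nt ON nt.noteID = n.noteID  -- keeps unscored notes
        WHERE :skip IS NULL OR n.user <> :skip             -- optional user filter
        GROUP BY n.user, n.noteID
        ORDER BY n.noteID
    """
    return conn.execute(query, {"skip": skip_user}).fetchall()


if __name__ == "__main__":
    print(points_per_note())             # every note, including unscored ones
    print(points_per_note(skip_user=3))  # same filter as the queries above
```

For an unscored note the LEFT JOIN supplies a NULL score, SUM over only NULLs is NULL, and COALESCE turns that into 0.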

MySQL query, COUNT and SUM with two joined tables

I need a little help with a MySQL query.
I have two tables. One is a list of backlinks with an is_homepage (bool) flag. The second is a list of the domains for all of the backlinks, with a link_found (bool) flag and a url_count column, which is the number of rows in the backlinks table associated with each domain.
Note that the domain_id column is the foreign key to the domains table's id column. Here's some sample data.
backlinks
id domain_id is_homepage page_href
1 1 1 http://ablog.wordpress.com/
2 1 0 http://ablog.wordpress.com/contact/
3 1 0 http://ablog.wordpress.com/archives/
4 2 1 http://www.somewhere.org/
5 2 0 http://www.somewhere.org/page=3
6 3 1 http://www.great-fun-site.com/
7 3 0 http://www.great-fun-site.com/index.html
8 4 0 http://red.blgspot.com/page=7
9 4 0 http://blue.blgspot.com/page=9
domains
id url_count link_found domain_name
1 3 1 wordpress.com
2 2 0 somewhere.org
3 2 1 great-fun-site.com
4 2 1 blgspot.com
The results I'm looking to get from the above data would be: count = 2, total = 5.
I'm trying to get the count of rows from the domains table (count) and the sum of url_count (total) from the domains table WHERE link_found is 1 and where at least one of the domain's links in the backlinks table has is_homepage = 1.
Here's the query I'm trying to work with.
SELECT SUM(1) AS count, SUM(`url_count`) total
FROM `domains` AS domain
LEFT JOIN `backlinks` AS link ON link.domain_id = domain.id
WHERE domain.id IN (
SELECT DISTINCT(bl.domain_id)
FROM `backlinks` AS bl
WHERE bl.tablekey_id = 11
AND bl.is_homepage = 1
)
AND domain.link_found = 1
AND link.is_homepage = 1
GROUP BY `domain`.`id`
The problem with this query is that it returns a row for each entry in the domains table. I think I might need one more sub query to add up the returned results but I'm not sure if that's correct. Does anyone see what I'm doing wrong? Thank you!
EDIT:
The problem I'm having is that if there is more than one homepage in the backlinks table for a domain, then it's counted multiple times. I need to count each domain only once.
Well, you shouldn't have to do a group by as you are not selecting anything other than aggregated fields. I'm no mysql expert, but this should work:
SELECT count(d.id) as count, sum(d.url_count) as total from domains as d
inner join backlinks as b
on b.domain_id = d.id
Where d.Link_found = 1 and b.is_homepage = 1
The reason you're getting a row for each entry in the domains table is that you're grouping by domain.id. If you want grand totals only, just leave off the GROUP BY piece.
I think a fairly simple query will do the trick:
SELECT COUNT(*), SUM(domains.URL_Count)
FROM domains
WHERE domains.link_found = 1 AND domains.id IN (
SELECT domain_id FROM backlinks WHERE is_homepage = 1)
There's a working SQLFiddle here.
Thanks for the help. Sorry it was so hard to explain; I need a MySQL fiddle :)
If anyone's interested, here's what I ended up with:
SELECT SUM(1) AS count, SUM(total) AS total
FROM
(
SELECT SUM(`url_count`) total
FROM `domains` AS domain
LEFT JOIN `backlinks` AS link ON link.domain_id = domain.id
WHERE domain.id IN (
SELECT DISTINCT(bl.domain_id)
FROM `backlinks` AS bl
WHERE bl.tablekey_id = 11
AND bl.is_homepage = 1
)
AND domain.link_found = 1
AND link.is_homepage = 1
GROUP BY `domain`.`id`
) AS result
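To see why a domain with several homepage rows gets counted more than once with a plain join, and why a semi-join (IN or EXISTS) avoids it, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The function name `totals`, the `use_join` flag, and the extra homepage row are made up for illustration; page_href and domain_name are omitted because they don't affect the counts.

```python
import sqlite3

# (id, url_count, link_found) and (id, domain_id, is_homepage) from the question.
DOMAINS = [(1, 3, 1), (2, 2, 0), (3, 2, 1), (4, 2, 1)]
BACKLINKS = [(1, 1, 1), (2, 1, 0), (3, 1, 0), (4, 2, 1), (5, 2, 0),
             (6, 3, 1), (7, 3, 0), (8, 4, 0), (9, 4, 0)]


def totals(backlinks=BACKLINKS, use_join=False):
    """Return (count, total) over found domains that have a homepage link."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE domains (id INTEGER, url_count INTEGER, link_found INTEGER)")
    conn.execute("CREATE TABLE backlinks (id INTEGER, domain_id INTEGER, is_homepage INTEGER)")
    conn.executemany("INSERT INTO domains VALUES (?, ?, ?)", DOMAINS)
    conn.executemany("INSERT INTO backlinks VALUES (?, ?, ?)", backlinks)
    if use_join:
        # One output row per matching backlink: duplicates inflate the sums.
        sql = """SELECT COUNT(d.id), SUM(d.url_count)
                 FROM domains AS d
                 JOIN backlinks AS b ON b.domain_id = d.id
                 WHERE d.link_found = 1 AND b.is_homepage = 1"""
    else:
        # EXISTS only asks whether a homepage link exists: one row per domain.
        sql = """SELECT COUNT(*), SUM(d.url_count)
                 FROM domains AS d
                 WHERE d.link_found = 1
                   AND EXISTS (SELECT 1 FROM backlinks AS b
                               WHERE b.domain_id = d.id AND b.is_homepage = 1)"""
    return conn.execute(sql).fetchone()


if __name__ == "__main__":
    extra = BACKLINKS + [(10, 1, 1)]  # hypothetical second homepage for domain 1
    print(totals(extra), totals(extra, use_join=True))
```

On the original sample data both variants agree, because no domain has two homepage rows; the difference only shows once such a row is added.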

Access Totals Query Not Necessarily Returning First Record

I have a table of data like this:
id user_id A B C
=====================
1 15 1 2 3
2 15 1 2 5
3 20 1 3 9
4 20 1 3 7
I need to remove duplicate user ids and keep the record that sorts lowest when sorting by A then B then C. So using the above table, I set up a temp query (qry_temp) that simply does the sort--first on user_id, then on A, then on B, then on C. It returns the following:
id user_id A B C
====================
1 15 1 2 3
2 15 1 2 5
4 20 1 3 7
3 20 1 3 9
Then I wrote a Totals Query based on qry_temp that just had user_id (Group By) and then id (First), and I assumed this would return the following:
user_id id
===========
15 1
20 4
But it doesn't seem to do that--instead it appears to be just returning the lowest id in a group of duplicate user ids (so I get 1 and 3 instead of 1 and 4). Shouldn't the Totals query use the order of the query it's based upon? Is there a property setting in the query that might impact this or another way to get what I need? If it helps, here is the SQL:
SELECT qry_temp.user_id, First(qry_temp.ID) AS FirstOfID
FROM qry_temp
GROUP BY qry_temp.user_id;
You need a different type of query, for example:
SELECT tmp.id,
tmp.user_id,
tmp.a,
tmp.b,
tmp.c
FROM tmp
WHERE (( ( tmp.id ) IN (SELECT TOP 1 id
FROM tmp t
WHERE t.user_id = tmp.user_id
ORDER BY t.a,
t.b,
t.c,
t.id) ));
Where tmp is the name of your table. First, Last, Min and Max are not dependent on a sort order. In relational databases, sort orders are quite ephemeral.
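The correlated-subquery idea can be tried outside Access with a minimal sketch using Python's built-in sqlite3 module. SQLite has no TOP, so `LIMIT 1` stands in for `TOP 1`; the function name `first_per_user` is made up for illustration.

```python
import sqlite3

# (id, user_id, a, b, c) rows copied from the question.
ROWS = [(1, 15, 1, 2, 3), (2, 15, 1, 2, 5), (3, 20, 1, 3, 9), (4, 20, 1, 3, 7)]


def first_per_user(rows=ROWS):
    """Return (user_id, id) of the row that sorts lowest by a, b, c per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp (id INTEGER, user_id INTEGER, "
                 "a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT tmp.user_id, tmp.id
        FROM tmp
        WHERE tmp.id = (SELECT t.id
                        FROM tmp AS t
                        WHERE t.user_id = tmp.user_id
                        -- explicit ordering; id breaks any remaining ties
                        ORDER BY t.a, t.b, t.c, t.id
                        LIMIT 1)
        ORDER BY tmp.user_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(first_per_user())
```

The ordering lives inside the subquery, so the result does not depend on how the outer table or any intermediate query happens to be sorted.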