MySQL (v5.7.30) query: AVG of AVGs

So I have these 2 tables:
jobs:
id | business_id | other_columns
1 | 223 | xxxxxx
1 | 12 | xxxxxx
businesses_ratings:
id | business_id | professional | communication | safety | respectful | dependability
1 | 223 | 4 | 2 | 5 | 4 | 3
2 | 223 | 3 | 5 | 2 | 4 | 5
3 | 223 | 1 | 2 | 5 | 4 | 4
I want to select the jobs of a particular business_id and append to each job the overall rating of that business_id, computed as the average of the five per-column averages: AVG(professional), AVG(communication), AVG(safety), AVG(respectful), AVG(dependability).
Can I achieve this in one query?
LE: I'll append the query I've tried (it also contains the WHERE clause), which may help explain what I need to achieve, followed by the error it throws:
SELECT * FROM jobs
CROSS JOIN (
SELECT count(*) totalJobs FROM jobs
WHERE (
( JSON_CONTAINS(skills, '{"id":1,"val":"Carpenter"}') )
AND NOT JSON_CONTAINS(workers, '{"id":6,"fullname":"Cip"}')
AND NOT JSON_CONTAINS(applicants, '{"id":6,"fullname":"Cip"}')
)
) ttl
CROSS JOIN (
SELECT AVG(
(SELECT AVG(professional) FROM businesses_ratings WHERE business_id=jobs.business_id) +
(SELECT AVG(communication) FROM businesses_ratings WHERE business_id=jobs.business_id) +
(SELECT AVG(safety) FROM businesses_ratings WHERE business_id=jobs.business_id) +
(SELECT AVG(respectful) FROM businesses_ratings WHERE business_id=jobs.business_id) +
(SELECT AVG(dependability) FROM businesses_ratings WHERE business_id=jobs.business_id)
) business_rating FROM businesses_ratings WHERE business_id=jobs.business_id
) avg
WHERE (
( JSON_CONTAINS(skills, '{"id":1,"val":"Carpenter"}') )
AND NOT JSON_CONTAINS(workers, '{"id":6,"fullname":"Cip"}')
AND NOT JSON_CONTAINS(applicants, '{"id":6,"fullname":"Cip"}')
)
ORDER BY start_date LIMIT 3
and the error:
Unknown column 'jobs.business_id' in 'where clause'

I think you want aggregation and window functions:
select
business_id,
rank() over(
order by avg(professional) + avg(communication) + avg(safety)
) as rn
from businesses_ratings
group by business_id
You can expand the order by clause of the window function with additional columns as needed.
I am quite skeptical about the interest of ranking by the average of the 3 averages - but the above query seems like a reasonable interpretation of what you ask for.
In versions before MySQL 8.0 (such as 5.7), which lack window functions, one option uses user-defined variables to compute the rank:
select t.*, @rn := @rn + 1 rn
from (
    select
        business_id,
        avg(professional) + avg(communication) + avg(safety) sum_avg
    from businesses_ratings
    group by business_id
    order by sum_avg
) t
cross join (select @rn := 0) x

Your query seems way more complicated than necessary. I think you want:
select br.business_id,
avg(professional),
avg(communication),
avg(safety),
avg(respectful),
avg(dependability),
(avg(professional) + avg(communication) + avg(safety) + avg(respectful) + avg(dependability)) / 5 as overall_avg
from businesses_ratings br
group by br.business_id;
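To attach that overall rating to each job, as the question asks, the aggregate can be joined back to jobs as a derived table. The following is a minimal sketch that runs the query shape against an in-memory SQLite database as a stand-in for MySQL. The second job's id (2) and the LEFT JOIN (so that jobs of unrated businesses are kept) are assumptions, not part of the original post.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER, business_id INTEGER, other_columns TEXT);
CREATE TABLE businesses_ratings (
    id INTEGER, business_id INTEGER,
    professional INTEGER, communication INTEGER, safety INTEGER,
    respectful INTEGER, dependability INTEGER
);
-- Job id 2 is an assumed value; the sample data in the question repeats id 1.
INSERT INTO jobs VALUES (1, 223, 'xxxxxx'), (2, 12, 'xxxxxx');
INSERT INTO businesses_ratings VALUES
    (1, 223, 4, 2, 5, 4, 3),
    (2, 223, 3, 5, 2, 4, 5),
    (3, 223, 1, 2, 5, 4, 4);
""")

# Aggregate once per business, then join the result to every job.
QUERY = """
SELECT j.id, j.business_id, r.overall_avg
FROM jobs j
LEFT JOIN (
    SELECT business_id,
           (AVG(professional) + AVG(communication) + AVG(safety)
            + AVG(respectful) + AVG(dependability)) / 5 AS overall_avg
    FROM businesses_ratings
    GROUP BY business_id
) r ON r.business_id = j.business_id
ORDER BY j.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the derived table is not correlated with the outer query, it avoids the "Unknown column 'jobs.business_id'" error from the original attempt.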

Related

How to pick a row randomly based on a number of tickets you have

I have this table called my_users
my_id | name | raffle_tickets
1 | Bob | 3
2 | Sam | 59
3 | Bill | 0
4 | Jane | 10
5 | Mike | 12
As you can see Sam has 59 tickets so he has the highest chance of winning.
Chance of winning:
Sam = 59/84
Bob = 3/84
Jane = 10/84
Bill = 0/84
Mike = 12/84
PS: 84 is the total number of tickets in the table (just so you know I didn't pick 84 at random)
Based on this, how can I randomly pick a winner while ensuring that those who have more raffle tickets have a higher chance of being picked? The winner then has 1 ticket deducted from their total:
UPDATE my_users
SET raffle_tickets = raffle_tickets - 1
WHERE my_id = --- Then I get stuck here...
Server version: 5.7.30
For MySQL 8+
WITH
cte1 AS ( SELECT name, SUM(raffle_tickets) OVER (ORDER BY my_id) cum_sum
FROM my_users ),
cte2 AS ( SELECT SUM(raffle_tickets) * RAND() random_sum
FROM my_users )
SELECT name
FROM cte1
CROSS JOIN cte2
WHERE cum_sum >= random_sum
ORDER BY cum_sum LIMIT 1;
For MySQL 5+
SELECT cte1.name
FROM ( SELECT t2.my_id id, t2.name, SUM(t1.raffle_tickets) cum_sum
FROM my_users t1
JOIN my_users t2 ON t1.my_id <= t2.my_id
WHERE t1.raffle_tickets > 0
GROUP BY t2.my_id, t2.name ) cte1
CROSS JOIN ( SELECT RAND() * SUM(raffle_tickets) random_sum
FROM my_users ) cte2
WHERE cte1.cum_sum >= cte2.random_sum
ORDER BY cte1.cum_sum LIMIT 1;
You want a weighted pull from a random sample. For this purpose, variables are probably the most efficient solution:
select u.*
from (select u.*, (@t := @t + raffle_tickets) as running_tickets
      from my_users u cross join
           (select @t := 0, @r := rand()) params
      where raffle_tickets > 0
     ) u
where @r >= (running_tickets - raffle_tickets) / @t and
      @r < (running_tickets / @t);
This calculates the running sum of tickets and then divides it by the total number of tickets to get a number between 0 and 1. For example, this might produce:
my_id | name | raffle_tickets | running_tickets | running_tickets / @t
1 | Bob | 3 | 3 | 0.03571428571428571
2 | Sam | 59 | 62 | 0.7380952380952381
4 | Jane | 10 | 72 | 0.8571428571428571
5 | Mike | 12 | 84 | 1
The ordering of the original rows doesn't matter, which is why there is no ordering in the subquery.
The ratio is then compared with rand() to select a particular row: each user owns a slice of the interval from 0 to 1 whose width is proportional to their ticket count.
Note that in the outer query, @t is the total number of tickets.
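The running-sum selection described above can be sketched outside the database as well. This is a minimal illustration, not the answer's implementation: the function name pick_winner and the tuple layout are hypothetical, and r plays the role of rand().

```python
import random

def pick_winner(users, r):
    """Pick a user with probability proportional to their ticket count.

    users: list of (my_id, name, raffle_tickets) tuples (hypothetical layout).
    r: a float in [0, 1), standing in for MySQL's rand().
    """
    total = sum(tickets for _, _, tickets in users)
    if total <= 0:
        return None  # nobody holds a ticket
    threshold = r * total
    running = 0
    for my_id, name, tickets in users:
        if tickets <= 0:
            continue  # users without tickets can never win
        running += tickets
        # The first user whose running total exceeds the threshold wins.
        if threshold < running:
            return my_id, name
    return None

users = [(1, "Bob", 3), (2, "Sam", 59), (3, "Bill", 0), (4, "Jane", 10), (5, "Mike", 12)]
print(pick_winner(users, random.random()))
```

The returned my_id is the value that would go into the WHERE clause of the UPDATE that deducts one ticket.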

SQL: How to count items in a specific order

This is my table:
PACKAGE_ID ITEM_ID
1 1
1 2
1 3
2 4
2 5
3 6
4 7
4 8
I want a new column called count that counts from 1 to N within each package ID. E.g.:
PACKAGE_ID ITEM_ID COUNT
1 1 1
1 2 2
1 3 3
2 4 1
2 5 2
3 6 1
4 7 1
4 8 2
Thanks!
p.s: I'm using MariaDb 10.1
You can use a window function (available from MariaDB 10.2 and MySQL 8.0):
SELECT PACKAGE_ID, ITEM_ID
, ROW_NUMBER() OVER (PARTITION BY PACKAGE_ID ORDER BY PACKAGE_ID, ITEM_ID) AS THE_COUNT
FROM your_table
In pre-8.0 MySQL, the fastest method is probably variables:
select t.*,
       (@rn := if(@p = package_id, @rn + 1,
                  if(@p := package_id, 1, 1)
                 )
       ) as counter
from t cross join
     (select @rn := 0, @p := -1) params
order by package_id, item_id;
An index on (package_id, item_id) would benefit this query.
In MySQL 8+ or MariaDB 10.2+, use row_number() instead.
You can use a correlated subquery:
select *, (select count(1)
           from your_table t1
           where t1.PACKAGE_ID = t.PACKAGE_ID and
                 t1.ITEM_ID <= t.ITEM_ID
          ) as COUNT
from your_table t;
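Since the asker is on MariaDB 10.1, which predates window functions, the correlated-subquery form is the most portable of the three. A minimal sketch follows, run against an in-memory SQLite database as a stand-in; the table name packages is an assumption.

```python
import sqlite3

# SQLite stands in for MariaDB here; "packages" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (package_id INTEGER, item_id INTEGER)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (4, 7), (4, 8)],
)

# For each row, count the rows of the same package with an item_id
# less than or equal to the current one: that is its 1-based position.
QUERY = """
SELECT t.package_id, t.item_id,
       (SELECT COUNT(*)
        FROM packages t1
        WHERE t1.package_id = t.package_id
          AND t1.item_id <= t.item_id) AS the_count
FROM packages t
ORDER BY t.package_id, t.item_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This relies on item_id being unique within a package; duplicate values would receive the same count.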

MySQL select not equal to limit in subquery

MySQL #1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Given 1 table as following
Item | Name | Price
----- ------------ --------
1 | Adidas | 310.00
2 | Nike Run | 30.00
3 | Puma | 150.00
4 | Nike Women | 20.00
5 | NB | 20.00
I would like to sum the prices of all records except the two with the highest prices.
SELECT SUM(Price) as total_amount
FROM `test`
WHERE Item NOT IN (
SELECT Price
FROM `test`
ORDER BY Price DESC
LIMIT 2)
Expected Result:
total_amount
------------
70.00
How can I rewrite this query with a JOIN, or some other alternative to LIMIT in a subquery?
Thank you.
You need to wrap the LIMIT subquery in a derived table:
SELECT SUM(Price) FROM test WHERE Item NOT IN (
SELECT * FROM (
SELECT Item FROM test ORDER BY Price DESC LIMIT 2
) AS tmp
)
Here's one option using a subquery with limit / offset:
select sum(price)
from (
select *
from test
order by price desc
limit 999999999
offset 2
) t
Just make sure the limit value is at least the number of potential rows (the largest value MySQL accepts is 18446744073709551615).
Or you could use user-defined variables:
select sum(price)
from (
    select *, @rn:=@rn + 1 rn
    from test cross join (select @rn:= 0) t
    order by price desc
) t
where rn > 2
If you are looking to exclude the 2 highest prices, which could cover more than 2 records, this will also work with user-defined variables:
select sum(price)
from (
    select *, @rn:=if(@prevPrice=price, @rn,
                      if(@prevPrice:=price, @rn + 1, @rn + 1)) rn
    from test cross join (select @rn:= 0, @prevPrice:= null) t
    order by price desc
) t
where rn > 2
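The derived-table workaround can be checked against the sample data. The sketch below runs it on an in-memory SQLite database as a stand-in; SQLite does not raise error 1235 in the first place, so this only confirms the result of the query shape, not the MySQL behaviour.

```python
import sqlite3

# SQLite stands in for MySQL; the data is the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (Item INTEGER, Name TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "Adidas", 310.00),
        (2, "Nike Run", 30.00),
        (3, "Puma", 150.00),
        (4, "Nike Women", 20.00),
        (5, "NB", 20.00),
    ],
)

# The inner LIMIT query is wrapped in a derived table (tmp), so the
# subquery seen by NOT IN no longer contains LIMIT directly.
QUERY = """
SELECT SUM(Price) AS total_amount
FROM test
WHERE Item NOT IN (
    SELECT Item FROM (
        SELECT Item FROM test ORDER BY Price DESC LIMIT 2
    ) AS tmp
)
"""

total = conn.execute(QUERY).fetchone()[0]
print(total)
```

Note that the inner query selects Item, not Price, so the NOT IN comparison matches the column it filters on.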

SQL filter rows without join

I'm always irked by unnecessary joins, so in this case I wonder whether it's possible to avoid one.
This is an example of the table I have:
id | team | score
1 | 1 | 300
2 | 1 | 257
3 | 2 | 127
4 | 2 | 533
5 | 3 | 459
This is what I want:
team | score | id
1 | 300 | 1
2 | 533 | 4
3 | 459 | 5
Doing a query looking like this:
(basically: who's the best player of each team)
SELECT team, MAX(score) AS score, id
FROM my_table
GROUP BY team
But I get something like that:
team | score | id
1 | 300 | 1
2 | 533 | 3
3 | 459 | 5
But it's not the third player who got 533 points, so the result is inconsistent.
Is it possible to get trustworthy results without joining the table with itself? How can I achieve that?
You can do it without joins by using a correlated subquery like this:
SELECT id, team, score
FROM table1 a
WHERE score = (SELECT MAX(score) FROM table1 b WHERE a.team = b.team);
However in big tables this can be very slow as you have to run the whole subquery for every row in your table.
However, there's nothing wrong with using a join to filter results like this:
SELECT a.id, a.team, a.score
FROM table1 a
INNER JOIN (
    SELECT MAX(score) score, team
    FROM table1
    GROUP BY team
) b ON a.score = b.score AND a.team = b.team
Although joining is itself quite expensive, this way you only have to run two actual queries regardless of how many rows are in your tables. So in big tables this method can still be far faster than the first method with the subquery.
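The join on the per-team maximum can be checked against the question's sample data. Here is a minimal sketch using an in-memory SQLite database as a stand-in for MySQL; the table name my_table follows the question.

```python
import sqlite3

# SQLite stands in for MySQL; the data is the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, team INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, 300), (2, 1, 257), (3, 2, 127), (4, 2, 533), (5, 3, 459)],
)

# b holds one row per team with that team's best score; joining on both
# team and score keeps only the rows of my_table that achieved it.
QUERY = """
SELECT a.id, a.team, a.score
FROM my_table a
INNER JOIN (
    SELECT team, MAX(score) AS score
    FROM my_table
    GROUP BY team
) b ON a.team = b.team AND a.score = b.score
ORDER BY a.team
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If two players on the same team tie for the best score, both rows are returned.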
You can use variables:
SELECT id, team, score
FROM (
    SELECT id, team, score,
           @seq := IF(@t = team, @seq,
                      IF(@t := team, @seq + 1, @seq + 1)) AS seq,
           @grp := IF(@t2 = team, @grp + 1,
                      IF(@t2 := team, 1, 1)) AS grp
    FROM mytable
    CROSS JOIN (SELECT @seq := 0, @t := 0, @grp := 0, @t2 := 0) AS vars
    ORDER BY score DESC) AS t
WHERE seq <= 3 AND grp = 1
Variable @seq is incremented each time a new team is met as the records are processed in descending score order. Variable @grp enumerates records within each team partition. Records with grp = 1 are the ones with the greatest score within their team.
Unfortunately, MySQL before 8.0 doesn't support window functions like ROW_NUMBER(), which could have solved this easily.
There are several ways of doing that:
NOT EXISTS():
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE t.team = s.team AND s.score > t.score)
IN():
SELECT * FROM YourTable t
WHERE (t.team,t.score) IN(SELECT s.team,MAX(s.score)
FROM YourTable s
GROUP BY s.team)
A correlated query:
SELECT distinct t.id,t.team,
(SELECT s.score FROM YourTable s
WHERE s.team = t.team
ORDER BY s.score DESC
LIMIT 1)
FROM YourTable t
Or a join which I understand you already have.
EDIT: I take my words back, you can do it with variables, as in @GiorgosBetsos's solution.
You could do something like this, where window functions are available (the alias is rnk because RANK is a reserved word in MySQL 8):
SELECT team, score, id
FROM (SELECT *
      ,RANK() OVER
       (PARTITION BY team ORDER BY score DESC) AS rnk
      FROM my_table) ranked_result
WHERE rnk = 1;

Grouping to find min,max for each group

This would be relatively easy if I only cared about a single min and max for each group, the problem is my requirement is to find the various boundaries. An example data set is as follows:
BoundaryColumn GroupIdentifier
1 A
3 A
4 A
7 A
8 B
9 B
11 B
13 A
14 A
15 A
16 A
What I need from the sql is a result set as follows:
min max groupid
1 7 A
8 11 B
13 16 A
Essentially finding the boundaries for each cluster of the groups.
The data would be stored in either Oracle 11g or MySQL, so syntax can be provided for either platform.
A disclaimer: It's a lot easier to query partial results and process something like this with a front-end language. That said...
The following query works for Oracle (which supports analytic queries) but not for MySQL (which does not). There's a SQL Fiddle here.
WITH BoundX AS (
SELECT * FROM (
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
)
WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT MIN, MAX, GROUPID
FROM (
SELECT
BoundaryColumn AS MIN,
LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
GroupIdentifier AS GROUPID,
GIDLag,
GIDLead
FROM BoundX
)
WHERE GROUPID = GIDLead
Here's the logic, step by step. You may be able to improve on this, because I get the feeling there's one subquery too many here...
This query pulls the prior and following GroupIdentifier values into each row:
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
The result looks like this:
BoundaryColumn  GroupIdentifier  GIDLag  GIDLead
1               A                        A
3               A                A       A
4               A                A       A
7               A                A       B
8               B                A       B
9               B                B       B
11              B                B       A
13              A                B       A
14              A                A       A
15              A                A       A
16              A                A
If you add logic to get rid of all the rows where GIDLag = GIDLead = GroupIdentifier, you'll end up with the boundaries:
WITH BoundX AS (
SELECT * FROM (
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
)
WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT
BoundaryColumn AS MIN,
LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
GroupIdentifier AS GROUPID,
GIDLag,
GIDLead
FROM BoundX
With this addition the results are:
MIN  MAX  GROUPID  GIDLAG  GIDLEAD
---  ---  -------  ------  -------
1    7    A                A
7    8    A        A       B
8    11   B        A       B
11   13   B        B       A
13   16   A        B       A
16        A        A
Finally, include only those rows where GroupID = GIDLead. That's the query at the top of this answer. The results are:
MIN MAX GROUPID
--- --- -------
1 7 A
8 11 B
13 16 A
Take a look at this site regarding "runs" of data: http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data
Armed with the knowledge provided in that link, you could write a query like this:
SELECT BoundaryColumn,
       GroupIdentifier,
       (
           SELECT COUNT(*)
           FROM MyTable T
           WHERE T.GroupIdentifier <> TR.GroupIdentifier
             AND T.BoundaryColumn <= TR.BoundaryColumn
       ) as RunGroup
FROM MyTable TR
Using this information, you could then group by "RunGroup", and select the GroupIdentifier and min/max BoundaryColumn.
EDIT: I've felt the peer pressure, here's an SQLFiddle with my version of the answer: http://www.sqlfiddle.com/#!8/9a24c/4/0
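The grouping step mentioned above can be sketched as follows. This is an assumption about how the final query might look, not a copy of the linked fiddle; it runs on an in-memory SQLite database as a stand-in, and the table name MyTable is borrowed from the other answer.

```python
import sqlite3

# SQLite stands in for MySQL; the data is the sample set from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (BoundaryColumn INTEGER, GroupIdentifier TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "A"), (3, "A"), (4, "A"), (7, "A"),
     (8, "B"), (9, "B"), (11, "B"),
     (13, "A"), (14, "A"), (15, "A"), (16, "A")],
)

# RunGroup counts the rows of *other* groups seen so far; it stays constant
# within a run and changes whenever a different group has intervened.
QUERY = """
SELECT MIN(BoundaryColumn) AS min_b,
       MAX(BoundaryColumn) AS max_b,
       GroupIdentifier
FROM (
    SELECT TR.BoundaryColumn,
           TR.GroupIdentifier,
           (SELECT COUNT(*)
            FROM MyTable T
            WHERE T.GroupIdentifier <> TR.GroupIdentifier
              AND T.BoundaryColumn <= TR.BoundaryColumn) AS RunGroup
    FROM MyTable TR
) runs
GROUP BY GroupIdentifier, RunGroup
ORDER BY min_b
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Grouping by both GroupIdentifier and RunGroup matters: RunGroup alone is only guaranteed to be distinct between runs of the same group.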
Another approach (Oracle). Here we simply divide the result set returned by the query against table t1 (your table) into logical groups (grp). Each new group starts when the value of GroupIdentifier changes:
select min(q.BoundaryColumn) as MinB
, max(q.BoundaryColumn) as MaxB
, max(q.GroupIdentifier) as groupid
from ( select s.BoundaryColumn
, s.GroupIdentifier
, sum(grp) over(order by s.BoundaryColumn) as grp
from ( select BoundaryColumn
, GroupIdentifier
, case
when GroupIdentifier <> lag(GroupIdentifier)
over(order by BoundaryColumn)
then 1
end as grp
from t1) s
) q
group by q.grp
Result:
MINB MAXB GROUPID
---------- ---------- -------
1 7 A
8 11 B
13 16 A