MySQL Bayesian ranking and sorting by star ratings

Say I have two tables: businesses, and reviews for those businesses.
businesses table:
+----+-------+
| id | title |
+----+-------+
reviews table:
+----+-------------+---------+------+
| id | business_id | message | rate |
+----+-------------+---------+------+
Each review has a rate (1 to 5 stars).
I want to sort businesses by their review rates using Bayesian ranking, on the condition that a business has at least 2 reviews.
Here is my query:
SELECT b.id,
(SELECT COUNT(r.rate) as rr FROM reviews r WHERE r.business_id = b.id) as rr,
(SELECT
((COUNT(r.rate) / (COUNT(r.rate) + 2)) AVG(r.rate) +
(2 /(COUNT(r.rate) + 2)) 4)
FROM reviews r where r.business_id = b.id AND rr > 2
) as score
FROM businesses b
order by score desc
LIMIT 4
This outputs:
+------+----+------------+
| id | rr | score |
+------+----+------------+
| 992 | 14 | 4.31250000 |
+------+----+------------+
| 237 | 3 | 4.2000000 |
+------+----+------------+
| 19 | 5 | 4.0000000 |
+------+----+------------+
| 1009 | 12 | 3.9285142 |
+------+----+------------+
I have two questions:
As you can see in ((COUNT(r.rate) / (COUNT(r.rate) + 2)) AVG(r.rate) +
(2 /(COUNT(r.rate) + 2)) 4) FROM reviews r where r.business_id = b.id AND rr > 2 ), some functions, like COUNT and AVG, are called more than once. Are they evaluated once in the background with the result cached, or are they run for every single call?
Is there an equivalent query that is better optimized?
Thanks in advance.

I would hope that MySQL optimises the repeated counts away, but I am not certain.
However, you could rearrange your query to join against a subquery. This way you are not running 2 subqueries for every row.
SELECT b.id,
sub0.rr,
sub0.score
FROM businesses b
INNER JOIN
(
SELECT r.business_id,
COUNT(r.rate) AS rr ,
((COUNT(r.rate) / (COUNT(r.rate) + 2)) * AVG(r.rate) + (2 / (COUNT(r.rate) + 2)) * 4) AS score
FROM reviews r
GROUP BY r.business_id
HAVING rr > 2
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
Note that the results here are very slightly different: this query excludes businesses with 2 or fewer reviews, while your query still returns them, but with a score of NULL. The multiplication operators before AVG(r.rate) and before 4 appear to be missing from your original query; I have written them in here.
Using the above idea you could recode it to return both the count and the average rate in the sub query, and just use the values of those returned columns for the calculation.
SELECT b.id,
sub0.rr,
((rr / (rr + 2)) * arr + (2 / (rr + 2)) * 4) AS score
FROM businesses b
INNER JOIN
(
SELECT r.business_id,
COUNT(r.rate) AS rr ,
AVG(r.rate) AS arr
FROM reviews r
GROUP BY r.business_id
HAVING rr > 2
) sub0
ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
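As a rough check of the join-against-a-subquery idea, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for MySQL. The sample rows are made up for illustration. SQLite divides integers as integers, so the sketch multiplies by 1.0 to force floating-point division, which MySQL's / operator does not need. The HAVING clause also repeats COUNT(r.rate) rather than relying on the rr alias.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE businesses (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, business_id INTEGER,
                      message TEXT, rate INTEGER);
INSERT INTO businesses VALUES (1, 'A'), (2, 'B'), (3, 'C');
-- Hypothetical ratings: business 1 has 4 reviews, 2 has 2, 3 has 3.
INSERT INTO reviews (business_id, message, rate) VALUES
  (1, '', 5), (1, '', 5), (1, '', 4), (1, '', 4),
  (2, '', 5), (2, '', 5),
  (3, '', 3), (3, '', 4), (3, '', 5);
""")

# Count and average are computed once per business in the derived table,
# then reused in the Bayesian score (prior weight 2, prior mean 4).
rows = conn.execute("""
SELECT b.id,
       sub0.rr,
       (sub0.rr * 1.0 / (sub0.rr + 2)) * sub0.arr
         + (2.0 / (sub0.rr + 2)) * 4 AS score
FROM businesses b
INNER JOIN (
    SELECT r.business_id,
           COUNT(r.rate) AS rr,
           AVG(r.rate) AS arr
    FROM reviews r
    GROUP BY r.business_id
    HAVING COUNT(r.rate) > 2
) sub0 ON sub0.business_id = b.id
ORDER BY score DESC
LIMIT 4
""").fetchall()

for business_id, review_count, score in rows:
    print(business_id, review_count, round(score, 4))
```

Business 2 has only 2 reviews, so it is dropped by the HAVING clause and does not appear in the result.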

Related

Query to group without losing the IF function

I created a query that finds all my stock products that appear in placed orders, and I created an alias "total_vendido" that adds up the products whether they are sold as kits or as units. So far this works. But now I need to group by size and sum this "total_vendido" alias per size.
Query:
SELECT `gp`.`id`, `gp`.`data`, `gp`.`status`, `gp`.`situacao`, `gp`.`nome`,
`gp`.`razao_social`, `gp`.`email`, `gp`.`telefone`,
`itens`.*,
IF(itens.tipo = 'K',
SUM(itens.qtde_prod) * itens.qtde_lote,
SUM(itens.qtde_prod)
) AS total_vendido,
`estoq`.`titulo`
FROM `ga845_pedidos_view` `gp`
JOIN `ga845_pedido_itens` `itens` ON `itens`.`pedido_id` = `gp`.`id`
JOIN `ga845_produtos` `prod` ON `prod`.`id` = `itens`.`produtos_id`
JOIN `ga845_produtos_estoque` `estoq` ON `estoq`.`id` = `prod`.`estoques_id`
WHERE `gp`.`situacao` IN('Pedido Realizado', 'Pagamento Aprovado',
'Pedido em Separação', 'Pedido Separado')
AND date(gp.data) >= '2020-07-25'
AND date(gp.data) <= '2020-07-25'
AND `estoq`.`id` IN('24')
GROUP BY `itens`.`tamanho_prod`, `estoq`.`id`
ORDER BY `estoq`.`id` ASC, `itens`.`tamanho_prod` ASC
Current result (only important columns)
tamanho_prod | tipo | total_vendido
G | K | 5
G | U | 1
M | K | 1
P | U | 8
Expected result (only important columns)
tamanho_prod | total_vendido
G | 6
M | 1
P | 8
Query that produces the expected result (only important columns):
SELECT
`itens`.`tamanho_prod`
, SUM( IF(itens.tipo = 'K',
itens.qtde_prod * itens.qtde_lote,
itens.qtde_prod
) ) AS total_vendido
FROM `ga845_pedidos_view` `gp`
JOIN `ga845_pedido_itens` `itens` ON `itens`.`pedido_id` = `gp`.`id`
JOIN `ga845_produtos` `prod` ON `prod`.`id` = `itens`.`produtos_id`
JOIN `ga845_produtos_estoque` `estoq` ON `estoq`.`id` = `prod`.`estoques_id`
WHERE `gp`.`situacao` IN('Pedido Realizado', 'Pagamento Aprovado',
'Pedido em Separação', 'Pedido Separado')
AND date(gp.data) >= '2020-07-25'
AND date(gp.data) <= '2020-07-25'
AND `estoq`.`id` IN('24')
GROUP BY `itens`.`tamanho_prod`
ORDER BY `itens`.`tamanho_prod` ASC
If you want an aggregated result just for itens.tamanho_prod, then you should group by that column only and move the SUM() outside the IF condition, so that the IF is evaluated per row and the results are summed.
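The effect of wrapping the IF in SUM() can be sketched with Python's sqlite3 module. This is only an illustration under assumptions: the joins are collapsed into a single hypothetical itens table, the rows are invented so that they reproduce the totals shown in the question, and CASE WHEN is used because it is the portable equivalent of MySQL's IF().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE itens (tamanho_prod TEXT, tipo TEXT,
                    qtde_prod INTEGER, qtde_lote INTEGER)
""")
# Invented rows: one kit of 5 in size G, one unit in G,
# one kit of 1 in M, eight units in P.
conn.executemany(
    "INSERT INTO itens VALUES (?, ?, ?, ?)",
    [("G", "K", 1, 5), ("G", "U", 1, 1), ("M", "K", 1, 1), ("P", "U", 8, 1)],
)

# The conditional is evaluated per row, and the per-row results are summed.
totals = conn.execute("""
SELECT tamanho_prod,
       SUM(CASE WHEN tipo = 'K' THEN qtde_prod * qtde_lote
                ELSE qtde_prod END) AS total_vendido
FROM itens
GROUP BY tamanho_prod
ORDER BY tamanho_prod
""").fetchall()

print(totals)
```

Because tipo is no longer read outside an aggregate, kits and units of the same size end up in one row.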

SQL Order results by Match Against Relevance and display the price based on sellers rank

I am looking to display results based on the 'relevance' of the user's search, along with the price of the seller that ranks highest. A live example of what I'm after is Amazon's search results. I understand their algorithm is extremely complicated, but I'm after a simplified version.
Let's say we search for 'Jumper'. The results returned are products related to 'Jumper', but the price shown is not always the cheapest; it is based on the seller's rank. The seller with the highest rank gets his or her price displayed.
Here's what I have been working on. It is not giving me the expected results mentioned above, and to be honest I don't think it is very efficient.
SELECT a.catalogue_id, a.productTitle, a.prod_rank, b.catalogue_id, b.display_price, b.sellers_rank
FROM
(
SELECT c.catalogue_id,
c.productTitle,
MATCH(c.productTitle) AGAINST ('+jumper*' IN BOOLEAN MODE) AS prod_rank
FROM catalogue AS c
WHERE c.catalogue_id IN (1, 2, 3)
) a
JOIN
(
SELECT inventory.catalogue_id,
inventory.amount AS display_price,
(accounts.comsn + inventory.quantity - inventory.amount) AS sellers_rank
FROM inventory
JOIN accounts ON inventory.account_id = accounts.account_id
WHERE inventory.catalogue_id IN (1, 2, 3)
) AS b
ON a.catalogue_id = b.catalogue_id
ORDER BY a.prod_rank DESC
LIMIT 100;
Sample Tables:
Accounts:
----------------------------
account_id | comsn
----------------------------
1 | 100
2 | 9999
Catalogue:
----------------------------
catalogue_id | productTitle
----------------------------
1 | blue jumper
2 | red jumper
3 | green jumper
Inventory:
-----------------------------------------------
product_id | catalogue_id | account_id | quantity | amount |
-----------------------------------------------
1 | 2 | 1 | 6 | 699
2 | 2 | 2 | 2 | 2999
Expected Results:
Product Title:
red jumper
Amount:
29.99 (because he/she has sellers rank of: 7002)
First, you should limit the results of the first subquery to only the matches.
Second, you should eliminate the second subquery:
SELECT p.catalogue_id, p.productTitle, p.prod_rank,
i.amount as display_price,
(a.comsn + i.quantity - i.amount) AS sellers_rank
FROM (SELECT c.catalogue_id, c.productTitle,
MATCH(c.productTitle) AGAINST ('+jumper*' IN BOOLEAN MODE) AS prod_rank
FROM catalogue AS c
WHERE c.catalogue_id IN (1, 2, 3)
HAVING prod_rank > 0
) p JOIN
inventory i
ON i.catalogue_id = p.catalogue_id join
accounts a
ON i.account_id = a.account_id
ORDER BY p.prod_rank DESC
LIMIT 100;
I'm not sure if you can get rid of the final ORDER BY. MATCH with JOIN can be a bit tricky in that respect. But only ordering by the matches should help.
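The query above returns one row per seller. One possible way to keep only the highest-ranked seller per product is sketched below with Python's sqlite3 module, loaded with the sample tables from the question. This is an assumption-laden illustration rather than the answer's method: LIKE stands in for MATCH ... AGAINST (the full-text search is not reproduced), and the ranked CTE and the MAX() filter are additions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id INTEGER, comsn INTEGER);
CREATE TABLE catalogue (catalogue_id INTEGER, productTitle TEXT);
CREATE TABLE inventory (product_id INTEGER, catalogue_id INTEGER,
                        account_id INTEGER, quantity INTEGER, amount INTEGER);
INSERT INTO accounts VALUES (1, 100), (2, 9999);
INSERT INTO catalogue VALUES (1, 'blue jumper'), (2, 'red jumper'),
                             (3, 'green jumper');
INSERT INTO inventory VALUES (1, 2, 1, 6, 699), (2, 2, 2, 2, 2999);
""")

# Rank every seller offer, then keep only the offer whose rank is the
# maximum for its catalogue entry.
best = conn.execute("""
WITH ranked AS (
    SELECT i.catalogue_id,
           i.amount AS display_price,
           a.comsn + i.quantity - i.amount AS sellers_rank
    FROM inventory i
    JOIN accounts a ON a.account_id = i.account_id
)
SELECT c.productTitle, r.display_price, r.sellers_rank
FROM catalogue c
JOIN ranked r ON r.catalogue_id = c.catalogue_id
WHERE c.productTitle LIKE '%jumper%'
  AND r.sellers_rank = (SELECT MAX(r2.sellers_rank)
                        FROM ranked r2
                        WHERE r2.catalogue_id = r.catalogue_id)
""").fetchall()

print(best)
```

With the sample data, only the red jumper has inventory, and the seller with rank 7002 (9999 + 2 - 2999) supplies the displayed amount of 2999.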

Count with conditions in different tables

I have 4 tables (songs, albums and two relation tables).
I have these 2 queries that I need to merge:
-QUERY 1)
SELECT l.name_language, count(s.id_song)
FROM language as l
LEFT JOIN song_has_languages as s ON l.id_language = s.id_language
GROUP BY l.id_language
HAVING COUNT(s.id_song) > 0
ORDER BY name_language ASC
The output:
name_language | songs
English | 5
Spanish | 1
-QUERY 2)
SELECT l.name_language, count(a.id_album)
FROM language as l
LEFT JOIN album_has_languages as a ON l.id_language = a.id_language
GROUP BY l.id_language
HAVING COUNT(a.id_album) > 0
ORDER BY name_language ASC
The output:
name_language | albums
English | 5
French | 2
My goal is this output:
name_language | total |
English | 10 |
Spanish | 1 |
French | 2 |
I want to print only the languages with a song or an album.
You can do what you want with union all and aggregation:
select l.name_language, sum(num_songs + num_albums) as total
from language l join
((select shl.id_language, count(*) as num_songs, 0 as num_albums
from song_has_languages shl
group by shl.id_language
) union all
(select ahl.id_language, 0 as num_songs, count(*) as num_albums
from album_has_languages ahl
group by ahl.id_language
)
) sa
on sa.id_language = l.id_language
group by l.id_language, l.name_language;
You could also express this with LEFT JOINs:
select l.name_language,
( coalesce(num_songs, 0) + coalesce(num_albums, 0) ) as total
from language l left join
(select shl.id_language, count(*) as num_songs
from song_has_languages shl
group by shl.id_language
) shl
on shl.id_language = l.id_language left join
(select ahl.id_language, count(*) as num_albums
from album_has_languages ahl
group by ahl.id_language
) ahl
on ahl.id_language = l.id_language
where shl.id_language is not null or ahl.id_language is not null;
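The union all approach can be tried out with a small sketch in Python's sqlite3 module. The rows are invented to match the outputs in the question, plus a hypothetical fourth language with no songs or albums. The two per-table counts share a single column n here, a simplification of the num_songs and num_albums columns, and the UNION ALL branches are written without surrounding parentheses because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (id_language INTEGER, name_language TEXT);
CREATE TABLE song_has_languages (id_song INTEGER, id_language INTEGER);
CREATE TABLE album_has_languages (id_album INTEGER, id_language INTEGER);
INSERT INTO language VALUES (1, 'English'), (2, 'Spanish'),
                            (3, 'French'), (4, 'German');
""")
# 5 English songs and 1 Spanish song; 5 English albums and 2 French albums.
conn.executemany("INSERT INTO song_has_languages VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 2)])
conn.executemany("INSERT INTO album_has_languages VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 3), (7, 3)])

# An inner join keeps only languages that have at least one song or album.
totals = conn.execute("""
SELECT l.name_language, SUM(sa.n) AS total
FROM language l
JOIN (
    SELECT id_language, COUNT(*) AS n
    FROM song_has_languages GROUP BY id_language
    UNION ALL
    SELECT id_language, COUNT(*) AS n
    FROM album_has_languages GROUP BY id_language
) sa ON sa.id_language = l.id_language
GROUP BY l.id_language, l.name_language
ORDER BY l.id_language
""").fetchall()

print(totals)
```

German has neither songs nor albums, so it is absent from the result.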

MySQL: How to select an ID when its values are equal to X

I have tried quite a lot of solutions and I decided to post it here to try and find a solution. Any little help is welcome (so I can learn too).
I have a table formed by ArticleID, UserID, and Votes (1/-1).
I want to select the ArticleID that has votes from a certain UserID and whose SUM of Votes is equal to 1.
So far I have arrived at:
SELECT catch.ID, votes.postid, catch.text, votes.userid, votes.value, catch.name
FROM catch
INNER JOIN votes ON catch.ID=votes.postid AND votes.userid=:iduser AND votes.value='1'
ORDER BY ID DESC
LIMIT 100
but this gives me an erroneous result, as it doesn't account for articles that have both a 1 vote and a -1 vote (whose SUM should be 0).
Thanks!
UPDATE
ID + Value + userid
1 | 1 | 54
1 | -1 | 54
3 | 1 | 54
7 | 1 | 56
7 | -1 | 56
Given the above table and selecting just user '54', the wanted result should be ID 3.
Is this what you want?
SELECT c.ID, v.postid, c.text, v.userid, SUM(v.value) AS vote_sum, c.name
FROM catch c INNER JOIN
votes v
ON c.ID = v.postid AND v.userid = :iduser
GROUP BY c.ID
HAVING SUM(v.value) = 1;
This is what you describe but it is a bit different from your query.
Try it like this, setting :iduser to the ID of the particular user:
SELECT catch.ID,
catch.text,
catch.name,
votes.userid,
SUM(votes.value) AS vote_sum
FROM catch
INNER JOIN votes ON catch.ID = votes.postid
WHERE votes.userid = :iduser
GROUP BY catch.ID, catch.text, catch.name, votes.userid
HAVING SUM(votes.value) = 1
ORDER BY catch.ID DESC
LIMIT 100
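To see the GROUP BY ... HAVING SUM(...) = 1 filter at work on the sample rows from the question, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The contents of the catch table (text and name values) are placeholders invented for the sketch; only the votes rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catch (ID INTEGER PRIMARY KEY, text TEXT, name TEXT);
CREATE TABLE votes (postid INTEGER, value INTEGER, userid INTEGER);
-- Placeholder article rows; only the IDs matter here.
INSERT INTO catch VALUES (1, 't1', 'n1'), (3, 't3', 'n3'), (7, 't7', 'n7');
-- Vote rows copied from the question's UPDATE table.
INSERT INTO votes VALUES (1, 1, 54), (1, -1, 54), (3, 1, 54),
                         (7, 1, 56), (7, -1, 56);
""")

QUERY = """
SELECT c.ID
FROM catch c
INNER JOIN votes v ON c.ID = v.postid AND v.userid = :iduser
GROUP BY c.ID
HAVING SUM(v.value) = 1
"""

# Votes are summed per article for one user; articles whose votes
# cancel out to 0 are filtered away by HAVING.
result_54 = conn.execute(QUERY, {"iduser": 54}).fetchall()
result_56 = conn.execute(QUERY, {"iduser": 56}).fetchall()

print(result_54, result_56)
```

For user 54 only article 3 survives, because the 1 and -1 votes on article 1 sum to 0. User 56 gets no rows for the same reason.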

SQL Select with multiple search parameters using joins and subqueries

I have spent hours searching for an answer for my problem without satisfying results.
I want to select everything from the players, villages and alliances tables, plus the date and population from the histories table, with one query.
The selection must be filtered by the following rules:
1. Select the latest information by date.
2. Select only if the player has <= a given number of villages at the moment.
3. Select only if the total population of the player's villages is <= a given value at the moment.
Rules 2 and 3 are the ones making my head hurt. How do I add those to my query?
Here is my current query:
SELECT players.name AS player,
players.uid as uid,
players.tid,
villages.name AS village,
villages.vid as vid,
villages.fid as fid,
alliances.name AS alliance,
alliances.aid as aid,
SQRT( POW( least(abs($xcoord - villages.x),
400-abs($xcoord - villages.x)), 2 ) +
POW( least(abs($ycoord - villages.y),
400-abs($ycoord - villages.y)), 2 ) ) AS distance
FROM histories
LEFT JOIN players ON players.uid = histories.uid
LEFT JOIN villages ON villages.vid = histories.vid
LEFT JOIN alliances ON alliances.aid = histories.aid
LEFT JOIN histories h2
ON ( histories.vid = h2.vid AND histories.idhistory < h2.idhistory )
WHERE h2.vid IS NULL
AND histories.uid != $uid
AND SQRT( POW(least(abs($xcoord - villages.x),
400-abs($xcoord - villages.x)), 2 ) +
POW(least(abs($ycoord - villages.y),
400-abs($ycoord - villages.y)), 2 ) ) < $rad
ORDER BY distance
Note: xcoord and ycoord are posted from the search form.
Example output:
Player| Village | Alliance | Distance
P1 | V1 | A1 | 1
P2 | V4 | A2 | 2
P1 | V2 | A1 | 3
P1 | V3 | A1 | 4
P2 | V5 | A2 | 5
Thank you in advance for helping. :)
This query can find players that have fewer than 2 villages. I just can't put my original query and this one together. Is it even possible?
SELECT
b.*, count(b.uid) as hasvillages
FROM
histories b
WHERE
b.vid IN (SELECT a.vid FROM villages a)
GROUP BY
b.uid
HAVING
count(b.uid) < 2
After one week of trying, I have finally found the answer.
With this query I can use the following search parameters:
Find latest rows by date
Find rows by limiting the number of villages the player has.
Find rows by limiting the total population of villages the player has.
Find rows by calculating the distance.
Exclude players or alliances from selection.
Here is the query:
SELECT players.name AS player, players.uid as uid, players.tid,
villages.name AS village, villages.vid as vid, villages.fid as fid,
alliances.name AS alliance, alliances.aid as aid,
SQRT( POW( least(abs(100 - villages.x),400-abs(100 - villages.x)), 2 ) +
POW( least(abs(100 - villages.y),400-abs(100 - villages.y)), 2 ) ) AS distance
FROM histories
LEFT JOIN players ON players.uid = histories.uid
LEFT JOIN villages ON villages.vid = histories.vid
LEFT JOIN alliances ON alliances.aid = histories.aid
WHERE histories.uid IN
(SELECT b.uid FROM histories b
WHERE (b.vid IN (SELECT a.vid FROM villages a) and b.date
in (select max(date) from histories))
GROUP BY b.uid HAVING count(b.uid) < 4 AND
sum(b.population) < 2000)
AND histories.uid != 1
and histories.date in (select max(date) from histories)
AND SQRT( POW( least(abs(100 - villages.x),400-abs(100 - villages.x)),2)+
POW( least(abs(100 - villages.y),400-abs(100 - villages.y)), 2 ) ) < 200
ORDER BY distance
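The distance expression in the query takes, for each axis, the smaller of abs(d) and 400 - abs(d). That reads as a shortest distance on a map whose coordinates wrap around after 400 units, although the wrap-around interpretation is an assumption drawn from the formula alone. A minimal Python sketch of the same arithmetic, with a hypothetical function name and a size parameter defaulting to 400:

```python
import math


def wrapped_distance(x1, y1, x2, y2, size=400):
    """Euclidean distance where each axis wraps around after `size` units.

    Mirrors SQRT(POW(LEAST(ABS(dx), size - ABS(dx)), 2) + ...) from the query.
    """
    dx = min(abs(x1 - x2), size - abs(x1 - x2))
    dy = min(abs(y1 - y2), size - abs(y1 - y2))
    return math.sqrt(dx * dx + dy * dy)


# A village 3 across and 4 up from the search point is 5 away.
print(wrapped_distance(100, 100, 103, 104))
# Points at x = 0 and x = 399 are 1 apart when the map wraps.
print(wrapped_distance(0, 0, 399, 0))
```

Computing this once per row in a derived table and filtering on the result would avoid repeating the expression in both the SELECT list and the WHERE clause.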