MySQL Rating With Weight - mysql

I want to create a rating that is weighted by the number of votes.
So a single vote of 5 shouldn't rank higher than four votes of 4.
I found this formula:
bayesian = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)
How can I write a MySQL SELECT that returns the ID of the best-rated image?
I have a table for IMAGE and a table for VOTING.
VOTING:
id
imageID
totalVotes
avgVote
I think I have to do this with a SELECT inside a SELECT, but how?

A first step is to calculate avg_num_votes and avg_rating:
SELECT
SUM(totalVotes)/COUNT(*) AS avg_num_votes,
SUM(avgVote)/COUNT(*) AS avg_rating
FROM voting;
If you can live with a small error, it might be good enough to calculate that once in a while.
Now, using your formula and the values above, you can run the weighting query. As a small optimization I precalculate avg_num_votes * avg_rating and call it avg_summand. The $-prefixed names below are placeholders for those precalculated values.
SELECT
voting.*, -- or whatever fields you need
($avg_summand+totalVotes*avgVote)/($avg_num_votes+totalVotes) AS bayesian
FROM voting
ORDER BY bayesian DESC
LIMIT 1;
Edit
You could run this as a join:
SELECT
voting.*, -- or whatever fields you need
(avg_num_votes*avg_rating+totalVotes*avgVote)/(avg_num_votes+totalVotes) AS bayesian
FROM voting,
(
SELECT
SUM(totalVotes)/COUNT(*) AS avg_num_votes,
SUM(avgVote)/COUNT(*) AS avg_rating
FROM voting AS iv
) AS avg
ORDER BY bayesian DESC
LIMIT 1;
But this will calculate sum and average on every single query - call it a performance bomb.
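To see what the formula does to a few rows, here is a minimal Python sketch of the same arithmetic. The rows in `votes` are made-up sample data, and the function name `bayesian_scores` is only an illustration, not part of the original answer.

```python
# Hypothetical rows standing in for the VOTING table: (imageID, totalVotes, avgVote).
votes = [
    (1, 1, 5.0),   # a single vote of 5
    (2, 4, 4.0),   # four votes averaging 4
    (3, 10, 3.0),
    (4, 5, 1.0),
]

def bayesian_scores(rows):
    """Apply the formula from the question to every row."""
    avg_num_votes = sum(n for _, n, _ in rows) / len(rows)
    avg_rating = sum(r for _, _, r in rows) / len(rows)
    avg_summand = avg_num_votes * avg_rating  # precalculated once, as in the answer
    return {
        image_id: (avg_summand + n * r) / (avg_num_votes + n)
        for image_id, n, r in rows
    }

scores = bayesian_scores(votes)
best_image = max(scores, key=scores.get)  # same idea as ORDER BY bayesian DESC LIMIT 1
print(best_image, scores)
```

With this sample data the four votes of 4 end up ahead of the single vote of 5. The ordering depends on the table-wide averages, though: the formula pulls sparsely rated images toward avg_rating, so a single 5 can still rank first when avg_rating is high enough.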

Related

SQL Query - selecting higher priority rows more often

I am trying to write SQL for a mysqli query that selects rows with higher priority more often. I have a DB where all posts are sorted by priority, but I want it to select like this (10 is the highest priority):
**Priority**
10
3
10
9
7
10
9
1
10
How can I do this? I have tried several approaches without success. Thank you.
If you want to sample your data with preference to higher priorities, you could do something like this:
SELECT *
FROM (
SELECT OrderDetailID
,mod(OrderDetailID, 10) + 1 AS priority
,rand() * 10 AS rand_priority
FROM OrderDetails
) A
WHERE rand_priority < priority
ORDER BY OrderDetailID
This query runs in MySQL Tryit from W3Schools.
mod(OrderDetailID, 10) + 1 simulates a 1-10 priority - your table just has this value in it already
rand() * 10 gives you a random number between 0 and 10
Then by filtering to only ones where the random number is less than the priority, you get a result set where the higher priorities are more likely.
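The filter can be simulated outside the database to check how often each priority survives. This is a rough Python sketch, assuming priorities 1 to 10 as in the answer; `sample_by_priority` and the fixed seed are illustrative choices only.

```python
import random

def sample_by_priority(priorities, rng):
    """Keep a row when rand() * 10 < priority, mirroring the WHERE clause."""
    return [p for p in priorities if rng.random() * 10 < p]

rng = random.Random(42)  # fixed seed only so the run is repeatable
kept = {p: 0 for p in range(1, 11)}
trials = 2000
for _ in range(trials):
    for p in sample_by_priority(range(1, 11), rng):
        kept[p] += 1

# Share of trials in which each priority was kept, highest priority first.
for p in sorted(kept, reverse=True):
    print(p, kept[p] / trials)
```

A row with priority p passes the filter with probability p/10, so priority 10 rows are always returned and priority 1 rows appear in roughly one result set out of ten.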
You may use the RANK function if your MySQL version supports it. It orders your data by priority in descending order and ranks each row; rows with the same priority get the same rank. You can then filter on the first rank, which always gives you the highest-priority rows.
SELECT *
FROM
(
SELECT
col1,
col2,
priority,
RANK() OVER w AS priority_rank -- RANK is a reserved word in MySQL 8.0, so it is not used as the alias
FROM MyTable
WINDOW w AS (ORDER BY priority DESC)
) MyQuery
WHERE priority_rank = 1
Note: The syntax might be incorrect; please feel free to edit the query.
This post might help you with ranking if your MySQL version doesn't support RANK.

Average value for top n records?

I have this SQL schema: http://sqlfiddle.com/#!9/eb34d
In particular these are the relevant columns for this question:
ut_id,ob_punti
I need to get the average of the TOP n (where n is 4) values of "ob_punti" for each user (ut_id)
This query returns the AVG of all values of ob_punti grouped by ut_id:
SELECT ut_id, SUM(ob_punti), AVG(ob_punti) as coefficiente
FROM vw_obiettivi_2015
GROUP BY ut_id ORDER BY ob_punti DESC
But I can't figure out how to get the AVG for only the TOP 4 values.
Can you please help?
This gives the SUM and AVG of the top 4. Replace 4 with n to get the top n.
select ut_id, SUM(ob_punti), AVG(ob_punti) from (
select @rank:=if(@prev_cat=ut_id,@rank+1,1) as rnk,ut_id,ob_punti,@prev_cat:=ut_id
from Table1,(select @rank:=0, @prev_cat:="")t
order by ut_id, ob_punti desc
) temp
where temp.rnk<=4
group by ut_id;
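The same top-n-per-group logic can be checked against a small data set in Python. This is only a sketch: the sample `rows` are invented, and `top_n_stats` is a hypothetical helper name.

```python
from collections import defaultdict

def top_n_stats(rows, n=4):
    """Return {ut_id: (sum, avg)} over each user's n highest ob_punti values."""
    by_user = defaultdict(list)
    for ut_id, ob_punti in rows:
        by_user[ut_id].append(ob_punti)
    stats = {}
    for ut_id, points in by_user.items():
        top = sorted(points, reverse=True)[:n]  # highest values first, keep n
        stats[ut_id] = (sum(top), sum(top) / len(top))
    return stats

# Hypothetical (ut_id, ob_punti) pairs.
rows = [(1, 10), (1, 8), (1, 6), (1, 4), (1, 2), (2, 5), (2, 3)]
result = top_n_stats(rows)
print(result)
```

User 1 has five values, so the lowest one is dropped before averaging; user 2 has fewer than four, so all of that user's values are used.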
This is not exactly related to the question asked; I am posting it because someone might benefit.
I got a HackerEarth problem asking for a MySQL query that fetches the top 10 records based on the average product quantity available in stock.
SELECT productName, AVG(quantityInStock) AS avg_quantity
FROM products
GROUP BY productName
ORDER BY avg_quantity DESC
LIMIT 10
Note: If someone can improve the above query, feel free to modify it.

Mysql "where rand()" performance

I'm trying to pick random articles from my database, where high rating articles have a higher chance of getting picked
SELECT * FROM articles WHERE RAND()>0.9 ORDER BY rating DESC LIMIT 3
My question is:
Will it evaluate RAND() for the whole table, or only until it finds 3 articles whose random number is higher than 0.9?
If you have INDEX(rating), that query will probably fetch about 30 rows (3/(1-0.9)) before finding the 3, because RAND()>0.9 is true for only about 10% of the rows.
But that does not give you "high rating articles have a higher chance of getting picked" at all. It merely gives you a random 10% of the highest-rated rows.
This might give you what you want, but with a full table scan:
SELECT *
FROM articles
ORDER BY rating * RAND() DESC
LIMIT 3;
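To get a feel for how ORDER BY rating * RAND() favours high ratings, the ordering can be simulated in Python. This is a rough sketch with invented data (ten articles rated 1 to 10); `pick_weighted` is a hypothetical helper, and the seed is fixed only for repeatability.

```python
import random
from collections import Counter

def pick_weighted(articles, k, rng):
    """Sort by rating * random() descending and keep the first k rows."""
    keyed = sorted(articles, key=lambda a: a[1] * rng.random(), reverse=True)
    return keyed[:k]

# Hypothetical (id, rating) pairs: article n has rating n.
articles = [(article_id, float(article_id)) for article_id in range(1, 11)]
rng = random.Random(7)
picks = Counter()
for _ in range(2000):
    for article_id, _rating in pick_weighted(articles, 3, rng):
        picks[article_id] += 1

print(picks.most_common())
```

Every article gets a fresh random factor on each run, so low-rated articles can still be picked, just far less often than high-rated ones.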

MYSQL SELECT random on large table ORDER BY SCORE [duplicate]

I have a large MySQL table with about 25000 rows. There are 5 fields: ID, NAME, SCORE, AGE, SEX.
I need to select 5 random males, ordered by SCORE DESC.
For instance, if there are 100 men that score 60 each and another 100 that score 45 each, the script should return a random 5 from the first 200 men in the list of 25000.
ORDER BY RAND() is very slow.
The real issue is that the 5 men should be a random selection within the first 200 records. Thanks for the help.
To get something like this I would use a subquery. That way RAND() is applied only in the outer query, over 200 rows, which is much less taxing.
From what I understood from your question, you want the 200 males with the highest score, so that would be something like this:
SELECT *
FROM table_name
WHERE sex = 'male'
ORDER BY score DESC
LIMIT 200
Now, to pick 5 random results from those, it would be something like this:
SELECT id, score, name, age, sex
FROM
( SELECT *
FROM table_name
WHERE sex = 'male'
ORDER BY score DESC
LIMIT 200
) t -- could also be written `AS t` or anything else you would call it
ORDER BY RAND()
LIMIT 5
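The two-step idea (narrow to the top 200 first, randomize only those) looks like this in Python. It is a sketch under assumed data: the generated `rows` and the helper `random_from_top` are hypothetical.

```python
import random

def random_from_top(rows, top_n, k, rng):
    """Take the top_n rows by score, then draw k of them at random."""
    top = sorted(rows, key=lambda r: r["score"], reverse=True)[:top_n]
    return rng.sample(top, k)

# Hypothetical data: 25000 male rows with scores cycling through 0..99.
rows = [{"id": i, "sex": "male", "score": i % 100} for i in range(25000)]
chosen = random_from_top(rows, 200, 5, random.Random(1))
print(sorted(r["id"] for r in chosen))
```

Only the 200 pre-selected rows take part in the random draw, which is the point of placing ORDER BY RAND() in the outer query.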
I don't think sorting by a random value can be "optimised out" in any way, as sorting is an N*log(N) operation. The query analyzer normally avoids sorting by using indexes, which cannot help with a random order.
ORDER BY RAND() has to visit every row of your table, assign each one a random number, and sort on it before delivering the results. That takes a lot of processing time for tables of more than about 500 rows, and since your table contains approximately 25000 rows it will certainly take a while.

Select a Portion of Vast Data Over Time with MySQL

I have hundreds of thousands of price points spanning 40 years plus. I would like to construct a query that will only return 3000 total data points, with the last 500 being the most recent data points, and the other 2500 being just a sample of the rest of the data, evenly distributed.
Is it possible to do this in one query? How would I select just a sample of the large amount of data? This is a small example of what I mean for getting just a sample of the other 2500 data points:
1
2
3
4
5
6
7
8
9
10
And I want to return something like this:
1
5
10
Here's the query for the last 500:
SELECT * FROM price ORDER BY time_for DESC LIMIT 500
I'm not sure how to go about getting the sample data from the other data points.
Try this:
(SELECT * FROM price ORDER BY time_for DESC LIMIT 500)
UNION ALL
(SELECT * FROM price WHERE time_for < (SELECT time_for FROM price ORDER BY time_for DESC LIMIT 499, 1) ORDER BY rand() LIMIT 2500)
ORDER BY time_for
Note: It's probably going to be slow. How big is this table?
It might be faster to only get the primary ID from all these rows, then join it to the original in a secondary query once it's narrowed down. This is because ORDER BY rand() LIMIT has to sort the entire table. If the table is large this can take a LONG time, and a lot of disk space. Retrieving only the ID reduces the necessary disk space.
The previous answer is good, but you did specify that you want the results to be evenly distributed, so I'll add this possibility too. By iterating a counter over the rows you can use a MOD operator to sample an even distribution. I don't have a MySQL install right now to test this, so apologies if the syntax isn't 100% spot on. But it should be close enough and may give you some ideas.
( SELECT p1.*, 0 AS row_num
FROM price p1
ORDER BY p1.time_for DESC
LIMIT 500 )
UNION ALL
( SELECT p2.*
FROM ( SELECT p.*, @i := @i + 1 AS row_num
FROM price p,
(SELECT @i := 0) init
ORDER BY p.time_for DESC ) p2
WHERE p2.row_num > 500
AND (p2.row_num % 500) = 0 )
The first query gives the 500 latest rows. The second query gives every 500th row after that, thus returning an even distribution from the rest of the data. Obviously you can tune this parameter to achieve the desired sample spacing. Or base it on the total number of rows in the table to calculate the necessary spacing to give exactly 2500 records.
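Deriving the spacing from the total row count can be sketched in Python as follows. The function name `sample_prices` and its parameters are assumptions made for illustration, and plain integers stand in for time_for values.

```python
def sample_prices(times, latest=500, sampled=2500):
    """Return the `latest` newest points plus up to `sampled` evenly spaced older ones."""
    ordered = sorted(times, reverse=True)             # newest first
    recent, older = ordered[:latest], ordered[latest:]
    count = min(sampled, len(older))                  # cannot pick more than exist
    # Spread `count` indices evenly across the older rows.
    picks = [older[i * len(older) // count] for i in range(count)]
    return recent + picks

points = sample_prices(range(100_000))
print(len(points), points[:3], points[-3:])
```

Computing each index as i * len(older) // count avoids having to choose a fixed step, so the result has exactly the requested number of sampled rows whenever enough older rows exist.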