How would you write SO's Popularity algorithm in MySQL?
The algorithm is detailed here: Popularity algorithm.
thanks!
It's relatively simple.
t = (time of entry post) - (Dec 8, 2005)
You would convert the date values to timestamps (you can use unix_timestamp), which gives you an integer that can be used in the rest of the comparisons.
x = upvotes - downvotes
This one should be pretty easy... obviously MySQL supports subtraction.
y = {1 if x > 0, 0 if x = 0, -1 if x < 0}
z = {1 if x < 1, otherwise x}
For these, take a look at MySQL's case statement.
log(z) + (y * t)/45000
MySQL has logarithm functions (LOG() is the natural logarithm, LOG10() is base 10), so this one should be easy too; it's just simple math.
And, you tie it all together with a select statement. You can store intermediate calculations in your select statement using user-defined variables. For example:
select @x := (upvotes - downvotes) as x,
(@x > 4) as isXGreaterThanFour
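To check the arithmetic before translating it into SQL, the steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `popularity` is hypothetical, the reference date is taken as midnight UTC on Dec 8, 2005, the logarithm is assumed to be base 10, and z is clamped to 1 whenever x < 1 so that the logarithm stays defined at x = 0.

```python
import math
from datetime import datetime, timedelta, timezone

# Assumption: the reference date is midnight UTC on Dec 8, 2005.
EPOCH = datetime(2005, 12, 8, tzinfo=timezone.utc)

def popularity(upvotes, downvotes, posted_at):
    """Hypothetical helper mirroring the steps t, x, y, z described above."""
    t = (posted_at - EPOCH).total_seconds()   # seconds since the reference date
    x = upvotes - downvotes                   # net votes
    y = (x > 0) - (x < 0)                     # sign of x: 1, 0 or -1
    z = x if x >= 1 else 1                    # assumption: clamp so log(z) is defined
    return math.log10(z) + (y * t) / 45000    # assumption: base-10 logarithm

if __name__ == "__main__":
    when = EPOCH + timedelta(seconds=45000)
    print(popularity(10, 0, when))
```

In SQL, the same sign and clamp steps would map onto CASE expressions, as suggested above.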
Related
I would like to calculate the entropy of a list in mysql.
Now I run this and move to python:
select group_concat(first_name), last_name
from table
group by last_name
What I am looking for would be the equivalent of
entropy(first_name)
Returning a single number for each.
Similar to the below usage for numericals:
std(age)/avg(age)
EDIT - Partially answered: Thank you to commenter @IVO GELOV for a very efficient approximation:
SELECT LOG2(COUNT(DISTINCT column)) FROM Table
Based on the solution above and an approximation of the t-test, we arrive at a comparative weighted entropy. Hacky, but works like a charm:
CASE
WHEN count(*)-1 < 6 THEN (1 + LOG2(COUNT(distinct first_name)))*5.61*power(count(*)-1,-0.71)
WHEN count(*)-1 >= 6 and count(*)-1 < 27 THEN (1 + LOG2(COUNT(distinct first_name)))*2.2*power(count(*)-1,-0.081)
ELSE (1 + LOG2(COUNT(distinct first_name)))*1.815*power(count(*)-1,-0.02)
END as entropy
Defined for rows with count(*) > 1
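For comparison, here is a rough Python sketch of what the LOG2(COUNT(DISTINCT ...)) approximation stands in for. The function names are hypothetical. The Shannon entropy of a list depends on how often each value occurs, while log2 of the distinct count is the largest value that entropy can take, reached only when every distinct value is equally frequent.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def log2_distinct(values):
    """The approximation: log2 of the number of distinct values (non-empty input)."""
    return math.log2(len(set(values)))

if __name__ == "__main__":
    names = ["Ann", "Ann", "Ann", "Bob"]
    print(shannon_entropy(names), log2_distinct(names))
```

For a skewed list such as the one above, the approximation overestimates the entropy; the two agree when all values occur equally often.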
I have around 500 Excel sheets in .csv format with data captured for my experiment, with the following columns in place.
Now I need to calculate the following parameters from this data. I have done these in Excel, but repeating this for every sheet is tedious, so I want to write an SQL query in phpMyAdmin, which would save some time.
Last character typed - need to capture the last character from the column 'CharSq'
Slope (in column J) =(B3-B2)/(A3-A2)
Intercept (in column K) =B2-(A2*(J3))
Angle (in degrees) =MOD(DEGREES(ATAN2((A3-A2),(B3-B2))), 360)
Index of Difficulty =LOG(((E1/7.1)+1),2)
Speed Value length (if speed value length >3, then mark as 1 or else 0) =IF(LEN(D3) >= 3, "1","0")
Wrong Sequence (if I3=I2, then mark search time, else actual time) =IF(I3=I2,"Search Time","Actual Time")
Mark character into (1,2,3) =IF(I2="A",1, IF(I2="B",2, IF(I2="C",3, 0)))
I have started with this SQL query
SELECT id, type, charSq, substr(charSq,-1,1) AS TypedChar, xCoordinate, yCoordinate, angle, distance, timestamp, speed FROM table 1 WHERE 1
Need help for the rest of the parameters. Thanks.
Note - I am going to run this in phpMyAdmin SQL
CREATE TABLE test.Table10
SELECT
    mm.myid,
    mm.id,
    mm.type1 AS GESTURE,
    mm.CHARSQ,
    mm.TYPE2 AS TYPEDCHAR,
    mm.MYCHAR,
    mm.XCOR,
    mm.YCOR,
    mm.SLOPE,
    l4 - (l2 * SLOPE) AS Intercept,
    IF(ANGLE1 < 0, ANGLE1 + 360, ANGLE1) AS ANGLE0,
    mm.DISTANCE,
    mm.DW,
    mm.INDDIFF,
    mm.TIME1,
    mm.SPEED,
    mm.SPDFILT,
    mm.TIMETYPE
FROM (
    SELECT
        c11.*,
        (YCOR - l4) / (XCOR - l2) AS SLOPE,
        MOD(DEGREES(ATAN2(YCOR - l4, XCOR - l2)), 360) AS ANGLE1,
        (YCOR - l4) / (XCOR - l2) AS ATT,
        LOG2(DW + 1) AS INDDIFF,
        IF(TYPE2 = LAG(TYPE2) OVER (PARTITION BY MYID ORDER BY ID),
           "Search Time", "Actual Time") AS TIMETYPE,
        CASE
            WHEN TYPE2 = "A" THEN "1"
            WHEN TYPE2 = "B" THEN 2
            WHEN TYPE2 = "C" THEN 3
            ELSE 0
        END AS MYCHAR
    FROM (
        SELECT
            b.*,
            LEAD(XCOR) OVER (PARTITION BY charsq) AS l1,
            LAG(XCOR) OVER (PARTITION BY MYID ORDER BY ID) AS l2,
            LEAD(YCOR) OVER (PARTITION BY MYID) AS l3,
            LAG(YCOR) OVER (PARTITION BY MYID ORDER BY ID) AS l4,
            distance / 7.1 AS DW,
            IF(LENGTH(speed) >= 3, "1", "0") AS SPDFILT,
            RIGHT(charSq, 1) AS TYPE2
        FROM test.table2 b
    ) c11
) mm
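As a cross-check on the per-row arithmetic, the slope, intercept, angle and index of difficulty can be sketched in Python for one pair of consecutive points. The function name `row_metrics` is hypothetical, and the sketch assumes that columns A and B hold the x and y coordinates and column E holds the distance, which is how the query above appears to use them.

```python
import math

def row_metrics(prev, curr, distance):
    """Hypothetical helper: metrics for the step from point `prev` to point `curr`."""
    (x0, y0), (x1, y1) = prev, curr
    dx, dy = x1 - x0, y1 - y0
    # Slope is undefined for a vertical step, so return None instead of dividing by zero.
    slope = dy / dx if dx != 0 else None
    # Intercept uses the previous point and the current slope, as in the spreadsheet formula.
    intercept = y0 - x0 * slope if slope is not None else None
    # Python's % with a positive modulus already yields a value in [0, 360).
    angle = math.degrees(math.atan2(dy, dx)) % 360
    index_of_difficulty = math.log2(distance / 7.1 + 1)
    return slope, intercept, angle, index_of_difficulty

if __name__ == "__main__":
    print(row_metrics((0, 0), (1, 1), 7.1))
```

Comparing a few rows computed this way against the spreadsheet values is a quick way to confirm the SQL output.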
I have the following query, where I'm trying to retrieve matches within a certain tolerance of the variables entered.
SELECT fthg, ftag, avover, avunder, whh, wha, whd
FROM full
WHERE (whh < ($home_odds + 0.05)
AND whh > ($home_odds - 0.05)
AND wha < ($away_odds + 0.05)
AND wha > ($away_odds - 0.05)
AND whd < ($draw_odds + 0.05)
AND whd > ($draw_odds - 0.05))
There are occasions where this returns 0 results, so in that case I would like to retrieve the record that most closely matches all three values, but I'm not quite sure how to put the query together.
Basically this is the last resort if the other query doesn't return results, this one will return the next best thing no matter how far from the original values.
Thanks for the help
Your original query would be simpler and more readable as this:
SELECT
fthg,
ftag,
avover,
avunder,
whh,
wha,
whd
FROM full
WHERE ABS($home_odds - whh) < 0.05
and ABS($away_odds - wha) < 0.05
and ABS($draw_odds - whd) < 0.05
If that query returns nothing, you could run this one:
SELECT
fthg,
ftag,
avover,
avunder,
whh,
wha,
whd
FROM full
ORDER BY
ABS($home_odds - whh) + ABS($away_odds - wha) + ABS($draw_odds - whd)
LIMIT 1
It will return the row with the lowest deviation from the combination of those three pairs of fields.
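The two-step logic (exact window first, nearest row as a fallback) can be sketched in Python to make the ranking explicit. The function name `find_matches` and the list-of-dicts row format are assumptions for illustration; only the column names whh, wha and whd come from the query.

```python
def find_matches(rows, home, away, draw, tol=0.05):
    """Hypothetical helper: rows within `tol` of all three odds, else the single nearest row."""
    def deviation(r):
        # Sum of absolute differences, the same quantity used in the ORDER BY above.
        return abs(home - r["whh"]) + abs(away - r["wha"]) + abs(draw - r["whd"])

    within = [
        r for r in rows
        if abs(home - r["whh"]) < tol
        and abs(away - r["wha"]) < tol
        and abs(draw - r["whd"]) < tol
    ]
    if within:
        return within
    # Fallback: the closest row, however far it is from the requested values.
    return [min(rows, key=deviation)] if rows else []

if __name__ == "__main__":
    data = [
        {"whh": 2.0, "wha": 3.5, "whd": 3.2},
        {"whh": 1.5, "wha": 6.0, "whd": 4.0},
    ]
    print(find_matches(data, 1.0, 7.0, 5.0))
```

Summing the three deviations treats each odds column equally; a weighted sum would be a reasonable variation if one column matters more.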
How about faking a distance calculation between the parameters you provide and the parameters you are comparing to? Something like
SELECT fthg, ftag, avover, avunder, whh, wha, whd
FROM full
ORDER BY
sqrt(abs(whh - $home_odds) * abs(whh - $home_odds)) +
sqrt(abs(wha - $away_odds) * abs(wha - $away_odds)) +
sqrt(abs(whd - $draw_odds) * abs(whd - $draw_odds))
This way, even if there are no matches given the range you are interested in, you can still get a closer result.
I came across a mysql query that looks like this:
SELECT
SUM(some_amount*(some_field!=90)*(some_date < '2011-04-22'))
, SUM(some_amount*(some_field =90)*(some_date < '2011-04-22')*(another_field IS NULL))
FROM
some_table
What does the * mean in the select statement in this case?
Looks like CAST() is not necessary for boolean-to-integer conversions. Multiplication is used to convert the sum to 0 for unwanted rows (using the fact that boolean true can be cast to 1 and false to 0):
some_amount*(some_field!=90)*(some_date < '2011-04-22')
If some_field = 90 or some_date >= '2011-04-22', the corresponding factor evaluates to 0, which makes the entire product 0 for that row.
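The same trick can be reproduced in Python, where True and False also behave as 1 and 0 in arithmetic. This is a small sketch of the first SUM only; the function name `conditional_sum` and the sample rows are made up for illustration, and dates are assumed to be ISO-formatted strings so that string comparison matches date order.

```python
def conditional_sum(rows, cutoff="2011-04-22"):
    """Sum some_amount over rows where some_field != 90 and some_date < cutoff."""
    return sum(
        # Each comparison yields True (1) or False (0); any False zeroes the term.
        r["some_amount"] * (r["some_field"] != 90) * (r["some_date"] < cutoff)
        for r in rows
    )

if __name__ == "__main__":
    sample = [
        {"some_amount": 10, "some_field": 90, "some_date": "2011-04-01"},
        {"some_amount": 20, "some_field": 5, "some_date": "2011-04-01"},
        {"some_amount": 30, "some_field": 5, "some_date": "2011-05-01"},
    ]
    print(conditional_sum(sample))
```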
It is a multiplication operation.
For example, 2*3=6.
It's a standard multiplication operator,
select 2 * 2
= 4
:)
SELECT COUNT(*) FROM planets
WHERE ROUND(SQRT(POWER(('71' - coords_x), 2) +
POWER(('97' - coords_y), 2))) <= 17
==> 51
SELECT COUNT(*) FROM planets
WHERE ROUND(SQRT(POWER((71 - coords_x), 2) +
POWER((97 - coords_y), 2))) <= 17
==> 22
coords_x and coords_y are both TINYINT fields containing values in the range [1, 100]. Usually MySQL doesn't care if numbers are quoted or not.. but apparently it does in this case. The question is: Why?
I am a bit rusty on the innards of MySQL, but <= on strings uses lexicographical ordering instead of numeric ordering, i.e., '150' < '17'.
The implicit conversion from string to floating-point number is probably causing inaccurate results. See: Type Conversion in Expression Evaluation