I want to implement a 'logarithmic' score decay based on aging, and I'm trying to figure out the SUM/LOG combination. Here is a simplified version of the current query:
SELECT SUM(1) as total_score FROM posts
JOIN votes ON votes.post_id = posts.post_id
WHERE 1
GROUP BY post_id
ORDER BY total_score DESC
I'm currently doing SELECT 'SUM(1) as total_score', but I want to modify the query to take the date/age of the vote into consideration, where a vote from today weighs 1, a vote from 15 days ago weighs close to .8, and a vote from 30 days ago close to 0. I'm storing the date field on the votes table (vote_date) as a unix_timestamp.
I'm not really concerned about the WHERE clause; that's pretty straightforward. What I'm trying to figure out is the logarithmic aging part.
I think there are two parts to the answer: first the weighting function, then the SQL implementation.
Weighting function:
According to your graph, you don't want a log weight but rather a parabolic one.
From this you have to solve
Xc = y
where
X = [1 1 1 ;
15^2 15 1;
30^2 30 1];
and
y = [1;.8;0];
you get c = X^(-1)y or in matlab
c = X\y
Now you have the appropriate coefficients of the quadratic function you depicted, namely y = ax^2 + bx + c with (a, b, c) = (-.0013, .0073, .9941).
SQL part:
Your SELECT statement should look like this (assuming the column of interest is named "age"):
SELECT (-.0013*age*age + .0073*age + .9941) as age_weighted
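If it helps, here is one way that expression could be folded back into your original query. This is just a sketch: it assumes vote_date is the unix-timestamp column from your question, derives age in whole days with DATEDIFF, and restricts votes to the last 30 days so the parabola never goes negative.
SELECT posts.post_id,
SUM(
-- age in days, derived from the unix timestamp in vote_date
-.0013 * POW(DATEDIFF(CURRENT_DATE, FROM_UNIXTIME(votes.vote_date)), 2)
+ .0073 * DATEDIFF(CURRENT_DATE, FROM_UNIXTIME(votes.vote_date))
+ .9941
) AS total_score
FROM posts
JOIN votes ON votes.post_id = posts.post_id
WHERE votes.vote_date > UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
GROUP BY posts.post_id
ORDER BY total_score DESC;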
Hope it helps
Cheers
Here's the complete Matlab code (also to double-check the solution):
X = [1 1 1 ;
15^2 15 1;
30^2 30 1];
y = [1;.8;0];
c = X\y;
x= (1:30)';
y = [x.^2 x ones(30,1)]*c;
figure(1)
clf;hold on
plot(x,y)
plot([1 15 30],[1 .8 0],'o')
Suppose you have a function WEIGHT(age) that gives the weight of a vote that's age days old.
Then your query would be
SELECT SUM(WEIGHT(DATEDIFF(CURRENT_DATE, votes.date_vote_cast))) as total_score,
posts.post_id
FROM posts
JOIN votes ON votes.post_id = posts.post_id
WHERE votes.date_vote_cast <= CURRENT_DATE
AND votes.date_vote_cast > CURRENT_DATE - INTERVAL 30 DAY
GROUP BY post_id
ORDER BY total_score DESC
I am afraid I don't know exactly what function you want for WEIGHT(age). But you do, and you can work it out.
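If you do want WEIGHT() to exist as a real function rather than an inline expression, one option is a MySQL stored function. The sketch below is only an illustration: it uses a simple linear decay (1 at age 0, 0 at age 30) as a placeholder for whatever curve you settle on, so you would swap the RETURN expression accordingly. Note also that since the question stores vote_date as a unix timestamp, the DATEDIFF above would need FROM_UNIXTIME(votes.vote_date) rather than a plain date column.
DELIMITER //
CREATE FUNCTION WEIGHT(age INT) RETURNS DECIMAL(10,4)
DETERMINISTIC
BEGIN
-- placeholder linear decay: 1 today, 0 at 30 days, never negative
RETURN GREATEST(0, 1 - age / 30);
END//
DELIMITER ;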
I haven't done the SQL part, but I found a function that will provide the decay you are after, mathematically at least:
y=(sqrt(900-(x^2)))/30
or in your case
score=(sqrt(900-(days^2)))/30
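For what it's worth, a rough sketch of that dropped into the original query (assuming vote_date is the unix timestamp from the question, and capping the age at 30 days so the square root stays real):
SELECT posts.post_id,
SUM(SQRT(900 - POW(LEAST(DATEDIFF(CURRENT_DATE, FROM_UNIXTIME(votes.vote_date)), 30), 2)) / 30) AS total_score
FROM posts
JOIN votes ON votes.post_id = posts.post_id
GROUP BY posts.post_id
ORDER BY total_score DESC;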
Hope it can help!
Related
I have a stock table and a stock history table, and I am basically trying to write a MySQL statement that will get the value of the stock on a particular day (in this case the 31st of March). That value can only be found by multiplying the cost per unit by what the balance of each item was on that particular day.
So far I have :
SELECT
SUM(tbl_stock.cost_per_unit * tbl_stock_history.quantity_balance) as total
FROM
tbl_stock
LEFT JOIN
tbl_stock_history ON tbl_stock.part_ID = tbl_stock_history.part_ID
WHERE
tbl_stock_history.date_of_entry <= '20180331'
and tbl_stock.department = 1
AND tbl_stock.qty > 0
Unfortunately, this code takes the sum of ALL qty_balances found against the part ID's history instead of just the most recent one against the booking_date parameter.
I have tried all the solutions I could find with sub select queries but none of them were playing ball and I feel like I am missing something super obvious!
Any help is greatly appreciated!
I think that this is what you are looking for:
SELECT
SUM(tbl_stock.cost_per_unit * t.quantity_balance) as total
FROM
tbl_stock
LEFT JOIN
(
SELECT h.part_ID, h.quantity_balance
FROM tbl_stock_history h
WHERE h.date_of_entry = (
-- most recent history entry per part on or before the date in question
SELECT MAX(h2.date_of_entry)
FROM tbl_stock_history h2
WHERE h2.part_ID = h.part_ID
AND h2.date_of_entry <= '20180331'
)
)
t on tbl_stock.part_ID = t.part_ID
WHERE tbl_stock.department = 1
AND tbl_stock.qty > 0
I'm attempting to create a Reddit-style score degradation system for entries on a system. I've got a MySQL view set up to calculate the total "Score" (sum of all up/down votes). I'm having trouble creating a simple but effective system for moving entries down the page (so that newer entries end up at the top, but a high score can move entries to the top that would otherwise have aged off)...
Here's the closest bit of SQL I've been able to create thus far:
(SUM(v.Score) - (TIMESTAMPDIFF(MINUTE, t.Genesis, NOW()) *
IF(TIMESTAMPDIFF(MINUTE, t.Genesis, NOW()) > 1440,
0.1, 0.003))
) as "Weight",
v.Score is a 1 or a -1 dependent on user votes. t.Genesis is the timestamp on the entry itself.
Any help or suggestions would be appreciated.
One solution could be to use a sort of exponential decay for the relevance of time as a ranking parameter. For example:
SELECT
article, ranking
FROM (
SELECT
article,
(upvotes + downvotes) AS Total,
(upvotes - downvotes) AS Score,
(EXP(-(Published - Genesis) * Constant / 86400)
* ((upvotes - downvotes) / (upvotes + downvotes))) AS ranking
FROM Table
) t
ORDER BY ranking DESC
Where Published is the time of publishing, Genesis is some really early date, and Constant is a scaling factor that determines how quickly the time weight decays:
For example: if you want to give all posts only a very small score advantage after 7 days from now (say 0.1), then -ln(0.1) / 7 is your Constant.
Score / Total gives the average rating rather than the absolute value, and 86400 is one day in seconds (assuming that's how you're measuring your time).
Once again, apologies for my lack of knowledge of SQL functions; I know that EXP is definitely available, and the time difference calculation can be adjusted to give the difference in seconds.
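As a quick sanity check on that constant (not part of the original formula, just MySQL's LN() applied to it):
SELECT -LN(0.1) / 7 AS decay_constant; -- roughly 0.329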
You can implement the same ranking algorithm as Hacker News:
Implementing the Hacker News ranking algorithm in SQL
@OMG Ponies' solution:
SELECT x.*
FROM POSTS x
JOIN (SELECT p.postid,
SUM(v.vote) AS points
FROM POSTS p
JOIN VOTES v ON v.postid = p.postid
GROUP BY p.postid) y ON y.postid = x.postid
ORDER BY (y.points - 1)/POW(((UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(x.timestamp))/3600)+2, 1.5) DESC
LIMIT n
x.timestamp is your t.Genesis, v.vote is your v.Score
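Purely as a sketch, that mapping applied to the query above; the table and id column names here (entries, votes, entry_id) are hypothetical, since the question doesn't show them:
SELECT t.entry_id, -- hypothetical table and column names
(SUM(v.Score) - 1)
/ POW(((UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(t.Genesis)) / 3600) + 2, 1.5) AS Weight
FROM entries t
JOIN votes v ON v.entry_id = t.entry_id
GROUP BY t.entry_id, t.Genesis
ORDER BY Weight DESC;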
I have an existing SQL query that gets call stats from a Zultys MX250 phone system: -
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS '#Calls'
FROM
session s
JOIN mxuser u ON
s.ExtensionID1 = u.ExtensionId
OR s.ExtensionID2 = u.ExtensionId
WHERE
s.ServiceExtension1 IS NULL
AND s.connecttimestamp >= CURRENT_DATE
AND BINARY u.userprofilename = BINARY 'DBAM'
GROUP BY
u.firstname,
u.lastname
ORDER BY
'#Calls' DESC,
Duration DESC;
Output is as follows: -
Name Duration #Calls
TH 01:19:10 30
AS 00:44:59 28
EW 00:51:13 22
SH 00:21:20 13
MG 00:12:04 8
TS 00:42:02 5
DS 00:00:12 1
I am trying to generate a 4th column that shows the average call time for each user, but am struggling to figure out how.
Mathematically it's just "'Duration' / '#Calls'", but after looking at some similar questions on StackOverflow, the example queries are too simple to help me relate them to mine above.
Right now, I'm not even sure that it's going to be possible to divide the time column by the number of calls.
UPDATE: I was so close in my testing but got all confused & overcomplicated things. Here's the latest SQL (thanks to @McAdam331 & my buddy Jim from work): -
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS '#Calls',
sec_to_time(SUM(time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)) / COUNT(*)) AS Average
FROM
session s
JOIN mxuser u ON
s.ExtensionID1 = u.ExtensionId
OR s.ExtensionID2 = u.ExtensionId
WHERE
s.ServiceExtension1 IS NULL
AND s.connecttimestamp >= CURRENT_DATE
AND BINARY u.userprofilename = BINARY 'DBAM'
GROUP BY
u.firstname,
u.lastname
ORDER BY
Average DESC;
Output is as follows: -
Name Duration #Calls Average
DS 00:14:25 4 00:03:36
MG 00:17:23 11 00:01:34
TS 00:33:38 22 00:01:31
EW 01:04:31 43 00:01:30
AS 00:49:23 33 00:01:29
TH 00:43:57 35 00:01:15
SH 00:13:51 12 00:01:09
Well, you're already able to get the total number of seconds, as you do before converting it to time. Why not take that total number of seconds, divide it by the number of calls, and then convert that back to time?
SELECT sec_to_time(
SUM(time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)) / COUNT(*))
AS averageDuration
If I understand correctly, you can just replace sum() with avg():
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS `#Calls`,
sec_to_time(AVG(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS AvgDuration
Seems like all you need is another expression in the SELECT list. The SUM() aggregate (from the second expression) divided by COUNT aggregate (the third expr). Then wrap that in a sec_to_time function. (Unless I'm totally missing the question.)
Personally, I'd use the TIMESTAMPDIFF function to get a difference in times.
SEC_TO_TIME(
SUM(TIMESTAMPDIFF(SECOND,s.connecttimestamp,s.disconnecttimestamp))
/ COUNT(*)
) AS avg_duration
If what you are asking is whether there's a way to reference other expressions in the SELECT list by their alias... the answer is, unfortunately, that there's not.
With a performance penalty, you could use your existing query as an inline view, then in the outer query, the alias names assigned to the expressions are available...
SELECT t.Name
, SEC_TO_TIME(s.TotalDur) AS Duration
, s.`#Calls`
, SEC_TO_TIME(s.TotalDur/s.`#Calls`) AS avgDuration
FROM (
SELECT CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name
, SUM(TIMESTAMPDIFF(SECOND,s.connecttimestamp,s.disconnecttimestamp)) AS TotalDur
, COUNT(1) AS `#Calls`
FROM session s
-- the rest of your query
) t
FIRST: This question is NOT a duplicate. I have asked this on here already and it was closed as a duplicate. While it is similar to other threads on stackoverflow, it is actually far more complex. Please read the post before assuming it is a duplicate:
I am trying to calculate variable moving averages crossover with variable dates.
That is: I want to prompt the user for 3 values and 1 option. The input is through a web front end so I can build/edit the query based on input or have multiple queries if needed.
X = 1st moving average term (N day moving average. Any number 1-N)
Y = 2nd moving average term. (N day moving average. Any number 1-N)
Z = Amount of days back from present to search for the occurrence of:
option = Over/Under: (> or <. X passing over Y, or X passing Under Y)
X day moving average passing over OR under Y day moving average
within the past Z days.
My database is structured:
tbl_daily_data
id
stock_id
date
adj_close
And:
tbl_stocks
stock_id
symbol
I have a btree index on:
daily_data(stock_id, date, adj_close)
stock_id
I am stuck on this query and having a lot of trouble writing it. If the variables were fixed it would seem trivial, but because X, Y, Z are all 100% independent of each other (one could look, for example, for a 5 day moving average within the past 100 days, or a 100 day moving average within the past 5) I am having a lot of trouble coding it.
Please help! :(
Edit: I've been told some more context might be helpful?
We are creating an open stock analytic system where users can perform trend analysis. I have a database containing 3500 stocks and their price histories going back to 1970.
This query will be running every day in order to find stocks that match certain criteria
for example:
10 day moving average crossing over 20 day moving average within 5
days
20 day crossing UNDER 10 day moving average within 5 days
55 day crossing UNDER 22 day moving average within 100 days
But each user may be interested in a different analysis so I cannot just store the moving average with each row, it must be calculated.
I am not sure if I fully understand the question ... but something like this might help you get where you need to go: sqlfiddle
SET @X := 5;
SET @Y := 3;
SET @Z := 25;
SET @option := 'under';
select * from (
SELECT stock_id,
datediff(current_date(), date) days_ago,
adj_close,
(
SELECT
AVG(T2.adj_close) AS moving_average
FROM
tbl_daily_data T2
WHERE
T2.stock_id = T1.stock_id
AND (
SELECT
COUNT(*)
FROM
tbl_daily_data T3
WHERE
T3.stock_id = T1.stock_id
AND T3.date BETWEEN T2.date AND T1.date
) BETWEEN 1 AND @X
) move_av_1,
(
SELECT
AVG(T2.adj_close) AS moving_average
FROM
tbl_daily_data T2
WHERE
T2.stock_id = T1.stock_id
AND (
SELECT
COUNT(*)
FROM
tbl_daily_data T3
WHERE
T3.stock_id = T1.stock_id
AND T3.date BETWEEN T2.date AND T1.date
) BETWEEN 1 AND @Y
) move_av_2
FROM
tbl_daily_data T1
where
datediff(current_date(), date) <= @Z
) x
where
case when @option = 'over' and move_av_1 > move_av_2 then 1 else 0 end +
case when @option = 'under' and move_av_2 > move_av_1 then 1 else 0 end > 0
order by stock_id, days_ago
Based on answer by @Tom H here: How do I calculate a moving average using MySQL?
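For example, the first case from the question ("10 day moving average crossing over 20 day moving average within 5 days") would map onto the variables like this, with the query itself unchanged:
SET @X := 10; -- 1st moving average term
SET @Y := 20; -- 2nd moving average term
SET @Z := 5; -- days back from present to search
SET @option := 'over';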
I am trying to have a go at the infamous repeating events on calendars using PHP/MySQL. I've finally found something that seems to work. I found my answer here but I'm having a little difficulty finishing it off.
My first table 'events'.
ID NAME
1 Sample Event
2 Another Event
My second table, 'events_meta', stores the repeating data.
ID event_id meta_key meta_value
1 1 repeat_start 1336312800 /* May 7th 2012 */
2 1 repeat_interval_1 432000 /* 5 days */
With repeat_start being a date with no time as a unix timestamp, and repeat_interval an amount in seconds between intervals (432000 is 5 days).
I then have the following MySQL which I modified slightly from the above link. The timestamp used below (1336744800, which is 12th May 2012) is the current day with no time.
SELECT EV.*
FROM `events` EV
RIGHT JOIN `events_meta` EM1 ON EM1.`event_id` = EV.`id`
RIGHT JOIN `events_meta` EM2 ON EM2.`meta_key` = CONCAT( 'repeat_interval_', EM1.`id` )
WHERE EM1.meta_key = 'repeat_start'
AND (
( CASE ( 1336744800 - EM1.`meta_value` )
WHEN 0
THEN 1
ELSE ( 1336744800 - EM1.`meta_value` ) / EM2.`meta_value`
END
)
) = 1
In the above MySQL, the following code deducts the repeat_start field (EM1.'meta_value') from the current date and then divides it by the repeat interval field (EM2.'meta_value').
ELSE ( 1336744800 - EM1.`meta_value` ) / EM2.`meta_value`
OR
(TODAY'S DATE - START DATE) / 5 DAYS
So here's the maths:
1336744800 - 1336312800 = 432000
432000 / 432000 = 1
Now that works perfectly. But if I change the current timestamp 5 days ahead to 1337176800, which is 17th May 2012, it looks a bit like this:
1337176800 - 1336312800 = 864000
864000 / 432000 = 2
Which doesn't work because it equals 2, and in the MySQL it needs to equal 1. So I guess my question is, how do I get the MySQL to recognise any whole number rather than having to do this?
...
WHERE EM1.meta_key = 'repeat_start'
AND (
( CASE ( 1336744800 - EM1.`meta_value` )
WHEN 0
THEN 1
ELSE ( 1336744800 - EM1.`meta_value` ) / EM2.`meta_value`
END
)
) = IN (1,2,3,4,5,6,7,8,....)
Hope I'm making sense and I hope it's just a simple maths thing or a function that MySQL has that will help :) Thanks for your help!
EDIT: THE ANSWER
Thanks to @eggypal below, I found my answer and of course it was simple!
SELECT EV.*
FROM elvanto_calendars_events AS EV
RIGHT JOIN elvanto_calendars_events_meta AS EM1 ON EM1.`event_id` = EV.`id`
RIGHT JOIN elvanto_calendars_events_meta AS EM2 ON EM2.`meta_key` = CONCAT( 'repeat_interval_', EM1.`id` )
WHERE EM1.meta_key = 'repeat_start'
AND ( ( 1336744800 - EM1.`meta_value` ) % EM2.`meta_value`) = 0
It's not entirely clear what you want your query to do, but the gist of your question makes me lean toward suggesting that you look into modular arithmetic: in SQL, a % b returns the remainder when a is divided by b - if there is no remainder (i.e. a % b = 0), then a must be an exact multiple of b.
In your case, I think you're trying to find events where the time between the event start and some given literal is an exact multiple of the event interval: that is, (literal - event_start) % event_interval = 0. If it's non-zero, the value is the time since the most recent occurrence on or before literal (and, therefore, to determine whether an occurrence falls within some period of time leading up to literal, say a day, one would test whether the remainder is less than that constant, e.g. (literal - event_start) % event_interval < 86400).
If this isn't what you're after, please clarify exactly what your query is trying to achieve.
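Concretely, with the numbers from the question above, a minimal check looks like this:
SELECT (1336744800 - 1336312800) % 432000 AS remainder; -- 0, so the event repeats on 12th May 2012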
SET @dat_ini = '2023-05-20', @dat_fim = '2022-11-20';
SELECT (DATEDIFF(@dat_fim, @dat_ini)) % 60
Then check whether this is < 10.
It only works for a short period. To make it work, take the start date, change the month shown on the screen and add a year, then subtract it from the start date; then it works.