MySQL: Calculate intermediate value at a given timestamp (linear interpolation)

Given the following table, I want to select the value of each ID at exactly 00:00:00. When there's an entry at this exact time, return it, otherwise calculate it with linear interpolation (an imaginary graph line between the nearest values before and after 00:00:00). If there's no value after the given time yet, return the last value, or use linear interpolation from the last two points.
ID|Timestamp|Value
1|2015-01-01 23:00:00|90
1|2015-01-02 01:00:00|110
2|2015-01-01 23:00:00|210
2|2015-01-02 01:00:00|190
3|2015-01-02 00:00:00|50
4|2015-01-01 23:00:00|100
5|2015-01-01 22:00:00|80
5|2015-01-01 23:00:00|90
Result:
ID|Value
1|100
2|200
3|50
4|100
5|100
Is this possible with MySQL only and how?

First, let's split it into single entries like #3 versus double entries (the rest):
( SELECT ID, value FROM tbl GROUP BY ID HAVING COUNT(*) = 1 )
UNION ALL
( ... ) -- This needs interpolation code, below
ORDER BY ID;
Separating the pair of rows is tricky since there is no good way to do "groupwise-max". So, instead, I will work with @variables and walk through the table in order.
Rounding to the nearest midnight might be ROUND(... / 86400) * 86400. The potential problem is the time_zone you are in. I don't feel like fixing that.
SELECT ID, val FROM (
SELECT ID,
IF(ID != @prevID, '1st', '2nd') AS picker, -- See WHERE filter, below
@ts := UNIX_TIMESTAMP(timestamp), -- seconds since the epoch, so the arithmetic below works on integers
@dts := @ts - @prev_ts,
@dval := value - @prev_val,
@midnight := ROUND(@ts / 86400) * 86400, -- DST issues?
value + (@midnight - @ts) * (@dval / @dts) AS val, -- interpolate
@prevID := ID,
@prev_ts := @ts,
@prev_val := value
FROM ( SELECT @prevID := 0, @prev_ts := 0, @prev_val := 0 ) AS Initialize
JOIN tbl
ORDER BY ID, timestamp
) AS x
WHERE picker = '2nd'
ORDER BY ID
Put that monster as the second part of the UNION.
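To check what the query should return, the rule from the question can be sketched outside the database. This is a minimal Python sketch, not the answer's method: the function name value_at and the per-ID grouping are assumptions made for illustration, and the sample rows are the ones from the question.

```python
from collections import defaultdict
from datetime import datetime


def value_at(points, target):
    """Value at `target` for one ID; `points` is a list of (timestamp, value)."""
    points = sorted(points)
    before = [p for p in points if p[0] <= target]
    after = [p for p in points if p[0] > target]
    if before and before[-1][0] == target:
        return before[-1][1]                         # exact entry at the target time
    if before and after:
        (t0, v0), (t1, v1) = before[-1], after[0]    # nearest points around the target
    elif len(before) >= 2:
        (t0, v0), (t1, v1) = before[-2], before[-1]  # extend the line through the last two points
    elif before:
        return before[-1][1]                         # only one earlier point: return it
    else:
        return None                                  # nothing at or before the target
    frac = (target - t0) / (t1 - t0)                 # timedelta / timedelta gives a float
    return v0 + frac * (v1 - v0)


rows = [
    (1, "2015-01-01 23:00:00", 90), (1, "2015-01-02 01:00:00", 110),
    (2, "2015-01-01 23:00:00", 210), (2, "2015-01-02 01:00:00", 190),
    (3, "2015-01-02 00:00:00", 50),
    (4, "2015-01-01 23:00:00", 100),
    (5, "2015-01-01 22:00:00", 80), (5, "2015-01-01 23:00:00", 90),
]
by_id = defaultdict(list)
for id_, ts, value in rows:
    by_id[id_].append((datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), value))

target = datetime(2015, 1, 2, 0, 0, 0)
result = {id_: value_at(points, target) for id_, points in sorted(by_id.items())}
print(result)
```

For the sample data this gives 100, 200, 50, 100 and 100 for IDs 1 to 5, matching the expected result in the question.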

Related

Getting formatted result in mysql table

This is my input table
but I want to get this table
Explanation:
I want to subtract the value column by SEGMENTED_DATE: the value for 14/10/22 minus the value for 7/10/22, that is (28930-28799).
How can I get this? Kindly help me figure it out. I can't format it properly.
The segment table is created by the query below:
select segment ,count(distinct user_id)as value,SEGMENTED_DATE from weekly_customer_RFM_TABLE
where segment in('About to sleep','Promising','champion','Loyal_customer',
'Potential_Loyalist','At_Risk','Need_Attention','New_customer',
'Hibernating','Cant_loose')
and SEGMENTED_DATE between '2022-10-07' and '2022-10-28'
Group by segment,SEGMENTED_DATE
I want this table as output.
It contains only the difference in value, by SEGMENTED_DATE.
The sample data in the results table is not correct.
You said:
"I want to subtract value of segmeted 14/10/22 - 7/10/22 that means (28930-28799)", but this gives 131, not 233.
You also said "while in you example and value 21/10/22 -14/10/22 that means(29137-28930)", but this gives 207, not 190.
How did you calculate the value 344 in the first row?
The following query produces the format you want, but without the first row, because it is not clear to me how you calculated it. I used xxx as your table name. The query is based on user variables.
SET @Prev = 0;
SET @i = 0;
SELECT CONCAT('Week', C, '-', 'Week', C-1) AS Change_Time, Segment, Prev AS Value FROM (
SELECT `Value`- @Prev AS Prev, Segment, @Prev :=`Value` AS V, @i:=@i+1 AS C, Segmentd FROM xxx
) AS t WHERE C> 1;
The results will be :
This query is suitable for MySQL engine and will not run on SQL server.
Edit1:
Here is some explanation:
In the inner query I used variables for two reasons:
I need a counter (@i) so I can know the week index (week1, week2, ...). This counter increases with each record via (@i:=@i+1).
I need to know the value of the previous record, so I used (@Prev :=Value) to save that value; then I can subtract it from Value in the current record: (Value- @Prev) AS Prev.
I started with the initial values (SET @Prev = 0;), assuming no previous values, and (SET @i = 0;), because @i will be increased to 1 at the first record.
In the outer query I converted @i (named C) to (week(i)-week(i-1)): week1-week0, week2-week1, ..., and removed the first record because it would display wrong data.
I can help improve the query if you show me some real data.
Edit2:
According to your last modification at 2022/10/07, the query will be:
SET @Prev = 0;
SET @S = 0;
SELECT Segment, Difference, SEGMENTED_DATE FROM (
SELECT
`Value`- @Prev AS Difference,
POSITION(@S IN Segment) AS NotFirst,
@Prev := IF(@S=Segment, `Value`, 0) AS `Value`,
@S := Segment AS Segment,
SEGMENTED_DATE
FROM test
) AS t WHERE NotFirst> 0;
You may perform a self join as follows:
SELECT T.segment,
D.value-T.value AS Difference,
D.segmented_date
FROM table_name T JOIN table_name D
ON D.segmented_date=T.segmented_date + INTERVAL 7 DAY
AND D.segment=T.segment
ORDER BY T.segment, D.segmented_date
See a demo.
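The pairing that the self join performs (each row matched with the row for the same segment seven days earlier) can be sketched in Python to check the arithmetic. The function name weekly_differences is made up for this sketch, and the segment label "Promising" is only an assumed pairing: the question gives the three values but does not say which segment they belong to.

```python
from datetime import date, timedelta


def weekly_differences(rows):
    """rows: iterable of (segment, segmented_date, value) tuples."""
    by_key = {(segment, day): value for segment, day, value in rows}
    out = []
    for (segment, day), value in sorted(by_key.items()):
        # Same segment, exactly 7 days earlier (the join condition above).
        previous = by_key.get((segment, day - timedelta(days=7)))
        if previous is not None:
            out.append((segment, day, value - previous))
    return out


rows = [
    ("Promising", date(2022, 10, 7), 28799),   # segment label is hypothetical
    ("Promising", date(2022, 10, 14), 28930),
    ("Promising", date(2022, 10, 21), 29137),
]
diffs = weekly_differences(rows)
for segment, day, diff in diffs:
    print(segment, day, diff)
```

As with the inner join, the first week of each segment produces no row because it has no earlier partner.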

finding a percentile value in mysql 5.7? [duplicate]

I have a table which contains thousands of rows and I would like to calculate the 90th percentile for one of the fields, called 'round'.
For example, select the value of round which is at the 90th percentile.
I don't see a straightforward way to do this in MySQL.
Can somebody provide some suggestions as to how I may start this sort of calculation?
Thank you!
First, let's assume that you have a table with a value column. You want to get the row with the 95th percentile value. In other words, you are looking for a value that is bigger than 95 percent of all values.
Here is a simple answer:
SELECT * FROM
(SELECT t.*, @row_num := @row_num + 1 AS row_num FROM YOUR_TABLE t,
(SELECT @row_num := 0) counter ORDER BY YOUR_VALUE_COLUMN)
temp WHERE temp.row_num = ROUND(.95 * @row_num);
Compare solutions:
Number of seconds it took on my server to get 99 percentile of 1.3 million rows:
LIMIT x,y with index and no where: 0.01 seconds
LIMIT x,y with no where: 0.7 seconds
LIMIT x,y with where: 2.3 seconds
Full scan with no where: 1.6 seconds
Full scan with where: 5.7 seconds
Fastest solution for large tables, using LIMIT x,y:
Get the count of values: SELECT COUNT(*) AS cnt FROM t
Get the nth value, where n = (cnt - 1) * (1 - 0.95): SELECT k FROM t ORDER BY k DESC LIMIT n,1
This solution requires two queries, because MySQL does not support variables in the LIMIT clause except in stored procedures (so it can be optimized with a stored procedure). Usually the overhead of the additional query is very low.
This solution can be further optimized if you add an index to the k column and do not use complex WHERE clauses (about 0.01 seconds for a table with 1 million rows, because sorting is not needed).
Implementation example in PHP (can calculate percentile not only of columns, but also of expressions):
function get_percentile($table, $where, $expr, $percentile) {
if ($where) $subq = "WHERE $where";
else $subq = "";
$r = query("SELECT COUNT(*) AS cnt FROM $table $subq");
$w = mysql_fetch_assoc($r);
$num = abs(round(($w['cnt'] - 1) * (100 - $percentile) / 100.0));
$q = "SELECT ($expr) AS prcres FROM $table $subq ORDER BY ($expr) DESC LIMIT $num,1";
$r = query($q);
if (!mysql_num_rows($r)) return null;
$w = mysql_fetch_assoc($r);
return $w['prcres'];
}
// Usage example
$time = get_percentile(
"state", // table
"service='Time' AND cnt>0 AND total>0", // some filter
"total/cnt", // expression to evaluate
80); // percentile
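The offset arithmetic used by the two-query approach can be checked with a small in-memory sketch. This is Python and only an illustration: percentile_by_offset is a made-up name, and the list sort stands in for ORDER BY ... DESC LIMIT n,1.

```python
import math


def percentile_by_offset(values, percentile):
    """Pick the value at offset n in descending order, as LIMIT n,1 would."""
    cnt = len(values)
    if cnt == 0:
        return None
    raw = (cnt - 1) * (100 - percentile) / 100.0
    # floor(x + 0.5) rounds halves up for non-negative x, which is how PHP's
    # round() behaves here; Python's built-in round() rounds halves to even.
    n = int(math.floor(raw + 0.5))
    return sorted(values, reverse=True)[n]


sample = list(range(1, 101))             # 1..100
print(percentile_by_offset(sample, 95))  # offset 5 from the top
print(percentile_by_offset(sample, 80))  # offset 20 from the top
```

With 100 rows and the 95th percentile, (100 - 1) * 0.05 = 4.95 rounds to offset 5, so the sixth-largest value is returned.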
The SQL standard supports the PERCENTILE_DISC and PERCENTILE_CONT inverse distribution functions for precisely this job. Implementations are available in at least Oracle, PostgreSQL, SQL Server, Teradata. Unfortunately not in MySQL. But you can emulate PERCENTILE_DISC in MySQL 8 as follows:
SELECT DISTINCT first_value(my_column) OVER (
ORDER BY CASE WHEN p <= 0.9 THEN p END DESC /* NULLS LAST */
) x
FROM (
SELECT
my_column,
percent_rank() OVER (ORDER BY my_column) p
FROM my_table
) t;
This calculates the PERCENT_RANK for each row given your my_column ordering, and then finds the last row for which the percent rank is less or equal to the 0.9 percentile.
This only works on MySQL 8+, which has window function support.
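The selection rule in that query (compute PERCENT_RANK per row, then take the value with the largest rank that is still at most 0.9) can be sketched in Python. The function name is hypothetical, and the rank formula (rank - 1) / (rows - 1), with 0 for a single row, is the standard PERCENT_RANK definition.

```python
from bisect import bisect_left


def percentile_disc_emulated(values, fraction):
    """Last value, in ascending order, whose percent rank is <= fraction."""
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for v in ordered:
        # bisect_left counts the values strictly smaller than v, i.e. rank - 1,
        # so tied values share the same percent rank.
        p = bisect_left(ordered, v) / (n - 1) if n > 1 else 0.0
        if p <= fraction:
            best = v
    return best


print(percentile_disc_emulated(range(1, 12), 0.9))
```

For the eleven values 1 to 11 the percent ranks are 0, 0.1, ..., 1.0, so the value with rank 0.9 is selected.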
I was trying to solve this for quite some time and then I found the following answer. Honestly brilliant. Also quite fast even for big tables (the table where I used it contained approx 5 mil records and needed a couple of seconds).
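SELECT
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(field_name ORDER BY
field_name SEPARATOR ','), ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
AS `95th Per`
FROM table_name;
As you can imagine, just replace table_name and field_name with your table's and column's names. Note that GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so on a large table that limit has to be raised for the session first.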
For further information check Roland Bouman's original post
In MySQL 8 there is the NTILE window function you can use:
SELECT SomeTable.ID, SomeTable.Round
FROM SomeTable
JOIN (
SELECT ID, NTILE(100) OVER w AS Percentile
FROM SomeTable
WINDOW w AS (ORDER BY Round)
) AS SomeTablePercentile ON SomeTable.ID = SomeTablePercentile.ID
WHERE Percentile = 90
LIMIT 1
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_ntile
http://www.artfulsoftware.com/infotree/queries.php#68
SELECT
a.film_id ,
ROUND( 100.0 * ( SELECT COUNT(*) FROM film AS b WHERE b.length <= a.length ) / total.cnt, 1 )
AS percentile
FROM film a
CROSS JOIN (
SELECT COUNT(*) AS cnt
FROM film
) AS total
ORDER BY percentile DESC;
This can be slow for very large tables.
As per Tony_Pets' answer, but as I noted on a similar question: I had to change the calculation slightly, for example for the 90th percentile, "90/100 * COUNT(*) + 0.5" instead of "90/100 * COUNT(*) + 1". Sometimes it was skipping two values past the percentile point in the ordered list, instead of picking the next higher value for the percentile. Maybe that is the way integer rounding works in MySQL.
ie:
.... SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(fieldValue ORDER BY fieldValue SEPARATOR ','), ',', 90/100 * COUNT(*) + 0.5), ',', -1) as 90thPercentile ....
The most common definition of a percentile is a number where a certain percentage of scores fall below that number. You might know that you scored 67 out of 90 on a test. But that figure has no real meaning unless you know what percentile you fall into. If you know that your score is in the 95th percentile, that means you scored better than 95% of people who took the test.
This solution works also with the older MySQL 5.7.
SELECT *, @row_num as numRows, 100 - (row_num * 100/(@row_num + 1)) as percentile
FROM (
select *, @row_num := @row_num + 1 AS row_num
from (
SELECT t.subject, pt.score, p.name
FROM test t, person_test pt, person p, (
SELECT @row_num := 0
) counter
where t.id=pt.test_id
and p.id=pt.person_id
ORDER BY score desc
) temp
) temp2
-- optional: filter on a minimal percentile (uncomment below)
-- having percentile >= 80
An alternative solution that works in MySQL 8: generate a histogram of your data:
ANALYZE TABLE my_table UPDATE HISTOGRAM ON my_column WITH 100 BUCKETS;
And then just select the 95th record from information_schema.column_statistics:
SELECT v,c FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets',
'$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist
WHERE column_name='my_column' LIMIT 95,1
And voila! You will still need to decide whether you take the lower or upper limit of the percentile, or perhaps take an average - but that is a small task now. Most importantly - this is very quick, once the histogram object is built.
Credit for this solution: lefred's blog.

Find the column with unusual difference between succeeding or preceding column in mysql

I have following table
id vehicle_id timestamp distance_meters
1 1 12:00:01 1000
2 1 12:00:04 1000.75
3 1 15:00:06 1345.0(unusual as time and distance jumped)
4 1 15:00:09 1347
The table above is the log of a vehicle. Normally, the vehicle sends data at 3-second intervals, but sometimes it goes offline and sends data only once it is online again. The only way to detect that is to find an unusual jump in distance. We can assume a normal jump is up to 500 meters.
What is the best way to do that?
If you cannot ensure that the ids increment with no gaps, then you need another method. One method uses variables and one uses correlated subqueries.
The variables method is messy, but probably the fastest:
select t.*,
(case when @tmp_prev_ts := @prev_ts and false then NULL -- never happens
when @prev_ts := timestamp and false then NULL -- never happens
else @tmp_prev_ts
end) as prev_timestamp,
(case when @tmp_prev_d := @prev_d and false then NULL -- never happens
when @prev_d := distance_meters and false then NULL -- never happens
else @tmp_prev_d
end) as prev_distance_meters
from t cross join
(select @prev_ts := '', @prev_d := 0) params
order by timestamp; -- assume this is the ordering
You can then use a subquery or other logic to get the large jumps.
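The "other logic" can be as simple as comparing each row with the previous one in timestamp order. Below is a minimal Python sketch under stated assumptions: find_jumps, max_gap and max_distance are made-up names, the calendar date is arbitrary because the question only gives times, and the 3-second and 500-meter defaults come from the question.

```python
from datetime import datetime, timedelta


def find_jumps(rows, max_gap=timedelta(seconds=3), max_distance=500.0):
    """rows: (id, timestamp, distance_meters) tuples for one vehicle."""
    ordered = sorted(rows, key=lambda r: r[1])
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur[1] - prev[1]      # time since the previous report
        moved = cur[2] - prev[2]    # distance covered since the previous report
        if gap > max_gap or moved > max_distance:
            flagged.append((cur[0], gap, moved))
    return flagged


day = (2020, 1, 1)  # arbitrary date, only the times matter here
rows = [
    (1, datetime(*day, 12, 0, 1), 1000.0),
    (2, datetime(*day, 12, 0, 4), 1000.75),
    (3, datetime(*day, 15, 0, 6), 1345.0),
    (4, datetime(*day, 15, 0, 9), 1347.0),
]
flagged = find_jumps(rows)
for row_id, gap, moved in flagged:
    print(row_id, gap, moved)
```

Note that in the sample data the distance jump at id 3 is 344.25 meters, which is below the 500-meter threshold; the row is flagged here because of the time gap, so in practice both conditions may be worth checking.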
Usually you can use window functions for such a task; LEAD and LAG are perfect for this. However, MySQL before 8.0 has no window functions, so there you have to emulate them.
You need to get your data set with row number and then join it to itself by row number with offset by 1.
It would look something like this:
SELECT
*
FROM (SELECT
rownr,
vehicle_id,
timestamp,
distance_meters
FROM t) tcurrent
LEFT JOIN (SELECT
rownr,
vehicle_id,
timestamp,
distance_meters
FROM t) tprev
ON tcurrent.vehicle_id = tprev.vehicle_id
AND tprev.rownr = tcurrent.rownr - 1
If you can assume id is sequential (without gaps) per vehicle_id, then you could use it instead of rownr. Otherwise you would have to make your own rank/row number.
So you would have to combine ranking solution from this question:
MySQL - Get row number on select

Iterate result set like array

How can I fetch any row by index from a result set in MySQL, as is possible with arrays or collections in most programming languages?
array[index]
Or:
collection.getElementByIndex(index)
Update:
I have a result set of dates, and I need to check whether there are 90 days between each date.
You have two alternatives:
Use a sub-select.
Use the ability for MySQL to iterate over the returned rows.
First alternative looks like:
SELECT BIT_AND(IFNULL(DATEDIFF((SELECT dt FROM foo WHERE dt > a.dt ORDER BY dt LIMIT 1), a.dt) >= 90, 1)) AS all_larger
FROM foo a;
Update: To handle a table where a date is duplicated, it is necessary to add a second sub-select to see if there are duplicates for the date, as follows:
SELECT BIT_AND(larger && ! duplicates) AS all_larger
FROM (SELECT a.dt
, IFNULL(DATEDIFF((SELECT dt FROM foo WHERE dt > a.dt ORDER BY dt LIMIT 1), a.dt) >= 90, 1) AS larger
, (SELECT COUNT(*) FROM foo WHERE dt = a.dt) > 1 AS duplicates
FROM foo a) AS x;
Second alternative looks like:
SET @prev = NULL;
SELECT BIT_AND(a.larger) AS all_larger
FROM (SELECT IFNULL(DATEDIFF(dt, @prev) >= 90, 1) AS larger
, @prev := dt
FROM foo ORDER BY dt) a;
Both give the following result set when run on a table where the difference between consecutive dates is at least 90 days:
+------------+
| all_larger |
+------------+
| 1 |
+------------+
The second one is probably faster, but I haven't measured on larger sets.
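The check both queries perform (sort the dates, compare each one with its successor) can be sketched in Python for comparison. The function name all_gaps_at_least and the sample dates are made up for this sketch.

```python
from datetime import date


def all_gaps_at_least(dates, min_days=90):
    """True if every pair of consecutive dates is at least min_days apart."""
    ordered = sorted(dates)
    # A duplicated date yields a gap of 0 days and therefore fails the check.
    return all((b - a).days >= min_days for a, b in zip(ordered, ordered[1:]))


spaced = [date(2020, 1, 1), date(2020, 4, 1), date(2020, 7, 1)]  # 91 days apart
print(all_gaps_at_least(spaced))
print(all_gaps_at_least(spaced + [date(2020, 7, 15)]))
```

An empty or single-element list passes trivially, since there is no pair of dates to compare.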
Intrinsically you cannot. A relational database doesn't preserve record order (or at least you can't rely on it, even if it temporarily stores record order). In this way it acts more like a hashmap or List than an array.
However if you want, you can add a field in the table - let's call it RowNum - that stores a row number, and you can query on that.
select * from Table where RowNum = %index%;

Calculating the Median with Mysql

I'm having trouble with calculating the median of a list of values, not the average.
I found this article
Simple way to calculate median with MySQL
It has a reference to the following query which I don't understand properly.
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
If I have a time column and I want to calculate the median value, what do the x and y columns refer to?
I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
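Then take the middle value in a sorted subquery. Substitute the number returned by the first query for @middlevalue, since LIMIT does not accept a user variable directly outside a prepared statement or stored procedure:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;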
I tested this with a 5x10e6 dataset of random numbers and it will find the median in under 10 seconds.
This will find an arbitrary percentile by replacing the COUNT(*)/2 with COUNT(*)*n where n is the percentile (.5 for median, .75 for 75th percentile, etc).
val is your time column, x and y are two references to the data table (you can write data AS x, data AS y).
EDIT:
To avoid computing your sums twice, you can store the intermediate results.
CREATE TEMPORARY TABLE average_user_total_time
(SELECT SUM(time) AS time_taken
FROM scores
WHERE created_at >= '2010-10-10'
and created_at <= '2010-11-11'
GROUP BY user_id);
Then you can compute median over these values which are in a named table.
EDIT: Temporary table won't work here. You could try using a regular table with "MEMORY" table type. Or just have your subquery that computes the values for the median twice in your query. Apart from this, I don't see another solution. This doesn't mean there isn't a better way, maybe somebody else will come with an idea.
First try to understand what the median is: it is the middle value in the sorted list of values.
Once you understand that, the approach is two steps:
sort the values in either order
pick the middle value (if not an odd number of values, pick the average of the two middle values)
Example:
Median of 0 1 3 7 9 10: 5 (because (7+3)/2=5)
Median of 0 1 3 7 9 10 11: 7 (because 7 is the middle value)
So, to sort dates you need a numerical value; you can get their time stamp (as seconds elapsed from epoch) and use the definition of median.
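The two steps above (sort, then pick the middle value or the average of the two middle values) can be written as a short Python sketch and checked against the two examples. The function name median is chosen for this sketch.

```python
def median(values):
    """Middle value of the sorted list, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two


print(median([0, 1, 3, 7, 9, 10]))      # (3 + 7) / 2
print(median([0, 1, 3, 7, 9, 10, 11]))  # the middle value
```

For dates, the same function applies once each date has been converted to a number such as seconds since the epoch.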
Finding median in mysql using group_concat
Query:
SELECT
IF(count%2=1,
SUBSTRING_INDEX(substring_index(data_str,",",pos),",",-1),
(SUBSTRING_INDEX(substring_index(data_str,",",pos),",",-1)
+ SUBSTRING_INDEX(substring_index(data_str,",",pos+1),",",-1))/2)
as median
FROM (SELECT group_concat(val order by val) data_str,
CEILING(count(*)/2) pos,
count(*) as count from data)temp;
Explanation:
Sorting is done using order by inside group_concat function
Position(pos) and Total number of elements (count) is identified. CEILING to identify position helps us to use substring_index function in the below steps.
Based on count, even or odd number of values is decided.
Odd values: Directly choose the element belonging to the pos using substring_index.
Even values: Find the element belonging to the pos and pos+1, then add them and divide by 2 to get the median.
Finally the median is calculated.
If you have a table R with a column named A, and you want the median of A, you can do as follows:
SELECT A FROM R R1
WHERE ( SELECT COUNT(A) FROM R R2 WHERE R2.A < R1.A ) = ( SELECT COUNT(A) FROM R R3 WHERE R3.A > R1.A )
Note: This will only work if there are no duplicated values in A. Also, null values are not allowed.
Simplest ways me and my friend have found out... ENJOY!!
SELECT count(*) INTO @c from station;
select ROUND((@c+1)/2) into @final;
SELECT round(lat_n,4) from station a where @final-1=(select count(lat_n) from station b where b.lat_n > a.lat_n);
Here is a solution that is easy to understand. Just replace Your_Column and Your_Table as per your requirement.
SET @r = 0;
SELECT AVG(Your_Column)
FROM (SELECT (@r := @r + 1) AS r, Your_Column FROM Your_Table ORDER BY Your_Column) Temp
WHERE
r = (SELECT CEIL(COUNT(*) / 2) FROM Your_Table) OR
r = (SELECT FLOOR((COUNT(*) / 2) + 1) FROM Your_Table)
Originally adopted from this thread.