Time differences between row in google big query - mysql

I'm currently attempting to calculate the timestamp differences between rows in google big query attached is the sample table I am using to test the code .
I am using this code
SELECT
A.row,
A.issue.updated_at,
(B.issue.updated_at - A.issue.updated_at) AS timedifference
FROM [icxmedia-servers:icx_metrics.gh_zh_data_production] A
INNER JOIN [icxmedia-servers:icx_metrics.gh_zh_data_production] B
ON B.row = (A.row + 1)
WHERE issue.number==6 and issue.name=="archer"
ORDER BY A.requestid ASC
Referenced from this question Calculate the time difference between of two rows

Rather than a JOIN, this is more naturally expressed using analytic functions. The documentation for analytic functions with standard SQL in BigQuery explains how analytic functions work and what the syntax is. As an example, if you wanted to take successive differences in x values where the order is determined by column y, you could do:
WITH T AS (
SELECT
x,
y
FROM UNNEST([9, 3, 4, 7]) AS x WITH OFFSET y)
SELECT
x,
x - LAG(x) OVER (ORDER BY y) AS x_diff
FROM T;
Note that to run this in BigQuery, you need to uncheck the "Use Legacy SQL" box under "Show Options" to enable standard SQL. The WITH T clause is simply setting up some data for the example.
For your specific case, you would probably want a query such as:
SELECT
row,
issue.updated_at,
issue.updated_at - LAG(issue.updated_at) OVER (ORDER BY issue.updated_at) AS timedifference
FROM `icxmedia-servers.icx_metrics.gh_zh_data_production`
WHERE issue.number = 6
AND issue.name = "archer"
ORDER BY requestid ASC;
If you want to determine differences in updated_at outside of just a single issue number, you could use a PARTITION BY clause as well. For example:
SELECT
row,
issue.name,
issue.number,
issue.updated_at,
issue.updated_at - LAG(issue.updated_at) OVER (
PARTITION BY issue.number
ORDER BY issue.updated_at) AS timedifference
FROM `icxmedia-servers.icx_metrics.gh_zh_data_production`
ORDER BY requestid ASC;

Related

DENSE_RANK() OVER and IFNULL()

Let's say I have a table like this -
id
number
1
1
2
1
3
1
I want to return the second largest number, and if there isn't, return NULL instead. In this case, since all the numbers in the table are the same, there isn't the second largest number, so it should return NULL.
These codes work -
SELECT IFNULL((
SELECT number
FROM (SELECT *, DENSE_RANK() OVER(ORDER BY number DESC) AS ranking
FROM test) r
WHERE ranking = 2), NULL) AS SecondHighestNumber;
However, after I changed the order of the query, it doesn't work anymore -
SELECT IFNULL(number, NULL) AS SecondHighestNumber
FROM (SELECT *, DENSE_RANK() OVER(ORDER BY number DESC) AS ranking
FROM test) r
WHERE ranking = 2;
It returns blank instead of NULL. Why?
Explanation
This is something of a byproduct of the way you are using subquery in your SELECT clause, and really without a FROM clause.
It is easy to see with a very simple example. We create an empty table. Then we select from it where id = 1 (no results as expected).
CREATE TABLE #foo (id int)
SELECT * FROM #foo WHERE id = 1; -- Empty results
But now if we take a left turn and turn that into a subquery in the select statement - we get a result!
CREATE TABLE #foo (id int)
SELECT (SELECT * FROM #foo WHERE id = 1) AS wtf; -- one record in results with value NULL
I'm not sure what else we could ask our sql engine to do for us - perhaps cough up an error and say I can't do this? Maybe return no results? We are telling it to select an empty result set as a value in the SELECT clause, in a query that doesn't have any FROM clause (personally I would like SQL to cough up and error and say I can't do this ... but it's not my call).
I hope someone else can explain this better, more accurately or technically - or even just give a name to this behavior. But in a nutshell there it is.
tldr;
So your first query has SELECT clause with an IFNULL function in it that uses a subquery ... and otherwise is a SELECT without a FROM. So this is a little weird but does what you want, as shown above. On the other hand, your second query is "normal" sql that selects from a table, filters the results, and lets you know it found nothing -- which might not be what you want but I think actually makes more sense ;)
Footnote: my "sql" here is T-SQL, but I believe this simple example would work the same in MySQL. And for what it's worth, I believe Oracle (back when I learned it years ago) actually would cough up errors here and say you can't have a SELECT clause with no FROM.

SQL: LIMIT and OFFSET to find median

I am trying to find median for odd number of floats. I used the following code for this reason -
select a from tab1
limit 1 offset (count(a) div 2) - 1
But, this code is giving syntax error. I am using MySQL. Any help/ suggestion towards solving the problem will be highly appreciated.
You cannot use expressions in the limit for MySQL. You could use a window function:
select t.*
from (select tab1.*,
row_number() over (order by a) as seqnum,
count(*) over () as cnt
from tab1
) t
where seqnum = (cnt div 2) - 1 ;
Note: This follows the same logic you have in your question, but using window functions instead of limit. There are other ways to calculate the median -- and this isn't strictly speaking the median because it doesn't work for both even and odd numbers of rows.

How to count when value crosses the average

I am trying to write a MySQL query that would count the number of times a value crosses a constant. The end result is we are tying to determine the relative 'noise' of the value via the amplitude and the frequency of the value. MIN() and MAX() provide the amplitude. Count() gives the number of samples that fit the criteria, but it doesn't provide how stable that value is. We are currently using MySQL 5.7 but we will be moving to MySQL 8.0 that provides the windowing features. Something like
Select Count(Value) over (order by logtime ROWS 1 Proeeding <123 AND 1 Following > 123) WHERE logtime BETWEEN...;
Thank your for any help you can provide.
SELECT Count(Value) WHERE Value > 123 AND logtime BETWEEN...;
SELECT Count(Value) WHERE Value < 123 AND logtime BETWEEN...;
Window functions are not available in MySQL versions before 8.0
With MySQL 5.7, we can emulate some window functions by using user-defined variables in a carefully crafted query. The MySQL Reference Manual gives explicit warning about using user-defined variables in a context like this. We are relying on behavior that is not guaranteed.
But as an example of the pattern I would use to achieve the specified result:
SELECT SUM(c.crossed_avg) AS count_crossed_avg
FROM (
SELECT IF( ( #prval > a.avg_ AND t.value < a.avg_ ) OR
( #prval < a.avg_ AND t.value > a.avg_ )
,1,0) AS crossed_avg
, #prval := t.value AS value_
FROM mytable t
CROSS
JOIN ( SELECT 123 AS avg_ ) a
CROSS
JOIN ( SELECT #prval := NULL ) i
WHERE ...
ORDER BY t.logtime
) c
To unpack this, focus first on the inline view query; that is, ignore the SELECT SUM() wrapper query, and run just the inline view query.
We order the rows by logtime so that we can process the rows in order.
We compare the value on the current row to the value from the previous row. If one is above average and the other is below average, then we return a 1, else we return 0.
Save the current value into the user-defined variable for comparing the next row. (Note: the order of operations is important; we are depending on MySQL to do that assignment after the evaluation of the IF() function.
The example query doesn't address the edge case when a row value is exactly equal to the average, e.g. a sequence of values 124.4 < 123.0 < 122.2. (We might want to consider changing the comparisons so that one includes the equality e.g. < and >=.

Alternative for PERCENTILE_CONT in MySQL/MariaDB

I want to calculate percentile_cont on this table.
In Oracle, the query would be
SELECT PERCENTILE_CONT(0.05) FROM sometable;
What would be it's alternative in MariaDB/MySQL?
While MariaDB 10.3.3 has support for these functions in the form of window functions (see Lukasz Szozda's answer), you can emulate them using window functions in MySQL 8 as well:
SELECT DISTINCT first_value(matrix_value) OVER (
ORDER BY CASE WHEN p <= 0.05 THEN p END DESC /* NULLS LAST */
) x,
FROM (
SELECT
matrix_value,
percent_rank() OVER (ORDER BY matrix_value) p,
FROM some_table
) t;
I've blogged about this more in detail here.
MariaDB 10.3.3 introduced PERCENTILE_CONT, PERCENTILE_DISC, and MEDIAN windowed functions.
PERCENTILE_CONT
PERCENTILE_CONT() (standing for continuous percentile) is an ordered set aggregate function which can also be used as a window function. It returns a value which corresponds to the given fraction in the sort order. If required, it will interpolate between adjacent input items.
SELECT name, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY star_rating)
OVER (PARTITION BY name) AS pc
FROM book_rating;
There is no built in function for this in either MariaDB or MySQL, so you have to solve this on the SQL level (or by adding a user defined function written in C ...)
This might help with coming up with a SQL solution:
http://rpbouman.blogspot.de/2008/07/calculating-nth-percentile-in-mysql.html
MariaDB 10.2 has windowing functions.
For MySQL / older MariaDB, and assuming you just want the Nth percentile for a single set of values.
This is best done form app code, but could be built into a stored routine.
Count the total number of rows: SELECT COUNT(*) FROM tbl.
Construct and execute a SELECT with LIMIT n,1 where n is computed as the percentile times the count, then filled into the query.
If you need to interpolate between two values, it gets messier. Do you need that, too?

Table statistics (aka row count) over time

i'm preparing a presentation about one of our apps and was asking myself the following question: "based on the data stored in our database, how much growth have happend over the last couple of years?"
so i'd like to basically show in one output/graph, how much data we're storing since beginning of the project.
my current query looks like this:
SELECT DATE_FORMAT(created,'%y-%m') AS label, COUNT(id) FROM table GROUP BY label ORDER BY label;
the example output would be:
11-03: 5
11-04: 200
11-05: 300
unfortunately, this query is missing the accumulation. i would like to receive the following result:
11-03: 5
11-04: 205 (200 + 5)
11-05: 505 (200 + 5 + 300)
is there any way to solve this problem in mysql without the need of having to call the query in a php-loop?
Yes, there's a way to do that. One approach uses MySQL user-defined variables (and behavior that is not guaranteed)
SELECT s.label
, s.cnt
, #tot := #tot + s.cnt AS running_subtotal
FROM ( SELECT DATE_FORMAT(t.created,'%y-%m') AS `label`
, COUNT(t.id) AS cnt
FROM articles t
GROUP BY `label`
ORDER BY `label`
) s
CROSS
JOIN ( SELECT #tot := 0 ) i
Let's unpack that a bit.
The inline view aliased as s returns the same resultset as your original query.
The inline view aliased as i returns a single row. We don't really care what it returns (except that we need it to return exactly one row because of the JOIN operation); what we care about is the side effect, a value of zero gets assigned to the #tot user variable.
Since MySQL materializes the inline view as a derived table, before the outer query runs, that variable gets initialized before the outer query runs.
For each row processed by the outer query, the value of cnt is added to #tot.
The return of s.cnt in the SELECT list is entirely optional, it's just there as a demonstration.
N.B. The MySQL reference manual specifically states that this behavior of user-defined variables is not guaranteed.