How to count when value crosses the average - mysql

I am trying to write a MySQL query that counts the number of times a value crosses a constant. The end goal is to determine the relative 'noise' of the value from its amplitude and frequency. MIN() and MAX() provide the amplitude. COUNT() gives the number of samples that fit the criteria, but it doesn't tell us how stable the value is. We are currently using MySQL 5.7, but we will be moving to MySQL 8.0, which provides window functions. Something like
Select Count(Value) over (order by logtime ROWS 1 Preceding <123 AND 1 Following > 123) WHERE logtime BETWEEN...;
Thank you for any help you can provide.
SELECT Count(Value) WHERE Value > 123 AND logtime BETWEEN...;
SELECT Count(Value) WHERE Value < 123 AND logtime BETWEEN...;

Window functions are not available in MySQL versions before 8.0.
With MySQL 5.7, we can emulate some window functions by using user-defined variables in a carefully crafted query. The MySQL Reference Manual gives explicit warning about using user-defined variables in a context like this. We are relying on behavior that is not guaranteed.
But as an example of the pattern I would use to achieve the specified result:
SELECT SUM(c.crossed_avg) AS count_crossed_avg
  FROM (
         SELECT IF( ( @prval > a.avg_ AND t.value < a.avg_ ) OR
                    ( @prval < a.avg_ AND t.value > a.avg_ )
                  ,1,0) AS crossed_avg
              , @prval := t.value AS value_
           FROM mytable t
          CROSS
           JOIN ( SELECT 123 AS avg_ ) a
          CROSS
           JOIN ( SELECT @prval := NULL ) i
          WHERE ...
          ORDER BY t.logtime
       ) c
To unpack this, focus first on the inline view query; that is, ignore the SELECT SUM() wrapper query, and run just the inline view query.
We order the rows by logtime so that we can process the rows in order.
We compare the value on the current row to the value from the previous row. If one is above average and the other is below average, then we return a 1, else we return 0.
Save the current value into the user-defined variable for comparing the next row. (Note: the order of operations is important; we are depending on MySQL to do that assignment after the evaluation of the IF() function.)
The example query doesn't address the edge case when a row value is exactly equal to the average, e.g. a sequence of values 124.4, 123.0, 122.2. In that sequence neither comparison is true on the 123.0 row or on the row after it, so the crossing goes uncounted. (We might want to consider changing the comparisons so that one includes the equality, e.g. < and >=.)
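To make the row-by-row logic easier to check, here is a minimal sketch of the same idea outside SQL, in Python. It is not the query above: the function name `count_crossings` and the sample values are made up for illustration, and it picks one possible handling of the equality edge case, namely that a value exactly on the threshold does not change sides.

```python
def count_crossings(values, threshold):
    """Count how often a series moves from one side of threshold to the other.

    Assumption (one possible edge-case policy): a value exactly equal to
    the threshold is ignored, so 124.4, 123.0, 122.2 counts as one crossing.
    """
    crossings = 0
    last_side = 0  # +1 = above, -1 = below, 0 = no side seen yet
    for v in values:
        side = (v > threshold) - (v < threshold)
        if side == 0:
            continue  # on the threshold: keep the previous side
        if last_side != 0 and side != last_side:
            crossings += 1
        last_side = side
    return crossings


# Hypothetical samples, already ordered by logtime.
samples = [124.4, 123.0, 122.2, 125.0, 121.0]
print(count_crossings(samples, 123))  # prints 3
```

Running the same samples through the SQL above would be a reasonable way to see whether the two agree on the edge case you care about.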

Related

Resetting the value of the variable for every new row

I am trying to write a query that does the following. I have been given a table (testdata1), say:
YearS CountS($)
2015 360
2016 1000
2017 2000
2018 3500
From this table I have to create another table(testdata2) as listed below:
YearS NewCountS($)
2015 360
2016 640(i.e 1000 - 360)
2017 1000(i.e. 2000 - 1000)
2018 1500(i.e. 3500 - 2000)
One more thing I need to keep in mind is that testdata1 can contain any number of rows. This is what I tried:
set @diffr := 0;
update testdata1
set cd = (@diffr := CountS - @diffr)
order by yearS;
=================================
insert into testdata2(yearS, NewCountS)
Select yearS, cd
from testdata1;
This query works but generates this output:
YearS NewCountS($)
2015 360
2016 640
2017 1360
2018 2140
I found this link to help but did not understand what the answer explains, since I am a tyro in MySQL.
My questions:
How to achieve the desired result?
Is there a better way to get the data from testdata1 table to testdata2 table(i.e. without creating cd column in testdata1)?
Any help will be heartily welcomed...
There's no need to perform an UPDATE of testdata1. Just write a query that gets the rows you need to insert into testdata2.
If we want to use a user-defined variable, test the statement first to verify that it returns the result we expect.
SELECT r.years
     , r.counts
  FROM (
         SELECT s.years
              , s.counts - @prev_counts AS `counts`
              , @prev_counts := s.counts AS `prev_counts`
           FROM ( SELECT @prev_counts := 0 ) i
          CROSS
           JOIN testdata1 s
          ORDER BY s.years
       ) r
 ORDER BY r.years
Note that the inline view i initializes the user-defined variable at the beginning of the statement, essentially equivalent to running a separate SET @prev_counts = 0; statement.
For each row, the value of the user-defined variable is set to the value of counts, so that it will be available when we process the next row. Note that we are expecting MySQL to do this assignment operation after the udv is referenced in the preceding expression.
Note that this behavior with user-defined variables used like this (the order of evaluation) is documented (in the MySQL Reference Manual) to be "undefined". We do observe consistent behavior with carefully constructed SQL, but we should make note that officially this behavior is not guaranteed.
Once the SELECT is tested, we can turn it into an INSERT ... SELECT.
As an alternative that doesn't use a user-defined variable, we could use a correlated subquery in an expression in the SELECT list. (Again, write this as a SELECT statement and test it first, before we turn it into an INSERT ... SELECT.)
SELECT q.years
, q.counts
- IFNULL(
( SELECT r.counts
FROM testdata1 r
WHERE r.years < q.years
ORDER BY r.years DESC
LIMIT 1
)
,0) AS `counts`
FROM testdata1 q
ORDER BY q.years
EDIT to add an explanation of the query above.
It starts out as a simple query like this:
SELECT q.years
, q.counts
FROM testdata1 q
ORDER BY q.years
For each row returned from testdata1, the expressions in the SELECT list are evaluated, and a value is returned. In this case, the two expressions are simple column references, so we get the value stored in each column.
We could use more complex expressions in the SELECT list, for example:
SELECT q.years
     , q.years - 2000 AS `yy`
     , REVERSE(q.years) AS `sraey`
     , CONCAT('for ',q.years,' the count is ',q.counts) AS `cs`
     , ...
The same thing happens for those expressions in the SELECT list: for every row returned, the expressions are evaluated, and a value is returned.
It's also possible to use a query as an expression, but with some restrictions. The query must return a single column (a single expression), and can return at most one row.
When the query is working on the row years=2017, the expressions in the SELECT list are evaluated.
The specification is that we want to get the value of counts for the preceding year, years=2016. We could get that row by executing a query like this:
SELECT r.years
, r.counts
FROM testdata1 r
WHERE r.years = '2016'
To use a query like this as an expression in the SELECT list, we need to make sure it doesn't return more than one row. We can add a LIMIT 1 clause to ensure that it doesn't. And we need to return only one column ...
SELECT r.counts
FROM testdata1 r
WHERE r.years = 2016
LIMIT 1
But that always gets the 2016 row. What we can do now is change that to reference values from the row in the outer query, instead of the literal 2016.
SELECT (
SELECT r.counts
FROM testdata1 r
WHERE r.years = ( q.years - 1 )
LIMIT 1
) AS `prev_years_counts`
, q.counts
, q.years
FROM testdata1 q
ORDER BY q.years
Note that q.years in the subquery is a reference to the row from the outer query. When a row from q is being processed, MySQL executes the subquery, using the value of q.years in the WHERE clause. For each row processed by the outer query, the subquery is executed. And because of that reference to q.years from the outer query, we say that it's a correlated subquery.
In the event no row is returned (as will be the case for q.years=2015), the subquery returns a NULL value. We wrap that whole subquery in an IFNULL function, so if the subquery returns NULL, we will return a 0.
And the end result is a value. We can write expressions in the SELECT list that do subtraction, e.g.
SELECT q.counts - 540 AS `counts_minus_540`
In place of a literal 540, we can use expressions or column references ...
SELECT q.counts - foo AS `counts_minus_foo`
We can use the correlated subquery expression in place of foo, just like we did in that second query in this answer, of the form:
SELECT q.years
, q.counts - IFNULL( crsq ,0) AS `counts`
FROM ...
where crsq is the correlated subquery
SELECT q.years
     , q.counts - IFNULL(
         ( SELECT r.counts
             FROM testdata1 r
            WHERE r.years = ( q.years - 1 )
            LIMIT 1
         )
       ,0) AS `counts`
  FROM testdata1 q
 ORDER BY q.years
With the given example data, this query is equivalent to the second one in the answer. If there is a gap in the years values (for example, there is no years=2016 row), the query result will be different, because the correlated subquery will return something different for q.years=2017.
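The difference between the two correlated subqueries is easiest to see with a gap in the data. Below is a small Python sketch of the two behaviors, not of the SQL itself. The function names are hypothetical, and the sample rows are the question's data with the 2016 row removed on purpose.

```python
# The question's data with the 2016 row removed, to create a gap.
rows = [(2015, 360), (2017, 2000), (2018, 3500)]


def diff_from_previous_row(rows):
    """Subtract the counts of the nearest earlier year.

    Mirrors the r.years < q.years ... ORDER BY r.years DESC LIMIT 1 form.
    """
    result = []
    prev_counts = 0
    for year, counts in sorted(rows):
        result.append((year, counts - prev_counts))
        prev_counts = counts
    return result


def diff_from_year_minus_one(rows):
    """Subtract the counts of exactly year - 1, or 0 if that year is missing.

    Mirrors the r.years = ( q.years - 1 ) form wrapped in IFNULL(..., 0).
    """
    by_year = dict(rows)
    return [(year, counts - by_year.get(year - 1, 0))
            for year, counts in sorted(rows)]


print(diff_from_previous_row(rows))    # [(2015, 360), (2017, 1640), (2018, 1500)]
print(diff_from_year_minus_one(rows))  # [(2015, 360), (2017, 2000), (2018, 1500)]
```

Which result is "right" for 2017 depends on what the new column is supposed to mean, so it is worth deciding that before picking one form of the subquery.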

Alternative for PERCENTILE_CONT in MySQL/MariaDB

I want to calculate percentile_cont on this table.
In Oracle, the query would be
SELECT PERCENTILE_CONT(0.05) FROM sometable;
What would be its alternative in MariaDB/MySQL?
While MariaDB 10.3.3 has support for these functions in the form of window functions (see Lukasz Szozda's answer), you can emulate them using window functions in MySQL 8 as well:
SELECT DISTINCT first_value(matrix_value) OVER (
  ORDER BY CASE WHEN p <= 0.05 THEN p END DESC /* NULLS LAST */
) x
FROM (
  SELECT
    matrix_value,
    percent_rank() OVER (ORDER BY matrix_value) p
  FROM some_table
) t;
I've blogged about this more in detail here.
MariaDB 10.3.3 introduced PERCENTILE_CONT, PERCENTILE_DISC, and MEDIAN windowed functions.
PERCENTILE_CONT
PERCENTILE_CONT() (standing for continuous percentile) is an ordered set aggregate function which can also be used as a window function. It returns a value which corresponds to the given fraction in the sort order. If required, it will interpolate between adjacent input items.
SELECT name, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY star_rating)
OVER (PARTITION BY name) AS pc
FROM book_rating;
There is no built in function for this in either MariaDB or MySQL, so you have to solve this on the SQL level (or by adding a user defined function written in C ...)
This might help with coming up with a SQL solution:
http://rpbouman.blogspot.de/2008/07/calculating-nth-percentile-in-mysql.html
MariaDB 10.2 has windowing functions.
For MySQL / older MariaDB, assuming you just want the Nth percentile for a single set of values:
This is best done from app code, but could be built into a stored routine.
Count the total number of rows: SELECT COUNT(*) FROM tbl.
Construct and execute a SELECT with LIMIT n,1 where n is computed as the percentile times the count, then filled into the query.
If you need to interpolate between two values, it gets messier. Do you need that, too?
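If interpolation is needed and the values can be pulled into app code, a minimal Python sketch might look like the following. It assumes the usual linear-interpolation definition of a continuous percentile (zero-based position = fraction * (N - 1) in the sorted values); the function name `percentile_cont` is just a label borrowed from the SQL function, and the sample values are made up.

```python
def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbors.

    Assumes the common definition: zero-based position = fraction * (N - 1)
    in the sorted values, interpolating between the two surrounding items.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)                      # index at or below the position
    upper = min(lower + 1, len(ordered) - 1)   # next index, clamped to the end
    weight = position - lower                  # how far we are toward upper
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Hypothetical values fetched with a plain SELECT.
print(percentile_cont([10, 20, 30, 40], 0.5))  # prints 25.0
```

For large tables, fetching only the two neighboring rows with LIMIT and interpolating between them would avoid loading every value, at the cost of an extra query.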

Use the result of the MySQL SELECT query as a WHERE condition in the same query

I'm trying to do this query:
SELECT MAX(`peg_num`)
AS "indicator"
FROM `list`
WHERE `list_id` = 1
AND "indicator" >= 1
But I'm getting the result of NULL. What I should be getting is 99, as the range of peg_num is 00 to 99.
The value checked against "indicator" should actually be a user input, so I want it to be versatile. But, it does give me the correct result if I flip the equality around:
SELECT MAX(`peg_num`)
AS "indicator"
FROM `list`
WHERE `list_id` = 1
AND "indicator" <= 1
Why would it do this?
Edit:
As suggested, I'm using the HAVING clause... but I just ditched the alias for now anyway:
SELECT MAX(`peg_num`) AS "indicator"
FROM `list`
GROUP BY `list_id`
HAVING MAX(`peg_num`) <= 40
Still very stubborn. It gives me 99 now no matter the value in the having clause, regardless of the inequality.
Edit2:
As a clarification:
What I want to happen is the query select the largest value in the range of peg_num, but only if it is larger than a user-given input. So, the max in this case is 99. If the user wants to select a number like 101, he/she can't because it's not in the range.
Because of the double quotes, "indicator" in the WHERE clause is interpreted as a string. Thus, it evaluates to 0, meaning it is always less than 1. Column names must be escaped in backticks.
Keep in mind that the WHERE clause is evaluated before SELECT, and hence aliases defined in SELECT cannot be used in the WHERE clause.
SELECT MAX(`peg_num`) AS `indicator`
FROM `list`
WHERE `list_id` = 1
HAVING `indicator` >= 1
You might want to check out the link on the answer to another Stack question about not being allowed to use alias in where clause:
Can you use an alias in the WHERE clause in mysql?
Paul Dixon cites:
It is not allowable to refer to a column alias in a WHERE clause,
because the column value might not yet be determined when the WHERE
clause is executed. See Section B.1.5.4, “Problems with Column
Aliases”.
Also:
Standard SQL disallows references to column aliases in a WHERE clause.
The behavior you're seeing when you swap the '<=' and '>=' operators results from the query comparing the string/varchar 'indicator' to the number 1. In a numeric comparison, MySQL converts the string 'indicator' to 0.
That's why you see the correct answer when ('indicator' <= 1), which is true, and NULL when ('indicator' >= 1), which is false.
I don't know, but I'm amazed either of them work at all. WHERE works serially on fields belonging to individual records and I wouldn't expect it to work on "indicator" since that's a group calculation.
Does this do what you want?
SELECT max(`peg_num` ) AS "indicator"
FROM actions
WHERE `peg_num` >=1
AND `list_id` <= 1
WHERE is evaluated before SELECT, so it doesn't know what "indicator" is.
You should use HAVING (with GROUP BY) to filter on the SELECT fields.
Here's the documentation for the syntax:
http://dev.mysql.com/doc/refman/5.5/en/select.html
Something like this is the idea
SELECT MAX(peg_num) AS indicator
FROM list
WHERE list_id = 1
HAVING indicator <= 1
I can't test it and I have never used MySQL, so this is just the idea:
You should use HAVING
No quotes in HAVING condition
This must work:
SELECT MAX(peg_num)
AS indicator
FROM list
WHERE list_id = 1
HAVING indicator >= 1
I completely re-invented my query and it worked. The thing is, I had to use a nested query (and I wanted to not do that as much as possible, my professor had always discouraged it).
Anyway, here it is:
SELECT IF(`key` < 900, `key`, null) `key`
FROM (
(
SELECT MAX( `peg_num` ) AS `key`
FROM `list`
WHERE `list_id` =1
) AS `derivedTable`
)

Table statistics (aka row count) over time

i'm preparing a presentation about one of our apps and was asking myself the following question: "based on the data stored in our database, how much growth has happened over the last couple of years?"
so i'd like to basically show in one output/graph, how much data we're storing since beginning of the project.
my current query looks like this:
SELECT DATE_FORMAT(created,'%y-%m') AS label, COUNT(id) FROM table GROUP BY label ORDER BY label;
the example output would be:
11-03: 5
11-04: 200
11-05: 300
unfortunately, this query is missing the accumulation. i would like to receive the following result:
11-03: 5
11-04: 205 (200 + 5)
11-05: 505 (200 + 5 + 300)
is there any way to solve this problem in mysql without the need of having to call the query in a php-loop?
Yes, there's a way to do that. One approach uses MySQL user-defined variables (and behavior that is not guaranteed)
SELECT s.label
     , s.cnt
     , @tot := @tot + s.cnt AS running_subtotal
  FROM ( SELECT DATE_FORMAT(t.created,'%y-%m') AS `label`
              , COUNT(t.id) AS cnt
           FROM articles t
          GROUP BY `label`
          ORDER BY `label`
       ) s
 CROSS
  JOIN ( SELECT @tot := 0 ) i
Let's unpack that a bit.
The inline view aliased as s returns the same resultset as your original query.
The inline view aliased as i returns a single row. We don't really care what it returns (except that we need it to return exactly one row because of the JOIN operation); what we care about is the side effect, a value of zero gets assigned to the @tot user variable.
Since MySQL materializes the inline view as a derived table, before the outer query runs, that variable gets initialized before the outer query runs.
For each row processed by the outer query, the value of cnt is added to @tot.
The return of s.cnt in the SELECT list is entirely optional, it's just there as a demonstration.
N.B. The MySQL reference manual specifically states that this behavior of user-defined variables is not guaranteed.

MySQL Running Total with COUNT

I'm aware of the set @running_sum=0; @running_sum:=@running_sum + ... method, however, it does not seem to be working in my case.
My query:
SELECT DISTINCT(date), COUNT(*) AS count
FROM table1
WHERE date > '2011-09-29' AND applicationid = '123'
GROUP BY date ORDER BY date
The result gives me unique dates, with the count of occurrences of application 123.
I want to keep a running total of the count, to see the accumulated growth.
Right now I'm doing this in PHP, but I want to switch it all to MySQL.
Using the method from the first line of this post simply duplicates the count, instead of accumulating it.
What am I missing?
P.S. The set is very small, only about 100 entries.
Edit: you're right ypercube:
Here's the version with running_sum:
SET @running_sum=0;
SELECT date, @running_sum:=@running_sum + COUNT(*) AS total FROM table1
WHERE date > '2011-09-29' AND applicationid = '123'
GROUP BY date ORDER BY date
count column ends up being the same as if I just printed COUNT(*)
Updated Answer
The OP asked for a single-query approach, so as not to have to SET a user variable separately from using the variable to compute the running total:
SELECT d.date,
       @running_sum:=@running_sum + d.count AS running
  FROM ( SELECT date, COUNT(*) AS `count`
           FROM table1
          WHERE date > '2011-09-29' AND applicationid = '123'
          GROUP BY date
          ORDER BY date ) d
  JOIN (SELECT @running_sum := 0 AS dummy) dummy;
"Inline initialization" of user variables is useful for simulating other analytic functions, too. Indeed I learned this technique from answers like this one.
Original Answer
You need to introduce an enclosing query to tabulate the @running_sum over your COUNT(*)ed records:
SET @running_sum=0;
SELECT d.date,
       @running_sum:=@running_sum + d.count AS running
  FROM ( SELECT date, COUNT(*) AS `count`
           FROM table1
          WHERE date > '2011-09-29' AND applicationid = '123'
          GROUP BY date
          ORDER BY date ) d;
See also this answer.
SQL is notoriously poor at running totals. As your result set is in order, you are much better advised to append a calculated running total column on the client side. Nothing in SQL will be as performant as that.
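For what the client-side approach might look like, here is a minimal sketch in Python rather than PHP. It assumes the grouped rows have already been fetched in date order; the function name `with_running_total` and the sample rows are invented for illustration.

```python
from itertools import accumulate


def with_running_total(rows):
    """Append a running total to (label, count) rows that are already ordered."""
    totals = accumulate(count for _, count in rows)
    return [(label, count, total)
            for (label, count), total in zip(rows, totals)]


# Hypothetical result of the GROUP BY date query, ordered by date.
fetched = [("2011-09-30", 5), ("2011-10-01", 200), ("2011-10-02", 300)]
for row in with_running_total(fetched):
    print(row)
# ('2011-09-30', 5, 5)
# ('2011-10-01', 200, 205)
# ('2011-10-02', 300, 505)
```

With only about 100 rows, as in the question, the cost of doing this on the client is negligible either way.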
The Running total can be easily calculated using the lib_mysqludf_ta UDF library.
https://github.com/mysqludf/lib_mysqludf_ta#readme