I want to calculate percentile_cont on this table.
In Oracle, the query would be
SELECT PERCENTILE_CONT(0.05) FROM sometable;
What would be its alternative in MariaDB/MySQL?
While MariaDB 10.3.3 has support for these functions in the form of window functions (see Lukasz Szozda's answer), you can emulate them using window functions in MySQL 8 as well:
SELECT DISTINCT first_value(matrix_value) OVER (
  ORDER BY CASE WHEN p <= 0.05 THEN p END DESC /* NULLS LAST */
) x
FROM (
  SELECT
    matrix_value,
    percent_rank() OVER (ORDER BY matrix_value) p
  FROM some_table
) t;
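To see what this query selects, here is a rough Python sketch of the same logic: compute each value's percent rank, then keep the value with the highest percent rank that does not exceed the requested fraction. The function name and sample data are hypothetical, and this is only an illustration of the idea, not a replacement for the SQL.

```python
from bisect import bisect_left

def percentile_via_percent_rank(values, fraction):
    """Return the value with the highest percent rank <= fraction (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return None
    best = None
    for v in ordered:
        # rank - 1 is the number of values strictly smaller than v,
        # so tied values share the same percent rank.
        smaller = bisect_left(ordered, v)
        p = 0.0 if n == 1 else smaller / (n - 1)
        if p <= fraction:
            best = v  # percent rank never decreases, so the last match wins
    return best

sample = [10, 20, 30, 40, 50]  # made-up data
picked = percentile_via_percent_rank(sample, 0.5)
```

Note that this picks an existing value from the set and does not interpolate between neighbours.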
I've blogged about this more in detail here.
MariaDB 10.3.3 introduced PERCENTILE_CONT, PERCENTILE_DISC, and MEDIAN windowed functions.
PERCENTILE_CONT
PERCENTILE_CONT() (standing for continuous percentile) is an ordered set aggregate function which can also be used as a window function. It returns a value which corresponds to the given fraction in the sort order. If required, it will interpolate between adjacent input items.
SELECT name, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY star_rating)
OVER (PARTITION BY name) AS pc
FROM book_rating;
There is no built-in function for this in either MariaDB or MySQL, so you have to solve this on the SQL level (or by adding a user-defined function written in C ...)
This might help with coming up with a SQL solution:
http://rpbouman.blogspot.de/2008/07/calculating-nth-percentile-in-mysql.html
MariaDB 10.2 has windowing functions.
For MySQL / older MariaDB, assuming you just want the Nth percentile for a single set of values:
This is best done from app code, but could be built into a stored routine.
Count the total number of rows: SELECT COUNT(*) FROM tbl.
Construct and execute a SELECT with LIMIT n,1 where n is computed as the percentile times the count, then filled into the query.
If you need to interpolate between two values, it gets messier. Do you need that, too?
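If interpolation is needed, the app-code route stays fairly short. A minimal Python sketch, assuming the values have already been fetched into a list. The function name and sample data are hypothetical, and the position formula p * (N - 1) over the zero-based sorted values is the usual definition of a continuous percentile.

```python
import math

def percentile_cont(values, p):
    """Continuous percentile with linear interpolation (hypothetical helper)."""
    if not values:
        return None
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    # Fractional zero-based position of the requested percentile.
    pos = p * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    if lo == hi:
        return float(ordered[lo])
    # Interpolate linearly between the two neighbouring values.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

sample = [10, 20, 30, 40]  # made-up data
median = percentile_cont(sample, 0.5)
```

Without interpolation, the LIMIT n,1 lookup described above is enough.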
Related
I have a query that has a user-defined variable set on top of the main query. It's something like this.
SET @from_date = '2019-10-01', @end_date = '2019-12-31';
SELECT * FROM myTable
WHERE create_time BETWEEN @from_date AND @end_date;
It works just fine when I executed it in MySQL Workbench, but when I put it to Excel via MySQL ODBC it always shows an error like this.
I need that user-defined variable to work in Excel. What am I supposed to change in my query?
The ODBC connector is most likely communicating with MySQL via statements or prepared statements, one statement at a time, and user variables are not supported. A prepared statement would be one way you could bind your date literals. Another option here, given your example, would be to just inline the date literals:
SELECT *
FROM myTable
WHERE create_time >= '2019-10-01' AND create_time < '2020-01-01';
Side note: I expressed the check on create_time, which seems intended to cover the final quarter of 2019, using inequalities. The reason is that if create_time is a timestamp/datetime, then using BETWEEN with 31 December on the RHS would only include that day at midnight, and no time after it.
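The midnight cutoff can be checked outside the database. A small Python sketch, assuming create_time is a datetime; the sample row time is made up for illustration.

```python
from datetime import datetime

start = datetime(2019, 10, 1)
end_date = datetime(2019, 12, 31)          # '2019-12-31' read as a datetime is midnight
next_day = datetime(2020, 1, 1)
late_row = datetime(2019, 12, 31, 15, 30)  # hypothetical create_time value

# BETWEEN is inclusive on both ends, but the upper end is midnight.
in_between = start <= late_row <= end_date
# The half-open range keeps every moment of 31 December.
in_half_open = start <= late_row < next_day
```

Here in_between is False while in_half_open is True, which is why the half-open form is the safer choice for datetime columns.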
Use subquery for variables values init:
SELECT *
FROM myTable,
     ( SELECT @from_date := '2019-10-01', @end_date := '2019-12-31' ) init_vars
WHERE create_time BETWEEN @from_date AND @end_date;
Pay attention:
SELECT is used, not SET;
The assignment operator := is used; a plain = would be treated as a comparison here and give the wrong result;
The alias (init_vars) can be any name, but it is compulsory.
Variable is initialized once but may be used a lot of times.
If your query is complex (nested), the variables must be initialized in the innermost subquery, or in the first CTE if your DBMS version supports CTEs. If the query is extremely complex and you cannot tell which subquery is the innermost one, check the execution plan to see which table is scanned first, and use its subquery for the variable initialization (but remember that the execution plan may change at any moment).
I am trying to write a MySQL query that would count the number of times a value crosses a constant. The end result is that we are trying to determine the relative 'noise' of the value via its amplitude and frequency. MIN() and MAX() provide the amplitude. Count() gives the number of samples that fit the criteria, but it doesn't tell us how stable the value is. We are currently using MySQL 5.7, but we will be moving to MySQL 8.0, which provides the windowing features. Something like
Select Count(Value) over (order by logtime ROWS 1 Preceding <123 AND 1 Following > 123) WHERE logtime BETWEEN...;
Thank you for any help you can provide.
SELECT Count(Value) WHERE Value > 123 AND logtime BETWEEN...;
SELECT Count(Value) WHERE Value < 123 AND logtime BETWEEN...;
Window functions are not available in MySQL versions before 8.0.
With MySQL 5.7, we can emulate some window functions by using user-defined variables in a carefully crafted query. The MySQL Reference Manual gives an explicit warning about using user-defined variables in a context like this. We are relying on behavior that is not guaranteed.
But as an example of the pattern I would use to achieve the specified result:
SELECT SUM(c.crossed_avg) AS count_crossed_avg
  FROM (
         SELECT IF( ( @prval > a.avg_ AND t.value < a.avg_ ) OR
                    ( @prval < a.avg_ AND t.value > a.avg_ )
                  ,1,0) AS crossed_avg
              , @prval := t.value AS value_
           FROM mytable t
          CROSS
           JOIN ( SELECT 123 AS avg_ ) a
          CROSS
           JOIN ( SELECT @prval := NULL ) i
          WHERE ...
          ORDER BY t.logtime
       ) c
To unpack this, focus first on the inline view query; that is, ignore the SELECT SUM() wrapper query, and run just the inline view query.
We order the rows by logtime so that we can process the rows in order.
We compare the value on the current row to the value from the previous row. If one is above average and the other is below average, then we return a 1, else we return 0.
Save the current value into the user-defined variable for comparing the next row. (Note: the order of operations is important; we are depending on MySQL to do that assignment after the evaluation of the IF() function.)
The example query doesn't address the edge case when a row value is exactly equal to the average, e.g. a sequence of values 124.4, 123.0, 122.2. (We might want to consider changing the comparisons so that one includes the equality, e.g. < and >=.)
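The same row-by-row comparison can be sketched in Python, which also makes the edge case easy to see. This assumes the values are already ordered by logtime; the function name and the readings are hypothetical.

```python
def count_crossings(values, threshold):
    """Count strict crossings of threshold between consecutive values (hypothetical helper)."""
    crossings = 0
    previous = None  # plays the role of the user-defined variable
    for value in values:
        if previous is not None:
            went_down = previous > threshold and value < threshold
            went_up = previous < threshold and value > threshold
            if went_down or went_up:
                crossings += 1
        # Remember the current value only after the comparison, as in the query.
        previous = value
    return crossings

readings = [120, 125, 124, 121, 123, 126]  # made-up samples, ordered by time
crossed = count_crossings(readings, 123)
```

With these readings the count is 2: the move from 121 through 123 to 126 is not counted, because the middle value equals the threshold and both comparisons are strict.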
I'm currently attempting to calculate the timestamp differences between rows in Google BigQuery. Attached is the sample table I am using to test the code.
I am using this code
SELECT
A.row,
A.issue.updated_at,
(B.issue.updated_at - A.issue.updated_at) AS timedifference
FROM [icxmedia-servers:icx_metrics.gh_zh_data_production] A
INNER JOIN [icxmedia-servers:icx_metrics.gh_zh_data_production] B
ON B.row = (A.row + 1)
WHERE issue.number==6 and issue.name=="archer"
ORDER BY A.requestid ASC
Referenced from this question: Calculate the time difference between two rows
Rather than a JOIN, this is more naturally expressed using analytic functions. The documentation for analytic functions with standard SQL in BigQuery explains how analytic functions work and what the syntax is. As an example, if you wanted to take successive differences in x values where the order is determined by column y, you could do:
WITH T AS (
SELECT
x,
y
FROM UNNEST([9, 3, 4, 7]) AS x WITH OFFSET y)
SELECT
x,
x - LAG(x) OVER (ORDER BY y) AS x_diff
FROM T;
Note that to run this in BigQuery, you need to uncheck the "Use Legacy SQL" box under "Show Options" to enable standard SQL. The WITH T clause is simply setting up some data for the example.
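For intuition, the successive-difference logic of x - LAG(x) can be mirrored in a few lines of Python. This is only a sketch that assumes the rows are already in the desired order; the function name is hypothetical.

```python
def lag_diffs(xs):
    """Return x minus the previous x for each row; the first row has no predecessor."""
    diffs = []
    for index, x in enumerate(xs):
        if index == 0:
            diffs.append(None)  # LAG() yields NULL on the first row
        else:
            diffs.append(x - xs[index - 1])
    return diffs

x_diff = lag_diffs([9, 3, 4, 7])  # same sample values as the query above
```

The first entry is None, matching the NULL that LAG produces when there is no earlier row.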
For your specific case, you would probably want a query such as:
SELECT
row,
issue.updated_at,
issue.updated_at - LAG(issue.updated_at) OVER (ORDER BY issue.updated_at) AS timedifference
FROM `icxmedia-servers.icx_metrics.gh_zh_data_production`
WHERE issue.number = 6
AND issue.name = "archer"
ORDER BY requestid ASC;
If you want to determine differences in updated_at outside of just a single issue number, you could use a PARTITION BY clause as well. For example:
SELECT
row,
issue.name,
issue.number,
issue.updated_at,
issue.updated_at - LAG(issue.updated_at) OVER (
PARTITION BY issue.number
ORDER BY issue.updated_at) AS timedifference
FROM `icxmedia-servers.icx_metrics.gh_zh_data_production`
ORDER BY requestid ASC;
I am trying to convert below MySQL query to SQL Server.
SELECT
  @a := @a + 1 serial_number,
  a.id,
  a.file_assign_count
FROM usermaster a,
  workgroup_master b,
  (
    SELECT @a := 0
  ) AS c
WHERE a.wgroup = b.id
  AND file_assign_count > 0
I understand that := operator in MySQL assigns value to a variable & returns the value immediately. But how can I simulate this behavior in SQL Server?
SQL Server 2005 and later support the standard ROW_NUMBER() function, so you can do it this way:
SELECT ROW_NUMBER() OVER (ORDER BY xxxxx) AS serial_number,
a.id,
a.file_assign_count
FROM usermaster a
JOIN workgroup_master b ON a.wgroup = b.id
WHERE file_assign_count > 0
Re your comments: I edited the above to show the OVER clause. The row-numbering only has any meaning if you define the sort order. Your original query didn't do this, which means it's up to the RDBMS what order the rows are returned in.
But when using ROW_NUMBER() you must be specific. Where I put xxxxx, you would put a column or expression to define the sort order. See explanation and examples in the documentation.
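As a loose analogy for why the sort order matters, numbering rows in Python only gives stable serial numbers once the rows are sorted on something. The rows and the choice of id as the sort key below are hypothetical.

```python
# Made-up rows standing in for the query result.
rows = [
    {"id": 7, "file_assign_count": 3},
    {"id": 2, "file_assign_count": 5},
    {"id": 5, "file_assign_count": 1},
]

# Sort first (the ORDER BY inside OVER), then number from 1.
numbered = [
    {"serial_number": n, **row}
    for n, row in enumerate(sorted(rows, key=lambda r: r["id"]), start=1)
]
```

Changing the sort key changes which row gets which serial number, which is the point of having to spell it out in the OVER clause.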
The subquery setting @a := 0 is only for initializing that variable, and it doesn't need to be joined into the query anyway. It's just a style that some developers use. This is not needed in SQL Server, because you don't need the user variable at all when you can use ROW_NUMBER() instead.
If the SQL Server database is returning two rows where your MySQL database returned one row, the data must be different, because neither ROW_NUMBER() nor the MySQL user variables would limit the number of rows returned.
P.S.: please use JOIN syntax. It has been standard since 1992.
Is there a way to use window functions in MySQL queries dynamically within a SELECT query itself? (I know for a fact that it is possible in PostgreSQL).
For example, here is the equivalent query in PostgreSQL:
SELECT c_server_ip, c_client_ip, sum(a_num_bytes_sent) OVER
(PARTITION BY c_server_ip) FROM network_table;
However, what would be the corresponding query in MySQL?
Starting with MySQL 8.0, you can use OVER and PARTITION BY, so consider upgrading to the latest version :)
Hope this might work:
select A.c_server_ip, A.c_client_ip, B.mySum
from network_table A, (
select c_server_ip, sum(a_num_bytes_sent) as mySum
from network_table group by c_server_ip
) as B
where A.c_server_ip=B.c_server_ip;
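What the join against the grouped subquery computes can be sketched in Python: total the bytes per server once, then attach that total to every original row. The sample rows are hypothetical.

```python
from collections import defaultdict

# Made-up rows: (c_server_ip, c_client_ip, a_num_bytes_sent)
rows = [
    ("10.0.0.1", "192.168.0.5", 100),
    ("10.0.0.1", "192.168.0.6", 250),
    ("10.0.0.2", "192.168.0.5", 40),
]

# Step 1: the GROUP BY subquery, one total per server.
totals = defaultdict(int)
for server, _client, sent in rows:
    totals[server] += sent

# Step 2: the join, every row keeps its identity and gains the server total.
result = [(server, client, totals[server]) for server, client, _sent in rows]
```

Every input row survives in the output, which is the behaviour SUM(...) OVER (PARTITION BY ...) gives directly.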