Calculate median - mysql

Can somebody explain to me how the query below works?
It's a query to calculate the median latitude from the table. I've worn myself out trying to understand it and I still can't.
SELECT *
FROM station AS st
WHERE (
    SELECT COUNT(LAT_N)
    FROM station
    WHERE LAT_N < st.LAT_N
) = (
    SELECT COUNT(LAT_N)
    FROM STATION
    WHERE LAT_N > st.LAT_N
);

The median is the middle value in a collection, which means there are as many values above it as below.
So for each row in the table, the first subquery counts the number of rows where LAT_N is lower than in the current row, and the second counts the number of rows where it's higher. Then it only returns the rows where these counts are the same.
Note that this won't work in many situations. The simplest example is when there are an even number of distinct values:
1
2
3
4
The median should be 2.5 (the mean of the two middle values), which doesn't exist in the table.
Another case is when there are duplicate values:
1
1
2
The median should be 1. But for the value 1, the count of lower values is 0 while the count of higher values is 1, so the counts aren't equal and no row is returned.
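To see these cases concretely, here is a small sketch that runs the question's query against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here (an assumption made so the example is self-contained), the sample values are made up, and the query selects only LAT_N rather than *.

```python
import sqlite3

# The query from the question, selecting LAT_N instead of * for readability.
BALANCED_QUERY = """
SELECT LAT_N
FROM station AS st
WHERE (SELECT COUNT(LAT_N) FROM station WHERE LAT_N < st.LAT_N)
    = (SELECT COUNT(LAT_N) FROM station WHERE LAT_N > st.LAT_N)
"""


def balanced_rows(values):
    """Load values into a throwaway station table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE station (LAT_N REAL)")
    conn.executemany("INSERT INTO station VALUES (?)", [(v,) for v in values])
    rows = [row[0] for row in conn.execute(BALANCED_QUERY)]
    conn.close()
    return rows


print(balanced_rows([1, 2, 3]))     # odd count, distinct values: [2.0]
print(balanced_rows([1, 2, 3, 4]))  # even count: []
print(balanced_rows([1, 1, 2]))     # duplicates: []
```

Only the first case returns a row; the other two come back empty, which matches the explanation above.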

Oh that's some clever code! It's got a bug, but here's what it's trying to do.
We define median as the value(s) that have the same number of values greater and less than them. So here's the query in pseudocode:
for each station in st:
    compute number of stations with latitude greater than the current station's latitude
    compute number of stations with latitude less than the current station's latitude
    if these two values are equal, include the station in the result
Bug:
For tables with an even number of distinct values, the median should be defined as the mean of the two middle values. This code doesn't handle that.
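For comparison, one way to get a median that does average the two middle values is to sort, skip to the middle, and average one or two rows. The sketch below does that in SQLite through Python's sqlite3 module. It is not the asker's query and not MySQL syntax: it relies on SQLite accepting subqueries in LIMIT and OFFSET, which MySQL does not, so treat it as an illustration of the definition rather than a drop-in fix. The sample values are made up.

```python
import sqlite3

MEDIAN_QUERY = """
SELECT AVG(LAT_N)
FROM (
    SELECT LAT_N
    FROM station
    ORDER BY LAT_N
    -- take 1 row when the row count is odd, 2 rows when it is even
    LIMIT 2 - (SELECT COUNT(*) FROM station) % 2
    -- skip the rows that come before the middle
    OFFSET (SELECT (COUNT(*) - 1) / 2 FROM station)
)
"""


def median(values):
    """Return the median of values, computed by SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE station (LAT_N REAL)")
    conn.executemany("INSERT INTO station VALUES (?)", [(v,) for v in values])
    result = conn.execute(MEDIAN_QUERY).fetchone()[0]
    conn.close()
    return result


print(median([1, 2, 3, 4]))  # 2.5
print(median([1, 1, 2]))     # 1.0
```

Because it works on row positions rather than counts of strictly smaller and larger values, duplicates don't break it.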

Related

sum of differences between two rows in mysql

Above is a table, and I need to get the total distance covered by each tyre.
I'm looking for a way to take the differences and sum them to get the total distance covered.
The total distance is obtained by summing the differences between paired "removal" and "insert" actions.
The end result should be 1100 + 300 = 1400.
If there is always exactly one 'insert' row and one 'removal' row per tyre and position, you can use conditional aggregation to compute the distance covered for each (tyre, position) tuple, and then add another level of aggregation at tyre level:
select tyreId, sum(distance_covered) distance_covered
from (
    select
        tyreId,
        position,
        sum(case action when 'removal' then distance else - distance end) distance_covered
    from mytable
    where action in ('insert', 'removal')
    group by tyreId, position
) t
group by tyreId
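As a quick check of the conditional aggregation, here is a sketch that runs the same query in an in-memory SQLite database via Python's sqlite3 module. The original table isn't shown, so the rows below are hypothetical, chosen only so that the two positions contribute 1100 and 300; the column types are assumptions too.

```python
import sqlite3

DISTANCE_QUERY = """
SELECT tyreId, SUM(distance_covered) AS distance_covered
FROM (
    SELECT tyreId, position,
           SUM(CASE action WHEN 'removal' THEN distance ELSE -distance END)
               AS distance_covered
    FROM mytable
    WHERE action IN ('insert', 'removal')
    GROUP BY tyreId, position
) t
GROUP BY tyreId
"""

# Hypothetical rows: (tyreId, position, action, distance)
SAMPLE_ROWS = [
    (1, "front-left", "insert", 1000),
    (1, "front-left", "removal", 2100),     # 2100 - 1000 = 1100
    (1, "rear-right", "insert", 3000),
    (1, "rear-right", "inspection", 3100),  # filtered out by the WHERE clause
    (1, "rear-right", "removal", 3300),     # 3300 - 3000 = 300
]


def distance_per_tyre(rows):
    """Return {tyreId: total distance covered} for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(tyreId INTEGER, position TEXT, action TEXT, distance INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    totals = dict(conn.execute(DISTANCE_QUERY).fetchall())
    conn.close()
    return totals


print(distance_per_tyre(SAMPLE_ROWS))  # {1: 1400}
```

Each 'removal' reading counts positively and each 'insert' reading negatively, so the inner sum is the distance for one position and the outer sum adds the positions up.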

SQL query for percentile intervals, but equal number of rows per interval?

I would like to assign an index/group number to each row indicating what percentile it belongs to. For instance, the code I have now is:
SELECT User, Revenue,
(int)(100*(PERCENT_RANK() OVER(ORDER BY Revenue)))/5 AS Percentile
FROM RevenueDistribution;
So a Percentile 0 indicates 0-5%, 1 is 5-10%, etc. However, using equally spaced intervals means the number of rows is uneven, as the first interval has many more rows than the others because there are many rows with Revenue=0 in the data. For instance, there may be over a million users with Percentile=0, and then a few hundred thousand users each for Percentile=1,2,....
Is there a way to assign an index like this, but with an equal number of rows per interval, as opposed to having equally spaced intervals but with an unequal number of rows?
Based on your description I think that you are searching for NTILE:
NTILE(N) over_clause
Divides a partition into N groups (buckets), assigns each row in the partition its bucket number, and returns the bucket number of the current row within its partition. For example, if N is 4, NTILE() divides rows into quartiles. If N is 100, NTILE() divides rows into percentiles.
SELECT User, Revenue,
NTILE(100) OVER(ORDER BY Revenue) AS bucket_num
FROM RevenueDistribution;
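The difference between the two approaches shows up on a small sample with many zero revenues. The sketch below uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), with made-up data and 4 buckets of width 25 instead of 20 buckets of width 5 so the output stays short. The CAST syntax is SQLite's, not the (int) cast from the question.

```python
import sqlite3
from collections import Counter

# Equally spaced intervals, as in the question (scaled down to 4 buckets).
EQUAL_WIDTH = """
SELECT CAST(100 * PERCENT_RANK() OVER (ORDER BY Revenue) AS INTEGER) / 25 AS bucket
FROM RevenueDistribution
"""

# Equal number of rows per bucket, as in the answer.
EQUAL_COUNT = """
SELECT NTILE(4) OVER (ORDER BY Revenue) AS bucket
FROM RevenueDistribution
"""


def bucket_sizes(revenues, query):
    """Return {bucket: number of rows} for the given bucketing query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE RevenueDistribution (User TEXT, Revenue REAL)")
    conn.executemany(
        "INSERT INTO RevenueDistribution VALUES (?, ?)",
        [(f"user{i}", r) for i, r in enumerate(revenues)],
    )
    sizes = Counter(row[0] for row in conn.execute(query))
    conn.close()
    return dict(sizes)


# Hypothetical revenues: half the users have Revenue = 0.
revenues = [0, 0, 0, 0, 0, 0, 10, 20, 30, 40, 50, 60]
print(bucket_sizes(revenues, EQUAL_WIDTH))  # bucket 0 holds all six zeros
print(bucket_sizes(revenues, EQUAL_COUNT))  # three rows in each of 4 buckets
```

Note that NTILE splits ties: rows with the same Revenue can land in different buckets, which is the price of equal bucket sizes.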

smallest value between difference of two columns mysql

I want to select the row from a MySQL table where the absolute difference between two columns is the smallest of all the absolute differences.
I tried this syntax, but it was not right:
SELECT strike FROM options_20161230 ORDER BY ask - bid ASC LIMIT 1
I also wonder whether I can create a new column in the table as the difference between two columns. Is that possible?
In addition, I want to select rows where one column has a value between two numbers. I tried this:
SELECT strike FROM options_20161230 WHERE 7 < Expiration - Datadate < 37 AND type ='put' AND UnderlyingSymbol = 'SPY'
It works when Expiration - Datadate is limited by one value (< 37), but it does not work with both bounds.
Any ideas, please?
Many thanks
Your first query is close. You just want abs():
SELECT strike
FROM options_20161230
ORDER BY abs(ask - bid) ASC
LIMIT 1;
Your third query should use between (assuming the difference is an integer) or two inequalities:
SELECT strike
FROM options_20161230
WHERE Expiration - Datadate BETWEEN 8 AND 36 AND
type ='put' AND
UnderlyingSymbol = 'SPY';
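The reason the chained form misbehaves is that 7 < x < 37 is evaluated left to right as (7 < x) < 37: the first comparison yields 0 or 1, and both are less than 37, so the condition is true for every row. A small sketch with SQLite via Python's sqlite3 module shows this, along with the abs() ordering. SQLite parses the chained comparison the same way; the rows are hypothetical, and Expiration and Datadate are stored as plain day numbers here (an assumption) so that their difference is an integer.

```python
import sqlite3

# Hypothetical rows:
# (strike, bid, ask, Expiration, Datadate, type, UnderlyingSymbol)
SAMPLE_ROWS = [
    (200, 1.0, 1.5, 100, 95, "put", "SPY"),  # 5 days to expiration
    (210, 2.0, 2.1, 100, 80, "put", "SPY"),  # 20 days to expiration
    (220, 3.0, 3.4, 100, 50, "put", "SPY"),  # 50 days to expiration
]


def strikes(clause):
    """Run SELECT strike FROM options_20161230 followed by the given clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE options_20161230 (strike INTEGER, bid REAL, ask REAL, "
        "Expiration INTEGER, Datadate INTEGER, type TEXT, UnderlyingSymbol TEXT)"
    )
    conn.executemany(
        "INSERT INTO options_20161230 VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS
    )
    rows = [r[0] for r in conn.execute("SELECT strike FROM options_20161230 " + clause)]
    conn.close()
    return rows


# Chained comparison: matches every row, even those outside the range.
print(strikes("WHERE 7 < Expiration - Datadate < 37 ORDER BY strike"))
# BETWEEN: only the row that is 8 to 36 days from expiration.
print(strikes("WHERE Expiration - Datadate BETWEEN 8 AND 36 ORDER BY strike"))
# Smallest absolute bid/ask spread.
print(strikes("ORDER BY abs(ask - bid) ASC LIMIT 1"))
```

The first call returns all three strikes, while the BETWEEN version returns only 210, which is also the strike with the narrowest spread in this sample.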

Select average value X of SQL table column while not grouping by X

For the purposes of my question, I have a database in a MySQL server with info on many taxi rides (it is comprised of two tables, history_trips and trip_info).
In history_trips, each row's useful data is comprised of a unique alphanumeric ID, ride_id, the name of the rider, rider, and the time the ride ended, finishTime as a Y-m-d string.
In trip_info, each row's useful data similarly contains ride_id and rider, but also contains an integer, value (calculated in the back end from other data).
What I need to do is create a query that finds the average of the maximum 'values' of all riders in a given time period. A rider is only included in this average if they completed fewer than X (let's say 3) rides within that time period.
So far, I have a query that creates a grouped table containing the name of the rider, the finishTime of their highest 'value' ride, the value of said ride, and the number of rides, num_rides, they have taken in that time period. The AVG(b.value) column, however, gives me the same values as b.value, which is unexpected. I would like to find some way to return the average of the b.value column.
SELECT a.rider, a.finishTime, b.value, AVG(b.value), COUNT(a.rider) as num_rides
FROM history_trips as a, trip_info as b
WHERE a.finishTime > 'arbitrary_start_date_str' and a.ride_id = b.ride_id
    and b.value = (SELECT MAX(value)
                   from trip_info where rider = b.rider and ride_id = b.ride_id)
GROUP BY a.rider
HAVING COUNT(a.rider) < 3
I am a novice in SQL but have read on some other forums that when using the AVG function on a value you must also GROUP BY that value. I was wondering if there is a way around that or if I am thinking of this problem incorrectly. Thanks in advance for any advice / solutions you might have!
The following worked for me:
SELECT AVG(ridergroups.maxvalues) avgmaxvalues FROM
    (SELECT MAX(trip_info.value) maxvalues FROM trip_info
     INNER JOIN history_trips
         ON trip_info.ride_id = history_trips.ride_id
     WHERE history_trips.finishTime > '2010-06-20'
     GROUP BY trip_info.rider
     HAVING COUNT(trip_info.rider) < 3) ridergroups;
The subquery groups the maximum values by rider after filtering by date and rider count. The containing query calculates the average of the maximum values.
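Here is a sketch of that two-level aggregation on a few made-up rows, run in SQLite through Python's sqlite3 module. It assumes the join column is named ride_id in both tables, as the question describes, and the riders, dates and values are hypothetical.

```python
import sqlite3

AVG_MAX_QUERY = """
SELECT AVG(ridergroups.maxvalues) AS avgmaxvalues
FROM (
    SELECT MAX(trip_info.value) AS maxvalues
    FROM trip_info
    INNER JOIN history_trips ON trip_info.ride_id = history_trips.ride_id
    WHERE history_trips.finishTime > '2010-06-20'
    GROUP BY trip_info.rider
    HAVING COUNT(trip_info.rider) < 3
) ridergroups
"""

# Hypothetical rides: (ride_id, rider, finishTime, value)
SAMPLE_RIDES = [
    ("r1", "ann", "2010-07-01", 10),
    ("r2", "ann", "2010-07-02", 30),   # ann: 2 rides, max 30
    ("r3", "bob", "2010-07-03", 50),   # bob: 1 ride, max 50
    ("r4", "cat", "2010-07-01", 100),
    ("r5", "cat", "2010-07-02", 200),
    ("r6", "cat", "2010-07-03", 300),  # cat: 3 rides, excluded by HAVING
    ("r7", "dan", "2010-01-01", 999),  # dan: before the start date, excluded
]


def average_of_maxima(rides):
    """Return the average of each qualifying rider's maximum value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history_trips (ride_id TEXT, rider TEXT, finishTime TEXT)")
    conn.execute("CREATE TABLE trip_info (ride_id TEXT, rider TEXT, value INTEGER)")
    for ride_id, rider, finish_time, value in rides:
        conn.execute("INSERT INTO history_trips VALUES (?, ?, ?)",
                     (ride_id, rider, finish_time))
        conn.execute("INSERT INTO trip_info VALUES (?, ?, ?)",
                     (ride_id, rider, value))
    result = conn.execute(AVG_MAX_QUERY).fetchone()[0]
    conn.close()
    return result


print(average_of_maxima(SAMPLE_RIDES))  # (30 + 50) / 2 = 40.0
```

The inner query produces one maximum per rider, so the outer AVG has no GROUP BY and collapses everything to a single number.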

Calculate average date difference between records in MS Access

I have a list on when items have been handed out. The table has the following structure:
primary key - autonumber itemname
itemid - number
datehandedout - date/time
I want to calculate the average length of time between when one object is given out and when the next one is given out. There will be a number of different items, and the average time between handouts needs to be listed for each of them.
So something like (pseudocode):
average( [thisrecord]![datehandedout] - [lastrecord]![datehandedout] )
Any help will be much appreciated.
This is a very slow query:
SELECT Avg(DateDiff("h", t.datehandedout, (
    SELECT Min(tx.datehandedout)
    FROM tbl AS tx
    WHERE tx.datehandedout > t.datehandedout))) AS Difference
FROM tbl AS t
Add a WHERE clause to limit the number of records processed while you test, for example:
WHERE Year([datehandedout])=2010
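The underlying idea, pairing each handout with the next later one and averaging the gaps, can also be sketched outside Access. The Python below is only an illustration of that logic, not a replacement for the query: it groups by itemid (which the question asks for), works in fractional hours, and uses made-up records.

```python
from collections import defaultdict
from datetime import datetime


def average_gap_hours(records):
    """Return {itemid: mean hours between consecutive handouts}."""
    by_item = defaultdict(list)
    for itemid, handed_out in records:
        by_item[itemid].append(handed_out)

    averages = {}
    for itemid, dates in by_item.items():
        dates.sort()
        # Difference between each handout and the next one, in hours.
        gaps = [
            (later - earlier).total_seconds() / 3600
            for earlier, later in zip(dates, dates[1:])
        ]
        if gaps:  # an item handed out only once has no gap to average
            averages[itemid] = sum(gaps) / len(gaps)
    return averages


# Hypothetical records: (itemid, datehandedout)
records = [
    (1, datetime(2010, 1, 1, 9)),
    (1, datetime(2010, 1, 2, 9)),   # 24 h after the previous handout
    (1, datetime(2010, 1, 4, 9)),   # 48 h after the previous handout
    (2, datetime(2010, 1, 1, 9)),
    (2, datetime(2010, 1, 1, 21)),  # 12 h after the previous handout
    (3, datetime(2010, 1, 5, 9)),   # handed out once, so no average
]
print(average_gap_hours(records))  # {1: 36.0, 2: 12.0}
```

Sorting once per item and walking adjacent pairs avoids the per-row lookup that makes the correlated subquery slow.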