Creating buckets to categorize the values using MySQL - mysql

Hi, I am trying to create buckets for a very large number of rows. The maximum value is 9759721 and the minimum value is 1006909. I would like to show the results as follows:
distance  bucket  range
1006909   0       1000000 - 1009999
1013525   1       1010000 - 1019999
1021948   2       1020000 - 1029999
The table might not be very clear, but in general I would like to break the values down in steps of 10000, creating a new bucket every 10000 starting from 1000000.
I tried the following code, but it doesn't produce the correct output.
select distance,floor(distance/10000) as _floor from data;
I got something like:
distance  bucket
1006909   100
1013525   101
1021948   102
1035472   103
1042069   104
9759721   975
This seems to be correct, but I need the bucket to start from 0 and then increase with every step of 10000, and I need a range column as well. The minimum value I have for distance is 1006909, so the data doesn't start at 0. Is it still possible to have a bucket column starting from 0 (i.e. assigned to the minimum distance)?

SELECT
  d.distance,
  DENSE_RANK() OVER (ORDER BY d._floor) - 1 AS bucket,
  d._floor * 10000 AS bucket_lower_limit,
  d._floor * 10000 + 10000 AS bucket_upper_limit
FROM
(
  SELECT
    distance,
    FLOOR(distance / 10000) AS _floor
  FROM
    data
)
  AS d
NOTE: this will give buckets numbered from 0 upwards, but it will also remove all gaps (so in your sample data the last row will be in bucket 5, not bucket 975).
Alternatively, if you need to preserve the gaps...
SELECT
  d.distance,
  d._floor - MIN(d._floor) OVER () AS bucket,
  d._floor * 10000 AS bucket_lower_limit,
  d._floor * 10000 + 10000 AS bucket_upper_limit
FROM
(
  SELECT
    distance,
    FLOOR(distance / 10000) AS _floor
  FROM
    data
)
  AS d
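The difference between the two numbering schemes can be checked outside the database. The following is a minimal Python sketch, not the answerer's code: it reuses the six sample distances from the question, and the names WIDTH, rank_of, dense_buckets and offset_buckets are made up for illustration.

```python
# Sample distances taken from the question.
distances = [1006909, 1013525, 1021948, 1035472, 1042069, 9759721]
WIDTH = 10000  # bucket width (hypothetical constant name)

# Equivalent of FLOOR(distance / 10000) for positive integers.
floors = [d // WIDTH for d in distances]

# Gap-free numbering, mirroring DENSE_RANK() OVER (ORDER BY _floor) - 1.
rank_of = {f: i for i, f in enumerate(sorted(set(floors)))}
dense_buckets = [rank_of[f] for f in floors]

# Gap-preserving numbering, mirroring _floor - MIN(_floor) OVER ().
offset_buckets = [f - min(floors) for f in floors]

print(dense_buckets)   # [0, 1, 2, 3, 4, 5]
print(offset_buckets)  # [0, 1, 2, 3, 4, 875]
```

With gaps preserved, the last sample row lands in bucket 875 (975 minus the minimum floor of 100), whereas the dense ranking assigns it bucket 5.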

Just calculate 1006909 div 10000 * 10000 = 1000000 and subtract it from distance. That'll make the buckets start from 0:
SELECT distance
     , (distance - a) div 10000 AS bucket
     , distance div 10000 * 10000 AS range_from
     , distance div 10000 * 10000 + (10000 - 1) AS range_to
FROM t
CROSS JOIN (
    SELECT MIN(distance) div 10000 * 10000 AS a
    FROM t
) AS x
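To see what each column of this query evaluates to, here is a small Python sketch of the same integer arithmetic. It assumes positive integer distances, for which Python's // behaves like MySQL's DIV; the function name bucket_row and the constant WIDTH are hypothetical.

```python
WIDTH = 10000  # bucket width (hypothetical constant name)

def bucket_row(distance, min_distance):
    """Return (bucket, range_from, range_to) for one distance."""
    base = min_distance // WIDTH * WIDTH     # the subquery's `a`
    bucket = (distance - base) // WIDTH      # 0 for the lowest range
    range_from = distance // WIDTH * WIDTH   # lower bound of the range
    range_to = range_from + (WIDTH - 1)      # inclusive upper bound
    return bucket, range_from, range_to

print(bucket_row(1013525, 1006909))  # (1, 1010000, 1019999)
```

Because the base is rounded down to a multiple of 10000 before subtracting, the bucket boundaries stay aligned with the range_from and range_to columns.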

Related

Compare current value with previous and list the results if meet criteria

I have a raw table that consists of timestamp, location, and metric. I would like to generate another table that outputs only the rows where the current row is lower than 40% of the previous row, along with the difference between the two as a percentage.
Example Input:
Timestamp      location  metric
2021-10:00:00  Dallas    150
2021-10:05:00  Dallas    120
2021-10:10:00  Dallas    180
2021-10:15:00  Dallas    100
2021-10:20:00  Dallas    59
2021-10:25:00  Dallas    100
Expected Output:
Timestamp      location  metric  percentage
2021-10:15:00  Dallas    100     56%
2021-10:20:00  Dallas    59      59%
I think you want lag():
select t.*, metric / prev_metric as ratio
from (select t.*,
             lag(metric) over (partition by location order by timestamp) as prev_metric
      from t
     ) t
where metric < 0.4 * prev_metric;
Consider below
select * from (
  select *, round(100 * metric / lag(metric) over win, 0) calc
  from `project.dataset.table`
  window win as (partition by location order by timestamp)
)
where calc < 60
if applied to sample data in your question - output is
Note: I believe your sample data had typos in values of timestamp - last few rows all had 2021-10:15:00 value, so I assume this was a typo - thus fixed it in your question
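The two answers filter on different thresholds (below 40% of the previous value versus below 60%), so it may help to run both against the sample data. This is a rough Python sketch of the lag comparison for a single location; the function name drops is made up for illustration.

```python
# Metric values for Dallas, in timestamp order, from the question.
metrics = [150, 120, 180, 100, 59, 100]

def drops(values, threshold):
    """Return (metric, percent_of_previous) for rows whose ratio to the
    previous row is below `threshold`."""
    rows = []
    # zip pairs each value with its predecessor, like LAG(metric).
    for prev, cur in zip(values, values[1:]):
        ratio = cur / prev
        if ratio < threshold:
            rows.append((cur, round(100 * ratio)))
    return rows

print(drops(metrics, 0.6))  # [(100, 56), (59, 59)]
print(drops(metrics, 0.4))  # []
```

On this data only the 60% threshold reproduces the two rows in the expected output; none of the sample rows falls below 40% of its predecessor.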

How to get previous row value that is a mathematical argument with alias in mysql query?

I am having trouble getting the previous row's value in a query. The query has a field percent that is computed as ROUND(COUNT(DISTINCT userSteps.userid) / (SELECT COUNT(*) FROM userSteps) * 100.0, 2) AS percent
The idea is to take the difference between the current row's percent and the previous row's percent and use it as the value of the delta field. On the first row it should be 0, since there is no row before it. Any ideas?
I was planning to have (percent - <previous of percent value>) as delta
I tried to use LAG(), but it gave me an unknown column error.
SET @startDate='';
SET @endDate='';
SET @version='';
WITH userSteps AS
(SELECT ... )
SELECT steps.version as version,
steps.step as step,
steps.stepDesc as stepDesc,
COUNT(DISTINCT userSteps.userid) AS numUsers,
ROUND(COUNT(DISTINCT userSteps.userid) / (SELECT COUNT(*) FROM userSteps) * 100.0, 2) AS percent,
LAG(percent, 1) OVER (
PARTITION BY steps.version
ORDER BY steps.step
) delta
FROM
(SELECT ...) steps,
userSteps
WHERE userSteps.version = steps.version AND userSteps.maxStep >= steps.step
GROUP BY steps.version, steps.step;
This is the part I inserted that causes the issue:
LAG(percent, 1) OVER (
PARTITION BY steps.version
ORDER BY steps.step
) delta
I expect that it will look like this one.
version step stepdesc numUsers percent delta
1 1 .. 10 100 0
1 2 .. 5 98 2
1 3 .. 3 92 6
1 4 .. 4 90 2
1 5 .. 8 80 10

MySQL change return values

Here's my problem: I have two tables - zipcodes table and vendors table.
What I want to do is, when I enter a zip code, to get all vendors (based on their zip code) within a certain radius. I got it working so far.
But here's the thing. I need to divide the results based on the distance. I need to have several groups: within 10 miles, within 50 miles, and within 100 miles. What I want to do (if possible) is to change all values under 10 miles to 10, those between 11 and 50 to 50 and those between 51 and 100 to 100.
Here is my query so far, that returns the correct results. I need help how to substitute the actual distance values with those I need.
SELECT SQL_CALC_FOUND_ROWS
3959 * 2 * ASIN(SQRT(POWER(SIN(( :lat - zipcodes.zip_lat) * pi()/180 / 2), 2) + COS( :lat * pi()/180) * COS(zipcodes.zip_lat * pi()/180) * POWER(SIN(( :lon - zipcodes.zip_lon) * pi()/180 / 2), 2))) AS distance,
vendors.*
FROM
g_vendors AS vendors
INNER JOIN g_zipcodes AS zipcodes ON zipcodes.zip_code = vendors.vendor_zipcode
WHERE
vendors.vendor_status != 4
GROUP BY
vendors.vendor_id
HAVING distance < 100
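Use a CASE expression:
-- distance is fractional, so open-ended comparisons are used here;
-- BETWEEN 11 AND 50 would send a value such as 10.5 to the ELSE branch
SELECT t.*,
       CASE WHEN t.distance <= 10 THEN 10
            WHEN t.distance <= 50 THEN 50
            ELSE 100
       END AS new_distance
FROM ( Your Query Here ) t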
Add a new column to your SELECT-Part containing a number to represent the distances:
3 -> within 10 miles
2 -> within 50 miles
1 -> within 100 miles
code:
CAST((distance < 10) AS SIGNED INTEGER) + CAST((distance < 50) AS SIGNED INTEGER) + CAST((distance < 100) AS SIGNED INTEGER) AS goodName
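Both answers map a continuous distance onto one of three groups. The sketch below restates them in Python so the boundary behaviour is easy to check. It assumes the upper limit of each group is inclusive in the first mapping (10 and 50 belong to the lower group), which the question does not spell out; the function names are hypothetical.

```python
def distance_group(distance):
    """Map a distance in miles to its group label: 10, 50 or 100."""
    if distance <= 10:
        return 10
    if distance <= 50:
        return 50
    return 100

def closeness_score(distance):
    """Sum of comparisons: 3 within 10 miles, 2 within 50, 1 within 100."""
    return int(distance < 10) + int(distance < 50) + int(distance < 100)

for d in (3.2, 10.5, 75.0):
    print(d, distance_group(d), closeness_score(d))
```

A fractional value such as 10.5 is the interesting case: it belongs to the 50-mile group, which is why the comparisons avoid leaving a gap between 10 and 11.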

how to fix query mysql with multiple sum

I have a query data from sum function:
ROUND(((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2)/100,2) AS nominal_persentasi,
ROUND((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))*(1.1/100)/100,2) AS tambah_persentasi,
ROUND((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))+((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))*(1.1/100))/100,2) AS total_penyesuaian
And the results are:
nominal_persentasi | tambah_persentasi | total_penyesuaian
12.000 3.000 1.203.000
The result should be 15,000. Why did this happen?
I tried to sum nominal_persentasi + tambah_persentasi, but the result is 0.
You are missing a division by 100 in your total. Hence, instead of adding 12,000 and 3,000 to get 15,000, you were actually adding 1,200,000 and 3,000 to get 1,203,000.
SELECT ROUND(( (nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2)/100,2) AS nominal_persentasi,
ROUND((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))*(1.1/100)/100,2) AS tambah_persentasi,
ROUND((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2)/100) + ((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))*(1.1/100))/100, 2) AS total_penyesuaian
FROM yourTable -- the first term of total_penyesuaian was missing this division by 100
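The effect of the missing division can be reproduced with the figures reported in the question. In this Python sketch the value of base is inferred from nominal_persentasi = 12,000 (which is base / 100), and tambah_persentasi is taken as reported rather than recomputed; both variable names are chosen for illustration.

```python
# Value of (nominal * 12 * rounded score) before dividing by 100,
# inferred from the reported nominal_persentasi of 12,000.
base = 1_200_000
tambah_persentasi = 3_000  # taken as reported in the question

# Original expression: the first term is never divided by 100.
total_without_division = base + tambah_persentasi

# Corrected expression: the first term is divided by 100 first.
total_with_division = base / 100 + tambah_persentasi

print(total_without_division)  # 1203000
print(total_with_division)     # 15000.0
```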

How to compute the standard deviation with a "number" column?

I have this table in MySQL :
value number_ads
1 3
2 1
3 1
3 1
4 1
I would like to compute the standard deviation of the column value, but taking into account that the value 1 for example should be counted 3 times.
The result should be :
AVG = 2.1429 STD = 1.124858267715973
I tried the following query, but I don't get the correct result:
SELECT
SUM(value * number_ads) / SUM(number_ads) AS avg,
SQRT((SUM(POW(value, 2)) - POW(2.1429, 2))/SUM(number_ads))
FROM `test`
Calculate the square root of the variance. The variance is the difference between the mean of the squares of the values and the square of the mean, i.e. Sum(x*x)/Count(n) - Mean*Mean.
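-- a column alias (avg) cannot be referenced in the same SELECT list,
-- so the weighted mean is written out again inside SQRT
SELECT
SUM(value * number_ads) / SUM(number_ads) AS avg,
SQRT(SUM(POW(value, 2) * number_ads) / SUM(number_ads) - POW(SUM(value * number_ads) / SUM(number_ads), 2)) AS std
FROM `test`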
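As a cross-check of the formula, here is a short Python sketch that computes the weighted mean and the population standard deviation from the question's table, treating number_ads as a repeat count. The variable names are illustrative only.

```python
import math

# (value, number_ads) rows from the question's table.
rows = [(1, 3), (2, 1), (3, 1), (3, 1), (4, 1)]

n = sum(w for _, w in rows)                     # total count: 7
mean = sum(v * w for v, w in rows) / n          # weighted mean
mean_sq = sum(v * v * w for v, w in rows) / n   # weighted mean of squares

# Variance = mean of squares - square of mean (population form).
std = math.sqrt(mean_sq - mean ** 2)

print(round(mean, 4), round(std, 6))  # 2.1429 1.124858
```

These values agree with the AVG and STD the question expects.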