MySQL NTILE function start with highest percentile - mysql

I'm using the MySQL NTILE function, and for the most part it does what I need. However, there is one case where I need different behaviour and I can't figure out how to get it: when I have more buckets than records.
So let's say my data, in a table called data, looks like this:
ID val
1 15
2 20
3 10
My issue is when I have more buckets than records, so let's say I run
select *, NTILE(4) over (order by val) from data
This will result in
ID val NTILE
3 10 1
1 15 2
2 20 3
I'm having some trouble wording my question, which is probably why I am struggling to find solutions on Google, but basically my question is this: when I have more buckets than records (in this example 4 buckets but only 3 records), is there any way to treat the highest value as the highest percentile and work backwards, rather than treating the lowest value as the lowest percentile as it currently does? Essentially resulting in this:
ID val NTILE
2 20 4
1 15 3
3 10 2

I think you might be able to reverse the ordering in the NTILE() and numerically flip the result like so:
select *, 5-NTILE(4) over (order by val desc) from data
I would expect the following to happen (I have not run this though!):
ID val NTILE
2 20 4
1 15 3
3 10 2
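As a quick sanity check of the flip without a database, the sketch below emulates NTILE in plain Python. It assumes the usual NTILE rule (rows are split as evenly as possible, and the earliest buckets take any extra rows); the `ntile` helper and the variable names are illustrative only and are not part of MySQL.

```python
def ntile(n_rows, buckets):
    """Return 1-based bucket numbers for n_rows already-ordered rows.

    Assumed rule: every bucket gets n_rows // buckets rows, and the first
    n_rows % buckets buckets get one extra row.
    """
    base, extra = divmod(n_rows, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


data = [(1, 15), (2, 20), (3, 10)]  # (ID, val) rows from the question
buckets = 4

# Equivalent of: buckets + 1 - NTILE(buckets) OVER (ORDER BY val DESC)
descending = sorted(data, key=lambda row: row[1], reverse=True)
flipped = [
    (row_id, val, buckets + 1 - tile)
    for (row_id, val), tile in zip(descending, ntile(len(descending), buckets))
]

for row in flipped:
    print(row)
```

With the three sample rows this reproduces the table the asker wanted, with the highest value in bucket 4.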

Related

How to sum and get average of dynamic rows

I am new to MySQL and any help will be much appreciated.
I have an assignment to get the sum and average of a dynamic number of rows in a table. The table looks like this:
tbl_grade
id scores
1 10
1 11
1 9
1 10
1 6
2 10
2 9
2 10
I want to show the results like this
id sum average
1 46 9.2
2 29 9.7
I hope someone can help. Thanks
This is a simple query using the GROUP BY clause and the aggregate functions SUM and AVG. For a better understanding of grouping and aggregate functions, see: http://www.mysqltutorial.org/mysql-group-by.aspx
SELECT
id,
SUM(scores),
AVG(scores)
FROM
tbl_grade
GROUP BY
id
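To try the GROUP BY query without a MySQL server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite stands in for MySQL here, which is an assumption; the `ROUND(..., 1)` call is added only to match the one-decimal averages shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_grade (id INTEGER, scores INTEGER)")
conn.executemany(
    "INSERT INTO tbl_grade VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 9), (1, 10), (1, 6), (2, 10), (2, 9), (2, 10)],
)

# One output row per id, with the sum and the average of that id's scores.
rows = conn.execute(
    """
    SELECT id, SUM(scores), ROUND(AVG(scores), 1)
    FROM tbl_grade
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

The number of rows per id does not matter; GROUP BY collects however many there are.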

Spotfire intersect first 'n' periods

Is there a way to use an Over and Intersect function to get the average sales for the first 3 periods (not always consecutive months, sometimes a month is skipped) for each Employee?
For example:
EmpID 1 is 71.67 ((80 + 60 + 75)/3) despite skipping "3/1/2007"
EmpID 3 is 250 ((350 + 250 + 150)/3).
I'm not sure how EmpID 2 would work because there are just two data points.
I've used a workaround: a calculated column using DenseRank over Date, "asc", EmpID, then another Boolean calculated column that is true where the DenseRank column is <= 3, and then Over functions over the Boolean=TRUE rows. But I want to figure out the correct way to do this.
There are Last 'n' Period functions but I haven't seen anything resembling a First 'n' Period function.
EmpID Date Sales
1 1/1/2007 80
1 2/1/2007 60
1 4/1/2007 75
1 5/1/2007 30
1 9/1/2007 100
2 2/1/2007 200
2 3/1/2007 100
3 12/1/2006 350
3 1/1/2007 250
3 3/1/2007 150
3 4/1/2007 275
3 8/1/2007 375
3 9/1/2007 475
3 10/1/2007 300
3 12/1/2007 200
I suppose the solution depends on where you want this data represented, but here is one example:
If((Rank([Date],"asc",[EmpID])<=3) and (Max(Rank([Date],"asc",[EmpID])) OVER ([EmpID])>=3),Avg([Sales]) over ([EmpID]))
You can insert this as a calculated column and it will give you what you want (assuming your data is sorted by date when imported).
You may want to see the row numbering; in that case, insert this as a calculated column as well and name it RN:
Rank([Date],"asc",[EmpID])
Explanation
Rank([Date],"asc",[EmpID])
This part of the function is basically applying a row number (labeled as RN in the results below) to each EmpID grouping.
Rank([Date],"asc",[EmpID])<=3
This is how we take the top 3 rows regardless of whether months are skipped. If your data isn't sorted, we'd have to create one additional calculated column, but the same logic applies.
(Max(Rank([Date],"asc",[EmpID])) OVER ([EmpID])>=3)
This is where we are basically ignoring EmpID = 2, or any EmpID who doesn't have at least 3 rows. Removing this would give you the average (dynamically) for each EmpID based on their first 1, 2, or 3 months respectively.
Avg([Sales]) over ([EmpID])
Now that our data is limited to the rows we care about, just take the average for each EmpID.
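Outside Spotfire, the same "first 3 periods" idea can be sketched in a few lines of Python using the sample data from the question. This is only an illustration of the logic (rank by date within each EmpID, keep ranks 1 to 3, skip employees with fewer than 3 rows), not a Spotfire expression; `first_n` and `first_n_avg` are made-up names.

```python
from collections import defaultdict
from datetime import datetime

sales = [
    (1, "1/1/2007", 80), (1, "2/1/2007", 60), (1, "4/1/2007", 75),
    (1, "5/1/2007", 30), (1, "9/1/2007", 100),
    (2, "2/1/2007", 200), (2, "3/1/2007", 100),
    (3, "12/1/2006", 350), (3, "1/1/2007", 250), (3, "3/1/2007", 150),
    (3, "4/1/2007", 275), (3, "8/1/2007", 375), (3, "9/1/2007", 475),
    (3, "10/1/2007", 300), (3, "12/1/2007", 200),
]

# Group rows by employee, parsing the dates so they sort chronologically.
by_emp = defaultdict(list)
for emp_id, date_text, amount in sales:
    by_emp[emp_id].append((datetime.strptime(date_text, "%m/%d/%Y"), amount))

first_n = 3
first_n_avg = {}
for emp_id, rows in by_emp.items():
    rows.sort()  # earliest date first, skipped months do not matter
    if len(rows) >= first_n:  # ignore employees with fewer than 3 periods
        first_n_avg[emp_id] = sum(amount for _, amount in rows[:first_n]) / first_n

for emp_id, avg in sorted(first_n_avg.items()):
    print(emp_id, round(avg, 2))
```

Dropping the `len(rows) >= first_n` check would instead average whatever periods exist, so EmpID 2 would get the average of its two rows.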
@Chris - Here is the solution I came up with
Step 1: Inserted a calculated column 'rank' with the expression below
DenseRank([Date],"asc",[EmpID])
Step 2: Created a cross table visualization from the data table and limited data with the expression below

mysql - get the average of the output average

I have 3 tables: final, milestone and milestonewp. The three tables are linked by foreign keys like milestonewp<--FK--milestone<--FK--final. I have a column for determining the average of milestonewp for a certain foreign key. I then want to average those averages again and display the result in the final table. Here is my visual representation:
milestonewp
condition | mile_id
20 1
20 1
30 1
21 2
21 2
31 2
40 3
30 3
50 3
How can I average the averages that the table above will produce?
I'm trying to work with this:
select avg(milewp_condition)
from logs_pms_r_milestone_wp
where mile_id=1;
but I don't have any idea how to produce it for the other mile_id values.
EDIT
The above code will produce something like this
avg(milewp_condition)
0
0
0
So then, I also want to average those 3 rows.
If I understand correctly, this should be what you are looking for:
SELECT AVG(milewp_condition)
FROM logs_pms_r_milestone_wp
GROUP BY mile_id;
If you want the average over all rows, just do:
SELECT AVG(milewp_condition)
FROM logs_pms_r_milestone_wp;
Regards
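For a literal "average of the averages", one option is to wrap the GROUP BY query in a subquery and average its result. The sketch below runs that against an in-memory SQLite database from Python; SQLite standing in for MySQL is an assumption, and the `mile_avg` and `per_mile` aliases are hypothetical names. With the sample data each mile_id has three rows, so the result coincides with the plain average over all rows; with groups of unequal size the two can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs_pms_r_milestone_wp (milewp_condition INTEGER, mile_id INTEGER)"
)
conn.executemany(
    "INSERT INTO logs_pms_r_milestone_wp VALUES (?, ?)",
    [(20, 1), (20, 1), (30, 1), (21, 2), (21, 2), (31, 2), (40, 3), (30, 3), (50, 3)],
)

# Inner query: one average per mile_id. Outer query: average of those averages.
avg_of_avgs = conn.execute(
    """
    SELECT AVG(mile_avg)
    FROM (
        SELECT AVG(milewp_condition) AS mile_avg
        FROM logs_pms_r_milestone_wp
        GROUP BY mile_id
    ) AS per_mile
    """
).fetchone()[0]

# Plain average over every row, for comparison.
overall_avg = conn.execute(
    "SELECT AVG(milewp_condition) FROM logs_pms_r_milestone_wp"
).fetchone()[0]

print(avg_of_avgs, overall_avg)
```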

Calculate max value of list of numbers with a maximum combination of "x"

OK, I'm not sure if I can explain this right.
Let's say I have a table with three columns (id, price, maxcombo).
Maybe there are 5 rows in this table with random numbers for price (id is just an incremental unique key).
maxcombo specifies that the price can be in a combination of up to whatever number it is.
If x was 3, I would need to find the combination that has the maximum value of the sum of 1-3 columns.
So say the table had:
1 - 100 - 1
2 - 50 - 3
3 - 10 - 3
4 - 15 - 3
5 - 20 - 2
The correct answer would be just row id 1,
since 100 alone (and it can only be alone, based on the maxcombo number)
is greater than, say, 50 + 20 + 15 or 20 + 15 or 10 + 20 etc.
Does that make sense?
I mean, I could just calculate all the different combinations and see which has the largest value, but I would imagine that would take a very long time if the table was larger than 5 rows.
I was wondering if any math genius or super dev out there has some advice or a creative way to figure this out more efficiently.
Thanks ahead of time!
I built this solution to achieve the desired query. However, it hasn't been tested in terms of efficiency.
Following the example of columns 1-3:
SELECT max(a+b+c) FROM sample_table WHERE a < 3;
EDIT:
Looking at:
The correct answer will be just row id 1
...I considered that maybe I misunderstood your question, and you want the query to just obtain the row id. So, I made this other one:
SELECT a FROM sum_combo WHERE a+b+c=(
SELECT max(a+b+c) FROM sum_combo WHERE a > 3
);
Which would for sure take too long in tables larger than just 5 rows.
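Under one reading of the question, enumerating every combination is not needed. The sketch below assumes that a combination of k rows is valid only if every row in it has maxcombo >= k, and that k may be at most x. Under that assumption, the best combination of each size k is simply the k highest prices among the eligible rows, so only x candidates have to be compared. The function name `best_combo` is hypothetical.

```python
rows = [  # (id, price, maxcombo) from the question
    (1, 100, 1),
    (2, 50, 3),
    (3, 10, 3),
    (4, 15, 3),
    (5, 20, 2),
]


def best_combo(rows, x):
    """Return (best_total, ids) over combination sizes 1..x.

    Assumption: a row may appear in a combination of size k only if its
    maxcombo is at least k.
    """
    best_total, best_ids = None, []
    for k in range(1, x + 1):
        eligible = sorted(
            (row for row in rows if row[2] >= k),
            key=lambda row: row[1],
            reverse=True,
        )
        if len(eligible) < k:
            continue  # not enough rows allowed in a combination this large
        chosen = eligible[:k]  # the k highest eligible prices
        total = sum(row[1] for row in chosen)
        if best_total is None or total > best_total:
            best_total, best_ids = total, [row[0] for row in chosen]
    return best_total, best_ids


print(best_combo(rows, 3))
```

For the sample table the candidates are 100 (size 1), 70 (size 2) and 75 (size 3), so row id 1 wins on its own. Each size costs one sort, so the work grows with x times the table size rather than with the number of combinations.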

MySQL: Matching inexact values using "ON"

I'm way out of my league here...
I have a mapping table (table1) to assign particular values (value) to a whole number (map_nu). My second table (table2), is a collection of averages (avg) for each user (user_id).
(I couldn't figure out how to properly make a markdown table, please feel free to edit!)
table1:              table2:
(value)  (Map_nu)    (user_id)  (avg)
-------  --------    ---------  -------
1        1           1          1.111
1.045    2           2          1.2
1.09     3           3          1.33333
1.135    4           4          1
1.18     5           5          1.389
1.225    6           6          1.42
1.27     7           7          1.07
1.315    8
1.36     9
1.405    10
The value Map_nu is a special number that each user gets assigned according to their average. I need to find a way to match the averages from table2 to the closest value in table1. I only need to match to the second digit past the decimal point, so I've added the TRUNCATE function:
SELECT table2.user_id, map_nu
FROM `table1`
JOIN table2 ON TRUNCATE(table1.value,2)=TRUNCATE(table2.avg,2)
I still miss the values that don't match the averages exactly. Is there a way to pick the nearest truncated value, or even to round to the second decimal? Rounding up or down won't matter as long as it's applied to all values the same way.
I am trying to have the following result (if rounded up):
(user_id)(Map_nu)
----
1 4
2 6
3 6
4 1
5 10
6 11
7 3
Thanks!
I think you might have to do this in 2 separate queries. There is no 'nearest' operator in SQL, so you can either calculate it in your software, or you could use
select map_nu from table1 ORDER BY abs(value - $avg) LIMIT 1
inside a loop. However, that cannot be used as a join condition, as it requires the ORDER BY and LIMIT, which are not valid in joins.
Another way of looking at it: it seems that your map_nu and value are deterministic in relation to each other (value = 1 + ((map_nu - 1) * 0.045)), so maybe you could make use of that fact and calculate an integer based on that equation? This assumes the relationship holds true for all values of map_nu.
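The relationship above can be inverted to compute map_nu directly from an average, with no lookup table at all. The sketch below is an illustration only: it assumes value = 1 + (map_nu - 1) * 0.045 holds for every map_nu, takes the averages as strings so that `Decimal` avoids binary floating-point surprises, and uses a hypothetical function name `map_nu_for`.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP

STEP = Decimal("0.045")  # spacing between consecutive table1 values


def map_nu_for(avg_text, rounding=ROUND_CEILING):
    """Invert value = 1 + (map_nu - 1) * 0.045 for a given average.

    ROUND_CEILING picks the next value at or above the average ("rounded up"),
    ROUND_HALF_UP picks the nearest value.
    """
    steps = (Decimal(avg_text) - 1) / STEP
    return int(steps.to_integral_value(rounding=rounding)) + 1


averages = {1: "1.111", 2: "1.2", 3: "1.33333", 4: "1", 5: "1.389", 6: "1.42", 7: "1.07"}

rounded_up = {user: map_nu_for(avg) for user, avg in averages.items()}
nearest = {user: map_nu_for(avg, ROUND_HALF_UP) for user, avg in averages.items()}

print(rounded_up)
print(nearest)
```

Note that the formula does not stop at the last row of table1: an average of 1.42 rounds up to map_nu 11, which has no entry in the table as posted. For user 3 (1.33333) rounding up gives 9, while the question's expected table lists 6; the source does not say which is intended.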
This is an awkward database design. What is the data representing and what are you trying to solve? There might be a better way.
Maybe do something like...
SELECT a.user_id, b.map_nu, abs(a.avg - b.value)
FROM
table2 a
join table1 b
left join table1 c on abs(a.avg - b.value) > abs(a.avg - c.value)
where c.value is null
order by a.user_id
This doesn't actually produce the output you were expecting (it doesn't do any rounding), though you should be able to tweak it from there. The query above produces the output below (with the data you've provided):
user_id map_nu abs(a.avg - b.value)
------- ------ --------------------
1 3 0.0209999999999999
2 5 0.02
3 8 0.01833
4 1 0
5 10 0.016
6 10 0.0149999999999999
7 3 0.02
Beware, though, if you're dealing with large tables. Check the EXPLAIN output of the above query to see whether it is practical to run within MySQL or better done outside it.
Note 2: This will produce duplicate rows if there are avg values that are equidistant from two values within table1 (e.g. if the values for map_nu 11 and 12 are 2 and 3 and someone gets an avg of 2.5). Your question doesn't really specify what to do in that case, so you might want to take that into account.
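If the matching is done outside MySQL, the nearest-value logic is short. The following sketch uses the sample data from the question and breaks ties by taking the lower map_nu; that tie-break rule is an assumption, since the question does not specify one.

```python
table1 = [  # (value, map_nu)
    (1.0, 1), (1.045, 2), (1.09, 3), (1.135, 4), (1.18, 5),
    (1.225, 6), (1.27, 7), (1.315, 8), (1.36, 9), (1.405, 10),
]
table2 = [  # (user_id, avg)
    (1, 1.111), (2, 1.2), (3, 1.33333), (4, 1.0),
    (5, 1.389), (6, 1.42), (7, 1.07),
]

nearest = {}
for user_id, avg in table2:
    # Smallest absolute distance wins; on an exact tie the lower map_nu wins.
    value, map_nu = min(table1, key=lambda row: (abs(avg - row[0]), row[1]))
    nearest[user_id] = map_nu

for user_id, map_nu in nearest.items():
    print(user_id, map_nu)
```

This scans all of table1 for every user, which is fine for a ten-row mapping; a sorted list with binary search would scale better for a large one.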
It's taking a little extra work, but I figure the easiest way to get my results will be to map all values to the second decimal place in table1:
1 1
1.01 1
1.02 1
1.03 1
1.04 1
1.05 2
1.06 2
1.07 2
1.08 2
1.09 3
1.1 3
1.11 3
1.12 3
1.13 3
1.14 4
...
Thanks for the suggestions! Sorry I couldn't present the question more clearly.
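An expanded mapping like the one listed above does not have to be typed by hand. A minimal sketch, assuming each two-decimal key maps to the map_nu of the largest table1 value that does not exceed it (which is what the listed rows suggest): values are held as integer thousandths to avoid floating-point comparisons, and `map_nu_for_cents` is a hypothetical helper name.

```python
from bisect import bisect_right

# table1.value in thousandths (1.045 -> 1045), ascending; map_nu = position + 1
values_milli = [1000, 1045, 1090, 1135, 1180, 1225, 1270, 1315, 1360, 1405]


def map_nu_for_cents(cents):
    """map_nu for a two-decimal key given in hundredths (1.05 -> 105)."""
    # Count how many table1 values are <= the key; that count is the map_nu.
    return bisect_right(values_milli, cents * 10)


# Rebuild the first rows of the expanded table: 1.00 through 1.14.
expanded = [(cents / 100, map_nu_for_cents(cents)) for cents in range(100, 115)]

for key, map_nu in expanded:
    print(key, map_nu)
```

The generated rows can then be inserted into table1 so that the original TRUNCATE join finds an exact match for every average.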