Count number of elements per decile - mysql

I have a table that's something like this:
Name | Frequency
----------------
Bill | 12
Joe | 23
Hank | 1
Stew | 98
I need to figure out how many people make up each decile of total frequency. I.e. if total sum(frequency) is 10,000 then each decile will have size 1,000. I need to know how many people make up each 1000. Right now I have done:
with rankedTable as (select * from TABLE order by frequency desc limit XXXX)
select sum(frequency) from rankedTable
And I am changing the XXXX so that the sum(frequency) adds up to decile values (which I know from sum(frequency)/10). There has to be a faster way of doing this.

I think this can give the n-percentile a user belongs to. I use variables for readability, but they are not strictly necessary.
set @sum := (select sum(freq) from t);
set @n := 10; -- define the N in N-percentile
select b.name, b.freq, sum(a.freq) as cumulative_sum, floor(sum(a.freq) / @sum * @n) as percentile
from t a join t b on b.freq >= a.freq
group by b.name;
From this it is easy to count the members of each percentile:
select percentile, count(*) as `count`
from
(
select b.name, b.freq, sum(a.freq) as cumulative_sum, floor(sum(a.freq) / @sum * @n) as percentile
from t a join t b on b.freq >= a.freq
group by b.name
) x
group by percentile;
I hope this helps!
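As a quick check, the bucket query above can be run against the four sample rows from the question. This is only a minimal sketch using Python's built-in sqlite3 module instead of MySQL, so the user variables become bound parameters and floor() becomes a CAST (which truncates, equivalent here because the values are non-negative). The table name t and column freq follow the answer; the connection setup and the names N_BUCKETS and rows are assumptions for illustration.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT PRIMARY KEY, freq INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Bill", 12), ("Joe", 23), ("Hank", 1), ("Stew", 98)],
)

N_BUCKETS = 10  # the N in N-percentile (10 = deciles)
total = conn.execute("SELECT SUM(freq) FROM t").fetchone()[0]

# Inner query: cumulative frequency per person (smallest first), mapped to a bucket.
# Outer query: number of people per bucket.
rows = conn.execute(
    """
    SELECT bucket, COUNT(*) AS members
    FROM (
        SELECT b.name,
               CAST(SUM(a.freq) * ? * 1.0 / ? AS INTEGER) AS bucket
        FROM t a JOIN t b ON b.freq >= a.freq
        GROUP BY b.name
    ) x
    GROUP BY bucket
    ORDER BY bucket
    """,
    (N_BUCKETS, total),
).fetchall()

for bucket, members in rows:
    print(bucket, members)
```

Note that the join accumulates from the smallest frequency upward. Flipping the comparison to b.freq <= a.freq would accumulate from the largest downward, which is closer to the "order by frequency desc" attempt in the question.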

Related

mysql ranking based on sum of values

I have a MySQL table with multiple records belonging to many different users, like this:
id | score
----------
1 | 50
1 | 75
1 | 40
1 | 20
2 | 85
2 | 60
2 | 20
I need to get the rank of each id, but only after finding the sum of their scores;
the rank should be the same if the total score for each player is the same.
This gives me the total for each player:
select id,sum(score) as total from table_scores group by id order by total desc;
Is it possible to find the sum like above and use it to rank the players in one query?
Something big is missing from the accepted answer. The rank needs to be bumped after a tie. If you've got 2 tied for 3rd place, there is no 4th place.
The following query is an adjustment of the accepted SQL to account for this and reset the rank variable (@r in the query) to match the row value. You can avoid the extra addition in the CASE/WHEN by initializing @row to 1 instead of 0, but then the row value is off by 1 and my OCD won't let that stand even if the row number is not valuable.
select
    id, total,
    CASE WHEN @l=total THEN @r ELSE @r:=@row + 1 END as rank,
    @l:=total,
    @row:=@row + 1
FROM (
    select
        id, sum(score) as total
    from
        table_scores
    group by
        id
    order by
        total desc
) totals, (SELECT @r:=0, @row:=0, @l:=NULL) rank;
You can rank rows using variables:
select
    id, total,
    CASE WHEN @l=total THEN @r ELSE @r:=@r+1 END as rank,
    @l:=total
FROM (
    select
        id, sum(score) as total
    from
        table_scores
    group by
        id
    order by
        total desc
) totals, (SELECT @r:=0, @l:=NULL) rank;
Please see it working here.
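The two variable-based queries above differ only in how a tie affects the next rank. Here is a minimal sketch of both conventions, written with correlated subqueries and run through Python's sqlite3 module so it needs no MySQL server. The rows for ids 3 and 4 are made up to force a tie, and the column aliases rank_with_gaps and rank_dense are assumptions. On MySQL 8.0 and later the window functions RANK() and DENSE_RANK() should give the same two columns; note that RANK is a reserved word there, so an alias like rank needs backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_scores (id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO table_scores VALUES (?, ?)",
    [
        (1, 50), (1, 75), (1, 40), (1, 20),  # from the question: total 185
        (2, 85), (2, 60), (2, 20),           # from the question: total 165
        (3, 100), (3, 65),                   # hypothetical: ties with id 2 at 165
        (4, 100),                            # hypothetical: lowest total
    ],
)

ranked = conn.execute(
    """
    WITH totals AS (
        SELECT id, SUM(score) AS total FROM table_scores GROUP BY id
    )
    SELECT t.id,
           t.total,
           -- 1 + number of players strictly ahead: leaves a gap after a tie
           1 + (SELECT COUNT(*) FROM totals o WHERE o.total > t.total) AS rank_with_gaps,
           -- number of distinct totals at or above this one: no gaps
           (SELECT COUNT(DISTINCT o.total) FROM totals o WHERE o.total >= t.total) AS rank_dense
    FROM totals t
    ORDER BY t.total DESC, t.id
    """
).fetchall()

for row in ranked:
    print(row)
```

With the tie at 165, id 4 lands on rank 4 under the first convention and rank 3 under the second.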
I found one more way to solve this problem. This one is based on a JOIN clause:
SET @rank = 0;
SELECT t1.id, t1.score, t2.rank
FROM (SELECT id, SUM(score) as score
      FROM table_scores GROUP BY id ORDER BY score Desc) AS t1
INNER JOIN
     (SELECT x.score, @rank:=@rank + 1 as rank FROM
         (SELECT DISTINCT(SUM(score)) AS score
          FROM table_scores
          GROUP BY id ORDER BY score DESC) AS x) AS t2
ON t1.score = t2.score
Here is SQL Fiddle: http://sqlfiddle.com/#!9/2dcfc/16
P.S. It's interesting to see there is more than one way to solve a problem...

MYSQL query to find the all employees with nth highest salary

The two tables are employee_salary and employee
employee_salary
salary_id emp_id salary
Employee
emp_id | first_name | last_name | gender | email | mobile | dept_id | is_active
Query to get all the employees who have the nth highest salary, where n = 1, 2, 3, ... (any integer):
SELECT a.salary, b.first_name
FROM employee_salary a
JOIN employee b
ON a.emp_id = b.emp_id
WHERE a.salary = (
SELECT salary
FROM employee_salary
GROUP BY salary
DESC
LIMIT 1 OFFSET N-1
)
My Questions:
1) Is there a better, more optimized way to query this?
2) Is using LIMIT a good option?
3) We have more options to calculate the nth highest salary; which is the best, and what should we follow and when?
One option using a correlated COUNT subquery:
SELECT *
FROM employee_salary t1
WHERE ( N ) = ( SELECT COUNT( t2.salary )
FROM employee_salary t2
WHERE t2.salary >= t1.salary
)
Using Rank Method
SELECT salary
FROM
(
    SELECT @rn := @rn + 1 rn,
           a.salary
    FROM tableName a, (SELECT @rn := 0) b
    GROUP BY salary DESC
) sub
WHERE sub.rn = N
You have asked what seems like a reasonable question. There are different ways of doing things in SQL and sometimes some methods are better than others. The ranking problem is just one of many, many examples. The "answer" to your question is that, in general, order by is going to perform better than group by in MySQL. Although even that depends on the particular data and what you consider to be "better".
The specific issues with the question are that you have three different queries that return three different things.
The first returns all employees with a "dense rank" that is the same. That terminology is used purposely because it corresponds to the ANSI dense_rank() function, which MySQL did not support before version 8.0. So, if your salaries are 100, 100, and 10, it will return two rows with a ranking of 1 and one with a ranking of 2.
The second returns different results if there are ties. If the salaries are 100, 100, 10, this version will return no rows with a ranking of 1, two rows with a ranking of 2, and one row with a ranking of 3.
The third returns an entirely different result set, which is just the salaries and the ranking of the salaries.
My comment was directed at trying the queries on your data. In fact, you should decide what you actually want, both from a functional and a performance perspective.
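The difference between the first two queries described above is easy to see on the 100, 100, 10 example. A minimal sketch, run through Python's sqlite3 module rather than MySQL; the emp_id values and the helper names dense_nth and count_nth are assumptions, and the first query is written with SELECT DISTINCT ... ORDER BY instead of the question's GROUP BY salary DESC, which should be equivalent for this purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_salary (emp_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 10)],  # the 100, 100, 10 example; emp_ids are made up
)


def dense_nth(n):
    """Employees whose salary is the nth distinct salary (dense ranking)."""
    return conn.execute(
        """
        SELECT emp_id, salary
        FROM employee_salary
        WHERE salary = (
            SELECT DISTINCT salary
            FROM employee_salary
            ORDER BY salary DESC
            LIMIT 1 OFFSET ?
        )
        ORDER BY emp_id
        """,
        (n - 1,),
    ).fetchall()


def count_nth(n):
    """Employees for whom exactly n rows have a salary at least as high."""
    return conn.execute(
        """
        SELECT t1.emp_id, t1.salary
        FROM employee_salary t1
        WHERE ? = (
            SELECT COUNT(t2.salary)
            FROM employee_salary t2
            WHERE t2.salary >= t1.salary
        )
        ORDER BY t1.emp_id
        """,
        (n,),
    ).fetchall()


for n in (1, 2, 3):
    print(n, dense_nth(n), count_nth(n))
```

For n = 1 the dense version returns both employees earning 100, while the COUNT version returns nothing, because two rows already satisfy the >= test for each of them.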
LIMIT requires the SQL to skim through all records between 0 and N and therefore requires increasing time the further back in your ranking you want to look. However, IMO that problem cannot be solved better.
As Gordon Linoff suggested: Run your option against your data set, using the commonly used ranks (which ranks are queried often, which are not? The result might be fast on rank 1 but terrible on rank 100).
Execute and analyze the Query Execution Plan and create indexes accordingly (for example on the salary column) and retest your queries.
Other options:
Option 4:
You could build a ranking table which serves as a cache. The execution plan of your LIMIT query shows (see sqlfiddle here) that MySQL already creates a temporary table to solve the query.
Pros: Easy and fast
Cons: Forces you to regenerate the ranking table each time the data changes
Option 5:
You could reconsider how you define "ranks".
If we have the following salaries:
100'000
100'000
80'000
Is the employee Nr 3 considered to be of rank 3 or 2?
Are 1 and 2 on the same rank (rank 1), but 3 is on rank 3?
If you define rank = order, you can greatly simplify the query to
SELECT a.salary, b.first_name
FROM employee_salary a, employee b
WHERE a.emp_id = b.emp_id
order by salary desc
LIMIT 1 OFFSET 4
demo: http://sqlfiddle.com/#!2/e7321d/1/0
try this,
SELECT * FROM one as A WHERE ( n ) = ( SELECT COUNT(DISTINCT(b.salary)) FROM one as B WHERE
B.salary >= A.salary )
Suppose the emp_salary table has the below records:
And you want to select all employees with the nth (N = 1, 2, 3, etc.) highest or lowest salary (use >= for highest and <= for lowest, according to your needs); use the SQL below:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE N = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
);
For example, if you want to select all employees with 2nd highest salary, use below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE 2 = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
);
But if you want to display only second highest salary(only single record), use the below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE 2 = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
) limit 1;

Mysql query statement to find how many time I am the max amount for a listing

Here is my table structure.
id
veh_id
user_id
amount
...
I have other tables to relate the user_id and veh_id as well.
I want to know how many times a user has put an amount on each veh_id and on how many occasions, this amount is actually the highest amount received. I would like to have those 2 counts for each user available.
id, veh_id, user_id, amount
1 1 30 100
2 1 32 105
3 2 30 100
4 2 32 95
5 2 33 90
I would like the select statement to give me:
user 30 has bid 2 times and 1 time is the highest bidder
user 32 has bid 2 times and 1 time is the highest bidder
user 33 has bid 1 time and 0 times is the highest bidder
I don't know if it is possible to get those numbers.
This might be close, not sure exactly how you're relating vehicles together.
select
user_id,
count(*) as num_bids,
SUM(is_highest) as max_bids
from ( select
a.user_id,
COALESCE((select
MAX(b.amount) < a.amount
from bid as b
where b.id < a.id
and b.veh_id=a.veh_id
), 1) as is_highest
from bid as a
) as c
group by user_id
My understanding is user 30 has 2 max bids (2 first bids on a vehicle).
EDIT: If you're just looking for a total of 1 max bid per vehicle, let me know. That's actually a lot easier than rolling back to see whose bids were max when they came in...
EDIT2: Solution for only 1 max counts per vehicle:
Seems like this should be simpler for some reason:
select
user_id,
count(*) as num_bids,
count(vamt) as num_max
from bid
left join (
select veh_id as vid, max(amount) as vamt
from bid
group by veh_id
) as a on vid = veh_id and vamt <= amount
group by user_id
Try this,
select x.user_id, x.bid_times, COALESCE(y.max_times,0) as max_times from
(select user_id, count(*) as bid_times from testt group by user_id) as x
LEFT JOIN
(select user_id, count(*) as max_times from testt a where 0=( select count(*) from testt where amount > a.amount and veh_id=a.veh_id ) group by user_id) as y
ON x.user_id=y.user_id
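To check the "one max bid per vehicle" reading against the sample rows in the question, here is a minimal sketch run through Python's sqlite3 module. It is a variation on the EDIT2 idea (join each bid to its vehicle's maximum amount) using conditional aggregation rather than any of the exact queries above; the table name bid follows the first answer, and the aliases max_amount, num_bids, and num_max are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bid (id INTEGER PRIMARY KEY, veh_id INTEGER, user_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO bid VALUES (?, ?, ?, ?)",
    [
        (1, 1, 30, 100),
        (2, 1, 32, 105),
        (3, 2, 30, 100),
        (4, 2, 32, 95),
        (5, 2, 33, 90),
    ],
)

# For every bid, look up the highest amount on the same vehicle,
# then count per user: all bids, and bids equal to that maximum.
summary = conn.execute(
    """
    SELECT b.user_id,
           COUNT(*) AS num_bids,
           SUM(CASE WHEN b.amount = m.max_amount THEN 1 ELSE 0 END) AS num_max
    FROM bid b
    JOIN (
        SELECT veh_id, MAX(amount) AS max_amount
        FROM bid
        GROUP BY veh_id
    ) m ON m.veh_id = b.veh_id
    GROUP BY b.user_id
    ORDER BY b.user_id
    """
).fetchall()

for user_id, num_bids, num_max in summary:
    print(user_id, num_bids, num_max)
```

If two users tie for the highest amount on a vehicle, both are counted as highest bidder under this definition.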

SQL- Selecting the most similar product

Alright, I have a relation which stores two keys, a product Id and an attribute Id. I want to figure out which product is most similar to a given product. (Attributes are actually numbers but it makes the example more confusing so they have been changed to letters to simplify the visual representation.)
Prod_att
Product | Attributes
1 | A
1 | B
1 | C
2 | A
2 | B
2 | D
3 | A
3 | E
4 | A
Initially this seems fairly simple, just select the attributes that a product has and then count the number of attributes per product that are shared. The result of this is then compared to the number of attributes a product has and I can see how similar two products are. This works for products with a large number of attributes relative to their compared products, but issues arise when products have very few attributes. For example product 3 will have a tie for almost every other product (as A is very common).
SELECT Product, count(Attributes)
FROM Prod_att
WHERE Attributes IN
(SELECT Attributes
FROM prod_att
WHERE Product = 1)
GROUP BY Product
;
Any suggestions on how to fix this or improvements to my current query?
Thanks!
*edit: Product 4 will return count() =1 for all Products. I would like to show Product 3 is more similar as it has fewer differing attributes.
Try this
SELECT
a_product_id,
COALESCE( b_product_id, 'no_matchs_found' ) AS closest_product_match
FROM (
SELECT
*,
@row_num := IF(@prev_value=a_product_id,@row_num+1,1) AS row_num,
@prev_value := a_product_id
FROM
(SELECT @prev_value := 0) r
JOIN (
SELECT
a.product_id as a_product_id,
b.product_id as b_product_id,
count( distinct b.Attributes ),
count( distinct b2.Attributes ) as total_products
FROM
products a
LEFT JOIN products b ON ( a.Attributes = b.Attributes AND a.product_id <> b.product_id )
LEFT JOIN products b2 ON ( b2.product_id = b.product_id )
/*WHERE */
/* a.product_id = 3 */
GROUP BY
a.product_id,
b.product_id
ORDER BY
1, 3 desc, 4
) t
) t2
WHERE
row_num = 1
The above query gets the closest matches for all the products. To get the results for a particular product_id, filter on it in the innermost query (the commented-out WHERE). I have used LEFT JOIN so that a product is displayed even if it has no matches.
SQLFIDDLE
Hope this helps
Try the "Lower bound of Wilson score confidence interval for a Bernoulli parameter". This explicitly deals with the problem of statistical confidence when you have small n. It looks like a lot of math, but actually this is about the minimum amount of math you need to do this sort of thing right. And the website explains it pretty well.
This assumes it is possible to make the step from positive / negative scoring to your problem of matching / not matching attributes.
Here's an example for positive and negative scoring and 95% CL:
SELECT widget_id, ((positive + 1.9208) / (positive + negative) -
1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) /
(positive + negative)) / (1 + 3.8416 / (positive + negative))
AS ci_lower_bound FROM widgets WHERE positive + negative > 0
ORDER BY ci_lower_bound DESC;
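To see how this could apply to the attribute-matching problem, here is a minimal sketch in Python. The formula is the same Wilson lower bound as in the SQL above (z = 1.96 for 95%). Treating shared attributes as "positive" and differing attributes (present in only one of the two products) as "negative" is an assumption about how to make that step, not something the original scoring article prescribes; the names wilson_lower_bound, PRODUCTS, and rank_similar are made up for illustration.

```python
import math


def wilson_lower_bound(positive, negative, z=1.96):
    """Lower bound of the Wilson score interval; z=1.96 corresponds to 95%."""
    n = positive + negative
    if n == 0:
        return 0.0
    phat = positive / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


# Sample data from the question: product -> set of attributes.
PRODUCTS = {
    1: {"A", "B", "C"},
    2: {"A", "B", "D"},
    3: {"A", "E"},
    4: {"A"},
}


def rank_similar(target):
    """Other products ordered from most to least similar to `target`."""
    base = PRODUCTS[target]
    scored = []
    for product, attrs in PRODUCTS.items():
        if product == target:
            continue
        shared = len(base & attrs)     # assumed "positive" votes
        differing = len(base ^ attrs)  # assumed "negative" votes
        scored.append((wilson_lower_bound(shared, differing), product))
    # Highest score first; ties broken by product id for a stable order.
    return [product for score, product in sorted(scored, key=lambda s: (-s[0], s[1]))]


print(rank_similar(4))
print(rank_similar(1))
```

For product 4 this puts product 3 ahead of products 1 and 2, since all three share one attribute but product 3 has fewer differing ones.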
You could write a little view that will give you the total shared attributes between two products.
create view vw_shared_attributes as
select a.product,
b.product 'product_match',
count(*) 'shared_attributes'
from your_table a
inner join your_table b on b.attribute = a.attribute and b.product <> a.product
group by a.product, b.product
and then use that view to select the top match.
select product,
(select top 1 s.product_match from vw_shared_attributes s where t.product = s.product order by s.shared_attributes desc)
from your_table t
group by product
See http://www.sqlfiddle.com/#!6/53039/1 for an example

SELECT rows with minimum count(*)

Let's say I have a simple table voting with columns
id (primary key), token (int), candidate (int), rank (int).
I want to extract all rows having a specific rank, grouped by candidate and, most importantly, only those with the minimum count(*).
So far i have reached
SELECT candidate, count( * ) AS count
FROM voting
WHERE rank =1
AND candidate <200
GROUP BY candidate
HAVING count = min( count )
But it is returning an empty set. If I replace min(count) with the actual minimum value, it works properly.
I have also tried
SELECT candidate,min(count)
FROM (SELECT candidate,count(*) AS count
FROM voting
where rank = 1
AND candidate < 200
group by candidate
order by count(*)
) AS temp
But this resulted in only 1 row. I have 3 rows with the same min count but with different candidates, and I want all 3 of them.
Can anyone help me? The number of rows with the same minimum count(*) value would also help.
The sample is quite big, so I am showing some dummy values:
1 $sampleToken1 101 1
2 $sampleToken2 102 1
3 $sampleToken3 103 1
4 $sampleToken4 102 1
Here, when grouped by candidate, there are 3 rows with these count( * ) results:
candidate count( * )
101 1
103 1
102 2
I want the top 2 rows to be shown, i.e. those with count(*) = 1 or whatever the minimum is.
Try using this script as a pattern:
-- find minimum count
SELECT MIN(cnt) INTO @min FROM (SELECT COUNT(*) cnt FROM voting GROUP BY candidate) t;
-- show records with minimum count
SELECT * FROM voting t1
JOIN (SELECT candidate FROM voting GROUP BY candidate HAVING COUNT(*) = @min) t2
ON t1.candidate = t2.candidate;
Remove your HAVING keyword completely, it is not correctly written.
and add SUB SELECT into the where clause to fit that criteria.
(ie. select cand, count(*) as count from voting where rank = 1 and count = (select ..... )
The HAVING keyword can not use the MIN function in the way you are trying. Replace the MIN function with an absolute value such as HAVING count > 10
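One way to keep this in a single statement is to compare each group's count against a subquery that computes the minimum of the per-candidate counts. A minimal sketch on the dummy rows from the question, run through Python's sqlite3 module rather than MySQL; storing token as text and the aliases cnt, c, and per_candidate are assumptions. On MySQL 8.0 and later, rank is a reserved word, so the column name would need backticks there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE voting (id INTEGER PRIMARY KEY, token TEXT, candidate INTEGER, rank INTEGER)"
)
conn.executemany(
    "INSERT INTO voting VALUES (?, ?, ?, ?)",
    [
        (1, "sampleToken1", 101, 1),
        (2, "sampleToken2", 102, 1),
        (3, "sampleToken3", 103, 1),
        (4, "sampleToken4", 102, 1),
    ],
)

# Outer query: count per candidate.
# HAVING subquery: the smallest of those per-candidate counts,
# computed with the same filters so both sides agree.
least_voted = conn.execute(
    """
    SELECT candidate, COUNT(*) AS cnt
    FROM voting
    WHERE rank = 1 AND candidate < 200
    GROUP BY candidate
    HAVING COUNT(*) = (
        SELECT MIN(c)
        FROM (
            SELECT COUNT(*) AS c
            FROM voting
            WHERE rank = 1 AND candidate < 200
            GROUP BY candidate
        ) AS per_candidate
    )
    ORDER BY candidate
    """
).fetchall()

for candidate, cnt in least_voted:
    print(candidate, cnt)
```

The number of candidates sharing the minimum is then just len(least_voted).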