Count unique values per row (over index axis, not column wise) - mysql

I have the following table:
WITH data AS (
SELECT 10 AS A, 10 AS B, 10 AS C
UNION ALL
SELECT 20 AS A, 10 AS B, 20 AS C
UNION ALL
SELECT 30 AS A, 20 AS B, 10 AS C
UNION ALL
SELECT 40 AS A, 40 AS B, 40 AS C
UNION ALL
SELECT 50 AS A, 20 AS B, 20 AS C)
SELECT * FROM data;
A B C
0 10 10 10
1 20 10 20
2 30 20 10
3 40 40 40
4 50 20 20
Now I want to count the number of unique values per row and store this in a new column called Unique_count.
So my expected output would be:
A B C Unique_count
0 10 10 10 1
1 20 10 20 2
2 30 20 10 3
3 40 40 40 1
4 50 20 20 2
I am familiar with SELECT DISTINCT, but that is a column-wise operation. I can't figure out how to count distinct values per row in SQL.
With the pandas module in Python it would simply be:
data['Unique_count'] = data.nunique(axis=1)
I have access to both MS SQL Server and MySQL, so answers in either dialect are welcome.

In SQL Server, use a lateral join, which is written with the apply keyword:
select t.*, v.unique_count
from t cross apply
(select count(distinct col) as unique_count
from (values (t.a), (t.b), (t.c)) v(col)
) v;
A lateral join is a lot like a correlated subquery in the from clause -- but more general because the subquery can return more than one column and more than one row.
This version does exactly what it looks like: it unpivots the columns and then uses count(distinct) to count the number of unique values.
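On engines without apply, the same unpivot-then-count(distinct) idea can be emulated by cross joining each row to a three-row selector. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The id column, the table name t and the selector alias n are assumptions made for illustration; they are not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(0, 10, 10, 10), (1, 20, 10, 20), (2, 30, 20, 10),
     (3, 40, 40, 40), (4, 50, 20, 20)],
)

# Unpivot: each row of t is repeated three times (once per column), and
# CASE picks the column value that belongs to the selector value n.i.
# COUNT(DISTINCT ...) then counts the unique values per original row.
query = """
SELECT t.id, t.a, t.b, t.c,
       COUNT(DISTINCT CASE n.i WHEN 1 THEN t.a
                               WHEN 2 THEN t.b
                               ELSE t.c END) AS unique_count
FROM t
CROSS JOIN (SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3) AS n
GROUP BY t.id, t.a, t.b, t.c
ORDER BY t.id
"""
unique_counts = [row[4] for row in conn.execute(query)]
print(unique_counts)  # [1, 2, 3, 1, 2]
```

The GROUP BY needs something that identifies the original row, which is why the sketch adds an id; the apply version does not need one.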

In MySQL, you can use conditional logic:
select
t.*,
1 + (a <> b) + (a <> c and b<>c) unique_count
from data t
This works because MySQL evaluates true/false conditions as 1/0 in numeric context (this feature saves us from lengthy case expressions here).
Demo on DB Fiddle:
| A | B | C | unique_count |
| --- | --- | --- | ------------ |
| 10 | 10 | 10 | 1 |
| 20 | 10 | 20 | 2 |
| 30 | 20 | 10 | 3 |
| 40 | 40 | 40 | 1 |
| 50 | 20 | 20 | 2 |
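The formula can be read as: start at 1 for column a, add 1 if b differs from a, and add 1 more only if c differs from both a and b. Here is a small sketch that checks it against a set-based count. It runs on SQLite through Python's sqlite3 module (an assumption made for convenience; SQLite also yields 1/0 for comparisons), not on MySQL itself.

```python
import sqlite3

rows = [(10, 10, 10), (20, 10, 20), (30, 20, 10), (40, 40, 40), (50, 20, 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# Each comparison evaluates to 1 (true) or 0 (false) and is simply added up.
sql_counts = [
    r[0]
    for r in conn.execute(
        "SELECT 1 + (a <> b) + (a <> c AND b <> c) FROM data ORDER BY rowid"
    )
]

# Reference result: number of distinct values in each row, computed in Python.
expected = [len(set(r)) for r in rows]
print(sql_counts, expected)
```

Note that the expression assumes none of the three columns is NULL, since a comparison with NULL yields NULL rather than 0. It also does not generalise nicely: every extra column needs one more term that compares against all previous columns.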

Add an id column to the table. Then you can use UNION to pivot the columns into rows, then COUNT(*) to get the counts. Then join that with the original table.
Note that you don't need to use COUNT(DISTINCT) because UNION DISTINCT removes duplicates.
WITH data AS (
SELECT 0 AS id, 10 AS A, 10 AS B, 10 AS C
UNION ALL
SELECT 1 AS id, 20 AS A, 10 AS B, 20 AS C
UNION ALL
SELECT 2 AS id, 30 AS A, 20 AS B, 10 AS C
UNION ALL
SELECT 3 AS id, 40 AS A, 40 AS B, 40 AS C
UNION ALL
SELECT 4 AS id, 50 AS A, 20 AS B, 20 AS C)
SELECT t1.*, t2.unique_count
FROM data AS t1
JOIN (
SELECT id, COUNT(*) AS unique_count
FROM (
SELECT id, A AS datum FROM data
UNION DISTINCT
SELECT id, B AS datum FROM data
UNION DISTINCT
SELECT id, C AS datum FROM data) AS x
GROUP BY id) AS t2
ON t1.id = t2.id

Related

How to make an SQL join with an inequality but select only the TOP 1 row for every match of the inequality?

I have these tables:
table A:
id value
1 20
2 15
3 10
table B:
id value
1 20
2 14
3 10
I want all the pairs where A.value >= B.value, but for every row of A I want only the first match.
I have this query:
SELECT * FROM A, B
WHERE A.value >= B.value;
which returns:
A_id A_value B_id B_value
1 20 1 20
1 20 2 14
1 20 3 10
2 15 2 14
2 15 3 10
3 10 3 10
But as I said, I want only the first match of every comparison (assume that A_value and B_value are sorted).
So I want to drop (actually, ignore) these rows:
A_id A_value B_id B_value
1 20 2 14
1 20 3 10
2 15 3 10
and obtain:
A_id A_value B_id B_value
1 20 1 20
2 15 2 14
3 10 3 10
I think I can achieve the result by grouping by A_id and A_value and calculating MAX(B_value), but I don't know if this is efficient.
Something like this:
SELECT A.id, A.value, MAX(B.value)
FROM A, B
WHERE A.value >= B.value
GROUP BY A.id, A.value;
So the question is:
Is there a query that can give me the result I need?
You can use ROW_NUMBER() (available in MySQL 8.x). For example:
select *
from (
select
a.id as a_id, a.value as a_value,
b.id as b_id, b.value as b_value,
row_number() over(partition by a.id order by b.value desc) as rn
from a
join b on a.value >= b.value
) x
where rn = 1
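As a rough check of the ROW_NUMBER() approach on the sample tables, here is a sketch run against SQLite via Python's sqlite3 module (an assumption; SQLite 3.25 or later is needed for window functions, and MySQL 8.x should behave the same for this query). It assumes the join condition is just the inequality, so that every row of a is paired with all rows of b it can match and rn = 1 keeps the one with the largest b.value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, value INTEGER);
CREATE TABLE b (id INTEGER, value INTEGER);
INSERT INTO a VALUES (1, 20), (2, 15), (3, 10);
INSERT INTO b VALUES (1, 20), (2, 14), (3, 10);
""")

# Number the matching b rows per a row, largest b.value first,
# then keep only the first match of each a row.
query = """
SELECT a_id, a_value, b_id, b_value
FROM (
    SELECT a.id AS a_id, a.value AS a_value,
           b.id AS b_id, b.value AS b_value,
           ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY b.value DESC) AS rn
    FROM a
    JOIN b ON a.value >= b.value
) x
WHERE rn = 1
ORDER BY a_id
"""
first_matches = conn.execute(query).fetchall()
print(first_matches)  # [(1, 20, 1, 20), (2, 15, 2, 14), (3, 10, 3, 10)]
```

Unlike the GROUP BY / MAX(B.value) idea from the question, this keeps the whole matching b row (including b.id), not only its value.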

Getting wrong data from DB when joining MySql [duplicate]

I have a table of revenue as
title_id revenue cost
1 10 5
2 10 5
3 10 5
4 10 5
1 20 6
2 20 6
3 20 6
4 20 6
When I execute this query
SELECT SUM(revenue),SUM(cost)
FROM revenue
GROUP BY revenue.title_id
it produces this result:
title_id revenue cost
1 30 11
2 30 11
3 30 11
4 30 11
which is OK. Now I want to combine the summed result with another table (fund), which has this structure:
title_id interest
1 10
2 10
3 10
4 10
1 20
2 20
3 20
4 20
When I execute a join with aggregate functions like this
SELECT SUM(revenue),SUM(cost),SUM(interest)
FROM revenue
LEFT JOIN fund ON revenue.title_id = fund.title_id
GROUP BY revenue.title_id,fund.title_id
it doubles the result:
title_id revenue cost interest
1 60 22 60
2 60 22 60
3 60 22 60
4 60 22 60
I can't understand why it doubles the result. Please help.
It's doubling because title_id is repeated in both the fund and revenue tables, which multiplies the number of rows wherever they match: each title_id has two rows in each table, so the join produces 2 x 2 = 4 rows per title_id and every value is summed twice. This is pretty easy to see if you remove the aggregate functions and look at the raw data.
The way to get around this is to create inline views of your aggregates and join on those results.
SELECT r.title_id,
r.revenue,
r.cost,
f.interest
FROM (SELECT title_id,
Sum(revenue) revenue,
Sum(cost) cost
FROM revenue
GROUP BY revenue.title_id) r
LEFT JOIN (SELECT title_id,
Sum(interest) interest
FROM fund
GROUP BY title_id) f
ON r.title_id = f.title_id
Output:
| TITLE_ID | REVENUE | COST | INTEREST |
----------------------------------------
| 1 | 30 | 11 | 30 |
| 2 | 30 | 11 | 30 |
| 3 | 30 | 11 | 30 |
| 4 | 30 | 11 | 30 |
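To make the row multiplication visible, here is a sketch that reproduces the question's data in SQLite through Python's sqlite3 module (an assumption for the sake of a runnable example; the queries use only standard SQL). It counts the rows the join produces per title_id, then compares aggregating after the join with aggregating before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revenue (title_id INTEGER, revenue INTEGER, cost INTEGER);
CREATE TABLE fund (title_id INTEGER, interest INTEGER);
INSERT INTO revenue VALUES (1,10,5),(2,10,5),(3,10,5),(4,10,5),
                           (1,20,6),(2,20,6),(3,20,6),(4,20,6);
INSERT INTO fund VALUES (1,10),(2,10),(3,10),(4,10),
                        (1,20),(2,20),(3,20),(4,20);
""")

# Rows produced by the join: 2 revenue rows x 2 fund rows = 4 per title_id.
rows_per_title = conn.execute("""
    SELECT r.title_id, COUNT(*)
    FROM revenue r LEFT JOIN fund f ON r.title_id = f.title_id
    GROUP BY r.title_id ORDER BY r.title_id
""").fetchall()

# Aggregating after the join therefore counts every value twice.
doubled = conn.execute("""
    SELECT r.title_id, SUM(r.revenue), SUM(r.cost), SUM(f.interest)
    FROM revenue r LEFT JOIN fund f ON r.title_id = f.title_id
    GROUP BY r.title_id ORDER BY r.title_id
""").fetchall()

# Aggregating each table first leaves one row per title_id on both sides.
fixed = conn.execute("""
    SELECT r.title_id, r.revenue, r.cost, f.interest
    FROM (SELECT title_id, SUM(revenue) AS revenue, SUM(cost) AS cost
          FROM revenue GROUP BY title_id) r
    LEFT JOIN (SELECT title_id, SUM(interest) AS interest
               FROM fund GROUP BY title_id) f
           ON r.title_id = f.title_id
    ORDER BY r.title_id
""").fetchall()

print(rows_per_title[0], doubled[0], fixed[0])
# (1, 4) (1, 60, 22, 60) (1, 30, 11, 30)
```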
The reason for this is that you joined the first table to the second table (fund) without grouping the second table first. To solve the problem, group the second table (fund) in its own derived table and join it with the first derived table using LEFT JOIN.
SELECT b.title_id,
b.TotalRevenue,
b.TotalCost,
d.TotalInterest
FROM
(
SELECT a.title_id,
SUM(a.revenue) TotalRevenue,
SUM(a.cost) TotalCost
FROM revenue a
GROUP BY a.title_id
) b LEFT JOIN
(
SELECT c.title_id,
SUM(c.interest) TotalInterest
FROM fund c
GROUP BY c.title_id
) d ON b.title_id = d.title_id
There are two rows for each title_id in the revenue table (and two in the fund table), so the join returns four rows per title_id.

MySQL - select avg of list but ignore max and min value

I have a list of values in my database.
k v
1 5000
1 100
1 120
1 3
2 5000
2 100
2 120
2 4
3 10000
3 120
3 100
3 4
4 10
4 120
4 110
4 5000
I want to calculate the average of each k but I need to ignore the highest and lowest value of v for each k. (to remove spikes)
select avg(v) from table where v > min(v) and v < max(v) group by k
results in the error:
"Invalid use of group function"
I was thinking that this is a quite common task but I wasn't able to find any ideas from the docs.
Thanks for any advice.
One way to do this without worrying about whether there are duplicate min and max values of v (assuming you only want to ignore one of each) is to take the average as SUM(v)/COUNT(v), but subtracting the min and max values from the computation:
SELECT k, (SUM(v) - MAX(v) - MIN(v)) / (COUNT(v) - 2) AS average
FROM data
GROUP BY k
Output:
k average
1 110
2 110
3 110
4 115
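A runnable sketch of this formula on the question's data, using SQLite through Python's sqlite3 module (an assumption; the arithmetic is the same in MySQL). Two details are additions for illustration and not part of the answer: multiplying by 1.0 to force a non-integer division, and NULLIF to make the result explicitly NULL for groups with only two rows. The group k = 5 is hypothetical and exists only to show that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (k INTEGER, v INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [
    (1, 5000), (1, 100), (1, 120), (1, 3),
    (2, 5000), (2, 100), (2, 120), (2, 4),
    (3, 10000), (3, 120), (3, 100), (3, 4),
    (4, 10), (4, 120), (4, 110), (4, 5000),
    (5, 7), (5, 9),  # hypothetical extra group with only two rows
])

# Remove one max and one min from the sum, and two rows from the count.
# NULLIF turns a zero divisor into NULL instead of dividing by zero.
query = """
SELECT k,
       (SUM(v) - MAX(v) - MIN(v)) * 1.0 / NULLIF(COUNT(v) - 2, 0) AS average
FROM data
GROUP BY k
ORDER BY k
"""
trimmed = dict(conn.execute(query).fetchall())
print(trimmed)  # {1: 110.0, 2: 110.0, 3: 110.0, 4: 115.0, 5: None}
```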
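-- compare each row's v with the min/max of its own k (WHERE, not HAVING)
select k, avg(v)
from tablename t
where v > (select min(v) from tablename where k = t.k)
and v < (select max(v) from tablename where k = t.k)
group by k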
First get the min and max v for each k, then left join the table to those results and keep only the non-matching rows, so that the average is taken over the rows that are neither the min nor the max:
select
t.k, avg(t.v) average
from tablename t left join (
select k, min(v) minv, max(v) maxv
from tablename
group by k
) g on g.k = t.k and t.v in (g.minv, g.maxv)
where g.k is null
group by t.k
Results:
| k | average |
| --- | ------- |
| 1 | 110 |
| 2 | 110 |
| 3 | 110 |
| 4 | 115 |
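The anti-join and the SUM/COUNT arithmetic agree on the question's data, but they can differ when the min or max occurs more than once in a group. The sketch below illustrates that with a hypothetical group whose maximum appears twice, using SQLite through Python's sqlite3 module (an assumption made so the example is runnable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (k INTEGER, v INTEGER)")
# Hypothetical group whose maximum (5) appears twice.
conn.executemany("INSERT INTO tablename VALUES (?, ?)",
                 [(1, 5), (1, 5), (1, 1), (1, 3)])

# Anti-join: drops every row whose v equals the group's min or max.
anti_join = conn.execute("""
    SELECT t.k, AVG(t.v)
    FROM tablename t LEFT JOIN (
        SELECT k, MIN(v) AS minv, MAX(v) AS maxv
        FROM tablename GROUP BY k
    ) g ON g.k = t.k AND t.v IN (g.minv, g.maxv)
    WHERE g.k IS NULL
    GROUP BY t.k
""").fetchall()

# Arithmetic form: removes exactly one min and one max per group.
arithmetic = conn.execute("""
    SELECT k, (SUM(v) - MAX(v) - MIN(v)) * 1.0 / (COUNT(v) - 2)
    FROM tablename GROUP BY k
""").fetchall()

print(anti_join, arithmetic)  # [(1, 3.0)] [(1, 4.0)]
```

Which behaviour is wanted depends on whether repeated extremes count as spikes. Also note that the anti-join returns no row at all for a group in which every value is a min or a max.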
select t1.k, avg(t1.v) average
from numbers t1 left join (
select k, min(v) minv, max(v) maxv
from numbers
group by k
) t2 on t2.k = t1.k and t1.v in (t2.minv, t2.maxv)
where t2.k is null
group by t1.k

MySQL Relative ranking of values in one column for all occurrences of a given key

I'm looking for some basic direction on where to start with ranking rows that share a common key in a query.
Imagine I have a table like this:
user_id | account_id | score
1 A 10
1 B 20
2 C 10
2 D 20
2 E 30
What I'm hoping to do is add a rank column, relative to each user_id, where the highest score gets the top rank:
user_id | account_id | score | rank
1 A 10 2
1 B 20 1
2 C 10 3
2 D 20 2
2 E 30 1
Just looking for some basic direction in terms of which way to head :/
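You can use a correlated subquery that counts, for each row, how many rows of the same user_id have a higher score (rank is a reserved word in MySQL 8, hence the backticks):
select
*,
(select count(1)+1 from your_table b where a.user_id=b.user_id and a.score<b.score) as `rank`
from your_table a
Output: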
1 A 10 2
1 B 20 1
2 C 10 3
2 D 20 2
2 E 30 1
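Here is a sketch that runs the correlated-subquery ranking next to the RANK() window function, which MySQL 8 also provides, to show that they give the same result on this data. It uses SQLite (3.25 or later) through Python's sqlite3 module as a stand-in, and the alias rnk is an assumption chosen to avoid the reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (user_id INTEGER, account_id TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "B", 20), (2, "C", 10), (2, "D", 20), (2, "E", 30)],
)

# Rank = 1 + number of rows of the same user with a strictly higher score.
subquery_rank = conn.execute("""
    SELECT a.user_id, a.account_id, a.score,
           (SELECT COUNT(*) + 1 FROM your_table b
            WHERE b.user_id = a.user_id AND b.score > a.score) AS rnk
    FROM your_table a
    ORDER BY a.user_id, a.account_id
""").fetchall()

# Same ranking expressed with a window function.
window_rank = conn.execute("""
    SELECT user_id, account_id, score,
           RANK() OVER (PARTITION BY user_id ORDER BY score DESC) AS rnk
    FROM your_table
    ORDER BY user_id, account_id
""").fetchall()

print(subquery_rank == window_rank)  # True
```

Both forms give tied scores the same rank and leave a gap after the tie; DENSE_RANK() would be the variant without gaps.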
