MySQL - Get the occurrence of couples of values in 2 fields - mysql

I've got a table with repeated X and Y couples:
------------------------
ID | X | Y
------------------------
1 10 20
2 20 10
3 10 20
4 30 20
5 20 10
6 20 10
I would like to count the frequency of the same (X,Y) couples like this:
--------------------------
X | Y | COUNT
--------------------------
20 10 3
10 20 2
30 20 1
This is what I tried to do:
SELECT X,
Y,
COUNT(DISTINCT X, Y) AS FREQUENCY
FROM `ordini`
GROUP BY X, Y
ORDER BY `FREQUENCY` DESC
But the result is not what I expected: FREQUENCY returned is 1 for all the couples.
Where am I wrong?

Don't use COUNT(DISTINCT X, Y) here. Because the query already groups by X and Y, each group contains only one distinct (X, Y) pair, so the count is always 1. Use COUNT(*) to count the rows in each group instead:
SELECT X,
Y,
COUNT(*) AS FREQUENCY
FROM ordini
GROUP BY X, Y
ORDER BY FREQUENCY DESC
Live Demo
http://sqlfiddle.com/#!9/beeff/2
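To check the answer against the sample data without a MySQL server, here is a minimal sketch that runs the same GROUP BY query on an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here; the table name `ordini` comes from the question, while `pair_counts` and `sample` are names made up for this example.

```python
import sqlite3

def pair_counts(rows):
    """Return (X, Y, frequency) tuples, most frequent pair first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ordini (ID INTEGER PRIMARY KEY, X INTEGER, Y INTEGER)")
    conn.executemany("INSERT INTO ordini (ID, X, Y) VALUES (?, ?, ?)", rows)
    # COUNT(*) counts every row in each (X, Y) group.
    result = conn.execute(
        """
        SELECT X, Y, COUNT(*) AS FREQUENCY
        FROM ordini
        GROUP BY X, Y
        ORDER BY FREQUENCY DESC
        """
    ).fetchall()
    conn.close()
    return result

# Sample rows from the question: (ID, X, Y)
sample = [(1, 10, 20), (2, 20, 10), (3, 10, 20), (4, 30, 20), (5, 20, 10), (6, 20, 10)]
for x, y, frequency in pair_counts(sample):
    print(x, y, frequency)
```

With the question's six rows this should list the pair (20, 10) first with a frequency of 3.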

Related

Count unique values per row (over index axis, not column wise)

I have the following table:
WITH data AS (
SELECT 10 AS A, 10 AS B, 10 AS C
UNION ALL
SELECT 20 AS A, 10 AS B, 20 AS C
UNION ALL
SELECT 30 AS A, 20 AS B, 10 AS C
UNION ALL
SELECT 40 AS A, 40 AS B, 40 AS C
UNION ALL
SELECT 50 AS A, 20 AS B, 20 AS C)
SELECT * FROM data;
A B C
0 10 10 10
1 20 10 20
2 30 20 10
3 40 40 40
4 50 20 20
Now I want to count the number of unique values per row and store this in a new column called Unique_count.
So my expected output would be:
A B C Unique_count
0 10 10 10 1
1 20 10 20 2
2 30 20 10 3
3 40 40 40 1
4 50 20 20 2
I am familiar with SELECT DISTINCT. But these are all column wise operations. I can't figure out how to count per row in SQL.
With the pandas module in Python it would simply be:
data['Unique_count'] = data.nunique(axis=1)
I have access to a MS SQL SERVER or MySQL SERVER so answers in both dialects are accepted.
In SQL Server, use a lateral join (the APPLY keyword):
select t.*, v.unique_count
from t cross apply
(select count(distinct col) as unique_count
from (values (t.a), (t.b), (t.c)) v(col)
) v;
A lateral join is a lot like a correlated subquery in the from clause -- but more general because the subquery can return more than one column and more than one row.
This version does exactly what it looks like: it unpivots the columns and then uses count(distinct) to count the number of unique values.
In MySQL, you can use conditional logic:
select
t.*,
1 + (a <> b) + (a <> c and b<>c) unique_count
from data t
This works because MySQL evaluates true/false conditions as 1/0 in numeric context (this feature saves us from lengthy CASE expressions here).
Demo on DB Fiddle:
| A | B | C | unique_count |
| --- | --- | --- | ------------ |
| 10 | 10 | 10 | 1 |
| 20 | 10 | 20 | 2 |
| 30 | 20 | 10 | 3 |
| 40 | 40 | 40 | 1 |
| 50 | 20 | 20 | 2 |
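For comparison, the boolean arithmetic can be placed next to the CASE form it replaces. This is a rough sketch that uses an in-memory SQLite database from Python as a stand-in, on the assumption that SQLite, like MySQL, yields 1/0 for comparisons; `unique_counts`, `bool_count` and `case_count` are names invented for this example. The CASE form is the one to reach for on engines that do not treat conditions as numbers.

```python
import sqlite3

# The five sample rows (A, B, C) from the question.
ROWS = [(10, 10, 10), (20, 10, 20), (30, 20, 10), (40, 40, 40), (50, 20, 20)]

def unique_counts(rows):
    """Return (a, b, c, bool_count, case_count) for every row, ordered by a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT a, b, c,
               -- boolean arithmetic: each true condition adds 1
               1 + (a <> b) + (a <> c AND b <> c) AS bool_count,
               -- the same logic spelled out with CASE expressions
               1 + CASE WHEN a <> b THEN 1 ELSE 0 END
                 + CASE WHEN a <> c AND b <> c THEN 1 ELSE 0 END AS case_count
        FROM data
        ORDER BY a
        """
    ).fetchall()
    conn.close()
    return result

for row in unique_counts(ROWS):
    print(row)
```

Note that this trick is tied to exactly three columns and to non-NULL values; with more columns the unpivot approaches scale better.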
Add an id column to the table. Then you can use UNION to unpivot the columns into rows, COUNT(*) to get the count per id, and a join to attach that count to the original table.
Note that you don't need to use COUNT(DISTINCT) because UNION DISTINCT removes duplicates.
WITH data AS (
SELECT 0 AS id, 10 AS A, 10 AS B, 10 AS C
UNION ALL
SELECT 1 AS id, 20 AS A, 10 AS B, 20 AS C
UNION ALL
SELECT 2 AS id, 30 AS A, 20 AS B, 10 AS C
UNION ALL
SELECT 3 AS id, 40 AS A, 40 AS B, 40 AS C
UNION ALL
SELECT 4 AS id, 50 AS A, 20 AS B, 20 AS C)
SELECT t1.*, t2.unique_count
FROM data AS t1
JOIN (
SELECT id, COUNT(*) AS unique_count
FROM (
SELECT id, A AS datum FROM data
UNION DISTINCT
SELECT id, B AS datum FROM data
UNION DISTINCT
SELECT id, C AS datum FROM data) AS x
GROUP BY id) AS t2
ON t1.id = t2.id

MySQL - select avg of list but ignore max and min value

I have a list of values in my database.
k v
1 5000
1 100
1 120
1 3
2 5000
2 100
2 120
2 4
3 10000
3 120
3 100
3 4
4 10
4 120
4 110
4 5000
I want to calculate the average of each k but I need to ignore the highest and lowest value of v for each k. (to remove spikes)
select avg(v) from table where v > min(v) and v < max(v) group by k
results in an error:
"Invalid use of group function"
I thought this would be a fairly common task, but I wasn't able to find any ideas in the docs.
Thanks for any advice.
One way to do this without worrying about whether there are duplicate min and max values of v (assuming you only want to ignore one of each) is to take the average as SUM(v)/COUNT(v), but subtracting the min and max values from the computation:
SELECT k, (SUM(v) - MAX(v) - MIN(v)) / (COUNT(v) - 2) AS average
FROM data
GROUP BY k
Output:
k average
1 110
2 110
3 110
4 115
Demo on dbfiddle
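-- HAVING filters whole groups, not rows, so exclude the min and max rows in WHERE
select t.k, avg(t.v) as average
from tablename t
where t.v > (select min(v) from tablename where k = t.k)
and t.v < (select max(v) from tablename where k = t.k)
group by t.k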
First get the min and max v for each k, then left join the table to those results and average only the rows that do not match:
select
t.k, avg(t.v) average
from tablename t left join (
select k, min(v) minv, max(v) maxv
from tablename
group by k
) g on g.k = t.k and t.v in (g.minv, g.maxv)
where g.k is null
group by t.k
See the demo.
Results:
| k | average |
| --- | ------- |
| 1 | 110 |
| 2 | 110 |
| 3 | 110 |
| 4 | 115 |
Link: Demo
select t1.k, avg(t1.v) average
from numbers t1 left join (
select k, min(v) minv, max(v) maxv
from numbers
group by k
) t2 on t2.k = t1.k and t1.v in (t2.minv, t2.maxv)
where t2.k is null
group by t1.k
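The subtraction approach and the left join approach agree on the question's data, but they differ when the minimum or maximum occurs more than once: subtraction drops exactly one of each, while the join drops every row equal to the minimum or maximum. The following sketch shows that difference on an in-memory SQLite database (a stand-in for MySQL). The function name `trimmed_averages` and the second group of values are made up for illustration, and the `* 1.0` is only there because SQLite would otherwise do integer division.

```python
import sqlite3

def trimmed_averages(rows):
    """Return two dicts {k: average}: subtraction approach, anti-join approach."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (k INTEGER, v INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    # Approach 1: remove exactly one minimum and one maximum per group.
    subtract = dict(conn.execute(
        """
        SELECT k, (SUM(v) - MAX(v) - MIN(v)) * 1.0 / (COUNT(v) - 2)
        FROM data
        GROUP BY k
        """
    ).fetchall())
    # Approach 2: remove every row whose value equals the group's min or max.
    anti_join = dict(conn.execute(
        """
        SELECT t.k, AVG(t.v)
        FROM data t LEFT JOIN (
            SELECT k, MIN(v) AS minv, MAX(v) AS maxv FROM data GROUP BY k
        ) g ON g.k = t.k AND t.v IN (g.minv, g.maxv)
        WHERE g.k IS NULL
        GROUP BY t.k
        """
    ).fetchall())
    conn.close()
    return subtract, anti_join

# k = 1 is taken from the question; k = 2 is hypothetical and has a duplicated minimum.
rows = [(1, 5000), (1, 100), (1, 120), (1, 3), (2, 1), (2, 1), (2, 5), (2, 9)]
subtract, anti_join = trimmed_averages(rows)
print(subtract)
print(anti_join)
```

Which behaviour is wanted depends on whether repeated extremes count as spikes. Both variants also need at least three rows per group to produce a meaningful average.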

MySQL Relative ranking of values in one column for all occurrences of a given key

I'm looking for some basic direction on where to start in order to rank rows that share a common key in a query.
Imagine I have a table like this:
user_id | account_id | score
1 A 10
1 B 20
2 C 10
2 D 20
2 E 30
What I'm hoping to do is add a rank column relative to each user_id, where the highest score gets the top rank:
user_id | account_id | score | rank
1 A 10 2
1 B 20 1
2 C 10 3
2 D 20 2
2 E 30 1
Just looking for some basic direction in terms of which way to head :/
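You can use a correlated subquery that counts, for each row, how many rows of the same user have a higher score (the alias is quoted because RANK is a reserved word from MySQL 8.0 on):
select
*,
(select count(1)+1 from your_table b where a.user_id=b.user_id and a.score<b.score) as `rank`
from your_table a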
Output
1 A 10 2
1 B 20 1
2 C 10 3
2 D 20 2
2 E 30 1
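On MySQL 8.0 or later, a window function is another way to get the same ranking. The sketch below is a variant rather than the answer's method: it runs RANK() on an in-memory SQLite database (3.25 or newer, standing in for MySQL), and the names `ranked`, `ACCOUNTS` and `score_rank` are invented for this example.

```python
import sqlite3

# Sample rows from the question: (user_id, account_id, score)
ACCOUNTS = [(1, "A", 10), (1, "B", 20), (2, "C", 10), (2, "D", 20), (2, "E", 30)]

def ranked(rows):
    """Return each row with its rank inside its user_id (highest score = 1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (user_id INTEGER, account_id TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT user_id, account_id, score,
               RANK() OVER (PARTITION BY user_id ORDER BY score DESC) AS score_rank
        FROM your_table
        ORDER BY user_id, account_id
        """
    ).fetchall()
    conn.close()
    return result

for row in ranked(ACCOUNTS):
    print(row)
```

RANK() gives tied scores the same rank and leaves a gap afterwards, which matches what the count-plus-one subquery produces.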

sql SELECT count all points from different entries limit by 10 per ID

I have a table that looks like this:
ID GameID DateID Points Place
-------------------------------------
10 1 1 100 1
11 1 1 90 2
12 1 1 80 3
13 1 1 70 4
14 1 1 60 5
10 1 1 100 1
10 1 1 50 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 100 1
10 1 1 50 5
10 1 1 50 5
12 1 1 100 1
-------------------------------------
I want a table with two columns, one for the total points (summed scores) of one player and one for the id of the player. But only ten scores may be counted per player, so for example if a player played thirteen times, only the ten highest scores are counted.
For the example above I want a table that looks like this:
ID totalPoints
-------------------
10 950
11 90
12 180
13 70
14 60
------------------
At the moment I tried this:
SELECT ID,
sum(Points) AS totalPoints
FROM (SELECT Points, ID
FROM Gamer
ORDER BY Points DESC LIMIT 10) AS totalPoints
ORDER BY Points DESC
but it limits the result to ten entries overall, not ten per player.
I hope somebody can help me :)
In all existing versions:
DELIMITER $
CREATE FUNCTION `totalPoints`(gamer_id INT) RETURNS int(11)
BEGIN
DECLARE s INT DEFAULT 0;
SELECT SUM(Points) INTO s FROM ( SELECT Points FROM Gamer WHERE ID=gamer_id ORDER BY Points DESC LIMIT 10) sq;
RETURN s;
END$
DELIMITER ;
SELECT DISTINCT ID, totalPoints(ID) FROM Gamer;
Alternative in MariaDB 10.2 (currently Beta), which has window functions:
SELECT ID, SUM(Points) FROM (
SELECT ID, Points, ROW_NUMBER()
OVER (PARTITION BY ID ORDER BY Points DESC) AS nm
FROM Gamer
) sq WHERE nm <= 10 GROUP BY ID;
I'm pretty sure there are other ways to do the same; these two are the first that came to mind.
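To see that the ROW_NUMBER query reproduces the expected totals, here is a small sketch that runs it on the question's data in an in-memory SQLite database (3.25 or newer, used as a stand-in; MySQL 8.0 also has window functions). Only the ID and Points columns are loaded, and `top_totals`, `SCORES` and the `limit` parameter are names made up for this example.

```python
import sqlite3

# (ID, Points) pairs from the question's table; the other columns are omitted.
SCORES = (
    [(10, 100), (11, 90), (12, 80), (13, 70), (14, 60), (10, 100), (10, 50)]
    + [(10, 100)] * 7
    + [(10, 50), (10, 50), (12, 100)]
)

def top_totals(scores, limit=10):
    """Sum each player's `limit` highest scores."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Gamer (ID INTEGER, Points INTEGER)")
    conn.executemany("INSERT INTO Gamer VALUES (?, ?)", scores)
    result = conn.execute(
        """
        SELECT ID, SUM(Points) AS totalPoints
        FROM (
            -- number each player's scores from highest to lowest
            SELECT ID, Points,
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Points DESC) AS nm
            FROM Gamer
        ) sq
        WHERE nm <= ?
        GROUP BY ID
        ORDER BY ID
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result

for player_id, total in top_totals(SCORES):
    print(player_id, total)
```

Player 10 has twelve scores (nine of 100 and three of 50), so only one of the 50s is counted and the total comes to 950.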

Use mysql SUM() and generate a random number in a WHERE clause

Suppose I have this table :
+------------------------------------+
| T_BOULEVERSEMENT |
+---------------------+--------------+
| PK_A_BOULEVERSEMENT | I_OCCURRENCE |
+---------------------+--------------+
| 1 | 3 |
+---------------------+--------------+
| 2 | 5 |
+---------------------+--------------+
| 3 | 1 |
+---------------------+--------------+
| ... | ... |
+---------------------+--------------+
| X | Y |
+---------------------+--------------+
And I want to return the first row in which the sum of all the previous occurrences (I_OCCURRENCE) is greater than a random value.
The random value lies in the range [1, SUM(I_OCCURRENCE)].
The following statement seems to work fine.
SELECT y.`PK_A_BOULEVERSEMENT`,
y.`I_OCCURRENCE`
FROM (SELECT t.`PK_A_BOULEVERSEMENT`,
t.`I_OCCURRENCE`,
(SELECT SUM(x.`I_OCCURRENCE`)
FROM `T_BOULEVERSEMENT` x
WHERE x.`PK_A_BOULEVERSEMENT` <= t.`PK_A_BOULEVERSEMENT`) AS running_total
FROM `T_BOULEVERSEMENT` t
ORDER BY t.`PK_A_BOULEVERSEMENT`) y
WHERE y.running_total >= ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1)
ORDER BY y.`PK_A_BOULEVERSEMENT`
LIMIT 1
But in reality it mostly returns rows where PK_A_BOULEVERSEMENT is less than 10.
However, if I execute the following statement :
SELECT ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1)
The result seems to be uniform in the range [1, SUM(I_OCCURRENCE)].
What can be wrong ?
Thanks
EDIT :
SQL Fiddle : http://sqlfiddle.com/#!2/b37d6/2
The desired result must be uniform in the range 1 - MAX(PK_A_BOULEVERSEMENT)
Try this:
SET @random_sum = (SELECT ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1));
SELECT y.PK_A_BOULEVERSEMENT, SUM(x.I_OCCURRENCE) AS tot_occurence
FROM T_BOULEVERSEMENT AS x, T_BOULEVERSEMENT AS y
WHERE x.PK_A_BOULEVERSEMENT <= y.PK_A_BOULEVERSEMENT
GROUP BY y.PK_A_BOULEVERSEMENT
HAVING tot_occurence >= @random_sum
ORDER BY y.PK_A_BOULEVERSEMENT
LIMIT 1
I had to use a user variable because MySQL re-evaluates RAND() for every row when it appears in a WHERE clause (so every row is compared to a different value).
With the variable, the random number is evaluated once, just before the query runs.
The cause of your problem is that the random number is being regenerated for each row in the subquery. Chances are that within the first 10 rows, you'll get a random number that's less than that row's running total. If we add the RAND() call and look at the subquery, it will look like this:
PK_A_.. I_OCC.. RUNNING_TOTAL RNDM
1 3 3 58
2 1 4 30
3 3 7 38
4 1 8 33
5 3 11 53
6 3 14 40
7 3 17 37
8 3 20 1
9 3 23 21
10 1 24 39
11 3 27 3
12 1 28 23
We only have to go as far as row 8 to find a running_total that exceeds the random value. The solution is to get the random value once, as suggested in the other answer.
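The "draw once, then compare" idea can be sketched from application code as well. The example below is an illustration under assumptions, not the original poster's setup: it uses an in-memory SQLite database in place of MySQL, draws the threshold with Python's `random.randint` instead of the ROUND(RAND() ...) expression, and passes it to the query as a bound parameter so that every row is compared with the same value. `pick_row` is a name made up for this example.

```python
import random
import sqlite3

def pick_row(conn, threshold):
    """Return the first (pk, occurrence) row whose running total reaches threshold."""
    return conn.execute(
        """
        SELECT y.PK_A_BOULEVERSEMENT, y.I_OCCURRENCE
        FROM (SELECT t.PK_A_BOULEVERSEMENT,
                     t.I_OCCURRENCE,
                     (SELECT SUM(x.I_OCCURRENCE)
                      FROM T_BOULEVERSEMENT x
                      WHERE x.PK_A_BOULEVERSEMENT <= t.PK_A_BOULEVERSEMENT) AS running_total
              FROM T_BOULEVERSEMENT t) y
        WHERE y.running_total >= ?
        ORDER BY y.PK_A_BOULEVERSEMENT
        LIMIT 1
        """,
        (threshold,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T_BOULEVERSEMENT ("
    "PK_A_BOULEVERSEMENT INTEGER PRIMARY KEY, I_OCCURRENCE INTEGER)"
)
# The first three rows of the question's table.
conn.executemany("INSERT INTO T_BOULEVERSEMENT VALUES (?, ?)", [(1, 3), (2, 5), (3, 1)])

total = conn.execute("SELECT SUM(I_OCCURRENCE) FROM T_BOULEVERSEMENT").fetchone()[0]
threshold = random.randint(1, total)  # drawn exactly once, before the query runs
print(threshold, pick_row(conn, threshold))
```

With running totals of 3, 8 and 9, thresholds 1 to 3 select row 1, thresholds 4 to 8 select row 2, and threshold 9 selects row 3, so each row is picked in proportion to its I_OCCURRENCE.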