Use mysql SUM() and generate a random number in a WHERE clause - mysql

Suppose I have this table :
+---------------------+--------------+
| T_BOULEVERSEMENT                   |
+---------------------+--------------+
| PK_A_BOULEVERSEMENT | I_OCCURRENCE |
+---------------------+--------------+
| 1                   | 3            |
+---------------------+--------------+
| 2                   | 5            |
+---------------------+--------------+
| 3                   | 1            |
+---------------------+--------------+
| ...                 | ...          |
+---------------------+--------------+
| X                   | Y            |
+---------------------+--------------+
I want to return the first row at which the running total of the occurrences (I_OCCURRENCE) reaches a random value.
The random value lies in the range [1, SUM(I_OCCURRENCE)].
The following statement seems to work fine:
SELECT y.`PK_A_BOULEVERSEMENT`,
y.`I_OCCURRENCE`
FROM (SELECT t.`PK_A_BOULEVERSEMENT`,
t.`I_OCCURRENCE`,
(SELECT SUM(x.`I_OCCURRENCE`)
FROM `T_BOULEVERSEMENT` x
WHERE x.`PK_A_BOULEVERSEMENT` <= t.`PK_A_BOULEVERSEMENT`) AS running_total
FROM `T_BOULEVERSEMENT` t
ORDER BY t.`PK_A_BOULEVERSEMENT`) y
WHERE y.running_total >= ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1)
ORDER BY y.`PK_A_BOULEVERSEMENT`
LIMIT 1
But in practice it mainly returns rows where PK_A_BOULEVERSEMENT is less than 10.
However, if I execute the following statement:
SELECT ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1)
The result seems to be uniform in the range [1, SUM(I_OCCURRENCE)].
What could be wrong?
Thanks
EDIT :
SQL Fiddle : http://sqlfiddle.com/#!2/b37d6/2
The desired result must be uniform in the range 1 - MAX(PK_A_BOULEVERSEMENT)

Try this:
SET @random_sum = (SELECT ROUND(RAND() * ((SELECT SUM(z.`I_OCCURRENCE`) FROM `T_BOULEVERSEMENT` z) - 1) + 1));
SELECT y.PK_A_BOULEVERSEMENT, SUM(x.I_OCCURRENCE) AS tot_occurence
FROM T_BOULEVERSEMENT AS x, T_BOULEVERSEMENT AS y
WHERE x.PK_A_BOULEVERSEMENT <= y.PK_A_BOULEVERSEMENT
GROUP BY y.PK_A_BOULEVERSEMENT
-- keep the first row whose running total reaches the random value
HAVING tot_occurence >= @random_sum
ORDER BY y.PK_A_BOULEVERSEMENT
LIMIT 1
I had to use a user variable because MySQL seems to re-evaluate RAND() for every row when it is used in a WHERE clause (so every row is compared to a different value).
With the variable, the random number is evaluated once, just before the query is executed.

The cause of your problem is that the random number is being regenerated for each row in the subquery. Chances are that within the first 10 rows, you'll get a random number that's less than that row's running total. If we add the RAND() call and look at the subquery, it will look like this:
PK_A_..  I_OCC..  RUNNING_TOTAL  RNDM
1        3        3              58
2        1        4              30
3        3        7              38
4        1        8              33
5        3        11             53
6        3        14             40
7        3        17             37
8        3        20             1
9        3        23             21
10       1        24             39
11       3        27             3
12       1        28             23
We only have to go as far as row 8 to find a running_total that exceeds the random value. The solution is to get the random value once, as suggested in the other answer.
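The effect can be reproduced outside the database. The following Python sketch is an illustration only, not the original query: the function names and the 100-row table of equal weights are assumptions, and random.randint stands in for the ROUND(RAND() * ...) expression.

```python
import random
from itertools import accumulate


def pick_once(weights, rng):
    """Draw one target, then return the first row whose running total reaches it."""
    totals = list(accumulate(weights))
    target = rng.randint(1, totals[-1])  # evaluated a single time
    for row_id, total in enumerate(totals, start=1):
        if total >= target:
            return row_id


def pick_per_row(weights, rng):
    """Redraw the target for every row, as RAND() inside a WHERE clause does."""
    totals = list(accumulate(weights))
    for row_id, total in enumerate(totals, start=1):
        if total >= rng.randint(1, totals[-1]):  # new draw for each row
            return row_id


rng = random.Random(0)
weights = [1] * 100  # hypothetical table: 100 rows, I_OCCURRENCE = 1 each
trials = 2000
mean_once = sum(pick_once(weights, rng) for _ in range(trials)) / trials
mean_per_row = sum(pick_per_row(weights, rng) for _ in range(trials)) / trials
print(f"single draw:  mean row {mean_once:.1f}")
print(f"draw per row: mean row {mean_per_row:.1f}")
```

With a single draw the mean row sits near the middle of the table. With a draw per row it lands much closer to the start, which matches the behaviour reported in the question.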

Count unique values per row (over index axis, not column wise)

I have the following table:
WITH data AS (
SELECT 10 AS A, 10 AS B, 10 AS C
UNION ALL
SELECT 20 AS A, 10 AS B, 20 AS C
UNION ALL
SELECT 30 AS A, 20 AS B, 10 AS C
UNION ALL
SELECT 40 AS A, 40 AS B, 40 AS C
UNION ALL
SELECT 50 AS A, 20 AS B, 20 AS C)
SELECT * FROM data;
A B C
0 10 10 10
1 20 10 20
2 30 20 10
3 40 40 40
4 50 20 20
Now I want to count the number of unique values per row and store it in a new column called Unique_count.
So my expected output would be:
A B C Unique_count
0 10 10 10 1
1 20 10 20 2
2 30 20 10 3
3 40 40 40 1
4 50 20 20 2
I am familiar with SELECT DISTINCT, but that is a column-wise operation. I can't figure out how to count per row in SQL.
With the pandas module in Python it would simply be:
data['Unique_count'] = data.nunique(axis=1)
I have access to MS SQL Server and MySQL, so answers in either dialect are welcome.

In SQL Server, use a lateral join (the APPLY keyword):
select t.*, v.unique_count
from t cross apply
(select count(distinct col) as unique_count
from (values (t.a), (t.b), (t.c)) v(col)
) v;
A lateral join is a lot like a correlated subquery in the from clause -- but more general because the subquery can return more than one column and more than one row.
This version does exactly what it looks like: it unpivots the columns and then uses count(distinct) to count the number of unique values.
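For readers more used to the pandas call from the question, here is a small Python sketch of the same idea (unpivot one row, then count the distinct values). The variable and function names are illustrative and are not part of the answer's SQL.

```python
rows = [(10, 10, 10), (20, 10, 20), (30, 20, 10), (40, 40, 40), (50, 20, 20)]


def unique_count(row):
    # "Unpivot" the columns of one row into a set, then count the distinct values.
    return len(set(row))


result = [row + (unique_count(row),) for row in rows]
for line in result:
    print(line)
```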
In MySQL, you can use conditional logic:
select
t.*,
1 + (a <> b) + (a <> c and b<>c) unique_count
from data t
This works because MySQL evaluates true/false conditions as 1/0 in numeric context (this feature saves us from lengthy case expressions here).
Demo on DB Fiddle:
| A | B | C | unique_count |
| --- | --- | --- | ------------ |
| 10 | 10 | 10 | 1 |
| 20 | 10 | 20 | 2 |
| 30 | 20 | 10 | 3 |
| 40 | 40 | 40 | 1 |
| 50 | 20 | 20 | 2 |
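The arithmetic can be checked exhaustively. The Python sketch below is an assumption-laden stand-in for the MySQL expression (Python booleans also behave as 1/0 in arithmetic); it compares the formula against a plain distinct count for every equality pattern of three values. NULL handling is not covered.

```python
from itertools import product


def unique_count_arith(a, b, c):
    # Python booleans are integers (True == 1, False == 0), much like the
    # result of a comparison in MySQL's numeric context.
    return 1 + (a != b) + (a != c and b != c)


# Three symbols are enough to cover every equality pattern of (a, b, c).
mismatches = [
    (a, b, c)
    for a, b, c in product(range(3), repeat=3)
    if unique_count_arith(a, b, c) != len({a, b, c})
]
print(mismatches)
```

The list of mismatches stays empty when the formula agrees with the distinct count for all patterns. Note that the expression is specific to three columns; each extra column needs a further term.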
Add an id column to the table. Then you can use UNION to unpivot the columns into rows and COUNT(*) to get the counts, and finally join the result back to the original table.
Note that you don't need to use COUNT(DISTINCT) because UNION DISTINCT removes duplicates.
WITH data AS (
SELECT 0 AS id, 10 AS A, 10 AS B, 10 AS C
UNION ALL
SELECT 1 AS id, 20 AS A, 10 AS B, 20 AS C
UNION ALL
SELECT 2 AS id, 30 AS A, 20 AS B, 10 AS C
UNION ALL
SELECT 3 AS id, 40 AS A, 40 AS B, 40 AS C
UNION ALL
SELECT 4 AS id, 50 AS A, 20 AS B, 20 AS C)
SELECT t1.*, t2.unique_count
FROM data AS t1
JOIN (
SELECT id, COUNT(*) AS unique_count
FROM (
SELECT id, A AS datum FROM data
UNION DISTINCT
SELECT id, B AS datum FROM data
UNION DISTINCT
SELECT id, C AS datum FROM data) AS x
GROUP BY id) AS t2
ON t1.id = t2.id

MySQL - select avg of list but ignore max and min value

I have a list of values in my database.
k v
1 5000
1 100
1 120
1 3
2 5000
2 100
2 120
2 4
3 10000
3 120
3 100
3 4
4 10
4 120
4 110
4 5000
I want to calculate the average of each k but I need to ignore the highest and lowest value of v for each k. (to remove spikes)
select avg(v) from table where v > min(v) and v < max(v) group by k
This results in an error:
"Invalid use of group function"
I would have thought this is quite a common task, but I wasn't able to find anything about it in the docs.
Thanks for any advice.

One way to do this without worrying about whether there are duplicate min and max values of v (assuming you only want to ignore one of each) is to compute the average as SUM(v)/COUNT(v), but with the min and max values subtracted from the sum and excluded from the count:
SELECT k, (SUM(v) - MAX(v) - MIN(v)) / (COUNT(v) - 2) AS average
FROM data
GROUP BY k
Output:
k average
1 110
2 110
3 110
4 115
Demo on dbfiddle
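The arithmetic behind that output can be verified by hand or with a short script. This Python sketch recomputes the same formula on the question's data; the guard for groups with two or fewer values is an addition of the sketch, since the divisor COUNT(v) - 2 would otherwise be zero or negative.

```python
from collections import defaultdict

data = [
    (1, 5000), (1, 100), (1, 120), (1, 3),
    (2, 5000), (2, 100), (2, 120), (2, 4),
    (3, 10000), (3, 120), (3, 100), (3, 4),
    (4, 10), (4, 120), (4, 110), (4, 5000),
]

groups = defaultdict(list)
for k, v in data:
    groups[k].append(v)

# (SUM - MAX - MIN) / (COUNT - 2), per group
trimmed = {
    k: (sum(vs) - max(vs) - min(vs)) / (len(vs) - 2)
    for k, vs in groups.items()
    if len(vs) > 2  # sketch-only guard against a zero or negative divisor
}
print(trimmed)
```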
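-- compare v (not k) with the per-group min and max, using correlated subqueries
select t.k, avg(t.v)
from tablename t
where t.v > (select min(v) from tablename where k = t.k)
and t.v < (select max(v) from tablename where k = t.k)
group by t.k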
First get the min and max v for each k, then left join the table to those results to get the average of the non-matching rows:
select
t.k, avg(t.v) average
from tablename t left join (
select k, min(v) minv, max(v) maxv
from tablename
group by k
) g on g.k = t.k and t.v in (g.minv, g.maxv)
where g.k is null
group by t.k
See the demo.
Results:
| k | average |
| --- | ------- |
| 1 | 110 |
| 2 | 110 |
| 3 | 110 |
| 4 | 115 |
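On the question's data this gives the same numbers as the SUM/COUNT formula, but the two approaches differ when the minimum or maximum is duplicated. The Python sketch below illustrates the difference; the function names and the sample group are hypothetical.

```python
def trim_one_each(values):
    # Mirrors (SUM - MAX - MIN) / (COUNT - 2): drops one minimum and one maximum.
    return (sum(values) - max(values) - min(values)) / (len(values) - 2)


def trim_all_extremes(values):
    # Mirrors the anti-join: drops every row equal to the minimum or the maximum.
    lo, hi = min(values), max(values)
    kept = [v for v in values if v not in (lo, hi)]
    # In SQL a group with no remaining rows simply vanishes; None stands in for that.
    return sum(kept) / len(kept) if kept else None


sample = [1, 1, 5, 9]  # hypothetical group with a duplicated minimum
print(trim_one_each(sample))      # average of [1, 5]
print(trim_all_extremes(sample))  # average of [5]
```

Which behaviour is wanted depends on whether repeated extremes count as spikes.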
Link: Demo
select t1.k, avg(t1.v) average
from numbers t1 left join (
select k, min(v) minv, max(v) maxv
from numbers
group by k
) t2 on t2.k = t1.k and t1.v in (t2.minv, t2.maxv)
where t2.k is null
group by t1.k

MySQL Relative ranking of values in one column for all occurrences of a given key

I'm looking for some basic direction on where to start in order to rank rows that share a common key in a query.
Imagine I have a table like this:
user_id | account_id | score
1 A 10
1 B 20
2 C 10
2 D 20
2 E 30
What I'm hoping to do is add a rank column relative to each user_id, where the highest score gets the top rank:
user_id | account_id | score | rank
1 A 10 2
1 B 20 1
2 C 10 3
2 D 20 2
2 E 30 1
Just looking for some basic direction in terms of which way to head.

You can use a correlated subquery:
-- `rank` is quoted because RANK is a reserved word as of MySQL 8.0
select
*,
(select count(1)+1 from your_table b where a.user_id=b.user_id and a.score<b.score) as `rank`
from your_table a
Output
1 A 10 2
1 B 20 1
2 C 10 3
2 D 20 2
2 E 30 1
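The logic of the subquery is "one plus the number of rows for the same user with a strictly higher score". A minimal Python sketch of that rule, with illustrative names, looks like this:

```python
rows = [(1, "A", 10), (1, "B", 20), (2, "C", 10), (2, "D", 20), (2, "E", 30)]


def rank_within_user(row, all_rows):
    user_id, _, score = row
    # Count rows of the same user with a strictly higher score, then add 1.
    higher = sum(1 for u, _, s in all_rows if u == user_id and s > score)
    return higher + 1


ranked = [row + (rank_within_user(row, rows),) for row in rows]
for line in ranked:
    print(line)
```

Because only strictly higher scores are counted, rows with equal scores share a rank and the following rank is skipped.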

How to select more rows with MySQL after sorting?

I have a table which has 10 rows. Let's say the following:
id user number
-- ---- ------
1 user1 10
2 user2 5
3 user3 30
4 user4 45
5 user5 5
6 user6 22
7 user7 10
8 user8 40
9 user9 90
10 user10 65
I basically want to sort them, by the 'number' value.
So it should be something like this:
SORT id user number
---- -- ---- ------
1 2 user2 5
2 5 user5 5
3 1 user1 10
4 7 user7 10
5 6 user6 22
6 3 user3 30
7 8 user8 40
8 4 user4 45
9 10 user10 65
10 9 user9 90
After sorting, I want to select * from (for example) the row with id = 6 (whose number is 22), plus the 2 rows above it (in this case id = 1 and 7) and the 2 rows below it (in this case id = 3 and 8).
So the returned result should look like this when I'm searching for id = 6:
SORT id user number
---- -- ---- ------
3 1 user1 10
4 7 user7 10
5 6 user6 22
6 3 user3 30
7 8 user8 40
I could easily do this on the server side if I selected everything, but there will be a huge amount of data here, so I'd rather select only the rows that are relevant to my search.
Is there any way to do this with MySQL?

Here is a typical way to get what you want (tablename stands for your table):
select t.*
from ((select t.*
       from tablename t
       where number <= (select number from tablename where id = 6 limit 1)
       order by number desc
       limit 3
      ) union all
      (select t.*
       from tablename t
       where number > (select number from tablename where id = 6 limit 1)
       order by number asc
       limit 2
      )
     ) t
order by number;
This assumes that when duplicates appear, you still want 5 rows output. It also assumes that fewer than five rows is acceptable when the target is among the first two or last two rows. Indexes on id and on number would help the performance of this query.
Get the UNION of 2 queries:
1. Get all records <= the desired number (22 in your case), sort them in descending order and take the top 3 results.
2. Get all records > the desired number in ascending order and take the top 2 results.
3. Take the UNION of these 2 results and order it in ascending order.
HTH
EDIT: My intent in posting this answer was to give an algorithm for approaching such queries. Instead of selecting on number directly, you can fetch the records based on id and then UNION them.
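The steps can be sketched in Python on the question's sample data. This is an illustration of the algorithm rather than of any particular SQL; the function name and its parameters are assumptions.

```python
rows = [
    (1, "user1", 10), (2, "user2", 5), (3, "user3", 30), (4, "user4", 45),
    (5, "user5", 5), (6, "user6", 22), (7, "user7", 10), (8, "user8", 40),
    (9, "user9", 90), (10, "user10", 65),
]


def neighbours(rows, target_id, before=2, after=2):
    target_number = next(number for row_id, _, number in rows if row_id == target_id)
    # Rows at or below the target number, largest first: the target plus `before` rows.
    lower = sorted((r for r in rows if r[2] <= target_number),
                   key=lambda r: r[2], reverse=True)[: before + 1]
    # Rows above the target number, smallest first: `after` rows.
    upper = sorted((r for r in rows if r[2] > target_number),
                   key=lambda r: r[2])[:after]
    # Combine both halves and sort ascending (id breaks ties for display).
    return sorted(lower + upper, key=lambda r: (r[2], r[0]))


result = neighbours(rows, 6)
for line in result:
    print(line)
```

One limitation carries over from the SQL version: if several rows share the target's number, the "at or below" half may be filled by those duplicates, so the window is defined by number rather than by the exact position of the target row.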

COUNT for every GROUP BY with non existent values

I have the following table
id rid eId vs isN
24 3 22 2 1
25 3 21 2 1
26 60 21 2 1
27 60 21 2 1
28 60 21 2 1
29 60 21 2 1
30 60 21 2 1
31 60 21 2 1
32 81 21 2 1
35 60 22 2 1
36 81 22 2 1
37 0 22 2 1
38 60 22 2 1
39 81 22 2 1
40 0 22 2 1
41 60 22 2 1
42 81 22 2 1
43 3 22 2 1
eId can take 8 different values.
I want a count for each of these eight eId values, including those whose count is 0. What I want to get is an array containing 8 values, keyed by the eight eId values. vs takes one of 3 different values each time I count.
I want this for a specific rid and a specific vs ("rid" = %d and "vs" = %d):
SELECT count(*) as count FROM notification
WHERE rid = 60 AND vs = 2 AND isN = 1 GROUP BY eId
rid=>60,21=>6,22=>3,vs=>2,isN=>1
(this is what I get with the query above)
rid=>60,21=>6,22=>3,23=>0,33=>0,34=>0,35=>0,36=>0,41=>0,42=>0,vs=>2,isN=>1
(this is what I want: all eight counted. The values that do not occur in eId for this rid should of course be returned as zero.)

Here's one way to get the specified resultset:
SELECT d.rid AS `rid`
, SUM(n.eid<=>21) AS `21`
, SUM(n.eid<=>22) AS `22`
, SUM(n.eid<=>23) AS `23`
, SUM(n.eid<=>33) AS `33`
, SUM(n.eid<=>34) AS `34`
, SUM(n.eid<=>35) AS `35`
, SUM(n.eid<=>36) AS `36`
, SUM(n.eid<=>41) AS `41`
, SUM(n.eid<=>42) AS `42`
, d.vs AS `vs`
, d.isN AS `isN`
FROM ( SELECT %d AS rid, %d AS vs, 1 AS isN ) d
LEFT
JOIN notification n
ON n.rid = d.rid
AND n.vs = d.vs
AND n.isN = d.isN
GROUP
BY d.rid
, d.vs
, d.isN
Note: the expression (n.eid<=>21) is shorthand for IF(n.eid=21,1,0), or the more ANSI-standard CASE WHEN n.eid = 21 THEN 1 ELSE 0 END. That gives a 0 or a 1, which can then be aggregated with a SUM function.
You could get equivalent results using any of these forms:
, SUM(n.eid<=>21) AS `21`
, COUNT(IF(n.eid=22,1,NULL)) AS `22`
, SUM(IF(n.eid=23,1,0)) AS `23`
, COUNT(CASE WHEN n.eid = 33 THEN 1 END) AS `33`
, SUM(CASE WHEN n.eid = 34 THEN 1 ELSE 0 END) AS `34`
The "trick" we are using here is that we are guaranteed that the inline view aliased as d will return one row. Then we use a LEFT JOIN to pick up all "matching" rows from the notification table. The GROUP BY forces all those rows to be collapsed (aggregated) back down to a single row. Finally, we use a conditional test on each row to decide whether it is included in a given count: the test returns a 0 or a 1 for each row, and adding up all the 0s and 1s gives the count.
NOTE: If you use a COUNT(expr) aggregate, you want that expr to return a non-NULL when the row is to be included in the count, and a NULL when the row is not to be included in the count.
If you use a SUM(expr), then you want expr to return a 1 when the row is to be included in the count, and return a 0 when it's not. We want a 0 rather than a NULL so that we will be guaranteed that SUM(expr) will return a "zero count" (i.e. a 0 rather than a NULL) when there are no rows to be included. (Of course, we could use an IFNULL function to replace a NULL with a 0, but in this case it's simple enough to avoid the need for that.)
Note that one advantage of this approach to "counting" is that it can easily be extended to get "combined" counts, or to include a row in several different counts, e.g.
, SUM(IF(n.eid IN (41,42),1,0)) AS `total_41_and_42`
would get us a total count of eid=41 and eid=42 rows. (That's not such a great example, because we could just as easily calculate that on the client side by adding the two counts together.) But it really becomes an advantage if you are doing more elaborate counts and want to count a single row in multiple columns:
, SUM(IF(n.eid=42,1,0)) AS eid_42
, SUM(IF(n.eid=42 AND foo=1,1,0)) AS eid_42_foo_1
, SUM(IF(n.eid=42 AND foo=2,1,0)) AS eid_42_foo_2
We can get all those separate counts with just "one pass" through notification table. If we tried to do those checks in the WHERE clause, we'd likely need multiple passes through the table.
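To try the pattern without a MySQL server, the sketch below runs it against an in-memory SQLite database through Python's sqlite3 module. This is an assumption of the sketch: SQLite has no <=> operator or IF() function, so the ANSI CASE form is used, and only four of the eid columns are generated to keep it short.

```python
import sqlite3

ROWS = [
    (24, 3, 22, 2, 1), (25, 3, 21, 2, 1), (26, 60, 21, 2, 1), (27, 60, 21, 2, 1),
    (28, 60, 21, 2, 1), (29, 60, 21, 2, 1), (30, 60, 21, 2, 1), (31, 60, 21, 2, 1),
    (32, 81, 21, 2, 1), (35, 60, 22, 2, 1), (36, 81, 22, 2, 1), (37, 0, 22, 2, 1),
    (38, 60, 22, 2, 1), (39, 81, 22, 2, 1), (40, 0, 22, 2, 1), (41, 60, 22, 2, 1),
    (42, 81, 22, 2, 1), (43, 3, 22, 2, 1),
]
EIDS = [21, 22, 23, 33]  # subset of the wanted columns, kept short for the sketch

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notification (id INTEGER, rid INTEGER, eId INTEGER, vs INTEGER, isN INTEGER)"
)
conn.executemany("INSERT INTO notification VALUES (?, ?, ?, ?, ?)", ROWS)

# One SUM(CASE ...) column per eid. EIDS holds trusted integers, so formatting
# them into the statement is acceptable here; rid and vs are bound parameters.
columns = ", ".join(
    f"SUM(CASE WHEN n.eId = {eid} THEN 1 ELSE 0 END) AS eid_{eid}" for eid in EIDS
)
query = f"""
    SELECT d.rid, {columns}
    FROM (SELECT ? AS rid, ? AS vs, 1 AS isN) d
    LEFT JOIN notification n
      ON n.rid = d.rid AND n.vs = d.vs AND n.isN = d.isN
    GROUP BY d.rid, d.vs, d.isN
"""


def counts_for(rid, vs):
    # Returns one row: (rid, count for 21, count for 22, count for 23, count for 33)
    return conn.execute(query, (rid, vs)).fetchone()


print(counts_for(60, 2))
print(counts_for(999, 2))  # no matching rows: the single row from d still survives
```

The second call shows why the one-row inline view matters: even when nothing in notification matches, a row of zero counts is still returned.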
What you need is a driver table that has all the values you want to output. You can then left outer join this to the actual data:
SELECT count(n.eid) as count
FROM (select distinct eid
from notification
) driver left outer join
(select *
from notification
WHERE rid = %d AND vs = %d AND isN = 1
) n
on driver.eid = n.eid
GROUP BY driver.eid
You should also include the eid in the select clause, unless you are depending on the final ordering of the output. (MySQL before 8.0, unlike most other databases, implicitly sorts the results of a group by. That behavior was removed in MySQL 8.0, so an explicit ORDER BY is the safer choice.)
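A runnable sketch of the driver-table idea, again using an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL (an assumption of the sketch). It loads only a few rows from the question's table, selects the eid alongside the count, and orders explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notification (id INTEGER, rid INTEGER, eid INTEGER, vs INTEGER, isN INTEGER)"
)
conn.executemany(
    "INSERT INTO notification VALUES (?, ?, ?, ?, ?)",
    [(24, 3, 22, 2, 1), (25, 3, 21, 2, 1), (26, 60, 21, 2, 1),
     (37, 0, 22, 2, 1), (40, 0, 22, 2, 1)],  # a few rows from the question's table
)

query = """
    SELECT driver.eid, COUNT(n.eid) AS cnt
    FROM (SELECT DISTINCT eid FROM notification) driver
    LEFT OUTER JOIN (
        SELECT * FROM notification WHERE rid = ? AND vs = ? AND isN = 1
    ) n ON driver.eid = n.eid
    GROUP BY driver.eid
    ORDER BY driver.eid
"""
counts = conn.execute(query, (0, 2)).fetchall()
print(counts)
```

COUNT(n.eid) ignores the NULLs produced by the outer join, which is what yields the zero for eid 21. The driver here only knows eid values that occur somewhere in notification; values that never occur would need a separate lookup table as the driver.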
So, essentially what you're looking for is this?...
SELECT rid,eid,vs, COUNT(*) FROM notification GROUP BY rid,eid,vs;
+-----+-----+----+----------+
| rid | eid | vs | COUNT(*) |
+-----+-----+----+----------+
| 0 | 22 | 2 | 2 |
| 3 | 21 | 2 | 1 |
| 3 | 22 | 2 | 2 |
| 60 | 21 | 2 | 6 |
| 60 | 22 | 2 | 3 |
| 81 | 21 | 2 | 1 |
| 81 | 22 | 2 | 3 |
+-----+-----+----+----------+
7 rows in set (0.11 sec)