MySQL WHERE RAND() Behaviour

Consider the following table foo:
a b v
0 9 1
10 19 2
20 29 3
30 39 4
40 49 5
50 59 6
60 69 7
70 79 8
80 89 9
90 100 10
a and b are the inclusive lower and upper bounds of the range that maps to value v. Example:
x = 79 => v = 8
This can be done with the following statement:
SELECT `v`
FROM `foo`
WHERE 79 BETWEEN `a` AND `b`
Which MySQL correctly returns:
v
8
For each integer between 0 and 100 provided as input, MySQL correctly returns exactly one number between 1 and 10.
Issue
However, if the input number is substituted with a random number generator, the behaviour is somewhat different:
SELECT `v`
FROM `foo`
WHERE ROUND(RAND()*100) BETWEEN `a` AND `b`
Instead of returning exactly one number, MySQL may return anything from an empty result set up to 3 numbers!
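The behaviour in Question 1 comes down to when RAND() is evaluated. In a WHERE clause, MySQL evaluates it once per row, so each of the ten rows of foo is tested against its own random number. The Python sketch below simulates that behaviour. It is an illustration, not MySQL itself, and FOO and per_row_match are names invented for it. Python's round() differs from MySQL's ROUND() only on exact .5 ties, which a random float practically never hits.

```python
import random

# The ten (a, b, v) ranges of table foo.
FOO = [(0, 9, 1), (10, 19, 2), (20, 29, 3), (30, 39, 4), (40, 49, 5),
       (50, 59, 6), (60, 69, 7), (70, 79, 8), (80, 89, 9), (90, 100, 10)]


def per_row_match(rng):
    """Mimic WHERE ROUND(RAND()*100) BETWEEN a AND b with a fresh draw per row."""
    matched = []
    for a, b, v in FOO:
        x = round(rng.random() * 100)  # new random number for this row only
        if a <= x <= b:
            matched.append(v)
    return matched


rng = random.Random(42)  # fixed seed so the run is reproducible
counts = [len(per_row_match(rng)) for _ in range(10_000)]
for n in sorted(set(counts)):
    print(f"{n} row(s): {counts.count(n)} of 10000 runs")
```

Each row has roughly a 1-in-10 chance of matching its own draw. That makes an empty result (about 0.9^10, or 0.35) nearly as likely as exactly one row (about 0.39), and two or three rows are not unusual.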
Question 1
Is this the expected behaviour of the RAND() function? What is the reasoning behind this apparently weird behaviour?
Question 2
Considering the intended purpose, is the statement correct? What would be the correct one? How to correct this behaviour?

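A solution that needs only a single query (no session variable) would look like this:
SELECT x.rnd, `v`
FROM yourTable y
INNER JOIN (SELECT ROUND(RAND()*100) rnd) x
WHERE x.rnd BETWEEN y.`a` AND y.`b`;
It generates the random value just once, in the derived table x, and then uses that value in the join. ROUND() keeps the value an integer, so it cannot fall into a gap between two ranges (for example, between 9 and 10).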
Demo: http://rextester.com/YOIW49684
The query and the base table are kindly borrowed from Tim Biegeleisen

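RAND() in a WHERE clause is evaluated once for every row, so each row of foo is compared against a different random number. That is why the original query can return zero, one, or several rows. If you want the same random value to be applied to every row of the query, one option is to use a user-defined (session) variable: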
SET @rnd = RAND()*100;
SELECT v
FROM foo
WHERE ROUND(@rnd) BETWEEN a AND b;
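Both answers fix the problem by drawing the random number once and comparing every row against that single value. Below is a minimal Python sketch of that idea, using the same simulation assumptions; FOO, lookup and draw_once are illustrative names. It also shows why rounding matters. The ranges leave gaps between integers (between 9 and 10, for example), so an unrounded value such as 9.5 matches no row at all.

```python
import random

# The ten (a, b, v) ranges of table foo.
FOO = [(0, 9, 1), (10, 19, 2), (20, 29, 3), (30, 39, 4), (40, 49, 5),
       (50, 59, 6), (60, 69, 7), (70, 79, 8), (80, 89, 9), (90, 100, 10)]


def lookup(x):
    """Return every v whose inclusive [a, b] range contains x."""
    return [v for a, b, v in FOO if a <= x <= b]


def draw_once(rng):
    """Mimic SET @rnd = RAND()*100; ... WHERE ROUND(@rnd) BETWEEN a AND b."""
    rnd = rng.random() * 100   # evaluated a single time, like the session variable
    return lookup(round(rnd))  # every row is compared against the same number


rng = random.Random(7)
results = [draw_once(rng) for _ in range(10_000)]
print(all(len(r) == 1 for r in results))  # True: always exactly one row
print(lookup(79))   # [8]
print(lookup(9.5))  # []: an unrounded value can fall into a gap
```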


SELECT TOP PERCENT, VaR, Expected Shortfall in MySQL

I would like to achieve SELECT TOP PERCENT in MySQL.
I used Victor Sorokin's idea in Select TOP X (or bottom) percent for numeric values in MySQL, and got the following query:
SELECT x.log AS Login,
AVG(x.PROFIT) AS 'Expected Shortfall',
MAX(x.PROFIT) AS '40%VaR'
FROM
(SELECT t.PROFIT,
@counter := @counter + 1 AS counter,
t.LOGIN AS log
FROM (SELECT @counter := 0) initvar, trades AS t
WHERE t.LOGIN IN (100,101)
ORDER BY t.PROFIT) AS x
WHERE x.counter <= (40/100 * @counter)
GROUP BY x.log
This returns the following result:
Login  Expected Shortfall  40%VaR
101    -85                 -70
The query works when I change WHERE t.LOGIN IN (100,101) to a single value such as WHERE t.LOGIN=100. Running it once per login returns:
Login  Expected Shortfall  40%VaR
100    -4.5                -4

Login  Expected Shortfall  40%VaR
101    -95                 -90
I'm not really sure what is happening. Is there a way to use this query for multiple accounts at once, or a better way to solve the issue? I was thinking of a LOOP statement.
I'm currently using MySQL version 5.7.34. Please let me know if any clarification is needed. Any ideas would be much appreciated!
Edit: To replicate the issue:
CREATE TABLE trades (
TICKET int(11) PRIMARY KEY,
LOGIN int(11),
PROFIT double);
INSERT INTO trades (TICKET,LOGIN,PROFIT)
VALUES
(1,100,-5),
(2,100,-4),
(3,100,-3),
(4,100,-2),
(5,100,-1),
(6,101,-100),
(7,101,-90),
(8,101,-80),
(9,101,-70),
(10,101,-60),
(11,101,-50),
(12,101,500);
The expected output is the same as running the query for 100 and 101 separately:
LOGIN  ES    40%VAR
100    -4.5  -4
101    -95   -90
The combined result does not match the single-value queries because of the @counter assignment. Running the base query (the subquery) on its own returns:
PROFIT  counter  log
-100    1        101
-90     2        101
-80     3        101
-70     4        101
-60     5        101
-50     6        101
-5      7        100
-4      8        100
-3      9        100
-2      10       100
-1      11       100
500     12       101
As you can see, @counter produces one running number across all rows, regardless of their log value. Compare this with the subquery for a single log value:
PROFIT  counter  log
-5      1        100
-4      2        100
-3      3        100
-2      4        100
-1      5        100
With log=100 alone, the counter runs from 1 to 5. In the combined log IN (100,101) query, the same rows are numbered 7 to 11. That is why WHERE x.counter <= (40/100 * @counter) in the original query only keeps rows with log=101. The cutoff is 0.4 * 12 = 4.8, and counters 1 to 4 all belong to login 101. What you need is a counter that restarts for each log value. MySQL 8.0+ (and MariaDB 10.2+) support window functions, so there this can be done with ROW_NUMBER(). Since the OP is using an older version, I emulated ROW_NUMBER() with user variables instead.
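The explanation above can be checked with a small Python re-implementation of the original query's logic. It uses one counter shared by all selected logins, numbers the rows in PROFIT order, and takes the cutoff from the counter's final value. This is an illustrative sketch that models the query's intent, not MySQL's evaluation of user variables. TRADES and shared_counter_query are hypothetical names, and the data is the question's sample table.

```python
# (TICKET, LOGIN, PROFIT) rows from the question's INSERT statement.
TRADES = [(1, 100, -5), (2, 100, -4), (3, 100, -3), (4, 100, -2), (5, 100, -1),
          (6, 101, -100), (7, 101, -90), (8, 101, -80), (9, 101, -70),
          (10, 101, -60), (11, 101, -50), (12, 101, 500)]


def shared_counter_query(trades, logins, pct=40):
    """Mimic the original query: a single @counter numbers the rows of ALL
    selected logins in PROFIT order, and the cutoff uses its final value."""
    rows = sorted((profit, login) for _t, login, profit in trades if login in logins)
    total = len(rows)                      # final value of @counter
    kept = [(profit, login) for n, (profit, login) in enumerate(rows, start=1)
            if n <= pct / 100 * total]     # WHERE x.counter <= 40/100 * @counter
    groups = {}
    for profit, login in kept:             # GROUP BY x.log
        groups.setdefault(login, []).append(profit)
    # AVG(PROFIT) as Expected Shortfall, MAX(PROFIT) as VaR
    return {login: (sum(ps) / len(ps), max(ps)) for login, ps in groups.items()}


print(shared_counter_query(TRADES, {100, 101}))  # {101: (-85.0, -70)}
print(shared_counter_query(TRADES, {100}))       # {100: (-4.5, -4)}
print(shared_counter_query(TRADES, {101}))       # {101: (-95.0, -90)}
```

With both logins selected, the cutoff is 0.4 * 12 = 4.8. Only login 101's four largest losses survive, which reproduces the question's -85 / -70 result.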
This is the final query:
SELECT x.log AS Login,
AVG(x.PROFIT) AS 'Expected Shortfall',
MAX(x.PROFIT) AS '40%VaR'
FROM
(SELECT t.PROFIT,
-- restart the counter whenever LOGIN changes
@row_number := CASE
WHEN @id = t.LOGIN THEN @row_number + 1
ELSE 1 END AS counter,
@id := t.LOGIN AS ID, t.LOGIN AS log
FROM trades t
CROSS JOIN (SELECT @id := 0, @row_number := 0) AS n
-- sort by PROFIT within each login so the lowest profits get the lowest numbers
ORDER BY t.LOGIN, t.PROFIT) AS x
JOIN (SELECT LOGIN, COUNT(*) AS ctr FROM trades GROUP BY LOGIN) AS v
ON x.log = v.LOGIN
WHERE x.counter <= (40/100 * v.ctr)
GROUP BY x.log
ORDER BY x.log;
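The same logic with a counter that restarts for each login can be sketched in Python. The sketch assumes the counter runs over each login's profits in ascending order, so the lowest 40% are the worst losses. per_login_es_var is a hypothetical name, and the data is the question's sample table.

```python
from collections import defaultdict

# (TICKET, LOGIN, PROFIT) rows from the question's INSERT statement.
TRADES = [(1, 100, -5), (2, 100, -4), (3, 100, -3), (4, 100, -2), (5, 100, -1),
          (6, 101, -100), (7, 101, -90), (8, 101, -80), (9, 101, -70),
          (10, 101, -60), (11, 101, -50), (12, 101, 500)]


def per_login_es_var(trades, pct=40):
    """Restart numbering for each LOGIN, then keep the rows whose number is
    <= pct% of that login's COUNT(*)."""
    profits_by_login = defaultdict(list)
    for _ticket, login, profit in trades:
        profits_by_login[login].append(profit)

    result = {}
    for login in sorted(profits_by_login):          # ORDER BY x.log
        profits = sorted(profits_by_login[login])   # PROFIT order within the login
        ctr = len(profits)                          # COUNT(*) from derived table v
        kept = [p for n, p in enumerate(profits, start=1) if n <= pct / 100 * ctr]
        if kept:                                    # a login with no kept rows has no group
            result[login] = (sum(kept) / len(kept), max(kept))  # AVG = ES, MAX = VaR
    return result


print(per_login_es_var(TRADES))  # {100: (-4.5, -4), 101: (-95.0, -90)}
```

Because the profits are sorted per login, the result does not depend on the order in which the rows were inserted.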
The accompanying demo fiddle (MySQL 8.0+) also includes a version of the query that uses ROW_NUMBER().

Missing values on count in mysql

I'm stuck on this issue at the moment and I'm not 100% sure how to deal with it.
I have a table where I'm aggregating data by week:
select week(create_date),count(*)
from user
where create_date > '2015-02-01'
and id_customer between 9 and 17
group by week(create_date);
The results I'm getting are missing some weeks, as shown below (week, count):
5 334
6 376
7 394
8 405
9 504
10 569
11 709
12 679
13 802
14 936
15 1081
16 559
21 1
24 9
25 22
26 1
32 3
34 1
35 1
For example, between weeks 16 and 21 there are obviously 4 weeks missing (17 to 20). I would like these weeks to be included with a count of 0, because the weeks need to line up with other metrics that we output to an Excel file for internal analysis.
Any help would be greatly appreciated.
The problem is that an SQL query cannot produce rows for data that is not there at all.
You have 3 options:
If you have data for each week in your entire table for the period you are querying, then you can use a self join to get the missing weeks:
select week(t1.create_date), count(t2.id_customer)
from user t1
left join user t2 on t1.id_customer=t2.id_customer and t1.create_date=t2.create_date and t2.id_customer between 9 and 17
where t1.create_date > '2015-02-01'
group by week(t1.create_date)
If weeks are missing from the table as a whole, create a helper table that contains week numbers from 0 or 1 (depending on the WEEK() mode in use) up to 53, and left join to this helper table.
Use a stored procedure that loops through the results of your original query, inserts the missing weeks into a temporary table, and then returns the extended data set as the result.
The problem is that no rows match your criteria for the missing weeks. One solution is to start from a table that has all week numbers. For example, if you create a table weeknumbers with one column, weeknumber, containing all numbers from 0 to 53, you can use something like this:
select weeknumber, count(user.create_date)
from weeknumbers left join user on (weeknumbers.weeknumber=week(user.create_date)
and user.create_date > '2015-02-01'
and user.id_customer between 9 and 17)
group by weeknumber;
Additionally, you might want to add a WHERE condition on weeknumber to leave out the weeks you do not want to see.
The other way is to do it in the application.
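The application-side option can be sketched in a few lines of Python. The idea is to generate every week in the wanted range and take each count from the query result, defaulting to 0. This is illustrative only: fill_missing_weeks is a hypothetical helper, and the input is a subset of the counts from the question.

```python
def fill_missing_weeks(counts, first_week, last_week):
    """Return (week, count) for every week in the range, using 0 where the
    query returned no row; range() plays the role of a week-number table."""
    return [(week, counts.get(week, 0)) for week in range(first_week, last_week + 1)]


# A few week -> count pairs taken from the question's output.
sparse = {15: 1081, 16: 559, 21: 1, 24: 9}
filled = fill_missing_weeks(sparse, 15, 24)
for week, n in filled:
    print(week, n)
```

The range() call does the same job as the weeknumbers helper table in the SQL approach.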

Cannot print out the latest results of table

I have the following table:
NAMES:
Fname | stime | etime | Ver | Rslt
x 4 5 1.01 Pass
x 8 10 1.01 Fail
x 6 7 1.02 Pass
y 4 8 1.01 Fail
y 9 10 1.01 Fail
y 11 12 1.01 Pass
y 10 14 1.02 Fail
m 1 2 1.01 Fail
m 4 6 1.01 Fail
The result I am trying to output is:
x 8 10 1.01 Fail
x 6 7 1.02 Pass
y 11 12 1.01 Pass
y 10 14 1.02 Fail
m 4 6 1.01 Fail
What the result means:
Each Fname is a test that was run. Each test was run on different software platforms (the version numbers), and some tests were run on the same platform twice: they passed the first time and failed the second time, or vice versa. The output I need is the latest result of each test for each version. That means one row per unique combination of Fname and Ver(sion), chosen as the row with the latest etime in that group.
The query I have so far is:
select Fname,stime,max(etime),ver,Rslt from NAMES group by Fname,Rslt;
This however, does not give me the required output.
The output I get is (wrong):
x 4 10 1.01 Fail
x 6 7 1.02 Pass
y 4 12 1.01 Pass
y 10 14 1.02 Fail
m 1 6 1.01 Fail
It returns the maximum etime, but not the rest of that row. stime comes from the start of the whole group instead of from the particular test run (record) that has the maximum etime.
I have tried for a long time to fix this, but I seem to be going nowhere. I have a feeling a join is needed somewhere here, but I tried that too, with no luck.
Any help is appreciated,
Thank you.
Use a subquery to get the max ETime by FName and Ver, then join your main table to it:
SELECT
NAMES.FName,
NAMES.STime,
NAMES.ETime,
NAMES.Ver,
NAMES.Rslt
FROM NAMES
INNER JOIN (
SELECT FName, Ver, MAX(ETime) AS MaxETime
FROM NAMES
GROUP BY FName, Ver
) T ON NAMES.FName = T.FName AND NAMES.Ver = T.Ver AND NAMES.ETime = T.MaxETime
You could first find the latest (max) etime for each test and each version:
select Fname,Ver,max(etime) from NAMES group by Fname,Ver;
From there, you can get the whole row by joining this result back to the table:
select *
from
NAMES
inner join
(select Fname,Ver,max(etime) as etime from NAMES group by Fname,Ver ) sub1
using (Fname,Ver,etime)
order by fname,Ver;
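To check the join-back technique from both answers against the sample data, the sketch below runs both queries in SQLite through Python's standard sqlite3 module. SQLite stands in for MySQL here because both queries rely only on GROUP BY and joins. A few assumptions apply: Ver is stored as text to avoid floating-point comparisons, explicit columns replace SELECT *, and an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NAMES (Fname TEXT, stime INTEGER, etime INTEGER, Ver TEXT, Rslt TEXT)")
conn.executemany("INSERT INTO NAMES VALUES (?, ?, ?, ?, ?)", [
    ("x", 4, 5, "1.01", "Pass"), ("x", 8, 10, "1.01", "Fail"), ("x", 6, 7, "1.02", "Pass"),
    ("y", 4, 8, "1.01", "Fail"), ("y", 9, 10, "1.01", "Fail"), ("y", 11, 12, "1.01", "Pass"),
    ("y", 10, 14, "1.02", "Fail"), ("m", 1, 2, "1.01", "Fail"), ("m", 4, 6, "1.01", "Fail"),
])

# First answer: join each row to the MAX(etime) of its (Fname, Ver) group with ON.
JOIN_ON = """
    SELECT NAMES.Fname, NAMES.stime, NAMES.etime, NAMES.Ver, NAMES.Rslt
    FROM NAMES
    INNER JOIN (SELECT Fname, Ver, MAX(etime) AS MaxETime
                FROM NAMES GROUP BY Fname, Ver) T
      ON NAMES.Fname = T.Fname AND NAMES.Ver = T.Ver AND NAMES.etime = T.MaxETime
    ORDER BY NAMES.Fname, NAMES.Ver
"""

# Second answer: same idea, joining on the shared column names with USING.
JOIN_USING = """
    SELECT NAMES.Fname, NAMES.stime, NAMES.etime, NAMES.Ver, NAMES.Rslt
    FROM NAMES
    INNER JOIN (SELECT Fname, Ver, MAX(etime) AS etime
                FROM NAMES GROUP BY Fname, Ver) sub1
      USING (Fname, Ver, etime)
    ORDER BY NAMES.Fname, NAMES.Ver
"""

for query in (JOIN_ON, JOIN_USING):
    for row in conn.execute(query):
        print(*row)
    print()
```

Both queries return the five rows listed as the required output in the question, sorted by Fname and Ver.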

Duplicating rows in one select MySql query

First of all, I would like to greet all users and apologize for my English :).
I'm a new user on this forum.
I have a question about MySQL queries.
I have a table Items with, let's say, 2 columns, for example itemsID and ItemsQty.
itemsID ItemsQty
11 2
12 3
13 3
15 5
16 1
I need to select each itemsID, duplicated as many times as indicated in the ItemsQty column:
itemsID ItemsQty
11 2
11 2
12 3
12 3
12 3
13 3
13 3
13 3
15 5
15 5
15 5
15 5
15 5
16 1
I tried that query:
SELECT items.itemsID, items.itemsQty
FROM base.items
LEFT OUTER JOIN
(
SELECT items.itemsQty AS Qty FROM base.items
) AS Numbers ON items.itemsQty <=Numbers.Qty
ORDER BY items.itemsID;
but it doesn't work correctly.
Thanks in advance for help.
SQL answer - Option 1
You need another table called numbers, containing the numbers from 1 up to the maximum ItemsQty:
Table: numbers (one column, number)
1
2
3
4
5
......
max value of ItemsQty
Then the following SELECT statement will work
SELECT ItemsID, ItemsQty
FROM originaltable
JOIN numbers
ON originaltable.ItemsQty >= numbers.number
ORDER BY ItemsID, number
You should always set up a fiddle with sample data like this when you can; it makes everyone's life easier!
Code answer - Option 2
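Here is a runnable sketch of Option 1. It uses SQLite (via Python's sqlite3 module) in place of MySQL, builds the numbers helper table from 1 to MAX(ItemsQty), and runs the same join. The table names items and numbers follow the answer; everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemsID INTEGER, ItemsQty INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(11, 2), (12, 3), (13, 3), (15, 5), (16, 1)])

# Helper table with one row per number from 1 to MAX(ItemsQty).
(max_qty,) = conn.execute("SELECT MAX(ItemsQty) FROM items").fetchone()
conn.execute("CREATE TABLE numbers (number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(1, max_qty + 1)])

# Each item joins to the numbers 1..ItemsQty, so it appears ItemsQty times.
rows = conn.execute("""
    SELECT itemsID, ItemsQty
    FROM items
    JOIN numbers ON items.ItemsQty >= numbers.number
    ORDER BY itemsID, number
""").fetchall()
for row in rows:
    print(*row)
```

The numbers table only needs to go up to the largest ItemsQty. If quantities can grow, it has to be refilled or made generously large.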
MySQL probably won't do what you want 'cleanly' without a second table (although some clever person might know how)
What is wrong with doing it with script?
Just run a SELECT itemsID, ItemsQty FROM table
Then, when looping through the result, just do this (pseudocode, as no language was specified):
newArray = array(); // new array
While Rows Returned from database{ //loop all rows returned
loop number of times in column 'ItemsQty'{
newArray -> add 'ItemsID'
}
}//end of while loop
This will give you a new array
0 => 11
1 => 11
2 => 12
3 => 12
4 => 12
5 => 13
etc.
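For example, the pseudocode above translates directly to Python. Here, rows stands in for the result of SELECT itemsID, ItemsQty (hypothetical in-memory data taken from the question), and expand_ids is an illustrative name.

```python
def expand_ids(rows):
    """Repeat each itemsID as many times as its ItemsQty."""
    new_array = []
    for items_id, qty in rows:   # loop over all rows returned from the database
        for _ in range(qty):     # loop ItemsQty times
            new_array.append(items_id)
    return new_array


rows = [(11, 2), (12, 3), (13, 3), (15, 5), (16, 1)]  # result of SELECT itemsID, ItemsQty
print(expand_ids(rows))
```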
Select DISTINCT items.itemsID, items.itemsQty From base.items left outer join (select items.itemsQty as Qty from base.items) As Numbers On items.itemsQty <=Numbers.Qty
order by items.itemsID;
Use DISTINCT to remove duplicates. Read more here - http://dev.mysql.com/doc/refman/5.0/en/select.html
It seems I understood your question differently than everyone else, so I hope this answers it. What I would basically do is:
Create a new table for those changes.
Create a MySQL procedure which, given a line in the original table, adds new lines to the new table: http://dev.mysql.com/doc/refman/5.6/en/loop.html
Run this procedure for each line in the original table.
Try this to get distinct values from both columns:
SELECT DISTINCT itemsID FROM items
UNION
SELECT DISTINCT itemsQty FROM items

MySQL: Matching inexact values using "ON"

I'm way out of my league here...
I have a mapping table (table1) to assign particular values (value) to a whole number (map_nu). My second table (table2), is a collection of averages (avg) for each user (user_id).
(I couldn't figure out how to properly make a markdown table, please feel free to edit!)
table1:
value   map_nu
1       1
1.045   2
1.09    3
1.135   4
1.18    5
1.225   6
1.27    7
1.315   8
1.36    9
1.405   10

table2:
user_id  avg
1        1.111
2        1.2
3        1.33333
4        1
5        1.389
6        1.42
7        1.07
The value Map_nu is a special number that each user gets assigned according to their average. I need to find a way to match the averages from table2 to the closest value in table1. I only need to match to the second digit after the decimal point, so I've added the TRUNCATE() function:
SELECT table2.user_id, map_nu
FROM `table1`
JOIN table2 ON TRUNCATE(table1.value,2)=TRUNCATE(table2.avg,2)
I still miss the values that don't match the averages exactly. Is there a way to pick the nearest truncated value, or even to round to the second decimal? Rounding up or down won't matter, as long as it's applied the same way to all values.
I am trying to have the following result (if rounded up):
user_id  map_nu
-------  ------
1        4
2        6
3        6
4        1
5        10
6        11
7        3
Thanks!
I think you might have to do this in 2 separate queries. There is no 'nearest' operator in SQL, so you can either calculate it in your software, or you could use
select map_nu from table1 ORDER BY abs(value - $avg) LIMIT 1
inside a loop. However, that cannot be used as a plain join condition, because it requires ORDER BY and LIMIT, which are not valid in a join condition.
Another way of looking at it: your map_nu and value seem to be deterministic in relation to each other, value = 1 + ((map_nu - 1) * 0.045). Maybe you could make use of that fact and calculate an integer from that equation, assuming the relationship holds true for all values of map_nu.
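If the linear relationship holds, inverting it gives map_nu = ROUND((avg - 1) / 0.045) + 1, with no table lookup at all. Below is a minimal Python sketch of that calculation. It assumes the 0.045 step holds for every map_nu, and map_nu_from_avg is a hypothetical name. It rounds to the nearest map_nu and is not capped at the largest map_nu stored in table1.

```python
def map_nu_from_avg(avg, step=0.045):
    """Invert value = 1 + (map_nu - 1) * step, rounding to the nearest map_nu.

    Assumes the linear relationship holds for every map_nu.
    """
    return round((avg - 1) / step) + 1


averages = {1: 1.111, 2: 1.2, 3: 1.33333, 4: 1, 5: 1.389, 6: 1.42, 7: 1.07}
mapped = {user_id: map_nu_from_avg(avg) for user_id, avg in averages.items()}
print(mapped)
```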
This is an awkward database design. What is the data representing and what are you trying to solve? There might be a better way.
Maybe do something like...
SELECT a.user_id, b.map_nu, abs(a.avg - b.value)
FROM
table2 a
join table1 b
left join table1 c on abs(a.avg - b.value) > abs(a.avg - c.value)
where c.value is null
order by a.user_id
It doesn't produce exactly the output you were expecting (it doesn't do any rounding), though you should be able to tweak it from there. With the data you've provided, the query above produces the output below:
user_id map_nu abs(a.avg - b.value)
------- ------ --------------------
1 3 0.0209999999999999
2 5 0.02
3 8 0.01833
4 1 0
5 10 0.016
6 10 0.0149999999999999
7 3 0.02
Beware, though, if you're dealing with large tables. Check the EXPLAIN output of the above query to see whether it is practical to run it within MySQL or better to do it outside.
Also note: it will produce duplicate rows if some avg values are equidistant to two value entries in table1 (e.g. if the values for map_nu 11 and 12 are 2 and 3 and someone gets an avg of 2.5). Your question doesn't really specify what to do in that case, so you might want to take it into account.
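The self-join above can be run as-is in SQLite (through Python's sqlite3 module, standing in for MySQL) to confirm the output listed for the sample data. Both engines accept a JOIN without ON and the LEFT JOIN ... IS NULL anti-join pattern. The table setup is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (value REAL, map_nu INTEGER)")
conn.execute("CREATE TABLE table2 (user_id INTEGER, avg REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (1, 1), (1.045, 2), (1.09, 3), (1.135, 4), (1.18, 5),
    (1.225, 6), (1.27, 7), (1.315, 8), (1.36, 9), (1.405, 10),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    (1, 1.111), (2, 1.2), (3, 1.33333), (4, 1), (5, 1.389), (6, 1.42), (7, 1.07),
])

# Keep the (a, b) pair only if no row c in table1 is strictly closer to a.avg.
rows = conn.execute("""
    SELECT a.user_id, b.map_nu, abs(a.avg - b.value)
    FROM table2 a
    JOIN table1 b
    LEFT JOIN table1 c ON abs(a.avg - b.value) > abs(a.avg - c.value)
    WHERE c.value IS NULL
    ORDER BY a.user_id
""").fetchall()
for user_id, map_nu, distance in rows:
    print(user_id, map_nu, round(distance, 5))
```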
It's taking a little extra work, but I figure the easiest way to get my results will be to map all values to the second decimal place in table1 (value, map_nu):
1 1
1.01 1
1.02 1
1.03 1
1.04 1
1.05 2
1.06 2
1.07 2
1.08 2
1.09 3
1.1 3
1.11 3
1.12 3
1.13 3
1.14 4
...
Thanks for the suggestions! Sorry I couldn't present the question more clearly.