I need something that I really don't know how to build.
Here is an example:
user 1 = 3
user 2 = 1
user 3 = 6
What I want is for user 1 to have a 30% chance, user 2 a 10% chance, and user 3 a 60% chance. The numbers can also be fractional, for example 0.01 instead of 6. I want a random draw that respects those percentages: it should pick user 3 with a 60% chance, user 2 with a 10% chance, and user 1 with a 30% chance. The list of users can be arbitrarily long. How can I do this? Sorry, I am really bad at explaining.
Thanks!
What you want can be described as a "weighted probability distribution" or, to be technical, a "discrete distribution", specifically a "categorical distribution":
A categorical distribution is a discrete probability distribution that describes the possible results of a random variable that can take on one of K possible elementary events, with the probability of each elementary event separately specified.
-- Wikipedia
Given a random variable with a uniform distribution in the range 0 to 1, you can build any distribution you want by using the inverse transform method.
Your first step is to normalize the distribution. That means making sure that the area below the curve equals one (that is, that your weights sum to exactly 100%). For a discrete distribution, it means making sure that the sum of the weights is equal to one. [This is the same as treating the values as a vector and scaling it to unit length under the L1 norm.] Just take each value and divide it by the sum of the values.
Therefore, you go from this:
(original)
user 1 = 3
user 2 = 1
user 3 = 6
To this:
sum = 3 + 1 + 6 = 10
(normalized)
user 1 = 3 / 10 = 0.3
user 2 = 1 / 10 = 0.1
user 3 = 6 / 10 = 0.6
Next, get the cumulative distribution. That is, for each value you do not want its (normalized) weight but the weight of it plus all previous ones.
Therefore, you go from this
(normalized)
user 1 = 0.3
user 2 = 0.1
user 3 = 0.6
To this:
(cumulative)
user 1 = 0.3
user 2 = 0.1 + 0.3 = 0.4
user 3 = 0.6 + 0.3 + 0.1 = 1
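The two steps so far (normalize, then accumulate) can be sketched in a few lines. This is only an illustration in Python rather than PHP, and the function name cumulative_weights is made up for the example:

```python
from itertools import accumulate

def cumulative_weights(weights):
    """Return (key, cumulative probability) pairs for a dict of raw weights.

    Hypothetical helper for illustration; assumes non-negative weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Step 1: normalize so the weights sum to one.
    normalized = [w / total for w in weights.values()]
    # Step 2: running sum of the normalized weights.
    return list(zip(weights.keys(), accumulate(normalized)))

table = cumulative_weights({"user 1": 3, "user 2": 1, "user 3": 6})
for key, cumulative in table:
    print(key, round(cumulative, 2))
```

The last cumulative value should always come out as 1 (up to floating point rounding), which is a handy sanity check.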
Finally, you get your random variable with uniform distribution in the range 0 to 1 and check below which value it falls:
$r = (float)rand()/(float)getrandmax();
if ($r <= 0.3) return "user 1"; // user 1 = 0.3
else if ($r <= 0.4) return "user 2"; // user 2 = 0.4
else return "user 3"; // user 3 = 1
Note: The comparison is inclusive because rand() can return getrandmax(), so $r can be exactly 1.
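For comparison, here is a rough Python sketch of the same lookup. It is not a translation of the PHP above: it scales the random number by the total instead of normalizing, and it uses a binary search over the cumulative weights. The name weighted_pick and the optional r parameter (there only so the draw can be checked by hand) are assumptions of this sketch.

```python
import bisect
import random
from itertools import accumulate

def weighted_pick(weights, r=None):
    """Pick a key with probability proportional to its weight.

    weights: dict of key -> non-negative number (fractions such as 0.01 are fine).
    r: optional number in [0, 1), used instead of a fresh random draw.
    """
    keys = list(weights)
    cumulative = list(accumulate(weights[k] for k in keys))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    if r is None:
        r = random.random()  # uniform in [0, 1), so 1 itself never occurs
    # First position whose cumulative weight is strictly greater than r * total.
    index = bisect.bisect_right(cumulative, r * total)
    # Guard against floating point rounding at the very top of the range.
    return keys[min(index, len(keys) - 1)]

weights = {"user 1": 3, "user 2": 1, "user 3": 6}
print(weighted_pick(weights, r=0.35))  # 0.35 falls between 0.3 and 0.4: user 2
print(weighted_pick(weights))          # a real random draw
```

If you only need the result and not the mechanics, Python's standard library already offers random.choices(population, weights=...), which does the same kind of weighted draw.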
Ok, all in one go, in (ugly) PHP:
$p = ['user 1' => 3, 'user 2' => 1, 'user 3' => 6];
$s = array_sum($p);
$n = array_map(function($i) use ($p, $s){return $i/$s;},$p);
$a = []; $t = 0;
foreach($n as $k => $i) {$t += $i; $a[$k] = $t;}
$r = (float)rand()/(float)getrandmax();
foreach($a as $k => $i) { if ($r <= $i) return $k; }
Let us reimplement in MySQL because reasons.
First we need a table with the input, for example:
SELECT 'user 1' AS `id`, 3 AS `chance`
UNION
SELECT 'user 2' AS `id`, 1 AS `chance`
UNION
SELECT 'user 3' AS `id`, 6 AS `chance`
Then we sum the values
SELECT sum(chance) FROM (SELECT 'user 1' AS `id`, 3 AS `chance`
UNION
SELECT 'user 2' AS `id`, 1 AS `chance`
UNION
SELECT 'user 3' AS `id`, 6 AS `chance`) input
Then we normalize
SELECT id, chance / sum FROM (SELECT 'user 1' AS `id`, 3 AS `chance`
UNION
SELECT 'user 2' AS `id`, 1 AS `chance`
UNION
SELECT 'user 3' AS `id`, 6 AS `chance`) input CROSS JOIN (SELECT sum(chance) as sum FROM (SELECT 'user 1' AS `id`, 3 AS `chance`
UNION
SELECT 'user 2' AS `id`, 1 AS `chance`
UNION
SELECT 'user 3' AS `id`, 6 AS `chance`) input) s
Then we cumulate
SELECT id, chance / sum as sum, (@tmp := @tmp + chance / sum) as csum FROM (SELECT 'user 1' AS `id`, 3 AS `chance`
UNION
SELECT 'user 2' AS `id`, 1 AS `chance`
UNION
SELECT 'user 3' AS `id`, 6 AS `chance`) input CROSS JOIN (SELECT sum(chance) as sum FROM (SELECT 'user 1' AS `id`, 3 AS `chance`
UNION
SELECT 'user 2' AS `id`, 1 AS `chance`
UNION
SELECT 'user 3' AS `id`, 6 AS `chance`) input) s CROSS JOIN (SELECT @tmp := 0) cheat
Then we pick
SELECT id from (
SELECT id, chance / sum as sum, (@tmp := @tmp + chance / sum) as csum FROM (SELECT 'user 1' AS `id`, 3 AS `chance`
UNION
SELECT 'user 2' AS `id`, 1 AS `chance`
UNION
SELECT 'user 3' AS `id`, 6 AS `chance`) input CROSS JOIN (SELECT sum(chance) as sum FROM (SELECT 'user 1' AS `id`, 3 AS `chance`
UNION
SELECT 'user 2' AS `id`, 1 AS `chance`
UNION
SELECT 'user 3' AS `id`, 6 AS `chance`) input) s CROSS JOIN (SELECT @tmp := 0) cheat) a
CROSS JOIN (SELECT RAND() as r) random
WHERE csum > r
LIMIT 1
Edit: This answer is based on the title "PHP...." not the tag "mysql"
Each user has n tickets. In your case you have 10 tickets in total and therefore need a random index between 0 and 9. An inelegant solution could be:
<?php
$tickets = array();
$number_of_tickets = 0;
// Give each user one array entry per ticket (ticket counts must be whole numbers).
foreach ($users as $user) {
    for ($i = 0; $i < $user->tickets; $i++) {
        $tickets[] = $user->id;
        $number_of_tickets++;
    }
}
// rand() is inclusive on both ends, so the last valid index is count - 1.
$lucky_draw = rand(0, $number_of_tickets - 1);
$winner = $tickets[$lucky_draw]; // ID of the user
print("And the winner is...." . $winner);
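The same ticket idea as a small Python sketch, in case it helps to see it outside PHP. The function name draw_winner and the rng parameter are made up for this example, and it assumes whole-number ticket counts; fractional weights such as 0.01 would first have to be scaled up to integers.

```python
import random

def draw_winner(tickets_per_user, rng=random):
    """Put one entry per ticket into a list and draw a random index.

    tickets_per_user: dict of user id -> whole number of tickets.
    rng: anything with a randrange() method (the random module by default).
    """
    tickets = []
    for user_id, count in tickets_per_user.items():
        tickets.extend([user_id] * count)
    if not tickets:
        raise ValueError("nobody holds a ticket")
    # randrange(n) returns 0 .. n-1, so every ticket is a valid index.
    return tickets[rng.randrange(len(tickets))]

print("And the winner is....", draw_winner({"user 1": 3, "user 2": 1, "user 3": 6}))
```

Memory grows with the total number of tickets, so this is only reasonable for small counts.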
I have a table similar to below
Insurance ID | Created By | Closed By
=====================================
1            | User A     | User A
2            | User A     | User C
3            | User B     | User C
4            | User B     | User C
5            | User B     | User C
From this table, I am trying to create a View as below
UserName | Total Created | Total Closed
=======================================
User A   | 2             | 1
User B   | 3             | 0
User C   | 0             | 4
I am not able to figure out how to group the table to achieve this view. Any help would be greatly appreciated
Here's one option:
Sample data (you already have it, so you don't need to type this part):
SQL> with test (insurance_id, created_by, closed_by) as
  (select 1, 'user a', 'user a' from dual union all
   select 2, 'user a', 'user c' from dual union all
   select 3, 'user b', 'user c' from dual union all
   select 4, 'user b', 'user c' from dual union all
   select 5, 'user b', 'user c' from dual
  ),
Query begins here:
all_users as
  (select created_by username from test
   union
   select closed_by from test
  )
select u.username,
  sum(case when t.created_by = u.username then 1 else 0 end) total_created,
  sum(case when t.closed_by = u.username then 1 else 0 end) total_closed
from all_users u cross join test t
group by u.username
order by u.username;

USERNA TOTAL_CREATED TOTAL_CLOSED
------ ------------- ------------
user a             2            1
user b             3            0
user c             0            4

SQL>
I would be inclined to have a separate Users table and only have an integer of UserId in the main table. CROSS APPLY should avoid reading the same table twice.
SELECT X.UserName
,SUM(CASE WHEN X.Activity = 'Created' THEN 1 ELSE 0 END) AS TotalCreated
,SUM(CASE WHEN X.Activity = 'Closed' THEN 1 ELSE 0 END) AS TotalClosed
FROM YourTable T
CROSS APPLY
(
VALUES (T.CreatedBy, 'Created')
,(T.ClosedBy, 'Closed')
) X (UserName, Activity)
GROUP BY X.UserName
ORDER BY UserName;
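To make the logic easier to follow, here is the same counting done by hand in Python: each row contributes one "created" event and one "closed" event, and the events are then tallied per user. This is only a sketch; the function name totals_by_user and the tuple layout are assumptions, not part of any of the queries above.

```python
from collections import Counter

def totals_by_user(rows):
    """rows: iterable of (insurance_id, created_by, closed_by) tuples."""
    rows = list(rows)
    created = Counter(created_by for _, created_by, _ in rows)
    closed = Counter(closed_by for _, _, closed_by in rows)
    # A user may appear in only one of the two columns; Counter returns 0 for the other.
    users = sorted(set(created) | set(closed))
    return [(user, created[user], closed[user]) for user in users]

rows = [
    (1, "User A", "User A"),
    (2, "User A", "User C"),
    (3, "User B", "User C"),
    (4, "User B", "User C"),
    (5, "User B", "User C"),
]
for user, total_created, total_closed in totals_by_user(rows):
    print(user, total_created, total_closed)
# User A 2 1
# User B 3 0
# User C 0 4
```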
create table sometable (user_id, created_by,closed_by)
as
select 1, 'user a', 'user a' from dual union all
select 2, 'user a', 'user c' from dual union all
select 3, 'user b', 'user c' from dual union all
select 4, 'user b', 'user c' from dual union all
select 5, 'user b', 'user c' from dual;
SELECT *
FROM sometable
UNPIVOT ( username
FOR col IN ( created_by
, closed_by
)
)
PIVOT ( COUNT (user_id)
FOR col IN ( 'CREATED_BY' AS total_created
, 'CLOSED_BY' AS total_closed
)
)
ORDER BY username
;
USERNAME TOTAL_CREATED TOTAL_CLOSED
user a               2            1
user b               3            0
user c               0            4
I am working on accounting software with Java + MySQL (MariaDB). I calculate the stock amount with the following query, and it takes 16 seconds to run. Is that duration normal? Am I making a mistake in the query?
SELECT products_id as ID,prod_name as 'Product Name',
IFNULL((SELECT sum(piece)
FROM `ktgcari_000_fatura_xref`
WHERE product_id = ktgcari_000_stok.products_id AND
(type = 1 or type = 4)
), 0) -
IFNULL((SELECT sum(piece)
FROM `ktgcari_000_fatura_xref`
WHERE product_id = ktgcari_000_stok.products_id AND
(type = 2 or type = 5)
), 0) +
IFNULL((SELECT sum(piece)
FROM ktgcari_000_ssayim
WHERE urun_id = ktgcari_000_stok.products_id
), 0) as stock
FROM ktgcari_000_stok
LIMIT 0,1000
Stock = (sum of incoming invoices + sum of incoming dispatches) - (sum of outgoing invoices + sum of outgoing dispatches) + (sum of counting receipts)
Database Information:
number of stock cards: 39000
Number of invoices: 545
Invoice content table count: 1800
Number of counting receipts: 942
database size: 5 MB
I would write the query as:
SELECT s.products_id as ID, s.prod_name as `Product Name`,
(COALESCE((SELECT SUM(CASE WHEN x.type IN (1, 4) THEN piece
WHEN x.type IN (2, 5) THEN - piece
END)
FROM `ktgcari_000_fatura_xref` x
WHERE x.product_id = s.products_id AND
x.type IN (1, 2, 4, 5)
), 0) +
COALESCE((SELECT SUM(ss.piece)
FROM ktgcari_000_ssayim ss
WHERE ss.urun_id = s.products_id
), 0)
) as stock
FROM ktgcari_000_stok s
LIMIT 0, 1000
Then for performance, you want indexes on ktgcari_000_fatura_xref(product_id, type, piece) and ktgcari_000_ssayim(urun_id, piece).
I also note that you are using LIMIT without ORDER BY. SQL result sets are unordered unless they have an explicit ORDER BY, so the 1000 rows you get back are not guaranteed to be the same on every run.
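If the signed CASE expression is hard to read, the arithmetic it performs for a single product looks roughly like this in Python. The function name stock and the input shapes are assumptions made for the illustration; only the type codes (1 and 4 add, 2 and 5 subtract) come from the question.

```python
# Sign applied to each movement type, as in the question's query.
SIGN_BY_TYPE = {1: +1, 4: +1, 2: -1, 5: -1}

def stock(movements, counts):
    """movements: list of (type, piece) rows for one product.
    counts: list of counted quantities for the same product.
    """
    # Types outside 1, 2, 4, 5 are ignored, like the WHERE ... IN (1, 2, 4, 5) filter.
    signed = sum(SIGN_BY_TYPE.get(kind, 0) * piece for kind, piece in movements)
    # An empty list sums to 0, which plays the role of COALESCE(..., 0).
    return signed + sum(counts)

print(stock([(1, 10), (4, 5), (2, 3), (5, 2)], [7]))  # 10 + 5 - 3 - 2 + 7 = 17
```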
I edited the SQL query as follows. The query time dropped to 8 seconds. How can I reduce it further?
SELECT products_id as ID, prod_name,
       (SELECT IF(type=1 or type=4, sum(urun_adet), 0) - IF(type=2 or type=5, sum(urun_adet), 0)
        FROM ktgcari_000_fatura_xref
        WHERE product_id = ktgcari_000_stok.products_id) +
       IFNULL((SELECT sum(miktar)
               FROM ktgcari_000_ssayim
               WHERE urun_id = ktgcari_000_stok.products_id), 0) as 'stock'
FROM ktgcari_000_stok
LIMIT 0,1000
I need to calculate the average number of occurrences of a given value in a column of a dataset. I made a simple example, but my actual query uses around 2 inner joins to reduce the data to 100k records. I need to perform the following select distinct statement for 10 columns.
My current design forces an inner join for each column. Another constraint is that I need to perform it on at least 50-100 rows for each name in this example.
I need an efficient way to calculate these values that keeps the query fast without using too many resources.
http://sqlfiddle.com/#!9/c2378/3
My expected Result is:
Name | R Avg dir | L Avg dir 1 | L Avg dir 2 | L Avg dir 3
A    | 0         | .5          | .25         | .25
Create table query:
CREATE TABLE demo
(`id` int, `name` varchar(10),`hand` varchar(1), `dir` int)
;
INSERT INTO demo
(`id`, `name`, `hand`, `dir`)
VALUES
(1, 'A', 'L', 1),
(2, 'A', 'L', 1),
(3, 'A', 'L', 2),
(4, 'A', 'L', 3),
(5, 'A', 'R', 3),
(6, 'A', 'R', 3)
;
Example Query:
SELECT distinct name,
COALESCE(( (Select count(id) as 'cd' from demo where hand = 'L' AND dir = 1) /(Select count(id) as 'fd' from demo where hand = 'L')),0) as 'L AVG dir'
FROM
demo
where hand = 'L' AND dir = 1 AND name = 'A'
One option is to use conditional aggregation:
SELECT name,
count(case when hand = 'L' and dir = 1 then 1 end) /
count(case when hand = 'L' then 1 end) L1Avg,
count(case when hand = 'L' and dir = 2 then 1 end) /
count(case when hand = 'L' then 1 end) L2Avg,
count(case when hand = 'L' and dir = 3 then 1 end) /
count(case when hand = 'L' then 1 end) L3Avg,
count(case when hand = 'R' and dir = 3 then 1 end) /
count(case when hand = 'R' then 1 end) RAvg
FROM demo
WHERE name = 'A'
GROUP BY name
Updated Fiddle Demo
Please note, I wasn't 100% sure why you wanted your RAvg to be 0 -- I assumed you meant 100%. If not, you can adjust the above accordingly.
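As a cross-check of what the conditional aggregation computes, here is the same ratio worked out in plain Python on the demo rows. It is only a sketch: direction_shares is a made-up name, and rows are assumed to be (name, hand, dir) tuples.

```python
def direction_shares(rows, name):
    """Return {(hand, dir): share of that hand's rows with this dir} for one name."""
    mine = [(hand, direction) for n, hand, direction in rows if n == name]
    shares = {}
    for hand in sorted({h for h, _ in mine}):
        dirs = [d for h, d in mine if h == hand]
        for d in sorted(set(dirs)):
            # count of matching rows divided by count of rows for this hand
            shares[(hand, d)] = dirs.count(d) / len(dirs)
    return shares

demo = [
    ("A", "L", 1), ("A", "L", 1), ("A", "L", 2),
    ("A", "L", 3), ("A", "R", 3), ("A", "R", 3),
]
print(direction_shares(demo, "A"))
# {('L', 1): 0.5, ('L', 2): 0.25, ('L', 3): 0.25, ('R', 3): 1.0}
```

One pass over the rows for a name is enough, which is the same reason the single GROUP BY query avoids the extra joins.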
I have a table foo that stores codes in format lnnnnn where l is at least one letter and n is numeric value. Both letters or numbers can be of various length, so trying to solve this like mentioned here won't work.
Example:
group | code
=============
1 | a0010
1 | a0012
1 | a0013
2 | bn0014
2 | bn0015
2 | bn0016
3 | u0017
3 | u0018
My task is to get current highest numeric value of this column in desired group, to generate new number (like sequence).
Note that I cannot redesign table and explode string and text parts.
So far I tried:
select
max(code rlike '[0-9]$')
from
foo
where
group = 2
but, sadly, regexp or rlike (synonyms) returns only 0 or 1 (matched or not matched).
One method is a brute force method:
select grp,
max(case when substr(code, 1, 1) between '0' and '9' then code + 0
when substr(code, 2, 1) between '0' and '9' then substr(code, 2) + 0
when substr(code, 3, 1) between '0' and '9' then substr(code, 3) + 0
when substr(code, 4, 1) between '0' and '9' then substr(code, 4) + 0
when substr(code, 5, 1) between '0' and '9' then substr(code, 5) + 0
when substr(code, 6, 1) between '0' and '9' then substr(code, 6) + 0
when substr(code, 7, 1) between '0' and '9' then substr(code, 7) + 0
when substr(code, 8, 1) between '0' and '9' then substr(code, 8) + 0
end)
from foo
group by grp;
If your numeric codes are always four digits long, then you can do it like this:
select groupid, max(right(code,4)) as maxcode
from foo
group by groupid
See it here on fiddle: http://sqlfiddle.com/#!2/775b3/2
If all numeric parts start with a 0:
select gp, max(cast(substr(code, instr(code, '0')) as unsigned))
from t
group by gp
See sqlfiddle
If not, for arbitrary numeric parts (that start with any digit):
select gp, max(cast(substr(code, instr(code, n)) as unsigned))
from t
join (select 0 n union select 1 union select 2 union select 3 union select 4 union select 5
union select 6 union select 7 union select 8 union select 9) x
group by gp
See sqlfiddle
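If the value ends up in application code anyway, the extraction is simpler there. A minimal Python sketch, assuming every code is some letters followed by digits; the function name max_number is hypothetical:

```python
import re

def max_number(codes):
    """Return the largest trailing number among the codes, or None if there is none."""
    numbers = []
    for code in codes:
        match = re.search(r"\d+$", code)  # the run of digits at the end of the code
        if match:
            numbers.append(int(match.group()))  # int() drops the leading zeros
    return max(numbers, default=None)

codes_by_group = {
    1: ["a0010", "a0012", "a0013"],
    2: ["bn0014", "bn0015", "bn0016"],
    3: ["u0017", "u0018"],
}
print(max_number(codes_by_group[2]))  # 16
```

Computing the next number outside the database is open to race conditions if two clients do it at the same time, so treat this as an illustration of the parsing only.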
I am trying to run a loop of some sort in SQL Server 2008/T-SQL and I am unsure whether this should be a WHILE, a CURSOR, or both. The end result is that I am trying to loop through a list of user logins, determine the unique users, and then run a loop to determine how many visits it took for each user to be on the site for 5 minutes, broken out by channel.
Table: LoginHistory
UserID Channel DateTime DurationInSeconds
1 Website 1/1/2013 1:13PM 170
2 Mobile 1/1/2013 2:10PM 60
3 Website 1/1/2013 3:10PM 180
4 Website 1/1/2013 3:20PM 280
5 Website 1/1/2013 5:00PM 60
1 Website 1/1/2013 5:05PM 500
3 Website 1/1/2013 5:45PM 120
1 Mobile 1/1/2013 6:00PM 30
2 Mobile 1/1/2013 6:10PM 90
5 Mobile 1/1/2013 7:30PM 400
3 Website 1/1/2013 8:00PM 30
1 Mobile 1/1/2013 9:30PM 200
SQL Fiddle to this schema
I can select the unique users into a new table like so:
SELECT UserID
INTO #Users
FROM LoginHistory
GROUP BY UserID
Now, the functionality I'm trying to develop is to loop over these unique UserIDs, order the logins by DateTime, then count the number of logins needed to get to 300 seconds.
The result set I would hope to get to would look something like this:
UserID TotalLogins WebsiteLogins MobileLogins Loginsneededto5Min
1 4 2 2 2
2 2 2 0 0
3 3 3 0 3
4 1 1 0 0
5 2 1 1 2
If I were performing this in another language, I would think it would be something like this: (And apologies because this is not complete, just where I think I am going)
for (i in #Users):
TotalLogins = Count(*),
WebsiteLogins = Count(*) WHERE Channel = 'Website',
MobileLogins = Count(*) WHERE Channel = 'Mobile',
for (i in LoginHistory):
if Duration < 300:
count(NumLogins) + 1
** Ok - I'm laughing at myself the way I combined multiple different languages/syntaxes, but this is how I am thinking about solving this **
Thoughts on a good way to accomplish this? My preference is to use a loop so I can continue to write if/then logic into the code.
Ok, this is one of those times where a CURSOR would probably outperform a set-based solution. Sadly, I'm not very good with cursors, so I can only give you a set-based solution to try:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [DateTime]) RN
FROM UserLogins
), CTE2 AS
(
SELECT *, 1 RecursionLevel
FROM CTE
WHERE RN = 1
UNION ALL
SELECT B.UserID, B.Channel, B.[DateTime],
A.DurationInSeconds+B.DurationInSeconds,
B.RN, RecursionLevel+1
FROM CTE2 A
INNER JOIN CTE B
ON A.UserID = B.UserID AND A.RN = B.RN - 1
)
SELECT A.UserID,
COUNT(*) TotalLogins,
SUM(CASE WHEN Channel = 'Website' THEN 1 ELSE 0 END) WebsiteLogins,
SUM(CASE WHEN Channel = 'Mobile' THEN 1 ELSE 0 END) MobileLogins,
ISNULL(MIN(RecursionLevel),0) LoginsNeededTo5Min
FROM UserLogins A
LEFT JOIN ( SELECT UserID, MIN(RecursionLevel) RecursionLevel
FROM CTE2
WHERE DurationInSeconds > 300
GROUP BY UserID) B
ON A.UserID = B.UserID
GROUP BY A.UserID
A slightly different piece-wise approach. A minor difference is that the recursive portion terminates when it reaches 300 seconds for each user rather than summing all of the available logins.
An index on UserId/StartTime should improve performance on larger datasets.
declare @Logins as Table ( UserId Int, Channel VarChar(10), StartTime DateTime, DurationInSeconds Int )
insert into @Logins ( UserId, Channel, StartTime, DurationInSeconds ) values
( 1, 'Website', '1/1/2013 1:13PM', 170 ),
( 2, 'Mobile', '1/1/2013 2:10PM', 60 ),
( 3, 'Website', '1/1/2013 3:10PM', 180 ),
( 4, 'Website', '1/1/2013 3:20PM', 280 ),
( 5, 'Website', '1/1/2013 5:00PM', 60 ),
( 1, 'Website', '1/1/2013 5:05PM', 500 ),
( 3, 'Website', '1/1/2013 5:45PM', 120 ),
( 1, 'Mobile', '1/1/2013 6:00PM', 30 ),
( 2, 'Mobile', '1/1/2013 6:10PM', 90 ),
( 5, 'Mobile', '1/1/2013 7:30PM', 400 ),
( 3, 'Website', '1/1/2013 8:00PM', 30 ),
( 1, 'Mobile', '1/1/2013 9:30PM', 200 )
select * from @Logins
; with MostRecentLogins as (
-- Logins with flags for channel and sequenced by StartTime (ascending) for each UserId .
select UserId, Channel, StartTime, DurationInSeconds,
case when Channel = 'Website' then 1 else 0 end as WebsiteLogin,
case when Channel = 'Mobile' then 1 else 0 end as MobileLogin,
Row_Number() over ( partition by UserId order by StartTime ) as Seq
from @Logins ),
CumulativeDuration as (
-- Start with the first login for each UserId .
select UserId, Seq, DurationInSeconds as CumulativeDurationInSeconds
from MostRecentLogins
where Seq = 1
union all
-- Accumulate additional logins for each UserId until the running total exceeds 300 or they run out of logins.
select CD.UserId, MRL.Seq, CD.CumulativeDurationInSeconds + MRL.DurationInSeconds
from CumulativeDuration as CD inner join
MostRecentLogins as MRL on MRL.UserId = CD.UserId and MRL.Seq = CD.Seq + 1 and CD.CumulativeDurationInSeconds < 300 )
-- Display the summary.
select UserId, Sum( WebsiteLogin + MobileLogin ) as TotalLogins,
Sum( WebsiteLogin ) as WebsiteLogins, Sum( MobileLogin ) as MobileLogins,
( select Max( Seq ) from CumulativeDuration where UserId = LT3.UserId and CumulativeDurationInSeconds >= 300 ) as LoginsNeededTo5Min
from MostRecentLogins as LT3
group by UserId
order by UserId
Note that your sample results seem to have an error. UserId 3 reaches 300 seconds in two calls: 180 + 120. Your example shows three calls.
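For what it's worth, the loop the question describes is only a few lines in a procedural language. Here is a sketch in Python of the per-user step, assuming the durations are already sorted by login time; logins_needed is a made-up name, and it treats exactly 300 seconds as reaching the threshold, like the query above.

```python
def logins_needed(durations, threshold=300):
    """Number of logins, in order, needed for the running total to reach threshold.

    Returns None when the user never accumulates that much time.
    """
    total = 0
    for count, seconds in enumerate(durations, start=1):
        total += seconds
        if total >= threshold:
            return count  # stop early, like the recursive CTE does
    return None

durations_by_user = {
    1: [170, 500, 30, 200],
    2: [60, 90],
    3: [180, 120, 30],
    4: [280],
    5: [60, 400],
}
for user_id, durations in durations_by_user.items():
    print(user_id, logins_needed(durations))
# 1 2
# 2 None
# 3 2
# 4 None
# 5 2
```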