Picking random row with certain chance (weight) in MySQL - mysql

I searched for this for a while, but the results just confused me because I am quite new to MySQL.
I have a table with these 4 columns: AUTO_INCREMENT ID, NAME, TYPE, CHANCE, so the rows look like this:
1, NOTHING, NO, 35
2, VERSICOLOR, TREE, 35
3, DIVERSIPES, TREE, 35
4, AMAZONICA, TREE, 35
5, EMILIA, GROUND, 25
6, BOEHMI, GROUND, 25
7, SMITHI, GROUND, 25
8, METALLICA, SKY, 5
9, REGALIS, SKY, 5
Note: these are simple examples; there will be hundreds of rows like them. What I need to do is pick one row from this table with the chances shown in the CHANCE column.
Meaning: I need to pick one row out of the 9, and the result can be "VERSICOLOR, DIVERSIPES, AMAZONICA or NOTHING with 35% chance", or "EMILIA, BOEHMI or SMITHI with 25% chance", or "METALLICA or REGALIS with 5% chance". So this query will probably give me one of "VERSICOLOR, DIVERSIPES, AMAZONICA or NOTHING" because those have a 35% chance, or maybe I will be lucky and get "METALLICA or REGALIS" :)
Basically there are 3 groups of types: GROUND, TREE and SKY. What I want is only one result from all of these: a GROUND, TREE or SKY item picked with the given chances. To be clear, I don't want one result per group; I want a single result, which can be an item of type GROUND, TREE or SKY.
I hope I explained myself clearly. Regards.

There is probably a more elegant solution and one that doesn't assume your percentages add to 100 - but this may work:
Example http://sqlfiddle.com/#!2/ec699/1
SELECT *
FROM (
    SELECT id, name, type, chance
         , @value + 1 AS lowval
         , @value := @value + chance AS hival
    FROM tbl
    JOIN (SELECT @value := 0) AS foo) AS bar
-- guesser is a whole number from 1 to 100 (this assumes the chances add up to 100)
JOIN (SELECT FLOOR(1 + RAND()*100) AS guesser) AS bar2
    ON guesser BETWEEN lowval AND hival;
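To make the lowval/hival idea easier to follow outside SQL, here is a minimal Python sketch of the same range lookup. It is only an illustration under a few assumptions: the names ROWS, build_ranges and pick are made up for this example, the rows are the nine from the question, and the guess is drawn from 1 to the sum of the chances (225 here) rather than from 1 to 100, so the chances do not have to add up to 100.

```python
import random

# (name, chance) pairs copied from the question's sample rows.
ROWS = [
    ("NOTHING", 35), ("VERSICOLOR", 35), ("DIVERSIPES", 35), ("AMAZONICA", 35),
    ("EMILIA", 25), ("BOEHMI", 25), ("SMITHI", 25),
    ("METALLICA", 5), ("REGALIS", 5),
]


def build_ranges(rows):
    """Return (name, lowval, hival) tuples, mirroring the lowval/hival columns."""
    ranges = []
    running = 0
    for name, chance in rows:
        ranges.append((name, running + 1, running + chance))
        running += chance
    return ranges


def pick(rows, guess=None):
    """Pick one name; the guess is drawn from 1..sum(chance) when not supplied."""
    ranges = build_ranges(rows)
    total = ranges[-1][2]
    if guess is None:
        guess = random.randint(1, total)  # inclusive on both ends
    for name, low, high in ranges:
        if low <= guess <= high:
            return name
    raise ValueError("guess must be between 1 and the sum of the chances")


if __name__ == "__main__":
    print(pick(ROWS))
```

Each row owns a slice of the number line as wide as its chance, so a uniform guess lands in a row's slice with probability chance / total.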

This problem comes up outside SQL. I posted a general solution here: generate random numbers within a range with different probabilities.
With the first set of numbers you used (70%, 25%, 5%), fill three urns with 100 balls each. In urn 0, all balls are red. In urn 1, 75 of the balls are green and 25 are red. Finally, urn 2 has 15 blue balls and 85 red balls. Now pick a random urn, each having probability 1/3, and pick a random ball from it. Using this scheme, the probability of getting a red ball is 0.70, the probability of a green ball is 0.25 and the probability of a blue ball is 0.05.
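The urn arithmetic can be checked with a short Python sketch. This is just an illustration, not the linked solution itself: URNS, exact_probabilities and draw are names invented here, and the urn contents are the ones described above.

```python
import random
from fractions import Fraction

# Three urns of 100 balls each, as described in the answer.
URNS = [
    {"red": 100},
    {"green": 75, "red": 25},
    {"blue": 15, "red": 85},
]


def exact_probabilities(urns):
    """Probability of each colour when the urn is chosen uniformly, then a ball uniformly."""
    probs = {}
    for urn in urns:
        size = sum(urn.values())
        for colour, count in urn.items():
            share = Fraction(count, size * len(urns))
            probs[colour] = probs.get(colour, Fraction(0)) + share
    return probs


def draw(urns, rng=random):
    """Pick a random urn, then a random ball from it; return the ball's colour."""
    urn = rng.choice(urns)
    colours = list(urn)
    return rng.choices(colours, weights=[urn[c] for c in colours])[0]


if __name__ == "__main__":
    for colour, p in exact_probabilities(URNS).items():
        print(colour, float(p))
    print(draw(URNS))
```

Summing each colour's share over the three urns gives red 210/300, green 75/300 and blue 15/300, which are the 70%, 25% and 5% targets.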

Related

Min and Max, but there are also letters

I'll keep this as simple as I can. I'm new to MySQL and I have values in my column such as "10, 5, 3 and n/v". The code is
SELECT MIN(InsertColumnHere), MAX(InsertColumnHere)
FROM diplomaeval
and the result is..
MIN = 10 and MAX = N/V
I need them to switch places, since N/V should count as the lowest value and 10 as the highest, not the other way around.

SQL generate a random positive or negative value

I am looking for a way to randomly change a value from positive to negative. (I am creating a distortion on a lat/long location, so I would like to offset a given location by +/- some degrees.)
I already created the following query, which gives me a number between -1 and +1. The idea is to multiply my distortion by this number to get a random negative or positive number.
SELECT round(-1+2*RAND(),0);
The only problem is, this also generates the value 0.0 which can't be multiplied. How do I get -1 or +1 only?
TIA
ABBOV
Maybe:
start by rounding, to give 0 or 1
then multiply to give 0 or 2
then subtract, to give -1 or 1
i.e.:
SELECT ROUND(RAND()) * 2 - 1;
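For anyone doing the distortion in application code instead, the same three steps can be sketched in Python. The function name random_sign and the 0.01 degree distortion are assumptions made up for this example.

```python
import random


def random_sign(rng=random):
    """Return -1 or 1: round a uniform draw to 0 or 1, double it, then subtract 1."""
    return round(rng.random()) * 2 - 1


if __name__ == "__main__":
    distortion = 0.01  # hypothetical offset in degrees
    print(random_sign() * distortion)
```

Because the rounded value is only ever 0 or 1, the result can never be 0.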

occurrence to score

I have word frequencies, and I would like to convert number_of_occurrence to a score between 0 and 10.
word    number_of_occurrence    score
and     200                     10
png     2                       1
where   50                      6
news    120                     7
If you want to rate term frequencies in a corpus, I suggest reading the Wikipedia article on term frequency–inverse document frequency.
There are many ways to count term frequency.
I understand that you want to rate it between 0 and 10.
I didn't get how you calculated the score values in your example.
Anyway, I suggest a common method: the log function.
import math

# count the occurrences of your terms
# (tokenize, sentence and stopWords are assumed to be defined elsewhere, e.g. with nltk)
freq_table = {}
words = tokenize(sentence)
for word in words:
    word = word.lower()
    # stem the word if you can, using nltk
    if word in stopWords:  # do you really want to count the occurrences of 'and'?
        continue
    if word in freq_table:
        freq_table[word] += 1
    else:
        freq_table[word] = 1
# log normalize the occurrences (stored in a new dict; rebinding the loop variable would change nothing)
log_scores = {word: 10 * math.log(1 + count) for word, count in freq_table.items()}
Of course, instead of log normalization you can normalize by the maximum.
# ratio max normalize the occurrences (max_count avoids shadowing the built-in max)
max_count = max(freq_table.values())
max_scores = {word: 10 * count / max_count for word, count in freq_table.items()}
Or, if you need a threshold effect, you can use a sigmoid function that you could customize.
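A sigmoid score could look something like the sketch below. The function name sigmoid_score and the midpoint and steepness defaults are arbitrary assumptions for illustration; you would tune them to your corpus.

```python
import math


def sigmoid_score(count, midpoint=50.0, steepness=0.1):
    """Map an occurrence count to the open interval (0, 10) with a logistic curve.

    midpoint is the count that scores exactly 5; steepness controls how sharp
    the threshold is. Both defaults are hypothetical.
    """
    return 10.0 / (1.0 + math.exp(-steepness * (count - midpoint)))


if __name__ == "__main__":
    for count in (2, 50, 120, 200):
        print(count, round(sigmoid_score(count), 2))
```

Counts well below the midpoint score close to 0 and counts well above it score close to 10, which gives the threshold effect.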
For more word processing, check the Natural Language Toolkit. For a good term frequency count, stemming is a good choice (stopwords are also useful)!
The score is between 0 and 10. The maximum score is 10 at 50 occurrences, so anything higher than that should also have a score of 10. On the other hand, the minimum score is 0, and the score is 1 at 5 occurrences, so assume anything lower than that has a score of 0.
The interpolation is based only on the condition you gave:
If a word appears 50 times it should be closer to 10 and if a word
appears 5 times it should be closer to 1.
df['score'] = df['number_of_occurrence'].apply(lambda x: x/5 if 5<=x<=50 else (0 if x< 5 else 10))
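The same rule can be tried without pandas. This is a small sketch, assuming the four words from the question as input; the names score, counts and scores are made up for the example.

```python
def score(occurrences):
    """Clamp to 0 below 5 occurrences and to 10 above 50; interpolate as x / 5 in between."""
    if occurrences < 5:
        return 0
    if occurrences > 50:
        return 10
    return occurrences / 5


# Word counts copied from the question.
counts = {"and": 200, "png": 2, "where": 50, "news": 120}
scores = {word: score(n) for word, n in counts.items()}

if __name__ == "__main__":
    for word, value in scores.items():
        print(word, value)
```

With this rule every count of 50 or more gets the top score, so "and", "where" and "news" all score 10 and "png" scores 0.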

LineString to find Vehicle Passing through my 2 Line (4 Points)

I have a task in which I have to draw 2 lines on a Google Map (4 points) and, on the Submit event, display the vehicles passing through those points. I am able to draw the 2 lines on the map, which gives me 4 points in latitude/longitude format.
Now the main question is: how can I query the database to get the vehicles passing through the two lines? I know I might have to use the LineString function in T-SQL, but how do I get all the vehicles passing through those lines? Any suggestions are welcome.
Given that I'm not sure how you are representing your "car" or your "lines" and must make assumptions, this code sample might be able to get the ball rolling for you.
It will return a dataset indicating which cars have passed through which lines.
Spatial queries are not my forte; perhaps someone else could offer an optimisation.
-- This is the first line, as described by 2 points
DECLARE @line1 GEOMETRY = geometry::STGeomFromText('LINESTRING(0 10, 10 10)', 0)
-- This is the second line, as described by another 2 points
DECLARE @line2 GEOMETRY = geometry::STGeomFromText('LINESTRING(0 20, 10 20)', 0)
-- @Car1's path is a line that crosses only the first line (it stops at y = 11), so it does NOT pass through both defined lines
DECLARE @Car1 GEOMETRY = geometry::STGeomFromText('LINESTRING(5 0, 5 11)', 0)
-- @Car2's path is a line that DOES intersect both defined lines above
DECLARE @Car2 GEOMETRY = geometry::STGeomFromText('LINESTRING(5 0, 5 23)', 0)
;WITH Lines (LineID, LineGeom) AS
(
    SELECT 1, @line1 UNION ALL
    SELECT 2, @line2
)
,Cars (CarID, CarGeom) AS
(
    SELECT 1, @Car1 UNION ALL
    SELECT 2, @Car2
)
SELECT C.CarID
      ,L.LineID
FROM Cars C
JOIN Lines L ON L.LineGeom.STIntersects(C.CarGeom) = 1
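To see which (car, line) pairs the query above should return, here is a rough Python sketch of a planar segment intersection test applied to the same coordinates. It is not how SQL Server implements STIntersects; the helper names are made up, and it treats the points as flat x/y values (as the sample does with SRID 0) rather than real latitude/longitude.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 when collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def on_segment(p, q, r):
    """True if r (already known to be collinear with p and q) lies between p and q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b):
    """True if segment a = (p1, q1) and segment b = (p2, q2) share at least one point."""
    p1, q1 = a
    p2, q2 = b
    o1, o2 = orientation(p1, q1, p2), orientation(p1, q1, q2)
    o3, o4 = orientation(p2, q2, p1), orientation(p2, q2, q1)
    if o1 != o2 and o3 != o4:
        return True  # the endpoints of each segment lie on opposite sides of the other
    # Remaining cases: an endpoint lies exactly on the other segment.
    return ((o1 == 0 and on_segment(p1, q1, p2)) or (o2 == 0 and on_segment(p1, q1, q2))
            or (o3 == 0 and on_segment(p2, q2, p1)) or (o4 == 0 and on_segment(p2, q2, q1)))


# Same coordinates as the T-SQL sample.
LINES = {1: ((0, 10), (10, 10)), 2: ((0, 20), (10, 20))}
CARS = {1: ((5, 0), (5, 11)), 2: ((5, 0), (5, 23))}

crossings = [(car_id, line_id)
             for car_id, car in CARS.items()
             for line_id, line in LINES.items()
             if segments_intersect(car, line)]

if __name__ == "__main__":
    print(crossings)
```

A car has passed through both lines only if it appears with both line IDs, so car 2 qualifies here and car 1 does not.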

Select random row from MySQL (with probability)

I have a MySQL table that has a column called cur_odds, which holds the percent probability that the row will get selected. How do I write a query that will actually select the rows at approximately that frequency when you run, for example, 100 queries?
I tried the following, but a row that has a probability of 0.35 ends up getting selected around 60-70% of the time.
SELECT * FROM table ORDER BY RAND()*cur_odds DESC
All the values of cur_odds in the table add up to 1 exactly.
If cur_odds is changed rarely you could implement the following algorithm:
1) Create another column prob_sum, for which
    prob_sum[0] := cur_odds[0]
    for 1 <= i <= row_count - 1:
        prob_sum[i] := prob_sum[i - 1] + cur_odds[i]
2) Generate a random number from 0 to 1:
    rnd := rand(0,1)
3) Find the first row for which prob_sum > rnd (if you create a BTREE index on the prob_sum, the query should work much faster):
    CREATE INDEX prob_sum_ind ON <table> (prob_sum);
    SET @rnd := RAND();
    SELECT MIN(prob_sum) FROM <table> WHERE prob_sum > @rnd;
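The three steps above can be sketched in Python, with a sorted list standing in for the indexed prob_sum column and a binary search standing in for the index lookup. The names build_prob_sum and pick_index are invented for this example, and the sample odds are made up.

```python
import bisect
import itertools
import random


def build_prob_sum(cur_odds):
    """Step 1: running totals of cur_odds, one per row."""
    return list(itertools.accumulate(cur_odds))


def pick_index(prob_sum, rnd=None):
    """Steps 2 and 3: draw rnd in [0, 1) and return the first index with prob_sum > rnd."""
    if rnd is None:
        rnd = random.random()
    index = bisect.bisect_right(prob_sum, rnd)
    # Guard against float rounding leaving the last running total slightly below 1.
    return min(index, len(prob_sum) - 1)


if __name__ == "__main__":
    odds = [0.35, 0.40, 0.25]  # hypothetical cur_odds values that add up to 1
    prob_sum = build_prob_sum(odds)
    print(pick_index(prob_sum))
```

Row i is returned whenever rnd falls between the previous running total and its own, an interval whose width is exactly cur_odds[i].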
Given your SQL statement above, the numbers in cur_odds are not the probabilities that each row is selected. Each one is just an arbitrary weighting (relative to the "weights" of all the other rows), best interpreted as a relative tendency to float towards the top of the sorted table. The actual value in each row is meaningless (e.g. you could have 4 rows with values of 0.35, 0.5, 0.75 and 0.99, or you could have values of 35, 50, 75 and 99, and the results would be the same).
Update: Here's what's going on with your query. You have one row with a cur_odds value of 0.35. For the sake of illustration, I'm going to assume that the other 9 rows all have the same value (0.072). Also for the sake of illustration, let's assume RAND() returns a value from 0.0 to 1.0 (it actually does: MySQL's RAND() returns a value v in the range 0 <= v < 1.0).
Every time you run this SELECT statement, each row is assigned a sorting value by multiplying its cur_odds value by a RAND() value from 0.0 to 1.0. This means that the row with a 0.35 will have a sorting value between 0.0 and 0.35.
Every other row (with a value of 0.072) will have sorting values ranging between 0.0 and 0.072. This means that there is an approximately 80% chance that your one row will have a sorting value greater than 0.072, which would mean that there is no possible chance that any other row could be sorted higher. This is why your row with the cur_odds value of 0.35 is coming up first more often than you expect.
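A quick simulation makes the effect visible. This is only a sketch under the illustrative assumption above (one row at 0.35, nine rows at 0.072); first_place_rate, ODDS and rate are names made up for the example, and the fixed seed is there just to make runs repeatable.

```python
import random


def first_place_rate(odds, trials=20000, seed=0):
    """Fraction of trials in which row 0 has the largest RAND() * cur_odds sort key."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        keys = [rng.random() * weight for weight in odds]
        if keys.index(max(keys)) == 0:
            wins += 1
    return wins / trials


# One row at 0.35 and nine rows at 0.072, as in the illustration.
ODDS = [0.35] + [0.072] * 9
rate = first_place_rate(ODDS)

if __name__ == "__main__":
    print(rate)
```

Under these assumptions the 0.35 row wins outright whenever its key exceeds 0.072 (about 79% of the time) and otherwise ties in distribution with the nine others, so it wins 1 time in 10 of the remainder. That works out to roughly 0.81, far above the intended 0.35.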
I incorrectly described the cur_odds value as a relative change weighting. It actually functions as a maximum relative weighting, which would then involve some complex math to determine the actual relative probabilities involved.
I'm not sure what you need can be done with straight T-SQL. I've implemented a weighted probability picker many times (I was even going to ask a question about best methods for this this morning, ironically) but always in code.