Min and Max, but there are also letters - mysql

I'll keep this as simple as I can. I'm new to MySQL and my column holds values such as "10", "5", "3" and "n/v". The code is:
SELECT MIN(InsertColumnHere), MAX(InsertColumnHere)
FROM diplomaeval
and the result is:
MIN = 10 and MAX = N/V
I need these two "switched" so that each lands in its proper place: N/V should be the lowest and 10 the highest, not the other way around.
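One way to see what is happening: the column holds text, so the values are compared character by character, and "1" sorts before "3", "5" and "n". The Python sketch below reproduces that ordering and then applies a sort key that ranks any non-numeric value below every number. The `sort_key` helper and the decision to rank "n/v" lowest are assumptions for illustration only; in MySQL the same idea would need a numeric conversion of the column, which is not shown here.

```python
values = ["10", "5", "3", "n/v"]

# Plain string comparison, which mirrors MIN/MAX on a text column.
text_min = min(values)
text_max = max(values)

def sort_key(value):
    """Hypothetical helper: numbers sort by value, anything else sorts lowest."""
    try:
        return (1, float(value))
    except ValueError:
        return (0, 0.0)

# Comparison with the numeric-aware key.
numeric_min = min(values, key=sort_key)
numeric_max = max(values, key=sort_key)

print(text_min, text_max)
print(numeric_min, numeric_max)
```

With the key applied, "n/v" comes out as the minimum and "10" as the maximum, which is the ordering the question asks for.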

Related

occurrence to score

I have word frequencies and would like to convert number_of_occurrence to a score between 0 and 10.
word number_of_occurrence score
and 200 10
png 2 1
where 50 6
news 120 7
If you want to rate term frequencies in a corpus, I suggest reading the Wikipedia article on term frequency–inverse document frequency.
There are many ways to count term frequency.
As I understand it, you want to rate each word between 0 and 10.
I could not work out how you calculated the scores in your example.
In any case, I suggest a common method: the log function.
import math

# count the occurrences of your terms
# (tokenize, sentence and stopWords are assumed to be defined elsewhere)
freq_table = {}
words = tokenize(sentence)
for word in words:
    word = word.lower()
    # stem the word if you can, using nltk
    if word in stopWords:  # do you really want to count the occurrences of 'and'?
        continue
    if word in freq_table:
        freq_table[word] += 1
    else:
        freq_table[word] = 1
# log normalize the occurrences (write the result back into the table)
for word in freq_table:
    freq_table[word] = 10 * math.log(1 + freq_table[word])
Of course, instead of log normalization you can normalize by the maximum.
# ratio max normalize the occurrences (avoid shadowing the built-in max)
max_count = max(freq_table.values())
for word in freq_table:
    freq_table[word] = 10 * freq_table[word] / max_count
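To make the two normalizations concrete, here is a minimal sketch that applies both to the four counts from the question. One caveat: 10*log(1+count) on its own is not capped at 10 (for a count of 200 it is about 53), so the log variant below is divided by the log of the largest count. That extra scaling step is my assumption, not part of the method as described above.

```python
import math

# Word counts taken from the question's example table.
freq_table = {"and": 200, "png": 2, "where": 50, "news": 120}

max_count = max(freq_table.values())

# Ratio to the maximum: the most frequent word gets exactly 10.
ratio_scores = {w: 10 * c / max_count for w, c in freq_table.items()}

# Log scaling divided by the log of the maximum, so scores stay within 0-10.
log_scores = {
    w: 10 * math.log(1 + c) / math.log(1 + max_count)
    for w, c in freq_table.items()
}

for word in freq_table:
    print(word, round(ratio_scores[word], 2), round(log_scores[word], 2))
```

The ratio version spreads scores linearly, while the log version compresses the gap between very frequent and moderately frequent words.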
Or, if you need a threshold effect, you can use a sigmoid function that you can customize.
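A sigmoid score could look roughly like the sketch below. The `midpoint` and `steepness` parameters and their default values are purely illustrative assumptions; they would need tuning to the counts in your own corpus.

```python
import math

def sigmoid_score(count, midpoint=25.0, steepness=0.2):
    """Map a count to a 0-10 score; midpoint and steepness are illustrative."""
    return 10.0 / (1.0 + math.exp(-steepness * (count - midpoint)))

for count in (2, 50, 120, 200):
    print(count, round(sigmoid_score(count), 2))
```

Counts well below the midpoint score close to 0, counts well above it score close to 10, and the steepness controls how sharp the transition is.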
For more word processing, check out the Natural Language Toolkit. For a good term frequency count, stemming is a good choice (stop words are also useful)!
The score is between 0 and 10. The maximum score of 10 is reached at 50 occurrences, so anything higher should also score 10. At the other end, the minimum score is 0 and the score is 1 at 5 occurrences, so assume anything lower than that scores 0.
The interpolation is based only on the condition you gave:
If a word appears 50 times it should be closer to 10, and if a word
appears 5 times it should be closer to 1.
df['score'] = df['number_of_occurrence'].apply(lambda x: x/5 if 5<=x<=50 else (0 if x< 5 else 10))
Output:

How to create query with simple formula?

Hey, is there any way to create a query with a simple formula?
I have a table data with two columns, value_one and value_two, both decimal values. I want to select the rows where the difference between value_one and value_two is greater than 5. How can I do this?
Can I do something like this?
SELECT * FROM data WHERE (MAX(value_one, value_two) - MIN(value_one, value_two)) > 5
Example values
value_one, value_two
1,6
9,3
2,3
3,2
so the corresponding differences are 5, 6, 1, 1, and the selected rows would be only the first and second.
Consider an example where a bigger number is subtracted from a smaller number:
2 - 5 = -3
So the result is the difference of the two numbers with a negative sign.
Now consider the reverse scenario, where the smaller number is subtracted from the bigger number:
5 - 2 = 3
Pretty simple, right?
Basically, the difference between two numbers stays the same if you ignore the sign. This is called the absolute value of a number.
Now the question is how to find the absolute value in MySQL.
The answer is MySQL's built-in ABS() function, which returns the absolute value of a number.
ABS(X):
Returns the absolute value of X.
mysql> SELECT ABS(2);
-> 2
mysql> SELECT ABS(-32);
-> 32
Therefore, without worrying about finding min and max number, we can directly focus on the difference of two numbers and then, retrieving the absolute value of the result. Finally, check if it is greater than 5.
So, the final query becomes:
SELECT *
FROM data
WHERE abs(value_one - value_two) > 5;
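To check the query against the example values from the question, here is a small sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption made so the example is self-contained); both provide ABS().

```python
import sqlite3

# SQLite stands in for MySQL here; both provide ABS().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (value_one REAL, value_two REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, 6), (9, 3), (2, 3), (3, 2)],
)

# Keep only rows whose absolute difference is strictly greater than 5.
rows = conn.execute(
    "SELECT value_one, value_two FROM data "
    "WHERE ABS(value_one - value_two) > 5"
).fetchall()
print(rows)
conn.close()
```

Note that the first example row (1, 6) has a difference of exactly 5, so the strict comparison > 5 excludes it and only (9, 3) is returned. If that row should be included, the condition would have to be >= 5.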
You can also do complex operations once the absolute value is calculated like adding or dividing with the third value. Check the code below:
SELECT *
FROM
data
WHERE
(abs(value_one - value_two) / value_three) + value_four > 5;
You can also add multiple conditions using logical operators such as AND, OR and NOT:
SELECT *
FROM
data
WHERE
((abs(value_one - value_two) / value_three) + value_four > 5)
AND (value_five != 0);
Here is the link with various functions available in MySQL:
https://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html
No, you would just use a simple where clause:
select *
from data
where abs(value_one - value_two) > 5;

Round vs Truncate in top percent

I was wondering if there was a way to change truncate into round when doing a top x percent.
For Example:
Select Top 10 Percent
HospMastID
,ClientID
,ControlGroup = 1
Into
#RandomTable
From
#ClientsTable
Order By
NewID()
At present, with a total of 1176 original records, it returns 117 as the top 10 percent. I'm curious what setting would change this, since it is really truncating the number instead of rounding it.
Thanks,
Scott
If you want to ROUND the results, you can calculate your own value to use in TOP (granted, this means that you need to first count the rows of your table, but it's the only way that I can think of doing this, since there isn't a setting for this):
DECLARE @TopPercent INT, @Top INT
SET @TopPercent = 10 -- use the value you want here
SELECT @Top = ROUND(COUNT(*)*CAST(@TopPercent AS DECIMAL(4,1))/100,0)
FROM #ClientsTable
SELECT TOP(@Top)
HospMastID,
ClientID,
ControlGroup = 1
INTO #RandomTable
FROM #ClientsTable
ORDER BY NEWID()
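The arithmetic behind this can be checked outside the database. The sketch below compares truncation with rounding for the 1176-row case from the question. The `top_count` helper is hypothetical, and using ROUND_HALF_UP to mimic ROUND() for positive values is an assumption on my part.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def top_count(total_rows, percent, rounding):
    """Number of rows for a given percentage under the chosen rounding mode."""
    exact = Decimal(total_rows) * Decimal(percent) / Decimal(100)
    return int(exact.quantize(Decimal("1"), rounding=rounding))

# 10 percent of 1176 rows is 117.6 before any rounding.
truncated = top_count(1176, 10, ROUND_DOWN)
rounded = top_count(1176, 10, ROUND_HALF_UP)
print(truncated, rounded)
```

Truncating 117.6 gives the 117 rows reported in the question, while rounding gives 118.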

MySQL - get all column averages also with a 'total' average

I have a MySQL table which looks like this:
id load_transit load_standby
1 40 20
2 30 15
3 50 10
I need to do the following calculations:
load_transit_mean = (40+30+50)/3 = 40
load_standby_mean = (20+15+10)/3 = 15
total_mean = (40+15)/2 = 27.5
Is it possible to do this in a single query? What would the best design be?
I need my answer to be scalable (the real design has more rows and columns), and able to handle some rows containing NULL.
I believe this would do it:
SELECT AVG(Load_transit)
, AVG(load_standby)
, (AVG(Load_transit) + AVG(load_standby))/2.0
FROM table
The AVG() function handles NULLs by ignoring them. If you want the NULL rows to be counted in your denominator, you can replace AVG() with SUM() over COUNT(*), i.e.:
SUM(load_transit)/COUNT(*)
Regarding scalability, manually listing them out like above is probably the simplest solution.
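As a quick check of the NULL behaviour, the sketch below runs the same kind of query through Python's built-in sqlite3 module. SQLite stands in for MySQL, the table name `loads` is hypothetical, and the fourth row with a NULL is added purely for illustration; none of these come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "loads" is a hypothetical table name; row 4 is added to show NULL handling.
conn.execute(
    "CREATE TABLE loads (id INTEGER, load_transit REAL, load_standby REAL)"
)
conn.executemany(
    "INSERT INTO loads VALUES (?, ?, ?)",
    [(1, 40, 20), (2, 30, 15), (3, 50, 10), (4, 40, None)],
)

# AVG skips the NULL; SUM/COUNT(*) divides by all four rows instead.
avg_transit, avg_standby, total_mean, standby_over_all_rows = conn.execute(
    "SELECT AVG(load_transit), "
    "       AVG(load_standby), "
    "       (AVG(load_transit) + AVG(load_standby)) / 2.0, "
    "       SUM(load_standby) / COUNT(*) "
    "FROM loads"
).fetchone()
print(avg_transit, avg_standby, total_mean, standby_over_all_rows)
conn.close()
```

Here AVG(load_standby) divides 45 by the three non-NULL rows, whereas SUM(load_standby)/COUNT(*) divides the same 45 by all four rows, so the two results differ as soon as a NULL is present.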

Select random row from MySQL (with probability)

I have a MySQL table that has a column called cur_odds, which holds the probability that that row will get selected. How do I make a query that will actually select the rows in approximately that frequency when you run through 100 queries, for example?
I tried the following, but a row that has a probability of 0.35 ends up getting selected around 60-70% of the time.
SELECT * FROM table ORDER BY RAND()*cur_odds DESC
All the values of cur_odds in the table add up to 1 exactly.
If cur_odds is changed rarely you could implement the following algorithm:
1) Create another column prob_sum, for which
prob_sum[0] := cur_odds[0]
for 1 <= i <= row_count - 1:
prob_sum[i] := prob_sum[i - 1] + cur_odds[i]
2) Generate a random number from 0 to 1:
rnd := rand(0,1)
3) Find the first row for which prob_sum > rnd (if you create a BTREE index on the prob_sum, the query should work much faster):
CREATE INDEX prob_sum_ind ON <table> (prob_sum);
SET @rnd := RAND();
SELECT MIN(prob_sum) FROM <table> WHERE prob_sum > @rnd;
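The same three steps can be sketched in Python to check that the selection frequencies come out as intended. The odds in `cur_odds` are hypothetical values that sum to 1, and `pick_index` is an illustrative helper, not part of any library.

```python
import bisect
import itertools
import random

def pick_index(prob_sum, rnd):
    """Index of the first running total that is strictly greater than rnd."""
    index = bisect.bisect_right(prob_sum, rnd)
    # Guard against a final running total that falls slightly below 1.0.
    return min(index, len(prob_sum) - 1)

cur_odds = [0.35, 0.15, 0.5]  # hypothetical odds that sum to 1
prob_sum = list(itertools.accumulate(cur_odds))  # step 1: running totals

# Steps 2 and 3, repeated many times to estimate selection frequencies.
draws = 100_000
counts = [0] * len(cur_odds)
rng = random.Random(0)
for _ in range(draws):
    counts[pick_index(prob_sum, rng.random())] += 1
frequencies = [c / draws for c in counts]
print(frequencies)
```

Because the running totals are sorted, a binary search does the same job as the indexed prob_sum > rnd lookup, and the observed frequencies should land close to the configured odds.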
Given your above SQL statement, whatever numbers you have in cur_odds are not the probabilities that each row is selected, but is instead just an arbitrary weighting (relative to the "weights" of all the other rows) which could instead be best interpreted as a relative tendency to float towards the top of the sorted table. The actual value in each row is meaningless (e.g. you could have 4 rows with values of 0.35, 0.5, 0.75 and 0.99, or you could have values of 35, 50, 75 and 99, and the results would be the same).
Update: Here's what's going on with your query. You have one row with a cur_odds value of 0.35. For the sake of illustration, I'm going to assume that the other 9 rows all have the same value (0.072). Also for the sake of illustration, let's assume RAND() returns a value from 0.0 to 1.0 (it may actually).
Every time you run this SELECT statement, each row is assigned a sorting value by multiplying its cur_odds value by a RAND() value from 0.0 to 1.0. This means that the row with a 0.35 will have a sorting value between 0.0 and 0.35.
Every other row (with a value of 0.072) will have sorting values ranging between 0.0 and 0.072. This means that there is an approximately 80% chance that your one row will have a sorting value greater than 0.072, which would mean that there is no possible chance that any other row could be sorted higher. This is why your row with the cur_odds value of 0.35 is coming up first more often than you expect.
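The effect described above can be reproduced with a short simulation. This is only a sketch under the same illustrative assumption as the text (one row at 0.35 and nine rows at 0.072); the `winner` helper is hypothetical and mimics picking the first row after ORDER BY RAND()*cur_odds DESC.

```python
import random

def winner(cur_odds, rng):
    """Index of the row that sorts first under ORDER BY RAND()*cur_odds DESC."""
    keys = [rng.random() * odds for odds in cur_odds]
    return keys.index(max(keys))

cur_odds = [0.35] + [0.072] * 9  # the illustrative values from the text
rng = random.Random(0)
trials = 50_000

# Count how often the 0.35 row ends up at the top of the sorted result.
wins = sum(1 for _ in range(trials) if winner(cur_odds, rng) == 0)
share = wins / trials
print(share)
```

With these values the 0.35 row comes out on top in roughly 80% of the trials, far more often than the 35% its cur_odds value suggests.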
I incorrectly described the cur_odds value as a relative change weighting. It actually functions as a maximum relative weighting, which would then involve some complex math to determine the actual relative probabilities involved.
I'm not sure what you need can be done with straight T-SQL. I've implemented a weighted probability picker many times (I was even going to ask a question about best methods for this this morning, ironically) but always in code.