What way to "rank" dataset Mysql

The situation is as follows:
I have a database with 40,000 cities. Each city has certain types of properties, each with a value.
For example "mountains" or "beaches". If a city has lots of mountains, the value for mountains will be high; if there are fewer mountains, the number is lower.
Table with city name and properties and values:
Alongside that, I have a table with the average values of all those properties.
What I need to happen: I want the user to search for a city which has one or multiple properties, find the best match and attach a score from 0 - 100 to it.
The way I do this is as follows:
1. I first get the 25%, 50% and 75% values for the properties:
_var_[property]_25 = [integer]
_var_[property]_50 = [integer]
_var_[property]_75 = [integer]
2. Then I need to use this algorithm:
_var_user_search_for_properties = [mountain,beach]
_var_max_property_percentage = 100 / [properties user search for]
_var_match_percentage = 0
for each _var_user_search_for_properties
    if [property] < _var_[property]_25 then
        _var_match_percentage += _var_max_property_percentage
    elseif [property] < _var_[property]_50 then
        _var_match_percentage += _var_max_property_percentage / 4 * 3
    elseif [property] < _var_[property]_75 then
        _var_match_percentage += _var_max_property_percentage / 4 * 2
    elseif [property] < 0 then
        _var_match_percentage += _var_max_property_percentage / 4 * 1
    end if
next
order all rows by _var_match_percentage desc
The question is: is it possible to do this with MySQL?
How do I calculate this "match percentage" with it?
Or will it be faster to get all the rows and indexes out of the database and loop through them all in .NET?

If the percentages can be stored in the database, you could try MySQL's LIMIT clause. See http://www.mysqltutorial.org/mysql-limit.aspx.
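The scoring from the question can probably be pushed into the query itself with a CASE expression, which is standard SQL and is also supported by MySQL. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the `cities` table, its columns, the sample rows and the threshold numbers are all hypothetical. The branch order mirrors the pseudo-code in the question.

```python
import sqlite3

# Hypothetical schema: one row per city, one column per property.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, mountain INTEGER, beach INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [("A", 5, 5), ("B", 30, 5), ("C", 60, 60), ("D", 90, 90)],
)

# Hypothetical 25%, 50% and 75% thresholds per property.
# The dict also acts as a whitelist of column names that may appear in the SQL.
thresholds = {"mountain": (10, 40, 70), "beach": (10, 40, 70)}


def score_case(column, p25, p50, p75, share):
    """Build a CASE expression with the same branch order as the pseudo-code."""
    return (
        f"CASE WHEN {column} < {p25} THEN {share} "
        f"WHEN {column} < {p50} THEN {share} * 3 / 4.0 "
        f"WHEN {column} < {p75} THEN {share} * 2 / 4.0 "
        # Kept from the pseudo-code; only reachable if the 75% threshold is negative.
        f"WHEN {column} < 0 THEN {share} * 1 / 4.0 "
        f"ELSE 0.0 END"
    )


def rank(conn, properties):
    """Return (name, match_percentage) rows, best match first."""
    share = 100.0 / len(properties)  # maximum contribution of one property
    total = " + ".join(score_case(p, *thresholds[p], share) for p in properties)
    sql = (
        f"SELECT name, {total} AS match_percentage "
        "FROM cities ORDER BY match_percentage DESC, name"
    )
    return conn.execute(sql).fetchall()


ranking = rank(conn, ["mountain", "beach"])
print(ranking)
```

Because the score is computed and sorted inside the database, only the ranked rows need to travel to the application, and a LIMIT clause can then cut the result down to the top matches.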

Related

Finding prime numbers up till a number

I am trying to list all the prime numbers up to a specific number, e.g. 1000. The code gets slower as the number increases. I am pretty sure it is because of the for loop where (number - 1) is checked against all the primes found so far. I need some advice on how I can decrease the processing time of the code for larger numbers. Thanks
import time
t0 = time.time()
prime_list = [2]
number = 0
is_not_prime = 0
count = 0
while number < 1000:
    print(number)
    for i in range (2,number):
        count = 0
        if (number%i) == 0:
            is_not_prime = 1
        if is_not_prime == 1:
            for j in range (0,len(prime_list)):
                if(number-1)%prime_list[j] != 0:
                    count += 1
            if count == len(prime_list):
                prime_list.append(number-1)
                is_not_prime = 0
                count = 0
                break
    number += 1
print(prime_list)
t1 = time.time()
total = t1-t0
print(total)

Your solution, on top of being confusing, is very inefficient - O(n^3). Please, use the Sieve of Eratosthenes. Also, learn how to use booleans.
Something like this (not optimal, just a mock-up). Essentially, you start with a list of all the numbers below 1000. Then, you remove every number that is a multiple of an earlier entry in the list.
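amount = 1000
# list() is needed in Python 3, where range objects have no pop()
numbers = list(range(1, amount))
i = 1  # index 1 holds the value 2; index 0 holds 1, which is never used as a divisor
while i < len(numbers):
    n = i + 1
    while n < len(numbers):
        if numbers[n] % numbers[i] == 0:
            # numbers[n] is a multiple of an earlier entry, so it is not prime
            numbers.pop(n)
        else:
            n += 1
    i += 1
print(numbers[1:])  # skip the leading 1, which is not prime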
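The mock-up above removes multiples by trial division and repeated pop() calls. For comparison, here is a minimal sketch of the classic Sieve of Eratosthenes, which marks multiples in a boolean list instead. The function name `primes_below` is made up for this example.

```python
def primes_below(limit):
    """Return all primes strictly below limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False  # 0 and 1 are not prime
    i = 2
    while i * i < limit:
        if is_prime[i]:
            # Every multiple of i from i*i onwards is composite.
            for multiple in range(i * i, limit, i):
                is_prime[multiple] = False
        i += 1
    return [n for n, flag in enumerate(is_prime) if flag]


print(primes_below(30))
```

Starting the inner loop at i*i works because any smaller multiple of i has already been marked by a smaller prime factor.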
Finally, I was able to answer because your question isn't language-specific, but please tag the question with the language you're using in the example.

Octave Calculus with matrix

I'm developing a program in Octave, which I will explain as I show the code.
I have this matrix in a file called matprec.m:
function [res1] = obtemDadosPrec()
res1 = [
1,2001,1,2,0.00;
1,2001,1,5,5.33;
2,2001,1,5,4.57;
3,2001,1,5,5.33;
4,2001,1,5,5.59;
5,2001,1,5,4.32;
2,2001,1,13,0.00;
3,2001,1,13,0.00;
4,2001,1,13,0.00;
3,2001,1,30,30.73;
2,2001,2,1,1.02;
3,2001,2,1,1.52;
4,2001,2,1,1.78;
5,2001,2,1,1.27;
1,2001,2,2,1.78;
2,2001,2,2,1.27;
3,2001,2,2,1.78;
4,2001,2,2,2.03;
5,2001,2,2,1.78;
1,2001,3,4,18.03;
3,2001,3,4,15.75;
5,2001,3,4,17.53;
1,2001,3,5,13.46;
2,2001,3,5,12.19;
3,2001,3,5,11.94;
4,2001,3,5,9.65;
5,2001,3,5,10.92;
2,2001,4,30,0.00;
4,2001,4,30,0.00];
format short g
return
endfunction
In this matrix the first column is the station where we measure the amount of precipitation, the second is the year, the third is the month, the fourth is the day and the fifth is the precipitation value.
What I want to do in another file is load this matrix and do the following calculation: for month 1, I want to average the values for each day. For example,
in month 1, day 5, I have 5 values (5.33, 4.57, 5.33, 5.59, 4.32), so I would do
(5.33 + 4.57 + 5.33 + 5.59 + 4.32)/5 = 5.028
I want to do that for every day, and once I have all the daily averages I would add them up to get the amount of precipitation in that month, and do that for all 4 months.
I'm kind of stuck there. If you could help me I would appreciate it, thanks a lot!

First, get your array
>> Result = obtemDadosPrec();
Then get a logical array where rows corresponding to month == 1 are true (i.e. 1) and all others are false (i.e. 0)
>> month1Indices = Result(:,3) == 1;
Use this logical array to perform logical indexing and isolate only the 'true' rows.
>> month1Rows = Result(month1Indices, :);
Repeat same procedure to isolate 'day 5'
>> day5Indices = month1Rows(:,4) == 5;
>> day5Rows = month1Rows(day5Indices , :);
Calculate the average of the 5th column.
>> mean(day5Rows(:,5))
ans = 5.028
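The steps above handle a single day. To show the whole calculation the question asks for (average per day, then the sum of those averages per month), here is a rough sketch in plain Python rather than Octave. The rows are copied from the question's matrix, limited to months 1 and 2 to keep the example short; the function name `monthly_totals` is made up for illustration.

```python
from collections import defaultdict

# (station, year, month, day, precipitation), months 1 and 2 from the question
rows = [
    (1, 2001, 1, 2, 0.00), (1, 2001, 1, 5, 5.33), (2, 2001, 1, 5, 4.57),
    (3, 2001, 1, 5, 5.33), (4, 2001, 1, 5, 5.59), (5, 2001, 1, 5, 4.32),
    (2, 2001, 1, 13, 0.00), (3, 2001, 1, 13, 0.00), (4, 2001, 1, 13, 0.00),
    (3, 2001, 1, 30, 30.73),
    (2, 2001, 2, 1, 1.02), (3, 2001, 2, 1, 1.52), (4, 2001, 2, 1, 1.78),
    (5, 2001, 2, 1, 1.27),
    (1, 2001, 2, 2, 1.78), (2, 2001, 2, 2, 1.27), (3, 2001, 2, 2, 1.78),
    (4, 2001, 2, 2, 2.03), (5, 2001, 2, 2, 1.78),
]


def monthly_totals(rows):
    """Average the stations for each day, then sum the daily averages per month."""
    by_day = defaultdict(list)
    for _station, year, month, day, value in rows:
        by_day[(year, month, day)].append(value)

    totals = defaultdict(float)
    for (year, month, _day), values in by_day.items():
        totals[(year, month)] += sum(values) / len(values)
    return dict(totals)


totals = monthly_totals(rows)
for (year, month), total in sorted(totals.items()):
    print(year, month, round(total, 4))
```

The same grouping idea carries over to Octave: loop over the unique month/day pairs, apply the logical indexing shown above to each pair, and accumulate the means per month.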

Recursively counting divisors of a number

I have a question, more on the theoretical side. I want to make a recursive function that counts all (not only prime) different divisors of a given natural number.
For example with f(0)=0 (per Def.), f(3) = 2, f(6) = 4, f(16) = 5 etc.
Theoretically, how could I do that?
Thanks.

If I understand correctly, you only want to COUNT them, not to collect them, right?
A second assumption is that you don't want to count only independent divisors (i.e. for 6 you want to count "2" and "3", but also "6").
If this is the case, the algorithm shown in Sean's answer can be simplified significantly:
You don't need the array divisorList but only a counter,
As soon as you find a divisor, you can reduce the max limit of the loop to the result of dividing the root number by the divisor (e.g. if your root number is 900 and 2 is the first divisor, you can set the limit of the loop to 450; then, when checking 3 you will reduce the limit to 300 and so on).
EDIT:
After thinking a little bit more, here is the correct algorithm:
Assume that the number is "N". Then, you already start with a count of 2 (i.e. 1 and N),
You then check if N divides by 2; if it does, you need to add 2 to the count (i.e. 2 and N/2),
You then change the limit of the loop to N/2,
Test if dividing by 3 yields an integer; if it does, you add 2 to the count (i.e. 3 and N/3) and reduce the limit to N/3,
Test 4...
Test 5...
...
In Pseudo-code:
var Limit = N ;
Count = 2 ;
for (I = 2 ; I < Limit ; I++) {
    if (N/I is integer) {
        Count = Count + 2 ;
        Limit = N / I ;
    } ;
} ;
Note: I don't know which language you are programming, so you need to verify if your language allows you to change the limit of the loop. If it does not, you can include an EXIT-LOOP condition (e.g. if I >= Limit then exit loop).
Hope this resolves your problem.
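Since the question asks for a recursive function, here is one possible sketch in Python built on the same pairing idea (each divisor I found below the square root comes with a partner N/I). Two details are handled explicitly: taken literally, the pairing would count the square root of a perfect square such as 16 twice, and a start count of 2 would be wrong for N = 1. The function name and the default argument are illustrative choices, not something from the question.

```python
def count_divisors(n, i=1):
    """Count the divisors of n recursively, testing i, i + 1, ... up to sqrt(n)."""
    if n <= 0:
        return 0  # f(0) = 0 by the question's definition
    if i * i > n:
        return 0  # past the square root: no divisor pairs left to find
    found = 0
    if n % i == 0:
        # i pairs with n // i; a square root pairs with itself and counts once
        found = 1 if i * i == n else 2
    return found + count_divisors(n, i + 1)


for n in (0, 3, 6, 16):
    print(n, count_divisors(n))
```

The recursion depth grows with the square root of n, so with Python's default recursion limit (typically 1000) this sketch is only suitable for moderately sized inputs.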

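import java.util.ArrayList;

// Collects every divisor of num; generic types must be Integer, not the primitive int
public static ArrayList<Integer> recursiveDivisors(int num)
{
    ArrayList<Integer> divisorList = new ArrayList<Integer>();
    for (int i = 1; i <= num; i++)
    {
        if (num % i == 0)
            divisorList.add(i);
    }
    return divisorList;
}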
Something like this?
Returns all divisors in a divisor array list
EDIT: Not recursive

Arduino - waiting on function causes a number pile-up

I have a function filledFunction() that returns a float filled:
float filledFunction(){
  if (FreqMeasure.available()) {
    sum = sum + FreqMeasure.read();
    count = count + 1;
    if (count > 30) {
      frequency = FreqMeasure.countToFrequency(sum / count);
      a = frequency * x;
      b = exp (a);
      c = w * b;
      d = frequency * z;
      e = exp (d);
      f = y * e;
      float filled = c + f;
      sum = 0;
      count = 0;
      return filled;
    }
  }
}
When I call this function with
while (1){
  fillLevel = filledFunction();
  int tofill = 500 - fillLevel;
  Serial.print("fillLevel: ");
  Serial.println(fillLevel);
  Serial.print("tofill: ");
  Serial.println(tofill);
The serial monitor should output two numbers, named fillLevel and tofill, that add up to 500. Instead I get a repeating sequence of similar values:
http://i.imgur.com/Y9Wu8P2.png
The first two values are correct (410.93 + 89 = 500), but I don't know where the following 60 or so values come from, and they do not belong there.
I am using an Arduino Nano.

The filledFunction() function only returns a value if FreqMeasure.available() returns true AND count > 30. As stated in the answers to this question the C89, C99 and C11 standards all say that the default return value of a function is undefined (that is if the function completes without executing a return statement). Which really means that anything could happen, such as outputting arbitrary numbers.
In addition, the output that you're seeing starts off 'correct' with one of the numbers subtracted from 500, even when they have weird values such as 11699.00 and -11199 (which equals 500 - 11699.00). However, lower down in the output this seems to break down and the reason is that on Arduino Nano an int can only hold numbers up to 32767 and therefore the result of the subtraction is too big and 'overflows' to be a large negative number.
Fixing the filledFunction() function to explicitly return a value even if FreqMeasure.available() is false or count <= 30 and ensuring that it can't return a number greater than 500 will likely solve these issues.
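To make the 16-bit limit concrete, here is a small Python illustration (not Arduino code) of how a value outside the range -32768 to 32767 could end up with the opposite sign. It assumes two's-complement wraparound, which is only one possible outcome, since converting an out-of-range float to int is not something C or C++ define. The first two inputs are the values quoted above; the third, -40000.0, is a made-up reading chosen to push the result out of range.

```python
def wrap_int16(value):
    """Reduce an integer to the signed 16-bit range by two's-complement wraparound."""
    return (value + 32768) % 65536 - 32768


def to_fill(fill_level):
    """Mimic `int tofill = 500 - fillLevel;` with a 16-bit int (assumed wraparound)."""
    return wrap_int16(int(500 - fill_level))


for level in (410.93, 11699.00, -40000.0):  # the last value is hypothetical
    print(level, to_fill(level))
```

The first two results still fit in 16 bits and match the serial output; the third would be 40500 on paper, which does not fit and wraps to a negative number.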

Data set too large to load into memory for processing

I have a large, rapidly growing data set of around 4 million rows. In order to identify and exclude the outliers (for statistics / analytics usage), I need the algorithm to consider all entries in this data set. However, this is too much data to load into memory and my system chokes. I'm currently using this to collect and process the data:
@scoreInnerFences = innerFence Post.where( :source => 1 ).
                               order( :score ).
                               pluck( :score )
I don't think the typical divide-and-conquer method will work, because every entry has to be considered to keep my outlier calculation accurate. How can this be achieved efficiently?
innerFence identifies the lower quartile and upper quartile of the data set, then uses those findings to calculate the outliers. Here is the (yet to be refactored, non-DRY) code for this:
def q1(s)
  q = s.length / 4
  if s.length % 2 == 0
    return ( s[ q ] + s[ q - 1 ] ) / 2
  else
    return s[ q ]
  end
end
def q2(s)
  q = s.length / 4
  if s.length % 2 == 0
    return ( s[ q * 3 ] + s[ (q * 3) - 1 ] ) / 2
  else
    return s[ q * 3 ]
  end
end
def innerFence(s)
  q1 = q1(s)
  q2 = q2(s)
  iq = (q2 - q1) * 3
  if1 = q1 - iq
  if2 = q2 + iq
  return [if1, if2]
end

This is not the best way, but it is an easy way:
Do several queries. First, count the number of scores:
q = Post.where( :source => 1 ).count
then you do your calculations
then you fetch the scores
q1 = Post.where( :source => 1 ).
  reverse_order(:score).
  select("avg(score) as score").
  offset(q).limit((q%2)+1)
q2 = Post.where( :source => 1 ).
  reverse_order(:score).
  select("avg(score) as score").
  offset(q*3).limit((q%2)+1)
The code is probably wrong but I'm sure you get the idea.
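The idea of counting first and then fetching only the rows at the quartile positions can be sketched as follows. This is an illustration in Python with the built-in sqlite3 module and an in-memory database standing in for the real one; the `posts` table with `source` and `score` columns follows the question, while the function names and sample data are made up. For brevity it picks a single row per quartile instead of averaging two neighbours for even counts, and it keeps the multiplier of 3 from the question's innerFence.

```python
import sqlite3


def score_at(conn, source, index):
    """Fetch the score at a given 0-based position in ascending order."""
    row = conn.execute(
        "SELECT score FROM posts WHERE source = ? "
        "ORDER BY score LIMIT 1 OFFSET ?",
        (source, index),
    ).fetchone()
    return row[0]


def fences(conn, source, k=3):
    """Return (lower_fence, upper_fence) without loading all scores into memory."""
    n = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE source = ?", (source,)
    ).fetchone()[0]
    q = n // 4
    lower = score_at(conn, source, q)      # simplified lower quartile
    upper = score_at(conn, source, 3 * q)  # simplified upper quartile
    spread = (upper - lower) * k
    return lower - spread, upper + spread


# Sample data: scores 1..100 for source 1 (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (source INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO posts VALUES (1, ?)", [(float(s),) for s in range(1, 101)]
)

low, high = fences(conn, 1)
print(low, high)
```

Only three small queries reach the application, so memory use stays flat as the table grows; an index on (source, score) would likely help the database serve the ordered offsets.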

For large datasets, I sometimes drop down below ActiveRecord. It's a memory hog even, I imagine, when using pluck. Of course it's less portable, but sometimes it's worth it.
scores = Post.connection.execute('select score from posts where score > 1 order by score').map(&:first)
Don't know if that will help enough for 4 million records. If not, maybe look at a stored procedure?
