Finding prime numbers up till a number - prime-factoring

I am trying to list all the prime numbers up to a specific number, e.g. 1000. The code gets slower as the number increases. I am pretty sure it is because of the for loop where (number - 1) is checked against every entry in prime_list. I need some advice on how to decrease the processing time of the code for larger numbers. Thanks.
import time
t0 = time.time()
prime_list = [2]
number = 0
is_not_prime = 0
count = 0
while number < 1000:
    print(number)
    for i in range (2,number):
        count = 0
        if (number%i) == 0:
            is_not_prime = 1
        if is_not_prime == 1:
            for j in range (0,len(prime_list)):
                if(number-1)%prime_list[j] != 0:
                    count += 1
            if count == len(prime_list):
                prime_list.append(number-1)
                is_not_prime = 0
                count = 0
                break
    number += 1
print(prime_list)
t1 = time.time()
total = t1-t0
print(total)

Your solution, on top of being confusing, is very inefficient - O(n^3). Please, use the Sieve of Eratosthenes. Also, learn how to use booleans.
Something like this (not optimal, just a mock-up). Essentially, you start with a list of all numbers from 2 to 1000 (1 is not prime, so it is left out). Then, for each number still in the list, you remove every later number that is a multiple of it.
amount = 1000
numbers = list(range(2, amount + 1))  # a real list, so that pop() is available
i = 0
while i < len(numbers):
    n = i + 1
    while n < len(numbers):
        if numbers[n] % numbers[i] == 0:
            numbers.pop(n)  # a multiple of numbers[i], so not prime
        else:
            n += 1
    i += 1
print(numbers)
Finally, I was able to answer because your question isn't language-specific, but please tag the question with the language you're using in the example.
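The mock-up above filters by repeated division, which is not quite the classic sieve. A minimal sketch of the Sieve of Eratosthenes itself, assuming Python 3, could look like the following. The function name `primes_up_to` is made up for illustration. Instead of testing divisibility, it crosses off multiples in a boolean table, which is where the speed-up comes from.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[n] stays True until n is crossed off as a multiple of a smaller prime.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Multiples below p * p were already crossed off by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Note the use of booleans for the flags, rather than 0/1 integers as in the question.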

Related

How to Convert base 10 to base 2 and find the base 10 integer that denotes most consecutive 1s?

I have been trying to convert a base 10 integer to base 2 and print the base 10 integer that denotes the maximum number of consecutive 1s in its binary representation.
if __name__ == '__main__':
    n = int(input().strip())
    outcomes = 0
    biggest = 0
    while n > 0:
        if n % 2 == 1:
            outcomes += 1
            if outcomes > biggest:
                biggest = outcomes
        else:
            result = 0
    print(biggest)
This is my code; please point out where I am going wrong.
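Two things stand out in the loop above: `n` is never divided by 2, so the loop cannot terminate for any positive input, and the `else` branch resets `result` rather than the running counter `outcomes`. A possible corrected sketch follows. The function name `max_consecutive_ones` is an assumption for illustration, and it takes `n` as an argument instead of reading it from `input()`.

```python
def max_consecutive_ones(n):
    """Length of the longest run of 1 bits in the binary form of n (n >= 0)."""
    longest = 0
    current = 0
    while n > 0:
        if n % 2 == 1:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a 0 bit ends the current run
        n //= 2  # drop the lowest bit; without this the loop never ends
    return longest


print(max_consecutive_ones(13))  # 13 is 1101 in binary, so this prints 2
```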

kmer counts with cython implementation

I have this function implemented in Cython:
def count_kmers_cython(str string, list alphabet, int kmin, int kmax):
    """
    Count occurrence of kmers in a given string.
    """
    counter = {}
    cdef int i
    cdef int j
    cdef int N = len(string)
    limits = range(kmin, kmax + 1)
    for i in range(0, N - kmax + 1):
        for j in limits:
            kmer = string[i:i+j]
            counter[kmer] = counter.get(kmer, 0) + 1
    return counter
Can I do better with Cython? Or is there any other way to improve it?
I am new to Cython; this is my first attempt.
I will use this to count k-mers in DNA, with the alphabet restricted to 'ACGT'. The input strings are typically the length of bacterial genomes (130 kb to over 14 Mb, where 1 kb = 1000 bp).
The k-mer sizes will be 3 < k < 16.
I would also like to know whether I could go further and use Cython in this function:
import math
from termcolor import colored  # assumption: colored() comes from termcolor; the import is not shown in the question

def compute_kmer_stats(kmer_list, counts, len_genome, max_e):
    """
    This function computes the z_score to find under/over represented kmers
    according to a cut off e-value.
    Inputs:
        kmer_list - a list of kmers
        counts - a dictionary-type with k-mers as keys and counts as values.
        len_genome - the total length of the sequence(s).
        max_e - cut off e-values to report under/over represented kmers.
    Outputs:
        results - a list of lists as [k-mer, observed count, expected count, z-score, p-value, e-value]
    """
    print(colored('Starting to compute the kmer statistics...\n',
                  'red',
                  attrs=['bold']))
    results = []
    # number of tests, used to convert p-value to e-value.
    n = len(list(kmer_list))
    for kmer in kmer_list:
        k = len(kmer)
        prefix, sufix, center = counts[kmer[:-1]], counts[kmer[1:]], counts[kmer[1:-1]]
        # avoid zero division error
        if center == 0:
            expected = 0
        else:
            expected = (prefix * sufix) // center
        observed = counts[kmer]
        sigma = math.sqrt(expected * (1 - expected / (len_genome - k + 1)))
        # avoid zero division error
        if sigma == 0.0:
            z_score = 0.0
        else:
            z_score = ((observed - expected) / sigma)
        # pvalue for all kmers/palindromes under represented
        p_value_under = (math.erfc(-z_score / math.sqrt(2)) / 2)
        # pvalue for all kmers/palindromes over represented
        p_value_over = (math.erfc(z_score / math.sqrt(2)) / 2)
        # evalue for all kmers/palindromes under represented
        e_value_under = (n * p_value_under)
        # evalue for all kmers/palindromes over represented
        e_value_over = (n * p_value_over)
        if e_value_under <= max_e:
            results.append([kmer, observed, expected, z_score, p_value_under, e_value_under])
        elif e_value_over <= max_e:
            results.append([kmer, observed, expected, z_score, p_value_over, e_value_over])
    return results
P.S. Thank you, CodeSurgeon, for the help. I know there are other tools that count k-mers efficiently, but I am learning Python, so I am trying to write my own functions and code.
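As a point of comparison for the Cython version, here is a plain-Python sketch of the same counting built on `collections.Counter`. The function name `count_kmers` is hypothetical, and no timing is claimed. One behavioural difference is intentional: the loop bound depends on each `k`, so k-mers shorter than `kmax` near the end of the string are also counted, whereas the Cython loop above stops at `N - kmax`.

```python
from collections import Counter


def count_kmers(sequence, kmin, kmax):
    """Count every k-mer of length kmin..kmax (inclusive) in sequence."""
    counts = Counter()
    for k in range(kmin, kmax + 1):
        # The last valid start index depends on k, not on kmax.
        for i in range(len(sequence) - k + 1):
            counts[sequence[i:i + k]] += 1
    return counts


counts = count_kmers("ACGTAC", 2, 3)
print(counts["AC"])   # "AC" occurs at positions 0 and 4
print(counts["GGG"])  # a Counter returns 0 for k-mers that never occur
```

Because a `Counter` returns 0 for missing keys, lookups such as `counts[kmer[1:-1]]` in `compute_kmer_stats` would not raise a `KeyError` for unseen k-mers.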

How to calculate a probability vector and an observation count vector for a range of bins?

I want to test the hypothesis that a set of 30 observations fits a Poisson distribution.
#GNU Octave
X = [8 0 0 1 3 4 0 2 12 5 1 8 0 2 0 1 9 3 4 5 3 3 4 7 4 0 1 2 1 2]; #30 observations
bins = {0, 1, [2:3], [4:5], [6:20]}; #each bin can be single value or multiple values
I am trying to use Pearson's chi-square statistic here and have coded the function below. I want one vector holding the Poisson probability for each bin and another holding the observation count for each bin. I feel the loop is rather redundant and ugly. Can you please let me know how I can refactor the function without the loop and make the whole calculation cleaner and more vectorized?
function result= poissonGoodnessOfFit(bins, observed)
  assert(iscell(bins), "bins should be a cell array");
  assert(all(cellfun("ismatrix", bins)) == 1, "bin entries either scalars or matrices");
  assert(ismatrix(observed) && rows(observed) == 1, "observed data should be a 1xn matrix");
  lambda_head = mean(observed); #poisson lambda parameter estimate
  k = length(bins); #number of bin groups
  n = length(observed); #number of observations
  poisson_probability = []; #variable for poisson probability for each bin
  observations = []; #variable for observation counts for each bin
  for i=1:k
    if isscalar(bins{1,i}) #this bin contains a single value
      poisson_probability(1,i) = poisspdf(bins{1, i}, lambda_head);
      observations(1, i) = histc(observed, bins{1, i});
    else #this bin contains a range of values
      inner_bins = bins{1, i}; #retrieve the range
      inner_bins_k = length(inner_bins); #number of values inside
      inner_poisson_probability = []; #variable to store individual probability of each value inside this bin
      inner_observations = []; #variable to store observation counts of each value inside this bin
      for j=1:inner_bins_k
        inner_poisson_probability(1,j) = poisspdf(inner_bins(1, j), lambda_head);
        inner_observations(1, j) = histc(observed, inner_bins(1, j));
      endfor
      poisson_probability(1, i) = sum(inner_poisson_probability, 2); #assign over the sum of all inner probabilities
      observations(1, i) = sum(inner_observations, 2); #assign over the sum of all inner observation counts
    endif
  endfor
  expected = n .* poisson_probability; #expected observations if indeed poisson using lambda_head
  chisq = sum((observations - expected).^2 ./ expected, 2); #Pearson Chi-Square statistics
  pvalue = 1 - chi2cdf(chisq, k-1-1);
  result = struct("actual", observations, "expected", expected, "chi2", chisq, "pvalue", pvalue);
  return;
endfunction
There's a couple of things worth noting in the code.
First, the 'scalar' case in your if block is actually identical to your 'range' case, since a scalar is simply a range of 1 element. So no special treatment is needed for it.
Second, you don't need to create such explicit subranges; your bin groups seem to be amenable to being used as indices into a larger result (as long as you add 1 to convert from 0-indexed to 1-indexed indices).
Therefore my approach would be to calculate the expected and observed numbers over the entire domain of interest (as inferred from your bin groups), and then use the bin groups themselves as 1-indices to obtain the desired subgroups, summing accordingly.
Here's some example code, written in the octave/matlab compatible subset of both languages:
function Result = poissonGoodnessOfFit( BinGroups, Observations )
  % POISSONGOODNESSOFFIT( BinGroups, Observations) calculates the [... etc, etc.]
  pkg load statistics; % only needed in octave; for matlab buy statistics toolbox.
  assert( iscell( BinGroups ), 'Bins should be a cell array' );
  assert( all( cellfun( @ismatrix, BinGroups ) ) == 1, 'Bin entries either scalars or matrices' );
  assert( ismatrix( Observations ) && rows( Observations ) == 1, 'Observed data should be a 1xn matrix' );
  % Define helpful variables
  RangeMin = min( cellfun( @min, BinGroups ) );
  RangeMax = max( cellfun( @max, BinGroups ) );
  Domain = RangeMin : RangeMax;
  LambdaEstimate = mean( Observations );
  NBinGroups = length( BinGroups );
  NObservations = length( Observations );
  % Get expected and observed numbers per 'bin' (i.e. discrete value) over the *entire* domain.
  Expected_Domain = NObservations * poisspdf( Domain, LambdaEstimate );
  Observed_Domain = histc( Observations, Domain );
  % Apply BinGroup values as indices
  Expected_byBinGroup = cellfun( @(c) sum( Expected_Domain(c+1) ), BinGroups );
  Observed_byBinGroup = cellfun( @(c) sum( Observed_Domain(c+1) ), BinGroups );
  % Perform a Chi-Square test on the Bin-wise Expected and Observed outputs
  O = Observed_byBinGroup; E = Expected_byBinGroup ; df = NBinGroups - 1 - 1;
  ChiSquareTestStatistic = sum( (O - E) .^ 2 ./ E );
  PValue = 1 - chi2cdf( ChiSquareTestStatistic, df );
  Result = struct( 'actual', O, 'expected', E, 'chi2', ChiSquareTestStatistic, 'pvalue', PValue );
end
Running with your example gives:
X = [8 0 0 1 3 4 0 2 12 5 1 8 0 2 0 1 9 3 4 5 3 3 4 7 4 0 1 2 1 2]; % 30 observations
bins = {0, 1, [2:3], [4:5], [6:20]}; % each bin can be single value or multiple values
Result = poissonGoodnessOfFit( bins, X )
% Result =
% scalar structure containing the fields:
% actual = 6 5 8 6 5
% expected = 1.2643 4.0037 13.0304 8.6522 3.0493
% chi2 = 21.989
% pvalue = 0.000065574
A general comment about the code: it is always preferable to write self-explanatory code, rather than code that does not make sense by itself in the absence of a comment. Comments should generally be used only to explain the 'why', rather than the 'how'.
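For readers who want to check the numbers outside Octave, the same "sum per bin group" idea can be sketched in plain Python with only the standard library. The names `poisson_pmf` and `binned_chi_square` are made up for this illustration. The p-value is left out because the standard library has no chi-square CDF; a statistics package would be needed for that step.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binned_chi_square(bin_groups, observations):
    """Return (observed, expected, chi2) with one entry per bin group."""
    n = len(observations)
    lam = sum(observations) / n  # estimate of the Poisson rate
    tally = Counter(observations)  # count of each distinct observed value
    # A scalar bin is just a group of one value, so no special case is needed.
    observed = [sum(tally[v] for v in group) for group in bin_groups]
    expected = [n * sum(poisson_pmf(v, lam) for v in group) for group in bin_groups]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, chi2


X = [8, 0, 0, 1, 3, 4, 0, 2, 12, 5, 1, 8, 0, 2, 0,
     1, 9, 3, 4, 5, 3, 3, 4, 7, 4, 0, 1, 2, 1, 2]
bins = [[0], [1], [2, 3], [4, 5], list(range(6, 21))]
observed, expected, chi2 = binned_chi_square(bins, X)
print(observed)
print([round(e, 4) for e in expected])
print(round(chi2, 3))
```

With the example data, the observed counts and the chi-square statistic should agree with the Octave output shown above.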

Writing Fibonacci Sequence Elegantly Python

I am trying to improve my programming skills by writing functions in multiple ways; this teaches me new ways of writing code and helps me understand other people's coding styles. Below is a function that calculates the sum of all even numbers in the Fibonacci sequence up to a maximum value. Do you have any recommendations on writing this algorithm differently, maybe more compactly or more Pythonically?
def calcFibonacciSumOfEvenOnly():
    MAX_VALUE = 4000000
    sumOfEven = 0
    prev = 1
    curr = 2
    while curr <= MAX_VALUE:
        if curr % 2 == 0:
            sumOfEven += curr
        temp = curr
        curr += prev
        prev = temp
    return sumOfEven
I do not want to write this function recursively since I know it takes up a lot of memory even though it is quite simple to write.
You can use a generator to produce even numbers of a fibonacci sequence up to the given max value, and then obtain the sum of the generated numbers:
def even_fibs_up_to(m):
    a, b = 0, 1
    while a <= m:
        if a % 2 == 0:
            yield a
        a, b = b, a + b
So that:
print(sum(even_fibs_up_to(50)))
would output: 44 (0 + 2 + 8 + 34 = 44)
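Another option, offered as a sketch rather than as the answer's method, skips the odd terms altogether. Every third Fibonacci number is even, and the even terms satisfy E(n) = 4 * E(n-1) + E(n-2), for example 34 = 4 * 8 + 2. The function name `sum_even_fibs` is hypothetical.

```python
def sum_even_fibs(limit):
    """Sum the even Fibonacci numbers that do not exceed limit."""
    total = 0
    a, b = 0, 2  # two consecutive even Fibonacci numbers
    while a <= limit:
        total += a
        a, b = b, 4 * b + a  # step straight to the next even term
    return total


print(sum_even_fibs(50))         # 44
print(sum_even_fibs(4_000_000))  # 4613732
```

This visits only the even terms, so there is no parity check inside the loop.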

AS3 convert a positive number to 1 and a negative number to -1

There is a simple trick to convert a number to 1 or -1.
Just raise it to the power of 0.
So:
4^0 = 1
-4^0 = -1
However, in AS3:
Math.pow( 4, 0); // = 1
Math.pow(-4, 0); // = 1
Is there a way to get the right answer without an if else?
This could be done bitwise.
Given the number n (avg time: 0.0065ms):
1 + 2 * (n >> 31);
Or slightly slower (avg time: 0.0095ms):
(n < 0 && -1) || 1;
However, Marty's solution is the fastest (avg time: 0.0055ms)
n < 0 ? -1 : 1;
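To see why the shift works: for a 32-bit signed integer, `n >> 31` is an arithmetic shift that copies the sign bit into every position, giving -1 for negative values and 0 otherwise, so `1 + 2 * (n >> 31)` yields -1 or 1. A small Python sketch of the same arithmetic follows. The function names are made up, and since Python integers are unbounded, the shift version is only meaningful under the stated assumption that `n` fits in 32 bits.

```python
def sign_by_shift(n):
    """Return -1 for negative n, else 1. Assumes -2**31 <= n < 2**31."""
    # n >> 31 is -1 for negative n in this range and 0 otherwise.
    return 1 + 2 * (n >> 31)


def sign_by_comparison(n):
    """The conditional-expression equivalent of the ternary version."""
    return -1 if n < 0 else 1


for value in (4, -4, 0):
    print(value, sign_by_shift(value), sign_by_comparison(value))
```

Both versions treat 0 as positive, which matches the AS3 expressions above.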
Not sure if without an if/else includes the ternary operator in your eyes, but if not:
// Where x is your input.
var r:int = x < 0 ? -1 : 1;
Will be more efficient than Math.pow() anyway.