argument after * must be an iterable, not int - function

I have a small script that takes in a list and a value that represents what size sublists to break the list into:
def chunk(alist, n):
i = 0
j = n
while j < (len(alist) + 2):
sub = alist[i:j]
i += n
j += n
print(sub)
chunk([1, 2, 3, 4, 5], 2)
This works. I get [1, 2] [3, 4] [5]
But if I try to return sub from the function and print it in a loop it fails with: argument after * must be an iterable, not int
def chunks(alist, n):
i = 0
j = n
while j < (len(alist) + 2):
sub = alist[i:j]
i += n
j += n
return sub
for chunk in chunks([1, 2, 3, 4, 5], 2):
print(*chunk)
sub is an iterable (a list). Not sure what I am doing wrong.
NOTE cannot use itertools or any other add ons.

Your sub variable holds only a slice of the list. It should instead be appended to a list of lists as the returning value of the chunks function:
def chunks(alist, n):
i = 0
j = n
output = []
while j < (len(alist) + 2):
output.append(alist[i:j])
i += n
j += n
return output
for chunk in chunks([1, 2, 3, 4, 5], 2):
print(*chunk)
This outputs:
1 2
3 4
5

Related

lambda function and sets

I need to remove all elements from set that satisfy the lambda function, is there a way to make it continue when the fisrt part of the statement returns error?
f = lambda x : x < "t" or x % 2 == 0
s = {"a", 2, "u", 4, 3}
def sort(fun, s):
l = []
for element in s:
try:
if (fun(element)) == True:
l.append(element)
except:
pass
for element in l:
s.remove(element)
my code returns:
s = {2, 3, 4, 'u'}
but I need it to return:
s = { 3, 'u'}
Maybe you can adjust a lambda function to check the type of the element.
Also, to remove values from set use set-comprehension:
f = lambda x: (isinstance(x, str) and x < "t") or (
isinstance(x, int) and x % 2 == 0
)
s = {"a", 2, "u", 4, 3}
s = {v for v in s if not f(v)}
print(s)
Prints:
{'u', 3}

Julia CUDA - Reduce matrix columns

Consider the following kernel, which reduces along the rows of a 2-D matrix
function row_sum!(x, ncol, out)
"""out = sum(x, dims=2)"""
row_idx = (blockIdx().x-1) * blockDim().x + threadIdx().x
for i = 1:ncol
#inbounds out[row_idx] += x[row_idx, i]
end
return
end
N = 1024
x = CUDA.rand(Float64, N, 2*N)
out = CUDA.zeros(Float64, N)
#cuda threads=256 blocks=4 row_sum!(x, size(x)[2], out)
isapprox(out, sum(x, dims=2)) # true
How do I write a similar kernel except for reducing along the columns (of a 2-D matrix)? In particular, how do I get the index of each column, similar to how we got the index of each row with row_idx?
Here is the code:
function col_sum!(x, nrow, out)
"""out = sum(x, dims=1)"""
col_idx = (blockIdx().x-1) * blockDim().x + threadIdx().x
for i = 1:nrow
#inbounds out[col_idx] += x[i, col_idx]
end
return
end
N = 1024
x = CUDA.rand(Float64, N, 2N)
out = CUDA.zeros(Float64, 2N)
#cuda threads=256 blocks=8 col_sum!(x, size(x, 1), out)
And here is the test:
julia> isapprox(out, vec(sum(x, dims=1)))
true
As you can see the size of the result vector is now 2N instead of N, hence we had to adapt the number of blocks accordingly (that is multiply by 2 and now we have 8 instead of 4)
More materials can be found here: https://juliagpu.gitlab.io/CUDA.jl/tutorials/introduction/

How to plot Iterations in Julia

I coded a function picircle() that estimates pi.
Now I would like to plot this function for N values.
function Plotpi()
p = 100 # precision of π
N = 5
for i in 1:N
picircle(p)
end
end
3.2238805970149254
3.044776119402985
3.1641791044776117
3.1243781094527363
3.084577114427861
Now I am not sure how to plot the function, I tried plot(PP()) but it didn't work
Here I defined picircle:
function picircle(n)
n = n
L = 2n+1
x = range(-1, 1, length=L)
y = rand(L)
center = (0,0)
radius = 1
n_in_circle = 0
for i in 1:L
if norm((x[i], y[i]) .- center) < radius
n_in_circle += 1
end
end
println(4 * n_in_circle / L)
end
Your problem is that your functions don't actually return anything:
julia> x = Plotpi()
3.263681592039801
3.0646766169154227
2.845771144278607
3.18407960199005
3.044776119402985
julia> x
julia> typeof(x)
Nothing
The numbers you see are just printed to the REPL, and print doesn't return any value:
julia> x = print(5)
5
julia> typeof(x)
Nothing
So you probably just want to change your function so that it returns what you want to plot:
julia> function picircle(n)
n = n
L = 2n+1
x = range(-1, 1, length=L)
y = rand(L)
center = (0,0)
radius = 1
n_in_circle = 0
for i in 1:L
if norm((x[i], y[i]) .- center) < radius
n_in_circle += 1
end
end
4 * n_in_circle / L
end
Then:
julia> x = picircle(100)
3.263681592039801
julia> x
3.263681592039801
So now the value of the function is actually returned (rather than just printed to the console). You don't really need a separate function if you just want to do this multiple times and plot the results, a comprehension will do. Here's an example comparing the variability of the estimate with 100 draws vs 50 draws:
julia> using Plots
julia> histogram([picircle(100) for _ ∈ 1:1_000], label = "100 draws", alpha = 0.5)
julia> histogram!([picircle(20) for _ ∈ 1:1_000], label = "20 draws", alpha = 0.5)

kmer counts with cython implementation

I have this function implemented in Cython:
def count_kmers_cython(str string, list alphabet, int kmin, int kmax):
"""
Count occurrence of kmers in a given string.
"""
counter = {}
cdef int i
cdef int j
cdef int N = len(string)
limits = range(kmin, kmax + 1)
for i in range(0, N - kmax + 1):
for j in limits:
kmer = string[i:i+j]
counter[kmer] = counter.get(kmer, 0) + 1
return counter
Can I do better with cython? Or Can I have any away to improve it?
I am new to cython, that is my first attempt.
I will use this to count kmers in DNA with alphabet restrict to 'ACGT'. The length of the general input string is the average bacterial genomes (130 kb to over 14 Mb, where each 1 kb = 1000 bp).
The size of the kmers will be 3 < kmer < 16.
I wish to know if I could go further and maybe use cython in this function to:
def compute_kmer_stats(kmer_list, counts, len_genome, max_e):
"""
This function computes the z_score to find under/over represented kmers
according to a cut off e-value.
Inputs:
kmer_list - a list of kmers
counts - a dictionary-type with k-mers as keys and counts as values.
len_genome - the total length of the sequence(s).
max_e - cut off e-values to report under/over represented kmers.
Outputs:
results - a list of lists as [k-mer, observed count, expected count, z-score, e-value]
"""
print(colored('Starting to compute the kmer statistics...\n',
'red',
attrs=['bold']))
results = []
# number of tests, used to convert p-value to e-value.
n = len(list(kmer_list))
for kmer in kmer_list:
k = len(kmer)
prefix, sufix, center = counts[kmer[:-1]], counts[kmer[1:]], counts[kmer[1:-1]]
# avoid zero division error
if center == 0:
expected = 0
else:
expected = (prefix * sufix) // center
observed = counts[kmer]
sigma = math.sqrt(expected * (1 - expected / (len_genome - k + 1)))
# avoid zero division error
if sigma == 0.0:
z_score = 0.0
else:
z_score = ((observed - expected) / sigma)
# pvalue for all kmers/palindromes under represented
p_value_under = (math.erfc(-z_score / math.sqrt(2)) / 2)
# pvalue for all kmers/palindromes over represented
p_value_over = (math.erfc(z_score / math.sqrt(2)) / 2)
# evalue for all kmers/palindromes under represented
e_value_under = (n * p_value_under)
# evalue for all kmers/palindromes over represented
e_value_over = (n * p_value_over)
if e_value_under <= max_e:
results.append([kmer, observed, expected, z_score, p_value_under, e_value_under])
elif e_value_over <= max_e:
results.append([kmer, observed, expected, z_score, p_value_over, e_value_over])
return results
OBS - Thank you CodeSurgeon by the help. I know there are other tools to count kmer efficiently but I am learning Python so I am trying to write my own functions and code.

Tweaking a Function in Python

I am trying to get the following code to do a few more tricks:
class App(Frame):
def __init__(self, master):
Frame.__init__(self, master)
self.grid()
self.create_widgets()
def create_widgets(self):
self.answerLabel = Label(self, text="Output List:")
self.answerLabel.grid(row=2, column=1, sticky=W)
def psiFunction(self):
j = int(self.indexEntry.get())
valueList = list(self.listEntry.get())
x = map(int, valueList)
if x[0] != 0:
x.insert(0, 0)
rtn = []
for n2 in range(0, len(x) * j - 2):
n = n2 / j
r = n2 - n * j
rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
self.answer = Label(self, text=rtn)
self.answer.grid(row=2, column=2, sticky=W)
if __name__ == "__main__":
root = Tk()
In particular, I am trying to get it to calculate len(x) * j - 1 terms, and to work for a variety of parameter values. If you try running it you should find that you get errors for larger parameter values. For example with a list 0,1,2,3,4 and a parameter j=3 we should run through the program and get 0123456789101112. However, I get an error that the last value is 'out of range' if I try to compute it.
I believe it's an issue with my function as defined. It seems the issue with parameters has something to do with the way it ties the parameter to the n value. Consider 0123. It works great if I use 2 as my parameter (called index in the function) but fails if I use 3.
EDIT:
def psi_j(x, j):
rtn = []
for n2 in range(0, len(x) * j - 2):
n = n2 / j
r = n2 - n * j
if r == 0:
rtn.append(j * x[n])
else:
rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
print 'n2 =', n2, ': n =', n, ' r =' , r, ' rtn =', rtn
return rtn
For example if we have psi_j(x,2) with x = [0,1,2,3,4] we will be able to get [0,1,2,3,4,5,6,7,8,9,10,11] with an error on 12.
The idea though is that we should be able to calculate that last term. It is the 12th term of our output sequence, and 12 = 3*4+0 => 3*x[4] + 0*(x[n+1]-x[n]). Now, there is no 5th term to calculate so that's definitely an issue but we do not need that term since the second part of the equation is zero. Is there a way to write this into the equation?
If we think about the example data [0, 1, 2, 3] and a j of 3, the problem is that we're trying to get x[4]` in the last iteration.
len(x) * j - 2 for this data is 10
range(0, 10) is 0 through 9.
Manually processing our last iteration, allows us to resolve the code to this.
n = 3 # or 9 / 3
r = 0 # or 9 - 3 * 3
rtn.append(3 * x[3] + 0 * (x[3 + 1] - x[3]))
We have code trying to reach x[3 + 1], which doesn't exist when we only have indices 0 through 3.
To fix this, we could rewrite the code like this.
n = n2 / j
r = n2 - n * j
if r == 0:
rtn.append(j * x[n])
else:
rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
If r is 0, then (x[n + 1] - x[n]) is irrelevant.
Please correct me if my math is wrong on that. I can't see a case where n >= len(x) and r != 0, but if that's possible, then my solution is invalid.
Without understanding that the purpose of the function is (is it a kind of filter? or smoothing function?), I prickled it out of the GUI suff and tested it alone:
def psiFunction(j, valueList):
x = map(int, valueList)
if x[0] != 0:
x.insert(0, 0)
rtn = []
for n2 in range(0, len(x) * j - 2):
n = n2 / j
r = n2 - n * j
print "n =", n, "max_n2 =", len(x) * j - 2, "n2 =", n2, "lx =", len(x), "r =", r
val = j * x[n] + r * (x[n + 1] - x[n])
rtn.append(val)
print j * x[n], r * (x[n + 1] - x[n]), val
return rtn
if __name__ == '__main__':
print psiFunction(3, [0, 1, 2, 3, 4])
Calling this module leads to some debugging output and, at the end, the mentionned error message.
Obviously, your x[n + 1] access fails, as n is 4 there, so n + 1 is 5, one too much for accessing the x array, which has length 5 and thus indexes from 0 to 4.
EDIT: Your psi_j() gives me the same behaviour.
Let me continue guessing: Whatever we want to do, we have to ensure that n + 1 stays below len(x). So maybe a
for n2 in range(0, (len(x) - 1) * j):
would be helpful. It only produces the numbers 0..11, but I think this is the only thing which can be expected out of it: the last items only can be
3*3 + 0*(4-3)
3*3 + 1*(4-3)
3*3 + 2*(4-3)
and stop. And this is achieved with the limit I mention here.