Why does changing a kernel parameter deplete my resources? - cuda

I made a very simple kernel below to practice CUDA.
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.compiler import SourceModule
from pycuda import gpuarray
import cv2
def compile_kernel(kernel_code, kernel_name):
    mod = SourceModule(kernel_code)
    func = mod.get_function(kernel_name)
    return func
input_file = np.array(cv2.imread('clouds.jpg'))
height, width, channels = np.int32(input_file.shape)
my_kernel_code = """
__global__ void my_kernel(int width, int height) {
// This kernel trivially does nothing! Hurray!
}
"""
kernel = compile_kernel(my_kernel_code, 'my_kernel')
if __name__ == '__main__':
    for i in range(0, 2):
        print 'o'
        kernel(width, height, block=(32, 32, 1), grid=(125, 71))
        # When I take this line away, the error goes bye bye.
        # What in the world?
        width -= 1
Right now, if we run the code above, execution proceeds through the first iteration of the for loop just fine. However, during the second iteration of the loop, I get the following error.
Traceback (most recent call last):
File "outOfResources.py", line 27, in <module>
kernel(width, height, block=(32, 32, 1), grid=(125, 71))
File "/software/linux/x86_64/epd-7.3-1-pycuda/lib/python2.7/site-packages/pycuda-2012.1-py2.7-linux-x86_64.egg/pycuda/driver.py", line 374, in function_call
func._launch_kernel(grid, block, arg_buf, shared, None)
pycuda._driver.LaunchError: cuLaunchKernel failed: launch out of resources
If I take away the line width -= 1, the error goes away. Why is that? Can't I change the parameter for a kernel the second time around? For reference, here is clouds.jpg.

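The error message isn't particularly informative, but the cause is the argument type: the kernel expects a 32-bit int, so width has to be passed in as a correctly cast np.int32. After width -= 1 the value is most likely no longer an np.int32 (in NumPy versions of that era, subtracting a Python int from an np.int32 scalar produced a 64-bit integer), so the second launch receives an argument of the wrong size. Something like:
width = np.int32(width - 1)
should work.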
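To make the cast hard to forget, it can be wrapped in a small helper. This is only a sketch: as_kernel_int is a hypothetical name, not part of PyCUDA, and the starting value of 1000 is made up so the snippet runs without a GPU.

```python
import numpy as np

def as_kernel_int(value):
    """Cast a Python or NumPy integer to the 4-byte type a CUDA `int` expects.

    Hypothetical helper for illustration; not part of PyCUDA.
    """
    return np.int32(value)

width = np.int32(1000)            # made-up starting value
width = as_kernel_int(width - 1)  # re-cast after every arithmetic update

# Check the type and byte size before handing the value to a kernel launch.
print(type(width).__name__, width.nbytes)
```

Checking nbytes before a launch is a cheap way to confirm the argument is still 4 bytes wide.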

Related

Problems with quad when using lambdify

I'm trying to solve these two integrals. I want to use a numerical approach because C_i will eventually become more complicated and I want one method that works for all cases. Currently, C_i is just a constant, yet quad is not able to solve it. I'm assuming that's because it is a Heaviside function and quad is having trouble finding a and b. Please correct me if I'm approaching this wrongly.
Equation 33
In [1]: import numpy as np
...: import scipy as sp
...: import sympy as smp
...: from sympy import DiracDelta
...: from sympy import Heaviside
In [2]: C_i = smp.Function('C_i')
In [3]: t, t0, x, v = smp.symbols('t, t0, x, v', positive=True)
In [4]: tot_l = 10
In [5]: C_fm = (1/tot_l)*v*smp.Integral(C_i(t0), (t0, (-x/v)+t, t))
In [6]: C_fm.doit()
Out[6]:
0.1*v*Integral(C_i(t0), (t0, t - x/v, t))
In [7]: C_fm.doit().simplify()
Out[7]:
0.1*v*Integral(C_i(t0), (t0, t - x/v, t))
In [8]: C_fms = C_fm.doit().simplify()
In [9]: t_arr = np.arange(0,1000,1)
In [10]: f_mean = smp.lambdify((x, v, t), C_fms, ['scipy', {'C_i': lambda e: 0.8}])
In [11]: try2 = f_mean(10, 0.1, t_arr)
Traceback (most recent call last):
File "/var/folders/rd/wzfh_5h110l121rmlxn61v440000gn/T/ipykernel_3164/3786931540.py", line 1, in <module>
try2 = f_mean(10, 0.1, t_arr)
File "<lambdifygenerated-1>", line 2, in _lambdifygenerated
return 0.1*v*quad(lambda t0: C_i(t0), t - x/v, t)[0]
File "/opt/anaconda3/lib/python3.9/site-packages/scipy/integrate/quadpack.py", line 348, in quad
flip, a, b = b < a, min(a, b), max(a, b)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Equation 34
In [12]: C_i = smp.Function('C_i')
In [13]: t, tao, x, v = smp.symbols('t, tao, x, v', positive=True)
In [14]: I2 = v*smp.Integral((C_i(t-tao))**2, (tao, 0, t))
In [15]: I2.doit()
Out[15]:
v*Integral(C_i(t - tao)**2, (tao, 0, t))
In [16]: I2.doit().simplify()
Out[16]:
v*Integral(C_i(t - tao)**2, (tao, 0, t))
In [17]: I2_s = I2.doit().simplify()
In [18]: tao_arr = np.arange(0,1000,1)
In [19]: I2_sf = smp.lambdify((v, tao), I2_s, ['scipy', {'C_i': lambda e: 0.8}])
In [20]: try2 = I2_sf(0.1, tao_arr)
Traceback (most recent call last):
File "/var/folders/rd/wzfh_5h110l121rmlxn61v440000gn/T/ipykernel_3164/4262383171.py", line 1, in <module>
try2 = I2_sf(0.1, tao_arr)
File "<lambdifygenerated-2>", line 2, in _lambdifygenerated
return v*quad(lambda tao: C_i(t - tao)**2, 0, t)[0]
File "/opt/anaconda3/lib/python3.9/site-packages/scipy/integrate/quadpack.py", line 351, in quad
retval = _quad(func, a, b, args, full_output, epsabs, epsrel, limit,
File "/opt/anaconda3/lib/python3.9/site-packages/scipy/integrate/quadpack.py", line 463, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
File "/opt/anaconda3/lib/python3.9/site-packages/sympy/core/expr.py", line 345, in __float__
raise TypeError("Cannot convert expression to float")
TypeError: Cannot convert expression to float
So you are passing an unevaluated Integral to lambdify, which in turn translates it to a call to scipy.integrate.quad.
Looks like the integrals can't be evaluated even with doit and simplify calls. Have you actually looked at C_fms and I2_s? That's one of the first things I'd do when running this code!
I've never looked at this approach. I have seen people lambdify the objective expression, and then try to use that in quad directly.
quad has specific requirements (check the docs!). The objective function must return a single number, and the bounds must also be numbers.
In the first error, you are passing the array t_arr as t, so both bounds, t - x/v and t, are arrays. quad then hits the usual ambiguity error when it checks whether one bound is bigger than the other. That's the b < a test. quad cannot use arrays as bounds.
I'm not sure why the second case avoids this problem; the bounds must be coming from somewhere else. But the error comes when quad calls the objective function and expects a float in return. Instead it gets a sympy expression, which sympy can't convert to float. My guess is that some variable in the expression is still a sympy Symbol. Note that t appears in the generated code (in C_i(t - tao) and as the upper bound) but isn't one of the lambdify arguments (v, tao).
In diagnosing lambdify problems, it's a good idea to look at the generated code. One way is with help on the function, help(I2_sf). But with that you need to be able to read and understand python, including any numpy and scipy functions. That's not always easy.
Have you tried to use sympy's own numeric integrator? Trying to combine sympy and numpy/scipy often has problems.
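Since quad needs scalar bounds, one way around the first error is to skip lambdify for the integral and call quad once per time value. The following is a minimal sketch under that assumption; c_i and c_fm are illustrative names, with c_i standing in for the constant C_i = 0.8 from the question.

```python
import numpy as np
from scipy.integrate import quad

def c_i(t0):
    # Stand-in for the question's constant C_i (assumption: 0.8 everywhere).
    return 0.8

def c_fm(x, v, t, tot_l=10):
    # quad gets two plain floats as bounds and a function returning a float.
    value, _abs_err = quad(c_i, t - x / v, t)
    return (v / tot_l) * value

t_arr = np.arange(0, 5, 1)
# Loop over the time values instead of passing the whole array as a bound.
results = np.array([c_fm(10, 0.1, t) for t in t_arr])
print(results)
```

The loop is slower than a vectorized call, but it keeps every quad call within what its documentation requires.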

Estimating mixture of Gaussian models in Pytorch

I actually want to estimate a normalizing flow with a mixture of gaussians as the base distribution, so I'm sort of stuck with torch. However you can reproduce my error in my code by just estimating a mixture of Gaussian model in torch. My code is below:
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets
import torch
from torch import nn
from torch import optim
import torch.distributions as D
num_layers = 8
weights = torch.ones(8,requires_grad=True).to(device)
means = torch.tensor(np.random.randn(8,2),requires_grad=True).to(device)#torch.randn(8,2,requires_grad=True).to(device)
stdevs = torch.tensor(np.abs(np.random.randn(8,2)),requires_grad=True).to(device)
mix = D.Categorical(weights)
comp = D.Independent(D.Normal(means,stdevs), 1)
gmm = D.MixtureSameFamily(mix, comp)
num_iter = 10001#30001
num_iter2 = 200001
loss_max1 = 100
for i in range(num_iter):
    x = torch.randn(5000,2)#this can be an arbitrary x samples
    loss2 = -gmm.log_prob(x).mean()#-densityflow.log_prob(inputs=x).mean()
    optimizer1.zero_grad()
    loss2.backward()
    optimizer1.step()
The error I get is:
0
8.089411823514835
Traceback (most recent call last):
File "/home/cameron/AnacondaProjects/gmm.py", line 183, in <module>
loss2.backward()
File "/home/cameron/anaconda3/envs/torch/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/cameron/anaconda3/envs/torch/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
As you can see, the model runs for one iteration before failing.
There is an ordering problem in your code. You create the Gaussian mixture model outside of the training loop, so when the loss is calculated the model tries to use the initial parameter values from when it was defined. But optimizer1.step() has already modified those values, so even if you set loss2.backward(retain_graph=True) there will still be an error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
The solution is simply to create a new Gaussian mixture model whenever you update the parameters. Example code that runs as expected:
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets
import torch
from torch import nn
from torch import optim
import torch.distributions as D
num_layers = 8
weights = torch.ones(8,requires_grad=True)
means = torch.tensor(np.random.randn(8,2),requires_grad=True)
stdevs = torch.tensor(np.abs(np.random.randn(8,2)),requires_grad=True)
parameters = [weights, means, stdevs]
optimizer1 = optim.SGD(parameters, lr=0.001, momentum=0.9)
num_iter = 10001
for i in range(num_iter):
    mix = D.Categorical(weights)
    comp = D.Independent(D.Normal(means,stdevs), 1)
    gmm = D.MixtureSameFamily(mix, comp)
    optimizer1.zero_grad()
    x = torch.randn(5000,2)#this can be an arbitrary x samples
    loss2 = -gmm.log_prob(x).mean()#-densityflow.log_prob(inputs=x).mean()
    loss2.backward()
    optimizer1.step()
    print(i, loss2)

too many values to unpack (expected 2) lda

I received the error "too many values to unpack (expected 2)" when running the code below. Can anyone help me? I added more details.
import gensim
import gensim.corpora as corpora
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(doc_term_matrix, num_topics=3, id2word = dictionary, passes=50, per_word_topics = True, eval_every = 1)
print(ldamodel.print_topics(num_topics=3, num_words=20))
for i in range (0,46):
    for index, score in sorted(ldamodel[doc_term_matrix[i]], key=lambda tup: -1*tup[1]):
        print("subject", i)
        print("\n")
        print("Score: {}\t \nTopic: {}".format(score, ldamodel.print_topic(index, 6)))
Focusing on the loop, since this is where the error is being raised. Let's take it one iteration at a time.
>>> import numpy as np # just so we can use np.shape()
>>> i = 0 # value in first loop
>>> x = sorted( ldamodel[doc_term_matrix[i]], key=lambda tup: -1*tup[1] )
>>> np.shape(x)
(3, 3, 2)
>>> for index, score in x:
...     pass
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
Here is where your error is coming from. You are expecting each element of the returned result to have 2 values, but it is a nested structure with no simple, inferable way to unpack it. I do not personally have enough experience with this subject to infer what you mean to be doing; I can only show you where your problem is coming from. Hope this helps!
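A likely cause is per_word_topics=True: with that flag, gensim returns three lists per document (topic distribution, word-to-topic assignments, and word-to-topic phi values) instead of a single list of (topic_id, probability) pairs. Assuming only the topic distribution is wanted, taking the first element before sorting should work. The sketch below uses made-up data in place of ldamodel[doc_term_matrix[i]] so it runs without gensim; doc_result and its values are hypothetical.

```python
# Hypothetical stand-in for ldamodel[doc_term_matrix[i]] with per_word_topics=True.
doc_result = (
    [(1, 0.2), (0, 0.7), (2, 0.1)],                 # (topic_id, probability)
    [(5, [0, 1]), (9, [0])],                        # (word_id, [topic_ids])
    [(5, [(0, 0.9), (1, 0.1)]), (9, [(0, 1.0)])],   # (word_id, [(topic_id, phi)])
)

# Only the first list holds (topic_id, probability) pairs.
topic_dist = doc_result[0]
ranked = sorted(topic_dist, key=lambda tup: -tup[1])

for index, score in ranked:
    print("Topic: {}\tScore: {}".format(index, score))
```

Alternatively, dropping per_word_topics=True when building the model makes ldamodel[bow] return just the topic distribution, so the original loop can unpack it directly.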

Calling a method of the Convolution class raises an error

Calling gdd.forward(x) raises an error, but why?
This code uses im2col to implement the convolution layer
Traceback (most recent call last):
File "E:/PycharmProjects/untitled2/kk.py", line 61, in <module>
gdd.forward(x)
File "E:/PycharmProjects/untitled2/kk.py", line 46, in forward
FN,C,FH,FW=self.W.shape
ValueError: not enough values to unpack (expected 4, got 2)
import numpy as np
class Convolution:
    # Convolution kernel size
    def __init__(self,W,b,stride=1,pad=0):
        self.W = W
        self.b = b
        self.stride = stride
        self.pad = pad
    def forward(self,x):
        FN,C,FH,FW=self.W.shape
        N,C,H,W = x.shape
        out_h = int(1+(H+ 2*self.pad - FH) / self.stride)
        out_w = int(1+(W + 2*self.pad -FW) / self.stride)
e = np.array([[2,0,1],[0,1,2],[1,0,2]])
x = np.array([[1,2,3,0],[0,1,2,3],[3,0,1,2],[2,3,0,1]])
gdd = Convolution(e,3,1,0)
gdd.forward(x)
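"not enough values to unpack" means that the right-hand side yields 2 values, but you are expecting 4. Here self.W.shape is (3, 3) because e is a 2-D array:
FN,C,FH,FW=self.W.shape
Either unpack only two names (the next line, N,C,H,W = x.shape, needs the same change because x is also 2-D), or pass 4-D arrays for W and x, and you are good to go :)
BTW I'm assuming you speak Chinese? I speak Chinese too, so feel free to ask in Chinese if anything is unclear.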
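If the 4-value unpacking is to stay, the arrays need four dimensions. Here is a minimal sketch, assuming a single filter, a single channel, and a single input sample, which reshapes the question's e and x and computes the output size. The names W4 and x4 are illustrative only.

```python
import numpy as np

e = np.array([[2, 0, 1], [0, 1, 2], [1, 0, 2]])
x = np.array([[1, 2, 3, 0], [0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1]])

# Assumption: one filter, one channel, one sample.
W4 = e.reshape(1, 1, 3, 3)   # (FN, C, FH, FW)
x4 = x.reshape(1, 1, 4, 4)   # (N, C, H, W)

FN, C, FH, FW = W4.shape     # now unpacks into 4 values
N, C, H, W = x4.shape

stride, pad = 1, 0
out_h = (H + 2 * pad - FH) // stride + 1
out_w = (W + 2 * pad - FW) // stride + 1
print(out_h, out_w)  # prints: 2 2
```

With a 4x4 input, a 3x3 filter, stride 1, and no padding, the output is 2x2.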

Pygame compiled cursor string not giving enough data

I'm trying to compile a cursor string provided by pygame and set the cursor to it. However, only 2 of the necessary 4 arguments are returned from the string compiler.
pygame.mouse.set_cursor(*pygame.cursors.broken_x)
cursor = pygame.cursors.compile(pygame.cursors.sizer_x_strings)
Results in:
Traceback (most recent call last):
File "main.py", line 17, in __init__
pygame.mouse.set_cursor(*cursor)
TypeError: function takes exactly 4 arguments (2 given)
The premade strings in pygame.cursors.* don't contain any metadata about the cursor, only the raw string. To effectively use them, you have to also provide the size (width in characters and height in lines) of the cursor string.
Here's an example that uses that premade cursor:
import sys
import pygame
from pygame.locals import *
pygame.init()
screen = pygame.display.set_mode((640, 480))
cursor, mask = pygame.cursors.compile(pygame.cursors.sizer_x_strings, "X", ".")
cursor_sizer = ((24, 16), (7, 11), cursor, mask)
pygame.mouse.set_cursor(*cursor_sizer)
while True:
    for event in pygame.event.get():
        if event.type == QUIT:
            pygame.quit()
            sys.exit()
    screen.fill((120, 120, 120))
    pygame.display.update()
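The (24, 16) size above was written by hand. As a rough sketch, the size can also be derived from the cursor strings themselves, since the width is the length of each row and the height is the number of rows. cursor_size and demo_strings are hypothetical names, and the helper assumes pygame's documented requirement that all rows have equal width and that the width is divisible by 8. It runs without pygame.

```python
def cursor_size(strings):
    """Return (width, height) for a tuple of cursor strings.

    Hypothetical helper; assumes equal-width rows and a width divisible by 8.
    """
    height = len(strings)
    width = len(strings[0])
    if any(len(row) != width for row in strings):
        raise ValueError("all cursor rows must have the same width")
    if width % 8 != 0:
        raise ValueError("cursor width must be divisible by 8")
    return (width, height)

# Made-up 8x2 cursor, only to exercise the helper.
demo_strings = (
    "X......X",
    ".X....X.",
)
print(cursor_size(demo_strings))
```

The hotspot, (7, 11) in the example above, still has to be chosen by hand because the strings don't encode it.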