How to convert an upper/lower gpuarray to the specific format required by cublasStbsv?

I am currently using pycuda and scikits.cuda to solve the linear equation A*x = b, where A is an upper/lower triangular matrix. However, the cublasStbsv routine requires a specific banded storage format.
To give an example: if a lower triangular matrix is A = [[1, 0, 0], [2, 3, 0], [4, 5, 6]], then the input required by cublasStbsv is [[1, 3, 6], [2, 5, 0], [4, 0, 0]], where the rows hold the main diagonal, the first subdiagonal, and the second subdiagonal, respectively. With numpy this can easily be done with stride_tricks.as_strided, but I don't know how to do something similar with pycuda.gpuarray. Any help would be appreciated, thanks. I found pycuda.compyte.array.as_strided, but it cannot be applied to a gpuarray.
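For reference, the CPU-side reshaping with numpy's stride tricks might look like the sketch below (my own illustration, not part of the question; the helper name lower_to_band_cpu and the zero padding are just for demonstration):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def lower_to_band_cpu(A):
    """Reorder a lower-triangular matrix (Fortran order) into the
    row-per-diagonal band layout expected by cublasStbsv."""
    dim = A.shape[0]
    # Pad the Fortran-ordered buffer so the strided view of the last
    # columns does not read past the end of the allocation.
    buf = np.concatenate([A.flatten(order='F'), np.zeros(dim, dtype=A.dtype)])
    itemsize = buf.itemsize
    # band[i, j] = A[i + j, j], which sits at flat index i + j * (dim + 1)
    band = as_strided(buf, shape=(dim, dim),
                      strides=(itemsize, itemsize * (dim + 1)))
    return np.ascontiguousarray(band)  # copy out of the padded view

A = np.array([[1, 0, 0],
              [2, 3, 0],
              [4, 5, 6]], dtype=np.float32, order='F')
print(lower_to_band_cpu(A))
# [[1. 3. 6.]
#  [2. 5. 0.]
#  [4. 0. 0.]]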

I got it done using theano: first convert the matrix to a CudaNdarray, change its strides, and copy the result back to a gpuarray. Just be careful about the switch between Fortran and C order.
Update:
I finally got it done using gpuarray.multi_take_put:
def make_triangle(s_matrix, uplo='L'):
    """Convert a triangular matrix to the specific format
    required by cublasStbsv; the matrix should be in Fortran order.
    s_matrix: gpuarray
    """
    # make sure the dtype is float32
    if s_matrix.dtype != 'f':
        s_matrix = s_matrix.astype('f')
    dim = s_matrix.shape[0]
    if uplo == 'L':
        idx_tuple = np.tril_indices(dim)
        gidx = gpuarray.to_gpu(idx_tuple[0] + idx_tuple[1] * dim)
        gdst = gpuarray.to_gpu(idx_tuple[0] + idx_tuple[1] * (dim - 1))
        return gpuarray.multi_take_put([s_matrix], gdst, gidx, (dim, dim))[0]
    else:
        idx_tuple = np.triu_indices(dim)
        gidx = gpuarray.to_gpu(idx_tuple[0] + idx_tuple[1] * dim)
        gdst = gpuarray.to_gpu(idx_tuple[0] + (idx_tuple[1] + 1) * (dim - 1))
        return gpuarray.multi_take_put([s_matrix], gdst, gidx, (dim, dim))[0]
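A quick sanity check (my own sketch, not from the original post; it assumes a CUDA context via pycuda.autoinit and uses an illustrative 4x4 matrix):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

A = np.asfortranarray(np.tril(np.random.rand(4, 4)).astype(np.float32))
A_gpu = gpuarray.to_gpu(A)
band_gpu = make_triangle(A_gpu, uplo='L')   # band storage ready for cublasStbsv
print(band_gpu.get())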

Related

Why does my segmentation output appear as a boundary convolutional filter instead of an output mask?

So I am building a neural network to look at medical CT images and try to create a segmentation mask of the heart within chest CT imaging. I've trained the neural network to a Dice score of around 80%. The inputs and masks are all standardized and appropriate, but the output of my model appears to be a mask that forms a silhouette of all soft tissue within the body, instead of just the heart alone. Does anyone know where I may be messing up?
Images (linked in the original post): an input data example, an input mask example, and the model.predict() output for an image that was used in training.
Model code:
# Assumed imports (not shown in the original post):
from tensorflow.keras import layers, Model

def dense_unet_256(inputs, filters, depth):
    # Model creation
    # Define kwargs dictionary
    kwargs = {
        'kernel_size': (1, 3, 3),
        'padding': 'same',
        'bias_initializer': 'zeros'
    }
    # Define lambda functions
    conv = lambda x, filters, strides: layers.Conv3D(filters=int(filters), strides=(1, strides, strides), **kwargs)(x)
    norm = lambda x: layers.BatchNormalization()(x)
    relu = lambda x: layers.LeakyReLU()(x)
    # Define stride-1 and stride-2 blocks
    conv1 = lambda filters, x: relu(norm(conv(x, filters, strides=1)))
    conv2 = lambda filters, x: relu(norm(conv(x, filters, strides=2)))
    # Define single transpose
    tran = lambda x, filters, strides: layers.Conv3DTranspose(filters=int(filters), strides=(1, strides, strides), **kwargs)(x)
    # Define transpose block
    tran2 = lambda filters, x: relu(norm(tran(x, filters, strides=2)))
    concat = lambda a, b: layers.Concatenate()([a, b])

    # Define dense block
    def dense_block(filters, input, DB_depth):
        ext = 2 + DB_depth
        outside_layer = input
        for _ in range(0, int(ext)):
            inside_layer = conv1(filters, outside_layer)
            outside_layer = concat(outside_layer, inside_layer)
        return outside_layer

    def td_block(conv1_filters, conv2_filters, input, DB_depth):
        TD = conv1(conv1_filters, conv2(conv2_filters, input))
        DB = dense_block(conv1_filters, TD, DB_depth)
        return DB

    def tu_block(conv1_filters, tran2_filters, input, td_input, DB_depth):
        TU = conv1(conv1_filters, tran2(tran2_filters, input))
        C = concat(TU, td_input)
        DB = dense_block(conv1_filters, C, DB_depth)
        return DB

    TD1 = td_block(filters * 1, filters * 1, inputs, depth)
    TD2 = td_block(filters * 1.5, filters * 1, TD1, 1 + depth)
    TD3 = td_block(filters * 2, filters * 1.5, TD2, 2 + depth)
    TD4 = td_block(filters * 2.5, filters * 2, TD3, 3 + depth)
    TD5 = td_block(filters * 3, filters * 2.5, TD4, 4 + depth)
    TU1 = tu_block(filters * 2.5, filters * 3, TD5, TD4, 4 + depth)
    TU2 = tu_block(filters * 2, filters * 2.5, TU1, TD3, 3 + depth)
    TU3 = tu_block(filters * 1.5, filters * 2, TU2, TD2, 2 + depth)
    TU4 = tu_block(filters * 1, filters * 1.5, TU3, TD1, 1 + depth)
    TU5 = tran2(filters * 1, TU4)
    logits = {}
    logits['lbl'] = layers.Conv3D(filters=2, name='lbl', **kwargs)(TU5)
    model = Model(inputs=inputs, outputs=logits['lbl'])
    return model
I've tried training with different loss and metric functions without much change, and I've confirmed that the images and masks going into training only identify the heart. I have done some hyperparameter tuning with epochs and learning rate, which with some tweaking has made the output mask of the heart a slightly different value compared to chest-wall tissue, but the ranges are not standardized between images, so using a clip to create the mask won't work. I personally feel like it has something to do with the activation of the loaded model, but I am unsure exactly how to prove that. Any advice would be appreciated!
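For what it's worth, the final Conv3D above has no activation, so model.predict() returns raw two-channel logits rather than probabilities. A minimal sketch of turning that output into a discrete mask (my own illustration; x_val and the channel ordering are assumptions, not from the post):

import numpy as np
import tensorflow as tf

logits = model.predict(x_val)                   # shape (..., 2), raw class scores
probs = tf.nn.softmax(logits, axis=-1).numpy()  # per-voxel class probabilities
mask = np.argmax(probs, axis=-1)                # 0 = background, 1 = heart (assumed)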
Merry Christmas and Happy Holidays everyone!

What does Lambda do in this code (Python Keras)?

# Assumed imports (not shown in the original post):
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, UpSampling2D, Conv2D, Lambda, Activation

def AdaIN(x):
    # Normalize x[0] (image representation)
    mean = K.mean(x[0], axis=[1, 2], keepdims=True)
    std = K.std(x[0], axis=[1, 2], keepdims=True) + 1e-7
    y = (x[0] - mean) / std
    # Reshape scale and bias parameters
    pool_shape = [-1, 1, 1, y.shape[-1]]
    scale = K.reshape(x[1], pool_shape)
    bias = K.reshape(x[2], pool_shape)
    # Multiply by x[1] (gamma) and add x[2] (beta)
    return y * scale + bias

def g_block(input_tensor, latent_vector, filters):
    gamma = Dense(filters, bias_initializer='ones')(latent_vector)
    beta = Dense(filters)(latent_vector)
    out = UpSampling2D()(input_tensor)
    out = Conv2D(filters, 3, padding='same')(out)
    out = Lambda(AdaIN)([out, gamma, beta])
    out = Activation('relu')(out)
    return out
Please see the code above. I am currently studying StyleGAN and am trying to convert this code into PyTorch, but I can't seem to understand what Lambda does in g_block. AdaIN needs only one input based on its declaration, but somehow gamma and beta are also used as inputs? Please explain to me what the Lambda does in this code.
Thank you very much.
Lambda layers in Keras are used to call custom functions inside the model. In g_block, Lambda calls the AdaIN function and passes out, gamma, and beta as arguments inside a list. The AdaIN function receives these three tensors encapsulated in a single list as x, and they are then accessed inside AdaIN by indexing that list (x[0], x[1], x[2]).
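A tiny standalone illustration (my own, not from the post) of how a Lambda layer receives a list of tensors as a single argument:

import tensorflow as tf
from tensorflow.keras import layers

add_pair = layers.Lambda(lambda x: x[0] + x[1])  # x is the whole list [a, b]
a = tf.ones((1, 4))
b = 2.0 * tf.ones((1, 4))
print(add_pair([a, b]))  # tf.Tensor([[3. 3. 3. 3.]], shape=(1, 4), dtype=float32)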
Here's the PyTorch equivalent:
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    def forward(self, out, gamma, beta):
        bs, ch = out.size()[:2]
        mean = out.reshape(bs, ch, -1).mean(dim=2).reshape(bs, ch, 1, 1)
        std = out.reshape(bs, ch, -1).std(dim=2).reshape(bs, ch, 1, 1) + 1e-7
        y = (out - mean) / std
        bias = beta.unsqueeze(-1).unsqueeze(-1).expand_as(out)
        scale = gamma.unsqueeze(-1).unsqueeze(-1).expand_as(out)
        return y * scale + bias

class g_block(nn.Module):
    def __init__(self, filters, latent_vector_shape, input_tensor_channels):
        super().__init__()
        self.gamma = nn.Linear(in_features=latent_vector_shape, out_features=filters)
        # Initialize all biases to 1
        self.gamma.bias.data = torch.ones(filters)
        self.beta = nn.Linear(in_features=latent_vector_shape, out_features=filters)
        # Calculate appropriate padding
        self.conv = nn.Conv2d(input_tensor_channels, filters, 3, 1, padding=1)
        self.adain = AdaIN()

    def forward(self, input_tensor, latent_vector):
        gamma = self.gamma(latent_vector)
        beta = self.beta(latent_vector)
        # Check the default interpolation mode in Keras and replace the mode below if it differs
        out = F.interpolate(input_tensor, scale_factor=2, mode='nearest')
        out = self.conv(out)
        out = self.adain(out, gamma, beta)
        out = torch.relu(out)
        return out

# Sample:
input_tensor = torch.randn((1, 3, 10, 10))
latent_vector = torch.randn((1, 5))
g = g_block(3, latent_vector.shape[1], input_tensor.shape[1])
out = g(input_tensor, latent_vector)
print(out)
Note: you need to pass latent_vector and input_tensor shapes while creating g_block.

Luong-style attention mechanism with Dot and General scoring functions in Keras and TensorFlow

I am trying to implement the dot-product and general scoring functions for computing similarity scores between the encoder outputs and the decoder hidden state in Keras.
My idea was to use tf.keras.layers.dot(encoder_output, decoder_state) to compute the dot-product score, but multiplying these two values raises an error.
# Assumed import (not shown in the original post):
import tensorflow as tf

class Attention(tf.keras.Model):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def call(self, decoder_state, encoder_output):
        score = tf.keras.layers.dot([encoder_output, decoder_state], axes=[2, 1])
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * encoder_output
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights

batch_size = 16
units = 32
input_length = 20

decoder_state = tf.random.uniform(shape=[batch_size, units])
encoder_output = tf.random.uniform(shape=[batch_size, input_length, units])

attention = Attention(units)
context_vector, attention_weights = attention(decoder_state, encoder_output)
I am getting the following error:
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: Incompatible shapes: [16,20] vs. [16,20,32] [Op:Mul]
It is probably a very simple fix, but as I am new to this I am not able to work out the exact method that needs to be called here.
I have tried reshaping the values of encoder_output, but that still does not work.
Please help me fix this.
I am just putting @Ayush Srivastava's comment as a response so that the post gets an answer.
Basically, the error occurs because you are trying to multiply two tensors (namely attention_weights and encoder_output) that have different shapes, so you need to reshape decoder_state.
Here is the full answer:
class Attention(tf.keras.Model):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def call(self, decoder_state, encoder_output):
        decoder_state = tf.keras.layers.Reshape((decoder_state.shape[1], 1))(decoder_state)
        score = tf.keras.layers.dot([encoder_output, decoder_state], axes=[2, 1])
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * encoder_output
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
Shapes:
decoder_state before reshape: (16, 32)
decoder_state after reshape: (16, 32, 1)
enc_output: (16, 20, 32)
score: (16, 20, 1)
attention_weights: (16, 20, 1)
context_vector before sum: (16, 20, 32)
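A quick way to confirm these shapes, reusing the driver code from the question (a sketch, not part of the original answer):

batch_size, units, input_length = 16, 32, 20
decoder_state = tf.random.uniform(shape=[batch_size, units])
encoder_output = tf.random.uniform(shape=[batch_size, input_length, units])

attention = Attention(units)
context_vector, attention_weights = attention(decoder_state, encoder_output)
print(context_vector.shape)     # (16, 32)
print(attention_weights.shape)  # (16, 20, 1)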

Linear Regression using sklearn

I have a model fitted to the data below, but I am having trouble using the predict function.
# Assumed imports (not shown in the original post):
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

d = {'df_Size': [1, 3, 5, 8, 10, 15, 18], 'RAM': [3676, 6532, 9432, 13697, 16633, 23620, 27990]}
df = pd.DataFrame(data=d)
df

X = np.array(df['df_Size']).reshape(-1, 1)
y = np.array(df['RAM']).reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)
print(model.score(X, y))  # the original prints regr.score, but the fitted estimator is named `model`
Then, when I try to predict with
X_Size = 25
X_Size
prediction = model.predict(X_Size)
I get the following error
ValueError: Expected 2D array, got scalar array instead:
array=25.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I think I am passing the 25 in the wrong format, but I would just like some help on getting the predicted RAM for a dataframe of 25 rows (df_Size = 25).
Thanks,
You need to pass the predictor in the same shape (basically 1 column):
X.shape
Out[11]: (7, 1)
You can do:
model.predict(np.array(25).reshape(1,1))
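Putting it together, a short sketch using the model fitted above:

pred = model.predict(np.array([[25]]))  # 2D input: one sample, one feature
print(pred)                             # predicted RAM for df_Size = 25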

define a function in which min() is used

I am trying to define a function in which part of the expression is limited to a maximum value. I try to do this by using min(), but it returns
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
My code:
# Assumed imports (not shown in the original post):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

# D is defined elsewhere in the original script

def f(x, beta):
    K_w = (1 + ((0.5 * D) / (0.5 * D + x))**2)**2
    K_c = min(11, (3.5 * (x / D)**(-0.5)))  # <-- this is what gives me the problem. It should limit K_c to 11, but that does not work.
    K_tot = (K_c**2 + K_w**2 + 2 * K_c * K_w * np.cos(beta))**0.5
    return K_tot

x = np.linspace(0, 50, 100)
beta = np.linspace(0, 3.14, 180)
X, Y = np.meshgrid(x, beta)
Z = f(X, Y)

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 100, cmap='viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
I expected K_c to be limited to 11, but instead it gave:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I might be making a rookie mistake, but help is much appreciated!
Consider using np.clip; its documentation can be found in the NumPy reference. Use
np.clip(3.5*(x/D)**(-0.5), None, 11)
for your case.
For example,
>>> import numpy as np
>>> np.clip([1, 2, 3, 15], None, 11)
array([ 1, 2, 3, 11])
The problem with your code is that min is comparing a number with an array, which is not what it expects.
Alternatively, here is a list comprehension approach:
A = [1, 2, 3, 15]
B = [min(11, a) for a in A]
print(B)
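Applied to the original function, the fix might look like this sketch (f_clipped is an illustrative name, and D is passed in as a parameter rather than taken from the enclosing script):

import numpy as np

def f_clipped(x, beta, D):
    K_w = (1 + ((0.5 * D) / (0.5 * D + x))**2)**2
    K_c = np.clip(3.5 * (x / D)**(-0.5), None, 11)  # cap K_c at 11 element-wise
    return (K_c**2 + K_w**2 + 2 * K_c * K_w * np.cos(beta))**0.5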