ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256]) - deep-learning

import os
import time

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable

model_path = "/content/drive/My Drive/Generating-Webpages-from-Screenshots-master/models/"

batch_count = len(dataloader)
start = time.time()

for epoch in range(epochs):
    for i, (images, captions, lengths) in enumerate(dataloader):
        images = Variable(images.cuda())
        captions = Variable(captions.cuda())
        # lengths is a list of caption lengths in descending order.
        # The collate_fn pads the captions that are short in length,
        # so the targets are packed the same way to compute the loss.
        targets = nn.utils.rnn.pack_padded_sequence(input=captions, lengths=lengths, batch_first=True)[0]
        # Clearing out gradient buffers
        E.zero_grad()
        D.zero_grad()
        features = E(images)
        output = D(features, captions, lengths)
        loss = criterion(output, targets)
        loss.backward()
        optimizer.step()
        if epoch % log_step == 0 and i == 0:
            print("Epoch : {} || Loss : {} || Perplexity : {}".format(epoch, loss.item(), np.exp(loss.item())))
        # Uncomment this to use checkpointing
        # if (epoch + 1) % save_after_epochs == 0 and i == 0:
        #     print("Saving Models")
        #     torch.save(E.state_dict(), os.path.join(model_path, 'encoder-{}'.format(epoch + 1)))
        #     torch.save(D.state_dict(), os.path.join(model_path, 'decoder-{}'.format(epoch + 1)))

print("Done Training!")
print("Time : {}".format(time.time() - start))

I can't say for sure what is going on, because I don't know the definition of D, but if it is a neural network that uses batch normalization, the likely problem is that you need to use a batch size larger than 1. In training mode a BatchNorm layer computes the mean and variance over the current batch, and with a single sample per batch there is nothing to compute a variance over, which raises exactly this error.
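For illustration, here is a minimal standalone reproduction of the error and two common ways around it (my own sketch, not tied to the E/D models above):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(256)
bn.train()

x = torch.randn(1, 256)
# bn(x)  # raises: ValueError: Expected more than 1 value per channel when training

# With more than one sample per batch the batch statistics are well defined:
out = bn(torch.randn(8, 256))

# If the last batch of an epoch can end up with a single sample,
# dropping it on the DataLoader side avoids the error:
# dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, drop_last=True)

# For inference, switching to eval mode uses the running statistics instead:
bn.eval()
out_single = bn(x)  # works with a single sample
```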


kmer counts with cython implementation

I have this function implemented in Cython:
def count_kmers_cython(str string, list alphabet, int kmin, int kmax):
    """
    Count occurrences of kmers in a given string.
    """
    counter = {}
    cdef int i
    cdef int j
    cdef int N = len(string)
    limits = range(kmin, kmax + 1)
    for i in range(0, N - kmax + 1):
        for j in limits:
            kmer = string[i:i+j]
            counter[kmer] = counter.get(kmer, 0) + 1
    return counter
Can I do better with Cython, or is there any way to improve it?
I am new to Cython; this is my first attempt.
I will use this to count kmers in DNA, with the alphabet restricted to 'ACGT'. The typical input string is an average bacterial genome (130 kb to over 14 Mb, where 1 kb = 1000 bp).
The kmer sizes will be 3 < k < 16.
I would also like to know whether I could go further and use Cython in this function too:
import math
from termcolor import colored

def compute_kmer_stats(kmer_list, counts, len_genome, max_e):
    """
    This function computes the z_score to find under/over-represented kmers
    according to a cut-off e-value.
    Inputs:
        kmer_list - a list of kmers
        counts - a dictionary-like object with k-mers as keys and counts as values.
        len_genome - the total length of the sequence(s).
        max_e - cut-off e-value to report under/over-represented kmers.
    Outputs:
        results - a list of lists as [k-mer, observed count, expected count, z-score, e-value]
    """
    print(colored('Starting to compute the kmer statistics...\n',
                  'red',
                  attrs=['bold']))
    results = []
    # number of tests, used to convert p-value to e-value.
    n = len(list(kmer_list))
    for kmer in kmer_list:
        k = len(kmer)
        prefix, sufix, center = counts[kmer[:-1]], counts[kmer[1:]], counts[kmer[1:-1]]
        # avoid zero division error
        if center == 0:
            expected = 0
        else:
            expected = (prefix * sufix) // center
        observed = counts[kmer]
        sigma = math.sqrt(expected * (1 - expected / (len_genome - k + 1)))
        # avoid zero division error
        if sigma == 0.0:
            z_score = 0.0
        else:
            z_score = (observed - expected) / sigma
        # p-value for all kmers/palindromes under-represented
        p_value_under = math.erfc(-z_score / math.sqrt(2)) / 2
        # p-value for all kmers/palindromes over-represented
        p_value_over = math.erfc(z_score / math.sqrt(2)) / 2
        # e-value for all kmers/palindromes under-represented
        e_value_under = n * p_value_under
        # e-value for all kmers/palindromes over-represented
        e_value_over = n * p_value_over
        if e_value_under <= max_e:
            results.append([kmer, observed, expected, z_score, p_value_under, e_value_under])
        elif e_value_over <= max_e:
            results.append([kmer, observed, expected, z_score, p_value_over, e_value_over])
    return results
OBS - Thank you CodeSurgeon for the help. I know there are other tools that count kmers efficiently, but I am learning Python, so I am trying to write my own functions and code.
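As a hedged sketch of the kind of typing Cython rewards (not from the original thread; the function name and the choice to work on a bytes object rather than str are my own assumptions), declaring the loop indices and sizes as C ints and slicing bytes usually removes some interpreter overhead in the inner loop:

```cython
# cython: language_level=3
# A minimal sketch, assuming the sequence is passed in as bytes (e.g. seq.encode())
# and that keeping a plain Python dict as the counter is acceptable.
def count_kmers_cython_typed(bytes seq, int kmin, int kmax):
    cdef int i, k
    cdef int n = len(seq)
    cdef dict counter = {}
    cdef bytes kmer
    for i in range(0, n - kmax + 1):
        for k in range(kmin, kmax + 1):
            kmer = seq[i:i + k]        # slicing bytes avoids unicode overhead
            counter[kmer] = counter.get(kmer, 0) + 1
    return counter
```

Compiling with `cython -a` shows which lines still fall back to Python objects; the dictionary operations will remain the dominant cost either way.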

Why does Deep Adaptive Input Normalization (DAIN) normalize time series data across rows?

The DAIN paper describes how a network learns to normalize time series data by itself; here is how the authors implemented it. The code leads me to think that the normalization is happening across rows, not columns. Can anyone explain why it is implemented that way? I always thought that one normalizes a time series only across columns, to keep each feature's true information.
Here is the piece that does the normalization:
```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class DAIN_Layer(nn.Module):
    def __init__(self, mode='adaptive_avg', mean_lr=0.00001, gate_lr=0.001, scale_lr=0.00001, input_dim=144):
        super(DAIN_Layer, self).__init__()
        print("Mode = ", mode)
        self.mode = mode
        self.mean_lr = mean_lr
        self.gate_lr = gate_lr
        self.scale_lr = scale_lr
        # Parameters for adaptive average
        self.mean_layer = nn.Linear(input_dim, input_dim, bias=False)
        self.mean_layer.weight.data = torch.FloatTensor(data=np.eye(input_dim, input_dim))
        # Parameters for adaptive std
        self.scaling_layer = nn.Linear(input_dim, input_dim, bias=False)
        self.scaling_layer.weight.data = torch.FloatTensor(data=np.eye(input_dim, input_dim))
        # Parameters for adaptive scaling
        self.gating_layer = nn.Linear(input_dim, input_dim)
        self.eps = 1e-8

    def forward(self, x):
        # Expecting (n_samples, dim, n_feature_vectors)
        # Nothing to normalize
        if self.mode == None:
            pass
        # Do simple average normalization
        elif self.mode == 'avg':
            avg = torch.mean(x, 2)
            avg = avg.resize(avg.size(0), avg.size(1), 1)
            x = x - avg
        # Perform only the first step (adaptive averaging)
        elif self.mode == 'adaptive_avg':
            avg = torch.mean(x, 2)
            adaptive_avg = self.mean_layer(avg)
            adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
            x = x - adaptive_avg
        # Perform the first + second step (adaptive averaging + adaptive scaling)
        elif self.mode == 'adaptive_scale':
            # Step 1:
            avg = torch.mean(x, 2)
            adaptive_avg = self.mean_layer(avg)
            adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
            x = x - adaptive_avg
            # Step 2:
            std = torch.mean(x ** 2, 2)
            std = torch.sqrt(std + self.eps)
            adaptive_std = self.scaling_layer(std)
            adaptive_std[adaptive_std <= self.eps] = 1
            adaptive_std = adaptive_std.resize(adaptive_std.size(0), adaptive_std.size(1), 1)
            x = x / adaptive_std
        elif self.mode == 'full':
            # Step 1:
            avg = torch.mean(x, 2)
            adaptive_avg = self.mean_layer(avg)
            adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
            x = x - adaptive_avg
            # Step 2:
            std = torch.mean(x ** 2, 2)
            std = torch.sqrt(std + self.eps)
            adaptive_std = self.scaling_layer(std)
            adaptive_std[adaptive_std <= self.eps] = 1
            adaptive_std = adaptive_std.resize(adaptive_std.size(0), adaptive_std.size(1), 1)
            x = x / adaptive_std
            # Step 3:
            avg = torch.mean(x, 2)
            gate = F.sigmoid(self.gating_layer(avg))
            gate = gate.resize(gate.size(0), gate.size(1), 1)
            x = x * gate
        else:
            assert False
        return x
```
I am not sure either, but they do transpose the input in the forward function of the MLP class: x = x.transpose(1, 2). So it seems to me that they normalise over time for each feature.
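A quick shape check (my own toy example, not from the repository) makes the same point: after the transpose the tensor is (n_samples, n_features, n_timesteps), so torch.mean(x, 2) averages over time separately for each feature rather than across features.

```python
import torch

n_samples, n_timesteps, n_features = 4, 50, 144

x = torch.randn(n_samples, n_timesteps, n_features)  # how the data is usually stored
x = x.transpose(1, 2)                                # -> (n_samples, n_features, n_timesteps)

avg = torch.mean(x, 2)                               # one mean per feature, taken over time
print(avg.shape)                                     # torch.Size([4, 144])

x_centered = x - avg.unsqueeze(2)                    # broadcast back over the time axis
print(x_centered.shape)                              # torch.Size([4, 144, 50])
```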

Shifting tensor over dimension

Let's say I have a tensor A of shape [batch_size x length x 1024].
I want to do the following: for the i-th element in the batch, I want to shift (roll) the 1024-dimensional embedding at each of the 'length' positions by its position index.
For example, the vector A[0, 0, :] should stay the same, A[0, 1, :] should be rolled by 1, and A[0, 15, :] should be rolled by 15, and so on for every element in the batch.
So far I have done it with for loops, but that is not efficient.
Below is my code with for loops:
x = ...  # [batch_size, length, 1024]
new_embedding = []
llist = []
batch_size = x.shape[0]
seq_len = x.shape[1]
for sample in range(batch_size):
    for token in range(seq_len):
        orig = x[sample, token, :]
        new_embedding.append(torch.roll(orig, token, 0))
    llist.append(torch.stack(new_embedding, 0))
    new_embedding = []
x = torch.stack(llist, 0)
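As a hedged sketch of one vectorized alternative (the helper name roll_by_position is my own): torch.roll by t places input[(j - t) % dim] at output position j, so the whole per-position roll can be written as a single torch.gather.

```python
import torch

def roll_by_position(x):
    # x: [batch_size, length, dim]; row t of every sample is rolled by t,
    # matching torch.roll(x[b, t], shifts=t, dims=0).
    batch_size, length, dim = x.shape
    pos = torch.arange(length, device=x.device).unsqueeze(1)   # [length, 1]
    col = torch.arange(dim, device=x.device).unsqueeze(0)      # [1, dim]
    index = (col - pos) % dim                                  # [length, dim]
    index = index.unsqueeze(0).expand(batch_size, -1, -1)      # [batch, length, dim]
    return x.gather(2, index)

# Quick check against the loop version on a small tensor:
x = torch.randn(2, 5, 8)
expected = torch.stack([
    torch.stack([torch.roll(x[b, t], t, 0) for t in range(x.shape[1])])
    for b in range(x.shape[0])
])
assert torch.equal(roll_by_position(x), expected)
```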

Using a metric predictor when modelling an ordinal predicted variable in PyMC3

I am trying to implement the ordered probit regression model from chapter 23.4 of Doing Bayesian Data Analysis (Kruschke) in PyMC3. After sampling, the posterior distributions for the intercept and slope are not really comparable to the results from the book. I think there is some fundamental issue with the model definition, but I fail to see it.
Data:
X is the metric predictor (standardized to zX); Y holds the ordinal outcomes (1-7).
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from scipy.stats import norm
from theano.compile.ops import as_op

nYlevels3 = df3.Y.nunique()

# Setting the thresholds for the ordinal outcomes. The outer thresholds are
# fixed, while the inner ones are estimated.
thresh3 = [k + .5 for k in range(1, nYlevels3)]
thresh_obs3 = np.ma.asarray(thresh3)
thresh_obs3[1:-1] = np.ma.masked

@as_op(itypes=[tt.dvector, tt.dvector, tt.dscalar], otypes=[tt.dmatrix])
def outcome_probabilities(theta, mu, sigma):
    out = np.empty((mu.size, nYlevels3))
    n = norm(loc=mu, scale=sigma)
    out[:,0] = n.cdf(theta[0])
    out[:,1] = np.max([np.repeat(0,mu.size), n.cdf(theta[1]) - n.cdf(theta[0])])
    out[:,2] = np.max([np.repeat(0,mu.size), n.cdf(theta[2]) - n.cdf(theta[1])])
    out[:,3] = np.max([np.repeat(0,mu.size), n.cdf(theta[3]) - n.cdf(theta[2])])
    out[:,4] = np.max([np.repeat(0,mu.size), n.cdf(theta[4]) - n.cdf(theta[3])])
    out[:,5] = np.max([np.repeat(0,mu.size), n.cdf(theta[5]) - n.cdf(theta[4])])
    out[:,6] = 1 - n.cdf(theta[5])
    return out

with pm.Model() as ordinal_model_metric:
    theta = pm.Normal('theta', mu=thresh3, tau=np.repeat(1/2**2, len(thresh3)),
                      shape=len(thresh3), observed=thresh_obs3, testval=thresh3[1:-1])
    # Intercept
    zbeta0 = pm.Normal('zbeta0', mu=(1+nYlevels3)/2, tau=1/nYlevels3**2)
    # Slope
    zbeta = pm.Normal('zbeta', mu=0.0, tau=1/nYlevels3**2)
    # Mean of the underlying metric distribution
    mu = pm.Deterministic('mu', zbeta0 + zbeta*zX)
    zsigma = pm.Uniform('zsigma', nYlevels3/1000.0, nYlevels3*10.0)
    pr = outcome_probabilities(theta, mu, zsigma)
    y = pm.Categorical('y', pr, observed=df3.Y.cat.codes)
http://nbviewer.jupyter.org/github/JWarmenhoven/DBDA-python/blob/master/Notebooks/Chapter%2023.ipynb
For reference, here is the JAGS model used by Kruschke on which I based my model:
Ntotal = length(y)

# Thresholds 1 and nYlevels-1 are fixed; the other thresholds are estimated.
# This allows all parameters to be interpretable on the response scale.
nYlevels = max(y)
thresh = rep(NA,nYlevels-1)
thresh[1] = 1 + 0.5
thresh[nYlevels-1] = nYlevels-1 + 0.5

modelString = "
model {
  for ( i in 1:Ntotal ) {
    y[i] ~ dcat( pr[i,1:nYlevels] )
    pr[i,1] <- pnorm( thresh[1] , mu[x[i]] , 1/sigma[x[i]]^2 )
    for ( k in 2:(nYlevels-1) ) {
      pr[i,k] <- max( 0 , pnorm( thresh[ k ] , mu[x[i]] , 1/sigma[x[i]]^2 )
                        - pnorm( thresh[k-1] , mu[x[i]] , 1/sigma[x[i]]^2 ) )
    }
    pr[i,nYlevels] <- 1 - pnorm( thresh[nYlevels-1] , mu[x[i]] , 1/sigma[x[i]]^2 )
  }
  for ( j in 1:2 ) { # 2 groups
    mu[j] ~ dnorm( (1+nYlevels)/2 , 1/(nYlevels)^2 )
    sigma[j] ~ dunif( nYlevels/1000 , nYlevels*10 )
  }
  for ( k in 2:(nYlevels-2) ) { # 1 and nYlevels-1 are fixed, not stochastic
    thresh[k] ~ dnorm( k+0.5 , 1/2^2 )
  }
}
It was not a fundamental issue after all: I forgot to indicate the axis for np.max() in the function below.
@as_op(itypes=[tt.dvector, tt.dvector, tt.dscalar], otypes=[tt.dmatrix])
def outcome_probabilities(theta, mu, sigma):
    out = np.empty((mu.size, nYlevels3))
    n = norm(loc=mu, scale=sigma)
    out[:,0] = n.cdf(theta[0])
    out[:,1] = np.max([np.repeat(0,mu.size), n.cdf(theta[1]) - n.cdf(theta[0])], axis=0)
    out[:,2] = np.max([np.repeat(0,mu.size), n.cdf(theta[2]) - n.cdf(theta[1])], axis=0)
    out[:,3] = np.max([np.repeat(0,mu.size), n.cdf(theta[3]) - n.cdf(theta[2])], axis=0)
    out[:,4] = np.max([np.repeat(0,mu.size), n.cdf(theta[4]) - n.cdf(theta[3])], axis=0)
    out[:,5] = np.max([np.repeat(0,mu.size), n.cdf(theta[5]) - n.cdf(theta[4])], axis=0)
    out[:,6] = 1 - n.cdf(theta[5])
    return out
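As a toy illustration of why the missing axis mattered (my own example): without axis, np.max collapses both the row of zeros and the vector of probability differences into a single scalar, so every entry of that output column gets the same value; with axis=0 it takes the element-wise maximum, which is what the JAGS max(0, ...) does per observation.

```python
import numpy as np

diffs = np.array([-0.1, 0.3, 0.2])        # per-observation probability differences
zeros = np.repeat(0, diffs.size)

print(np.max([zeros, diffs]))             # 0.3 -> one scalar broadcast to the whole column
print(np.max([zeros, diffs], axis=0))     # [0.  0.3 0.2] -> element-wise max, as intended
```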

I am trying to load the Poker Hand dataset (CSV) into TensorFlow, but the accuracy is always about 50%. What can I do about it?

I am trying to train an MLP that just consists of a softmax layer. The TensorFlow tutorials use the MNIST dataset; I am trying to use another one, the Poker Hand dataset (10 classes). With my program, however, the accuracy is always about 50%, which is bothersome.
Here is my code:
# coding=utf-8
from __future__ import print_function
import tensorflow as tf
import numpy as np
import datetime


class Arc:
    def __init__(self):
        self.filenames = ['train.csv', 'test.csv']
        self.batchSize = 128
        self.trainIters = 100000
        self.totalEpoch = 1
        self.min_after_dequeue = 256
        self.capacity = 640

    def readData(self, filenames=None):
        files = tf.train.string_input_producer(filenames)
        reader = tf.TextLineReader()
        key, value = reader.read(files)
        record_defaults = [[1], [1], [4], [1], [8], [1], [2], [1], [11], [1], [5]]
        s1, c1, s2, c2, s3, c3, s4, c4, s5, c5, hand = tf.decode_csv(value,
                                                                     record_defaults=record_defaults)
        features = tf.pack(tf.to_float([s1, c1, s2, c2, s3, c3, s4, c4, s5, c5]))
        hand = tf.one_hot(hand, 10, 1, 0, -1, tf.int32)
        features_batch, hand_batch = tf.train.shuffle_batch(
            [features, hand],
            batch_size=self.batchSize,
            capacity=self.capacity,
            min_after_dequeue=self.min_after_dequeue)
        return features_batch, hand_batch

    def fullyConnected(self, incoming, n_units, bias=True,
                       regularizer=None, weight_decay=0.001, trainable=True,
                       name="FullyConnected"):
        if isinstance(incoming, tf.Tensor):
            input_shape = incoming.get_shape().as_list()
        elif type(incoming) in [np.array, list, tuple]:
            input_shape = np.shape(incoming)
        else:
            raise Exception("Invalid incoming layer")
        assert len(input_shape) > 1, "Incoming Tensor shape must be at least 2-D"
        n_inputs = int(np.prod(input_shape[1:]))
        with tf.name_scope(name) as scope:
            W_init = tf.uniform_unit_scaling_initializer(dtype=tf.float32, seed=None)
            W_regul = None
            if regularizer:
                if regularizer == 'L1':
                    W_regul = lambda x: tf.mul(tf.nn.l2_loss(x), weight_decay, name='L2-Loss')
                elif regularizer == 'L2':
                    W_regul = lambda x: tf.mul(tf.reduce_sum(tf.abs(x)), weight_decay, name='L1-Loss')
            with tf.device(''):
                try:
                    W = tf.get_variable(scope + 'W', [n_inputs, n_units], tf.float32, W_init, W_regul)
                except Exception as e:
                    W = tf.get_variable(scope + 'W', [n_inputs, n_units], tf.float32, W_init)
                    if regularizer is not None:
                        if regularizer == 'L1':
                            W = lambda x: tf.mul(tf.nn.l2_loss(W), weight_decay, name='L2-Loss')
                        elif regularizer == 'L2':
                            W = lambda x: tf.mul(tf.reduce_sum(tf.abs(W)), weight_decay, name='L1-Loss')
            b = None
            if bias:
                b_init = tf.constant_initializer(0.)
                with tf.device(''):
                    b = tf.get_variable(scope + 'b', [n_units], tf.float32, b_init, W_regul, trainable=trainable)
            inference = incoming
            if len(input_shape) > 2:
                inference = tf.reshape(inference, [-1, n_inputs])
            inference = tf.matmul(inference, W)
            if b: inference += b
            return inference

    def network(self, net):
        net = self.fullyConnected(net, 10)
        net = tf.nn.softmax(net)
        return net

    def run(self):
        features, hand = self.readData(['train.csv'])
        x = tf.placeholder(dtype=tf.float32,
                           shape=[None, 10],
                           name='Placeholder_X')
        y = tf.placeholder(dtype=tf.float32,
                           shape=[None, 10],
                           name='Placeholder_Y')
        pred = self.network(x)
        cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), reduction_indices=[1]))
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)
        correctPred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correctPred, tf.float32))
        init = tf.initialize_all_variables()
        startTime = datetime.datetime.now()
        with tf.Session() as sess:
            sess.run(init)
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)
            iter = 1
            while iter * self.batchSize < self.trainIters:
                example, label = sess.run([features, hand])
                try:
                    sess.run(optimizer, feed_dict={x: example, y: label})
                except Exception as e:
                    print(e.message)
                if iter % 10 == 0:
                    loss, acc = sess.run([cost, accuracy], feed_dict={x: example, y: label})
                    print("Iter " + str(iter * self.batchSize) + ", Minibatch Loss= " + \
                          "{:.6f}".format(loss) + ", Training Accuracy= " + \
                          "{:.5f}".format(acc))
                iter += 1
            coord.request_stop()
            coord.join(threads)
            print('all done')
        endTime = datetime.datetime.now()
        fitTime = (endTime - startTime)
        print("Training Time:", fitTime)


if __name__ == '__main__':
    net = Arc()
    net.run()
I got the following result:
Iter 1280, Minibatch Loss= 2.210387, Training Accuracy= 0.40625
Iter 2560, Minibatch Loss= 2.371088, Training Accuracy= 0.35156
Iter 3840, Minibatch Loss= 1.723017, Training Accuracy= 0.42188
Iter 5120, Minibatch Loss= 1.650101, Training Accuracy= 0.43750
....
....
Iter 98560, Minibatch Loss= 0.990002, Training Accuracy= 0.54688
Iter 99840, Minibatch Loss= 1.142664, Training Accuracy= 0.52344
all done
Training Time: 0:00:12.081167
What mistake did I make? I guess maybe the input queue caused it?
I took a look at it, and there are quite a few problems in your code:
no activation function
only one fully connected layer, which has very little capacity
the print of the loss value is not displaying the correct value
no encoding of the categorical input values (encode each suit such as s1 as a 4-way one-hot vector and each rank such as c1 as a 13-way one-hot vector, then concatenate the results)
A sketch of what those fixes could look like is shown below.
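This is a minimal sketch, assuming the TF 1.x graph API (placeholder/Session) rather than the exact version used in the question; the one-hot preprocessing is done in NumPy before feeding, and the layer sizes and the helper name encode_hand are my own guesses, not part of the original post.

```python
import numpy as np
import tensorflow as tf

def encode_hand(raw):
    # raw: integer array [batch, 10] with columns ordered s1, c1, ..., s5, c5,
    # suits in 1..4 and ranks in 1..13. Each suit becomes a 4-way one-hot vector
    # and each rank a 13-way one-hot vector: 5 * (4 + 13) = 85 features.
    suits = np.eye(4)[raw[:, 0::2] - 1]          # [batch, 5, 4]
    ranks = np.eye(13)[raw[:, 1::2] - 1]         # [batch, 5, 13]
    return np.concatenate([suits.reshape(len(raw), -1),
                           ranks.reshape(len(raw), -1)], axis=1).astype(np.float32)

x = tf.placeholder(tf.float32, [None, 85])
y = tf.placeholder(tf.float32, [None, 10])

# Two hidden layers with a nonlinearity give the model some capacity.
h1 = tf.layers.dense(x, 128, activation=tf.nn.relu)
h2 = tf.layers.dense(h1, 64, activation=tf.nn.relu)
logits = tf.layers.dense(h2, 10)

# Let TensorFlow combine softmax and cross-entropy for numerical stability,
# instead of applying softmax and taking the log manually.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(cost)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1)), tf.float32))
```

Feeding encode_hand(example) instead of the raw integers, together with the hidden layers and the combined softmax cross-entropy, is the kind of change the points above are getting at.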