Additional seedwords argument in LDA() function from topicmodels

I am looking for an in-depth example of Latent Dirichlet Allocation (LDA) with seed words specified, using the topicmodels package in R.
The basic function has the form:
LDA(x, k, method = "Gibbs", control = NULL, model = NULL, ...)
And the documentation only states:
For method = "Gibbs" an additional argument seedwords can be specified
as a matrix or an object of class "simple_triplet_matrix"; the default
is NULL.
Can anyone point me to a complete example of how this would look and function?

Taken from this answer:
https://stats.stackexchange.com/questions/384183/seeded-lda-using-topicmodels-in-r
library("topicmodels")
data("AssociatedPress", package = "topicmodels")
## We fit 6 topics.
## We specify five seed words for five topics, the sixth topic has no
## seed words.
library("slam")
set.seed(123)
i <- rep(1:5, each = 5)
j <- sample(1:ncol(AssociatedPress), 25)
SeedWeight <- 500 - 0.1
deltaS <- simple_triplet_matrix(i, j, v = rep(SeedWeight, 25),
nrow = 6, ncol = ncol(AssociatedPress))
set.seed(1000)
ldaS <- LDA(AssociatedPress, k = 6, method = "Gibbs", seedwords = deltaS,
control = list(alpha = 0.1, best = TRUE,
verbose = 500, burnin = 500, iter = 100, thin = 100, prefix = character()))
apply(deltaS, 1, function(x) which(x == SeedWeight))
apply(posterior(ldaS)$terms, 1, function(x) order(x, decreasing = TRUE)[1:5])
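The R code above picks 25 random column indices as seed words, which can hide the layout of the matrix. Judging from the call to simple_triplet_matrix, it has one row per topic and one column per term of the document-term matrix, with a nonzero weight wherever a term is a seed word for that topic. The sketch below shows only that layout, in Python with NumPy. The vocabulary and seed lists are made up for illustration, and nothing here calls topicmodels.

```python
import numpy as np

# Hypothetical toy vocabulary and seed lists, used only to show the layout.
vocab = ["bank", "loan", "goal", "match", "court", "judge", "rain"]
seeds = {0: ["bank", "loan"], 1: ["goal", "match"], 2: ["court", "judge"]}
n_topics = 4               # the last topic gets no seed words
seed_weight = 500 - 0.1    # same value as SeedWeight in the R code

term_index = {term: j for j, term in enumerate(vocab)}

# One row per topic, one column per vocabulary term.
seed_matrix = np.zeros((n_topics, len(vocab)))
for topic, words in seeds.items():
    for word in words:
        seed_matrix[topic, term_index[word]] = seed_weight

# The same information as (i, j, v) triplets, shifted to 1-based indices
# because that is what R's simple_triplet_matrix(i, j, v, ...) expects.
rows, cols = np.nonzero(seed_matrix)
triplets = [(int(r) + 1, int(c) + 1, float(seed_matrix[r, c]))
            for r, c in zip(rows, cols)]
print(triplets)
```

In R, the column indices would come from matching the chosen seed words against the column names of the document-term matrix rather than from sample().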

What does Lambda do in this code (Python, Keras)?

def AdaIN(x):
    # Normalize x[0] (image representation)
    mean = K.mean(x[0], axis = [1, 2], keepdims = True)
    std = K.std(x[0], axis = [1, 2], keepdims = True) + 1e-7
    y = (x[0] - mean) / std
    # Reshape scale and bias parameters
    pool_shape = [-1, 1, 1, y.shape[-1]]
    scale = K.reshape(x[1], pool_shape)
    bias = K.reshape(x[2], pool_shape)
    # Multiply by x[1] (GAMMA) and add x[2] (BETA)
    return y * scale + bias

def g_block(input_tensor, latent_vector, filters):
    gamma = Dense(filters, bias_initializer = 'ones')(latent_vector)
    beta = Dense(filters)(latent_vector)
    out = UpSampling2D()(input_tensor)
    out = Conv2D(filters, 3, padding = 'same')(out)
    out = Lambda(AdaIN)([out, gamma, beta])
    out = Activation('relu')(out)
    return out
Please see the code above. I am currently studying StyleGAN and trying to convert this code to PyTorch, but I can't understand what Lambda does in g_block. According to its definition, AdaIN takes only one input, yet gamma and beta are somehow used as inputs too. What does Lambda do in this code?
Thank you very much.
Lambda layers in Keras are used to call custom functions inside a model. In g_block, Lambda calls the AdaIN function and passes out, gamma, and beta as arguments inside a list. AdaIN receives these three tensors wrapped in a single list, x, and accesses them by indexing that list (x[0], x[1], x[2]).
Here's the PyTorch equivalent:
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    def forward(self, out, gamma, beta):
        bs, ch = out.size()[:2]
        mean = out.reshape(bs, ch, -1).mean(dim=2).reshape(bs, ch, 1, 1)
        # unbiased=False matches K.std, which computes the population standard deviation
        std = out.reshape(bs, ch, -1).std(dim=2, unbiased=False).reshape(bs, ch, 1, 1) + 1e-7
        y = (out - mean) / std
        bias = beta.unsqueeze(-1).unsqueeze(-1).expand_as(out)
        scale = gamma.unsqueeze(-1).unsqueeze(-1).expand_as(out)
        return y * scale + bias

class g_block(nn.Module):
    def __init__(self, filters, latent_vector_shape, input_tensor_channels):
        super().__init__()
        self.gamma = nn.Linear(in_features = latent_vector_shape, out_features = filters)
        # Initializes all bias to 1
        self.gamma.bias.data = torch.ones(filters)
        self.beta = nn.Linear(in_features = latent_vector_shape, out_features = filters)
        # padding=1 keeps the spatial size for a 3x3 kernel with stride 1 (Keras padding='same')
        self.conv = nn.Conv2d(input_tensor_channels, filters, 3, 1, padding=1)
        self.adain = AdaIN()

    def forward(self, input_tensor, latent_vector):
        gamma = self.gamma(latent_vector)
        beta = self.beta(latent_vector)
        # check default interpolation mode in keras and replace mode below if different
        out = F.interpolate(input_tensor, scale_factor=2, mode='nearest')
        out = self.conv(out)
        out = self.adain(out, gamma, beta)
        out = torch.relu(out)
        return out

# Sample:
input_tensor = torch.randn((1, 3, 10, 10))
latent_vector = torch.randn((1, 5))
g = g_block(3, latent_vector.shape[1], input_tensor.shape[1])
out = g(input_tensor, latent_vector)
print(out)
Note: you need to pass latent_vector and input_tensor shapes while creating g_block.
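The calling convention described in this answer (one list in, indexed inside the function) can be seen in isolation with a toy stand-in. The Lambda class below is a hypothetical, simplified imitation written for illustration, not the Keras layer, and scale_and_shift is a made-up function that mirrors how AdaIN unpacks x[0], x[1], x[2].

```python
class Lambda:
    """Toy stand-in for a Lambda layer: stores a function and calls it on its input."""

    def __init__(self, function):
        self.function = function

    def __call__(self, inputs):
        # Whatever is passed to the layer is forwarded as ONE argument.
        return self.function(inputs)


def scale_and_shift(x):
    # x is the list [values, gamma, beta], mirroring x[0], x[1], x[2] in AdaIN.
    values, gamma, beta = x
    return [v * gamma + beta for v in values]


# The three inputs travel together in a single list, just like [out, gamma, beta].
result = Lambda(scale_and_shift)([[1.0, 2.0, 3.0], 2.0, 0.5])
print(result)  # [2.5, 4.5, 6.5]
```

This is why AdaIN can be declared with a single parameter: the function never sees three arguments, only one list holding three tensors.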

How do I measure perplexity scores on an LDA model made with the textmineR package in R?

I've made an LDA topic model in R using the textmineR package. It looks as follows:
## get textmineR dtm
dtm2 <- CreateDtm(doc_vec = dat2$fulltext, # character vector of documents
ngram_window = c(1, 2),
doc_names = dat2$names,
stopword_vec = c(stopwords::stopwords("da"), custom_stopwords),
lower = T, # lowercase - this is the default value
remove_punctuation = T, # punctuation - this is the default
remove_numbers = T, # numbers - this is the default
verbose = T,
cpus = 4)
dtm2 <- dtm2[, colSums(dtm2) > 2]
dtm2 <- dtm2[, str_length(colnames(dtm2)) > 2]
############################################################
## RUN & EXAMINE TOPIC MODEL
############################################################
# Draw quasi-random sample from the pc
set.seed(34838)
model2 <- FitLdaModel(dtm = dtm2,
k = 8,
iterations = 500,
burnin = 200,
alpha = 0.1,
beta = 0.05,
optimize_alpha = TRUE,
calc_likelihood = TRUE,
calc_coherence = TRUE,
calc_r2 = TRUE,
cpus = 4)
The questions are then:
1. Which function should I apply to get perplexity scores in the textmineR package? I can't seem to find one.
2. How do I measure perplexity scores for different numbers of topics (k)?
As asked: there's no way to calculate perplexity with textmineR unless you explicitly program it yourself. TBH, I've never seen any value in perplexity that you couldn't get from likelihood and coherence, so I didn't implement it.
However, the text2vec package does have an implementation. See below for an example:
library(textmineR)
# model ships with textmineR as example
m <- nih_sample_topic_model
# dtm ships with textmineR as example
d <- nih_sample_dtm
# get perplexity
p <- text2vec::perplexity(X = d,
topic_word_distribution = m$phi,
doc_topic_distribution = m$theta)
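For readers who would rather program it themselves, here is a minimal sketch of the usual perplexity definition: the exponential of the negative log-likelihood per token, computed from the same three ingredients as the call above (the document-term matrix, theta and phi). It is written in Python with NumPy on made-up toy numbers. The function name and data are assumptions for illustration, and text2vec's implementation may differ in details such as smoothing.

```python
import numpy as np


def perplexity(dtm, theta, phi):
    """Perplexity of a topic model on a document-term matrix.

    dtm:   (D, V) word counts per document
    theta: (D, K) document-topic probabilities, rows sum to 1
    phi:   (K, V) topic-word probabilities, rows sum to 1
    """
    word_probs = theta @ phi          # (D, V): p(word v | document d)
    mask = dtm > 0                    # only observed words contribute
    log_lik = np.sum(dtm[mask] * np.log(word_probs[mask]))
    return float(np.exp(-log_lik / dtm.sum()))


# Toy data: 2 documents, 3 vocabulary terms, 2 topics.
dtm = np.array([[2, 1, 0],
                [0, 1, 3]], dtype=float)
theta = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
phi = np.array([[0.6, 0.3, 0.1],
                [0.1, 0.2, 0.7]])

print(perplexity(dtm, theta, phi))
```

To compare different numbers of topics, one option is to fit one model per k and evaluate each on held-out documents with the same function. Lower values mean the model assigns higher probability to the observed words.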

Insert graph from function call into subplots

I have a function which produces exactly the graphs I need:
def graph(x, y, yName, dataset, colour, residuals = False, calc_tau = False):
    """
    Takes as an argument two arrays, one for x and one y, a string for the
    name of the y-variable, and two colours (which could be strings, values
    or iterators).
    Plots a scatter graph of the array against log(z), along with the best fit
    line, its equation and its regression coefficient, as well as Kendall's tau
    and the total number of points plotted.
    """
    arrayresults, arrayresids, arrayparams, arrayfit, arrayequation, arraytau = computeLinearStats(
        x, y, yName, calc_tau)
    count = np.count_nonzero(~np.logical_or(np.isnan(x), np.isnan(y)))
    # arrayequation = 'r\'' + yName +arrayequation[arrayequation.index('='):]
    plt.scatter(x, y,
                # label = arrayequation,
                s = 10,
                alpha = 0.5,
                c = colour)
    if calc_tau: #if calc_tau is set to True, display the value
                 #in the legend along with equation, r and n
        plt.plot(x, arrayfit,
                 label = r'''%s
r=%g
$\tau$=%g
n=%d'''%(arrayequation, arrayparams[2], arraytau[0], count),
                 c = colour)
    else: #otherwise just display equation, r and n
        plt.plot(x, arrayfit,
                 label = r'''%s
$r=%g$
$n=%d$
$r^2n=%.2f$'''%(arrayequation, arrayparams[2], count, arrayresults.nobs*arrayparams[2]**2),
                 c = colour)
    legendfont = 16
    labelfont = 20
    plt.xlabel('$log_{10}(z)$', fontsize = labelfont)
    plt.ylabel('Magnitude combination, %s dataset'%dataset, fontsize = labelfont)
    plt.legend(fontsize = legendfont)
    plt.xticks(fontsize = labelfont)
    plt.yticks(fontsize = labelfont)
    plt.grid(True, which = 'both')
    # plt.title(r'The three best high-$r$ combinations in both Hunstead and MgII', fontsize = labelfont)
    if residuals:
        plotResids(x, y, yName, dataset, colour)
    plt.show()
    return arrayresults, arrayresids, arrayparams, arrayfit, arrayequation, arraytau
which I can call multiple times to produce, for example, two separate figures (not shown).
It's kind of an ad hoc function, but it produces the graphs how I want them to look. Is there an easy way of combining the graphs output by multiple calls to the function into a set of subplots? I've tried something like
x = sources['z']
y1 = I-W2
y2 = W3-U
fig,ax = plt.subplots(2,1, sharex=True, sharey=False, gridspec_kw={'hspace': 0})
fig.suptitle('Graphs')
ax[0].scatter(x, y1, c='red', s = 10, alpha = 0.3)
ax[1].scatter(x, y2, c='purple', s = 10, alpha = 0.3)
but I'd like them with all the accoutrements. Ideally, I'd like ax1 and ax2 to call my graphing function and display the output in subplots. Is this possible?
EDIT: thanks to a comment, I'm able to use
x = sources['z']
y1 = I-W2
y2 = W3-U
fig,ax = plt.subplots(2,1, sharex=True, sharey=True, gridspec_kw={'hspace': 0})
ax1 = qf.graph(x, y1, 'I-W2', dataset, 'blue', ax[0], residuals = False, calc_tau = False) #2 rows, 1 column, first plot
ax2 = qf.graph(x, y2, 'W3-U', dataset, 'red', ax[1], residuals = False, calc_tau = False) #2 rows, 1 column, second plot
fig.suptitle('Graphs')
to produce a figure (not shown), but how can I get one graph into the top position (ax[0])?
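The updated graph function is not shown, so the following is a guess at the cause rather than a diagnosis. If the function still calls plt.scatter, plt.plot, plt.xlabel and plt.show internally, those calls act on the current axes (the last one created by plt.subplots) no matter which ax is passed in. A minimal sketch of the alternative is to draw everything through the Axes object and leave plt.show() to the caller. The helper graph_on_ax, the np.polyfit fit (standing in for computeLinearStats) and the random data are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def graph_on_ax(ax, x, y, y_name, colour):
    """Draw a scatter plot and least-squares line on the given Axes (hypothetical helper)."""
    slope, intercept = np.polyfit(x, y, 1)   # stand-in for computeLinearStats
    r = np.corrcoef(x, y)[0, 1]
    ax.scatter(x, y, s=10, alpha=0.5, c=colour)
    ax.plot(x, slope * x + intercept, c=colour,
            label="%s = %.2f x + %.2f\nr=%.2g\nn=%d" % (y_name, slope, intercept, r, len(x)))
    ax.set_ylabel(y_name)
    ax.legend()
    ax.grid(True, which="both")
    # No plt.show() here: the caller decides when the whole figure is finished.
    return slope, intercept, r


# Made-up data standing in for sources['z'], I-W2 and W3-U.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y1 = 2 * x + rng.normal(0, 0.1, 50)
y2 = -x + rng.normal(0, 0.1, 50)

fig, ax = plt.subplots(2, 1, sharex=True, gridspec_kw={"hspace": 0})
graph_on_ax(ax[0], x, y1, "I-W2", "blue")   # top panel
graph_on_ax(ax[1], x, y2, "W3-U", "red")    # bottom panel
ax[1].set_xlabel(r"$\log_{10}(z)$")
fig.suptitle("Graphs")
# Call plt.show() (or fig.savefig) once here, after both panels are drawn.
```

The same change applied to the original function would mean replacing each plt.xxx call with the matching ax method (ax.scatter, ax.plot, ax.set_xlabel, ax.set_ylabel, ax.legend, ax.grid) and removing plt.show() from the function body.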

Reinforcement learning, why the performance collapsed?

I am trying to train an agent on the ViZDoom platform, in the deadly_corridor scenario, using the A3C algorithm and TensorFlow on a TITAN X GPU server. However, the performance collapsed after more than two days of training, as you can see in the following picture.
There are 6 demons in the corridor, and the agent has to kill at least 5 of them to reach the destination and get the vest.
Here is the code of the network:
with tf.variable_scope(scope):
    self.inputs = tf.placeholder(shape=[None, *shape, 1], dtype=tf.float32)
    self.conv_1 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.inputs, num_outputs=32,
                              kernel_size=[8, 8], stride=4, padding='SAME')
    self.conv_2 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.conv_1, num_outputs=64,
                              kernel_size=[4, 4], stride=2, padding='SAME')
    self.conv_3 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.conv_2, num_outputs=64,
                              kernel_size=[3, 3], stride=1, padding='SAME')
    self.fc = slim.fully_connected(slim.flatten(self.conv_3), 512, activation_fn=tf.nn.elu)
    # LSTM
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(cfg.RNN_DIM, state_is_tuple=True)
    c_init = np.zeros((1, lstm_cell.state_size.c), np.float32)
    h_init = np.zeros((1, lstm_cell.state_size.h), np.float32)
    self.state_init = [c_init, h_init]
    c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
    h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
    self.state_in = (c_in, h_in)
    rnn_in = tf.expand_dims(self.fc, [0])
    step_size = tf.shape(self.inputs)[:1]
    state_in = tf.contrib.rnn.LSTMStateTuple(c_in, h_in)
    lstm_outputs, lstm_state = tf.nn.dynamic_rnn(lstm_cell,
                                                 rnn_in,
                                                 initial_state=state_in,
                                                 sequence_length=step_size,
                                                 time_major=False)
    lstm_c, lstm_h = lstm_state
    self.state_out = (lstm_c[:1, :], lstm_h[:1, :])
    rnn_out = tf.reshape(lstm_outputs, [-1, 256])
    # Output layers for policy and value estimations
    self.policy = slim.fully_connected(rnn_out,
                                       cfg.ACTION_DIM,
                                       activation_fn=tf.nn.softmax,
                                       biases_initializer=None)
    self.value = slim.fully_connected(rnn_out,
                                      1,
                                      activation_fn=None,
                                      biases_initializer=None)
    if scope != 'global' and not play:
        self.actions = tf.placeholder(shape=[None], dtype=tf.int32)
        self.actions_onehot = tf.one_hot(self.actions, cfg.ACTION_DIM, dtype=tf.float32)
        self.target_v = tf.placeholder(shape=[None], dtype=tf.float32)
        self.advantages = tf.placeholder(shape=[None], dtype=tf.float32)
        self.responsible_outputs = tf.reduce_sum(self.policy * self.actions_onehot, axis=1)
        # Loss functions
        self.policy_loss = -tf.reduce_sum(self.advantages * tf.log(self.responsible_outputs+1e-10))
        self.value_loss = tf.reduce_sum(tf.square(self.target_v - tf.reshape(self.value, [-1])))
        self.entropy = -tf.reduce_sum(self.policy * tf.log(self.policy+1e-10))
        # Get gradients from local network using local losses
        local_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
        value_var, policy_var = local_vars[:-2] + [local_vars[-1]], local_vars[:-2] + [local_vars[-2]]
        self.var_norms = tf.global_norm(local_vars)
        self.value_gradients = tf.gradients(self.value_loss, value_var)
        value_grads, self.grad_norms_value = tf.clip_by_global_norm(self.value_gradients, 40.0)
        self.policy_gradients = tf.gradients(self.policy_loss, policy_var)
        policy_grads, self.grad_norms_policy = tf.clip_by_global_norm(self.policy_gradients, 40.0)
        # Apply local gradients to global network
        global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'global')
        global_vars_value, global_vars_policy = \
            global_vars[:-2] + [global_vars[-1]], global_vars[:-2] + [global_vars[-2]]
        self.apply_grads_value = optimizer.apply_gradients(zip(value_grads, global_vars_value))
        self.apply_grads_policy = optimizer.apply_gradients(zip(policy_grads, global_vars_policy))
And the optimizer is
optimizer = tf.train.RMSPropOptimizer(learning_rate=1e-5)
And here are some summaries of the gradients and norms
I hope someone can help me tackle this problem.
Personally, I think the agent's performance may have collapsed because the value estimates became overestimated. The Double DQN paper, "Deep Reinforcement Learning with Double Q-learning", discusses this problem.
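For context on what the Double Q-learning paper mentioned in this answer changes, here is a minimal sketch of the two bootstrap targets in Python with NumPy. It applies to value-based DQN agents, while the question uses A3C, so it illustrates the idea in the paper and is not a fix for the code above. The function names and numbers are made up, and terminal states are ignored for brevity.

```python
import numpy as np


def dqn_target(reward, gamma, q_target_next):
    # Standard DQN: the same value estimates both select and evaluate the
    # next action, so any upward noise is picked up by the max.
    return reward + gamma * np.max(q_target_next)


def double_dqn_target(reward, gamma, q_online_next, q_target_next):
    # Double DQN: the online network selects the action,
    # the target network evaluates it.
    best_action = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[best_action]


# Made-up Q-values for the next state under the two networks.
q_online_next = np.array([1.0, 2.0, 1.5])
q_target_next = np.array([1.2, 1.1, 3.0])

standard = dqn_target(0.5, 0.9, q_target_next)
double = double_dqn_target(0.5, 0.9, q_online_next, q_target_next)
print(standard, double)
```

In this toy case the standard target bootstraps from the largest target-network value, while the Double DQN target uses the target-network value of the action the online network prefers, which is smaller here.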

PyMC3 how to implement latent dirichlet allocation?

I am trying to implement LDA using PyMC3.
However, when defining the last part of the model, in which words are sampled based on their topics, I keep getting the error: TypeError: list indices must be integers, not TensorVariable
How can I fix this?
The code is as follows:
## Data Preparation
K = 2 # number of topics
N = 4 # number of words
D = 3 # number of documents
import numpy as np
data = np.array([[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]])
Wd = [len(doc) for doc in data] # length of each document
## Model Specification
from pymc3 import Model, Normal, HalfNormal, Dirichlet, Categorical, constant
lda_model = Model()
with lda_model:
    # Priors for unknown model parameters
    alpha = HalfNormal('alpha', sd=1)
    eta = HalfNormal('eta', sd=1)
    a1 = eta*np.ones(shape=N)
    a2 = alpha*np.ones(shape=K)
    beta = [Dirichlet('beta_%i' % i, a1, shape=N) for i in range(K)]
    theta = [Dirichlet('theta_%s' % i, a2, shape=K) for i in range(D)]
    z = [Categorical('z_%i' % d, p = theta[d], shape=Wd[d]) for d in range(D)]
    # That's when you get the error. It is caused by: beta[z[d][w]]
    w = [Categorical('w_%i_%i' % (d, w), p = beta[z[d][w]], observed = data[i,j]) for d in range(D) for w in range(Wd[d])]
Any help would be much appreciated!
beta[z[d][w]] is incorrect because z[d][w] is a random variable stored by PyMC rather than a fixed index, and beta is a plain Python list.
In pymc2 this is solved with a Lambda function:
p=pm.Lambda("phi_z_%s_%s" % (d,i),
lambda z=z[d][w], beta=beta: beta[z])
In pymc3 it is supposed to be solved with
@theano.compile.ops.as_op
def your_function
But there is a problem here: Theano doesn't seem to allow passing a Python list of PyMC variables. t.lvector basically doesn't work.
More discussion is in this question:
Unable to create lambda function in hierarchical pymc3 model
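The list-versus-tensor distinction behind this error can be reproduced with a plain NumPy analogy. This is not PyMC3 or Theano code, and the variable names are made up for illustration. A Python list cannot be indexed by an array-like object, while a single stacked array can, which suggests keeping the topic-word distributions in one (n_topics, n_vocab) variable instead of a list of separate ones.

```python
import numpy as np

n_topics, n_vocab = 2, 4
rng = np.random.default_rng(0)

# One separate vector per topic, kept in a Python list
# (analogous to a list of separate Dirichlet variables).
beta_list = [rng.dirichlet(np.ones(n_vocab)) for _ in range(n_topics)]

# Topic assignment for each of four words, stored as an array
# (analogous to a Categorical variable rather than a fixed integer).
z = np.array([0, 1, 1, 0])

try:
    beta_list[z]                  # a list only accepts integers or slices
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# Stacking into one (n_topics, n_vocab) array makes array-valued indexing legal.
beta = np.stack(beta_list)
word_dists = beta[z]              # one word distribution per word position

print(list_indexing_failed, word_dists.shape)
```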
The following code was adapted from what was referenced by @Hanan. I've somehow made it work with pymc3.
import numpy as np
import pymc3 as pm

def get_word_dict(collection):
    vocab_list = list({word for doc in collection for word in doc})
    idx_list = [i for i in range(len(vocab_list))]
    return dict(zip(vocab_list,idx_list))

def word_to_idx(dict_vocab_idx, collection):
    return [[dict_vocab_idx[word] for word in doc] for doc in collection]

docs = [["sepak","bola","sepak","bola","bola","bola","sepak"],
        ["uang","ekonomi","uang","uang","uang","ekonomi","ekonomi"],
        ["sepak","bola","sepak","bola","sepak","sepak"],
        ["ekonomi","ekonomi","uang","uang"],
        ["sepak","uang","ekonomi"],
        ["komputer","komputer","teknologi","teknologi","komputer","teknologi"],
        ["teknologi","komputer","teknologi"]]
dict_vocab_idx = get_word_dict(docs)
idxed_collection = word_to_idx(dict_vocab_idx, docs)
n_topics = 3
n_vocab = len(dict_vocab_idx)
n_docs = len(idxed_collection)
length_docs = [len(doc) for doc in idxed_collection]
alpha = np.ones([n_docs, n_topics])
beta = np.ones([n_topics, n_vocab])
with pm.Model() as model:
    theta = pm.distributions.Dirichlet('theta', a=alpha, shape=(n_docs, n_topics))
    phi = pm.distributions.Dirichlet('phi', a=beta, shape=(n_topics, n_vocab))
    zs = [pm.Categorical("z_d{}".format(d), p=theta[d], shape=length_docs[d]) for d in range(n_docs)]
    ws = [pm.Categorical("w_{}_{}".format(d,i), p=phi[zs[d][i]], observed=idxed_collection[d][i])
          for d in range(n_docs) for i in range(length_docs[d])]
    trace = pm.sample(2000)
for d in range(n_docs):
    value_z=trace.get_values("z_d{}".format(d))
    print(value_z[1999])
Check out this blog post. I haven't tested it.
import numpy as np
import pymc as pc

def wordDict(collection):
    word_id = {}
    idCounter = 0
    for d in collection:
        for w in d:
            if (w not in word_id):
                word_id[w] = idCounter
                idCounter+=1
    return word_id

def toNpArray(word_id, collection):
    ds = []
    for d in collection:
        ws = []
        for w in d:
            ws.append(word_id.get(w,0))
        ds.append(ws)
    return np.array(ds)

###################################################
#doc1, doc2, ..., doc7
docs = [["sepak","bola","sepak","bola","bola","bola","sepak"],
        ["uang","ekonomi","uang","uang","uang","ekonomi","ekonomi"],
        ["sepak","bola","sepak","bola","sepak","sepak"],
        ["ekonomi","ekonomi","uang","uang"],
        ["sepak","uang","ekonomi"],
        ["komputer","komputer","teknologi","teknologi","komputer","teknologi"],
        ["teknologi","komputer","teknologi"]]
word_dict = wordDict(docs)
collection = toNpArray(word_dict,docs)
#number of topics
K = 3
#number of words (vocab)
V = len(word_dict)
#number of documents
D = len(collection)
#array([1, 1, 1, ..., 1]) K times
alpha = np.ones(K)
#array([1, 1, 1, ..., 1]) V times
beta = np.ones(V)
#array containing the information about doc length in our collection
Nd = [len(doc) for doc in collection]
######################## LDA model ##################################
#topic distribution per-document
theta = pc.Container([pc.CompletedDirichlet("theta_%s" % i,
                                            pc.Dirichlet("ptheta_%s"%i, theta=alpha))
                      for i in range(D)])
#word distribution per-topic
phi = pc.Container([pc.CompletedDirichlet("phi_%s" % j,
                                          pc.Dirichlet("pphi_%s" % j, theta=beta))
                    for j in range(K)])
#Please note that this is the tricky part :)
z = pc.Container([pc.Categorical("z_%i" % d,
                                 p = theta[d],
                                 size = Nd[d],
                                 value = np.random.randint(K, size=Nd[d]))
                  for d in range(D)])
#word generated from phi, given a topic z
w = pc.Container([pc.Categorical("w_%i_%i" % (d,i),
                                 p = pc.Lambda("phi_z_%i_%i" % (d,i),
                                               lambda z=z[d][i], phi=phi : phi[z]),
                                 value=collection[d][i],
                                 observed=True)
                  for d in range(D) for i in range(Nd[d])])
####################################################################
model = pc.Model([theta, phi, z, w])
mcmc = pc.MCMC(model)
mcmc.sample(iter=5000, burn=1000)
#show the topic assignment for each word, using the last trace
for d in range(D):
    print(mcmc.trace('z_%i'%d)[3999])