Python modularity statistic using louvain package - igraph

I am trying to run Python's Louvain modularity partitioning on my network in batches, at least 100 times. I need to save the partitions as well as the modularity score of each partition. Is there any way to get both of them? I have looked at the documentation and examples, and they only return the partitions, not the modularity statistic. Please point me in the right direction; I would be extremely grateful for your help.

You can run the Louvain algorithm with igraph.Graph.community_multilevel() and then compute the modularity score with igraph.Graph.modularity(), applied to the membership of the resulting clustering:
import igraph
g = igraph.Graph.Erdos_Renyi(n=100, p=0.1)
clusters = g.community_multilevel()
modularity_score = g.modularity(clusters.membership)

Here is how to estimate the modularity Q using the Louvain algorithm in igraph.
Important: if your graph is built from a weighted adjacency matrix, DO NOT FORGET the weights=graph.es['weight'] argument in the functions below!
import numpy as np
from igraph import *
W = np.random.rand(10,10)
np.fill_diagonal(W,0.0)
graph = Graph.Weighted_Adjacency(W.tolist(), mode=ADJ_UNDIRECTED, attr="weight", loops=False)
louvain_partition = graph.community_multilevel(weights=graph.es['weight'], return_levels=False)
modularity1 = graph.modularity(louvain_partition, weights=graph.es['weight'])
print("The modularity Q based on igraph is {}".format(modularity1))
The modularity Q based on igraph is 0.02543865641866777
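To address the batch part of the question, here is a minimal sketch of running the partitioning 100 times and saving both the membership vectors and the modularity scores. The example graph and file names are illustrative; add the weights= argument as noted above if your graph is weighted.
import numpy as np
from igraph import Graph

g = Graph.Erdos_Renyi(n=100, p=0.1)  # replace with your own graph

memberships = []   # one membership list (community label per vertex) per run
modularities = []  # one modularity score Q per run
for run in range(100):
    clusters = g.community_multilevel()                 # Louvain partition
    memberships.append(clusters.membership)             # save the partition
    modularities.append(g.modularity(clusters.membership))  # save its modularity

np.save("memberships.npy", np.array(memberships))
np.save("modularities.npy", np.array(modularities))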

Related

Can neural network converge to completely random function

I am trying to train a DNN to converge to a random function (i.e., targets drawn from a normal distribution), but for now the network doesn't learn anything and the loss is stuck. Is it even possible, or am I just wasting my time?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense
import numpy as np
import matplotlib.pyplot as plt
n_hidden_units = 25
num_lay = 10
learning_rate = 0.01
batch_size = 1000
epochs = 1000
save_freq_epoches = 500000
num_of_xs = 2
inputs_train = np.random.randn(batch_size*10,num_of_xs)*1
outputs_train = np.random.randn(batch_size*10,1)#np.sum(inputs_train,axis=1)#
inputs_train = tf.convert_to_tensor(inputs_train)
outputs_train = tf.convert_to_tensor((outputs_train-outputs_train.min())/(outputs_train.max()-outputs_train.min()))
kernel_init = keras.initializers.RandomUniform(-0.25, 0.25)
inputs = Input(num_of_xs)
x = Dense(n_hidden_units, kernel_initializer=kernel_init, activation='relu', )(inputs)
for _ in range(num_lay):
    x = Dense(n_hidden_units, kernel_initializer=kernel_init, activation='relu')(x)
outputs = Dense(1, kernel_initializer=kernel_init, activation='linear')(x)
model = Model(inputs=inputs, outputs=outputs)
optimizer1 = keras.optimizers.Adam(beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0,
                                   amsgrad=True, learning_rate=learning_rate)
model.compile(loss='mse', optimizer=optimizer1, metrics=None)
model.fit(inputs_train, outputs_train, batch_size=batch_size, epochs=epochs, shuffle=False,
          verbose=2)
plt.plot(outputs_train,'ob')
plt.plot(model(inputs_train),'*r')
plt.show()
For now I am getting the worst predictions (in red) relative to the target labels (blue)
If you are using a validation split, you can't. Otherwise you can, but it will be hard, since good pipelines include regularization techniques that try to prevent exactly this from happening.
Your target distribution is given by
np.random.randn(batch_size*10,1)
Then normalized to:
(outputs_train-outputs_train.min())/(outputs_train.max()-outputs_train.min())
As you can see, your targets are completely independent of your input x! So, if you have to predict the value y for a previously unseen x, there is literally nothing you can do better than simply predicting the mean value of y.
In other words, your target distribution is a flat line y = avg + noise.
Your question is then: can the network predict this extra noise? Well, no, that's why we call it noise, because it is the random deviations from the pattern that are completely unrelated to the input info that we feed the network.
BUT.
If you do NOT use validation (that is, you are interested in the prediction error with respect to the {x, y} pairs that you see during training) then the network will learn noise, up to its full prediction capacity (the more complex the network, the more it can adapt to complex noise). This is precisely what we call overfitting, and it is a BAD thing!
Normally we want models to predict something like "y = x * 2 + 3", whereas learning noise is more like learning a dictionary of unrelated predictions: "{x1: 2.93432, x2: -0.00324, ...}"
Because overfitting is bad (it is bad because it makes predictions for unseen validation data worse, which means our models perform worse on new data), pipelines have built-in techniques to fight the natural tendency of neural networks to do this. Such techniques include data augmentation (common in images), early stopping, dropout, and so on.
If you REALLY need to overfit to your data, you will need to deactivate any such techniques, and train for as long as you can (which is normally not something we want to do!).
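As a quick sanity check of the "predict the mean" point above, here is a small self-contained sketch (not tied to the model in the question) showing that the MSE of always predicting the mean of the normalized targets equals their variance, which is roughly where the loss should plateau on data the network cannot memorize:
import numpy as np

y = np.random.randn(10000, 1)                 # random targets, independent of any input
y = (y - y.min()) / (y.max() - y.min())       # same min-max normalization as in the question

baseline_pred = y.mean()                      # best constant predictor under MSE
baseline_mse = np.mean((y - baseline_pred) ** 2)
print("variance of targets:", y.var())
print("MSE of predicting the mean:", baseline_mse)  # identical to the variance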

Proper way to extract embedding weights for CBOW model?

I'm currently trying to implement the CBOW model and managed to get training and testing working, but I'm facing some confusion as to the "proper" way to finally extract the weights from the model to use as our word embeddings.
Model
import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, config, vocab):
        super().__init__()  # required so nn.Module registers the submodules
        self.config = config  # Basic config file to hold arguments.
        self.vocab = vocab
        self.vocab_size = len(self.vocab.token2idx)
        self.window_size = self.config.window_size
        self.embed = nn.Embedding(num_embeddings=self.vocab_size, embedding_dim=self.config.embed_dim)
        self.linear = nn.Linear(in_features=self.config.embed_dim, out_features=self.vocab_size)

    def forward(self, x):
        x = self.embed(x)
        x = torch.mean(x, dim=0)  # Average out the embedding values.
        x = self.linear(x)
        return x
Main process
After running my model through a Solver with the training and testing data, I basically told the train and test functions to also return the model that was used. Then I assigned the embedding weights to a separate variable and used those as the word embeddings.
Training and testing was conducted using cross entropy loss, and each training and testing sample is of the form ([context words], target word).
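For reference, here is a minimal sketch of how ([context words], target word) pairs can be built from a token list with a symmetric window; the tokenization and window size are illustrative assumptions, not taken from my actual pipeline:
def make_cbow_pairs(tokens, window_size=2):
    # For each position, collect the surrounding tokens as context and the token itself as target.
    pairs = []
    for i in range(window_size, len(tokens) - window_size):
        context = tokens[i - window_size:i] + tokens[i + 1:i + window_size + 1]
        pairs.append((context, tokens[i]))
    return pairs

print(make_cbow_pairs(["the", "quick", "brown", "fox", "jumps"], window_size=2))
# [(['the', 'quick', 'fox', 'jumps'], 'brown')]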
def run(solver, config, vocabulary):
    for epoch in range(config.num_epochs):
        loss_train, model_train = solver.train()
        loss_test, model_test = solver.test()
    embeddings = model_train.embed.weight
I'm not sure if this is the correct way of going about extracting and using the embeddings. Is there usually another way to do this? Thanks in advance.
Yes, model_train.embed.weight will give you a torch tensor that stores the embedding weights. Note however, that this tensor also contains the latest gradients. If you don't want/need them, model_train.embed.weight.data will give you the weights only.
A more generic option is to call model_train.embed.parameters(). This will give you a generator of all the weight tensors of the layer. In general, there are multiple weight tensors in a layer and weight will give you only one of them. Embedding happens to have only one, so here it doesn't matter which option you use.
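Putting those options together, here is a short sketch of pulling the embeddings out as a plain NumPy array; the detach/cpu/numpy calls are standard PyTorch, and the variable names follow the question's code:
# Option 1: the weight tensor itself (still attached to the autograd graph).
weights = model_train.embed.weight

# Option 2: just the values, without gradient bookkeeping.
weights_data = model_train.embed.weight.data

# Option 3: via the parameter generator (Embedding has a single weight tensor).
weights_from_params = next(model_train.embed.parameters())

# For downstream use (e.g. nearest-neighbour lookups), a detached NumPy copy is convenient:
embedding_matrix = model_train.embed.weight.detach().cpu().numpy()  # shape: (vocab_size, embed_dim)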

Faster way to do multiple embeddings in PyTorch?

I'm working on a torch-based library for building autoencoders with tabular datasets.
One big feature is learning embeddings for categorical features.
In practice, however, training many embedding layers simultaneously is creating some slowdowns. I am using for-loops to do this and running the for-loop on each iteration is (I think) what's causing the slowdowns.
When building the model, I associate embedding layers with each categorical feature in the user's dataset:
for ft in self.categorical_fts:
    feature = self.categorical_fts[ft]
    n_cats = len(feature['cats']) + 1
    embed_dim = compute_embedding_size(n_cats)
    embed_layer = torch.nn.Embedding(n_cats, embed_dim)
    feature['embedding'] = embed_layer
Then, with a call to .forward():
embeddings = []
for i, ft in enumerate(self.categorical_fts):
    feature = self.categorical_fts[ft]
    emb = feature['embedding'](codes[i])
    embeddings.append(emb)
# num and bin are lists of numeric and binary feature tensors
x = torch.cat(num + bin + embeddings, dim=1)
Then x goes into dense layers.
This gets the job done but running this for loop during each forward pass really slows down training, especially when a dataset has tens or hundreds of categorical columns.
Does anybody know of a way of vectorizing something like this? Thanks!
UPDATE:
For more clarity, I made this sketch of how I'm feeding categorical features into the network. You can see that each categorical column has its own embedding matrix, while numeric features are concatenated directly to their output before being passed into the feed-forward network.
Can we do this without iterating through each embedding matrix?
Just use simple indexing with [], though I'm not sure whether it is fast enough.
Here is a simplified version where all features share the same vocab_size and embedding dim, but the same idea should apply to heterogeneous categorical features.
import numpy as np
import torch

xdim = 240        # number of categorical columns
embed_dim = 8
vocab_size = 64
embedding_table = torch.randn(size=(xdim, vocab_size, embed_dim))  # one table per column, stacked
batch_size = 32
x = torch.randint(vocab_size, size=(batch_size, xdim))
out = embedding_table[torch.arange(xdim), x]
out.shape  # (batch_size, xdim, embed_dim)

# unit test: pick a random (row, column) and check against a manual lookup
i = np.random.randint(batch_size)
j = np.random.randint(xdim)
x_index = x[i][j]
w = embedding_table[j]
torch.allclose(w[x_index], out[i, j])
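For columns with different vocabulary sizes, one common workaround is to pack all categories into a single nn.Embedding and shift each column's codes by a per-column offset, so that one vectorized lookup replaces the per-column loop. A sketch of that idea follows; the sizes and names are illustrative assumptions, not code from the question's library:
import torch
import torch.nn as nn

n_cats_per_col = [5, 17, 3]            # vocabulary size of each categorical column
offsets = torch.tensor([0, 5, 22])     # cumulative start index of each column in the shared table
total_cats = sum(n_cats_per_col)       # 25
embed_dim = 8

shared_embedding = nn.Embedding(total_cats, embed_dim)

batch_size = 4
# codes[:, j] holds the raw category index of column j for each row
codes = torch.stack([torch.randint(n, (batch_size,)) for n in n_cats_per_col], dim=1)

# One vectorized lookup: shape (batch_size, n_columns, embed_dim)
emb = shared_embedding(codes + offsets)
x = emb.reshape(batch_size, -1)        # flatten to concatenate, as in the original code
The trade-off is that every column now shares the same embedding dimension, whereas the original code sizes each embedding per column.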

How to print/return the softmax score in Keras during training?

Question: How do I print/return the softmax layer for a multiclass problem using Keras?
My motivation: it is important for visualization/debugging.
It is important to do this in the 'training' setting, i.e., batch normalization and dropout must behave as they do at train time.
It should be efficient. Calling vanilla model.predict() every now and then is less desirable, as the model I am using is heavy and these are extra forward passes. The most desirable option is a way to simply display the original network output that was already computed during training.
It is OK to assume that this is done with TensorFlow as the backend.
Thank you.
You can get the outputs of any layer by using: model.layers[index].output
For all layers use this:
from keras import backend as K
import numpy as np

inp = model.input                                            # input placeholder
outputs = [layer.output for layer in model.layers]           # all layer outputs
functor = K.function([inp] + [K.learning_phase()], outputs)  # evaluation function

# Testing
test = np.random.random(input_shape)[np.newaxis, ...]
layer_outs = functor([test, 1.])  # learning phase 1 = training behaviour for BN/dropout
print(layer_outs)
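If you only need the softmax scores rather than every intermediate output, the same pattern can be restricted to a single layer. A sketch, assuming the softmax is the last layer of your model (adjust the index otherwise):
softmax_fn = K.function([model.input, K.learning_phase()], [model.layers[-1].output])
softmax_scores = softmax_fn([test, 1.])[0]  # learning phase 1 keeps batch norm/dropout in training mode
print(softmax_scores)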

Tune input features using backprop in keras

I am trying to implement discriminant condition codes in Keras as proposed in
Xue, Shaofei, et al., "Fast adaptation of deep neural network based
on discriminant codes for speech recognition."
The main idea is that you encode each condition as an input parameter and let the network learn the dependency between the condition and the feature-label mapping. On a new dataset, instead of adapting the entire network you just tune these weights using backprop. For example, say my network looks like this:
X ----->|-----|
        | DNN |----> Y
Z ----->|-----|
X: features, Y: labels, Z: condition codes
Now given a pretrained DNN, and X',Y' on a new dataset I am trying to estimate the Z' using backprop that will minimize prediction error on Y'. The math seems straightforward except I am not sure how to implement this in keras without having access to the backprop itself.
For instance, can I add an Input() layer with trainable=True while all other layers are set to trainable=False? Can backprop in Keras update more than just layer weights? Or is there a way to hack Keras layers to do this?
Any suggestions welcome.
Thanks.
I figured out how to do this (exactly) in Keras by looking at fchollet's post here
Using the keras backend I was able to compute the gradient of my loss w.r.t to Z directly and used it to drive the update.
Code below:
import keras.backend as K
import numpy as np

model.summary()  # Pretrained model
loss = K.categorical_crossentropy(Y, Y_out)
grads = K.gradients(loss, Z)[0]                    # gradient of the loss w.r.t. the condition codes
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)  # normalize the gradient
iterate = K.function([X, Z], [loss, grads])

step = 0.1
Z_adapt = Z_in.copy()
for i in range(100):
    loss_val, grads_val = iterate([X_in, Z_adapt])
    Z_adapt -= grads_val * step
    print("iter:", i, np.mean(loss_val))

print("Before:")
print(model.evaluate([X_in, Z_in], Y_out))
print("After:")
print(model.evaluate([X_in, Z_adapt], Y_out))
X, Y, Z are nodes in the model graph. Z_in is an initial value for Z'; I set it to an average value from the training set. Z_adapt is the result after 100 iterations of gradient descent and should give you a better fit.
Assume that the size of Z is m x n. Then you can first define an input layer of size m * n x 1, whose input will be an m * n x 1 vector of ones. Define a dense layer containing m * n neurons with trainable = True; the response of this layer gives you a flattened version of Z. Reshape it appropriately and feed it as input to the rest of the network, which can be appended after this layer.
Keep in mind that if the size of Z is too large, the network may not be able to learn a dense layer with that many neurons. In that case, you may need to put additional constraints on it or look into convolutional layers; however, convolutional layers will put some constraints on Z.
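A minimal sketch of this second idea in Keras; the shapes, feature_dim, layer names, and the frozen pretrained_model (assumed to take [X, Z] as inputs) are illustrative assumptions, not code from the question:
from keras.layers import Input, Dense, Reshape
from keras.models import Model

m, n = 4, 8                                   # assumed size of Z
feature_dim = 40                              # assumed dimensionality of X

ones_in = Input(shape=(m * n,))               # fed with a constant vector of ones
z_flat = Dense(m * n, use_bias=False, trainable=True, name="learned_z")(ones_in)
z = Reshape((m, n))(z_flat)                   # recover the m x n condition-code matrix

x_in = Input(shape=(feature_dim,))            # the regular feature input X'

pretrained_model.trainable = False            # freeze the adapted network
y_out = pretrained_model([x_in, z])           # only the "learned_z" Dense layer is updated

adapt_model = Model(inputs=[x_in, ones_in], outputs=y_out)
adapt_model.compile(optimizer="adam", loss="categorical_crossentropy")
# adapt_model.fit([X_new, np.ones((len(X_new), m * n))], Y_new, ...)  # trains only the condition codes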