Count only a subset of motifs of size k - igraph

I want to count the motifs of size 4 in a tree graph:
library(igraph)
g <- barabasi.game(100)
census.motifs <- motifs(g, size=4)[c(4,8,13,30)]
There are 217 possible graphs with 4 vertices, but only 4 of them can appear in a directed rooted tree.
Is there a way to tell igraph that it only has to look for those 4? Or a faster/clever way to do this?

The four motifs in a directed rooted tree could be counted as k-instars using the ergm package http://svitsrv25.epfl.ch/R-doc/library/ergm/html/ergm-terms.html
A k-instar is a set of k nodes all sharing one common root. If n is the number of nodes in your tree, the counts for your 4 motifs will be the number of 3-instars (fully connected), (n-3) times the number of 2-instars (two edges connecting to root and one other node), (n-2) choose 2 times the number of 1-instars (one edge connecting to the the root and two other nodes), and n choose 4 minus the sum of the previous three counts. In R you could use,
library(intergraph)
library(ergm)
library(igraph)
n <- 100
g <- barabasi.game(n)
kistars <- summary(asNetwork(g)~istar(1:3))
kistars[3]
(n-3)*kistars[2]
choose(n-2,2)*kistars[1]
choose(n,4)*sum(kistars)

Related

How are matrices multiplied in Hierarchical Softmax model?

As I understood, the simple word2vec approach uses two matrices like the following:
Assuming that the corpus consists of N words.
Weighted input matrix (WI) with dimensions NxF (F is number of features).
Weighted output matrix (WO) with dimensions FxN.
We multiply one hot vector 1xN with WI and get a neurone 1xF.
Then we multiply the neurone with WO and get an output vector 1xN.
We apply softmax function and choose the highest entry (probability) in the vector.
Question: how is this illustrated when using the Hierarchical Softmax model?
What will be multiplied with which matrix to get the 2 dimensional vector that will lead to branch left or right?
P.S. I do understand the idea of the Hierarchical Softmax model using a binary tree and so on, but I don't know how the multiplications are done mathematically.
Thanks
To make things easy, assume that N is a power of 2. The binary tree will then have N-1 inner nodes. These nodes hook to WO with dimensions Fx(N-1).
Once you have computed a value for each inner node, calculate left and right branch values. Use something like a sigmoid function to assign to (say) the left branch. The right branch is just 1 minus the left.
To predict, find the maximum probability path starting from the root to a leaf.
To train, identify the correct leaf and identify the path of inner nodes to the root. Backpropagate starting with those log(N) nodes.

Estimate jaccard similarity mean

I am working with a network where I am trying to extract the mean value per vertexof the jaccard similarity. I am calculating this in R by using the igraph package. The similarity index estimates a value betweenn each two vertices. The network has 177 vertices, therefore 177 values. It may be easy, but I have not found out the best way to do it.
Sum the columns (or rows), subtract 1 (for the vertex similarity with itself), divide by n-1 rows (or columns)
library(igraph)
g <- make_ring(5)
m <-similarity(g, method="jaccard")
(colSums(m)-1)/(nrow(m)-1)
#[1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667

How to use the subset parameter of igraph's maximal.cliques function?

I have a large graph and I would to find the maximal clique involving a pair of vertices. I thought that the subset argument to igraph's maximal.clique function would do this, but either I'm using it wrong it or it does something completely different. I've spent a fair amount of time searching the web without luck.
Here's a minimal example showing the problem:
> library(igraph)
> packageVersion('igraph')
[1] ‘1.0.1’
> g = graph.empty(n=10, directed=FALSE)
> g = add.edges(g, c(1, 2))
> str(g)
IGRAPH U--- 10 1 --
+ edge:
[1] 1--2
> # This correctly results a clique.
> maximal.cliques(g, min=2)
[[1]]
+ 2/10 vertices:
[1] 2 1
> # These don't return anything!
> maximal.cliques(g, min=2, subset=1)
list()
> maximal.cliques(g, min=2, subset=c(1, 2))
list()
The subset argument is not for calculating the maximal cliques on a subset of the graph; it simply restricts the set of vertices that are used as starting points in the course of the Bron-Kerbosch algorithm when finding maximal cliques. The Bron-Kerbosch algorithm itself still searches the entire graph and is allowed to add or remove vertices from the current set that it considers as it pleases.
The only role of the subset argument is that it allows you to parallelize the maximal cliques computation on large graphs by partitioning the vertex set of the graph into a number of subsets and then running maximal.cliques on multiple CPUs or CPU cores with different subsets. It is not guaranteed that a maximal clique will be found if the starting subset includes any or all of its vertices; for instance, on my machine, the maximal clique 1--2 is found if I use a starting subset consisting of vertex 9 only:
> maximal.cliques(g, subset=c(9))
[[1]]
+ 2/10 vertices:
[1] 2 1
If you want to search for maximal cliques in a subgraph of the original graph, use induced_subgraph first, followed by max_cliques.

Problems with results when using betweenness with igraph

I am currently having problems with the results when using betweenness in igraph.
I have created the following network (which is a star network with node 1 in the center)
id <- c(1,1,1,1,1)
rv <- c(2,3,4,5,6)
df <- as.data.frame(cbind(id,rv))
Then I calculate the betweenness for each node and add it to a dataframe:
g3=graph.data.frame(df, directed=TRUE)
bFrame<-as.data.frame(as.table(betweenness(g3)))
The problem is: if you use directed=FALSE you will get that node 1 has a centrality of 10 which makes sense. If you on the other hand use directed=TRUE I get that 1 has a centrality of 0.
In consequence I have two questions:
1. Why is the centrality in the second condition 0?
2. Shouldn't it be 2* the value of the undirected condition? https://en.wikipedia.org/wiki/Betweenness_centrality
Thanks in advance
Pavel

igraph - How to calculate closeness method in iGraph to disconnected graphs

I use igraph in R for calculate graph measure, my graph make in a PIN that not Connected Graph and is Disconnected Graph.
closeness method for connected graph is good and right calculate, and for Disconnected graph in not Good!
library(igraph)
# Create of Graph Matrix for Test Closeness Centrality
g <- read.table(text="A B
1 2
2 4
3 4
3 5", header=TRUE)
gadj <- get.adjacency(graph.edgelist(as.matrix(g), directed=FALSE))
igObject <- graph.adjacency(gadj) # convert adjacency matrix to igraph object
gCloseness <- closeness(igObject,weights = NULL) # Assign Closeness to Variable for print
output :
[1] 0.1000000 0.1428571 0.1428571 0.1666667 0.1000000
my Disconnected Graph:
library(igraph)
# Create of Graph Matrix for Test Closeness Centrality
g <- read.table(text="A B
1 2
3 4
3 5", header=TRUE)
gadj <- get.adjacency(graph.edgelist(as.matrix(g), directed=FALSE))
igObject <- graph.adjacency(gadj) # convert adjacency matrix to igraph object
gCloseness <- closeness(igObject,weights = NULL) # Assign Closeness to Variable for print
output :
[1] 0.06250000 0.06250000 0.08333333 0.07692308 0.07692308
This output is Right ? and if right How to Calculate ?
Please read the documentation of the closeness function; it clearly states how igraph treats disconnected graphs:
If there is no (directed) path between vertex v and i then the total number of vertices is used in the formula instead of the path length.
The calculation then seems to be correct for me, although I would say that closeness centrality itself is not well-defined for disconnected graphs, and what igraph is using here is more of a hack (although a pretty standard hack) than a rigorous treatment of the problem. I would refrain from using closeness centrality on disconnected graphs.