I am having problems with the results of betweenness in igraph.
I have created the following network (a star network with node 1 in the center):
id <- c(1,1,1,1,1)
rv <- c(2,3,4,5,6)
df <- as.data.frame(cbind(id,rv))
Then I calculate the betweenness for each node and add it to a dataframe:
g3=graph.data.frame(df, directed=TRUE)
bFrame<-as.data.frame(as.table(betweenness(g3)))
The problem is: with directed=FALSE, node 1 gets a centrality of 10, which makes sense. With directed=TRUE, on the other hand, node 1 gets a centrality of 0.
So I have two questions:
1. Why is the centrality 0 in the directed case?
2. Shouldn't it be 2 * the value of the undirected case? https://en.wikipedia.org/wiki/Betweenness_centrality
Thanks in advance
Pavel
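One rough way to see what happens, as a base-R sketch rather than igraph's actual algorithm: in a star, the only shortest paths that can pass through node 1 are two-step paths leaf -> 1 -> leaf, so it is enough to count pairs of an in-neighbour and an out-neighbour of node 1. The helper name paths_through is made up for this illustration.

```r
# Star with centre 1 and leaves 2..6, edges directed 1 -> leaf (as in the question).
n <- 6
A <- matrix(0, n, n)
A[1, 2:6] <- 1

# Count ordered pairs (s, t), s != t, joined by a two-step path s -> v -> t.
# Hypothetical helper; it only equals betweenness for a star, where each such
# path is the unique shortest path between two leaves.
paths_through <- function(A, v) {
  src <- which(A[, v] == 1)  # vertices with an edge into v
  dst <- which(A[v, ] == 1)  # vertices with an edge out of v
  length(src) * length(dst) - length(intersect(src, dst))
}

directed_out <- paths_through(A, 1)             # no edge points into node 1
undirected   <- paths_through(A + t(A), 1) / 2  # each unordered pair counted once
mutual       <- paths_through(A + t(A), 1)      # directed graph with edges both ways

print(c(directed_out = directed_out, undirected = undirected, mutual = mutual))
```

With every edge pointing away from node 1, no vertex can reach node 1, so no directed path runs through it and the count is 0. The doubling expected in question 2 would only show up if each edge existed in both directions.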
I want to count the motifs of size 4 in a tree graph:
library(igraph)
g <- barabasi.game(100)
census.motifs <- motifs(g, size=4)[c(4,8,13,30)]
There are 218 possible directed graphs with 4 vertices, but only 4 of them can appear in a directed rooted tree.
Is there a way to tell igraph that it only has to look for those 4? Or a faster/clever way to do this?
The four motifs in a directed rooted tree could be counted as k-instars using the ergm package http://svitsrv25.epfl.ch/R-doc/library/ergm/html/ergm-terms.html
A k-instar is a set of k nodes all sharing one common root. If n is the number of nodes in your tree, the counts for your 4 motifs will be the number of 3-instars (fully connected), (n-3) times the number of 2-instars (two edges connecting to the root and one other node), (n-2) choose 2 times the number of 1-instars (one edge connecting to the root and two other nodes), and n choose 4 minus the sum of the previous three counts. In R you could use:
library(intergraph)
library(ergm)
library(igraph)
n <- 100
g <- barabasi.game(n)
kistars <- summary(asNetwork(g)~istar(1:3))
kistars[3]
(n-3)*kistars[2]
choose(n-2,2)*kistars[1]
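# n choose 4 minus the sum of the previous three counts
choose(n,4) - (kistars[3] + (n-3)*kistars[2] + choose(n-2,2)*kistars[1])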
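If installing ergm and intergraph just for this is too heavy, the k-instar counts themselves can be sketched from in-degrees alone. This assumes (as I understand the ergm term) that istar(k) counts, for every node, the number of ways to pick k of its incoming edges. The toy edge list below is invented for illustration; edges point from child to parent, as barabasi.game produces them.

```r
# Toy rooted tree, one row per edge (from, to): 2->1, 3->1, 4->1, 5->2.
edges <- matrix(c(2, 1,
                  3, 1,
                  4, 1,
                  5, 2), ncol = 2, byrow = TRUE)
n <- 5

# In-degree of every vertex (number of edges pointing at it).
indeg <- tabulate(edges[, 2], nbins = n)

# Number of k-instars for k = 1, 2, 3: sum over vertices of choose(indegree, k).
kistars <- sapply(1:3, function(k) sum(choose(indeg, k)))
print(kistars)
```

For a real igraph object, the in-degree vector could come from degree(g, mode = "in") instead of tabulate.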
This question seems pretty stupid but I actually fail to find a simple solution to this. I have a csv file that is structured like this:
0 21 34.00 34.00
1 23 35.00 25.00
2 25 45.00 65.00
The first column is the node's id, the second is an unimportant attribute. The 3rd and 4th columns are supposed to be the x and y positions of the nodes.
I can import the file into the Data Laboratory without problems, but I cannot get Gephi to use the x and y attributes as the corresponding node properties. All I want is for Gephi to set the x property to the value of the x attribute (and likewise for y).
Thanks for your help!
In the Layout window, you can select "Geo Layout" and define which columns are used as Latitude and Longitude.
The projection might look odd if you do not actually have geographic data, but for me, this is fine.
In Gephi 0.8 there was a plugin called Recast column. Unfortunately this plugin has not been ported to Gephi 0.9 yet, but it allowed you to set standard (hidden) columns in the node table from visible values in the node table. Thus, if you have two columns of type Float or Decimal that represent your coordinates, you could set the coordinate values of your nodes.
I have a large graph and I would like to find the maximal clique involving a pair of vertices. I thought that the subset argument to igraph's maximal.cliques function would do this, but either I'm using it wrong or it does something completely different. I've spent a fair amount of time searching the web without luck.
Here's a minimal example showing the problem:
> library(igraph)
> packageVersion('igraph')
[1] ‘1.0.1’
> g = graph.empty(n=10, directed=FALSE)
> g = add.edges(g, c(1, 2))
> str(g)
IGRAPH U--- 10 1 --
+ edge:
[1] 1--2
> # This correctly returns a clique.
> maximal.cliques(g, min=2)
[[1]]
+ 2/10 vertices:
[1] 2 1
> # These don't return anything!
> maximal.cliques(g, min=2, subset=1)
list()
> maximal.cliques(g, min=2, subset=c(1, 2))
list()
The subset argument is not for calculating the maximal cliques on a subset of the graph; it simply restricts the set of vertices that are used as starting points in the course of the Bron-Kerbosch algorithm when finding maximal cliques. The Bron-Kerbosch algorithm itself still searches the entire graph and is allowed to add or remove vertices from the current set that it considers as it pleases.
The only role of the subset argument is that it allows you to parallelize the maximal cliques computation on large graphs by partitioning the vertex set of the graph into a number of subsets and then running maximal.cliques on multiple CPUs or CPU cores with different subsets. It is not guaranteed that a maximal clique will be found if the starting subset includes any or all of its vertices; for instance, on my machine, the maximal clique 1--2 is found if I use a starting subset consisting of vertex 9 only:
> maximal.cliques(g, subset=c(9))
[[1]]
+ 2/10 vertices:
[1] 2 1
If you want to search for maximal cliques in a subgraph of the original graph, use induced_subgraph first, followed by max_cliques.
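For the original goal (maximal cliques containing a given pair of vertices), one possible sketch along these lines: restrict the graph to the pair plus their common neighbours, then list the maximal cliques of that subgraph. This assumes the two vertices are adjacent; the example graph and the variable names u, v, common and pair_cliques are made up for illustration.

```r
library(igraph)

# Small example graph: triangle 1-2-3 plus a pendant edge 3-4.
g <- make_graph(c(1, 2, 1, 3, 2, 3, 3, 4), directed = FALSE)
V(g)$name <- as.character(seq_len(vcount(g)))  # keep original ids after subsetting

u <- 1
v <- 2
# A clique containing both u and v can only use their common neighbours.
common <- intersect(as.integer(neighbors(g, u)), as.integer(neighbors(g, v)))
sub <- induced_subgraph(g, c(u, v, common))

# Maximal cliques of the subgraph, reported with the original vertex ids.
pair_cliques <- lapply(max_cliques(sub),
                       function(cl) sort(as.integer(as_ids(cl))))
print(pair_cliques)
```

Because u and v are connected to every other vertex of the subgraph, each maximal clique found there contains both of them, and it cannot be extended in the full graph without leaving the common neighbourhood.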
I'm trying to measure the Jaccard distance between tweets in a dataset.
The dataset is here:
http://www3.nd.edu/~dwang5/courses/spring15/assignments/A2/Tweets.json
I've tried a few things to measure the distance.
This is what I have so far.
I saved the linked dataset to a file called Tweets.json:
json_alldata <- fromJSON(sprintf("[%s]", paste(readLines(file("Tweets.json")),collapse=",")))
Then I converted json_alldata to tweet.features and got rid of the geo column
# get rid of geo column
tweet.features = json_alldata
tweet.features$geo <- NULL
These are what the first two tweets look like
tweet.features$text[1]
[1] "RT #ItsJennaMarbles: Reports of Marathon Runners that crossed finish line and continued to run to Mass General Hospital to give blood to victims. #PrayforBoston"
> tweet.features$text[2]
[1] "RT #NBCSN: Reports of Marathon Runners that crossed finish line and continued to run to Mass General Hospital to give blood to victims #PrayforBoston"
First thing I tried was using the method stringdist which is under the stringdist library
install.packages("stringdist")
library(stringdist)
#This works?
#
stringdist(tweet.features$text[1], tweet.features$text[2], method = "jaccard")
When I run that, I get
[1] 0.1621622
I'm not sure that's correct, though. A intersection B = 23, and A union B = 25. The Jaccard distance is A intersection B/A union B -- right? So by my calculation, the Jaccard distance should be 0.92?
So I figured I could do it by sets. Simply calculate intersection and union and divide
This is what I tried
# Jaccard distance is the intersection of A and B divided by the Union of A and B
#
#create set for First Tweet
A1 <- as.set(tweet.features$text[1])
A2 <- as.set(tweet.features$text[2])
When I try the intersection, the output is just list():
Intersection <- intersect(A1, A2)
list()
When I try Union, I get this:
union(A1, A2)
[[1]]
[1] "RT #ItsJennaMarbles: Reports of Marathon Runners that crossed finish line and continued to run to Mass General Hospital to give blood to victims. #PrayforBoston"
[[2]]
[1] "RT #NBCSN: Reports of Marathon Runners that crossed finish line and continued to run to Mass General Hospital to give blood to victims #PrayforBoston"
This doesn't seem to be grouping the words into a single set.
I figured I'd be able to divide the intersection by the union. But I guess I would need the program to count the number of words in each set, then do the calculations.
Needless to say, I'm a bit stuck and I'm not sure if I'm on the right track.
Any help would be appreciated. Thank you.
intersect and union expect vectors (as.set is not part of base R). I think you want to compare words, so you can use strsplit, but how exactly to split is up to you. An example below:
tweet.features <- list(tweet1="RT #ItsJennaMarbles: Reports of Marathon Runners that crossed finish line and continued to run to Mass General Hospital to give blood to victims. #PrayforBoston",
tweet2= "RT #NBCSN: Reports of Marathon Runners that crossed finish line and continued to run to Mass General Hospital to give blood to victims #PrayforBoston")
jaccard_i <- function(tw1, tw2){
tw1 <- unlist(strsplit(tw1, " |\\."))
tw2 <- unlist(strsplit(tw2, " |\\."))
i <- length(intersect(tw1, tw2))
u <- length(union(tw1, tw2))
list(i=i, u=u, j=i/u)
}
jaccard_i(tweet.features[[1]], tweet.features[[2]])
$i
[1] 20
$u
[1] 23
$j
[1] 0.8695652
Is this what you want?
Here strsplit splits on every space or dot. You may want to refine the split argument of strsplit and replace " |\\." with something more specific (see ?regex).
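One more note on the numbers in the question: the ratio intersection/union is usually called the Jaccard similarity, and the distance is 1 minus that. Also, as far as I know, stringdist with method = "jaccard" compares sets of q-grams (single characters by default) rather than words, which would explain why its result differs from a word-based one. A minimal base-R sketch of both variants, with made-up helper names and short example strings:

```r
# Jaccard distance between two sets given as vectors: 1 - |A n B| / |A u B|.
jaccard_distance <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical tokenisers: split into words (dropping empty tokens) or characters.
to_words <- function(x) {
  w <- unlist(strsplit(x, " |\\."))
  w[nzchar(w)]
}
to_chars <- function(x) unlist(strsplit(x, ""))

s1 <- "the cat sat"
s2 <- "the cat ran"

word_dist <- jaccard_distance(to_words(s1), to_words(s2))  # sets of words
char_dist <- jaccard_distance(to_chars(s1), to_chars(s2))  # sets of characters
print(c(word = word_dist, char = char_dist))
```

The two strings share 2 of 4 distinct words but 6 of 9 distinct characters, so the word-based and character-based distances differ (0.5 versus 1/3).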
I use igraph in R to calculate graph measures. My graph is built from a PIN and is a disconnected graph, not a connected one.
The closeness method gives correct results for a connected graph, but the results for a disconnected graph do not look right.
library(igraph)
# Create a graph matrix to test closeness centrality
g <- read.table(text="A B
1 2
2 4
3 4
3 5", header=TRUE)
gadj <- get.adjacency(graph.edgelist(as.matrix(g), directed=FALSE))
igObject <- graph.adjacency(gadj) # convert adjacency matrix to igraph object
gCloseness <- closeness(igObject,weights = NULL) # Assign Closeness to Variable for print
output :
[1] 0.1000000 0.1428571 0.1428571 0.1666667 0.1000000
My disconnected graph:
library(igraph)
# Create a graph matrix to test closeness centrality
g <- read.table(text="A B
1 2
3 4
3 5", header=TRUE)
gadj <- get.adjacency(graph.edgelist(as.matrix(g), directed=FALSE))
igObject <- graph.adjacency(gadj) # convert adjacency matrix to igraph object
gCloseness <- closeness(igObject,weights = NULL) # Assign Closeness to Variable for print
output :
[1] 0.06250000 0.06250000 0.08333333 0.07692308 0.07692308
Is this output right? And if it is, how is it calculated?
Please read the documentation of the closeness function; it clearly states how igraph treats disconnected graphs:
If there is no (directed) path between vertex v and i then the total number of vertices is used in the formula instead of the path length.
The calculation then seems to be correct to me, although I would say that closeness centrality itself is not well-defined for disconnected graphs, and what igraph is using here is more of a hack (although a pretty standard hack) than a rigorous treatment of the problem. I would refrain from using closeness centrality on disconnected graphs.
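To check the numbers by hand, here is a small base-R sketch of the rule quoted above: breadth-first search for the path lengths, with the number of vertices substituted for every unreachable vertex. The function names bfs_dist and closeness_rule are invented for this illustration, and this is not igraph's code; other igraph releases may treat disconnected graphs differently.

```r
# Disconnected example from the question: edges 1-2, 3-4, 3-5 (undirected).
edges <- matrix(c(1, 2,
                  3, 4,
                  3, 5), ncol = 2, byrow = TRUE)
n <- 5
adj <- matrix(FALSE, n, n)
adj[edges] <- TRUE          # mark each edge ...
adj[edges[, 2:1]] <- TRUE   # ... in both directions

# Breadth-first search: path lengths from `source`, NA where unreachable.
bfs_dist <- function(adj, source) {
  dist <- rep(NA_integer_, nrow(adj))
  dist[source] <- 0L
  frontier <- source
  while (length(frontier) > 0) {
    nxt <- integer(0)
    for (v in frontier) {
      nb <- which(adj[v, ] & is.na(dist))  # unvisited neighbours of v
      dist[nb] <- dist[v] + 1L
      nxt <- c(nxt, nb)
    }
    frontier <- nxt
  }
  dist
}

# Closeness under the quoted rule: unreachable vertices count as distance n.
closeness_rule <- function(adj) {
  n <- nrow(adj)
  sapply(seq_len(n), function(v) {
    d <- bfs_dist(adj, v)
    d[is.na(d)] <- n
    1 / sum(d[-v])
  })
}

result <- closeness_rule(adj)
print(result)
```

For vertex 1 this gives 1 / (1 + 5 + 5 + 5) = 1/16 = 0.0625, and for vertex 3 it gives 1 / (5 + 5 + 1 + 1) = 1/12, matching the first and third values in the question's output.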