erdos.renyi.game input parameter issues - igraph

According to the help file
http://www.inside-r.org/packages/cran/igraph/docs/erdos.renyi.game
the erdos.renyi.game function is supposed to accept n, the number of vertices in the graph, and m, the number of edges in the graph, as input parameters.
The dataset I am working with has 6 vertices and 25 edges, so when I try
g = erdos.renyi.game(6, 25)
I get an error:
Error in .Call("R_igraph_erdos_renyi_game", as.numeric(n), as.numeric(type1), :
At games.c:569 : Invalid probability given, Invalid value
Not sure where I am going wrong; I would appreciate any advice on this topic.

Just use erdos.renyi.game(6, 25, type="gnm") and it will work. You have to state explicitly that the second parameter is the number of edges m and not the probability p. One more catch: an undirected simple graph on 6 vertices can have at most choose(6, 2) = 15 edges, so to draw 25 edges you also need directed = TRUE, i.e. erdos.renyi.game(6, 25, type="gnm", directed=TRUE).

erdos.renyi.game(n, p.or.m, type = c("gnp", "gnm"),
                 directed = FALSE, loops = FALSE, ...)
n: The number of vertices in the graph.
p.or.m: Either the probability for drawing an edge between two arbitrary vertices (G(n,p) graph), or the number of edges in the graph (for G(n,m) graphs).
type: The type of the random graph to create, either "gnp" (G(n,p) graph) or "gnm" (G(n,m) graph).
directed: Logical, whether the graph will be directed; defaults to FALSE.
loops: Logical, whether to add loop edges; defaults to FALSE.
...: Additional arguments, ignored.
Details
In G(n,p) graphs, the graph has ‘n’ vertices and each possible edge is present in the graph independently with probability ‘p’.
In G(n,m) graphs, the graph has ‘n’ vertices and ‘m’ edges, and the ‘m’ edges are chosen uniformly at random from the set of all possible edges. This set includes loop edges as well if the loops parameter is TRUE. random.graph.game is an alias to this function.
Example:
g <- erdos.renyi.game(1000, 1/1000)
degree.distribution(g)
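
For comparison, a minimal sketch of the same two models in the Python bindings (this assumes the python-igraph package, where Graph.Erdos_Renyi plays the role of erdos.renyi.game):

from igraph import Graph

# G(n,m): exactly 25 edges; directed=True is needed here because an
# undirected simple graph on 6 vertices has at most 15 edges
g_nm = Graph.Erdos_Renyi(n=6, m=25, directed=True)

# G(n,p): each possible edge is present independently with probability p
g_np = Graph.Erdos_Renyi(n=1000, p=1/1000)

print(g_nm.ecount())  # always 25
print(g_np.ecount())  # varies, roughly 500 on average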

Related

Do the multiple heads in multi-head attention actually lead to more parameters or different outputs?

I am trying to understand Transformers. While I understand the concept of the encoder-decoder structure and the idea behind self-attention, what I am stuck on is the "multi-head" part of the MultiheadAttention layer.
Looking at this explanation https://jalammar.github.io/illustrated-transformer/, which I generally found very good, it appears that multiple weight matrices (one set of weight matrices per head) are used to transform the original input into the query, key and value, which are then used to calculate the attention scores and the actual output of the multi-head attention layer. I also understand the idea behind multiple heads: the individual attention heads can focus on different parts (as depicted in the link).
However, this seems to contradict other observations I have made:
In the original paper https://arxiv.org/abs/1706.03762, it is stated that the input is split into parts of equal size per attention head.
So, for example I have:
batch_size = 1
sequence_length = 12
embed_dim = 512 (I assume that the dimensions for query, key and value are equal)
Then the shape of my query, key and value would each be [1, 12, 512].
We assume we have two heads, so num_heads = 2.
This results in a dimension per head of 512/2 = 256. According to my understanding, this should result in the shape [1, 12, 256] for each attention head.
So, am I correct in assuming that this depiction https://jalammar.github.io/illustrated-transformer/ just does not display this factor appropriately?
Does the splitting of the input into different heads actually lead to different calculations in the layer or is it just done to make computations faster?
I have looked at the implementation in torch.nn.MultiheadAttention and printed out the shapes at various stages during the forward pass through the layer. To me it appears that the operations are conducted in the following order:
Use the in_projection weight matrices to get the query, key and value from the original inputs. After this the shape for query, key and value is [1, 12, 512]. From my understanding the weights in this step are the parameters that are actually learned in the layer during training.
Then the shape is modified for the multiple heads into [2, 12, 256].
After this the dot product between query and key is calculated, etc.. The output of this operation has the shape [2, 12, 256].
Then the output of the heads is concatenated, which results in the shape [12, 512].
The attention output is multiplied by the output projection weight matrix and we get [12, 1, 512] (the batch size and the sequence length are sometimes switched around). Again, the matrices here hold weights that are learned during training.
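
To make the head split concrete, here is a small sketch of the reshape from [1, 12, 512] to [2, 12, 256] described above (assuming PyTorch; the variable names are mine, not those of the internal implementation):

import torch

q = torch.rand(1, 12, 512)                     # (batch, seq_len, embed_dim)
num_heads = 2
head_dim = 512 // num_heads                    # 256

# split the embedding dimension and fold the heads into the batch dimension
q_heads = (q.view(1, 12, num_heads, head_dim)  # (1, 12, 2, 256)
            .permute(0, 2, 1, 3)               # (1, 2, 12, 256)
            .reshape(num_heads, 12, head_dim))
print(q_heads.shape)                           # torch.Size([2, 12, 256])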
I printed the shapes of the parameters in the layer for different values of num_heads, and the number of parameters does not change:
First parameter: [1536, 512] (the input projection weight matrix, I assume; 1536 = 3 * 512)
Second parameter: [1536] (the input projection bias, I assume)
Third parameter: [512, 512] (the output projection weight matrix, I assume)
Fourth parameter: [512] (the output projection bias, I assume)
On this website https://towardsdatascience.com/transformers-explained-visually-part-3-multi-head-attention-deep-dive-1c1ff1024853, it is stated that this is only a "logical split". This seems to match my own observations using the PyTorch implementation.
So does the number of attention heads actually change the values that are output by the layer and the weights learned by the model? The way I see it, the weights are not influenced by the number of heads.
Then how can multiple heads focus on different parts (similar to the filters in convolutional layers)?
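
A quick experiment along these lines (my own sketch, assuming torch.nn.MultiheadAttention) can check both halves of the question: the parameter count does not depend on num_heads, yet with identical weights the outputs still differ, because the attention softmax is applied per head over different slices:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(12, 1, 512)  # (seq_len, batch, embed_dim)

mha2 = nn.MultiheadAttention(embed_dim=512, num_heads=2)
mha8 = nn.MultiheadAttention(embed_dim=512, num_heads=8)

# the weights have the same size regardless of num_heads
print(sum(p.numel() for p in mha2.parameters()))  # 1050624
print(sum(p.numel() for p in mha8.parameters()))  # 1050624

# copy the weights across, then compare outputs
mha8.load_state_dict(mha2.state_dict())
out2, _ = mha2(x, x, x)
out8, _ = mha8(x, x, x)
print(torch.allclose(out2, out8))  # False: the per-head softmax differs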

How to convert a directed graph to its most minimal form?

I'm dealing with rooted, directed, potentially cyclic graphs. Each vertex in the graph has a label, which might or might not be unique. Edges do not have labels. The graph has a designated root vertex from which every vertex is reachable. The order of the edges outgoing from a vertex is relevant.
For my purposes, a vertex is equal to another vertex if they share the same label, and if their outgoing edges are also considered equal (and are in the same order). Two edges are equal if they have the same direction and if the vertices at their corresponding ends are equal.
Because of the equality rules above, a graph can contain multiple "sections" that are effectively equal. For example, in the graph below, there are two isomorphic sections containing vertices with labels {1, 2, 3, 4}. The root of the graph is vertex 0.
[image: input graph with two duplicated sections (source: graphonline.ru)]
I need to be able to identify sections that are identical, and then remove all duplication, without changing the "meaning" of the graph (with regard to the equality rules above). Using the above example as input, I need to produce this:
[image: the same graph with the duplicated section merged (source: graphonline.ru)]
Is there a known way of doing this within polynomial time?
The solution that ended up working was essentially to run the recursive equality check against every pair of vertices with the same label:
Let S = all pairs of vertices with the same label
For each s in S:
Compare the two vertices a and b in s by recursively comparing their children
If they compare as equal, take all edges in the graph pointing to b, and point them to a instead
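
For illustration, a rough Python sketch of that procedure; the vertex representation (a label plus an ordered children list) and the cycle guard (a pair already under comparison is treated as equal, in the spirit of a bisimulation check) are my own assumptions:

from itertools import combinations

def equal(a, b, visited=None):
    # recursively compare labels and ordered children; a pair already on
    # the comparison stack is assumed equal so that cycles terminate
    if visited is None:
        visited = set()
    if (id(a), id(b)) in visited:
        return True
    visited.add((id(a), id(b)))
    return (a.label == b.label
            and len(a.children) == len(b.children)
            and all(equal(x, y, visited) for x, y in zip(a.children, b.children)))

def deduplicate(vertices):
    # for every equal pair (a, b), redirect all edges pointing at b to a
    for a, b in combinations(vertices, 2):
        if a.label == b.label and equal(a, b):
            for v in vertices:
                v.children = [a if c is b else c for c in v.children]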

How to use return_sequences option and TimeDistributed layer in Keras?

I have a dialog corpus like the one below, and I want to implement an LSTM model which predicts a system action. The system action is described as a bit vector, and a user input is calculated as a word embedding, which is also a bit vector.
t1: user: "Do you know an apple?", system: "no"(action=2)
t2: user: "xxxxxx", system: "yyyy" (action=0)
t3: user: "aaaaaa", system: "bbbb" (action=5)
So what I want to realize is a "many-to-many (2)" model. When my model receives a user input, it must output a system action.
But I cannot understand the return_sequences option and the TimeDistributed layer used after an LSTM. To realize "many-to-many (2)", are return_sequences=True and a TimeDistributed layer after the LSTMs required? I would appreciate it if you could describe them in more detail.
return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
TimeDistributed: This wrapper allows to apply a layer to every temporal slice of an input.
Updated 2017/03/13 17:40
I think I now understand the return_sequences option, but I am still not sure about TimeDistributed. If I add a TimeDistributed layer after the LSTMs, is the model the same as my "many-to-many (2)" below? If so, I think Dense layers are applied to each output.
The LSTM layer and the TimeDistributed wrapper are two different ways to get the "many to many" relationship that you want.
The LSTM will eat the words of your sentence one by one; you can choose via return_sequences to output something (the state) at each step (after each word is processed) or to output something only after the last word has been eaten. So with return_sequences=True, the output will be a sequence of the same length, while with return_sequences=False, the output will be just one vector.
TimeDistributed: this wrapper allows you to apply one layer (say Dense, for example) to every element of your sequence independently. That layer will have exactly the same weights for every element; it is the same layer that is applied to each word, and it will, of course, return the sequence of words processed independently.
As you can see, the difference between the two is that the LSTM propagates information through the sequence: it will eat one word, update its state, and return it or not; then it will go on with the next word while still carrying information from the previous ones. With TimeDistributed, on the other hand, the words are processed in the same way on their own, as if they were in silos, and the same layer applies to every one of them.
So you don't have to use an LSTM and TimeDistributed in a row; you can do whatever you want, just keep in mind what each of them does.
I hope it's clearer?
EDIT:
The TimeDistributed wrapper, in your case, applies a Dense layer to every element that was output by the LSTM.
Let's take an example:
You have a sequence of n_words words that are embedded in emb_size dimensions. So your input is a 2D tensor of shape (n_words, emb_size)
First you apply an LSTM with output dimension = lstm_output and return_sequences = True. The output will still be a sequence, so it will be a 2D tensor of shape (n_words, lstm_output).
So you have n_words vectors of length lstm_output.
Now you apply a TimeDistributed Dense layer with, say, 3 output dimensions as the parameter of the Dense, i.e. TimeDistributed(Dense(3)).
This will apply Dense(3) n_words times, to every vector of size lstm_output in your sequence independently; they will all become vectors of length 3. Your output will still be a sequence, so a 2D tensor, now of shape (n_words, 3).
Is it clearer? :-)
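
To tie this to code, a minimal sketch of exactly that stack (assuming TensorFlow 2.x Keras; the concrete sizes are made up for illustration):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

n_words, emb_size, lstm_output = 12, 50, 64

model = Sequential([
    # return_sequences=True: one output per word, shape (n_words, lstm_output)
    LSTM(lstm_output, return_sequences=True, input_shape=(n_words, emb_size)),
    # the same Dense(3) applied to each of the n_words steps independently
    TimeDistributed(Dense(3)),
])
model.summary()  # final output shape: (None, 12, 3)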
return_sequences=True parameter:
If we want to have a sequence for the output, not just a single vector as we did with normal neural networks, it is necessary that we set return_sequences to True. Concretely, let's say we have an input with shape (num_seq, seq_len, num_feature). If we don't set return_sequences=True, our output will have the shape (num_seq, num_feature), but if we do, we will obtain an output with shape (num_seq, seq_len, num_feature).
TimeDistributed wrapper layer:
Since we set return_sequences=True in the LSTM layers, the output is now a three-dimensional tensor. If we input that into the Dense layer, it will raise an error, because the Dense layer only accepts two-dimensional input. In order to feed in a three-dimensional tensor, we need to use a wrapper layer called TimeDistributed. This layer will help us maintain the output's shape, so that we can achieve a sequence as output in the end.
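
A small shape check (again assuming TensorFlow 2.x, where layers can be called eagerly on arrays) shows the difference described above:

import numpy as np
from tensorflow.keras.layers import LSTM

x = np.zeros((4, 10, 8), dtype="float32")        # (num_seq, seq_len, num_feature)

print(LSTM(16)(x).shape)                         # (4, 16): last step only
print(LSTM(16, return_sequences=True)(x).shape)  # (4, 10, 16): the full sequence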

Limit the number of edges between vertices in mxGraph

Is there a function to prevent more than one edge between two vertices in mxGraph? Currently I'm using mxGraph.multiplicities; however, it limits the number of edges between all types of vertices, not between one type of edge.
Usually you will want to accomplish this by calling setMultigraph(false).
However, if you need to distinguish between different kinds of vertices, or even have edges with a direction (allowing both A->B and B->A to be connected), the way I did it in the past was by overriding getEdgeValidationError, where your logic can determine if and when two vertices can be connected.

Get random index numbers from a matrix, Fortran 90

I am looking for a function or a way to get random index numbers of a 2D matrix.
My example is: I have A(Ly,Lx), where Ly = 100 and Lx = 100.
I want to get a random index pair from the matrix, such as: Random_node(A) = (random y, random x).
Then I want to do this repeatedly, with the constraint that my random points must not repeat, and must not even be close to one another within a threshold of, let's say, a 10-node radius. The matrix is an Eulerian 2D matrix (y,x).
Is at least the first question straightforward?
Thank you all!
Albert P
Here's one way of getting a random set of locations in your 100x100 matrix. First, declare a 100x100 matrix of reals:
real, dimension(100,100) :: randarray
then, put a random number into each element of that array
call random_number(randarray)
Now, an expression such as
randarray > 0.9
returns a logical array containing approximately 10% true values and 90% false values. By tracking down the locations of the true values you have the random x-es and y-es that you seek. Indeed, you may not need to find those locations at all; you can simply use the expression in masked assignments and similar operations, for example
where(randarray>0.9) a = func()
as long, of course, as func returns a scalar or a 100x100 array.
This approach guarantees that each location is different from all the others.
It does not, however, address your constraint that the 'random' locations should not be too close to each other. That constraint, of course, is a little inconsistent with randomness.
You could, I suppose, break your 100x100 array into 10x10 blocks and choose, randomly, one element in each block. Would that be a good compromise between your constraints?
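
For illustration only (the question is about Fortran, but here is the block idea as a Python/NumPy sketch; note that points in neighbouring blocks can still end up close together, so this bounds the density rather than enforcing a strict minimum distance):

import numpy as np

rng = np.random.default_rng()
Ly = Lx = 100
block = 10

# pick one random (y, x) inside each 10x10 block
points = [(by + rng.integers(block), bx + rng.integers(block))
          for by in range(0, Ly, block)
          for bx in range(0, Lx, block)]
print(len(points))  # 100 distinct locations, one per block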