Is there a way to calculate the theoretical maximum betweenness a vertex could have, knowing the total number of vertices? - igraph

Can you calculate the potential maximum betweenness for a vertex in a graph?
This is assuming using all defaults for the betweenness() function.
I would think that the maximum betweenness for any vertex would be lower than the number of unique pairs (something close to the number of unique pairs minus the total number of vertices).
I know igraph can find multiple shortest paths between a pair of vertices, but from simple models of betweenness it does not "double count" these.
Thank you!

The highest possible number of shortest paths passing through a vertex in an undirected graph is (V-1)(V-2)/2, where V is the vertex count. This is simply $\binom{V-1}{2}$, as we exclude that one vertex when considering pairs.
This value is realized for the center of a star graph.
This is in fact used in igraph's "centralization" functions:
https://igraph.org/c/html/latest/igraph-Structural.html#igraph_centralization_betweenness_tmax
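As a quick sanity check, here is a minimal sketch using python-igraph (the same bound applies in the R interface): the centre of an undirected star on V vertices attains exactly (V-1)(V-2)/2.

import igraph as ig

# The centre (vertex 0) of an undirected star attains the maximum betweenness.
V = 10
g = ig.Graph.Star(V, mode="undirected", center=0)
print(g.betweenness()[0])        # 36.0 == (10 - 1) * (10 - 2) / 2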

Related

Negative binomial regression SPSS - Quantity vs Distance

I have quite a simple dataset of quantities of litter found in a national park located on an island. For each data point I have corresponding GPS coordinates, and I've derived the distance of each point to the shore. My aim: observe if the quantities of litter increase or decrease with the distance to shore. I'm assuming that quantities of litter will increase with a decrease in distance, as litter is commonly found on beaches etc.
Quantities of litter are counts (i.e., not normally distributed). I've also tested whether the data follow a Poisson model and they do not (p-value < 0.05), and the variance is larger than the mean for each variable (quantity and distance), which suggests overdispersion. Therefore, I went on to use a negative binomial regression, with an output as follows:
The omnibus test is highly significant (p = 0.000). I was just slightly puzzled by the parameter estimates, and am generally hoping that this approach makes sense. Any input much appreciated.
Interpreting the parameter estimates requires knowing the link function. If you specified the model as a negative binomial with log link on the Type of Model tab, it is a log link; if you specified a custom model using a negative binomial distribution with another link, it could instead be identity, negative binomial, or power.
If it's a log link, then for a distance of 0 (at the shore) you predict exp(2.636), or about 13.957, for the count. For a given distance from the shore, multiply the distance by -0.042, add that to 2.636, and exponentiate the result. So for every unit you move away from the shore, the log of the prediction decreases by 0.042 and the prediction is multiplied by about 0.959. One unit away you predict about 13.383 for the count, two units away about 12.833, and so on. So the results are in general accord with your hypothesis. Different calculations would be required if you used a different link function.
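If you want to reproduce the arithmetic, here is a quick sketch in Python (the 2.636 intercept and -0.042 slope are simply the coefficients quoted above):

import numpy as np

# Predicted litter counts under a log link: count = exp(intercept + slope * distance)
intercept, slope = 2.636, -0.042
for distance in [0, 1, 2]:
    print(distance, np.exp(intercept + slope * distance))
# 0 -> ~13.96, 1 -> ~13.38, 2 -> ~12.83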

How are matrices multiplied in Hierarchical Softmax model?

As I understood, the simple word2vec approach uses two matrices like the following:
Assuming that the corpus consists of N words.
Weighted input matrix (WI) with dimensions NxF (F is number of features).
Weighted output matrix (WO) with dimensions FxN.
We multiply a one-hot vector 1xN with WI and get a hidden layer (neuron) 1xF.
Then we multiply that hidden layer with WO and get an output vector 1xN.
We apply softmax function and choose the highest entry (probability) in the vector.
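For concreteness, here is the forward pass described above as a rough NumPy sketch (the sizes are made up):

import numpy as np

N, F = 8, 4                          # vocabulary size and number of features
rng = np.random.default_rng(0)
WI = rng.normal(size=(N, F))         # weighted input matrix, N x F
WO = rng.normal(size=(F, N))         # weighted output matrix, F x N

x = np.zeros(N); x[2] = 1.0          # one-hot input vector, 1 x N
h = x @ WI                           # hidden layer, 1 x F
u = h @ WO                           # output vector, 1 x N
p = np.exp(u) / np.exp(u).sum()      # softmax over all N words
print(p.argmax())                    # index of the highest-probability word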
Question: how is this illustrated when using the Hierarchical Softmax model?
What will be multiplied with which matrix to get the 2 dimensional vector that will lead to branch left or right?
P.S. I do understand the idea of the Hierarchical Softmax model using a binary tree and so on, but I don't know how the multiplications are done mathematically.
Thanks
To make things easy, assume that N is a power of 2. The binary tree then has N-1 inner nodes, and each inner node gets its own column of WO, so WO has dimensions Fx(N-1).
Once you have computed a value for each inner node (the dot product of the hidden layer with that node's column of WO), turn it into left and right branch probabilities: apply a sigmoid to get the probability of (say) the left branch; the right branch is just 1 minus the left.
To predict, follow the maximum-probability path from the root down to a leaf.
To train, identify the correct leaf and the path of inner nodes from it back to the root, then backpropagate through only those log2(N) nodes.
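A minimal NumPy sketch of that scoring, assuming a complete binary tree stored implicitly (inner node k has children 2k+1 and 2k+2, and the leaves sit below the N-1 inner nodes); the tree layout here is my simplification, word2vec actually uses a Huffman tree:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N, F = 8, 4                      # vocabulary size and number of features
rng = np.random.default_rng(0)
WI = rng.normal(size=(N, F))     # input embeddings, N x F
WO = rng.normal(size=(F, N - 1)) # one output vector per *inner* node, F x (N-1)

def word_probability(input_word, target_word):
    h = WI[input_word]                   # 1 x F hidden layer (the projection)
    scores = h @ WO                      # one score per inner node, 1 x (N-1)
    # Walk from the root to the leaf of target_word. With N leaves, leaf j
    # corresponds to implicit node index (N - 1) + j.
    node = (N - 1) + target_word
    path = []
    while node > 0:                      # recover the root-to-leaf path
        parent = (node - 1) // 2
        path.append((parent, node == 2 * parent + 1))  # (inner node, went left?)
        node = parent
    prob = 1.0
    for inner, went_left in reversed(path):
        p_left = sigmoid(scores[inner])  # sigmoid gives P(left); P(right) = 1 - P(left)
        prob *= p_left if went_left else (1.0 - p_left)
    return prob

# The probabilities over all words sum to 1, just like an ordinary softmax:
print(sum(word_probability(3, w) for w in range(N)))   # ~1.0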

Making sense of soundMixer.computeSpectrum

All examples that I can find on the Internet just visualize the result array of the function computeSpectrum, but I am tasked with something else.
I generate a musical note and, by analyzing the result array, I need to be able to say which note is playing. I figured out that I need to set the second parameter of the function call, 'FFTMode', to true, so that it returns sound frequencies. I thought it should then return only one non-zero value, which I could use to determine which note I had generated with the Math.sin function, but that is not the case.
Can somebody suggest a way how I can accomplish the task? Using the soundMixer.computeSpectrum is a requirement because I am going to analyze more complex sounds later.
The FFT transforms your signal window into a set of basis sine waves at fixed bin frequencies, so unless 440 Hz is exactly one of them you will obtain more than just one non-zero value! For a single sine wave you would also obtain a second, mirrored frequency due to aliasing.
When the tone lies exactly on a bin frequency the FFT response is a single peak, but for nearby frequencies the energy spreads across several peaks.
Due to the shape of the signal window you can obtain a continuous spectrum with peaks instead of discrete values.
The frequency of the i-th bin is f(i) = i*samplerate/N, where i = 0, 1, 2, ..., (N/2)-1 is the bin index (bin 0 is the DC offset, so it is not a frequency) and N is the number of samples passed to the FFT.
So if you want to detect harmonics (multiples of a single fundamental frequency), set the samplerate and N so that samplerate/N is that fundamental frequency or a divisor of it. That way each harmonic sine wave produces just one peak, which eases up the computations.
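Here is a rough sketch of that bin arithmetic in Python/NumPy rather than ActionScript (the 440 Hz tone, sample rate and N are made-up values for illustration):

import numpy as np

samplerate = 44100
N = 4096                                    # number of samples passed to the FFT
t = np.arange(N) / samplerate
signal = np.sin(2 * np.pi * 440.0 * t)      # a pure 440 Hz tone

spectrum = np.abs(np.fft.rfft(signal))      # magnitudes for bins 0 .. N/2
freqs = np.arange(len(spectrum)) * samplerate / N   # f(i) = i * samplerate / N

peak = np.argmax(spectrum[1:]) + 1          # skip bin 0, the DC offset
print(freqs[peak])                          # ~441.4 Hz: the nearest bin to 440 Hz,
                                            # with leakage into neighbouring bins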

Determination of formula for a 3 independent variable issue

I have 3 arrays, X, Y and Z. Each has 8 elements. Now for each possible combination of (X, Y, Z) I have a V value.
I am looking to find a formula e.g. V=f(X,Y,Z). Any idea about how that can be done?
Thank you in advance,
Astry
You have a function sampled on a (possibly nonuniform) 3D grid, and want to evaluate the function at any arbitrary point within the volume. One way to approach this (some say the best) is as a multivariate spline evaluation. https://en.wikipedia.org/wiki/Multivariate_interpolation
First, you need to find which rectangular parallelepiped contains the (x,y,z) query point, then you need to interpolate the value from the nearest points. The easiest thing is to use trilinear interpolation from the nearest 8 points. If you want a smoother surface, you can use quadratic interpolation from 27 points or cubic interpolation from 64 points.
For repeated queries of a tricubic spline, your life will be a bit easier if you preprocess the spline into Hermite patches/volumes, where each sample point carries not only the function value but also its derivatives (∂/∂x, ∂/∂y, ∂/∂z). That way you don't need messy boundary handling at evaluation time.
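A minimal sketch of the trilinear option in Python with SciPy, assuming your 8 X, 8 Y and 8 Z values define a (possibly nonuniform) grid and V is given as an 8x8x8 array of samples (the coordinates and values below are placeholders):

import numpy as np
from scipy.interpolate import RegularGridInterpolator

x = np.linspace(0.0, 7.0, 8)          # placeholder grid coordinates, ascending
y = np.linspace(0.0, 14.0, 8)
z = np.linspace(0.0, 21.0, 8)
V = np.random.default_rng(0).normal(size=(8, 8, 8))   # placeholder sample values

f = RegularGridInterpolator((x, y, z), V, method="linear")   # trilinear interpolation
print(f([[3.5, 7.2, 10.1]]))          # V estimated at an arbitrary (x, y, z) inside the grid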

get random index numbers from a matrix, fortran 90

I am looking for a function or a way to get the index numbers of a 2D matrix:
my example is, I have A(Ly,Lx) where Ly = 100 and Lx = 100
I want to get a random index number of the matrix, such as : Random_node(A) = (random y, random x)
Then I want to do this repeatedly, with the constraint that the random points are not repeated and are not even close to one another, within a threshold of (let's say) a radius of 10 nodes. The matrix is an Eulerian 2D matrix (y, x).
Is at least the first question straightforward?
Thank you all!
Albert P
Here's one way of getting a random set of locations in your 100x100 matrix. First, declare a 100x100 matrix of reals:
real, dimension(100,100) :: randarray
then, put a random number into each element of that array
call random_number(randarray)
Now, an expression such as
randarray > 0.9
returns a logical array containing, approximately, 10% true values and 90% false. By tracking down the locations of the true values you have the random x-es and y-es that you seek. Indeed, you may not need to find those locations at all; you can simply use the expression in masked assignments and similar operations, for example
where(randarray>0.9) a = func()
as long, of course, as func returns a scalar or a 100x100 array.
This approach guarantees that each location is different from all the others.
It does not however, address your constraint that the 'random' locations should not be too close to each other. That constraint, of course, is a little inconsistent with randomness.
You could, I suppose, break your 100x100 array into 10x10 blocks and choose, randomly, one element in each block. Would that be a good compromise between your constraints?
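If it helps, here is that block compromise sketched in Python just to illustrate the idea; the same two loops carry over to Fortran 90 with random_number():

import numpy as np

# Split the 100x100 grid into 10x10 blocks and pick one random (y, x) per block,
# so the points are never repeated and stay spread out across the matrix.
rng = np.random.default_rng()
points = []
for block_y in range(0, 100, 10):
    for block_x in range(0, 100, 10):
        y = block_y + rng.integers(10)   # random row offset inside the block
        x = block_x + rng.integers(10)   # random column offset inside the block
        points.append((y, x))
print(points[:5])                        # 100 locations in total, one per block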