finding edge sequences (and nodes) of branches in igraph

I would like to use igraph to find edge sequences corresponding to branches from a tree. Ideally, I would like to retain the branches in a data frame.
Consider this example:
library(igraph)
g <- erdos.renyi.game(50, 3/50)
mg <- minimum.spanning.tree(g)
diam <- get.diameter(mg)
E(mg)$color = "black"
E(mg, path = diam)$color = "purple"
E(mg, path = diam)$width = 6
plot(mg)
Here the main path is diam, the purple line. Mr. Flick already kindly answered how to find the edges which are incident on diam but not within diam:
EL <- difference( E(mg)[inc(diam)], E(mg, path = diam) )
E(mg)[EL]$color<-"blue"
E(mg)[EL]$width<-6
plot(mg)
One may call these the branch "stubs", i.e. the first edge of each branch. The question now is: how do I find the entire edge sequence of each individual branch? (One possible approach is sketched after the example list below.)
I have a feeling it can be done with iterators using nei and inc, following Mr. Flick's intuition, but I can't see a way so far.
Example of branches:
[image: diameter path with branch stubs]
In the image, the branches would be the (undirected) edge sequences
23-19-18
23-14
23-26
1-32-24-7
1-11-42-35-41
1-11-42-35-9-17
1-11-30-38
1-28
1-33-34
1-33-29
1-48-47
1-48-12
27-49-39
27-6-45
27-6-10
31-20
16-4
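
One possible way to recover the full branches (a hedged sketch rather than a verified answer; the object names mg2, branches, etc. are just illustrative): delete the diameter edges from the tree, then walk from each diameter vertex to the leaves of whatever remains attached to it; every such path is one branch.
mg2 <- delete_edges(mg, E(mg, path = diam))   # the tree with the diameter cut out
branches <- list()
for (v in as.vector(diam)) {
  reach  <- as.vector(subcomponent(mg2, v))   # vertices still attached to v
  leaves <- reach[degree(mg2, reach) == 1 & reach != v]
  for (lf in leaves) {
    p <- shortest_paths(mg2, from = v, to = lf)$vpath[[1]]
    if (length(p) > 1) branches[[length(branches) + 1]] <- as.vector(p)
  }
}
# each element of branches is the vertex sequence of one branch, e.g. c(1, 11, 42, 35, 41)
Padding the vertex sequences to a common length would then give the data frame mentioned at the top of the question.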

Related

Cut ultrasound signal between specific values using Octave

I have an ultrasound wave (graph axes: Volt vs microsecond) and need to cut the signal/wave between two specific values to further analyse the clipped part. My idea is to cut the signal at 0.2 V (y-axis), keeping only the part above it. The wave is sine-shaped, as shown in the figure, with the desired cutoff points in red.
In my current code, I'm cutting the signal from 1900 to 4000 ms (x-axis) (Aa = A(1900:4000);), and then I want to apply the aforementioned clipping and proceed with the code.
Does anyone know how I could do this y-axis clipping?
Thanks!! :)
clear
clf
pkg load signal

for k = 1:2
  w = 1
  filename = strcat("PCB 2.1 (", sprintf("%01d", k), ").mat")
  load(filename)                 % loads the recorded signal A
  Lthisrun = length(A);
  Pico(k, 1:Lthisrun) = A;
  Aa = A(1900:4000);             % horizontal window of the signal
  Ah = abs(hilbert(Aa));         % envelope of the windowed signal
  step = 100;
  hold on
  i = 1;
  Ac = 0;
  for index = 1:step:3601
    Ac(i+1) = Ac(i) + Ah(i);     % running sum of the envelope
    i = i + 1
    r(k) = trapz(Ac)
  end
end
OK, you want to just look at values 'above the noise' in your data, or, in this case, 'clip out' everything below 0.2 V. The easiest way to do this is with logical indexing: you can take an array and create a sub-array that keeps only the elements meeting a certain logical condition. See this example:
f = @(x) sin(x)./x;
x = [-100:.1:100];
y = f(x);
plot(x,y);
figure;
x_trim = x(y>0.2);
y_trim = y(y>0.2);
plot(x_trim, y_trim);
From your question it looks like you want to do the clipping after applying the horizontal windowing from 1900-4000 (you say that this is in milliseconds, but your image shows the pulse arriving much sooner than 1900 ms). In any case, something like
Ab = Aa(Aa > 0.2);
will create another array Ab that will only contain the portions of Aa with values above 0.2. You may need to do something similar (see the example) for the horizontal axis if your x-data is not just the element index.

Random graph generator

I am interested in generating weighted, directed random graphs with node constraints. Is there a graph generator in R or Python that is customizable? The only one I am aware of is igraph's erdos.renyi.game() but I am unsure if one can customize it.
Edit: the customizations I want to make are 1) drawing a weighted graph and 2) constraining some nodes from drawing edges.
In python-igraph you can use the Graph.Erdos_Renyi() generator.
Whether some nodes end up without any edges is governed by the p value.
Erdos_Renyi(n, p, m, directed=False, loops=False) #these are the defaults
Example:
from igraph import *
g = Graph.Erdos_Renyi(10,0.1,directed=True)
plot(g)
By setting the p=0.1 you can see that some nodes do not have edges.
For the weights you can do something like:
g.ecount()  # to find the number of edges
g.es["weights"] = list(range(1, g.ecount() + 1))
g.es["label"] = g.es["weights"]
plot(g)
Result:

Finding (un)supported edges (igraph)

Let's say there is a following graph created in R igraph:
ed <- c(1,2,2,3,3,1,2,4,3,5,4,5,5,6,6,4)
gr <- make_undirected_graph(ed)
plot(gr)
I'm trying to separate edges of the graph into two groups: "supported", i.e. belonging to connected triangles (in the aforementioned example: 1-2, 2-3, 3-1, 4-5, 5-6, 6-4) and "unsupported", i.e. not belonging to connected triangles (2-4, 3-5). Is there any way to do that in igraph?
Here is a solution I found combining triangles() and @GaborCsardi's approach mentioned in my last comment:
tr <- triangles(gr)
edges <- tapply(tr, rep(1:(length(tr)/3), each = 3), function(x) E(gr,path=c(x,x[1])))
However, this solution is still rather inefficient for large networks.
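A different way to get the same split, without enumerating the triangles explicitly (a sketch, not from the thread, and not guaranteed to be faster): in an undirected graph an edge belongs to a triangle exactly when its two endpoints share at least one common neighbor, so each edge can be tested directly.
el <- as_edgelist(gr)
supported <- apply(el, 1, function(e) {
  nb1 <- as.vector(neighbors(gr, e[1]))
  nb2 <- as.vector(neighbors(gr, e[2]))
  length(intersect(nb1, nb2)) > 0     # shared neighbor => edge lies in a triangle
})
E(gr)[supported]     # supported edges: 1-2, 2-3, 3-1, 4-5, 5-6, 6-4
E(gr)[!supported]    # unsupported edges: 2-4, 3-5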

Azure client manifest entry: n and r elements

While reviewing a client manifest provided by Azure Media Services for an HTTP Smooth Stream, I notice a new element (n) not found in previous IIS manifests and absent from Sam Zhang's blog.
According to previous manifests (clientManifestVersion 2.2), r means "repeat" and is used for compression, indicating a repeated fragment duration.
But by comparing two Azure manifests from the same stream at different times, you can see:
`<c t="868948936" d="2000" r="1770" n="136" />` // (# 8:21 PM)
`<c t="881664896" d="2000" r="1770" n="6494"/>` // (# 11:53 PM)
From what I understand,
d = 2000 indicates the fragment duration (2 seconds)
And taking the values from the two entries:
n1 = 136, n2 = 6494,
t1 = 868948936, t2 = 881664896,
they line up as (n2 - n1) * d = 6358 * 2000 = 12,716,000, which is almost exactly t2 - t1, i.e. t1 + (n2 - n1) * d ≈ t2.
Even though r is supposed to be a repeat, r remains the same while n increases over time... So what is r if it is unchanging, and what is n?
The n attribute is the zero-based index of the fragment, incremented by 1 for each new fragment. Just a meaningless counter: 0, 1, 2, 3, 4, ...
The r attribute indicates that r more fragments with the same duration follow the current fragment. It allows you to replace this:
<c t="1000" d="1000" />
<c t="2000" d="1000" />
<c t="3000" d="1000" />
<c t="4000" d="1000" />
With this much more compact representation:
<c t="1000" d="1000" r="3" />
You can think of it as just duplicating the XML element r number of times.
Edit: Based on the comment, I now understand the source of the confusion: the question is not actually about what these attributes are, but about why, with a live stream, only n changes as time goes along.
To understand this, you must understand how a live video is represented conceptually and how this differs from an on-demand video. The latter has a definite beginning and end, with a fixed number of fragments in between:
(start)123456789(end)
Whereas a live video by definition is one with no end - there may be a "last fragment" but new fragments are continually added to the end and what is currently the "last fragment" will change as time goes along:
(start)1234
(start)12345
(start)123456
Now this works all fine and super but you probably notice a problem here. Adaptive streaming technologies allow you to play any fragment of a video. If your video goes on, essentially, forever then the origin server must store an effectively infinite number of fragments! This cannot be allowed.
To solve this problem, adaptive streaming technologies introduce the concept of a DVR window - a sliding window over the video that contains all the data that can be viewed by players. Any data that slides out of range of this window can be discarded.
(start)[1]
(start)[12]
(start)[123]
(start)1[234]
(start)12[345]
(start)123[456]
(start)1234[567]
(start)12345[678]
(start)123456[789]
Let's discard the fragments we do not need and see how that looks. If your sliding window has a size 3 then the fragments visible to players would progress in time like this:
1
12
123
234
345
456
You notice that the size of the sliding window remains constant (once enough fragments are available to fill it) and that the index of the first fragment plus the sliding window size is sufficient to represent the entire sliding window.
There you have it: r tells you how many more fragments follow the first one in the sliding window (so the window holds r + 1 fragments) and n is the index of that first fragment! This is not the only way to represent live video, but it is a very compact one, since the whole window collapses into a single small manifest entry.
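To make the live case concrete, here is a tiny toy simulation (a sketch only, not Azure code; the window size and timestamps are invented) of how t, n and r would evolve with nominal 2-second fragments and a fixed-size DVR window. Note that r stays constant while n and t move forward, just like in the two manifest snapshots in the question.
d      <- 2000                     # nominal fragment duration
window <- 4                        # fragments kept in the DVR window
for (last in 6:9) {                # new fragments keep arriving
  n <- last - window + 1           # index of the first fragment still in the window
  t <- n * d                       # its (idealised) start time
  r <- window - 1                  # how many more fragments follow it
  cat(sprintf('<c t="%d" d="%d" r="%d" n="%d" />\n', t, d, r, n))
}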

Get the most probable color from a words set

Are there any existing libraries or methods that let you figure out the most probable color for a set of words? For example, for cucumber, apple, grass it would give me green. Has anyone worked in that direction before?
If I had to do it, I would search for images of the words using Google Images (or similar) and detect the most common color among the top n results.
That sounds like a pretty reasonable NLP problem and one that's very easy to handle via map-reduce.
Identify a list of words and phrases that you call colors ['blue', 'green', 'red', ...].
Go over a large corpus of sentences, and for the sentences that mention a particular color, for every other word in that sentence, note down (word, color_name) in a file. (Map Step)
Then for each word you have seen in your corpus, aggregate all the colors you have seen for it to get something like {'cucumber': {'green': 300, 'yellow': 34, 'blue': 2}, 'tomato': {'red': 900, 'green': 430}, ...} (Reduce Step)
Provided you use a large enough corpus (something like Wikipedia), and you figure out how to prune really small counts and rare words, you should be able to build a pretty comprehensive and robust dictionary mapping millions of items to their colors.
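A toy sketch of the two steps on a made-up three-sentence "corpus" (purely illustrative; a real run would use a corpus the size of Wikipedia and a much longer color list):
# "map": for every sentence that mentions a color, emit (word, color) pairs
colors    <- c("green", "red", "yellow")
sentences <- c("the cucumber and the grass are green",
               "a ripe tomato is red",
               "this cucumber looks a little yellow")
pairs <- do.call(rbind, lapply(sentences, function(s) {
  words <- strsplit(tolower(s), "\\s+")[[1]]
  cols  <- intersect(words, colors)
  if (length(cols) == 0) return(NULL)
  expand.grid(word = setdiff(words, colors), color = cols,
              stringsAsFactors = FALSE)
}))
# "reduce": count how often each word co-occurs with each color
counts <- table(pairs$word, pairs$color)
counts["cucumber", ]    # green 1, red 0, yellow 1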
Another way to do that is to do a text search in google for combinations of colors and the word in question and take the combination with the highest number of results. Here's a quick Python script for that:
import urllib
import json
import itertools

def google_count(q):
    query = urllib.urlencode({'q': q})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
    search_response = urllib.urlopen(url)
    search_results = search_response.read()
    results = json.loads(search_results)
    data = results['responseData']
    return int(data['cursor']['estimatedResultCount'])

colors = ['yellow', 'orange', 'red', 'purple', 'blue', 'green']

# get a list of google search counts
res = [google_count('"%s grass"' % c) for c in colors]
# pair the results with their corresponding colors
res2 = list(itertools.izip(res, colors))
# get the color with the highest score
print "%s is %s" % ('grass', sorted(res2)[-1][1])
This will print:
grass is green
Daniel's and Xi.lin's answers are very good ideas. Along the same lines, we could combine both with an approach similar to Xi.lin's but simpler: query Google Images with the word whose color you want to find plus a "Color" filter (see the lower left bar), and see which color yields the most results.
I would suggest using a tightly defined set of sources if possible such as Wikipedia and Wordnet.
Here, for example, is Wordnet for "panda":
S: (n) giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
(large black-and-white herbivorous mammal of bamboo forests of China and Tibet;
in some classifications considered a member of the bear family or of a separate
family Ailuropodidae)
S: (n) lesser panda, red panda, panda, bear cat, cat bear,
Ailurus fulgens (reddish-brown Old World raccoon-like carnivore;
in some classifications considered unrelated to the giant pandas)
Because of the concise, carefully constructed language it is highly likely that any colour words will be important. Here you can see that pandas are both black-and-white and reddish-brown.
If you identify subsections of Wikipedia (e.g. "Botanical Description") this will help to increase the relevance of your results. Also the first image in Wikipedia is very likely to be the best "definitive" one.
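As a rough illustration of the gloss-scanning idea (a sketch only; the colour list below is just an example, and the gloss is the WordNet text quoted above):
colour_terms <- c("black", "white", "red", "green", "blue", "brown",
                  "yellow", "reddish-brown", "black-and-white")
gloss <- "large black-and-white herbivorous mammal of bamboo forests of China and Tibet"
colour_terms[sapply(colour_terms, function(w) grepl(w, gloss, fixed = TRUE))]
# "black" "white" "black-and-white"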
But, as with all statistical methods, you will get false positives (and negatives, though these are probably less of a problem).