igraph: given a list of vertices, find the smallest connected subgraph - igraph

I have the following problem: I have a relatively large graph and would like to extract a connected subgraph given a set of vertices, which might not be directly connected. Example:
library(igraph)
Test <- graph(c("a", "b", "a", "c", "a", "d", "b", "e", "b", "f",
"c", "g", "c", "h", "d", "i"))
plot(Test, layout=layout_as_tree)
now I would like to extract the (smallest) subgraph that contains e.g. vertices "e", "c" and "g".
Is there an easy way to do that in the igraph package?
thanks for any suggestions!
cheers, jo

Got it! It's easy with igraph:
subnodes <- c("e", "c", "g")
needNodes <- character()
## loop through all nodes and calculate the path to each other node
for(i in 1:(length(subnodes)-1)){
paths <- shortest_paths(Test, from=subnodes[i],
to=subnodes[(i+1):length(subnodes)],
mode="all")
needNodes <- unique(c(needNodes, unlist(lapply(paths$vpath, names))))
}
## subset the graph
subGr <- induced_subgraph(Test, vids=needNodes)
## looks good:
plot(subGr, layout=layout_as_tree)
Thanks for the nice igraph package!
cheers, jo

Related

How to search a JSON db

Do you by any chance know how I should structure my search to access the 'a', 'b', 'c' in the following:
{"_default": {"1": {"Pc": "2429546524130460032", "Pf": "2429519276857919232", "Points": [{"P": "2428316170619108992", "a": "0.0690932185744956", "b": "2.6355498567408557", "c": "0.4369495787854096"}...
Where there are a total of 10 Points in several thousand objects (the "1" at the beginning is the first object). I am able to access "Pc" and "Pf", but if I try:
db.search(Point.Points['a'] == '0.0690932185744956'
I get an empty set.
Thoughts?

Clear empty values from JSON (actually array) field in MYSQL

I have a table, in this table I have a JSON field. In this field I store arrays with some data. For example:
["a", "", "c"]
["", "", ""]
["a", "b", "c"]
I need to clear this field from empty values and get:
["a", "c"]
null
["a", "b", "c"]
Then update the filed values.
Using REPLACE:
REPLACE(REPLACE(emails,'"", ',''), '[""]','null')
Working example here in SQLfiddle
#PaulSpiegel:
Doesn't work for ["", "b", ""]
Solved:
REPLACE(REPLACE(REPLACE(emails,'"", ',''), '[""]','null'),', ""','')

Re-use Vega-Lite axis when switching between data

I am trying to port some D3.js graphs to Vega Lite to make it faster to iterate on new designs. The current example is a bar char with buttons underneath which change the data being displayed. Importantly, the y-label of the data will change as the data does (the x-does not) as does the need for a legend (some sets of data have more than one series, others just 1 and don't need a legend)
I have more or less reproduced the graphs themselves in Vega Lite, however I am having an issue ensuring the plots have their axes aligned. Even in the case where there is no legend in either plot, the change in y-label/y-values means the fixed sizing moves things around.
I have uploaded a GIF to show what I mean.
Currently the buttons function by taking the Vega Lite template and updating it based on a new set of data (basically just changing the dataset and labels, domain, etc).
Is there any way to align these or is that out of scope for Vega Lite?
EDIT:
After tinkering a bit I have found it seems to be a padding issue (see new GIF). What I changed was the formatting of the yaxis and pushing the yaxis back from the axis which seemed to 'brace' the plot against everything else. This shows that in principle this alignment (even if by fluke/brute force) is possible. However, when i close the sidebar of the responsive page I am on, the problem re-emerges. I also set 'align':'all' and 'padding':10. A higher padding 'braces' it more but then just fills the place with loads of whitespace.
The issue seems to be that the length of the labels on the y-axis change. you could fix the extent to be at least some large enough number.
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"values": [
{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
{"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
{"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
]
},
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "ordinal"},
"y": {"field": "b", "type": "quantitative", "axis": {"minExtent": 100}}
}
}
This is a tricky issue since you are asking to align across different charts that have no knowledge of each other.
I'm pleased to report a functioning (for now at least) solution which was arrived at with the help of dominik. Specifically I needed to turn the autosizing off and pass through custom left/right padding values which wedges the plot in place for both the normal and expanded views, e.g.
"autosize":{"type":"none"},
"padding":{"top":30,"left":100,"right":100,"bottom":30},

Transform sequence of data into JSON for D3.js visualization

I have a data that shows a series of actions (column Actions ) performed by several users (column Id). The order of the data frame is important - it is the order the actions were performed in. For each id, the first action performed is start. Consecutive identical actions are possible (for example, the sequence start -> D -> D -> D is valid ). This is some code to generate data:
set.seed(10)
i <- 0
all_id <- NULL
all_vals <- NULL
while (i < 5) {
i <- i + 1
print(i)
size <- sample(3:5, size = 1)
tmp_id <- rep(i, times = size + 1)
tmp_vals <- c("start",sample(LETTERS, size = size) )
all_id <- c(all_id, tmp_id)
all_vals <- c(all_vals, tmp_vals)
}
df <- data.frame(Id = all_id,
Action = all_vals)
Goal - transform this data in a JSON nested on multiple levels that will be used in a D3.js visualization (like this). I would like to see a counter for how many times each child appears for their respective parent (an maybe even a percentage out of the total appearances of the parent) - but I hope I can do that myself.
Expected output below - this is generic, not from the data I generated above, and real data will have quite a lot of nested values ( count and percentage are optional at this point in time):
{
"action": "start",
"parent": "null",
"count": "10",
"percentage": "100",
"children": [
{
"action": "H",
"parent": "start",
"count": "6",
"percentage": "60",
"children": [
{
"action": "D",
"parent": "H",
"count": "5",
"percentage": "83.3"
},
{
"action": "B",
"parent": "H",
"count": "3",
"percentage": "50"
}
]
},
{
"action": "R",
"parent": "start",
"count": "4",
"percentage": "40"
}
]
}
I know I am supposed to post something I've tried, but I really don't have anything remotely worth of being shown.
I have just started writing some R -> d3.js converters in https://github.com/timelyportfolio/d3r that should work well in these type situations. I will work up an example later today with your data.
The internal hierarchy builder in https://github.com/timelyportfolio/sunburstR also might work well here.
I'll add to the answer as I explore both of these paths.
example 1
set.seed(10)
i <- 0
all_id <- NULL
all_vals <- NULL
while (i < 5) {
i <- i + 1
print(i)
size <- sample(3:5, size = 1)
tmp_id <- rep(i, times = size + 1)
tmp_vals <- c("start",sample(LETTERS, size = size) )
all_id <- c(all_id, tmp_id)
all_vals <- c(all_vals, tmp_vals)
}
df <- data.frame(Id = all_id,
Action = all_vals)
# not sure I completely understand what this is
# supposed to become but here is a first try
# find position of start
start_pos <- which(df$Action=="start")
# get the sequences
# surely there is a better way but do this for now
sequences <- paste(
start_pos+1,
c(start_pos[-1],nrow(df))-1,
sep=":"
)
paths <- lapply(
sequences,
function(x){
data.frame(
t(as.character(df[eval(parse(text=x)),]$Action)),
stringsAsFactors=FALSE
)
}
)
paths_df <- dplyr::bind_rows(paths)
# use d3r
# devtools::install_github("timelyportfolio/d3r")
library(d3r)
d3_nest(paths_df) # if want list, then json=FALSE
# visualize with listviewer
# devtools::install_github("timelyportfolio/listviewer")
listviewer::jsonedit(d3_nest(paths_df))

Make deeply nested JSON from data frame in R

I'm looking to take a nice tidy data frame and turn it into a deeply nested JSON using R. So far I haven't been able to find any other resources that directly address this task - most seem to be trying to take it in the other direction (un-nesting a JSON).
Here's a small dummy version of the data frame I'm starting with. Imagine a survey was given to two audiences within a company, one for managers and a separate one for employees. The surveys have different sets of questions with different IDs but many questions overlap and I want to compare the responses between the two groups. The end goal is to make a JSON that matches up section IDs, question IDs, and option IDs/text from two surveys in the correct hierarchy. Some questions have subquestions that require a further level of nesting, which is what I’m having difficulty doing.
library(dplyr)
library(tidyr)
library(jsonlite)
dummyDF <- data_frame(sectionId = c(rep(1,9),rep(2,3)),
questionId = c(rep(1,3),rep(2,6),rep(3,3)),
subquestionId = c(rep(NA,3),rep("2a",3),rep("2b",3),rep(NA,3)),
deptManagerQId = c(rep("m1",3),rep("m2",3),rep("m3",3),rep("m4",3)),
deptEmployeeQId = c(rep("e1",3),rep("e3",3),rep("e4",3),rep("e7",3)),
optionId = rep(c(1,2,3),4),
text = rep(c("yes","neutral","no"),4))
And here’s the end result I’m trying to achieve:
theGoal <- fromJSON('{
"sections": [
{
"sectionId": "1",
"questions": [
{
"questionId": "1",
"deptManagerQId": "m1",
"deptEmployeeQId": "e1",
"options": [
{
"optionId": 1,
"text": "yes"
},
{
"optionId": 2,
"text": "neutral"
},
{
"optionId": 3,
"text": "no"
}
]
},
{
"questionId": "2",
"options": [
{
"optionId": 1,
"text": "yes"
},
{
"optionId": 2,
"text": "neutral"
},
{
"optionId": 3,
"text": "no"
}
],
"subquestions": [
{
"subquestionId": "2a",
"deptManagerQId": "m2",
"deptEmployeeQId": "e3"
},
{
"subquestionId": "2b",
"deptManagerQId": "m3",
"deptEmployeeQId": "e4"
}
]
},
{
"questionId": "3",
"deptManagerQId": "m4",
"deptEmployeeQId": "e7",
"options": [
{
"optionId": 1,
"text": "yes"
},
{
"optionId": 2,
"text": "neutral"
},
{
"optionId": 3,
"text": "no"
}
]
}
]
}
]
}')
Here are a few approaches I’ve tried using nest from tidyr that end up either only getting me part of the way there or throwing an error message.
1
list1 <- dummyDF %>% nest(-sectionId, .key=questions) %>%
mutate(questions = lapply(seq_along(.$questions), function(x) nest(.$questions[[x]], optionId, text, .key = options))) %>%
list(sections = .)
2
nested1 <- dummyDF %>% nest(-sectionId, .key=questions) %>%
mutate(questions = lapply(seq_along(.$questions), function(x) nest(.$questions[[x]], optionId, text, .key = options)))
nested2 <- nested1 %>% mutate(questions = lapply(seq_along(.$questions), function(x) nest(.$questions[[x]], subquestionId, .key = subquestions)))
#Gives this error: cannot group column options, of class 'list'
3
list2 <- dummyDF %>% nest(-sectionId, .key=questions) %>%
mutate(questions = lapply(seq_along(.$questions),
function(x) {ifelse(is.na(.$questions[[x]]$subquestionId),
function(x) {.$questions[[x]] %>% select(-subquestionId) %>% nest(optionId, text, .key = options)},
function(x) {.$questions[[x]] %>% nest(subquestion_id, .key = subquestions)})})) %>%
list(sections = .)
#Gives this error: attempt to replicate an object of type 'closure'
Any ideas would be greatly appreciated. I’m open to any approaches. I took the issue to a local R user group meet-up but wasn’t able to come up with any solutions so I’ve got my fingers crossed here. I realize R might not be the best tool to accomplish this but it’s the one I know so I’m giving it a shot. Thanks.
jsonlite::toJSON looks like a nice solution to your problem.
Works seamlessly up to column types and column order (I corrected to illustrate that the objects were identical). If you need any other type of JSON structure, I would recommend restructuring the data_frame on the front end first using something like dplyr or tidyr.
library(jsonlite)
library(dplyr)
dummyDF <- data_frame(sectionId = c(rep(1,9),rep(2,3)),
questionId = c(rep(1,3),rep(2,6),rep(3,3)),
subquestionId = c(rep(NA,3),rep("2a",3),rep("2b",3),rep(NA,3)),
deptManagerQId = c(rep("m1",3),rep("m2",3),rep("m3",3),rep("m4",3)),
deptEmployeeQId = c(rep("e1",3),rep("e3",3),rep("e4",3),rep("e7",3)),
optionId = rep(c(1,2,3),4),
text = rep(c("yes","neutral","no"),4))
## Convert to a JSON object
json <- jsonlite::toJSON(dummyDF)
theGoal <- fromJSON(json) %>% tbl_df() %>% select_(.dots=names(dummyDF)) %>%
## Convert integer columns to numeric
mutate_if(function(x) {if (typeof(x)=='integer') {TRUE} else {FALSE}},as.numeric)
## Compare the objects
all.equal(theGoal,dummyDF)
# TRUE
identical(theGoal,dummyDF)
# TRUE