When using the cluster_leading_eigen function in the igraph package for R, I'm not able to merge the communities into two communities. The following gives a warning:
library(igraph)
karate <- make_graph("Zachary")
wc <- cluster_leading_eigen(karate)
cut_at(wc, no = 2)
I can see that the merges matrix has a different structure from the one I get from, e.g., cluster_edge_betweenness, but I don't understand why.
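One way to look at the difference is to put the two merges matrices side by side. This is only a minimal sketch, assuming the igraph package is installed; the names wc_eb and wc_le are illustrative and not part of the question.

```r
library(igraph)

karate <- make_graph("Zachary")

# Same graph, two community detection methods
wc_eb <- cluster_edge_betweenness(karate)
wc_le <- cluster_leading_eigen(karate)

# Compare the shape of the two merges matrices
print(dim(merges(wc_eb)))
print(dim(merges(wc_le)))

# Ask igraph whether it treats each result as a hierarchy
print(is_hierarchical(wc_eb))
print(is_hierarchical(wc_le))
```

Printing head(merges(wc_eb)) and merges(wc_le) shows the individual rows if the dimensions alone are not enough.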
I am using the Factiva Package ‘tm.plugin.factiva’ to import html files containing a Factiva search. It has worked beautifully so far, but now I have a problem with importing data and constructing a corpus from several html files (350 in total). I cannot figure out how to write a loop to iterate the simple step-by-step import code I have used before.
Earlier, with a smaller sample, I managed to import the html files in a step-by-step process:
library(R.temis)
library(tm)
library(tm.plugin.factiva)
# Import corpus
source1 <- FactivaSource("Factiva1.html")
source2 <- FactivaSource("Factiva2.html")
source3 <- FactivaSource("Factiva3.html")
corp_source1 <- Corpus(source1, list(language=NA))
corp_source2 <- Corpus(source2, list(language=NA))
corp_source3 <- Corpus(source3, list(language=NA))
full_corpus <- c(corp_source1, corp_source2, corp_source3)
However, this is obviously not an option for the 350 html files. I have tried writing a loop for the import:
# Import corpus
files <- list.files(my_path)
for (i in files){
source <- FactivaSource(i)
}
tech_corpus <- Corpus(source, list(language=NA))
And:
htmlFiles <- Sys.glob("Factiva*.html")
for (k in 1:lengths(htmlFiles[[k]])){
source <- FactivaSource(htmlFiles[[k]])
}
But both of these only read the first html file into the source, not the rest.
I have also tried:
for (k in seq_along(htmlFiles)){
source <- FactivaSource(htmlFiles[1:k], encoding = "UTF-8", format = c("HTML"))
}
But then I get the error:
Error: x must be a string of length 1.
I have tried converting htmlFiles into a list (html_list <- as.list(htmlFiles)), but the result is the same.
The first two loops did run, but only for the first html file.
I got the same result when I tried constructing the corpus in a loop as well:
for (m in 1:lengths(htmlFiles)){
corp_source <- Corpus(htmlFiles[[m]], list(language=NA))
}
This ran, but only for the first html file, and I get the warning:
In 1:lengths(htmlFiles) :
numerical expression has 5 elements: only the first used
I would highly appreciate any help in understanding how to get around this issue. Ideally, a loop that repeats the step-by-step process I used in the beginning would be great, as it seems to me that neither FactivaSource() nor Corpus() likes the complications I have introduced here, but I am far from an expert.
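For what it is worth, the loops above overwrite source (or corp_source) on every pass, so only one file can survive. A possible way to repeat the step-by-step import is to build one corpus per file and concatenate them at the end. This is only a sketch, assuming tm and tm.plugin.factiva are installed; import_factiva is a hypothetical helper name, not a function from either package.

```r
# Hypothetical helper: build one corpus from many Factiva HTML exports
import_factiva <- function(files) {
  stopifnot(is.character(files), length(files) > 0)

  # FactivaSource() expects a single file, so create one corpus per file
  corpora <- lapply(files, function(f) {
    tm::Corpus(tm.plugin.factiva::FactivaSource(f), list(language = NA))
  })

  # c() concatenates tm corpora, exactly as in the three-file version
  do.call(c, corpora)
}

# Usage (assumes the exports are in the working directory):
# html_files <- Sys.glob("Factiva*.html")
# full_corpus <- import_factiva(html_files)
```

If the files live elsewhere, list.files(my_path, full.names = TRUE) returns complete paths; without full.names = TRUE only the bare file names come back.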
I've generated this heatmap using heatmap.3. Clustering is performed based on the dendrogram, but for presentation purposes, I'd like to re-order the nodes such that dark blue is on the left and dark red is on the right while maintaining the dendrogram. I've read about reorder:
newdendro<-reorder(as.dendrogram(myclust(mydist(heatdata.scaled))),10:1,agglo.FUN=colSums)
But colSums(heatdata.scaled) is not stored in the dendrogram. How do I
1) use colSums(heatdata.scaled) to reorder the nodes
2) call this updated dendrogram in heatmap.3?
Your question is missing a self-contained reproducible example, so I will use the mtcars data. And since I'm now working on the heatmaply package, I'll give an answer using it (but you can just change heatmaply to your desired function, and the code will work the same).
# get data
x <- mtcars
# row dend:
hc_r <- as.dendrogram(hclust(dist(x)))
# col dend:
hc_c <- as.dendrogram(hclust(dist(t(x))))
# weights and reordering
wts_r <- rowSums(x)
wts_c <- colSums(x) # alternative: apply(x, 2, mean)
hc_r <- rev(reorder(hc_r, wts_r))
hc_c <- reorder(hc_c, wts_c)
x2 <- x[order.dendrogram(hc_r),
        order.dendrogram(hc_c)]
# plot
library(heatmaply)
heatmaply(x2, dendrogram = "none")
And we get a beautiful (and interactive) heatmap whose rows and columns follow the reordered dendrograms.
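If the dendrogram itself should stay on the plot, the reordered dendrogram objects can be passed to the plotting function instead of reordering the data by hand. Here is a minimal sketch with base R's heatmap(), again on mtcars; the names dend_r, dend_c and hm are illustrative. heatmap.2-style functions generally take the same Rowv and Colv arguments, but I have not checked this against heatmap.3.

```r
# Base R only: stats::heatmap() accepts dendrogram objects directly
x <- as.matrix(mtcars)

dend_r <- as.dendrogram(hclust(dist(x)))
dend_c <- as.dendrogram(hclust(dist(t(x))))

# reorder() only flips branches at the nodes, so the tree itself is unchanged.
# The weights must be given in the original row/column order of x.
dend_r <- reorder(dend_r, rowSums(x))
dend_c <- reorder(dend_c, colSums(x))

# A dendrogram passed as Rowv/Colv is used as-is and drawn next to the heatmap
hm <- heatmap(x, Rowv = dend_r, Colv = dend_c, scale = "column")
```

For the original question, the weights would be colSums(heatdata.scaled), passed as the second argument of reorder().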
I'm having trouble converting a JSON file (from an API) to a data frame in R. An example is the URL http://api.fantasy.nfl.com/v1/players/stats?statType=seasonStats&season=2010&week=1&format=json
I've tried a few different suggestions from S/O, including "convert json data to data frame in R", and various blog posts such as http://zevross.com/blog/2015/02/12/using-r-to-download-and-parse-json-an-example-using-data-from-an-open-data-portal/
The closest I've come is the code below, which gives me a large matrix with 4 "rows" and a bunch of "variables" (V1, V2, etc.). I'm assuming that this JSON file is in a different format than "normal" ones.
library(RCurl)   # provides getURL()
library(RJSONIO) # provides fromJSON()
raw_data <- getURL("http://api.fantasy.nfl.com/v1/players/stats?statType=seasonStats&season=2010&week=1&format=json")
data <- fromJSON(raw_data)
final_data <- do.call(rbind, data)
I'm pretty agnostic as to how to get it to work so any R packages/process are welcome. Thanks in advance.
The jsonlite package automatically picks up the dataframe:
library(jsonlite)
mydata <- fromJSON("http://api.fantasy.nfl.com/v1/players/stats?statType=seasonStats&season=2010&week=1&format=json")
names(mydata$players)
# [1] "id" "esbid" "gsisPlayerId" "name"
# [5] "position" "teamAbbr" "stats" "seasonPts"
# [9] "seasonProjectedPts" "weekPts" "weekProjectedPts"
head(mydata$players)
#        id     esbid gsisPlayerId                name position teamAbbr stats.1
# 1  100029     FALSE        FALSE San Francisco 49ers      DEF       SF      16
# 2     729 ABD660476   00-0025940     Husain Abdullah       DB       KC      15
# 3 2504171 ABR073003   00-0019546        John Abraham       LB               15
# 4 2507266 ADA509576   00-0025668       Michael Adams       DB               13
# 5 2505708 ADA515576   00-0022247          Mike Adams       DB      IND      15
# 6 1037889 ADA534252   00-0027610       Phillip Adams       DB      ATL      11
You can control this using the simplifyVector, simplifyDataFrame, simplifyMatrix and flatten arguments of jsonlite::fromJSON().
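To illustrate what the simplification does, here is a small sketch on a made-up JSON string (the ids, names and stats below are invented for the example and are not taken from the API). It assumes the jsonlite package is installed.

```r
library(jsonlite)

# Made-up JSON with the same general shape: an array of player objects,
# each holding a nested "stats" object of varying size
json <- '{"players":[
  {"id":"1","name":"A","stats":{"1":"16","2":"3"}},
  {"id":"2","name":"B","stats":{"1":"15"}}
]}'

# Default: the array of objects is simplified into a data frame
simplified <- fromJSON(json)

# No simplification: plain nested lists, one element per player
raw_list <- fromJSON(json, simplifyVector = FALSE)

# flatten = TRUE spreads the nested stats into ordinary columns
flat <- fromJSON(json, flatten = TRUE)

str(simplified$players)
str(flat$players)
```

Stats that a player does not have are filled with NA in the simplified versions.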
There's nothing "abnormal" about this JSON; it's just not a rectangular structure that fits trivially into a data frame. JSON can represent much richer data structures.
For example, using the rjson package:
> data = rjson::fromJSON(file="http://api.fantasy.nfl.com/v1/players/stats?statType=seasonStats&season=2010&week=1&format=json")
> length(data[[4]][[10]]$stats)
[1] 14
> length(data[[4]][[1]]$stats)
[1] 21
(data[[1]] to data[[3]] look like headers.)
The "stats" of the 10th element of data[[4]] has 14 elements, while the "stats" of the first has 21. How is that going to fit into a rectangular data frame? R has stored it in a list because a list is R's best way of storing irregular data structures.
Unless you can define a way of mapping the irregular data into a rectangular data frame, you can't store it in a data frame. Do you understand the structure of the data? That's essential.
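One possible mapping, as a sketch only, is a "long" table with one row per player and stat, so that players with different numbers of stats simply contribute different numbers of rows. The players list below is a hand-made stand-in with invented values in the same shape as data[[4]]; it is not real API output.

```r
# Hand-made stand-in for the parsed players list (values are invented)
players <- list(
  list(id = "100029", name = "San Francisco 49ers",
       stats = list("1" = "16", "50" = "1")),
  list(id = "729", name = "Husain Abdullah",
       stats = list("1" = "15"))
)

# One row per (player, stat) pair; the uneven stats lengths no longer matter
long <- do.call(rbind, lapply(players, function(p) {
  data.frame(
    id    = p$id,
    name  = p$name,
    stat  = names(p$stats),
    value = unlist(p$stats, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}))

print(long)
```

Whether this layout is appropriate depends on what the stat codes mean, which is why understanding the structure comes first.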
rjson and jsonlite export functions with the same name, such as fromJSON(), so whichever package is loaded last masks the other. For my purposes, rjson structures the data much better than jsonlite, so I make sure to load them in the right order, or to load only rjson.
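An alternative to relying on load order is to name the package explicitly at each call. A small sketch on a made-up JSON string; the requireNamespace() checks are only there so the snippet runs whether or not both packages are installed.

```r
json <- '{"a": [1, 2, 3], "b": {"c": "x"}}'

# pkg::fun always calls that package's function, whatever is attached
if (requireNamespace("jsonlite", quietly = TRUE)) {
  str(jsonlite::fromJSON(json))
}

if (requireNamespace("rjson", quietly = TRUE)) {
  str(rjson::fromJSON(json))
}
```

Comparing the two str() outputs also shows how differently the two parsers structure the same input.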
# Load jsonlite
library(jsonlite)
# Define quandl_url
quandl_url <- "https://www.quandl.com/api/v3/datasets/WIKI/FB/data.json?auth_token=i83asDsiWUUyfoypkgMz"
# Import the Quandl data
quandl_data <- fromJSON(quandl_url)
# quandl_data is a list; print it
quandl_data
I have a text file with the following lines:
{"time":"2015-11-15T17:56:45.300","x":93.32,"y":8.6,"s":4.57,"dis":0.45,"on_field":true,"game":{"references":[{"origin":"gsis","id":2015111500}]},"team":{"references":[{"origin":"gsis","id":"5110"}]},"play":{"references":[{"origin":"ngs","id":""}]},"references":[{"origin":"gsis","id":"00-0026189"}]}
{"time":"2015-11-15T17:56:45.400","x":93.77,"y":8.48,"s":4.55,"dis":0.47,"on_field":true,"game":{"references":[{"origin":"gsis","id":2015111500}]},"team":{"references":[{"origin":"gsis","id":"5110"}]},"play":{"references":[{"origin":"ngs","id":""}]},"references":[{"origin":"gsis","id":"00-0026189"}]}
{"time":"2015-11-15T17:56:45.500","x":94.23,"y":8.36,"s":4.53,"dis":0.47,"on_field":true,"game":{"references":[{"origin":"gsis","id":2015111500}]},"team":{"references":[{"origin":"gsis","id":"5110"}]},"play":{"references":[{"origin":"ngs","id":""}]},"references":[{"origin":"gsis","id":"00-0026189"}]}
{"time":"2015-11-15T17:56:45.600","x":94.67,"y":8.23,"s":4.51,"dis":0.46,"on_field":true,"game":{"references":[{"origin":"gsis","id":2015111500}]},"team":{"references":[{"origin":"gsis","id":"5110"}]},"play":{"references":[{"origin":"ngs","id":""}]},"references":[{"origin":"gsis","id":"00-0026189"}]}
{"time":"2015-11-15T17:56:45.700","x":95.1,"y":8.08,"s":4.5,"dis":0.46,"on_field":true,"game":{"references":[{"origin":"gsis","id":2015111500}]},"team":{"references":[{"origin":"gsis","id":"5110"}]},"play":{"references":[{"origin":"ngs","id":""}]},"references":[{"origin":"gsis","id":"00-0026189"}]}
I am trying to extract the date, time, x, y, s, and dis variables and save them in an R data frame. I think I can find a way to clean it with a shell script then read it in R but I was hoping there is some nice trick to do this in R only. Thanks
Each of your lines appears to be in JSON format (but not the whole file, so we cannot just parse it as such). You could parse each line into a list and then collect the results in a list:
library(jsonlite)
res <- readLines("test.txt")
allofit <- lapply(res, fromJSON)
which will give you a list of lists (of lists ...) containing all your data.
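From there, the requested fields can be pulled out of each parsed line and bound into a data frame. A minimal sketch, assuming jsonlite is installed; the two input lines are abbreviated copies of the question's first two lines (the nested game/team/play parts are dropped for brevity), and with the real file they would come from readLines("test.txt").

```r
library(jsonlite)

# Abbreviated versions of the first two lines from the question
lines <- c(
  '{"time":"2015-11-15T17:56:45.300","x":93.32,"y":8.6,"s":4.57,"dis":0.45,"on_field":true}',
  '{"time":"2015-11-15T17:56:45.400","x":93.77,"y":8.48,"s":4.55,"dis":0.47,"on_field":true}'
)

records <- lapply(lines, fromJSON)

# Keep only the scalar fields of interest, one row per line
fields <- c("time", "x", "y", "s", "dis")
df <- do.call(rbind, lapply(records, function(r) {
  as.data.frame(r[fields], stringsAsFactors = FALSE)
}))

# Split the ISO timestamp into separate date and time columns
df$date <- substr(df$time, 1, 10)
df$time <- substr(df$time, 12, 23)

print(df)
```

jsonlite::stream_in() is another option for line-delimited JSON, though it also keeps the nested columns.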
I am trying to plot a graph using R which is populated by MySQL query results. I have the following code:
rs = dbSendQuery(con, "SELECT BuildingCode, AccessTime from access")
data = fetch(rs, n=-1)
x = data[,1]
y = data[,2]
cat(colnames(data),x,y)
This gives me an output of:
BuildingCode AccessTime TEST-0 TEST-1 TEST-2 TEST-3 TEST-4 14:40:59 07:05:00 20:10:59 08:40:00 07:30:59
But this is where I get stuck. I have no idea how to pass the "cat" data into an R plot. I have spent hours searching online, and most of the examples of R plots I have come across use read.table(text=""). This is not feasible for me, as the data has to come from a database and not be hard coded. I also found something about saving the output as a CSV, but MySQL cannot overwrite existing files, so after the code was executed once I was unable to run it again because the file already existed.
My question is, how can I use the "cat" data (or another way of doing it if there is a better way) to plot a graph using data that isn't hard coded?
Note: I am using RApache as my web server and I have installed the Brew package.
Make the plot in R, save it to a file, and just pass the path to that file back with cat:
<%
## Your other code to get the data, assuming it gets a data.frame called data
## Plot code
library(Cairo)
myplotfilename <- "/path/to/dir/myplot.png"
CairoPNG(filename = myplotfilename, width = 480, height = 480)
plot(x=data[,1],y=data[,2])
tmp <- dev.off()
cat(myplotfilename)
%>
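Note that both columns come back from the query as text, so they may need converting before plot() can use them. A rough sketch outside of brew, using the five rows printed in the question as a stand-in for the query result, and base png() in place of CairoPNG; converting the times to hours since midnight and drawing a bar chart are assumptions about what the plot should show.

```r
# Stand-in for the fetched query result (values copied from the printed output)
data <- data.frame(
  BuildingCode = c("TEST-0", "TEST-1", "TEST-2", "TEST-3", "TEST-4"),
  AccessTime   = c("14:40:59", "07:05:00", "20:10:59", "08:40:00", "07:30:59"),
  stringsAsFactors = FALSE
)

# Convert "HH:MM:SS" strings to hours since midnight
parts <- do.call(rbind, strsplit(data$AccessTime, ":", fixed = TRUE))
data$hours <- as.numeric(parts[, 1]) +
  as.numeric(parts[, 2]) / 60 +
  as.numeric(parts[, 3]) / 3600

# Write the plot to a file and hand the path back, as in the answer above
plot_file <- tempfile(fileext = ".png")
png(filename = plot_file, width = 480, height = 480)
barplot(data$hours, names.arg = data$BuildingCode,
        ylab = "Access time (hours since midnight)")
invisible(dev.off())
cat(plot_file)
```

In the brew template, the data frame would come from fetch(rs, n=-1) and plot_file would point to a directory the web server can serve.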