Separate multiple JSON data in R - json

I am new to R and am working with the JSON file below (a snippet of the head and a relevant code example).
{"mdsDat":{"x":[0.098453,-0.19334,-0.23836,-0.28512,0.010195,0.14132,-0.026636,-0.17141,
0.082936,-0.030503,0.22893,0.097832,0.19978,0.048286,0.050141,0.026101,-0.10637,0.040702,
0.013531,0.013531],"y":[-0.21144,-0.25048,0.14525,-0.06405,0.16668,-0.066238,-0.23403,
0.17033,-0.037128,-0.019674,0.0089501,0.0069049,0.10143,-0.14445,0.052727,0.15911,0.049328,
0.074852,0.045969,0.045969],"topics":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
"Freq":[16.358,13.397,12.979,10.383,7.5134,7.16,6.1765,4.9584,4.6035,3.4624,3.4249,3.0709,
1.8512,1.8512,1.4977,0.90723,0.23895,0.16034,0.0031352,0.0031352],
"cluster":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]},
"tinfo":{"Term":["equation","equations","mathematics","beauty","mathematical","people",
"beautiful","explain","world","time","understand","science","things","meaning","language",
"symbols","simple","life","nature","interesting","art","agree","movie","find","numbers",
"explore","mass","relationship","video","scientists","agree","scientists","amazing","learn",
"apply","artistic","common","fear","beautiful","mathematics","study","mathematical","science",
"meaning","physics","gravity","exchange","math","world","future","explained","sense",
"process","words","equations","experience","move","faster","eyes","fall","nature","power",
"human","exam","things","answer","people","world","ways","truth","equations","video",
"balance","painting","space"
...
"token.table":{"Term":["0","1","2","2","abstract","abstract","addition","admire","agree",
"amazing","answer","answer","apple","application","applied","applied","apply","art","artist",
"artist","artistic","arts","balance","balance","balance","beautiful","beautiful","beautiful",
"beautiful","beauty","beauty","bring","bring","bunch","bunch","calculate","calculation",
"collings","collings","collings","common","complex","complex","complex","contact","curiosity",
"curiosity","daily","difficult","discover","documentary","documentary","documentary","earth",
"earth","einstein","energy","energy","english","english","enjoy","enjoy","enjoy","equation",
"equation","equation","equation","equation","equations","equations","equations","equations",
"exam","exam","examination","examination","exchange","exchange","exchange","experience",
"experience","explain","explained","explained","explore","eyes","eyes","fact","fall","famous",
"famous","faster","faster","fear","feel","film","film","find","find","force","formula","formula","found",
...
"work","world","world","world","worlds","years","years"],"Topic":[8,5,11,13,8,10,5,4,1,1,2,
15,9,10,9,12,1,3,4,9,1,5,2,4,10,1,4,7,14,2,6,3,15,10,15,
...
,16,3,7,2,14,2,5,1,8,4,9,10,15,1,2,14,9,11,13],"Freq":[0.97036,0.9702,0.75081,0.25027,0.22141,
0.77494,0.97584,0.96609,0.99493,0.98083,0.73954,0.24651,0.99013,0.
...
In my project, I created a variable getJSONfield as follows:
getJSONfield <- json %>%
spread_values(jsonList = jstring("token.table")) %>%
select(jsonList)
It returns a JSON list that looks something like this:
jsonNodes
1 list(Term = list("0", "1", "1", "2", "Data of JSON"),
Topic = list(9, 1, 10,"Data of JSON"),
Freq = list(0.99834, "Data of JSON"))
I need to separate the individual variables (i.e. Term, Topic and Freq) so that I can use them as the nodes and edges of a network diagram. This is roughly how I would like to use the JSON data:
jsonNode <- lapply(json$topic, header=T, as.is=T)
jsonTermsLinkS <- lapply(json$term, header=T, as.is=T)
jsonTermsLinkE <- lapply(json$freq, header=T, as.is=T)
But first I need to separate or access them successfully. Does anyone have any ideas or advice on this?
Many thanks to anyone who can help!

Related

Assigning Variables from JSON in Python

I've searched across dozens of answers for the last week but I haven't been able to find an example of what I'm trying to do, happy to be pointed to something that I've missed, and I'm new to Python so I apologise if this is something trivial.
I'm trying to read in a configuration from a JSON file so that I can abstract the configuration from the script itself.
I want to be able to assign the configuration value to a variable and perform an action on it, before moving on to the next category in a nested list, of which the categories could change/expand over time (music, pictures, etc).
The JSON file (library.json) currently looks like this:
{"media":{
"tv": [{
"source": "/tmp/tv",
"dest": "/tmp/dest"
}],
"movies": [{
"source": "/tmp/movies",
"dest": "/tmp/dest"
}]
}}
The relevant script looks like this:
import json
with open(libfile) as data_file:
    data = json.load(data_file)
for k, v in (data['media']['tv']):
    print (k, v)
What I was hoping to see as output was:
dest /tmp/dest
source /tmp/tv
What I am seeing is:
dest source
It feels like I'm missing something simple.
This works:
import json
with open('data.json') as json_file:
    data = json.load(json_file)
for p in data['media']['tv']:
    dst = (p['dest'])
    src = (p['source'])
    print (src, dst)
Something like this? Using f-strings and zip() that will aggregate elements.
import json
with open("dummy.json") as data_file:
    data = json.load(data_file)
for i, j in data["media"].items():
    print(i)
    print("\n".join(f'{str(k)} {str(l)}' for k,l in list(zip(j[0].keys(), j[0].values()))))
    print("\n")
Output:
tv
source /tmp/tv
dest /tmp/dest
movies
source /tmp/movies
dest /tmp/dest
The problem here is that data['media']['tv'] is actually a list of dictionaries.
You can tell because it looks like this: "movies": [{.. (Note the bracket [)
That means that instead of this:
for k, v in (data['media']['tv']):
    print (k, v)
You should be doing this:
for dct in (data['media']['tv']):
    for k, v in dct.items():
        print(k, v)
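As a minimal sketch of the same idea extended to every category (not only tv), assuming the library.json layout shown in the question: the helper name iter_media_paths is hypothetical, and the JSON is inlined here only so the example is self-contained.

```python
import json

# Inline copy of the library.json structure from the question (an assumption
# made so the sketch runs without a file on disk).
LIBRARY_JSON = """
{"media": {
    "tv": [{"source": "/tmp/tv", "dest": "/tmp/dest"}],
    "movies": [{"source": "/tmp/movies", "dest": "/tmp/dest"}]
}}
"""


def iter_media_paths(config):
    """Yield (category, source, dest) for every entry in every category.

    Hypothetical helper: each category maps to a list of dictionaries,
    so two nested loops are needed.
    """
    for category, entries in config["media"].items():
        for entry in entries:
            yield category, entry["source"], entry["dest"]


config = json.loads(LIBRARY_JSON)
pairs = list(iter_media_paths(config))
for category, source, dest in pairs:
    print(category, source, dest)
```

Because the outer loop walks data["media"].items(), new categories added to the file later (music, pictures, and so on) should be picked up without changing the script, provided they keep the same list-of-dictionaries shape.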

How to use/convert my R vector object in a REST (rest.ensembl.org) API query?

Good morning.
I want to use the following REST endpoint: https://rest.ensembl.org/documentation/info/sequence_id_post
I have the vector object (ids) in R:
> ids
[1] "NM_007294.3:c.932_933insT" "NM_007294.3:c.1883C>T" "NM_007294.3:c.2183A>C"
[4] "NM_007294.3:c.2321C>T" "NM_007294.3:c.4585G>A" "NM_007294.3:c.4681C>A"
I need to pass this vector (ids), which has more than 200 elements, in the body argument, following the example code below, which works:
Code:
library(httr)
library(jsonlite)
library(xml2)
server <- "https://rest.ensembl.org"
ext <- "/vep/human/hgvs"
r <- POST(paste(server, ext, sep = ""), content_type("application/json"), accept("application/json"), body = '{ "hgvs_notations" : ["NM_007294.3:c.932_933insT", "NM_007294.3:c.1883C>T"] }')
stop_for_status(r)
head(fromJSON(toJSON(content(r))))
I know it's a json format, but when I convert my variable ids to json it's not in the correct format.
Do you have any suggestions?
Thanks for any help.
Leandro
I think that NM_007294.3:c.2321C>T is not a valid query to the /sequence/id REST endpoint. It contains a sequence id (NM_007294.3) and a variant (c.2321C>T), and taken literally you are asking the server for the letter T, since this call returns sequences.
A valid query would contain only sequence ids, and you could send it like this (provided you have your ids in a vector):
r <- POST(paste(server, ext, sep = ""), content_type("application/json"), accept("application/json"), body = paste0('{ "ids" :', jsonlite::toJSON(ids), ' }'))
Depending on the downstream scenario, making your ids unique might help/speed things up.

Is it possible, in R, to access the values of a list with a for loop on the names of the fields?

I have a big json file, containing 18 fields, some of which contain some other subfields. I read the file in R in the following way:
json_file <- "daily_profiles_Bnzai_20150914_20150915_20150914.json"
data <- fromJSON(sprintf("[%s]", paste(readLines(json_file), collapse=",")))
This gives me a giant list with all the fields contained in the json file. I want to make it into a data.frame and do some operations in the meantime. For example if I do:
doc_length <- data.frame(t(apply(as.data.frame(data$doc_lenght_map), 1, unlist)))
os <- data.frame(t(apply(as.data.frame(data$operating_system), 1, unlist)))
navigation <- as.data.frame(data$navigation)
monday <- data.frame(t(apply(navigation[,grep("Monday",names(data$navigation))],1,unlist)))
Monday <- data.frame(apply(monday, 1, sum))
this works fine: I get what I want, with all the right subfields, and then I want to join them in a final data.frame that I will use for other operations.
Now, I'd like to do something like that on the subset of fields where I don't need to do operations. So, for example, the days of the week contained in navigation are not included. I'd like to have something like (suppose I have a data.frame df):
for(name in names(data))
{
df <- cbind(df, data.frame(t(apply(as.data.frame(data$name), 1, unlist)))
}
The above loop gives me errors. So, what I want is to find a way to access all the fields of the list automatically, as in the loop, where the iterator "name" takes on each field of the list in turn, without having to call them individually, and then do some operations with those fields. I even tried
for(name in names(data))
{
df <- cbind(df, data.frame(t(apply(as.data.frame(data[name]), 1, unlist)))
}
but it doesn't take all of the subfields. I also tried with
data[, name]
but it doesn't work either. So I think I need to use the "$" operator.
Is it possible to do something like that?
Thank you a lot!
Davide
Like the other commenters, I am confused, but I will throw this out to see if it might point you in the right direction.
# make mtcars a list as an example
data <- lapply(mtcars,identity)
do.call(
  cbind,
  lapply(
    names(data),
    function(name){
      data.frame(data[name])
    }
  )
)

R, GeoJSON and Leaflet

I recently learned about leafletjs.com from an R-Bloggers.com post. One tutorial I would like to reproduce is creating interactive choropleth maps with Leaflet (http://leafletjs.com/examples/choropleth.html). I have been using the rjson package for R to create the data.js file to be read by Leaflet. Although I have successfully used the provided shapefile as a readable JSON file in Leaflet, I am unable to repeat the process when trying to merge additional properties from the data frame ("data.csv") into the JSON file; in this case, I have used R's GIS tools to attach data on the number of cans in each school listed in the data frame. What I would like to achieve is a choropleth map in Leaflet that displays each high school district (as identified by the NAME variable) and the sum of "cans". The issue, I believe, is that writeOGR exports the information as points rather than polygons?
{
"type": "Feature",
"properties": {
"name": "Alabama",
"density": 94.65
},
"geometry": ...
...
}
###load R scripts from dropbox
dropbox.eval <- function(x, noeval=F) {
require(RCurl)
intext <- getURL(paste0("https://dl.dropboxusercontent.com/",x), ssl.verifypeer = FALSE)
intext <- gsub("\r","", intext)
if (!noeval) eval(parse(text = intext), envir= .GlobalEnv)
return(intext)
}
##pull scripts from dropbox
dropbox.eval("s/wgb3vtd9qfc9br9/pkg.load.r")
dropbox.eval("s/tf4ni48hf6oh2ou/dropbox.r")
##load packages
pkg.load(c(ggplot2,plyr,gdata,sp,maptools,rgdal,reshape2,rjson))
###setup data frames
dl_from_dropbox("data.csv","dx3qrcexmi9kagx")
data<-read.csv(file='data.csv',header=TRUE)
###prepare GIS shape and data for plotting
dropbox.eval("s/y2jsx3dditjucxu/dlshape.r")
temp <- tempfile()
dlshape(shploc="http://files.hawaii.gov/dbedt/op/gis/data/highdist_n83.shp.zip", temp)
shape<- readOGR(".","highdist_n83") #HDOE high school districts
shape@proj4string
shape2<- spTransform(shape, CRS("+proj=longlat +datum=NAD83"))
data.2<-ddply(data, .(year, schoolcode, longitude, latitude,NAME,HDist,SDist), summarise,
total = sum(total),
cans= sum(cans))
###merging back shape properties and data frame
coordinates(data.2) <-~longitude + latitude
shape2@data$id <- rownames(shape2@data)
sh.df <- as.data.frame(shape2)
sh.fort <- fortify(shape2 , region = "id" )
sh.line<- join(sh.fort, sh.df , by = "id" )
mapdf <- merge( sh.line , data.2 , by.x= "NAME", by.y="NAME" , all=TRUE)
mapdf <- mapdf[ order( mapdf$order ) , ]
###exporting merged data frame as JSON
mapdf.sp <- mapdf
coordinates(mapdf.sp) <- c("long", "lat")
writeOGR(mapdf.sp, "hssra.geojson","mapdf", driver = "GeoJSON")
However, it appears that my features keep repeating themselves. How can I aggregate the feature information so that it looks more like the following:
var statesData = {"type":"FeatureCollection","features":[
{"type":"Feature","id":"01","properties":{"name":"Alabama","density":94.65},
"geometry":{"type":"Polygon","coordinates":[[[-87.359296,35.00118],
[-85.606675,34.984749],[-85.431413,34.124869],[-85.184951,32.859696],
[-85.069935,32.580372],[-84.960397,32.421541],[-85.004212,32.322956],
[-84.889196,32.262709],[-85.058981,32.13674],[-85.053504,32.01077],[-85.141136,31.840985],
[-85.042551,31.539753],[-85.113751,31.27686],[-85.004212,31.003013],[-85.497137,30.997536],
[-87.600282,30.997536],[-87.633143,30.86609],[-87.408589,30.674397],[-87.446927,30.510088],
[-87.37025,30.427934],[-87.518128,30.280057],[-87.655051,30.247195],[-87.90699,30.411504],
[-87.934375,30.657966],[-88.011052,30.685351],[-88.10416,30.499135],[-88.137022,30.318396],
[-88.394438,30.367688],[-88.471115,31.895754],[-88.241084,33.796253],
[-88.098683,34.891641],[-88.202745,34.995703],[-87.359296,35.00118]]]}},
{"type":"Feature","id":"02","properties":{"name":"Alaska","density":1.264},
"geometry":{"type":"MultiPolygon","coordinates":[[[[-131.602021,55.117982],
[-131.569159,55.28229],[-131.355558,55.183705],[-131.38842,55.01392],
[-131.645836,55.035827],[-131.602021,55.117982]]],[[[-131.832052,55.42469],
[-131.645836,55.304197],[-131.749898,55.128935],[-131.832052,55.189182],
[-131.832052,55.42469]]],[[[-132.976733,56.437924],[-132.735747,56.459832],
[-132.631685,56.421493],[-132.664547,56.273616],[-132.878148,56.240754],
[-133.069841,56.333862],[-132.976733,56.437924]]],[[[-133.595627,56.350293],
I ended up solving this question.
What I did was join the data.2 data frame to the shapefile's data slot:
shape2@data <- join(shape2@data, data.2)
and then use writeOGR from the rgdal package to write the result in JSON format (using the JSON driver) with the *.js extension.
I hope this helps others.

Using \Sexpr{} in LaTeX tabular environment

I am trying to use \Sexpr{} to include values from my R objects in a LaTeX table. I am essentially trying to replicate the summary output of a lm object in R because xtable's built in methods xtable.lm and xtable.summary.lm don't seem to include the Fstats, adjusted R-squared, etc (all the stuff at the bottom of the summary printout of the lm object in R console) So I tried accomplishing this by building a matrix to replicate the xtable.summary.lm output then construct a data frame of the relevant info for the extra stuff so I can refer to the values using \Sexpr{}. I tried doing this by using add.to.row to append the \multicolumn{} command in order to merge all columns of the last row of the LaTeX table and then just pass all the information I need into that cell of the table.
The problem is that I get an "Undefined control sequence" for the \Sexpr{} expression in the \multicolumn{} expression. Are these two not compatible? If so, what am I doing wrong and if not does anyone know how to do what I am trying to do?
Thanks,
Here is the relevant part of my code:
<<Test, results=tex>>=
model1 <- lm(stndfnl ~ atndrte + frosh + soph)
# Build matrix to replicate xtable.summary.lm output
x <- summary(model1)
colnames <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
rownames <- c("(Intercept)", attr(x$terms, "term.labels"))
fpval <- pf(x$fstatistic[1],x$fstatistic[2], x$fstatistic[3], lower.tail=FALSE)
mat1 <- matrix(coef(x), nrow=length(rownames), ncol=length(colnames), dimnames=list(rownames,colnames))
# Make a data frame for extra information to be called by \Sexpr in last row of table
residse <- x$sigma
degf <- x$df[2]
multr2 <- x$r.squared
adjr2 <- x$adj.r.squared
fstat <- x$fstatistic[1]
fstatdf1 <- x$fstatistic[2]
fstatdf2 <- x$fstatistic[3]
extradat <- data.frame(v1 = round(residse,4), v2 =degf, v3=round(multr2,4), v4=round(adjr2,4),v5=round(fstat,3), v6=fstatdf1, v7=fstatdf2, v8=round(fpval,6))
addtorow<- list()
addtorow$pos <-list()
addtorow$pos[[1]] <- dim(mat1)[1]
addtorow$command <-c('\\hline \\multicolumn{5}{l}{Residual standard error:\\Sexpr{extradat$v1}} \\\\ ')
print(xtable(mat1, caption="Summary Results for Regression in Equation \\eqref{model1} ", label="tab:model1"), add.to.row=addtorow, sanitize.text.function=NULL, caption.placement="top")
You don't need to have Sexpr in your R code; the R code can use the expressions directly. Sexpr is not a LaTeX command, even though it looks like one; it's an Sweave command, so it doesn't work to have it as output from R code.
Try
addtorow$command <-paste('\\hline \\multicolumn{5}{l}{Residual standard error:',
extradat$v1, '} \\\\ ')
Also, no need to completely recreate the matrix used by xtable, you can just build on the default output. Building on what you have above, something like:
mytab <- xtable(model1, caption="Summary Results", label="tab:model1")
addtorow$pos[[1]] <- dim(mytab)[1]
print(mytab, add.to.row=addtorow, sanitize.text.function=NULL,
caption.placement="top")
See http://people.su.se/~lundh/reproduce/sweaveintro.pdf for an example which you might be able to use as is.