I am using R with jsonlite to get data back from a URL. It's pretty straightforward, except that when I view the URL in a browser there are 50 results, but when I view the results from jsonlite there are only 25 results in my data set. I have checked the jsonlite documentation and can't find any parameters that would indicate paging or limits of any kind. Has anyone seen this before? The code I'm using is pretty straightforward, but I'm including it anyway. I've already checked the data in between the two steps: the fromJSON call itself only returns 25 rows, so it isn't the flatten step.
library(jsonlite)
url <- "https://myjson"
mydata <- fromJSON(url)
mydata <- flatten(mydata$mydataframe, recursive = TRUE)
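For what it's worth, jsonlite itself doesn't truncate results, so a 25-row cap usually means the API is paginating its responses. A minimal sketch of fetching and combining multiple pages, assuming hypothetical page and per_page query parameters (the real names depend on the API's documentation):
library(jsonlite)

# "page" and "per_page" are assumed parameter names -- check the API docs
pages <- lapply(1:2, function(p) {
  resp <- fromJSON(paste0("https://myjson?page=", p, "&per_page=25"))
  flatten(resp$mydataframe, recursive = TRUE)
})
mydata <- do.call(rbind, pages)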
I'm trying to connect to an online database through R, which can be found here:
http://open.canada.ca/data/en/dataset/2270e3a4-447c-45f6-8e63-aea9fe94948f
How would I be able to load the data table into R, and then simply change the table name in my code to access other tables? I'm not particularly concerned with which format I need to use (JSON, JSON-LD, XML).
Thanks in advance!
Assuming you know the URLs for each of the datasets, a similar question can be found here:
Download a file from HTTPS using download.file()
For this it becomes:
library(RCurl)
URL <- "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng"
x <- getURL(URL)                                          # fetch the raw CSV text over HTTP
URLout <- read.csv(textConnection(x), row.names = NULL)   # parse it into a data frame
I obtained the URL by right-clicking the access button and copying the address.
I had to declare row.names=NULL because the number of columns in the first row is not equal to the number of columns elsewhere, so read.csv assumes the first column holds row names, as described here. I'm not sure whether the URL to these datasets changes when they are updated, but either way this isn't a very convenient way to get the data. The JSON doesn't seem much better for intuitively switching between datasets.
At least this way you could create a list of URLs and perform the following:
URL <- list(
  getURL("http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng"),
  getURL("http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr72-eng.htm&lan=eng")
)
URLout <- lapply(URL, function(x) read.csv(textConnection(x), row.names = NULL, skip = 2))
Again, I don't like having to declare row.names=NULL, and when I look at the file I can't see the discrepancy in the number of columns; however, this will at least get the file into the R environment for you. It may take some more work to perform the operation over multiple URLs, as sketched below.
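As a sketch of that multi-URL case, a named vector of URLs lets you pull each table out by name afterwards (the two URLs below are just the ones already used above):
library(RCurl)

# named vector of dataset URLs; the names are arbitrary labels you choose
urls <- c(
  labr71a = "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng",
  labr72  = "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr72-eng.htm&lan=eng"
)

# fetch and parse each one; skip = 2 and row.names = NULL as discussed above
tables <- lapply(urls, function(u) read.csv(textConnection(getURL(u)), row.names = NULL, skip = 2))

# access a single table by name, e.g.
head(tables[["labr71a"]])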
In a further effort to obtain useful colnames:
URL <- "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng"
x <- getURL(URL)
URLout <- read.csv(textConnection(x), row.names = NULL, skip = 2)
The argument skip = 2 skips the first 2 rows when reading in the CSV and yields some header names. Because the headers are numbers, an X will be placed in front of them. Row 2 in this case will have the value "number" in the second column. Unfortunately, it appears this data was intended for use within Excel, which is really sad.
1) You need to download the CSV into some directory that you have access to.
2) Use read.csv, read_csv, or fread to read that CSV file into R:
yourTableName <- read.csv("C:/..../canadaDataset.csv")
3) You can assign that CSV to whatever object name you want. A short sketch of all three steps follows below.
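A minimal sketch of those three steps (the destination file name is just a placeholder; the URL is the StatCan one used in the earlier answer):
# 1) download the CSV to a local file you have access to
download.file(
  "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng",
  destfile = "canadaDataset.csv"
)

# 2) and 3) read it into an object with whatever name you want
yourTableName <- read.csv("canadaDataset.csv", row.names = NULL)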
I have a JSON file that contains a multi-layered list (already parsed text). Buried within the list there is a layer that includes several calculations that I need to average. I have code to do this for each line individually, but that is not very time efficient.
mean(json_usage$usage_history[[1]]$used[[1]]$lift)
This returns an average for the numbers in the lift layer of the list for the 1st row. As mentioned, this isn't time efficient when you have a dataset with multiple rows. Unfortunately, I haven't had much success in using either a loop or lapply to do this on the entire dataset.
This is what happens when I try the for loop:
for(i in json_usage$usage_history[[i]]$used[[1]]$lift){
  json_usage$mean_lift <- mean(json_usage$usage_history[[i]]$used[[1]]$lift)
}
Error in json_usage$usage_history[[i]] :
subscript out of bounds
This is what happens when I try lapply:
mean_lift <- lapply(lift_list, mean(lift_list$used$lift))
Error in match.fun(FUN) :
'mean(lift_list$used$lift)' is not a function, character or symbol
In addition: Warning message:
In mean.default(lift_list$used$lift) :
argument is not numeric or logical: returning NA
I am new to R, so I know I am likely doing it wrong, but I haven't found any examples of what I'm trying to do. I'm running out of ideas and growing increasingly frustrated. Please help!
Thank you!
The jsonlite package has a very useful function called flatten that you can use to convert the nested lists that commonly appear when parsing JSON data to a more usable dataframe. That should make it simpler to do the calculations you need.
Documentation is here: https://cran.r-project.org/web/packages/jsonlite/jsonlite.pdf
For an answer to a vaguely similar question I asked (though my issue was with NA data within JSON results), see here: Converting nested list with missing values to data frame in R
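If flattening turns out to be awkward for this particular structure, an sapply over the rows is another option; a minimal sketch, assuming (as in the question) that each element of json_usage$usage_history contains a used list whose first element has a numeric lift vector:
# mean lift for every row at once, instead of one row at a time
json_usage$mean_lift <- sapply(
  json_usage$usage_history,
  function(h) mean(h$used[[1]]$lift, na.rm = TRUE)
)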
I am parsing JSON objects from pipl.com.
Specifically, I am passing a CSV of contacts to the API using lapply with fromJSON from the jsonlite library. Then I want to cbind specific elements into a flat dataframe. I have tried mapply, sapply, and lapply followed by rbind as below, but this isn't working as I expect for any elements other than the ones below. I have tried it individually using the 'mini.test[1]$records$user_ids' syntax, but the original contacts dataframe has hundreds of records, so I was thinking a loop would be able to extract the elements I want.
I am looking to find only the user names for linkedin, facebook and twitter for each user. Thus I was thinking some sort of grepl would help me subset it. I have that vector created and posted the code below too.
I have read multiple R-bloggers articles on the different "apply" functions, looked at the R Cookbook PDF, and read questions on Stack Overflow. I am stumped, so I really appreciate any help.
library(jsonlite)
#sample data
first<-c('Victor','Steve','Mary')
last<-c('Arias','Madden','Johnson')
contacts<-cbind(first,last)
#make urls
urls<-paste('http://api.pipl.com/search/v3/json/?first_name=',contacts[,1],'%09&last_name=',contacts[,2],'&pretty=True&key=xxxxxxx', sep='')
#Parse api
mini.test<-lapply(urls,fromJSON,simplifyMatrix=TRUE,flatten=TRUE)
#Data frame vector name
names <- do.call(rbind, lapply(mini.test, "[[", 5))
display <-do.call(rbind, lapply(names, "[[", 3))
#Grepl for 3 sources
records <- lapply(mini.test, "[[", 7)
twitter <-grepl("twitter.com",records,ignore.case = TRUE)
facebook <-grepl("facebook.com",records,ignore.case = TRUE)
linkedin <-grepl("linkedin.com",records,ignore.case = TRUE)
I know from pipl's response that contacts may have multiple profile user names. For this purpose I just need them unlisted as a string, not a nested list in the dataframe. In the end I would like a flat file that looks like the one below. Again, I sincerely appreciate the help; I have been reading about this for 3 days without much success.
twitter <- c('twitter.username1','twitter.username2','NA')
linkedin <- c('linkedin.username1','linedin.username2','linkedin.username3')
facebook <- c('fb1','fb2','fb3,fb3a')
df<-cbind(display,twitter,linkedin,facebook)
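Without knowing pipl's exact response layout, here is one hedged sketch of the grepl idea: given a character vector of profile URLs for a single contact, keep the ones matching a domain and strip everything up to the last slash to get the user name (the urls vector below is made-up example data, not actual pipl output):
# made-up example input: profile URLs for one contact
urls <- c("https://twitter.com/twitter.username1",
          "https://www.linkedin.com/in/linkedin.username1",
          "https://facebook.com/fb1")

extract_user <- function(urls, domain) {
  hits <- urls[grepl(domain, urls, ignore.case = TRUE)]
  # drop everything up to the final "/" and collapse multiple matches into one string
  paste(sub(".*/", "", hits), collapse = ",")
}

twitter  <- extract_user(urls, "twitter.com")
facebook <- extract_user(urls, "facebook.com")
linkedin <- extract_user(urls, "linkedin.com")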
I'm attempting to visualize a cluster tree using this awesome D3 layout! However, it needs data in JSON format - how do I go from an hclust object in R to a hierarchical JSON structure?
set.seed(123)
m <- matrix(runif(100), nrow=10)
cl <- hclust(dist(m))
plot(cl)
My googling turned up this hclustToTree function which returns a list that looks promising - but I don't really know where to go from there. Any advice would be much appreciated.
halfway <- hclustToTree(cl)
You're almost there:
library(RJSONIO)            # toJSON comes from the RJSONIO package
jsonTree <- toJSON(halfway)
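To hand the result to the D3 layout, the JSON string typically needs to be written to a file the page can load; a minimal sketch (the file name is just an example):
# save the JSON so the D3 page can fetch it, e.g. via d3.json("tree.json", ...)
cat(jsonTree, file = "tree.json")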
In the Blekko search engine you can get the search results in JSON format, e.g. with the search term 'lifehacker':
http://blekko.com/ws/?q=lifehacker+%2Fjson
How could you carry out this query from R and parse the content?
[There is a URL, an RSS URL, and a snippet with the main text.]
I have tried packages tm.plugin.webmining and boilerpipeR, but couldn't figure it out.
Using the RCurl and RJSONIO packages is a handy way to retrieve and parse the JSON results:
library(RCurl)
library(RJSONIO)
doc <- getURL('http://blekko.com/ws/?q=lifehacker+%2Fjson')   # fetch the raw JSON response
doc.ll <- fromJSON(doc)                                       # parse it into an R list
Then you can check the result like this:
doc.ll$RESULT
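From there, individual fields can be pulled out of each hit; a sketch assuming each element of doc.ll$RESULT is itself a list with url and snippet entries (those field names are an assumption, so check names(doc.ll$RESULT[[1]]) first):
# inspect the first result to see which fields are actually present
names(doc.ll$RESULT[[1]])

# assuming "url" and "snippet" fields exist, collect them across all results
urls     <- sapply(doc.ll$RESULT, function(r) r[["url"]])
snippets <- sapply(doc.ll$RESULT, function(r) r[["snippet"]])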