R web/text mining - web query JSON read - json

In the Blekko search engine you can get the search results in JSON format, e.g. with the search term 'lifehacker':
http://blekko.com/ws/?q=lifehacker+%2Fjson
How could you carry out this query from R and parse the content?
[Each result has a URL, an RSS URL and a snippet with the main text.]
I have tried packages tm.plugin.webmining and boilerpipeR, but couldn't figure it out.

The RCurl and RJSONIO packages are a handy way to retrieve and parse JSON results:
library(RCurl)
library(RJSONIO)
doc <- getURL('http://blekko.com/ws/?q=lifehacker+%2Fjson')
doc.ll <- fromJSON(doc)
Then you can inspect the results like this:
doc.ll$RESULT
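The blekko.com endpoint may no longer respond, so here is a self-contained sketch that parses a hard-coded string instead of a live query. The RESULT key matches the answer above. The url, rss and snippet field names are assumptions based on the question's description, not Blekko's documented schema. The sketch uses jsonlite rather than RJSONIO because jsonlite's fromJSON turns an array of objects directly into a data frame.

```r
library(jsonlite)

# Hypothetical payload shaped like the question describes: a RESULT array
# whose entries carry a URL, an RSS URL and a snippet.
# Field names are assumptions, not Blekko's documented schema.
doc <- '{"RESULT": [
  {"url": "http://lifehacker.com/", "rss": "http://lifehacker.com/rss", "snippet": "Tips and downloads"},
  {"url": "http://lifehacker.com/about", "rss": "http://lifehacker.com/rss", "snippet": "About the site"}
]}'

parsed <- fromJSON(doc)       # an array of objects is simplified to a data frame
results <- parsed$RESULT
results[, c("url", "rss", "snippet")]
```

With a live endpoint, the string would come from getURL() as in the answer, and `fromJSON(doc)` would follow unchanged.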

Related

How to take results from json input and put into a csv output in python

I am trying to work out how to take the sentiment polarity results I compute in Python for a set of tweets (originally read from a JSON file) and turn them into a CSV I can export for use in R. I'm using Python 2.7.
I have tried a couple of different approaches from similar Stack Overflow questions, but with no success so far.
For example, using the pandas package:
import pandas

tweet_polarity = []
for tweet in tweet_text:
    polarity = analyser.polarity_scores(tweet[1])
    tweet_polarity.append([tweet[0], tweet[1], polarity['compound'],
                           polarity['neg'], polarity['neu'], polarity['pos']])

df = pandas.DataFrame(data={"tweet_polarity": tweet_polarity, "tweet_text": tweet_text,
                            "tweets": tweets})
df.to_csv("polarityRES.csv")
This creates a CSV file, but it seems to repeat the same tweet over and over rather than producing a tidy data frame with the polarity scores.
I thought about using csv.writer, but haven't been able to find an example relevant to what I'm trying to do. Any suggestions, gang?
(Sorry for my terrible explanation, I'm still getting to grips with the basics while trying to do this - and typing one-handed with tendonitis!)
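On the R side, the goal is one row per tweet with one column per score. This sketch builds that shape from a list of per-tweet rows laid out like tweet_polarity (id, text, compound, neg, neu, pos), writes it with write.csv and reads it back with read.csv. The rows and scores are made up for illustration. The same idea carries over to pandas: each inner list is a row, not a column, so the list of rows goes in as the data together with a `columns=` list of names, e.g. `pandas.DataFrame(tweet_polarity, columns=[...])`.

```r
# Made-up per-tweet rows mirroring tweet_polarity:
# id, text, compound, neg, neu, pos
tweet_polarity <- list(
  list("1", "great day", 0.6249, 0.0, 0.328, 0.672),
  list("2", "awful traffic", -0.4588, 0.649, 0.351, 0.0)
)

# Turn each row into a one-row data frame, then stack them
df <- do.call(rbind, lapply(tweet_polarity, function(r) {
  data.frame(id = r[[1]], text = r[[2]], compound = r[[3]],
             neg = r[[4]], neu = r[[5]], pos = r[[6]],
             stringsAsFactors = FALSE)
}))

# Round-trip through a temporary CSV file, as R would read the export
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
back <- read.csv(path, stringsAsFactors = FALSE)
str(back)
```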

Changing values in JSON file through R

Is there a way to change values, or assign new variables in a json file and after give it back in the same format?
The rjson package can be used to read the JSON file into R, but how do I convert the data back to JSON after making my changes?
EDIT:
sample code:
json file:
{"__v":1,"_id":{"$oid":"559390f6fa76bc94285fa68a"},"accountId":6,"api":false,"countryCode":"no","countryName":"Norway","date":{"$date":"2015-07-01T07:04:22.265Z"},"partnerId":1,"query":{"search":[{"label":"skill","operator":"and","terms":["java"],"type":"required"}]},"terms":[{"$oid":"559390f6fa76bc94285fa68b"}],"time":19,"url":"eyJzZWFyY2giOlt7InRlcm1zIjpbImphdmEiXSwibGFiZWwiOiJza2lsbCIsInR5cGUiOiJyZXF1aXJlZCIsIm9wZXJhdG9yIjoiYW5kIn1dfQ","user":11}
{"__v":1,"_id":{"$oid":"5593910cfa76bc94285fa68d"},"accountId":6,"api":false,"countryCode":"se","countryName":"Sweden","date":{"$date":"2015-07-01T07:04:44.565Z"},"partnerId":1,"query":{"search":[{"label":"company","operator":"or","terms":["microsoft"],"type":"required"},{"label":"country","operator":"or","terms":["se"],"type":"required"}]},"terms":[{"$oid":"5593910cfa76bc94285fa68e"},{"$oid":"5593910cfa76bc94285fa68f"}],"time":98,"url":"eyJzZWFyY2giOlt7InRlcm1zIjpbIm1pY3Jvc29mdCJdLCJsYWJlbCI6ImNvbXBhbnkiLCJ0eXBlIjoicmVxdWlyZWQiLCJvcGVyYXRvciI6Im9yIn0seyJ0ZXJtcyI6WyJzZSJdLCJsYWJlbCI6ImNvdW50cnkiLCJ0eXBlIjoicmVxdWlyZWQiLCJvcGVyYXRvciI6Im9yIn1dfQ","user":13}
Code:
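library('rjson')
con <- file(Usersfile, 'r')        # Usersfile holds the path to the JSON file
l <- readLines(con, -1L)            # one JSON object per line
close(con)
json <- lapply(X = l, fromJSON)     # parse each line separately
json[[1]]$countryName <- 'Jamaica'
cat(toJSON(json))                   # prints the whole list as one JSON array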
Output (a single line that starts with [):
[{"__v":1,"_id":{"$oid":"559390f6fa76bc94285fa68a"},"accountId":6,"api":false,"countryCode":"no","countryName":"Jamaica","date":{"$date":"2015-07-01T07:04:22.265Z"},"partnerId":1,"query":{"search":[{"label":"skill","operator":"and","terms":"java","type":"required"}]},"terms":[{"$oid":"559390f6fa76bc94285fa68b"}],"time":19,"url":"eyJzZWFyY2giOlt7InRlcm1zIjpbImphdmEiXSwibGFiZWwiOiJza2lsbCIsInR5cGUiOiJyZXF1aXJlZCIsIm9wZXJhdG9yIjoiYW5kIn1dfQ","user":11},{"__v":1,"_id":{"$oid":"5593910cfa76bc94285fa68d"},"accountId":6,"api":false,"countryCode":"se","countryName":"Sweden","date":{"$date":"2015-07-01T07:04:44.565Z"},"partnerId":1,"query":{"search":[{"label":"company","operator":"or","terms":"microsoft","type":"required"},{"label":"country","operator":"or","terms":"se","type":"required"}]},"terms":[{"$oid":"5593910cfa76bc94285fa68e"},{"$oid":"5593910cfa76bc94285fa68f"}],"time":98,"url":"eyJzZWFyY2giOlt7InRlcm1zIjpbIm1pY3Jvc29mdCJdLCJsYWJlbCI6ImNvbXBhbnkiLCJ0eXBlIjoicmVxdWlyZWQiLCJvcGVyYXRvciI6Im9yIn0seyJ0ZXJtcyI6WyJzZSJdLCJsYWJlbCI6ImNvdW50cnkiLCJ0eXBlIjoicmVxdWlyZWQiLCJvcGVyYXRvciI6Im9yIn1dfQ","user":13}]
convert data frame to json
This has already been answered in full in the question linked above.
Quick summary:
Two options are presented.
(A) rjson library
load the library with library(rjson)
use the toJSON() function to create a JSON string. (unname() removes the names from an object, so a named list is written as a JSON array rather than as a JSON object.)
(B) jsonlite library
load the jsonlite library
just call toJSON() on the data frame (same as above, but without the unname() step).
cat() the resulting object.
Code examples are at that link. Hope this helps!
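For the newline-delimited file in the question, here is a sketch with jsonlite (option B). It keeps one object per line and preserves one-element arrays such as "terms":["java"], which the rjson output above flattened to "terms":"java". The records are trimmed copies of the sample. Reading from and writing to the real file (for example readLines(Usersfile) and writeLines(out, ...)) is left out so the example runs standalone.

```r
library(jsonlite)

# Two newline-delimited records in the same shape as the question's file,
# trimmed to a few fields for brevity
lines <- c(
  '{"__v":1,"countryCode":"no","countryName":"Norway","query":{"search":[{"label":"skill","terms":["java"]}]},"user":11}',
  '{"__v":1,"countryCode":"se","countryName":"Sweden","query":{"search":[{"label":"company","terms":["microsoft"]}]},"user":13}'
)

# simplifyVector = FALSE keeps every JSON array as an R list, so
# one-element arrays such as ["java"] survive the round trip
records <- lapply(lines, fromJSON, simplifyVector = FALSE)
records[[1]]$countryName <- "Jamaica"

# auto_unbox = TRUE writes length-1 vectors as scalars ("user":11, not [11])
out <- vapply(records, function(r) as.character(toJSON(r, auto_unbox = TRUE)),
              character(1))
writeLines(out)   # one JSON object per line, like the input
```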

R and jsonlite - truncated result set?

I am using R with jsonlite to get data back from a URL. It's pretty straightforward, except that when I view the URL in a browser there are 50 results, but when I view my results from jsonlite there are only 25 rows in my data set. I have checked the jsonlite documentation and can't find any parameters that would indicate paging or limits of any kind. Has anyone seen this before? The code I'm using is simple, but I'm including it anyway. I've already checked the data between the fromJSON and flatten steps: fromJSON on its own returns only 25 rows.
library(jsonlite)
url="https://myjson"
mydata = fromJSON(url)
mydata = flatten(mydata$mydataframe,recursive=TRUE)
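A quick local check, using a made-up payload of 50 records with one nested field, shows that neither fromJSON nor flatten drops rows on its own. The `mydataframe` key mirrors the question; everything else is hypothetical. If the same holds for the real response, the 25-row cap probably lies in what the server sends to R, for instance a default page size. That is an assumption, and counting the records in the raw text returned by readLines(url) would confirm or rule it out.

```r
library(jsonlite)

# Made-up payload: 50 records under "mydataframe", each with a nested object
records <- sprintf('{"id":%d,"info":{"score":%d}}', 1:50, (1:50) * 2L)
txt <- sprintf('{"mydataframe":[%s]}', paste(records, collapse = ","))

mydata <- fromJSON(txt)
nrow(mydata$mydataframe)          # rows straight out of fromJSON

flat <- flatten(mydata$mydataframe, recursive = TRUE)
nrow(flat)                        # rows after flattening
names(flat)                       # the nested field becomes "info.score"
```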

R JSON list element extraction in a loop

I am parsing JSON objects from pipl.com.
Specifically, I pass a CSV of contacts to the API by calling fromJSON from the jsonlite library via lapply. Then I want to cbind specific elements into a flat data frame. I have tried mapply, sapply and lapply followed by rbind, as below, but this doesn't work as I expect for any elements other than the ones shown. I have tried extracting them individually with the 'mini.test[1]$records$user_ids' syntax, but the original contacts data frame has hundreds of records, so I was thinking a loop could extract the elements I want.
I only need the LinkedIn, Facebook and Twitter user names for each contact, so I thought some form of grepl would help me subset the records. I have created that vector and posted the code for it below too.
I have read multiple R-bloggers articles on the different "apply" functions, looked at the R Cookbook PDF, and read questions on Stack Overflow. I am stumped, so I would really appreciate any help.
library(jsonlite)
#sample data
first<-c('Victor','Steve','Mary')
last<-c('Arias','Madden','Johnson')
contacts<-cbind(first,last)
#make urls
urls<-paste('http://api.pipl.com/search/v3/json/?first_name=',contacts[,1],'%09&last_name=',contacts[,2],'&pretty=True&key=xxxxxxx', sep='')
#Parse api
mini.test<-lapply(urls,fromJSON,simplifyMatrix=TRUE,flatten=TRUE)
#Data frame vector name
names <- do.call(rbind, lapply(mini.test, "[[", 5))
display <-do.call(rbind, lapply(names, "[[", 3))
#Grepl for 3 sources
records <- lapply(mini.test, "[[", 7)
twitter <-grepl("twitter.com",records,ignore.case = TRUE)
facebook <-grepl("facebook.com",records,ignore.case = TRUE)
linkedin <-grepl("linkedin.com",records,ignore.case = TRUE)
From pipl's responses I know that a contact may have multiple profile user names. For my purpose I just need them unlisted into a single string rather than a nested list in the data frame. In the end I would like a flat file that looks like the one below. Again, I sincerely appreciate the help; I have been reading about this for 3 days without much success.
twitter <- c('twitter.username1','twitter.username2','NA')
linkedin <- c('linkedin.username1','linkedin.username2','linkedin.username3')
facebook <- c('fb1','fb2','fb3,fb3a')
df<-cbind(display,twitter,linkedin,facebook)
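The nested pipl response isn't reproduced in the question, so this sketch uses a hypothetical, simplified stand-in. Each contact is a list with a display name and a records list whose entries carry a url and a username; these field names are assumptions, not pipl's schema, and usernames_for is a hypothetical helper. The idea is to filter each contact's records by domain with grepl, then paste(collapse = ",") the matching user names so every cell is a plain string, with NA when nothing matches.

```r
# Hypothetical, simplified stand-in for the parsed pipl responses
mini.test <- list(
  list(display = "Victor Arias",
       records = list(list(url = "http://twitter.com/va1", username = "va1"),
                      list(url = "http://linkedin.com/in/va", username = "va"))),
  list(display = "Steve Madden",
       records = list(list(url = "http://facebook.com/sm", username = "sm"),
                      list(url = "http://facebook.com/sm2", username = "sm2"))),
  list(display = "Mary Johnson", records = list())
)

# Collapse all usernames whose URL contains `domain` into one
# comma-separated string; NA when the contact has none
usernames_for <- function(contact, domain) {
  hits <- Filter(function(r) grepl(domain, r$url, fixed = TRUE), contact$records)
  if (length(hits) == 0) return(NA_character_)
  paste(vapply(hits, `[[`, character(1), "username"), collapse = ",")
}

flat <- data.frame(
  display  = vapply(mini.test, `[[`, character(1), "display"),
  twitter  = vapply(mini.test, usernames_for, character(1), domain = "twitter.com"),
  linkedin = vapply(mini.test, usernames_for, character(1), domain = "linkedin.com"),
  facebook = vapply(mini.test, usernames_for, character(1), domain = "facebook.com"),
  stringsAsFactors = FALSE
)
flat
```

Using vapply with character(1) guarantees exactly one string per contact, which is what keeps the result flat instead of nested.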

How to turn a hclust-object into JSON for D3?

I'm attempting to visualize a cluster tree using this awesome D3 layout! However, it needs data in JSON format - how do I go from a hclust-object in R to a hierarchical JSON structure?
set.seed(123)
m <- matrix(runif(100), nrow=10)
cl <- hclust(dist(m))
plot(cl)
My googling turned up this hclustToTree function, which returns a list that looks promising, but I don't really know where to go from there. Any advice would be much appreciated.
halfway <- hclustToTree(cl)
You're almost there:
library(RJSONIO)
jsonTree <- toJSON(halfway) # toJSON() from the RJSONIO package
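hclustToTree is not part of base R, so here is a self-contained alternative. A hypothetical helper, hclust_to_list, walks cl$merge recursively and builds nested name/children lists, which are then serialized with jsonlite's toJSON. In the merge matrix, a negative entry is an original observation (a leaf) and a positive entry refers to the cluster formed at that earlier step. The name/children nesting is what many D3 tree and dendrogram examples read, but that is an assumption about this particular layout, and the internal node names such as "node9" are made up.

```r
library(jsonlite)

# Hypothetical helper: convert an hclust object into nested name/children lists
hclust_to_list <- function(cl) {
  labels <- if (is.null(cl$labels)) as.character(seq_along(cl$order)) else cl$labels
  build <- function(i) {
    if (i < 0) return(list(name = labels[-i]))   # negative index: a leaf
    list(name = paste0("node", i),               # positive index: an earlier merge
         children = list(build(cl$merge[i, 1]), build(cl$merge[i, 2])))
  }
  build(nrow(cl$merge))   # the last merge is the root
}

set.seed(123)
m <- matrix(runif(100), nrow = 10)
cl <- hclust(dist(m))
tree <- hclust_to_list(cl)
json <- toJSON(tree, auto_unbox = TRUE)   # {"name":"node9","children":[...]}
```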