Format JSON data using tidyjson - json

I would love to use the tidyjson package as it seems to have very clear instructions on how to use it.
However, I am having a few issues. Could you please check and let me know whether these are user errors or something else?
I am using the world_bank.json data downloaded from http://jsonstudio.com/resources/
worldbank <- fromJSON(file = "world_bank.json")
I do see a list of 50 in RStudio. However, when I try to use read_json, I get the error below.
> read_json(worldbank, format = "json")
Error in file.info(path) : invalid filename argument
> worldbank[[1]] %>% prettify
Error: parse error: trailing garbage
52b213b38594d8a2be17c780
(right here) ------^

Use jsonlite::stream_in, as lizzy suggested, and read straight from the zip archive with unz:
> download.file("http://jsonstudio.com/wp-content/uploads/2014/02/world_bank.zip", "world_bank.zip")
> world_bank <- jsonlite::stream_in(unz("world_bank.zip", "world_bank.json"))
> names(world_bank)
[1] "_id" "approvalfy" "board_approval_month"
[4] "boardapprovaldate" "borrower" "closingdate"
[7] "country_namecode" "countrycode" "countryname"
[10] "countryshortname" "docty" "envassesmentcategorycode"
[13] "grantamt" "ibrdcommamt" "id"
[16] "idacommamt" "impagency" "lendinginstr"
[19] "lendinginstrtype" "lendprojectcost" "majorsector_percent"
[22] "mjsector_namecode" "mjtheme" "mjtheme_namecode"
[25] "mjthemecode" "prodline" "prodlinetext"
[28] "productlinetype" "project_abstract" "project_name"
[31] "projectdocs" "projectfinancialtype" "projectstatusdisplay"
[34] "regionname" "sector" "sector1"
[37] "sector2" "sector3" "sector4"
[40] "sector_namecode" "sectorcode" "source"
[43] "status" "supplementprojectflg" "theme1"
[46] "theme_namecode" "themecode" "totalamt"
[49] "totalcommamt" "url"
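If you would still like to stay in tidyjson, here is a minimal sketch. It assumes that the file holds one JSON document per line (which is what stream_in expects) and that your tidyjson version supports format = "jsonl" and spread_all. The two records and the temp file are made up for illustration; note that read_json wants a file path, not a list that has already been parsed, which would explain the "invalid filename argument" error.

```r
library(tidyjson)

# Hypothetical two-record file standing in for world_bank.json
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"countryname": "Ethiopia", "totalamt": 130000000}',
  '{"countryname": "Tunisia", "totalamt": 0}'
), path)

# read_json() takes a file path; "jsonl" means one JSON document per line
wb <- read_json(path, format = "jsonl")

# spread_all() turns the top-level scalar fields into columns
wb_tbl <- spread_all(wb)
print(wb_tbl)
```

With the real file you would pass "world_bank.json" as the path instead of the temp file.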

Recognizing and Keeping Elements Containing Certain Patterns in a List

I want to try and webscrape my own Stackoverflow Profiles! By this I mean, get an html link of every question I have ever asked:
https://stackoverflow.com/users/18181916/antonoyaro8
https://math.stackexchange.com/users/1024449/antonoyaro8
I tried to do this as follows:
library(rvest)
library(httr)
library(XML)
url<-"https://stackoverflow.com/users/18181916/antonoyaro8?tab=questions&sort=newest"
page <-read_html(url)
resource <- GET(url)
parse <- htmlParse(resource)
links <- list(xpathSApply(parse, path="//a", xmlGetAttr, "href"))
I tried to pick up on a pattern and noticed that all links to questions contain some number, so I tried to write code that checks whether elements in the list contain a number and keeps those links:
rv <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "0")
final <- unique (grep(paste(rv,collapse="|"),
links, value=TRUE))
But I don't think I am doing this correctly - apart from the messy formatting, the final file is returning links that do not contain any numbers at all.
Can someone please show me how to webscrape these links properly, and then repeat this for all pages (e.g. https://stackoverflow.com/users/18181916/antonoyaro8?tab=questions&sort=newest, https://stackoverflow.com/users/18181916/antonoyaro8?tab=questions&sort=newest&page=2, https://stackoverflow.com/users/18181916/antonoyaro8?tab=questions&sort=newest&page=3)
If worst comes to worst and I can do it for one of these pages, I can manually copy/paste the code for all pages and proceed that way.
Thank you!
links is a list of length 1, so we need to extract its element with [[ before applying grep:
unique (grep(paste(rv,collapse="|"),
links[[1]], value=TRUE))
Note that rv contains the digits 0 to 9, so the pattern matches any link with a digit anywhere in it. If the intention is to keep only the links where digits follow questions/, use:
grep("questions/\\d+", links[[1]], value = TRUE)
-output
[1] "/questions/72859976/recognizing-and-keeping-elements-containing-certain-patterns-in-a-list"
[2] "/questions/72843570/combing-two-selections-together"
[3] "/questions/72840913/selecting-rows-from-a-table-based-on-a-list"
[4] "/questions/72840624/even-out-table-in-r"
[5] "/questions/72840548/creating-a-dictionary-reference-table"
[6] "/questions/72837147/sequentially-replacing-factor-variables-with-numerical-values"
[7] "/questions/72822951/scanning-and-replacing-values-of-rows-in-r"
[8] "/questions/72822781/alternative-to-do-callrbind-data-frame-for-combining-a-list-of-data-frames"
[9] "/questions/72738885/referencing-a-query-in-another-query"
[10] "/questions/72725108/defining-cte-common-table-expressions-in-r"
[11] "/questions/72723768/creating-an-id-variable-on-the-spot"
[12] "/questions/72720013/selecting-data-using-conditions-stored-in-a-variable"
[13] "/questions/72717135/effecient-ways-to-append-sql-results-in-r"
...
If there are multiple pages, append page= to the URL with paste or sprintf:
urls <- c(url, sprintf("%s&page=%d", url, 2:3))
out_lst <- lapply(urls, function(url) {
  # fetch and parse one page of the question list
  resource <- GET(url)
  parse <- htmlParse(resource)
  # collect the href of every <a>, then keep only the question links
  links <- list(xpathSApply(parse, path = "//a", xmlGetAttr, "href"))
  grep("questions/\\d+", links[[1]], value = TRUE)
})
-output
> out_lst
[[1]]
[1] "/questions/72859976/recognizing-and-keeping-elements-containing-certain-patterns-in-a-list"
[2] "/questions/72843570/combing-two-selections-together"
[3] "/questions/72840913/selecting-rows-from-a-table-based-on-a-list"
[4] "/questions/72840624/even-out-table-in-r"
[5] "/questions/72840548/creating-a-dictionary-reference-table"
[6] "/questions/72837147/sequentially-replacing-factor-variables-with-numerical-values"
[7] "/questions/72822951/scanning-and-replacing-values-of-rows-in-r"
[8] "/questions/72822781/alternative-to-do-callrbind-data-frame-for-combining-a-list-of-data-frames"
[9] "/questions/72738885/referencing-a-query-in-another-query"
[10] "/questions/72725108/defining-cte-common-table-expressions-in-r"
[11] "/questions/72723768/creating-an-id-variable-on-the-spot"
[12] "/questions/72720013/selecting-data-using-conditions-stored-in-a-variable"
[13] "/questions/72717135/effecient-ways-to-append-sql-results-in-r"
[14] "/questions/72710448/removing-files-from-global-environment-with-a-certain-pattern"
[15] "/questions/72710203/r-sql-is-the-default-option-sampling-with-replacement"
[16] "/questions/72695401/allocating-max-memory-in-r"
[17] "/questions/72681898/randomly-delete-columns-from-datasets"
[18] "/questions/72663516/are-rds-files-more-efficient-than-csv-files"
[19] "/questions/72625690/importing-files-using-list-files"
[20] "/questions/72623856/second-most-common-element-in-each-row"
[21] "/questions/72623744/counting-the-position-where-a-pattern-is-completed"
[22] "/questions/72620501/bulk-import-export-files-from-r"
[23] "/questions/72613413/counting-every-position-where-a-pattern-appears"
[24] "/questions/72612577/counting-the-position-of-the-first-0-in-each-row"
[25] "/questions/72607160/taking-averages-across-lists"
[26] "/questions/72589276/functions-for-finding-out-the-midpoint-interpolation"
[27] "/questions/72587298/sandwiching-values-between-rows"
[28] "/questions/72569338/integration-error-lengthlower-1-is-not-true"
[29] "/questions/72568817/synchronizing-nas-in-r"
[30] "/questions/72568661/finding-the-loser-in-each-row"
[[2]]
[1] "/questions/72566170/making-a-race-between-two-variables"
[2] "/questions/72418723/making-a-list-of-random-numbers"
[3] "/questions/72418364/random-uniform-numbers-without-runif"
[4] "/questions/72353102/integrate-normal-distribution-between-2-values"
[5] "/questions/72174868/placing-commas-between-names"
[6] "/questions/72163297/simulate-flipping-french-fries-in-r"
[7] "/questions/71982286/alternatives-to-the-partition-by-statement-in-sql"
[8] "/questions/71970960/converting-lists-into-data-frames"
[9] "/questions/71970672/random-numbers-are-too-similar-to-each-other"
[10] "/questions/71933753/making-combinations-of-items"
[11] "/questions/71874791/sorting-rows-in-specified-order"
[12] "/questions/71866097/hiding-the-legend-in-this-graph"
[13] "/questions/71866048/understanding-the-median-in-this-graph"
[14] "/questions/71852517/nas-produced-when-number-of-iterations-increase"
[15] "/questions/71791906/assigning-unique-colors-to-multiple-lines-on-a-graph"
[16] "/questions/71787336/finding-identical-rows-in-multiple-datasets"
[17] "/questions/71758983/multiple-replace-lookups"
[18] "/questions/71758648/create-ascending-id-in-a-data-frame"
[19] "/questions/71731208/webscraping-data-which-pokemon-can-learn-which-attacks"
[20] "/questions/71728273/webscraping-pokemon-data"
[21] "/questions/71683045/identifying-smallest-element-in-each-row-of-a-matrix"
[22] "/questions/71671488/connecting-all-nodes-together-on-a-graph"
[23] "/questions/71641774/overriding-colors-in-ggplot2"
[24] "/questions/71641404/applying-a-function-to-a-data-frame-lapply-vs-traditional-way"
[25] "/questions/71624111/sending-emails-from-r"
[26] "/questions/71623019/sql-joining-tables-from-2-different-servers-r-vs-sas"
[27] "/questions/71429265/overriding-sql-errors-during-r-uploads"
[28] "/questions/71429129/splitting-a-dataset-into-uneven-portions"
[29] "/questions/71418533/multiplying-and-adding-values-across-rows"
[30] "/questions/71417489/tricking-an-sql-server-to-accept-a-file-from-r"
[[3]]
[1] "/questions/71417218/splitting-a-dataset-into-arbitrary-sections"
[2] "/questions/71398804/plotting-vector-fields-and-gradient-fields"
[3] "/questions/71387596/animating-the-mandelbrot-set"
[4] "/questions/71358405/repeat-a-set-of-ids-for-every-n-rows"
[5] "/questions/71344822/time-series-graphs-with-different-symbols"
[6] "/questions/71341865/creating-a-data-frame-with-commas"
[7] "/questions/71287944/converting-igraph-to-visnetwork"
[8] "/questions/71282863/fixing-the-first-and-last-numbers-in-a-random-list"
[9] "/questions/71282403/adding-labels-to-graph-nodes"
[10] "/questions/71262761/understanding-list-and-do-call-commands"
[11] "/questions/71261431/adjusting-graph-layouts"
[12] "/questions/71255038/overriding-non-existent-components-in-a-loop"
[13] "/questions/71244872/fixing-cluttered-titles-on-graphs"
[14] "/questions/71243676/directly-adding-titles-and-labels-to-visnetwork"
[15] "/questions/71232353/removing-all-edges-in-igraph"
[16] "/questions/71230273/writing-a-function-that-references-elements-in-a-matrix"
[17] "/questions/71227260/generating-random-graphs-according-to-some-conditions"
[18] "/questions/71087349/adding-combinations-of-numbers-in-a-matrix"
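For comparison, the same filtering can be sketched with rvest alone. This is only an illustration on a made-up HTML snippet (the four anchors below are hypothetical), so it runs without hitting the site; html_attr returns NA for an anchor without an href, and grep simply skips those.

```r
library(rvest)

# Hypothetical page fragment with a mix of link types
html <- '<html><body>
<a href="/questions/72859976/recognizing-and-keeping-elements">Q1</a>
<a href="/users/18181916/antonoyaro8">profile</a>
<a href="/questions/tagged/r">tag</a>
<a>no href</a>
</body></html>'

page <- read_html(html)

# href of every <a>; NA where the attribute is missing
hrefs <- html_attr(html_nodes(page, "a"), "href")

# keep only links where digits directly follow "questions/"
question_links <- unique(grep("questions/\\d+", hrefs, value = TRUE))
print(question_links)
```

Because hrefs is already a plain character vector here, no [[ extraction is needed before grep.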

Plotting functions in Julia suddenly doesn't work

Yesterday I was playing with Julia, plotting various functions and variables, and suddenly a very important feature stopped working: I can't plot a function with a simple "plot(f)" command. I would love some help, because this feature simplifies my work greatly.
I tried to recompile the packages I used, yet it didn't help any.
I need to plot distributions and also my own functions (one variable).
I'm using packages Distributions.jl and StatsPlots.jl
Here's a simple example that I know used to work but now doesn't:
using Distributions
using StatsPlots
f(x) = x^2
plot(f)
It gives me this error:
ERROR: MethodError: no method matching Float64(::Array{Float64,1})
Closest candidates are:
Float64(::Int8) at float.jl:60
Float64(::Int16) at float.jl:60
Float64(::Int32) at float.jl:60
...
Stacktrace:
[1] (::getfield(Plots, Symbol("##108#109")){Symbol})(::Array{Float64,1}) at C:\Users\masen\.julia\packages\Plots\47Tik\src\axes.jl:152
[2] _broadcast_getindex at .\broadcast.jl:578 [inlined]
[3] (::getfield(Base.Broadcast, Symbol("##19#20")){Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple},Nothing,getfield(Plots, Symbol("##108#109")){Symbol},Tuple{Tuple{Array{Float64,1},Array{Float64,1}}}}})(::Int64) at .\broadcast.jl:953
[4] ntuple at .\tuple.jl:160 [inlined]
[5] copy at .\broadcast.jl:953 [inlined]
[6] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple},Nothing,getfield(Plots, Symbol("##108#109")){Symbol},Tuple{Tuple{Array{Float64,1},Array{Float64,1}}}}) at .\broadcast.jl:753
[7] _scaled_adapted_grid(::Function, ::Symbol, ::Symbol, ::Float64, ::Float64) at C:\Users\masen\.julia\packages\Plots\47Tik\src\series.jl:542
[8] macro expansion at C:\Users\masen\.julia\packages\Plots\47Tik\src\series.jl:529 [inlined]
[9] apply_recipe(::Dict{Symbol,Any}, ::Function, ::Float64, ::Float64) at C:\Users\masen\.julia\packages\RecipesBase\zBoFG\src\RecipesBase.jl:275
[10] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{typeof(f)}) at C:\Users\masen\.julia\packages\Plots\47Tik\src\pipeline.jl:83
[11] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{typeof(f)}) at C:\Users\masen\.julia\packages\Plots\47Tik\src\plot.jl:178
[12] #plot#137(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function) at C:\Users\masen\.julia\packages\Plots\47Tik\src\plot.jl:57
[13] plot(::Function) at C:\Users\masen\.julia\packages\Plots\47Tik\src\plot.jl:51
[14] top-level scope at none:0

How To Extract Name in this HTML Element using rvest

I've searched through many rvest scraping posts but can't find an example like mine. I'm following the R vignette example (https://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/) for selectorgadget, but inputting my use case as necessary. None of selector gadget's suggestions get me what I need. I need to extract the name for each review on the page. A sample of what the name looks like under the hood is as follows:
<span itemprop="name" class="sg_selected">This Name</span>
Here's my code to this point. Ideally, this code should get me the individual names on this web page.
library(rvest)
library(dplyr)
dsa_reviews <-
  read_html("https://www.directsalesaid.com/companies/traveling-vineyard#reviews")
review_names <- html_nodes(dsa_reviews, '#reviews span')
df <- bind_rows(lapply(xml_attrs(review_names), function(x)
  data.frame(as.list(x), stringsAsFactors = FALSE)))
Apologies if this is a duplicate question or if it's not formatted correctly. Please feel free to request any necessary edits.
Here it is:
library(rvest)
library(dplyr)
dsa_reviews <-
read_html("https://www.directsalesaid.com/companies/traveling-vineyard#reviews")
html_nodes(dsa_reviews,'[itemprop=name]') %>%
html_text()
[1] "Traveling Vineyard" ""
[3] "Kiersten Ray-kuhn" "Miley Sama"
[5] " Nancy Shawtone " "Amanda Moore"
[7] "Matt" "Kathy Barzal"
[9] "Lesa Brinker" "Lori Stryker"
[11] "Jeanette Holtman" "Penny Notarnicola"
[13] "Laura Ann" "Nicole Lafave"
[15] "Gretchen Hess Miller" "Gina Devine"
[17] "Ashley Lawton Converse" "Morgan Williams"
[19] "Angela Baston Mckeone" "Traci Feshler"
[21] "Kisha Marshall Dlugos" "Jody Cole Dvorak"
Colin
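The result above still contains an empty string and a name padded with spaces. A small follow-up sketch for tidying that up, run on a made-up HTML fragment (the spans below are hypothetical, modelled on the sample element in the question) so it works offline:

```r
library(rvest)

# Hypothetical fragment modelled on the span shown in the question
html <- '<div id="reviews">
<span itemprop="name" class="sg_selected">This Name</span>
<span itemprop="name"> Padded Name </span>
<span itemprop="name"></span>
<span class="other">not a name</span>
</div>'

# the attribute selector picks only elements carrying itemprop="name"
nodes <- html_nodes(read_html(html), "[itemprop=name]")

reviewer <- trimws(html_text(nodes))     # strip surrounding whitespace
reviewer <- reviewer[nzchar(reviewer)]   # drop empty entries
print(reviewer)
```

On the real page the first match is the company name rather than a reviewer, so you may also want to narrow the selector or drop that element.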

Loading and converting json file in R

I have a .json file that I need to load in R and convert into a data frame for further operations. The beginning of my JSON file looks like this:
{"_id":{"$oid":"57a30ce268fd0809ec4d194f"},"session":{"start_timestamp":{"$numberLong":"1470183490481"},"session_id":"def5faa9-20160803-001810481"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470183523054"},"event_type":"OfferViewed","event_timestamp":{"$numberLong":"1470183505399"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"5","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"2.0.0.0","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:2e26918b-f7b1-471e-9df4-b931509f7d37","client_id":"ee0b61b0-85cf-4b2f-960e-e2aedef5faa9"},"device":{"locale":{"country":"US","code":"en_US","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"YU","model":"AO5510"},"attributes":{"Category":"120000","CustomerID":"4078","OfferID":"45436"}}
The sample above is just one record (one id, session, metrics, and so on); there are many more like it.
I tried converting it using the rjson library in R as follows. events_json.json is the filename:
library(rjson)
result <- fromJSON(file = "events_json.json")
print(result)
$`_id`
$`_id`$`$oid`
[1] "57a30ce268fd0809ec4d194f"
$session
$session$start_timestamp
$session$start_timestamp$`$numberLong`
[1] "1470183490481"
$session$session_id
[1] "def5faa9-20160803-001810481"
$metrics
list()
$arrival_timestamp
$arrival_timestamp$`$numberLong`
[1] "1470183523054"
$event_type
[1] "OfferViewed"
$event_timestamp
$event_timestamp$`$numberLong`
[1] "1470183505399"
$event_version
[1] "3.0"
$application
$application$package_name
[1] "com.think.vito"
$application$title
[1] "Vito"
$application$version_code
[1] "5"
$application$app_id
[1] "7ffa58dab3c646cea642e961ff8a8070"
$application$cognito_identity_pool_id
[1] "us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74"
$application$version_name
[1] "2.0.0.0"
$application$sdk
$application$sdk$version
[1] "2.2.2"
$application$sdk$name
[1] "aws-sdk-android"
$client
$client$cognito_id
[1] "us-east-1:2e26918b-f7b1-471e-9df4-b931509f7d37"
$client$client_id
[1] "ee0b61b0-85cf-4b2f-960e-e2aedef5faa9"
$device
$device$locale
$device$locale$country
[1] "US"
$device$locale$code
[1] "en_US"
$device$locale$language
[1] "en"
$device$platform
$device$platform$version
[1] "5.1.1"
$device$platform$name
[1] "ANDROID"
$device$make
[1] "YU"
$device$model
[1] "AO5510"
$attributes
$attributes$Category
[1] "120000"
$attributes$CustomerID
[1] "4078"
$attributes$OfferID
[1] "45436"
But it only reads the first row shown above. The file contains many more records (ids, sessions, metrics, event_type, etc.) that it is not showing.
Please help: how can I read my whole JSON file so that I can see the other rows as well and convert it into a proper data frame?
UPDATE:
I have found the solution. Using ndjson package I am getting desired data frame.
library(ndjson)
df<-ndjson::stream_in('events_data.json')
Your file is not a single JSON object but a sequence of JSON objects, one per line. You have to read each line and convert each one from JSON.
One way to do that is (readLines already returns one element per line, so no extra splitting is needed):
d <- lapply(readLines("events_data2.json"), fromJSON)
Hope this helps
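If you want a data frame rather than a list of lists, here is a rough sketch using jsonlite instead of rjson (that swap is my assumption, not something from the question). The two records are made-up stand-ins for lines of the events file, read through a textConnection so the example is self-contained.

```r
library(jsonlite)

# Two hypothetical records, one JSON object per line
lines <- c(
  '{"event_type":"OfferViewed","device":{"make":"YU","model":"AO5510"}}',
  '{"event_type":"OfferClicked","device":{"make":"YU","model":"AO5510"}}'
)

con <- textConnection(lines)
events <- stream_in(con, verbose = FALSE)   # one row per line
close(con)                                  # stream_in leaves an already-open connection open

# nested objects arrive as data-frame columns; flatten() spreads them
# out into ordinary columns named like device.make
events_flat <- jsonlite::flatten(events)
print(names(events_flat))
```

For the real file, stream_in(file("events_data.json")) should play the same role as the textConnection here.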

trying to Upload .rmd file to wordpress

I'm having trouble uploading an .Rmd file to WordPress. I'm not exactly sure what's going on, but the error suggests I don't have privileges to publish remotely, even though, from what I understand, WordPress allows remote publishing even for free accounts. I've searched all the WordPress R questions on Stack Overflow and nothing seems to work. Here's my workflow:
devtools:::install_github("duncantl/RWordPress", force=T)
library(RWordPress)
# Set login parameters (replace admin,password and blog_url!)
options(WordPressLogin = c(admin = 'password'), WordPressURL = 'blog_url/xmlrpc.php')
library(markdown)
library(knitr)
options(markdown.HTML.options = c(markdownHTMLOptions(default = T),"toc"))
# Upload plots: set knitr options
opts_knit$set(upload.fun = function(file){library(RWordPress);uploadFile(file)$url;})
postThumbnail <- RWordPress::uploadFile("File.rmd",overwrite = TRUE)
That produces the following error:
Error: faultCode: 401 faultString: You do not have permission to upload files.
I also tried the following:
knit2wp('fake.rmd', title = 'TITLE', publish = FALSE)
And that produces the same error.
Here's my session info:
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] ggplot2_2.1.0 rmarkdown_1.0 knitr_1.13
[4] markdown_0.7.7 RWordPress_0.2-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.5 formatR_1.4
[3] plyr_1.8.3 bitops_1.0-6
[5] base64enc_0.1-3 tools_3.3.0
[7] digest_0.6.10 jsonlite_1.0
[9] evaluate_0.9 tibble_1.1
[11] gtable_0.2.0 viridisLite_0.1.3
[13] lattice_0.20-33 png_0.1-7
[15] DBI_0.4-1 mapproj_1.2-4
[17] proto_0.3-10 gridExtra_2.2.1
[19] dplyr_0.5.0 httr_1.2.1
[21] stringr_1.0.0 caTools_1.17.1
[23] RgoogleMaps_1.2.0.7 htmlwidgets_0.7
[25] maps_3.1.0 grid_3.3.0
[27] R6_2.1.2 jpeg_0.1-8
[29] plotly_4.1.0 XML_3.98-1.4
[31] RSelenium_1.4.2 RJSONIO_1.3-0
[33] sp_1.2-3 ggmap_2.6.1
[35] tidyr_0.5.1 reshape2_1.4.1
[37] magrittr_1.5 XMLRPC_0.3-0
[39] scales_0.4.0 htmltools_0.3.5
[41] assertthat_0.1 formattable_0.2
[43] colorspace_1.2-6 geosphere_1.5-1
[45] labeling_0.3 stringi_1.0-1
[47] RCurl_1.95-4.8 lazyeval_0.2.0
[49] munsell_0.4.3 rjson_0.2.15
I'd also like to note that I checked the password and username and they're both correct (if I enter incorrect information I get a different error indicating that). I've also gotten a similar error trying user-written functions:
Error: faultCode: 401 faultString: Sorry, you are not allowed to publish posts on this site.
By the way, when I run getUsersBlogs() I get:
$isAdmin
[1] TRUE
$isPrimary
[1] TRUE
$url
[1] "https://blogname.wordpress.com/"
$blogid
[1] "115210981"
$blogName
[1] "Site Title"
$xmlrpc
[1] "https://blogname.wordpress.com/xmlrpc.php"
As implied by @Lloyd Christmas, the problem is with how you specified the options. If you change "WordPressURL" to "WordpressURL", you'll probably be fine.
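To see why the capitalisation matters, here is a small base-R sketch: option names are case-sensitive, so a value stored under one spelling cannot be read back under the other. The URL is the one from the getUsersBlogs() output in the question. I would guess the login option needs the same spelling (WordpressLogin), but that is an assumption worth checking against the RWordPress documentation.

```r
# Option names are case-sensitive: a value set under one spelling
# is not visible under another.
options(WordPressURL = "https://blogname.wordpress.com/xmlrpc.php")  # capital P, as in the question
wrong <- getOption("WordpressURL")   # lowercase p was never set

options(WordpressURL = "https://blogname.wordpress.com/xmlrpc.php")  # spelling from the answer
right <- getOption("WordpressURL")

print(is.null(wrong))
print(right)
```

A package that looks up the lowercase-p name would therefore find nothing under the capital-P spelling, which fits the advice above.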