convert json to datatable using rjsonio - json

I have properly formatted JSON fetched through a standard API.
API Basically returns an array of JSON objects each time I fetch data.
LIke this:
[
{},
{},
{}
]
I have used JSON editor to check structure of JSON data and that looks perfect.
I need to convert it to CSV so tried this:
freshDeskRaw <- fromJSON(freshDeskTickets)
tmp <- lapply(freshDeskRaw , function(u)
lapply(u, function(x) if(is.null(x)) NA else x)
)
tmp <- lapply( tmp, as.data.frame)
tmp <- do.call( rbind, tmp )
At
tmp <- lapply( tmp, as.data.frame)
, I get an error:
tmp <- lapply( tmp, as.data.frame)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 0, 1
how do I fix this? what looks wrong? I have tried to use
as.data.frame(tmp) as well but still get the same error.

The problem might be that the JSON above produces empty lists and
is.null(list())
# [1] FALSE
Try
is.null(x) || (is.list(x) && length(x) == 0)

Related

Shiny - assigning argument to function

I am trying to create a shiny app that applies a self-made function to an uploaded dataset, then allows to download the modified results. Here is my code:
library(shiny)
library(tidyverse)
namkurz <- function(data, a_spalte) {
kuerzel <- vector(length = length(data$a_spalte))
for (i in 1:length(data$a_spalte)){
spez = data$Art[i]
s = unlist(strsplit(spez, " ", fixed = TRUE))
s = substr(s, 1, 2)
s = paste(s, collapse = ' ')
kuerzel[[i]] = s
}
data <- data %>%
mutate(kurz = kuerzel)
}
ui <- fluidPage(
fileInput('upload','Deine Kartierungsdaten'),
textInput('art', 'Wie heißt die Spalte mit Artnamen?'),
downloadButton('analyse','Artenkürzel hinzufügen')
)
server <- function(input, output, session) {
data <- reactive({
req(input$upload)
ext <- tools::file_ext(input$upload$name)
switch(ext,
csv = vroom::vroom(input$upload$datapath, delim = ";"),
validate("Invalid file; Please upload a .csv file")
)
})
art <- reactive(input$art)
output$analyse <- downloadHandler(
filename = function() {
paste0('mit_kuerzel', ".csv")
},
content = function(file) {
ergebnis <- reactive(namkurz(data(), art()))
vroom::vroom_write(ergebnis(), file)
}
)
}
shinyApp(ui, server)
When trying to save the output I get a 'Warning: Unknown or uninitialised column:' error. I think my problem is in the assignment of argument 'art' to the 'ergebnis' object, but I can't find the way to fix it.
I recommend a few things:
(Required) In your function, a_spalte is a character vector and not the literal name of column in the frame, so you need to use [[ instead of $, see The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe and Dynamically select data frame columns using $ and a character value.
Change all references of data$a_spalte to data[[a_spalte]].
namkurz <- function(data, a_spalte) {
kuerzel <- vector(length = length(data[[a_spalte]]))
for (i in 1:length(data[[a_spalte]])){
spez = data$Art[i]
s = unlist(strsplit(spez, " ", fixed = TRUE))
s = substr(s, 1, 2)
s = paste(s, collapse = ' ')
kuerzel[[i]] = s
}
data <- data %>%
mutate(kurz = kuerzel)
}
Your function is a bit inefficient doing things row-wise, we can vectorize that operation.
namkurz <- function(data, a_spalte) {
spez <- strsplit(data$Art, " ", fixed = TRUE)
data$kurz <- sapply(spez, function(z) paste(substr(z, 1, 2), collapse = " "))
data
}
(Optional) The content= portion of downloadHandler is already reactive, you do not need to wrap namkurz in reactive. Because of this, you also don't need to treat ergebnis as reactive.
output$analyse <- downloadHandler(
filename = ...,
content = function(file) {
ergebnis <- namkurz(data(), art())
vroom::vroom_write(ergebnis, file)
}
)
(Optional) Your output filename is fixed, so two things here: if it's always going to be "mit_kuerzel.csv", then there's no need for paste0, just use function() "mit_kuerzel.csv".
However, if you are intending to return a file named something based on the original input filename, one could do something like:
filename = function() {
paste0(tools::file_path_sans_ext(basename(input$upload$name)),
"_mit_kuerzel.",
tools::file_ext(input$upload$name))
},
to add _mit_kuerzel to the base portion of the uploaded filename. Note that the file in the content= section is never this name, the new_mit_kuerzel.csv is the filename offered to the downloading browser as a suggestion, that is all.
(Optional) You are using a .csv file extension in the downloadHandler, but the default for vroom::vroom_write is to use delim = "\t", which is not a CSV. I suggest either adding delim = ";" (or similar), or changing the returned filename extension to .tsv instead.

Web scrape request does not work for long date range input in R

The code here works for web scraping by sending repeated request based on the input date range (startDate and endDate). Then data will be saved in csv file. I have used this code before for different xPath # html_node() argument, and it works fine. Now with different xPath, seems like it cannot works for longer date range. With this one, I also can't detect which data went missing because the code fails to work when the heading element applied as attr(). I've tried increase the range slowly and it only works until [1:29]. After that, no matter what the range used, it keep showing the old return. In some cases, I can see str(new_df) complete the request, but it keep saving the old return, as if the bind_rows fail. Sometimes (by using date of different year), I was able to extract the range desired by slowly increase the range (desired range is [1:92]). It makes me excited, but when I change the date input to get other different year it return the last record. sometimes with error, and sometimes the error not appear. I include the lengthy code here so anyone can reproduce it. I wonder if the website burdened by repeated request or my pc getting muzzy. Kindly help.
get_sounding_data <- function(region = c("naconf", "samer", "pac", "nz", "ant",
"np", "europe", "africa", "seasia", "mideast"),
date,
from_hr = c("00", "12", "all"),
to_hr = c("00", "12", "all"),
station_number = 48615) {
# we use these pkgs (I removed the readr and dplyr dependencies)
suppressPackageStartupMessages({
require("xml2", quietly = TRUE)
require("httr", quietly = TRUE)
require("rvest", quietly = TRUE)
})
# validate region
region <- match.arg(
arg = region,
choices = c(
"naconf", "samer", "pac", "nz", "ant",
"np", "europe", "africa", "seasia", "mideast"
)
)
# this actually validates the date for us if it's a character string
date <- as.Date(date)
# get year and month
year <- as.integer(format(date, "%Y"))
stopifnot(year %in% 1973:as.integer(format(Sys.Date(), "%Y")))
year <- as.character(year)
month <- format(date, "%m")
# we need these to translate day & *_hr to the param the app needs
c(
"0100", "0112", "0200", "0212", "0300", "0312", "0400", "0412",
"0500", "0512", "0600", "0612", "0700", "0712", "0800", "0812",
"0900", "0912", "1000", "1012", "1100", "1112", "1200", "1212",
"1300", "1312", "1400", "1412", "1500", "1512", "1600", "1612",
"1700", "1712", "1800", "1812", "1900", "1912", "2000", "2012",
"2100", "2112", "2200", "2212", "2300", "2312", "2400", "2412",
"2500", "2512", "2600", "2612", "2700", "2712", "2800", "2812",
"2900", "2912", "3000", "3012", "3100", "3112"
) -> hr_vals
c(
"01/00Z", "01/12Z", "02/00Z", "02/12Z", "03/00Z", "03/12Z", "04/00Z",
"04/12Z", "05/00Z", "05/12Z", "06/00Z", "06/12Z", "07/00Z", "07/12Z",
"08/00Z", "08/12Z", "09/00Z", "09/12Z", "10/00Z", "10/12Z", "11/00Z",
"11/12Z", "12/00Z", "12/12Z", "13/00Z", "13/12Z", "14/00Z", "14/12Z",
"15/00Z", "15/12Z", "16/00Z", "16/12Z", "17/00Z", "17/12Z", "18/00Z",
"18/12Z", "19/00Z", "19/12Z", "20/00Z", "20/12Z", "21/00Z", "21/12Z",
"22/00Z", "22/12Z", "23/00Z", "23/12Z", "24/00Z", "24/12Z", "25/00Z",
"25/12Z", "26/00Z", "26/12Z", "27/00Z", "27/12Z", "28/00Z", "28/12Z",
"29/00Z", "29/12Z", "30/00Z", "30/12Z", "31/00Z", "31/12Z"
) -> hr_inputs
hr_trans <- stats::setNames(hr_vals, hr_inputs)
o_from_hr <- from_hr <- as.character(tolower(from_hr))
o_to_hr <- to_hr <- as.character(tolower(to_hr))
if ((from_hr == "all") || (to_hr == "all")) {
from_hr <- to_hr <- "all"
} else {
from_hr <- hr_trans[sprintf("%s/%02dZ", format(date, "%d"), as.integer(from_hr))]
match.arg(from_hr, hr_vals)
to_hr <- hr_trans[sprintf("%s/%02dZ", format(date, "%d"), as.integer(to_hr))]
match.arg(to_hr, hr_vals)
}
# clean up the station number if it was entered as a double
station_number <- as.character(as.integer(station_number))
# execute the API call
httr::GET(
url = "http://weather.uwyo.edu/cgi-bin/sounding",
query = list(
region = region,
TYPE = "TEXT:LIST",
YEAR = year,
MONTH = sprintf("%02d", as.integer(month)),
FROM = from_hr,
TO = to_hr,
STNM = station_number
)
) -> res
# check for super bad errors (that we can't handle nicely)
httr::stop_for_status(res)
# get the page content
doc <- httr::content(res, as="text")
# if the site reports no data, issue a warning and return an empty data frame
if (grepl("Can't get", doc)) {
doc <- xml2::read_html(doc)
msg <- rvest::html_nodes(doc, "body")
msg <- rvest::html_text(msg, trim=TRUE)
msg <- gsub("\n\n+.*$", "", msg)
warning(msg)
return(data.frame(stringsAsFactors=FALSE))
}
# turn it into something we can parse
doc <- xml2::read_html(doc)
# get the metadata
#meta <- rvest::html_node(doc, "h2")
#meta <- rvest::html_text(meta, trim=TRUE)
#attr(doc, "meta") <- meta
raw_dat <- doc %>%
html_nodes("pre + h3") %>%
html_text()
indices <- doc %>%
str_split(pattern = "\n", simplify = T) %>%
map_chr(str_squish) %>%
tibble(x = .) %>%
separate(x, into = c("Station", "Value"), sep = ": ") %>%
filter(!is.na(Value))
data <- tidyr::spread(indices, Station, Value)
data
}
startDate <- as.Date("01-11-1984", format="%d-%m-%Y")
endDate <- as.Date("04-11-1984",format="%d-%m-%Y")
#startDate <- as.Date("01-11-1984", format="%d-%m-%Y")
#endDate <- as.Date("31-01-1985",format="%d-%m-%Y")
days <- seq(startDate, endDate, "day")
#wanted to have [1:92], but its not working
lapply(days[1:4], function(day) {
get_sounding_data(
region = "seasia",
date = day,
from_hr = "00",
to_hr = "00",
station_number = "48615"
)
}) -> soundings_48615
warnings()
new_df <- map(soundings_48615, . %>% mutate_all(parse_guess))
#str(new_df)
library(tidyr)
library(tidyverse)
library(dplyr)
dat <- bind_rows(new_df)
dat <- dat %>% separate(col =`Observation time`, into = c('Date', 'time'), sep = '/')
dat$Date <- as.Date(dat$Date, format = "%y%m%d")
#save in text file
library(xlsx)
write.csv(dat, 'c:/Users/Hp/Documents/1984.csv')
get_sounding_data <- NULL
error
Error in bind_rows_(x, .id) :
Column `1000 hPa to 500 hPa thickness` can't be converted from numeric to character
dat <- dat %>% separate(col =`Observation time`, into = c('Date', 'time'), sep = '/')
Error in eval_tidy(enquo(var), var_env) :
object 'Observation time' not found
I've install different R version, but this error keep come out. So I ignore it.
Error: package or namespace load failed for ‘xlsx’:
.onLoad failed in loadNamespace() for 'rJava', details:
call: fun(libname, pkgname)
error: No CurrentVersion entry in Software/JavaSoft registry! Try re-
installing Java and make sure R and Java have matching architectures.

rbind fromJSON page: duplicate rowname error

I was trying to rbind some json data scraped from api
library(jsonlite)
pop_dat <- data.frame()
for (i in 1:3) {
# Generate url for each page
url <- paste0('http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=',i)
# Get json data from each page and transform it into dataframe
dat <- as.data.frame(fromJSON(url)[2],flatten = TRUE, row.names = NULL)
pop_dat <- rbind(pop_dat, dat)
}
However, it returns the following error:
Error in row.names<-.data.frame(*tmp*, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘11’, ‘12’, ‘13’, ‘14’, ‘15’, ‘16’, ‘17’, ‘18’, ‘19’, ‘2’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’, ‘29’, ‘3’, ‘30’, ‘31’, ‘32’, ‘33’, ‘34’, ‘35’, ‘36’, ‘37’, ‘38’, ‘39’, ‘4’, ‘40’, ‘41’, ‘42’, ‘43’, ‘44’, ‘45’, ‘46’, ‘47’, ‘48’, ‘49’, ‘5’, ‘50’, ‘6’, ‘7’, ‘8’, ‘9’
Changing the row.names to null doesn't work. I heard from someone it is due to the fact that some data are stored as lists here, which I don't quite understand.
I understand that there is an alternative package WDI to access this data and it works well, but I want to know how to resolve the duplicates row name problem here in general so that I can deal with similar situation where no alternative package is available.
I heard from someone it is due to the fact that some data are stored as lists...
This is correct. The solution is fairly simple, but I find it really easy to get tripped up by this. Right now you're using:
dat <- as.data.frame(fromJSON(url)[2],flatten = TRUE, row.names = NULL)
The problem comes from fromJSON(url)[2]. This should be fromJSON(url)[[2]] instead. According to the documentation, the key difference between [ and [[ is a single bracket can select multiple elements whereas [[ selects only one.
You can see how this works with some fake data.
foo <- list(
a = rnorm(100),
b = rnorm(100),
c = rnorm(100)
)
With [, you can select multiple values inside this list.
foo[c("a", "b")]
length(foo["a"]) # Result is 1 not 100 like you might expect.
With [[ the results are different.
foo[[c("a", "b")]] # Raises a subscript error.
foo[["a"]] #This works.
length(foo[["a"]]) # Result is 100.
So, your answer will depend on which subset operator you're using. For your problem, you'll want to use [[ to select a single data.frame inside of the list. Then, you should be able to use rbind correctly.
final <- data.frame()
for (i in 1:10) {
url <- paste0(
'http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=',
i
)
res <- jsonlite::fromJSON(url, flatten = TRUE)[[2]]
final <- rbind(final, res)
}
Alternative solution with lapply:
urls <- sprintf(
'http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=%s',
1:10
)
resl <- lapply(urls, jsonlite::fromJSON, flatten = TRUE)
resl <- lapply(resl, "[[", 2) # Use lapply to select the 2 element from each list element.
resl <- do.call(rbind, resl) # This takes all the elements of the list and uses those elements as the arguments for rbind.

For loop - doesn't work properly

I'm trying to use Indeed API to search for specific jobs and I faced a problem when for loop doesn't go through each iterations.
Here is the example of code that I used:
original_url_1 <- "http://api.indeed.com/ads/apisearch?publisher=750330686195873&format=json&q="
original_url_2 <-"&l=Canada&sort=date&radius=10&st=&jt=&start=0&limit=25&fromage=3&filter=&latlong=1&co=ca&chnl=&userip=69.46.99.196&useragent=Mozilla/%2F4.0%28Firefox%29&v=2"
keywords <- c("data+scientist", "data+analyst")
for(i in keywords) {
url <- paste0(original_url_1,i,original_url_2)
x <- as.data.frame(jsonlite::fromJSON(httr::content(httr::GET(url),
as = "text", encoding = "UTF-8")))
data <- rbind(data, x)
}
Url leads to JSON file and adding one of the keyword to the url will change the JSON file. So I'm trying to repeat this for all keywords and store the result in the dataframe. However, when I'm trying to use more keywords I'm getting the result only for a few first keywords.
original_url_1 <- "http://api.indeed.com/ads/apisearch?publisher=750330686195873&format=json&q="
original_url_2 <-"&l=Canada&sort=date&radius=10&st=&jt=&start=0&limit=25&fromage=3&filter=&latlong=1&co=ca&chnl=&userip=69.46.99.196&useragent=Mozilla/%2F4.0%28Firefox%29&v=2"
keywords <- c("data_scientist", "data+analyst")
data<-data.table(NULL)#initialization of object
for(i in keywords) {
url <- paste0(Original_url_1,i,Original_url_2)
x <- as.data.frame(jsonlite::fromJSON(httr::content(httr::GET(url),as = "text", encoding = "UTF-8")))
data <- rbind(data, x)
}
>dim(data)
[1] 39 31
Here is the correct code:
original_url_1 <- "http://api.indeed.com/ads/apisearch?publisher=750330686195873&format=json&q="
original_url_2 <-"&l=Canada&sort=date&radius=10&st=&jt=&start=0&limit=25&fromage=3&filter=&latlong=1&co=ca&chnl=&userip=69.46.99.196&useragent=Mozilla/%2F4.0%28Firefox%29&v=2"
keywords <- c("data+scientist", "data+analyst")
data <- data.frame()
for (i in keywords) {
tryCatch({url <- paste0(original_url_1,i,original_url_2)
x <- as.data.frame(jsonlite::fromJSON(httr::content(httr::GET(url),
as = "text", encoding = "UTF-8")))
data <- rbind(data, x)
}, error = function(t){})
}

Loading json encoded log files into R for analysis

I have a log file with each line a json-encoded entry:
{"requestId":"5550d","partnerId":false,"ip":"170.158.3.1", ... }
I tried reading each line, json_decode and then append to the dataframe:
loadLogs <- function(fileName) {
conn <- file(fileName, "r", blocking = FALSE)
linn <- readLines(conn)
long <- length(linn)
df = data.frame(requestId = character(0),
partnerId = character(0),
ip = character(0)
)
for (i in 1:long) {
jsonRow <- fromJSON(linn[i])
df <- rbind(df, data.frame(requestId = jsonRow$requestId,
partnerId = as.character(jsonRow$partnerId),
ip = jsonRow$ip
}
close(conn)
return(df)
}
The above code is extremely slow for large files though. Is there any way to speed this up? A few options I can think of at the moment:
pre-allocate the dataframe as its copying the entire data for a new append
using apply function on the json_decode
???
How would I do (1) and (2) in R? I'm new.
Thanks for looking at my question.