Replace Quotation in List of Lists R - json

I am trying to get a JSON response from an API:
test <- GET(url, add_headers(`api_key` = key))
content(test, 'parsed')
When I run content(test, 'parsed'), I get the following error:
# Error: lexical error: invalid string in json text. .Note: Final passage of the "fiscal cliff bill" on January 1
I think this is because of the double quotes. How can I replace the double quotes, or, if this is not the problem, how else can I fix this issue?
Thanks!

I had run into a similar problem before, and I had intended to write a small function that uses Jeroen's fix to try to repair the JSON. Since I intended to do it anyway, here's a quick hack attempt.
NB: repairing a structured format like this is speculative at best and most certainly prone to errors. The good news is that I tried to keep this specific enough so that it will not produce false results: it'll either fix what it knows it can, or fail. The "unit-testing" really needs to check other corner-cases. If you find something that this does not fix (and should) or that this breaks (gasp!), please comment!
fix_json_quotes <- function(s) {
  if (length(s) != 1) {
    warning("the argument has length > 1 and only the first element will be used")
    s <- s[[1]]
  }
  stopifnot(is.character(s))
  val <- jsonlite::validate(s)
  while (! val) {
    # position where the validator gave up; only look at the text before it
    ind <- attr(val, "offset") - 1
    # escape the last double quote before that point (ignoring trailing
    # whitespace/commas), assuming it is the stray, unescaped one
    snew <- gsub("(.*)([\"])([[:space:],]*)$", "\\1\\\\\\2\\3", substr(s, 1, ind))
    if (snew != substr(s, 1, ind)) {
      s <- paste0(snew, substr(s, ind + 1, nchar(s)))
    } else {
      # nothing left that this heuristic can escape
      break
    }
    val <- jsonlite::validate(s)
  }
  if (! val) {
    # still not validating
    stop("unable to fix quotes")
  }
  return(s)
}
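To make the core step of fix_json_quotes() concrete, here is the same gsub() call applied by hand to a made-up prefix that ends at a stray quote. The string prefix is hypothetical, and the pattern uses the double-quote-only group:

```r
# Hypothetical prefix: the text up to the point where the validator gave up,
# ending in the unescaped quote that opens "cliff bill".
prefix <- '{"a":["final "'

# Capture everything before the last double quote (\1), the quote (\2) and
# any trailing whitespace/commas (\3), then re-emit them with a backslash
# inserted in front of the quote.
fixed <- gsub('(.*)(["])([[:space:],]*)$', "\\1\\\\\\2\\3", prefix)

cat(fixed, "\n")
```

Each pass of the while loop performs this substitution once and then re-validates, so a string with several stray quotes is repaired one quote at a time.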
Some sample data, unit-testing if you will (testthat is not required for use of the function):
library(testthat)
lst <- list(a="final \"cliff bill\" on")
json <- as.character(jsonlite::toJSON(lst))
json
# [1] "{\"a\":[\"final \\\"cliff bill\\\" on\"]}"
Okay, there should be no change:
expect_equal(json, fix_json_quotes(json))
Some bad data:
# un-escape the double quotes
badlst <- "{\"a\":[\"final \"cliff bill\" on\"]}"
expect_error(jsonlite::fromJSON(badlst))
expect_equal(json, fix_json_quotes(badlst))
PS: this looks specifically for double quotes, nothing more. However, I believe there are related errors that this might also be able to fix. I "left room" for this in the second group of the regex, ([\"]); for example, if single quotes could also cause a problem, the group could be changed to ([\"']), although \' is not a valid JSON escape, so that change alone would not yield valid JSON. I don't know if it's useful or even necessary.
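Before trying any repair, it can help to see where the parser actually chokes. The sketch below is a hypothetical helper (parse_or_report is not part of jsonlite) that uses jsonlite::validate() and its "offset" attribute, the same attribute fix_json_quotes() relies on, to report a snippet around the failure point:

```r
library(jsonlite)

# Hypothetical helper: parse JSON text, or fail with the offset that
# validate() reports plus some surrounding context, which makes a stray
# unescaped quote easy to spot.
parse_or_report <- function(txt, context = 20) {
  ok <- validate(txt)
  if (!isTRUE(ok)) {
    off <- attr(ok, "offset")
    snippet <- substr(txt, max(1, off - context), off + context)
    stop(sprintf("invalid JSON near offset %s: %s", off, snippet), call. = FALSE)
  }
  fromJSON(txt)
}
```

With httr, the raw body can be obtained with content(test, as = "text", encoding = "UTF-8") and passed to a helper like this (or to fix_json_quotes()) instead of asking content() to parse it directly.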

Related

How do I match a CSV-style quoted string in nom?

A CSV style quoted string, for the purposes of this question, is a string in which:
The string starts and ends with exactly one ".
Two double quotes inside the string are collapsed to one double quote. "Alo""ha"→Alo"ha.
"" on its own is an empty string.
Error inputs, such as "A""" e", cannot be parsed. It's an A", followed by junk e".
I've tried several things, none of which have worked fully.
The closest I've gotten, thanks to some help from user pinkieval in #nom on the Mozilla IRC:
use std::error as stderror; /* Avoids needing nightly to compile */
named!(csv_style_string<&str, String>, map_res!(
terminated!(tag!("\""), not!(peek!(char!('"')))),
csv_string_to_string
));
fn csv_string_to_string(s: &str) -> Result<String, Box<stderror::Error>> {
Ok(s.to_string().replace("\"\"", "\""))
}
This does not catch the end of the string correctly.
I've also attempted to use the re_match! macro with r#""([^"]|"")*""#, but that always results in an Err::Incomplete(1).
I've determined that the given CSV example for Nom 1.0 doesn't work for a quoted CSV string as I'm describing it, but I do know implementations differ.
Here is one way of doing it:
use nom::types::CompleteStr;
use nom::*;
named!(csv_style_string<CompleteStr, String>,
delimited!(
char!('"'),
map!(
many0!(
alt!(
// Eat a " delimiter and the " that follows it
tag!("\"\"") => { |_| '"' }
| // Normal character
none_of!("\"")
)
),
// Make a string from a vector of chars
|v| v.iter().collect::<String>()
),
char!('"')
)
);
fn main() {
println!(r#""Alo\"ha" = {:?}"#, csv_style_string(CompleteStr(r#""Alo""ha""#)));
println!(r#""" = {:?}"#, csv_style_string(CompleteStr(r#""""#)));
println!(r#"bad format: {:?}"#, csv_style_string(CompleteStr(r#""A""" e""#)));
}
(I wrote it in full nom, but a solution like yours, based on an external function instead of a map!() on each character, would work too, and may be more efficient.)
The magic here, that would also solve your regexp issue, is to use CompleteStr. This basically tells nom that nothing will come after that input (otherwise, nom assumes you're doing a streaming parser, so more input may follow).
This is needed because we need to know what to do with a " if it is the last character fed to nom. Depending on the character that comes after it (another ", a normal character, or EOF), we have to make a different decision; hence the Incomplete result, meaning nom does not have enough input to decide. Telling nom that EOF comes next resolves this.
Further reading on Incomplete on nom's author's blog: http://unhandledexpression.com/general/2018/05/14/nom-4-0-faster-safer-simpler-parsers.html#dealing-with-incomplete-usage
You may note that this parser does not actually reject the invalid input, but parses the beginning and returns the rest. If you use this parser as a subparser in another parser, the latter would then feed the remainder to the next subparser, which would fail as well (because it would expect a comma), causing the overall parser to fail.
If you don't want that, you could make csv_style_string also match peek!(alt!(char!(',') | char!('\n') | eof!())) after the closing quote.

Convert R data.frame to multilevel JSON

I have a periodic process in R that yields me a data.frame.
I want to use this data.frame to create a dropdown selector with AngularJS.
My final data.frame will look more or less as follows (my real example might have a deeper hierarchical structure):
DF<-data.frame(hie1=c(rep("Cl1",2),"Cl2"),hie2=c("Cl1op1","Cl1op2","Clop1"),
hie3=c("/first.html","/second.html","/third.html"))
I need to convert that data.frame into JSON with the following structure:
{
"Cl1":{"Cl1op1":"/first.html","Cl1op2": "/second.html"},
"Cl2":{"Cl2op1":"/third.html"}
}
So far, I have tried all the toJSON commands of the rjson and RJSONIO packages for the data.frame with and without column names:
library(rjson)
#library(RJSONIO)
DF2<-DF
colnames(DF2)<-NULL
cat(toJSON(DF))
cat(toJSON(DF2))
I thought about using reshape2's dcast function before using toJSON, but I do not know what kind of structure I need to achieve my goal.
I also used the functions toJSON2 and toJSONArray from rCharts, with no success.
Is there an appropriate transformation in R to get the output I am looking for?
P.S. (I do not mind having [] instead of {})
EDIT:
I have created a couple of functions (included below) to fulfil my needs.
However, they are not too clean and I believe that there must be a better way to perform this transformation in R.
I keep this question open expecting a better solution.
linktwo<-function(V){
paste0(sapply(V,function(x) paste0("'",toString(x),"'")),collapse=":")
}
pastehier<-function(DF){
if(ncol(DF)==2){
return(paste0(apply(DF,1,linktwo),collapse=","))
}else{
u<-unique(DF[,1])
output=character()
for(i in u){
output<-append(output,paste0(paste0("'",i,"'"),":{",pastehier(DF[DF[,1]==i,-1]),
"}"))
}
return(paste0(output,collapse=","))
}
}
pastehier(DF)
I do not fully understand your request and maybe my solution is useless, but here is a try:
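library(reshape2)
library(jsonlite)  # toJSON(pretty = TRUE) is jsonlite's; rjson's toJSON has no pretty argument
prova <- dcast(DF, hie1 ~ ..., value.var = "hie3")
toJSON(prova, pretty = TRUE)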
[
{
"hie1": "Cl1",
"Cl1op1": "/first.html",
"Cl1op2": "/second.html"
},
{
"hie1": "Cl2",
"Clop1": "/third.html"
}
]
where:
> prova
hie1 Cl1op1 Cl1op2 Clop1
1 Cl1 /first.html /second.html <NA>
2 Cl2 <NA> <NA> /third.html
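A different route, offered as a sketch rather than either poster's method: build a nested named list with split() and let jsonlite::toJSON() (with auto_unbox = TRUE) handle the quoting and braces. nest_df() is a hypothetical helper name; it assumes the last column holds the leaf values and every other column is one level of the hierarchy.

```r
library(jsonlite)

# Hypothetical helper: turn a data frame whose columns are hierarchy levels
# (the last column holding the leaf values) into a nested named list.
nest_df <- function(df) {
  if (ncol(df) == 2) {
    # Leaf level: keys from the first column, values from the second
    return(as.list(setNames(as.character(df[[2]]), as.character(df[[1]]))))
  }
  # Group the remaining columns by the current level and recurse
  lapply(split(df[-1], as.character(df[[1]])), nest_df)
}

DF <- data.frame(hie1 = c(rep("Cl1", 2), "Cl2"),
                 hie2 = c("Cl1op1", "Cl1op2", "Clop1"),
                 hie3 = c("/first.html", "/second.html", "/third.html"),
                 stringsAsFactors = FALSE)

json <- toJSON(nest_df(DF), auto_unbox = TRUE, pretty = TRUE)
json
```

Because the function recurses on ncol(df), a deeper hierarchy (more hie* columns) needs no extra code. It assumes each path through the hierarchy is unique; duplicate keys at one level would produce repeated JSON object keys.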

Strange behaviour in fromJSON in RJSONIO package

Ok, I'm trying to convert the following JSON data into an R data frame.
For some reason fromJSON in the RJSONIO package only reads up to about character 380 and then it stops converting the JSON properly.
Here is the JSON:
"{\"metricDate\":\"2013-05-01\",\"pageCountTotal\":\"33682\",\"landCountTotal\":\"11838\",\"newLandCountTotal\":\"8023\",\"returnLandCountTotal\":\"3815\",\"spiderCountTotal\":\"84\",\"goalCountTotal\":\"177.000000\",\"callGoalCountTotal\":\"177.000000\",\"callCountTotal\":\"237.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.50\",\"callConversionPerc\":\"74.68\"}\n{\"metricDate\":\"2013-05-02\",\"pageCountTotal\":\"32622\",\"landCountTotal\":\"11626\",\"newLandCountTotal\":\"7945\",\"returnLandCountTotal\":\"3681\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"210.000000\",\"callGoalCountTotal\":\"210.000000\",\"callCountTotal\":\"297.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"70.71\"}\n{\"metricDate\":\"2013-05-03\",\"pageCountTotal\":\"28467\",\"landCountTotal\":\"11102\",\"newLandCountTotal\":\"7786\",\"returnLandCountTotal\":\"3316\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"186.000000\",\"callGoalCountTotal\":\"186.000000\",\"callCountTotal\":\"261.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"71.26\"}\n{\"metricDate\":\"2013-05-04\",\"pageCountTotal\":\"20884\",\"landCountTotal\":\"9031\",\"newLandCountTotal\":\"6670\",\"returnLandCountTotal\":\"2361\",\"spiderCountTotal\":\"51\",\"goalCountTotal\":\"7.000000\",\"callGoalCountTotal\":\"7.000000\",\"callCountTotal\":\"44.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.08\",\"callConversionPerc\":\"15.91\"}\n{\"metricDate\":\"2013-05-05\",\"pageCountTotal\":\"20481\",\"landCountTotal\":\"8782\",\"newLandCountTotal\":\"6390\",\"returnLandCountTotal\":\"2392\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"12.50\"}\n{\"metricDate\":\"2013-05-06\",\"pageCountTotal\":\"25175\",\"landC
ountTotal\":\"10019\",\"newLandCountTotal\":\"7082\",\"returnLandCountTotal\":\"2937\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"24.000000\",\"callGoalCountTotal\":\"24.000000\",\"callCountTotal\":\"47.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.24\",\"callConversionPerc\":\"51.06\"}\n{\"metricDate\":\"2013-05-07\",\"pageCountTotal\":\"35892\",\"landCountTotal\":\"12615\",\"newLandCountTotal\":\"8391\",\"returnLandCountTotal\":\"4224\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"239.000000\",\"callGoalCountTotal\":\"239.000000\",\"callCountTotal\":\"321.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.89\",\"callConversionPerc\":\"74.45\"}\n{\"metricDate\":\"2013-05-08\",\"pageCountTotal\":\"34106\",\"landCountTotal\":\"12391\",\"newLandCountTotal\":\"8389\",\"returnLandCountTotal\":\"4002\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"221.000000\",\"callGoalCountTotal\":\"221.000000\",\"callCountTotal\":\"295.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"74.92\"}\n{\"metricDate\":\"2013-05-09\",\"pageCountTotal\":\"32721\",\"landCountTotal\":\"12447\",\"newLandCountTotal\":\"8541\",\"returnLandCountTotal\":\"3906\",\"spiderCountTotal\":\"54\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"280.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.66\",\"callConversionPerc\":\"73.93\"}\n{\"metricDate\":\"2013-05-10\",\"pageCountTotal\":\"29724\",\"landCountTotal\":\"11616\",\"newLandCountTotal\":\"8063\",\"returnLandCountTotal\":\"3553\",\"spiderCountTotal\":\"139\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"301.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"68.77\"}\n{\"metricDate\":\"2013-05-11\",\"pageCountTotal\":\"22061\",\"landCountTotal\":\"9660\",\"newLandCountTotal\":\"6971\",\"ret
urnLandCountTotal\":\"2689\",\"spiderCountTotal\":\"52\",\"goalCountTotal\":\"3.000000\",\"callGoalCountTotal\":\"3.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.03\",\"callConversionPerc\":\"7.50\"}\n{\"metricDate\":\"2013-05-12\",\"pageCountTotal\":\"23341\",\"landCountTotal\":\"9935\",\"newLandCountTotal\":\"6960\",\"returnLandCountTotal\":\"2975\",\"spiderCountTotal\":\"45\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"12.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-13\",\"pageCountTotal\":\"36565\",\"landCountTotal\":\"13583\",\"newLandCountTotal\":\"9277\",\"returnLandCountTotal\":\"4306\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"246.000000\",\"callGoalCountTotal\":\"246.000000\",\"callCountTotal\":\"324.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"75.93\"}\n{\"metricDate\":\"2013-05-14\",\"pageCountTotal\":\"35260\",\"landCountTotal\":\"13797\",\"newLandCountTotal\":\"9375\",\"returnLandCountTotal\":\"4422\",\"spiderCountTotal\":\"59\",\"goalCountTotal\":\"212.000000\",\"callGoalCountTotal\":\"212.000000\",\"callCountTotal\":\"283.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"74.91\"}\n{\"metricDate\":\"2013-05-15\",\"pageCountTotal\":\"35836\",\"landCountTotal\":\"13792\",\"newLandCountTotal\":\"9532\",\"returnLandCountTotal\":\"4260\",\"spiderCountTotal\":\"94\",\"goalCountTotal\":\"187.000000\",\"callGoalCountTotal\":\"187.000000\",\"callCountTotal\":\"258.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.36\",\"callConversionPerc\":\"72.48\"}\n{\"metricDate\":\"2013-05-16\",\"pageCountTotal\":\"33136\",\"landCountTotal\":\"12821\",\"newLandCountTotal\":\"8755\",\"returnLandCountTotal\":\"4066\",\"spiderCountTotal\":\"65\",\"goalCount
Total\":\"192.000000\",\"callGoalCountTotal\":\"192.000000\",\"callCountTotal\":\"260.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.50\",\"callConversionPerc\":\"73.85\"}\n{\"metricDate\":\"2013-05-17\",\"pageCountTotal\":\"29564\",\"landCountTotal\":\"11721\",\"newLandCountTotal\":\"8191\",\"returnLandCountTotal\":\"3530\",\"spiderCountTotal\":\"213\",\"goalCountTotal\":\"166.000000\",\"callGoalCountTotal\":\"166.000000\",\"callCountTotal\":\"222.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.42\",\"callConversionPerc\":\"74.77\"}\n{\"metricDate\":\"2013-05-18\",\"pageCountTotal\":\"23686\",\"landCountTotal\":\"9916\",\"newLandCountTotal\":\"7335\",\"returnLandCountTotal\":\"2581\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"5.000000\",\"callGoalCountTotal\":\"5.000000\",\"callCountTotal\":\"34.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.05\",\"callConversionPerc\":\"14.71\"}\n{\"metricDate\":\"2013-05-19\",\"pageCountTotal\":\"23528\",\"landCountTotal\":\"9952\",\"newLandCountTotal\":\"7184\",\"returnLandCountTotal\":\"2768\",\"spiderCountTotal\":\"57\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"14.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"7.14\"}\n{\"metricDate\":\"2013-05-20\",\"pageCountTotal\":\"37391\",\"landCountTotal\":\"13488\",\"newLandCountTotal\":\"9024\",\"returnLandCountTotal\":\"4464\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"227.000000\",\"callGoalCountTotal\":\"227.000000\",\"callCountTotal\":\"291.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"78.01\"}\n{\"metricDate\":\"2013-05-21\",\"pageCountTotal\":\"36299\",\"landCountTotal\":\"13174\",\"newLandCountTotal\":\"8817\",\"returnLandCountTotal\":\"4357\",\"spiderCountTotal\":\"77\",\"goalCountTotal\":\"164.000000\",\"callGoalCountTotal\":\"164.000000\",\"call
CountTotal\":\"221.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.24\",\"callConversionPerc\":\"74.21\"}\n{\"metricDate\":\"2013-05-22\",\"pageCountTotal\":\"34201\",\"landCountTotal\":\"12433\",\"newLandCountTotal\":\"8388\",\"returnLandCountTotal\":\"4045\",\"spiderCountTotal\":\"76\",\"goalCountTotal\":\"195.000000\",\"callGoalCountTotal\":\"195.000000\",\"callCountTotal\":\"262.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.57\",\"callConversionPerc\":\"74.43\"}\n{\"metricDate\":\"2013-05-23\",\"pageCountTotal\":\"32951\",\"landCountTotal\":\"11611\",\"newLandCountTotal\":\"7757\",\"returnLandCountTotal\":\"3854\",\"spiderCountTotal\":\"68\",\"goalCountTotal\":\"167.000000\",\"callGoalCountTotal\":\"167.000000\",\"callCountTotal\":\"231.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.44\",\"callConversionPerc\":\"72.29\"}\n{\"metricDate\":\"2013-05-24\",\"pageCountTotal\":\"28967\",\"landCountTotal\":\"10821\",\"newLandCountTotal\":\"7396\",\"returnLandCountTotal\":\"3425\",\"spiderCountTotal\":\"106\",\"goalCountTotal\":\"167.000000\",\"callGoalCountTotal\":\"167.000000\",\"callCountTotal\":\"203.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"82.27\"}\n{\"metricDate\":\"2013-05-25\",\"pageCountTotal\":\"19741\",\"landCountTotal\":\"8393\",\"newLandCountTotal\":\"6168\",\"returnLandCountTotal\":\"2225\",\"spiderCountTotal\":\"78\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"28.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-26\",\"pageCountTotal\":\"19770\",\"landCountTotal\":\"8237\",\"newLandCountTotal\":\"6009\",\"returnLandCountTotal\":\"2228\",\"spiderCountTotal\":\"79\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"
conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-27\",\"pageCountTotal\":\"26208\",\"landCountTotal\":\"9755\",\"newLandCountTotal\":\"6779\",\"returnLandCountTotal\":\"2976\",\"spiderCountTotal\":\"82\",\"goalCountTotal\":\"26.000000\",\"callGoalCountTotal\":\"26.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.27\",\"callConversionPerc\":\"65.00\"}\n{\"metricDate\":\"2013-05-28\",\"pageCountTotal\":\"36980\",\"landCountTotal\":\"12463\",\"newLandCountTotal\":\"8226\",\"returnLandCountTotal\":\"4237\",\"spiderCountTotal\":\"132\",\"goalCountTotal\":\"208.000000\",\"callGoalCountTotal\":\"208.000000\",\"callCountTotal\":\"276.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.67\",\"callConversionPerc\":\"75.36\"}\n{\"metricDate\":\"2013-05-29\",\"pageCountTotal\":\"34190\",\"landCountTotal\":\"12014\",\"newLandCountTotal\":\"8279\",\"returnLandCountTotal\":\"3735\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"179.000000\",\"callGoalCountTotal\":\"179.000000\",\"callCountTotal\":\"235.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.49\",\"callConversionPerc\":\"76.17\"}\n{\"metricDate\":\"2013-05-30\",\"pageCountTotal\":\"33867\",\"landCountTotal\":\"11965\",\"newLandCountTotal\":\"8231\",\"returnLandCountTotal\":\"3734\",\"spiderCountTotal\":\"63\",\"goalCountTotal\":\"160.000000\",\"callGoalCountTotal\":\"160.000000\",\"callCountTotal\":\"219.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.34\",\"callConversionPerc\":\"73.06\"}\n{\"metricDate\":\"2013-05-31\",\"pageCountTotal\":\"27536\",\"landCountTotal\":\"10302\",\"newLandCountTotal\":\"7333\",\"returnLandCountTotal\":\"2969\",\"spiderCountTotal\":\"108\",\"goalCountTotal\":\"173.000000\",\"callGoalCountTotal\":\"173.000000\",\"callCountTotal\":\"226.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"76.55\"
}\n\r\n"
And here is my R output:
metricDate
"2013-05-01"
pageCountTotal
"33682"
landCountTotal
"11838"
newLandCountTotal
"8023"
returnLandCountTotal
"3815"
spiderCountTotal
"84"
goalCountTotal
"177.000000"
callGoalCountTotal
"177.000000"
callCountTotal
"237.000000"
onlineGoalCountTotal
"0.000000"
conversionPerc
"1.50"
callConversionPerc
"74.68\"}{\"metricDate\":\"2013-05-02\",\"pageCountTotal\":\"32622\",\"landCountTotal\":\"11626\",\"newLandCountTotal\":\"7945\",\"returnLandCountTotal\":\"3681\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"210.000000\",\"callGoalCountTotal\":\"210.000000\",\"callCountTotal\":\"297.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"70.71\"}{\"metricDate\":\"2013-05-03\",\"pageCountTotal\":\"28467\",\"landCountTotal\":\"11102\",\"newLandCountTotal\":\"7786\",\"returnLandCountTotal\":\"3316\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"186.000000\",\"callGoalCountTotal\":\"186.000000\",\"callCountTotal\":\"261.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"71.26\"}{\"metricDate\":\"2013-05-04\",\"pageCountTotal\":\"20884\",\"landCountTotal\":\"9031\",\"newLandCountTotal\":\"6670\",\"returnLandCountTotal\":\"2361\",\"spiderCountTotal\":\"51\",\"goalCountTotal\":\"7.000000\",\"callGoalCountTotal\":\"7.000000\",\"callCountTotal\":\"44.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.08\",\"callConversionPerc\":\"15.91\"}{\"metricDate\":\"2013-05-05\",\"pageCountTotal\":\"20481\",\"landCountTotal\":\"8782\",\"newLandCountTotal\":\"6390\",\"returnLandCountTotal\":\"2392\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"12.50\"}{\"metricDate\":\"2013-05-06\",\"pageCountTotal\":\"25175\",\"landCountTotal\":\"10019\",\"newLandCountTotal\":\"7082\",\"returnLandCountTotal\":\"2937\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"24.000000\",\"callGoalCountTotal\":\"24.000000\",\"callCountTotal\":\"47.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.24\",\"callConversionPerc\":\"51.06\"}{\"metricDate\":\"2013-05-07\",\"pageCountTotal\":\"35892\",\"landCountT
otal\":\"12615\",\"newLandCountTotal\":\"8391\",\"returnLandCountTotal\":\"4224\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"239.000000\",\"callGoalCountTotal\":\"239.000000\",\"callCountTotal\":\"321.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.89\",\"callConversionPerc\":\"74.45\"}{\"metricDate\":\"2013-05-08\",\"pageCountTotal\":\"34106\",\"landCountTotal\":\"12391\",\"newLandCountTotal\":\"8389\",\"returnLandCountTotal\":\"4002\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"221.000000\",\"callGoalCountTotal\":\"221.000000\",\"callCountTotal\":\"295.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"74.92\"}{\"metricDate\":\"2013-05-09\",\"pageCountTotal\":\"32721\",\"landCountTotal\":\"12447\",\"newLandCountTotal\":\"8541\",\"returnLandCountTotal\":\"3906\",\"spiderCountTotal\":\"54\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"280.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.66\",\"callConversionPerc\":\"73.93\"}{\"metricDate\":\"2013-05-10\",\"pageCountTotal\":\"29724\",\"landCountTotal\":\"11616\",\"newLandCountTotal\":\"8063\",\"returnLandCountTotal\":\"3553\",\"spiderCountTotal\":\"139\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"301.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"68.77\"}{\"metricDate\":\"2013-05-11\",\"pageCountTotal\":\"22061\",\"landCountTotal\":\"9660\",\"newLandCountTotal\":\"6971\",\"returnLandCountTotal\":\"2689\",\"spiderCountTotal\":\"52\",\"goalCountTotal\":\"3.000000\",\"callGoalCountTotal\":\"3.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.03\",\"callConversionPerc\":\"7.50\"}{\"metricDate\":\"2013-05-12\",\"pageCountTotal\":\"23341\",\"landCountTotal\":\"9935\",\"newLandCountTotal\":\"6960\",\"returnLandCountTotal\"
:\"2975\",\"spiderCountTotal\":\"45\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"12.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}{\"metricDate\":\"2013-05-13\",\"pageCountTotal\":\"36565\",\"landCountTotal\":\"13583\",\"newLandCountTotal\":\"9277\",\"returnLandCountTotal\":\"4306\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"246.000000\",\"callGoalCountTotal\":\"246.000000\",\"callCountTotal\":\"324.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"75.93\"}{\"metricDate\":\"2013-05-14\",\"pageCountTotal\":\"35260\",\"landCountTotal\":\"13797\",\"newLandCountTotal\":\"9375\",\"returnLandCountTotal\":\"4422\",\"spiderCountTotal\":\"59\",\"goalCountTotal\":\"212.000000\",\"callGoalCountTotal\":\"212.000000\",\"callCountTotal\":\"283.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"74.91\"}{\"metricDate\":\"2013-05-15\",\"pageCountTotal\":\"35836\",\"landCountTotal\":\"13792\",\"newLandCountTotal\":\"9532\",\"returnLandCountTotal\":\"4260\",\"spiderCountTotal\":\"94\",\"goalCountTotal\":\"187.000000\",\"callGoalCountTotal\":\"187.000000\",\"callCountTotal\":\"258.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.36\",\"callConversionPerc\":\"72.48\"}{\"metricDate\":\"2013-05-
(I've truncated the output a little).
The R output is read properly up until "callConversionPerc", and after that the JSON parsing seems to break. Is there some default parameter I've missed that could cause this behaviour? I have checked for unescaped quotation marks and anything else obvious like that, and didn't see any.
Surely it wouldn't be the newline character that occurs shortly after, would it?
EDIT: So this does appear to be a newline issue.
Here's another 'JSON' string I've pulled into R; again, the double quote marks are all escaped:
"{\"modelId\":\"7\",\"igrp\":\"1\",\"modelName\":\"Equally Weighted\",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90}\n{\"modelId\":\"416\",\"igrp\":\"1\",\"modelName\":\"First and Last Click Weighted \",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90,\"firstWeight\":3,\"lastWeight\":3}\n{\"modelId\":\"5\",\"igrp\":\"1\",\"modelName\":\"First Click\",\"modelType\":\"first\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90}\n{\"modelId\":\"8\",\"igrp\":\"1\",\"modelName\":\"First Click Weighted\",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90,\"firstWeight\":3}\n{\"modelId\":\"128\",\"igrp\":\"1\",\"modelName\":\"First Click Weighted across PPC\",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90,\"firstWeight\":3,\"channelsMode\":\"include\",\"channels\":[5]}\n{\"modelId\":\"6\",\"igrp\":\"1\",\"modelName\":\"Last Click\",\"modelType\":\"last\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90}\n{\"modelId\":\"417\",\"igrp\":\"1\",\"modelName\":\"Last Click Weighted \",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90,\"lastWeight\":3}\n\r\n"
When I try to parse this using fromJSON I get the same problem: it gets to the last term on the first line and then stops parsing properly. Note that in this new case the output is slightly different from before: it returns NULL for the last item (instead of the messy string from the previous example).
$modelId
[1] "7"
$igrp
[1] "1"
$modelName
[1] "Equally Weighted"
$modelType
[1] "spread"
$status
[1] 200
$matchCriteria
[1] ""
$lookbackDays
NULL
As you can see, the components now use the "$" convention as if they are naming components and the last item is null.
I am wondering if this is to do with the way that fromJSON is parsing the strings, and when it is asked to create a variable with the same name as a variable that already exists it then fails and just returns a string or a NULL.
I would have thought that dealing with that sort of case would be coded into RJSONIO as it's pretty standard for JSON data to have repeating names.
I'm stumped as to how to fix this.
There are two aspects of the JSON that seem to be causing trouble. The first is the trailing "\n\r\n", so get rid of that:
contJSON = sub("\n\r\n$", "", contJSON)
The second is that the string is actually a series of valid JSON lines rather than a single JSON object. So either split it into valid JSON objects and process each individually
lapply(strsplit(contJSON, "\n")[[1]], fromJSON, asText=TRUE)
or create a string representing a single valid JSON object and process that
fromJSON(sprintf("[%s]", gsub("\n", ",", contJSON)), asText=TRUE)
Both of these rely on details of the data so are not generally useful.
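If jsonlite is an option, its stream_in() function is designed for this newline-delimited layout. This is a swapped-in technique, not the RJSONIO approach above; the sketch assumes records are separated by "\n" and uses two records from the question's second example string.

```r
library(jsonlite)

# Two records taken from the second example string in the question
# (newline-delimited JSON ending in "\n\r\n").
contJSON <- paste0(
  '{"modelId":"7","igrp":"1","modelName":"Equally Weighted","modelType":"spread",',
  '"status":200,"matchCriteria":"","lookbackDays":90}\n',
  '{"modelId":"416","igrp":"1","modelName":"First and Last Click Weighted ",',
  '"modelType":"spread","status":200,"matchCriteria":"","lookbackDays":90,',
  '"firstWeight":3,"lastWeight":3}\n\r\n'
)

# Drop trailing whitespace (the "\n\r\n"), then split into one JSON text per line.
lines <- strsplit(sub("[[:space:]]+$", "", contJSON), "\n", fixed = TRUE)[[1]]

# stream_in() reads one JSON record per line and row-binds them, filling
# fields that are missing from some records with NA.
models <- stream_in(textConnection(lines), verbose = FALSE)
str(models)
```

The result is a single data frame, which also sidesteps the question of how repeated names across records are handled.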
asText is clearly an argument of RJSONIO::fromJSON:
> args(RJSONIO::fromJSON)
function (content, handler = NULL, default.size = 100, depth = 150L,
allowComments = TRUE, asText = isContent(content), data = NULL,
maxChar = c(0L, nchar(content)), simplify = Strict, nullValue = NULL,
simplifyWithNames = TRUE, encoding = NA_character_, stringFun = NULL,
...)
NULL
So if R is complaining about an unused parameter it's likely that you're actually accessing a different function, in particular rjson::fromJSON. Perhaps search() shows that rjson appears before RJSONIO?

Convert character to html in R

What's the preferred way in R to convert a character vector containing non-ASCII characters to HTML? For example, I would like to convert
"ü"
to
"&uuml;"
I am aware that this is possible with a clever use of gsub (but has anyone done it once and for all?), and I thought that the package R2HTML would do that, but it doesn't.
EDIT: Here is what I ended up using; it can obviously be extended by modifying the dictionary:
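char2html <- function(x){
  dictionary <- data.frame(
    symbol = c("ä","ö","ü","Ä", "Ö", "Ü", "ß"),
    html = c("&auml;","&ouml;", "&uuml;","&Auml;",
             "&Ouml;", "&Uuml;","&szlig;"),
    stringsAsFactors = FALSE)
  for(i in seq_len(nrow(dictionary))){
    # literal replacement of each character by its named entity
    x <- gsub(dictionary$symbol[i], dictionary$html[i], x, fixed = TRUE)
  }
  x
}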
x <- c("Buschwindröschen", "Weißdorn")
char2html(x)
This question is pretty old, but I couldn't find any straightforward answer, so I came up with this simple function, which uses numeric HTML character references and works for the Latin-1 Supplement block (code points 161 to 255). There is probably a function in some package that does this more thoroughly, but the following is likely good enough for many applications:
conv_latinsupp <- function(...) {
out <- character()
for (s in list(...)) {
splitted <- unlist(strsplit(s, ""))
intvalues <- utf8ToInt(enc2utf8(s))
pos_to_modify <- which(intvalues >=161 & intvalues <= 255)
splitted[pos_to_modify] <- paste0("&#0", intvalues[pos_to_modify], ";")
out <- c(out, paste0(splitted, collapse = ""))
}
out
}
conv_latinsupp("aeiou", "àéïôù12345")
## [1] "aeiou" "&#0224;&#0233;&#0239;&#0244;&#0249;12345"
The XML package has a function insertEntities for this, but it is internal, so use it at your own risk: there is no guarantee that it will keep working like this in future versions.
Right now, your code could be accomplished using
char2html <- function(x) XML:::insertEntities(x, c("ä"="auml", "ö"="ouml", …))
Using a named vector instead of a data.frame feels rather elegant, but doesn't change the core of things: under the hood, insertEntities calls gsub in much the same way your code does.
If numeric HTML entities are valid in your environment, then you could probably convert all your text into those using utf8ToInt and then turn safely printable ASCII characters back into unescaped form. This would save you the trouble of maintaining a dictionary for your entities.
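A minimal sketch of that idea, assuming numeric character references are acceptable wherever the output is used (to_numeric_entities is a hypothetical name): every code point above 127 becomes &#N; and plain ASCII is left alone.

```r
# Convert every non-ASCII character to a numeric HTML entity (&#N;),
# leaving plain ASCII untouched. Works element-wise on a character vector.
to_numeric_entities <- function(x) {
  vapply(x, function(s) {
    cps <- utf8ToInt(enc2utf8(s))                   # Unicode code points of s
    chars <- vapply(cps, intToUtf8, character(1))   # back to single characters
    chars[cps > 127] <- paste0("&#", cps[cps > 127], ";")
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

to_numeric_entities(c("Buschwindr\u00f6schen", "Wei\u00dfdorn"))
```

Unlike the dictionary approach there is nothing to maintain. Note that it does not escape &, < or >, and it assumes the input contains no NA values.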

How to organize big R functions?

I'm writing an R function that is becoming quite big. One of its parameters accepts multiple choices, and I'm organizing it like so:
myfun <- function(y, type=c("aa", "bb", "cc", "dd" ... "zz")){
if (type == "aa") {
# do something
# - a lot of code here -
# ....
}
if (type == "bb") {
# do something
# - a lot of code here -
# ....
}
# ....
}
I have two questions:
Is there a better way to avoid an 'if' statement for every choice of the parameter type?
Would it be better to write a sub-function for every "type" choice?
If I write sub-functions, it would look like this:
myfun <- function(y, type=c("aa", "bb", "cc", "dd" ... "zz")){
if (type == "aa") result <- sub_fun_aa(y)
if (type == "bb") result <- sub_fun_bb(y)
if (type == "cc") result <- sub_fun_cc(y)
if (type == "dd") result <- sub_fun_dd(y)
# ....
}
The sub-functions are of course defined elsewhere (at the top of myfun, or some other way).
I hope my question is clear. Thanks in advance.
- Additional info -
I'm writing a function that applies different filters to an image (different filter = different "type" parameter). Some filters share code (for example, "aa" and "bb" are two Gaussian filters that differ in only one line of code), while others are completely different.
So I'm forced to use a lot of if statements, e.g.:
if(type == "aa" | type == "bb"){
# do something common to aa and bb
if(type == "aa"){
# do something aa-related
}
if(type == "bb"){
# do something bb-related
}
}
if(type == "cc" | type == "dd"){
# do something common to cc and dd
if(type == "cc"){
# do something cc-related
}
if(type == "dd"){
# do something dd-related
}
}
if(type == "zz"){
# do something zz-related
}
And so on.
Furthermore, there are more if statements inside the "do something" code.
I'm looking for the best way to organize my code.
Option 1
One option is to use switch instead of multiple if statements:
myfun <- function(y, type=c("aa", "bb", "cc", "dd" ... "zz")){
  type <- match.arg(type)   # reduce type to a single valid choice
  switch(type,
    "aa" = sub_fun_aa(y),
    "bb" = sub_fun_bb(y),
    "cc" = sub_fun_cc(y),
    "dd" = sub_fun_dd(y)
  )
}
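switch can also absorb the "shared code" case from the edited question: an empty branch (`aa = ,`) falls through to the next non-empty one, so "aa" and "bb" can share a block. This is a minimal runnable sketch; `gaussian_common` and the three-value `type` vector are hypothetical stand-ins for the real filters.

```r
# Stand-in filters; the real image code would go here.
gaussian_common <- function(y) paste(y, "gaussian")   # shared by "aa" and "bb"

myfun <- function(y, type = c("aa", "bb", "cc")) {
  type <- match.arg(type)      # exactly one allowed value; defaults to "aa"
  switch(type,
    aa = ,                     # empty branch falls through to "bb"
    bb = {
      out <- gaussian_common(y)
      if (type == "aa") paste(out, "(aa)") else paste(out, "(bb)")
    },
    cc = paste(y, "cc filter")
  )
}

myfun("img", "bb")   # -> "img gaussian (bb)"
```

match.arg also turns an invalid choice into an informative error instead of a silent NULL from switch.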
Option 2
In your edited question you gave far more specific information. Here is a general design pattern that you might want to consider. The key element in this pattern is that there is not a single if in sight. I replace it with match.fun; the key idea is that the type argument of your function is itself a function (since R supports functional programming, this is allowed):
sharpening <- function(x){
paste(x, "General sharpening", sep=" - ")
}
unsharpMask <- function(x){
y <- sharpening(x)
#... Some specific stuff here...
paste(y, "Unsharp mask", sep=" - ")
}
hiPass <- function(x) {
y <- sharpening(x)
#... Some specific stuff here...
paste(y, "Hipass filter", sep=" - ")
}
generalMethod <- function(x, type){
# type: a filter function such as hiPass or unsharpMask, or its name as a string
match.fun(type)(x)
}
And call it like this:
> generalMethod("stuff", "unsharpMask")
[1] "stuff - General sharpening - Unsharp mask"
> hiPass("mystuff")
[1] "mystuff - General sharpening - Hipass filter"
There is hardly ever a reason not to refactor your code into smaller functions. In this case, besides the reorganisation, there is an extra advantage: an experienced user of your functions can call the relevant sub-function directly if she knows which one she needs.
If these functions have lots of parameters, one solution (to ease maintenance) could be to group them in a list of class "myFunctionParameters", but that depends on your situation.
If code is shared between the different sub_fun_xx functions, move it into another function that each sub_fun_xx calls, or (if that's viable) compute the shared result up front and pass it directly into each sub_fun_xx.
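A minimal sketch of the parameter-bundle idea. The class name comes from the text above; the constructor `myFunctionParameters()` and its fields (`sigma`, `size`, `normalize`) are hypothetical examples, not part of any package.

```r
# Hypothetical constructor: one object of class "myFunctionParameters"
# holds all filter settings, with defaults and validation in one place.
myFunctionParameters <- function(sigma = 1, size = 3L, normalize = TRUE) {
  stopifnot(is.numeric(sigma), sigma > 0, size >= 1, is.logical(normalize))
  structure(list(sigma = sigma, size = as.integer(size), normalize = normalize),
            class = "myFunctionParameters")
}

# Sub-functions take the whole bundle and read only the fields they need.
sub_fun_aa <- function(y, params = myFunctionParameters()) {
  stopifnot(inherits(params, "myFunctionParameters"))
  sprintf("%s: gaussian sigma=%g size=%d", y, params$sigma, params$size)
}

sub_fun_aa("img", myFunctionParameters(sigma = 2))
# -> "img: gaussian sigma=2 size=3"
```

Adding a new setting then touches only the constructor, not the signature of every sub-function.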
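To illustrate both suggestions with the two Gaussian filters from the question, here is a sketch on a 1-D numeric vector standing in for an image. All names are hypothetical: the kernel is computed once and passed in, and "bb" reuses "aa" rather than duplicating its code.

```r
# Shared step, computed once: a normalized 1-D Gaussian kernel.
gaussian_kernel <- function(sigma, radius = ceiling(3 * sigma)) {
  k <- dnorm(-radius:radius, sd = sigma)
  k / sum(k)                                        # weights sum to 1
}

# Both sub-functions receive the precomputed kernel as an argument.
sub_fun_aa <- function(y, kernel) {
  as.numeric(stats::filter(y, kernel, sides = 2))   # plain Gaussian blur
}
sub_fun_bb <- function(y, kernel) {
  blurred <- sub_fun_aa(y, kernel)                  # reuse instead of copy
  y + (y - blurred)                                 # unsharp-style sharpening
}

myfun <- function(y, type = c("aa", "bb"), sigma = 1) {
  type <- match.arg(type)
  kernel <- gaussian_kernel(sigma)                  # shared work done up front
  switch(type,
    aa = sub_fun_aa(y, kernel),
    bb = sub_fun_bb(y, kernel)
  )
}
```

Calling `stats::filter` explicitly avoids a clash with `dplyr::filter` if dplyr is attached; the edges of the result are NA where the kernel does not fit.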
This is a much more general question about program design. There's no definitive answer, but there's almost certainly a better route than what you're currently doing.
Writing functions that handle the different types is a good route to go down. How effective it will be depends on several things. For example, how many different types are there? Are they related at all, e.g. could some of them be handled by the same function, with slightly different behavior depending on the input?
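One way to act on both questions is a dispatch table: a named list mapping each type to a function, where related types share one parameterized implementation. This is a sketch under assumptions; `gaussian_filter`, `filters`, and the widths are hypothetical placeholders for the real filters.

```r
# One implementation for related types, parameterized by what differs.
gaussian_filter <- function(y, width) paste0(y, " | gaussian(width=", width, ")")

# Dispatch table: type name -> function. Adding a type means adding one entry.
filters <- list(
  aa = function(y) gaussian_filter(y, width = 1),
  bb = function(y) gaussian_filter(y, width = 3),
  cc = function(y) paste0(y, " | median")
)

myfun <- function(y, type = names(filters)) {
  type <- match.arg(type)   # must be one of names(filters)
  filters[[type]](y)
}

myfun("img", "bb")   # -> "img | gaussian(width=3)"
```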
You should try to think about your code in a modular way. You have one big task to do overall. Can you break it down into a sequence of smaller tasks, and write functions that perform the smaller tasks? Can you generalize any of those tasks in a way that doesn't make the functions (much) more difficult to write, but does give them wider applicability?
If you give some more detail about what your program is supposed to be achieving, we will be able to help you more.
This is more of a general programming question than an R question. As such, you can follow basic code-quality guidelines. There are tools that read your code, generate code-quality reports, and suggest how to improve it; one example is Gendarme for .NET code. Here is a typical rule that would be flagged in a report on code with overly long methods:
AvoidLongMethodsRule