How to organize big R functions?

I'm writing an R function that is becoming quite big. It accepts a multiple-choice type argument, and I'm organizing it like so:
myfun <- function(y, type = c("aa", "bb", "cc", "dd" ... "zz")){
    if (type == "aa") {
        do something
        - a lot of code here -
        ....
    }
    if (type == "bb") {
        do something
        - a lot of code here -
        ....
    }
    ....
}
I have two questions:
Is there a better way to organize this, so I don't need an if statement for every choice of the type parameter?
Would it be cleaner to write a sub-function for every "type" choice?
If I write sub-functions, it would look like this:
myfun <- function(y, type = c("aa", "bb", "cc", "dd" ... "zz")){
    if (type == "aa") result <- sub_fun_aa(y)
    if (type == "bb") result <- sub_fun_bb(y)
    if (type == "cc") result <- sub_fun_cc(y)
    if (type == "dd") result <- sub_fun_dd(y)
    ....
}
The sub-functions are of course defined elsewhere (at the top of myfun, or in some other way).
I hope my question is clear. Thanks in advance.
- Additional info -
I'm writing a function that applies different filters to an image (different filter = different "type" parameter). Some filters share code (for example, "aa" and "bb" are two Gaussian filters that differ by only one line of code), while others are completely different.
So I'm forced to use a lot of if statements, e.g.:
if(type == "aa" | type == "bb"){
- do something common to aa and bb -
if(type == "aa"){
- do something aa-related -
}
if(type == "bb"){
- do something bb-related -
}
}
if(type == "cc" | type == "dd"){
- do something common to cc and dd -
if(type == "cc"){
- do something cc-related -
}
if(type == "dd"){
- do something dd-related -
}
}
if(type == "zz"){
- do something zz-related -
}
And so on.
Furthermore, there are more if statements inside the "do something" blocks.
I'm looking for the best way to organize my code.

Option 1
One option is to use switch instead of multiple if statements:
myfun <- function(y, type = c("aa", "bb", "cc", "dd" ... "zz")){
    type <- match.arg(type)  # resolves the default vector to a single choice
    switch(type,
           "aa" = sub_fun_aa(y),
           "bb" = sub_fun_bb(y),
           "cc" = sub_fun_cc(y),
           "dd" = sub_fun_dd(y)
    )
}
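If "aa" and "bb" share most of their code, note that switch can express that too: a case with an empty right-hand side falls through to the next one. A minimal sketch (gaussian_filter and other_filter are hypothetical placeholders for your filter code):
myfun <- function(y, type = c("aa", "bb", "cc")){
    type <- match.arg(type)
    switch(type,
           "aa" = ,                          # empty: falls through to "bb"
           "bb" = gaussian_filter(y, type),  # code shared by "aa" and "bb"
           "cc" = other_filter(y)
    )
}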
Option 2
In your edited question you gave far more specific information. Here is a general design pattern that you might want to consider. The key element in this pattern is that there is not a single if in sight. I replace it with match.fun, where the key idea is that the type in your function is itself a function (yes, since R supports functional programming, this is allowed):
sharpening <- function(x){
    paste(x, "General sharpening", sep = " - ")
}

unsharpMask <- function(x){
    y <- sharpening(x)
    #... Some specific stuff here...
    paste(y, "Unsharp mask", sep = " - ")
}

hiPass <- function(x) {
    y <- sharpening(x)
    #... Some specific stuff here...
    paste(y, "Hipass filter", sep = " - ")
}
generalMethod <- function(x, type = c("hiPass", "unsharpMask")){  # add further filter names here
    type <- match.arg(type)
    match.fun(type)(x)
}
And call it like this:
> generalMethod("stuff", "unsharpMask")
[1] "stuff - General sharpening - Unsharp mask"
> hiPass("mystuff")
[1] "mystuff - General sharpening - Hipass filter"

There is hardly ever a reason not to refactor your code into smaller functions. In this case, besides the reorganisation, there is an extra advantage: the educated user of your function(s) can immediately call the subfunction if she knows where she's at.
If these functions have lots of parameters, one solution (to ease maintenance) could be to group the parameters in a list with class "myFunctionParameters", but that depends on your situation.
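For instance, a minimal sketch of that parameter-object idea (sigma and kernel_size are hypothetical filter settings):
params <- structure(
    list(sigma = 2, kernel_size = 5),  # hypothetical filter settings
    class = "myFunctionParameters"
)

sub_fun_aa <- function(y, p) {
    stopifnot(inherits(p, "myFunctionParameters"))
    # ... use p$sigma and p$kernel_size here ...
    y
}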
If code is shared between the different sub_fun_xxs, just move that code into another function that you call from within each sub_fun_xx, or (if that's viable) calculate the shared result up front and pass it directly into each sub_fun_xx, as in the sketch below.
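The "calculate up front" variant could look like this sketch, where common_part() is a hypothetical helper holding the shared code:
myfun <- function(y, type = c("aa", "bb")){
    type <- match.arg(type)
    shared <- common_part(y)  # computed once, used by every sub-function
    switch(type,
           "aa" = sub_fun_aa(y, shared),
           "bb" = sub_fun_bb(y, shared)
    )
}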

This is a much more general question about program design. There's no definitive answer, but there's almost certainly a better route than what you're currently doing.
Writing functions that handle the different types is a good route to go down. How effective it will be depends on several things - for example, how many different types are there? Are they at all related, e.g. could some of them be handled by the same function, with slightly different behavior depending on the input?
You should try to think about your code in a modular way. You have one big task to do overall. Can you break it down into a sequence of smaller tasks, and write functions that perform the smaller tasks? Can you generalize any of those tasks in a way that doesn't make the functions (much) more difficult to write, but does give them wider applicability?
If you give some more detail about what your program is supposed to be achieving, we will be able to help you more.

This is more of a general programming question than an R question. As such, you can follow basic guidelines of code quality. There are tools that can generate code-quality reports from reading your code and give you guidelines on how to improve. One such example is Gendarme for .NET code. Here is a typical guideline that would appear in a report when methods are too long:
AvoidLongMethodsRule

Related

Replace Quotation in List of Lists R

I am trying to get a JSON response from an API (using the httr package):
library(httr)
test <- GET(url, add_headers(`api_key` = key))
content(test, 'parsed')
When I run content(test, 'parsed'), I get the following error:
# Error: lexical error: invalid string in json text. .Note: Final passage of the "fiscal cliff bill" on January 1
I think this is because of the double quotations. How can I either replace the double quotes or if this is not the problem, how can I fix this issue?
Thanks!
So I had run into a similar problem before, and I had intended to write a quick function to use Jeroen's fix to try to repair the JSON. Since I intended to do it anyway, here's a quick hack attempt.
NB: repairing a structured format like this is speculative at best and most certainly prone to errors. The good news is that I tried to keep this specific enough so that it will not produce false results: it'll either fix what it knows it can, or fail. The "unit-testing" really needs to check other corner-cases. If you find something that this does not fix (and should) or that this breaks (gasp!), please comment!
fix_json_quotes <- function(s) {
    if (length(s) != 1) {
        warning("the argument has length > 1 and only the first element will be used")
        s <- s[[1]]
    }
    stopifnot(is.character(s))
    val <- jsonlite::validate(s)
    while (! val) {
        # validate() reports the offset of the first offending character;
        # escape the quote immediately before it and try again
        ind <- attr(val, "offset") - 1
        snew <- gsub("(.*)(['\"])([[:space:],]*)$", "\\1\\\\\\2\\3", substr(s, 1, ind))
        if (snew != substr(s, 1, ind)) {
            s <- paste0(snew, substr(s, ind + 1, nchar(s)))
        } else {
            break  # nothing changed, so give up rather than loop forever
        }
        val <- jsonlite::validate(s)
    }
    if (! val) {
        # still not validating
        stop("unable to fix quotes")
    }
    return(s)
}
Some sample data, unit-testing if you will (testthat is not required for use of the function):
library(testthat)
library(jsonlite)
lst <- list(a="final \"cliff bill\" on")
json <- as.character(toJSON(lst))
json
# [1] "{\"a\":[\"final \\\"cliff bill\\\" on\"]}"
Okay, there should be no change:
expect_equal(json, fix_json_quotes(json))
Some bad data:
# un-escape the double quotes
badlst <- "{\"a\":[\"final \"cliff bill\" on\"]}"
expect_error(jsonlite::fromJSON(badlst))
expect_equal(json, fix_json_quotes(badlst))
PS: this looks specifically for stray quotes, nothing more. However, I believe that there are related errors that this might also be able to fix. I "left room" for this in the second group within the regex ((['\"])), which already matches single quotes as well as double quotes; the character class can be extended further if other characters turn out to cause problems. I don't know if that's useful or even necessary.

(Python)How to process complicated json data

Please excuse my poor English.
Hello everyone. I am working on a project, a Facebook comment spider, and I found the Facebook Graph GUI. It returns a JSON file that is too complicated for me.
The JSON file includes many parts. I use json.loads to parse the JSON, and it finally returns a dict, but I don't know how to access the values.
For example, I want to get all the ids or comments, but I can only get the two top-level keys of the dict, "data" and "paging".
So how can I get to the nested keys, like "id" or "comment", and how should I process this complicated data?
Thank you very much.
There are two ways I can think of: either you know what you're looking for and access it directly, or you loop over the keys, look at the value of each key, and nest another loop until you reach the end of the tree.
You can do this with a self-calling (recursive) function and appropriate use of jQuery.
Here is an example:
function get_the_stuff(url)
{
    $.getJSON(url, function ( data ) {
        my_parser(data);
    });
}

function my_parser(node)
{
    $.each(node, function(key, val) {
        if ( val && typeof val == "object" ) { my_parser(val); }
        else { console.log("key=" + key + ", val=" + val); }
    });
}
I omitted all the error checking. Also make sure the typeof check is appropriate. You might need some other elseif's to maybe treat numbers, strings, null or booleans in different ways. This is just an example.
EDIT: I might have slightly missed that the topic said "Python"... sorry. Feel free to translate it to Python; the same principles apply.
EDIT2: Now let's try it in Python. I'm assuming your JSON is already imported into a variable.
def my_parser(node, depth=0):
    # note: type(node) == "list" compares a type to a string and is always
    # False; isinstance() is the correct check
    if isinstance(node, list):
        for val in node:
            my_parser(val, depth + 1)
    elif isinstance(node, dict):
        for key in node:
            print("level=%i key=%s" % (depth, key))
            my_parser(node[key], depth + 1)
    elif isinstance(node, str):
        print("level=%i value=%s" % (depth, node))
    elif isinstance(node, int):
        print("level=%i value=%i" % (depth, node))
    else:
        print("level=%i value_unknown_type" % depth)

How to compare a CodeFunction.Access value?

I'm generating methods for partial classes using a T4 text template.
First, I look for extra methods implemented in the interface.
After that, I read the access type by calling CodeFunction.Access.
I need to compare the CodeFunction.Access result.
I tried:
if(extraMethod.Access == vsCMAccessPublic)
if(extraMethod.Access == "vsCMAccessPublic")
with no result.
If I output <#= extraMethod.Access #> in the template, I get vsCMAccessPublic.
Answer:
if(extraMethod.Access == EnvDTE.vsCMAccess.vsCMAccessPublic)
or
if(extraMethod.Access == vsCMAccess.vsCMAccessPublic)

Latex or HTML summary output table for vglm regression objects (VGAM)

I'm trying to get a LaTeX or HTML output of the regression results of a VGAM model (in the example below it's a generalized ordinal logit), but the packages I know for this purpose do not work with a vglm object.
Here you can see a little toy example with the error messages I'm getting:
library(VGAM)
n <- 1000
x <- rnorm(n)
y <- ordered( rbinom(n, 3, prob=.5) )
ologit <- vglm(y ~ x,
               family = cumulative(parallel = FALSE, reverse = TRUE),
               model = TRUE)
library(stargazer)
stargazer(ologit)
Error in objects[[i]]$zelig.call : $ operator not defined for this S4 class
library(texreg)
htmlreg(ologit)
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘extract’ for signature ‘"vglm"’
library(memisc)
mtable(ologit)
Error in UseMethod("getSummary") : no applicable method for 'getSummary' applied to an object of class "c('vglm', 'vlm', 'vlmsmall')"
I just had the same problem. My first workaround is to run the ordinal logit regression with the polr function of the MASS package. The resulting objects are easily visualized / summarized by the usual packages (I recommend sjPlot's tab_model function for the table output!).
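A minimal sketch of that workaround, reusing the toy data from the question (note that polr fits a proportional-odds model, so it is not an exact substitute for the parallel = FALSE vglm fit):
library(MASS)
library(sjPlot)
ologit_polr <- polr(y ~ x, Hess = TRUE)  # Hess = TRUE keeps the Hessian for the summary
tab_model(ologit_polr)                   # renders an HTML regression table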
The second option is to craft your own table, which you can then turn into a neat HTML object via stargazer.
For this you need to know that S4 objects are not subsettable in the same manner as conventional objects (http://adv-r.had.co.nz/Subsetting.html). The most straightforward solution is to subset the object, i.e. extract the relevant slots, with an @ instead of a $ symbol:
sumobject <- summaryvglm(yourvglmobject)
stargazer(sumobject@coef3, type = "html", out = "RegDoc.doc")
A little cumbersome but it did the trick for me. Hope this helps!

Convert character to html in R

What's the preferred way in R to convert a character (vector) containing non-ASCII characters to HTML? I would for example like to convert
"ü"
to
"ü"
I am aware that this is possible by a clever use of gsub (but has anyone done it once and for all?), and I thought that the package R2HTML would do it, but it doesn't.
EDIT: Here is what I ended up using; it can obviously be extended by modifying the dictionary:
char2html <- function(x){
    dictionary <- data.frame(
        symbol = c("ä", "ö", "ü", "Ä", "Ö", "Ü", "ß"),
        html = c("&auml;", "&ouml;", "&uuml;", "&Auml;",
                 "&Ouml;", "&Uuml;", "&szlig;"))
    for(i in 1:dim(dictionary)[1]){
        x <- gsub(dictionary$symbol[i], dictionary$html[i], x)
    }
    x
}
x <- c("Buschwindröschen", "Weißdorn")
char2html(x)
This question is pretty old but I couldn't find any straightforward answer... So I came up with this simple function, which uses the numerical HTML codes and works for the Latin-1 Supplement (integer values 161 to 255). There's probably (certainly?) a function in some package that does it more thoroughly, but what follows is probably good enough for many applications...
conv_latinsupp <- function(...) {
    out <- character()
    for (s in list(...)) {
        splitted <- unlist(strsplit(s, ""))
        intvalues <- utf8ToInt(enc2utf8(s))
        pos_to_modify <- which(intvalues >= 161 & intvalues <= 255)
        splitted[pos_to_modify] <- paste0("&#0", intvalues[pos_to_modify], ";")
        out <- c(out, paste0(splitted, collapse = ""))
    }
    out
}
conv_latinsupp("aeiou", "àéïôù12345")
## [1] "aeiou" "àéïôù12345"
The XML package uses a method insertEntities for this, but that method is internal. So you may use it at your own risk, as there are no guarantees that it will continue to operate like this in future versions.
Right now, what your code does could be accomplished using
char2html <- function(x) XML:::insertEntities(x, c("ä"="auml", "ö"="ouml", …))
The use of a named list instead of a data.frame feels kind of elegant, but doesn't change the core of things. Under the hood, insertEntities calls gsub in much the same way your code does.
If numeric HTML entities are valid in your environment, then you could probably convert all your text into those using utf8ToInt and then turn safely printable ASCII characters back into unescaped form. This would save you the trouble of maintaining a dictionary for your entities.
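That could look something like the following sketch (char2html_num is a hypothetical name; a full solution should probably also escape markup characters such as & and <):
char2html_num <- function(s) {
    ints <- utf8ToInt(enc2utf8(s))
    chars <- intToUtf8(ints, multiple = TRUE)  # one character per element
    escape <- ints > 126                       # keep printable ASCII as-is
    chars[escape] <- paste0("&#", ints[escape], ";")
    paste0(chars, collapse = "")
}
char2html_num("Weißdorn")
## [1] "Wei&#223;dorn"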