Latex or HTML summary output table for vglm regression objects (VGAM) - html

I'm trying to get a LaTeX or HTML output of the regression results of a VGAM model (in the example below it's a generalized ordinal logit). But the packages I know for this purpose do not work with a vglm object.
Here you can see a little toy example with the error messages I'm getting:
library(VGAM)
n <- 1000
x <- rnorm(n)
y <- ordered( rbinom(n, 3, prob=.5) )
ologit <- vglm(y ~ x,
family = cumulative(parallel = F , reverse = TRUE),
model=T)
library(stargazer)
stargazer(ologit)
Error in objects[[i]]$zelig.call : $ operator not defined for this S4 class
library(texreg)
htmlreg(ologit)
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘extract’ for signature ‘"vglm"’
library(memisc)
mtable(ologit)
Error in UseMethod("getSummary") : no applicable method for 'getSummary' applied to an object of class "c('vglm', 'vlm', 'vlmsmall')"

I just had the same problem. My first workaround is to run the ordinal logit regression with the polr function of the MASS package. The resulting objects are easily visualized and summarized by the usual packages (I recommend sjPlot's tab_model function for the table output!).
The second option is to craft your own table, which you then turn into a neat HTML object via stargazer.
For this you need to know that S4 objects are not subsettable in the same manner as conventional objects (http://adv-r.had.co.nz/Subsetting.html). The most straightforward solution is to extract the relevant slots with an @ instead of a $ symbol:
sumobject <- summaryvglm(yourvglmobject)
stargazer(sumobject@coef3, type="html", out = "RegDoc.doc")
A little cumbersome but it did the trick for me. Hope this helps!

Getting alignment/attention during translation in OpenNMT-py

Does anyone know how to get the alignment weights when translating in OpenNMT-py? Usually the only output is the resulting sentences, and I have tried to find a debugging flag or similar for the attention weights. So far, I have been unsuccessful.
I'm not sure if this is a new feature, since I did not come across this when looking for alignments a few months back, but onmt seems to have added a flag -report_align to output word alignments along with the translation.
https://opennmt.net/OpenNMT-py/FAQ.html#raw-alignments-from-averaging-transformer-attention-heads
Excerpt from opennmt.net:
Currently, we support producing word alignment while translating for Transformer based models. Using -report_align when calling translate.py will output the inferred alignments in Pharaoh format. Those alignments are computed from an argmax on the average of the attention heads of the second to last decoder layer.
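To make the excerpt concrete, here is a minimal sketch of the idea it describes: average several attention heads, take the argmax over source positions for each target token, and print the result as Pharaoh-style `src-tgt` pairs. This is not OpenNMT-py's own code; the function names `average_heads` and `to_pharaoh` and the toy attention values are made up for illustration, and zero-based `src-tgt` ordering is assumed.

```python
def average_heads(heads):
    """Average a list of attention matrices (each tgt_len x src_len)."""
    n_heads = len(heads)
    tgt_len = len(heads[0])
    src_len = len(heads[0][0])
    return [
        [sum(h[t][s] for h in heads) / n_heads for s in range(src_len)]
        for t in range(tgt_len)
    ]


def to_pharaoh(attn):
    """For each target position, pick the source position with the highest
    averaged weight and emit 'src-tgt' pairs sorted by source index."""
    pairs = []
    for t, row in enumerate(attn):
        s = max(range(len(row)), key=row.__getitem__)
        pairs.append((s, t))
    return " ".join("{}-{}".format(s, t) for s, t in sorted(pairs))


# Two hypothetical heads for a 3-token target over a 3-token source.
heads = [
    [[0.8, 0.1, 0.1], [0.2, 0.2, 0.6], [0.1, 0.7, 0.2]],
    [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6], [0.3, 0.5, 0.2]],
]
alignment = to_pharaoh(average_heads(heads))
print(alignment)
```

With real models the matrices would come from the decoder layer mentioned in the excerpt; the sketch only shows the averaging and argmax step.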
You can get the attention matrices. Note that it is not the same as alignment which is a term from statistical (not neural) machine translation.
There is a thread on GitHub discussing it. Here is a snippet from the discussion. When you get the translations from the model, the attentions are in the attn field.
import onmt
import onmt.io
import onmt.translate
import onmt.ModelConstructor
from collections import namedtuple
# Load the model.
Opt = namedtuple('Opt', ['model', 'data_type', 'reuse_copy_attn', "gpu"])
opt = Opt("PATH_TO_SAVED_MODEL", "text", False, 0)
fields, model, model_opt = onmt.ModelConstructor.load_test_model(
    opt, {"reuse_copy_attn" : False})
# Test data
data = onmt.io.build_dataset(
    fields, "text", "PATH_TO_DATA", None, use_filter_pred=False)
data_iter = onmt.io.OrderedIterator(
    dataset=data, device=0,
    batch_size=1, train=False, sort=False,
    sort_within_batch=True, shuffle=False)
# Translator
translator = onmt.translate.Translator(
    model, fields, beam_size=5, n_best=1,
    global_scorer=None, cuda=True)
builder = onmt.translate.TranslationBuilder(
    data, translator.fields, 1, False, None)
batch = next(data_iter)
batch_data = translator.translate_batch(batch, data)
translations = builder.from_batch(batch_data)
translations[0].attn # <--- here are the attentions

Convert R data.frame to multilevel JSON

I have a periodic process in R that yields me a data.frame.
I want to use this data.frame to create a dropdown selector with AngularJS.
My final data.frame will look more or less as follows (my real example might have a deeper hierarchical structure):
DF<-data.frame(hie1=c(rep("Cl1",2),"Cl2"),hie2=c("Cl1op1","Cl1op2","Clop1"),
hie3=c("/first.html","/second.html","/third.html"))
I need to convert that data.frame into a JSON with the following structure :
{
"Cl1":{"Cl1op1":"/first.html","Cl1op2": "/second.html"},
"Cl2":{"Cl2op1":"/third.html"}
}
So far, I have tried all the toJSON commands of the rjson and RJSONIO packages for the data.frame with and without column names:
library(rjson)
#library(RJSONIO)
DF2<-DF
colnames(DF2)<-NULL
cat(toJSON(DF))
cat(toJSON(DF2))
I thought about using reshape2's dcast function before using toJSON, but I do not know what kind of structure I need to achieve my goal.
I also tried the functions toJSON2 and toJSONArray from the rCharts package, with no success.
Is there an appropriate transformation in R to get the output I am looking for?
P.S. (I do not mind having [] instead of {})
EDIT:
I have created a couple of functions (included below) to fulfil my needs.
However, they are not too clean and I believe that there must be a better way to perform this transformation in R.
I keep this question open expecting a better solution.
linktwo<-function(V){
  paste0(sapply(V,function(x) paste0("'",toString(x),"'")),collapse=":")
}
pastehier<-function(DF){
  if(ncol(DF)==2){
    return(paste0(apply(DF,1,linktwo),collapse=","))
  }else{
    u<-unique(DF[,1])
    output=character()
    for(i in u){
      output<-append(output,paste0(paste0("'",i,"'"),":{",pastehier(DF[DF[,1]==i,-1]),
        "}"))
    }
    return(paste0(output,collapse=","))
  }
}
pastehier(DF)
I do not fully understand your request and maybe my solution is useless, but here is a try:
library(reshape2)
library(jsonlite)  # assumed: the output below looks like jsonlite's toJSON(..., pretty = TRUE)
prova <- dcast(DF, hie1 ~ ... )
toJSON(prova, pretty = TRUE)
[
{
"hie1": "Cl1",
"Cl1op1": "/first.html",
"Cl1op2": "/second.html"
},
{
"hie1": "Cl2",
"Clop1": "/third.html"
}
]
where:
> prova
  hie1      Cl1op1       Cl1op2       Clop1
1  Cl1 /first.html /second.html        <NA>
2  Cl2        <NA>         <NA> /third.html

access leaves of json tree

I have a JSON file of the form:
{"id":442500000116137984, "reply":0, "children":[{"id":442502378957201408, "reply":0, "children":[]}]}
{"id":442500001084612608, "reply":0, "children":[{"id":442500145871990784, "reply":1, "children":[{"id":442500258421952512, "reply":1, "children":[]}]}]}
{"id":442500000258342912, "reply":0, "children":[{"id":442500636668489728, "reply":0, "children":[]}]}
Each line in this file represents a separate tree. Now I want to go to the leaves of every tree and do something, basically:
import json
f = open("file", 'r')
for line in f:
    tree = json.loads(line)
    # somehow walk through the tree and find leaves
    if isLeaf(child):
        print("Reached Leaf")
How do I walk through this tree object to detect all leaves?
This should work.
import json

leafArray = []

def parseTree(obj):
    # A node with no children is a leaf; otherwise recurse into each child.
    if len(obj["children"]) == 0:
        leafArray.append(obj)
    else:
        for child in obj["children"]:
            parseTree(child)

with open("file", 'r') as f:
    for line in f:
        # Each line is a separate tree, so start with an empty leaf list.
        leafArray = []
        tree = json.loads(line.strip())
        parseTree(tree)
        print("")
        for each in leafArray:
            print(each)
You know, I once had to deal with a lot of hypermedia objects out of JSON, so I wrote this library. The problem was that I didn't know the depths of the trees beforehand, so I needed to be able to search around and get what I called the "paths" (the set of keys/indices you would use to reach a leaf) and values.
Anyway, you can mine it for ideas (I wrote it only for Python3.3+, but here's the method inside a class that would do what you want).
The basic idea is that you walk down the tree and check the objects you encounter and if you get more dictionaries (even inside of lists), you keep plunging deeper (I found it easier to write it as a recursive generator mostly by subclassing collections.MutableMapping and creating a class with a custom enumerate).
You keep track of the path you've taken along the way and once you get a value that doesn't merit further exploration (it's not a dict or a list), then you yield your path and the value:
def enumerate(self, path=None):
    """Iterate through the PelicanJson object yielding 1) the full path to
    each value and 2) the value itself at that path.
    """
    if path is None:
        path = []
    for k, v in self.store.items():
        current_path = path[:]
        current_path.append(k)
        if isinstance(v, PelicanJson):
            yield from v.enumerate(path=current_path)
        elif isinstance(v, list):
            for idx, list_item in enumerate(v):
                list_path = current_path[:]
                list_path.append(idx)
                if isinstance(list_item, PelicanJson):
                    yield from list_item.enumerate(path=list_path)
                else:
                    yield list_path, list_item
        else:
            yield current_path, v
Because this is exclusively for Python3, it takes advantage of things like yield from, so it won't work out of the box for you (and I certainly don't mean to offer my solution as the only one). Personally, I just got frustrated with reusing a lot of this logic in various functions, so writing this library saved me a lot of work and I could go back to doing weird things with the Hypermedia APIs I had to deal with.
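For readers who want the same path-and-value idea without the library, here is a minimal standalone sketch that works on plain dicts and lists as returned by json.loads. It is not the library's code: the name `iter_paths` is made up for illustration, it recurses into nested lists as well, and it avoids `yield from` so it also runs on older Python versions.

```python
def iter_paths(node, path=()):
    """Yield (path, value) for every non-container value under node."""
    if isinstance(node, dict):
        for key, value in node.items():
            for item in iter_paths(value, path + (key,)):
                yield item
    elif isinstance(node, list):
        for idx, value in enumerate(node):
            for item in iter_paths(value, path + (idx,)):
                yield item
    else:
        # Neither a dict nor a list: this is a value worth reporting.
        yield path, node


tree = {"id": 1, "reply": 0, "children": [{"id": 2, "reply": 1, "children": []}]}
paths = list(iter_paths(tree))
for path, value in paths:
    print(path, value)
```

Note that empty lists and dicts produce no output in this sketch, so a node with `"children": []` is reported only through its other keys.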
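You can do something like this (I don't know Python syntax well, so treat it as a sketch). Since children is a list, this follows the first child at each level:
temp = tree  # your JSON object for each line
while temp["children"]:
    temp = temp["children"][0]
temp will now be a leaf. To visit every leaf, repeat this for each child rather than only the first.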
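To collect every leaf with a loop rather than recursion, the loop can keep a stack of nodes still to be visited. A minimal sketch, assuming each node is a dict with a "children" list as in the question; the function name `find_leaves` and the sample line are made up for illustration.

```python
import json


def find_leaves(tree):
    """Return every node whose "children" list is empty, using an explicit stack."""
    leaves = []
    stack = [tree]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            # Push in reverse so children are visited left to right.
            stack.extend(reversed(children))
        else:
            leaves.append(node)
    return leaves


line = (
    '{"id": 1, "reply": 0, "children": ['
    '{"id": 2, "reply": 0, "children": []}, '
    '{"id": 3, "reply": 1, "children": [{"id": 4, "reply": 1, "children": []}]}]}'
)
leaf_ids = [leaf["id"] for leaf in find_leaves(json.loads(line))]
print(leaf_ids)
```

An explicit stack also sidesteps Python's recursion limit for very deep trees.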

R: Generic flattening of JSON to data.frame

This question is about a generic mechanism for converting any collection of non-cyclical homogeneous or heterogeneous data structures into a dataframe. This can be particularly useful when dealing with the ingestion of many JSON documents or with a large JSON document that is an array of dictionaries.
There are several SO questions that deal with manipulating deeply nested JSON structures and turning them into dataframes using functionality such as plyr, lapply, etc. All the questions and answers I have found are about specific cases as opposed to offering a general approach for dealing with collections of complex JSON data structures.
In Python and Ruby I've been well-served by implementing a generic data structure flattening utility that uses the path to a leaf node in a data structure as the name of the value at that node in the flattened data structure. For example, the value my_data[['x']][[2]][['y']] would appear as result[['x.2.y']].
If one has a collection of these data structures that may not be entirely homogeneous the key to doing a successful flattening to a dataframe would be to discover the names of all possible dataframe columns, e.g., by taking the union of all keys/names of the values in the individually flattened data structures.
This seems like a common pattern and so I'm wondering whether someone has already built this for R. If not, I'll build it but, given R's unique promise-based data structures, I'd appreciate advice on an implementation approach that minimizes heap thrashing.
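For reference, the kind of utility described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not an existing library: the names `flatten` and `to_table` are made up, list positions are numbered from 1 to match the R-style `x.2.y` example, and empty containers are simply dropped.

```python
def flatten(node, prefix=""):
    """Map each leaf value to a dotted path such as 'x.2.y'."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        # 1-based positions, to match the R-style example in the question.
        items = ((i + 1, v) for i, v in enumerate(node))
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        name = "{}.{}".format(prefix, key) if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


def to_table(records):
    """Flatten each record and align all rows on the union of column names."""
    rows = [flatten(r) for r in records]
    columns = sorted(set().union(*rows))
    # Missing leaves become None, the analogue of NA in a data frame.
    return columns, [[row.get(c) for c in columns] for row in rows]


my_data = {"x": [1, {"y": "e"}, 3]}
flat = flatten(my_data)
columns, table = to_table([{"a": 1, "b": {"c": 2}}, {"a": 3, "d": [4]}])
print(flat)
print(columns, table)
```

The union step is what lets heterogeneous records share one set of columns; an R version would need the same two passes, one to flatten and one to align.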
Hi @Sim, I had cause to reflect on your problem yesterday. Define:
flatten<-function(x) {
  dumnames<-unlist(getnames(x,T))
  dumnames<-gsub("(*.)\\.1","\\1",dumnames)
  repeat {
    x <- do.call(.Primitive("c"), x)
    if(!any(vapply(x, is.list, logical(1)))){
      names(x)<-dumnames
      return(x)
    }
  }
}
getnames<-function(x,recursive){
  nametree <- function(x, parent_name, depth) {
    if (length(x) == 0)
      return(character(0))
    x_names <- names(x)
    if (is.null(x_names)){
      x_names <- seq_along(x)
      x_names <- paste(parent_name, x_names, sep = "")
    }else{
      x_names[x_names==""] <- seq_along(x)[x_names==""]
      x_names <- paste(parent_name, x_names, sep = "")
    }
    if (!is.list(x) || (!recursive && depth >= 1L))
      return(x_names)
    x_names <- paste(x_names, ".", sep = "")
    lapply(seq_len(length(x)), function(i) nametree(x[[i]],
      x_names[i], depth + 1L))
  }
  nametree(x, "", 0L)
}
(getnames is adapted from AnnotationDbi:::make.name.tree)
(flatten is adapted from discussion here How to flatten a list to a list without coercion?)
as a simple example
my_data<-list(x=list(1,list(1,2,y='e'),3))
> my_data[['x']][[2]][['y']]
[1] "e"
> out<-flatten(my_data)
> out
$x.1
[1] 1
$x.2.1
[1] 1
$x.2.2
[1] 2
$x.2.y
[1] "e"
$x.3
[1] 3
> out[['x.2.y']]
[1] "e"
So the result is a flattened list with roughly the naming structure you suggest. Coercion is also avoided, which is a plus.
A more complicated example
library(RJSONIO)
library(RCurl)
json.data<-getURL("http://www.reddit.com/r/leagueoflegends/.json")
dumdata<-fromJSON(json.data)
out<-flatten(dumdata)
UPDATE
naive way to remove trailing .1
my_data<-list(x=list(1,list(1,2,y='e'),3))
gsub("(*.)\\.1","\\1",unlist(getnames(my_data,T)))
> gsub("(*.)\\.1","\\1",unlist(getnames(my_data,T)))
[1] "x.1" "x.2.1" "x.2.2" "x.2.y" "x.3"
R has two packages for dealing with JSON input: rjson and RJSONIO. If I understand correctly what you mean by "collection of non-cyclical homogeneous or heterogeneous data structures", I think either of these packages will import that sort of structure as a list.
You can then flatten that list (into a vector) using the unlist function.
If the list is suitably structured (a non-nested list where each element is the same length), then as.data.frame provides an alternative way to convert the list to a data frame.
An example:
(my_data <- list(x = list('1' = 1, '2' = list(y = 2))))
unlist(my_data)
The jsonlite package is a fork of RJSONIO specifically designed to make conversion between JSON and data frames easier. You don't provide any example json data, but I think this might be what you are looking for. Have a look at this blog post or the vignette.
Great answer with the flatten and getnames functions. Took a few minutes to figure out all the options needed to get from a vector of JSON strings to a data.frame, so I thought I'd record that here. Suppose jsonvec is a vector of JSON strings. The following builds a data.frame (data.table) where there is one row per string, and each column corresponds to a different possible leaf node of the JSON tree. Any string missing a particular leaf node is filled with NA.
library(data.table)
library(jsonlite)
parsed = lapply(jsonvec, fromJSON, simplifyVector=FALSE)
flattened = lapply(parsed, flatten) #using flatten from accepted answer
d = rbindlist(flattened, fill=TRUE)
I'm now a big fan of simply:
library(jsonlite)
library(tidyverse)
fromJSON("file_path.json") %>%
unlist() %>%
enframe()
And then potentially, depending on your data, piping that into
%>%
pivot_wider()
Once it's in a flat table shape, there are a load of tools in tidyverse and other R libraries more generally for wrangling things around and e.g., dealing with columns with similar prefixes (which will result from the above pipeline as the parent name of the children within a nested json chunk will be prefixed to the child's name).

Using \Sexpr{} in LaTeX tabular environment

I am trying to use \Sexpr{} to include values from my R objects in a LaTeX table. I am essentially trying to replicate the summary output of an lm object in R, because xtable's built-in methods xtable.lm and xtable.summary.lm don't seem to include the F-statistic, adjusted R-squared, etc. (all the stuff at the bottom of the summary printout of the lm object in the R console). So I tried to accomplish this by building a matrix to replicate the xtable.summary.lm output and then constructing a data frame of the relevant extra information, so that I can refer to the values using \Sexpr{}. I tried doing this by using add.to.row to append the \multicolumn{} command in order to merge all columns of the last row of the LaTeX table, and then passing all the information I need into that cell of the table.
The problem is that I get an "Undefined control sequence" for the \Sexpr{} expression in the \multicolumn{} expression. Are these two not compatible? If so, what am I doing wrong and if not does anyone know how to do what I am trying to do?
Thanks,
Here is the relevant part of my code:
<<Test, results=tex>>=
model1 <- lm(stndfnl ~ atndrte + frosh + soph)
# Build matrix to replicate xtable.summary.lm output
x <- summary(model1)
colnames <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
rownames <- c("(Intercept)", attr(x$terms, "term.labels"))
fpval <- pf(x$fstatistic[1],x$fstatistic[2], x$fstatistic[3], lower.tail=FALSE)
mat1 <- matrix(coef(x), nrow=length(rownames), ncol=length(colnames), dimnames=list(rownames,colnames))
# Make a data frame for extra information to be called by \Sexpr in last row of table
residse <- x$sigma
degf <- x$df[2]
multr2 <- x$r.squared
adjr2 <- x$adj.r.squared
fstat <- x$fstatistic[1]
fstatdf1 <- x$fstatistic[2]
fstatdf2 <- x$fstatistic[3]
extradat <- data.frame(v1 = round(residse,4), v2 =degf, v3=round(multr2,4), v4=round(adjr2,4),v5=round(fstat,3), v6=fstatdf1, v7=fstatdf2, v8=round(fpval,6))
addtorow<- list()
addtorow$pos <-list()
addtorow$pos[[1]] <- dim(mat1)[1]
addtorow$command <-c('\\hline \\multicolumn{5}{l}{Residual standard error:\\Sexpr{extradat$v1}} \\\\ ')
print(xtable(mat1, caption="Summary Results for Regression in Equation \\eqref{model1} ", label="tab:model1"), add.to.row=addtorow, sanitize.text.function=NULL, caption.placement="top")
You don't need to have Sexpr in your R code; the R code can use the expressions directly. Sexpr is not a LaTeX command, even though it looks like one; it's an Sweave command, so it doesn't work to have it as output from R code.
Try
addtorow$command <-paste('\\hline \\multicolumn{5}{l}{Residual standard error:',
extradat$v1, '} \\\\ ')
Also, no need to completely recreate the matrix used by xtable, you can just build on the default output. Building on what you have above, something like:
mytab <- xtable(model1, caption="Summary Results", label="tab:model1")
addtorow$pos[[1]] <- dim(mytab)[1]
print(mytab, add.to.row=addtorow, sanitize.text.function=NULL,
caption.placement="top")
See http://people.su.se/~lundh/reproduce/sweaveintro.pdf for an example which you might be able to use as is.