I am interested in aggregating/collecting values from analyses in order to write them to a file. The motivation is to keep the analysis process as hands-off as possible, both to avoid typing mistakes and to produce results more efficiently (i.e., not hunting through a plain-text log for a handful of values and then retyping them into a document).
As an example, I would like to run three hierarchical regressions, and save the marginal predicted value of SEX on the outcome variable TOTALSCORE.
I know I could start a log file and save all the output, but I would like to avoid retyping things by hand.
I did find a discussion about a similar topic here, but couldn't figure out how to make it work...
use http://www.stata-press.com/data/r13/depression.dta , clear
foreach v of varlist * {
    rename `v' `=lower("`v'")'
}
****
anova totalscore i.sex
ereturn list , all
return list , all
estat esize
return list, all
margins i.sex, at( (mean) _c (asobserved) _f)
return list , all
matrix list r(b)
anova totalscore i.sex i.race
ereturn list , all
estat esize
margins i.sex, at( (mean) _c (asobserved) _f)
matrix list r(b)
anova totalscore i.sex i.race c.age
ereturn list , all
estat esize
margins i.sex, at( (mean) _c (asobserved) _f)
matrix list r(b)
/*
would ultimately like to produce something like
this and save to a file :
Model     0.sex    1.sex    est_name
model 1   57.237   57.840   anova totalscore i.sex
model 2   57.243   57.825   anova totalscore i.sex i.race
model 3   57.228   57.864   anova totalscore i.sex i.race c.age
*/
You can use the user-written module ESTOUT (run ssc describe estout).
An example:
clear
use http://www.stata-press.com/data/r13/depression.dta
rename _all, lower
local mods `" "i.sex" "i.sex i.race" "i.sex i.race c.age" "'
quietly foreach mod of local mods {
    anova totalscore `mod'
    margins i.sex, at( (mean) _c (asobserved) _f) post
    eststo
}
esttab, noobs not nostar mtitles nonumbers title(Marginal Effects)
eststo clear
(Notice the post option given to the margins command.)
The command allows writing results to a file and customizing the output in many ways, but doing so requires a thorough reading of its documentation.
Another answer is given here: http://www.statalist.org/forums/forum/general-stata-discussion/general/1131792-how-to-collect-aggregate-stata-output-from-multiple-analyses-to-a-file
I'm using the package rdrobust in R and Stata. I planned to implement the analysis fully in R, but ran into a problem with the function rdbwselect. This function computes different bandwidths depending on the selection procedure; by default the procedure is mean squared error (bwselect = "mserd"). However, I'm interested in exploring other procedures and comparing them, so I tried all = TRUE, which is the option that, according to the package, "if specified, rdbwselect reports all available bandwidth selection procedures".
My issue is that, in R, rdbwselect is not showing me the bandwidths: neither with the default, nor with the 'all' option, nor with any other specification.
x <- runif(1000, -1, 1)
y <- 5 + 3*x + 2*(x >= 0) + rnorm(1000)
## With default mserd
rdbwselect(y, x)
## All selection procedures
rdbwselect(y, x, all = TRUE)
The output of both rdbwselect calls is exactly the same, and it should not be. I also tried replicating the script from the rdrobust article in The R Journal (page 49), and I do not get the same output as in the paper.
Nevertheless, the function works as expected in Stata 16:
clear all
set obs 1000
set seed 1234
gen x = runiform(-1,1)
gen y = 5+3*x+2*(x>=0)+rnormal()
rdbwselect y x
rdbwselect y x, all
Could someone give me some guidance on why R is not showing the complete expected output of rdbwselect? Is this an issue related to my version of R? Could it be a bug in the R package or in the function rdbwselect specifically? How can I verify the computation behind rdbwselect?
I appreciate any advice or follow-up questions.
Found the solution. All I needed to do was wrap the call in the summary() function:
summary(rdbwselect(y, x))
or pipe the result into summary() (the %>% pipe requires the magrittr package to be loaded):
rdbwselect(y, x, all = TRUE) %>%
  summary()
I wanted to post this because it is mentioned neither in the package documentation nor in The R Journal article.
I am using MALLET from a Scala project. After training the topic model and obtaining the inferencer file, I tried to assign topics to new texts. The problem is that I get different results with different calling methods. Here are the things I tried:
Creating a new InstanceList, ingesting just one document, and getting the topic results from that InstanceList:
somecontentList.map(text => getTopics(text, model))

def getTopics(text: String, inferencer: TopicInferencer): Array[Double] = {
  val testing = new InstanceList(pipe)
  testing.addThruPipe(new Instance(text, null, "test instance", null))
  inferencer.getSampledDistribution(testing.get(0), iter, 1, burnIn)
}
Putting everything into one InstanceList and predicting the topics together:
val testing = new InstanceList(pipe)
somecontentList.foreach(text =>
  testing.addThruPipe(new Instance(text, null, "test instance", null))
)
(0 until testing.size).map(i =>
  ldaModel.getSampledDistribution(testing.get(i), 100, 1, 50))
These two methods produce very different results except for the first instance. What is the right way of using the inferencer?
Additional information:
I checked the instance data.
0: topic (0)
1: beaten (1)
2: death (2)
3: examples (3)
4: forum (4)
5: wanted (5)
6: contributing (6)
I assume the number in parentheses is the index of the word as used in prediction. When I put all the texts into one InstanceList, the indices are different because the collection contains more text. I am not sure exactly how that information is used in the model's prediction process.
Remember that new instances must be imported with the pipe from the original training data, as recorded in the Inferencer, so that the alphabets match. It's not clear where pipe comes from in the Scala code, but the fact that the first several words have what look like ids starting at 0 suggests that a new alphabet is being built.
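The alphabet problem is easiest to see outside MALLET. Below is a small illustrative Python sketch (not MALLET API; the function and documents are made up) of what goes wrong when each new document builds its own word-to-id mapping instead of reusing the one from training:

```python
def build_vocab(docs):
    """Assign ids in order of first appearance, as a fresh alphabet would."""
    vocab = {}
    for doc in docs:
        for word in doc.split():
            vocab.setdefault(word, len(vocab))
    return vocab

train_docs = ["topic beaten death", "examples forum wanted"]
new_doc = "forum death topic"

shared = build_vocab(train_docs)   # alphabet recorded at training time
fresh = build_vocab([new_doc])     # new alphabet built from the new doc alone

print([shared[w] for w in new_doc.split()])  # ids consistent with the model
print([fresh[w] for w in new_doc.split()])   # [0, 1, 2] -- always starts at 0
```

With a fresh alphabet the inferencer sees id 0 for whatever word happens to come first, so the sampled topic distributions are effectively computed for the wrong words; reusing the training pipe keeps the ids stable.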
I ran into a similar issue, although with the R plugin. We ended up calling the Inferencer for each row/document separately.
Note that there will be some differences between inferences for the same row because of the stochasticity in the sampling, although I agree that the differences should be small.
In the MNIST example included with the Caffe installation: for any given test image, how can I get the softmax scores for each category and do some processing on them, say, compute their mean and variance?
I am a newbie, so details would help me a lot. I am able to train the model and use the testing feature to get predictions, but I am not sure which files need to be edited to get the above results.
You can use the Python interface:
import caffe
net = caffe.Net('/path/to/deploy.prototxt', '/path/to/weights.caffemodel', caffe.TEST)
in_ = read_data(...) # this is up to you to read a sample and convert it to numpy array
out_ = net.forward(data=in_) # assuming your net expects "data" in blob
Now you have the output of your net in the dictionary out_ (the keys are the names of the output blobs). You can run this in a loop over several examples, etc.
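Once you have the probability vector, the processing the question asks about (mean and variance of the softmax scores) is a couple of NumPy calls. A self-contained sketch, with a made-up logits vector standing in for the net's raw output (the values are hypothetical, not taken from Caffe):

```python
import numpy as np

# Hypothetical raw scores (logits) standing in for the net's last layer output
logits = np.array([2.0, 1.0, 0.1, -1.2, 0.5, 0.0, 3.1, -0.3, 1.7, 0.2])

# Softmax: exponentiate (shifted by the max for numerical stability), normalize
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The scores form a probability distribution, so they sum to 1
mean = probs.mean()   # always 1 / n_categories, since the probs sum to 1
var = probs.var()     # spread of the scores across categories

print(mean, var)
```

In the Caffe setting you would apply the same two calls to the probability array pulled out of the forward-pass result instead of this toy vector.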
I can try to answer your question. Assuming that in your deploy net the softmax layer looks like this:
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc6"
  top: "prob"
}
In the Python code that processes your data, combining with the code @Shai provided, you can get the probability of each category by adding, after the forward pass:
predicted_prob = net.blobs['prob'].data
predicted_prob will be an array containing the probabilities for all categories.
For example, if you have only two categories, predicted_prob[0][0] will be the probability that the test sample belongs to one category and predicted_prob[0][1] the probability of the other.
PS:
If you don't want to write any additional Python script: according to https://github.com/BVLC/caffe/tree/master/examples/mnist, this example automatically runs testing every 500 iterations, where "500" is defined in the solver file, e.g. https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_solver.prototxt
So you would need to trace the Caffe source code that processes the solver file. I guess it is https://github.com/BVLC/caffe/blob/master/src/caffe/solver.cpp
I am not sure solver.cpp is the right file to look at, but it contains functions for testing and for computing some values, so I hope it gives you some ideas if no one else can answer your question.
I am working with serialized array fields in one of my models, specifically counting how many members of each array are shared.
By the nature of my project, I have to perform a HUGE number of these overlap counts, so I was wondering whether there is a super quick, clever way to do it.
At the moment I am using the & method, so my code looks like this:
(user1.follower_names & user2.follower_names).count
which works fine... but I was hoping there might be a faster way to do it.
Sets are faster for this.
require 'benchmark'
require 'set'

alphabet = ('a'..'z').to_a
user1_followers = 100.times.map { alphabet.sample(3).join }  # random three-letter "names"
user2_followers = 100.times.map { alphabet.sample(3).join }
user1_followers_set = user1_followers.to_set
user2_followers_set = user2_followers.to_set

n = 1000
Benchmark.bm(7) do |x|
  x.report('arrays') { n.times { (user1_followers & user2_followers).size } }
  x.report('set')    { n.times { (user1_followers_set & user2_followers_set).size } }
end
Output:
user system total real
arrays 0.910000 0.000000 0.910000 ( 0.926098)
set 0.350000 0.000000 0.350000 ( 0.359571)
An alternative to the above is to use the '-' operator on arrays:
user1.follower_names.size - (user1.follower_names - user2.follower_names).size
Essentially this takes the size of the first list and subtracts the size of that list with the shared elements (the intersection) removed. This isn't as fast as using sets, but it is much quicker than using & alone on arrays.
I defined some variables at the beginning of a Mathematica notebook and used them afterwards to calculate my results. Now I want to do the same calculation several times for different values of one variable and use the results somewhere else. It would therefore be useful to define a new function with this variable as a parameter and the content of my notebook as the body, but then I would have to write everything in a single input cell, and there would be no comfortable way to see the intermediate results.
Is there any good way to deal with this kind of situation?
To clarify what I mean, here is a short example.
What I could do is something like this:
In[1] := f[variable_] := (
    calculations;
    many lines of calculations;
    many lines of calculations;
    (* some comments *)
    (* finally the result... *)
    result
)
And afterwards use this function:
In[1] := f[value1] + f[value2]
But if somebody is interested in the intermediate result of line 1 of function f ("calculations"), it's necessary to copy that line somewhere else, and then you can't just remove the semicolon at the end of the line to see that line's result.
Using
lc = Notebook[{
    Cell[BoxData[\(\(Pause[1]\) ;\)], "Input"],
    Cell[BoxData[\(\(Print[\(Date[]\)]\) ;\)], "Input"],
    Cell[BoxData[\(\(Print[\("variable = ", variable\)]\) ;\)], "Input"],
    Cell[BoxData[\(result = \(variable^2\)\)], "Input"]},
  WindowSize -> {701, 810},
  WindowMargins -> {{Automatic, 149}, {Automatic, 35}},
  FrontEndVersion -> "8.0 for Microsoft Windows (64-bit) (October 6, 2011)",
  StyleDefinitions -> "Default.nb"];
Or, if you saved it under longcalc.nb into the same directory as your working notebook, then
lc = Get[FileNameJoin[{SetDirectory[NotebookDirectory[]], "longcalc.nb"}]];
Now, in your working notebook evaluate:
f[var_] := (variable = var;
NotebookEvaluate[NotebookPut[lc], InsertResults -> True]);
f[value1] + f[value2]
will do what you want.
If you do instead
f[variable_] := (
    {calculations,
     many lines of calculations,
     many lines of calculations,
     (* some comments *)
     (* finally the result... *)
     result}
)
then your function will return a list {ir1, ir2, ..., result}, where ir1 etc. are the intermediate results. You could then assign {ir1, ir2, ..., re} = f[value], whereupon re would contain the final result and the ir* the intermediate results.
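This return-the-intermediates pattern is not Mathematica-specific. A minimal Python sketch of the same idea (the function and its steps are made up for illustration):

```python
def f(x):
    # Keep each intermediate so the caller can inspect it alongside the result
    ir1 = x + 1        # first intermediate step
    ir2 = ir1 * 2      # second intermediate step
    result = ir2 ** 2  # final result
    return ir1, ir2, result

ir1, ir2, result = f(3)
print(ir1, ir2, result)  # 4 8 64
```

The caller destructures the returned tuple exactly as the Mathematica answer destructures the returned list.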
Does this work?
You can also define intRes = {}; outside the function and dump values into it from inside the function. Of course this gets tricky if you use parallelization inside your function, or parallelize the whole function.
AppendTo[intRes,ir1];
AppendTo[intRes,ir2];
or
f[variable_] := Block[{output = {}},
    calculations;
    AppendTo[output, ir1];
    many lines of calculations;
    (* some comments *)
    AppendTo[output, ir2];
    (* finally the result... *)
    {output, result}];
and execute it as {intRes, result} = f[var]; intRes will then be a list of the interim results.
If you don't need to retain intermediate results for computation, but just want to see them, there are much more elegant ways to view what's happening.
For slower functions, use Monitor[], Dynamic[], PrintTemporary[], or ProgressIndicator[].
The output of these changes and/or disappears as the function progresses.
If you want a more permanent record (say the function runs really fast), then use Print[] to see intermediate output, unless of course you need to use the intermediate results in the computation.