Problem: Output of plm Package Dummy Variables with an Additional 1

I get strange output when using the plm package to run a pooled OLS regression.
The dummy variables D_1, D_2, D_3 and D_4 are displayed in the output and in the summary with an additional 1 appended to their names.
Can someone tell me where this 1 comes from and how to get rid of it?
Thanks a lot
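Most likely the extra 1 is not coming from plm at all but from R's handling of factors: when a regressor is a factor (or logical), R appends the factor level to the variable name in the coefficient table, so a dummy D_1 with levels 0/1 is labelled D_11 (variable name plus level "1"). A minimal sketch of the effect with a made-up panel (all object names and data below are hypothetical):

library(plm)

# Toy panel; D_1 is stored as a factor with levels 0/1, which is
# exactly what produces the trailing 1 in the coefficient name.
set.seed(1)
dat <- data.frame(id   = rep(1:10, each = 4),
                  year = rep(2001:2004, times = 10),
                  y    = rnorm(40),
                  D_1  = factor(sample(0:1, 40, replace = TRUE)))

m <- plm(y ~ D_1, data = dat, index = c("id", "year"), model = "pooling")
coef(m)   # labelled "D_11": variable name "D_1" + level "1"

# Storing the dummy as a numeric 0/1 column keeps the plain name:
dat$D_1 <- as.numeric(as.character(dat$D_1))
m2 <- plm(y ~ D_1, data = dat, index = c("id", "year"), model = "pooling")
coef(m2)  # labelled "D_1"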

Related

Predicting a single point of data from a Stata non-linear model

I have a hazard model on a Weibull distribution, fitted with the Stata command streg with the options nohr and time appended to that line of code. At least, that's the code from the do file I downloaded from a replication archive.
If I have a new sliver of data, how do I compute the model's prediction for that specific sliver? I would solve it by hand in Excel (my wheelhouse is R or Python), but the closed form of the regression eludes me. I'm not sure from the command's documentation exactly how the other regressors enter, and the Weibull regression has a lot of parameters that I'd rather not chug through manually. I'm hoping someone can help with what I believe is a simple out-of-sample forecast in a language I simply do not use.
infile warnum frstyear lastyear ccode1 ccode2 length logleng censor oadm oada oadp omdm omda omdp opdm opda opdp durscale rterrain rterrstr summperb sumpopbg popratbg bofadjbg qualratb salscale reprsumb demosumb surpdiff nactors adis3010 using 1perwarf.raw
stset length, id(warnum) fail(censor)
streg oadm oada oadp opda rterrain rterrstr bofadjbg summperb sumpopbg popratbg qualratb surpdiff salscale reprsumb demosumb adis3010 nactors, dist(weibull) nohr time
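For what it's worth, the prediction can be done by hand once the parameterization is pinned down. With the time option, streg reports Weibull coefficients in the accelerated failure time metric, so the linear predictor xb scales time directly and the ancillary parameter p is the Weibull shape. A sketch in R under that parameterization (all coefficient values below are placeholders, not the replication file's estimates; substitute the full covariate list from your streg output):

# Out-of-sample prediction from a Weibull AFT fit (streg ..., time).
# b0, b, and p are placeholders -- read them off the streg output
# (p is the ancillary shape parameter, reported separately).
b0 <- 1.2                              # _cons
b  <- c(oadm = 0.30, rterrain = -0.10) # slope coefficients (subset shown)
p  <- 1.5                              # Weibull shape parameter

x  <- c(oadm = 1, rterrain = 2.7)      # the new sliver of data
xb <- b0 + sum(b * x)                  # linear predictor

# In the AFT metric, S(t | x) = exp(-(t / exp(xb))^p), which gives:
exp(xb) * log(2)^(1 / p)               # predicted median survival time
exp(xb) * gamma(1 + 1 / p)             # predicted mean survival time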

NLTK .most_common(): what order is it returned in?

I have found the frequency of bigrams in certain sentences using:
import nltk
from nltk import ngrams

mydata = "xxxxx"                     # placeholder for the real text
mylist = mydata.split()
mybigrams = list(ngrams(mylist, 2))  # all adjacent word pairs
fd = nltk.FreqDist(mybigrams)
print(fd.most_common())
On printing out the bigrams with the most common frequencies, one occurs 7 times whereas all 95 other bigrams occur only once. However, when comparing the bigrams to my sentences, I can see no logical order to the way the bigrams of frequency 1 are printed. Does anyone know if there is any logic to the way .most_common() orders the bigrams, or is it random?
Thanks in advance
Short answer, based on the documentation of collections.Counter.most_common:
Elements with equal counts are ordered arbitrarily:
In current versions of NLTK, nltk.FreqDist is based on nltk.compat.Counter. On Python 2.7 and 3.x, collections.Counter will be imported from the standard library. On Python 2.6, NLTK provides its own implementation.
For details, look at the source code:
https://github.com/nltk/nltk/blob/develop/nltk/compat.py
In conclusion, without checking all possible version configurations, you cannot expect bigrams with equal frequency to come out in any particular order.

lme4 glmm model convergence issue

I am trying to use the lme4 package for a GLMM and am getting a convergence code of 0 and a statement: Model failed to converge with max|grad| = 0.00791467 (tol = 0.001, component 1). I am interested in using the lme4 package because I would like to have AIC values to determine the appropriate model as I add additional covariates.
Two weeks ago, when I tried the same approach, I got a warning message that the model failed to converge because of the max|grad| issue, but I am not getting the warning message this time, just the statement at the end of the summary output.
Does this mean that the model is not converging? I also used the glmmPQL method; the coefficient estimates are similar between the two model types.
Here is the glmer (lme4) model code. I increased maxfun to deal with other issues I had when I ran the model last time.
l1 <- glmer(Meat_Weight ~ logsh + SAMS_region_2015 + (1 | StationID),
            family = Gamma(link = "log"), data = datad,
            control = glmerControl(optCtrl = list(maxfun = 100000)))
Here is the glmmPQL code.
m1 <- glmmPQL(fixed = Meat_Weight ~ logsh + SAMS_region_2015,
              random = ~ 1 | StationID,
              family = Gamma(link = "log"), data = datad)
I am sure this is not enough information to diagnose the problem, but if anyone has suggestions I can provide more data.
Thanks
Try changing the optimizer:
l1 <- glmer(Meat_Weight ~ logsh + SAMS_region_2015 + (1 | StationID),
            family = Gamma(link = "log"), data = datad,
            control = glmerControl(optimizer = "bobyqa"))

How to connect SentiWordNet to RapidMiner?

SentiWordNet is a text file. In RapidMiner, 'OpenWordNet Dictionary' can only be used to access exe files. How can I extract the sentiment scores from SentiWordNet for further processing?
Thanks in advance.
Of course you can: with a little bit of code you can extract the SentiWordNet scores from the text file.
The catch is that the same word can have several different meanings, each with its own entry.
To handle this you can simply take the average score per word, or do word-sense disambiguation, as sketched below.
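A minimal sketch of the averaging approach in R, assuming the tab-separated SentiWordNet 3.0 layout (POS, ID, PosScore, NegScore, SynsetTerms, Gloss; lines starting with # are comments) and a placeholder file name. The resulting per-word table can then be fed to RapidMiner as an ordinary dataset:

# Read the raw SentiWordNet text file (column layout assumed, see above).
swn <- read.delim("SentiWordNet_3.0.0.txt", header = FALSE, comment.char = "#",
                  col.names = c("POS", "ID", "PosScore", "NegScore",
                                "SynsetTerms", "Gloss"),
                  quote = "", stringsAsFactors = FALSE)

# Each synset lists terms such as "able#1 capable#3"; unpack them.
terms <- strsplit(swn$SynsetTerms, " ", fixed = TRUE)
word  <- sub("#\\d+$", "", unlist(terms))   # strip the sense number
pos   <- rep(swn$PosScore, lengths(terms))
neg   <- rep(swn$NegScore, lengths(terms))

# Average net sentiment (positive minus negative) over all senses of a word.
scores <- tapply(pos - neg, word, mean)
head(sort(scores, decreasing = TRUE))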

Reporting Services 2008R2 throws "native compiler error [BC30494] Line is too long."

An unexpected error occurred while compiling expressions. Native compiler return value: '[BC30494] Line is too long.'.
When RS throws this error, the typical scenario appears to be that there are too many text boxes in a specific data region, and the only known remedy seems to be to 'minify' text box names (i.e., rename TextBox345 to T345).
My report is not that large (<100 text boxes), but I make extensive use of the Lookup() function to set many of the text box style properties from a styles dataset (>2500 Lookup() calls).
So my guess is that the VB code-behind generated for the Lookup() calls is quite verbose and therefore breaks the 64K limit for a generated VB code block per data region.
Can I test my hypothesis? I.e., is there a way I can inspect the generated VB code?
Any suggestions as to how to fix or dodge this problem? Needless to say, using abbreviated names in my case didn't cut it.
Quite a delayed response, but for the sake of posterity:
The .vb source file of the generated code exists on disk temporarily in the directory C:\Users\{RS Service Account Name}\AppData\Local\Temp. As you mentioned, if any line goes past 65,535 characters it will fail compilation due to a VB limitation.
This issue was fixed in Reporting Services 2012 SP2 CU5; the KB article is located here. Unfortunately, mainstream support for SQL 2008 R2 has ended, so the fix is unlikely to be backported. As for workarounds:
Shorten the textbox names to the absolute minimum possible (e.g., use all possible single character names, then all possible two character names)
Try using subreports to split up the report
Try to rework your dataset query so that you can reduce the Lookup() calls