Stata: Dummy variable as instrument in xtivreg or xtivreg2?

I am trying to estimate the price elasticity of electricity demand from household-level panel data (a time series of household observations). To handle the endogeneity of price, I am running a panel IV regression.
Four treatment groups and one control group were randomly assigned, and assignment places each household in a particular pricing program, so treatment status should be highly correlated with the endogenous variable (log_price). Because assignment is random, treatment status should be uncorrelated with the unobserved determinants of log_usage (the exclusion restriction). I also include weather variables and a vector of household variables when I am not using fixed effects via xtivreg2. Treatment is a dummy variable for any treatment group vs. control. I have tried the following regressions, but each gives me an error:
(1) xtivreg:
xtset ID datetime
xtivreg log_usage weather (log_price = i.treatment) household_vars
/// Error: Matsize too small
--> Here I am told that my matsize is too small, although I have set it to 800. I have a very large data set so maybe this is impossible?
(2) xtivreg2
xtset ID datetime
xtivreg2 log_usage weather (log_price = i.treatment), fe
///Error: Factor variables not allowed
(3) I would actually prefer to distinguish between the individual treatment groups (1-4), so I created a dummy for each group, with the control group as the omitted base category:
xtset ID datetime
xtivreg log_usage weather (log_price = i.group1 i.group2 i.group3 i.group4)
///Error: matsize too small
How do I run xtivreg or xtivreg2 with binary variables as an instrument (or instruments)? And how do I distinguish between any-treatment-vs.-control and the individual treatment groups (1-4)?

Employing a large discrete observation space in OpenAI Gym

I am creating a custom environment in OpenAI Gym, and I'm having some trouble navigating the observation space.
Every timestep, the agent is given two potential students to accept or deny admission; these are randomized and are part of the observation space. Because the reward depends on which students are currently enrolled (those we have accepted in the past), we need to keep track of who has been accepted within the state space (there is a limited number of spots available). Each student has a 'major' (1-15) and a 'minor' (1-5) which, in the simulator I built, carry weights that affect the reward, so they must be included in the state space. After a number of timesteps (which varies with the major/minor combination), students graduate and are removed from the list of enrolled students (and from being represented in the state space).
Thus, I currently have something like:
from gym import spaces

obs_spaces = {
    'potential_student_I': spaces.Tuple((spaces.Discrete(15), spaces.Discrete(5))),
    'potential_student_II': spaces.Tuple((spaces.Discrete(15), spaces.Discrete(5))),
    'enrolled_student_I': spaces.Tuple((spaces.Discrete(16), spaces.Discrete(6))),
    'enrolled_student_II': spaces.Tuple((spaces.Discrete(16), spaces.Discrete(6))),
    'enrolled_student_III': spaces.Tuple((spaces.Discrete(16), spaces.Discrete(6))),
}
self.observation_space = spaces.Dict(obs_spaces)
In the above code, there is only room for three accepted (enrolled) students to be represented. These use spaces.Tuple((spaces.Discrete(16), spaces.Discrete(6))) rather than spaces.Tuple((spaces.Discrete(15), spaces.Discrete(5))) because the list does not necessarily need to be filled, so each field gets an extra 'NULL' option.
Is there a better way to do this? I thought about maybe using one-hot encoding or something similar. Ideally this environment could have up to 50 enrolled students, which obviously is not efficient if I continue representing the observation space the way I currently am. I plan on using a neural net because of the large state space, but I'm caught up on how to efficiently represent the observation space.
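One possibility, sketched below and untested, is to flatten the roster into a fixed-size MultiDiscrete space: each enrolled slot is a (major, minor) pair with value 0 reserved for 'empty', so the observation keeps the same shape however many places are filled. The 50-student capacity and the dictionary keys here are assumptions for illustration.

import numpy as np
from gym import spaces

MAX_ENROLLED = 50  # assumed capacity; adjust to the real limit

# Two candidate students (no NULL needed) plus a padded roster where 0 means "empty",
# so majors take 16 values (NULL + 15) and minors take 6 (NULL + 5).
observation_space = spaces.Dict({
    'potential_students': spaces.MultiDiscrete([15, 5] * 2),
    'enrolled_students': spaces.MultiDiscrete([16, 6] * MAX_ENROLLED),
})

# One-hot encode a MultiDiscrete sample before feeding it to a neural network.
def one_hot(sample, nvec):
    return np.concatenate([np.eye(n)[v] for v, n in zip(sample, nvec)])

obs = observation_space.sample()
enrolled_vec = one_hot(obs['enrolled_students'], observation_space.spaces['enrolled_students'].nvec)

The padding keeps the space a fixed size, and the one-hot expansion gives the network an input whose length does not depend on how many students happen to be enrolled.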

Write a query to check that all the routes in the circuit called "Beginner" are compatible

Slot(sid, wall, x, y)
Hold(hid, color, desc)
Route(rid, name, circuit)
Placement(rid, hid, sid)
Slot represents the possible locations for a hold. sid is a surrogate key, the wall is the name of the wall (e.g., "north," "front"), (x,y) is the location on the wall, measured in meters.
Hold manages the inventory of shaped resin pieces that simulate outcroppings on which to step or grab.
Route is a set of holds attached to particular slots. name is a descriptive text string. circuit is a label indicating that this route is part of a set of related routes.
sid, hid, rid are integers.
Question: A conflict is when two holds occupy the same slot. A set of routes is compatible if they have no conflicts. Write a query to check that all the routes in the circuit called Beginner are compatible. Your query should return the sid that is causing the conflict.
A conflict means two different holds (hids) are placed in the same slot (sid), so group the placements belonging to the Beginner routes by slot and flag any slot that carries more than one distinct hold. The query returns the sid(s) causing a conflict; if it returns no rows, the circuit is compatible.
SELECT sid
FROM Placement JOIN Route ON Route.rid = Placement.rid
WHERE Route.circuit = 'Beginner'
GROUP BY sid
HAVING COUNT(DISTINCT hid) > 1

Warning appears in mixed-effects model using SPSS

I am using SPSS to fit a mixed-effects model for the following project:
Participants are asked several open-ended questions and their answers are recorded.
For example, if a participant's answer relates to equality, the variable "equality" is coded as "1"; otherwise it is coded as "0". The dependent variable is therefore the binary variable "equality".
Fixed effects: 
- participant's country (Asians vs. Westerners)
- gender (Male vs Female)
- age group (younger age group vs. older age group)
- condition (control group vs. intervention group)
Random effect: Subject ID (participants)
Sample size: over 600 participants
My syntax in SPSS:
MIXED  Equality BY Country Gender AgeGroup Condition
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED= Country Gender AgeGroup Condition  | SSTYPE(3)
/METHOD=ML
/PRINT=SOLUTION TESTCOV
/RANDOM=INTERCEPT | SUBJECT(SubID_R) COVTYPE(VC).
When running this analysis in SPSS, the following warning appears:
Iteration was terminated but convergence has not been achieved.
The MIXED procedure continues despite this warning. Subsequent results produced are based on the last iteration. Validity of the model fit is uncertain.    
I tried increasing MXSTEP from 10 to 10000 in the syntax, but then another warning appears:
The final Hessian matrix is not positive definite although all
convergence criteria are satisfied.
The MIXED procedure continues despite this warning. Validity of subsequent results cannot be ascertained.  
I have also tried increasing MXITER, but the warning remains. How can I deal with this problem and get rid of the warning?
Aside from what you've already tried, in some cases increasing the number of Fisher scoring steps can be helpful, but it may be the case that your random intercept variance is truly redundant and you won't be able to resolve this problem with those data and that model.
Also, typically you would not use a linear model for a binary response variable, but would use something like a logistic model (this can be done in GENLINMIXED, under Analyze>Mixed Models>Generalized Linear in the menus).

Small sample size for a regression marketing model

I have sales, advertising spend, and price data for 10 brands in the same industry from 2013-2018. I want to develop an equation to predict 2019 sales.
The variables I have (price and ad spend by type) are: PricePerUnit, Magazine, News, Outdoor, Broadcasting, Print.
My confusion is that I am not sure whether to run the regression using only 2018 data, with 2018 sales as the target variable, adding a variable like Past_2Years_Sales (2016-17) to the price and ad-spend variables above (for clarity, refer to the image of the data). With this approach I would have a sample size of only 10, since there are only 10 brands, which I think is too small for linear regression to give reliable results.
The second option (which would increase the sample size) is, instead of treating each brand as an observation, to treat each brand-year as an observation, which increases the sample size to 60: e.g., Brand A has six observations (A-2013, A-2014, ..., A-2018), Brand B has B-2013, B-2014, ..., B-2018, and so on for all 10 brands (refer to the image for the data).
Is the second option a valid way to run the regression? What is the right way to run a regression in such small-sample situations?
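To make the second option concrete, here is a minimal sketch in Python with pandas and statsmodels; the file name and column names are assumptions, the brand dummies absorb stable brand-level differences, and clustering the standard errors by brand acknowledges that a brand's six yearly observations are not independent.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format file: one row per brand-year (10 brands x 6 years = 60 rows),
# with assumed columns Brand, Year, Sales, PricePerUnit, Magazine, News, Outdoor, Broadcasting, Print.
df = pd.read_csv("brand_year_sales.csv")

# Pooled OLS across brand-years with brand dummies and a common year trend;
# standard errors are clustered by brand.
model = smf.ols(
    "Sales ~ PricePerUnit + Magazine + News + Outdoor + Broadcasting + Print + C(Brand) + Year",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["Brand"]})
print(model.summary())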

Q-learning algorithm

Good afternoon. I used Q-learning to model the following problem: a set of agents has access to two access points (APs) for uploading data. S = {1, 2} is the set of states, referring to being connected to AP1 or AP2, and A = {remain, change} is the action set. We assume that, over the whole duration of the simulation, the agents can reach both APs. The goal is to upload as much data as possible during the simulation. The reward is a time-dependent function defined as R(t) = alpha*T + b, where T is the length of the time interval and b varies over time.
In this situation, is it correct to define the terminal condition as the convergence of the Q-tables to within a pre-defined tolerance? And how can the exploitation phase be expressed, given that there is no step defined as a final goal?
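As a concrete illustration (with placeholder dynamics, not your exact simulator), the sketch below trains a tabular Q-learner with epsilon-greedy exploration, stops learning once the largest Q-value update over a recent window falls below a tolerance, which is one way to turn "the Q-table has converged" into a terminal condition for a continuing task, and then expresses the exploitation phase as simply acting greedily with respect to the learned table. The transition rule, the fixed interval length T, and the randomly drawn b are assumptions.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2              # states: connected to AP1 or AP2; actions: remain, change
gamma, epsilon = 0.9, 0.1               # discount factor and exploration rate
alpha_r, T = 1.0, 1.0                   # reward parameters in R = alpha*T + b (assumed values)
Q = np.zeros((n_states, n_actions))

state, deltas = 0, []
for t in range(200_000):
    lr = 1.0 / (1.0 + t / 10.0)         # decaying learning rate so update sizes shrink over time
    # epsilon-greedy action selection during learning
    action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
    next_state = state if action == 0 else 1 - state   # action 1 ("change") switches AP (assumption)
    b = rng.normal(0.0, 0.1)                            # placeholder for the time-varying term b
    reward = alpha_r * T + b
    # Q-learning update; record the size of the change to monitor convergence
    td_error = reward + gamma * Q[next_state].max() - Q[state, action]
    Q[state, action] += lr * td_error
    deltas.append(abs(lr * td_error))
    state = next_state
    # terminal condition for a continuing task: the table has effectively stopped changing
    if t > 1000 and max(deltas[-1000:]) < 1e-4:
        break

# Exploitation phase: follow the greedy policy derived from the learned table.
greedy_policy = Q.argmax(axis=1)
print(Q)
print(greedy_policy)

If the reward really drifts over time (b changing its distribution), a fixed stopping point is less meaningful, and you may instead keep a small epsilon throughout and evaluate the greedy policy periodically.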
Thank you in advance for your help.