Region fixed effects in OLS - regression

What is the Stata code for adding region fixed effects to an ordinary least squares regression? My dependent variable is the sales volume of a product, and the independent variable is a dummy: 1 for a red pamphlet, 0 for a blue pamphlet, distributed to a sample of people across five districts. I want to include region fixed effects in the model. I tried generating dummy variables for the five regions and adding them to the model.
Is this approach correct? If not, which one is?
reg sale pamph income plotsize region1 region2 region3 region4 region5

There are a number of ways to control for group fixed effects.
The simplest (IMO) in your situation is to use a factor variable.
For example:
webuse nlswork
reg ln_w grade age i.ind_code
In your case this would look like:
reg sale pamph income plotsize i.region
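This assumes that region is a single variable holding a unique ID for each region; Stata then creates the dummies itself and omits one region as the base category.
Other options are areg (see help areg) or the user-written reghdfe (installable with ssc install reghdfe):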
areg ln_w grade age, absorb(ind_code)
reghdfe ln_w grade age, absorb(ind_code)
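To see what the group fixed effects do numerically, here is a minimal sketch in Python (not Stata) for the one-regressor case. It relies on the standard result that including a full set of group dummies gives the same slope as regressing the group-demeaned outcome on the group-demeaned regressor. The function name within_slope and the toy data are made up for illustration.

```python
from collections import defaultdict

def within_slope(y, x, group):
    """Slope of y on x after removing group means (one-regressor fixed effects)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum_y, sum_x, count]
    for yi, xi, g in zip(y, x, group):
        s = sums[g]
        s[0] += yi
        s[1] += xi
        s[2] += 1
    num = 0.0
    den = 0.0
    for yi, xi, g in zip(y, x, group):
        sy, sx, n = sums[g]
        dy = yi - sy / n  # deviation of y from its region mean
        dx = xi - sx / n  # deviation of x from its region mean
        num += dx * dy
        den += dx * dx
    return num / den

# Toy data (assumed): sale = 2 * pamph + a region-specific level
region = [1, 1, 2, 2, 3, 3]
pamph = [0, 1, 0, 1, 0, 1]
effect = {1: 10.0, 2: 20.0, 3: 5.0}
sale = [2.0 * p + effect[r] for p, r in zip(pamph, region)]

slope = within_slope(sale, pamph, region)  # 2.0: the region levels drop out
```

The region-specific levels differ a lot, but the estimated pamphlet effect is unaffected by them, which is the point of the fixed effects.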

Related

How to get multiple coefficients from a panel data set

I have a data set with the following variables: ID of an individual, current year, year of graduation, degree, income, and a 0/1 variable to indicate treatment. The income is in the same year as the variable year.
What I want is to regress current income on treatment separately for every combination of year, year of graduation, and degree.
That means running many separate regressions, each giving its own coefficient.
I have no idea how to do this. I would normally just use:
reg income treatment
But this will not give me multiple coefficients.
Try something like this:
sysuse auto
statsby, by(foreign rep78): regress mpg weight
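statsby runs the regression once per combination of the by-variables and collects the coefficients. The same idea can be sketched in plain Python for a single regressor; this is only an illustration, and the function name slopes_by_group and the sample rows are assumptions, not part of Stata or of the question's data.

```python
from collections import defaultdict

def slopes_by_group(rows, keys, yvar, xvar):
    """Return {group key tuple: OLS slope of yvar on xvar} for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append((row[xvar], row[yvar]))
    result = {}
    for key, pts in groups.items():
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        if sxx == 0:
            continue  # no variation in x within the group: slope not identified
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        result[key] = sxy / sxx
    return result

# Hypothetical rows using the variables described in the question
rows = [
    {"year": 2001, "degree": "BA", "treatment": 0, "income": 100.0},
    {"year": 2001, "degree": "BA", "treatment": 1, "income": 130.0},
    {"year": 2001, "degree": "MA", "treatment": 0, "income": 150.0},
    {"year": 2001, "degree": "MA", "treatment": 1, "income": 160.0},
]
coefs = slopes_by_group(rows, ["year", "degree"], "income", "treatment")
# one coefficient per (year, degree) cell
```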

Stata: Dummy variable as instrument in xtivreg or xtivreg2?

I am trying to estimate the price elasticity of electricity demand for time series data. In order to handle the endogeneity of price, I am running a panel IV regression.
Four treatment groups and one control group were randomly assigned, and the assignment places each household in a particular pricing program, so treatment status should be highly correlated with the endogenous variable (log_price). However, because assignment was random, treatment status should not be correlated with the other determinants of log_usage. I also include weather variables and a vector of household variables when not using fixed effects/xtivreg2. Treatment is a dummy variable for any treatment group vs. control. I have tried two regressions, but both give me errors:
(1) xtivreg:
xtset ID datetime
xtivreg log_usage weather (log_price = i.treatment) household_vars
// Error: matsize too small
--> Here I am told that my matsize is too small, although I have set it to 800. I have a very large data set so maybe this is impossible?
(2) xtivreg2
xtset ID datetime
xtivreg2 log_usage weather (log_price = i.treatment), fe
// Error: factor variables not allowed
(3) I would actually prefer to somehow differentiate between the different treatment groups (1-4) so I created dummies for each of the groups. The control group is the omitted base group:
xtset ID datetime
xtivreg log_usage weather (log_price = i.group1 i.group2 i.group3 i.group4)
// Error: matsize too small
How do I run xtivreg or xtivreg2 with binary variables as an instrument (or instruments)? How do I distinguish between any form of treatment vs. control and the individual treatment groups (1-4)?

How to get rid of arrays in mysql database?

A dining room specializes in complex dinners. It has a collection of recipes (each of which records the amounts of the products it uses). Every product has a price that can change.
Is it the best design?
Recipe(r_id, r_title, r_category, r_price)
Product(p_id, p_title, p_price)
UsingProducts(r_id, p_id, amount)
I am just not sure about UsingProducts..
The design looks quite okay.
As zerkms mentioned, you're lacking units. That doesn't have to be a problem, as your product can be "100 g flour" so the unit is implicit. However, when printing the recipe, you would print "5 x 100 g flour" instead of "500 g flour". It would also print "10 x 100 g flour" instead of "1 kilo flour".
Just think about whether this is an issue for you and whether you even need unit conversion such as 1000 g = 1 kilo.
Another point is your category. As designed, a recipe can belong to only one category. So you won't have categories like "vegetarian" and "soups", with the problem of where to place a vegetarian soup, but will use distinct categories instead. Okay. However, don't you want a table for them, so that you can easily select them? If you want to stay with this design, you should at least make the category an ENUM column (a MySQL-specific type), so you don't mistakenly end up with some recipes in "soups" and others in "suops".
Lastly: what is r_price for? Shouldn't that be the sum of all sub-prices (product price x amount)? Don't store data redundantly; otherwise inconsistencies can occur (e.g. two products at $10 each in a recipe whose stored price is $30). Remove r_price from the Recipe table to get a normalized database.
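A minimal sketch of deriving the recipe price instead of storing it. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs on its own; the column types and sample rows are assumptions, and the SELECT itself should carry over to MySQL with little change.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Recipe (r_id INTEGER PRIMARY KEY, r_title TEXT, r_category TEXT);
CREATE TABLE Product (p_id INTEGER PRIMARY KEY, p_title TEXT, p_price REAL);
CREATE TABLE UsingProducts (
    r_id INTEGER REFERENCES Recipe(r_id),
    p_id INTEGER REFERENCES Product(p_id),
    amount REAL,
    PRIMARY KEY (r_id, p_id)
);
INSERT INTO Recipe VALUES (1, 'Pancakes', 'dessert');
INSERT INTO Product VALUES (1, '100 g flour', 0.5), (2, 'egg', 0.25);
INSERT INTO UsingProducts VALUES (1, 1, 5), (1, 2, 2);
""")

# The recipe price is computed on demand from current product prices,
# so there is no stored r_price that could drift out of sync.
price = con.execute("""
    SELECT SUM(p.p_price * u.amount)
    FROM UsingProducts u JOIN Product p ON p.p_id = u.p_id
    WHERE u.r_id = ?
""", (1,)).fetchone()[0]
```

If a product price changes, the next query reflects it automatically, which is what "every product has a changeable price" calls for.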

MySQL what would the best approach to ranking highest to lowest possible match?

I have a MySQL database that I'm searching through. Let's say it is a database of people. When querying for a specific record, it is possible to find a 100% match on every attribute. But the strategy I'm after is to query the database for the closest matches, ranked by how many table attributes match.
In this scenario, does it make sense to create a temporary table (much like a tally sheet) to record which attributes match and which attributes are present? What is the typical approach to advanced searches on a database like this?
Below is an example of a hypothetical stored procedure.
The parameters are only there to illustrate how I would search. I'm not concerned with how to perform my selects. The question is about approach, strategy, and technique.
call FindPerson ("Brown Eyes", "Brown hair", "Height:6'1", "white", "Name:Joe", "weight180", "Age 34", "sex m");
RESULT TABLE
NAME   AGE  HEIGHT  WEIGHT  HAIR   SKIN   SEX  RANK_MATCH
Joe    32   6'1     180     Brown  white  m    1
Mike   33   6'1     179     Brown  white  m    2
James  31   6'0     179     Brown  black  m    3
Off the top of my head: you can create your own score and sort by it. Something like
SELECT `id`,
(IF(`age`=32,1,0)+IF(`height`="6'1",1,0)+...) as `score`
FROM `people`
HAVING `score` > 0
ORDER BY `score` DESC
LIMIT 10;
With this, you can give every field its own comparison, and you can also weight individual attributes by adding 2 or more instead of 1.
But I'm not quite sure how well this performs.
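A runnable sketch of the match-count score, using Python's built-in sqlite3 as a stand-in for MySQL. Two things differ from the MySQL query above and are swapped in deliberately: comparisons such as (age = :age) are summed directly (they evaluate to 0 or 1), and the score filter sits in an outer WHERE because older SQLite versions reject HAVING without GROUP BY. The table contents and the double weight on name are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER, height TEXT, hair TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", [
    ("Joe", 32, "6'1", "Brown"),
    ("Mike", 33, "6'1", "Brown"),
    ("James", 31, "6'0", "Brown"),
    ("Ann", 50, "5'4", "Red"),
])

query = """
SELECT name, score FROM (
    SELECT name,
           2 * (name = :name)      -- a name match counts double
           + (age = :age)
           + (height = :height)
           + (hair = :hair) AS score
    FROM people
)
WHERE score > 0
ORDER BY score DESC, name
LIMIT 10
"""
params = {"name": "Joe", "age": 32, "height": "6'1", "hair": "Brown"}
ranked = con.execute(query, params).fetchall()
# [('Joe', 5), ('Mike', 2), ('James', 1)]; Ann matches nothing and is dropped
```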
The approach I would use is to create a scoring function (your stored procedure) that evaluates how far the given input is from the mean, measured in standard deviations.
In the procedure, you would judge each criterion in a fashion similar to:
INPUT AGE: 32
calculate MEAN of AGE WHERE (sex = m): 34.5
calculate STANDARD DEVIATION of AGE WHERE (sex = m): 2.5
calculate how many STDEVs 32 is from 34.5 (also known as the z-score): (32 - 34.5) / 2.5 = -1, i.e. 1 standard deviation below the mean
Repeat this process for all numeric columns, sum the absolute values, and ORDER BY the sum.
For this to work, the following schema change would be required: height changed from foot/inch form to strictly inches.
Depending on your needs, you may also consider coming up with an arbitrary scale for sex and skin color/hair color. Of course, you may think that measures like these should NOT be factored in because of how drastically they would change the scoring function. If you choose to include them, you'd have to find some number to add to the above sum, but that is hard because nominal variables don't translate easily into these kinds of measures.
If you find that hair color/skin color can usefully be mapped onto, say, the continuous color spectrum, the scoring would work the same way: color value of the input vs. the mean and standard deviation of the color values.
The query that would find your matches would be something to the effect of:
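-- each row's distance from the search values, in standard deviations
SELECT t.*,
ABS(t.AGE - @INPUT_AGE) / s.sd_age AS age_z,
ABS(t.WT - @INPUT_WT) / s.sd_wt AS wt_z,
...
(ABS(t.AGE - @INPUT_AGE) / s.sd_age + ABS(t.WT - @INPUT_WT) / s.sd_wt + ...) AS score
FROM `table` t
CROSS JOIN (SELECT STD(AGE) AS sd_age, STD(WT) AS sd_wt FROM `table`) s
ORDER BY score ASC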
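A minimal Python sketch of a distance-based ranking, offered as a variation on the idea above rather than a literal translation: instead of comparing the input with the mean, it measures each row's distance from the search values in units of the column's standard deviation, then sorts by the summed distance. The function name rank_by_distance, the field names, and the sample rows (heights already converted to inches) are assumptions.

```python
from statistics import pstdev

def rank_by_distance(people, target, fields):
    """Sort people by summed |value - target| / population SD over numeric fields."""
    sd = {f: pstdev([p[f] for p in people]) for f in fields}

    def score(p):
        # Fields with zero spread cannot discriminate between rows, so skip them
        return sum(abs(p[f] - target[f]) / sd[f] for f in fields if sd[f] > 0)

    return sorted(people, key=score)

people = [
    {"name": "James", "age": 31, "height_in": 72, "weight": 179},
    {"name": "Mike", "age": 33, "height_in": 73, "weight": 179},
    {"name": "Joe", "age": 32, "height_in": 73, "weight": 180},
]
target = {"age": 32, "height_in": 73, "weight": 180}
ranked = [p["name"] for p in rank_by_distance(people, target, ["age", "height_in", "weight"])]
# ['Joe', 'Mike', 'James']: smallest total distance first
```

Dividing by the standard deviation puts age, height, and weight on a comparable scale, so a one-year age gap and a one-pound weight gap are not treated as equally important by accident.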

Stock management of assemblies and its sub parts (relationships)

I have to track the stock of individual parts and kits (assemblies) and can't find a satisfactory way of doing this.
Sample bogus and hyper simplified database:
Table prod:
prodID 1
prodName Flux capacitor
prodCost 900
prodPrice 1350 (900*1.5)
prodStock 3
-
prodID 2
prodName Mr Fusion
prodCost 300
prodPrice 600 (300*2)
prodStock 2
-
prodID 3
prodName Time travel kit
prodCost 1200 (900+300)
prodPrice 1560 (1200*1.3)
prodStock 2
Table rels
relID 1
relSrc 1 (Flux capacitor)
relType 4 (is a subpart of)
relDst 3 (Time travel kit)
-
relID 2
relSrc 2 (Mr Fusion)
relType 4 (is a subpart of)
relDst 3 (Time travel kit)
prodPrice: it's calculated from the cost, but not linearly. In this example, for costs of 500 or less the markup is 200%; for costs of 500-1000 it is 150%; for costs above 1000 it is 130%.
That's why the Time travel kit is much cheaper than its individual parts bought separately.
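The pricing rule can be written as a small function. This is a sketch in Python rather than PHP; the name sale_price is made up, and the treatment of a cost of exactly 1000 is an assumption, since the description leaves that boundary open.

```python
def sale_price(cost):
    """Apply the tiered markup described above; boundary handling is assumed."""
    if cost <= 500:
        factor = 2.0
    elif cost <= 1000:  # assumption: exactly 1000 falls in the 150% tier
        factor = 1.5
    else:
        factor = 1.3
    return round(cost * factor, 2)

parts_total = sale_price(900) + sale_price(300)  # Flux capacitor + Mr Fusion sold separately
kit_price = sale_price(900 + 300)                # the same parts sold as one kit
```

With the sample data this gives 1950 for the separate parts and 1560 for the kit.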
prodStock: here is my problem. I can sell kits or the individual parts, so the stock of the kits is virtual.
The problem when I buy:
Some providers sell me the Time travel kit as a whole (with one barcode) and some sell me the individual parts (with different barcodes).
So when I load the stock I don't know where to assign it.
The problem when I sell:
If I only sold kits, calculating the stock would be easy: "I have 3 Flux capacitors and 2 Mr Fusions, so I have 2 Time travel kits and a Flux capacitor".
But I can sell kits or individual parts. So I have to track the stock of the individual parts and of the possible kits at the same time (and I have to compensate for the sale price).
Probably this is really simple, but I can't see a simple solution.
To sum up: I have to find a way of tracking the stock, and the database/program is the one that has to do it (I can't ask the clerk to correct the stock).
I'm using PHP + MySQL, but this is more a logical problem than a programming one.
Update: Sadly, Eagle's solution won't work.
The relationships can be, and are, recursive (one kit uses another kit).
There are kits that use more than one of the same part (2 Flux capacitors + 1 Mr Fusion).
I really need to store a value for the stock of the kit. The same database is used for the web page where users buy the parts, and I should show the available stock (otherwise they won't even try to buy). I can't afford to calculate the stock on every user search on the web page.
But I liked the idea of a boolean marking the stock as virtual.
Okay, first of all, since the prodStock for the Time travel kit is virtual, you should not store it in the database; it is essentially a calculated field. It would probably help to have a boolean column on the table that says whether the prodStock is calculated or not. I'll pretend you have this column and call it isKit for now (where TRUE means the product is a kit and its prodStock should be calculated).
Now to calculate the amount of each item that is in stock:
select p.prodID, p.prodName, p.prodCost, p.prodPrice, p.prodStock from prod p where not isKit
union all
select p.prodID, p.prodName, p.prodCost, p.prodPrice, min(c.prodStock) as prodStock
from
prod p
inner join rels r on (p.prodID = r.relDst and r.relType = 4)
inner join prod c on (r.relSrc = c.prodID and not c.isKit)
where p.isKit
group by p.prodID, p.prodName, p.prodCost, p.prodPrice
I used the alias c for the second prod to stand for 'component'. I explicitly wrote not c.isKit since this won't work recursively. union all is used rather than union for efficiency reasons: the two selects can never return the same row, so both give the same results and union all skips the duplicate check.
Caveats:
This won't work recursively (e.g. if a kit requires components from another kit).
This only works on kits that require only one of a particular item (e.g. if a time travel kit were to require 2 flux capacitors and 1 Mr. Fusion, this wouldn't work).
I didn't test this so there may be minor syntax errors.
This only calculates the prodStock field; to do the other fields you would need similar logic.
If your query is much more complicated than what I assumed, I apologize, but I hope that this can help you find a solution that will work.
As for how to handle the data when you buy a kit: this assumes you store prodStock only on the component parts. So, for example, if you purchase a Time travel kit from a supplier, instead of increasing the prodStock of the kit product, you would increase it on the Flux capacitor and the Mr Fusion.
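The two caveats (nested kits and kits that use several units of the same part) can be handled in application code. The sketch below is in Python rather than PHP and is only one possible approach: it flattens a kit into its base parts with total quantities, then takes the tightest ratio of stock to requirement. The names base_parts, kit_stock, and bom, and the extra product IDs, are hypothetical.

```python
from collections import Counter

def base_parts(prod_id, bom, qty=1, path=()):
    """Expand a product into {base part id: total quantity}, following nested kits."""
    if prod_id in path:
        raise ValueError("cycle in kit definitions")
    if prod_id not in bom:  # not a kit, so it is a base part
        return Counter({prod_id: qty})
    total = Counter()
    for part, n in bom[prod_id]:
        total += base_parts(part, bom, qty * n, path + (prod_id,))
    return total

def kit_stock(prod_id, bom, stock):
    """How many units of prod_id could be assembled from the physical part stock."""
    need = base_parts(prod_id, bom)
    return min(stock.get(part, 0) // n for part, n in need.items())

# Sample data from the question: 1 = Flux capacitor, 2 = Mr Fusion, 3 = Time travel kit.
# Product 4 is an invented kit that contains kit 3 plus one more Flux capacitor.
bom = {3: [(1, 1), (2, 1)], 4: [(3, 1), (1, 1)]}
stock = {1: 3, 2: 2}

kits = kit_stock(3, bom, stock)         # 2, as in the question's own example
nested_kits = kit_stock(4, bom, stock)  # 1: needs 2 Flux capacitors and 1 Mr Fusion
```

To avoid recalculating on every page view, the result could be written back to the kit's prodStock whenever the stock of any part changes, so the web page only reads a stored number.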