Panel data or time-series data and xt regression - regression

I need help running a simple regression as well as an xt regression on panel data.
The dataset covers 16 participants, with daily observations for each.
I would like to observe the difference between pre-test (the first date on which observations were taken) and post-test (the last date on which observations were made) across different variables.
Also, I was advised to run xtreg, re.
What is this re, and what is its significance?

If the goal is to fit some xt model at the end, you will need the data in long form. I would use:
bysort id (year): keep if inlist(_n,1,_N)
For each id, this puts the data in ascending chronological order, and keeps the first and last observation for each id.
The RE part of your question is off-topic here. Try Statalist or the Cross Validated (CV) SE site, but do augment your question with details of the data and what you hope to accomplish. These may also reveal that getting rid of the intermediate data is a bad idea.
Edit:
Add this after the part above:
bysort id (year): gen t= _n
reshape wide x year, i(id) j(t)
order id x1 x2 year1 year2

Perhaps this sample code will set you in the direction you seek.
clear
input id year x
1 2001 11
1 2002 12
1 2003 13
1 2004 14
2 2001 21
2 2002 22
2 2003 23
3 1005 35
end
xtset id year
bysort id (year): generate firstx = x[1]
bysort id (year): generate lastx = x[_N]
list, sepby(id)
With regard to xtreg, re: that fits a random-effects model. See help xtreg for more details, as well as the documentation for xtreg in the Stata Longitudinal-Data/Panel-Data Reference Manual included in your Stata documentation.
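For readers who want to cross-check the first/last logic outside Stata, here is a minimal Python sketch of the same idea. It is an illustration only, not part of the Stata answer: the function name first_last is made up, and the rows are copied from the sample input above.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from the Stata input block: (id, year, x)
rows = [
    (1, 2001, 11), (1, 2002, 12), (1, 2003, 13), (1, 2004, 14),
    (2, 2001, 21), (2, 2002, 22), (2, 2003, 23),
    (3, 1005, 35),
]

def first_last(rows):
    """Return {id: (first x, last x)} in chronological order per id.

    Mirrors `bysort id (year): gen firstx = x[1]` and `lastx = x[_N]`.
    """
    result = {}
    ordered = sorted(rows, key=itemgetter(0, 1))  # sort by id, then year
    for pid, group in groupby(ordered, key=itemgetter(0)):
        obs = list(group)
        result[pid] = (obs[0][2], obs[-1][2])
    return result

if __name__ == "__main__":
    for pid, (pre, post) in first_last(rows).items():
        print(pid, pre, post, post - pre)
```

An id with a single observation, like id 3 here, gets the same value for pre and post, so its change is 0.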

Related

Nested sort in SELECT followed by Conditional INSERT based upon results of SELECT inquiry

I have been struggling with the following for some time.
The server I am using has MySQL ver 5.7 installed.
The issue:
I wish to take recorded tank level readings from one table, find the difference between the last two records for a particular tank, and multiply this by a factor to get a quantity used.
The extracted quantity, if it is positive (otherwise 0), is then to be inserted into another table for further use.
The Quant value extracted may be positive or negative as tanks fill and empty. I only require the used quantity, i.e. a falling level.
The two following tables are used:
Table 'tf_rdgs' sample (value1 is the content height):
id  location  value1  reading_time
--  --------  ------  ------------
1   18        1500
2   18        1340
3   9         1600
4   18        1200
5   9         1400
6   18        1765    yyyy
7   18        1642    xxxx
Table 'flow' example:
id  location  Quant  reading_time
--  --------  -----  ------------
1   18        5634   dd-mm: HH-mm
2   18        0      dd-mm: HH-mm
3   18        123    current time
I do not require to go back over history and am only interested in the latest level readings as a new level reading is inserted.
I can get the following to work with a table of only one location.
INSERT INTO flow (location, Quant)
SELECT t1.location, (t2.value1 - t1.value1) AS Quant
FROM tf_rdgs t1 cross join tf_rdgs t2 on t1.reading_time > t2.reading_time
ORDER BY t2.reading_time DESC limit 1
It is not particularly efficient but works and gives the following return from the above table.
location  Quant
--------  -----
18        123
For a table with mixed locations, adding a WHERE t1.location = ... clause does not work.
The problems I am struggling with are:
How to restrict the query to one location first, so that the difference is taken between the last two level readings of that tank.
A search for a single location is fine; it does not need to cover all tanks.
A conditional INSERT that inserts the 'Quant' value only if it is positive, or else inserts 0 if it is negative (i.e. the tank is filling).
I have tried many permutations of these without success.
Once the above has been achieved, it needs to run from a conditional trigger on the tf_rdgs table, based upon the location of the inserted data, activated upon each new reading inserted from the sensors on a particular tank.
I can achieve the above, with the exception of the conditional insert, if each tank has a dedicated table, but unfortunately I can't go there due to the existing data structure and usage.
Any direction or assistance on parts or the whole of this is much appreciated.
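One possible direction for the first two problems, sketched in Python with an in-memory SQLite database standing in for MySQL 5.7 (an assumption made so the example runs on its own). The self-join on location and the CASE expression are plain SQL and should carry over, but this has not been run against MySQL. The integer reading_time values, the factor parameter and the helper names make_demo_db and record_usage are hypothetical; the trigger is not covered.

```python
import sqlite3

# Pair the newest reading (cur) with the reading just before it (prev)
# for one location, and store the drop in level, or 0 if the tank filled.
USAGE_SQL = """
INSERT INTO flow (location, Quant)
SELECT cur.location,
       CASE WHEN prev.value1 > cur.value1
            THEN (prev.value1 - cur.value1) * :factor
            ELSE 0 END
FROM tf_rdgs AS cur
JOIN tf_rdgs AS prev
  ON prev.location = cur.location
 AND prev.reading_time < cur.reading_time
WHERE cur.location = :location
ORDER BY cur.reading_time DESC, prev.reading_time DESC
LIMIT 1
"""

def make_demo_db():
    """Build the sample tf_rdgs table; reading_time is a made-up counter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tf_rdgs (id INTEGER PRIMARY KEY, location INTEGER,
                              value1 INTEGER, reading_time INTEGER);
        CREATE TABLE flow (id INTEGER PRIMARY KEY, location INTEGER, Quant REAL);
    """)
    readings = [(18, 1500), (18, 1340), (9, 1600), (18, 1200),
                (9, 1400), (18, 1765), (18, 1642)]
    conn.executemany(
        "INSERT INTO tf_rdgs (location, value1, reading_time) VALUES (?, ?, ?)",
        [(loc, val, t) for t, (loc, val) in enumerate(readings, start=1)],
    )
    return conn

def record_usage(conn, location, factor=1):
    """Run the insert, then return the most recent Quant stored for that
    location (None if the flow table has no row for it)."""
    conn.execute(USAGE_SQL, {"location": location, "factor": factor})
    row = conn.execute(
        "SELECT Quant FROM flow WHERE location = ? ORDER BY id DESC LIMIT 1",
        (location,),
    ).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    db = make_demo_db()
    print(record_usage(db, 18))
    print(record_usage(db, 9))
```

Joining on prev.location = cur.location is what keeps readings from other tanks out of the difference, which is the piece a bare WHERE t1.location = ... on the original query does not do for t2.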

Isolating unique observations and calculating the average in Stata

Currently I have a dataset that appears as follows:
mnbr firm contribution
1591 2 1
9246 6 1
812 6 1
674 6 1
And so on. The idea is that mnbr is the member number of employees who work at firm # whatever. If contribution is 1 (and I have dropped all the 0s for this purpose) said employee has contributed to a certain fund.
I additionally used codebook to determine the number of unique firms that exist. The goal is to determine the average number of contributions per firm, i.e. there was 1 contribution for firm 2, 3 contributions for firm 6, and so on. The problem I run into is accessing the number of unique values that codebook reports.
I read some documentation online for
inspect *varlist*
display r(N_unique)
which suggests to me that using r(N_unique) would store that value, yet unfortunately this method did not work for me. So that is part 1.
Part 2: I'd also like to create a variable that shows the share of employees contributing in each firm, i.e.
mnbr firm contribution average
1591 2 1 1
9246 6 . 2/3
812 6 1 2/3
674 6 1 2/3
to show that for firm 6, 2 out of the 3 employees contributed to this fund.
Thanks in advance for the help.
To answer your comment, this works for me:
clear
set more off
input ///
mnbr firm cont
1591 2 1
9246 6 .
812 6 1
674 6 1
end
list
// problem 1
inspect firm
display r(N_unique)
// problem 2
bysort firm: egen totc = total(cont)
by firm: gen share = totc / _N
list
You have to use r(N_unique) before running another Stata command, or it can get lost. You can also save that result to a local or scalar.
Problem 2 is also addressed.
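As a cross-check on the arithmetic, here is a small Python sketch of the same two steps. It is only an illustration, not a replacement for the Stata code: the function name firm_summary is made up, and a missing contribution is treated as 0, which is how egen's total() handles it.

```python
from collections import defaultdict

# Sample rows copied from the Stata input block: (mnbr, firm, cont);
# None stands for Stata's missing value "."
rows = [(1591, 2, 1), (9246, 6, None), (812, 6, 1), (674, 6, 1)]

def firm_summary(rows):
    """Return {firm: (total contributions, employees, share)}."""
    total = defaultdict(int)
    size = defaultdict(int)
    for _mnbr, firm, cont in rows:
        size[firm] += 1           # counts every row, like _N within firm
        total[firm] += cont or 0  # missing counts as 0, like total()
    return {f: (total[f], size[f], total[f] / size[f]) for f in size}

if __name__ == "__main__":
    summary = firm_summary(rows)
    n_unique = len(summary)  # what r(N_unique) reports for firm
    avg = sum(t for t, _n, _s in summary.values()) / n_unique
    print(n_unique, avg)
    for firm, (tot, n, share) in sorted(summary.items()):
        print(firm, tot, n, round(share, 3))
```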

Access report building for Dummies

I have a database that tracks invoice payment certification. I am trying to build a report that brings in the following info:
Service Line - ABC
Total "unique" records - 2
max age - 3
Min age - 1
Avg of max age - 3
count of records over 14 days - 0
count of records under 14 days - 2
score: records under 14 days/total records - 100%
I am trying to build the report from a query that includes a date range. The important column names are:
Service Line DLN Age in days
ABC 123456 1
DEF 987654 3
ABC 123456 2
DEF 987654 4
ABC 123456 3
The DLN is an identifier for each different invoice number. The values listed at the top are what I need the report to return for this sample data.
I totally forgot I asked the question on here and ended up fixing the issue. What I ended up doing is building the report from scratch from the query instead of using the wizard. Then I just free typed the functions into unbound text boxes. I did then use the grouping to separate it the way I needed. Once I realized my mistake and that the solution was so easy I felt kind of dumb.

COUNTIF for rows which contain a given value in another column

My table lists every character from all 5 of George R. R. Martin's currently published A Song of Ice and Fire novels. Each row contains a record indicating which book in the series the character is from (numbered 1-5) and a single letter indicating the character's gender (M/F). For example:
A B C
1 Character Book Gender
------------------------------
2 Arya Stark - 1 - F
3 Eddard Stark - 1 - M
4 Davos Seaworth - 2 - M
5 Lynesse Hightower - 2 - F
6 Xaro Xhoan Daxos - 2 - M
7 Elinor Tyrell - 3 - F
I can use COUNTIF to find out that there are three females and three males in this table, but I want to know, for example, how many males there are in book 2. How could I write a formula that would make this count? Here is a pseudocode of what I'm trying to achieve:
=COUNTIF(C2:C7, Column B = '2' AND Column C = 'M')
This would output 2.
I'm aware that this task is far better suited to databases and a SELECT query, but I'd like to know how to solve this problem within the constraints of a LibreOffice Calc spreadsheet, without using a macro. Excel-based solutions are fine, so long as they also work in Calc. If there's no solution that uses COUNTIF, it doesn't matter, so long as it works.
I worked it out, thanks to a prompt by assylias. The COUNTIFS formula produces the result I want by counting multiple search criteria. For example, this formula works out how many male characters are in Book 1 (A Game of Thrones).
=COUNTIFS($A$2:$A$2102, "=1", $L$2:$L$2102, "=M")
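To make the logic explicit: COUNTIFS counts a row only when every criterion matches, i.e. the conditions are combined with AND. A tiny Python sketch of the same count on the sample table from the question (the function name count_if is made up for illustration):

```python
# Sample rows from the question: (character, book, gender)
characters = [
    ("Arya Stark", 1, "F"),
    ("Eddard Stark", 1, "M"),
    ("Davos Seaworth", 2, "M"),
    ("Lynesse Hightower", 2, "F"),
    ("Xaro Xhoan Daxos", 2, "M"),
    ("Elinor Tyrell", 3, "F"),
]

def count_if(rows, book, gender):
    """Count rows where BOTH the book and the gender match."""
    return sum(1 for _name, b, g in rows if b == book and g == gender)

if __name__ == "__main__":
    print(count_if(characters, 2, "M"))
```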

Selecting rows if the total sum of a row is equal to X

I have a table that holds items and their "weight" and it looks like this:
items
-----
id weight
---------- ----------
1 1
2 5
3 2
4 9
5 8
6 4
7 1
8 2
What I'm trying to get is a group where the sum(weight) is exactly X, while honouring the order in which the rows were inserted.
For example, if I were looking for X = 3, this should return:
id weight
---------- ----------
1 1
3 2
Even though the sum of ids 7 and 8 is 3 as well.
Or if I were looking for X = 7, it should return:
id weight
---------- ----------
2 5
3 2
Although the weights of ids 1, 3 and 6 also sum to 7.
I'm kind of lost in this problem and haven't been able to come up with a query that does at least something similar, but thinking this problem through, it might get extremely complex for the RDBMS to handle. Could this be done with a query? If not, what's the best way I can query the database to get the minimum amount of data to work with?
Edit: As Twelfth says, I need to return the sum, regardless of the amount of rows it returns, so if I were to ask for X = 20, I should get:
id weight
---------- ----------
1 1
3 2
4 9
5 8
This could turn out to be very difficult in SQL. What you're attempting to do is solve the knapsack problem, which is non-trivial.
The knapsack problem is interesting from the perspective of computer science for many reasons:
The decision problem form of the knapsack problem (Can a value of at least V be achieved without exceeding the weight W?) is NP-complete, thus there is no possible algorithm both correct and fast (polynomial-time) on all cases, unless P=NP.
While the decision problem is NP-complete, the optimization problem is NP-hard: its resolution is at least as difficult as the decision problem, and there is no known polynomial algorithm which can tell, given a solution, whether it is optimal (which would mean that there is no solution with a larger V, thus solving the NP-complete decision problem).
There is a pseudo-polynomial time algorithm using dynamic programming.
There is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine.
Many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly.
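If the table is small enough to pull into application code, a brute-force search is one way to get the group outside SQL. The sketch below is Python and rests on an assumption about the selection rule that the question does not state outright: it returns the group with the fewest rows, breaking ties in favour of the earliest ids. That rule reproduces the three examples in the question (X = 3, 7 and 20). The function name first_exact_group is made up, and the running time is exponential in the number of rows, in line with the hardness remarks above.

```python
from itertools import combinations

# Rows from the items table: (id, weight), in insertion order
items = [(1, 1), (2, 5), (3, 2), (4, 9), (5, 8), (6, 4), (7, 1), (8, 2)]

def first_exact_group(items, target):
    """Return the smallest group of rows whose weights sum to target.

    Assumed rule: fewest rows first, then earliest ids. combinations()
    yields groups in insertion order, so the first hit at each size is
    the earliest one. Returns None if no group matches.
    """
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            if sum(weight for _id, weight in combo) == target:
                return list(combo)
    return None

if __name__ == "__main__":
    for x in (3, 7, 20):
        print(x, first_exact_group(items, x))
```

For larger tables this will not scale; a dynamic-programming approach over the target sum would be the next thing to try.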