I investigate certain effects within a household / between partners. I have panel data (person-year) for several variables, and a partner id. I would like to regress a person's outcome on the independent variable values of that person's partner. I don't know how to specify this in Stata.
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year id pid y x)
1 1 3 9 2
2 1 3 10 4
3 1 . 11 6
1 2 4 20 2
2 2 4 21 6
3 2 3 22 7
1 3 1 25 5
2 3 1 30 10
3 3 2 35 15
1 4 2 20 4
2 4 2 30 6
3 4 . 40 8
end
* pooled regression
reg y x
* fixed effects regression
xtset id year
xtreg y x, fe
I can do pooled and fixed effects regressions. But even for the pooled / simple regression, how can I regress someone's outcome on somebody else's independent variable?
Actually for Person 1, I need to regress 9/10/11 on 5/10/. and so on.
Person 2: regress 20/21/22 on 4/6/15
Person 3: regress 25/30/35 on 2/4/7
Person 4: regress 20/30/40 on 2/6/.
Idea: If there is no option in the regress function, I guess I could create new variables for each independent variable I have and name it x_partner. In this example x_partner should contain 5,10,.,4,6,15,2,4,7,2,6,. but I still don't know how to achieve this.
bysort id (year): egen x_partner = x[pid] // rough idea
The rough idea won't work. egen needs one of its own functions specified, and that alone makes the syntax illegal.
But the essence here is to look up the partner's values and put in new variables aligned with each identifier.
Thanks for using dataex.
rangestat from SSC, a community-contributed command, allows a one-line solution. Consider
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year id pid y x)
1 1 3 9 2
2 1 3 10 4
3 1 . 11 6
1 2 4 20 2
2 2 4 21 6
3 2 3 22 7
1 3 1 25 5
2 3 1 30 10
3 3 2 35 15
1 4 2 20 4
2 4 2 30 6
3 4 . 40 8
end
ssc install rangestat
rangestat wanted_y=y wanted_x=x if !missing(id, pid), interval(id pid pid) by(year)
list, sepby(id)
+-------------------------------------------------+
| year id pid y x wanted_y wanted_x |
|-------------------------------------------------|
1. | 1 1 3 9 2 25 5 |
2. | 2 1 3 10 4 30 10 |
3. | 3 1 . 11 6 . . |
|-------------------------------------------------|
4. | 1 2 4 20 2 20 4 |
5. | 2 2 4 21 6 30 6 |
6. | 3 2 3 22 7 35 15 |
|-------------------------------------------------|
7. | 1 3 1 25 5 9 2 |
8. | 2 3 1 30 10 10 4 |
9. | 3 3 2 35 15 22 7 |
|-------------------------------------------------|
10. | 1 4 2 20 4 20 2 |
11. | 2 4 2 30 6 21 6 |
12. | 3 4 . 40 8 . . |
+-------------------------------------------------+
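As a cross-check outside Stata, the same partner lookup can be sketched in plain Python: index every observation by (year, id) and fetch the row stored under (year, pid). This is only an illustration of the lookup logic, not a replacement for rangestat; the names by_year_id and partner_values are made up for this sketch, and None stands in for Stata's missing value.

```python
# Rows mirror the dataex example: (year, id, pid, y, x); None marks a missing partner id.
rows = [
    (1, 1, 3, 9, 2), (2, 1, 3, 10, 4), (3, 1, None, 11, 6),
    (1, 2, 4, 20, 2), (2, 2, 4, 21, 6), (3, 2, 3, 22, 7),
    (1, 3, 1, 25, 5), (2, 3, 1, 30, 10), (3, 3, 2, 35, 15),
    (1, 4, 2, 20, 4), (2, 4, 2, 30, 6), (3, 4, None, 40, 8),
]

# Index every observation by (year, id) so a partner's row can be fetched directly.
by_year_id = {(year, id_): (y, x) for year, id_, _pid, y, x in rows}


def partner_values(year, pid):
    """Return the partner's (y, x) in the same year, or (None, None) without a partner."""
    return by_year_id.get((year, pid), (None, None))


# Partner values in the original row order (hypothetical names, matching wanted_y / wanted_x).
y_partner = [partner_values(year, pid)[0] for year, _id, pid, _y, _x in rows]
x_partner = [partner_values(year, pid)[1] for year, _id, pid, _y, _x in rows]

print(x_partner)
```

The resulting x_partner list lines up with the wanted_x column above, with None where pid is missing.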
Basically I am trying to calculate shots received in golf for various four balls, here is my data:-
DatePlayed PlayerID HCap Groups Hole01 Hole02 Hole03 Shots
----------------------------------------------------------------------
2018-11-10 001 15 2 7 3 6
2018-11-10 004 20 1 7 4 6
2018-11-10 025 20 2 7 4 5
2018-11-10 047 17 1 8 3 6
2018-11-10 048 20 2 8 4 6
2018-11-10 056 17 1 6 3 5
2018-11-10 087 18 1 7 3 5
I want to retrieve the above rows with an additional column, Shots, calculated from the player's group: (the player's handicap - the lowest handicap in the group) x .75
I can achieve it with a GROUP BY, but then I need to aggregate every column. Is there a way to return the value as above without aggregating? Here is the query that returns the value:
SELECT
PlayerID,
MIN(Handicap),
MIN(Hole01) AS Hole01,
MIN(Hole02) AS Hole02,
MIN(Hole03) AS Hole03,
MIN(CourseID) AS CourseID,
Groups,
ROUND(
MIN((Handicap -
(SELECT MIN(Handicap) FROM Results AS t
WHERE DatePlayed='2018-11-10 00:00:00' AND t.Groups=Results.Groups)) *.75))
AS Shots
FROM
Results
WHERE
Results.DatePlayed='2018-11-10 00:00:00'
GROUP BY
DatePlayed, Groups, PlayerID
.
PlayerID MIN(Handicap) Hole01 Hole02 Hole03 CourseID Groups Shots
-----------------------------------------------------------------
4 20 7 4 6 1 1 2
47 17 8 3 6 1 1 0
56 17 6 3 5 1 1 0
87 18 7 3 5 1 1 1
1 15 7 3 6 1 2 0
25 20 7 4 5 1 2 4
48 20 8 4 6 1 2 4
Sorry about the formatting, I really couldn't see how to get my table in here. Any help will be much appreciated. I am using the latest MySQL from Ubuntu 18.04.
Not an answer; too long for a comment...
First off, I happily know nothing about golf, so what follows might not be optimal, but it must, at least, be a step in the right direction...
A normalized schema might look something like this...
rounds
round_id DatePlayed PlayerID HCap Groups
1 2018-11-10 1 15 2
2 2018-11-10 4 20 1
round_detail
round_id hole shots
1 1 7
1 2 3
1 3 6
2 1 7
2 2 4
2 3 6
Hi guys, I have found the solution: I need to drop the MIN immediately after the ROUND, and the query then no longer needs a GROUP BY.
SELECT
PlayerID,
Handicap,
Hole01,
Hole02,
Hole03,
CourseID,
Groups,
ROUND((Handicap -
(SELECT MIN(Handicap) FROM Results AS t
WHERE DatePlayed='2018-11-10 00:00:00'
AND t.Groups=Results.Groups))
*.75) AS Shots
FROM
Results
WHERE
Results.DatePlayed='2018-11-10 00:00:00'
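For anyone who wants to check the arithmetic, here is a minimal sketch of the same correlated-subquery idea run against an in-memory SQLite database from Python. It is not the MySQL setup from the question: the table holds only the columns needed for Shots, the inner query correlates on DatePlayed instead of repeating the literal, and "Groups" is quoted because it can clash with a keyword in newer SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Results (DatePlayed TEXT, PlayerID INTEGER, Handicap INTEGER, "Groups" INTEGER)'
)
played = "2018-11-10 00:00:00"
# (PlayerID, Handicap, Groups) taken from the sample data in the question.
sample = [(1, 15, 2), (4, 20, 1), (25, 20, 2), (47, 17, 1), (48, 20, 2), (56, 17, 1), (87, 18, 1)]
conn.executemany(
    "INSERT INTO Results VALUES (?, ?, ?, ?)",
    [(played, player, hcap, grp) for player, hcap, grp in sample],
)

# No GROUP BY: the correlated subquery finds the lowest handicap in each player's group.
query = """
SELECT r.PlayerID,
       ROUND((r.Handicap - (SELECT MIN(t.Handicap)
                            FROM Results AS t
                            WHERE t.DatePlayed = r.DatePlayed
                              AND t."Groups" = r."Groups")) * 0.75) AS Shots
FROM Results AS r
WHERE r.DatePlayed = ?
ORDER BY r."Groups", r.PlayerID
"""
# SQLite's ROUND returns a float, so convert to int for display.
shots = {player: int(value) for player, value in conn.execute(query, (played,))}
print(shots)
```

The Shots values agree with the output of the original GROUP BY query, which suggests the simplification does not change the result for this sample.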
I would just like to compare cell entries and return values.
customer_no id A1 A2 A3 A4
1 5 10 20 45 0
1 13 0 45 2 5
2 4 0 10 7 8
2 3 7 9 55 0
2 10 0 0 0 0
3 4 90 8 14 3
3 10 20 7 4 15
How do I count the ids that have a value > 030 for each customer_no,
and then find the minimum number of values before 030 appears?
The expected output would be something like:
customer_no, count_ac_num, values
1 2 1
2 1 1
3 1 3
I would recommend converting to a more vertical structure. Then you can begin trying to apply your business logic, although I am having a hard time understanding what that is.
Assuming that the quotes are not meaningful (it looks like someone had a string like "xxx" with actual quote characters in it, and when it was written to a CSV file extra quotes were added to protect the existing ones, so it became """xxx"""), you could just use the compress() function to remove them.
You could then just split the resulting string into 3 character substrings.
data want ;
set have ;
array h history1 history2 ;
do history=1 to dim(h);
h(history)=compress(h(history),'"');
length index 8 value $3 ;
do index=1 by 1 until (value=' ');
value=substrn(h(history),3*(index-1)+1,3);
if value ne ' ' then output;
end;
end;
drop history1 history2;
run;
So you end up with something like this:
Obs id type history index value
1 1 13 1 1 STD
2 1 13 1 2 STD
3 1 13 1 3 058
4 1 13 1 4 030
5 1 13 2 1 STD
6 1 13 2 2 030
7 1 13 2 3 066
8 1 13 2 4 036
9 1 13 2 5 030
10 1 13 2 6 STD
11 1 13 2 7 STD
12 1 13 2 8 STD
13 1 13 2 9 STD
14 1 13 2 10 STD
15 1 3 1 1 STD
16 1 3 1 2 STD
17 1 3 1 3 STD
18 1 3 1 4 XXX
19 1 3 1 5 STD
20 1 3 1 6 XXX
21 1 3 1 7 S
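The strip-then-split step of the data step above can also be sketched in Python, which may help when checking the logic away from SAS. The function name split_history and the sample input string are hypothetical (the raw data is not shown in this thread); like the DO ... UNTIL loop, the sketch stops at the first blank chunk.

```python
def split_history(raw, width=3):
    """Remove double quotes, then cut the string into fixed-width codes."""
    cleaned = raw.replace('"', "")
    values = []
    for start in range(0, len(cleaned), width):
        value = cleaned[start:start + width]
        if not value.strip():
            # Mirrors the SAS loop condition: stop once a blank chunk is reached.
            break
        values.append(value)
    return values


# Hypothetical raw field with the extra protective quotes described above.
print(split_history('"""STDSTD058030"""'))
```

A trailing chunk shorter than the width is kept as-is, which is how a value such as a lone S can end up in the output.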
I'm using the cluster command and am having difficulties due to insufficient memory. To get around this problem I would like to delete all duplicate observations.
I would like to cluster on the variables A, B and C, and I identify duplicate values like so:
/* Create dummy data */
input id A B C
1 1 1 1
2 1 1 1
3 1 1 1
4 2 2 2
5 2 2 2
6 2 2 2
7 2 2 2
8 3 3 3
9 3 3 3
10 4 4 4
end
sort A B C id
duplicates tag A B C, gen(dup_tag)
I would like to add a variable dup_id which tells me that ids 2 and 3 are duplicates of id 1, ids 5, 6 and 7 of id 4, and so on. How could I do this?
/* Desired result */
id A B C dup_id
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 2 2 2 4
5 2 2 2 4
6 2 2 2 4
7 2 2 2 4
8 3 3 3 8
9 3 3 3 8
10 4 4 4 10
duplicates is a wonderful command (see its manual entry for why I say that), but you can do this directly:
bysort A B C : gen tag = _n == 1
tags the first occurrence of duplicates of A B C as 1 and all others as 0. For the other way round use _n > 1, _n != 1, or whatever.
EDIT:
So then the id of tagged observations is just
by A B C: gen dup_id = id[1]
For basic technique with by: see (e.g.) this discussion
You can refer to the first observation in each group of A B C using the subscript [1] on ID. Note the (id) argument in bysort, which sorts by id, but identifies the groups by A, B, and C only.
clear
input id A B C
1 1 1 1
2 1 1 1
3 1 1 1
4 2 2 2
5 2 2 2
6 2 2 2
7 2 2 2
8 3 3 3
9 3 3 3
10 4 4 4
end
bysort A B C (id): gen dup_id = id[1]
li, noobs sepby(dup_id)
yielding
+-------------------------+
| id A B C dup_id |
|-------------------------|
| 1 1 1 1 1 |
| 2 1 1 1 1 |
| 3 1 1 1 1 |
|-------------------------|
| 4 2 2 2 4 |
| 5 2 2 2 4 |
| 6 2 2 2 4 |
| 7 2 2 2 4 |
|-------------------------|
| 8 3 3 3 8 |
| 9 3 3 3 8 |
|-------------------------|
| 10 4 4 4 10 |
+-------------------------+
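The grouping logic behind bysort A B C (id): gen dup_id = id[1] can be illustrated in Python as a sketch: sort by id, remember the first id seen for each (A, B, C) combination, and assign that id to every member of the group. The names first_id and dup_id are only for this illustration.

```python
# (id, A, B, C) rows from the example above.
rows = [
    (1, 1, 1, 1), (2, 1, 1, 1), (3, 1, 1, 1),
    (4, 2, 2, 2), (5, 2, 2, 2), (6, 2, 2, 2), (7, 2, 2, 2),
    (8, 3, 3, 3), (9, 3, 3, 3),
    (10, 4, 4, 4),
]

# Walk the rows in id order; setdefault keeps only the first (smallest) id per group.
first_id = {}
for id_, a, b, c in sorted(rows):
    first_id.setdefault((a, b, c), id_)

# Every observation gets the id of the first observation in its (A, B, C) group.
dup_id = {id_: first_id[(a, b, c)] for id_, a, b, c in rows}
print(dup_id)
```

Keeping one observation per group then amounts to keeping the rows where id equals dup_id.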
I am trying to make a complex query in MySQL. I have the following two tables:
departments
Id parent title
---------------------------
1 0 Health
2 0 Sports
3 0 Education
4 1 Physical
5 1 Mental
6 2 Football
7 2 Baseball
8 2 Beachball
9 2 Hockey
10 4 ENT
11 0 Finance
employees
id name department status
------------------------------------
1 abc 3 1
2 def 3 1
3 ghi 5 1
4 jkl 10 1
5 kkk 6 1
6 lll 6 1
7 sss 8 1
8 xxx 6 1
9 yyy 6 1
10 zzz 7 1
11 mnb 11 1
The departments table has four main departments (i.e. parent = 0), with multiple levels of sub-departments below them.
Currently I do this in a PHP function that runs several queries, but I would like to know whether it is possible to fetch the result with a single query.
What is the best way to select at most 3 random employees with status 1 for each main department?
The expected result should be something like
id name department maindep
1 abc 3 3
2 def 3 3
3 ghi 5 1
4 jkl 10 1
5 kkk 6 2
7 sss 8 2
10 zzz 7 2
11 mnb 11 11
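One possible direction, sketched here under assumptions rather than offered as a tested MySQL answer: a recursive CTE can walk each department up to its main department, and ROW_NUMBER() over a random ordering can cap the pick at three per main department. Both features need MySQL 8.0 or later; the sketch below runs the same idea on an in-memory SQLite database (3.25 or later) from Python, so RANDOM() stands in for MySQL's RAND(), and the CTE names roots and ranked are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER, parent INTEGER, title TEXT)")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, department INTEGER, status INTEGER)")
conn.executemany("INSERT INTO departments VALUES (?, ?, ?)", [
    (1, 0, "Health"), (2, 0, "Sports"), (3, 0, "Education"), (4, 1, "Physical"),
    (5, 1, "Mental"), (6, 2, "Football"), (7, 2, "Baseball"), (8, 2, "Beachball"),
    (9, 2, "Hockey"), (10, 4, "ENT"), (11, 0, "Finance"),
])
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "abc", 3, 1), (2, "def", 3, 1), (3, "ghi", 5, 1), (4, "jkl", 10, 1),
    (5, "kkk", 6, 1), (6, "lll", 6, 1), (7, "sss", 8, 1), (8, "xxx", 6, 1),
    (9, "yyy", 6, 1), (10, "zzz", 7, 1), (11, "mnb", 11, 1),
])

query = """
WITH RECURSIVE roots(id, maindep) AS (
    -- Main departments are their own root ...
    SELECT id, id FROM departments WHERE parent = 0
    UNION ALL
    -- ... and every sub-department inherits the root of its parent.
    SELECT d.id, r.maindep
    FROM departments AS d
    JOIN roots AS r ON d.parent = r.id
),
ranked AS (
    SELECT e.id, e.name, e.department, r.maindep,
           ROW_NUMBER() OVER (PARTITION BY r.maindep ORDER BY RANDOM()) AS rn
    FROM employees AS e
    JOIN roots AS r ON r.id = e.department
    WHERE e.status = 1
)
SELECT id, name, department, maindep FROM ranked WHERE rn <= 3 ORDER BY id
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

Which three employees appear for main department 2 changes from run to run, since the ordering inside each partition is random.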
The table I am using:
product_parameter
id productid parameterid value
1 1 1 10
2 2 2 11
3 1 2 34
4 2 4 44
5 3 2 55
6 3 3 43
7 4 1 33
8 1 3 33
9 1 4 24
and so on. I want to display it in the form
parameterid 1 2 3 4 ...
productid
1 10 34 33 24
2 null 11 null 44
3
4
.
.
and so on. The rows and columns are not fixed.
It would be much easier to create a 2D array in application code.
In SQL you have to add a join for each column, so the query must change whenever a parameterid is added.
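To make the application-code route concrete, here is a small Python sketch, assuming the rows are fetched with one plain SELECT (an in-memory SQLite table stands in for the real database). The names cells and grid are illustrative only; None plays the role of null for combinations that do not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_parameter "
    "(id INTEGER PRIMARY KEY, productid INTEGER, parameterid INTEGER, value INTEGER)"
)
conn.executemany("INSERT INTO product_parameter VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 10), (2, 2, 2, 11), (3, 1, 2, 34), (4, 2, 4, 44), (5, 3, 2, 55),
    (6, 3, 3, 43), (7, 4, 1, 33), (8, 1, 3, 33), (9, 1, 4, 24),
])

# One flat query; the pivot itself happens in application code.
rows = conn.execute(
    "SELECT productid, parameterid, value FROM product_parameter"
).fetchall()

# Row and column labels are discovered from the data, so neither is fixed in advance.
product_ids = sorted({product for product, _param, _value in rows})
parameter_ids = sorted({param for _product, param, _value in rows})

cells = {(product, param): value for product, param, value in rows}
grid = {
    product: [cells.get((product, param)) for param in parameter_ids]
    for product in product_ids
}

print("parameterid", parameter_ids)
for product, values in grid.items():
    print(product, values)
```

New products or parameters only add rows or columns to grid; the query stays the same.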