Stata Probit Model Interaction Term Interpretation - regression

For my thesis I am currently investigating the effects of emissions on health on a regional basis. The dependent variable is binary, taking the value 0 if health is good and 1 if health is bad. With the exception of emissions and capita_gdp, every variable is categorical.
Here is an example regression:
probit health i.year i.region##c.emissions age educ smoker gender urban capita_gdp, robust nofvlabel allbaselevels
Probit regression Number of obs = 67,041
Wald chi2(64) = 5850.28
Prob > chi2 = 0.0000
Log pseudolikelihood = -43026.965 Pseudo R2 = 0.0660
-------------------------------------------------------------------------------------
| Robust
health | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
year |
1 | 0 (base)
2 | -.0236149 .0290446 -0.81 0.416 -.0805412 .0333115
3 | -.0552885 .0343119 -1.61 0.107 -.1225386 .0119615
4 | -.7498958 .0521191 -14.39 0.000 -.8520474 -.6477442
|
region |
1 | 0 (base)
2 | .3424928 .1944582 1.76 0.078 -.0386383 .723624
3 | .6631291 .343445 1.93 0.054 -.0100107 1.336269
4 | 1.005453 .1809361 5.56 0.000 .6508251 1.360081
5 | .5202438 .2705144 1.92 0.054 -.0099547 1.050442
6 | .853456 .2053275 4.16 0.000 .4510215 1.25589
7 | -1.32784 1.329886 -1.00 0.318 -3.934369 1.278688
8 | .2074103 .5587633 0.37 0.710 -.8877457 1.302566
9 | .8778635 1.005655 0.87 0.383 -1.093184 2.848911
10 | .614019 .2058646 2.98 0.003 .2105317 1.017506
11 | 1.103564 .2395228 4.61 0.000 .6341078 1.57302
12 | -.9928198 1.189953 -0.83 0.404 -3.325084 1.339444
13 | .2024027 .3014841 0.67 0.502 -.3884953 .7933008
14 | .8510637 .1966648 4.33 0.000 .4656078 1.23652
15 | -.4685238 1.062594 -0.44 0.659 -2.551171 1.614123
16 | .1222191 .4271317 0.29 0.775 -.7149435 .9593818
17 | 1.777416 .9296525 1.91 0.056 -.0446694 3.599502
18 | .7016812 .3960197 1.77 0.076 -.0745032 1.477866
19 | .2164103 .2324297 0.93 0.352 -.2391436 .6719642
20 | -.8683004 2.079837 -0.42 0.676 -4.944707 3.208106
21 | .6094313 .1969787 3.09 0.002 .2233601 .9955025
22 | .4586692 .2175369 2.11 0.035 .0323048 .8850336
23 | .1376296 .316405 0.43 0.664 -.4825129 .7577721
24 | .8800929 .2139805 4.11 0.000 .4606989 1.299487
25 | .5008748 .181908 2.75 0.006 .1443417 .8574079
26 | .7885192 .2055236 3.84 0.000 .3857004 1.191338
27 | .8370192 .2066431 4.05 0.000 .4320061 1.242032
28 | .0342872 .3383975 0.10 0.919 -.6289597 .697534
|
emissions | .2331187 .0475761 4.90 0.000 .1398713 .3263662
|
region#c.emissions|
1 | 0 (base)
2 | -.1763598 .0473856 -3.72 0.000 -.2692338 -.0834858
3 | .0902526 .3483855 0.26 0.796 -.5925705 .7730757
4 | -.2545669 .0436166 -5.84 0.000 -.3400539 -.1690798
5 | -.1903919 .0525988 -3.62 0.000 -.2934837 -.0873002
6 | -.2595892 .0565328 -4.59 0.000 -.3703914 -.148787
7 | .3660934 .3615611 1.01 0.311 -.3425534 1.07474
8 | -.1810636 .0873587 -2.07 0.038 -.3522836 -.0098436
9 | -.2360667 .2817683 -0.84 0.402 -.7883225 .316189
10 | -.2362498 .0452001 -5.23 0.000 -.3248403 -.1476593
11 | -.2986525 .0606014 -4.93 0.000 -.4174291 -.179876
12 | .4210453 .4355456 0.97 0.334 -.4326084 1.274699
13 | -.1393217 .063414 -2.20 0.028 -.2636109 -.0150324
14 | -.2428271 .0452505 -5.37 0.000 -.3315166 -.1541377
15 | -.1078827 .1281398 -0.84 0.400 -.359032 .1432667
16 | -.1121361 .0991541 -1.13 0.258 -.3064746 .0822024
17 | -.3670531 .1360779 -2.70 0.007 -.6337609 -.1003453
18 | -.241021 .1572069 -1.53 0.125 -.5491408 .0670988
19 | -.2128744 .0452858 -4.70 0.000 -.3016328 -.1241159
20 | .103139 .4313025 0.24 0.811 -.7421983 .9484763
21 | -.217597 .0532092 -4.09 0.000 -.3218851 -.1133089
22 | -.1796928 .0509009 -3.53 0.000 -.2794568 -.0799288
23 | -.1510797 .0529603 -2.85 0.004 -.2548799 -.0472795
24 | -.2589344 .0509662 -5.08 0.000 -.3588264 -.1590425
25 | -.231851 .0448358 -5.17 0.000 -.3197276 -.1439745
26 | -.2411263 .0442314 -5.45 0.000 -.3278182 -.1544344
27 | -.2452313 .0465597 -5.27 0.000 -.3364867 -.153976
28 | -.0563099 .1191566 -0.47 0.637 -.2898525 .1772328
|
age | .1085835 .0049886 21.77 0.000 .098806 .1183609
educ | -.1802489 .0107034 -16.84 0.000 -.2012272 -.1592707
smoker | .080728 .0145963 5.53 0.000 .0521198 .1093362
gender | -.2019473 .0145416 -13.89 0.000 -.2304483 -.1734463
urban | -.1362217 .0112233 -12.14 0.000 -.1582189 -.1142245
capita_gdp | -8.36e-06 .0000194 -0.43 0.667 -.0000464 .0000297
_cons | -.4987429 .1638654 -3.04 0.002 -.8199132 -.1775726
-------------------------------------------------------------------------------------
My question is: how exactly can I interpret the coefficients of emissions and of the interaction region#c.emissions on the dependent variable? To my understanding, the coefficient of emissions for region 1 is the base level, and the effect of emissions in region 2 is lower than in region 1 by 0.176?

Correct. Two extra things worth noting:
Interactions work both ways: the interaction coefficient tells you that the emissions effect is 0.176 smaller in region 2, but also that the effect of being in region 2 is 0.176 smaller if emissions are one unit larger. That also means you cannot directly interpret any coefficient involved in the interaction (region and emissions) on its own, as each depends on the other.
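You can recover the total emissions effect for a given region with lincom, which adds the base coefficient and the relevant interaction term. A minimal sketch for region 2, using the coefficient names from the output above:

* total effect of emissions in region 2: base effect + interaction
lincom emissions + 2.region#c.emissions

This should return roughly 0.233 - 0.176 = 0.057 on the probit index scale.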
Stata has excellent margins and marginsplot commands that calculate for you what the predictions or marginal effects are at particular levels of region and/or emissions. They have a bit of a learning curve, but if you get the hang of them you can produce beautiful graphs to illustrate the interaction effect that will be much more informative than a long regression table.
There are many tutorials online on how to use margins and there's also this presentation by Ben Jann.
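For example, a minimal sketch building on the model above (the emissions grid 0(1)10 is purely illustrative and should be adapted to the actual range of the variable):

* average marginal effect of emissions within each region
margins region, dydx(emissions)

* predicted probability of bad health over a grid of emissions values, by region
margins region, at(emissions = (0(1)10))
marginsplot

The first call reports how the probability of bad health responds to emissions in every region; the second plots the fitted probabilities so the interaction is visible at a glance.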

Related

Regression with all variables without explicitly declaring them

I have a dataset that I would like to run a regression on in Stata. I want to make one of the dummy variables the base, so I use ib1.month1 in the regress command.
Is it possible to include in my regression all other variables in the dataset without explicitly writing out each variable again?
You can use the ds command, which with the not option stores in r(varlist) every variable except those listed:
sysuse auto, clear
drop make
ds price foreign, not
regress price ib1.foreign `r(varlist)'
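Note that r(varlist) is an r-class result and is overwritten by the next r-class command, so it is safest to run ds immediately before the regression, or stash the list in a local macro first. A small sketch:

ds price foreign, not
local rhs `r(varlist)'
regress price ib1.foreign `rhs'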
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(10, 58) = 8.66
Model | 345416162 10 34541616.2 Prob > F = 0.0000
Residual | 231380797 58 3989324.09 R-squared = 0.5989
-------------+---------------------------------- Adj R-squared = 0.5297
Total | 576796959 68 8482308.22 Root MSE = 1997.3
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign |
Domestic | -3334.848 957.2253 -3.48 0.001 -5250.943 -1418.754
mpg | -21.80518 77.3599 -0.28 0.779 -176.6578 133.0475
rep78 | 184.7935 331.7921 0.56 0.580 -479.3606 848.9476
headroom | -635.4921 383.0243 -1.66 0.102 -1402.198 131.2142
trunk | 71.49929 95.05012 0.75 0.455 -118.7642 261.7628
weight | 4.521161 1.411926 3.20 0.002 1.694884 7.347438
length | -76.49101 40.40303 -1.89 0.063 -157.3665 4.38444
turn | -114.2777 123.5374 -0.93 0.359 -361.5646 133.0092
displacement | 11.54012 8.378315 1.38 0.174 -5.230896 28.31115
gear_ratio | -318.6479 1124.34 -0.28 0.778 -2569.259 1931.964
_cons | 13124.34 6726.3 1.95 0.056 -339.8103 26588.5
------------------------------------------------------------------------------

3 MySQL table join but not getting the expected result

Here are my tables:
account
ac name
120 Tom
130 Jony
140 Jone
bread_sale
ac pcs amount date
120 12 60 2018-01-03
120 10 50 2018-01-04
140 8 40 2018-01-04
130 5 25 2018-01-05
water_sale
ac pcs amount date
130 2 30 2018-01-03
130 5 75 2018-01-04
140 3 45 2018-01-04
130 4 60 2018-01-05
120 5 75 2018-01-07
Here's the query that I have tried:
select account.ac,
account.name,
bread_sale.amount as BSAmount,
bread_sale.date as BSDate,
water_sale.amount as WSAmount,
water_sale.date as WSDate
from account left outer join bread_sale on account.ac = bread_sale.ac
left outer join water_sale on water_sale.ac = account.ac
order by account.ac
This is the result:
ac name BSAmount BSdate WSAmount WSdate
120 Tom 30 2018-01-03 75 2018-01-07
120 Tom 75 2018-01-04 75 2018-01-07
130 Jony 45 2018-01-05 30 2018-01-03
130 Jony 60 2018-01-05 75 2018-01-04
130 Jony 75 2018-01-05 60 2018-01-05
140 Jone 75 2018-01-04 45 2018-01-04
But I want to obtain something like this:
ac name BSAmount BSdate WSAmount WSdate
120 Tom 60 2018-01-03 75 2018-01-07
120 Tom 50 2018-01-04 0 2018-01-07
130 Jony 25 2018-01-05 30 2018-01-03
130 Jony 0 2018-01-05 75 2018-01-04
130 Jony 0 2018-01-05 60 2018-01-05
140 Jone 40 2018-01-04 45 2018-01-04
Tom only sold water once, on 2018-01-07, but its 75 amount is repeated on every one of his bread rows.
Can someone help me, please?
It is achievable but messy and probably not very efficient. There is a relationship between bread, water and account on ac. It is possible to establish the maximum number of row numbers spanning bread and water and then rejoin on row number. Put another way, bread and water are joined on a positional basis (i.e. the order in which rows appear in the tables). The resulting query is horrible and parses the data more often than I would personally be comfortable with.
So (here bs, ws and t stand for bread_sale, water_sale and account, and dt for date):
select * from
(
  select c.ac cac, t.name tname, d.amount bsamount, d.dt ddt,
         e.amount wsamount, e.dt edt,
         case when d.dt is not null and d.dt < e.dt then d.dt
              when d.dt is not null and e.dt is null then d.dt
              else e.dt
         end as sortorder
  from
  (
    -- every (ac, row number) pair present in either bread or water sales
    select *
    from
    (
      select bs.ac,
             if(bs.ac <> @p, @rn := 1, @rn := @rn + 1) rn,
             @p := bs.ac p
      from bs
      cross join (select @rn := 0, @p := 0) r
      order by bs.ac, bs.dt
    ) a
    union
    (
      select ac2, rn1, p1
      from
      (
        select ws.ac ac2,
               if(ws.ac <> @p1, @rn1 := 1, @rn1 := @rn1 + 1) rn1,
               @p1 := ws.ac p1
        from ws
        cross join (select @rn1 := 0, @p1 := 0) r
        order by ws.ac, ws.dt
      ) b
    )
  ) c
  left join
  (
    -- bread sales numbered 1,2,... within each account
    select bs.ac, pcs, amount, dt,
           if(bs.ac <> @p3, @rn3 := 1, @rn3 := @rn3 + 1) rn3,
           @p3 := bs.ac p
    from bs
    cross join (select @rn3 := 0, @p3 := 0) r
    order by bs.ac, bs.dt
  ) d
  on d.ac = c.ac and d.rn3 = c.rn
  left join
  (
    -- water sales numbered 1,2,... within each account
    select ws.ac, pcs, amount, dt,
           if(ws.ac <> @p4, @rn4 := 1, @rn4 := @rn4 + 1) rn4,
           @p4 := ws.ac p
    from ws
    cross join (select @rn4 := 0, @p4 := 0) r
    order by ws.ac, ws.dt
  ) e
  on e.ac = c.ac and e.rn4 = c.rn
  join t on t.ac = c.ac
) f
order by cac, sortorder;
+------+-------+----------+------------+----------+------------+------------+
| cac | tname | bsamount | ddt | wsamount | edt | sortorder |
+------+-------+----------+------------+----------+------------+------------+
| 120 | Tom | 60 | 2018-01-03 | 75 | 2018-01-07 | 2018-01-03 |
| 120 | Tom | 50 | 2018-01-04 | NULL | NULL | 2018-01-04 |
| 130 | Jony | 25 | 2018-01-05 | 30 | 2018-01-03 | 2018-01-03 |
| 130 | Jony | NULL | NULL | 75 | 2018-01-04 | 2018-01-04 |
| 130 | Jony | NULL | NULL | 60 | 2018-01-05 | 2018-01-05 |
| 140 | Jone | 40 | 2018-01-04 | 45 | 2018-01-04 | 2018-01-04 |
+------+-------+----------+------------+----------+------------+------------+
6 rows in set (0.00 sec)
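On MySQL 8.0 or later, window functions make the same positional join far less painful. A sketch under the same assumptions (the abbreviated tables bs, ws and t, and date column dt, from the answer above):

-- number each sale within its account, then join the two numbered lists positionally
with b as (select ac, amount, dt, row_number() over (partition by ac order by dt) rn from bs),
     w as (select ac, amount, dt, row_number() over (partition by ac order by dt) rn from ws),
     k as (select ac, rn from b union select ac, rn from w)
select k.ac, t.name,
       b.amount as bsamount, b.dt as bsdate,
       w.amount as wsamount, w.dt as wsdate
from k
left join b on b.ac = k.ac and b.rn = k.rn
left join w on w.ac = k.ac and w.rn = k.rn
join t on t.ac = k.ac
order by k.ac, k.rn;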
select account.ac,
account.name,
bread_sale.amount as BSAmount,
bread_sale.date as BSDate,
water_sale.amount as WSAmount,
water_sale.date as WSDate
from account left outer join bread_sale on account.ac = bread_sale.ac
left outer join water_sale on water_sale.ac = bread_sale.ac
order by account.ac
This query would produce the result below. Instead of zeros you get repeated values, because every water row for an account is paired with every bread row. Maybe this is the result you need.
ac name BSAmount BSdate WSAmount WSdate
120 Tom 60 2018-01-03 75 2018-01-07
120 Tom 50 2018-01-04 75 2018-01-07
130 Jony 25 2018-01-05 30 2018-01-03
130 Jony 25 2018-01-05 75 2018-01-04
130 Jony 25 2018-01-05 60 2018-01-05
140 Jone 40 2018-01-04 45 2018-01-04

Unit-specific Trends and R-squared near 1

I am currently working on a country panel dataset on which I am running a difference-in-differences regression including unit-specific trends in Stata.
My main concern is that the adjusted R-squared obtained is really high, sometimes even 0.99. I am assuming this is a sign of some kind of mistake, but I do not know how to correct it.
The model has nearly 5,000 observations, 201 countries, 36 years and 5 control variables, so the number of parameters is around 450.
Here I attach the code used:
xtset id_num year // id_num = id_country
reg `outcome' i.treatment i.year i.id_num c.year#i.id_num `controls' if id_country!="USA" & `subgroup'==1, cluster(id_num)
In case it is useful, this is the first part of the output:
note: 201.id_num#c.year omitted because of collinearity
Linear regression Number of obs = 4,789
F(39, 174) = .
Prob > F = .
R-squared = 0.9994
Root MSE = .20753
(Std. Err. adjusted for 175 clusters in id_country)
-------------------------------------------------------------------------------
| Robust
obesity_as | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
1.treatment | .1847802 .1341994 1.38 0.170 -.080088 .4496483
|
year |
1981 | .2162895 .0156983 13.78 0.000 .185306 .2472731
1982 | .4461132 .0224864 19.84 0.000 .4017319 .4904944
1983 | .6690157 .0281392 23.78 0.000 .6134777 .7245538
1984 | .915047 .0311529 29.37 0.000 .8535609 .9765332
1985 | 1.177176 .0344991 34.12 0.000 1.109085 1.245266
1986 | 1.421679 .0389734 36.48 0.000 1.344758 1.498601
1987 | 1.68354 .0413294 40.73 0.000 1.601969 1.765112
1988 | 1.963494 .0440206 44.60 0.000 1.876611 2.050377
1989 | 2.236331 .0472635 47.32 0.000 2.143048 2.329615
1990 | 2.52923 .0498206 50.77 0.000 2.4309 2.62756

How can I specify the base level of a factor variable?

I have data for 2000-2016 and I am trying to estimate the following regression:
xtset id
xtreg lnp i.year i.year#fp, fe vce(robust)
However, when I do this, Stata omits 2008 because of collinearity.
Is there a way to specify which year is omitted?
More generally, you can specify the omitted level of a factor variable (i.e. the base) by using the ib operator (see also help fvvarlist).
Below is a reproducible example using Stata's toy dataset nlswork:
webuse nlswork, clear
xtset idcode
Using 77 as the base year:
xtreg ln_wage ib77.year age, fe vce(robust)
Fixed-effects (within) regression Number of obs = 28,510
Group variable: idcode Number of groups = 4,710
R-sq: Obs per group:
within = 0.1060 min = 1
between = 0.0914 avg = 6.1
overall = 0.0805 max = 15
F(15,4709) = 69.49
corr(u_i, Xb) = 0.0467 Prob > F = 0.0000
(Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
| Robust
ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
year |
68 | -.108365 .1111117 -0.98 0.329 -.3261959 .1094659
69 | -.0335029 .0995142 -0.34 0.736 -.2285973 .1615915
70 | -.0604953 .0867605 -0.70 0.486 -.2305866 .1095959
71 | -.0218073 .0742761 -0.29 0.769 -.1674232 .1238087
72 | -.0226893 .0622792 -0.36 0.716 -.1447857 .0994071
73 | -.0203581 .049851 -0.41 0.683 -.1180894 .0773732
75 | -.0305043 .0259707 -1.17 0.240 -.081419 .0204104
78 | .0225868 .0147272 1.53 0.125 -.0062854 .0514591
80 | .0058999 .0381391 0.15 0.877 -.0688706 .0806704
82 | .0006801 .0622403 0.01 0.991 -.1213399 .1227001
83 | .0127622 .074435 0.17 0.864 -.1331653 .1586897
85 | .0381987 .0989316 0.39 0.699 -.1557535 .2321508
87 | .0298993 .1237839 0.24 0.809 -.2127751 .2725736
88 | .0716091 .1397635 0.51 0.608 -.2023927 .345611
|
age | .0125992 .0123091 1.02 0.306 -.0115323 .0367308
_cons | 1.312096 .3453967 3.80 0.000 .6349571 1.989235
-------------+----------------------------------------------------------------
sigma_u | .4058746
sigma_e | .30300411
rho | .64212421 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Using 80 as the base year:
xtreg ln_wage ib80.year age, fe vce(robust)
Fixed-effects (within) regression Number of obs = 28,510
Group variable: idcode Number of groups = 4,710
R-sq: Obs per group:
within = 0.1060 min = 1
between = 0.0914 avg = 6.1
overall = 0.0805 max = 15
F(15,4709) = 69.49
corr(u_i, Xb) = 0.0467 Prob > F = 0.0000
(Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
| Robust
ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
year |
68 | -.1142649 .1480678 -0.77 0.440 -.4045471 .1760172
69 | -.0394028 .136462 -0.29 0.773 -.3069323 .2281266
70 | -.0663953 .1237179 -0.54 0.592 -.3089402 .1761497
71 | -.0277072 .1112026 -0.25 0.803 -.2457164 .190302
72 | -.0285892 .0991208 -0.29 0.773 -.2229124 .165734
73 | -.026258 .0866489 -0.30 0.762 -.1961303 .1436142
75 | -.0364042 .0625743 -0.58 0.561 -.1590791 .0862706
77 | -.0058999 .0381391 -0.15 0.877 -.0806704 .0688706
78 | .0166869 .0258678 0.65 0.519 -.0340261 .0673999
82 | -.0052198 .0257713 -0.20 0.840 -.0557437 .0453041
83 | .0068623 .0378166 0.18 0.856 -.0672759 .0810005
85 | .0322987 .0620538 0.52 0.603 -.0893558 .1539533
87 | .0239993 .0868397 0.28 0.782 -.1462471 .1942457
88 | .0657092 .1028815 0.64 0.523 -.1359868 .2674052
|
age | .0125992 .0123091 1.02 0.306 -.0115323 .0367308
_cons | 1.317996 .3824809 3.45 0.001 .5681546 2.067838
-------------+----------------------------------------------------------------
sigma_u | .4058746
sigma_e | .30300411
rho | .64212421 (fraction of variance due to u_i)
------------------------------------------------------------------------------
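Applied to the model in the question, forcing a different base year for both the year dummies and the interaction would look something like this (a sketch; 2000 as the base is arbitrary, and whether 2008 then becomes estimable depends on the source of the collinearity):

xtset id
xtreg lnp ib2000.year ib2000.year#fp, fe vce(robust)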

MySQL matching row values sets

I am relatively new to MySQL and PHP. I have developed a hockey stats DB. Until now, I have been doing pretty basic queries and reporting of the stats.
I want to do a little more advanced query now.
I have a table that records which players were on the ice (columns fk_pp1_id through fk_pp5_id) when a goal is scored. Here is the table:
pt_id | fk_gf_id | fk_pp1_id | fk_pp2_id | fk_pp3_id | fk_pp4_id | fk_pp5_id
1 | 1 | 19 | 20 | 68 | 90 | 97
2 | 2 | 1 | 19 | 20 | 56 | 91
3 | 3 | 1 | 56 | 88 | 91 | 93
4 | 4 | 1 | 19 | 64 | 88 | NULL
5 | 5 | 19 | 62 | 68 | 88 | 97
6 | 6 | 55 | 19 | 20 | 45 | 62
7 | 7 | 1 | 19 | 20 | 56 | 61
8 | 8 | 65 | 68 | 90 | 93 | 97
9 | 9 | 19 | 20 | 45 | 55 | 62
10 | 10 | 1 | 19 | 20 | 56 | 61
11 | 11 | 1 | 19 | 20 | 56 | 61
12 | 12 | 19 | 20 | 68 | 90 | 97
13 | 13 | 19 | 20 | 68 | 90 | 97
14 | 14 | 19 | 20 | 55 | 62 | 91
15 | 15 | 1 | 56 | 61 | 64 | 88
16 | 16 | 1 | 56 | 61 | 64 | 88
17 | 17 | 1 | 19 | 20 | 56 | 61
18 | 18 | 1 | 19 | 20 | 56 | 61
19 | 19 | 1 | 65 | 68 | 93 | 97
I want to do several queries:
Show which five players were together on the ice most often when a goal was scored.
Select, say, 2 players and show which other players were on the ice most often with them when a goal was scored.
I was able to write a query which partially accomplishes query #1 above.
SELECT
fk_pp1_id,
fk_pp2_id,
fk_pp3_id,
fk_pp4_id,
fk_pp5_id,
count(*)
FROM TABLE1
group by
fk_pp1_id,
fk_pp2_id,
fk_pp3_id,
fk_pp4_id,
fk_pp5_id
Here are the results:
fk_pp1_id fk_pp2_id fk_pp3_id fk_pp4_id fk_pp5_id count(*)
1 19 20 56 61 4
1 19 20 56 91 1
1 19 64 88 (null) 1
1 56 61 64 88 2
1 56 88 91 93 1
1 65 68 93 97 1
19 1 20 56 61 1
19 20 45 55 62 1
19 20 55 62 91 1
19 20 68 90 97 3
19 62 68 88 97 1
55 19 20 45 62 1
65 68 90 93 97 1
See this sqlfiddle:
http://sqlfiddle.com/#!9/e3f5f/1
This seems to work at first, but I realized this query, as written, is sensitive to the order in which the players are listed. That is to say a row with:
1, 19, 20, 68, 90
will not match
19, 1, 20, 68, 90
So to fix this problem, I feel like I have a couple of options:
1. Ensure the data is input into the table in numerical order
2. Re-write the query so the order of the data in the table doesn't matter
3. Make the resulting query a sub-query of another query that first orders the columns (left to right) in numerical order
4. Change the schema to record/store the data in a better way
Option 1 I can do, but I would prefer the query to be fool-proof.
Options 2 or 3 I'd prefer, but I don't know how to do either.
Option 4 I don't know how to do, and it is the least desirable as I already have some complex queries against this table that would need to be totally re-written.
Am I going about this in the wrong way, or is there a solution?
Thanks for your help
UPDATE -
OK, I have (hopefully) better normalized the data in the table. Thanks @Strawberry. Now my table has a column for the goal_id (foreign key) and a column for the player_id (another foreign key) of each player who was on the ice at the time the goal was scored.
Here is the new fiddle:
http://sqlfiddle.com/#!9/39e5a
I can easily get the one player who was on the ice most often when goals are scored, but I can't get my mind around how to find the occurrences of a group of players who were on the ice together: for example, how many times a group of 5 players was on the ice together, and from there, how often a group of 2 players was on the ice together with 3 other players.
Any other clues?
I found a similar problem here and based on that I came up with this solution.
For the first part of your problem, to count how many times the same five players were on the ice when a goal was scored, your query could look like this:
SELECT GROUP_CONCAT(t1.fk_gf_id) AS MinOfGoal,
t1.players AS playersNumber,
COUNT(t1.fk_gf_id) AS numOfTimes
FROM (SELECT fk_gf_id, GROUP_CONCAT(fk_plyr_id ORDER BY fk_plyr_id) AS players
FROM Table1
GROUP BY fk_gf_id) AS t1
GROUP BY t1.players
ORDER BY numOfTimes DESC;
The ORDER BY fk_plyr_id inside GROUP_CONCAT is what makes the comparison order-insensitive: each goal's players collapse into one canonical comma-separated string. For the second part of the question, where you want to select two players and find the other players who were on the ice with them when a goal was scored, you extend the previous query with a WHERE clause like this:
SELECT GROUP_CONCAT(t1.fk_gf_id) AS MinOfGoal,
t1.players AS playersNumber,
COUNT(t1.fk_gf_id) AS numOfTimes
FROM (SELECT fk_gf_id, GROUP_CONCAT(fk_plyr_id ORDER BY fk_plyr_id) AS players
FROM Table1
WHERE fk_gf_id IN (SELECT fk_gf_id
FROM Table1
WHERE fk_plyr_id = 19)
AND fk_gf_id IN (SELECT fk_gf_id
FROM Table1
WHERE fk_plyr_id = 56)
GROUP BY fk_gf_id) AS t1
GROUP BY t1.players
ORDER BY numOfTimes DESC;
You can see how it works here in SQL Fiddle...
Note: I added some data to Table1 (don't be confused by the extra counts).
GL!