MySQL: why is LEFT JOIN slower than INNER JOIN? Optimization help required

I have a MySQL query which joins two tables. I need to map the call id from the first table to the second table. The second table may not have the call id, hence I need to left join the tables. Below is the query; it takes around 125 seconds to finish.
select uniqueid, TRANTAB.DISP, TRANTAB.DIAL FROM
closer_log LEFT JOIN
(select call_uniqueId, sum(dispo_duration) as DISP, sum(dialing_duration) as DIAL
from agent_transition_log group by call_uniqueId) TRANTAB
on closer_log.uniqueid=TRANTAB.call_uniqueId;
Here is the EXPLAIN output of the query with the left join:
+----+-------------+----------------------+-------+---------------+----------------------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+----------------------------+---------+------+--------+-------------+
| 1 | PRIMARY | closer_log | index | NULL | uniqueid | 43 | NULL | 37409 | Using index |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 32535 | |
| 2 | DERIVED | agent_transition_log | index | NULL | index_agent_transition_log | 43 | NULL | 159406 | |
+----+-------------+----------------------+-------+---------------+----------------------------+---------+------+--------+-------------+
If I use an inner join instead, execution time drops to around 2 seconds.
select uniqueid, TRANTAB.DISP, TRANTAB.DIAL FROM
closer_log JOIN
(select call_uniqueId, sum(dispo_duration) as DISP, sum(dialing_duration) as DIAL
from agent_transition_log group by call_uniqueId) TRANTAB
on closer_log.uniqueid=TRANTAB.call_uniqueId;
EXPLAIN output of the query with the inner join:
+----+-------------+----------------------+-------+------------------------------------+----------------------------+---------+-----------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+------------------------------------+----------------------------+---------+-----------------------+--------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 32535 | |
| 1 | PRIMARY | closer_log | ref | uniqueid,index_closer_log | index_closer_log | 43 | TRANTAB.call_uniqueId | 1 | Using where; Using index |
| 2 | DERIVED | agent_transition_log | index | NULL | index_agent_transition_log | 43 | NULL | 159406 | |
+----+-------------+----------------------+-------+------------------------------------+----------------------------+---------+-----------------------+--------+--------------------------+
My question is: why is the inner join so much faster than the left join? Does my query have a logical fault that is causing the slow execution? What are my optimization options? The call id columns in both tables are indexed.
Edit 1) Added table descriptions
mysql> desc agent_transition_log;
+--------------------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+----------------------+------+-----+---------+-------+
| user_log_id | int(9) unsigned | NO | MUL | NULL | |
| event_time | datetime | YES | | NULL | |
| dispoStatus | varchar(6) | YES | | NULL | |
| call_uniqueId | varchar(40) | YES | MUL | NULL | |
| xfer_call_uid | varchar(40) | YES | | NULL | |
| pause_duration | smallint(5) unsigned | YES | | 0 | |
| wait_duration | smallint(5) unsigned | YES | | 0 | |
| dialing_duration | smallint(5) unsigned | YES | | 0 | |
| ring_wait_duration | smallint(5) unsigned | YES | | 0 | |
| talk_duration | smallint(5) unsigned | YES | | 0 | |
| dispo_duration | smallint(5) unsigned | YES | | 0 | |
| park_duration | smallint(5) unsigned | YES | | 0 | |
| rec_duration | smallint(5) unsigned | YES | | 0 | |
| xfer_wait_duration | smallint(5) unsigned | YES | | 0 | |
| logged_in_duration | smallint(5) unsigned | YES | | 0 | |
| sub_status | varchar(6) | YES | | NULL | |
+--------------------+----------------------+------+-----+---------+-------+
16 rows in set (0.00 sec)
mysql> desc closer_log;
+----------------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------------+------+-----+---------+----------------+
| closecallid | int(9) unsigned | NO | PRI | NULL | auto_increment |
| lead_id | int(9) unsigned | NO | MUL | NULL | |
| list_id | bigint(14) unsigned | YES | | NULL | |
| campaign_id | varchar(20) | YES | MUL | NULL | |
| call_date | datetime | YES | MUL | NULL | |
| start_epoch | int(10) unsigned | YES | | NULL | |
| end_epoch | int(10) unsigned | YES | | NULL | |
| length_in_sec | int(10) | YES | | NULL | |
| status | varchar(6) | YES | | NULL | |
| phone_code | varchar(10) | YES | | NULL | |
| phone_number | varchar(18) | YES | MUL | NULL | |
| user | varchar(20) | YES | | NULL | |
| comments | varchar(255) | YES | | NULL | |
| processed | enum('Y','N') | YES | | NULL | |
| queue_seconds | decimal(7,2) | YES | | 0.00 | |
| user_group | varchar(20) | YES | | NULL | |
| xfercallid | int(9) unsigned | YES | | NULL | |
| uniqueid | varchar(40) | YES | MUL | NULL | |
| callerid | varchar(40) | YES | | NULL | |
| agent_only | varchar(20) | YES | | | |
| queue_position | smallint(4) unsigned | YES | | 1 | |
| root_uid | varchar(40) | YES | | NULL | |
| parent_uid | varchar(40) | YES | | NULL | |
| extension | varchar(100) | YES | | NULL | |
| alt_dial | varchar(6) | YES | | NULL | |
| talk_duration | smallint(5) unsigned | YES | | 0 | |
| did_pattern | varchar(50) | YES | | NULL | |
+----------------+----------------------+------+-----+---------+----------------+

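A LEFT JOIN returns every row of the left table, with NULLs in the right-hand columns where there is no match. That forces the join order: closer_log has to be read first, and for each of its rows MySQL must look up matching rows in TRANTAB. TRANTAB is a derived table with no usable index (EXPLAIN shows type ALL, key NULL), so every lookup scans the whole derived table: roughly 37,409 × 32,535 ≈ 1.2 billion row comparisons. An inner join only keeps direct matches, so the optimizer is free to reorder the tables: it scans TRANTAB once and, for each of its rows, uses an index on closer_log (index_closer_log, type ref, about 1 row per lookup in EXPLAIN). That is why joining on indexed fields makes the inner join so much cheaper here.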
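Following from that explanation, one way to keep the LEFT JOIN but avoid the repeated full scans is to give the right-hand side an index yourself: materialize the per-call totals once, index call_uniqueId, then join against that. The sketch below is a minimal, self-contained illustration that uses Python's built-in sqlite3 module as a stand-in for MySQL, with made-up sample rows and only the columns the query needs; the names trantab and trantab_call_uid are hypothetical. In MySQL the equivalent steps would be CREATE TEMPORARY TABLE ... SELECT followed by ALTER TABLE ... ADD INDEX, but measure it on your own data before relying on it.

```python
import sqlite3


def build_demo_db():
    """Tiny in-memory stand-in for the two tables (only the columns used)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE closer_log (uniqueid TEXT);
        CREATE INDEX closer_log_uid ON closer_log(uniqueid);
        CREATE TABLE agent_transition_log (
            call_uniqueId TEXT, dispo_duration INTEGER, dialing_duration INTEGER);
        INSERT INTO closer_log VALUES ('a'), ('b'), ('c');
        INSERT INTO agent_transition_log VALUES
            ('a', 5, 2), ('a', 3, 1), ('b', 4, 0), ('z', 9, 9);
    """)
    return conn


def left_join_with_indexed_aggregate(conn):
    """Materialize the per-call sums once, index them, then LEFT JOIN."""
    conn.executescript("""
        DROP TABLE IF EXISTS trantab;
        CREATE TABLE trantab AS
            SELECT call_uniqueId,
                   SUM(dispo_duration)   AS DISP,
                   SUM(dialing_duration) AS DIAL
            FROM agent_transition_log
            GROUP BY call_uniqueId;
        -- The index lets each closer_log row find its match without a full scan.
        CREATE INDEX trantab_call_uid ON trantab(call_uniqueId);
    """)
    return conn.execute("""
        SELECT c.uniqueid, t.DISP, t.DIAL
        FROM closer_log c
        LEFT JOIN trantab t ON c.uniqueid = t.call_uniqueId
        ORDER BY c.uniqueid
    """).fetchall()


if __name__ == "__main__":
    # [('a', 8, 3), ('b', 4, 0), ('c', None, None)]
    print(left_join_with_indexed_aggregate(build_demo_db()))
```

Call c comes back with NULL totals, as a LEFT JOIN should keep it; z is dropped because it has no closer_log row.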
By the way, if you only want to display the entries mentioned in agent_transition_log, you don't need a join at all:
select call_uniqueId, sum(dispo_duration) as DISP, sum(dialing_duration) as DIAL
from agent_transition_log group by call_uniqueId;
will do the job.
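Or, if you do want to add the missing closer_log entries (with NULL totals):
SELECT call_uniqueId, SUM(dispo_duration) AS DISP, SUM(dialing_duration) AS DIAL
FROM agent_transition_log GROUP BY call_uniqueId
UNION
SELECT uniqueid AS call_uniqueId, NULL AS DISP, NULL AS DIAL FROM closer_log
WHERE NOT EXISTS (SELECT 1 FROM agent_transition_log a WHERE a.call_uniqueId = closer_log.uniqueid);
Use NOT EXISTS here rather than uniqueid NOT IN (SELECT call_uniqueId ...): call_uniqueId is nullable, and a single NULL returned by the subquery makes NOT IN match no rows at all.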
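The NOT IN pitfall with a nullable call_uniqueId column is easy to reproduce. This sketch again uses Python's sqlite3 as a stand-in for MySQL (both follow standard SQL NULL semantics for NOT IN), with two made-up rows per table; the helper names are hypothetical.

```python
import sqlite3


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE closer_log (uniqueid TEXT);
        CREATE TABLE agent_transition_log (call_uniqueId TEXT);
        INSERT INTO closer_log VALUES ('a'), ('c');
        -- One log row has no call id, which the schema allows (Null = YES).
        INSERT INTO agent_transition_log VALUES ('a'), (NULL);
    """)
    return conn


def missing_ids(conn, use_not_exists):
    """Return closer_log ids that have no agent_transition_log row."""
    if use_not_exists:
        sql = """
            SELECT uniqueid FROM closer_log c
            WHERE NOT EXISTS (SELECT 1 FROM agent_transition_log a
                              WHERE a.call_uniqueId = c.uniqueid)
            ORDER BY uniqueid"""
    else:
        # 'c' NOT IN ('a', NULL) evaluates to NULL, so the row is filtered out.
        sql = """
            SELECT uniqueid FROM closer_log
            WHERE uniqueid NOT IN (SELECT call_uniqueId FROM agent_transition_log)
            ORDER BY uniqueid"""
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_demo_db()
    print(missing_ids(conn, use_not_exists=False))  # []
    print(missing_ids(conn, use_not_exists=True))   # ['c']
```

With one NULL call id in the log, the NOT IN variant loses call c entirely, while NOT EXISTS still reports it.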

Related

How to add a foreign key using an existing column in MySQL

I have the following table called transactions, and I want to add a foreign key using the column account_id. But when I execute the query below, it reports 0 rows affected. I don't understand what is wrong with the query.
I have two tables called transactions and accounts. The accounts table has id as its primary key, and an account has_many transactions.
transactions table
+---------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| account_number | int(11) | YES | | NULL | |
| m_number | varchar(255) | YES | | NULL | |
| registration_number | varchar(255) | YES | | NULL | |
| page_number | varchar(255) | YES | | NULL | |
| entry_date | datetime | YES | | NULL | |
| narration | varchar(255) | YES | | NULL | |
| voucher_number | varchar(255) | YES | | NULL | |
| debit_credit | float | YES | | NULL | |
| profit | float | YES | | NULL | |
| account_code | int(11) | YES | | NULL | |
| balance | float | YES | | NULL | |
| branch | varchar(255) | YES | | NULL | |
| account_id | int(11) | YES | MUL | NULL | |
| created_at | datetime | YES | | NULL | |
| updated_at | datetime | YES | | NULL | |
+---------------------+--------------+------+-----+---------+----------------+
16 rows in set (0,01 sec)
mysql> ALTER TABLE transactions ADD FOREIGN KEY (account_id) REFERENCES accounts(id);
Query OK, 0 rows affected (0,03 sec)
Records: 0 Duplicates: 0 Warnings: 0
accounts table
+-----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| account_number | int(11) | YES | | NULL | |
| account_code | int(11) | YES | | NULL | |
| branch | varchar(255) | YES | | NULL | |
| name | varchar(255) | YES | | NULL | |
| father_name | varchar(255) | YES | | NULL | |
| nic_number | varchar(255) | YES | | NULL | |
| mohalla | varchar(255) | YES | | NULL | |
| village | varchar(255) | YES | | NULL | |
| nominee | varchar(255) | YES | | NULL | |
| relationship | varchar(255) | YES | | NULL | |
| opening_balance | float | YES | | NULL | |
| opening_date | datetime | YES | | NULL | |
| created_at | datetime | YES | | NULL | |
| updated_at | datetime | YES | | NULL | |
+-----------------+--------------+------+-----+---------+----------------+
After running the query, shouldn't the column's Key say FK, or something like PRI?
Any help would be great. Thanks!

Select tracks and the date that has maximum visits

Here is the deal. I have two tables
tracks
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| track_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| artist_id | int(11) | YES | | NULL | |
| genre_id | int(11) | YES | | NULL | |
| track_artist | varchar(255) | YES | | NULL | |
| track_title | varchar(255) | NO | | NULL | |
| track_lyric | text | YES | | NULL | |
| track_video | varchar(255) | YES | | NULL | |
| play_url | varchar(255) | YES | | NULL | |
| shares | int(11) | NO | | 0 | |
| likes | int(11) | NO | | 0 | |
| dislikes | int(11) | NO | | 0 | |
| is_active | enum('T','F') | NO | | T | |
| created_at | int(10) | YES | | NULL | |
+--------------+------------------+------+-----+---------+----------------+
track_visits
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| track_visit_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| track_id | int(11) | NO | MUL | NULL | |
| ip_address | varchar(255) | NO | | NULL | |
| created_at | int(10) | YES | | NULL | |
+----------------+------------------+------+-----+---------+----------------+
And the question is:
How can I select all the tracks and the date that has maximum visits for every specific track?
Regards
Nasko
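Since you are using MySQL, which (with ONLY_FULL_GROUP_BY disabled) lets you select non-aggregated columns alongside GROUP BY, you can use this. Note that it picks each track's most recent visit (MAX(created_at)), not the day with the most visits: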
SELECT tracks.*, track_visits.*
FROM
tracks LEFT JOIN (
SELECT track_id, MAX(created_at) MaxCreatedAt
FROM track_visits
GROUP BY track_id) mx
ON tracks.track_id=mx.track_id
LEFT JOIN track_visits
ON mx.track_id=track_visits.track_id
AND mx.MaxCreatedAt=track_visits.created_at
GROUP BY
tracks.track_id
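If what you need is literally the day with the most visits (rather than the latest visit), one approach is to count visits per track per day first and then keep each track's busiest day. A minimal sketch, assuming created_at holds a Unix timestamp and that ties should return one row per tied day. It uses Python's sqlite3 as a stand-in, where date(created_at, 'unixepoch') plays the role of MySQL's DATE(FROM_UNIXTIME(created_at)) (FROM_UNIXTIME uses the session time zone, while the SQLite version is UTC). The WITH clause needs MySQL 8.0 or later; on older versions the same query can be written with a derived table. Sample rows are made up.

```python
import sqlite3

# For each track: the calendar day with the most visits (ties -> one row per day).
# Tracks with no visits are kept, with NULL day and count, via the LEFT JOIN.
TOP_DAY_SQL = """
WITH daily AS (
    SELECT track_id,
           date(created_at, 'unixepoch') AS visit_day,
           COUNT(*) AS visits
    FROM track_visits
    GROUP BY track_id, visit_day
)
SELECT t.track_id, t.track_title, d.visit_day, d.visits
FROM tracks t
LEFT JOIN daily d
       ON d.track_id = t.track_id
      AND d.visits = (SELECT MAX(d2.visits) FROM daily d2
                      WHERE d2.track_id = t.track_id)
ORDER BY t.track_id, d.visit_day
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tracks (track_id INTEGER PRIMARY KEY, track_title TEXT NOT NULL);
        CREATE TABLE track_visits (
            track_visit_id INTEGER PRIMARY KEY,
            track_id INTEGER NOT NULL,
            created_at INTEGER);
        INSERT INTO tracks VALUES (1, 'Song A'), (2, 'Song B'), (3, 'Song C');
        INSERT INTO track_visits (track_id, created_at) VALUES
            (1, 1577836800), (1, 1577840400), (1, 1577923200),  -- 2x Jan 1, 1x Jan 2
            (2, 1577836800), (2, 1578009600);                   -- tie: Jan 1 and Jan 3
    """)
    return conn


def busiest_days(conn):
    return conn.execute(TOP_DAY_SQL).fetchall()


if __name__ == "__main__":
    for row in busiest_days(build_demo_db()):
        print(row)
```

Song B shows up twice because its two days tie; if you need exactly one row per track, add a tie-breaker such as the earliest day.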

Determine if a field from one table is like the field from another table and if so count occurrence and if less than 3 return the name

I have two tables, images and Interactions, in the same DB. I want to determine whether image.images, which holds a PNG file name such as
1001_A01_1-4_5mM_3AT_Xgal_7d_W.cropped.resized.grey.png, is like plate_name.Interactions, which would look like 1001_A01, and then count how many times the images show up. I should get 3 images; if there are fewer than 3, I would like plate_name.Interactions returned. I would like to do this in one query.
So far I have just tried to count how many occurrences there are but this is failing:
select plate_name.Interactions, count(*) as count from Interactions where plate_name.Interactions like image.images;
Here are the tables in question:
mysql> desc images;
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| image | varchar(100) | NO | MUL | NULL | |
| user_id | varchar(50) | YES | | NULL | |
| project_id | varchar(50) | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
mysql> desc Interactions;
+----------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+-------------+------+-----+---------+-------+
| plate_name | varchar(25) | NO | MUL | NULL | |
| plate_number | int(11) | NO | MUL | NULL | |
| bait_sequence_name | varchar(25) | NO | | NULL | |
| bait_gene_promoter | varchar(25) | NO | MUL | NULL | |
| array_coord | varchar(25) | NO | | NULL | |
| transcriptor_factor | varchar(25) | NO | | NULL | |
| orf_name | varchar(25) | NO | | NULL | |
| y_coord | varchar(25) | NO | MUL | NULL | |
| x_coord | varchar(25) | NO | MUL | NULL | |
| orig_intensity_value | varchar(25) | NO | | NULL | |
| rc_intensity_value | varchar(25) | NO | | NULL | |
| ptp_intensity_value | varchar(25) | NO | | NULL | |
| z_score | varchar(25) | NO | MUL | NULL | |
| z_prime | varchar(20) | YES | | NULL | |
| call_type | varchar(25) | NO | | NULL | |
| bleed_over | varchar(25) | NO | MUL | NULL | |
| plate_median | int(11) | YES | MUL | NULL | |
| bait_gene | varchar(25) | NO | | NULL | |
| bait_prey_orf | varchar(25) | NO | | NULL | |
| human_call | varchar(25) | NO | | NULL | |
| modified_call | varchar(25) | NO | | NULL | |
| duplicate_call | varchar(25) | YES | | NULL | |
| user_id | varchar(25) | YES | MUL | NULL | |
| project_id | varchar(25) | YES | MUL | NULL | |
+----------------------+-------------+------+-----+---------+-------+
Added in response to Bobby's answer:
+----+-------------+--------------+-------+---------------+--------------+---------+------+---------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+--------------+---------+------+---------+-----------------------------------------------------------+
| 1 | SIMPLE | images | index | NULL | image | 208 | NULL | 19581 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | Interactions | range | plate_median | plate_median | 5 | NULL | 3714984 | Using where; Using join buffer |
+----+-------------+--------------+-------+---------------+--------------+---------+------+---------+-----------------------------------------------------------+
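I think you have the table and column names the wrong way round: it should be table.column (e.g. Interactions.plate_name), not plate_name.Interactions.
If I understand correctly, the query should return only the plate names that have fewer than 3 images, so you can do it with GROUP BY and HAVING. Since the plate name is the start of the image file name, the LIKE pattern has to be built from the plate name; a LEFT JOIN keeps plates with no images (count 0), and COUNT(DISTINCT ...) avoids counting an image once per Interactions row:
SELECT Interactions.plate_name, COUNT(DISTINCT images.image) AS `count`
FROM Interactions
LEFT JOIN images ON images.image LIKE CONCAT(Interactions.plate_name, '%')
GROUP BY Interactions.plate_name
HAVING COUNT(DISTINCT images.image) < 3
Please try that, and let me know if that's not what you want. :)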
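The prefix match can be checked on a few made-up rows. This sketch uses Python's sqlite3 as a stand-in for MySQL, so the pattern is built with || where MySQL needs CONCAT() (in MySQL, || means OR unless PIPES_AS_CONCAT is enabled). It assumes an image belongs to a plate when its file name starts with the plate name, and counts distinct images because Interactions holds many rows per plate.

```python
import sqlite3

# Plates with fewer than 3 matching images, including plates with none.
FEW_IMAGES_SQL = """
SELECT i.plate_name, COUNT(DISTINCT im.image) AS image_count
FROM Interactions i
LEFT JOIN images im ON im.image LIKE i.plate_name || '%'
GROUP BY i.plate_name
HAVING COUNT(DISTINCT im.image) < 3
ORDER BY i.plate_name
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE images (image TEXT NOT NULL);
        CREATE TABLE Interactions (plate_name TEXT NOT NULL);
        -- Plate 1001_A01 appears twice, as Interactions has many rows per plate.
        INSERT INTO Interactions VALUES
            ('1001_A01'), ('1001_A01'), ('1001_A02'), ('1002_B01');
        INSERT INTO images VALUES
            ('1001_A01_1-4_5mM_3AT_Xgal_7d_W.cropped.resized.grey.png'),
            ('1001_A01_1-4_5mM_3AT_Xgal_8d_W.cropped.resized.grey.png'),
            ('1001_A01_1-4_5mM_3AT_Xgal_9d_W.cropped.resized.grey.png'),
            ('1001_A02_1-4_5mM_3AT_Xgal_7d_W.cropped.resized.grey.png');
    """)
    return conn


def plates_with_few_images(conn):
    return conn.execute(FEW_IMAGES_SQL).fetchall()


if __name__ == "__main__":
    print(plates_with_few_images(build_demo_db()))
```

Plate 1001_A01 has its 3 images and is excluded; 1001_A02 (1 image) and 1002_B01 (0 images) are returned. Note that _ in the plate names is a single-character LIKE wildcard, which is harmless for names shaped like these.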

How to make this SQL query faster?

I have the following query:
SELECT DISTINCT `movies_manager_movie`.`id`,
`movies_manager_movie`.`title`,
`movies_manager_movie`.`original_title`,
`movies_manager_movie`.`synopsis`,
`movies_manager_movie`.`keywords`,
`movies_manager_movie`.`release_date`,
`movies_manager_movie`.`rating`,
`movies_manager_movie`.`poster_web_url`,
`movies_manager_movie`.`has_poster`,
`movies_manager_movie`.`number`,
`movies_manager_movie`.`has_sources`,
`movies_manager_movie`.`season_id`,
`movies_manager_movie`.`created`,
`movies_manager_movie`.`updated`,
`movies_manager_moviecache`.`activity_name`
FROM `movies_manager_movie`
LEFT OUTER JOIN `movies_manager_moviecache` ON (`movies_manager_movie`.`id` = `movies_manager_moviecache`.`movie_id`)
WHERE (`movies_manager_movie`.`has_sources` = 1
AND (`movies_manager_moviecache`.`team_member_id` IN (
SELECT U0.`id` FROM `movies_manager_movieteammember` U0
INNER JOIN `movies_manager_movieteammemberactivity` U1 ON (U0.`id` = U1.`team_member_id`)
WHERE U1.`movie_id` = 3588 )
AND `movies_manager_movie`.`number` IS NULL
)
AND NOT (`movies_manager_movie`.`id` = 3588 ))
ORDER BY `movies_manager_moviecache`.`activity_name` DESC LIMIT 3;
This query can take up to 3 seconds, and I'm very surprised, since I have indexes everywhere and no more than 35 rows in each of my MyISAM tables, using the latest MySQL version.
I cached everything I could, but I still have to run this one at least 20,000 times every day, which adds up to roughly 16 hours of waiting. And I'm pretty sure none of my users (nor Googlebot) appreciate a 4-second wait on every page load.
What could I do to make it faster?
I thought about duplicating fields from movie into moviecache, since the whole purpose of moviecache is already to denormalize a complex join.
I tried inlining the subquery as a list of IDs, but surprisingly that doubled the query time.
Tables:
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(120) | NO | UNI | NULL | |
| original_title | varchar(120) | YES | | NULL | |
| synopsis | longtext | YES | | NULL | |
| keywords | varchar(120) | YES | | NULL | |
| release_date | date | YES | | NULL | |
| rating | int(11) | NO | | NULL | |
| poster_web_url | varchar(255) | YES | | NULL | |
| has_poster | tinyint(1) | NO | | NULL | |
| number | int(11) | YES | | NULL | |
| season_id | int(11) | YES | MUL | NULL | |
| created | datetime | NO | | NULL | |
| updated | datetime | NO | | NULL | |
| has_sources | tinyint(1) | NO | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
+---------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(120) | NO | UNI | NULL | |
| biography | longtext | YES | | NULL | |
| birth_date | date | YES | | NULL | |
| picture_web_url | varchar(255) | YES | | NULL | |
| allocine_link | varchar(255) | YES | | NULL | |
| created | datetime | NO | | NULL | |
| updated | datetime | NO | | NULL | |
| has_picture | tinyint(1) | NO | | NULL | |
| biography_linkyfied | longtext | YES | | NULL | |
+---------------------+--------------+------+-----+---------+----------------+
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| movie_id | int(11) | NO | MUL | NULL | |
| tag_slug | varchar(100) | YES | MUL | NULL | |
| team_member_id | int(11) | YES | MUL | NULL | |
| cast_rank | int(11) | YES | | NULL | |
| activity_name | varchar(30) | YES | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
MySQL logs it as a slow query:
# Query_time: 3 Lock_time: 0 Rows_sent: 9 Rows_examined: 454128
Move movies_manager_movieteammemberactivity and movies_manager_movieteammember into your main join (so that you're doing a LEFT OUTER JOIN between movies_manager_movie and the inner-join product of the other 3 tables). This should speed up your query considerably.
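A sketch of that join rewrite, checked against the original IN-subquery form on made-up rows, using Python's sqlite3 as a stand-in for MySQL. The column list is trimmed to id, title and activity_name for brevity. Because the original WHERE clause already discards movies that have no matching moviecache row, the outer join behaves like an inner join, so this sketch uses plain joins throughout; whether it is actually faster on your data is something to confirm with EXPLAIN.

```python
import sqlite3

# Original shape: LEFT JOIN moviecache, filtered by team_member_id IN (subquery).
IN_SUBQUERY_SQL = """
SELECT DISTINCT m.id, m.title, mc.activity_name
FROM movies_manager_movie m
LEFT OUTER JOIN movies_manager_moviecache mc ON m.id = mc.movie_id
WHERE m.has_sources = 1
  AND mc.team_member_id IN (
      SELECT U0.id FROM movies_manager_movieteammember U0
      INNER JOIN movies_manager_movieteammemberactivity U1
              ON U0.id = U1.team_member_id
      WHERE U1.movie_id = ?)
  AND m.number IS NULL
  AND m.id <> ?
ORDER BY mc.activity_name DESC
LIMIT 3
"""

# Rewrite: the team-member tables are joined directly; DISTINCT removes the
# duplicates that appear when one member has several matching activities.
JOINED_SQL = """
SELECT DISTINCT m.id, m.title, mc.activity_name
FROM movies_manager_movie m
JOIN movies_manager_moviecache mc ON mc.movie_id = m.id
JOIN movies_manager_movieteammember tm ON tm.id = mc.team_member_id
JOIN movies_manager_movieteammemberactivity tma
     ON tma.team_member_id = tm.id AND tma.movie_id = ?
WHERE m.has_sources = 1
  AND m.number IS NULL
  AND m.id <> ?
ORDER BY mc.activity_name DESC
LIMIT 3
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE movies_manager_movie (
            id INTEGER PRIMARY KEY, title TEXT, number INTEGER, has_sources INTEGER);
        CREATE TABLE movies_manager_movieteammember (id INTEGER PRIMARY KEY);
        CREATE TABLE movies_manager_movieteammemberactivity (
            movie_id INTEGER, team_member_id INTEGER);
        CREATE TABLE movies_manager_moviecache (
            movie_id INTEGER, team_member_id INTEGER, activity_name TEXT);
        INSERT INTO movies_manager_movie VALUES
            (3588, 'Current', NULL, 1), (1, 'Movie 1', NULL, 1),
            (2, 'Movie 2', NULL, 1), (3, 'Movie 3', NULL, 1), (4, 'Movie 4', 5, 1);
        INSERT INTO movies_manager_movieteammember VALUES (10), (11), (12);
        INSERT INTO movies_manager_movieteammemberactivity VALUES (3588, 10), (3588, 11);
        INSERT INTO movies_manager_moviecache VALUES
            (1, 10, 'Actor'), (1, 11, 'Actor'), (2, 11, 'Director'),
            (3, 12, 'Writer'), (4, 10, 'Actor'), (3588, 10, 'Actor');
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    params = (3588, 3588)  # (movie whose team we match on, movie to exclude)
    print(conn.execute(IN_SUBQUERY_SQL, params).fetchall())
    print(conn.execute(JOINED_SQL, params).fetchall())
```

Both queries return the same rows here: Movie 3 drops out because member 12 did not work on movie 3588, and Movie 4 because its number is not NULL.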
Try this:
SELECT `movies_manager_movie`.`id`,
`movies_manager_movie`.`title`,
`movies_manager_movie`.`original_title`,
`movies_manager_movie`.`synopsis`,
`movies_manager_movie`.`keywords`,
`movies_manager_movie`.`release_date`,
`movies_manager_movie`.`rating`,
`movies_manager_movie`.`poster_web_url`,
`movies_manager_movie`.`has_poster`,
`movies_manager_movie`.`number`,
`movies_manager_movie`.`has_sources`,
`movies_manager_movie`.`season_id`,
`movies_manager_movie`.`created`,
`movies_manager_movie`.`updated`,
(
SELECT `movies_manager_moviecache`.`activity_name`
FROM `movies_manager_moviecache`
WHERE (`movies_manager_movie`.`id` = `movies_manager_moviecache`.`movie_id`
AND (`movies_manager_moviecache`.`team_member_id` IN (
SELECT U0.`id` FROM `movies_manager_movieteammember` U0
INNER JOIN `movies_manager_movieteammemberactivity` U1 ON (U0.`id` = U1.`team_member_id`)
WHERE U1.`movie_id` = 3588 )
AND `movies_manager_movie`.`number` IS NULL
) ) LIMIT 1) AS `activity_name`
FROM `movies_manager_movie`
WHERE (`movies_manager_movie`.`has_sources` = 1
AND NOT (`movies_manager_movie`.`id` = 3588 ))
ORDER BY `activity_name` DESC
LIMIT 3;
Let me know how that performs

MySQL: How to find player_id from surname?

I'm now trying to populate my 'testMatch' table (below) with data from my unnormalised 'summary' table:
TESTMATCH TABLE
+------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| match_id | int(11) | NO | PRI | NULL | |
| match_date | date | YES | | NULL | |
| ground | varchar(50) | YES | MUL | NULL | |
| homeTeam | varchar(100) | YES | MUL | NULL | |
| awayTeam | varchar(100) | YES | MUL | NULL | |
| matchResult | varchar(100) | YES | MUL | NULL | |
| manOfMatch | varchar(30) | YES | | NULL | |
| homeTeam_captain | int(10) | YES | MUL | NULL | |
| homeTeam_keeper | int(10) | YES | MUL | NULL | |
| awayTeam_captain | int(10) | YES | MUL | NULL | |
| awayTeam_keeper | int(10) | YES | MUL | NULL | |
+------------------+--------------+------+-----+---------+-------+
There is no problem populating match_id through manOfMatch; it is 'homeTeam_captain', 'homeTeam_keeper', 'awayTeam_captain' and 'awayTeam_keeper' that I'm having problems bringing in.
SUMMARY TABLE
mysql> DESCRIBE SUMMARY;
+-----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| matchID | int(11) | NO | PRI | NULL | |
| Test | int(11) | YES | | NULL | |
| matchDate | date | YES | | NULL | |
| Ground | varchar(50) | YES | | NULL | |
| HomeTeam | varchar(100) | YES | | NULL | |
| AwayTeam | varchar(100) | YES | | NULL | |
| matchResult | varchar(50) | YES | | NULL | |
| MarginRuns | int(11) | YES | | NULL | |
| MarginWickets | int(11) | YES | | NULL | |
| ManOfMatch | varchar(40) | YES | | NULL | |
| HomeTeamCaptain | varchar(30) | YES | | NULL | |
| HomeTeamKeeper | varchar(30) | YES | | NULL | |
| AwayTeamCaptain | varchar(30) | YES | | NULL | |
| AwayTeamKeeper | varchar(30) | YES | | NULL | |
+-----------------+--------------+------+-----+---------+-------+
I need to somehow select the data from summary, get the corresponding player_id and input the player_id into my 'testMatch'. Player table below:
PLAYERS TABLE
mysql> describe players;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| player_id | int(11) | NO | PRI | NULL | auto_increment |
| player_surname | varchar(30) | YES | | NULL | |
| team | varchar(100) | YES | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
So to clarify: for example, I want to select the homeTeam_captain data FROM the summary table, but instead of the name I want the corresponding player_id.
I assume I need some sort of join or subquery to get this done. I've tried selecting with:
select matchID, player_id, player_surname, team from players p, summary s
where
s.hometeamcaptain = p.player_surname ORDER BY matchID;
but this brings back 73 rows and there should only be 65 (65 matches).
I hope this makes sense and thanks for your help!!
Theresa
Are there overlapping names? If so, also make sure the teams correspond (add s.HomeTeam = p.team to the WHERE clause). If there are players with the same name on one team, you will have to resolve those conflicts manually.
To select all the keepers and captains at once, you need left outer joins: one join per role, so you join the players table 4 times, each time under a different alias.
Once you've selected the right data, you can insert it in your testMatch table with INSERT ... SELECT.
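A sketch of that approach on simplified tables (only the team and role columns; match_date, ground and the other columns are omitted), using Python's sqlite3 as a stand-in for MySQL, where the same INSERT ... SELECT syntax works. The sample players are made up: Smith plays for both teams, the kind of surname clash that can produce extra rows when you join on surname alone.

```python
import sqlite3

# One LEFT JOIN to players per role, each with its own alias; matching on team
# as well as surname keeps players who share a surname across teams apart.
INSERT_SQL = """
INSERT INTO testMatch (match_id, homeTeam, awayTeam,
                       homeTeam_captain, homeTeam_keeper,
                       awayTeam_captain, awayTeam_keeper)
SELECT s.matchID, s.HomeTeam, s.AwayTeam,
       hc.player_id, hk.player_id, ac.player_id, ak.player_id
FROM summary s
LEFT JOIN players hc ON hc.player_surname = s.HomeTeamCaptain AND hc.team = s.HomeTeam
LEFT JOIN players hk ON hk.player_surname = s.HomeTeamKeeper  AND hk.team = s.HomeTeam
LEFT JOIN players ac ON ac.player_surname = s.AwayTeamCaptain AND ac.team = s.AwayTeam
LEFT JOIN players ak ON ak.player_surname = s.AwayTeamKeeper  AND ak.team = s.AwayTeam
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE players (player_id INTEGER PRIMARY KEY,
                              player_surname TEXT, team TEXT);
        CREATE TABLE summary (matchID INTEGER PRIMARY KEY, HomeTeam TEXT, AwayTeam TEXT,
                              HomeTeamCaptain TEXT, HomeTeamKeeper TEXT,
                              AwayTeamCaptain TEXT, AwayTeamKeeper TEXT);
        CREATE TABLE testMatch (match_id INTEGER PRIMARY KEY,
                                homeTeam TEXT, awayTeam TEXT,
                                homeTeam_captain INTEGER, homeTeam_keeper INTEGER,
                                awayTeam_captain INTEGER, awayTeam_keeper INTEGER);
        INSERT INTO players VALUES
            (1, 'Smith', 'England'), (2, 'Jones', 'England'),
            (3, 'Smith', 'Australia'), (4, 'Brown', 'Australia');
        INSERT INTO summary VALUES
            (1, 'England', 'Australia', 'Smith', 'Jones', 'Smith', 'Brown'),
            (2, 'Australia', 'England', 'Smith', 'Unknown', 'Smith', 'Jones');
    """)
    return conn


def populate_test_match(conn):
    conn.execute(INSERT_SQL)
    return conn.execute("SELECT * FROM testMatch ORDER BY match_id").fetchall()


if __name__ == "__main__":
    for row in populate_test_match(build_demo_db()):
        print(row)
```

Each match yields exactly one row. An unmatched name (Unknown in match 2) comes through as NULL instead of dropping the match, which is why LEFT JOIN is used; checking for NULLs afterwards shows which names need fixing by hand.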