Subquery has been wrong all this time what do I do? - mysql

So I have the following table structure for a Sports Event system
TEAMS TABLE
team_id
game_id
team_name
team_logo
PLAYERS TABLE
player_id
team_id
player_name
player_mobile
player_email
So whenever a player submits a team registration, the details get saved in both tables. Events could be something like Cricket, Basketball, Netball, etc. Sometimes they don't fill in the players' details, and sometimes they resubmit their team, which means the same team name is submitted again.
So whenever I need to check the accurate details of the team list I have been using this:
SELECT team_id FROM `teams` WHERE `game_id`= 35 GROUP BY `team_name`
To get a list of the people in these teams that are the same name I was using this:
SELECT team_id, player_name FROM `player` WHERE team_id IN (SELECT team_id FROM `teams` WHERE `game_id`= 35 GROUP BY `team_name`) AND player_name IS NOT NULL AND player_name <> ''
The problem is that the top query gives me different results from the bottom one. What I need is a list of current teams whenever I need it, with no duplicate teams. Then I need a list of the players of these teams.
Currently stumped :( Help me pls.

TL;DR
You can get the desired results with a JOIN and DISTINCT
SELECT DISTINCT t.team_name, p.player_name
FROM teams AS t
INNER JOIN Players AS p
ON p.team_id = t.team_id;
FULL EXPLANATION
The following query is not deterministic, that is to say, you could run the same query on the same data multiple times and get different results:
SELECT team_id
FROM `teams`
WHERE `game_id`= 35
GROUP BY `team_name`;
Many DBMS would not even allow this query to run. You have stated that some teams are duplicated, so consider the following dummy data:
team_id team_name game_id
------------------------------------
1 The A-Team 35
2 The A-Team 35
3 The A-Team 35
When you group by team_name you end up with one group, so if we start with a valid query:
SELECT team_name
FROM `teams`
WHERE `game_id`= 35
GROUP BY `team_name`;
We would expect one result:
team_name
--------------
The A-Team
When you add team_id to the select, with no aggregate function, the query engine needs to pick one value for team_id, but it has 3 different values to choose from, and none of them is more correct than any other. This is why anything in the select list must be contained within the group by (or functionally dependent on something that is), or be part of an aggregate function.
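To make that choice explicit, an aggregate such as MIN() can pick one team_id per team name. The following is a minimal sketch, run through Python's built-in sqlite3 module only so that it is self-contained; the rows mirror the dummy data above, and keeping the lowest team_id is an assumption about which duplicate should win (MAX() would keep the latest resubmission instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE teams (team_id INTEGER PRIMARY KEY, team_name TEXT, game_id INTEGER)"
)
conn.executemany(
    "INSERT INTO teams (team_id, team_name, game_id) VALUES (?, ?, ?)",
    [(1, "The A-Team", 35), (2, "The A-Team", 35), (3, "The A-Team", 35)],
)

# MIN() tells the engine which of the three candidate ids to return,
# so the result is the same on every run.
rows = conn.execute(
    """
    SELECT team_name, MIN(team_id) AS team_id
    FROM teams
    WHERE game_id = 35
    GROUP BY team_name
    """
).fetchall()
print(rows)
```

The SELECT itself is plain SQL and should behave the same way in MySQL.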
The MySQL Docs state:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.
The reason this clause exists is valid, and can save some time, consider the following query:
SELECT t.team_id, t.team_name, COUNT(p.player_id) AS Players
FROM teams AS t
LEFT JOIN Players AS p
ON p.team_id = t.team_id
GROUP BY t.team_id;
Here, we can include team_name in the select list even though it is not in the group by, but we can do this safely since team_id is the primary key, therefore it would be impossible to have two different values of team_name for a single team_id.
Anyway, I digress, the problem you are most likely having is that the value returned for team_id in each of your queries will likely be different depending on the context of the query and the execution plan chosen.
You can get a distinct list of players and teams using DISTINCT:
SELECT DISTINCT t.team_name, p.player_name
FROM teams AS t
INNER JOIN Players AS p
ON p.team_id = t.team_id;
This is essentially a hack: while it does remove duplicate records from the result, it does not resolve the underlying issues of duplicate data and a potentially sub-optimal data structure.
If it is not too late, I would reconsider your design and make a few changes. If team names are supposed to be unique, then make them unique with a unique constraint, so instead of working around duplicate entries, you prevent them completely.
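As a rough illustration of the unique constraint, the sketch below uses Python's built-in sqlite3 module to stay self-contained. It assumes uniqueness is wanted per game, hence the composite (game_id, team_name) key; that scoping is an assumption, not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE teams (
        team_id   INTEGER PRIMARY KEY,
        game_id   INTEGER NOT NULL,
        team_name TEXT NOT NULL,
        UNIQUE (game_id, team_name)  -- assumed scope: one name per game
    )
    """
)
conn.execute("INSERT INTO teams (game_id, team_name) VALUES (35, 'The A-Team')")

# A resubmission of the same team for the same game is now rejected
# by the database instead of being stored as a duplicate row.
try:
    conn.execute("INSERT INTO teams (game_id, team_name) VALUES (35, 'The A-Team')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

team_count = conn.execute("SELECT COUNT(*) FROM teams").fetchone()[0]
print(duplicate_rejected, team_count)
```

On an existing MySQL table the constraint would be added with ALTER TABLE ... ADD UNIQUE, which fails while duplicate rows are still present, so the existing duplicates have to be cleaned up first.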
You should probably be using junction tables for players and games, i.e. have your main tables
Team (team_id, team_name, team_logo etc)
Game (game_id, game_name, etc)
Player (player_id, player_name, player_email, player_mobile etc)
Then tables to link them
Team_Game (team_id, game_id)
Team_Player (team_id, player_id)
This then allows one player to play for multiple teams, or one team to enter multiple events.
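The layout above might look roughly like this. It is a sketch only, run through Python's built-in sqlite3 module; the column types, the sample rows, and the composite primary keys on the link tables are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE team   (team_id INTEGER PRIMARY KEY, team_name TEXT NOT NULL UNIQUE);
    CREATE TABLE game   (game_id INTEGER PRIMARY KEY, game_name TEXT NOT NULL);
    CREATE TABLE player (player_id INTEGER PRIMARY KEY, player_name TEXT NOT NULL);

    -- Link tables: the composite primary key stops the same pairing being stored twice.
    CREATE TABLE team_game (
        team_id INTEGER NOT NULL REFERENCES team (team_id),
        game_id INTEGER NOT NULL REFERENCES game (game_id),
        PRIMARY KEY (team_id, game_id)
    );
    CREATE TABLE team_player (
        team_id   INTEGER NOT NULL REFERENCES team (team_id),
        player_id INTEGER NOT NULL REFERENCES player (player_id),
        PRIMARY KEY (team_id, player_id)
    );

    INSERT INTO team   VALUES (1, 'The A-Team'), (2, 'The B-Team');
    INSERT INTO game   VALUES (35, 'Cricket'), (36, 'Netball');
    INSERT INTO player VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO team_game   VALUES (1, 35), (1, 36), (2, 36);
    INSERT INTO team_player VALUES (1, 1), (1, 2), (2, 2), (2, 3);
    """
)

# Players of every team entered in game 35; no DISTINCT or GROUP BY is needed
# because the schema itself cannot hold duplicate teams or pairings.
roster = conn.execute(
    """
    SELECT t.team_name, p.player_name
    FROM team_game AS tg
    JOIN team AS t         ON t.team_id = tg.team_id
    JOIN team_player AS tp ON tp.team_id = t.team_id
    JOIN player AS p       ON p.player_id = tp.player_id
    WHERE tg.game_id = 35
    ORDER BY t.team_name, p.player_name
    """
).fetchall()
print(roster)
```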

Select t.team_id , p.player_name from player p
JOIN teams t
ON t.team_id = p.team_id
Where t.game_id = 35 AND p.player_name IS NOT NULL AND p.player_name <> ''
GROUP BY(t.team_name)
You should add a unique constraint on the team_name column; this way you do not allow duplicate teams.
P.S. I did not test the query, but it should work.

Related

SQL Comment Grouping

I have two tables in MySQL
Table 1: List of ID's
--Just a single column list of ID's
Table 2: Groups
--Group Titles
--Members **
Now the member field is basically a comments field where all the ID's that are part of that group are listed. So for instance one whole field of members looks like this:
"ID003|ID004|ID005|ID006|ID007|ID008|... Etc."
There can be 500+ IDs listed in the field.
What I would like to do is to run a query and find out which IDs appear in only three or fewer groups.
I've been taking cracks at it, but honestly I'm totally lost. Any ideas?
Edit: I misunderstood the question the first time, so I'm changing my answer.
SELECT l.id
FROM List_of_ids AS l
JOIN Groups AS g ON CONCAT('|', g.members, '|') LIKE CONCAT('%|', l.id, '|%')
GROUP BY l.id
HAVING COUNT(*) <= 3
This is bound to perform very poorly, because it forces a table-scan of both tables. If you have 500 id's and 500 groups, it must run 250000 comparisons.
You should really consider if storing a symbol-separated list is the right way to do this. See my answer to Is storing a delimited list in a database column really that bad?
The proper way to design such a relationship is to create a third table that maps id's to groups:
CREATE TABLE GroupsIds (
memberid INT,
groupid INT,
PRIMARY KEY (memberid, groupid)
);
With this table, it would be much more efficient by using an index for the join:
SELECT l.id
FROM List_of_ids AS l
JOIN GroupsIds AS gi ON gi.memberid = l.id
GROUP BY l.id
HAVING COUNT(*) <= 3
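A small runnable sketch of the mapping-table approach, using Python's built-in sqlite3 module and made-up rows. It uses a LEFT JOIN variant of the query above so that ids belonging to no group at all are also reported; whether those should count as "three or fewer" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE List_of_ids (id INTEGER PRIMARY KEY);
    CREATE TABLE GroupsIds (
        memberid INTEGER,
        groupid  INTEGER,
        PRIMARY KEY (memberid, groupid)
    );
    INSERT INTO List_of_ids VALUES (3), (4), (5);
    -- id 3 is in four groups, id 4 in two, id 5 in none
    INSERT INTO GroupsIds VALUES (3, 1), (3, 2), (3, 3), (3, 4), (4, 1), (4, 2);
    """
)

# COUNT(gi.groupid) skips the NULL produced by the LEFT JOIN,
# so an id with no memberships gets a count of 0 rather than 1.
few_groups = conn.execute(
    """
    SELECT l.id, COUNT(gi.groupid) AS group_count
    FROM List_of_ids AS l
    LEFT JOIN GroupsIds AS gi ON gi.memberid = l.id
    GROUP BY l.id
    HAVING COUNT(gi.groupid) <= 3
    ORDER BY l.id
    """
).fetchall()
print(few_groups)
```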
select * from
(
select ID,
(
select count(*)
From Groups
where LOCATE(concat('ID', a.id, '|'), concat(Members, '|'))>0
) as groupcount
from ListIDTable as a
) as q
where groupcount <= 3

How to count the teams with no foreign players

I have a table for the football clubs of a country. The fields are "teamName", "playerName", and "country".
I'd like to count the clubs whose players are all foreigners. I tried the following query, but I think it's not working: it seems to count a club when it has at least one foreigner, whereas I want it counted only if all the players of a team are foreigners!
SELECT COUNT(DISTINCT teamName)
FROM teams
WHERE country not like '%England%'
Please advise. Thanks!
One way would be:
SELECT COUNT(DISTINCT teamName)
FROM teams T1
WHERE NOT EXISTS
( select * from teams T2
WHERE T1.teamName=T2.teamName and T2.country like '%England%')
Hmm even join seems to need an inner query:
SELECT COUNT(*)
FROM (
SELECT teamName,
SUM(country like '%England%') AS Locals
FROM teams
GROUP BY teamName
) AS t
WHERE Locals = 0;
Seems like there should be a shorter answer though...
Quick and dirty answer, and it does only one pass through the data, no subquery. This selects teams that are all foreign. You can play with the CASE expression if that is not what you want.
SELECT teamName
FROM teams
GROUP BY teamName
HAVING COUNT(country)=
SUM(CASE country != 'England' WHEN TRUE THEN 1 ELSE 0 END);
Longer answer: Your schema is not normalized, but should be. You want one table of teams and a second table of players, which includes a foreign key into the team table for that player's current team. This is basic DB normalization. However, after replacing the single table in the FROM with the join of those two tables, the same GROUP BY/HAVING trick works.
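To get the count the question asks for, the GROUP BY/HAVING idea can be wrapped in an outer COUNT(*). The sketch below runs it through Python's built-in sqlite3 module on invented rows; it uses the equivalent condition "no English players" with a portable CASE WHEN form, and assumes the country column holds exactly 'England' for local players.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE teams (teamName TEXT, playerName TEXT, country TEXT);
    INSERT INTO teams VALUES
        ('Reds',   'Ann',  'England'),
        ('Reds',   'Ben',  'Spain'),
        ('Blues',  'Carl', 'France'),
        ('Blues',  'Dev',  'Brazil'),
        ('Greens', 'Eve',  'England');
    """
)

# Inner query keeps only teams with zero English players;
# the outer query counts how many such teams there are.
all_foreign = conn.execute(
    """
    SELECT COUNT(*)
    FROM (
        SELECT teamName
        FROM teams
        GROUP BY teamName
        HAVING SUM(CASE WHEN country = 'England' THEN 1 ELSE 0 END) = 0
    ) AS t
    """
).fetchone()[0]
print(all_foreign)
```

Here only 'Blues' qualifies: 'Reds' has one local player and 'Greens' has only local players.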

Database Count & Group By error

I am quite new to SQL and I am trying to practice to improve myself.
I have a database which has a
Table : Players, Teams, Plays, and Wins
Players : pid, pname, age, country
Plays : pid, season, tid, value ( pid -> pid in Players, tid -> tid in Teams )
Teams : tid, tname, tcolor, tbudget
Wins : wtid, ltid, season, wscore, lscore ( wtid,ltid -> tid in Teams )
The question is: find the names of the players who played in at least 2 different teams with the same color.
What I did is
SELECT DISTINCT P.pname
FROM Players P
,Teams T1
GROUP BY T1.tcolor
HAVING 1 < (
SELECT COUNT (10)
FROM Teams T2
WHERE T1.tcolor=T2.tcolor)
When I try to query this , I get an error which is ;
Error Code: 1630
FUNCTION PRALATEST.COUNT does not exist. Check the 'Function Name Parsing and Resolution' section in the Reference Manual
Which part am I doing wrong?
Try this:
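select distinct Players.pname
from Players
join Plays on Plays.pid = Players.pid
join Teams on Teams.tid = Plays.tid
group by Players.pid, Players.pname, Teams.tcolor
having count(distinct Teams.tid) > 1
The condition count(distinct Teams.tid) > 1 is in a HAVING clause instead of a WHERE clause because it needs to operate on the results after the GROUP BY is performed. Counting distinct team ids means that several seasons with the same team are not mistaken for different teams.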
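A couple of things. Your error message is caused by the space between COUNT and the opening parenthesis. By default MySQL requires that there be no whitespace between a built-in function name and the parenthesis; otherwise it tries to resolve the name as a stored function, hence "FUNCTION PRALATEST.COUNT does not exist". Write COUNT(*) with no space.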
Also, you have not specified a join condition for your Players and Teams tables. As a result, you are doing a Cartesian product (probably not what you want). I'm guessing you need to join through your Plays table.
You should change your coding practice to use "explicit" join syntax to avoid errors like this in the future.
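For illustration, here is what the explicit join syntax could look like for this question, run through Python's built-in sqlite3 module on invented sample rows. Grouping by player and color and counting distinct team ids is one possible reading of "at least 2 different teams with the same color", not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Players (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE Teams   (tid INTEGER PRIMARY KEY, tname TEXT, tcolor TEXT);
    CREATE TABLE Plays   (pid INTEGER, season INTEGER, tid INTEGER);

    INSERT INTO Players VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO Teams   VALUES (10, 'Lions', 'red'), (11, 'Foxes', 'red'), (12, 'Owls', 'blue');
    -- Ann played for two different red teams;
    -- Ben played two seasons for the same red team, then one for a blue team.
    INSERT INTO Plays   VALUES (1, 2010, 10), (1, 2011, 11),
                               (2, 2010, 10), (2, 2011, 10), (2, 2012, 12);
    """
)

# Every table is joined with an explicit ON condition, so no accidental
# Cartesian product can occur.
names = conn.execute(
    """
    SELECT DISTINCT p.pname
    FROM Players AS p
    JOIN Plays AS pl ON pl.pid = p.pid
    JOIN Teams AS t  ON t.tid = pl.tid
    GROUP BY p.pid, p.pname, t.tcolor
    HAVING COUNT(DISTINCT t.tid) > 1
    """
).fetchall()
print(names)
```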

MYSQL View and summing fields

I need some help. I have been scouring the web and haven't been able to find anything similar. I have a MySQL database for my golf league, and I need to display standings by creating a view from it. There are 2 people per team. My primary key in 'players' is 'id'; there is also a 'teamID' (a numerical value 1 - 20, for 20 teams) for each player, which they share with their teammate. Basically, what I need is a view that contains 'teamID', both players' 'LName' (maybe as 'LNameA' and 'LNameB'), and the sum of the two players' 'points' fields. I have never summed a field across two rows or created a view in MySQL.
EDIT:
I was trying something like
CREATE
VIEW standings1
AS SELECT teamID, LName, points
FROM players
but need teamID to be the primaryKey of the view which will contain each players last name, and their points summed together.
Try this:
create view standings as
select teamId, group_concat(lname separator ', ') as TeamMembers,
sum(points) TotalPoints from players
group by teamId
Oh, one more thing. If you want the names of the players in different fields (group_concat just separates them with commas, but it is still a single field), you can use this query, which relies on each team having exactly two players:
create view standings as
select a.teamId, a.lname as player1, b.lname as player2,
a.points + b.points TotalPoints
from players a
join players b ON a.teamId = b.teamId AND a.id < b.id
That way you can play better with the names in PHP without having to parse the ", "
If I understand your table structure, you will need a JOIN against the table's own teamID. I'm assuming the teamID refers to a team, and is not the id of the player. The trick here is to join two copies of the table on the same teamID, but where the first player's id is lower than the second's. That should produce exactly one row per team, holding its pair of players.
CREATE VIEW standings AS
(
SELECT
p1.teamID AS teamID,
p1.id AS p1id,
p2.id AS p2id,
p1.LName AS p1LName,
p2.LName AS p2LName,
p1.points + p2.points AS totalScore
FROM
/* JOIN on matching teamID and p1.id < p2.id (so a player is never paired with himself and each pair appears once) */
players p1 JOIN players p2 ON p1.teamID = p2.teamID AND p1.id < p2.id
);
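A runnable sketch of the self-join view, using Python's built-in sqlite3 module and invented players. It assumes exactly two players per team and uses the points column from the question; the join condition p1.id < p2.id is chosen so that each team produces a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (id INTEGER PRIMARY KEY, teamID INTEGER, LName TEXT, points INTEGER);
    INSERT INTO players VALUES
        (1, 1, 'Adams', 10), (2, 1, 'Baker', 7),
        (3, 2, 'Clark', 4),  (4, 2, 'Davis', 12);

    -- Two copies of players joined on teamID; the lower id is always "A".
    CREATE VIEW standings AS
    SELECT p1.teamID,
           p1.LName AS LNameA,
           p2.LName AS LNameB,
           p1.points + p2.points AS totalPoints
    FROM players AS p1
    JOIN players AS p2 ON p1.teamID = p2.teamID AND p1.id < p2.id;
    """
)

standings = conn.execute(
    "SELECT teamID, LNameA, LNameB, totalPoints FROM standings ORDER BY totalPoints DESC"
).fetchall()
print(standings)
```

A team with only one player would be missing from this view, and a team with three would appear three times, so the two-per-team assumption matters.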

Optimize slow ranking query

I need to optimize a query for a ranking that is taking forever (the query itself works, but I know it's awful and I've just tried it with a good number of records and it gives a timeout).
I'll briefly explain the model. I have 3 tables: player, team and player_team. I have players, that can belong to a team. Obvious as it sounds, players are stored in the player table and teams in team. In my app, each player can switch teams at any time, and a log has to be maintained. However, a player is considered to belong to only one team at a given time. The current team of a player is the last one he's joined.
The structure of player and team is not relevant, I think. I have an id column PK in each. In player_team I have:
id (PK)
player_id (FK -> player.id)
team_id (FK -> team.id)
Now, each team is assigned a point for each player that has joined. So, now, I want to get a ranking of the first N teams with the biggest number of players.
My first idea was to first get the current players from player_team (that is, at most one record for each player; this record must be the player's current team). I failed to find a simple way to do it (I tried GROUP BY player_team.player_id HAVING player_team.id = MAX(player_team.id), but that didn't cut it).
I tried a number of queries that didn't work, but managed to get this working.
SELECT
COUNT(*) AS total,
pt.team_id,
p.facebook_uid AS owner_uid,
t.color
FROM
player_team pt
JOIN player p ON (p.id = pt.player_id)
JOIN team t ON (t.id = pt.team_id)
WHERE
pt.id IN (
SELECT max(J.id)
FROM player_team J
GROUP BY J.player_id
)
GROUP BY
pt.team_id
ORDER BY
total DESC
LIMIT 50
As I said, it works but looks very bad and performs worse, so I'm sure there must be a better way to go. Does anyone have any ideas for optimizing this?
I'm using mysql, by the way.
Thanks in advance
Adding the explain. (Sorry, not sure how to format it properly)
id  select_type         table  type    possible_keys                                 key               key_len  ref           rows    Extra
--------------------------------------------------------------------------------------------------------------------------------------------------------------
1   PRIMARY             t      ALL     PRIMARY                                       NULL              NULL     NULL          5000    Using temporary; Using filesort
1   PRIMARY             pt     ref     FKplayer_pt77082,FKplayer_pt265938,new_index  FKplayer_pt77082  4        t.id          30      Using where
1   PRIMARY             p      eq_ref  PRIMARY                                       PRIMARY           4        pt.player_id  1
2   DEPENDENT SUBQUERY  J      index   NULL                                          new_index         8        NULL          150000  Using index
Try this:
SELECT t.*, cnt
FROM (
SELECT team_id, COUNT(*) AS cnt
FROM (
SELECT player_id, MAX(id) AS mid
FROM player_team
GROUP BY
player_id
) q
JOIN player_team pt
ON pt.id = q.mid
GROUP BY
team_id
) q2
JOIN team t
ON t.id = q2.team_id
ORDER BY
cnt DESC
LIMIT 50
Create an index on player_team (player_id, id) (in this order) for this to work fast.
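The derived-table approach and the suggested index can be tried on a tiny data set. This sketch uses Python's built-in sqlite3 module, so it shows only that the query returns the expected ranking, not how MySQL's optimizer will behave; the index name and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE team (id INTEGER PRIMARY KEY, color TEXT);
    CREATE TABLE player_team (id INTEGER PRIMARY KEY, player_id INTEGER, team_id INTEGER);
    -- Composite index in the order suggested above (hypothetical name).
    CREATE INDEX ix_player_team_player_id ON player_team (player_id, id);

    INSERT INTO team VALUES (1, 'red'), (2, 'blue');
    -- Player 100 moved from team 1 to team 2; players 101 and 102 never moved.
    INSERT INTO player_team VALUES (1, 100, 1), (2, 101, 1), (3, 102, 2), (4, 100, 2);
    """
)

# q finds each player's latest player_team row, q2 counts those rows per team.
ranking = conn.execute(
    """
    SELECT t.id, t.color, q2.cnt
    FROM (
        SELECT pt.team_id, COUNT(*) AS cnt
        FROM (
            SELECT player_id, MAX(id) AS mid
            FROM player_team
            GROUP BY player_id
        ) AS q
        JOIN player_team AS pt ON pt.id = q.mid
        GROUP BY pt.team_id
    ) AS q2
    JOIN team AS t ON t.id = q2.team_id
    ORDER BY q2.cnt DESC
    LIMIT 50
    """
).fetchall()
print(ranking)
```

Player 100's old membership in team 1 is ignored, so team 2 ranks first with two current players.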
It's the subquery that is killing it. If you add a current field to the player_team table, with value 1 if the row is current and 0 if it is old, you could simplify this a lot by just doing:
SELECT
COUNT(*) AS total,
pt.team_id,
p.facebook_uid AS owner_uid,
t.color
FROM
player_team pt
JOIN player p ON (p.id = pt.player_id)
JOIN team t ON (t.id = pt.team_id)
WHERE
pt.current = 1
GROUP BY
pt.team_id
ORDER BY
total DESC
LIMIT 50
Having multiple entries in the player_team table for the same relationship where the only way to distinguish which one is the 'current' record is by comparing two (or more) rows I think is bad practice. I have been in this situation before and the workarounds you have to do to make it work really kill performance. It is far better to be able to see which row is current by doing a simple lookup (in this case, where current=1) - or by moving historical data into a completely different table (depending on your situation this might be overkill).
I sometimes find that more complex queries in MySQL need to be broken into two pieces.
The first piece would pull the data required into a temporary table and the second piece would be the query that attempts to manipulate the dataset created. Doing this definitely results in a significant performance gain.
This will get the current teams with colours ordered by size:
SELECT pt.team_id, COUNT(pt.player_id) AS c, t.color
FROM player_team pt JOIN team t ON t.id = pt.team_id
WHERE pt.current = 1
GROUP BY pt.team_id, t.color
ORDER BY c DESC
LIMIT 50;
But you've not given a condition for which player should be considered the owner of the team. Your current query is arbitrarily showing one player as owner_uid because of the grouping, not because that player is the actual owner. If your player_team table contained an 'owner' column, you could join the above query to a query of owners. Something like:
SELECT o.facebook_uid, a.team_id, a.color, a.c
FROM player_team pt1
JOIN player o ON (pt1.player_id = o.id AND pt1.owner = 1)
JOIN (...above query...) a
ON a.team_id = pt1.team_id;
You could add a column "last_playteam_id" to player table, and update it each time a player changes his team with the pk from player_team table.
Then you can do this:
SELECT
COUNT(*) AS total,
pt.team_id,
p.facebook_uid AS owner_uid,
t.color
FROM
player_team pt
JOIN player p ON (p.id = pt.player_id) and p.last_playteam_id = pt.id
JOIN team t ON (t.id = pt.team_id)
GROUP BY
pt.team_id
ORDER BY
total DESC
LIMIT 50
This could be fastest because you don't have to update the old player_team rows to current=0.
Alternatively, you could add a column "last_team_id" and keep the player's current team there. That gives the fastest result for the above query, but it could be less helpful with other queries.