MySQL doubled results using sum and left join - mysql

I have a three table setup : kids, toys and games, each with unique primary keys : id_kid, id_toy and id_game. Each kid can have multiple toys and games, but each toy or game is owned by only one kid.
The toys and games have a bought column with 3 states : -1,0,1
The table structure is something like this :
kids
    id_kid
    kid_name
    etc
games
    id_game
    id_kid_games --> links with id_kid in kids_table (maybe not the best name, I know)
    game_name
    bought --> can be -1,0,1
toys
    id_toy
    id_kid_toys --> links with id_kid in kids_table
    toy_name
    bought --> can be -1,0,1
For each kid I'm trying to get totals of toys and games, bought and not bought, using the query below; however, the results are doubled:
SELECT k.*,
    COUNT(DISTINCT t.id_toy) AS total_toys,
    SUM(CASE t.bought WHEN 1 THEN 1 ELSE 0 END) AS toys_bought,
    SUM(CASE t.bought WHEN -1 THEN 1 ELSE 0 END) AS toys_not_bought,
    COUNT(DISTINCT g.id_game) AS total_games,
    SUM(CASE g.bought WHEN 1 THEN 1 ELSE 0 END) AS games_bought,
    SUM(CASE g.bought WHEN -1 THEN 1 ELSE 0 END) AS games_not_bought
FROM kids AS k
LEFT JOIN toys t ON k.id_kid = t.id_kid_toys
LEFT JOIN games g ON k.id_kid = g.id_kid_games
GROUP BY k.id_kid
ORDER BY k.name ASC
One kid has 2 toys and 4 games, all bought, and the results are 2 total toys (correct), 4 total games (correct), 8 toys bought, 8 games bought. (both wrong)
Please help with an answer - if possible - without using subselects.
Thank you.

As you are selecting data from two unrelated relations (kids joined to toys, and kids joined to games), subqueries are the natural way of doing it. As uncorrelated subqueries may be used, this should not be particularly slow.
Compared to your original query, the one below basically just reverses the order of joining and grouping. See whether it is sufficiently efficient:
SELECT kids.*, t.total_toys, t.toys_bought, t.toys_not_bought,
    g.total_games, g.games_bought, g.games_not_bought
FROM kids
LEFT JOIN (SELECT id_kid_toys,
        COUNT(*) AS total_toys,
        SUM(CASE bought WHEN 1 THEN 1 ELSE 0 END) AS toys_bought,
        SUM(CASE bought WHEN -1 THEN 1 ELSE 0 END) AS toys_not_bought
    FROM toys
    GROUP BY id_kid_toys) AS t
ON t.id_kid_toys = kids.id_kid
LEFT JOIN (SELECT id_kid_games,
        COUNT(*) AS total_games,
        SUM(CASE bought WHEN 1 THEN 1 ELSE 0 END) AS games_bought,
        SUM(CASE bought WHEN -1 THEN 1 ELSE 0 END) AS games_not_bought
    FROM games
    GROUP BY id_kid_games) AS g
ON g.id_kid_games = kids.id_kid
ORDER BY kids.name;
If you insist on avoiding subqueries, this, probably far less efficient, query might do:
SELECT k.*,
    COUNT(DISTINCT t.id_toy) AS total_toys,
    -- sum only toys joined to first game
    SUM(IF(g2.id_game IS NULL AND t.bought = 1, 1, 0)) AS toys_bought,
    SUM(IF(g2.id_game IS NULL AND t.bought = -1, 1, 0)) AS toys_not_bought,
    -- sum only games joined to first toy
    COUNT(DISTINCT g.id_game) AS total_games,
    SUM(IF(t2.id_toy IS NULL AND g.bought = 1, 1, 0)) AS games_bought,
    SUM(IF(t2.id_toy IS NULL AND g.bought = -1, 1, 0)) AS games_not_bought
FROM kids AS k
LEFT JOIN toys t ON k.id_kid = t.id_kid_toys
LEFT JOIN games g ON k.id_kid = g.id_kid_games
-- select only rows where either game or toy is the first one for this kid
LEFT JOIN toys t2 ON k.id_kid = t2.id_kid_toys AND t2.id_toy < t.id_toy
LEFT JOIN games g2 ON k.id_kid = g2.id_kid_games AND g2.id_game < g.id_game
WHERE t2.id_toy IS NULL OR g2.id_game IS NULL
GROUP BY k.id_kid
ORDER BY k.name ASC
It works by ensuring that, for each kid, only the games joined to the first toy are counted, and only the toys joined to the first game are counted.
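To make the doubling concrete, here is a minimal sketch that reproduces it and compares two ways around it. It runs the SQL through Python's built-in sqlite3 module rather than MySQL (an assumption made so the example is self-contained), and the sample kid and rows are hypothetical. The last query, which counts distinct ids conditionally, is an alternative that does not appear in the answers above; it is included only as one possible way to avoid subselects.

```python
import sqlite3

# Hypothetical sample data: one kid with 2 toys and 4 games, all bought.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kids  (id_kid  INTEGER PRIMARY KEY, kid_name TEXT);
    CREATE TABLE toys  (id_toy  INTEGER PRIMARY KEY, id_kid_toys  INTEGER, bought INTEGER);
    CREATE TABLE games (id_game INTEGER PRIMARY KEY, id_kid_games INTEGER, bought INTEGER);
    INSERT INTO kids  VALUES (1, 'Ann');
    INSERT INTO toys  (id_kid_toys,  bought) VALUES (1, 1), (1, 1);
    INSERT INTO games (id_kid_games, bought) VALUES (1, 1), (1, 1), (1, 1), (1, 1);
""")

# Joining both child tables pairs every toy with every game: 2 * 4 = 8 rows.
# Each SUM runs over all 8 rows, while COUNT(DISTINCT ...) still sees 2 ids.
doubled = conn.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT t.id_toy),
           SUM(CASE t.bought WHEN 1 THEN 1 ELSE 0 END),
           SUM(CASE g.bought WHEN 1 THEN 1 ELSE 0 END)
    FROM kids k
    LEFT JOIN toys t  ON k.id_kid = t.id_kid_toys
    LEFT JOIN games g ON k.id_kid = g.id_kid_games
    GROUP BY k.id_kid
""").fetchone()

# Grouping each child table first leaves at most one row per kid to join.
pre_aggregated = conn.execute("""
    SELECT t.toys_bought, g.games_bought
    FROM kids k
    LEFT JOIN (SELECT id_kid_toys,
                      SUM(CASE bought WHEN 1 THEN 1 ELSE 0 END) AS toys_bought
               FROM toys GROUP BY id_kid_toys) t ON t.id_kid_toys = k.id_kid
    LEFT JOIN (SELECT id_kid_games,
                      SUM(CASE bought WHEN 1 THEN 1 ELSE 0 END) AS games_bought
               FROM games GROUP BY id_kid_games) g ON g.id_kid_games = k.id_kid
""").fetchone()

# Alternative without subqueries: count distinct ids that meet the condition.
distinct_counts = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN t.bought = 1 THEN t.id_toy END),
           COUNT(DISTINCT CASE WHEN g.bought = 1 THEN g.id_game END)
    FROM kids k
    LEFT JOIN toys t  ON k.id_kid = t.id_kid_toys
    LEFT JOIN games g ON k.id_kid = g.id_kid_games
    GROUP BY k.id_kid
""").fetchone()

print(doubled)          # (joined rows, total toys, toys bought, games bought)
print(pre_aggregated)   # (toys bought, games bought)
print(distinct_counts)  # (toys bought, games bought)
```

The first query shows the same symptom as in the question: the distinct counts are right while both sums equal the number of joined rows.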

Related

MySQL Where Clause with Union All getting wrong results

I will preface this by saying I am still very much learning MySQL, and I am absolutely at that stage where I know just enough to be dangerous.
I have a database with data for scorekeeping for a sports league. We record wins/losses as either 1 or zero points. There is a night that has double play involved (meaning the players play twice in a single night, for 2 different formats). My data is structured like so (just a sample, I have hundreds of rows, over different formats):
ID  FID  WK  Type  HomeTeam  AwayTeam  HF1  HF2  AF1  AF2
--  ---  --  ----  --------  --------  ---  ---  ---  ---
1   44   1   PL    TM1       TM2       1    0    0    1
2   44   1   PL    TM3       TM4       0    0    1    1
3   44   2   PL    TM2       TM3       1    1    0    0
4   44   2   PL    TM4       TM1       0    1    1    0
5   44   3   PL    TM3       TM1       999  0    999  1
6   44   3   PL    Tm2       TM4       1    0    0    1
Where the 999 is used as a code number for us to know that the match hasn't yet been played, or the scoresheet hasn't been turned in to us for recordkeeping. (I use PHP to call these to a website for users to see what is going on, and am using an IF statement to convert that 999 to "TBD" on the website)
I can pull the Format 1 and Format 2 scores separately and get a listing just fine, but when I try to pull them together and get a total score, I am getting an incorrect count. I know the error lies with my WHERE Clause, but I've been banging my head trying to get it to work correctly, and I think I just need an extra set of eyes on this.
My current SQL Query is as follows:
SELECT Team,
SUM(TotalF1) AS TotalF1,
SUM(TotalF2) AS TotalF2,
SUM(TotalF1+TotalF2) AS Total
FROM ( ( SELECT HomeTeam AS Team,
HF1 AS TotalF1,
HF2 AS TotalF2
FROM tbl_teamscores
WHERE FID = 44
AND Type = 'PL'
AND HF1 != 999
AND HF2 != 999 )
UNION ALL
( SELECT AwayTeam,
AF1,
AF2
FROM tbl_teamscores
WHERE FID = 44
AND Type = 'PL'
AND AF1 != 999
AND AF2 != 999 )
) CC
GROUP BY Team
ORDER BY Total desc, Team ASC;
I am getting incorrect totals though, and I know the reason is because of those 999 designations, as the WHERE clause is skipping over ALL lines where either home or away score matches 999.
I tried separating it out to 4 separate Select Statements, and unioning them, but I just get an error when I do that. I also tried using Inner Join, but MySQL doesn't seem to like that either.
Edit to add DBFiddle with Real World Table Data and queries: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=1d4d090b08b8280e734218ba32db6d88
An example of the problem can be observed when looking at the data for Player 10. The overall total should be 13, but I am only getting 12.
Any suggestions would be very helpful.
Thanks in advance!
You can use conditional aggregation:
SELECT Team,
SUM(CASE WHEN Total8 <> 999 THEN Total8 END) AS Total8,
SUM(CASE WHEN TotalLO <> 999 THEN TotalLO END) AS TotalLO,
SUM(CASE WHEN Total8 <> 999 THEN Total8 END) + SUM(CASE WHEN TotalLO <> 999 THEN TotalLO END) AS Total
FROM (
SELECT HomeTeam AS Team, Home8PTS AS Total8, HomeLOPTS AS TotalLO FROM tbl_teamscores WHERE FID = 44 AND Type = 'PL'
UNION ALL
SELECT AwayTeam, Away8PTS, AwayLOPTS FROM tbl_teamscores WHERE FID = 44 AND Type = 'PL'
) CC
GROUP BY Team
ORDER BY Team ASC;
or:
SELECT Team,
SUM(NULLIF(Total8, 999)) AS Total8,
SUM(NULLIF(TotalLO, 999)) AS TotalLO,
SUM(NULLIF(Total8, 999)) + SUM(NULLIF(TotalLO, 999)) AS Total
FROM (
SELECT HomeTeam AS Team, Home8PTS AS Total8, HomeLOPTS AS TotalLO FROM tbl_teamscores WHERE FID = 44 AND Type = 'PL'
UNION ALL
SELECT AwayTeam, Away8PTS, AwayLOPTS FROM tbl_teamscores WHERE FID = 44 AND Type = 'PL'
) CC
GROUP BY Team
ORDER BY Team ASC;
If you get nulls in the results then you should also use COALESCE():
SELECT Team,
COALESCE(SUM(NULLIF(Total8, 999)), 0) AS Total8,
COALESCE(SUM(NULLIF(TotalLO, 999)), 0) AS TotalLO,
COALESCE(SUM(NULLIF(Total8, 999)), 0) + COALESCE(SUM(NULLIF(TotalLO, 999)), 0) AS Total
FROM (
SELECT HomeTeam AS Team, Home8PTS AS Total8, HomeLOPTS AS TotalLO FROM tbl_teamscores WHERE FID = 44 AND Type = 'PL'
UNION ALL
SELECT AwayTeam, Away8PTS, AwayLOPTS FROM tbl_teamscores WHERE FID = 44 AND Type = 'PL'
) CC
GROUP BY Team
ORDER BY Team ASC;
See the demo.
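The difference between filtering rows in WHERE and nulling out the sentinel inside the aggregate can be checked on the six sample rows from the question. This is a sketch under a few assumptions: it uses Python's sqlite3 instead of MySQL, it uses the HF1/HF2/AF1/AF2 column names from the sample table rather than the real-world names in the fiddle, and it writes the team in row 6 as TM2 (the sample shows "Tm2") because SQLite groups text case-sensitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tbl_teamscores (
        ID INTEGER PRIMARY KEY, FID INTEGER, WK INTEGER, Type TEXT,
        HomeTeam TEXT, AwayTeam TEXT,
        HF1 INTEGER, HF2 INTEGER, AF1 INTEGER, AF2 INTEGER)
""")
conn.executemany(
    "INSERT INTO tbl_teamscores VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 44, 1, "PL", "TM1", "TM2", 1, 0, 0, 1),
        (2, 44, 1, "PL", "TM3", "TM4", 0, 0, 1, 1),
        (3, 44, 2, "PL", "TM2", "TM3", 1, 1, 0, 0),
        (4, 44, 2, "PL", "TM4", "TM1", 0, 1, 1, 0),
        (5, 44, 3, "PL", "TM3", "TM1", 999, 0, 999, 1),  # format 1 not yet played
        (6, 44, 3, "PL", "TM2", "TM4", 1, 0, 0, 1),
    ],
)

# Filtering in WHERE drops the whole row, so TM1 also loses its
# valid format 2 point from match 5.
filtered = conn.execute("""
    SELECT Team, SUM(F1), SUM(F2)
    FROM (SELECT HomeTeam AS Team, HF1 AS F1, HF2 AS F2 FROM tbl_teamscores
          WHERE FID = 44 AND Type = 'PL' AND HF1 != 999 AND HF2 != 999
          UNION ALL
          SELECT AwayTeam, AF1, AF2 FROM tbl_teamscores
          WHERE FID = 44 AND Type = 'PL' AND AF1 != 999 AND AF2 != 999) CC
    GROUP BY Team
    ORDER BY Team
""").fetchall()

# NULLIF turns only the 999 value into NULL, which SUM ignores;
# COALESCE covers a team whose values are all NULL.
conditional = conn.execute("""
    SELECT Team,
           COALESCE(SUM(NULLIF(F1, 999)), 0),
           COALESCE(SUM(NULLIF(F2, 999)), 0)
    FROM (SELECT HomeTeam AS Team, HF1 AS F1, HF2 AS F2 FROM tbl_teamscores
          WHERE FID = 44 AND Type = 'PL'
          UNION ALL
          SELECT AwayTeam, AF1, AF2 FROM tbl_teamscores
          WHERE FID = 44 AND Type = 'PL') CC
    GROUP BY Team
    ORDER BY Team
""").fetchall()

print(filtered)
print(conditional)
```

Only TM1 differs between the two result sets, because it is the only team with a played format sitting on the same row as a 999.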

How to get joined data from two tables, excluding rows based on a condition on one table, in MySQL?

I have two tables: Student and fee
student
sid  name    roll_no
---  ------  -------
1    John    22
2    Karina  32
3    Navin   42
fee
fid  s_id  month     fee
---  ----  --------  ----
1    2     January   1000
2    3     January   1200
3    2     February  1000
I want to get the students who have not paid the fee for February, like this:
Student id  Name   Roll No  January  February
----------  -----  -------  -------  --------
1           John   22       0        0
3           Navin  42       1200     0
My code is :
SELECT s.sid,s.name,s.roll_no,f.fee
from student s
left join
fee f
ON f.fid = s.sid
AND f.month = 'January'
where s.sid NOT IN (SELECT s_id from fee
where month = 'February')
order by c.id;
I got zero value in both months for all students
------Thanks in advance-------
After correction, this code works in MySQL Workbench but not in the Java application.
SELECT
s.sid,s.name,s.roll_no, ifnull(f.fee,0) as pre_month,0 as current_month
from student s
left join
fee f
ON f.s_id = s.sid
AND f.month = 'January'
where s.sid NOT IN
(SELECT s_id from fee
where month = 'February')
order by s.sid;
For the results you want, I would suggest conditional aggregation and a having clause:
SELECT s.sid, s.name, s.roll_no,
       SUM(CASE WHEN f.month = 'January' THEN f.fee ELSE 0 END) AS january,
       SUM(CASE WHEN f.month = 'February' THEN f.fee ELSE 0 END) AS february
FROM student s LEFT JOIN
     fee f
     ON f.s_id = s.sid
GROUP BY s.sid, s.name, s.roll_no
HAVING SUM(CASE WHEN f.month = 'February' THEN f.fee ELSE 0 END) = 0;
Here is a db<>fiddle.
Note that storing month names in a column is usually a really bad idea. It does not distinguish between the years, for instance. It is much better to use a date.
You can use a left join to join the students with their February fees, and then a where clause to filter the list down to those who don't have one. The query below should get you a list of all students who did not pay a fee for February.
It also looked like you might be using the wrong column on the fee table for the join: you probably want s_id instead of fid.
select s.*
from student s
left join fee f on f.s_id = s.sid and f.month = 'February'
where f.fid is null;
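Both suggestions can be tried against the three students and three fee rows from the question. The sketch below assumes SQLite through Python's sqlite3 rather than MySQL, joins on s_id, spells the month as 'February', and tests for the missing February row with WHERE ... IS NULL; treat it as an illustration, not as the exact queries from either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, roll_no INTEGER);
    CREATE TABLE fee (fid INTEGER PRIMARY KEY, s_id INTEGER, month TEXT, fee INTEGER);
    INSERT INTO student VALUES (1, 'John', 22), (2, 'Karina', 32), (3, 'Navin', 42);
    INSERT INTO fee VALUES (1, 2, 'January', 1000),
                           (2, 3, 'January', 1200),
                           (3, 2, 'February', 1000);
""")

# Conditional aggregation: one row per student with a column per month,
# keeping only students whose February total is zero.
per_month = conn.execute("""
    SELECT s.sid, s.name, s.roll_no,
           SUM(CASE WHEN f.month = 'January'  THEN f.fee ELSE 0 END) AS january,
           SUM(CASE WHEN f.month = 'February' THEN f.fee ELSE 0 END) AS february
    FROM student s
    LEFT JOIN fee f ON f.s_id = s.sid
    GROUP BY s.sid, s.name, s.roll_no
    HAVING SUM(CASE WHEN f.month = 'February' THEN f.fee ELSE 0 END) = 0
    ORDER BY s.sid
""").fetchall()

# Anti-join: left join only the February rows and keep students
# for whom no such row was found.
unpaid = conn.execute("""
    SELECT s.sid, s.name
    FROM student s
    LEFT JOIN fee f ON f.s_id = s.sid AND f.month = 'February'
    WHERE f.fid IS NULL
    ORDER BY s.sid
""").fetchall()

print(per_month)
print(unpaid)
```

The first form returns the month columns shown in the desired output; the second only identifies who has not paid.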

How to write a SQL query for multiple Inner Join?

A sample record:
Row(user_id='KxGeqg5ccByhaZfQRI4Nnw', gender='male', year='2015', month='September', day='20',
hour='16', weekday='Sunday', reviewClass='place love back', business_id='S75Lf-Q3bCCckQ3w7mSN2g',
business_name='Notorious Burgers', city='Scottsdale', categories='Nightlife, American (New), Burgers,
Comfort Food, Cocktail Bars, Restaurants, Food, Bars, American (Traditional)', user_funny='1',
review_sentiment='Positive', friend_id='my4q3Sy6Ei45V58N2l8VGw')
This table has more than 100 million records. My SQL query is doing the following:
Select the most occurring review_sentiment among the friends (friend_id) and the most occurring gender among friends of a particular user visiting a specific business
friend_id is eventually a user_id
Example Scenario:
One user
Has Visited 4 Businesses
Has 10 friends
5 of these friends have visited Business 1 & 2 while other 5 have
visited 3rd business only and none have visited the fourth
Now, for Business 1 and 2, the 5 friends have more positive than
negative sentiments for B1 and have more -ve than +ve sentiment for
B2 and all -ve for B3
I want the following output for this:
user_id | business_id | friend_common_sentiment | mostCommonGender | .... otherCols
user_id_1 | business_id_1 | positive | male | .... otherCols
user_id_1 | business_id_2 | negative | female | .... otherCols
user_id_1 | business_id_3 | negative | female | .... otherCols
Here's a simple query I wrote for this in pyspark:
SELECT user_id, gender, year, month, day, hour, weekday, reviewClass, business_id, business_name, city,
categories, user_funny, review_sentiment FROM events1 GROUP BY user_id, friend_id, business_id ORDER BY
COUNT(review_sentiment) DESC LIMIT 1
This query will not give what is expected, but I'm not sure exactly how to fit an INNER JOIN into this.
Man, does that data structure make things hard. But let's break it down into steps:
1. Self join to get the data for friends.
2. Once you have the data for friends, use aggregate functions to get counts of each possible value, grouping by the user and the business.
3. Wrap the above in a sub-query in order to choose between the values based on the counts.
I'm just going to call your table "tags". Sadly, just like in real life, we can't assume everyone has friends, and since you didn't specify to exclude the forever alone crowd, we need to use a left join to keep users without friends. The join would be as follows:
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
Next you have to figure out what the most common gender/review is for a given user and business combination. This is where the data structure really kicks us in the butt. We could do this in one step with some clever window functions, but I want this answer to be easily understood, so I'm going to use a sub-query and case statements. For the sake of simplicity I'm assuming binary genders, but depending on the needs of your app, you can follow the same pattern for additional genders.
select user.user_id, user.business_id
, sum(case when friends.gender = 'male' then 1 else 0 end) as MaleFriends
, sum(case when friends.gender = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id
Now we just have to grab data from the sub-query and make some decisions. You may want to add some additional options, for instance for the case where there are no friends, or where friends are evenly split between gender/sentiment. It is the same pattern as below, just with extra values to choose from.
select user_id
, business_id
, case when MaleFriends > FemaleFriends then 'male' else 'female' end as MostCommonGender
, case when FriendsPositive > FriendsNegative then 'Positive' else 'Negative' end as MostCommonSentiment
from ( select user.user_id, user.business_id
, sum(case when friends.gender = 'male' then 1 else 0 end) as MaleFriends
, sum(case when friends.gender = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id) as a
This gives you the steps to follow, and hopefully a clear explanation on how they work. Good luck!
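The three steps can be tried end to end on a toy dataset. The sketch below makes several assumptions: it uses SQLite through Python's sqlite3 rather than Spark SQL, the ids u1, f1, f2, f3 and b1 are made up, the aliases are u and fr instead of user and friends, and each friend has exactly one row per business. In the real table a friend's review is repeated once per friend_id of their own, so the friend side would probably need to be de-duplicated first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tags (user_id TEXT, gender TEXT, business_id TEXT,
                       review_sentiment TEXT, friend_id TEXT)
""")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?)",
    [
        # u1 reviewed b1 and has three friends, so the review appears three times
        ("u1", "male", "b1", "Positive", "f1"),
        ("u1", "male", "b1", "Positive", "f2"),
        ("u1", "male", "b1", "Positive", "f3"),
        # the friends' own reviews of b1 (one row each, by assumption)
        ("f1", "female", "b1", "Negative", None),
        ("f2", "female", "b1", "Negative", None),
        ("f3", "male", "b1", "Positive", None),
    ],
)

result = conn.execute("""
    SELECT user_id, business_id,
           CASE WHEN MaleFriends > FemaleFriends
                THEN 'male' ELSE 'female' END AS MostCommonGender,
           CASE WHEN FriendsPositive > FriendsNegative
                THEN 'Positive' ELSE 'Negative' END AS MostCommonSentiment
    FROM (SELECT u.user_id, u.business_id,
                 SUM(CASE WHEN fr.gender = 'male' THEN 1 ELSE 0 END) AS MaleFriends,
                 SUM(CASE WHEN fr.gender = 'female' THEN 1 ELSE 0 END) AS FemaleFriends,
                 SUM(CASE WHEN fr.review_sentiment = 'Positive' THEN 1 ELSE 0 END) AS FriendsPositive,
                 SUM(CASE WHEN fr.review_sentiment = 'Negative' THEN 1 ELSE 0 END) AS FriendsNegative
          FROM tags AS u
          -- self join: match each of the user's friends to that friend's own review
          LEFT OUTER JOIN tags AS fr
                 ON u.friend_id = fr.user_id
                AND fr.business_id = u.business_id
          WHERE u.business_id = ? AND u.user_id = ?
          GROUP BY u.user_id, u.business_id) AS a
""", ("b1", "u1")).fetchall()

print(result)
```

Two of the three friends are female and negative, so those values win for u1 even though u1's own review is positive.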

Rank standings table of soccer matches by H2H

Let me start by explaining this with an example. I have a table with records of matches played in a soccer league. Using this table and its match results, I am able to generate a standings table for the teams in this league via a MySQL query.
Table [matches] (example)
--------------------------------------------------------
|id | hometeam |goalsfor|goalsagainst| awayteam |
--------------------------------------------------------
--------------------------------------------------------
| 8 | Real Madrid | 2 | 0 | Inter Milan |
--------------------------------------------------------
| 9 | Inter Milan | 3 | 3 | Real Madrid |
--------------------------------------------------------
Generated standings by query
Pos  Team          Pld  W  D  L  F   A   GD  Pts
1    FC Barcelona  5    2  3  0  8   5   3   9
2    Inter Milan   6    2  2  2  11  10  1   8
3    Real Madrid   6    2  2  2  8   8   0   8
4    AC Milan      5    0  3  2  8   12  -4  3
The query:
select
team,
count(*) played,
count(case when goalsfor > goalsagainst then 1 end) wins,
count(case when goalsagainst> goalsfor then 1 end) lost,
count(case when goalsfor = goalsagainst then 1 end) draws,
sum(goalsfor) goalsfor,
sum(goalsagainst) goalsagainst,
sum(goalsfor) - sum(goalsagainst) goal_diff,
sum(
case when goalsfor > goalsagainst then 3 else 0 end
+ case when goalsfor = goalsagainst then 1 else 0 end
) score
from (
select hometeam team, goalsfor, goalsagainst from scores
union all
select awayteam, goalsagainst, goalsfor from scores
) a
group by team
order by score desc, goal_diff desc;
What I want to do is order the standings based on head-to-head (H2H) matches: first order by points, then, if teams are tied on points, look at the matches between those two teams and compare who has more wins, or who scored more than the other, and use that to sort the table.
By doing this, in the example above Real Madrid would be ranked 2nd and Inter Milan 3rd.
How can I achieve this?
I want to compare the two teams' matches when they are equal in points, and use that to sort.
ORDER BY score DESC, h2h DESC, goal_diff DESC
Update: I ended up going with a solution that mixes SQL and PHP: first I find the teams that are tied in rank, then I generate mini H2H standings for those teams and update the rank based on them. I still think this is doable with just SQL, but with my heavy query it is too complicated to implement that way, which is why I mixed in PHP.
You need to process this in two steps. First, run the query above and store the results in a work table (called work below). Then you need a tie-breaker score for each team that is level on points with another team. Below, I join the matches table to the work table once for each team and ignore any match where the two work rows do not have the same score, as those are not important. The home team then gets 1 if it won, and the same is done again for the away side. You might want to change this to 3 for a win and 1 for a draw.
Sum these results up, join that result to the team's row in work, and you have a tie-break score for each row where the score is the same.
You need to check what happens if you have many teams on the same score, and see if this is the result you want.
select w.*, b.hth
from work w
left outer join (
    select team, sum(hth) hth
    from (
        select hometeam team, case when m.goalsfor > m.goalsagainst then 1 else 0 end hth
        from matches m
        inner join work w1 on m.hometeam = w1.team
        inner join work w2 on m.awayteam = w2.team
        where w1.score = w2.score
        union all
        select awayteam team, case when m.goalsagainst > m.goalsfor then 1 else 0 end hth
        from matches m
        inner join work w1 on m.hometeam = w1.team
        inner join work w2 on m.awayteam = w2.team
        where w1.score = w2.score
    ) a -- all hth at same points
    group by team
) b -- summed to one row per team
on b.team = w.team
order by w.score desc, b.hth desc;
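Here is a small sketch of the two-step approach on a reduced dataset, run with Python's sqlite3 rather than MySQL. Matches 8 and 9 come from the question; match 10 (Inter Milan 5-0 AC Milan) is made up so that the two teams end up level on points while Inter Milan has the better goal difference. The work table holds only the columns the tie-break needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (id INTEGER PRIMARY KEY, hometeam TEXT,
                          goalsfor INTEGER, goalsagainst INTEGER, awayteam TEXT);
    INSERT INTO matches VALUES (8,  'Real Madrid', 2, 0, 'Inter Milan'),
                               (9,  'Inter Milan', 3, 3, 'Real Madrid'),
                               (10, 'Inter Milan', 5, 0, 'AC Milan');  -- hypothetical

    -- Step 1: store the plain standings in a work table.
    CREATE TABLE work AS
    SELECT team,
           SUM(CASE WHEN gf > ga THEN 3 WHEN gf = ga THEN 1 ELSE 0 END) AS score,
           SUM(gf) - SUM(ga) AS goal_diff
    FROM (SELECT hometeam AS team, goalsfor AS gf, goalsagainst AS ga FROM matches
          UNION ALL
          SELECT awayteam, goalsagainst, goalsfor FROM matches) a
    GROUP BY team;
""")

# Without a tie-break, goal difference alone decides between level teams.
by_goal_diff = [row[0] for row in conn.execute(
    "SELECT team FROM work ORDER BY score DESC, goal_diff DESC")]

# Step 2: count wins only in matches between teams on the same score.
by_h2h = [row[0] for row in conn.execute("""
    SELECT w.team
    FROM work w
    LEFT OUTER JOIN (
        SELECT team, SUM(hth) AS hth
        FROM (SELECT m.hometeam AS team,
                     CASE WHEN m.goalsfor > m.goalsagainst THEN 1 ELSE 0 END AS hth
              FROM matches m
              INNER JOIN work w1 ON m.hometeam = w1.team
              INNER JOIN work w2 ON m.awayteam = w2.team
              WHERE w1.score = w2.score
              UNION ALL
              SELECT m.awayteam,
                     CASE WHEN m.goalsagainst > m.goalsfor THEN 1 ELSE 0 END
              FROM matches m
              INNER JOIN work w1 ON m.hometeam = w1.team
              INNER JOIN work w2 ON m.awayteam = w2.team
              WHERE w1.score = w2.score) a
        GROUP BY team
    ) b ON b.team = w.team
    ORDER BY w.score DESC, b.hth DESC
""")]

print(by_goal_diff)
print(by_h2h)
```

Real Madrid and Inter Milan both have 4 points here; the head-to-head column puts Real Madrid first because of its win in match 8, whereas goal difference alone would favour Inter Milan.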

MySQL joining tables by creating more columns instead of rows

I am mostly new to SQL and ran across a situation that I can't figure out. Say that I have 2 tables: P and A.
Person id Live Income
--------- -- ---- ------
Tom 1 House 10
Sarah 2 Apt 7
Sterling 3 Playpen 0
Chris 4 House 6
Juanita 5 Apt 12
...
Live2 id Attribute
--------- -- -----
House 1 Job
House 2 Car
House 3 Kids
Apt 4 Job
Apt 5 Car
Playpen 6 Diapers
So if you have a 'House' then you always also have a Job, Car, and Kids. If you have a Playpen then you only have Diapers (and never a Job).
What I am trying to do (without double-counting people) is find the total income for 'House' people (Live='House', the 1st category), then 'Job' people (Attribute='Job', 2nd category), then 'Diaper' people, etc. So Tom is counted as a 'House' person but not a 'Job' person (because he has been previously classified and I don't want to double-count income).
Logically I can think of several ways to approach and based on my research this seems to be a perfect place to use the long form of CASE because I can specify conditions from different columns. BUT, I can't seem to join the tables in a way that I don't end up double counting income by creating too many rows. For example, I'll JOIN them and it will create 3 Tom entries (one each for Job, Car, Kids).
IMO either we need multiple 'Attribute' columns (one each for Job, Car, Kids, Diapers) so 'Tom' is still fully described on one row or some way to ignore all the other 'Tom' rows once he has been counted in a classification.
Without knowing some additional details, I'm guessing this is what you want. Table a is the one with the person in it; table b has live2 in it.
SELECT
SUM(CASE WHEN live = 'House' THEN income ELSE 0 END) as house,
SUM(CASE WHEN live = 'Apt' THEN income ELSE 0 END) as apt,
SUM(CASE WHEN live = 'Playpen' THEN income ELSE 0 END) as playpen
FROM
( SELECT a.*
FROM a
JOIN b ON b.live2 = a.live
GROUP BY a.id
)t
DEMO
This assumes that, besides House, the only other live value that can have a Job is Apt. If that's the case, then this query will do what you want.
If you want to actually specify 'Job' in the query, then you can do it like this:
SELECT
    SUM(CASE WHEN live = 'House' THEN income ELSE 0 END) AS house,
    SUM(CASE WHEN has_job = 1 AND live <> 'House' THEN income ELSE 0 END) AS apt,
    SUM(CASE WHEN live = 'Playpen' THEN income ELSE 0 END) AS playpen
FROM
( SELECT a.*, MAX(b.attribute = 'Job') AS has_job -- 1 if any attribute row is 'Job'
    FROM a
    JOIN b ON b.live2 = a.live
    GROUP BY a.id
) t
You could also join with each field specified. If you want an example, I can show you.
Your question is a bit strange, I'll agree. There isn't any complication, since none of your persons live in both a house and an apartment...
select live,
sum(income) income,
count(*) people
from p
left join
a
on p.live = a.live2
and a.attribute = 'job'
group by live
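The row multiplication described in the question, and the way a join restricted to a single attribute avoids it, can be seen on the sample rows. This sketch assumes SQLite through Python's sqlite3 instead of MySQL, names the columns after the table headers in the question, and compares against 'Job' with a capital J because SQLite string comparison is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p (person TEXT, id INTEGER PRIMARY KEY, live TEXT, income INTEGER);
    CREATE TABLE a (live2 TEXT, id INTEGER PRIMARY KEY, attribute TEXT);
    INSERT INTO p VALUES ('Tom', 1, 'House', 10), ('Sarah', 2, 'Apt', 7),
                         ('Sterling', 3, 'Playpen', 0), ('Chris', 4, 'House', 6),
                         ('Juanita', 5, 'Apt', 12);
    INSERT INTO a VALUES ('House', 1, 'Job'), ('House', 2, 'Car'), ('House', 3, 'Kids'),
                         ('Apt', 4, 'Job'), ('Apt', 5, 'Car'), ('Playpen', 6, 'Diapers');
""")

# A plain join repeats each person once per attribute of their housing type,
# so income is counted 3 times for House and 2 times for Apt.
multiplied = conn.execute("""
    SELECT p.live, SUM(p.income)
    FROM p
    JOIN a ON a.live2 = p.live
    GROUP BY p.live
    ORDER BY p.live
""").fetchall()

# Restricting the join to one attribute leaves at most one matching row per
# person; the left join keeps people (Playpen) who have no 'Job' row at all.
one_row_each = conn.execute("""
    SELECT p.live, SUM(p.income) AS income, COUNT(*) AS people
    FROM p
    LEFT JOIN a ON p.live = a.live2 AND a.attribute = 'Job'
    GROUP BY p.live
    ORDER BY p.live
""").fetchall()

print(multiplied)
print(one_row_each)
```

The second result has one joined row per person, so the sums match the incomes in the person table.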