Two weeks ago I started learning SQL and it has been going pretty well so far, but I have run into a problem that I can't seem to resolve. After two days of searching the web and looking at books, I am no closer to solving it, so it is time to ask for some help. Part of my problem is that, being so new to SQL, I'm not exactly sure what to search for.
This is using MySQL and InnoDB.
After some joins and other things I have the table below with athlete information giving the type of event in which the athlete participated and the distance of that event. The possible event/distance combinations are {A10,A15,B10,B15}.
Events Table
last_name first_name event distance
Munster Eddie A 10
Brady Marsha A 10
Clampet Jethro B 15
Grumby Jonas A 15
Brady Peter A 10
Brady Marsha A 10
Brady Marsha B 15
Grant Ginger B 15
Munster Eddie B 10
Brady Marsha A 10
What I am trying to do as the final step is to transform this table into a form that shows how many times each athlete participated in each event, like the following output:
last_name first_name A10 A15 B10 B15
Munster Eddie 1 0 1 0
Brady Marsha 3 0 0 1
Clampet Jethro 0 0 0 1
Grumby Jonas 0 1 0 0
Brady Peter 1 0 0 0
Grant Ginger 0 0 0 1
I think I want to use correlated subqueries, so I have tried a number of variants of the following SQL query, but it returns "Operand should contain 1 column(s)", which makes sense.
SELECT last_name, first_name,
count(if(event='A',1,0) AND if(distance=10,1,0)) AS A10
FROM sample s
WHERE (SELECT last_name, first_name, event, distance
FROM sample s1
WHERE s1.last_name = s.last_name
)
ORDER BY last_name, first_name;
The steps I see I need are:
1. create a set of each name in the table, which I can do, and then
2. iterate through each name, creating a new query selecting event/distance and then
3. summing that query on event/distance combination;
4. return the result back up to #1.
I see that procedures provide some looping capabilities, is that the way to do this? Is what I want to do possible in the SQL environment? My next step is to just dump the raw table to PHP and process it there.
Any thoughts and/or solutions are greatly appreciated.
Add GROUP BY:
SELECT last_name,
first_name,
count(if(event = 'A', 1, NULL)
AND if(distance = 10, 1, NULL)) AS A10,
count(if(event = 'A', 1, NULL)
AND if(distance = 15, 1, NULL)) AS A15,
count(if(event = 'B', 1, NULL)
AND if(distance = 10, 1, NULL)) AS B10,
count(if(event = 'B', 1, NULL)
AND if(distance = 15, 1, NULL)) AS B15
FROM sample s
GROUP BY last_name,
first_name
ORDER BY last_name,
first_name;
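The switch from 0 to NULL in the query above matters because COUNT counts every non-NULL value, and 0 is non-NULL. A minimal sketch of that behaviour, assuming an in-memory SQLite database as a stand-in for MySQL (SQLite has no IF(), so CASE is used; the table name demo is made up for illustration):

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs anywhere.
# COUNT and SUM handle NULL and 0 the same way in both engines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (event TEXT)")  # hypothetical table
conn.executemany("INSERT INTO demo VALUES (?)", [("A",), ("A",), ("B",)])

count_with_zero, count_with_null, sum_with_zero = conn.execute(
    """
    SELECT COUNT(CASE WHEN event = 'A' THEN 1 ELSE 0 END),  -- 0 is non-NULL, so it is counted
           COUNT(CASE WHEN event = 'A' THEN 1 END),         -- no ELSE yields NULL, which is skipped
           SUM(CASE WHEN event = 'A' THEN 1 ELSE 0 END)     -- adding 0 changes nothing
    FROM demo
    """
).fetchone()

print(count_with_zero, count_with_null, sum_with_zero)  # 3 2 2
```

So either COUNT over a NULL-producing expression or SUM over a 0/1 expression gives the number of matching rows; COUNT over a 0/1 expression just gives the row count.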
I would suggest using the SUM + CASE syntax.
Try something like this:
SELECT LAST_NAME,
FIRST_NAME,
SUM(CASE
WHEN EVENT = 'A'
AND DISTANCE = 10 THEN 1
ELSE 0
END) AS A10,
SUM(CASE
WHEN EVENT = 'A'
AND DISTANCE = 15 THEN 1
ELSE 0
END) AS A15,
SUM(CASE
WHEN EVENT = 'B'
AND DISTANCE = 10 THEN 1
ELSE 0
END) AS B10,
SUM(CASE
WHEN EVENT = 'B'
AND DISTANCE = 15 THEN 1
ELSE 0
END) AS B15
FROM SAMPLE
GROUP BY LAST_NAME,
FIRST_NAME
You can find a working example on SQL Fiddle.
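For anyone who wants to try the SUM + CASE pivot locally, here is a small sketch that loads the rows from the question into an in-memory SQLite database and runs the same query. SQLite is an assumption made for portability; the SQL itself is unchanged apart from an added ORDER BY.

```python
import sqlite3

# Assumption: in-memory SQLite instead of MySQL; SUM(CASE ...) works the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (last_name TEXT, first_name TEXT, event TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?)",
    [
        ("Munster", "Eddie", "A", 10),
        ("Brady", "Marsha", "A", 10),
        ("Clampet", "Jethro", "B", 15),
        ("Grumby", "Jonas", "A", 15),
        ("Brady", "Peter", "A", 10),
        ("Brady", "Marsha", "A", 10),
        ("Brady", "Marsha", "B", 15),
        ("Grant", "Ginger", "B", 15),
        ("Munster", "Eddie", "B", 10),
        ("Brady", "Marsha", "A", 10),
    ],
)

# One output column per event/distance combination; each row adds 1 to the
# column it belongs to and 0 to the others.
PIVOT_SQL = """
SELECT last_name, first_name,
       SUM(CASE WHEN event = 'A' AND distance = 10 THEN 1 ELSE 0 END) AS A10,
       SUM(CASE WHEN event = 'A' AND distance = 15 THEN 1 ELSE 0 END) AS A15,
       SUM(CASE WHEN event = 'B' AND distance = 10 THEN 1 ELSE 0 END) AS B10,
       SUM(CASE WHEN event = 'B' AND distance = 15 THEN 1 ELSE 0 END) AS B15
FROM sample
GROUP BY last_name, first_name
ORDER BY last_name, first_name
"""
pivot_rows = conn.execute(PIVOT_SQL).fetchall()
for row in pivot_rows:
    print(row)
```

The counts match the output requested in the question, only sorted by name.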
Good Luck!
A sample record:
Row(user_id='KxGeqg5ccByhaZfQRI4Nnw', gender='male', year='2015', month='September', day='20',
hour='16', weekday='Sunday', reviewClass='place love back', business_id='S75Lf-Q3bCCckQ3w7mSN2g',
business_name='Notorious Burgers', city='Scottsdale', categories='Nightlife, American (New), Burgers,
Comfort Food, Cocktail Bars, Restaurants, Food, Bars, American (Traditional)', user_funny='1',
review_sentiment='Positive', friend_id='my4q3Sy6Ei45V58N2l8VGw')
This table has more than 100 million records. My SQL query should do the following:
Select the most frequent review_sentiment among the friends (friend_id) and the most frequent gender among the friends of a particular user visiting a specific business.
Each friend_id is itself a user_id.
Example Scenario:
One user
Has Visited 4 Businesses
Has 10 friends
5 of these friends have visited Business 1 & 2 while other 5 have
visited 3rd business only and none have visited the fourth
Now, for Business 1 and 2, the 5 friends have more positive than
negative sentiments for B1 and have more -ve than +ve sentiment for
B2 and all -ve for B3
I want the following output for this:
**user_id | business_id | friend_common_sentiment | mostCommonGender | .... otherCols**
user_id_1 | business_id_1 | positive | male | .... otherCols
user_id_1 | business_id_2 | negative | female | .... otherCols
user_id_1 | business_id_3 | negative | female | .... otherCols
Here's a simple query I wrote for this in pyspark:
SELECT user_id, gender, year, month, day, hour, weekday, reviewClass, business_id, business_name, city,
categories, user_funny, review_sentiment FROM events1 GROUP BY user_id, friend_id, business_id ORDER BY
COUNT(review_sentiment) DESC LIMIT 1
This query does not give what is expected, and I'm not sure how exactly to fit an INNER JOIN into it.
Man, does that data structure make things hard. But let's break it down into steps:
1. Self join the table to get the data for friends.
2. Once you have the data for friends, use aggregate functions to count each possible value, grouping by the user and the business.
3. Wrap the above in a sub-query in order to choose between the values based on the counts.
I'm just going to call your table "tags". Sadly, just like in real life, we can't assume everyone has friends, and since you didn't specify to exclude the forever alone crowd, we need a left join to keep users without friends. The join would be as follows:
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
Next you have to figure out what the most common gender/review is for a given user and business combination. This is where the data structure really kicks us in the butt. We could do this in one step with some clever window functions, but I want this answer to be easily understood, so I'm going to use a sub-query and case expressions. For the sake of simplicity I'm assuming binary genders, but depending on the woke level of your app, you can follow the same pattern for additional genders.
select user.user_id, user.business_id
, sum(case when friends.gender = 'male' then 1 else 0 end) as MaleFriends
, sum(case when friends.gender = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id
Now we just have to grab data from the sub-query and make some decisions. You may want to add some additional options, for instance for the case where there are no friends, or where friends are evenly split between genders or sentiments. That follows the same pattern as below, just with extra values to choose from.
select user_id
, business_id
, case when MaleFriends > FemaleFriends then 'Male' else 'Female' end as MostCommonGender
, case when FriendsPositive > FriendsNegative then 'Positive' else 'Negative' end as MostCommonSentiment
from ( select user.user_id, user.business_id
, sum(case when friends.gender = 'male' then 1 else 0 end) as MaleFriends
, sum(case when friends.gender = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id) as a
This gives you the steps to follow, and hopefully a clear explanation on how they work. Good luck!
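To see the three steps working end to end, here is a rough sketch on a tiny made-up data set, using in-memory SQLite rather than Spark or MySQL (an assumption made so it runs anywhere). The rows, the aliases u and f, the filter on user_id, and the extra NULL branch for ties or missing friends are all illustrative choices, not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (user_id TEXT, gender TEXT, business_id TEXT,"
    " review_sentiment TEXT, friend_id TEXT)"
)
# Hypothetical rows: u1 has three friends; all three visited b1, none visited b2.
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?)",
    [
        ("u1", "male", "b1", "Positive", "f1"),
        ("u1", "male", "b1", "Positive", "f2"),
        ("u1", "male", "b1", "Positive", "f3"),
        ("u1", "male", "b2", "Positive", "f1"),
        ("f1", "female", "b1", "Negative", None),
        ("f2", "female", "b1", "Negative", None),
        ("f3", "male", "b1", "Positive", None),
    ],
)

FRIEND_SQL = """
SELECT user_id, business_id,
       CASE WHEN male_friends > female_friends THEN 'male'
            WHEN female_friends > male_friends THEN 'female'
       END AS most_common_gender,      -- NULL on a tie or when no friend visited
       CASE WHEN friends_positive > friends_negative THEN 'Positive'
            WHEN friends_negative > friends_positive THEN 'Negative'
       END AS most_common_sentiment
FROM (
    -- Steps 1 and 2: self join to reach the friends, then count per user/business.
    SELECT u.user_id, u.business_id,
           SUM(CASE WHEN f.gender = 'male' THEN 1 ELSE 0 END) AS male_friends,
           SUM(CASE WHEN f.gender = 'female' THEN 1 ELSE 0 END) AS female_friends,
           SUM(CASE WHEN f.review_sentiment = 'Positive' THEN 1 ELSE 0 END) AS friends_positive,
           SUM(CASE WHEN f.review_sentiment = 'Negative' THEN 1 ELSE 0 END) AS friends_negative
    FROM tags AS u
    LEFT OUTER JOIN tags AS f
           ON u.friend_id = f.user_id AND f.business_id = u.business_id
    WHERE u.user_id = ?
    GROUP BY u.user_id, u.business_id
) AS counts
ORDER BY user_id, business_id
"""
friend_rows = conn.execute(FRIEND_SQL, ("u1",)).fetchall()
for row in friend_rows:
    print(row)
```

For b1 the friends are two female/Negative and one male/Positive, so the row reads female and Negative. For b2 no friend visited, the left join only contributes NULLs, and both decisions come out as NULL instead of silently defaulting to one value.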
SELECT COUNT(NAME) AS NAMEA
FROM (DATA.1 WHERE MARKS > 50), COUNT(NAME) AS NAMEB FROM DATA.1
After running it I get this
Syntax error in JOIN operation
I am trying to get the percentage of students who passed, the number of students who passed, the total number of students, and the number of students who failed.
Please help me find what's wrong in the above query.
Thanks
You're essentially running two queries in one.
Query 1:
SELECT COUNT(NAME) AS NAMEA
FROM DATA1
WHERE MARKS > 50
Query 2:
SELECT COUNT(NAME) AS NAMEB
FROM DATA1
If you want both columns in the same query then you would have to use a SUM of a CASE WHEN for the first query.
SELECT SUM(CASE WHEN MARKS > 50 THEN 1 ELSE 0 END) AS NAMEA,
COUNT(NAME) AS NAMEB
FROM DATA1
For the rest of your points you would need the query:
SELECT COUNT(NAME) AS TOTAL_STUDENTS,
SUM(CASE WHEN MARKS > 50 THEN 1 ELSE 0 END) AS STUDENTS_PASSED,
SUM(CASE WHEN MARKS > 50 THEN 1 ELSE 0 END)/COUNT(NAME) AS PASS_RATE,
SUM(CASE WHEN MARKS < 50 THEN 1 ELSE 0 END) AS STUDENTS_FAILED
FROM DATA1
Bear in mind that this leaves out students who have exactly 50 marks. If 50 is a pass, use >= 50 for STUDENTS_PASSED and in PASS_RATE. If 50 is a fail, use <= 50 for STUDENTS_FAILED.
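Here is a quick sketch of the combined query with the boundary handled, assuming that 50 counts as a pass and using made-up student rows in an in-memory SQLite table. One portability note: SQLite divides integers as integers, so the percentage is computed with 100.0 to force floating-point arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data1 (name TEXT, marks INTEGER)")
# Hypothetical students; Bob sits exactly on the boundary.
conn.executemany(
    "INSERT INTO data1 VALUES (?, ?)",
    [("Ann", 80), ("Bob", 50), ("Cy", 30), ("Di", 65)],
)

# Assumption: 50 is a pass, so passed uses >= 50 and failed uses < 50.
# Every student then lands in exactly one of the two buckets.
total, passed, failed, pass_percent = conn.execute(
    """
    SELECT COUNT(name),
           SUM(CASE WHEN marks >= 50 THEN 1 ELSE 0 END),
           SUM(CASE WHEN marks < 50 THEN 1 ELSE 0 END),
           100.0 * SUM(CASE WHEN marks >= 50 THEN 1 ELSE 0 END) / COUNT(name)
    FROM data1
    """
).fetchone()

print(total, passed, failed, pass_percent)  # 4 3 1 75.0
```

A cheap sanity check on any variant of this query is that passed plus failed equals the total.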
For example purposes, let's say I'm trying to figure out the average score for males and females from each parent.
Example data looks like this:
parentID childID sex score
------------------------------------
1 21 m 17
1 23 f 12
2 33 f 55
2 55 m 22
3 67 m 26
3 78 f 29
3 93 m 31
This is the result I want:
parentID offspring m f avg-m avg-f avg-both
----------------------------------------------------
1 2 1 1 17 12 14.5
2 2 1 1 22 55 38.5
3 3 2 1 28.5 29 28.67
With the below query I can find the average for both males and females but I'm not sure how to get the average for either male or female
SELECT parentID, COUNT( childID ) AS offspring, SUM( IF( sex = 'm', 1, 0 ) ) AS m, SUM( IF( sex = 'f', 1, 0 ) ) AS f, max(score) as avg-both
FROM sexb_1
WHERE avg-both > 11
GROUP BY parentID
I tried something like this in the query but it returns an error
AVG(IF(sex = 'm', max(score),0)) as avg-m
You can't use one aggregate function within another (in this case, MAX() within AVG())—what would that even mean? Once one has discovered the MAX() of the group, over what is there to take an average?
Instead, you want to take the AVG() of score values where the sex matches your requirement; since AVG() ignores NULL values and the default for unmatched CASE expressions is NULL, one can simply do:
SELECT parentID,
COUNT(*) offspring,
SUM(sex='m') m,
SUM(sex='f') f,
AVG(CASE sex WHEN 'm' THEN score END) `avg-m`,
AVG(CASE sex WHEN 'f' THEN score END) `avg-f`,
AVG(score) `avg-both`
FROM sexb_1
GROUP BY parentID
HAVING `avg-both` > 11
See it on sqlfiddle.
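To make the NULL point concrete, the sketch below runs the query on the question's rows in an in-memory SQLite database (an assumption for portability; underscore aliases replace the backticked ones) and adds one extra column, avg_m_zero_filled, that shows what happens if unmatched rows are turned into 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sexb_1 (parentID INTEGER, childID INTEGER, sex TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO sexb_1 VALUES (?, ?, ?, ?)",
    [
        (1, 21, "m", 17), (1, 23, "f", 12),
        (2, 33, "f", 55), (2, 55, "m", 22),
        (3, 67, "m", 26), (3, 78, "f", 29), (3, 93, "m", 31),
    ],
)

avg_rows = conn.execute(
    """
    SELECT parentID,
           COUNT(*) AS offspring,
           SUM(sex = 'm') AS m,
           SUM(sex = 'f') AS f,
           AVG(CASE sex WHEN 'm' THEN score END) AS avg_m,   -- unmatched rows become NULL and are ignored
           AVG(CASE sex WHEN 'f' THEN score END) AS avg_f,
           AVG(score) AS avg_both,
           AVG(CASE sex WHEN 'm' THEN score ELSE 0 END) AS avg_m_zero_filled  -- zeros drag the mean down
    FROM sexb_1
    GROUP BY parentID
    HAVING AVG(score) > 11
    ORDER BY parentID
    """
).fetchall()
for row in avg_rows:
    print(row)
```

For parent 3 the male average is 28.5 with NULLs, but only 19.0 when the female row is counted as a 0.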
Using if
SELECT parentID, COUNT( childID ) AS offspring,
SUM(iF( sex='m', 1 ,0 )) AS m,
SUM(iF( sex='f', 1 ,0 )) AS f,
AVG(if(sex='m', score, null)) as avg_m,
AVG(if(sex='f', score, null)) as avg_f,
AVG(score) as avgboth
FROM sexb_1
GROUP BY parentID
HAVING avgboth > 11
fiddle
In your query the error is due to the use of avg-both as an alias. You need to wrap the alias in backticks or use an underscore instead; otherwise it is parsed as the difference of avg and both.
Also, you cannot use alias names inside the WHERE clause, because WHERE is evaluated right after the table is picked up, before the select list. At that point the database doesn't know the alias names yet. Filter on the aggregate in a HAVING clause instead.
You can try below query-
SELECT
parentID, COUNT(childID) AS `offspring`,
COUNT(IF(sex = 'm',sex ,NULL )) AS `m`, COUNT(IF(sex = 'f', sex,NULL)) AS `f`,
AVG(IF(sex = 'm',score,NULL )) AS `avg-m`, AVG(IF(sex = 'f', score,NULL)) AS `avg-f`,
AVG(score) AS `avg-both`
FROM sexb_1
GROUP BY parentID
HAVING `avg-both` > 11;
I am building a hockey score and prediction system using PHP/MySQL. Below is the system design.
I have a GAMES table that holds the two team numbers and their scores for each game. The columns of this table are as below.
ID ---- TEAM1 ---- SCORE1 ---- TEAM2 ---- SCORE2
1 70 1 73 2
2 74 0 70 1
3 74 0 73 0
I also have a PICKS table that holds the users' game predictions. Users can guess which team will win a game, and that guess is stored in this table. The columns of this table are as below. Each user can guess only once for each game.
ID ---- GAME ---- USER ---- TEAM ---- POINT
1 1 1 70 1
2 2 1 70 1
3 3 1 73 1
3 1 2 70 1
Based on the above data, I am trying to build a result where each user (column USER) is awarded the points (column POINT) for each correct guess. A guess can be validated against the scores in the GAMES table. The final output should look like this:
USER ---- POINTS ---- CORRECT GUESS COUNT ---- WRONG GUESS COUNT
1 1 1 2
2 0 0 1
The columns "CORRECT GUESS COUNT" and "WRONG GUESS COUNT" represent the total number of correct and wrong guesses made by the user.
I have created a SQL Fiddle for the above tables with some sample data.
http://sqlfiddle.com/#!2/8d469/4/0
EDIT:
Some more information is below. It's possible that a game can be a draw. In that case the score will be 0 for each team. When a game is a draw, users get no points.
SELECT p.user,
SUM(IF(g.id IS NOT NULL, p.point, 0)) As points,
SUM(IF(g.id IS NOT NULL, 1, 0)) Correct,
SUM(IF(g.id IS NULL, 1, 0)) Wrong
FROM Games g
RIGHT JOIN Picks p ON g.id = p.game AND
p.team = IF(g.score1 > g.score2 , g.team1, IF(g.score1 < g.score2, g.team2, NULL))
GROUP BY p.user;
SQL Fiddle (with your data)
You'll have to forgive me, if there is a more MySQL way to do it than this (background is Oracle/SQL Server):
SELECT
p.user
,sum(CASE
WHEN p.team = g.winner THEN point ELSE 0 END) points
,sum(CASE
WHEN p.team = g.winner THEN 1 ELSE 0 END) good_guess
,sum(CASE
WHEN p.team <> g.winner THEN 1 ELSE 0 END) bad_guess
FROM
picks p
INNER JOIN (
SELECT
id game_id
,CASE
WHEN score1 > score2 THEN team1
WHEN score2 > score1 THEN team2
ELSE -1 -- no team_id is negative, so -1 marks a draw
END winner
FROM
games
) g
ON
g.game_id = p.game
GROUP BY
p.user
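Here is a runnable sketch of the same idea on the question's sample rows, using in-memory SQLite as a stand-in for MySQL (an assumption). It is a small variation rather than either answer verbatim: a draw is represented by a NULL winner instead of -1, so the wrong-guess column tests for NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER, team1 INTEGER, score1 INTEGER, team2 INTEGER, score2 INTEGER)")
conn.execute("CREATE TABLE picks (id INTEGER, game INTEGER, user INTEGER, team INTEGER, point INTEGER)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?, ?)",
    [(1, 70, 1, 73, 2), (2, 74, 0, 70, 1), (3, 74, 0, 73, 0)],
)
conn.executemany(
    "INSERT INTO picks VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 70, 1), (2, 2, 1, 70, 1), (3, 3, 1, 73, 1), (3, 1, 2, 70, 1)],
)

SCORE_SQL = """
SELECT p.user,
       SUM(CASE WHEN p.team = g.winner THEN p.point ELSE 0 END) AS points,
       SUM(CASE WHEN p.team = g.winner THEN 1 ELSE 0 END) AS good_guess,
       SUM(CASE WHEN g.winner IS NULL OR p.team <> g.winner THEN 1 ELSE 0 END) AS bad_guess
FROM picks AS p
JOIN (
    -- Derive the winner once per game; a draw has no winner (NULL).
    SELECT id AS game_id,
           CASE WHEN score1 > score2 THEN team1
                WHEN score2 > score1 THEN team2
           END AS winner
    FROM games
) AS g ON g.game_id = p.game
GROUP BY p.user
ORDER BY p.user
"""
score_rows = conn.execute(SCORE_SQL).fetchall()
print(score_rows)  # [(1, 1, 1, 2), (2, 0, 0, 1)]
```

User 1 gets game 2 right and games 1 and 3 wrong (game 3 is a draw), which matches the table in the question.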
I'm working with MySQL, and I have the following schema:
id school round score win loss tie
2 My School Name 1 10 1 0 0
3 My School Name 2 20 0 1 0
4 My School Name 3 30 1 0 0
5 My School Name 4 40 1 0 0
6 My School Name 5 50 1 0 0
7 My School Name 6 60 0 0 1
And I need the following output, grouped by school name
School Round1 Round2 Round3 Round4 Round5 Round6 wins losses ties
My School Name 10 20 30 40 50 60 4 1 1
So far I feel like I can use GROUP BY School and SUM(win) as wins to get most of the functionality out of it. The hard part, though, is to get those Round_ fields.
Does anyone know how to do this? Thanks in advance, any help would be much appreciated!
Edit: to clarify, I know I have 10 rounds exactly.
We can use a SELECT statement with a GROUP BY school to create a record for each school. The ties, wins, and losses columns are readily calculated with the SUM aggregate function, as you noted. To target a specific round, we can use some clever math (to avoid verbose conditional statements like the one CodeByMoonlight suggested):
If we want to target round R, we note that "round-R" is 0 only when round = R; otherwise it is nonzero. When we take the NOT of "round-R", 0 gets inverted to 1, while everything else gets set to 0. Now, if we multiply !(round-R) by the score of that round, it will give us 0 when the round is not R (as 0*score = 0) and it will give us "score" when the round is R (as 1*score = score). Next, when we take the SUM of this value over the rows of each school, we add score when round = R and 0 otherwise, effectively giving us just the round R score.
Putting that all together gives:
SELECT school AS `School`,
SUM(!(round-1)*score) AS `Round1`,
SUM(!(round-2)*score) AS `Round2`,
SUM(!(round-3)*score) AS `Round3`,
SUM(!(round-4)*score) AS `Round4`,
SUM(!(round-5)*score) AS `Round5`,
SUM(!(round-6)*score) AS `Round6`,
SUM(!(round-7)*score) AS `Round7`,
SUM(!(round-8)*score) AS `Round8`,
SUM(!(round-9)*score) AS `Round9`,
SUM(!(round-10)*score) AS `Round10`,
SUM(win) AS `wins`,
SUM(loss) AS `losses`,
SUM(tie) AS `ties`
FROM `RoundScores` GROUP BY `school`
where RoundScores is the table in question.
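As a sanity check on the arithmetic, the sketch below computes the round 2 score three ways and confirms they agree. It uses in-memory SQLite as a stand-in (an assumption), and SQLite has no ! operator, so the trick is spelled NOT with explicit parentheses; the table name round_scores and the column aliases are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE round_scores (id INTEGER, school TEXT, round INTEGER,"
    " score INTEGER, win INTEGER, loss INTEGER, tie INTEGER)"
)
conn.executemany(
    "INSERT INTO round_scores VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (2, "My School Name", 1, 10, 1, 0, 0),
        (3, "My School Name", 2, 20, 0, 1, 0),
        (4, "My School Name", 3, 30, 1, 0, 0),
        (5, "My School Name", 4, 40, 1, 0, 0),
        (6, "My School Name", 5, 50, 1, 0, 0),
        (7, "My School Name", 6, 60, 0, 0, 1),
    ],
)

school, via_not, via_equality, via_case = conn.execute(
    """
    SELECT school,
           SUM((NOT (round - 2)) * score) AS via_not,              -- NOT 0 is 1, NOT nonzero is 0
           SUM((round = 2) * score) AS via_equality,               -- a comparison is already 0 or 1
           SUM(CASE WHEN round = 2 THEN score ELSE 0 END) AS via_case
    FROM round_scores
    GROUP BY school
    """
).fetchone()

print(school, via_not, via_equality, via_case)  # My School Name 20 20 20
```

All three forms pick out the same rows, so the choice between them is mostly about readability and portability.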
EDIT:
If we do not want to manually add 10, we can use prepared statements:
# Store all the conditionals in a string:
# I was not able to have round loop from 1 to 10, so I iterated over
# all distinct values of 'round' present in the table.
SET @s = "";
SELECT `round`, (@s := CONCAT( @s , "SUM(!(round-",round, ")*score) AS `Round",round, "`," )) FROM `RoundScores` GROUP BY `round`;
# Combine the conditionals from before with the rest of the statement needed.
SET @qry = CONCAT("SELECT school AS `School`,",@s,"SUM(win) AS `wins`,SUM(loss) AS `losses`,SUM(tie) AS `ties` FROM `RoundScores` GROUP BY `school`");
# Prepare and execute the statement.
PREPARE stmt1 FROM @qry;
EXECUTE stmt1;
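If the query is issued from application code anyway, the column list can also be built on the client. This is a rough Python sketch of that alternative, not the prepared-statement approach itself: it reads the distinct rounds, generates one SUM(CASE ...) column per round, and runs the result. SQLite and the table name round_scores are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE round_scores (school TEXT, round INTEGER, score INTEGER,"
    " win INTEGER, loss INTEGER, tie INTEGER)"
)
conn.executemany(
    "INSERT INTO round_scores VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("My School Name", 1, 10, 1, 0, 0),
        ("My School Name", 2, 20, 0, 1, 0),
        ("My School Name", 3, 30, 1, 0, 0),
        ("My School Name", 4, 40, 1, 0, 0),
        ("My School Name", 5, 50, 1, 0, 0),
        ("My School Name", 6, 60, 0, 0, 1),
    ],
)

# Step 1: find which rounds actually occur in the table.
rounds = [r for (r,) in conn.execute("SELECT DISTINCT round FROM round_scores ORDER BY round")]

# Step 2: one pivot column per round. int() guarantees that only numbers are
# spliced into the SQL text.
round_columns = ", ".join(
    f"SUM(CASE WHEN round = {int(r)} THEN score ELSE 0 END) AS Round{int(r)}"
    for r in rounds
)
pivot_sql = (
    f"SELECT school, {round_columns}, "
    "SUM(win) AS wins, SUM(loss) AS losses, SUM(tie) AS ties "
    "FROM round_scores GROUP BY school"
)

# Step 3: run the generated statement.
cursor = conn.execute(pivot_sql)
column_names = [description[0] for description in cursor.description]
pivot_row = cursor.fetchone()
print(column_names)
print(pivot_row)
```

As with the prepared statement, only rounds present in the data get a column, so a round nobody has played yet will not appear.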
Try with UNION (not tested):
SELECT SUM(win) AS wins, SUM(loss) AS losses, SUM(tie) AS ties
FROM table
GROUP BY (school)
UNION
SELECT score AS round1 FROM table WHERE round=1
UNION
SELECT score AS round2 FROM table WHERE round=2
.... AND so on..
SELECT School,
       Sum(Case When Round = 1 Then Score Else 0 End) AS Round1,
       Sum(Case When Round = 2 Then Score Else 0 End) AS Round2,
       Sum(Case When Round = 3 Then Score Else 0 End) AS Round3,
       Sum(Case When Round = 4 Then Score Else 0 End) AS Round4,
       Sum(Case When Round = 5 Then Score Else 0 End) AS Round5,
       Sum(Case When Round = 6 Then Score Else 0 End) AS Round6,
       Sum(Case When Round = 7 Then Score Else 0 End) AS Round7,
       Sum(Case When Round = 8 Then Score Else 0 End) AS Round8,
       Sum(Case When Round = 9 Then Score Else 0 End) AS Round9,
       Sum(Case When Round = 10 Then Score Else 0 End) AS Round10,
       Sum(win) AS Wins,
       Sum(loss) AS Losses,
       Sum(tie) AS Ties
FROM MyTable
GROUP BY School
Should work :)