Determining Rookie Years in Lahman Database - mysql

I'm using the MySQL version of the Lahman Baseball Database and I'm having trouble trying to determine the year a player lost their rookie standing. The rules for an MLB player losing rookie standing are:
A player shall be considered a rookie unless, during a previous season or seasons, he has (a) exceeded 130 at-bats or 50 innings pitched in the Major Leagues; or (b) accumulated more than 45 days on the active roster of a Major League club or clubs during the period of 25-player limit (excluding time in the military service and time on the disabled list).
Is there a query that can be run to do this for Batters and Pitchers, or is this something that would be programmatically done?

Using the Lahman Database you can identify rookies by at-bats (>130) and innings pitched (>50); however, it has nothing on service time under the 25-man roster limit (i.e., outside September).
You would need Retrosheet {http://www.retrosheet.org/game.htm} data to do that.
The queries below give you ALL of the rookies by at-bats and innings pitched; players who lost rookie status on service time alone are the exception. There are only a few of those, as teams don't tend to keep rookies on the MLB roster without playing them: the players lose development time (not playing) and accrue service time, which costs the team controlled years. So if you're happy with that, these tables will do.
You can use this as an Xref table joined to batters or pitchers to highlight their rookie year. Or you could add an extra column to the batting and pitching tables with the RookieYr distinction (I advise against it: the less you customize, the easier it is to add new seasons to your Lahman DB).
/************************************ Create MLB Rookie Xref Table **********************************************
-- Sort out batters who accumulate more than 130 AB
-- Sort out pitchers who accumulate more than 50 IP
-- Define rookie year, drop off years before and after
-- Can be updated annually using "playerID not in (select distinct playerID from Xref_RookieYr)"
-- Using the Sean Lahman Database
-- Authored By Paul DeVos {www.linkedin.com/in/devosp/}
*****************************************************************************************************************/
/****** Query uses T-SQL and was run in MS SQL Server 2012 - you may need to tweak it for other platforms or versions. ******/
--Step 1 - Run this to get each hitter's accumulated ABs and rookie year (the season in which career at-bats first exceed 130)
Select
concat(m.nameFirst, ' ', m.nameLast) as Name,
b.PlayerID,
b.yearID,
m.debut,
sum(b.ab) over (partition by b.playerID order by b.yearID) as CumulativeAB,
null as CumulativeIP, -- Place Holder for Rookie Pitchers Insert
-- The rule says "exceeded 130 at-bats", so the comparison is strict (> rather than >=)
case when sum(b.ab) over (partition by b.playerID order by b.yearID) > 130 then b.yearID end as RookieYR
into #temp_rookie_year
from
[master] m
inner join Batting b
on m.playerID=b.playerID
-- Selects Position Players
where b.playerID not in (select distinct f.playerID from Fielding f where f.pos = 'P')
--Step 2 - Run this to get each pitcher's accumulated IP and rookie year (the season in which career IP first exceed 50)
Insert into #temp_rookie_year
(
Name, PlayerID, YearID, Debut, CumulativeAB, CumulativeIP, RookieYR
)
Select
concat(m.nameFirst, ' ', m.nameLast) as Name,
p.PlayerID,
p.yearID,
m.debut,
null as CumulativeAB,
-- IPouts counts outs recorded, so CumulativeIP here is in outs (3 per inning)
sum(p.IPouts) over (partition by p.playerID order by p.yearID) as CumulativeIP,
-- The rule says "exceeded 50 innings pitched", i.e. more than 150 outs
case when sum(p.IPouts) over (partition by p.playerID order by p.yearID) > 150 then p.yearID end as RookieYR
from [master] m
inner join pitching p
on m.playerID=p.playerID
--Chooses Pitchers
where p.playerID in (select distinct f.playerID from Fielding f where f.pos = 'P')
--Step 3 Run this - sorts out the rookie year into Rookie Xref Table
select Name, PlayerID, min(RookieYr) as RookieYear
into #Xref_RookieYr
from #temp_rookie_year
--where name = 'Hank Aaron'
group by Name, PlayerID
order by RookieYear desc
--Step 4 - Run this IF you want to remove players who never lost rookie status (cup-of-coffee players, etc. - anyone who never exceeded 130 AB or 50 IP)
select * from #Xref_RookieYr
order by playerID
Delete from #Xref_RookieYr where RookieYear is null
select * from #Xref_RookieYr
order by playerID
/*****************************************************************************************************************
You can drop the "#" in front of the table name (and name it whatever you want, e.g. Xref_Rookie_2013) when you want a permanent table.
If you leave the "#", the table is temporary and is dropped when you close the session.
*****************************************************************************************************************/
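The core of the steps above (running career total per player, then the first season in which that total exceeds the limit) can be sketched on a toy table. This is only an illustration, not the author's script: it uses Python's built-in sqlite3 module as a stand-in for MySQL or SQL Server, the sample rows are invented, and it uses a self-join for the running total rather than a window function. Only the Batting side is shown; the pitching side would compare cumulative IPouts against 150 in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batting (playerID TEXT, yearID INTEGER, AB INTEGER);
-- Invented sample rows, not real Lahman data
INSERT INTO Batting VALUES
  ('aaa01', 2001, 100),  -- 100 career AB: limit not yet exceeded
  ('aaa01', 2002, 31),   -- 131 career AB: limit exceeded in 2002
  ('aaa01', 2003, 500),
  ('bbb01', 2005, 130);  -- exactly 130 AB does not exceed the limit
""")

# Running total per player-season via a self-join, then the earliest
# season whose running total exceeds 130. The DISTINCT keeps players
# with several stints in one year from being double counted.
ROOKIE_YEAR_SQL = """
SELECT playerID, MIN(yearID) AS RookieYear
FROM (
    SELECT cur.playerID, cur.yearID, SUM(prev.AB) AS CumulativeAB
    FROM (SELECT DISTINCT playerID, yearID FROM Batting) AS cur
    JOIN Batting AS prev
      ON prev.playerID = cur.playerID AND prev.yearID <= cur.yearID
    GROUP BY cur.playerID, cur.yearID
) AS running
WHERE CumulativeAB > 130
GROUP BY playerID
"""

rookie_years = dict(conn.execute(ROOKIE_YEAR_SQL).fetchall())
print(rookie_years)
```

Players who never exceed the limit simply do not appear in the result, which matches the effect of the optional Step 4 delete.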

This can be done in SQL; how best to do it depends on what performs well for your data. Most likely it could be done with one query like so (pseudo-code):
SELECT Master.*
FROM Master
LEFT JOIN Batting ON Master.playerID = Batting.playerID
LEFT JOIN Pitching ON Master.playerID = Pitching.playerID
WHERE Batting.AB > 130 OR Pitching.IPouts > (50 * 3)
OR Master.DaysActive > 45 -- DaysActive is a hypothetical column
That last part of the WHERE statement is a bit iffy because I don't find anything like that in the data from your database provider. I see active games but that isn't the same thing. The Appearances table might get you close but that is about all you can do.
Here is the data I based my pseudo-code off of:
http://baseball1.com/files/database/readme58.txt
I did find another guy who was doing something similar to what you are doing (including calculating who is a rookie). Here is his site (with code):
http://baseballsimulator.com/blog/category/database/

Related

sql get all available people that are not booked on a particular date and time

I'm struggling with an SQL query.
I am building a booking system for a ski resort and in my database I have instructors and sessions. A session can have an instructor, and it has a date and startTime and endTime.
In order to add a session, I want to get all available instructors for a chosen time and date. In other words, all instructors who don't have a booking on that date and at that time.
Table example:
instructors: i1, i2, i3, i4, i5, i6, i7, i8
sessions:
Instructor | date       | start    | end
i1         | 2017-05-03 | 14:30:00 | 15:30:00
i2         | 2017-05-03 | 14:30:00 | 15:30:00
i3         | 2017-10-03 | 10:30:00 | 11:30:00
i4         | 2017-05-03 | 10:30:00 | 11:30:00
i1         | 2017-11-03 | 14:30:00 | 15:30:00
Then for input date='2017-05-03', start='14:30' and end='15:30' I want to get
i3, i4, i5, i6, i7, i8
I figured out that I need to left join sessions to instructors, group by instructor id, and then eliminate those ids that have a row in the group matching the selected inputs. However, for the GROUP BY clause I have to use an aggregate function, and I don't know which one could apply here.
SirWinning's self-answer looks like it should work, but my version below removes some parts which weren't required.
select *
from instructor
where id not in
(select instructorid
from Session
where date='2017-03-19' and starttime<='15:30:00' and endtime>='14:30:00')
This code will find any instructors who aren't booked for a session which overlaps the 14:30-15:30 window on the relevant date.
If that's what's wanted, then you're good to go. Of course it doesn't follow that the instructor is "really available". There could be other things which affect their availability (working hours, annual leave etc.), so you'll need to ensure that something is in place to handle those cases.
Note also, that this code will prevent an instructor appearing available for "back to back" bookings. If you want to allow a booking to start at 14:30 when another one ends at that time, you'll need to change the <= and >= to < and >.
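To make the difference between the inclusive (<=, >=) and strict (<, >) overlap tests concrete, here is a small sketch. It is an assumption-laden illustration: it runs on Python's built-in sqlite3 module rather than MySQL, the sample rows are invented, and the helper name `available` is hypothetical. The table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (id TEXT PRIMARY KEY);
CREATE TABLE session (instructorid TEXT, date TEXT, starttime TEXT, endtime TEXT);
INSERT INTO instructor VALUES ('i1'), ('i2'), ('i3');
INSERT INTO session VALUES
  ('i1', '2017-05-03', '14:00:00', '15:00:00'),  -- overlaps the requested window
  ('i2', '2017-05-03', '13:30:00', '14:30:00');  -- ends exactly when the window starts
""")

def available(strict):
    """Instructors free on 2017-05-03 between 14:30 and 15:30."""
    # The operators are fixed strings chosen here, never user input.
    lt, gt = ("<", ">") if strict else ("<=", ">=")
    sql = f"""
        SELECT id FROM instructor
        WHERE id NOT IN (
            SELECT instructorid FROM session
            WHERE date = ? AND starttime {lt} ? AND endtime {gt} ?
        )
        ORDER BY id
    """
    rows = conn.execute(sql, ("2017-05-03", "15:30:00", "14:30:00"))
    return [r[0] for r in rows]

inclusive = available(strict=False)  # back-to-back sessions count as a clash
exclusive = available(strict=True)   # back-to-back sessions are allowed
print(inclusive, exclusive)
```

With the inclusive test, i2 is treated as busy because the earlier session ends at 14:30; with the strict test, i2 becomes available again while i1 stays blocked.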
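Using not exists():
select *
from instructors i
where not exists (
select 1
from sessions s
where s.instructor = i.id
and s.date = '2017-05-03'
-- overlap test rather than exact equality, so a partially overlapping session also blocks the instructor
and s.start < '15:30:00'
and s.end > '14:30:00'
)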
So I tried this query and apparently it works (at least for my test case).
Can anybody take a look and tell me if it looks correct?
select *
from instructor
where id in
(select id
from instructor
group by id
having id not in
(select distinct(instructorid) from Session
where date='2017-03-19' and starttime<='15:30:00' and endtime>='14:30:00') )

Relational Database Logic

I'm fairly new to php / mysql programming and I'm having a hard time figuring out the logic for a relational database that I'm trying to build. Here's the problem:
I have different leaders who will be in charge of a store anytime between 9am and 9pm.
A customer who has visited the store can rate their experience on a scale of 1 to 5.
I'm building a site that will allow me to store the shifts that a leader worked as seen below.
When I hit submit, the site would take the data leaderName:"George", shiftTimeArray: 11am, 1pm, 6pm (from the example in the picture) and the shiftDate and send them to an SQL database.
Later, I want to be able to get the average score for a person by sending a query to mysql, retrieving all of the scores that that leader received and averaging them together. I know the code to build the forms and to perform the search. However, I'm having a hard time coming up with the logic for the tables that will relate the data. Currently, I have a mysql table called responses that contains the following fields,
leader_id
shift_date // contains the date that the leader worked
shift_time // contains the time that the leader worked
visit_date // contains the date that the survey/score was given
visit_time // contains the time that the survey/score was given
score // contains the actual score of the survey (1-5)
I enter the shifts that the leader works at the beginning of the week and then enter the survey scores in as they come in during the week.
So Here's the Question: What mysql tables and fields should I create to relate this data so that I can query a leader's name and get the average score from all of their surveys?
You want tables like:
Leader (leader_id, name, etc)
Shift (leader_id, shift_date, shift_time)
SurveyResult (visit_date, visit_time, score)
Note: omitted the surrogate primary keys for Shift and SurveyResult that I would probably include.
To query, you join shifts and surveys, group on leader and take the average, then join that back to Leader for a name.
The query might be something like this (but I haven't actually built it in MySQL to verify the syntax):
SELECT name
,AverageScore
FROM Leader a
INNER JOIN (
SELECT leader_id
, AVG(score) AverageScore
FROM Shift
INNER JOIN
SurveyResult ON shift_date = visit_date
AND shift_time = visit_time -- depends on how you are recording time what this really needs to be
GROUP BY leader_id
) b ON a.leader_id = b.leader_id
I would use the following structure:
leaders
id
name
leaders_timetable (can be multiple per leader)
id
leader_id
shift_datetime (I assume it stores the date and hour here; minutes and seconds are always 0)
survey_scores
id
visit_datetime
score
SELECT l.id, l.name, AVG(s.score) FROM leaders l
INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
INNER JOIN survey_scores s ON lt.shift_datetime = DATE_FORMAT(s.visit_datetime, '%Y-%m-%d %H:00:00')
GROUP BY l.id, l.name
DATE_FORMAT here truncates the minutes and seconds from visit_datetime so that it can be matched against shift_datetime. This is a MySQL function, so if you use something else you'll need a different function.
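The hour-truncation join can be tried out on a few rows. The sketch below is an illustration under assumptions: it runs on Python's built-in sqlite3 module, so SQLite's strftime stands in for MySQL's DATE_FORMAT, and all sample rows are invented. The table and column names follow the structure above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leaders (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE leaders_timetable (leader_id INTEGER, shift_datetime TEXT);
CREATE TABLE survey_scores (visit_datetime TEXT, score INTEGER);
INSERT INTO leaders VALUES (1, 'George'), (2, 'Anna');
INSERT INTO leaders_timetable VALUES
  (1, '2017-05-03 11:00:00'), (1, '2017-05-03 13:00:00'),
  (2, '2017-05-03 12:00:00');
INSERT INTO survey_scores VALUES
  ('2017-05-03 11:15:42', 5), ('2017-05-03 13:59:59', 2),
  ('2017-05-03 12:30:00', 4);
""")

# strftime zeroes the minutes and seconds of each visit so that it can be
# compared with the on-the-hour shift_datetime values.
sql = """
SELECT l.name, AVG(s.score)
FROM leaders l
JOIN leaders_timetable lt ON lt.leader_id = l.id
JOIN survey_scores s
  ON lt.shift_datetime = strftime('%Y-%m-%d %H:00:00', s.visit_datetime)
GROUP BY l.id, l.name
ORDER BY l.name
"""
averages = dict(conn.execute(sql).fetchall())
print(averages)
```

George's two surveys (5 and 2) average to 3.5, and the 12:30 survey is attributed to Anna because it falls within her 12:00 shift hour.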
Say you have a 'leader' who has 5 survey rows with scores 1, 2, 3, 4 and 5.
If you select all surveys for this leader, sum the survey scores and divide the sum by 5 (the total number of surveys that this leader has), you will have the average, in this case 3.
(1 + 2 + 3 + 4 + 5) / 5 = 3
You wouldn't need to create any more tables or fields, you have what you need.

Count, max, and multiple sub querys SQL

I'm currently working on a league system for my sports team: a ladder, as seen in some video games.
It's a mobile web site that allows coaches to create games and monitor players' performances.
Games are balanced automatically, taking into account players' experience and points; then I give bonus points to all the players on the winning team and remove points from the losers.
I have a relatively simple database. 3 tables.
User : id - name
Games : id - ETA - cration_date
game_joueur: id- id_game - id_joueur - team - result - bonus
game_joueur is an association table in which I register, for each new game, each player's id and the team he has been seeded on; afterwards, I update the bonus field with the points earned and the result field with an integer (1 = lose, 2 = win).
That way I can sum the bonus on my players' stats and get the total points.
You can have a better look at the table here :
http://sqlfiddle.com/#!2/d3e06/2
What I'm trying to accomplish is, for each player's stat page, to retrieve from the database the name of his most successful partner (the guy with whom he won the most games), and also his worst ally, the one with whom he lost the most matches.
This is what I do on my user stat page:
SELECT
(SELECT COUNT(lad_game_joueur.result) FROM lad_game_joueur WHERE result = 1 AND lad_game_joueur.id_joueur = lad_user.id) as lose,
(SELECT SUM(lad_game_joueur.bonus) FROM lad_game_joueur WHERE lad_game_joueur.id_joueur = lad_user.id) as points,
lad_user.id as id ,
(SELECT COUNT(lad_game_joueur.result) FROM lad_game_joueur WHERE lad_game_joueur.id_joueur = lad_user.id AND result =2) as win,
lad_user.name
FROM lad_user,lad_game_joueur
WHERE lad_game_joueur.id_joueur = lad_user.id AND lad_user.id
='.$id_joueur.'
GROUP BY lad_user.id
ORDER BY points DESC
I'm sure this is not the best way to do it, but it works :) (I'm no SQL specialist)
How can I tune this query to also retrieve the information I'm looking for?
I won't mind doing another query.
Thanks a lot in advance!
Ben
OK, I finally found a way.
Here's what I did:
SELECT
SUM(result)as result_sum, sum(Bonus) as bonus_sum, id_joueur
from lad_game_joueur
where result= 2
and id_game in
(SELECT lad_game_joueur.id_game from lad_game_joueur,lad_game where id_joueur=2
AND result= 2 and lad_game_joueur.id_game=lad_game.id)
group by id_joueur
order by result_sum DESC, bonus_sum desc
As you can see, the sum of result would give me 4 if I won two games with the person, but I just divide by 2 in PHP and voilà :)
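An alternative that avoids dividing by 2 afterwards is to count shared games directly with a self-join on the association table. The sketch below is not the poster's query: it is a hedged illustration run on Python's built-in sqlite3 module with invented rows, the helper name `partners` is hypothetical, and it assumes the team column holds a value shared by teammates within a game. It also excludes the player himself from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lad_game_joueur (id_game INTEGER, id_joueur INTEGER, team INTEGER, result INTEGER);
-- Invented sample rows; result: 1 = lose, 2 = win
INSERT INTO lad_game_joueur VALUES
  (1, 2, 1, 2), (1, 3, 1, 2), (1, 4, 2, 1),
  (2, 2, 1, 2), (2, 3, 1, 2), (2, 5, 1, 2),
  (3, 2, 2, 1), (3, 4, 2, 1), (3, 3, 1, 2);
""")

def partners(player_id, result):
    """Teammates of player_id in games with the given result, most frequent first."""
    sql = """
        SELECT mate.id_joueur, COUNT(*) AS games
        FROM lad_game_joueur AS me
        JOIN lad_game_joueur AS mate
          ON mate.id_game = me.id_game
         AND mate.team = me.team
         AND mate.id_joueur <> me.id_joueur
        WHERE me.id_joueur = ? AND me.result = ?
        GROUP BY mate.id_joueur
        ORDER BY games DESC, mate.id_joueur
    """
    return conn.execute(sql, (player_id, result)).fetchall()

best = partners(2, 2)   # teammates in games player 2 won
worst = partners(2, 1)  # teammates in games player 2 lost
print(best, worst)
```

The first row of each list is the most successful partner or the worst ally; joining the result to the user table would give the name.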

mysql query - multiple counts using left join and where clause

I'm currently trying to get the following data:
UserName, UserImageURL, Total Games Played, Games Completed, Games Lost, Average Won (as percentage) and Points of the user
And as well another set of data:
User Statistics data such as:
Most Games Played on League: 23 - Monster Killers
Games Most Won On: 19/23 - Monster Killers
Games Most Lost On: 3/32 - Frog Racers
Your Game Winning Accuracy (total from all games) - 68% accuracy
Site Stats:
Most Games Played on League: 650 - Helicopter Run
Top Game Played: 1200 - Monster Killers
Whole site winning accuracy: 82%
I have the following Tables:
-User Table-
userID (int-pk), userName (varchar), userImageUrl (text)
-Games table-
gameId (int-pk), gameName (varchar), gameUserID (int), gameLeagueId (int), score1 (int), score2 (int), gameResultOut (0 or 1), gameWon (0 or 1)
-UserBalance table-
ubId(int-pk) userId (int) balance (int)
-League table-
leagueId (int-pk) leagueName (varchar)
Just to give you a heads up on what's happening, when a user plays a game and chooses some results a row is inserted into the games table. Since the game is time based, when the results are out, there is a check that checks if there are any games which have that id and will update the gameResultOut to 1 and gameWon to 1 or 0 according to what the user had selected as a score.
I tried the following:
SELECT u.userID, u.userName, u.userImageUrl, l.leagueName ,
COUNT(g.gameId) AS predTotal,
(SELECT COUNT(g.gameId) FROM games AS g WHERE g.gameResultOut = 1 AND g.gameWon = 1) AS gamesWon,
(SELECT COUNT(g.gameId) FROM games AS g WHERE g.gameResultOut = 1 AND g.gameWon = 0) AS gamesLost,
ub.balance
FROM games AS g
LEFT JOIN league AS l ON l.leagueId = g.gameLeagueId
LEFT JOIN user AS u ON u.user_id = g.gameUserID
LEFT JOIN user_balance AS ub ON ub.userId = u.userID
WHERE l.leagueId = 4
GROUP BY u.userId
ORDER BY ub.balance DESC
I can easily calculate the win percentage after the query, so that's not a problem, but the results for wins and losses are the same for every user, and even when I change the leagueId the results are still the same, which is not what I want.
Can anyone help?
Thanks & Regards,
Necron
As far as I see, the games table stores games that users played. So, in order to know how many games each user played/won/lost, you're missing the link in the subqueries between games and users.
Your subqueries are:
(SELECT COUNT(g.gameId ) FROM games AS g WHERE g.gameResultOut = 1 AND g.gameWon = 1) AS gamesWon,
(SELECT COUNT(g.gameId) FROM games AS g WHERE g.gameResultOut = 1 AND g.gameWon = 0) AS gamesLost,
And they should be:
(SELECT COUNT(gw.gameId ) FROM games AS gw WHERE gw.gameResultOut = 1 AND gw.gameWon = 1 AND gw.gameUserID = u.user_id) AS gamesWon,
(SELECT COUNT(gl.gameId) FROM games AS gl WHERE gl.gameResultOut = 1 AND gl.gameWon = 0 AND gl.gameUserID = u.user_id) AS gamesLost,
I guess this is what you're looking for :)
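A different way to get per-user counts, shown here only as a sketch, is conditional aggregation: the wins and losses are summed inside the same grouped query, so no correlated subquery is needed and the league filter applies to all three counts. The example runs on Python's built-in sqlite3 module with invented rows; the column names follow the games table described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (gameId INTEGER PRIMARY KEY, gameUserID INTEGER,
                    gameLeagueId INTEGER, gameResultOut INTEGER, gameWon INTEGER);
-- Invented sample rows
INSERT INTO games VALUES
  (1, 10, 4, 1, 1), (2, 10, 4, 1, 0), (3, 10, 4, 0, 0),
  (4, 11, 4, 1, 1), (5, 11, 4, 1, 1), (6, 10, 7, 1, 1);
""")

# Each CASE contributes 1 for a matching row and 0 otherwise, so the SUMs
# are per-user counts restricted to the league in the WHERE clause.
sql = """
SELECT gameUserID,
       COUNT(*) AS predTotal,
       SUM(CASE WHEN gameResultOut = 1 AND gameWon = 1 THEN 1 ELSE 0 END) AS gamesWon,
       SUM(CASE WHEN gameResultOut = 1 AND gameWon = 0 THEN 1 ELSE 0 END) AS gamesLost
FROM games
WHERE gameLeagueId = ?
GROUP BY gameUserID
ORDER BY gameUserID
"""
stats = conn.execute(sql, (4,)).fetchall()
print(stats)
```

Game 3 has no result yet, so it counts toward predTotal but toward neither wins nor losses; game 6 belongs to another league and is ignored.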
EDIT based on comments, adding tips for User and Site statistics:
For that information you'll need to run several distinct queries, as most of them sum some values and/or group by a given column in a way that won't fit another query. I'll try to give you some ideas so you can work on them.
User Statistics
Most Games Won or Lost
The previous answer for the query you provided counts how many times the user has lost/won any game, but does not break this data down by game.
So, if you want to know in which game the user has the most wins/losses, you should have something like this:
SELECT
g.gameName,
-- How many times the user won each game (MySQL evaluates the condition to 1 or 0)
SUM(g.gameResultOut = 1 AND g.gameWon = 1) AS gamesWon,
-- How many times the user played each game
COUNT(g.gameId) AS gamesPlayed,
-- The win ratio. This may need a little work, depending on what you want.
-- Be aware that if a user played a game 1 time and won, its ratio will be 1 (100%)
-- So maybe you'll want to add your own rule to determine which game should show up here
-- (column aliases cannot be reused in the same SELECT list, so the expressions are repeated)
SUM(g.gameResultOut = 1 AND g.gameWon = 1) / COUNT(g.gameId) AS winRatio
FROM
games g
INNER JOIN user u ON u.user_id = g.gameUserID
-- Groups and counts data based on game name + user
-- (gameId is the primary key, so grouping by it would leave one row per group)
GROUP BY g.gameName, u.user_id
-- Now you order by the win ratio
ORDER BY winRatio DESC
-- And get only the first result, i.e. the game with the highest win ratio.
LIMIT 1
For lost games, it's pretty much the same query, changing the desired fields and maths.
Game winning accuracy
Much like the previous query, except that you won't group by the game anymore. Just group by the user and do your math.
Site Statistics
Well, as far as I can see, we're still on a similar query. The difference is that for the whole-site statistics you won't ever group by user. You may group by game or league, depending on what you are trying to achieve.
Bottom line: it looks like most queries are similar; you'll have to play with them and adapt them for each piece of information you need to retrieve. Please note that they might not work as-is, since I could not test them on your DB. You may need to correct some inconsistencies according to your database/table schema.
I hope this may give you some insight to work on.

Getting scores from MySQL - better option than sub-queries?

I'm building a website for a friend (kind of a hobby thing, not for anything pro/work related) that'll store information about players, games and scores. I have built most of the reporting/statistical info but I want to display how many times a player hit the max score and am wondering if I can improve my idea (based on sub-queries). My 'scores' table is set out as so:
scores (id, gameID, playerID, venueID, versusID, score1, score2, score3, score4, score5, total, seasonID) - all the xID's are foreign keys.
The premise is that a new entry is made per game, per player so I have PHP insert data from text fields etc. This means that say there's 20 games in a season and for score1 'John Smith' hits the max score of 10 4 times that season. But he also hits it 8 times on score2, 6 times on score3 etc (and obviously, these could be in different games). So at the end of the season, I have a big table with a load of results in (I'd have 240 rows given there's 12 players per team) and when I'm looking at my stats, I want to find out how many times John Smith hit a 10 that season. I can obviously do 5 queries (or 1 with sub-queries) and add the results to tell me this, but I'm wondering what's the best method (or the one the 'SQL guru' would use, if you like) purely for my own development.
So to finish: I'm hoping to run my query and get a resultset that tells me:
Name | Total
John Smith | 12
Rob Smith | 11
Will Smith | 11
etc... | 1
The firstName and secondName are stored in the 'player' table (which is linked to the 'scores' table by the playerID foreign key). I'd like to be able to modify the query later on-demand if I wish, for example if I wanted to see how many times players scored a 9 rather than a 10 (but that can obviously be done by passing the number via PHP).
Searching here (+ Google) has led me down the 'JOIN' route but I've not had much success. Any help, please? :)
I think this should do the trick:
SELECT playerID, COUNT(playerID) AS Total FROM (
SELECT playerID FROM scores WHERE score1='10'
UNION ALL
SELECT playerID FROM scores WHERE score2='10'
UNION ALL
SELECT playerID FROM scores WHERE score3='10'
UNION ALL
SELECT playerID FROM scores WHERE score4='10'
UNION ALL
SELECT playerID FROM scores WHERE score5='10'
) AS thetable
GROUP BY playerID
Where 10 is the score you want.
This will get the playerID with respective number of 10 scores:
select
playerID,
count(score1 = 10 or null) +
count(score2 = 10 or null) +
count(score3 = 10 or null) +
count(score4 = 10 or null) +
count(score5 = 10 or null)
as total
from scores
group by playerID
having total > 0
Join it to the player table to get the names.
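Putting the count and the join to the player table together, with the target score passed in as a parameter, might look like the sketch below. It is an illustration under assumptions: it runs on Python's built-in sqlite3 module with invented rows, the helper name `count_hits` is hypothetical, and the player table's key column is assumed to be called id (the question only says it is linked via playerID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (id INTEGER PRIMARY KEY, firstName TEXT, secondName TEXT);
CREATE TABLE scores (playerID INTEGER, score1 INTEGER, score2 INTEGER,
                     score3 INTEGER, score4 INTEGER, score5 INTEGER);
-- Invented sample rows
INSERT INTO player VALUES (1, 'John', 'Smith'), (2, 'Rob', 'Smith');
INSERT INTO scores VALUES
  (1, 10, 10, 3, 10, 7),
  (1, 9, 10, 10, 2, 1),
  (2, 10, 4, 5, 6, 10);
""")

def count_hits(target):
    """Per player, how many of the five score columns equal target."""
    # Each comparison evaluates to 1 or 0, so adding them counts the hits
    # in a row and SUM adds the rows up per player.
    sql = """
        SELECT name, total FROM (
            SELECT p.firstName || ' ' || p.secondName AS name,
                   SUM((s.score1 = :t) + (s.score2 = :t) + (s.score3 = :t)
                     + (s.score4 = :t) + (s.score5 = :t)) AS total
            FROM scores s
            JOIN player p ON p.id = s.playerID
            GROUP BY p.id, name
        ) AS per_player
        WHERE total > 0
        ORDER BY total DESC, name
    """
    return conn.execute(sql, {"t": target}).fetchall()

tens = count_hits(10)
nines = count_hits(9)
print(tens, nines)
```

Binding the target as a parameter keeps the query text fixed, so asking for 9s instead of 10s only changes the argument.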