Find all pizzerias that serve ONLY pizzas eaten by people over 30 - relational-database

I know the correct answer but I'm not quite sure about ONE part of it.
RA is a relational algebra interpreter that translates relational algebra queries into SQL queries, then executes the SQL on a standard relational database system. So the language is similar to SQL but based on relational algebra.
Here is the sample database:
Serves(pizzeria, pizza, price)
Eats (name, pizza)
Person (name, age, gender)
Find all pizzerias that serve ONLY pizzas eaten by people over 30
\project_{pizzeria} Serves
\diff
\project_{pizzeria} (
Serves
\join (
(\project_{pizza} Serves)
\diff (\project_{pizza} ((\select_{age>'30'} Person) \join Eats))
)
)
What I fail to understand is the last part of the query:
\project_{pizza} ((\select_{age>'30'} Person) \join Eats)
Wouldn't you want to diff age < 30, not age > 30? That would subtract everyone under 30 and leave only the people over 30, no? Yet I know this is wrong. Can someone explain the logic behind this?

Don't jump to conclusions. Read carefully.
project {pizza} (select {age>'30'} (Person) join Eats)
That expression involves people over 30. But it is not the final answer.
You subtract those tuples from something else. If you subtracted people over 30 from all people, you would get people not over 30. But this expression is not people over 30, and you are not subtracting it from all people. It is pizzas eaten by people over 30, and you subtract it from all pizzas to get pizzas not eaten by anyone over 30.
project {pizza} (Serves)
- project {pizza} (select {age>'30'} (Person) join Eats)
Later you get those pizzas' pizzerias, i.e. the pizzerias that serve at least one pizza not eaten by people over 30.
project {pizzeria} (
Serves
join (
project {pizza} (Serves)
- project {pizza} (select {age>'30'} (Person) join Eats)
)
)
Then you subtract them from all pizzerias to get pizzerias that only serve pizzas eaten by people over 30.
project {pizzeria} (Serves)
- project {pizzeria} (
Serves
join (
project {pizza} (Serves)
- project {pizza} (select {age>'30'} (Person) join Eats)
)
)
So you never diff/minus/subtract people over 30. In particular, you never subtract them from all people to get people not over 30.
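The double subtraction above can be traced step by step with ordinary set operations. This is only a sketch on made-up sample data (the pizzeria, pizza, and person names are hypothetical, and the price column of Serves is left out because the query never uses it):

```python
def pizzerias_serving_only_over_30_pizzas(serves, eats, person):
    """serves: {(pizzeria, pizza)}, eats: {(name, pizza)}, person: {(name, age)}."""
    # select_{age>30} Person, joined with Eats, projected on pizza
    over_30 = {name for name, age in person if age > 30}
    eaten_by_over_30 = {pizza for name, pizza in eats if name in over_30}

    # project_{pizza} Serves  minus  pizzas eaten by people over 30
    all_pizzas = {pizza for _, pizza in serves}
    not_eaten_by_over_30 = all_pizzas - eaten_by_over_30

    # Pizzerias that serve at least one such pizza
    offenders = {pizzeria for pizzeria, pizza in serves
                 if pizza in not_eaten_by_over_30}

    # All pizzerias minus the offenders
    all_pizzerias = {pizzeria for pizzeria, _ in serves}
    return all_pizzerias - offenders


# Hypothetical sample data, for illustration only.
SERVES = {("Pizza Hut", "cheese"), ("Pizza Hut", "supreme"),
          ("Dominos", "cheese"), ("Little Caesars", "sausage")}
EATS = {("Amy", "cheese"), ("Ben", "supreme"), ("Cal", "sausage")}
PERSON = {("Amy", 35), ("Ben", 20), ("Cal", 40)}

RESULT = pizzerias_serving_only_over_30_pizzas(SERVES, EATS, PERSON)
```

Here "supreme" is eaten only by Ben (20), so Pizza Hut is subtracted and the other two pizzerias remain. People are never subtracted from people at any step.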

Related

MySQL query for an MLB database

There are 5 tables: mlb_batting, mlb_manager, mlb_master, mlb_pitching, mlb_team.
Find the top 10 (highest) “strikeouts per walk” statistic for all pitchers with at least 1 walk who played in at least 25 games. You should display their first name, last name, and K/BB statistic. K/BB is computed by dividing the number of strikeouts by the number of walks (“base on balls”). You will need to use “limit” in MySQL (not talked about in class or notes; you will have to search how to do it). I would like this query done 2 different ways. The first applies the 25 games and 1 walk thresholds on a per-stint basis: if a pitcher played for two different teams (two different stints), you count those separately. The second combines all of a pitcher's stints: if they played for two different teams, you add up their games and walks.
My solution is:
SELECT NAME_FIRST, NAME_LAST, SUM(strikeouts) / SUM(walks) AS KS_PER_BB
FROM mlb_master
JOIN mlb_pitching
ON mlb_master.player_id = mlb_pitching.player_id
WHERE walks >= 1 AND games >= 25
GROUP BY name_first, name_last, mlb_pitching.stint
ORDER BY KS_PER_BB DESC
LIMIT 10;
I am wondering if this solution is better for the first way my professor wants it done or the second way, if any.
This solution is appropriate for the first query because by having GROUP BY stint, each stint is considered different for each player.
For the second way, could I remove the stint column from the GROUP BY clause so that it groups the records for a particular player together, regardless of the different stints they played for?
Would this result in the sum of all their walks and strikeouts from all their stints being used to calculate the KS_PER_BB statistic, giving you the combined total for each player?
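One way to compare the two readings is to run both on a tiny table. The sketch below uses Python's built-in sqlite3 rather than MySQL, with made-up players and numbers; the column names follow the query above, and everything else is an assumption. For the combined version the thresholds are moved from WHERE to HAVING, because WHERE filters individual stint rows before they are summed, whereas the assignment asks for the summed games and walks to be tested.

```python
import sqlite3


def build_db():
    """Create an in-memory database with hypothetical pitching data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mlb_master (player_id INTEGER, name_first TEXT, name_last TEXT);
        CREATE TABLE mlb_pitching (player_id INTEGER, stint INTEGER,
                                   games INTEGER, walks INTEGER, strikeouts INTEGER);
        INSERT INTO mlb_master VALUES (1, 'Al', 'Ace'), (2, 'Bo', 'Bee');
        INSERT INTO mlb_pitching VALUES
            (1, 1, 30, 10, 50), (1, 2, 10, 5, 10),
            (2, 1, 15, 4, 10),  (2, 2, 15, 6, 20);
    """)
    return conn


# Way 1: each stint row is judged on its own, so no aggregation is needed.
PER_STINT = """
    SELECT m.name_first, m.name_last,
           1.0 * p.strikeouts / p.walks AS ks_per_bb
    FROM mlb_master m
    JOIN mlb_pitching p ON m.player_id = p.player_id
    WHERE p.walks >= 1 AND p.games >= 25
    ORDER BY ks_per_bb DESC
    LIMIT 10
"""

# Way 2: stints are summed per player first, then the thresholds are applied.
COMBINED = """
    SELECT m.name_first, m.name_last,
           1.0 * SUM(p.strikeouts) / SUM(p.walks) AS ks_per_bb
    FROM mlb_master m
    JOIN mlb_pitching p ON m.player_id = p.player_id
    GROUP BY m.player_id, m.name_first, m.name_last
    HAVING SUM(p.walks) >= 1 AND SUM(p.games) >= 25
    ORDER BY ks_per_bb DESC
    LIMIT 10
"""


def run(sql):
    conn = build_db()
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

In this sample, Bo Bee never reaches 25 games in a single stint but does across both, so he appears only in the combined result. Simply removing stint from the GROUP BY while keeping the WHERE clause would still drop his rows before they are summed.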

MYSQL How to input externally calculated value into row

I'm having some issues with this MySQL query. I've got two tables: one lists all the "Leaders of the Opposition" (people elected into office) with the date they were elected, and the other lists all the people they've been married to and the year of each marriage.
I'm trying to write a query that returns all the Leaders of the Opposition, ordered by appointment date, each with the name of their spouse at that time and the year of that marriage.
Here is some practice data of just one leader, dates changed a bit to fit the sort of problem I'm trying to solve.
TABLE ONE:
Leader_of_Opposition, Date Elected
Beazley K C, 1996-03-19
Beazley K C, 2005-01-28
TABLE TWO:
Leader_of_Opposition, Spouse's Name, Year Married
Beazley K C, Mary Ciccarelli, 1974
Beazley K C, Susie Annus, 2004
And I'm trying to get it to something like this:
Leader_of_Opposition, Date Elected, Spouse's Name, Year Married
Beazley K C, 1996-03-19, Mary Ciccarelli, 1974
Beazley K C, 2005-01-28, Susie Annus, 2004
So far I've got the basics of:
SELECT opposition.leader_name, opposition.time_begin, opposition_marriage.spouse_name, opposition_marriage.year_married
FROM opposition, opposition_marriage
WHERE opposition.leader_name=opposition_marriage.leader_name
ORDER BY opposition.time_begin
But it gives me results where the leaders are mentioned multiple times for each marriage. And I can't figure out the syntax to search the other table then place that value into the row.
Any help would be extremely appreciated, been banging my head up against this one for a while now.
Thanks in advance.
I think this is going to be easiest with correlated subqueries. Alas, though, your tables do not have unique identifiers for each row.
SELECT o.leader_name, o.time_begin,
(select om.spouse_name
from opposition_marriage om
where o.leader_name = om.leader_name and om.year_married <= year(o.time_begin)
order by om.year_married desc
limit 1
) as spouse_name,
(select om.year_married
from opposition_marriage om
where o.leader_name = om.leader_name and om.year_married <= year(o.time_begin)
order by om.year_married desc
limit 1
) as year_married
FROM opposition o
ORDER BY o.time_begin;
This handles as many marriages as you like.
Now some comments:
It seems really strange to have a table only of marriages for the opposition leaders and not for all politicians.
The granularity is at the level of a "year", so a leader married in the same year after s/he takes office counts as being married to that spouse.
You do not have a "marriage end date", so a divorced or widowed partner would be considered current until the next marriage.
As I mentioned at the beginning, you should have a unique identifier for each row.
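To see the correlated subqueries at work on the practice data from the question, here is a small sketch using Python's built-in sqlite3. It is not MySQL: SQLite has no year() function, so the year is extracted with strftime('%Y', ...) and cast to an integer instead. The table and column names are taken from the question's query, and the column types are assumed.

```python
import sqlite3

QUERY = """
    SELECT o.leader_name, o.time_begin,
           (SELECT om.spouse_name
              FROM opposition_marriage om
             WHERE om.leader_name = o.leader_name
               AND om.year_married <= CAST(strftime('%Y', o.time_begin) AS INTEGER)
             ORDER BY om.year_married DESC
             LIMIT 1) AS spouse_name,
           (SELECT om.year_married
              FROM opposition_marriage om
             WHERE om.leader_name = o.leader_name
               AND om.year_married <= CAST(strftime('%Y', o.time_begin) AS INTEGER)
             ORDER BY om.year_married DESC
             LIMIT 1) AS year_married
    FROM opposition o
    ORDER BY o.time_begin
"""


def spouses_at_election():
    """Load the question's practice data and return one row per election."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE opposition (leader_name TEXT, time_begin TEXT);
            CREATE TABLE opposition_marriage (leader_name TEXT, spouse_name TEXT,
                                              year_married INTEGER);
            INSERT INTO opposition VALUES
                ('Beazley K C', '1996-03-19'), ('Beazley K C', '2005-01-28');
            INSERT INTO opposition_marriage VALUES
                ('Beazley K C', 'Mary Ciccarelli', 1974),
                ('Beazley K C', 'Susie Annus', 2004);
        """)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Each election row picks the most recent marriage whose year is not later than the election year, so every leader appears once per election rather than once per marriage.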

Relational Database Logic

I'm fairly new to php / mysql programming and I'm having a hard time figuring out the logic for a relational database that I'm trying to build. Here's the problem:
I have different leaders who will be in charge of a store anytime between 9am and 9pm.
A customer who has visited the store can rate their experience on a scale of 1 to 5.
I'm building a site that will allow me to store the shifts that a leader worked as seen below.
When I hit submit, the site would take the data leaderName:"George", shiftTimeArray: 11am, 1pm, 6pm (from the example in the picture) and the shiftDate and send them to an SQL database.
Later, I want to be able to get the average score for a person by sending a query to mysql, retrieving all of the scores that that leader received and averaging them together. I know the code to build the forms and to perform the search. However, I'm having a hard time coming up with the logic for the tables that will relate the data. Currently, I have a mysql table called responses that contains the following fields,
leader_id
shift_date // contains the date that the leader worked
shift_time // contains the time that the leader worked
visit_date // contains the date that the survey/score was given
visit_time // contains the time that the survey/score was given
score // contains the actual score of the survey (1-5)
I enter the shifts that the leader works at the beginning of the week and then enter the survey scores in as they come in during the week.
So Here's the Question: What mysql tables and fields should I create to relate this data so that I can query a leader's name and get the average score from all of their surveys?
You want tables like:
Leader (leader_id, name, etc)
Shift (leader_id, shift_date, shift_time)
SurveyResult (visit_date, visit_time, score)
Note: I omitted the surrogate primary keys for Shift and SurveyResult that I would probably include.
To query, you join shifts and surveys, group by leader and take the average, then join that back to Leader for the name.
The query might be something like this (but I haven't actually built it in MySQL to verify the syntax):
SELECT name
,AverageScore
FROM Leader a
INNER JOIN (
SELECT leader_id
, AVG(score) AverageScore
FROM Shift
INNER JOIN
SurveyResult ON shift_date = visit_date
AND shift_time = visit_time -- what this really needs to be depends on how you are recording time
GROUP BY leader_id
) b ON a.leader_id = b.leader_id
I would do the following structure:
leaders
  id
  name
leaders_timetable (can be multiple per leader)
  id
  leader_id
  shift_datetime (I assume it stores the date and hour here; minutes and seconds are always 0)
survey_scores
  id
  visit_datetime
  score
SELECT l.id, l.name, AVG(s.score) FROM leaders l
INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
INNER JOIN survey_scores s ON lt.shift_datetime = DATE_FORMAT(s.visit_datetime, '%Y-%m-%d %H:00:00')
GROUP BY l.id, l.name
DATE_FORMAT here truncates the minutes and seconds of visit_datetime so that it can be matched against shift_datetime. This is a MySQL function, so if you use a different database you'll need a different function.
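The hour-truncation join can be tried out without a MySQL server. The following sketch uses Python's built-in sqlite3, where strftime plays the role of DATE_FORMAT; the leaders, shifts, and scores are invented sample data, and datetimes are assumed to be stored as 'YYYY-MM-DD HH:MM:SS' text.

```python
import sqlite3

AVERAGE_BY_LEADER = """
    SELECT l.id, l.name, AVG(s.score) AS average_score
    FROM leaders l
    INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
    INNER JOIN survey_scores s
            ON lt.shift_datetime = strftime('%Y-%m-%d %H:00:00', s.visit_datetime)
    GROUP BY l.id, l.name
    ORDER BY l.id
"""


def average_scores():
    """Return (leader id, name, average score) for hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE leaders (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE leaders_timetable (id INTEGER PRIMARY KEY,
                                            leader_id INTEGER, shift_datetime TEXT);
            CREATE TABLE survey_scores (id INTEGER PRIMARY KEY,
                                        visit_datetime TEXT, score INTEGER);
            INSERT INTO leaders VALUES (1, 'George'), (2, 'Ann');
            INSERT INTO leaders_timetable (leader_id, shift_datetime) VALUES
                (1, '2024-01-01 11:00:00'),
                (1, '2024-01-01 13:00:00'),
                (2, '2024-01-01 12:00:00');
            INSERT INTO survey_scores (visit_datetime, score) VALUES
                ('2024-01-01 11:15:00', 5),
                ('2024-01-01 11:45:00', 3),
                ('2024-01-01 13:30:00', 4),
                ('2024-01-01 12:05:00', 2);
        """)
        return conn.execute(AVERAGE_BY_LEADER).fetchall()
    finally:
        conn.close()
```

A survey taken at 11:45 is truncated to 11:00:00 and therefore matches George's 11am shift. A survey that falls in an hour nobody worked matches no shift and is silently left out of every average.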
Say you have a leader who has 5 survey rows with scores 1, 2, 3, 4 and 5.
If you select all surveys for this leader, sum the scores and divide by 5 (the number of surveys this leader has), you get the average, in this case 3.
(1 + 2 + 3 + 4 + 5) / 5 = 3
You wouldn't need to create any more tables or fields, you have what you need.

How to perform this MySQL query given this relation diagram

I have the following relation diagram where arrows represent Foreign Keys. The word in the blue is the table name and the words below are column names.
My question is how I could extract the following data from this table:
-what is the GPA of the student with ID=1?
-what are the average GPAs for students by department?
Given that: there are only five letter grades with values A=4, B=3, C=2, D=1, and F=0, and GPA is the sum of (course credits x course grade value) divided by (total credits x 4). (So takes.grade is an int from 0 to 4 inclusive.)
I have been trying to figure this out for hours to no avail. Could anyone steer me in the right direction?
Thanks for any help.
OK, I actually had to do this for a client over 15 years ago, for the entire database of all students. It is not that difficult once you have the pieces.
I won't give exact queries, since you asked for guidance.
Start with a single query that pulls data into a TEMPORARY table
CREATE TEMPORARY TABLE AllStudentsCourses
SELECT blah... from blah... join blah.. where... order by ...
Build a list of all classes a person has signed up for and, while you are at it, compute on a per-class basis the grade earned (A-F). You'll also need the credit hours of each class, since that is the basis for computing a GPA.
a 3 hour course with an A gets 3 cr hrs towards GPA.
a 6 hour course with an A gets 6 cr hrs towards GPA.
a 6 hour course with a B gets LESS weighted value towards GPA
and you'll need to roll aggregates up.
Now, once you have all the classes a student attempted, you'll need to compute the per-class values as in the samples above. If you want to apply that in the first query, do so.
Once you have that, then, you can roll-up the total credit hrs attempted vs credit hrs earned.
Basic math should help you along.
I didn't quite understand how the GPA is calculated; I think there is something wrong with your original post. Here's a starting point that may or may not be right:
SELECT avg(t.grade)
FROM takes t
INNER JOIN course c
ON t.course_id = c.course_id
WHERE t.ID = 1;
SELECT c.dept_name, avg(grade)
FROM takes t
INNER JOIN course c
ON t.course_id = c.course_id
GROUP BY c.dept_name
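For comparison, here is a credit-weighted sketch that follows the formula exactly as the question states it: the sum of (credits x grade) divided by (total credits x 4). It runs on Python's built-in sqlite3 with invented rows. Since the relation diagram is not reproduced here, the schema is an assumption: takes(ID, course_id, grade) and course(course_id, ...) come from the queries above, while the credits column and the student(ID, dept_name) table are hypothetical. "By department" is read as the average of per-student GPAs within each student's department.

```python
import sqlite3

# Per-student GPA using the question's formula (values fall between 0 and 1).
GPA_PER_STUDENT = """
    SELECT t.ID, 1.0 * SUM(c.credits * t.grade) / (SUM(c.credits) * 4) AS gpa
    FROM takes t
    INNER JOIN course c ON t.course_id = c.course_id
    GROUP BY t.ID
"""

GPA_OF_STUDENT_1 = f"SELECT gpa FROM ({GPA_PER_STUDENT}) AS g WHERE g.ID = 1"

AVG_GPA_BY_DEPT = f"""
    SELECT s.dept_name, AVG(g.gpa) AS avg_gpa
    FROM student s
    INNER JOIN ({GPA_PER_STUDENT}) AS g ON g.ID = s.ID
    GROUP BY s.dept_name
    ORDER BY s.dept_name
"""


def run(sql):
    """Execute sql against a small, hypothetical university database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE student (ID INTEGER, dept_name TEXT);
            CREATE TABLE course (course_id TEXT, credits INTEGER);
            CREATE TABLE takes (ID INTEGER, course_id TEXT, grade INTEGER);
            INSERT INTO student VALUES (1, 'CS'), (2, 'CS'), (3, 'Math');
            INSERT INTO course VALUES ('c1', 3), ('c2', 4);
            INSERT INTO takes VALUES (1, 'c1', 4), (1, 'c2', 2),
                                     (2, 'c1', 3), (3, 'c2', 4);
        """)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

Student 1 has an A in a 3-credit course and a C in a 4-credit course, so the result is (3x4 + 4x2) / (7x4) = 20/28. Dropping the factor of 4 from the denominator would give the same weighting on the more familiar 0 to 4 scale.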

Determining Rookie Years in Lahman Database

I'm using the MySQL version of the Lahman Baseball Database and I'm having trouble trying to determine the year a player lost their rookie standing. The rules for an MLB player losing rookie standing are:
A player shall be considered a rookie unless, during a previous season or seasons, he has (a) exceeded 130 at-bats or 50 innings pitched in the Major Leagues; or (b) accumulated more than 45 days on the active roster of a Major League club or clubs during the period of 25-player limit (excluding time in the military service and time on the disabled list).
Is there a query that can be run to do this for Batters and Pitchers, or is this something that would be programmatically done?
Using the Lahman Database you can identify rookies by at-bats (>130) and innings pitched (>50); however, it has nothing on service time during the 25-man roster (non-September) period.
You would need Retrosheet (http://www.retrosheet.org/game.htm) data to do that.
The queries below give you all of the rookies by at-bats and innings pitched; the service-time rookies are the exception. There are only a few of those, as teams don't tend to keep rookies on the MLB roster without playing them: the players lose development time (not playing) and accelerate their service time, so the team loses controlled years. So if you're happy with that, these tables will do.
You can use this as an Xref table with batters or pitchers to highlight their rookie year. Or you could add an extra column to batters and pitchers with the RookieYr distinction (I advise against it: the less you customize, the easier it is to add new seasons to your Lahman DB).
/************************************ Create MLB Rookie Xref Table **********************************************
-- Sort Out Batters who accumulate 130 AB
-- Sort Out Pitchers who accumulate 50 IP
-- Define Rookie Year, Drop off years previous and years after
-- Can be updated annually using "playerID not in (select distinct playerID from Xref_RookieYr)"
-- Using the Sean Lahman Database
-- Authored By Paul DeVos {www.linkedin.com/in/devosp/}
*****************************************************************************************************************/
/****** Query uses T-SQL, Query ran in MS SQL 2012 - you may need to tweak for other platforms or versions. ******/
--Step 1 - Run this for hitter accumulated ABs and when Rookie Year (130 Career At Bats)
Select
concat(m.nameFirst, ' ', m.nameLast) as Name,
b.PlayerID,
b.yearID,
m.debut,
sum(b.ab) over (partition by b.playerID order by b.playerID, b.yearID) as CumulativeAB,
null as CumulativeIP, -- Place Holder for Rookie Pitchers Insert
case when sum(b.ab) over (partition by b.playerID order by b.playerID, b.yearID) > 130 then b.yearID end as RookieYR
into #temp_rookie_year
from
[master] m
inner join Batting b
on m.playerID=b.playerID
-- Selects Position Players
where b.playerID not in (select distinct f.playerID from Fielding f where f.pos = 'P')
--Step 2 - Run this to get accumulated IP and Rookie Year (50 Career IP)
Insert into #temp_rookie_year
(
Name, PlayerID, YearID, Debut, CumulativeAB, CumulativeIP, RookieYR
)
Select
concat(m.nameFirst, ' ', m.nameLast) as Name,
p.PlayerID,
p.yearID,
m.debut,
null as CumulativeAB,
sum(p.IPouts) over (partition by p.playerID order by p.playerID, p.yearID) as CumulativeIP,
case when sum(p.IPouts) over (partition by p.playerID order by p.playerID, p.yearID) > 150 then p.yearID end as RookieYR
from [master] m
inner join pitching p
on m.playerID=p.playerID
--Chooses Pitchers
where p.playerID in (select distinct f.playerID from Fielding f where f.pos = 'P')
--Step 3 Run this - sorts out the rookie year into Rookie Xref Table
select Name, PlayerID, min(RookieYr) as RookieYear
into #Xref_RookieYr
from #temp_rookie_year
--where name = 'Hank Aaron'
group by Name, PlayerID
order by RookieYear desc
--Step 4 - run IF you want to remove players who never lost rookie status (cup of coffee players, etc - anyone under 130 AB or 50 IP)
select * from #Xref_RookieYr
order by playerID
Delete from #Xref_RookieYr where RookieYear is null
select * from #Xref_RookieYr
order by playerID
/*****************************************************************************************************************
You can drop the "#" in front of the table (and name it whatever you want, e.g. Xref_Rookie_2013) when you want a permanent table.
If you leave it, the table will drop off when you close the program.
*****************************************************************************************************************/
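The script above is T-SQL and relies on window functions, which MySQL only gained in version 8.0. As a rough alternative for the at-bats half of the rule, the sketch below computes the running total with a correlated subquery instead, which also works on older MySQL versions. It is run here on Python's built-in sqlite3 with invented player IDs and numbers; only the Batting columns playerID, yearID and AB are taken from the script, and the strict comparison follows the "exceeded 130 at-bats" wording of the rule.

```python
import sqlite3

# First season in which a player's career at-bats exceed 130.
ROOKIE_YEAR_BY_AB = """
    SELECT t.playerID, MIN(t.yearID) AS rookie_year
    FROM (
        SELECT b.playerID, b.yearID,
               (SELECT SUM(b2.AB)
                  FROM Batting b2
                 WHERE b2.playerID = b.playerID
                   AND b2.yearID <= b.yearID) AS cumulative_ab
        FROM Batting b
    ) AS t
    WHERE t.cumulative_ab > 130
    GROUP BY t.playerID
    ORDER BY t.playerID
"""


def rookie_years():
    """Return (playerID, rookie_year) pairs for hypothetical batting data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE Batting (playerID TEXT, yearID INTEGER, AB INTEGER);
            INSERT INTO Batting VALUES
                ('p1', 2001, 100), ('p1', 2002, 50), ('p1', 2003, 400),
                ('p2', 2005, 130),
                ('p3', 2010, 200);
        """)
        return conn.execute(ROOKIE_YEAR_BY_AB).fetchall()
    finally:
        conn.close()
```

Player p1 crosses the threshold in 2002 (100 + 50 = 150), p3 in 2010, and p2 never appears because 130 at-bats does not exceed 130. The innings-pitched half would follow the same pattern with IPouts and a threshold of 150 outs.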
This can be done in SQL. How it is done will be based upon what is the most optimal way of doing it. Most likely it could be done with one query like so (pseudo-code):
SELECT Master.*
FROM Master
LEFT JOIN Batting ON Master.player_id = Batting.player_id
LEFT JOIN Pitching ON Master.player_id = Pitching.player_id
WHERE Batting.AB > 130 OR Pitching.IPOuts > (50 * 3)
OR Master.DaysActive > 45
That last part of the WHERE statement is a bit iffy because I don't find anything like that in the data from your database provider. I see active games but that isn't the same thing. The Appearances table might get you close but that is about all you can do.
Here is the data I based my pseudo-code off of:
http://baseball1.com/files/database/readme58.txt
I did find another guy who was doing something similar to what you are doing (including calculating who is a rookie). Here is his site (with code):
http://baseballsimulator.com/blog/category/database/