how to get average of rows that have a certain relationship - mysql

I have a bunch of county demographic data stored in a database. I need to be able to get the average of that data across the state that a given county belongs to.
For example, I need to be able to get the average of all counties whose state_id matches the state_id of the county with a county_id of 1. Essentially, if a county was in Virginia, I would need the average of all of the counties in Virginia. I'm having trouble setting up this query, and I was hoping that you guys could give me some help. Here's what I have written, but it only returns one row from the database because it links the county_id of the two tables together.
SELECT AVG(demographic_data.percent_white) as avg_percent_white
FROM demographic_data,counties, states
WHERE counties.county_id = demographic_data.county_id AND counties.state_id = states.state_id
Here's my basic database layout:
counties
------------------------
county_id | county_name
states
---------------------
state_id | state_name
demographic_data
-----------------------------------------
percent_white | percent_black | county_id

Your query is returning one row, because there's an aggregate and no GROUP BY. If you want an average of all counties within a state, we'd expect only one row.
To get a "statewide" average, of all counties within a state, here's one way to do it:
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a
ON a.county_id = d.county_id
JOIN counties o
ON o.state_id = a.state_id
WHERE o.county_id = 42
Note that there's no need to join to the state table. You just need all counties that have a matching state_id. The query above is using two references to the counties table. The reference aliased as "a" is for all the counties within a state, the reference aliased as "o" is to get the state_id for a particular county.
If you already had the state_id, you wouldn't need a second reference:
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a
ON a.county_id = d.county_id
WHERE a.state_id = 11
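To try the self-join on a toy data set, here is a minimal sketch in Python. It uses SQLite as a stand-in for MySQL, assumes the counties table carries a state_id column (as the queries above do), and the rows and the helper name state_average_for_county are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; all rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE counties (county_id INTEGER PRIMARY KEY, county_name TEXT, state_id INTEGER);
    CREATE TABLE demographic_data (county_id INTEGER, percent_white REAL, percent_black REAL);
    INSERT INTO counties VALUES (1, 'County A', 11), (2, 'County B', 11), (3, 'County C', 12);
    INSERT INTO demographic_data VALUES (1, 60.0, 10.0), (2, 50.0, 30.0), (3, 90.0, 5.0);
""")

def state_average_for_county(county_id):
    """Average percent_white over every county in the same state as county_id."""
    row = conn.execute(
        """
        SELECT AVG(d.percent_white) AS avg_percent_white
          FROM demographic_data d
          JOIN counties a ON a.county_id = d.county_id   -- all counties in the state
          JOIN counties o ON o.state_id = a.state_id     -- the county we start from
         WHERE o.county_id = ?
        """,
        (county_id,),
    ).fetchone()
    return row[0]

avg_state_of_county_1 = state_average_for_county(1)  # counties 1 and 2 share state 11
avg_state_of_county_3 = state_average_for_county(3)  # county 3 is alone in state 12
print(avg_state_of_county_1, avg_state_of_county_3)
```

County 1 and county 2 share a state, so asking for either one averages both rows.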
FOLLOWUP
Q What if I wanted to bring in another table.. Let's call it demographic_data_2 that was also linked via the county_id
A I made the assumption that the demographic_data table had one row per county_id. If the same holds true for the second table, then a simple JOIN operation will do:
JOIN demographic_data_2 c
ON c.county_id = d.county_id
With that table joined in, you could add an appropriate aggregate expression in the SELECT list (e.g. SUM, MIN, MAX, AVG).
The trouble spots are typically "missing" and "duplicate" data... when there isn't a row for every county_id in that second table, or there's more than one row for a particular county_id, that leads to rows not included in the aggregate, or getting double counted in the aggregate.
We note that the aggregate returned in the original query is an "average of averages". It's an average of the values for each county.
Consider:
bucket count_red count_blue count_total percent_red
------ --------- ---------- ----------- -----------
1 480 4 1000 48
2 60 1 200 30
Note that there's a difference between an "average of averages", and calculating an average using totals.
SELECT AVG(percent_red) AS avg_percent_red
, 100 * SUM(count_red)/SUM(count_total) AS tot_percent_red
avg_percent_red tot_percent_red
--------------- ---------------
39 45
Both values are valid; we just don't want to misinterpret or misrepresent either value.
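The two figures can be reproduced from the bucket table with a short sketch. SQLite stands in for MySQL here, and the table name buckets is an assumption (the original query does not name one). The ratio is multiplied by 100.0 so it comes out as a percentage and so SQLite does not do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "buckets" is a hypothetical table name; the values come from the example above.
    CREATE TABLE buckets (bucket INTEGER, count_red INTEGER, count_total INTEGER, percent_red REAL);
    INSERT INTO buckets VALUES (1, 480, 1000, 48), (2, 60, 200, 30);
""")

avg_percent_red, tot_percent_red = conn.execute("""
    SELECT AVG(percent_red),                           -- average of the per-bucket averages
           100.0 * SUM(count_red) / SUM(count_total)   -- percentage computed from the totals
      FROM buckets
""").fetchone()

print(avg_percent_red, tot_percent_red)
```

The larger bucket pulls the totals-based figure toward its own 48%, which is why the two numbers differ.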

Related

Creating a SQL view from tables without UIDs

I have two tables:
match_rating, which has data on a team's performance in a match. There are naturally two tuples for every matchId (since there are two teams to each match). The PK is matchId, teamId.
event, which has information on events during matches. The PK is an autoincremented UID, and it contains the Foreign Keys match_id and subject_team_id as well.
Now I want to create a new view which counts how many times certain events happen in a match, for each team, with fields like this:
But for the life of me I cannot get around the fact that there are 1) two tuples for each match in the match_rating table, and 2) querying the event table on match_id returns events for both teams.
The closest I got was something like this:
SELECT SUM(
CASE
WHEN evt.event_type_id = 101 THEN 1
WHEN evt.event_type_id = 111 THEN 1
WHEN evt.event_type_id = 121 THEN 1
[etc]
END
) AS 'mid_chances',
SUM(
CASE
WHEN evt.event_type_id = 103 THEN 1
WHEN evt.event_type_id = 113 THEN 1
WHEN evt.event_type_id = 123 THEN 1
[etc]
END
) AS 'right_chances',
mr.tactic,
mr.tactic_skill,
mr.bp,
evt.match_id,
evt.subject_team_id
FROM event evt
JOIN match_rating mr
ON evt.match_id = mr.match_id
WHERE evt.event_type_id BETWEEN 100 AND 104 OR
evt.event_type_id BETWEEN 110 AND 114 OR
evt.event_type_id BETWEEN 120 AND 124 OR
[etc]
GROUP BY evt.match_id
ORDER BY `right_chances` DESC
But still, this counts the events twice, reporting 2 events where there was only 1, 6 for 3 events and so on. I have tried grouping on team_id as well (GROUP BY evt.match_id AND team_id) , but that returns only 2 rows with all events counted.
I hope I have made my problem clear, and it should be obvious that I really need a good tip or two.
Edit for clarity (sorry):
Sample data for match_rating table:
Sample data for the event table:
What I would like to see as the result is this:
That is, two tuples for each match, one for each team, where the types of events that team had is summed up. Thanks so much for looking into this!
Update after comments/feedback
OK.. just to confirm, what you want is
Each row of the output represents a team within a match
Other values (other than match_id and team_id) are sums or other aggregations across multiple rows?
If that is the case, then I believe you should be doing a GROUP BY the match_id and team_id. This should cause the correct number of rows to be generated (one for each match_id/team_id combination). You say in your question that you have tried it already - I suggest reviewing it (potentially after also considering the below).
With your data, it appears that the 'event' table also has a field which indicates the team_id. To ensure you only get the relevant team's events, I suggest your join between match_rating and event be on both fields e.g.,
FROM event evt
JOIN match_rating mr
ON evt.match_id = mr.match_id
AND evt.subject_team_id = mr.team_id
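Here is a minimal sketch of that join combined with a two-column GROUP BY, run against SQLite as a stand-in for MySQL. The sample rows and the tactic values are invented, and only two of the event categories are shown. Note that GROUP BY evt.match_id AND team_id groups by a single boolean expression; the columns need to be separated by a comma instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up sample data: one match (id 1) between teams 10 and 20.
    CREATE TABLE match_rating (match_id INTEGER, team_id INTEGER, tactic TEXT,
                               PRIMARY KEY (match_id, team_id));
    CREATE TABLE event (id INTEGER PRIMARY KEY AUTOINCREMENT,
                        match_id INTEGER, subject_team_id INTEGER, event_type_id INTEGER);
    INSERT INTO match_rating VALUES (1, 10, 'press'), (1, 20, 'counter');
    INSERT INTO event (match_id, subject_team_id, event_type_id) VALUES
        (1, 10, 101), (1, 10, 103), (1, 10, 113), (1, 20, 111);
""")

rows = conn.execute("""
    SELECT evt.match_id,
           evt.subject_team_id,
           SUM(CASE WHEN evt.event_type_id IN (101, 111, 121) THEN 1 ELSE 0 END) AS mid_chances,
           SUM(CASE WHEN evt.event_type_id IN (103, 113, 123) THEN 1 ELSE 0 END) AS right_chances
      FROM event evt
      JOIN match_rating mr
        ON evt.match_id = mr.match_id
       AND evt.subject_team_id = mr.team_id      -- only this team's rating row
     GROUP BY evt.match_id, evt.subject_team_id  -- comma, not AND
     ORDER BY evt.subject_team_id
""").fetchall()

print(rows)
```

Each event now matches exactly one match_rating row, so nothing is counted twice and the result has one row per team per match.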
Previous answer - does not answer the question (as per later comments)
Just confirming - the issue is that when you run it, for each match it returns 2 rows - one for each team - but you want to do processing on both teams as one row only?
As such, you could do a few things (e.g., self-join the match rating table to itself, with Team1 ratings and Team2 ratings).
Alternatively, you could modify your FROM to have joins to match_rating twice - where the first has the lower ID for the two teams e.g.,
FROM event evt
JOIN match_rating mr_team1
ON evt.match_id = mr_team1.match_id
JOIN match_rating mr_team2
ON evt.match_id = mr_team2.match_id
AND mr_team1.team_id < mr_team2.team_id
Of course, your processing then needs to be modified to take this into account e.g., one row represents a match, and you have a bunch of data for team1 and similar data for team2. You'd then, I assume, compare the data for team1 columns and team2 columns to get some sort of rating etc (e.g., chance for Team1 to win, etc).

How to make a single MySQL query that uses the results of another query

I have a Perl program that queries a MySQL database to bring back results based upon which "report" option a user has selected from a web page.
One of the reports is all occupants of a student housing building who have applied for a parking permit, but who have not yet been given one.
When the students apply for a permit, it records the specifics about their car (make, model, year, color, etc.) in a single table row. Each apartment can have up to three students, and each student may apply for a permit. So an apartment might have 0 permits, or 1, 2, or 3 permits, depending upon how many of them have cars.
What I'd like to be able to do, is execute a MySQL query that will find out how many occupants in each apartment have applied for a parking permit, and then based on the results of that query, find out how many permits have been issued. If the number of permits issued is less than the number of applications, that apartment number should be returned in the result set. It doesn't have to name the specific occupant, just the fact that the apartment has at least one occupant who has applied for a permit, but not yet received one.
So I have two tables, one is called occupant_info and it contains all kinds of info about the occupant, but the relevant fields are:
counter (a unique row id)
parking_permit_1_number
parking_permit_2_number
parking_permit_3_number
When a parking permit has been assigned, it is recorded in the appropriate parking_permit_#_number field (if it's occupant number one's permit, it would be recorded in parking_permit_1_number, etc.).
The second table is called, parking_permits, and contains all of the car/owner specifics (make, model, year, owner, owner address, etc.). It also contains a field which references the counter from the occupant_info table.
So an example would be:
occupant_info table
counter | parking_permit_1_number | parking_permit_2_number | parking_permit_3_number
--------|-------------------------|-------------------------|------------------------
1 | 12345 | | 98765
2 | 43920 | |
3 | 30239 | | 34233
parking_permits table
counter | counter_from_occupant_info | permit_1_name | permit_2_name | permit_3_name
--------|----------------------------|---------------|-----------------|-------------------
1 |2 | David Jones | James Cameron | Michael Smerconish
2 |3 | Bill Epps | Hillary Clinton | Donald Trump
3 |1 | Joanne Miller | | Sridevi Gupta
I want a query that will first look at how many occupants in an apartment have applied for a permit. This is determined by counting the names in the parking_permits table. In that table, row 1 has three names, row 2 has three names, and row 3 has two names. The query should then look at the occupant_info table, and for each counter_from_occupant_info from the parking_permits table, see if the same number of parking permits have been issued. This can be determined by comparing the number of non-blank parking_permit_#_number fields.
Using the data above, the query would see the following :
parking_permit table row 1
Has counter_from_occupant_info equal to "2"
Has three names
The row in occupant_info with counter = "2" has only one permit number issued,
so counter_from_occupant_info 2 from parking_permits should be in the result set.
parking_permit table row 2
Has counter_from_occupant_info equal to "3"
Has three names
The row in occupant_info with counter = "3" has only two permit numbers issued,
so counter_from_occupant_info 3 from parking_permits should be in the result set.
parking_permit table row 3
Has counter_from_occupant_info equal to "1"
Has two names
The row in occupant_info with counter = "1" has two permit numbers issued,
so this row should *not* be in the result set.
I've thought about using if, then, case, when, type logic to do this in one query, but frankly can't wrap my head around how to do so.
I was thinking something like:
SELECT
CASE WHEN ( SELECT counter_from_occupant_info
FROM parking_permits
WHERE parking_permit_1_name != ""
AND parking_permit_2_name != ""
AND parking_permit_3_name != "" ) THEN
IF ( SELECT parking_permit_1_number,
parking_permit_2_number,
parking_permit_3_number
FROM occupant_info
WHERE counter = ***somehow reference counter from above case statement--I don't know how to do this***
But then my head explodes and I realize I don't know what the heck I'm doing.
Any help would be appreciated. :-)
Doug
You have a few problems:
Your occupants table schema is bad. There's worse out there, but it looks like someone that doesn't understand how a database works built this table.
Your permits table is also bad. Same reason.
You have no idea what you are doing (kidding... kidding...)
Problem 1:
Your occupants table should probably be two tables. Because an occupant could have 0-3 permits (possibly more, I can't tell from the sample data) then you need a table for your occupant's attributes (name, height, gender, age, primary smell, favorite color, first rent date, I dunno).
Occupants
OccupantID | favorite TV Show | number of limbs | first name | last name | aptBuilding
And... another table for Relationship between the occupant and the permit:
Occupant_permits
OccupantID | Permit ID | status
Now... an occupant can have as many permits as you can stuff into that table and the relationship between them has a status "Applied for", or "Granted" or "Revoked" or what have you.
Problem 2
Your permit info table is doing double duty as well. It holds the information about a permit (its name) as well as the relationship to the occupant. Since we already have a relationship to the occupant with the "Occupant_Permits" table above, we just need a permits table to hold attributes of a permit:
Permits
Permit ID | Permit Name | Description | etc..
Problem 3
Now that you have a correct schema where objects are in their own table (Occupant, Permit, Occupant and Permit Relationship) your query to get a list of apartments that have at least one occupant that has applied, but not yet received a permit would be:
SELECT DISTINCT
o.AptBuilding
FROM
occupants AS o
INNER JOIN occupant_permits AS op
ON o.occupant_id = op.occupant_id
INNER JOIN permits AS p
ON op.permit_id = p.permit_id
WHERE
op.Status = 'Applied'
That's nice and simple and you aren't relying on CASE or UNION or count comparison or any fancy stuff. Just nice straight joins and a simple WHERE clause. This will be fast to query and there's no funny business.
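For a feel of how the normalized layout behaves, here is a rough sketch using SQLite in place of MySQL. The snake_case table and column names, the sample occupants, and the status strings are all assumptions made for the example. The permits table is left out because the status lives on the relationship row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized schema with made-up rows.
    CREATE TABLE occupants (occupant_id INTEGER PRIMARY KEY, first_name TEXT, apt_building TEXT);
    CREATE TABLE occupant_permits (occupant_id INTEGER, permit_id INTEGER, status TEXT);
    INSERT INTO occupants VALUES (1, 'Ann', 'A-101'), (2, 'Bob', 'A-101'), (3, 'Cy', 'B-202');
    INSERT INTO occupant_permits VALUES
        (1, 500, 'Granted'), (2, 501, 'Applied'), (3, 502, 'Granted');
""")

# Apartments with at least one occupant whose permit is still only applied for.
pending = [row[0] for row in conn.execute("""
    SELECT DISTINCT o.apt_building
      FROM occupants o
      JOIN occupant_permits op ON op.occupant_id = o.occupant_id
     WHERE op.status = 'Applied'
     ORDER BY o.apt_building
""")]

print(pending)
```

Only A-101 comes back, because Bob has applied and not yet been granted a permit.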
Because your schema isn't great, in order to get something similar you'll need to make use of either UNION queries to stack your many permit_N_ fields into a single field and run something similar to the above query, or you'll have to use a fair amount of CASE/IF statements:
SELECT DISTINCT p.pCounter
FROM
(
SELECT
counter AS oCounter,
-- number of permits actually issued for this occupant_info row
CASE WHEN parking_permit_1_number IS NOT NULL THEN 1 ELSE 0 END
+
CASE WHEN parking_permit_2_number IS NOT NULL THEN 1 ELSE 0 END
+
CASE WHEN parking_permit_3_number IS NOT NULL THEN 1 ELSE 0 END AS permitCount
FROM occupant_info
) AS o
LEFT OUTER JOIN
(
SELECT
counter_from_occupant_info AS pCounter,
-- number of applications (one name per application)
-- if blank fields are stored as '' rather than NULL, test <> '' instead
CASE WHEN permit_1_name IS NOT NULL THEN 1 ELSE 0 END
+
CASE WHEN permit_2_name IS NOT NULL THEN 1 ELSE 0 END
+
CASE WHEN permit_3_name IS NOT NULL THEN 1 ELSE 0 END AS permitPermitCount
FROM parking_permits
) AS p ON o.oCounter = p.pCounter
WHERE p.permitPermitCount > o.permitCount
I'm not 100% convinced that is exactly what you are looking for since your schema is confusing where you have multiple objects in a single table and everything is pivoted, but... it should get you in the ball park.
This will be much slower too. There's intermediate result sets, CASE statements, and math, so don't expect MySQL to spit this out in milliseconds.
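The count comparison can be checked against the sample rows from the question. This is only a sketch: it runs on SQLite rather than MySQL, assumes the blank cells are stored as NULL, and uses the column names from the sample tables. Instead of CASE it sums the IS NOT NULL tests directly, since each test evaluates to 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE occupant_info (counter INTEGER PRIMARY KEY,
        parking_permit_1_number INTEGER, parking_permit_2_number INTEGER,
        parking_permit_3_number INTEGER);
    CREATE TABLE parking_permits (counter INTEGER PRIMARY KEY,
        counter_from_occupant_info INTEGER,
        permit_1_name TEXT, permit_2_name TEXT, permit_3_name TEXT);
    -- Sample rows from the question; blanks are assumed to be NULL.
    INSERT INTO occupant_info VALUES
        (1, 12345, NULL, 98765), (2, 43920, NULL, NULL), (3, 30239, NULL, 34233);
    INSERT INTO parking_permits VALUES
        (1, 2, 'David Jones', 'James Cameron', 'Michael Smerconish'),
        (2, 3, 'Bill Epps', 'Hillary Clinton', 'Donald Trump'),
        (3, 1, 'Joanne Miller', NULL, 'Sridevi Gupta');
""")

# Apartments with more applications (names) than issued permit numbers.
waiting = [row[0] for row in conn.execute("""
    SELECT p.counter_from_occupant_info
      FROM parking_permits p
      JOIN occupant_info o ON o.counter = p.counter_from_occupant_info
     WHERE (p.permit_1_name IS NOT NULL)
         + (p.permit_2_name IS NOT NULL)
         + (p.permit_3_name IS NOT NULL)
         > (o.parking_permit_1_number IS NOT NULL)
         + (o.parking_permit_2_number IS NOT NULL)
         + (o.parking_permit_3_number IS NOT NULL)
     ORDER BY p.counter_from_occupant_info
""")]

print(waiting)
```

This matches the walkthrough in the question: occupant_info rows 2 and 3 are returned, row 1 is not.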

MySQL Query Is Too Slow Or Times Out

I am having a complete nightmare with my application. I haven't worked with datasets this big before, and my query is either timing out or taking ages to return something. I've got a feeling that my approach is just all wrong.
I have a payments table with a postcode field (among others). It has 40,000 rows roughly (one for each transaction). It has an auto-inc PRIMARY key and an INDEX on the postcode foreign-key.
I also have a postcodes lookup table with 2,500,000 rows. The table is structured like so;
postcode | country | county | localauthority | gor
AB1 1AA S99999 E2304 X 45
AB1 1AB S99999 E2304 X 45
The postcode field is PRIMARY and I have INDEXes on all the other fields.
Each field (apart from postcode) has a lookup table. In the case of country it's something like;
code | description
S99999 Wales
The point of the application is that the user can select areas of interest (such as "England", "London", "South West England" etc) and be shown payments results for those areas.
To do this, when a user selects the areas they are interested in, I create a temp table, with one row, listing ALL postcodes for the areas they selected. Then I LEFT JOIN it onto my payments table.
The problem is that if the user selects a big region (like "England") then I have to create a massive temp table (of about 1 million rows) and then compare it to the 40,000 payments to decide which to display.
UPDATE
Here is my code;
$generated_temp_table = "selected_postcodes_".$timestamp_string."_".$userid;
$temp_table_data = $temp_table
->setTempTable($generated_temp_table)
->newQuery()
->with(['payment' => function ($query) use ($column_values) {
$query->select($column_values);
}])
;
Here is my attempt to print out the raw query;
$sql = str_replace(['%', '?'], ['%%', "'%s'"], $temp_table_data->toSql());
$fullSql = vsprintf($sql, $temp_table_data->getBindings());
print_r($fullSql);
This is the result;
select * from `selected_postcodes_1434967426_1`
This doesn't look like the right query, I can't work out what Eloquent is doing here. I don't know why the full query is not printing out.
If the result set is very large (say, 1 million rows), page through it with LIMIT and OFFSET instead of fetching everything at once; that will cut the query time. Also make sure your SELECT lists only the columns you need (avoid SELECT * FROM xxx; use SELECT a, b FROM xxx).
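As a rough illustration of paging with LIMIT and OFFSET while selecting only the needed columns, here is a sketch in Python against SQLite (standing in for MySQL). The amount column, the generated rows, and the fetch_page helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, postcode TEXT, amount REAL)")
# 25 invented payments, just enough to fill three pages of ten.
conn.executemany(
    "INSERT INTO payments (postcode, amount) VALUES (?, ?)",
    [(f"AB1 {i:03d}", float(i)) for i in range(1, 26)],
)

def fetch_page(page, page_size=10):
    """Return one page of (id, amount) rows; pages are numbered from 1."""
    offset = (page - 1) * page_size
    return conn.execute(
        # Name the columns explicitly and keep a stable ORDER BY so pages do not overlap.
        "SELECT id, amount FROM payments ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

first_page = fetch_page(1)
last_page = fetch_page(3)
print(len(first_page), [row[0] for row in last_page])
```

Paging only limits how much is returned per request; it does not by itself remove the cost of building a very large temp table.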

MySQL Query - Find_in_set on comma separated columns

I have an issue with a Query I'm conducting to do a search on a Database of events.
The purpose is about sports and the structure is:
id_event event_sport event_city
1 10 153
2 12 270
3 09 135
The table sports is like:
sport_id sport_name
1 Basketball
and the table cities is:
city_id city_name
1 NYC
So things get complicated, because my events table is like:
id_event event_sport event_city
1 10,12 153,270
2 7,14 135,271
3 8,12 143,80
and I have a multi-input search form, so that people can search for events in their city for multiple sports or for multiple cities. I'm using Chosen
The search resultant from Chosen is, for example:
City = 153,270 (if user selected more than one city)
Sport = 12 (if user only selected one sport, can be "9,15")
So what I need is to search for multiple values on cities and sports in the same column, separated by commas, knowing that sometimes we can be searching only for one value, if user didn't input more than one.
My current query is:
SELECT * FROM events e
LEFT JOIN cities c ON e.event_city=c.city_id
LEFT JOIN sports s ON e.event_sport=s.sport_id
WHERE FIND_IN_SET('1CITY', e.event_city) AND FIND_IN_SET('1SPORT', e.event_sport)
;
Which is good to search for one city, but if the user searches for two or more, I don't have way to show it.
Can you please help me?
Thanks in advance.
When the user inputs multiple cities and/or sports, split it on commas, and then the query should look like:
SELECT * FROM events e
LEFT JOIN cities c on e.event_city = c.city_id
LEFT JOIN sports s ON e.event_sport = s.sport_id
WHERE (FIND_IN_SET('$city[0]', e.event_city) OR FIND_IN_SET('$city[1]', e.event_city) OR ...)
AND (FIND_IN_SET('$sport[0]', e.event_sport) OR FIND_IN_SET('$sport[1]', e.event_sport) OR ...)
Using PHP you can build up those OR expressions with:
// Cast each ID to an integer so raw user input never reaches the SQL string.
$city_list = implode(' OR ', array_map(function($x) { $id = (int) $x; return "FIND_IN_SET('$id', e.event_city)"; }, explode(',', $_POST['cities'])));
Do the same to make $sport_list, and then your SQL string would contain:
WHERE ($city_list) AND ($sport_list)
As you can see, this is really convoluted and inefficient. I recommend you normalize your schema as suggested in the comments.
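One possible shape for a normalized schema is a pair of junction tables, one row per event/sport and per event/city pair. The sketch below uses SQLite in place of MySQL; the table names event_sports and event_cities and the find_events helper are hypothetical, and the rows are the comma-separated values from the question split apart. It assumes both ID lists are non-empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical junction tables replacing the comma-separated columns.
    CREATE TABLE event_sports (id_event INTEGER, sport_id INTEGER);
    CREATE TABLE event_cities (id_event INTEGER, city_id INTEGER);
    INSERT INTO event_sports VALUES (1, 10), (1, 12), (2, 7), (2, 14), (3, 8), (3, 12);
    INSERT INTO event_cities VALUES (1, 153), (1, 270), (2, 135), (2, 271), (3, 143), (3, 80);
""")

def find_events(city_ids, sport_ids):
    """Events held in any of city_ids for any of sport_ids (both lists non-empty)."""
    # One "?" placeholder per value, so the IDs are bound rather than concatenated.
    city_marks = ", ".join("?" * len(city_ids))
    sport_marks = ", ".join("?" * len(sport_ids))
    sql = f"""
        SELECT DISTINCT ec.id_event
          FROM event_cities ec
          JOIN event_sports es ON es.id_event = ec.id_event
         WHERE ec.city_id IN ({city_marks})
           AND es.sport_id IN ({sport_marks})
         ORDER BY ec.id_event
    """
    return [row[0] for row in conn.execute(sql, [*city_ids, *sport_ids])]

one_sport = find_events([153, 270], [12])
several = find_events([143, 135], [12, 7])
print(one_sport, several)
```

With this layout a plain IN list handles one or many selected values, and the ID columns can be indexed, which FIND_IN_SET on a text column cannot use.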

using sum and group by and ifnull

I have a table of money owed, along with a team identifier (could be 1,2,3 for example)
I have another table which gives a name to these team identifiers (so 1 could refer to Team1, 2 could refer to John's jokers etc)
The first table can have multiple entries for money owed and I need to get the total owed per team identifier, and use the team name if it exists.
So I left join the tables and use a sum clause and get a total amount owed per teamname, or null if the teamname is not present. If it is null then I want to use the team identifier, so the results would look like
name total
.....................
team1 100
John's jokers 1000
99 50
where 99 is a team identifier because there was no teamname and there was a null present.
I tried using ifnull(columnName, teamID) but this failed when using a sum clause.
Could anyone help with this problem, please?
I think ifnull() is used like this:
select ifnull(teams.team_name, teams.team_id) from teams;
So in this case it tries to retrieve the name of the team, and if that comes back null it instead uses the team's identifier. In this case your query would look like this:
select ifnull(teams.team_name, owing.team_id), sum(amount_owed)
from owing left join teams on owing.team_id = teams.id
group by owing.team_id
Make sure the group by asks for the ID field from owing, not teams, otherwise you'll be grouping on a null field.
Does this resolve the issue?
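A quick way to check the query is to run it on a few invented rows. The sketch below uses SQLite instead of MySQL (it also has IFNULL), with the table and column names from the query above and made-up amounts chosen to reproduce the totals in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY, team_name TEXT);
    CREATE TABLE owing (team_id INTEGER, amount_owed INTEGER);
    -- Made-up rows; team 99 deliberately has no entry in teams.
    INSERT INTO teams VALUES (1, 'team1'), (2, 'John''s jokers');
    INSERT INTO owing VALUES (1, 60), (1, 40), (2, 1000), (99, 50);
""")

totals = conn.execute("""
    SELECT IFNULL(teams.team_name, owing.team_id) AS name,  -- fall back to the identifier
           SUM(owing.amount_owed) AS total
      FROM owing
      LEFT JOIN teams ON owing.team_id = teams.id
     GROUP BY owing.team_id          -- group on owing's ID, which is never NULL here
     ORDER BY owing.team_id
""").fetchall()

print(totals)
```

In SQLite the fallback value for team 99 comes back as the integer 99; MySQL may return it as a string, since the other branch of IFNULL is text.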