MySQL - Average column B values based on distinct column A values

I am trying to find the average value from column B based on the distinct (country) of column A.
Argentina 49.5600
Argentina 31.5100
Austria 353.0700
Austria 67.8800
Belgium 6.2700
Belgium 0.1700
This is part of the table. I am trying to average the values in column B based on country, such as 49.5 + 31.5 averaged for only Argentina, etc.
I have tried several combinations so far with no luck.
select shipcountry, round(avg(freight), 1)
from Table.Orders
order by shipcountry;
select shipcountry,
(select round(avg(freight), 1)
from Table.Orders)
from Table.Orders
order by shipcountry;
The first query only returns one row with a single country and value. The second query returns the countries, but column B is averaged altogether. Is there a way to separate the averages by country?

No subquery needed, you're just missing the GROUP BY clause:
SELECT shipcountry, ROUND(AVG(freight),1)
FROM Table.Orders
GROUP BY shipcountry;
GROUP BY in the docs
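You can check the fix without a MySQL server by using Python's built-in sqlite3 module. This sketch assumes a table named `orders` as a stand-in for `Table.Orders`, and it loads only the six sample rows from the question. SQLite's AVG and ROUND behave like MySQL's for this query.

```python
import sqlite3

# In-memory stand-in for Table.Orders (hypothetical name: orders),
# loaded with the six sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (shipcountry TEXT, freight REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("Argentina", 49.56), ("Argentina", 31.51),
        ("Austria", 353.07), ("Austria", 67.88),
        ("Belgium", 6.27), ("Belgium", 0.17),
    ],
)

# GROUP BY makes AVG run once per country instead of once over the whole table.
rows = conn.execute(
    """
    SELECT shipcountry, ROUND(AVG(freight), 1)
    FROM orders
    GROUP BY shipcountry
    ORDER BY shipcountry
    """
).fetchall()

for country, avg_freight in rows:
    print(country, avg_freight)
```

With GROUP BY, each country gets its own row: Argentina 40.5, Austria 210.5 and Belgium 3.2.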

Related

Query to find count of tagging for item in column A to item in column B

I need to find, in SQL, the count of different entries in one column (Onshore, Offshore) for the same entry in another column (State, e.g. North Dakota). In the attached sample, the number of mismatches is 1 (North Dakota - Offshore). Many thanks for any help.
You seem to want to count the number of states that have more than 1 distinct terrain. If so, you can use two levels of aggregation:
select count(*) no_mismatches
from (select state from mytable group by state having min(terrain) <> max(terrain)) t
Do you want count(distinct)?
select state, count(distinct terrain)
from t
group by state;
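The attached sample is not available. This sketch runs both approaches against a small hypothetical `mytable(state, terrain)` in SQLite, using Python's sqlite3 module. In this data only North Dakota has two terrains. Within each state group the first query compares MIN(terrain) with MAX(terrain), and the second lists the distinct-terrain count per state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (state TEXT, terrain TEXT)")
# Hypothetical sample: only North Dakota is tagged with two different terrains.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("North Dakota", "Onshore"),
        ("North Dakota", "Offshore"),
        ("Texas", "Onshore"),
        ("Texas", "Onshore"),
        ("Alaska", "Offshore"),
    ],
)

# Two levels of aggregation: the inner query keeps states whose terrain values
# differ (MIN <> MAX), and the outer query counts those states.
(no_mismatches,) = conn.execute(
    """
    SELECT COUNT(*) AS no_mismatches
    FROM (SELECT state FROM mytable
          GROUP BY state
          HAVING MIN(terrain) <> MAX(terrain)) t
    """
).fetchone()

# Per-state view: how many distinct terrains each state has.
per_state = dict(
    conn.execute(
        "SELECT state, COUNT(DISTINCT terrain) FROM mytable GROUP BY state"
    ).fetchall()
)

print(no_mismatches)
print(per_state)
```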

Sort fields in MySQL table by sum of the columns

I have data in a table whose fields are the names of individuals and whose rows hold data related to each person's contribution to a certain task. For example:
Person 1's sum is 13, while person 2's sum is 25. I would like the results to present the columns in the Person 2, Person 1 order.
I am looking to sort the query results to rearrange table fields (the names of the people) in order of the sums of each person (column sum)
You can use the GREATEST and LEAST functions to reorder the values. And you can calculate the sums of each original column in a subquery.
SELECT IF(GREATEST(sum1, sum2) = sum1, 'Person1', 'Person2') as HigherName,
GREATEST(sum1, sum2) AS HigherVal,
IF(LEAST(sum1, sum2) = sum1, 'Person1', 'Person2') AS LowerName,
LEAST(sum1, sum2) as LowerVal
FROM (SELECT SUM(Person1) AS sum1, SUM(Person2) AS sum2
FROM YourTable) AS x
This will produce a result like:
HigherName HigherVal LowerName LowerVal
Person2 25 Person1 13
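SQLite has no GREATEST, LEAST or IF. This sketch, using Python's sqlite3, substitutes SQLite's multi-argument scalar MAX()/MIN() and CASE to show the same reordering. The two rows in `YourTable` are hypothetical, chosen so that the column sums match the question's 13 and 25.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Person1 INTEGER, Person2 INTEGER)")
# Hypothetical rows chosen so the column sums are 13 and 25, as in the question.
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(5, 10), (8, 15)])

# With two or more arguments, SQLite's MAX()/MIN() are scalar functions,
# playing the role of MySQL's GREATEST()/LEAST(); CASE replaces IF().
row = conn.execute(
    """
    SELECT CASE WHEN MAX(sum1, sum2) = sum1 THEN 'Person1' ELSE 'Person2' END AS HigherName,
           MAX(sum1, sum2) AS HigherVal,
           CASE WHEN MIN(sum1, sum2) = sum1 THEN 'Person1' ELSE 'Person2' END AS LowerName,
           MIN(sum1, sum2) AS LowerVal
    FROM (SELECT SUM(Person1) AS sum1, SUM(Person2) AS sum2 FROM YourTable) AS x
    """
).fetchone()

print(row)  # ('Person2', 25, 'Person1', 13)
```

One caveat applies to both versions. If the two sums are equal, both name columns report 'Person1', because each test compares against sum1 first.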

how to get average of rows that have a certain relationship

I have a bunch of county demographic data stored in a database. I need to be able to get the average of the data within the state of a certain county.
For example, I need to be able to get the average of all counties whose state_id matches the state_id of the county with a county_id of 1. Essentially, if a county was in Virginia, I would need the average of all of the counties in Virginia. I'm having trouble setting up this query, and I was hoping that you guys could give me some help. Here's what I have written, but it only returns one row from the database because it links the county_id of the two tables together.
SELECT AVG(demographic_data.percent_white) as avg_percent_white
FROM demographic_data,counties, states
WHERE counties.county_id = demographic_data.county_id AND counties.state_id = states.state_id
Here's my basic database layout:
counties
------------------------
county_id | county_name
states
---------------------
state_id | state_name
demographic_data
-----------------------------------------
percent_white | percent_black | county_id
Your query is returning one row, because there's an aggregate and no GROUP BY. If you want an average of all counties within a state, we'd expect only one row.
To get a "statewide" average, of all counties within a state, here's one way to do it:
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a
ON a.county_id = d.county_id
JOIN counties o
ON o.state_id = a.state_id
WHERE o.county_id = 42
Note that there's no need to join to the states table. You just need all counties that have a matching state_id. The query above uses two references to the counties table: the reference aliased as "a" is for all the counties within a state, and the reference aliased as "o" is used to get the state_id for a particular county.
If you already had the state_id, you wouldn't need a second reference:
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a
ON a.county_id = d.county_id
WHERE a.state_id = 11
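Both queries can be checked on a toy dataset. This sketch uses Python's sqlite3 with hypothetical rows. It also assumes that `counties` has a `state_id` column: the layout in the question omits it, but the question's own query relies on `counties.state_id`. County 42 is in state 11, so both queries should average the three counties in state 11.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE counties (county_id INTEGER PRIMARY KEY, county_name TEXT, state_id INTEGER);
    CREATE TABLE demographic_data (percent_white REAL, percent_black REAL, county_id INTEGER);

    -- Hypothetical data: counties 42-44 are in state 11, county 50 is in state 12.
    INSERT INTO counties VALUES (42, 'A', 11), (43, 'B', 11), (44, 'C', 11), (50, 'D', 12);
    INSERT INTO demographic_data VALUES (70, 20, 42), (50, 40, 43), (60, 30, 44), (90, 5, 50);
    """
)

# "o" locates the state of county 42; "a" fans out to every county in that state.
by_county = conn.execute(
    """
    SELECT AVG(d.percent_white) AS avg_percent_white
    FROM demographic_data d
    JOIN counties a ON a.county_id = d.county_id
    JOIN counties o ON o.state_id = a.state_id
    WHERE o.county_id = 42
    """
).fetchone()[0]

# When the state_id is already known, the second reference is unnecessary.
by_state = conn.execute(
    """
    SELECT AVG(d.percent_white) AS avg_percent_white
    FROM demographic_data d
    JOIN counties a ON a.county_id = d.county_id
    WHERE a.state_id = 11
    """
).fetchone()[0]

print(by_county, by_state)  # 60.0 60.0
```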
FOLLOWUP
Q: What if I wanted to bring in another table, let's call it demographic_data_2, that was also linked via the county_id?
A: I made the assumption that the demographic_data table had one row per county_id. If the same holds true for the second table, then a simple JOIN operation will do:
JOIN demographic_data_2 c
ON c.county_id = d.county_id
With that table joined in, you could add an appropriate aggregate expression in the SELECT list (e.g. SUM, MIN, MAX, AVG).
The trouble spots are typically "missing" and "duplicate" data. When there isn't a row for every county_id in that second table, those counties are left out of the aggregate. When there's more than one row for a particular county_id, that county gets counted more than once in the aggregate.
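Both trouble spots are easy to reproduce. In this sketch, which uses Python's sqlite3 with hypothetical data, the contents of `demographic_data_2` and its `median_income` column are invented for illustration. A county missing from the second table drops out of the inner join. A county with two rows in the second table is counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demographic_data (percent_white REAL, county_id INTEGER);
    -- Hypothetical second table; median_income is an illustrative column.
    CREATE TABLE demographic_data_2 (median_income REAL, county_id INTEGER);

    -- One row per county in the first table.
    INSERT INTO demographic_data VALUES (80, 1), (40, 2);
    """
)


def avg_white_after_join():
    """AVG(percent_white) once demographic_data_2 is inner-joined in."""
    return conn.execute(
        """
        SELECT AVG(d.percent_white)
        FROM demographic_data d
        JOIN demographic_data_2 c ON c.county_id = d.county_id
        """
    ).fetchone()[0]


baseline = conn.execute("SELECT AVG(percent_white) FROM demographic_data").fetchone()[0]

# Missing data: county 2 has no row in the second table, so the inner join drops it.
conn.execute("INSERT INTO demographic_data_2 VALUES (50000, 1)")
missing = avg_white_after_join()

# Duplicate data: county 1 now has two rows, so its 80 is counted twice.
conn.execute("INSERT INTO demographic_data_2 VALUES (52000, 1)")
conn.execute("INSERT INTO demographic_data_2 VALUES (45000, 2)")
duplicate = avg_white_after_join()

print(baseline, missing, duplicate)
```

The correct average is 60. The missing row shifts it to 80, and the duplicate row shifts it to (80 + 80 + 40) / 3.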
We note that the aggregate returned in the original query is an "average of averages". It's an average of the values for each county.
Consider:
bucket count_red count_blue count_total percent_red
------ --------- ---------- ----------- -----------
1 480 4 1000 48
2 60 1 200 30
Note that there's a difference between an "average of averages", and calculating an average using totals.
SELECT AVG(percent_red) AS avg_percent_red
, 100 * SUM(count_red)/SUM(count_total) AS tot_percent_red
FROM buckets
avg_percent_red tot_percent_red
--------------- ---------------
39 45
Both values are valid; we just don't want to misinterpret or misrepresent either value.
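The bucket numbers above can be re-run to confirm the two results. This sketch uses Python's sqlite3 and a hypothetical table name, `buckets`. The `100.0` factor turns the ratio into a percentage and also forces floating-point division, because SQLite truncates integer division (MySQL's `/` already returns a decimal).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE buckets (bucket INTEGER, count_red INTEGER, count_blue INTEGER,"
    " count_total INTEGER, percent_red REAL)"
)
# The two rows from the example table above.
conn.executemany(
    "INSERT INTO buckets VALUES (?, ?, ?, ?, ?)",
    [(1, 480, 4, 1000, 48), (2, 60, 1, 200, 30)],
)

avg_percent_red, tot_percent_red = conn.execute(
    """
    SELECT AVG(percent_red),                          -- average of the two averages
           100.0 * SUM(count_red) / SUM(count_total)  -- ratio of the totals
    FROM buckets
    """
).fetchone()

print(avg_percent_red, tot_percent_red)  # 39.0 45.0
```

The first value weights each bucket equally. The second weights each bucket by its count_total, which is why the larger bucket pulls it toward 48.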

GroupBy and get percentage for each

I have my SQL table like this:
**CLIENTS:**
id
country
I want to echo a table with all the countries I have and the percentage for each.
For example, if I have 2 Canadians and 1 French in my table, I want:
1 - Canada - 66%
2 - France - 33%
What I tried:
SELECT country FROM `mytable` GROUP BY `Country`;
It works, but how do I get the percentage for each?
Thanks.
You can use subquery:
SELECT
country,
COUNT(id) * 100 / (SELECT COUNT(id) FROM `mytable`) AS `something`
FROM
`mytable`
GROUP BY
`Country`;
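Here is the subquery approach run on the question's example (two Canadians and one French client) with Python's sqlite3. The `percentage` alias and the `ROUND(..., 2)` call are additions for readability. `100.0` is used because SQLite would otherwise perform integer division; in MySQL, the answer's `COUNT(id) * 100 / ...` already yields a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, country TEXT)")
# Two Canadians and one French client, as in the question's example.
conn.executemany(
    "INSERT INTO mytable (country) VALUES (?)",
    [("Canada",), ("Canada",), ("France",)],
)

# The uncorrelated subquery supplies the grand total; each group divides by it.
rows = conn.execute(
    """
    SELECT country,
           ROUND(COUNT(id) * 100.0 / (SELECT COUNT(id) FROM mytable), 2) AS percentage
    FROM mytable
    GROUP BY country
    ORDER BY country
    """
).fetchall()

for rank, (country, pct) in enumerate(rows, start=1):
    print(f"{rank} - {country} - {pct}%")
```

This prints `1 - Canada - 66.67%` and `2 - France - 33.33%`.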
You don't specify a flavor of SQL, but years ago Microsoft posted their suggested solution:
select au_id
,(convert(numeric(5,2),count(title_id))
/(Select convert(numeric(5,2),count(title_id)) from titleauthor)) * 100
AS "Percentage Of Total Titles"
from titleauthor group by au_id
The percentage of total records contained within a group is simple to compute: divide the number of records aggregated in the group by the total number of records in the table, and then multiply the result by 100. This is exactly what the preceding query does. These points explain the query in greater detail:
1. The inner nested query returns the total number of records in the TitleAuthor table: [ Select convert(numeric(5,2),count(title_id)) from titleauthor ]
2. The value returned by COUNT(title_id) in the outer GROUP BY query is the number of titles written by a specific author.
3. The value returned in step 2 is divided by the value returned in step 1, and the result is multiplied by 100 to compute and display the percentage of the total number of titles written by each author.
4. The nested SELECT is executed once for each row returned by the outer GROUP BY query.
5. The CONVERT function is used to cast the values returned by the COUNT aggregate function to the numeric data type with a precision of 5 and a scale of 2 to achieve the required level of precision.

MySQL - The most occurring for the specific day?

I'm stuck on this problem.
Basically, I need to figure out, for each department, which days had the most sales made in them. The results should display the department number and the date, and a department number can appear several times in the results if several days are tied for the most sales.
This is what I have so far:
SELECT departmentNo, sDate FROM Department
HAVING MAX(sDate)
ORDER BY departmentNo, sDate;
I tried using the MAX function to find which dates occurred most, but it only returns one row of values. To clarify, the dates that have the most sales should appear with the corresponding departmentNo. Also, if two dates for department A have an equal number of most sales, then department A should appear twice, with both dates showing.
NOTE: only dates with the most sales should appear and the departmentNo.
I started MySQL a few weeks ago but am still struggling to grasp the likes of subqueries and stored functions. But I'll learn from experience. Thank you in advance.
UPDATED:
Results I should get:
DepartmentNo | Date
1 | 15/08/2000
2 | 01/10/2012
3 | 01/06/1999
4 | 08/03/2002
nth | nth date
These are the data:
INSERT INTO Department VALUES ('1','tv','2012-05-20','13:20:01','19:40:23','2');
INSERT INTO Department VALUES ('2','radio','2012-07-22','09:32:23','14:18:51','4');
INSERT INTO Department VALUES ('3','tv','2012-09-14','15:15:43','23:45:38','3');
INSERT INTO Department VALUES ('2','tv','2012-06-18','06:20:29','09:57:37','1');
INSERT INTO Department VALUES ('1','radio','2012-06-18','11:34:07','15:41:09','2');
INSERT INTO Department VALUES ('2','batteries','2012-06-18','16:20:01','23:40:23','3');
INSERT INTO Department VALUES ('2','remote','2012-06-18','13:20:41','19:40:23','4');
INSERT INTO Department VALUES ('1','computer','2012-06-18','13:20:54','19:40:23','4');
INSERT INTO Department VALUES ('2','dishwasher','2011-06-18','13:20:23','19:40:23','4');
INSERT INTO Department VALUES ('3','lawnmower','2011-06-18','13:20:57','20:40:23','4');
INSERT INTO Department VALUES ('3','lawnmower','2011-06-18','11:20:57','20:40:23','4');
INSERT INTO Department VALUES ('1','mobile','2012-05-18','13:20:31','19:40:23','4');
INSERT INTO Department VALUES ('1','mouse','2012-05-18','13:20:34','19:40:23','4');
INSERT INTO Department VALUES ('1','radio','2012-05-18','13:20:12','19:40:23','4');
INSERT INTO Department VALUES ('2','lawnmowerphones','2012-05-18','13:20:54','19:40:23','4');
INSERT INTO Department VALUES ('2','tv','2012-05-12','06:20:29','09:57:37','1');
INSERT INTO Department VALUES ('2','radio','2011-05-23','11:34:07','15:41:09','2');
INSERT INTO Department VALUES ('1','batteries','2011-05-21','16:20:01','23:40:23','3');
INSERT INTO Department VALUES ('2','remote','2011-05-01','13:20:41','19:40:23','4');
INSERT INTO Department VALUES ('3','mobile','2011-05-09','13:20:31','19:40:23','4');
For department 1 the date 2012-05-18 should appear because that date occurred the most. For every department, only the date(s) with the most sales should be shown, and if the same number of sales occurs on two different dates then both will appear, e.g. Department 1 will appear twice, once with each of its max-sales dates.
I've tested the following query based on the table and two columns you've provided along with sample data, so let me describe it for you. The inner-most "PREQUERY" does a count by department and date. Its results are pre-ordered by Department first, THEN by the count in DESCENDING ORDER (so the highest sales count is listed FIRST); it doesn't matter on what date the count happened.
Next, by utilizing MySQL @variables, I'm pre-declaring two to be used in the query. @variables are like inline programming with MySQL: they can be declared once and then changed as each record is processed. So, I'm defaulting to a bogus department value and a zero sales count.
Now, I'm grabbing the results of the PreQuery (Dept, number of sales and Date), but adding a test. If it is the FIRST ENTRY for a given department, use that record's NumberOfSales, put it into the @maxSales variable and store it as a final column named "MaxSaleCnt". The next column uses @lastDept and is set to whatever the current record's department number is, so it can be compared to the next record.
If the next record is in the same department, it just keeps whatever the @maxSales value was from the previous record, thus keeping the same first count(*) result for ALL entries of each respective department.
Now, the closure. I've added a HAVING clause, not a WHERE: WHERE restricts which records get tested, but HAVING processes AFTER the records are part of the PROCESSED set, so at that point each record has all 5 columns. It says ONLY KEEP those records where the NumberOfSales for the record MATCHES the MaxSaleCnt for the department. If one, two or more dates tie, no problem: it returns them all for the respective department.
So, one department could have 5 dates with 10 sales each, another could have 2 dates with only 3 sales each, and another only 1 date with 6 sales.
select
    Final.DepartmentNo,
    Final.NumberOfSales,
    Final.sDate
from
    (select
        PreQuery.DepartmentNo,
        PreQuery.NumberOfSales,
        PreQuery.sDate,
        @maxSales := if( PreQuery.DepartmentNo = @lastDept, @maxSales, PreQuery.NumberOfSales ) MaxSaleCnt,
        @lastDept := PreQuery.DepartmentNo
    from
        ( select
              D.DepartmentNo,
              D.sDate,
              count(*) as NumberOfSales
          from
              Department D
          group by
              D.DepartmentNo,
              D.sDate
          order by
              D.DepartmentNo,
              NumberOfSales DESC ) PreQuery,
        ( select @lastDept := '~',
                 @maxSales := 0 ) sqlvars
    having
        NumberOfSales = MaxSaleCnt ) Final
To clarify the "@" and "~" per your final comment: the "@" marks a MySQL user-defined variable (an in-line SQL variable) that can be read and reassigned within the query. The '~' is nothing more than a simple string that would probably never match any of your department numbers, so when it is compared to the first qualified record, IF( '~' = YourFirstDepartmentNumber, use this answer, otherwise use this answer ) always takes the "otherwise" branch.
Now, how does the above work? Let's say the following is what the inner-most query returns for your data, grouped and ordered with the most sales at the top going down. It is SLIGHTLY altered from your data: let's just assume the following to simulate multiple dates in Dept 2 that have the same sales quantity.
Row# DeptNo Sales Date # Sales
1 1 2012-05-18 3
2 1 2012-06-18 2
3 1 2012-05-20 1
4 2 2012-06-18 4
5 2 2011-05-23 4
6 2 2012-05-18 2
7 2 2012-05-12 1
8 3 2011-06-18 2
9 3 2012-09-14 1
Keep track of the actual rows. The innermost query, aliased "PreQuery", returns all the rows in the order you see here. It is then joined (implicitly) with the declarations of the @ SQL variables (special to MySQL; other SQL engines don't do this), which start their values with lastDept = '~' and maxSales = 0 (via assignment with @someVariable := value).
Now, think of the above being handled as a loop:
DO WHILE WE HAVE RECORDS LEFT
    Get the department #, Number of Sales and sDate from the record.
    IF the PreQuery record's Department # = whatever is in @lastDept
        set @maxSales = whatever is ALREADY established as max sales for this dept
        (this keeps @maxSales the same value for ALL records in the same Dept #)
    ELSE
        set @maxSales = the # of sales, since this is a new department number and this record has its highest count
    END IF
    NOW, set @lastDept = the department you just processed, so it
        can be compared when you get to the next record.
    Skip to the next record to be processed and go back to the start of this loop
END DO WHILE LOOP
Now, the reason you need to have @maxSales and THEN @lastDept as returned columns is that they must be computed for each record so they can be compared to the NEXT record. This technique can be used for MANY application purposes. If you click on my name, look at my tags and click on the MySQL tag, it will show you the many MySQL answers I've responded to. Many of them use @ SQL variables. In addition, there are many other people who are very good at writing queries, so don't just look in one place. For any question, if you find an answer helpful, even if you didn't post the question, clicking the up-arrow next to it helps show others what really helped people understand and resolve the question. Good luck on your MySQL growth.
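The loop described above can be emulated in plain Python over the answer's altered PreQuery rows. The emulation only works because the list is already sorted. In MySQL, neither the order in which a derived table's rows are processed nor the evaluation order of expressions that involve user variables is guaranteed, and MySQL 8.0 documents assignment to user variables outside SET as deprecated. For that reason, this sketch also runs the same filter with RANK(), a window function (a swapped-in technique that needs MySQL 8.0+ or SQLite 3.25+), and both versions should keep the same rows. The function and table names are illustrative.

```python
import sqlite3

# The answer's altered PreQuery output: (DeptNo, Sales Date, # Sales),
# already ordered by department, then by number of sales descending.
PREQUERY = [
    (1, "2012-05-18", 3),
    (1, "2012-06-18", 2),
    (1, "2012-05-20", 1),
    (2, "2012-06-18", 4),
    (2, "2011-05-23", 4),
    (2, "2012-05-18", 2),
    (2, "2012-05-12", 1),
    (3, "2011-06-18", 2),
    (3, "2012-09-14", 1),
]


def top_dates_loop(rows):
    """Emulate the @lastDept/@maxSales pass plus the HAVING filter."""
    last_dept = "~"  # sentinel matching no department, like '~' in the query
    max_sales = 0
    kept = []
    for dept, sale_date, sales in rows:
        if dept != last_dept:
            max_sales = sales  # first row of a department carries its highest count
        last_dept = dept  # remember this department for the next row
        if sales == max_sales:  # HAVING NumberOfSales = MaxSaleCnt
            kept.append((dept, sale_date, sales))
    return kept


def top_dates_rank(rows):
    """Same filter with RANK(): every date tied for a department's top count gets rank 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PreQuery (DepartmentNo INTEGER, sDate TEXT, NumberOfSales INTEGER)")
    conn.executemany("INSERT INTO PreQuery VALUES (?, ?, ?)", rows)
    return conn.execute(
        """
        SELECT DepartmentNo, sDate, NumberOfSales
        FROM (SELECT DepartmentNo, sDate, NumberOfSales,
                     RANK() OVER (PARTITION BY DepartmentNo
                                  ORDER BY NumberOfSales DESC) AS rnk
              FROM PreQuery) AS Ranked
        WHERE rnk = 1
        ORDER BY DepartmentNo, sDate DESC
        """
    ).fetchall()


for row in top_dates_loop(PREQUERY):
    print(row)
```

Dept 2 keeps both of its 4-sale dates, and Depts 1 and 3 each keep a single date, just as the HAVING clause does.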
I think this can be achieved with a single query, but my experience with similar functionality has involved WITH (common table expressions, as defined in SQL:1999) in either Oracle or MSSQL.
The best (only?) way to approach a problem like this is to break it into smaller components. (I don't think the statement you provided shows all columns, so I'm going to have to make a few assumptions.)
First, how many sales were made on each day for each department:
SELECT department, COUNT(1) AS dept_count, sale_date
FROM orders
GROUP BY department, sale_date
Next, what's the most sales on a single day for each department:
SELECT tmp.department, MAX(tmp.dept_count)
FROM (
SELECT department, COUNT(1) AS dept_count
FROM orders
GROUP BY department, sale_date
) AS tmp
GROUP BY tmp.department
Finally, putting the two together:
SELECT a.department, a.max_dept_count, b.sale_date
FROM (
SELECT tmp.department, MAX(tmp.dept_count) AS max_dept_count
FROM (
SELECT department, COUNT(1) AS dept_count
FROM orders
GROUP BY department, sale_date
) AS tmp
GROUP BY tmp.department
) AS a
JOIN (
SELECT department, COUNT(1) AS dept_count, sale_date
FROM orders
GROUP BY department, sale_date
) AS b
ON a.department = b.department
AND a.max_dept_count = b.dept_count
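To see how the pieces fit together, this sketch loads the question's sample rows into SQLite (via Python's sqlite3) under the answer's assumed names: `orders`, `department` and `sale_date`. The innermost query that feeds MAX() has to group by both department and sale_date. If it groups by department alone, `dept_count` becomes each department's total, and on this data that total never equals any single day's count, so the join returns nothing. The hypothetical helper parameter exists only to show both variants side by side.

```python
import sqlite3

# The question's 20 sample rows reduced to (department, sale_date).
ORDERS = [
    (1, "2012-05-20"), (2, "2012-07-22"), (3, "2012-09-14"), (2, "2012-06-18"),
    (1, "2012-06-18"), (2, "2012-06-18"), (2, "2012-06-18"), (1, "2012-06-18"),
    (2, "2011-06-18"), (3, "2011-06-18"), (3, "2011-06-18"), (1, "2012-05-18"),
    (1, "2012-05-18"), (1, "2012-05-18"), (2, "2012-05-18"), (2, "2012-05-12"),
    (2, "2011-05-23"), (1, "2011-05-21"), (2, "2011-05-01"), (3, "2011-05-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (department INTEGER, sale_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)


def busiest_days(inner_group_by):
    """Run the combined query; inner_group_by (a trusted constant) is the GROUP BY of the MAX() input."""
    sql = f"""
        SELECT a.department, a.max_dept_count, b.sale_date
        FROM (SELECT tmp.department, MAX(tmp.dept_count) AS max_dept_count
              FROM (SELECT department, COUNT(1) AS dept_count
                    FROM orders
                    GROUP BY {inner_group_by}) AS tmp
              GROUP BY tmp.department) AS a
        JOIN (SELECT department, COUNT(1) AS dept_count, sale_date
              FROM orders
              GROUP BY department, sale_date) AS b
          ON a.department = b.department
         AND a.max_dept_count = b.dept_count
        ORDER BY a.department, b.sale_date
    """
    return conn.execute(sql).fetchall()


print(busiest_days("department"))             # per-department totals: no day matches
print(busiest_days("department, sale_date"))  # per-day counts: the busiest day(s)
```

With per-day counts, the result is 2012-05-18 for department 1 (3 sales), 2012-06-18 for department 2 (3 sales) and 2011-06-18 for department 3 (2 sales). Tied days would each appear, because every day whose count equals the maximum survives the join.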