SQL - Average number of related records with group_by - mysql

I have a table of records (let's call them TV shows) with an air_date field.
I have another table of advertisements that are related by a show_id field.
I am trying to get the average number of advertisements per show for each date (with a where clause specifying the shows).
I currently have this:
SELECT
`air_date`,
(SELECT COUNT(*) FROM `commercial` WHERE `show_id` = `show`.`id`) AS `num_commercials`
FROM `show`
WHERE ...
This gives me a result like so:
air_date | num_commercials
2015-6-30 | 6
2015-6-30 | 3
2015-6-30 | 8
2015-6-30 | 2
2015-6-31 | 9
2015-6-31 | 4
When I add a GROUP BY, it only gives me one of the records for each date, but I want the average for each air_date.

I'm not sure I understand exactly what you want, but does this do it?
SELECT `air_date`,
AVG((SELECT COUNT(*) FROM `commercial` WHERE `show_id` = `show`.`id`)) AS `num_commercials`
FROM `show`
WHERE .....
GROUP BY `air_date`
(Note that the double parentheses in the AVG call are required: the inner pair encloses the scalar subquery.)

You can use a sub-query to select count of commercials by air_date/show, then use an outer query to select the average commercials count per air_date.
Something like this should work:
select air_date, avg(num_commercials)
from
(
select show.air_date as air_date,
show.id as show_id,
count(*) as num_commercials
from show
inner join commercial on commercial.show_id = show.id
where ...
group by show.air_date, show.id
) sub
group by air_date
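The two-level aggregation above can be tried out with a minimal sketch like the one below. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are made up for illustration. One deliberate deviation: it uses a LEFT JOIN with COUNT(c.id) so that a show with no commercials counts as 0 instead of dropping out of the average, which is what the inner join would do.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE show (id INTEGER PRIMARY KEY, air_date TEXT);
CREATE TABLE commercial (id INTEGER PRIMARY KEY, show_id INTEGER);
-- Made-up sample data: shows 1 and 2 air on one date, shows 3 and 4 on another.
INSERT INTO show (id, air_date) VALUES
  (1, '2015-06-29'), (2, '2015-06-29'), (3, '2015-06-30'), (4, '2015-06-30');
-- Show 1 has 6 commercials, show 2 has 2, show 3 has 3, show 4 has none.
INSERT INTO commercial (show_id) VALUES
  (1), (1), (1), (1), (1), (1), (2), (2), (3), (3), (3);
""")

query = """
SELECT air_date, AVG(num_commercials) AS avg_commercials
FROM (
    -- Inner level: one row per show with its commercial count.
    SELECT s.id, s.air_date, COUNT(c.id) AS num_commercials
    FROM show AS s
    LEFT JOIN commercial AS c ON c.show_id = s.id
    GROUP BY s.id, s.air_date
) AS per_show
-- Outer level: average those per-show counts for each date.
GROUP BY air_date
ORDER BY air_date
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('2015-06-29', 4.0), ('2015-06-30', 1.5)]
```

If shows without commercials should be ignored, switching back to an inner join with count(*) gives the behaviour of the query above.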

Mysql multi column count having minimum occurrences values in columns

I have a table t1 with 5 columns and 80000 rows :
+---+--------+-------+--------+------------+
|id |category|groupe |subject | description|
+---+--------+-------+--------+------------+
|1 |categ1 |group1 |subject1| desc1 |
|2 |categ1 |group2 |subject2| desc2 |
|3 |categ1 |group2 |subject5| desc3 |
|4 |categ2 |group1 |subject5| desc4 |
|5 |categ2 |group3 |subject1| desc5 |
|6 |categ2 |group3 |subject2| desc6 |
|7 |categ3 |group1 |subject1| desc7 |
|8 |categ3 |group1 |subject4| desc8 |
+---+--------+-------+--------+------------+
I need to extract the rows whose category value occurs at least 30 times AND whose groupe value occurs at least 30 times AND whose subject value occurs at least 30 times.
This means that if "categ3" appears at least 30 times, I need the rows with categ3;
the same goes for groupe and subject.
But when I use the query below, the final result can contain fewer than 30 categ3 rows, because filtering by groupe or subject removes some of the ids that have categ3.
You can see an example on db<>fiddle; with a threshold of 10 occurrences, the correct query has to return 118 rows.
select
*
from
t1
where
category in (
SELECT
category
FROM
t1
GROUP BY
category
HAVING
COUNT(category) >= 30
)
and
groupe in (
SELECT
groupe
FROM
t1
GROUP BY
groupe
HAVING
COUNT(groupe) >= 30
)
and
subject in (
SELECT
subject
FROM
t1
GROUP BY
subject
HAVING
COUNT(subject) >= 30
)
This query returns the intersection of the ids whose category, groupe and subject values each occur at least 30 times, but taking the intersection reduces the counts...
This means the count of some category values can drop below 30 in the result.
To sum up, I need at least 30 occurrences within the intersection result.
I think I need a recursive filter that repeats until the number of input rows equals the number of output rows, but I don't know how to do that. Any ideas?
Thanks 😊
Add some DISTINCTs while grouping on the 3 columns.
select *
from dataset t
where t.category in (SELECT distinct category FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
and t.groupe in (SELECT distinct groupe FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
and t.subject in (SELECT distinct subject FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
For reference, this query selects only the rows whose (category, groupe, subject) tuple occurs 30 times or more.
That will naturally return fewer rows than the query above.
SELECT *
FROM dataset
WHERE (category, groupe, subject) IN (
SELECT category, groupe, subject
FROM dataset
GROUP BY category, groupe, subject
HAVING COUNT(*) >= 30
)
Pro tip: This is a case where describing your requirement takes a lot of thought. As you think about it, think of SQL as a processor of sets of rows. It is always worthwhile to describe the requirement as carefully as you can, especially when it is as tricky as this one. Often it's helpful to describe the problem domain, rather than just talking about columns and values.
I guess you need the sets of rows meeting your three different criteria (more than x duplicates). You can use a set of id values for those rows because they are apparently a primary key (unique).
Here's one set of IDs
SELECT id FROM dataset WHERE category IN (
SELECT category FROM dataset GROUP BY category HAVING COUNT(*) >= 5)
I believe you need all the rows lying in the intersection of those three sets. That is, you want any rows having all three items recurring frequently. You can get that with
id IN set1 AND id IN set2 AND id IN set3
If you need the union of those sets you can use this instead. This gives you the rows with any of the three items recurring frequently.
id IN set1 OR id IN set2 OR id IN set3
So here's the query.
SELECT *
FROM dataset
WHERE id IN (
SELECT id FROM dataset WHERE category IN (
SELECT category FROM dataset GROUP BY category HAVING COUNT(*) >= 5))
AND id IN (
SELECT id FROM dataset WHERE groupe IN (
SELECT groupe FROM dataset GROUP BY groupe HAVING COUNT(*) >= 5))
AND id IN (
SELECT id FROM dataset WHERE subject IN (
SELECT subject FROM dataset GROUP BY subject HAVING COUNT(*) >= 5))
I used 5 for the repeat threshold. You can use another number.
If you want your result set to contain only those rows with at least ten items in the result set, rather than in the dataset, you would use this query.
select d.*
from dataset d
join (
select count(*), groupe, category, subject
from dataset
group by groupe, category, subject
having count(*) >= 10
) e ON d.groupe=e.groupe AND d.category = e.category AND d.subject = e.subject
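The "repeat until input rows equal output rows" idea from the question can be sketched as a loop driven from client code. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, the helper name stable_rows and the temporary table work are hypothetical, the sample rows are made up, and the threshold is 2 to keep the data small. Each pass collects the ids of rows whose category, groupe or subject value is too rare among the rows still remaining, deletes them, and stops when a pass finds nothing to delete.

```python
import sqlite3

def stable_rows(conn, threshold):
    """Return the ids that survive repeated filtering of table t1.

    Works on a temporary copy (hypothetical name: work) so t1 is left untouched.
    """
    conn.execute("DROP TABLE IF EXISTS work")
    conn.execute("CREATE TEMP TABLE work AS SELECT * FROM t1")
    while True:
        # Rows that have at least one value occurring fewer than `threshold`
        # times among the rows that are still in the working copy.
        doomed = conn.execute(
            """
            SELECT id FROM work
            WHERE category IN (SELECT category FROM work
                               GROUP BY category HAVING COUNT(*) < :n)
               OR groupe IN (SELECT groupe FROM work
                             GROUP BY groupe HAVING COUNT(*) < :n)
               OR subject IN (SELECT subject FROM work
                              GROUP BY subject HAVING COUNT(*) < :n)
            """,
            {"n": threshold},
        ).fetchall()
        if not doomed:
            break  # nothing removed in this pass: the result is stable
        conn.executemany("DELETE FROM work WHERE id = ?", doomed)
    return [row[0] for row in conn.execute("SELECT id FROM work ORDER BY id")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, category TEXT, groupe TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        (1, "c1", "g1", "s1"),
        (2, "c1", "g1", "s1"),
        (3, "c1", "g2", "s1"),  # g2 occurs once: removed in the first pass
        (4, "c2", "g1", "s2"),  # survives the first pass, removed in the second
        (5, "c2", "g3", "s2"),  # g3 occurs once: removed in the first pass
        (6, "c3", "g1", "s3"),  # c3 and s3 occur once: removed in the first pass
    ],
)

result = stable_rows(conn, 2)
print(result)  # [1, 2]
```

A single pass would keep row 4, even though c2 and s2 then occur only once in the result; the second pass is what removes it. The loop always terminates because every pass either deletes at least one row or stops.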

How do I add together a COUNT() using SUM()?

Basically, I have this -
SELECT COUNT(student_dormpm.DormCode) * dorm_datapm.DormCharge
FROM student_dormpm
JOIN dorm_datapm
USING (DormCode)
GROUP BY DormCode;
with a result of
+---------------------------------------------------------+
| COUNT(student_dormpm.DormCode) * dorm_datapm.DormCharge |
+---------------------------------------------------------+
| 5250.00 |
| 11250.00 |
| 9600.00 |
| 6500.00 |
| 5510.00 |
+---------------------------------------------------------+
But whenever I try to add SUM(), it falls apart and I get an error of Invalid use of group function.
I've tried this SUM(COUNT(student_dormpm.DormCode) * dorm_datapm.DormCharge)
Wrap the first query in a derived table: give the derived table an alias (say t) and give the computed column an alias (say total), then calculate sum(total) in the outer query:
select sum(total) as mysum from
(
SELECT COUNT(student_dormpm.DormCode) * dorm_datapm.DormCharge as total
FROM student_dormpm
JOIN dorm_datapm
USING (DormCode)
GROUP BY DormCode
) t
It looks like dorm_datapm is your dorm table that has one record per DormCode with the associated DormCharge. (In that case a table name like dorm might be more appropriate, though.)
If this is so, then you merely want to sum up the dorm charges:
SELECT DormCode, SUM(dd.DormCharge)
FROM student_dormpm sd
JOIN dorm_datapm dd USING (DormCode)
GROUP BY DormCode
ORDER BY DormCode;
If, however, dorm_datapm can have multiple records per DormCode, then you'll have to check whether the above query gets you the results you want, or whether you want to join each student_dormpm row with only one dorm_datapm row (in which case you would have to extend your join criteria somehow).
If you are only looking for the total, then don't group by DormCode:
SELECT SUM(dd.DormCharge)
FROM student_dormpm sd
JOIN dorm_datapm dd USING (DormCode);
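As a quick check that the two approaches agree, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and the StudentID column are made up, and the sketch assumes one dorm_datapm row per DormCode. It computes the grand total once by summing the per-dorm COUNT * DormCharge values from a derived table, and once by summing DormCharge directly over the joined rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dorm_datapm (DormCode TEXT PRIMARY KEY, DormCharge REAL);
CREATE TABLE student_dormpm (StudentID INTEGER PRIMARY KEY, DormCode TEXT);
-- Made-up data: three students in dorm A (100.0 each), two in dorm B (250.0 each).
INSERT INTO dorm_datapm VALUES ('A', 100.0), ('B', 250.0);
INSERT INTO student_dormpm (DormCode) VALUES ('A'), ('A'), ('A'), ('B'), ('B');
""")

# Approach 1: aggregate per dorm in a derived table, then sum the per-dorm totals.
total_via_derived_table = conn.execute("""
    SELECT SUM(total) FROM (
        SELECT COUNT(sd.DormCode) * dd.DormCharge AS total
        FROM student_dormpm sd
        JOIN dorm_datapm dd USING (DormCode)
        GROUP BY DormCode
    ) t
""").fetchone()[0]

# Approach 2: every joined row carries one student's charge, so sum it directly.
total_via_plain_sum = conn.execute("""
    SELECT SUM(dd.DormCharge)
    FROM student_dormpm sd
    JOIN dorm_datapm dd USING (DormCode)
""").fetchone()[0]

print(total_via_derived_table, total_via_plain_sum)  # 800.0 800.0
```

The nested form is needed only when the outer aggregate has to run over the grouped results; an aggregate cannot be nested directly inside another, which is what triggers the "Invalid use of group function" error.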

MySQL - Group By Latest and Join First Instance

I've tried a few things but I've ended up confusing myself.
What I am trying to do is find the most recent records from a table and left join the first after a certain date.
An example might be
id | acct_no | created_at | some_other_column
1 | A0001 | 2017-05-21 00:00:00 | x
2 | A0001 | 2017-05-22 00:00:00 | y
3 | A0001 | 2017-05-22 00:00:00 | z
So ideally what I'd like is to find the latest record of each acct_no sorted by created_at DESC so that the results are grouped by unique account numbers, so from the above record it would be 3, but obviously there would be multiple different account numbers with records for different days.
Then, what I am trying to achieve is to join on the same table and find the first record with the same account number after a certain date.
For example, record 1 would be returned for a query joining on acct_no A0001 after or equal to 2017-05-21 00:00:00 because it is the first result after/equal to that date, so these are sorted by created_at ASC AND created_at >= "2017-05-21 00:00:00" (and possibly AND id != latest.id).
It seems quite straight forward but I just can't get it to work.
I only have my most recent attempt after discarding multiple different queries.
Here I am trying to solve the first part which is to select the most recent of each account number:
SELECT latest.* FROM my_table latest
JOIN (SELECT acct_no, MAX(created_at) FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no
but that still returns all rows rather than the most recent of each.
I did have something using a join on a subquery, but it took so long to run that I quit it before it finished, even though I have indexes on acct_no and created_at. I've also run into other problems where columns in the select are not in the group by. I know this check can be turned off, but I'm trying to find a way to perform the query that doesn't require that.
Just try a little edit to your initial query:
SELECT latest.* FROM my_table latest
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no AND latest.created_at = latest2.max_time
Trying a different approach. Not sure about the performance impact. But hoping that avoiding self join and group by would be better in terms of performance.
SELECT * FROM (
SELECT mytable1.*, IF(@temp <> acct_no, 1, 0) selector, @temp := acct_no FROM `mytable1`
JOIN (SELECT @temp := '') a
ORDER BY acct_no, created_at DESC , id DESC
) b WHERE selector = 1
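You need to get the id of the row that has the max created_at for each account (taking the highest id when several rows share that date).
SELECT latest.* FROM my_table latest
JOIN (SELECT MAX(t.id) AS id FROM my_table t
JOIN (SELECT acct_no, MAX(created_at) AS max_time FROM my_table
GROUP BY acct_no) m
ON t.acct_no = m.acct_no AND t.created_at = m.max_time
GROUP BY t.acct_no) latest2
ON latest.id = latest2.id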
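Neither part of the question (latest row per account, then the first other row on or after a given date) needs GROUP BY if correlated subqueries with ORDER BY ... LIMIT 1 are acceptable. The sketch below is one possible way to combine the two. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, uses the id as a tie-breaker when two rows share a created_at, and adds a made-up second account (B0002) to the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, acct_no TEXT, "
    "created_at TEXT, some_other_column TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "A0001", "2017-05-21 00:00:00", "x"),
        (2, "A0001", "2017-05-22 00:00:00", "y"),
        (3, "A0001", "2017-05-22 00:00:00", "z"),
        (4, "B0002", "2017-05-20 00:00:00", "p"),  # made-up extra account
        (5, "B0002", "2017-05-23 00:00:00", "q"),
    ],
)

query = """
SELECT l.acct_no, l.id AS latest_id, f.id AS first_id
FROM my_table AS l
-- Earliest other row of the same account on or after the cut-off date.
LEFT JOIN my_table AS f
  ON f.id = (
      SELECT f2.id FROM my_table AS f2
      WHERE f2.acct_no = l.acct_no
        AND f2.created_at >= :since
        AND f2.id <> l.id
      ORDER BY f2.created_at ASC, f2.id ASC
      LIMIT 1
  )
-- Keep only the latest row of each account (highest id wins ties).
WHERE l.id = (
    SELECT l2.id FROM my_table AS l2
    WHERE l2.acct_no = l.acct_no
    ORDER BY l2.created_at DESC, l2.id DESC
    LIMIT 1
)
ORDER BY l.acct_no
"""

rows = conn.execute(query, {"since": "2017-05-21 00:00:00"}).fetchall()
print(rows)  # [('A0001', 3, 1), ('B0002', 5, None)]
```

For B0002 the only row on or after the cut-off is the latest row itself, so the LEFT JOIN yields NULL for first_id. An index on (acct_no, created_at) would be the natural thing to try if the correlated lookups turn out to be slow.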

How to merge two SQL results in one SQL query?

I am working on a project where I need to write a validation query to validate data.
I have two tables: 1. Input table (raw data) 2. Output table (harmonized data).
Currently, for validation, I run the two queries below and then copy both results into an Excel file to check whether there is any difference in the data.
1 Query
Select Date,sum(Val),sum(Vol)
From Input_table
Group by Date
2 Query
Select Date,sum(Val),sum(Vol)
From Output_table
Group by Date
Is there any way where I can put both these results in one query and also create one calculated column like.... (sum(Input_table.VAL)-sum(Output_table.VAL)) as Validation_Check.
So output will be like:
Date | sum(Input_table.Val) | sum(Output_table.Val) | Validation_Check
thanks.
It looks like you need to full join your results like:
select
ifnull(I.[Date], O.[Date]) as Date,
I.Val as Input_Val,
O.Val as Output_Val,
ifnull(I.Val, 0) - ifnull(O.Val, 0) as Validation_Check
from
(
Select Date,sum(Val) as Val,sum(Vol) as Vol
From Input_table
Group by Date
) as I
full outer join
(
Select Date,sum(Val) as Val,sum(Vol) as Vol
From Output_table
Group by Date
) as O on O.[Date] = I.[Date]
Use UNION. This combines the rows of two queries, provided the two queries return the same number of columns with compatible data types.
Use this statement:
select Date, sum(Val), sum(Vol) from (
Select Date,Val,Vol
From Input_table
union
Select Date,Val,Vol
From Output_table
) t
Group by Date
This concatenates the data of both tables in the inner select and then groups it into one result.
SELECT Date, SUM(VAL) as SUM_VAL, SUM(VOL) as SUM_VOL, SUM(VAL-VOL) as Validation_Check from
(Select Date,val,vol
From Input_table
UNION ALL
Select Date,val, vol
From Output_table
) X
group by Date
I suggest using a UNION ALL instead of a UNION here since there may be similar results fetched from both queries.
For example, your query 1 has a result like
May 01, 2017 | 5 | 5
and your query 2 has a result with the same values
May 01, 2017 | 5 | 5
If you use union, you'd only get 1 instance of
May 01, 2017 | 5 | 5
instead of 2 instances of
May 01, 2017 | 5 | 5
May 01, 2017 | 5 | 5
MySQL does not support FULL OUTER JOIN, but if your database does, you can use:
SELECT
IFNULL(a.Date, b.Date) AS Date,
SUM(IFNULL(a.Val, 0)) AS Input_Val_Sum,
SUM(IFNULL(b.Val, 0)) AS Output_Val_Sum,
SUM(IFNULL(a.Val, 0) - IFNULL(b.Val, 0)) AS Validation_Check
FROM Input_table AS a
FULL OUTER JOIN Output_table AS b
ON a.Date = b.Date
GROUP BY IFNULL(a.Date, b.Date)
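For databases without FULL OUTER JOIN, one alternative is to tag each row with its source inside a UNION ALL and then split the sums with CASE. The sketch below is only an illustration: it uses Python's built-in sqlite3 module, the src column and the sample rows are made up, and only Val is validated (Vol could be handled the same way).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Input_table (Date TEXT, Val INTEGER, Vol INTEGER);
CREATE TABLE Output_table (Date TEXT, Val INTEGER, Vol INTEGER);
-- Made-up data: 2017-05-01 matches, 2017-05-02 is missing from the output,
-- and 2017-05-03 exists only in the output.
INSERT INTO Input_table VALUES
  ('2017-05-01', 5, 5), ('2017-05-01', 5, 5), ('2017-05-02', 7, 1);
INSERT INTO Output_table VALUES
  ('2017-05-01', 10, 10), ('2017-05-03', 4, 2);
""")

query = """
SELECT Date,
       SUM(CASE WHEN src = 'in'  THEN Val ELSE 0 END)    AS Input_Val,
       SUM(CASE WHEN src = 'out' THEN Val ELSE 0 END)    AS Output_Val,
       SUM(CASE WHEN src = 'in'  THEN Val ELSE -Val END) AS Validation_Check
FROM (
    -- UNION ALL keeps duplicate rows; src records which table a row came from.
    SELECT 'in' AS src, Date, Val FROM Input_table
    UNION ALL
    SELECT 'out' AS src, Date, Val FROM Output_table
) AS x
GROUP BY Date
ORDER BY Date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2017-05-01', 10, 10, 0)
# ('2017-05-02', 7, 0, 7)
# ('2017-05-03', 0, 4, -4)
```

A Validation_Check of 0 means the two tables agree for that date; dates present in only one table still show up, which is the effect the full join was meant to achieve.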

Get the greatest Year value in mysql after grouping by a column

The below table contains an id and a Year and Groups
GroupingTable
id | Year | Groups
1 | 2000 | A
2 | 2001 | B
3 | 2001 | A
Now I want to select the row with the greatest year for each group after grouping by the Groups column
SELECT
id,
Year,
Groups
FROM
GroupingTable
GROUP BY
`Groups`
ORDER BY Year DESC
And below is what I am expecting, even though the query above doesn't work as expected
id | Year | Groups
2 | 2001 | B
3 | 2001 | A
You need to learn how to use aggregate functions.
SELECT
MAX(Year) AS Year,
Groups
FROM
GroupingTable
GROUP BY
`Groups`
ORDER BY Year DESC
When using GROUP BY, only the column(s) you group by are unambiguous, because they have the same value on every row of the group.
Other columns return a value arbitrarily from one of the rows in the group. Actually, this is behavior of MySQL (and SQLite), but because of the ambiguity, it's an illegal query in standard SQL and all other brands of SQL implementations.
For more on this, see my answer to Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
Your query misuses the heinously confusing nonstandard extension to GROUP BY that's built in to MySQL. Read this and weep. https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
If all you want is the year it's a snap.
SELECT MAX(Year) Year, Groups
FROM GroupingTable
GROUP BY Groups
If you want the id of the row in question, you have to do a bunch of monkey business to retrieve the column id from the above query.
SELECT a.*
FROM GroupingTable a
JOIN (
SELECT MAX(Year) Year, Groups
FROM GroupingTable
GROUP BY Groups
) b ON a.Groups = b.Groups AND a.Year = b.Year
You have to do this because the GROUP BY query yields a summary result set, and you have to join that back to the detail result set to retrieve the ID.
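The join-back pattern can be checked against the sample data from the question with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the backticks around Groups are kept because SQLite accepts MySQL-style quoting, and the column aliases are written with AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupingTable (id INTEGER PRIMARY KEY, Year INTEGER, `Groups` TEXT)")
conn.executemany(
    "INSERT INTO GroupingTable VALUES (?, ?, ?)",
    [(1, 2000, "A"), (2, 2001, "B"), (3, 2001, "A")],
)

query = """
SELECT a.id, a.Year, a.`Groups`
FROM GroupingTable AS a
JOIN (
    -- Summary result set: the greatest year for each group.
    SELECT `Groups`, MAX(Year) AS Year
    FROM GroupingTable
    GROUP BY `Groups`
) AS b ON a.`Groups` = b.`Groups` AND a.Year = b.Year
ORDER BY a.id
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(2, 2001, 'B'), (3, 2001, 'A')]
```

If two rows of the same group share the greatest year, both are returned; an extra tie-breaker (for example the highest id) would be needed to force exactly one row per group.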