generate a mean for a 2-uple with MySQL - mysql

I can generate a table from records like this:
ID|Var1|Var2|Measure
1 10 13 10
1 10 15 8
1 15 13 0
...
One ID can have several rows with the same Var2. How can I generate a mean for each (ID, Var2) pair, like this:
ID|Var2|Mean_Measure
1 13 5
1 14 8
...
2 13 7
Thank you

You would need to use a GROUP BY clause to group the rows with the same ID and Var2 together and then the AVG function calculates the average:
SELECT t.ID, t.Var2, AVG(t.Measure) AS Mean_Measure FROM YourTable t GROUP BY t.ID, t.Var2
I might add that GROUP BY will alter the output of the query quite a bit. It also adds some restrictions on the output. First off, after a GROUP BY you can only add expressions in the SELECT clause where one of the following applies:
The expression is part of the GROUP BY clause
The expression is an application of an aggregate function
In the above example t.ID and t.Var2 exist in the GROUP BY clause and AVG(t.Measure) is an application of the aggregate function AVG on t.Measure.
When dealing with WHERE clauses and GROUP BY there are also some things to note:
WHERE is applied before the GROUP BY, so it filters individual rows and cannot reference the results of aggregate functions
If you wish to filter on aggregated values after the GROUP BY, use HAVING instead of WHERE
I hope this makes sense - and for more and better information on how GROUP BYs work - I'd suggest consulting the MySQL manual on the topic.
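To make the grouping and the two kinds of filtering concrete, here is a minimal sketch. It runs the queries through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made purely so the example is self-contained), and the row for ID 2 is invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; the ID 2 row is made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ID INTEGER, Var1 INTEGER, Var2 INTEGER, Measure INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [(1, 10, 13, 10), (1, 10, 15, 8), (1, 15, 13, 0), (2, 10, 13, 7)],
)

# One output row per (ID, Var2) pair, with the mean of Measure in each group.
means = conn.execute(
    "SELECT t.ID, t.Var2, AVG(t.Measure) AS Mean_Measure "
    "FROM YourTable t GROUP BY t.ID, t.Var2 ORDER BY t.ID, t.Var2"
).fetchall()

# WHERE drops individual rows before the groups are formed.
where_filtered = conn.execute(
    "SELECT t.ID, t.Var2, AVG(t.Measure) FROM YourTable t "
    "WHERE t.Measure > 0 GROUP BY t.ID, t.Var2 ORDER BY t.ID, t.Var2"
).fetchall()

# HAVING drops whole groups after the aggregate has been computed.
having_filtered = conn.execute(
    "SELECT t.ID, t.Var2, AVG(t.Measure) FROM YourTable t "
    "GROUP BY t.ID, t.Var2 HAVING AVG(t.Measure) > 6 ORDER BY t.ID, t.Var2"
).fetchall()

print(means)
print(where_filtered)
print(having_filtered)
```

With this sample data, the WHERE version removes the row with Measure 0 before averaging, so the mean for (1, 13) rises from 5 to 10, while the HAVING version leaves the means untouched and simply hides the (1, 13) group.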

Related

Can Someone explain the order of execution of ORDER BY and SELECT in MySQL when a counting variable is used in SELECT?

Suppose the table is:
Data
id val
0 4
1 7
2 2
3 1
4 9
Consider the query:
SET @r:=0;
SELECT val, @r:=@r+1 as row_num
FROM Data
ORDER BY val;
Now, I have read that ORDER BY is executed after SELECT, so this query should give the output below:
val row_num
1 4
2 3
4 1
7 2
9 5
BUT the query gives output as below:
val row_num
1 1
2 2
4 3
7 4
9 5
It looks as if ORDER BY is executed first and SELECT afterwards. How is the query actually executed?
There are several comments to make here. First, the perceived "order" in your table, as you have pasted it above, does not really exist. SQL tables are based on unordered tuples of records. Regarding your observation, the output you are seeing is consistent with the ordering you specified in the ORDER BY clause. If you want the original ordering, then you should maintain a second column which preserves this ordering.
By the way, with the advent of MySQL 8+, you may now take advantage of the analytic functions. In this case, ROW_NUMBER comes in handy:
SELECT val, ROW_NUMBER() OVER (ORDER BY val) row_num
FROM data
ORDER BY val;
The user variables in your question are heading towards being deprecated, so the above version is the way to go here.
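As a quick check of how ROW_NUMBER behaves, here is a small sketch using Python's built-in sqlite3 module in place of MySQL 8 (an assumption for the sake of a self-contained example; it needs an SQLite build with window functions, 3.25 or later). The second query, which numbers rows by id, is my own addition to show how to get the numbering the question expected.

```python
import sqlite3

# SQLite stands in for MySQL 8 here; both support ROW_NUMBER() OVER (...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [(0, 4), (1, 7), (2, 2), (3, 1), (4, 9)],
)

# Row numbers assigned in val order: the numbering follows the sort.
numbered_by_val = conn.execute(
    "SELECT val, ROW_NUMBER() OVER (ORDER BY val) AS row_num "
    "FROM Data ORDER BY val"
).fetchall()

# Row numbers assigned in id order, with the result still sorted by val.
numbered_by_id = conn.execute(
    "SELECT val, ROW_NUMBER() OVER (ORDER BY id) AS row_num "
    "FROM Data ORDER BY val"
).fetchall()

print(numbered_by_val)
print(numbered_by_id)
```

The ORDER BY inside OVER decides which row gets which number, independently of the final ORDER BY, so the result no longer depends on the order in which the server happens to evaluate rows.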

How to display column created in the SQL correlated subquery?

I'm working with SQL in an intro to databases course, and am having some trouble with a question.
I have a database that needs to be displayed as
ID Name Description Code MaximumStudents StudentCount
1 Computer Science 310 SQL NULL CS-HU310 10 8
3 Communications NULL COMM113 5 4
4 English NULL ENG101 4 6
5 Math NULL MA030 5 3
6 Electrical NULL ECE230 10 0
However when I run the following query
SELECT *
FROM Class c
WHERE EXISTS
(SELECT COUNT(DISTINCT ClassStudent.StudentID) AS StudentCount
FROM ClassStudent
WHERE ClassID=c.ID);
I am unable to get that last "StudentCount" column, even though the subquery is cycling through the select statement.
Is there a way to get this to work how I intend to?
I tried to add my select statement from the subquery into my list of columns directly after the first select statement, but this is a little repetitive because it would work fine without using the correlated subquery.
Move the correlated subquery to the select:
SELECT c.*,
(SELECT COUNT(DISTINCT cs.StudentID)
FROM ClassStudent cs
WHERE cs.ClassID = c.ID
) AS StudentCount
FROM Class c;
EXISTS checks if a subquery returns any rows. An aggregation query with no GROUP BY always returns one row. Hence, your WHERE filters nothing, so I removed it.
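Both points can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the three classes and their enrolments are invented sample data, not the tables from the question.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Class (ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE ClassStudent (ClassID INTEGER, StudentID INTEGER)")
conn.executemany(
    "INSERT INTO Class VALUES (?, ?)",
    [(1, "Math"), (2, "English"), (3, "Electrical")],
)
conn.executemany(
    "INSERT INTO ClassStudent VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 10), (2, 10)],  # student 10 is listed twice in class 1
)

# EXISTS over an aggregate without GROUP BY is always true, because the
# subquery returns exactly one row even when no enrolment matches.
exists_ids = conn.execute(
    "SELECT c.ID FROM Class c WHERE EXISTS "
    "(SELECT COUNT(*) FROM ClassStudent cs WHERE cs.ClassID = c.ID) "
    "ORDER BY c.ID"
).fetchall()

# The correlated subquery in the SELECT list yields one count per class.
with_counts = conn.execute(
    "SELECT c.ID, c.Name, "
    "(SELECT COUNT(DISTINCT cs.StudentID) FROM ClassStudent cs "
    " WHERE cs.ClassID = c.ID) AS StudentCount "
    "FROM Class c ORDER BY c.ID"
).fetchall()

print(exists_ids)
print(with_counts)
```

Class 3 has no enrolments, yet it still passes the EXISTS test and shows up with a count of 0 in the second query.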

Complex SQL Select query with inner join

My SQL query needs to return a list of values alongside the date, but with my limited knowledge I have only been able to get this far.
This is my SQL:
select lsu_students.student_grouping,lsu_attendance.class_date,
count(lsu_attendance.attendance_status) AS count
from lsu_attendance
inner join lsu_students
ON lsu_students.student_grouping="Central1A"
and lsu_students.student_id=lsu_attendance.student_id
where lsu_attendance.attendance_status="Present"
and lsu_attendance.class_date="2015-02-09";
This returns:
student_grouping class_date count
Central1A 2015-02-09 23
I want it to return:
student_grouping class_date count
Central1A 2015-02-09 23
Central1A 2015-02-10 11
Central1A 2015-02-11 21
Central1A 2015-02-12 25
This query gets the list of the dates according to the student grouping:
select distinct(class_date)from lsu_attendance,lsu_students
where lsu_students.student_grouping like "Central1A"
and lsu_students.student_id = lsu_attendance.student_id
order by class_date
I think you just want a group by:
select s.student_grouping, a.class_date, count(a.attendance_status) AS count
from lsu_attendance a inner join
lsu_students s
ON s.student_grouping = 'Central1A' and
s.student_id = a.student_id
where a.attendance_status = 'Present'
group by s.student_grouping, a.class_date;
Comments:
Use single quotes for string constants, unless you have a good reason not to.
If you want a range of class dates, then use a where with appropriate filtering logic.
Notice the table aliases. The query is easier to write and to read.
I added student grouping to the group by. This would be required by any SQL engine other than MySQL.
Just take out and lsu_attendance.class_date="2015-02-09" or change it to a range, and then add (at the end) GROUP BY lsu_students.student_grouping,lsu_attendance.class_date.
The group by clause is what you're looking for, to limit aggregates (e.g. the count function) to work within each group.
To get the number of students present in each group on each date, you would do something like this:
select student_grouping, class_date, count(*) as present_count
from lsu_students join lsu_attendance using (student_id)
where attendance_status = 'Present'
group by student_grouping, class_date
Note: for your example, using is simpler than on (if your SQL supports it), and putting the table name before each field name isn't necessary if the column name doesn't appear in more than one table (though it doesn't hurt).
If you want to limit which data rows get included, put your constraints in the where clause (this constrains which rows are counted). If you want to constrain the aggregate values that are displayed, you have to use the having clause. For example, to see the count of Central1A students present each day, but only display those dates where more than 20 students showed up:
select student_grouping, class_date, count(*) as present_count
from lsu_students join lsu_attendance using (student_id)
where attendance_status = 'Present' and student_grouping = 'Central1A'
group by student_grouping, class_date
having count(*) > 20
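Here is a scaled-down sketch of that where/having combination, run through Python's built-in sqlite3 module as a stand-in for MySQL. The handful of students and attendance rows are invented, and the threshold is lowered from 20 to 1 so the tiny data set still shows the effect.

```python
import sqlite3

# SQLite stands in for MySQL; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lsu_students (student_id INTEGER, student_grouping TEXT)")
conn.execute(
    "CREATE TABLE lsu_attendance "
    "(student_id INTEGER, class_date TEXT, attendance_status TEXT)"
)
conn.executemany(
    "INSERT INTO lsu_students VALUES (?, ?)",
    [(1, "Central1A"), (2, "Central1A"), (3, "Central1A"), (4, "North2B")],
)
conn.executemany(
    "INSERT INTO lsu_attendance VALUES (?, ?, ?)",
    [
        (1, "2015-02-09", "Present"), (2, "2015-02-09", "Present"),
        (3, "2015-02-09", "Absent"), (4, "2015-02-09", "Present"),
        (1, "2015-02-10", "Present"), (2, "2015-02-10", "Absent"),
        (3, "2015-02-10", "Absent"),
    ],
)

base_query = (
    "SELECT student_grouping, class_date, COUNT(*) AS present_count "
    "FROM lsu_students JOIN lsu_attendance USING (student_id) "
    "WHERE attendance_status = 'Present' AND student_grouping = 'Central1A' "
    "GROUP BY student_grouping, class_date "
)

# One row per date: the where clause decides which rows are counted.
per_day = conn.execute(base_query + "ORDER BY class_date").fetchall()

# The having clause then hides dates whose count is too small.
busy_days = conn.execute(
    base_query + "HAVING COUNT(*) > 1 ORDER BY class_date"
).fetchall()

print(per_day)
print(busy_days)
```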

Limit On Accumulated Column in MySQL

I'm trying to find an elegant way to write a query that only returns enough rows for a certain column to add up to at least n.
For example, let's say n is 50, and the table rows look like this:
id count
1 12
2 13
3 5
4 18
5 14
6 21
7 13
Then the query should return:
id count
1 12
2 13
3 5
4 18
5 14
Because the count column of these rows adds up to at least n = 50 (62, to be exact).
It must return the results consecutively starting with the smallest id.
I've looked a bit into accumulators, like in this one: MySQL select "accumulated" column
But AFAIK, there is no way to have the LIMIT clause in an SQL query limit on an SUM instead of a row count.
I wish I could say something like this, but alas, this is not valid SQL:
SELECT *
FROM elements
LIMIT sum(count) > 50
Also, please keep in mind that the goal here is to insert the result of this query into another table atomically in an automated, performance-efficient fashion, so please no suggestions to use a spreadsheet or anything that's not SQL compatible.
Thanks
There are many ways to do this. One is to use a correlated subquery that computes the running total of all earlier rows:
SELECT id,
count
FROM (SELECT *,
(SELECT COALESCE(Sum(count), 0)
FROM yourtable b
WHERE b.id < a.id) AS Run_tot
FROM yourtable a) ou
WHERE Run_tot < 50
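To check the running-total idea against the sample rows from the question, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The second query is an alternative of my own using a window function (available in MySQL 8+ and SQLite 3.25+), not part of the answer above.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (id INTEGER PRIMARY KEY, count INTEGER)")
conn.executemany(
    "INSERT INTO elements VALUES (?, ?)",
    [(1, 12), (2, 13), (3, 5), (4, 18), (5, 14), (6, 21), (7, 13)],
)

# Correlated subquery: keep a row while the total of all EARLIER rows is
# still below 50, so the row that crosses the threshold is included.
correlated = conn.execute(
    "SELECT id, count FROM ("
    "  SELECT a.id, a.count,"
    "    (SELECT COALESCE(SUM(b.count), 0) FROM elements b WHERE b.id < a.id)"
    "      AS run_tot"
    "  FROM elements a) ou "
    "WHERE run_tot < 50 ORDER BY id"
).fetchall()

# Window-function variant: cumulative sum minus the row's own count gives
# the same "total of earlier rows" without a subquery per row.
windowed = conn.execute(
    "SELECT id, count FROM ("
    "  SELECT id, count, SUM(count) OVER (ORDER BY id) - count AS prior_total"
    "  FROM elements) t "
    "WHERE prior_total < 50 ORDER BY id"
).fetchall()

print(correlated)
print(windowed)
```

The correlated form re-scans the earlier rows for every row, so on a large table the window variant is likely the cheaper of the two, though that is worth measuring on real data.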

Mysql subquery with sum causing problems

This is a summary version of the problems I am encountering, but hits the nub of my problem. The real problem involves huge UNION groups of monthly data tables, but the SQL would be huge and add nothing. So:
SELECT entity_id,
sum(day_call_time) as day_call_time
from (
SELECT entity_id,
sum(answered_day_call_time) as day_call_time
FROM XCDRDNCSum201108
where (day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
) as summary
is the problem: when the table in the subquery, XCDRDNCSum201108, returns no rows, the subquery still produces a row because it is a sum, and the column values in that row are NULL. But entity_id is part of the primary key, and cannot be NULL.
If I take out the sum, and just query entity_id, the subquery contains no rows, and thus the outer query does not fail, but when I use sum, I get error 1048 Column 'entity_id' cannot be null
how do I work around this problem ? Sometimes there is no data.
You are completely overworking the query: pre-summing inside, then summing again outside. In addition, I understand you are not a DBA, but whenever you do an aggregation, you TYPICALLY need the criteria that it's grouped by. In the case presented here, you are getting the sum of calls for all entity IDs, so you must have a GROUP BY on any non-aggregates. However, if all you care about is the grand total WITHOUT respect to the entity_id, then you could skip the GROUP BY, but you would also NOT include the actual entity ID...
If you want to show the actual time per specific entity ID...
SELECT
entity_id,
sum(answered_day_call_time) as day_call_time,
count(*) number_of_calls
FROM
XCDRDNCSum201108
where
(day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
group by
entity_id
This would result in something like (fictitious data)
Entity_ID Day_Call_Time Number_Of_Calls
1 10 3
2 45 4
3 27 2
If all you cared about were the total call times
SELECT
sum(answered_day_call_time) as day_call_time,
count(*) number_of_calls
FROM
XCDRDNCSum201108
where
(day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
This would result in something like (fictitious data)
Day_Call_Time Number_Of_Calls
82 9
Would:
sum(answered_day_call_time) as day_call_time
changed to
ifnull(sum(answered_day_call_time),0) as day_call_time
work? I'm assuming mysql here but the coalesce function would/should work too.
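The difference between the two suggestions shows up on an empty table. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the table name calls is a hypothetical placeholder for XCDRDNCSum201108.

```python
import sqlite3

# SQLite stands in for MySQL; "calls" is a hypothetical, deliberately
# empty stand-in for the monthly table in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (entity_id INTEGER, answered_day_call_time INTEGER)"
)

# An aggregate without GROUP BY always returns one row, here holding NULL.
ungrouped = conn.execute(
    "SELECT SUM(answered_day_call_time) FROM calls"
).fetchall()

# With GROUP BY there are no groups, so no rows come back at all.
grouped = conn.execute(
    "SELECT entity_id, SUM(answered_day_call_time) FROM calls GROUP BY entity_id"
).fetchall()

# COALESCE (or IFNULL) turns the NULL sum into 0, but the row still exists.
coalesced = conn.execute(
    "SELECT COALESCE(SUM(answered_day_call_time), 0) FROM calls"
).fetchall()

print(ungrouped)
print(grouped)
print(coalesced)
```

So wrapping the sum in IFNULL or COALESCE fixes a NULL total, but it would not by itself stop a NULL entity_id from appearing in that single row; grouping by entity_id avoids the row entirely when there is no data.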