SQL - How to list items which are below the average - mysql

I have quite a basic database of 3 tables: "Students", "Tests" and "Scores".
For each test I need to list all students whose test scores are below the average for that test. (If that makes any sense at all)
I have an SQL query which simply prints the average score for each test.
SELECT t.Test_name, AVG(sc.Result) AS Avgscore
FROM Tests t
JOIN Scores sc ON t.id_Tests = sc.Tests_id_Tests
JOIN Students s ON sc.Students_id_Students = s.id_Students
WHERE t.id_Tests = $c
($c is a parameter from a for loop, which increments to print out each test as a separate table)
Any help would be appreciated, thanks

Change the select list to whatever columns you want to display. This query limits the results the way you want for a given test id (replace testXYZ with the actual test you're searching on):
SELECT t.Test_name, s.*, sc.*
FROM Tests t
JOIN Scores sc
ON t.id_Tests = sc.Tests_id_Tests
JOIN Students s
ON sc.Students_id_Students = s.id_Students
WHERE t.id_Tests = 'testXYZ'
AND sc.Result <
(SELECT AVG(x.Result)
FROM Scores x
WHERE sc.Tests_id_Tests = x.Tests_id_Tests)
Note: To run this for ALL tests, and have scores limited to those that are below the average for each test, you would just leave that one line out of the where clause and run:
SELECT t.Test_name, s.*, sc.*
FROM Tests t
JOIN Scores sc
ON t.id_Tests = sc.Tests_id_Tests
JOIN Students s
ON sc.Students_id_Students = s.id_Students
WHERE sc.Result <
(SELECT AVG(x.Result)
FROM Scores x
WHERE sc.Tests_id_Tests = x.Tests_id_Tests)
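To see the correlated subquery in action, here is a minimal sketch that runs the all-tests version against an in-memory SQLite database from Python. SQLite stands in for MySQL here, and the sample rows and the Name column are assumptions made up for illustration; only the table and key column names come from the question.

```python
import sqlite3

# Hypothetical sample data; table and key column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tests (id_Tests INTEGER PRIMARY KEY, Test_name TEXT);
CREATE TABLE Students (id_Students INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Scores (
    Students_id_Students INTEGER,
    Tests_id_Tests INTEGER,
    Result REAL
);
INSERT INTO Tests VALUES (1, 'Algebra'), (2, 'Biology');
INSERT INTO Students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO Scores VALUES
    (1, 1, 90), (2, 1, 60), (3, 1, 30),   -- Algebra average: 60
    (1, 2, 40), (2, 2, 80), (3, 2, 90);   -- Biology average: 70
""")

# The subquery is re-evaluated per score row, using that row's test id.
below_average = conn.execute("""
SELECT t.Test_name, s.Name, sc.Result
FROM Tests t
JOIN Scores sc ON t.id_Tests = sc.Tests_id_Tests
JOIN Students s ON sc.Students_id_Students = s.id_Students
WHERE sc.Result < (SELECT AVG(x.Result)
                   FROM Scores x
                   WHERE x.Tests_id_Tests = sc.Tests_id_Tests)
ORDER BY t.Test_name, s.Name
""").fetchall()

for row in below_average:
    print(row)
```

Note that Bob's 60 in Algebra is not listed, because the comparison is strictly "less than" the average.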

For example, in PostgreSQL you could use a window function like AVG(Result) OVER (PARTITION BY Tests_id_Tests) (MySQL 8.0 and later support this too), but on older MySQL versions I suggest using a subquery as follows:
SELECT sc.*, s.*, avgsc.Test_name, avgsc.Avgscore
FROM Scores sc
JOIN Students s ON sc.Students_id_Students = s.id_Students
JOIN (
SELECT t.id_Tests, t.Test_name, AVG(sc2.Result) AS Avgscore
FROM Tests t
JOIN Scores sc2 ON t.id_Tests = sc2.Tests_id_Tests
-- WHERE t.id_Tests = $c
GROUP BY t.id_Tests, t.Test_name
) avgsc ON sc.Tests_id_Tests = avgsc.id_Tests
WHERE sc.Result < avgsc.Avgscore
Note that a student can be listed multiple times if they got below average score multiple times -- might or might not be what you want.
I commented out the line filtering the test as I guess it is easier to retrieve all tests at once, but if you insist on filtering on one test on application level then you can filter here by uncommenting it.

Related

MYSQL: Using math in SELECT with alias

I have an extremely complex SQL query that I am needing help with. Essentially, this query will see how many total assignments a student is assigned (total) and how many they have completed (completed) for the course. I need one final column that would give me the percentage of completed assignments, because I want to run a query to select all users who have completed less than 50% of their assignments.
What am I doing wrong? I am getting an error "Unknown column 'completed' in 'field list'"
Is there a better way to execute this? I am open to changing my query.
Query:
SELECT students.usid AS ID, students.firstName, students.lastName,
(
SELECT COUNT(workID) FROM assignGrades
INNER JOIN students ON students.usid = assignGrades.usid
INNER JOIN assignments ON assignments.assID = assignGrades.assID
WHERE
assignGrades.usid = ID AND
assignments.subID = 4 AND
(
assignGrades.submitted IS NOT NULL OR
(assignGrades.score IS NOT NULL AND CASE WHEN assignments.points > 0 THEN assignGrades.score ELSE 1 END > 0)
)
) AS completed,
(
SELECT COUNT(workID) FROM assignGrades
INNER JOIN students ON students.usid = assignGrades.usid
INNER JOIN assignments ON assignments.assID = assignGrades.assID
WHERE
assignGrades.usid = ID AND
assignments.subID = 4 AND
(NOW() - INTERVAL 5 HOUR) > assignments.assigned
) AS total,
(completed/total)*100 AS percentage
FROM students
INNER JOIN profiles ON profiles.usid = students.usid
INNER JOIN classes ON classes.ucid = profiles.ucid
WHERE classes.utid=2 AND percentage < 50
If I cut the (percentage) part in the SELECT statement, the query runs as expected. See below for results.
Information about the tables involved in this query:
assignGrades: Lists the student's score for each assignment.
assignments: List the assignments for each course.
students: Lists student information
classes: Lists class information
profiles: Links a student to a class
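If you only need to check whether the value is >50% and don't need to display it, you can use a different approach with a HAVING clause. MySQL lets HAVING refer to select-list aliases such as completed and total, which WHERE and other select-list expressions cannot do (that is where the "Unknown column" error comes from):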
SELECT (now) AS completed, (totalassignments) AS total
FROM db
HAVING (completed/total)*100 > 50;
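Another portable option, different from the HAVING approach above, is to compute the aliases in an inner query and filter on them in an outer one. The sketch below uses SQLite from Python; the progress table is a hypothetical stand-in for the two counting subqueries in the question, and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "progress" is a made-up table standing in for the question's two subqueries.
conn.executescript("""
CREATE TABLE progress (usid INTEGER PRIMARY KEY, completed INTEGER, total INTEGER);
INSERT INTO progress VALUES (1, 2, 10), (2, 5, 10), (3, 9, 10);
""")

# The inner query defines the alias; the outer WHERE can then filter on it.
low_completion = conn.execute("""
SELECT usid, percentage
FROM (
    SELECT usid, completed * 100.0 / total AS percentage
    FROM progress
) AS p
WHERE percentage < 50
ORDER BY usid
""").fetchall()

print(low_completion)
```

Only student 1 is returned here; student 2 sits at exactly 50% and is excluded by the strict comparison.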

MySQL Max() with group by and multiple tables

For my studies I need to get some code working. I have two tables:
Training:
UserID
Date
Uebung
Gewicht
Wiederholungen
Mitglied:
UserID
Name
Vorname
I need to display the max power, which you get by multiplying 'Wiederholungen' by 'Gewicht' from the 'Training' table, for EACH user, together with the date and name.
I know there is a "problem" with MAX() and GROUP BY, but I'm kind of new to MySQL and I was only able to find fixes that involve a single table where every column already exists. I have to join two tables AND create the power column.
I tried a lot, and I think this may be my best attempt:
select name, vorname, x.power from
(SELECT mitglied.UserID,max( Wiederholungen*Gewicht) as Power
FROM training join mitglied
where Uebung = 'Beinstrecker'
and training.UserID = mitglied.UserID
group by training.UserID) as x
inner join (training, mitglied)
on (training.UserID = mitglied.UserID)
and x.Power = Power;
I get way too many results. I know the last condition is wrong (x.Power = Power), but I have no clue how to solve it.
This is actually a fairly typical question here, but I am bad at searching for previous answers, so...
You "start" in a subquery, finding those max values:
SELECT UserID, Uebung, MAX(Gewicht*Wiederholungen) AS Power
FROM training
WHERE Uebung = 'Beinstrecker'
GROUP BY UserID, Uebung
Then, you join that back to the table it came from to find the date(s) those maxes occurred:
( SELECT UserID, Uebung, MAX(Gewicht*Wiederholungen) AS Power
FROM training
WHERE Uebung = 'Beinstrecker'
GROUP BY UserID, Uebung
) AS maxes
INNER JOIN training AS t
ON maxes.UserID = t.UserID
AND maxes.Uebung = t.Uebung
AND maxes.Power = (t.Gewicht*t.Wiederholungen)
Finally, you join to mitglied to get information for the user:
SELECT m.name, m.vorname, t.Date, maxes.Power
FROM ( SELECT UserID, Uebung, MAX(Gewicht*Wiederholungen) AS Power
FROM training
WHERE Uebung = 'Beinstrecker'
GROUP BY UserID, Uebung
) AS maxes
INNER JOIN training AS t
ON maxes.UserID = t.UserID
AND maxes.Uebung = t.Uebung
AND maxes.Power = (t.Gewicht*t.Wiederholungen)
INNER JOIN mitglied AS m ON t.UserID = m.UserID
;
Note: t.Uebung = 'Beinstrecker' could be used as a join condition instead, and might be faster; but as a matter of style I try to avoid redundant literals like that unless there is a worthwhile performance difference.
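The three steps above (find each user's maximum, join back to training for the date, join mitglied for the name) can be tried end to end with the following sketch. It uses SQLite from Python instead of MySQL, and all sample rows, including the second exercise name, are assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; table and column names follow the question.
conn.executescript("""
CREATE TABLE mitglied (UserID INTEGER PRIMARY KEY, Name TEXT, Vorname TEXT);
CREATE TABLE training (UserID INTEGER, Date TEXT, Uebung TEXT,
                       Gewicht INTEGER, Wiederholungen INTEGER);
INSERT INTO mitglied VALUES (1, 'Meier', 'Anna'), (2, 'Schulz', 'Ben');
INSERT INTO training VALUES
    (1, '2020-01-01', 'Beinstrecker', 50, 10),  -- power 500
    (1, '2020-01-08', 'Beinstrecker', 60, 10),  -- power 600, Anna's max
    (2, '2020-01-02', 'Beinstrecker', 40, 12),  -- power 480, Ben's max
    (2, '2020-01-09', 'Bankdruecken', 80, 10);  -- other exercise, filtered out
""")

max_power = conn.execute("""
SELECT m.Name, m.Vorname, t.Date, maxes.Power
FROM (SELECT UserID, Uebung, MAX(Gewicht * Wiederholungen) AS Power
      FROM training
      WHERE Uebung = 'Beinstrecker'
      GROUP BY UserID, Uebung) AS maxes
JOIN training AS t
  ON maxes.UserID = t.UserID
 AND maxes.Uebung = t.Uebung
 AND maxes.Power = t.Gewicht * t.Wiederholungen
JOIN mitglied AS m ON t.UserID = m.UserID
ORDER BY m.Name
""").fetchall()

for row in max_power:
    print(row)
```

If a user reaches the same maximum on several dates, this pattern returns one row per date, which may or may not be what you want.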

Better way to write MySQL sub-query

I have two tables in my MySQL database: allele and locus. I want to know for a given locus how many alleles there are and of those how many have the status Tentative. I currently have the following query with subquery:
SELECT COUNT(*) as alleleCount,
(SELECT COUNT(*)
FROM allele
INNER JOIN locus ON allele.LocusID = locus.PrimKey
WHERE Status = 'Tentative'
AND locus.ID = 762
) as newAlleleCount
FROM allele
INNER JOIN locus ON allele.LocusID = locus.PrimKey
WHERE locus.ID = 762
but I feel there must be a better way to write this query.
You can use SUM() with a condition. In MySQL the comparison evaluates to 1 or 0, so summing it gives you the count of rows that match the condition:
SELECT locus.ID,COUNT(*) `all_alleles_per_locus`,
SUM(Status = 'Tentative') `tentative_alleles_per_locus`
FROM allele
INNER JOIN locus ON allele.LocusID = locus.PrimKey
GROUP BY locus.ID
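As a quick check of the conditional SUM() idea, here is a small sketch run against SQLite from Python. SQLite also evaluates a comparison to 1 or 0, so it behaves like MySQL for this purpose; the sample loci and alleles are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; table and column names follow the question.
conn.executescript("""
CREATE TABLE locus (PrimKey INTEGER PRIMARY KEY, ID INTEGER);
CREATE TABLE allele (LocusID INTEGER, Status TEXT);
INSERT INTO locus VALUES (1, 762), (2, 800);
INSERT INTO allele VALUES
    (1, 'Tentative'), (1, 'Approved'), (1, 'Tentative'),
    (2, 'Approved');
""")

# One pass over the joined rows yields both the total and the conditional count.
counts = conn.execute("""
SELECT locus.ID,
       COUNT(*) AS alleleCount,
       SUM(Status = 'Tentative') AS newAlleleCount
FROM allele
JOIN locus ON allele.LocusID = locus.PrimKey
GROUP BY locus.ID
ORDER BY locus.ID
""").fetchall()

print(counts)
```

Adding WHERE locus.ID = 762 before the GROUP BY would restrict the output to the single locus from the question.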
One way would be to group the locus by its statuses and fetch each status's respective count; using the WITH ROLLUP modifier will add a NULL status at the end representing the total:
SELECT status, COUNT(*)
FROM allele JOIN locus ON locus.PrimKey = allele.LocusID
WHERE locus.ID = 762
GROUP BY status WITH ROLLUP
If you absolutely do not want a list of all statuses, you can instead GROUP BY status = 'Tentative' (optionally WITH ROLLUP if desired)—but it will not be sargable.

optimize Mysql: get latest status of the sale

In the following query, I show the latest status of the sale (by stage, in this case the number 3). The query is based on a subquery in the status history of the sale:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
This query takes 0.0057 sec and returns 1011 records.
Because I have to filter the sales by the name of the state, I would have to repeat the subquery in the WHERE clause, so I decided to rewrite the query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query returns the same results (1011 records), but the problem is that it takes 0.0753 sec.
Reviewing the possibilities, I found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time. Why does it work better? Is there any way to use this clause with the joins? I would appreciate your help.
EDIT
I will show the results of EXPLAIN for each query respectively (q1 and q2; the output is not included here).
Interesting. That little clause simply checks whether there is a match between t_record.id_sale and t_sale.id_sale.
Why does it make your query run faster? Because WHERE conditions are applied before the subselects in the select list are evaluated, so if a sale has no record at all, the subselect is never processed for it. That saves you some time, which is why it works better.
Will it work with your join syntax? I don't really know without having your tables to test against, but you can always just append it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get an execution plan, which will help you optimize things. Probably the best way to get better results with your join syntax is to add some indexes to your tables.
But I ask you, is this even necessary? You have a query returning in under 8 hundredths of a second. Unless this query is being run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.

Query worked fine on localhost but it's very slow on our server

We've tested with 1 million records in every table and the results were fine, always under 0.08 sec.
So we deployed it on our server, but it's very slow there, taking up to 36 sec.
We asked for help before to optimize the query we were running on our test machine, and detailed the basic structure of our one-to-many relationship there:
Problems to optimize large query and tables structure
That's the final query, the one we're using after getting help in the question linked above:
explain
SELECT
st.sid, st.title, st.summary, st.storynotes, st.thumb, st.completed, st.wordcount, st.rid, st.date, st.updated,
stats.total_reviews, stats.total_recommendations,
(SELECT GROUP_CONCAT(CAST(catid AS CHAR)) FROM fanfiction_stories_categories WHERE sid = st.sid) as categories,
(SELECT GROUP_CONCAT(CAST(genre_id AS CHAR)) FROM fanfiction_stories_genres WHERE sid = st.sid) as genres,
(SELECT GROUP_CONCAT(CAST(warning_id AS CHAR)) FROM fanfiction_stories_warnings WHERE sid = st.sid) as warnings
FROM
fanfiction_stories st
LEFT JOIN fanfiction_stories_stats stats ON st.sid = stats.sid
JOIN fanfiction_stories_categories cat ON st.sid = cat.sid AND cat.catid = 924
WHERE validated = 1
ORDER BY updated DESC
LIMIT 0, 15
That's the explain:
http://dl.dropbox.com/u/14508898/Printscreen/stackoverflow_explain_print_003.PNG
0 rows affected, 6 rows found. Duration for 1 query: 31,356 sec.
Updated
We removed some old indexes left over from the previous DB structure on fanfiction_stories and added new indexes to fanfiction_stories_categories; now it is much faster. That's the updated explain:
http://dl.dropbox.com/u/14508898/Printscreen/stackoverflow_explain_print_004.PNG
Sorry, the program that I use only formats the explain table as HTML, CSV, etc.; it doesn't produce an ASCII table to display here.
Can we optimize it even more? Any help is much appreciated.
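Hi there. The slowness might come from all the GROUP_CONCATs that you are doing; they are quite memory hungry.
Instead of a bare JOIN you might also prefer an explicit INNER JOIN (MySQL treats JOIN and INNER JOIN as equivalent, so this mainly makes the intent explicit), like: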
SELECT
st.sid, st.title, st.summary, st.storynotes, st.thumb, st.completed, st.wordcount, st.rid, st.date, st.updated,
stats.total_reviews, stats.total_recommendations,
(SELECT GROUP_CONCAT(CAST(catid AS CHAR)) FROM fanfiction_stories_categories WHERE sid = st.sid) as categories,
(SELECT GROUP_CONCAT(CAST(genre_id AS CHAR)) FROM fanfiction_stories_genres WHERE sid = st.sid) as genres,
(SELECT GROUP_CONCAT(CAST(warning_id AS CHAR)) FROM fanfiction_stories_warnings WHERE sid = st.sid) as warnings
FROM
fanfiction_stories st
LEFT JOIN fanfiction_stories_stats stats ON st.sid = stats.sid
INNER JOIN fanfiction_stories_categories cat ON st.sid = cat.sid AND cat.catid = 924
WHERE validated = 1
ORDER BY updated DESC
LIMIT 0, 15
This should work, although I don't have the table structures and sample data to simulate it. Removing each of the (SELECT ... ) AS column subqueries, using LEFT JOINs instead, and grouping the entire outer query by sid should give the same result. I think this is more efficient than running each subquery as a column. The GROUP_CONCAT is grouped based on the "sid" at the end anyway and should retain... The only thing that might be an issue is NULL values in these concatenated fields, which you can wrap with an IFNULL() test.
I would ensure that EACH of these tables has an index on the "sid" column used for the join. Additionally, your main stories table should have an index on Validated for its criterion (= 1).
Based on your feedback, I would move the category criterion and table to the top of the query. Get ONE CATEGORY first, then see which stories are associated with it. Then, for only those stories, hook up the rest: genres, warnings, comments, etc. You obviously have a smaller set of categories, so I would hit THAT as the primary table in the query. Let me know how this works.
SELECT STRAIGHT_JOIN
st.sid,
st.title,
st.summary,
st.storynotes,
st.thumb,
st.completed,
st.wordcount,
st.rid,
st.date,
st.updated,
stats.total_reviews,
stats.total_recommendations,
GROUP_CONCAT( DISTINCT cat.catid ) categories,
GROUP_CONCAT( DISTINCT genre.genre_id ) genres,
GROUP_CONCAT( DISTINCT warn.warning_id ) as warnings
FROM
fanfiction_stories_categories cat
JOIN fanfiction_stories st
ON cat.sid = st.sid
AND st.Validated = 1
LEFT JOIN fanfiction_stories_stats stats
ON st.sid = stats.sid
LEFT JOIN fanfiction_stories_genres genre
on st.sid = genre.sid
LEFT JOIN fanfiction_stories_warnings warn
on st.sid = warn.sid
WHERE
cat.catid = 924
group by
st.sid
ORDER BY
updated DESC
LIMIT
0, 15
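To illustrate why the query above needs DISTINCT inside GROUP_CONCAT, here is a small sketch using SQLite from Python. Joining two one-to-many tables multiplies the rows per story (2 genres x 2 warnings = 4 joined rows), and DISTINCT removes the resulting duplicates. The shortened table names and all sample rows are assumptions for illustration. Also note that filtering on cat.catid = 924 means a concatenation over cat.catid could only ever contain 924, unlike the original subquery, which lists every category of the story.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, shortened table names and sample rows.
conn.executescript("""
CREATE TABLE stories (sid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE stories_categories (sid INTEGER, catid INTEGER);
CREATE TABLE stories_genres (sid INTEGER, genre_id INTEGER);
CREATE TABLE stories_warnings (sid INTEGER, warning_id INTEGER);
INSERT INTO stories VALUES (1, 'First'), (2, 'Second');
INSERT INTO stories_categories VALUES (1, 924), (2, 7);
INSERT INTO stories_genres VALUES (1, 10), (1, 11);
INSERT INTO stories_warnings VALUES (1, 5), (1, 6);
""")

rows = conn.execute("""
SELECT st.sid,
       COUNT(*) AS joined_rows,
       GROUP_CONCAT(DISTINCT genre.genre_id) AS genres,
       GROUP_CONCAT(DISTINCT warn.warning_id) AS warnings
FROM stories_categories cat
JOIN stories st ON cat.sid = st.sid
LEFT JOIN stories_genres genre ON st.sid = genre.sid
LEFT JOIN stories_warnings warn ON st.sid = warn.sid
WHERE cat.catid = 924
GROUP BY st.sid
""").fetchall()

sid, joined_rows, genres, warnings = rows[0]
# Concatenation order is not guaranteed, so sort before comparing or printing.
genre_ids = sorted(genres.split(","))
warning_ids = sorted(warnings.split(","))
print(sid, joined_rows, genre_ids, warning_ids)
```

The row multiplication is also why this form can become expensive when a story has many genres and warnings, so it is worth comparing both variants with EXPLAIN on real data.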