I have a complex SQL query that I need help with. The query counts how many assignments a student has been assigned for the course (total) and how many of them they have completed (completed). I need one final column giving the percentage of completed assignments, because I want to select all users who have completed less than 50% of their assignments.
What am I doing wrong? I am getting the error "Unknown column 'completed' in 'field list'".
Is there a better way to do this? I am open to changing my query.
Query:
SELECT students.usid AS ID, students.firstName, students.lastName,
(
SELECT COUNT(workID) FROM assignGrades
INNER JOIN students ON students.usid = assignGrades.usid
INNER JOIN assignments ON assignments.assID = assignGrades.assID
WHERE
assignGrades.usid = ID AND
assignments.subID = 4 AND
(
assignGrades.submitted IS NOT NULL OR
(assignGrades.score IS NOT NULL AND CASE WHEN assignments.points > 0 THEN assignGrades.score ELSE 1 END > 0)
)
) AS completed,
(
SELECT COUNT(workID) FROM assignGrades
INNER JOIN students ON students.usid = assignGrades.usid
INNER JOIN assignments ON assignments.assID = assignGrades.assID
WHERE
assignGrades.usid = ID AND
assignments.subID = 4 AND
(NOW() - INTERVAL 5 HOUR) > assignments.assigned
) AS total,
(completed/total)*100 AS percentage
FROM students
INNER JOIN profiles ON profiles.usid = students.usid
INNER JOIN classes ON classes.ucid = profiles.ucid
WHERE classes.utid=2 AND percentage < 50
If I cut the (percentage) part in the SELECT statement, the query runs as expected. See below for results.
Information about the tables involved in this query:
assignGrades: Lists the student's score for each assignment.
assignments: List the assignments for each course.
students: Lists student information
classes: Lists class information
profiles: Links a student to a class
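MySQL does not let you reference a column alias such as completed from the same SELECT list or from the WHERE clause, which is why you get the error. If you only need to filter on the percentage (here, above 50%; flip the comparison to < 50 for the case in the question) and don't need to display it, you can use a HAVING clause instead, since HAVING can see the aliases: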
SELECT (now) AS completed, (totalassignments) AS total
FROM db
HAVING (completed/total)*100 > 50;
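Another way to filter on a computed column is to wrap the query in a derived table and filter in the outer query. Here is a minimal sketch of that idea. It runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, and the progress table and its rows are made up for illustration; only the column names completed and total come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table standing in for the two COUNT subqueries.
    CREATE TABLE progress (usid INTEGER, completed INTEGER, total INTEGER);
    INSERT INTO progress VALUES (1, 2, 10), (2, 5, 10), (3, 9, 10), (4, 0, 0);
""")

# The inner query names the computed column; the outer query can then
# filter on that name. 100.0 forces non-integer division, and NULLIF
# turns a zero total into NULL instead of dividing by zero.
below_half = conn.execute("""
    SELECT usid, percentage
    FROM (
        SELECT usid,
               100.0 * completed / NULLIF(total, 0) AS percentage
        FROM progress
    ) AS t
    WHERE percentage < 50
    ORDER BY usid
""").fetchall()

print(below_half)
```

A student with no assignments yet gets a NULL percentage and is left out of the result; whether that is the behaviour you want is a design choice.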
I am not able to further select from a joined subquery.
I have data in three tables: "events", "records" and "work_list". Each table has one piece of the puzzle where work_list is the shortest and contains top-level data, and the events table tracks many tiny frequent events.
I need to calculate many statistical variables from the events based on some key variables defined in work_list like weighted moving average etc. I have those metrics ready and working, but I have problems filtering the data in events based on selected parameters stored in work_list.
Here is code that does not work. The SELECT * is not important, I will change it to be more meaningful later, it is for clarity. However, I have tried many selections in place of the * without success.
What is wrong with this query from subquery?
Query example 1:
SELECT * FROM
(SELECT events.id, events.type,events.timestamp, work_list.task
FROM
( events
INNER JOIN records ON events.record_id = records.id
INNER JOIN work_list ON records.work_list_id = work_list.id
)
WHERE work_list.customer_number = '1234' AS subquery
);
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'as subquery ) LIMIT 0, 25' at line 8
The inner joined subquery works and it returns a normal table.
Query example 2:
SELECT events.id, events.type,events.timestamp, work_list.task
FROM (
events
INNER JOIN records ON events.record_id = records.id
INNER JOIN work_list ON records.work_list_id = work_list.id
)
WHERE work_list.customer_number = '1234';
I tried placing the parentheses in different orders, and I changed the selected columns in SELECT events.id, events.type,events.timestamp, work_list.task. I wonder if this is a poor way of doing this. I already have the calculation part, so even if there are better structures for this, I am interested in solutions that maintain this structure.
The goal of this phase is to filter the events table for further queries that are coded on top of it replacing the SELECT *.
These are the final calculations made earlier which I plan to use when I figure out the problem with Query example 1.
Query example 3:
SELECT *, ((SUM(rate * diff) OVER (ORDER BY startTime
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)) /
(SUM(diff) OVER(ORDER BY startTime
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW))) as rate_WMA
FROM (
SELECT id, startTime, counts, diff, (counts / diff)*3600 as rate
FROM (
SELECT id, TIMESTAMPDIFF(SECOND, MIN(timestamp), MAX(timestamp))AS diff, SUM(change) as counts, MIN(timestamp) as startTime
FROM `the filtered subquery here`
GROUP BY id
) AS subquery
WHERE diff > 0
) AS totaltotal;
You have extra parentheses (there is no need for them), and the alias for the subquery should be placed after the subquery's closing parenthesis:
SELECT *
FROM (
SELECT events.id, events.type,events.timestamp, work_list.task
FROM events
INNER JOIN records ON events.record_id = records.id
INNER JOIN work_list ON records.work_list_id = work_list.id
WHERE work_list.customer_number = '1234'
) AS subquery;
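To show that the corrected shape can be selected from further, here is a small sketch that aggregates on top of the aliased subquery. It uses SQLite through Python's sqlite3 module instead of MySQL, and the column types and sample rows are invented; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up schema and rows; only the names come from the question.
    CREATE TABLE work_list (id INTEGER PRIMARY KEY, customer_number TEXT, task TEXT);
    CREATE TABLE records (id INTEGER PRIMARY KEY, work_list_id INTEGER);
    CREATE TABLE events (id INTEGER PRIMARY KEY, record_id INTEGER,
                         type TEXT, timestamp TEXT);
    INSERT INTO work_list VALUES (1, '1234', 'assembly'), (2, '9999', 'packing');
    INSERT INTO records VALUES (10, 1), (20, 2);
    INSERT INTO events VALUES
        (100, 10, 'start', '2020-01-01 08:00:00'),
        (101, 10, 'stop',  '2020-01-01 08:05:00'),
        (102, 20, 'start', '2020-01-01 09:00:00');
""")

# The alias follows the closing parenthesis of the derived table,
# and the outer query refers to its columns through that alias.
rows = conn.execute("""
    SELECT subquery.type, COUNT(*) AS n
    FROM (
        SELECT events.id, events.type, events.timestamp, work_list.task
        FROM events
        INNER JOIN records   ON events.record_id = records.id
        INNER JOIN work_list ON records.work_list_id = work_list.id
        WHERE work_list.customer_number = '1234'
    ) AS subquery
    GROUP BY subquery.type
    ORDER BY subquery.type
""").fetchall()

print(rows)
```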
http://sqlfiddle.com/#!9/e6effb/1
I'm trying to get the top 10 brands by revenue for France in December.
There are 2 tables (the first table has the date, the second table has the brand, and I'm trying to join them).
I get this error "FUNCTION db_9_d870e5.SUM does not exist. Check the 'Function Name Parsing and Resolution' section in the Reference Manual"
Is my use of Inner join there correct?
It's because you had an extra space after SUM. Please change it from
SUM (o1.total_net_revenue) to SUM(o1.total_net_revenue).
The 'Function Name Parsing and Resolution' section of the MySQL Reference Manual, which the error message points to, explains why.
After correcting that, your query still had an error: you were not selecting order_id in your intermediate table i2. I edited it as follows:
SELECT o1.order_id, o1.country, i2.brand,
SUM(o1.total_net_revenue)
FROM orders o1
INNER JOIN (
SELECT i1.brand, SUM(i1.net_revenue) AS total_net_revenue,order_id
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY 1,2,3
ORDER BY 4
LIMIT 10
--EDIT stack Fan is correct that the o2.total_net_revenue exists. My confusion was because the data structure duplicated three columns between the tables, including one that was being looked for.
There were a couple of errors in your SQL statement:
1. You were referencing an invalid column in the SUM function of your outer SELECT. I believe you're actually after i2.total_net_revenue.
2. In your inner join, the derived table didn't select i1.order_id, which meant that your ON clause couldn't execute correctly.
The table structure is terrible: the "important" columns (country, revenue, order_id) are duplicated between the two tables. I would also expect the revenue columns to share the same name if they always hold the same values. In the example, there's no difference between i1.net_revenue and o1.total_net_revenue.
PROTIP:
When you run into an issue like this, take all the complicated bits out of your query and get the base query working correctly first. THEN add your functions.
PROTIP:
In your GROUP BY clause, reference the actual columns, NOT the column numbers. It makes your query more robust.
This is the query I ended up with:
SELECT o1.order_id, o1.country, i2.brand,
SUM(i2.total_net_revenue) AS total_rev
FROM orders o1
INNER JOIN (
SELECT i1.order_id, i1.brand, SUM(i1.net_revenue) AS total_net_revenue
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY o1.order_id, o1.country, i2.brand
ORDER BY total_rev
LIMIT 10
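For comparison, here is a stripped-down sketch of a "top brands by revenue" query that joins the two tables directly on order_id and groups by brand only. It is not the query from either answer. It assumes the date and country live on orders and the brand and per-item revenue on ordered_items, as described in the question; the sample rows are invented, and it runs on SQLite via Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema and data.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, country TEXT, created_at TEXT);
    CREATE TABLE ordered_items (order_id INTEGER, brand TEXT, net_revenue REAL);
    INSERT INTO orders VALUES
        (1, 'France',  '2016-12-03 10:00:00'),
        (2, 'France',  '2016-12-31 18:30:00'),
        (3, 'France',  '2016-11-30 09:00:00'),
        (4, 'Germany', '2016-12-10 12:00:00');
    INSERT INTO ordered_items VALUES
        (1, 'Acme', 10.0), (1, 'Bolt', 5.0),
        (2, 'Acme', 7.5),
        (3, 'Bolt', 100.0),
        (4, 'Acme', 50.0);
""")

# A half-open date range keeps orders placed late on 31 December, which
# BETWEEN '2016-12-01' AND '2016-12-31' would drop if created_at has a time part.
# ORDER BY ... DESC puts the highest revenue first.
top_brands = conn.execute("""
    SELECT i.brand, SUM(i.net_revenue) AS total_rev
    FROM orders o
    INNER JOIN ordered_items i ON i.order_id = o.order_id
    WHERE o.country = 'France'
      AND o.created_at >= '2016-12-01'
      AND o.created_at <  '2017-01-01'
    GROUP BY i.brand
    ORDER BY total_rev DESC
    LIMIT 10
""").fetchall()

print(top_brands)
```

Note the descending sort: a plain ORDER BY on the revenue column returns the lowest totals first, which is the opposite of a top 10.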
Take the following:
SELECT
Count(a.record_id) AS newrecruits
,a.studyrecord_id
FROM
visits AS a
INNER JOIN
(
SELECT
record_id
, MAX(modtime) AS latest
FROM
visits
GROUP BY
record_id
) AS b
ON (a.record_id = b.record_id) AND (a.modtime = b.latest)
WHERE (((a.visit_type_id)=1))
GROUP BY a.studyrecord_id;
I want to amend the COUNT part to display a zero if there are no records since I assume COUNT will evaluate to Null.
I have tried the following but still get no results:
IIF(ISNULL(COUNT(a.record_id)),0,COUNT(a.record_id)) AS newrecruits
Is this an issue because the join is on record_id? I tried changing the INNER to LEFT but also received no results.
Q: How do I get the above to evaluate to zero if there are no records matching the criteria?
Edit:
To give a little detail to the reasoning.
The studies table contains a field called 'original_recruits' based on activity before use of the database.
The visits tables tracks new_recruits (Count of records for each study).
I combine these in another query (original_recruits + new_recruits). If there have been no new recruits I still need to display the original_recruits, so if there are no records I need it to evaluate to zero instead of null so that the final sum still works.
It seems like you want to count records per study record.
If you need a count of zero when there are no matching records, you need to join to a table that lists every study, for example one named StudyRecords.
Do you have one? Otherwise it makes no sense to ask for rows when you don't have rows!
Supposing such a table exists, the query should look something like this:
SELECT
Count(a.record_id) AS newrecruits, -- a.record_id will be null if there is zero count for a studyrecord, else will contain the id
sr.Id
FROM
visits AS a
INNER JOIN
(
SELECT
record_id
, MAX(modtime) AS latest
FROM
visits
GROUP BY
record_id
) AS b
ON (a.record_id = b.record_id) AND (a.modtime = b.latest)
LEFT OUTER JOIN studyrecord sr
ON sr.Id = a.studyrecord_id
WHERE a.visit_type_id = 1
GROUP BY sr.Id
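One thing to watch with a query of this shape: a row for a study with no visits only appears if the studies table is the preserved (left) side of the outer join, and the visit_type_id filter must not sit in the outer WHERE, or it throws those rows away again. Below is a rough sketch of that arrangement. It runs on SQLite through Python's sqlite3 module rather than Access, the studyrecord table and its Id column follow the assumption made in this answer, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: study 1 has visits of type 1, study 2 only has
    -- a visit of another type, study 3 has no visits at all.
    CREATE TABLE studyrecord (Id INTEGER PRIMARY KEY);
    CREATE TABLE visits (record_id INTEGER, studyrecord_id INTEGER,
                         visit_type_id INTEGER, modtime TEXT);
    INSERT INTO studyrecord VALUES (1), (2), (3);
    INSERT INTO visits VALUES
        (100, 1, 1, '2020-01-01'),
        (100, 1, 1, '2020-01-05'),
        (101, 1, 1, '2020-01-02'),
        (102, 2, 2, '2020-01-03');
""")

# studyrecord is on the left of the LEFT JOIN, so every study yields a row.
# The type filter lives inside the derived table, and COUNT over the
# nullable column returns 0 (not NULL) when nothing matched.
counts = conn.execute("""
    SELECT sr.Id, COUNT(v.record_id) AS newrecruits
    FROM studyrecord AS sr
    LEFT JOIN (
        SELECT a.record_id, a.studyrecord_id
        FROM visits AS a
        INNER JOIN (
            SELECT record_id, MAX(modtime) AS latest
            FROM visits
            GROUP BY record_id
        ) AS b ON a.record_id = b.record_id AND a.modtime = b.latest
        WHERE a.visit_type_id = 1
    ) AS v ON v.studyrecord_id = sr.Id
    GROUP BY sr.Id
    ORDER BY sr.Id
""").fetchall()

print(counts)
```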
I solved the problem by amending the final query where I display the result of combining the original and new recruits to include the IIF there.
SELECT
a.*
, IIF(IsNull([totalrecruits]),consents,totalrecruits)/a.target AS prog
, IIf(IsNull([totalrecruits]),consents,totalrecruits) AS trecruits
FROM
q_latest_studies AS a
LEFT JOIN q_totalrecruitment AS b
ON a.studyrecord_id=b.studyrecord_id
;
In the following query, I show the latest status of the sale (by stage, in this case the number 3). The query is based on a subquery in the status history of the sale:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 sec and returns 1011 records.
Because I have to filter the sales by the name of the state, which would mean repeating the subquery in a WHERE clause, I decided to rewrite the query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query returns the same results (1011 records), but the problem is that it takes 0.0753 sec.
Reviewing the possibilities, I found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time. Why does it work better? Is there any way to use this clause with the joins? I would appreciate your help.
EDIT
I will show the results of EXPLAIN for each query respectively:
q1:
q2:
Interesting, so that little statement basically determines if there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because WHERE conditions are applied before the subselects in the SELECT list are evaluated, so if there is no record to go with the sale, the server doesn't bother processing the subselect. That saves you some time, which is why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against but you can always just apply it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get a plan of execution which will help you optimize things. Probably the best way to get better results in your join syntax is to add some indexes to your tables.
But I ask you, is this even necessary? You have a query returning in under 8 hundredths of a second. Unless this query is being run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.
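As for applying the clause to the join version, syntactically nothing stops you from adding the same EXISTS condition to its WHERE clause. The sketch below shows only the shape, on a heavily simplified, made-up version of t_sale and t_record, run on SQLite through Python's sqlite3 module. It says nothing about how MySQL will plan or time your real query, so check that with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables in the question.
    CREATE TABLE t_sale (id_sale INTEGER PRIMARY KEY, flag INTEGER, id_quarters INTEGER);
    CREATE TABLE t_record (id_record INTEGER PRIMARY KEY, id_sale INTEGER);
    INSERT INTO t_sale VALUES (1, 1, 4), (2, 1, 4), (3, 1, 4);
    INSERT INTO t_record VALUES (10, 1), (11, 1), (12, 3);
""")

# The LEFT JOIN brings in the latest record per sale; the EXISTS condition
# in WHERE additionally discards sales that have no record at all.
rows = conn.execute("""
    SELECT v.id_sale, x.max_record
    FROM t_sale v
    LEFT JOIN (
        SELECT id_sale, MAX(id_record) AS max_record
        FROM t_record
        GROUP BY id_sale
    ) x ON x.id_sale = v.id_sale
    WHERE v.flag = 1
      AND v.id_quarters = 4
      AND EXISTS (SELECT 1 FROM t_record r WHERE r.id_sale = v.id_sale)
    ORDER BY v.id_sale
""").fetchall()

print(rows)
```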
Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from get all results into get top 3 per group. At first this was going well and I used several recommendations and answers from this site to achieve this (Most Viewed Products). However, I'm running into difficulty with my last one "Best Selling Products" because of multiple joins.
Basically, I need to get all products ordered by number of sales per product (highest first), with a maximum of 3 products per vendor. Multiple tables are joined to create the original query, and each time I attempt to use variables to generate rankings it produces invalid results. The following should help better understand the issue (I've removed unnecessary fields for brevity):
Product Table
productid | vendorid | approved | active | deleted
Vendor Table
vendorid | approved | active | deleted
Order Table
orderid | `status` | deleted
Order Items Table
orderitemid | orderid | productid | price
Now, my original query to get all results is as follows:
SELECT COUNT(oi.price) AS `NumSales`,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;
Finally (and here's where I'm stumped), I'm trying to alter the above statement so that I receive only the top 3 products (by number sold) per vendor. I'd add what I have so far, but I'm embarrassed to do so and this question is already a wall of text. I've tried variables but keep getting invalid results. Any help would be greatly appreciated.
Even though you specify LIMIT 100, this type of query requires a full scan: the whole result set has to be built up and every row inspected and numbered before the final filter down to the 100 rows you want to display.
select
vendorid, productid, NumSales
from
(
select
vendorid, productid, NumSales,
@r := IF(@g=vendorid,@r+1,1) RowNum,
@g := vendorid
from (select @g:=null) initvars
CROSS JOIN
(
SELECT COUNT(oi.price) AS NumSales,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY p.vendorid, p.productid
ORDER BY p.vendorid, NumSales DESC
) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;
The approach here is:
1. Group by to get NumSales
2. Use variables to row number the sales per vendor/product
3. Filter the numbered dataset to allow for a max of 3 per vendor
4. Order the remaining by NumSales DESC and return only 100
I like this elegant solution; however, when I run an adapted but similar query on my dev machine, I get a non-deterministic result set. I believe this is due to the way the MySQL optimiser deals with assigning and reading user variables within the same statement.
From the docs:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.
Just adding this note here in case someone else comes across this weird behaviour.
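If your server supports window functions (MySQL 8.0 and later do), ROW_NUMBER() gives the same per-vendor numbering without relying on user-variable evaluation order. This is a different technique from the variable-based answer, sketched here on SQLite through Python's sqlite3 module. The sales table is a made-up stand-in for the grouped NumSales query, and the productid tie-breaker in the ORDER BY is my own addition to keep the numbering deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the result of the grouped NumSales query.
    CREATE TABLE sales (vendorid INTEGER, productid INTEGER, NumSales INTEGER);
    INSERT INTO sales VALUES
        (1, 11, 50), (1, 12, 40), (1, 13, 30), (1, 14, 20),
        (2, 21, 45), (2, 22, 5);
""")

# ROW_NUMBER restarts at 1 for each vendor and counts down the sales
# ranking; the outer query keeps at most three rows per vendor.
top3 = conn.execute("""
    SELECT vendorid, productid, NumSales
    FROM (
        SELECT vendorid, productid, NumSales,
               ROW_NUMBER() OVER (
                   PARTITION BY vendorid
                   ORDER BY NumSales DESC, productid
               ) AS RowNum
        FROM sales
    ) AS ranked
    WHERE RowNum <= 3
    ORDER BY NumSales DESC
    LIMIT 100
""").fetchall()

print(top3)
```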
The answer given by @RichardTheKiwi worked great and got me 99% of the way there! I am using MySQL and was only getting the first row of each group marked with a row number, while the rest of the rows remained NULL. This resulted in the query returning only the top hit for each group rather than the first three rows. To fix this, I had to initialize @r in the initvars subquery. I changed
from (select @g:=null) initvars
to
from (select @g:=null, @r:=null) initvars
You could also initialize @r to 0 and it would work the same. For those less familiar with this type of syntax: the additional section reads through each sorted group, and if a row has the same vendorid as the previous row, which is tracked with the @g variable, it increments the row number, which is stored in the variable @r. When this process reaches the next group with a new vendorid, the IF expression no longer evaluates as true and the @r variable (and thereby the RowNum) is reset to 1.
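If the variable logic is still hard to picture, here is the same bookkeeping written out as a plain Python loop. It is only an illustration of the intended row-by-row behaviour, not of how MySQL actually evaluates the statement; the function name number_rows and the sample rows are made up, and the input is assumed to be already sorted by vendor and then by sales descending.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, int, int]  # (vendorid, productid, NumSales)


def number_rows(sorted_rows: Iterable[Row]) -> List[Tuple[Row, int]]:
    """Number rows within each vendor, mimicking the two tracking variables."""
    g: Optional[int] = None  # vendorid of the previous row
    r = 0                    # row number within the current vendor
    numbered = []
    for row in sorted_rows:
        vendorid = row[0]
        # Same vendor as the previous row: increment; new vendor: restart at 1.
        r = r + 1 if g == vendorid else 1
        g = vendorid
        numbered.append((row, r))
    return numbered


rows = [(1, 11, 50), (1, 12, 40), (1, 13, 30), (1, 14, 20), (2, 21, 45), (2, 22, 5)]
top3 = [row for row, n in number_rows(rows) if n <= 3]
print(top3)
```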