As part of an SQL Queries assignment, I am required to meet the following criteria:
"Display all customers who have bought anything in the last 6 months. Show customer name, loyalty card number, date of order, and total value of order. Ensure this is named correctly in the query results as Total_Order_Value."
For this, I came up with a script which has been marked as wrong. I am confused by the feedback as I believe I have met the question criteria.
Find the script and feedback below:
Script
SELECT Aorder.*, Acustomer.*, AorderDetails.quantity, (AorderDetails.quantity *AmenuItem.itemCost) AS Total_Order_Value
FROM Aorder, Acustomer, AmenuItem, AorderDetails
WHERE orderDateTime < Now() AND orderDateTime > DATE_ADD(Now(), INTERVAL -6 MONTH)
AND Acustomer.customerID = Aorder.customerID
AND Aorder.orderID = AorderDetails.orderID
AND AorderDetails.itemID = AmenuItem.itemID
AND Aorder.paymentType IN ('Cash' , 'Card');
Feedback
"At the moment this will multiply cost *qty for each individual item bought in one order. You need the total value for each order. I.e. at the moment I would see a value for each item I bought in one order, I would like to see the total for the whole order. You need to add an aggregate and a group by"
I would appreciate any assistance in helping me understand what went wrong and how I may structure this correctly to meet the requirements.
Thank you in advance.
Essentially, you are reporting results at the order-item level, not at the customer-and-order level the question asks for. Because each order has many order items (a one-to-many relationship), your current result set repeats the customer and order details once for every item, which can make it long.
To resolve this, refactor your SQL statement into an aggregate query that groups on customer and order and sums each order item's value to get the total of the whole order. Additionally, heed these SQL best practices:
EXPLICIT JOIN: As mentioned in the comments, do not use the old-style join with commas in the FROM clause and matching conditions in WHERE, known as implicit joins. The current standard, introduced in ANSI-92, uses explicit joins with JOIN and ON clauses. While this does not change efficiency or output, it does aid readability and maintainability.
SELECT CLAUSE: Avoid selecting all fields with Aorder.*, Acustomer.*, which gives an open-ended result set. The question asks for specific fields (customer name, loyalty card number, date of order, and total value of order), so select exactly those.
TABLE ALIASES: For long table names, and for tables that share the same prefixes, stems, or suffixes like your A tables, use table aliases that abbreviate and distinguish your identifiers. Again, this practice does not change output but aids readability and maintainability.
See the working SQL statement below (adjust field names to your actual column names).
SELECT c.customer_name,
c.loyalty_card_number,
CAST(o.orderDateTime AS DATE) AS Order_Date,
SUM(d.quantity * m.itemCost) AS Total_Order_Value
FROM Aorder o
INNER JOIN Acustomer c ON c.customerID = o.customerID
INNER JOIN AorderDetails d ON o.orderID = d.orderID
INNER JOIN AmenuItem m ON d.itemID = m.itemID
WHERE o.orderDateTime < Now()
AND o.orderDateTime > DATE_ADD(Now(), INTERVAL -6 MONTH)
AND o.paymentType IN ('Cash' , 'Card')
GROUP BY o.orderID,
c.customer_name,
c.loyalty_card_number,
CAST(o.orderDateTime AS DATE);
NOTE: Do not make the mistake many MySQL users make of leaving non-aggregated columns out of the GROUP BY clause of an aggregate query, which ANSI SQL requires. For the same reason, SELECT * (or table.*) should never be used in aggregate queries. MySQL unfortunately allows both when its ONLY_FULL_GROUP_BY mode is turned off, and it can then return unreliable results.
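To see the difference the feedback describes, here is a minimal sketch that runs both shapes of query against a tiny in-memory SQLite database from Python. The table names follow the question, customer_name and loyalty_card_number are assumed column names (as in the answer), the sample rows are invented, and the six-month filter is left out because SQLite has no NOW()/DATE_ADD. The point is only per-item rows versus one summed row per order.

```python
import sqlite3

# Hypothetical miniature versions of the question's tables. customer_name and
# loyalty_card_number are assumed column names, as in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Acustomer (customerID INTEGER PRIMARY KEY, customer_name TEXT,
                        loyalty_card_number TEXT);
CREATE TABLE Aorder (orderID INTEGER PRIMARY KEY, customerID INTEGER,
                     orderDateTime TEXT);
CREATE TABLE AmenuItem (itemID INTEGER PRIMARY KEY, itemCost REAL);
CREATE TABLE AorderDetails (orderID INTEGER, itemID INTEGER, quantity INTEGER);
INSERT INTO Acustomer VALUES (1, 'Ann', 'LC001');
INSERT INTO Aorder VALUES (10, 1, '2024-03-01 12:00:00');
INSERT INTO AmenuItem VALUES (100, 2.50), (101, 4.00);
INSERT INTO AorderDetails VALUES (10, 100, 2), (10, 101, 1);
""")

# Original shape: quantity * cost per order line, so one row per item.
per_item = conn.execute("""
SELECT o.orderID, d.quantity * m.itemCost AS line_value
FROM Aorder o
JOIN AorderDetails d ON d.orderID = o.orderID
JOIN AmenuItem m ON m.itemID = d.itemID
""").fetchall()

# Aggregate shape: SUM over the lines, grouped so each order yields one row.
per_order = conn.execute("""
SELECT c.customer_name,
       c.loyalty_card_number,
       DATE(o.orderDateTime) AS Order_Date,
       SUM(d.quantity * m.itemCost) AS Total_Order_Value
FROM Aorder o
JOIN Acustomer c ON c.customerID = o.customerID
JOIN AorderDetails d ON d.orderID = o.orderID
JOIN AmenuItem m ON m.itemID = d.itemID
GROUP BY o.orderID, c.customer_name, c.loyalty_card_number, DATE(o.orderDateTime)
""").fetchall()

print(per_item)   # one row per item in order 10
print(per_order)  # a single row holding the order total
```

The sketch also groups by o.orderID, so two orders placed by the same customer on the same day would stay separate rather than being summed together.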
Related
I am working on a simple problem set, and I cannot seem to find the issue that is generating this same error: "Syntax Error in FROM Clause".
The question involves the use of various databases in this instance to find "Which employee has sold the most product?"
Here is my code
SELECT (Employees.FirstName + Employees.LastName) as Employee, SUM(Orders.Quantity)
FROM Employees, Orders
JOIN Employees ON Orders.EmployeeID=Employees.EmployeeID
JOIN OrderDetails ON Orders.OrderID=OrderDetails.OrderID
GROUP BY Employee
ORDER BY max(SUM(Quantity)) DESC;
If I am misinterpreting the use of some syntax, please let me know. I am still learning.
Thanks for your help!
When you're using ANSI JOIN syntax, you don't list all the tables in the FROM clause. Just list the first table; the other tables go in the JOIN clauses.
You also can't nest aggregate functions, as in MAX(SUM(Quantity)). If you want to find the employee who sold the most, order by the summed quantity and use TOP 1 to get the first row.
There's no need to join with OrderDetails, since you're not using anything from that table.
The query should be:
SELECT TOP 1 (Employees.FirstName + ' ' + Employees.LastName) AS Employee, SUM(Orders.Quantity) AS TotalQuantity
FROM Employees
INNER JOIN Orders ON Orders.EmployeeID = Employees.EmployeeID
GROUP BY Employees.EmployeeID, Employees.FirstName, Employees.LastName
ORDER BY SUM(Orders.Quantity) DESC;
Note that if there's a tie for the most sold, this will just show one of them. Getting all of them is more complex, because you need a second query to get that maximum. See sql HAVING max(count()) return zero rows
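The note above says ties need a second query to get the maximum. Here is a minimal sketch of that idea, run against an in-memory SQLite database from Python so it executes as-is. The table layout (including Quantity living on Orders, as the answer assumes) and the sample rows are made up, and SQLite concatenates strings with || rather than Access's +.

```python
import sqlite3

# Hypothetical data: two employees tie for the most units sold.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER, Quantity INTEGER);
INSERT INTO Employees VALUES (1, 'Nancy', 'Davolio'), (2, 'Andrew', 'Fuller'),
                             (3, 'Janet', 'Leverling');
INSERT INTO Orders VALUES (1, 1, 5), (2, 1, 5), (3, 2, 10), (4, 3, 3);
""")

# The inner query computes the highest per-employee total; HAVING keeps every
# employee whose own total equals it, so all tied employees are returned.
top_sellers = conn.execute("""
SELECT e.FirstName || ' ' || e.LastName AS Employee,
       SUM(o.Quantity) AS TotalQuantity
FROM Employees e
JOIN Orders o ON o.EmployeeID = e.EmployeeID
GROUP BY e.EmployeeID, e.FirstName, e.LastName
HAVING SUM(o.Quantity) = (
    SELECT MAX(t.total)
    FROM (SELECT SUM(Quantity) AS total FROM Orders GROUP BY EmployeeID) AS t
)
ORDER BY Employee
""").fetchall()

print(top_sellers)
```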
I've never been able to get my head around INNER JOINs (or any other JOIN types for that matter) so I'm struggling to work out how to use it in my specific situation. In fact, I'm not even sure if it's what I need. I've looked at other examples and read tutorials but my brain just doesn't seem to work the way needed to truly get it (or it doesn't function at all).
Here's the scenario:
I have two tables:
phone_numbers - this table has a list of phone numbers that belong to lots of different customers. A single customer can have multiple numbers. For simplicity's sake, we'll say the fields are 'number_id', 'customer_id', 'phone_number'.
call_history - this table has a record of every single call that one of these numbers in the first table could have had. There's a record for every individual call going back years. Again, for simplicity, we'll say the relevant fields are customer_id, phone_number, call_start_time.
What I'm trying to accomplish is to find all of the numbers that belong to a particular customer_id in the phone numbers table and use that information to search through the call_history table and find the number of calls each phone number has received, and group that by the number of calls for each number, preferably also showing zeros where a number hasn't received any calls at all.
The reason the zero calls is important is because that's the data I'm interested in. Otherwise, I could just get all the information out of the call_history table. But what I'm trying to achieve is find the numbers with no activity.
All I've been able to accomplish is run one query to get all of the numbers belonging to one customer:
SELECT customer_id, phone_number FROM phone_numbers WHERE customer_id = Y;
Then run a second query to get all phone calls for that customer_id for a set duration:
SELECT customer_id, phone_number, COUNT(*) FROM call_history WHERE customer_id = Y and call_start_time >= DATE_SUB(SYSDATE(), INTERVAL 30 DAY) GROUP BY phone_number;
I've then had to use the data returned from both queries and use a VLOOKUP function in Excel to match number of calls for each individual number from the second query to the list of all numbers from the first query, thus leaving blanks in my "all numbers" table and identifying those numbers that had no calls for that time period.
I'm hoping there's some way to do all of this with a single query and return a table of results, listing the zero number of calls with it and eliminate the whole manual Excel bit as it's not overly efficient and prone to human error.
Without at least a workable example from you, it's not easy to re-create your situation. Anyway, INNER JOIN might not return the result you expect. In my short time with MySQL, I have mainly used two types of JOIN: INNER JOIN, which you already mentioned, and LEFT JOIN. From what I understand of your question, what you want to achieve can be done by using LEFT JOIN instead of INNER JOIN. I may not be the best person to explain this, but this is how I understand it:
INNER JOIN - returns only the rows that match the ON clause in both (or all) of the joined tables.
LEFT JOIN - returns every row from the table on the left side of the join, with NULL in the right table's columns when the ON clause finds no match there... unless you add a WHERE condition on a column from the right table, which filters those NULL rows out again.
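The two definitions can be checked with a small, hypothetical data set. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (INNER and LEFT JOIN behave the same way here): three numbers belong to customer 7, but only one of them has a call row.

```python
import sqlite3

# Hypothetical rows: three numbers for customer 7, only '555-0001' has a call.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phone_numbers (number_id INTEGER PRIMARY KEY, customer_id INTEGER,
                            phone_number TEXT);
CREATE TABLE call_history (customer_id INTEGER, phone_number TEXT,
                           call_start_time TEXT);
INSERT INTO phone_numbers VALUES (1, 7, '555-0001'), (2, 7, '555-0002'),
                                 (3, 7, '555-0003');
INSERT INTO call_history VALUES (7, '555-0001', '2024-05-01 09:00:00');
""")

# INNER JOIN: only numbers with a matching call row survive.
inner = conn.execute("""
SELECT pn.phone_number, ch.call_start_time
FROM phone_numbers pn
INNER JOIN call_history ch
  ON ch.customer_id = pn.customer_id
 AND ch.phone_number = pn.phone_number
ORDER BY pn.phone_number
""").fetchall()

# LEFT JOIN: every number survives; unmatched ones get NULL (None in Python).
left = conn.execute("""
SELECT pn.phone_number, ch.call_start_time
FROM phone_numbers pn
LEFT JOIN call_history ch
  ON ch.customer_id = pn.customer_id
 AND ch.phone_number = pn.phone_number
ORDER BY pn.phone_number
""").fetchall()

print(inner)
print(left)
```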
Now, here is my query suggestion and hopefully it'll be useful for you:
SELECT A.customer_id, A.phone_number,
SUM(CASE WHEN B.call_start_time >= DATE_SUB(SYSDATE(), INTERVAL 30 DAY)
THEN 1 ELSE 0 END) AS Total
FROM phone_numbers A
LEFT JOIN call_history B
ON A.customer_id = B.customer_id
AND A.phone_number = B.phone_number
GROUP BY A.customer_id, A.phone_number;
What I did here is LEFT JOIN the phone_numbers table with call_history on both customer_id and phone_number (joining on customer_id alone would pair every number with every call the customer made), and move the call_start_time >= ... condition into a CASE expression in the SELECT, because putting it in the WHERE clause would filter out the NULL rows and effectively turn this into an inner join. Putting the condition in the ON clause instead would also work.
Here is an example fiddle : https://www.db-fiddle.com/f/hriFWqVy5RGbnsdj8i3aVG/1
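A possible variant, sketched with SQLite through Python, invented rows, and SQLite's datetime('now', '-30 days') in place of MySQL's DATE_SUB(SYSDATE(), INTERVAL 30 DAY): instead of the CASE expression, the date condition goes in the ON clause, and COUNT on a right-table column then returns 0 for numbers without recent calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phone_numbers (number_id INTEGER PRIMARY KEY, customer_id INTEGER,
                            phone_number TEXT);
CREATE TABLE call_history (customer_id INTEGER, phone_number TEXT,
                           call_start_time TEXT);
INSERT INTO phone_numbers VALUES (1, 7, '555-0001'), (2, 7, '555-0002'),
                                 (3, 8, '555-0009');
-- One recent call, and one call well outside the 30-day window.
INSERT INTO call_history VALUES
  (7, '555-0001', datetime('now', '-2 days')),
  (7, '555-0002', datetime('now', '-90 days'));
""")

# The date test sits in the ON clause, so a number with no recent call still
# appears with NULLs on the right side; COUNT(column) skips NULLs and gives 0.
rows = conn.execute("""
SELECT pn.phone_number, COUNT(ch.call_start_time) AS calls_last_30_days
FROM phone_numbers pn
LEFT JOIN call_history ch
  ON ch.customer_id = pn.customer_id
 AND ch.phone_number = pn.phone_number
 AND ch.call_start_time >= datetime('now', '-30 days')
WHERE pn.customer_id = ?
GROUP BY pn.phone_number
ORDER BY pn.phone_number
""", (7,)).fetchall()

print(rows)
```

The WHERE clause here only filters the left table (which customer), so it does not undo the LEFT JOIN.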
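For an inner join, you would write it like this:
SELECT pn.customer_id, pn.phone_number, COUNT(*) AS calls
FROM phone_numbers AS pn
INNER JOIN call_history AS ch ON pn.customer_id = ch.customer_id AND pn.phone_number = ch.phone_number
WHERE ch.call_start_time >= DATE_SUB(SYSDATE(), INTERVAL 30 DAY)
GROUP BY pn.customer_id, pn.phone_number;
Just add the name of each table you want to join, together with its join condition. Keep in mind that an inner join only returns numbers with at least one matching call, so numbers with no calls will not appear.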
I have created a query, but it takes a long time to get the data and I do not know why. Can you simplify or optimize the query so that it runs faster?
SELECT
Number,
SUM(Price) AS PRICE,
Pnr,
MAX(DATE_FORMAT(orderdate,'%Y-%m-%d %H:%i:%s')) AS order,
AL,
AM
FROM
order_name,
user,
card_user
WHERE
DATE_FORMAT(orderdate,'%m%Y')
AND
order_data.cid= card_user.card_number
AND
order_data.cid = card_user.card_number
AND
order BETWEEN card_user.valid_from
AND
card_user.valid_to
AND
card_user.user_id = user.user_id
AND
order_data.BTYPE IN ('1','4')
GROUP BY
card_number,
P_NR,
DATE_FORMAT(orderdate,'%Y%m%d'),
AV,
AK;
Thanks in advance for your answers!
First, there are many things wrong with the query. This is not to complain, just to explain, as you are obviously new. Here is your original query rewritten with more appropriate JOIN syntax. How you get from the first (LEFT) table to the second (RIGHT) table becomes the ON clause of the join.
Second, always try to use table.column or alias.column in your queries. Those who do not know your table structures would otherwise be guessing at the source. What is Number: an order number, a credit card number, or some other number? I ask because your GROUP BY clause has CARD_NUMBER. The same goes for Pnr, AL, and AM; there is no context on those.
Your WHERE clause only had the conditions between the tables (now moved into the respective JOIN clauses) but no actual limitations, so you are getting ALL records. DATE_FORMAT(orderdate,'%m%Y') on its own is not compared with anything, so it does not restrict the rows to any month. As for the condition that the order falls between the card's valid from/to dates, I would hope and expect all purchases with a credit card to fall within the card's validity period, so that does not filter anything out either.
If you mean to get records based on when they happened, you would want your order date to be within some range, such as orders since July 1, 2019, or from Jan 1 to June 30, 2019, or similar. Your query is getting ALL records whose BTYPE is either 1 or 4. We have no idea what that type indicates, but it is the only filtering criterion being applied.
Your GROUP BY should list all non-aggregate columns in the SELECT list.
SELECT
Number,
SUM(Price) AS PRICE,
Pnr,
MAX(DATE_FORMAT(orderdate,'%Y-%m-%d %H:%i:%s')) AS `order`,
AL,
AM
FROM
order_data OD
JOIN card_user CU
ON OD.cid = CU.card_number
AND OD.orderdate BETWEEN CU.valid_from
AND CU.valid_to
JOIN user u
ON CU.user_id = u.user_id
WHERE
DATE_FORMAT(orderdate,'%m%Y')
AND OD.BTYPE IN ('1','4')
GROUP BY
CU.card_number,
P_NR,
DATE_FORMAT(orderdate,'%Y%m%d'),
AV,
AK;
So, to answer with a question... I would suggest you edit your original post to fix the missing table.column or alias.column references, and also add a plain-English description of what you are looking for. For example: "I am looking for all orders within the X to Y time period, and for each person's credit card number, the most recent purchase." Hopefully, with such clarification, this question can move forward.
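As an illustration of the kind of query such a description could lead to, here is a sketch assuming the goal is "orders of BTYPE 1 or 4 between two dates, with the latest order and total price per card". It runs on SQLite through Python, and the tables are cut down to a few hypothetical columns because the meaning of Pnr, AL, AM, etc. is unknown. The date filter is a half-open range on the raw orderdate column rather than a DATE_FORMAT expression.

```python
import sqlite3

# Hypothetical, simplified tables; the real columns (Pnr, AL, AM, ...) are unknown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card_user (card_number TEXT, user_id INTEGER,
                        valid_from TEXT, valid_to TEXT);
CREATE TABLE order_data (cid TEXT, orderdate TEXT, price REAL, BTYPE TEXT);
INSERT INTO card_user VALUES ('C1', 1, '2019-01-01', '2019-12-31'),
                             ('C2', 2, '2019-01-01', '2019-12-31');
INSERT INTO order_data VALUES
  ('C1', '2019-07-03 10:00:00', 12.5, '1'),
  ('C1', '2019-07-20 18:30:00', 7.5, '4'),
  ('C1', '2019-05-01 08:00:00', 99.0, '1'),  -- outside the requested period
  ('C2', '2019-08-02 12:00:00', 20.0, '2');  -- excluded by BTYPE
""")

# One row per card: most recent order and summed price within [start, end).
rows = conn.execute("""
SELECT cu.card_number,
       MAX(od.orderdate) AS last_order,
       SUM(od.price)     AS total_price
FROM order_data od
JOIN card_user cu
  ON od.cid = cu.card_number
 AND od.orderdate BETWEEN cu.valid_from AND cu.valid_to
WHERE od.orderdate >= ? AND od.orderdate < ?
  AND od.BTYPE IN ('1', '4')
GROUP BY cu.card_number
ORDER BY cu.card_number
""", ("2019-07-01", "2019-10-01")).fetchall()

print(rows)
```

Every non-aggregated column in the SELECT list (here only card_number) appears in the GROUP BY, as the answer recommends.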
I'm doing what I would have expected to be a fairly straightforward query on a modified version of the imdb database:
select primary_name, release_year, max(rating)
from titles natural join primary_names natural join title_ratings
group by year
having title_category = 'film' and year > 1989;
However, I'm immediately running into
"column must appear in the GROUP BY clause or be used in an aggregate function."
I've tried researching this but have gotten confusing information; some examples I've found for this problem look structurally identical to mine, where others state that you must group every single selected parameter, which defeats the whole purpose of a group as I'm only wanting to select the maximum entry per year.
What am I doing wrong with this query?
Expected result: table with 3 columns which displays the highest-rated movie of each year.
If you want the maximum entry per year, then you should do something like this:
select r.*
from ratings r
where r.rating = (select max(r2.rating) from ratings r2 where r2.year = r.year) and
r.year > 1989;
In other words, group by is the wrong approach to writing this query.
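A runnable sketch of the correlated-subquery approach, assuming a single hypothetical ratings table that already holds title, year, category, and rating (in the question the data is spread over several tables). One detail worth noting: if you only want films, the category filter has to be repeated inside the subquery, otherwise a higher-rated TV series in the same year would set the maximum and no film would match.

```python
import sqlite3

# Hypothetical single table standing in for the joined IMDb tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (title TEXT, year INTEGER, title_category TEXT, rating REAL);
INSERT INTO ratings VALUES
  ('Film A', 1990, 'film', 7.9),
  ('Film B', 1990, 'film', 8.4),
  ('Show C', 1990, 'tvSeries', 9.1),
  ('Film D', 1991, 'film', 8.0),
  ('Film E', 1985, 'film', 9.5);
""")

# For each outer row, the subquery finds the best film rating of that year;
# only rows equal to it are kept (ties would all be returned).
best = conn.execute("""
SELECT r.title, r.year, r.rating
FROM ratings r
WHERE r.title_category = 'film'
  AND r.year > 1989
  AND r.rating = (SELECT MAX(r2.rating)
                  FROM ratings r2
                  WHERE r2.year = r.year
                    AND r2.title_category = 'film')
ORDER BY r.year
""").fetchall()

print(best)
```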
I would also strongly encourage you to forget that natural join exists at all. It is an abomination. It uses the names of common columns for joins. It does not even use properly declared foreign key relationships. In addition, you cannot see what columns are used for the join.
While I am at it, another piece of advice: qualify all column names in queries that have more than one table reference. That is, include the table alias in the column name.
If you want to display all the columns, you can use a window function like:
select primary_name, year, max(rating) over (partition by year) as rating
from titles
natural join primary_names
natural join ratings
where title_type = 'film' and year > 1989;
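The window-function query above attaches each year's highest rating to every film rather than picking out the top film. Here is a sketch of one way to keep only the top film(s), using a hypothetical single ratings table as stand-in data and SQLite via Python (window functions need SQLite 3.25 or newer, which recent Python builds generally bundle):

```python
import sqlite3

# Hypothetical single table standing in for the joined IMDb tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (title TEXT, year INTEGER, title_category TEXT, rating REAL);
INSERT INTO ratings VALUES
  ('Film A', 1990, 'film', 7.9),
  ('Film B', 1990, 'film', 8.4),
  ('Show C', 1990, 'tvSeries', 9.1),
  ('Film D', 1991, 'film', 8.0),
  ('Film E', 1985, 'film', 9.5);
""")

# The inner query's WHERE runs before the window function, so year_max is the
# best *film* rating of each year; the outer query keeps rows that equal it.
best = conn.execute("""
SELECT title, year, rating
FROM (
    SELECT title, year, rating,
           MAX(rating) OVER (PARTITION BY year) AS year_max
    FROM ratings
    WHERE title_category = 'film' AND year > 1989
) AS t
WHERE rating = year_max
ORDER BY year
""").fetchall()

print(best)
```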
I'm trying to find out if the code below is in the right format to retrieve the yearly sum of payments
select sum(payment)
select mem_type.mtype, member.name, payment.payment_amt
from mem_type, member, payment
where mem_type.mtype = member.mtype
and member.mem_id = payment.mem_id
group by mem_id
having payment.date > '2014-1-1' <'2014-12-31';
There are a few problems with the statement.
The keyword SELECT appears twice, and that's not valid the way you have it. (A SELECT keyword is needed in a subquery or an inline view, but otherwise it's not valid to repeat the keyword SELECT.)
The predicate in the HAVING clause isn't quite right. (MySQL may accept it as valid syntax, but it's not doing what you want: it is evaluated as (payment.date > '2014-1-1') < '2014-12-31', which is not a range check.) To return rows that have a payment.date in a specific year, we'd typically specify that as predicates in the WHERE clause:
WHERE payment.date >= '2014-01-01'
AND payment.date < '2015-01-01'
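A small, hypothetical demonstration of why the HAVING predicate is not a range check, and why a half-open range is preferable to BETWEEN ... AND '2014-12-31' when the column holds times of day. It uses SQLite through Python with made-up payments. SQLite's comparison rules differ from MySQL's (here an integer always sorts before text, so the chained predicate is true for every row), but in neither database does it express a date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (mem_id INTEGER, date TEXT, payment_amt REAL);
INSERT INTO payment VALUES
  (1, '2013-12-31 23:59:00', 10.0),
  (1, '2014-06-15 12:00:00', 20.0),
  (1, '2014-12-31 18:00:00', 30.0),
  (1, '2020-05-05 09:00:00', 40.0);
""")

def total(where_clause):
    """Sum payment_amt over the rows matching the given WHERE clause."""
    return conn.execute(
        "SELECT SUM(payment_amt) FROM payment WHERE " + where_clause
    ).fetchone()[0]

# Parsed as (date > '2014-1-1') < '2014-12-31': a 0/1 result compared with text.
chained = total("date > '2014-1-1' < '2014-12-31'")

# BETWEEN on the bare date misses the Dec 31 payment made at 18:00.
between_total = total("date BETWEEN '2014-01-01' AND '2014-12-31'")

# Half-open range: every moment of 2014 and nothing else.
half_open = total("date >= '2014-01-01' AND date < '2015-01-01'")

print(chained, between_total, half_open)
```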
Also, I'd recommend you ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead, and relocate the join predicates from the WHERE clause to an ON clause. For example:
SELECT ...
FROM member
JOIN mem_type
ON mem_type.mtype = member.mtype
JOIN payment
ON payment.mem_id = member.mem_id
It's good to see that you've qualified all the column references.
Unfortunately, it's not possible to recommend the syntax that will return the resultset you are looking for. There are too many unknowns, we'd just be guessing. An example of the result you are wanting returned, from what data, that would go a long ways towards a specification.
If I had to take a blind "guess" at a query that would meet the ambiguous specification, without any knowledge of the tables, columns, datatypes, et al. my guess would be something like this:
SELECT m.mem_id
, t.mtype
, m.name
, IFNULL(SUM(p.payment_amt),0) AS total_payments_2014
FROM member m
LEFT
JOIN mem_type t
ON t.mtype = m.mtype
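LEFT
JOIN payment p
ON p.mem_id = m.mem_id
-- the date predicates live in the ON clause so members with no 2014 payments are still returned (total 0)
AND p.date >= '2014-01-01'
AND p.date < '2014-01-01' + INTERVAL 1 YEAR
GROUP BY m.mem_id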
This only serves as an example. It is premised on a whole lot of information that isn't provided (e.g. What is the datatype of the date column in the payment table? Do we want to exclude payments with dates of 1/1 or 12/31? Is the mem_id column unique in the member table? Is the mtype column unique in the mem_type table? Can the mtype column in the member table be NULL? Do we want all rows from the member table returned, or only those that had a payment in 2014? Can the mem_id column in the payment table be NULL? Are there rows in payment that we want included but which aren't related to a member? And so on.)