correct way to write a Sum query - mysql

I'm trying to find out if the code below is in the right format to retrieve the yearly sum of payments
select sum(payment)
select mem_type.mtype, member.name, payment.payment_amt
from mem_type, member, payment
where mem_type.mtype = member.mtype
and member.mem_id = payment.mem_id
group by mem_id
having payment.date > '2014-1-1' <'2014-12-31';

There are a few problems with the statement.
The keyword SELECT appears twice, which isn't valid the way you have it. (A SELECT keyword is needed in a subquery or an inline view, but otherwise it's not valid to repeat the keyword SELECT.)
The predicate in the HAVING clause isn't quite right. (MySQL may accept that as valid syntax, but it's not doing what you want it to do.) To return rows that have a payment.date in a specific year, we'd typically specify that as predicates in the WHERE clause:
WHERE payment.date >= '2014-01-01'
AND payment.date < '2015-01-01'
Also, I'd recommend you ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead, and relocate the join predicates from the WHERE clause to an ON clause. For example:
SELECT ...
FROM member
JOIN mem_type
ON mem_type.mtype = member.mtype
JOIN payment
ON payment.mem_id = member.mem_id
It's good to see that you've qualified all the column references.
Unfortunately, it's not possible to recommend the syntax that will return the resultset you are looking for. There are too many unknowns; we'd just be guessing. An example of the result you want returned, and of the data it should come from, would go a long way towards a specification.
If I had to take a blind "guess" at a query that would meet the ambiguous specification, without any knowledge of the tables, columns, datatypes, et al. my guess would be something like this:
SELECT m.mem_id
, t.mtype
, m.name
, IFNULL(SUM(p.payment_amt),0) AS total_payments_2014
FROM member m
LEFT
JOIN mem_type t
ON t.mtype = m.mtype
LEFT
JOIN payment p
ON p.mem_id = m.mem_id
AND p.date >= '2014-01-01' -- kept in the ON clause so the LEFT JOIN still returns members with no 2014 payments
AND p.date < '2014-01-01' + INTERVAL 1 YEAR
GROUP BY m.mem_id
This only serves as an example. It is premised on a whole lot of information that isn't provided. For example: What is the datatype of the date column in the payment table? Do we want to exclude payments with dates of 1/1 or 12/31? Is the mem_id column unique in the member table? Is the mtype column unique in the mem_type table? Can the mtype column in the member table be NULL? Do we want all rows from the member table returned, or only those that had a payment in 2014? Can the mem_id column in the payment table be NULL? Are there rows in payment that we want included but which aren't related to a member?
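A minimal runnable sketch of that guessed query, using Python's built-in sqlite3 module as a stand-in for MySQL. The schema and rows are invented for illustration (the question doesn't show them), and the upper bound is written as a literal because SQLite has no INTERVAL arithmetic. It assumes every member should be returned, so the date range sits in the ON clause and members without 2014 payments get a total of 0.

```python
import sqlite3

# Hypothetical schema and rows, guessed from the column names in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mem_type (mtype TEXT PRIMARY KEY);
    CREATE TABLE member   (mem_id INTEGER PRIMARY KEY, name TEXT, mtype TEXT);
    CREATE TABLE payment  (mem_id INTEGER, payment_amt REAL, date TEXT);
    INSERT INTO mem_type VALUES ('gold'), ('basic');
    INSERT INTO member VALUES (1, 'Ann', 'gold'), (2, 'Bob', 'basic'), (3, 'Cy', 'basic');
    INSERT INTO payment VALUES
        (1, 10.0, '2014-01-01'),   -- first day of the year: included
        (1, 15.0, '2014-12-31'),   -- last day of the year: included
        (1, 99.0, '2015-01-01'),   -- next year: excluded
        (2, 20.0, '2013-06-30');   -- previous year: excluded
""")

# Half-open range [2014-01-01, 2015-01-01) placed in the ON clause,
# so the outer join keeps members that have no matching payment.
yearly_totals = conn.execute("""
    SELECT m.mem_id, t.mtype, m.name,
           IFNULL(SUM(p.payment_amt), 0) AS total_payments_2014
      FROM member m
      LEFT JOIN mem_type t ON t.mtype = m.mtype
      LEFT JOIN payment p
             ON p.mem_id = m.mem_id
            AND p.date >= '2014-01-01'
            AND p.date <  '2015-01-01'
     GROUP BY m.mem_id, t.mtype, m.name
     ORDER BY m.mem_id
""").fetchall()

for row in yearly_totals:
    print(row)
```

With this sample data, Ann's total covers only her two 2014 payments, while Bob and Cy each appear with a total of 0.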

Related

SELECT from many different tables and JOIN

I need to combine a few different columns from different tables.
These are listed here. I just can't seem to get the syntax right. I'm a beginner so be patient with me!
The tables are 'report', 'mission' and 'hist_unit'
and the following values are the same
mission.id = report.mission_id
hist_unit.id = report.deployed_unit_id
Tried something along these lines
SELECT
mission_id AS mission_id,
deployed_unit_id AS depl_unit_id,
accepted AS accepted,
character_id AS character_id,
pilot_status AS pilot_status
FROM report
id AS depl_unit_id
faction AS faction
FROM hist_unit
mission.id AS mission_id
hist_date AS hist_date
FROM mission
What I want this query to do is putting together the columns shown above and checking that the values shown at the top correspond to each other.
Then I want it to show me only the lines where faction = 3 and accepted = 1.
Then I want it to show me only the entries
WHERE hist_date BETWEEN '1941-11-15 00:00:00.000' AND '1942-04-15 23:59:59:999'
Output should be something like this
mission_id,depl_unit_id,faction,character_id,pilot_status,accepted,hist_date
Something like this:
SELECT r.id AS report_id
, r.mission_id AS mission_id
, r.deployed_unit_id AS depl_unit_id
, r.accepted AS accepted
, r.character_id AS character_id
, r.pilot_status AS pilot_status
, h.id AS h_depl_unit_id
, h.faction AS faction
, m.id AS m_mission_id
FROM report r
JOIN hist_unit h
ON h.id = r.deployed_unit_id
JOIN mission m
ON m.id = r.mission_id
WHERE m.hist_date >= '1941-11-15'
AND m.hist_date < '1942-04-15' + INTERVAL 1 DAY
AND h.faction = 3
AND r.accepted = 1
ORDER
BY r.id
, r.mission_id
After the SELECT keyword, list all of the expressions to be returned.
The FROM clause references the tables. Multiple tables should be separated by the JOIN keyword, with each join followed by an ON clause. (The INNER keyword is not required. The CROSS keyword is not required either, but can be included when there is no join condition, as an aid to the future reader. Outer joins require the addition of the LEFT or RIGHT keyword.)
Consider assigning an alias to each table reference.
Qualify all column references with the assigned table alias (or the table name, if an alias is not assigned.)
When the AS column_name is omitted for a column in the SELECT list, the column name is used. For example, we could omit AS accepted and that fourth column would still have the name accepted.
Avoid returning multiple columns with the same name. It's not illegal to do that, but consider modifying the column aliases (column names in the resultset) to be unique. For example, one of the depl_unit_id columns can be renamed.
When comparing datetime values in ranges, strongly consider using >= and < comparisons. Don't try to muck with "less than or equal to 23:59:59.999". If the precision is microseconds, that leaves a small gap. Let's avoid the gap, and just do a "less than" comparison against midnight of the next day, the first datetime value we want to exclude. (Yes, we have to specify the column name or expression twice, once for each comparison, because there is no BETWEEN_GE_LT comparison operator that substitutes a "less than" comparison for the "less than or equal to" comparison of the BETWEEN operator. That's a small price to pay for a more explicitly accurate representation of what we are attempting to achieve.)
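To make the gap concrete, here is a small sketch using Python's built-in sqlite3 module. The mission rows are invented, and SQLite stores these values as text and compares them as strings, so this only approximates how a MySQL DATETIME column with fractional seconds would behave. The point it illustrates is the same: a value later than 23:59:59.999 on the last day slips past BETWEEN but is caught by the half-open range.

```python
import sqlite3

# Invented rows; hist_date is stored as ISO-8601 text, which sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mission (id INTEGER PRIMARY KEY, hist_date TEXT);
    INSERT INTO mission VALUES
        (1, '1941-11-15 00:00:00.000'),
        (2, '1942-04-15 23:59:59.9995'),  -- on the last day, but after .999
        (3, '1942-04-16 00:00:00.000');   -- first instant we want to exclude
""")

# Closed range with a hand-written "end of day" upper bound.
between_ids = [r[0] for r in conn.execute("""
    SELECT id FROM mission
     WHERE hist_date BETWEEN '1941-11-15 00:00:00.000'
                         AND '1942-04-15 23:59:59.999'
     ORDER BY id
""")]

# Half-open range: >= first day, < midnight of the day after the last day.
half_open_ids = [r[0] for r in conn.execute("""
    SELECT id FROM mission
     WHERE hist_date >= '1941-11-15'
       AND hist_date <  '1942-04-16'
     ORDER BY id
""")]

print("BETWEEN:  ", between_ids)
print("half-open:", half_open_ids)
```

Row 2 is missing from the BETWEEN result but present in the half-open result, and row 3 is excluded by both.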

SQL Query: Showing multiple data and total order values

As part of an SQL Queries assignment, I am required to meet the following criteria:
"Display all customers who have bought anything in the last 6 months. Show customer name, loyalty card number, date of order, and total value of order. Ensure this is named correctly in the query results as Total_Order_Value."
For this, I came up with a script which has been marked as wrong. I am confused by the feedback as I believe I have met the question criteria.
Find the script and feedback below:
Script
SELECT Aorder.*, Acustomer.*, AorderDetails.quantity, (AorderDetails.quantity *AmenuItem.itemCost) AS Total_Order_Value
FROM Aorder, Acustomer, AmenuItem, AorderDetails
WHERE orderDateTime < Now() AND orderDateTime > DATE_ADD(Now(), INTERVAL -6 MONTH)
AND Acustomer.customerID = Aorder.customerID
AND Aorder.orderID = AorderDetails.orderID
AND AorderDetails.itemID = AmenuItem.itemID
AND Aorder.paymentType IN ('Cash' , 'Card');
Feedback
"At the moment this will multiply cost *qty for each individual item bought in one order. You need the total value for each order. I.e. at the moment I would see a value for each item I bought in one order, I would like to see the total for the whole order. You need to add an aggregate and a group by"
I would appreciate any assistance in helping me understand what went wrong and how I may structure this correctly to meet the requirements.
Thank you in advance.
Essentially, you are reporting results at the order-item level, not at the customer-and-order level that the original question asked for. Your current resultset likely repeats the customer and order details for each corresponding item, which can get lengthy given the one-to-many relationships.
To resolve this, refactor your SQL statement into an aggregate query that groups on customer and order and sums the value of each order's items to get the total amount of the whole order. Additionally, heed these best practices in SQL:
EXPLICIT JOIN: As mentioned in comments do not use the old-join style of commas in FROM clause with matching conditions in WHERE. This is known as implicit joins. The current standard introduced in ANSI-92 emphasizes explicit joins using JOIN and ON clauses. While this does not change efficiency or output, it does aid in readability and maintainability.
SELECT CLAUSE: Try avoiding selecting all fields in tables with Aorder.*, Acustomer.* which is an open-ended resultset output. Your question specifically asked for certain fields: customer name, loyalty card number, date of order, and total value of order. So, select them accordingly.
TABLE ALIASES: For longer table names and tables that share the same prefixes, stems, or suffixes like your A tables, use table aliases that properly abbreviate and identify each table. Again, this practice should not change output but aids readability and maintainability.
See below working SQL statement (adjust field names to actuals).
SELECT c.customer_name,
c.loyalty_card_number,
CAST(o.orderDateTime AS DATE) AS Order_Date,
SUM(d.quantity * m.itemCost) AS Total_Order_Value
FROM Aorder o
INNER JOIN Acustomer c ON c.customerID = o.customerID
INNER JOIN AorderDetails d ON o.orderID = d.orderID
INNER JOIN AmenuItem m ON d.itemID = m.itemID
WHERE o.orderDateTime < Now()
AND o.orderDateTime > DATE_ADD(Now(), INTERVAL -6 MONTH)
AND o.paymentType IN ('Cash' , 'Card')
GROUP BY o.orderID, -- keeps two orders placed by the same customer on the same day separate
c.customer_name,
c.loyalty_card_number,
CAST(o.orderDateTime AS DATE)
NOTE: Do not make the mistake many MySQL users make of leaving non-aggregated columns out of the GROUP BY clause of an aggregate query; ANSI SQL requires them there. An asterisk select list (such as Aorder.*) should never be used in an aggregate query. Unfortunately, MySQL allows both when its ONLY_FULL_GROUP_BY mode is turned off, and can then return unreliable results.
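The difference between item-level and order-level results can be seen in a small sketch with Python's built-in sqlite3 module standing in for MySQL. The rows, and the customer_name and loyalty_card_number columns, are made up. The six-month filter is left out so the result doesn't depend on today's date, and the order ID is included in the GROUP BY as an assumption, so that two orders on the same day are totalled separately.

```python
import sqlite3

# Made-up data: one customer, two orders on the same day, three order lines.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Acustomer (customerID INTEGER PRIMARY KEY,
                            customer_name TEXT, loyalty_card_number TEXT);
    CREATE TABLE Aorder (orderID INTEGER PRIMARY KEY,
                         customerID INTEGER, orderDateTime TEXT);
    CREATE TABLE AmenuItem (itemID INTEGER PRIMARY KEY, itemCost REAL);
    CREATE TABLE AorderDetails (orderID INTEGER, itemID INTEGER, quantity INTEGER);
    INSERT INTO Acustomer VALUES (1, 'Dana', 'LC-001');
    INSERT INTO Aorder VALUES (100, 1, '2024-05-01 12:00:00'),
                              (101, 1, '2024-05-01 19:30:00');
    INSERT INTO AmenuItem VALUES (1, 2.50), (2, 4.00);
    INSERT INTO AorderDetails VALUES (100, 1, 2), (100, 2, 1), (101, 2, 3);
""")

# Without aggregation: one row per order line (what the feedback objected to).
item_rows = conn.execute("""
    SELECT o.orderID, d.quantity * m.itemCost AS line_value
      FROM Aorder o
      JOIN AorderDetails d ON d.orderID = o.orderID
      JOIN AmenuItem m     ON m.itemID  = d.itemID
     ORDER BY o.orderID, m.itemID
""").fetchall()

# With SUM and GROUP BY: one row per order.
order_rows = conn.execute("""
    SELECT c.customer_name, c.loyalty_card_number,
           DATE(o.orderDateTime) AS Order_Date,
           SUM(d.quantity * m.itemCost) AS Total_Order_Value
      FROM Aorder o
      JOIN Acustomer c     ON c.customerID = o.customerID
      JOIN AorderDetails d ON d.orderID    = o.orderID
      JOIN AmenuItem m     ON m.itemID     = d.itemID
     GROUP BY o.orderID, c.customer_name, c.loyalty_card_number,
              DATE(o.orderDateTime)
     ORDER BY o.orderID
""").fetchall()

print(item_rows)
print(order_rows)
```

The first query returns three rows, one per item line; the second returns two rows, one per order, each carrying that order's total.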

SQL: Column Must Appear in the GROUP BY Clause Or Be Used in an Aggregate Function

I'm doing what I would have expected to be a fairly straightforward query on a modified version of the imdb database:
select primary_name, release_year, max(rating)
from titles natural join primary_names natural join title_ratings
group by year
having title_category = 'film' and year > 1989;
However, I'm immediately running into
"column must appear in the GROUP BY clause or be used in an aggregate function."
I've tried researching this but have gotten confusing information; some examples I've found for this problem look structurally identical to mine, where others state that you must group every single selected parameter, which defeats the whole purpose of a group as I'm only wanting to select the maximum entry per year.
What am I doing wrong with this query?
Expected result: table with 3 columns which displays the highest-rated movie of each year.
If you want the maximum entry per year, then you should do something like this:
select r.*
from ratings r
where r.rating = (select max(r2.rating) from ratings r2 where r2.year = r.year) and
r.year > 1989;
In other words, group by is the wrong approach to writing this query.
I would also strongly encourage you to forget that natural join exists at all. It is an abomination. It uses the names of common columns for joins. It does not even use properly declared foreign key relationships. In addition, you cannot see what columns are used for the join.
While I am at it, another piece of advice: qualify all column names in queries that have more than one table reference. That is, include the table alias in the column name.
If you want to display all the columns, you can use a window function, like this:
select primary_name, year, max(rating) over (partition by year) as rating
from titles
natural join primary_names
natural join ratings
where title_type = 'film' and year > 1989;
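A runnable sketch of the correlated-subquery approach, using Python's built-in sqlite3 module. The single ratings table and its rows are hypothetical; the real imdb-derived schema in the question spreads these columns across several tables.

```python
import sqlite3

# Hypothetical flattened table: one row per film with its year and rating.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (primary_name TEXT, year INTEGER, rating REAL);
    INSERT INTO ratings VALUES
        ('Film A', 1989, 9.9),   -- filtered out by year > 1989
        ('Film B', 1990, 7.1),
        ('Film C', 1990, 8.4),   -- best of 1990
        ('Film D', 1991, 6.0);   -- only film of 1991
""")

# Keep a row only if its rating equals the maximum rating of its own year.
best_per_year = conn.execute("""
    SELECT r.primary_name, r.year, r.rating
      FROM ratings r
     WHERE r.year > 1989
       AND r.rating = (SELECT MAX(r2.rating)
                         FROM ratings r2
                        WHERE r2.year = r.year)
     ORDER BY r.year
""").fetchall()

for row in best_per_year:
    print(row)
```

No GROUP BY is involved: each row is compared against the maximum for its year, so the film's name comes along with the rating. If two films tie for the top rating in a year, both are returned.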

Need a simple solution to slow query

I have following query..
SELECT avg(h.price)
FROM `car_history_optimized` h
LEFT JOIN vin_data vd ON (concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
WHERE h.date >='2015-01-01'
AND h.date <='2015-04-01'
AND h.dealer_id <> 2389
AND vd.prefix IN
(SELECT concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))
FROM `car_history_optimized` h
LEFT JOIN vin_data vd ON (concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
WHERE h.date >='2015-03-01'
AND h.date <='2015-04-01'
AND h.dealer_id =2389)
It finds the average market value of cars sold within the last 3 months by every dealer other than 2389, but only for cars of the same make and model as those sold by 2389.
Can the above query be optimized? It's taking 2 minutes to run for 11 million records.
Thanks
How often will you use that particular "prefix"? If often, then I will direct you toward indexing a 'virtual' column.
Otherwise, you need
INDEX(date) -- for the outer query
INDEX(dealer_id, date) -- for what is now the subquery
Then do the EXISTS as suggested, or use a LEFT JOIN ... WHERE ... IS NULL.
Is date a DATE? or a DATETIME? You may be including an extra day. Suggest this pattern:
WHERE date >= '2015-01-01'
AND date < '2015-01-01' + INTERVAL 3 MONTH
If you want a simple solution, my initial thought is to figure out a way to not have function calls in your joins.
You negatively affect the chance that an index will be helpful.
(concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
Maybe a LIKE comparison would be a better idea; however, either approach in a join clause is best avoided.
Bottom line: your table structure and relationships here leave room for improvement. If you need the concat because you are avoiding joining intermediate tables, don't. Allow the indexes to be used and it should improve your query performance.
Also, make sure you have indexes.
I suggest 3 things
add a column and index it (avoid the functions in the join)
use an inner join
use EXISTS (...) instead of IN (...)
To "optimize" that query you need to add a column to the table car_history_optimized which contains the result of concat(substr(vin,1,8),'_',substr(vin,10,3)) and this column should be indexed.
Also, use INNER JOIN. In the current query the left outer join is wasted because you require every row of that table to be IN (the subquery) so NULL from that table isn't permitted hence you have the same effect as an inner join.
Use EXISTS instead of IN
SELECT
AVG(h.price)
FROM car_history_optimized h
INNER JOIN vin_data vd ON h.new_column = vd.prefix
WHERE h.`date` >= '2015-01-01'
AND h.`date` <= '2015-04-01'
AND h.dealer_id <> 2389
AND EXISTS (
SELECT
NULL
FROM car_history_optimized cho
WHERE cho.`date` >= '2015-03-01'
AND cho.`date` <= '2015-04-01'
AND cho.dealer_id = 2389
AND vd.prefix = cho.new_column
)
;
By the way:
I assume you already have some indexes, and that they include date and dealer_id.
In future, avoid using "date" as a column name (it's an SQL keyword, which invites confusion).
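The suggestions above (a stored, indexed prefix column, an inner join, and EXISTS) can be tried out in a small sketch with Python's built-in sqlite3 module. Everything here is an assumption for illustration: the column name vin_prefix, the index names, and the sample rows are invented, and SQLite's || operator stands in for MySQL's CONCAT. It shows the shape of the rewrite, not its performance on 11 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vin_data (prefix TEXT PRIMARY KEY);
    CREATE TABLE car_history_optimized (
        vin TEXT, price REAL, date TEXT, dealer_id INTEGER,
        vin_prefix TEXT               -- hypothetical precomputed column
    );
    INSERT INTO vin_data VALUES ('AAAAAAAA_BBB'), ('CCCCCCCC_DDD');
    INSERT INTO car_history_optimized (vin, price, date, dealer_id) VALUES
        ('AAAAAAAA1BBB00001', 10000, '2015-02-10', 1),
        ('AAAAAAAA2BBB00002', 12000, '2015-03-15', 7),
        ('AAAAAAAA3BBB00003', 11000, '2015-03-20', 2389),  -- dealer 2389 sold this model
        ('CCCCCCCC1DDD00004', 50000, '2015-02-01', 1);     -- model never sold by 2389

    -- Compute the prefix once and store it, instead of on every join.
    UPDATE car_history_optimized
       SET vin_prefix = substr(vin, 1, 8) || '_' || substr(vin, 10, 3);

    CREATE INDEX idx_cho_prefix      ON car_history_optimized (vin_prefix);
    CREATE INDEX idx_cho_dealer_date ON car_history_optimized (dealer_id, date);
""")

avg_price = conn.execute("""
    SELECT AVG(h.price)
      FROM car_history_optimized h
      JOIN vin_data vd ON vd.prefix = h.vin_prefix
     WHERE h.date >= '2015-01-01'
       AND h.date <= '2015-04-01'
       AND h.dealer_id <> 2389
       AND EXISTS (SELECT 1
                     FROM car_history_optimized cho
                    WHERE cho.date >= '2015-03-01'
                      AND cho.date <= '2015-04-01'
                      AND cho.dealer_id = 2389
                      AND cho.vin_prefix = vd.prefix)
""").fetchone()[0]

print(avg_price)
```

Only the two non-2389 sales of the model that dealer 2389 also sold contribute to the average; the fourth row is excluded because no matching 2389 sale exists for its prefix.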

OUTER JOIN -> want to return something even if empty

I try to return a group_concat on 2 tables
One being my list of schools and the other, some numeric data.
For some dates, I have NO DATA at all in the table SimpleData, so my LEFT OUTER JOIN returns 10 results where I have 11 schools (I need 11 rows, in order, for JavaScript processing).
Here is my query (tell me if I need to give more details about the tables):
SELECT A.nomEcole,
A.Compteur,
IFNULL(SUM(B.rendementJour), '0') AS TOTAL,
B.jourUS,
B.rendementJour
FROM ecoles A LEFT OUTER JOIN SimpleData B ON A.Compteur = B.compteur
WHERE jourUS LIKE '2013-07-%'
GROUP BY ecole
in this example, i have no data in SimpleData for this month( not data was recorded at all)
I have to show either NULL or '0' for this missing school and i'm starting to lose my head on something easy apparently :(
Thanks for any help !
olivier
As @Abhik Chakraborty mentioned, the WHERE clause filters out the records that don't match the criteria. Another option is to use a CASE expression:
SELECT A.nomEcole,
A.Compteur,
SUM(CASE WHEN jourUS LIKE '2013-07-%' THEN B.rendementJour ELSE 0 END) AS TOTAL,
B.jourUS,
B.rendementJour
FROM ecoles A
LEFT OUTER JOIN SimpleData B ON A.Compteur = B.compteur
GROUP BY ecole
I suspect you just need to move the where condition to the on clause:
SELECT A.nomEcole, A.Compteur, IFNULL(SUM(B.rendementJour), 0) AS TOTAL,
B.jourUS, B.rendementJour
FROM ecoles A LEFT OUTER JOIN
SimpleData B
ON A.Compteur = B.compteur and B.jourUS >= '2013-07-01' and B.jourUS < '2013-08-01'
GROUP BY A.ecole;
Some other changes:
Don't use single quotes for numeric constants. Single quotes should really only be used for date and string constants.
Don't use like for dates. like is an operation on strings, not dates, and the date has to be implicitly converted to a string. Instead, do direct comparisons on the date ranges you are interested in.
I would also recommend that the table aliases be abbreviations for the tables you are using. This makes the query easier to read. (So e instead of A for ecoles.)
Also note that the values that you are returning for JourUS and RendementJour are indeterminate. If there are multiple rows in the B table that match, then an arbitrary value will be returned. Perhaps you want max() or group_concat() for them.
Your WHERE clause turns the LEFT OUTER JOIN into an INNER JOIN, because the columns of an outer-joined record are NULL, and NULL is never LIKE '2013-07-%'.
This is why you must move jourUS LIKE '2013-07-%' to the ON clause: you only want to join records where jourUS LIKE '2013-07-%', and otherwise outer join a NULL record.
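The effect is easy to reproduce in a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The three schools and their readings are invented, the aliases e and d are arbitrary, and the date filter is written as a range rather than LIKE.

```python
import sqlite3

# Invented data: school C has readings, but none in July 2013.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ecoles (nomEcole TEXT, Compteur INTEGER PRIMARY KEY);
    CREATE TABLE SimpleData (compteur INTEGER, jourUS TEXT, rendementJour REAL);
    INSERT INTO ecoles VALUES ('Ecole A', 1), ('Ecole B', 2), ('Ecole C', 3);
    INSERT INTO SimpleData VALUES
        (1, '2013-07-01', 4.0), (1, '2013-07-02', 6.0),
        (2, '2013-07-01', 3.0),
        (3, '2013-06-30', 9.0);
""")

# Date filter in WHERE: rows for schools without July data are discarded
# after the join, so the outer join behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT e.nomEcole, IFNULL(SUM(d.rendementJour), 0) AS total
      FROM ecoles e
      LEFT JOIN SimpleData d ON d.compteur = e.Compteur
     WHERE d.jourUS >= '2013-07-01' AND d.jourUS < '2013-08-01'
     GROUP BY e.Compteur, e.nomEcole
     ORDER BY e.Compteur
""").fetchall()

# Date filter in ON: every school is kept, with 0 when nothing matched.
filter_in_on = conn.execute("""
    SELECT e.nomEcole, IFNULL(SUM(d.rendementJour), 0) AS total
      FROM ecoles e
      LEFT JOIN SimpleData d
             ON d.compteur = e.Compteur
            AND d.jourUS >= '2013-07-01' AND d.jourUS < '2013-08-01'
     GROUP BY e.Compteur, e.nomEcole
     ORDER BY e.Compteur
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

The first result has two rows and the second has three: 'Ecole C' only appears, with a total of 0, when the date condition is part of the ON clause.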