MySQL summary query with date range, multiple tables - mysql

I'm running a SQL query that returns results between the dates I have selected (2012-07-01 to 2012-08-01), but I can tell from the values that they are wrong.
I'm confused because MySQL isn't reporting a syntax error, yet the values returned are wrong.
The dates in my database are stored in the date column in the format YYYY-mm-dd.
SELECT `jockeys`.`JockeyInitials` AS `Initials`, `jockeys`.`JockeySurName` AS `Lastname`,
COUNT(`runs`.`JockeysID`) AS 'Rides',
COUNT(CASE
WHEN `runs`.`Finish` = 1 THEN 1
ELSE NULL
END
) AS `Wins`,
SUM(`runs`.`StakeWon`) AS 'Winnings'
FROM runs
INNER JOIN jockeys ON runs.JockeysID = jockeys.JockeysID
INNER JOIN races ON runs.RacesID = races.RacesID
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` <= STR_TO_DATE('2012,08,01', '%Y,%m,%d')
GROUP BY `jockeys`.`JockeySurName`
ORDER BY `Wins` DESC

It's hard to guess what the problem is from your question.
Are you looking to summarize all the races in July and the races on the first of August? That's a slightly strange date range.
You should try the following kind of date-range selection if you want to be more precise. You MUST use it if your races.RaceDate column is a DATETIME expression.
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,08,01', '%Y,%m,%d') + INTERVAL 1 DAY
This will pick up the July races and the races at any time on the first of August.
But, it's possible you're looking for just the July races. In that case you might try:
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,07,01', '%Y,%m,%d') + INTERVAL 1 MONTH
That will pick up everything from midnight July 1, inclusive, to midnight August 1 exclusive.
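A minimal sketch of the difference between the closed range and the half-open ranges above. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only so the example runs anywhere: SQLite has no STR_TO_DATE or INTERVAL, so the bounds are written out as full ISO timestamps, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE races (RacesID INTEGER PRIMARY KEY, RaceDate TEXT)")
conn.executemany(
    "INSERT INTO races (RaceDate) VALUES (?)",
    [
        ("2012-06-30 18:00:00",),  # June: outside every range below
        ("2012-07-01 00:00:00",),  # first instant of July
        ("2012-07-31 23:30:00",),  # late on the last day of July
        ("2012-08-01 00:00:00",),  # exactly midnight on 1 August
        ("2012-08-01 15:00:00",),  # afternoon of 1 August
    ],
)

def count_races(where_clause):
    """Count the rows of races that match the given WHERE clause."""
    sql = "SELECT COUNT(*) FROM races WHERE " + where_clause
    return conn.execute(sql).fetchone()[0]

# Closed range, as in the question: catches midnight on 1 August
# but silently misses the race held that afternoon.
closed_range = count_races(
    "RaceDate >= '2012-07-01 00:00:00' AND RaceDate <= '2012-08-01 00:00:00'"
)

# Half-open range ending at the start of August: July only.
july_only = count_races(
    "RaceDate >= '2012-07-01 00:00:00' AND RaceDate < '2012-08-01 00:00:00'"
)

# Half-open range ending at the start of 2 August: July plus all of 1 August.
through_aug_1 = count_races(
    "RaceDate >= '2012-07-01 00:00:00' AND RaceDate < '2012-08-02 00:00:00'"
)

print(closed_range, july_only, through_aug_1)
```

With this sample data the closed range returns three rows, which is neither "July" nor "July plus the first of August"; the two half-open ranges each match one well-defined period.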
Also, you're not using GROUP BY correctly. When you summarize, every column in your result set must either be a summary (SUM() or COUNT() or some other aggregate function) or mentioned in your GROUP BY clause. Some DBMSs enforce this. MySQL just rolls with it and gives strange results. Try this expression.
GROUP BY `jockeys`.`JockeyInitials`,`jockeys`.`JockeySurName`
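To see what grouping on the surname alone can do to the totals, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for portability). The three jockeys and five runs are made up; two of the jockeys deliberately share a surname.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jockeys (JockeysID INTEGER PRIMARY KEY,
                      JockeyInitials TEXT, JockeySurName TEXT);
CREATE TABLE runs (RunsID INTEGER PRIMARY KEY,
                   JockeysID INTEGER, Finish INTEGER);
INSERT INTO jockeys VALUES (1, 'A', 'Smith'), (2, 'B', 'Smith'), (3, 'C', 'Jones');
INSERT INTO runs (JockeysID, Finish) VALUES (1, 1), (1, 2), (2, 1), (2, 1), (3, 3);
""")

# Grouping by surname only: both Smiths collapse into one row.
by_surname = conn.execute("""
    SELECT j.JockeySurName,
           COUNT(r.JockeysID) AS Rides,
           COUNT(CASE WHEN r.Finish = 1 THEN 1 END) AS Wins
    FROM runs r
    INNER JOIN jockeys j ON r.JockeysID = j.JockeysID
    GROUP BY j.JockeySurName
    ORDER BY j.JockeySurName
""").fetchall()

# Grouping by every non-aggregated column in the select list: one row per jockey.
by_jockey = conn.execute("""
    SELECT j.JockeyInitials, j.JockeySurName,
           COUNT(r.JockeysID) AS Rides,
           COUNT(CASE WHEN r.Finish = 1 THEN 1 END) AS Wins
    FROM runs r
    INNER JOIN jockeys j ON r.JockeysID = j.JockeysID
    GROUP BY j.JockeyInitials, j.JockeySurName
    ORDER BY j.JockeySurName, j.JockeyInitials
""").fetchall()

print(by_surname)
print(by_jockey)
```

In the first result a single "Smith" row carries four rides and three wins, which belongs to no real jockey; the second result keeps A. Smith and B. Smith apart.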

My best guess is that the jockey surnames are not unique. Try changing the GROUP BY expression to:
group by `jockeys`.`JockeyInitials`, `jockeys`.`JockeySurName`
In general, it is bad practice to include columns in the SELECT clause of an aggregation query that are not included in the GROUP BY clause. MySQL lets you do this when the ONLY_FULL_GROUP_BY SQL mode is not enabled (most other databases reject it), because of a (mis)feature called Hidden Columns.

Related

Why is this WHERE condition, which should select records in a specified timeframe, not working as expected?

I am not very experienced with databases, and I have the following problem with this MySQL query:
SELECT
CCMD.id AS crop_calendar_message_details_id,
CCMD.broadcasting_start_date AS broadcasting_start_date,
CCMD.broadcasting_end_date AS broadcasting_end_date,
CCMD.creation_date AS creation_date,
CCM.id AS message_id,
CCM.content_en AS content_en,
IFNULL(CCMN.content, CCM.content_en) AS content,
CCMN.audio_link AS audio_link,
CCMD.crop_action_details_id AS crop_action_details_id
FROM CropCalendarMessageDetails AS CCMD
INNER JOIN CropCalendarMessage AS CCM
ON CCMD.crop_calendar_message_id = CCM.id
LEFT JOIN CropCalendarMessageName AS CCMN
ON CCMN.crop_calendar_message_id = CCM.id AND CCMN.language_id = :language_id
INNER JOIN CropActionDetails AS CAD
ON CCMD.crop_action_details_id = CAD.id
WHERE
CCMD.commodity_id = 10
AND
CCMD.country_id = 2
AND
CAD.id = :cad_id
AND
CCMD.broadcasting_start_date >= CURDATE()
AND
CURDATE() <= CCMD.broadcasting_end_date
ORDER BY CCMD.broadcasting_start_date
I have some records that have the following fixed values for these date fields:
CCMD.broadcasting_start_date = 22/12/2018 23:59:00
CCMD.broadcasting_end_date = 30/05/2018
So in theory my query should skip these records, because I have this section in my WHERE clause:
AND
CCMD.broadcasting_start_date >= CURDATE()
AND
CURDATE() <= CCMD.broadcasting_end_date
The problem is that these records are returned by my query, so this date filter condition is not working.
Why? What is wrong? What am I missing? How can I fix it?
When querying date/time values, I personally always try to apply >= and < at the boundaries. For example, if you wanted all activity within March 2018, I would do
where '2018-03-01' <= DateTimeField
AND DateTimeField < '2018-04-01'
By using greater than or equal to a bare start date, you get everything from midnight at the start of that date onward. For the end of the range, I always use LESS THAN the day after the last day I want (hence April 1st), so I get everything up to Mar 31 at 11:59:59pm.
This way you also don't need to mess with date conversion functions just to ensure something falls on the same day or within a time portion of it.
This might help in resolving the date/time considerations of your query.
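If the boundaries are built in application code before being bound to the query, the same >= / < idea can be sketched as below. The helper names month_bounds and in_range are hypothetical, introduced only for illustration; they are not part of any library or of the original query.

```python
from datetime import date, datetime

def month_bounds(year, month):
    """Return (start, end): first day of the month and first day of the next month."""
    start = date(year, month, 1)
    if month == 12:
        end = date(year + 1, 1, 1)   # December rolls over into the next year
    else:
        end = date(year, month + 1, 1)
    return start, end

def in_range(ts, start, end):
    """Half-open test: midnight of start <= ts < midnight of end."""
    lo = datetime.combine(start, datetime.min.time())
    hi = datetime.combine(end, datetime.min.time())
    return lo <= ts < hi

start, end = month_bounds(2018, 3)
print(start, end)
print(in_range(datetime(2018, 3, 31, 23, 59, 59), start, end))  # last second of March
print(in_range(datetime(2018, 4, 1, 0, 0, 0), start, end))      # first instant of April
```

The end bound never needs to know how many days the month has or what the last representable time of day is, which is the main attraction of the half-open form.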

MySQL updating a table containing a join and subquery

I am relatively new to SQL. I am trying to update the monthly salary of employees who have been working for a certain duration. The query displays the data using info from the person and employee tables, but it won't update; I keep getting an 'operand should contain 1 column' error. How would I go about displaying all the data and updating the monthly_salary column as well? Thanks.
UPDATE employee ep set monthly_salary = monthly_salary*1.15 = all(
SELECT p.person_id, p.name_first, p.name_last, ep.monthly_salary, ep.start_date, curdate() as today_date,
TIMESTAMPDIFF(month,ep.start_date,curdate()) as duration_months
FROM employee ep
INNER JOIN person p ON ep.person_id = p.person_id having duration_months > 24);
query result
I want this expected result but the monthly salary hasn't been updated yet, is it possible to display this and update the monthly_salary?
You are not able to do both in a single query. Typically one would run a "select query" to inspect if the desired logic appears correct, e.g.
SELECT
p.person_id
, p.name_first
, p.name_last
, ep.start_date
, curdate() as today_date
, TIMESTAMPDIFF(month,ep.start_date,curdate()) as duration_months
FROM employee ep
INNER JOIN person p ON ep.person_id = p.person_id
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
;
In that query the important piece of logic is the where clause which seeks out any employees with a start date earlier than today - 24 months.
If that logic is correct, then apply the same logic in an "update query":
UPDATE employee ep
SET monthly_salary = monthly_salary*1.15
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
;
Syntax notes:
you cannot chain conditions together with multiple equality operators (monthly_salary = monthly_salary*1.15 = all(...)); there are 2 = signs in that
x = all(...) requires that every value returned by the subquery equals x, and the subquery must return a single column (yours returns seven, hence the 'operand should contain 1 column' error)
the having clause is NOT just a substitute for a where clause. A having clause is designed for evaluating aggregated data, e.g. having count(*) > 2
Finally, while it was inventive to use the having clause, what you were doing was gaining access to the alias 'duration_months', so you could simply have done this instead:
where TIMESTAMPDIFF(month,ep.start_date,curdate()) > 24
BUT this is not a good way to filter information because it requires running a function on every row of data before a decision can be reached. This has the effect of making queries slower. Compare that to the following:
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
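ep.start_date is not wrapped in any function, and curdate() - INTERVAL 24 MONTH is just one calculation (not done for every row). So this is much more efficient (such a predicate is known as "sargable"). Note that the two filters are not strictly identical: TIMESTAMPDIFF(month, ...) > 24 requires at least 25 complete months, whereas the date comparison also matches employees who started just over 24 months ago.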
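A rough way to watch the difference is to ask a query planner how it would run each form. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN; this is a stand-in, since SQLite's planner and date functions are not MySQL's (in MySQL you would inspect EXPLAIN output instead), and the index name is made up. The cutoff is computed once and bound as a parameter, mirroring curdate() - INTERVAL 24 MONTH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (person_id INTEGER PRIMARY KEY, "
    "start_date TEXT, monthly_salary REAL)"
)
conn.execute("CREATE INDEX idx_employee_start_date ON employee (start_date)")

def plan(sql, params=()):
    """Return the planner's description lines (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Column wrapped in a function: every row has to be visited and evaluated.
wrapped = plan(
    "SELECT person_id FROM employee "
    "WHERE julianday('now') - julianday(start_date) > 730"
)

# Bare column compared with a value computed once up front:
# the index on start_date can be used for a range search.
cutoff = conn.execute("SELECT date('now', '-24 months')").fetchone()[0]
bare = plan(
    "SELECT person_id FROM employee WHERE start_date < ?",
    (cutoff,),
)

print(wrapped)
print(bare)
```

The exact wording of the plan lines depends on the SQLite version, but the wrapped form is reported as a SCAN and the bare-column form as a SEARCH that uses the index.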

SQL: Reuse function result in query without using sub-query

In a MySQL DB table that stores sale orders, I have a LastReviewed column that holds the last date and time when the sale order was modified (type timestamp, default value CURRENT_TIMESTAMP). I'd like to plot the number of sales that were modified each day, for the last 90 days, for a particular user.
I'm trying to craft a SELECT that returns the number of days since LastReviewed date, and how many records fall within that range. Below is my query, which works just fine:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND DATEDIFF(CURDATE(),LastReviewed)<=90
GROUP BY days
ORDER BY days ASC
Notice that I am computing the DATEDIFF() as well as CURDATE() multiple times for each record. This seems really ineffective, so I'd like to know how I can reuse the results of the previous computation. The first thing I tried was:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND days<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'where clause'. So I started to look around the net. Based on another discussion (Can I reuse a calculated field in a SELECT query?), I next tried the following:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND (SELECT days)<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'field list'. I also tried the following:
SELECT @days := DATEDIFF(CURDATE(), LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123 AND @days <=90
GROUP BY days
ORDER BY days ASC
The query returns zero results, so @days<=90 seems to evaluate to false, even though if I put it in the SELECT clause and remove the WHERE clause, I can see some results with @days values below 90.
I've gotten things to work by using a sub-query:
SELECT * FROM (
SELECT DATEDIFF(CURDATE(),LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123
GROUP BY days
) AS t
WHERE days<=90
ORDER BY days ASC
However, I don't know whether it's the most efficient way. Not to mention that even this solution computes CURDATE() once per record even though its value will be the same from the start to the end of the query. Isn't that wasteful? Am I overthinking this? Help would be welcome.
Note: Mods, should this be on CodeReview? I posted here because the code I'm trying to use doesn't actually work
There are actually two problems with your question.
First, you're overlooking the fact that WHERE precedes SELECT. When the server evaluates WHERE <expression>, it then already knows the value of the calculations done to evaluate <expression> and can use those for SELECT.
Worse than that, though, you should almost never write a query that uses a column as an argument to a function, since that usually requires the server to evaluate the expression for each row.
Instead, you should use this:
WHERE LastReviewed >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
The optimizer will see this and get all excited, because DATE_SUB(CURDATE(), INTERVAL 90 DAY) can be resolved to a constant, which can be used on one side of a >= comparison, which means that if an index exists with LastReviewed as the leftmost relevant column, then the server can immediately eliminate all of the rows with LastReviewed < that constant value, using the index.
Then DATEDIFF(CURDATE(), LastReviewed) AS days (still needed for SELECT) will only be evaluated against the rows we already know we want.
Add a single index on (UserID, LastReviewed) and the server will be able to pinpoint exactly the relevant rows extremely quickly.
Builtin functions are much less costly than, say, fetching rows.
You could get a lot more performance improvement with the following 'composite' index:
INDEX(UserID, LastReviewed)
and change to
WHERE UserID=123
AND LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY
Your formulation is 'hiding' LastReviewed in a function call, which prevents an index on it from being used.
If you are still not satisfied with that improvement, then consider a nightly query that computes yesterday's statistics and puts them in a "Summary table". From there, the SELECT you mentioned can run even faster.

How do I subtract two declared variables in MySQL

The question I am working on is as follows:
What is the difference in the amount received for each month of 2004 compared to 2003?
This is what I have so far,
SELECT @2003 = (SELECT sum(amount) FROM Payments, Orders
WHERE YEAR(orderDate) = 2003
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate));
SELECT @2004 = (SELECT sum(amount) FROM Payments, Orders
WHERE YEAR(orderDate) = 2004
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate));
SELECT MONTH(orderDate), (@2004 - @2003) AS Diff
FROM Payments, Orders
WHERE Orders.customerNumber = Payments.customerNumber
Group By MONTH(orderDate);
In the output I am getting the months, but for Diff I am getting NULL. Please help. Thanks
I cannot test this because I don't have your tables, but try something like this:
SELECT a.orderMonth, (a.orderTotal - b.orderTotal ) AS Diff
FROM
(SELECT MONTH(orderDate) as orderMonth,sum(amount) as orderTotal
FROM Payments, Orders
WHERE YEAR(orderDate) = 2004
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate)) as a,
(SELECT MONTH(orderDate) as orderMonth,sum(amount) as orderTotal FROM Payments, Orders
WHERE YEAR(orderDate) = 2003
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate)) as b
WHERE a.orderMonth=b.orderMonth
Q: How do I subtract two declared variables in MySQL.
A: You'd first have to DECLARE them, in the context of a MySQL stored program. But those variable names wouldn't begin with an at sign character. Variable names that start with an at sign (@) character are user-defined variables. There is no DECLARE statement for them; we can't declare them to be a particular type.
To subtract them within a SQL statement
SELECT @foo - @bar AS diff
Note that MySQL user-defined variables are scalar values.
Assignment of a value to a user-defined variable in a SELECT statement is done with the Pascal style assignment operator :=. In an expression in a SELECT statement, the equals sign is an equality comparison operator.
As a simple example of how to assign a value in a SQL SELECT statement
SELECT @foo := '123.45' ;
In the OP queries, there's no assignment being done. The equals sign is a comparison, of the scalar value to the return from a subquery. Are those first statements actually running without throwing an error?
User-defined variables are probably not necessary to solve this problem.
You want to return how many rows? Sounds like you want one for each month. We'll assume that by "year" we're referring to a calendar year, as in January through December. (We might want to check that assumption. Just so we don't find out way too late, that what was meant was the "fiscal year", running from July through June, or something.)
How can we get a list of months? Looks like you've got a start. We can use a GROUP BY or a DISTINCT.
The question was... "What is the difference in the amount received ... "
So, we want amount received. Would that be the amount of payments we received? Or the amount of orders that we received? (Are we taking orders and receiving payments? Or are we placing orders and making payments?)
When I think of "amount received", I'm thinking in terms of income.
Given the only two tables that we see, I'm thinking we're filling orders and receiving payments. (I probably want to check that, so when I'm done, I'm not told... "oh, we meant the number of orders we received" and/or "the payments table is the payments we made, the 'amount we received' is in some other table".)
We're going to assume that there's a column that identifies the "date" that a payment was received, and that the datatype of that column is DATE (or DATETIME or TIMESTAMP), some type that we can reliably determine what "month" a payment was received in.
To get a list of months that we received payments in, in 2003...
SELECT MONTH(p.payment_received_date)
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
AND p.payment_received_date < '2004-01-01'
GROUP BY MONTH(p.payment_received_date)
ORDER BY MONTH(p.payment_received_date)
That should get us twelve rows. Unless we didn't receive any payments in a given month. Then we might only get 11 rows. Or 10. Or, if we didn't receive any payments in all of 2003, we won't get any rows back.
For performance, we want to have our predicates (conditions in the WHERE clause) reference bare columns. With an appropriate index available, MySQL can make effective use of an index range scan operation. If we wrap the columns in a function, e.g.
WHERE YEAR(p.payment_received_date) = 2003
With that, we will be forcing MySQL to evaluate that function on every flipping row in the table, and then compare the return from the function to the literal. We prefer not to do that, and instead reference bare columns in predicates (conditions in the WHERE clause).
We could repeat the same query to get the payments received in 2004. All we need to do is change the date literals.
Or, we could get all the rows in 2003 and 2004 all together, and collapse that into a list of distinct months.
We can use conditional aggregation. Since we're using calendar years, I'll use the YEAR() shortcut (rather than a range check). Here, we're not as concerned with using a bare column inside the expression.
SELECT MONTH(p.payment_received_date) AS `mm`
, MAX(MONTHNAME(p.payment_received_date)) AS `month`
, SUM(IF(YEAR(p.payment_received_date)=2004,p.payment_amount,0)) AS `2004_month_total`
, SUM(IF(YEAR(p.payment_received_date)=2003,p.payment_amount,0)) AS `2003_month_total`
, SUM(IF(YEAR(p.payment_received_date)=2004,p.payment_amount,0))
- SUM(IF(YEAR(p.payment_received_date)=2003,p.payment_amount,0)) AS `2004_2003_diff`
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
AND p.payment_received_date < '2005-01-01'
GROUP
BY MONTH(p.payment_received_date)
ORDER
BY MONTH(p.payment_received_date)
If this is a homework problem, I strongly recommend you work on this problem yourself. There are other query patterns that will return an equivalent result.
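For readers who want to see the conditional-aggregation pattern produce numbers, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for portability), so MONTH() and YEAR() are replaced by strftime() and IF() by CASE; the payment_received table, its columns and its rows are the same guesses as above, filled with made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment_received (payment_received_date TEXT, payment_amount REAL)"
)
conn.executemany(
    "INSERT INTO payment_received VALUES (?, ?)",
    [
        ("2003-01-15", 100.0),
        ("2003-01-20", 50.0),
        ("2004-01-10", 400.0),
        ("2003-02-05", 300.0),
        ("2004-02-07", 250.0),
        ("2004-03-01", 75.0),   # March has payments in 2004 only
        ("2005-01-01", 999.0),  # outside the two-year range, must be ignored
    ],
)

rows = conn.execute("""
    SELECT CAST(strftime('%m', payment_received_date) AS INTEGER) AS mm,
           SUM(CASE WHEN strftime('%Y', payment_received_date) = '2004'
                    THEN payment_amount ELSE 0 END) AS total_2004,
           SUM(CASE WHEN strftime('%Y', payment_received_date) = '2003'
                    THEN payment_amount ELSE 0 END) AS total_2003
    FROM payment_received
    WHERE payment_received_date >= '2003-01-01'
      AND payment_received_date <  '2005-01-01'
    GROUP BY mm
    ORDER BY mm
""").fetchall()

# One difference per calendar month: 2004 total minus 2003 total.
diffs = {mm: total_2004 - total_2003 for mm, total_2004, total_2003 in rows}
print(diffs)
```

A month with payments in only one of the two years still appears, because the missing year simply sums to zero; a month with no payments in either year produces no row at all.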
I think this is the problem:
In @2003 and @2004, you select only the sum. Even though you group by the month, you still select just one column, i.e. each row does not say which month it was selected for. So when you try to subtract, SQL cannot tell which row in @2003 should be subtracted from which row in @2004.
So I think the solution is to select the month with the sum and do the subtract later based on the month.

MySQL Week Function Unexpected Results

I am querying a database of hour entries and summing up by company and by week. I understand that MySQL's week function is based on a calendar week. That being said, I'm getting some unexpected grouping results. Perhaps you sharp-eyed folks can lend a hand:
SELECT * FROM (
SELECT
tms.date,
SUM( IF( tms.skf_group = "HP Group", tms.hours, 0000.00 )) as HPHours,
SUM( IF( tms.skf_group = "SKF Canada", tms.hours, 000.00 )) as SKFHours
FROM time_management_system tms
WHERE date >= "2012-01-01"
AND date <= "2012-05-11"
AND tms.skf_group IN ( "HP Group", "SKF Canada" )
GROUP BY WEEK( tms.date, 7 )
# ORDER BY tms.date DESC
# LIMIT 7
) AS T1
ORDER BY date ASC
My results are as follows: (Occasionally we don't have entries on a Sunday for example. Do null values matter?)
('date'=>'2012-01-01','HPHours'=>'0.00','SKFHours'=>'2.50'),
('date'=>'2012-01-02','HPHours'=>'97.00','SKFHours'=>'78.75'),
('date'=>'2012-01-09','HPHours'=>'86.50','SKFHours'=>'100.00'),
('date'=>'2012-01-16','HPHours'=>'68.00','SKFHours'=>'96.25'),
('date'=>'2012-01-24','HPHours'=>'39.00','SKFHours'=>'99.50'),
('date'=>'2012-02-05','HPHours'=>'3.00','SKFHours'=>'93.00'),
('date'=>'2012-02-06','HPHours'=>'12.00','SKFHours'=>'122.50'),
('date'=>'2012-02-13','HPHours'=>'64.75','SKFHours'=>'117.50'),
('date'=>'2012-02-21','HPHours'=>'64.50','SKFHours'=>'93.00'),
('date'=>'2012-03-02','HPHours'=>'45.50','SKFHours'=>'143.25'),
('date'=>'2012-03-05','HPHours'=>'62.00','SKFHours'=>'136.75'),
('date'=>'2012-03-12','HPHours'=>'54.25','SKFHours'=>'133.00'),
('date'=>'2012-03-19','HPHours'=>'77.75','SKFHours'=>'130.75'),
('date'=>'2012-03-26','HPHours'=>'61.00','SKFHours'=>'147.00'),
('date'=>'2012-04-02','HPHours'=>'86.75','SKFHours'=>'96.75'),
('date'=>'2012-04-09','HPHours'=>'84.25','SKFHours'=>'120.50'),
('date'=>'2012-04-16','HPHours'=>'90.00','SKFHours'=>'127.25'),
('date'=>'2012-04-23','HPHours'=>'103.25','SKFHours'=>'89.50'),
('date'=>'2012-05-02','HPHours'=>'72.50','SKFHours'=>'143.75'),
('date'=>'2012-05-07','HPHours'=>'68.25','SKFHours'=>'119.00')
January 2nd is the first Monday, hence Jan 1st is only one day. I would expect the output to be consecutive Mondays (Monday Jan 2, 9, 16, 23, 30, etc)? The unexpected week groupings below continue throughout the results. Any ideas?
Thanks very much!
It's not clear what selecting tms.date even means when you're grouping by some function on tms.date. My guess is that it means "the date value from any source row corresponding to this group". At that point, the output is entirely reasonable.
Given that any given group can have seven dates within it, what date do you want to get in the results?
EDIT: This behaviour is actually documented in "GROUP BY and HAVING with Hidden Columns":
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause.
...
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.
The tms.date column isn't part of the GROUP BY clause - only a function operating on tms.date is part of the GROUP BY clause, so I believe the text above applies to the way that you're selecting tms.date: you're getting any date within that week.
If you want the earliest date, you might try
SELECT MIN(tms.date), ...
That's assuming that MIN works with date/time fields, of course. I can't easily tell from the documentation.
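For what it's worth, MIN() and MAX() do work on DATE and DATETIME columns in MySQL. A minimal sketch of the effect, using Python's built-in sqlite3 module as a stand-in (an assumption for portability): strftime('%W') plays the role of a Monday-based WEEK(), the column is named entry_date instead of date, and the rows are invented, including one week with no Monday entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_management_system (entry_date TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO time_management_system VALUES (?, ?)",
    [
        ("2012-01-01", 2.5),  # Sunday, before the first Monday of the year
        ("2012-01-04", 8.0),
        ("2012-01-02", 7.5),  # Monday
        ("2012-01-11", 6.0),
        ("2012-01-09", 8.0),  # Monday
        ("2012-01-26", 4.0),
        ("2012-01-24", 5.0),  # Tuesday; nothing was logged on Monday the 23rd
    ],
)

rows = conn.execute("""
    SELECT strftime('%W', entry_date) AS wk,        -- Monday-based week number
           MIN(entry_date)            AS first_entry, -- earliest date present in the group
           SUM(hours)                 AS total_hours
    FROM time_management_system
    GROUP BY wk
    ORDER BY wk
""").fetchall()

for row in rows:
    print(row)
```

MIN makes the reported date deterministic, but it is still the earliest date that has an entry, so a week with nothing logged on Monday shows its first logged day (here the 24th) rather than the Monday itself.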
The question is not clear to me, but I guess you don't want to group by week, because WEEK() gives the week of the year (today falls in the 19th week).
I think you want to group by weekday, like GROUP BY WEEKDAY(tms.date)