What's the difference between the two SQL statements? - mysql

This is a question from LeetCode. Using the second query, I got the question wrong, but I could not identify why.
SELECT
user_id,
max(time_stamp) as "last_stamp"
from
logins
where
year(time_stamp) = '2020'
group by
user_id
and
select
user_id,
max(time_stamp) as "last_stamp"
from
logins
where
time_stamp between '2020-01-01' and '2020-12-31'
group by
user_id

The first query uses a function on every row to extract the year (an integer) and compares that to a string. (It would be preferable to use an integer instead.) Whilst this may be sub-optimal, this query would accurately locate all rows that fall into the year 2020.
The second query could fail to locate all rows that fall into 2020. Here it is important to remember that days have a 24-hour duration, and that each day starts at midnight and concludes at midnight 24 hours later. That is, a day has a start-point (midnight) and an end-point (midnight + 24 hours).
However, a single date literal in SQL code cannot be both the start-point and the end-point of the same day, so every date in SQL represents only the start-point. Also note that between does NOT magically change the second given date into "the end of that day": it simply cannot (and does not) do that.
So, when you use time_stamp between '2020-01-01' and '2020-12-31' you need to think of it as meaning "from the start of 2020-01-01 up to and including the start of 2020-12-31". Hence, this excludes everything after midnight at the start of 2020-12-31, which is practically the whole 24-hour duration of that day.
The safest way to deal with this is to NOT use between at all, instead write just a few characters more code which will be accurate regardless of the time precision used by any date/datetime/timestamp column:
where
time_stamp >= '2020-01-01' and time_stamp <'2021-01-01'
with the second date being "the start-point of the next day"
See answer to SQL "between" not inclusive
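The difference can be reproduced with a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), and the rows in the logins table are invented for illustration. SQLite compares these timestamps as text, but for the non-midnight values used here the outcome is the same as the behaviour described above.

```python
import sqlite3

# In-memory database with a hypothetical logins table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, time_stamp TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2020-06-15 09:30:00"),
        (1, "2020-12-31 18:45:00"),  # late on the last day of 2020
        (2, "2020-03-01 12:00:00"),
        (2, "2021-01-01 00:00:00"),  # first instant of 2021, must be excluded
    ],
)

# BETWEEN treats '2020-12-31' as the start of that day.
between_sql = """
    SELECT user_id, MAX(time_stamp) AS last_stamp
    FROM logins
    WHERE time_stamp BETWEEN '2020-01-01' AND '2020-12-31'
    GROUP BY user_id
    ORDER BY user_id
"""

# Half-open range: from the start of 2020 up to, but not including, 2021.
half_open_sql = """
    SELECT user_id, MAX(time_stamp) AS last_stamp
    FROM logins
    WHERE time_stamp >= '2020-01-01' AND time_stamp < '2021-01-01'
    GROUP BY user_id
    ORDER BY user_id
"""

between_rows = conn.execute(between_sql).fetchall()
half_open_rows = conn.execute(half_open_sql).fetchall()

print(between_rows)    # user 1 loses the 18:45 login on 2020-12-31
print(half_open_rows)  # user 1 keeps it
```

Only the half-open version reports the 18:45 login on 2020-12-31 as user 1's last login of the year.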

Related

SQL: Reuse function result in query without using sub-query

In a MySQL DB table that stores sale orders, I have a LastReviewed column that holds the last date and time when the sale order was modified (type timestamp, default value CURRENT_TIMESTAMP). I'd like to plot the number of sales that were modified each day, for the last 90 days, for a particular user.
I'm trying to craft a SELECT that returns the number of days since LastReviewed date, and how many records fall within that range. Below is my query, which works just fine:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND DATEDIFF(CURDATE(),LastReviewed)<=90
GROUP BY days
ORDER BY days ASC
Notice that I am computing the DATEDIFF() as well as CURDATE() multiple times for each record. This seems really inefficient, so I'd like to know how I can reuse the results of the previous computation. The first thing I tried was:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND days<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'where clause'. So I started to look around the net. Based on another discussion (Can I reuse a calculated field in a SELECT query?), I next tried the following:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND (SELECT days)<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'field list'. I also tried the following:
SELECT @days := DATEDIFF(CURDATE(), LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123 AND @days <=90
GROUP BY days
ORDER BY days ASC
The query returns zero results, so @days<=90 seems to evaluate to false, even though if I put it in the SELECT clause and remove the WHERE clause, I can see some results with @days values below 90.
I've gotten things to work by using a sub-query:
SELECT * FROM (
SELECT DATEDIFF(CURDATE(),LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123
GROUP BY days
) AS t
WHERE days<=90
ORDER BY days ASC
However, I don't know whether it's the most efficient way. Not to mention that even this solution computes CURDATE() once per record even though its value will be the same from the start to the end of the query. Isn't that wasteful? Am I overthinking this? Help would be welcome.
Note: Mods, should this be on CodeReview? I posted here because the code I'm trying to use doesn't actually work
There are actually two problems with your question.
First, you're overlooking the fact that WHERE precedes SELECT. When the server evaluates WHERE <expression>, it then already knows the value of the calculations done to evaluate <expression> and can use those for SELECT.
Worse than that, though, you should almost never write a WHERE clause that uses a column as an argument to a function, since that usually requires the server to evaluate the expression for each row and prevents it from using an index on that column.
Instead, you should use this:
WHERE LastReviewed >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
The optimizer will see this and get all excited, because DATE_SUB(CURDATE(), INTERVAL 90 DAY) can be resolved to a constant, which can be used on one side of a >= comparison, which means that if an index exists with LastReviewed as the leftmost relevant column, then the server can immediately eliminate all of the rows with LastReviewed < that constant value, using the index.
Then DATEDIFF(CURDATE(), LastReviewed) AS days (still needed for SELECT) will only be evaluated against the rows we already know we want.
Add a single index on (UserID, LastReviewed) and the server will be able to pinpoint exactly the relevant rows extremely quickly.
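The effect of keeping the column bare can be observed in a query plan. The following is only a sketch: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (an assumption, since the two optimizers are not the same), the index name idx_user_date is hypothetical, and the cutoff date is an arbitrary sample value bound as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (UserID INTEGER, LastReviewed TEXT)")
# Composite index as suggested above; the name is made up for this sketch.
conn.execute("CREATE INDEX idx_user_date ON sales (UserID, LastReviewed)")


def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# Column hidden inside a function call: only UserID can drive the index.
wrapped_plan = plan(
    "SELECT COUNT(*) FROM sales "
    "WHERE UserID = 123 "
    "AND julianday('now') - julianday(LastReviewed) <= 90"
)

# Bare column compared with a constant: both index columns are usable.
bare_plan = plan(
    "SELECT COUNT(*) FROM sales "
    "WHERE UserID = 123 AND LastReviewed >= ?",
    ("2015-06-08",),  # sample cutoff, standing in for 'today minus 90 days'
)

print(wrapped_plan)
print(bare_plan)
```

In the second plan the range condition on LastReviewed appears as part of the index search, while in the first one it has to be checked row by row after the UserID lookup.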
Builtin functions are much less costly than, say, fetching rows.
You could get a lot more performance improvement with the following 'composite' index:
INDEX(UserID, LastReviewed)
and change to
WHERE UserID=123
AND LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY
Your formulation is 'hiding' LastReviewed in a function call, making it unusable in an index.
If you are still not satisfied with that improvement, then consider a nightly query that computes yesterday's statistics and puts them in a "Summary table". From there, the SELECT you mentioned can run even faster.

Return rows for next month, MYSQL

I have a mysql table which stores users' availability, stored in 'start' and 'end' columns as date fields.
I have a form where other users can search through the 'availability' by various periods such as today, tomorrow and next week. I'm trying to figure out how to construct the query to get all the rows for users who are available 'next month'.
The 'start' value may be today and the 'end' value might be three months away, but if next month falls between 'start' and 'end' then I would want that row returned.
The nearest I can get is with the query below but that just returns rows where 'start' falls within next month. Many thanks,
sql= "SELECT * FROM mytable WHERE start BETWEEN DATE_SUB(LAST_DAY(DATE_ADD(NOW(), INTERVAL 1 MONTH)),INTERVAL DAY(LAST_DAY(DATE_ADD(NOW(), INTERVAL 1 MONTH)))-1 DAY) AND LAST_DAY(DATE_ADD(NOW(), INTERVAL 1 MONTH))";
As you are interested in anything that happens in the full month following the current date you could try something like this:
SELECT * FROM mytable WHERE
FLOOR(start/100000000)<=FLOOR(NOW()/100000000)+1 AND
FLOOR( end/100000000)>=FLOOR(NOW()/100000000)+1
This query makes use of the fact that MySQL converts a datetime value used in a numeric context into a number like
SELECT now()+0
--> 20150906130640
where the digits 09 refer to the current month. FLOOR(NOW()/100000000) extracts the leading digits of the number (in this case: 201509). The WHERE conditions now simply test whether the start date falls no later than next month and the end date falls in next month or later.
(In my version I purposely left out the condition that start needs to be "after today", since a period that has started earlier seems in my eyes still applicable for your described purpose. If, however, you wanted that condition included too you could simply add an AND start > now() at the end of your WHERE clause.)
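The arithmetic behind the two conditions can be checked outside the database. This is a minimal Python sketch, not MySQL code: the helper names month_bucket and available_next_month are hypothetical, and the numeric form is rebuilt with strftime to imitate the 14-digit YYYYMMDDHHMMSS value shown above.

```python
from datetime import datetime


def month_bucket(dt):
    """Imitate FLOOR(dt / 100000000) on the 14-digit numeric form of a datetime."""
    as_number = int(dt.strftime("%Y%m%d%H%M%S"))  # e.g. 20150906130640
    return as_number // 100_000_000               # keeps YYYYMM


def available_next_month(start, end, now):
    """Mirror the two WHERE conditions: start <= next month <= end, by bucket."""
    next_month = month_bucket(now) + 1
    return month_bucket(start) <= next_month <= month_bucket(end)


now = datetime(2015, 9, 6, 13, 6, 40)
print(month_bucket(now))  # 201509

# Availability that spans October 2015 is matched ...
print(available_next_month(datetime(2015, 9, 1), datetime(2015, 12, 1), now))
# ... while one that only starts in November 2015 is not.
print(available_next_month(datetime(2015, 11, 1), datetime(2015, 12, 1), now))
```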
Edit
As your SQLfiddle is set up with a date column instead of (as I was assuming) a datetime column, your dates will be represented differently in numeric format, like 20150907, and a simple division by 100 will now get you the desired year-and-month number for comparison (201509):
SELECT * FROM mytable WHERE
FLOOR(start/100)<=FLOOR(NOW()/100000000)+1 AND
FLOOR( end/100)>=FLOOR(NOW()/100000000)+1
The number returned by NOW() is still a 14-digit figure and needs to be divided by 100000000. See your updated fiddle here: SQLfiddle
I also added another record ('Charlie') which does not fulfill your requirements.
Update
To better accommodate change-of-year scenarios I updated my SqlFiddle. The where clause is now based on 12*YEAR(..)+MONTH(..) type functions.
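The idea behind a 12*YEAR(..)+MONTH(..) comparison can be sketched in Python. This is an illustration of the arithmetic only, not the contents of the fiddle; the helper names month_index and overlaps_next_month are hypothetical. Counting months linearly makes "next month" a plain +1 even in December, whereas adding 1 to a YYYYMM value such as 201512 gives 201513, which matches no real month.

```python
from datetime import date


def month_index(d):
    """Number of months on a linear scale, so consecutive months differ by 1."""
    return 12 * d.year + d.month


def overlaps_next_month(start, end, today):
    """True if the period [start, end] touches the month after today's month."""
    target = month_index(today) + 1
    return month_index(start) <= target <= month_index(end)


today = date(2015, 12, 15)

# A period entirely inside January 2016 is found when searching from December 2015.
print(overlaps_next_month(date(2016, 1, 5), date(2016, 1, 20), today))

# A period that starts in February 2016 does not cover January 2016.
print(overlaps_next_month(date(2016, 2, 1), date(2016, 3, 1), today))
```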

WHERE clause to filter times that are under an hour

SELECT
name,
start_time,
TIME(cancelled_date) AS cancelled_time,
TIMEDIFF(start_time, TIME(cancelled_date)) AS difference
FROM
bookings
I'm trying to get from the database a list of bookings which were cancelled with less than an hour's notice. The start time and the cancellation times are both in TIME format, I know a timestamp would have made this easier. So above I've calculated the time difference between the two values and now need to add a WHERE clause to restrict it to only those records that have a difference of under 1:00:00. Obviously this isn't a number, it's a time, so a simple bit of maths won't do it.
start_time is a TIME
cancelled_date is a DATETIME but I'm converting it to TIME in the query to then calculate cancelled_time and difference.
I would be inclined to do this by adding an hour to the notice, something like this:
WHERE start_time > date_add(cancelled_date, interval 1 hour)
I can't quite tell what the right logic is from the question, because your column names don't match the description.
In this case, doing a subtraction and doing the comparison perform similarly. But if you had a constant instead of cancelled_date, then there is a difference. The following:
WHERE start_time < date_add(now(), interval -1 hour)
allows the engine to use an index on start_time.
You can use HAVING difference < TIME('1:00')
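The "less than an hour's notice" rule can be spelled out in a short Python sketch. It rests on assumptions that are not stated in the question: the booking starts on the same calendar day as the cancellation (which is what comparing start_time with TIME(cancelled_date) implies), and cancellations made after the start time are not counted as short notice. The helper names are hypothetical.

```python
from datetime import datetime, time, timedelta


def notice_period(start_time, cancelled_at):
    """Time between the cancellation and the start, assuming the same day."""
    start_at = datetime.combine(cancelled_at.date(), start_time)
    return start_at - cancelled_at


def is_short_notice(start_time, cancelled_at, limit=timedelta(hours=1)):
    """True when the booking was cancelled less than `limit` before it started."""
    notice = notice_period(start_time, cancelled_at)
    # Assumption: a negative notice (cancelled after the start) is excluded.
    return timedelta(0) <= notice < limit


start = time(14, 0)
print(is_short_notice(start, datetime(2020, 5, 1, 13, 30)))  # 30 minutes of notice
print(is_short_notice(start, datetime(2020, 5, 1, 12, 0)))   # 2 hours of notice
```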

Trouble of a MySql query not returning the right information

This is the query I have already:
use willkara;
select EngagementNumber,AgentID, EntertainerID, StartDate, EndDate, ContractPrice, ContractPrice/DateDiff(EndDate,StartDate) AS PricePerDay
FROM EA_Engagements
where StartDate <= '1999-8-13'
and EndDate >= '1999-8-8'
ORDER BY EngagementNumber;
And this is the problem:
I need a list of engagements that occurred between 8/8/1999 and 8/13/1999. I only want to see the engagements that started on or after 8/8/1999 and ended on or before 8/13/1999. For each of those engagements, I need to know how long (in days) the engagement was, and the IDs of the entertainer and the agent, and the contract price per day of entertainment. Remember, when we compute the length of an engagement, we include both the day it started and the day it ended. Please sort the information in Engagement number order. [2 rows]
8 columns needed; Last column must be labeled PricePerDay
For some reason, some of the end dates are 8-15 and 8-19, and it's only supposed to show the dates that end on the 13th.
Since they are two fields, it would be theoretically possible for someone to put in a start date that's older than the end date and vice versa leading to incorrect results. I'd adjust your query accordingly, do a between or something similar.
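The task statement quoted in the question asks for engagements that lie entirely inside the window and for a day count that includes both the first and the last day. The Python sketch below is one reading of those two rules, not the answer's method; the helper names and the sample values are invented for illustration.

```python
from datetime import date


def within_window(start, end, window_start, window_end):
    """True only if the engagement starts and ends inside the window."""
    return window_start <= start and end <= window_end


def inclusive_days(start, end):
    """Length in days, counting both the start day and the end day."""
    return (end - start).days + 1


def price_per_day(contract_price, start, end):
    return contract_price / inclusive_days(start, end)


window_start, window_end = date(1999, 8, 8), date(1999, 8, 13)

# Ends on 1999-08-15: overlaps the window but is not contained in it.
print(within_window(date(1999, 8, 10), date(1999, 8, 15), window_start, window_end))

# Runs 1999-08-09 to 1999-08-13: contained, and five days long inclusive.
print(within_window(date(1999, 8, 9), date(1999, 8, 13), window_start, window_end))
print(price_per_day(500, date(1999, 8, 9), date(1999, 8, 13)))
```

By contrast, a test of the form "start on or before the window end and end on or after the window start" matches every engagement that merely overlaps the window, which would explain end dates such as 8-15 and 8-19 showing up.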

Getting week started date using MySQL

If I have MySQL query like this, summing word frequencies per week:
SELECT
SUM(`city`),
SUM(`officers`),
SUM(`uk`),
SUM(`wednesday`),
DATE_FORMAT(`dateTime`, '%d/%m/%Y')
FROM myTable
WHERE dateTime BETWEEN '2011-09-28 18:00:00' AND '2011-10-29 18:59:00'
GROUP BY WEEK(dateTime)
The results given by MySQL take the first value of column dateTime, in this case 28/09/2011, which happens to be a Wednesday.
Is it possible to adjust the query in MySQL to show the date upon which the week commences, even if there is no data available, so that for the above, 2011-09-28 would be replaced with 2011/09/26 instead? That is, the date of the start of the week, being a Monday. Or would it be better to adjust the dates programmatically after the query has run?
The dateTime column is in format 2011/10/02 12:05:00
It is possible to do it in SQL, but it would be better to do it in your program code, as it would be more efficient and easier. Also, while MySQL accepts your query, it doesn't quite make sense: you have DATE_FORMAT(dateTime, '%d/%m/%Y') in the select's field list while you group by WEEK(dateTime). This means that the DB engine has to pick an arbitrary date from the current group (week) for each row. For example, consider that you have records for 27.09.2011, 28.09.2011 and 29.09.2011. They all fall into the same week, so in the final result set only one row is generated for those three records. Now which date out of those three should be picked for the DATE_FORMAT() call? The answer would be somewhat simpler if there were an ORDER BY in the query, but it still doesn't quite make sense to use fields or expressions in the field list which aren't in GROUP BY or which aren't aggregates. You should really return the week number in the select list (instead of the DATE_FORMAT call) and then in your code calculate the start and end dates from it.
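Computing the week boundaries in program code could look like the following Python sketch. It assumes weeks start on Monday and that week numbers follow ISO 8601 (which corresponds to MySQL's WEEK(date, 3), not the default mode of WEEK()); the function names are hypothetical, and date.fromisocalendar requires Python 3.8 or later.

```python
from datetime import date, timedelta


def week_start(d):
    """Monday of the week containing d (weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())


def week_bounds(iso_year, iso_week):
    """Monday and Sunday of an ISO 8601 week number."""
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    return monday, monday + timedelta(days=6)


first_row = date(2011, 9, 28)
print(week_start(first_row))  # the Monday of that week

year, week, _ = first_row.isocalendar()
print(week_bounds(year, week))
```

With the week number returned by the query, week_bounds gives both ends of the week even when no row exists for the Monday itself.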