Query 1 works but query 2 doesn't:
Query #1:
SELECT * FROM `users` WHERE users.dob <= '1994-1-14' AND users.dob >= '1993-1-14' LIMIT 10
Query #2:
SELECT * FROM `users` WHERE users.dob BETWEEN '1994-1-14' AND '1993-1-14' LIMIT 10
The 2nd one should be able to do the same thing as the first but I don't understand why it's not working.
The dob (date of birth) column in the users table is of type DATE, with values that look like this:
1988-11-08
1967-11-14
1991-03-09
1958-03-08
1967-06-30
1988-10-19
1986-01-23
1965-09-20
YEAR - MONTH - DAY
With either query #1 or #2 I'm trying to get back all users who are between 18 and 19 years of age, because 1994-1-14 is exactly 18 years before today and 1993-1-14 is 19 years before today. So is there a way to get the BETWEEN query to work?
By not working I mean it doesn't return any records from the db while the working query does.
Also is the between query more efficient or is the performance difference negligible?
To answer the first part: "expr BETWEEN min AND max". Try switching those 2 dates in the second query.
The usage is wrong. See the BETWEEN documentation:
expr BETWEEN min AND max is equivalent to (min <= expr AND expr <= max).
Therefore, users.dob BETWEEN '1994-1-14' AND '1993-1-14' is the same as ('1994-1-14' <= users.dob AND users.dob <= '1993-1-14'), which no row can ever satisfy, so the query always returns 0 results.
Simply reverse the order :)
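A quick way to see the effect of the argument order without a MySQL server is the sketch below. It uses Python's built-in sqlite3 module as a stand-in (an assumption; SQLite stores the dates as text, so they are zero-padded here), and the four dob rows are made up for illustration:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, dob TEXT)")
conn.executemany(
    "INSERT INTO users (dob) VALUES (?)",
    [("1988-11-08",), ("1993-06-30",), ("1993-12-25",), ("1994-03-01",)],
)

def count_between(low, high):
    """Count rows where low <= dob AND dob <= high."""
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE dob BETWEEN ? AND ?", (low, high)
    ).fetchone()
    return row[0]

# Larger bound first: the range is empty, so nothing matches.
reversed_count = count_between("1994-01-14", "1993-01-14")
# Smaller bound first: the two 1993 rows inside the range match.
ordered_count = count_between("1993-01-14", "1994-01-14")

print(reversed_count, ordered_count)  # 0 2
```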
There will be no performance difference when using either form, possibly subject to the note below. This transformation happens at the query planner level. However, if you have concerns, remember to profile, profile, profile. Then you can see for yourself and appease the premature-optimization demons.
Also note this remark from the documentation:
For best results when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the desired data type.
Related
Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g: Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
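To see why the MOD filter keeps one row per 15-minute window, here is a small sketch of the arithmetic in plain Python. It assumes samples arrive exactly every 5 seconds; the start timestamp is arbitrary and only for illustration:

```python
STEP = 5          # seconds between samples, per the question
BUCKET = 15 * 60  # 15-minute window in seconds

start = 1_505_910_549  # arbitrary epoch second, for illustration only
samples = [start + i * STEP for i in range(3600 // STEP)]  # one hour of rows

# Equivalent of: UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
kept = [t for t in samples if 0 <= t % BUCKET <= 4]

# Equivalent of the GROUP BY key: UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
buckets = {t // BUCKET for t in kept}

print(len(samples), len(kept), len(buckets))  # 720 4 4
```

Because the accepted residue range (0 to 4) is exactly as wide as the sampling step, each window contributes exactly one row. If samples are late, missing, or more frequent than every 5 seconds, a window could yield zero or several rows, so this is only as reliable as the feed.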
Please consider the following query:
SELECT submitted_time FROM jobs WHERE timediff(NOW(), submitted_time) < '24:00:00'
My hope is for this to return all rows that have a "submitted_time" column containing a timestamp within the last 24 hours. However, I am receiving the following results:
2017-01-18 14:58:34
2017-01-16 14:58:34
If I run the query SELECT NOW() I get 2017-01-25 18:58:32
Which appears to be correct.
What is stranger still is that I have more recent rows in the DB such as:
2017-01-24 15:17:13
Which are not being returned.
I hope I have made a glaringly obvious error that someone can point out, rather than beginning the descent into madness.
Just to be clear, the simplest and probably most performant way to handle this (as per the link I provided in the comment) is:
SELECT submitted_time FROM jobs WHERE submitted_time > DATE_ADD(NOW(), INTERVAL -1 DAY);
This should return all jobs submitted within the 24 hours before the moment the query is issued.
This might not be important to you for this query, but whenever you apply functions to columns in your table, any indexes you have on those columns cannot be used, because the database must run the function(s) on each value in the table before it can perform the comparison.
With this method you work out the comparison datetime once, and MySQL can use an index on submitted_time for the comparison, assuming that column is indexed appropriately.
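The effect can be observed with a query planner. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite's planner and date functions differ from MySQL's, and the index name is made up), but the principle is the same: a bare column compared with a precomputed value can use the index, a column wrapped in a function cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, submitted_time TEXT)")
conn.execute("CREATE INDEX idx_jobs_submitted ON jobs (submitted_time)")

def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# Bare column compared with a value computed once: an index seek is possible.
range_plan = plan(
    "SELECT submitted_time FROM jobs "
    "WHERE submitted_time > datetime('now', '-1 day')"
)

# Column wrapped in a function: every row has to be evaluated.
wrapped_plan = plan(
    "SELECT submitted_time FROM jobs "
    "WHERE julianday('now') - julianday(submitted_time) < 1"
)

print(range_plan)
print(wrapped_plan)
```

The exact wording of the plan varies between SQLite versions, but the first query should report a SEARCH using the index and the second a SCAN. In MySQL the comparable check would be running EXPLAIN on both forms of the query.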
In a MySQL DB table that stores sale orders, I have a LastReviewed column that holds the last date and time when the sale order was modified (type timestamp, default value CURRENT_TIMESTAMP). I'd like to plot the number of sales that were modified each day, for the last 90 days, for a particular user.
I'm trying to craft a SELECT that returns the number of days since LastReviewed date, and how many records fall within that range. Below is my query, which works just fine:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND DATEDIFF(CURDATE(),LastReviewed)<=90
GROUP BY days
ORDER BY days ASC
Notice that I am computing the DATEDIFF() as well as CURDATE() multiple times for each record. This seems really inefficient, so I'd like to know how I can reuse the results of the previous computation. The first thing I tried was:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND days<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'where clause'. So I started to look around the net. Based on another discussion (Can I reuse a calculated field in a SELECT query?), I next tried the following:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND (SELECT days)<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'field list'. I also tried the following:
SELECT @days := DATEDIFF(CURDATE(), LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123 AND @days <=90
GROUP BY days
ORDER BY days ASC
The query returns zero results, so @days<=90 seems to evaluate to false, even though if I put it in the SELECT clause and remove the WHERE clause, I can see some results with @days values below 90.
I've gotten things to work by using a sub-query:
SELECT * FROM (
SELECT DATEDIFF(CURDATE(),LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123
GROUP BY days
) AS t
WHERE days<=90
ORDER BY days ASC
However, I don't know whether it's the most efficient way. Not to mention that even this solution computes CURDATE() once per record even though its value will be the same from the start to the end of the query. Isn't that wasteful? Am I overthinking this? Help would be welcome.
Note: Mods, should this be on CodeReview? I posted here because the code I'm trying to use doesn't actually work
There are actually two problems with your question.
First, you're overlooking the fact that WHERE precedes SELECT. When the server evaluates WHERE <expression>, it then already knows the value of the calculations done to evaluate <expression> and can use those for SELECT.
Worse than that, though, you should almost never write a query that uses a column as an argument to a function, since that usually requires the server to evaluate the expression for each row.
Instead, you should use this:
WHERE LastReviewed >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
The optimizer will see this and get all excited, because DATE_SUB(CURDATE(), INTERVAL 90 DAY) can be resolved to a constant, which can be used on one side of a >= comparison, which means that if an index exists with LastReviewed as the leftmost relevant column, then the server can immediately eliminate all of the rows with LastReviewed < that constant value, using the index.
Then DATEDIFF(CURDATE(), LastReviewed) AS days (still needed for SELECT) will only be evaluated against the rows we already know we want.
Add a single index on (UserID, LastReviewed) and the server will be able to pinpoint exactly the relevant rows extremely quickly.
Builtin functions are much less costly than, say, fetching rows.
You could get a lot more performance improvement with the following 'composite' index:
INDEX(UserID, LastReviewed)
and change to
WHERE UserID=123
AND LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY
Your formulation is 'hiding' LastReviewed in a function call, making it unusable in an index.
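As a sanity check that the range form selects the same rows as the DATEDIFF form, here is a small sketch of the date arithmetic in plain Python. The "current date" and the probe timestamps are hypothetical, and the two helper functions only mirror the SQL predicates; they are not MySQL itself:

```python
from datetime import date, datetime, time, timedelta

def datediff_filter(today, last_reviewed):
    # Mirrors DATEDIFF(CURDATE(), LastReviewed) <= 90: whole-day difference.
    return (today - last_reviewed.date()).days <= 90

def range_filter(today, last_reviewed):
    # Mirrors LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY (midnight cutoff).
    cutoff = datetime.combine(today - timedelta(days=90), time.min)
    return last_reviewed >= cutoff

today = date(2024, 6, 30)  # hypothetical "current date"
cutoff_day = today - timedelta(days=90)
probes = [
    datetime.combine(cutoff_day, time.min),                         # exactly at the cutoff
    datetime.combine(cutoff_day, time.min) - timedelta(seconds=1),  # just before it
    datetime.combine(cutoff_day, time(23, 59, 59)),                 # late on the cutoff day
    datetime.combine(today, time(12, 0)),                           # today
]

agreement = [datediff_filter(today, p) == range_filter(today, p) for p in probes]
print(agreement)  # [True, True, True, True]
```

Both predicates agree on the boundary cases, but only the range form leaves the column bare so that an index on (UserID, LastReviewed) can be used.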
If you are still not satisfied with that improvement, then consider a nightly query that computes yesterday's statistics and puts them in a "Summary table". From there, the SELECT you mentioned can run even faster.
I'm trying to filter a SELECT query to rows between NOW() and NOW() - INTERVAL 10 MINUTE (?), but I can't seem to get this to work, and it has raised a few questions on the topic.
I've looked through some documentation online and a lot of questions on Stack Overflow, but none of the solutions give me what I need. Looking at the TIMEDIFF and TIMESTAMPDIFF documentation, I only see it used like this:
SELECT TIMESTAMPDIFF(SECOND,'2007-12-30 12:01:01','2007-12-31 10:02:00');
However, I don't want to just select the time difference; I want to use it in a WHERE clause, something like:
SELECT * FROM tableName WHERE (the time difference between NOW() and the stored timestamp is less than x minutes);
Is there a particular data type I need to set my column to?
How can I use TIMEDIFF / TIMESTAMPDIFF correctly, and if these are not the right functions to use, what is?
SELECT * FROM tableName WHERE TIMESTAMPDIFF(MINUTE,timestamp,NOW()) < 10
SELECT * FROM tableName
WHERE now() - interval 10 minute < stored_timestamp
select (SELECT power FROM newdb.newmeter where date(dt)=curdate() order by dt desc limit 1)
-(select Power from newdb.newmeter where date(dt)=(select date(subdate(now(), interval weekday(now()) day))) limit 0,1) as difference;
The above query is part of my program; it gives me the difference between the data stored on day 1 of the week and on the current day of the week. Run individually, the two subqueries work fine and return:
SELECT power FROM newdb.newmeter where date(dt)=curdate() order by dt desc limit 1;
result: 941690 current time
select Power from newdb.newmeter where date(dt)=(select date(subdate(now(), interval weekday(now()) day))) limit 0,1;
result 93242.4 at the start of the week (or day for today as its monday)
But as soon as I run the difference query, which is just the difference between the two above, the result is: 848447.8515625
This seems really strange, and I don't understand what's wrong with it. Please help.
You don't order by dt in your second query, the one looking for power at the start of the week. That means you are selecting an arbitrary record that happens to have a matching date. For a simple SELECT the rows usually come back in insert order, but that can change when the optimizer thinks it can run the query faster using some other order. Basically, if you don't specify an order, you are telling the server you don't care about order.
How many decimal places do you want in the answer? Use DECIMAL(7,1) for Power (I assume you want one decimal place) instead and see what you get.
I tend to avoid floats/doubles as the approximate values can get you into trouble.
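To illustrate how approximate storage can produce a result like the one in the question, here is a sketch in Python. It assumes the Power column is a 4-byte FLOAT and that the client shows only about six significant digits; the stored value 941690.25 is hypothetical, chosen because it would display as 941690:

```python
import struct
from decimal import Decimal

def as_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

start_of_week = as_float32(93242.4)  # what a 4-byte float can actually hold
latest = as_float32(941690.25)       # hypothetical stored value, displayed as 941690

print(start_of_week)           # 93242.3984375
print(latest - start_of_week)  # 848447.8515625

# Exact decimal arithmetic, as a DECIMAL(7,1) column would do it
# (941690.25 would be stored rounded to one decimal place).
print(Decimal("941690.3") - Decimal("93242.4"))  # 848447.9
```

Under these assumptions the subtraction is carried out on the full stored values rather than on the rounded ones shown for each individual query, which would explain the long fractional part.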