Where datetime=date without full table scan?

Possible duplicate of: How to select date from datetime column?
But the problem with the accepted answer is that it will perform a full table scan.
I want to do something like this:
UPDATE records SET earnings=(SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id
AND DATE(leads.datetime)=records.date)
Notice the last portion: DATE(leads.datetime)=records.date. This does exactly what it needs to do, but it has to scan every row. Some users have thousands of leads so it can take a while.
The leads table has an INDEX on user_id,datetime.
I know you can use interval functions and do something like WHERE datetime BETWEEN date AND interval + days or something like that.
What is the most efficient and accurate way to do this?

I'm not familiar with date functions in MySQL, but try changing it to:
UPDATE records SET earnings=
(SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id
AND leads.datetime >= records.date
AND leads.datetime < DATE_ADD(records.date, INTERVAL 1 DAY)) -- i.e. records.date plus one day
You are getting a full table scan because the expression DATE(leads.datetime) is not sargable. The function has to be applied to the value stored in each row's column, but an ordinary index stores only the raw column values, not the function's results. So no index search can tell which rows will satisfy the WHERE predicate once the function has been applied to them. If you rewrite the predicate so that the bare column value stands alone on one side of the comparison operator (=, <, >= and so on), the values in the index can be searched directly against a single expression.
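A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite's date() and its EXPLAIN QUERY PLAN output differ from MySQL's, but the indexing principle is the same. The table, column, and index names (leads, dt, idx_leads_dt) and the rows are hypothetical:

```python
import sqlite3
from datetime import date, timedelta

# SQLite stands in for MySQL; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (user_id INTEGER, dt TEXT, rate REAL);
    CREATE INDEX idx_leads_dt ON leads (dt);
""")
conn.executemany(
    "INSERT INTO leads VALUES (?, ?, ?)",
    [
        (1, "2021-06-10 23:59:59", 1.0),
        (1, "2021-06-11 00:00:00", 2.0),
        (1, "2021-06-11 18:30:00", 3.0),
        (1, "2021-06-12 00:00:00", 4.0),
    ],
)

# Not sargable: the indexed column is wrapped in a function.
FN_SQL = "SELECT SUM(rate) FROM leads WHERE date(dt) = ?"
# Sargable: the bare column is compared against a half-open range [day, day + 1).
RANGE_SQL = "SELECT SUM(rate) FROM leads WHERE dt >= ? AND dt < ?"


def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


day = date(2021, 6, 11)
start, end = day.isoformat(), (day + timedelta(days=1)).isoformat()

fn_total = conn.execute(FN_SQL, (start,)).fetchone()[0]
range_total = conn.execute(RANGE_SQL, (start, end)).fetchone()[0]
fn_plan = plan(FN_SQL, (start,))
range_plan = plan(RANGE_SQL, (start, end))

print(fn_total, fn_plan)        # same total, but the plan is a full scan
print(range_total, range_plan)  # same total, found through idx_leads_dt
```

Both predicates select the same rows; only the second lets the planner search the index.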

You can try this:
UPDATE records
SET earnings = (SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id AND
leads.datetime >= records.date and
leads.datetime < date_add(records.date, interval 1 day)
);
You need an index on leads(user_id, datetime) for this to be fast.
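A minimal sketch of this UPDATE, run against Python's sqlite3 module as a stand-in for MySQL. SQLite's date(x, '+1 day') replaces DATE_ADD(x, INTERVAL 1 DAY), the datetime column is renamed dt, and the rows are made up. A day with no leads ends up with NULL earnings, since SUM over no rows is NULL (wrap it in COALESCE(..., 0) if you want 0):

```python
import sqlite3

# SQLite stand-in for the MySQL UPDATE above; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (user_id INTEGER, dt TEXT, rate REAL);
    CREATE INDEX idx_leads_user_dt ON leads (user_id, dt);
    CREATE TABLE records (user_id INTEGER, date TEXT, earnings REAL);
""")
conn.executemany("INSERT INTO leads VALUES (?, ?, ?)", [
    (1, "2021-06-11 09:00:00", 2.0),
    (1, "2021-06-11 23:59:59", 3.0),
    (1, "2021-06-12 00:00:00", 7.0),  # belongs to the next day
    (2, "2021-06-11 12:00:00", 4.0),  # different user
])
conn.executemany("INSERT INTO records (user_id, date) VALUES (?, ?)", [
    (1, "2021-06-11"), (1, "2021-06-12"), (2, "2021-06-11"), (2, "2021-06-12"),
])

# The bare dt column stands alone on one side of each comparison, so the
# (user_id, dt) index can drive the lookup for every records row.
conn.execute("""
    UPDATE records
    SET earnings = (SELECT SUM(rate)
                    FROM leads
                    WHERE leads.user_id = records.user_id
                      AND leads.dt >= records.date
                      AND leads.dt < date(records.date, '+1 day'))
""")
result = conn.execute(
    "SELECT user_id, date, earnings FROM records ORDER BY user_id, date"
).fetchall()
print(result)
```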

Related

MySQL Table With UNIX Timestamps - Sorting By Timestamp And Calculating Difference Between One Row And Previous

Similar Questions
First off, I'm pretty sure something similar must be answered somewhere but I haven't been able to find it. Here are some similar pages that aren't what I'm looking for:
MySQL - UPDATE query based on SELECT Query
That one calculates the time difference between two rows whose IDs are both already known; in my case I have no idea what the ID of the previous row will be. It also uses DATETIME, whereas my table stores a simple Unix timestamp.
Calculate the time difference between two timestamps in mysql
Seems to depend on functions related to a specific TIMESTAMP type.
Difference between current and previous timestamp
Seems to be asking the right thing but his table doesn't have realistic timestamps and I can't get any of the answers to work.
MySQL: how to get the difference between two timestamps in seconds
This one is unrelated. My time is already in timestamps so I can just subtract.
Calculate delta(difference of current and previous row) mysql group by specific column
This one is pretty close, but it's a SELECT query and I'm trying to UPDATE the table to have this information in a new column so I can use it in subsequent queries. The DATEDIFF it uses could probably easily be converted to simple subtraction. I don't really want to have to set up a new table with all the possible difference values in seconds.
What I'm Trying To Do
In Excel, I could take rows, sort them by a 'timestamp' column, and then set a new column (call it 'delta') which is equal to one timestamp minus the previous timestamp. So I'll get a value in seconds which is the time that's passed between one timestamp and the previous one. If a row was, for example, 1 second after the previous, the value would be '1' or if it was a minute later it would be '60'. All of the timestamps are Unix timestamps, so it's just seconds since January 1st 1970.
It's easy to add the new column in MySQL, but I can't seem to find the right query to populate it.
Here's an example table with the delta column filled in:
id    timestamp     delta
3     1623400800    NULL
2     1623444000    43200
56    1623444060    60
Solution Constraints
Ideally, since there are a lot of rows, I'd like something that functions similarly to what I'd do in Excel, for efficiency. That is, sorting the table and filling in the delta based on the data of the sorted table.
Granted, if that's not possible, then a query that has to do an individual search to populate 'delta' for every row is probably acceptable for the time being. I'll just have to run it a lot of times on portions of the data.

You would just use lag():
select t.*,
(timestamp - lag(timestamp) over (order by timestamp)) as delta
from t;
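A quick way to try this query on the example data, using Python's sqlite3 module as a stand-in for MySQL 8.0 (LAG() needs SQLite 3.25 or newer, which recent Python builds bundle). The table t mirrors the example table in the question:

```python
import sqlite3

# SQLite stand-in for MySQL 8.0; table "t" mirrors the question's example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, timestamp INTEGER, delta INTEGER)")
conn.executemany("INSERT INTO t (id, timestamp) VALUES (?, ?)",
                 [(3, 1623400800), (2, 1623444000), (56, 1623444060)])

# LAG() looks back one row in timestamp order; the first row has no
# predecessor, so its delta is NULL (None in Python).
rows = conn.execute("""
    SELECT id, timestamp,
           timestamp - LAG(timestamp) OVER (ORDER BY timestamp) AS delta
    FROM t
    ORDER BY timestamp
""").fetchall()
for row in rows:
    print(row)
```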
In older versions of MySQL, you can do this rather painfully using variables:
select t.*,
(timestamp -
(case when (@tempt := @prevt) = null -- never happens
then 0
when (@prevt := timestamp) = null -- never happens
then 0
else @tempt
end)
) as delta
from (select t.* from t order by timestamp) t cross join
(select @prevt := null) params;
Note that this uses a case expression to implement "sequential" logic for the variable processing. This is a hack, and you can see why MySQL 8.0 switched to standard window functions for this type of operation.
You can try having an index on (timestamp) to improve performance.
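The variable trick does in SQL what the question describes doing in Excel: walk the rows in timestamp order and carry the previous value forward to the next row. A minimal Python sketch of that sequential logic (the function name deltas is hypothetical), handy for checking the SQL results on a sample:

```python
def deltas(timestamps):
    """Return (timestamp, delta) pairs in sorted order; the first delta is None."""
    prev = None  # plays the role of the "previous timestamp" variable
    out = []
    for ts in sorted(timestamps):
        out.append((ts, None if prev is None else ts - prev))
        prev = ts  # remember this row for the next one
    return out


result = deltas([1623444000, 1623400800, 1623444060])
print(result)
```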
EDIT:
If you have an index on timestamp and the timestamps are unique, then you can do:
update t join
(select t.*,
(timestamp - lag(timestamp) over (order by timestamp)) as delta
from t
) tt
using (timestamp)
set t.delta = tt.delta;
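A minimal sketch of the same write-back, using Python's sqlite3 module as a stand-in for MySQL. MySQL's UPDATE ... JOIN is replaced here by a SELECT with LAG() followed by one UPDATE per row, matched on the (assumed unique) timestamp, as in the using (timestamp) join above. The index name idx_t_timestamp is hypothetical, and LAG() needs SQLite 3.25+:

```python
import sqlite3

# SQLite stand-in; rows match the question's example table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, timestamp INTEGER, delta INTEGER);
    CREATE UNIQUE INDEX idx_t_timestamp ON t (timestamp);
""")
conn.executemany("INSERT INTO t (id, timestamp) VALUES (?, ?)",
                 [(3, 1623400800), (2, 1623444000), (56, 1623444060)])

# Compute (delta, timestamp) pairs with the window function ...
computed = conn.execute("""
    SELECT timestamp - LAG(timestamp) OVER (ORDER BY timestamp), timestamp
    FROM t
""").fetchall()
# ... then write each delta back to the row with the matching timestamp.
with conn:  # commits all updates as one transaction
    conn.executemany("UPDATE t SET delta = ? WHERE timestamp = ?", computed)

stored = conn.execute("SELECT id, timestamp, delta FROM t ORDER BY timestamp").fetchall()
print(stored)
```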

What keys should be indexed here to make this query optimal

I have a query that looks like the following:
SELECT * from foo
WHERE days >= DATEDIFF(CURDATE(), last_day)
In this case, days is an INT. last_day is a DATE column.
So do I need two separate indexes here, one for days and one for last_day?

This query predicate, days >= DATEDIFF(CURDATE(), last_day), is inherently not sargable.
If you keep the present table design you'll probably benefit from a compound index on (last_day, days). Nevertheless, satisfying the query will require a full scan of that index.
Single-column indexes on either one of those columns, or both, will be useless or worse for improving this query's performance.
If you must have this query perform very well, you need to reorganize your table a bit. Let's figure that out. It looks like you are trying to exclude "overdue" records, i.e. you want expiration_date >= CURDATE(), because days >= DATEDIFF(CURDATE(), last_day) is the same as last_day + days >= CURDATE(). That is a sargable search predicate.
So if you added a new column expiration_date to your table, and then set it as follows:
UPDATE foo SET expiration_date = last_day + INTERVAL days DAY
and then indexed it, you'd have a well-performing query.
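Rearranging the predicate shows why this works: days >= DATEDIFF(CURDATE(), last_day) means days >= CURDATE() - last_day (in days), which is the same as last_day + days >= CURDATE(), i.e. expiration_date >= CURDATE(). A minimal check of that equivalence, using Python's sqlite3 module as a stand-in for MySQL: a julianday() difference replaces DATEDIFF, date(..., '+N days') replaces + INTERVAL, and a fixed date replaces CURDATE(). The rows are made up for illustration:

```python
import sqlite3

# SQLite stand-in; "today" is fixed instead of CURDATE() so the check is repeatable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, last_day TEXT, days INTEGER, "
             "expiration_date TEXT)")
conn.executemany("INSERT INTO foo (id, last_day, days) VALUES (?, ?, ?)", [
    (1, "2021-06-01", 5),   # expired 2021-06-06
    (2, "2021-06-01", 10),  # expires exactly on "today"
    (3, "2021-06-10", 30),  # expires 2021-07-10
    (4, "2021-05-01", 40),  # expired 2021-06-10
])
# Equivalent of: UPDATE foo SET expiration_date = last_day + INTERVAL days DAY
conn.execute("UPDATE foo SET expiration_date = date(last_day, '+' || days || ' days')")
conn.execute("CREATE INDEX idx_foo_expiration ON foo (expiration_date)")

today = "2021-06-11"
# Original, non-sargable predicate: days >= DATEDIFF(today, last_day)
original = [r[0] for r in conn.execute(
    "SELECT id FROM foo WHERE days >= julianday(?) - julianday(last_day) ORDER BY id",
    (today,))]
# Rewritten, sargable predicate on the indexed column
rewritten = [r[0] for r in conn.execute(
    "SELECT id FROM foo WHERE expiration_date >= ? ORDER BY id", (today,))]
print(original, rewritten)
```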

Be careful with indexes: they can speed up reads, but they can reduce insert performance.
You might consider partitioning the table on the last_day field.
I would try creating an index only on the last_day field, but I think the best approach is to run some performance tests with different configurations.

Since you are using an expression in the WHERE criteria, MySQL will not be able to use an index on either of the two fields. If you use this query regularly and you have at least MySQL 5.7.8, you can create a generated column for the part of the expression that does not depend on the current date (for example last_day + INTERVAL days DAY) and create an index on it; the generated column itself cannot use a nondeterministic function such as CURDATE().
The other option is to create a regular column, set its value to the result of that expression, and index this column. You will need triggers to keep it up to date.
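A minimal sketch of the trigger approach, using Python's sqlite3 module as a stand-in for MySQL (MySQL trigger syntax differs; there, a BEFORE trigger can assign NEW.expiration_date directly). The expiration_date column and the trigger names are hypothetical; the derived value is last_day plus days, the part of the expression that does not depend on the current date:

```python
import sqlite3

# SQLite stand-in; table, column, and trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, last_day TEXT, days INTEGER,
                      expiration_date TEXT);
    CREATE INDEX idx_foo_expiration ON foo (expiration_date);

    CREATE TRIGGER foo_exp_insert AFTER INSERT ON foo
    BEGIN
        UPDATE foo SET expiration_date = date(NEW.last_day, '+' || NEW.days || ' days')
        WHERE id = NEW.id;
    END;

    CREATE TRIGGER foo_exp_update AFTER UPDATE OF last_day, days ON foo
    BEGIN
        UPDATE foo SET expiration_date = date(NEW.last_day, '+' || NEW.days || ' days')
        WHERE id = NEW.id;
    END;
""")

# The insert trigger fills the derived column ...
conn.execute("INSERT INTO foo (id, last_day, days) VALUES (1, '2021-06-01', 10)")
after_insert = conn.execute("SELECT expiration_date FROM foo WHERE id = 1").fetchone()[0]
# ... and the update trigger recomputes it when an input column changes.
conn.execute("UPDATE foo SET days = 20 WHERE id = 1")
after_update = conn.execute("SELECT expiration_date FROM foo WHERE id = 1").fetchone()[0]
print(after_insert, after_update)
```

The query can then filter with expiration_date >= CURDATE() against the indexed column.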

SQL query using a date field to ensure I use my index

I have a table with date column named day. The way I have it indexed is using multiple keys:
KEY user_id (user_id,day)
I want to make sure the index is used properly when I query every row for a user_id from the beginning of the month up to a given day in the month. For example, if I want every day from the beginning of the month until today, what's the best way to write the query so that it hits the index? Here's what I have so far:
select * from table_name
WHERE user_id = 1
AND (day between DATE_FORMAT(NOW() ,'%Y-%m-01') AND NOW() )

Use EXPLAIN to check whether your query uses the index you've created.
For example:
explain
select * from table_name
WHERE user_id = 1
AND (day between DATE_FORMAT(NOW() ,'%Y-%m-01') AND NOW() )
This shows the details of the query plan chosen by the MySQL optimizer. In that plan:
possible_keys should include user_id; this column lists the indexes that could be used to fetch rows from the table.
key should show user_id; this is the index actually used to get the results.
Extra gives additional information: "Using index" means the whole result set can be read from the index itself, "Using where" means the WHERE condition is applied to filter rows, and so on.
Because your query compares user_id with a constant (user_id = 1), the optimizer can go straight to that user's entries in the (user_id, day) index and then apply the range check on day within them, which greatly reduces the number of records scanned. The index will most likely be used as long as user_id is selective enough, i.e. no single user_id value accounts for a large share of the rows.
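The same kind of check can be made offline. A minimal sketch using Python's sqlite3 module as a stand-in for MySQL, where EXPLAIN QUERY PLAN plays the role of EXPLAIN and the month-start bound is computed in Python with date.replace(day=1), like DATE_FORMAT(NOW(), '%Y-%m-01'). The index name idx_user_day and the fixed "today" are hypothetical:

```python
import sqlite3
from datetime import date

# SQLite stand-in; the index name idx_user_day is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (id INTEGER PRIMARY KEY, user_id INTEGER, day TEXT);
    CREATE INDEX idx_user_day ON table_name (user_id, day);
""")

today = date(2021, 6, 11)            # fixed instead of NOW() for repeatability
month_start = today.replace(day=1)   # first day of the current month

sql = "SELECT * FROM table_name WHERE user_id = ? AND day BETWEEN ? AND ?"
params = (1, month_start.isoformat(), today.isoformat())
# Each plan row is (id, parent, notused, detail); keep the detail text.
detail = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
print(detail)
```

A SEARCH step on idx_user_day means the equality on user_id and the range on day are both served by the index.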

Why does MySQL drop my index when using DATE(`table`.`column`)

I have a MySQL InnoDB table with a few columns.
One of them, named "dateCreated", is a DATETIME column, and it is indexed.
My query:
SELECT
*
FROM
`table1`
WHERE
DATE(`dateCreated`) BETWEEN '2014-8-7' AND '2013-8-7'
MySQL for some reason refuses to use the index on the dateCreated column (even with USE INDEX or FORCE INDEX).
However, if I change the query to this:
SELECT
*
FROM
`table1`
WHERE
`dateCreated` BETWEEN '2014-8-7' AND '2013-8-7'
note the DATE(...) removal
MySQL uses the index just fine.
I could manage without using the DATE() function, but this is just weird to me.
I understand that maybe MySQL indexes the full date and time, and when searching on only part of it, it gets confused or something. But there must be a way to use a partial date (let's say MONTH(...) or DATE(...)) and still benefit from the indexed column and avoid the full table scan.
Any thoughts..?
Thanks.

As you have observed, once you apply a function to that field you lose access to the index. So:
It will also help if you don't use BETWEEN. The reason for applying the function to the data is to make the data match the parameters. But there are just 2 parameter dates versus several hundred? thousand? million? rows of data. Why not reverse this and change the parameters to suit the data? (That makes it a "sargable" predicate.)
SELECT
*
FROM
`table1`
WHERE
( `dateCreated` >= '2013-08-07' AND `dateCreated` < '2014-08-07' )
;
Note that 2013-08-07 comes first; this must also be true if you use BETWEEN. You will not get any results from BETWEEN if the first date is later than the second.
Also note that >= '2013-08-07' AND < '2014-08-07' covers exactly 12 months of data; I presume this is what you are seeking.
Combining DATE(dateCreated) with BETWEEN would include one day too many, because all events during 2014-08-07 would be included. If you deliberately want one year and one day, add 1 day to the upper date, i.e. < '2014-08-08'.
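The three behaviors described above can be checked with a few made-up rows, using Python's sqlite3 module as a stand-in for MySQL (SQLite's date() plays the role of MySQL's DATE()). The index name idx_dateCreated and the helper count() are hypothetical:

```python
import sqlite3

# SQLite stand-in; rows and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, dateCreated TEXT);
    CREATE INDEX idx_dateCreated ON table1 (dateCreated);
""")
conn.executemany("INSERT INTO table1 (dateCreated) VALUES (?)", [
    ("2013-08-06 23:00:00",),  # before the range
    ("2013-08-07 00:00:00",),
    ("2014-01-15 12:00:00",),
    ("2014-08-06 23:59:59",),
    ("2014-08-07 10:00:00",),  # on the end date itself
])


def count(where, *params):
    """Count rows of table1 matching a WHERE clause."""
    sql = "SELECT COUNT(*) FROM table1 WHERE " + where
    return conn.execute(sql, params).fetchone()[0]


# BETWEEN with the later date first matches nothing.
reversed_between = count("date(dateCreated) BETWEEN ? AND ?", "2014-08-07", "2013-08-07")
# DATE() + BETWEEN also includes everything on 2014-08-07.
date_between = count("date(dateCreated) BETWEEN ? AND ?", "2013-08-07", "2014-08-07")
# Half-open range on the bare column: exactly 12 months, and index-friendly.
half_open = count("dateCreated >= ? AND dateCreated < ?", "2013-08-07", "2014-08-07")
print(reversed_between, date_between, half_open)
```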

Two(with subquery) or one query to select max(date) in where clause. MySQL

I need to create a table to store the cached status of some events. So I will only have to do two operations:
1) Insert the id of an event, its status, and the time when this record was stored in the db;
2) Get the last record for a certain event id.
There are several methods to get the result (status):
Method 1:
SELECT status FROM status_log a
WHERE a.event_id = 1
ORDER BY a.update_date DESC
LIMIT 1
Method 2:
SELECT status FROM status_log a
WHERE a.update_date = (
SELECT max(b.update_date) FROM status_log b
WHERE b.event_id = 1
) AND a.event_id = 1
So I have two questions:
Which query to use
Which field type to set to update_date field (int or timestamp)

Actually, your second query does not answer the question 'find the record with the latest update date for event #1', because there could be many different events with the same latest update_date. So, in terms of semantics, you should use the first query. (After your edit this is fixed.)
The first query will be efficient if you create an index on event_id and that column has good cardinality (i.e. the WHERE clause filters down to few enough rows using that index). It can be improved further by adding the update_date column to the index, but that only makes sense if there are many rows with the same event_id (enough for MySQL to use the second part of the index), again with good cardinality within the first part.
In practice, though, my advice is just theory; you'll have to check it with EXPLAIN and your own measurements on real data.
As for the data type, common practice is to use the proper data type (i.e. DATETIME/TIMESTAMP for something that represents a point in time).

Which query to use
I believe the first one should be faster. Anyway just run an EXPLAIN on them and you'll find out yourself.
The index you should be using will be:
ALTER TABLE status_log ADD INDEX (event_id, update_date);
Now... did you notice that those queries are NOT equivalent? The second one will return the status of every row, from any event_id, whose update_date equals that maximum date.
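A small sketch of both points, using Python's sqlite3 module as a stand-in for MySQL and made-up rows: Method 1 uses the (event_id, update_date) index and returns a single row, while Method 2 without the outer a.event_id = 1 filter (the version before the question was edited) also picks up another event that shares the same maximum date. The index name idx_event_date is hypothetical:

```python
import sqlite3

# SQLite stand-in; rows and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status_log (event_id INTEGER, status TEXT, update_date TEXT);
    CREATE INDEX idx_event_date ON status_log (event_id, update_date);
""")
conn.executemany("INSERT INTO status_log VALUES (?, ?, ?)", [
    (1, "queued", "2021-06-11 09:00:00"),
    (1, "running", "2021-06-11 09:05:00"),
    (1, "done", "2021-06-11 09:30:00"),
    (2, "failed", "2021-06-11 09:30:00"),  # same timestamp, different event
])

METHOD_1 = """SELECT status FROM status_log a
              WHERE a.event_id = ? ORDER BY a.update_date DESC LIMIT 1"""
# Method 2 without the outer event_id filter: the subquery's maximum date is
# shared by events 1 and 2, so rows from both events match.
PRE_EDIT_METHOD_2 = """SELECT status FROM status_log a
                       WHERE a.update_date = (SELECT max(b.update_date)
                                              FROM status_log b
                                              WHERE b.event_id = ?)"""

method_1 = conn.execute(METHOD_1, (1,)).fetchall()
pre_edit = conn.execute(PRE_EDIT_METHOD_2 + " ORDER BY status", (1,)).fetchall()
plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + METHOD_1, (1,))]
print(method_1, pre_edit, plan)
```

With the composite index, Method 1 can read the newest entry for the event directly from the index, without a separate sort step.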
Which field type to set to update_date field (int or timestamp)
If you have a field named update_date, I just can't imagine why an int would serve the same purpose. If we rephrase the question as a choice between DATETIME and TIMESTAMP, the answer depends on the requirements. If you just want to know when a record in the DB was updated, use a TIMESTAMP. If update_date refers to an entity in your domain model, go for a DATETIME. You will most likely need to perform calculations on the date (add time, subtract time, extract a month, etc.), so using a Unix timestamp stored as an int (which I'd say should be almost write-only) will cost extra computation, because you'll have to convert it to a datetime and then apply the function to that result.
