MySQL multiple Date subqueries

I am writing a query for a very specific report that contains a variable number of columns, based on specific relationships of an item. I am open to suggestions on how to change the query if need be, but I don't think it can be changed. I would prefer to keep this as a single query, as opposed to running it in a loop. The table being searched contains around 4 million records and cannot be archived.
What I would like to know is why the DATEADD index is not being used in the subquery, although it is being used in the outer query, which is on the same table. I am aware that wrapping a field in a function stops MySQL from using an index on it, but here the functions are applied only to the value being compared against, not to the field itself.
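For illustration (using the same table and column names as the query below), the usual rule looks like this:
-- Not sargable: the function is applied to the indexed column,
-- so MySQL cannot use a range scan on DATEADD.
SELECT COUNT(*) FROM ATABLE WHERE DATE(DATEADD) = '2010-09-01';
-- Sargable: the column is bare; only the comparison values are computed.
SELECT COUNT(*) FROM ATABLE
WHERE DATEADD >= '2010-09-01 00:00:00'
AND DATEADD <= '2010-09-01 23:59:59';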
The result of the report is a number for each specific item (subquery) for each date in the range where something took place. The date range is generated dynamically. Each subquery should return the results for a single day.
We are using MySQL version 5.0.77, which we cannot change as it is managed by our hosting provider.
Here is the query:
SELECT DATE_FORMAT(DATEADD, '%d/%m/%y') AS DATEADD,
(SELECT COUNT(ID)
FROM ATABLE AS VT
WHERE ELEMNAME = 'ANELEMENT' AND COMPID = 132
AND VT.DATEADD BETWEEN CONCAT(DATE(V.DATEADD)," 00:00:00") AND CONCAT(DATE(V.DATEADD)," 23:59:59"))
AS '132',
(SELECT COUNT(ID)
FROM ATABLE AS VT
WHERE ELEMNAME = 'ANELEMENT' AND COMPID = 149
AND VT.DATEADD BETWEEN CONCAT(DATE(V.DATEADD)," 00:00:00") AND CONCAT(DATE(V.DATEADD)," 23:59:59"))
AS '149'
FROM ATABLE AS V
WHERE 1 = 1 AND COMPID = 132
AND (V.DATEADD >= "2010-09-01 00:00:00"
AND V.DATEADD <= "2010-10-26 23:59:59")
AND 1 = 1
AND ELEMNAME = 'ANELEMENT'
GROUP BY DATE_FORMAT(DATEADD, '%Y-%m-%d')
The number of times the subquery runs depends on the number of links the item has, and is determined when the query is built.
We have tried replacing the BETWEEN with:
"VT.DATEADD >= DATE(V.DATEADD) AND VT.DATEADD <= DATE(V.DATEADD) + 1"
but this doesn't work either. Changing it to:
"VT.DATEADD = DATE(V.DATEADD)"
does use the index, but doesn't return the correct number of rows, as DATEADD is a datetime. If we change it to hard-coded literals:
"VT.DATEADD >= "2010-09-01" AND VT.DATEADD <= "2010-09-02""
the index is used, but the dates are then fixed rather than following each date in the outer query.
The output from Explain is
+----+--------------------+-------+-------+-------------------------+----------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+-------------------------+----------+---------+-------+-------+----------------------------------------------+
| 1 | PRIMARY | V | range | DATEADD,COMPID,ELEMNAME | DATEADD | 8 | NULL | 1386 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | VT | ref | COMPID,ELEMNAME | ELEMNAME | 103 | const | 44277 | Using where |
+----+--------------------+-------+-------+-------------------------+----------+---------+-------+-------+----------------------------------------------+
Using USE INDEX or FORCE INDEX (when the index is listed as available but not used) still results in a NULL key.
Without fixing this, the query runs incredibly slowly even over a tiny date range, and locks the database up.

I don't know if I'm oversimplifying what you want overall, but will this one work for you? It appears you want to know how much activity there was for two compid values within a given date range.
SELECT
DATE_FORMAT(DATEADD, '%Y-%m-%d'),
SUM(IF(compid = 132, 1, 0)) AS Count132,
SUM(IF(compid = 149, 1, 0)) AS Count149
FROM
ATable
WHERE
elemname = "ANELEMENT"
AND (compid = 132 OR compid = 149)
AND DATEADD BETWEEN "2010-09-01 00:00:00" AND "2010-10-26 23:59:59"
GROUP BY
DATE_FORMAT(DATEADD, '%Y-%m-%d')
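If the columns are as in the original query, a composite index along these lines might help this version run as a pure range scan (a sketch; the index name is made up and this is untested against the real schema):
ALTER TABLE ATable ADD INDEX idx_elem_comp_date (elemname, compid, dateadd);
With equality on elemname, an OR over two compid values, and a range on dateadd, this column order lets MySQL satisfy the whole WHERE from the index.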


How to get data between start and expiration date if date is not empty or null?

I am trying to select offers between two dates, a start date and an expiration date; if the expiration date is empty or null, the offer should always be shown.
Table
+----------------+---------------------+---------------------+
| deal_title | deal_start | deal_expire |
+----------------+---------------------+---------------------+
| Example Deal | 10-24-2021 16:10:00 | 10-25-2021 16:10:00 |
| Example Deal 2 | 10-24-2021 16:10:00 | NULL |
+----------------+---------------------+---------------------+
PHP function to get the current date by timezone:
function getDateByTimeZone(){
$date = new DateTime("now", new DateTimeZone("Europe/London") );
return $date->format('m-d-Y H:i:s');
}
MySQL query:
SELECT deals.*, categories.category_title AS category_title
FROM deals
LEFT JOIN categories ON deal_category = categories.category_id
WHERE deals.deal_status = 1
AND deals.deal_featured = 1
AND deals.deal_start >= '".getDateByTimeZone()."'
AND '".getDateByTimeZone()."' < deals.deal_expire
OR deals.deal_expire IS NULL
OR deals.deal_expire = ''
GROUP BY deals.deal_id ORDER BY deals.deal_created DESC
You didn't really explain what problem you're having, but having written queries like this many times in the past, I suspect you need parentheses around the expiration side of your date qualifications.
WHERE deals.deal_status = 1
AND deals.deal_featured = 1
AND deals.deal_start >= '".getDateByTimeZone()."'
AND (
'".getDateByTimeZone()."' < deals.deal_expire
OR deals.deal_expire IS NULL
)
If you don't put parentheses around your OR clause, operator precedence will cause the whole WHERE clause to be true whenever the expire date is NULL, and that's not what you want. You want a compound OR clause here.
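A quick way to see the precedence (AND binds tighter than OR in MySQL):
SELECT 0 AND 0 OR 1 AS without_parens, -- returns 1: parsed as (0 AND 0) OR 1
0 AND (0 OR 1) AS with_parens; -- returns 0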
I don't think you need to compare against the empty string either; I'm assuming you put that in there while trying to figure things out, so I left it out of my sample code.
Also, I'm not familiar enough with PHP string interpolation to know if there's an issue with the way you're interpolating the result of the getDateByTimeZone function into that query. It looks funky to me based on past experience with PHP, but I'm ignoring that part under the assumption that there's something wrapping this code which resolves it correctly.
The best would be to have proper MySQL datetimes in your database from the start.
But you can do it all in MySQL.
STR_TO_DATE will cost time every time it runs.
When you put parentheses around all of the expire-date conditions, the group evaluates to true if any one of them is true.
CREATE TABLE deals (
deal_id int,
deal_status int,
deal_featured int,
deal_category int,
`deal_title` VARCHAR(14),
`deal_start` VARCHAR(19),
`deal_expire` VARCHAR(19)
,deal_created DATETIME
);
INSERT INTO deals
(deal_id,deal_status,deal_featured,deal_category,`deal_title`, `deal_start`, `deal_expire`,deal_created)
VALUES
(1,1,1,1,'Example Deal', '10-24-2021 16:10:00', '10-25-2021 16:10:00',NOW()),
(2,1,1,1,'Example Deal 2', '10-24-2021 16:10:00', NULL,NOW());
CREATE TABLE categories (category_id int, category_title varchar(20));
INSERT INTO categories VALUES (1,'test');
SELECT
deals.deal_id, MIN(`deal_title`), MIN(`deal_start`), MIN(`deal_expire`),MIN(deals.deal_created) as deal_created , MIN(categories.category_title)
FROM
deals
LEFT JOIN
categories ON deal_category = categories.category_id
WHERE
deals.deal_status = 1
AND deals.deal_featured = 1
AND STR_TO_DATE(deals.deal_start, "%m-%d-%Y %H:%i:%s") >= NOW() - INTERVAL 1 DAY
AND (NOW() < STR_TO_DATE(deals.deal_expire, "%m-%d-%Y %H:%i:%s")
OR deals.deal_expire IS NULL
OR deals.deal_expire = '')
GROUP BY deals.deal_id
ORDER BY deal_created DESC
deal_id | MIN(`deal_title`) | MIN(`deal_start`) | MIN(`deal_expire`) | deal_created | MIN(categories.category_title)
------: | :---------------- | :------------------ | :------------------ | :------------------ | :-----------------------------
1 | Example Deal | 10-24-2021 16:10:00 | 10-25-2021 16:10:00 | 2021-10-24 22:42:34 | test
2 | Example Deal 2 | 10-24-2021 16:10:00 | null | 2021-10-24 22:42:34 | test
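As an aside, the one-time conversion recommended above (storing real datetimes) could look roughly like this (a sketch only: the _dt column names are invented, the %m-%d-%Y format is assumed from the sample data, and you should back up the table first):
ALTER TABLE deals
ADD COLUMN deal_start_dt DATETIME,
ADD COLUMN deal_expire_dt DATETIME;
UPDATE deals
SET deal_start_dt = STR_TO_DATE(deal_start, '%m-%d-%Y %H:%i:%s'),
deal_expire_dt = STR_TO_DATE(NULLIF(deal_expire, ''), '%m-%d-%Y %H:%i:%s');
After that, the bare columns can be compared and indexed directly, with no STR_TO_DATE in the WHERE clause.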

Unexpectedly slow MySQL query on newly indexed data

I've got the following query:
SELECT DISTINCT
CONCAT(COALESCE(location.google_id, ''),
'-',
COALESCE(locationData.resolution, ''),
'-',
COALESCE(locationData.time_slice, '')) AS google_id
FROM
LocationData AS locationData
JOIN
Location AS location ON location.id = locationData.location_id
WHERE
location.company_google_id = 5679037876797440
AND location.google_id IN (4679055472328704, 6414382784315392, 5747093579759616)
AND locationData.resolution = 8
AND locationData.time_slice >= ((SELECT max(s.time_slice) FROM LocationData as s WHERE s.location_id = location.id ORDER BY s.time_slice ASC) - 255)
AND location.active = TRUE
ORDER BY location.google_id ASC , locationData.time_slice ASC
LIMIT 0 , 101
I've got indices on all columns in the WHERE and ORDER BY clauses, and I've added a compound index on (LocationData.time_slice, LocationData.location_id).
Running EXPLAIN gives:
+----+--------------------+--------------+-------+---------------------------------------------+--------------------+---------+--------------------+------+----------------------------------------------------------------------+
| id | select_type        | table        | type  | possible_keys                               | key                | key_len | ref                | rows | Extra                                                                |
+----+--------------------+--------------+-------+---------------------------------------------+--------------------+---------+--------------------+------+----------------------------------------------------------------------+
| 1  | PRIMARY            | location     | range | PRIMARY,google_id_UNIQUE                    | google_id_UNIQUE   | 8       | NULL               | 3    | Using index condition; Using where; Using temporary; Using filesort  |
| 1  | PRIMARY            | locationData | ref   | max_time_slice_idx,max_time_slice_idx_desc  | max_time_slice_idx | 5       | index2.location.id | 301  | Using where                                                          |
| 2  | DEPENDENT SUBQUERY | s            | ref   | max_time_slice_idx,max_time_slice_idx_desc  | max_time_slice_idx | 5       | index2.location.id | 301  | Using index                                                          |
+----+--------------------+--------------+-------+---------------------------------------------+--------------------+---------+--------------------+------+----------------------------------------------------------------------+
I know the dependent subquery is slow, and I'm open to suggestions for getting similar behavior, but I'm seeing this query take about 92 seconds to run, roughly four orders of magnitude slower than on the test data I ran before adding the new compound index to production.
Is there index building that happens after the ALTER statement is run? Is there some way to check that the index is performing correctly?
Row counts for the two tables:
Production:
Location: 6,814
LocationData: 13,070,888
Test Data:
Location: 626
LocationData: 594,780
Any thoughts or suggestions are appreciated. Thanks in advance!
Just a suggestion: you could avoid the subselect by using an inner join.
SELECT DISTINCT
CONCAT(COALESCE(location.google_id, ''),
'-',
COALESCE(locationData.resolution, ''),
'-',
COALESCE(locationData.time_slice, '')) AS google_id
FROM LocationData AS locationData
INNER JOIN Location AS location ON location.id = locationData.location_id
INNER JOIN (
SELECT s.location_id, max(s.time_slice) -255 my_max_time_slice
FROM LocationData as s
GROUP BY s.location_id
) t ON t.location_id = location.id
WHERE
location.company_google_id = 5679037876797440
AND location.google_id IN (4679055472328704, 6414382784315392, 5747093579759616)
AND locationData.resolution = 8
AND locationData.time_slice >= t.my_max_time_slice
AND location.active = TRUE
ORDER BY location.google_id ASC , locationData.time_slice ASC
LIMIT 0 , 101
This way you avoid repeating the subquery for each id, using a single query to build the aggregated result for the max time_slice.
Hope this is useful.
(Adding onto the recommendation by @scaisEdge...)
WHERE l.company_google_id = 5679037876797440
AND l.google_id IN (4679055472328704, 6414382784315392, 5747093579759616)
AND ld.resolution = 8
AND ld.time_slice >= t.my_max_time_slice
AND l.active = TRUE
ORDER BY l.google_id ASC , ld.time_slice ASC
Optimal indexes, assuming the subquery needs to be run first (which is the case for older versions of MySQL), are:
LocationData: (location_id, time_slice) -- in this order, for the subquery
locationData: (time_slice, resolution, location_id) -- for JOIN
If id is the PRIMARY KEY of location, no extra index is needed there.
For newer versions, the subquery can be materialized and a suitable index can be built. In this case, it would probably start with location:
location: (company_google_id, active, -- in either order
google_id) -- last
locationData: (location_id, time_slice) -- in this order (for subquery)
locationData: (location_id, resolution, -- in either order (for JOIN)
time_slice) -- last
There is no way to optimize the ORDER BY since it hits two tables, nor any way to avoid a sort.
I suggest you add all of those indexes, then post EXPLAIN SELECT ... if you need to discuss it further. SHOW CREATE TABLE would also be handy.
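Spelled out as DDL, the newer-version suggestion would look roughly like this (the index names are invented):
CREATE INDEX loc_cga_idx ON Location (company_google_id, active, google_id);
CREATE INDEX ld_subquery_idx ON LocationData (location_id, time_slice);
CREATE INDEX ld_join_idx ON LocationData (location_id, resolution, time_slice);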

Select total of rows, total of field1 = 1, total of field1 = 0 in a single query?

I have the following table :
id | command_id | started_at | ended_at | rows_involved | completed
-----------------------------------------------------------------------------------------------
1 | 1 | 2015-05-20 12:02:25 | 2015-05-20 12:02:28 | 1 | 1
2 | 1 | 2015-05-20 12:02:47 | NULL | NULL | 0
3 | 1 | 2015-05-20 12:11:10 | NULL | NULL | 0
4 | 1 | 2015-05-20 12:11:46 | NULL | NULL | 0
5 | 1 | 2015-05-20 12:12:25 | NULL | NULL | 0
I want to fetch a COUNT of rows where started_at is '2015-05-20' AND command_id = 1, and I want to get 2 subtotals: the total of these rows where completed = 1 and the total of these rows where completed = 0.
Expected data set is then the following :
array(4) {
["totalRows"]=> 5
["name"]=> "evo:send_post_registration_mail_1"
["totalCompleted"] => 1
["totalUncompleted"] => 4
}
The "name" column is not important, is a join with another table on command_id field.
My current query is the following, but it doesn't fetch the 2 subtotals:
SELECT COUNT(s0_.id) AS totalRows, s1_.name AS name
FROM sf_command_executions s0_
INNER JOIN sf_commands s1_ ON (s1_.id = s0_.command_id)
WHERE DATE_FORMAT(s0_.started_at,'%Y-%m-%d') = '2015-05-20'
GROUP BY s0_.command_id
Can I fetch these 2 subtotals within that single query?
You can use conditional aggregation. Use an expression like this in your SELECT list...
SELECT ...
, SUM(IF(s0_.completed=1,1,0)) AS tot_completed_1
, SUM(IF(s0_.completed=0,1,0)) AS tot_completed_0
You can achieve the same thing using a (more ANSI-standards compliant) CASE expression:
, SUM(CASE WHEN s0_.completed = 1 THEN 1 ELSE 0 END) AS tot_completed_1
Or you can use even shorter MySQL shorthand, since boolean expressions return a value of 1, 0 or NULL:
, SUM(s0_.completed=1) AS tot_completed_1
EDIT
The following doesn't address the question you asked (see above for that), but I wanted to point out the predicate on the started_at column (i.e. the WHERE clause).
WHERE DATE_FORMAT(s0_.started_at,'%Y-%m-%d') = '2015-05-20'
^^^^^^^^^^^^ ^^^^^^^^^^^^
The DATE_FORMAT function wrapped around the column reference prevents MySQL from using an index range scan operation to satisfy that predicate.
That is, MySQL has to evaluate that function on every row in the table, and then compare the result from the expression to a literal value.
If started_at is defined as a DATETIME or TIMESTAMP, we can rewrite that to an equivalent condition, but on the bare started_at column. That would allow MySQL to use an index range scan operation. For example, we could get the same rows writing it like this:
WHERE s0_.started_at >= '2015-05-20'
AND s0_.started_at < '2015-05-20' + INTERVAL 1 DAY
If started_at is defined as a DATE, we could reference the bare column with an equality comparison. There's no need for a DATE_FORMAT function.
If we have to use a function to do some sort of conversion so the values can be compared, we'd prefer the function to be wrapped around the literal rather than the column reference. Around the literal, the function only has to be evaluated once.
This isn't actually required in this case, but just as an example of wrapping the literal in a function:
WHERE s0_.started_at >= STR_TO_DATE('2015-05-20','%Y-%m-%d')
AND s0_.started_at < STR_TO_DATE('2015-05-20','%Y-%m-%d') + INTERVAL 1 DAY
Note (again) that using the STR_TO_DATE function isn't actually required; this is just demonstrating a pattern. If we did need to do a conversion, we'd prefer that to be on the literal side, rather than on the column, to allow MySQL to make use of an available index on started_at.
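Putting the two pieces together (the conditional aggregates from the first part and the sargable date predicate from this part), the whole query might look like this sketch:
SELECT COUNT(s0_.id) AS totalRows,
s1_.name AS name,
SUM(s0_.completed = 1) AS totalCompleted,
SUM(s0_.completed = 0) AS totalUncompleted
FROM sf_command_executions s0_
INNER JOIN sf_commands s1_ ON (s1_.id = s0_.command_id)
WHERE s0_.started_at >= '2015-05-20'
AND s0_.started_at < '2015-05-20' + INTERVAL 1 DAY
GROUP BY s0_.command_id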
You can use a conditional sum, like this:
SELECT
COUNT(s0_.id) AS totalRows,
s1_.name AS name ,
sum(s0_.completed=1) as totalCompleted,
sum(s0_.completed=0) as totalUncompleted
FROM sf_command_executions s0_
INNER JOIN sf_commands s1_ ON (s1_.id = s0_.command_id)
WHERE DATE_FORMAT(s0_.started_at,'%Y-%m-%d') = '2015-05-20'
GROUP BY s0_.command_id
Try with this (note that the subqueries need the same date filter as the outer query, or they will count rows from all dates):
SELECT COUNT(s0_.id) AS totalRows, s1_.name AS name,
(SELECT COUNT(s2_.id) FROM sf_command_executions s2_ WHERE s0_.command_id = s2_.command_id AND s2_.completed = 1 AND DATE_FORMAT(s2_.started_at,'%Y-%m-%d') = '2015-05-20') AS totalCompleted,
(SELECT COUNT(s2_.id) FROM sf_command_executions s2_ WHERE s0_.command_id = s2_.command_id AND s2_.completed = 0 AND DATE_FORMAT(s2_.started_at,'%Y-%m-%d') = '2015-05-20') AS totalUncompleted
FROM sf_command_executions s0_
INNER JOIN sf_commands s1_ ON (s1_.id = s0_.command_id)
WHERE DATE_FORMAT(s0_.started_at,'%Y-%m-%d') = '2015-05-20'
GROUP BY s0_.command_id

Complex query with two tables and multiple date and price ranges

Let's suppose that I have these tables:
[ properties ]
id (INT, PK)
name (VARCHAR)
[ properties_prices ]
id (INT, PK)
property_id (INT, FK)
date_begin (DATE)
date_end (DATE)
price_per_day (DECIMAL)
price_per_week (DECIMAL)
price_per_month (DECIMAL)
And my visitor runs a search like: list the first 10 (pagination) properties where the price per day (the price_per_day field) is between 10 and 100 for the period from 1st May until 31st December.
I know that's a huge query, and I need to paginate the results, so I must do all the calculation and logic in only one query... that's why I'm here! :)
Questions about the problem
If there are gaps, would that be an acceptable property?
There are no gaps. All the possible dates are in the database.
If the price is between 10 and 100 in some sub-periods, but not in others, do you want to get that property?
In the perfect world, no... We'll need to calculate the "sum" of that type of price in that period considering all the variations/periods.
Also, what are the "first 10"? How are they ordered? Lowest price first? But there could be more than one price.
This is just an example of pagination with 10 results per page... Can be ordered by the FULLTEXT search that I'll add with keywords and these things... As I said, it's a pretty big query.
This is similar to the answer given by @mdma, but I use a condition in the join clause for the price range, instead of the HAVING trick.
SELECT p.id, MAX(p.name),
MIN(v.price_per_day) AS price_low,
MAX(v.price_per_day) AS price_high
FROM properties p
JOIN properties_prices v ON p.id = v.property_id
AND v.price_per_day BETWEEN 10 AND 100
AND v.date_begin < '2010-12-31' AND v.date_end > '2010-05-01'
GROUP BY p.id
ORDER BY ...
LIMIT 10;
I would also recommend creating a covering index:
CREATE INDEX prices_covering ON properties_prices
(property_id, price_per_day, date_begin, date_end);
This allows your query to run as optimally as possible, because it can read the values directly from the index. It won't have to read the rows of data from the table at all.
+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
| 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 4 | NULL | 1 | |
| 1 | SIMPLE | v | ref | prices_covering | prices_covering | 4 | test.p.id | 6 | Using where; Using index |
+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
What you tell us is not precise enough. From your data structure and your question I assume:
the price of a property can change in that period, and there would be a properties_price entry for each sub-period
there should be no overlaps in the sub-periods, but the data structure does not guarantee that
there can be gaps in the sub-periods
But there are still questions:
If there are gaps, would that be an acceptable property?
If the price is between 10 and 100 in some sub-periods, but not in others, do you want to get that property?
Also, what are the "first 10"? How are they ordered? Lowest price first? But there could be more than one price.
Depending on the answers, there might be no single query doing the trick. But if you accept the gaps, this could return what you want:
SELECT *
FROM properties AS p
WHERE EXISTS -- property is available in the price range
(SELECT * FROM properties_prices AS pp1
WHERE p.id = pp1.property_id AND
pp1.price_per_day between 10 and 100 AND
(pp1.date_begin <= "2010-12-31" AND pp1.date_end >= "2010-05-01")) AND
NOT EXISTS -- property is in the price range in all sub-periods, but there might be gaps
(SELECT * FROM properties_prices AS pp2
WHERE p.id = pp2.property_id AND
pp2.price_per_day not between 10 and 100 AND
(pp2.date_begin <= "2010-12-31" AND pp2.date_end >= "2010-05-01"))
ORDER BY name --- ???
LIMIT 10
That query doesn't give you the prices or other details. That would need to go in an extra query. But perhaps my assumptions are all wrong anyway.
This can also be done as a GROUP BY, which I think will be quite efficient, and we get some aggregates as part of the package:
SELECT
property_id, MIN(price_per_day), MAX(price_per_day)
FROM
properties_prices
WHERE
date_begin <= "2010-12-31" AND date_end >= "2010-05-01"
GROUP BY
property_id
HAVING MIN(IF( (price_per_day BETWEEN 10 AND 100), 1, 0))=1
ORDER BY ...
LIMIT 10
(I don't have MySQL to hand so I haven't tested. I was unsure about the MIN(IF ...) but a mock-up using a CASE worked on SQL Server.)
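If the GROUP BY form is used, a covering index ordered for the grouping might help (untested as well; the index name is made up):
CREATE INDEX prices_grouped ON properties_prices
(property_id, date_begin, date_end, price_per_day);
With property_id first, the GROUP BY can walk the index in order, and the date and price columns are available from the index without touching the table rows.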

Optimizing MySQL Aggregation Query

I've got a very large table (~100 million records) in MySQL that contains information about files. One of the pieces of information is the modified date of each file.
I need to write a query that will count the number of files that fit into specified date ranges. To do that I made a small table that specifies these ranges (all in days) and looks like this:
DateRanges
range_id range_name range_start range_end
1 0-90 0 90
2 91-180 91 180
3 181-365 181 365
4 366-1095 366 1095
5 1096+ 1096 999999999
And wrote a query that looks like this:
SELECT r.range_name,
SUM(IF(DATEDIFF(CURDATE(), t.file_last_access) > r.range_start
AND DATEDIFF(CURDATE(), t.file_last_access) < r.range_end, 1, 0)) AS FileCount
FROM `DateRanges` r, `HugeFileTable` t
GROUP BY r.range_name
However, quite predictably, this query takes forever to run. I think that is because I am asking MySQL to go through the HugeFileTable 5 times, each time performing the DATEDIFF() calculation on each file.
What I want to do instead is to go through the HugeFileTable record by record only once, and for each file increment the count in the appropriate range_name running total. I can't figure out how to do that....
Can anyone help out with this?
Thanks.
EDIT: MySQL Version: 5.0.45, Tables are MyISAM
EDIT2: Here's the EXPLAIN output that was asked for in the comments:
+----+-------------+-------+------+---------------+------+---------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+----------+---------------------------------+
| 1  | SIMPLE      | r     | ALL  | NULL          | NULL | NULL    | NULL | 5        | Using temporary; Using filesort |
| 1  | SIMPLE      | t     | ALL  | NULL          | NULL | NULL    | NULL | 96506321 |                                 |
+----+-------------+-------+------+---------------+------+---------+------+----------+---------------------------------+
First, create an index on HugeFileTable.file_last_access.
Then try the following query:
SELECT r.range_name, COUNT(t.file_last_access) as FileCount
FROM `DateRanges` r
JOIN `HugeFileTable` t
ON (t.file_last_access BETWEEN
CURDATE() - INTERVAL r.range_end DAY AND
CURDATE() - INTERVAL r.range_start DAY)
GROUP BY r.range_name;
Here's the EXPLAIN plan that I got when I tried this query on MySQL 5.0.75 (edited down for brevity):
+-------+-------+------------------+----------------------------------------------+
| table | type | key | Extra |
+-------+-------+------------------+----------------------------------------------+
| t | index | file_last_access | Using index; Using temporary; Using filesort |
| r | ALL | NULL | Using where |
+-------+-------+------------------+----------------------------------------------+
It's still not going to perform very well. By using GROUP BY, the query incurs a temporary table, which may be expensive. Not much you can do about that.
But at least this query eliminates the Cartesian product that you had in your original query.
Update: here's another query that uses a correlated subquery, but I have eliminated the GROUP BY.
SELECT r.range_name,
(SELECT COUNT(*)
FROM `HugeFileTable` t
WHERE t.file_last_access BETWEEN
CURDATE() - INTERVAL r.range_end DAY AND
CURDATE() - INTERVAL r.range_start DAY
) as FileCount
FROM `DateRanges` r;
The EXPLAIN plan shows no temporary table or filesort (at least with the trivial amount of rows I have in my test tables):
+----+--------------------+-------+-------+------------------+--------------------------+
| id | select_type | table | type | key | Extra |
+----+--------------------+-------+-------+------------------+--------------------------+
| 1 | PRIMARY | r | ALL | NULL | |
| 2 | DEPENDENT SUBQUERY | t | index | file_last_access | Using where; Using index |
+----+--------------------+-------+-------+------------------+--------------------------+
Try this query on your data set and see if it performs better.
Well, start by making sure that file_last_access is indexed in the HugeFileTable table.
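For example (the index name is arbitrary):
CREATE INDEX idx_file_last_access ON HugeFileTable (file_last_access);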
I'm not sure if this is possible or better, but try computing the date limits first (files from date A to date B), then use a query with >= and <=. It will, theoretically at least, improve the performance.
The comparison would be something like:
t.file_last_access >= StartDate AND t.file_last_access <= EndDate
You could get a small improvement by removing CURDATE() and putting a literal date in the query, as otherwise it will run this function twice for each row in your SQL.
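For example, with a hypothetical literal standing in for the date the application computes once:
SELECT r.range_name, COUNT(t.file_last_access) AS FileCount
FROM `DateRanges` r
JOIN `HugeFileTable` t
ON (t.file_last_access BETWEEN
'2009-06-01' - INTERVAL r.range_end DAY AND
'2009-06-01' - INTERVAL r.range_start DAY)
GROUP BY r.range_name;
Here '2009-06-01' is just a placeholder for whatever today's date is when the application builds the query.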