MySQL - turning data points into ranges

I have a database of measurements that indicate a sensor, a reading, and the timestamp the reading was taken. Measurements are only recorded when there's a change. I want to generate a result set that shows the time range during which each sensor held a particular reading.
The timestamps are in milliseconds but I'm outputting the result in seconds.
Here's the table:
CREATE TABLE `raw_metric` (
`row_id` BIGINT NOT NULL AUTO_INCREMENT,
`sensor_id` BINARY(6) NOT NULL,
`timestamp` BIGINT NOT NULL,
`angle` FLOAT NOT NULL,
PRIMARY KEY (`row_id`)
)
Right now I'm getting the results I want using a subquery, but it's fairly slow when there's a lot of datapoints:
SELECT row_id,
HEX(sensor_id),
angle,
(
COALESCE((
SELECT MIN(`timestamp`)
FROM raw_metric AS rm2
WHERE rm2.`timestamp` > rm1.`timestamp`
AND rm2.sensor_id = rm1.sensor_id
), UNIX_TIMESTAMP() * 1000) - `timestamp`
) / 1000 AS duration
FROM raw_metric AS rm1
Essentially, to get the range, I need to get the very next reading (or use the current time if there isn't another reading). The subquery finds the minimum timestamp that is later than the current one but is from the same sensor.
This query isn't going to occur very often so I'd prefer to not have to add an index on the timestamp column and slow down inserts. I was hoping someone might have a suggestion as to an alternate way of doing this.
UPDATE:
The row_ids should increase along with the timestamps, but that can't be guaranteed due to network latency issues. So, it's possible (though unlikely) that an entry with a lower row_id occurs AFTER one with a higher row_id.

This is perhaps more appropriate as a comment than as a solution, but it is too long for a comment.
You are trying to implement the lead() function in MySQL, and MySQL does not, unfortunately, have window functions. You could switch to Oracle, DB2, Postgres, or SQL Server 2012 and use the built-in (and optimized) functionality there. OK, that may not be realistic.
So, given your data structure, you need to do either a correlated subquery or a non-equijoin (actually a partial equijoin, because there is a match on sensor_id). These are going to be expensive operations unless you add an index. Unless you are adding measurements tens of times per second, the additional overhead of the index should not be a big deal.
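If you do accept the index, the correlated subquery wants a composite index on (sensor_id, timestamp); a sketch (the index name is illustrative):
ALTER TABLE raw_metric
  ADD INDEX idx_sensor_ts (sensor_id, `timestamp`);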
You could also change your data structure. If you had a "sensor counter" -- a sequential number enumerating each sensor's readings -- then you could use it for an equijoin (although for good performance you might still want an index). Adding this to your table would require a trigger, and that is likely to perform even worse on insert than an index would.
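For illustration only, a sketch of such a trigger (the column and trigger names are hypothetical; the per-insert subquery is exactly the overhead being warned about):
ALTER TABLE raw_metric
  ADD COLUMN sensor_counter BIGINT NOT NULL DEFAULT 0;

-- assign the next per-sensor sequence number on every insert
CREATE TRIGGER trg_raw_metric_counter BEFORE INSERT ON raw_metric
FOR EACH ROW
  SET NEW.sensor_counter = (SELECT COALESCE(MAX(sensor_counter), 0) + 1
                            FROM raw_metric
                            WHERE sensor_id = NEW.sensor_id);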
If you only have a handful of sensors, you could create a separate table for each one. Oh, I can feel the groans at this suggestion. But, if you did, then an auto-incremented id would perform the same role. To be honest, I would only do this if I could count the number of sensors on each hand.
In the end, I might suggest that you take the hit during insertion and have "effective" and "end" times on each record (as well as an index on sensor id and either timestamp or id). With these additional columns, you will probably find more uses for the table.
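A sketch of that shape (column names assumed); the UPDATE that closes out the previous open record is the insertion-time hit:
ALTER TABLE raw_metric
  ADD COLUMN effective_ts BIGINT NOT NULL DEFAULT 0,
  ADD COLUMN end_ts BIGINT NULL;

-- at insert time, close the sensor's currently open record
-- (@sensor and @new_ts stand in for the incoming row's values)
UPDATE raw_metric
SET end_ts = @new_ts
WHERE sensor_id = @sensor
AND end_ts IS NULL;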
If you are doing this for just one sensor, then create a temporary table for the information and use an auto-incremented id column. Then insert the data into it:
insert into temp_rawmetric (orig_row_id, sensor_id, timestamp, angle)
select row_id, sensor_id, timestamp, angle
from raw_metric
order by sensor_id, timestamp;
Be sure your table has a temp_rawmetric_id column that is auto-incremented and the primary key (which creates an index automatically). The ORDER BY makes sure this id is assigned in sensor and timestamp order.
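For concreteness, a sketch of how that temporary table might be declared (the types are assumed to mirror raw_metric):
CREATE TEMPORARY TABLE temp_rawmetric (
  temp_rawmetric_id BIGINT NOT NULL AUTO_INCREMENT,
  orig_row_id BIGINT NOT NULL,
  sensor_id BINARY(6) NOT NULL,
  `timestamp` BIGINT NOT NULL,
  angle FLOAT NOT NULL,
  PRIMARY KEY (temp_rawmetric_id)
);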
Then you can do your query as:
select trm.sensor_id, trm.angle,
trm.timestamp as startTime, trmnext.timestamp as endTime
from temp_rawmetric trm left outer join
temp_rawmetric trmnext
on trmnext.temp_rawmetric_id = trm.temp_rawmetric_id+1
and trmnext.sensor_id = trm.sensor_id;
This will require a pass through the original data to extract the data, and then a primary key join on the temporary table. The first might take some time. The second should be pretty quick.

Select rm1.row_id
,HEX(rm1.sensor_id)
,rm1.angle
,(COALESCE(rm2.timestamp, UNIX_TIMESTAMP() * 1000) - rm1.timestamp) as duration
from raw_metric rm1
left outer join
raw_metric rm2
on rm2.sensor_id = rm1.sensor_id
and rm2.timestamp = (
select min(timestamp)
from raw_metric rm3
where rm3.sensor_id = rm1.sensor_id
and rm3.timestamp > rm1.timestamp
)

If you use AUTO_INCREMENT for the primary key, you can replace timestamp with row_id in the query's condition, like this:
SELECT row_id,
HEX(sensor_id),
angle,
(
COALESCE((
SELECT MIN(`timestamp`)
FROM raw_metric AS rm2
WHERE rm2.`row_id` > rm1.`row_id`
AND rm2.sensor_id = rm1.sensor_id
), UNIX_TIMESTAMP() * 1000) - `timestamp`
) / 1000 AS duration
FROM raw_metric AS rm1
This should run somewhat faster.
You can also add one more subquery to quickly select the row_id of the next sensor value. See:
SELECT row_id,
HEX(sensor_id),
angle,
(
COALESCE((
SELECT timestamp FROM raw_metric AS rm1a
WHERE row_id =
(
SELECT MIN(`row_id`)
FROM raw_metric AS rm2
WHERE rm2.`row_id` > rm1.`row_id`
AND rm2.sensor_id = rm1.sensor_id
)
), UNIX_TIMESTAMP() * 1000) - `timestamp`
) / 1000 AS duration
FROM raw_metric AS rm1

Related

MySQL BETWEEN query not using index

I have geoip data in a table; network_start_ip and network_last_ip are varbinary(16) columns with the result of INET6_ATON(ip_start/end) as values. Two other columns are latitude and longitude.
CREATE TABLE `ipblocks` (
`network_start_ip` varbinary(16) NOT NULL,
`network_last_ip` varbinary(16) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
KEY `network_start_ip` (`network_start_ip`),
KEY `network_last_ip` (`network_last_ip`),
KEY `idx_range` (`network_start_ip`,`network_last_ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
As you can see I have created 3 indexes for testing. Why does my (quite simple) query
SELECT
latitude, longitude
FROM
ipblocks b
WHERE
INET6_ATON('82.207.219.33') BETWEEN b.network_start_ip AND b.network_last_ip
not use any of these indexes?
The query takes ~3 seconds which is way too long to use it in production.
It doesn't work because there are two columns referenced -- and that is really hard to optimize. Assuming that there are no overlapping IP ranges, you can restructure the query as:
SELECT b.*
FROM (SELECT b.*
FROM ipblocks b
WHERE b.network_start_ip <= INET6_ATON('82.207.219.33')
ORDER BY b.network_start_ip DESC
LIMIT 1
) b
WHERE INET6_ATON('82.207.219.33') <= network_last_ip;
The inner query should use an index on ipblocks(network_start_ip). The outer query is only comparing one row, so it does not need any index.
Or as:
SELECT b.*
FROM (SELECT b.*
FROM ipblocks b
WHERE b.network_last_ip >= INET6_ATON('82.207.219.33')
ORDER BY b.network_last_ip ASC
LIMIT 1
) b
WHERE network_start_ip <= INET6_ATON('82.207.219.33');
This would use an index on (network_last_ip). MySQL (and I think MariaDB) does a better job with ascending sorts than descending sorts.
Thanks to Gordon Linoff I found the optimal query for my question.
SELECT b.* FROM
(SELECT b.* FROM ipblocks b WHERE b.network_start_ip <= INET6_ATON('82.207.219.33')
ORDER BY b.network_start_ip DESC LIMIT 1 )
b WHERE INET6_ATON('82.207.219.33') <= network_last_ip
Now we select the blocks starting at or below INET6_ATON('82.207.219.33') in the inner query, but we order them descending, which enables us to use LIMIT 1 again.
Query response time is now .002 to .004 seconds. Great!
Does this query give you correct results? Your start/end IPs seem to be stored as a binary string while you're searching for an integer representation.
I would first make sure that network_start_ip and network_last_ip are unsigned INT fields with the integer representation of the IP addresses. This is assuming that you work with IPv4 only:
CREATE TABLE ipblocks_int AS
SELECT
INET_ATON(network_start_ip) as network_start_ip,
INET_ATON(network_last_ip) as network_last_ip,
latitude,
longitude
FROM ipblocks
Then use (network_start_ip,network_last_ip) as primary key.
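That is, something like:
ALTER TABLE ipblocks_int
  ADD PRIMARY KEY (network_start_ip, network_last_ip);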
It's a tough problem. There is no simple solution.
The reason it is tough is that it is effectively
start <= 123 AND
last >= 123
Regardless of what indexes are available, the Optimizer will work with one or the other of those. With INDEX(start, ...), it will pick start <= 123 and scan the first part of the index. Similarly for the other clause. One of those scans more than half the index, the other scans less -- but not enough less to be worth using an index. Moving it into the PRIMARY KEY will help with some cases, but it is hardly worth the effort.
Bottom line: no matter what you do in the way of INDEX or PRIMARY KEY, most IP constants will lead to more than 1.5 seconds for the query.
Do your start/last IP ranges overlap? If so, that adds complexity. In particular, overlaps would probably invalidate Gordon's LIMIT 1.
My solution requires non-overlapping regions. Any gaps in IPs necessitate 'unowned' ranges of IPs. This is because only a start_ip is stored; the last_ip is implied by the start of the next item in the table. See http://mysql.rjweb.org/doc.php/ipranges (It includes code for IPv4 and for IPv6.)
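Under that scheme a lookup is a single descending probe; a sketch (the table name is hypothetical, one row per range start):
SELECT latitude, longitude
FROM ipblocks_starts
WHERE network_start_ip <= INET6_ATON('82.207.219.33')
ORDER BY network_start_ip DESC
LIMIT 1;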
Meanwhile, DOUBLE for lat/lng is overkill: http://mysql.rjweb.org/doc.php/latlng#representation_choices

Seek paginated query gets progressively slower on a big table

I have a biggish table of events. (5.3 million rows at the moment). I need to traverse this table mostly from the beginning to the end in a linear fashion. Mostly no random seeks. The data currently includes about 5 days of these events.
Due to the size of the table I need to paginate the results, and the internet tells me that "seek pagination" is the best method.
This method works great and fast for traversing the first 3 days, but after that MySQL really begins to slow down. I've figured out it must be something IO-bound, as my CPU usage actually falls as the slowdown starts.
I believe this has something to do with the 2-column sort I do and the use of filesort; maybe MySQL needs to read all the rows to sort my results or something. Indexing correctly might be the proper fix, but so far I've been unable to find an index that solves my problem.
The complicating part of this database is that the ids and timestamps are NOT perfectly in order. The software requires the data to be ordered by timestamp. However, when adding data to this database, some events are added 1 minute after they actually happened, so the auto-incremented ids are not in chronological order.
As of now, the slowdown is so bad that my 5-day traversal never finishes. It just gets slower and slower...
I've tried indexing the table in multiple ways, but MySQL does not seem to want to use those indexes, and EXPLAIN keeps showing "filesort". An index is used for the WHERE clause, though.
The workaround I'm currently using is to first do a full table traversal and load all the row ids and timestamps into memory. I sort the rows on the Python side of the software and then load the full data in smaller chunks from MySQL as I traverse (by ids only). This works fine, but is quite inefficient due to the total of 2 traversals of the same data.
The schema of the table:
CREATE TABLE `events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`server` varchar(45) DEFAULT NULL,
`software` varchar(45) DEFAULT NULL,
`timestamp` bigint(20) DEFAULT NULL,
`data` text,
`event_type` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index3` (`timestamp`,`server`,`software`,`id`),
KEY `index_ts` (`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=7410472 DEFAULT CHARSET=latin1;
The query (one possible line):
SELECT software,
server,
timestamp,
id,
event_type,
data
FROM events
WHERE ( server = 'a58b'
AND ( software IS NULL
OR software IN ( 'ASD', 'WASD' ) ) )
AND ( timestamp, id ) > ( 100, 100 )
AND timestamp <= 200
ORDER BY timestamp ASC,
id ASC
LIMIT 100;
The query is based on https://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/ (and some other postings with the same idea). I believe it is called "seek pagination with a seek predicate". The basic gist is that I have a starting timestamp and an ending timestamp, and I need to get all the events with the specified software on the servers I've specified OR only the server-specific events (software = NULL). The weirdish ( )-stuff is due to Python constructing the queries based on the parameters it is given. I left the parentheses visible on the small chance they might have some effect.
I'm expecting the traversal to finish before the heat death of the universe.
First change
AND ( timestamp, id ) > ( 100, 100 )
to
AND (timestamp > 100 OR timestamp = 100 AND id > 100)
This optimisation is suggested in the official documentation: Row Constructor Expression Optimization
Now the engine will be able to use the index on (timestamp). Depending on cardinality of the columns server and software, that could be already fast enough.
An index on (server, timestamp, id) should improve the performance further.
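For example (the index name is illustrative):
ALTER TABLE events
  ADD INDEX idx_server_ts_id (server, `timestamp`, id);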
If that is still not fast enough, I would suggest a UNION optimization for
AND (software IS NULL OR software IN ('ASD', 'WASD'))
That would be:
(
SELECT software, server, timestamp, id, event_type, data
FROM events
WHERE server = 'a58b'
AND software IS NULL
AND (timestamp > 100 OR timestamp = 100 AND id > 100)
AND timestamp <= 200
ORDER BY timestamp ASC, id ASC
LIMIT 100
) UNION ALL (
SELECT software, server, timestamp, id, event_type, data
FROM events
WHERE server = 'a58b'
AND software = 'ASD'
AND (timestamp > 100 OR timestamp = 100 AND id > 100)
AND timestamp <= 200
ORDER BY timestamp ASC, id ASC
LIMIT 100
) UNION ALL (
SELECT software, server, timestamp, id, event_type, data
FROM events
WHERE server = 'a58b'
AND software = 'WASD'
AND (timestamp > 100 OR timestamp = 100 AND id > 100)
AND timestamp <= 200
ORDER BY timestamp ASC, id ASC
LIMIT 100
)
ORDER BY timestamp ASC, id ASC
LIMIT 100
You will need to create an index on (server, software, timestamp, id) for this query.
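For example (the index name is illustrative):
ALTER TABLE events
  ADD INDEX idx_server_sw_ts_id (server, software, `timestamp`, id);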
There are multiple complications going on.
The quick fix is
INDEX(server, timestamp, id) -- in this order
together with
WHERE server = 'a58b'
AND timestamp BETWEEN 100 AND 200
AND ( software IS NULL
OR software IN ( 'ASD', 'WASD' ) )
AND ( timestamp, id ) > ( 100, 100 )
ORDER BY timestamp ASC,
id ASC
LIMIT 100;
Note that server needs to be first in the index, not after the thing you are doing a range on (timestamp). Also, I broke out timestamp BETWEEN ... to make it clear to the optimizer that the next column of the ORDER BY might make use of the index.
You said "pagination", so I assume you have an OFFSET, too? Add it back in so we can discuss the implications. My blog on "remembering where you left off" instead of using OFFSET may (or may not) be practical.

Very slow SQL ORDER BY, and EXPLAIN doesn't explain

The query:
SELECT
m.*,
mic.*
FROM
members m,
members_in_class_activities mic
WHERE
m.id = mic.member_id AND
mic.not_cancelled = '1' AND
mic.class_activity_id = '32' AND
mic.date = '2016-02-14' AND
mic.time = '11:00:00'
ORDER BY
mic.reservation_order
Table members has ~ 100k records, and table members_in_class_activities has about 300k. The result set is just 2 records. (Not 2k, just 2.)
All relevant columns are indexed (id and reservation_order are primary): member_id, not_cancelled, class_activity_id, date, time.
There is also a UNIQUE key for class_activity_id + member_id + date + time + not_cancelled. not_cancelled is NULL or '1'.
All other queries are very fast (1-5 ms), but this one is crazy slow: 600-1000 ms.
What doesn't help:
select only the 2 primary keys instead of * (0 % change)
a real JOIN instead of an implicit join (it actually seems slightly slower, but probably not) (0 % change)
removing the join to members entirely makes it slightly faster (15 % change)
What does help, immensely:
remove the ORDER BY on the primary key (99% change)
I have only 2 questions:
What????
How do I still ORDER BY x, but make it fast too? Do I really need a separate column?
I'm running 10.1.9-MariaDB on my dev machine, but it's slow on production's MySQL 5.5.27-log too.
Don't use ORDER BY on your main query. Try this:
SELECT * FROM (
... your query
) AS t ORDER BY t.reservation_order
As you mentioned, members_in_class_activities has about 300k records, so your ORDER BY applies to all 300k records, which definitely slows down your query.
class_activity_id + member_id + date + time + not_cancelled -- not optimal.
Start with the '=' fields in the WHERE in any order, then add on the ORDER BY field:
INDEX(class_activity_id, date, time, not_cancelled,
reservation_order)
Since you seem to need the UNIQUE constraint, then it would be almost as good to shuffle your index by putting member_id at the end (where it will be 'out of the way', but not used):
UNIQUE(class_activity_id, date, time, not_cancelled, member_id)
In general, it is bad to split a date and time apart; suggest a DATETIME column for the pair.
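For example, the pair could be combined into a single column (the column name is assumed):
ALTER TABLE members_in_class_activities
  ADD COLUMN starts_at DATETIME;

-- TIMESTAMP(d, t) combines a DATE and a TIME into one DATETIME value
UPDATE members_in_class_activities
   SET starts_at = TIMESTAMP(`date`, `time`);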
Cookbook on creating indexes.

search by date mysql performance

I have a large table with about 100 million records, with fields start_date and end_date of DATE type. I need to check the number of overlaps with some date range, say between 2013-08-20 and 2013-08-30, so I use:
SELECT COUNT(*) FROM myTable WHERE end_date >= '2013-08-20'
AND start_date <= '2013-08-30'
The date columns are indexed.
The important point is that the date ranges that I am searching for overlaps are always in the future, while the main part of the records in the table is in the past (say about 97-99 million).
So, will this query be faster if I add a TINYINT column is_future and check that condition first, like this:
SELECT COUNT(*) FROM myTable WHERE is_future = 1
AND end_date >= '2013-08-20' AND start_date <= '2013-08-30'
Will it exclude the other 97 million or so records and check the date condition only for the remaining 1-3 million records?
I use MySQL
Thanks
EDIT
The MySQL engine is InnoDB, but would it matter considerably if it were, say, MyISAM?
Here is the CREATE TABLE:
CREATE TABLE `orders` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`title`
`start_date` date DEFAULT NULL,
`end_date` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=24 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
EDIT 2, after Robert Co's answer
Partitioning looks like a good idea for this case, but it does not allow me to create a partition based on the is_future field unless I make that field part of the primary key; otherwise I would have to remove my main primary key, id, which I cannot do. And if I define that field as the primary key, is there still any point to partitioning? Wouldn't a search by is_future already be fast if it is the primary key?
EDIT 3
The actual query where I need this selects restaurants that have some free tables for the date range:
SELECT r.id, r.name, r.table_count
FROM restaurants r
LEFT JOIN orders o
ON r.id = o.restaurant_id
WHERE o.id IS NULL
OR (r.table_count > (SELECT COUNT(*)
FROM orders o2
WHERE o2.restaurant_id = r.id AND
end_date >= '2013-08-20' AND start_date <= '2013-08-30'
AND o2.status = 1
)
)
SOLUTION
After a lot more research and testing, the fastest way to count the rows in my case was to just add one more condition: that start_date is later than the current date (because the searched date ranges are always in the future):
SELECT COUNT(*) FROM myTable WHERE end_date >= '2013-09-01'
AND start_date >= '2013-08-20' AND start_date <= '2013-09-30'
It is also necessary to have one index covering the start_date and end_date fields (thank you @symcbean).
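That is (the index name is illustrative):
ALTER TABLE orders
  ADD INDEX idx_dates (start_date, end_date);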
As a result, the execution time on a table with 10m rows went from 7 seconds to 0.050 seconds.
SOLUTION 2 (#Robert Co)
Partitioning worked in this case as well! Perhaps it is a better solution than indexing, or they can both be applied together.
Thanks
This is a perfect use case for table partitioning. If the Oracle INTERVAL feature makes it to MySQL, it will just add to the awesomeness.
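A sketch of one way that could look with the is_future flag from the question; note (as EDIT 2 observes) that MySQL requires the partitioning column to be part of every unique key, so the primary key becomes composite (names are illustrative):
-- hypothetical: is_future is maintained by the application
ALTER TABLE orders
  ADD COLUMN is_future TINYINT NOT NULL DEFAULT 0,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, is_future);

ALTER TABLE orders
  PARTITION BY LIST (is_future) (
    PARTITION p_past   VALUES IN (0),
    PARTITION p_future VALUES IN (1)
  );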
The date columns are indexed
What type of index? A hash-based index is no use for range queries. If it's not a BTREE index then change it now. And you've not shown us how they are indexed. Are both columns in the same index? Is there other stuff in there too? In what order (end_date must appear as the first column)?
There are implicit type conversions in the script - this should be handled automatically by the optimizer, but it's worth checking....
SELECT COUNT(*) FROM myTable WHERE end_date >= 20130820000000
AND start_date <= 20130830235959
if I add a column is_future - TINYINT
First, in order to be of any use, this would require that the future dates be a small proportion of the total data stored in the table (less than 10%). And that's just to make it more efficient than a full table scan.
Secondly, it's going to require very frequent updates to the index to maintain it, which, in addition to the overhead of the initial population, is likely to lead to fragmentation of the index and degraded performance (depending on how the index is constructed).
Thirdly, if this still has to process 3 million rows of data (and specifically, via an index lookup) then it's going to be very slow even with the data pegged in memory.
Further, the optimizer is never likely to use this index without being forced to (due to the low cardinality).
I have done a simple test, just created an index on the tinyint column. The structures may not be the same, but with an index it seems to work.
http://www.sqlfiddle.com/#!2/514ab/1/0
and for count
http://www.sqlfiddle.com/#!2/514ab/2/0
View the execution plan there to see that the select scans just one row, which means it would process only the smaller number of records in your case.
So the simple answer is yes, with an index it would work.

MySQL, return all measurements and results within X last hours

This question is very much related to my previous question: MySQL, return all results within X last hours, although with an additional significant constraint:
Now I have 2 tables: one for measurements and one for classified results for part of the measurements.
Measurements constantly arrive, and results are constantly added after classification of the new measurements.
Results will not necessarily be stored in the same order the measurements arrive and are stored!
I am interested in presenting only the latest results. By latest I mean: take the max time (the time is part of the measurement structure) of the last available result, call it Y, and a range of X seconds, and present the measurements together with the available results in the range between Y-X and Y.
The following are the structure of 2 tables:
event table:
CREATE TABLE `event_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`Feature` char(256) NOT NULL,
`UnixTimeStamp` int(10) unsigned NOT NULL,
`Value` double NOT NULL,
KEY `ix_filter` (`Feature`),
KEY `ix_time` (`UnixTimeStamp`),
KEY `id_index` (`id`)
) ENGINE=MyISAM
classified results table:
CREATE TABLE `event_results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`level` enum('NORMAL','SUSPICIOUS') DEFAULT NULL,
`score` double DEFAULT NULL,
`eventId` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `eventId_index` (`eventId`)
) ENGINE=MyISAM
I can't just query for the last measurement's timestamp, since I want to present measurements for which there are currently results, and since measurements arrive constantly, results may not yet be available for the newest ones.
Therefore i thought of joining the two tables using
event_results.eventId = event_data.id and then selecting the max time of event_data.UnixTimeStamp as maxTime. After I have the maxTime, I need to do the same operation again (joining the 2 tables) and add a condition to the WHERE clause:
WHERE event_data.UnixTimeStamp >= maxTime + INTERVAL -X SECOND
It seems inefficient to execute 2 joins just to achieve what I am asking. Do you have a more efficient approach?
From my understanding, you are using an aggregate function, MAX. This produces a record set of size one: the highest time, from which you will work. Therefore, it needs to be broken out into a subquery (as you say, a nested select). You HAVE to do 2 queries at some point. (The answer to your last question has 2 queries in it, in the form of subqueries/nested selects.)
The main time subqueries cause problems is when you put the subquery in the SELECT part of the query, as it then runs once for every row, which makes the query slower and slower as the result set grows. Let's take the answer to your last question and write it in a horrible, inefficient way:
SELECT timeStart,
(SELECT MAX(timeStart) FROM events) AS maxTime
FROM events
WHERE timeStart > (SELECT MAX(timeStart) FROM events) + INTERVAL -1 SECOND
This performs a select for the max time once for every row. It produces the same result, but it is slow. This is where the fear of subqueries comes from.
It performs the aggregate function MAX on each row, and that returns the same answer each time. So instead, perform that subquery ONCE rather than on each row.
In the answer to your last question, the MAX subquery is run once and used to filter in the WHERE, and that outer select is also run once. So, in total, 2 queries are run.
Two super-fast queries run one after the other are faster than one super-slow query.
I'm not entirely sure what resultset you want returned, so I am going to make some assumptions. Please feel free to correct any assumptions I've made.
It sounds (to me) like you want ALL rows from event_data that are within an hour (or however many seconds) of the absolute "latest" timestamp, and along with those rows, you also want to return any related rows from event_results, if any matching rows are available.
If that's the case, then using an inline view to retrieve the maximum value of timestamp is the way to go. (That operation will be very efficient, since the query will be returning a single row, and it can be efficiently retrieved from an existing index.)
Since you want all rows from a specified period of time (from the "latest time" back to "latest time minus X seconds"), we can go ahead and calculate the starting timestamp of the period in that same query. Here we assume you want to "go back" one hour (=60*60 seconds):
SELECT MAX(UnixTimeStamp) - 3600 FROM event_data
NOTE: the expression in the SELECT list above is based on UnixTimeStamp column defined as integer type, rather than as a DATETIME or TIMESTAMP datatype. If the column were defined as DATETIME or TIMESTAMP datatype, we would likely express that with something like this:
SELECT MAX(mydatetime) + INTERVAL -3600 SECOND
(We could specify the interval units in minutes, hours, etc.)
We can use the result from that query in another query. To do that in the same query text, we simply wrap that query in parentheses, and reference it as a rowsource, as if that query were an actual table. This allows us to get all the rows from event_data that are within in the specified time period, like this:
SELECT d.id
, d.Feature
, d.UnixTimeStamp
, d.Value
FROM ( SELECT MAX(l.UnixTimeStamp) - 3600 AS from_unixtimestamp
FROM event_data l
) m
JOIN event_data d
ON d.UnixTimeStamp >= m.from_unixtimestamp
In this particular case, there's no need for an upper bound predicate on UnixTimeStamp column in the outer query. This is because we already know there are no values of UnixTimeStamp that are greater than the MAX(UnixTimeStamp), which is the upper bound of the period we are interested in.
(We could add an expression to the SELECT list of the inline view, to return MAX(l.UnixTimeStamp) AS to_unixtimestamp, and then include a predicate like AND d.UnixTimeStamp <= m.to_unixtimestamp in the outer query, but that would be unnecessarily redundant.)
You also specified a requirement to return information from the event_results table.
I believe you said that you wanted any related rows that are "available". This suggests (to me) that if no matching row is "available" from event_results, you still want to return the row from the event_data table.
We can use a LEFT JOIN operation to get that to happen:
SELECT d.id
, d.Feature
, d.UnixTimeStamp
, d.Value
, r.id
, r.level
, r.score
, r.eventId
FROM ( SELECT MAX(l.UnixTimeStamp) - 3600 AS from_unixtimestamp
FROM event_data l
) m
JOIN event_data d
ON d.UnixTimeStamp >= m.from_unixtimestamp
LEFT
JOIN event_results r
ON r.eventId = d.id
Since there is no unique constraint on the eventID column in the event_results table, there is a possibility that more than one "matching" row from event_results will be found. Whenever that happens, the row from event_data table will be repeated, once for each matching row from event_results.
If there is no matching row from event_results, then the row from event_data will still be returned, but with the columns from the event_results table set to NULL.
For performance, remove any columns from the SELECT list that you don't need returned, and be judicious in your choice of expressions in an ORDER BY clause. (The addition of a covering index may improve performance.)
For the statement as written above, MySQL is likely to use the ix_time index on the event_data table, and the eventId_index index on the event_results table.
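If the covering-index route is worth exploring, one candidate on event_results (the name is illustrative) would let the LEFT JOIN be satisfied from the index alone, since it holds every column the query reads from that table:
ALTER TABLE event_results
  ADD INDEX ix_event_cover (eventId, level, score, id);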