SQL query on timestamp interval: optimization (MySQL)

I am stuck on the execution time of a query. I have a table (not written by me) with a lot of rows (4 million) and a column representing the timestamp.
I want a query that keeps only the data between two given timestamps.
I am currently using :
SELECT * FROM myTable WHERE timestamp BETWEEN "x" AND "y"
This query takes approximately 11 seconds to return 4,000 rows, while the same query without the WHERE clause but with a LIMIT of 50,000 rows executes in less than 0.1 seconds. I am aware that with the WHERE clause, more rows are tested.
Because the timestamp is always increasing, is there any way to stop the query once the upper bound of the timestamp is reached? Or another way to run the same query much faster?
Thank you very much
Kilian

WHERE NumberModule=24 AND Timestamp BETWEEN 40764 AND 40772
Needs a different index:
INDEX(NumberModule, Timestamp)
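In ALTER TABLE form, that would be something like this (a sketch; the index name is arbitrary and the table is assumed to be myTable):
ALTER TABLE myTable ADD INDEX idx_module_ts (NumberModule, `Timestamp`);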
My Index Cookbook discusses why.
Please provide SHOW CREATE TABLE so we can see all the indexes, plus the ENGINE.

Add a BTREE index (the best index type for a BETWEEN range condition):
ALTER TABLE myTable ADD INDEX myIdx USING BTREE (`timestamp`);
Please post the result of
EXPLAIN SELECT * FROM myTable WHERE `timestamp` BETWEEN "x" AND "y"
after that.

How to optimise mysql query as Full ProcessList is showing Sending Data for over 24 hours

I have the following query that runs forever, and I am looking to see if there is any way to optimise it. It runs on a table that has 1,406,480 rows in total; apart from the Filename and Ref_No columns, the ID and End_Date columns have both been indexed.
My Query:
INSERT INTO UniqueIDs
(
  SELECT T1.ID
  FROM master_table T1
  LEFT JOIN master_table T2
    ON T1.Ref_No = T2.Ref_No
   AND T1.End_Date = T2.End_Date
   AND T1.Filename = T2.Filename
   AND T1.ID > T2.ID
  WHERE T2.ID IS NULL
    AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
);
Explain results: (screenshot not reproduced)
The reason for not indexing the Ref_No is that it is a TEXT column, and therefore I get a BLOB/TEXT error when I try to index it.
I would really appreciate it if somebody could advise on how I can speed up this query.
Thanks
Thanks to Bill's pointers on multi-column indexes I have managed to make some headway. I first ran this:
CREATE INDEX I_DELETE_DUPS ON master_table(id, End_Date);
I then added a new column to hold the length of the Ref_No, but had to adapt it from the query Bill mentioned because my version of MySQL is 5.5. So I ran it in three steps:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED;
UPDATE master_table SET Ref_No_length = LENGTH(Ref_No);
ALTER TABLE master_table ADD INDEX (Ref_No_length);
The last step was to change the WHERE clause in my insert query to filter on the stored length. It became:
AND t1.Ref_No_length between 5 and 10;
I then ran the query, and within 15 minutes I had 280k IDs inserted into my UniqueIDs table. I then changed my insert script to see if I could include more length values with the following:
AND t1.Ref_No_length IN (5,6,7,8,9,10,13);
This was to also bring in the rows where the length equals 13. This query took a lot longer, 2 hours 50 minutes to be precise, but the additional work of finding all rows with a length of 13 gave me an extra 700k unique IDs.
I am still looking at ways to optimise the query with the IN clause, but this is already a big improvement over a query that kept running for more than 24 hours. So thank you so much, Bill.
For the JOIN, you should have a multi-column index on (Ref_No, End_Date, Filename).
You can create a prefix index on a TEXT column like this:
ALTER TABLE master_table ADD INDEX (Ref_No(10));
But that won't help you search based on LENGTH(). Indexing only helps when you search by the indexed value, not by a function of the column.
In MySQL 5.7 or later, you can create a virtual column like this, with an index on the values calculated for the virtual column:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED AS (LENGTH(Ref_No)),
ADD INDEX (Ref_No_length);
Then MySQL will recognize that your condition in your query is the same as the expression for the virtual column, and it will automatically use the index (exception: in my experience, this doesn't work for expressions using JSON functions).
But this is no guarantee that the index will help. If most of the rows match the condition of the length being between 5 and 10, the optimizer will not bother with the index. It may be more work to use the index than to do a table-scan.
the ID and End_Date have both been indexed.
You have PRIMARY KEY(id) and redundantly INDEX(id)? A PK is a unique key.
"have both been indexed" -- INDEX(a), INDEX(b) is not the same as INDEX(a,b) -- they have different uses. Read about "composite" indexes.
That query smells a lot like "group-wise" max done in a very slow way. (Alas, that may have come from the online docs.)
I have compiled the fastest ways to do that task here: http://mysql.rjweb.org/doc.php/groupwise_max (There are multiple versions, based on MySQL version and what issues your code can/cannot tolerate.)
Please provide SHOW CREATE TABLE. One important question: Is id the PRIMARY KEY?
This composite index may be useful:
INDEX(Filename, End_Date, Ref_No,  -- first, in any order
      ID)                          -- last
This, as others have noted, is unlikely to be helped by any index, hence T1 will need a full-table-scan:
AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
If Ref_No cannot be bigger than 191 characters, change it to a VARCHAR so that it can be used in an index. Oh, did I ask for SHOW CREATE TABLE? If you can't make it VARCHAR, then my recommended composite index is
INDEX(Filename, End_Date, ID)
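In ALTER TABLE form, that would be something like the following (a sketch; if Filename is itself a TEXT column it would also need a prefix length, e.g. Filename(50)):
ALTER TABLE master_table ADD INDEX idx_file_date_id (Filename, End_Date, ID);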

How to optimize a MySQL query which is taking 10 sec to fetch results

I have a MySQL database with a table that has around 200,000 rows.
I am querying this table to fetch the latest data. The query:
select *
from `db`.`Data`
where
floor = "floor_value" and
date = "date_value" and
timestamp > "time_value"
order by
timestamp DESC
limit 1
It takes about 9 seconds to fetch the data; when the table had fewer rows, it did not take this long. Can anyone help me reduce the time taken by this query?
Try adding the following compound index:
CREATE INDEX idx ON Data (floor, date, timestamp);
This index should cover the entire WHERE clause and ideally should also be usable for the ORDER BY clause. The reason timestamp appears last in the index is that it is compared with a range (>), while floor and date are compared for equality. Placing the equality columns first lets MySQL narrow down to the matching (floor, date) entries and then scan the qualifying timestamp values directly in the index, already in the order needed for the ORDER BY. Had we put timestamp first, MySQL could only use the index for the range and would have to check floor and date for every row in that range.
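To check whether MySQL actually uses the index, an EXPLAIN of the same query can be examined (a sketch with the placeholder values from the question):
EXPLAIN select * from `db`.`Data`
where floor = "floor_value" and date = "date_value" and timestamp > "time_value"
order by timestamp DESC limit 1;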

Most efficient query to get last modified record in large table

I have a table with a large number of records ( > 300,000). The most relevant fields in the table are:
CREATE_DATE
MOD_DATE
Those are updated every time a record is added or updated.
I now need to query this table to find the date of the record that was modified last. I'm currently using
SELECT mod_date FROM table ORDER BY mod_date DESC LIMIT 1;
But I'm wondering if this is the most efficient way to get the answer.
I've tried adding a where clause to limit the date to the last month, but it looks like that's actually slower (and I need the most recent date, which could be older than the last month).
I've also tried the suggestion I read elsewhere to use:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'db'
AND TABLE_NAME = 'table';
But since I might be working on a dump of the original, that query might return NULL. And it looks like it is actually slower than the original query anyway.
I can't resort to last_insert_id() because I'm not updating or inserting.
I just want to make sure I have the most efficient query possible.
The most efficient way for this query would be to use an index for the column MOD_DATE.
From How MySQL Uses Indexes
8.3.1 How MySQL Uses Indexes
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially.
You can use
SHOW CREATE TABLE `table`;
to get the CREATE statement and see if an index on MOD_DATE is defined.
To add an Index you can use
CREATE INDEX
CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name
[index_type]
ON tbl_name (index_col_name,...)
[index_option]
[algorithm_option | lock_option] ...
see http://dev.mysql.com/doc/refman/5.6/en/create-index.html
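Applied to this case, that boils down to something like the following (idx_mod_date is an arbitrary name; the table name is quoted because table is a reserved word):
CREATE INDEX idx_mod_date ON `table` (MOD_DATE);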
Make sure that both of those fields are indexed.
Then I would just run:
select max(mod_date) from table
or max(create_date), whichever one you need.
Make sure to create 2 indexes, one on each date field, not a compound index on both.
As for a discussion of the difference between this and using limit, see MIN/MAX vs ORDER BY and LIMIT
Use EXPLAIN:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
This tells you how MySQL executes the statement; with that you can figure out the most efficient approach, because it depends on your database structure and there is no single universal solution.

How to optimize MySQL query ‘SELECT * from table WHERE Date=CURDATE() and ID=1;’

I have the following columns in a MySQL table:
ID: int(11) [this is the primary key]
Date: date
and I run the MySQL query:
SELECT * from table WHERE Date=CURDATE() and ID=1;
This takes between 0.6 and 1.2 seconds.
Is there any way to optimize this query to get results quicker?
My objective is to find out if I already have a record for today for this ID.
Add indexes on ID and Date.
See CREATE INDEX manual.
You could add a LIMIT 1 at the end; since you are searching for a primary key, the maximum number of results is 1.
And if you only want to know whether it exists or not, you could replace * with ID to select only the ID.
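Putting both suggestions together, the check could look something like:
SELECT ID FROM `table` WHERE Date = CURDATE() AND ID = 1 LIMIT 1;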
Furthermore, if you haven't already, you really need to add indexes.
SET @cur_date = CURDATE();
...WHERE Date = @cur_date ...
and then create an index on (Date, ID) (order is important: it should match the order you query on).
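A sketch of that index (the name is arbitrary):
CREATE INDEX idx_date_id ON `table` (Date, ID);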
In general, calling functions before you run the query and storing the results in variables lets SQL treat them like constants instead of function calls, which tends to allow a faster query plan.

Slow MySQL query

Hey, I have a very slow MySQL query. I'm sure all I need to do is add the correct index, but nothing I have tried works.
The query is:
SELECT DATE(DateTime) as 'SpeedDate', avg(LoadTime) as 'LoadTime'
FROM SpeedMonitor
GROUP BY Date(DateTime);
The Explain for the query is:
id  select_type  table         type  possible_keys  key   key_len  ref   rows     Extra
1   SIMPLE       SpeedMonitor  ALL   NULL           NULL  NULL     NULL  7259978  Using temporary; Using filesort
And the table structure is:
CREATE TABLE `SpeedMonitor` (
`SMID` int(10) unsigned NOT NULL auto_increment,
`DateTime` datetime NOT NULL,
`LoadTime` double unsigned NOT NULL,
PRIMARY KEY (`SMID`)
) ENGINE=InnoDB AUTO_INCREMENT=7258294 DEFAULT CHARSET=latin1;
Any help would be greatly appreciated.
You're just asking for two columns in your query, so indexes could/should go there:
DateTime
LoadTime
Another way to speed up your query could be to split the DateTime field in two: a date field and a time field.
This way the db can group directly on the date field instead of calculating DATE(...).
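A sketch of that change, reusing the newdate column name that the trigger below also uses (backfilling existing rows with an UPDATE):
ALTER TABLE SpeedMonitor ADD COLUMN newdate DATE;
UPDATE SpeedMonitor SET newdate = DATE(DateTime);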
EDITED:
If you prefer using a trigger, create a new DATE column called newdate and try this (I can't try it right now to check that it's correct):
DELIMITER //
CREATE TRIGGER upd_check BEFORE INSERT ON SpeedMonitor
FOR EACH ROW
BEGIN
  -- keep newdate in sync with DateTime on every insert
  SET NEW.newdate = DATE(NEW.DateTime);
END//
DELIMITER ;
EDITED AGAIN:
I've just created a db with the same speedmonitor table filled with about 900,000 records.
Then I ran the query SELECT newdate, AVG(LoadTime) loadtime FROM speedmonitor GROUP BY newdate and it took about 100s!!
After removing the index on the newdate field (and clearing the cache using RESET QUERY CACHE and FLUSH TABLES), the same query took 0.6s!!!
Just for comparison: the query SELECT DATE(DateTime), AVG(LoadTime) loadtime FROM speedmonitor GROUP BY DATE(DateTime) took 0.9s.
So I suppose the index on newdate is no good: remove it.
I'm going to add as many records as I can now and test the two queries again.
FINAL EDIT:
With the indexes on the newdate and DateTime columns removed, and 8 million records in the speedmonitor table, here are the results:
selecting and grouping on the newdate column: 7.5s
selecting and grouping on the DATE(DateTime) expression: 13.7s
I think that's a good speedup.
Times were measured by running the queries in the mysql command-line client.
The problem is that you're using a function in your GROUP BY clause, so MySQL has to evaluate the expression Date(DateTime) on every record before it can group the results. I'd suggest adding a calculated field for Date(DateTime), which you could then index and see if that helps your performance.
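On MySQL 5.7 or later, that calculated field can be a generated column (a sketch; on older versions the trigger approach shown above applies, and given the test results above it is worth benchmarking with and without the index):
ALTER TABLE SpeedMonitor
ADD COLUMN SpeedDate DATE AS (DATE(DateTime)) STORED,
ADD INDEX (SpeedDate);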
I hope you'll permit me to point out that before you put a table into production with millions of records you should seriously consider how that data is going to be used and plan accordingly.
What is happening right now is that your query cannot use any indexes and hence scans the entire table building a response. Not the fastest way to work with relatively large tables.
You have some things to consider if you want to get to a better state:
How fast is it collecting data?
How much history do you need?
How granular are your reporting requirements?
Are you able to suspend logging to make table changes?
If the answer is "No" to the last question you could always create a new table/solution and start writing records there... importing in old data if/as needed.
Reporting granularity is important as you could, for example, compress a day's worth of data into 24 records. Load the current day into an index-free loading table and then process it the next day into per-hour averages, as sketched below. Name each loading table based on the sample date and you can delete old tables once processed.
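For example, a per-hour rollup along these lines (hypothetical names: loading_20090601 for a day's raw samples, hourly_stats for the aggregate table):
INSERT INTO hourly_stats (sample_date, sample_hour, avg_loadtime, samples)
SELECT DATE(DateTime), HOUR(DateTime), AVG(LoadTime), COUNT(*)
FROM loading_20090601
GROUP BY DATE(DateTime), HOUR(DateTime);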
Of course, hourly may not be fine grained enough.
Depending on your retention needs you might want to consider some type of partitioned storage. This lets you query against subsets of sample data and simply drop or archive old partitions when they are no longer current enough to be relevant.
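A sketch of range partitioning for SpeedMonitor (an assumption, not part of the original schema: MySQL requires every unique key, including the primary key, to contain the partitioning column, so the primary key would first have to be widened to (SMID, DateTime)):
ALTER TABLE SpeedMonitor
DROP PRIMARY KEY,
ADD PRIMARY KEY (SMID, DateTime);
ALTER TABLE SpeedMonitor
PARTITION BY RANGE (TO_DAYS(DateTime)) (
PARTITION p200906 VALUES LESS THAN (TO_DAYS('2009-07-01')),
PARTITION p200907 VALUES LESS THAN (TO_DAYS('2009-08-01')),
PARTITION pmax VALUES LESS THAN MAXVALUE
);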
Anyhow, you seem to be on the edge of having some type of massive sampling, reporting and/or monitoring system (particularly if you were reporting on a variety of sites or pages with different characteristics). You may want to put some effort into designing this so it will fit your needs... ;)