Slow MySQL query

Hey, I have a very slow MySQL query. I'm sure all I need to do is add the correct index, but everything I've tried so far hasn't worked.
The query is:
SELECT DATE(DateTime) as 'SpeedDate', avg(LoadTime) as 'LoadTime'
FROM SpeedMonitor
GROUP BY Date(DateTime);
The Explain for the query is:
id  select_type  table         type  possible_keys  key   key_len  ref   rows     Extra
1   SIMPLE       SpeedMonitor  ALL   NULL           NULL  NULL     NULL  7259978  Using temporary; Using filesort
And the table structure is:
CREATE TABLE `SpeedMonitor` (
`SMID` int(10) unsigned NOT NULL auto_increment,
`DateTime` datetime NOT NULL,
`LoadTime` double unsigned NOT NULL,
PRIMARY KEY (`SMID`)
) ENGINE=InnoDB AUTO_INCREMENT=7258294 DEFAULT CHARSET=latin1;
Any help would be greatly appreciated.

You're just asking for two columns in your query, so indexes could/should go there:
DateTime
LoadTime
Another way to speed your query up would be to split the DateTime field in two: a date column and a time column.
That way the db can group directly on the date column instead of calculating DATE(...) for every row.
EDITED:
If you prefer using a trigger, create a new DATE column called newdate and try this (I can't test it right now to confirm it's correct):
DELIMITER //
CREATE TRIGGER upd_check BEFORE INSERT ON SpeedMonitor
FOR EACH ROW
BEGIN
  SET NEW.newdate = DATE(NEW.DateTime);
END//
DELIMITER ;
EDITED AGAIN:
I've just created a db with the same speedmonitor table, filled with about 900,000 records.
Then I ran the query SELECT newdate, AVG(LoadTime) loadtime FROM speedmonitor GROUP BY newdate and it took about 100s!!
After removing the index on the newdate field (and clearing the cache using RESET QUERY CACHE and FLUSH TABLES), the same query took 0.6s!!!
Just for comparison: the query SELECT DATE(DateTime), AVG(LoadTime) loadtime FROM speedmonitor GROUP BY DATE(DateTime) took 0.9s.
So I suppose the index on newdate is not good: remove it.
I'm going to add as many records as I can now and test the two queries again.
FINAL EDIT:
With the indexes on the newdate and DateTime columns removed, and 8 million records in the speedmonitor table, here are the results:
selecting and grouping on newdate column: 7.5s
selecting and grouping on DATE(DateTime) field: 13.7s
I think it's a good speedup.
Times were measured by executing the queries inside the mysql command prompt.

The problem is that you're using a function in your GROUP BY clause, so MySQL has to evaluate the expression Date(DateTime) on every record before it can group the results. I'd suggest adding a calculated field for Date(DateTime), which you could then index and see if that helps your performance.

I hope you'll permit me to point out that before you put a table into production with millions of records you should seriously consider how that data is going to be used and plan accordingly.
What is happening right now is that your query cannot use any indexes, and hence scans the entire table to build a response. Not the fastest way to work with relatively large tables.
You have some things to consider if you want to get to a better state:
How fast is it collecting data?
How much history do you need?
How granular are your reporting requirements?
Are you able to suspend logging to make table changes?
If the answer to the last question is "No", you could always create a new table/solution and start writing records there, importing the old data if/as needed.
Reporting granularity is important, as you could, for example, compress a day's worth of data into 24 records. Load the current day into an index-free loading table and then process it the next day into per-hour averages, as sketched below. Name each loading table based on the sample date and you can delete old tables once processed.
Of course, hourly may not be fine-grained enough.
Depending on your retention needs, you might want to consider some type of partitioned storage. This lets you query against subsets of the sample data, and simply drop or archive old partitions when they are no longer current enough to be relevant.
Anyhow, you seem to be on the edge of having some type of massive sampling, reporting and/or monitoring system (particularly if you were reporting on a variety of sites or pages with different characteristics). You may want to put some effort into designing this so it will fit your needs... ;)

Related

MySQL poor performance with a large table

I have a monitoring table which holds monitoring data for some 200+ servers.
Each server adds 3 records of data to the table every minute of every day.
I hold 6 months of data for historical reports for customers, and as you can imagine the table gets pretty large.
My issue currently is that running SELECT queries on this table is taking an age.
I understand why; it's the sheer number of rows it's working through while performing the SELECT. I have tried to reduce the result set significantly by adding in time lookups...
SELECT * FROM `host_monitoring_data`
WHERE parent_id = 47 AND timestamp > (NOW() - INTERVAL 5 MINUTE);
... but still I'm looking at a long time before the data is returned to me.
I'm used to working with fairly small tables and this is by far the biggest that I've ever worked with, so I'm not familiar with how to overcome this sort of issue.
Any help at all is vastly appreciated.
My table structure is currently id, parent_id, timestamp, type, U, A, T
U, A and T are Used/Available/Total; Type tells me what kind of measurable we are working with; Timestamp is exactly that; parent_id is the id of the parent host to which the data belongs; and id is an auto-incrementing id for the record in question.
When I'm doing lookups, I'm basically trying to get the most recent 20 rows where parent_id = x or whatever, so I just do...
SELECT u,a,t from host_monitoring_data
WHERE parent_id=X AND timestamp > (NOW() - INTERVAL 5 MINUTE)
ORDER BY timestamp DESC LIMIT 20
EDIT 1 - Including the results of EXPLAIN:
EXPLAIN SELECT * FROM `host_monitoring_data`
WHERE parent_id=36 AND timestamp > (NOW() - INTERVAL 5 MINUTE)
ORDER BY timestamp DESC LIMIT 20
id  select_type  table                 type  possible_keys  key   key_len  ref   rows     Extra
1   SIMPLE       host_monitoring_data  ALL   NULL           NULL  NULL     NULL  2865454  Using where; Using filesort
Based on your EXPLAIN report, I see it says "type: ALL" which means it's scanning all the rows (the whole table) for every query.
You need an index to help it scan fewer rows.
Your first condition for parent_id=X is an obvious choice. You should create an index starting with parent_id.
The other condition on timestamp >= ... is probably the best second choice. Your index should include timestamp as the second column.
You can create this index this way:
ALTER TABLE host_monitoring_data ADD INDEX (parent_id, timestamp);
You might like my presentation How to Design Indexes, Really and a video of me presenting it: https://www.youtube.com/watch?v=ELR7-RdU9XU
P.S.: Please when you ask questions about query optimization, run SHOW CREATE TABLE <tablename> and include its output in your question. This shows us your columns, data types, current indexes, and constraints. Don't make us guess! Help us help you!
Three good tips:
EXPLAIN (as others said) will tell you what you are doing and give hints for doing it better.
Avoid using "*"; instead, select only the fields you need.
Use PROCEDURE ANALYSE to learn the most suitable data types for your columns (and change them if needed); see the example after these tips:
https://dev.mysql.com/doc/refman/5.7/en/procedure-analyse.html
Also, avoid using "order by" whenever you can.

SQL query on timestamp interval: optimization

I am stuck on the execution time of a query. I have a table (not written by me) with a lot of rows (4 million) and a column representing the timestamp.
I want a query that keeps only the data between two given timestamps.
I am currently using:
SELECT * FROM myTable WHERE timestamp BETWEEN "x" AND "y"
This query takes approximately 11 seconds to return 4,000 rows, while the same query without the WHERE clause but with a limit of 50,000 rows executes in less than 0.1 seconds. I am aware that with the WHERE clause, more rows are tested.
Because the timestamp is always increasing, is there any way to stop the query once the upper bound of the timestamp is reached? Or another way to run the same query much faster?
Thank you very much
Kilian
A filter like
WHERE NumberModule=24 AND Timestamp BETWEEN 40764 AND 40772
needs a different index:
INDEX(NumberModule, Timestamp)
My Index Cookbook discusses why.
Please provide SHOW CREATE TABLE so we can see all the indexes, plus the ENGINE.
Add a BTREE index, which is well suited to range conditions like BETWEEN:
ALTER TABLE myTable ADD INDEX myIdx USING BTREE (`timestamp`)
Please post the result of
EXPLAIN SELECT * FROM myTable WHERE `timestamp` BETWEEN "x" AND "y"
after that.

MySQL table setup for stock information

I am collecting about 3 - 6 millions lines of stock data per day and storing it in a MySQL database.
All of the data comes from Interactive Brokers; every piece of information arrives with these five fields: Symbol, Date, Time, Value and Type (Type being information on what kind of data I am receiving, such as price, volume, etc.).
Here is my create table statement. idticks is just my unique key, but I am almost never able to use it in queries.
CREATE TABLE `ticks` (
`idticks` int(11) NOT NULL AUTO_INCREMENT,
`symbol` varchar(30) NOT NULL,
`date` int(11) NOT NULL,
`time` int(11) NOT NULL,
`value` double NOT NULL,
`type` double NOT NULL,
KEY `idticks` (`idticks`),
KEY `symbol` (`symbol`),
KEY `date` (`date`),
KEY `idx_ticks_symbol_date` (`symbol`,`date`),
KEY `idx_ticks_type` (`type`),
KEY `idx_ticks_date_type` (`date`,`type`),
KEY `idx_ticks_date_symbol_type` (`date`,`symbol`,`type`),
KEY `idx_ticks_symbol_date_time_type` (`symbol`,`date`,`time`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=13533258 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY KEY (`date`)
PARTITIONS 1 */;
As you can see, I have no idea what I am doing because I just keep on creating indexes to make my queries go faster.
Right now the data is being stored on a rather slow computer for testing purposes, so I understand that my queries are not nearly as fast as they could be (I have a 6-core, 64 GB RAM, SSD machine arriving tomorrow, which should help significantly).
That being said, I am running queries like this one
select time, value from ticks where symbol = "AAPL" AND date = 20150522 and type = 8 order by time asc
The query above, if I do not limit it, returns 12928 records for one of my test days and takes 10.2 seconds from a cleared cache.
I am doing lots of graphing and eventually would like to be able to query the data as I need it to graph. Right now I haven't noticed much difference in speed between getting part of a day's worth of data and getting the entire day's. It would be cool to have those queries respond fast enough that there is barely any delay when I move to the next day/screen or whatever.
Another query I am using, for the usability of a program I am writing to interact with the data, is:
String query = "select distinct `date` from ticks where symbol = '" + symbol + "' order by `date` desc";
But most of my need is the ability to pull a certain type of data from a certain day for a certain symbol like my first query.
I've googled all over the place and I think I understand that creating tons of indexes makes the database bigger and slows down the input speed (I get about 300 pieces of information per second on a busy day). Should I just index each column individually?
I am willing to throw more hard drives at things if it means a responsive interface.
Basically, my questions relate to the creation/altering of my table. Based on the above query, can you think of anything I could do to make it faster? Or an indexing system that would help me out? Is InnoDB even the right engine? I tried googling this vs MyISAM and after a couple of hours I still wasn't sure.
Thanks :)
Combine date and time into a DATETIME field
Assuming Price and Volume always come in together, put them together (2 columns) and get rid of Type.
Get rid of the AUTO_INCREMENT; change to PRIMARY KEY(symbol, datetime)
Get rid of any index that is a leftmost prefix of some other index.
Once you are using DATETIME, use date ranges to find everything in a single date (if you need that). Do not use DATE(datetime) = '...'; performance will be terrible. (See the sketch after this list.)
Symbol can probably be ascii, not utf8.
Use InnoDB, the clustering of the Primary Key can be beneficial.
Do you expect to collect (and use) more data than will fit in innodb_buffer_pool_size? If so, we need to discuss your SELECTs and look into PARTITIONing.
Make those changes, then come back for more advice/abuse.
You're creating a historical database, so MyISAM would work as well as InnoDB. InnoDB is a transactional storage engine, better suited for relational databases with multiple tables that must remain synchronized.
Your Stock table looks like this.
Stock
-----
Stock ID (idticks)
Symbol
Date
Time
Value
Type
It would be better if you combine the date and time into a time stamp column, and unpack the types like this.
Stock
-----
Stock ID
Symbol
Time Stamp
Volume
Open
Close
Bid
Ask
...
This makes it easier for the database to return rows for a query on a particular type, like the close value.
As far as indexes, you can create as many indexes as you want. You're adding (inserting) information, so the increased time to add information is offset by the decreased time to query the information.
I'd have a primary index on Stock ID, and a unique index on Symbol and Time Stamp descending. You could also have indexes on the values you query most often, like Close.

Most efficient query to get last modified record in large table

I have a table with a large number of records ( > 300,000). The most relevant fields in the table are:
CREATE_DATE
MOD_DATE
Those are updated every time a record is added or updated.
I now need to query this table to find the date of the record that was modified last. I'm currently using
SELECT mod_date FROM table ORDER BY mod_date DESC LIMIT 1;
But I'm wondering if this is the most efficient way to get the answer.
I've tried adding a where clause to limit the date to the last month, but it looks like that's actually slower (and I need the most recent date, which could be older than the last month).
I've also tried the suggestion I read elsewhere to use:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'db'
AND TABLE_NAME = 'table';
But since I might be working on a dump of the original, that query might return NULL. And it looks like it is actually slower than the original query.
I can't resort to last_insert_id() because I'm not updating or inserting.
I just want to make sure I have the most efficient query possible.
The most efficient way for this query would be to use an index for the column MOD_DATE.
From How MySQL Uses Indexes
8.3.1 How MySQL Uses Indexes
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially.
You can use
SHOW CREATE TABLE `table`;
(with your actual table name) to get the CREATE statement and see if an index on MOD_DATE is defined.
To add an index, you can use CREATE INDEX:
CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name
[index_type]
ON tbl_name (index_col_name,...)
[index_option]
[algorithm_option | lock_option] ...
see http://dev.mysql.com/doc/refman/5.6/en/create-index.html
Make sure that both of those fields are indexed.
Then I would just run -
select max(mod_date) from table
or create_date, whichever one.
Make sure to create 2 indexes, one on each date field, not a compound index on both.
As for a discussion of the difference between this and using limit, see MIN/MAX vs ORDER BY and LIMIT
Use EXPLAIN:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
This tells you how MySQL executes the statement; with that you can figure out the most efficient approach, since it depends on your db structure and there is no single universal solution.

MySQL indexing issue

I am having some difficulties finding an answer to this question...
For simplicity, let's use this situation.
I create a table like this..
CREATE TABLE `test` (
`MerchID` int(10) DEFAULT NULL,
KEY `MerchID` (`MerchID`)
) ENGINE=InnoDB AUTO_INCREMENT=32769 DEFAULT CHARSET=utf8;
I will insert some data into the column of this table...
INSERT INTO test
SELECT 1
UNION
SELECT 2
UNION
SELECT null
Now I examine the query using MySQL's EXPLAIN feature...
EXPLAIN
SELECT * FROM test
WHERE merchid IS NOT NULL
resulting in:
id=1, select_type=SIMPLE, table=test, type=index, possible_keys=MerchID, key=MerchID, key_len=5, ref=NULL, rows=3, Extra=Using where; Using index
In production, in my real procedure, something like this takes a long time with this index. If I redeclare the table with the index line reading KEY MerchID (MerchID) USING BTREE, I get much better results. The EXPLAIN feature seems to return the same results, too. I have read some basics about the BTREE, HASH and RTREE storage types for indexes/keys. When no storage type is specified, I was under the assumption that BTREE would be the default. However, I am kind of stumped as to why my procedure seems to fly when my index is modified to use this storage type. Any ideas?
I am using MySQL 5.1 and coding in MySQL Workbench. The part of the procedure that appears to be held up is like the one I illustrated above, where the column of a joined table is tested for NULL.
I think you are on the wrong path. For the InnoDB storage engine the only available index method is BTREE, so you are safe to omit the BTREE keyword from your table create script. The supported index types for each storage engine are listed in the MySQL CREATE INDEX documentation, along with other useful information.
The performance issue is coming from a different place.
Whenever testing performance, be sure to always use the SQL_NO_CACHE directive, otherwise, with query caching, the second time you run a query, your results may be returned a lot faster simply due to caching.
With a covering index (all of the selected and filtered columns are in the index), the query is rather efficient. Using index in the EXPLAIN result shows that it's being used as a covering index.
However, if the index were not a covering index, MySQL would have to perform a seek for each row returned by the index in order to grab the actual table data. While this would still be fast for a small result set, with a result set of 1 million rows, that would be 1 million seeks. If the number of NULL rows were a high percentage, MySQL would abandon the index altogether to avoid the seeks.
Ensure that your real "production" index is a covering index as well.