Very slow SQL ORDER BY, and EXPLAIN doesn't explain - mysql

The query:
SELECT
m.*,
mic.*
FROM
members m,
members_in_class_activities mic
WHERE
m.id = mic.member_id AND
mic.not_cancelled = '1' AND
mic.class_activity_id = '32' AND
mic.date = '2016-02-14' AND
mic.time = '11:00:00'
ORDER BY
mic.reservation_order
Table members has ~ 100k records, and table members_in_class_activities has about 300k. The result set is just 2 records. (Not 2k, just 2.)
All relevant columns are indexed (id and reservation_order are primary): member_id, not_cancelled, class_activity_id, date, time.
There is also a UNIQUE key for class_activity_id + member_id + date + time + not_cancelled. not_cancelled is NULL or '1'.
All other queries are very fast (1-5 ms), but this one is crazy slow: 600-1000 ms.
What doesn't help:
select only the 2 primary keys instead of * (0 % change)
a real JOIN instead of an implicit join (it actually seems slightly slower, but probably not) (0 % change)
removing the join to members entirely makes it slightly faster (15 % change)
What does help, immensely:
remove the ORDER BY on the primary key (99% change)
I have only 2 questions:
What????
How do I still ORDER BY x, but make it fast too? Do I really need a separate column?
I'm running 10.1.9-MariaDB on my dev machine, but it's slow on production's MySQL 5.5.27-log too.

Don't use ORDER BY on your main query. Try this:
SELECT * FROM (
... your query
) AS t ORDER BY t.reservation_order
As you mentioned, members_in_class_activities has about 300k records, so your ORDER BY will apply to all 300k records, which definitely slows down your query.
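A sketch of what the wrapped form could look like with the question's query inlined (the derived table needs an alias; only the two primary-key columns are selected here, since the question says selecting just those made no difference):
SELECT t.*
FROM (
    SELECT m.id AS member_id, mic.reservation_order
    FROM members m
    JOIN members_in_class_activities mic ON mic.member_id = m.id
    WHERE mic.not_cancelled = '1'
      AND mic.class_activity_id = '32'
      AND mic.date = '2016-02-14'
      AND mic.time = '11:00:00'
) AS t
ORDER BY t.reservation_order;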

class_activity_id + member_id + date + time + not_cancelled -- not optimal.
Start with the '=' fields in the WHERE in any order, then add on the ORDER BY field:
INDEX(class_activity_id, date, time, not_cancelled,
reservation_order)
Since you seem to need the UNIQUE constraint, then it would be almost as good to shuffle your index by putting member_id at the end (where it will be 'out of the way', but not used):
UNIQUE(class_activity_id, date, time, not_cancelled, member_id)
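A sketch of those two suggestions as DDL (the index names here are assumptions, not from the answer):
ALTER TABLE members_in_class_activities
  ADD INDEX idx_mic_covering (class_activity_id, date, time, not_cancelled, reservation_order);
-- Shuffled UNIQUE key; the existing UNIQUE key (its real name isn't given in the question) would then be dropped:
ALTER TABLE members_in_class_activities
  ADD UNIQUE KEY uniq_mic_shuffled (class_activity_id, date, time, not_cancelled, member_id);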
In general, it is bad to split a date and time apart; suggest a DATETIME column for the pair.
Cookbook on creating indexes.

Related

MySQL Month and Year grouping slow performance

I've got a report that needs to show the month and year profit from my transactions. The query I wrote works, but it is very slow, and I can't figure out how to change it so that it takes less time to load.
SELECT MONTH(MT4_TRADES.CLOSE_TIME) as MONTH
, YEAR(MT4_TRADES.CLOSE_TIME) as YEAR
, SUM(MT4_TRADES.SWAPS) as SWAPS
, SUM(MT4_TRADES.VOLUME)/100 as VOLUME
, SUM(MT4_TRADES.PROFIT) AS PROFIT
FROM MT4_TRADES
JOIN MT4_USERS
ON MT4_TRADES.LOGIN = MT4_USERS.LOGIN
WHERE MT4_TRADES.CMD < 2
AND MT4_TRADES.CLOSE_TIME <> "1970-01-01 00:00:00"
AND MT4_USERS.AGENT_ACCOUNT <> "1"
GROUP
BY YEAR(MT4_TRADES.CLOSE_TIME)
, MONTH(MT4_TRADES.CLOSE_TIME)
ORDER
BY YEAR
This is the full query; any suggestion would be highly appreciated.
This is the result of EXPLAIN:
Echoing the comment from @Barmar, look at the EXPLAIN output to see the query execution plan. Verify that suitable indexes are being used.
Likely the big rock in terms of performance is the "Using filesort" operation.
To get around that, we would need a suitable index available, and that would require some changes to the table. (The typical question on the "improve query performance" topic on SO comes with the restriction that we "can't add indexes or make any changes to the table".)
I'd be looking at a functional index (a feature added in MySQL 8.0). For MySQL 5.7, I'd be looking at adding generated columns and including the generated columns in a secondary index (generated columns were added in MySQL 5.7).
CREATE INDEX `MT4_TRADES_ix2` ON MT4_TRADES ((YEAR(close_time)),(MONTH(close_time)))
I'd be tempted to go with a covering index, and also change the grouping to a single expression e.g. DATE_FORMAT(close_time,'%Y-%m')
CREATE INDEX `MT4_TRADES_ix3` ON MT4_TRADES ((DATE_FORMAT(close_time,'%Y-%m'))
,swaps,volume,profit,login,cmd,close_time)
From the query, it looks like login is going to be UNIQUE in the MT4_USERS table; likely that's the PRIMARY KEY or a UNIQUE KEY, so an index is going to be available, but we're just guessing...
With suitable indexes available, we could do something like this:
SELECT DATE_FORMAT(close_time,'%Y-%m') AS close_year_mo
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.swaps ,NULL)) AS swaps
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.volume ,NULL))/100 AS volume
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.profit ,NULL)) AS profit
FROM MT4_TRADES t
JOIN MT4_USERS u
ON u.login = t.login
AND u.agent_account <> '1'
GROUP BY close_year_mo
ORDER BY close_year_mo
and we'd expect MySQL to do a loose index scan, with the EXPLAIN output showing "Using index for group-by" and not showing "Using filesort".
EDIT
For versions of MySQL before 5.7, we could create new columns, e.g. year_close and month_close, and populate the columns with the results of the expressions YEAR(close_time) and MONTH(close_time) (we could create BEFORE INSERT and BEFORE UPDATE triggers to handle that automatically for us).
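A rough sketch of that approach (the column and trigger names here are assumptions):
ALTER TABLE MT4_TRADES
  ADD COLUMN year_close SMALLINT UNSIGNED NULL,
  ADD COLUMN month_close TINYINT UNSIGNED NULL;

DELIMITER $$
CREATE TRIGGER mt4_trades_bi BEFORE INSERT ON MT4_TRADES
FOR EACH ROW
BEGIN
  SET NEW.year_close  = YEAR(NEW.close_time);
  SET NEW.month_close = MONTH(NEW.close_time);
END$$
CREATE TRIGGER mt4_trades_bu BEFORE UPDATE ON MT4_TRADES
FOR EACH ROW
BEGIN
  SET NEW.year_close  = YEAR(NEW.close_time);
  SET NEW.month_close = MONTH(NEW.close_time);
END$$
DELIMITER ;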
Then we could create an index with those columns as the leading columns
CREATE INDEX ... ON MT4_TRADES ( year_close, month_close, ... )
And then reference the new columns in the query
SELECT t.year_close AS `YEAR`
, t.month_close AS `MONTH`
FROM MT4_TRADES t
JOIN ...
WHERE ...
GROUP
BY t.year_close
, t.month_close
Ideally, include in the index all of the referenced columns from MT4_TRADES, to make a covering index for the query.
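For example, something along these lines (the index name and exact column order are assumptions, based on the columns the query touches):
CREATE INDEX MT4_TRADES_ix4
  ON MT4_TRADES (year_close, month_close, cmd, close_time, login, swaps, volume, profit);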

MySQL Slow query ~ 10 seconds

I have this query which basically goes through a bunch of tables to get me some formatted results, but I can't seem to find the bottleneck. The easiest bottleneck to remove was the ORDER BY RAND(), but the performance is still bad.
The query takes from 10 sec to 20 secs without ORDER BY RAND();
SELECT
c.prix AS prix,
ST_X(a.point) AS X,
ST_Y(a.point) AS Y,
s.sizeFormat AS size,
es.name AS estateSize,
c.title AS title,
DATE_FORMAT(c.datePub, '%m-%d-%y') AS datePub,
dbr.name AS dateBuiltRange,
m.myId AS meuble,
c.rawData_id AS rawData_id,
GROUP_CONCAT(img.captionWebPath) AS paths
FROM
immobilier_ad_blank AS c
LEFT JOIN PropertyFeature AS pf ON (c.propertyFeature_id = pf.id)
LEFT JOIN Adresse AS a ON (c.adresse_id = a.id)
LEFT JOIN Size AS s ON (pf.size_id = s.id)
LEFT JOIN EstateSize AS es ON (pf.estateSize_id = es.id)
LEFT JOIN Meuble AS m ON (pf.meuble_id = m.id)
LEFT JOIN DateBuiltRange AS dbr ON (pf.dateBuiltRange_id = dbr.id)
LEFT JOIN ImageAd AS img ON (img.commonAd_id = c.rawData_id)
WHERE
c.prix != 0
AND pf.subCatMyId = 1
AND (
(
c.datePub > STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y')
)
OR date_format(c.datePub, '%d-%m-%Y') = '30-04-2016'
)
AND a.validPoint = 1
GROUP BY
c.id
#ORDER BY
# RAND()
LIMIT
5000
Here is the EXPLAIN of the query, the visual portion of the plan, and a screenshot of mysqltuner:
EDIT 1
I have many indexes. Here they are:
EDIT 2:
So you guys did it. Down to 0.5 to 2.5 secs.
I mostly followed all of your advice, changed some of my.cnf, and ran OPTIMIZE on my tables.
You're searching for dates in a very suboptimal way. Try this.
... c.datePub >= STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y') + INTERVAL 1 DAY
That allows a range scan on an index on the datePub column. You should create a compound index for that table on (datePub, prix, adresse_id, rawData_id) and see if it helps.
Also try an index on a (validPoint). Notice that your use of a geometry data type in that table is probably not helping anything.
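A sketch of those two suggestions as DDL (the index names are assumptions; the column names are taken from the query above):
ALTER TABLE immobilier_ad_blank
  ADD INDEX idx_datepub_cover (datePub, prix, adresse_id, rawData_id);
ALTER TABLE Adresse
  ADD INDEX idx_validpoint (validPoint);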
To begin with, you have quite a lot of indexes, but many of them are not useful. Remember, more indexes mean slower inserts and updates. Also, MySQL is not good at using more than one index per table in complex queries. The following indexes have a cardinality < 10 and probably should be dropped.
IDX_...E88B
IDX....62AF
IDX....7DEE
idx2
UNIQ...F210
UNIQ...F210..
IDX....0C00
IDX....A2F1
At this point I got tired of the exercise; there are many more.
Then you have some duplicated data.
point
lat
lng
The point field has the lat and lng in it. So the latter two are not needed. That means you can lose two more indexes idxlat and idxlng. I am not quite sure how idxlng appears twice in the index list for the same table.
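If nothing else reads them, cleaning this up could look roughly like this (assuming the lat/lng columns and their indexes live on the Adresse table, which is where the query reads the point column from):
ALTER TABLE Adresse
  DROP INDEX idxlat,
  DROP INDEX idxlng;
-- The lat and lng columns themselves could also be dropped once nothing else depends on them.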
These optimizations will lead to an overall increase in performance for INSERTS and UPDATES and possibly for all SELECTs as well because the query planner needs to spend less time deciding which index to use.
Then we notice from your EXPLAIN that the query does not use any index on table Adresse (a). But your WHERE clause has a.validPoint = 1, so clearly you need an index on it, as suggested by @Ollie-Jones.
However I suspect that this index may have low cardinality. In that case I recommend that you create a composite index on this column + another.
The problem is your join with (a). The table has an index, but the index can't be used, more than likely due to the sort (/group by), or possibly incompatible types. The EXPLAIN shows three quarters of a million rows examined, which means that an index lookup was not possible.
When designing a query, look for the smallest possible result set - search by that index, and then join from there. Perhaps "c" isn't the best table for the primary query.
(You could try using FORCE INDEX (id) on table a; if it doesn't work, the error may give you more information.)
As others have pointed out, you need an index on a.validPoint, but what about c.datePub, which is also used in the WHERE clause? Why not a multiple-column index on (datePub, adresse_id)? The index on adresse_id is already used, so a multiple-column index will be better here.

MySQL - turning data points into ranges

I have a database of measurements that indicate a sensor, a reading, and the timestamp the reading was taken. The measurements are only recorded when there's a change. I want to generate a result set that shows the time range during which each sensor was reading a particular measurement.
The timestamps are in milliseconds but I'm outputting the result in seconds.
Here's the table:
CREATE TABLE `raw_metric` (
`row_id` BIGINT NOT NULL AUTO_INCREMENT,
`sensor_id` BINARY(6) NOT NULL,
`timestamp` BIGINT NOT NULL,
`angle` FLOAT NOT NULL,
PRIMARY KEY (`row_id`)
)
Right now I'm getting the results I want using a subquery, but it's fairly slow when there's a lot of datapoints:
SELECT row_id,
HEX(sensor_id),
angle,
(
COALESCE((
SELECT MIN(`timestamp`)
FROM raw_metric AS rm2
WHERE rm2.`timestamp` > rm1.`timestamp`
AND rm2.sensor_id = rm1.sensor_id
), UNIX_TIMESTAMP() * 1000) - `timestamp`
) / 1000 AS duration
FROM raw_metric AS rm1
Essentially, to get the range, I need to get the very next reading (or use the current time if there isn't another reading). The subquery finds the minimum timestamp that is later than the current one but is from the same sensor.
This query isn't going to occur very often so I'd prefer to not have to add an index on the timestamp column and slow down inserts. I was hoping someone might have a suggestion as to an alternate way of doing this.
UPDATE:
The row_ids should be incremented along with timestamps, but it can't be guaranteed due to network latency issues. So, it's possible that an entry with a lower row_id occurs AFTER a later row_id, though it's unlikely.
This is perhaps more appropriate as a comment than as a solution, but it is too long for a comment.
You are trying to implement the lead() function in MySQL, and MySQL does not, unfortunately, have window functions. You could switch to Oracle, DB2, Postgres, SQL Server 2012 and use the built-in (and optimized) functionality there. Ok, that may not be realistic.
So, given your data structure, you need to do either a correlated subquery or a non-equijoin (actually a partial equi-join, because there is a match on sensor_id). These are going to be expensive operations unless you add an index. Unless you are adding measurements tens of times per second, the additional overhead of the index should not be a big deal.
You could also change your data structure. If you had a "sensor counter" that was a sequential number enumerating the readings, then you could use this as an equijoin (although for good performance you might still want an index). Adding this to your table would require a trigger -- and that is likely to perform even worse than an index when inserting.
If you only have a handful of sensors, you could create a separate table for each one. Oh, I can feel the groans at this suggestion. But, if you did, then an auto-incremented id would perform the same role. To be honest, I would only do this if I could count the number of sensors on each hand.
In the end, I might suggest that you take the hit during insertion and have "effective" and "end" times on each record (as well as an index on sensor_id and either timestamp or id). With these additional columns, you will probably find more uses for the table.
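As an aside, on MySQL 8.0 or later (which this question predates, so this is only a sketch of what the lead() approach looks like there), the query and the suggested (sensor_id, timestamp) index are straightforward:
ALTER TABLE raw_metric ADD INDEX idx_sensor_ts (sensor_id, `timestamp`);

SELECT row_id,
       HEX(sensor_id),
       angle,
       (COALESCE(LEAD(`timestamp`) OVER (PARTITION BY sensor_id ORDER BY `timestamp`),
                 UNIX_TIMESTAMP() * 1000) - `timestamp`) / 1000 AS duration
FROM raw_metric;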
If you are doing this for just one sensor, then create a temporary table for the information and use an auto-incremented id column. Then insert the data into it:
insert into temp_rawmetric (orig_row_id, sensor_id, timestamp, angle)
select row_id, sensor_id, timestamp, angle
from raw_metric
order by sensor_id, timestamp;
Be sure your table has a temp_rawmetric_id column that is auto-incremented and the primary key (creates an index automatically). The order by makes sure this is incremented according to the timestamp.
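A sketch of what that temporary table could look like (the table name and the orig_row_id column come from the answer above; the column types are assumed to mirror raw_metric):
CREATE TEMPORARY TABLE temp_rawmetric (
  temp_rawmetric_id BIGINT NOT NULL AUTO_INCREMENT,
  orig_row_id BIGINT NOT NULL,
  sensor_id BINARY(6) NOT NULL,
  `timestamp` BIGINT NOT NULL,
  angle FLOAT NOT NULL,
  PRIMARY KEY (temp_rawmetric_id)
);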
Then you can do your query as:
select trm.sensor_id, trm.angle,
trm.timestamp as startTime, trmnext.timestamp as endTime
from temp_rawmetric trm left outer join
temp_rawmetric trmnext
on trmnext.temp_rawmetric_id = trm.temp_rawmetric_id+1;
This will require a pass through the original data to extract the data, and then a primary key join on the temporary table. The first might take some time. The second should be pretty quick.
Select rm1.row_id
,HEX(rm1.sensor_id)
,rm1.angle
,(COALESCE(rm2.timestamp, UNIX_TIMESTAMP() * 1000) - rm1.timestamp) as duration
from raw_metric rm1
left outer join
raw_metric rm2
on rm2.sensor_id = rm1.sensor_id
and rm2.timestamp = (
select min(timestamp)
from raw_metric rm3
where rm3.sensor_id = rm1.sensor_id
and rm3.timestamp > rm1.timestamp
)
If you use AUTO_INCREMENT for the primary key, you may replace timestamp with row_id in the query's condition, like this:
SELECT row_id,
HEX(sensor_id),
angle,
(
COALESCE((
SELECT MIN(`timestamp`)
FROM raw_metric AS rm2
WHERE rm2.`row_id` > rm1.`row_id`
AND rm2.sensor_id = rm1.sensor_id
), UNIX_TIMESTAMP() * 1000) - `timestamp`
) / 1000 AS duration
FROM raw_metric AS rm1
It should run a bit faster.
You can also add one more subquery to quickly select the row_id of the next sensor reading. See:
SELECT row_id,
HEX(sensor_id),
angle,
(
COALESCE((
SELECT timestamp FROM raw_metric AS rm1a
WHERE row_id =
(
SELECT MIN(`row_id`)
FROM raw_metric AS rm2
WHERE rm2.`row_id` > rm1.`row_id`
AND rm2.sensor_id = rm1.sensor_id
)
), UNIX_TIMESTAMP() * 1000) - `timestamp`
) / 1000 AS duration
FROM raw_metric AS rm1

mysql / matlab: optimize query - removing dates from a list

I have a table with ~3M rows. The rows are date, time, msec, and some other columns with int data. Some unknown fraction of these rows are considered 'invalid' based on their existence in a separate table outages (based on date ranges).
Currently the query does a SELECT * and then uses a huge WHERE to remove the invalid date ranges (lots of AND NOT (RecordDate > '2008-08-05' AND RecordDate < '2008-08-10'), and so on). This blows away any chance of using an index.
I'm looking for a better way to limit the results. As it stands now, the query takes several minutes to run.
DELETE b FROM bigtable b
INNER JOIN outages o ON (b.`date` BETWEEN o.datestart AND o.dateend)
WHERE (1=1) -- In some modes MySQL demands a WHERE clause or it will not run.
Make sure you have an index on all fields involved in the query.
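For example, something like this (the index names are assumptions; the columns are the ones used in the join above):
ALTER TABLE bigtable ADD INDEX idx_bigtable_date (`date`);
ALTER TABLE outages ADD INDEX idx_outages_range (datestart, dateend);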

How can I speed up a MySQL query with a large offset in the LIMIT clause?

I'm getting performance problems when LIMITing a mysql SELECT with a large offset:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
How can I optimize this statement for better performance?
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)
slow:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
There's a blog post somewhere on the internet about how the selection of the rows to show should be made as compact as possible (thus: just the ids), and producing the complete results should in turn fetch all the data you want for only the rows you selected.
Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statement, or it doesn't improve anything (against hope), it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
I don't think there's any need to create a separate index if your table already has one. If so, then you can order by this primary key and then use values of the key to step through:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the ID, so that it can simply read the index and doesn't have to then locate all the data (reducing IO overhead). If you need some of the other columns, then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup), although this will not be appropriate for all cases, so you will have to have a play.
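As a purely hypothetical sketch (the extra column names here are made up), a covering index for that idea might look like:
ALTER TABLE myBigTable ADD INDEX idx_page_cover (id, name, created_at);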
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no row gaps.
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
I have run into this problem recently. The problem had two parts to fix. First I had to use an inner select in my FROM clause that did my limiting and offsetting for me on the primary key only:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the from part of my query:
'titles.id',
'title_eisbns_concat.eisbns_concat',
'titles.pub_symbol',
'titles.title',
'titles.subtitle',
'titles.contributor1',
'titles.publisher',
'titles.epub_date',
'titles.ebook_price',
'publisher_licenses.id as pub_license_id',
'license_types.shortname',
$coversQuery
)
->from($subQuery)
->leftJoin('titles', 't.id', '=', 'titles.id')
->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
The first time I created this query I had used OFFSET and LIMIT in MySQL. This worked fine until I got past page 100; then the OFFSET started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySQL hasn't sped up OFFSET, but BETWEEN seems to reel it back in.