MySQL performance issue on a simple two-table join query

I'm facing a performance issue on MySQL and I'm unable to understand where I'm wrong. The machine runs MySQL Server 5.7.15 with two 64-bit Xeon processors and 8 GB of RAM.
I've got two tables:
Table data_raw contains several fields (see VRMS0, VRMS1, VRMS2, PWRA0, PWRA1, PWRA2)
describing the voltages and active powers acquired from complicated instrumentation every 30 seconds from several probes in the field; each probe is uniquely identified by its DEVICE_ID.
Table data_timeslots contains a few fields and is used to keep track of when each data_raw record was sent (see the SRV_TIMESTAMP field)
and from which device (see the DEVICE_ID field).
Each table contains about 7,800,000 records.
The two tables are joined on the PK ID (auto-increment) of data_timeslots and the PK TIMESLOT_ID of data_raw.
Here is the query:
SELECT D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2,T.DEVICE_ID, T.SRV_TIMESTAMP
FROM data_raw AS D FORCE INDEX(PRIMARY)
INNER JOIN data_timeslots AS T ON T.ID=D.TIMESLOT_ID
WHERE T.DEVICE_ID='CEC02'
ORDER BY T.ID DESC LIMIT 1
The query always takes about 10 seconds, while the same kind of query on a single table takes a few milliseconds.
In other words, the query
SELECT * FROM data_raw order by TIMESLOT_ID desc limit 1
takes just 0.0071 sec, and the query
SELECT * FROM data_timeslots order by ID desc limit 1
takes just 0.0042 sec, so I'm wondering why the join takes so long.
Where is the bottleneck?
P.S. EXPLAIN shows that the DB is properly using the PK for the operation.
Below is the EXPLAIN output:
EXPLAIN SELECT D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2,T.DEVICE_ID, T.SRV_TIMESTAMP FROM data_raw AS D INNER JOIN data_timeslots AS T ON T.ID=D.TIMESLOT_ID WHERE T.DEVICE_ID='XXXXX' ORDER BY T.ID ASC LIMIT 1

id  select_type  table  type    possible_keys                   key      key_len  ref                 rows  filtered  Extra
1   SIMPLE       T      index   PRIMARY,PK_CLUSTER_T,DEVICE_ID  PRIMARY  8        NULL                30    3.23      Using where
1   SIMPLE       D      eq_ref  PRIMARY                         PRIMARY  8        splc_smartpwr.T.ID  1     100.00    NULL
UPDATE (suggested by @Alberto_Delgado_Roda): if I use ASC LIMIT 1, the query takes just 0.0261 sec.

Reply to "why"
data_timeslots has a clustered index that suits the ascending order.
How the Clustered Index Speeds Up Queries
Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
See https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html
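To double-check which indexes actually exist on the driving table (a quick sketch; the table name is taken from the question), you can run:
SHOW INDEX FROM data_timeslots;
SHOW CREATE TABLE data_timeslots;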

Try this:
1: What happens if you replace INNER JOIN with STRAIGHT_JOIN?
SELECT D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2,T.DEVICE_ID, T.SRV_TIMESTAMP
FROM data_raw AS D FORCE INDEX(PRIMARY)
STRAIGHT_JOIN data_timeslots AS T ON T.ID=D.TIMESLOT_ID
WHERE T.DEVICE_ID='CEC02'
ORDER BY T.ID DESC LIMIT 1
2: What happens if you replace DESC LIMIT 1 with ASC LIMIT 1?
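For reference, the ASC variant is the same query with only the sort direction changed:
SELECT D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2,T.DEVICE_ID, T.SRV_TIMESTAMP
FROM data_raw AS D FORCE INDEX(PRIMARY)
INNER JOIN data_timeslots AS T ON T.ID=D.TIMESLOT_ID
WHERE T.DEVICE_ID='CEC02'
ORDER BY T.ID ASC LIMIT 1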

I just figured out that the query:
SELECT T.ID,T.DEVICE_ID, T.SRV_TIMESTAMP, D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2 FROM data_timeslots as T INNER JOIN data_raw AS D ON D.TIMESLOT_ID=T.ID ORDER BY T.ID DESC LIMIT 1
runs in just 0.0174 sec, as expected. I just reversed the table order in the statement and the result changed dramatically. The question now is: why?
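Building on that observation, here is a hedged sketch (untested against the original schema) that keeps the DEVICE_ID filter but drives the join from data_timeslots, using STRAIGHT_JOIN to pin that join order:
SELECT T.ID, T.DEVICE_ID, T.SRV_TIMESTAMP,
       D.VRMS0, D.VRMS1, D.VRMS2, D.PWRA0, D.PWRA1, D.PWRA2
FROM data_timeslots AS T
STRAIGHT_JOIN data_raw AS D ON D.TIMESLOT_ID = T.ID
WHERE T.DEVICE_ID = 'CEC02'
ORDER BY T.ID DESC
LIMIT 1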

Related

Unexpected result by MySQL Optimizer for identical queries?

The following query works as expected and uses an index.
The query takes 0.0481 sec:
SELECT
geodb_locations.name,
geodb_locations.name_url,
COUNT(user.uid) AS useranzahl
FROM
user
LEFT JOIN
geodb_locations ON geodb_locations.id=user.plz
WHERE
user.freigeben=1 AND
geodb_locations.adm0='AT'
GROUP BY user.plz
ORDER BY useranzahl DESC
LIMIT 25
[EXPLAIN output: screenshot not included]
If only the country code within the query is changed from AT to DE,
the query takes about 2.5 sec and does not use the index:
SELECT
geodb_locations.name,
geodb_locations.name_url,
COUNT(user.uid) AS useranzahl
FROM
user
LEFT JOIN
geodb_locations ON geodb_locations.id=user.plz
WHERE
user.freigeben=1 AND
geodb_locations.adm0='DE'
GROUP BY user.plz
ORDER BY useranzahl DESC
LIMIT 25
[EXPLAIN output: screenshot not included]
Why is the index not used by the optimizer for the second query, and how can I improve the query?
2.5 seconds is too long.
If user.uid cannot be NULL, use COUNT(*) instead of COUNT(user.uid).
As already pointed out, remove LEFT.
Add these indexes (see the sketch below):
user: (freigeben, plz)
geodb_locations: (adm0, name_url, name)
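Spelled out as statements, a minimal sketch of those composite indexes (the index names are made up for illustration):
ALTER TABLE `user` ADD INDEX idx_freigeben_plz (freigeben, plz);
ALTER TABLE geodb_locations ADD INDEX idx_adm0_name_url_name (adm0, name_url, name);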
As for why the EXPLAIN changed, ... It is quite normal (but somewhat rare) for the distribution of the constants to determine what order the tables are touched (Austria is less common than Germany?) or which index to use.
Regardless of optimizations, this query will have to scan a lot more rows for DE than for AT; this has to happen before the sort (ORDER BY) and LIMIT.
Two things prevent much optimization:
The WHERE references both tables.
The ORDER BY depends on a computed value.
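To see how lopsided the two constants really are, a quick count along these lines (a sketch using the tables and columns from the question) makes the difference visible:
SELECT geodb_locations.adm0, COUNT(*) AS matching_users
FROM user
INNER JOIN geodb_locations ON geodb_locations.id = user.plz
WHERE user.freigeben = 1
  AND geodb_locations.adm0 IN ('AT', 'DE')
GROUP BY geodb_locations.adm0;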

Large SQL database - solving efficiency

I have the following SQL query which, when I originally coded it, was exceptionally fast; it now takes over 1 second to complete:
SELECT counted/scount as ratio, [etc]
FROM
playlists
LEFT JOIN (
select AID, PLID FROM (SELECT AID, PLID FROM p_s ORDER BY `order` asc, PLSID desc)as g GROUP BY PLID
) as t USING(PLID)
INNER JOIN (
SELECT PLID, count(PLID) as scount from p_s LEFT JOIN audio USING(AID) WHERE removed='0' and verified='1' GROUP BY PLID
) as g USING(PLID)
LEFT JOIN (
select AID, count(AID) as counted FROM a_p_all WHERE ".time()." - playtime < 2678400 GROUP BY AID
) as r USING(AID)
LEFT JOIN audio USING (AID)
LEFT JOIN members USING (UID)
WHERE scount > 4 ORDER BY ratio desc
LIMIT 0, 20
I have identified the problem: the a_p_all table has over 500k rows, and this is slowing down the query. I have come up with a solution:
Create a smaller temporary table that only stores the data necessary, and delete anything older than is needed (a rough sketch of this idea is below).
However, is there a better method to use? Optimally I wouldn't need a temporary table; what do sites such as YouTube/Facebook do for large tables to keep query times fast?
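A minimal sketch of that side-table idea, assuming a_p_all has AID and playtime columns as in the query above; the table name a_p_recent is made up, and it is a regular table rather than a true TEMPORARY table so it persists between requests:
CREATE TABLE a_p_recent LIKE a_p_all;
# initial fill with only the last 31 days
INSERT INTO a_p_recent
SELECT * FROM a_p_all
WHERE playtime >= UNIX_TIMESTAMP() - 2678400;
# run periodically to age out old rows
DELETE FROM a_p_recent
WHERE playtime < UNIX_TIMESTAMP() - 2678400;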
EDIT
This is the EXPLAIN output for the query in the answer from @spencer7593:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 20
1 PRIMARY u eq_ref PRIMARY PRIMARY 8 q.AID 1 Using index
1 PRIMARY m eq_ref PRIMARY PRIMARY 8 q.UID 1 Using index
3 DERIVED <derived6> ALL NULL NULL NULL NULL 20
6 DERIVED t ALL NULL NULL NULL NULL 21
5 DEPENDENT SUBQUERY s ALL NULL NULL NULL NULL 49 Using where; Using filesort
4 DEPENDENT SUBQUERY c ALL NULL NULL NULL NULL 49 Using where
4 DEPENDENT SUBQUERY o eq_ref PRIMARY PRIMARY 8 database.c.AID 1 Using where
2 DEPENDENT SUBQUERY a ALL NULL NULL NULL NULL 510594 Using where
Two "big rock" issues stand out to me.
Firstly, this predicate
WHERE ".time()." - playtime < 2678400
I'm assuming that this isn't the actual SQL being submitted to the database, but that what's being sent to the database is something like this:
WHERE 1409192073 - playtime < 2678400
such that we want only rows where playtime is within the past 31 days (i.e. within 31*24*60*60 seconds of the integer value returned by time()).
This predicate can't make use of a range scan operation on a suitable index on playtime. MySQL evaluates the expression on the left side for every row in the table (every row that isn't excluded by some other predicate), and the result of that expression is compared to the literal on the right.
To improve performance, rewrite the predicate so that the comparison is made on the bare column. Compare the value stored in the playtime column to an expression that needs to be evaluated only once, for example:
WHERE playtime > 1409192073 - 2678400
With a suitable index available, MySQL can perform a "range" scan operation, and efficiently eliminate a boatload of rows that don't need to be evaluated.
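For the inline view aliased r in the original query, that rewrite (using the example literal from above) might look like the sketch below; the supporting index name is made up, and the column layout is assumed from the question:
# illustrative index so the range predicate on playtime has something to scan
ALTER TABLE a_p_all ADD INDEX idx_playtime_aid (playtime, AID);

SELECT AID, COUNT(AID) AS counted
FROM a_p_all
WHERE playtime > 1409192073 - 2678400
GROUP BY AID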
The second "big rock" is the inline views, or "derived tables" in MySQL parlance. MySQL is much different than other databases in how inline views are processed. MySQL actually runs that innermost query, and stores the result set as a temporary MyISAM table, and then the outer query runs against the MyISAM table. (The name that MySQL uses, "derived table", makes sense when we understand how MySQL processes the inline view.) Also, MySQL does not "push" predicates down, from an outer query down into the view queries. And on the derived table, there are no indexes created. (I believe MySQL 5.7 is changing that, and does sometimes create indexes, to improve performance.) But large "derived tables" can have a significant performance impact.
Also, the LIMIT clause gets applied last in the statement processing; that's after all the rows in the resultset are prepared and sorted. Even if you are returning only 20 rows, MySQL still prepares the entire resultset; it just doesn't transfer them to the client.
Lots of the column references are not qualified with the table name or alias, so we don't know, for example, which table (p_s or audio) contains the removed and verified columns.
(We know it can't be both, since MySQL isn't throwing an "ambiguous column" error. But MySQL has access to the table definitions, where we don't. MySQL also knows something about the cardinality of the columns, in particular which columns (or combinations of columns) are UNIQUE, and which columns can contain NULL values, etc.)
Best practice is to qualify ALL column references with the table name or (preferably) a table alias. (This makes it much easier on the human reading the SQL, and it also keeps a query from breaking when a new column is added to a table.)
Also, the query has a LIMIT clause, but there's no ORDER BY clause (or implied ORDER BY), which makes the resultset indeterminate. We have no guarantee of which rows will be the "first" ones returned.
EDIT
To return only 20 rows from playlists (out of thousands or more), I might try using correlated subqueries in the SELECT list; using a LIMIT clause in an inline view to winnow down the number of rows that I'd need to run the subqueries for. Correlated subqueries can eat your lunch (and your lunchbox too) in terms of performance with large sets, due to the number of times those need to be run.
From what I can gather, you are attempting to return 20 rows from playlists, picking up the related row from member (by the foreign key in playlists), finding the "first" song in the playlist; getting a count of times that "song" has been played in the past 31 days (from any playlist); getting the number of times a song appears on that playlist (as long as it's been verified and hasn't been removed... the outerness of that LEFT JOIN is negated by the predicates on the removed and verified columns, if either of those columns is from the audio table...).
I'd take a shot with something like this, to compare performance:
SELECT q.*
, ( SELECT COUNT(1)
FROM a_p_all a
WHERE a.playtime < 1409192073 - 2678400
AND a.AID = q.AID
) AS counted
FROM ( SELECT p.PLID
, p.UID
, p.[etc]
, ( SELECT COUNT(1)
FROM p_s c
JOIN audio o
ON o.AID = c.AID
AND o.removed='0'
AND o.verified='1'
WHERE c.PLID = p.PLID
) AS scount
, ( SELECT s.AID
FROM p_s s
WHERE s.PLID = p.PLID
ORDER BY s.order ASC, s.PLSID DESC
LIMIT 1
) AS AID
FROM ( SELECT t.PLID
, t.[etc]
FROM playlists t
ORDER BY NULL
LIMIT 20
) p
) q
LEFT JOIN audio u ON u.AID = q.AID
LEFT JOIN members m ON m.UID = q.UID
LIMIT 0, 20
UPDATE
Dude, the EXPLAIN output is showing that you don't have suitable indexes available. To get any decent chance at performance with the correlated subqueries, you're going to want to add some indexes (concrete statements are sketched after the list), e.g.
... ON a_p_all (AID, playtime)
... ON p_s (PLID, order, PLSID, AID)
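Spelled out as statements (a sketch; the index names are made up, and `order` needs backticks because it is a reserved word):
CREATE INDEX idx_a_p_all_aid_playtime ON a_p_all (AID, playtime);
CREATE INDEX idx_p_s_plid_order ON p_s (PLID, `order`, PLSID, AID);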

Slow MySQL query with AS and subquery

I have a problem with this slow query that runs for 10+ seconds:
SELECT DISTINCT siteid,
storyid,
added,
title,
subscore1,
subscore2,
subscore3,
( 1 * subscore1 + 0.8 * subscore2 + 0.1 * subscore3 ) AS score
FROM articles
WHERE added > '2011-10-23 09:10:19'
AND ( articles.feedid IN (SELECT userfeeds.siteid
FROM userfeeds
WHERE userfeeds.userid = '1234')
OR ( articles.title REGEXP '[[:<:]]keyword1[[:>:]]' = 1
OR articles.title REGEXP '[[:<:]]keyword2[[:>:]]' = 1 ) )
ORDER BY score DESC
LIMIT 0, 25
This outputs a list of stories based on the sites that a user added to his account. The ranking is determined by score, which is made up out of the subscore columns.
The query uses filesort and uses indices on PRIMARY and feedid.
Results of an EXPLAIN:
id  select_type         table      type            possible_keys                  key      ref   rows    Extra
1   PRIMARY             articles   range           PRIMARY,added,storyid          PRIMARY        729263  Using where; Using filesort
2   DEPENDENT SUBQUERY  userfeeds  index_subquery  storyid,userid,siteid_storyid  siteid   func  1       Using where
Any suggestions to improve this query? Thank you.
I would move the calculation logic to the client and only load fields from the database. This makes your query and the calculation itself faster. It's not a good style to do such things in SQL code.
The regex is also very slow; another search mode such as LIKE might be faster (see the sketch below).
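A hedged sketch of the LIKE variant (note that this is not an exact word-boundary match like the original REGEXP, and a pattern with a leading wildcard still cannot use an ordinary index, so it mainly avoids the cost of the regex engine):
SELECT DISTINCT siteid, storyid, added, title, subscore1, subscore2, subscore3,
       (1 * subscore1 + 0.8 * subscore2 + 0.1 * subscore3) AS score
FROM articles
WHERE added > '2011-10-23 09:10:19'
  AND ( articles.feedid IN (SELECT userfeeds.siteid
                            FROM userfeeds
                            WHERE userfeeds.userid = '1234')
        OR articles.title LIKE '%keyword1%'
        OR articles.title LIKE '%keyword2%' )
ORDER BY score DESC
LIMIT 0, 25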
Looking at your EXPLAIN, it doesn't appear your query is utilizing any index (thus the filesort). This is being caused by the sort on the calculated column (score).
Another barrier is the size of the table (729,263 rows). You don't want to create an index that is too wide, as it will take much more space and impact the performance of your CUD operations. What we want to do is target the columns that are being selected; however, in this situation we can't, since the sort is on a calculated column. You can try creating a VIEW (sketched below), or remove the sort, or do it at the application layer.
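A minimal sketch of the VIEW idea (MySQL still computes the expression at query time, so this mainly buys readability and reuse rather than a guaranteed speedup):
CREATE VIEW articles_scored AS
SELECT siteid, storyid, feedid, added, title,
       subscore1, subscore2, subscore3,
       (1 * subscore1 + 0.8 * subscore2 + 0.1 * subscore3) AS score
FROM articles;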

How can I optimize this query? It takes more than a minute to execute

I have the following query, which takes more than a minute to execute. How can I optimize it? It is slow because of ORDER BY o.id DESC; if I remove it, the query executes in a few ms.
select o.*, per.email, p.name
from `order` o
inner join product p
on o.product_id=p.id
inner join person per
on o.person_id=per.id
order by o.id desc
limit 100;
Following is the result of EXPLAIN:
id  select_type  table  type    possible_keys                          key                 key_len  ref              rows  Extra
1   SIMPLE       p      index   PRIMARY                                FK2EFC6C1E5DE2FC    8        NULL             6886  Using index; Using temporary; Using filesort
1   SIMPLE       o      ref     FK67E9050121C383DB,FK67E90501FC44A17C  FK67E90501FC44A17C  8        dev.p.id         58
1   SIMPLE       per    eq_ref  PRIMARY                                PRIMARY             8        dev.o.person_id  1     Using index
All the tables are InnoDB and the joins are on primary and foreign keys. Other than that, there are indexes on the email column in Person and the status column in Order.
Number of records in each table
Person : 1,300,000
Product: 7,000
Order : 70,000
The planner, most probably, is not using the LIMIT to eliminate rows from the order table before the join. So the server has to do the join for all rows and then return just a few.
Try this:
select o.* from
(select * from `order` order by id desc limit 100) o
inner join product p
on o.product_id=p.id
inner join person per
on o.person_id=per.id
order by o.id desc limit 100;
EDIT: This will work only if there is a constraint guaranteeing that corresponding rows are present in Product and Person tables.
Yes, I understand your question. In this case, first of all save your query in a .sql file.
You can use the SQL Server utility called "Database Engine Tuning Advisor", found in the Tools menu of SQL Server Management Studio. (Note that this is a Microsoft SQL Server tool, not a MySQL one.)
First open SQL Server Management Studio, go to Tools and select "Database Engine Tuning Advisor".
Then select the file you stored previously.
Now tick your database and the table which is used in the query, and then
click on Start Analysis in the main menu.
It shows you possible indexes and statistics to create.
To apply this recommendation to your query, go to the Actions menu and select "Apply Recommendations"; that will create the indexes and statistics on your table and reduce the execution time of the query.
Increase the value of the read_rnd_buffer_size variable and restart the MySQL server.
Setting this variable to a large value can improve ORDER BY performance by a lot.
You can also refer to http://www.mysqlperformanceblog.com/2007/07/24/what-exactly-is-read_rnd_buffer_size/
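For a quick experiment without a restart, the variable can also be raised for the current session only (the 4 MB value is just an example):
SET SESSION read_rnd_buffer_size = 4194304;  # 4 MB, affects only this session
SHOW VARIABLES LIKE 'read_rnd_buffer_size';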

How can I speed up a MySQL query with a large offset in the LIMIT clause?

I'm getting performance problems when LIMITing a mysql SELECT with a large offset:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
How can I optimize this statement for better performance?
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)
slow:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
There's a blog post somewhere on the internet saying that the selection of the rows to show should be as compact as possible (thus: just the ids), and that producing the complete results should in turn fetch all the data you want for only the rows you selected.
Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statement, or it doesn't improve anything, against hope, it might be worthwhile to break this single statement into multiple statements and capture the ids in a data structure.
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
I don't think there's any need to create a separate indexing table if your table already has a suitable primary key. If so, then you can order by this primary key and then use values of the key to step through:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the ID so that it can simply read the index and doesn't have to then locate all the data (reduce IO overhead). If you need some of the other columns then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup) - although this will not be appropriate for all cases so you will have to have a play.
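As a usage sketch of that keyset approach (page size 1000; the starting id literal is just an example), each page starts from the last id seen on the previous one:
SELECT * FROM myBigTable WHERE id > 1000000 ORDER BY id ASC LIMIT 1000;
# next page: repeat with id > (largest id returned by the previous page)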
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no row gaps.
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
I have run into this problem recently. There were two parts to the fix. First, I had to use an inner select in my FROM clause that did my limiting and offsetting for me on the primary key only:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the FROM part of my query (the start of the builder chain below is reconstructed and therefore an assumption, since the original snippet began mid-call):
$query = DB::query()  // assumed starting point; the original snippet omitted it
->select(
'titles.id',
'title_eisbns_concat.eisbns_concat',
'titles.pub_symbol',
'titles.title',
'titles.subtitle',
'titles.contributor1',
'titles.publisher',
'titles.epub_date',
'titles.ebook_price',
'publisher_licenses.id as pub_license_id',
'license_types.shortname',
$coversQuery
)
->from($subQuery)
->leftJoin('titles', 't.id', '=', 'titles.id')
->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
The first time I created this query I had used OFFSET and LIMIT in MySQL. This worked fine until I got past page 100; then the OFFSET started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySQL hasn't sped up OFFSET, but BETWEEN seems to reel it back in.