SQL query optimization on MySQL

I have a SQL query that takes too long to execute: around 620 seconds, which is about 10 minutes. How can I optimize it? The process list shows:
| 190543 | root | localhost | ischolar | Query | 620 | Copying to tmp table
SELECT a.article_id, count(a.article_id) AS views
FROM timed_views_log a
INNER JOIN published_articles pa
ON (a.article_id = pa.article_id)
WHERE
a.date BETWEEN date_format(curdate() - interval 1 month,'%Y-%m-01 00:00:00') AND
date_format(last_day(curdate()-interval 1 month),'%Y-%m-%d 23:59:59')
GROUP BY a.article_id
ORDER BY
views desc
LIMIT 6, 5;

You may try adding indices which target the join and where conditions:
CREATE INDEX idx1 ON timed_views_log (date, article_id);
CREATE INDEX idx2 ON published_articles (article_id);
The first index, if used, should speed up the WHERE clause by allowing MySQL to use only the index to satisfy your filters on the date. The second index should allow MySQL to do the lookup for the join faster.
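To check whether the two indexes are actually picked up, one option is to run the query through EXPLAIN. The sketch below assumes that a.date is a DATETIME column and rewrites the filter as a half-open range (first day of last month up to, but not including, the first day of this month), which should select the same rows as the original BETWEEN; treat it as an illustration rather than a drop-in replacement.

```sql
-- Sketch: run after creating idx1 and idx2.
-- Assumption: timed_views_log.date is a DATETIME column.
EXPLAIN
SELECT a.article_id, COUNT(a.article_id) AS views
FROM timed_views_log a
INNER JOIN published_articles pa ON a.article_id = pa.article_id
WHERE a.date >= DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01')
  AND a.date <  DATE_FORMAT(CURDATE(), '%Y-%m-01')
GROUP BY a.article_id
ORDER BY views DESC
LIMIT 6, 5;
```

In the output, the key column shows which index MySQL chose for each table, and "Using index" in Extra indicates that the index alone satisfied the lookup.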

If you are using SQL Server, you can use the SQL Server query execution plan and the optimizations it suggests.
Reference article: https://www.sqlshack.com/using-the-sql-execution-plan-for-query-performance-tuning/
Your query is a join with a WHERE clause, so most likely the tables themselves are large; try adding an index.

Related

Slow update query despite index

I have a query updating a large table (4.1 million rows) and using an even larger origin table on which I do aggregation (63 million rows):
update
table1 t1,
(select
user_id,
count(distinct date(started_at)) as count_s
from sand s
where started_at >= (DATE(NOW()) - INTERVAL 7 DAY)
group by user_id) t2
set m.distinct_days_1week = t2.count_s
where m.user_id = t2.user_id
Both tables are indexed on user_id.
The sand table is also indexed on started_at.
Other update queries usually update the full destination table in less than 5 minutes, but I guess that because the origin table is so large, this one takes much longer.
The subquery, run on its own, finishes in less than 4 s (I didn't measure exactly, though).
EXPLAIN shows that indexes are used and that the WHERE clause filters out a large part of the big sand table.
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 12786201 100.0 Using where
1 UPDATE m eq_ref PRIMARY,user_id_IDX PRIMARY 8 t2.user_id 1 100.0 Using where
2 DERIVED s index user_id_IDX,started_at_IDX user_id_IDX 9 NULL 63784993 20.05 Using where
What am I missing to optimize that query?
For sand, replace your single column INDEX(started_at) and INDEX(user_id) with the following. Both are "composite" and "covering". (I don't know which order is better.)
INDEX(started_at, user_id)
INDEX(user_id, started_at)
DATE(NOW()) --> CURDATE() (a trivial simplification)
Are t1 and m the same??? Assuming so, it needs some kind of index starting with user_id.
Please provide SHOW CREATE TABLE for each table.
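As a rough sketch of the index change suggested above: the old index names user_id_IDX and started_at_IDX are taken from the EXPLAIN output, while the two new index names are made up for illustration. It also assumes that nothing else (such as a foreign key) depends on the single-column indexes.

```sql
-- Sketch only: swap the single-column indexes for the two composite ones.
-- started_user_IDX and user_started_IDX are hypothetical names.
ALTER TABLE sand
  DROP INDEX started_at_IDX,
  DROP INDEX user_id_IDX,
  ADD INDEX started_user_IDX (started_at, user_id),
  ADD INDEX user_started_IDX (user_id, started_at);

-- Re-check which index the aggregation subquery now uses.
EXPLAIN
SELECT user_id, COUNT(DISTINCT DATE(started_at)) AS count_s
FROM sand
WHERE started_at >= CURDATE() - INTERVAL 7 DAY
GROUP BY user_id;
```

Keeping both orders at first and comparing the EXPLAIN output is one way to find out which order the optimizer prefers; the unused index can then be dropped.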

MySQL performance issue on a simple two tables joined Query

I'm facing a performance issue on MySQL and I can't understand what I'm doing wrong. The machine runs MySQL Server 5.7.15 with two 64-bit Xeon processors and 8 GB of RAM.
I've got two tables:
Table data_raw contains several fields (see VRMS0, VRMS1, VRMS2, PWRA0, PWRA1, PWRA2) describing the voltages and active powers acquired from complicated instrumentation every 30 seconds from several probes in the field; each probe is uniquely identified by its DEVICE_ID.
Table data_timeslots contains a few fields and is used to keep track of when each data_raw record was sent (see the SRV_TIMESTAMP field) and from which device (see the DEVICE_ID field).
Each table contains about 7,800,000 records.
The two tables are joined using a PK on ID (auto-increment) on data_timeslots and a PK on TIMESLOT_ID (auto-increment) on data_raw.
Here is the query:
SELECT D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2,T.DEVICE_ID, T.SRV_TIMESTAMP
FROM data_raw AS D FORCE INDEX(PRIMARY)
INNER JOIN data_timeslots AS T ON T.ID=D.TIMESLOT_ID
WHERE T.DEVICE_ID='CEC02'
ORDER BY T.ID DESC LIMIT 1
The query always takes 10 seconds, while the same query on a single table takes a few milliseconds.
In other words, the query
SELECT * FROM data_raw ORDER BY TIMESLOT_ID DESC LIMIT 1
takes just 0.0071 sec and the query
SELECT * FROM data_timeslots ORDER BY ID DESC LIMIT 1
takes just 0.0042 sec, so I'm wondering why the join takes so long.
Where is the bottleneck?
P.S. EXPLAIN shows that the DB is properly using the PK for the operation.
The EXPLAIN printout is below:
`EXPLAIN SELECT D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2,T.DEVICE_ID, T.SRV_TIMESTAMP FROM data_raw AS D INNER JOIN data_timeslots AS T ON T.ID=D.TIMESLOT_ID WHERE T.DEVICE_ID='XXXXX' ORDER BY T.ID ASC LIMIT 1
1 SIMPLE T index PRIMARY,PK_CLUSTER_T,DEVICE_ID PRIMARY 8 30 3.23 Using where
1 SIMPLE D eq_ref PRIMARY PRIMARY 8 splc_smartpwr.T.ID 1 100.00 NULL`
UPDATE (suggested by @Alberto_Delgado_Roda): if I use ASC LIMIT 1, the query takes just 0.0261 sec.
Reply to "why":
data_timeslots has a clustered index that suits the ascending order.
How the Clustered Index Speeds Up Queries
Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
See https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html
Try this:
1. What happens if you replace INNER JOIN with STRAIGHT_JOIN?
SELECT D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2,T.DEVICE_ID, T.SRV_TIMESTAMP
FROM data_raw AS D FORCE INDEX(PRIMARY)
STRAIGHT_JOIN data_timeslots AS T ON T.ID=D.TIMESLOT_ID
WHERE T.DEVICE_ID='CEC02'
ORDER BY T.ID DESC LIMIT 1
2. What happens if you replace DESC LIMIT 1 with ASC LIMIT 1?
I just figured out that the query:
SELECT T.ID,T.DEVICE_ID, T.SRV_TIMESTAMP, D.VRMS0,D.VRMS1,D.VRMS2,D.PWRA0,D.PWRA1,D.PWRA2 FROM data_timeslots as T INNER JOIN data_raw AS D ON D.TIMESLOT_ID=T.ID ORDER BY T.ID DESC LIMIT 1
runs in just 0.0174 sec as expected. I just reversed the order in the SELECT statement and the result changed dramatically. The question now is why???

How to optimize this slow query?

I have the following query, which runs very slowly (almost 50000 records):
SELECT
news.id,
news.title,
DATE_FORMAT(news.date_online,'%d %m %Y') AS newsNL_Date,
news_categories.parent_id
FROM
news,
news_categories
WHERE
DATE(news.date_online)=2013-02-25
AND news.category_id = news_categories.id
AND news.date_online < NOW()
AND news.activated ='t'
ORDER BY
news.date_online DESC
I have MySQL client version 5.0.96
When I run EXPLAIN EXTENDED on this query, this is the result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news ref category_id,date_online,activated activated 1 const 43072 Using where; Using filesort
1 SIMPLE news_categories eq_ref PRIMARY,id PRIMARY 4 news_category_id 1
I have an index on the following columns
news_id (primary key)
date_online
activated
category_id
When I look at the EXPLAIN EXTENDED result, I see "Using where; Using filesort". I know both of them are bad, but I don't know how to fix this.
Try adding an index on date_online (type B-tree). I would also avoid using NOW() in the WHERE clause; instead, set it in a variable and use the variable.
I suppose the id fields are keys, so they are already indexed.
Use a LEFT JOIN and see the difference:
SELECT
news.id,
news.title,
DATE_FORMAT(news.date_online,'%d %m %Y') AS newsNL_Date,
news_categories.parent_id
FROM
news
LEFT JOIN news_categories ON news_categories.id = news.category_id
WHERE
DATE(news.date_online)= '2013-02-25'
AND DATE(news.date_online) < DATE(NOW())
AND news.activated ='t'
ORDER BY
news.date_online DESC
Further to the other worthwhile suggestions, you have DATE(news.date_online)=2013-02-25. This would seem to force MySQL to convert news.date_online for every row in the table, which stops the indexes from being useful.
Possibly change it to news.date_online BETWEEN '2013-02-25 00:00:00' AND '2013-02-25 23:59:59', which should (I think) allow the index on date_online to be used.
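A variation on the same idea is a half-open range, which leaves the column bare on the left-hand side and also covers any fractional seconds at the end of the day. This is a sketch with a shortened column list, not the full original query.

```sql
-- Sketch: select everything dated 2013-02-25 without wrapping the column in DATE().
SELECT news.id, news.title
FROM news
WHERE news.date_online >= '2013-02-25'
  AND news.date_online <  '2013-02-26'
  AND news.activated = 't'
ORDER BY news.date_online DESC;
```

Because no function is applied to date_online, the optimizer is free to consider a range scan on its index.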
Your query fetches data for a specific day, so in this case you can omit the AND news.date_online < NOW() part (maybe the query optimizer does this for you).
In the other case, where you want all active news without specifying a date_online, you should also get rid of NOW(). The problem is that your query cannot be cached by the database because it contains a part (NOW()) that is different for every execution. You can avoid this by providing a calculated constant instead. If you truncate that constant to the minute, say, the query can be cached for at most 60 seconds.
Next, you should create a multi-column index on the columns you are using, because I guess category_id, date_online and activated are used in almost every query. This will make inserts a little slower, but from what it looks like (a news application) there will be many more reads than inserts.
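One possible shape for such a multi-column index is sketched below. The index name and the column order are assumptions: the equality column (activated) comes first, then the range and sort column (date_online), with category_id appended so the join column can be read from the index.

```sql
-- Sketch: idx_news_activated_date_cat is a hypothetical name; the column order is a guess
-- based on the query in the question (equality, then range/sort, then join column).
CREATE INDEX idx_news_activated_date_cat
    ON news (activated, date_online, category_id);
```

Whether this order is the best one depends on the other queries run against the table, so it is worth comparing EXPLAIN output before and after.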

optimize mysql indexes

I have a slow (1.4 second) query that has been bugging me for a while so I just thought I'd put it up and see if anyone can help me optimize my indexes to speed it up:
select sql_calc_found_rows t.id, q.im_id, concat(t.si_id, ' ', t.de), q.date, q.das, q.dac, u.name, q.ac, q.st
from t300q q
left join t300 t on t.id = q.con_id
left join users u on u.id = q.user_id
order by q.date desc limit 0,100
sql explain results:
SIMPLE q ALL 89126 Using filesort
SIMPLE t eq_ref PRIMARY PRIMARY 4 db.q.con_id 1
SIMPLE u eq_ref PRIMARY PRIMARY 4 db.q.user_id 1
session stats:
Handler_read_first = 0
Handler_read_key = 177934
Handler_read_next = 23
Handler_read_prev = 679
Handler_read_rnd = 15
Handler_read_rnd_next = 89127
and I have the following indexes:
t.id - primary key
q.con_id |
q.date | - all form a single index
q.user_id |
u.id - primary key
As you can see from the handler stats, table q has 89126 rows.
It's not a massive problem but I would like to get the speed down below 1 second for this query if possible.
The query is slow because you don't have a date index. The compound index cannot be used because the date is in the middle. Either move the date to be the first field in the existing index or create a stand alone index.
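A minimal sketch of the standalone index option follows; the index name is hypothetical, and the EXPLAIN uses a cut-down version of the original query just to show the sort.

```sql
-- Sketch: idx_t300q_date is a hypothetical index name.
CREATE INDEX idx_t300q_date ON t300q (`date`);

-- Check whether the ORDER BY can now be served from the index.
EXPLAIN
SELECT q.im_id, q.`date`
FROM t300q q
ORDER BY q.`date` DESC
LIMIT 0, 100;
```

If the index is used for ordering, "Using filesort" should no longer appear in the Extra column for q.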
BTW, MySQL uses only equality for the first two columns of a 3-column index. The last column can use ranged queries.
Namely:
WHERE x=? AND y=? ORDER BY z;
will use an index on columns (x,y,z) (since z can be ranged).
Try moving 'date' to the 3rd column and rewriting the query.
If that doesn't work, then MySQL isn't being smart enough to use con_id and user_id in the join. Perhaps you could rewrite it so those join conditions happen in the WHERE clause.
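The (x, y, z) pattern can be tried out on a throwaway table. Everything in this sketch (the table demo_xyz, its columns and the index name) is made up for illustration and is not part of the original schema.

```sql
-- Hypothetical table and index, used only to illustrate the (x, y, z) pattern.
CREATE TABLE demo_xyz (
    id INT AUTO_INCREMENT PRIMARY KEY,
    x  INT NOT NULL,
    y  INT NOT NULL,
    z  DATETIME NOT NULL,
    INDEX idx_xyz (x, y, z)
);

-- Equality on x and y, ordering on z: the index can return rows already sorted by z.
EXPLAIN
SELECT id, z
FROM demo_xyz
WHERE x = 1 AND y = 2
ORDER BY z DESC
LIMIT 100;
```

The original query has no equality conditions on con_id and user_id in its WHERE clause, which is why the answer suggests rewriting it before relying on this pattern.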
Try running OPTIMIZE TABLE or ANALYZE TABLE on your tables, but do so at a time when the server is handling few or, better, no requests, to avoid causing problems. You can find more information about these statements at these links:
http://dev.mysql.com/doc/refman/5.6/en/analyze-table.html
http://dev.mysql.com/doc/refman/4.1/en/optimize-table.html

Optimising sql query performing internal comparison

I have the following query which is a little expensive (currently 500ms):
SELECT * FROM events AS e, event_dates AS ed
WHERE e.id=ed.event_id AND ed.start >= DATE(NOW())
GROUP BY e.modified_datetime, e.id
ORDER BY e.modified_datetime DESC,e.created_datetime DESC
LIMIT 0,4
I have been trying to figure out how to speed it up and noticed that changing ed.start >= DATE(NOW()) to ed.start = DATE(NOW()) runs the query in 20ms. Can anyone help me with ways to speed up this date comparison? Would it help to calculate DATE(NOW()) before running the query?
EDIT: Does this help? Here is the EXPLAIN output:
BEFORE
table=event_dates
type=range
rows=25962
ref=null
extra=using where; Using temporary; Using filesort
AFTER
table=event_dates
type=ref
rows=211
ref=const
extra=Using temporary; Using filesort
SELECT * FROM events AS e
INNER JOIN event_dates AS ed ON (e.id=ed.event_id)
WHERE ed.start >= DATE(NOW())
GROUP BY e.modified_datetime, e.id
ORDER BY e.modified_datetime DESC,e.created_datetime DESC
LIMIT 0,4
Remarks
Please don't use implicit SQL-89 join syntax; it is an SQL anti-pattern.
Make sure you have an index on all fields used in the join, in the WHERE, in the GROUP BY and in the ORDER BY clauses.
Don't do SELECT * (another anti-pattern); explicitly state the fields you need instead.
Try using InnoDB instead of MyISAM. InnoDB has more optimization tricks for SELECT statements, especially if you only select indexed fields.
For MyISAM tables, try using REPAIR TABLE tablename.
For InnoDB that's not an option, but changing the table type from MyISAM to InnoDB will obviously force a full rebuild of the table and all indexes.
GROUP BY implicitly sorts the rows in ASC order; try changing the GROUP BY to GROUP BY -e.modified_datetime, e.id to minimize the reordering needed by the ORDER BY clause. (Not sure about this point; I would like to know the result.)
For reference, using , notation for joins is poor practice AND has been a cause of poor execution plans.
SELECT
*
FROM
events AS e
INNER JOIN
event_dates AS ed
ON e.id=ed.event_id
WHERE
ed.start >= DATE(NOW())
GROUP BY
e.modified_datetime,
e.id
ORDER BY
e.modified_datetime DESC,
e.created_datetime DESC
LIMIT 0,4
= is faster than >= simply because >= covers a range of values rather than one specific value. It's like saying "get me every page in the book from page 101 onwards" instead of "get me page 101". It's more intensive by definition, especially as your query then involves aggregating and sorting many more records.
In terms of optimisation, your best option is to ensure you have the relevant indexes...
event_dates:
- an index just on start should be sufficient
events:
- an index on id will dramatically improve the join performance
- adding modified_datetime and created_datetime to that index may help
Probably missing indexes on fields you are grouping and searching. Please provide us with: SHOW INDEXES FROM events and SHOW INDEXES FROM event_dates
If there are no indexes then you can add them:
ALTER TABLE events ADD INDEX(modified_datetime);
ALTER TABLE events ADD INDEX(created_datetime);
ALTER TABLE event_dates ADD INDEX(start);
Also be sure you have them on id fields. But here you would probably like to have them as primary keys.
Calculating DATE(NOW()) in advance will not have any impact on performance. It's computed only once (not for each row). But you have two different queries (one with >=, another with =). It seems natural that the first one (>=) takes longer to execute, since it returns many more rows. Also, the optimizer may choose a different execution plan than for the query with =, for example a full table scan instead of an index seek/scan.
You can do something like this:
DECLARE @CURRENTDATE AS DATETIME
SET @CURRENTDATE = GETDATE()
then change your code to use the @CURRENTDATE variable, e.g. ed.start >= @CURRENTDATE
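Note that DECLARE ... AS DATETIME and GETDATE() are SQL Server syntax. A rough MySQL equivalent, using a user-defined variable, might look like the sketch below. The variable name @today and the shortened column list are assumptions, and it presumes events.id is the primary key. As noted in an earlier answer, this is unlikely to change performance on its own, since the date expression is evaluated only once anyway.

```sql
-- Sketch: @today is a hypothetical user-defined variable name.
SET @today = CURDATE();

SELECT e.id, e.modified_datetime, e.created_datetime
FROM events AS e
INNER JOIN event_dates AS ed ON e.id = ed.event_id
WHERE ed.start >= @today
GROUP BY e.modified_datetime, e.id
ORDER BY e.modified_datetime DESC, e.created_datetime DESC
LIMIT 0, 4;
```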