I have this MySQL query which seems to be very very slow. It takes 3 seconds to run. This is trying to get all the posts from either who they're following or any interests they have. Also its trying to make sure it doesn'tshow any duplicate shares that match any post_id. What do you guys think I should do?
SELECT p.*,
IFNULL(post_data_share, UUID()) AS unq_share,
UNIX_TIMESTAMP(p.post_time) AS a
FROM posts p
LEFT JOIN users_interests i ON (i.user_id=1
AND p.post_interest = i.interest)
LEFT JOIN following f ON (f.user_id=1
AND p.post_user_id = f.follower_id)
WHERE (post_user_id=1
OR f.follower_id IS NOT NULL
OR i.interest IS NOT NULL)
AND (POST_DATA_SHARE NOT IN
(SELECT POST_ID
FROM posts p
LEFT JOIN following f ON f.user_id=1
AND p.post_user_id = f.follower_id
LEFT JOIN users_interests i ON (i.user_id=1
AND p.post_interest = i.interest)
WHERE (post_user_id=1
OR f.follower_id IS NOT NULL
OR i.interest IS NOT NULL))
OR POST_DATA_SHARE IS NULL)
GROUP BY unq_share
ORDER BY `post_id` DESC LIMIT 10;
Below are the Performance tips will definitely make difference.
Try Explain Statement.
Alter the Table by Normalize your tables by adding Primary Key and Foreign Key
Add Index for Repeated Values.
Avoid select * from table. Mention the specify column name.
Convert IS NULL to (='')
Convert IS NOT NULL to (!='')
Avoid More OR Condition.
MySQL Configurations to explore
key_buffer_size
innodb_buffer_pool_size
query_cache_size
thread_cache
Much more refer this SO Answer Best my.cnf configuration for a 8GB MySQL server with MyISAM use only
I would start by looking at the execution plan for the query. Here is a link to MySQL documentation on the EXPLAIN keyword to show you how the optimizer is structuring your query: http://dev.mysql.com/doc/refman/5.5/en/using-explain.html
If CPU usage is low, likely the bottleneck is disk access for large table scans.
The way the query is executed is often different from how it was written. Once you see how the execution plan is structured, you are probably going to create indexes on the largest joins. Every table should have one clustered index (often it is created by default), but other fields can often benefit from unclustered indexes.
If the problem is extremely bad and this is vital to your application, you may want to consider reorganizing the database.
Related
I have this query which basically goes through a bunch of tables to get me some formatted results but I can't seem to find the bottleneck. The easiest bottleneck was the ORDER BY RAND() but the performance are still bad.
The query takes from 10 sec to 20 secs without ORDER BY RAND();
SELECT
c.prix AS prix,
ST_X(a.point) AS X,
ST_Y(a.point) AS Y,
s.sizeFormat AS size,
es.name AS estateSize,
c.title AS title,
DATE_FORMAT(c.datePub, '%m-%d-%y') AS datePub,
dbr.name AS dateBuiltRange,
m.myId AS meuble,
c.rawData_id AS rawData_id,
GROUP_CONCAT(img.captionWebPath) AS paths
FROM
immobilier_ad_blank AS c
LEFT JOIN PropertyFeature AS pf ON (c.propertyFeature_id = pf.id)
LEFT JOIN Adresse AS a ON (c.adresse_id = a.id)
LEFT JOIN Size AS s ON (pf.size_id = s.id)
LEFT JOIN EstateSize AS es ON (pf.estateSize_id = es.id)
LEFT JOIN Meuble AS m ON (pf.meuble_id = m.id)
LEFT JOIN DateBuiltRange AS dbr ON (pf.dateBuiltRange_id = dbr.id)
LEFT JOIN ImageAd AS img ON (img.commonAd_id = c.rawData_id)
WHERE
c.prix != 0
AND pf.subCatMyId = 1
AND (
(
c.datePub > STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y')
)
OR date_format(c.datePub, '%d-%m-%Y') = '30-04-2016'
)
AND a.validPoint = 1
GROUP BY
c.id
#ORDER BY
# RAND()
LIMIT
5000
Here is the explain query:
Visual Portion:
And here is a screenshot of mysqltuner
EDIT 1
I have many indexes Here they are:
EDIT 2:
So you guys did it. Down to .5 secs to 2.5 secs.
I mostly followed all of your advices and changed some of my.cnf + runned optimized on my tables.
You're searching for dates in a very suboptimal way. Try this.
... c.datePub >= STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y') + INTERVAL 1 DAY
That allows a range scan on an index on the datePub column. You should create a compound index for that table on (datePub, prix, addresse_id, rawData_id) and see if it helps.
Also try an index on a (valid_point). Notice that your use of a geometry data type in that table is probably not helping anything.
To begin with you have quite a lot of indexes but many of them are not useful. Remember more indexes means slower inserts and updates. Also mysql is not good at using more than one index per table in complex queries. The following indexes have a cardinality < 10 and probably should be dropped.
IDX_...E88B
IDX....62AF
IDX....7DEE
idx2
UNIQ...F210
UNIQ...F210..
IDX....0C00
IDX....A2F1
At this point I got tired of the excercise, there are many more
Then you have some duplicated data.
point
lat
lng
The point field has the lat and lng in it. So the latter two are not needed. That means you can lose two more indexes idxlat and idxlng. I am not quite sure how idxlng appears twice in the index list for the same table.
These optimizations will lead to an overall increase in performance for INSERTS and UPDATES and possibly for all SELECTs as well because the query planner needs to spend less time deciding which index to use.
Then we notice from your explain that the query does not use any index on table Adresse (a). But your where clause has a.validPoint = 1 clearly you need an index on it as suggested by #Ollie-Jones
However I suspect that this index may have low cardinality. In that case I recommend that you create a composite index on this column + another.
The problem is your join with (a). The table has an index, but the index can't be used, more than likely due to the sort (/group by), or possibly incompatible types. The EXPLAIN shows three quarters of a million rows examined, this means that index lookup was not possible.
When designing a query, look for the smallest possible result set - search by that index, and then join from there. Perhaps "c" isn't the best table for the primary query.
(You could try using FORCE INDEX (id) on table a, if it doesn't work, the error may give you more information).
As others have pointed out, you need an index on a.validPoint but what about c.datePub that is also used in the WHERE clause. Why not a multiple column index on datePub, address_id the index on address_id is already used, so a multiple column index will be better here.
I have very large table with 17,044,833 Rows and 6.4 GB in size. I am running the simple query below and it takes like 5 seconds. Any ideas what optimizations can I do to improve the speed of this query?
SELECT
`stat_date`,
SUM(`adserver_impr`),
SUM(`adserver_clicks`)
FROM `dfp_stats` WHERE
`stat_date` >= '2014-02-01'
AND
`stat_date` <= '2014-02-28'
MySQL Config:
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
innodb_buffer_pool_size = 10G
Server:
Memory: 48GB
Disk: 480GB
UPDATE
ORIGINAL QUERY:
EXPLAIN
SELECT
DS.`stat_date` 'DATE',
DC.`name` COUNTRY,
DA.`name` ADVERTISER,
DOX.`id` ORDID,
DOX.`name` ORDNAME,
DLI.`id` LIID,
DLI.`name` LINAME,
DLI.`is_ron` ISRON,
DOX.`is_direct` ISDIRECT,
DSZ.`size` LISIZE,
PUBSITE.`id` SITEID,
SUM(DS.`adserver_impr`) 'DFPIMPR',
SUM(DS.`adserver_clicks`) 'DFPCLCKS',
SUM(DS.`adserver_rev`) 'DFPREV'
FROM `dfp_stats` DS
LEFT JOIN `dfp_adunit1` AD1 ON AD1.`id` = DS.`dfp_adunit1_id`
LEFT JOIN `dfp_adunit2` AD2 ON AD2.`id` = DS.`dfp_adunit2_id`
LEFT JOIN `dfp_adunit3` AD3 ON AD3.`id` = DS.`dfp_adunit3_id`
LEFT JOIN `dfp_orders` DOX ON DOX.`id` = DS.`dfp_order_id`
LEFT JOIN `dfp_advertisers` DA ON DA.`id` = DOX.`dfp_advertiser_id`
LEFT JOIN `dfp_lineitems` DLI ON DLI.`id` = DS.`dfp_lineitem_id`
LEFT JOIN `dfp_countries` DC ON DC.`id` = DS.`dfp_country_id`
LEFT JOIN `dfp_creativesize` DSZ ON DSZ.`id` = DS.`dfp_creativesize_id`
LEFT JOIN `pubsites` PUBSITE
ON AD1.`pubsite_id` = PUBSITE.`id`
OR AD2.`pubsite_id` = PUBSITE.`id`
WHERE
DS.`stat_date` >= '2014-02-01'
AND DS.`stat_date` <= '2014-02-28'
AND PUBSITE.`id` = 6
GROUP BY DLI.`id`,DS.`stat_date`;
RESULTS OF EXPLAIN: (This is after adding the COVERING INDEX)
http://i.stack.imgur.com/vhVeB.png
If you haven't, you might want to index the stat_date field for faster lookups. Here's the syntax:
ALTER TABLE TABLE_NAME ADD INDEX (COLUMN_NAME);
Read more about indexing and optimizations here: https://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html
For best performance of this query, create a covering index:
... ON `dfp_stats` (`stat_date`,`adserver_impr`,`adserver_clicks`)
The output from EXPLAIN should show "Using index". This means that the query can be satisfied entirely from the index, without needing to visit any pages in the underlying table. (The term "covering index" refers to an index that includes all of the columns referenced by a query.)
At a minimum, you'll want an index with a leading column of stat_date so that the query can use an index range scan operation. An index range scan can essentially skip over boatloads of rows, and more quickly locate the rows that actually need to be checked.
As far as changes to the configuration of the MySQL instance, that really depends on whether the table is InnoDB or MyISAM.
FOLLOWUP
For InnoDB, memory is still king. If there's memory available on the server, then you can increase innodb_buffer_pool.
Also consider enabling the MySQL query cache. (We have the query cache enabled only for queries that are specifically enabled to use the cache with the SQL_CACHE keyword i.e. SELECT SQL_CACHE t.foo,, so we don't clutter up the cache with queries that don't give us benefit. For other queries, we avoid running the extra code (that would otherwise be required) to search the cache and maintain the cache contents.
The place we get a benefit from the query cache is from "expensive" queries (which look at a lot of rows and do a lot of joins) against tables that are relatively static, and that return small resultsets. (I'd consider a query that gets a single row with a SUMs from a whole boatload of rows would be a good candidate for the query cache, if the table is infrequently updated, or if the same query is going to be run several times before a DML operation on the table invalidates the cache.)
It's a bit odd that your query is returning a non-aggregate that isn't in a GROUP BY clause.
If your query is using an index on stat_date, it's likely the query is returning the lowest value of stat_date within the range specified by the predicate; so it's likely that you would get an equivalent result using SELECT MIN(stat_date) AS stat_date.
A more complicated approach would be to setup a "summary" table, and refresh that periodically with the results from a query, and then have the application query the summary table. (A data warehouse type approach.) This doesn't work if you need "up-to-the-minute" accuracy. To get that, you'd likely need to introduce triggers on the target table, to maintain the summary table on INSERT, UPDATE and DELETE operations.
If I went that route, I'd probably opt for storing a summary row for each stat_date, so it could accommodate queries on any range or set of dates...
CREATE TABLE dfp_stats_summary
( stat_date DATE NOT NULL PRIMARY KEY
, adserver_impr BIGINT
, adserver_clicks BIGINT
) ENGINE=InnoDB ;
-- refresh
INSERT INTO dfp_stats_summary (stat_date, adserver_impr, adserver_clicks)
SELECT t.stat_date
, SUM(t.adserver_impr) AS adserver_impr
, SUM(t.adserver_clicks) AS adserver_clicks
FROM dfp_stats
GROUP BY t.stat_date
ON DUPLICATE KEY
UPDATE adserver_impr = VALUES(adserver_impr)
, adserver_clicks = VALUES(adserver_clicks)
;
The refresh query will crank; you might want to specify a date range in a WHERE clause to do a month or two at a time, and loop through all the possible months.
With the summary table populated, just change the original query to reference the new summary table, rather than the detail table. It would be a lot faster to add up 28 summary rows than several hundred thousands detail rows.
I have a left join:
$query = "SELECT a.`id`, a.`documenttitle`, a.`committee`, a.`issuedate`, b.`tagname`
FROM `#__document_management_documents` AS a
LEFT JOIN `#__document_managment_tags` AS b
ON a.id = b.documentid
".$tagexplode."
".$issueDateText."
AND a.committee in (".$committeeQueryTextExplode.")
AND a.documenttitle LIKE '".$documentNameFilter."%'
GROUP BY a.id ORDER BY a.documenttitle ASC
";
It's really slow abaout 7 seconds on 4000 records
Any ideas what I might be doing wrong
SELECT a.`id`, a.`documenttitle`, a.`committee`, a.`issuedate`, b.`tagname`
FROM `w4c_document_management_documents` AS a
LEFT JOIN `document_managment_tags` AS b
ON a.id = b.documentid WHERE a.issuedate >= ''
AND a.committee in ('1','8','9','10','11','12','13','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47')
AND a.documenttitle LIKE '%' GROUP BY a.id ORDER BY a.documenttitle ASC
I would put an index on a.committee, and full text index the doctitle col. The IN and LIKE are immediate flags to me. Issue date should also have an index because you are >= it
Try running the following commands in a MySQL client:
show index from #__document_management_documents;
show index from #_document_management_tags;
Check to see if there are keys/indexes on the id and documentid fields from the respective tables. If there aren't, MySQL will be doing a full table scan to lookup the values. Creating indexes on these fields makes the search time logarithmic, because it sorts them in a binary tree which is stored in the index file. Even better is to use primary keys (if possible), because that way the row data is stored in the leaf, which saves MySQL another I/O operation to lookup the data.
It could also simply be that the IN and >= operators have bad performance, in which case you might have to rewrite your queries or redesign your tables.
As mentioned above, try to find if your columns have index. You can even do "EXPLAIN" command in your MySQL client at the start of your query to see if the query is actually using indexes. You will see in the 'key' columns and 'Extra' column. Get more information here
This will help you optimize your query. Also group by causes using temporary and filesort which causes MySQL to create a temporary table and going through each rows. If you could use PHP to group by it would be faster.
I have this simple join that works great but is HORRIBLY slow I think because the tech table is very large. There are many instances of uid as it tracks timestamp of the uid thus the distinct. What is the best way to speed this query up?
SELECT DISTINCT tech.uid,
listing.empno,
listing.firstname,
listing.lastname
FROM tech,
listing
WHERE tech.uid = listing.empno
ORDER BY listing.empno ASC
First add an Index to tech.UID and listing.EmpNo on their respective tables.
After you are sure there are indexes you can try to re-write your query like this:
SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
If it's still not fast enough, put the word EXPLAIN before the query to get some hints about the execution plan of the query.
EXPLAIN SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
Posts the Explain results so we can get better insight.
Hope it helps,
This is very simple query. Only thing you can do in SQL - you may add indexes on fields used in JOIN/WHERE and ORDER BY clauses (tech.uid, listing.empno), if there are no indexes.
If there are JOIN fields with NULL values - they may ruin your performance. You should filter them in WHERE clause (WHERE tech.uid is not null and listing.empno not null). If there are many rows with JOIN on NULL field - that data may produce cartesian result (not sure how is this called in english) with may contain enormous count of rows.
You may change MySQL configuration. There are many options useful for performance tuning, like key_buffer_size, sort_buffer_size, tmp_table_size, max_heap_table_size, read_buffer_size etc.
I'm having a serious problem with MySQL (innoDB) 5.0.
A very simple SQL query is executed with a very unexpected query plan.
The query:
SELECT
SQL_NO_CACHE
mbCategory.*
FROM
MBCategory mbCategory
INNER JOIN ResourcePermission as rp
ON rp.primKey = mbCategory.categoryId
where mbCategory.groupId = 12345 AND mbCategory.parentCategoryId = 0
limit 20;
MBCategory - contains 216583 rows
ResourcePermission - contains 3098354 rows.
In MBCategory I've multiple indexes (columns order as in index):
Primary (categoryId)
A (groupId,parentCategoryId,categoryId)
B (groupId,parentCategoryId)
In ResourcePermission I've multiple indexes (columns order as in index):
Primary - on some column
A (primKey).
When I look into query plan Mysql changes tables sequence and selects rows from ResourcePermission at first and then it joins the MBCategory table (crazy idea) and it takes ages. So I added STRAIGHT_JOIN to force the innodb engine to use correct table sequence:
SELECT
STRAIGHT_JOIN SQL_NO_CACHE
mbCategory.*
FROM
MBCategory
mbCategory
INNER JOIN ResourcePermission as rp
ON rp.primKey = mbCategory.categoryId
where mbCategory.groupId = 12345 AND mbCategory.parentCategoryId = 0
limit 20;
But here the second problem materialzie:
In my opinion mysql should use index A (primKey) on the join operation instead it performs Range checked for each record (index map: 0x400) and it again takes ages !
Force index doesn't help, mysql still performing Range checked for each record .
There are only 23 rows in the MBCategory which fulfill where criteria, and after join there are only 75 rows.
How can I make mysql to choose correct index on this operation ?
Ok,
elementary problem.
I owe myself a beer.
The system I'm recently tunning is not a system I've developted - I've been assigned to it by my management to improve performance (originall team doesn't have knowledge on this topic).
After fee weeks of improving SQL queries, indexes, number of sql queries that are beeing executed by application I didn't check one of the most important things in this case !!
COLUMN TYPES ARE DIFFERENT !
Developer who have written than kind of code should get quite a big TALK.
Thanks for help !
I had the same problem with a different cause. I was joining a large table, and the ON clause used OR to compare the primary key (ii.itemid) to two different columns:
SELECT *
FROM share_detail sd
JOIN box_view bv ON sd.container_id = bv.id
JOIN boxes b ON b.id = bv.shared_id
JOIN item_index ii ON ii.itemid = bv.shared_id OR b.parent_itemid = ii.itemid;
Fortunately, it turned out the parent_itemid comparison was redundant, so I was able to remove it. Now the index is being used as expected. Otherwise, I was going to try splitting the item_index join into two separate joins.