MySQL ORDER BY vs WHERE clause performance

I have a database with the following info:
table reports: 166211 records
table report_content: 166211 records
table companies: 13188 records
This query takes 41.7324 sec to process:
select rc.* from `reports` r
left join `report_content` rc on rc.`report`=r.`id`
left join `companies` c on c.`id`=r.`company`
where not isnull(r.`published`) and not r.`deleted`
order by rc.`company` asc
limit 50
Whereas this query takes 1.6146 sec to process, after I added and rc.`company` != '':
select rc.* from `reports` r
left join `report_content` rc on rc.`report`=r.`id`
left join `companies` c on c.`id`=r.`company`
where not isnull(r.`published`) and not r.`deleted`
and rc.`company` != ''
order by rc.`company` asc
limit 50
I have fulltext index on rc.company with a cardinality of 11872
all other clause/join fields have btree indexes (mostly primary)
Why is this so? Should I be using a fulltext/btree index on a varchar(255)?
The idea is to not have the rc.company != '' condition in the query at all.
FYI, the tables are MyISAM
note: The added condition doesn't change the result; it merely brings rc.company into the conditions (which speeds up the query), and I wonder whether it is an elegant solution.
Update: Thanks Frail, here are the results:
Query A:
1 SIMPLE r range published published 9 NULL 156085 Using where; Using temporary; Using filesort
1 SIMPLE rc ref report report 4 database.r.id 1
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 database.r.company 1 Using index
Query B:
1 SIMPLE rc ALL report,company NULL NULL NULL 166339 Using where; Using filesort
1 SIMPLE r eq_ref PRIMARY,published PRIMARY 4 database.rc.report 1 Using where
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 database.r.company 1 Using index

As far as I know, a full-text index is not for sorting, and you are sorting the result by the company column.
Let me try to explain full-text indexes for you simply:
id company
1 "brown fox"
2 "something of fox"
your full-text index would create a btree for the words "brown", "something", "fox", so you can match against these words, but as far as I know it won't help you sort.
So if you are using the full-text index to find "fox" companies, keep it, but put a btree index on company as well, for sorting purposes.
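For example, a minimal sketch of that suggestion, using the table name from the question:

```sql
-- Keep any FULLTEXT index for word matching; add a plain BTREE index
-- on company so ORDER BY rc.company can use it (index name is made up):
ALTER TABLE report_content ADD INDEX idx_company (company);
```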

You haven't listed your table indexes, but try creating the following 3 indexes:
on the reports table (published,deleted,company,id)
on the report_content table (report,company)
on the companies table (id)
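As plain DDL, the three suggestions might look like this (a sketch only; the index names are made up):

```sql
ALTER TABLE reports        ADD INDEX idx_pub_del_comp (published, deleted, company, id);
ALTER TABLE report_content ADD INDEX idx_report_company (report, company);
-- companies.id is presumably already the PRIMARY KEY, which covers the third suggestion.
```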

Related

What is compound indexing and how do I use it properly?

I have a really slow query that repeats itself quite a bit. I've tried indexing the individual fields, but it doesn't seem to help: the CPU usage is still very high and the queries still appear in the slow query log. It seems I need a compound index?
How would I index the following query properly?
select *
from `to_attachments` left join
`attachments`
on `to_attachments`.`attachment_id` = `attachments`.`id`
where `to_attachments`.`object_type` = 'communicator' and `to_attachments`.`object_id` = '64328'
order by `attachments`.`created_at` desc;
EXPLAIN Result:
1 SIMPLE to_attachments index NULL PRIMARY 775 NULL 244384 Using where; Using index; Using temporary; Using filesort
1 SIMPLE attachments eq_ref PRIMARY PRIMARY 4 quote.to_attachments.attachment_id 1 NULL
Index For to_attachments
You want indexes on to_attachments(object_type, object_id, attachment_id) and attachments(id).
Your index's column sequence is wrong; it should be (object_type, object_id, attachment_id). In a multicolumn index, the order of the columns MATTERS.
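In DDL form, the suggested compound index might look like this (a sketch; the index name is made up):

```sql
-- Columns in the order recommended above: the two equality filters
-- first, then the joined column.
ALTER TABLE to_attachments
  ADD INDEX idx_type_obj_att (object_type, object_id, attachment_id);
-- attachments(id) is presumably already covered by its PRIMARY KEY.
```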

Query speed drops on two "=" comparisons in WHERE clause

I have a music database with a table for releases and the release titles. This "releases_view" gets the title/title_id and the alternative title/alternative title_id of a track. This is the code of the view:
SELECT
t1.`title` AS title,
t1.`id` AS title_id,
t2.`title` AS title_alt,
t2.`id` AS title_alt_id
FROM
releases
LEFT JOIN titles t1 ON t1.`id`=`releases`.`title_id`
LEFT JOIN titles t2 ON t2.`id`=`releases`.`title_alt_id`
The title_id and title_alt_id fields in the joined tables are both int(11), title and title_alt are varchars.
The issue
This query will take less than 1 ms:
SELECT * FROM `releases_view` WHERE title_id=12345
This query will take less than 1 ms, too:
SELECT * FROM `releases_view` WHERE title_id=12345 OR title_alt_id!=54321
BUT: This query will take 0.2 s. It's 200 times slower!
SELECT * FROM `releases_view` WHERE title_id=20956 OR title_alt_id=38849
As soon I have two comparisons using "=" in the WHERE clause, things really get slow (although all queries only have a couple of results).
Can you help me to understand what is going on?
EDIT
EXPLAIN shows a "Using where" for the title_alt_id, but I do not understand why. How can I avoid this?
** EDIT **
Here is the EXPLAIN DUMP.
id select_type table partitions type possible_keys key key_len ref rows Extra
1 SIMPLE releases NULL ALL NULL NULL NULL NULL 76802 Using temporary; Using filesort
1 SIMPLE t1 NULL eq_ref PRIMARY PRIMARY 4 db.releases.title_id 1
1 SIMPLE t2 NULL eq_ref PRIMARY PRIMARY 4 db.releases.title_alt_id 1 Using where
The "really slow" is because the Optimizer does not work well with OR.
Plan A (of the Optimizer): Scan the entire table, evaluating the entire OR.
Plan B: "Index Merge Union" could be used for title_id = 20956 OR title_alt_id = 38849 if you have separate indexes on title_id and title_alt_id: use each index to get two lists of PRIMARY KEYs, "merge" the lists, then reach into the table to get *. Multiple steps, not cheap. So Plan B is rarely used.
title_id = 12345 OR title_alt_id != 54321 is a mystery, since it should return most of the table. Please provide EXPLAIN SELECT....
LEFT JOIN (as opposed to JOIN) needs to assume that the row may be missing in the 'right' table.
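Since the optimizer handles OR poorly, one common workaround is to rewrite the query as a UNION so each branch can use its own index. A sketch, assuming separate indexes exist on title_id and title_alt_id:

```sql
-- UNION (without ALL) deduplicates rows, matching the OR semantics.
SELECT * FROM releases_view WHERE title_id = 20956
UNION
SELECT * FROM releases_view WHERE title_alt_id = 38849;
```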

Slow MySQL query, EXPLAIN shows Using temporary; Using filesort

This query:
EXPLAIN SELECT ppi_loan.customerID,
loan_number,
CONCAT(forename, ' ', surname) AS agent,
name,
broker,
(SELECT timestamp
FROM ppi_sar_status
WHERE history = 0
AND (status = 10 || status = 13)
AND ppi_sar_status.loanID = ppi_loan.loanID) AS ppi_unsure_date,
fosSent,
letterSent,
(SELECT timestamp
FROM ppi_ques_status
WHERE status = 1
AND ppi_ques_status.loanID = ppi_loan.loanID
ORDER BY timestamp DESC LIMIT 1) AS sent_date,
ppi_ques_status.timestamp
FROM ppi_loan
LEFT JOIN ppi_assignments ON ppi_assignments.customerID = ppi_loan.customerID
LEFT JOIN italk.users ON italk.users.id = agentID
LEFT JOIN ppi_ques_status ON ppi_ques_status.loanID = ppi_loan.loanID
JOIN ppi_lenders ON ppi_lenders.id = ppi_loan.lender
JOIN ppi_status ON ppi_status.customerID = ppi_loan.customerID
JOIN ppi_statuses ON ppi_statuses.status = ppi_status.status
AND ppi_ques_status.status = 1
AND ppi_ques_status.history = 0
AND (cc_type = '' || (cc_type != '' AND cc_accepted = 'no'))
AND ppi_loan.deleted = 'no'
AND ppi_loan.customerID != 10
GROUP BY ppi_loan.customerID, loan_number
Is very slow, here are all the results from the EXPLAIN query
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY ppi_ques_status ref loanID,status,history status 3 const 91086 Using where; Using temporary; Using filesort
1 PRIMARY ppi_loan eq_ref PRIMARY,customerID PRIMARY 8 ppimm.ppi_ques_status.loanID 1 Using where
1 PRIMARY ppi_lenders eq_ref PRIMARY PRIMARY 4 ppimm.ppi_loan.lender 1 Using where
1 PRIMARY ppi_assignments eq_ref customerID customerID 8 ppimm.ppi_loan.customerID 1
1 PRIMARY users eq_ref PRIMARY PRIMARY 8 ppimm.ppi_assignments.agentID 1
1 PRIMARY ppi_status ref status,customerID customerID 8 ppimm.ppi_loan.customerID 6
1 PRIMARY ppi_statuses eq_ref PRIMARY PRIMARY 4 ppimm.ppi_status.status 1 Using where; Using index
3 DEPENDENT SUBQUERY ppi_ques_status ref loanID,status loanID 8 func 1 Using where; Using filesort
2 DEPENDENT SUBQUERY ppi_sar_status ref loanID,status,history loanID 8 func 2 Using where
Why is it scanning so many rows and why "Using temporary; Using filesort"?
I can't remove any subqueries as I need all of the results that they produce
As already mentioned in a comment, the main cause of a slow query is that you seem to have single column indexes only, while you would need multi-column indexes to cover the joins, the filters, and the group by.
Also, your query has 2 other issues:
Even though you group by on 2 fields only, several other fields are listed in the select list without being subject to an aggregate function, such as min(). MySQL does allow such queries to be run under certain sql_mode settings, but they are still against the SQL standard and may have unexpected side effects, unless you really know what you are doing.
You have filters on the ppi_loan table in the join conditions, and ppi_loan is the left table of a left join. Due to the nature of the left join, these records will not be eliminated from the result set; MySQL will just not join any values to them. These criteria should be moved to the where clause.
The indexes I would create:
ppi_sar_status: multi-column index on the loanID, status, history fields. I would also consider rewriting this subquery as a join, since this table does not appear in the join list.
ppi_ques_status: multi-column index on loanID, status, timestamp fields - this would support both the subquery and the join. Remember, the subquery also has filesort in the explain.
ppi_loan: as a minimum a multi-column index on customerID, loan_number fields to support the group by clause, therefore avoiding the filesort as a minimum. You may consider adding the other fields in the join criteria based on their selectivity to this index.
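As DDL, the suggested indexes might look like this (a sketch; the index names are made up):

```sql
ALTER TABLE ppi_sar_status  ADD INDEX idx_loan_status_hist (loanID, status, history);
ALTER TABLE ppi_ques_status ADD INDEX idx_loan_status_ts   (loanID, status, timestamp);
-- Minimum for the GROUP BY; other join/filter columns could be appended:
ALTER TABLE ppi_loan        ADD INDEX idx_cust_loan_number (customerID, loan_number);
```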
I'm also not sure why you have the last 2 status tables in the join, since you are not retrieving any values from them. If you are using these tables to eliminate certain records, then consider using an exists() subquery instead of a join. In a join, MySQL needs to fetch data from all joined tables, whereas in an exists() subquery it would only check whether at least 1 matching record exists, without retrieving any actual data from the underlying tables.
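A sketch of the exists() idea, assuming the two status tables are used only as filters (the real join conditions and select list may differ):

```sql
SELECT ppi_loan.customerID, loan_number  /* ...other columns... */
FROM ppi_loan
WHERE ppi_loan.deleted = 'no'
  AND ppi_loan.customerID != 10
  AND EXISTS (
        -- only checks for existence; fetches no row data
        SELECT 1
        FROM ppi_status
        JOIN ppi_statuses ON ppi_statuses.status = ppi_status.status
        WHERE ppi_status.customerID = ppi_loan.customerID
      );
```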

Complex MySQL Select Left Join Optimization Indexing

I have a very complex query that is running and finding locations of members joining the subscription details and sorting by distance.
Can someone provide instruction on the correct indexes and cardinality I should add to make this load faster.
Right now on 1 million records it takes 75 seconds and I know it can be improved.
Thank you.
SELECT SQL_CALC_FOUND_ROWS (((acos(sin((33.987541*pi()/180)) * sin((users_data.lat*pi()/180))+cos((33.987541*pi()/180)) * cos((users_data.lat*pi()/180)) * cos(((-118.472153- users_data.lon)* pi()/180))))*180/pi())*60*1.1515) as distance,subscription_types.location_limit as location_limit,users_data.user_id,users_data.last_name,users_data.filename,users_data.user_id,users_data.phone_number,users_data.city,users_data.state_code,users_data.zip_code,users_data.country_code,users_data.quote,users_data.subscription_id,users_data.company,users_data.position,users_data.profession_id,users_data.experience,users_data.account_type,users_data.verified,users_data.nationwide,IF(listing_type = 'Company', company, last_name) as name
FROM `users_data`
LEFT JOIN `users_reviews` ON users_data.user_id=users_reviews.user_id AND users_reviews.review_status='2'
LEFT JOIN users_locations ON users_locations.user_id=users_data.user_id
LEFT JOIN subscription_types ON users_data.subscription_id=subscription_types.subscription_id
WHERE users_data.active='2'
AND subscription_types.searchable='1'
AND users_data.state_code='CA'
AND users_data.country_code='US'
GROUP BY users_data.user_id
HAVING distance <= '50'
OR location_limit='all'
OR users_data.nationwide='1'
ORDER BY subscription_types.search_priority ASC, distance ASC
LIMIT 0,10
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE users_reviews system user_id,review_status NULL NULL NULL 0 const row not found
1 SIMPLE users_locations system user_id NULL NULL NULL 0 const row not found
1 SIMPLE users_data ref subscription_id,active,state_code,country_code state_code 47 const 88241 Using where; Using temporary; Using filesort
1 SIMPLE subscription_types ALL PRIMARY,searchable NULL NULL NULL 4 Using where; Using join buffer
Your query is not that complex. You have only one real join, on the table subscription_types, which is certainly a small table with no more than a few hundred rows.
Where are your indexes? The best way to improve your query is to create indexes on the fields you are filtering on, like active, country_code, state_code and searchable.
Have you created the foreign key on users_data.subscription_id? You need an index on that too.
ForceIndex is useless; let the RDBMS determine the best indexes to choose.
The LEFT JOIN is useless too, because the condition subscription_types.searchable='1' will remove the unmatched rows anyway.
The ordering on search_priority implies that you need an index on this column too.
The filtering in the HAVING clause can prevent the indexes from being used. You don't need to put these filters in HAVING; if I understand your table schema, it is not really the aggregate that is being filtered.
Your table contains 1 million rows, but how many rows are returned without the LIMIT? With the right indexes, the query should execute in under a second.
SELECT ...
FROM `users_data`
INNER JOIN subscription_types
ON users_data.subscription_id = subscription_types.subscription_id
WHERE users_data.active='2'
AND users_data.country_code='US'
AND users_data.state_code='NY'
AND subscription_types.searchable='1'
AND (distance <= '50' OR location_limit='all' OR users_data.nationwide='1')
GROUP BY users_data.user_id
ORDER BY subscription_types.search_priority ASC, distance ASC
LIMIT 0,10
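The indexes that would support the rewritten query might look like this (a sketch; the index names are made up, and the column order is an assumption based on the equality filters above):

```sql
ALTER TABLE users_data
  ADD INDEX idx_active_country_state (active, country_code, state_code);
-- For the join on subscription_id (the "foreign key" mentioned above):
ALTER TABLE users_data
  ADD INDEX idx_subscription (subscription_id);
ALTER TABLE subscription_types
  ADD INDEX idx_searchable_priority (searchable, search_priority);
```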

Help on MySQL table indexing when GROUP BY is used in a query

Thank you for your attention.
There are two INNODB tables:
Table authors
id INT
nickname VARCHAR(50)
status ENUM('active', 'blocked')
about TEXT
Table books
author_id INT
title VARCHAR(150)
I'm running a query against these tables, to get each author and a count of books he has:
SELECT a. * , COUNT( b.id ) AS book_count
FROM authors AS a, books AS b
WHERE a.status != 'blocked'
AND b.author_id = a.id
GROUP BY a.id
ORDER BY a.nickname
This query is very slow (takes about 6 seconds to execute). I have an index on books.author_id and it works perfectly, but I do not know how to create an index on authors table, so that this query could use it.
Here is how current EXPLAIN looks:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE a ALL PRIMARY,id_status_nickname NULL NULL NULL 3305 Using where; Using temporary; Using filesort
1 SIMPLE b ref key_author_id key_author_id 5 a.id 2 Using where; Using index
I've looked at MySQL manual on optimizing queries with group by, but could not figure out how I can apply it on my query.
I'll appreciate any help and hints on this - what must be the index structure, so that MySQL could use it?
Edit
I have tried:
(id, status, nickname)
(status, nickname)
Both resulted in the same situation.
I assume that id_status_nickname is a composite index (id, status, nickname). In your query you filter the rows by saying a.status != 'blocked'. This has the following issues:
You don't have an index that can be used for this: (id, status, nickname) cannot be used, because status is not the prefix of that index.
Assuming you have an index on status, it cannot be used with !=; you have to change that to status='active'.
Also, status being an enum field with just two values, its cardinality will be low, so MySQL may end up not using the index at all.
You can try this: create the index as (status, id, nickname) and use status='active'. My guess is that since you are using '=' and status is the prefix of the index, MySQL should select this index and then use it for the group by and then the order by. Hope this helps.
UPDATE:
Looks like it is not possible to avoid filesort when the WHERE clause does not have the field used in ORDER BY.
I would try an index on (status, nickname). That should get rid of the necessity of "Using filesort".
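In DDL form, with the matching equality predicate (a sketch; the index name is made up, and the comma-join is rewritten as an explicit JOIN):

```sql
ALTER TABLE authors ADD INDEX idx_status_nickname (status, nickname);

SELECT a.*, COUNT(*) AS book_count
FROM authors AS a
JOIN books AS b ON b.author_id = a.id
WHERE a.status = 'active'   -- '=' instead of '!=', as suggested above
GROUP BY a.id
ORDER BY a.nickname;
```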