SELECT BB.NAME BranchName,VI.NAME Village,COUNT(BAC.CBSACCOUNTNUMBER) "No.Of Accounts",
SUM(BAC.CURRENTBALANCE) SumOfAmount,
SUM(CASE WHEN transactiontype = 'C' THEN amount ELSE 0 END) AS CreditTotal,
SUM(CASE WHEN transactiontype = 'D' THEN amount ELSE 0 END) AS DebitTotal,
SUM(CASE WHEN transactiontype = 'C' THEN amount WHEN transactiontype = 'D' THEN -1 * amount ELSE 0 END) AS CurrentBalance
FROM CUSTOMER CU,APPLICANT AP,ADDRESS AD,VILLAGE VI,BANKBRANCH BB,BANKACCOUNT BAC
LEFT OUTER JOIN accounttransaction ACT ON ACT.BANKACCOUNT_CBSACCOUNTNUMBER=BAC.CBSACCOUNTNUMBER
AND DATE_FORMAT(ACT.TRANDATE,'%Y-%m-%d')<='2013-05-09'
AND DATE_FORMAT(BAC.ACCOUNTOPENINGDATE,'%Y-%m-%d') <'2013-05-09'
AND ACT.BANKACCOUNT_CBSACCOUNTNUMBER IS NOT NULL
WHERE CU.CODE=AP.CUSTOMER_CODE AND BAC.ENTITY='CUSTOMER' AND BAC.ENTITYCODE=CU.CODE
AND AD.ENTITY='APPLICANT' AND AD.ENTITYCODE=AP.CODE
AND AD.VILLAGE_CODE=VI.CODE AND VI.STATE_CODE=AD.STATE_CODE AND VI.DISTRICT_CODE=AD.DISTRICT_CODE
AND VI.BLOCK_CODE=AD.BLOCK_CODE AND VI.PANCHAYAT_CODE=AD.PANCHAYAT_CODE
AND CU.BANKBRANCH_CODE=BB.CODE AND BAC.CBSACCOUNTNUMBER IS NOT NULL AND ACT.TRANSACTIONTYPE IS NOT NULL
GROUP BY BB.NAME,VI.NAME LIMIT 10;
Below is my explain plan:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE AD index ADDRESS_ENTITYCODE ADDRESS_ENTITYCODE 598 NULL 47234 Using where; Using index; Using temporary; Using filesort
1 SIMPLE VI ref PRIMARY PRIMARY 62 fiserveraupgb.AD.VILLAGE_CODE 1 Using where
1 SIMPLE AP eq_ref PRIMARY,AppCodeIndex PRIMARY 62 fiserveraupgb.AD.ENTITYCODE 1
1 SIMPLE BAC ref BANKACCOUNT_ENTITYCODE BANKACCOUNT_ENTITYCODE 63 fiserveraupgb.AP.CUSTOMER_CODE 1 Using where; Using index
1 SIMPLE CU eq_ref PRIMARY,CustCodeIndex PRIMARY 62 fiserveraupgb.AP.CUSTOMER_CODE 1
1 SIMPLE BB ref PRIMARY,Bankbranch_CodeName PRIMARY 62 fiserveraupgb.CU.BANKBRANCH_CODE 1
1 SIMPLE ACT index NULL accounttransaction_sysidindes 280 NULL 22981 Using where; Using index; Using join buffer
I am on MySQL server version 5.5 and I am using MySQL Workbench. The query above takes 13 minutes to execute. Please suggest the best way to speed it up. I have created indexes on all the columns that are involved.
You mainly need indexes on columns that are used in joins and in your where clause. Other indexes don't add value for your select statements and slow down your inserts and updates.
In this case, you're using the column values in functions. Due to this the indexes cannot be used efficiently.
An expression like this is very inefficient:
DATE_FORMAT(ACT.TRANDATE,'%Y-%m-%d')<='2013-05-09'
It causes a lot of string conversions, because all TRANDATES are converted to a string representation of their value. These values need to be temporarily stored and are not indexed, so apart from the conversion, any index on ACT.TRANDATE is no longer used. That is probably causing the rather expensive 'Using join buffer' at the end of your explain plan.
Instead, convert the string '2013-05-09' to a date value and use that value as a constant in, or a parameter for, your query.
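As a rough sketch of that rewrite, assuming TRANDATE and ACCOUNTOPENINGDATE are DATE or DATETIME columns (the question does not show the schema), the two date predicates could be written against the bare columns:

```sql
-- Assumption: TRANDATE and ACCOUNTOPENINGDATE are DATE or DATETIME columns.
-- "formatted date <= '2013-05-09'" means any time on or before that day,
-- i.e. strictly before the start of the next day.
AND ACT.TRANDATE < '2013-05-10'
-- "formatted date < '2013-05-09'" means strictly before the start of that day.
AND BAC.ACCOUNTOPENINGDATE < '2013-05-09'
```

Because the column is no longer wrapped in a function, the optimizer is at least able to consider an index on it. Whether it actually picks one still depends on the data.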
Another thing to do is to create, instead of separate indexes for separate columns, one index for a group of columns that are used together in a where and/or join. For instance this part:
AD.ENTITY = 'APPLICANT' AND
AD.ENTITYCODE = AP.CODE AND
AD.VILLAGE_CODE = VI.CODE
Having one index on the columns ENTITY, ENTITYCODE, and VILLAGE_CODE together would be more efficient than having a separate index for each of them. And it may help to include the other columns as well.
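A minimal sketch of such a composite index follows. The index name is made up for illustration, and it assumes the three columns together fit within the storage engine's key length limit:

```sql
-- Hypothetical index name; column order follows the text above.
-- Assumption: the combined key length stays within the engine's limit
-- (the existing ADDRESS_ENTITYCODE index already shows key_len 598).
CREATE INDEX ADDRESS_ENTITY_CODE_VILLAGE
    ON ADDRESS (ENTITY, ENTITYCODE, VILLAGE_CODE);
```

After creating it, rerun EXPLAIN and check whether the new index shows up in the key column for AD.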
And last: if a column or combination of columns is guaranteed to be unique, add a unique index. It is slightly faster in selects.
A general piece of advice: don't mix old join syntax with ANSI joins. It makes your query hard to read.
These hints (apart from the last one) should speed up your query, but it can still be slow, depending on the amount of data, the hardware and the load.
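For the join syntax point, here is a sketch of how the FROM and WHERE part of the query in the question might look with explicit joins only. The conditions are the ones from the question, just moved into ON clauses. The date predicates are deliberately left as they were, so that only the join syntax changes here:

```sql
-- Same tables and conditions as in the question; only the syntax differs.
FROM CUSTOMER CU
JOIN APPLICANT AP    ON AP.CUSTOMER_CODE = CU.CODE
JOIN ADDRESS AD      ON AD.ENTITY = 'APPLICANT' AND AD.ENTITYCODE = AP.CODE
JOIN VILLAGE VI      ON VI.CODE = AD.VILLAGE_CODE
                    AND VI.STATE_CODE = AD.STATE_CODE
                    AND VI.DISTRICT_CODE = AD.DISTRICT_CODE
                    AND VI.BLOCK_CODE = AD.BLOCK_CODE
                    AND VI.PANCHAYAT_CODE = AD.PANCHAYAT_CODE
JOIN BANKBRANCH BB   ON BB.CODE = CU.BANKBRANCH_CODE
JOIN BANKACCOUNT BAC ON BAC.ENTITY = 'CUSTOMER' AND BAC.ENTITYCODE = CU.CODE
LEFT OUTER JOIN accounttransaction ACT
       ON ACT.BANKACCOUNT_CBSACCOUNTNUMBER = BAC.CBSACCOUNTNUMBER
      AND DATE_FORMAT(ACT.TRANDATE,'%Y-%m-%d') <= '2013-05-09'
      AND DATE_FORMAT(BAC.ACCOUNTOPENINGDATE,'%Y-%m-%d') < '2013-05-09'
      AND ACT.BANKACCOUNT_CBSACCOUNTNUMBER IS NOT NULL
WHERE BAC.CBSACCOUNTNUMBER IS NOT NULL
  AND ACT.TRANSACTIONTYPE IS NOT NULL
GROUP BY BB.NAME, VI.NAME
LIMIT 10;
```

Written this way it is also easier to see that the `ACT.TRANSACTIONTYPE IS NOT NULL` filter in the WHERE clause discards the rows that the outer join would otherwise keep, so the LEFT OUTER JOIN effectively behaves like an inner join.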
Currently I am facing a rather slow query on a website, which also slows down the server under heavier traffic. How can I rewrite the query, or what index can I add, to avoid "Using temporary; Using filesort"? Without the "order by" everything is fast, but I don't get the wanted result/order.
SELECT cams.name, models.gender, TIMESTAMPDIFF(YEAR, models.birthdate, CURRENT_DATE) AS age, lcs.viewers
FROM cams
LEFT JOIN cam_tags ON cams.id = cam_tags.cam_id
INNER JOIN tags ON cam_tags.tag_id = tags.id
LEFT JOIN model_cams ON cams.id = model_cams.cam_id
LEFT JOIN models ON model_cams.model_id = models.id
LEFT JOIN latest_cam_stats lcs ON cams.id = lcs.cam_id
WHERE tags.name = '?'
ORDER BY lcs.time_stamp_id DESC, lcs.viewers DESC
LIMIT 24 OFFSET 96;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE tags NULL const PRIMARY,tags_name_uindex tags_name_uindex 766 const 1 100 Using temporary; Using filesort
1 SIMPLE cam_tags NULL ref PRIMARY,cam_tags_cams_id_fk,cam_tags_tags_id_fk cam_tags_tags_id_fk 4 const 75565047 100 Using where
1 SIMPLE cams NULL eq_ref PRIMARY PRIMARY 4 cam_tags.cam_id 1 100 NULL
1 SIMPLE model_cams NULL eq_ref model_platforms_platforms_id_fk model_platforms_platforms_id_fk 4 cam_tags.cam_id 1 100 NULL
1 SIMPLE models NULL eq_ref PRIMARY PRIMARY 4 model_cams.model_id 1 100 NULL
1 SIMPLE lcs NULL eq_ref PRIMARY,latest_cam_stats_cam_id_time_stamp_id_viewers_index PRIMARY 4 cam_tags.cam_id 1 100 NULL
There are many cases where it is effectively impossible to avoid "using temporary, using filesort".
"Filesort" does not necessarily involve a "file"; it is often done in RAM. Hence performance may not be noticeably hurt.
That said, I will assume your real question is "How can this query be sped up?".
Most of the tables are accessed via PRIMARY or "eq_ref" -- all good. But the second table involves touching an estimated 75M rows! Often that happens as the first table, not second. Hmmmm.
Sounds like cam_tags is a many-to-many mapping table? And it does not have any index starting with name? See this for proper indexes for such a table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
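As a sketch only: a common pattern for a mapping table is a composite index in each direction. Assuming cam_tags has the columns tag_id and cam_id (both appear in the join conditions of the query) and that no such index exists yet, the tag-first direction could be added like this. The index name is hypothetical:

```sql
-- Hypothetical index name. Assumes cam_tags has columns tag_id and cam_id
-- and does not already have an index that starts with tag_id, cam_id.
ALTER TABLE cam_tags
    ADD INDEX cam_tags_tag_id_cam_id (tag_id, cam_id);
```

With both columns in one index, the lookup by tag and the join to cams can be answered from the index alone. Whether that makes a noticeable difference here depends on how many rows really match the tag.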
Since the WHERE and ORDER BY reference more than one table, it is essentially impossible to avoid "using temporary, using filesort".
Worse than that, it needs to find all the ones with "name='?'", sort the list, skip 96 rows, and only finally deliver 24.
I need to optimise my query which is running very slow, but don't know how to do it. It contains a subquery which is making it very slow. If I remove the inline query then it runs very well.
The query is:
EXPLAIN
SELECT t.service_date,
t.service_time,
(SELECT js.modified_date FROM rej_job_status js WHERE js.booking_id=b.booking_id ORDER BY id DESC LIMIT 1) `cancel_datetime`,
b.booking_id,
b.ref_booking_id,
b.phone, b.city,
b.booking_time,
CONCAT(rc.firstname," ",rc.lastname) customer_name,
rc.phone_no,
rs.service_id,
rs.service_name,
rct.city_name
FROM rej_job_details t
JOIN rej_booking b ON t.booking_id = b.booking_id
JOIN rej_customer rc ON rc.customer_id = b.customer
JOIN rej_service rs ON t.service_id = rs.service_id
JOIN rej_city rct ON rct.city_id=b.city
WHERE t.act_status = 0 AND DATE(b.booking_time) >= '2016-06-01'
AND DATE(b.booking_time) <= '2016-06-14'
ORDER BY b.booking_time DESC
LIMIT 0 , 50
The explain plan shows this:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY b ALL PRIMARY NULL NULL NULL 32357 Using where; Using filesort
1 PRIMARY rct eq_ref PRIMARY PRIMARY 4 crmdb.b.city 1 NULL
1 PRIMARY t ref booking_id booking_id 4 crmdb.b.booking_id 1 Using where
1 PRIMARY rs eq_ref PRIMARY,service_id PRIMARY 4 crmdb.t.service_id 1 NULL
1 PRIMARY rc eq_ref PRIMARY PRIMARY 4 crmdb.b.customer 1 Using where
2 DEPENDENT SUBQUERY js index NULL PRIMARY 4 NULL 1 Using where
a) How to read this explain plan and know what it means?
b) How can I optimize this query?
booking_time is hiding inside a function, so INDEX(booking_time) cannot be used. That leads to a costly table scan.
AND DATE(b.booking_time) >= '2016-06-01'
AND DATE(b.booking_time) <= '2016-06-14'
-->
AND b.booking_time >= '2016-06-01'
AND b.booking_time < '2016-06-15' -- note 3 differences in this line
Or, this might be simpler (by avoiding second date calculation):
AND b.booking_time >= '2016-06-01'
AND b.booking_time < '2016-06-01' + INTERVAL 2 WEEK
In the EXPLAIN, I expect the 'ALL' to become 'range', and 'Filesort' to vanish.
To understand the full explain-plan, you should read the documentation, but the most important information it includes is the indexes mysql uses, or, usually more revealing, which it doesn't use.
For your DEPENDENT SUBQUERY (that is your "inline query"), it doesn't use a good index, which makes your query slow, so you need to add the index rej_job_status(booking_id) on your table rej_job_status.
Create it, test it and check your explain plan again, it should then list that new index under key for your DEPENDENT SUBQUERY.
Another optimization might be to add an index rej_booking(booking_time) for your table rej_booking. It depends on your data if it improves the query, but you should try it, since right now, mysql doesn't use an index for that selection.
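The two indexes mentioned in this answer could be created roughly as follows. The index names are made up for illustration. The columns are the ones named in the answer:

```sql
-- Hypothetical index names; table and column names are taken from the query.
ALTER TABLE rej_job_status ADD INDEX idx_job_status_booking_id (booking_id);
ALTER TABLE rej_booking    ADD INDEX idx_booking_booking_time (booking_time);
```

Keep in mind that an index on booking_time can only be used for a range scan when the column is compared directly, not when it is wrapped in DATE().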
I've always thought that Joins were faster than Subqueries. However for a very simple query in a small dataset the Join is returning in 1.0s whereas the Correlated-Subquery returns in 0.001s. Seems like something is wrong. I note that both queries are using the correct (appallingly named) indexes. Over 1 sec seems excessive for the Join. Any ideas?
Please compare these two queries with their Explain plans:
a) Using a Join
select user.id, user.username,
count(distinct bet_placed.id) as bets_placed,
count(distinct bet_won.id) as bets_won,
count(distinct bets_involved.id) as bets_involved
from user
left join bet as bet_placed on bet_placed.user_placed = user.id
left join bet as bet_won on bet_won.user_won = user.id
left join bet_accepters as bets_involved on bets_involved.user = user.id
group by user.id
Explain plan:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE user index PRIMARY PRIMARY 4 NULL 86 100.00 NULL
1 SIMPLE bet_placed ref fk_bet_user1_idx fk_bet_user1_idx 4 xxx.user.id 6 100.00 "Using index"
1 SIMPLE bet_won ref user_won_idx user_won_idx 5 xxx.user.id 8 100.00 "Using index"
1 SIMPLE bets_involved ref FK_user_idx FK_user_idx 4 xxx.user.id 8 100.00 "Using index"
Average response time: 1.0 secs
b) Using a Correlated-Subquery
select user.id, user.username,
(select COALESCE(count(bet.id), 0) from bet where bet.user_placed = user.id) as bets_placed,
(select COALESCE(count(bet.id), 0) from bet where bet.user_won = user.id) as bets_won,
(select COALESCE(count(bet_accepters.id), 0) from bet_accepters where bet_accepters.user = user.id) as bets_involved
from user;
Explain plan:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY user ALL NULL NULL NULL NULL 86 100.00 NULL
4 "DEPENDENT SUBQUERY" bet_accepters ref FK_user_idx FK_user_idx 4 xxx.user.id 8 100.00 "Using index"
3 "DEPENDENT SUBQUERY" bet ref user_won_idx user_won_idx 5 xxx.user.id 8 100.00 "Using index"
2 "DEPENDENT SUBQUERY" bet ref fk_bet_user1_idx fk_bet_user1_idx 4 xxx.user.id 6 100.00 "Using index"
Average response time: 0.001 secs
Please see
which shows a comparison in speed/rows for different types of query.
It may be that there is little or no difference (either way) on 'smaller' datasets (though this may vary with the way the db has been set up, as well as the DBMS used), but as you can see,
However, in relation to other 'query types', these are much faster than other operations (shown below):
Subquery vs. Join
Both the subquery and join solutions perform reasonably well when very
small partitions are involved (up to several hundred rows per
partition). As partition size grows, the performance of these
solutions degrades in a quadratic (N^2) manner, becoming quite poor.
But as long as the partitions are small, performance degradation
caused by an increase in number of partitions is linear. One factor
that might affect your choice between using a subquery-based or
join-based solution is the number of aggregates requested. As I
discussed, the subquery-based approach requires a separate scan of the
data for each aggregate, whereas the join-based approach doesn’t—so
you’ll most likely want to use the join approach when you need to
calculate multiple aggregates.
~SOURCE
Please consider the following query
SELECT * FROM PC_SMS_OUTBOUND_MESSAGE AS OM
JOIN MM_TEXTOUT_SERVICE AS TOS ON TOS.TEXTOUT_SERVICE_ID = OM.SERVICE_ID
JOIN PC_SERVICE_NUMBER AS SN ON OM.TO_SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN PC_SUBSCRIBER AS SUB ON SUB.SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN MM_CONTACT CON ON CON.SUBSCRIBER_ID = SUB.SUBSCRIBER_ID
--AND CON.MM_CLIENT_ID = 1
AND OM.CLIENT_ID= 1
AND OM.CREATED>='2013-05-08 11:47:53' AND OM.CREATED<='2014-05-08 11:47:53'
ORDER BY OM.SMS_OUTBOUND_MESSAGE_ID DESC LIMIT 50
To get the dataset I require I need to filter on the (commented out) CONTACTS client_id as well as the OUTBOUND_MESSAGES client_id but this is what changes the performance from milliseconds to tens of minutes.
Execution plan without "AND CON.MM_CLIENT_ID = 1":
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE OM index FK4E518EAA19F2EA2B,SERVICEID_IDX,CREATED_IDX,CLIENTID_IDX,CL_CR_ST_IDX,CL_CR_STYPE_ST_IDX,SID_TOSN_CL_CREATED_IDX PRIMARY 8 NULL 6741 3732.00 Using where
1 SIMPLE SUB ref PRIMARY,FKA1845E3459A7AEF FKA1845E3459A7AEF 9 mmlive.OM.TO_SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE SN eq_ref PRIMARY PRIMARY 8 mmlive.OM.TO_SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE CON ref FK2BEC061CA525D30,SUB_CL_IDX FK2BEC061CA525D30 8 mmlive.SUB.SUBSCRIBER_ID 1 100.00
1 SIMPLE TOS eq_ref PRIMARY,FKDB3DF298AB3EF4E2 PRIMARY 8 mmlive.OM.SERVICE_ID 1 100.00
Execution plan with "AND CON.MM_CLIENT_ID = 1":
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE CON ref FK2BEC061CA525D30,FK2BEC06134399E2A,SUB_CL_IDX FK2BEC06134399E2A 8 const 18306 100.00 Using temporary; Using filesort
1 SIMPLE SUB eq_ref PRIMARY,FKA1845E3459A7AEF PRIMARY 8 mmlive.CON.SUBSCRIBER_ID 1 100.00
1 SIMPLE OM ref FK4E518EAA19F2EA2B,SERVICEID_IDX,CREATED_IDX,CLIENTID_IDX,CL_CR_ST_IDX,CL_CR_STYPE_ST_IDX,SID_TOSN_CL_CREATED_IDX FK4E518EAA19F2EA2B 9 mmlive.SUB.SERVICE_NUMBER_ID 3 100.00 Using where
1 SIMPLE SN eq_ref PRIMARY PRIMARY 8 mmlive.SUB.SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE TOS eq_ref PRIMARY,FKDB3DF298AB3EF4E2 PRIMARY 8 mmlive.OM.SERVICE_ID 1 100.00
Any suggestions on how to format the above to make it a little easier on the eye would be good.
ID fields are primary keys.
There are indexes on all joining columns.
You may be able to fix this problem by using a subquery:
JOIN (SELECT C.* FROM CONTACTS C WHERE C.USER_ID = 1) C ON C.SUBSCRIBER_ID = SUB.ID
This will materialize the matching rows, which could have downstream effects on the query plan.
If this doesn't work, then edit your query and add:
The explain plans for both queries.
The indexes available on the table.
EDIT:
Can you try creating a composite index:
PC_SMS_OUTBOUND_MESSAGE(CLIENT_ID, CREATED, SERVICE_ID, TO_SERVICE_NUMBER_ID, SMS_OUTBOUND_MESSAGE_ID);
This may change both query plans to start on the OM table with the appropriate filtering, hopefully making the results stable and good.
I've solved the riddle! For my case anyway so I'll share.
This all came down to the join order changing once I added that extra clause, which you can clearly see in the execution plan. When the query is fast, the Outbound Messages are at the top of the plan but when slow (after adding the clause), the Contacts table is at the top.
I think this means that the Outbound Messages index can no longer be utilised for the sorting which causes the dreaded;
"Using temporary; Using filesort"
By simply adding STRAIGHT_JOIN keyword directly after the select I could force the execution plan to join in the order denoted directly by the query.
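For illustration, this is roughly what the query from the question looks like with the keyword added and the extra clause enabled. Everything except STRAIGHT_JOIN is copied from the question:

```sql
-- STRAIGHT_JOIN makes MySQL join the tables in the order they are written,
-- so OM is read first, as in the fast plan.
SELECT STRAIGHT_JOIN *
FROM PC_SMS_OUTBOUND_MESSAGE AS OM
JOIN MM_TEXTOUT_SERVICE AS TOS ON TOS.TEXTOUT_SERVICE_ID = OM.SERVICE_ID
JOIN PC_SERVICE_NUMBER AS SN ON OM.TO_SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN PC_SUBSCRIBER AS SUB ON SUB.SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN MM_CONTACT CON ON CON.SUBSCRIBER_ID = SUB.SUBSCRIBER_ID
  AND CON.MM_CLIENT_ID = 1
  AND OM.CLIENT_ID = 1
  AND OM.CREATED >= '2013-05-08 11:47:53' AND OM.CREATED <= '2014-05-08 11:47:53'
ORDER BY OM.SMS_OUTBOUND_MESSAGE_ID DESC LIMIT 50;
```

Note that this pins the join order for good, so the plan is worth rechecking with EXPLAIN if the data distribution changes.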
Happy for anyone with a more intimate knowledge of this field to contradict any of the above in terms of what is actually happening but it definitely worked.
I have the following query:
select
t.Chunk as LeftChunk,
t.ChunkHash as LeftChunkHash,
q.Chunk as RightChunk,
q.ChunkHash as RightChunkHash,
count(t.ChunkHash) as ChunkCount
from
chunks as t
join
chunks as q
on
t.ID = q.ID
group by LeftChunkHash, RightChunkHash
And the following explain table:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t ALL IDIndex NULL NULL NULL 17796190 "Using temporary; Using filesort"
1 SIMPLE q ref IDIndex IDIndex 4 sotero.t.Id 12
note the "using temporary; using filesort".
When this query is run, I quickly run out of RAM (presumably b/c of the temp table), and then the HDD kicks in, and the query slows to a halt.
I thought it might be an index issue, so I started adding a few that sort of made sense:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
chunks 0 PRIMARY 1 ChunkId A 17796190 NULL NULL BTREE
chunks 1 ChunkHashIndex 1 ChunkHash A 243783 NULL NULL BTREE
chunks 1 IDIndex 1 Id A 1483015 NULL NULL BTREE
chunks 1 ChunkIndex 1 Chunk A 243783 NULL NULL BTREE
chunks 1 ChunkTypeIndex 1 ChunkType A 2 NULL NULL BTREE
chunks 1 chunkHashByChunkIDIndex 1 ChunkHash A 243783 NULL NULL BTREE
chunks 1 chunkHashByChunkIDIndex 2 ChunkId A 17796190 NULL NULL BTREE
chunks 1 chunkHashByChunkTypeIndex 1 ChunkHash A 243783 NULL NULL BTREE
chunks 1 chunkHashByChunkTypeIndex 2 ChunkType A 261708 NULL NULL BTREE
chunks 1 chunkHashByIDIndex 1 ChunkHash A 243783 NULL NULL BTREE
chunks 1 chunkHashByIDIndex 2 Id A 17796190 NULL NULL BTREE
But still using the temporary table.
The db engine is MyISAM.
How can I get rid of the using temporary; using filesort in this query?
Just changing to InnoDB w/o explaining the underlying cause is not a particularly satisfying answer. Besides, if the solution is to just add the proper index, then that's much easier than migrating to another db engine.
I am new to relational databases. So I'm hoping that the solution is something obvious to the experts.
EDIT1:
ID is not the primary key. ChunkID is. There are approximately 40 ChunkIDs for each ID. So adding an additional ID to the table adds about 40 rows. Each unique chunk has a unique chunkHash associated with it.
EDIT2:
Here's the schema:
Field Type Null Key Default Extra
ChunkId int(11) NO PRI NULL
ChunkHash int(11) NO MUL NULL
Id int(11) NO MUL NULL
Chunk varchar(255) NO MUL NULL
ChunkType varchar(255) NO MUL NULL
EDIT 3:
The end objective of the query is to create a table of word co-occurrences across documents. ChunkIDs are word instances. Each instance is a word that is associated with a particular document (ID). About 40 words present per document. About 1 million documents. So the resulting table of co-occurrences is highly compressed compared to the full cross-product temporary table that is (apparently) being created. That is, the full cross-product temp table is 1 mil * 40 * 40 = 1.6 billion rows. The compressed resulting table is estimated at about 40 million rows.
EDIT 4:
Adding postgresql tag to see if any postgresql users can get a better execution plan on that SQL implementation. If that's the case, I'll switch over.
How about summarizing the table before the join?
The summary might be:
select count(*) count,
Chunk,
ChunkHash
from chunks
group by Chunk, ChunkHash
Then the join would be:
Select r.Chunk as RightChunk,
r.ChunkHash as RightChunkHash,
l.Chunk as LeftChunk,
l.ChunkHash as LeftChunkHash,
sum(l.Count) + sum(r.Count) as Count
from (
select count(*) count,
Chunk,
ChunkHash
from chunks
group by Chunk, ChunkHash
) l
join (
select count(*) count,
Chunk,
ChunkHash
from chunks
group by Chunk, ChunkHash
) r on l.Chunk = r.Chunk
group by r.Chunk, r.ChunkHash, l.Chunk, l.ChunkHash
The thing I'm not sure about is what you're counting, exactly. So my SUM() + SUM() is a guess. You might want SUM() * SUM().
Also, I'm assuming that two Chunk values are equal if and only if ChunkHash values are equal.
Updated with a query that produces the same results. It won't be any faster though.
Create Index IX_ID On Chunks (ID);
Select
LeftChunk,
LeftChunkHash,
RightChunk,
RightChunkHash,
Sum(ChunkCount)
From (
Select
t.Chunk as LeftChunk,
t.ChunkHash as LeftChunkHash,
q.Chunk as RightChunk,
q.ChunkHash as RightChunkHash,
count(t.ChunkHash) as ChunkCount
From
chunks as t
inner join
chunks as q
on t.ID = q.ID
Group By
t.ID,
t.ChunkHash,
q.ChunkHash
) x
Group By
LeftChunk,
LeftChunkHash,
RightChunk,
RightChunkHash
Fiddle with example test data http://sqlfiddle.com/#!3/ea1a5/2
Latest Fiddle, with the problem reformulated as words and documents: http://sqlfiddle.com/#!3/f5aef/12
With the problem reformulated as documents and words, how many documents do you have, how many words, and how many document words?
Also, using the documents and words analogy, would you say your query is "For all pairs of words that appear in a document together, how often do they appear together in any document. If word A appears n times in a document and word B m times in the same document, then this counts as n * m times in the total."
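If that reading is right, the n * m counting could be sketched as below: count each word per document first, then multiply the per-document counts and sum over documents. This assumes ChunkHash identifies a word and Id identifies a document. It is not claimed to be faster, only to make the intended result explicit:

```sql
-- Sketch only. Assumes ChunkHash identifies a word and Id a document.
-- Inner queries: how often each word occurs in each document (n).
-- Outer query: for each word pair, sum n * m over all shared documents.
SELECT l.ChunkHash    AS LeftChunkHash,
       r.ChunkHash    AS RightChunkHash,
       SUM(l.n * r.n) AS ChunkCount
FROM (SELECT Id, ChunkHash, COUNT(*) AS n FROM chunks GROUP BY Id, ChunkHash) AS l
JOIN (SELECT Id, ChunkHash, COUNT(*) AS n FROM chunks GROUP BY Id, ChunkHash) AS r
  ON l.Id = r.Id
GROUP BY l.ChunkHash, r.ChunkHash;
```

The Chunk text columns are left out here to keep the grouping unambiguous. They could be joined back in by ChunkHash afterwards.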
I migrated from MySQL to PostgreSQL, and query execution time went from ~1.5 days to ~10 mins.
Here's the PostgreSQL query execution plan:
I am no longer using MySQL.