What sequence should indexes have? - mysql

I have an SQL statement that aggregates data from 3 tables of a MySQL database. The query takes a very long time to complete, and I am trying to use indexes to speed it up.
SELECT
A.ID_SITE AS OBJECT_ID,
B.SITE_NAME AS OBJECT_NAME,
A.POLYGON,
C.TIME_KEY AS DATE_TIME_KEY,
B.ADDRESS,
B.REGION,
B.DISTRICT,
B.LOCATION,
B.LOCATION_TYPE
FROM TABLE_C AS C
INNER JOIN TABLE_A AS A
ON C.ID_OBJECT = A.ID_SITE
INNER JOIN TABLE_B B
ON A.ID_SITE = B.SITE_ID AND TRACK_IND != 1
WHERE
(C.TIME_KEY BETWEEN '2018-10-01 00:00:00' AND '2018-10-31 23:59:59')
AND
C.ID_TIME_MODE = 3
AND
C.ID_SUBOBJECT_TYPE = 1
AND (
C.CONG_POWER >= 1 OR
C.DIGITAL_POWER >= 3
)
AND
C.ID_OBJECT NOT IN (20158, 26875)
AND
A.MONTH_KEY = '2018-10-01'
I need some advice. In what order should I create and use indexes in my case?
What I have done so far:
CREATE INDEX index_a ON TABLE_A (ID_SITE);
CREATE INDEX index_b ON TABLE_B (SITE_ID, TRACK_IND);
CREATE INDEX index_c ON TABLE_C (TIME_KEY, ID_TIME_MODE, ID_SUBOBJECT_TYPE, CONG_POWER, DIGITAL_POWER, ID_OBJECT);
CREATE INDEX index_a_month_key ON TABLE_A (MONTH_KEY);
I also think it might be better to use FORCE INDEX, but I am not sure how to use it correctly in my case.

For your query, the best indexes would probably be:
TABLE_C(ID_TIME_MODE, ID_SUBOBJECT_TYPE, TIME_KEY, CONG_POWER, DIGITAL_POWER, ID_OBJECT)
TABLE_A(ID_SITE, MONTH_KEY)
TABLE_B(SITE_ID)
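The column order suggested for TABLE_C puts the equality-tested columns first and the range-tested TIME_KEY after them. Below is a minimal sketch of that ordering, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere; the two optimizers differ in detail). The table is cut down to four columns, and the index name idx_c_demo and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_c (id_object INTEGER, time_key TEXT, "
    "id_time_mode INTEGER, id_subobject_type INTEGER)"
)
# Equality columns first, the range column (time_key) last.
conn.execute(
    "CREATE INDEX idx_c_demo "
    "ON table_c (id_time_mode, id_subobject_type, time_key)"
)
conn.executemany(
    "INSERT INTO table_c VALUES (?, ?, ?, ?)",
    [
        (1, "2018-10-05 10:00:00", 3, 1),
        (2, "2018-10-20 12:00:00", 3, 1),
        (3, "2018-11-02 09:00:00", 3, 1),  # outside the date range
        (4, "2018-10-07 08:00:00", 2, 1),  # wrong id_time_mode
    ],
)

query = (
    "SELECT id_object FROM table_c "
    "WHERE id_time_mode = 3 AND id_subobject_type = 1 "
    "AND time_key BETWEEN '2018-10-01 00:00:00' AND '2018-10-31 23:59:59'"
)
matching_ids = sorted(row[0] for row in conn.execute(query))
# The last column of EXPLAIN QUERY PLAN is a human-readable description.
plan_details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(matching_ids)
for detail in plan_details:
    print(detail)
```

In MySQL the corresponding check would be to run EXPLAIN on the real query and look at the key and rows columns before and after creating the index.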

Related

Speed up MySql query time with multiple conditional joins

There are 3 tables: persontbl1 and persontbl2 (7500 rows each) and schedule (~3000 active schedules, i.e. schedule.status = 0). The person tables contain data for the same persons in a one-to-one relationship, and an INNER JOIN between the two takes less than a second. The schedule table contains data about persons to be interviewed; not all persons have schedules in the schedule table. With the LEFT JOIN, the query takes around 45 seconds, which is causing all sorts of issues.
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
schedule.id, schedule.call_datetime, schedule.enum_id,
schedule.enum_change, schedule.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN SCHEDULE ON (schedule.survey_id = persontbl1._URI)
AND (SCHEDULE.status=0)
AND (DATE(SCHEDULE.call_datetime) <= CURDATE())
ORDER BY schedule.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
The EXPLAIN output, the schedule table structure, and the schedule table indexes were attached to the original post.
Please let me know if any further information is required.
Thanks.
Edit: Added fully qualified table names and their columns.
You should just replace this line:
AND (DATE(SCHEDULE.call_datetime) <= CURDATE())
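with this one:
AND SCHEDULE.call_datetime <= '2015-04-18 00:00:00'
so MySQL does not have to call DATE() on every row and can compare the column directly against the constant '2015-04-18 00:00:00'. Note that a constant at midnight excludes rows from later on that day, which the original DATE(...) <= CURDATE() condition included; to keep them, use call_datetime < '2015-04-19 00:00:00' instead.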
Check whether performance improves with this query:
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
schedule.id, schedule.call_datetime, schedule.enum_id,
schedule.enum_change, schedule.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN SCHEDULE ON (schedule.survey_id = persontbl1._URI)
AND (SCHEDULE.status=0)
AND (SCHEDULE.call_datetime <= '2015-02-01 00:00:00')
ORDER BY schedule.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
EDIT 1: You said the query was fast enough without the LEFT JOIN part, so you can also try:
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
s.id, s.call_datetime, s.enum_id,
s.enum_change, s.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN
(SELECT *
FROM SCHEDULE
WHERE status=0
AND call_datetime <= '2015-02-01 00:00:00'
) s
ON s.survey_id = persontbl1._URI
ORDER BY s.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
I'm guessing that AGR_CONTACT comes from p1. This is the query you want to optimize:
SELECT p1._CREATION_DATE, _TOP_LEVEL_AURI, RESP_CNIC, RESP_CNIC_NAME,
MOB_NUMBER1, MOB_NUMBER2,
s.id, s.call_datetime, s.enum_id, s.enum_change, s.status
FROM persontbl1 p1 INNER JOIN
persontbl2 p2
ON (p2._TOP_LEVEL_AURI = p1._URI) AND (p1.AGR_CONTACT = 1) LEFT JOIN
SCHEDULE s
ON (s.survey_id = p1._URI) AND
(s.status = 0) AND
(DATE(s.call_datetime) <= CURDATE())
ORDER BY s.call_datetime IS NULL DESC, p1._CREATION_DATE ASC;
The best indexes for this query are: persontbl1(agr_contact), persontbl2(_TOP_LEVEL_AURI, _uri), and schedule(survey_id, status, call_datetime).
The use of date() around the date time is not recommended. In general, that precludes the use of indexes. However, in this case, you have a left join, so it doesn't make a difference. That column is not being used for filtering anyway. The index on schedule is only for covering the on clause.
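The DATE() discussion above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen so the example is runnable), a cut-down schedule table with made-up rows, and a fixed date in place of CURDATE(). It compares the function-wrapped predicate with a half-open range on the bare column, and with the midnight constant, which is not equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule (id INTEGER PRIMARY KEY, call_datetime TEXT)")
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?)",
    [
        (1, "2015-04-17 09:30:00"),
        (2, "2015-04-18 00:00:00"),
        (3, "2015-04-18 15:45:00"),
        (4, "2015-04-19 08:00:00"),
    ],
)

today = "2015-04-18"              # stands in for CURDATE()
next_day = "2015-04-19 00:00:00"  # start of the following day


def ids(sql, params):
    """Run a query and return the selected ids in ascending order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY id", params)]


# Function applied to the column: correct, but an index on the column cannot be used.
wrapped = ids("SELECT id FROM schedule WHERE date(call_datetime) <= ?", (today,))
# Bare column compared against a constant: same rows, index-friendly.
half_open = ids("SELECT id FROM schedule WHERE call_datetime < ?", (next_day,))
# Midnight constant with <=: drops rows from later on the same day.
midnight = ids(
    "SELECT id FROM schedule WHERE call_datetime <= ?", (today + " 00:00:00",)
)

print(wrapped)    # [1, 2, 3]
print(half_open)  # [1, 2, 3]
print(midnight)   # [1, 2]
```

In MySQL the half-open form can be written without a hard-coded date as s.call_datetime < CURDATE() + INTERVAL 1 DAY.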

MySQL Query Optimization for the derived & Sub Query Combination queries

Queries like the following, which use a derived table and a subquery, are running on the server. The constraint is that the subqueries are generated by multiple modules based on the current situation, so they cannot really be converted into joins.
Please suggest a possible way to optimize the query:
SELECT COUNT(1)
AS total
FROM member tlb_m
where tlb_m.active = 1
and tlb_m.rank > 0
and tlb_m.member_id not in (5735,134,241,1055,348,272,476,43,7,804,7548,90,229,346,40895)
and tlb_m.type = 'M'
and (tlb_m.hometown_list_id in
(SELECT l2.list_id
FROM ((
SELECT t12.list_id
from list_tree_idx t12
INNER JOIN list_tree_idx t11
ON t12.list_parent_id=t11.list_id
where t11.list_parent_id='205546'
) UNION ALL (
SELECT list_id
from list_tree_idx
where list_parent_id='205546'
) ) as l2
) or tlb_m.hometown_list_id = 205546
)
I would suggest using a closure table for optimal hierarchical queries.
For example, with a closure table that has columns ANCESTOR_ID, CHILD_ID and DEPTH (and that also stores each node as its own ancestor at depth 0), your query would look like this:
SELECT COUNT(1) AS total
FROM member AS tlb_m
INNER JOIN hometown_closure AS c ON c.child_id = tlb_m.hometown_list_id
where tlb_m.active = 1
and tlb_m.rank > 0
and tlb_m.member_id not in (5735,134,241,1055,348,272,476,43,7,804,7548,90,229,346,40895)
and tlb_m.type = 'M'
and c.ancestor_id = 205546
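Here is a rough sketch of how such a closure table could be filled and queried, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; MySQL 5.x has no recursive CTEs, so there the table would have to be populated another way). The tree and member rows are made up, and the member table is cut down to two columns. Note that the original query only reaches two levels below 205546, so the sketch adds depth <= 2 to match it exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list_tree_idx (list_id INTEGER PRIMARY KEY, list_parent_id INTEGER);
CREATE TABLE member (member_id INTEGER PRIMARY KEY, hometown_list_id INTEGER);
CREATE TABLE hometown_closure (ancestor_id INTEGER, child_id INTEGER, depth INTEGER);

INSERT INTO list_tree_idx VALUES
    (205546, NULL), (1, 205546), (2, 205546),
    (11, 1), (21, 2), (111, 11), (800, NULL), (900, 800);
INSERT INTO member VALUES
    (1, 205546), (2, 1), (3, 11), (4, 111), (5, 900), (6, 21);

-- One closure row per (ancestor, descendant) pair, plus a depth-0 self row.
WITH RECURSIVE walk(ancestor_id, child_id, depth) AS (
    SELECT list_id, list_id, 0 FROM list_tree_idx
    UNION ALL
    SELECT w.ancestor_id, t.list_id, w.depth + 1
    FROM walk AS w
    JOIN list_tree_idx AS t ON t.list_parent_id = w.child_id
)
INSERT INTO hometown_closure SELECT ancestor_id, child_id, depth FROM walk;
""")


def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


# Shape of the original query: children, grandchildren, or the node itself.
original_count = scalar("""
SELECT COUNT(*) FROM member AS m
WHERE m.hometown_list_id IN (
        SELECT t12.list_id
        FROM list_tree_idx AS t12
        JOIN list_tree_idx AS t11 ON t12.list_parent_id = t11.list_id
        WHERE t11.list_parent_id = 205546
        UNION ALL
        SELECT list_id FROM list_tree_idx WHERE list_parent_id = 205546
    )
   OR m.hometown_list_id = 205546
""")

# Closure-table version limited to the same two levels.
closure_count = scalar("""
SELECT COUNT(*) FROM member AS m
JOIN hometown_closure AS c ON c.child_id = m.hometown_list_id
WHERE c.ancestor_id = 205546 AND c.depth <= 2
""")

# Without the depth limit, deeper descendants are counted as well.
all_depths_count = scalar("""
SELECT COUNT(*) FROM member AS m
JOIN hometown_closure AS c ON c.child_id = m.hometown_list_id
WHERE c.ancestor_id = 205546
""")

print(original_count, closure_count, all_depths_count)  # 4 4 5
```

An index on hometown_closure (ancestor_id, child_id) would be the natural companion to this lookup.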

Optimizing self join in mysql

The following query takes forever to complete. I've added the indexes on all fields included in the join, tried putting the where conditions into the join and I thought I'd ask for advice before tinkering with FORCE/USE indexes. It just seems that indexes should be used on both sides of this join. Seems only i1 is being used.
id  select_type  table  type  possible_keys  key  key_len  ref      rows     filtered  Extra
1   SIMPLE       a      ALL   i1,i2,i3                              2399303  100.00    Using temporary
1   SIMPLE       b      ref   i1,i2,i3       i1   5        db.a.bt  11996    100.00    Using where
create index i1 on obs(bt);
create index i2 on obs(st);
create index i3 on obs(bt,st);
create index i4 on obs(sid);
explain extended
select distinct b.sid
from obs a inner join obs b on a.bt = b.bt and a.st = b.st
where
a.sid != b.sid and
abs( datediff( b.sid_start_date , a.sid_expire_date ) ) < 60;
I've tried both ALTER TABLE and CREATE INDEX above to add indexes to obs.
Since you are not selecting any columns from table a, it may be better to use EXISTS. EXISTS lets you check whether matching rows are present in another table without joining to it, and the database can stop at the first match instead of producing every matching pair, which often improves performance. I also like EXISTS because I think it makes the query easier to understand when you come back to it months later.
select distinct b.sid
from obs b
where exists (Select 1
From obs a
Where a.bt = b.bt
and a.st = b.st
and a.sid != b.sid
and abs( datediff( b.sid_start_date , a.sid_expire_date ) ) < 60);
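As a sanity check that the EXISTS form returns the same sids as the self join, here is a small sketch with Python's built-in sqlite3 module standing in for MySQL (an assumption made for runnability). SQLite has no DATEDIFF, so the day difference is computed with julianday(); the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obs (sid INTEGER, bt INTEGER, st INTEGER, "
    "sid_start_date TEXT, sid_expire_date TEXT)"
)
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 20, "2020-01-01", "2020-03-01"),
        (1, 10, 20, "2020-02-01", "2020-03-20"),
        (2, 10, 20, "2020-03-15", "2020-06-01"),
        (3, 10, 20, "2020-06-20", "2020-12-01"),
        (4, 11, 20, "2020-03-10", "2020-05-01"),  # different bt, never matches
    ],
)

# SQLite replacement for abs(datediff(b.sid_start_date, a.sid_expire_date)) < 60
close_in_time = (
    "abs(julianday(b.sid_start_date) - julianday(a.sid_expire_date)) < 60"
)

join_sql = (
    "SELECT DISTINCT b.sid FROM obs AS a "
    "JOIN obs AS b ON a.bt = b.bt AND a.st = b.st "
    "WHERE a.sid != b.sid AND " + close_in_time + " ORDER BY b.sid"
)
exists_sql = (
    "SELECT DISTINCT b.sid FROM obs AS b "
    "WHERE EXISTS (SELECT 1 FROM obs AS a "
    "WHERE a.bt = b.bt AND a.st = b.st "
    "AND a.sid != b.sid AND " + close_in_time + ") ORDER BY b.sid"
)

join_sids = [row[0] for row in conn.execute(join_sql)]
exists_sids = [row[0] for row in conn.execute(exists_sql)]
print(join_sids, exists_sids)  # [2, 3] [2, 3]
```

DISTINCT is still needed in the EXISTS form because obs can hold several rows with the same sid.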

How to optimize this DB operation?

I'm quite sloppy with databases, can't get this working with joins, and I'm not even sure that would be faster...
DELETE FROM atable
WHERE btable_id IN (SELECT id
FROM btable
WHERE param > 2)
AND ctable_id IN (SELECT id
FROM ctable
WHERE ( someblob LIKE '%_ID1_%'
OR someblob LIKE '%_ID2_%' ))
Atable contains ~19M rows; this would delete ~3M of them. At the moment, I can only run the query with LIMIT 100000, and I don't want to sit here with phpMyAdmin all day, because each deletion (of 100,000 rows) runs for about 1.5 minutes.
Any ways to speed this up / automate it?
MySQL 5.5
(do you think it's already bad DB design if any table contains 20M rows?)
Use EXISTS or JOIN instead of IN to improve performance.
Using EXISTS:
DELETE FROM Atable
WHERE EXISTS (SELECT 1 FROM Btable B WHERE Atable.Btable_id = B.id AND B.param > 2) AND
EXISTS (SELECT 1 FROM Ctable C WHERE Atable.Ctable_id = C.id AND (C.someblob LIKE '%_ID1_%' OR C.someblob LIKE '%_ID2_%'))
Using JOIN:
DELETE A
FROM Atable A
INNER JOIN Btable B ON A.Btable_id = B.id AND B.param > 2
INNER JOIN Ctable C ON A.Ctable_id = C.id AND (C.someblob LIKE '%_ID1_%' OR C.someblob LIKE '%_ID2_%')
First, you should try EXISTS instead of IN; it is faster in many cases.
Then you could try an INNER JOIN instead of IN or EXISTS.
Example:
delete a
from a
inner join b on b.id = a.tablebid
Finally, if possible (I don't know whether you also have ID3 and further IDs), try replacing the OR with something else. Sometimes a strange and complicated change helps the optimizer: CASE WHEN, a subquery, and so on.
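I don't see where a simple index would help much. I'd do the following (the extra derived table is needed because MySQL does not allow a subquery to select directly from the table being deleted from):
delete from atable where id in (
select id from (
select
a.id
from
atable a
join btable b on a.btable_id = b.id
join ctable c on a.ctable_id = c.id
where
b.param > 2
and (
c.someblob LIKE '%_ID1_%'
OR c.someblob LIKE '%_ID2_%'
)
) as to_delete
)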
Correction: I'm assuming you've got indexes on btable and ctable's id's (probably, if they're primary keys...) and on b.param (if it's numeric).
Besides optimizing the query, you could also make good use of indexes, since they can prevent a full table scan.
For btable, for example, create an index on id and param.
To explain why this helps:
If the database has to look up the id and param values in the unsorted table, it has to read ALL rows. If it reads the sorted index instead, it can look up id and param at a much lower cost.
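On the "automate it" part of the question: the batched delete can be driven by a small loop instead of being re-run by hand. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; with MySQL the same loop would go through a MySQL client library, and the single-table DELETE ... LIMIT from the question could be used directly). The helper name delete_in_batches, the batch size, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE btable (id INTEGER PRIMARY KEY, param INTEGER);
CREATE TABLE ctable (id INTEGER PRIMARY KEY, someblob TEXT);
CREATE TABLE atable (id INTEGER PRIMARY KEY, btable_id INTEGER, ctable_id INTEGER);
INSERT INTO btable VALUES (1, 1), (2, 2), (3, 3), (4, 4);
INSERT INTO ctable VALUES (1, 'x_ID1_y'), (2, 'x_ID2_y'), (3, 'other');
""")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?)",
    [(i, i % 4 + 1, i % 3 + 1) for i in range(1, 101)],
)
conn.commit()

# Rows of atable that satisfy both conditions from the question.
MATCHING_IDS = """
SELECT a.id
FROM atable AS a
JOIN btable AS b ON a.btable_id = b.id
JOIN ctable AS c ON a.ctable_id = c.id
WHERE b.param > 2
  AND (c.someblob LIKE '%_ID1_%' OR c.someblob LIKE '%_ID2_%')
"""


def delete_in_batches(conn, batch_size):
    """Delete matching rows batch by batch; return the size of each batch."""
    batch_sizes = []
    while True:
        cur = conn.execute(
            "DELETE FROM atable WHERE id IN (" + MATCHING_IDS + " LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # keep each transaction short
        if cur.rowcount == 0:
            break
        batch_sizes.append(cur.rowcount)
    return batch_sizes


expected = conn.execute("SELECT COUNT(*) FROM (" + MATCHING_IDS + ")").fetchone()[0]
batches = delete_in_batches(conn, batch_size=10)
remaining = conn.execute("SELECT COUNT(*) FROM atable").fetchone()[0]
print(expected, batches, remaining)  # 33 [10, 10, 10, 3] 67
```

Committing after every batch keeps locks and undo space small, at the cost of the whole operation not being atomic.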

MySQL Update query with left join and group by

I am trying to create an update query and making little progress in getting the right syntax.
The following query is working:
SELECT t.Index1, t.Index2, COUNT( m.EventType )
FROM Table t
LEFT JOIN MEvents m ON
(m.Index1 = t.Index1 AND
m.Index2 = t.Index2 AND
(m.EventType = 'A' OR m.EventType = 'B')
)
WHERE (t.SpecialEventCount IS NULL)
GROUP BY t.Index1, t.Index2
It creates a list of triplets Index1,Index2,EventCounts.
It only does this for case where t.SpecialEventCount is NULL. The update query I am trying to write should set this SpecialEventCount to that count, i.e. COUNT(m.EventType) in the query above. This number could be 0 or any positive number (hence the left join). Index1 and Index2 together are unique in Table t and they are used to identify events in MEvent.
How do I have to modify the select query to become an update query? I.e. something like
UPDATE Table SET SpecialEventCount=COUNT(m.EventType).....
but I am confused what to put where and have failed with numerous different guesses.
I take it that (Index1, Index2) is a unique key on Table, otherwise I would expect the reference to t.SpecialEventCount to result in an error.
Edited query to use subquery as it didn't work using GROUP BY
UPDATE
Table AS t
LEFT JOIN (
SELECT
Index1,
Index2,
COUNT(EventType) AS NumEvents
FROM
MEvents
WHERE
EventType = 'A' OR EventType = 'B'
GROUP BY
Index1,
Index2
) AS m ON
m.Index1 = t.Index1 AND
m.Index2 = t.Index2
SET
t.SpecialEventCount = m.NumEvents
WHERE
t.SpecialEventCount IS NULL
Doing a left join with a subquery can generate a giant in-memory temporary table that has no indexes.
For updates, try avoiding joins and using correlated subqueries instead:
UPDATE
Table AS t
SET
t.SpecialEventCount = (
SELECT COUNT(m.EventType)
FROM MEvents m
WHERE m.EventType in ('A','B')
AND m.Index1 = t.Index1
AND m.Index2 = t.Index2
)
WHERE
t.SpecialEventCount IS NULL
Do some profiling, but this can be significantly faster in some cases.
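The behaviour of the correlated-subquery update can be checked on a few rows. This is a minimal sketch with Python's built-in sqlite3 module standing in for MySQL (an assumption made for runnability); the table is called demo_table here because Table is a reserved word, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo_table (
    index1 INTEGER,
    index2 INTEGER,
    special_event_count INTEGER,
    PRIMARY KEY (index1, index2)
);
CREATE TABLE mevents (index1 INTEGER, index2 INTEGER, event_type TEXT);

INSERT INTO demo_table VALUES (1, 1, NULL), (1, 2, NULL), (2, 1, 7);
INSERT INTO mevents VALUES
    (1, 1, 'A'), (1, 1, 'B'), (1, 1, 'C'), (2, 1, 'A');
""")

# For every row that has no count yet, count its matching 'A'/'B' events.
# COUNT over an empty set yields 0, so rows without events get 0, not NULL.
conn.execute("""
UPDATE demo_table
SET special_event_count = (
    SELECT COUNT(m.event_type)
    FROM mevents AS m
    WHERE m.event_type IN ('A', 'B')
      AND m.index1 = demo_table.index1
      AND m.index2 = demo_table.index2
)
WHERE special_event_count IS NULL
""")
conn.commit()

result = conn.execute(
    "SELECT index1, index2, special_event_count "
    "FROM demo_table ORDER BY index1, index2"
).fetchall()
print(result)  # [(1, 1, 2), (1, 2, 0), (2, 1, 7)]
```

The row (2, 1) keeps its existing value because the outer WHERE only selects rows whose count is still NULL.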
my example
update card_crowd as cardCrowd
LEFT JOIN
(
select cc.id , count(ccr.crowd_id) as num
from card_crowd cc LEFT JOIN
card_crowd_r ccr on cc.id = ccr.crowd_id
group by cc.id
) as tt
on cardCrowd.id = tt.id
set cardCrowd.join_num = tt.num;