Why is COUNT() query from large table much faster than SUM() - mysql

I have a data warehouse with the following tables:
main
about 8 million records
CREATE TABLE `main` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`cid` mediumint(8) unsigned DEFAULT NULL, //This is the customer id
`iid` mediumint(8) unsigned DEFAULT NULL, //This is the item id
`pid` tinyint(3) unsigned DEFAULT NULL, //This is the period id
`qty` double DEFAULT NULL,
`sales` double DEFAULT NULL,
`gm` double DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_pci` (`pid`,`cid`,`iid`) USING HASH,
KEY `idx_pic` (`pid`,`iid`,`cid`) USING HASH
) ENGINE=InnoDB AUTO_INCREMENT=7978349 DEFAULT CHARSET=latin1
period
This table has about 50 records and has the following fields
id
month
year
customer
This has about 23,000 records and the following fileds
id
number //This field is unique
name //This is simply a description field
The following query runs very fast (less than 1 second) and returns about 2,000:
select count(*)
from mydb.main m
INNER JOIN mydb.period p ON p.id = m.pid
INNER JOIN mydb.customer c ON c.id = m.cid
WHERE p.year = 2013 AND c.number = 'ABC';
But this query is much slower (mmore than 45 seconds), which is the same as the previous but sums instead of counts:
select sum(sales)
from mydb.main m
INNER JOIN mydb.period p ON p.id = m.pid
INNER JOIN mydb.customer c ON c.id = m.cid
WHERE p.year = 2013 AND c.number = 'ABC';
When I explain each query, the ONLY difference I see is that on the 'count()'
query the 'Extra' field says 'Using index', while for the 'sum()' query this field is NULL.
Explain count() query
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | c | const | PRIMARY,idx_customer | idx_customer | 11 | const | 1 | Using index |
| 1 | SIMPLE | p | ref | PRIMARY,idx_period | idx_period | 4 | const | 6 | Using index |
| 1 | SIMPLE | m | ref | idx_pci,idx_pic | idx_pci | 6 | mydb.p.id,const | 7 | Using index |
Explain sum() query
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | c | const | PRIMARY,idx_customer | idx_customer | 11 | const | 1 | Using index |
| 1 | SIMPLE | p | ref | PRIMARY,idx_period | idx_period | 4 | const | 6 | Using index |
| 1 | SIMPLE | m | ref | idx_pci,idx_pic | idx_pci | 6 | mydb.p.id,const | 7 | NULL |
Why is the count() so much faster than sum()? Shouldn't it be using the index for both?
What can I do to make the sum() go faster?
Thanks in advance!
EDIT
All the tables show that it is using Engine InnoDB
Also, as a side note, if I just do a 'SELECT *' query, this runs very quickly (less than 2 seconds). I would expect that the 'SUM()' shouldn't take any longer than that since SELECT * has to retrieve the rows anyways...
SOLVED
This is what I've learned:
Since the sales field is not a part of the index, it has to retrieve the records from the hard drive (which can be kind've slow).
I'm not too familiar with this, but it looks like I/O performance can be increased by switching to a SSD (Solid-state drive). I'll have to research this more.
For now, I think I'm going to create another layer of summary in order to get the performance I'm looking for.
I redefined my index on the main table to be (pid,cid,iid,sales,gm,qty) and now the sum() queries are running VERY fast!
Thanks everybody!

The index is the list of key rows.
When you do the count() query the actual data from the database can be ignored and just the index used.
When you do the sum(sales) query, then each row has to be read from disk to get the sales figure, hence much slower.
Additionally, the indexes can be read in bulk and then processed in memory, while the disk access will be randomly trashing the drive trying to read rows from across the disk.
Finally, the index itself may have summaries of the counts (to help with the plan generation)
Update
You actually have three indexes on your table:
PRIMARY KEY (`id`),
KEY `idx_pci` (`pid`,`cid`,`iid`) USING HASH,
KEY `idx_pic` (`pid`,`iid`,`cid`) USING HASH
So you only have indexes on the columns id, pid, cid, iid. (As an aside, most databases are smart enough to combine indexes, so you could probably optimize your indexes somewhat)
If you added another key like KEY idx_sales(id,sales) that could improve performance, but given the likely distribution of sales values numerically, you would be adding extra performance cost for updates which is likely a bad thing

The simple answer is that count() is only counting rows. This can be satisfied by the index.
The sum() needs to identify each row and then fetch the page in order to get the sales column. This adds a lot of overhead -- about one page load per row.
If you add sales into the index, then it should also go very fast, because it will not have to fetch the original data.

Related

MySQL - Query becomes slow when combining WHERE and ORDER BY

I have a MySQL (v8.0.30) table that stores trades, the schema is the following:
CREATE TABLE `log_fill` (
`id` bigint NOT NULL AUTO_INCREMENT,
`orderId` varchar(63) NOT NULL,
`clientOrderId` varchar(36) NOT NULL,
`symbol` varchar(31) NOT NULL,
`executionId` varchar(255) DEFAULT NULL,
`executionSide` tinyint NOT NULL COMMENT '0 = long, 1 = short',
`executionSize` decimal(15,2) NOT NULL,
`executionPrice` decimal(21,8) unsigned NOT NULL,
`executionTime` bigint unsigned NOT NULL,
`executionValue` decimal(21,8) NOT NULL,
`executionFee` decimal(13,8) NOT NULL,
`feeAsset` varchar(63) DEFAULT NULL,
`positionSizeBeforeFill` decimal(21,8) DEFAULT NULL,
`apiKey` int NOT NULL,
`side` varchar(20) DEFAULT NULL,
`reconciled` tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `executionId` (`executionId`,`executionSide`),
KEY `apiKey` (`apiKey`)
) ENGINE=InnoDB AUTO_INCREMENT=6522695 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
As you can see, there's a BTREE index on the column apiKey which stands for a user, this way I can quickly retrieve all trades for a specific user.
My goal is a query that returns positionSizeBeforeFill + executionSize for the last record, given an apiKey and a symbol. So I wrote the following:
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
ORDER BY id DESC
However the execution is extremely slow (around 100ms). I've noticed that running either WHERE or ORDER BY (and not both together) drastically reduces execution time. For example
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
only takes 220 microseconds to execute. The number of records after filtering by apiKey and symbol is 388.
Similarly,
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
ORDER BY id DESC
takes 26 microseconds (on a 3 million records table).
All in all, separately running WHERE and ORDER BY takes microseconds of execution, when I combine them we scale up to milliseconds (around 1000x more).
Running EXPLAIN on the slow query it turns out it has to examine 116032 rows.
I tried to create a temporary table hoping for MySQL to perform sorting only on the filtered records, but the outcome is the same. Was wondering whether the problem might be the index (whose cardinality is 203), but how can it be the case when WHERE alone takes very little time? I could not find similar cases on other questions or forums. I think I just fail at understanding how InnoDB selects data, I thought it would first filter by WHERE and then perform ORDER BY on the filtered rows. How can I improve this? Thanks!
Edit: The EXPLAIN statement on the slow query returns
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------+--------+---------+-------+--------+----------+----------------------------------+
| 1 | SIMPLE | log_fill_tmp | NULL | ref | apiKey | apiKey | 4 | const | 116032 | 10.00 | Using where; Backward index scan |
The query with WHERE only
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------+--------+---------+-------+--------+----------+-------------+
| 1 | SIMPLE | log_fill_tmp | NULL | ref | apiKey | apiKey | 4 | const | 116032 | 10.00 | Using where |
The query with ORDER BY only on the full table
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+---------+----------+---------------------+
| 1 | SIMPLE | log_fill_tmp | NULL | index | NULL | PRIMARY | 8 | NULL | 2503238 | 100.00 | Backward index scan |
26 microseconds for any query against a 3-million-row table implies that you have the Query cache enabled. Please rerunning your timings with SELECT SQL_NO_CACHE .... (Even milliseconds would be suspicious.) Were all 3M rows returned? Shoveling that many rows probably takes more than a second under any circumstances.
Meanwhile, to speed up the first two queries, add
INDEX(symbol, apikey, id)
EXPLAIN gives only approximate (eg, 116032) counts. A "cardinality" of 203 is also just an estimate, but it is used by the Optimizer in some situations. Please get the exact count just to check that there really are any rows:
SELECT COUNT(*)
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
With the ORDER BY id DESC, it will scan the entire B+Tree what holds the data. As it says, it will do a 'Backward index scan'. However, since the "index" is the PRIMARY KEY and the PK is clustered with the data, it is really referring to the data's BTree.
The EXPLAIN for the first query decided that the indexes were not useful enough for WHERE; instead it avoided the sort (ORDER BY) by doing the Backward full table scan, same as the 3rd query. (And ignored any rows that did not match the WHERE.
I added a composite index on (apiKey, symbol) and now the query runs in as little as 0.2ms. In order to reduce the creation time for the index I reduce the number of records from 3M to 500K and the gain in time is about 97%, I believe it's going to be more on the full table. I thought using just the apiKey index it would first filter out by user, then by symbol, but I'm probably wrong.

Need index for simple query please

Can anyone suggest a good index to make this query run quicker?
SELECT
s.*,
sl.session_id AS session_id,
sl.lesson_id AS lesson_id
FROM
cdu_sessions s
INNER JOIN cdu_sessions_lessons sl ON sl.session_id = s.id
WHERE
(s.sort = '1') AND
(s.enabled = '1') AND
(s.teacher_id IN ('193', '1', '168', '1797', '7622', '19951'))
Explain:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | s | NULL | ALL | PRIMARY | NULL | NULL | NULL | 2993 | 0.50 | Using where
1 | SIMPLE | sl | NULL | ref | session_id,ix2 | ix2 | 4 | ealteach_main.s.id | 5 | 100.00 | Using index
cdu_sessions looks like this:
------------------------------------------------
id | int(11)
name | varchar(255)
map_location | enum('classroom', 'school'...)
sort | tinyint(1)
sort_titles | tinyint(1)
friend_gender | enum('boy', 'girl'...)
friend_name | varchar(255)
friend_description | varchar(2048)
friend_description_format | varchar(128)
friend_description_audio | varchar(255)
friend_description_audio_fid | int(11)
enabled | tinyint(1)
created | int(11)
teacher_id | int(11)
custom | int(1)
------------------------------------------------
cdu_sessions_lessons contains 3 fields - id, session_id and lesson_id
Thanks!
Without looking at the query plan, row count and distribution on each table, is hard to predict a good index to make it run faster.
But, I would say that this might help:
> create index sessions_teacher_idx on cdu_sessions(teacher_id);
looking at where condition you could use a composite index for table cdu_sessions
create index idx1 on cdu_sessions(teacher_id, sort, enabled);
and looking to join and select for table cdu_sessions_lessons
create index idx2 on cdu_sessions_lessons(session_id, lesson_id);
First, write the query so no type conversions are necessary. All the comparisons in the where clause are to numbers, so use numeric constants:
SELECT s.*,
sl.session_id, -- unnecessary because s.id is in the result set
sl.lesson_id
FROM cdu_sessions s INNER JOIN
cdu_sessions_lessons sl
ON sl.session_id = s.id
WHERE s.sort = 1 AND
s.enabled = 1 AND
s.teacher_id IN (193, 1, 168, 1797, 7622, 19951);
Although it might not be happening in this specific case, mixing types can impede the use of indexes.
I removed the column as aliases (as session_id for instance). These were redundant because the column name is the alias and the query wasn't changing the name.
For this query, first look at the WHERE clause. All the column references are from one table. These should go in the index, with the equality comparisons first:
create index idx_cdu_sessions_4 on cdu_sessions(sort, enabled, teacher_id, id)
I added id because it is also used in the JOIN.
Formally, id is not needed in the index if it is the primary key. However, I like to be explicit if I want it there.
Next you want an index for the second table. Only two columns are referenced from there, so they can both go in the index. The first column should be the one used in the join:
create index idx_cdu_sessions_lessons_2 on cdu_sessions_lessons(session_id, lesson_id);

Is there a way to hint mysql to use Using index for group-by

I was busying myself with exploring GROUP BY optimizations. On a classical "max salary per departament" query. And suddenly weird results. The dump below goes straight from my console. NO COMMAND were issued between these two EXPLAINS. Only some time had passed.
mysql> explain select name, t1.dep_id, salary
from emploee t1
JOIN ( select dep_id, max(salary) msal
from emploee
group by dep_id
) t2
ON t1.salary=t2.msal and t1.dep_id = t2.dep_id
order by salary desc;
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 1 | PRIMARY | t1 | ref | dep_id | dep_id | 8 | t2.dep_id,t2.msal | 1 | |
| 2 | DERIVED | emploee | index | NULL | dep_id | 8 | NULL | 84 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
3 rows in set (0.00 sec)
mysql> explain select name, t1.dep_id, salary
from emploee t1
JOIN ( select dep_id, max(salary) msal
from emploee
group by dep_id
) t2
ON t1.salary=t2.msal and t1.dep_id = t2.dep_id
order by salary desc;
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 1 | PRIMARY | t1 | ref | dep_id | dep_id | 8 | t2.dep_id,t2.msal | 3 | |
| 2 | DERIVED | emploee | range | NULL | dep_id | 4 | NULL | 9 | Using index for group-by |
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
3 rows in set (0.00 sec)
As you may notice, it examined ten times less rows in second run. I assume it's because some inner counters got changed. But I don't want to depend on these counters. So - is there a way to hint mysql to use "Using index for group by" behavior only?
Or - if my speculations are wrong - is there any other explanation on the behavior and how to fix it?
CREATE TABLE `emploee` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`dep_id` int(11) NOT NULL,
`salary` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `dep_id` (`dep_id`,`salary`)
) ENGINE=InnoDB AUTO_INCREMENT=85 DEFAULT CHARSET=latin1 |
+-----------+
| version() |
+-----------+
| 5.5.19 |
+-----------+
Hm, showing the cardinality of indexes may help, but keep in mind: range's are usually slower then indexes there.
Because it think it can match the full index in the first one, it uses the full one. In the second one, it drops the index and goes for a range, but guesses the total number of rows satisfying that larger range wildly lower then the smaller full index, because all cardinality has changed. Compare it to this: why would "AA" match 84 rows, but "A[any character]" match only 9 (note that it uses 8 bytes of the key in the first, 4 bytes in the second)? The second one will in reality not read less rows, EXPLAIN just guesses the number of rows differently after an update on it's metadata of indexes. Not also that EXPLAIN does not tell you what a query will do, but what it probably will do.
Updating the cardinality can or will occur when:
The cardinality (the number of different key values) in every index of a table is calculated when a table is opened, at SHOW TABLE STATUS and ANALYZE TABLE and on other circumstances (like when the table has changed too much). Note that all tables are opened, and the statistics are re-estimated, when the mysql client starts if the auto-rehash setting is set on (the default).
So, assume 'at any point' due to 'changed too much', and yes, connecting with the mysql client can alter the behavior in choosing indexes of a server. Also: reconnecting of the mysql client after it lost its connection after a timeout counts as connecting with auto-rehash AFAIK. If you want to give mysql help to find the proper method, run ANALYZE TABLE once in a while, especially after heavy updating. If you think the cardinality it guesses is often wrong, you can alter the number of pages it reads to guess some statistics, but keep in mind a higher number means a longer running update of that cardinality, and something you don't want to happen that often when 'data has changed to much' on a table with a lot of operations.
TL;DR: it guesses rows differently, but you'd actually prefer the first behavior if the data makes that possible.
Adding:
On this previously linked page, we can probably also find why especially dep_id might have this problem:
small values like 1 or 2 can result in very inaccurate estimates of cardinality
I could imagine the number of different dep_id's is typically quite small, and I've indeed observed a 'bouncing' cardinality on non-unique indexes with quite a small range compared to the number of rows in my own databases. It easily guesses a range of 1-10 in the hundreds and then down again the next time, just based on the specific sample pages it picks & some algorithm that tries to extrapolate that.

Speed up self join in MYSQL

I have a table of connections between different objects and I'm basically trying a graph traversal using self joins.
My table is defined as:
CREATE TABLE `connections` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`position` int(11) NOT NULL,
`dId` bigint(20) NOT NULL,
`sourceId` bigint(20) NOT NULL,
`targetId` bigint(20) NOT NULL,
`type` bigint(20) NOT NULL,
`weight` float NOT NULL DEFAULT '1',
`refId` bigint(20) NOT NULL,
`ts` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
KEY `sourcetype` (`type`,`sourceId`,`targetId`),
KEY `targettype` (`type`,`targetId`,`sourceId`),
KEY `complete` (`dId`,`sourceId`,`targetId`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The table contains about 3M entries (~1K of type 1, 1M of type 2, and 2M of type 3).
The queries over 2 or 3 hops are actually quite fast (of course it takes a while to receive all the results), but getting the count of a query for 3 hops is terribly slow (> 30s).
Here's the query (which returns 2M):
SELECT
count(*)
FROM
`connections` AS `t0`
JOIN
`connections` AS `t1` ON `t1`.`targetid`=`t0`.`sourceid`
JOIN
`connections` AS `t2` ON `t2`.`targetid`=`t1`.`sourceid`
WHERE
`t2`.dId = 1
AND
`t2`.`sourceid` = 1
AND
`t2`.`type` = 1
AND
`t1`.`type` = 2
AND
`t0`.`type` = 3;
Here's the corresponding EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 ref targettype,complete,sourcetype complete 16 const,const 100 Using where; Using index
1 SIMPLE t1 ref targettype,sourcetype targettype 8 const 2964 Using where; Using index
1 SIMPLE t0 ref targettype,sourcetype sourcetype 16 const,travtest.t1.targetId 2964 Using index
Edit: Here is the EXPLAIN after adding and index to type:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 ref type,complete,sourcetype,targettype complete 16 const,const 100 Using where; Using index
1 SIMPLE t1 ref type,sourcetype,targettype sourcetype 16 const,travtest.t2.targetId 2 Using index
1 SIMPLE t0 ref type,sourcetype,targettype sourcetype 16 const,travtest.t1.targetId 2 Using index
Is there a way to improve this?
2nd edit:
EXPLAN EXTENDED:
+----+-------------+-------+------+-------------------------------------+------------+---------+----------------------------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+-------------------------------------+------------+---------+----------------------------+------+----------+--------------------------+
| 1 | SIMPLE | t2 | ref | type,complete,sourcetype,targettype | complete | 16 | const,const | 100 | 100.00 | Using where; Using index |
| 1 | SIMPLE | t1 | ref | type,sourcetype,targettype | sourcetype | 16 | const,travtest.t2.targetId | 1 | 100.00 | Using index |
| 1 | SIMPLE | t0 | ref | type,sourcetype,targettype | sourcetype | 16 | const,travtest.t1.targetId | 1 | 100.00 | Using index |
+----+-------------+-------+------+-------------------------------------+------------+---------+----------------------------+------+----------+--------------------------+
SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select count(0) AS `count(*)` from `travtest`.`connections` `t0` |
| | | join `travtest`.`connections` `t1` join `travtest`.`connections` `t2` |
| | | where ((`travtest`.`t0`.`sourceId` = `travtest`.`t1`.`targetId`) and |
| | | (`travtest`.`t1`.`sourceId` = `travtest`.`t2`.`targetId`) and (`travtest`.`t0`.`type` = 3) |
| | | and (`travtest`.`t1`.`type` = 2) and (`travtest`.`t2`.`type` = 1) and |
| | | (`travtest`.`t2`.`sourceId` = 1) and (`travtest`.`t2`.`dId` = 1)) |
+-------+------+--------------------------------------------------------------------------------------------+
Create an index for sourceid,targetidand type columns then try with this query:
SELECT
count(*)
FROM
`connections` AS `t0`
JOIN
`connections` AS `t1` ON `t1`.`targetid`=`t0`.`sourceid` and `t1`.`type` = 2
JOIN
`connections` AS `t2` ON `t2`.`targetid`=`t1`.`sourceid` and `t2`.dId = 1 AND `t2`.`sourceid` = 1 AND `t2`.`type` = 1
WHERE
`t0`.`type` = 3;
-------UPDATE-----
I think that those indices are right and with those big tables you reached the best optimization you can have. I don't think you can improve this query with other optimization like table partitioning/sharding.
You could implement some kind of caching if those data does not change often or the only way I see is to scale vertically
Probably, you are handling graph data, am i right?
Your 3 hop query has a little chance to optimize. It is bushy tree. A lot of connections made. I think JOIN order and INDEX is right.
EXPLAIN tells me t2 produce about 100 targetId. If you get rid of t2 from join and add t1.sourceId IN (100 targetId). This will take same time with 3 times self join.
But what about break down 100 target to 10 sub IN list. If this reduce response time, with multi thread run 10 queries at once.
MySQL has no parallel feafure. So you do your self.
And did you tried graph databases like jena, sesame? I am not sure graph datdabase is faster than MYSQL.
This is not the answer. just FYI.
If MySQL or the other DBs is slow for you, You can implement your own graph database. then this paper Literature Survey of Graph Databases[1] is a nice work on Graph Database. surveyed several Graph Database, let us know many technique.
SCALABLE SEMANTIC WEB DATA MANAGEMENT USING VERTICAL PARTITIONING [2] introduce vertical partitioning but, your 3M edges is not big, vertical partitioning can't help you. [2] introduces another concept Materialized Path Expressions. I think this can help you.
[1] http://www.systap.com/pubs/graph_databases.pdf
[2] http://db.csail.mit.edu/projects/cstore/abadirdf.pdf
You comment that your query returns 2M (assuming you mean 2 million) is that the final count, or just that its going through the 2 million. Your query appears to be specifically looking for a single T2.ID, Source AND Type joined TO the other tables, but starting with connection 0.
I would drop your existing indexes and have the following so the engine does not try to use the others and cause confusion in how it joins. Also, by having the target ID in both (as you already had) means these will be covering indexes and the engine does not have to go to the actual raw data on the page to confirm any other criteria or pull values from.
The only index is based on your ultimate criteria as in the T2 by the source, type and ID. Since the targetID (part of the index) is the source of the next in the daisy-chain, you are using the same index for source and type going up the chain. No confusion on any indexes
INDEX ON (sourceId, type, dId, targetid )
I would try by reversing to what hopefully would be the smallest set and working OUT... Something like
SELECT
COUNT(*)
FROM
`connections` t2
JOIN `connections` t1
ON t2.targetID = t1.sourceid
AND t1.`type` = 2
JOIN `connections` t0
ON t1.targetid = t0.sourceid
AND t0.`type` = 3
where
t2.sourceid = 1
AND t2.type = 1
AND t2.dID = 1

MySQL join issue, query hangs

I have a table holding numeric data points with timestamps, like so:
CREATE TABLE `value_table1` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
KEY `datetime` (`datetime`)
) ENGINE=InnoDB;
My table holds a data point for every 5 seconds, so timestamps in the table will be, e.g.:
"2013-01-01 10:23:35"
"2013-01-01 10:23:40"
"2013-01-01 10:23:45"
"2013-01-01 10:23:50"
I have a few such value tables, and it is sometimes necessary to look at the ratio between two value series.
I therefore attempted a join, but it seems to not work:
SELECT value_table1.datetime, value_table1.value / value_table2.rate
FROM value_table1
JOIN value_table2
ON value_table1.datetime = value_table2.datetime
ORDER BY value_table1.datetime ASC;
Running EXPLAIN on the query shows:
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | SIMPLE | value_table1 | ALL | NULL | NULL | NULL | NULL | 83784 | Using temporary; Using filesort |
| 1 | SIMPLE | value_table2 | ALL | NULL | NULL | NULL | NULL | 83735 | |
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
Edit
Problem solved, no idea where my index disappeared to. EXPLAIN showed it, thanks!
Thanks!
As your explain shows, the query is not using indexes on the join. Without indexes, it has to scan every row in both tables to process the join.
First of all, make sure the columns used in the join are both indexed.
If they are, then it might be the column type that is causing issues. You could create an integer representation of the time, and then use that to join the two tables.