Mysql join optimize where clause

There are two tables in MySQL 5.7, each with 100,000 records.
Each one contains data like this:
id name
-----------
1 name_1
2 name_2
3 name_3
4 name_4
5 name_5
...
The DDL is:
CREATE TABLE `table_a` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `table_b` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Now I execute the following two queries to see whether the latter performs better.
select SQL_NO_CACHE *
from table_a a
inner join table_b b on a.name = b.name
where a.id between 50000 and 50100;
select SQL_NO_CACHE *
from (
select *
from table_a
where id between 50000 and 50100
) a
inner join table_b b on a.name = b.name;
I expected the former query to iterate up to 100,000 * 100,000 times and then filter the result with the WHERE clause, while the latter would first filter table_a down to about 100 intermediate rows and then iterate up to 100 * 100,000 times to get the final result. So the latter should be much faster than the former.
But the result is that both queries take 1.5 seconds, and EXPLAIN shows no substantial difference between them.
Does MySQL optimize the former query so that it executes like the latter?

For INNER JOIN, ON and WHERE are optimized the same. For LEFT/RIGHT JOIN, the semantics are different, so the optimization is different. (Meanwhile, please use ON for stating the relationship and WHERE for filtering -- it helps humans in understanding the query.)
Both queries can start by fetching about 100 rows from a because of a.id BETWEEN 50000 AND 50100, then reach into the other table 100 times. But each of those lookups has to do a full table scan of b because there is no useful index on name. So 100 x 100,000 operations. ("Nested Loop Join" or "NLJ")
The solution to the slowness is to add
INDEX(name)
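Add it at least to b. Or, if this is really a lookup table for mapping "names" to "ids", then UNIQUE(name). With either index, each of the roughly 100 rows from a becomes an index lookup into b instead of a 100,000-row scan, so the work should drop to about 100 lookups.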
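As a rough sketch of that suggestion, assuming the table_a / table_b definitions from the question (the index name idx_name is made up for illustration):

```sql
-- Add a secondary index on the join column of the inner table.
-- idx_name is a hypothetical name; any unused index name works.
ALTER TABLE table_b ADD INDEX idx_name (name);

-- Re-check the plan. With the index in place, the row for b would be
-- expected to show type=ref and key=idx_name instead of type=ALL.
EXPLAIN
SELECT *
FROM table_a a
INNER JOIN table_b b ON a.name = b.name
WHERE a.id BETWEEN 50000 AND 50100;
```

On setups where the InnoDB index key limit is 767 bytes, a full utf8mb4 varchar(255) column may be too long to index; a prefix index such as (name(191)) is a common workaround in that case.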
Another technique for analyzing queries is
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
It counts the actual number of rows (data or index) touched. Values of 100,000 (or multiples of it) indicate one or more full table/index scans in your case.
More: Index Cookbook

Joins are often faster than sub-queries, so it is usually worth trying a join in place of a sub-query. In this case, though, both queries are equivalent.
Another way to optimize the query would be to use partitions. With partitioning, MySQL can go directly to the partitions that match the conditions in your query, which reduces the time spent on unrelated records.

Related

mysql index selection on large table

I have a couple of tables that looks like this:
CREATE TABLE Entities (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
client_id INT NOT NULL,
display_name VARCHAR(45),
PRIMARY KEY (id)
)
CREATE TABLE Statuses (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE EventTypes (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE Events (
id INT NOT NULL AUTO_INCREMENT,
entity_id INT NOT NULL,
date DATE NOT NULL,
event_type_id INT NOT NULL,
status_id INT NOT NULL
)
Events is large > 100,000,000 rows
Entities, Statuses and EventTypes are small < 300 rows a piece
I have several indexes on Events, but the ones that come into play are
idx_events_date_ent_status_type (date, entity_id, status_id, event_type_id)
idx_events_date_ent_status_type (entity_id, status_id, event_type_id)
idx_events_date_ent_type (date, entity_id, event_type_id)
I have a large complicated query, but I'm getting the same slow query results with a simpler one like the one below (note, in the real queries, I don't use evt.*)
SELECT evt.*, ent.name AS ent_name, s.name AS stat_name, et.name AS type_name
FROM `Events` evt
JOIN `Entities` ent ON evt.entity_id = ent.id
JOIN `EventTypes` et ON evt.event_type_id = et.id
JOIN `Statuses` s ON evt.status_id = s.id
WHERE
evt.date BETWEEN #start_date AND #end_date AND
evt.entity_id IN ( 19 ) AND -- this in clause is built by code
evt.event_type_id = #type_id
For some reason, mysql keeps choosing the index which doesn't cover Events.date and the query takes 15 seconds or more and returns a couple thousand rows. If I change the query to:
SELECT evt.*, ent.name AS ent_name, s.name AS stat_name, et.name AS type_name
FROM `Events` evt force index (idx_events_date_ent_status_type)
JOIN `Entities` ent ON evt.entity_id = ent.id
JOIN `EventTypes` et ON evt.event_type_id = et.id
JOIN `Statuses` s ON evt.status_id = s.id
WHERE
evt.date BETWEEN #start_date AND #end_date AND
evt.entity_id IN ( 19 ) AND -- this in clause is built by code
evt.event_type_id = #type_id
The query takes .014 seconds.
Since this query is built by code, I would much rather not force the index, but mostly, I want to know why it chooses one index over the other. Is it because of the joins?
To give some stats, there are ~2500 distinct dates, and ~200 entities in the Events table. So I suppose that might be why it chooses the index with all of the low cardinality columns.
Do you think it would help to add date to the end of idx_events_date_ent_status_type? Since this is a large table, it takes a long time to add indexes.
I tried adding an additional index,
ix_events_ent_date_status_et(entity_id, date, status_id, event_type_id)
and it actually made the queries slower.
I will experiment a bit more, but I'm not sure how the optimizer makes its decisions.
Additional Info:
I tried removing the join to the Statuses table, and mysql switches to ix_events_date_ent_type, and the query runs in 0.045 sec
I can't wrap my head around why removing a join to a table that is not part of the filter impacts the choice of index.

I would add this index:
ALTER TABLE Events ADD INDEX (event_type_id, entity_id, date);
The order of columns is important. Put all column(s) used in equality conditions first. This is event_type_id in this case.
The optimizer can use multiple columns to optimize equalities, if the columns are left-most and consecutive.
Then the optimizer can use one more column to optimize a range condition. A range condition is anything other than = or IS NULL. So range conditions include >, !=, BETWEEN, IN(), LIKE (with no leading wildcard), IS NOT NULL, and so on.
The condition on entity_id is also an equality condition if the IN() list has one element. MySQL's optimizer can treat a list of one value as an equality condition. But if the list has more than one value, it becomes a range condition. So if the example you showed of IN (19) is typical, then all three columns of the index will be used for filtering.
It's still worth putting date in the index, because it can at least tell the InnoDB storage engine to filter rows before returning them (see https://dev.mysql.com/doc/refman/8.0/en/index-condition-pushdown-optimization.html). It's not quite as good as a real index lookup, but it's worthwhile.
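One way to check how many columns of such an index are really used is the key_len column of EXPLAIN. The sketch below assumes the (event_type_id, entity_id, date) index suggested above already exists; the literal values are placeholders standing in for the #type_id, #start_date and #end_date parameters of the question.

```sql
-- Placeholder literals: 3 for #type_id, and an arbitrary date range.
EXPLAIN
SELECT evt.id
FROM Events evt
WHERE evt.event_type_id = 3
  AND evt.entity_id IN (19)
  AND evt.date BETWEEN '2022-12-01' AND '2022-12-31';
```

With two INT NOT NULL columns (4 bytes each) and a DATE NOT NULL column (3 bytes), a key_len of 11 would suggest that all three columns are used for the lookup, while 8 would suggest that only the first two are.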
I would also suggest creating a smaller table to test with. Doing experiments on a 100 million row table is time-consuming. But you do need a table with a non-trivial amount of data, because if you test on an empty table, the optimizer behaves differently.
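A smaller test copy could be built roughly like this. Events_test is a hypothetical name and the date range is an arbitrary example; pick a slice that is large enough to be representative.

```sql
-- Copy the column and index definitions of Events.
CREATE TABLE Events_test LIKE Events;

-- Load a slice of the real data (the range here is only an example).
INSERT INTO Events_test
SELECT *
FROM Events
WHERE date >= '2022-01-01' AND date < '2022-04-01';

-- Refresh index statistics so the optimizer sees realistic cardinalities.
ANALYZE TABLE Events_test;
```

Because index statistics differ between the copy and the full table, the optimizer may not always make the same choice on both, so it is worth confirming any finding on the real table.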

Rearrange your indexes to have columns in this order:
Any column(s) that will be tested with = or IS NULL.
Column(s) tested with IN -- If there is a single value, this will be further optimized to = for you.
One "range" column, such as your date.
Note that nothing after a "range" test will be used by WHERE.
(There are exceptions, but most are not relevant here.)
More discussion: Index Cookbook
Since the tables smell like Data Warehousing, I suggest looking into Summary Tables. In some cases, long queries on Events can be moved to the summary table(s), where they run much faster. Also, this may eliminate the need for some (or maybe even all) secondary indexes.
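As an illustration of the summary-table idea, here is a minimal sketch of a daily roll-up. The table name EventDailySummary and the column event_count are assumptions made for this example, and the approach only helps queries that aggregate rather than ones that return individual event rows.

```sql
-- Hypothetical summary table: one row per day, entity, type and status.
CREATE TABLE EventDailySummary (
  date          DATE NOT NULL,
  entity_id     INT NOT NULL,
  event_type_id INT NOT NULL,
  status_id     INT NOT NULL,
  event_count   INT UNSIGNED NOT NULL,
  PRIMARY KEY (entity_id, event_type_id, date, status_id)
);

-- Initial load; a periodic job would add rows for new days.
INSERT INTO EventDailySummary
  (date, entity_id, event_type_id, status_id, event_count)
SELECT date, entity_id, event_type_id, status_id, COUNT(*)
FROM Events
GROUP BY date, entity_id, event_type_id, status_id;

-- Example report: events per day for one entity and type (placeholder values).
SELECT date, SUM(event_count) AS events
FROM EventDailySummary
WHERE entity_id = 19
  AND event_type_id = 3
  AND date BETWEEN '2022-12-01' AND '2022-12-31'
GROUP BY date;
```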
Since Events is rather large, I suggest using smaller numbers where practical. INT takes 4 bytes. Speed will improve slightly if you shrink those where appropriate.
When you have INDEX(a,b,c), that index will handle cases that need INDEX(a,b) and INDEX(a). Keep the longer one. (Sometimes the Optimizer picks the shorter index 'erroneously'.)

To most effectively use a composite index on multiple values of two different fields, you need to specify the values with joins instead of simple where conditions. So assuming you are selecting dates from 2022-12-01 to 2022-12-03 and entity_id in (1,2,3), do:
select ...
from (select date('2022-12-01') date union all select date('2022-12-02') union all select date('2022-12-03')) dates
join Entities on Entities.id in (1,2,3)
join Events on Events.entity_id=Entities.id and Events.date=dates.date
If you pre-create a dates table with all dates from 0000-01-01 to 9999-12-31, then you can do:
select ...
from dates
join Entities on Entities.id in (1,2,3)
join Events on Events.entity_id=Entities.id and Events.date=dates.date
where dates.date between #start_date and #end_date

MySql - Self Join - Full Table Scan (Cannot Scan Index)

I have the following self-join query:
SELECT A.id
FROM mytbl AS A
LEFT JOIN mytbl AS B
ON (A.lft BETWEEN B.lft AND B.rgt)
The query is quite slow, and after looking at the execution plan the cause appears to be a full table scan in the JOIN. The table has only 500 rows, and suspecting this to be the issue I increased it to 100,000 rows in order to see if it made a difference to the optimizer's selection.
It did not, with 100k rows it was still doing a full table scan.
My next step was to try and force indexes with the following query, but the same situation arises, a full table scan:
SELECT A.id
FROM categories_nested_set AS A
LEFT JOIN categories_nested_set AS B
FORCE INDEX (idx_lft, idx_rgt)
ON (A.lft BETWEEN B.lft AND B.rgt)
All columns (id, lft, rgt) are integers, all are indexed.
Why is MySql doing a full table scan here?
How can I change my query to use indexes instead of a full table scan?
CREATE TABLE mytbl ( lft int(11) NOT NULL DEFAULT '0',
rgt int(11) DEFAULT NULL,
id int(11) DEFAULT NULL,
category varchar(128) DEFAULT NULL,
PRIMARY KEY (lft),
UNIQUE KEY id (id),
UNIQUE KEY rgt (rgt),
KEY idx_lft (lft),
KEY idx_rgt (rgt) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
Thanks

You have lots of indexes, and some of them are redundant. Let's begin by clearing up some of them. Too many indexes slow down inserts and updates.
PRIMARY KEY (lft),
KEY idx_lft (lft),
Since you already have a primary key defined on lft, there is no need whatsoever for another index on lft. Similarly, with a unique index on rgt there is no need for the second index listed below.
UNIQUE KEY rgt (rgt),
KEY idx_rgt (rgt)
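Dropping the two duplicates described above could look like this, using the index names from the posted table definition:

```sql
ALTER TABLE mytbl
  DROP INDEX idx_lft,   -- duplicates PRIMARY KEY (lft)
  DROP INDEX idx_rgt;   -- duplicates UNIQUE KEY rgt (rgt)
```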
Now let's look at your query.
SELECT A.id
FROM mytbl AS A
LEFT JOIN mytbl AS B
ON (A.lft BETWEEN B.lft AND B.rgt)
This is very unlikely to be a query that will be encountered in the wild. With 500 rows, this query may produce even 5,000 rows. Do you really need the entire key created in one go? The query is slow because MySQL can only optimize range comparisons against constants. It's more likely that your actual query will look something like this:
SELECT B.*
FROM mytbl AS A
LEFT JOIN mytbl AS B
ON (A.lft BETWEEN B.lft AND B.rgt)
WHERE A.id = N;
Where you create the node for a particular id. This will use indexes and will be really fast. What's the point of optimizing for a query that you will not use much if at all?

The following SO question is critical to the solution, as there is very little information on the combination of adjacency lists and indices:
MySQL & nested set: slow JOIN (not using index)
It appears that adding a basic comparison condition triggers the use of an index, like so:
SELECT A.id
FROM mytbl AS A
LEFT JOIN mytbl AS B ON (A.lft BETWEEN B.lft AND B.rgt)
-- THE FOLLOWING DUMMY CONDITIONS TRIGGER INDEX
WHERE A.lft > 0
AND B.lft > 0
AND B.rgt > 0
And no more table scans.
EDIT: Comparison of EXPLAIN function between fixed and unfixed version of the query:

how to cache a subset for cascading select queries in mysql

Here's another database problem I stumbled upon.
I have a date-range partitioned MyISAM lookup table with 200M records and ~150 columns.
On this table I need to perform cascading SELECT statements to filter the data.
Output:
filter 126M
filter 110M
filter 40M
filter 5M
filter 100k
Every single SELECT is highly complex with regex (=no index possible) and multiple comparisons, which is why I want them to query the least amount of rows possible.
There are about 500 unique filters and around 200 constant users. Every filter needs to be run for each user, in total around 100k combinations.
Big question:
Is there a way for each subsequent SELECT statement to query only the previous subset?
Example:
Filter #5 should only have to query the 5M rows out of query 4 to get those 100k results. At the moment it has to scan through all 200M records.
EDIT
current approach: cache table
CREATE TABLE IF NOT EXISTS cache
( filter_id int(11) NOT NULL,
user_id int(11) NOT NULL,
lookup_id int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE cache ADD PRIMARY KEY (filter_id,user_id);
This would contain the relation between individual data-rows from the lookup table and the filters. PLUS I'd be able to use the primary index to get all of the lookup_ids from the previous filter.
Query for subsequent filters:
SELECT SUM( column), COUNT(*)
FROM cache c
LEFT JOIN lookup_table l
ON c.lookup_id= l.id
WHERE c.filter_id = 1
AND c. user_id= x
AND l.regex_column = preg_rlike...

Maybe you could save the primary keys of the selected records to some kind of temporary table, and then join that temporary table with your main table in the next step.
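A minimal sketch of that idea follows. The table names step1 and step2, the column col_a and both filter conditions are made up for illustration; lookup_table, id and regex_column come from the question.

```sql
-- Step 1: keep only the ids that pass the first filter.
CREATE TEMPORARY TABLE step1 (id INT NOT NULL PRIMARY KEY);

INSERT INTO step1
SELECT l.id
FROM lookup_table l
WHERE l.col_a > 10;                      -- hypothetical filter 1

-- Step 2: evaluate the next filter only against the survivors of step 1.
CREATE TEMPORARY TABLE step2 (id INT NOT NULL PRIMARY KEY);

INSERT INTO step2
SELECT l.id
FROM step1 s
JOIN lookup_table l ON l.id = s.id
WHERE l.regex_column REGEXP '^abc';      -- hypothetical filter 2
```

Each later step repeats the pattern against the previous step's table, so the expensive conditions only run on rows that are still in play. Note that MySQL does not allow the same TEMPORARY table to be referenced twice in one query, which is one reason to use a separate table per step.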

Slow SQL query when grouping by two columns with self join

I have a table rating with slightly less than 300k rows and a SQL query:
SELECT rt1.product_id as id1, rt2.product_id as id2, sum(1), sum(rt1.rate-rt2.rate) as sum
FROM rating as rt1
JOIN rating as rt2 ON rt1.user_id = rt2.user_id AND rt1.product_id != rt2.product_id
group by rt1.product_id, rt2.product_id
LIMIT 1
The problem is that it's really slow. It takes 36 seconds to execute with LIMIT 1, and I need to execute it without the limit.
As far as I can tell, the slowdown is caused by the GROUP BY part. It works fine when grouping by one column, no matter whether it comes from rt1 or rt2.
I have also tried indexes: I have already created indexes on user_id, product_id, rate and (user_id, product_id).
EXPLAIN doesn't tell me much either.
id  select_type  table  type  possible_keys                 key      key_len  ref                  rows    Extra
1   SIMPLE       rt1    ALL   PRIMARY,user_id,user_product  NULL     NULL     NULL                 289700  Using temporary; Using filesort
1   SIMPLE       rt2    ref   PRIMARY,user_id,user_product  user_id  4        mgrshop.rt1.user_id  30      Using where
I need this to execute just once to generate some data, so it's not important to achieve optimal time, but reasonable.
Any ideas?
Edit.
Full table schema
CREATE TABLE IF NOT EXISTS `rating` (
`user_id` int(11) NOT NULL,
`product_id` int(11) NOT NULL,
`rate` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`product_id`),
KEY `user_id` (`user_id`),
KEY `product_id` (`product_id`),
KEY `user_product` (`user_id`,`product_id`),
KEY `rate` (`rate`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Your problem is in the join, specifically AND rt1.product_id != rt2.product_id.
Let's say a user has rated 100 products. For that user, this query will generate 9,900 rows before it does the GROUP BY, because each of the 100 ratings gets joined to the other 99.
What is the question you are trying to answer with this query? Depending on that, there may be more efficient approaches. It's just hard to tell what you are trying to achieve here.

In addition to what Declan_K mentioned about your cross-join result set that could be 100k rows before you know it, you could cut that down significantly by changing to just
rt1.product_id < rt2.product_id
instead of
rt1.product_id != rt2.product_id
Reason: since both sides of the join are the same table and records, each pair only needs to be visited once. With rt1.product_id required to be lower than rt2.product_id, the higher product is already covered as the other half of the comparison. As it stands, if a single user had 5 products (1-5), you would get these results:
(1,2) (1,3) (1,4) (1,5)
(2,1) (2,3) (2,4) (2,5)
(3,1) (3,2) (3,4) (3,5)
(4,1) (4,2) (4,3) (4,5)
(5,1) (5,2) (5,3) (5,4)
By changing to LESS than, you'll eliminate the duplications such as 1,2 vs 2,1 1,3 vs 3,1
(1,2) (1,3) (1,4) (1,5)
(2,3) (2,4) (2,5)
(3,4) (3,5)
(4,5)
Just a bit of a smaller result set, and this is with only 5 products for one person.
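Applied to the query from the question, the change could look like the sketch below. The column aliases pair_count and rate_diff_sum are made up for this example.

```sql
SELECT rt1.product_id            AS id1,
       rt2.product_id            AS id2,
       COUNT(*)                  AS pair_count,
       SUM(rt1.rate - rt2.rate)  AS rate_diff_sum
FROM rating AS rt1
JOIN rating AS rt2
  ON rt1.user_id = rt2.user_id
 AND rt1.product_id < rt2.product_id   -- each unordered pair only once
GROUP BY rt1.product_id, rt2.product_id;
```

No information is lost by halving the pairs: the reversed pair (id2, id1) would have the same count and the same sum with the opposite sign, so it can be derived afterwards if it is needed.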

My solution is not the easiest, but it should explain a little and speed up your query time.
When you join in MySQL, a temporary table is created. The more rows that are put into that temporary table, the more likely it is to go to disk. Disk is slow. The new temporary table has no indices. Querying without indices is slow.
The first line in your EXPLAIN statement is showing that the query will join first, creating a whole bunch of rows, and sticking that into a temporary table, and grouping by product ids. The key column is empty, showing that it can't use a key.
My solution is to create another table. This other table will consist of all the relevant columns from the JOIN. You'll need a batch job to update the table in the background. This will lead to slightly stale data, but it will run much faster.
CREATE TABLE `rate_tmp` (
userid ...,
id1 ...,
id2 ...,
rate1 ...,
rate2 ...,
PRIMARY KEY (id1, id2, userid)
)
The order on the primary key is very important. Your query then looks like this:
SELECT id1, id2, sum(1), sum(rate1-rate2) as sum
from rate_tmp
group by id1, id2;
It should run very fast at that point, because, while the table is still persisted to disk, MySQL will not have to write the data to disk at query time. It can also, and more importantly, use the pre-defined indices that you have on the temporary table.
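The answer does not show how rate_tmp gets filled. One possible way, as a sketch only, is to reuse the self-join from the question and write its rows into the columns listed in the table definition above:

```sql
-- Batch load of the pre-joined rows; column types are whatever
-- rate_tmp was created with (they are elided in the answer).
INSERT INTO rate_tmp (userid, id1, id2, rate1, rate2)
SELECT rt1.user_id,
       rt1.product_id,
       rt2.product_id,
       rt1.rate,
       rt2.rate
FROM rating AS rt1
JOIN rating AS rt2
  ON rt1.user_id = rt2.user_id
 AND rt1.product_id != rt2.product_id;
```

This load still pays the cost of the join once, so the benefit only shows up if the grouped query is run repeatedly against the pre-built table.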

First I did it via temp table.
First I selected the rows without grouping and put them into a table made just for that. I got over 11 million rows. Then I grouped them from the temp table and put the result into the final table.
Then I also tried to do this without creating any other table and it also worked for me.
SELECT id1, id2, sum(count), sum(sum)
FROM (SELECT rt1.product_id as id1, rt2.product_id as id2, 1 as count, rt1.rate - rt2.rate as sum
FROM rating as rt1
JOIN rating as rt2 ON rt1.user_id = rt2.user_id AND rt1.product_id != rt2.product_id) as temptab
GROUP BY id1, id2
And finally got about 19k rows.
Execution time: 35.8669
Not bad for my case of one-time data generating.

Proper Indexing/Optimization of a MySQL GROUP BY and JOIN Query

I've done a lot of reading and Googling on this and I cannot find any satisfactory answer so I'd appreciate any help. Most answers I find come close to my situation but do not address it (and attempting to follow the solutions has not done me any good).
See Edit #2 below for the best example
[This was the original question but is not a great representation of what I'm asking.]
Say I have 2 tables, each with 4 columns:
key (int, auto increment)
c1 (a date)
c2 (a varchar of length 3)
c3 (also a varchar of length 3)
And I want to perform the following query:
SELECT t.c1, t.c2, COUNT(*)
FROM test1 t
LEFT JOIN test2 t2 ON t2.key = t.key
GROUP BY t.c1, t.c2
Both key fields are indexed as primary keys. I want to get the number of rows returned in each grouping of c1, c2.
When I explain this query I get "using temporary; using filesort". The actual table I'm performing this query on is over 500,000 rows, so that means it's a time consuming query.
So my question is (assuming I'm not doing anything wrong in the query): is there a way to index this table to eliminate the temporary/filesort usage?
Thanks in advance for any help.
Edit
Here is the table definition (in this example both tables are identical - in reality they're not but I'm not sure it makes a difference at this point):
CREATE TABLE `test1` (
`key` int(11) NOT NULL auto_increment,
`c1` date NOT NULL,
`c2` varchar(3) NOT NULL,
`c3` varchar(3) NOT NULL,
PRIMARY KEY (`key`),
UNIQUE KEY `c1` (`c1`,`c2`),
UNIQUE KEY `c2_2` (`c2`,`c1`),
KEY `c2` (`c2`,`c3`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8
Full EXPLAIN statement:
id  select_type  table  type    possible_keys  key      key_len  ref             rows  Extra
1   SIMPLE       t      ALL     NULL           NULL     NULL     NULL            2     Using temporary; Using filesort
1   SIMPLE       t2     eq_ref  PRIMARY        PRIMARY  4        tracking.t.key  1     Using index
This is just for my sample tables. In my real tables the rows for t says 500,000+ (every row in the table, though that could be related to something else).
Edit #2
Here is a more concrete example to better explain my situation.
Let's say I have data on Little League baseball games. I have two tables. One holds data on the games:
CREATE TABLE `ex_games` (
`game_id` int(11) NOT NULL auto_increment,
`home_team` int(11) NOT NULL,
`date` date NOT NULL,
PRIMARY KEY (`game_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
The other holds data on the at bats in each game:
CREATE TABLE `ex_atbats` (
`ab_id` int(11) NOT NULL auto_increment,
`game` int(11) NOT NULL,
`team` int(11) NOT NULL,
`player` int(11) NOT NULL,
`result` tinyint(1) NOT NULL,
PRIMARY KEY (`ab_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
So I have two questions. Let's start with the simple version: I want to return a list of games with a count of how many at bats are in each game. So I think I would do something like this:
SELECT date, home_team, COUNT(h.ab_id) FROM `ex_atbats` h
LEFT JOIN ex_games g ON g.game_id = h.game
GROUP BY g.game_id
This query uses filesort/temporary. Is there a better way to structure this or to index the tables to get rid of that?
Then, the trickier part: say I now want to not only include a count of the number of at bats, but also include a count of the number of at bats that were preceded by an at bat with the same result by the same team. I assume that would be something like:
SELECT g.date, g.home_team, COUNT(ab.ab_id), COUNT(ab2.ab_id) FROM `ex_atbats` ab
LEFT JOIN ex_games g ON g.game_id = ab.game
LEFT JOIN ex_atbats ab2 ON ab2.ab_id = ab.ab_id - 1 AND ab2.result = ab.result
GROUP BY g.game_id
Is that the correct way to structure that query? This also uses filesort/temporary.
So what is the optimal way to go about accomplishing these tasks?
Thanks again.

The phrases Using temporary and Using filesort are usually not related to the indexes used in the JOIN operation. There are numerous examples where you can have all indexes set (they show up in the key and key_len columns in EXPLAIN) but you still get Using temporary and Using filesort.
Check out what the manual says about Using temporary and Using filesort:
How MySQL Uses Internal Temporary Tables
ORDER BY Optimization
Having a combined index for all columns used in GROUP BY clause may help to get rid of Using filesort in certain circumstances. If you also issue ORDER BY you may need to add more complex indexes.
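For the baseball example in Edit #2, one thing that may be worth trying is to index the join column of ex_atbats and drive the query from ex_games, so that the GROUP BY column is the primary key of the first table. This is a sketch under assumptions, not a guaranteed fix: idx_game is a made-up index name, and whether Using temporary and Using filesort actually disappear depends on the plan the optimizer picks.

```sql
-- Hypothetical index on the column used to join at bats to games.
ALTER TABLE ex_atbats ADD INDEX idx_game (game);

-- Start from ex_games so grouping follows its primary key.
-- Unlike the original query, games without any at bats are listed with 0.
EXPLAIN
SELECT g.date, g.home_team, COUNT(h.ab_id) AS at_bats
FROM ex_games g
LEFT JOIN ex_atbats h ON h.game = g.game_id
GROUP BY g.game_id;
```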
If you have a huge dataset consider partitioning it using some criteria like date or timestamp by means of actual partitioning or a simple WHERE clause.

First of all, the tables' definitions do matter. It's one thing to join using two primary keys, another to join using a primary key from one side and a non-unique key in the other, etc. It also matters what type of engine the tables use as InnoDB treats Primary Keys differently than MyISAM engine.
What I notice though is that on table test1, the (c1,c2) combination is Unique and the fields are not nullable. This allows your query to be rewritten as:
SELECT t.c1, t.c2, COUNT(*)
FROM test1 t
LEFT JOIN test2 t2 ON t2.key = t.key
GROUP BY t.key
It will give the same results while using the same field for the JOIN and the GROUP BY. Note that MySQL allows you to use in the SELECT list fields that are not in the GROUP BY list, without having aggregate functions on them. This is not allowed in most other systems and is seen as a bug by some. In this situation though it is a very nice feature. Every row can be either identified by (key) or (c1,c2), so it shouldn't matter which of the two is used for the grouping.
Another thing to note is that when you use LEFT JOIN, it's common to use the joining column from the right side for the counting: COUNT(t2.key) and not COUNT(*). Your original query will give 1 in that column for records in test1 that do not match any record in test2, because it counts rows, while you probably want to count the related records in test2 and show 0 in those cases.
So, try this query and post the EXPLAIN:
SELECT t.c1, t.c2, COUNT(t2.key)
FROM test1 t
LEFT JOIN test2 t2 ON t2.key = t.key
GROUP BY t.key

The indexes help with the join, but you still need to do a full sort in order to do the group by. Essentially, it still has to process every record in the set.
Adding a where clause and limiting the set would run faster, of course. It just won't get you the results you want.
There may be other options than doing a group by on the entire table. I notice you're doing a SELECT * - What are you trying to get out of the query?
SELECT DISTINCT c1, c2
FROM test1 t
LEFT JOIN test2 t2 ON t2.key = t.key
may run faster, for instance. (I realize this was just a sample query, but understand that it's hard to optimize when you don't know what the end goal is!)
EDIT - In doing some reading (http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html), I learned that, under the correct circumstances, indexes can help significantly with the group by.
What I'm seeing is that it needs to be a sorted index (like BTREE), not a HASH. Perhaps:
CREATE INDEX c1c2 USING BTREE ON test1 (c1, c2);
might help.

For InnoDB it will work, as every secondary index carries your primary key by default. For MyISAM, the last column of your index has to be "key". That gives the optimizer all keys in the same order, so it can skip the sort. You cannot do any range queries on the index prefix then; that puts you right back into filesort. I'm currently struggling with a similar problem.
