I have the following table (it has more data columns; I removed them because it would make for a long post):
CREATE TABLE `members` (
`memberid` int(11) NOT NULL AUTO_INCREMENT,
`firstname` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`lastname` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`memberid`),
KEY `members_lname_ix` (`lastname`)
) ENGINE=InnoDB AUTO_INCREMENT=1019 DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci;
By default, a user only ever accesses 10-20 rows from this table at a time, usually sorted by the lastname column; it's all paginated server-side. So I decided to add an index on lastname to help with sorting. However, the index does not seem to be working like I would expect it to. When I run EXPLAIN SELECT * FROM members ORDER BY lastname ASC I get:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | members | ALL | NULL | NULL | NULL | NULL | 711 | Using filesort
I can at least confirm the index exists because if I run SHOW INDEX FROM members I get:
Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type
members | 0 | PRIMARY | 1 | memberid | A | 711 | NULL | NULL | (blank) | BTREE
members | 1 | members_lname_ix | 1 | lastname | A | 711 | NULL | NULL | YES | BTREE
If I add USE INDEX (members_lname_ix), both possible_keys and key remain NULL. However, if I add FORCE INDEX (members_lname_ix), possible_keys remains NULL and key shows members_lname_ix. This is my first time trying to apply indexing, but this doesn't seem very intuitive to me - it feels like MySQL should know that I created an index for lastname, no? I can't quite figure out what I'm doing wrong here, unless I am misunderstanding something. Is the solution here to just keep using FORCE INDEX?
There are two ways to perform that query:
Plan A (as you were expecting):
Scan through the index sequentially, reading the entire (estimated) 711 rows.
Randomly look up each row in the data BTree. This involves reading the entire dataset.
Deliver the data in order.
Plan B (what it does):
Scan through the data, reading all 711 rows.
Sort the data
Deliver the sorted data.
Plan B does not touch the index at all; avoiding Plan A's random lookups was deemed a bigger saving than avoiding the sort.
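You can see the optimizer switch to Plan A when the index alone can satisfy the query (a covering index), because step 2's random lookups then disappear. A quick sketch:
EXPLAIN SELECT memberid, lastname FROM members ORDER BY lastname ASC;
-- should show key = members_lname_ix and Extra: Using index (no filesort)
An InnoDB secondary index implicitly carries the primary key, so this query can be answered entirely from members_lname_ix.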
In a table as tiny as yours, it would be hard to see a difference in speed. (In my test case, it took under 10 milliseconds either way.) In huge tables, the difference could be significant.
For optimal pagination, see http://mysql.rjweb.org/doc.php/pagination
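The core idea there is to remember where the previous page ended instead of using OFFSET. A minimal sketch for this table, assuming the last row of the previous page was ('Smith', 1005) and using memberid as a tie-breaker since lastname is not unique:
SELECT memberid, firstname, lastname
FROM members
WHERE (lastname, memberid) > ('Smith', 1005)
ORDER BY lastname, memberid
LIMIT 10;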
Related
I have a MySQL (v8.0.30) table that stores trades, the schema is the following:
CREATE TABLE `log_fill` (
`id` bigint NOT NULL AUTO_INCREMENT,
`orderId` varchar(63) NOT NULL,
`clientOrderId` varchar(36) NOT NULL,
`symbol` varchar(31) NOT NULL,
`executionId` varchar(255) DEFAULT NULL,
`executionSide` tinyint NOT NULL COMMENT '0 = long, 1 = short',
`executionSize` decimal(15,2) NOT NULL,
`executionPrice` decimal(21,8) unsigned NOT NULL,
`executionTime` bigint unsigned NOT NULL,
`executionValue` decimal(21,8) NOT NULL,
`executionFee` decimal(13,8) NOT NULL,
`feeAsset` varchar(63) DEFAULT NULL,
`positionSizeBeforeFill` decimal(21,8) DEFAULT NULL,
`apiKey` int NOT NULL,
`side` varchar(20) DEFAULT NULL,
`reconciled` tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `executionId` (`executionId`,`executionSide`),
KEY `apiKey` (`apiKey`)
) ENGINE=InnoDB AUTO_INCREMENT=6522695 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
As you can see, there's a BTREE index on the column apiKey, which identifies a user; this way I can quickly retrieve all trades for a specific user.
My goal is a query that returns positionSizeBeforeFill + executionSize for the last record, given an apiKey and a symbol. So I wrote the following:
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
ORDER BY id DESC
However, the execution is extremely slow (around 100ms). I've noticed that running either WHERE or ORDER BY (but not both together) drastically reduces execution time. For example
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
only takes 220 microseconds to execute. The number of records after filtering by apiKey and symbol is 388.
Similarly,
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
ORDER BY id DESC
takes 26 microseconds (on a 3 million records table).
All in all, running WHERE and ORDER BY separately takes microseconds; when I combine them we scale up to milliseconds (around 1000x more).
Running EXPLAIN on the slow query shows it has to examine 116032 rows.
I tried creating a temporary table, hoping MySQL would perform the sort only on the filtered records, but the outcome is the same. I was wondering whether the problem might be the index (whose cardinality is 203), but how can that be the case when WHERE alone takes very little time? I could not find similar cases in other questions or forums. I think I just fail at understanding how InnoDB selects data; I thought it would first filter by WHERE and then perform ORDER BY on the filtered rows. How can I improve this? Thanks!
Edit: The EXPLAIN statement on the slow query returns
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------+--------+---------+-------+--------+----------+----------------------------------+
| 1 | SIMPLE | log_fill_tmp | NULL | ref | apiKey | apiKey | 4 | const | 116032 | 10.00 | Using where; Backward index scan |
The query with WHERE only
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------+--------+---------+-------+--------+----------+-------------+
| 1 | SIMPLE | log_fill_tmp | NULL | ref | apiKey | apiKey | 4 | const | 116032 | 10.00 | Using where |
The query with ORDER BY only on the full table
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+---------+----------+---------------------+
| 1 | SIMPLE | log_fill_tmp | NULL | index | NULL | PRIMARY | 8 | NULL | 2503238 | 100.00 | Backward index scan |
26 microseconds for any query against a 3-million-row table implies that you have the Query cache enabled. Please rerun your timings with SELECT SQL_NO_CACHE .... (Even milliseconds would be suspicious.) Were all 3M rows returned? Shoveling that many rows probably takes more than a second under any circumstances.
Meanwhile, to speed up the first two queries, add
INDEX(symbol, apiKey, id)
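For instance (the index name is my own, and LIMIT 1 makes "the last record" explicit):
ALTER TABLE log_fill ADD INDEX sym_key_id (symbol, apiKey, id);

SELECT positionSizeBeforeFill + executionSize
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
ORDER BY id DESC
LIMIT 1;
With both equality columns first in the index, the matching rows are adjacent and already ordered by id, so no sort is needed.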
EXPLAIN gives only approximate counts (e.g., 116032). A "cardinality" of 203 is also just an estimate, but it is used by the Optimizer in some situations. Please get the exact count, just to check that there really are any rows:
SELECT COUNT(*)
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
With the ORDER BY id DESC, it will scan the entire B+Tree that holds the data. As it says, it will do a 'Backward index scan'. However, since the "index" is the PRIMARY KEY and the PK is clustered with the data, it is really scanning the data's B+Tree.
For the first query, the Optimizer decided that the indexes were not useful enough for the WHERE; instead it avoided the sort (ORDER BY) by doing a backward full table scan, same as the 3rd query, and ignored any rows that did not match the WHERE.
I added a composite index on (apiKey, symbol) and now the query runs in as little as 0.2ms. To reduce the creation time for the index I reduced the number of records from 3M to 500K, and the gain in time is about 97%; I believe it's going to be more on the full table. I thought that with just the apiKey index it would first filter by user, then by symbol, but I was probably wrong.
I'm new to query optimization, so I accept that I don't understand everything yet, but I do not understand why even this simple query isn't optimized as expected.
My table:
+------------------+-----------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-----------+------+-----+-------------------+----------------+
| tasktransitionid | int(11) | NO | PRI | NULL | auto_increment |
| taskid | int(11) | NO | MUL | NULL | |
| transitiondate | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
+------------------+-----------+------+-----+-------------------+----------------+
My indexes:
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasktransitions | 0 | PRIMARY | 1 | tasktransitionid | A | 952 | NULL | NULL | | BTREE | | |
| tasktransitions | 1 | transitiondate_ix | 1 | transitiondate | A | 952 | NULL | NULL | | BTREE | | |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
My query:
SELECT taskid FROM tasktransitions WHERE transitiondate>'2013-09-31 00:00:00';
gives this:
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | tasktransitions | ALL | transitiondate_ix | NULL | NULL | NULL | 1082 | Using where |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
If I understand everything correctly, Using where and ALL mean that all rows are retrieved from the storage engine and filtered at the server layer. This is sub-optimal. Why does it refuse to use the index and retrieve only the requested range from the storage engine (InnoDB)?
Cheers
MySQL will not use the index if it estimates that the query would select a significantly large portion of the table; it thinks a table-scan is actually more efficient in those cases.
By analogy, this is the reason the index of a book doesn't contain very common words like "the" -- because it would be a waste of time to look up the word in the index and find that the list of page numbers is very long, maybe even every page in the book. It would be more efficient to simply read the book cover to cover.
My experience is that this happens in MySQL when a query's search criteria would match more than about 20% of the table, and that is usually the right crossover point. There could be some variation based on the data types, size of table, etc.
You can give a hint to MySQL to convince it that a table-scan would be prohibitively expensive, so it would be much more likely to use the index. This is not usually necessary, but you can do it like this:
SELECT taskid FROM tasktransitions FORCE INDEX (transitiondate_ix)
WHERE transitiondate>'2013-09-31 00:00:00';
I once was trying to join two tables and MySQL was refusing to use an index, resulting in >500ms queries, sometimes a few seconds. Turns out the column I was joining on had different encodings on each table. Changing both to the same encoding sped up the query to consistently less than 100ms.
Just in case it helps somebody.
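The fix is to make the join columns' character sets (and collations) match. A sketch, with hypothetical table and column names:
ALTER TABLE parent_table MODIFY join_col VARCHAR(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE child_table MODIFY join_col VARCHAR(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
When the encodings differ, MySQL must convert one side's values for every comparison, which prevents index use on the converted side.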
I have a table with a varchar column _id (a long integer encoded as a string). I added an index for this column, but the query was still slow. I was executing this query:
select * from table where (_id = 2221835089) limit 1
I realized that the _id value wasn't being passed as a string (I'm using Laravel as the DB framework). Comparing a varchar column to a numeric literal makes MySQL cast every row's value to a number, which defeats the index. Once the query was executed with the right data type in the WHERE clause, everything worked like a charm:
select * from table where (_id = '2221835089') limit 1
I am new at MySQL 8.0, have finished 2 simple tutorials completely, and there are only two subjects that have not worked for me; one of them is indexing. I read the section labeled "2 Answers" and found that the statement suggested at the end of said section seems to defeat the purpose of the original USE INDEX or FORCE INDEX statement below. The suggested statement is like getting a table sorted via a WHERE statement instead of MySQL using USE INDEX or FORCE INDEX. It works, but it seems to me it is not the same as using the natural USE INDEX or FORCE INDEX. Does anyone know why MySQL is ignoring my simple request to index a 10-row table on the Lname column?
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| ID         | int         | NO   | PRI | NULL    | auto_increment |
| Lname      | varchar(20) | NO   | MUL | NULL    |                |
| Fname      | varchar(20) | NO   | MUL | NULL    |                |
| City       | varchar(15) | NO   |     | NULL    |                |
| Birth_Date | date        | NO   |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
CREATE INDEX idx_Lname ON TestTable (Lname);
SELECT * FROM TestTable USE INDEX (idx_Lname);
SELECT * From Testtable FORCE INDEX (idx_LastFirst);
I am trying to find the closest gene, given position information, from a gene table. Here is an example:
SELECT chrom, txStart, txEnd, name2, strand FROM
wgEncodeGencodeCompV12 WHERE chrom = 'chr1' AND txStart < 713885 AND
strand = '+' ORDER BY txStart DESC LIMIT 1;
My test runs have been pretty slow, which is problematic.
Here is an EXPLAIN output with default indexing (by chrom):
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | wgEncodeGencodeCompV12 | ref | chrom | chrom | 257 | const | 15843 | Using where; Using filesort |
Filesort is used and is probably causing all the sluggishness?
I tried speeding up the sorting by indexing (chrom, txStart, strand), or just txStart alone, but it only got slower (?). My reasoning is that txStart is not selective enough to be a good index, and that a whole-table scan in this case is actually faster?
Here is the EXPLAIN output with the additional indexing:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | wgEncodeGencodeCompV12 | range | chrom,closest_gene_lookup | closest_gene_lookup | 261 | NULL | 57 | Using where |
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | wgEncodeGencodeCompV12 | range | chrom,txStart | txStart | 4 | NULL | 1571 | Using where |
Table structure
CREATE TABLE `wgEncodeGencodeCompV12` (
  `bin` smallint(5) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  `chrom` varchar(255) NOT NULL,
  `strand` char(1) NOT NULL,
  `txStart` int(10) unsigned NOT NULL,
  `txEnd` int(10) unsigned NOT NULL,
  `cdsStart` int(10) unsigned NOT NULL,
  `cdsEnd` int(10) unsigned NOT NULL,
  `exonCount` int(10) unsigned NOT NULL,
  `exonStarts` longblob NOT NULL,
  `exonEnds` longblob NOT NULL,
  `score` int(11) default NULL,
  `name2` varchar(255) NOT NULL,
  `cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
  `cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
  `exonFrames` longblob NOT NULL,
  KEY `chrom` (`chrom`,`bin`),
  KEY `name` (`name`),
  KEY `name2` (`name2`)
)
Is there a way to make this more efficient? I appreciate your time!
(Update) Solution:
Combining both commenters' suggestions significantly improved run time.
In your case (a query on a single table, no joins, no complicated stuff) it is important to understand the distribution of values in each column, and to understand how the database server utilizes the indexes. When you have a field with a rather big range of different values, that one should be used for indexing. (E.g., an index on strand would just split the whole data into + or -, and downstream filters would have to process each row of either the + or the - result set; that's near the worst case.)
So far, we know that txStart has the most differentiated distribution of values among the interesting columns of your query.
So, your query definitely should utilize an index query on that column! But a btree index, not a hash index (operators <, <=, > etc. are fast on btree, but not on hash).
Try again with just a single (btree) index on txStart (I know you already tried that, but please try again and avoid all secondary indexes etc..).
Multi-column indexes are nice, but their complexity makes them not as fast as plain single-column indexes, and MySQL's optimizer is rather stupid in selecting the optimal indexes ;-)
Another important factor could be the dynamic row size (because of using longblob columns). But I am not up-to-date on the current state of MySQL in that regard.
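A sketch of what this answer suggests (the index name is my own; MySQL indexes are BTREE by default, so no USING clause is needed):
CREATE INDEX txstart_ix ON wgEncodeGencodeCompV12 (txStart);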
The index that you want is: wgEncodeGencodeCompV12(chrom, strand, txstart).
In general, you want the fields with equalities as the first columns in the index. Then add one field with the inequality.
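A sketch of creating it (the index name is my own choice):
CREATE INDEX chrom_strand_txstart ON wgEncodeGencodeCompV12 (chrom, strand, txStart);
With the two equality columns leading, the rows matching the WHERE are contiguous in the index and already ordered by txStart, so ORDER BY txStart DESC LIMIT 1 can read a single entry instead of filesorting.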
My MySQL database has over 350 million rows, and is growing. It's 32GB in size right now. I am using SSD's and lots of RAM, but would like to seek advice to make sure I am using appropriate indexes.
CREATE TABLE `qcollector` (
`key` bigint(20) NOT NULL AUTO_INCREMENT,
`instrument` char(4) DEFAULT NULL,
`datetime` datetime DEFAULT NULL,
`last` double DEFAULT NULL,
`lastsize` int(10) DEFAULT NULL,
`totvol` int(10) DEFAULT NULL,
`bid` double DEFAULT NULL,
`ask` double DEFAULT NULL,
PRIMARY KEY (`key`),
KEY `datetime_index` (`datetime`)
) ENGINE=InnoDB;
show index from qcollector;
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| qcollector | 0 | PRIMARY | 1 | key | A | 378866659 | NULL | NULL | | BTREE | | |
| qcollector | 1 | datetime_index | 1 | datetime | A | 63144443 | NULL | NULL | YES | BTREE | | |
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.03 sec)
select * from qcollector order by datetime desc limit 1;
+-----------+------------+---------------------+---------+----------+---------+---------+--------+
| key | instrument | datetime | last | lastsize | totvol | bid | ask |
+-----------+------------+---------------------+---------+----------+---------+---------+--------+
| 389054487 | ES | 2012-06-29 15:14:59 | 1358.25 | 2 | 2484771 | 1358.25 | 1358.5 |
+-----------+------------+---------------------+---------+----------+---------+---------+--------+
1 row in set (0.09 sec)
A typical query that is slow (full table scan, this query takes 3-4 minutes):
explain select date(datetime), count(lastsize) from qcollector where instrument = 'ES' and datetime > '2011-01-01' and time(datetime) between '15:16:00' and '15:29:00' group by date(datetime) order by date(datetime) desc;
+------+-------------+------------+------+----------------+------+---------+------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+------+----------------+------+---------+------+-----------+----------------------------------------------+
| 1 | SIMPLE | qcollector | ALL | datetime_index | NULL | NULL | NULL | 378866659 | Using where; Using temporary; Using filesort |
+------+-------------+------------+------+----------------+------+---------+------+-----------+----------------------------------------------+
A couple ideas for you to consider:
A covering index (that is, an index that includes ALL of the columns referenced in the query) may help some. Such an index is going to require more disk (SSD?) space, but it will remove the necessity for MySQL to visit the data pages to lookup the values of the columns that aren't in the index.
ON qcollector (datetime,instrument,lastsize)
or
ON qcollector (instrument,datetime,lastsize)
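Either could be added with ALTER TABLE; for example (the index name is hypothetical):
ALTER TABLE qcollector ADD INDEX instr_dt_size (instrument, datetime, lastsize);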
Do you really need to exclude rows that have a NULL value for lastsize from the count? Could you return a count of all rows instead? If you could instead return COUNT(1) or SUM(1), then the query wouldn't need to reference the lastsize column, so it wouldn't be needed in an index to make it a covering index.
The COUNT(lastsize) expression is equivalent to SUM(IF(lastsize IS NULL,0,1))
Do you need to return dates when there are only NULL lastsize values for the datetime range, or could all of the rows with a NULL lastsize be excluded? That is, could you include a predicate like
AND lastsize IS NOT NULL
in your query?
Those may help some.
I think the big problem is that the predicates on the TIME(datetime) expression are not sargable. That is, MySQL won't use an index range scan operation for those. The predicate on the bare datetime column is sargable... that's why the EXPLAIN is showing the datetime_index as a possible key.
And the other big problem is that the query is doing GROUP BY and ORDER BY operations on a derived expression, which is going to require MySQL to generate an intermediate result set (as a temporary MyISAM table), and then process that result set. And that can be a lot of heavy lifting when there are lots of rows to process.
As far as table changes, I would consider using separate DATE and TIME columns, and using a TIMESTAMP datatype in place of DATETIME (if you need to store the date and time together). I would rewrite the query to reference the bare DATE and bare TIME columns, and consider adding a covering index that included all columns referenced in the rewritten query, with leading columns being the columns with the highest cardinality (and having the most selective predicates in the query.)
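A sketch of the rewritten query under that change, assuming new tradedate DATE and tradetime TIME columns (hypothetical names) plus a covering index such as (instrument, tradedate, tradetime, lastsize):
SELECT tradedate, COUNT(lastsize)
FROM qcollector
WHERE instrument = 'ES'
  AND tradedate > '2011-01-01'
  AND tradetime BETWEEN '15:16:00' AND '15:29:00'
GROUP BY tradedate
ORDER BY tradedate DESC;
Both predicates now reference bare columns, and the GROUP BY / ORDER BY operate on a stored column instead of a derived expression.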
When you use date and time functions on a column the indexes cannot be used efficiently. You could also store the date and time in separate columns and index those, though this will take up more storage space.
You may also want to consider adding multi-column indexes. An index on (instrument, datetime) would probably help you here.
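For example (the index name is hypothetical):
ALTER TABLE qcollector ADD INDEX instr_dt (instrument, datetime);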
I have a database with the following stats
Tables    Data        Index    Total
11        579,6 MB    0,9 GB   1,5 GB
So as you can see the Index is close to 2x bigger. And there is one table with ~7 million rows that takes up at least 99% of this.
I also have two indexes that are very similar
a) UNIQUE KEY `idx_customer_invoice` (`customer_id`,`invoice_no`),
b) KEY `idx_customer_invoice_order` (`customer_id`,`invoice_no`,`order_no`)
Update: Here is the table definition (at least structurally) of the largest table
CREATE TABLE `invoices` (
`id` int(10) unsigned NOT NULL auto_increment,
`customer_id` int(10) unsigned NOT NULL,
`order_no` varchar(10) default NULL,
`invoice_no` varchar(20) default NULL,
`customer_no` varchar(20) default NULL,
`name` varchar(45) NOT NULL default '',
`archived` tinyint(4) default NULL,
`invoiced` tinyint(4) default NULL,
`time` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
`group` int(11) default NULL,
`customer_group` int(11) default NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_customer_invoice` (`customer_id`,`invoice_no`),
KEY `idx_time` (`time`),
KEY `idx_order` (`order_no`),
KEY `idx_customer_invoice_order` (`customer_id`,`invoice_no`,`order_no`)
) ENGINE=InnoDB AUTO_INCREMENT=9146048 DEFAULT CHARSET=latin1;
Update 2:
mysql> show indexes from invoices;
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| invoices | 0 | PRIMARY | 1 | id | A | 7578066 | NULL | NULL | | BTREE | |
| invoices | 0 | idx_customer_invoice | 1 | customer_id | A | 17 | NULL | NULL | | BTREE | |
| invoices | 0 | idx_customer_invoice | 2 | invoice_no | A | 7578066 | NULL | NULL | YES | BTREE | |
| invoices | 1 | idx_time | 1 | time | A | 541290 | NULL | NULL | | BTREE | |
| invoices | 1 | idx_order | 1 | order_no | A | 6091 | NULL | NULL | YES | BTREE | |
| invoices | 1 | idx_customer_invoice_order | 1 | customer_id | A | 17 | NULL | NULL | | BTREE | |
| invoices | 1 | idx_customer_invoice_order | 2 | invoice_no | A | 7578066 | NULL | NULL | YES | BTREE | |
| invoices | 1 | idx_customer_invoice_order | 3 | order_no | A | 7578066 | NULL | NULL | YES | BTREE | |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
My questions are:
Is there a way to find unused indexes in MySQL?
Are there any common mistakes that impact the size of the index?
Can indexA safely be removed?
How can you measure the size of each index? All I get is the total of all indexes.
You can remove index A, because, as you have noted, it is a subset of another index. And it's possible to do this without disrupting normal processing.
The size of the index files is not alarming in itself and it can easily be true that the net benefit is positive. In other words, the usefulness and value of an index shouldn't be discounted because it results in a large file.
Index design is a complex and subtle art involving a deep understanding of the query optimizer explanations and extensive testing. But one common mistake is to include too few fields in an index in order to make it smaller. Another is to test indexes with insufficient, or insufficiently representative data.
I may be wrong, but the first index (idx_customer_invoice) is UNIQUE, the second (idx_customer_invoice_order) is not, so you'll probably lose the uniqueness constraint when you remove it. No?
Is there a way to find unused indexes in MySQL?
The database engine optimizer will select a proper index when attempting to optimize your query. Depending on when you last collected statistics on your indexes, the chosen index will vary. Unused indexes could suddenly become used because of a new data distribution.
Can indexA safely be removed?
I would say yes, if indexA and indexB are B-Tree indexes. This is because an index that starts with the same columns in the same order will have the same structure.
Use
show indexes from table;
to see what indexes you have on a particular table. The Cardinality column tells you how useful your index is.
You can remove your indexes safely (it will not break the table), but beware: some queries might execute slower. First you should analyze your queries to decide whether you need a certain index or not.
I don't think you can find out the data length of a particular index, though.
BUT, you probably think that if the index length is more than twice the data length, something is abnormal... Well, you are wrong. All of your indexes might be useful ;) If you have a table that holds a lot of information and you have to search it on a large number of columns, it can easily be the case that this table's indexes are 2 times bigger in size than the table's data.
indexA can be removed because indexB includes indexA.
What impacts your index length is your column type and column length.
Use:
select index_length from information_schema.tables
where table_name='your_table_name' and
table_schema='your_db_name';
to get your table's total index_length.
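On MySQL 5.6 and newer, the persistent statistics tables can give a per-index size estimate; a sketch (it assumes read access to the mysql schema, and your_db_name is a placeholder):
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 1) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = 'your_db_name'
  AND table_name = 'invoices'
  AND stat_name = 'size';
Here stat_value is the number of pages in each index, so multiplying by the page size approximates its on-disk size.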