I have a MySQL database with the following structure:
mysql> describe company;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int         | NO   | PRI | NULL    | auto_increment |
| name  | varchar(50) | NO   |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
mysql> describe nameserver;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int          | NO   | PRI | NULL    | auto_increment |
| companyId | int          | NO   | MUL | NULL    |                |
| ns        | varchar(250) | NO   | MUL | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
mysql> describe domain;
+--------------+--------------+------+-----+-------------------+-------------------+
| Field        | Type         | Null | Key | Default           | Extra             |
+--------------+--------------+------+-----+-------------------+-------------------+
| id           | int          | NO   | PRI | NULL              | auto_increment    |
| nameserverId | int          | NO   | MUL | NULL              |                   |
| domain       | varchar(250) | NO   | MUL | NULL              |                   |
| tld          | varchar(20)  | NO   | MUL | NULL              |                   |
| createDate   | datetime     | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
| updatedAt    | datetime     | YES  |     | NULL              |                   |
| status       | tinyint      | NO   |     | NULL              |                   |
| fileNo       | smallint     | NO   | MUL | NULL              |                   |
+--------------+--------------+------+-----+-------------------+-------------------+
The index structure:
-- Indexes for table `company`
--
ALTER TABLE `company`
ADD PRIMARY KEY (`id`);
--
-- Indexes for table `domain`
--
ALTER TABLE `domain`
ADD PRIMARY KEY (`id`),
ADD KEY `nameserver` (`nameserverId`),
ADD KEY `domain` (`domain`),
ADD KEY `tld` (`tld`),
ADD KEY `fileNo` (`fileNo`);
--
-- Indexes for table `nameserver`
--
ALTER TABLE `nameserver`
ADD PRIMARY KEY (`id`),
ADD KEY `company` (`companyId`),
ADD KEY `ns` (`ns`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `company`
--
ALTER TABLE `company`
MODIFY `id` int NOT NULL AUTO_INCREMENT;
--
-- AUTO_INCREMENT for table `domain`
--
ALTER TABLE `domain`
MODIFY `id` int NOT NULL AUTO_INCREMENT;
--
-- AUTO_INCREMENT for table `nameserver`
--
ALTER TABLE `nameserver`
MODIFY `id` int NOT NULL AUTO_INCREMENT;
--
-- Constraints for dumped tables
--
--
-- Constraints for table `domain`
--
ALTER TABLE `domain`
ADD CONSTRAINT `nameserver` FOREIGN KEY (`nameserverId`) REFERENCES `nameserver` (`id`);
--
-- Constraints for table `nameserver`
--
ALTER TABLE `nameserver`
ADD CONSTRAINT `company` FOREIGN KEY (`companyId`) REFERENCES `company` (`id`);
The amount of data is as follows:
domain table: about 500 million records
nameserver table: about 2 million records
Running this query takes about 4 hours to return a result:
SELECT DISTINCT domain FROM domain
INNER JOIN nameserver ON nameserver.id = domain.nameserverId
WHERE nameserver.companyId = 2;
The EXPLAIN result for the above query:
+----+-------------+------------+------------+------+-------------------+------------+---------+-----------------------+------+----------+------------------------------+
| id | select_type | table      | partitions | type | possible_keys     | key        | key_len | ref                   | rows | filtered | Extra                        |
+----+-------------+------------+------------+------+-------------------+------------+---------+-----------------------+------+----------+------------------------------+
|  1 | SIMPLE      | nameserver | NULL       | ref  | PRIMARY,company   | company    | 4       | const                 | 1738 |   100.00 | Using index; Using temporary |
|  1 | SIMPLE      | domain     | NULL       | ref  | nameserver,domain | nameserver | 4       | tldzone.nameserver.id |  716 |   100.00 | NULL                         |
+----+-------------+------------+------------+------+-------------------+------------+---------+-----------------------+------+----------+------------------------------+
My question is: how can I improve the speed of this query?
I can change the DB structure or even replace it with another DBMS.
MySQL is running on a VPS with 8.0 GB RAM and a dual-core CPU.
nameserver: INDEX(companyId, id) -- in this order (you have this)
domain: INDEX(nameserverId, domain) -- in this order
("MUL" does not tell me whether you already have either of these composite indexes. SHOW CREATE TABLE is more descriptive than DESCRIBE.)
1. Add indexes to the relevant columns: indexes on the companyId, nameserverId, and domain columns in the nameserver and domain tables help the database locate the relevant rows quickly.
2. Use a covering index: a covering index includes all the columns used in the query. A covering index over companyId, nameserverId, and domain lets the database answer the query from the index alone, avoiding lookups in the base tables.
3. Use a column-store index: a column-store index stores data by column rather than by row, which can be much more efficient for scanning large datasets like this one.
4. Use a database management system that is optimized for large datasets: options include column-oriented systems such as Vertica or ClickHouse, or distributed systems such as Cassandra or HBase (see the ClickHouse sketch after this list).
5. Consider using a distributed database: if the dataset is very large and queries are still slow, spreading the data across multiple servers can improve scalability and performance.
6. Keep in mind that the solutions that work best for you will depend on your database's specific requirements and workload. Benchmarking and testing will show which approaches fit your needs.
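For illustration, a hedged ClickHouse sketch of point 4 (the flat table, its name, and the denormalization are all hypothetical; it folds companyId onto each domain row so the analytic query becomes a single scan):
CREATE TABLE domain_flat
(
    companyId UInt32,
    ns        String,
    domain    String
)
ENGINE = MergeTree
ORDER BY (companyId, domain);

SELECT DISTINCT domain FROM domain_flat WHERE companyId = 2;
Because the table is sorted by (companyId, domain), the WHERE clause prunes to one key range and only the domain column is read.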
Related
I'm faced with a MySQL database containing an event table with ~70 million rows, which has foreign keys to other tables and is used to generate reports. Constructing a performant query that selects (while counting/summing values) and groups data per day from this table is proving challenging.
The database structure is as follows:
CREATE TABLE `client` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_client_id_name` (`id`,`name`)
) ENGINE=InnoDB AUTO_INCREMENT=66 DEFAULT CHARSET=utf8mb3
CREATE TABLE `class` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`client_id` int DEFAULT NULL,
`duration` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_client_id_idx` (`client_id`),
CONSTRAINT `fk_client_id` FOREIGN KEY (`client_id`) REFERENCES `client` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2606 DEFAULT CHARSET=utf8mb3
CREATE TABLE `event` (
`id` int NOT NULL AUTO_INCREMENT,
`start_time` datetime DEFAULT NULL,
`class_id` int DEFAULT NULL,
`venue_id` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_class_id_idx` (`class_id`),
KEY `fk_venue_id_idx` (`venue_id`),
KEY `idx_1` (`venue_id`,`class_id`,`start_time`),
CONSTRAINT `fk_class_id` FOREIGN KEY (`class_id`) REFERENCES `class` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
CONSTRAINT `fk_venue_id` FOREIGN KEY (`venue_id`) REFERENCES `venue` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=64093231 DEFAULT CHARSET=utf8mb3
CREATE TABLE `venue` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_venue_id_name` (`id`,`name`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8mb3
The following query, which runs fine on an event table with a few thousand rows, demonstrates the desired outcome:
SELECT
CAST(event.start_time as date) as day,
class.name,
client.name,
venue.name,
COUNT(class.name) AS occurrences,
SUM(class.duration) AS duration
FROM
class,
client,
event,
venue
WHERE
event.venue_id = venue.id
AND event.class_id = class.id
AND class.client_id = client.id
GROUP BY day, class.name, client.name, venue.name
The database isn't indexed, and although I've tried adding indexes with things like alter table event add index idx_test (venue_id, class_id, start_time); to improve performance, it's still incredibly slow (I tend to abort runs once they're past the 10-minute mark, so I don't know for sure how long they'd take to complete).
I figured this was a good use case for a summary table (as suggested by Rick James' guide), so that I could hold a separate set of summarized data broken down by day, with occurrences and total duration calculated/incremented on each addition to the table (IODKU). However, I'm then also up against creating rows per day in the summary table based on what the database considers a day (UTC), which may not match the application's "day" due to the timezone offset.
Short of converting the start_time column to a timestamp type (which would then be inconsistent with all the other date types in the database), is there any way around this, or any other optimization I could make to the original event table for a more responsive query? TIA
Update 23/05
Here's the buffer pool size:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name           | Value     |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
I've also made a bit of progress with indexing, modifying the query and creating a summary table.
I tried various orderings of columns to test indexes and found idx_event_venueid_classid_starttime (below) to be the most efficient for the event table:
SHOW INDEXES FROM EVENT;
+-------+------------+-------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name                            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+-------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| event |          0 | PRIMARY                             |            1 | id          | A         |    62142912 | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
| event |          1 | fk_class_id_idx                     |            1 | class_id    | A         |       51286 | NULL     | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| event |          1 | fk_venue_id_idx                     |            1 | venue_id    | A         |       16275 | NULL     | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| event |          1 | idx_event_venueid_classid_starttime |            1 | venue_id    | A         |       13378 | NULL     | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| event |          1 | idx_event_venueid_classid_starttime |            2 | class_id    | A         |       81331 | NULL     | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| event |          1 | idx_event_venueid_classid_starttime |            3 | start_time  | A         |    63909472 | NULL     | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
+-------+------------+-------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Here's my modified version of the query; it uses JOIN syntax and CONVERT_TZ to convert from UTC to the timezone required for reporting, then groups by the date (discarding the time portion):
SELECT
DATE(CONVERT_TZ(event.start_time,
'UTC',
'Europe/London')) AS tz_date,
class.name,
client.name,
venue.name,
COUNT(class.id) AS occurrences,
SUM(class.duration) AS duration
FROM
event
JOIN
class ON class.id = event.class_id
JOIN
venue ON venue.id = event.venue_id
JOIN
client ON client.id = class.client_id
GROUP BY tz_date, class.name, client.name, venue.name;
And here's the output of explain for that query:
+----+-------------+--------+------------+--------+---------------------------------------------------------------------+-------------------------------------+---------+-------------------------+------+----------+------------------------------+
| id | select_type | table  | partitions | type   | possible_keys                                                       | key                                 | key_len | ref                     | rows | filtered | Extra                        |
+----+-------------+--------+------------+--------+---------------------------------------------------------------------+-------------------------------------+---------+-------------------------+------+----------+------------------------------+
|  1 | SIMPLE      | venue  | NULL       | index  | PRIMARY,idx_venue_id_name                                           | idx_venue_id_name                   | 772     | NULL                    |   28 |   100.00 | Using index; Using temporary |
|  1 | SIMPLE      | event  | NULL       | ref    | fk_class_id_idx,fk_venue_id_idx,idx_event_venueid_classid_starttime | idx_event_venueid_classid_starttime | 5       | example.venue.id        | 4777 |   100.00 | Using where; Using index     |
|  1 | SIMPLE      | class  | NULL       | eq_ref | PRIMARY,fk_client_id_idx                                            | PRIMARY                             | 4       | example.event.class_id  |    1 |   100.00 | Using where                  |
|  1 | SIMPLE      | client | NULL       | eq_ref | PRIMARY,idx_client_id_name                                          | PRIMARY                             | 4       | example.class.client_id |    1 |   100.00 | NULL                         |
+----+-------------+--------+------------+--------+---------------------------------------------------------------------+-------------------------------------+---------+-------------------------+------+----------+------------------------------+
The query now takes ~1m 20s to run, so I figured I could prepend it with an INSERT INTO to populate a summary table, with the dates being timezone-specific, and run that nightly. Summary table structure:
CREATE TABLE `summary` (
`tz_date` date NOT NULL,
`class` varchar(255) NOT NULL,
`client` varchar(255) NOT NULL,
`venue` varchar(255) NOT NULL,
`occurrences` int NOT NULL,
`duration` int NOT NULL,
PRIMARY KEY (`tz_date`,`class`,`client`,`venue`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
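For reference, a sketch of the nightly population statement I have in mind (it assumes a full re-aggregation each run; the IODKU clause keeps reruns idempotent, though note VALUES() is deprecated in favour of row aliases on MySQL 8.0.20+):
INSERT INTO summary (tz_date, class, client, venue, occurrences, duration)
SELECT
    DATE(CONVERT_TZ(e.start_time, 'UTC', 'Europe/London')) AS tz_date,
    cl.name          AS class_name,
    c.name           AS client_name,
    v.name           AS venue_name,
    COUNT(cl.id)     AS occurrences,
    SUM(cl.duration) AS duration
FROM event e
JOIN class  cl ON cl.id = e.class_id
JOIN venue  v  ON v.id  = e.venue_id
JOIN client c  ON c.id  = cl.client_id
GROUP BY tz_date, class_name, client_name, venue_name
ON DUPLICATE KEY UPDATE
    occurrences = VALUES(occurrences),
    duration    = VALUES(duration);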
From the original ~60m+ rows in the event table, the aggregated summary table is populated with ~66k rows.
To then generate the reports from the summary table it takes a fraction of a second (shown below with data snipped):
SELECT * FROM SUMMARY;
66989 rows in set (0.03 sec)
I haven't looked into the impact of inserting into event while the query to populate the summary table is running - is using InnoDB likely to slow that down?
No further indexes are likely to help. It needs to scan the entire event table, reaching into the other tables to get the names.
Some things for us to look at:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
EXPLAIN SELECT ...
How much RAM do you have?
Do the aggregates (COUNT and SUM) look correct? In some situations involving JOIN, they can be over-inflated.
Please use the newer JOIN ... ON syntax. (Won't change performance.)
As you observed, a Summary Table may help -- but only if the older data is not being modified. Please provide the SHOW CREATE TABLE and query for it.
Yes, timezone vs "definition of day" is a thorny issue. Notice how StackOverflow defines day based on UTC.
How many new rows are there per day? Are they spread somewhat evenly throughout the day? If the average number of rows per hour is at least 20, then the Summary Table could be based on half-hour intervals; see the bucketing sketch after this list. (I picked half-hours because of India time vs most of the rest of the world.) The 20 comes from a Rule of Thumb that says a summary table should have one-tenth as many rows as the Fact table.
Yes, TIMESTAMP instead of DATETIME may be a workaround.
Since you are talking about moderately large tables, consider whether to change INT NULL to SMALLINT UNSIGNED NOT NULL or some other sized integer.
(As for the cliff in 2038, ask yourself how many databases have been active on the same hardware and software since 2006. That may give some perspective on whether your design must survive 16 years.)
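To illustrate the half-hour idea, a minimal sketch (it assumes the session time_zone is UTC so UNIX_TIMESTAMP round-trips cleanly; 1800 seconds per bucket):
SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(start_time) / 1800) * 1800) AS half_hour,
       class_id,
       venue_id,
       COUNT(*) AS occurrences
FROM event
GROUP BY half_hour, class_id, venue_id;
Any timezone whose offset is a multiple of 30 minutes then maps onto whole buckets, so "what is a day" gets decided at report time instead of at summarization time.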
I seem to be hitting slow query results when searching for nearby coordinates (for now the query only covers latitude). This is the MySQL query:
select ABS(propertyCoordinatesLat - 3.33234) as diff from tablename order by diff asc limit 0,20
Is there a way to improve this besides relying on server-side scripting to do the sorting?
Table dump:
CREATE TABLE `property` (
`propertyID` bigint(20) NOT NULL,
`propertyName` varchar(100) NOT NULL,
`propertyCoordinatesLat` varchar(100) NOT NULL,
`propertyCoordinatesLng` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Indexes for dumped tables
--
--
-- Indexes for table `property`
--
ALTER TABLE `property`
ADD PRIMARY KEY (`propertyID`),
ADD KEY `propertyCoordinatesLat` (`propertyCoordinatesLat`,`propertyCoordinatesLng`),
ADD KEY `propertyCoordinatesLat_2` (`propertyCoordinatesLat`),
ADD KEY `propertyCoordinatesLng` (`propertyCoordinatesLng`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `property`
--
ALTER TABLE `property`
MODIFY `propertyID` bigint(20) NOT NULL AUTO_INCREMENT;
COMMIT;
The query is ordering by the difference between a string and a float. This odd calculation confuses and angers MySQL and results in a slow filesort.
mysql> explain select ABS(propertyCoordinatesLat - 3.33234) as diff from property order by diff
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
| id | select_type | table    | partitions | type  | possible_keys | key                      | key_len | ref  | rows | filtered | Extra                       |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
|  1 | SIMPLE      | property | NULL       | index | NULL          | propertyCoordinatesLat_2 | 302     | NULL |    1 |   100.00 | Using index; Using filesort |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
Changing propertyCoordinatesLat and propertyCoordinatesLng to a more sensible numeric type lets MySQL optimize better. No more filesort. This should perform much better.
alter table property change propertyCoordinatesLat propertyCoordinatesLat numeric(10,8) not null;
alter table property change propertyCoordinatesLng propertyCoordinatesLng numeric(11,8) not null;
mysql> explain select ABS(propertyCoordinatesLat - 3.33234) as diff from property order by propertyCoordinatesLat asc limit 0,20;
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key                      | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | property | NULL       | index | NULL          | propertyCoordinatesLat_2 | 5       | NULL |    1 |   100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
If you want to get fancy, look into MySQL's spatial types. These will probably perform better, and definitely be more accurate.
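A hedged sketch of that direction (it assumes MySQL 8.0 and a POINT column named location already populated with SRID 4326 values; the longitude below is made up, since the question only shows a latitude; note MySQL stores SRID 4326 coordinates in latitude-longitude order):
SELECT propertyID,
       ST_Distance_Sphere(
           location,
           ST_GeomFromText('POINT(3.33234 101.68)', 4326)) AS dist_m
FROM property
ORDER BY dist_m
LIMIT 20;
A SPATIAL index on location won't help a bare ORDER BY distance, but it does accelerate a bounding-box pre-filter with MBRContains, which is the usual "nearest N" pattern.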
I have this table:
CREATE TABLE IF NOT EXISTS `transaction` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`amount` bigint(20) NOT NULL,
`req_id` int(11) NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `transactions_873a2484` (`req_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci AUTO_INCREMENT=914 ;
I want to change this foreign key index transactions_873a2484 to a unique foreign key.
Basically I want to change it to:
UNIQUE KEY `transactions_req_id_de2b5683_uniq` (`req_id`),
I already have lots of data in my table, otherwise I would have just remade this table... Is there any way to do this without harming the data?
I will improve this as I go. MySQL will honor your wishes, and even allow you to shoot yourself in the foot as you go:
create table t9
(
id int auto_increment primary key,
thing varchar(20) not null,
key(thing),
unique key (thing),
unique key `yet_another` (thing)
);
-- warning 1831 dupe index
show create table t9;
CREATE TABLE `t9` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` varchar(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `thing_2` (`thing`),
UNIQUE KEY `yet_another` (`thing`),
KEY `thing` (`thing`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So look at all the baggage you have to carry around with your upserts (read: slow, extra, unnecessary indexes).
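For instance, stripping the demo table back down to a single unique index:
ALTER TABLE t9 DROP INDEX thing, DROP INDEX yet_another;
That leaves just PRIMARY KEY (id) and UNIQUE KEY thing_2 (thing), which is all the upsert logic needs.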
So if you want it as lean as possible, as I mentioned in the comments, unwind things first by dropping the FKs in the child tables, the referencing side first. See this answer.
Then drop the current non-unique parent key:
DROP INDEX index_name ON tbl_name;
Then add the unique key in the parent. This is the new referenced key:
CREATE UNIQUE INDEX idxName ON tbl_name (colName);
Then re-add the FKs in the children (the referencing side), with a supporting index:
CREATE INDEX idxName ON child_tbl_name (colName);
ALTER TABLE child_tbl_name ADD CONSTRAINT fkName FOREIGN KEY (colName) REFERENCES tbl_name (colName);
You can get the key names with show create table theTableName or SHOW INDEX. Use fresh names for the new ones; the names themselves don't matter.
Such as:
mysql> show index from t9;
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t9    |          0 | PRIMARY     |            1 | id          | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
| t9    |          0 | thing_2     |            1 | thing       | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
| t9    |          0 | yet_another |            1 | thing       | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
| t9    |          1 | thing       |            1 | thing       | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
ALTER TABLE `transaction`
DROP INDEX `transactions_873a2484`,
ADD UNIQUE(req_id);
You cannot turn a non-unique key into a UNIQUE one in place, but the above should do the equivalent. The data will be unharmed.
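One caveat: the ADD UNIQUE step will fail if req_id already contains duplicates, so it is worth checking first:
SELECT req_id, COUNT(*) AS c
FROM `transaction`
GROUP BY req_id
HAVING c > 1;
If that returns any rows, dedupe them before running the ALTER.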
I have found that MySQL (Windows 7 64-bit, 5.6.14) does not use an index properly if I supply table output to an IN clause. The USER table contains 900k records.
If I use the IN (_SOME_TABLE_OUTPUT_) syntax, I get a full scan of all 900k users. The query runs forever.
If I use the IN ('CONCRETE','VALUES') syntax, I get correct index usage.
How can I make MySQL finally USE the index?
1st case:
explain SELECT gu.id FROM USER gu WHERE gu.uuid in
(select '11b6a540-0dc5-44e0-877d-b3b83f331231' union
select '11b6a540-0dc5-44e0-877d-b3b83f331232');
+------+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
| id   | select_type        | table      | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+------+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
|    1 | PRIMARY            | gu         | index | NULL          | uuid | 257     | NULL | 829930 | Using where; Using index |
|    2 | DEPENDENT SUBQUERY | NULL       | NULL  | NULL          | NULL | NULL    | NULL | NULL   | No tables used           |
|    3 | DEPENDENT UNION    | NULL       | NULL  | NULL          | NULL | NULL    | NULL | NULL   | No tables used           |
| NULL | UNION RESULT       | <union2,3> | ALL   | NULL          | NULL | NULL    | NULL | NULL   | Using temporary          |
+------+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
2nd case:
explain SELECT gu.id FROM USER gu WHERE gu.uuid in
('11b6a540-0dc5-44e0-877d-b3b83f331231');
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | gu    | ref  | uuid          | uuid | 257     | const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
Table structure:
CREATE TABLE `USER` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`email` varchar(255) DEFAULT NULL,
`uuid` varchar(255) NOT NULL,
`partner_id` bigint(20) NOT NULL,
`password` varchar(255) DEFAULT NULL,
`date_created` datetime DEFAULT NULL,
`last_updated` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique-email` (`partner_id`,`email`),
KEY `uuid` (`uuid`),
CONSTRAINT `fk_USER_partner` FOREIGN KEY (`partner_id`) REFERENCES `partner` (`id`) ON DELETE CASCADE,
CONSTRAINT `FKB2D9FEBE725C505E` FOREIGN KEY (`partner_id`) REFERENCES `partner` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3315452 DEFAULT CHARSET=latin1
FORCE INDEX and USE INDEX hints don't change anything.
Demonstration SQLfiddle: http://sqlfiddle.com/#!2/c607e1/2
In fact I faced such a problem before, and it turned out that I had one table with a single column set to UTF-8 while the other tables were latin1. No matter what I did, MySQL insisted on using no indexes. The problem is quite well described in the blog post Slow queries in MySQL due to collation problems. Once you manage to fix the character set, I believe either of the queries will work.
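As a hedged sketch of that kind of fix applied to this query (assuming the mismatch is between the utf8 literals and the latin1 uuid column), you can force the subquery side to the column's character set:
SELECT gu.id FROM USER gu WHERE gu.uuid in
(select CONVERT('11b6a540-0dc5-44e0-877d-b3b83f331231' USING latin1) union
 select CONVERT('11b6a540-0dc5-44e0-877d-b3b83f331232' USING latin1));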
An inner join on your virtual table might give you better performance. Try something along these lines.
SELECT gu.id
FROM USER gu
INNER JOIN (
select '11b6a540-0dc5-44e0-877d-b3b83f331231' uuid
union all
select '11b6a540-0dc5-44e0-877d-b3b83f331232') ids
on gu.uuid = ids.uuid;
I have a database with the following stats:
Tables  Data      Index   Total
11      579,6 MB  0,9 GB  1,5 GB
So as you can see, the index is close to 2x bigger than the data. And there is one table with ~7 million rows that takes up at least 99% of this.
I also have two indexes that are very similar:
a) UNIQUE KEY `idx_customer_invoice` (`customer_id`,`invoice_no`),
b) KEY `idx_customer_invoice_order` (`customer_id`,`invoice_no`,`order_no`)
Update: Here is the table definition (at least structurally) of the largest table
CREATE TABLE `invoices` (
`id` int(10) unsigned NOT NULL auto_increment,
`customer_id` int(10) unsigned NOT NULL,
`order_no` varchar(10) default NULL,
`invoice_no` varchar(20) default NULL,
`customer_no` varchar(20) default NULL,
`name` varchar(45) NOT NULL default '',
`archived` tinyint(4) default NULL,
`invoiced` tinyint(4) default NULL,
`time` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
`group` int(11) default NULL,
`customer_group` int(11) default NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_customer_invoice` (`customer_id`,`invoice_no`),
KEY `idx_time` (`time`),
KEY `idx_order` (`order_no`),
KEY `idx_customer_invoice_order` (`customer_id`,`invoice_no`,`order_no`)
) ENGINE=InnoDB AUTO_INCREMENT=9146048 DEFAULT CHARSET=latin1 |
Update 2:
mysql> show indexes from invoices;
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table    | Non_unique | Key_name                   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| invoices |          0 | PRIMARY                    |            1 | id          | A         |     7578066 | NULL     | NULL   |      | BTREE      |         |
| invoices |          0 | idx_customer_invoice       |            1 | customer_id | A         |          17 | NULL     | NULL   |      | BTREE      |         |
| invoices |          0 | idx_customer_invoice       |            2 | invoice_no  | A         |     7578066 | NULL     | NULL   | YES  | BTREE      |         |
| invoices |          1 | idx_time                   |            1 | time        | A         |      541290 | NULL     | NULL   |      | BTREE      |         |
| invoices |          1 | idx_order                  |            1 | order_no    | A         |        6091 | NULL     | NULL   | YES  | BTREE      |         |
| invoices |          1 | idx_customer_invoice_order |            1 | customer_id | A         |          17 | NULL     | NULL   |      | BTREE      |         |
| invoices |          1 | idx_customer_invoice_order |            2 | invoice_no  | A         |     7578066 | NULL     | NULL   | YES  | BTREE      |         |
| invoices |          1 | idx_customer_invoice_order |            3 | order_no    | A         |     7578066 | NULL     | NULL   | YES  | BTREE      |         |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
My questions are:
Is there a way to find unused indexes in MySQL?
Are there any common mistakes that impact the size of the index?
Can indexA safely be removed?
How can you measure the size of each index? All I get is the total of all indexes.
You can remove index A, because, as you have noted, it is a subset of another index. And it's possible to do this without disrupting normal processing.
The size of the index files is not alarming in itself and it can easily be true that the net benefit is positive. In other words, the usefulness and value of an index shouldn't be discounted because it results in a large file.
Index design is a complex and subtle art involving a deep understanding of the query optimizer explanations and extensive testing. But one common mistake is to include too few fields in an index in order to make it smaller. Another is to test indexes with insufficient, or insufficiently representative data.
I may be wrong, but the first index (idx_customer_invoice) is UNIQUE, the second (idx_customer_invoice_order) is not, so you'll probably lose the uniqueness constraint when you remove it. No?
Is there a way to find unused indexes in MySQL?
The database engine optimizer will select a proper index when attempting to optimize your query. Depending on when you last collected statistics on your indexes, the chosen index will vary. Unused indexes can suddenly become used because the data distribution has changed.
Can indexA safely be removed?
I would say yes, if indexA and indexB are B-Tree indexes. This is because an index that starts with the same columns in the same order will have the same structure.
Use:
show indexes from table;
to see what indexes you have on a particular table. Cardinality tells you how useful an index is.
You can remove your indexes safely (it will not break the table), but beware: some queries might execute more slowly. First you should analyze your queries to decide whether you need a certain index or not.
I don't think you can find out the data length of a particular index, though.
BUT, if you think that an index length greater than twice the data length is abnormal... well, you are wrong. All of your indexes might be useful ;) If you have a table that holds a lot of information and you have to search it on a large number of columns, its indexes can easily be twice the size of the table's data.
indexA can be removed because indexB includes indexA.
What impacts your index length is your column types and column lengths.
Use:
select index_length from information_schema.tables
where table_name = 'your_table_name'
  and table_schema = 'your_db_name';
to get your table's index_length.
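As a side note on measuring individual indexes: on MySQL 5.6+ with persistent InnoDB statistics, you can approximate per-index sizes from the stats table (a sketch; stat_value counts pages, and reading mysql.* requires the right privileges):
SELECT index_name,
       SUM(stat_value) * @@innodb_page_size AS size_bytes
FROM mysql.innodb_index_stats
WHERE database_name = 'your_db_name'
  AND table_name = 'invoices'
  AND stat_name = 'size'
GROUP BY index_name;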