I have a table with about 500K jobs. Each job has a unique ID, which is used as the primary key, and a status that indicates whether the job is pending, complete, or failed. The status is an integer column that is not a key.
My problem is that a simple query selecting jobs by status takes too long, more than 10 minutes. There are about 46 threads connected to the DB, and a restart didn't help the performance.
I pasted the table schema and the query I'm trying to run here:
http://pastie.org/10416054
Is there any way to find the bottleneck and optimize the table so the query doesn't take that long?
After hours, I would fire off the following command:
CREATE INDEX idx_qry_status ON queries(status);
That's because your query is doing a table scan, employing no index whatsoever.
See the Manual page on Create Index.
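To see whether the index really is the missing piece, compare EXPLAIN output before and after creating it. A minimal sketch, using the illustrative queries table from below and an assumed status value of 1:

EXPLAIN SELECT * FROM queries WHERE status = 1;
-- before the index: expect type = ALL (full table scan), no key used
-- after CREATE INDEX idx_qry_status: expect type = ref, key = idx_qry_status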
A visual of the after, table-wise (not performance-wise):
create table queries
(   id bigint auto_increment primary key,
    status int null
    -- partial definition
);
insert queries (status) values (7),(2),(1),(4),(1),(5),(9),(11);
CREATE INDEX idx_qry_status ON queries(status);
show indexes from queries;
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| queries | 0 | PRIMARY | 1 | id | A | 8 | NULL | NULL | | BTREE | | |
| queries | 1 | idx_qry_status | 1 | status | A | 8 | NULL | NULL | YES | BTREE | | |
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I'm testing a proprietary tool that dumps a table in a MySQL RDS to the parquet format, and then restores it into another MySQL RDS.
Both tables have the same number of rows:
mysql> SELECT COUNT(*) FROM fox_owners;
+----------+
| COUNT(*) |
+----------+
| 118950 |
+----------+
The table itself is configured the same way in both cases:
mysql> SHOW CREATE TABLE fox_owners;
+------------+-------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------+
| fox_owners | CREATE TABLE `fox_owners` (
`name` mediumtext,
`owner_id` bigint NOT NULL,
PRIMARY KEY (`owner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+------------+-------------------------------------------------------+
So far so good, right?
However, the table sizes are different.
The original:
+----------+----------------------+------------+
| Database | Table | Size in MB |
+----------+----------------------+------------+
| stam_db | fox_owners | 5582.52 |
The restored one:
+----------+----------------------+------------+
| Database | Table | Size in MB |
+----------+----------------------+------------+
| stam_db | fox_owners | 5584.52 |
The restored one is 2MB bigger!
However, what's really bugging me is the change in cardinality of the indexes between the two tables.
Original:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 118728 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Restored:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 117518 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Why would cardinality drop from 118728 to 117518?
If the restored table is less unique than the original, isn't this a clear sign that the table is different? How can I verify that these two tables in separate RDS databases have identical content?
And shouldn't the cardinality be 118950 in both of them anyway, since for a table with a single primary key column, the cardinality must be equal to the number of rows in the table?
I've run ANALYZE TABLE on both tables; the values didn't change.
No Problem.
The "cardinality" is determined by making a small number of 'random' probes into the table. This leads to estimates. Sometimes the estimates are off by a factor of two or even more. 118728 and 117518 are unusually close to each other.
When loading/copying/altering a table, the BTrees are rebuilt, which usually changes how the blocks of the BTree are laid out. So it is normal to see the on-disk size of a table change, although a change by a factor of 2 would be rare.
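Cardinality and on-disk size are statistics, not a content check. To verify that the two copies really hold the same rows, compare the data itself. A rough sketch, assuming both tables can be scanned:

-- Engine-level checksum of the full contents (full scan; comparable as long as
-- both tables have the same definition and row format).
CHECKSUM TABLE fox_owners EXTENDED;

-- Or an order-insensitive aggregate of a per-row hash (a heuristic, not a proof).
SELECT COUNT(*) AS row_count,
       SUM(CRC32(CONCAT_WS('#', owner_id, name))) AS content_sum
FROM fox_owners;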
We have a large table (around 160 million records) in MySQL 5.5.
The machine where MySQL is installed has 4 GB of RAM.
Table schema:
+---------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+-------+
| domain | varchar(50) | YES | MUL | NULL | |
| uid | varchar(100) | YES | | NULL | |
| sid | varchar(100) | YES | MUL | NULL | |
| vurl | varchar(2500) | YES | | NULL | |
| ip | varchar(20) | YES | | NULL | |
| ref | varchar(2500) | YES | | NULL | |
| stats_time | datetime | YES | MUL | NULL | |
| country | varchar(50) | YES | | NULL | |
| region | varchar(50) | YES | | NULL | |
| place | varchar(50) | YES | | NULL | |
| email | varchar(100) | YES | MUL | NULL | |
+---------------+---------------+------+-----+---------+-------+
Indexes
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visit_views | 1 | sid_index | 1 | sid | A | 157531031 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | domain_index | 1 | domain | A | 17 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | email_index | 1 | email | A | 392845 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | stats_time_index | 1 | stats_time | A | 78765515 | NULL | NULL | YES | BTREE | | |
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Example query
SELECT count(*)
FROM visit_views
WHERE domain ='our'
AND email!=''
AND stats_time BETWEEN '2010-06-21 00:00:00' AND '2015-08-21 00:00:00';
We are seeing very slow performance on queries like the one above, so I want to add a composite index to this table.
I ran the following command:
ALTER TABLE visit_views ADD INDEX domain_statstime_email (domain,stats_time,email);
After running this command, our table got locked and the connection limit was reached (the limit is 1000). Now the table is not responding to any INSERTs or SELECTs.
Here are my questions:
1. Why did the table get locked, and why is it not releasing existing connections?
2. How much time will it take to build the index? I started it 3 hours ago and the index is still not created.
3. How can I see the index creation progress?
4. Why did the connection count suddenly climb to the maximum while adding the index?
5. Is it safe to add composite indexes to a table this large?
6. If I partition this table, will it perform any better?
I don't know much about indexes.
Some stats:
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
| 3221225472 |
+---------------------------+
Your query has three conditions: an inequality, an equality, and a range.
WHERE domain ='our'
AND email!=''
AND stats_time BETWEEN '2010-06-21 00:00:00' AND '2015-08-21 00:00:00';
To make this work, you should try the following indexes to see which one works better.
(email, domain, stats_time)
(domain, email, stats_time)
Why these? MySQL indexes are BTREE. That is, they're sorted in order. So to satisfy the query MySQL finds the first element in the index matching your query. That's based on domain, email, and the starting stats_time value. It then scans the index sequentially looking for the last matching value. Along the way it counts the records, and that satisfies your query. In other words it does a range scan on stats_time.
Why the choice? I don't know what MySQL will do with the inequality in your email matching predicate. That's why I suggest trying both.
If you have not simplified the query you showed us, you also might try a compound covering index on
(domain, stats_time, email)
This will random-access immediately to the first matching domain/stats_time combination, and then scan to the last one. As it scans, it will look at the email values from the index (that's why this is called a covering index) and pick out the rows matching. Along the way it counts the rows.
You should consider declaring your email column NOT NULL to help your inequality test use its index more efficiently. Read http://use-the-index-luke.com/ for good background information.
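If you can spare the time on a quiet copy of the data, a hedged way to compare the candidates is to build them and see which one EXPLAIN chooses and how many rows it estimates. The index names here are illustrative:

ALTER TABLE visit_views ADD INDEX idx_email_domain_time (email, domain, stats_time);
ALTER TABLE visit_views ADD INDEX idx_domain_email_time (domain, email, stats_time);
ALTER TABLE visit_views ADD INDEX idx_domain_time_email (domain, stats_time, email);

EXPLAIN SELECT COUNT(*)
FROM visit_views
WHERE domain = 'our'
  AND email != ''
  AND stats_time BETWEEN '2010-06-21 00:00:00' AND '2015-08-21 00:00:00';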
As to your questions:
Why did the table get locked, and why is it not releasing existing connections?
Why did the connection count suddenly climb to the maximum while adding the index?
It can take a long time to add an index to a large table. Yours, at 160 megarows, is large. While that indexing operation is going on, other users of the table must wait. So, if you're accessing this from a web app, the connections pile up waiting for the table to become available.
How much time will it take to build the index? I started it 3 hours ago and the index is still not created.
It will be much faster on a quiet system. It is also possible you have some redundant single-column indexes you could drop. You may wish to copy the table and index the copy, then, when it's ready, rename it.
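A sketch of that copy-and-rename approach (names are illustrative, and rows written to the original table during the copy have to be reconciled separately):

CREATE TABLE visit_views_new LIKE visit_views;
ALTER TABLE visit_views_new ADD INDEX domain_statstime_email (domain, stats_time, email);
INSERT INTO visit_views_new SELECT * FROM visit_views;
RENAME TABLE visit_views TO visit_views_old, visit_views_new TO visit_views;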
How can I see the index creation progress?
SHOW FULL PROCESSLIST will display all the action in your MySQL server. You'll need a command line interface to give this command.
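On MySQL 5.5 the ALTER copies the whole table, so the closest thing to a progress indicator is the thread sitting in the "copy to tmp table" state (and the growing #sql temporary file on disk). A hedged way to filter for it:

SELECT id, user, time, state, info
FROM information_schema.processlist
WHERE state LIKE 'copy%' OR info LIKE 'ALTER TABLE%';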
Is it safe to add composite indexes to a table this large?
Yes, of course, but it takes time on a production system.
If I partition this table, will it perform any better?
Probably not. What WILL help is DELETEing rows that are old, if you don't need them.
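If you do go the purge route, the kind of statement meant here is simply a delete with a cutoff on stats_time (the date below is only an example), which can use the existing stats_time_index:

DELETE FROM visit_views WHERE stats_time < '2010-06-21 00:00:00';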
I have a table with the following schema:
+----------------------+--------------+------+-----+---------+-------+
| Field                | Type         | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+-------+
| request_id           | bigint(20)   | NO   | PRI |         |       |
| marketplace_id       | int(11)      | NO   | PRI |         |       |
| feed_attribute_name  | varchar(256) | NO   | PRI |         |       |
| full_update_count    | int(11)      | NO   |     |         |       |
| partial_update_count | int(11)      | NO   |     |         |       |
| ptd                  | varchar(256) | NO   | PRI |         |       |
| processed_date       | datetime     | NO   | PRI |         |       |
+----------------------+--------------+------+-----+---------+-------+
and I am querying it like this:
EXPLAIN SELECT SUM(full_update_count) as total FROM
x.attribute_usage_information WHERE marketplace_id=6
AND ptd='Y' AND processed_date>2013-12-31 AND
feed_attribute_name='abc'
The query plan is:
+----+-------------+-------+------+---------------+------+---------+------+------------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows       | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------------+-------------+
|  1 | SIMPLE      | X     | ALL  |               |      |         |      | 1913668816 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------------+-------------+
I am new to query optimization so my inferences can be wrong.
I am surprised that it is not using an index, which can be a reason for its slow execution (around an hour). The table size is on the order of 10^10 rows. Can this query be rewritten so that it uses an index, given that the WHERE clause references a subset of the table's primary key columns?
EDIT: SHOW INDEX result
+------------------------------+------------+----------+--------------+----------------------+-----------+-------------+------------+
| Table                        | Non_unique | Key_name | Seq_in_index | Column_name          | Collation | Cardinality | Index_type |
+------------------------------+------------+----------+--------------+----------------------+-----------+-------------+------------+
| attribute_usage_information  | 0          | PRIMARY  | 1            | request_id           | A         | 2901956     | BTREE      |
| attribute_usage_information  | 0          | PRIMARY  | 2            | marketplace_id       | A         | 2901956     | BTREE      |
| attribute_usage_information  | 0          | PRIMARY  | 3            | feed_attribute_name  | A         | 273613033   | BTREE      |
| attribute_usage_information  | 0          | PRIMARY  | 4            | ptd                  | A         | 1915291236  | BTREE      |
| attribute_usage_information  | 0          | PRIMARY  | 5            | processed_date       | A         | 1915291236  | BTREE      |
+------------------------------+------------+----------+--------------+----------------------+-----------+-------------+------------+
EDIT 2: SHOW GRANT RESULT
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, PROCESS, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER ON *.* TO 'data_usage_rw'@'%' IDENTIFIED BY PASSWORD *** WITH GRANT OPTION
Your query:
SELECT SUM(full_update_count) as total
FROM x.attribute_usage_information
WHERE marketplace_id=6 AND ptd='Y' AND processed_date>2013-12-31 AND
feed_attribute_name='abc';
The "using where" is saying that MySQL is doing a full table scan. This is a simple query, so the only optimization approach is to create an index that reduces the number of rows being processed. he best index for this query is x.attribute_usage_information(marketplace_id, ptd, feed_attribute_name, processed_date, full_update_count).
You can create it as:
create index attribute_usage_information_idx on x.attribute_usage_information(marketplace_id, ptd, feed_attribute_name, processed_date, full_update_count);
By including full_update_count, this is a covering index. That further speeds the query because all columns used in the query are in the index. The execution engine does not need to look up values on the original data pages.
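Once the index exists, you can confirm that it is used and covering with EXPLAIN. A sketch of the check (note the quoted date literal; the unquoted 2013-12-31 in the original query is evaluated as integer arithmetic rather than as a date):

EXPLAIN SELECT SUM(full_update_count) AS total
FROM x.attribute_usage_information
WHERE marketplace_id = 6
  AND ptd = 'Y'
  AND feed_attribute_name = 'abc'
  AND processed_date > '2013-12-31';
-- Expect key = attribute_usage_information_idx and "Using index" in Extra.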
Cover your WHERE conditions with a composite index (marketplace_id, ptd, processed_date, feed_attribute_name):
ALTER TABLE `tablename` ADD INDEX (marketplace_id, ptd, processed_date, feed_attribute_name);
Be patient, it will take a while.
I am using MySQL 5.6 on Linux (RHEL). The database client is a Java program. The table in question (MyISAM or InnoDB, have tried both) has a multicolumn index comprising two integers (id's from other tables) and a timestamp.
I want to delete records which have timestamps before a given date. I have found that this operation is relatively slow (on the order of 30 seconds in a table which has a few million records). But I've also found that if the other two fields in the index are specified, the operation is much faster. No big surprise there.
I believe I could query the two non-timestamp tables for their index values and then loop over the delete operation, specifying one value of each id each time. I hope that wouldn't take too long; I haven't tried it yet. But it seems like I should be able to get MySQL to do the looping for me. I tried a query of the form
delete from mytable where timestamp < '2013-08-17'
and index1 in (select id from foo)
and index2 in (select id from bar);
but that's actually slower than
delete from mytable where timestamp < '2013-08-17';
Two questions. (1) Is there something I can do to speed up delete operations which depend only on timestamp? (2) Failing that, is there something I can do to get MySQL to loop over two other two index columns (and do it quickly)?
I actually tried this operation with both MyISAM and InnoDB tables with the same data -- they are approximately equally slow.
Thanks in advance for any light you can shed on this problem.
EDIT: More info about the table structure. Here is the output of show create table mytable:
CREATE TABLE `mytable` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`fooId` int(10) unsigned NOT NULL,
`barId` int(10) unsigned NOT NULL,
`baz` double DEFAULT NULL,
`quux` varchar(16) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `fooId` (`fooId`,`barId`,`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=14221944 DEFAULT CHARSET=latin1 COMMENT='stuff'
Here is the output of show indexes from mytable:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|mytable| 0 | PRIMARY | 1 | id | A | 2612681 | NULL | NULL | | BTREE | | |
|mytable| 0 | fooId | 1 | fooId | A | 20 | NULL | NULL | | BTREE | | |
|mytable| 0 | fooId | 2 | barId | A | 3294 | NULL | NULL | | BTREE | | |
|mytable| 0 | fooId | 3 | timestamp | A | 2612681 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT: More info -- output from "explain".
mysql> explain delete from mytable using mytable inner join foo inner join bar where mytable.fooId=foo.id and mytable.barId=bar.id and timestamp<'2012-08-27';
+----+-------------+-------+-------+---------------+---------+---------+-------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | foo | index | PRIMARY | name | 257 | NULL | 26 | Using index |
| 1 | SIMPLE | bar | index | PRIMARY | name | 257 | NULL | 38 | Using index; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE |mytable| ref | fooId | fooId | 8 | foo.foo.id,foo.bar.id | 211 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+-------------------------------+------+----------------------------------------------------+
Use the multiple-table DELETE syntax to join the tables:
DELETE mytable
FROM mytable
JOIN foo ON foo.id = mytable.index1
JOIN bar ON bar.id = mytable.index2
WHERE timestamp < '2013-08-17'
I think that this should perform particularly well if mytable has a composite index over (index1, index2, timestamp) (and both foo and bar have indexes on their id columns, which will of course be the case if those columns are PK).
Forget about the other two ids. Add an index on just the timestamp column. Otherwise you may be traversing the whole table.
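A minimal sketch of that suggestion together with the delete it is meant to speed up (the index name is illustrative):

ALTER TABLE mytable ADD INDEX idx_timestamp (`timestamp`);
DELETE FROM mytable WHERE `timestamp` < '2013-08-17';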
I created a Java application that uses Hibernate ORM; with Hibernate Tools I get an automated script that installs or upgrades the DB schema from the Java objects used as entities.
The program works properly on MySQL. However, on Oracle an error is triggered when a "unique" constraint is declared on a column and an index is then defined on the same column. Oracle says that a "unique" constraint creates an index by default, so two indexes on the same column cannot be declared.
So my question is whether in MySQL there is an equivalence or relation between a unique constraint and an index.
Please clarify. Thanks in advance.
A unique constraint requires an index so it can be enforced. Both DBMSs create an appropriate index when you declare columns as unique. The only difference is that Oracle prevents you from creating redundant indexes, but MySQL doesn't:
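For reference, a table definition along these lines (an assumed reconstruction, not taken from the original post) produces the redundant pair of indexes shown below: the UNIQUE constraint becomes foo_unique, and the explicit KEY adds the superfluous foo_key that Oracle would refuse:

CREATE TABLE test_table (
  id INT NOT NULL AUTO_INCREMENT,
  foo VARCHAR(100) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY foo_unique (foo),
  KEY foo_key (foo)  -- redundant; MySQL accepts it, Oracle rejects the equivalent
);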
show index from test_table;
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| test_table | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | |
| test_table | 0 | foo_unique | 1 | foo | A | 0 | NULL | NULL | | BTREE | |
| test_table | 1 | foo_key | 1 | foo | A | 0 | NULL | NULL | | BTREE | |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
MySQL just doesn't care for useless indexes. For updates it will check all unique indexes and for SELECTs it will pick an arbitrary index.
So to make Oracle happy, delete the index before creating the unique index.