I have a table with around 500,000 rows, with a composite primary key index. I'm trying a simple select statement as follows
select * from transactions where account_id='1' and transaction_id='003a4955acdbcf72b5909f122f84d';
The explain statement gives me this
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | transactions | NULL | const | PRIMARY | PRIMARY | 94 | const,const | 1 | 100.00 | NULL
My primary index is on account_id and transaction_id.
My engine is InnoDB.
When I run my query it takes around 156 milliseconds.
Given that EXPLAIN shows that only one row needs to be examined, I'm not sure how to optimize this further. What changes can I make to significantly improve this?
I'm going to speculate a bit, given the information provided: your primary key is composed of an integer field account_id and a varchar one called transaction_id.
Since both are components of the PRIMARY index created when you defined PRIMARY KEY(account_id, transaction_id), the index itself is already the best you can have for this query.
I think the bottleneck here is transaction_id: as a string, it takes more effort to index and to search. Changing it to a type that is cheaper to compare (e.g. a number) would probably help.
The only other improvement I see is to simplify the PRIMARY KEY itself, either by removing the account_id field (it seems redundant to me, since transaction_id looks like it is unique on its own, but that depends on your design) or by substituting the whole key with an integer AUTO_INCREMENT value (not recommended).
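If you want to experiment with the type change suggested above, one option (not a number, but a compact fixed-length type) is to pack a hexadecimal id into a BINARY column. This is only a sketch under assumptions: the real column list is not shown in the question, so the table name, the amount column and the 32-digit id below are all made up, and whether it changes the 156 ms at all is something to measure rather than assume.

```sql
-- Hypothetical variant of the table; every name and type here is an assumption.
CREATE TABLE transactions_sketch (
  account_id     INT UNSIGNED  NOT NULL,
  transaction_id BINARY(16)    NOT NULL,  -- 32 hex digits packed into 16 bytes
  amount         DECIMAL(12,2) NOT NULL,  -- placeholder payload column
  PRIMARY KEY (account_id, transaction_id)
) ENGINE=InnoDB;

-- Store the id through UNHEX(); the literal is a made-up 32-digit id.
INSERT INTO transactions_sketch
VALUES (1, UNHEX('00112233445566778899aabbccddeeff'), 10.00);

-- Look it up the same way, comparing account_id to a number rather than a string.
SELECT account_id, HEX(transaction_id) AS transaction_id, amount
FROM transactions_sketch
WHERE account_id = 1
  AND transaction_id = UNHEX('00112233445566778899aabbccddeeff');
```

Running EXPLAIN on the last SELECT should let you compare key_len against the 94 bytes reported in the question.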
If you use a count on a non-null column, on one table, without any WHERE parts, the optimizer just returns the number of rows in that table.
If you ask for a DISTINCT count on a UNIQUE non-null column, like the PRIMARY KEY, the answer should be the same, but this time MariaDB does the calculation instead.
And if you have a LEFT JOIN on other tables, and still no WHERE parts, the result should still be the number of rows in that table.
Is there a reason for MariaDB not using those optimizations? Is there a case when the DISTINCT count of an unfiltered primary key could give any result other than the number of rows in that table?
case:
CREATE TABLE products (
our_article_id varchar(50) CHARACTER SET utf8 NOT NULL,
...,
PRIMARY KEY(our_article_id)
);
CREATE TABLE product_article_id (
article_id varchar(255) COLLATE utf8_bin NOT NULL,
our_article_id varchar(50) CHARACTER SET utf8 NOT NULL,
...
PRIMARY KEY(article_id),
INDEX(our_article_id)
);
Count queries, 1st, basic count
DESCRIBE SELECT COUNT(our_article_id) FROM products;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
2nd DISTINCT on primary key
DESCRIBE SELECT COUNT(DISTINCT our_article_id) FROM products;
+------+-------------+----------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+-------+---------------+---------+---------+------+--------+-------------+
| 1 | SIMPLE | products | index | NULL | PRIMARY | 152 | NULL | 225089 | Using index |
+------+-------------+----------+-------+---------------+---------+---------+------+--------+-------------+
3rd, DISTINCT on PRIMARY KEY, and a LEFT JOIN without WHERE parts
DESCRIBE SELECT COUNT(DISTINCT our_article_id) FROM products LEFT JOIN product_article_id USING (our_article_id);
+------+-------------+--------------------+-------+---------------+---------+---------+----------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------------+-------+---------------+---------+---------+----------------------------------+--------+-------------+
| 1 | SIMPLE | products | index | NULL | PRIMARY | 152 | NULL | 225089 | Using index |
| 1 | SIMPLE | product_article_id | ref | PRIMARY | PRIMARY | 152 | testseek.products.our_article_id | 12579 | Using index |
+------+-------------+--------------------+-------+---------------+---------+---------+----------------------------------+--------+-------------+
"Is there a reason for MariaDB not using those optimizations?" -- There are a zillion missing optimizations in MySQL/MariaDB; that's one of them. Let's look at the history.
MySQL started about 2 decades ago as a lean and mean database engine. It focused on features that most people needed, while minimizing the overhead. This meant that a lot of rare optimizations were not in the early releases, and only get added over time if they seem important enough.
Take the PRIMARY KEY, for example. It is defined as UNIQUE. It is BTree organized. And, with InnoDB, it is also defined as Clustered. Other vendors allow various combinations of clustering, non-BTree indexing, etc. MySQL decided that the limitations were "good enough" for "most" people.
Over the years, the 'worst' omissions have been gradually fixed. Transactions were probably the biggest and most important; they arrived in 2001(?), and MyISAM is being phased out this year (2016) with the advent of 8.0.
4.1 (2002?) saw subqueries. Before that, creating a tmp table was "good enough". Now (8.0) subqueries are being one-upped by CTEs, which cover a few things that neither tmp tables nor subqueries can do efficiently.
There have been a huge number of optimizations put into MySQL 5.6 and 5.7 and MariaDB 10.x; you probably have not used more than a couple of them. The product is into "diminishing returns". It would damage its "lean and mean" heritage if it slowed down the optimizer to check for the next thousand extremely rare optimizations.
Meanwhile, guys like me spend a lot of time saying "MySQL/MariaDB doesn't have that; here's the workaround". It's the shorter COUNT(*) in your case. Since there is a clean workaround, it may be another decade before your suggestions are implemented. It is OK to file a bug report with bugs.mysql.com or mariadb.com to suggest the optimizations.
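To make the workaround concrete, here is a sketch using the products and product_article_id tables from the question. It relies only on our_article_id being the PRIMARY KEY of products (unique and NOT NULL) and on there being no WHERE clause.

```sql
-- Queries from the question: both walk the whole PRIMARY index of products.
SELECT COUNT(DISTINCT our_article_id) FROM products;

SELECT COUNT(DISTINCT our_article_id)
FROM products
LEFT JOIN product_article_id USING (our_article_id);

-- Workaround for both. A primary key has no duplicates and no NULLs, and a
-- LEFT JOIN without WHERE keeps every products row, so the distinct count of
-- the key equals the row count of products.
SELECT COUNT(*) FROM products;
```

Comparing DESCRIBE output for the three statements should show whether your server handles the COUNT(*) form more cheaply.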
Another, almost never needed case, is INDEX(a ASC, b DESC) as a way of optimizing ORDER BY a ASC, b DESC. That is coming with 8.0. But I doubt if more than one query in 5,000 really needs it. (I have seen a lot of queries.) I suggest that its rarity is why it took two decades to implement it. The lack of a clean workaround is why it did not take another decade.
I was expecting this query to use a key.
mysql> DESCRIBE Foo;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | UNI | NULL | |
+-------+-------------+------+-----+---------+----------------+
mysql> EXPLAIN SELECT id FROM Foo WHERE name='foo';
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
Foo has a unique index on name, so why isn't the index being used in the SELECT?
From the MySQL Manual page entitled EXPLAIN Output Format:
Impossible WHERE noticed after reading const tables (JSON property:
message)
MySQL has read all const (and system) tables and notice that the WHERE
clause is always false.
and the definition of const tables, from the Page entitled Constants and Constant Tables:
A MySQL constant is something more than a mere literal in the query.
It can also be the contents of a constant table, which is defined as
follows:
A table with zero rows, or with only one row
A table expression that is restricted with a WHERE condition,
containing expressions of the form column = constant, for all the
columns of the table's primary key, or for all the columns of any of
the table's unique keys (provided that the unique columns are also
defined as NOT NULL).
The second reference is a page and a half long. Please refer to it.
From the same EXPLAIN Output Format page, the description of the const join type:
const
The table has at most one matching row, which is read at the start of
the query. Because there is only one row, values from the column in
this row can be regarded as constants by the rest of the optimizer.
const tables are very fast because they are read only once.
const is used when you compare all parts of a PRIMARY KEY or UNIQUE
index to constant values. In the following queries, tbl_name can be
used as a const table:
SELECT * FROM tbl_name WHERE primary_key=1;
SELECT * FROM tbl_name WHERE primary_key_part1=1 AND
primary_key_part2=2;
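Putting the quotes together: name has a UNIQUE NOT NULL index, so WHERE name='foo' makes Foo a const table. The optimizer looks the value up through that index while planning, and if no row matches it reports an impossible WHERE instead of naming a key. A minimal sketch to reproduce this, assuming a table that mirrors the DESCRIBE output in the question (the inserted value 'bar' is made up):

```sql
-- Hypothetical reproduction of the table from the question.
CREATE TABLE Foo (
  id   BIGINT      NOT NULL AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY (name)
);
INSERT INTO Foo (name) VALUES ('bar');

-- A value that exists: the plan is expected to show type=const and key=name.
EXPLAIN SELECT id FROM Foo WHERE name = 'bar';

-- A value that does not exist: the unique index is probed during optimization,
-- no row is found, and Extra reports the impossible WHERE (the exact wording
-- of the message may differ between server versions).
EXPLAIN SELECT id FROM Foo WHERE name = 'foo';
```

So the index is being used; the plan simply has nothing left to execute.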
It could be because the table Foo holds very little data. In that case the optimizer will choose a table scan rather than going through the index.
As the MySQL documentation says:
Indexes are less important for queries on small tables, or big tables
where report queries process most or all of the rows. When a query
needs to access most of the rows, reading sequentially is faster than
working through an index. Sequential reads minimize disk seeks, even
if not all the rows are needed for the query.
I have encountered a problem when joining two tables: one large table with 140M rows and one small table with 100 rows, joined on the primary key.
The two tables are:
CREATE TABLE DataTable (
  Date timestamp,
  Hash varchar(20),
  Type varchar(20),
  Purchases int,
  Store varchar(20),
  PRIMARY KEY (Date, Hash)
);
DataTable is a very big table with 140M rows
CREATE TABLE ProductTable (
  Hash varchar(20),
  Name varchar(20),
  PRIMARY KEY (Hash)
);
ProductTable is small table with only 100 rows
I ran two separate queries
Select sum(DataTable.Purchases),DataTable.Store
from DataTable
Join ProductTable on ProductTable.Hash = DataTable.Hash
Where Type =2
and Date<='2015-12-31'
and Date>='2015-1-1'
group by DataTable.Store
This took a very long time (it actually never finished). EXPLAIN showed that it would process almost half of the table:
select_type | table        | type   | possible_keys | key     | key_len | ref               | rows   | Extra
--------------------------------------------------------------------------------------------------------------------
SIMPLE      | DataTable    | ALL    | Date, Hash    | NULL    | NULL    | NULL              | 7*10^7 | Using where
SIMPLE      | ProductTable | eq_ref | PRIMARY       | PRIMARY | 386     | ProductTable.Hash | 1      | Using where
Just for kicks I took all the relevant hashes from the ProductTable and put them in a Where In clause. Like the following:
Select sum(DataTable.Purchases),DataTable.Store
From DataTable
Where DataTable.Hash in ("1ha84u","1ha850","1ha851",...,"1hl931")
and Type =2
and DataTable.Date<='2015-12-31'
and DataTable.Date>='2015-12-1'
group by DataTable.Store
This resulted in much better performance, taking less than 2 seconds and scanning fewer rows.
select_type | table     | type  | possible_keys | key        | key_len | ref  | rows  | Extra
------------------------------------------------------------------------------------------------------
SIMPLE      | DataTable | range | Date, Hash    | Date, Hash | 62      | NULL | 11097 | Using where
I do not understand why the Primary key wasn't used for the first query and why the second one resulted in much better performance.
I have made sure that the result was not cached by MySQL between runs.
Try adding an index on (Hash, Date) to DataTable.
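The existing primary key leads with Date, so it cannot be used to look up rows by Hash alone. An index that leads with Hash gives the join a way to fetch the DataTable rows for each product and then narrow them by Date. A sketch of what that could look like (the index name idx_hash_date is an assumption, and whether the optimizer actually picks the index has to be checked with EXPLAIN on your data):

```sql
-- Index name is made up; the column order follows the suggestion above.
ALTER TABLE DataTable ADD INDEX idx_hash_date (Hash, Date);

-- Re-check the plan of the original join query after adding the index.
EXPLAIN
SELECT SUM(DataTable.Purchases), DataTable.Store
FROM DataTable
JOIN ProductTable ON ProductTable.Hash = DataTable.Hash
WHERE DataTable.Type = 2
  AND DataTable.Date >= '2015-01-01'
  AND DataTable.Date <= '2015-12-31'
GROUP BY DataTable.Store;
```

Note that building an index on a 140M-row table can take a while, so it is worth trying on a copy first.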
I'm running a fairly simple auto catalog
CREATE TABLE catalog_auto (
id INT(10) UNSIGNED NOT NULL auto_increment,
make varchar(35),
make_t varchar(35),
model varchar(40),
model_t varchar(40),
model_year SMALLINT(4) UNSIGNED,
fuel varchar(35),
gearbox varchar(15),
wd varchar(5),
engine_cc SMALLINT(4) UNSIGNED,
variant varchar(40),
body varchar(30),
power_ps SMALLINT(4) UNSIGNED,
power_kw SMALLINT(4) UNSIGNED,
power_hp SMALLINT(4) UNSIGNED,
max_rpm SMALLINT(5) UNSIGNED,
torque SMALLINT(5) UNSIGNED,
top_spd SMALLINT(5) UNSIGNED,
seats TINYINT(2) UNSIGNED,
doors TINYINT(1) UNSIGNED,
weight_kg SMALLINT(5) UNSIGNED,
lkm_def TINYINT(3) UNSIGNED,
lkm_mix TINYINT(3) UNSIGNED,
lkm_urb TINYINT(3) UNSIGNED,
tank_cap TINYINT(3) UNSIGNED,
co2 SMALLINT(5) UNSIGNED,
PRIMARY KEY(id),
INDEX `gi`(`make`,`model`,`model_year`,`fuel`,`gearbox`,`wd`,`engine_cc`),
INDEX `mkt`(`make`,`make_t`),
INDEX `mdt`(`make`,`model`,`model_t`)
);
The table has about 60,000 rows so far, so nothing that simple queries couldn't handle, even without indexes.
The point is, I'm trying to get the hang of using indexes, so I made a few, based on my most frequent queries.
Say I want engine_cc for a specific set of criteria, like so:
SELECT DISTINCT engine_cc FROM catalog_auto WHERE make='audi' AND model='a4' and model_year=2006 AND fuel='diesel' AND gearbox='manual' AND wd='front';
EXPLAIN says:
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------+------+--------------------------+
| 1 | SIMPLE | catalog_auto | ref | gi,mkt,mdt | gi | 408 | const,const,const,const,const,const | 8 | Using where; Using index |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------+------+--------------------------+
The query is using gi index as expected, no problem here.
After selecting the base criteria, I need the rest of the columns as well:
SELECT * FROM catalog_auto WHERE make='audi' AND model='a4' and model_year=2006 AND fuel='diesel' AND gearbox='manual' AND wd='front' AND engine_cc=1968;
EXPLAIN says:
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
| 1 | SIMPLE | catalog_auto | ref | gi,mkt,mdt | gi | 411 | const,const,const,const,const,const,const | 3 | Using where |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
It selected a key but is NOT showing "Using index". The query, however, is very fast (1 row in set (0.00 sec)), but since the table doesn't have that many rows, I assume it would be the same even without indexing.
Tried it like this:
SELECT * FROM catalog_auto WHERE id IN (SELECT id FROM catalog_auto WHERE make='audi' AND model='a6' AND model_year=2009);
Again, in EXPLAIN:
+----+--------------------+--------------+-----------------+--------------------+---------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+-----------------+--------------------+---------+---------+------+-------+-------------+
| 1 | PRIMARY | catalog_auto | ALL | NULL | NULL | NULL | NULL | 59060 | Using where |
| 2 | DEPENDENT SUBQUERY | catalog_auto | unique_subquery | PRIMARY,gi,mkt,mdt | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+--------------+-----------------+--------------------+---------+---------+------+-------+-------------+
Still NOT using any index, not even PRIMARY KEY. Shouldn't this, at least use the PRIMARY KEY?
Documentation says: MySQL can ignore a key even if it finds one, if it determines that a full table scan would be faster, depending on the query.
Is that the reason why it's not using any of the indexes? Is this a good practice? If not, how would you recommend indexing columns, for a SELECT * statement, to always use an index, given the above query.
I'm not much of a MySQL expert, so any pointers would be greatly appreciated.
Using MySQL 5.5 with InnoDB.
I'm basically giving the same answer as @DStanley, but I want to expand on it more than I can fit in a comment.
The "Using index" note means that the query is using only the index to get the columns it needs.
The absence of this note doesn't mean the query isn't using an index.
What you should look at is the key column in the EXPLAIN report:
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
| 1 | SIMPLE | catalog_auto | ref | gi,mkt,mdt | gi | 411 | const,const,const,const,const,const,const | 3 | Using where |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
The key column says the optimizer chose the gi index. So it is using an index. And the ref column confirms that it's referencing all seven columns of that index.
The fact that it must fetch more of the columns to return * means it can't claim "Using [only] index".
Also read this excerpt from https://dev.mysql.com/doc/refman/5.6/en/explain-output.html:
Using index
The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.
I think of this analogy, to a telephone book:
If you look up a business in a phone book, it's efficient because the book is alphabetized by the name. When you find it, you also have the phone number right there in the same entry. So if that's all you need, it's very quick. That's an index-only query.
If you want extra information about the business, like their hours or credentials or whether they carry a certain product, you have to do the extra step of using that phone number to call them and ask. That's a couple of extra minutes of time to get that information. But you were still able to find the phone number without having to read the entire phone book, so at least it didn't take hours or days. That's a query that used an index, but had to also go look up the row from the table to get other data.
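In terms of the catalog_auto table from the question, the two situations can be sketched like this. The column choices are mine, picked only to illustrate the difference, and the optimizer is free to choose a different index than the one the comments expect:

```sql
-- Every column referenced here is part of index gi, so the query can be
-- answered from the index alone; Extra is expected to include "Using index".
EXPLAIN SELECT model_year, fuel, engine_cc
FROM catalog_auto
WHERE make = 'audi' AND model = 'a4';

-- power_ps is not part of any index, so after finding the matching entries
-- through an index (the key column should still name one, e.g. gi or mdt)
-- the server has to read the rows themselves: no "Using index" in Extra.
EXPLAIN SELECT model_year, fuel, power_ps
FROM catalog_auto
WHERE make = 'audi' AND model = 'a4';
```

Both queries use an index to find the rows; only the first one avoids the extra trip to the table.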
I'm not a MySQL expert, but my guess is that the index was used for the row lookup, but the actual data has to be retrieved from the data pages, so an additional lookup is necessary.
In your first query, the data you ask for is available by looking only at the index keys. When you ask for columns that aren't in the index, as in the second query, the engine uses the key to do a SEEK on the data pages, so it's still very fast.
With SQL performance, since the optimizer has a lot of freedom to choose the "best" plan, the proof is in the pudding when it comes to indexing. If adding an index makes a common query faster, great, use it. If not, then save the space and overhead of maintaining the index (or look for a better index).
Note that you don't get a free lunch - additional indices can actually slow down a system, particularly if you have frequent inserts or updates on columns that are indexed, since the system will have to constantly maintain those indices.
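The third query in the question is a different case: its EXPLAIN shows the outer table read with type ALL and the IN (SELECT ...) part run as a DEPENDENT SUBQUERY, that is, once per outer row. Since the subquery selects ids from the same table, the conditions can be applied directly. A sketch of the equivalent query, which gives the optimizer the chance to use gi or mdt as in the second query:

```sql
-- Form from the question: full scan of catalog_auto plus a subquery check per row.
SELECT * FROM catalog_auto
WHERE id IN (SELECT id FROM catalog_auto
             WHERE make = 'audi' AND model = 'a6' AND model_year = 2009);

-- Equivalent without the subquery: id is the primary key, so filtering the
-- rows directly returns the same set.
SELECT * FROM catalog_auto
WHERE make = 'audi' AND model = 'a6' AND model_year = 2009;
```

Running EXPLAIN on the second form should confirm whether the key column now names one of the indexes.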
I have the following MySQL table (simplified):
CREATE TABLE `track` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(256) NOT NULL,
`is_active` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `is_active` (`is_active`, `id`)
) ENGINE=MyISAM AUTO_INCREMENT=7495088 DEFAULT CHARSET=utf8
The 'is_active' column marks rows that I want to ignore in most, but not all, of my queries. I have some queries that read chunks out of this table periodically. One of them looks like this:
SELECT id,title from track where (track.is_active=1 and track.id > 5580702) ORDER BY id ASC LIMIT 10;
This query takes over a minute to execute. Here's the execution plan:
> EXPLAIN SELECT id,title from track where (track.is_active=1 and track.id > 5580702) ORDER BY id ASC LIMIT 10;
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+
| 1 | SIMPLE | t | ref | PRIMARY,is_active | is_active | 1 | const | 3747543 | Using where |
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+
Now, if I tell MySQL to ignore the 'is_active' index, the query happens instantaneously.
> EXPLAIN SELECT id,title from track IGNORE INDEX(is_active) WHERE (track.is_active=1 AND track.id > 5580702) ORDER BY id ASC LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | SIMPLE | t | range | PRIMARY | PRIMARY | 4 | NULL | 1597518 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
Now, what's really strange is that if I FORCE MySQL to use the 'is_active' index, the query once again happens instantaneously!
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | SIMPLE | t | range | is_active |is_active| 5 | NULL | 1866730 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
I just don't understand this behavior. In the 'is_active' index, rows should be sorted by is_active, followed by id. I use both the 'is_active' and 'id' columns in my query, so it seems like it should only need to do a few hops around the tree to find the IDs, then use those IDs to retrieve the titles from the table.
What's going on?
EDIT: More info on what I'm doing:
Query cache is disabled
Running OPTIMIZE TABLE and ANALYZE TABLE had no effect
6,620,372 rows have 'is_active' set to True. 874,714 rows have 'is_active' set to False.
Using FORCE INDEX(is_active) once again speeds up the query.
MySQL version 5.1.54
It looks like MySQL is making a poor decision about how to use the index.
From that query plan, it is showing it could have used either the PRIMARY or is_active index, and it has chosen is_active in order to narrow by track.is_active first. However, it is only using the first column of the index (track.is_active). That gets it 3747543 results which then have to be filtered and sorted.
If it had chosen the PRIMARY index, it would be able to narrow down to 1597518 rows using the index, and they would be retrieved in order of track.id already, which should require no further sorting. That would be faster.
New information:
In the third case, where you are using FORCE INDEX, MySQL is using the is_active index, but now instead of only using the first column it is using both columns (see key_len). It is therefore able to narrow by is_active and to sort and filter by id using the same index. Since is_active is a single constant, the ORDER BY is satisfied by the second column (i.e. the rows from a single branch of the index are already in sorted order). This seems to be an even better outcome than using PRIMARY, and probably what you intended in the first place, right?
I don't know why it wasn't using both columns of this index without FORCE INDEX, unless the query has changed in a subtle way in between. If not I'd put it down to MySQL making bad decisions.
I think the speedup is due to your where clause. I am assuming that it is only retrieving a small subset of the rows in the entire large table. It is faster to do a table scan of the retrieved data for is_active on the small subset than to do the filtering through a large index file. Traversing a single column index is much faster than traversing a combined index.
A few things you could try:
Run OPTIMIZE TABLE and CHECK TABLE on your table, so MySQL recalculates the index statistics
Have a look at http://dev.mysql.com/doc/refman/5.1/en/index-hints.html - you can tell MySQL which index to use in different cases
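For reference, the index hints discussed in this thread look like this when applied to the query from the question. This is a sketch of the syntax only; which variant is fastest on your data is for EXPLAIN and timing to decide:

```sql
-- No hint: the optimizer chooses (the slow plan in the question).
SELECT id, title FROM track
WHERE is_active = 1 AND id > 5580702
ORDER BY id ASC LIMIT 10;

-- Force the composite (is_active, id) index.
SELECT id, title FROM track FORCE INDEX (is_active)
WHERE is_active = 1 AND id > 5580702
ORDER BY id ASC LIMIT 10;

-- Keep the optimizer away from that index, leaving it the PRIMARY key.
SELECT id, title FROM track IGNORE INDEX (is_active)
WHERE is_active = 1 AND id > 5580702
ORDER BY id ASC LIMIT 10;
```

A hint ties the query to the current set of indexes, so it is worth revisiting if the schema or the data distribution changes.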