mysql: why below query unused union index? - mysql

table a
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| uid | int(11) | YES | MUL | NULL | |
| channel | varchar(20) | YES | | NULL | |
| createAt | datetime | YES | | NULL | |
+----------+-------------+------+-----+---------+-------+
table a index: a_index_uid_createAt` (`uid`,`createAt`)
table b:
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| uid | int(11) | NO | PRI | NULL | |
| date | date | YES | MUL | NULL | |
| channel | varchar(20) | YES | MUL | NULL | |
| gender | smallint(6) | YES | MUL | NULL | |
| chargeAmt | int(11) | YES | | 0 | |
| revised | smallint(6) | YES | | 0 | |
| createAt | datetime | YES | | NULL | |
+-----------+-------------+------+-----+---------+-------+
query st:
select DATE(a.createAt) date,a.channel,b.chargeAmt
FROM a, b
where a.uid = b.uid
and a.createAt >= '2021-05-10 00:00:00'
and a.createAt <= '2021-05-10 23:59:59';
explain:
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
| 1 | SIMPLE | a | ALL | a_index_uid_createAt | NULL | NULL | NULL | 172725 | Using where |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | xiehou.r.uid | 1 | |
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
why? a_index_uid_createAt index invalid!

Please use the JOIN .. ON syntax:
select DATE(a.createAt) date, a.channel, b.chargeAmt
FROM a
JOIN b ON a.uid = b.uid -- How the tables are related
WHERE a.createAt >= '2021-05-10 -- filtering
and a.createAt < '2021-05-10 + INTERVAL 1 DAY; -- filtering
The Optimizer, when it sees a JOIN, starts by deciding which table to start with. The preferred table is the one with filtering, namely a.
To do the filtering, it needs an index that starts with the columns mentioned in the WHERE clause.
The other table will be reached by looking at the ON, which seems to have PRIMARY KEY(uid)
So, the only useful index is
a: INDEX(createAt)
Any INDEX(uid, ...) is likely to be unused, since it starts with an existing index, namely PRIMARY KEY(uid).
(In the future, please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)

Related

slow query even with the index on large table

I am performing a simple select query to extract username from table logs(containing 54864 rows).
It took about 7.836s to retrieve data.
How can I speed up the performace???
SELECT username FROM `logs`
WHERE
logs.branch=1
and
logs.added_on > '2016-11-27 00:00:00'
On describing table,
+-------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| username | char(255) | YES | MUL | NULL | |
| fullname | char(255) | YES | | NULL | |
| package | char(255) | YES | | NULL | |
| prev_expiry | date | YES | | NULL | |
| recharged_upto | date | YES | | NULL | |
| payment_option | int(11) | YES | MUL | NULL | |
| amount | float(14,2) | YES | | NULL | |
| branch | int(11) | YES | MUL | NULL | |
| added_by | int(11) | YES | | NULL | |
| added_on | datetime | YES | MUL | NULL | |
| remark | text | YES | | NULL | |
| payment_mode | char(255) | YES | | NULL | |
| recharge_duration | char(255) | YES | | NULL | |
| invoice_number | char(255) | YES | | NULL | |
| cheque_no | char(255) | YES | | NULL | |
| bank_name | char(255) | YES | | NULL | |
| verify_by_ac | int(11) | YES | | 0 | |
| adjusted_days | int(11) | YES | | NULL | |
| adjustment_note | text | YES | | NULL | |
+-------------------+-------------+------+-----+---------+----------------+
20 rows in set
On explaining query,
+----+-------------+--------------------------+------+-----------------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+------+-----------------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | logs | ref | branch_index,added_on_index | branch_index | 5 | const | 37 | Using where |
+----+-------------+--------------------------+------+-----------------------------+--------------+---------+-------+------+-------------+
1 row in set
updated:: explaing query after adding composite index(branch_added_index )
+----+-------------+--------------------------+------+------------------------------------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+------+------------------------------------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | logs | ref | branch_index,added_on_index,branch_added_index | branch_index | 5 | const | 37 | Using where |
+----+-------------+--------------------------+------+------------------------------------------------+--------------+---------+-------+------+-------------+
1 row in set
Add a composite key on branch,added_on so you cover all the WHERE conditions since you use AND.
ALTER TABLE logs ADD KEY(branch,added_on)
This should be much faster,also you can drop the branch_index key since the above index can replace it.You only return 37 rows from 54000 so the cardinality is OK.
ALTER TABLE logs DROP INDEX `branch_index`;
Or you can use index hints
SELECT username FROM `logs` USE INDEX (branch_added_index) WHERE
logs.branch=1
and
logs.added_on > '2016-11-27 00:00:00'
if your table's existing index is already used for other queries try adding new composite index like below
create index <indexname> on logs(branch,added_on)
create a composite index on 2 fields (branch, added_on) like:
ALTER TABLE `logs` ADD KEY idx_branch_added (branch, added_on);
This will be even faster, because it is "covering":
INDEX(branch, added_on, username) -- in exactly that order.
(And drop any indexes that are prefixes of this.)
Index Cookbook
"Cardinality" is rarely of importance. And EXPLAIN often gets the value wrong.
The EXPLAIN shows 5 for the size of branch -- does it really need to be NULLable? Will you have 2 billion branches? Consider using something smaller, such as the 1-byte TINYINT UNSIGNED NOT NULL (values 0..255).
Also, shrink the 255 to something reasonable.
When describing a table, please use SHOW CREATE TABLE; it is more descriptive. It might help to know the Engine, Charset, etc.

MySQL query with JOIN and GROUP BY optimization. Is it possible?

I have two tables: gpnxuser and key_value
mysql> describe gpnxuser;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| email | varchar(255) | YES | | NULL | |
| uuid | varchar(255) | NO | MUL | NULL | |
| partner_id | bigint(20) | NO | MUL | NULL | |
| password | varchar(255) | YES | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
and
mysql> describe key_value;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
| upkey | varchar(255) | NO | MUL | NULL | |
| user_id | bigint(20) | YES | MUL | NULL | |
| security_level | int(11) | NO | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
key_value.user_id is FK that references gpnxuser.id. I also have an index in gpnxuser.partner_id which is a FK that references a table called "partner" (which, I think, does not matter much to this question).
For partner_id = 64, I have 500K rows in gpnxuser which have relationship with approximatelly 6M rows in key_value.
I wanted to have a query that returned all distinct 'key_value.upkey' for userĀ“s belonging to a given partner. I did something like this:
select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
which takes forever to run. The explain for the query looks like:
mysql> explain select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | gpnxuser | ref | PRIMARY,FKB2D9FEBE725C505E | FKB2D9FEBE725C505E | 8 | const | 259640 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | key_value | ref | FK9E0C0F912D11F5A9 | FK9E0C0F912D11F5A9 | 9 | gpnx_finance_db.gpnxuser.id | 14 | Using where |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
My question is: is there a query that can run fast and obtain the result that I want?
what you need to do is utilize EXISTS statement: This will cause only partial table scan until a match found and not more.
select upkey from (select distinct upkey from key_value) upk
where EXISTS
(select 1 from gpnxuser u, key_value kv
where u.id=kv.user_id and partner_id=1 and kv.upkey = upk.upkey)
NB. In the original query, group by is misused: distinct looks better there.
select DISTINCT upkey from gpnxuser join key_value on
gpnxuser.id=key_value.user_id where partner_id=1
I would look into partitioning your key_value table on user_id, if you typically run queries based on this column.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

Simple MySQL query with performance issues

I have the following simple MySQL query:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31';
When I run this it takes 0.29 seconds to bring back 36074 results. When I increase my date period to bring back more results (65703) it runs in 0.56. When I run other similar SQL queries on the same server but on different tables (some tables are larger) the results come back in approximately 0.01 seconds.
Although 0.29 isn't slow - this is a basic part for a complex query and this timing means that it is not scalable.
See below for the table definition and indexes.
I know it's not server load as I have the same issue on a development server which has very little usage.
+---------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+--------------+------+-----+---------+----------------+
| mainID | int(11) | NO | PRI | NULL | auto_increment |
| otherID1 | int(11) | NO | MUL | NULL | |
| otherID2 | int(11) | NO | MUL | NULL | |
| otherID3 | int(11) | NO | MUL | NULL | |
| keyword | varchar(200) | NO | MUL | NULL | |
| dateStartCol | date | NO | MUL | NULL | |
| timeStartCol | time | NO | MUL | NULL | |
| dateEndCol | date | NO | MUL | NULL | |
| timeEndCol | time | NO | MUL | NULL | |
| statusCode | int(1) | NO | MUL | NULL | |
| uRL | text | NO | | NULL | |
| hostname | varchar(200) | YES | MUL | NULL | |
| IPAddress | varchar(25) | YES | | NULL | |
| cookieVal | varchar(100) | NO | | NULL | |
| keywordVal | varchar(60) | NO | | NULL | |
| dateTimeCol | datetime | NO | MUL | NULL | |
+---------------------------+--------------+------+-----+---------+----------------+
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| tableName | 0 | PRIMARY | 1 | mainID | A | 661990 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID1 | 1 | otherID1 | A | 330995 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID2 | 1 | otherID2 | A | 25 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID3 | 1 | otherID3 | A | 48 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_dateStartCol | 1 | dateStartCol | A | 187 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_timeStartCol | 1 | timeStartCol | A | 73554 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_dateEndCol | 1 | dateEndCol | A | 188 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_timeEndCol | 1 | timeEndCol | A | 73554 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_keyword | 1 | keyword | A | 82748 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_hostname | 1 | hostname | A | 2955 | NULL | NULL | YES | BTREE | |
| tableName | 1 | idx_dateTimeCol | 1 | dateTimeCol | A | 220663 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_statusCode | 1 | statusCode | A | 2 | NULL | NULL | | BTREE | |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
Explain Output:
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | tableName | range | idx_otherID3,idx_dateStartCol | idx_dateStartCol | 3 | NULL | 66875 | 75.00 | Using where |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
If that is really your query (and not a simplified version of same), then this ought to achieve best results:
CREATE INDEX table_ndx on tableName( otherID3, dateStartCol, mainID);
The first index entry means that the first match in the WHERE is very fast; the same also applies with dateStartCol. The third field is very small and does not slow the index appreciably, but allows for the datum you require to be found immediately in the index with no table access at all.
It is important that the keys are in the same index. In the EXPLAIN you posted, each key is in an index of its own, so even if MySQL chooses the best index, the performances will not be optimal. I'd try and use less indexes, for they also have a cost (shameless plug: Can Indices actually decrease SELECT performance? ).
First try to add the right key. It seems like dateStartCol is more selective than otherID3
ALTER TABLE tableName ADD KEY idx_dates(dateStartCol, dateStartCol)
Second - please make sure you select only rows you need by adding LIMIT clause to the SELECT. This will should up the query. Try like this:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31'
LIMIT 10;
Please also make sure that your MySQL tuned up properly. You may want to check key_buffer_size and innodb_buffer_pool_size as described in http://astellar.com/2011/12/why-is-stock-mysql-slow/
If this is a recurrent or important query then create a multiple column index:
CREATE INDEX index_name ON tableName (otherID3, dateStartCol)
Delete the non used indexes as they make table changes more expensive.
BTW you don't need two separate columns for date and time. You can combine then in a datetime or timestamp type. One less column and one less index.
The explain output shows it chose the dateStartCol index so you could try the opposite I suggested above:
CREATE INDEX index_name ON tableName (dateStartCol, otherID3)
Notice that the query's dateStartCol condition will still get 75% of the rows so not much improvement, if any, in using that single index.
How unique is otherID3? If there are not many repeated otherID3 you can hint the engine to use it.

MySQL Join Inconsistent Performance Depending on WHERE

I have a query that is taking several seconds to complete for certain WHERE conditions. To the best of my knowledge, everything is indexed correctly, but performance is still poor. All tables are InnoDB, blog_users and blog_users_profiles contain about 5 million rows and forum_contacts has about 10K rows.
The query
SELECT blog.*
FROM blog_users AS blog
INNER JOIN blog_user_profiles AS profile ON blog.user_id = profile.user_id
INNER JOIN forum_contacts AS contact ON profile.forum_id = contact.user_id
WHERE blog.comments > :comment_cutoff
AND blog.last_active > :time_cutoff
AND contact.type = :contact_type
AND contact.area = :location
LIMIT 0, 100
Explain output
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
| 1 | SIMPLE | blog | range | PRIMARY,last_active,comments | comments | 3 | NULL | 1313813 | Using where |
| 1 | SIMPLE | profile | eq_ref | PRIMARY,forum_id | PRIMARY | 3 | xc_db.blog.user_id | 1 | |
| 1 | SIMPLE | contact | eq_ref | PRIMARY,type,area,user_id | PRIMARY | 80 | xc_db.profile.forum_id,const | 1 | Using where |
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
Table structure (irrelevant rows snipped)
blog_users
+-------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+----------------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| username | varchar(16) | NO | UNI | NULL | |
| comments | mediumint(7) unsigned | NO | MUL | NULL | |
| last_active | int(10) unsigned | NO | MUL | NULL | |
+-------------+-----------------------+------+-----+---------+----------------+
blog_user_profiles
+----------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-----------------------+------+-----+---------+-------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | |
| forum_id | mediumint(8) unsigned | NO | MUL | 0 | |
+----------+-----------------------+------+-----+---------+-------+
forum_contacts
+---------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-----------------------+------+-----+---------+-------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | |
| type | varchar(25) | NO | PRI | NULL | |
| area | varchar(255) | NO | MUL | NULL | |
+---------+-----------------------+------+-----+---------+-------+
The strange thing is that the execution time seems to be inversely related to the total number of rows returned. For example, a query like contact.area = 'United States' which returns a few thousand rows executes in less than 0.01 second. However, queries with fewer result rows such as contact.area = 'Egypt' which returns 20 rows takes in excess of 5 seconds to complete. Does my query need to be rewritten or is there a problem with the indexes?

Cannot figure out efficient SQL for 3-table INNER JOIN (MySQL) - "customers also bought" functionality

I'm trying to add the typical "customers who bought 'x' also bought 'y'" functionality to my website. Here is the table structure:
Table: qb_invoice
+--------------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| TxnID | varchar(40) | YES | MUL | NULL | |
| Customer_ListID | varchar(40) | YES | MUL | NULL | |
| Customer_FullName | varchar(255) | YES | | NULL | |
+--------------------------------+------------------+------+-----+-------------------+----------------+
Table: qb_invoice_invoiceline
+-------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| Invoice_TxnID | varchar(40) | YES | MUL | NULL | |
| Item_ListID | varchar(40) | YES | MUL | NULL | |
| Item_FullName | varchar(255) | YES | | NULL | |
+-------------------------+------------------+------+-----+-------------------+----------------+
Table: qb_customer
+-------------------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| ListID | varchar(40) | YES | MUL | NULL | |
| Name | varchar(41) | YES | MUL | NULL | |
+-------------------------------------+------------------+------+-----+-------------------+----------------+
Given an Item_ListID I'd like a fast, efficient query to return a list of Item_ListID's along with a COUNT of the number of customers that ordered each item in the list, where all customers have in common the initially supplied Item_ListID.
Right now I have the following SQL that works, but is very slow:
SELECT qb_invoice_invoiceline.Item_FullName, count(*) as 'nummy'
FROM qb_invoice_invoiceline
WHERE qb_invoice_invoiceline.Invoice_TxnID =
ANY (SELECT qb_invoice.TxnID
FROM qb_invoice
INNER JOIN qb_customer ON qb_invoice.Customer_ListID = qb_customer.ListID
INNER JOIN qb_invoice_invoiceline ON qb_invoice.TxnID = qb_invoice_invoiceline.Invoice_TxnID
WHERE qb_invoice_invoiceline.Item_ListID = '1360000-57')
GROUP BY qb_invoice_invoiceline.Item_ListID
ORDER BY nummy DESC
I appreciate your help!
Here is the 'explain' output:
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
| 1 | PRIMARY | qb_invoice_invoiceline | index | NULL | Item_ListID | 123 | NULL | 19690 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | qb_invoice_invoiceline | ref | Invoice_TxnID,Item_ListID | Item_ListID | 123 | const | 8 | Using where |
| 2 | DEPENDENT SUBQUERY | qb_invoice | ref | Customer_ListID,TxnID | TxnID | 123 | func | 206 | Using where |
| 2 | DEPENDENT SUBQUERY | qb_customer | ref | ListID | ListID | 123 | devdb.qb_invoice.Customer_ListID | 18 | Using where; Using index |
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
Your query may be slow if there are no indexes available on the varchar fields that you are joining on. Can you give details on the indexes that are present on these tables?
I think that the query would benefit from indexes on qb_invoice.TxnID and qb_customer.ListID, and on qb_invoice_invoiceline.Item_ListID.