I have a query that takes several seconds to complete for certain WHERE conditions. To the best of my knowledge, everything is indexed correctly, but performance is still poor. All tables are InnoDB; blog_users and blog_user_profiles contain about 5 million rows each, and forum_contacts has about 10K rows.
The query
SELECT blog.*
FROM blog_users AS blog
INNER JOIN blog_user_profiles AS profile ON blog.user_id = profile.user_id
INNER JOIN forum_contacts AS contact ON profile.forum_id = contact.user_id
WHERE blog.comments > :comment_cutoff
AND blog.last_active > :time_cutoff
AND contact.type = :contact_type
AND contact.area = :location
LIMIT 0, 100
Explain output
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
| 1 | SIMPLE | blog | range | PRIMARY,last_active,comments | comments | 3 | NULL | 1313813 | Using where |
| 1 | SIMPLE | profile | eq_ref | PRIMARY,forum_id | PRIMARY | 3 | xc_db.blog.user_id | 1 | |
| 1 | SIMPLE | contact | eq_ref | PRIMARY,type,area,user_id | PRIMARY | 80 | xc_db.profile.forum_id,const | 1 | Using where |
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
Table structure (irrelevant rows snipped)
blog_users
+-------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+----------------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| username | varchar(16) | NO | UNI | NULL | |
| comments | mediumint(7) unsigned | NO | MUL | NULL | |
| last_active | int(10) unsigned | NO | MUL | NULL | |
+-------------+-----------------------+------+-----+---------+----------------+
blog_user_profiles
+----------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-----------------------+------+-----+---------+-------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | |
| forum_id | mediumint(8) unsigned | NO | MUL | 0 | |
+----------+-----------------------+------+-----+---------+-------+
forum_contacts
+---------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-----------------------+------+-----+---------+-------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | |
| type | varchar(25) | NO | PRI | NULL | |
| area | varchar(255) | NO | MUL | NULL | |
+---------+-----------------------+------+-----+---------+-------+
The strange thing is that the execution time seems to be inversely related to the total number of rows returned. For example, a query with contact.area = 'United States', which returns a few thousand rows, executes in less than 0.01 second. However, a query with fewer result rows, such as contact.area = 'Egypt', which returns 20 rows, takes in excess of 5 seconds to complete. Does my query need to be rewritten, or is there a problem with the indexes?
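One plausible explanation, going by the EXPLAIN output above: the plan starts with a range scan on blog (about 1.3 million candidate rows) and checks contact for each one. With LIMIT 100 it can stop early when matches are common ('United States'), but it has to walk nearly the whole range when they are rare ('Egypt'). A minimal sketch of one thing to try, assuming the type and area filters leave only a small set of forum_contacts rows, is to drive the join from contact instead. STRAIGHT_JOIN makes MySQL join the tables in the order listed; whether this also suits the common-area case is something to measure rather than assume.

```sql
-- Sketch only: same tables, columns and placeholders as the original query,
-- but with the join order forced to start from the small, filtered table.
SELECT STRAIGHT_JOIN blog.*
FROM forum_contacts AS contact
INNER JOIN blog_user_profiles AS profile ON profile.forum_id = contact.user_id
INNER JOIN blog_users AS blog ON blog.user_id = profile.user_id
WHERE contact.type = :contact_type
  AND contact.area = :location
  AND blog.comments > :comment_cutoff
  AND blog.last_active > :time_cutoff
LIMIT 0, 100;
```

Comparing EXPLAIN for both forms with a rare and a common area value should show which join order reads fewer rows in each case.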
table a
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| uid | int(11) | YES | MUL | NULL | |
| channel | varchar(20) | YES | | NULL | |
| createAt | datetime | YES | | NULL | |
+----------+-------------+------+-----+---------+-------+
table a index: `a_index_uid_createAt` (`uid`,`createAt`)
table b:
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| uid | int(11) | NO | PRI | NULL | |
| date | date | YES | MUL | NULL | |
| channel | varchar(20) | YES | MUL | NULL | |
| gender | smallint(6) | YES | MUL | NULL | |
| chargeAmt | int(11) | YES | | 0 | |
| revised | smallint(6) | YES | | 0 | |
| createAt | datetime | YES | | NULL | |
+-----------+-------------+------+-----+---------+-------+
query st:
select DATE(a.createAt) date,a.channel,b.chargeAmt
FROM a, b
where a.uid = b.uid
and a.createAt >= '2021-05-10 00:00:00'
and a.createAt <= '2021-05-10 23:59:59';
explain:
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
| 1 | SIMPLE | a | ALL | a_index_uid_createAt | NULL | NULL | NULL | 172725 | Using where |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | xiehou.r.uid | 1 | |
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
Why is the a_index_uid_createAt index not being used?
Please use the JOIN .. ON syntax:
select DATE(a.createAt) date, a.channel, b.chargeAmt
FROM a
JOIN b ON a.uid = b.uid -- How the tables are related
WHERE a.createAt >= '2021-05-10' -- filtering
and a.createAt < '2021-05-10' + INTERVAL 1 DAY; -- filtering
The Optimizer, when it sees a JOIN, starts by deciding which table to start with. The preferred table is the one with filtering, namely a.
To do the filtering, it needs an index that starts with the columns mentioned in the WHERE clause.
The other table, b, is then reached via the ON clause; b appears to have PRIMARY KEY(uid), which serves that lookup.
So, the only useful index is
a: INDEX(createAt)
Any INDEX(uid, ...) is likely to be unused, since it starts with an existing index, namely PRIMARY KEY(uid).
(In the future, please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
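To make the suggestion concrete: the existing (uid, createAt) index cannot serve a range on createAt alone, because createAt is not its leftmost column. A minimal sketch of adding the single-column index and re-checking the plan follows; the index name a_index_createAt is made up for illustration.

```sql
-- Hypothetical index name; the column list is what matters.
ALTER TABLE a ADD INDEX a_index_createAt (createAt);

-- Re-run EXPLAIN to see whether table a now uses a range scan on the new index.
EXPLAIN
SELECT DATE(a.createAt) AS date, a.channel, b.chargeAmt
FROM a
JOIN b ON a.uid = b.uid
WHERE a.createAt >= '2021-05-10'
  AND a.createAt <  '2021-05-10' + INTERVAL 1 DAY;
```

If the date range covers a large share of the table, the optimizer may still prefer a full scan, so the result depends on the data.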
I would like to select random entries from a single table based on information coming from two other tables (one saved in a different database). The tables are as follows:
1 - In databaseA, a table called "islands" contains:
+-------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| chrom | int(11) | NO | | NULL | |
| start | int(11) | NO | | NULL | |
| end | int(11) | NO | | NULL |
the indexes are:
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| islands | 0 | PRIMARY | 1 | id | A | 15991 | NULL | NULL | | BTREE | | |
| islands | 1 | locations | 1 | line_string | A | NULL | 32 | NULL | | SPATIAL | | |
To select from this database I normally use:
SELECT * FROM islands FORCE INDEX (locations)
WHERE MBRIntersects(GeomFromText('Linestring(1 120, 1 120)'), line_string)
2 - In databaseB, a table called "Context" contains:
+---------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| chrom | int(11) | NO | | NULL | |
| site | int(11) | NO | | NULL | |
| context | char(3) | NO | | NULL | |
There are only 4 possible contexts in this table (which are indexed)
3 - In databaseB, a table called "Entries" contains:
+-------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| chrom | int(11) | NO | MUL | NULL | |
| site | int(11) | NO | MUL | NULL | |
| methylation | float | NO | | NULL | |
I would like to select a random entry from the entries table that also appears in the context table with a context value of "CpG", and that either is or is not located in the "islands" table (I have two different searches).
Update:
Based on comments below I am using
SELECT * FROM
(SELECT t.chrom, t.site, t.methylation, c.context
FROM f1 as t INNER JOIN context as c on c.chrom = t.chrom AND c.site = t.site
WHERE c.context = 'CpG'
) AS s LIMIT 2;
To get:
+-------+-----------+-------------+---------+
| chrom | site | methylation | context |
+-------+-----------+-------------+---------+
| 1 | 10003735 | 69 | CpG |
| 1 | 100063074 | 98.79 | CpG |
+-------+-----------+-------------+---------+
I would like to now join these results to the islands table to get the sites from the first part that are found within island regions (or not). I am using:
SELECT * FROM
(SELECT t.chrom, t.site, t.methylation, c.context
FROM f1 as t INNER JOIN context as c on c.chrom = t.chrom AND c.site = t.site
WHERE c.context = 'CpG'
) AS s
CROSS JOIN islands as i
WHERE MBRINTERSECTS(GeomFromText('Linestring(s.chrom s.site, s.chrom s.site)'), i.Line_string)
LIMIT 2;
However, this gives me an empty set (it shouldn't).
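A likely cause of the empty set: inside the quoted WKT string, s.chrom and s.site are literal text, not column references, so the string is not valid WKT and the intersection test never succeeds. A minimal sketch of building the WKT per row with CONCAT follows. It keeps the function names used above; on newer MySQL versions the ST_-prefixed equivalents would be needed instead.

```sql
-- Sketch: build the LINESTRING text from each row's chrom and site values.
SELECT s.*, i.*
FROM
  (SELECT t.chrom, t.site, t.methylation, c.context
   FROM f1 AS t
   INNER JOIN context AS c ON c.chrom = t.chrom AND c.site = t.site
   WHERE c.context = 'CpG'
  ) AS s
CROSS JOIN islands AS i
WHERE MBRIntersects(
        GeomFromText(CONCAT('LINESTRING(', s.chrom, ' ', s.site, ', ',
                                           s.chrom, ' ', s.site, ')')),
        i.line_string)
LIMIT 2;
```

Because the geometry now differs for every row of s, this form may be slow on large inputs.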
My database structure contains:
actions
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| placement_id | int(11) | YES | MUL | NULL | |
| lead_id | int(11) | NO | MUL | NULL | |
+--------------+--------------+------+-----+---------+----------------+
placements
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| publisher_id | int(11) | YES | MUL | NULL | |
| name | varchar(255) | NO | | NULL | |
| status | tinyint(1) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
leads
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| status | varchar(255) | YES | | NULL | |
+---------+--------------+------+-----+---------+----------------+
I would like to retrieve the number of statuses for each placement (grouped):
| placement_id | placement_name | count
+--------------+----------------+-------
| 123 | PlacementOne | 12
| 567 | PlacementTwo | 15
I tried countless times and every query I wrote seems dumb, so I won't even post them here. I'm hopeless. A well-commented query (so I can learn) would be much appreciated.
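select a.placement_id, p.name as placement_name, count(l.status) -- count() skips leads whose status is NULL
from actions a -- each action links one lead to one placement
inner join placements p on p.id = a.placement_id -- supplies the placement name
inner join leads l on l.id = a.lead_id -- supplies the lead status
group by a.placement_id, p.name -- one output row per placement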
I have two tables: gpnxuser and key_value
mysql> describe gpnxuser;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| email | varchar(255) | YES | | NULL | |
| uuid | varchar(255) | NO | MUL | NULL | |
| partner_id | bigint(20) | NO | MUL | NULL | |
| password | varchar(255) | YES | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
and
mysql> describe key_value;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
| upkey | varchar(255) | NO | MUL | NULL | |
| user_id | bigint(20) | YES | MUL | NULL | |
| security_level | int(11) | NO | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
key_value.user_id is FK that references gpnxuser.id. I also have an index in gpnxuser.partner_id which is a FK that references a table called "partner" (which, I think, does not matter much to this question).
For partner_id = 64, I have 500K rows in gpnxuser, which are related to approximately 6M rows in key_value.
I wanted a query that returns all distinct 'key_value.upkey' values for users belonging to a given partner. I did something like this:
select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
which takes forever to run. The explain for the query looks like:
mysql> explain select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | gpnxuser | ref | PRIMARY,FKB2D9FEBE725C505E | FKB2D9FEBE725C505E | 8 | const | 259640 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | key_value | ref | FK9E0C0F912D11F5A9 | FK9E0C0F912D11F5A9 | 9 | gpnx_finance_db.gpnxuser.id | 14 | Using where |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
My question is: is there a query that can run fast and obtain the result that I want?
What you need is an EXISTS subquery: for each upkey it scans only until the first match is found, and no further.
select upkey from (select distinct upkey from key_value) upk
where EXISTS
(select 1 from gpnxuser u, key_value kv
where u.id=kv.user_id and partner_id=64 and kv.upkey = upk.upkey)
NB. In the original query, GROUP BY is misused; DISTINCT expresses the intent better:
select DISTINCT upkey from gpnxuser join key_value on
gpnxuser.id=key_value.user_id where partner_id=64
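A complementary idea, offered as a sketch and not as part of the answer above: the EXPLAIN shows key_value being read through its user_id index, which means each matching entry still needs a row lookup to fetch upkey. A composite index on (user_id, upkey) would let that step be answered from the index alone. The index name idx_user_upkey is made up for illustration.

```sql
-- Hypothetical index name. With a multi-byte character set and an older
-- InnoDB row format, upkey may need a prefix length to fit the key size
-- limit, in which case the index no longer covers the full upkey value.
ALTER TABLE key_value ADD INDEX idx_user_upkey (user_id, upkey);

-- Check whether key_value now shows "Using index" in the Extra column.
EXPLAIN
SELECT DISTINCT kv.upkey
FROM gpnxuser AS u
JOIN key_value AS kv ON kv.user_id = u.id
WHERE u.partner_id = 64;
```

Building an index on a table of this size takes time and disk space, so it is worth trying on a copy first.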
I would look into partitioning your key_value table on user_id, if you typically run queries based on this column.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
I'm trying to add the typical "customers who bought 'x' also bought 'y'" functionality to my website. Here is the table structure:
Table: qb_invoice
+--------------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| TxnID | varchar(40) | YES | MUL | NULL | |
| Customer_ListID | varchar(40) | YES | MUL | NULL | |
| Customer_FullName | varchar(255) | YES | | NULL | |
+--------------------------------+------------------+------+-----+-------------------+----------------+
Table: qb_invoice_invoiceline
+-------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| Invoice_TxnID | varchar(40) | YES | MUL | NULL | |
| Item_ListID | varchar(40) | YES | MUL | NULL | |
| Item_FullName | varchar(255) | YES | | NULL | |
+-------------------------+------------------+------+-----+-------------------+----------------+
Table: qb_customer
+-------------------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| ListID | varchar(40) | YES | MUL | NULL | |
| Name | varchar(41) | YES | MUL | NULL | |
+-------------------------------------+------------------+------+-----+-------------------+----------------+
Given an Item_ListID I'd like a fast, efficient query to return a list of Item_ListID's along with a COUNT of the number of customers that ordered each item in the list, where all customers have in common the initially supplied Item_ListID.
Right now I have the following SQL that works, but is very slow:
SELECT qb_invoice_invoiceline.Item_FullName, count(*) as 'nummy'
FROM qb_invoice_invoiceline
WHERE qb_invoice_invoiceline.Invoice_TxnID =
ANY (SELECT qb_invoice.TxnID
FROM qb_invoice
INNER JOIN qb_customer ON qb_invoice.Customer_ListID = qb_customer.ListID
INNER JOIN qb_invoice_invoiceline ON qb_invoice.TxnID = qb_invoice_invoiceline.Invoice_TxnID
WHERE qb_invoice_invoiceline.Item_ListID = '1360000-57')
GROUP BY qb_invoice_invoiceline.Item_ListID
ORDER BY nummy DESC
I appreciate your help!
Here is the 'explain' output:
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
| 1 | PRIMARY | qb_invoice_invoiceline | index | NULL | Item_ListID | 123 | NULL | 19690 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | qb_invoice_invoiceline | ref | Invoice_TxnID,Item_ListID | Item_ListID | 123 | const | 8 | Using where |
| 2 | DEPENDENT SUBQUERY | qb_invoice | ref | Customer_ListID,TxnID | TxnID | 123 | func | 206 | Using where |
| 2 | DEPENDENT SUBQUERY | qb_customer | ref | ListID | ListID | 123 | devdb.qb_invoice.Customer_ListID | 18 | Using where; Using index |
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
Your query may be slow if there are no indexes available on the varchar fields that you are joining on. Can you give details on the indexes that are present on these tables?
I think that the query would benefit from indexes on qb_invoice.TxnID and qb_customer.ListID, and on qb_invoice_invoiceline.Item_ListID.
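The EXPLAIN output marks the inner query as a DEPENDENT SUBQUERY, which means it is re-evaluated for rows of the outer scan instead of being computed once. A minimal sketch of the same lookup written as a join against a derived table follows. It assumes every invoice line has a matching qb_invoice and qb_customer row, so those two joins (which only acted as existence checks) are dropped. Like the original, it counts invoice lines, not distinct customers.

```sql
SELECT MAX(line.Item_FullName) AS Item_FullName,
       COUNT(*) AS nummy
FROM (
    -- Invoices that contain the supplied item, one row per invoice.
    SELECT DISTINCT Invoice_TxnID
    FROM qb_invoice_invoiceline
    WHERE Item_ListID = '1360000-57'
) AS seed
INNER JOIN qb_invoice_invoiceline AS line
        ON line.Invoice_TxnID = seed.Invoice_TxnID
GROUP BY line.Item_ListID
ORDER BY nummy DESC;
```

MAX() around Item_FullName is only there so the query also runs when ONLY_FULL_GROUP_BY is enabled; it picks one name per Item_ListID.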