MySQL query with JOIN and GROUP BY optimization. Is it possible?

I have two tables: gpnxuser and key_value
mysql> describe gpnxuser;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| email | varchar(255) | YES | | NULL | |
| uuid | varchar(255) | NO | MUL | NULL | |
| partner_id | bigint(20) | NO | MUL | NULL | |
| password | varchar(255) | YES | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
and
mysql> describe key_value;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
| upkey | varchar(255) | NO | MUL | NULL | |
| user_id | bigint(20) | YES | MUL | NULL | |
| security_level | int(11) | NO | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
key_value.user_id is a FK that references gpnxuser.id. I also have an index on gpnxuser.partner_id, which is a FK that references a table called "partner" (which, I think, does not matter much to this question).
For partner_id = 64, I have 500K rows in gpnxuser that are related to approximately 6M rows in key_value.
I wanted a query that returns all distinct key_value.upkey values for users belonging to a given partner. I did something like this:
select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
which takes forever to run. The explain for the query looks like:
mysql> explain select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | gpnxuser | ref | PRIMARY,FKB2D9FEBE725C505E | FKB2D9FEBE725C505E | 8 | const | 259640 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | key_value | ref | FK9E0C0F912D11F5A9 | FK9E0C0F912D11F5A9 | 9 | gpnx_finance_db.gpnxuser.id | 14 | Using where |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
My question is: is there a query that can run fast and obtain the result that I want?

What you need to do is use an EXISTS subquery: it causes only a partial scan, stopping as soon as a match is found.
select upkey from (select distinct upkey from key_value) upk
where EXISTS
(select 1 from gpnxuser u, key_value kv
where u.id=kv.user_id and partner_id=1 and kv.upkey = upk.upkey)
NB: in the original query, GROUP BY is misused; DISTINCT reads better there:
select DISTINCT upkey from gpnxuser join key_value on
gpnxuser.id=key_value.user_id where partner_id=64
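Either variant also benefits from a composite index that covers the join side. This is only a sketch (the index name is made up); the idea is to put user_id first for the join and upkey second so the value can be read straight from the index:
ALTER TABLE key_value ADD INDEX idx_user_upkey (user_id, upkey);  -- hypothetical index name
With that index, the key_value lookups in the queries above should be satisfied from the index alone (the EXPLAIN should show "Using index" on the key_value table).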

I would look into partitioning your key_value table on user_id, if you typically run queries based on this column.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
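For reference, a hedged sketch of what partitioning by user_id could look like. It assumes you are free to change the keys: the partitioning column must be part of every unique key, and partitioned InnoDB tables cannot have foreign keys, so the user_id FK would have to be dropped first (the constraint name below is made up):
ALTER TABLE key_value DROP FOREIGN KEY fk_key_value_user;               -- hypothetical constraint name
ALTER TABLE key_value DROP PRIMARY KEY, ADD PRIMARY KEY (id, user_id);  -- fails if user_id contains NULLs
ALTER TABLE key_value PARTITION BY HASH(user_id) PARTITIONS 16;         -- 16 partitions is an arbitrary example
Partitioning mainly pays off when most queries prune to a single partition; for a per-partner scan like the one above, a good composite index is usually the simpler first step.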

Related

mysql: why does the query below not use the composite index?

table a
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| uid | int(11) | YES | MUL | NULL | |
| channel | varchar(20) | YES | | NULL | |
| createAt | datetime | YES | | NULL | |
+----------+-------------+------+-----+---------+-------+
table a index: `a_index_uid_createAt` (`uid`,`createAt`)
table b:
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| uid | int(11) | NO | PRI | NULL | |
| date | date | YES | MUL | NULL | |
| channel | varchar(20) | YES | MUL | NULL | |
| gender | smallint(6) | YES | MUL | NULL | |
| chargeAmt | int(11) | YES | | 0 | |
| revised | smallint(6) | YES | | 0 | |
| createAt | datetime | YES | | NULL | |
+-----------+-------------+------+-----+---------+-------+
query statement:
select DATE(a.createAt) date,a.channel,b.chargeAmt
FROM a, b
where a.uid = b.uid
and a.createAt >= '2021-05-10 00:00:00'
and a.createAt <= '2021-05-10 23:59:59';
explain:
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
| 1 | SIMPLE | a | ALL | a_index_uid_createAt | NULL | NULL | NULL | 172725 | Using where |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | xiehou.r.uid | 1 | |
+----+-------------+-------+--------+-------------------------------------------------+---------+---------+--------------+--------+-------------+
Why is the a_index_uid_createAt index not used?
Please use the JOIN .. ON syntax:
select DATE(a.createAt) date, a.channel, b.chargeAmt
FROM a
JOIN b ON a.uid = b.uid -- How the tables are related
WHERE a.createAt >= '2021-05-10'                  -- filtering
  and a.createAt < '2021-05-10' + INTERVAL 1 DAY; -- filtering
The Optimizer, when it sees a JOIN, starts by deciding which table to start with. The preferred table is the one with filtering, namely a.
To do the filtering, it needs an index that starts with the columns mentioned in the WHERE clause.
The other table will be reached by looking at the ON clause; b appears to have PRIMARY KEY(uid).
So, the only useful index is
a: INDEX(createAt)
Any INDEX(uid, ...) is likely to go unused here: on a, the filtering is on createAt, so an index that starts with uid cannot help; and on b, the uid lookup is already handled by PRIMARY KEY(uid).
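A minimal sketch of the suggested index (the index name is just an example):
ALTER TABLE a ADD INDEX idx_createAt (createAt);
With it in place, the range condition on a.createAt can use a range scan instead of the full scan (type ALL) shown in the EXPLAIN above.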
(In the future, please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)

slow query even with the index on large table

I am performing a simple SELECT query to extract username from the table logs (containing 54,864 rows).
It took about 7.836s to retrieve the data.
How can I speed up the performance?
SELECT username FROM `logs`
WHERE
logs.branch=1
and
logs.added_on > '2016-11-27 00:00:00'
On describing table,
+-------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| username | char(255) | YES | MUL | NULL | |
| fullname | char(255) | YES | | NULL | |
| package | char(255) | YES | | NULL | |
| prev_expiry | date | YES | | NULL | |
| recharged_upto | date | YES | | NULL | |
| payment_option | int(11) | YES | MUL | NULL | |
| amount | float(14,2) | YES | | NULL | |
| branch | int(11) | YES | MUL | NULL | |
| added_by | int(11) | YES | | NULL | |
| added_on | datetime | YES | MUL | NULL | |
| remark | text | YES | | NULL | |
| payment_mode | char(255) | YES | | NULL | |
| recharge_duration | char(255) | YES | | NULL | |
| invoice_number | char(255) | YES | | NULL | |
| cheque_no | char(255) | YES | | NULL | |
| bank_name | char(255) | YES | | NULL | |
| verify_by_ac | int(11) | YES | | 0 | |
| adjusted_days | int(11) | YES | | NULL | |
| adjustment_note | text | YES | | NULL | |
+-------------------+-------------+------+-----+---------+----------------+
20 rows in set
On explaining query,
+----+-------------+--------------------------+------+-----------------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+------+-----------------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | logs | ref | branch_index,added_on_index | branch_index | 5 | const | 37 | Using where |
+----+-------------+--------------------------+------+-----------------------------+--------------+---------+-------+------+-------------+
1 row in set
Updated: EXPLAIN output after adding the composite index (branch_added_index):
+----+-------------+--------------------------+------+------------------------------------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+------+------------------------------------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | logs | ref | branch_index,added_on_index,branch_added_index | branch_index | 5 | const | 37 | Using where |
+----+-------------+--------------------------+------+------------------------------------------------+--------------+---------+-------+------+-------------+
1 row in set
Add a composite key on (branch, added_on) so that a single index covers all the WHERE conditions, since you combine them with AND.
ALTER TABLE logs ADD KEY(branch,added_on)
This should be much faster. You can also drop the branch_index key, since the index above can replace it. You only return 37 rows out of 54,000, so the cardinality is OK.
ALTER TABLE logs DROP INDEX `branch_index`;
Or you can use index hints
SELECT username FROM `logs` USE INDEX (branch_added_index) WHERE
logs.branch=1
and
logs.added_on > '2016-11-27 00:00:00'
If your table's existing index is already being used by other queries, try adding a new composite index like the one below:
create index <indexname> on logs(branch,added_on)
Create a composite index on the two fields (branch, added_on), like:
ALTER TABLE `logs` ADD KEY idx_branch_added (branch, added_on);
This will be even faster, because it is "covering":
INDEX(branch, added_on, username) -- in exactly that order.
(And drop any indexes that are prefixes of this.)
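As a sketch, with hypothetical index names, and assuming the combined key length stays under the index size limit (shrinking username, as suggested below, helps with that):
ALTER TABLE logs
  ADD INDEX idx_branch_added_username (branch, added_on, username),  -- covering index for this query
  DROP INDEX branch_index,                                           -- prefix of the new index
  DROP INDEX branch_added_index;                                     -- also a prefix, if you created it above
A covering index lets the EXPLAIN show "Using index", meaning the query is answered from the index without touching the table rows.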
Index Cookbook
"Cardinality" is rarely of importance. And EXPLAIN often gets the value wrong.
The EXPLAIN shows 5 for the size of branch -- does it really need to be NULLable? Will you have 2 billion branches? Consider using something smaller, such as the 1-byte TINYINT UNSIGNED NOT NULL (values 0..255).
Also, shrink the 255 to something reasonable.
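For example (only a sketch; pick sizes that fit your actual data):
ALTER TABLE logs
  MODIFY branch TINYINT UNSIGNED NOT NULL,  -- assumes at most 255 branches and no existing NULLs
  MODIFY username VARCHAR(64);              -- assumes usernames never exceed 64 characters
Smaller columns mean smaller rows and smaller indexes, so more of the index fits in memory.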
When describing a table, please use SHOW CREATE TABLE; it is more descriptive. It might help to know the Engine, Charset, etc.

Complicated join and MAX() query in MySQL

I need to make a join across 4 tables whilst picking the maximum (i.e. most recent) test timestamp to associate with a person. For each student in a class, I want to look up their most recent test and get its ID and timestamp:
SELECT students.ref,
students.fname,
students.sname,
classes.name AS 'group',
tests.id,
max(tests.timestamp)
FROM tests, students, classlinks, classes
WHERE tests.ref=students.ref AND
classlinks.ref=students.ref AND
classlinks.classid=29 AND
tests.grade=2 AND
tests.subject=2
GROUP BY students.ref
ORDER BY students.sname ASC, students.fname ASC
This looks like it is perfect: for each student in the class, it gives the timestamp of their most recent test. Unfortunately, the test ID associated with that timestamp is wrong: it is just the ID of an arbitrary test for that student.
If I change the 'group by' to be
GROUP BY students.ref, tests.id
then the query matches the correct test IDs to the correct timestamps, but now there are several rows for each student. Does anyone have any advice on how I can get one row per student, with the correct test ID matched to the correct most recent timestamp? Any help appreciated. Thanks.
Table descriptions:
mysql> describe students;
+--------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ref | varchar(50) | NO | UNI | NULL | |
| fname | varchar(22) | NO | | NULL | |
| sname | varchar(22) | NO | | NULL | |
| school | int(11) | NO | | NULL | |
| year | int(11) | NO | | NULL | |
+--------+-------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
mysql> describe classes;
+---------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| subject | int(11) | YES | MUL | NULL | |
| type | int(11) | YES | | 1 | |
| school | int(11) | YES | | NULL | |
| year | int(11) | YES | | NULL | |
| name | varchar(50) | YES | | NULL | |
+---------+-------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
mysql> describe classlinks;
+---------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ref | varchar(50) | YES | MUL | NULL | |
| subject | int(11) | YES | | NULL | |
| school | int(11) | YES | | NULL | |
| classid | int(11) | YES | MUL | NULL | |
| type | int(11) | YES | | 1 | |
+---------+-------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
mysql> describe tests;
+------------+-------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| subject | int(11) | YES | | NULL | |
| ref | varchar(22) | NO | MUL | NULL | |
| test | int(3) | NO | | NULL | |
| grade | varchar(22) | NO | | NULL | |
| timestamp | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| lastupdate | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------+-------------+------+-----+-------------------+-----------------------------+
I am assuming that the combination of (ref, timestamp) is unique in the tests table. Here is my solution, but I don't have any of your sample data to verify it. If it is incorrect, then post some sample data so that I can test it.
UPDATE
Here is the updated query, which works; check the sqlfiddle:
SELECT students.ref,
students.fname,
students.sname,
classes.name AS 'group',
tests.id,
T.timestamp
FROM (select ref, max(timestamp) as timestamp from tests group by ref) as T
natural join tests, students, classlinks, classes
WHERE
T.ref=students.ref AND
classlinks.ref=students.ref AND
classlinks.classid=classes.id AND
classlinks.classid=29 AND
tests.grade=2 AND
tests.subject=2
ORDER BY students.sname ASC, students.fname ASC
Using the same logic, in SQL the query can be written as follows; I'm not sure about MySQL specifics, but I hope the logic carries over.
Select A.ref
,A.fname
,A.sname
,A.id
,A.`group`
,A.timestamp
From
(select
S.ref
,S.fname
,S.sname
,T.id
,classes.name AS 'group'
,T.timestamp
from
tests T,students S, classlinks, classes
Where
T.ref=S.ref and
T.grade=2 AND
classlinks.ref=S.ref AND
classlinks.classid=29 AND
classlinks.classid=classes.id AND
T.subject=2 ) A
inner join
(SELECT tests.ref
,max(tests.timestamp) as timestamp
FROM
tests
group by
tests.ref
) B
on
A.ref=b.ref and
A.timestamp = b.timestamp

MySQL Join Inconsistent Performance Depending on WHERE

I have a query that is taking several seconds to complete for certain WHERE conditions. To the best of my knowledge, everything is indexed correctly, but performance is still poor. All tables are InnoDB; blog_users and blog_user_profiles contain about 5 million rows, and forum_contacts has about 10K rows.
The query
SELECT blog.*
FROM blog_users AS blog
INNER JOIN blog_user_profiles AS profile ON blog.user_id = profile.user_id
INNER JOIN forum_contacts AS contact ON profile.forum_id = contact.user_id
WHERE blog.comments > :comment_cutoff
AND blog.last_active > :time_cutoff
AND contact.type = :contact_type
AND contact.area = :location
LIMIT 0, 100
Explain output
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
| 1 | SIMPLE | blog | range | PRIMARY,last_active,comments | comments | 3 | NULL | 1313813 | Using where |
| 1 | SIMPLE | profile | eq_ref | PRIMARY,forum_id | PRIMARY | 3 | xc_db.blog.user_id | 1 | |
| 1 | SIMPLE | contact | eq_ref | PRIMARY,type,area,user_id | PRIMARY | 80 | xc_db.profile.forum_id,const | 1 | Using where |
+----+-------------+---------+--------+------------------------------+-----------+---------+------------------------------+---------+-------------+
Table structure (irrelevant rows snipped)
blog_users
+-------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+----------------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| username | varchar(16) | NO | UNI | NULL | |
| comments | mediumint(7) unsigned | NO | MUL | NULL | |
| last_active | int(10) unsigned | NO | MUL | NULL | |
+-------------+-----------------------+------+-----+---------+----------------+
blog_user_profiles
+----------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-----------------------+------+-----+---------+-------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | |
| forum_id | mediumint(8) unsigned | NO | MUL | 0 | |
+----------+-----------------------+------+-----+---------+-------+
forum_contacts
+---------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-----------------------+------+-----+---------+-------+
| user_id | mediumint(8) unsigned | NO | PRI | NULL | |
| type | varchar(25) | NO | PRI | NULL | |
| area | varchar(255) | NO | MUL | NULL | |
+---------+-----------------------+------+-----+---------+-------+
The strange thing is that the execution time seems to be inversely related to the total number of rows returned. For example, a query with contact.area = 'United States', which returns a few thousand rows, executes in less than 0.01 seconds. However, queries with fewer result rows, such as contact.area = 'Egypt' which returns 20 rows, take in excess of 5 seconds to complete. Does my query need to be rewritten, or is there a problem with the indexes?

Cannot figure out efficient SQL for 3-table INNER JOIN (MySQL) - "customers also bought" functionality

I'm trying to add the typical "customers who bought 'x' also bought 'y'" functionality to my website. Here is the table structure:
Table: qb_invoice
+--------------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| TxnID | varchar(40) | YES | MUL | NULL | |
| Customer_ListID | varchar(40) | YES | MUL | NULL | |
| Customer_FullName | varchar(255) | YES | | NULL | |
+--------------------------------+------------------+------+-----+-------------------+----------------+
Table: qb_invoice_invoiceline
+-------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| Invoice_TxnID | varchar(40) | YES | MUL | NULL | |
| Item_ListID | varchar(40) | YES | MUL | NULL | |
| Item_FullName | varchar(255) | YES | | NULL | |
+-------------------------+------------------+------+-----+-------------------+----------------+
Table: qb_customer
+-------------------------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------------------+------------------+------+-----+-------------------+----------------+
| qbsql_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| ListID | varchar(40) | YES | MUL | NULL | |
| Name | varchar(41) | YES | MUL | NULL | |
+-------------------------------------+------------------+------+-----+-------------------+----------------+
Given an Item_ListID, I'd like a fast, efficient query that returns a list of Item_ListIDs along with a COUNT of the number of customers that ordered each item in the list, where all of those customers also ordered the initially supplied Item_ListID.
Right now I have the following SQL that works, but is very slow:
SELECT qb_invoice_invoiceline.Item_FullName, count(*) as 'nummy'
FROM qb_invoice_invoiceline
WHERE qb_invoice_invoiceline.Invoice_TxnID =
ANY (SELECT qb_invoice.TxnID
FROM qb_invoice
INNER JOIN qb_customer ON qb_invoice.Customer_ListID = qb_customer.ListID
INNER JOIN qb_invoice_invoiceline ON qb_invoice.TxnID = qb_invoice_invoiceline.Invoice_TxnID
WHERE qb_invoice_invoiceline.Item_ListID = '1360000-57')
GROUP BY qb_invoice_invoiceline.Item_ListID
ORDER BY nummy DESC
I appreciate your help!
Here is the 'explain' output:
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
| 1 | PRIMARY | qb_invoice_invoiceline | index | NULL | Item_ListID | 123 | NULL | 19690 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | qb_invoice_invoiceline | ref | Invoice_TxnID,Item_ListID | Item_ListID | 123 | const | 8 | Using where |
| 2 | DEPENDENT SUBQUERY | qb_invoice | ref | Customer_ListID,TxnID | TxnID | 123 | func | 206 | Using where |
| 2 | DEPENDENT SUBQUERY | qb_customer | ref | ListID | ListID | 123 | devdb.qb_invoice.Customer_ListID | 18 | Using where; Using index |
+----+--------------------+------------------------+-------+---------------------------+-------------+---------+-----------------------------------------+-------+----------------------------------------------+
Your query may be slow if there are no indexes available on the varchar fields that you are joining on. Can you give details on the indexes that are present on these tables?
I think that the query would benefit from indexes on qb_invoice.TxnID and qb_customer.ListID, and on qb_invoice_invoiceline.Item_ListID.
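If any of them turn out to be missing, the DDL would look roughly like this (the index names are hypothetical):
ALTER TABLE qb_invoice ADD INDEX idx_txnid (TxnID);
ALTER TABLE qb_customer ADD INDEX idx_listid (ListID);
ALTER TABLE qb_invoice_invoiceline ADD INDEX idx_item_listid (Item_ListID);
That said, the possible_keys column in the EXPLAIN above already lists keys on TxnID, ListID, Invoice_TxnID and Item_ListID, so the join columns appear to be indexed already.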