MySQL query is slow with join - how to speed it up

I have to export 554k records from our MySQL database. At the current rate the export will take 5 days, and the slowness is mainly caused by the query below. The data structure is:
Companies
--Contacts
----(Contact)Activities
On the contacts table we have an index on company_id. On the activities table we have indexes on contact_id and company_id, which map back to the contacts and companies tables respectively.
I need to get each contact along with the date of their latest activity. This is the query I'm running, and it takes about 0.5 seconds to execute:
Select *
from contacts
left outer join (select occurred_at
,contact_id
from activities
where occurred_at is not null
group by contact_id
order by occurred_at desc) activities
on contacts.id = activities.contact_id
where company_id = 20
If I remove the join and just run select * from contacts where company_id = 20, the query executes in 0.016 seconds.
I have also run EXPLAIN on the join query.
Any ideas on how I can speed this up?
Edit:
Here are the table definitions.
CREATE TABLE `companies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`street_address` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`state` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`county` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`website` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`falloff_date` date DEFAULT NULL,
`zipcode` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`order_count` int(11) NOT NULL DEFAULT '0',
`active_job_count` int(11) NOT NULL DEFAULT '0',
`duplicate_of` int(11) DEFAULT NULL,
`warm_date` datetime DEFAULT NULL,
`employee_size` int(11) DEFAULT NULL,
`dup_checked` tinyint(1) DEFAULT '0',
`rating` int(11) DEFAULT NULL,
`delinquent` tinyint(1) DEFAULT '0',
`cconly` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `index_companies_on_name` (`name`),
KEY `index_companies_on_user_id` (`user_id`),
KEY `index_companies_on_company_id` (`company_id`),
KEY `index_companies_on_external_id` (`external_id`),
KEY `index_companies_on_state_and_dup_checked` (`id`,`state`,`dup_checked`,`duplicate_of`),
KEY `index_companies_on_dup_checked` (`id`,`dup_checked`),
KEY `index_companies_on_dup_checked_name` (`dup_checked`,`name`),
KEY `index_companies_on_county` (`county`,`state`)
) ENGINE=InnoDB AUTO_INCREMENT=15190300 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `contacts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`last_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`extension` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`fax` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`active` tinyint(1) DEFAULT NULL,
`main` tinyint(1) DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`second_phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_contacts_on_company_id` (`company_id`),
KEY `index_contacts_on_first_name` (`first_name`),
KEY `index_contacts_on_last_name` (`last_name`),
KEY `index_contacts_on_phone` (`phone`),
KEY `index_contacts_on_email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=11241088 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `activities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`kind` int(11) DEFAULT NULL,
`contact_id` int(11) DEFAULT NULL,
`call_status` int(11) DEFAULT NULL,
`occurred_at` datetime DEFAULT NULL,
`notes` text COLLATE utf8_unicode_ci,
`user_id` int(11) DEFAULT NULL,
`scheduled_for` datetime DEFAULT NULL,
`priority` tinyint(1) DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`from_user_id` int(11) DEFAULT NULL,
`to_user_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_activities_on_contact_id` (`contact_id`),
KEY `index_activities_on_user_id` (`user_id`),
KEY `index_activities_on_company_id` (`company_id`)
) ENGINE=InnoDB AUTO_INCREMENT=515340 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

This is a greatest-n-per-group query, which comes up frequently on Stack Overflow.
Here's a solution that uses a MySQL 8.0 window function:
WITH latest_activities AS (
SELECT contact_id, occurred_at,
ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY occurred_at DESC) AS rn
FROM activities
)
SELECT *
FROM contacts AS c
LEFT OUTER JOIN latest_activities
ON c.id = latest_activities.contact_id AND latest_activities.rn = 1
WHERE c.company_id = 20
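To sanity-check the window-function approach, here is a minimal sketch using Python's built-in sqlite3 module and a few hypothetical rows. It runs on SQLite (3.25 or later is needed for ROW_NUMBER()), not MySQL, so it shows which rows come back rather than how fast they come back. The WHERE occurred_at IS NOT NULL inside the CTE is an addition that mirrors the question's filter.

```python
import sqlite3

# Hypothetical toy data: Ann has three activities (one with a NULL date),
# Bob has one, Cy has none, and Dee belongs to another company.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 20), (4, 'Dee', 99);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01'), (2, 1, '2020-03-01'), (3, 1, NULL),
  (4, 2, '2020-02-01'), (5, 4, '2020-05-01');
""")

# Number each contact's activities newest-first, then keep only rn = 1.
LATEST_BY_WINDOW = """
WITH latest_activities AS (
  SELECT contact_id, occurred_at,
         ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY occurred_at DESC) AS rn
  FROM activities
  WHERE occurred_at IS NOT NULL
)
SELECT c.name, la.occurred_at
FROM contacts AS c
LEFT OUTER JOIN latest_activities AS la
  ON c.id = la.contact_id AND la.rn = 1
WHERE c.company_id = 20
ORDER BY c.id
"""

rows = conn.execute(LATEST_BY_WINDOW).fetchall()
print(rows)
```

The rn = 1 condition sits in the ON clause rather than in WHERE. This keeps contacts with no activities (Cy) in the result, with a NULL date.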
Here's a solution that should work on pre-8.0 versions:
SELECT c.*, a.*
FROM contacts AS c
LEFT OUTER JOIN activities AS a ON a.contact_id = c.id AND a.occurred_at IS NOT NULL
LEFT OUTER JOIN activities AS a2 ON a2.contact_id = c.id
AND a2.occurred_at > a.occurred_at
WHERE c.company_id = 20
AND a2.contact_id IS NULL;
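The exclusion-join idea keeps an activity only when no later activity exists for the same contact. Below is a minimal SQLite sketch of that idea on hypothetical data, where Bob has two activities with the same latest timestamp. It shows that ties return one row per tied activity, and that adding an id tiebreaker (an assumption, using the activities primary key) narrows the result to one row per contact. The sketch keeps a.occurred_at IS NOT NULL in the first join's ON clause. Without that condition, an activity with a NULL date has no "later" row and would also survive.

```python
import sqlite3

# Hypothetical toy data: Bob's two activities share the same timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 20);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01'), (2, 1, '2020-03-01'), (3, 1, NULL),
  (4, 2, '2020-02-01'), (5, 2, '2020-02-01');
""")

# {later} is the condition meaning "a2 is newer than a".
BASE = """
SELECT c.name, a.occurred_at
FROM contacts AS c
LEFT OUTER JOIN activities AS a
  ON a.contact_id = c.id AND a.occurred_at IS NOT NULL
LEFT OUTER JOIN activities AS a2
  ON a2.contact_id = c.id AND ({later})
WHERE c.company_id = 20 AND a2.contact_id IS NULL
ORDER BY c.id, a.id
"""

# Plain version: every activity tied for the latest timestamp survives.
plain = conn.execute(BASE.format(later="a2.occurred_at > a.occurred_at")).fetchall()

# Tie-broken version: among equal timestamps, keep the highest activity id.
tie_broken = conn.execute(BASE.format(
    later="a2.occurred_at > a.occurred_at"
          " OR (a2.occurred_at = a.occurred_at AND a2.id > a.id)")).fetchall()
print(plain)
print(tie_broken)
```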
Another solution:
SELECT c.*, a.*
FROM contacts AS c
LEFT OUTER JOIN (
SELECT a2.contact_id, MAX(a2.occurred_at) AS occurred_at
FROM activities AS a2
INNER JOIN contacts AS c2 ON a2.contact_id = c2.id
WHERE c2.company_id = 20
GROUP BY a2.contact_id ORDER BY NULL
) AS latest_activities
ON latest_activities.contact_id = c.id
LEFT OUTER JOIN activities AS a
ON a.contact_id = latest_activities.contact_id
AND a.occurred_at = latest_activities.occurred_at
WHERE c.company_id = 20
It would be helpful to create a new index on activities (contact_id, occurred_at).
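One way to check that a composite index on (contact_id, occurred_at) can answer a "latest activity for this contact" lookup is to look at the query plan. The sketch below does this in SQLite with EXPLAIN QUERY PLAN. The index name idx_activities_contact_occurred is hypothetical. In MySQL the equivalent would be something like ALTER TABLE activities ADD INDEX (contact_id, occurred_at), inspected with EXPLAIN. The new index would also make the existing single-column contact_id index largely redundant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
-- Hypothetical index name; the column order matters: equality column first.
CREATE INDEX idx_activities_contact_occurred ON activities (contact_id, occurred_at);
""")

def plan(sql, params=()):
    """Return the human-readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

details = plan("SELECT MAX(occurred_at) FROM activities WHERE contact_id = ?", (1,))
print(details)
```

Because both columns live in the index, SQLite can read the maximum date straight from the index (a covering index) without touching the table rows.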

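Avoid subqueries in the FROM clause when you can. They can impede the MySQL optimizer; MySQL versions before 5.7 always materialize them into temporary tables. So, if you want a single row (the contact with the most recent activity):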
Select c.*, a.occurred_at
from contacts c left outer join
activities a
on c.id = a.contact_id and
a.occurred_at is not null
where c.company_id = 20
order by a.occurred_at desc
limit 1;
If you want one row per contact_id:
Select c.*, a.occurred_at
from contacts c left outer join
activities a
on c.id = a.contact_id and
a.occurred_at is not null and
a.occurred_at = (select max(a2.occurred_at)
from activities a2
where a2.contact_id = a.contact_id
)
where c.company_id = 20
order by a.occurred_at desc;
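The "one row per contact" form puts a correlated MAX() subquery in the ON clause. Below is a small SQLite sketch on hypothetical data. It shows that this form returns one row per contact, and that appending LIMIT 1 reduces the result to just the single most recent row overall.

```python
import sqlite3

# Hypothetical toy data (SQLite stands in for MySQL here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 20), (4, 'Dee', 99);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01'), (2, 1, '2020-03-01'), (3, 1, NULL),
  (4, 2, '2020-02-01'), (5, 4, '2020-05-01');
""")

# Join only the activity whose date equals that contact's maximum date.
PER_CONTACT = """
SELECT c.name, a.occurred_at
FROM contacts c
LEFT OUTER JOIN activities a
  ON c.id = a.contact_id
 AND a.occurred_at IS NOT NULL
 AND a.occurred_at = (SELECT MAX(a2.occurred_at)
                      FROM activities a2
                      WHERE a2.contact_id = a.contact_id)
WHERE c.company_id = 20
ORDER BY a.occurred_at DESC
"""

per_contact = conn.execute(PER_CONTACT).fetchall()        # one row per contact
single = conn.execute(PER_CONTACT + " LIMIT 1").fetchall()  # newest row overall
print(per_contact)
print(single)
```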
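This can make use of an index on activities(contact_id, occurred_at) and on contacts(company_id, id). Because InnoDB secondary indexes already include the primary key, the existing contacts(company_id) index covers the latter.
Your query does one thing that is a clear no-no, and the default settings of recent MySQL versions (ONLY_FULL_GROUP_BY, on by default since 5.7.5) no longer allow it: the subquery selects an unaggregated column, occurred_at, that is not in the GROUP BY. That should be generating an error. Where it is allowed, MySQL returns the occurred_at of an arbitrary row for each contact, and the ORDER BY in the subquery does not make it the latest one.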

Given how complicated the other answers are, I feel like I may be overlooking something, but I think this is all you need:
SELECT c.*
, MAX(a.occurred_at) AS occurred_at
FROM contacts AS c
LEFT JOIN activities AS a
ON c.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE c.company_id = 20
GROUP BY c.id;
Notes: (1) This assumes you didn't actually want the duplicate contact_id from your original subquery in the final results. (2) Selecting c.* while grouping by c.id works because id is the primary key of contacts: MySQL 5.7.5 and later detect that the other columns are functionally dependent on it, even with ONLY_FULL_GROUP_BY enabled. On older servers configured to require a full GROUP BY, you will need to expand c.* into the full column list and copy that list into the GROUP BY clause as well.
Expanding on dnoeth's comments on your question: unless you are querying each company separately for a particular reason (chunking the load, code that also handles other per-company work, and so on), you could tweak the query above like this to get all your results in one query:
SELECT con.*
, MAX(a.occurred_at) AS occurred_at
FROM companies AS com
INNER JOIN contacts AS con ON com.id = con.company_id
LEFT JOIN activities AS a
ON con.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE [criteria for companies chosen to be queried]
GROUP BY con.id
ORDER BY con.company_id, con.id
;
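Since the goal is an export, here is a hedged sketch of streaming the all-companies query straight into CSV in one pass. It uses SQLite and hypothetical data. The filter com.id IN (20, 99) is a hypothetical stand-in for the company criteria. Because the rows come back ordered by company and contact, they can be written as they are read, with no per-company round trips.

```python
import csv
import io
import sqlite3

# Hypothetical toy data for two companies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO companies VALUES (20, 'Acme'), (99, 'Globex');
INSERT INTO contacts VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 20), (4, 'Dee', 99);
INSERT INTO activities VALUES
  (1, 1, '2020-01-01'), (2, 1, '2020-03-01'), (3, 1, NULL),
  (4, 2, '2020-02-01'), (5, 4, '2020-05-01');
""")

ALL_COMPANIES = """
SELECT con.company_id AS company_id, con.name AS name,
       MAX(a.occurred_at) AS occurred_at
FROM companies AS com
INNER JOIN contacts AS con ON com.id = con.company_id
LEFT JOIN activities AS a
  ON con.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE com.id IN (20, 99)  -- hypothetical company criteria
GROUP BY con.id
ORDER BY con.company_id, con.id
"""

cur = conn.execute(ALL_COMPANIES)
out = io.StringIO()  # stands in for a real output file
writer = csv.writer(out)
writer.writerow([col[0] for col in cur.description])  # header from the column aliases
for row in cur:  # iterate the cursor instead of fetching everything at once
    writer.writerow(row)  # csv writes None (no activity) as an empty field
print(out.getvalue())
```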

Related

Optimizing MySQL "IN" Select?

I have the following MySQL query:
SELECT
`influencers`.*,
`locations`.`country_name`
FROM
`influencers`
LEFT JOIN `locations` ON `influencers`.`country_id` = `locations`.`id`
WHERE
`is_dead` = 0
AND `influencers`.`is_private` = 0
AND `influencers`.`country_id` = '31'
AND influencers.uuid IN(
SELECT
`influencer_uuid` FROM `category_influencer`
WHERE
`category_influencer`.`category_id` = 17
AND `category_influencer`.`is_main` = 1)
ORDER BY
`influencers`.`followed_by` DESC
LIMIT 7 OFFSET 6
I have identified that the IN subquery is what makes this query take around 10 seconds to complete.
I have indexes on all columns being queried.
How can I significantly speed this query up?
Updated with SHOW CREATE TABLE for both:
locations
CREATE TABLE `locations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`country_name` varchar(255) DEFAULT NULL,
`city_name` varchar(255) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`city_name`, `country_name`),
KEY `type` (`type`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=6479 DEFAULT CHARSET=utf8mb4
influencers
CREATE TABLE `influencers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bio` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
`url` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
`followed_by` int(11) DEFAULT NULL,
`follows` int(11) DEFAULT NULL,
`full_name` varchar(255) CHARACTER SET utf8mb4 NOT NULL,
`social_id` varchar(255) DEFAULT NULL,
`is_private` tinyint(1) DEFAULT NULL,
`avatar` varchar(255) NOT NULL,
`username` varchar(30) NOT NULL,
`text_search` text CHARACTER SET utf8mb4 NOT NULL,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`uuid` varchar(255) DEFAULT NULL,
`is_dead` tinyint(4) DEFAULT NULL,
`country_id` int(11) DEFAULT NULL,
`city_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`),
UNIQUE KEY `uuid` (`uuid`),
KEY `is_dead` (`is_dead`),
KEY `updated_at` (`updated_at`),
KEY `followed_by` (`followed_by`),
KEY `social_id` (`social_id`),
KEY `is_private` (`is_private`),
KEY `country_id` (`country_id`),
FULLTEXT KEY `text_search` (`text_search`)
) ENGINE=InnoDB AUTO_INCREMENT=2278376 DEFAULT CHARSET=utf8
You could avoid the IN clause by using an inner join:
SELECT
`influencers`.*,
`locations`.`country_name`
FROM
`influencers`
INNER JOIN (
SELECT
`influencer_uuid` FROM `category_influencer`
WHERE
`category_id` = 17
AND `is_main` = 1
) T ON T.influencer_uuid = influencers.uuid
LEFT JOIN `locations` ON `influencers`.`country_id` = `locations`.`id`
WHERE
`is_dead` = 0
AND `is_private` = 0
AND `country_id` = '31'
ORDER BY
`followed_by` DESC
LIMIT 7 OFFSET 6
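This way, instead of evaluating the IN list for each row, you use a single relational match based on the join. If category_influencer can contain the same influencer_uuid more than once for category 17, add DISTINCT to the subquery so that the join does not duplicate influencers.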
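IN and JOIN are not always interchangeable. IN matches each influencer at most once, while a JOIN returns one row per matching category_influencer row. The SQLite sketch below uses hypothetical data in which one uuid is listed twice for category 17, and compares the IN form, the plain derived-table join, and the join with DISTINCT. The table shapes are trimmed to the columns the comparison needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE influencers (id INTEGER PRIMARY KEY, uuid TEXT UNIQUE, followed_by INTEGER);
CREATE TABLE category_influencer (influencer_uuid TEXT, category_id INTEGER, is_main INTEGER);
INSERT INTO influencers VALUES (1, 'u1', 500), (2, 'u2', 300), (3, 'u3', 100);
-- Hypothetical duplicate: u1 appears twice for category 17.
INSERT INTO category_influencer VALUES
  ('u1', 17, 1), ('u1', 17, 1), ('u2', 17, 1), ('u3', 17, 0);
""")

IN_FORM = """
SELECT i.id FROM influencers AS i
WHERE i.uuid IN (SELECT influencer_uuid FROM category_influencer
                 WHERE category_id = 17 AND is_main = 1)
ORDER BY i.followed_by DESC
"""

# {distinct} is either empty or DISTINCT.
JOIN_FORM = """
SELECT i.id FROM influencers AS i
INNER JOIN (SELECT {distinct} influencer_uuid FROM category_influencer
            WHERE category_id = 17 AND is_main = 1) AS t
  ON t.influencer_uuid = i.uuid
ORDER BY i.followed_by DESC
"""

in_rows = conn.execute(IN_FORM).fetchall()
join_rows = conn.execute(JOIN_FORM.format(distinct="")).fetchall()
distinct_rows = conn.execute(JOIN_FORM.format(distinct="DISTINCT")).fetchall()
print(in_rows, join_rows, distinct_rows)
```

With LIMIT/OFFSET paging, as in the question, a duplicated row would also shift which influencers land on each page.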
Unless I missed something, you can replace the subselect with JOIN:
SELECT influencers.*,
locations.country_name
FROM influencers
JOIN category_influencer T ON (
T.influencer_uuid = influencers.uuid
AND category_id = 17
AND is_main = 1)
LEFT JOIN locations ON influencers.country_id = locations.id
WHERE is_dead = 0
AND is_private = 0
AND country_id = '31'
ORDER BY followed_by DESC
LIMIT 7 OFFSET 6

MySQL Select Query running very slow

I have a MySQL query that works but is very slow. I am guessing this is due to the number of joins.
SELECT
order_header.order_head_id,
order_header.order_date,
order_header.status,
suppliers.supplier,
categories.category,
order_header.user,
order_header.sage_ref,
SUM(order_lines.total_price) AS price
FROM
order_header
LEFT JOIN
order_lines ON order_header.order_head_id = order_lines.order_head_id
LEFT JOIN
suppliers ON order_header.supplier_id = suppliers.supp_id
LEFT JOIN
categories ON order_header.category = categories.cat_id
WHERE
order_header.status LIKE '%'
AND order_header.order_head_id LIKE '%'
AND order_header.user LIKE '%'
GROUP BY order_header.order_head_id
ORDER BY order_head_id DESC
LIMIT 50;
SHOW CREATE TABLE results
CREATE TABLE `categories` (
`cat_id` int(11) NOT NULL AUTO_INCREMENT,
`category` varchar(45) DEFAULT NULL,
`status` varchar(45) DEFAULT NULL,
PRIMARY KEY (`cat_id`)
) ENGINE=InnoDB AUTO_INCREMENT=63 DEFAULT CHARSET=latin1
CREATE TABLE `order_header` (
`order_head_id` int(11) NOT NULL AUTO_INCREMENT,
`status` varchar(45) DEFAULT NULL,
`category` varchar(45) NOT NULL,
`order_date` date DEFAULT NULL,
`supplier_id` varchar(45) NOT NULL,
`user` varchar(45) DEFAULT NULL,
`sage_ref` varchar(45) DEFAULT NULL,
`query_notes` varchar(500) DEFAULT NULL,
PRIMARY KEY (`order_head_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2249 DEFAULT CHARSET=latin1
CREATE TABLE `order_lines` (
`order_lines_id` int(11) NOT NULL AUTO_INCREMENT,
`order_head_id` int(11) DEFAULT NULL,
`qty` int(11) DEFAULT NULL,
`description` varchar(255) DEFAULT NULL,
`unit_price` decimal(65,2) DEFAULT NULL,
`total_price` decimal(65,2) DEFAULT NULL,
PRIMARY KEY (`order_lines_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3981 DEFAULT CHARSET=latin1
CREATE TABLE `suppliers` (
`supp_id` int(11) NOT NULL AUTO_INCREMENT,
`supplier` varchar(255) DEFAULT NULL,
`status` varchar(225) DEFAULT NULL,
PRIMARY KEY (`supp_id`)
) ENGINE=InnoDB AUTO_INCREMENT=161 DEFAULT CHARSET=latin1
MySQL version: 5.6.30
I am not great with MySQL and was wondering if anyone can see a way to make this query run faster.
Any help would be greatly appreciated.
Many thanks,
John
It can make sense to move the first (left) join into a GROUP BY subquery. The GROUP BY and LIMIT reduce the number of rows that the following two joins have to process:
SELECT
x.order_head_id,
x.order_date,
x.status,
suppliers.supplier,
categories.category,
x.user,
x.sage_ref,
x.price
FROM (
SELECT
order_header.supplier_id,
order_header.category,
order_header.order_head_id,
order_header.order_date,
order_header.status,
order_header.user,
order_header.sage_ref,
SUM(order_lines.total_price) AS price
FROM order_header
LEFT JOIN order_lines ON order_header.order_head_id = order_lines.order_head_id
WHERE order_header.status LIKE '%'
AND order_header.order_head_id LIKE '%'
AND order_header.user LIKE '%'
GROUP BY order_header.order_head_id
ORDER BY order_head_id DESC
LIMIT 50
) x
LEFT JOIN suppliers ON x.supplier_id = suppliers.supp_id
LEFT JOIN categories ON x.category = categories.cat_id
ORDER BY order_head_id DESC
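The rewrite only pays off if it returns the same rows as the original. Below is a minimal SQLite sketch that runs both shapes on hypothetical data and compares the results. It simplifies the tables: supplier_id and category are INTEGER here rather than varchar(45), the unused columns are dropped, and LIMIT 2 stands in for LIMIT 50.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (supp_id INTEGER PRIMARY KEY, supplier TEXT);
CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE order_header (order_head_id INTEGER PRIMARY KEY, supplier_id INTEGER, category INTEGER);
CREATE TABLE order_lines (order_lines_id INTEGER PRIMARY KEY, order_head_id INTEGER, total_price REAL);
INSERT INTO suppliers VALUES (1, 'S1'), (2, 'S2');
INSERT INTO categories VALUES (1, 'C1');
INSERT INTO order_header VALUES (1, 1, 1), (2, 2, 1), (3, 1, 1);
INSERT INTO order_lines VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5), (4, 3, 2.0);
""")

# Original shape: join everything, then aggregate, sort and limit.
JOIN_THEN_LIMIT = """
SELECT h.order_head_id, s.supplier, c.category, SUM(l.total_price) AS price
FROM order_header h
LEFT JOIN order_lines l ON h.order_head_id = l.order_head_id
LEFT JOIN suppliers s ON h.supplier_id = s.supp_id
LEFT JOIN categories c ON h.category = c.cat_id
GROUP BY h.order_head_id
ORDER BY h.order_head_id DESC
LIMIT 2
"""

# Rewritten shape: aggregate and limit first, then look up names for those rows only.
LIMIT_THEN_JOIN = """
SELECT x.order_head_id, s.supplier, c.category, x.price
FROM (
  SELECT h.order_head_id, h.supplier_id, h.category, SUM(l.total_price) AS price
  FROM order_header h
  LEFT JOIN order_lines l ON h.order_head_id = l.order_head_id
  GROUP BY h.order_head_id
  ORDER BY h.order_head_id DESC
  LIMIT 2
) x
LEFT JOIN suppliers s ON x.supplier_id = s.supp_id
LEFT JOIN categories c ON x.category = c.cat_id
ORDER BY x.order_head_id DESC
"""

before = conn.execute(JOIN_THEN_LIMIT).fetchall()
after = conn.execute(LIMIT_THEN_JOIN).fetchall()
print(before == after, after)
```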

MySQL performance with nested sub query

I have two tables, messages and members. I tried joining them without a nested query, but the result does not reflect the join on members. I initially thought I could do the following:
SELECT M1.*, COUNT(M2.emid) AS replies FROM messages M1
LEFT JOIN messages M2
ON M2.thread = M1.emid
INNER JOIN members M
ON M.meid = M1.emitter
WHERE
M1.thread is NULL AND
M1.receiver = 2
GROUP BY
M1.emid
However, it does not seem to join the corresponding member. Then I tried the query below, which gives me the result I need, but I would like to know whether the same result can be achieved with joins and no nested query:
SELECT * FROM (
SELECT M1.*, COUNT(M2.emid) AS replies FROM messages M1
LEFT JOIN messages M2
ON M2.thread = M1.emid
WHERE
M1.thread is NULL AND
M1.receiver = 2
GROUP BY
M1.emid
) O INNER JOIN members M ON O.receiver = M.meid
-- Table structure for table members
CREATE TABLE `members` (
`meid` bigint(64) NOT NULL,
`name` varchar(32) DEFAULT NULL,
`lastname` varchar(32) DEFAULT NULL,
`email` varchar(128) NOT NULL,
`mobile` char(10) DEFAULT NULL,
`college` bigint(64) NOT NULL,
`major` bigint(64) NOT NULL,
`password` varchar(256) NOT NULL,
`oauth` varchar(128) DEFAULT NULL,
`confirmed` tinyint(4) DEFAULT NULL,
`active` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`joined` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- Table structure for table messages
CREATE TABLE `messages` (
`emid` bigint(20) NOT NULL,
`emitter` bigint(20) NOT NULL,
`receiver` bigint(20) NOT NULL,
`thread` bigint(20) DEFAULT NULL,
`opened` tinyint(4) DEFAULT '0',
`message` blob NOT NULL,
`timecard` datetime DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
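One thing worth checking: the flat query joins members on M1.emitter, while the nested query joins on O.receiver, so the two return different members. The SQLite sketch below uses hypothetical rows to run the nested query and the flat query joined on each column. It assumes meid is unique. The posted DDL declares no key, and duplicate meid rows would inflate COUNT() in the flat form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (meid INTEGER, name TEXT);
CREATE TABLE messages (emid INTEGER, emitter INTEGER, receiver INTEGER, thread INTEGER);
INSERT INTO members VALUES (1, 'Alice'), (2, 'Bob');
-- Message 10 starts a thread from member 1 to member 2; 11 and 12 are replies.
INSERT INTO messages VALUES (10, 1, 2, NULL), (11, 2, 1, 10), (12, 1, 2, 10);
""")

NESTED = """
SELECT O.emid, O.replies, M.name FROM (
  SELECT M1.emid, M1.receiver, COUNT(M2.emid) AS replies
  FROM messages M1
  LEFT JOIN messages M2 ON M2.thread = M1.emid
  WHERE M1.thread IS NULL AND M1.receiver = 2
  GROUP BY M1.emid
) O INNER JOIN members M ON O.receiver = M.meid
"""

# {column} selects which side of the message is joined to members.
FLAT = """
SELECT M1.emid, COUNT(M2.emid) AS replies, M.name
FROM messages M1
LEFT JOIN messages M2 ON M2.thread = M1.emid
INNER JOIN members M ON M.meid = M1.{column}
WHERE M1.thread IS NULL AND M1.receiver = 2
GROUP BY M1.emid, M.name
"""

nested = conn.execute(NESTED).fetchall()
flat_emitter = conn.execute(FLAT.format(column="emitter")).fetchall()
flat_receiver = conn.execute(FLAT.format(column="receiver")).fetchall()
print(nested, flat_emitter, flat_receiver)
```

On this data, the flat query does join a member; it simply joins the sender. Once it joins on receiver, it matches the nested result.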

Improve MySQL nested select performance with join

I have seen many examples of improving MySQL nested selects with joins, but I can't figure out how to apply them to this query:
SELECT * FROM messages WHERE answer = 'SuccessSubscribed' AND phone NOT IN
(SELECT phone FROM messages WHERE answer = 'SuccessUnSubscribed');
The query finds people who have subscribed but never unsubscribed.
Table structure:
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`phone` varchar(12) COLLATE utf8_persian_ci NOT NULL,
`content` varchar(300) COLLATE utf8_persian_ci NOT NULL,
`flags` int(10) unsigned NOT NULL DEFAULT '0',
`answer` varchar(50) COLLATE utf8_persian_ci DEFAULT NULL,
....,
PRIMARY KEY (`id`),
....
) ENGINE=InnoDB CHARSET=utf8 COLLATE=utf8_persian_ci
Instead of NOT IN, you can use a LEFT JOIN with an IS NULL check:
SELECT M1.*
FROM messages M1
LEFT JOIN messages M2 ON M2.phone = M1.phone AND M2.answer = 'SuccessUnSubscribed'
WHERE M1.answer = 'SuccessSubscribed' AND M2.phone IS NULL
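The SQLite sketch below checks on hypothetical rows that the NOT IN query and the LEFT JOIN ... IS NULL anti-join return the same messages. It also includes NOT EXISTS, a third common way to write an anti-join that is not in the answer above. The three agree here because phone is declared NOT NULL. If the subquery could return a NULL phone, NOT IN would return no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, phone TEXT NOT NULL, answer TEXT);
INSERT INTO messages (phone, answer) VALUES
  ('111', 'SuccessSubscribed'),
  ('222', 'SuccessSubscribed'), ('222', 'SuccessUnSubscribed'),
  ('333', 'SuccessSubscribed'), ('333', NULL);
""")

NOT_IN = """
SELECT id FROM messages WHERE answer = 'SuccessSubscribed' AND phone NOT IN
  (SELECT phone FROM messages WHERE answer = 'SuccessUnSubscribed')
ORDER BY id
"""
ANTI_JOIN = """
SELECT M1.id FROM messages M1
LEFT JOIN messages M2 ON M2.phone = M1.phone AND M2.answer = 'SuccessUnSubscribed'
WHERE M1.answer = 'SuccessSubscribed' AND M2.phone IS NULL
ORDER BY M1.id
"""
NOT_EXISTS = """
SELECT M1.id FROM messages M1
WHERE M1.answer = 'SuccessSubscribed' AND NOT EXISTS (
  SELECT 1 FROM messages M2
  WHERE M2.phone = M1.phone AND M2.answer = 'SuccessUnSubscribed')
ORDER BY M1.id
"""

results = {name: conn.execute(sql).fetchall()
           for name, sql in [("not_in", NOT_IN), ("anti_join", ANTI_JOIN),
                             ("not_exists", NOT_EXISTS)]}
print(results)
```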

Problem with sorting comments in date order

I am a MySQL novice and I have a problem sorting across two tables.
This query is meant to list books by their newest comment, but the books come back sorted by their first comment instead of their latest one.
SELECT b.*, c.date_added as date FROM books b
LEFT JOIN comments c ON (b.id = c.book_id)
GROUP BY b.id
ORDER BY date DESC
LIMIT 5
CREATE TABLE IF NOT EXISTS `books` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`caption` varchar(255) COLLATE utf8_bin NOT NULL,
`author` varchar(255) COLLATE utf8_bin NOT NULL,
`pages` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`filename` varchar(255) COLLATE utf8_bin NOT NULL,
`description` text COLLATE utf8_bin NOT NULL,
`date_added` datetime NOT NULL,
`publisher` varchar(255) COLLATE utf8_bin NOT NULL,
`price` decimal(10,2) NOT NULL,
`times_sold` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE IF NOT EXISTS `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`book_id` int(11) NOT NULL,
`author` varchar(255) COLLATE utf8_bin NOT NULL,
`email` varchar(255) COLLATE utf8_bin NOT NULL,
`body` text COLLATE utf8_bin NOT NULL,
`date_added` datetime NOT NULL,
`approved` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Thank you for your time.
This should work. You are grouping on b.id, but since each book can have multiple comments, you need an aggregate function on c.date_added. In this case you can use MAX to get the most recent comment date.
SELECT b.*, MAX(c.date_added) as date FROM books b
LEFT JOIN comments c ON (b.id = c.book_id)
GROUP BY b.id
ORDER BY MAX(c.date_added) DESC
LIMIT 5
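Here is a quick check of the MAX() version on a few hypothetical books, run in SQLite with the tables trimmed to the columns involved. Each book appears once, sorted by its newest comment. A book with no comments gets a NULL date, which sorts last in descending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, caption TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, book_id INTEGER, date_added TEXT);
INSERT INTO books VALUES (1, 'Old favourite'), (2, 'New release'), (3, 'No comments');
INSERT INTO comments (book_id, date_added) VALUES
  (1, '2020-01-01'), (1, '2020-06-01'),
  (2, '2020-03-01');
""")

# One row per book; MAX() picks the latest comment date within each group.
LATEST_COMMENTED = """
SELECT b.caption, MAX(c.date_added) AS date
FROM books b
LEFT JOIN comments c ON b.id = c.book_id
GROUP BY b.id
ORDER BY MAX(c.date_added) DESC
LIMIT 5
"""

rows = conn.execute(LATEST_COMMENTED).fetchall()
print(rows)
```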
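The GROUP BY clause is not correct here and you need to remove it.
GROUP BY is meant to be used with aggregate functions like MIN() and MAX(). Without it, the query returns the five most recent comments, so the same book can appear more than once.
Try this: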
SELECT b.*, c.date_added FROM books b
JOIN comments c ON
c.book_id = b.id
ORDER BY c.date_added DESC
LIMIT 5
There is no column called date in your tables; it is only the alias defined in your SELECT list. MySQL does accept ordering by that alias, but ordering by the column itself (ORDER BY c.date_added DESC) is clearer.
If it's still not in the right order, just use ASC instead of DESC.