Optimizing MySQL "IN" Select? - mysql

I have the following MySQL query:
SELECT
`influencers`.*,
`locations`.`country_name`
FROM
`influencers`
LEFT JOIN `locations` ON `influencers`.`country_id` = `locations`.`id`
WHERE
`is_dead` = 0
AND `influencers`.`is_private` = 0
AND `influencers`.`country_id` = '31'
AND influencers.uuid IN(
SELECT
`influencer_uuid` FROM `category_influencer`
WHERE
`category_influencer`.`category_id` = 17
AND `category_influencer`.`is_main` = 1)
ORDER BY
`influencers`.`followed_by` DESC
LIMIT 7 OFFSET 6
I have identified that the IN subquery is what makes this query take around 10 seconds to complete. Here is the EXPLAIN:
I have indexes on all columns being queried.
How can I significantly speed this query up?
Updated with SHOW CREATE TABLE for both:
locations
CREATE TABLE `locations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`country_name` varchar(255) DEFAULT NULL,
`city_name` varchar(255) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`city_name`, `country_name`),
KEY `type` (`type`)
USING BTREE) ENGINE = InnoDB AUTO_INCREMENT = 6479 DEFAULT CHARSET = utf8mb4
influencers
CREATE TABLE `influencers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bio` varchar(255) CHARACTER
SET utf8mb4 DEFAULT NULL,
`url` varchar(255) CHARACTER
SET utf8mb4 DEFAULT NULL,
`followed_by` int(11) DEFAULT NULL,
`follows` int(11) DEFAULT NULL,
`full_name` varchar(255) CHARACTER
SET utf8mb4 NOT NULL,
`social_id` varchar(255) DEFAULT NULL,
`is_private` tinyint (1) DEFAULT NULL,
`avatar` varchar(255) NOT NULL,
`username` varchar(30) NOT NULL,
`text_search` text CHARACTER
SET utf8mb4 NOT NULL,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`uuid` varchar(255) DEFAULT NULL,
`is_dead` tinyint (4) DEFAULT NULL,
`country_id` int(11) DEFAULT NULL,
`city_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`),
UNIQUE KEY `uuid` (`uuid`),
KEY `is_dead` (`is_dead`),
KEY `updated_at` (`updated_at`),
KEY `followed_by` (`followed_by`),
KEY `social_id` (`social_id`),
KEY `is_private` (`is_private`),
KEY `country_id` (`country_id`),
FULLTEXT KEY `text_search` (`text_search`)) ENGINE = InnoDB AUTO_INCREMENT = 2278376 DEFAULT CHARSET = utf8

You could avoid the IN clause by using an inner join:
SELECT
`influencers`.*,
`locations`.`country_name`
FROM
`influencers`
INNER JOIN (
SELECT
`influencer_uuid` FROM `category_influencer`
WHERE
`category_id` = 17
AND `is_main` = 1
) T ON T.influencer_uuid = influencers.uuid
LEFT JOIN `locations` ON `influencers`.`country_id` = `locations`.`id`
WHERE
`is_dead` = 0
AND `is_private` = 0
AND `country_id` = '31'
ORDER BY
`followed_by` DESC
LIMIT 7 OFFSET 6
This way, instead of checking each row against the whole IN result, the subquery result is matched once through the join.

Unless I missed something, you can replace the subselect with JOIN:
SELECT influencers.*,
locations.country_name
FROM influencers
JOIN category_influencer T ON (
T.influencer_uuid = influencers.uuid
AND category_id = 17
AND is_main = 1)
LEFT JOIN locations ON influencers.country_id = locations.id
WHERE is_dead = 0
AND is_private = 0
AND country_id = '31'
ORDER BY followed_by DESC
LIMIT 7 OFFSET 6
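
To check that the IN form and the JOIN form really return the same rows, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for MySQL and a cut-down version of the tables from the question (only the columns needed for the comparison), so it says nothing about timing. One assumption to keep in mind: the JOIN form returns duplicate influencers if category_influencer can hold more than one matching row per influencer_uuid, while IN does not.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the tables are simplified
# versions of the ones in the question (assumption: one main category
# row per influencer).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE influencers (uuid TEXT PRIMARY KEY, followed_by INTEGER, country_id INTEGER);
CREATE TABLE category_influencer (influencer_uuid TEXT, category_id INTEGER, is_main INTEGER);
INSERT INTO influencers VALUES ('a', 100, 31), ('b', 300, 31), ('c', 200, 31), ('d', 400, 7);
INSERT INTO category_influencer VALUES ('a', 17, 1), ('b', 17, 0), ('c', 17, 1), ('d', 17, 1);
""")

# Original shape: filter with an IN subquery.
in_query = """
SELECT i.uuid FROM influencers AS i
WHERE i.country_id = 31
  AND i.uuid IN (SELECT influencer_uuid FROM category_influencer
                 WHERE category_id = 17 AND is_main = 1)
ORDER BY i.followed_by DESC
"""

# Rewritten shape: the same filter expressed as a join condition.
join_query = """
SELECT i.uuid FROM influencers AS i
JOIN category_influencer AS ci
  ON ci.influencer_uuid = i.uuid AND ci.category_id = 17 AND ci.is_main = 1
WHERE i.country_id = 31
ORDER BY i.followed_by DESC
"""

in_rows = [row[0] for row in conn.execute(in_query)]
join_rows = [row[0] for row in conn.execute(join_query)]
print(in_rows, join_rows)  # ['c', 'a'] ['c', 'a']
```

On the real tables, running EXPLAIN on both forms is the way to see whether the plan actually changes.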

Related

How can I optimize a operation with the Database in django?

I have a query which looks something like this:
SELECT
STRAIGHT_JOIN `reviewApp_review`.`id`, `reviewApp_review`.`reviewTitle`,
`reviewApp_review`.`reviewContent`, `reviewApp_review`.`translatedEnTitle`,
`reviewApp_review`.`translatedEnContent`, `reviewApp_review`.`translatedEnDate`,
`reviewApp_review`.`reviewLink`, `reviewApp_review`.`reviewUser`,
`reviewApp_review`.`reviewUserProfile`, `reviewApp_review`.`reviewDataCreated`,
`reviewApp_review`.`reviewDataDiscovered`, `reviewApp_review`.`reviewData`,
`reviewApp_review`.`reviewRating`, `reviewApp_review`.`reviewSignature`,
`reviewApp_review`.`reviewSignature2`, `reviewApp_review`.`reviewExternalId`,
`reviewApp_review`.`reviewStatus`, `reviewApp_review`.`reviewWebsite`,
`reviewApp_review`.`language`, `reviewApp_review`.`helpfulVotes`,
`reviewApp_review`.`verified`, `reviewApp_review`.`color`,
`reviewApp_review`.`style`, `reviewApp_review`.`size`,
`reviewApp_review`.`lastUpdated`, `reviewApp_review`.`lastSeen`,
`reviewApp_review`.`deleted`, `reviewApp_review`.`alerted`,
`reviewApp_review`.`analyzed`, `reviewApp_review`.`productLink_id`
FROM `reviewApp_review`
INNER JOIN `reviewApp_productlink` ON (`reviewApp_review`.`productLink_id` = `reviewApp_productlink`.`id`)
INNER JOIN `reviewApp_product` ON (`reviewApp_productlink`.`product_id` = `reviewApp_product`.`id`)
WHERE (`reviewApp_product`.`owner` = 'my product'
AND `reviewApp_productlink`.`customer_id` = '1241'
AND (`reviewApp_review`.`translatedEnContent` LIKE '%%urban%%'
OR (`reviewApp_review`.`reviewContent` LIKE '%%urban%%'
AND `reviewApp_review`.`translatedEnDate` = '0'))
)
ORDER BY `reviewApp_review`.`reviewRating` ASC
LIMIT 10
I want to select reviews from the database, but when the result contains fewer than 10 reviews the query takes more than a minute. I'm wondering if there is a way to optimize this query.
I tried different variations of the SQL query; as you can see, I tried STRAIGHT_JOIN and INNER JOIN, but the timing did not improve.
The result of the query should be a querySet that contains a list of reviews, matching all those conditions from the query.
From Django I'm using something like that to run this query:
querySet.model.objects.raw(rawQuery)
where rawQuery is the query described above.
Here I have a Python function that I want to optimize because it takes too much time.
import re

def fixQS(querySet):
    """
    Optimisation for SQL QUERY using order by
    https://dba.stackexchange.com/a/40195/246455
    https://docs.djangoproject.com/en/2.1/ref/models/querysets/#extra
    STRAIGHT_JOIN
    """
    # complete the SQL with params encapsulated in quotes
    sql, params = querySet.query.sql_with_params()
    newParams = ()
    for param in params:
        if not str(param).startswith("'"):
            if isinstance(param, str):
                # escape single quotes inside string params
                param = re.sub("'", "\\'", param)
            newParams = newParams + ("'{}'".format(param),)
        else:
            newParams = newParams + (param,)
    rawQuery = sql % newParams
    # escape the percent used in SQL LIKE statements
    rawQuery = re.sub('%', '%%', rawQuery)
    # replace SELECT with SELECT STRAIGHT_JOIN
    rawQuery = rawQuery.replace('SELECT', 'SELECT STRAIGHT_JOIN')
    return querySet.model.objects.raw(rawQuery)
I'm calling it like this :
currentReviews = fixQS(currentReviews)
And when the debugger evaluates this it takes a lot of time.
Here is the result of the command SHOW CREATE TABLE:
CREATE TABLE `reviewApp_review` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`reviewTitle` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`reviewContent` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`reviewLink` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`reviewDataCreated` int(11) NOT NULL,
`reviewDataDiscovered` int(11) NOT NULL,
`reviewRating` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL,
`reviewSignature` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`reviewStatus` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`productLink_id` int(11) NOT NULL,
`reviewWebsite` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`language` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`verified` tinyint(1) NOT NULL,
`color` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`size` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`style` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`alerted` tinyint(1) NOT NULL,
`reviewUser` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`reviewUserProfile` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`reviewSignature2` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`reviewData` int(11) NOT NULL,
`reviewExternalId` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`helpfulVotes` int(11) NOT NULL,
`lastSeen` int(11) NOT NULL,
`lastUpdated` int(11) NOT NULL,
`translatedEnContent` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`translatedEnDate` int(11) NOT NULL,
`translatedEnTitle` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`deleted` tinyint(1) NOT NULL,
`analyzed` datetime(6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `reviewApp_re_productLink_id_084fa2ec_fk_reviewApp_productlink_id` (`productLink_id`),
KEY `reviewApp_review_reviewSignature_2c31b21d_uniq` (`reviewSignature`),
KEY `reviewApp_review_reviewSignature_224ab822_idx` (`reviewSignature`,`productLink_id`),
KEY `reviewApp_review_fa0816ae` (`reviewSignature2`),
KEY `reviewApp_review_alerted_57344ef1_uniq` (`alerted`),
KEY `reviewApp_review_reviewData_5dd52e46_uniq` (`reviewData`),
KEY `reviewApp_review_reviewRating_d418fcfc_uniq` (`reviewRating`),
KEY `reviewApp_review_reviewWebsite_25d7fd04_uniq` (`reviewWebsite`),
KEY `reviewApp_review_reviewExternalId_32dfb169_uniq` (`reviewExternalId`),
KEY `reviewApp_reviewContent` (`reviewContent`(100)),
KEY `reviewApp_translatedEnContent` (`translatedEnContent`(100)),
KEY `reviewApp_translatedEnDate` (`translatedEnDate`),
KEY `reviewApp_review_da602f0b` (`deleted`),
KEY `reviewApp_review_language_3de815cd_uniq` (`language`),
KEY `reviewApp_review_order_index` (`reviewData`),
KEY `reviewApp_review_rating_order_index` (`reviewRating`),
KEY `reviewApp_review_rating_order_index_desc` (`reviewRating`),
CONSTRAINT `reviewApp_re_productLink_id_084fa2ec_fk_reviewApp_productlink_id` FOREIGN KEY (`productLink_id`) REFERENCES `reviewApp_productlink` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=24274768 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
And this one is for the product:
CREATE TABLE `reviewApp_product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`owner` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`customer_id` int(11) DEFAULT NULL,
`ean` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`internalCode` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`sku` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`asin` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `reviewApp_product_name_37c04b41_uniq` (`name`,`customer_id`),
KEY `reviewApp_product_cb24373b` (`customer_id`),
KEY `reviewApp_product_name_23da262c_uniq` (`name`),
KEY `reviewApp_product_owner_680ac2b4_uniq` (`owner`),
KEY `reviewApp_product_index` (`owner`,`id`),
CONSTRAINT `reviewApp_product_customer_id_7663d434_fk_reviewApp_customer_id` FOREIGN KEY (`customer_id`) REFERENCES `reviewApp_customer` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=31556 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
And also for productLink I have this:
CREATE TABLE `reviewApp_productlink` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`productLink` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
`customer_id` int(11) NOT NULL,
`productDataCreated` int(11) NOT NULL,
`product_id` int(11) DEFAULT NULL,
`domain` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`httpStatus` int(11) NOT NULL,
`externalProductId` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
`ratingsNumber` int(11) NOT NULL,
`rating` double DEFAULT NULL,
`fb_data` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `reviewApp_productl_customer_id_f0141212_fk_reviewApp_customer_id` (`customer_id`),
KEY `reviewApp_productlink_9bea82de` (`product_id`),
CONSTRAINT `reviewApp_productl_customer_id_f0141212_fk_reviewApp_customer_id` FOREIGN KEY (`customer_id`) REFERENCES `reviewApp_customer` (`id`),
CONSTRAINT `reviewApp_productlin_product_id_6123214e_fk_reviewApp_product_id` FOREIGN KEY (`product_id`) REFERENCES `reviewApp_product` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=127823 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Remove STRAIGHT_JOIN so that the Optimizer is allowed to try looking at the tables in different orders. The following composite indexes may give the Optimizer better options:
reviewApp_review: INDEX(translatedEnContent, reviewContent,
translatedEnDate, productLink_id, reviewRating)
reviewApp_productlink: INDEX(customer_id, product_id, id)
reviewApp_product: INDEX(owner, id)
translatedEnDate is a DATE? Yet you are testing for "0"??
More
(repeating original, but with aliases to shorten the text)
SELECT
STRAIGHT_JOIN rr.((lots of stuff))
FROM `reviewApp_review` AS rr
INNER JOIN `reviewApp_productlink` AS pl ON (rr.`productLink_id` = pl.`id`)
INNER JOIN `reviewApp_product` AS p ON (pl.`product_id` = p.`id`)
WHERE ( p.`owner` = 'my product'
AND pl.`customer_id` = '1241'
AND (rr.`translatedEnContent` LIKE '%%urban%%'
OR ( rr.`reviewContent` LIKE '%%urban%%'
AND rr.`translatedEnDate` = '0')
) )
ORDER BY rr.`reviewRating` ASC
LIMIT 10
You seem to be gathering all those columns only to throw away all but 10 rows. An optimization is to first find the 10 ids, then fetch the rest of the columns:
SELECT rr.((lots of stuff))
FROM ( SELECT rr2.id
FROM `reviewApp_review` AS rr2
INNER JOIN `reviewApp_productlink` AS pl
ON (rr2.`productLink_id` = pl.`id`)
INNER JOIN `reviewApp_product` AS p
ON (pl.`product_id` = p.`id`)
WHERE ( p.`owner` = 'my product'
AND pl.`customer_id` = '1241'
AND ( rr2.`translatedEnContent` LIKE '%%urban%%'
OR ( rr2.`reviewContent` LIKE '%%urban%%'
AND rr2.`translatedEnDate` = '0')
) )
ORDER BY rr2.`reviewRating` ASC
LIMIT 10
) AS ids
INNER JOIN `reviewApp_review` AS rr ON ids.id = rr.id
ORDER BY rr.`reviewRating` ASC -- Yes, this needs repeating
p: INDEX(owner)
pl: INDEX(customer_id, product_id)
reviewApp_review: INDEX(productLink_id, reviewRating)
reviewApp_review: INDEX(reviewRating)
These are useless (at least for the current query):
KEY `reviewApp_reviewContent` (`reviewContent`(100)),
KEY `reviewApp_translatedEnContent` (`translatedEnContent`(100)),
Better might be
FULLTEXT(translatedEnContent),
FULLTEXT(reviewContent)
together with
MATCH(reviewContent) AGAINST ('+urban' IN BOOLEAN MODE)
MATCH(translatedEnContent) AGAINST ('+urban' IN BOOLEAN MODE)
But, unfortunately, it won't be cheap to do the subquery. The hope is that 'my product' and '1241' do enough filtering to minimize the usage of LIKE or MATCH.
A minor side note: When you have INDEX(a,b) or UNIQUE(a,b), it is unnecessary to also have INDEX(a). (I see several instances of this.)
Even better...
If possible, get rid of the test on translatedEnDate and combine the two text columns:
FULLTEXT(reviewContent, translatedEnContent)
MATCH(reviewContent, translatedEnContent)
AGAINST ('+urban' IN BOOLEAN MODE)
This will make it possible to start the search with an efficient FT lookup, followed by checking the owner and customer_id. This would avoid a full table scan of the multi-GB reviewApp_review.
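
The "first find the 10 ids, then fetch the rest of the columns" idea from this answer can be tried out in isolation. The sketch below is only an illustration: it uses an in-memory SQLite database instead of MySQL, a hypothetical single table named review with made-up data, and an extra id tie-breaker in the ORDER BY (not in the original query) so that the two result sets are deterministic and can be compared.

```python
import sqlite3

# SQLite stands in for MySQL; `review` is a hypothetical, cut-down table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE review (id INTEGER PRIMARY KEY, rating INTEGER, content TEXT, product_id INTEGER)"
)
sample = [
    (i, i % 5 + 1, ("urban style" if i % 3 == 0 else "plain") + f" review {i}", 1)
    for i in range(1, 61)
]
conn.executemany("INSERT INTO review VALUES (?, ?, ?, ?)", sample)

# Deferred join: the inner query sorts and limits on ids only,
# the outer query fetches the wide columns for just those 10 ids.
deferred = """
SELECT r.id, r.rating, r.content
FROM (SELECT r2.id
      FROM review AS r2
      WHERE r2.product_id = 1 AND r2.content LIKE '%urban%'
      ORDER BY r2.rating ASC, r2.id ASC
      LIMIT 10) AS ids
JOIN review AS r ON r.id = ids.id
ORDER BY r.rating ASC, r.id ASC
"""

# Direct form, for comparison: sort and limit while carrying every column.
direct = """
SELECT id, rating, content FROM review
WHERE product_id = 1 AND content LIKE '%urban%'
ORDER BY rating ASC, id ASC
LIMIT 10
"""

deferred_rows = conn.execute(deferred).fetchall()
direct_rows = conn.execute(direct).fetchall()
print(deferred_rows == direct_rows, len(deferred_rows))
```

Both forms return the same rows; the possible gain on a large MySQL table is that the sort works on narrow rows instead of rows carrying the longtext columns.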

MySql query is slow with join - how to speed it up

I have to export 554k records from our mysql db. At the current rate it will take 5 days to export the data and the slowness is mainly caused by the query below. The data structure consists of
Companies
--Contacts
----(Contact)Activities
For the contacts, we have an index on company_id. On the activities table, we have an index for contact_id and company_id which map back to the respective contacts and companies tables.
I need to grab each contact and the latest activity date that they have. This is the query that I'm running and it takes about .5 second to execute.
Select *
from contacts
left outer join (select occurred_at
,contact_id
from activities
where occurred_at is not null
group by contact_id
order by occurred_at desc) activities
on contacts.id = activities.contact_id
where company_id = 20
If I remove the join and just select * from contacts where company_id=20 the query executes in .016 sec.
If I use Explain for info on the join query I get this
Any ideas on how I can speed this up?
Edit:
Here are the table definitions.
CREATE TABLE `companies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`street_address` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`state` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`county` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`website` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`falloff_date` date DEFAULT NULL,
`zipcode` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`order_count` int(11) NOT NULL DEFAULT '0',
`active_job_count` int(11) NOT NULL DEFAULT '0',
`duplicate_of` int(11) DEFAULT NULL,
`warm_date` datetime DEFAULT NULL,
`employee_size` int(11) DEFAULT NULL,
`dup_checked` tinyint(1) DEFAULT '0',
`rating` int(11) DEFAULT NULL,
`delinquent` tinyint(1) DEFAULT '0',
`cconly` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `index_companies_on_name` (`name`),
KEY `index_companies_on_user_id` (`user_id`),
KEY `index_companies_on_company_id` (`company_id`),
KEY `index_companies_on_external_id` (`external_id`),
KEY `index_companies_on_state_and_dup_checked` (`id`,`state`,`dup_checked`,`duplicate_of`),
KEY `index_companies_on_dup_checked` (`id`,`dup_checked`),
KEY `index_companies_on_dup_checked_name` (`dup_checked`,`name`),
KEY `index_companies_on_county` (`county`,`state`)
) ENGINE=InnoDB AUTO_INCREMENT=15190300 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `contacts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`last_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`extension` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`fax` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`active` tinyint(1) DEFAULT NULL,
`main` tinyint(1) DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`second_phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_contacts_on_company_id` (`company_id`),
KEY `index_contacts_on_first_name` (`first_name`),
KEY `index_contacts_on_last_name` (`last_name`),
KEY `index_contacts_on_phone` (`phone`),
KEY `index_contacts_on_email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=11241088 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `activities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`kind` int(11) DEFAULT NULL,
`contact_id` int(11) DEFAULT NULL,
`call_status` int(11) DEFAULT NULL,
`occurred_at` datetime DEFAULT NULL,
`notes` text COLLATE utf8_unicode_ci,
`user_id` int(11) DEFAULT NULL,
`scheduled_for` datetime DEFAULT NULL,
`priority` tinyint(1) DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`from_user_id` int(11) DEFAULT NULL,
`to_user_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_activities_on_contact_id` (`contact_id`),
KEY `index_activities_on_user_id` (`user_id`),
KEY `index_activities_on_company_id` (`company_id`)
) ENGINE=InnoDB AUTO_INCREMENT=515340 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
This is a greatest-n-per-group query, which comes up frequently on Stack Overflow.
Here's a solution that uses a MySQL 8.0 window function:
WITH latest_activities AS (
SELECT contact_id, occurred_at,
ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY occurred_at DESC) AS rn
FROM activities
)
SELECT *
FROM contacts AS c
LEFT OUTER JOIN latest_activities
ON c.id = latest_activities.contact_id AND latest_activities.rn = 1
WHERE c.company_id = 20
Here's a solution that should work on pre-8.0 versions:
SELECT c.*, a.*
FROM contacts AS c
LEFT OUTER JOIN activities AS a ON a.contact_id = c.id
LEFT OUTER JOIN activities AS a2 ON a2.contact_id = c.id
AND a2.occurred_at > a.occurred_at
WHERE c.company_id = 20
AND a2.contact_id IS NULL;
Another solution:
SELECT c.*, a.*
FROM contacts AS c
LEFT OUTER JOIN (
SELECT a2.contact_id, MAX(a2.occurred_at) AS occurred_at
FROM activities AS a2
INNER JOIN contacts AS c2 ON a2.contact_id = c2.id
WHERE c2.company_id = 20
GROUP BY a2.contact_id ORDER BY NULL
) AS latest_activities
ON latest_activities.contact_id = c.id
LEFT OUTER JOIN activities AS a
ON a.contact_id = latest_activities.contact_id
AND a.occurred_at = latest_activities.occurred_at
WHERE c.company_id = 20
It would be helpful to create a new index on activities (contact_id, occurred_at).
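
As a quick check of the pre-8.0 self-join solution above, here is a small sketch run against an in-memory SQLite database rather than MySQL. The tables are reduced to the columns the query needs, the data is made up, and the index name is hypothetical; only the (contact_id, occurred_at) column order comes from the suggestion above.

```python
import sqlite3

# SQLite stands in for MySQL; tables are cut down to the relevant columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, company_id INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
-- Hypothetical index name; the column order follows the suggestion above.
CREATE INDEX idx_activities_contact_occurred ON activities (contact_id, occurred_at);
INSERT INTO contacts VALUES (1, 20), (2, 20), (3, 99);
INSERT INTO activities (contact_id, occurred_at) VALUES
  (1, '2020-01-01 10:00:00'),
  (1, '2020-03-01 09:00:00'),
  (1, '2020-02-01 12:00:00'),
  (3, '2020-05-01 08:00:00');
""")

# Keep an activity row only when no later activity exists for the same
# contact (a2 finds nothing, so its columns are NULL).
latest = conn.execute("""
SELECT c.id, a.occurred_at
FROM contacts AS c
LEFT OUTER JOIN activities AS a ON a.contact_id = c.id
LEFT OUTER JOIN activities AS a2
  ON a2.contact_id = c.id AND a2.occurred_at > a.occurred_at
WHERE c.company_id = 20
  AND a2.contact_id IS NULL
ORDER BY c.id
""").fetchall()

print(latest)  # [(1, '2020-03-01 09:00:00'), (2, None)]
```

Contact 2 has no activities and still appears, with a NULL date, which is what the original LEFT OUTER JOIN was after.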
Don't use subqueries in the FROM clause if you can help it. They impede the MySQL optimizer. So, if you want one row:
Select c.*, a.occurred_at
from contacts c left outer join
activities a
on c.id = a.contact_id and
a.occurred_at is not null
where c.company_id = 20
order by a.occurred_at desc
limit 1;
If you want one row per contact_id:
Select c.*, a.occurred_at
from contacts c left outer join
activities a
on c.id = a.contact_id and
a.occurred_at is not null and
a.occurred_at = (select max(a2.occurred_at)
from activities a2
where a2.contact_id = a.contact_id
)
where c.company_id = 20
order by a.occurred_at desc;
This can make use of indexes on activities(contact_id, occurred_at) and contacts(company_id, id).
Your query is doing one thing that is a clear no-no, and is no longer supported by the default settings in recent versions of MySQL: you have unaggregated columns in the SELECT that are not in the GROUP BY. Selecting occurred_at while grouping only by contact_id should be generating an error.
I feel like I am overlooking something with how complicated the other answers are, but I would think this would be all you need.
SELECT c.*
, MAX(a.occurred_at) AS occurred_at
FROM contacts AS c
LEFT JOIN activities AS a
ON c.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE c.company_id = 20
GROUP BY c.id;
Notes: (1) this assumes you didn't actually want the duplicate contact_id from your original subquery to be in the final results. (2) This also assumes your server is not configured to require a full group by; if it is, you will need to manually expand c.* into the full column list, and copy that list to the GROUP BY clause as well.
Expanding on dnoeth's comments to your question; if you are not querying each company separately for a particular reason (chunking for load, code structure handling this also handles other stuff company by company, whatever), you could tweak the above query like so to get all your results in one query.
SELECT con.*
, MAX(a.occurred_at) AS occurred_at
FROM companies AS com
INNER JOIN contacts AS con ON com.id = con.company_id
LEFT JOIN activities AS a
ON con.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE [criteria for companies chosen to be queried]
GROUP BY con.id
ORDER BY con.company_id, con.id
;
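
Here is a small runnable sketch of the GROUP BY / MAX approach above. It assumes an in-memory SQLite database in place of MySQL, tables trimmed to a few columns, and made-up data. The WHERE condition is only a placeholder for whatever company criteria you use, and the column list is written out in full in both SELECT and GROUP BY, as described in note (2).

```python
import sqlite3

# SQLite stands in for MySQL; only a few columns of each table are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, company_id INTEGER, first_name TEXT);
CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
INSERT INTO companies VALUES (20), (21);
INSERT INTO contacts VALUES (1, 20, 'Ann'), (2, 20, 'Bob'), (3, 21, 'Cy');
INSERT INTO activities (contact_id, occurred_at) VALUES
  (1, '2021-01-05 10:00:00'),
  (1, '2021-02-07 11:30:00'),
  (2, NULL),
  (3, '2021-03-01 08:15:00');
""")

# One row per contact with its latest non-NULL activity date.
# `com.id IN (20, 21)` is a placeholder for the real company criteria.
rows = conn.execute("""
SELECT con.id, con.company_id, con.first_name,
       MAX(a.occurred_at) AS occurred_at
FROM companies AS com
INNER JOIN contacts AS con ON com.id = con.company_id
LEFT JOIN activities AS a
  ON con.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE com.id IN (20, 21)
GROUP BY con.id, con.company_id, con.first_name
ORDER BY con.company_id, con.id
""").fetchall()

for row in rows:
    print(row)
```

Bob only has an activity with a NULL date, so he comes back with occurred_at as None instead of being dropped.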

sql can't figure out the query

I have three tables:
CREATE TABLE IF NOT EXISTS `contacts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`providerId` int(10) unsigned NOT NULL DEFAULT '0',
`requestId` int(10) unsigned NOT NULL DEFAULT '0',
`status` binary(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
)
CREATE TABLE IF NOT EXISTS `messages` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`fromuid` int(255) NOT NULL,
`touid` int(255) NOT NULL,
`sentdt` datetime NOT NULL,
`read` tinyint(1) NOT NULL DEFAULT '0',
`readdt` datetime DEFAULT NULL,
`messagetext` longtext CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`),
KEY `id` (`id`)
)
CREATE TABLE IF NOT EXISTS `users` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`email` varchar(255) DEFAULT NULL,
`mobile` varchar(15) NOT NULL,
`password` varchar(255) NOT NULL,
`city` varchar(255) NOT NULL,
`zip` varchar(15) DEFAULT NULL,
`device` varchar(50) DEFAULT NULL,
`version` varchar(10) DEFAULT NULL,
`photo` varchar(255) DEFAULT NULL,
`created` datetime NOT NULL,
`live` enum('0','1') NOT NULL DEFAULT '1',
`authenticationTime` datetime NOT NULL,
`userKey` varchar(255) DEFAULT NULL,
`IP` varchar(50) DEFAULT NULL,
`port` int(10) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `firstname` (`mobile`,`city`,`zip`)
)
And this SQL query that finds out friends/contacts for specified user (user id 1 in this case):
SELECT u.id
,u.mobile
,u.name
,(NOW() - u.authenticationTime) AS authenticateTimeDifference
,u.IP
,f.providerid
,f.requestid
,f.status
,u.port
FROM contacts f
LEFT JOIN users u ON u.id =
IF (
f.providerid = 1
,f.requestid
,f.providerid
) WHERE (
f.providerid = 1
AND f.status = 1
)
OR f.requestid = 1
That works fine, but I also want to join the messages table and show the user's friends/contacts ordered by who talked most recently (latest conversations first), using ORDER BY messages.sentdt DESC. I am unable to figure out how to do that; I tried all kinds of joins but none worked.
Your help will be greatly appreciated. Thanks
Update
Here is sample data above query returns:
In that same resultset, I want to be able to sort based on order by messages.sentdt desc but I am not sure how to pull that in and sort resultset by latest message first
Try this:
select u.id
, u.mobile
, u.name
, (NOW() - u.authenticationTime) as authenticateTimeDifference
, u.IP
, f.providerid
, f.requestid
, f.status
, u.port
from contacts f
left join users u
on u.id = if (f.providerid = 1, f.requestid, f.providerid)
left join (select fromuid, max(sentdt) as sentdt from messages group by fromuid) m
on m.fromuid = if (f.providerid = 1, f.requestid, f.providerid)
where (f.providerid = 1 and f.status = 1)
or f.requestid = 1
order by m.sentdt desc

Improve speed of MySQL Update query with Inner Join and Where

I have an update query that is running slowly and wanted to see if I could improve its performance.
Here is the query
update appmaster a
INNER JOIN appid b ON a.activity = b.activity
SET a.activity_id = b.activity_id
WHERE a.activity_id IS null
appmaster contains about 9 million records, but fewer than 1 million of those have a NULL activity_id.
appid contains 171,000 records.
I have tried setting up an index on activity for both tables, but it doesn't seem to help.
I would set an index on appmaster.activity_id because that is my WHERE condition, but that is also the value I'm setting.
Any ideas?
appmaster table
CREATE TABLE appfiltration.appmaster (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(75) DEFAULT NULL,
activity varchar(150) NOT NULL,
class varchar(150) NOT NULL,
device varchar(60) NOT NULL,
version varchar(60) DEFAULT NULL,
theme varchar(50) DEFAULT NULL,
upload_date date DEFAULT NULL,
activity_id int(11) DEFAULT NULL,
PRIMARY KEY (id),
INDEX idx_activity (activity)
)
ENGINE = INNODB
AUTO_INCREMENT = 9457807
AVG_ROW_LENGTH = 136
CHARACTER SET utf8
COLLATE utf8_unicode_ci;
appid table
CREATE TABLE appfiltration.appid (
activity_id int(11) NOT NULL AUTO_INCREMENT,
activity varchar(200) NOT NULL,
name varchar(50) DEFAULT NULL,
result int(11) NOT NULL DEFAULT 0,
change1 float DEFAULT NULL,
result_last int(11) NOT NULL DEFAULT 0,
change2 float DEFAULT NULL,
result_last2 int(11) NOT NULL DEFAULT 0,
PRIMARY KEY (activity_id, activity)
)
ENGINE = INNODB
AUTO_INCREMENT = 251064
AVG_ROW_LENGTH = 63
CHARACTER SET utf8
COLLATE utf8_unicode_ci;
results from
EXPLAIN SELECT a.activity_id, b.activity_id
FROM appmaster a
JOIN appid b ON a.activity = b.activity
WHERE a.activity_id IS NULL

Join two tables only if a value is null or a specific number

I have three tables in a database:
Product table - +100000 entries
Attribute table (list of possible attributes of a product)
Product attribute table (which contains the value of the attribute of a product)
I am looking for 8 random products and one of their attributes (attribute_id = 2), but if a product doesn't have this attribute it should still appear in the query results. I have tried several SQL queries without success, because the results only show the products that have the attribute and hide the others.
My three tables are like this:
CREATE TABLE `product` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`sku` varchar(20) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`name` varchar(90) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`provider_id` int(11) unsigned DEFAULT NULL,
`url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`active` int(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `sku_UNIQUE` (`sku`)
) ENGINE=InnoDB AUTO_INCREMENT=123965 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `attribute` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL DEFAULT '',
`data_type` varchar(50) DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=30 DEFAULT CHARSET=latin1;
CREATE TABLE `product_attribute` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`product_id` int(11) unsigned NOT NULL,
`attribute_id` int(11) unsigned NOT NULL DEFAULT '6',
`value` longtext NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `product_id` (`product_id`,`attribute_id`)
) ENGINE=InnoDB AUTO_INCREMENT=869437 DEFAULT CHARSET=latin1;
And this is one of the queries I tried. I thought it was correct, but it has the same problem as the others:
SELECT product.id, product.sku, product.name,provider.name as provider_name,
product_attribute.value as author
FROM (`product`)
LEFT JOIN `provider` ON `product`.`provider_id` = `provider`.`id`
LEFT JOIN `product_attribute` ON `product`.`id` = `product_attribute`.`product_id`
WHERE `product`.`active` = '1' AND `product`.`url` IS NOT NULL
AND (`product_attribute`.`attribute_id` = 8 OR `product_attribute`.`attribute_id` IS NULL)
AND `product`.`provider_id` = '7' ORDER BY RAND() LIMIT 8
I tried left, inner, and right joins, and none of them worked.
You should put the condition for the left-joined table in the join, not the WHERE clause:
...
from product
left join provider ON product.provider_id = provider.id
left join product_attribute on product.id = product_attribute.product_id
and product_attribute.attribute_id = 8
where `product`.`active` = '1'
and `product`.`url` IS NOT NULL
and `product`.`provider_id` = '7'
...
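
To see why moving the condition matters, here is a minimal sketch using an in-memory SQLite database as a stand-in for MySQL, with cut-down product and product_attribute tables and made-up rows. Product 2 has attributes, just not attribute 8, and that is exactly the case the WHERE version loses.

```python
import sqlite3

# SQLite stands in for MySQL; tables are reduced to the relevant columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_attribute (product_id INTEGER, attribute_id INTEGER, value TEXT);
INSERT INTO product VALUES (1, 'has author'), (2, 'other attribute only'), (3, 'no attributes');
INSERT INTO product_attribute VALUES (1, 8, 'Jane Doe'), (1, 6, 'x'), (2, 6, 'y');
""")

# Condition in WHERE: rows for product 2 exist after the join (attribute 6),
# so they are neither attribute 8 nor NULL, and the product disappears.
where_version = conn.execute("""
SELECT p.id, pa.value
FROM product AS p
LEFT JOIN product_attribute AS pa ON p.id = pa.product_id
WHERE pa.attribute_id = 8 OR pa.attribute_id IS NULL
ORDER BY p.id
""").fetchall()

# Condition in ON: only attribute 8 rows are joined, and every product
# without one still appears with a NULL value.
on_version = conn.execute("""
SELECT p.id, pa.value
FROM product AS p
LEFT JOIN product_attribute AS pa
  ON p.id = pa.product_id AND pa.attribute_id = 8
ORDER BY p.id
""").fetchall()

print(where_version)  # [(1, 'Jane Doe'), (3, None)]
print(on_version)     # [(1, 'Jane Doe'), (2, None), (3, None)]
```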