Problem with sorting comments in date order - mysql

I am a novice in MySQL and I have a problem sorting the result of a join between two tables.
This query is meant to list books by their newest comments, but the books come out sorted by the first comment on each one, not the latest.
SELECT b.*, c.date_added as date FROM books b
LEFT JOIN comments c ON (b.id = c.book_id)
GROUP BY b.id
ORDER BY date DESC
LIMIT 5
CREATE TABLE IF NOT EXISTS `books` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`caption` varchar(255) COLLATE utf8_bin NOT NULL,
`author` varchar(255) COLLATE utf8_bin NOT NULL,
`pages` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`filename` varchar(255) COLLATE utf8_bin NOT NULL,
`description` text COLLATE utf8_bin NOT NULL,
`date_added` datetime NOT NULL,
`publisher` varchar(255) COLLATE utf8_bin NOT NULL,
`price` decimal(10,2) NOT NULL,
`times_sold` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE IF NOT EXISTS `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`book_id` int(11) NOT NULL,
`author` varchar(255) COLLATE utf8_bin NOT NULL,
`email` varchar(255) COLLATE utf8_bin NOT NULL,
`body` text COLLATE utf8_bin NOT NULL,
`date_added` datetime NOT NULL,
`approved` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Thank you for your time.

This should work. You are grouping on b.id, but since there are multiple comments for each book, you need an aggregate function on c.date_added. In this case you can use MAX to get the most recent comment date.
SELECT b.*, MAX(c.date_added) as date FROM books b
LEFT JOIN comments c ON (b.id = c.book_id)
GROUP BY b.id
ORDER BY MAX(c.date_added) DESC
LIMIT 5
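To see the effect of MAX on the ordering, here is a minimal sketch that runs the same query shape from Python. It uses an in-memory SQLite database as a stand-in for MySQL, and the sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, caption TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, book_id INTEGER, date_added TEXT);
    INSERT INTO books VALUES (1, 'Old favourite'), (2, 'New release'), (3, 'No comments yet');
    INSERT INTO comments (book_id, date_added) VALUES
        (1, '2012-01-01 10:00:00'),
        (2, '2012-02-01 10:00:00'),
        (1, '2012-03-01 10:00:00');
""")

# One row per book, ordered by the date of its most recent comment.
rows = conn.execute("""
    SELECT b.id, b.caption, MAX(c.date_added) AS last_comment
    FROM books b
    LEFT JOIN comments c ON b.id = c.book_id
    GROUP BY b.id
    ORDER BY last_comment DESC
    LIMIT 5
""").fetchall()

latest_first = [row[0] for row in rows]
print(latest_first)
```

Book 1 has the oldest first comment but the newest latest comment, so it is listed first; the book without comments gets NULL and ends up last.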

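The GROUP BY clause is not needed for this and you can remove it.
GROUP BY is meant for use with aggregate functions like MIN() and MAX(). Note that without it the query returns one row per comment, so the same book can appear more than once.
Try this: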
SELECT b.*, c.date_added FROM books b
JOIN comments c ON
c.book_id = b.id
ORDER BY c.date_added DESC
LIMIT 5

There's no field called 'date', you need to order it by date_added (ORDER BY date_added DESC) as it is in your table!
If it's still not in the right order, just use ASC instead of DESC.

Related

How to optimize MySQL query with multiple identical subqueries and multiple grouping

Problem statement
The two identical sub-queries (i.e., "a" and "b") are used to derive sub-query "n", which I further aggregate to get the final result. The response time of the query is not optimal; can anyone share some ideas to help optimize it? I tried to combine "a" and "b" as well as "a" and "n", but both turned out to be dead ends as far as I can tell.
select n.businessLiaision,
n.channel,
n.name,
n.dt,
n.ifNew,
count(n.productRowId),
sum(n.totalQty),
sum(n.totalAmount)
from ( select a.productRowId,
a.name,
a.rowId,
a.dt,
a.businessLiaision,
a.channel,
a.ct,
a.totalQty,
a.totalAmount,
case when a.ct = sum(b.ct) then 'true' else 'false' end as 'ifNew'
from ( select d.productRowId,
p.name,
DATE_FORMAT(o.effectiveTime, '%m/%Y') as 'dt',
p.rowId,
p.businessLiaision,
p.channel,
count(*) as 'ct',
sum(d.qty) as 'totalQty',
sum(d.amountPostDiscount) as 'totalAmount'
from transactionParty as p
join transactionOrderHist as o on p.rowId = o.transactionPartyRowId
join transactionOrderDetailHist as d on o.rowId = d.orderRowId
where o.businessType = 'sales'
group by d.productRowId, p.name, DATE_FORMAT(o.effectiveTime, '%m/%Y'), p.rowId, p.businessLiaision, p.channel
) as a
left join ( select d.productRowId,
p.name,
DATE_FORMAT(o.effectiveTime, '%m/%Y') as 'dt',
count(*) as 'ct'
from transactionParty as p
join transactionOrderHist as o on p.rowId = o.transactionPartyRowId
join transactionOrderDetailHist as d on o.rowId = d.orderRowId
where o.businessType = 'sales'
group by d.productRowId, p.name, DATE_FORMAT(o.effectiveTime, '%m/%Y')
) as b on b.productRowId = a.productRowId and b.name = a.name and b.dt <= a.dt
group by a.productRowId, a.name, a.rowId, a.dt, a.ct, a.businessLiaision, a.channel, a.ct, a.totalQty, a.totalAmount
) as n
group by n.businessLiaision, n.channel, n.name, n.dt, n.ifNew
Explain plan result
(image not included)
Table descriptions
transactionParty
CREATE TABLE `transactionParty` (
`rowId` varchar(50) NOT NULL,
`name` varchar(100) DEFAULT NULL,
`code` varchar(20) DEFAULT NULL,
`businessLiaision` varchar(20) DEFAULT NULL,
`type` varchar(20) DEFAULT NULL,
`contractualType` varchar(20) DEFAULT NULL,
`paymentMethod` varchar(20) DEFAULT NULL,
`partyGroup` varchar(20) DEFAULT NULL,
`channel` varchar(20) DEFAULT NULL,
`costCenter` varchar(20) DEFAULT NULL,
`warehouseRowId` varchar(100) DEFAULT NULL,
`taxOption` varchar(20) DEFAULT NULL,
PRIMARY KEY (`rowId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
transactionOrderHist
CREATE TABLE `transactionOrderHist` (
`rowId` varchar(50) NOT NULL COMMENT '明道记录ID',
`orderId` varchar(50) DEFAULT NULL COMMENT '单据ID',
`orderCreationTime` datetime DEFAULT NULL COMMENT '单据创建时间',
`orderCreator` varchar(20) DEFAULT NULL COMMENT '单据创建人',
`businessType` varchar(20) DEFAULT NULL COMMENT '业务类型',
`businessLiaision` varchar(20) DEFAULT NULL COMMENT '内部业务负责人,如业务员/采购/文员',
`transactionPartyRowId` varchar(50) DEFAULT NULL COMMENT '往来单位',
`outboundWarehouse` varchar(50) DEFAULT NULL COMMENT '发货仓',
`inboundWarehouse` varchar(50) DEFAULT NULL COMMENT '收货仓',
`outboundWarehouseType` varchar(50) DEFAULT NULL,
`inboundWarehouseType` varchar(50) DEFAULT NULL,
`effectiveTime` datetime DEFAULT '0000-00-00 00:00:00' COMMENT '单据过账时间',
`orderEffectuater` varchar(20) DEFAULT NULL COMMENT '单据过账人',
`remark` varchar(200) DEFAULT NULL,
`productCount` int(10) DEFAULT NULL,
`totalUnitCount` int(10) DEFAULT NULL,
`totalCostAmount` decimal(10,0) DEFAULT NULL,
`totalPostDiscountAmount` decimal(10,0) DEFAULT NULL,
`paymentStatus` varchar(50) DEFAULT NULL,
`paymentDate` datetime DEFAULT NULL,
`outboundWarehouseRowId` varchar(50) DEFAULT NULL,
`inboundWarehouseRowId` varchar(50) DEFAULT NULL,
PRIMARY KEY (`rowId`) USING BTREE,
KEY `orderEffectiveDate` (`effectiveTime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
transactionOrderDetailHist
CREATE TABLE `transactionOrderDetailHist` (
`rowId` varchar(50) NOT NULL,
`orderRowId` varchar(50) DEFAULT NULL,
`productRowId` varchar(50) DEFAULT NULL,
`qty` int(20) DEFAULT NULL,
`price` decimal(20,2) DEFAULT NULL,
`cost` decimal(20,2) DEFAULT NULL,
`amount` decimal(20,2) DEFAULT NULL,
`amountPostDiscount` decimal(20,2) DEFAULT NULL,
`type` varchar(50) DEFAULT NULL,
`effectiveTime` datetime DEFAULT NULL,
`costAmount` decimal(20,2) DEFAULT NULL,
PRIMARY KEY (`rowId`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
FROM ( SELECT ... )
[LEFT] JOIN ( SELECT ... ) ON ...
is notoriously inefficient. Try to rewrite the query to avoid that pattern.
Please provide EXPLAIN SELECT ...; meanwhile, I will do some guessing.
These indexes may help:
d: INDEX(orderRowId, productRowId, qty, amountPostDiscount)
o: INDEX(businessType, transactionPartyRowId, effectiveTime, rowId)
The final GROUP BY does not necessarily order the rows. If you want a particular order, add an ORDER BY. Be aware that sorting by dt is not chronological, since the '%m/%Y' format puts the month before the year. The other GROUP BYs are in derived tables, so an ORDER BY there would be ignored.
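The month-before-year issue can be checked outside the database. The sketch below uses plain Python string sorting, which for these digit-only labels orders the same way as a text comparison; the three sample months and the year-first '%Y-%m' alternative are assumptions for illustration, not part of the original query.

```python
from datetime import date

# Three consecutive months spanning a year boundary (hypothetical sample data).
months = [date(2020, 12, 1), date(2021, 1, 1), date(2021, 2, 1)]

month_first = [d.strftime("%m/%Y") for d in months]  # format used in the query
year_first = [d.strftime("%Y-%m") for d in months]   # sortable alternative

sorted_month_first = sorted(month_first)
sorted_year_first = sorted(year_first)

print(sorted_month_first)  # December 2020 sorts after both 2021 months
print(sorted_year_first)   # chronological

# The same effect applies to a text comparison like b.dt <= a.dt:
earlier_is_leq = "12/2020" <= "01/2021"
print(earlier_is_leq)
```

With the month first, December 2020 compares as later than January 2021, so both an ORDER BY on dt and the b.dt <= a.dt join condition may give unexpected results across year boundaries.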
If the join condition "and b.name = a.name and b.dt <= a.dt" is meant to do a "groupwise-max", that is the performance bottleneck. There are much better ways to do it. See Groupwise-Max.
If the text is in Chinese, you should consider converting to utf8mb4 -- utf8 is missing the 4-byte UTF-8 Chinese characters.
It is OK to have the PRIMARY KEY as VARCHAR(50), but it could be better to use INT. Is some outside process providing the rowId?

Mysql - PHP - Checking a users posts and comments likes

Would anyone be able to recommend the best way to check if a user has liked a post or comment?
I am currently building a website that has similar features to Facebook's wall.
My website will show a 'wall' of posts from people you follow that you can like or comment on.
For example, comments I have:
Comments table containing: id, user_id, text (plus other columns)
Comments Likes table: comment_id, user_id, created
This is the current query I use to get the comments and check whether the user has liked each one, using a left join on the likes table. It uses IF() to return liked as either 1 or empty, which works fine:
SELECT comments.id, comments.post_id, comments.user_id, comments.reply_id, comments.created, comments.text, comments.likes, comments.replies, comments.flags, user.name, user.tagline, user.photo_id, user.photo_file, user.public_key,
IF(likes.created IS NULL, '', '1') as 'liked'
FROM events_feed_comments AS comments
INNER JOIN user AS user ON comments.user_id = user.id
LEFT JOIN events_feed_comments_likes AS likes ON comments.id = likes.comment_id AND likes.user_id = :user
WHERE comments.post_id = :post_id AND comments.reply_id IS NULL
ORDER BY comments.created DESC
LIMIT :limit OFFSET :offset
However, I realise that this result will not be cacheable across users, as it contains the logged-in user's likes. There may end up being a lot of posts, so I will need to introduce caching.
I am wondering what the best way to check the likes would be.
At the moment these are the solutions I can think of:
1. Select the comments, limited to say 30 at a time (cacheable), then loop over each result doing a fetch/count query on the likes table to see if the user has liked it.
2. Select the comments as above, then do a single fetch from the likes table using a WHERE IN clause with the 30 returned ids, and loop over the comments to see which ids appear in the likes result.
3. Fetch all of the comments (cacheable), fetch all of the user's likes (could be cacheable?), then do some looping / comparing to see if the values match.
I am just not sure what would be the best solution, or whether there is some other recommended way to achieve this.
I am thinking the second approach may be best, but I'm interested to see what you think.
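For reference, the second approach could look roughly like the sketch below. It is written in Python against an in-memory SQLite database rather than PHP and MySQL, the helper names (fetch_comments, liked_ids, comments_for_user) are hypothetical, and the tables are cut down to the columns that matter here.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events_feed_comments (
        id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT, created TEXT);
    CREATE TABLE events_feed_comments_likes (
        comment_id INTEGER, user_id INTEGER, created TEXT,
        PRIMARY KEY (comment_id, user_id));
    INSERT INTO events_feed_comments VALUES
        (1, 10, 'first', '2021-01-01'),
        (2, 10, 'second', '2021-01-02'),
        (3, 10, 'third', '2021-01-03');
    INSERT INTO events_feed_comments_likes VALUES
        (1, 7, '2021-01-05'), (3, 7, '2021-01-06'), (2, 8, '2021-01-06');
""")

def fetch_comments(post_id, limit=30, offset=0):
    """User-independent page of comments; this is the part that could be cached."""
    cur = conn.execute(
        "SELECT id, text FROM events_feed_comments WHERE post_id = ? "
        "ORDER BY created DESC LIMIT ? OFFSET ?",
        (post_id, limit, offset))
    return [{"id": r[0], "text": r[1]} for r in cur]

def liked_ids(user_id, comment_ids):
    """One query: which of these comment ids has the user liked?"""
    if not comment_ids:
        return set()
    placeholders = ",".join("?" * len(comment_ids))
    cur = conn.execute(
        "SELECT comment_id FROM events_feed_comments_likes "
        f"WHERE user_id = ? AND comment_id IN ({placeholders})",
        [user_id, *comment_ids])
    return {r[0] for r in cur}

def comments_for_user(post_id, user_id):
    comments = fetch_comments(post_id)
    liked = liked_ids(user_id, [c["id"] for c in comments])
    # Copy each comment so a cached page would not be modified per user.
    return [dict(c, liked=c["id"] in liked) for c in comments]

result = comments_for_user(10, 7)
print(result)
```

This keeps the per-user work to one extra indexed lookup per page, using the existing primary key (comment_id, user_id) on the likes table.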
Update: here are the CREATE TABLE statements.
CREATE TABLE `events_feed_comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
`reply_id` int(11) DEFAULT NULL,
`text` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`likes` int(11) NOT NULL,
`replies` int(11) NOT NULL,
`flags` smallint(6) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `events_feed_comments_likes` (
`comment_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`comment_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`photo_id` int(11) DEFAULT NULL,
`email` varchar(180) COLLATE utf8mb4_unicode_ci NOT NULL,
`roles` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL CHECK (json_valid(`roles`)),
`password` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`name` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL,
`tagline` varchar(120) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`biography` varchar(2000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`social` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL CHECK (json_valid(`social`)),
`specialties` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`available` smallint(6) NOT NULL DEFAULT 0,
`theme` varchar(7) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`photo_file` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`following` int(11) NOT NULL,
`followers` int(11) NOT NULL,
`is_private` smallint(6) NOT NULL,
`public_key` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
`show_groups` smallint(6) NOT NULL,
`show_feed` smallint(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_8D93D649E7927C74` (`email`),
UNIQUE KEY `UNIQ_8D93D64966F9D463` (`public_key`),
UNIQUE KEY `UNIQ_8D93D6497E9E4C8C` (`photo_id`),
CONSTRAINT `FK_8D93D6497E9E4C8C` FOREIGN KEY (`photo_id`) REFERENCES `photos` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Instead of using IF() in your SQL query, consider using the standard SQL CASE expression, which is portable across databases:
CASE
WHEN likes.created IS NULL THEN ''
ELSE '1'
END AS liked
For performance:
comments: INDEX(post_id, reply_id, created)
likes: INDEX(comment_id, user_id, created)
Those improvements may eliminate the need for "caching".
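As a rule, put "filtering" in WHERE and "relations" in ON; it helps a human reading the query. It matters when using LEFT JOIN, though: a test on the right-hand table such as AND likes.user_id = :user has to stay in ON here, because in WHERE it would drop every comment the user has not liked.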
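A small sketch of why the placement can matter with LEFT JOIN. It uses an in-memory SQLite database with simplified, hypothetical comments and likes tables, and runs the same join twice: once with the user test in ON and once with it in WHERE.

```python
import sqlite3

# Simplified stand-ins for the comments and likes tables; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE likes (comment_id INTEGER, user_id INTEGER,
                        PRIMARY KEY (comment_id, user_id));
    INSERT INTO comments VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO likes VALUES (1, 7), (2, 8);
""")

# User test inside ON: every comment is kept, liked is 1 or 0.
in_on = conn.execute("""
    SELECT c.id, l.user_id IS NOT NULL AS liked
    FROM comments c
    LEFT JOIN likes l ON l.comment_id = c.id AND l.user_id = ?
    ORDER BY c.id
""", (7,)).fetchall()

# User test inside WHERE: rows without a matching like are filtered out.
in_where = conn.execute("""
    SELECT c.id, l.user_id IS NOT NULL AS liked
    FROM comments c
    LEFT JOIN likes l ON l.comment_id = c.id
    WHERE l.user_id = ?
    ORDER BY c.id
""", (7,)).fetchall()

print(in_on)
print(in_where)
```

With the test in WHERE, the LEFT JOIN behaves like an inner join for this purpose, which is not what the comment listing needs.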
If events_feed_comments_likes is a many-to-many mapping table, you may want INDEX(user_id) also.
I assume this is the query in question:
SELECT comments.id, comments.post_id, comments.user_id, comments.reply_id,
comments.created, comments.text, comments.likes, comments.replies,
comments.flags, user.name, user.tagline,
user.photo_id, user.photo_file, user.public_key,
IF(likes.created IS NULL, '', '1') as 'liked'
FROM events_feed_comments AS comments
INNER JOIN user AS user ON comments.user_id = user.id
LEFT JOIN events_feed_comments_likes AS likes ON comments.id = likes.comment_id
AND likes.user_id = :user
WHERE comments.post_id = :post_id
AND comments.reply_id IS NULL
ORDER BY comments.created DESC
LIMIT :limit OFFSET :offset

How to speed up SQL query with multiple JOINs?

The below SQL query took 8.0943 seconds to execute. Is there a better way to speed this up?
SELECT
e.idno, e.estatus,
p.idno, p.id, p.time, p.date, p.employee, p.status, p.comment
FROM e_company_data e
INNER JOIN people_attendance p ON p.idno = e.idno
WHERE p.id = (SELECT MAX(id) FROM people_attendance p1
WHERE p1.idno = p.idno)
AND e.estatus = 1 ORDER BY e.idno
I have already indexed the following.
Table: people_attendance
Columns: idno, date, time, employee, status, comment
Table: e_company_data
Columns: idno, estatus
I might have set up the indexes incorrectly. Any help would be greatly appreciated. Thanks.
(From pastebin)
CREATE TABLE `people_attendance` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`reference` int(11) DEFAULT NULL,
`idno` varchar(11) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`date` date DEFAULT NULL,
`employee` varchar(80) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`status` varchar(15) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`time` time DEFAULT NULL,
`comment` varchar(80) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`reason` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL,
`counter` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idxidno` (`idno`),
KEY `idxattendance` (`employee`,`status`,`date`,`time`,`comment`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=12888 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `e_company_data` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`reference` int(11) NOT NULL,
`company` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT '',
`department` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT '0',
`jobposition` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT '',
`companyemail` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT '',
`idno` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT '',
`pin` varchar(4) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`startdate` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT '',
`dateregularized` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT '',
`reason` varchar(455) COLLATE utf8mb4_unicode_ci DEFAULT '',
`leaveprivilege` int(11) DEFAULT NULL,
`estatus` int(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `idxcompdata` (`idno`,`department`,`estatus`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=130 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Give this a try:
SELECT e.idno, e.estatus, p.idno, p.id, p.time, p.date, p.employee,
p.status, p.comment
FROM ( SELECT idno, MAX(id) AS last_id
FROM people_attendance
GROUP BY idno ) AS x
JOIN e_company_data e USING(idno)
JOIN people_attendance p ON p.id = x.last_id
WHERE e.estatus = 1
ORDER BY e.idno
The principle is to turn the correlated subquery into a derived table. Instead of 130 probes, it is one quick scan of a covering INDEX(idno, id) to get the 130 rows. After that, the rest is efficient JOINs.
Also, add INDEX(idno, estatus) (in either order) to e_company_data.
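The derived-table idea can be tried on a toy data set. The sketch below uses an in-memory SQLite database with cut-down versions of the two tables and made-up rows; it joins with ON instead of USING only to keep the column references explicit.

```python
import sqlite3

# SQLite stands in for MySQL; only the columns needed for the demo are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE e_company_data (idno TEXT, estatus INTEGER);
    CREATE TABLE people_attendance (id INTEGER PRIMARY KEY, idno TEXT, status TEXT);
    CREATE INDEX idx_idno_id ON people_attendance (idno, id);
    INSERT INTO e_company_data VALUES ('A1', 1), ('B2', 1), ('C3', 0);
    INSERT INTO people_attendance (id, idno, status) VALUES
        (1, 'A1', 'IN'), (2, 'B2', 'IN'), (3, 'A1', 'OUT'),
        (4, 'C3', 'IN'), (5, 'B2', 'OUT'), (6, 'A1', 'IN');
""")

# x holds one row per idno with its highest id; the joins then fetch that row.
latest = conn.execute("""
    SELECT e.idno, p.id, p.status
    FROM (SELECT idno, MAX(id) AS last_id
          FROM people_attendance
          GROUP BY idno) AS x
    JOIN e_company_data e ON e.idno = x.idno
    JOIN people_attendance p ON p.id = x.last_id
    WHERE e.estatus = 1
    ORDER BY e.idno
""").fetchall()

print(latest)
```

Each active employee comes back once, with the attendance row that has the highest id.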
Possibly using window functions:
SELECT e.idno, e.estatus,
p.idno, p.id, p.time, p.date, p.employee, p.status, p.comment
FROM e_company_data e JOIN
(SELECT p.*, ROW_NUMBER() OVER (PARTITION BY p.idno ORDER BY p.id DESC) as seqnum
FROM people_attendance p
) p
ON p.idno = e.idno AND seqnum = 1
WHERE e.estatus = 1
ORDER BY e.idno;
This should benefit from indexes on people_attendance(idno, id desc) and e_company_data(estatus, idno).
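A runnable sketch of the ROW_NUMBER() variant, again against an in-memory SQLite database with simplified tables and made-up rows. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; on MySQL the equivalent requires 8.0.

```python
import sqlite3

# Requires SQLite 3.25+ for window functions (assumption about the local build).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE e_company_data (idno TEXT, estatus INTEGER);
    CREATE TABLE people_attendance (id INTEGER PRIMARY KEY, idno TEXT, status TEXT);
    INSERT INTO e_company_data VALUES ('A1', 1), ('B2', 1), ('C3', 0);
    INSERT INTO people_attendance (id, idno, status) VALUES
        (1, 'A1', 'IN'), (2, 'B2', 'IN'), (3, 'A1', 'OUT'),
        (4, 'C3', 'IN'), (5, 'B2', 'OUT'), (6, 'A1', 'IN');
""")

# seqnum = 1 marks the row with the highest id within each idno.
windowed = conn.execute("""
    SELECT e.idno, p.id, p.status
    FROM e_company_data e
    JOIN (SELECT pa.*,
                 ROW_NUMBER() OVER (PARTITION BY pa.idno ORDER BY pa.id DESC) AS seqnum
          FROM people_attendance pa) p
      ON p.idno = e.idno AND p.seqnum = 1
    WHERE e.estatus = 1
    ORDER BY e.idno
""").fetchall()

print(windowed)
```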
EDIT:
For your version of the query:
SELECT e.idno, e.estatus,
p.idno, p.id, p.time, p.date, p.employee, p.status, p.comment
FROM e_company_data e JOIN
people_attendance p
ON p.idno = e.idno
WHERE p.id = (SELECT MAX(p2.id)
FROM people_attendance p2
WHERE p2.idno = p.idno
) AND
e.estatus = 1
ORDER BY e.idno;
I would recommend indexes on e_company_data(estatus, idno) and people_attendance(idno, id).
In addition to Rick James's answer, keep in mind that your query is slow because of the aggregate subquery "SELECT MAX(id)". Consider adding a field that is maintained on update and stores max(id).

MySQL Select Query running very slow

I have a MySQL query that works but is very slow. I am guessing this is due to the number of joins.
SELECT
order_header.order_head_id,
order_header.order_date,
order_header.status,
suppliers.supplier,
categories.category,
order_header.user,
order_header.sage_ref,
SUM(order_lines.total_price) AS price
FROM
order_header
LEFT JOIN
order_lines ON order_header.order_head_id = order_lines.order_head_id
LEFT JOIN
suppliers ON order_header.supplier_id = suppliers.supp_id
LEFT JOIN
categories ON order_header.category = categories.cat_id
WHERE
order_header.status LIKE '%'
AND order_header.order_head_id LIKE '%'
AND order_header.user LIKE '%'
GROUP BY order_header.order_head_id
ORDER BY order_head_id DESC
LIMIT 50;
Results of the EXPLAIN query
SHOW CREATE TABLE results
CREATE TABLE `categories` (
`cat_id` int(11) NOT NULL AUTO_INCREMENT,
`category` varchar(45) DEFAULT NULL,
`status` varchar(45) DEFAULT NULL,
PRIMARY KEY (`cat_id`)
) ENGINE=InnoDB AUTO_INCREMENT=63 DEFAULT CHARSET=latin1
CREATE TABLE `order_header` (
`order_head_id` int(11) NOT NULL AUTO_INCREMENT,
`status` varchar(45) DEFAULT NULL,
`category` varchar(45) NOT NULL,
`order_date` date DEFAULT NULL,
`supplier_id` varchar(45) NOT NULL,
`user` varchar(45) DEFAULT NULL,
`sage_ref` varchar(45) DEFAULT NULL,
`query_notes` varchar(500) DEFAULT NULL,
PRIMARY KEY (`order_head_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2249 DEFAULT CHARSET=latin1
CREATE TABLE `order_lines` (
`order_lines_id` int(11) NOT NULL AUTO_INCREMENT,
`order_head_id` int(11) DEFAULT NULL,
`qty` int(11) DEFAULT NULL,
`description` varchar(255) DEFAULT NULL,
`unit_price` decimal(65,2) DEFAULT NULL,
`total_price` decimal(65,2) DEFAULT NULL,
PRIMARY KEY (`order_lines_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3981 DEFAULT CHARSET=latin1
CREATE TABLE `suppliers` (
`supp_id` int(11) NOT NULL AUTO_INCREMENT,
`supplier` varchar(255) DEFAULT NULL,
`status` varchar(225) DEFAULT NULL,
PRIMARY KEY (`supp_id`)
) ENGINE=InnoDB AUTO_INCREMENT=161 DEFAULT CHARSET=latin1
SQL Version 5.6.30
I am not great on MySQL and was wondering if anyone can see a way to improve the query so that it runs quicker.
Your help would be gratefully appreciated.
Many thanks,
John
It can make sense to wrap the first (left) join into a GROUP BY subquery. GROUP BY and LIMIT will then limit the number of rows used in the following two joins:
SELECT
x.order_head_id,
x.order_date,
x.status,
suppliers.supplier,
categories.category,
x.user,
x.sage_ref,
x.price
FROM (
SELECT
order_header.supplier_id,
order_header.category,
order_header.order_head_id,
order_header.order_date,
order_header.status,
order_header.user,
order_header.sage_ref,
SUM(order_lines.total_price) AS price
FROM order_header
LEFT JOIN order_lines ON order_header.order_head_id = order_lines.order_head_id
WHERE order_header.status LIKE '%'
AND order_header.order_head_id LIKE '%'
AND order_header.user LIKE '%'
GROUP BY order_header.order_head_id
ORDER BY order_head_id DESC
LIMIT 50
) x
LEFT JOIN suppliers ON x.supplier_id = suppliers.supp_id
LEFT JOIN categories ON x.category = categories.cat_id
ORDER BY order_head_id DESC
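The same shape on a toy data set, as a sketch: aggregate and LIMIT inside the derived table first, then join the lookup table against the few remaining rows. It uses an in-memory SQLite database, reduced tables, made-up rows, and LIMIT 2 instead of 50 so the effect is visible.

```python
import sqlite3

# SQLite stands in for MySQL; columns are reduced to what the demo needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (supp_id INTEGER PRIMARY KEY, supplier TEXT);
    CREATE TABLE order_header (order_head_id INTEGER PRIMARY KEY, supplier_id INTEGER);
    CREATE TABLE order_lines (order_lines_id INTEGER PRIMARY KEY,
                              order_head_id INTEGER, total_price REAL);
    INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO order_header VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO order_lines (order_head_id, total_price) VALUES
        (1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0), (3, 2.0);
""")

# Only the newest 2 orders survive the derived table, so the supplier
# lookup runs for 2 rows instead of for every order.
newest = conn.execute("""
    SELECT x.order_head_id, s.supplier, x.price
    FROM (SELECT h.order_head_id, h.supplier_id, SUM(l.total_price) AS price
          FROM order_header h
          LEFT JOIN order_lines l ON h.order_head_id = l.order_head_id
          GROUP BY h.order_head_id
          ORDER BY h.order_head_id DESC
          LIMIT 2) AS x
    LEFT JOIN suppliers s ON x.supplier_id = s.supp_id
    ORDER BY x.order_head_id DESC
""").fetchall()

print(newest)
```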

MySql query is slow with join - how to speed it up

I have to export 554k records from our mysql db. At the current rate it will take 5 days to export the data and the slowness is mainly caused by the query below. The data structure consists of
Companies
--Contacts
----(Contact)Activities
For the contacts, we have an index on company_id. On the activities table, we have an index for contact_id and company_id which map back to the respective contacts and companies tables.
I need to grab each contact and the latest activity date that they have. This is the query that I'm running and it takes about .5 second to execute.
Select *
from contacts
left outer join (select occurred_at
,contact_id
from activities
where occurred_at is not null
group by contact_id
order by occurred_at desc) activities
on contacts.id = activities.contact_id
where company_id = 20
If I remove the join and just select * from contacts where company_id=20 the query executes in .016 sec.
If I use Explain for info on the join query I get this
Any ideas on how I can speed this up?
Edit:
Here are the table definitions.
CREATE TABLE `companies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`street_address` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`state` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`county` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`website` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`falloff_date` date DEFAULT NULL,
`zipcode` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`order_count` int(11) NOT NULL DEFAULT '0',
`active_job_count` int(11) NOT NULL DEFAULT '0',
`duplicate_of` int(11) DEFAULT NULL,
`warm_date` datetime DEFAULT NULL,
`employee_size` int(11) DEFAULT NULL,
`dup_checked` tinyint(1) DEFAULT '0',
`rating` int(11) DEFAULT NULL,
`delinquent` tinyint(1) DEFAULT '0',
`cconly` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `index_companies_on_name` (`name`),
KEY `index_companies_on_user_id` (`user_id`),
KEY `index_companies_on_company_id` (`company_id`),
KEY `index_companies_on_external_id` (`external_id`),
KEY `index_companies_on_state_and_dup_checked` (`id`,`state`,`dup_checked`,`duplicate_of`),
KEY `index_companies_on_dup_checked` (`id`,`dup_checked`),
KEY `index_companies_on_dup_checked_name` (`dup_checked`,`name`),
KEY `index_companies_on_county` (`county`,`state`)
) ENGINE=InnoDB AUTO_INCREMENT=15190300 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `contacts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`last_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`extension` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`fax` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`active` tinyint(1) DEFAULT NULL,
`main` tinyint(1) DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`second_phone` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_contacts_on_company_id` (`company_id`),
KEY `index_contacts_on_first_name` (`first_name`),
KEY `index_contacts_on_last_name` (`last_name`),
KEY `index_contacts_on_phone` (`phone`),
KEY `index_contacts_on_email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=11241088 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `activities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`kind` int(11) DEFAULT NULL,
`contact_id` int(11) DEFAULT NULL,
`call_status` int(11) DEFAULT NULL,
`occurred_at` datetime DEFAULT NULL,
`notes` text COLLATE utf8_unicode_ci,
`user_id` int(11) DEFAULT NULL,
`scheduled_for` datetime DEFAULT NULL,
`priority` tinyint(1) DEFAULT NULL,
`company_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`from_user_id` int(11) DEFAULT NULL,
`to_user_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_activities_on_contact_id` (`contact_id`),
KEY `index_activities_on_user_id` (`user_id`),
KEY `index_activities_on_company_id` (`company_id`)
) ENGINE=InnoDB AUTO_INCREMENT=515340 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
This is a greatest-n-per-group query, which comes up frequently on Stack Overflow.
Here's a solution that uses a MySQL 8.0 window function:
WITH latest_activities AS (
SELECT contact_id, occurred_at,
ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY occurred_at DESC) AS rn
FROM activities
)
SELECT *
FROM contacts AS c
LEFT OUTER JOIN latest_activities
ON c.id = latest_activities.contact_id AND latest_activities.rn = 1
WHERE c.company_id = 20
Here's a solution that should work on pre-8.0 versions:
SELECT c.*, a.*
FROM contacts AS c
LEFT OUTER JOIN activities AS a ON a.contact_id = c.id
LEFT OUTER JOIN activities AS a2 ON a2.contact_id = c.id
AND a2.occurred_at > a.occurred_at
WHERE c.company_id = 20
AND a2.contact_id IS NULL;
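To illustrate how the self-join picks the latest row, here is a sketch on made-up data using an in-memory SQLite database. The second join looks for a later activity for the same contact; where none exists (a2.contact_id IS NULL), the row from a is the latest one. The tables are reduced to a few columns for the demo.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, company_id INTEGER, first_name TEXT);
    CREATE TABLE activities (id INTEGER PRIMARY KEY, contact_id INTEGER, occurred_at TEXT);
    CREATE INDEX idx_activities_contact_occurred ON activities (contact_id, occurred_at);
    INSERT INTO contacts VALUES (1, 20, 'Ann'), (2, 20, 'Bob'), (3, 21, 'Cy');
    INSERT INTO activities (contact_id, occurred_at) VALUES
        (1, '2015-01-01 09:00:00'),
        (1, '2015-03-01 09:00:00'),
        (3, '2015-02-01 09:00:00');
""")

latest_activity = conn.execute("""
    SELECT c.id, c.first_name, a.occurred_at
    FROM contacts AS c
    LEFT OUTER JOIN activities AS a ON a.contact_id = c.id
    LEFT OUTER JOIN activities AS a2 ON a2.contact_id = c.id
                                    AND a2.occurred_at > a.occurred_at
    WHERE c.company_id = 20
      AND a2.contact_id IS NULL
    ORDER BY c.id
""").fetchall()

print(latest_activity)
```

Contacts without any activity are still returned, with NULL in place of the date. If two activities for one contact share the same latest occurred_at, this form returns both rows.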
Another solution:
SELECT c.*, a.*
FROM contacts AS c
LEFT OUTER JOIN activities AS a ON a.contact_id = c.id
LEFT OUTER JOIN (
SELECT a2.contact_id, MAX(a2.occurred_at) AS occurred_at
FROM activities AS a2
INNER JOIN contacts AS c2 ON a2.contact_id = c2.id
WHERE c2.company_id = 20
GROUP BY a2.contact_id ORDER BY NULL
) AS latest_activities
ON latest_activities.contact_id = c.id
AND latest_activities.occurred_at = a.occurred_at
WHERE c.company_id = 20
It would be helpful to create a new index on activities (contact_id, occurred_at).
Don't use subqueries in the FROM clause if you can help it. They impede the MySQL optimizer. So, if you want one row:
Select c.*, a.occurred_at
from contacts c left outer join
activities a
on c.id = a.contact_id and
a.occurred_at is not null
where c.company_id = 20
order by a.occurred_at desc
limit 1;
If you want one row per contact_id:
Select c.*, a.occurred_at
from contacts c left outer join
activities a
on c.id = a.contact_id and
a.occurred_at is not null and
a.occurred_at = (select max(a2.occurred_at)
from activities a2
where a2.contact_id = a.contact_id
)
where c.company_id = 20
order by a.occurred_at desc;
This can make use of an index on activities(contact_id, occurred_at) and on contacts(company_id, id).
Your query is doing one thing that is a clear no-no -- and no longer supported by the default settings in the most recent versions of MySQL. You have unaggregated columns in a select that are not in the group by. The occurred_at column should be generating an error.
I feel like I am overlooking something with how complicated the other answers are, but I would think this would be all you need.
SELECT c.*
, MAX(a.occurred_at) AS occurred_at
FROM contacts AS c
LEFT JOIN activities AS a
ON c.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE c.company_id = 20
GROUP BY c.id;
Notes: (1) this assumes you didn't actually want the duplicate contact_id from your original subquery to be in the final results. (2) This also assumes your server is not configured to require a full group by; if it is, you will need to manually expand c.* into the full column list, and copy that list to the GROUP BY clause as well.
Expanding on dnoeth's comments to your question; if you are not querying each company separately for a particular reason (chunking for load, code structure handling this also handles other stuff company by company, whatever), you could tweak the above query like so to get all your results in one query.
SELECT con.*
, MAX(a.occurred_at) AS occurred_at
FROM companies AS com
INNER JOIN contacts AS con ON com.id = con.company_id
LEFT JOIN activities AS a
ON con.id = a.contact_id AND a.occurred_at IS NOT NULL
WHERE [criteria for companies chosen to be queried]
GROUP BY con.id
ORDER BY con.company_id, con.id
;